Code for the paper LLMGuard: Safeguarding Real-Time Inference for Large Language Models on Edge Devices.
A device equipped with Intel SGX is required to run the code. We recommend testing on Linux, since we have not tested Gramine on Windows.
The following steps are necessary to build a Gramine environment.
- Linux-SGX Driver. The SGX driver is the fundamental prerequisite and must be installed first. Please refer to the Linux-SGX repository to build it from source. For some CPU and system versions, SGX is already integrated into the system driver.
- Gramine-SGX. Gramine-SGX is a libOS that runs applications in SGX without modification. Please follow the Gramine repository to install Gramine.
- Test. You can test your Gramine installation with this simple Demo.
Please follow the Official Documentation to install the OP-TEE system, and test it with the provided examples.
We suggest installing OP-TEE >= 1.0 to enable acceleration with ARM NEON. On NVIDIA Jetson devices, OP-TEE is already installed in L4T systems.
Our code is tested with Python 3.9 and should be suitable for Python >= 3.8. Below is a brief guide to configuring the essential Python libraries.
For the training experiments, a basic PyTorch environment with GPU support is necessary, along with the Hugging Face and PEFT libraries. No specific versions of these libraries are required.
For the experiments focused on inference latency, you will need both NumPy and CuPy.
Our code works well under CUDA 12.1 and should work under CUDA >= 12.0. cuDNN is also recommended for inference acceleration, although you can run without it.
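As a quick sanity check, the hypothetical snippet below verifies the GPU stack and times one GEMM on NumPy (CPU) versus CuPy (GPU), which is the kind of kernel the latency experiments exercise. The matrix size is illustrative, and this script is not part of the repository:

import time
import numpy as np
import cupy as cp
import torch

# CUDA >= 12.0 with a working driver is assumed.
assert torch.cuda.is_available()

n = 2048
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

t0 = time.perf_counter()
np.matmul(a, b)
print(f"NumPy GEMM: {time.perf_counter() - t0:.3f}s")

a_gpu, b_gpu = cp.asarray(a), cp.asarray(b)
cp.matmul(a_gpu, b_gpu)              # warm-up run
cp.cuda.Stream.null.synchronize()
t0 = time.perf_counter()
cp.matmul(a_gpu, b_gpu)
cp.cuda.Stream.null.synchronize()    # wait for the asynchronous kernel
print(f"CuPy GEMM: {time.perf_counter() - t0:.3f}s")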
All datasets except the E2E dataset are available on Hugging Face, and our code will download them automatically.
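For reference, a minimal sketch of how such a download happens with the Hugging Face datasets library; the dataset ID below is only illustrative:

from datasets import load_dataset

# Downloads and caches the dataset on first use; later runs reuse the cache.
dataset = load_dataset("tatsu-lab/alpaca")  # illustrative dataset ID
print(dataset["train"][0])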
All the code is under the acc directory, organized as {dataset}/{model}.
We provide a simple shell script to run the training experiments. For example:
cd ./acc/alpaca/llama-7b
./command.sh
Each script will train at least two adapters and composite them. This usually takes a long time, often in excess of 24 hours. You can easily adjust the number of adapters in the code; a compositing sketch with PEFT follows below.
The hyper-parameters should match the settings given here.
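For orientation, here is a minimal, hypothetical sketch of compositing two trained adapters with PEFT. The base model name, adapter paths, and weights are placeholders, and the repository's actual composition logic may differ:

from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model and attach two trained LoRA adapters.
base = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")  # placeholder
model = PeftModel.from_pretrained(base, "path/to/adapter_a", adapter_name="a")
model.load_adapter("path/to/adapter_b", adapter_name="b")

# Composite the adapters into one; weights and combination type are illustrative.
model.add_weighted_adapter(
    adapters=["a", "b"],
    weights=[0.5, 0.5],
    adapter_name="composite",
    combination_type="linear",
)
model.set_adapter("composite")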
All the code is under the latency directory, organized as inference_{platform}/{model}/{method}.
We provide simple shell scripts to run the latency experiments. For example:
cd ./latency/inference_sgx/llama/ours
./run.sh
cd ./latency/inference_sgx/llama/tee_inference
./run.sh
We also provide a demo that runs single-layer inference of adapters. You need to change the Makefile to your own settings, then run the following commands:
cd ./latency/inference_trustzone/1layer_demo
make clean && make
sudo cp ./ta/{uuid}.ta {path to your optee_armtz}
sudo ./host/inference_demo
The commands above were tested on-device. Notably, since our OP-TEE latency code is large, we suggest cross-compiling it on a workstation using the toolchains provided by Nvidia Jetson Resources.
Run the following commands to compare our optimized GEMM with naive GEMM:
cd ./latency/inference_trustzone/gemm_optimization
./run.sh
Once you have trained the victim model, use Knockoff to conduct the attacks. Note that some libraries may be too recent, so you may need older versions.
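For context, here is a rough, hypothetical sketch of one Knockoff-style extraction step, in which a surrogate is distilled on the victim's soft labels. The function names and loss choice are illustrative and do not reproduce our exact attack code:

import torch
import torch.nn.functional as F

@torch.no_grad()
def query_victim(victim, inputs):
    # The attacker only observes the victim's output distribution.
    return F.softmax(victim(inputs), dim=-1)

def knockoff_step(surrogate, optimizer, victim, inputs):
    targets = query_victim(victim, inputs)
    # Distill the victim's soft labels into the surrogate via KL divergence.
    loss = F.kl_div(F.log_softmax(surrogate(inputs), dim=-1),
                    targets, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()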
This project is released under the MIT license.