D1aoBoomm/LLMGuard

"LLMGuard: Safeguarding Real-Time Inference for Large Language Models on Edge Devices"

This repository contains the code for LLMGuard: Safeguarding Real-Time Inference for Large Language Models on Edge Devices.

Requirements

Intel SGX Hardware and Gramine

A device equipped with Intel SGX is required to run the code. We recommend testing on Linux, since we have not tested Gramine on Windows.

The following steps are necessary to build a Gramine environment.

  1. Linux-SGX Driver. The SGX driver must be installed first, as it is the fundamental environment. Please refer to the Linux-SGX Repository to build it from source code. For some CPU and system versions, SGX may already be integrated into the system driver.

  2. Gramine-SGX. Gramine-SGX is a libOS that supports running applications in SGX without modification. Please follow the Gramine Repository to install Gramine.

  3. Test. You can test your Gramine installation with this simple Demo.
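Once Gramine is installed, each application you run under it needs a manifest describing the enclave. A minimal sketch of what such a manifest might contain (the entrypoint path, enclave size, and thread count below are placeholder assumptions, not this repository's actual settings):

```toml
# Hypothetical Gramine manifest sketch -- adjust paths and sizes to your setup
libos.entrypoint = "/usr/bin/python3.9"   # binary to run inside the enclave
loader.log_level = "error"
sgx.debug = false
sgx.enclave_size = "8G"                   # must fit the model's working set
sgx.max_threads = 32
```

The manifest is processed with `gramine-manifest` and signed with `gramine-sgx-sign` before launch; the demo linked above walks through those steps.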

OP-TEE Environment

Please follow the Official Documentation to install the OP-TEE system, and test it with the examples.

We suggest installing OP-TEE >= 1.0 to enable acceleration with ARM NEON. On NVIDIA Jetson devices, OP-TEE is already included in L4T systems.

Python Environment

Our code is tested with Python 3.9 and should work with any Python >= 3.8. We provide simple instructions for configuring the essential Python libraries.

For the training experiments, a basic Torch environment with GPU support is necessary, along with the Hugging Face and PEFT libraries. No specific versions of these libraries are required.
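A hypothetical requirements sketch of those libraries (package names only; the authors pin no versions, so treat this as a starting point rather than a tested list):

```text
# hypothetical requirements sketch -- no versions are pinned by the authors
torch           # training, built with CUDA support
transformers    # Hugging Face models
peft            # LoRA adapters
datasets        # automatic dataset download
```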

For the experiments focused on inference latency, you will need both NumPy and CuPy.
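Code that must run both with and without a GPU commonly imports CuPy when available and falls back to NumPy, since the two share most of their array API. A minimal sketch of that pattern (illustrative, not the repository's actual code):

```python
# Import CuPy if installed (GPU path); otherwise fall back to NumPy (CPU path).
# Both expose the same array interface used below.
try:
    import cupy as xp   # requires a CUDA-capable GPU at runtime
except ImportError:
    import numpy as xp  # CPU fallback with an identical API surface

a = xp.ones((4, 4), dtype=xp.float32)
b = xp.eye(4, dtype=xp.float32)
c = a @ b                  # GEMM runs on whichever backend was imported
print(int(xp.sum(c)))      # 16
```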

CUDA Environment

Our code works well under CUDA 12.1 and should work under CUDA >= 12.0. cuDNN is also recommended for inference acceleration, although the code runs without it.

DataSet

All datasets, except for the E2E dataset, are available on Hugging Face, and our code will download them automatically.

Running Experiments

Model Accuracy

All the code is under the acc directory, organized as {dataset}/{model}. We provide a simple shell script to run the training experiments. For example:

cd ./acc/alpaca/llama-7b
./command.sh

Each script trains at least two adapters and composites them. Training usually takes a long time, often exceeding 24 hours. You can easily adjust the number of adapters in the code.
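Compositing adapters amounts to adding each adapter's low-rank update into the base weight. A minimal NumPy sketch of the idea (the function name, shapes, and scaling factor are illustrative assumptions, not the repository's actual code):

```python
import numpy as np

def composite_adapters(W, adapters, alpha=1.0):
    """Merge LoRA-style adapters into a base weight matrix.

    W: (d_out, d_in) base weight; adapters: list of (B, A) pairs with
    B: (d_out, r) and A: (r, d_in). Hypothetical helper, not the repo's API.
    """
    W_merged = W.copy()
    for B, A in adapters:
        W_merged += alpha * (B @ A)   # add each rank-r update B @ A
    return W_merged

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
adapters = [(rng.standard_normal((8, 2)), rng.standard_normal((2, 8)))
            for _ in range(2)]        # two adapters, as in the scripts
W2 = composite_adapters(W, adapters)
print(W2.shape)   # (8, 8)
```

The `alpha` scaling stands in for the usual LoRA scaling factor; the real training code configures this through PEFT rather than merging by hand.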

The hyper-parameters should match the settings here.

Inference Latency

All the code is under the latency directory, organized as inference/inference_{platform}/{model}/{method}. We provide simple shell scripts to run the latency experiments. For example:

TEEFormer on SGX:

cd ./latency/inference_sgx/llama/ours
./run.sh

TEE-shielded inference:

cd ./latency/inference_sgx/llama/tee_inference
./run.sh
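The latency scripts boil down to wall-clock timing around repeated forward passes. A generic timing harness for a single layer might look like this (a NumPy matmul stands in for the real model, and `time_layer` is a hypothetical helper, not the repository's harness):

```python
import time
import numpy as np

def time_layer(fn, warmup=3, iters=10):
    """Return mean wall-clock seconds per call of fn()."""
    for _ in range(warmup):          # warm-up runs exclude one-time costs
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

x = np.random.rand(256, 256).astype(np.float32)
w = np.random.rand(256, 256).astype(np.float32)
mean_s = time_layer(lambda: x @ w)   # time one matmul "layer"
print(f"mean latency: {mean_s * 1e3:.3f} ms")
```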
For TrustZone

We provide a demo that runs single-layer inference of adapters. You need to change the Makefile to your own settings, then run the following commands:

cd ./latency/inference_trustzone/1layer_demo
make clean && make
sudo cp /ta/{uuid}.ta {path2your optee_armtz}
sudo ./host/inference_demo

The commands above were tested on-device. Notably, since our OP-TEE latency code is large, we suggest cross-compiling it on a workstation using the toolchains provided by NVIDIA Jetson Resources.
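OP-TEE example Makefiles are typically parameterized by the cross-compiler prefix and the TA dev kit exported by your optee_os build. A hypothetical sketch of the settings you would adjust (all paths are placeholders for your own toolchain and build tree):

```make
# hypothetical Makefile settings -- adjust to your own toolchain and OP-TEE build
CROSS_COMPILE  ?= aarch64-linux-gnu-
TA_DEV_KIT_DIR ?= /path/to/optee_os/out/arm/export-ta_arm64
TEEC_EXPORT    ?= /path/to/optee_client/out/export
```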

ARM NEON GEMM

Run the following commands to compare it with a naive GEMM:

cd ./latency/inference_trustzone/gemm_optimization
./run.sh
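The script benchmarks a NEON-vectorized GEMM against a naive triple loop. The correctness check behind such a comparison can be illustrated in NumPy (a stand-in for the C/NEON kernels, not the actual code):

```python
import numpy as np

def naive_gemm(A, B):
    """Triple-loop reference GEMM: C[i, j] = sum_p A[i, p] * B[p, j]."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n), dtype=A.dtype)
    for i in range(m):
        for j in range(n):
            for p in range(k):
                C[i, j] += A[i, p] * B[p, j]
    return C

A = np.random.rand(16, 16).astype(np.float32)
B = np.random.rand(16, 16).astype(np.float32)
C_naive = naive_gemm(A, B)
# The optimized kernel must agree with the reference up to float tolerance.
print(np.allclose(C_naive, A @ B, atol=1e-4))   # True
```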

Model Stealing Attacks

Once you have trained the victim model, use Knockoff to conduct the attacks. Note that some of its dependencies may be too recent, so you may need to install older versions.
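Knockoff-style model stealing queries the victim as a black box and fits a surrogate on the observed outputs. A toy NumPy sketch of the idea, with linear models standing in for LLMs (entirely illustrative, not the attack code):

```python
import numpy as np

rng = np.random.default_rng(0)
W_victim = rng.standard_normal((4, 4))   # secret victim weights
victim = lambda X: X @ W_victim          # attacker sees only this query interface

X_q = rng.standard_normal((100, 4))      # attacker's query set
Y_q = victim(X_q)                        # observed victim outputs

# Fit a surrogate to mimic the victim from input/output pairs alone.
W_stolen, *_ = np.linalg.lstsq(X_q, Y_q, rcond=None)
print(np.allclose(W_stolen, W_victim, atol=1e-6))   # True
```

For a linear victim the surrogate recovers the weights exactly; for LLMs the same query-and-fit principle applies, which is why TEE-shielded inference matters.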

License

The project is released under MIT license.

About

"LLMGuard: Safeguarding Real-Time Inference for Large Language Models on Edge Devices" TOSEM 2026
