Run the commands below to set up the ExecuTorch internal dependencies:

```bash
git submodule sync
git submodule update --init --recursive
./install_executorch.sh
```

After cloning the repository, the project's submodules are updated, and two scripts are run:
git clone https://github.com/pytorch/executorch.git
cd executorch
git submodule sync
git submodule update --init --recursive
./install_executorch.sh
./examples/models/llama/install_requirements.sh
```

When these scripts finish successfully, ExecuTorch is all set up. That means it's time to dive into the world of Llama models!
The next steps explain how to compile and run the Llama 3 model.

## Download and export the Llama 3 8B model

To get started with Llama 3, you can obtain the pre-trained parameters by visiting [Meta's Llama Downloads](https://llama.meta.com/llama-downloads/) page.

Request access by filling out your details, and read through and accept the Responsible Use Guide. This grants you a license and a download link that is valid for 24 hours. The Llama 3 8B model is used for this part, but the same instructions apply for other models.

Use the `llama-stack` library to download the model after your license is granted.

```bash
pip install llama-stack
llama model download --source meta --model-id meta-llama/Llama-3.1-8B
```
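After the download completes, you can confirm the checkpoint landed in the expected place. This is a quick sketch; the path assumes the default `llama-stack` download location under `/home/pi/.llama/checkpoints`:

```shell
# Report the total size of the downloaded checkpoint.
# The path assumes the default llama-stack download location.
du -sh /home/pi/.llama/checkpoints/Llama3.1-8B 2>/dev/null || echo "checkpoint not found"
```

The 8B checkpoint is roughly 15 GB, so a much smaller figure suggests an incomplete download.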

When the download is finished, you can list the files in the new directory:

```bash
ls /home/pi/.llama/checkpoints/Llama3.1-8B
```

The `ls` output includes the checkpoint files `consolidated.00.pth`, `params.json`, and `tokenizer.model`.

The next step is to generate a `.pte` file that can be used for prompts. From the `executorch` directory, compile the model executable. Note the quantization option, which reduces the model size significantly.

If you've followed the tutorial, you should be in the `executorch` base directory.

Run the Python command below to create the model file, `llama3_kv_sdpa_xnn_qe_4_32.pte`.

```bash
python -m examples.models.llama.export_llama --checkpoint /home/pi/.llama/checkpoints/Llama3.1-8B/consolidated.00.pth \
-p /home/pi/.llama/checkpoints/Llama3.1-8B/params.json -kv --use_sdpa_with_kv_cache -X -qmode 8da4w \
--group_size 128 -d fp32 --metadata '{"get_bos_id":128000, "get_eos_id":128001}' \
--embedding-quantize 4,32 --output_name="llama3_kv_sdpa_xnn_qe_4_32.pte"
```

Where `consolidated.00.pth` and `params.json` are the paths to the downloaded model files, found in `/home/pi/.llama/checkpoints/Llama3.1-8B`.

This step takes some time and will run out of memory if you have 32 GB RAM or less.
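Before starting the export, it is worth checking how much memory is available. A minimal sketch, assuming a Linux system with `/proc/meminfo`:

```shell
# Print total RAM and swap in GiB to judge whether the export will fit in memory.
awk '/MemTotal|SwapTotal/ {printf "%s %.1f GiB\n", $1, $2/1048576}' /proc/meminfo
```

If the combined total of RAM and swap is close to 32 GB, consider adding swap space before running the export.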

## Compile and build the executable

Follow the steps below to build ExecuTorch and the Llama runner to run models.

The final step is to build the `llama_main` binary and the `llama_runner` library, which are used to run the Llama 3 model.

First, compile and build ExecuTorch with `cmake`:

```bash
cmake -DPYTHON_EXECUTABLE=python \
-DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
-DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
-DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
-DEXECUTORCH_BUILD_EXTENSION_FLAT_TENSOR=ON \
-DEXECUTORCH_BUILD_EXTENSION_LLM_RUNNER=ON \
-DEXECUTORCH_BUILD_EXTENSION_LLM=ON \
-Bcmake-out .
cmake --build cmake-out -j16 --target install --config Release
```
```bash
cmake -DPYTHON_EXECUTABLE=python \
-DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
-DEXECUTORCH_BUILD_XNNPACK=ON \
-DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
-Bcmake-out/examples/models/llama \
examples/models/llama
cmake --build cmake-out/examples/models/llama -j16 --config Release
```

The CMake build options are available on [GitHub](https://github.com/pytorch/executorch/blob/main/CMakeLists.txt#L59).
When the build completes, you have everything you need to test the model.

## Run the model

Use `llama_main` to run the model with a sample prompt:

```bash
cmake-out/examples/models/llama/llama_main \
--model_path=llama3_kv_sdpa_xnn_qe_4_32.pte \
--tokenizer_path=/home/pi/.llama/checkpoints/Llama3.1-8B/tokenizer.model \
--cpu_threads=4 \
--prompt="Write a python script that prints the first 15 numbers in the Fibonacci series. Annotate the script with comments explaining what the code does."
```

You can use `cmake-out/examples/models/llama/llama_main --help` to read about the options.

If all goes well, you will see the model output along with some memory statistics. Some output has been omitted for better readability.

```output
I 00:00:46.844400 executorch:runner.cpp:134] append_eos_to_prompt: 0
```

You now know how to run a Llama model in Raspberry Pi OS using ExecuTorch. You can experiment with different prompts and different numbers of CPU threads.
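One way to experiment is to time the same prompt at several `--cpu_threads` values. This is a sketch, assuming the binary and model paths produced in the earlier steps; the short prompt keeps each run quick:

```shell
# Time the same short prompt at several thread counts and compare wall times.
# Paths assume the build and export steps earlier in this Learning Path.
for t in 1 2 4; do
  start=$(date +%s)
  cmake-out/examples/models/llama/llama_main \
    --model_path=llama3_kv_sdpa_xnn_qe_4_32.pte \
    --tokenizer_path=/home/pi/.llama/checkpoints/Llama3.1-8B/tokenizer.model \
    --cpu_threads=$t --prompt="Hello" > /dev/null 2>&1
  echo "threads=$t: $(( $(date +%s) - start )) s"
done
```

Throughput usually improves up to the number of physical cores and then flattens or degrades.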

If you have access to the RPi 5, continue to the next section to see how to deploy the software to the board and run it.

This final section explains how to test the model by experimenting with different prompts.

## Set up your Raspberry Pi 5

If you want to see how the LLM behaves in an embedded environment, you need a Raspberry Pi 5 running Raspberry Pi OS.

Install Raspberry Pi OS using the [Raspberry Pi documentation](https://www.raspberrypi.com/documentation/computers/getting-started.html). There are numerous ways to prepare an SD card, but Raspberry Pi recommends [Raspberry Pi Imager](https://www.raspberrypi.com/software/) on a Windows, Linux, or macOS computer with an SD card slot or SD card adapter.

The 8GB RAM Raspberry Pi 5 model is preferred for exploring an LLM.

## Collect the files into an archive

There are just a few files that you need to transfer to the Raspberry Pi 5. You can bundle them together and transfer them from the running container to the development machine, and then to the Raspberry Pi 5.

You should still be in the container, in the `$HOME/executorch` directory.

The commands below copy the needed files to a new directory. The model file is very large and takes time to copy.

Run the commands below to collect the files:

```bash
mkdir llama3-files
cp cmake-out/examples/models/llama/llama_main ./llama3-files/llama_main
cp /home/pi/.llama/checkpoints/Llama3.1-8B/params.json ./llama3-files/params.json
cp /home/pi/.llama/checkpoints/Llama3.1-8B/tokenizer.model ./llama3-files/tokenizer.model
cp llama3_kv_sdpa_xnn_qe_4_32.pte ./llama3-files/llama3_kv_sdpa_xnn_qe_4_32.pte
cp ./cmake-out/examples/models/llama/runner/libllama_runner.so ./llama3-files
```
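Before archiving, you can check that every expected file made it into the new directory. A minimal sketch; the file list mirrors the copy commands above:

```shell
# Verify the bundle is complete before creating the archive.
for f in llama_main params.json tokenizer.model llama3_kv_sdpa_xnn_qe_4_32.pte libllama_runner.so; do
  if [ -e "llama3-files/$f" ]; then echo "ok: $f"; else echo "MISSING: $f"; fi
done
```

Any `MISSING` line means a copy command failed and should be re-run before you archive.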

Compress the files into an archive using the `tar` command:
```bash
tar czvf llama3-files.tar.gz ./llama3-files
```

Next, copy the compressed tar file out of the container to the development computer. This is done using the `docker cp` command from the development machine.

Open a new shell or terminal on the development machine where Docker is running the container.

Find the `CONTAINER ID` for the running container:

```bash
docker ps
```

The output is similar to:

```output
CONTAINER ID   IMAGE    COMMAND       CREATED       STATUS       PORTS   NAMES
88c34c899c8c   rpi-os   "/bin/bash"   7 hours ago   Up 7 hours           fervent_vaughan
```

Your `CONTAINER ID` will be different, so substitute your value.

Copy the compressed file out of the container:

```bash
docker cp 88c34c899c8c:/home/pi/executorch/llama3-files.tar.gz .
```

## Transfer the archive to the Raspberry Pi 5

Now you can transfer the archive from the development machine to your Raspberry Pi 5.

There are multiple ways to do this: via cloud storage services, with a USB thumb drive, or using SSH. Use any method that is convenient for you.

For example, you can run `scp` from a terminal on your Raspberry Pi 5 device as shown:

```bash
scp llama3-files.tar.gz <pi-user>@<pi-ip>:~/
```

Substitute the username and the IP address of the Raspberry Pi 5.

The file is very large so you can also consider using a USB drive.
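Whichever transfer method you choose, comparing checksums on both machines catches a corrupted copy. The sketch below hashes a throwaway file so it can run anywhere; substitute `llama3-files.tar.gz` for the real check:

```shell
# Compute a SHA-256 hash; run the same command on both machines and
# compare the two hashes. Demonstrated on a throwaway file.
printf 'demo' > /tmp/demo-transfer.bin
sha256sum /tmp/demo-transfer.bin | awk '{print $1}'
rm /tmp/demo-transfer.bin
```

If the hashes printed on the development machine and the Raspberry Pi 5 differ, transfer the archive again.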

Finally, log in to the Raspberry Pi 5 and run the model in a terminal using the files from the archive.
Extract the file:

```bash
tar xvfz llama3-files.tar.gz
```

Change to the new directory:
```bash
cd llama3-files
```

Run the model:

```bash
LD_LIBRARY_PATH=. ./llama_main --model_path=llama3_kv_sdpa_xnn_qe_4_32.pte --tokenizer_path=tokenizer.model --prompt="Write a python script that prints the first 15 numbers in the Fibonacci series."
```

{{% notice Note %}}
The `llama_main` program uses dynamic linking, so you need to tell the dynamic linker to look for the required shared libraries in the current directory.
{{% /notice %}}
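To see whether the dynamic linker actually resolves the runner library from the current directory, you can inspect the binary with `ldd`. A sketch, assuming you are in the `llama3-files` directory:

```shell
# Show where the dynamic linker finds libllama_runner.so when the current
# directory is added to the search path; falls back to a message if the
# binary is not present on this machine.
LD_LIBRARY_PATH=. ldd ./llama_main 2>/dev/null | grep llama_runner \
  || echo "llama_main not found here"
```

A line ending in `not found` instead of a path means a required library is missing from the directory.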

From here, you can experiment with different prompts and command line options on your Raspberry Pi 5.

Make sure to exit your container and clean up any development resources you created.