diff --git a/content/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm/2-env-setup.md b/content/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm/2-env-setup.md index ed0a605229..85b368fbed 100644 --- a/content/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm/2-env-setup.md +++ b/content/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm/2-env-setup.md @@ -50,7 +50,7 @@ Run the commands below to set up the ExecuTorch internal dependencies: ```bash git submodule sync -git submodule update --init +git submodule update --init --recursive ./install_executorch.sh ``` diff --git a/content/learning-paths/embedded-and-microcontrollers/rpi-llama3/executorch.md b/content/learning-paths/embedded-and-microcontrollers/rpi-llama3/executorch.md index aede079e7c..e623ead0c8 100644 --- a/content/learning-paths/embedded-and-microcontrollers/rpi-llama3/executorch.md +++ b/content/learning-paths/embedded-and-microcontrollers/rpi-llama3/executorch.md @@ -60,9 +60,9 @@ After cloning the repository, the project's submodules are updated, and two scri git clone https://github.com/pytorch/executorch.git cd executorch git submodule sync -git submodule update --init +git submodule update --init --recursive ./install_executorch.sh -./examples/models/llama2/install_requirements.sh +./examples/models/llama/install_requirements.sh ``` When these scripts finish successfully, ExecuTorch is all set up. That means it's time to dive into the world of Llama models! 
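The switch to `git submodule update --init --recursive` matters because ExecuTorch's submodules have submodules of their own. One way to confirm nothing was missed is `git submodule status --recursive`, where a leading `-` marks a submodule that has not been checked out yet. A minimal sketch of parsing that output (the submodule names below are made-up sample data, not real ExecuTorch paths):

```shell
# Sample `git submodule status --recursive` output; a leading '-' means the
# submodule's working tree is absent (names here are hypothetical).
status_output='-a1b2c3d backends/example
 e4f5a6b third-party/example-lib (v1.0)'

# Print only the submodules that still need `git submodule update --init --recursive`.
uninitialized=$(printf '%s\n' "$status_output" | awk '/^-/ {print $2}')
echo "$uninitialized"
```

Run against a real checkout, an empty result means all submodules are initialized.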
\ No newline at end of file diff --git a/content/learning-paths/embedded-and-microcontrollers/rpi-llama3/llama3.md b/content/learning-paths/embedded-and-microcontrollers/rpi-llama3/llama3.md index a66733a77c..5b576665d4 100755 --- a/content/learning-paths/embedded-and-microcontrollers/rpi-llama3/llama3.md +++ b/content/learning-paths/embedded-and-microcontrollers/rpi-llama3/llama3.md @@ -23,52 +23,21 @@ The next steps explain how to compile and run the Llama 3 model. ## Download and export the Llama 3 8B model -To get started with Llama 3, you can obtain the pre-trained parameters by visiting [Meta's Llama Downloads](https://llama.meta.com/llama-downloads/) page. +To get started with Llama 3, you can obtain the pre-trained parameters by visiting [Meta's Llama Downloads](https://llama.meta.com/llama-downloads/) page. Request access by filling out your details, and read through and accept the Responsible Use Guide. This grants you a license and a download link that is valid for 24 hours. The Llama 3 8B model is used for this part, but the same instructions apply for other models. -Clone the Llama 3 Git repository and install the dependencies: +Once your license request is granted, use the `llama-stack` library to download the model. ```bash -git clone https://github.com/meta-llama/llama-models -cd llama-models -pip install -e . -pip install buck torchao +pip install llama-stack +llama model download --source meta --model-id meta-llama/Llama-3.1-8B ``` -Run the script to download, and paste the download link from the email when prompted: - -```bash -cd models/llama3_1 -./download.sh -``` - -You are asked which models you would like to download. 
Enter `meta-llama-3.1-8b` to get the model used for this Learning Path: - -```output - **** Model list *** - - meta-llama-3.1-405b - - meta-llama-3.1-70b - - meta-llama-3.1-8b - - meta-llama-guard-3-8b - - prompt-guard -``` - -After entering `meta-llama-3.1-8b` you are prompted again with the available models: - -```output - **** Available models to download: *** - - meta-llama-3.1-8b-instruct - - meta-llama-3.1-8b -Enter the list of models to download without spaces or press Enter for all: -``` - -Enter `meta-llama-3.1-8b` to start the download. - When the download is finished, you can list the files in the new directory: ```bash -ls Meta-Llama-3.1-8B +ls /home/pi/.llama/checkpoints/Llama3.1-8B ``` The output is: @@ -85,34 +54,26 @@ If you encounter the error "Sorry, we could not process your request at this mom The next step is to generate a `.pte` file that can be used for prompts. From the `executorch` directory, compile the model executable. Note the quantization option, which reduces the model size significantly. -If you've followed the tutorial, this should now take you to the `executorch` base directory. - -Navigate back to the top-level directory of the `executorch` repository: - -```bash {cwd="executorch"} -cd ../../../ -``` - -You are now in `$HOME/executorch` and ready to create the model file for ExecuTorch. +If you've followed the tutorial, you should be in the `executorch` base directory. -Run the Python command below to create the model file, `llama3_kv_sdpa_xnn_qe_4_32.pte`. +Run the Python command below to create the model file, `llama3_kv_sdpa_xnn_qe_4_32.pte`. 
```bash -python -m examples.models.llama2.export_llama --checkpoint llama-models/models/llama3_1/Meta-Llama-3.1-8B/consolidated.00.pth \ --p llama-models/models/llama3_1/Meta-Llama-3.1-8B/params.json -kv --use_sdpa_with_kv_cache -X -qmode 8da4w \ +python -m examples.models.llama.export_llama --checkpoint /home/pi/.llama/checkpoints/Llama3.1-8B/consolidated.00.pth \ +-p /home/pi/.llama/checkpoints/Llama3.1-8B/params.json -kv --use_sdpa_with_kv_cache -X -qmode 8da4w \ --group_size 128 -d fp32 --metadata '{"get_bos_id":128000, "get_eos_id":128001}' \ --embedding-quantize 4,32 --output_name="llama3_kv_sdpa_xnn_qe_4_32.pte" ``` -Where `consolidated.00.pth` and `params.json` are the paths to the downloaded model files, found in `llama3/Meta-Llama-3-8B`. +Where `consolidated.00.pth` and `params.json` are the paths to the downloaded model files, found in `/home/pi/.llama/checkpoints/Llama3.1-8B`. -This step takes some time and will run out of memory if you have 32 GB RAM or less. +This step takes some time and will run out of memory if you have 32 GB RAM or less. ## Compile and build the executable Follow the steps below to build ExecuTorch and the Llama runner to run models. -The final step for running the model is to build `llama_main` and `llama_main` which are used to run the Llama 3 model. +The final step for running the model is to build `llama_main`, the executable used to run the Llama 3 model. First, compile and build ExecuTorch with `cmake`: @@ -127,6 +88,9 @@ cmake -DPYTHON_EXECUTABLE=python \ -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \ -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \ -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \ + -DEXECUTORCH_BUILD_EXTENSION_FLAT_TENSOR=ON \ + -DEXECUTORCH_BUILD_EXTENSION_LLM_RUNNER=ON \ + -DEXECUTORCH_BUILD_EXTENSION_LLM=ON \ -Bcmake-out . 
cmake --build cmake-out -j16 --target install --config Release ``` @@ -141,9 +105,9 @@ cmake -DPYTHON_EXECUTABLE=python \ -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \ -DEXECUTORCH_BUILD_XNNPACK=ON \ -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \ - -Bcmake-out/examples/models/llama2 \ - examples/models/llama2 -cmake --build cmake-out/examples/models/llama2 -j16 --config Release + -Bcmake-out/examples/models/llama \ + examples/models/llama +cmake --build cmake-out/examples/models/llama -j16 --config Release ``` The CMake build options are available on [GitHub](https://github.com/pytorch/executorch/blob/main/CMakeLists.txt#L59). @@ -152,17 +116,17 @@ When the build completes, you have everything you need to test the model. ## Run the model -Use `llama_main` to run the model with a sample prompt: +Use `llama_main` to run the model with a sample prompt: ``` bash -cmake-out/examples/models/llama2/llama_main \ +cmake-out/examples/models/llama/llama_main \ --model_path=llama3_kv_sdpa_xnn_qe_4_32.pte \ ---tokenizer_path=./llama-models/models/llama3_1/Meta-Llama-3.1-8B/tokenizer.model \ +--tokenizer_path=/home/pi/.llama/checkpoints/Llama3.1-8B/tokenizer.model \ --cpu_threads=4 \ --prompt="Write a python script that prints the first 15 numbers in the Fibonacci series. Annotate the script with comments explaining what the code does." ``` -You can use `cmake-out/examples/models/llama2/llama_main --help` to read about the options. +You can use `cmake-out/examples/models/llama/llama_main --help` to read about the options. If all goes well, you will see the model output along with some memory statistics. Some output has been omitted for better readability. @@ -185,5 +149,5 @@ I 00:00:46.844400 executorch:runner.cpp:134] append_eos_to_prompt: 0 You now know how to run a Llama model in Raspberry Pi OS using ExecuTorch. You can experiment with different prompts and different numbers of CPU threads. 
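When experimenting with `--cpu_threads`, matching the count to the available cores is a reasonable starting point. The sketch below (the cap of 4 threads is an assumption to mirror the example above) picks a thread count with `nproc` and echoes the resulting `llama_main` command so you can review it before running:

```shell
# Use all available cores, capped at 4 to match the example invocation above.
cores=$(nproc 2>/dev/null || echo 4)
threads=$(( cores < 4 ? cores : 4 ))

# Echo the command instead of executing it, so it can be inspected first;
# remove the leading `echo` to actually run the model.
echo cmake-out/examples/models/llama/llama_main \
  --model_path=llama3_kv_sdpa_xnn_qe_4_32.pte \
  --tokenizer_path="${HOME}/.llama/checkpoints/Llama3.1-8B/tokenizer.model" \
  --cpu_threads="${threads}" \
  --prompt="Write a haiku about the Fibonacci series."
```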
-If you have access to the RPi 5, continue to the next section to see how to deploy the software to the board and run it. +If you have access to the RPi 5, continue to the next section to see how to deploy the software to the board and run it. diff --git a/content/learning-paths/embedded-and-microcontrollers/rpi-llama3/run.md b/content/learning-paths/embedded-and-microcontrollers/rpi-llama3/run.md index d347f9d481..1e1ee80ff6 100644 --- a/content/learning-paths/embedded-and-microcontrollers/rpi-llama3/run.md +++ b/content/learning-paths/embedded-and-microcontrollers/rpi-llama3/run.md @@ -9,7 +9,7 @@ This final section explains how to test the model by experimenting with differen ## Set up your Raspberry Pi 5 -If you want to see how the LLM behaves in an embedded environment, you need a Raspberry Pi 5 running Raspberry Pi OS. +If you want to see how the LLM behaves in an embedded environment, you need a Raspberry Pi 5 running Raspberry Pi OS. Install Raspberry Pi OS using the [Raspberry Pi documentation](https://www.raspberrypi.com/documentation/computers/getting-started.html). There are numerous ways to prepare an SD card, but Raspberry Pi recommends [Raspberry Pi Imager](https://www.raspberrypi.com/software/) on a Windows, Linux, or macOS computer with an SD card slot or SD card adapter. @@ -19,9 +19,9 @@ The 8GB RAM Raspberry Pi 5 model is preferred for exploring an LLM. ## Collect the files into an archive -There are just a few files that you need to transfer to the Raspberry Pi 5. You can bundle them together and transfer them from the running container to the development machine, and then to the Raspberry Pi 5. +There are just a few files that you need to transfer to the Raspberry Pi 5. You can bundle them together and transfer them from the running container to the development machine, and then to the Raspberry Pi 5. -You should still be in the container, in the `$HOME/executorch` directory. 
+You should still be in the container, in the `$HOME/executorch` directory. The commands below copy the needed files to a new directory. The model file is very large and takes time to copy. @@ -29,12 +29,11 @@ Run the commands below to collect the files: ```bash mkdir llama3-files -cp cmake-out/examples/models/llama2/llama_main ./llama3-files/llama_main -cp llama-models/models/llama3_1/Meta-Llama-3.1-8B/params.json ./llama3-files/params.json -cp llama-models/models/llama3_1/Meta-Llama-3.1-8B/tokenizer.model ./llama3-files/tokenizer.model +cp cmake-out/examples/models/llama/llama_main ./llama3-files/llama_main +cp /home/pi/.llama/checkpoints/Llama3.1-8B/params.json ./llama3-files/params.json +cp /home/pi/.llama/checkpoints/Llama3.1-8B/tokenizer.model ./llama3-files/tokenizer.model cp llama3_kv_sdpa_xnn_qe_4_32.pte ./llama3-files/llama3_kv_sdpa_xnn_qe_4_32.pte -cp ./cmake-out/examples/models/llama2/runner/libllama_runner.so ./llama3-files -cp ./cmake-out/lib/libextension_module.so ./llama3-files +cp ./cmake-out/examples/models/llama/runner/libllama_runner.so ./llama3-files ``` Compress the files into an archive using the `tar` command: @@ -45,7 +44,7 @@ tar czvf llama3-files.tar.gz ./llama3-files Next, copy the compressed tar file out of the container to the development computer. This is done using the `docker cp` command from the development machine. -Open a new shell or terminal on the development machine where Docker is running the container. +Open a new shell or terminal on the development machine where Docker is running the container. Find the `CONTAINER ID` for the running container: @@ -60,7 +59,7 @@ CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAME 88c34c899c8c rpi-os "/bin/bash" 7 hours ago Up 7 hours fervent_vaughan ``` -Your `CONTAINER ID` will be different so substitute your value. +Your `CONTAINER ID` will be different so substitute your value. 
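Before copying the archive out of the container, it is worth confirming that every file made it in; `tar tzf` lists an archive's contents without extracting it. The sketch below rehearses the collect-and-archive steps in a throwaway directory, with `touch` standing in for the real `cp` commands from this section:

```shell
# Work in a throwaway directory so nothing collides with the real files.
workdir=$(mktemp -d)
cd "$workdir"

# Placeholders standing in for the files copied earlier: the runner binary,
# model metadata, tokenizer, shared library, and the exported .pte model.
mkdir -p llama3-files
touch llama3-files/llama_main llama3-files/params.json \
      llama3-files/tokenizer.model llama3-files/libllama_runner.so \
      llama3-files/llama3_kv_sdpa_xnn_qe_4_32.pte

# Create the archive, then list its contents to verify everything is inside.
tar czf llama3-files.tar.gz ./llama3-files
tar tzf llama3-files.tar.gz
```

The listing should show the `llama3-files` directory plus all five files; a short listing means a `cp` step was missed.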
Copy the compressed file out of the container: @@ -70,9 +69,9 @@ docker cp 88c34c899c8c:/home/pi/executorch/llama3-files.tar.gz . ## Transfer the archive to the Raspberry Pi 5 -Now you can transfer the archive from the development machine to your Raspberry Pi 5. +Now you can transfer the archive from the development machine to your Raspberry Pi 5. -There are multiple ways to do this: via cloud storage services, with a USB thumb drive, or using SSH. Use any method that is convenient for you. +There are multiple ways to do this: via cloud storage services, with a USB thumb drive, or using SSH. Use any method that is convenient for you. For example, you can use `scp` running from a terminal in your Raspberry Pi 5 device as shown. Follow the same option as you did in the previous step. @@ -80,7 +79,7 @@ For example, you can use `scp` running from a terminal in your Raspberry Pi 5 de ```bash scp llama3-files.tar.gz @:~/ ``` -Substitute the username and the IP address of the Raspberry Pi 5. +Substitute the username and the IP address of the Raspberry Pi 5. The file is very large so you can also consider using a USB drive. @@ -91,7 +90,7 @@ Finally, log in to the Raspberry Pi 5 and run the model in a terminal using the Extract the file: ```bash -tar xvfz llama3-files.tar.gz +tar xvfz llama3-files.tar.gz ``` Change to the new directory: @@ -108,9 +107,9 @@ LD_LIBRARY_PATH=. ./llama_main --model_path=llama3_kv_sdpa_xnn_qe_4_32.pte --to ``` {{% notice Note %}} -The `llama_main` program uses dynamic linking, so you need to inform the dynamic linker to look for the 2 libraries in the current directory. +The `llama_main` program uses dynamic linking, so you need to inform the dynamic linker to look for the `libllama_runner.so` library in the current directory. {{% /notice %}} From here, you can experiment with different prompts and command line options on your Raspberry Pi 5. -Make sure to exit your container and clean up any development resources you created. 
\ No newline at end of file +Make sure to exit your container and clean up any development resources you created. \ No newline at end of file
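If you run the model often on the Raspberry Pi 5, the `LD_LIBRARY_PATH=.` prefix can be wrapped in a small helper so every invocation picks up the bundled shared library from the extracted directory. This is a sketch; the function name `run_with_local_libs` is made up for illustration:

```shell
# Prepend the current directory to the dynamic linker's search path for a
# single command, so libraries shipped in the archive are found first.
# If LD_LIBRARY_PATH is already set, the existing value is kept after ".".
run_with_local_libs() {
  LD_LIBRARY_PATH=".${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" "$@"
}

# Usage on the Raspberry Pi 5, from the extracted llama3-files directory:
#   run_with_local_libs ./llama_main --model_path=llama3_kv_sdpa_xnn_qe_4_32.pte \
#     --tokenizer_path=tokenizer.model --cpu_threads=4 --prompt="Hello"
```

Because the variable is set only for the wrapped command, the rest of the shell session is unaffected.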