The first step is to prepare a development environment with the required software:
- Android NDK version 28.0.12433566.
- Java 17 JDK.
- Git.
- Python 3.10 or later (these instructions have been tested with 3.10 and 3.12).

The instructions assume a macOS machine with Apple Silicon, or an x86 Debian or Ubuntu Linux machine, with at least 16 GB of RAM.

## Install Android Studio and Android NDK

Follow these steps to install and configure Android Studio:

1. Download and install the latest version of [Android Studio](https://developer.android.com/studio/).

2. Start Android Studio and open the **Settings** dialog.

3. Navigate to **Languages & Frameworks**, then **Android SDK**.

4. In the **SDK Platforms** tab, check **Android 14.0 ("UpsideDownCake")**.

Next, install the specific version of the Android NDK that you require by first installing the Android command line tools:

Linux:

Unzip the Android command line tools:

```bash
unzip commandlinetools.zip -d android-sdk
```

Install the NDK in the same directory where Android Studio installed the SDK. This is generally `~/Library/Android/sdk` by default. Set the required environment variables:

```
export ANDROID_HOME="$(realpath ~/Library/Android/sdk)"
```
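With the environment variables set, the pinned NDK version can also be installed non-interactively using `sdkmanager` from the command line tools. This is a sketch: the SDK path assumes the macOS default, and depending on where you unzipped the tools, the `sdkmanager` path may differ (for example `android-sdk/cmdline-tools/bin/sdkmanager --sdk_root="$SDK_ROOT"`).

```shell
# Install the NDK version used throughout this learning path.
SDK_ROOT="$HOME/Library/Android/sdk"   # macOS default; usually $HOME/Android/Sdk on Linux
NDK_VERSION="28.0.12433566"
SDKMANAGER="$SDK_ROOT/cmdline-tools/latest/bin/sdkmanager"
if [ -x "$SDKMANAGER" ]; then
  "$SDKMANAGER" --install "ndk;${NDK_VERSION}"
else
  echo "sdkmanager not found at $SDKMANAGER; unzip the command line tools first"
fi
echo "Expected NDK location: $SDK_ROOT/ndk/$NDK_VERSION"
```

The `ndk;<version>` package name is the standard `sdkmanager` syntax; `sdkmanager --list` shows every installable NDK version.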
### Option 1: Create a Python virtual environment

```bash
python3.10 -m venv executorch
source executorch/bin/activate
```

The prompt of your terminal has `(executorch)` as a prefix to indicate the virtual environment is active.
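You can also confirm which interpreter is active without relying on the prompt: inside the virtual environment, Python's `sys.prefix` points at the `executorch` environment directory rather than the system installation.

```shell
# Show where the active interpreter lives; inside the venv this path
# ends in the environment directory (e.g. .../executorch).
python3 -c 'import sys; print(sys.prefix)'
```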

### Option 2: Create a Conda virtual environment

Install Miniconda on your development machine by following the [Installing conda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) instructions.

Once `conda` is installed, create the environment:

```bash
conda create -yn executorch python=3.10.0
```

```bash
git submodule update --init
./examples/models/llama/install_requirements.sh
```

When these scripts finish successfully, ExecuTorch is set up. That means it's time to dive into the world of Llama models!
Llama models are powerful and versatile, having the ability to generate coherent text for many use cases, including:
* Virtual assistants.
* Language translation.

Please note that the models are subject to the [acceptable use policy](https://github.com/facebookresearch/llama/blob/main/USE_POLICY.md) and [this responsible use guide](https://ai.meta.com/static-resource/responsible-use-guide/).

## Results

The results are for two different groupsizes, with max_seq_len 2048, and 1000 samples:
|Llama 2 7B | 9.2 | 10.2 | 10.7
|Llama 3 8B | 7.9 | 9.4 | 9.7

Note that groupsize less than 128 was not enabled in this example, since the model was still too large. This is because current efforts have focused on enabling FP32, and support for FP16 is under way.

What this implies for model size is:

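To get a feel for the numbers, here is a back-of-the-envelope size estimate for an 8B-parameter model under 4-bit groupwise quantization, assuming one 16-bit scale per group. The real ExecuTorch weight layout differs in detail, so treat this as an illustration only:

```shell
# 4-bit weights (0.5 bytes each) plus one assumed 2-byte scale per group:
# smaller groups mean more scales and a larger file.
awk 'BEGIN {
  params = 8e9
  for (g = 32; g <= 256; g *= 2) {
    bytes = params * 0.5 + (params / g) * 2
    printf "groupsize %3d: %.2f GB\n", g, bytes / 1e9
  }
}'
```

Halving the groupsize doubles the scale overhead, which is part of why smaller groupsizes produced a model that was still too large here.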

## Download and export the Llama 3.2 1B model

To get started with Llama 3, you can obtain the pre-trained parameters by visiting [Meta's Llama Downloads](https://llama.meta.com/llama-downloads/) page. Request access by filling out your details, and read through and accept the Responsible Use Guide. This grants you a license and a download link which is valid for 24 hours. The Llama 3.2 1B model is used for this exercise, but the same instructions apply to other options as well with minimal modification.

Install the `llama-stack` package from `pip`.
```bash
pip install llama-stack
```
Cross-compile Llama runner to run on Android using the steps below.

### 1. Set Android NDK

Set the environment variable to point to the Android NDK:

``` bash
export ANDROID_NDK=$ANDROID_HOME/ndk/28.0.12433566/
```

```bash
cmake --build cmake-out-android -j7 --target install --config Release
```

{{% notice Note %}}
Make sure you add the `-DEXECUTORCH_XNNPACK_ENABLE_KLEIDI=ON` option to enable support for KleidiAI kernels in ExecuTorch with XNNPACK.
{{% /notice %}}

### 3. Build Llama runner for Android

Use `cmake` to cross-compile Llama runner:
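A cross-compile configuration pins the NDK toolchain file and the target ABI. The sketch below uses standard CMake/NDK options; the output paths are assumptions, and any `EXECUTORCH_*` flags you need should be checked against the ExecuTorch build documentation rather than taken from here.

```shell
# Sketch: configure and build the Llama runner against the Android NDK.
# Guarded so it only runs when the prerequisites are present.
if command -v cmake >/dev/null 2>&1 && [ -n "$ANDROID_NDK" ]; then
  cmake \
    -DCMAKE_TOOLCHAIN_FILE="$ANDROID_NDK/build/cmake/android.toolchain.cmake" \
    -DANDROID_ABI=arm64-v8a \
    -DCMAKE_BUILD_TYPE=Release \
    -Bcmake-out-android/examples/models/llama \
    examples/models/llama
  cmake --build cmake-out-android/examples/models/llama -j7 --config Release
else
  echo "cmake or ANDROID_NDK missing; finish the NDK setup first"
fi
```

The toolchain file is what switches CMake to the NDK's cross-compilers; `ANDROID_ABI=arm64-v8a` targets 64-bit Arm phones.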

You should now have `llama_main` available for Android.

## Run on Android via adb shell

### 1. Connect your Android phone

Connect your phone to your computer using a USB cable.
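Once the device shows up in `adb devices`, the runner and model artifacts can be pushed and executed from the device shell. The model and tokenizer filenames below are placeholders for whatever you exported earlier, and the `llama_main` flags are assumptions to verify with `./llama_main --help`.

```shell
# Sketch: push the runner plus model files and run them on-device.
# Guarded so it is a no-op when adb is not installed.
if command -v adb >/dev/null 2>&1; then
  adb push cmake-out-android/examples/models/llama/llama_main /data/local/tmp/
  adb push llama3_2.pte tokenizer.model /data/local/tmp/
  adb shell 'cd /data/local/tmp && chmod +x llama_main && ./llama_main --model_path llama3_2.pte --tokenizer_path tokenizer.model --prompt "Hello"'
else
  echo "adb not found; install Android platform-tools first"
fi
```

`/data/local/tmp` is used because it is writable and executable for shell-pushed binaries on stock Android builds.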

```
I 00:00:09.624432 executorch:stats.h:154] Time to first generated token: 0.2020
I 00:00:09.624434 executorch:stats.h:161] Sampling time over 127 tokens: 0.110000 (seconds)
```
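The stats output converts directly into per-token figures. For example, from the run above, the sampler overhead per generated token:

```shell
# 127 tokens sampled in 0.110 s => milliseconds of sampling per token.
awk 'BEGIN { printf "%.3f ms of sampling per token\n", 0.110 / 127 * 1000 }'
```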

You have successfully run the Llama 3.2 1B model on your Android smartphone with ExecuTorch using KleidiAI kernels.
You can use the Android demo application included in the ExecuTorch repository [Llam

{{% notice Note %}}
<path_to_android_ndk> is the root for the NDK, which is usually under ~/Library/Android/sdk/ndk/XX.Y.ZZZZZ for macOS, and contains NOTICE and README.md.
Make sure you can confirm <path_to_android_ndk>/build/cmake/android.toolchain.cmake is available for CMake to cross-compile.
{{% /notice %}}

3. (Optional) If you need to use tiktoken as the tokenizer (for Llama 3), set `EXECUTORCH_USE_TIKTOKEN=ON`, and CMake uses it as the tokenizer. If you run other models such as Llama 2, skip this step.

``` bash
export EXECUTORCH_USE_TIKTOKEN=ON # Only for Llama 3
```

4. Run the following commands to set up the required JNI library:
If the files are not on the device, use the device explorer to copy them.

This is the recommended option.

1. Open Android Studio and select **Open an existing Android Studio project** and navigate to open `examples/demo-apps/android/LlamaDemo`.

2. Run the app (^R). This builds and launches the app on the phone.

review:
answers:
- ExecuTorch is an end-to-end solution for enabling on-device inference capabilities across mobile and edge devices.
- It is a PyTorch method to quantize LLMs.
- It is a program to execute PyTorch models.
correct_answer: 1
explanation: >
ExecuTorch is part of the PyTorch Edge ecosystem and enables efficient deployment of PyTorch models to edge devices.
- Llama is a family of large language models that uses publicly-available data for training.
correct_answer: 3
explanation: >
Llama is a state-of-the-art foundational large language model designed to enable researchers to advance their work in this subfield of AI.

- questions:
question: >