diff --git a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/02_setting_up_the_instance.md b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/02_setting_up_the_instance.md
index 8b8c53c779..a81fd12c81 100644
--- a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/02_setting_up_the_instance.md
+++ b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/02_setting_up_the_instance.md
@@ -6,7 +6,7 @@ weight: 4
 layout: learningpathall
 ---
 
-In this step, you'll set up the Graviton4 instance with the tools and dependencies required to build and run the Arcee Foundation Model. This includes installing system packages and a Python environment.
+In this step, you'll set up the Graviton4 instance with the tools and dependencies required to build and run the AFM-4.5B model. This includes installing system packages and a Python environment.
 
 ## Update the package list
 
diff --git a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/03_building_llama_cpp.md b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/03_building_llama_cpp.md
index 95fa1e416f..1cdd269a23 100644
--- a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/03_building_llama_cpp.md
+++ b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/03_building_llama_cpp.md
@@ -7,7 +7,7 @@ layout: learningpathall
 ---
 ## Build the Llama.cpp inference engine
 
-In this step, you'll build Llama.cpp from source. Llama.cpp is a high-performance C++ implementation of the LLaMA model, optimized for inference on a range of hardware platforms,including Arm-based processors like AWS Graviton4.
+In this step, you'll build Llama.cpp from source. Llama.cpp is a high-performance C++ implementation of the LLaMA model, optimized for inference on a range of hardware platforms, including Arm-based processors like AWS Graviton4.
 
 Even though AFM-4.5B uses a custom model architecture, you can still use the standard Llama.cpp repository - Arcee AI has contributed the necessary modeling code upstream.
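+
+As a preview of what's ahead, a from-source build of Llama.cpp typically follows the pattern below. This is a minimal sketch: the repository URL and CMake options are assumptions based on upstream defaults, and the exact commands used in this Learning Path appear in the steps on this page.
+
+```bash
+# Fetch the llama.cpp sources
+git clone https://github.com/ggerganov/llama.cpp
+cd llama.cpp
+
+# Configure an optimized Release build; by default the build targets the
+# native CPU, picking up the Arm64 features that Graviton4 exposes
+cmake -B build -DCMAKE_BUILD_TYPE=Release
+
+# Compile using all available cores
+cmake --build build --config Release -j"$(nproc)"
+```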
diff --git a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/04_install_python_dependencies_for_llama_cpp.md b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/04_install_python_dependencies_for_llama_cpp.md
index b680dcf7eb..db8d79dc36 100644
--- a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/04_install_python_dependencies_for_llama_cpp.md
+++ b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/04_install_python_dependencies_for_llama_cpp.md
@@ -32,7 +32,7 @@ This command does the following:
 
 - Runs the activation script, which modifies your shell environment
 - Updates your shell prompt to show `env-llama-cpp`, indicating the environment is active
-- Updates `PATH` to use so the environment’s Python interpreter
+- Updates `PATH` to use the environment’s Python interpreter
 - Ensures all `pip` commands install packages into the isolated environment
 
 ## Upgrade pip to the latest version
 
@@ -72,7 +72,8 @@ After the installation completes, your virtual environment includes:
 - **NumPy**: for numerical computations and array operations
 - **Requests**: for HTTP operations and API calls
 - **Other dependencies**: additional packages required by llama.cpp's Python bindings and utilities
-Your environment is now ready to run Python scripts that integrate with the compiled Llama.cpp binaries
+
+Your environment is now ready to run Python scripts that integrate with the compiled Llama.cpp binaries.
 {{< notice Tip >}}
 Before running any Python commands, make sure your virtual environment is activated.
 {{< /notice >}}
diff --git a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/05_downloading_and_optimizing_afm45b.md b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/05_downloading_and_optimizing_afm45b.md
index ec980c9f03..5c68008870 100644
--- a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/05_downloading_and_optimizing_afm45b.md
+++ b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/05_downloading_and_optimizing_afm45b.md
@@ -8,7 +8,8 @@ layout: learningpathall
 
 In this step, you’ll download the [AFM-4.5B](https://huggingface.co/arcee-ai/AFM-4.5B) model from Hugging Face, convert it to the GGUF format for compatibility with `llama.cpp`, and generate quantized versions to optimize memory usage and improve inference speed.
 
-**Note: if you want to skip the model optimization process, [GGUF](https://huggingface.co/arcee-ai/AFM-4.5B-GGUF) versions are available.**
+{{% notice Note %}}
+If you want to skip the model optimization process, [GGUF](https://huggingface.co/arcee-ai/AFM-4.5B-GGUF) versions are available. {{% /notice %}}
 
 Make sure to activate your virtual environment before running any commands. The instructions below walk you through downloading and preparing the model for efficient use on AWS Graviton4.
 
@@ -28,11 +29,11 @@ pip install huggingface_hub hf_xet
 
 This command installs:
 
 - `huggingface_hub`: Python client for downloading models and datasets
-- `hf_xet`: Git extension for fetching large model files stored on Hugging Face
+- `hf_xet`: Xet storage client that `huggingface_hub` uses to efficiently fetch large model files hosted on Hugging Face
 
 These tools include the `hf` command-line interface you'll use next.
 
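+To confirm that both packages landed in the active virtual environment, you can run a quick check first. This is a minimal sanity check; the printed version depends on what pip resolved.
+
+```bash
+# The hf CLI should resolve to the virtual environment's bin directory
+which hf
+
+# Importing the library verifies it is installed in this environment
+python -c "import huggingface_hub; print(huggingface_hub.__version__)"
+```
+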
-## Login to the Hugging Face Hub
+## Log in to the Hugging Face Hub
 
 ```bash
 hf auth login
 ```
@@ -86,7 +87,7 @@ This command creates a 4-bit quantized version of the model:
 
 - `llama-quantize` is the quantization tool from Llama.cpp.
 - `afm-4-5B-F16.gguf` is the input GGUF model file in 16-bit precision.
-- `Q4_0` applies zero-point 4-bit quantization.
-- This reduces the model size by approximately 45% (from ~15GB to ~8GB).
+- `Q4_0` applies basic 4-bit block quantization, storing weights as 4-bit integers with a shared scale factor per block.
+- This reduces the model size by approximately 70% (from ~15GB to ~4.4GB).
 - The quantized model will use less memory and run faster, though with a small reduction in accuracy.
 - The output file will be `afm-4-5B-Q4_0.gguf`.
 
@@ -104,7 +105,7 @@ bin/llama-quantize models/afm-4-5b/afm-4-5B-F16.gguf models/afm-4-5b/afm-4-5B-Q8
 
 This command creates an 8-bit quantized version of the model:
 
-- `Q8_0` specifies 8-bit quantization with zero-point compression.
-- This reduces the model size by approximately 70% (from ~15GB to ~4.4GB).
+- `Q8_0` specifies 8-bit block quantization, storing weights as 8-bit integers with a shared scale factor per block.
+- This reduces the model size by approximately 45% (from ~15GB to ~8GB).
 - The 8-bit version provides a better balance between memory usage and accuracy than 4-bit quantization.
 - The output file is named `afm-4-5B-Q8_0.gguf`.
 - Commonly used in production scenarios where memory resources are available.
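+
+After both quantization runs, you can compare the artifacts and smoke-test one of them. This is a sketch, assuming `llama-cli` was built alongside `llama-quantize` in the earlier build step and that you used the paths shown above.
+
+```bash
+# The F16, Q8_0, and Q4_0 files should show roughly ~15GB, ~8GB, and ~4.4GB
+ls -lh models/afm-4-5b/
+
+# Run a short generation to verify the quantized model loads and produces tokens
+bin/llama-cli -m models/afm-4-5b/afm-4-5B-Q4_0.gguf -p "Hello from Graviton4" -n 32
+```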