diff --git a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/01_launching_a_graviton4_instance.md b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/01_launching_a_graviton4_instance.md index 36a784d664..772e4d96c5 100644 --- a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/01_launching_a_graviton4_instance.md +++ b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/01_launching_a_graviton4_instance.md @@ -6,17 +6,13 @@ weight: 2 layout: learningpathall --- -## System Requirements +## Requirements - An AWS account - - Quota for c8g instances in your preferred region + - Access to launch an EC2 instance of type `c8g.4xlarge` (or larger) with at least 128 GB of storage - - A Linux or MacOS host - - - A c8g instance (4xlarge or larger) - - - At least 128GB of storage +For more information about creating an EC2 instance using AWS refer to [Getting Started with AWS](/learning-paths/servers-and-cloud-computing/csp/aws/). ## AWS Console Steps @@ -49,12 +45,14 @@ Follow these steps to launch your EC2 instance using the AWS Management Console: 3. **Secure the Key File** - Move the downloaded `.pem` file to the SSH configuration directory + ```bash mkdir -p ~/.ssh mv arcee-graviton4-key.pem ~/.ssh ``` - - Set proper permissions (on Mac/Linux): + - Set proper permissions on macOS or Linux: + ```bash chmod 400 ~/.ssh/arcee-graviton4-key.pem ``` @@ -105,9 +103,12 @@ Follow these steps to launch your EC2 instance using the AWS Management Console: - In the dropdown list, select "My IP". - Note 1: you will only be able to connect to the instance from your current host, which is the safest setting. We don't recommend selecting "Anywhere", which would allow anyone on the Internet to attempt to connect. Use at your own risk. - Note 2: although this demonstration only requires SSH access, feel free to use one of your existing security groups as long as it allows SSH traffic. +{{% notice Notes %}} +You will only be able to connect to the instance from your current host, which is the safest setting. Selecting "Anywhere" allows anyone on the Internet to attempt to connect; use at your own risk. + +Although this demonstration only requires SSH access, it is possible to use one of your existing security groups as long as it allows SSH traffic. +{{% /notice %}} 5. **Configure Storage** @@ -161,7 +162,7 @@ Follow these steps to launch your EC2 instance using the AWS Management Console: - **AMI Selection**: The Ubuntu 24.04 LTS AMI must be ARM64 compatible for Graviton processors -- **Security**: please think twice about allowing SSH from anywhere (0.0.0.0/0). We strongly recommend restricting access to your IP address +- **Security**: Think twice about allowing SSH from anywhere (0.0.0.0/0). It is strongly recommended to restrict access to your IP address. 
- **Storage**: The 128GB EBS volume is sufficient for the Arcee model and dependencies diff --git a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/02_setting_up_the_instance.md b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/02_setting_up_the_instance.md index f8e09c292e..c85c8f0bc4 100644 --- a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/02_setting_up_the_instance.md +++ b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/02_setting_up_the_instance.md @@ -6,7 +6,7 @@ weight: 3 layout: learningpathall --- -In this step, we'll set up the Graviton4 instance with all the necessary tools and dependencies required to build and run the Arcee Foundation Model. This includes installing the build tools and Python environment. +In this step, you'll set up the Graviton4 instance with all the necessary tools and dependencies required to build and run the Arcee Foundation Model. This includes installing the build tools and Python environment. ## Step 1: Update Package List @@ -29,7 +29,7 @@ sudo apt-get install cmake gcc g++ git python3 python3-pip python3-virtualenv li This command installs all the essential development tools and dependencies: -- **cmake**: Cross-platform build system generator that we'll use to compile Llama.cpp +- **cmake**: Cross-platform build system generator used to compile Llama.cpp - **gcc & g++**: GNU C and C++ compilers for building native code - **git**: Version control system for cloning repositories - **python3**: Python interpreter for running Python-based tools and scripts @@ -39,9 +39,9 @@ This command installs all the essential development tools and dependencies: The `-y` flag automatically answers "yes" to prompts, making the installation non-interactive. -## What's Ready Now +## What's Ready Now? -After completing these steps, your Graviton4 instance will have: +After completing these steps, your Graviton4 instance has: - A complete C/C++ development environment for building Llama.cpp - Python 3 with pip for managing Python packages diff --git a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/03_building_llama_cpp.md b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/03_building_llama_cpp.md index 713fff1696..b4cdcf0f7a 100644 --- a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/03_building_llama_cpp.md +++ b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/03_building_llama_cpp.md @@ -6,11 +6,9 @@ weight: 4 layout: learningpathall --- -In this step, we'll build Llama.cpp from source. Llama.cpp is a high-performance C++ implementation of the LLaMA model that's optimized for inference on various hardware platforms, including ARM-based processors like Graviton4. +In this step, you'll build Llama.cpp from source. Llama.cpp is a high-performance C++ implementation of the LLaMA model that's optimized for inference on various hardware platforms, including Arm-based processors like Graviton4. -Even though AFM-4.5B has a custom model architecture, we're able to use the vanilla version of llama.cpp as the Arcee AI team has contributed the appropriate modeling code. - -Here are all the steps. +Even though AFM-4.5B has a custom model architecture, we're able to use the vanilla version of Llama.cpp as the Arcee AI team has contributed the appropriate modeling code. 
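+
+Before cloning and building, you can optionally verify that you are on an Arm64 (aarch64) instance and that the toolchain installed in the previous step is available. This is only a quick sanity check and can be skipped:
+
+```bash
+# Should print "aarch64" on a Graviton4 instance
+uname -m
+# Confirm the build tools installed earlier are on the PATH
+cmake --version
+gcc --version
+```
+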
## Step 1: Clone the Repository @@ -26,7 +24,7 @@ This command clones the Llama.cpp repository from GitHub to your local machine. cd llama.cpp ``` -Change into the llama.cpp directory where we'll perform the build process. This directory contains the CMakeLists.txt file and source code structure. +Change into the llama.cpp directory to run the build process. This directory contains the `CMakeLists.txt` file and source code structure. ## Step 3: Configure the Build with CMake @@ -35,13 +33,15 @@ cmake -B . ``` This command uses CMake to configure the build system: + - `-B .` specifies that the build files should be generated in the current directory - CMake will detect your system's compiler, libraries, and hardware capabilities - It will generate the appropriate build files (Makefiles on Linux) based on your system configuration -Note: The cmake output should include the information below, indicating that the build process will leverage the Neoverse V2 architecture's specialized instruction sets designed for AI/ML workloads. These optimizations are crucial for achieving optimal performance on Graviton4: -```bash +The CMake output should include the information below, indicating that the build process will leverage the Neoverse V2 architecture's specialized instruction sets designed for AI/ML workloads. These optimizations are crucial for achieving optimal performance on Graviton4: + +```output -- ARM feature DOTPROD enabled -- ARM feature SVE enabled -- ARM feature MATMUL_INT8 enabled @@ -69,7 +69,7 @@ This command compiles the Llama.cpp project: The build process will compile the C++ source code into executable binaries optimized for your ARM64 architecture. This should only take a minute. -## What Gets Built +## What is built? After successful compilation, you'll have several key command-line executables in the `bin` directory: - `llama-cli` - The main inference executable for running LLaMA models diff --git a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/04_install_python_dependencies_for_llama_cpp.md b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/04_install_python_dependencies_for_llama_cpp.md index d3f9ebcac3..f21d281408 100644 --- a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/04_install_python_dependencies_for_llama_cpp.md +++ b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/04_install_python_dependencies_for_llama_cpp.md @@ -6,9 +6,7 @@ weight: 5 layout: learningpathall --- -In this step, we'll set up a Python virtual environment and install the required dependencies for working with Llama.cpp. This ensures we have a clean, isolated Python environment with all the necessary packages for model optimization. - -Here are all the steps. +In this step, you'll set up a Python virtual environment and install the required dependencies for working with Llama.cpp. This ensures you have a clean, isolated Python environment with all the necessary packages for model optimization. ## Step 1: Create a Python Virtual Environment @@ -30,14 +28,14 @@ source env-llama-cpp/bin/activate This command activates the virtual environment: - The `source` command executes the activation script, which modifies your current shell environment -- Depending on you sheel, your command prompt may change to show `(env-llama-cpp)` at the beginning, indicating the active environment. We will reflect this in the following commands. 
+- Depending on your shell, your command prompt may change to show `(env-llama-cpp)` at the beginning, indicating the active environment. For brevity, this prefix is not shown in the commands that follow.
 - All subsequent `pip` commands will install packages into this isolated environment
 - The `PATH` environment variable is updated to prioritize the virtual environment's Python interpreter
 
 ## Step 3: Upgrade pip to the Latest Version
 
 ```bash
-(env-llama-cpp) pip install --upgrade pip
+pip install --upgrade pip
 ```
 
 This command ensures you have the latest version of pip:
@@ -49,7 +47,7 @@ This command ensures you have the latest version of pip:
 ## Step 4: Install Project Dependencies
 
 ```bash
-(env-llama-cpp) pip install -r requirements.txt
+pip install -r requirements.txt
 ```
 
 This command installs all the Python packages specified in the requirements.txt file:
@@ -58,7 +56,7 @@ This command installs all the Python packages specified in the requirements.txt
 - This ensures everyone working on the project uses the same package versions
 - The installation will include packages needed for model loading, inference, and any Python bindings for Llama.cpp
 
-## What Gets Installed
+## What is installed?
 
 After successful installation, your virtual environment will contain:
 - **NumPy**: For numerical computations and array operations
diff --git a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/05_downloading_and_optimizing_afm45b.md b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/05_downloading_and_optimizing_afm45b.md
index fffb81a79a..e293e74ff7 100644
--- a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/05_downloading_and_optimizing_afm45b.md
+++ b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/05_downloading_and_optimizing_afm45b.md
@@ -6,24 +6,24 @@ weight: 6
 layout: learningpathall
 ---
 
-In this step, we'll download the AFM-4.5B model from Hugging Face, convert it to the GGUF format for use with Llama.cpp, and create quantized versions to optimize memory usage and inference speed.
+In this step, you'll download the AFM-4.5B model from Hugging Face, convert it to the GGUF format for use with Llama.cpp, and create quantized versions to optimize memory usage and inference speed.
 
 The first release of the [Arcee Foundation Model](https://www.arcee.ai/blog/announcing-the-arcee-foundation-model-family) family, [AFM-4.5B](https://www.arcee.ai/blog/deep-dive-afm-4-5b-the-first-arcee-foundational-model) is a 4.5-billion-parameter frontier model that delivers excellent accuracy, strict compliance, and very high cost-efficiency. It was trained on almost 7 trillion tokens of clean, rigorously filtered data, and has been tested across a wide range of languages, including Arabic, English, French, German, Hindi, Italian, Korean, Mandarin, Portuguese, Russian, and Spanish
 
-Here are all the steps to download and optimize the model for AWS Graviton4. Make sure to run them in the virtual environment you created at the previous step.
+Here are the steps to download and optimize the model for AWS Graviton4. Make sure to run them in the virtual environment you created in the previous step.
 
 ## Step 1: Install the Hugging Face libraries
 
 ```bash
-(env-llama-cpp) pip install huggingface_hub hf_xet
+pip install huggingface_hub hf_xet
 ```
 
-This command installs the Hugging Face Hub Python library, which provides tools for downloading models and datasets from the Hugging Face platform.
The library includes the `huggingface-cli` command-line interface that we'll use to download the AFM-4.5B model. The `hf_xet` library provides additional functionality for efficient data transfer and caching when downloading large models from Hugging Face Hub. +This command installs the Hugging Face Hub Python library, which provides tools for downloading models and datasets from the Hugging Face platform. The library includes the `huggingface-cli` command-line interface that you can use to download the AFM-4.5B model. ## Step 2: Download the AFM-4.5B Model ```bash -(env-llama-cpp) huggingface-cli download arcee-ai/afm-4.5B --local-dir models/afm-4-5b +huggingface-cli download arcee-ai/afm-4.5B --local-dir models/afm-4-5b ``` This command downloads the AFM-4.5B model from the Hugging Face Hub: @@ -35,8 +35,8 @@ This command downloads the AFM-4.5B model from the Hugging Face Hub: ## Step 3: Convert to GGUF Format ```bash -(env-llama-cpp) python3 convert_hf_to_gguf.py models/afm-4-5b -(env-llama-cpp) deactivate +python3 convert_hf_to_gguf.py models/afm-4-5b +deactivate ``` The first command converts the downloaded Hugging Face model to the GGUF (GGML Universal Format) format: @@ -46,7 +46,7 @@ The first command converts the downloaded Hugging Face model to the GGUF (GGML U - It outputs a single `afm-4-5B-F16.gguf` ~15GB file in the `models/afm-4-5b/` directory - GGUF is the native format used by Llama.cpp and provides efficient loading and inference -Then, we deactivate the Python virtual environment as future commands won't require it. +Next, deactivate the Python virtual environment as future commands won't require it. ## Step 4: Create Q4_0 Quantized Version @@ -81,7 +81,7 @@ This command creates an 8-bit quantized version of the model: **ARM Optimization**: Similar to Q4_0, ARM has contributed optimized kernels for Q8_0 quantization that take advantage of Neoverse v2 instruction sets. These optimizations provide excellent performance for 8-bit operations while maintaining higher accuracy compared to 4-bit quantization. -## What You'll Have +## What is available now? After completing these steps, you'll have three versions of the AFM-4.5B model: - `afm-4-5B-F16.gguf` - The original full-precision model (~15GB) diff --git a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/06_running_inference.md b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/06_running_inference.md index 7898ab02a5..e9670cb603 100644 --- a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/06_running_inference.md +++ b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/06_running_inference.md @@ -6,7 +6,7 @@ weight: 7 layout: learningpathall --- -Now that we have our AFM-4.5B models in GGUF format, we can run inference using various Llama.cpp tools. In this step, we'll explore different ways to interact with the model for text generation, benchmarking, and evaluation. +Now that you have the AFM-4.5B models in GGUF format, you can run inference using various Llama.cpp tools. In this step, you'll explore different ways to interact with the model for text generation, benchmarking, and evaluation. 
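+
+Before running the commands below, you can optionally check that the GGUF files created in the previous steps are in place. The paths assume you are still in the `llama.cpp` directory and used the default output locations shown earlier:
+
+```bash
+# List the full-precision and quantized GGUF model files
+ls -lh models/afm-4-5b/*.gguf
+```
+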
## Using llama-cli for Interactive Text Generation @@ -19,6 +19,7 @@ bin/llama-cli -m models/afm-4-5b/afm-4-5B-Q8_0.gguf -n 256 --color ``` This command starts an interactive session with the model: + - `-m models/afm-4-5b/afm-4-5B-Q8_0.gguf` specifies the model file to load - `-n 512` sets the maximum number of tokens to generate per response - The tool will prompt you to enter text, and the model will generate a response @@ -29,7 +30,7 @@ In this example, `llama-cli` uses 16 vCPUs. You can try different values with `- Once you start the interactive session, you can have conversations like this: -``` +```console > Give me a brief explanation of the attention mechnanism in transformer models. In transformer models, the attention mechanism allows the model to focus on specific parts of the input sequence when computing the output. Here's a simplified explanation: @@ -50,6 +51,7 @@ The attention mechanism allows transformer models to selectively focus on specif To exit the interactive session, type `Ctrl+C` or `/bye`. This will display performance statistics: + ```bash llama_perf_sampler_print: sampling time = 26.66 ms / 356 runs ( 0.07 ms per token, 13352.84 tokens per second) llama_perf_context_print: load time = 782.72 ms @@ -62,7 +64,7 @@ In this example, our 8-bit model running on 16 threads generated 355 tokens, at ### Example Non-Interactive Session -Now, let's try the 4-bit model in non-interactive mode: +Now, try the 4-bit model in non-interactive mode: ```bash bin/llama-cli -m models/afm-4-5b/afm-4-5B-Q4_0.gguf -n 256 --color -no-cnv -p "Give me a brief explanation of the attention mechnanism in transformer models." @@ -116,7 +118,7 @@ curl -X POST http://localhost:8080/v1/chat/completions \ }' ``` -You should get an answer similar to this one: +You get an answer similar to this one: ```json { @@ -153,4 +155,4 @@ You should get an answer similar to this one: } ``` -You could also interact with the server using Python with the [OpenAI client library](https://github.com/openai/openai-python), enabling streaming responses, and other features. +You can also interact with the server using Python with the [OpenAI client library](https://github.com/openai/openai-python), enabling streaming responses, and other features. diff --git a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/07_evaluating_the_quantized_models.md b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/07_evaluating_the_quantized_models.md index 7788ddde5f..bf390d985e 100644 --- a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/07_evaluating_the_quantized_models.md +++ b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/07_evaluating_the_quantized_models.md @@ -63,7 +63,7 @@ The results should look like this: It's pretty amazing to see that with only 4 threads, the 4-bit model can still generate at the very comfortable speed of 15 tokens per second. We could definitely run several copies of the model on the same instance to serve concurrent users or applications. -You could also try [`llama-batched-bench`](https://github.com/ggml-org/llama.cpp/tree/master/tools/batched-bench) to benchmark performance on batch sizes larger than 1. +You can also try [`llama-batched-bench`](https://github.com/ggml-org/llama.cpp/tree/master/tools/batched-bench) to benchmark performance on batch sizes larger than 1. 
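+
+As a starting point, the sketch below shows one way you might invoke it. The flags can vary between Llama.cpp versions, so check `bin/llama-batched-bench --help` before relying on them; here `-npp` sets the prompt length, `-ntg` the number of generated tokens, and `-npl` the batch sizes to test:
+
+```bash
+# Illustrative example: benchmark the 4-bit model with 128-token prompts,
+# 128 generated tokens, and 1, 2, 4, and 8 parallel sequences
+bin/llama-batched-bench -m models/afm-4-5b/afm-4-5B-Q4_0.gguf -c 8192 -npp 128 -ntg 128 -npl 1,2,4,8
+```
+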
 ## Using llama-perplexity for Model Evaluation
 
@@ -74,7 +74,7 @@ The `llama-perplexity` tool evaluates the model's quality on text datasets by ca
 
 ### Downloading a Test Dataset
 
-First, let's download the Wikitest-2 test dataset.
+First, download the Wikitext-2 test dataset.
 
 ```bash
 sh scripts/get-wikitext-2.sh
 ```
 
 ### Running Perplexity Evaluation
 
-Now, let's measure perplexity on the test dataset
+Next, measure perplexity on the test dataset.
+
 ```bash
 bin/llama-perplexity -m models/afm-4-5b/afm-4-5B-F16.gguf -f wikitext-2-raw/wiki.test.raw
 bin/llama-perplexity -m models/afm-4-5b/afm-4-5B-Q8_0.gguf -f wikitext-2-raw/wiki.test.raw
@@ -106,16 +107,13 @@ bin/llama-perplexity -m models/afm-4-5b/afm-4-5B-Q4_0.gguf -f wikitext-2-raw/wik
 tail -f ppl.sh.log
 ```
 
-  Here are the full results.
-
 | Model | Generation Speed (tokens/s, 16 vCPUs) | Memory Usage | Perplexity (Wikitext-2) |
 |:-------:|:----------------------:|:------------:|:----------:|
 | F16 | ~15–16 | ~15 GB | TODO |
 | Q8_0 | ~25 | ~8 GB | TODO |
 | Q4_0 | ~40 | ~4.4 GB | TODO |
-
-*Please remember to terminate the instance in the AWS console when you're done testing*
+When you have finished your benchmarking and evaluation, make sure to terminate your AWS EC2 instance in the AWS Management Console to avoid incurring unnecessary charges for unused compute resources.
 
diff --git a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/08_conclusion.md b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/08_conclusion.md
index 73a859ffce..a7effd0311 100644
--- a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/08_conclusion.md
+++ b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/08_conclusion.md
@@ -11,19 +11,17 @@ layout: learningpathall
 
 Congratulations! You have successfully completed the journey of deploying the Arcee AFM-4.5B foundation model on AWS Graviton4.
 
-*Please remember to terminate the instance in the AWS console when you're done testing*
+Here is a summary of what you learned.
 
-Let's recap what we accomplished.
+### What you built
 
-### What We Built
-
-Throughout this learning path, you've:
+Using this Learning Path, you have:
 
 1. **Launched a Graviton4-powered EC2 instance** - Set up a c8g.4xlarge instance running Ubuntu 24.04 LTS, leveraging AWS's latest Arm-based processors for optimal performance and cost efficiency.
 
 2. **Configured the development environment** - Installed essential tools and dependencies, including Git, build tools, and Python packages needed for machine learning workloads.
 
-3. **Built llama.cpp from source** - Compiled the optimized inference engine specifically for Arm64 architecture, ensuring maximum performance on Graviton4 processors.
+3. **Built Llama.cpp from source** - Compiled the optimized inference engine specifically for Arm64 architecture, ensuring maximum performance on Graviton4 processors.
 
 4. **Downloaded and optimized AFM-4.5B** - Retrieved the 4.5-billion parameter Arcee Foundation Model and converted it to the efficient GGUF format, then created quantized versions (8-bit and 4-bit) to balance performance and memory usage.
@@ -62,7 +60,7 @@ Now that you have a fully functional AFM-4.5B deployment, here are some exciting - Develop content generation tools - Integrate with existing applications via REST APIs -The combination of Arcee AI's efficient foundation models, llama.cpp's optimized inference engine, and AWS Graviton4's powerful Arm processors creates a compelling platform for deploying production-ready AI applications. Whether you're building chatbots, content generators, or research tools, this stack provides the performance, cost efficiency, and flexibility needed for modern AI workloads. +The combination of Arcee AI's efficient foundation models, Llama.cpp's optimized inference engine, and AWS Graviton4's powerful Arm processors creates a compelling platform for deploying production-ready AI applications. Whether you're building chatbots, content generators, or research tools, this stack provides the performance, cost efficiency, and flexibility needed for modern AI workloads. For more information on Arcee AI and how we can help you build high-quality, secure, and cost-efficient AI, solution, please visit [www.arcee.ai](https://www.arcee.ai). diff --git a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/_index.md b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/_index.md index abb4b07551..4623d917d2 100644 --- a/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/_index.md +++ b/content/learning-paths/servers-and-cloud-computing/arcee-foundation-model-on-aws/_index.md @@ -1,26 +1,25 @@ --- title: Deploy Arcee AFM-4.5B on AWS Graviton4 - draft: true cascade: draft: true minutes_to_complete: 30 -who_is_this_for: This is an introductory topic for developers and engineers who want to deploy the Arcee AFM-4.5B small language model on an AWS Arm-based instance. AFM-4.5B is a 4.5-billion-parameter frontier model that delivers excellent accuracy, strict compliance, and very high cost-efficiency. It was trained on almost 7 trillion tokens of clean, rigorously filtered data, and has been tested across a wide range of languages, including Arabic, English, French, German, Hindi, Italian, Korean, Mandarin, Portuguese, Russian, and Spanish +who_is_this_for: This is an introductory topic for developers and engineers who want to deploy the Arcee AFM-4.5B small language model on an AWS Arm-based instance. AFM-4.5B is a 4.5-billion-parameter frontier model that delivers excellent accuracy, strict compliance, and very high cost-efficiency. It was trained on almost 7 trillion tokens of clean, rigorously filtered data, and has been tested across a wide range of languages, including Arabic, English, French, German, Hindi, Italian, Korean, Mandarin, Portuguese, Russian, and Spanish. learning_objectives: - - Launch and set up an Arm-based Graviton4 virtual machine on Amazon Web Services - - Build llama.cpp from source - - Download AFM-4.5B from Hugging Face - - Quantize AFM-4.5B with llama.cpp - - Deploy the model and run inference with llama.cpp - - Evaluate the quality of quantized models by measuring perplexity + - Launch and set up an Arm-based Graviton4 virtual machine on Amazon Web Services. + - Build Llama.cpp from source. + - Download AFM-4.5B from Hugging Face. + - Quantize AFM-4.5B with Llama.cpp. + - Deploy the model and run inference with Llama.cpp. + - Evaluate the quality of quantized models by measuring perplexity. 
prerequisites: - - An Amazon Web Services account, with quota for c8g instances - - Basic familiarity with SSH + - An [AWS account](https://aws.amazon.com/) with permission to launch c8g (Graviton4) instances. + - Basic familiarity with SSH. author: Julien Simon