---
title: Launching a Graviton4 instance
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## System Requirements

- An AWS account

- Quota for c8g instances in your preferred region

- A Linux or macOS host

- A c8g instance (4xlarge or larger)

- At least 128 GB of storage

## AWS Console Steps

Follow these steps to launch your EC2 instance using the AWS Management Console:

### Step 1: Create an SSH Key Pair

1. **Navigate to EC2 Console**

- Go to the [AWS Management Console](https://console.aws.amazon.com)

- Search for "EC2" and click on "EC2" service

2. **Create Key Pair**

- In the left navigation pane, click "Key Pairs" under "Network & Security"

- Click "Create key pair"

- Enter name: `arcee-graviton4-key`

- Select "RSA" as the key pair type

- Select ".pem" as the private key file format

- Click "Create key pair"

- The private key file will automatically download to your computer

3. **Secure the Key File**

- Move the downloaded `.pem` file to the SSH configuration directory
```bash
mkdir -p ~/.ssh
mv arcee-graviton4-key.pem ~/.ssh
```

- Set proper permissions (on Mac/Linux):
```bash
chmod 400 ~/.ssh/arcee-graviton4-key.pem
```

### Step 2: Launch EC2 Instance

1. **Start Instance Launch**

- In the left navigation pane, click "Instances" under "Instances"

- Click "Launch instances" button

2. **Configure Instance Details**

- **Name and tags**: Enter `Arcee-Graviton4-Instance` as the instance name

- **Application and OS Images**:
- Click "Quick Start" tab

- Select "Ubuntu"

- Choose "Ubuntu Server 24.04 LTS (HVM), SSD Volume Type"

- **Important**: Ensure the architecture shows "64-bit (ARM)" for Graviton compatibility

- **Instance type**:
- Click on "Select instance type"

- Select `c8g.4xlarge` or larger

3. **Configure Key Pair**

In "Key pair name", select the SSH key pair you created earlier (`arcee-graviton4-key`)

4. **Configure Network Settings**

- **Network**: Select a VPC with at least one public subnet.

- **Subnet**: Select a public subnet in the VPC

- **Auto-assign Public IP**: Enable

- **Firewall (security groups)**

- Click on "Create security group"

- Check "Allow SSH traffic from"

- In the dropdown list, select "My IP".

Note 1: you will only be able to connect to the instance from your current host, which is the safest setting. We don't recommend selecting "Anywhere", which allows anyone on the Internet to attempt to connect to your instance; use that setting at your own risk.

Note 2: although this demonstration only requires SSH access, feel free to use one of your existing security groups as long as it allows SSH traffic.

5. **Configure Storage**

- **Root volume**:
- Size: `128` GB

- Volume type: `gp3`

6. **Review and Launch**

- Review all settings in the "Summary" section

- Click "Launch instance"
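If you prefer the command line, the console steps above can be sketched with the AWS CLI. This is an illustrative sketch, not part of the original instructions: it assumes AWS CLI v2 is configured with credentials and a default region, relies on your default VPC, omits the security-group creation (configure SSH-from-my-IP separately), and leaves `<AMI_ID>` for you to fill in with an arm64 Ubuntu 24.04 LTS AMI.

```bash
# Create the key pair and save the private key locally
aws ec2 create-key-pair --key-name arcee-graviton4-key \
  --key-type rsa --key-format pem \
  --query 'KeyMaterial' --output text > ~/.ssh/arcee-graviton4-key.pem
chmod 400 ~/.ssh/arcee-graviton4-key.pem

# Launch the instance with a 128 GB gp3 root volume and a Name tag
aws ec2 run-instances \
  --image-id <AMI_ID> \
  --instance-type c8g.4xlarge \
  --key-name arcee-graviton4-key \
  --associate-public-ip-address \
  --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=128,VolumeType=gp3}' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=Arcee-Graviton4-Instance}]'
```

The `<AMI_ID>` placeholder must be replaced before running; the commands will fail without valid AWS credentials and quota for c8g instances.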

### Step 3: Monitor Instance Launch

1. **View Launch Status**

After a few seconds, you should see a message similar to this one:

`Successfully initiated launch of instance (i-<unique instance ID>)`

If instance launch fails, please review your settings and try again.

2. **Get Connection Information**

- Click on the instance ID, or look for the instance in the Instances list in the EC2 console.

- In the "Details" tab of the instance, note the "Public DNS" host name

- This is the host name you'll use to connect via SSH, referred to below as `PUBLIC_DNS_HOSTNAME`

### Step 4: Connect to Your Instance

1. **Open Terminal/Command Prompt**

2. **Connect via SSH**
```bash
ssh -i ~/.ssh/arcee-graviton4-key.pem ubuntu@<PUBLIC_DNS_HOSTNAME>
```

3. **Accept Security Warning**

- When prompted about authenticity of host, type `yes`

- You should now be connected to your Ubuntu instance
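To avoid typing the full `ssh` command every time, you can optionally add a host alias to `~/.ssh/config`. The `arcee-graviton4` alias below is just an example name, and `<PUBLIC_DNS_HOSTNAME>` must be replaced with your instance's public DNS host name:

```bash
# Append an example host alias to ~/.ssh/config
mkdir -p ~/.ssh
cat >> ~/.ssh/config <<'EOF'
Host arcee-graviton4
    HostName <PUBLIC_DNS_HOSTNAME>
    User ubuntu
    IdentityFile ~/.ssh/arcee-graviton4-key.pem
EOF
```

After this, `ssh arcee-graviton4` is equivalent to the full command above.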

### Important Notes

- **Region Selection**: Ensure you're in your preferred AWS region before launching

- **AMI Selection**: The Ubuntu 24.04 LTS AMI must be ARM64 compatible for Graviton processors

- **Security**: please think twice about allowing SSH from anywhere (0.0.0.0/0). We strongly recommend restricting access to your IP address

- **Storage**: The 128GB EBS volume is sufficient for the Arcee model and dependencies

- **Backup**: Consider creating AMIs or snapshots for backup purposes


---
title: Setting up the instance
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

In this step, we'll set up the Graviton4 instance with all the necessary tools and dependencies required to build and run the Arcee Foundation Model. This includes installing the build tools and Python environment.

## Step 1: Update Package List

```bash
sudo apt-get update
```

This command updates the local package index from the repositories:

- Downloads the latest package lists from all configured APT repositories
- Ensures you have the most recent information about available packages and their versions
- This is a best practice before installing new packages to avoid potential conflicts
- The package index contains metadata about available packages, their dependencies, and version information

## Step 2: Install System Dependencies

```bash
sudo apt-get install cmake gcc g++ git python3 python3-pip python3-virtualenv libcurl4-openssl-dev unzip -y
```

This command installs all the essential development tools and dependencies:

- **cmake**: Cross-platform build system generator that we'll use to compile Llama.cpp
- **gcc & g++**: GNU C and C++ compilers for building native code
- **git**: Version control system for cloning repositories
- **python3**: Python interpreter for running Python-based tools and scripts
- **python3-pip**: Python package installer for managing Python dependencies
- **python3-virtualenv**: Tool for creating isolated Python environments
- **libcurl4-openssl-dev**: Client-side URL transfer library
- **unzip**: Utility for extracting ZIP archives

The `-y` flag automatically answers "yes" to prompts, making the installation non-interactive.
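As a quick sanity check (not part of the original steps), you can confirm that each tool is now on the `PATH`:

```bash
# Report whether each installed tool is available on the PATH
for tool in cmake gcc g++ git python3 pip3 virtualenv unzip; do
  if command -v "$tool" >/dev/null; then
    echo "$tool: OK"
  else
    echo "$tool: missing"
  fi
done
```

Every line should read `OK`; a `missing` entry means the corresponding package did not install correctly.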

## What's Ready Now

After completing these steps, your Graviton4 instance will have:

- A complete C/C++ development environment for building Llama.cpp
- Python 3 with pip for managing Python packages
- Git for cloning repositories
- All necessary build tools for compiling optimized ARM64 binaries

The system is now prepared for the next steps: building Llama.cpp and downloading the Arcee Foundation Model.
---
title: Building Llama.cpp
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

In this step, we'll build Llama.cpp from source. Llama.cpp is a high-performance C++ implementation of the LLaMA model that's optimized for inference on various hardware platforms, including ARM-based processors like Graviton4.

Even though AFM-4.5B has a custom model architecture, we're able to use the vanilla version of llama.cpp as the Arcee AI team has contributed the appropriate modeling code.

Here are all the steps.

## Step 1: Clone the Repository

```bash
git clone https://github.com/ggerganov/llama.cpp
```

This command clones the Llama.cpp repository from GitHub to your local machine. The repository contains the source code, build scripts, and documentation needed to compile the inference engine.

## Step 2: Navigate to the Project Directory

```bash
cd llama.cpp
```

Change into the llama.cpp directory where we'll perform the build process. This directory contains the CMakeLists.txt file and source code structure.

## Step 3: Configure the Build with CMake

```bash
cmake -B .
```

This command uses CMake to configure the build system:
- `-B .` specifies that the build files should be generated in the current directory
- CMake will detect your system's compiler, libraries, and hardware capabilities
- It will generate the appropriate build files (Makefiles on Linux) based on your system configuration

Note: The cmake output should include the information below, indicating that the build process will leverage the Neoverse V2 architecture's specialized instruction sets designed for AI/ML workloads. These optimizations are crucial for achieving optimal performance on Graviton4:

```bash
-- ARM feature DOTPROD enabled
-- ARM feature SVE enabled
-- ARM feature MATMUL_INT8 enabled
-- ARM feature FMA enabled
-- ARM feature FP16_VECTOR_ARITHMETIC enabled
-- Adding CPU backend variant ggml-cpu: -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+dotprod+i8mm+sve
```

- **DOTPROD: Dot Product** - Hardware-accelerated dot product operations for neural network computations
- **SVE: Scalable Vector Extension** - Advanced vector processing capabilities that can handle variable-length vectors up to 2048 bits, providing significant performance improvements for matrix operations
- **MATMUL_INT8: Matrix multiplication units** - Dedicated hardware for efficient matrix operations common in transformer models, accelerating the core computations of large language models
- **FMA: Fused Multiply-Add** - Optimized floating-point operations that combine multiplication and addition in a single instruction
- **FP16_VECTOR_ARITHMETIC: FP16 vector arithmetic** - Hardware support for 16-bit floating-point vector operations, reducing memory usage while maintaining good numerical precision

## Step 4: Compile the Project

```bash
cmake --build . --config Release -j16
```

This command compiles the Llama.cpp project:
- `--build .` tells CMake to build the project using the files in the current directory
- `--config Release` specifies a Release build configuration, which enables optimizations and removes debug symbols
- `-j16` runs the build with 16 parallel jobs, which speeds up compilation on multi-core systems like Graviton4

The build process will compile the C++ source code into executable binaries optimized for your ARM64 architecture. This should only take a minute.

## What Gets Built

After successful compilation, you'll have several key command-line executables in the `bin` directory:
- `llama-cli` - The main inference executable for running LLaMA models
- `llama-server` - A web server for serving model inference over HTTP
- `llama-quantize` - A tool for model quantization to reduce memory usage
- Various utility programs for model conversion and optimization

You can find more information in the llama.cpp [GitHub repository](https://github.com/ggml-org/llama.cpp/tree/master/tools).

These binaries are specifically optimized for ARM64 architecture and will provide excellent performance on your Graviton4 instance.
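Once you have a GGUF model file (downloaded in a later step), a minimal smoke test of the freshly built binary looks like the sketch below. The model path is a placeholder, not a file produced by the build:

```bash
# Hypothetical smoke test: replace model.gguf with a real GGUF model file.
# -m selects the model, -p sets the prompt, -n limits generation to 32 tokens.
bin/llama-cli -m model.gguf -p "Hello, world" -n 32
```

If the build succeeded, this prints model load information followed by generated text.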
---
title: Installing Python dependencies for llama.cpp
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

In this step, we'll set up a Python virtual environment and install the required dependencies for working with Llama.cpp. This ensures we have a clean, isolated Python environment with all the necessary packages for model optimization.

Here are all the steps.

## Step 1: Create a Python Virtual Environment

```bash
virtualenv env-llama-cpp
```

This command creates a new Python virtual environment named `env-llama-cpp`:
- Virtual environments provide isolated Python environments that prevent conflicts between different projects
- The `env-llama-cpp` directory will contain its own Python interpreter and package installation space
- This isolation ensures that the Llama.cpp dependencies won't interfere with other Python projects on your system
- Virtual environments are essential for reproducible development environments

## Step 2: Activate the Virtual Environment

```bash
source env-llama-cpp/bin/activate
```

This command activates the virtual environment:
- The `source` command executes the activation script, which modifies your current shell environment
- Depending on your shell, your command prompt may change to show `(env-llama-cpp)` at the beginning, indicating the active environment; the following commands reflect this prompt
- All subsequent `pip` commands will install packages into this isolated environment
- The `PATH` environment variable is updated to prioritize the virtual environment's Python interpreter

## Step 3: Upgrade pip to the Latest Version

```bash
(env-llama-cpp) pip install --upgrade pip
```

This command ensures you have the latest version of pip:
- Upgrading pip helps avoid compatibility issues with newer packages
- The `--upgrade` flag tells pip to install the newest available version
- This is a best practice before installing project dependencies
- Newer pip versions often include security fixes and improved package resolution

## Step 4: Install Project Dependencies

```bash
(env-llama-cpp) pip install -r requirements.txt
```

This command installs all the Python packages specified in the requirements.txt file:
- The `-r` flag tells pip to read the package list from the specified file
- `requirements.txt`, at the root of the llama.cpp repository, contains a list of Python packages and their version specifications
- This ensures everyone working on the project uses the same package versions
- The installation will include packages needed for model loading, inference, and any Python bindings for Llama.cpp

## What Gets Installed

After successful installation, your virtual environment will contain:
- **NumPy**: For numerical computations and array operations
- **Requests**: For HTTP operations and API calls
- **Other dependencies**: Specific packages needed for Llama.cpp Python integration

The virtual environment is now ready for running Python scripts that interact with the compiled Llama.cpp binaries. Remember to always activate the virtual environment (`source env-llama-cpp/bin/activate`) before running any Python code related to this project.