diff --git a/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/_index.md b/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/_index.md
new file mode 100644
index 0000000000..4ad2f424b4
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/_index.md
@@ -0,0 +1,51 @@
+---
+title: How to run an AI Agent application on CPU with llama.cpp and llama-cpp-agent using KleidiAI
+
+minutes_to_complete: 45
+
+who_is_this_for: This Learning Path is for software developers, ML engineers, and anyone looking to run AI Agent applications locally.
+
+learning_objectives:
+    - Set up llama-cpp-python optimized for Arm servers.
+    - Learn how to optimize LLM models to run locally.
+    - Learn how to create custom tools for ML models.
+    - Learn how to use AI Agents for applications.
+
+prerequisites:
+    - An AWS Graviton instance (m7g.xlarge).
+    - A basic understanding of Python and prompt engineering.
+    - An understanding of LLM fundamentals.
+
+author: Andrew Choi
+
+### Tags
+skilllevels: Introductory
+subjects: ML
+armips:
+    - Neoverse
+tools_software_languages:
+    - Python
+    - AWS Graviton
+operatingsystems:
+    - Linux
+
+
+
+further_reading:
+    - resource:
+        title: llama.cpp
+        link: https://github.com/ggml-org/llama.cpp
+        type: documentation
+    - resource:
+        title: llama-cpp-agent
+        link: https://llama-cpp-agent.readthedocs.io/en/latest/
+        type: documentation
+
+
+
+### FIXED, DO NOT MODIFY
+# ================================================================================
+weight: 1 # _index.md always has weight of 1 to order correctly
+layout: "learningpathall" # All files under learning paths have this same wrapper
+learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
+---
diff --git a/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/_next-steps.md b/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/_next-steps.md
new file mode 100644
index 0000000000..c3db0de5a2
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/_next-steps.md
@@ -0,0 +1,8 @@
+---
+# ================================================================================
+# FIXED, DO NOT MODIFY THIS FILE
+# ================================================================================
+weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
+title: "Next Steps" # Always the same, html page title.
+layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
+---
diff --git a/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/agent-output.md b/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/agent-output.md
new file mode 100644
index 0000000000..225ff40553
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/agent-output.md
@@ -0,0 +1,90 @@
+---
+title: AI Agent Overview and Test Results
+weight: 5
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## How the LLM decides which function to use
+
+Below is a brief explanation of how the LLM can be configured and used to execute agent tasks.
+
+- This code creates an instance of the quantized `llama3.1` model for more efficient inference on Arm-based systems.
+```python
+llama_model = Llama(
+    model_path="./models/llama3.1-8b-instruct.Q4_0_arm.gguf",
+    n_batch=2048,
+    n_ctx=10000,
+    n_threads=64,
+    n_threads_batch=64,
+)
+```
+
+- Here, you define a provider that leverages the llama.cpp Python bindings.
+```python
+provider = LlamaCppPythonProvider(llama_model)
+```
+
+- The function’s docstring guides the LLM on when and how to invoke it.
+```python
+def function(a, b):
+    """
+    Describe when the LLM should call this function.
+
+    Args:
+        a: Description of argument a
+        b: Description of argument b
+
+    Returns:
+        Description of the function's output
+    """
+
+    # ... body of your function goes here
+```
+
+- `from_functions` creates an instance of `LlmStructuredOutputSettings` by passing in a list of callable Python functions. The LLM can then decide if and when to use these functions based on user queries.
+```python
+LlmStructuredOutputSettings.from_functions([function1, function2, ...])
+```
+
+- With this, the user’s prompt is collected and processed through `LlamaCppAgent`. The agent decides whether to call any defined functions based on the request.
+```python
+user = input("Please write your prompt here: ")
+
+llama_cpp_agent = LlamaCppAgent(
+    provider,
+    debug_output=True,
+    system_prompt="You're a helpful assistant to answer User query.",
+    predefined_messages_formatter_type=MessagesFormatterType.LLAMA_3,
+)
+
+result = llama_cpp_agent.get_chat_response(
+    user, structured_output_settings=output_settings, llm_sampling_settings=settings
+)
+```
+
+## Example
+
+- If the user asks, “What is the current time?”, the AI Agent chooses to call the `get_current_time()` function, returning a result in **H:MM AM/PM** format.
+
+![Prompt asking for the current time](test_prompt.png)
+
+- As part of the prompt, a list of executable functions is sent to the LLM, allowing the agent to select the appropriate function:
+
+![Display of available functions in the terminal](test_functions.png)
+
+- After the user prompt, the AI Agent decides to invoke the function and returns the result:
+
+![get_current_time function execution](test_output.png)
+
+## Next Steps
+- You can ask different questions to trigger and execute other functions.
+- Extend your AI agent by defining custom functions so it can handle specific tasks; see the sketch below. You can also re-enable the `TavilySearchResults` function to unlock search capabilities within your environment.
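+
+As an illustration, here is a minimal sketch of one such custom function. The name `get_day_of_week` is hypothetical and not part of llama-cpp-agent; the registration call mirrors the `from_functions` usage shown above.
+
+```python
+import datetime
+
+
+def get_day_of_week():
+    """
+    Returns the current day of the week, for example "Monday".
+    """
+    # Hypothetical example tool: %A formats the full weekday name.
+    return datetime.datetime.now().strftime("%A")
+
+
+# Register the new function alongside the existing tools so the LLM can select it.
+output_settings = LlmStructuredOutputSettings.from_functions(
+    [get_current_time, open_webpage, calculator, get_day_of_week],
+    allow_parallel_function_calling=True,
+)
+```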
+
diff --git a/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/ai-agent-backend.md b/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/ai-agent-backend.md
new file mode 100644
index 0000000000..477758a8b8
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/ai-agent-backend.md
@@ -0,0 +1,185 @@
+---
+title: Python Script to Execute the AI Agent Application
+weight: 4
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Python Script for AI Agent Application
+Once you have set up the environment, create a Python script to run the AI Agent application:
+
+### Option A
+- Clone the repository:
+```bash
+cd ~
+git clone https://github.com/jc2409/ai-agent.git
+```
+
+### Option B
+- Create a Python file:
+```bash
+cd ~
+touch agent.py
+```
+
+- Copy and paste the following code:
+```python
+from enum import Enum
+from typing import Union
+from pydantic import BaseModel, Field
+from llama_cpp_agent import MessagesFormatterType
+from llama_cpp_agent.chat_history.messages import Roles
+from llama_cpp_agent.llm_output_settings import LlmStructuredOutputSettings
+from llama_cpp_agent import LlamaCppFunctionTool
+from llama_cpp_agent import FunctionCallingAgent
+from llama_cpp_agent import LlamaCppAgent
+from llama_cpp_agent.providers import LlamaCppPythonProvider
+from llama_cpp import Llama
+# import os
+# from dotenv import load_dotenv
+# from langchain_community.tools import TavilySearchResults # Uncomment this to enable search function
+
+
+# load_dotenv()
+
+# os.environ.get("TAVILY_API_KEY")
+
+llama_model = Llama(
+    model_path="./models/llama3.1-8b-instruct.Q4_0_arm.gguf",  # make sure you use the correct path for the quantized model
+    n_batch=2048,
+    n_ctx=10000,
+    n_threads=64,
+    n_threads_batch=64,
+)
+
+provider = LlamaCppPythonProvider(llama_model)
+
+
+def open_webpage():
+    """
+    Open the Arm Learning Paths website when the user asks the agent about Arm Learning Paths.
+    """
+    import webbrowser
+
+    url = "https://learn.arm.com/"
+    webbrowser.open(url, new=0, autoraise=True)
+
+
+def get_current_time():
+    """
+    Returns the current time in H:MM AM/PM format.
+    """
+    import datetime  # Import datetime module to get the current time
+
+    now = datetime.datetime.now()  # Get the current time
+    return now.strftime("%I:%M %p")  # Format time in H:MM AM/PM format
+
+
+class MathOperation(Enum):
+    ADD = "add"
+    SUBTRACT = "subtract"
+    MULTIPLY = "multiply"
+    DIVIDE = "divide"
+
+
+def calculator(
+    number_one: Union[int, float],
+    number_two: Union[int, float],
+    operation: MathOperation,
+) -> Union[int, float]:
+    """
+    Perform a math operation on two numbers.
+
+    Args:
+        number_one: First number
+        number_two: Second number
+        operation: Math operation to perform
+
+    Returns:
+        Result of the mathematical operation
+
+    Raises:
+        ValueError: If the operation is not recognized
+    """
+    if operation == MathOperation.ADD:
+        return number_one + number_two
+    elif operation == MathOperation.SUBTRACT:
+        return number_one - number_two
+    elif operation == MathOperation.MULTIPLY:
+        return number_one * number_two
+    elif operation == MathOperation.DIVIDE:
+        return number_one / number_two
+    else:
+        raise ValueError("Unknown operation.")
+
+
+# Uncomment the following function to enable web search functionality (you will need to install langchain-community)
+# def search_from_the_web(content: str):
+#     """
+#     Search for useful information from the web to answer the user's question.
+
+#     Args:
+#         content: Query used to retrieve data from the web to answer the user's question
+#     """
+#     tool = TavilySearchResults(
+#         max_results=1,
+#         search_depth="basic"
+#     )
+#     result = tool.invoke({"query": content})
+#     return result
+
+settings = provider.get_provider_default_settings()
+
+settings.temperature = 0.65
+# settings.top_p = 0.85
+# settings.top_k = 60
+# settings.tfs_z = 0.95
+settings.max_tokens = 4096
+
+output_settings = LlmStructuredOutputSettings.from_functions(
+    [get_current_time, open_webpage, calculator], allow_parallel_function_calling=True
+)
+
+
+def send_message_to_user_callback(message: str):
+    print(message)
+
+
+def run_agent():
+    user = input("Please write your prompt here: ")
+    if user == "exit":
+        return
+
+    llama_cpp_agent = LlamaCppAgent(
+        provider,
+        debug_output=True,
+        system_prompt="You're a helpful assistant to answer User query.",
+        predefined_messages_formatter_type=MessagesFormatterType.LLAMA_3,
+    )
+
+    result = llama_cpp_agent.get_chat_response(
+        user, structured_output_settings=output_settings, llm_sampling_settings=settings
+    )
+
+    print("----------------------------------------------------------------")
+    print("Response from AI Agent:")
+    print(result)
+    print("----------------------------------------------------------------")
+
+
+if __name__ == '__main__':
+    run_agent()
+```
+
+## Run the Python Script
+
+You are now ready to test the AI Agent. Use the following command in a terminal to start the application:
+```bash
+python3 agent.py
+```
+
+{{% notice Note %}}
+If the application takes too long to respond, terminate it and run it again.
+{{% /notice %}}
diff --git a/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/ai-agent.md b/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/ai-agent.md
new file mode 100644
index 0000000000..efaf03f6fb
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/ai-agent.md
@@ -0,0 +1,47 @@
+---
+title: Introduction to AI Agents and Agent Use Cases
+weight: 2
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Defining AI Agents
+
+An AI Agent is best understood as an integrated system that goes beyond standard text generation by equipping Large Language Models (LLMs) with tools and domain knowledge. Here’s a closer look at the underlying elements:
+
+- **System**: Each AI Agent functions as an interconnected ecosystem of components.
+    - **Environment**: The domain in which the AI Agent operates. For instance, in a system that books travel itineraries, the relevant environment might include airline reservation systems and hotel booking tools.
+    - **Sensors**: Methods the AI Agent uses to observe its surroundings. In the travel scenario, these could be APIs that inform the agent about seat availability on flights or room occupancy in hotels.
+    - **Actuators**: Ways the AI Agent exerts influence within that environment. In the example of a travel agent, placing a booking or modifying an existing reservation serves as the agent’s “actuators.”
+
+- **Large Language Models**: While the notion of agents is not new, LLMs bring powerful language comprehension and data-processing capabilities to agent setups.
+- **Performing Actions**: Rather than just producing text, LLMs within an agent context interpret user instructions and interact with tools to achieve specific objectives.
+- **Tools**: The agent’s available toolkit depends on the software environment and developer-defined boundaries. In the travel agent example, these tools might be limited to flight and hotel reservation APIs.
+- **Knowledge**: Beyond immediate data sources, the agent can fetch additional details, perhaps from databases or web services, to enhance decision-making.
+
+---
+
+## Varieties of AI Agents
+
+AI Agents come in multiple forms. The table below provides an overview of some agent types and examples illustrating their roles in a travel-booking system:
+
+| **Agent Category** | **Key Characteristics** | **Example in Travel** |
+|--------------------------|--------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|
+| **Simple Reflex Agents** | Act directly based on set rules or conditions. | Filters incoming messages and forwards travel-related emails to a service center. |
+| **Model-Based Agents** | Maintain an internal representation of the world and update it based on new inputs. | Monitors flight prices and flags dramatic fluctuations, guided by historical data. |
+| **Goal-Based Agents** | Execute actions with the aim of meeting designated objectives. | Figures out the necessary route (flights, transfers) to get from your current location to your target destination. |
+| **Utility-Based Agents** | Use scoring or numerical metrics to compare and select actions that fulfill a goal. | Balances cost versus convenience when determining which flights or hotels to book. |
+| **Learning Agents** | Adapt over time by integrating lessons from previous feedback or experiences. | Adjusts future booking suggestions based on traveler satisfaction surveys. |
+| **Hierarchical Agents** | Split tasks into sub-tasks and delegate smaller pieces of work to subordinate agents. | Cancels a trip by breaking down the process into individual steps, such as canceling a flight, a hotel, and a car rental. |
+| **Multi-Agent Systems** | Involve multiple agents that may cooperate or compete to complete tasks. | Cooperative: Different agents each manage flights, accommodations, and excursions. Competitive: Several agents vie for limited rooms. |
+
+---
+
+## Ideal Applications for AI Agents
+
+While the travel scenario illustrates different categories of AI Agents, there are broader circumstances where agents truly excel:
+
+- **Open-Ended Challenges**: Complex tasks with no predetermined procedure, requiring the agent to determine the necessary steps.
+- **Procedural or Multi-Step Tasks**: Endeavors requiring numerous phases or tool integrations, allowing the agent to switch between resources.
+- **Continual Improvement**: Contexts where feedback loops enable the agent to refine its behaviors for better outcomes in the future.
diff --git a/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/set-up.md b/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/set-up.md
new file mode 100644
index 0000000000..540d1287ce
--- /dev/null
+++ b/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/set-up.md
@@ -0,0 +1,90 @@
+---
+title: Set up the Environment to Run an AI Application Locally
+weight: 3
+
+### FIXED, DO NOT MODIFY
+layout: learningpathall
+---
+
+## Before you begin
+
+This Learning Path demonstrates how to build an AI Agent application using open-source Large Language Models (LLMs) optimized for the Arm architecture. The agent uses an LLM to perform actions by accessing tools and knowledge. The instructions in this Learning Path have been designed for Arm servers running Ubuntu 22.04 LTS. You need an Arm server instance with at least 4 cores and 16GB of memory to run this example. The instructions have been tested on an AWS EC2 `m7g.xlarge` instance.
+
+## Overview
+
+In this Learning Path, you learn how to build an AI Agent application using llama-cpp-python and llama-cpp-agent. llama-cpp-python is a Python binding for llama.cpp that enables efficient LLM inference on Arm CPUs, and llama-cpp-agent provides an interface for processing text using agentic chains with tools.
+
+## Installation
+
+Set up the virtual environment and install dependencies:
+```bash
+sudo apt-get update
+sudo apt-get upgrade -y
+sudo apt install python3-pip python3-venv cmake -y
+python3 -m venv ai-agent
+source ai-agent/bin/activate
+```
+
+Install the `llama-cpp-python` package using pip:
+
+```bash
+pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
+```
+
+Install the `llama-cpp-agent` and `pydantic` packages using pip:
+
+```bash
+pip install llama-cpp-agent pydantic
+```
+
+## Model Download
+
+1. Create and navigate to a models directory:
+
+```bash
+mkdir models
+cd models
+```
+
+2. Download the Hugging Face model:
+
+```bash
+wget https://huggingface.co/chatpdflocal/llama3.1-8b-gguf/resolve/main/ggml-model-Q4_K_M.gguf
+```
+
+## Building llama.cpp
+
+1. Navigate to your home directory:
+
+```bash
+cd ~
+```
+
+2. Clone the llama.cpp repository:
+
+```bash
+git clone https://github.com/ggerganov/llama.cpp
+```
+
+3. Build llama.cpp:
+
+{{% notice Note %}}
+By default, `llama.cpp` builds for CPU only on Linux and Windows, so no extra switches are needed for an Arm CPU build.
+{{% /notice %}}
+
+```bash
+cd llama.cpp
+mkdir build
+cd build
+cmake .. -DCMAKE_CXX_FLAGS="-mcpu=native" -DCMAKE_C_FLAGS="-mcpu=native"
+cmake --build . -v --config Release -j `nproc`
+```
+
+## Model Quantization
+
+After building, quantize the model using the following command:
+
+```bash
+cd bin
+./llama-quantize --allow-requantize ../../../models/ggml-model-Q4_K_M.gguf ../../../models/llama3.1-8b-instruct.Q4_0_arm.gguf Q4_0
+```
+
+This process creates a quantized version of the model optimized for your system.
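+
+To confirm the quantized model loads before wiring up the agent, you can run a quick smoke test with llama-cpp-python. This is a minimal sketch, assuming you run it from your home directory (the parent of `models/`) and that the quantized file name matches the one produced above:
+
+```python
+from llama_cpp import Llama
+
+# Load the quantized model; adjust model_path if your layout differs.
+llm = Llama(model_path="./models/llama3.1-8b-instruct.Q4_0_arm.gguf", n_ctx=512)
+
+# Generate a few tokens to verify inference works.
+output = llm("The Arm architecture is", max_tokens=16)
+print(output["choices"][0]["text"])
+```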
\ No newline at end of file diff --git a/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/test_functions.png b/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/test_functions.png new file mode 100644 index 0000000000..7960beff9e Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/test_functions.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/test_output.png b/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/test_output.png new file mode 100644 index 0000000000..1b4018939e Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/test_output.png differ diff --git a/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/test_prompt.png b/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/test_prompt.png new file mode 100644 index 0000000000..f4aa3ede82 Binary files /dev/null and b/content/learning-paths/servers-and-cloud-computing/ai-agent-on-cpu/test_prompt.png differ