# Using Arcee-Agent on SageMaker through Model Packages

This sample notebook shows you how to deploy [Arcee Agent](https://huggingface.co/arcee-ai/Arcee-Agent) using Amazon SageMaker. Arcee Agent is a cutting-edge 7B parameter language model developed by [Arcee.ai](https://www.arcee.ai) specifically for function calling and tool use. Here, we'll build an agent app that invokes the Yahoo Finance API to retrieve company information.

Arcee Agent excels at interpreting, executing, and chaining function calls. This capability allows it to interact seamlessly with a wide range of external tools, APIs, and services. The model is compatible with various tool use formats, including Glaive FC v2, Salesforce, and Agent-FLAN. Arcee-Agent performs best when using the vLLM OpenAI FC format, but it also works well with prompt-based solutions.

Initialized from Qwen2-7B, it rivals the performance of much larger models while maintaining efficiency and speed. This model is particularly suited for developers, researchers, and businesses looking to implement sophisticated AI-driven solutions without the computational overhead of larger language models.

## Use cases
Arcee Agent's unique capabilities make it an invaluable asset for businesses across various industries. Here are some specific use cases:

* Customer Support Automation:
    * Implement AI-driven chatbots that handle complex customer inquiries and support tickets.
    * Automate routine support tasks such as password resets, order tracking, and FAQ responses.
    * Integrate with CRM systems to provide personalized customer interactions based on user history.

* Sales and Marketing Automation:
    * Automate lead qualification and follow-up using personalized outreach based on user behavior.
    * Generate dynamic marketing content tailored to specific audiences and platforms.
    * Analyze customer feedback from various sources to inform marketing strategies.

* Operational Efficiency:
    * Automate administrative tasks such as scheduling, data entry, and report generation.
    * Implement intelligent assistants for real-time data retrieval and analysis from internal databases.
    * Streamline project management with automated task assignment and progress tracking.

* Financial Services Automation:
    * Automate financial reporting and compliance checks.
    * Implement AI-driven financial advisors for personalized investment recommendations.
    * Integrate with financial APIs to provide real-time market analysis and alerts.

* Healthcare Solutions:
    * Automate patient record management and data retrieval for healthcare providers.

* E-commerce Enhancements:
    * Create intelligent product recommendation systems based on user preferences and behavior.
    * Automate inventory management and supply chain logistics.
    * Implement AI-driven pricing strategies and promotional campaigns.

* Human Resources Automation:
    * Automate candidate screening and ranking based on resume analysis and job requirements.
    * Implement virtual onboarding assistants to guide new employees through the onboarding process.
    * Analyze employee feedback and sentiment to inform HR policies and practices.

* Legal Services Automation:
    * Automate contract analysis and extraction of key legal terms and conditions.
    * Implement AI-driven tools for legal research and case law summarization.
    * Develop virtual legal assistants to provide preliminary legal advice and document drafting.

* Educational Tools:
    * Create personalized learning plans and content recommendations for students.
    * Automate grading and feedback for assignments and assessments.

* Manufacturing and Supply Chain Automation:
    * Optimize production schedules and inventory levels using real-time data analysis.
    * Implement predictive maintenance for machinery and equipment.
    * Automate quality control processes through data-driven insights.

## Pre-requisites
1. Before running this notebook, make sure you obtained it from the model catalog in the SageMaker AWS Management Console.
1. **Note**: This notebook contains elements that render correctly in the Jupyter interface. Open this notebook from an Amazon SageMaker Notebook Instance or Amazon SageMaker Studio.
1. Ensure that the IAM role you use has the **AmazonSageMakerFullAccess** policy attached.

## Contents
1. [Select the model package](#1.-Select-the-model-package)

2. [Create an endpoint and perform real-time inference](#2.-Create-an-endpoint-and-perform-real-time-inference)
    1. [Define the endpoint configuration](#A.-Define-the-endpoint-configuration)
    2. [Create the endpoint](#B.-Create-the-endpoint)
    3. [Define a test payload](#C.-Define-a-test-payload)
    4. [Define the agent behavior and its functions](#D.-Define-the-agent-behavior-and-its-functions)
    5. [Run functions](#E.-Run-functions)

3. [Clean-up](#4.-Clean-up)
    1. [Delete the endpoint](#A.-Delete-the-endpoint)
    2. [Delete the model](#B.-Delete-the-model)

## Usage instructions
You can run this notebook one cell at a time by pressing Shift+Enter.

In [None]:
%%sh
pip install -q yfinance

In [None]:
import yfinance

## 1. Select the model package
Confirm that you received this notebook from the model catalog in the SageMaker AWS Management Console.

In [None]:
# Mapping of AWS regions to model package ARNs (initially only us-east-1 and eu-west-1 are supported)
model_package_map = {
    "us-east-1": "<Model Partner to specify Model package ARN corresponding to their AWS region>",
    "us-west-2": "arn:aws:sagemaker:us-west-2:014498647618:model-package/Arcee-Agent-tgi-test",
    "eu-west-1": "<Model Partner to specify Model package ARN corresponding to their AWS region>",
}

In [None]:
import datetime
import json

import boto3
import sagemaker
from IPython.display import Markdown, display
from sagemaker import ModelPackage, get_execution_role

In [None]:
region = boto3.Session().region_name
if region not in model_package_map:
    # Raising a bare string is a TypeError in Python 3; raise a real exception instead.
    raise ValueError(f"Unsupported region: {region}")

model_package_arn = model_package_map[region]
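Because some entries in the map above are still placeholders, a region can be present in the dictionary without holding a usable ARN. The sketch below shows one possible way to catch that early. The helper name `resolve_model_package_arn` and the regular expression are illustrative assumptions, not part of the SageMaker SDK, and the pattern only checks the general shape of a model package ARN.

```python
import re

# Assumed general shape of a SageMaker model package ARN (illustrative, not exhaustive).
ARN_PATTERN = re.compile(
    r"^arn:aws[a-z\-]*:sagemaker:(?P<region>[a-z0-9\-]+):\d{12}:model-package/.+$"
)


def resolve_model_package_arn(package_map, region):
    """Return the ARN configured for `region`, or raise ValueError with a readable message."""
    if region not in package_map:
        raise ValueError(f"Unsupported region: {region}")
    arn = package_map[region]
    match = ARN_PATTERN.match(arn)
    if match is None:
        # Typically a placeholder string that was never replaced.
        raise ValueError(f"No model package ARN has been filled in for {region}")
    if match.group("region") != region:
        raise ValueError(f"ARN region {match.group('region')} does not match {region}")
    return arn


# Example with a two-entry map: one real-looking ARN and one placeholder.
sample_map = {
    "us-west-2": "arn:aws:sagemaker:us-west-2:014498647618:model-package/Arcee-Agent-tgi-test",
    "us-east-1": "<placeholder>",
}
print(resolve_model_package_arn(sample_map, "us-west-2"))
```

In the notebook, you could call the helper with `model_package_map` and `region` in place of the direct dictionary lookup.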

In [None]:
role = get_execution_role()
sagemaker_session = sagemaker.Session()
runtime_sm_client = boto3.client("runtime.sagemaker")

## 2. Create an endpoint and perform real-time inference

In this example, we're deploying Arcee-Agent on a SageMaker real-time endpoint hosted on a GPU instance. If you need general information on real-time inference with Amazon SageMaker, please refer to the SageMaker [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html).

The endpoint runs a Hugging Face [Deep Learning Container](https://huggingface.co/docs/sagemaker/index), powered by the Hugging Face [Text Generation Inference](https://huggingface.co/docs/text-generation-inference/index) Server (TGI). TGI enables high-performance text generation for the most popular open-source language models. 

This notebook uses a single sample endpoint configuration, which focuses on cost efficiency. It uses a [g5.2xlarge](https://aws.amazon.com/ec2/instance-types/g5/) instance, which has a single NVIDIA A10G GPU with 24 GB of GPU RAM. Arcee-Agent has 7 billion 16-bit parameters, or roughly 14 GB of weights, which fits easily without the need for quantization.
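As a rough back-of-the-envelope check of that sizing claim, the sketch below estimates the memory taken by the model weights alone. It assumes a nominal 7 billion parameters at 2 bytes each and treats the 24 GB of GPU RAM as 24 GiB; the exact parameter count of the model may differ somewhat, and the remaining memory also has to hold the KV cache and activations.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def weights_size_gib(num_params, bytes_per_param=2):
    """Approximate size of the model weights in GiB (16-bit parameters by default)."""
    return num_params * bytes_per_param / GIB


weights = weights_size_gib(7_000_000_000)  # nominal 7B parameters (assumption)
gpu_memory_gib = 24  # A10G memory, treated as GiB for this estimate
headroom = gpu_memory_gib - weights

print(f"Weights: ~{weights:.1f} GiB, headroom for KV cache and activations: ~{headroom:.1f} GiB")
```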

For context size, we use the default value defined by the TGI inference server, i.e. 4K tokens.

We enable the [OpenAI Messages API](https://huggingface.co/docs/text-generation-inference/messages_api) available in TGI. This will allow you to invoke the endpoint in the same way you would invoke an OpenAI model. Likewise, the output format will be identical to that of OpenAI models. If that's not desirable, you can simply comment out the line setting `MESSAGES_API_ENABLED` to `true`.
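To make the request and response shapes concrete, here is a minimal offline sketch. The helper names `build_chat_payload` and `extract_reply` are hypothetical, and the sample response is hand-written and trimmed to the fields this notebook reads; a real response contains additional fields.

```python
import json


def build_chat_payload(system_prompt, user_prompt, max_tokens=128):
    """Build an OpenAI-style chat request body as a Python dict."""
    return {
        "model": "tgi",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": max_tokens,
    }


def extract_reply(response_body):
    """Return the assistant text from an OpenAI-style chat response (JSON string)."""
    output = json.loads(response_body)
    choices = output.get("choices") or []
    if not choices:
        raise ValueError("Response contains no choices")
    return choices[0]["message"]["content"]


payload = build_chat_payload("You are a helpful assistant.", "Say hello.")
print(json.dumps(payload, indent=2))

# Hand-written sample response, not actual model output.
sample_body = json.dumps(
    {"choices": [{"message": {"role": "assistant", "content": "Hello!"}}]}
)
print(extract_reply(sample_body))
```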



### A. Define the endpoint configuration

In [None]:
model_name = "Arcee-Agent"
real_time_inference_instance_type = "ml.g5.2xlarge"

### B. Create the endpoint

In [None]:
# create a deployable model from the model package.
model = ModelPackage(
    role=role,
    model_package_arn=model_package_arn,
    sagemaker_session=sagemaker_session
)

# create a unique endpoint name
timestamp = "{:%Y-%m-%d-%H-%M-%S}".format(datetime.datetime.now())
endpoint_name = f"{model_name}-{timestamp}"
print(f"Deploying endpoint {endpoint_name}")

# deploy the model
response = model.deploy(
    initial_instance_count=1,
    instance_type=real_time_inference_instance_type,
    endpoint_name=endpoint_name,
    model_data_download_timeout=3600,
    container_startup_health_check_timeout=600,
)

Once the endpoint is in service, you will be able to perform real-time inference.
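By default, `model.deploy` blocks until the endpoint is ready. If you need to check on an endpoint from another session, one option is to poll its status. The sketch below assumes a boto3 SageMaker client is passed in; the helper name `wait_for_endpoint` and its polling defaults are illustrative choices, not part of any SDK.

```python
import time


def wait_for_endpoint(sm_client, endpoint_name, poll_seconds=30, max_polls=60):
    """Poll describe_endpoint until the endpoint is InService, fails, or we give up."""
    for _ in range(max_polls):
        description = sm_client.describe_endpoint(EndpointName=endpoint_name)
        status = description["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            reason = description.get("FailureReason", "unknown reason")
            raise RuntimeError(f"Endpoint {endpoint_name} failed: {reason}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Endpoint {endpoint_name} not in service after {max_polls} polls")


# Usage (requires AWS credentials and an existing endpoint):
# import boto3
# wait_for_endpoint(boto3.client("sagemaker"), endpoint_name)
```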

### C. Define a test payload

Let's define a function to invoke the endpoint and run a sample query to check that the model works as it should. We're not using its agent behavior, just answering a simple question.

In [None]:
def invoke_endpoint(system_prompt, user_prompt, max_tokens=128):
    model_sample_input = {
        "model": "tgi",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": max_tokens,
    }
    response = runtime_sm_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(model_sample_input),
    )
    output = json.loads(response["Body"].read())
    return output["choices"][0]["message"]["content"]

In [None]:
response = invoke_endpoint(
    "As a friendly technical assistant engineer, answer the question in detail.",
    "Why are transformers better models than LSTM?",
    max_tokens=1024,
)

display(Markdown(response))

### D. Define the agent behavior and its functions

Now, let's define the system prompt. Its purpose is to tell the model which functions are available and when it should call them. We list four functions: three that return the latest stock price, the CEO name, and a company summary, plus a general-purpose function that answers other types of questions.

In [None]:
agent_prompt = """You have four primary functions: checking the last price of a specified stock, finding the name of a company CEO, finding what a company does, and answering specific questions about a company. Use the appropriate function based on the user's query.

### Functions:

1. **get_stock_price(company_name: str, stock_symbol: str) -> str**
- This function returns the last close price of a specified stock.
- Input: A company name (e.g., Mc Donalds), which you must convert to a stock symbol (e.g., MCD).
- Output: A string containing the last close price of the specified stock (e.g., "The last closing price of Mc Donalds (MCD) is $250.00").

2. **get_ceo_name(company_name: str, stock_symbol: str) -> str**
- This function returns the name of the CEO of a specified company.
- Input: A company name (e.g., Mc Donalds), which you must convert to a stock symbol (e.g., MCD).
- Output: A string containing the name of the CEO of the specified company (e.g., "The CEO of Mc Donalds is John Doe").

3. **get_company_summary(company_name: str, stock_symbol: str) -> str**
- This function returns a summary describing the business activities of a specified company.
- Input: A company name (e.g., Mc Donalds), which you must convert to a stock symbol (e.g., MCD).
- Output: A string containing a detailed summary of the specified company's business activities.

4. **answer_general_question(question: str) -> str**
- This function answers questions in general.
- Input: a user question.
- Output: your best answer to the user question.

### Instructions:

- If the user asks a question related to the price of a stock, use the `get_stock_price` function.
- If the user asks a question related to the CEO of a company, use the `get_ceo_name` function.
- If the user asks a general question about a company's activities, use the `get_company_summary` function.
- If the user asks any other question, use the `answer_general_question` function. Only return the result of the function call, not your internal reasoning.
- Only return the result of the function call.
"""

Next, we write the code for these functions. The first three use the Yahoo Finance [`yfinance`](https://pypi.org/project/yfinance) package; the last one simply invokes the model.

In [None]:
def get_stock_price(company_name, stock_symbol):
    stock = yfinance.Ticker(stock_symbol)
    price = stock.history(period="1d")["Close"].values[0]
    return (
        f"The last closing price of {company_name} ({stock_symbol}) was ${price:.2f}."
    )


def get_ceo_name(company_name, stock_symbol):
    stock = yfinance.Ticker(stock_symbol)
    info = stock.info
    # Assumes the first entry in companyOfficers is the CEO.
    officer = info["companyOfficers"][0]
    return f"The CEO of {company_name} is {officer['name']}. The full job title is {officer['title']}."


def get_company_summary(company_name, stock_symbol):
    stock = yfinance.Ticker(stock_symbol)
    summary = stock.info["longBusinessSummary"]
    return (
        f"{company_name} ({stock_symbol}) is a company that is involved in {summary}."
    )


def answer_general_question(question):
    return invoke_endpoint("You are a helpful AI assistant", question, max_tokens=2048)
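`get_ceo_name` takes the first entry of `companyOfficers`, which is not guaranteed to be the CEO for every company. A slightly more defensive lookup could scan the officers' titles instead. The sketch below is one possible approach: the helper name `find_ceo` is hypothetical, it only assumes each officer is a dict with `name` and `title` keys (as the code above does), and the sample data is made up for illustration.

```python
import re

# Matches "CEO" as a whole word or the phrase "Chief Executive", case-insensitively.
CEO_TITLE = re.compile(r"\bceo\b|chief executive", re.IGNORECASE)


def find_ceo(officers):
    """Return (name, title) of the first officer whose title mentions the CEO role, else None."""
    for officer in officers or []:
        title = officer.get("title", "")
        if CEO_TITLE.search(title):
            return officer.get("name"), title
    return None


# Made-up sample data in the shape the notebook code expects.
sample_officers = [
    {"name": "Jane Roe", "title": "Chief Financial Officer"},
    {"name": "John Doe", "title": "CEO & Director"},
]
print(find_ceo(sample_officers))
```

When no title matches, the function returns `None`, so the caller can fall back to the first entry or report that the CEO could not be determined.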

Next, we write a function invoking the agent behavior in Arcee-Agent. The response is the Python function call to perform, which we print. Then, we extract it from the model response, run it, and return the result.

In [None]:
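def run_agent(user_prompt):
    try:
        # The model replies with a Python function call as plain text.
        func = invoke_endpoint(agent_prompt, user_prompt).strip()
        print(f"Running this function call: {func}")
        # Caution: exec() runs model-generated code as-is; use it only in a trusted notebook environment.
        code = f"result = {func}"
        local_vars = {}
        exec(code, globals(), local_vars)
        ans = local_vars.get("result").strip()
        return ans
    except Exception as e:
        print(f"Error: {e}")
        return None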
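Running model output through `exec` is convenient in a demo, but it executes whatever text the model returns. As a hedged alternative, the sketch below parses the call with Python's `ast` module and dispatches only to an allowlisted set of functions with literal arguments. The helper names `parse_function_call` and `dispatch` are hypothetical, and the registry uses a stand-in function rather than the real `yfinance`-backed ones.

```python
import ast


def parse_function_call(call_text, allowed_names):
    """Parse 'name(arg, key=value)' into (name, args, kwargs) without executing it."""
    tree = ast.parse(call_text.strip(), mode="eval")  # raises SyntaxError on invalid text
    call = tree.body
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        raise ValueError("Expected a single plain function call")
    if call.func.id not in allowed_names:
        raise ValueError(f"Function not allowed: {call.func.id}")
    # literal_eval accepts only literals (strings, numbers, lists, ...), never arbitrary code.
    args = [ast.literal_eval(arg) for arg in call.args]
    kwargs = {}
    for keyword in call.keywords:
        if keyword.arg is None:
            raise ValueError("Unpacking with ** is not supported")
        kwargs[keyword.arg] = ast.literal_eval(keyword.value)
    return call.func.id, args, kwargs


def dispatch(call_text, registry):
    """Look up the parsed function name in `registry` and call it with the parsed arguments."""
    name, args, kwargs = parse_function_call(call_text, registry)
    return registry[name](*args, **kwargs)


# Stand-in registry for illustration; in the notebook it would map names to the real functions.
registry = {
    "get_stock_price": lambda company_name, stock_symbol: f"{company_name}:{stock_symbol}",
}
print(dispatch('get_stock_price("Tesla", stock_symbol="TSLA")', registry))
```

The trade-off is that the model must return a single call with literal arguments; anything else is rejected with an exception instead of being executed.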

### E. Run functions

Now, let's run some examples. Feel free to try the prompts below and use your own. You should be able to get information on any listed company.

In [None]:
def ask_question(user_prompt):
    response = run_agent(user_prompt)
    display(Markdown(response))

In [None]:
user_prompt = "Who is the CEO of Tesla?"
ask_question(user_prompt)

In [None]:
user_prompt = "What is the stock price for Deutsche Telekom?"
ask_question(user_prompt)

In [None]:
user_prompt = "What does Cummins build?"
ask_question(user_prompt)

In [None]:
user_prompt = "Who are the main competitors of Hilton?"
ask_question(user_prompt)

Now that you have successfully performed a real-time inference, you do not need the endpoint any more. You can terminate the endpoint to avoid being charged.

## 4. Clean-up

Please don't forget to run the cells below to delete all resources and avoid unnecessary charges.

### A. Delete the endpoint

In [None]:
model.sagemaker_session.delete_endpoint(endpoint_name)
model.sagemaker_session.delete_endpoint_config(endpoint_name)

### B. Delete the model

In [None]:
model.delete_model()

Thank you for trying out Arcee-Agent on SageMaker. We have only scratched the surface of what you can do with this model.

We'd be happy to hear from you, learn more about your use case, and help you build your next AI-driven solution. Please reach out to julien@arcee.ai.