# Tool Calling Use Cases

This notebook covers tool (function) calling in the Llama 3.1 and 3.2 models.

Tool calling lets the model handle more sophisticated tasks, such as retrieving real-time information, performing complex math, assisting with coding, and building agentic applications.

## Get API keys

In [7]:
from utils import get_tavily_api_key #for search
TAVILY_API_KEY = get_tavily_api_key() # personalize this

## How it works
In the prompt, the user tells Llama which tools are available to support the task.
Llama responds either with a direct answer or with a tool call: the name of the tool and the parameters to use. The application runs the tool and sends the result back to Llama, which then writes the final answer.

This round trip can also be handled automatically by the API or model provider.
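
The round trip described above can be sketched as a small loop. This is a minimal illustration, not the implementation used in this notebook: `fake_model` and `run_tool` are hypothetical stand-ins for a real Llama call and a real tool, and detecting a tool call by the `brave_search.call` prefix is an assumption based on the outputs shown later.

```python
def fake_model(messages):
    """Hypothetical stand-in for a Llama call: asks for a tool once, then answers."""
    if messages[-1]["role"] == "ipython":
        return f"Based on the search result: {messages[-1]['content']}"
    return 'brave_search.call(query="current weather in Menlo Park, California")'


def run_tool(tool_call):
    """Hypothetical tool executor; a real one would call a search API."""
    return "Clear, 6.1 C"


def answer(question, model=fake_model, max_rounds=3):
    messages = [{"role": "user", "content": question}]
    reply = ""
    for _ in range(max_rounds):
        reply = model(messages).strip()
        # Assumption: a built-in tool call starts with `brave_search.call`.
        if not reply.startswith("brave_search.call"):
            return reply  # direct answer, no tool needed
        # Feed the tool call and its output back to the model.
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "ipython", "content": run_tool(reply)})
    return reply


final_answer = answer("What is the current weather in Menlo Park, California?")
print(final_answer)
```

The `max_rounds` cap is a design choice for the sketch: it keeps the loop from running forever if the model keeps requesting tools.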

## Load helper functions

In [8]:
from utils import llama31
from utils import cprint
import json

## Define tool system prompt

In [9]:
from datetime import datetime

current_date = datetime.now()
formatted_date = current_date.strftime("%d %B %Y")
print(formatted_date)

09 March 2025


In [11]:
tool_system_prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Environment: ipython
Tools: brave_search, wolfram_alpha
Cutting Knowledge Date: December 2023
Today Date: {formatted_date}
"""

## The brave_search built-in tool

In [12]:
prompt = tool_system_prompt + f"""<|eot_id|><|start_header_id|>user<|end_header_id|>

What is the current weather in Menlo Park, California?

<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""

In [13]:
response = llama31(prompt)
print(response)



brave_search.call(query="current weather in Menlo Park, California")
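
Before the application can run the search, it has to pull the tool name and the query out of that string. A minimal sketch, assuming the model always answers in the `tool.call(query="...")` shape shown above (the helper name `parse_builtin_tool_call` is hypothetical):

```python
import re


def parse_builtin_tool_call(text):
    """Return (tool_name, query) if `text` looks like tool.call(query="..."), else None."""
    match = re.search(r'(\w+)\.call\(query="(.*)"\)', text.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


parsed = parse_builtin_tool_call(
    '\n\nbrave_search.call(query="current weather in Menlo Park, California")'
)
print(parsed)
```

Returning `None` for anything that does not match lets the caller treat the reply as a direct answer.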


In [14]:
no_tool_call_prompt = tool_system_prompt + f"""<|eot_id|><|start_header_id|>user<|end_header_id|>

What is the population of California?

<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""

In [15]:
no_tool_call_response = llama31(no_tool_call_prompt)
print(no_tool_call_response)
# The model decided this question does not need a tool (it is not real-time data)



As of my knowledge cutoff in 2023, the estimated population of California is approximately 39.5 million people. However, please note that this information may have changed since my knowledge cutoff date. For the most up-to-date information, I recommend checking with the United States Census Bureau or other reliable sources.


### Calling the search API

In [1]:
from tavily import TavilyClient
tavily_client = TavilyClient(api_key=TAVILY_API_KEY)

result = tavily_client.search("current weather in Menlo Park, California")
cprint(result)

NameError: name 'TAVILY_API_KEY' is not defined

### Reprompting Llama with search tool response

In [17]:
search_result = result["results"][0]["content"]
print(search_result)

{'location': {'name': 'Menlo Park', 'region': 'California', 'country': 'United States of America', 'lat': 37.4539, 'lon': -122.1811, 'tz_id': 'America/Los_Angeles', 'localtime_epoch': 1741527030, 'localtime': '2025-03-09 06:30'}, 'current': {'last_updated_epoch': 1741527000, 'last_updated': '2025-03-09 06:30', 'temp_c': 6.1, 'temp_f': 43.0, 'is_day': 0, 'condition': {'text': 'Clear', 'icon': '//cdn.weatherapi.com/weather/64x64/night/113.png', 'code': 1000}, 'wind_mph': 3.4, 'wind_kph': 5.4, 'wind_degree': 147, 'wind_dir': 'SSE', 'pressure_mb': 1022.0, 'pressure_in': 30.18, 'precip_mm': 0.0, 'precip_in': 0.0, 'humidity': 75, 'cloud': 0, 'feelslike_c': 5.2, 'feelslike_f': 41.3, 'windchill_c': 7.0, 'windchill_f': 44.6, 'heatindex_c': 7.3, 'heatindex_f': 45.1, 'dewpoint_c': 6.6, 'dewpoint_f': 43.9, 'vis_km': 16.0, 'vis_miles': 9.0, 'uv': 0.0, 'gust_mph': 7.0, 'gust_kph': 11.3}}
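
The content above happens to be a string that looks like a Python dictionary literal. If only a few fields are needed, it could be parsed before being sent back to the model. This is a sketch under that assumption: search results are often free text, in which case `ast.literal_eval` raises an error and the raw string should be used instead. The sample below is a shortened copy of the output above.

```python
import ast

# Shortened copy of the search result content shown above.
search_result_sample = (
    "{'location': {'name': 'Menlo Park', 'region': 'California'}, "
    "'current': {'temp_c': 6.1, 'temp_f': 43.0, "
    "'condition': {'text': 'Clear'}, 'humidity': 75}}"
)

# literal_eval only accepts Python literals, so it does not execute code.
weather = ast.literal_eval(search_result_sample)
summary = (
    f"{weather['location']['name']}: {weather['current']['condition']['text']}, "
    f"{weather['current']['temp_c']} C, humidity {weather['current']['humidity']}%"
)
print(summary)
```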


In [18]:
prompt = tool_system_prompt + f"""<|eot_id|><|start_header_id|>user<|end_header_id|>
What is the current weather in Menlo Park, California?

<|eot_id|><|start_header_id|>assistant<|end_header_id|>

<|python_tag|>{response}<|eom_id|> 
<|start_header_id|>ipython<|end_header_id|>

{search_result}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""
print(prompt)

# `response` holds the tool call (brave_search) that the model issued for the user's question

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Environment: ipython
Tools: brave_search, wolfram_alpha
Cutting Knowledge Date: December 2023
Today Date: 09 March 2025
<|eot_id|><|start_header_id|>user<|end_header_id|>
What is the current weather in Menlo Park, California?

<|eot_id|><|start_header_id|>assistant<|end_header_id|>

<|python_tag|>

brave_search.call(query="current weather in Menlo Park, California")<|eom_id|>
<|start_header_id|>ipython<|end_header_id|>

{'location': {'name': 'Menlo Park', 'region': 'California', 'country': 'United States of America', 'lat': 37.4539, 'lon': -122.1811, 'tz_id': 'America/Los_Angeles', 'localtime_epoch': 1741527030, 'localtime': '2025-03-09 06:30'}, 'current': {'last_updated_epoch': 1741527000, 'last_updated': '2025-03-09 06:30', 'temp_c': 6.1, 'temp_f': 43.0, 'is_day': 0, 'condition': {'text': 'Clear', 'icon': '//cdn.weatherapi.com/weather/64x64/night/113.png', 'code': 1000}, 'wind_mph': 3.4, 'wind_kph': 5.4, 'wind_degree': 147,

In [19]:
response = llama31(prompt)
print(response)

The current weather in Menlo Park, California is clear with a temperature of 6.1°C (43.0°F) and a wind speed of 5.4 km/h (3.4 mph) from the southeast. The humidity is 75% and the pressure is 1022 mb (30.18 in). There is no precipitation and the visibility is 16 km (9.9 miles).


## Using the higher-level messages format

In [20]:
system_prompt_content = f"""
Environment: ipython
Tools: brave_search, wolfram_alpha
Cutting Knowledge Date: December 2023
Today Date: {formatted_date}
"""

In [21]:
messages = [
    {"role": "system", "content":  system_prompt_content},
    {"role": "user",   "content": "What is the current weather in Menlo Park, California?"}
  ]

In [22]:
response = llama31(messages)
print(response)

brave_search.call(query="current weather in Menlo Park, California")


In [23]:
messages = [
    {"role": "system",     "content":  system_prompt_content},
    {"role": "user",       "content": "What is the current weather in Menlo Park, California?"},
    {"role": "assistant",  "content": response},
    {"role": "ipython",    "content": search_result}
  ]

In [24]:
response = llama31(messages)
print(response)

The current weather in Menlo Park, California is clear with a temperature of 6.1°C (43.0°F) and a wind speed of 3.4 mph (5.4 km/h) from the southeast. The humidity is 75% and the pressure is 1022 mb (30.18 in). There is no precipitation and the visibility is 16 km (9.9 miles). The UV index is 0.0 and the wind gust is 7.0 mph (11.3 km/h).
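
The message list carries the same information as the raw prompt built by hand earlier. How `llama31` converts messages internally is not shown in this notebook, so the following is only a sketch of one possible mapping, modeled on the hand-written prompt: an assistant turn that is followed by an `ipython` turn is treated as a tool call and gets `<|python_tag|>` and `<|eom_id|>`; every other turn ends with `<|eot_id|>`. The function name `messages_to_prompt` is hypothetical.

```python
def messages_to_prompt(messages):
    """Render chat messages into a Llama 3.1-style raw prompt (illustrative sketch)."""
    parts = ["<|begin_of_text|>"]
    for i, msg in enumerate(messages):
        next_is_tool_output = (
            i + 1 < len(messages) and messages[i + 1]["role"] == "ipython"
        )
        header = f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
        content = msg["content"].strip()
        if msg["role"] == "assistant" and next_is_tool_output:
            # Tool call: tagged, and closed with <|eom_id|> because more turns follow.
            parts.append(f"{header}<|python_tag|>{content}<|eom_id|>")
        else:
            parts.append(f"{header}{content}<|eot_id|>")
    # Leave an open assistant header so the model writes the next turn.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


example_prompt = messages_to_prompt([
    {"role": "system", "content": "Environment: ipython\nTools: brave_search"},
    {"role": "user", "content": "What is the current weather in Menlo Park?"},
    {"role": "assistant", "content": 'brave_search.call(query="weather Menlo Park")'},
    {"role": "ipython", "content": "Clear, 6.1 C"},
])
print(example_prompt)
```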


## The Wolfram Alpha tool

In [25]:
math_problem = "Can you help me solve this equation: x^3 - 2x^2 - x + 2 = 0?"

In [26]:
messages = [
    {"role": "system",  "content": system_prompt_content},
    {"role": "user",    "content": math_problem}
]

In [27]:
response = llama31(messages)
print(response)

import sympy as sp

# Define the variable
x = sp.symbols('x')

# Define the equation
equation = x**3 - 2*x**2 - x + 2

# Solve the equation
solutions = sp.solve(equation, x)

print(solutions)


### Calling the Wolfram Alpha tool

In [28]:
from utils import wolfram_alpha
tool_result = wolfram_alpha("solve x^3 - 2x^2 - x + 2 = 0")
print(tool_result)

x = ± 1
x = 2


### Checking the result

In [29]:
from sympy import symbols, Eq, solve

x = symbols('x')                           # Define the variable
equation = Eq(x**3 - 2*x**2 - 1*x + 2, 0) # Define the equation
solution = solve(equation, x)              # Solve the equation

print(solution)

[-1, 1, 2]
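
The roots can also be checked without any library by substituting them back into the polynomial. This works because x^3 - 2x^2 - x + 2 factors as (x - 1)(x + 1)(x - 2). A minimal sketch:

```python
def f(x):
    """Left-hand side of x^3 - 2x^2 - x + 2 = 0."""
    return x**3 - 2 * x**2 - x + 2


roots = [-1, 1, 2]
residuals = {r: f(r) for r in roots}
print(residuals)  # every residual should be 0
```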


### Reprompting Llama with Wolfram Alpha tool response

In [30]:
messages = [
    {"role": "system", "content": system_prompt_content},
    {"role": "user",      "content": math_problem},
    {"role": "assistant", "content": response},
    {"role": "ipython",   "content": tool_result}
]
# In this case the assistant response is not a tool call but the Python code the model generated

In [31]:
response = llama31(messages)
print(response)

The solutions to the equation x^3 - 2x^2 - x + 2 = 0 are x = 1, x = -1, and x = 2.


## The code interpreter built-in tool

In [32]:
loan_question = (
    "How much is the monthly payment, total payment, "
    "and total interest paid for a 30 year mortgage of $1M "
    "at a fixed rate of 6% with a 20% down payment?"
)

In [33]:
messages = [
    {"role": "system",     "content": system_prompt_content},
    {"role": "user",       "content": loan_question},
  ]

In [34]:
response = llama31(messages)
print(response)

import math

# Define the variables
loan_amount = 1000000  # $1M
interest_rate = 0.06  # 6%
loan_term = 30  # 30 years
down_payment = 200000  # 20% of $1M

# Calculate the loan amount after down payment
loan_amount_after_down_payment = loan_amount - down_payment

# Calculate the monthly interest rate
monthly_interest_rate = interest_rate / 12

# Calculate the number of payments
number_of_payments = loan_term * 12

# Calculate the monthly payment
monthly_payment = loan_amount_after_down_payment * (monthly_interest_rate * (1 + monthly_interest_rate) ** number_of_payments) / ((1 + monthly_interest_rate) ** number_of_payments - 1)

# Calculate the total payment
total_payment = monthly_payment * number_of_payments

# Calculate the total interest paid
total_interest_paid = total_payment - loan_amount_after_down_payment

print(f"Monthly payment: ${math.ceil(monthly_payment)}")
print(f"Total payment: ${math.ceil(total_payment)}")
print(f"Total interest paid: ${math.ceil(total_interest_paid)}")

### Use Case: Calculate loan payments

In [35]:
from utils import calculate_loan
monthly_payment, total_payment, total_interest_paid = calculate_loan(
    loan_amount = 1000000, 
    annual_interest_rate = 0.06, 
    loan_term = 30, 
    down_payment = 200000)

print(f"Monthly payment: ${(monthly_payment)}")
print(f"Total payment: ${(total_payment)}")
print(f"Total interest paid: ${(total_interest_paid)}")

Monthly payment: $4796
Total payment: $1726705
Total interest paid: $926705


## Generating the code in Java

In [36]:
messages = [
    {"role":    "system",     
     "content": system_prompt_content + "\nGenerate the code in Java."},
    {"role":    "user",       "content": loan_question},
  ]

In [37]:
response = llama31(messages)
print(response)

import java.util.Scanner;

public class MortgageCalculator {
    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);

        // Input values
        double principal = 1000000; // $1M
        double downPayment = 0.2 * principal; // 20% of $1M
        double loanAmount = principal - downPayment; // $800,000
        double annualInterestRate = 0.06; // 6%
        int years = 30;

        // Calculate monthly interest rate
        double monthlyInterestRate = annualInterestRate / 12;

        // Calculate number of payments
        int numberOfPayments = years * 12;

        // Calculate monthly payment
        double monthlyPayment = loanAmount * monthlyInterestRate * Math.pow(1 + monthlyInterestRate, numberOfPayments) / (Math.pow(1 + monthlyInterestRate, numberOfPayments) - 1);

        // Calculate total payment
        double totalPayment = monthlyPayment * numberOfPayments;

        // Calculate total interest paid
        double totalInterest

### Reprompting Llama with Code Interpreter tool response

In [38]:
code_interpreter_tool_response="""
Monthly payment: $4796
Total payment: $1726705
Total interest paid: $926705
"""

In [39]:
messages = [
    {"role": "system", "content":  system_prompt_content},
    {"role": "user",      "content": loan_question},
    {"role": "assistant", "content": response},
    {"role": "ipython",   "content": code_interpreter_tool_response}
  ]
# The assistant turn carries the generated code and the ipython turn carries its output;
# the model treats that output as the result of running the code and summarizes it

In [40]:
response = llama31(messages)
print(response)

The monthly payment for a 30-year mortgage of $1M at a fixed rate of 6% with a 20% down payment is $4,796. The total payment over the life of the loan is $1,726,705. The total interest paid over the life of the loan is $926,705.
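
In this section the tool response was pasted in by hand. A sketch of how an application might produce it instead is to run the generated Python and capture what it prints. The helper `run_python` is hypothetical, and `exec` provides no sandboxing, so this should only ever be used on code you trust or inside an isolated environment.

```python
import contextlib
import io


def run_python(code):
    """Execute `code` and return whatever it printed. No sandboxing!"""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # Fresh globals keep the snippet from touching our own variables,
        # but this is NOT a security boundary.
        exec(code, {"__name__": "__main__"})
    return buffer.getvalue()


# Stand-in for code returned by the model.
generated_code = "total = sum(range(1, 11))\nprint(f'Total: {total}')"
tool_output = run_python(generated_code)
print(tool_output)
```

The captured string would then be passed as the content of the `ipython` message.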


## Llama 3.1 custom tool calling

In [41]:
from utils import trending_songs, get_boiling_point

country = "US"
top_num = 5
top_songs = trending_songs(country, top_num)
print(f"Top {top_num} trending songs in {country}:")
print(top_songs)

Top 5 trending songs in US:
['Blinding Lights - The Weeknd', 'Levitating - Dua Lipa', 'Peaches - Justin Bieber', 'Save Your Tears - The Weeknd', 'Good 4 U - Olivia Rodrigo']


## Prompt Llama 3.1 for custom tool call

In [42]:
user_prompt = """
Answer the user's question by using the following functions if needed.
If none of the functions can be used, please say so.
Functions (in JSON format):
{
    "type": "function", "function": {
        "name": "get_boiling_point",
        "description": "Get the boiling point of a liquid",
        "parameters": {
            "type": "object", "properties": [
                {"liquid_name": {"type": "object", "description": "name of the liquid"}},
                {"celsius": {"type": "object", "description": "whether to use celsius"}}
            ], "required": ["liquid_name"]
        }
    }
}
{
    "type": "function", "function": {
        "name": "trending_songs",
        "description": "Returns the trending songs on a Music site",
        "parameters": {
            "type": "object", "properties": [
                {"country": {"type": "object", "description": "country to return trending songs for"}},
                {"n": {"type": "object", "description": "The number of songs to return"}}
            ], "required": ["country"]
        }
    }
}

Question: Can you check the top 5 trending songs in US?
"""

In [43]:
messages = [
    {
      "role": "system", "content":  f"""
Environment: ipython
Cutting Knowledge Date: December 2023
Today Date: {formatted_date}
"""},
    {"role": "user", "content": user_prompt}
  ]

In [44]:
result = llama31(messages,405)
print(result)

{"type": "function", "name": "trending_songs", "parameters": {"country": "US", "n": "5"}}


### Calling the custom tool

In [45]:
custom_tools = {"trending_songs": trending_songs,
                "get_boiling_point": get_boiling_point}

In [46]:
res = json.loads(result)
function_name = res['name']
parameters = list(res['parameters'].values())
function_name, parameters

('trending_songs', ['US', '5'])

In [47]:
tool_result = custom_tools[function_name](*parameters)
# * unpacks the list so that each element is passed as a separate positional argument
tool_result

['Blinding Lights - The Weeknd',
 'Levitating - Dua Lipa',
 'Peaches - Justin Bieber',
 'Save Your Tears - The Weeknd',
 'Good 4 U - Olivia Rodrigo']
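
Passing the parameter values positionally relies on the model listing them in the same order as the function signature, and the model returned `n` as the string `"5"`. A slightly more defensive dispatch is sketched below. It assumes the tool accepts keyword arguments named as in the JSON schema; `trending_songs_stub` is a hypothetical stand-in, since the real signature of `utils.trending_songs` is not shown here.

```python
import json


def trending_songs_stub(country="US", n=5):
    """Hypothetical stand-in for utils.trending_songs."""
    songs = ["Song A", "Song B", "Song C", "Song D", "Song E", "Song F"]
    return songs[:n]


TOOLS = {"trending_songs": trending_songs_stub}


def dispatch(raw_call, tools):
    """Parse the model's JSON tool call and invoke the matching function."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in tools:
        raise ValueError(f"Unknown tool: {name}")
    params = dict(call.get("parameters", {}))
    # The model may return numbers as strings; coerce before calling.
    if "n" in params:
        params["n"] = int(params["n"])
    return tools[name](**params)


raw = '{"type": "function", "name": "trending_songs", "parameters": {"country": "US", "n": "5"}}'
songs = dispatch(raw, TOOLS)
print(songs)
```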

### Reprompting Llama with custom tool call result

In [48]:
messages = [
    {
      "role": "system", "content":  f"""
Environment: ipython
Cutting Knowledge Date: December 2023
Today Date: {formatted_date}
"""},
    {"role": "user", "content": user_prompt},
    {"role": "assistant", "content": result},
    {"role": "ipython", "content": ','.join(tool_result)}
  ]
# The result of a custom tool is also sent back in the ipython role

In [49]:
response = llama31(messages, 70)
print(response)

The top 5 trending songs in the US are:

1. Blinding Lights - The Weeknd
2. Levitating - Dua Lipa
3. Peaches - Justin Bieber
4. Save Your Tears - The Weeknd
5. Good 4 U - Olivia Rodrigo


In [55]:
tool_system_prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Environment: ipython
Tools: brave_search, wolfram_alpha
Cutting Knowledge Date: December 2023
Today Date: {formatted_date}
"""

stats_question = "Look into the web for guidelines to start a statistical analysis. Like descriptive how to understand the type of variables and what distributions they are following"

messages = [
    {"role":    "system",     
     "content": system_prompt_content + "\nGenerate a bullet-point guideline"},
    {"role":    "user",       "content": stats_question},
  ]
# Should this task be stated explicitly in the system turn?
response_stats = llama31(messages)
print(response_stats)

Here are some guidelines to start a statistical analysis:

**I. Understanding the Data**

* **Explore the data**: Look at the distribution of the variables, check for missing values, and identify any outliers.
* **Understand the type of variables**: Identify the type of variables you are working with (e.g., continuous, categorical, ordinal).
* **Check for normality**: Use plots (e.g., Q-Q plot, histogram) and statistical tests (e.g., Shapiro-Wilk test) to check if the data follows a normal distribution.

**II. Descriptive Statistics**

* **Calculate summary statistics**: Calculate mean, median, mode, standard deviation, and variance for continuous variables.
* **Calculate frequency tables**: Calculate frequency tables for categorical variables.
* **Visualize the data**: Use plots (e.g., bar chart, box plot) to visualize the data.

**III. Understanding Variable Relationships**

* **Check for correlations**: Use correlation coefficients (e.g., Pearson's r, Spearman's rho) to check for relationships between variables.
* **Check for interactions**: Use interaction terms in regression models to check for interactions between variables.

**IV. Data Transformation**

* **Check for skewness**: Use plots (e.g., Q-Q plot, histogram) and statistical tests (e.g., Shapiro-Wilk test) to check for skewness.
* **Transform the data**: Use transformations (e.g., log transformation, square root transformation) to normalize the data.

**V. Choosing the Right Statistical Test**

* **Determine the research question**: Determine the research question and the type of analysis needed.
* **Choose the right statistical test**: Choose the right statistical test based on the type of data and the research question.

**VI. Interpreting Results**

* **Interpret the results**: Interpret the results of the statistical analysis in the context of the research question.
* **Check for assumptions**: Check for assumptions of the statistical test (e.g., normality, independence).

Here are some resources that provide more detailed guidelines:

* "Statistical Analysis" by UCLA Academic Technology Services
* "Data Analysis" by Coursera
* "Statistical Inference" by Stanford University
* "Data Science" by DataCamp

Note: These guidelines are general and may need to be adapted to specific research questions and data types.

Here is a bulleted list of key points to remember:

* Understand the type of variables and their distributions
* Check for normality and skewness
* Calculate summary statistics and frequency tables
* Visualize the data
* Check for correlations and interactions
* Choose the right statistical test
* Interpret the results in the context of the research question
* Check for assumptions of the statistical test

In [57]:
messages = [
    {"role": "system",     
     "content": system_prompt_content + "\nGenerate the text in Markdown format"},
    {"role": "user", "content": stats_question},
    {"role": "assistant", "content": response_stats},
    {"role": "user", "content": "Generate a complex example (not typical distributions) to apply this in. Also, create documentation for all the concepts used. Provide me either with a synthetic .csv file made by your own or a real-world dataset from the Internet" }
  ]
response = llama31(messages, 405)
print(response)


**Example:**

Suppose we are interested in understanding the relationship between the concentration of various pollutants in the air and the rate of respiratory diseases in a given population. We have collected data on the concentration of particulate matter (PM2.5), nitrogen dioxide (NO2), ozone (O3), and carbon monoxide (CO) in the air, as well as the rate of respiratory diseases (RD) in the population.

The data is collected from various monitoring stations across the city, and we want to understand how the concentration of these pollutants affects the rate of respiratory diseases.

**Dataset:**

I will generate a synthetic dataset for this example. The dataset will consist of 1000 observations, each representing a monitoring station. The variables in the dataset are:

* PM2.5: concentration of particulate matter (μg/m3)
* NO2: concentration of nitrogen dioxide (ppb)
* O3: concentration of ozone (ppb)
* CO: concentration of carbon monoxide (ppm)
* RD: rate of respiratory diseases (per 100,000 people)

The dataset will be generated using a combination of normal, lognormal, and Weibull distributions to simulate real-world data.

**Synthetic Dataset (.csv file):**

Here is a sample of the dataset:

| PM2.5 | NO2 | O3 | CO | RD |
| --- | --- | --- | --- | --- |
| 10.2 | 20.5 | 30.1 | 5.6 | 120.1 |
| 12.1 | 22.3 | 31.4 | 6.2 | 130.5 |
| 11.5 | 21.9 | 30.7 | 5.9 | 125.3 |
| ... | ... | ... | ... | ... |

**Documentation:**

Here is a brief explanation of the concepts used in this example:

* **Particulate Matter (PM2.5)**: PM2.5 refers to particles that are 2.5 micrometers or smaller in diameter. These particles can come from a variety of sources, including vehicle exhaust, industrial activities, and natural sources like dust and pollen.
* **Nitrogen Dioxide (NO2)**: NO2 is a gas that is produced by the burning of fossil fuels, such as gasoline and diesel fuel. It can also come from industrial processes and natural sources like lightning.
* **Ozone (O3)**: O3 is a gas that is produced when nitrogen oxides and volatile organic compounds react in the presence of sunlight. It can also come from industrial processes and natural sources like lightning.
* **Carbon Monoxide (CO)**: CO is a gas that is produced by the incomplete burning of fossil fuels, such as gasoline and diesel fuel. It can also come from industrial processes and natural sources like wildfires.
* **Rate of Respiratory Diseases (RD)**: RD refers to the number of cases of respiratory diseases, such as asthma and chronic obstructive pulmonary disease (COPD), per 100,000 people.

**Statistical Analysis:**

To understand the relationship between the concentration of pollutants and the rate of respiratory diseases, we can perform a multiple linear regression analysis. This will allow us to control for the effects of multiple pollutants on the rate of respiratory diseases.

We can also perform a principal component analysis (PCA) to reduce the dimensionality of the data and identify patterns in the relationships between the pollutants.

**Code:**

Here is some sample code in Python to perform the analysis:
```python
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

# Load the dataset
df = pd.read_csv('pollutants.csv')

# Perform PCA to reduce dimensionality
pca = PCA(n_components=2)
pca_df = pca.fit_transform(df[['PM2.5', 'NO2', 'O3', 'CO']])

# Perform multiple linear regression
X = pca_df
y = df['RD']
model = LinearRegression()
model.fit(X, y)

# Print the coefficients
print(model.coef_)
```
This code performs PCA to reduce the dimensionality of the data and then performs multiple linear regression to understand the relationship between the concentration of pollutants and the rate of respiratory diseases.

Note: This is just a sample code and may need to be modified to suit the specific needs of the analysis.

**Real-World Dataset:**

If you prefer to use a real-world dataset, here are a few options:

* Air Quality Index (AQI) dataset from the United States Environmental Protection Agency (EPA)
* Air pollution dataset from the World Health Organization (WHO)
* Respiratory disease dataset from the Centers for Disease Control and Prevention (CDC)

Please note that these datasets may require additional processing and cleaning before they can be used for analysis.
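
The model described a synthetic dataset but only printed a three-row sample, not a file. A sketch of how such a CSV could be generated with the standard library follows. All distribution parameters and the linear formula for `RD` are invented for illustration and are not taken from the model's answer or from real air-quality data.

```python
import csv
import io
import random


def make_pollutant_rows(n_rows=1000, seed=0):
    """Draw synthetic pollutant readings; all parameters are made up."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        pm25 = rng.lognormvariate(2.3, 0.4)  # right-skewed, strictly positive
        no2 = rng.weibullvariate(25.0, 2.0)  # scale=25, shape=2
        o3 = rng.gauss(30.0, 5.0)            # roughly symmetric
        co = rng.lognormvariate(1.7, 0.3)
        # Invented linear relationship plus noise, purely for illustration.
        rd = 80.0 + 2.0 * pm25 + 0.8 * no2 + 0.3 * o3 + 1.5 * co + rng.gauss(0.0, 5.0)
        rows.append([round(v, 2) for v in (pm25, no2, o3, co, rd)])
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text with the column names used above."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["PM2.5", "NO2", "O3", "CO", "RD"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = make_pollutant_rows()
csv_text = rows_to_csv(rows)
print(csv_text.splitlines()[0])
```

Writing `csv_text` to a file named `pollutants.csv` would make it loadable by the `pd.read_csv('pollutants.csv')` call in the generated analysis code.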

In [53]:
# This is where I adapt the tool-calling capability to a real-world scenario of my own interest
# WIP: extract the information from a single book (copyright-free) = (https://hastie.su.domains/ISLP/ISLP_website.pdf.download.html)
# WIP: customize tool calling to search the LinkedIn posts of a well-known statistician who shares valuable insights on statistical methods

# Lab 7: Llama Stack

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
from dotenv import load_dotenv
import os
_ = load_dotenv() #loads 'TOGETHER_API_KEY'

In [3]:
#!pip install llama-stack==0.1.0 llama-stack-client==0.1.0

In [4]:
!llama stack build --list-templates

+------------------------------+-----------------------------------------------------------------------------+
| Template Name                | Description                                                                 |
+------------------------------+-----------------------------------------------------------------------------+
| bedrock                      | Use AWS Bedrock for running LLM inference and safety                        |
+------------------------------+-----------------------------------------------------------------------------+
| cerebras                     | Use Cerebras for running LLM inference                                      |
+------------------------------+-----------------------------------------------------------------------------+
| experimental-post-training   | Experimental template for post training                                     |
+------------------------------+-----------------------------------------------------------

In [5]:
!llama stack list-apis

+-------------------+
| API               |
+-------------------+
| inference         |
+-------------------+
| safety            |
+-------------------+
| agents            |
+-------------------+
| vector_io         |
+-------------------+
| datasetio         |
+-------------------+
| scoring           |
+-------------------+
| eval              |
+-------------------+
| post_training     |
+-------------------+
| tool_runtime      |
+-------------------+
| telemetry         |
+-------------------+
| models            |
+-------------------+
| shields           |
+-------------------+
| vector_dbs        |
+-------------------+
| datasets          |
+-------------------+
| scoring_functions |
+-------------------+
| eval_tasks        |
+-------------------+
| tool_groups       |
+-------------------+
| inspect           |
+-------------------+


In [6]:
!llama-stack-client configure --endpoint https://llama-stack.together.ai

Done! You can now use the Llama Stack Client CLI with endpoint https://llama-stack.together.ai


In [7]:
!llama-stack-client models list

┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ identifier    ┃ provider_id    ┃ provider_res… ┃ metadata       ┃ model_type ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ meta-llama/L… │ together       │ meta-llama/M… │ {}             │ llm        │
│ meta-llama/L… │ together       │ meta-llama/M… │ {}             │ llm        │
│ meta-llama/L… │ together       │ meta-llama/M… │ {}             │ llm        │
│ meta-llama/L… │ together       │ meta-llama/L… │ {}             │ llm        │
│ meta-llama/L… │ together       │ meta-llama/L… │ {}             │ llm        │
│ meta-llama/L… │ together       │ meta-llama/L… │ {}             │ llm        │
│ meta-llama/L… │ together       │ meta-llama/L… │ {}             │ llm        │
│ meta-llama/L… │ together       │ meta-llama/M… │ {}       

In [8]:
import os
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="https://llama-stack.together.ai")

models = client.models.list()
print(models)

[Model(identifier='meta-llama/Llama-3.1-8B-Instruct', metadata={}, api_model_type='llm', provider_id='together', provider_resource_id='meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo', type='model', model_type='llm'), Model(identifier='meta-llama/Llama-3.1-70B-Instruct', metadata={}, api_model_type='llm', provider_id='together', provider_resource_id='meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo', type='model', model_type='llm'), Model(identifier='meta-llama/Llama-3.1-405B-Instruct-FP8', metadata={}, api_model_type='llm', provider_id='together', provider_resource_id='meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo', type='model', model_type='llm'), Model(identifier='meta-llama/Llama-3.2-3B-Instruct', metadata={}, api_model_type='llm', provider_id='together', provider_resource_id='meta-llama/Llama-3.2-3B-Instruct-Turbo', type='model', model_type='llm'), Model(identifier='meta-llama/Llama-3.2-11B-Vision-Instruct', metadata={}, api_model_type='llm', provider_id='together', provider_resource_id='

# Llama Stack Inference

In [None]:
LLAMA_STACK_API_TOGETHER_URL="https://llama-stack.together.ai"
LLAMA31_8B_INSTRUCT = "meta-llama/Llama-3.1-8B-Instruct"

from llama_stack_client import LlamaStackClient
import json

def run_main():
    client = LlamaStackClient(
        base_url=LLAMA_STACK_API_TOGETHER_URL,
    )

    response = client.inference.chat_completion(
        model_id=LLAMA31_8B_INSTRUCT,
        messages=[
            {"role": "system", "content": "Who wrote the book Innovator's Dilemma? How about Charlotte's Web?"},
            {"role": "user", "content": "which book was published first?"}            
        ],
        x_llama_stack_provider_data=json.dumps({"together_api_key": os.getenv('TOGETHER_API_KEY')})
    )

    print(response.completion_message.content)
    
run_main()

# Llama Stack Agent

In [None]:
import os
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger
from llama_stack_client.types.agent_create_params import AgentConfig

In [None]:
async def run_main():
    client = LlamaStackClient(
        base_url=LLAMA_STACK_API_TOGETHER_URL,
    )    
    
    agent_config = AgentConfig(
        model=LLAMA31_8B_INSTRUCT,
        instructions="You are a helpful assistant",
        enable_session_persistence=False,
    )

    agent = Agent(client, agent_config)
    session_id = agent.create_session("test-session")

    prompts = [
        "Who wrote the book Charlotte's Web?",
        "Three best quotes?",
    ]

    for prompt in prompts:
        print(f"User> {prompt}")
        response = agent.create_turn(
            messages=[
                {
                    "role": "user",
                    "content": prompt,
                }
            ],
            session_id=session_id,
        )

        for log in EventLogger().log(response):
            log.print()
        
await run_main()

# Llama Stack with Llama 3.2 vision model

In [None]:
from PIL import Image
import matplotlib.pyplot as plt

def display_image(path):
    img = Image.open(path)
    plt.imshow(img)
    plt.axis('off')
    plt.show()

display_image("./content/Llama_Repo.jpeg")

In [None]:
LLAMA32_11B_INSTRUCT = "meta-llama/Llama-3.2-11B-Vision-Instruct"

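import base64
import mimetypes

def encode_image(image_path):
    # Guess the MIME type from the file extension (e.g. image/jpeg for .jpeg);
    # fall back to image/png when the extension is not recognized.
    mime_type = mimetypes.guess_type(image_path)[0] or "image/png"
    with open(image_path, "rb") as image_file:
        base64_string = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{base64_string}"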

async def run_main(image_path, prompt):
    
    client = LlamaStackClient(
        base_url=LLAMA_STACK_API_TOGETHER_URL,
    )    
    
    agent_config = AgentConfig(
        model=LLAMA32_11B_INSTRUCT,
        instructions="You are a helpful assistant",
        enable_session_persistence=False,
    )

    agent = Agent(client, agent_config)
    session_id = agent.create_session("test-session")

    response = agent.create_turn(
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "image": {
                         "url": {
                              "uri": encode_image(image_path)
                         }
                    }
                },
                {
                    "type": "text",
                    "text": prompt,
                }
            ]
        }],
        session_id=session_id,
    )

    for log in EventLogger().log(response):
        log.print()

In [None]:
await run_main("./content/Llama_Repo.jpeg",
         "How many different colors are those llamas? What are those colors?")

In [None]:
import mimetypes
from termcolor import cprint
from llama_stack_client.lib.inference.event_logger import EventLogger

async def run_main(image_path: str, prompt):
    client = LlamaStackClient(
        base_url=LLAMA_STACK_API_TOGETHER_URL,
    )    

    message = {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": {
                     "url": {
                          "uri": encode_image(image_path)
                     }
                }
            },
            {
                "type": "text",
                "text": prompt,
            }
        ]       
    }

    cprint("User> Sending image for analysis...", "green")
    response = client.inference.chat_completion(
        messages=[message],
        model_id=LLAMA32_11B_INSTRUCT,
        stream=False,
    )

    print(response.completion_message.content.lower().strip())

In [None]:
await run_main("./content/Llama_Repo.jpeg",
     "How many different colors are those llamas? "
     "What are those colors?")

The await keyword is used within an asynchronous function to pause its execution until a specified coroutine has completed. 

This essential concept in asynchronous programming allows Python to yield control back to the event loop, enabling other asynchronous tasks to run while the awaited coroutine is in progress. With await, we can effectively handle multiple I/O-bound (Input-Output) operations concurrently, leading to more responsive and efficient code.
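A minimal sketch of this behavior, assuming two simulated I/O calls. `fake_io` is a hypothetical helper that uses `asyncio.sleep` as a stand-in for a network request; it is not part of Llama Stack. While the slow call is waiting, the event loop runs the fast one, so the fast call finishes first even though it was started second.

```python
import asyncio

async def fake_io(name: str, delay: float, log: list) -> str:
    """Hypothetical stand-in for a network call; asyncio.sleep simulates the wait."""
    log.append(f"{name} start")
    await asyncio.sleep(delay)  # control returns to the event loop here
    log.append(f"{name} done")
    return name

async def main():
    log = []
    # gather() schedules both coroutines and returns results in argument order
    results = await asyncio.gather(
        fake_io("slow", 0.2, log),
        fake_io("fast", 0.1, log),
    )
    return results, log

# As a script. Inside a notebook cell, use `results, log = await main()` instead,
# because the notebook already runs an event loop.
results, log = asyncio.run(main())
print(results)  # ['slow', 'fast']
print(log)      # ['slow start', 'fast start', 'fast done', 'slow done']
```

The log shows the two calls interleaving: both start before either finishes, which is what makes the total wait close to the longest single delay rather than the sum.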

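A coroutine is a function that can pause its execution when it reaches an operation that may take a while to complete, and resume later where it left off.
Defining a function with async def makes it a coroutine function; calling it returns a coroutine object, which runs only when it is awaited or scheduled on the event loop.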
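As a small illustration of the distinction between defining and running a coroutine, the sketch below uses a hypothetical `greet` function (not from the notebook). Calling it does not execute the body; it only produces a coroutine object, which then has to be awaited or handed to the event loop.

```python
import asyncio
import inspect

async def greet(name: str) -> str:
    # `async def` marks this as a coroutine function
    await asyncio.sleep(0)  # a pause point: yields to the event loop once
    return f"Hello, {name}"

coro = greet("Llama")                 # the body has not run yet
is_coro = inspect.iscoroutine(coro)   # True: we hold a coroutine object

# As a script. Inside a notebook cell, use `result = await coro` instead.
result = asyncio.run(coro)
print(is_coro, result)  # True Hello, Llama
```

This is why the cells above end with `await run_main(...)`: calling `run_main(...)` without `await` would only create the coroutine object and never run it.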