In [None]:
%reload_ext autoreload
%autoreload 2

# Agent-Based Task Execution

This notebook demonstrates how to use the `Agent` pipeline from **[OnPrem.LLM](https://github.com/amaiya/onprem)** to create autonomous agents that can execute complex tasks using a variety of tools.

## Setup the LLM and Agent

First, we'll set up the LLM instance that will power our agent. We'll start with **GPT-4o-mini**, since cloud LLMs tend to handle agentic workflows faster (and sometimes better), but you can also try any number of smaller local models. (The final section of this notebook repeats one of the examples with a local model.)

In [None]:
# | notest

from onprem import LLM
from onprem.pipelines import Agent
llm = LLM('openai/gpt-4o-mini', mute_stream=True) 
agent = Agent(llm)

## Giving the Agent Access to Tools

Next, we will give the agent access to tools it can use when executing a given task. Examples of tool types include the ability to:
1. perform a web search
2. visit a web page
3. search your documents stored in a vector store (i.e., agentic RAG)
4. access a Python interpreter
5. access an MCP server
6. execute a custom function that you provide (i.e., implement your own custom tools)

### Example: Using a Custom Tool with Web Search

In this example, we give the agent access to web search and to a function that returns today's date, so that it can find historical events that occurred on the current day.

In [None]:
# | notest

def today() -> str:
    """
    Gets the current date and time

    Returns:
        current date and time
    """
    from datetime import datetime
    return datetime.today().isoformat()

agent.add_function_tool(today)
agent.add_websearch_tool()
for tup in agent.tools.items():
    print(tup)

('today', <smolagents.tools.tool.<locals>.SimpleTool object>)
('websearch', <smolagents.default_tools.WebSearchTool object>)
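As the output above shows, `add_function_tool` wraps a plain Python function as a smolagents tool. Tools are presented to the LLM through a name, a description, and typed inputs and outputs, which is why `today` carries a return-type hint and a docstring. The sketch below shows one way such a description could be derived with the standard `inspect` module. The `describe_tool` helper, its type mapping, and the second tool `days_between` are hypothetical simplifications loosely modeled on that idea, not smolagents' actual schema code.

```python
import inspect
from datetime import date, datetime
from typing import get_type_hints

# Simplified, hypothetical mapping from Python types to JSON-schema type names
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def describe_tool(fn) -> dict:
    """Build a simplified tool description from a function's signature and docstring."""
    hints = get_type_hints(fn)
    doc = inspect.getdoc(fn) or ""
    summary = doc.split("\n\n")[0].strip()  # first paragraph of the docstring
    inputs = {
        name: {"type": _JSON_TYPES.get(hints.get(name), "any")}
        for name in inspect.signature(fn).parameters
    }
    return {
        "name": fn.__name__,
        "description": summary,
        "inputs": inputs,
        "output_type": _JSON_TYPES.get(hints.get("return"), "any"),
    }

def today() -> str:
    """
    Gets the current date and time

    Returns:
        current date and time
    """
    return datetime.today().isoformat()

def days_between(start: str, end: str) -> int:
    """Counts the days between two ISO-format dates (hypothetical extra tool)."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days

print(describe_tool(today))
print(describe_tool(days_between)["inputs"])
```

A missing type hint maps to `"any"` here; in practice, precise hints and a clear docstring give the LLM a better chance of calling the tool correctly.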


In [None]:
# | notest

answer = agent.run("Any famous events that happened on today's date?")

In [None]:
# | notest

from IPython.display import display, Markdown
display(Markdown(answer))

Several historical events occurred on June 24th, including:
1. In 1374, cases of St. John's Dance mysteriously erupted in Aachen, Germany.
2. In 1497, John Cabot landed in Newfoundland, marking the first European landing at the site since the Vikings.
3. In 1340, King Edward III's fleet defeated the French fleet at the Battle of Sluys during the 100 Years War.
4. On June 24, 1973, an arson fire at the UpStairs Lounge in New Orleans' LGBT community resulted in 32 deaths.

### Example: Web Information Extraction

In the next example, we will use the Web View tool to extract information from a Web page.
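Conceptually, a web-view tool fetches a page and hands its readable text to the LLM, which then answers from that text. Below is a rough, standard-library-only sketch of the text-extraction step, run on a made-up HTML snippet rather than a live page. `page_to_text` is hypothetical, and the actual tool may convert pages differently (for example, to Markdown).

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip = 0      # depth inside script/style elements
        self.chunks = []    # non-empty text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def page_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one fragment per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)

# Made-up page content, used only for illustration
html = """<html><head><style>p {color: red}</style></head>
<body><h1>Jane Doe</h1><p>Education: Ph.D., Example University</p>
<script>console.log('hi')</script></body></html>"""
print(page_to_text(html))
```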

In [None]:
# | notest

agent = Agent(llm)
agent.add_webview_tool()
answer = agent.run("What is the highest level of education of the person listed on this page: https://arun.maiya.net?")

In [None]:
# | notest

print(answer)

Ph.D. in Computer Science


### Example: Agentic RAG

You can also give the agent access to a vector store containing your documents. In this example, we will ingest a document about generative AI into a vector store and provide the store to the agent as a tool.
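Conceptually, a vector-store tool lets the agent send a text query and get back the stored chunks most similar to it. The sketch below is a toy illustration only: it swaps the dense embeddings that `DenseStore` computes for simple word counts and ranks chunks by cosine similarity. `ToyStore` and the sample chunks are hypothetical and not part of OnPrem.LLM.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a neural embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToyStore:
    """Hypothetical in-memory store holding (chunk text, vector) pairs."""

    def __init__(self):
        self.chunks = []

    def ingest(self, texts):
        for t in texts:
            self.chunks.append((t, embed(t)))

    def search(self, query: str, k: int = 2):
        """Return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.chunks, key=lambda c: cosine(q, c[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

toy_store = ToyStore()
toy_store.ingest([
    "Generative AI can create images from text descriptions.",
    "Chatbots answer customer questions in natural language.",
    "The quarterly budget was approved by the board.",
])
print(toy_store.search("generate images from text", k=1))
```

In agentic RAG, the agent decides when to call such a search and may rephrase and repeat queries, rather than retrieving once up front.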

In [None]:
# | notest

from onprem.ingest.stores import DenseStore

store = DenseStore.create('/tmp/myvectordb')
store.ingest('tests/sample_data/docx_example/')

Creating new vectorstore at /tmp/myvectordb
Loading documents from tests/sample_data/docx_example/


Loading new documents: 100%|██████████████████████| 1/1 [00:04<00:00,  4.05s/it]
Processing and chunking 1 new documents: 100%|█████████████████████████████████████| 1/1 [00:00<00:00, 2184.53it/s]


Split into 17 chunks of text (max. 500 chars each for text; max. 2000 chars for tables)
Creating embeddings. May take some minutes...


100%|████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.67it/s]

Ingestion complete! You can now query your documents using the LLM.ask or LLM.chat methods





So far, we have been using the default tool-calling agent, which relies on the LLM to generate precise JSON structures specifying the tool names and arguments required to complete a task.
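To make the tool-calling step concrete, here is a minimal sketch of what happens after the LLM emits a tool call: the JSON is parsed and the named Python function is invoked with the given arguments. The `{"name": ..., "arguments": ...}` layout, the `TOOLS` registry, and the `dispatch` helper are simplified assumptions for illustration; real tool-calling formats differ in detail.

```python
import json

def today() -> str:
    # Fixed placeholder value so the example is deterministic
    return "2025-06-24T09:00:00"

# Registry mapping tool names to Python callables
TOOLS = {"today": today}

def dispatch(tool_call_json: str):
    """Parse a JSON tool call emitted by the LLM and run the named tool."""
    call = json.loads(tool_call_json)  # raises json.JSONDecodeError if malformed
    name = call["name"]
    args = call.get("arguments", {})
    if name not in TOOLS:
        raise KeyError(f"Unknown tool: {name}")
    return TOOLS[name](**args)

# A well-formed (hypothetical) tool call, as an LLM might emit it
print(dispatch('{"name": "today", "arguments": {}}'))

# A truncated tool call cannot be parsed, so the step fails
try:
    dispatch('{"name": "today", "arguments": {}')
except json.JSONDecodeError as err:
    print("Malformed tool call:", err.msg)
```

The second call shows why precision matters: a single missing brace makes the whole step unusable.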

By contrast, the code agent generates and runs code to solve a task. We will use the code agent in the next example.
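With a code agent, the LLM writes a short program in which tools are ordinary function calls, so several queries and some post-processing can happen in a single step. The sketch below mimics this with a hard-coded "generated" snippet and a canned `genai_search` stand-in; both, along with the `final_answer` variable convention, are hypothetical. It uses a bare `exec` purely for illustration, whereas real code agents typically run generated code in a restricted or sandboxed interpreter.

```python
def genai_search(query: str) -> str:
    """Hypothetical stand-in for the vector-store tool: returns canned text."""
    corpus = {
        "images": "DALL-E creates images from text descriptions.",
        "text": "ChatGPT generates text from prompts.",
    }
    return " ".join(v for k, v in corpus.items() if k in query.lower())

# Code an LLM might generate: several tool calls plus ordinary Python logic
generated_code = """
results = [genai_search(q) for q in ["text generation", "images"]]
final_answer = "\\n".join("- " + r for r in results)
"""

namespace = {"genai_search": genai_search}  # expose tools as plain functions
exec(generated_code, namespace)             # illustration only; do not exec untrusted code
print(namespace["final_answer"])
```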

In [None]:
# | notest

agent = Agent(llm, agent_type='code')
agent.add_vectorstore_tool(name='genai_search', 
                           description='Searches a database of information on generative AI.',
                           store=store)

answer = agent.run("Summarize some generative AI use cases in Markdown format. You may need to run at least three queries of the database.")

In [None]:
# | notest

from IPython.display import display, Markdown
display(Markdown(answer))


# Generative AI Use Cases

Generative AI refers to a category of artificial intelligence technologies that can create content in response to user prompts. Here are some key applications of generative AI:

## 1. Content Creation
- **Text Generation**: Tools like ChatGPT generate coherent and relevant text based on prompts.
- **Image Generation**: Models such as DALL-E create unique images from text descriptions.
- **Music and Video Creation**: Generative models can compose music or create video clips based on user inputs.

## 2. Business Applications
- **Personalized Recommendations**: In online shopping, generative AI analyzes user behavior to recommend products tailored to individual preferences.
- **Chatbots and Digital Assistants**: Applications like Amazon Alexa and Google Assistant help users by providing information and performing tasks based on voice commands.

## 3. Health and Fitness
- **Health Monitoring**: Apps like Fitbit use generative AI to analyze data and provide health recommendations.
- **Telemedicine**: AI systems assist healthcare professionals by generating patient reports and insights based on input data.

## 4. Social Media
- **Content Moderation**: Generative AI helps filter out harmful or inappropriate content from social media feeds.
- **Algorithm-Driven Feeds**: AI systems personalize content delivery on platforms like Facebook and Instagram based on user interaction patterns.

These use cases illustrate the versatility of generative AI across various sectors, enhancing both productivity and user experience.


## Using Local Models in Agentic Workflows

Some local (on-premises) models are better suited for agentic workflows than others. For instance, `ollama/llama3.2` is a 3B-parameter model and is not recommended. In this example, we will use `ollama/llama3.1:8b` to repeat one of the earlier examples. The example below assumes you have already [installed Ollama](https://ollama.com/) and pulled the Llama 3.1 8B model with `ollama pull llama3.1:8b`.

**Important Note:** If using a local model, set the context window to a higher value (e.g., 8192) to support agentic workflows.

- For the llama.cpp backend, supply `n_ctx=8192` as a parameter to `LLM`.
- For the Ollama backend, supply `num_ctx=8192` as a parameter to `LLM`.
- For the Hugging Face transformers backend (e.g., `LLM(model_id='hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4', device='cuda')`), the context window is set automatically, so no extra parameters are necessary.
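Why does the context window matter? An agent's prompt grows with every step: it carries the system prompt and tool descriptions, the task, and each tool observation (such as the full text of a web page). If the total exceeds the window, content may be truncated. The sketch below estimates whether such a transcript fits. The 4-characters-per-token ratio is a rough rule of thumb for English text (an assumption; real counts depend on the model's tokenizer), and the transcript sizes and `reserve_for_output` default are made up for illustration.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_in_context(messages, num_ctx: int, reserve_for_output: int = 1024) -> bool:
    """Check whether the accumulated transcript leaves room for the model's reply."""
    used = sum(estimate_tokens(m) for m in messages)
    return used + reserve_for_output <= num_ctx

# Hypothetical agent transcript with made-up sizes
transcript = [
    "x" * 6000,   # system prompt + tool descriptions (~1500 tokens)
    "What is the highest level of education of the person on this page?",
    "y" * 16000,  # web page text returned by a tool (~4000 tokens)
]
print(fits_in_context(transcript, num_ctx=2048))  # False
print(fits_in_context(transcript, num_ctx=8192))  # True
```

Under these assumptions the transcript needs roughly 5,500 tokens plus room for the reply, which a 2,048-token window cannot hold but an 8,192-token window can.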

In [None]:
# | notest

from onprem import LLM
from onprem.pipelines import Agent
llm = LLM("ollama/llama3.1:8b", mute_stream=True, num_ctx=8192)  # larger context window for agentic use

agent = Agent(llm)
agent.add_webview_tool()
answer = agent.run("What is the highest level of education of the person listed on this page: https://arun.maiya.net?")

In [None]:
# | notest
print(answer)

Ph.D.
