# **GenAI Agents Workshop!!!**
### **Agenda:**
1. What is an agent?
2. Notable tools/frameworks 
3. Basic agent setup
4. Quick survey of the best models right now
5. Your turn - brainstorming / office hours session

## **1. What is an agent?**
Realistically, there is no well-defined answer, but people usually mean one of three things when they say "agent":

1. **LLMs working independently**
    - Can reliably use interfaces like a web browser, API, etc.
    - Make decisions with little to no human guidance
    - Ex. LLM reviews and automatically organizes PRs on GitHub.
2. **LLMs working with other LLMs**
    - Why multiple agents? Either the agents belong to different parties (ex. a Microsoft coding agent works with an Nvidia coding agent to fix a Windows CUDA bug), or each LLM specializes in one task and does it well.
3. **LLMs improving based on feedback from environment**
    - Just as humans do, LLMs react to new information from their surroundings and adjust accordingly (this overlaps with #1).
    - Ex. During a system outage, an LLM notifies customers of the outage, redirects them to alternative offerings, etc.
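
The three meanings above share one loop: the model picks an action, a tool runs it, and the result feeds into the next decision. Here is a minimal sketch of that loop. It assumes a stub function in place of a real LLM call, and the tool names (`check_status`, `notify_customers`) are made up for illustration, loosely following the outage example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools; real ones would wrap a browser, an API, an email client, etc.
def check_status(_: str) -> str:
    return "outage"

def notify_customers(message: str) -> str:
    return f"sent: {message}"

TOOLS: Dict[str, Callable[[str], str]] = {
    "check_status": check_status,
    "notify_customers": notify_customers,
}

def stub_policy(history: List[str]) -> Tuple[str, str]:
    """Stand-in for an LLM call: choose the next (tool, argument) from past observations."""
    if not history:
        return ("check_status", "")
    if history[-1] == "outage":
        return ("notify_customers", "Service is down; try the backup endpoint.")
    return ("stop", "")

def run_agent(policy, tools, max_steps: int = 5) -> List[str]:
    """Decide -> act -> observe, repeated until the policy stops or the step budget runs out."""
    history: List[str] = []
    for _ in range(max_steps):
        tool, arg = policy(history)
        if tool == "stop":
            break
        observation = tools[tool](arg)  # act on the environment
        history.append(observation)     # feedback the next decision can use
    return history

print(run_agent(stub_policy, TOOLS))
```

Swapping `stub_policy` for a function that prompts a real model is the main change needed to make this an actual agent. The `max_steps` cap is a cheap guard against a model that never decides to stop.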


Further reading:
* [Google Agents Whitepaper](https://www.kaggle.com/whitepaper-agents)
* [Microsoft on Agents](https://www.microsoft.com/en-us/research/wp-content/uploads/2024/02/AgentAI_position.pdf)

## **2. Tools/frameworks to note**
There's been quite a bit of work done to make developing agentic systems easier:
* [LangChain](https://www.langchain.com/):
    * Create graphs in which 1+ LLMs work together
    * Comprehensive SDK (also kind of complicated, best for complex setups)
* [outlines (dottxt-ai)](https://github.com/dottxt-ai/outlines)
    * Enables structured text generation (e.g. CSV, JSON schemas, custom grammars, etc.)
    * Makes it easier to feed LLM output into an API or other tool
* [MCP - Model Context Protocol (Anthropic)](https://www.anthropic.com/news/model-context-protocol)
    * USB for LLMs - connect different APIs/tools with LLMs
    * Another way to think about it - app store for LLMs to extend their functionality
    * Try it in Claude Desktop!
    * One missing feature: no webhook support
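
To see why structured output matters when feeding an LLM into an API or other tool, here is a rough sketch of the check you end up writing by hand when the output is not constrained. It does not use the outlines API. It only uses the standard library, and the field names are illustrative assumptions.

```python
import json

# Illustrative schema: field name -> expected Python type (an assumption for this sketch).
REQUIRED_FIELDS = {"url": str, "monthly_rent": int, "addr": str}

def parse_listing(raw: str) -> dict:
    """Parse raw model output and reject anything a downstream tool could not consume."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        value = data.get(field)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"field {field!r} is missing or not {expected_type.__name__}")
    return data

good = '{"url": "https://example.com/1", "monthly_rent": 1350, "addr": "123 Example St"}'
print(parse_listing(good)["monthly_rent"])
```

Structured generation tools aim to make the failure branches unreachable by constraining what the model can emit. A check like this is still a reasonable safety net at the boundary with other systems.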

```python
# Install uv (run only the line for your OS)
# Windows (PowerShell or winget); recommended over WSL to avoid having to set up a GUI on WSL
# !powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
!winget install --id=astral-sh.uv  -e

# Mac/Linux
# !curl -LsSf https://astral.sh/uv/install.sh | sh
```


```python
# Env setup: install the project dependencies, then the Chromium build that Playwright drives
!uv sync
!playwright install chromium
```

## API Keys
We'll be using Gemini for this demo because it has a free tier and a decent API. Go to [aistudio.google.com](https://aistudio.google.com) and create a free API key.

Then create a file named `.env` in the project directory and add the following line:
```bash
GOOGLE_API_KEY=<API KEY GOES HERE>
```
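
A missing or placeholder key usually only shows up later as an opaque authentication error, so it can help to fail early. Here is a small sketch; the helper name `require_env` is made up for this example, and it assumes the placeholder still starts with `<` as in the line above.

```python
import os

def require_env(name: str, env=None) -> str:
    """Return the variable's value, or raise a readable error if it is unset or a placeholder."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value or value.startswith("<"):
        raise RuntimeError(f"{name} is not set; add it to your .env file and reload the environment")
    return value

# Demonstration with an explicit mapping instead of the real environment.
print(require_env("GOOGLE_API_KEY", {"GOOGLE_API_KEY": "example-key"}))
```

In the notebook, calling `require_env("GOOGLE_API_KEY")` right after `load_dotenv()` checks the real environment.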

## **Browser Use Demo**
There are quite a few tools out there to turn LLMs into agents, but perhaps the single most useful one is letting models interact with web browsers. We'll show a very minimal demo that lets a model search for housing under some constraints and return a few matches as structured records that can be written to a CSV file.

Check out browser use [here](https://browser-use.com/).

```python
from langchain_google_genai import ChatGoogleGenerativeAI
from browser_use import Agent, Controller
from dotenv import load_dotenv
from pydantic import BaseModel
from typing import List

# Load GOOGLE_API_KEY from the .env file created above
load_dotenv()

# Schema for a single listing the agent should return
class Entry(BaseModel):
    url: str
    monthly_rent: int
    addr: str
    apartment_name: str

# Wrapper so the agent can return several listings at once
class Entries(BaseModel):
    data: List[Entry]

async def main():
    agent = Agent(
        task="""
        I'm looking for housing in Seattle, WA. My budget is $1400 for a 1 br 1 bath unit, should have in-unit laundry. Find at least 2 units available for June 2025 move-in. Avoid zillow.com for searching. Be sure to output the apartment's url, monthly rent, address, and name
        """,
        llm=ChatGoogleGenerativeAI(model="gemini-2.0-flash"),
        # Ask the agent to shape its final answer like Entries
        controller=Controller(output_model=Entries)
    )
    history = await agent.run()
    res = history.final_result()
    print(res)

# Top-level await works in a notebook; in a plain script, use asyncio.run(main()) instead
await main()
```
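
To get from the agent's final result to a CSV file, a sketch like the following may help. It assumes `history.final_result()` returns a JSON string shaped like `Entries` (a `data` list of listings), which is worth verifying against your installed browser-use version. The sample record is made up, not real agent output.

```python
import csv
import io
import json

# Column order for the CSV; names match the Entry model in the demo.
FIELDS = ["apartment_name", "addr", "monthly_rent", "url"]

def entries_json_to_csv(result_json: str) -> str:
    """Convert an Entries-shaped JSON string into CSV text with a header row."""
    rows = json.loads(result_json)["data"]
    buffer = io.StringIO()
    # extrasaction="ignore" drops any unexpected keys instead of raising.
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Made-up sample standing in for the agent's output.
sample = json.dumps({"data": [{
    "url": "https://example.com/listing/1",
    "monthly_rent": 1350,
    "addr": "123 Example St, Seattle, WA",
    "apartment_name": "Example Apartments",
}]})
print(entries_json_to_csv(sample))
```

In the demo, you would pass `res` in place of `sample` after checking that it is not `None`, then write the returned text to a file opened with `newline=""`.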

### **Your turn!**

You have 2 options:

1. Finish the demo: design and implement a system that can autonomously perform the housing search for you. The demo already finds leads, but a listing may be out of date, may require filling out an application, etc. The design is up to you, but there should be a strong case that the user has to do nothing more than lay out their housing needs.

Ideas:
* Use an email MCP server to reach out to property managers and inquire about availability.
* Search social networks and messaging channels (Discord, SMS, etc.) with your own accounts to find a roommate.
* Use a Google Maps MCP server to find housing more reliably and check nearby places (e.g. you want to live close to your office).
* Be creative, think of something yourself!!!

2. Come up with your own problem to automate with agents!! Some ideas:
* **Joining a lab at GT:** A lot of people have trouble finding an interesting lab that is willing to take them. Design a system that can find labs at GT and email the professor a well-crafted message that highlights your relevant skills.
* **Proactive securities market researcher:** People miss out on good times to buy stocks (think GameStop, the tariff drop, etc.). Have a system ingest new info as it comes in and evaluate whether it's worth buying, selling, or doing something else.


*Note: MTC will cover any API costs for the workshop :)*

### Available Models
* Gemini
    - 2.5 Pro, **ranks highest on most benchmarks at the time of writing**: free tier available | $1.25 per 1M tokens
    - 2.0 Flash: free tier available | $0.10 per 1M tokens
* Claude
    - 3.7 Sonnet, **strong coding/reasoning abilities**: $3 per 1M tokens
* OpenAI
    - o3, most powerful reasoning model: $2 per 1M tokens
    - GPT-4o, standard model, good price/perf balance: $2.50 per 1M tokens
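
For a rough sense of what a run might cost, the listed prices can be plugged into a back-of-the-envelope calculation. This is only a sketch. The dictionary keys are labels chosen for this example, not official API model IDs, and it charges every token at the single listed rate, which is a simplification because providers typically bill input and output tokens at different rates.

```python
# Prices in USD per 1M tokens, copied from the list above.
PRICE_PER_MILLION_USD = {
    "gemini-2.5-pro": 1.25,
    "gemini-2.0-flash": 0.10,
    "claude-3.7-sonnet": 3.00,
    "o3": 2.00,
    "gpt-4o": 2.50,
}

def estimate_cost_usd(model: str, tokens: int) -> float:
    """Estimated cost = (tokens / 1,000,000) * price per million tokens."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_USD[model]

# Compare all models for a hypothetical 250k-token session.
for name in PRICE_PER_MILLION_USD:
    print(f"{name}: ${estimate_cost_usd(name, 250_000):.4f}")
```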
