# Using an LLM Agent with Web Search to Classify Sports Teams from a CSV File

This notebook demonstrates how to use a LangChain-based LLM agent with a web search tool to classify sports teams listed in a CSV file. Each row in the CSV contains a team name. The agent will use real-time web search to determine the sport associated with each team and add a corresponding label.

## Step 1: Install Required Packages

We will use the LangChain framework, OpenAI for LLM access, and the Tavily web search integration. Pandas will be used to work with CSV files.

In [None]:
!pip install --quiet langchain_openai pandas tavily-python langchain-community


## Step 2: Set Up API Keys

You will need:
- An OpenAI API key
- A Tavily API key (sign up at https://app.tavily.com/)

These are required to access the LLM and the web search service.

In [None]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")
os.environ["TAVILY_API_KEY"] = getpass.getpass("Enter your Tavily API key: ")

Enter your OpenAI API key: ··········
Enter your Tavily API key: ··········
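
If a key is already exported in the environment, prompting for it again is unnecessary. The following is a minimal sketch, assuming a hypothetical helper named `ensure_env_key` (it is not part of LangChain or Tavily), that prompts only when the variable is missing or empty:

```python
import os
import getpass


def ensure_env_key(name, prompt_fn=getpass.getpass):
    """Return the value of environment variable `name`, prompting only if it is unset.

    `ensure_env_key` is a hypothetical helper for this notebook; `prompt_fn`
    is injectable so the function can be exercised without a terminal.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        value = prompt_fn(f"Enter your {name}: ").strip()
        if not value:
            raise ValueError(f"{name} must not be empty")
        # Store the key so libraries that read the environment can find it.
        os.environ[name] = value
    return value
```

In the notebook this could be called as `ensure_env_key("OPENAI_API_KEY")` and `ensure_env_key("TAVILY_API_KEY")` in place of the two direct assignments.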


## Step 3: Initialize the Agent with Web Search Tool

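We set up an LLM agent with the Tavily web search tool. The agent decides when to invoke the search tool while classifying each team. The `initialize_agent` helper is deprecated in newer LangChain releases, so this cell may print a deprecation warning.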

In [None]:
from langchain_openai import ChatOpenAI
from langchain.agents import initialize_agent, AgentType
from langchain_community.tools.tavily_search import TavilySearchResults

# Initialize model and tools
llm = ChatOpenAI(temperature=0)
search_tool = TavilySearchResults()
agent = initialize_agent([search_tool], llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)



## Step 4: Load the CSV File

We load the CSV file that contains the sports teams. Team names are stored in a column named 'Team'; the file also includes each team's country, city, and stadium capacity.

In [None]:
import pandas as pd

# Path to your CSV file
csv_path = "sports_teams.csv"
df = pd.read_csv(csv_path)
df.head()

Unnamed: 0,Team,Country,City,Stadium Capacity
0,Los Angeles Lakers,USA,Los Angeles,19000
1,Manchester United,UK,Manchester,74000
2,Toronto Maple Leafs,Canada,Toronto,18600
3,Golden State Warriors,USA,San Francisco,19500
4,New England Patriots,USA,Boston,65878
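
The `Unnamed: 0` header in the preview suggests the file was written with its row index as the first column. As a rough sketch, assuming a hypothetical helper named `load_teams`, the file could be loaded with that column as the index and checked for the 'Team' column before any API calls are made. The inline CSV below is a stand-in modeled on the preview, not the real file:

```python
import io
import pandas as pd


def load_teams(source, team_column="Team"):
    """Load a team table and fail early if the team column is missing.

    `load_teams` is a hypothetical helper; `source` can be a path or any
    file-like object accepted by `pandas.read_csv`.
    """
    # index_col=0 treats the first, unnamed column as the row index
    # instead of loading it as a data column called "Unnamed: 0".
    frame = pd.read_csv(source, index_col=0)
    if team_column not in frame.columns:
        raise KeyError(f"expected a '{team_column}' column, found {list(frame.columns)}")
    # Strip stray whitespace so search queries are built from clean names.
    frame[team_column] = frame[team_column].astype(str).str.strip()
    return frame


# Inline stand-in for sports_teams.csv, modeled on two rows of the preview.
sample_csv = io.StringIO(
    ",Team,Country,City,Stadium Capacity\n"
    "0,Los Angeles Lakers ,USA,Los Angeles,19000\n"
    "1,Manchester United,UK,Manchester,74000\n"
)
teams = load_teams(sample_csv)
print(teams.columns.tolist())
```

Failing early on a missing column avoids spending search and LLM calls on a file with the wrong layout.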


## Step 5: Classify Each Team Using the Agent

The agent will search the web and respond with the sport that each team plays. We apply this to each team in the dataset.

In [None]:
# Ask the agent which sport a team plays.
# The agent combines web search with reasoning and is instructed to return only the sport name,
# choosing the best-known sport when several are associated with the team name.
# If the query fails, the function returns a string starting with "error:" instead of raising.
def classify_team_with_agent(team_name):
    query = (
        f"What sport does the team '{team_name}' play? "
        "Return only the sport name. If several sports are associated with the name, "
        "return only the most famous one."
    )
    try:
        # invoke() supersedes the deprecated run(); the final answer is under "output".
        return agent.invoke({"input": query})["output"]
    except Exception as e:
        return f"error: {e}"

# Classify only the first 3 teams as a quick test before running on the full dataset.
# Rows that are not classified get NaN in the new 'Sport' column.
df["Sport"] = df["Team"].head(3).apply(classify_team_with_agent)

df.head()





> Entering new AgentExecutor chain...
I should use the search engine to find the sport associated with the team 'Los Angeles Lakers'.
Action: tavily_search_results_json
Action Input: Los Angeles Lakers sport
Observation: [{'title': 'Los Angeles Lakers - Wikipedia', 'url': 'https://en.wikipedia.org/wiki/Los_Angeles_Lakers', 'content': "The Los Angeles Lakers are an American professional basketball team based in Los Angeles. The Lakers compete in the National Basketball Association (NBA) as a member of the Pacific Division of the Western Conference. The Lakers play their home games at Crypto.com Arena, an arena they share with the Los Angeles Sparks of the Women's National Basketball Association (WNBA) and the Los Angeles Kings of the National Hockey League (NHL).[10] The Lakers are one of the most successful teams in the [...] Los Angeles Memorial Sports Arena. [...] entertainment as well as sport.[16] Second, the Lakers drafted Magic Johnson firs

Unnamed: 0,Team,Country,City,Stadium Capacity,Sport
0,Los Angeles Lakers,USA,Los Angeles,19000,Basketball
1,Manchester United,UK,Manchester,74000,
2,Toronto Maple Leafs,Canada,Toronto,18600,
3,Golden State Warriors,USA,San Francisco,19500,
4,New England Patriots,USA,Boston,65878,
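
Agent replies are free text, so the same sport can come back with different casing or punctuation, and failures arrive as strings starting with "error:". The sketch below, which assumes two hypothetical helpers (`normalize_sport_label` and `make_cached_classifier`), tidies the labels and avoids repeating a lookup for a team name that has already been seen. A stub stands in for `classify_team_with_agent` so the example runs without API access:

```python
from functools import lru_cache


def normalize_sport_label(raw):
    """Return a tidy label, or None for empty replies and 'error: ...' strings."""
    if raw is None:
        return None
    # Drop surrounding whitespace, quotes, and trailing periods.
    text = str(raw).strip().strip(".\"'").strip()
    if not text or text.lower().startswith("error:"):
        return None
    return text.title()


def make_cached_classifier(classify_fn, maxsize=1024):
    """Wrap `classify_fn` so each distinct team name is looked up only once."""

    @lru_cache(maxsize=maxsize)
    def cached(team_name):
        return normalize_sport_label(classify_fn(team_name))

    return cached


# Stub standing in for classify_team_with_agent; it records calls so the
# effect of the cache is visible without any API access.
calls = []


def fake_classify(team_name):
    calls.append(team_name)
    replies = {"Los Angeles Lakers": " basketball. "}
    return replies.get(team_name, "error: lookup failed")


classify = make_cached_classifier(fake_classify)
labels = [classify(name) for name in ["Los Angeles Lakers", "Los Angeles Lakers", "Unknown FC"]]
print(labels)      # ['Basketball', 'Basketball', None]
print(len(calls))  # 2
```

One trade-off of this design is that failed lookups are cached as `None` as well; calling `classify.cache_clear()` before a retry would force fresh queries.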


## Step 6: Save the Updated Table

The resulting DataFrame will be saved to a new CSV file with the added sport classification column.

In [None]:
output_path = "classified_teams_with_web_search.csv"
df.to_csv(output_path, index=False)
print(f"Saved results to {output_path}")

Saved results to classified_teams_with_web_search.csv
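
Because Step 5 classifies only the first few rows, the saved file can contain teams without a label. A small sketch of how the output might be checked, using a stand-in table and a temporary file (the `demo` data and `classified_demo.csv` name are illustrative assumptions, not the notebook's real results):

```python
import os
import tempfile
import pandas as pd

# Small stand-in table; in the notebook this would be the classified `df`.
demo = pd.DataFrame(
    {
        "Team": ["Los Angeles Lakers", "Golden State Warriors"],
        "Sport": ["Basketball", None],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "classified_demo.csv")
    demo.to_csv(demo_path, index=False)
    reloaded = pd.read_csv(demo_path)

# Missing labels are written as empty fields and read back as NaN.
unlabeled = reloaded[reloaded["Sport"].isna()]
print(f"{len(unlabeled)} of {len(reloaded)} teams still need a label")
```

The same `isna()` filter could be applied to the real output file to list the teams that still need classification.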
