1. What is Tavily Search doing?
Tavily Search is a web search tool built for use by AI applications.

It takes the query you give it (e.g., "Who won the last Wimbledon?") and searches the internet (especially trusted sources) to fetch fresh and relevant results.

You can control things like:

How many results you want (max_results=5)

Which domains to search (for example, only Wikipedia)

How fresh the results must be (past day, week, etc.)
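
The controls above can be gathered into a small options dictionary before the tool is created. This is a minimal sketch: `build_search_options` is a hypothetical helper, and the keyword names `include_domains` and `time_range` are assumptions about the `TavilySearch` constructor that should be checked against the installed `langchain-tavily` version (`max_results` is the name used later in this notebook).

```python
# Hypothetical helper: gathers the search controls into keyword arguments.
# The names include_domains and time_range are assumed, not taken from this notebook.
ALLOWED_RANGES = {"day", "week", "month", "year"}

def build_search_options(max_results=5, domains=None, time_range=None):
    """Validate the controls and return them as a keyword-argument dictionary."""
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    options = {"max_results": max_results}
    if domains:
        options["include_domains"] = list(domains)
    if time_range is not None:
        if time_range not in ALLOWED_RANGES:
            raise ValueError(f"time_range must be one of {sorted(ALLOWED_RANGES)}")
        options["time_range"] = time_range
    return options

options = build_search_options(max_results=5, domains=["wikipedia.org"], time_range="week")
print(options)
```

If the assumed names hold, the dictionary could be passed along as `TavilySearch(**options)`.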

It returns structured JSON data: a list of search results, usually including titles, summaries, URLs, and sometimes full content.
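
To make the shape of that JSON concrete, the sketch below formats a hand-written sample response. The sample is illustrative, not real Tavily output; the `results` and `url` keys match the code later in this notebook, while `title` and `content` are assumed field names, and `format_hits` is a hypothetical helper.

```python
# Hand-written sample shaped like a search response (not real output).
sample_response = {
    "query": "Who won the last Wimbledon?",
    "results": [
        {"title": "Example result A", "url": "https://example.com/a", "content": "Short snippet A."},
        {"title": "Example result B", "url": "https://example.com/b", "content": "Short snippet B."},
    ],
}

def format_hits(response):
    """Return one 'N. title (url)' line per search hit."""
    lines = []
    for i, hit in enumerate(response.get("results", []), 1):
        title = hit.get("title", "(no title)")  # 'title' is an assumed key
        lines.append(f"{i}. {title} ({hit.get('url', '')})")
    return lines

for line in format_hits(sample_response):
    print(line)
```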

🔹 In short:

Tavily Search finds live information from the internet based on your query.

2. What is Tavily Extract doing?
Tavily Extract is different — it does not search the internet.

Instead, it takes the specific URLs you give it (e.g., the Wikipedia page for Lionel Messi) and extracts the content of those pages.

It returns the page content itself (the raw_content field handled in the code below) rather than a summary; in this notebook, summarizing is left to the LLM.

Options:

extract_depth="basic": the faster, lighter extraction mode ("advanced" is the more thorough option)

You can also extract images if you want (include_images=True)

🔹 In short:

Tavily Extract pulls information from existing web pages that you specify.
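
A rough sketch of handling an Extract response follows. It assumes the response is a dictionary whose `results` entries carry a `url` and a `raw_content` string (the keys used in the code later in this notebook) and that pages which could not be fetched are listed under `failed_results`; that last key is an assumption. The sample data is hand-written and `collect_page_text` is a hypothetical helper.

```python
# Hand-written sample shaped like an extract response (not real output).
sample_extract = {
    "results": [
        {"url": "https://example.com/page", "raw_content": "Full page text goes here."},
        {"url": "https://example.com/empty", "raw_content": ""},
    ],
    "failed_results": [{"url": "https://example.com/missing", "error": "not found"}],
}

def collect_page_text(response):
    """Return (url -> non-empty page text, list of URLs that failed)."""
    pages = {}
    for entry in response.get("results", []):
        text = entry.get("raw_content")
        if isinstance(text, str) and text.strip():
            pages[entry.get("url", "")] = text
    # 'failed_results' is an assumed key; a missing key yields an empty list.
    failed = [item.get("url", "") for item in response.get("failed_results", [])]
    return pages, failed

pages, failed = collect_page_text(sample_extract)
print(pages)
print(failed)
```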

3. What is OpenAI (GPT) doing here?
This notebook uses one of OpenAI's GPT models (gpt-4o-mini) as the brain behind an "agent".

The agent can:

Decide, based on the user's question, whether to use Tavily Search or Tavily Extract

Interpret the search or extraction results

Answer intelligently in natural language

In the function below, search and extraction run in a fixed order, so the model's job there is to turn the extracted text into a clear, human-readable summary.

🔹 In short:

OpenAI (GPT) is the brain — it thinks, plans, reads the Tavily outputs, and responds smartly to the user.
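
The choice between the two tools can be pictured with a simple rule. In a real agent the model makes this decision itself from the tool descriptions; the sketch below is only a rule-based stand-in, and `choose_tool` is a hypothetical name.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")

def choose_tool(question):
    """Pick 'extract' when the question names URLs, otherwise 'search'."""
    # Strip sentence punctuation that may trail a URL at the end of a sentence.
    urls = [url.rstrip(".,;:!?") for url in URL_PATTERN.findall(question)]
    if urls:
        return "extract", urls
    return "search", []

print(choose_tool("Who won the last Wimbledon?"))
print(choose_tool("Summarize https://en.wikipedia.org/wiki/Lionel_Messi."))
```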

Quick Analogy:

Task	Tool	Purpose
Find the latest information on the internet	Tavily Search	Do a real-time search
Extract content from known web pages	Tavily Extract	Pull content from the pages you specify
Understand, plan, and reply to user	OpenAI GPT	Act like a smart agent

In [None]:
# Install required packages
%pip install -qU langchain langchain-tavily langgraph langchain-openai


In [None]:
# Import and setup keys
import os
import getpass
from google.colab import userdata
os.environ["TAVILY_API_KEY"] = userdata.get('TAVILY_API_KEY')
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
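
Outside Colab, `google.colab.userdata` is not available. One possible fallback, sketched here, is to reuse a key that is already in the environment and otherwise prompt for it with `getpass`. `ensure_api_key` is a hypothetical helper, not part of any library, and `DEMO_API_KEY` is a placeholder name used only for the demonstration.

```python
import os
import getpass

def ensure_api_key(name, prompt=getpass.getpass):
    """Return the key from the environment, prompting once if it is missing."""
    if not os.environ.get(name):
        os.environ[name] = prompt(f"Enter {name}: ")
    return os.environ[name]

# Demonstration with a stand-in prompt so the snippet runs without user input.
os.environ.pop("DEMO_API_KEY", None)
key = ensure_api_key("DEMO_API_KEY", prompt=lambda message: "demo-value")
print(key == "demo-value")
```

In the notebook itself the call would be `ensure_api_key("TAVILY_API_KEY")`, which falls back to an interactive prompt only when the variable is not already set.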

In [None]:
from langchain_tavily import TavilySearch, TavilyExtract
from langchain.chat_models import init_chat_model
from langgraph.prebuilt import create_react_agent

In [None]:
# Initialize Tavily Search tool
search_tool = TavilySearch(
    max_results=3,  # Get top 3 results
    topic="general",
)

# Initialize Tavily Extract tool
extract_tool = TavilyExtract(
    extract_depth="basic",
    include_images=False
)

# Initialize LLM
llm = init_chat_model("gpt-4o-mini", model_provider="openai")

In [None]:
def live_research_assistant(topic_query):
    print(f"🔍 Searching for '{topic_query}' on the internet...\n")

    # Step 1: Search Tavily. invoke() returns a dictionary with the hits
    # under the 'results' key.
    search_results = search_tool.invoke({"query": topic_query})
    results = search_results['results']

    # Collect URLs
    urls = [result['url'] for result in results]
    print(f"🌐 Found {len(urls)} URLs:")
    for i, url in enumerate(urls, 1):
        print(f"{i}. {url}")

    # Step 2: Extract details from top URLs
    print("\n📄 Extracting detailed information...\n")
    extracted_data = extract_tool.invoke({"urls": urls})
    data = extracted_data if isinstance(extracted_data, dict) else {}

    # Step 3: Build one text block from the extracted pages. Two response
    # shapes are handled: an 'extracted_pages' list of {'title', 'text'}
    # dictionaries, or a 'results' list whose 'raw_content' is either the
    # page text as a string or a list of such dictionaries.
    extracted_text = ""
    for page in data.get('extracted_pages', []):
        extracted_text += f"\n\n### {page.get('title', '')}\n{page.get('text', '')}\n"
    for result in data.get('results', []):
        raw_content = result.get('raw_content')
        if isinstance(raw_content, str):
            extracted_text += f"\n\n### {result.get('url', '')}\n{raw_content}\n"
        elif isinstance(raw_content, list):
            for page in raw_content:
                if isinstance(page, dict):
                    extracted_text += f"\n\n### {page.get('title', '')}\n{page.get('text', '')}\n"

    if not extracted_text.strip():
        print(f"⚠️ Error: no extracted content found. The data returned is: {extracted_data}")
        return

    # Step 4: Summarize once, after all pages have been collected.
    # With an empty tool list the agent simply passes the prompt to the LLM.
    agent = create_react_agent(llm, [])
    user_input = f"Summarize the following extracted content in simple points:\n{extracted_text}"

    print("\n🧠 Generating final summary...\n")
    for step in agent.stream({"messages": [("user", user_input)]}, stream_mode="values"):
        step["messages"][-1].pretty_print()

In [None]:
live_research_assistant("What is latest discovery on Antarctica")

🔍 Searching for 'What is latest discovery on Antarctica' on the internet...

🌐 Found 3 URLs:
1. https://thedebrief.org/in-antarctica-an-iceberg-the-size-of-chicago-suddenly-broke-off-revealing-a-shocking-discovery/
2. https://www.cbsnews.com/news/antarctica-discovery-ice-shelf-ecosystem-research/
3. https://www.adn.com/nation-world/2025/03/20/a-huge-iceberg-broke-off-antarctica-what-scientists-found-under-it-startled-them/

📄 Extracting detailed information...

Researchers find thriving, never-before-seen ecosystem under Antarctic ice shelf: "This is unprecedented" - CBS News


Latest
U.S.
World
Politics
Trump Tariffs
Entertainment
HealthWatch
MoneyWatch
Crime
Space
Sports
Brand Studio


Local News
Baltimore
Bay Area
Boston
Chicago
Colorado
Detroit
Los Angeles
Miami
Minnesota
New York
Philadelphia
Pittsburgh
Sacramento
Texas


Live
CBS News 24/7
Baltimore
Bay Area
Boston
Chicago
Colorado
Detroit
Los Angeles
Miami
Minnesota
New York
Philadelphia
Pittsburgh
Sacramento
Texas
48 Hours
60 M