## Implement solution

### Subtask:
Write code based on the instructions.
1. Handle conversations: develop an AI chatbot that can interpret and answer general queries while distinguishing them from news requests.

## Subtask:
Install Requirements

In [82]:
!pip install -r "/content/requirements.txt"



## Subtask:
Choose a tokenizer

### NLTK
Setting up NLTK for text processing requires downloading the 'punkt' and 'punkt_tab' tokenizer data, which `word_tokenize` relies on. This prepares the environment for the NLP tasks that follow.

## OR

### spaCy
Alternative tokenizer (only a commented-out placeholder in the cell below).

In [96]:
import nltk
from nltk.tokenize import word_tokenize
import time # Import time for delays
##import spacy? alt tokenizer
nltk.download('punkt')
nltk.download('punkt_tab')
print("'punkt' tokenizer data downloaded successfully.")

'punkt' tokenizer data downloaded successfully.


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


## Subtask:
Create a function to tokenize the user input


In [84]:
def preprocess_text(text):
    """
    Tokenizes and normalizes the input text.
    Converts to lowercase and removes punctuation (for simplicity).
    """
    tokens = word_tokenize(text.lower())
    # Remove non-alphabetic tokens
    normalized_tokens = [word for word in tokens if word.isalpha()]
    return normalized_tokens

print("Helper function 'preprocess_text' defined successfully.")

Helper function 'preprocess_text' defined successfully.


### Test:
preprocess_text

In [98]:
# Test cases for preprocess_text function

print("--- Testing preprocess_text function ---")

# Test 1: Standard sentence with punctuation and mixed case
input_text_1 = "Hello, World! This is a Test sentence."
processed_1 = preprocess_text(input_text_1)
print(f"Input: '{input_text_1}'")
print(f"Processed: {processed_1}\n")

# Test 2: Sentence with numbers and special characters
input_text_2 = "Buy 10 apples for $2.50!"
processed_2 = preprocess_text(input_text_2)
print(f"Input: '{input_text_2}'")
print(f"Processed: {processed_2}\n")

# Test 3: Empty string
input_text_3 = ""
processed_3 = preprocess_text(input_text_3)
print(f"Input: '{input_text_3}'")
print(f"Processed: {processed_3}\n")

# Test 4: String with only punctuation and numbers
input_text_4 = "!@#$%^&*()12345"
processed_4 = preprocess_text(input_text_4)
print(f"Input: '{input_text_4}'")
print(f"Processed: {processed_4}\n")

# Test 5: Sentence with contractions (should be split and non-alphabetic removed)
input_text_5 = "I don't know, can't you tell?"
processed_5 = preprocess_text(input_text_5)
print(f"Input: '{input_text_5}'")
print(f"Processed: {processed_5}\n")

# Test 6: Text with leading/trailing spaces
input_text_6 = "  Some text with spaces  "
processed_6 = preprocess_text(input_text_6)
print(f"Input: '{input_text_6}'")
print(f"Processed: {processed_6}\n")

--- Testing preprocess_text function ---
Input: 'Hello, World! This is a Test sentence.'
Processed: ['hello', 'world', 'this', 'is', 'a', 'test', 'sentence']

Input: 'Buy 10 apples for $2.50!'
Processed: ['buy', 'apples', 'for']

Input: ''
Processed: []

Input: '!@#$%^&*()12345'
Processed: []

Input: 'I don't know, can't you tell?'
Processed: ['i', 'do', 'know', 'ca', 'you', 'tell']

Input: '  Some text with spaces  '
Processed: ['some', 'text', 'with', 'spaces']
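
Test 5 shows a side effect of combining `word_tokenize` with the `isalpha()` filter: NLTK splits "don't" into "do" and "n't", and the filter then drops "n't", so the negation is lost. If contractions matter for later steps, one option is a tokenizer that keeps them whole. The sketch below is an assumption, not part of the notebook: it swaps NLTK for a standard-library regex, and the name `preprocess_text_keep_contractions` is hypothetical. It only handles the plain ASCII apostrophe.

```python
import re

# Hypothetical alternative to preprocess_text: a regex tokenizer that keeps
# apostrophes inside words, so "don't" survives as a single token.
_WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")

def preprocess_text_keep_contractions(text):
    """Lowercase the text and return alphabetic tokens, keeping contractions whole."""
    return _WORD_RE.findall(text.lower())

print(preprocess_text_keep_contractions("I don't know, can't you tell?"))
print(preprocess_text_keep_contractions("Buy 10 apples for $2.50!"))
```

Numbers and punctuation are still discarded, so the results for the other test inputs match those of `preprocess_text`.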



## Subtask:
Define a function that categorizes queries into news or general

In [85]:
def recognize_intent(user_input):
    """
    Analyzes user input to classify the intent as either 'general_query' or 'news_request'.
    """
    processed_input = preprocess_text(user_input)
    lower_user_input = user_input.lower()

    # Define single-word news keywords
    news_single_keywords = ["news", "headlines"]

    # Define multi-word news phrases based on the Chatbot Scope and Requirements Definition
    # and observations from previous testing
    news_phrases_to_match = [
        "latest updates", "current events", "breaking news", "today's news",
        "recent developments", "what's happening", "tell me the news",
        "give me today's headlines", "what are the latest stories",
        "what is happening in the world"
    ]

    # Check for single-word keyword matches in processed tokens
    for token in processed_input:
        if token in news_single_keywords:
            return "news_request"

    # Check for multi-word phrase matches in the original lowercased input
    for phrase in news_phrases_to_match:
        if phrase in lower_user_input:
            return "news_request"

    return "general_query"

print("Intent recognition function 'recognize_intent' defined successfully.")

Intent recognition function 'recognize_intent' defined successfully.


### Test:
recognize_intent

In [99]:
# Test cases for recognize_intent function

print("--- Testing recognize_intent function ---")

# Test 1: Clear general query
input_ri_1 = "What is the capital of Japan?"
intent_ri_1 = recognize_intent(input_ri_1)
print(f"Input: '{input_ri_1}' -> Intent: {intent_ri_1}")

# Test 2: Clear news request (single keyword)
input_ri_2 = "Tell me the news."
intent_ri_2 = recognize_intent(input_ri_2)
print(f"Input: '{input_ri_2}' -> Intent: {intent_ri_2}")

# Test 3: Clear news request (phrase)
input_ri_3 = "What are the latest updates on space exploration?"
intent_ri_3 = recognize_intent(input_ri_3)
print(f"Input: '{input_ri_3}' -> Intent: {intent_ri_3}")

# Test 4: General query with some news-like words but not a news request
input_ri_4 = "What's the meaning of current?"
intent_ri_4 = recognize_intent(input_ri_4)
print(f"Input: '{input_ri_4}' -> Intent: {intent_ri_4}")

# Test 5: News request with a keyword in the middle
input_ri_5 = "I want to know about recent developments in AI."
intent_ri_5 = recognize_intent(input_ri_5)
print(f"Input: '{input_ri_5}' -> Intent: {intent_ri_5}")

# Test 6: Mixed query, should lean towards news if news phrases are strong
input_ri_6 = "Can you give me today's headlines and also tell me about dogs?"
intent_ri_6 = recognize_intent(input_ri_6)
print(f"Input: '{input_ri_6}' -> Intent: {intent_ri_6}")

# Test 7: Query with only a single news keyword
input_ri_7 = "headlines"
intent_ri_7 = recognize_intent(input_ri_7)
print(f"Input: '{input_ri_7}' -> Intent: {intent_ri_7}")

# Test 8: General knowledge question
input_ri_8 = "Who painted the Mona Lisa?"
intent_ri_8 = recognize_intent(input_ri_8)
print(f"Input: '{input_ri_8}' -> Intent: {intent_ri_8}")

# Test 9: More complex news request phrase
input_ri_9 = "What is happening in the world right now?"
intent_ri_9 = recognize_intent(input_ri_9)
print(f"Input: '{input_ri_9}' -> Intent: {intent_ri_9}")

# Test 10: General query that might contain common words found in news but isn't news
input_ri_10 = "Tell me about historical events."
intent_ri_10 = recognize_intent(input_ri_10)
print(f"Input: '{input_ri_10}' -> Intent: {intent_ri_10}")

--- Testing recognize_intent function ---
Input: 'What is the capital of Japan?' -> Intent: general_query
Input: 'Tell me the news.' -> Intent: news_request
Input: 'What are the latest updates on space exploration?' -> Intent: news_request
Input: 'What's the meaning of current?' -> Intent: general_query
Input: 'I want to know about recent developments in AI.' -> Intent: news_request
Input: 'Can you give me today's headlines and also tell me about dogs?' -> Intent: news_request
Input: 'headlines' -> Intent: news_request
Input: 'Who painted the Mona Lisa?' -> Intent: general_query
Input: 'What is happening in the world right now?' -> Intent: news_request
Input: 'Tell me about historical events.' -> Intent: general_query
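
`recognize_intent` matches multi-word phrases with a plain substring check, so a phrase can also match inside a longer word ("concurrent events" contains "current events"). A possible refinement is to anchor each phrase at word boundaries. This is a sketch under assumptions: `NEWS_PHRASES` is an illustrative subset of the notebook's list and `contains_news_phrase` is a hypothetical helper name.

```python
import re

# Illustrative subset of the phrases used by recognize_intent.
NEWS_PHRASES = ["current events", "breaking news", "latest updates"]

# One compiled pattern; \b anchors each phrase at word boundaries.
_NEWS_PHRASE_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in NEWS_PHRASES) + r")\b"
)

def contains_news_phrase(user_input):
    """Return True if any news phrase appears as whole words in the input."""
    return _NEWS_PHRASE_RE.search(user_input.lower()) is not None

print(contains_news_phrase("Any current events in Europe?"))        # True
print(contains_news_phrase("How do I handle concurrent events?"))   # False
```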


## Subtask:
Select the news search and general web search APIs and set them up


In [86]:
import requests
import os
import time
from google.colab import userdata
from tavily import TavilyClient

# Get secrets from Colab userdata (never print them: cell output is saved with the notebook)
NEWS_API_KEY = userdata.get('GNEWS_API_KEY')
TAVILY_API_KEY = userdata.get('TAVILY_API_KEY')

# For NewsAPI: NEWS_API_BASE_URL = "https://newsapi.org/v2/everything"
# For GNews API: NEWS_API_BASE_URL = "https://gnews.io/api/v4/search"
NEWS_API_BASE_URL = "https://gnews.io/api/v4/search"

# Initialize the Tavily client with the API key
tavily_client = TavilyClient(api_key=TAVILY_API_KEY)
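
Printing an API key leaves it in the saved notebook output. To confirm that a key loaded, it is safer to show a masked form. The sketch below is an assumption rather than part of the notebook: it reads from environment variables (useful outside Colab) instead of `userdata`, and `load_api_key`, `mask_secret`, and `DEMO_API_KEY` are hypothetical names.

```python
import os

def load_api_key(name):
    """Read a key from the environment; return None if it is missing or blank."""
    value = os.environ.get(name, "").strip()
    return value or None

def mask_secret(value, visible=4):
    """Return a display-safe version of a secret, showing only the first few characters."""
    if not value:
        return "<not set>"
    if len(value) <= visible:
        return "*" * len(value)
    return value[:visible] + "*" * (len(value) - visible)

# DEMO_API_KEY is a made-up variable used only for this example.
os.environ["DEMO_API_KEY"] = "abcd1234efgh5678"
print(mask_secret(load_api_key("DEMO_API_KEY")))
print(mask_secret(load_api_key("MISSING_DEMO_KEY_FOR_EXAMPLE")))
```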


## Subtask:
Define the `fetch_news` function to retrieve news related to a query


In [87]:

def fetch_news(query):
    """
    Fetches news articles from a news API based on the query.
    """
    if not NEWS_API_KEY:
        return "News API key is not set. Please provide a valid API key."

    if not NEWS_API_BASE_URL:
        return "News API base URL is not set. Please provide a valid URL."

    try:
        params = {
            'q': query,
            'lang': 'en',
            'country': 'us', # Or other relevant country codes
            'max': 5, # Number of articles to fetch
            'apikey': NEWS_API_KEY
        }
        # A timeout keeps the call from hanging indefinitely on a stalled connection
        response = requests.get(NEWS_API_BASE_URL, params=params, timeout=10)
        response.raise_for_status() # Raise an exception for HTTP errors
        data = response.json()

        articles = data.get('articles', [])
        if not articles:
            return "Sorry, I couldn't find any news for that query."

        news_output = []
        for i, article in enumerate(articles):
            title = article.get('title', 'No Title')
            description = article.get('description', 'No Description')
            url = article.get('url', '#')
            news_output.append(f"{i+1}. {title}\n   {description}\n   Read more: {url}\n")

        return "\n".join(news_output)

    except requests.exceptions.RequestException as e:
        return f"Error fetching news: {e}"
    except ValueError as e:
        return f"Error parsing API response: {e}"

print("News fetching function 'fetch_news' defined successfully.")

News fetching function 'fetch_news' defined successfully.
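
The formatting loop inside `fetch_news` can be separated from the HTTP call so that it can be checked without network access or an API key. The following is a minimal sketch, assuming the same `title`, `description`, and `url` fields. `format_articles` is a hypothetical helper and the sample data is hand-written, not a real API response.

```python
def format_articles(articles):
    """Format a list of article dicts into the numbered text that fetch_news returns."""
    lines = []
    for i, article in enumerate(articles, start=1):
        # `or` also covers keys that are present but set to None or "".
        title = article.get("title") or "No Title"
        description = article.get("description") or "No Description"
        url = article.get("url") or "#"
        lines.append(f"{i}. {title}\n   {description}\n   Read more: {url}\n")
    return "\n".join(lines)

# Hand-written sample data, not a real API response.
sample = [
    {"title": "Example headline", "description": None, "url": "https://example.com/a"},
    {"title": "Second headline", "description": "Short summary."},
]
print(format_articles(sample))
```

Using `or` instead of a `.get()` default matters when a field is present but null: `article.get('description', 'No Description')` would return `None` in that case.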


### Test: `fetch_news`

Tests for the function that fetches news related to a query.

In [103]:
# Test cases for fetch_news function

print("--- Testing fetch_news function ---")

# Test 1: Valid news query
news_query_1 = "technology news"
print(f"\nNews Query: '{news_query_1}'")
news_response_1 = fetch_news(news_query_1)
print(f"News Response:\n{news_response_1}")
time.sleep(2) # Add delay to avoid rate limiting

# Test 2: Another valid news query
news_query_2 = "climate change updates"
print(f"\nNews Query: '{news_query_2}'")
news_response_2 = fetch_news(news_query_2)
print(f"News Response:\n{news_response_2}")
time.sleep(2) # Add delay to avoid rate limiting

# Test 3: Query for which no news might be found
news_query_3 = "nonexistent topic news 12345"
print(f"\nNews Query: '{news_query_3}'")
news_response_3 = fetch_news(news_query_3)
print(f"News Response:\n{news_response_3}")
time.sleep(2) # Add delay to avoid rate limiting

# Test 4: Query with a slightly different phrasing
news_query_4 = "latest economic news"
print(f"\nNews Query: '{news_query_4}'")
news_response_4 = fetch_news(news_query_4)
print(f"News Response:\n{news_response_4}")

--- Testing fetch_news function ---

News Query: 'technology news'
News Response:
1. Google to invest $40 billion in new data centers in Texas, Bloomberg News reports
   Alphabet's Google plans to invest $40 billion in three new data centers in Texas, Bloomberg News reported on Friday, as the technology giant accelerates efforts to expand capacity to support its artificial intelligence ambitions.
   Read more: https://www.reuters.com/business/google-invest-40-billion-new-data-centers-texas-bloomberg-news-reports-2025-11-14/

2. Fox News AI Newsletter: Russian robot faceplants in humiliating debut
   Stay informed on the latest AI technology advancements, industry breakthroughs, and the challenges shaping our future — including highlights like Russia’s robot’s humiliating debut and more.
   Read more: https://www.foxnews.com/tech/ai-newsletter-russian-robot-faceplants-humiliating-debut

3. Top Tech News Today, November 14, 2025
   Top Tech News Stories Today — Your Quick Briefing on the

## Subtask:
Define the `handle_news_request` function, which prepares the query and passes it to `fetch_news`

In [89]:
import re

def handle_news_request(user_input):
    """
    Handles a news-specific request by extracting keywords from the user input
    and fetching relevant news articles.
    """
    # Define phrases that indicate a news request but should be removed from the query
    news_request_phrases = [
        "tell me the news about",
        "give me today's headlines about",
        "what are the latest stories on",
        "what are the latest updates on",
        "latest news on",
        "news about",
        "headlines about",
        "current events about",
        "breaking news about",
        "today's news on",
        "recent developments on",
        "what's happening with",
        "tell me the news",
        "give me today's headlines",
        "what are the latest stories",
        "what is happening in the world",
        "news",
        "headlines",
        "latest updates",
        "current events",
        "breaking news",
        "today's news",
        "recent developments",
        "what's happening"
    ]

    query = user_input.lower()

    # Remove news request phrases from the query
    for phrase in news_request_phrases:
        if phrase in query:
            query = query.replace(phrase, "").strip()

    # Further clean the query by removing punctuation (except spaces) and extra spaces
    query = re.sub(r'[^a-zA-Z0-9\s]', '', query)
    query = re.sub(r'\s+', ' ', query).strip()

    # If the query becomes empty after removing phrases and cleaning, set a default
    if not query:
        query = "top stories"

    return fetch_news(query)

print("News request handler function 'handle_news_request' defined successfully.")

News request handler function 'handle_news_request' defined successfully.
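
Removing phrases with plain `str.replace` has two side effects: a short phrase such as "news" is also removed from inside longer words, and when a phrase matches in the middle of a sentence the words around it stay in the query. For example, "Tell me the breaking news about the global economy!" loses "news about" but keeps "tell me the breaking". The sketch below shows one possible alternative, under assumptions: the phrase list is a shortened illustration, and `extract_topic` and `FILLER_WORDS` are hypothetical names that are not part of the notebook.

```python
import re

# Shortened, illustrative phrase list (the notebook's list is longer).
NEWS_PHRASES = [
    "latest news on", "breaking news about", "news about",
    "today's headlines", "what's happening in", "what's happening",
    "headlines", "news",
]
# Leading words that carry no topic information (an assumption for this sketch).
FILLER_WORDS = {"tell", "me", "the", "give", "a", "an", "please", "about", "on", "in", "of"}

def extract_topic(user_input, phrases=NEWS_PHRASES, default="top stories"):
    """Strip news-request phrases (longest first, whole words only) and leading filler."""
    query = user_input.lower()
    for phrase in sorted(phrases, key=len, reverse=True):
        query = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", query)
    query = re.sub(r"[^a-z0-9\s]", " ", query)  # drop punctuation
    words = query.split()
    while words and words[0] in FILLER_WORDS:
        words.pop(0)
    return " ".join(words) or default

print(extract_topic("Tell me the breaking news about the global economy!"))
print(extract_topic("newsletter tips"))
```

Sorting by length makes the behavior independent of the order in which phrases are listed, and the `\b` anchors leave words such as "newsletter" intact.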


### Test:
handle_news_request

In [97]:
# Test cases for handle_news_request function

print("\n--- Testing handle_news_request function ---")

# Test 1: Clear news query
user_input_hr_1 = "latest news on science"
print(f"\nInput: '{user_input_hr_1}'")
response_hr_1 = handle_news_request(user_input_hr_1)
print(f"Response:\n{response_hr_1}")
time.sleep(2) # Add delay to avoid rate limiting

# Test 2: News query with a less specific news phrase
user_input_hr_2 = "what's happening in politics?"
print(f"\nInput: '{user_input_hr_2}'")
response_hr_2 = handle_news_request(user_input_hr_2)
print(f"Response:\n{response_hr_2}")
time.sleep(2) # Add delay to avoid rate limiting

# Test 3: Query that should default to 'top stories'
user_input_hr_3 = "news"
print(f"\nInput: '{user_input_hr_3}'")
response_hr_3 = handle_news_request(user_input_hr_3)
print(f"Response:\n{response_hr_3}")
time.sleep(2) # Add delay to avoid rate limiting

# Test 4: Query with redundant news phrases and punctuation
user_input_hr_4 = "Tell me the breaking news about the global economy!"
print(f"\nInput: '{user_input_hr_4}'")
response_hr_4 = handle_news_request(user_input_hr_4)
print(f"Response:\n{response_hr_4}")
time.sleep(2) # Add delay to avoid rate limiting

# Test 5: Query with only news phrases, expecting default query
user_input_hr_5 = "today's headlines"
print(f"\nInput: '{user_input_hr_5}'")
response_hr_5 = handle_news_request(user_input_hr_5)
print(f"Response:\n{response_hr_5}")


--- Testing handle_news_request function ---

Input: 'latest news on science'
Response:
1. Dr. Oz to Newsmax: MAHA Movement 'Radically Transparent'
   On President Donald Trump's 300th day in the White House, Dr. Mehmet Oz told Newsmax the MAHA movement's commitment to radical transparency is restoring trust in science.
   Read more: https://www.newsmax.com/newsmax-tv/mehmet-oz-newsmax-maha/2025/11/14/id/1234659/

2. Museum of Science and Industry workers set strike date as contract talks continue
   The strike date comes ahead of a scheduled bargaining session on Nov. 17, the final meeting before the deadline.
   Read more: https://chicago.suntimes.com/news/2025/11/14/museum-science-industry-workers-strike-date-contract-talks-labor

3. Brick Planet Brings LEGO Wonders to DMNS
   The massive LEGO art installation by Sean Kenney will be on display at the Denver Museum of Nature & Science through May 3.
   Read more: https://www.westword.com/arts-culture/brick-planet-brings-lego-wonders

## Subtask:
Create a similar function for general web searches


In [91]:
def fetch_web_results(query):
    """
    Fetches web search results using the Tavily API.
    """
    if not TAVILY_API_KEY or TAVILY_API_KEY == "YOUR_TAVILY_API_KEY_HERE":
        return "Tavily API key is not set. Please provide a valid API key."

    try:
        response = tavily_client.search(query=query, search_depth="advanced", max_results=5)

        results = response.get('results', [])
        if not results:
            return "Sorry, I couldn't find any relevant web results for that query using Tavily."

        # Build a lead answer: prefer the response's top-level 'answer' field when it
        # is present and non-empty; otherwise fall back to the first result's content.
        synthesized_answer = response.get('answer', 'No summary available.')

        web_results_output = []
        if synthesized_answer and synthesized_answer != 'No summary available.':
            web_results_output.append(f"Here's what I found: {synthesized_answer}")
        else:
            # Fallback to the first result's content if no overall summary is provided
            first_result_content = results[0].get('content', 'No content available.')
            if first_result_content and len(first_result_content) > 100: # Take a snippet
                web_results_output.append(f"Here's a snippet from a top result: {first_result_content[:200]}...")
            elif first_result_content:
                web_results_output.append(f"Here's what I found: {first_result_content}")
            else:
                web_results_output.append("Here are some top results:")

        web_results_output.append("Top results:")

        for i, item in enumerate(results):
            title = item.get('title', 'No Title')
            url = item.get('url', '#')
            # Tavily 'content' can be long, so we take a snippet for description
            content_snippet = item.get('content', 'No description available.')
            if len(content_snippet) > 150:
                content_snippet = content_snippet[:150] + '...'

            web_results_output.append(f"{i+1}. {title}\n   {content_snippet}\n   Read more: {url}\n")

        return "\n".join(web_results_output)

    except Exception as e:
        return f"Error fetching web results from Tavily: {e}"

print("Web search function 'fetch_web_results' defined successfully.")

Web search function 'fetch_web_results' defined successfully.
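
`fetch_web_results` trims snippets with a fixed character slice, which can cut a word in half. A word-boundary alternative is available in the standard library. This is a small sketch, assuming `textwrap.shorten` is an acceptable replacement for the slice. `shorten_snippet` is a hypothetical helper name.

```python
import textwrap

def shorten_snippet(text, limit=150):
    """Collapse whitespace and truncate at a word boundary, appending '...' if cut."""
    return textwrap.shorten(text, width=limit, placeholder="...")

snippet = (
    "Ottawa is the capital city of Canada. It is located in the southern "
    "portion of the province of Ontario."
)
print(shorten_snippet(snippet, limit=40))
print(shorten_snippet("short text"))
```

Text that already fits within the limit is returned unchanged apart from whitespace normalization.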


### Test:
fetch_web_results

In [100]:
# Test cases for fetch_web_results function

print("--- Testing fetch_web_results function ---")

# Test 1: Standard factual query
query_fwr_1 = "What is the capital of Canada?"
print(f"\nQuery: '{query_fwr_1}'")
response_fwr_1 = fetch_web_results(query_fwr_1)
print(f"Response:\n{response_fwr_1}")
time.sleep(2) # Add delay to avoid rate limiting

# Test 2: Query for a specific event or topic
query_fwr_2 = "History of artificial intelligence"
print(f"\nQuery: '{query_fwr_2}'")
response_fwr_2 = fetch_web_results(query_fwr_2)
print(f"Response:\n{response_fwr_2}")
time.sleep(2) # Add delay to avoid rate limiting

# Test 3: Query with no expected results or very niche
query_fwr_3 = "Nonexistent mythical creature from obscure folklore"
print(f"\nQuery: '{query_fwr_3}'")
response_fwr_3 = fetch_web_results(query_fwr_3)
print(f"Response:\n{response_fwr_3}")
time.sleep(2) # Add delay to avoid rate limiting

# Test 4: A more complex question that requires synthesis
query_fwr_4 = "Explain quantum computing in simple terms."
print(f"\nQuery: '{query_fwr_4}'")
response_fwr_4 = fetch_web_results(query_fwr_4)
print(f"Response:\n{response_fwr_4}")
time.sleep(2) # Add delay to avoid rate limiting

# Test 5: Query that should trigger the snippet fallback if no overall answer
query_fwr_5 = "Latest discoveries about dark matter"
print(f"\nQuery: '{query_fwr_5}'")
response_fwr_5 = fetch_web_results(query_fwr_5)
print(f"Response:\n{response_fwr_5}")

--- Testing fetch_web_results function ---

Query: 'What is the capital of Canada?'
Response:
Here's a snippet from a top result: Ottawa is the capital city of Canada. It is located in the southern portion of the province of Ontario, at the confluence of the Ottawa River and the Rideau River. Ottawa borders Gatineau, Quebec, and...
Top results:
1. Ottawa - Wikipedia
   Ottawa is the capital city of Canada. It is located in the southern portion of the province of Ontario, at the confluence of the Ottawa River and the ...
   Read more: https://en.wikipedia.org/wiki/Ottawa

2. The Capital Cities - Legislative Assembly of Ontario
   Canada’s capital is also the second-largest city in Ontario with a regional population of close to 1.5 million people. Queen Victoria chose Ottawa as ...
   Read more: https://www.ola.org/en/visit-learn/teach-learn-play/about-ontario/capital-cities

3. Everything to Know about the Capital Cities of Canada
   Ottawa is Canada's capital, named from an Algonquin w

## Subtask:
Define a function that returns canned answers for known questions and, when no canned answer matches, falls back to an online search via `fetch_web_results`


In [92]:
def handle_general_query(user_input):
    """
    Processes a general query and provides a response based on a simple rule-based system,
    with a fallback to web search for unknown queries.
    """
    lower_input = user_input.lower()

    # Simple keyword-based rule system
    if "what is ai" in lower_input or "define ai" in lower_input:
        return "AI stands for Artificial Intelligence, which is the simulation of human intelligence processes by machines, especially computer systems."
    elif "capital of france" in lower_input:
        return "The capital of France is Paris."
    elif "who invented the light bulb" in lower_input:
        return "Thomas Edison is widely credited with inventing the practical incandescent light bulb."
    elif "how to tie a shoelace" in lower_input:
        return "To tie a shoelace, make two 'bunny ears' with the laces, cross them over, and then tuck one under the other before pulling tight."
    elif "your name" in lower_input or "who are you" in lower_input:
        return "I am an AI assistant designed to help with your queries."
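    # Match greetings on whole tokens; a substring test for "hi" also matches "this" or "history"
    elif any(token in ("hello", "hi") for token in preprocess_text(user_input)):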
        return "Hello! How can I assist you today?"
    elif "how are you" in lower_input:
        return "I am an AI, so I don't have feelings, but I'm ready to help you!"
    else:
        # Fallback to web search if no rule-based answer is found
        print(f"No direct answer found for '{user_input}'. Attempting web search...")
        web_search_result = fetch_web_results(user_input)
        if web_search_result:
            return web_search_result
        else:
            return "I'm not sure how to answer that general query. Could you please rephrase it or ask something else?"

print("Function 'handle_general_query' updated successfully with web search fallback.")

Function 'handle_general_query' updated successfully with web search fallback.
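
As the list of canned answers grows, the `if`/`elif` chain can be replaced by a lookup table matched on whole tokens. Token matching also avoids a pitfall of substring checks: a test such as `"hi" in text` is true for "this" or "history". The sketch below is illustrative only: `CANNED_ANSWERS` and `lookup_canned_answer` are hypothetical names, and the table holds only a few of the notebook's rules.

```python
import re

# Hypothetical rule table: each entry maps required tokens to a canned answer.
CANNED_ANSWERS = [
    ({"capital", "france"}, "The capital of France is Paris."),
    ({"hello"}, "Hello! How can I assist you today?"),
    ({"hi"}, "Hello! How can I assist you today?"),
]

def lookup_canned_answer(user_input):
    """Return the first canned answer whose required tokens all appear, else None."""
    tokens = set(re.findall(r"[a-z]+", user_input.lower()))
    for required, answer in CANNED_ANSWERS:
        if required <= tokens:  # subset test: every required token is present
            return answer
    return None

print(lookup_canned_answer("Hi there"))
print(lookup_canned_answer("What is the history of Rome?"))  # no rule matches
```

A `None` result is the signal to fall back to the web search.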


### Test:
handle_general_query

In [101]:
# Test cases for handle_general_query function

print("--- Testing handle_general_query function ---")

# Test 1: Query matching a direct rule (AI definition)
input_gq_1 = "What is AI?"
print(f"\nInput: '{input_gq_1}'")
response_gq_1 = handle_general_query(input_gq_1)
print(f"Response: {response_gq_1}")
time.sleep(2)

# Test 2: Query matching another direct rule (Capital of France)
input_gq_2 = "Capital of France"
print(f"\nInput: '{input_gq_2}'")
response_gq_2 = handle_general_query(input_gq_2)
print(f"Response: {response_gq_2}")
time.sleep(2)

# Test 3: Query matching a rule (Who invented the light bulb)
input_gq_3 = "Who invented the light bulb?"
print(f"\nInput: '{input_gq_3}'")
response_gq_3 = handle_general_query(input_gq_3)
print(f"Response: {response_gq_3}")
time.sleep(2)

# Test 4: Query triggering web search fallback (tallest mountain)
input_gq_4 = "What is the tallest mountain in the world?"
print(f"\nInput: '{input_gq_4}'")
response_gq_4 = handle_general_query(input_gq_4)
print(f"Response:\n{response_gq_4}")
time.sleep(2)

# Test 5: Query triggering web search fallback (more complex)
input_gq_5 = "Explain the theory of relativity."
print(f"\nInput: '{input_gq_5}'")
response_gq_5 = handle_general_query(input_gq_5)
print(f"Response:\n{response_gq_5}")
time.sleep(2)

# Test 6: Unknown query that might not yield good web results
input_gq_6 = "What color is happiness?"
print(f"\nInput: '{input_gq_6}'")
response_gq_6 = handle_general_query(input_gq_6)
print(f"Response:\n{response_gq_6}")
time.sleep(2)

# Test 7: Friendly greeting matching a direct rule
input_gq_7 = "Hello"
print(f"\nInput: '{input_gq_7}'")
response_gq_7 = handle_general_query(input_gq_7)
print(f"Response: {response_gq_7}")


--- Testing handle_general_query function ---

Input: 'What is AI?'
Response: AI stands for Artificial Intelligence, which is the simulation of human intelligence processes by machines, especially computer systems.

Input: 'Capital of France'
Response: The capital of France is Paris.

Input: 'Who invented the light bulb?'
Response: Thomas Edison is widely credited with inventing the practical incandescent light bulb.

Input: 'What is the tallest mountain in the world?'
No direct answer found for 'What is the tallest mountain in the world?'. Attempting web search...
Response:
Here's a snippet from a top result: Mauna Kea, a shield volcano in Hawaii, is technically the world's tallest mountain from base to peak. However, its underwater base is far below sea level, so it doesn’t reach the elevation that Mount ...
Top results:
1. What is the tallest mountain in the world? Why it's up ... - USA Today
   Mauna Kea, a shield volcano in Hawaii, is technically the world's tallest mountain from 

## Subtask:
Define the `chatbot_response` function, which routes each input to the matching handler based on its recognized intent



In [93]:
def chatbot_response(user_input):
    """
    Main function to process user input, determine intent, and provide a response.
    """
    intent = recognize_intent(user_input)

    if intent == "general_query":
        response = handle_general_query(user_input)
    elif intent == "news_request":
        response = handle_news_request(user_input)
        # Add a small delay after a news request to avoid hitting API rate limits
        time.sleep(2) # Sleep for 2 seconds
    else:
        response = "I'm sorry, I don't understand that request. Can you please rephrase it?"

    return response

print("Main chatbot response function 'chatbot_response' defined successfully.")

Main chatbot response function 'chatbot_response' defined successfully.
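
The intent-to-handler mapping in `chatbot_response` can also be written as a dictionary lookup, which keeps routing in one place when more intents are added. This is a sketch with stub handlers standing in for `handle_general_query` and `handle_news_request`. The `route` helper is a hypothetical name, and falling back to the general handler for unknown intents is an assumption of this sketch.

```python
def route(intent, user_input, handlers, default_intent="general_query"):
    """Call the handler registered for the intent, falling back to the default one."""
    handler = handlers.get(intent, handlers[default_intent])
    return handler(user_input)

# Stub handlers; the real ones call the web search and news APIs.
handlers = {
    "general_query": lambda text: f"[general] {text}",
    "news_request": lambda text: f"[news] {text}",
}

print(route("news_request", "latest updates on AI", handlers))
print(route("unknown_intent", "hello", handlers))
```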


### Test:
chatbot_response

In [102]:
# Test with a general query
user_input_general = "What is the capital of France?"
print(f"User: {user_input_general}")
response_general = chatbot_response(user_input_general)
print(f"Chatbot: {response_general}\n")

# Test with another general query (that might trigger web search fallback)
user_input_general_web = "What is the tallest mountain in Africa?"
print(f"User: {user_input_general_web}")
response_general_web = chatbot_response(user_input_general_web)
print(f"Chatbot: {response_general_web}\n")

# Test with a news request
user_input_news = "Tell me the news about technology."
print(f"User: {user_input_news}")
response_news = chatbot_response(user_input_news)
print(f"Chatbot: {response_news}\n")

# Test with another news request
user_input_news_2 = "What are the latest updates on space exploration?"
print(f"User: {user_input_news_2}")
response_news_2 = chatbot_response(user_input_news_2)
print(f"Chatbot: {response_news_2}\n")

# Add more test cases for news requests
user_input_news_3 = "news on current events"
print(f"User: {user_input_news_3}")
response_news_3 = chatbot_response(user_input_news_3)
print(f"Chatbot: {response_news_3}\n")

user_input_news_4 = "breaking news in politics"
print(f"User: {user_input_news_4}")
response_news_4 = chatbot_response(user_input_news_4)
print(f"Chatbot: {response_news_4}\n")

user_input_news_5 = "what's happening in the economy?"
print(f"User: {user_input_news_5}")
response_news_5 = chatbot_response(user_input_news_5)
print(f"Chatbot: {response_news_5}\n")

user_input_news_6 = "today's headlines"
print(f"User: {user_input_news_6}")
response_news_6 = chatbot_response(user_input_news_6)
print(f"Chatbot: {response_news_6}\n")

User: What is the capital of France?
Chatbot: The capital of France is Paris.

User: What is the tallest mountain in Africa?
No direct answer found for 'What is the tallest mountain in Africa?'. Attempting web search...
Chatbot: Here's a snippet from a top result: The highest African mountain is Kilimanjaro, which has three peaks, named Kibo, Mawenzi and Shira, of which Kibo is the tallest and Mawenzi is sufficiently prominent to be a mountain. Mount Kenya is t...
Top results:
1. List of highest major summits of Africa - Wikipedia
   The highest African mountain is Kilimanjaro, which has three peaks, named Kibo, Mawenzi and Shira, of which Kibo is the tallest and Mawenzi is suffici...
   Read more: https://en.wikipedia.org/wiki/List_of_highest_major_summits_of_Africa

2. Kilimanjaro - National Geographic Education
   Located in Tanzania, Mount Kilimanjaro is the African continent's highest peak at 5,895 meters (19,340 feet). The majestic mountain is a snow-capped v...
   Read more: htt

## Implement workflow

# Task
Refactor the existing chatbot's intent handling and response generation using `langgraph`: define a graph state; create nodes for `recognize_intent`, `handle_general_query`, and `handle_news_request`; implement conditional routing; build and compile the graph; and integrate it into the `chatbot_response` function. Preserve the existing functionality, including the `time.sleep` for news requests, then re-run the chatbot tests.

## Install LangGraph and define graph state

### Subtask:
Install the `langgraph` library and define the `GraphState` using TypedDict to manage the state passed between nodes.


**Reasoning**:
The subtask requires installing the `langgraph` library and defining a `GraphState` using `TypedDict`. This code block will perform both actions.



In [104]:
!pip install langgraph

from typing import TypedDict

class GraphState(TypedDict):
    """
    Represents the state of our graph. Used to pass information between nodes.
    """
    user_input: str
    intent: str
    response: str

print("LangGraph installed and GraphState defined successfully.")

Collecting langgraph
  Downloading langgraph-1.0.3-py3-none-any.whl.metadata (7.8 kB)
Collecting langgraph-checkpoint<4.0.0,>=2.1.0 (from langgraph)
  Downloading langgraph_checkpoint-3.0.1-py3-none-any.whl.metadata (4.7 kB)
Collecting langgraph-prebuilt<1.1.0,>=1.0.2 (from langgraph)
  Downloading langgraph_prebuilt-1.0.4-py3-none-any.whl.metadata (5.2 kB)
Collecting langgraph-sdk<0.3.0,>=0.2.2 (from langgraph)
  Downloading langgraph_sdk-0.2.9-py3-none-any.whl.metadata (1.5 kB)
Collecting ormsgpack>=1.12.0 (from langgraph-checkpoint<4.0.0,>=2.1.0->langgraph)
  Downloading ormsgpack-1.12.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.2 kB)
Collecting langchain-core>=0.1 (from langgraph)
  Downloading langchain_core-1.0.5-py3-none-any.whl.metadata (3.6 kB)
Downloading langgraph-1.0.3-py3-none-any.whl (156 kB)
Downloading langgraph_check

**Reasoning**:
The next step is to define the nodes for the LangGraph workflow, which will encapsulate the existing functions (`recognize_intent`, `handle_general_query`, and `handle_news_request`). Each node will take the `GraphState` as input and update it with the results of its operation.



In [112]:
def intent_recognition_node(state: GraphState) -> GraphState:
    """
    Node to recognize the intent of the user input.
    """
    user_input = state["user_input"]
    intent = recognize_intent(user_input)
    return {"intent": intent}

def general_query_node(state: GraphState) -> GraphState:
    """
    Node to handle general queries.
    """
    user_input = state["user_input"]
    response = handle_general_query(user_input)
    return {"response": response}

def news_request_node(state: GraphState) -> GraphState:
    """
    Node to handle news requests.
    """
    user_input = state["user_input"]
    response = handle_news_request(user_input)
    time.sleep(2) # Preserve the delay
    return {"response": response}

print("LangGraph nodes defined successfully.")

LangGraph nodes defined successfully.
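
Each node above returns only the key it changes (`intent` or `response`) rather than a full `GraphState`. The sketch below imitates, in plain Python, how such partial updates end up in the shared state. It assumes the default behaviour for state keys without a reducer, where a returned key simply overwrites the previous value; `apply_update` is a hypothetical helper written for illustration and is not part of `langgraph`.

```python
from typing import TypedDict


class GraphState(TypedDict):
    user_input: str
    intent: str
    response: str


def apply_update(state: GraphState, update: dict) -> GraphState:
    """Return a copy of `state` with the keys in `update` overwritten.

    Hypothetical stand-in for the merge step the graph runtime performs
    after each node; keys missing from `update` are left untouched.
    """
    merged = dict(state)
    merged.update(update)
    return merged  # type: ignore[return-value]


state: GraphState = {"user_input": "today's headlines", "intent": "", "response": ""}

# What intent_recognition_node would hand back for this input
state = apply_update(state, {"intent": "news_request"})

# What a handler node would hand back (placeholder text, not real API output)
state = apply_update(state, {"response": "1. Example headline"})

print(state)
```

The point of the sketch is that `user_input` survives both steps even though no node returns it, which is why the nodes can stay small.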


**Reasoning**:
Now that the nodes are defined, I need to define the conditional routing logic using `langgraph` to connect these nodes. This involves importing `StateGraph`, creating a graph, adding the nodes, setting the entry point, defining the conditional edges based on the intent, and compiling the graph.



In [106]:
from langgraph.graph import StateGraph, END

def route_intent(state: GraphState):
    """
    Conditional edge based on intent.
    """
    if state["intent"] == "general_query":
        return "general_query_handler"
    elif state["intent"] == "news_request":
        return "news_request_handler"
    else:
        return "general_query_handler" # Fallback if intent is not recognized or an unexpected value

# Build the graph
workflow = StateGraph(GraphState)

# Add nodes to the graph
workflow.add_node("intent_recognizer", intent_recognition_node)
workflow.add_node("general_query_handler", general_query_node)
workflow.add_node("news_request_handler", news_request_node)

# Set the entry point
workflow.set_entry_point("intent_recognizer")

# Add conditional edges from the intent recognizer
workflow.add_conditional_edges(
    "intent_recognizer",
    route_intent,
    {
        "general_query_handler": "general_query_handler",
        "news_request_handler": "news_request_handler",
    },
)

# Set end points for the handlers
workflow.add_edge("general_query_handler", END)
workflow.add_edge("news_request_handler", END)

# Compile the graph
app = workflow.compile()

print("LangGraph workflow compiled successfully.")

LangGraph workflow compiled successfully.
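
The routing function can be checked on its own before it is exercised through the graph. The sketch below restates `route_intent` compactly so that it runs without `langgraph` installed, then asserts the handler name chosen for each intent, including the fallback for an unexpected value. The intent string `"smalltalk"` is a made-up example of an unrecognized intent.

```python
def route_intent(state: dict) -> str:
    """Pick the next node name from the recognized intent."""
    if state["intent"] == "news_request":
        return "news_request_handler"
    # "general_query" and anything unexpected both go to the general handler
    return "general_query_handler"


# (intent, expected node name) pairs; "smalltalk" is a hypothetical unknown intent
routing_cases = [
    ("general_query", "general_query_handler"),
    ("news_request", "news_request_handler"),
    ("smalltalk", "general_query_handler"),
    ("", "general_query_handler"),
]

for intent, expected in routing_cases:
    assert route_intent({"intent": intent}) == expected

print("All routing cases passed.")
```

Every name returned here must also appear as a key in the mapping passed to `add_conditional_edges`, so a table like this doubles as a checklist for that mapping.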


**Reasoning**:
The next logical step is to integrate the newly compiled `langgraph` workflow into the `chatbot_response` function, replacing the old conditional logic with the graph execution. This will complete the refactoring of the chatbot's core logic.



In [107]:
def chatbot_response(user_input):
    """
    Main function to process user input using the LangGraph workflow.
    """
    initial_state = {"user_input": user_input, "intent": "", "response": ""}

    # Run the graph
    # The graph will execute nodes: intent_recognizer -> (general_query_handler OR news_request_handler)
    # and update the state.
    for s in app.stream(initial_state):
        # s contains the state after each step in the graph
        final_state = s

    return final_state.get("response", "I'm sorry, something went wrong and I couldn't process your request.")

print("Main chatbot response function 'chatbot_response' updated to use LangGraph successfully.")

Main chatbot response function 'chatbot_response' updated to use LangGraph successfully.


**Reasoning**:
The `chatbot_response` function has been refactored to use `langgraph`. Now, it is crucial to re-run the existing test cases for `chatbot_response` to verify that the refactoring maintained the intended functionality and that the bot responds correctly to both general queries and news requests.



In [108]:
import time

# Test with a general query
user_input_general = "What is the capital of France?"
print(f"User: {user_input_general}")
response_general = chatbot_response(user_input_general)
print(f"Chatbot: {response_general}\n")

# Test with another general query (that might trigger web search fallback)
user_input_general_web = "What is the tallest mountain in Africa?"
print(f"User: {user_input_general_web}")
response_general_web = chatbot_response(user_input_general_web)
print(f"Chatbot: {response_general_web}\n")

# Test with a news request
user_input_news = "Tell me the news about technology."
print(f"User: {user_input_news}")
response_news = chatbot_response(user_input_news)
print(f"Chatbot: {response_news}\n")

# Test with another news request
user_input_news_2 = "What are the latest updates on space exploration?"
print(f"User: {user_input_news_2}")
response_news_2 = chatbot_response(user_input_news_2)
print(f"Chatbot: {response_news_2}\n")

# Add more test cases for news requests
user_input_news_3 = "news on current events"
print(f"User: {user_input_news_3}")
response_news_3 = chatbot_response(user_input_news_3)
print(f"Chatbot: {response_news_3}\n")

user_input_news_4 = "breaking news in politics"
print(f"User: {user_input_news_4}")
response_news_4 = chatbot_response(user_input_news_4)
print(f"Chatbot: {response_news_4}\n")

user_input_news_5 = "what's happening in the economy?"
print(f"User: {user_input_news_5}")
response_news_5 = chatbot_response(user_input_news_5)
print(f"Chatbot: {response_news_5}\n")

user_input_news_6 = "today's headlines"
print(f"User: {user_input_news_6}")
response_news_6 = chatbot_response(user_input_news_6)
print(f"Chatbot: {response_news_6}\n")

User: What is the capital of France?
Chatbot: I'm sorry, something went wrong and I couldn't process your request.

User: What is the tallest mountain in Africa?
No direct answer found for 'What is the tallest mountain in Africa?'. Attempting web search...
Chatbot: I'm sorry, something went wrong and I couldn't process your request.

User: Tell me the news about technology.
Chatbot: I'm sorry, something went wrong and I couldn't process your request.

User: What are the latest updates on space exploration?
Chatbot: I'm sorry, something went wrong and I couldn't process your request.

User: news on current events
Chatbot: I'm sorry, something went wrong and I couldn't process your request.

User: breaking news in politics
Chatbot: I'm sorry, something went wrong and I couldn't process your request.

User: what's happening in the economy?
Chatbot: I'm sorry, something went wrong and I couldn't process your request.

User: today's headlines
Chatbot: I'm sorry, something went wrong and I c

**Reasoning**:
The previous test results show that `chatbot_response` never extracts the response from the LangGraph workflow: every test falls through to the default error message. The likely cause is how `app.stream` is consumed. In its default mode it yields per-node updates keyed by node name (for example `{"general_query_handler": {"response": ...}}`) rather than the full state, so `final_state.get("response")` finds no top-level `response` key. `app.invoke()` runs the graph and returns the final state directly, which should contain the generated response.



In [109]:
def chatbot_response(user_input):
    """
    Main function to process user input using the LangGraph workflow.
    """
    initial_state = {"user_input": user_input, "intent": "", "response": ""}

    # Run the graph directly to get the final state
    final_state = app.invoke(initial_state)

    return final_state.get("response", "I'm sorry, something went wrong and I couldn't process your request.")

print("Main chatbot response function 'chatbot_response' updated to use LangGraph's invoke method.")

Main chatbot response function 'chatbot_response' updated to use LangGraph's invoke method.
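
If streaming is still wanted later (for example to show progress per node), the response has to be dug out of the per-node chunks. The sketch below assumes that `app.stream` in its default mode yields one dictionary per step, keyed by node name, whose value is that node's partial update. The chunks here are hand-written to match that assumed shape and are not captured output; `response_from_updates` is a hypothetical helper.

```python
DEFAULT_RESPONSE = "I'm sorry, something went wrong and I couldn't process your request."


def response_from_updates(chunks, default=DEFAULT_RESPONSE):
    """Return the last 'response' value found in update-style stream chunks.

    Each chunk is assumed to look like {node_name: {key: value, ...}}.
    """
    response = default
    for chunk in chunks:
        for _node_name, update in chunk.items():
            if isinstance(update, dict) and "response" in update:
                response = update["response"]
    return response


# Hand-written stand-in for what the graph might yield for a general query
simulated_chunks = [
    {"intent_recognizer": {"intent": "general_query"}},
    {"general_query_handler": {"response": "The capital of France is Paris."}},
]

print(response_from_updates(simulated_chunks))
```

With the real graph, the call would be `response_from_updates(app.stream(initial_state))`, provided the assumption about the chunk shape holds for the installed version.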


**Reasoning**:
The `chatbot_response` function has been updated to use `app.invoke()`, which is expected to resolve the issue of not correctly extracting the final response from the LangGraph workflow. It is crucial to re-run the existing test cases to verify that the refactoring maintained the intended functionality and that the bot responds correctly to both general queries and news requests.



In [110]:
import time

# Test with a general query
user_input_general = "What is the capital of France?"
print(f"User: {user_input_general}")
response_general = chatbot_response(user_input_general)
print(f"Chatbot: {response_general}\n")

# Test with another general query (that might trigger web search fallback)
user_input_general_web = "What is the tallest mountain in Africa?"
print(f"User: {user_input_general_web}")
response_general_web = chatbot_response(user_input_general_web)
print(f"Chatbot: {response_general_web}\n")

# Test with a news request
user_input_news = "Tell me the news about technology."
print(f"User: {user_input_news}")
response_news = chatbot_response(user_input_news)
print(f"Chatbot: {response_news}\n")

# Test with another news request
user_input_news_2 = "What are the latest updates on space exploration?"
print(f"User: {user_input_news_2}")
response_news_2 = chatbot_response(user_input_news_2)
print(f"Chatbot: {response_news_2}\n")

# Add more test cases for news requests
user_input_news_3 = "news on current events"
print(f"User: {user_input_news_3}")
response_news_3 = chatbot_response(user_input_news_3)
print(f"Chatbot: {response_news_3}\n")

user_input_news_4 = "breaking news in politics"
print(f"User: {user_input_news_4}")
response_news_4 = chatbot_response(user_input_news_4)
print(f"Chatbot: {response_news_4}\n")

user_input_news_5 = "what's happening in the economy?"
print(f"User: {user_input_news_5}")
response_news_5 = chatbot_response(user_input_news_5)
print(f"Chatbot: {response_news_5}\n")

user_input_news_6 = "today's headlines"
print(f"User: {user_input_news_6}")
response_news_6 = chatbot_response(user_input_news_6)
print(f"Chatbot: {response_news_6}\n")

User: What is the capital of France?
Chatbot: The capital of France is Paris.

User: What is the tallest mountain in Africa?
No direct answer found for 'What is the tallest mountain in Africa?'. Attempting web search...
Chatbot: Here's a snippet from a top result: The highest African mountain is Kilimanjaro, which has three peaks, named Kibo, Mawenzi and Shira, of which Kibo is the tallest and Mawenzi is sufficiently prominent to be a mountain. Mount Kenya is t...
Top results:
1. List of highest major summits of Africa - Wikipedia
   The highest African mountain is Kilimanjaro, which has three peaks, named Kibo, Mawenzi and Shira, of which Kibo is the tallest and Mawenzi is suffici...
   Read more: https://en.wikipedia.org/wiki/List_of_highest_major_summits_of_Africa

2. Kilimanjaro - National Geographic Education
   Located in Tanzania, Mount Kilimanjaro is the African continent's highest peak at 5,895 meters (19,340 feet). The majestic mountain is a snow-capped v...
   Read more: htt

## Summary:

### Data Analysis Key Findings
*   The `langgraph` library was successfully installed and the `GraphState` TypedDict was defined with `user_input`, `intent`, and `response` keys for managing workflow state.
*   Functional nodes for intent recognition, general query handling, and news request processing were successfully created and integrated into the `langgraph` workflow.
*   A `StateGraph` was constructed with conditional routing based on recognized intent, directing flow to either a `general_query_handler` or `news_request_handler` node, and the graph was successfully compiled.
*   An initial attempt to integrate `langgraph` into the `chatbot_response` function using `app.stream()` was unsuccessful, failing to extract the final response and yielding generic error messages across all test cases.
*   Refactoring the `chatbot_response` to use `app.invoke()` resolved the issue, enabling correct extraction of the final response from the `langgraph` workflow.
*   Post-correction testing confirmed that general queries received appropriate answers and news requests successfully triggered the dedicated news handler, preserving the original `time.sleep` delay, thus validating the refactored chatbot's functionality.

### Insights or Next Steps
*   When integrating `langgraph` or similar state-machine libraries, carefully select the appropriate execution method (`stream()` vs. `invoke()`) based on whether you need intermediate state updates or just the final result.
*   Thorough end-to-end testing is crucial after significant refactoring, especially when changing the underlying architecture, to ensure all original functionalities are maintained and new implementations work as expected.
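
One way to make the re-run after a refactor less manual is to replace the repeated `print` blocks with a table of inputs and expected results that is checked automatically. The sketch below does this for intent classification only. `stub_classifier` is a deliberately simplified, hypothetical stand-in so the block runs on its own; in the notebook the real `recognize_intent` would be passed instead. The expected labels are my reading of the test comments above, not recorded results.

```python
def run_cases(classify, cases):
    """Return (text, expected, actual) for every case that does not match."""
    failures = []
    for text, expected in cases:
        actual = classify(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures


def stub_classifier(text):
    """Hypothetical stand-in for recognize_intent (keyword match only)."""
    lowered = text.lower()
    keywords = ("news", "headlines", "latest updates", "what's happening")
    return "news_request" if any(k in lowered for k in keywords) else "general_query"


CASES = [
    ("What is the capital of France?", "general_query"),
    ("Tell me the news about technology.", "news_request"),
    ("What are the latest updates on space exploration?", "news_request"),
    ("what's happening in the economy?", "news_request"),
    ("today's headlines", "news_request"),
]

failures = run_cases(stub_classifier, CASES)
print("failures:", failures)
```

An empty `failures` list means every case matched; a regression such as the `app.stream` issue would then show up as a non-empty list instead of having to be spotted in printed output.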


# Task
Create a Streamlit UI for the chatbot by first creating a new Python file `app.py`. Inside `app.py`, consolidate all existing chatbot logic, including:
- imports (`nltk`, `re`, `requests`, `os`, `time`, `TavilyClient`, `userdata`, `TypedDict`, `StateGraph`, `END`)
- NLTK downloads
- API key initialization (for GNEWS_API_KEY and TAVILY_API_KEY)
- `preprocess_text` function
- `recognize_intent` function
- `fetch_news` function
- `handle_news_request` function
- `fetch_web_results` function
- `handle_general_query` function
- `GraphState` definition
- LangGraph node functions (`intent_recognition_node`, `general_query_node`, `news_request_node`)
- LangGraph workflow setup and compilation (`workflow`, `route_intent`, `app`)
- The `chatbot_response` function (using `app.invoke`)

Also, include necessary `streamlit` imports (`import streamlit as st`).

## Create Streamlit UI File

### Subtask:
Create a new Python file named `app.py`. This will serve as the container for all chatbot logic and the Streamlit interface.


**Reasoning**:
The subtask requires creating a new Python file named `app.py` and consolidating all the chatbot logic into it. I will use the `%%writefile` magic command within a code block to create this file and populate it with all the specified imports, NLTK downloads, API key setups, and function definitions, ensuring to add the `streamlit as st` import.



In [113]:
%%writefile app.py
import nltk
from nltk.tokenize import word_tokenize
import time
import requests
import os
from google.colab import userdata # Only available inside Colab; replace with environment variables or Streamlit secrets for deployment
from tavily import TavilyClient
import re
from typing import TypedDict
from langgraph.graph import StateGraph, END
import streamlit as st

# NLTK Downloads
# These usually run once, but for a script, they might need a check or be pre-downloaded
try:
    nltk.data.find('tokenizers/punkt')
except LookupError:  # nltk.data.find raises LookupError when a resource is missing
    nltk.download('punkt')
try:
    nltk.data.find('tokenizers/punkt_tab')
except LookupError:
    nltk.download('punkt_tab')
# print("'punkt' tokenizer data downloaded successfully.") # Removed print for clean app.py

# API Key Initialization
# In a deployed Streamlit app, these would typically be loaded from environment variables
# or Streamlit secrets, not userdata.get(). For this step, I'll keep userdata.get()
# as it reflects the notebook state, but will add comments for deployment.
NEWS_API_KEY = userdata.get('GNEWS_API_KEY')
TAVILY_API_KEY = userdata.get('TAVILY_API_KEY')

# For NewsAPI: NEWS_API_BASE_URL = "https://newsapi.org/v2/everything"
# For GNews API: NEWS_API_BASE_URL = "https://gnews.io/api/v4/search"
NEWS_API_BASE_URL = "https://gnews.io/api/v4/search"

# Initialize the Tavily client with the API key
tavily_client = TavilyClient(api_key=TAVILY_API_KEY)

def preprocess_text(text):
    """
    Tokenizes and normalizes the input text.
    Converts to lowercase and removes punctuation (for simplicity).
    """
    tokens = word_tokenize(text.lower())
    # Remove non-alphabetic tokens
    normalized_tokens = [word for word in tokens if word.isalpha()]
    return normalized_tokens

def recognize_intent(user_input):
    """
    Analyzes user input to classify the intent as either 'general_query' or 'news_request'.
    """
    processed_input = preprocess_text(user_input)
    lower_user_input = user_input.lower()

    # Define single-word news keywords
    news_single_keywords = ["news", "headlines"]

    # Define multi-word news phrases based on the Chatbot Scope and Requirements Definition
    # and observations from previous testing
    news_phrases_to_match = [
        "latest updates", "current events", "breaking news", "today's news",
        "recent developments", "what's happening", "tell me the news",
        "give me today's headlines", "what are the latest stories",
        "what is happening in the world"
    ]

    # Check for single-word keyword matches in processed tokens
    for token in processed_input:
        if token in news_single_keywords:
            return "news_request"

    # Check for multi-word phrase matches in the original lowercased input
    for phrase in news_phrases_to_match:
        if phrase in lower_user_input:
            return "news_request"

    return "general_query"

def fetch_news(query):
    """
    Fetches news articles from a news API based on the query.
    """
    if not NEWS_API_KEY:
        return "News API key is not set. Please provide a valid API key."

    if not NEWS_API_BASE_URL:
        return "News API base URL is not set. Please provide a valid URL."

    try:
        params = {
            'q': query,
            'lang': 'en',
            'country': 'us', # Or other relevant country codes
            'max': 5, # Number of articles to fetch
            'apikey': NEWS_API_KEY
        }
        response = requests.get(NEWS_API_BASE_URL, params=params, timeout=10)  # Timeout avoids hanging on a stalled connection
        response.raise_for_status() # Raise an exception for HTTP errors
        data = response.json()

        articles = data.get('articles', [])
        if not articles:
            return "Sorry, I couldn't find any news for that query."

        news_output = []
        for i, article in enumerate(articles):
            title = article.get('title', 'No Title')
            description = article.get('description', 'No Description')
            url = article.get('url', '#')
            news_output.append(f"{i+1}. {title}\n   {description}\n   Read more: {url}\n")

        return "\n".join(news_output)

    except requests.exceptions.RequestException as e:
        return f"Error fetching news: {e}"
    except ValueError as e:
        return f"Error parsing API response: {e}"

def handle_news_request(user_input):
    """
    Handles a news-specific request by extracting keywords from the user input
    and fetching relevant news articles.
    """
    # Define phrases that indicate a news request but should be removed from the query
    news_request_phrases = [
        "tell me the news about",
        "give me today's headlines about",
        "what are the latest stories on",
        "what are the latest updates on",
        "latest news on",
        "news about",
        "headlines about",
        "current events about",
        "breaking news about",
        "today's news on",
        "recent developments on",
        "what's happening with",
        "tell me the news",
        "give me today's headlines",
        "what are the latest stories",
        "what is happening in the world",
        "news",
        "headlines",
        "latest updates",
        "current events",
        "breaking news",
        "today's news",
        "recent developments",
        "what's happening"
    ]

    query = user_input.lower()

    # Remove news request phrases from the query
    for phrase in news_request_phrases:
        if phrase in query:
            query = query.replace(phrase, "").strip()

    # Further clean the query by removing punctuation (except spaces) and extra spaces
    query = re.sub(r'[^a-zA-Z0-9\s]', '', query)
    query = re.sub(r'\s+', ' ', query).strip()

    # If the query becomes empty after removing phrases and cleaning, set a default
    if not query:
        query = "top stories"

    return fetch_news(query)

def fetch_web_results(query):
    """
    Fetches web search results using the Tavily API.
    """
    if not TAVILY_API_KEY:
        return "Tavily API key is not set. Please provide a valid API key."

    try:
        response = tavily_client.search(query=query, search_depth="advanced", max_results=5)

        results = response.get('results', [])
        if not results:
            return "Sorry, I couldn't find any relevant web results for that query using Tavily."

        # Synthesize a main answer from the first result's content or a concise summary
        synthesized_answer = response.get('answer', 'No summary available.')

        web_results_output = []
        if synthesized_answer and synthesized_answer != 'No summary available.':
            web_results_output.append(f"Here's what I found: {synthesized_answer}")
        else:
            # Fallback to the first result's content if no overall summary is provided
            first_result_content = results[0].get('content', 'No content available.')
            if first_result_content and len(first_result_content) > 100: # Take a snippet
                web_results_output.append(f"Here's a snippet from a top result: {first_result_content[:200]}...")
            elif first_result_content:
                web_results_output.append(f"Here's what I found: {first_result_content}")
            else:
                web_results_output.append("Here are some top results:")

        web_results_output.append("Top results:")

        for i, item in enumerate(results):
            title = item.get('title', 'No Title')
            url = item.get('url', '#')
            # Tavily 'content' can be long, so we take a snippet for description
            content_snippet = item.get('content', 'No description available.')
            if len(content_snippet) > 150:
                content_snippet = content_snippet[:150] + '...'

            web_results_output.append(f"{i+1}. {title}\n   {content_snippet}\n   Read more: {url}\n")

        return "\n".join(web_results_output)

    except Exception as e:
        return f"Error fetching web results from Tavily: {e}"

def handle_general_query(user_input):
    """
    Processes a general query and provides a response based on a simple rule-based system,
    with a fallback to web search for unknown queries.
    """
    lower_input = user_input.lower()

    # Simple keyword-based rule system
    if "what is ai" in lower_input or "define ai" in lower_input:
        return "AI stands for Artificial Intelligence, which is the simulation of human intelligence processes by machines, especially computer systems."
    elif "capital of france" in lower_input:
        return "The capital of France is Paris."
    elif "who invented the light bulb" in lower_input:
        return "Thomas Edison is widely credited with inventing the practical incandescent light bulb."
    elif "how to tie a shoelace" in lower_input:
        return "To tie a shoelace, make two 'bunny ears' with the laces, cross them over, and then tuck one under the other before pulling tight."
    elif "your name" in lower_input or "who are you" in lower_input:
        return "I am an AI assistant designed to help with your queries."
    elif re.search(r"\b(hello|hi)\b", lower_input):  # Whole words only, so "this" or "machine" do not match "hi"
        return "Hello! How can I assist you today?"
    elif "how are you" in lower_input:
        return "I am an AI, so I don't have feelings, but I'm ready to help you!"
    else:
        # Fallback to web search if no rule-based answer is found
        # print(f"No direct answer found for '{user_input}'. Attempting web search...") # Removed print for clean app.py
        web_search_result = fetch_web_results(user_input)
        if web_search_result:
            return web_search_result
        else:
            return "I'm not sure how to answer that general query. Could you please rephrase it or ask something else?"

# Graph State Definition
class GraphState(TypedDict):
    """
    Represents the state of our graph. Used to pass information between nodes.
    """
    user_input: str
    intent: str
    response: str

# LangGraph Node Functions
def intent_recognition_node(state: GraphState) -> GraphState:
    """
    Node to recognize the intent of the user input.
    """
    user_input = state["user_input"]
    intent = recognize_intent(user_input)
    return {"intent": intent}

def general_query_node(state: GraphState) -> GraphState:
    """
    Node to handle general queries.
    """
    user_input = state["user_input"]
    response = handle_general_query(user_input)
    return {"response": response}

def news_request_node(state: GraphState) -> GraphState:
    """
    Node to handle news requests.
    """
    user_input = state["user_input"]
    response = handle_news_request(user_input)
    time.sleep(2) # Preserve the delay
    return {"response": response}

# LangGraph Workflow Setup and Compilation
def route_intent(state: GraphState):
    """
    Conditional edge based on intent.
    """
    if state["intent"] == "general_query":
        return "general_query_handler"
    elif state["intent"] == "news_request":
        return "news_request_handler"
    else:
        return "general_query_handler" # Fallback if intent is not recognized or an unexpected value

# Build the graph
workflow = StateGraph(GraphState)

# Add nodes to the graph
workflow.add_node("intent_recognizer", intent_recognition_node)
workflow.add_node("general_query_handler", general_query_node)
workflow.add_node("news_request_handler", news_request_node)

# Set the entry point
workflow.set_entry_point("intent_recognizer")

# Add conditional edges from the intent recognizer
workflow.add_conditional_edges(
    "intent_recognizer",
    route_intent,
    {
        "general_query_handler": "general_query_handler",
        "news_request_handler": "news_request_handler",
    },
)

# Set end points for the handlers
workflow.add_edge("general_query_handler", END)
workflow.add_edge("news_request_handler", END)

# Compile the graph
app = workflow.compile()

# Chatbot Response Function
def chatbot_response(user_input):
    """
    Main function to process user input using the LangGraph workflow.
    """
    initial_state = {"user_input": user_input, "intent": "", "response": ""}

    # Run the graph directly to get the final state
    final_state = app.invoke(initial_state)

    return final_state.get("response", "I'm sorry, something went wrong and I couldn't process your request.")

# Streamlit UI (Placeholder for the next steps)
# st.title("AI Chatbot")
# user_query = st.text_input("Ask me anything:")

# if user_query:
#     st.write(f"You: {user_query}")
#     response = chatbot_response(user_query)
#     st.write(f"Chatbot: {response}")

Writing app.py
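
The comments in `app.py` note that a deployed app would read its keys from environment variables or Streamlit secrets rather than `userdata.get()`. A minimal sketch of the environment-variable route is shown below, using only the standard library. `load_api_key` is a hypothetical helper; the variable names `GNEWS_API_KEY` and `TAVILY_API_KEY` are taken from the notebook, and the keys are assumed to be exported in the shell before the app starts.

```python
import os


def load_api_key(name, env=None):
    """Return the key stored under `name`, or None if it is missing or blank.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to test without touching the real environment.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    return value or None


NEWS_API_KEY = load_api_key("GNEWS_API_KEY")
TAVILY_API_KEY = load_api_key("TAVILY_API_KEY")

# The existing `if not NEWS_API_KEY:` guards in fetch_news keep working,
# because a missing key comes back as None.
print("News key set:", NEWS_API_KEY is not None)
print("Tavily key set:", TAVILY_API_KEY is not None)
```

Returning `None` for a blank value keeps the behaviour of the existing "API key is not set" messages, so the rest of the file would not need to change under this approach.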


**Reasoning**:
Now that the `app.py` file contains all the backend logic, the next step is to integrate the Streamlit UI components into this file to create an interactive chatbot interface. This involves adding the `st.title`, `st.text_input`, and `st.write` functions to handle user interaction and display responses.



In [114]:
%%writefile app.py
import nltk
from nltk.tokenize import word_tokenize
import time
import requests
import os
from google.colab import userdata # Only available inside Colab; replace with environment variables or Streamlit secrets for deployment
from tavily import TavilyClient
import re
from typing import TypedDict
from langgraph.graph import StateGraph, END
import streamlit as st

# NLTK Downloads
# These usually run once, but for a script, they might need a check or be pre-downloaded
try:
    nltk.data.find('tokenizers/punkt')
except LookupError:  # nltk.data.find raises LookupError when a resource is missing
    nltk.download('punkt')
try:
    nltk.data.find('tokenizers/punkt_tab')
except LookupError:
    nltk.download('punkt_tab')
# print("'punkt' tokenizer data downloaded successfully.") # Removed print for clean app.py

# API Key Initialization
# In a deployed Streamlit app, these would typically be loaded from environment variables
# or Streamlit secrets, not userdata.get(). For this step, I'll keep userdata.get()
# as it reflects the notebook state, but will add comments for deployment.
NEWS_API_KEY = userdata.get('GNEWS_API_KEY')
TAVILY_API_KEY = userdata.get('TAVILY_API_KEY')

# For NewsAPI: NEWS_API_BASE_URL = "https://newsapi.org/v2/everything"
# For GNews API: NEWS_API_BASE_URL = "https://gnews.io/api/v4/search"
NEWS_API_BASE_URL = "https://gnews.io/api/v4/search"

# Initialize the Tavily client with the API key
tavily_client = TavilyClient(api_key=TAVILY_API_KEY)

def preprocess_text(text):
    """
    Tokenizes and normalizes the input text.
    Converts to lowercase and removes punctuation (for simplicity).
    """
    tokens = word_tokenize(text.lower())
    # Remove non-alphabetic tokens
    normalized_tokens = [word for word in tokens if word.isalpha()]
    return normalized_tokens

def recognize_intent(user_input):
    """
    Analyzes user input to classify the intent as either 'general_query' or 'news_request'.
    """
    processed_input = preprocess_text(user_input)
    lower_user_input = user_input.lower()

    # Define single-word news keywords
    news_single_keywords = ["news", "headlines"]

    # Define multi-word news phrases based on the Chatbot Scope and Requirements Definition
    # and observations from previous testing
    news_phrases_to_match = [
        "latest updates", "current events", "breaking news", "today's news",
        "recent developments", "what's happening", "tell me the news",
        "give me today's headlines", "what are the latest stories",
        "what is happening in the world"
    ]

    # Check for single-word keyword matches in processed tokens
    for token in processed_input:
        if token in news_single_keywords:
            return "news_request"

    # Check for multi-word phrase matches in the original lowercased input
    for phrase in news_phrases_to_match:
        if phrase in lower_user_input:
            return "news_request"

    return "general_query"

def fetch_news(query):
    """
    Fetches news articles from a news API based on the query.
    """
    if not NEWS_API_KEY:
        return "News API key is not set. Please provide a valid API key."

    if not NEWS_API_BASE_URL:
        return "News API base URL is not set. Please provide a valid URL."

    try:
        params = {
            'q': query,
            'lang': 'en',
            'country': 'us', # Or other relevant country codes
            'max': 5, # Number of articles to fetch
            'apikey': NEWS_API_KEY
        }
        # A timeout keeps the app from hanging if the news API does not respond
        response = requests.get(NEWS_API_BASE_URL, params=params, timeout=10)
        response.raise_for_status() # Raise an exception for HTTP errors
        data = response.json()

        articles = data.get('articles', [])
        if not articles:
            return "Sorry, I couldn't find any news for that query."

        news_output = []
        for i, article in enumerate(articles):
            title = article.get('title', 'No Title')
            description = article.get('description', 'No Description')
            url = article.get('url', '#')
            news_output.append(f"{i+1}. {title}\n   {description}\n   Read more: {url}\n")

        return "\n".join(news_output)

    except requests.exceptions.RequestException as e:
        return f"Error fetching news: {e}"
    except ValueError as e:
        return f"Error parsing API response: {e}"

def handle_news_request(user_input):
    """
    Handles a news-specific request by extracting keywords from the user input
    and fetching relevant news articles.
    """
    # Define phrases that indicate a news request but should be removed from the query
    news_request_phrases = [
        "tell me the news about",
        "give me today's headlines about",
        "what are the latest stories on",
        "what are the latest updates on",
        "latest news on",
        "news about",
        "headlines about",
        "current events about",
        "breaking news about",
        "today's news on",
        "recent developments on",
        "what's happening with",
        "tell me the news",
        "give me today's headlines",
        "what are the latest stories",
        "what is happening in the world",
        "news",
        "headlines",
        "latest updates",
        "current events",
        "breaking news",
        "today's news",
        "recent developments",
        "what's happening"
    ]

    query = user_input.lower()

    # Remove news request phrases from the query
    for phrase in news_request_phrases:
        if phrase in query:
            query = query.replace(phrase, "").strip()

    # Further clean the query by removing punctuation (except spaces) and extra spaces
    query = re.sub(r'[^a-zA-Z0-9\s]', '', query)
    query = re.sub(r'\s+', ' ', query).strip()

    # If the query becomes empty after removing phrases and cleaning, set a default
    if not query:
        query = "top stories"

    return fetch_news(query)

def fetch_web_results(query):
    """
    Fetches web search results using the Tavily API.
    """
    if not TAVILY_API_KEY:
        return "Tavily API key is not set. Please provide a valid API key."

    try:
        response = tavily_client.search(query=query, search_depth="advanced", max_results=5)

        results = response.get('results', [])
        if not results:
            return "Sorry, I couldn't find any relevant web results for that query using Tavily."

        # Use the 'answer' field from the Tavily response when present; otherwise fall back to the first result below
        synthesized_answer = response.get('answer', 'No summary available.')

        web_results_output = []
        if synthesized_answer and synthesized_answer != 'No summary available.':
            web_results_output.append(f"Here's what I found: {synthesized_answer}")
        else:
            # Fallback to the first result's content if no overall summary is provided
            first_result_content = results[0].get('content', 'No content available.')
            if first_result_content and len(first_result_content) > 100: # Take a snippet
                web_results_output.append(f"Here's a snippet from a top result: {first_result_content[:200]}...")
            elif first_result_content:
                web_results_output.append(f"Here's what I found: {first_result_content}")
            else:
                web_results_output.append("Here are some top results:")

        web_results_output.append("Top results:")

        for i, item in enumerate(results):
            title = item.get('title', 'No Title')
            url = item.get('url', '#')
            # Tavily 'content' can be long, so we take a snippet for description
            content_snippet = item.get('content', 'No description available.')
            if len(content_snippet) > 150:
                content_snippet = content_snippet[:150] + '...'

            web_results_output.append(f"{i+1}. {title}\n   {content_snippet}\n   Read more: {url}\n")

        return "\n".join(web_results_output)

    except Exception as e:
        return f"Error fetching web results from Tavily: {e}"

def handle_general_query(user_input):
    """
    Processes a general query and provides a response based on a simple rule-based system,
    with a fallback to web search for unknown queries.
    """
    lower_input = user_input.lower()

    # Simple keyword-based rule system
    if "what is ai" in lower_input or "define ai" in lower_input:
        return "AI stands for Artificial Intelligence, which is the simulation of human intelligence processes by machines, especially computer systems."
    elif "capital of france" in lower_input:
        return "The capital of France is Paris."
    elif "who invented the light bulb" in lower_input:
        return "Thomas Edison is widely credited with inventing the practical incandescent light bulb."
    elif "how to tie a shoelace" in lower_input:
        return "To tie a shoelace, make two 'bunny ears' with the laces, cross them over, and then tuck one under the other before pulling tight."
    elif "your name" in lower_input or "who are you" in lower_input:
        return "I am an AI assistant designed to help with your queries."
    elif re.search(r"\b(hello|hi)\b", lower_input):  # Whole-word match, so words like "this" do not trigger the greeting
        return "Hello! How can I assist you today?"
    elif "how are you" in lower_input:
        return "I am an AI, so I don't have feelings, but I'm ready to help you!"
    else:
        # Fallback to web search if no rule-based answer is found
        # print(f"No direct answer found for '{user_input}'. Attempting web search...") # Removed print for clean app.py
        web_search_result = fetch_web_results(user_input)
        if web_search_result:
            return web_search_result
        else:
            return "I'm not sure how to answer that general query. Could you please rephrase it or ask something else?"

# Graph State Definition
class GraphState(TypedDict):
    """
    Represents the state of our graph. Used to pass information between nodes.
    """
    user_input: str
    intent: str
    response: str

# LangGraph Node Functions
def intent_recognition_node(state: GraphState) -> GraphState:
    """
    Node to recognize the intent of the user input.
    """
    user_input = state["user_input"]
    intent = recognize_intent(user_input)
    return {"intent": intent}

def general_query_node(state: GraphState) -> GraphState:
    """
    Node to handle general queries.
    """
    user_input = state["user_input"]
    response = handle_general_query(user_input)
    return {"response": response}

def news_request_node(state: GraphState) -> GraphState:
    """
    Node to handle news requests.
    """
    user_input = state["user_input"]
    response = handle_news_request(user_input)
    time.sleep(2) # Preserve the delay
    return {"response": response}

# LangGraph Workflow Setup and Compilation
def route_intent(state: GraphState):
    """
    Conditional edge based on intent.
    """
    if state["intent"] == "general_query":
        return "general_query_handler"
    elif state["intent"] == "news_request":
        return "news_request_handler"
    else:
        return "general_query_handler" # Fallback if intent is not recognized or an unexpected value

# Build the graph
workflow = StateGraph(GraphState)

# Add nodes to the graph
workflow.add_node("intent_recognizer", intent_recognition_node)
workflow.add_node("general_query_handler", general_query_node)
workflow.add_node("news_request_handler", news_request_node)

# Set the entry point
workflow.set_entry_point("intent_recognizer")

# Add conditional edges from the intent recognizer
workflow.add_conditional_edges(
    "intent_recognizer",
    route_intent,
    {
        "general_query_handler": "general_query_handler",
        "news_request_handler": "news_request_handler",
    },
)

# Set end points for the handlers
workflow.add_edge("general_query_handler", END)
workflow.add_edge("news_request_handler", END)

# Compile the graph
app = workflow.compile()

# Chatbot Response Function
def chatbot_response(user_input):
    """
    Main function to process user input using the LangGraph workflow.
    """
    initial_state = {"user_input": user_input, "intent": "", "response": ""}

    # Run the graph directly to get the final state
    final_state = app.invoke(initial_state)

    return final_state.get("response", "I'm sorry, something went wrong and I couldn't process your request.")

# Streamlit UI
st.title("AI Chatbot")

# Initialize chat history
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat messages from history on app rerun
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Accept user input
if user_query := st.chat_input("Ask me anything:"):
    # Add user message to chat history
    st.session_state.messages.append({"role": "user", "content": user_query})
    # Display user message in chat message container
    with st.chat_message("user"):
        st.markdown(user_query)

    # Generate and display assistant response
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            response = chatbot_response(user_query)
            st.markdown(response)
    # Add assistant response to chat history
    st.session_state.messages.append({"role": "assistant", "content": response})


Overwriting app.py


## Implement News Category Selection

### Subtask:
Add a user interface element for selecting news categories and manage the selected category in Streamlit's session state.


**Reasoning**:
To implement the news category selection, I will first modify the `app.py` file to include a Streamlit selectbox for categories and manage the selected category in `st.session_state`. Then, I will update the `handle_news_request` function to incorporate this selected category into the news query.
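
The category handling described above can be sketched in isolation as a small pure function. This is a minimal illustration only: `build_news_query` is a hypothetical helper name, and in `app.py` the equivalent logic sits inline in `handle_news_request`.

```python
def build_news_query(cleaned_query: str, category: str = "General") -> str:
    """Combine a cleaned user query with the selected category (hypothetical helper)."""
    query = cleaned_query.strip()
    category = category.strip()
    # 'General' (or a blank selection) means no category filter is applied
    if category and category != "General":
        query = f"{query} {category}" if query else category
    # Fall back to a default search term when nothing is left
    return query or "top stories"


print(build_news_query("climate", "Science"))
print(build_news_query("", "Sports"))
print(build_news_query("", "General"))
```

Keeping this step free of network calls makes it easy to check the query string before it is passed to `fetch_news`.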



In [116]:
%%writefile app.py
import nltk
from nltk.tokenize import word_tokenize
import time
import requests
import os
from google.colab import userdata # Colab-only; replace with environment variables or Streamlit secrets for deployment
from tavily import TavilyClient
import re
from typing import TypedDict
from langgraph.graph import StateGraph, END
import streamlit as st

# NLTK Downloads
# These usually run once, but for a script, they might need a check or be pre-downloaded
# nltk.data.find raises LookupError when a resource is missing
try:
    nltk.data.find('tokenizers/punkt')
except LookupError:
    nltk.download('punkt')
try:
    nltk.data.find('tokenizers/punkt_tab')
except LookupError:
    nltk.download('punkt_tab')
# print("'punkt' tokenizer data downloaded successfully.") # Removed print for clean app.py

# API Key Initialization
# In a deployed Streamlit app, these would typically be loaded from environment variables
# or Streamlit secrets, not userdata.get(). For this step, I'll keep userdata.get()
# as it reflects the notebook state, but will add comments for deployment.
NEWS_API_KEY = userdata.get('GNEWS_API_KEY')
TAVILY_API_KEY = userdata.get('TAVILY_API_KEY')

# For NewsAPI: NEWS_API_BASE_URL = "https://newsapi.org/v2/everything"
# For GNews API: NEWS_API_BASE_URL = "https://gnews.io/api/v4/search"
NEWS_API_BASE_URL = "https://gnews.io/api/v4/search"

# Initialize the Tavily client with the API key
tavily_client = TavilyClient(api_key=TAVILY_API_KEY)

def preprocess_text(text):
    """
    Tokenizes and normalizes the input text.
    Converts to lowercase and removes punctuation (for simplicity).
    """
    tokens = word_tokenize(text.lower())
    # Remove non-alphabetic tokens
    normalized_tokens = [word for word in tokens if word.isalpha()]
    return normalized_tokens

def recognize_intent(user_input):
    """
    Analyzes user input to classify the intent as either 'general_query' or 'news_request'.
    """
    processed_input = preprocess_text(user_input)
    lower_user_input = user_input.lower()

    # Define single-word news keywords
    news_single_keywords = ["news", "headlines"]

    # Define multi-word news phrases based on the Chatbot Scope and Requirements Definition
    # and observations from previous testing
    news_phrases_to_match = [
        "latest updates", "current events", "breaking news", "today's news",
        "recent developments", "what's happening", "tell me the news",
        "give me today's headlines", "what are the latest stories",
        "what is happening in the world"
    ]

    # Check for single-word keyword matches in processed tokens
    for token in processed_input:
        if token in news_single_keywords:
            return "news_request"

    # Check for multi-word phrase matches in the original lowercased input
    for phrase in news_phrases_to_match:
        if phrase in lower_user_input:
            return "news_request"

    return "general_query"

def fetch_news(query):
    """
    Fetches news articles from a news API based on the query.
    """
    if not NEWS_API_KEY:
        return "News API key is not set. Please provide a valid API key."

    if not NEWS_API_BASE_URL:
        return "News API base URL is not set. Please provide a valid URL."

    try:
        params = {
            'q': query,
            'lang': 'en',
            'country': 'us', # Or other relevant country codes
            'max': 5, # Number of articles to fetch
            'apikey': NEWS_API_KEY
        }
        # A timeout keeps the app from hanging if the news API does not respond
        response = requests.get(NEWS_API_BASE_URL, params=params, timeout=10)
        response.raise_for_status() # Raise an exception for HTTP errors
        data = response.json()

        articles = data.get('articles', [])
        if not articles:
            return "Sorry, I couldn't find any news for that query."

        news_output = []
        for i, article in enumerate(articles):
            title = article.get('title', 'No Title')
            description = article.get('description', 'No Description')
            url = article.get('url', '#')
            news_output.append(f"{i+1}. {title}\n   {description}\n   Read more: {url}\n")

        return "\n".join(news_output)

    except requests.exceptions.RequestException as e:
        return f"Error fetching news: {e}"
    except ValueError as e:
        return f"Error parsing API response: {e}"

def handle_news_request(user_input, category="General"):
    """
    Handles a news-specific request by extracting keywords from the user input
    and fetching relevant news articles, optionally filtered by category.
    """
    # Define phrases that indicate a news request but should be removed from the query
    news_request_phrases = [
        "tell me the news about",
        "give me today's headlines about",
        "what are the latest stories on",
        "what are the latest updates on",
        "latest news on",
        "news about",
        "headlines about",
        "current events about",
        "breaking news about",
        "today's news on",
        "recent developments on",
        "what's happening with",
        "tell me the news",
        "give me today's headlines",
        "what are the latest stories",
        "what is happening in the world",
        "news",
        "headlines",
        "latest updates",
        "current events",
        "breaking news",
        "today's news",
        "recent developments",
        "what's happening"
    ]

    query = user_input.lower()

    # Remove news request phrases from the query
    for phrase in news_request_phrases:
        if phrase in query:
            query = query.replace(phrase, "").strip()

    # Further clean the query by removing punctuation (except spaces) and extra spaces
    query = re.sub(r'[^a-zA-Z0-9\s]', '', query)
    query = re.sub(r'\s+', ' ', query).strip()

    # Add the category unless it is 'General' or blank; use it on its own when the query is empty
    if category != "General" and category.strip() != "":
        if query:
            query = f"{query} {category}"
        else:
            query = category # If only category was selected and no specific query

    # If the query becomes empty after removing phrases and cleaning, set a default
    if not query:
        query = "top stories"

    return fetch_news(query)

def fetch_web_results(query):
    """
    Fetches web search results using the Tavily API.
    """
    if not TAVILY_API_KEY:
        return "Tavily API key is not set. Please provide a valid API key."

    try:
        response = tavily_client.search(query=query, search_depth="advanced", max_results=5)

        results = response.get('results', [])
        if not results:
            return "Sorry, I couldn't find any relevant web results for that query using Tavily."

        # Use the 'answer' field from the Tavily response when present; otherwise fall back to the first result below
        synthesized_answer = response.get('answer', 'No summary available.')

        web_results_output = []
        if synthesized_answer and synthesized_answer != 'No summary available.':
            web_results_output.append(f"Here's what I found: {synthesized_answer}")
        else:
            # Fallback to the first result's content if no overall summary is provided
            first_result_content = results[0].get('content', 'No content available.')
            if first_result_content and len(first_result_content) > 100: # Take a snippet
                web_results_output.append(f"Here's a snippet from a top result: {first_result_content[:200]}...")
            elif first_result_content:
                web_results_output.append(f"Here's what I found: {first_result_content}")
            else:
                web_results_output.append("Here are some top results:")

        web_results_output.append("Top results:")

        for i, item in enumerate(results):
            title = item.get('title', 'No Title')
            url = item.get('url', '#')
            # Tavily 'content' can be long, so we take a snippet for description
            content_snippet = item.get('content', 'No description available.')
            if len(content_snippet) > 150:
                content_snippet = content_snippet[:150] + '...'

            web_results_output.append(f"{i+1}. {title}\n   {content_snippet}\n   Read more: {url}\n")

        return "\n".join(web_results_output)

    except Exception as e:
        return f"Error fetching web results from Tavily: {e}"

def handle_general_query(user_input):
    """
    Processes a general query and provides a response based on a simple rule-based system,
    with a fallback to web search for unknown queries.
    """
    lower_input = user_input.lower()

    # Simple keyword-based rule system
    if "what is ai" in lower_input or "define ai" in lower_input:
        return "AI stands for Artificial Intelligence, which is the simulation of human intelligence processes by machines, especially computer systems."
    elif "capital of france" in lower_input:
        return "The capital of France is Paris."
    elif "who invented the light bulb" in lower_input:
        return "Thomas Edison is widely credited with inventing the practical incandescent light bulb."
    elif "how to tie a shoelace" in lower_input:
        return "To tie a shoelace, make two 'bunny ears' with the laces, cross them over, and then tuck one under the other before pulling tight."
    elif "your name" in lower_input or "who are you" in lower_input:
        return "I am an AI assistant designed to help with your queries."
    elif re.search(r"\b(hello|hi)\b", lower_input):  # Whole-word match, so words like "this" do not trigger the greeting
        return "Hello! How can I assist you today?"
    elif "how are you" in lower_input:
        return "I am an AI, so I don't have feelings, but I'm ready to help you!"
    else:
        # Fallback to web search if no rule-based answer is found
        # print(f"No direct answer found for '{user_input}'. Attempting web search...") # Removed print for clean app.py
        web_search_result = fetch_web_results(user_input)
        if web_search_result:
            return web_search_result
        else:
            return "I'm not sure how to answer that general query. Could you please rephrase it or ask something else?"

# Graph State Definition
class GraphState(TypedDict):
    """
    Represents the state of our graph. Used to pass information between nodes.
    """
    user_input: str
    intent: str
    response: str
    news_category: str # Added for news category selection

# LangGraph Node Functions
def intent_recognition_node(state: GraphState) -> GraphState:
    """
    Node to recognize the intent of the user input.
    """
    user_input = state["user_input"]
    intent = recognize_intent(user_input)
    return {"intent": intent}

def general_query_node(state: GraphState) -> GraphState:
    """
    Node to handle general queries.
    """
    user_input = state["user_input"]
    response = handle_general_query(user_input)
    return {"response": response}

def news_request_node(state: GraphState) -> GraphState:
    """
    Node to handle news requests.
    """
    user_input = state["user_input"]
    # Pass the selected news category to handle_news_request
    selected_news_category = state.get("news_category", "General")
    response = handle_news_request(user_input, selected_news_category)
    time.sleep(2) # Preserve the delay
    return {"response": response}

# LangGraph Workflow Setup and Compilation
def route_intent(state: GraphState):
    """
    Conditional edge based on intent.
    """
    if state["intent"] == "general_query":
        return "general_query_handler"
    elif state["intent"] == "news_request":
        return "news_request_handler"
    else:
        return "general_query_handler" # Fallback if intent is not recognized or an unexpected value

# Build the graph
workflow = StateGraph(GraphState)

# Add nodes to the graph
workflow.add_node("intent_recognizer", intent_recognition_node)
workflow.add_node("general_query_handler", general_query_node)
workflow.add_node("news_request_handler", news_request_node)

# Set the entry point
workflow.set_entry_point("intent_recognizer")

# Add conditional edges from the intent recognizer
workflow.add_conditional_edges(
    "intent_recognizer",
    route_intent,
    {
        "general_query_handler": "general_query_handler",
        "news_request_handler": "news_request_handler",
    },
)

# Set end points for the handlers
workflow.add_edge("general_query_handler", END)
workflow.add_edge("news_request_handler", END)

# Compile the graph
app = workflow.compile()

# Chatbot Response Function
def chatbot_response(user_input, news_category="General"):
    """
    Main function to process user input using the LangGraph workflow.
    """
    initial_state = {"user_input": user_input, "intent": "", "response": "", "news_category": news_category}

    # Run the graph directly to get the final state
    final_state = app.invoke(initial_state)

    return final_state.get("response", "I'm sorry, something went wrong and I couldn't process your request.")

# Streamlit UI
st.title("AI Chatbot")

# News Category Selection
news_categories = ['General', 'Technology', 'Politics', 'Sports', 'Business', 'Health', 'Science']
if 'selected_news_category' not in st.session_state:
    st.session_state.selected_news_category = 'General'

st.session_state.selected_news_category = st.selectbox(
    "Select News Category:",
    news_categories,
    index=news_categories.index(st.session_state.selected_news_category),
    key="news_category_selector"
)

# Initialize chat history
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat messages from history on app rerun
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Accept user input
if user_query := st.chat_input("Ask me anything:"):
    # Add user message to chat history
    st.session_state.messages.append({"role": "user", "content": user_query})
    # Display user message in chat message container
    with st.chat_message("user"):
        st.markdown(user_query)

    # Generate and display assistant response, passing the selected news category
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            response = chatbot_response(user_query, st.session_state.selected_news_category)
            st.markdown(response)
    # Add assistant response to chat history
    st.session_state.messages.append({"role": "assistant", "content": response})


Overwriting app.py


## Provide Run Instructions

### Subtask:
Give clear instructions on how to execute the Streamlit application from the Colab environment, including necessary commands for running the app and creating a public URL.


### Instructions
1. Start the `app.py` file with Streamlit on port 8501. Colab executes cells one at a time, so run the server in the background (the trailing `&`) and send its output to a log file; the log path below is an arbitrary choice. Use the following command in a new code cell:
```bash
!streamlit run app.py --server.port 8501 --server.enableCORS false --server.enableXsrfProtection false &>/content/streamlit.log &
```
2. With the Streamlit app running, use `npx localtunnel` in **another new code cell** to expose the application to the internet. The command prints a public URL that you can open to access your Streamlit app:
```bash
!npx localtunnel --port 8501
```
**Important:** Keep the tunnel cell running to maintain the public URL. Stopping it, or restarting the runtime, takes the app offline.
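
Before opening the tunnel, it can help to confirm that the Streamlit server is actually listening on port 8501. The helper below is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical name and is not part of Streamlit or localtunnel.

```python
import socket
import time


def wait_for_port(port: int, host: str = "127.0.0.1", timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Not ready yet; wait briefly and retry
            time.sleep(0.5)
    return False
```

Calling `wait_for_port(8501)` in a Colab cell returns `True` as soon as the server accepts connections, and `False` if nothing is listening within the timeout.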

## Final Task

### Subtask:
Summarize the features of the new Streamlit interface.


## Summary:

### Q&A

**Summarize the features of the new Streamlit interface:**
The new Streamlit interface for the chatbot includes:
*   A user-friendly chat interface where users can input queries.
*   Display of chat history, showing both user inputs and chatbot responses.
*   A loading spinner to indicate when the chatbot is processing a query.
*   A dropdown menu for selecting news categories, allowing users to filter news requests by 'General', 'Technology', 'Politics', 'Sports', 'Business', 'Health', and 'Science'.

### Data Analysis Key Findings

*   The entire chatbot logic, including NLTK downloads, API key initialization, `preprocess_text`, `recognize_intent`, `fetch_news`, `handle_news_request`, `fetch_web_results`, `handle_general_query` functions, `GraphState` definition, LangGraph nodes, workflow setup, compilation, and `chatbot_response` function, was successfully consolidated into a single `app.py` file.
*   The `app.py` file was equipped with a basic Streamlit UI, featuring a title, chat history management (`st.session_state`), a `st.chat_input` for user interaction, and `st.chat_message` for displaying responses, along with a `st.spinner` for indicating processing.
*   A `st.selectbox` UI element was added to allow users to select a news category from a predefined list (e.g., 'General', 'Technology', 'Politics'). This selection is maintained using `st.session_state.selected_news_category`.
*   The LangGraph's `GraphState` was extended to include `news_category`, and the `news_request_node` was modified to pass the user-selected category to the `handle_news_request` function.
*   The `handle_news_request` function was enhanced to incorporate the selected news category into the news query, allowing for category-specific news retrieval.
*   Clear instructions were provided for running the Streamlit application within a Colab environment using `!streamlit run app.py` and for generating a public URL using `!npx localtunnel --port 8501`.
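
The state change noted in these findings can be illustrated without LangGraph itself. The sketch below is a simplified stand-in: `ChatState`, `news_node`, and the stubbed `fake_handle_news_request` are hypothetical names, used only to show how a node can read the category from the state with a `'General'` default.

```python
from typing import TypedDict


class ChatState(TypedDict, total=False):
    user_input: str
    intent: str
    response: str
    news_category: str


def fake_handle_news_request(user_input: str, category: str) -> str:
    # Stub standing in for the real handler, which calls the news API
    return f"[{category}] results for: {user_input}"


def news_node(state: ChatState) -> dict:
    # Fall back to 'General' when no category was placed in the state
    category = state.get("news_category", "General")
    return {"response": fake_handle_news_request(state["user_input"], category)}


print(news_node({"user_input": "latest news", "news_category": "Sports"}))
print(news_node({"user_input": "latest news"}))
```

Returning only the changed key mirrors how the nodes in `app.py` return partial state updates.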

### Insights or Next Steps

*   **Enhance API Key Management**: For production deployment, transition from `userdata.get()` to Streamlit secrets or environment variables for API key management to improve security and deployment flexibility.
*   **Expand News Categories & Filtering**: Consider inferring the news category directly from the user's message rather than relying solely on the dropdown, or allow multiple category selections.
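
As one possible direction for the API key step, the keys could be read from environment variables instead of `userdata.get()`. This is a minimal sketch under that assumption: `load_api_key` is a hypothetical helper, while the variable names `GNEWS_API_KEY` and `TAVILY_API_KEY` are the ones already used in the notebook.

```python
import os
from typing import Mapping, Optional


def load_api_key(name: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the key stored under `name`, or None if it is missing or blank."""
    value = env.get(name, "").strip()
    return value or None


# Hypothetical replacement for the userdata.get() calls in app.py
NEWS_API_KEY = load_api_key("GNEWS_API_KEY")
TAVILY_API_KEY = load_api_key("TAVILY_API_KEY")
```

Because missing keys come back as `None`, the existing `if not NEWS_API_KEY` and `if not TAVILY_API_KEY` checks would continue to work unchanged.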
