# Searching the query

- **First Problem:** Grounding is very expensive, and other solutions are also costly if not cached.  
- **Second Problem:** I used DuckDuckGo to retrieve search results, but the API provides very short responses that do not fully cover the question.  
- **Solution:** Use BeautifulSoup to extract the full body from the retrieved links and then summarize it using Flash 1.5/2.0.

There are still several open problems to think through: caching (easy) and reranking (approach undecided).
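Since caching is noted above as the easy part, here is a minimal sketch of one way it could look, using only the standard library. The names `make_cached_fetcher` and `fake_fetch` are hypothetical and not part of this notebook; in practice the wrapped function would be whatever fetches a page body for a URL.

```python
from functools import lru_cache

def make_cached_fetcher(fetch, maxsize=256):
    """Wrap any url -> text function so repeated URLs are served from memory."""
    @lru_cache(maxsize=maxsize)
    def cached(url: str) -> str:
        return fetch(url)
    return cached

# Stand-in fetcher for illustration only; it records every real call.
calls = []

def fake_fetch(url):
    calls.append(url)
    return f"content of {url}"

cached_fetch = make_cached_fetcher(fake_fetch)
cached_fetch("https://example.com/a")
cached_fetch("https://example.com/a")  # second call is answered from the cache
print(len(calls))
```

One caveat with this approach: whatever the wrapped function returns is cached, including any error string it produces for a failed request, so a real version would probably skip caching failures.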

In [1]:
!pip install -U -q "google-generativeai>=0.8.2"

In [None]:
!pip install duckduckgo-search markdown2 backoff


## Initial Query (Using Flash)

In [7]:
import os
import json
import google.generativeai as genai
from datetime import datetime

def process_query(user_query):
    # Configure the API
    api_key=""
    genai.configure(api_key=api_key)  # use gemini api key from https://aistudio.google.com/apikey

    # Define system instruction to determine if search is needed
    system_instruction = """
    Today Date = 29-03-2025
    You are an assistant that determines if a query requires internet search. Analyze the query and return a JSON with:
    1. "needs_search": boolean - true if the query requires recent information (after November 2023) or specific facts
    2. "reason": string - brief explanation why search is/isn't needed
    3. "atomic_questions": array - if query is complex, break it into smaller atomic questions (empty if search not needed)

    IMPORTANT: Any query about events, products, news, or data after November 2023 MUST have "needs_search" set to true.
    If you are not familiar with the term asked in the question then also turn "needs_search" to true.
    For small query return the original query. For complex and long queries requiring search, break them down into simpler 2-3 sub-questions.
    Whenever someone asks about recent add 2025 in the sub questions.
    """

    # Create the model with appropriate configuration
    generation_config = {
        "temperature": 0.2,
        "top_p": 0.95,
        "top_k": 40,
        "max_output_tokens": 8192,
        "response_mime_type": "application/json",
    }

    model = genai.GenerativeModel(
        model_name="gemini-2.0-flash-001",
        generation_config=generation_config,
        system_instruction=system_instruction,
    )

    # Start chat and send user query
    chat_session = model.start_chat(history=[])
    response = chat_session.send_message(user_query)

    # Parse the response
    try:
        result = json.loads(response.text)
        return result
    except json.JSONDecodeError:
        # Fallback if response is not valid JSON
        return {
            "needs_search": True,
            "reason": "Failed to parse response, defaulting to search required",
            "atomic_questions": [user_query]
        }

# Example usage
if __name__ == "__main__":

    query = "What are the latest developments in AI, And how it have affected the indians?"
    result = process_query(query)
    print(json.dumps(result, indent=2))

{
  "needs_search": true,
  "reason": "The query asks about the 'latest developments' which requires up-to-date information and the impact on Indians in 2025.",
  "atomic_questions": [
    "What are the latest developments in AI in 2025?",
    "How have these recent AI developments affected Indians in 2025?"
  ]
}
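The output above is the happy path. The model can also return valid JSON that is still awkward to use, for example `"needs_search": true` with an empty `atomic_questions` list, which would leave nothing to search for. The sketch below shows one possible way to normalize the response before using it. `normalize_analysis` is a hypothetical helper, not something the notebook defines, and treating anything other than a literal `false` as "search needed" is an assumption chosen to err on the side of searching.

```python
import json

def normalize_analysis(raw_text: str, user_query: str) -> dict:
    """Parse the model's JSON and guarantee the three expected keys."""
    fallback = {
        "needs_search": True,
        "reason": "Failed to parse response, defaulting to search required",
        "atomic_questions": [user_query],
    }
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return fallback
    if not isinstance(data, dict):
        return fallback

    # Only a literal JSON false disables search; anything else searches.
    needs_search = data.get("needs_search", True) is not False

    questions = data.get("atomic_questions") or []
    if not isinstance(questions, list):
        questions = []
    questions = [q.strip() for q in questions if isinstance(q, str) and q.strip()]

    # A search with no sub-questions falls back to the original query.
    if needs_search and not questions:
        questions = [user_query]

    return {
        "needs_search": needs_search,
        "reason": str(data.get("reason", "")),
        "atomic_questions": questions,
    }

print(normalize_analysis('{"needs_search": true, "atomic_questions": []}', "latest AI news"))
```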


Beautiful UI, thanks to Claude 🙂

In [10]:
import json
import requests
from bs4 import BeautifulSoup
import concurrent.futures
import google.generativeai as genai
from duckduckgo_search import DDGS
import backoff
import time
import os
from IPython.display import display, HTML, clear_output
import ipywidgets as widgets
from tqdm.notebook import tqdm
import markdown2  # Added for Markdown rendering


api_key=""   # Replace with your actual API key

genai.configure(api_key=api_key)

# Style definitions for the UI
css_style = """
<style>
    .app-container {
        font-family: 'Roboto', sans-serif;
        max-width: 1000px;
        margin: 0 auto;
        padding: 20px;
        background-color: #f9f9f9;
        border-radius: 10px;
        box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
    }
    .header {
        text-align: center;
        margin-bottom: 20px;
        color: #2c3e50;
    }
    .search-container {
        margin-bottom: 20px;
    }
    .result-container {
        background-color: white;
        padding: 15px;
        border-radius: 8px;
        box-shadow: 0 2px 4px rgba(0, 0, 0, 0.05);
        margin-top: 20px;
    }
    .answer-box {
        background-color: #e8f4f8;
        padding: 15px;
        border-radius: 8px;
        border-left: 5px solid #3498db;
        margin-top: 10px;
    }
    /* Added styles for Markdown */
    .answer-box h1, .answer-box h2, .answer-box h3 {
        color: #2c3e50;
        margin-top: 10px;
        margin-bottom: 10px;
    }
    .answer-box ul, .answer-box ol {
        margin-left: 20px;
    }
    .answer-box code {
        background-color: #f0f0f0;
        padding: 2px 4px;
        border-radius: 3px;
    }
    .answer-box pre {
        background-color: #f0f0f0;
        padding: 10px;
        border-radius: 5px;
        overflow-x: auto;
    }
    .atomic-container {
        margin-top: 20px;
        padding: 10px;
        background-color: #f5f5f5;
        border-radius: 8px;
    }
    .atomic-question {
        font-weight: bold;
        color: #2980b9;
        margin-top: 10px;
    }
    .summary-item {
        background-color: white;
        padding: 10px;
        margin: 8px 0;
        border-radius: 6px;
        border-left: 3px solid #27ae60;
    }
    .citation {
        font-size: 0.8em;
        color: #7f8c8d;
        margin-top: 5px;
    }
    .progress-container {
        margin-top: 20px;
        text-align: center;
    }
    .status-message {
        margin: 15px 0;
        color: #2c3e50;
        font-style: italic;
    }
    .debug-info {
        font-family: monospace;
        font-size: 0.8em;
        background-color: #f0f0f0;
        padding: 10px;
        border-radius: 5px;
        margin-top: 20px;
        display: none;
    }
    .loader {
        display: inline-block;
        width: 30px;
        height: 30px;
        border: 3px solid rgba(0,0,0,.3);
        border-radius: 50%;
        border-top-color: #3498db;
        animation: spin 1s ease-in-out infinite;
    }
    @keyframes spin {
        to { transform: rotate(360deg); }
    }
    .error-message {
        color: #e74c3c;
        padding: 10px;
        background-color: #fadbd8;
        border-radius: 5px;
        margin: 10px 0;
    }
</style>
"""

# ---------------------------------------------------------------------------
# Core functionality
# ---------------------------------------------------------------------------
def get_full_content(url):
    """Fetch the full content of a webpage"""
    try:
        # A timeout keeps a slow or unresponsive site from stalling the whole pipeline.
        response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=10)
        soup = BeautifulSoup(response.text, 'html.parser')
        paragraphs = soup.find_all('p')
        full_text = ' '.join([p.get_text() for p in paragraphs])
        return full_text
    except Exception as e:
        return f"Error fetching content: {str(e)}"

def search_with_full_content(query, max_results=2):
    """Search and retrieve full content for each result"""
    ddgs = DDGS()
    results = ddgs.text(keywords=query, max_results=max_results)
    enhanced_results = []
    for result in results:
        # Get the full content for each search result using the href.
        result['full_content'] = get_full_content(result['href'])
        enhanced_results.append(result)
    return enhanced_results

@backoff.on_exception(backoff.expo, (requests.exceptions.RequestException, ConnectionError), max_tries=3)
def flash_answer(prompt: str) -> str:
    """
    Calls the flash model with the provided prompt.
    """
    system_instruction = "Answer the below question "
    generation_config = {
        "temperature": 0.2,
        "top_p": 0.95,
        "top_k": 40,
        "max_output_tokens": 8192,
    }
    flash_model = genai.GenerativeModel(
        model_name="gemini-2.0-flash-001",
        generation_config=generation_config,
        system_instruction=system_instruction,
    )
    chat_session = flash_model.start_chat(history=[])
    response = chat_session.send_message(prompt)
    return response.text.strip()

def flash_answer_(prompt: str) -> str:
    """
    Calls the flash model with the provided prompt.
    Adjust system_instruction as needed.
    """
    system_instruction = """You are a helpful agent with access to internet search results with proper citation.
     Answer the Query using those results with proper citation. Format your answer in Markdown."""
    generation_config = {
        "temperature": 0.2,
        "top_p": 0.95,
        "top_k": 40,
        "max_output_tokens": 8192,
    }
    flash_model = genai.GenerativeModel(
        model_name="gemini-2.0-flash-001",
        generation_config=generation_config,
        system_instruction=system_instruction,
    )
    chat_session = flash_model.start_chat(history=[])
    response = chat_session.send_message(prompt)
    return response.text.strip()

def summarize_result(result: dict, atomic_question: str) -> dict:
    """
    Given a search result with full_content, send a prompt to the Flash model
    to summarize the full content in 30-60 words in the context of the atomic question.
    """
    href = result.get('href', '')
    full_content = result.get('full_content', '')

    prompt = (
        f"Summarize the following content in 30-60 words, focusing on answering the question: '{atomic_question}'.\n\n"
        f"Content: {full_content}\n\n"
        f"Include the citation (URL) at the end: {href}"
    )
    summary = flash_answer(prompt)
    # Store the summary along with the citation (href) and title.
    return {
        "title": result.get("title", ""),
        "href": href,
        "summary": summary
    }

def process_atomic_question(atomic_question: str, progress_callback=None) -> dict:
    """
    For a given atomic question, perform a DuckDuckGo search (with full content)
    and concurrently summarize each search result.
    """
    if progress_callback:
        progress_callback(f"Searching for: {atomic_question}")

    # Here we request two search results per atomic question.
    search_results = search_with_full_content(atomic_question, max_results=2)

    if progress_callback:
        progress_callback(f"Found {len(search_results)} results for: {atomic_question}")

    summaries = []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(summarize_result, result, atomic_question)
            for result in search_results
        ]
        for future in concurrent.futures.as_completed(futures):
            summarized = future.result()
            summaries.append(summarized)
            if progress_callback:
                progress_callback(f"Summarized a result for: {atomic_question}")

    return {atomic_question: summaries}

def process_all_atomic_questions(atomic_questions: list, progress_callback=None) -> dict:
    """
    Process all atomic questions concurrently and return a mapping from each atomic
    question to its list of summarized search results.
    """
    atomic_summaries = {}
    with concurrent.futures.ThreadPoolExecutor() as executor:
        future_to_atomic = {
            executor.submit(process_atomic_question, aq, progress_callback): aq
            for aq in atomic_questions
        }
        for future in concurrent.futures.as_completed(future_to_atomic):
            atomic = future_to_atomic[future]
            atomic_summaries.update(future.result())
    return atomic_summaries

def final_answer(original_query: str, atomic_summaries: dict) -> str:
    """
    Construct a prompt that provides all the summarized content (with citations)
    from each atomic question and asks the model to generate a final answer
    to the original query.
    """
    prompt = "Using the following summarized search results (with citations), answer the original query. " \
             "Make sure to include proper citations for each piece of information.\n\n" \
             "Format your answer using Markdown syntax with proper headings, lists, and citation links. " \
             "Answer followed by citation."

    for atomic, summaries in atomic_summaries.items():
        prompt += f"\nAtomic Question: {atomic}\n"
        for item in summaries:
            summary = item.get('summary', '')
            href = item.get('href', '')
            prompt += f"- Summary: {summary}\n"
            prompt += f"  Citation: {href}\n"
        prompt += "\n"

    prompt += f"Original Query: {original_query}"
    return flash_answer_(prompt)

def process_query(user_query):
    """
    Determines if a query requires search and breaks it into atomic questions if needed.
    """
    system_instruction = """
    Today Date = 29-03-2025
    You are an assistant that determines if a query requires internet search. Analyze the query and return a JSON with:
    1. "needs_search": boolean - true if the query requires recent information (after November 2023) or specific facts
    2. "reason": string - brief explanation why search is/isn't needed
    3. "atomic_questions": array - if query is complex, break it into smaller atomic questions (empty if search not needed)

    IMPORTANT: Any query about events, products, news, or data after November 2023 MUST have "needs_search" set to true.
    If you are not familiar with the term asked in the question then also turn "needs_search" to true.
    For small query return the original query. For complex and long queries requiring search, break them down into simpler 2-3 sub-questions.
    Whenever someone asks about recent add 2025 in the sub questions.
    """

    generation_config = {
        "temperature": 0.2,
        "top_p": 0.95,
        "top_k": 40,
        "max_output_tokens": 8192,
        "response_mime_type": "application/json",
    }

    model = genai.GenerativeModel(
        model_name="gemini-2.0-flash-001",
        generation_config=generation_config,
        system_instruction=system_instruction,
    )

    chat_session = model.start_chat(history=[])
    response = chat_session.send_message(user_query)

    try:
        result = json.loads(response.text)
        return result
    except json.JSONDecodeError:
        # Fallback if response is not valid JSON
        return {
            "needs_search": True,
            "reason": "Failed to parse response, defaulting to search required",
            "atomic_questions": [user_query]
        }

def process_with_full_search(original_query: str, process_query_result: dict, progress_callback=None) -> dict:
    """
    If search is not required (needs_search is false), directly answer the query
    Otherwise, for each atomic sub-question, perform full search with content extraction and summarization,
    then combine the summaries to answer the original query with citations.
    """
    if not process_query_result.get("needs_search", True):
        if progress_callback:
            progress_callback("Direct answer (no search needed)")
        answer = flash_answer(original_query)
        return {"answer": answer}
    else:
        atomic_questions = process_query_result.get("atomic_questions", [])
        if progress_callback:
            progress_callback(f"Processing {len(atomic_questions)} atomic questions")

        atomic_summaries = process_all_atomic_questions(atomic_questions, progress_callback)

        if progress_callback:
            progress_callback("Generating final answer")

        final_ans = final_answer(original_query, atomic_summaries)
        return {"answer": final_ans, "atomic_summaries": atomic_summaries}

# ---------------------------------------------------------------------------
# Enhanced UI Components with Markdown Support
# ---------------------------------------------------------------------------
def create_search_ui():
    """Create and display the search interface"""
    display(HTML(css_style))

    # App container
    app_html = """
    <div class="app-container">
        <div class="header">
            <h1>AI-Powered Web Search</h1>
            <p>Ask any question to get researched answers with citations</p>
        </div>
    </div>
    """
    display(HTML(app_html))

    # Create widgets
    query_input = widgets.Text(
        description='Query:',
        placeholder='Enter your question here...',
        layout=widgets.Layout(width='80%')
    )

    search_button = widgets.Button(
        description='Search',
        button_style='primary',
        icon='search'
    )

    debug_checkbox = widgets.Checkbox(
        value=False,
        description='Show debug info',
        layout=widgets.Layout(width='auto')
    )

    output_area = widgets.Output()
    status_area = widgets.Output()

    # Display widgets
    display(widgets.HBox([query_input, search_button]))
    display(debug_checkbox)
    display(status_area)
    display(output_area)

    # Progress updates
    def update_status(message):
        with status_area:
            clear_output(wait=True)
            status_html = f'<div class="status-message"><div class="loader"></div> {message}</div>'
            display(HTML(status_html))

    # Handle search button click
    def on_search_button_clicked(b):
        query = query_input.value.strip()
        if not query:
            with output_area:
                clear_output(wait=True)
                display(HTML('<div class="error-message">Please enter a query</div>'))
            return

        with output_area:
            clear_output(wait=True)

        # Process the query
        update_status("Analyzing your query...")
        start_time = time.time()

        try:
            query_analysis = process_query(query)
            show_debug = debug_checkbox.value

            # Display query analysis if debug is enabled
            if show_debug:
                with output_area:
                    analysis_json = json.dumps(query_analysis, indent=2)
                    debug_html = f"""
                    <div class="debug-info" style="display: block;">
                        <h3>Query Analysis:</h3>
                        <pre>{analysis_json}</pre>
                    </div>
                    """
                    display(HTML(debug_html))

            if not query_analysis.get("needs_search", True):
                update_status("Generating direct answer (no search needed)...")
                result = process_with_full_search(query, query_analysis, update_status)

                with output_area:
                    # Convert Markdown to HTML
                    answer_html = markdown2.markdown(result['answer'])
                    time_taken = time.time() - start_time
                    direct_html = f"""
                    <div class="result-container">
                        <h2>Answer</h2>
                        <div class="answer-box">{answer_html}</div>
                        <p><em>Answered directly without search (took {time_taken:.2f} seconds)</em></p>
                    </div>
                    """
                    display(HTML(direct_html))
            else:
                atomic_questions = query_analysis.get("atomic_questions", [])
                update_status(f"Breaking down into {len(atomic_questions)} sub-questions...")

                result = process_with_full_search(query, query_analysis, update_status)

                # Convert Markdown to HTML
                answer_html_content = markdown2.markdown(result['answer'])
                time_taken = time.time() - start_time

                # Build the HTML in parts to avoid nested f-string issues
                answer_html = f"""
                <div class="result-container">
                    <h2>Answer</h2>
                    <div class="answer-box">{answer_html_content}</div>
                """

                if show_debug and "atomic_summaries" in result:
                    answer_html += """
                    <div class="atomic-container">
                        <h3>Research Process:</h3>
                    """

                    for question, summaries in result["atomic_summaries"].items():
                        question_html = f"""
                        <div class="atomic-question">{question}</div>
                        """
                        answer_html += question_html

                        for summary in summaries:
                            summary_text = summary['summary'].replace('\n', '<br>')
                            summary_href = summary['href']
                            summary_title = summary['title'] or summary['href']

                            summary_html = f"""
                            <div class="summary-item">
                                <div>{summary_text}</div>
                                <div class="citation">Source: <a href="{summary_href}" target="_blank">{summary_title}</a></div>
                            </div>
                            """
                            answer_html += summary_html

                    answer_html += "</div>"

                footer_html = f"""
                    <p><em>Researched answer (took {time_taken:.2f} seconds)</em></p>
                </div>
                """
                answer_html += footer_html

                with output_area:
                    display(HTML(answer_html))

        except Exception as e:
            with output_area:
                error_message = str(e)
                error_html = f"""
                <div class="error-message">
                    <h3>Error occurred:</h3>
                    <p>{error_message}</p>
                </div>
                """
                display(HTML(error_html))

        with status_area:
            clear_output()

    search_button.on_click(on_search_button_clicked)

    # Handle Enter key in query input
    def on_enter(widget):
        on_search_button_clicked(None)

    query_input.on_submit(on_enter)

    return {
        'query_input': query_input,
        'search_button': search_button,
        'output_area': output_area,
        'status_area': status_area
    }

# Function to run the app
def run_search_app():
    """Main function to run the search application"""
    # Check if API key is set
    if not api_key or api_key == "YOUR_API_KEY_HERE":
        api_key_warning = """
        <div style="color: red; padding: 10px; background-color: #ffe6e6; border-radius: 5px; margin: 10px 0;">
            <h3>⚠️ API Key Missing</h3>
            <p>Please set your Google AI API key in the code before running.</p>
            <code>api_key = "your_actual_api_key_here"</code>
        </div>
        """
        display(HTML(api_key_warning))
        return

    # Create and display UI
    ui_components = create_search_ui()

    print("👆 Search app is ready to use!")

# Run the app when executed
if __name__ == "__main__" or 'google.colab' in str(get_ipython()):
    run_search_app()

👆 Search app is ready to use!
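In the cell above, `process_all_atomic_questions` calls `future.result()` directly, so an exception raised while handling any one sub-question propagates and the whole query ends in the error box. A possible variation, sketched below with the standard library only, keeps the results that did succeed and records the failures separately. `run_all` and `demo_worker` are hypothetical names used for illustration, and the worker here is a stand-in that does no real searching.

```python
import concurrent.futures

def run_all(questions, worker, max_workers=4):
    """Run worker(question) for each question; keep going if one fails."""
    results, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_q = {executor.submit(worker, q): q for q in questions}
        for future in concurrent.futures.as_completed(future_to_q):
            q = future_to_q[future]
            try:
                results[q] = future.result()
            except Exception as exc:
                # Record the failure instead of aborting the other questions.
                errors[q] = str(exc)
    return results, errors

# Stand-in worker: fails for any question containing the word "fail".
def demo_worker(q):
    if "fail" in q:
        raise RuntimeError("search failed")
    return [f"summary for {q}"]

results, errors = run_all(["q1", "fail q2"], demo_worker)
print(results)
print(errors)
```

The final-answer step could then be given only the successful summaries, with the failed sub-questions listed in the debug view.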


INFO:backoff:Backing off flash_answer(...) for 0.9s (requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')))
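The log line above shows the `backoff` decorator retrying `flash_answer` after a dropped connection. To make the retry behaviour concrete, here is a rough standard-library sketch of the same general idea: exponentially growing wait ceilings with random jitter. It is an illustration only, not the `backoff` library's code, and `retry_with_backoff` and `flaky_call` are hypothetical names.

```python
import random
import time

def retry_with_backoff(func, exceptions=(ConnectionError,), max_tries=3,
                       base=1.0, factor=2.0, sleep=time.sleep):
    """Call func(); on a listed exception, wait and retry up to max_tries times."""
    for attempt in range(1, max_tries + 1):
        try:
            return func()
        except exceptions:
            if attempt == max_tries:
                raise  # out of attempts: surface the last error
            # Exponential ceiling (base, base*factor, ...) with random jitter below it.
            ceiling = base * factor ** (attempt - 1)
            sleep(random.uniform(0, ceiling))

# Demo with a stand-in call that fails twice, then succeeds.
attempts = {"count": 0}

def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("Remote end closed connection without response")
    return "ok"

delays = []  # record the waits instead of actually sleeping
result = retry_with_backoff(flaky_call, sleep=delays.append)
print(result, len(delays))
```

Passing `sleep=delays.append` is only a convenience for the demo; with the default `time.sleep` the function really waits between attempts.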


# Not needed

## Query search using DuckDuckGo

Will try this reranker later

[Reranker](https://app.contextual.ai/)
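Until a proper reranker is tried, a very rough stand-in is to order results by how many query words appear in each result's title and snippet. The sketch below does exactly that. It is a plain lexical-overlap heuristic and has nothing to do with how the linked reranker works; `tokenize` and `rerank_by_overlap` are hypothetical names. It assumes results are dicts with `title` and `body` keys, as in the DuckDuckGo output shown in this notebook.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def rerank_by_overlap(query, results):
    """Sort results by the fraction of query tokens found in title + body."""
    query_tokens = tokenize(query)

    def score(result):
        doc_tokens = tokenize(result.get("title", "") + " " + result.get("body", ""))
        if not query_tokens:
            return 0.0
        return len(query_tokens & doc_tokens) / len(query_tokens)

    # sorted() is stable, so results with equal scores keep their original order.
    return sorted(results, key=score, reverse=True)

sample_results = [
    {"title": "Cooking tips", "body": "How to bake bread"},
    {"title": "AI trends 2025", "body": "Latest developments in AI"},
]
reranked = rerank_by_overlap("latest AI developments 2025", sample_results)
print([r["title"] for r in reranked])
```

A heuristic like this ignores synonyms and word order, which is the gap a learned reranker is meant to close.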




In [4]:
import json
import google.generativeai as genai

def flash_answer(query: str) -> str:
    system_instruction = "Answer the Below question"
    generation_config = {
        "temperature": 0.2,
        "top_p": 0.95,
        "top_k": 40,
        "max_output_tokens": 8192,
    }

    # Create the model
    flash_model = genai.GenerativeModel(
        model_name="gemini-2.0-flash-001",
        generation_config=generation_config,
        system_instruction=system_instruction,
    )

    chat_session = flash_model.start_chat(history=[])
    response = chat_session.send_message(query)
    return response.text

# Provided simple DuckDuckGo search function wrapper.
def search_ddg(
    query: str,
    search_type: str = "search",
    max_results: int = 5,
    timeout: int = 10
) -> list:
    try:
        from duckduckgo_search import DDGS
    except ImportError:
        print("Error: 'duckduckgo-search' package not installed!")
        print("Install it with: pip install duckduckgo-search")
        return []
    ddgs = DDGS(timeout=timeout)
    if search_type.lower() == "news":
        return ddgs.news(keywords=query, max_results=max_results)
    else:
        return ddgs.text(keywords=query, max_results=max_results)

def simple_ddg_search(
    query: str,
    search_type: str = "search",
    max_results: int = 2,
    return_json: bool = False
):
    results = search_ddg(query, search_type, max_results)
    if return_json:
        return json.dumps(results, indent=2)
    return results

# Main function that uses the result from process_query to decide the next step.
def process_result(result: dict, user_query: str):
    """
    If search is not needed, answer the user's original query using Gemini Flash.
    Otherwise, call simple_ddg_search on each atomic question and return a dictionary
    where keys are atomic questions and values are the search results.
    """
    if not result.get("needs_search", True):
        # No search required; answer directly with Gemini Flash.
        answer = flash_answer(user_query)
        return {"answer": answer}
    else:
        # Search is needed; iterate over atomic questions.
        search_results = {}
        for atomic in result.get("atomic_questions", []):
            # Perform DDG search for each atomic sub-question.
            atomic_result = simple_ddg_search(query=atomic)
            search_results[atomic] = atomic_result
        return search_results

# Example usage:
if __name__ == "__main__":

    # Example original query.
    query = "What are the latest developments in AI, And how it have affected the indians?"
    process_query_output = process_query(query)
    print(process_query_output)
    final_output = process_result(process_query_output, query)
    print(json.dumps(final_output, indent=2))


{'needs_search': True, 'reason': "The query asks about 'latest developments' which requires up-to-date information and the effect of AI on Indians in 2025.", 'atomic_questions': ['What are the latest developments in AI in 2025?', 'How have the latest AI developments affected Indians in 2025?']}
{
  "What are the latest developments in AI in 2025?": [
    {
      "title": "The 10 Biggest AI Trends Of 2025 Everyone Must Be Ready For Today - Forbes",
      "href": "https://www.forbes.com/sites/bernardmarr/2024/09/24/the-10-biggest-ai-trends-of-2025-everyone-must-be-ready-for-today/",
      "body": "Discover the 10 major AI trends set to reshape 2025: from augmented working and real-time decision-making to advanced AI legislation and sustainable AI initiatives."
    },
    {
      "title": "Future of AI in 2025 [Top Trends and Predictions] - GeeksforGeeks",
      "href": "https://www.geeksforgeeks.org/future-of-ai/",
      "body": "Artificial Intelligence (AI) is changing how industries wo

In [None]:
pip install backoff markdown2

In [None]:
import json
import requests
from bs4 import BeautifulSoup
import concurrent.futures
import google.generativeai as genai
from duckduckgo_search import DDGS
import backoff

# ---------------------------------------------------------------------------
# Provided functions for fetching full content and DuckDuckGo search
# ---------------------------------------------------------------------------
def get_full_content(url):
    """Fetch the full content of a webpage"""
    try:
        # A timeout keeps a slow or unresponsive site from stalling the whole pipeline.
        response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=10)
        soup = BeautifulSoup(response.text, 'html.parser')
        paragraphs = soup.find_all('p')
        full_text = ' '.join([p.get_text() for p in paragraphs])
        return full_text
    except Exception as e:
        return f"Error fetching content: {str(e)}"

def search_with_full_content(query, max_results=2):
    """Search and retrieve full content for each result"""
    ddgs = DDGS()
    results = ddgs.text(keywords=query, max_results=max_results)
    enhanced_results = []
    for result in results:
        # Get the full content for each search result using the href.
        result['full_content'] = get_full_content(result['href'])
        enhanced_results.append(result)
    return enhanced_results

# ---------------------------------------------------------------------------
# Flash helper functions
# ---------------------------------------------------------------------------
# Retry transient network failures with exponential backoff (up to 3 attempts).
@backoff.on_exception(backoff.expo, (requests.exceptions.RequestException, ConnectionError), max_tries=3)
def flash_answer(prompt: str) -> str:
    """
    Calls the Gemini 2.0 Flash model with the provided prompt.
    Adjust system_instruction as needed.
    """
    system_instruction = "Answer the below question "
    generation_config = {
        "temperature": 0.2,
        "top_p": 0.95,
        "top_k": 40,
        "max_output_tokens": 8192,
    }
    flash_model = genai.GenerativeModel(
        model_name="gemini-2.0-flash-001",
        generation_config=generation_config,
        system_instruction=system_instruction,
    )
    chat_session = flash_model.start_chat(history=[])
    response = chat_session.send_message(prompt)
    return response.text.strip()




def flash_answer_(prompt: str) -> str:
    """
    Calls the Gemini 2.0 Flash model with the provided prompt.
    Adjust system_instruction as needed.
    """
    system_instruction = """You are a helpful agent with access to internet search results with proper citations.
     Answer the query using those results with proper citations."""
    generation_config = {
        "temperature": 0.2,
        "top_p": 0.95,
        "top_k": 40,
        "max_output_tokens": 8192,
    }
    flash_model = genai.GenerativeModel(
        model_name="gemini-2.0-flash-001",
        generation_config=generation_config,
        system_instruction=system_instruction,
    )
    chat_session = flash_model.start_chat(history=[])
    response = chat_session.send_message(prompt)
    return response.text.strip()

# ---------------------------------------------------------------------------
# Concurrent summarization for each atomic sub-question
# ---------------------------------------------------------------------------
def summarize_result(result: dict, atomic_question: str) -> dict:
    """
    Given a search result with full_content, send a prompt to the Flash model
    to summarize the full content in 30-60 words in the context of the atomic question.
    """
    prompt = (
        f"Summarize the following content in 30-60 words, focusing on answering the question: '{atomic_question}'.\n\n"
        f"Content: {result.get('full_content', '')}\n\n"
        f"Include the citation (URL) at the end: {result.get('href', '')}"
    )
    summary = flash_answer(prompt)
    # Store the summary along with the citation (href) and title.
    return {
        "title": result.get("title", ""),
        "href": result.get("href", ""),
        "summary": summary
    }

def process_atomic_question(atomic_question: str) -> dict:
    """
    For a given atomic question, perform a DuckDuckGo search (with full content)
    and concurrently summarize each search result.
    """
    # Here we request two search results per atomic question.
    search_results = search_with_full_content(atomic_question, max_results=2)
    summaries = []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(summarize_result, result, atomic_question)
            for result in search_results
        ]
        for future in concurrent.futures.as_completed(futures):
            summarized = future.result()
            summaries.append(summarized)
    return {atomic_question: summaries}

def process_all_atomic_questions(atomic_questions: list) -> dict:
    """
    Process all atomic questions concurrently and return a mapping from each atomic
    question to its list of summarized search results.
    """
    atomic_summaries = {}
    with concurrent.futures.ThreadPoolExecutor() as executor:
        future_to_atomic = {
            executor.submit(process_atomic_question, aq): aq
            for aq in atomic_questions
        }
        for future in concurrent.futures.as_completed(future_to_atomic):
            atomic = future_to_atomic[future]
            atomic_summaries.update(future.result())
    return atomic_summaries

# ---------------------------------------------------------------------------
# Compose final answer using summarized search results
# ---------------------------------------------------------------------------
def final_answer(original_query: str, atomic_summaries: dict) -> str:
    """
    Construct a prompt that provides all the summarized content (with citations)
    from each atomic question and asks the Flash model to generate a final answer
    to the original query.
    """
    prompt = "Using the following summarized search results (with citations), answer the original query. " \
             "Make sure to include proper citations for each piece of information.\n\n" \
             "Give the answer in a proper structure: answer followed by citation.\n\n"
    for atomic, summaries in atomic_summaries.items():
        prompt += f"Atomic Question: {atomic}\n"
        for item in summaries:
            prompt += f"- Summary: {item.get('summary')}\n"
            prompt += f"  Citation: {item.get('href')}\n"
        prompt += "\n"
    prompt += f"Original Query: {original_query}"
    print(prompt)
    return flash_answer_(prompt)

# ---------------------------------------------------------------------------
# Main function to process the output from the initial process_query
# ---------------------------------------------------------------------------
def process_with_full_search(original_query: str, process_query_result: dict) -> dict:
    """
    If search is not required (needs_search is false), directly answer the query using flash 2.0
    Otherwise, for each atomic sub-question, perform full search with content extraction and summarization,
    then combine the summaries to answer the original query with citations.
    """
    if not process_query_result.get("needs_search", True):
        answer = flash_answer(original_query)
        return {"answer": answer}
    else:
        atomic_questions = process_query_result.get("atomic_questions", [])
        atomic_summaries = process_all_atomic_questions(atomic_questions)
        final_ans = final_answer(original_query, atomic_summaries)
        return {"answer": final_ans, "atomic_summaries": atomic_summaries}

# ---------------------------------------------------------------------------
# Example usage
# ---------------------------------------------------------------------------
if __name__ == "__main__":
    # Example output from process_query

    original_query = "what are the latest news of today which can come in upsc"
    process_query_output = process_query(original_query)

    final_output = process_with_full_search(original_query, process_query_output)
    print(json.dumps(final_output, indent=2))


In [None]:
print(final_output["answer"])
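
`process_atomic_question` appends summaries in completion order (`as_completed`), so the sources can come back in a different order on each run. If a stable order matters, `executor.map` is one option, since it yields results in input order. The sketch below is only an illustration: `fake_summarize` and `summarize_in_order` are hypothetical names, and `fake_summarize` stands in for `summarize_result` so that no model call is needed.

```python
import concurrent.futures

def fake_summarize(result: dict, question: str) -> dict:
    # Hypothetical stand-in for summarize_result: no network or model call.
    return {"href": result["href"], "summary": f"{question}: {result['title']}"}

def summarize_in_order(results: list, question: str) -> list:
    # executor.map returns results in the order of its inputs,
    # regardless of which worker thread finishes first.
    with concurrent.futures.ThreadPoolExecutor() as executor:
        return list(executor.map(lambda r: fake_summarize(r, question), results))

results = [
    {"title": "A", "href": "https://example.com/a"},
    {"title": "B", "href": "https://example.com/b"},
]
ordered = summarize_in_order(results, "q")
print([item["href"] for item in ordered])
```

The trade-off is that `map` surfaces results only in sequence, so a per-result progress message would fire in input order rather than as each summary finishes.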

## Search with a beautiful UI, thanks to Claude 😎

In [24]:
# !pip install google-generativeai duckduckgo-search beautifulsoup4 backoff ipywidgets requests tqdm ipython-autotime markdown2

import json
import requests
from bs4 import BeautifulSoup
import concurrent.futures
import google.generativeai as genai
from duckduckgo_search import DDGS
import backoff
import time
import os
from IPython.display import display, HTML, clear_output
import ipywidgets as widgets
from tqdm.notebook import tqdm
import markdown2  # Added for Markdown rendering

api_key = ""  # Replace with your actual API key

genai.configure(api_key=api_key)

# Style definitions for the UI
css_style = """
<style>
    .app-container {
        font-family: 'Roboto', sans-serif;
        max-width: 1000px;
        margin: 0 auto;
        padding: 20px;
        background-color: #f9f9f9;
        border-radius: 10px;
        box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
    }
    .header {
        text-align: center;
        margin-bottom: 20px;
        color: #2c3e50;
    }
    .search-container {
        margin-bottom: 20px;
    }
    .result-container {
        background-color: white;
        padding: 15px;
        border-radius: 8px;
        box-shadow: 0 2px 4px rgba(0, 0, 0, 0.05);
        margin-top: 20px;
    }
    .answer-box {
        background-color: #e8f4f8;
        padding: 15px;
        border-radius: 8px;
        border-left: 5px solid #3498db;
        margin-top: 10px;
    }
    /* Added styles for Markdown */
    .answer-box h1, .answer-box h2, .answer-box h3 {
        color: #2c3e50;
        margin-top: 10px;
        margin-bottom: 10px;
    }
    .answer-box ul, .answer-box ol {
        margin-left: 20px;
    }
    .answer-box code {
        background-color: #f0f0f0;
        padding: 2px 4px;
        border-radius: 3px;
    }
    .answer-box pre {
        background-color: #f0f0f0;
        padding: 10px;
        border-radius: 5px;
        overflow-x: auto;
    }
    .atomic-container {
        margin-top: 20px;
        padding: 10px;
        background-color: #f5f5f5;
        border-radius: 8px;
    }
    .atomic-question {
        font-weight: bold;
        color: #2980b9;
        margin-top: 10px;
    }
    .summary-item {
        background-color: white;
        padding: 10px;
        margin: 8px 0;
        border-radius: 6px;
        border-left: 3px solid #27ae60;
    }
    .citation {
        font-size: 0.8em;
        color: #7f8c8d;
        margin-top: 5px;
    }
    .progress-container {
        margin-top: 20px;
        text-align: center;
    }
    .status-message {
        margin: 15px 0;
        color: #2c3e50;
        font-style: italic;
    }
    .debug-info {
        font-family: monospace;
        font-size: 0.8em;
        background-color: #f0f0f0;
        padding: 10px;
        border-radius: 5px;
        margin-top: 20px;
        display: none;
    }
    .loader {
        display: inline-block;
        width: 30px;
        height: 30px;
        border: 3px solid rgba(0,0,0,.3);
        border-radius: 50%;
        border-top-color: #3498db;
        animation: spin 1s ease-in-out infinite;
    }
    @keyframes spin {
        to { transform: rotate(360deg); }
    }
    .error-message {
        color: #e74c3c;
        padding: 10px;
        background-color: #fadbd8;
        border-radius: 5px;
        margin: 10px 0;
    }
</style>
"""

# ---------------------------------------------------------------------------
# Core functionality (carried over from the earlier cells)
# ---------------------------------------------------------------------------
def get_full_content(url):
    """Fetch the full content of a webpage"""
    try:
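        # A timeout keeps one slow site from stalling the whole pipeline.
        response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=10)
        # If the server declares no charset, use the detected encoding to avoid garbled text.
        if 'charset' not in response.headers.get('Content-Type', '').lower():
            response.encoding = response.apparent_encoding
        soup = BeautifulSoup(response.text, 'html.parser')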
        paragraphs = soup.find_all('p')
        full_text = ' '.join([p.get_text() for p in paragraphs])
        return full_text
    except Exception as e:
        return f"Error fetching content: {str(e)}"

def search_with_full_content(query, max_results=2):
    """Search and retrieve full content for each result"""
    ddgs = DDGS()
    results = ddgs.text(keywords=query, max_results=max_results)
    enhanced_results = []
    for result in results:
        # Get the full content for each search result using the href.
        result['full_content'] = get_full_content(result['href'])
        enhanced_results.append(result)
    return enhanced_results

@backoff.on_exception(backoff.expo, (requests.exceptions.RequestException, ConnectionError), max_tries=3)
def flash_answer(prompt: str) -> str:
    """
    Calls the flash model with the provided prompt.
    """
    system_instruction = "Answer the below question "
    generation_config = {
        "temperature": 0.2,
        "top_p": 0.95,
        "top_k": 40,
        "max_output_tokens": 8192,
    }
    flash_model = genai.GenerativeModel(
        model_name="gemini-2.0-flash-001",
        generation_config=generation_config,
        system_instruction=system_instruction,
    )
    chat_session = flash_model.start_chat(history=[])
    response = chat_session.send_message(prompt)
    return response.text.strip()

def flash_answer_(prompt: str) -> str:
    """
    Calls the flash model with the provided prompt.
    Adjust system_instruction as needed.
    """
    system_instruction = """You are a helpful agent with access to internet search results with proper citation.
     Answer the Query using those results with proper citation. Format your answer in Markdown."""
    generation_config = {
        "temperature": 0.2,
        "top_p": 0.95,
        "top_k": 40,
        "max_output_tokens": 8192,
    }
    flash_model = genai.GenerativeModel(
        model_name="gemini-2.0-flash-001",
        generation_config=generation_config,
        system_instruction=system_instruction,
    )
    chat_session = flash_model.start_chat(history=[])
    response = chat_session.send_message(prompt)
    return response.text.strip()

def summarize_result(result: dict, atomic_question: str) -> dict:
    """
    Given a search result with full_content, send a prompt to the Flash model
    to summarize the full content in 30-60 words in the context of the atomic question.
    """
    href = result.get('href', '')
    full_content = result.get('full_content', '')

    prompt = (
        f"Summarize the following content in 30-60 words, focusing on answering the question: '{atomic_question}'.\n\n"
        f"Content: {full_content}\n\n"
        # Separate the instruction from the URL so the two do not run together.
        f"Include the citation (URL) at the end: {href}"
    )
    summary = flash_answer(prompt)
    # Store the summary along with the citation (href) and title.
    return {
        "title": result.get("title", ""),
        "href": href,
        "summary": summary
    }

def process_atomic_question(atomic_question: str, progress_callback=None) -> dict:
    """
    For a given atomic question, perform a DuckDuckGo search (with full content)
    and concurrently summarize each search result.
    """
    if progress_callback:
        progress_callback(f"Searching for: {atomic_question}")

    # Here we request two search results per atomic question.
    search_results = search_with_full_content(atomic_question, max_results=2)

    if progress_callback:
        progress_callback(f"Found {len(search_results)} results for: {atomic_question}")

    summaries = []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [
            executor.submit(summarize_result, result, atomic_question)
            for result in search_results
        ]
        for future in concurrent.futures.as_completed(futures):
            summarized = future.result()
            summaries.append(summarized)
            if progress_callback:
                progress_callback(f"Summarized a result for: {atomic_question}")

    return {atomic_question: summaries}

def process_all_atomic_questions(atomic_questions: list, progress_callback=None) -> dict:
    """
    Process all atomic questions concurrently and return a mapping from each atomic
    question to its list of summarized search results.
    """
    atomic_summaries = {}
    with concurrent.futures.ThreadPoolExecutor() as executor:
        future_to_atomic = {
            executor.submit(process_atomic_question, aq, progress_callback): aq
            for aq in atomic_questions
        }
        for future in concurrent.futures.as_completed(future_to_atomic):
            atomic = future_to_atomic[future]
            atomic_summaries.update(future.result())
    return atomic_summaries

def final_answer(original_query: str, atomic_summaries: dict) -> str:
    """
    Construct a prompt that provides all the summarized content (with citations)
    from each atomic question and asks the model to generate a final answer
    to the original query.
    """
    prompt = "Using the following summarized search results (with citations), answer the original query. " \
             "Make sure to include proper citations for each piece of information.\n\n" \
             "Format your answer using Markdown syntax with proper headings, lists, and citation links. " \
             "Answer followed by citation."

    for atomic, summaries in atomic_summaries.items():
        prompt += f"\nAtomic Question: {atomic}\n"
        for item in summaries:
            summary = item.get('summary', '')
            href = item.get('href', '')
            prompt += f"- Summary: {summary}\n"
            prompt += f"  Citation: {href}\n"
        prompt += "\n"

    prompt += f"Original Query: {original_query}"
    return flash_answer_(prompt)

def process_query(user_query):
    """
    Determines if a query requires search and breaks it into atomic questions if needed.
    """
    system_instruction = """
    Today Date = 29-03-2025
    You are an assistant that determines if a query requires internet search. Analyze the query and return a JSON with:
    1. "needs_search": boolean - true if the query requires recent information (after November 2023) or specific facts
    2. "reason": string - brief explanation why search is/isn't needed
    3. "atomic_questions": array - if query is complex, break it into smaller atomic questions (empty if search not needed)

    IMPORTANT: Any query about events, products, news, or data after November 2023 MUST have "needs_search" set to true.
    If you are not familiar with the term asked in the question then also turn "needs_search" to true.
    For small query return the original query. For complex and long queries requiring search, break them down into simpler 2-3 sub-questions.
    Whenever someone asks about recent add 2025 in the sub questions.
    """

    generation_config = {
        "temperature": 0.2,
        "top_p": 0.95,
        "top_k": 40,
        "max_output_tokens": 8192,
        "response_mime_type": "application/json",
    }

    model = genai.GenerativeModel(
        model_name="gemini-2.0-flash-001",
        generation_config=generation_config,
        system_instruction=system_instruction,
    )

    chat_session = model.start_chat(history=[])
    response = chat_session.send_message(user_query)

    try:
        result = json.loads(response.text)
        return result
    except json.JSONDecodeError:
        # Fallback if response is not valid JSON
        return {
            "needs_search": True,
            "reason": "Failed to parse response, defaulting to search required",
            "atomic_questions": [user_query]
        }

def process_with_full_search(original_query: str, process_query_result: dict, progress_callback=None) -> dict:
    """
    If search is not required (needs_search is false), directly answer the query
    Otherwise, for each atomic sub-question, perform full search with content extraction and summarization,
    then combine the summaries to answer the original query with citations.
    """
    if not process_query_result.get("needs_search", True):
        if progress_callback:
            progress_callback("Direct answer (no search needed)")
        answer = flash_answer(original_query)
        return {"answer": answer}
    else:
        atomic_questions = process_query_result.get("atomic_questions", [])
        if progress_callback:
            progress_callback(f"Processing {len(atomic_questions)} atomic questions")

        atomic_summaries = process_all_atomic_questions(atomic_questions, progress_callback)

        if progress_callback:
            progress_callback("Generating final answer")

        final_ans = final_answer(original_query, atomic_summaries)
        return {"answer": final_ans, "atomic_summaries": atomic_summaries}

# ---------------------------------------------------------------------------
# Enhanced UI Components with Markdown Support
# ---------------------------------------------------------------------------
def create_search_ui():
    """Create and display the search interface"""
    display(HTML(css_style))

    # App container
    app_html = """
    <div class="app-container">
        <div class="header">
            <h1>AI-Powered Web Search</h1>
            <p>Ask any question to get researched answers with citations</p>
        </div>
    </div>
    """
    display(HTML(app_html))

    # Create widgets
    query_input = widgets.Text(
        description='Query:',
        placeholder='Enter your question here...',
        layout=widgets.Layout(width='80%')
    )

    search_button = widgets.Button(
        description='Search',
        button_style='primary',
        icon='search'
    )

    debug_checkbox = widgets.Checkbox(
        value=False,
        description='Show debug info',
        layout=widgets.Layout(width='auto')
    )

    output_area = widgets.Output()
    status_area = widgets.Output()

    # Display widgets
    display(widgets.HBox([query_input, search_button]))
    display(debug_checkbox)
    display(status_area)
    display(output_area)

    # Progress updates
    def update_status(message):
        with status_area:
            clear_output(wait=True)
            status_html = f'<div class="status-message"><div class="loader"></div> {message}</div>'
            display(HTML(status_html))

    # Handle search button click
    def on_search_button_clicked(b):
        query = query_input.value.strip()
        if not query:
            with output_area:
                clear_output(wait=True)
                display(HTML('<div class="error-message">Please enter a query</div>'))
            return

        with output_area:
            clear_output(wait=True)

        # Process the query
        update_status("Analyzing your query...")
        start_time = time.time()

        try:
            query_analysis = process_query(query)
            show_debug = debug_checkbox.value

            # Display query analysis if debug is enabled
            if show_debug:
                with output_area:
                    analysis_json = json.dumps(query_analysis, indent=2)
                    debug_html = f"""
                    <div class="debug-info" style="display: block;">
                        <h3>Query Analysis:</h3>
                        <pre>{analysis_json}</pre>
                    </div>
                    """
                    display(HTML(debug_html))

            if not query_analysis.get("needs_search", True):
                update_status("Generating direct answer (no search needed)...")
                result = process_with_full_search(query, query_analysis, update_status)

                with output_area:
                    # Convert Markdown to HTML
                    answer_html = markdown2.markdown(result['answer'])
                    time_taken = time.time() - start_time
                    direct_html = f"""
                    <div class="result-container">
                        <h2>Answer</h2>
                        <div class="answer-box">{answer_html}</div>
                        <p><em>Answered directly without search (took {time_taken:.2f} seconds)</em></p>
                    </div>
                    """
                    display(HTML(direct_html))
            else:
                atomic_questions = query_analysis.get("atomic_questions", [])
                update_status(f"Breaking down into {len(atomic_questions)} sub-questions...")

                result = process_with_full_search(query, query_analysis, update_status)

                # Convert Markdown to HTML
                answer_html_content = markdown2.markdown(result['answer'])
                time_taken = time.time() - start_time

                # Build the HTML in parts to avoid nested f-string issues
                answer_html = f"""
                <div class="result-container">
                    <h2>Answer</h2>
                    <div class="answer-box">{answer_html_content}</div>
                """

                if show_debug and "atomic_summaries" in result:
                    answer_html += """
                    <div class="atomic-container">
                        <h3>Research Process:</h3>
                    """

                    for question, summaries in result["atomic_summaries"].items():
                        question_html = f"""
                        <div class="atomic-question">{question}</div>
                        """
                        answer_html += question_html

                        for summary in summaries:
                            summary_text = summary['summary'].replace('\n', '<br>')
                            summary_href = summary['href']
                            summary_title = summary['title'] or summary['href']

                            summary_html = f"""
                            <div class="summary-item">
                                <div>{summary_text}</div>
                                <div class="citation">Source: <a href="{summary_href}" target="_blank">{summary_title}</a></div>
                            </div>
                            """
                            answer_html += summary_html

                    answer_html += "</div>"

                footer_html = f"""
                    <p><em>Researched answer (took {time_taken:.2f} seconds)</em></p>
                </div>
                """
                answer_html += footer_html

                with output_area:
                    display(HTML(answer_html))

        except Exception as e:
            with output_area:
                error_message = str(e)
                error_html = f"""
                <div class="error-message">
                    <h3>Error occurred:</h3>
                    <p>{error_message}</p>
                </div>
                """
                display(HTML(error_html))

        with status_area:
            clear_output()

    search_button.on_click(on_search_button_clicked)

    # Handle Enter key in query input
    def on_enter(widget):
        on_search_button_clicked(None)

    query_input.on_submit(on_enter)

    return {
        'query_input': query_input,
        'search_button': search_button,
        'output_area': output_area,
        'status_area': status_area
    }

# Function to run the app
def run_search_app():
    """Main function to run the search application"""
    # Check if API key is set
    if not api_key or api_key == "YOUR_API_KEY_HERE":
        api_key_warning = """
        <div style="color: red; padding: 10px; background-color: #ffe6e6; border-radius: 5px; margin: 10px 0;">
            <h3>⚠️ API Key Missing</h3>
            <p>Please set your Google AI API key in the code before running.</p>
            <code>api_key = "your_actual_api_key_here"</code>
        </div>
        """
        display(HTML(api_key_warning))
        return

    # Create and display UI
    ui_components = create_search_ui()

    print("👆 Search app is ready to use!")

# Run the app when executed
if __name__ == "__main__" or 'google.colab' in str(get_ipython()):
    run_search_app()

👆 Search app is ready to use!


INFO:backoff:Backing off flash_answer(...) for 0.4s (requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')))
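
`process_query` trusts whatever JSON the model returns, so a reply with `"needs_search": true` and an empty `"atomic_questions"` list would reach `final_answer` with nothing to cite. A minimal sketch of a defensive parsing step follows, assuming the three keys requested in the system instruction. `normalize_analysis` is a hypothetical helper, not part of the notebook.

```python
import json

def normalize_analysis(raw_text: str, user_query: str) -> dict:
    """Parse the model's JSON reply and fill in safe defaults."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}

    # Anything other than a real boolean defaults to "search required".
    needs_search = data.get("needs_search", True)
    if not isinstance(needs_search, bool):
        needs_search = True

    # Keep only non-empty string questions.
    questions = data.get("atomic_questions")
    if not isinstance(questions, list):
        questions = []
    questions = [q.strip() for q in questions if isinstance(q, str) and q.strip()]

    # If a search is needed but no sub-questions came back, search the query itself.
    if needs_search and not questions:
        questions = [user_query]

    return {
        "needs_search": needs_search,
        "reason": str(data.get("reason", "")),
        "atomic_questions": questions,
    }

print(normalize_analysis('{"needs_search": true, "atomic_questions": []}', "latest UPSC news"))
```

Falling back to the original query mirrors what the existing `JSONDecodeError` branch already does, so both failure modes end up on the same path.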


#not needed

## Test

In [None]:
import json
from typing import List, Dict, Optional, Union

def search_ddg(
    query: str,
    search_type: str = "search",
    max_results: int = 5,
    timeout: int = 10
) -> List[Dict]:
    """
    A simple function to search DuckDuckGo and return results.

    Args:
        query (str): The search query
        search_type (str): Type of search - "search" or "news"
        max_results (int): Maximum number of results to return
        timeout (int): Request timeout in seconds

    Returns:
        List[Dict]: List of search results as dictionaries
    """
    try:
        from duckduckgo_search import DDGS
    except ImportError:
        print("Error: 'duckduckgo-search' package not installed!")
        print("Install it with: pip install duckduckgo-search")
        return []

    # Create DDGS instance
    ddgs = DDGS(timeout=timeout)

    # Perform the search based on search_type
    if search_type.lower() == "news":
        return ddgs.news(keywords=query, max_results=max_results)
    else:
        return ddgs.text(keywords=query, max_results=max_results)


def simple_ddg_search(
    query: str,
    search_type: str = "search",
    max_results: int = 5,
    return_json: bool = False
) -> Union[List[Dict], str]:
    """
    Even simpler wrapper function for DuckDuckGo searches.

    Args:
        query (str): What to search for
        search_type (str): "search" or "news"
        max_results (int): How many results to return
        return_json (bool): If True, returns JSON string instead of Python list

    Returns:
        Union[List[Dict], str]: Search results as list of dicts or JSON string
    """
    results = search_ddg(query, search_type, max_results)

    if return_json:
        return json.dumps(results, indent=2)
    return results


# Example usage:
if __name__ == "__main__":
    # Example 1: Basic search returning Python objects
    results = simple_ddg_search("Government schemes leveraging NGO networks for implementation in india", max_results=10)
    print("\nSearch Results:")
    for i, result in enumerate(results, 1):
        print(result)
    # Example 2: News search with JSON output
    news_json = simple_ddg_search("Government schemes leveraging NGO networks for implementation in india", search_type="news", return_json=True)

    print(f"\nNews Results (JSON):\n{news_json}")

In [16]:
import os
import json
import google.generativeai as genai
from datetime import datetime

def process_query_(user_query):
    # Configure the API
    genai.configure(api_key="")  # use your Gemini API key from https://aistudio.google.com/apikey; never commit a real key

    # Define system instruction for question-focused summarization
    system_instruction_ = """
Summarize based on the question given
    """

    # Create the model with appropriate configuration
    generation_config = {
        "temperature": 0.2,  # Lower temperature for more deterministic results
        "top_p": 0.95,
        "top_k": 40,
        "max_output_tokens": 8192,
    #    "response_mime_type": "application/json",
    }

    model = genai.GenerativeModel(
        model_name="gemini-1.5-flash-002",
        generation_config=generation_config,
        system_instruction=system_instruction_,
    )

    # Start chat and send user query
    chat_session = model.start_chat(history=[])
    response = chat_session.send_message(user_query)

    # Parse the response
    try:
        result = json.loads(response.text)
        return result
    except json.JSONDecodeError:
        # Fallback if response is not valid JSON
        return response.text
# Example usage
if __name__ == "__main__":
    query = answer
    result = process_query_(query)
    print(json.dumps(result, indent=2))

"Child labor remains a significant problem globally, particularly in Africa and South America.  South Sudan has the highest rate (48%), followed by Ethiopia (45%), Burkina Faso (42%), and Cameroon and Chad (both 39%).  While many countries have high rates, some like Barbados and Saint Lucia report very low rates (1%).  The data shows a disparity in child labor participation between genders in several countries.\n"


## Bring full content

In [13]:
import requests
from bs4 import BeautifulSoup
from duckduckgo_search import DDGS

def get_full_content(url):
    """Fetch the full content of a webpage"""
    try:
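        # A timeout keeps one slow site from stalling the whole pipeline.
        response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=10)
        # If the server declares no charset, use the detected encoding to avoid garbled text.
        if 'charset' not in response.headers.get('Content-Type', '').lower():
            response.encoding = response.apparent_encoding
        soup = BeautifulSoup(response.text, 'html.parser')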

        # Basic extraction - you might need to refine this based on site structure
        paragraphs = soup.find_all('p')
        full_text = ' '.join([p.get_text() for p in paragraphs])
        return full_text
    except Exception as e:
        return f"Error fetching content: {str(e)}"

def search_with_full_content(query, max_results=1):
    """Search and retrieve full content for each result"""
    ddgs = DDGS()
    results = ddgs.text(keywords=query, max_results=max_results)

    enhanced_results = []
    for result in results:
        # Add full content to each result
        result['full_content'] = get_full_content(result['href'])
        enhanced_results.append(result)

    return enhanced_results
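
When a fetch fails, `get_full_content` returns the string `"Error fetching content: ..."`, and the summarizer would then receive that string as page text. Pages with no `<p>` tags produce an empty string. A hedged sketch of a filter to run before summarization follows. `usable_results` and its `max_chars` cap are hypothetical, and the 20000-character default is an arbitrary illustration rather than a tuned value.

```python
ERROR_PREFIX = "Error fetching content:"  # prefix get_full_content returns on failure

def usable_results(results: list, max_chars: int = 20000) -> list:
    """Drop failed or empty fetches and cap the text length of the rest."""
    cleaned = []
    for result in results:
        text = (result.get("full_content") or "").strip()
        if not text or text.startswith(ERROR_PREFIX):
            continue  # nothing worth summarizing
        # Copy the result so the caller's dict is left untouched.
        cleaned.append({**result, "full_content": text[:max_chars]})
    return cleaned

sample = [
    {"href": "https://example.com/ok", "full_content": "Some article text. " * 3},
    {"href": "https://example.com/bad", "full_content": "Error fetching content: timeout"},
    {"href": "https://example.com/empty", "full_content": ""},
]
print([r["href"] for r in usable_results(sample, max_chars=10)])
```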

In [30]:
a

[{'title': 'Child Labor by Country 2025 - World Population Review',
  'href': 'https://worldpopulationreview.com/country-rankings/child-labor-by-country',
  'body': 'Child labor is a controversial practice and is often illegal in many countries. It involves using children between the ages of 5 and 17 years old who are used for labor in a commercial or business setting.',
  'full_content': '48%  45%  42%  39%  39%  Child labor is a controversial practice and is often illegal in many countries. It involves using children between the ages of 5 and 17 years old who are used for labor in a commercial or business setting. However, just because it is frowned upon, that doesnâ\x80\x99t mean that it isnâ\x80\x99t still a popular practice around the globe. In this article, we will take a closer look at which countries still rely heavily on child labor and how the numbers stack up among one another. The majority of the countries that still participate in child labor practices are located in eithe
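
Sequences such as `â\x80\x99` in scraped text are what the UTF-8 bytes of a curly apostrophe look like when decoded as Latin-1. This likely happens because `requests` falls back to ISO-8859-1 for text responses whose `Content-Type` header names no charset. The sketch below reverses that mis-decoding. `fix_mojibake` is a hypothetical helper, and it only handles this one Latin-1/UTF-8 mix-up.

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mistakenly decoded as Latin-1."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not Latin-1-decoded UTF-8, so leave it unchanged.
        return text

garbled = "doesn\u00e2\x80\x99t"  # 'â' followed by the raw bytes 0x80 0x99
repaired = fix_mojibake(garbled)
print(repaired)
```

Repairing after the fact is a fallback. Setting `response.encoding` before reading `response.text` avoids the problem at the source.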

[Function Calling](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/function-calling/intro_function_calling.ipynb)

ChromaDb