<a href="https://colab.research.google.com/github/AbdullahFaiza/Deep-Learning-Spring-2025/blob/main/CapstoneProject/Colab/FNCode/ReseaAIAgent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **PROJECT RESEA - A VIRTUAL RESEARCH AGENT**

Resea is a Virtual Research Assistant built to automate and streamline research workflows. It can:

1. Fetch data from the web through a Google search via SerpAPI.
2. Summarize research articles and other content using Hugging Face transformers.
3. Generate a research report as a DOCX file with references and sources, plus a short PDF summary.
4. Offer a simple web interface through Gradio.

The goal of this project is to simplify research tasks so that students, professionals, and researchers can efficiently gather, summarize, and report relevant data on a given topic.

# **Pip Install Section:**

In [1]:
# === Install Dependencies ===
# Make sure to install the necessary dependencies before running the notebook.
# These libraries are required for fetching search results, summarizing the content, generating reports and creating a web interface.

# To avoid widget or rendering issues when the notebook is viewed on GitHub, pin nbformat and install ipywidgets:
!pip install nbformat==4.2.0 ipywidgets

# Install essential libraries
!pip uninstall -y serpapi -q
!pip install -q google-search-results transformers newspaper3k wikipedia python-docx lxml[html_clean] reportlab gradio



# **Imports and Setup:**

In [2]:
# === Imports ===
# Importing necessary libraries for the project:
import os  # Operating system interface for file operations
import re  # Regular expression for string pattern matching and manipulation
import wikipedia  # Wikipedia API for fetching data from Wikipedia
import gradio as gr  # Gradio for creating a simple web interface
from urllib.parse import urlparse  # Parsing URLs to extract domain
from newspaper import Article  # Newspaper3k to extract articles from URLs
from serpapi import GoogleSearch  # For performing Google search through SerpAPI
from transformers import pipeline  # Huggingface's pipeline for NLP tasks (summarization)
from docx import Document  # Library for creating and saving Word documents
from reportlab.pdfgen import canvas  # Library to create PDF reports


# **API Keys and Configuration:**

In [17]:
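# === SerpAPI Key ===
# Read the key from the environment instead of hardcoding a secret in the notebook.
# Set it beforehand, e.g. os.environ["SERPAPI_KEY"] = "<your key>".
SERPAPI_KEY = os.environ.get("SERPAPI_KEY", "")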

# **Pip Install Section (continued) - Libraries for Summarization:**

In [18]:
# === Install Libraries for Summarization ===
# The Hugging Face summarization pipeline requires transformers.
# It is already installed by the first cell, so re-running the install here is harmless.
!pip install -q transformers



# **Summarizing Content Using Hugging Face**

In [19]:
# === Summarizing the Text ===
# Using Hugging Face's transformers library, we utilize the pre-trained BART model to summarize the content.
# This summarization is important to extract key insights from longer content.

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")


Device set to use cpu


# **Wikipedia and Web Search Functions:**

In [20]:
# === Wikipedia Functions ===
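def get_wikipedia_summary(topic):
    """Return up to 3000 characters of the Wikipedia article text for `topic`."""
    try:
        page = wikipedia.page(topic)
        # Strip section headings such as "== History =="
        text = re.sub(r'==.*?==+', '', page.content)
        return text[:3000]
    except Exception as e:
        return f"Wikipedia Error: {e}"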

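def get_references(topic):
    """Return up to five article references, each tagged with a heuristic credibility score."""
    try:
        page = wikipedia.page(topic)
        refs = page.references[:5]
        scored = []
        for ref in refs:
            domain = urlparse(ref).netloc
            # Heuristic: .edu/.gov domains score 5, .org scores 4, everything else 3
            score = 5 if domain.endswith('.edu') or domain.endswith('.gov') else 4 if domain.endswith('.org') else 3
            scored.append(f"{ref} [Credibility Score: {score}/5]")
        return scored
    except Exception:  # a bare except would also swallow KeyboardInterrupt and SystemExit
        return ["No references found."]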
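The credibility heuristic in `get_references` scores a reference purely by its domain suffix. As a minimal sketch, the same rule could be pulled into a standalone helper so it is applied consistently wherever a URL is scored. The name `credibility_score` is hypothetical and not part of the notebook, and the scores remain a rough heuristic rather than a real measure of reliability.

```python
from urllib.parse import urlparse


def credibility_score(url: str) -> int:
    """Heuristic score based only on the host's domain suffix (hypothetical helper)."""
    # hostname is None for URLs without a scheme, so fall back to an empty string
    host = (urlparse(url).hostname or "").lower()
    if host.endswith((".edu", ".gov")):
        return 5
    if host.endswith(".org"):
        return 4
    return 3


print(credibility_score("https://www.mit.edu/research"))       # 5
print(credibility_score("https://example.org/page"))           # 4
print(credibility_score("https://example.com/docs.gov.html"))  # 3
```

Checking the parsed host rather than the whole URL avoids false matches when `.gov` or `.edu` appears only in the path, as in the last call.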

In [21]:
# === Web Search & Summarize ===
def get_web_results(topic):
    """Return the links of up to three organic Google results for `topic` via SerpAPI."""
    search = GoogleSearch({"q": topic, "api_key": SERPAPI_KEY, "num": 3})
    results = search.get_dict().get("organic_results", [])
    return [res.get("link") for res in results if res.get("link")]

def summarize_article(url):
    """Download the article at `url` and summarize its first 2000 characters."""
    try:
        article = Article(url)
        article.download()
        article.parse()
        if not article.text.strip():
            return f"[Empty article at {url}]"
        # Keep the input short; the model accepts only a limited number of tokens
        text = article.text[:2000]
        summary = summarizer(text, max_length=130, min_length=30, do_sample=False)
        return summary[0]['summary_text']
    except Exception as e:
        return f"[Error summarizing {url}: {e}]"
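`summarize_article` only looks at the first 2000 characters of an article, so anything after that point is ignored. One possible alternative, sketched here under the assumption that splitting on whitespace is acceptable, is to break the text into chunks and summarize each one. The helper name `chunk_text` is hypothetical and does not appear in the notebook.

```python
def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    """Split text on whitespace into chunks of at most max_chars characters.

    A single word longer than max_chars is kept whole in its own chunk.
    Runs of whitespace (including newlines) are collapsed to single spaces.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            # The next word would overflow the chunk, so start a new one
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("a b c d", max_chars=3))  # ['a b', 'c d']
```

Each chunk could then be passed to `summarizer` in turn and the partial summaries joined, at the cost of one model call per chunk.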

# **Creating and Saving the Research Report:**

In [22]:
# === Generating the Report ===
# The report is generated in two formats: DOCX and PDF.
# It includes the Wikipedia summary, references, and summaries from web articles.

def generate_research_report(topic):
    summary = get_wikipedia_summary(topic)
    refs = get_references(topic)
    urls = get_web_results(topic)
    web_summaries = [(url, summarize_article(url)) for url in urls]

    # Save as DOCX
    doc = Document()
    doc.add_heading(f"Research Report: {topic}", 0)

    doc.add_heading("Wikipedia Summary", level=1)
    doc.add_paragraph(summary)

    doc.add_heading("Wikipedia References", level=1)
    for ref in refs:
        doc.add_paragraph(ref)

    doc.add_heading("Web Summaries", level=1)
    for url, summ in web_summaries:
        domain = urlparse(url).netloc  # score by domain, consistent with get_references
        score = 5 if domain.endswith(('.edu', '.gov')) else 4 if domain.endswith('.org') else 3
        doc.add_paragraph(f"{url} [Credibility Score: {score}/5]\nSummary: {summ}")

    docx_file = f"{topic.replace(' ', '_')}_Resea_Report.docx"
    doc.save(docx_file)

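    # Save as PDF. drawString does not wrap text, so wrap the summary manually.
    import textwrap  # standard library; imported here to keep this cell self-contained
    pdf_file = f"{topic.replace(' ', '_')}_Resea_Summary.pdf"
    pdf = canvas.Canvas(pdf_file)
    pdf.drawString(50, 800, f"Research Summary: {topic}")
    y = 780
    for line in textwrap.wrap(summary[:500], width=90):
        pdf.drawString(50, y, line)
        y -= 14  # move down one line
    pdf.save()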

    # Return results to UI
    formatted = f"""
📘 Research Report: {topic}

📝 Wikipedia Summary:
{summary}

🔗 References:
{chr(10).join(refs)}

🌐 Web Articles:
"""
    for url, summ in web_summaries:
        domain = urlparse(url).netloc  # score by domain, consistent with get_references
        score = 5 if domain.endswith(('.edu', '.gov')) else 4 if domain.endswith('.org') else 3
        formatted += f"\n- {url} [Credibility Score: {score}/5]\nSummary: {summ[:400]}\n"

    return formatted, docx_file, pdf_file
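The report file names are built with `topic.replace(' ', '_')`, which leaves characters such as `/` or `:` in place, so a topic like "AC/DC" would be treated as a path. A minimal sketch of a stricter approach follows. The helper name `safe_filename` is hypothetical, and the choice to keep only ASCII letters, digits, hyphens, and underscores is an assumption that would also strip non-ASCII characters from a topic.

```python
import re


def safe_filename(topic: str, suffix: str) -> str:
    """Build a file name from a topic, replacing unsafe characters (hypothetical helper)."""
    # Collapse every run of characters outside [A-Za-z0-9_-] into one underscore
    slug = re.sub(r"[^A-Za-z0-9_-]+", "_", topic.strip()).strip("_")
    # Fall back to a fixed name when nothing usable remains
    return f"{slug or 'report'}{suffix}"


print(safe_filename("AC/DC: history", "_Resea_Report.docx"))  # AC_DC_history_Resea_Report.docx
print(safe_filename("   ", "_Resea_Summary.pdf"))             # report_Resea_Summary.pdf
```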



# **Gradio Interface (for Web Interface):**

In [23]:
# === Gradio Interface ===
# Gradio is used here to create a simple web interface where users can input their research topic,
# click a button to generate a report, and download the generated DOCX and PDF files.

from google.colab import files  # Used for file upload in Colab
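files.upload()  # Upload resea_mascot.png if it is not already in the working directory; the name must match the path passed to gr.Image below (file names are case-sensitive on Colab)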

with gr.Blocks() as demo:
    with gr.Row():  # Add a row in the layout
        gr.Image("resea_mascot.png", width=180, show_label=False)  # Display an image
    gr.Markdown("""# 🤖 Resea: Virtual Research Assistant
                  Type a research topic below and get a complete report including citations, credibility, and summaries.
                  👉 If viewing inline, open the public gradio.live URL printed at launch to use the full app in a new tab.""")
    topic_input = gr.Textbox(label="Enter Research Topic")  # Textbox to input topic
    generate_btn = gr.Button("Generate Report")  # Button to trigger report generation
    output_text = gr.Textbox(lines=20, label="Formatted Report")  # Textbox to display report
    docx_file = gr.File(label="Download DOCX")  # File output for the generated DOCX
    pdf_file = gr.File(label="Download PDF")  # File output for the generated PDF

    # Define what happens when the button is clicked
    generate_btn.click(fn=generate_research_report, inputs=topic_input, outputs=[output_text, docx_file, pdf_file])

    # Instructions for users
    gr.Markdown("After clicking Generate Report, download links appear in the DOCX and PDF boxes above. To view the full app in a new tab, open the public gradio.live URL printed at launch.")

demo.launch(share=True)  # Launch the Gradio interface, allowing sharing via a public link


Saving Resea_Mascot.png to Resea_Mascot (2).png
Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://2d8be2cedb94c07dd4.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


