<a href="https://colab.research.google.com/github/Iffraah96/Deep-Learning-AI-ITAI-2376-/blob/main/Virtual_AI_Research_Agent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**MODULE 1: Input Processor**

Classify user input into one (or more) of the following intents:

1. "search": Find papers based on a topic.
2. "summarize": Summarize a given paper or abstract.
3. "connect": Find relationships between sources.
4. "cite": Generate APA/MLA citations.

In [1]:
import re

def input_processor(user_input):
    user_input = user_input.lower().strip()

    tasks = {
        "search": [
            r'\b(find|search|get|look for|collect|gather|explore)\b',
            r'\b(paper|article|source|research|study|topic)\b'
        ],
        "summarize": [
            r'\b(summarize|summarise|summary|give me a summary|what is this about)\b'
        ],
        "connect": [
            r'\b(connection|relationship|link|relate|association)\b',
            r'\bhow (are|do)\b.*\b(related|connected|link)\b'
        ],
        "cite": [
            r'\b(cite|citation|apa|mla|format|how do i cite|reference)\b'
        ]
    }

    detected = []
    for task, patterns in tasks.items():
        for pattern in patterns:
            if re.search(pattern, user_input):
                detected.append(task)
                break  # no need to match more patterns for same task

    if not detected:
        detected = ["search"]  # fallback

    return detected

#--------------------------------------------------------------------------------------------------------------
#-------------------------------------------Example------------------------------------------------------------
#--------------------------------------------------------------------------------------------------------------
print(input_processor("Can you help me find research on AI in education?"))

print(input_processor("Summarize this article on bias in facial recognition."))

print(input_processor("What's the connection between these two papers?"))

print(input_processor("Format this citation in APA style"))


['search']
['search', 'summarize']
['connect']
['cite']


**MODULE 2**: **Search Tool — Academic Paper Search via Semantic Scholar API**


Search for academic papers based on a topic using the Semantic Scholar API, and return:
1. Title
2. Abstract
3. Authors
4. Year
5. URL

In [2]:
import requests

def search_semantic_scholar(query, limit=3):
    base_url = "https://api.semanticscholar.org/graph/v1/paper/search"
    params = {
        "query": query,
        "limit": limit,
        "fields": "title,abstract,authors,year,url"
    }

    try:
        response = requests.get(base_url, params=params)
        response.raise_for_status()  # Raise an exception for HTTP errors

        data = response.json()

        if data is None or "data" not in data:
            print("⚠️ Unexpected response format:")
            print(response.text)
            return {"error": "Unexpected API response. Please try again later."}

        results = []
        for item in data["data"]:
            paper = {
                "title": item.get("title", "N/A"),
                "abstract": item.get("abstract", "No abstract available."),
                "authors": [a.get('name', 'Unknown') for a in item.get("authors", [])],
                "year": item.get("year", "N/A"),
                "url": item.get("url", "N/A")
            }
            results.append(paper)

        if not results:
            return {"error": "No papers found. Try a more specific query."}

        return {"papers": results}

    except requests.exceptions.RequestException as e:
        return {"error": f"Search failed due to a connection issue: {str(e)}"}

    except Exception as e:
        return {"error": f"An unexpected error occurred: {str(e)}"}


#--------------------------------------------------------------------------------------------------------------
#-------------------------------------------Example------------------------------------------------------------
#--------------------------------------------------------------------------------------------------------------
query = "AI in Finance"
result = search_semantic_scholar(query)

if "papers" in result:
    for i, paper in enumerate(result["papers"]):
        print(f"\n📘 Paper {i+1}")
        print("Title:", paper["title"])
        print("Year:", paper["year"])
        print("Authors:", ", ".join(paper["authors"]))
        print("Abstract:", paper["abstract"][:300], "...")
        print("URL:", paper["url"])
else:
    print("❌", result["error"])



❌ Search failed due to a connection issue: 429 Client Error:  for url: https://api.semanticscholar.org/graph/v1/paper/search?query=AI+in+Finance&limit=3&fields=title%2Cabstract%2Cauthors%2Cyear%2Curl


**MODULE 3: Summarizer Tool**

Using HuggingFace Transformers


Automatically summarize long abstracts or full texts into short, clear summaries using a pre-trained language model.

In [3]:
#Install & Import Libraries (Colab)
!pip install transformers sentencepiece --quiet
from transformers import pipeline

# Initialize Summarization Model
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

#Define the Summarization Tool
def summarize_text(text, max_input_tokens=512):
    if not text or len(text.strip()) < 30:
        return "⚠️ Not enough content to summarize."

    try:
        input_text = text.strip().replace("\n", " ")
        input_text = input_text[:max_input_tokens]

        input_length = len(input_text.split())
        adjusted_max_length = max(30, int(input_length * 0.7))  # shorter than input

        summary = summarizer(
            input_text,
            max_length=adjusted_max_length,
            min_length=max(20, int(adjusted_max_length * 0.6)),
            do_sample=False
        )
        return summary[0]['summary_text']

    except Exception as e:
        return f"❌ Summarization failed: {str(e)}"


#--------------------------------------------------------------------------------------------------------------
#-------------------------------------------Example------------------------------------------------------------
#--------------------------------------------------------------------------------------------------------------
results = search_semantic_scholar("AI in Finance", limit=2)

if "papers" in results:
    for paper in results["papers"]:
        print(f"\n📘 {paper['title']}")
        summary = summarize_text(paper['abstract'])
        print("🔍 Summary:", summary)
else:
    print("❌", results["error"])



The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json: 0.00B [00:00, ?B/s]

pytorch_model.bin:   0%|          | 0.00/1.22G [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.22G [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

Device set to use cuda:0


❌ Search failed due to a connection issue: 429 Client Error:  for url: https://api.semanticscholar.org/graph/v1/paper/search?query=AI+in+Finance&limit=2&fields=title%2Cabstract%2Cauthors%2Cyear%2Curl


**MODULE 4: Memory Buffer - Store and retrieve**

Papers found by the Search Tool

Summaries created by the Summarizer Tool

Any connections or citations

User feedback (for RL-like learning later)

In [4]:
from datetime import datetime
import uuid

class MemoryBuffer:
    def __init__(self):
        self.memory = {}  # session-based storage

    def _generate_id(self):
        import uuid
        return str(uuid.uuid4())[:8]  # short unique ID

    def store_papers(self, query, papers):
        from datetime import datetime
        session_id = datetime.now().strftime('%Y%m%d%H%M%S')
        self.memory[session_id] = {
            "query": query,
            "papers": [],
        }

        for paper in papers:
            paper_id = self._generate_id()
            self.memory[session_id]["papers"].append({
                "id": paper_id,
                "title": paper['title'],
                "abstract": paper['abstract'],
                "authors": paper['authors'],
                "year": paper['year'],
                "url": paper['url'],
                "summary": None,
                "citation": None,
                "feedback": None
            })

        return session_id

    def store_summary(self, session_id, paper_id, summary):
        for paper in self.memory.get(session_id, {}).get("papers", []):
            if paper["id"] == paper_id:
                paper["summary"] = summary
                return True
        return False

    def store_citation(self, session_id, paper_id, citation):
        for paper in self.memory.get(session_id, {}).get("papers", []):
            if paper["id"] == paper_id:
                paper["citation"] = citation
                return True
        return False

    def get_papers(self, session_id):
        return self.memory.get(session_id, {}).get("papers", [])

    def get_summary(self, session_id, paper_id):
        for paper in self.get_papers(session_id):
            if paper["id"] == paper_id:
                return paper.get("summary", None)
        return None


#--------------------------------------------------------------------------------------------------------------
#-------------------------------------------Example------------------------------------------------------------
#--------------------------------------------------------------------------------------------------------------
# Step 1: Search and store
results = search_semantic_scholar("machine learning in fraud detection", limit=2)
buffer = MemoryBuffer()

if "papers" in results:
    session = buffer.store_papers("machine learning in fraud detection", results["papers"])
    papers = buffer.get_papers(session)

    for paper in papers:
        print(f"\n📘 {paper['title']} ({paper['id']})")
        summary = summarize_text(paper['abstract'])
        buffer.store_summary(session, paper['id'], summary)
        print("🔍 Summary:", summary)



📘 AI and Machine Learning In Fraud Detection : Securing Digital Payments and Economic Stability (c2150e97)
🔍 Summary:  As digital payment fraud escalates, traditional models struggle to address increasingly sophisticated tactics such as phishing, account takeovers, and salami slicing . AI/ML-driven solutions include graph-based anomaly detection,

📘 Machine Learning In Fraud Detection and Prevention (f45c0ec8)
🔍 Summary:  Financial fraud refers to unauthorized mobile transactions using mobile platforms to fraudulently obtain funds through identity theft or credit card theft . Digitization has revolutionized the daily tasks we perform at the click of a button .


**MODULE 5: Citation Formatter (APA Style)**

Format papers stored in memory into APA-style citations using their:

1. Authors
2. Year
3. Title
4. URL

Basic APA Format:

'AuthorLast, F. (Year). *Title of the paper*. Retrieved from URL'

For multiple authors, we list up to 3 (or use “et al.” after that for simplicity).

In [5]:
def format_apa_citation(paper):
    authors = paper.get("authors", [])
    if not authors:
        author_part = "Unknown"
    elif len(authors) == 1:
        author_part = authors[0]
    elif len(authors) == 2:
        author_part = f"{authors[0]} & {authors[1]}"
    elif len(authors) == 3:
        author_part = f"{authors[0]}, {authors[1]}, & {authors[2]}"
    else:
        author_part = f"{authors[0]} et al."

    year = paper.get("year", "n.d.")
    title = paper.get("title", "Untitled")
    url = paper.get("url", "")

    citation = f"{author_part} ({year}). *{title}*. Retrieved from {url}"
    return citation

#Integrate with Memory Buffer - extend the memory buffer to store the citation:

def store_citation(self, session_id, paper_id, citation):
    for paper in self.memory.get(session_id, {}).get("papers", []):
        if paper["id"] == paper_id:
            paper["citation"] = citation
            return True
    return False

#You’d then use it like this:


for paper in buffer.get_papers(session):
    citation = format_apa_citation(paper)
    buffer.store_citation(session, paper["id"], citation)
    print("📄 APA Citation:", citation)


📄 APA Citation: Prakash Raju Kantheti & Prof. Stella Bvuma (2024). *AI and Machine Learning In Fraud Detection : Securing Digital Payments and Economic Stability*. Retrieved from https://www.semanticscholar.org/paper/d59b2a77ad9e0973ea4b99a5f359d92f06f44704
📄 APA Citation: Kamini Pareek et al. (2023). *Machine Learning In Fraud Detection and Prevention*. Retrieved from https://www.semanticscholar.org/paper/27fd8fcb19b1edc7ee391a18a8cbab8f0527b181


In [6]:
print(input_processor("Find and summarize papers on AI in agriculture and generate citations and cite."))

['search', 'summarize', 'cite']


**MODULE 6: ReAct Controller (Reasoning and Acting Loop)**

Create a simple controller that:

1. Thinks step-by-step
2. Chooses which tool to use (search, summarize, cite)
3. Stores and tracks what’s done
4. Generates a final output

In [7]:
buffer = MemoryBuffer()

#Define ReAct Agent
def react_agent(user_input):
    output_log = []
    output_log.append("🤔 Thought: Understanding user request...")
    tasks = input_processor(user_input)
    output_log.append(f"🧠 Detected tasks: {tasks}")

    session_id = None
    paper_results = []

    if "search" in tasks:
        output_log.append("🔍 Action: Searching for papers...")
        results = search_semantic_scholar(user_input)
        if "error" in results:
            return f"❌ Observation: {results['error']}"
        paper_results = results["papers"]
        session_id = buffer.store_papers(user_input, paper_results)
        output_log.append(f"📦 Observation: {len(paper_results)} papers stored in session {session_id}")
    else: # If no search task, check if there's an active session to use
        # In a more complex agent, you might have logic to retrieve the latest session
        # For simplicity here, we assume if no search, user might be referring to the last search
        # This is a placeholder and can be improved
        latest_session_id = list(buffer.memory.keys())[-1] if buffer.memory else None
        if latest_session_id:
          session_id = latest_session_id
          output_log.append(f"📦 Observation: Using existing session {session_id}")
        else:
          return "⚠️ Please perform a search first before summarizing or citing."


    if "connect" in tasks and session_id:
        output_log.append("🔗 Action: Identifying connections between sources...")
        # Implement connection logic or a placeholder
        output_log.append("🔗 (Connection logic not yet implemented)")

    if "summarize" in tasks and session_id:
        output_log.append("📝 Action: Summarizing papers...")
        for paper in buffer.get_papers(session_id):
            summary = summarize_text(paper['abstract'])
            buffer.store_summary(session_id, paper['id'], summary)
            output_log.append(f"\n📘 {paper['title']}\n🔍 Summary: {summary}")

    if "cite" in tasks and session_id:
        output_log.append("📄 Action: Generating citations...")
        for paper in buffer.get_papers(session_id):
            citation = format_apa_citation(paper)
            buffer.store_citation(session_id, paper['id'], citation)
            output_log.append(f"\n📘 {paper['title']}\n📄 Citation: {citation}")

    # Remove this redundant check
    # if not session_id:
    #     return "🤷‍♂️ I couldn't process your request. Try rephrasing."

    output_log.append(f"\n✅ Session complete. Session ID: {session_id}")
    return "\n".join(output_log)

# This goes outside the function:
print(react_agent("machine learning in healthcare."))

🤔 Thought: Understanding user request...
🧠 Detected tasks: ['search']
🔍 Action: Searching for papers...
📦 Observation: 3 papers stored in session 20250724021616

✅ Session complete. Session ID: 20250724021616


**Adding a simple Gradio interface so user can interact with AI Virtual Research Agent from a web-based UI inside Google Colab.**

In [8]:
#Step 1: Install Gradio
!pip install gradio --quiet
import gradio as gr

#Step 2: Define a Wrapper Function
#This wraps the react_agent() function into something Gradio can use.

def assistant_interface(user_input):
    try:
        output = react_agent(user_input)
        return output
    except Exception as e:
        return f"❌ Error: {str(e)}"

#Step 3: Build the Gradio Interface
#Here’s a UI with:
#   1. An input box
#   2. A "Submit" button
#   3. Scrollable chatbot-like output

with gr.Blocks() as demo:
    gr.Markdown("## 🤖 Academic Research Assistant")
    gr.Markdown("Ask for papers, summaries, and citations. Try: 'Find papers on AI in healthcare and summarize them.'")

    user_input = gr.Textbox(label="Enter your request")
    output_box = gr.Textbox(label="Agent Response", lines=20)

    submit_button = gr.Button("Run Agent")

    submit_button.click(fn=assistant_interface, inputs=user_input, outputs=output_box)

# Step 4: Launch the Interface
demo.launch(share=True)

#🔗 share=True gives you a public link to test the agent outside of Colab too!



Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://7aeb487c81ace7a3b4.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


