### Research Assistant Agent Class

This code defines the `ResearchAssistantAgent` class, which simulates a research assistant. It's designed to:

- **Process user queries**: Extract the topic and scope of research.
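- **Search the web**: Find candidate URLs with the `googlesearch-python` library, then download each page with `requests` and extract its paragraph text with BeautifulSoup.
- **Evaluate source credibility**: Assign a heuristic score between 0.5 and 1.0: a 0.5 baseline, +0.3 for education or government domains (`edu`/`gov`), and +0.2 when the extracted text exceeds 500 characters.
- **Summarize content**: Build a short extractive summary of each source: its first two sentences followed by its three most frequent words longer than three characters.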
- **Generate a report**: Create a Word document (`.docx`) containing the research findings and citations using the `python-docx` library.
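- **Apply reinforcement learning**: Adjust a per-URL weight by `0.1 × reward` when feedback is supplied. This is a simple additive weight update rather than a full RL algorithm, and the current code records the weights without using them to rank or select sources.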
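The credibility heuristic is easier to trust when it matches the domain suffix rather than any substring of the URL, since a plain substring test also fires on hosts such as `reduce.com` (which contains `edu`). The sketch below is an assumption-based variant, not the notebook's own method: the function name `score_credibility` is hypothetical, while the weights (0.5 baseline, +0.3, +0.2, 500-character threshold) are taken from `evaluate_source_credibility`.

```python
from urllib.parse import urlparse


def score_credibility(url: str, content: str) -> float:
    """Heuristic credibility score using the notebook's weights (hypothetical helper)."""
    score = 0.5  # baseline for every source
    host = urlparse(url).hostname or ""  # urlparse lowercases the hostname
    # Reward only institutional top-level domains, e.g. "mit.edu" or "nasa.gov"
    if host.endswith(".edu") or host.endswith(".gov"):
        score += 0.3
    # Longer extracted text is treated as a weak signal of depth
    if len(content) > 500:
        score += 0.2
    return min(score, 1.0)


print(score_credibility("https://www.mit.edu/research", "x" * 600))
print(score_credibility("https://reduce.com/edu-tips", "short"))
```

The second call stays at the 0.5 baseline because only the hostname is inspected, not the path.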

**Key Methods:**

- `__init__()`: Initializes the agent, including its memory system to store sources, summaries, citations, feedback logs, and source weights.
- `process_input(query)`: Parses the user's research query.
- `evaluate_source_credibility(url, content)`: Calculates a credibility score for a given source.
- `search_web(query)`: Performs the web search and retrieves content from the top results.
- `summarize_content(content)`: Summarizes the text content of a source.
- `extract_keywords(content)`: Extracts keywords from the source content.
- `generate_report(query, sources)`: Creates the Word document report.
- `apply_reinforcement_learning(action, reward)`: Updates internal weights based on feedback (simulated).
- `run(query, user_feedback)`: The main method to execute the research process.
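`summarize_content` and `extract_keywords` are extractive and frequency-based. A standalone sketch of the same idea follows, assuming two small refinements: sentences are split *after* `.`, `!`, or `?` so the punctuation survives, and a short stopword list keeps filler words out of the keywords. The names `split_sentences`, `top_keywords`, `summarize`, and `STOPWORDS` are hypothetical and not part of the notebook.

```python
import re
from collections import Counter
from typing import List

# Hypothetical minimal stopword list; a real one (e.g. from NLTK) would be longer
STOPWORDS = {"that", "this", "with", "from", "have", "which", "their", "there", "were", "been"}


def split_sentences(text: str) -> List[str]:
    """Split after sentence-ending punctuation, keeping the punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def top_keywords(text: str, k: int = 3) -> List[str]:
    """Return the k most frequent words longer than 3 letters, skipping stopwords."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter(w for w in words if len(w) > 3 and w not in STOPWORDS)
    return [word for word, _ in counts.most_common(k)]


def summarize(text: str, n_sentences: int = 2) -> str:
    """First n sentences plus the top keywords, in the format summarize_content uses."""
    summary = " ".join(split_sentences(text)[:n_sentences])
    return f"{summary} (Keywords: {', '.join(top_keywords(text))})"


sample = ("AI ethics studies fairness. Fairness matters for AI systems! "
          "Researchers debate fairness and accountability? Accountability is hard.")
print(summarize(sample))
```

`Counter.most_common` breaks ties by first occurrence, so the keyword order is deterministic for a given text.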

### Library Installation and Imports

This code cell installs the necessary Python libraries and imports the required modules for the `ResearchAssistantAgent`.

- `!pip install requests beautifulsoup4 googlesearch-python`: Installs libraries for making HTTP requests, parsing HTML, and performing Google searches.
- `!pip install python-docx`: Installs the library for creating and modifying Word documents.
- `!pip install numpy`: Installs NumPy for numerical operations. The agent imports it as `np` but does not currently use it.

The subsequent import statements make the functions and classes from these libraries available for use in the notebook.
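Because `!pip` runs in a subshell, it can install into a different environment than the one the notebook kernel uses. A quick way to confirm what the kernel actually sees is the standard library's `importlib.metadata` (Python 3.8+). This is a sketch: the package list mirrors the install commands above, and `installed_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Sequence

# Distribution names as passed to pip (the import names differ: bs4, docx, googlesearch)
PACKAGES = ["requests", "beautifulsoup4", "googlesearch-python", "python-docx", "numpy"]


def installed_versions(packages: Sequence[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None  # not installed in this interpreter's environment
    return result


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```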

In [1]:
# Install necessary libraries
!pip install requests beautifulsoup4 googlesearch-python
!pip install python-docx
!pip install numpy

import requests
from bs4 import BeautifulSoup
from googlesearch import search
import numpy as np
from docx import Document
import re
from typing import List, Dict, Tuple
import json


Collecting googlesearch-python
  Downloading googlesearch_python-1.3.0-py3-none-any.whl.metadata (3.4 kB)
Downloading googlesearch_python-1.3.0-py3-none-any.whl (5.6 kB)
Installing collected packages: googlesearch-python
Successfully installed googlesearch-python-1.3.0
Collecting python-docx
  Downloading python_docx-1.2.0-py3-none-any.whl.metadata (2.0 kB)
Downloading python_docx-1.2.0-py3-none-any.whl (252 kB)
Installing collected packages: python-docx
Successfully installed python-docx-1.2.0


In [3]:
class ResearchAssistantAgent:
    def __init__(self):
        # Memory system: Store search results, summaries, and citations
        self.memory = {
            'sources': [],  # List of (url, content, credibility_score)
            'summaries': [],  # List of summaries
            'citations': [],  # List of citation entries
            'feedback_log': [],  # List of (action, reward) for RL
            'source_weights': {}  # Weights for source selection (RL policy)
        }
        self.max_sources = 5  # Limit number of sources to manage resources

    def process_input(self, query: str) -> Tuple[str, Dict]:
        """Parse user query to extract topic and scope."""
        # Basic input validation
        if not query or len(query.strip()) < 3:
            raise ValueError("Invalid query: Query must be at least 3 characters long.")
        if any(word in query.lower() for word in ['inappropriate', 'harmful']):
            raise ValueError("Query contains inappropriate content.")

        # Extract topic and scope (simplified parsing)
        topic = query.strip()
        scope = {'max_results': self.max_sources, 'topic': topic}
        return topic, scope

    def evaluate_source_credibility(self, url: str, content: str) -> float:
        """Evaluate source credibility based on domain and content quality."""
        # Simplified credibility scoring
        credibility = 0.5  # Baseline score
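        from urllib.parse import urlparse  # stdlib URL parser
        host = urlparse(url).hostname or ''
        # Match the domain suffix, not any substring ('edu' also occurs in 'reduce.com')
        if host.endswith('.edu') or host.endswith('.gov'):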
            credibility += 0.3
        if len(content) > 500:  # Longer content often indicates depth
            credibility += 0.2
        return min(credibility, 1.0)

    def search_web(self, query: str) -> List[Dict]:
        """Perform web search and retrieve content."""
        sources = []
        try:
            for url in search(query, num_results=self.max_sources):
                try:
                    response = requests.get(url, timeout=5)
                    response.raise_for_status()  # Treat HTTP 4xx/5xx responses as fetch errors
                    soup = BeautifulSoup(response.text, 'html.parser')
                    content = ' '.join([p.text for p in soup.find_all('p')])
                    credibility = self.evaluate_source_credibility(url, content)
                    sources.append({
                        'url': url,
                        'content': content[:1000],  # Limit content size
                        'credibility': credibility
                    })
                    # Update source weights for RL
                    self.memory['source_weights'][url] = self.memory['source_weights'].get(url, 0.5) + credibility * 0.1
                except Exception as e:
                    print(f"Error fetching {url}: {str(e)}")
                    continue
        except Exception as e:
            print(f"Search error: {str(e)}")
            # Fallback strategy: Return cached sources if available
            if self.memory['sources']:
                return self.memory['sources']
        return sources

    def summarize_content(self, content: str) -> str:
        """Summarize content using simple truncation and keyword extraction."""
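        # Split after sentence-ending punctuation so each sentence keeps its period
        sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', content) if s.strip()]
        keywords = self.extract_keywords(content)
        summary = ' '.join(sentences[:2])  # Take first two sentences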
        return f"{summary} (Keywords: {', '.join(keywords)})"

    def extract_keywords(self, content: str) -> List[str]:
        """Extract top keywords from content."""
        words = re.findall(r'\w+', content.lower())
        word_freq = {}
        for word in words:
            if len(word) > 3:  # Ignore short words
                word_freq[word] = word_freq.get(word, 0) + 1
        return sorted(word_freq, key=word_freq.get, reverse=True)[:3]

    def generate_report(self, query: str, sources: List[Dict]) -> str:
        """Generate a Word document with findings and citations."""
        doc = Document()
        doc.add_heading(f"Research Report: {query}", 0)

        citations = []  # Fresh list per report so numbering restarts at 1
        for i, source in enumerate(sources, 1):
            doc.add_heading(f"Source {i}: {source['url']}", level=1)
            summary = self.summarize_content(source['content'])
            doc.add_paragraph(f"Summary: {summary}")
            doc.add_paragraph(f"Credibility Score: {source['credibility']:.2f}")
            citations.append(f"{i}. {source['url']}")

        doc.add_heading("Citations", level=1)
        for citation in citations:
            doc.add_paragraph(citation)
        self.memory['citations'] = citations  # Store this report's citations

        output_file = "research_report.docx"
        doc.save(output_file)
        return output_file

    def apply_reinforcement_learning(self, action: str, reward: float):
        """Update policy based on feedback."""
        self.memory['feedback_log'].append((action, reward))
        # Update source weights based on average reward
        if action.startswith("select_source_"):
            url = action[len("select_source_"):]
            current_weight = self.memory['source_weights'].get(url, 0.5)
            self.memory['source_weights'][url] = current_weight + reward * 0.1

    def run(self, query: str, user_feedback: float = None):
        """Main execution loop."""
        try:
            # Process input
            query, scope = self.process_input(query)

            # Retrieve and process sources
            sources = self.search_web(query)
            if not sources:
                return "No sources found. Please try a different query."

            self.memory['sources'] = sources
            for source in sources:
                self.memory['summaries'].append(self.summarize_content(source['content']))

            # Generate report
            report_file = self.generate_report(query, sources)

            # Apply RL if feedback provided
            if user_feedback is not None:
                for source in sources:
                    self.apply_reinforcement_learning(f"select_source_{source['url']}", user_feedback)

            return f"Report generated: {report_file}"
        except Exception as e:
            print(f"Error in agent execution: {str(e)}")
            return "An error occurred. Please try again."

### Example Usage

This code cell demonstrates how to use the `ResearchAssistantAgent` class.

- `agent = ResearchAssistantAgent()`: Creates an instance of the `ResearchAssistantAgent`.
- `result = agent.run("artificial intelligence ethics")`: Runs the agent with the query "artificial intelligence ethics", which triggers the web search, summarization, and report generation. Because no `user_feedback` argument is passed, the reinforcement learning step inside `run` is skipped.
- `print(result)`: Prints the result of the `run` method, which is a message indicating the report file name.
- `agent.apply_reinforcement_learning("select_source_example.com", 1.0)`: Simulates user feedback. In a real application, this would come from the user's reaction to the generated report. The reward (1.0, representing positive feedback) raises the stored weight for the key `example.com` by 0.1. Note that `example.com` is not necessarily one of the retrieved sources (which are stored under their full URLs), and `search_web` does not read these weights, so the update does not yet influence future searches.
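As written, feedback only adjusts stored weights. One way to close the loop, sketched here under stated assumptions, is to send feedback for the URLs that were actually retrieved and then order future candidate URLs by weight. The `FeedbackWeights` class and its `rank` method are hypothetical; the update rule `w ← w + 0.1 × reward` with a default weight of 0.5 is taken from `apply_reinforcement_learning`.

```python
from typing import Dict, List


class FeedbackWeights:
    """Per-URL weights updated with the notebook's additive rule (hypothetical class)."""

    def __init__(self, default: float = 0.5, lr: float = 0.1):
        self.default = default  # weight assumed for URLs never seen before
        self.lr = lr            # step size applied to each reward
        self.weights: Dict[str, float] = {}

    def update(self, url: str, reward: float) -> None:
        """w <- w + lr * reward, starting from the default weight."""
        self.weights[url] = self.weights.get(url, self.default) + self.lr * reward

    def rank(self, urls: List[str]) -> List[str]:
        """Order candidate URLs by weight, highest first (ties keep input order)."""
        return sorted(urls, key=lambda u: self.weights.get(u, self.default), reverse=True)


fw = FeedbackWeights()
retrieved = ["https://a.example/ethics", "https://b.example/ai"]
fw.update(retrieved[1], 1.0)   # user liked source b
fw.update(retrieved[0], -1.0)  # user disliked source a
print(fw.rank(retrieved + ["https://c.example/new"]))
```

Ranking before fetching is only one possible design; how these weights should combine with the search engine's own ordering is left open by the notebook.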

In [4]:
# Example usage
agent = ResearchAssistantAgent()
result = agent.run("artificial intelligence ethics")
print(result)

# Simulate user feedback (1.0 = positive, -1.0 = negative)
agent.apply_reinforcement_learning("select_source_example.com", 1.0)


Report generated: research_report.docx


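### Safety and Security Measures

- **Input Validation**: Rejects queries shorter than 3 characters and queries that contain "inappropriate" or "harmful" anywhere in the text.
- **Boundary Enforcement**: Limits each search to `max_sources` (5) results and truncates each page's text to 1,000 characters to bound resource use.
- **Fallback Strategies**: If the search call itself raises an exception, returns the sources cached from the agent's previous successful run, if any.
- **Transparency**: Prints fetch and search errors instead of failing silently, and shows a credibility score for each source in the report.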