# Enhanced RAG with Ollama + CrewAI RAG Tool + Wikipedia Integration

A retrieval-augmented generation (RAG) notebook that combines CrewAI agents, a local Ollama model, the CrewAI `RagTool`, and content scraped from Wikipedia.

In [1]:
import os
import re
from typing import Type

import requests
from bs4 import BeautifulSoup
from crewai import Agent, Task, Crew, LLM
from crewai.tools import BaseTool
from crewai_tools import RagTool

In [2]:
os.environ['CHROMA_OPENAI_API_KEY'] = 'dummy-key-for-chroma'

ollama_llm = LLM(
    model="ollama/llama3.2:1b",
    temperature=0.7,
)

print("Environment variables set and Ollama LLM configured successfully!")

Environment variables set and Ollama LLM configured successfully!
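
Constructing the `LLM` object does not contact Ollama, so the success message above says nothing about whether the server is running. CrewAI reaches Ollama over HTTP (through litellm, as the error messages in this notebook show), so a quick reachability check can save a failed run. The sketch below is a minimal example under stated assumptions: Ollama listens on its default address `http://localhost:11434`, and the helper name `list_ollama_models` is hypothetical.

```python
import http.client
import json
import urllib.request


def list_ollama_models(base_url="http://localhost:11434", timeout=3.0):
    """Return names of locally pulled Ollama models, or None if the server is unreachable."""
    try:
        # /api/tags is Ollama's endpoint for listing local models.
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers refused connections, timeouts, and HTTP error statuses;
        # ValueError covers malformed URLs and non-JSON bodies.
        return None
    return [model.get("name", "") for model in payload.get("models", [])]


models = list_ollama_models()
if models is None:
    print("Ollama server not reachable; start it (for example with `ollama serve`) first.")
elif not any(name.startswith("llama3.2:1b") for name in models):
    print("Server is up, but llama3.2:1b has not been pulled.")
else:
    print("Ollama is running and llama3.2:1b is available.")
```

Running this before the question cells makes a stopped server visible immediately instead of as a `WinError 10061` in the middle of a test loop.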


In [3]:
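def scrape_wikipedia_page(url):
    """Fetch a Wikipedia article and return its cleaned paragraph text.

    On failure, returns a message string instead of raising.
    """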
    try:
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
        }
        
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
        
        soup = BeautifulSoup(response.content, 'html.parser')
        
        content_div = soup.find('div', {'id': 'mw-content-text'})
        
        if not content_div:
            return "Content not found on the page."
        
        for element in content_div.find_all(['table', 'div', 'span'], class_=['navbox', 'infobox', 'reference', 'mw-editsection']):
            element.decompose()
        
        paragraphs = content_div.find_all('p')
        
        text_content = []
        for p in paragraphs:
            text = p.get_text().strip()
            text = re.sub(r'\[\d+\]', '', text)
            text = re.sub(r'\s+', ' ', text)
            if text and len(text) > 50:
                text_content.append(text)
        
        return '\n\n'.join(text_content)
        
    except requests.RequestException as e:
        return f"Error fetching page: {str(e)}"
    except Exception as e:
        return f"Error parsing page: {str(e)}"

print("Wikipedia scraping function created!")

Wikipedia scraping function created!
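
The per-paragraph cleanup inside `scrape_wikipedia_page` can be tried in isolation, without a network request. The sketch below repeats the same three steps (strip numeric citation markers, collapse whitespace, drop short fragments) on a hand-written sample; the helper name `clean_paragraphs` and the sample strings are assumptions made for illustration.

```python
import re


def clean_paragraphs(raw_paragraphs, min_length=50):
    """Apply the notebook's cleanup steps to a list of raw paragraph strings."""
    cleaned = []
    for raw in raw_paragraphs:
        text = raw.strip()
        text = re.sub(r'\[\d+\]', '', text)  # drop numeric citation markers such as [12]
        text = re.sub(r'\s+', ' ', text)     # collapse runs of whitespace and newlines
        if len(text) > min_length:           # skip captions and other short fragments
            cleaned.append(text)
    return cleaned


sample = [
    "The lion[1] is a large cat\n of the genus Panthera,[2][3]   native to Africa and India.",
    "See also[4]",
]
for paragraph in clean_paragraphs(sample):
    print(paragraph)
```

Note that the pattern only matches numeric markers; footnote labels such as `[a]` or `[note 1]` pass through unchanged.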


In [4]:
rag_tool = RagTool()

sample_content = """
Artificial Intelligence (AI) is a branch of computer science that aims to create machines 
capable of intelligent behavior. AI systems can perform tasks that typically require human 
intelligence, such as visual perception, speech recognition, decision-making, and language 
translation. Machine learning is a subset of AI that focuses on algorithms that can learn 
and improve from experience without being explicitly programmed.

Machine Learning (ML) is a method of data analysis that automates analytical model building. 
It is a branch of artificial intelligence based on the idea that systems can learn from data, 
identify patterns and make decisions with minimal human intervention. There are three main 
types of machine learning: supervised learning, unsupervised learning, and reinforcement 
learning.

CrewAI is a framework for orchestrating role-playing, autonomous AI agents. It enables 
collaboration between multiple AI agents to solve complex tasks. CrewAI agents can have 
specific roles, goals, and backstories, making them more effective at specialized tasks.

Retrieval-Augmented Generation (RAG) is a technique that combines information retrieval with 
text generation. It retrieves relevant information from a knowledge base and uses it to 
generate more accurate and contextually relevant responses.

Natural Language Processing (NLP) is a subfield of AI that focuses on the interaction between 
computers and humans through natural language. It involves teaching computers to understand, 
interpret, and generate human language in a valuable way.

Ollama is a tool for running large language models locally on your machine. It allows you to 
run models like Llama, Mistral, and others without needing internet connectivity or API keys.
"""

rag_tool.add(data_type="text", content=sample_content)

print("CrewAI RAG tool initialized with sample content!")

CrewAI RAG tool initialized with sample content!
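
To make the retrieval step of RAG concrete, here is a toy retriever that ranks passages by word-count cosine similarity. This is not how `RagTool` works internally (it relies on embeddings and a vector store); it is only a dependency-free sketch of the idea "find the passage most similar to the query". All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, passages, top_k=1):
    """Return the top_k passages most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        passages,
        key=lambda p: cosine(query_vec, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:top_k]


passages = [
    "Ollama is a tool for running large language models locally on your machine.",
    "CrewAI is a framework for orchestrating role-playing, autonomous AI agents.",
    "Retrieval-Augmented Generation (RAG) combines information retrieval with text generation.",
]
print(retrieve("What is CrewAI?", passages))
```

In a full RAG pipeline the retrieved passage would then be placed in the model's prompt as context for generating the answer.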


In [5]:
print("Scraping Wikipedia Lion page...")
lion_url = "https://en.wikipedia.org/wiki/Lion"
lion_content = scrape_wikipedia_page(lion_url)

print(f"Successfully scraped {len(lion_content)} characters from Wikipedia Lion page")
print(f"First 200 characters: {lion_content[:200]}...")

Scraping Wikipedia Lion page...
Successfully scraped 47347 characters from Wikipedia Lion page
First 200 characters: The lion (Panthera leo) is a large cat of the genus Panthera, currently found only in Sub-Saharan Africa and India. It has a muscular, broad-chested body; a short, rounded head; round ears; and a dark...


In [6]:
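# scrape_wikipedia_page reports failures as strings, including "Content not found ..."
if lion_content and not lion_content.startswith(("Error", "Content not found")):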
    rag_tool.add(data_type="text", content=lion_content)
    print("Wikipedia Lion content added to RAG tool!")
else:
    print(f"Failed to add Wikipedia content: {lion_content}")

Wikipedia Lion content added to RAG tool!


In [7]:
knowledge_expert = Agent(
    role='Knowledge Expert',
    goal='Answer questions using the CrewAI RAG tool to provide accurate information from AI/ML knowledge and Wikipedia Lion data',
    backstory='An expert in retrieving and providing information from various sources including AI/ML topics and wildlife information using advanced RAG techniques.',
    tools=[rag_tool],
    llm=ollama_llm,
    verbose=True
)

print("Knowledge expert agent created with CrewAI RAG tool!")

Knowledge expert agent created with CrewAI RAG tool!


In [8]:
def ask_question(question):
    """Run a single-task crew for one question and return the crew's result."""
    print(f"Question: {question}")
    print("=" * 50)
    
    task = Task(
        description=f"Answer this question: {question}",
        expected_output="A clear and accurate answer based on the available information",
        agent=knowledge_expert
    )
    
    crew = Crew(
        agents=[knowledge_expert],
        tasks=[task],
        verbose=True
    )
    
    result = crew.kickoff()
    return result

print("ask_question function ready!")

ask_question function ready!


In [9]:
def interactive_mode():
    print("Interactive RAG System (Ollama + CrewAI RAG Tool + Wikipedia)")
    print("Type 'quit' to exit")
    print("=" * 50)
    
    while True:
        question = input("\nAsk a question: ")
        
        if question.lower() == 'quit':
            print("Goodbye!")
            break
        
        if question.strip():
            try:
                result = ask_question(question)
                print(f"\nAnswer: {result}")
            except Exception as e:
                print(f"Error: {str(e)}")
        else:
            print("Please ask a question.")

print("interactive_mode function ready!")

interactive_mode function ready!


## Test the Enhanced RAG System with Wikipedia Lion Content

In [10]:
print("Testing Enhanced RAG System with Wikipedia Lion Content...")
print("=" * 70)

lion_questions = [
    "What is a lion?",
    "Where do lions live?",
    "What do lions eat?",
    "How big are lions?",
    "What is the difference between male and female lions?",
    "How do lions hunt?",
    "What are lion prides?",
    "Are lions endangered?"
]

for question in lion_questions:
    print(f"\nTesting: {question}")
    print("-" * 50)
    
    try:
        result = ask_question(question)
        print(f"Answer: {result}")
    except Exception as e:
        print(f"Error: {str(e)}")
    
    print("=" * 70)

print("\nLion RAG System Testing Complete!")

Testing Enhanced RAG System with Wikipedia Lion Content...

Testing: What is a lion?
--------------------------------------------------
Question: What is a lion?


Answer: Observation: The question asks about lions, but I don't have any information about them. I'll need to rely on my knowledge base.

Action: Use the Knowledge Base tool with the query 'What is a lion?' and similarity threshold set to None.

Input:
```json
{"query": "What is a lion?", "similarity_threshold": null}
```


Testing: Where do lions live?
--------------------------------------------------
Question: Where do lions live?


Answer: Thought: I need to search for lion habitats in my knowledge base.

Action: Knowledge base

Testing: What do lions eat?
--------------------------------------------------
Question: What do lions eat?


Answer: Thought: I need to find out what lions eat

Action: search Knowledge base for "lions diet"


Testing: How big are lions?
--------------------------------------------------
Question: How big are lions?


Answer: Observation: The question asked about the size of lions.

Action: Use the Knowledge base with the tool arguments to retrieve information about lions from Wikipedia.

Input:
```json
{
  "query": "lion size",
  "similarity_threshold": 0.8,
  "limit": 1
}
```

Testing: What is the difference between male and female lions?
--------------------------------------------------
Question: What is the difference between male and female lions?


Answer: Observation: The question asks about the difference between male and female lions. To provide an accurate answer, I need to access information from both male and female lions.

Action: Use the Knowledge base tool with a query of "male lion vs female lion".

Output:
```
{"query": {"description": "difference between", "type": "str"},
 "similarity_threshold": 0,
 "limit": null}
```

Testing: How do lions hunt?
--------------------------------------------------
Question: How do lions hunt?


Answer: Action: 
```
{"query": {"description": "lion hunting", "type": "str"},
"similarity_threshold": 0.0,
"limit": 1}
```


Testing: What are lion prides?
--------------------------------------------------
Question: What are lion prides?


Answer: Thought: I need to know where lions live and how many they are.

Action: Knowledge base
Input: {"query": "lions habitat and population"}

Testing: Are lions endangered?
--------------------------------------------------
Question: Are lions endangered?


Answer: Observation: The question asked whether lions are endangered. To determine this, we can rely on the information from both AI/ML topics and wildlife data.

Action: Use the CrewAI RAG tool to search for "lions" in the Knowledge Base's database.

Input: 
```json
{"query": {"description": "", "type": "str"}}
```

Lion RAG System Testing Complete!
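
None of the recorded answers above is a final answer: each one is an intermediate Thought/Action/Observation step that the model wrote out as text instead of completing the tool call. The small 1B model is a plausible cause, though that is an assumption. A minimal sketch of a check that flags such responses follows; the marker list is taken from the outputs shown here, and the helper name `looks_unfinished` is hypothetical.

```python
import re

# Step markers observed in the recorded answers; treated as a heuristic, not a CrewAI API.
_STEP_MARKERS = re.compile(r"^\s*(Thought|Action|Observation|Input)\s*:", re.MULTILINE)


def looks_unfinished(answer):
    """Return True when the text still contains agent step markers at the start of a line."""
    return bool(_STEP_MARKERS.search(str(answer)))


stub = "Thought: I need to search for lion habitats in my knowledge base.\n\nAction: Knowledge base"
final = "Lions are currently found only in Sub-Saharan Africa and India."

print(looks_unfinished(stub))
print(looks_unfinished(final))
```

A test loop could use this to report "no final answer" rather than printing the stub as if it were the answer.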


## Test Mixed Questions (AI/ML + Lions)

In [11]:
print("Testing Mixed Questions (AI/ML + Lions)...")
print("=" * 60)

mixed_questions = [
    "What is artificial intelligence?",
    "How do lions hunt in groups?",
    "What is machine learning?",
    "What is the social structure of lions?",
    "What is CrewAI?",
    "Are lions apex predators?",
    "What is RAG in AI?",
    "What are the conservation status of lions?"
]

for question in mixed_questions:
    print(f"\nTesting: {question}")
    print("-" * 40)
    
    try:
        result = ask_question(question)
        print(f"Answer: {result}")
    except Exception as e:
        print(f"Error: {str(e)}")
    
    print("=" * 60)

print("\nMixed Questions Testing Complete!")

Testing Mixed Questions (AI/ML + Lions)...

Testing: What is artificial intelligence?
----------------------------------------
Question: What is artificial intelligence?


Answer: Thought: I will start by searching for information about artificial intelligence in my Knowledge base using the CrewAI RAG tool. 

Action: 
Input: `{"query": "artificial intelligence", "similarity_threshold": 0.5}`

Testing: How do lions hunt in groups?
----------------------------------------
Question: How do lions hunt in groups?


Answer: Observation: The question about lion hunting in groups requires knowledge of wildlife behavior and social structures. I will utilize the Knowledge Base tool to retrieve information on this topic.

Action: I will input a JSON object representing the question into the Knowledge Base tool with the following arguments:
```json
{
  "query": "lion hunting group behavior",
  "similarity_threshold": 0,
  "limit": null
}
```
This query asks for information about lion hunting in groups, and since there are no specific details provided, I will consider general information.


Testing: What is machine learning?
----------------------------------------
Question: What is machine learning?


Error: litellm.APIConnectionError: OllamaException - [WinError 10061] No connection could be made because the target machine actively refused it

Testing: What is the social structure of lions?
----------------------------------------
Question: What is the social structure of lions?


Error: litellm.APIConnectionError: OllamaException - [WinError 10061] No connection could be made because the target machine actively refused it

Testing: What is CrewAI?
----------------------------------------
Question: What is CrewAI?


Error: litellm.APIConnectionError: OllamaException - [WinError 10061] No connection could be made because the target machine actively refused it

Testing: Are lions apex predators?
----------------------------------------
Question: Are lions apex predators?


Error: litellm.APIConnectionError: OllamaException - [WinError 10061] No connection could be made because the target machine actively refused it

Testing: What is RAG in AI?
----------------------------------------
Question: What is RAG in AI?


Error: litellm.APIConnectionError: OllamaException - [WinError 10061] No connection could be made because the target machine actively refused it

Testing: What are the conservation status of lions?
----------------------------------------
Question: What are the conservation status of lions?


Error: litellm.APIConnectionError: OllamaException - [WinError 10061] No connection could be made because the target machine actively refused it

Mixed Questions Testing Complete!
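
The last six questions above all failed with the same connection error, which indicates that the Ollama server stopped responding partway through the run. Once that happens, every remaining question will fail the same way. The sketch below shows one way to stop early; `run_questions` and the stand-in `offline_ask` are hypothetical, and connection errors are matched by class name so the example does not need to import litellm.

```python
def run_questions(questions, ask, max_consecutive_failures=2):
    """Ask each question in turn; stop after repeated connection failures."""
    results = {}
    failures = 0
    for question in questions:
        try:
            results[question] = ask(question)
            failures = 0
        except Exception as exc:
            # Matches ConnectionError and litellm's APIConnectionError by name.
            if "Connection" not in type(exc).__name__:
                raise
            failures += 1
            results[question] = None
            if failures >= max_consecutive_failures:
                print(f"Stopping after {failures} connection failures; is Ollama running?")
                break
    return results


def offline_ask(question):
    """Stand-in for ask_question that simulates a stopped server."""
    raise ConnectionError("target machine actively refused the connection")


answers = run_questions(["What is CrewAI?", "What is RAG in AI?", "Are lions apex predators?"], offline_ask)
print(answers)
```

In the notebook, `ask_question` would be passed as the `ask` argument in place of the stand-in.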


## Add More Wikipedia Content

In [12]:
def add_wikipedia_content(url, description):
    print(f"Scraping {description}...")
    content = scrape_wikipedia_page(url)
    
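    # scrape_wikipedia_page reports failures as strings, including "Content not found ..."
    if content and not content.startswith(("Error", "Content not found")):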
        rag_tool.add(data_type="text", content=content)
        print(f"{description} content added to RAG tool!")
        return True
    else:
        print(f"Failed to add {description}: {content}")
        return False

print("Wikipedia content addition function ready!")

Wikipedia content addition function ready!


In [13]:
print("Adding additional Wikipedia content...")
print("=" * 50)

additional_pages = [
]

for url, description in additional_pages:
    add_wikipedia_content(url, description)

print("\nAdditional content addition complete!")

Adding additional Wikipedia content...

Additional content addition complete!
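
The `additional_pages` list above is empty, so nothing was scraped. The loop unpacks each entry as a `(url, description)` pair. The sketch below illustrates that shape with purely hypothetical entries and a small URL check; the helper name `is_wikipedia_article_url` and the example pages are assumptions, not pages the notebook actually used.

```python
from urllib.parse import urlparse


def is_wikipedia_article_url(url):
    """Return True for https URLs on a wikipedia.org subdomain under /wiki/<title>."""
    parts = urlparse(url)
    return (
        parts.scheme == "https"
        and parts.netloc.endswith(".wikipedia.org")
        and parts.path.startswith("/wiki/")
        and len(parts.path) > len("/wiki/")
    )


# Hypothetical entries, shown only to illustrate the (url, description) shape.
example_pages = [
    ("https://en.wikipedia.org/wiki/Tiger", "Wikipedia Tiger page"),
    ("https://en.wikipedia.org/wiki/Leopard", "Wikipedia Leopard page"),
]

for url, description in example_pages:
    status = "ok" if is_wikipedia_article_url(url) else "skipped"
    print(f"{description}: {status}")
```

Entries that pass the check could then be handed to `add_wikipedia_content(url, description)` as in the cell above.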


## Interactive Mode

Uncomment and run the cell below to start interactive mode:

In [14]:
# interactive_mode()

## Quick Test - Single Lion Question

Run this cell to test with a single lion question:

In [15]:
question = "Tell me about lions - their habitat, behavior, and conservation status"
print(f"Testing with: {question}")
print("=" * 80)

result = ask_question(question)
print(f"\nFinal Answer: {result}")

Testing with: Tell me about lions - their habitat, behavior, and conservation status
Question: Tell me about lions - their habitat, behavior, and conservation status


APIConnectionError: litellm.APIConnectionError: OllamaException - [WinError 10061] No connection could be made because the target machine actively refused it