## Testing ollama

In [1]:
import requests
 # Test the Ollama API
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2:1b",
        "prompt": "Hello, what are your capabilities?",
        "stream": False
    }
)
print(response.json()["response"])

I can be used in a variety of ways, from helping you plan a vacation to creating art. I'm here to assist you in finding the help or information you need. My strengths include answering questions, generating text and images, as well as being able to play games with you.


## Document preprocessor

In [2]:
import os
from rag.document_processor import DocumentProcessor
 # Create a test document
os.makedirs("data", exist_ok=True)
with open("data/test_document.txt", "w") as f:
    f.write("""
    Retrieval-Augmented Generation (RAG) is a technique that enhances large language models
    by allowing them to access external knowledge. This approach combines the strengths of 
    retrieval-based and generation-based methods in natural language processing.
    
    The key components of a RAG system include:
    1. A document store containing knowledge
    2. A retrieval system to find relevant information
    3. A language model to generate responses
    
    RAG addresses the limitations of traditional language models, such as outdated knowledge
    and hallucinations, by grounding responses in factual information from external sources.
    """)
 # Initialize the document processor
 # Note: Using smaller chunks for Llama 3.2 1B to accommodate its smaller context window
processor = DocumentProcessor(chunk_size=500, chunk_overlap=50)
 # Process the test document
chunks = processor.process_documents("data")

Processed 2 documents into 6 chunks


In [3]:
print(f"Number of chunks: {len(chunks)}")
for i, chunk in enumerate(chunks):
    print(f"\nChunk {i+1}:")
    print(f"Text: {chunk.page_content}...")
    print(f"Metadata: {chunk.metadata}")

Number of chunks: 6

Chunk 1:
Text: # Retrieval-Augmented Generation (RAG)
        
        Retrieval-Augmented Generation (RAG) is an AI framework that enhances large language models
        by incorporating external knowledge retrieval into the generation process. It was introduced
        by researchers at Facebook AI in 2020.
        
        ## Core Components
        
        1. **Document Store**: A collection of documents containing domain-specific knowledge....
Metadata: {'source': 'data\\rag_explanation.txt'}

Chunk 2:
Text: 2. **Retriever**: A system that finds relevant documents or passages based on a query.
        3. **Generator**: A language model that produces responses using the retrieved information.
        4. **Embedding Model**: Converts text into vector representations for similarity matching.
        5. **Vector Database**: Efficiently stores and indexes embeddings for quick retrieval.
        
        ## Advantages of RAG...
Metadata: {'source': 'data\\rag_expla

## Embeddings

In [4]:
from rag.document_processor import DocumentProcessor
from rag.embeddings import EmbeddingManager
import os
 # Process documents
processor = DocumentProcessor(chunk_size=500, chunk_overlap=50)
chunks = processor.process_documents("data")
 # Create a persist directory
os.makedirs("vectorstore", exist_ok=True)
 # Initialize embedding manager with local embeddings
embedding_manager = EmbeddingManager(
    model_name="all-MiniLM-L6-v2",  # Lightweight but effective embedding model
    persist_directory="vectorstore"
)
 # Create vector store from documents
embedding_manager.create_vectorstore(chunks)
 # Test loading the vector store
embedding_manager = EmbeddingManager(
    model_name="all-MiniLM-L6-v2",
    persist_directory="vectorstore"
)
success = embedding_manager.load_vectorstore()

Processed 2 documents into 6 chunks


  self.embeddings = HuggingFaceEmbeddings(model_name=model_name)
  from .autonotebook import tqdm as notebook_tqdm
  self.vectorstore.persist()


Created vector store with 6 documents
Loaded vector store from vectorstore


  self.vectorstore = Chroma(


In [5]:
print(f"Vector store loaded successfully: {success}")
 # Get the vector store
vectorstore = embedding_manager.get_vectorstore()
print(f"Vector store contains approximately {vectorstore._collection.count()} documents")

Vector store loaded successfully: True
Vector store contains approximately 20 documents


## Retriever

In [6]:
from rag.document_processor import DocumentProcessor
from rag.embeddings import EmbeddingManager
from rag.retriever import Retriever
 # Process documents (using previous example data)
processor = DocumentProcessor(chunk_size=500, chunk_overlap=50)
chunks = processor.process_documents("data")
 # Create embedding manager and vector store
embedding_manager = EmbeddingManager(model_name="all-MiniLM-L6-v2")
embedding_manager.create_vectorstore(chunks)
vectorstore = embedding_manager.get_vectorstore()
 # Initialize retriever
retriever = Retriever(vectorstore, top_k=2)
 # Test simple retrieval
query = "What is RAG and what are its components?"
documents = retriever.retrieve(query)

Processed 2 documents into 6 chunks
Created vector store with 6 documents


  documents = self.retriever.get_relevant_documents(query)


In [7]:
print(f"Query: {query}")
print(f"Retrieved {len(documents)} documents:")
for i, doc in enumerate(documents):
    print(f"\nDocument {i+1}:")
print(f"Content: {doc.page_content[:150]}...")
print(f"Source: {doc.metadata.get('source', 'Unknown')}")
# Test retrieval with scores
documents_with_scores = retriever.retrieve_with_scores(query)
print("\nDocuments with similarity scores:")
for i, (doc, score) in enumerate(documents_with_scores):
    print(f"Document {i+1} - Score: {score:.4f}")
# Test MMR retrieval for diversity
mmr_documents = retriever.retrieve_with_mmr(query, diversity=0.7)
print("\nMMR retrieval results:")
for i, doc in enumerate(mmr_documents):
    print(f"Document {i+1}: {doc.page_content}...")

Query: What is RAG and what are its components?
Retrieved 2 documents:

Document 1:

Document 2:
Content: Retrieval-Augmented Generation (RAG) is a technique that enhances large language models
    by allowing them to access external knowledge. This approa...
Source: data\test_document.txt

Documents with similarity scores:
Document 1 - Score: 0.8775
Document 2 - Score: 1.0964

MMR retrieval results:
Document 1: ## Implementation Approaches
        
        There are several ways to implement RAG systems:
        
        - **Basic RAG**: Simple retrieval followed by generation
        - **Advanced RAG**: Includes query reformulation, multi-step retrieval, and reranking
        - **Hybrid Approaches**: Combines fine-tuning with retrieval for specialized domains...
Document 2: RAG addresses the limitations of traditional language models, such as outdated knowledge
    and hallucinations, by grounding responses in factual information from external sources....


# RAG System

In [8]:
from main import OllamaRAGSystem
import os
 # Make sure we have test data
os.makedirs("data", exist_ok=True)
test_file_path = "data/rag_explanation.txt"
if not os.path.exists(test_file_path):
    with open(test_file_path, "w") as f:
        f.write("""
        # Retrieval-Augmented Generation (RAG)
        
        Retrieval-Augmented Generation (RAG) is an AI framework that enhances large language models
        by incorporating external knowledge retrieval into the generation process. It was introduced
        by researchers at Facebook AI in the year 2020.
        
        ## Core Components
        
        1. **Document Store**: A collection of documents containing domain-specific knowledge.
        2. **Retriever**: A system that finds relevant documents or passages based on a query.
        3. **Generator**: A language model that produces responses using the retrieved information.
        4. **Embedding Model**: Converts text into vector representations for similarity matching.
        5. **Vector Database**: Efficiently stores and indexes embeddings for quick retrieval.
        
        ## Advantages of RAG
        
        - Reduces hallucinations by grounding responses in factual information
        - Enables access to up-to-date information beyond the model's training data
        - Allows incorporation of domain-specific knowledge
        - Provides transparency through explicit source attribution
        - More cost-effective than continuous model retraining
        
        ## Implementation Approaches
        
        There are several ways to implement RAG systems:
        
        - **Basic RAG**: Simple retrieval followed by generation
        - **Advanced RAG**: Includes query reformulation, multi-step retrieval, and reranking
        - **Hybrid Approaches**: Combines fine-tuning with retrieval for specialized domains
        """)
 # Initialize the RAG system with Llama 3.2 1B
rag = OllamaRAGSystem(
    data_dir="data",
    ollama_model="llama3.2:1b",
    top_k=2  # Using a smaller top_k value due to Llama 3.2's limited context window
)
 # Test queries
queries = [
    "What is RAG and when was it introduced?",
    "What are the main components of a RAG system?",
    "What are the advantages of using RAG over traditional LLMs?",
    "How can RAG be implemented in practice?"
 ]


Loaded vector store from vectorstore
Connected to Ollama with model: llama3.2:1b
Ollama RAG system initialized successfully!


  self.llm = Ollama(model=model_name, temperature=temperature)


In [9]:
 # Process each query and display results
for query in queries:
    print("\n" + "="*80)
    print(f"Query: {query}")
    print("="*80)
    
    # Get response with sources
    result = rag.query(query, with_sources=True)
    
    print(f"\nResponse:\n{result['response']}")
    
    # Test MMR retrieval for diverse results
    mmr_result = rag.query(query, with_sources=True, use_mmr=True)
    
    print(f"\nMMR Response:\n{mmr_result['response']}")


Query: What is RAG and when was it introduced?

Response:
Based on the provided documents, RAG (Retrieval-Augmented Generation) systems were first introduced in [Document 1] as a basic approach to implementing Retrieval-Augmented Generation (RAG) systems.

MMR Response:
Based on the provided documents, RAG (Retrieval-Augmented Generation) addresses the limitations of traditional language models [Document 1] by grounding responses in factual information from external sources [Document 2]. 

RAG was introduced as an advanced retrieval approach [Document 2], which includes query reformulation and reranking to improve its performance.

Query: What are the main components of a RAG system?

Response:
Based on the provided documents, the main components of a RAG (Retrieval-Augmented Generation) system include:

1. **Retrieval**: This involves finding relevant documents in a database or data source.
2. **Generation**: This involves generating new content, such as text or images, based on the 

## Evaluation

In [10]:
from main import OllamaRAGSystem
from rag.evaluator import SimpleRAGEvaluator
import pandas as pd
 # Initialize the RAG system
rag = OllamaRAGSystem(
    data_dir="data",
    ollama_model="llama3.2:1b",
    top_k=2
)
 # Create test cases
test_queries = [
    {
        "query": "What is RAG and when was it introduced?",
        "relevant_docs": ["data/rag_explanation.txt"]
    },
    {
        "query": "What are the main components of a RAG system?",
        "relevant_docs": ["data/rag_explanation.txt"]
    },
    {
        "query": "What are the advantages of using RAG?",
        "relevant_docs": ["data/rag_explanation.txt"]
    }
]
 # Initialize evaluator
evaluator = SimpleRAGEvaluator()
 # Run evaluation
results = evaluator.run_evaluation(rag, test_queries)
 # Display results
pd.set_option('display.max_colwidth', None)
#print(results[["query", "response", "precision", "recall", "f1_score", "context_utilization", "has_citations"]])

Loaded vector store from vectorstore
Connected to Ollama with model: llama3.2:1b
Ollama RAG system initialized successfully!


In [11]:
results[["query", "response", "precision", "recall", "f1_score", "context_utilization", "has_citations"]]

Unnamed: 0,query,response,precision,recall,f1_score,context_utilization,has_citations
0,What is RAG and when was it introduced?,"Based on the provided documents, RAG (Retrieval-Augmented Generation) systems were first introduced in [Document 1] Implementation Approaches. Specifically, they are mentioned as a basic retrieval approach followed by generation [Document 1].",0.0,0.0,0,0.5,True
1,What are the main components of a RAG system?,"Based on the provided documents, the main components of a RAG (Retrieval-Augmented Generation) system include:\n\n1. **Retrieval**: This involves finding relevant documents in a database.\n2. **Generation**: This involves generating new content based on the retrieved documents.\n\nThese two components are often combined to create a hybrid approach, as mentioned in [Document 2] ""Combines fine-tuning with retrieval for specialized domains"".",0.0,0.0,0,0.32,True
2,What are the advantages of using RAG?,"Based on the provided documents, it appears that RAG (Retrieval-Augmented Generation) systems offer several advantages. According to [Document 1], Basic RAG provides a simple retrieval process followed by generation, which can be beneficial for certain applications.\n\nHowever, as mentioned in [Document 2], Advanced RAG includes query reformulation, multi-step retrieval, and reranking, which can lead to improved results and better performance. Additionally, Hybrid Approaches combine fine-tuning with retrieval for specialized domains, suggesting that a combination of these approaches may offer the most benefits.\n\nIt's worth noting that the advantages of using RAG systems are not explicitly stated in either document, so I couldn't provide a definitive answer based on the provided information alone.",0.0,0.0,0,0.356322,True
