<style>
.Yellow{color: Yellow;}
.Black{color: Black;}
.custom-style {
    color: Maroon;
    font-family: sans-serif;
    font-size: 150px;
    text-align: center;
    margin-top: 5px;
    margin-bottom: 5px;
}
</style>
<p class="custom-style">
<span class=Black>De</span><span class=Yellow>X</span><span class=Black>alo</span>
</p>



# Introduction
In this project, I will develop a simple RAG (Retrieval-Augmented Generation) application powered by an OpenAI model with LangChain, along with a calculator tool.

## `This Notebook is only to test the tools used in the project.`
# Key Requirements
- The application should be able to answer questions about the content of a PDF/txt file.
- The application should be able to perform calculations using a calculator tool.
- The application should be able to handle multiple files and provide answers based on the content of the files.
- The application should have a FastAPI backend.
- The application should have a React frontend.





# Importing Libraries
Any necessary libraries are imported in this section.

In [43]:
import os
import math
from typing import Type
from pydantic import BaseModel, Field

from langchain.agents import AgentType, initialize_agent
from langchain.tools import BaseTool
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferMemory

import tempfile
from typing import List
import re


from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import TokenTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document

# Functions Definition & Global Variables
The functions and global or environment variables used in the notebook are defined in this section.

In [None]:
# Set API key. Never hard-code the secret in the notebook;
# prompt for it only if it is not already set in the environment.
import getpass

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")

In [3]:
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

In [36]:
# Define a simple calculator tool that our agent can use
class CalculatorInput(BaseModel):
    expression: str = Field(description="Mathematical expression to evaluate")

class CalculatorTool(BaseTool):
    name: str = "calculator"
    description: str = "Useful for performing mathematical calculations"
    args_schema: Type[BaseModel] = CalculatorInput
    
    def _run(self, expression: str) -> str:
        """Evaluate a mathematical expression."""
        try:
            # Add basic math functions to the locals
            local_dict = {
                'sqrt': math.sqrt,
                'pow': pow,
                'abs': abs,
                'round': round,
                'max': max,
                'min': min,
                'sin': math.sin,
                'cos': math.cos,
                'tan': math.tan,
                'pi': math.pi,
                'e': math.e,
                'log': math.log,
                'log10': math.log10,
                'floor': math.floor,
                'ceil': math.ceil
            }
            
            # Evaluate the expression
            result = eval(expression, {"__builtins__": {}}, local_dict)
            return str(result)
        except Exception as e:
            return f"Error evaluating expression: {str(e)}"

    async def _arun(self, expression: str) -> str:
        """Async version of _run."""
        return self._run(expression)
    


def test_agent(query, agent):
    """Run a single query through the agent and print the final response."""
    print(f"\n--- Testing: '{query}' ---")
    response = agent.invoke(input=query)
    print(f"\nFinal response: {response['output']}")
    print("-----------------------------------\n")



# Define a simple function to simulate RAG context
def query_with_context(query, context, agent):
    """Simulate a query with RAG context."""
    enriched_query = (
        "I have a question that may require information from these documents:\n\n"
        f"{context}\n\n"
        f"My question is: {query}"
    )
    
    print("\n--- Testing with RAG context ---")
    print(f"Query: '{query}'")
    print(f"Using context: '{context[:100]}...'")
    
    response = agent.invoke(input=enriched_query)
    
    print(f"\nFinal response: {response['output']}")
    print("-----------------------------------\n")
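
The `CalculatorTool` defined above relies on `eval` with an emptied `__builtins__`, which is generally not regarded as a complete sandbox. As one possible alternative, the sketch below walks the expression's syntax tree and accepts only whitelisted operators and names. `safe_eval` and the `_ALLOWED_NAMES` table are hypothetical names introduced for illustration; they are not part of the project code, and the whitelist covers only a subset of the functions exposed by `CalculatorTool`.

```python
import ast
import math
import operator

# Hypothetical whitelist (assumption): a subset of the names CalculatorTool exposes.
_ALLOWED_NAMES = {"sqrt": math.sqrt, "log": math.log, "abs": abs, "pi": math.pi, "e": math.e}
_BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Pow: operator.pow, ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str):
    """Evaluate an arithmetic expression by walking its AST instead of calling eval()."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        if isinstance(node, ast.Name) and node.id in _ALLOWED_NAMES:
            return _ALLOWED_NAMES[node.id]
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _ALLOWED_NAMES and not node.keywords):
            return _ALLOWED_NAMES[node.func.id](*[_eval(arg) for arg in node.args])
        # Anything else (attribute access, subscripts, unknown names) is rejected.
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"))


print(safe_eval("sqrt(144) + 5"))
```

Swapping this into `_run` would amount to replacing the `eval` call with `safe_eval(expression)`. Very large exponents can still be slow to compute, so this is a starting point rather than a hardened evaluator.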


# LangChain
This section demonstrates how LangChain agents work with tools.

## Initialize the language model


In [None]:
# Start with gpt-3.5-turbo; switch to gpt-4 after testing if needed
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo")

In [19]:
# test with simple prompt
answer = llm.invoke("What is the capital of France?")
print(answer.content)

The capital of France is Paris.


## Create memory to store chat history

In [9]:
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

## Initialize the agent

In [11]:
tools = [CalculatorTool()]

agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
    verbose=True,  # This shows the agent's thought process
    memory=memory
)


## Test the agent with different types of queries


In [30]:
test_agent("What is the square root of 144 plus 5?", agent)



--- Testing: 'What is the square root of 144 plus 5?' ---


> Entering new AgentExecutor chain...
```json
{
    "action": "calculator",
    "action_input": "sqrt(144) + 5"
}
```
Observation: 17.0
Thought:
```json
{
    "action": "Final Answer",
    "action_input": "17.0"
}
```

> Finished chain.

Final response: 17.0
-----------------------------------



In [32]:
test_agent("Tell me about the weather today" , agent)


--- Testing: 'Tell me about the weather today' ---


> Entering new AgentExecutor chain...
```json
{
    "action": "Final Answer",
    "action_input": "I'm an AI assistant and do not have real-time information. You can check the weather using a weather website or app for the most up-to-date information."
}
```

> Finished chain.

Final response: I'm an AI assistant and do not have real-time information. You can check the weather using a weather website or app for the most up-to-date information.
-----------------------------------



In [34]:
test_agent("If I have 3 apples and each costs $2.50, and I also buy 2 oranges at $1.75 each, how much will I spend in total?" , agent)



--- Testing: 'If I have 3 apples and each costs $2.50, and I also buy 2 oranges at $1.75 each, how much will I spend in total?' ---


> Entering new AgentExecutor chain...
```json
{
    "action": "calculator",
    "action_input": "3 * 2.50 + 2 * 1.75"
}
```
Observation: 11.0
Thought:
```json
{
    "action": "Final Answer",
    "action_input": "11.0"
}
```

> Finished chain.

Final response: 11.0
-----------------------------------
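
The verbose traces above follow a simple pattern: the model replies with a JSON object naming an `action`, the executor runs the matching tool and feeds the observation back, and the loop ends when the action is `Final Answer`. The sketch below replays that pattern with scripted replies instead of a real model. `run_agent_loop`, `scripted_replies`, and `fake_calculator` are hypothetical names; this is a simplified illustration under those assumptions, not LangChain's actual executor.

```python
import json
from typing import Callable, Dict, List


def run_agent_loop(replies: List[str], tools: Dict[str, Callable[[str], str]]) -> str:
    """Replay scripted model replies until one of them is a Final Answer."""
    for reply in replies:
        step = json.loads(reply)
        if step["action"] == "Final Answer":
            return step["action_input"]
        # Dispatch to the named tool; a real agent would send this
        # observation back to the model before asking for the next step.
        observation = tools[step["action"]](step["action_input"])
        print(f"Observation: {observation}")
    raise RuntimeError("No Final Answer was produced")


def fake_calculator(expression: str) -> str:
    """Stand-in tool that returns a canned result (assumption for the demo)."""
    return "17.0"


# Scripted replies copied from the shape of the first trace above.
scripted_replies = [
    '{"action": "calculator", "action_input": "sqrt(144) + 5"}',
    '{"action": "Final Answer", "action_input": "17.0"}',
]
answer = run_agent_loop(scripted_replies, {"calculator": fake_calculator})
print(f"Final response: {answer}")
```

The weather query skips the tool branch entirely: its first reply is already a `Final Answer`, so the loop returns on the first iteration.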



## RAG Simulation

In [38]:
# Sample context that simulates retrieved document chunks from my CV
sample_context = """Projects: 
 Wireless Chess Robotic Arm (WCRA-AI)  Graduation Project. Grade: Excellent.  
▪ Designed & built a chess-playing robotic arm utilizing CNN and IoT with two operation modes. 
▪ Implemented strategic decision-making using the NegaMax search algorithm on a Raspberry Pi. 
▪ Guide search algorithm with efficiently updatable neural network (NNUE) as evaluation function. 
▪ Integrated computer vision for gameplay & data collection to train CNN. Kaggle 
 AI Model Testing & Reporting (NLP  Computer Vision) (Individual) 
▪ Extracted CAMeLBERT sentiment analysis strengths, drawbacks & potential use cases in a report. 
▪ Extracted FaceNet512 strengths & drawbacks through challenging selective examples in a report. 
 Quantity Forecasting Across Different Products & Cities (Time Series) (Individual). 
▪ Data exploration, visualization, preprocessing & feature engineering. 
▪ Time series cross-validation Modeling (Exponential smoothing, ARIMA, Prophet, & LSTM). 
▪ Tailor a new zero-aware error metric to suit the problem & squeeze performance using N-BEATS. 
 Enhanced Valuation of Personal Injury Claims (Omdena Project) Demo. 
▪ Collaborated with an international team to develop a predictive model for injury claims. 
▪ Data collection, analysis, visualization, and preprocessing. 
▪ Applied machine learning techniques to improve transparency and accuracy in legal processes. 
 AI Research Assistant (CrewAI) (individual). 
▪ Multi-agent AI system to help researchers identify trends and research gaps. 
▪ 3 specialized AI agents to conduct comprehensive research & a user-friendly Streamlit app. 
▪ Utilize APIs to access the latest research papers and tech news to guide the next research steps. 
"""

# Test with context
query_with_context("What was my graduation project?", sample_context , agent)


--- Testing with RAG context ---
Query: 'What was my graduation project?'
Using context: 'Projects: 
 Wireless Chess Robotic Arm (WCRA-AI)  Graduation Project. Grade: Excellent.  
▪ Designe...'


> Entering new AgentExecutor chain...
```json
{
    "action": "Final Answer",
    "action_input": "Wireless Chess Robotic Arm (WCRA-AI) Graduation Project. Grade: Excellent. Designed & built a chess-playing robotic arm utilizing CNN and IoT with two operation modes. Implemented strategic decision-making using the NegaMax search algorithm on a Raspberry Pi. Guide search algorithm with efficiently updatable neural network (NNUE) as evaluation function. Integrated computer vision for gameplay & data collection to train CNN. Kaggle"
}
```

> Finished chain.

Final response: Wireless Chess Robotic Arm (WCRA-AI) Graduation Project. Grade: Excellent. Designed & built a chess-playing robotic arm utilizing CNN and IoT with two operation modes. Implemented strategic de

# RAG (Retrieval-Augmented Generation) System

## Token Text Splitter

In [46]:
# Initialize the Token Text Splitter for chunking
text_splitter = TokenTextSplitter(
    chunk_size=200,  # Smaller chunks for demonstration
    chunk_overlap=50
)

# Load and split the document
loader = PyPDFLoader("Mohamed Motaz CV.pdf")
documents = loader.load()
chunks = text_splitter.split_documents(documents)

print(f"Split document into {len(chunks)} chunks")
print("\nSample chunks:")
for i, chunk in enumerate(chunks[:5]):  # Show the first 5 chunks
    print(f"\nChunk {i+1}:")
    print(f"Content: {chunk.page_content[:100]}...")
    print(f"Metadata: {chunk.metadata}")


Split document into 8 chunks

Sample chunks:

Chunk 1:
Content: Mohamed Moataz 
Phone: (+20) 1120682864             
Email:  mohamad.motaz.mostafa@gmail.com        ...
Metadata: {'producer': 'Microsoft® Word for Microsoft 365', 'creator': 'Microsoft® Word for Microsoft 365', 'creationdate': '2025-04-10T19:58:19+02:00', 'author': 'Win 10', 'moddate': '2025-04-10T19:58:19+02:00', 'source': 'Mohamed Motaz CV.pdf', 'total_pages': 2, 'page': 0, 'page_label': '1'}

Chunk 2:
Content:  Diploma - Information Technology Institute (ITI)  
- Supervised by the Ministry of Communication to...
Metadata: {'producer': 'Microsoft® Word for Microsoft 365', 'creator': 'Microsoft® Word for Microsoft 365', 'creationdate': '2025-04-10T19:58:19+02:00', 'author': 'Win 10', 'moddate': '2025-04-10T19:58:19+02:00', 'source': 'Mohamed Motaz CV.pdf', 'total_pages': 2, 'page': 0, 'page_label': '1'}

Chunk 3:
Content: -playing robotic arm utilizing CNN and IoT with two operation modes. 
▪ Implemented strategic decisi
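
Chunk 3 above starts mid-word because a token splitter cuts at token boundaries rather than at sentence or word boundaries, and each chunk re-reads the last `chunk_overlap` tokens of the previous one. The sketch below illustrates that sliding window. It is a simplified stand-in, assuming whitespace-separated words in place of real model tokens; `sliding_window_chunks` is a hypothetical helper, not the `TokenTextSplitter` implementation.

```python
from typing import List


def sliding_window_chunks(tokens: List[str], chunk_size: int, chunk_overlap: int) -> List[List[str]]:
    """Split a token list into windows of chunk_size that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        chunks.append(tokens[start:end])
        if end == len(tokens):
            break  # the last window reached the end of the text
        start += step
    return chunks


# Whitespace "tokens" stand in for real tokenizer output (assumption).
tokens = "a b c d e f g h i j".split()
chunks = sliding_window_chunks(tokens, chunk_size=4, chunk_overlap=2)
for chunk in chunks:
    print(" ".join(chunk))
```

With `chunk_size=200` and `chunk_overlap=50`, as in the cell above, the window would advance 150 tokens at a time under this scheme.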

## Semantic Chunking

In [47]:
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai.embeddings import OpenAIEmbeddings

In [None]:
# Initialize the Semantic Chunker for chunking
text_splitter = SemanticChunker(OpenAIEmbeddings())

# Load and split the document
loader = PyPDFLoader("Mohamed Motaz CV.pdf")
documents = loader.load()
chunks = text_splitter.split_documents(documents)

print(f"Split document into {len(chunks)} chunks")
print("\nSample chunks:")
for i, chunk in enumerate(chunks[:5]):  # Show the first 5 chunks
    print(f"\nChunk {i+1}:")
    print(f"Content: {chunk.page_content[:100]}...")
    print(f"Metadata: {chunk.metadata}")

Split document into 5 chunks

Sample chunks:

Chunk 1:
Content: Mohamed Moataz 
Phone: (+20) 1120682864             
Email:  mohamad.motaz.mostafa@gmail.com             
LinkedIn: mohamed-moataz 
Military Status: Completed. Professional Summary:  
Data Scientist with a strong foundation in mathematics and experience in developing and deploying AI 
models....
Metadata: {'producer': 'Microsoft® Word for Microsoft 365', 'creator': 'Microsoft® Word for Microsoft 365', 'creationdate': '2025-04-10T19:58:19+02:00', 'author': 'Win 10', 'moddate': '2025-04-10T19:58:19+02:00', 'source': 'Mohamed Motaz CV.pdf', 'total_pages': 2, 'page': 0, 'page_label': '1'}

Chunk 2:
Content: Proven abilities in data exploration, machine learning, and software development. Passionate 
about advancing AI & ML technologies, scientific research, and contributing to innovative projects. Education: 
AI & ML 9-month Diploma - Information Technology Institute (ITI)  
- Supervised by the Ministry of Communication to lea
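
The idea behind semantic chunking can be sketched as follows: embed consecutive sentences, measure the distance between each neighbouring pair, and start a new chunk wherever that distance is large. The example below is a rough illustration under several assumptions: the 2-D vectors are hand-written stand-ins for real embeddings, the threshold is a fixed number rather than one derived from the distribution of distances, and `semantic_chunks` is a hypothetical helper rather than the `SemanticChunker` implementation.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity for two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def semantic_chunks(sentences: List[str], vectors: List[Sequence[float]], threshold: float) -> List[str]:
    """Group consecutive sentences, breaking where neighbours are far apart."""
    if not sentences:
        return []
    groups = [[sentences[0]]]
    for i in range(1, len(sentences)):
        if cosine_distance(vectors[i - 1], vectors[i]) > threshold:
            groups.append([])  # large jump in meaning: start a new chunk
        groups[-1].append(sentences[i])
    return [" ".join(group) for group in groups]


# Toy sentences and hand-written vectors (assumptions, not real embeddings).
sentences = ["Phone and email.", "LinkedIn profile.", "NegaMax search.", "NNUE evaluation."]
vectors = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
for chunk in semantic_chunks(sentences, vectors, threshold=0.5):
    print(chunk)
```

Because the breaks follow meaning rather than a fixed token count, chunk lengths vary, which is consistent with the same CV yielding 5 chunks here versus 8 with the token splitter.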

## Embeddings and vector store

In [53]:
embeddings = OpenAIEmbeddings()
vector_store = FAISS.from_documents(chunks, embeddings)

In [None]:
def search_documents(query: str, k: int = 3):
    """Search for relevant document chunks."""
    print(f"\n--- Searching for: '{query}' ---")
    results = vector_store.similarity_search(query, k=k)
    
    print(f"Found {len(results)} relevant chunks:")
    for i, doc in enumerate(results):
        print(f"\nResult {i+1}:")
        print(f"Content: {doc.page_content}")
        print(f"Similarity: Present in top {k} results")
    
    return results

In [56]:
search_documents("What is my email addres?" , 1 )



--- Searching for: 'What is my email addres?' ---
Found 1 relevant chunks:

Result 1:
Content: Mohamed Moataz 
Phone: (+20) 1120682864             
Email:  mohamad.motaz.mostafa@gmail.com             
LinkedIn: mohamed-moataz 
Military Status: Completed. Professional Summary:  
Data Scientist with a strong foundation in mathematics and experience in developing and deploying AI 
models.
Similarity: Present in top 1 results


[Document(id='7d203892-c789-4395-8f5e-253473905564', metadata={'producer': 'Microsoft® Word for Microsoft 365', 'creator': 'Microsoft® Word for Microsoft 365', 'creationdate': '2025-04-10T19:58:19+02:00', 'author': 'Win 10', 'moddate': '2025-04-10T19:58:19+02:00', 'source': 'Mohamed Motaz CV.pdf', 'total_pages': 2, 'page': 0, 'page_label': '1'}, page_content='Mohamed Moataz \nPhone: (+20) 1120682864             \nEmail:  mohamad.motaz.mostafa@gmail.com             \nLinkedIn: mohamed-moataz \nMilitary Status: Completed. Professional Summary:  \nData Scientist with a strong foundation in mathematics and experience in developing and deploying AI \nmodels.')]
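
Conceptually, `similarity_search` embeds the query and returns the `k` stored chunks whose vectors are closest to it. The brute-force sketch below shows that ranking step on toy data. It assumes Euclidean distance as the metric and uses hand-written 2-D vectors in place of real embeddings; `top_k` is a hypothetical helper, and FAISS itself uses optimized index structures rather than this loop.

```python
import math
from typing import List, Sequence, Tuple


def top_k(query_vec: Sequence[float], doc_vecs: List[Sequence[float]], k: int) -> List[Tuple[int, float]]:
    """Return (index, distance) for the k vectors nearest to the query."""
    scored = [(math.dist(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort()  # smallest distance first
    return [(idx, dist) for dist, idx in scored[:k]]


# Toy vectors standing in for chunk embeddings (assumption).
doc_vecs = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
query_vec = (0.9, 1.2)
for idx, dist in top_k(query_vec, doc_vecs, k=2):
    print(f"chunk {idx}: distance {dist:.3f}")
```

Returning the distance alongside each index is one way to replace the placeholder "Present in top k results" line in `search_documents` with an actual score.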

# RAG

In [61]:

def answer_with_rag_and_agent(query: str):
    """Process a query using RAG + agent."""
    print(f"\n--- Processing query with RAG + Agent: '{query}' ---")
    
    # Retrieve relevant documents
    relevant_docs = search_documents(query, k=1)
    context_text = "\n\n".join([doc.page_content for doc in relevant_docs])
    
    # Enrich the query with context
    enriched_query = (
        f"I have a question that may require information from these documents:\n\n"
        f"{context_text}\n\n"
        f"My question is: {query}"
    )
    
    # Process with the agent
    print("\nSending to agent with context...")
    response = agent.invoke(input=enriched_query)
    
    print(f"\nFinal response: {response['output']}")
    print("-----------------------------------\n")
    
    return response['output']


In [62]:
answer_with_rag_and_agent("What is my email addres?")



--- Processing query with RAG + Agent: 'What is my email addres?' ---

--- Searching for: 'What is my email addres?' ---
Found 1 relevant chunks:

Result 1:
Content: Mohamed Moataz 
Phone: (+20) 1120682864             
Email:  mohamad.motaz.mostafa@gmail.com             
LinkedIn: mohamed-moataz 
Military Status: Completed. Professional Summary:  
Data Scientist with a strong foundation in mathematics and experience in developing and deploying AI 
models.
Similarity: Present in top 1 results

Sending to agent with context...


> Entering new AgentExecutor chain...
```json
{
    "action": "Final Answer",
    "action_input": "mohamad.motaz.mostafa@gmail.com"
}
```

> Finished chain.

Final response: mohamad.motaz.mostafa@gmail.com
-----------------------------------



'mohamad.motaz.mostafa@gmail.com'

In [64]:
answer_with_rag_and_agent("what was my ranking in QTech hakathon?")


--- Processing query with RAG + Agent: 'what was my ranking in QTech hakathon?' ---

--- Searching for: 'what was my ranking in QTech hakathon?' ---
Found 1 relevant chunks:

Result 1:
Content: All certifications are available upon request. GitHub 
Hackathons: 
Q-tech & ITI Hackathon 
Ranked 3rd place among 15 different teams from ITI graduation projects. Certificate  
Certifications
 Intermediate Machine Learning, Kaggle.
Similarity: Present in top 1 results

Sending to agent with context...


> Entering new AgentExecutor chain...
```json
{
    "action": "Final Answer",
    "action_input": "Ranked 3rd place among 15 different teams from ITI graduation projects."
}
```

> Finished chain.

Final response: Ranked 3rd place among 15 different teams from ITI graduation projects.
-----------------------------------



'Ranked 3rd place among 15 different teams from ITI graduation projects.'
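
The same prompt template appears in both `query_with_context` and `answer_with_rag_and_agent`. A small helper could build it in one place, as in the sketch below. `build_enriched_query` is a hypothetical name that does not exist in the notebook; the template text is copied from the functions above, and the sample inputs are made up for the demo.

```python
from typing import Iterable


def build_enriched_query(query: str, contexts: Iterable[str]) -> str:
    """Join retrieved chunks and wrap them around the user's question."""
    context_text = "\n\n".join(contexts)
    return (
        "I have a question that may require information from these documents:\n\n"
        f"{context_text}\n\n"
        f"My question is: {query}"
    )


# Made-up chunks standing in for retrieved document text (assumption).
sample_chunks = ["Hackathons: ranked 3rd place.", "Certifications: available upon request."]
prompt = build_enriched_query("What was my ranking?", sample_chunks)
print(prompt)
```

Inside `answer_with_rag_and_agent`, the call would take the `page_content` of each retrieved document as `contexts`, which keeps the retrieval step and the prompt wording independent of each other.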