# Ollama PDF RAG Notebook

## Import Libraries


In [None]:
from langchain_community.document_loaders import PyPDFLoader, DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaEmbeddings
from langchain_ollama.chat_models import ChatOllama
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from IPython.display import display, Markdown



## Load PDF

In [2]:
def local_path(data):
    """Load every PDF in the `data` directory (PyPDFLoader yields one Document per page)."""
    loader = DirectoryLoader(
        data,
        glob="*.pdf",
        loader_cls=PyPDFLoader,
    )
    return loader.load()


In [3]:
extracted_data = local_path("Data/")

## Split text into chunks

In [4]:
# Split text into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(extracted_data)
print(f"Text split into {len(chunks)} chunks")


Text split into 322 chunks
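
To make the effect of `chunk_size=1000` and `chunk_overlap=200` concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a simplified illustration, not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter also tries to break on separators such as paragraphs and sentences). The function name `split_with_overlap` and the sample text are hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into windows of chunk_size characters.

    Each window starts (chunk_size - chunk_overlap) characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Hypothetical sample: 2500 characters of repeating digits.
sample = "".join(str(i % 10) for i in range(2500))
pieces = split_with_overlap(sample)
print([len(p) for p in pieces])
# The tail of one chunk reappears at the head of the next one.
print(pieces[0][-200:] == pieces[1][:200])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.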


## Create vector database

In [6]:
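# Create the embedding model
# Assumption: the original notebook never defines `embeddings`; "nomic-embed-text"
# is a placeholder, so substitute whichever embedding model you have pulled into Ollama.
embeddings = OllamaEmbeddings(model="nomic-embed-text")

# Create FAISS vector store
vector_store = FAISS.from_documents(
    documents=chunks,
    embedding=embeddings
)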

# Save the FAISS index
index_path = "faiss_index_new"
vector_store.save_local(index_path)
print(f"FAISS index saved to {index_path}")
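
Conceptually, a vector store answers a query by comparing the query's embedding with the stored chunk embeddings and returning the closest ones. The sketch below shows that idea with a brute-force Euclidean-distance search in plain Python. The three-dimensional vectors, the texts, and the helper names are all hypothetical, and the choice of metric is an assumption; a real index is far more optimized.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query_vec, index, k=2):
    """Return the k (distance, text) pairs closest to query_vec."""
    scored = sorted((l2_distance(query_vec, vec), text) for text, vec in index)
    return scored[:k]


# Hypothetical toy "embeddings"; real embedding vectors have hundreds of dimensions.
toy_index = [
    ("Bubble sort swaps adjacent elements.", [0.9, 0.1, 0.0]),
    ("A linked list stores nodes with pointers.", [0.1, 0.9, 0.0]),
    ("Merge sort splits and merges sublists.", [0.8, 0.2, 0.1]),
]

# Stands in for the embedding of a question about sorting.
query = [1.0, 0.0, 0.0]
top = nearest(query, toy_index, k=2)
for distance, text in top:
    print(f"{distance:.3f}  {text}")
```

The two sorting-related texts rank ahead of the linked-list text because their vectors lie closer to the query vector.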


## Set up LLM and Retrieval

In [21]:
# Set up LLM and retrieval
local_model = "llama3.2"  
llm = ChatOllama(model=local_model)

In [22]:
# Query prompt template
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate 2
    different versions of the given user question to retrieve relevant documents from
    a vector database. By generating multiple perspectives on the user question, your
    goal is to help the user overcome some of the limitations of the distance-based
    similarity search. Provide these alternative questions separated by newlines.
    Original question: {question}""",
)

# Set up retriever on top of the FAISS store created above
retriever = MultiQueryRetriever.from_llm(
    vector_store.as_retriever(),
    llm,
    prompt=QUERY_PROMPT
)
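
The idea behind multi-query retrieval is to rephrase the question, run one search per rephrasing, and merge the results without duplicates. The sketch below shows only that control flow in plain Python. It is not the `MultiQueryRetriever` implementation; `fake_variants`, `fake_search`, and the chunk IDs are hypothetical stand-ins for the LLM and the vector store.

```python
def multi_query_retrieve(question, generate_variants, search):
    """Search once per question variant and return the unique results in first-seen order."""
    seen = set()
    merged = []
    for query in generate_variants(question):
        for doc in search(query):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Hypothetical search results keyed by the rephrased query.
FAKE_RESULTS = {
    "Explain Structured Query Language": ["chunk-12", "chunk-40"],
    "What is SQL used for in databases?": ["chunk-40", "chunk-7"],
}


def fake_variants(question):
    # A real setup would ask the LLM for alternative phrasings of `question`.
    return list(FAKE_RESULTS)


def fake_search(query):
    # A real setup would run a similarity search against the vector store.
    return FAKE_RESULTS[query]


merged = multi_query_retrieve("What is SQL?", fake_variants, fake_search)
print(merged)
```

Because `chunk-40` is returned by both rephrasings, it appears only once in the merged list.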

## Create chain

In [9]:

template = """Some points to note while generating an answer:
a) Be courteous and creative in your responses. Answer like a friendly agent, making the user feel comfortable.
a.1) Imagine yourself as an intelligent assistant that helps users with different types of study materials. You will have a knowledge base of study materials to search.
b) Be very structured in your response. Give the response with HTML tags as explained below.
Use <b> HTML tags for bullet points and to bold the important words. Use <br> HTML tags to display content on new lines wherever required for clear display.
Important Note: We need to display the content with proper HTML tags. Your task is to automatically add the relevant <li>, <ul>, break <br>, <p>, and bold <b> tags to present the responses cleanly inside an HTML <div> tag.
c) Remember not to talk about any leaves that are not applicable to me. Example: maternity leave is not applicable to male employees, and contractual leave is not valid for non-contractual employees.
d) The answer should not be in an email format; it should look like a normal chat.

Context:
{context}

Question: {question}

Answer:
Please provide the answer in a clear, point-wise format for better readability:
1. ...
2. ...
3. ...
"""

prompt = ChatPromptTemplate.from_template(template)

In [10]:
# Create chain
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
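
The `|` operator above feeds each step's output into the next step. The sketch below mimics that flow with ordinary functions so the data shape at each stage is visible. It is an illustration only, not how LangChain runnables are implemented, and every name in it (`pipe`, `fake_retriever`, `fake_llm`, and so on) is hypothetical.

```python
from functools import reduce


def pipe(*steps):
    """Compose steps left to right: the output of one step is the input of the next."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


def fake_retriever(question):
    # Stand-in for the retriever: returns the text of the matching chunks.
    return ["SQL stands for Structured Query Language."]


def gather_inputs(question):
    # Mirrors {"context": retriever, "question": RunnablePassthrough()}:
    # the question goes to the retriever and is also passed through unchanged.
    return {"context": fake_retriever(question), "question": question}


def format_prompt(inputs):
    # Stand-in for the prompt template: fills the context and question slots.
    context = "\n".join(inputs["context"])
    return f"Context:\n{context}\n\nQuestion: {inputs['question']}\n\nAnswer:"


def fake_llm(prompt_text):
    # Stand-in for the chat model: returns a message-like dict.
    return {"content": f"(model reply to {len(prompt_text)} prompt characters)"}


def parse_output(message):
    # Stand-in for StrOutputParser: keeps only the text of the reply.
    return message["content"]


toy_chain = pipe(gather_inputs, format_prompt, fake_llm, parse_output)
result = toy_chain("What is SQL?")
print(result)
```

The first step turns a single question string into a dictionary with both keys the prompt expects, which is why the chain can be invoked with just the question.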

## Chat with PDF

In [11]:
def chat_with_pdf(question):
    """
    Run the RAG chain on a question and render the answer as Markdown.
    """
    answer = chain.invoke(question)
    display(Markdown(answer))

In [11]:
# Example 1
chat_with_pdf("What is SQL?")

Here's the information about SQL:

**What is SQL?**

* SQL stands for Structured Query Language.
* It is a programming language designed for managing and manipulating data stored in relational database management systems (RDBMS).

**Main Characteristics of SQL:**

1. **Declarative Language**: SQL is used to declare what you want to do with your data, rather than how to do it.
2. **Structured**: SQL uses a structured syntax to define queries, with a focus on readability and maintainability.
3. **Language**: SQL is a language that allows you to interact with databases, perform operations, and retrieve data.

**SQL Syntax:**

* SQL commands are written in a specific syntax, which includes:
	+ SELECT
	+ INSERT
	+ UPDATE
	+ DELETE
	+ CREATE
	+ DROP
	+ ALTER

**Common SQL Operations:**

1. **Data Retrieval**: Use the SELECT statement to retrieve data from a database.
2. **Data Insertion**: Use the INSERT statement to add new data to a database.
3. **Data Update**: Use the UPDATE statement to modify existing data in a database.
4. **Data Deletion**: Use the DELETE statement to remove data from a database.

**SQL Uses:**

1. **Database Management**: SQL is used to manage and manipulate data stored in relational databases.
2. **Data Analysis**: SQL is used to analyze and retrieve data for decision-making purposes.
3. **Web Development**: SQL is used to interact with web applications that use databases as storage.
4. **Business Intelligence**: SQL is used to perform complex queries and reports on large datasets.

**Key Benefits of SQL:**

1. **Standardization**: SQL is a standardized language, making it easy to communicate between different database systems.
2. **Flexibility**: SQL allows for flexible data modeling, enabling efficient storage and retrieval of complex data structures.
3. **Security**: SQL provides robust security features to protect sensitive data.

Overall, SQL is an essential tool for anyone working with databases, providing a powerful and flexible language for managing and analyzing data.

In [12]:
# Example 2
chat_with_pdf("give me detail information about sql commands")

Here is the detailed information about SQL commands:

**SQL Commands**

### 1. CREATE INDEX
* **Purpose**: Create an index on one or more columns in a table.
* **Syntax**: `CREATE INDEX index_name ON table_name (column1, column2, ...)`
* **Example**: `CREATE INDEX idx_employee_name ON employees (name);`
* **Benefits**:
	+ Improves query performance by enabling faster data retrieval.
	+ Reduces the number of rows that need to be scanned during a query.

### 2. DROP INDEX
* **Purpose**: Remove an existing index from a table.
* **Syntax**: `DROP INDEX index_name ON table_name;` (MySQL and SQL Server; PostgreSQL and SQLite omit the `ON` clause)
* **Example**: `DROP INDEX idx_employee_name ON employees;`
* **Benefits**:
	+ Removes the index, which can improve performance if the index is not used.

### 3. CREATE CONSTRAINT
* **Purpose**: Define constraints that ensure data integrity.
* **Constraints include**:
	+ PRIMARY KEY: ensures a unique value in a column or set of columns.
	+ FOREIGN KEY: ensures relationships between tables.
	+ UNIQUE: ensures that only unique values are stored in a column.
	+ NOT NULL: ensures that a value is not null in a column.
	+ CHECK: ensures that a value meets a specific condition.
* **Syntax**: `ALTER TABLE table_name ADD CONSTRAINT constraint_name [CONSTRAINT_TYPE] [COLUMN_NAME]`
* **Example**: `ALTER TABLE orders ADD CONSTRAINT fk_customer FOREIGN KEY (customer_id) REFERENCES customers(id);`

### 4. DROP CONSTRAINT
* **Purpose**: Remove an existing constraint from a table.
* **Syntax**: `ALTER TABLE table_name DROP CONSTRAINT constraint_name;`
* **Example**: `ALTER TABLE orders DROP CONSTRAINT fk_customer;`
* **Benefits**:
	+ Removes the constraint, which can simplify queries.

### 5. TRUNCATE TABLE
* **Purpose**: Delete data from a table, but not the table itself.
* **Syntax**: `TRUNCATE TABLE table_name;`
* **Example**: `TRUNCATE TABLE orders;`
* **Benefits**:
	+ Quickly removes all rows from a table without affecting indexing or relationships.

### 6. SELECT
* **Purpose**: Retrieve data from a database.
* **Syntax**: `SELECT column1, column2, ... FROM table_name [WHERE condition];`
* **Example**: `SELECT * FROM employees;` (returns all columns and rows)
* **Benefits**:
	+ Allows extraction of specific columns from a table.

### 7. CREATE TABLE
* **Purpose**: Create a new table in the database.
* **Syntax**: `CREATE TABLE table_name (column1 data_type, column2 data_type, ...);`
* **Example**: `CREATE TABLE employees (id INT PRIMARY KEY, name VARCHAR(50), department VARCHAR(20));`

### 8. DROP TABLE
* **Purpose**: Remove an existing table from the database.
* **Syntax**: `DROP TABLE table_name;`
* **Example**: `DROP TABLE orders;`
* **Benefits**:
	+ Removes the table and all data it contains.

### 9. ALTER TABLE
* **Purpose**: Make changes to a table, such as adding or removing columns.
* **Syntax**: `ALTER TABLE table_name [ADD|DROP] COLUMN column_name [data_type];`
* **Example**: `ALTER TABLE employees ADD COLUMN salary DECIMAL(10, 2);`

### 10. UPDATE
* **Purpose**: Modify existing data in a table.
* **Syntax**: `UPDATE table_name SET column1 = value1, column2 = value2 WHERE condition;`
* **Example**: `UPDATE employees SET salary = 50000 WHERE id = 1;`

### 11. DELETE
* **Purpose**: Remove data from a table.
* **Syntax**: `DELETE FROM table_name WHERE condition;`
* **Example**: `DELETE FROM orders WHERE customer_id = 1;`

Note that this is not an exhaustive list of SQL commands, but rather a selection of commonly used ones.
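
Several of the commands listed above can be tried without installing a database server by using Python's built-in `sqlite3` module. This is a minimal sketch against an in-memory SQLite database; the table, index, and row values are hypothetical, and SQLite's dialect differs in places from other systems (for example, its `DROP INDEX` takes no `ON` clause).

```python
import sqlite3

# An in-memory database disappears when the connection is closed.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE TABLE and CREATE INDEX
cur.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
cur.execute("CREATE INDEX idx_employee_name ON employees (name)")

# INSERT two hypothetical rows using parameter placeholders
cur.executemany(
    "INSERT INTO employees (id, name, salary) VALUES (?, ?, ?)",
    [(1, "Ada", 40000.0), (2, "Linus", 45000.0)],
)

# UPDATE one row, DELETE the other
cur.execute("UPDATE employees SET salary = 50000 WHERE id = 1")
cur.execute("DELETE FROM employees WHERE id = 2")

# SELECT what is left
rows = cur.execute("SELECT id, name, salary FROM employees").fetchall()
print(rows)

# DROP INDEX and DROP TABLE, then close the connection
cur.execute("DROP INDEX idx_employee_name")
cur.execute("DROP TABLE employees")
conn.close()
```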

In [13]:
# Example 3: interactive query
chat_with_pdf(input("Enter your query: "))

Here's a detailed explanation of Bubble Sort:

**What is Bubble Sort?**

Bubble Sort is a simple sorting algorithm that works by repeatedly iterating through a list of elements and swapping adjacent elements if they are in the wrong order.

**How does it work?**

1. The algorithm starts at the beginning of the list.
2. It compares the first two elements and swaps them if they are in the wrong order (i.e., if the first element is greater than the second element).
3. It then moves to the next pair of elements (the second and third elements) and repeats the process, comparing and swapping adjacent elements if necessary.
4. This process continues until the end of the list is reached.
5. The algorithm then starts over from the beginning of the list. After each pass the largest remaining element has reached its final position, so the next pass can stop one element earlier. Passes repeat until one completes without any swaps.

**Step-by-Step Example**

Suppose we want to sort the following list using Bubble Sort:

`[5, 2, 8, 3, 1]`

1. Start at the beginning of the list and compare the first two elements (5 and 2). Since 5 is greater than 2, swap them.
`[2, 5, 8, 3, 1]`
2. Move to the next pair of elements (5 and 8). Since 5 is less than 8, no swap is needed.
3. Continue this process until the end of the list is reached:
	* Compare 8 and 3. Since 8 is greater than 3, swap: `[2, 5, 3, 8, 1]`.
	* Compare 8 and 1. Since 8 is greater than 1, swap: `[2, 5, 3, 1, 8]`.
4. After the first pass the largest element (8) is in its final position, but the list is not yet sorted.
5. Start over from the beginning of the list and repeat:
	* Pass 2 ends with `[2, 3, 1, 5, 8]`.
	* Pass 3 ends with `[2, 1, 3, 5, 8]`.
	* Pass 4 ends with `[1, 2, 3, 5, 8]`.
6. No further swaps are possible, so the algorithm terminates and the final sorted list is `[1, 2, 3, 5, 8]`.
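
The passes can be checked with a short script. This is a minimal sketch of Bubble Sort with the common early-exit optimization; the function name and the choice to record the list after each pass are illustrative assumptions, not part of the source material.

```python
def bubble_sort(items):
    """Return a sorted copy of items and the list state after each pass that swapped something."""
    data = list(items)
    passes = []
    # After each pass the largest remaining element is in place,
    # so the inner loop can stop one position earlier every time.
    for end in range(len(data) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                swapped = True
        if not swapped:
            # A pass without swaps means the list is already sorted.
            break
        passes.append(list(data))
    return data, passes


result, passes = bubble_sort([5, 2, 8, 3, 1])
for number, state in enumerate(passes, start=1):
    print(f"pass {number}: {state}")
print("sorted:", result)
```

Sorting a copy keeps the caller's list unchanged, which makes it easy to compare the input with each intermediate state.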

**Time Complexity**

The time complexity of Bubble Sort is O(n^2), where n is the number of elements in the list. This means that the algorithm's running time increases quadratically with the size of the input.

**Space Complexity**

The space complexity of Bubble Sort is O(1), since it only requires a single additional memory location to store temporary swap values.

**Use Cases**

Bubble Sort is suitable for small datasets or when the dataset is nearly sorted, as its simplicity and stability make it easy to implement. However, for larger datasets or more complex data structures, other sorting algorithms like QuickSort or Merge Sort may be more efficient.

## Clean up (optional)

In [None]:
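# Optional: clean up when done
# A FAISS store has no collection to delete, so remove the index directory saved earlier.
import shutil

shutil.rmtree(index_path, ignore_errors=True)
print("FAISS index deleted successfully")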