# Code Clash

## Gita_RAG


### Step 1 : Data Extraction and Collection

This script demonstrates loading multiple CSV files into pandas DataFrames.

Files Loaded:
1. Bhagwad_Gita_Verses_English_Questions.csv
   - Contains Bhagavad Gita verses in English with related questions.
   - Stored in `df`.

2. Bhagwad_Gita_Verses_Concepts.csv
   - Contains concepts associated with the verses.
   - Stored in `df1`.

3. Bhagwad_Gita_Verses_English.csv
   - Contains the English translation of Bhagavad Gita verses.
   - Stored in `df2`.

4. Gita_Word_Meanings_English.csv
   - Contains the English meanings of Sanskrit words from the Gita.
   - Stored in `df3`.

5. Gita_Word_Meanings_Hindi.csv
   - Contains the Hindi meanings of Sanskrit words from the Gita.
   - Stored in `df4`.

Steps:
1. Import the pandas library.
2. Define the file paths for each CSV file.
3. Load each file into a pandas DataFrame using `pd.read_csv()`.

Variables:
- `df`, `df1`, `df2`, `df3`, `df4`: DataFrames storing the loaded data.

Use Cases:
- Preparing data for analysis, visualization, or machine learning tasks.
- Merging and processing related datasets for insights.

Note:
- The paths are relative, so the CSV files must be in the notebook's working directory (or the paths must be updated to point at them).

In [73]:
import pandas as pd

# Load each CSV file into its own DataFrame (paths are relative to the working directory)
file_path = "Bhagwad_Gita_Verses_English_Questions.csv"
df = pd.read_csv(file_path)

file_path1 = "Bhagwad_Gita_Verses_Concepts.csv"
df1 = pd.read_csv(file_path1)

file_path2 = "Bhagwad_Gita_Verses_English.csv"
df2 = pd.read_csv(file_path2)

file_path3 = "Gita_Word_Meanings_English.csv"
df3 = pd.read_csv(file_path3)

file_path4 = "Gita_Word_Meanings_Hindi.csv"
df4 = pd.read_csv(file_path4)
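Since the five `read_csv` calls differ only in the file name, the loading step can also be written as a loop. The sketch below is an illustrative alternative, not the notebook's code: the helper name `load_csvs` is hypothetical, and it keys each DataFrame by file stem instead of using `df` through `df4`. It also checks every path before reading anything, which turns the note about file locations into an explicit error.

```python
from pathlib import Path

import pandas as pd

# The five CSV files listed in Step 1.
CSV_FILES = [
    "Bhagwad_Gita_Verses_English_Questions.csv",
    "Bhagwad_Gita_Verses_Concepts.csv",
    "Bhagwad_Gita_Verses_English.csv",
    "Gita_Word_Meanings_English.csv",
    "Gita_Word_Meanings_Hindi.csv",
]


def load_csvs(directory, file_names):
    """Hypothetical helper: read each CSV into a DataFrame keyed by file stem.

    Every path is checked before any file is read, so a wrong working
    directory fails fast with the full list of missing files.
    """
    paths = [Path(directory) / name for name in file_names]
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing CSV files: {missing}")
    return {p.stem: pd.read_csv(p) for p in paths}
```

Called as `frames = load_csvs(".", CSV_FILES)`, the entry `frames["Bhagwad_Gita_Verses_Concepts"]` would hold the same data as `df1`.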

In [74]:
#df
#df1
#df2
#df3
#df4
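The use cases for Step 1 mention merging related datasets. A minimal sketch of that is a pandas join on shared verse keys. It assumes both tables have `Chapter` and `Verse` columns. The Concepts file does, judging by the `docs` output in Step 2, but this is only an assumption for the other files. The helper name `merge_on_verse` and the toy rows are illustrative placeholders, not data from the CSV files.

```python
import pandas as pd


def merge_on_verse(left, right):
    """Hypothetical helper: attach right-hand columns to each row of `left`
    that has a matching (Chapter, Verse) pair; unmatched rows are dropped."""
    return left.merge(
        right,
        on=["Chapter", "Verse"],
        how="inner",
        suffixes=("_left", "_right"),  # disambiguate non-key columns present in both tables
    )


# Toy stand-ins for df1 (concepts) and df2 (translations); values are placeholders.
concepts = pd.DataFrame(
    {"Chapter": [2, 2], "Verse": [13, 20], "Concept": ["concept A", "concept B"]}
)
translations = pd.DataFrame(
    {"Chapter": [2, 3], "Verse": [13, 1], "English": ["translation X", "translation Y"]}
)
merged = merge_on_verse(concepts, translations)
print(merged)
```

An inner join keeps only verses present in both tables. `how="left"` would instead keep every concept row and fill missing translations with `NaN`.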

### Step 2 : Data Formatting


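This script demonstrates how to use LlamaIndex's `SimpleDirectoryReader` with a custom file extractor to convert multiple CSV files into LlamaIndex `Document` objects. Four of the five files loaded in Step 1 are passed to the reader; the questions file (`file_path`) is not included.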

Modules Used:
- SimpleDirectoryReader: Reads and processes files from specified paths.
- PagedCSVReader: A CSV reader that emits one `Document` per row, rendering each row as `column: value` lines (visible in the `docs` output below).

Steps:
1. **Import Required Libraries**:
    - Import the `SimpleDirectoryReader` for reading files.
    - Import the `PagedCSVReader` for row-by-row CSV parsing.

2. **Initialize CSV Reader**:
    - Create an instance of `PagedCSVReader` to process `.csv` files.

3. **Setup SimpleDirectoryReader**:
    - Pass a list of file paths (`input_files`) to the reader.
    - Map `.csv` files to the `PagedCSVReader` via the `file_extractor` argument.

4. **Load Data**:
    - Use `load_data()` to process all specified files.
    - The output is stored in the `docs` variable.

Variables:
- `file_path1` to `file_path4`: Paths to the four input CSV files (defined in Step 1).
- `csv_reader`: An instance of `PagedCSVReader` for processing CSV files.
- `reader`: An instance of `SimpleDirectoryReader` configured to read CSV files.
- `docs`: A list of `Document` objects, one per CSV row, ready for embedding and indexing.

Use Cases:
- Efficiently loading and processing multiple CSV files for embedding, querying, or analytics.


In [3]:
# Necessary Modules from llama_index
from llama_index.core.readers import SimpleDirectoryReader # used to read and extract data from files of different formats such as csv
from llama_index.readers.file import PagedCSVReader # used to read data from csv files

csv_reader = PagedCSVReader()  # emits one Document per CSV row

reader = SimpleDirectoryReader(
    input_files=[file_path1, file_path2, file_path3, file_path4],
    file_extractor={".csv": csv_reader},  # route .csv files to PagedCSVReader
)

docs = reader.load_data()

In [4]:
docs  # Display the extracted data: a list of Document objects, one per CSV row

[Document(id_='6929c48a-8e02-40af-9eab-1d81581e4a2b', embedding=None, metadata={'file_path': 'Bhagwad_Gita_Verses_Concepts.csv', 'file_name': 'Bhagwad_Gita_Verses_Concepts.csv', 'file_type': 'text/csv', 'file_size': 33660, 'creation_date': '2024-12-27', 'last_modified_date': '2024-12-27'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, metadata_template='{key}: {value}', metadata_separator='\n', text_resource=MediaResource(embeddings=None, data=None, text='Chapter: 2\nVerse: 13\nConcept: Transmigration of Soul\nKeyword: Transmigration\nSanskrit: देहिनोऽस्मिन्यथा देहे कौमारं यौवनं जरा| तथा देहान्तरप्राप्तिर्धीरस्तत्र न मुह्यति || 2.13 ||\nEnglish: Just as childhood, youth, and old age are natural stages for a living being in their current body, so too is the a
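The output shows that `PagedCSVReader` turns each CSV row into its own `Document`, whose text lists every column as `column: value` on a separate line. The standard-library sketch below approximates that formatting so it can be inspected without LlamaIndex. It is an illustration under that assumption, not the library's implementation, and `rows_to_texts` is a hypothetical name.

```python
import csv
import io


def rows_to_texts(csv_text, joiner="\n"):
    """Approximate PagedCSVReader-style output: one string per data row,
    with each cell rendered as 'column: value' on its own line.

    Assumes a well-formed CSV with a header row.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        joiner.join(f"{key.strip()}: {(value or '').strip()}" for key, value in row.items())
        for row in reader
    ]


sample = "Chapter,Verse,Concept\n2,13,Transmigration of Soul\n"
for text in rows_to_texts(sample):
    print(text)
```

Because each row becomes a separate document, a retrieved chunk typically corresponds to one verse or one word-meaning entry rather than an arbitrary slice of a file.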

### RAG Pipeline

#### Step 3 : Creation of vector store 

Script for Loading and Indexing Bhagavad Gita Verses with LlamaIndex and Chroma

This script demonstrates the process of loading and indexing Bhagavad Gita verses using LlamaIndex for embedding and Chroma for vector storage.

Components:
1. **Embedding Model**:
   - A Hugging Face model used for embedding Bhagavad Gita verses into vector representations.
   - Stored in `embed_model`.

2. **Chroma Database**:
   - A persistent database used for storing vector embeddings.
   - Stored in `db` and `db2`.

3. **Chroma Collection - "Verse_collection"**:
   - Contains the embeddings of Bhagavad Gita verses stored in the Chroma database.
   - Stored in `chroma_collection`.

4. **Index**:
   - A vector index created using the embedded Bhagavad Gita verses.
   - Stored in `index`.

Steps:
1. **Import Required Libraries**:
   - Import necessary libraries like `HuggingFaceEmbedding` from LlamaIndex, `chromadb`, and others for vector storage and indexing.

2. **Initialize Embedding Model**:
   - Load the Hugging Face model for embedding text using `HuggingFaceEmbedding()`.

3. **Create Chroma Database Client**:
   - Set up a persistent Chroma client to store vectors on disk using `chromadb.PersistentClient`.

4. **Create and Configure Chroma Collection**:
   - Retrieve or create the `Verse_collection` where the vectors will be stored.

5. **Create Vector Store**:
   - Use `ChromaVectorStore` to interface with the Chroma collection, and create a `StorageContext` to manage storage configurations.

6. **Create Index from Documents**:
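   - Create an index from the Bhagavad Gita documents with `VectorStoreIndex.from_documents()`. This splits the documents into nodes, embeds each node with `embed_model`, and writes the vectors to the Chroma collection through `storage_context`.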

7. **Load the Index from Disk**:
   - Reconnect to the same `./gita_db` directory and rebuild the index with `VectorStoreIndex.from_vector_store()`, which reuses the stored embeddings instead of recomputing them.

Variables:
- **embed_model**: The Hugging Face model used for generating vector embeddings from text.
- **db** and **db2**: Persistent Chroma clients storing the vector database.
- **chroma_collection**: The Chroma collection containing stored embeddings.
- **vector_store**: The Chroma vector store object used for managing and querying the embeddings.
- **storage_context**: A context object for configuring how the embeddings are stored.
- **index**: The vector index created from the embeddings of the Bhagavad Gita verses.

Use Cases:
- **Text Embedding and Storage**: Transforming textual data (Bhagavad Gita verses) into vector embeddings and storing them in a persistent database.
- **Search and Querying**: Creating a searchable index for efficiently querying the Bhagavad Gita verses based on semantic similarity.
- **Data Analysis**: Preparing the dataset for analysis, machine learning, or deep learning tasks, such as similarity comparison, clustering, or topic modeling.
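Semantic search over the index amounts to comparing the query's embedding with the stored verse embeddings and returning the closest matches. The sketch below shows that idea using cosine similarity over toy 3-dimensional vectors. Real embeddings come from `embed_model` and have many more dimensions, and Chroma's distance function is configured per collection, so treat this as a conceptual model rather than what Chroma runs internally. `top_k` and the sample vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the texts of the k entries in `store` most similar to `query_vec`, best first.

    `store` is a list of (text, vector) pairs standing in for the vector database.
    """
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical toy embeddings; real ones would be produced by embed_model.
store = [
    ("verse about the soul", [0.9, 0.1, 0.0]),
    ("verse about duty", [0.0, 1.0, 0.2]),
    ("word meaning entry", [0.1, 0.0, 1.0]),
]
print(top_k([1.0, 0.0, 0.0], store, k=2))
```

A vector database performs the same ranking with an approximate nearest-neighbour index, so it does not have to score every stored vector for each query.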



In [5]:
# Creating HuggingFaceEmbedding object to use the HuggingFace model for embeddings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
embed_model = HuggingFaceEmbedding()

In [6]:
# Imports for the Chroma client, vector store, storage context, and index
import chromadb
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext

In [7]:
# create client
db = chromadb.PersistentClient(path="./gita_db")
chroma_collection = db.get_or_create_collection("Verse_collection")

In [8]:
# Wrap the Chroma collection as a LlamaIndex vector store; embeddings are written when the index is built
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

In [9]:
# create index
index = VectorStoreIndex.from_documents(
    docs, storage_context=storage_context, embed_model=embed_model
)

In [10]:
# load from disk
db2 = chromadb.PersistentClient(path="./gita_db")
chroma_collection = db2.get_or_create_collection("Verse_collection")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
index = VectorStoreIndex.from_vector_store(
    vector_store,
    embed_model=embed_model,
)

#### Step 4 : Model Integration 

Script for Querying the Bhagavad Gita Index Using Ollama LLM

This script demonstrates how to query a persisted index of Bhagavad Gita verses using the Ollama language model for natural language processing.

Components:
1. **Ollama LLM (Language Model)**:
   - An instance of the `Ollama` language model used for processing and generating responses to queries.
   - Stored in `llm`.

2. **Persisted Index**:
   - The index created from Bhagavad Gita verses, stored using LlamaIndex and Chroma.
   - Stored in `index`.

3. **Query Engine**:
   - A query engine built using the index and the Ollama model for retrieving answers to user queries.
   - Stored in `query_engine`.

4. **Query**:
   - The user's question, such as "How does Gita start?".
   - Read with `input()` and passed to the query engine.

5. **Response**:
   - The response generated by the query engine based on the user's query.
   - Stored in `response`.

Steps:
1. **Import the Ollama LLM**:
   - Import the `Ollama` language model from LlamaIndex's `llms.ollama`.

2. **Initialize Ollama LLM**:
   - Create an `Ollama` instance for the model `"llama3.2"`, which will process queries and generate responses. The Ollama server must be running locally with this model already pulled (for example, `ollama pull llama3.2`).

3. **Create a Query Engine**:
   - Call `index.as_query_engine(llm=llm)` on the `index` from the previous step to obtain a `RetrieverQueryEngine`. This engine retrieves relevant chunks from the index and passes them to `llm` to generate an answer.

4. **Query the Index**:
   - Prompt the user to enter a query using `input()`.
   - Pass the query to the query engine using `query_engine.query()` to retrieve a response based on the index.

5. **Print the Response**:
   - Print the response returned by the `query_engine` to the user. The response will be the model's generated answer to the user's query.



In [None]:
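from llama_index.llms.ollama import Ollama

# Assumes a local Ollama server with the model pulled (e.g. `ollama pull llama3.2`)
llm = Ollama(model="llama3.2", request_timeout=120.0)  # local generation can outlast the default timeout

# Query data from the persisted index
query_engine = index.as_query_engine(llm=llm)  # returns a RetrieverQueryEngine
response = query_engine.query(input("Enter your query: "))  # e.g. "How does Gita start?"
print(response)  # a Response object; printing it shows the generated answer text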

The Bhagwad Gita begins with Dhritarashtra's inquiry to Sanjaya about the assembly of his people and the sons of Pandu on the holy plain of Kurukshetra.
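Behind `query_engine.query()`, a retriever-based engine fetches the chunks most similar to the question from the index and passes them to the LLM together with the question. The number of chunks is configurable, for example `index.as_query_engine(llm=llm, similarity_top_k=5)`. The function below is a simplified, hypothetical illustration of that prompt-assembly step; LlamaIndex's actual prompt templates and response synthesis differ in wording and detail.

```python
def build_rag_prompt(question, retrieved_chunks, max_chunks=2):
    """Hypothetical helper: place the top retrieved chunks above the question.

    `retrieved_chunks` is assumed to be ordered best match first.
    """
    context = "\n\n".join(retrieved_chunks[:max_chunks])
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Using only the context above, answer the question.\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder chunks in the PagedCSVReader row format; the verse text is elided.
chunks = [
    "Chapter: 1\nVerse: 1\nEnglish: <verse text>",
    "Chapter: 2\nVerse: 13\nEnglish: <verse text>",
    "Chapter: 18\nVerse: 78\nEnglish: <verse text>",
]
print(build_rag_prompt("How does Gita start?", chunks))
```

Instructing the model to answer from the supplied context is intended to ground its reply in the retrieved Gita text rather than in its general training data.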
