
# RAG with OpenAI and Chroma


### Chroma

* ChromaDB is an open-source vector database designed for embedding-based search and retrieval, making it a popular choice for Retrieval-Augmented Generation (RAG) applications.

* It provides a simple, Python-native interface for storing and querying high-dimensional vectors.

* ChromaDB supports persistent storage, filtering, and hybrid search, so it is well suited to use cases that need fast, lightweight vector storage that is easy to integrate.
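As a rough mental model of embedding-based search with metadata filtering, here is a toy, brute-force store in plain Python. It is only an illustration and not how Chroma works internally, because Chroma uses a learned embedding model and an approximate nearest-neighbour index. `embed`, `cosine`, and `ToyVectorStore` are hypothetical names, and the bag-of-words `embed` is an assumed stand-in for a real embedding model. Ranking by cosine similarity is a choice made for this sketch.

```python
import math
from collections import Counter


# Hypothetical stand-in for an embedding model: a bag-of-words count vector.
# A real vector store would use a neural sentence-embedding model instead.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Brute-force in-memory store holding (id, document, metadata, vector) rows."""

    def __init__(self):
        self.rows = []

    def add(self, ids, documents, metadatas):
        for id_, doc, meta in zip(ids, documents, metadatas):
            self.rows.append((id_, doc, meta, embed(doc)))

    def query(self, query_text, n_results=1, where=None):
        q = embed(query_text)
        # Metadata filtering: keep rows whose metadata matches every key/value in `where`
        candidates = [
            row for row in self.rows
            if where is None or all(row[2].get(k) == v for k, v in where.items())
        ]
        # Rank the remaining rows by similarity to the query, most similar first
        ranked = sorted(candidates, key=lambda row: cosine(q, row[3]), reverse=True)
        return [(row[0], row[1]) for row in ranked[:n_results]]


store = ToyVectorStore()
store.add(
    ids=['a', 'b', 'c'],
    documents=[
        'level 2 charging uses 240 volt power',
        'battery range depends on temperature',
        'dc fast charging adds range quickly',
    ],
    metadatas=[{'file': 'x.pdf'}, {'file': 'x.pdf'}, {'file': 'y.pdf'}],
)
print(store.query('fast charging'))
# [('c', 'dc fast charging adds range quickly')]
print(store.query('fast charging', where={'file': 'x.pdf'}))
# [('a', 'level 2 charging uses 240 volt power')]
```

The filter is applied before ranking, so restricting the search to one file can change which document comes out on top.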

In [None]:
!pip install chromadb
!pip install pypdf

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import chromadb
from pypdf import PdfReader
from google.colab import userdata
import uuid

### Create Local Chroma Vector Store
We create a persistent ChromaDB client at the path stored in the Colab secret `CHROMADB_PATH`, then (re)create an empty collection named `Electric_Vehicles`.

In [None]:
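# CHROMADB_PATH is a Colab secret holding the directory where Chroma persists its data
chromadb_path = userdata.get('CHROMADB_PATH')

# Named chroma_client so it is not overwritten by the OpenAI client created later
chroma_client = chromadb.PersistentClient(path=chromadb_path)

# Start from an empty collection on every run. Depending on the Chroma version,
# list_collections() returns names or Collection objects, so deleting inside
# try/except is more robust than checking membership first.
try:
    chroma_client.delete_collection(name='Electric_Vehicles')
except Exception:
    pass  # the collection did not exist yet

collection = chroma_client.create_collection(name='Electric_Vehicles')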

### Parsing Knowledge
We parse the PDF files with `PdfReader` from the `pypdf` package, as follows:
  * Each page of a PDF document is stored as a separate `document` in the vector store and embedded with the collection's default embedding function
  * For each `document`, we also store the `file name` and the `page number` as metadata
  * We generate a **unique id** for each document with the `uuid` package
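The per-page bookkeeping described above can be isolated into a small pure-Python helper, which makes it easy to check without PDFs or a database. This is a sketch: `pages_to_records` is a hypothetical name, the page texts are made up, and skipping pages with no extractable text is an assumption made for this example.

```python
import uuid


def pages_to_records(file_name, page_texts):
    """Turn a list of page texts into parallel lists for a `collection.add(...)` call.

    page_texts[0] is page 1 of the PDF, so stored page numbers are 1-based.
    Pages with no extractable text (e.g. scanned images) are skipped.
    """
    documents, metadatas, ids = [], [], []
    for i, text in enumerate(page_texts):
        if not text or not text.strip():
            continue
        documents.append(text)
        metadatas.append({'file': file_name, 'page': i + 1})
        ids.append(uuid.uuid4().hex)  # random 32-character hex id, unique per page
    return documents, metadatas, ids


# Made-up page texts; in the notebook they would come from
# [page.extract_text() for page in PdfReader(path).pages]
docs, metas, ids = pages_to_records('example.pdf', ['First page', '', 'Third page'])
print(metas)  # [{'file': 'example.pdf', 'page': 1}, {'file': 'example.pdf', 'page': 3}]
```

Note that the blank second page is dropped, but the third page keeps its real page number, so references in the final answer still point to the right page.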

In [None]:
file_names = [
    'electric_vehicles.pdf',
    'pev_consumer_handbook.pdf',
    'department-for-transport-ev-guide.pdf'
]

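# EV_DOCS_PATH is a Colab secret holding the Google Drive folder that contains
# the PDF files listed above (upload them to Drive first)
ev_docs_path = userdata.get('EV_DOCS_PATH')

for file in file_names:
    print(f'Processing {file}')
    path = f'{ev_docs_path}/{file}'
    reader = PdfReader(path)

    documents, metadatas, ids = [], [], []
    for i, page in enumerate(reader.pages):
        text = page.extract_text()
        if not text.strip():
            continue  # skip pages with no extractable text (e.g. scanned images)

        documents.append(text)
        metadatas.append({'file': file, 'page': i + 1})  # i starts from 0, but PDF pages start from 1
        ids.append(uuid.uuid4().hex)

    # Add all pages of the file in one batch instead of one call per page
    if documents:
        collection.add(documents=documents, metadatas=metadatas, ids=ids)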

Processing electric_vehicles.pdf
Processing pev_consumer_handbook.pdf
Processing department-for-transport-ev-guide.pdf


### Query the Vector Store and the LLM

In [None]:
from openai import OpenAI

OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
client = OpenAI(api_key=OPENAI_API_KEY)

In [None]:
from IPython.display import Markdown, display


question = 'What types of charging stations exist for charging EVs?'
candidate_number = 5  # number of candidates to include from the vector store

if question:  # skip the query if the question is None or empty
    # Query the vector store
    results = collection.query(
        query_texts=[question], # the question is embedded automatically
        include=['documents', 'metadatas'], # specifically ask for the documents and the metadata
        n_results=candidate_number
    )

    # Concatenate the retrieved pages, with their source metadata, to form the context
    documents = results['documents'][0]  # results for the first (and only) query text
    metadatas = results['metadatas'][0]

    context = ''
    for document, metadata in zip(documents, metadatas):
        context += (
            f'Document title: {metadata["file"]}\n'
            f'Document page: {metadata["page"]}\n'
            f'Context: {document}\n\n'
        )

    # The prompt includes the instructions, the question and the retrieved context
    content = f'''
        Answer the following question using the provided context, and if the
        answer is not contained within the context, say "I don't know." Explain
        your answer if possible, and cite the sources (document and page) you
        used to answer.

        Question:
        {question}

        Context:
        {context}
        '''

    messages = [{ 'role': 'user', 'content': content }]

    # Get the response
    response = client.chat.completions.create(
        model='gpt-4o-mini',
        messages=messages,
    )

    output = response.choices[0].message.content
    display(Markdown(output))

The types of charging stations for charging Electric Vehicles (EVs) are:

1. **Level 1 Charging Station**: This provides charging through a 120-volt AC plug and typically adds about 2 to 5 miles of range per hour of charging. It doesn't require any special installation and usually comes standard with portable EVSE cordsets.

2. **Level 2 Charging Station**: This type offers charging through a 240-volt AC plug and can add about 10 to 20 miles of range per hour of charging. Level 2 EVSE requires installation of charging equipment and a dedicated electrical circuit.

3. **DC Fast Charging Station**: This provides DC electricity directly to the vehicle with an AC input of 480 V, enabling rapid charging that can add about 60 to 80 miles of range to a PEV in 20 minutes or less.

4. **Wireless or Inductive Charging Station**: This technology uses an electromagnetic field to transfer electricity to a PEV without a cord. It is becoming available again as an aftermarket add-on for newer EVs.

These variations cater to different charging needs and scenarios, depending on charging speed and installation requirements. The details regarding these charging types can be found in the "Plug-In Electric Vehicle Handbook for Consumers" on pages 9, 10, and 12.
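The retrieve-then-prompt step in the query cell can be factored into two small functions so the prompt can be inspected or tested without calling OpenAI. This is a sketch: `build_context`, `build_prompt`, and `fake_results` are hypothetical, the dictionary only imitates the nested-list shape of a Chroma query result (one inner list per query text), and the instruction wording is paraphrased from the notebook's prompt.

```python
def build_context(results):
    """Format a Chroma-style query result (single query) into a context string.

    Assumes the query used include=['documents', 'metadatas'], so
    results['documents'][0] and results['metadatas'][0] are parallel lists.
    """
    parts = []
    for document, metadata in zip(results['documents'][0], results['metadatas'][0]):
        parts.append(
            f'Document title: {metadata["file"]}\n'
            f'Document page: {metadata["page"]}\n'
            f'Context: {document}'
        )
    return '\n\n'.join(parts)


def build_prompt(question, context):
    """Combine the instructions, the question and the retrieved context into one prompt."""
    return (
        'Answer the following question using the provided context, and if the '
        'answer is not contained within the context, say "I don\'t know." '
        'Explain your answer if possible, and cite the sources (document and page) '
        'you used.\n\n'
        f'Question:\n{question}\n\n'
        f'Context:\n{context}\n'
    )


# Hand-written stand-in for the dictionary returned by collection.query(...)
fake_results = {
    'documents': [['Level 1 uses 120 V.', 'Level 2 uses 240 V.']],
    'metadatas': [[{'file': 'handbook.pdf', 'page': 9}, {'file': 'handbook.pdf', 'page': 10}]],
}
prompt = build_prompt('What charging levels exist?', build_context(fake_results))
print(prompt)
```

The resulting string could then be sent as the `content` of a single user message to `client.chat.completions.create`, as in the query cell. Keeping the file name and page next to each retrieved page is what lets the model cite its sources.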