<a href="https://colab.research.google.com/github/avikumart/LLM-GenAI-Transformers-Notebooks/blob/main/TMLC_LLM_projects/RAG/RAG_Personal_Resource_Assistant_Langchain%2C_Cohere_and_ChromaDB.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Installing necessary libraries

Cohere offers free trial API keys for its LLMs. Generate one from the dashboard at dashboard.cohere.com.

In [1]:
!pip install langchain-cohere langchain pdfminer.six chromadb -q


langchain-cohere: Enables integration of Cohere's language models with LangChain for advanced text generation and processing workflows.

langchain: Provides a modular framework for building language model-powered applications, such as chatbots, question-answering systems, and conversational agents.

pdfminer.six: Facilitates text extraction from PDF files, making it useful for document analysis and preprocessing tasks.

chromadb: A vector database library designed for efficient storage and retrieval of embeddings, ideal for tasks like semantic search and recommendation systems.

## Importing libraries

In [2]:
import os
from google.colab import userdata

# Read the Cohere API key from Colab's secret store and expose it to the Cohere client
os.environ["COHERE_API_KEY"] = userdata.get('COHERE_KEY')

from langchain_core.prompts import PromptTemplate
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.documents import Document
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
# Import pydantic directly; the langchain_core.pydantic_v1 shim is deprecated
from pydantic import BaseModel, Field
# CohereEmbeddings is provided by the langchain-cohere package alongside ChatCohere
from langchain_cohere import ChatCohere, CohereEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from pdfminer.high_level import extract_text as extract_text_pdf_miner
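The key lookup above relies on Colab's secret store. Outside Colab, a fallback along the following lines may be useful. This is only a sketch: the helper name `load_cohere_key` and its parameters are hypothetical, and it assumes the secret is stored under `COHERE_KEY` as in the cell above.

```python
import os
from getpass import getpass


def load_cohere_key(secret_name="COHERE_KEY", env_name="COHERE_API_KEY", prompt_fn=getpass):
    """Hypothetical helper: find the Cohere key in the environment, Colab secrets, or a prompt."""
    # 1. Reuse a key that is already exported in the environment.
    key = os.environ.get(env_name)
    if key:
        return key
    # 2. Try Colab's secret store; the import fails when not running in Colab.
    try:
        from google.colab import userdata
        key = userdata.get(secret_name)
    except Exception:
        key = None
    # 3. Fall back to an interactive prompt so the key is never written into the notebook.
    if not key:
        key = prompt_fn("Cohere API key: ")
    os.environ[env_name] = key
    return key
```

Calling `load_cohere_key()` once before creating the model or the embeddings would then set `COHERE_API_KEY` regardless of where the notebook runs.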




## VectorDB setup

In [3]:
# Define the directory where the Chroma database will persist data
persist_directory = "/content/chroma_db"

# Initialize Cohere embeddings with the specified model
# "embed-english-v3.0" is a pre-trained English language embedding model by Cohere
# The user_agent parameter specifies the tool or library using the Cohere API, in this case, LangChain
embedding = CohereEmbeddings(
    model="embed-english-v3.0",
    user_agent="langchain"
)




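We process two whitepapers: "Operationalizing Generative AI on Vertex AI" and "Solving Domain-Specific problems using LLMs". You can use your own PDFs by changing the file paths in the list below.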

In [4]:
# Create the text splitter once and reuse it for every PDF
# Each chunk has a maximum size of 2048 characters with a 512-character overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=2048, chunk_overlap=512)

# Collect the chunks from all PDFs in a single list
docs = []

# Loop through the list of PDF files to process
for pdf_name in ["/content/Newwhitepaper_Operationalizing Generative AI on Vertex AI.pdf", "/content/Newwhitepaper_Solving Domain-Specific problems using LLMs.pdf"]:
    # Open each PDF file in binary mode
    with open(pdf_name, 'rb') as f:
        # Extract text from the PDF using the extract_text_pdf_miner function
        text = extract_text_pdf_miner(f)

    # Clean the extracted text by replacing newline characters with spaces
    cleaned_text = " ".join(text.split("\n"))

    # Split the cleaned text into chunks and wrap each chunk in a Document object
    # The source path is stored as metadata so answers can be traced back to a PDF
    for chunk in splitter.split_text(cleaned_text):
        docs.append(Document(page_content=chunk, metadata={"source": pdf_name}))

# Create a Chroma collection from all processed documents
# Use the specified persist directory and embedding model for storage and retrieval
# Note: re-running this cell adds the same chunks to the collection again
vector_collection_fixed_size = Chroma.from_documents(
    documents=docs,
    persist_directory=persist_directory,
    embedding=embedding
)
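To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sketch of overlapping chunking. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and spaces); the function `sliding_window_chunks` is a hypothetical stand-in that only shows how consecutive chunks share text.

```python
def sliding_window_chunks(text, chunk_size=2048, chunk_overlap=512):
    """Simplified stand-in for a text splitter: fixed-size windows with overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
# ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

Each chunk repeats the last four characters of the previous one. With the notebook's settings, the same idea means a sentence that straddles a chunk boundary is still likely to appear whole in at least one chunk.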

In [5]:
# Initialize a Chroma vector database
# The persist_directory specifies the location where the database is stored
# The embedding_function parameter provides the embedding model used for vector representation
vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)



In [6]:
vectordb

<langchain_community.vectorstores.chroma.Chroma at 0x7de5441a88b0>

In [7]:
# Perform a similarity search on the vector database
# The query below is embedded and compared against the stored chunk embeddings
# k=1 specifies that only the single most similar document should be retrieved
# The method also returns a relevance score indicating how closely each document matches the query
vectordb.similarity_search_with_relevance_scores("how to operationalize the genAI using vertex?", k=1)

[(Document(metadata={'source': '/content/Newwhitepaper_Operationalizing Generative AI on Vertex AI.pdf'}, page_content='Operationalizing  Generative AI on  Vertex AI using  MLOps  Authors: Anant Nawalgaria,   Gabriela Hernandez Larios, Elia Secchi,   Mike Styer, Christos Aniftos   and Onofrio Petragallo  \x0cAcknowledgements  Reviewers and Contributors  Nenshad Bardoliwalla  Warren Barkley  Mikhail Chrestkha  Chase Lyall  Lakshmanan Sethu  Erwan Menard  Curators and Editors  Antonio Gulli  Anant Nawalgaria  Grace Mollison   Technical Writer  Joey Haymaker  Designer  Michael Lanning   2  Operationalizing Generative AI on Vertex AI using ML OpsSeptember 2024\x0cTable of contents  Introduction     What are DevOps and MLOps?   Lifecycle of a gen AI system   Discover   Develop and experiment   The foundational model paradigm   The core component of LLM Systems: A prompted model component     Chain & Augment   Tuning & training     Data Practices   Evaluate   Deploy     Deployment of gen AI 
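Conceptually, a similarity search embeds the query and ranks stored chunks by how close their vectors are to it. The sketch below illustrates that ranking with cosine similarity on made-up three-dimensional vectors. The vectors, chunk names, and helper functions are all hypothetical, and the distance metric and score normalization that Chroma actually applies depend on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=1):
    """Return the k (name, score) pairs with the highest similarity to the query."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "index": real embeddings have hundreds or thousands of dimensions.
toy_index = {
    "chunk_about_mlops": [0.9, 0.1, 0.0],
    "chunk_about_finetuning": [0.1, 0.9, 0.2],
    "chunk_about_security": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]

best = top_k(toy_query, toy_index, k=1)
print(best[0][0])  # chunk_about_mlops
```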

In [8]:
# Initialize an LLM instance using Cohere's "command-r" model
# The temperature parameter controls randomness in the generated responses; 0 makes the outputs as deterministic as possible
llm = ChatCohere(model="command-r", temperature=0)

# Define a prompt template for generating answers based on a given context and question
prompt_str = """Answer the question below using the context:

Context: {context}

Question: {question}

Answer: """

# Create a ChatPromptTemplate from the string template, enabling dynamic input for context and question
prompt = ChatPromptTemplate.from_template(prompt_str)

# Create a retrieval pipeline to fetch relevant context and pass through the user's question
retrieval = RunnableParallel(
    {
        # Use the vector database as a retriever to fetch relevant context for the question
        "context": vectordb.as_retriever(),

        # Pass through the user's input question without modification
        "question": RunnablePassthrough()
    }
)

# Define an output parser to format the generated response into a string
output_parser = StrOutputParser()

# Create a processing chain that retrieves context, formats the prompt, generates an LLM response, and parses the output
chain = retrieval | prompt | llm | output_parser
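In the chain above, the retriever returns a list of `Document` objects, and that list is placed into `{context}` as-is, metadata included. A common refinement is to join only the chunk text before filling the template. The sketch below shows the idea in plain Python; `ToyDocument`, `format_docs`, and the sample sentences are hypothetical stand-ins, not LangChain classes or text from the PDFs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a retrieved chunk (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Join the text of each retrieved chunk, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


PROMPT_TEMPLATE = """Answer the question below using the context:

Context: {context}

Question: {question}

Answer: """

# Made-up chunks standing in for what a retriever might return.
retrieved = [
    ToyDocument("MLOps applies DevOps practices to ML systems.", {"source": "a.pdf"}),
    ToyDocument("Gen AI systems add prompt and chain management.", {"source": "a.pdf"}),
]

filled_prompt = PROMPT_TEMPLATE.format(
    context=format_docs(retrieved),
    question="What is MLOps?",
)
print(filled_prompt)
```

In the real chain, a helper like this could probably be composed with the retriever, for example `"context": vectordb.as_retriever() | format_docs`, since LangChain coerces plain functions into runnables when they are combined with `|`. Treat that as a suggestion to try rather than part of the original notebook.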

In [9]:
# Invoke the chain of components (retrieval, prompt generation, LLM processing, and output parsing)
# The question "What is LLMs?" is passed through the chain to generate the response
response = chain.invoke("What is LLMs?")

# Print the response generated by the chain
print(response)

LLMs are large language models which have become powerful tools to tackle complex challenges in different domains. While the early focus was on general-purpose tasks, LLMs have evolved to address specific problems in specialized fields through fine-tuning. They process vast amounts of data across multiple domains, enhancing existing workflows and unlocking new possibilities.


Other ways to invoke a chain:

.invoke(): Passes in a single input and returns the complete output, nothing more.

.batch(): Takes a list of inputs and returns a list of outputs. It is typically faster than calling .invoke() once per input because it handles the parallelization for you.

.stream(): Yields the response in chunks, so you can start printing before the entire response is complete.
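The difference between the three calling styles can be sketched without any LLM. `ToyChain` below is a hypothetical stand-in that only mimics the shape of the interface: one input to one output, a list of inputs processed on a thread pool, and an output yielded piece by piece. It makes no claim about how LangChain implements these methods internally.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyChain:
    """Hypothetical stand-in that mimics the three calling styles of a chain."""

    def invoke(self, question):
        # One input in, one complete output back.
        return f"Answer to: {question}"

    def batch(self, questions, max_workers=4):
        # Run invoke() for each input on a thread pool; results keep the input order.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(self.invoke, questions))

    def stream(self, question):
        # Yield the answer piece by piece instead of returning it all at once.
        for word in self.invoke(question).split(" "):
            yield word + " "


toy_chain = ToyChain()
single = toy_chain.invoke("What are LLMs?")
many = toy_chain.batch(["Q1", "Q2", "Q3"])
streamed = "".join(toy_chain.stream("What are LLMs?"))
print(single, many, streamed, sep="\n")
```

Because an LLM call spends most of its time waiting on the network, running several calls concurrently is where the speed-up of a batch-style call usually comes from.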

In [10]:
response_with_batch = chain.batch(["What is Transformers", "How is Transformer different than LLMs?"])

for response in response_with_batch:
  print(response)
  print("\n")

I found several references to the term 'Transformer' in the documents you provided. 

The Vision Transformer (ViT) is a type of documentation focused on transformers in the field of AI. Mingxing Tan and Quoc V. Le also refer to 'EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks', which employs transformers. Hugging Face also has a documentation site for transformers called 'Transformers Documentation'. 

However, I found no detailed explanation of what transformers are or their purpose in the provided texts. I can infer that they are a technology used in machine learning models, but I cannot provide a full and precise definition in this instance.

The term 'transform' also appears in the document, where it discusses the transformation of real-world data patterns over time and the need for models to adapt and learn from these changes.


The term "Transformer" refers to a type of neural network architecture, while LLMs (Large Language Models) are a category of mode

In [None]:
for chunk in chain.stream("What are the 3 vectors in Transformers architecture?"):
  print(chunk, flush=True, end="")

I found no mention of three vectors in the Transformer architecture. However, a key component in the Transformer architecture is the use of three distinct sub-layers in both the encoder and decoder stacks. 

The first sub-layer employs a multi-head self-attention mechanism, which presumably involves multiple attention heads that operate in parallel and attend to different portions of the input sequence. This allows the model to capture different aspects of input dependencies.

The second sub-layer is a simple, position-wise fully connected feed-forward network, which applies the same transformation to all positions in the sequence. 

The third element, not a vector but rather a fundamental aspect of the architecture, is the residual connection applied around each sub-layer. This is followed by layer normalization to stabilize and streamline the information flow. 

These three architectural elements, in combination, form the key building blocks of the Transformer model, enabling it to c