<a href="https://colab.research.google.com/github/anshupandey/Generative-AI-for-Professionals/blob/main/RAG_implementation_with_PineCone_Cohere_and_LangChain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RAG_implementation_with_PineCone_Cohere_and_LangChain

Retrieval-Augmented Generation (RAG) combines large language models (LLMs) with external knowledge bases to enhance the model's responses.

In this implementation, the Retrieval-Augmented Generation (RAG) system uses Pinecone as the vector database, Cohere for the language model and the embeddings, and LangChain as the orchestration framework.


- **Vector Database**: Pinecone
- **Framework**: LangChain
- **Large Language Model**: Cohere (model="command")
- **Embedding Model**: Cohere (model="embed-english-light-v3.0")

**Pinecone** serves as the vector database, which stores and retrieves the vector embeddings that represent the text chunks. These embeddings are generated with Cohere's "**embed-english-light-v3.0**" model, a lightweight English embedding model that produces 384-dimensional vectors. Given a query embedding, Pinecone returns the stored chunks whose embeddings are closest to it, which makes retrieval by semantic similarity fast.

**Cohere**'s language model, specifically the command model, generates the responses. Once Pinecone returns the most relevant chunks, their text is passed to the model as context, and the model uses it to write an answer to the query.

**LangChain** is the framework that ties Pinecone and Cohere together. It coordinates the data flow between the vector database and the language model: loading and splitting documents, embedding them, retrieving chunks for a query, and passing them to the Cohere model in a prompt.

Together, these components form a RAG pipeline: Pinecone finds the relevant chunks, Cohere embeds the text and generates the answer, and LangChain wires the steps into one chain.
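
The retrieve-then-generate flow described above can be sketched in plain Python before any services are involved. This is only a toy illustration under stated assumptions: the word-overlap `retrieve` function stands in for vector search, `fake_llm` stands in for the Cohere model, and the three-sentence corpus is made up for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, corpus, k=2):
    """Rank passages by shared words with the query (a stand-in for vector search)."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(p)), p) for p in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep corpus order
    return [p for score, p in scored[:k] if score > 0]

def build_prompt(query, passages):
    """Place the retrieved passages in the prompt as context for the question."""
    context = "\n".join(passages)
    return (
        "Answer the question based only on the following context:\n"
        f"{context}\n\nQuestion: {query}\n"
    )

def fake_llm(prompt):
    """Placeholder for the language model call; it only reports the prompt size."""
    return f"[model would answer here using a prompt of {len(prompt)} characters]"

# Made-up corpus for illustration only.
corpus = [
    "A futures contract can be sold before its maturity date.",
    "Dividends are payments made by a company to its shareholders.",
    "An index future is a bet on the level of an index.",
]
question = "What is a futures contract?"
passages = retrieve(question, corpus)
prompt = build_prompt(question, passages)
print(fake_llm(prompt))
```

In the rest of the notebook, each of these three steps is replaced by a real component: Pinecone for retrieval, a LangChain prompt template for prompt building, and Cohere for generation.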



## Environment Setup

In [1]:
!pip install -q langchain langchain_community cohere pinecone-client langchain_pinecone pypdf langchain-cohere


In [2]:
pip install cohere --quiet

In [3]:
import os
OPENAI_API_KEY = "sk-******************"
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
CO_API_KEY = "CI***************"
os.environ["CO_API_KEY"] = CO_API_KEY
os.environ["COHERE_API_KEY"] = CO_API_KEY
os.environ["PINECONE_API_KEY"] = "**********************"

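The RecursiveCharacterTextSplitter is used in Retrieval-Augmented Generation (RAG) pipelines to break long documents into smaller chunks before they are embedded and indexed.

Here are the main reasons for using it:

- Length Management: each chunk stays within a size that the embedding model and the LLM prompt can handle.
- Context Preservation: splitting prefers paragraph, line, and sentence boundaries, and overlapping chunks carry context across each cut.
- Efficient Processing: smaller chunks make retrieval more targeted and keep the prompt short.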
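
To make the roles of chunk size and overlap concrete, here is a simplified sketch of fixed-size chunking with overlap. It is not the algorithm RecursiveCharacterTextSplitter uses (that one splits recursively on a hierarchy of separators); the function name `split_with_overlap` is hypothetical and the sample text is synthetic.

```python
def split_with_overlap(text, chunk_size=500, chunk_overlap=100):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

# Synthetic 1200-character sample text.
sample = "".join(str(i % 10) for i in range(1200))
chunks = split_with_overlap(sample)
print([len(c) for c in chunks])           # [500, 500, 400]
print(chunks[0][-100:] == chunks[1][:100])  # True: consecutive chunks share 100 characters
```

The overlap means that a sentence cut at a chunk boundary still appears in full, or nearly so, in one of the two neighbouring chunks.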

## Data Preparation for RAG

In [4]:
# Import the RecursiveCharacterTextSplitter class.
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Initialize a RecursiveCharacterTextSplitter with specified separators and configurations.
# The text will be split into chunks of at most about 500 characters, with up to a 100-character overlap between chunks.

character_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", ". ", " ", ""],  # The hierarchy of separators to use for splitting.
    chunk_size=500,  # Target size of each chunk in characters.
    chunk_overlap=100,  # Number of characters shared between consecutive chunks.
)




Here we use four PDFs on stock market investing and strategy. These are the external data sources on which we will implement the RAG solution.

- PyPDFLoader: this class from the langchain_community.document_loaders module loads and processes the PDF documents.
- Loop Through PDF Files: the code iterates over the list of PDF filenames using a for-loop.
- Loading PDF Content: for each PDF file in the list, an instance of PyPDFLoader is created with the filename as an argument.
- Extract and Split Text: the load_and_split method of the PyPDFLoader instance loads the text content from the PDF file and splits it into manageable segments. It uses the RecursiveCharacterTextSplitter defined in the cell above as its text_splitter.

In [5]:
!wget -q https://anshupandey.blob.core.windows.net/generativeaidocs/INVESTMENT_STRATEGIES_1.pdf
!wget -q https://anshupandey.blob.core.windows.net/generativeaidocs/NSEHandbook.pdf
!wget -q https://anshupandey.blob.core.windows.net/generativeaidocs/The%20Basics%20for%20Investing%20in%20Stocks.pdf
!wget -q https://anshupandey.blob.core.windows.net/generativeaidocs/Howstockmarketworks.pdf

In [6]:
from langchain_community.document_loaders import PyPDFLoader

pdfs=["NSEHandbook.pdf","INVESTMENT_STRATEGIES_1.pdf","The Basics for Investing in Stocks.pdf","Howstockmarketworks.pdf"]
docs=[]
for i in pdfs:
    loader = PyPDFLoader(i)
    docs.extend(loader.load_and_split(text_splitter=character_splitter))

In [7]:
len(docs)

1429

This function prints the documents retrieved from the vector store in a readable format.

In [8]:
def pretty_print_docs(docs):
    print(
        f"\n{'-' * 100}\n".join(
            [f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]
        )
    )

We use Cohere to generate vector embeddings of our external data sources.

## Embedding Generation

In [9]:
import cohere
# Read the key from the COHERE_API_KEY variable set in the setup cell.
co = cohere.Client(os.getenv("COHERE_API_KEY"))

In [10]:
# Import from langchain_cohere (installed above); the langchain_community class is deprecated.
from langchain_cohere import CohereEmbeddings

In [11]:
embeddings = CohereEmbeddings(model="embed-english-light-v3.0", cohere_api_key=os.getenv('COHERE_API_KEY'))


## Vector DB Setup and Retrieval

We use the Pinecone vector database to store the vector embeddings.

In [12]:
from pinecone import Pinecone, ServerlessSpec
import time

# Configure the Pinecone client (a serverless index is created below).
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))



# check for and delete index if already exists
index_name = 'langchain-rag'
if index_name in pc.list_indexes().names():
    pc.delete_index(index_name)

# create a new index
pc.create_index(
    name=index_name,
    dimension=384,  # Must match the vector size produced by embed-english-light-v3.0.
    metric="cosine",
    spec=ServerlessSpec(
        cloud='aws',
        region='us-east-1'
    )
)

# wait for index to be initialized
while not pc.describe_index(index_name).status['ready']:
    time.sleep(1)
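
The wait loop above polls until the index reports ready, and it would spin indefinitely if the index never became ready. A possible variation is to poll with a timeout. The helper name `wait_until` is hypothetical, and the readiness check in the demonstration is a fake stand-in for `pc.describe_index(index_name).status['ready']`.

```python
import time

def wait_until(is_ready, timeout=60.0, interval=1.0):
    """Poll is_ready() until it returns True, or raise TimeoutError after timeout seconds."""
    deadline = time.monotonic() + timeout
    while not is_ready():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)

# Hypothetical usage with the index created above:
# wait_until(lambda: pc.describe_index(index_name).status['ready'])

# Self-contained demonstration with a fake readiness check that succeeds on the third call.
calls = {"count": 0}

def fake_ready():
    calls["count"] += 1
    return calls["count"] >= 3

wait_until(fake_ready, timeout=5.0, interval=0.01)
print(calls["count"])  # 3
```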

In [13]:
from langchain_pinecone import PineconeVectorStore

index_name = "langchain-rag"

docsearch = PineconeVectorStore.from_documents(docs, embeddings, index_name=index_name)

We use Cohere's chat model as the large language model (LLM).

In [14]:
from langchain_cohere.chat_models import ChatCohere
from langchain_core.messages import HumanMessage

In [15]:
model = ChatCohere(model="command", max_tokens=256, temperature=0.75,cohere_api_key=os.getenv("COHERE_API_KEY"))

This function retrieves the top k results from the vector database that are most semantically similar to the query.

In [16]:
def get_similiar_docs(query, k=6, score=False):
    """Return the k most similar chunks, optionally paired with their similarity scores."""
    if score:
        return docsearch.similarity_search_with_score(query, k=k)
    return docsearch.similarity_search(query, k=k)

query = input("What is your query? ")
pretty_print_docs(get_similiar_docs(query))
# what is difference between intraday and future options?

What is your query? what is difference between intraday and future options?
Document 1:

Futures contracts can be sold before the maturity date and the
price will depend on the price of the underlying security. If youfail to act in time and sell a contract, there could be a pile of porkbellies, manganese or whatever delivered to the front garden.
There is also an ‘index future’, which is an outright bet similar
----------------------------------------------------------------------------------------------------
Document 2:

There is also an ‘index future’, which is an outright bet similar
to backing a horse, with the money being won or lost dependingon the level of the index at the time the bet matures. A FTSE100Index future values a one-point difference between the bet andthe Index at £25.
An extension of that is ‘spread trading’, which is just out and
----------------------------------------------------------------------------------------------------
Document 3:

The device is convenie
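
The similarity search above is delegated to Pinecone, whose index was created with the cosine metric. As a rough illustration of what "semantically similar" means in that setting, the sketch below ranks a few toy vectors by cosine similarity to a query vector. The function names and the 3-dimensional vectors are made up for the example; real embeddings from the model used here have 384 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, vectors, k=6):
    """Return (score, index) pairs for the k vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]

# Toy 3-dimensional "embeddings".
vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [1.0, 0.0, 0.0]
for score, idx in top_k(query_vec, vectors, k=2):
    print(idx, round(score, 3))
```

A vector database performs the same kind of ranking, but over an index built for fast approximate search rather than a linear scan.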

## Implementing RAG Chain

Next, we build a chain: a sequence of steps that retrieves context for the query, fills in the prompt, calls the model, and parses the output into a string.

In [17]:
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

In [18]:
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
llm = model
chain = (
    {"context":docsearch.as_retriever(), "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

chain.invoke(input("query: "))
# what is difference between intraday and future options?

query: what is difference between intraday and future options?


"I'm sorry, I did not find any information clearly delineating the difference between intraday and future options. Can I provide some information on future options? \nFuture options, also called derivatives, are a type of financial contract that gives the holder the right to buy or sell a specific asset at a later date, subject to its agreed-upon price and terms. These assets can be a variety of mediums, such as commodities, bonds, stocks, etc. For futures, individuals or institutions have access to the asset for a specified amount of time and can then sell it in the future at a specified price according to the futures contract, thus reducing the risk of the selling price deviating from the current market assessment of the future value of the asset. \n\nFuture contracts would be a different comparison point to intraday, as futures contain an agreed-upon price and specified date for a variety of assets."

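In the chain above, the retriever's output (a list of document objects) is placed into the prompt directly. A common refinement is to join only the text of the retrieved chunks before it reaches the prompt. The sketch below shows such a helper; `FakeDocument` is a hypothetical stand-in for a LangChain document so that the example runs on its own, and the commented wiring is an untested suggestion rather than the notebook's method.

```python
from dataclasses import dataclass

@dataclass
class FakeDocument:
    """Minimal stand-in for a LangChain Document; only page_content is modelled."""
    page_content: str

def format_docs(docs):
    """Join the text of the retrieved chunks, separated by blank lines."""
    return "\n\n".join(d.page_content for d in docs)

retrieved = [
    FakeDocument("Futures contracts can be sold before the maturity date."),
    FakeDocument("There is also an index future."),
]
print(format_docs(retrieved))

# Possible wiring into the chain above (untested sketch):
# chain = (
#     {"context": docsearch.as_retriever() | format_docs, "question": RunnablePassthrough()}
#     | prompt
#     | llm
#     | StrOutputParser()
# )
```

With this change, the prompt would contain only the chunk text, not the metadata and object wrappers around it.
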
# Thank You