<a href="https://colab.research.google.com/github/Engr-Nazim/RAG_Langchain/blob/main/Final_RAG_LangChain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**NOTE:** Before run this project need to setup just two things


1.   GOOGLE_API_KEY
2.   PINECONE_API_KEY

In below example add my data from text files, i need to so you need to add that files please see below [file_code](https://colab.research.google.com/drive/1-y_g1qUj2JcxWD9A0xv1KJo6MpglAevM#scrollTo=_fzxtErPw1Hh&line=12&uniqifier=1)

Please download and upload on colab [example_files](https://github.com/KingAbdulRehman/LangChain/tree/main/Rag_LangChain)

# Install And Import Packages

In [47]:
!pip install langchain langchain-google-genai pinecone-client langchain-pinecone langchain-community



In [49]:
from langchain_google_genai import GoogleGenerativeAI, ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
import langchain
import os
from google.colab import userdata
from pinecone import ServerlessSpec, Pinecone
from langchain_pinecone import PineconeVectorStore
from langchain_community.document_loaders import TextLoader
from tqdm import tqdm
from uuid import uuid4
from langchain_core.documents import Document
from langchain.chains import RetrievalQA

# Setup Model and Key

In [50]:
# Setup Api key in Enviroment
if "GOOGLE_API_KEY" not in os.environ:
  os.environ["GOOGLE_API_KEY"] = userdata.get("GOOGLE_API_KEY")

if "PINECONE_API_KEY" not in os.environ:
  os.environ["PINECONE_API_KEY"] = userdata.get("PINECONE_API_KEY")

In [51]:
# Config Model Google Gemini Model with Langchain
llm = ChatGoogleGenerativeAI(
  model = "gemini-2.0-flash-exp",
  temperature=0.9,
  max_tokens=500,
  max_retries=5,
  timeout=None,
  top_k=10,
  top_p=0.1,
  )

In [54]:
# test model setup successfully
print(llm.invoke("hello, short reply").content)

Okay.


# Setup Pinecone and Create New Index

In [55]:
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

index_name = "try-rag-test"
if index_name not in [index["name"] for index in pc.list_indexes()]:
  pc.create_index(
    name=index_name,
    dimension=768,
    metric='cosine',
    spec=ServerlessSpec(cloud='aws', region='us-east-1')
  )

index = pc.Index(index_name)

# Setup Embedding and Vector

In [56]:
embedding_model = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")

In [59]:
# test embedding model
vector = embedding_model.embed_query("test")
print(vector[:3])

[0.026467779651284218, 0.019067756831645966, -0.053323060274124146]


In [60]:
# setup vector store
vector_store = PineconeVectorStore(
    index=index,
    embedding=embedding_model
)

In [61]:
import os
from langchain_core.documents import Document
from langchain_community.document_loaders import TextLoader
from uuid import uuid4

In [63]:
# load documents
documents = []
for i in range(3):
  file_path = f"/content/example.txt{ ('' if i == 0 else str(i)) }.txt"
  # Check if the file exists before loading
  if os.path.exists(file_path):
    loader = TextLoader(file_path)
    doc = loader.load()
    document = Document(
        page_content=doc[0].page_content,
        metadata=doc[0].metadata
    )
    documents.append(document)
  else:
    print(f"Warning: File not found: {file_path}")

uuids = [str(uuid4()) for _ in range(len(documents))]

vector_store.add_documents(
    documents=documents,
    ids=uuids
)



[]

In [64]:
# for just delete case
# vector_store.delete(ids=['8f334f3b-5420-4770-8a46-35b505c3dc73'])

# Retrieval

In [70]:
results = vector_store.similarity_search(
  "in 2000",
  k=3
)

for res in results:
  print(res.page_content)

My Name is Abdul Rehman, and i born in karachi pakistan, in 1900


In [66]:
retriever = vector_store.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"k": 2, "score_threshold": 0.5},
)
retriever.invoke("How is Nazim uddin")

[Document(id='7b9008ea-9a52-4267-8d42-423fe0a44c36', metadata={'source': '/content/example.txt'}, page_content='My Name is Abdul Rehman, and i born in karachi pakistan, in 1900')]

# Gen AI with Retriever

In [68]:
llm_with_rag = RetrievalQA.from_chain_type(
  llm=llm,
  chain_type="stuff",
  retriever=retriever
)

ai_result = llm_with_rag.invoke("How is Nazim uddin")

In [69]:
print(ai_result['result'])

I don't have access to real-time information, including the well-being of individuals. Therefore, I cannot tell you how Nazim uddin is doing.
