# Retrieval-Augmented Generation (RAG) with LangChain and Pinecone

This notebook demonstrates a **basic RAG pipeline** using:
- LangChain
- Pinecone (vector database)
- Hugging Face embeddings (`sentence-transformers/all-MiniLM-L6-v2`) and a local LLM (`gpt2`)
- A simple text document

## Workflow
1. Load and split documents
2. Create embeddings
3. Store vectors in Pinecone
4. Retrieve relevant chunks
5. Generate an answer using context
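
Step 1 is not shown in the cells that follow, which connect to an index that already holds data. As a minimal sketch of what splitting could look like, the hypothetical helper `split_text` below cuts text into fixed-size character windows that overlap. The function name and the default sizes are assumptions for illustration, not the splitter that was used to fill the index.

```python
def split_text(text: str, chunk_size: int = 200, chunk_overlap: int = 20) -> list[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Tiny values so the overlap is easy to see.
print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

The overlap repeats the tail of one chunk at the head of the next, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.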

In [1]:
! pip install langchain langchain-pinecone langchain-huggingface pinecone-client sentence-transformers python-dotenv



In [2]:
import os
from dotenv import load_dotenv

load_dotenv()

True
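
`load_dotenv()` returning `True` only says that a `.env` file was found, not that it holds the keys the later cells need. The sketch below checks for them up front. `PINECONE_INDEX_NAME` is read with `os.getenv` later in the notebook, and `langchain-pinecone` falls back to `PINECONE_API_KEY` when no key is passed explicitly; the helper name `missing_env_vars` is hypothetical.

```python
import os

# Variables this notebook is assumed to rely on.
REQUIRED_VARS = ("PINECONE_API_KEY", "PINECONE_INDEX_NAME")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```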

In [3]:
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

Loading weights: 100%|██████████| 103/103 [00:00<00:00, 175.95it/s, Materializing param=pooler.dense.weight]
BertModel LOAD REPORT from: sentence-transformers/all-MiniLM-L6-v2
Key                     | Status     |  | 
------------------------+------------+--+-
embeddings.position_ids | UNEXPECTED |  | 

Notes:
- UNEXPECTED: can be ignored when loading from different task/architecture; not ok if you expect identical arch.


In [4]:
from langchain_pinecone import PineconeVectorStore

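# Connect to an existing Pinecone index. The API key is read from the
# PINECONE_API_KEY environment variable (for example, loaded from .env above).
vectorstore = PineconeVectorStore(
    index_name=os.getenv("PINECONE_INDEX_NAME"),
    embedding=embeddings,
)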

retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
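
To make the `k` setting concrete, here is a toy sketch of top-k retrieval over a handful of in-memory vectors. It assumes cosine similarity as the metric; the actual metric depends on how the Pinecone index was created. The two-dimensional vectors, the chunk labels, and the helper names are all made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=3):
    """Return the texts of the k stored vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Made-up 2-D "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions.
stored = [
    ("about langchain", [1.0, 0.0]),
    ("about rag", [0.0, 1.0]),
    ("about both", [1.0, 1.0]),
    ("unrelated", [-1.0, 0.0]),
]
print(top_k([1.0, 0.2], stored, k=3))
# ['about langchain', 'about both', 'about rag']
```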

In [5]:
from langchain_huggingface import HuggingFacePipeline

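# gpt2 is a small base model with no instruction tuning, so answers
# tend to be weak and repetitive.
llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    pipeline_kwargs={
        "max_new_tokens": 100,  # upper bound on generated tokens
        "do_sample": False,  # greedy decoding, so output is deterministic
    },
)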

Loading weights: 100%|██████████| 148/148 [00:01<00:00, 141.09it/s, Materializing param=transformer.wte.weight]             
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Passing `generation_config` together with generation-related arguments=({'max_new_tokens', 'do_sample'}) is deprecated and will be removed in future versions. Please pass either a `generation_config` object OR all generation parameters explicitly, but not both.


In [6]:
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

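# {context} is filled with the retrieved chunks and {input} with the question.
# With a plain text-generation model, the chat template is rendered as a
# single human message, so the final prompt text starts with "Human: ".
prompt = ChatPromptTemplate.from_template("""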
Context:
{context}

Question:
{input}

Answer:
""")

doc_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, doc_chain)
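
Conceptually, the "stuff" strategy puts every retrieved chunk into the `{context}` slot of a single prompt. The plain-Python sketch below reproduces that idea with the same template text; `build_prompt` is a hypothetical helper, and the blank-line separator is an assumption that matches the output shown further down.

```python
# Same template text as the prompt above, kept as a plain string.
TEMPLATE = """
Context:
{context}

Question:
{input}

Answer:
"""


def build_prompt(chunks, question, separator="\n\n"):
    """Join the retrieved chunks and substitute them into the template."""
    context = separator.join(chunks)
    return TEMPLATE.format(context=context, input=question)


print(build_prompt(["First chunk.", "Second chunk."], "What is this document about?"))
```

Because every chunk goes into a single prompt, the prompt length grows with `k` and with the chunk size, which matters for a model with a short context window such as `gpt2`.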

In [7]:
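# The chain returns a dict with "input", "context" (the retrieved documents),
# and "answer" (the generated text).
response = rag_chain.invoke({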
    "input": "What is this document about?"
})

response["answer"]

Both `max_new_tokens` (=100) and `max_length`(=50) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


'Human: \nContext:\nLangChain is a framework designed to simplify the development of applications\npowered by large language models.\n\nRetrieval-Augmented Generation (RAG) combines information retrieval\nwith text generation to improve accuracy and reduce hallucinations.\n\nLangChain is a framework designed to simplify the development of applications\npowered by large language models.\n\nRetrieval-Augmented Generation (RAG) combines information retrieval\nwith text generation to improve accuracy and reduce hallucinations.\n\nLangChain is a framework designed to simplify the development of applications\npowered by large language models.\n\nRetrieval-Augmented Generation (RAG) combines information retrieval\nwith text generation to improve accuracy and reduce hallucinations.\n\nQuestion:\nWhat is this document about?\n\nAnswer:\n\nThis document is aimed at explaining the basic concepts of the language\n\nlanguages.\n\nLangChain is a framework designed to simplify the development of appl
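
Two things stand out in the output above: the answer string echoes the whole prompt before the generated text, and the same retrieved text appears three times, which would be consistent with the same document having been upserted into the index more than once. The sketch below shows one way to post-process both in plain Python. The helper names are hypothetical, and the sample strings are made up rather than taken from a real run.

```python
def dedupe_chunks(chunks):
    """Drop chunks whose text is identical after whitespace normalisation."""
    seen = set()
    unique = []
    for chunk in chunks:
        key = " ".join(chunk.split())  # collapse runs of whitespace
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique


def extract_answer(generated, question, marker="Answer:"):
    """Return the text after the first marker that follows the question."""
    question_pos = generated.find(question)
    search_from = 0 if question_pos == -1 else question_pos + len(question)
    marker_pos = generated.find(marker, search_from)
    if marker_pos == -1:
        return generated.strip()  # no marker found, return the text unchanged
    return generated[marker_pos + len(marker):].strip()


# Made-up strings in the same shape as the output above.
chunks = ["RAG combines retrieval\nwith generation.", "RAG combines retrieval with generation."]
raw = "Human: \nContext:\nsome chunk\n\nQuestion:\nWhat is this?\n\nAnswer:\n\nA toy answer."

print(dedupe_chunks(chunks))
print(extract_answer(raw, "What is this?"))
```

With the chain above, the chunk texts would come from `doc.page_content` for each document in `response["context"]`. Deduplicating would have to happen before the prompt is built in order to shorten the context the model sees.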