# Retrieval-Augmented Generation (RAG) with LangChain, Pinecone & Hugging Face Embeddings

This notebook demonstrates a **basic RAG pipeline** using:
- LangChain
- Pinecone (vector database)
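- Hugging Face embeddings (`sentence-transformers/all-MiniLM-L6-v2`) and a local GPT-2 model as the LLM
- A simple text document (`data/sample.txt`)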

## Workflow
1. Load and split documents
2. Create embeddings
3. Store vectors in Pinecone
4. Retrieve relevant chunks
5. Generate an answer using context



## Environment Setup
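Make sure you have a `.env` file with your Pinecone API key and index name. The index is expected to exist already, with a dimension matching the embedding model (384 for `all-MiniLM-L6-v2`):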

```env
PINECONE_API_KEY=your_key
PINECONE_INDEX_NAME=your_index
```


In [1]:

# Install dependencies (run once)
! pip install langchain langchain-community langchain-pinecone langchain-huggingface pinecone-client python-dotenv sentence-transformers




In [2]:

import os
from dotenv import load_dotenv

load_dotenv()


True
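The `True` above indicates that python-dotenv loaded at least one variable, not that both Pinecone settings are present. A small check along the following lines can surface a missing key before the ingestion step fails. The helper name `missing_env_vars` is hypothetical and not part of any library.

```python
import os

# The two variables this notebook expects in .env
REQUIRED_VARS = ("PINECONE_API_KEY", "PINECONE_INDEX_NAME")


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the names in `required` that are unset or blank in `env`.

    `env` defaults to the process environment; passing a plain dict
    makes the function easy to exercise without touching os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running this right after `load_dotenv()` keeps configuration problems separate from Pinecone or model errors later in the notebook.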


## Load and Split Documents


In [3]:

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = TextLoader("data/sample.txt")
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50
)

docs = text_splitter.split_documents(documents)
len(docs)


1
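A result of `1` is what you would expect when `data/sample.txt` is shorter than `chunk_size=500` characters: the whole file fits in one chunk. To illustrate how `chunk_size` and `chunk_overlap` interact on longer text, here is a simplified fixed-window sketch. It is not the algorithm used by `RecursiveCharacterTextSplitter`, which prefers to split on separators such as paragraph and line breaks, so real chunk boundaries will differ. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Split `text` into fixed-size windows that overlap by `chunk_overlap` characters.

    Simplified illustration only: windows are cut at fixed offsets,
    ignoring paragraph, sentence, and word boundaries.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


# A short text yields a single chunk, as in the cell above.
print(len(chunk_text("A short sample document.")))

# A 1200-character text yields three windows starting at 0, 450, and 900.
print([len(c) for c in chunk_text("x" * 1200)])
```

The overlap means the last 50 characters of one chunk are repeated at the start of the next, which helps when a relevant sentence straddles a boundary.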


## Create Embeddings and Store in Pinecone


In [4]:

from langchain_huggingface import HuggingFaceEmbeddings
from langchain_pinecone import PineconeVectorStore

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

vectorstore = PineconeVectorStore.from_documents(
    docs,
    embeddings,
    index_name=os.getenv("PINECONE_INDEX_NAME")
)

print("✅ Documents successfully ingested into Pinecone")



✅ Documents successfully ingested into Pinecone
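Re-running the ingestion cell sends the same chunks to Pinecone again. If each run assigns fresh IDs to the vectors, the index ends up holding several copies of the same text, and a `k=3` query can then return three identical chunks. One common remedy is to derive each ID from the chunk's content so that a repeated upsert overwrites the existing vector instead of adding a new one. The sketch below only shows the ID derivation; `chunk_id` is a hypothetical helper, and the sample strings are illustrative.

```python
import hashlib


def chunk_id(text, source=""):
    """Derive a stable ID from a chunk's source path and content.

    The same (source, text) pair always maps to the same ID, so
    ingesting it twice targets the same record.
    """
    digest = hashlib.sha256(f"{source}\n{text}".encode("utf-8")).hexdigest()
    return digest[:32]  # shortened for readability; still collision-resistant in practice


chunks = [
    "LangChain is a framework.",
    "RAG combines retrieval with generation.",
]
ids = [chunk_id(c, source="data/sample.txt") for c in chunks]
print(ids)
```

Many vector stores let you supply your own IDs at upsert time; check whether your installed `langchain-pinecone` version exposes such a parameter before relying on it.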



## Build the RAG Pipeline


In [5]:

from langchain_huggingface import HuggingFacePipeline
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    pipeline_kwargs={
        "max_new_tokens": 200,
        # Greedy decoding. `temperature` only takes effect when do_sample=True,
        # so it is omitted here.
        "do_sample": False
    }
)

prompt = ChatPromptTemplate.from_template("""
Context:
{context}

Question:
{input}

Answer:
""")

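The retriever configured with `search_kwargs={"k": 3}` asks Pinecone for the three stored vectors most similar to the embedded question. The search itself runs inside Pinecone; the brute-force sketch below only illustrates the idea, assuming the index uses the cosine metric (the metric is chosen when the index is created and could equally be dot product or Euclidean). The function names and the three-dimensional toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=3):
    """Return the texts of the `k` entries in `stored` most similar to `query_vec`.

    `stored` is a list of (text, vector) pairs.
    """
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


stored = [
    ("about langchain", [1.0, 0.0, 0.0]),
    ("about rag", [0.0, 1.0, 0.0]),
    ("about pinecone", [0.0, 0.0, 1.0]),
    ("langchain and rag", [0.7, 0.7, 0.0]),
]
print(top_k([1.0, 0.2, 0.0], stored, k=2))
# prints ['about langchain', 'langchain and rag']
```

Real embeddings from `all-MiniLM-L6-v2` have 384 dimensions, and a vector database uses an index structure rather than scanning every vector, but the ranking principle is the same.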



## Ask a Question


In [6]:

doc_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, doc_chain)

response = rag_chain.invoke({
    "input": "What is this document about?"
})

response["answer"]



'Human: \nContext:\nLangChain is a framework designed to simplify the development of applications\npowered by large language models.\n\nRetrieval-Augmented Generation (RAG) combines information retrieval\nwith text generation to improve accuracy and reduce hallucinations.\n\nLangChain is a framework designed to simplify the development of applications\npowered by large language models.\n\nRetrieval-Augmented Generation (RAG) combines information retrieval\nwith text generation to improve accuracy and reduce hallucinations.\n\nLangChain is a framework designed to simplify the development of applications\npowered by large language models.\n\nRetrieval-Augmented Generation (RAG) combines information retrieval\nwith text generation to improve accuracy and reduce hallucinations.\n\nQuestion:\nWhat is this document about?\n\nAnswer:\n\nThis document is about the development of a language model for the\n\nLangChain language model.\n\nLangChain is a framework designed to simplify the developme
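The output above shows the prompt the model received: the retrieved chunks joined by blank lines and placed in `{context}`, followed by the question. The same chunk appears three times, which adds length without adding information. The sketch below mimics that "stuffing" step in plain Python and optionally drops exact duplicates first. The `dedupe` option is an addition of this sketch and not something the chain above does, and `build_prompt` is a hypothetical name.

```python
# Same layout as the notebook's prompt template.
TEMPLATE = "Context:\n{context}\n\nQuestion:\n{input}\n\nAnswer:\n"


def build_prompt(chunks, question, dedupe=True):
    """Join retrieved chunks with blank lines and fill the template.

    With dedupe=True, exact duplicate chunks are removed while the
    order of first appearance is preserved.
    """
    if dedupe:
        chunks = list(dict.fromkeys(chunks))  # dicts keep insertion order
    context = "\n\n".join(chunks)
    return TEMPLATE.format(context=context, input=question)


chunk = (
    "LangChain is a framework designed to simplify the development of "
    "applications powered by large language models."
)
prompt_text = build_prompt([chunk, chunk, chunk], "What is this document about?")
print(prompt_text)
```

Deduplicating at prompt time treats the symptom only; if the index itself holds repeated copies of a chunk, cleaning the index is the more durable fix.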


## Conclusion
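This notebook demonstrates how **Retrieval-Augmented Generation** grounds LLM responses
in external knowledge stored in Pinecone: the document is split and embedded, the chunks most similar
to the question are retrieved, and they are passed to the model as context. GPT-2 is a small base model,
so the generated answer is rough; the retrieval and prompting steps are the point of the example.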
