## Implement an End-to-End Retrieval-Augmented Generation (RAG) Pipeline Using LangSmith

Include the following:

* Data ingestion and embedding creation
* Retrieval mechanism
* Response generation using an LLM
* LangSmith integration for evaluation and monitoring

Provide the code and briefly describe your approach.

In [8]:
import os
import langsmith  # LangSmith SDK; tracing itself is controlled through environment variables
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings

In [9]:
from dotenv import load_dotenv

In [10]:
# Load .env file
load_dotenv()

True

In [11]:
# Confirm the keys were loaded without printing the secrets themselves
print("GOOGLE_API_KEY set:", bool(os.getenv("GOOGLE_API_KEY")))
print("LANGCHAIN_API_KEY set:", bool(os.getenv("LANGCHAIN_API_KEY")))

In [12]:
# load_dotenv() has already placed both API keys in os.environ.
# Turn on LangSmith tracing so every chain run is logged to LangSmith for monitoring.
os.environ["LANGCHAIN_TRACING_V2"] = "true"

In [13]:
# Step 1: Load and preprocess documents
loader = TextLoader("sample_data.txt")  # Load a text file with relevant documents
documents = loader.load()

In [14]:
# Step 2: Split documents into chunks for better retrieval
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = text_splitter.split_documents(documents)
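To make `chunk_size=500` and `chunk_overlap=50` concrete, here is a simplified sliding-window chunker. It is not how `RecursiveCharacterTextSplitter` works internally: that splitter tries the separators `"\n\n"`, `"\n"`, `" "` and `""` in turn so chunk boundaries fall on paragraphs, lines, or words, while this sketch cuts at fixed character offsets. The function name `chunk_text` is hypothetical; only the meaning of the two parameters (characters per chunk, characters shared between neighbours) carries over.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the previous one,
    so neighbouring chunks share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


text = "x" * 1200
print([len(c) for c in chunk_text(text)])  # [500, 500, 300]
```

With a 1,200-character input the windows start at offsets 0, 450 and 900, so the last chunk holds only the remaining 300 characters. The overlap keeps a sentence that straddles a boundary retrievable from either neighbouring chunk.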

In [15]:
# Step 3: Embed each chunk with Google's embedding model and index the vectors in FAISS
embedding_model = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vector_db = FAISS.from_documents(docs, embedding_model)
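Conceptually, `FAISS.from_documents` embeds every chunk and stores the vectors in a flat index; a query is then embedded the same way and compared against all stored vectors (LangChain's FAISS wrapper defaults to Euclidean distance). The sketch below reproduces that idea in plain Python under loud assumptions: `toy_embed` is a hashed bag-of-words stand-in for `GoogleGenerativeAIEmbeddings`, `FlatL2Index` is a hypothetical brute-force replacement for FAISS, and the three documents are made-up examples.

```python
import math
import zlib
from collections import Counter

DIM = 256  # toy vector size (an assumption; real embedding models use far more dimensions)


def toy_embed(text: str) -> list[float]:
    """Hashed bag-of-words vector, L2-normalised; a stand-in for a real embedding model."""
    vec = [0.0] * DIM
    for word, count in Counter(text.lower().split()).items():
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class FlatL2Index:
    """Brute-force index: stores every vector and compares the query against all of them."""

    def __init__(self, embed):
        self.embed = embed
        self.vectors: list[list[float]] = []
        self.texts: list[str] = []

    def add_texts(self, texts: list[str]) -> None:
        for text in texts:
            self.vectors.append(self.embed(text))
            self.texts.append(text)

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        """Return the k (squared L2 distance, text) pairs closest to the query."""
        q = self.embed(query)
        scored = [
            (sum((a - b) ** 2 for a, b in zip(q, vec)), text)
            for vec, text in zip(self.vectors, self.texts)
        ]
        scored.sort(key=lambda pair: pair[0])
        return scored[:k]


docs = [
    "unsupervised learning finds patterns in unlabeled data",
    "reinforcement learning maximises rewards through interaction",
    "neural networks are inspired by the brain",
]
index = FlatL2Index(toy_embed)
index.add_texts(docs)
for distance, text in index.search("reinforcement learning rewards", k=2):
    print(f"{distance:.3f}  {text}")
```

Because the vectors are normalised, a smaller squared L2 distance corresponds to a higher cosine similarity, so ranking by either gives the same order. The `k` argument plays the role of `search_kwargs={"k": 3}` in the retriever.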

In [16]:
# Step 4: Set up the retriever (top 3 chunks per query) and the LLM used for generation
retriever = vector_db.as_retriever(search_kwargs={"k": 3})
llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", temperature=0.3)  # any LangChain chat model can be swapped in


In [17]:
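# Step 5: Query the pipeline; the "stuff" chain inserts all retrieved chunks into a single prompt
query = "What are the key concepts from the document?"
rag_chain = RetrievalQA.from_chain_type(llm, retriever=retriever, chain_type="stuff")
response = rag_chain.invoke({"query": query})["result"]  # .run() is deprecated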
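The `chain_type="stuff"` option means every retrieved chunk is concatenated into one prompt and the LLM is called once. The sketch below shows that assembly step. The instruction wording and the `Context:`/`Question:`/`Answer:` layout are illustrative assumptions, not LangChain's exact internal prompt, and `stuff_prompt` is a hypothetical helper name.

```python
def stuff_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Concatenate ("stuff") every retrieved chunk into one prompt for a single LLM call."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say that you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


chunks = ["Reinforcement learning maximises rewards.", "Clustering groups unlabeled data."]
print(stuff_prompt("What are the key concepts from the document?", chunks))
```

With `k=3` and 500-character chunks the context stays around 1,500 characters, which fits comfortably in the model's window. For larger `k` or longer chunks, the `"map_reduce"` or `"refine"` chain types avoid one oversized prompt at the cost of extra LLM calls.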


In [18]:
print("Generated Response:\n", response)

Generated Response:
 Key concepts covered include:

* **Unsupervised Learning:**  Deals with unlabeled data, used for finding patterns and structures.  Clustering and dimensionality reduction are key techniques.
* **Reinforcement Learning:** Agent-based learning through interaction with an environment to maximize rewards.  Used in robotics and game playing.
* **Natural Language Processing (NLP):**  Used in applications like chatbots, virtual assistants, text summarization, and information retrieval.
* **Neural Networks and Deep Learning:** Computational models inspired by the brain. Deep learning uses multi-layered networks (CNNs, RNNs, Transformers) to learn complex representations.  Requires large datasets and substantial computing power.
* **Applications of AI in Healthcare:**  While mentioned as a title, the provided text doesn't offer details on this topic.
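The pipeline above produces an answer but never scores it. LangSmith evaluation runs the chain over a dataset of example inputs and grades each output with evaluator functions. The following is a minimal custom evaluator under stated assumptions: the dataset fields `answer` and `concepts` are hypothetical, and the `(outputs, reference_outputs)` argument names follow the convention of newer LangSmith SDK versions, so check them against the installed version. The scoring rule itself (case-insensitive substring matching of expected concepts) is a deliberately simple choice.

```python
def concept_recall(outputs: dict, reference_outputs: dict) -> dict:
    """Fraction of expected concepts mentioned in the answer (case-insensitive substring match)."""
    answer = outputs["answer"].lower()          # hypothetical output field
    expected = reference_outputs["concepts"]    # hypothetical reference field
    if not expected:
        return {"key": "concept_recall", "score": 1.0}
    hits = sum(1 for concept in expected if concept.lower() in answer)
    return {"key": "concept_recall", "score": hits / len(expected)}


result = concept_recall(
    {"answer": "Covers Reinforcement Learning and NLP."},
    {"concepts": ["reinforcement learning", "nlp", "computer vision"]},
)
print(result)
```

In recent `langsmith` versions, a function like this can be passed in the `evaluators` list of `langsmith.evaluate(target, data=..., evaluators=[...])`, where `target` would wrap `rag_chain`; the resulting scores then appear next to the traced runs. Substring matching misses paraphrases, so an LLM-as-judge evaluator is the usual next step.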
