# 📘 Smart Research Assistant
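A hands-on LangChain project that walks through the main building blocks: document loaders, text splitting, embeddings with a vector store, retrieval, memory, summarization, agents with tools, output parsers, callbacks, fallbacks, streaming, and a Gradio UI.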

## 🔧 1. Install Dependencies

In [15]:
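# Install once (package list inferred from the imports in this notebook):
#   pip install langchain langchain-community langchain-openai chromadb python-dotenv beautifulsoup4 google-search-results gradio
import os
from dotenv import load_dotenv
load_dotenv()  # load API keys (e.g. OPENAI_API_KEY) from a local .env file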

True
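
`load_dotenv()` returning `True` only says that a `.env` file was found, not that it contains the keys the later cells need. A minimal sketch of a check follows; the list of key names is an assumption based on the libraries used below (OpenAI and SerpAPI), and `missing_keys` is a hypothetical helper, not part of any library.

```python
import os

# Assumed key names: OpenAI models read OPENAI_API_KEY, SerpAPI reads SERPAPI_API_KEY.
REQUIRED_KEYS = ["OPENAI_API_KEY", "SERPAPI_API_KEY"]

def missing_keys(required, env=os.environ):
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]

missing = missing_keys(REQUIRED_KEYS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```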

## 📥 2. Load Files & Web Content

In [2]:

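# Loaders now live in langchain_community; the langchain.document_loaders path is deprecated.
from langchain_community.document_loaders import TextLoader, WebBaseLoader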

txt_loader = TextLoader("data/cricket.txt", encoding="utf-8")
txt_docs = txt_loader.load()

web_loader = WebBaseLoader("https://en.wikipedia.org/wiki/Cricket")
web_docs = web_loader.load()

documents = txt_docs + web_docs


USER_AGENT environment variable not set, consider setting it to identify your requests.
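
The warning above is harmless, but it can be silenced by setting `USER_AGENT` before the web loader is imported, since the check appears to run at import time. A small sketch; the agent string is a made-up placeholder, so substitute something that identifies your own project.

```python
import os

# Run this before importing WebBaseLoader.
# "smart-research-assistant/0.1" is a hypothetical value, not a required format.
os.environ.setdefault("USER_AGENT", "smart-research-assistant/0.1")
```

The same variable can instead be added to the `.env` file loaded in step 1.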


## ✂️ 3. Split Text into Chunks

In [3]:

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = splitter.split_documents(documents)
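
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified fixed-window sketch in plain Python. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that one prefers to break on separators such as paragraphs and sentences); `sliding_chunks` is a hypothetical helper that only illustrates how consecutive chunks share text.

```python
def sliding_chunks(text, chunk_size=500, chunk_overlap=50):
    """Cut `text` into windows of `chunk_size` characters that overlap by `chunk_overlap`."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks

demo = sliding_chunks("abcdefghij", chunk_size=5, chunk_overlap=2)
print(demo)
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, which tends to help retrieval.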


## 🧠 4. Embed and Store in Vector DB

In [4]:
from langchain_community.vectorstores import Chroma
from langchain_openai.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
# With persist_directory set, Chroma 0.4+ writes to disk automatically,
# so the deprecated db.persist() call is no longer needed.
db = Chroma.from_documents(docs, embedding=embeddings, persist_directory="./project_vecdb")


## 🔍 5. Perform Retrieval

In [5]:

retriever = db.as_retriever(search_kwargs={"k": 3})
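
Conceptually, `k=3` asks the store for the three chunks whose embeddings are closest to the query embedding. The sketch below shows that idea with cosine similarity over toy two-dimensional vectors. The vectors and texts are invented for illustration, and the distance metric and index Chroma actually uses may differ.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, store, k=3):
    """Return the texts of the k entries most similar to query_vec."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy "vector store": (text, embedding) pairs with made-up embeddings.
store = [
    ("Test cricket", [1.0, 0.0]),
    ("T20", [0.9, 0.1]),
    ("Football", [0.0, 1.0]),
    ("ODI", [0.8, 0.3]),
]
print(top_k([1.0, 0.0], store, k=3))
```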


## 💬 6. Chat with LLM using Retrieval

In [6]:

from langchain_openai.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template(
    "Answer the question based on this context:\n\n{context}\n\nQuestion: {question}"
)

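from langchain_core.runnables import RunnablePassthrough

llm = ChatOpenAI()

def format_docs(docs):
    # Join the retrieved chunks into a single context string.
    return "\n\n".join(d.page_content for d in docs)

# Pass the caller's question through instead of hard-coding it,
# so the chain answers whatever string is given to invoke().
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
)

response = chain.invoke("What are the different cricket formats?")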
print(response.content)


The different cricket formats listed are 100-ball cricket, Backyard cricket, Bete-ombro, Blind cricket, Club cricket, Crocker, Deaf cricket, French cricket, Indoor cricket, UK variant, Kilikiti, Plaquita, Single wicket, Softball cricket, T10 cricket, Tape ball cricket, Tennis ball cricket, Vigoro, and Village cricket.


## 🧠 7. Add Memory

In [7]:

from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
conversation = ConversationChain(llm=llm, memory=memory)

print(conversation.predict(input="Who is the captain of India?"))



The current captain of the Indian cricket team is Virat Kohli. He is one of the most successful and popular cricketers in the world, known for his aggressive batting style and exceptional leadership skills. Would you like to know more about his career statistics or achievements as a captain?
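
What buffer memory adds is simple: every earlier turn is replayed in front of the new input, so the model sees the whole conversation. A plain-Python sketch of that idea follows; `BufferMemory` is a hypothetical stand-in, and the prompt layout is an assumption rather than the exact template LangChain uses.

```python
class BufferMemory:
    """Keeps every (human, ai) turn and replays them before the next input."""

    def __init__(self):
        self.turns = []

    def add(self, human, ai):
        self.turns.append((human, ai))

    def as_prompt(self, new_input):
        history = "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)
        # lstrip() removes the leading newline when there is no history yet.
        return f"{history}\nHuman: {new_input}\nAI:".lstrip()

mem = BufferMemory()
mem.add("Who is the captain of India?", "Here is what I know about the captain.")
print(mem.as_prompt("How many matches has he played?"))
```

Because the earlier turns are included, a follow-up such as "he" can be resolved by the model. The cost is that the prompt grows with every turn.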


## 📚 8. Summarize Documents

In [9]:

from langchain.chains.summarize import load_summarize_chain

summary_chain = load_summarize_chain(llm, chain_type="map_reduce")
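summary = summary_chain.invoke(docs)
# invoke() returns a dict with "input_documents" and "output_text".
# Printing it whole dumps every chunk; use summary["output_text"] for the summary alone.
print(summary)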


{'input_documents': [Document(metadata={'source': 'data/cricket.txt'}, page_content='**Cricket: A Gentlemen’s Game with a Global Following**'), Document(metadata={'source': 'data/cricket.txt'}, page_content='Cricket is a bat-and-ball sport that enjoys passionate followings in several parts of the world, especially in countries like India, England, Australia, South Africa, Pakistan, Sri Lanka, New Zealand, and the West Indies. Often referred to as a “gentlemen’s game,” cricket traces its origins back to 16th-century England. Over time, it has grown from a mere pastime to a professional sport that captivates millions with its unique blend of strategy, athleticism, and tradition.\n\n---'), Document(metadata={'source': 'data/cricket.txt'}, page_content='---\n\n## 1. Historical Foundations'), Document(metadata={'source': 'data/cricket.txt'}, page_content='Cricket’s earliest records suggest it began as a children’s game in southeast England, possibly as early as the late 1500s. By the 17th c
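
The `map_reduce` chain type summarizes each chunk on its own (map) and then combines the partial summaries into one (reduce). The sketch below shows only that control flow. In the real chain both steps are LLM calls; here `first_sentence` and `join_partials` are trivial stand-ins invented for illustration.

```python
def map_reduce_summarize(chunks, map_fn, reduce_fn):
    """Apply map_fn to each chunk, then combine the partial results with reduce_fn."""
    partials = [map_fn(chunk) for chunk in chunks]  # map step: one summary per chunk
    return reduce_fn(partials)                      # reduce step: merge the summaries

def first_sentence(text):
    # Stand-in for "ask the LLM to summarize this chunk".
    return text.split(".")[0].strip() + "."

def join_partials(partials):
    # Stand-in for "ask the LLM to merge these summaries".
    return " ".join(partials)

chunks = [
    "Cricket began in England. It later spread abroad.",
    "T20 is short. Fans like the pace.",
]
print(map_reduce_summarize(chunks, first_sentence, join_partials))
```

With hundreds of chunks the map step makes one model call per chunk, so summarizing a small slice of `docs` first is a cheap way to test the cell.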

## 🛠 9. Add Tools + Agent

In [10]:

from langchain.agents import initialize_agent, Tool
# SerpAPIWrapper lives in langchain_community.utilities; importing it from
# langchain.tools raises ImportError on recent LangChain versions.
from langchain_community.utilities import SerpAPIWrapper

search = SerpAPIWrapper()  # reads SERPAPI_API_KEY from the environment
tools = [Tool(name="search", func=search.run, description="Web search")]

agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)
print(agent.run("Who is the latest IPL winner?"))

## 🧱 10. Output Parser + Chain

In [17]:

# StrOutputParser is exported by langchain_core.output_parsers; importing it
# from langchain.output_parsers raises ImportError.
from langchain_core.output_parsers import StrOutputParser
from langchain.prompts import PromptTemplate

basic_prompt = PromptTemplate.from_template("List 3 cricket formats.")
pipeline = basic_prompt | llm | StrOutputParser()
print(pipeline.invoke({}))

## 🧪 11. Add Callbacks

In [None]:

from langchain.callbacks import StdOutCallbackHandler

llm_with_logs = ChatOpenAI(callbacks=[StdOutCallbackHandler()])
llm_with_logs.invoke("What is cricket?")


## 🧯 12. Add Fallbacks

In [None]:

# Every runnable exposes with_fallbacks(); RunnableWithFallbacks itself expects
# keyword arguments, so the positional call RunnableWithFallbacks(llm, ...) fails.
fallback_llm = ChatOpenAI(model="gpt-3.5-turbo")
fallback_chain = llm.with_fallbacks([fallback_llm])
print(fallback_chain.invoke("Give me a cricket fact.").content)
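
The fallback pattern itself is small: try the primary model and, if it raises, try each backup in order. A plain-Python sketch of that logic follows. `call_with_fallbacks`, `flaky`, and `backup` are hypothetical names for illustration and not LangChain APIs.

```python
def call_with_fallbacks(primary, fallbacks):
    """Return a function that tries `primary`, then each fallback, in order."""
    def run(prompt):
        last_error = None
        for fn in [primary, *fallbacks]:
            try:
                return fn(prompt)
            except Exception as err:  # remember the failure and try the next candidate
                last_error = err
        raise last_error  # every candidate failed
    return run

def flaky(prompt):
    # Simulates a primary model that is unavailable.
    raise TimeoutError("primary unavailable")

def backup(prompt):
    return f"backup answer to: {prompt}"

answer = call_with_fallbacks(flaky, [backup])("Give me a cricket fact.")
print(answer)
```

A fallback is most useful when it differs from the primary (another model or provider); falling back to the same endpoint will usually fail for the same reason.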


## 🌊 13. Streaming

In [None]:

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

stream_llm = ChatOpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()])
stream_llm.invoke("Tell me a cricket story.")


## 🚀 14. Gradio Deployment

In [None]:

import gradio as gr

def ask_llm(q):
    return chain.invoke(q).content

gr.Interface(fn=ask_llm, inputs="text", outputs="text").launch()
