In [2]:
import os
from dotenv import load_dotenv
load_dotenv()

True

In [3]:
# load_dotenv() has already copied these into os.environ; fail early if any is missing
for key in ('OPENAI_API_KEY', 'LANGCHAIN_API_KEY', 'LANGCHAIN_PROJECT'):
    if not os.getenv(key):
        raise RuntimeError(f'{key} is not set; add it to your .env file')
os.environ['LANGCHAIN_TRACING_V2'] = 'true'

In [4]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model='gpt-4o-mini')

print(llm)

client=<openai.resources.chat.completions.Completions object at 0x0000023C6578F6D0> async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x0000023C65211590> root_client=<openai.OpenAI object at 0x0000023C64C37750> root_async_client=<openai.AsyncOpenAI object at 0x0000023C6559A090> model_name='gpt-4o-mini' model_kwargs={} openai_api_key=SecretStr('**********')


In [5]:
result = llm.invoke('What is the difference between AI agent and Agentic AI')
print(result.content)

The terms "AI agent" and "agentic AI" refer to different concepts within the field of artificial intelligence, and it's important to understand their distinctions:

1. **AI Agent**:
   - An AI agent refers to any system or program that performs tasks autonomously or semi-autonomously based on data and pre-defined rules. It can perceive its environment through sensors, make decisions, and take actions to achieve specific goals.
   - Examples of AI agents include chatbots, recommendation systems, self-driving cars, and game-playing AI like those used in chess or Go.
   - AI agents can be classified based on their functionalities, such as reactive agents (which respond to immediate stimuli), deliberative agents (which plan over time), or hybrid agents (which employ both reactive and deliberative approaches).

2. **Agentic AI**:
   - Agentic AI typically refers to a more advanced or sophisticated form of AI that exhibits a higher level of autonomy and decision-making capability. This type 

In [6]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
  [
    ('system', 'You are an expert critic that roasts in subtle yet brutal way'),
    ('user', '{input}')
  ]
)

prompt

ChatPromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], input_types={}, partial_variables={}, template='You are an expert critic that roasts in subtle yet brutal way'), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})])

In [8]:
chain = prompt | llm

response = chain.invoke({'input': 'Roast an AI company that was launched as a non-profit org and is now a for-profit org'})

print(response.content)

Oh, the classic transformation from non-profit to for-profit—like watching a noble knight trade in his shining armor for a slick suit and a briefcase full of dreams. This AI company must have figured that altruism just doesn’t pay the bills—who knew that changing the world with artificial intelligence doesn’t come with a salary? It’s like they went from feeding the hungry to catering their own banquet, claiming it’s all in the name of “sustainability.” I guess the only thing more artificial than their intelligence is their commitment to non-profit ideals. Congratulations, folks, you’ve officially mastered the art of virtue signaling while cashing in on the very innovations that were supposed to save humanity. 

Welcome to the future: the robots might just take over, but at least they'll be wearing designer labels.


In [9]:
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate

output_parser = JsonOutputParser()

prompt = PromptTemplate(
  template='Answer the user question. \n {question} \n {format}\n',
  input_variables=['question'],
  partial_variables={'format': output_parser.get_format_instructions()},
)

chain = prompt | llm | output_parser

response = chain.invoke({'question': 'What is tachyon?'})

In [10]:
print(response)

{'definition': 'A tachyon is a theoretical particle in physics that is said to travel faster than the speed of light. Its existence has not been confirmed experimentally.', 'properties': {'speed': 'Faster than light', 'mass': 'Imaginary mass (as suggested by some theories)', 'theoretical_status': 'Not yet observed or proven to exist'}, 'relevance': {'theories': ['Quantum field theory', 'String theory'], 'implications': ['Potential violation of causality', 'Could lead to time travel scenarios']}}
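
To make the parsing step less opaque, here is a minimal standard-library sketch of the idea behind a JSON output parser: take the model's raw text, strip an optional Markdown code fence, and decode it. The helper name `parse_json_reply` and the sample reply are hypothetical, and this is not `JsonOutputParser`'s actual implementation, which handles more cases.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built this way so it does not end this example early


def parse_json_reply(text: str) -> dict:
    """Decode a JSON object from model text, tolerating a surrounding code fence."""
    pattern = FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE
    match = re.search(pattern, text, flags=re.DOTALL)
    # Use the fenced payload if a fence is present, otherwise the whole reply
    payload = match.group(1) if match else text.strip()
    return json.loads(payload)


# Hypothetical model reply wrapped in a fence
raw_reply = f'{FENCE}json\n{{"term": "tachyon", "observed": false}}\n{FENCE}'
parsed = parse_json_reply(raw_reply)
print(parsed["term"], parsed["observed"])
```

If the reply is not valid JSON, `json.loads` raises `json.JSONDecodeError`, which is the point where a real pipeline would retry or report the failure.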


### **Building a Basic RAG System Using LangChain and a Webpage as Data**  

#### **1️⃣ Load Webpage Content (`WebBaseLoader`)**  
Use `WebBaseLoader` to extract text from a webpage. This allows retrieval of raw text data from any given URL, which will serve as the knowledge source for retrieval.  

#### **2️⃣ Split Text into Chunks**  
Since LLMs have limited context windows and retrieval works best on focused passages, use `RecursiveCharacterTextSplitter` to break the extracted text into manageable chunks. A small overlap between neighboring chunks helps keep sentences that straddle a chunk boundary retrievable.  

#### **3️⃣ Convert Chunks to Vectors & Store in Vector Database**  
Each chunk is converted into vector embeddings using an embedding model (e.g., OpenAI, Hugging Face). These vectors are then stored in a **vector database** (e.g., FAISS, Chroma, Pinecone) for efficient retrieval based on similarity search.  

#### **4️⃣ Query Processing & Context Retrieval**  
When a user asks a question, the query is also converted into a vector. A similarity search is performed in the vector database to find the most relevant chunks. These retrieved chunks are then passed as context to the LLM, which generates a response based on them.  

This process helps the LLM give **knowledge-grounded responses** drawn from the retrieved source instead of relying solely on its pretrained data. 🚀
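
Step 4 can be sketched without any external service. The toy example below uses only the standard library to rank chunks by cosine similarity to a query vector. The three-dimensional vectors and chunk texts are made up for illustration; real embeddings have far more dimensions, and the distance metric a given vector store uses may differ from cosine similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the k chunk texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical (text, embedding) pairs standing in for a vector database
toy_index = [
    ("LangChain is a framework for LLM applications.", [0.9, 0.1, 0.0]),
    ("Prompt templates format model inputs.", [0.1, 0.9, 0.1]),
    ("FAISS stores vectors for similarity search.", [0.0, 0.2, 0.9]),
]
query_vector = [0.8, 0.2, 0.1]  # made-up embedding of the user's question

context_chunks = top_k(query_vector, toy_index, k=2)
print(context_chunks)
```

The chunks returned by `top_k` play the role of the context that is passed to the LLM along with the question.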

In [11]:
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader('https://python.langchain.com/docs/introduction/')

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [12]:
document = loader.load()
document

[Document(metadata={'source': 'https://python.langchain.com/docs/introduction/', 'title': 'Introduction | 🦜️🔗 LangChain', 'description': 'LangChain is a framework for developing applications powered by large language models (LLMs).', 'language': 'en'}, page_content='\n\n\n\n\nIntroduction | 🦜️🔗 LangChain\n\n\n\n\n\n\nSkip to main contentJoin us at  Interrupt: The Agent AI Conference by LangChain on May 13 & 14 in San Francisco!IntegrationsAPI ReferenceMoreContributingPeopleError referenceLangSmithLangGraphLangChain HubLangChain JS/TSv0.3v0.3v0.2v0.1💬SearchIntroductionTutorialsBuild a Question Answering application over a Graph DatabaseTutorialsBuild a simple LLM application with chat models and prompt templatesBuild a ChatbotBuild a Retrieval Augmented Generation (RAG) App: Part 2Build an Extraction ChainBuild an AgentTaggingBuild a Retrieval Augmented Generation (RAG) App: Part 1Build a semantic search engineBuild a Question/Answering system over SQL dataSummarize TextHow-to guidesHow

### **Three Common Text Splitters in LangChain**  

1️⃣ **RecursiveCharacterTextSplitter**  
   - Best for general-purpose text splitting while preserving semantic meaning.  
   - Tries a list of separators in order (by default `"\n\n"`, `"\n"`, `" "`, then individual characters) and allows overlap between chunks.  

2️⃣ **TokenTextSplitter**  
   - Splits text based on token count instead of characters.  
   - Useful for models with strict token limits (e.g., OpenAI models).  

3️⃣ **MarkdownTextSplitter**  
   - Specifically designed for **Markdown documents**.  
   - Splits along Markdown-specific separators such as headings, so chunks tend to follow the document's structure.  
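
To see what chunk size and overlap mean in practice, here is a deliberately simplified fixed-window splitter in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that class tries separators before falling back to smaller units); the function name `split_with_overlap` and the sample string are hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows where neighbors share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last four characters of the previous one, which is the same idea as `chunk_overlap=200` in the notebook, only at a smaller scale.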


In [26]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document  # used for the type check after splitting

# Chunks of up to 1000 characters; neighboring chunks share up to 200 characters
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

In [27]:
documents = text_splitter.split_documents(document)
documents

[Document(metadata={'source': 'https://python.langchain.com/docs/introduction/', 'title': 'Introduction | 🦜️🔗 LangChain', 'description': 'LangChain is a framework for developing applications powered by large language models (LLMs).', 'language': 'en'}, page_content='Introduction | 🦜️🔗 LangChain'),
 Document(metadata={'source': 'https://python.langchain.com/docs/introduction/', 'title': 'Introduction | 🦜️🔗 LangChain', 'description': 'LangChain is a framework for developing applications powered by large language models (LLMs).', 'language': 'en'}, page_content='Skip to main contentJoin us at  Interrupt: The Agent AI Conference by LangChain on May 13 & 14 in San Francisco!IntegrationsAPI ReferenceMoreContributingPeopleError referenceLangSmithLangGraphLangChain HubLangChain JS/TSv0.3v0.3v0.2v0.1💬SearchIntroductionTutorialsBuild a Question Answering application over a Graph DatabaseTutorialsBuild a simple LLM application with chat models and prompt templatesBuild a ChatbotBuild a Retrieval 

In [28]:
if isinstance(documents, list) and documents and all(isinstance(doc, Document) for doc in documents):
    print("Documents split correctly!")
else:
    raise TypeError("Split documents are not in the expected format.")

Documents split correctly!


In [15]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

In [16]:
from langchain_community.vectorstores import FAISS

vectors = FAISS.from_documents(documents, embeddings)

In [17]:
vectors

<langchain_community.vectorstores.faiss.FAISS at 0x23c6657dd90>

In [18]:
query = 'Prompt Templates are responsible for'

result = vectors.similarity_search(query)

result[0].page_content

"capability to LLMs and Chat ModelsBuild an Agent with AgentExecutor (Legacy)How to construct knowledge graphsHow to partially format prompt templatesHow to handle multiple queries when doing query analysisHow to use built-in tools and toolkitsHow to pass through arguments from one step to the nextHow to compose prompts togetherHow to handle multiple retrievers when doing query analysisHow to add values to a chain's stateHow to construct filters for query analysisHow to configure runtime chain internalsHow deal with high cardinality categoricals when doing query analysisCustom Document LoaderHow to use the MultiQueryRetrieverHow to add scores to retriever resultsCachingHow to use callbacks in async environmentsHow to attach callbacks to a runnableHow to propagate callbacks  constructorHow to dispatch custom callback eventsHow to pass callbacks in at runtimeHow to split by characterHow to cache chat model responsesHow to handle rate limitsHow to init any model in one lineHow to track"

In [48]:
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template="""
Answer the question based on the provided context:
<context>
{context}
</context>

Question: {question}
Answer:"""
)


In [49]:
from langchain.chains.combine_documents import create_stuff_documents_chain

document_chain = create_stuff_documents_chain(llm, prompt)
print(document_chain)

bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template='\nAnswer the question based on the provided context:\n<context>\n{context}\n</context>\n\nQuestion: {question}\nAnswer:')
| ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x0000023C6578F6D0>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x0000023C65211590>, root_client=<openai.OpenAI object at 0x0000023C64C37750>, root_async_client=<openai.AsyncOpenAI object at 0x0000023C6559A090>, model_name='gpt-4o-mini', model_kwargs={}, openai_api_key=SecretStr('**********'))
| StrOutputParser() kwargs={} config={'run_name': 'stuff_documents_chain'} config_factories=[]
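
Conceptually, a "stuff" chain concatenates the retrieved documents into one string and substitutes it for `{context}` in the prompt before calling the model. The sketch below mimics only that formatting step with the standard library. `Doc`, `stuff_prompt`, and the sample texts are hypothetical stand-ins, the blank-line separator is an assumption, and no LLM is called.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Stand-in for a LangChain Document; only page_content is modelled."""
    page_content: str


# Same wording as the notebook's prompt, without the leading newline
TEMPLATE = (
    "Answer the question based on the provided context:\n"
    "<context>\n{context}\n</context>\n\n"
    "Question: {question}\nAnswer:"
)


def stuff_prompt(docs, question, separator="\n\n"):
    """Join all document texts and place them into the prompt's {context} slot."""
    context = separator.join(doc.page_content for doc in docs)
    return TEMPLATE.format(context=context, question=question)


docs = [
    Doc("LangChain is a framework for LLM applications."),
    Doc("LangSmith helps with tracing."),
]
filled = stuff_prompt(docs, "What is LangChain?")
print(filled)
```

Because every document is placed into a single prompt, this approach only works while the combined text fits in the model's context window.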


In [50]:
def retrieve_documents(query, top_k=3):
    """Return the top_k chunks most similar to the query."""
    # similarity_search already returns a list of Document objects, so no
    # str-to-Document conversion is needed (and indexing an empty result is avoided)
    return vectors.similarity_search(query, k=top_k)


In [51]:
query = "What is LangChain and how does it work?"
retrieved_docs = retrieve_documents(query)

# Print retrieved content
for i, doc in enumerate(retrieved_docs):
    print(f"Document {i+1}:\n{doc.page_content}\n{'-'*80}\n")


Document 1:
LangChain is a framework for developing applications powered by large language models (LLMs).
LangChain simplifies every stage of the LLM application lifecycle:
--------------------------------------------------------------------------------

Document 2:
Introduction | 🦜️🔗 LangChain
--------------------------------------------------------------------------------

Document 3:
storesWhy LangChain?Ecosystem🦜🛠️ LangSmith🦜🕸️ LangGraphVersionsv0.3v0.2Pydantic compatibilityMigrating from v0.0 chainsHow to migrate from v0.0 chainsMigrating from ConstitutionalChainMigrating from ConversationalChainMigrating from ConversationalRetrievalChainMigrating from LLMChainMigrating from LLMMathChainMigrating from LLMRouterChainMigrating from MapReduceDocumentsChainMigrating from MapRerankDocumentsChainMigrating from MultiPromptChainMigrating from RefineDocumentsChainMigrating from RetrievalQAMigrating from StuffDocumentsChainUpgrading to LangGraph memoryHow to migrate to LangGraph memoryHow t

In [52]:
from langchain_core.documents import Document  # Ensure this is imported

# Defensive step: wrap any plain strings as Document objects
retrieved_docs = [
    doc if isinstance(doc, Document) else Document(page_content=doc)
    for doc in retrieved_docs
]

# Plain-text view of the retrieved context, useful for inspection;
# the document chain takes the Document list directly
context = "\n\n".join(doc.page_content for doc in retrieved_docs)


In [54]:
# retrieved_docs is already a list of Document objects from the retrieval step
query = "What is LangChain and how does it work?"

# The chain formats the Documents into {context} itself, so pass the list rather than a joined string
response = document_chain.invoke({
    "context": retrieved_docs,  # list of Document objects
    "question": query,          # the question to answer
})

print("RAG Response:\n", response)


RAG Response:
 LangChain is a framework designed for developing applications that utilize large language models (LLMs). It simplifies the entire lifecycle of LLM applications by providing tools and functionalities that assist in various stages, from building to deploying. By leveraging LangChain, developers can streamline the integration of LLMs into their applications, making it easier to manage and develop LLM-powered functionalities. The framework facilitates tasks such as memory management, data retrieval, and conversation flow, ensuring that developers can focus on enhancing user experience and application performance.
