# Assignment Code: DS-AG-032
# LLMs and LangChain Framework

## Question 1: What are Large Language Models (LLMs) and how do they function?
- Large Language Models (LLMs) are advanced artificial intelligence systems designed to understand and generate human-like text. Well-known examples include GPT-4 (OpenAI), Claude (Anthropic), and Llama (Meta).

### How they function:

- Training: They are trained on massive amounts of text (books, websites, articles), much of it collected from the internet.

- Pattern Recognition: They don't "know" facts the way a human does. Instead, they learn statistical patterns and repeatedly predict the next token (a word or word fragment) based on the text that came before it.

- Architecture: Most LLMs use a "Transformer" architecture, which allows them to pay attention to different parts of a sentence at the same time to understand context (e.g., knowing that "bank" means a river bank or a financial bank based on the surrounding words).
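
The idea of predicting the next word from statistical patterns can be illustrated with a toy bigram model. This is only a sketch: real LLMs use Transformer networks with billions of parameters rather than simple counts, and the corpus and function names here are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if the word is unseen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Tiny illustrative corpus (an assumption, not real training data).
corpus = "the cat sat on the mat . the cat ate the fish . the cat slept"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" most often in this corpus
```

The model has no notion of what a cat is; it only reproduces the most common continuation it has seen, which is the same principle an LLM applies at far larger scale.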

## Question 2: Discuss the impact of LLMs on traditional software development approaches.
- LLMs are changing how software is built in several ways:

   - Coding Assistants: Tools like GitHub Copilot use LLMs to suggest code as developers type. A programmer can write a comment like "calculate the average age," and the tool proposes a matching function within seconds, which the developer then reviews.
   - Faster Debugging: Developers can paste an error message into an LLM, and it often suggests a likely fix, which can save hours of searching.
   - Natural Language Interfaces: Instead of writing complex SQL queries or command-line scripts, developers can build apps where users just ask for what they want in plain English.
   - Shift in Skills: "Prompt Engineering" (knowing how to ask the AI the right question) is becoming as important as knowing syntax.
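
To make the coding-assistant point concrete, a comment like "calculate the average age" might lead an assistant to suggest something along these lines. This is a hand-written sketch, not actual Copilot output; the function name and the decision to raise an error on empty input are assumptions.

```python
# calculate the average age
def calculate_average_age(ages):
    """Return the mean of a list of ages; raise ValueError on an empty list."""
    if not ages:
        raise ValueError("ages must not be empty")
    return sum(ages) / len(ages)

print(calculate_average_age([25, 30, 35]))
```

Even for a small function like this, the developer still has to decide how edge cases such as an empty list should behave, which is why review remains part of the workflow.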

## Question 3: What are the key advantages and limitations of using LLMs in real-world applications?
### Advantages:
- Speed: They can generate emails, summaries, or reports in seconds.
- Versatility: One model can translate languages, write poetry, and fix computer code without needing retraining.
- Scalability: They can handle thousands of customer queries at once, 24/7.

### Limitations:
- Hallucinations: LLMs can confidently state facts that are completely wrong.
- Context Window: They can only process a limited amount of text (measured in tokens) at once, so long conversations or documents may need to be truncated or summarized.
- Bias: If the training data contained biased views, the model might repeat them.
- Cost: Running high-end LLMs requires expensive computer hardware (GPUs) or API fees.
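
One common way to work within the context window limit is to keep only the most recent messages that fit a token budget. The sketch below assumes that whitespace-separated words are a rough stand-in for tokens; real applications would use the model's own tokenizer, and the function names are hypothetical.

```python
def count_tokens(text):
    # Assumption: whitespace-separated words approximate real tokens.
    return len(text.split())

def trim_history(messages, max_tokens):
    """Keep the most recent messages whose combined size fits the budget."""
    kept = []
    used = 0
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))

history = [
    "Hello, I need help with my order",
    "Sure, what is the order number",
    "It is 12345",
    "Thanks, I found it",
]
print(trim_history(history, max_tokens=10))
```

With a budget of 10, only the last two messages survive, which shows why a chatbot can "forget" the start of a long conversation.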

## Question 4: Describe how different industries are being transformed by the use of LLMs. Provide examples.
- Healthcare: LLMs help doctors by listening to patient consultations and automatically writing up medical notes. They can also summarize complex medical research papers.
- Finance: Banks use LLMs to help analysts review transactions flagged for possible fraud and to summarize financial news for investors.
- Customer Support: Companies use LLM-powered chatbots that feel like talking to a human. They can handle refunds and answer specific questions rather than just giving generic links.
- Legal: Lawyers use LLMs to review long contracts quickly, highlighting risky clauses or summarizing case files.

## Question 5: Compare and contrast LangChain and LlamaIndex. What unique problems does each solve?

| **Feature**                 | **LangChain**                                                                                                    | **LlamaIndex (GPT Index)**                                                            |
| --------------------------- | ---------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------- |
| **Primary Focus**           | **“The Glue”**                                                                                                   | **“The Data Connector”**                                                              |
| **Main Goal**               | To chain together different steps (e.g., take user input → call LLM → use tools like calculator → return answer) | To organize private data (PDFs, emails, docs) so an LLM can search and read it easily |
| **Best For**                | Building agents, chatbots with memory, and complex workflows                                                     | Building document search systems (RAG – Retrieval Augmented Generation)               |
| **Core Strength**           | Orchestration of tools, prompts, memory, and agents                                                              | Indexing, chunking, embedding, and retrieving data                                    |
| **Analogy**                 | The **Manager** who coordinates different workers                                                                | The **Librarian** who organizes books so they can be easily found                     |
| **Data Handling**           | Limited built-in data indexing                                                                                   | Strong document ingestion & indexing                                                  |
| **LLM Interaction**         | Heavy focus on prompt chains & agents                                                                            | Heavy focus on data → LLM pipelines                                                   |
| **Typical Use Case**        | AI assistant that can reason, call APIs, and take actions                                                        | Chat with your PDFs, notes, emails                                                    |
| **Can They Work Together?** |  Yes                                                                                                            |  Yes (often used together)                                                           |


## Question 6: Implement a basic LangChain pipeline using OpenAI’s LLM to answer questions based on a user input prompt.

In [None]:
# Solution 6

import os
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Set your API Key (Replace with actual key)
os.environ["OPENAI_API_KEY"] = "sk-..."

# 1. Initialize the Model
llm = ChatOpenAI(model="gpt-3.5-turbo")

# 2. Create a Prompt Template
prompt = ChatPromptTemplate.from_template("Tell me a short joke about {topic}.")

# 3. Create the Chain
chain = prompt | llm | StrOutputParser()

# 4. Run the Chain
response = chain.invoke({"topic": "programmers"})

print(response)

# Output:

# Why do programmers prefer dark mode? Because light attracts bugs.
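
The `prompt | llm | StrOutputParser()` line composes three steps so that the output of each becomes the input of the next. The toy class below sketches that idea in plain Python. It is not LangChain's actual implementation; `Step`, `fake_llm`, and the other names are hypothetical stand-ins so the example runs without an API key.

```python
class Step:
    """Wrap a function so steps can be composed with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # The combined step runs self first, then feeds the result to other.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)

# Hypothetical stand-ins for the prompt template, the model, and the parser.
fill_prompt = Step(lambda inputs: f"Tell me a short joke about {inputs['topic']}.")
fake_llm = Step(lambda prompt: {"content": f"[model reply to: {prompt}]"})
to_text = Step(lambda message: message["content"])

toy_chain = fill_prompt | fake_llm | to_text
print(toy_chain.invoke({"topic": "programmers"}))
```

Because every step exposes the same `invoke` interface, any step can be swapped out (for example, a different model) without touching the rest of the chain.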

## Question 7: Integrate LangChain with a third-party API (e.g., weather, news) and show how responses can be generated via LLMs.

In [None]:
# Solution 7

import os
from langchain.agents import AgentType, initialize_agent
# load_tools lives in the langchain-community package in recent releases
from langchain_community.agent_toolkits.load_tools import load_tools
from langchain_openai import OpenAI

# API Keys setup
os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["SERPAPI_API_KEY"] = "..." # Key for SerpAPI (Google Search results)

# 1. Load the Language Model
llm = OpenAI(temperature=0)

# 2. Load Tools (SerpAPI lets the agent search Google; llm-math acts as a calculator)
tools = load_tools(["serpapi", "llm-math"], llm=llm)

# 3. Initialize Agent
# Note: initialize_agent is a legacy API that newer LangChain releases mark as deprecated.
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)

# 4. Run the Agent (invoke returns a dict; the final answer is under "output")
response = agent.invoke({"input": "What is the temperature in New York right now? Convert it to Celsius if it is in Fahrenheit."})
print(response["output"])


# Output:

# ... Entering new AgentExecutor chain...
# I need to check the current weather in New York.
# Action: Search
# Action Input: "current temperature New York"
# Observation: 75°F
# Thought: I need to convert 75°F to Celsius.
# Action: Calculator
# ...
# Final Answer: The current temperature in New York is approximately 24°C.
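
The Thought / Action / Observation trace above follows a simple loop: the model picks a tool, the agent runs it, and the result is fed back until the model produces a final answer. The sketch below mimics that loop with a scripted stand-in for the LLM and stubbed tools, so it runs offline. All names here are hypothetical, and the search stub returns a hard-coded value rather than live weather data.

```python
def search(query):
    # Hypothetical stub: a real tool would call a search or weather API.
    return "75°F"

def fahrenheit_to_celsius(text):
    fahrenheit = float(text.replace("°F", ""))
    return f"{(fahrenheit - 32) * 5 / 9:.0f}°C"

TOOLS = {"Search": search, "Calculator": fahrenheit_to_celsius}

def scripted_model(observations):
    """Stand-in for the LLM: chooses the next action from what it has seen so far."""
    if not observations:
        return ("Search", "current temperature New York")
    if len(observations) == 1:
        return ("Calculator", observations[0])
    return ("Final Answer", f"The current temperature in New York is approximately {observations[1]}.")

def run_agent(model, tools, max_steps=5):
    """Alternate between model decisions and tool calls until a final answer appears."""
    observations = []
    for _ in range(max_steps):
        action, action_input = model(observations)
        if action == "Final Answer":
            return action_input
        observations.append(tools[action](action_input))
    raise RuntimeError("agent did not finish within max_steps")

print(run_agent(scripted_model, TOOLS))
```

The `max_steps` guard reflects a practical concern with real agents, which can otherwise loop indefinitely if the model never decides it is done.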

## Question 8: Create a LlamaIndex implementation that indexes a local text file and retrieves answers from it.

In [None]:
# Solution 8

import os
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# API Key
os.environ["OPENAI_API_KEY"] = "sk-..."

# 1. Load Data
# This reads all files in the 'data' folder
documents = SimpleDirectoryReader("data").load_data()

# 2. Create Index
# This converts text into numbers (vectors) so it can be searched
index = VectorStoreIndex.from_documents(documents)

# 3. Create Query Engine
query_engine = index.as_query_engine()

# 4. Ask a Question
response = query_engine.query("What are the main points mentioned in the notes?")

print(response)

# Output:

# The main points in the notes cover the definition of Generative AI, the importance of transformers in NLP, and a list of common use cases like chatbots and content creation.
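
What "converting text into vectors so it can be searched" means can be sketched without any external service. The example below uses word counts as a crude stand-in for a learned embedding model and ranks chunks by cosine similarity. This is an assumption made for illustration, not how LlamaIndex computes embeddings, and the chunk texts are invented.

```python
import math
from collections import Counter

def embed(text):
    # Assumption: word counts stand in for a learned embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(chunks):
    """Store each chunk alongside its vector."""
    return [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(index, question, top_k=1):
    """Return the top_k chunks most similar to the question."""
    query_vec = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]

chunks = [
    "transformers use attention to model context",
    "chatbots and content creation are common use cases",
    "generative ai creates new text and images",
]
toy_index = build_index(chunks)
print(retrieve(toy_index, "what are common use cases"))
```

A real embedding model would also match paraphrases that share no words with the question, which is the main advantage over this word-count version.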

## Question 9: Demonstrate combining LangChain with LlamaIndex to create a simple document-based Q&A chatbot.

In [None]:
# Solution 9

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from langchain.agents import initialize_agent, Tool
from langchain_openai import OpenAI

# 1. Setup LlamaIndex (The Knowledge Base)
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# 2. Create a LangChain Tool for the Index
# This tells LangChain: "If you need info about the data, use this tool."
tools = [
    Tool(
        name="LlamaIndex_Docs",
        # query() returns a Response object; convert it to a string for the agent
        func=lambda q: str(query_engine.query(q)),
        description="Useful for answering questions about the specific text files provided."
    )
]

# 3. Initialize LangChain Agent
llm = OpenAI(temperature=0)
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)

# 4. Run
response = agent.invoke({"input": "Summarize the document using the LlamaIndex tool."})
print(response["output"])

# Output:

# ... Entering new AgentExecutor chain...
# I should use the LlamaIndex_Docs tool to get the summary.
# Action: LlamaIndex_Docs
# Action Input: "Summarize the document"
# ...
# Final Answer: The document discusses the architecture of Large Language Models and their training process.

## Question 10: A legal firm wants to use AI to summarize large volumes of legal documents and retrieve relevant information quickly. Propose a solution.
- Proposed Solution: I would build a RAG (Retrieval-Augmented Generation) system combining LlamaIndex and LangChain.

### How it works in practice:

#### Ingestion (LlamaIndex):
- The legal firm uploads thousands of PDFs (contracts, case files).
- LlamaIndex breaks these large files into small chunks. Its text splitters try to cut at sentence boundaries and overlap neighboring chunks so that related text stays together.
- It converts these chunks into "embeddings" (numeric vectors that capture meaning) and stores them in a vector database.
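
The chunking step can be sketched as a sliding window over words, where each chunk repeats the last few words of the previous one. This is a simplified illustration rather than LlamaIndex's actual splitter; the function name, the chunk sizes, and the sample sentence are assumptions.

```python
def chunk_words(text, chunk_size, overlap):
    """Split text into windows of chunk_size words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

# Invented sample sentence, used only to show the overlap.
contract = "The supplier shall indemnify the client against all losses arising from breach"
for chunk in chunk_words(contract, chunk_size=6, overlap=2):
    print(chunk)
```

The overlap means a clause that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice.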

#### Retrieval (LlamaIndex):
- When a lawyer asks, "What is the liability clause in the Smith contract?", LlamaIndex searches the database for the chunks that discuss liability in that specific contract. It does not read the whole library every time.

#### Synthesis (LangChain):
- LangChain takes the chunks found by LlamaIndex and passes them to the LLM (like GPT-4).
- LangChain instructs the LLM: "Using only these provided chunks, answer the lawyer's question in a professional tone."
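
The instruction to answer "using only these provided chunks" comes down to how the prompt is assembled. A minimal sketch of that assembly is shown below. The function name, the exact wording, and the example clauses are hypothetical; in a real system the chunks would come from the retriever.

```python
def build_grounded_prompt(question, chunks):
    """Assemble a prompt that restricts the model to the retrieved excerpts."""
    numbered = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Using only the excerpts below, answer the lawyer's question "
        "in a professional tone. If the excerpts do not contain the answer, say so.\n\n"
        f"Excerpts:\n{numbered}\n\n"
        f"Question: {question}"
    )

# Invented clauses for illustration only.
example_chunks = [
    "Clause 9.1: Liability is capped at the fees paid in the preceding 12 months.",
    "Clause 9.2: Neither party is liable for indirect losses.",
]
print(build_grounded_prompt("What is the liability clause in the Smith contract?", example_chunks))
```

Numbering the excerpts makes it possible to ask the model to cite which excerpt supports each statement, which matters in a legal setting where hallucinated answers are costly.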

### Why this combination?
- LlamaIndex handles the data complexity (loading awkward PDF formats, searching quickly).
- LangChain handles the interaction (maintaining chat history, formatting the final answer for the lawyer).

In [None]:
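# Solution 10

# Conceptual implementation (assumes the PDFs live in a local "legal_docs" folder)
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from langchain_openai import ChatOpenAI

# 1. LlamaIndex indexes the legal PDFs
legal_docs = SimpleDirectoryReader("legal_docs").load_data()
index = VectorStoreIndex.from_documents(legal_docs)
retriever = index.as_retriever(similarity_top_k=4)

# 2. LangChain handles the user interaction
llm = ChatOpenAI(model="gpt-4", temperature=0)

def chat_with_lawyer(user_question):
    # Retrieve the most relevant clauses and join their text
    nodes = retriever.retrieve(user_question)
    context = "\n\n".join(n.node.get_content() for n in nodes)

    # LLM answers the question using only the retrieved excerpts
    prompt = (
        "Using only the excerpts below, answer the question in a professional tone.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {user_question}"
    )
    return llm.invoke(prompt).content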