## Testing out the environment

In [1]:
import langchain
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from dotenv import load_dotenv
from pprint import pprint

In [2]:
load_dotenv()  # Load environment variables from a .env file if present

True

In [3]:
langchain.__version__

'0.3.27'

In [4]:
gemini_flash_model = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # other params...
)

In [5]:
model = ChatGroq(
    model="openai/gpt-oss-20b",
    temperature=0,
    max_tokens=None,
    reasoning_format="parsed",
    timeout=None,
    max_retries=2,
)

In [6]:
model.invoke("what is the capital of india").content

'The capital of India is **New\u202fDelhi**.'

In [7]:
gemini_flash_model.invoke("what is the capital of india").content

'The capital of India is **New Delhi**.'

### Embedding model

In [None]:
embedding_model = GoogleGenerativeAIEmbeddings(model="gemini-embedding-001")
embeddings = embedding_model.embed_query(text="What's our Q1 revenue?", output_dimensionality=10)

[-0.03572908416390419,
 0.014558478258550167,
 0.011592254973948002,
 -0.08969993889331818,
 -0.009068180806934834,
 0.013664662837982178,
 0.011340967379510403,
 -0.005701108369976282,
 -0.027033332735300064,
 3.775993536692113e-05]

In [16]:
from langchain_huggingface import HuggingFaceEmbeddings

In [19]:
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cpu"},
)
embeddings = embedding_model.embed_query(text="What's our Q1 revenue?")

In [22]:
len(embeddings)

384

## Langchain Prompts

### Basic Prompt template

In [8]:
from langchain_core.prompts import PromptTemplate, load_prompt

In [62]:
template = PromptTemplate(
    template="""
    You are a helpful assistant that can generate a short report on the topic: {paper_input}
    The report should be in the style of {style_input} and the length should be {length_input}
    """,
    input_variables=["paper_input", "style_input", "length_input"]
)

template.save("./prompt_templates/template.json")

In [63]:
template = load_prompt("./prompt_templates/template.json")

In [64]:
prompt = template.format(
        paper_input="Attention is all you need", style_input="Beginner-Friendly", length_input="Short (1-2 paragraphs)"
    )

In [67]:
print (prompt)

### Messages


    You are a helpful assistant that can generate a short report on the topic: Attention is all you need
    The report should be in the style of Beginner-Friendly and the length should be Short (1-2 paragraphs)
    


### Messages

In [9]:
from langchain_core.messages import HumanMessage, SystemMessage, AIMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

In [35]:
messages = [
    SystemMessage(content="You are a helpful assistant that can answer questions and help with tasks."),
    HumanMessage(content="What is the capital of France?"),
]

result = model.invoke(messages)

result.content

'The capital of France is **Paris**.'

### Dynammic list of messages

In [39]:
chat_template = ChatPromptTemplate(messages=[
    SystemMessage(content="You are a helpful assistant who is an expert in the domain: {domain}"),
    HumanMessage(content="Explain the topic in simple terms: {topic}"),
])

In [42]:
prompt = chat_template.invoke({"domain": "AI", "topic": "Self Attention"})

print(prompt)

messages=[SystemMessage(content='You are a helpful assistant who is an expert in the domain: {domain}', additional_kwargs={}, response_metadata={}), HumanMessage(content='Explain the topic in simple terms: {topic}', additional_kwargs={}, response_metadata={})]


### The above does not work

In [47]:
chat_template = ChatPromptTemplate(messages=[
    ("system", "You are a helpful assistant who is an expert in the domain: {domain}"),
    ("human", "Explain the topic in simple terms in 3-5 sentences: {topic}"),
])
prompt = chat_template.invoke({"domain": "AI", "topic": "Self Attention"})

print(prompt)

messages=[SystemMessage(content='You are a helpful assistant who is an expert in the domain: AI', additional_kwargs={}, response_metadata={}), HumanMessage(content='Explain the topic in simple terms in 3-5 sentences: Self Attention', additional_kwargs={}, response_metadata={})]


In [48]:
result = model.invoke(prompt)
print(result.content)


Self‑attention is a way for a model to look at all the words in a sentence at once and decide how much each word should influence every other word.  
For each word, the model creates three vectors—query, key, and value—then compares the query of one word with the keys of all words to get a “similarity score.”  
These scores are turned into weights (via a softmax) that say how much attention each word should give to the others, and the weighted sum of the value vectors gives the new representation for that word.  
Because every word can attend to every other word, the model captures long‑range relationships and context without needing to process the sentence sequentially.  
This mechanism is the core of transformer models, enabling them to understand and generate language efficiently.


### Message Placeholder

In [72]:
history_messages = [
    HumanMessage(content="I want to request a refund for my order #12345."),
    AIMessage(content="Your refund request for order #12345 has been initiated. It will be processed in 3-5 business days.")
]

In [75]:
history_messages[0].content


'I want to request a refund for my order #12345.'

In [71]:
chat_template = ChatPromptTemplate(messages=[
    ("system", "You are a helpful assistant that can answer questions and help with tasks."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{query}"),
])

prompt = chat_template.invoke({"query": "how many day again?", "chat_history": history_messages})

pprint(prompt)

ChatPromptValue(messages=[SystemMessage(content='You are a helpful assistant that can answer questions and help with tasks.', additional_kwargs={}, response_metadata={}), HumanMessage(content='I want to request a refund for my order #12345.', additional_kwargs={}, response_metadata={}), AIMessage(content='Your refund request for order #12345 has been initiated. It will be processed in 3-5 business days.', additional_kwargs={}, response_metadata={}), HumanMessage(content='how many day again?', additional_kwargs={}, response_metadata={})])


In [57]:
response = model.invoke(prompt)
print(response.content)

It will take **3–5 business days** to complete the refund.  
That means the processing time is counted only on weekdays (Monday‑Friday), excluding public holidays. If you have any more questions, just let me know!


## Structured Outputs

### Typed Dict

In [20]:
from typing import TypedDict, Annotated, Optional, Literal, List

class Person(TypedDict):
    name: str
    age: int
    email: str

person = Person(name="John", age=30, email="john@example.com")

In [None]:
person = Person(name=123, age=30, email="john@example.com") # this is wrong according to the type def, but it will not raise an error

In [78]:
# create a dummy review for a mobile phone in plain text
review = """
This is a decent phone! I love the camera and the battery life is amazing. Also, the price is reasonable. The display is ok, but some more details are that the ppi is 400 and the screen to body ratio is 80%.
Gaming experience is good, but the phone gets heated up after 1 hour of gaming.
Netwok connectivity is good. Overall, it is a good phone for the price.
"""

# create a review schema

class Review(TypedDict):
    summary: str
    sentiment: str

structured_model = model.with_structured_output(Review)
response = structured_model.invoke(review)
pprint(response)

{'sentiment': 'positive',
 'summary': 'The phone offers great camera performance, long battery life, and '
            'a reasonable price, with a decent display and solid gaming '
            'experience, though it heats up after extended play. Overall, a '
            'solid value.'}


`with_structured_output` with a schema results in a system prompt behind the scenes which results in generating the response which follows the provided shema

### Using Annotations

In [80]:
class Review(TypedDict):
    summary: Annotated[str, "A summary of the review"]
    sentiment: Annotated[str, "The sentiment of the review - posutive, negative or neutral"]

structured_model = model.with_structured_output(Review)
response = structured_model.invoke(review)
pprint(response)



{'sentiment': 'positive',
 'summary': 'The phone offers a solid camera, long battery life, and a '
            'reasonable price. The display is decent with a 400\u202fppi and '
            '80% screen‑to‑body ratio. Gaming is enjoyable but the device '
            'tends to heat after an hour. Network connectivity is reliable, '
            'making it a good overall value.'}


In [81]:
# create a detailed review for iPhone 17 Air
iphone_17_air_review = """
The iPhone 17 Air represents Apple's boldest design departure in years, delivering an incredibly thin profile that feels almost impossibly light in hand. At just 5.5mm thick, this device pushes the boundaries of engineering while maintaining the premium build quality we expect from Apple. The aerospace-grade aluminum frame feels solid despite its minimal thickness, and the new Ceramic Shield front provides excellent protection without adding bulk. The device comes in four stunning colors: Midnight Black, Starlight Silver, Deep Purple, and a new Ocean Blue that shifts subtly in different lighting conditions.

Performance-wise, the iPhone 17 Air doesn't compromise despite its slim form factor. The A18 Bionic chip with its 3nm process delivers exceptional speed and efficiency, handling everything from intensive gaming to professional video editing with ease. The 8GB of unified memory ensures smooth multitasking, and the improved Neural Engine makes AI-powered features incredibly responsive. Battery life is surprisingly robust for such a thin device, easily lasting a full day of heavy usage thanks to the more efficient chip and optimized iOS 18 integration. The new MagSafe wireless charging is faster than ever, reaching 25W speeds that rival many wired chargers.

The camera system is where the iPhone 17 Air truly shines, featuring a revolutionary new periscope telephoto lens that somehow fits within the ultra-thin chassis. The main 48MP sensor captures stunning photos with incredible detail and dynamic range, while the new computational photography features produce professional-quality results in challenging lighting conditions. Night mode has been significantly improved, and the new Action mode for video recording delivers gimbal-like stabilization. The front-facing camera now supports 4K ProRes recording, making it perfect for content creators who demand the highest quality.

However, the pursuit of thinness does come with some trade-offs. The device can get noticeably warm during intensive tasks, and the reduced internal space means no room for a traditional headphone jack or even the Lightning port - it's USB-C only with wireless charging as the primary power source. The speakers, while clear, lack the depth and bass response of thicker iPhone models. Additionally, the ultra-thin design makes the device feel somewhat fragile, and Apple's recommended case adds back much of the thickness that the Air design eliminates. Despite these minor compromises, the iPhone 17 Air succeeds in creating a truly premium, futuristic smartphone experience that feels like a glimpse into the next decade of mobile technology.
"""


In [85]:
class Review(TypedDict):
    key_themes: Annotated[list[str], "The key themes of the review as a list of strings"]
    summary: Annotated[str, "A brief summary of the review - max 100 words"]
    sentiment: Annotated[Literal["pos", "neg", "neu"], "The sentiment of the review"]
    pros: Annotated[Optional[list[str]], "The pros of the review as a list of strings"]
    cons: Annotated[Optional[list[str]], "The cons of the review as a list of strings"]

structured_model = model.with_structured_output(Review)
response = structured_model.invoke(iphone_17_air_review)
pprint(response)

{'cons': ['Device can get noticeably warm during intensive tasks',
          'No headphone jack or Lightning port – USB-C only',
          'Speakers lack depth and bass response',
          'Ultra-thin design feels somewhat fragile',
          "Apple's recommended case adds back much of the thickness that the "
          'Air design eliminates'],
 'key_themes': ['Design and Build',
                'Performance and Efficiency',
                'Camera System',
                'Battery Life and Charging',
                'Trade-offs and Limitations'],
 'pros': ['Incredibly thin profile at 5.5mm',
          'Aerospace-grade aluminum frame',
          'Ceramic Shield front protection',
          'Stunning color options',
          'A18 Bionic chip with 3nm process',
          '8GB unified memory',
          'Improved Neural Engine',
          'Robust battery life',
          '25W MagSafe wireless charging',
          'Revolutionary periscope telephoto lens',
          '48MP main sensor',
 

In [None]:
# create a detailed review for iPhone 17 Air -- without cons
iphone_17_air_review_without_cons = """
The iPhone 17 Air represents Apple's boldest design departure in years, delivering an incredibly thin profile that feels almost impossibly light in hand. At just 5.5mm thick, this device pushes the boundaries of engineering while maintaining the premium build quality we expect from Apple. The aerospace-grade aluminum frame feels solid despite its minimal thickness, and the new Ceramic Shield front provides excellent protection without adding bulk. The device comes in four stunning colors: Midnight Black, Starlight Silver, Deep Purple, and a new Ocean Blue that shifts subtly in different lighting conditions.

Performance-wise, the iPhone 17 Air doesn't compromise despite its slim form factor. The A18 Bionic chip with its 3nm process delivers exceptional speed and efficiency, handling everything from intensive gaming to professional video editing with ease. The 8GB of unified memory ensures smooth multitasking, and the improved Neural Engine makes AI-powered features incredibly responsive. Battery life is surprisingly robust for such a thin device, easily lasting a full day of heavy usage thanks to the more efficient chip and optimized iOS 18 integration. The new MagSafe wireless charging is faster than ever, reaching 25W speeds that rival many wired chargers.

The camera system is where the iPhone 17 Air truly shines, featuring a revolutionary new periscope telephoto lens that somehow fits within the ultra-thin chassis. The main 48MP sensor captures stunning photos with incredible detail and dynamic range, while the new computational photography features produce professional-quality results in challenging lighting conditions. Night mode has been significantly improved, and the new Action mode for video recording delivers gimbal-like stabilization. The front-facing camera now supports 4K ProRes recording, making it perfect for content creators who demand the highest quality.
"""

structured_model = model.with_structured_output(Review)
response = structured_model.invoke(iphone_17_air_review_without_cons)
pprint(response)



{'cons': [],
 'key_themes': ['Design & Build',
                'Performance & Efficiency',
                'Battery & Charging',
                'Camera & Photography',
                'User Experience'],
 'pros': ['Ultra-thin 5.5mm profile with premium aerospace-grade aluminum '
          'frame',
          'Exceptional performance from A18 Bionic 3nm chip and 8GB RAM',
          'Robust battery life and fast 25W MagSafe charging',
          'Revolutionary periscope telephoto lens and 48MP main sensor',
          'Advanced computational photography and improved night mode',
          'Front camera supports 4K ProRes for creators'],
 'sentiment': 'pos',
 'summary': 'The iPhone\u202f17\u202fAir redefines slimness with a 5.5\u202fmm '
            'chassis that feels surprisingly solid, thanks to aerospace‑grade '
            'aluminum and Ceramic Shield. Powered by the A18 Bionic 3\u202fnm '
            'chip and 8\u202fGB RAM, it delivers gaming‑grade speed, efficient '
            'mul

This works really well -- but there is no data validation - this can be done by Pydantic

### Using Pydantic

In [11]:
from pydantic import BaseModel

In [12]:
class Student(BaseModel):
    name: str

new_student = {'name': 'mini'}
student = Student(**new_student)
print(student)

name='mini'


In [13]:
class Student(BaseModel):
    name: str = "mini" # default value
    age: Optional[int] = None # optional field

student = Student()
print(student)

name='mini' age=None


In [14]:
# pydantic does type coercion, whenever possible
class Student(BaseModel):
    name: str = "mini"
    age: Optional[int] = None

student = Student(name="mini", age="29")
print(student)

name='mini' age=29


In [15]:
# email validation using pydantic
from pydantic import EmailStr
class User(BaseModel):
    name: str
    email: EmailStr

user = User(name="mini", email="mini@example.com")
print(user)

name='mini' email='mini@example.com'


In [98]:
user = User(name="mini", email="mini@")
print(user)

ValidationError: 1 validation error for User
email
  value is not a valid email address: There must be something after the @-sign. [type=value_error, input_value='mini@', input_type=str]

In [None]:
# using Field in Pydantic
from pydantic import Field


class Student(BaseModel):
    name: str = "mini"
    age: Optional[int] = None
    email: EmailStr 
    cgpa: float = Field(ge=0, le=10, default=8.0, description="The CGPA of the student") # description is like the annotation in TypedDict - helps the llm understand the field

student = Student(name="mini", age=20, email="mini@example.com", cgpa=9.5)
print(student)


name='mini' age=20 email='mini@example.com' cgpa=9.5


In [18]:
student = Student(name="mini", age=20, email="mini@example.com", cgpa=9.5)
print(student)

name='mini' age=20 email='mini@example.com' cgpa=9.5


In [None]:
dict(student)

{'name': 'mini', 'age': 20, 'email': 'mini@example.com', 'cgpa': 9.5}

In [None]:
class Review(TypedDict):
    key_themes: Annotated[list[str], "The key themes of the review as a list of strings"]
    summary: Annotated[str, "A brief summary of the review - max 100 words"]
    sentiment: Annotated[Literal["pos", "neg", "neu"], "The sentiment of the review"]
    pros: Annotated[Optional[list[str]], "The pros of the review as a list of strings"]
    cons: Annotated[Optional[list[str]], "The cons of the review as a list of strings"]

In [22]:
# create a detailed review for iPhone 17 Air -- without cons
iphone_17_air_review_without_cons = """
The iPhone 17 Air represents Apple's boldest design departure in years, delivering an incredibly thin profile that feels almost impossibly light in hand. At just 5.5mm thick, this device pushes the boundaries of engineering while maintaining the premium build quality we expect from Apple. The aerospace-grade aluminum frame feels solid despite its minimal thickness, and the new Ceramic Shield front provides excellent protection without adding bulk. The device comes in four stunning colors: Midnight Black, Starlight Silver, Deep Purple, and a new Ocean Blue that shifts subtly in different lighting conditions.

Performance-wise, the iPhone 17 Air doesn't compromise despite its slim form factor. The A18 Bionic chip with its 3nm process delivers exceptional speed and efficiency, handling everything from intensive gaming to professional video editing with ease. The 8GB of unified memory ensures smooth multitasking, and the improved Neural Engine makes AI-powered features incredibly responsive. Battery life is surprisingly robust for such a thin device, easily lasting a full day of heavy usage thanks to the more efficient chip and optimized iOS 18 integration. The new MagSafe wireless charging is faster than ever, reaching 25W speeds that rival many wired chargers.

The camera system is where the iPhone 17 Air truly shines, featuring a revolutionary new periscope telephoto lens that somehow fits within the ultra-thin chassis. The main 48MP sensor captures stunning photos with incredible detail and dynamic range, while the new computational photography features produce professional-quality results in challenging lighting conditions. Night mode has been significantly improved, and the new Action mode for video recording delivers gimbal-like stabilization. The front-facing camera now supports 4K ProRes recording, making it perfect for content creators who demand the highest quality.
"""


class Review(BaseModel):
    key_themes: List[str] = Field(description="The key themes of the review as a list of strings")
    summary: str = Field(description="A brief summary of the review - max 100 words")
    sentiment: Literal["pos", "neg", "neu"] = Field(description="The sentiment of the review - posutive, negative or neutral")
    pros: Optional[List[str]] = Field(description="The pros of the review as a list of strings", default=None)
    cons: Optional[List[str]] = Field(description="The cons of the review as a list of strings", default=None)

structured_model = model.with_structured_output(Review)
response = structured_model.invoke(iphone_17_air_review_without_cons)
pprint(dict(response))

{'cons': [],
 'key_themes': ['design',
                'performance',
                'battery',
                'camera',
                'MagSafe',
                'color options'],
 'pros': ['ultra-thin 5.5mm profile',
          'aerospace-grade aluminum frame',
          'Ceramic Shield front',
          'four vibrant colors',
          'A18 Bionic 3nm chip',
          '8GB unified memory',
          'Neural Engine AI',
          'robust battery life',
          '25W MagSafe charging',
          'periscope telephoto lens',
          '48MP main sensor',
          'advanced computational photography',
          'improved night mode',
          'Action mode stabilization',
          '4K ProRes front camera'],
 'sentiment': 'pos',
 'summary': 'The iPhone\u202f17\u202fAir redefines slimness with a 5.5\u202fmm '
            'chassis that feels almost weightless yet feels solid thanks to '
            'aerospace‑grade aluminum and Ceramic Shield. Powered by the A18 '
            'Bionic 3

### JSON Schema

This is useful when u cannot define the structure using python - say u are restricted to a diff langauge

In [23]:
{
    "title": "Student",
    "description": "A student is a person who is studying at a school or university",
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "description": "The name of the student"
        },
        "age": {
            "type": "integer",
            "description": "The age of the student"
        }
    },
    "required": ["name"]
}

{'title': 'Student',
 'description': 'A student is a person who is studying at a school or university',
 'type': 'object',
 'properties': {'name': {'type': 'string',
   'description': 'The name of the student'},
  'age': {'type': 'integer', 'description': 'The age of the student'}},
 'required': ['name']}

In [24]:
# review schema
# schema
json_schema = {
  "title": "Review",
  "type": "object",
  "properties": {
    "key_themes": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "Write down all the key themes discussed in the review in a list"
    },
    "summary": {
      "type": "string",
      "description": "A brief summary of the review"
    },
    "sentiment": {
      "type": "string",
      "enum": ["pos", "neg"],
      "description": "Return sentiment of the review either negative, positive or neutral"
    },
    "pros": {
      "type": ["array", "null"],
      "items": {
        "type": "string"
      },
      "description": "Write down all the pros inside a list"
    },
    "cons": {
      "type": ["array", "null"],
      "items": {
        "type": "string"
      },
      "description": "Write down all the cons inside a list"
    },
    "name": {
      "type": ["string", "null"],
      "description": "Write the name of the reviewer"
    }
  },
  "required": ["key_themes", "summary", "sentiment"]
}

In [25]:
structured_model = model.with_structured_output(json_schema)

response = structured_model.invoke(iphone_17_air_review_without_cons)
pprint(response)

{'cons': None,
 'key_themes': ['Design & Build',
                'Performance & Efficiency',
                'Battery Life',
                'MagSafe Charging',
                'Camera System',
                'Color Options'],
 'name': None,
 'pros': ['Ultra‑thin 5.5\u202fmm profile',
          'Lightweight feel',
          'Aerospace‑grade aluminum frame',
          'Ceramic Shield front',
          'Four vibrant color options',
          'A18 Bionic 3\u202fnm chip',
          '8\u202fGB unified memory',
          'Neural Engine AI features',
          'Full‑day battery life',
          '25\u202fW MagSafe wireless charging',
          'Periscope telephoto lens',
          '48\u202fMP main sensor',
          'Advanced computational photography',
          'Improved Night mode',
          'Action mode video stabilization',
          '4K ProRes front camera'],
 'sentiment': 'pos',
 'summary': 'The iPhone\u202f17\u202fAir delivers a bold, ultra‑thin design '
            'with premium bui

## Output Parsers

If your llm cannot generate structured ops out of the box, we can use Op Parsers -- these are classes in LangChain that help convert raw llm responses into structured ops

These can be used both with models which can and cannot provide structured ops

### StrOutputParser

In [30]:
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser


In [28]:
# 1st prompt -> detailed report
template1 = PromptTemplate(
    template='Write a detailed report on {topic}',
    input_variables=['topic']
)

# 2nd prompt -> summary
template2 = PromptTemplate(
    template='Write a 2 line summary on the following text. /n {text}',
    input_variables=['text']
)

parser = StrOutputParser()


In [29]:
chain = template1 | model | parser | template2 | model | parser

chain.invoke({"topic": "AI"})

'This comprehensive report traces AI’s evolution from early symbolic systems to today’s foundation models, detailing core technologies, real‑world applications, and the economic, ethical, and governance challenges they pose. It underscores AI’s transformative impact across industries while calling for responsible design, transparency, and global cooperation to harness its benefits and mitigate risks.'

### JsonOutputParser

In [34]:
JsonOutputParser().get_format_instructions()

'Return a JSON object.'

In [35]:
parser = JsonOutputParser()

template = PromptTemplate(
    template='Give me 5 facts about {topic} \n {format_instruction}',
    input_variables=['topic'],
    partial_variables={'format_instruction': parser.get_format_instructions()}
)

chain = template | model | parser

result = chain.invoke({'topic':'black hole'})

pprint(result)

{'facts': ['Black holes are regions of spacetime where gravity is so strong '
           'that nothing, not even light, can escape once it crosses the event '
           'horizon.',
           'The size of a black hole is defined by its Schwarzschild radius, '
           'which is proportional to its mass (approximately 3 kilometers per '
           'solar mass).',
           'Supermassive black holes, with masses millions to billions of '
           'times that of the Sun, reside at the centers of most galaxies, '
           'including our Milky Way.',
           'Black holes can grow by accreting matter from their surroundings '
           'or by merging with other black holes, a process that emits '
           'powerful gravitational waves detectable by observatories like LIGO '
           'and Virgo.',
           'Despite their name, black holes are not perfect vacuum; they can '
           'emit Hawking radiation—a theoretical quantum effect that causes '
           'them to lose 

Here its returning in JSON but we __cannot control the schema__

### StructuredOutputParser

- we can enforce a schema here

In [36]:
from langchain.output_parsers import StructuredOutputParser, ResponseSchema

In [37]:
schema = [ResponseSchema(name="fact_1", description="The first fact about the topic"),
ResponseSchema(name="fact_2", description="The second fact about the topic"),
ResponseSchema(name="fact_3", description="The third fact about the topic"),
ResponseSchema(name="fact_4", description="The fourth fact about the topic"),
ResponseSchema(name="fact_5", description="The fifth fact about the topic")
]

parser = StructuredOutputParser.from_response_schemas(schema)

parser.get_format_instructions()

'The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":\n\n```json\n{\n\t"fact_1": string  // The first fact about the topic\n\t"fact_2": string  // The second fact about the topic\n\t"fact_3": string  // The third fact about the topic\n\t"fact_4": string  // The fourth fact about the topic\n\t"fact_5": string  // The fifth fact about the topic\n}\n```'

In [None]:
template = PromptTemplate(
    template='Give me 5 facts about {topic} \n {format_instruction}',
    input_variables=['topic'],
    partial_variables={'format_instruction': parser.get_format_instructions()}
)

chain = template | gemini_flash_model | parser # note: this fails with the openai oss model

chain.invoke({'topic':'black hole'})

{'fact_1': 'A black hole is a region of spacetime where gravity is so strong that nothing, not even light, can escape from it.',
 'fact_2': 'The boundary around a black hole beyond which no escape is possible is called the event horizon.',
 'fact_3': 'Most black holes form from the remnants of large stars that collapse in on themselves at the end of their life cycle, known as stellar black holes.',
 'fact_4': 'Supermassive black holes, millions to billions of times the mass of our Sun, are found at the center of most large galaxies, including our own Milky Way (Sagittarius A*).',
 'fact_5': "Despite their immense gravity, black holes do not 'suck' things in from vast distances; objects must get very close to be pulled in, and if our Sun were replaced by a black hole of the same mass, Earth would continue to orbit it normally."}

Disadv

- CANNOT do data validation 
- even if llm sends wrong format, (eg: str instead of int) we cannot validate that

### PydanticOutputParser

In [44]:
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

# define schema
class Person(BaseModel):
    name: str = Field(description="The name of the person")
    age: int = Field(description="The age of the person", gt=18, lt=150),
    city: str = Field(description="The city of the person")

# create parser
parser = PydanticOutputParser(pydantic_object=Person)


In [45]:
parser.get_format_instructions()



'The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"name": {"description": "The name of the person", "title": "Name", "type": "string"}, "age": {"title": "Age", "type": "integer"}, "city": {"description": "The city of the person", "title": "City", "type": "string"}}, "required": ["name", "city"]}\n```'

In [None]:
template = PromptTemplate(
    template='Give me the name, age and city of the person with details: {details} \n {format_instruction}',
    input_variables=['details'],
    partial_variables={'format_instruction': parser.get_format_instructions()}
)

print (template.invoke({'details':'John is 25 years old and lives in New York'}))

text='Give me the name, age and city of the person with details: John is 25 years old and lives in New York \n The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}\nthe object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{"properties": {"name": {"description": "The name of the person", "title": "Name", "type": "string"}, "age": {"title": "Age", "type": "integer"}, "city": {"description": "The city of the person", "title": "City", "type": "string"}}, "required": ["name", "city"]}\n```'




In [50]:
chain = template | model | parser

chain.invoke({'details':'John is 25 years old and lives in New York'})

Person(name='John', age=25, city='New York')

## Chains in LangChain

### Sequential Chain

In [54]:
prompt = PromptTemplate(
    template='Generate 5 interesting facts about {topic}',
    input_variables=['topic']
)

chain = prompt | model | StrOutputParser()

print(chain.invoke({'topic':'black hole'}))

**Five Fascinating Facts About Black Holes**

1. **They’re Not “Vacuum Cleaners”**  
   Despite the popular image of a black hole sucking in everything nearby, space around a black hole is essentially empty. Objects only fall in if they cross the event horizon or are on a trajectory that brings them close enough to be captured by the black hole’s gravity.

2. **Time Slows Down Near the Event Horizon**  
   According to Einstein’s theory of relativity, time runs slower the closer you are to a massive object. Near a black hole’s event horizon, time can dilate to the point where, for an outside observer, an infalling object appears to freeze and fade as it approaches the horizon.

3. **They Emit Hawking Radiation**  
   In 1974, Stephen Hawking predicted that black holes can emit tiny amounts of thermal radiation due to quantum effects near the event horizon. This “Hawking radiation” means black holes can slowly lose mass and eventually evaporate over astronomically long timescales.

4. *

In [56]:
chain.get_graph().print_ascii()

     +-------------+       
     | PromptInput |       
     +-------------+       
            *              
            *              
            *              
    +----------------+     
    | PromptTemplate |     
    +----------------+     
            *              
            *              
            *              
      +----------+         
      | ChatGroq |         
      +----------+         
            *              
            *              
            *              
   +-----------------+     
   | StrOutputParser |     
   +-----------------+     
            *              
            *              
            *              
+-----------------------+  
| StrOutputParserOutput |  
+-----------------------+  


In [57]:
prompt_detailed_report = PromptTemplate(
    template='Write a detailed report on {topic}',
    input_variables=['topic']
)

prompt_key_facts = PromptTemplate(
    template='Give me a 3 point summary of the following text: {text}',
    input_variables=['text']
)

chain = prompt_detailed_report | model | StrOutputParser() | prompt_key_facts | model | StrOutputParser()


print(chain.invoke({'topic':'cricket'})) 

**Three‑point summary**

1. **Cricket’s structure and reach** – A bat‑and‑ball sport governed by the ICC, played worldwide by over 2.5 billion fans. It exists in five main formats (Test, ODI, T20I, The Hundred, domestic first‑class), each with distinct rules, durations, and audiences.

2. **Economic and cultural significance** – Cricket generates a multi‑billion‑dollar global market (e.g., IPL >US$1 billion in 2023). It is a national pastime in South Asia, the Caribbean, Australia, England, and New Zealand, shaping identity, media, and even diplomatic relations.

3. **Future trajectory** – Technological innovations (DRS, AI analytics, smart gear), new formats (100‑over matches, hybrid games), and sustainability initiatives are reshaping the sport, while expansion into emerging markets (US, China, Africa) and governance reforms aim to balance commercial growth with tradition.


In [58]:
chain.get_graph().print_ascii()

     +-------------+       
     | PromptInput |       
     +-------------+       
            *              
            *              
            *              
    +----------------+     
    | PromptTemplate |     
    +----------------+     
            *              
            *              
            *              
      +----------+         
      | ChatGroq |         
      +----------+         
            *              
            *              
            *              
   +-----------------+     
   | StrOutputParser |     
   +-----------------+     
            *              
            *              
            *              
    +----------------+     
    | PromptTemplate |     
    +----------------+     
            *              
            *              
            *              
      +----------+         
      | ChatGroq |         
      +----------+         
            *              
            *              
            *       

### Parallel Chain

In [59]:
prompt_1 = PromptTemplate(
    template="Generate short and simple notes from the following text: {text}",
    input_variables=['text']
)

prompt_2 = PromptTemplate(
    template="Generate 5 short simple QnAs like a quiz from the following text: {text}",
    input_variables=['text']
)


prompt_3 = PromptTemplate(
    template="Merge the provided notes and quiz into a single document \n {notes} \n {quiz}",
    input_variables=['notes', 'quiz']
)

parser = StrOutputParser()

In [60]:
from langchain.schema.runnable import RunnableParallel

In [63]:
text = model.invoke("Generate a detailed report on Attention is all you need paper").content

In [64]:
print (text)

# Attention Is All You Need  
**Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017).** *Attention Is All You Need*. In *Advances in Neural Information Processing Systems* (NeurIPS 2017).  

---

## 1. Executive Summary  

The 2017 NeurIPS paper “Attention Is All You Need” introduced the **Transformer** architecture, a novel neural network model that dispenses with recurrence and convolution entirely, relying solely on a **self‑attention** mechanism. The Transformer achieved state‑of‑the‑art results on several machine‑translation benchmarks (WMT 2014 English‑German and English‑French) while dramatically reducing training time and enabling far larger parallelism. Its design has since become the backbone of virtually all modern large‑scale language models (BERT, GPT, T5, etc.).

Key innovations:

| Innovation | What it solves | Impact |
|------------|----------------|--------|
| **Scaled Dot‑Product Attention** | Efficient, dif

In [65]:
# define the parallel chain

parallel_chain = RunnableParallel({
    "notes": prompt_1 | model | StrOutputParser(),
    "quiz": prompt_2 | model | StrOutputParser(),
})

# define the sequential chain for merging

merge_chain = prompt_3 | model | parser

# combine the parallel and sequential chains

chain = parallel_chain | merge_chain

print(chain.invoke({"text":text}))


# “Attention Is All You Need” – Short & Simple Notes + Quiz  
*(Vaswani et al., 2017 – NeurIPS)*  

---

## 1. What It Is  
The **Transformer** – a neural‑network architecture that relies **only on self‑attention** (no RNNs or CNNs). It became the foundation for BERT, GPT, T5, and many other state‑of‑the‑art language models.

---

## 2. Why It Matters  

| Benefit | Why It Matters |
|---------|----------------|
| **Faster training** | Fully parallelizable – no sequential bottleneck. |
| **Higher accuracy** | Outperformed RNN/CNN baselines on WMT 2014 (BLEU 28.4 EN‑DE, 41.0 EN‑FR). |
| **Scalable** | Works on GPUs/TPUs; training time grows linearly with GPU count. |
| **Modular** | Easy to stack, adapt, or extend (e.g., BERT, GPT). |

---

## 3. Core Ideas  

| Idea | Purpose | Effect |
|------|---------|--------|
| **Scaled Dot‑Product Attention** | Efficiently weighs token relationships | Core building block |
| **Multi‑Head Attention** | Learns multiple “views” of the data | Richer c

In [66]:
chain.get_graph().print_ascii()

          +---------------------------+            
          | Parallel<notes,quiz>Input |            
          +---------------------------+            
                ***             ***                
              **                   **              
            **                       **            
+----------------+              +----------------+ 
| PromptTemplate |              | PromptTemplate | 
+----------------+              +----------------+ 
          *                             *          
          *                             *          
          *                             *          
    +----------+                  +----------+     
    | ChatGroq |                  | ChatGroq |     
    +----------+                  +----------+     
          *                             *          
          *                             *          
          *                             *          
+-----------------+            +-----------------+ 
| StrOutputP