## **LangChain Introduction**

LangChain is a framework for developing applications powered by large language models (LLMs).

Key Components:
* Chat Models: A unified way to use LLMs from different providers
* Prompt Templates: Reusable prompts with placeholders, which allow conversation history and other values to be injected at run time
* Vector Stores: Integrations that store text embeddings and retrieve relevant data efficiently for RAG applications
* Document Loaders: Load documents (e.g. PDF, DOCX, HTML, Markdown, PowerPoint) from different sources to give your chatbot knowledge.

## **Dependencies**

To build LLM-powered applications with the LangChain framework, we need to install these dependencies:
* `langchain`: Contains the base infrastructure and some commonly used chains for RAG use cases.
* `langchain_core`: Contains most of the abstractions that are key to building LLM applications (e.g. Prompt Templates, Messages, Tools).
* `langchain_openai`: Contains the OpenAI integration package, which gives us access to OpenAI models.
* `langchain_community`: Contains community integrations with different providers for loading documents (document loaders) and indexing documents (vector databases).

In [27]:
%pip install -q langchain langchain_openai langchain_community

Note: you may need to restart the kernel to use updated packages.


In [28]:
import os
import getpass
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key!")

## **LangChain Basics - Prompting with Chat Models**

### **Messages Abstraction**

When we use the OpenAI SDK, we pass in our messages to the model in this manner:
```python
response = client.chat.completions.create(
    # ... other arguments (e.g. the model name)
    messages=[
        {'role': 'system', 'content': '...'},
        {'role': 'user', 'content': '...'},
        {'role': 'assistant', 'content': '...'},
    ],
    # ... any further arguments
)
```

However, other providers may use token-based prompting (e.g. Llama):
```
messages = <|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful AI assistant for travel tips and recommendations<|eot_id|><|start_header_id|>user<|end_header_id|>

What can you help me with?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
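To make the difference between the two styles concrete, here is a minimal sketch that renders OpenAI-style role/content dictionaries into the header-token layout shown above. The helper name `render_llama3_prompt` is hypothetical, and the layout is assumed to follow exactly the snippet above; real providers may differ in details.

```python
from typing import Dict, List


def render_llama3_prompt(messages: List[Dict[str, str]]) -> str:
    """Render role/content dicts into the header-token layout shown above."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    # Leave an open assistant header so the model continues from there.
    parts.append("<|start_header_id|>assistant<|end_header_id|>")
    return "".join(parts)


chat = [
    {"role": "system", "content": "You are a helpful AI assistant for travel tips and recommendations"},
    {"role": "user", "content": "What can you help me with?"},
]
prompt = render_llama3_prompt(chat)
print(prompt)
```

Writing and maintaining a converter like this for every provider is the kind of work a shared message abstraction is meant to remove.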

LangChain unifies the way we communicate with these models through the [`messages`](https://python.langchain.com/docs/concepts/messages/) abstraction:
* `SystemMessage`: Corresponds to the `system`-level prompt across the different providers
* `HumanMessage`: Corresponds to the `user` queries to the model
* `AIMessage`: Corresponds to the `assistant` responses from the model

In [3]:
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage
messages = [
    SystemMessage("You are an AI Chatbot who will reply to all responses in Singlish"),
    HumanMessage("Complain about inflation in Singapore"),
]
print(messages)

[SystemMessage(content='You are an AI Chatbot who will reply to all responses in Singlish', additional_kwargs={}, response_metadata={}), HumanMessage(content='Complain about inflation in Singapore', additional_kwargs={}, response_metadata={})]


### **Chat Models**

With the messages defined above, we are ready to prompt an LLM.

We will import the `ChatOpenAI` class from `langchain_openai` to use the enhanced functionality that LangChain offers in its chat models.

In [4]:
from langchain_openai import ChatOpenAI
client = ChatOpenAI(model = 'gpt-4o-mini', temperature = 0.0)

In order to get the LLM response, we just need to use the `.invoke()` method which exists for most of the components in LangChain.

**`.invoke()` is a standardised method across most of the LangChain components which takes in an input in a specified format and returns a specified output**

Read what `.invoke()` does for the different chat models [here](https://python.langchain.com/api_reference/core/language_models/langchain_core.language_models.chat_models.BaseChatModel.html).

In [5]:
response = client.invoke(messages)
print(response)
print(response.content)

content='Wah, inflation in Singapore really quite jialat leh! Everything also so expensive now, like food, transport, even kopi also kena increase price. Last time can buy chicken rice for $3.50, now must pay $5 or more, siao ah! \n\nAnd don’t talk about housing, rental prices also sky high, many people struggling to find affordable place to stay. Sometimes feel like salary no increase but everything else keep going up, really stress one. Hope government can do something about it, otherwise how to survive like that?' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 111, 'prompt_tokens': 33, 'total_tokens': 144, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_0705bf87c0', 'finish_reason': 'stop', 'logprobs': None} id='run-cd

## **LangChain Basics - Prompt Templates**

*Prompt: Text input that we pass into a chat model*

*Prompt Template: Text input that we pass into a chat model and that can be modified at run time*

With Prompt Templates, we can:
* Reuse the same backbone template for similar tasks
* Inject arbitrary information into the prompt so that the LLM produces more contextualised answers

### **Translation Task with Prompt Templates**

To create a template with different roles for the chat model, we instantiate a `ChatPromptTemplate` and pass in a list of tuples in this syntax:
`('role', 'instructions')`

* `role` defines the type of message
* `instructions` is the content of the message for that role

Any dynamic part of the prompt that needs to be filled in at run time should be wrapped in `{...}`
(e.g. `('human', 'Translate {word} to English')`).

In [None]:
from langchain_core.prompts import ChatPromptTemplate
template = ChatPromptTemplate(
    [
        ('system', 'You are an expert in translation from Chinese to English'),
        ('human', 'Translate {word} to English')
    ]
)

To use the template, we call the `.invoke()` method, providing a dictionary where the keys match the dynamic placeholders in the template and the values specify the content to fill in.

In [10]:
print(template.invoke(
    {
        'word': '人工智能',
    }
))

messages=[SystemMessage(content='You are an expert in translation from Chinese to English', additional_kwargs={}, response_metadata={}), HumanMessage(content='Translate 人工智能 to English', additional_kwargs={}, response_metadata={})]


Notice that we get back a prompt value holding a list of messages, which we can pass straight to a chat model to get a response.

We can now translate different Chinese words without repeatedly typing `"Translate ... to English"`!

In [13]:
prompt1 = template.invoke({
    'word': '人工智能'
})
prompt2 = template.invoke({
    'word': '神经网络'
})
print(client.invoke(prompt1).content)
print('===' * 10)
print(client.invoke(prompt2).content)

The translation of "人工智能" to English is "artificial intelligence."
==============================
The translation of "神经网络" to English is "neural network."


## **LangChain Bridging - Pydantic**

Pydantic is the most widely used data validation library in Python.

With Pydantic's `BaseModel`, we can define a schema (a set of rules), which is then passed to the chat model as instructions for formatting the desired output.

Pydantic validates and coerces data, so it also catches unexpected mistakes (e.g. the LLM returns "11" instead of 11).



In this example, we will try to model some attributes of a person (e.g. name, height).

We first define a class called `Person` that inherits the properties and methods of `BaseModel`.

After that, we simply define each key name and its corresponding datatype in this format:
`key: datatype`

In [2]:
from pydantic import BaseModel, Field

class Person(BaseModel):
    """Information about a person"""
    name: str
    height: float

Let's test our created model by passing in some data:

In [4]:
person1 = Person(name = "Alice", height = 1.8)
print(person1.name)
print(person1.height)

Alice
1.8


Passing in data with the wrong type will result in a **validation error**:

In [None]:
person2 = Person(name = 000, height = "abc")
print(person2.name)
print(person2.height)
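To see which fields failed rather than letting the exception stop the cell, the error can be caught and inspected. This is a minimal sketch: the `Person` class is repeated so the block is self-contained, and the name `"Bob"` and the variable `failed_fields` are made up so that only `height` is invalid.

```python
from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    """Information about a person"""
    name: str
    height: float


try:
    Person(name="Bob", height="abc")
    failed_fields = []
except ValidationError as exc:
    # Each error entry records the location (field name) that failed.
    failed_fields = [error["loc"][0] for error in exc.errors()]
    print(exc)

print(failed_fields)
```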

By default, Pydantic will also try to coerce some types where possible (e.g. `str` to `float`).

In this example, we do not get a validation error when passing `height = "1.72"`:

In [6]:
person3 = Person(name = "Alice", height = "1.72")

In [8]:
print(person3.name)
print(person3.height)

Alice
1.72
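The same coercion can be applied to raw JSON text, which is closer to what an LLM actually returns. The sketch below assumes Pydantic v2 (for `model_validate_json`); the JSON string `raw_reply` is a made-up stand-in for a model reply.

```python
from pydantic import BaseModel


class Person(BaseModel):
    """Information about a person"""
    name: str
    height: float


# Hypothetical reply in which the height arrives as a string.
raw_reply = '{"name": "Alice", "height": "1.72"}'

person_from_json = Person.model_validate_json(raw_reply)
print(person_from_json.height, type(person_from_json.height).__name__)
```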


To ensure that the `BaseModel` we create works well with chat models, we need to provide context for the different fields (e.g. what should be stored under each key).

We will use `Field` to achieve this. When building LLM applications, these descriptions act as instructions that tell the LLM what information to populate into each field.

In [22]:
from pydantic import BaseModel, Field

class Person(BaseModel):
    """Information about a person"""
    name: str = Field(description="Name of the Person")
    height: float = Field(description= "Height of the person in meters")
print(Person.model_fields)

{'name': FieldInfo(annotation=str, required=True, description='Name of the Person'), 'height': FieldInfo(annotation=float, required=True, description='Height of the person in meters')}
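One way to see what these descriptions look like as a schema is to export the model as JSON Schema. This is a sketch assuming Pydantic v2 (`model_json_schema`). How LangChain forwards the schema to a given provider is not shown here.

```python
from pydantic import BaseModel, Field


class Person(BaseModel):
    """Information about a person"""
    name: str = Field(description="Name of the Person")
    height: float = Field(description="Height of the person in meters")


schema = Person.model_json_schema()

# The class docstring and each field description end up in the schema.
print(schema["description"])
for field_name, field_schema in schema["properties"].items():
    print(field_name, "->", field_schema["description"])
```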


## **LangChain Basics - Structured Output**

**Prerequisite: Static Typing**

**Prerequisite: Pydantic**  

*If you're unfamiliar with Pydantic, be sure to watch the bridging lecture for an overview.*

### **Information Extraction Task**

Given this paragraph, extract the marks for the different students.

```
Adam scored 30 / 50 marks for math and 20 / 50 marks for science.
Betty scored 1.5 times higher than Adam for math and just passed the science examination
```

Let's define a Pydantic model which contains:
* A field for each subject
* A suitable datatype for each field
* A description of each field

In [32]:
from pydantic import BaseModel, Field
from typing import List
class StudentMarks(BaseModel):
    name: str = Field(..., description = "Name of the student")
    math: float = Field(..., description="Student's Mark for Math")
    science: float = Field(..., description="Student's Mark for Science")
class ClassMarks(BaseModel):
    marks: List[StudentMarks] = Field(..., description="Combined list of student marks")

To get structured output, we just need to bind this `BaseModel` to our chat model with the `.with_structured_output()` method.

Instantiate a normal chat model:

In [33]:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model = 'gpt-4o-mini', temperature = 0.0)

Instantiate a structured output chat model with the `.with_structured_output()` method:

In [36]:
llm_structured = llm.with_structured_output(ClassMarks)

Let's invoke the chat model with structured output with the paragraph above:

In [37]:
print(llm_structured.invoke(["""
Adam scored 30 / 50 marks for math and 20 / 50 marks for science.
Betty scored 1.5 times higher than Adam for math and just passed the science examination
"""]))

marks=[StudentMarks(name='Adam', math=30.0, science=20.0), StudentMarks(name='Betty', math=45.0, science=25.0)]
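Because the result is a regular Pydantic object, it can be checked and consumed like any other Python data. The sketch below rebuilds the printed result by hand so that no API call is needed, and assumes Pydantic v2 for `model_dump`. Reading "just passed" as exactly half of the 50 available marks is an interpretation, not something the paragraph states.

```python
from typing import List

from pydantic import BaseModel, Field


class StudentMarks(BaseModel):
    name: str = Field(..., description="Name of the student")
    math: float = Field(..., description="Student's Mark for Math")
    science: float = Field(..., description="Student's Mark for Science")


class ClassMarks(BaseModel):
    marks: List[StudentMarks] = Field(..., description="Combined list of student marks")


# Rebuilt by hand from the printed result above; the dicts are validated on creation.
result = ClassMarks(
    marks=[
        {"name": "Adam", "math": 30.0, "science": 20.0},
        {"name": "Betty", "math": 45.0, "science": 25.0},
    ]
)
by_name = {student.name: student for student in result.marks}

# Betty's math mark should be 1.5 times Adam's: 1.5 * 30 = 45.
math_ok = by_name["Betty"].math == 1.5 * by_name["Adam"].math
# "Just passed" is read here as half of the 50 available marks.
science_ok = by_name["Betty"].science == 50 / 2
print(math_ok, science_ok)

# Convert to plain dictionaries, e.g. for saving or further processing.
print(result.model_dump())
```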
