## LCEL Tinkering

> https://python.langchain.com/docs/expression_language/get_started

---

In [1]:
from dotenv import load_dotenv
import os

In [2]:
load_dotenv("/Users/shaunaksen/Documents/personal-projects/Natural-Language-Processing/LLM Concepts/llamaindex_tutorials/knowledge_graphs/.env")

True

In [3]:
from langchain_openai import AzureOpenAI, AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableParallel

from langchain_community.vectorstores import DocArrayInMemorySearch

In [4]:
chat_gpt4 = AzureChatOpenAI(
        deployment_name="gpt-4-32k",
        model="gpt-4-32k",
        openai_api_type="azure",
        azure_endpoint=os.environ['AZURE_API_BASE'],
        openai_api_key=os.environ['AZURE_API_KEY'],
        openai_api_version=os.environ['AZURE_API_VERSION'],
        max_retries=2,
        temperature=0,
    )
embedding = AzureOpenAIEmbeddings(
    model="text-embedding-ada-002",
    openai_api_type='azure',
    azure_endpoint=os.environ['AZURE_API_BASE'],
    openai_api_key=os.environ['AZURE_API_KEY'],
    openai_api_version=os.environ['AZURE_API_VERSION']
)

In [5]:
gpt_35_turbo = AzureChatOpenAI(
    deployment_name="gpt-35-turbo",
    model="gpt-35-turbo",
    openai_api_type="azure",
    azure_endpoint=os.environ['AZURE_API_BASE'],
    openai_api_key=os.environ['AZURE_API_KEY'],
    openai_api_version=os.environ['AZURE_API_VERSION'],
    max_retries=2,
    temperature=1,
)

In [6]:
gpt_35_turbo_instruct = AzureOpenAI(
    deployment_name="gpt-35-turbo-instruct",
    model="gpt-35-turbo-instruct",
    openai_api_type="azure",
    azure_endpoint=os.environ['AZURE_API_BASE'],
    openai_api_key=os.environ['AZURE_API_KEY'],
    openai_api_version=os.environ['AZURE_API_VERSION'],
    max_retries=2,
    temperature=1,
)

## Basic example: prompt + model + output parser

In [7]:
prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")
model = gpt_35_turbo
output_parser = StrOutputParser()

chain = prompt | model | output_parser

user_input = {"topic": "cricket"}

chain.invoke(user_input)

'Why did the cricket player bring string to the game? \n\nSo he could tie the score!'

The `|` symbol is similar to a Unix pipe operator: it chains the components together, feeding the output of one component as input into the next.

In this chain the user input is passed to the prompt template, then the prompt template output is passed to the model, then the model output is passed to the output parser. Let’s take a look at each component individually to really understand what’s going on.
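To make the piping idea concrete, here is a minimal pure-Python sketch of how a `|` operator can compose steps. The `Step` class and the three `toy_*` components are hypothetical stand-ins invented for illustration; this is not how LangChain implements its runnables, only a picture of the data flow.

```python
class Step:
    """Toy stand-in for a runnable component (not LangChain's implementation)."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a new step that runs `a`, then feeds its output to `b`.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stand-ins for the prompt, model, and output parser.
toy_prompt = Step(lambda inputs: f"tell me a joke about {inputs['topic']}")
toy_model = Step(lambda text: {"content": f"(model reply to: {text})"})
toy_parser = Step(lambda message: message["content"])

toy_chain = toy_prompt | toy_model | toy_parser
result = toy_chain.invoke({"topic": "cricket"})
print(result)
```

Each `|` returns another `Step`, which is why a chain of any length can still be called with a single `invoke`.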

### 1. prompt

prompt is a `BasePromptTemplate`, which means it takes in a dictionary of template variables and produces a PromptValue. A PromptValue is a wrapper around a completed prompt that can be passed to either an LLM (which takes a string as input) or ChatModel (which takes a sequence of messages as input). It can work with either language model type because it defines logic both for producing BaseMessages and for producing a string.

In [8]:
prompt_output = prompt.invoke(user_input)
prompt_output

ChatPromptValue(messages=[HumanMessage(content='tell me a joke about cricket')])

### 2. Model

The PromptValue is then passed to model. In this case our model is a ChatModel, meaning it will output a BaseMessage.

In [9]:
model_output = model.invoke(prompt_output)

model_output

AIMessage(content="Why don't cricketers like rain? Because it washes out all their runs!", response_metadata={'finish_reason': 'stop', 'logprobs': None, 'content_filter_results': {}})

In [10]:
type(model_output)

langchain_core.messages.ai.AIMessage

If our model were an LLM instead, it would output a string.


In [11]:
llm = gpt_35_turbo_instruct

llm.invoke(prompt_output)

'\n\nAI: Why did the cricket go to the doctor? Because he was feeling crick-ly.'

### 3. Output parser



Lastly, we pass the model output to the output_parser, which is a BaseOutputParser, meaning it takes either a string or a BaseMessage as input.
The StrOutputParser specifically converts any input into a string.
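As a rough mental model of that behaviour, the sketch below handles both input kinds in plain Python. `ToyMessage` and `to_text` are hypothetical names made up for this illustration and are not part of LangChain.

```python
from dataclasses import dataclass


@dataclass
class ToyMessage:
    """Hypothetical stand-in for a chat message with a `content` field."""
    content: str


def to_text(model_output):
    # Strings pass through unchanged; message-like objects yield their content.
    if isinstance(model_output, str):
        return model_output
    return model_output.content


from_llm = to_text("plain completion text")
from_chat = to_text(ToyMessage(content="chat reply text"))
```

Either way the caller ends up with a plain string, which is why the same parser can sit at the end of a chain regardless of the model type.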

In [12]:
output_parser.invoke(model_output)

"Why don't cricketers like rain? Because it washes out all their runs!"

### 4. Entire Pipeline

To follow the steps along:

- We pass in user input on the desired topic as {"topic": "cricket"}.
- The prompt component takes the user input and uses the topic to fill in the template, producing a PromptValue.
- The model component takes the generated prompt and passes it to the Azure OpenAI chat model. The output from the model is an AIMessage object.
- Finally, the output_parser component takes in the AIMessage and transforms it into a Python string, which is returned from the invoke method.

## RAG Search Example

In [13]:
texts = [
    "mini is 28 years old",
    "mini likes to eat paneer",
    "mini lives in kolkata, jhansi and bangalore"
]

In [14]:
vectorstore = DocArrayInMemorySearch.from_texts(
    texts, embedding=embedding
)

In [15]:
retriever = vectorstore.as_retriever()

In [16]:
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

In [19]:
prompt = ChatPromptTemplate.from_template(template)
model = gpt_35_turbo
output_parser = StrOutputParser()

In [18]:
prompt.invoke({
    "context": "context",
    "question": "question"
})

ChatPromptValue(messages=[HumanMessage(content='Answer the question based only on the following context:\ncontext\n\nQuestion: question\n')])

In [20]:
setup_and_retrieval = RunnableParallel(
    {
        "context": retriever,
        "question": RunnablePassthrough()
    }
)

In [21]:
chain = setup_and_retrieval | prompt | model | output_parser

In [22]:
chain.invoke("where does mini live?")

'Mini lives in Kolkata, Jhansi and Bangalore.'

In this case, the composed chain is:

`chain = setup_and_retrieval | prompt | model | output_parser`

To explain this, first note that the prompt template above takes in context and question as values to be substituted into the prompt. Before building the prompt, we want to retrieve documents relevant to the query and include them as part of the context.

As a preliminary step, we’ve set up the retriever using an in-memory store, which can retrieve documents based on a query. The retriever is itself a runnable component that can be chained together with other components, but you can also run it separately:

In [23]:
retriever.invoke("where does mini live?")

[Document(page_content='mini is 28 years old'),
 Document(page_content='mini lives in kolkata, jhansi and bangalore'),
 Document(page_content='mini likes to eat paneer')]

We then use `RunnableParallel` to prepare the inputs the prompt expects: the retriever performs the document search to fill `context`, and `RunnablePassthrough` passes the user’s question through unchanged as `question`:

```
setup_and_retrieval = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
)
```

To review, the complete chain is:

```
setup_and_retrieval = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
)
chain = setup_and_retrieval | prompt | model | output_parser
```


With the flow being:

1. The first step creates a `RunnableParallel` object with two entries. The first entry, `context`, will include the document results fetched by the `retriever`. The second entry, `question`, will contain the user’s original question. To pass on the question, we use `RunnablePassthrough` to copy this entry.

2. The dictionary from the step above is fed to the `prompt` component. It takes the user input (`question`) and the retrieved documents (`context`), constructs a prompt, and outputs a `PromptValue`.

3. The `model` component takes the generated prompt and passes it to the Azure OpenAI chat model. The output from the model is an `AIMessage` object.

4. Finally, the `output_parser` component takes in the `AIMessage` and transforms it into a Python string, which is returned from the invoke method.


> As the prompt in the 2nd step needs 2 inputs, we use a `RunnableParallel` in `setup_and_retrieval` so that both values can be computed in parallel and fed together to the next component (prompt).
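The fan-out performed in step 1 can be pictured with a small standard-library sketch. Everything here (`run_parallel`, `toy_retriever`, `passthrough`, and the keyword matching) is a hypothetical stand-in chosen for illustration; the real retriever uses embeddings and LangChain handles the scheduling itself.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(steps, value):
    """Call every step with the same input and collect results under each key."""
    with ThreadPoolExecutor() as pool:
        futures = {key: pool.submit(step, value) for key, step in steps.items()}
        return {key: future.result() for key, future in futures.items()}


# Hypothetical stand-ins: a keyword "retriever" and an identity passthrough.
documents = ["mini is 28 years old", "mini lives in kolkata, jhansi and bangalore"]


def toy_retriever(question):
    # Return every document that shares at least one word with the question.
    words = set(question.lower().rstrip("?").split())
    return [doc for doc in documents if words & set(doc.split())]


def passthrough(value):
    return value


prompt_inputs = run_parallel(
    {"context": toy_retriever, "question": passthrough},
    "where does mini live?",
)
```

The resulting dictionary has exactly the two keys the prompt template expects, with the question untouched and the context filled in by the retriever.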

__What is `Runnable`__?


A unit of work that can be invoked, batched, streamed, transformed and composed.

The Runnable protocol is implemented for most components. This is a standard interface, which makes it easy to define custom chains as well as invoke them in a standard way. The standard interface includes:

- invoke/ainvoke: Transforms a single input into an output.

- batch/abatch: Efficiently transforms multiple inputs into outputs.

- stream/astream: Streams output from a single input as it’s produced.

- astream_log: Streams output and selected intermediate results from an input.
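The difference between the three synchronous entry points can be sketched with a toy class. `ToyRunnable` is a hypothetical illustration written with the standard library only; it mirrors the shape of the interface described above, not LangChain's actual code, and the word-by-word streaming is an assumption made to keep the example small.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyRunnable:
    """Toy illustration of the invoke/batch/stream trio (not LangChain code)."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        # One input in, one output out.
        return self.func(value)

    def batch(self, values, max_concurrency=None):
        # Many inputs handled by a thread pool; results keep the input order.
        with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
            return list(pool.map(self.invoke, values))

    def stream(self, value):
        # Yield the output piece by piece (here: word by word).
        for word in self.invoke(value).split():
            yield word


shout = ToyRunnable(lambda text: text.upper())
single = shout.invoke("tell me a joke")
many = shout.batch(["bears", "cats"], max_concurrency=2)
chunks = list(shout.stream("tell me a joke"))
```

Because every component offers the same methods, code that calls `invoke`, `batch`, or `stream` does not need to know whether it is holding a prompt, a model, or a whole chain.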

The input type and output type vary by component:

![](https://gcdnb.pbrd.co/images/LRnTQtlaNk9W.png)


All runnables expose input and output schemas to inspect the inputs and outputs:

- input_schema: an input Pydantic model auto-generated from the structure of the Runnable.

- output_schema: an output Pydantic model auto-generated from the structure of the Runnable.

Let’s take a look at these methods. To do so, we’ll create a super simple PromptTemplate + ChatModel chain.

In [24]:
prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")
model = gpt_35_turbo
output_parser = StrOutputParser()

chain = prompt | model | output_parser

user_input = {"topic": "cricket"}

chain.invoke(user_input)

'Why did the cricket player go to jail?\n\nBecause he bowled a maiden over!'

### Input Schema

In [25]:
# The input schema of the chain is the input schema of its first part, the prompt.
chain.input_schema.schema()

{'title': 'PromptInput',
 'type': 'object',
 'properties': {'topic': {'title': 'Topic', 'type': 'string'}}}

In [26]:
prompt.input_schema.schema()

{'title': 'PromptInput',
 'type': 'object',
 'properties': {'topic': {'title': 'Topic', 'type': 'string'}}}

In [28]:
model.input_schema.schema()

{'title': 'AzureChatOpenAIInput',
 'anyOf': [{'type': 'string'},
  {'$ref': '#/definitions/StringPromptValue'},
  {'$ref': '#/definitions/ChatPromptValueConcrete'},
  {'type': 'array',
   'items': {'anyOf': [{'$ref': '#/definitions/AIMessage'},
     {'$ref': '#/definitions/HumanMessage'},
     {'$ref': '#/definitions/ChatMessage'},
     {'$ref': '#/definitions/SystemMessage'},
     {'$ref': '#/definitions/FunctionMessage'},
     {'$ref': '#/definitions/ToolMessage'}]}}],
 'definitions': {'StringPromptValue': {'title': 'StringPromptValue',
   'description': 'String prompt value.',
   'type': 'object',
   'properties': {'text': {'title': 'Text', 'type': 'string'},
    'type': {'title': 'Type',
     'default': 'StringPromptValue',
     'enum': ['StringPromptValue'],
     'type': 'string'}},
   'required': ['text']},
  'AIMessage': {'title': 'AIMessage',
   'description': 'Message from an AI.',
   'type': 'object',
   'properties': {'content': {'title': 'Content',
     'anyOf': [{'type': '

### Output Schema

A description of the outputs produced by a Runnable. This is a Pydantic model dynamically generated from the structure of any Runnable. You can call .schema() on it to obtain a JSONSchema representation.

In [29]:
chain.output_schema.schema()

{'title': 'StrOutputParserOutput', 'type': 'string'}

In [30]:
chain.batch([{"topic": "bears"}, {"topic": "cats"}])


['Why did the teddy bear say no to dessert? \nBecause it was already stuffed!',
 "Why don't cats play poker in the jungle? Too many cheetahs!"]

In [31]:
# You can set the number of concurrent requests by using the max_concurrency parameter

chain.batch([{"topic": "bears"}, {"topic": "cats"}], config={"max_concurrency": 5})

["Why do bears have fur coats? \n\nBecause they'd look silly in leather!",
 'Why did the cat wear a fancy dress? Because she was feline fabulous!']

### Parallelism

Let’s take a look at how LangChain Expression Language supports parallel requests. For example, when using a RunnableParallel (often written as a dictionary) it executes each element in parallel.



In [32]:
chain1 = ChatPromptTemplate.from_template("tell me a joke about {topic}") | model
chain2 = (
    ChatPromptTemplate.from_template("write a short (2 line) poem about {topic}")
    | model
)
combined = RunnableParallel(joke=chain1, poem=chain2)

In [33]:
type(combined)

langchain_core.runnables.base.RunnableParallel

In [34]:
%%time
chain1.invoke({"topic": "bears"})

CPU times: user 9.7 ms, sys: 2.61 ms, total: 12.3 ms
Wall time: 1.36 s


AIMessage(content='Why did the bear break up with his girlfriend? \n\nBecause he found someone "furrier"!', response_metadata={'finish_reason': 'stop', 'logprobs': None, 'content_filter_results': {}})

In [35]:
%%time
chain2.invoke({"topic": "bears"})

CPU times: user 7.97 ms, sys: 1.78 ms, total: 9.75 ms
Wall time: 1.21 s


AIMessage(content='Big and brown, they roam around\nFierce and strong, they rule the ground.', response_metadata={'finish_reason': 'stop', 'logprobs': None, 'content_filter_results': {}})

In [36]:
%%time
combined.invoke({"topic": "bears"})

CPU times: user 22.2 ms, sys: 5.53 ms, total: 27.7 ms
Wall time: 1.39 s


{'joke': AIMessage(content='Why did the bear wear a raincoat?\n\nBecause it was just a drizzly bear!', response_metadata={'finish_reason': 'stop', 'logprobs': None, 'content_filter_results': {}}),
 'poem': AIMessage(content='In the wild they roam,\nFurry giants, kings of home.', response_metadata={'finish_reason': 'stop', 'logprobs': None, 'content_filter_results': {}})}

In [38]:
%%time
combined.batch([{"topic": "bears"}, {"topic": "cats"}])

CPU times: user 47.2 ms, sys: 8.18 ms, total: 55.4 ms
Wall time: 1.39 s


[{'joke': AIMessage(content="Why don't bears like fast food? Because they can't catch it!", response_metadata={'finish_reason': 'stop', 'logprobs': None, 'content_filter_results': {}}),
  'poem': AIMessage(content='Bears roam free, \nMajestic creatures of the wild.', response_metadata={'finish_reason': 'stop', 'logprobs': None, 'content_filter_results': {}})},
 {'joke': AIMessage(content='Why did the cat sit on the computer?\n\nTo keep an eye on the mouse!', response_metadata={'finish_reason': 'stop', 'logprobs': None, 'content_filter_results': {}}),
  'poem': AIMessage(content="Soft, sleek, and sweet,\nA feline's purr can't be beat.", response_metadata={'finish_reason': 'stop', 'logprobs': None, 'content_filter_results': {}})}]
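The timings above show the combined call taking about as long as a single chain (roughly 1.4 s) rather than the sum of the two (1.36 s + 1.21 s). The sketch below reproduces that effect with the standard library only. `slow_joke`, `slow_poem`, and the `LATENCY` value are assumptions standing in for real model requests.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LATENCY = 0.3  # assumed per-call latency in seconds, standing in for a model request


def slow_joke(topic):
    time.sleep(LATENCY)  # simulate waiting on the network
    return f"joke about {topic}"


def slow_poem(topic):
    time.sleep(LATENCY)
    return f"poem about {topic}"


start = time.perf_counter()
with ThreadPoolExecutor() as pool:
    # Both calls are submitted before either result is awaited, so they overlap.
    joke_future = pool.submit(slow_joke, "bears")
    poem_future = pool.submit(slow_poem, "bears")
    combined_result = {"joke": joke_future.result(), "poem": poem_future.result()}
elapsed = time.perf_counter() - start

print(f"elapsed: {elapsed:.2f}s for two {LATENCY}s calls")
```

Since the work is dominated by waiting rather than computation, running the branches concurrently costs close to the slowest branch instead of the total.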