In [1]:
#!pip install docarray

# LangChain for LLM App Development

We talked about how building an LLM app involves prompt management: 
we can prepare the user's input data with some pre-prompting, or do some 
post-prompting and clean-up after the LLM produces an output, to ensure 
that our app behaves as expected.

So, this kind of workflow usually involves a lot of abstractions, where prompts 
are no longer static pieces of text but dynamic ones that have to integrate 
information.

![](./images/Notebook_4-dynamic_prompt.png)

This requirement for dynamic prompts creates the need for abstractions that handle and manage prompts effectively.

Another need in more complex LLM app development is chaining prompts together, meaning connecting the output of one prompt to the input of another. This is often the case when a task is too large for a single call to the LLM, or when the context window (the maximum number of tokens the model can read and write per request) would be exceeded.
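As a rough illustration of chaining, here is a minimal plain-Python sketch. It does not use LangChain: `fake_llm` and `run_chain` are hypothetical names, and `fake_llm` is only a stand-in for a real model call.

```python
from typing import Callable, List


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call: it just wraps the prompt."""
    return f"<answer to: {prompt}>"


def run_chain(templates: List[str], user_input: str, llm: Callable[[str], str]) -> str:
    """Fill each template with the previous output and call the LLM on it."""
    current = user_input
    for template in templates:
        prompt = template.format(input=current)  # dynamic prompt
        current = llm(prompt)                    # output feeds the next step
    return current


templates = [
    "Summarize this text: {input}",
    "Translate this summary to French: {input}",
]
result = run_chain(templates, "LangChain helps manage prompts.", fake_llm)
print(result)
```

Each template is a dynamic prompt, and the loop is the chain: the answer to the first prompt becomes the input of the second.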

![](./images/Notebook_4-prompt_chaining.png)

# LangChain

[Langchain](https://python.langchain.com/docs/get_started/introduction.html) is a framework created by Harrison Chase that facilitates the creation and management of dynamic prompts and chaining between prompts.

Its main features are:
- **Components**: abstractions for working with LMs
- **Off-the-shelf chains**: assembly of components for accomplishing certain higher-level tasks

With LangChain it becomes much easier to create Prompt Templates: prompts that take in user data, so you don't have to type out everything a task requires each time.

Let's take a look at some simple examples to get started.

In order to create an application with LangChain, we need to understand its core components:

- Models
- Prompts
- Indexes
- Chains
- Agents (not covered here)

![](2023-08-17-14-48-39.png)

**Models**

Abstractions over LLM APIs such as the ChatGPT API.

In [2]:
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI

chat_model = ChatOpenAI(temperature=0)

You can predict outputs from both LLMs and ChatModels:

In [3]:
chat_model.predict("hi!")
# Returns the model's reply as a string (exact wording may vary)

'Hello! How can I assist you today?'

You can also use the predict method over a string input:

In [4]:
text = "What would be a good name for a dog that loves to nap?"


chat_model.predict(text)
# Output: "Snuggles"

'Snuggles'

Finally, you can use the `predict_messages` method over a list of messages:

In [5]:
from langchain.schema import HumanMessage

text = "What would be a good dog name for a dog that loves to nap?"
messages = [HumanMessage(content=text)]

chat_model.predict_messages(messages)


AIMessage(content='A good dog name for a dog that loves to nap could be "Snooze" or "Snuggles".', additional_kwargs={}, example=False)

**Prompts**

Prompt Templates are useful abstractions for reusing prompts. 

They are used to provide context for the specific task that the language model needs to complete. 
A simple example is a `PromptTemplate` that formats a string into a prompt:

In [6]:
from langchain.prompts import PromptTemplate

prompt = PromptTemplate.from_template("What is a good dog name for a dog that loves to {activity}?")
prompt.format(activity="nap")
# Output: "What is a good dog name for a dog that loves to nap?"

'What is a good dog name for a dog that loves to nap?'

**Chains**

A chain ties a prompt template and a model together so that both run in a single call:

In [7]:
from langchain.chains import LLMChain

chain = LLMChain(
    llm=ChatOpenAI(),
    prompt=prompt,
)
chain.run("eat")

'Biscuit'

You can also create more complex ChatPromptTemplates that contain a list of ChatMessageTemplates:

In [8]:
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

template = "You are a helpful assistant that translates {input_language} to {output_language}."
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_template = "{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])

chat_prompt.format_messages(input_language="English", output_language="French", text="I love programming.")

[SystemMessage(content='You are a helpful assistant that translates English to French.', additional_kwargs={}),
 HumanMessage(content='I love programming.', additional_kwargs={}, example=False)]

**Output Parsers**

OutputParsers convert the raw output from an LLM into a format that can be used downstream. Here is an example of an OutputParser that converts a comma-separated string into a Python list:

In [9]:
from langchain.schema import BaseOutputParser

class CommaSeparatedListOutputParser(BaseOutputParser):
    """Parse the output of an LLM call to a comma-separated list."""

    def parse(self, text: str):
        """Parse the output of an LLM call."""
        return text.strip().split(", ")

CommaSeparatedListOutputParser().parse("hi, bye")
# Output: ['hi', 'bye']

['hi', 'bye']
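The parser above splits on the exact string `", "`, so it assumes the model always puts exactly one space after each comma. A slightly more tolerant variant, sketched here as a plain function (`parse_comma_separated` is a hypothetical name, not part of LangChain), splits on commas and trims the whitespace around each item:

```python
from typing import List


def parse_comma_separated(text: str) -> List[str]:
    """Split on commas, trim whitespace, and drop empty items."""
    return [part.strip() for part in text.split(",") if part.strip()]


items = parse_comma_separated("hi,bye ,  ciao\n")
print(items)  # ['hi', 'bye', 'ciao']
```

The same logic could be placed inside the `parse` method of the class above if the model's formatting turns out to be inconsistent.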

**LLMChain**

Finally, you can combine all these components into an LLMChain:

In [10]:

from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.chains import LLMChain
from langchain.schema import BaseOutputParser

class CommaSeparatedListOutputParser(BaseOutputParser):
    """Parse the output of an LLM call to a comma-separated list."""

    def parse(self, text: str):
        """Parse the output of an LLM call."""
        return text.strip().split(", ")

template = """You are a helpful assistant who generates comma separated lists.
A user will pass in a category, and you should generate 5 objects in that category in a comma separated list.
ONLY return a comma separated list, and nothing more."""
system_message_prompt = SystemMessagePromptTemplate.from_template(template)
human_template = "{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])
chain = LLMChain(
    llm=ChatOpenAI(),
    prompt=chat_prompt,
    output_parser=CommaSeparatedListOutputParser()
)
chain.run("dogs")
# Returns a Python list of 5 items; the exact items vary between runs.

['Golden Retriever',
 'Labrador Retriever',
 'German Shepherd',
 'Bulldog',
 'Beagle']

This chain will take input variables, pass those to a prompt template to create a prompt, pass the prompt to an LLM, and then pass the output through an output parser.
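To make the flow described above concrete, here is a minimal plain-Python sketch of the same steps without LangChain. All names (`run_pipeline`, `fake_llm`, `split_on_comma_space`) are hypothetical, and `fake_llm` returns a canned reply instead of calling a model.

```python
from typing import Callable, List


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; always returns the same reply."""
    return "red, green, blue"


def split_on_comma_space(text: str) -> List[str]:
    """Same parsing rule as the notebook's CommaSeparatedListOutputParser."""
    return text.strip().split(", ")


def run_pipeline(template: str, llm: Callable[[str], str],
                 parser: Callable[[str], List[str]], **variables: str) -> List[str]:
    prompt = template.format(**variables)  # 1. input variables -> prompt
    raw_output = llm(prompt)               # 2. prompt -> LLM
    return parser(raw_output)              # 3. raw output -> parsed output


colors = run_pipeline("List 3 {category}.", fake_llm, split_on_comma_space,
                      category="colors")
print(colors)  # ['red', 'green', 'blue']
```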

Ok, so these are the basics of LangChain. But how can we leverage these abstraction capabilities inside our LLM applications?

One of the best applications of LangChain is the "chat with your data" type of application, where the user uploads a document such as a PDF or a .txt file and can then query that document using LangChain powered by an LLM like ChatGPT. 

# LangChain Lab Exercises

Let's take a look at a simple example of **Simple Sequential Chain**:

In [11]:
from langchain.chat_models import ChatOpenAI  # the cells below use ChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains import SimpleSequentialChain
from langchain.chains import SequentialChain

In [12]:
# This is an LLMChain that lists 5 fundamental concepts for a given subject.
llm = ChatOpenAI(temperature=.7)
template = """You are a learning assistant. Given a technical subject, write down 5 fundamental concepts to understand it.
Subject: {subject}
Learning assistant: The 5 fundamental concepts are:"""
prompt_template = PromptTemplate(input_variables=["subject"], template=template)
learning_chain = LLMChain(llm=llm, prompt=prompt_template)

In [13]:
# This is an LLMChain that writes an intuitive explanation of each concept in a list.
llm = ChatOpenAI(temperature=.7)
template = """You are an expert teacher in all technical and scientific fields. Given a list of 5 concepts, write down a simple intuitive explanation of each concept.
Concepts:
{concepts}
Intuitive explanations:"""
prompt_template = PromptTemplate(input_variables=["concepts"], template=template)
explanation_chain = LLMChain(llm=llm, prompt=prompt_template)

In [14]:
# This is the overall chain where we run these two chains in sequence.
learning_overall_chain = SimpleSequentialChain(chains=[learning_chain, explanation_chain], verbose=True)

In [15]:
output = learning_overall_chain.run("3D printing")
output



> Entering new SimpleSequentialChain chain...
1. Additive Manufacturing: 3D printing is a form of additive manufacturing, where objects are created by adding layers of material on top of each other. Understanding the process of additive manufacturing is essential to grasp the fundamentals of 3D printing.

2. CAD Modeling: Computer-Aided Design (CAD) is a crucial aspect of 3D printing. Learning how to create digital 3D models using CAD software allows users to design and customize objects before printing them. Understanding the basics of CAD modeling is necessary to utilize 3D printers effectively.

3. Slicing: Slicing is the process of dividing a 3D model into multiple layers or slices. Each layer is then printed individually to create the final object. Understanding how slicing works, including parameters such as layer height and print speed, is essential for optimal 3D printing results.

4. Materials and Filaments: Different materials and filaments can be used 

"1. Additive Manufacturing: Additive manufacturing is like building a house with Legos. Instead of carving a solid object from a block of material, we stack thin layers on top of each other until we create the desired shape. It's like adding one Lego piece at a time to create something unique and custom-made.\n\n2. CAD Modeling: CAD modeling is like designing a virtual object on a computer. It's like using a digital sculpting tool to shape and mold a 3D model, just like an artist would shape clay with their hands. With CAD modeling, we can create precise and detailed designs before bringing them to life with a 3D printer.\n\n3. Slicing: Slicing is like cutting a cake into multiple layers. Just like we cut a cake into slices, we divide a 3D model into many thin layers. Each layer represents a slice that will be printed individually. Think of it as creating a stack of transparent sheets, where each sheet represents a different layer of the object.\n\n4. Materials and Filaments: Materials

**Sequential Chains**

In [16]:
# This is an LLMChain that lists 5 fundamental concepts for a given subject and field.
llm = ChatOpenAI(temperature=.7)
template = """You are a learning assistant. Given a technical subject, write down 5 fundamental concepts to understand it.
Subject: {subject}
Field: {field}
Learning assistant: The 5 fundamental concepts are:"""
prompt_template = PromptTemplate(input_variables=["subject", "field"], template=template)
learning_chain = LLMChain(llm=llm, prompt=prompt_template, output_key="concepts")

In [17]:
# This is an LLMChain that writes an intuitive explanation of each concept in a list.
llm = ChatOpenAI(temperature=.7)
template = """You are an expert teacher in all technical and scientific fields. Given a list of 5 concepts, write down a simple intuitive explanation of each concept.

Concepts:
{concepts}
Intuitive explanations:"""
prompt_template = PromptTemplate(input_variables=["concepts"], template=template)
explanation_chain = LLMChain(llm=llm, prompt=prompt_template, output_key="explanations")

In [18]:
# This is the overall chain where we run these two chains in sequence.
learning_overall_chain = SequentialChain(
    chains=[learning_chain, explanation_chain],
    input_variables=["subject", "field"],
    # Here we return multiple variables
    output_variables=["concepts", "explanations"],
    verbose=True)

In [19]:
learning_overall_chain({"subject":"3D Printing", "field": "engineering"})



> Entering new SequentialChain chain...

> Finished chain.


{'subject': '3D Printing',
 'field': 'engineering',
 'concepts': '1. Additive Manufacturing: 3D printing is a type of additive manufacturing, which means it creates objects by adding layers of material on top of each other. Understanding this concept is crucial to comprehend the basic principle behind 3D printing technology.\n\n2. CAD Design: Computer-Aided Design (CAD) is an essential component of 3D printing. CAD software allows engineers to create digital models of objects that can then be printed in 3D. Understanding how to design and manipulate objects using CAD software is fundamental to the 3D printing process.\n\n3. Materials and Filaments: Different materials can be used for 3D printing, such as plastics, metals, ceramics, and even biological materials. Understanding the characteristics and properties of these materials and filaments is important to choose the appropriate material for a specific 3D printing project.\n\n4. Printing Technologies: There are various 3D printing te

# Simple Q&A Example

In [20]:
#!pip install docarray
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.vectorstores import DocArrayInMemorySearch
from IPython.display import display, Markdown

In [21]:
import pandas as pd
df = pd.read_csv("./superheroes.csv")
df.head()

| Unnamed: 0 | Superhero Name | Superpower | Power Level | Catchphrase |
|---|---|---|---|---|
| 0 | Captain Thunder | Bolt Manipulation | 90 | Feel the power of the storm! |
| 1 | Silver Falcon | Flight and Agility | 85 | Soar high, fearlessly! |
| 2 | Mystic Shadow | Invisibility and Illusions | 78 | Disappear into the darkness! |
| 3 | Blaze Runner | Pyrokinesis | 88 | Burn bright and fierce! |
| 4 | Electra-Wave | Electric Manipulation | 82 | Unleash the electric waves! |


In [22]:
file = 'superheroes.csv'
loader = CSVLoader(file_path=file)

Now, let's set up our Vector store (we'll talk about what that is in a second):

In [23]:
from langchain.indexes import VectorstoreIndexCreator

In [24]:
index = VectorstoreIndexCreator(vectorstore_cls=DocArrayInMemorySearch).from_loaders([loader])

In [25]:
query = "Tell me the catch phrase for Captain Thunder"

In [26]:
response = index.query(query)

In [27]:
display(Markdown(response))

 Captain Thunder's catchphrase is "Feel the power of the storm!"

Ok, cool! So, now let's backtrack a little bit and discuss what is going on.

If we want LLMs to access our data and help us extract insights, we face one major problem: LLMs have a limited context window, meaning they can only process a few thousand words at a time, which constrains their ability to, for example, answer questions about really big documents. 

That's where embeddings and vector stores come into play: they are ways of representing and storing the data that make it easier for an LLM to access. Let's break it down, starting with embeddings:

Embeddings are numerical representations of data: they represent data in such a way that its semantic information is reflected in the distances between the data points in the embedding.

That means that text with similar content will have similar vectors (which is a way of saying that they'll be closer together in the embedding).
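One common way to measure how close two embedding vectors are is cosine similarity. The sketch below uses made-up 3-dimensional vectors purely for illustration; real embedding models produce vectors with many more dimensions, and the sentences and numbers here are assumptions, not actual model output.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy, hand-written "embeddings" (hypothetical values)
dog_ball = [0.9, 0.1, 0.0]      # "My dog likes to chase balls"
puppy_toy = [0.8, 0.2, 0.1]     # "My puppy loves fetching toys"
stock_market = [0.0, 0.1, 0.9]  # "The stock market fell today"

sim_related = cosine_similarity(dog_ball, puppy_toy)
sim_unrelated = cosine_similarity(dog_ball, stock_market)
print(sim_related > sim_unrelated)  # True
```

The two dog sentences point in almost the same direction, so their similarity is close to 1, while the unrelated sentence scores close to 0.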

![](2023-07-30-19-29-27.png)

[LangChain for LLM Application Development by Deeplearning.ai](https://learn.deeplearning.ai/langchain/lesson/1/introduction)

Now, let's talk about vector databases.

A vector database is a way to store these embeddings, these numerical representations that we just discussed.

The pipeline is:
- Incoming document
- Create chunks of text from that document
- Embed each chunk
- Store these embeddings
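The steps above can be sketched in a few lines of plain Python. This is only a toy under stated assumptions: `split_into_chunks` and `toy_embed` are hypothetical helpers, the "embedding" is a simple word count rather than the output of a real embedding model, and the "vector store" is just an in-memory list.

```python
from collections import Counter
from typing import List, Tuple


def split_into_chunks(text: str, chunk_size: int = 5) -> List[str]:
    """Split a document into chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]


def toy_embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(word.lower().strip(".,!?") for word in text.split())


# Incoming document (made-up text for illustration)
document = ("Captain Thunder controls lightning. Silver Falcon can fly very fast. "
            "Mystic Shadow turns invisible at night.")

# Chunk, embed each chunk, and store (chunk, embedding) pairs
vector_store: List[Tuple[str, Counter]] = [
    (chunk, toy_embed(chunk)) for chunk in split_into_chunks(document)
]
print(len(vector_store))  # 4
```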

![](2023-07-30-19-32-13.png)

[LangChain for LLM Application Development by Deeplearning.ai](https://learn.deeplearning.ai/langchain/lesson/1/introduction)

Now, we can query those embeddings stored in the vector database to get the most relevant responses!

So when we create a query, that query is first embedded and then we compare its embedding with the embeddings we have stored in the vector database. We then select the N most similar embeddings and pass the text chunks they represent to the LLM.
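The query step can be sketched the same way. The code below is a toy, not how `VectorstoreIndexCreator` is implemented: `toy_embed`, `cosine` and `top_n` are hypothetical helpers, the embedding is a word count, and the chunks are short sentences built from the catchphrases in the superheroes table.

```python
import math
from collections import Counter
from typing import List, Tuple


def toy_embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(word.lower().strip(".,!?") for word in text.split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


chunks = [
    "Captain Thunder says: Feel the power of the storm!",
    "Silver Falcon says: Soar high, fearlessly!",
    "Blaze Runner says: Burn bright and fierce!",
]
store: List[Tuple[str, Counter]] = [(chunk, toy_embed(chunk)) for chunk in chunks]


def top_n(query: str, store: List[Tuple[str, Counter]], n: int = 2) -> List[str]:
    """Embed the query and return the n most similar stored chunks."""
    query_vector = toy_embed(query)
    ranked = sorted(store, key=lambda item: cosine(query_vector, item[1]),
                    reverse=True)
    return [chunk for chunk, _ in ranked[:n]]


results = top_n("What does Captain Thunder say?", store, n=1)
print(results)  # ['Captain Thunder says: Feel the power of the storm!']
```

In a real application, the retrieved chunks would then be inserted into a prompt together with the user's question and sent to the LLM.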

![](images/2023-07-30-19-34-48.png)

[LangChain for LLM Application Development by Deeplearning.ai](https://learn.deeplearning.ai/langchain/lesson/1/introduction)

- __Embedding__: a numerical representation of a chunk of data; this is what a query is compared against
- __Vector Database__: the database that stores the embeddings of the chunks and lets us query them

### Embeddings


![](./images/embeddings.png)

![](./images/embeddings2.png)


### Vector Database




![](./images/2023-07-17-12-48-28.png)



# References
- https://python.langchain.com/docs/get_started/introduction.html
- https://medium.com/@remitoffoli/a-visual-guide-to-llm-powered-app-architecture-57e47426a92f
- [LangChain for LLM App Development short course by DeepLearning.AI](https://learn.deeplearning.ai/langchain/lesson/5/question-and-answer)
- [LLM Evaluation](https://learn.deeplearning.ai/langchain/lesson/6/evaluation)
- [Models, Prompts, parsers, memory and chains from this langchain for](https://learn.deeplearning.ai/langchain/lesson/7/agents)
- [Chat With Your Data - Retrieval](https://learn.deeplearning.ai/langchain-chat-with-your-data/lesson/5/retrieval)
- [Embeddings simple definition](https://learn.deeplearning.ai/langchain/lesson/5/question-and-answer)
- [Vector DBs - simple definition](https://learn.deeplearning.ai/langchain/lesson/5/question-and-answer)