In [None]:
!pip install openai
!pip install langchain
!pip install langchain_community
!pip install langchain_openai
!pip install langchainhub
!pip install python-dotenv

In [None]:
import os
from dotenv import load_dotenv
from openai import OpenAI

# loading from a .env file
# load_dotenv(dotenv_path="/full/path/to/your/.env")

# or 
# if you're on google colab just uncomment below and replace with your openai api key
# os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"

# LangChain for LLM App Development

We talked about how building an LLM app involves some prompt management: we can
prepare the user's input with pre-prompting, and we can post-process and clean
up the LLM's output, so that the app behaves as expected.

This kind of workflow usually calls for abstractions, because prompts are no
longer static pieces of text. They are dynamic and have to integrate
information at runtime.

![](./images/Notebook_4-dynamic_prompt.png)

Because prompts are dynamic, we need abstractions to handle and manage them effectively.
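To make "dynamic prompt" concrete, here is a minimal plain-Python sketch that deliberately uses no LangChain and no API key. The template text and the `build_prompt` helper are hypothetical, chosen only for illustration.

```python
# Hypothetical template text; the {placeholders} are filled at runtime.
SUMMARY_TEMPLATE = (
    "You are a helpful assistant.\n"
    "Summarize the following {doc_type} in {n_sentences} sentences:\n\n"
    "{text}"
)


def build_prompt(doc_type: str, n_sentences: int, text: str) -> str:
    """Fill the static template with dynamic, user-provided values."""
    return SUMMARY_TEMPLATE.format(
        doc_type=doc_type, n_sentences=n_sentences, text=text
    )


prompt = build_prompt("email", 2, "Hi team, the demo moved to Friday.")
print(prompt)
```

The same template can be reused for any document type or length, which is the property that prompt abstractions build on.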

More complex LLM apps also need to chain prompts together, meaning the output of one prompt becomes the input of another. This is often necessary when a task is too large for a single LLM call, or when the input would exceed the context window (the maximum number of tokens the model can read and write per request).

![](./images/Notebook_4-prompt_chaining.png)
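The chaining idea can be sketched in plain Python. This is only an illustration: `fake_llm` is a hypothetical stand-in that contacts no API and simply wraps the last line of the prompt in angle brackets, so the flow of data between the two steps is visible.

```python
def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption: no API is contacted).

    It wraps the last line of the prompt in angle brackets.
    """
    last_line = prompt.strip().splitlines()[-1]
    return f"<{last_line}>"


def outline_step(topic: str) -> str:
    # First prompt: built from the user's input.
    return fake_llm(f"Write a one-line outline about:\n{topic}")


def expand_step(outline: str) -> str:
    # Second prompt: built from the first prompt's output.
    return fake_llm(f"Expand this outline into a paragraph:\n{outline}")


outline = outline_step("vector databases")
result = expand_step(outline)
print(outline)
print(result)
```

The nesting of brackets in `result` shows that the second call received the first call's output rather than the original topic.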

# LangChain

[LangChain](https://python.langchain.com/docs/get_started/introduction.html) is a framework created by Harrison Chase that facilitates the creation and management of dynamic prompts and chaining between prompts.

Its main features are:
- **Components**: abstractions for working with language models
- **Off-the-shelf chains**: assemblies of components for accomplishing certain higher-level tasks

With LangChain it becomes much easier to create what are called prompt templates: prompts that take in user data and abstract away the need to type out everything a task requires.

Let's take a look at some simple examples to get started.

In order to create an application with LangChain, we need to understand its core components:

- Models
- Prompts
- Output Parsers

![](2023-08-17-14-48-39.png)

**Models**

Abstractions over LLM APIs such as the ChatGPT API.

In [3]:
#!pip install langchain
# !pip install langchain-openai

In [1]:
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAI
import os

chat_model = ChatOpenAI(api_key=os.environ["OPENAI_API_KEY"])

In [2]:
chat_model

ChatOpenAI(client=<openai.resources.chat.completions.Completions object at 0x12623b5d0>, async_client=<openai.resources.chat.completions.AsyncCompletions object at 0x12672cc10>, openai_api_key=SecretStr('**********'), openai_api_base='https://api.openai.com/v1', openai_proxy='')

You can predict outputs from both LLMs and ChatModels:

In [3]:
chat_model.invoke("hi! Tell me a quick story about large language models")

AIMessage(content='Once upon a time, there was a group of researchers who were fascinated by the potential of large language models. They worked tirelessly to develop algorithms and frameworks that could process and generate vast amounts of text using artificial intelligence. These models were trained on massive datasets of text from books, articles, and websites, allowing them to understand and generate human-like language.\n\nAs these language models grew in size and complexity, they began to amaze the world with their capabilities. They could generate coherent and creative stories, poems, and even write code. People marveled at the way these models could understand and respond to human language, making them invaluable tools for various applications such as chatbots, translation services, and content generation.\n\nHowever, as these large language models became more advanced, concerns arose about their potential misuse and ethical implications. Some worried about the spread of misinf

In [4]:
output = chat_model.invoke("hi! Tell me a joke about an instructor who is always having issues when he tries to run live demos during his live-trainings.")

In [5]:
output

AIMessage(content="Why did the instructor's live demo fail? Because every time he pressed play, it was always on the wrong track!", response_metadata={'token_usage': {'completion_tokens': 24, 'prompt_tokens': 35, 'total_tokens': 59}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-ac014fed-231d-4941-8686-6f9183ee7551-0', usage_metadata={'input_tokens': 35, 'output_tokens': 24, 'total_tokens': 59})

In [6]:
from IPython.display import display, Markdown

Markdown(output.content)

Why did the instructor's live demo fail? Because every time he pressed play, it was always on the wrong track!

**Prompts**

Prompt templates are reusable abstractions for prompts.

They provide the context for the specific task that the language model needs to complete.
A simple example is a `ChatPromptTemplate` that formats a string into a prompt:

In [7]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("What is a good name for a company that makes {product}?")
prompt.format(product="hair maker")

'Human: What is a good name for a company that makes hair maker?'

In [8]:
chain = prompt | chat_model

# PP
chain.invoke({"product": "hair maker"})

AIMessage(content='Locks & Tresses Creations', response_metadata={'token_usage': {'completion_tokens': 7, 'prompt_tokens': 20, 'total_tokens': 27}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-ce6aa386-9674-4e88-b490-b94b537b806a-0', usage_metadata={'input_tokens': 20, 'output_tokens': 7, 'total_tokens': 27})

In [9]:
# U1

chain.invoke({"product": "fresh packaged meal"})

AIMessage(content='Fresh Fare Delights', response_metadata={'token_usage': {'completion_tokens': 4, 'prompt_tokens': 21, 'total_tokens': 25}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-fc1757da-fdf2-4d66-9896-c9b1be921fc0-0', usage_metadata={'input_tokens': 21, 'output_tokens': 4, 'total_tokens': 25})

In [12]:
# MP
chain.invoke({"product": "Beddings"})

AIMessage(content='CozyDreams Bedding Co.')

In [13]:
# KP
product = "plats that are not easy to kill"

chain.invoke({"product": product})

AIMessage(content='Evergreen Creations')

In [14]:
#SZ:  Advance Night time Nutrients
product = "Advance Night time Nutrients"

chain.invoke({"product": product})

AIMessage(content='Nightly Nourish')

In [15]:
# RC
product = "drum set?"

chain.invoke({"product": product})

AIMessage(content='Beat Masters Drum Co.')

In [16]:
# JC
product = "Pancakes"
chain.invoke({"product": product})

AIMessage(content='Fluffy Stack Co.')

In [17]:
# MP
product = "pestisides"

chain.invoke({"product": product})

AIMessage(content='EcoGuard Pest Solutions')

In [18]:
# RM
product = "Feijoada"

chain.invoke({"product": product})

AIMessage(content='Feijoada Delights Co.')

In [19]:
# MP
product = "Dosa & Idly"

chain.invoke({"product": product})

AIMessage(content='Dosai Delights')

Prompt templates have several advantages over raw string formatting. You can "partial" out variables, i.e. format only some of the variables at a time. You can also compose them, combining different templates into a single prompt. For more detail on these features, see the section on prompts in the LangChain documentation.
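The idea of "partialing out" variables can be illustrated in plain Python with `functools.partial`. This is a conceptual sketch, not LangChain's implementation; the `render` helper and the template text are hypothetical.

```python
from functools import partial


def render(template: str, **variables: str) -> str:
    """Fill every {placeholder} in the template."""
    return template.format(**variables)


TEMPLATE = "Translate the following text from {source} to {target}:\n{text}"

# "Partial out" the two language variables now; `text` is supplied later.
english_to_french = partial(render, TEMPLATE, source="English", target="French")

print(english_to_french(text="I love programming."))
```

Fixing some variables early lets one template serve as the base for several specialized prompts.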

Prompt templates can also produce a list of messages. In this case, the prompt carries not only the content but also information about each message (its role, its position in the list, etc.). Most often, a `ChatPromptTemplate` is built from a list of message templates, each of which specifies how to format one chat message: its role and its content. Let's take a look at this below:

In [10]:
# source: https://python.langchain.com/docs/modules/model_io/quick_start
from langchain_core.prompts import ChatPromptTemplate

template = "You are a helpful assistant that translates {input_language} to {output_language}."
human_template = "{text}"

chat_prompt = ChatPromptTemplate.from_messages([
    ("system", template),
    ("human", human_template),
])

chat_prompt.format_messages(input_language="English", output_language="French", text="I love programming.")

[SystemMessage(content='You are a helpful assistant that translates English to French.'),
 HumanMessage(content='I love programming.')]

**Output Parsers**

Output parsers convert the raw output of an LLM into a format that can be used downstream. Here is an example of an output parser that converts a JSON string into a Python dictionary:

In [11]:
from langchain_core.output_parsers import JsonOutputParser


output_parser = JsonOutputParser()
output = output_parser.parse('{"name": "Lucas"}')
print(output)
type(output)

{'name': 'Lucas'}


dict
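Parsers are not limited to JSON. As a rough illustration of what a parser for comma-separated replies might do, here is a plain-Python sketch. It is not LangChain's implementation, and `parse_comma_separated` and the sample reply are hypothetical.

```python
def parse_comma_separated(raw: str) -> list[str]:
    """Split raw model text on commas and trim whitespace around each item."""
    return [item.strip() for item in raw.split(",") if item.strip()]


raw_reply = "red, blue , green,yellow, orange"  # hypothetical model reply
colors = parse_comma_separated(raw_reply)
print(colors)
```

A parser like this only splits and trims. Anything else the model adds, such as numbering, stays inside the items.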

# Composing Chains with LCEL

source: https://python.langchain.com/docs/modules/model_io/quick_start#:~:text=We%20can%20now,green'%2C%20'yellow'%2C%20'orange'%5D

We can now combine all these into one chain. This chain will take input variables, pass those to a prompt template to create a prompt, pass the prompt to a language model, and then pass the output through an (optional) output parser.

The modern version with the LCEL interface:

In [12]:
template = "Generate a list of 5 {text}.\n\n{format_instructions}"

chat_prompt = ChatPromptTemplate.from_template(template)

chat_prompt = chat_prompt.partial(format_instructions=output_parser.get_format_instructions())

chain = chat_prompt | chat_model | output_parser
chain.invoke({"text": "AI topics"})

{'AI Topics': ['Natural Language Processing',
  'Computer Vision',
  'Reinforcement Learning',
  'Ethical AI',
  'AI in Healthcare']}

In [13]:
# KP: professions that are least threatened by AI
example = "professions that are least threatened by AI"

chain.invoke({"text": example})

{'professions': ['Psychologist', 'Social worker', 'Teacher', 'Nurse', 'Chef']}

In [14]:
# TB
example = "names for spaceships"
chain.invoke({"text": example})

{'spaceshipNames': ['Stellar Phoenix',
  'Galactic Voyager',
  'Cosmic Explorer',
  'Nebula Seeker',
  'Celestial Navigator']}

In [15]:
# AP
example = "things to do for productive day"

chain.invoke({"text": example})

{'things_to_do': ['Create a to-do list and prioritize tasks',
  'Set specific goals for the day',
  'Take breaks to avoid burnout',
  'Stay organized and declutter your workspace',
  'Reflect on achievements at the end of the day']}

In [29]:
# MP
example = "Starwars Movies"

chain.invoke({"text": example})

{'movies': ['Star Wars: A New Hope',
  'Star Wars: The Empire Strikes Back',
  'Star Wars: Return of the Jedi',
  'Star Wars: The Force Awakens',
  'Star Wars: The Last Jedi']}

In [30]:
# SZ: Favourite UK food

example = "Favourite UK food"

chain.invoke({"text": example})

{'favourite_food': ['Fish and chips',
  'Full English breakfast',
  "Shepherd's pie",
  'Bangers and mash',
  'Roast beef with Yorkshire pudding']}

In [24]:
output_parser.get_format_instructions()

'Return a JSON object.'

We are using the `|` syntax to join these components together. This syntax is powered by the LangChain Expression Language (LCEL) and relies on the universal `Runnable` interface that all of these objects implement. To learn more about LCEL, see the LangChain documentation.
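To build intuition for how a `|` operator can compose steps, here is a toy sketch in plain Python. It is not LangChain's `Runnable` implementation: the `Step` class, the fake model, and the three stages are hypothetical and exist only to show operator overloading with `__or__`.

```python
class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs a, then feeds its result to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stages mirroring prompt | model | parser.
prompt = Step(lambda d: f"List 3 {d['text']}, separated by commas.")
model = Step(lambda p: "red, green, blue")  # fake model: ignores the prompt
parser = Step(lambda s: [x.strip() for x in s.split(",")])

chain = prompt | model | parser
print(chain.invoke({"text": "colors"}))
```

Because every stage exposes the same `invoke` method, any stage can be swapped out without changing the rest of the chain.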

<!-- For this part I just took some info from the langchain official docs: https://python.langchain.com/docs/modules/model_io/quick_start -->

The same pattern with a `CommaSeparatedListOutputParser`:

In [26]:
from langchain_core.output_parsers import CommaSeparatedListOutputParser
template = """What would be 5 good names for the animal: {animal} that is {adjective}?
The output should be just one sentence separated by commas."""

chat_prompt = ChatPromptTemplate.from_template(template)

chain = chat_prompt | ChatOpenAI() | CommaSeparatedListOutputParser()

chain.invoke({"animal":"dogs", "adjective": "sleepy"})

['1. Snoozy', '2. Dozer', '3. Napper', '4. Snuggles', '5. Dreamer.']

This chain will take input variables, pass those to a prompt template to create a prompt, pass the prompt to an LLM, and then pass the output through an output parser.

OK, so these are the basics of LangChain. But how can we leverage these abstractions inside our LLM app?

One of the best applications of LangChain is the "chat with your data" type of application, where the user uploads a document such as a PDF or a .txt file and then queries it through LangChain, powered by an LLM like ChatGPT.

# LangChain Lab Exercises

Let's take a look at a simple chain, now using only the modern interface.

In [29]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

In [30]:
llm = ChatOpenAI(temperature=.7)
template = """You are a learning assistant. Given a technical subject, write down 5 fundamental concepts to understand it.
Subject: {subject}
Learning assistant: The 5 fundamental concepts are:"""
subject_prompt = ChatPromptTemplate.from_template(template)

In [31]:
# Second prompt: explain each of the concepts produced by the first prompt.
llm = ChatOpenAI(temperature=.7)
template = """You are an expert teacher in all technical and scientific fields. Given a list of 5 concepts, write down a simple intuitive explanation of each concept.
Concepts:
{concepts}
Intuitive explanations:"""
concepts_prompt = ChatPromptTemplate.from_template(template)

In [32]:
from IPython.display import Markdown
# This is the overall chain where we run these two chains in sequence.
learning_overall_chain = (
    {"concepts": subject_prompt | llm | StrOutputParser() }
    | concepts_prompt
    | llm
    | StrOutputParser()
    )

output = learning_overall_chain.invoke({"subject": "Quantum Mechanics"})
Markdown(output)

1. Superposition - Quantum particles can be in multiple states at the same time until we observe them and "collapse" their state into one definite outcome.

2. Wave-particle duality - Quantum particles can act as both waves and particles, showing different behaviors depending on how we measure them.

3. Uncertainty principle - We can never precisely know both the position and momentum of a particle at the same time, as the act of measuring one affects the other.

4. Quantum entanglement - When two particles become entangled, their properties are linked even if they are far apart, suggesting a mysterious connection that goes beyond conventional physics.

5. Quantum tunneling - Quantum particles can "tunnel" through obstacles that would be impossible to pass through according to classical physics, showing the strange and fascinating behavior of quantum mechanics.

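Example from KP: can you write a sample LangChain chain to compute (2+3) * 6, where (2+3) is one chain and * 6 is another? The cell below sets up the first of these chains, a prompt that asks the model to act as a mathematical engine.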

In [33]:
template = """
You are a mathematical engine. Given a math operation you should output only the result.
input: {math_input}
output:
"""

chat_model = ChatOpenAI(temperature=0)
prompt1 = ChatPromptTemplate.from_template(template)
output_parser = StrOutputParser()

chain_math1 = prompt1 | chat_model | output_parser

chain_math1.invoke({"math_input": "2+2"})

'4'
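The two-step wiring for (2+3) * 6 can be sketched without calling a model. In this illustration `fake_math_llm` is a hypothetical stand-in for the "mathematical engine" prompt: it evaluates simple arithmetic with Python's `ast` module and returns text, as an LLM would. With the real model, the wiring could follow the same pattern as `learning_overall_chain` above.

```python
import ast
import operator

# Binary operators supported by the toy "mathematical engine".
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def fake_math_llm(expression: str) -> str:
    """Stand-in for the LLM: evaluates simple arithmetic and returns text."""

    def evaluate(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError("unsupported expression")

    return str(evaluate(ast.parse(expression, mode="eval").body))


def chain_sum(math_input: str) -> str:
    # First chain: evaluate the inner expression, e.g. "2+3".
    return fake_math_llm(math_input)


def chain_times_six(previous_result: str) -> str:
    # Second chain: build a new input from the first chain's output.
    return fake_math_llm(f"{previous_result} * 6")


step1 = chain_sum("2+3")
step2 = chain_times_six(step1)
print(step1, step2)
```

The key point is that the second step never sees "2+3"; it only sees the text returned by the first step.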

# Simple Q&A Example

In [31]:
from langchain_openai import ChatOpenAI
from langchain_community.document_loaders.csv_loader import CSVLoader
from IPython.display import display, Markdown
import pandas as pd

In [32]:
df = pd.read_csv("./superheroes.csv")
df.head()

Unnamed: 0,Superhero Name,Superpower,Power Level,Catchphrase
0,Captain Thunder,Bolt Manipulation,90,Feel the power of the storm!
1,Silver Falcon,Flight and Agility,85,"Soar high, fearlessly!"
2,Mystic Shadow,Invisibility and Illusions,78,Disappear into the darkness!
3,Blaze Runner,Pyrokinesis,88,Burn bright and fierce!
4,Electra-Wave,Electric Manipulation,82,Unleash the electric waves!


In [33]:
file = 'superheroes.csv'
loader = CSVLoader(file_path=file)

In [34]:
loader

<langchain_community.document_loaders.csv_loader.CSVLoader at 0x126f114d0>

In [35]:
documents = loader.load()

documents

[Document(page_content='Superhero Name: Captain Thunder\nSuperpower: Bolt Manipulation\nPower Level: 90\nCatchphrase: Feel the power of the storm!', metadata={'source': 'superheroes.csv', 'row': 0}),
 Document(page_content='Superhero Name: Silver Falcon\nSuperpower: Flight and Agility\nPower Level: 85\nCatchphrase: Soar high, fearlessly!', metadata={'source': 'superheroes.csv', 'row': 1}),
 Document(page_content='Superhero Name: Mystic Shadow\nSuperpower: Invisibility and Illusions\nPower Level: 78\nCatchphrase: Disappear into the darkness!', metadata={'source': 'superheroes.csv', 'row': 2}),
 Document(page_content='Superhero Name: Blaze Runner\nSuperpower: Pyrokinesis\nPower Level: 88\nCatchphrase: Burn bright and fierce!', metadata={'source': 'superheroes.csv', 'row': 3}),
 Document(page_content='Superhero Name: Electra-Wave\nSuperpower: Electric Manipulation\nPower Level: 82\nCatchphrase: Unleash the electric waves!', metadata={'source': 'superheroes.csv', 'row': 4}),
 Document(page

Now, let's set up our vector store, a database that stores each document as an embedding vector so that similar text can be looked up by meaning:

In [36]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

In [38]:
# !pip install faiss-cpu
from langchain_community.vectorstores import FAISS

db = FAISS.from_documents(documents, embeddings)


In [39]:
retriever = db.as_retriever()
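
Conceptually, a vector store keeps one embedding per document, and a retriever returns the documents whose embeddings are closest to the query's embedding. Here is a minimal plain-Python sketch of that idea. It assumes a toy bag-of-words `embed` function in place of `OpenAIEmbeddings` and a plain list in place of FAISS, so it only illustrates the mechanism.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(text.lower().replace(":", " ").replace("!", " ").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = [
    "Superhero Name: Captain Thunder Catchphrase: Feel the power of the storm!",
    "Superhero Name: Silver Falcon Catchphrase: Soar high, fearlessly!",
    "Superhero Name: Blaze Runner Catchphrase: Burn bright and fierce!",
]
index = [(doc, embed(doc)) for doc in docs]  # the toy "vector store"


def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


print(retrieve("catchphrase for Captain Thunder"))
```

Real embeddings capture meaning rather than exact word overlap, which is why a query can match a document that shares none of its words.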

In [40]:
from langchain_core.runnables import RunnableLambda, RunnablePassthrough


template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

In [42]:
from langchain_core.output_parsers import StrOutputParser
model = ChatOpenAI()

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

In [43]:
query = "Tell me the catch phrase for Captain Thunder"
print(chain.invoke(query))

The catchphrase for Captain Thunder is "Feel the power of the storm!"
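
The dictionary at the start of the chain sends the same input to every branch and collects the results under each key, which is how `{context}` and `{question}` both get filled from a single query string. A rough plain-Python sketch of that fan-out follows. It is not LangChain's implementation; `run_parallel`, `fake_retriever`, and `passthrough` are hypothetical stand-ins.

```python
def run_parallel(branches: dict, value):
    """Call every branch with the same input and collect results by key."""
    return {key: branch(value) for key, branch in branches.items()}


def fake_retriever(question: str) -> list[str]:
    # Stand-in for the retriever (assumption): always returns one fixed document.
    return ["Superhero Name: Captain Thunder\nCatchphrase: Feel the power of the storm!"]


def passthrough(value):
    # Returns its input unchanged, like a pass-through step.
    return value


TEMPLATE = """Answer the question based only on the following context:
{context}

Question: {question}
"""

inputs = run_parallel(
    {"context": fake_retriever, "question": passthrough},
    "Tell me the catch phrase for Captain Thunder",
)
prompt_text = TEMPLATE.format(**inputs)
print(prompt_text)
```

The retrieved documents and the untouched question arrive at the prompt together, ready for the model.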


In [44]:
df

Unnamed: 0,Superhero Name,Superpower,Power Level,Catchphrase
0,Captain Thunder,Bolt Manipulation,90,Feel the power of the storm!
1,Silver Falcon,Flight and Agility,85,"Soar high, fearlessly!"
2,Mystic Shadow,Invisibility and Illusions,78,Disappear into the darkness!
3,Blaze Runner,Pyrokinesis,88,Burn bright and fierce!
4,Electra-Wave,Electric Manipulation,82,Unleash the electric waves!
5,Crimson Cyclone,Super Speed,91,Blazing fast and unstoppable!
6,Aqua Fury,Hydrokinesis,80,Ride the waves of power!
7,Lunar Guardian,Lunar Manipulation,77,Embrace the moon's might!
8,Steel Titan,Super Strength and Durability,95,Indestructible force of nature!
9,Nightblade,Night Vision and Stealth,84,Strike from the shadows!


In [45]:
query = "Tell me the catch phrase for the likely fastest superhero in the table"
print(chain.invoke(query))

The catchphrase for the likely fastest superhero in the table is "Blazing fast and unstoppable!"


# References
- https://python.langchain.com/docs/get_started/introduction.html
- https://medium.com/@remitoffoli/a-visual-guide-to-llm-powered-app-architecture-57e47426a92f
- [LangChain for LLM App Development short course by DeepLearning.AI](https://learn.deeplearning.ai/langchain/lesson/5/question-and-answer)
- [LLM Evaluation](https://learn.deeplearning.ai/langchain/lesson/6/evaluation)
- [Models, Prompts, parsers, memory and chains from this langchain for](https://learn.deeplearning.ai/langchain/lesson/7/agents)
- [Chat With Your Data - Retrieval](https://learn.deeplearning.ai/langchain-chat-with-your-data/lesson/5/retrieval)
- [Embeddings simple definition](https://learn.deeplearning.ai/langchain/lesson/5/question-and-answer)
- [Vector DBs - simple definition](https://learn.deeplearning.ai/langchain/lesson/5/question-and-answer)