# Langchain Demo

## What is LangChain?

LangChain is an open source framework that allows AI developers to combine Large Language Models (LLMs) with external data. 

**Resources**

> LangChain resources
> - Landpage: https://readthedocs.org/projects/langchain/db2d
> - Comonents: https://docs.langchain.com/docs/category/components
> - git: https://github.com/hwchase17/langchain.git
> - API Reference: https://api.python.langchain.com/en/latest/

> This notebook is largely based on Greg Kamradt's videos and cookbooks
> - [Langchain tuorial suite](https://www.youtube.com/playlist?list=PLqZXAkvF1bPNQER9mLmDbntNfSpzdDIU5)
> - [Cookbook Fundamentals](https://github.com/gkamradt/langchain-tutorials/blob/main/LangChain%20Cookbook%20Part%201%20-%20Fundamentals.ipynb)
> - [Cookbook Comprehensive Guide](https://nathankjer.com/introduction-to-langchain/)

> Additonal resources and tutorial
> - [A Gentle Intro to Chaining LLMs, Agents, and utils via LangChain](https://towardsdatascience.com/a-gentle-intro-to-chaining-llms-agents-and-utils-via-langchain-16cd385fca81)

## This notebook

This notebook collects Python examples. The chapters are based oo the LangChain compoents documented here https://docs.langchain.com/docs/category/components.

Some changes though:
- use Annoy instead of FAISS as a vector database
- use Google Search API instead of SerpAPI
- change in examples and additional examples 
- change in API keys setup



This notebook has been tested in June 2023 on AWS SageMaker using DataScience 3.0 image.

Test environment:
> - AWS SageMaker Studio's notebook 
>> - Kernel image Data Science 3.0
>> - t3.medium 2CPU - 4GB
>> - Python 3.9.15
>> - Linux default 4.14.304-226.531.amzn2.x86_64
> - installed packages:
>> - langchain 0.0.218
>> - openai 0.27.8
>> - google_api_python_client 2.90.0
>> - tikitoken 0.4.0



---
<div style="background-color:green;color:black;text-align:center;padding:1rem;font-size:1.5rem;">NOTEBOOK SETUP</div>



**Instructions**

All setups are at the top of the notebook so that you can run all this section initialize the notebook.

Notebook chapters are not dependant on each other and may be run in isolation.

Before running the setup you may need to create the following resources
- request an OpenAI API keys. OpenAI APIs are not free.
- create a Custom Search Engine in Google Search. it is free.
- request an API key for the Google Search service. It is free.

Confer to the setup sections for instruction on how to create those resources.

---
## API keys and environment

Langchain will get the API keys from environment variables or function parameters.

**Instructions**

- Never show the keys in shared notebooks, whether it part of the code or a log. A simple way to avoid key leakage, is to use environement variables.  You set the environment variable in the terminal or some local configuration. If so you do not have to set the key here.

- If it is easier for you to set the key here by assigning the value, do not forget to empty the string right after you run this block. The environment will be kept in memory as long as the kernel runs.

- Be careful when printing the keys. Ensure that you remove the outputs. 

- Before sharing check that the keys are not printed out by some features of the libraries. Avoid to print libraries' objects. They often hold the API keys as a property and may disclose the key value.


I Store API keys and configuration information in AWS Secrets Manager. The code below retrieves the secret holding the keys. The secret is a JSON string consisting in key/value pairs. It will be used later to set various environnement variables.

When using Notebooks an SageMaker do not forget to give permissions to read this secret to SageMaker execution role.

In [41]:
%%bash --out secrets 
# using AWS's Secret Manager to store keys
# garb the keys and store it into a Pytthon variable
export RESPONSE=$(aws secretsmanager get-secret-value --secret-id 'salvia/labbench/tests' )
export SECRETS=$( echo $RESPONSE | jq '.SecretString | fromjson')

echo $SECRETS

---
## LangChain Setup

**Resources**
> - [LangChain GetStarted](https://python.langchain.com/docs/get_started/quickstart)

In [42]:
pip install langchain


[0mNote: you may need to restart the kernel to use updated packages.


---
## OpenAI Setup

**Resources**
> - [OpenAI tutorial on API keys](https://platform.openai.com/docs/quickstart)
> - [OpenAI package on Pypi](https://pypi.org/project/openai/)

In [43]:
import os

os.environ["OPENAI_API_KEY"] = eval(secrets)["OPENAI_API_KEY"]


In [44]:
pip install openai


[0mNote: you may need to restart the kernel to use updated packages.


---
## Google Search setup

**Resources**

> How to configure the Google search in LangChain 
> - https://python.langchain.com/docs/ecosystem/integrations/google_search

> Custom Search Engine configuration 
> - https://stackoverflow.com/questions/37083058/programmatically-searching-google-in-python-using-custom-search

> CSE API 
> - repo: https://github.com/google/google-api-python-client
> - more info: https://developers.google.com/api-client-library/python/apis/customsearch/v1
> - complete docs: https://api-python-client-doc.appspot.com/

> Get an API key
> - https://developers.google.com/custom-search/v1/introduction

> Package information
> - [Google API client package on Pypi](https://pypi.org/project/google-api-python-client/)

In [45]:
# Unlock the API and get a key 
os.environ["GOOGLE_API_KEY"] = eval(secrets)["GOOGLE_API_KEY"]
# Create or use an existing Custom Search Engine
# on the CSE page under Searcg Engone ID
os.environ["GOOGLE_CSE_ID"] = eval(secrets)["GOOGLE_CSE_ID"]


In [46]:
pip install google-api-python-client

Collecting google-api-python-client
  Using cached google_api_python_client-2.90.0-py2.py3-none-any.whl (11.4 MB)
Collecting httplib2<1.dev0,>=0.15.0 (from google-api-python-client)
  Using cached httplib2-0.22.0-py3-none-any.whl (96 kB)
Collecting google-auth<3.0.0.dev0,>=1.19.0 (from google-api-python-client)
  Downloading google_auth-2.21.0-py2.py3-none-any.whl (182 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m182.1/182.1 kB[0m [31m2.3 MB/s[0m eta [36m0:00:00[0ma [36m0:00:01[0m
[?25hCollecting google-auth-httplib2>=0.1.0 (from google-api-python-client)
  Using cached google_auth_httplib2-0.1.0-py2.py3-none-any.whl (9.3 kB)
Collecting google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.0,<3.0.0.dev0,>=1.31.5 (from google-api-python-client)
  Using cached google_api_core-2.11.1-py3-none-any.whl (120 kB)
Collecting uritemplate<5,>=3.0.1 (from google-api-python-client)
  Using cached uritemplate-4.1.1-py2.py3-none-any.whl (10 kB)
Collecting googleapis-common-proto

## Setup Annoy as a vector database 

Some examples requires a Vector Database (document selector, document retrieval).

LangChain use ChromaDB by default. For whatever reason it failed to install. Used Annoy instead. An alterntive is FAIIS. You may also want to use online Vector database like Pinecone or Weaviate. 

Most of these packages include c++ code and requires GCC at the install time. It is not included in SageMaker DataScience 3 image. So the first step is installing GCC. 

**Resources**
> - [Annoy package on Pypi](https://pypi.org/project/annoy/)

In [47]:
!apt-get update && apt-get install -y build-essential

Get:1 http://deb.debian.org/debian bullseye InRelease [116 kB]
Get:2 http://security.debian.org/debian-security bullseye-security InRelease [48.4 kB]
Get:3 http://deb.debian.org/debian bullseye-updates InRelease [44.1 kB]
Get:4 http://security.debian.org/debian-security bullseye-security/main amd64 Packages [245 kB]
Get:5 http://deb.debian.org/debian bullseye/main amd64 Packages [8183 kB]
Get:6 http://deb.debian.org/debian bullseye-updates/main amd64 Packages [14.8 kB]
Fetched 8651 kB in 2s (5020 kB/s)
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  dirmngr dpkg-dev fakeroot g++ g++-10 gnupg gnupg-l10n gnupg-utils gpg
  gpg-agent gpg-wks-client gpg-wks-server gpgconf gpgsm gpgv
  libalgorithm-diff-perl libalgorithm-diff-xs-perl libalgorithm-merge-perl
  libassuan0 libdpkg-perl libfakeroot libfile-fcntllock-perl libksba8
  liblocale-gettext-perl libnpth0 

In [48]:
pip install annoy

[0mNote: you may need to restart the kernel to use updated packages.


## Setup additional tools for embeddings

When working with embeddings additonal packages are required.

- tiktoken, as a encoder and tokenizer

**Resources**
> - [Tiktoken package on Pypi](https://pypi.org/project/tiktoken/)

 

In [49]:
pip install tiktoken

[0mNote: you may need to restart the kernel to use updated packages.


---
<div style="background-color:green;color:black;text-align:center;padding:1rem;font-size:1.5rem;">LANGCHAIN OVERVIEW</div>


---
# 1. Basic features

---
## Get prediction from a langage model

In [50]:
from langchain.llms import OpenAI

# loads the model.
# OPENAI_API_KEY is requested. Get it from the OpenAI site.
# a paid account and available units are requested to be able to place a request.
llm = OpenAI(temperature=0.9)

text = "what are the 5 best countries in Europe"

# Actual API call - may tale a while.
print(llm(text))




1. Switzerland
2. Germany
3. Norway
4. Austria
5. Iceland


---
## Manage prompts with templates

In [51]:
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

# loads the model.
llm = OpenAI(temperature=0.9)

# setup a prompt
prompt = PromptTemplate (
    input_variables=["interest"],
    template="what are the 5 best countries in Europe ranked by {interest}"
)

In [52]:
text = prompt.format(interest="food")
print(f"{text=}")
print(llm(text))

text='what are the 5 best countries in Europe ranked by food'


1. Italy
2. France
3. Spain
4. Greece
5. Portugal


In [53]:
text = prompt.format(interest="siteseeing")
print(f"{text=}")
print(llm(text))

text='what are the 5 best countries in Europe ranked by siteseeing'


1. Italy
2. France
3. Spain
4. Greece
5. Germany


---
# 2. Chains

<div class="alert alert-block alert-warning"> TODO  what is a chain </div>


---
## Built-in chains

In [54]:
from langchain.chains import PALChain
from langchain.llms import OpenAI

# loads the model.
llm = OpenAI(temperature=0.7)

palchain = PALChain.from_math_prompt(llm=llm, verbose=True)


text = """If my age is half of my dad's age 
and he is going to be 60 next year, 
what is my current age?"""
#palchain.run("If my age is half of my dad's age and he is going to be 60 next year, what is my current age?")
palchain.run(text)




[1m> Entering new  chain...[0m
[32;1m[1;3mdef solution():
    """If my age is half of my dad's age and he is going to be 60 next year, what is my current age?"""
    dads_age_next_year = 60
    my_age_relative = 0.5
    my_age_current = dads_age_next_year * my_age_relative
    result = my_age_current
    return result[0m

[1m> Finished chain.[0m


'30.0'

<div class="alert alert-block alert-warning"> 
    TODO <br>
    - different result each run <br>
    - and should be 29.5
</div>


> Entering new  chain...
def solution():
    """If my age is half of my dad's age and he is going to be 60 next year, what is my current age?"""
    dad_age_next_year = 60
    my_age_fraction = 0.5
    my_age_now = dad_age_next_year * my_age_fraction
    result = my_age_now
    return result

> Finished chain.
'30.0'

> Entering new  chain...
def solution():
    """If my age is half of my dad's age and he is going to be 60 next year, what is my current age?"""
    dad_age_current = 59
    my_age_current = dad_age_current / 2
    result = my_age_current
    return result

> Finished chain.
'29.5'

---
## Multi-step workflow to feed prompt into the model

In [55]:
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chains import LLMChain

# loads the model.
llm = OpenAI(temperature=0.9)

# setup a prompt
prompt = PromptTemplate (
    input_variables=["interest"],
    template="what are the 5 best countries in Europe ranked on {interest}"
)

# chain feeds the prompt into the langage mmodel.
chain = LLMChain(llm=llm, prompt=prompt)

In [56]:
chain.run("science")

'?\n\n1. Germany\n2. Switzerland\n3. United Kingdom\n4. Sweden\n5. Finland'

In [57]:
print(chain.run("tv shows"))



1. United Kingdom 
2. Germany 
3. France 
4. Ireland 
5. Norway


---
## Using OpenAI Chat API (less expensive)
requires a chain to feed the prompt into the chat 

<div class="alert alert-block alert-warning"> TODO  move to components + desribe resource </div>

**Resources**
> - Other Chat APIs: https://api.python.langchain.com/en/latest/modules/chat_models.html

In [58]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

chatopenai = ChatOpenAI(model_name="gpt-3.5-turbo")

prompt = PromptTemplate (
    input_variables=["interest"],
    template="what are the 5 best countries in Europe ranked on {interest}"
)

llmchain_chat = LLMChain(llm=chatopenai, prompt=prompt)
print(llmchain_chat.run("food"))


The ranking of the best countries in Europe for food can vary depending on personal preferences. However, considering the general consensus and culinary reputation, the following countries are often regarded as having exceptional cuisine:

1. Italy: Italy is renowned for its diverse and delicious cuisine. From pasta and pizza to gelato and espresso, Italian food is loved worldwide. Each region in Italy has its own specialties, making it a food lover's paradise.

2. France: French cuisine is known for its elegance and sophistication. With its rich sauces, cheeses, pastries, and wines, France offers a wide range of gastronomic delights. From escargots to croissants, French food is a true celebration of flavors.

3. Spain: Spanish cuisine is vibrant, flavorful, and diverse. From tapas and paella to jamón ibérico and gazpacho, Spain has a diverse range of dishes that showcase its culinary heritage. Each region has its own unique specialties, making it a dynamic food destination.

4. Greece

---
## Leverage LLM Math

Evaluating chains that know how to do math.

**Resources**
> - Langchain module LLM_Math: ttps://python.langchain.com/docs/guides/evaluation/llm_math

In [59]:
from langchain.prompts import load_prompt
from langchain.chains import LLMMathChain

# loads the model.
llm = OpenAI(temperature=0.9)

prompt = load_prompt('lc://prompts/llm_math/prompt.json')

# deprecated
##chain = LLMMathChain(llm=llm, prompt=prompt)

chain = LLMChain(llm=llm, prompt=prompt)

print(chain.run("what is the largest prime number lower than 20"))


No `_type` key found, defaulting to `prompt`.



Answer: 19


---
# 3. Tools

<div class="alert alert-block alert-warning"> TODO  what is a tool </div>


---
## Leverage Google Search

**Instructions**

Make sure:
- Google API client is installed
- a Custome Search Engine is available (CSE)
- the API key has been setup up

---
# 4. Agent

LangChain define agents as decision making engines:
> Some applications will require not just a predetermined chain of calls to LLMs/other tools, but potentially an unknown chain that depends on the user's input. In these types of chains, there is a “agent” which has access to a suite of tools. Depending on the user input, the agent can then decide which, if any, of these tools to call.


---
## Test with LLM model only 


In [137]:
from langchain.llms import OpenAI

# loads the model.
# OPENAI_API_KEY is requested. Get it from the OpenAI site.
# a paid account and available units are requested to be able to place a request.
# low temperature to avoid randomness
llm = OpenAI(model_name="text-davinci-003", temperature=0)

text = "Who is the prime minister of France since may 2022"

# Actual API call - may tale a while.
print(llm(text))




The Prime Minister of France since May 2022 is Jean Castex.


**OUTPUT**

'The Prime Minister of France since May 2022 is Jean Castex.'

This answer is wrong. Since the model has been trained mid 2021, it is not up-to-date. Elisabeth Borne is Prime Minister since may 2022.

---
## Agent leveraging Google Search

**Instructions**

Make sure:
- Google API client is installed
- a Custome Search Engine is available (CSE)
- the API key has been setup up

In [138]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.llms import OpenAI

# create a model
llm = OpenAI(temperature=0)

# load some tools
tools = load_tools(["google-search"], llm=llm)

# setup an agent
agent = initialize_agent(tools, 
                         llm, 
                         agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, 
                         verbose=True)


In [139]:
agent.run("Who is the prime minister of France since may 2022")



[1m> Entering new  chain...[0m
[32;1m[1;3m I need to find out who is the current prime minister of France
Action: google_search
Action Input: "prime minister of France"[0m
Observation: [36;1m[1;3mThe prime minister is the holder of the second-highest office in France, after the president of France. The president, who appoints but cannot dismiss the prime ... 4 days ago ... President Biden spoke today with President Emmanuel Macron of France, Chancellor Olaf Scholz of Germany, and Prime Minister Rishi Sunak of ... Usually, the Chief Ministers were members of the King's Council (the archaic form of cabinet) or high members of the French nobility or the Catholic clergy. May 16, 2022 ... French President Emmanuel Macron picked Labour Minister Elisabeth Borne as his new prime minister on Monday as he prepares for legislative ... The head of the government of France has been called the prime minister of France (French: Premier ministre) since 1959, when Michel Debré became the first

'Élisabeth Borne is the prime minister of France since May 16, 2022.'

**OUTPUT**

'Élisabeth Borne is the prime minister of France since May 16, 2022.'

This is true.

---
# 5. Memory - Conversation

<div class="alert alert-block alert-warning"> TODO  what is a conversation </div>


In [65]:
from langchain import OpenAI, ConversationChain

# create a model
llm = OpenAI(temperature=0)

conversation = ConversationChain(llm=llm, verbose=True)

conversation.predict(input="Hi There")





[1m> Entering new  chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: Hi There
AI:[0m

[1m> Finished chain.[0m


" Hi there! It's nice to meet you. How can I help you today?"

In [66]:
conversation.predict(input="What is the first thing that I said to you?")




[1m> Entering new  chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi There
AI:  Hi there! It's nice to meet you. How can I help you today?
Human: What is the first thing that I said to you?
AI:[0m

[1m> Finished chain.[0m


' You said "Hi there!"'

In [67]:
conversation.predict(input="What is an alternative for the first thing that I said to you?")




[1m> Entering new  chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi There
AI:  Hi there! It's nice to meet you. How can I help you today?
Human: What is the first thing that I said to you?
AI:  You said "Hi there!"
Human: What is an alternative for the first thing that I said to you?
AI:[0m

[1m> Finished chain.[0m


' An alternative for the first thing you said to me is "Hello!"'

---
<div style="background-color:green;color:black;text-align:center;padding:1rem;font-size:1.5rem;">LANGCHAIN COMPONENTS</div>


---
# 6. Schemas

Basic data types and schemas that are used throughout the codebase.

There are 3 types of schemas
- Text (see above)
- Prompts
- Messages 
- Document


<br/>

**Resources**
> - Schhemas component:  https://docs.langchain.com/docs/components/schema/


---
## Text

In [68]:
from langchain.llms import OpenAI

# loads the model.
# OPENAI_API_KEY is requested. Get it from the OpenAI site.
# a paid account and available units are requested to be able to place a request.
llm = OpenAI(temperature=0.9)

text = "what are the 5 best countries in Europe"

# Actual API call - may tale a while.
print(llm(text))



1. Switzerland
2. Germany
3. Norway
4. Finland
5. Austria


---
## Chat messages
Chat messages are like text with a type

There are 3 types
- System: background context that tells the AI what to do
- Human: inputs sent by the user
- AI : response of the AI


In [69]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage, AIMessage

chat = ChatOpenAI(temperature=0.7)

In [70]:
messages = [ SystemMessage(content="You are a nice AI and help users to feature out what to eat.")]
     
messages.append( HumanMessage(content="I like tuna, list some recipes.") )

In [71]:
response = chat(messages)
messages.append( AIMessage(content=response.content) )

print(response.content)

Sure! Here are some delicious tuna recipes you might enjoy:

1. Tuna Salad Sandwich: Mix canned tuna with mayonnaise, chopped celery, diced red onion, and a squeeze of lemon juice. Spread the mixture on your choice of bread, add lettuce and tomato, and enjoy a classic tuna salad sandwich.

2. Tuna Poke Bowl: Combine diced fresh tuna with soy sauce, sesame oil, rice vinegar, and a touch of honey. Serve it over a bed of steamed rice, and add toppings such as avocado, cucumber, edamame, and sesame seeds.

3. Grilled Tuna Steaks: Season fresh tuna steaks with salt, pepper, and a bit of olive oil. Grill them for a few minutes on each side until cooked to your desired level of doneness. Serve with a squeeze of lemon juice and a side of mixed greens or roasted vegetables.

4. Tuna Pasta Bake: Cook pasta according to package instructions. In a separate pan, sauté diced onion, garlic, and bell peppers. Add canned tuna, marinara sauce, and cooked pasta to the pan. Mix it all together, transfer t

In [72]:
messages.append( HumanMessage(content="show the first one.") )

response = chat(messages)
messages.append( AIMessage(content=response.content) )

print(response.content)

Certainly! Here's the recipe for a classic Tuna Salad Sandwich:

Ingredients:
- 2 cans of tuna, drained
- 1/4 cup mayonnaise
- 1 celery stalk, finely chopped
- 1/4 cup red onion, diced
- 1 tablespoon lemon juice
- Salt and pepper, to taste
- Bread slices
- Lettuce leaves
- Tomato slices

Instructions:
1. In a bowl, combine the drained tuna, mayonnaise, celery, red onion, and lemon juice. Mix well until all the ingredients are evenly combined.
2. Taste and season with salt and pepper according to your preference.
3. Take a slice of bread and spread a generous amount of the tuna mixture on it.
4. Top with lettuce leaves and tomato slices.
5. Place another slice of bread on top to complete the sandwich.
6. Repeat the process to make more sandwiches, if desired.
7. Cut the sandwiches diagonally or into halves, and they are ready to serve.

You can also customize this recipe by adding other ingredients such as chopped pickles, diced bell peppers, or a pinch of dried herbs. Enjoy your delici

---
## Examples
An list of input output pairs thet represent the input and expected output.

Used to fine tune a model or do in-context learning.

**Resources**
> - Prompt Template:  https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/few_shot_examples


In [73]:
from langchain.llms import OpenAI
from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.prompts.prompt import PromptTemplate

# loads the model.
llm = OpenAI(temperature=0.9)

# create the example set

examples = [
    { "question": "red bold", "answer": "color:red; font-style:bold;"},
    { "question": "green italic", "answer":  "color:green; font-style:italic;"},
    { "question": "blue bold", "answer":  "color:blue; font-style:bold;"},
    { "question": "pink", "answer":  "color:pink;"},
    { "question": "green", "answer":  "color:green;"},
    { "question": "pink italic", "answer":  "color:pink; font-style:italic;"}
    
]    

# Configure a formatter that will format the few shot examples into a string. 
# This formatter should be a PromptTemplate object.

example_prompt = PromptTemplate (
    input_variables=["question", "answer"], 
    template="question: {question}\n{answer}"
)

print("\n=== exemple prompt ===")
print(example_prompt.format(**examples[0]))


# Finally, create a FewShotPromptTemplate object. 
# This object takes in the few shot examples and the formatter for the few shot examples.

prompt_template = FewShotPromptTemplate(
    examples=examples, 
    example_prompt=example_prompt, 
    suffix="question: {input}", 
    input_variables=["input"]
)

prompt = prompt_template.format(input="pink bold")

print("\n=== prompt ===")
print(prompt)

print("\n=== answer ===")
print(llm(prompt))



=== exemple prompt ===
question: red bold
color:red; font-style:bold;

=== prompt ===
question: red bold
color:red; font-style:bold;

question: green italic
color:green; font-style:italic;

question: blue bold
color:blue; font-style:bold;

question: pink
color:pink;

question: green
color:green;

question: pink italic
color:pink; font-style:italic;

question: pink bold

=== answer ===

color:pink; font-style:bold;


---
## Documents

An unstructured object that conaints a pieces of text and metadatas.

<div class="alert alert-block alert-warning"> TODO  resource </div>


<div class="alert alert-block alert-warning"> TODO how to use this concept? 
make some knowledge available?
how to use metadata?
</div>


In [74]:
from langchain.schema import Document

Document(
    page_content="This is my document. it contains useful information",
    metadata={
        'author':"Claude",
        'identifier':"1234"
    }
)

Document(page_content='This is my document. it contains useful information', metadata={'author': 'Claude', 'identifier': '1234'})

---
# 7. Models
LangChain provides interfaces and integrations for two types of models:
- LLMs: Models that take a text string as input and return a text string
- Chat models: Models that are backed by a language model but take a list of Chat Messages as input and return a Chat Message

<br/>

**Resources**
> - Model Component: https://python.langchain.com/docs/modules/model_io/models/
> - List of models: https://platform.openai.com/docs/models


---
## Langage Model 
LLMs: Models that take a text string as input and return a text string

In [75]:
from langchain.llms import OpenAI

# additnal parameters to select a mode, pass the API key ...
llm = OpenAI(model_name="text-ada-001", temperature=0.7)

llm("What day comes after Friday?")

'\n\nMonday.'

---
## Chat Model 
Chat models: Models that are backed by a language model but take a list of Chat Messages as input and return a Chat 

Also make sense for a unique interaction as Chat API is less expensive.


In [76]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage, AIMessage

chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=1)

In [77]:
messages = [ 
    SystemMessage(content="You are a nice AI and help users to feature out what to eat."),
    HumanMessage(content="I like tuna, list some recipes.")
]
     
chat(messages)

AIMessage(content="Sure! Here are a few recipes featuring tuna that you might enjoy:\n\n1. Tuna Salad: Mix canned tuna, mayonnaise, chopped celery, diced red onions, and a squeeze of lemon juice. Serve it on a bed of lettuce or between two slices of freshly baked bread.\n\n2. Tuna Pasta: Cook your favorite pasta and toss it with a sauce made from canned tuna, olive oil, minced garlic, cherry tomatoes, olives, and a sprinkle of crushed red pepper flakes.\n\n3. Tuna Poke Bowl: Marinate cubes of fresh tuna in a mixture of soy sauce, sesame oil, ginger, and garlic. Serve it over a bed of rice with toppings like avocado, cucumber, seaweed, and sesame seeds.\n\n4. Tuna Melt: Spread tuna salad (from the first recipe above) onto slices of bread, top it with sliced tomatoes and cheese, and toast it in the oven until the cheese is melted and bubbly.\n\n5. Tuna Steaks: Season fresh tuna steaks with salt, pepper, and a drizzle of olive oil. Sear them in a hot pan for a few minutes on each side unt

---
### Text Embedding Model

Convert text into a series of numbers (a vector) which holds the meaning of the text.

Mainly used for text comparison.

In [78]:
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

text="A leader should know all about truth and honesty, and when to see the difference. (Truck) - Bromeliad Trilogy"

text_embedding = embeddings.embed_query(text)

print(f"embedding length: {len(text_embedding)}")
print(f"5 first values of the vector: {text_embedding[:5]}")

embedding length: 1536
5 first values of the vector: [-0.0020272971596568823, -0.016961609944701195, 0.013975410722196102, -0.014824817888438702, 0.001639920868910849]


---
# 8. prompts
A "prompt" refers to the input to the model. This input is rarely hard coded, but rather is often constructed from multiple components. A PromptTemplate is responsible for the construction of this input. LangChain provides several classes and functions to make constructing and working with prompts easy.

LangChain documentation is split into four sections:
- PromptValue: The class representing an input to a model.
- Prompt Templates: The class in charge of constructing a PromptValue.
- Example Selectors: Often times it is useful to include examples in prompts. These examples can be hardcoded, but it is often more powerful if they are dynamically selected.
- Output Parsers: Language models (and Chat Models) output text. But many times you may want to get more structured information than just text back. This is where output parsers come in. Output Parsers are responsible for (1) instructing the model how output should be formatted, (2) parsing output into the desired formatting (including retrying if necessary).

<br/>

**Resources**
> - Prompts Component: https://docs.langchain.com/docs/components/prompts/

---
## Simple prompt

In [79]:
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

# loads the model.
llm = OpenAI(model_name="text-davinci-003", temperature=0.9)

# write a simple  prompt. use """ to allow multiline string.
prompt = """
Today is Monday. Tomorrow is Wednesday.

What is wrong with this statement?
"""

# query the model
print(llm(prompt))


This statement is incorrect because tomorrow is Tuesday, not Wednesday.


---
## Prompt with template and placeholder.

In [80]:
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

# loads the model.
llm = OpenAI(model_name="text-davinci-003", temperature=0.9)

# setup a prompt. use """ to allow multiline string.
template = PromptTemplate (
    input_variables=["today", "tomorrow"],
    template="""
    Today is {today}. Tomorrow is {tomorrow}.

    What is wrong with this statement?
    """
)

prompt = template.format(today="Monday", tomorrow="Wednesday")
print(f"{prompt=}")

# query the model

print(llm(prompt))

prompt='\n    Today is Monday. Tomorrow is Wednesday.\n\n    What is wrong with this statement?\n    '

This statement is incorrect because tomorrow is Tuesday, not Wednesday.


In [81]:
prompt = template.format(today="Thursday", tomorrow="Friday")
print(f"{prompt=}")

# query the model

print(llm(prompt))

prompt='\n    Today is Thursday. Tomorrow is Friday.\n\n    What is wrong with this statement?\n    '

This statement is factually correct, so there is nothing wrong with it.


---
## Example selectors and Few Shot Learning

A way to select from a series of examples in few shot learning 

**Resources**
> - Example Selector: https://api.python.langchain.com/en/latest/modules/example_selector.html
> - Few shot learning: https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/few_shot_examples



### Example selectors and Few Shot Learning with NGram


<div class="alert alert-block alert-warning"> FIXME </div>

### Example selectors and Few Shot Learning with similarities

requires a vector database

In [82]:
from langchain.llms import OpenAI
from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.prompts.prompt import PromptTemplate
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores import Annoy
#from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

# loads the model.
llm = OpenAI(temperature=0.9)

# create the example set

examples = [
    { "question": "red bold", "answer": "color:red; font-style:bold;"},
    { "question": "green italic", "answer":  "color:green; font-style:italic;"},
    { "question": "blue bold", "answer":  "color:blue; font-style:bold;"},
    { "question": "pink", "answer":  "color:pink;"},
    { "question": "green", "answer":  "color:green;"},
    { "question": "pink italic", "answer":  "color:pink; font-style:italic;"}
    
]    

# Configure a formatter that will format the few shot examples into a string. 
# This formatter should be a PromptTemplate object.

example_prompt = PromptTemplate (
    input_variables=["question", "answer"], 
    template="question: {question}\n{answer}"
)

print("\n=== exemple prompt ===")
print(example_prompt.format(**examples[0]))

# Example selector that selects examples based on SemanticSimilarity.

example_selector = SemanticSimilarityExampleSelector.from_examples(
    # This is the list of examples available to select from.
    examples,
    # This is the embedding class used to produce embeddings which are used to measure semantic similarity.
    OpenAIEmbeddings(),
    # This is the VectorStore class that is used to store the embeddings and do a similarity search over.
    #Chroma,
    Annoy,
    # This is the number of examples to produce.
    k=2
)

# Finally, create a FewShotPromptTemplate object. 
# This object takes in the few shot examples and the formatter for the few shot examples.

prompt_template = FewShotPromptTemplate(
    example_selector=example_selector, 
    example_prompt=example_prompt, 
    suffix="question: {input}", 
    input_variables=["input"]
)

prompt = prompt_template.format(input="pink bold")

print("\n=== prompt ===")
print(prompt)

print("\n=== answer ===")
print(llm(prompt))



=== exemple prompt ===
question: red bold
color:red; font-style:bold;

=== prompt ===
question: red bold
color:red; font-style:bold;

question: pink italic
color:pink; font-style:italic;

question: pink bold

=== answer ===

color:pink; font-style:bold;


---
## Output Parser and response format

A way to format the outpu
- Format nstructions: An autogenerated prompt telling how the result should be formatted
- parser: a method which will extract the output int hte desired format. you may prvie a custom parser


**Resources**
> - OutputParser:https://docs.langchain.com/docs/components/prompts/output-parser

In [134]:
from langchain.llms import OpenAI
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.schema import HumanMessage, SystemMessage, AIMessage
from langchain.prompts.prompt import PromptTemplate


# loads the model.
llm = OpenAI(model_name="text-davinci-003", temperature=0.9)

# how you would like the response to be structured
# periods at the send of sentence are required. 
# If not there description ends up in the json text and break the JSON format
response_schemas = [
    ResponseSchema(name="bad_string", description="This is a poorly formatted string."),
    ResponseSchema(name="good_string", description="This is a your string reformatted.")
]

# How you would like to parse your output
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

# check instructions
format_instructions =output_parser.get_format_instructions()
print("\nformat_instructions")      
print(format_instructions)      

template = """
You will be given a poorly formatted string from a user. 
Reformat it and make sure all the words are spelled correctly.


{format_instructions}

% USER_INPUT:
{user_input}

YOUR RESPONSE:
"""

prompt_template = PromptTemplate(
    input_variables=['user_input'],
    partial_variables={'format_instructions': format_instructions},
    template=template
)

# format the user input as a prompt
# for whateveer reason it does not work well with format.
# format_promt retruns an object, not a string and should be converted to a string 
prompt = prompt_template.format_prompt(user_input="Wellcom to Californya!").to_string()
print("\nprompt")
print(prompt)

# gets the response
response = llm(prompt)
print("\nresponse=")      
print(response)      

# gets the JSON document
print("\nparsed output=")     

# comma sometimes missing
#response.replace('"good_string"',',"good_string"')

output_parser.parse(response)                   



format_instructions
The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"bad_string": string  // This is a poorly formatted string.
	"good_string": string  // This is a your string reformatted.
}
```

prompt

You will be given a poorly formatted string from a user. 
Reformat it and make sure all the words are spelled correctly.


The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"bad_string": string  // This is a poorly formatted string.
	"good_string": string  // This is a your string reformatted.
}
```

% USER_INPUT:
Wellcom to Californya!

YOUR RESPONSE:


response=
```json
{
	"bad_string": "Wellcom to Californya!",
	"good_string": "Welcome to California!"
}
```

parsed output=


{'bad_string': 'Wellcom to Californya!',
 'good_string': 'Welcome to California!'}

---
# 9. Indexes

Indexes refer to ways to structure documents so that LLMs can best interact with them. This module contains utility functions for working with documents, different types of indexes, and then examples for using those indexes in chains.

LangChain documentation is split into four sections:

- Document Loaders: Classes responsible for loading documents from various sources.
- Text Splitters: Classes responsible for splitting text into smaller chunks.
- VectorStores: The most common type of index. One that relies on embeddings.
- Retrievers: Interface for fetching relevant documents to combine with language models.

<br/>

**Resource**
> - Indexes Component: https://docs.langchain.com/docs/components/indexing/


**Instructions**

For the example below, make sure that:
- a vector database client is installed

---
## Document Loaders

Easy ways to import documents from other sources 
and make it available for use in your language models.

**Resources**
> -  Document Loaders: https://python.langchain.com/docs/modules/data_connection/document_loaders
> - List of loaders: https://github.com/hwchase17/langchain/tree/master/langchain/document_loaders

In [141]:
from langchain.document_loaders import HNLoader
 
# Setup a Hacker News loader
loader = HNLoader("https://news.ycombinator.com/item?id=34422627")
 
data = loader.load()
 
print(f"Found {len(data)} comments")


sample = '\n'.join([x.page_content[:100] for x in data[:2]])
print("\nHere's a sample (first 100 chars of the 3 first items)")
print(sample)
                 

Found 76 comments

Here's a sample (first 100 chars of the 3 first items)
Ozzie_osman 5 months ago  
             | next [–] 

LangChain is awesome. For people not sure what 
Ozzie_osman 5 months ago  
             | parent | next [–] 

Also, another library to check out is 


---
## Text Splitters

allow you to split a document into smaller chunk

<div class="alert alert-block alert-warning"> TODO  resource </div>


In [88]:
from langchain.document_loaders import BSHTMLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
 
# This is a long document we can split up.    
document_path = "data/Alice's Adventures in Wonderland, by Lewis Carroll.html"
 
# Setup a HTML loader
loader = BSHTMLLoader(document_path)
documents = loader.load()

print(f"Found {len(documents)} document(s)")


print("docuument content")
start = 2200
print(documents[0].page_content[start-200:start+300])

 
# The recommended TextSplitter is the RecursiveCharacterTextSplitter. 
# This will split documents recursively by different characters - starting with "\n\n", then "\n", then " ".
# This is nice because it will try to keep all the semantically relevant content in the same place 
# for as long as possible.
# Important parameters to know here are chunkSize and chunkOverlap. 
# chunkSize controls the max size (in terms of number of characters) of the final documents. 
# chunkOverlap specifies how much overlap there should be between chunks. 
# in practice they default to 4000 and 200 respectively.
text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size=200,
    chunk_overlap=20,
)
 
texts = text_splitter.create_documents([document[0].page_content])
 
print(f"\nSplitted into {len(texts)} parts")
 
print("Preview:")
i = int(start/150)
print(texts[i+1].page_content, "\n-")
print(texts[i+2].page_content, "\n-")
print(texts[i+3].page_content)


Found 1 document(s)
docuument content
ght Alice, "without pictures or
conversations?"
So she was considering in her own mind (as well as she could, for the
day made her feel very sleepy and stupid), whether the pleasure of
making a daisy-chain would be worth the trouble of getting up and
picking the daisies, when suddenly a White Rabbit with pink eyes ran
close by her.
There was nothing so very remarkable in that, nor did Alice think it so
[Pg 4]very much out of the way to hear the Rabbit say to itself, "Oh dear! Oh
dear! I shall be

Splitted into 478 parts
Preview:
it, "and what is the use of a book," thought Alice, "without pictures or
conversations?"
So she was considering in her own mind (as well as she could, for the 
-
day made her feel very sleepy and stupid), whether the pleasure of
making a daisy-chain would be worth the trouble of getting up and 
-
picking the daisies, when suddenly a White Rabbit with pink eyes ran
close by her.
There was nothing so very remarkable in that, 

---
## Vextor Store and Retrievers 
A retriever is an interface that returns documents given an unstructured query. 

A retriever does not need to be able to store documents, only to return (or retrieve) it. 

It usually relies to a vector store as a document management backbone.

A vector store is a particular type of database optimized for storing documents and their embeddings, and then fetching of the most relevant documents for a particular query, ie. those whose embeddings are most similar to the embedding of the query.

- local : ChromaDB, FAISS, Annoy
- Online: Pinecone, Weaviate

However a retriever is more general than a vector store and there are other types of retrievers as well, e.g. Wikipedia or search engines like Elestic Search or Kendra.


Question answering over documents consists of four steps:
1. Create an index
2. Create a Retriever from that index
3. Create a question answering chain
4. Ask questions

<br/>

**Resources**
> - Lit of retrievers: https://python.langchain.com/docs/modules/data_connection/retrievers/
> - LangChain Supported VectorStores: https://api.python.langchain.com/en/latest/modules/vectorstores.html
> - Retrievers: https://github.com/hwchase17/langchain/tree/master/langchain/retrievers

### Store document in a Vector Store and retrieve information

In [89]:
from langchain.document_loaders import BSHTMLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Annoy
from langchain.embeddings import OpenAIEmbeddings
 
# This is the source document.    
document_path = "data/Alice's Adventures in Wonderland, by Lewis Carroll.html"
 
# Setup a HTML loader
loader = BSHTMLLoader(document_path)
documents = loader.load()

print(f"Found {len(documents)} document(s)")


# Get your splitter ready
# Using small chunk for the sake of example. 
# in practice they default to 4000 and 200 respectively.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=25)
 
# Split your docs into texts
texts = text_splitter.split_documents(documents)

print(f"\nSplitted into {len(texts)} parts")

# Get embedding engine ready
embeddings = OpenAIEmbeddings()
 
# Embedd your texts andd store them in the vector database
# dtabase is in memory. it might be savecd to a file and loader later on.
db = Annoy.from_documents(texts, embeddings)

Found 1 document(s)

Splitted into 182 parts


In [90]:
# Init a retriever for this db
retriever = db.as_retriever()

# retrieve indexed documents relevant for the query
query = "who is the White Rabbit?"
docs = retriever.get_relevant_documents(query)

print(f"\nFound {len(docs)}")

samples = "\n\n".join([x.page_content[:200] for x in docs[:5]])
print(samples)


Found 4
the White Rabbit was still in sight, hurrying down it. There was not a
moment to be lost. Away went Alice like the wind and was just in time to
hear it say, as it turned a corner, "Oh, my ears and whi

IV—THE RABBIT SENDS IN A LITTLE BILL
It was the White Rabbit, trotting slowly back again and looking
anxiously about as it went, as if it had lost something; Alice heard it
muttering to itself, "The D

"Call the first witness," said the King; and the White Rabbit blew three
blasts on the trumpet and called out, "First witness!"
The first witness was the Hatter. He came in with[Pg 44] a teacup in one

you please, sir—" The Rabbit started violently, dropped the white
kid-gloves and the fan and skurried away into the darkness as hard as he
could go.
Alice took up the fan and gloves and she kept fanni


In [91]:
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Asking theLLM
# the response will be based on the retrieved documents 
qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=retriever)

qa.run(query)

" The White Rabbit is a character in Alice's Adventures in Wonderland by Lewis Carroll."

In [92]:
qa.run(query)

" The White Rabbit is a character in Alice's Adventures in Wonderland by Lewis Carroll."

### One line index creation and information retrieval

In [93]:
from langchain.document_loaders import BSHTMLLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Annoy
from langchain.embeddings import OpenAIEmbeddings
from langchain.indexes import VectorstoreIndexCreator

# This is the source document.    
document_path = "data/Alice's Adventures in Wonderland, by Lewis Carroll.html"
 
# Setup a HTML loader
loader = BSHTMLLoader(document_path)

# creating an indexer
# default to Chroma as a vector database
# Use CharacterTextSplitter. May also be RecursiveCharacterTextSplitter.
index_creator = VectorstoreIndexCreator(
    vectorstore_cls=Annoy,
    embedding=OpenAIEmbeddings(),
    text_splitter=CharacterTextSplitter(chunk_size=500, chunk_overlap=0)
)

index = index_creator.from_loaders([loader])

# retrieve indexed documents relevant for the query
query = "who is the White Rabbit?"
index.query(query)

print(f"\nFound {len(docs)}")

samples = "\n\n".join([x.page_content[:200] for x in docs[:5]])
print(samples)

Created a chunk of size 5268, which is longer than the specified 500
Created a chunk of size 1756, which is longer than the specified 500
Created a chunk of size 5973, which is longer than the specified 500
Created a chunk of size 2017, which is longer than the specified 500
Created a chunk of size 1972, which is longer than the specified 500
Created a chunk of size 1105, which is longer than the specified 500
Created a chunk of size 2369, which is longer than the specified 500
Created a chunk of size 4726, which is longer than the specified 500
Created a chunk of size 3384, which is longer than the specified 500
Created a chunk of size 2193, which is longer than the specified 500
Created a chunk of size 4590, which is longer than the specified 500
Created a chunk of size 1344, which is longer than the specified 500
Created a chunk of size 1672, which is longer than the specified 500
Created a chunk of size 3707, which is longer than the specified 500
Created a chunk of size 2612, whic


Found 4
the White Rabbit was still in sight, hurrying down it. There was not a
moment to be lost. Away went Alice like the wind and was just in time to
hear it say, as it turned a corner, "Oh, my ears and whi

IV—THE RABBIT SENDS IN A LITTLE BILL
It was the White Rabbit, trotting slowly back again and looking
anxiously about as it went, as if it had lost something; Alice heard it
muttering to itself, "The D

"Call the first witness," said the King; and the White Rabbit blew three
blasts on the trumpet and called out, "First witness!"
The first witness was the Hatter. He came in with[Pg 44] a teacup in one

you please, sir—" The Rabbit started violently, dropped the white
kid-gloves and the fan and skurried away into the darkness as hard as he
could go.
Alice took up the fan and gloves and she kept fanni


In [94]:
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Ask the question to the model 
# the response will be based on the retrieved documents 
qa = RetrievalQA.from_chain_type(llm=OpenAI(), 
                                 chain_type="stuff", 
                                 retriever=index.vectorstore.as_retriever())

qa.run(query)

" The White Rabbit is a character in Alice's Adventures in Wonderland. He is a white rabbit wearing a waistcoat and carrying a watch. He is in a hurry and is looking for the Duchess' fan and gloves."

In [95]:
qa.run(query)

" The White Rabbit is a character in Alice's Adventures in Wonderland. He is a white-furred rabbit with pink eyes who wears a waistcoat and carries a pocket watch. He is known for being late and speaks in a hurried and anxious manner."

---
## Wikipedia retriever


<div class="alert alert-block alert-warning"> TODO </div>


---
# 9. Memory


Memory is the concept of storing and retrieving data in the process of a conversation. 

There are two main methods:
- Based on input, fetch any relevant pieces of data
- Based on the input and output, update state accordingly

There are two main types of memory: short term and long term.
- Short term memory generally refers to how to pass data in the context of a singular conversation (generally is previous ChatMessages or summaries of them).
- Long term memory deals with how to fetch and update information between conversations.

<br/>

**Resource**
> - Memory Component: https://docs.langchain.com/docs/components/memory/
> - Chat Message History: https://docs.langchain.com/docs/components/memory/chat_message_history
> - [LangChain: Enhancing Performance with Memory Capacity](https://towardsdatascience.com/langchain-enhancing-performance-with-memory-capacity-c7168e097f81)


<div class="alert alert-block alert-warning"> TODO vs Conversation and buffer memory (check blog)?</div>


<div class="alert alert-block alert-warning"> TODO Long term memory</div>


In [116]:
from langchain.memory import ChatMessageHistory
from langchain.chat_models import ChatOpenAI
from pprint import pprint
 
chat = ChatOpenAI(temperature=0)
 
history = ChatMessageHistory()
 
history.add_ai_message("hi!")
 
history.add_user_message("what is the capital of france?")

#After adding messages to the history, you can pass this history to the language model 
#to generate context-aware responses:

ai_response = chat(history.messages)
history.add_ai_message(ai_response.content)

print(f"{ai_response=}")
print(f"\nhistory.messages:")
pprint(history.messages, compact=False)

ai_response=AIMessage(content='The capital of France is Paris.', additional_kwargs={}, example=False)

history.messages:
[AIMessage(content='hi!', additional_kwargs={}, example=False),
 HumanMessage(content='what is the capital of france?', additional_kwargs={}, example=False),
 AIMessage(content='The capital of France is Paris.', additional_kwargs={}, example=False)]


In [117]:
history.add_user_message("what is the population os this city?")

ai_response = chat(history.messages)
history.add_ai_message(ai_response.content)

print(f"{ai_response.content=}")
print(f"\nhistory.messages:")
pprint(history.messages, compact=False)

ai_response.content='As of 2021, the estimated population of Paris is around 2.2 million people. However, if you include the metropolitan area, the population is over 10 million, making it one of the most populous cities in Europe.'

history.messages:
[AIMessage(content='hi!', additional_kwargs={}, example=False),
 HumanMessage(content='what is the capital of france?', additional_kwargs={}, example=False),
 AIMessage(content='The capital of France is Paris.', additional_kwargs={}, example=False),
 HumanMessage(content='what is the population os this city?', additional_kwargs={}, example=False),
 AIMessage(content='As of 2021, the estimated population of Paris is around 2.2 million people. However, if you include the metropolitan area, the population is over 10 million, making it one of the most populous cities in Europe.', additional_kwargs={}, example=False)]


---
# 10. Chains
Chains is a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.


Example:
- chaining LLM and tool
- summariztion chain

<br/>

**Resources**
> - Chain Component: https://docs.langchain.com/docs/components/chains/


<div class="alert alert-block alert-warning"> TODO index related chain https://docs.langchain.com/docs/components/chains/index_related_chains  </div>




## Simple sequential model

A Simple Sequential Chain helps break up tasks to avoid language models getting distracted, confused, or hallucinating when asked to perform too many tasks in a row.

In this example, the chain first receives the user location (Rome) and outputs a classic dish from Rome. Then, it provides a simple recipe for that classic dish. The verbose=True parameter ensures that the chain prints statements during its execution, making it easier to debug and understand the chain’s progress.

In [118]:
from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains import SimpleSequentialChain
 
llm = OpenAI(temperature=1)
 
# Step 1 - dish for location

template = """
Your job is to come up with a classic dish from the area that the users suggests. 

% USER LOCATION {user_location} 

YOUR RESPONSE: 
"""
prompt_template = PromptTemplate(input_variables=["user_location"], template=template)

location_chain = LLMChain(llm=llm, prompt=prompt_template)
 
    # Step 2 - Recipe
template = """
Given a meal, give a short and simple recipe on how to make that dish at home. 

% MEAL {user_meal} 

YOUR RESPONSE: 
"""
prompt_template = PromptTemplate(input_variables=["user_meal"], template=template)
 
meal_chain = LLMChain(llm=llm, prompt=prompt_template)

# chain the steps
# set verbose to True to check what happes
overall_chain = SimpleSequentialChain(chains=[location_chain, meal_chain], verbose=False)
 
review = overall_chain.run("Rome")



[1m> Entering new  chain...[0m
[36;1m[1;3mA classic dish from Rome is Spaghetti Carbonara. This dish consists of spaghetti pasta tossed with a mixture of pancetta, eggs, cheese, pepper, and parsley.[0m
[33;1m[1;3m
Spaghetti Carbonara
Ingredients: 
-4 ounces of diced pancetta 
-1/3 cup of Parmesan cheese
-2 eggs
-ground black pepper
-2 tablespoons of chopped parsley

Instructions: 
1. Cook the pancetta in a large skillet over a medium heat until it is slightly browned. 
2. Meanwhile, cook the spaghetti according to package instructions. 
3. In a medium bowl, whisk together the eggs and Parmesan cheese until the mixture is creamy. 
4. Add the cooked pancetta to the egg mixture and season with pepper. 
5. Drain the cooked spaghetti and put it back into the pot. 
6. Add the egg and pancetta mixture to the pot and toss everything together until all of the noodles are coated with the egg mixture. 
7. Garnish with chopped parsley and additional Parmesan cheese, if desired. 
8. Enjoy!

## Summarization Chain

The Summarization Chain breaks the text into smaller chunks and summarizing each chunk, creating a final summary based on the individual summaries.

In this example, the chain first splits the essay into chunks of 700 characters. It then generates summaries for each chunk and creates a final concise summary based on these individual summaries.

In [120]:
from langchain.chains.summarize import load_summarize_chain
from langchain.document_loaders import BSHTMLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
 
# This is the source document.    
document_path = "data/Alice's Adventures in Wonderland, by Lewis Carroll.html"
 
# Setup a HTML loader
loader = BSHTMLLoader(document_path)
documents = loader.load()
 
# Get your splitter ready
text_splitter = RecursiveCharacterTextSplitter(chunk_size=700, chunk_overlap=50)
 
# Split your docs into texts
# only kept first 10 000 characters of the document to save computing
texts = text_splitter.split_documents(documents[:1000])
 
# There is a lot of complexity hidden in this one line. 
# the attribute map_reduce instruct the chain to 
# - first apply the model to each chunck (map stage) 
# - then all map results and apply the model (reduce stage)
chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=True)
summary = chain.run(texts)

# save the final summary
with open('alice_summary_10000.py', 'w') as file:
    file.write(summary)
    
print(summary)



[1m> Entering new  chain...[0m


[1m> Entering new  chain...[0m
Prompt after formatting:
[32;1m[1;3mWrite a concise summary of the following:


"The Project Gutenberg eBook of Alice's Adventures in Wonderland, by Lewis Carroll
    























The Project Gutenberg eBook of Alice's Adventures in Wonderland

This ebook is for the use of anyone anywhere in the United States and most other parts of the world at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this ebook or online at www.gutenberg.org. If you are not located in the United States, you’ll have to check the laws of the country where you are located before using this eBook.

Title: Alice's Adventures in Wonderland"


CONCISE SUMMARY:[0m
Prompt after formatting:
[32;1m[1;3mWrite a concise summary of the following:


"Title: Alice's Adventures in Wonderland

Author: Lewis Carroll
Illustrator: Gordon 

**OUTPUT**
 
Some map summaries

> Alice hears the White Rabbit muttering to itself, concerned that the Duchess will execute it for losing the fan and pair of white kid-gloves. Alice offers to help the Rabbit search for them, but they are nowhere to be found because everything has changed since Alice's dip in the pool.

> Alice meets a Rabbit who accuses her of being his housemaid Mary Ann and orders her to fetch his gloves and fan. She finds a neat little house with the Rabbit's name on a brass plate and goes in without knocking. She is afraid of meeting the real Mary Ann before she can find the fan and gloves.

> Alice finds her way into a room with a table in the window, containing a fan and some gloves. She notices a bottle and drinks from it, hoping it will make her grow large again. When she drinks half of the bottle she finds her head pressing against the ceiling, so she hastily puts it down.

> A character wishes she wouldn't grow anymore, but sadly she continues to grow rapidly. As a result, she kneels on the floor, puts her arm out the window and her foot up the chimney, and is uncertain of her fate.

 
Final summary

> In Lewis Carroll's Alice's Adventures in Wonderland, Alice follows a White Rabbit into a strange world and has to navigate unexpected events and peculiar characters. She eventually meets a Caterpillar who helps her regain control of her changing size. Project Gutenberg is a non-profit organization committed to making electronic books free to the public. Donations up to $5,000 are available, and the full license stipulates amounts and terms of use.

## Summarize stored documents

<div class="alert alert-block alert-warning"> TODO  make use of the vector db</div>

# 11. Agents

LangChain define agents as decision making engines:
> Some applications will require not just a predetermined chain of calls to LLMs/other tools, but potentially an unknown chain that depends on the user's input. In these types of chains, there is a “agent” which has access to a suite of tools. Depending on the user input, the agent can then decide which, if any, of these tools to call.

It splits the documentation into the following sections:
> - Tools: How language models interact with other resources.
> - Agents: The language model that drives decision making.
> - Toolkits: Sets of tools that when used together can accomplish a specific task.
> - Agent Executor: The logic for running agents with tools.


**Resources**
> - Agents: https://docs.langchain.com/docs/components/agents/

<div class="alert alert-block alert-warning"> TODO </div>

## Tool
Tools are interfaces an agent can call to interact with other services

**Resources**
> - Tools: https://python.langchain.com/docs/modules/agents/tools/

**Instructions**

For the example below, make sure that:
- Google API client is installed
- a Custome Search Engine is available (CSE)
- the API key has been setup up

In [126]:
from langchain.tools import Tool
from langchain.utilities import GoogleSearchAPIWrapper

search = GoogleSearchAPIWrapper()

tool = Tool(
    name="Google Search",
    description="Search Google for recent results.",
    func=search.run,
)

tool.run("Who is the French Prime Minister name since May 2022?")

"Élisabeth Borne has served as Prime Minister since 16 May 2022. Fifth Republic recordsEdit. Length of the successive governments\xa0... May 16, 2022 ... May 16, 2022. PARIS — President Emmanuel Macron appointed Élisabeth Borne, the low-key minister of labor and a former minister of the\xa0... May 16, 2022 ... President Emmanuel Macron has named Labour Minister Elisabeth Borne as prime minister to lead his ambitious reform plans, the first woman to\xa0... May 16, 2022 ... Outgoing French Prime Minister Jean Castex welcomes newly named Prime Minister Elisabeth Borne Monday, May 16, 2022, at the Prime Minister\xa0... Elisabeth Borne has been named the new Prime Minister of France, the first time in 30 ... From Simon Bouvier, CNN. Updated 8:15 PM EDT, Mon May 16, 2022. Apr 25, 2022 ... Analysts suggest Macron may name Élisabeth Borne, the minister for work, ... Borne would be the second female French PM after Édith Cresson,\xa0... Borne, 61, the labor minister in French President Emmanuel

## Agent leveraging tools

Google Search and LLM-math are predefined tools:
- LLM-Math is a langage model trained to do math logic.
- Google)search tool allow to place queries on Google Search

In [129]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.llms import OpenAI

# create a model
llm = OpenAI(temperature=0)

# load some tools
tools = load_tools(["google-search", "llm-math"], llm=llm)

# setup an agent
agent = initialize_agent(tools, 
                         llm, 
                         agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, 
                         verbose=True)


In [130]:
agent.run("How many Teslas have been sold in 2022. Multiple by 2")



[1m> Entering new  chain...[0m
[32;1m[1;3m I need to find out how many Teslas have been sold in 2022
Action: google_search
Action Input: "how many Teslas have been sold in 2022"[0m
Observation: [36;1m[1;3mApr 15, 2023 ... Tesla total revenue for 2022 was 81,462 billion USD. We show it from 2018 – 2022. Tesla annual revenue 2018 - 2022. Year, Annual ... Jun 7, 2023 ... How many Tesla vehicles were delivered in 2023? ... As of June 2022, Tesla was the most valuable brand within the global automotive sector. Jan 25, 2023 ... The Model 3 and Model Y make up around 95% of the 1.31 million Teslas sold in 2022. Tesla. Tesla's finished 2022 on a tear, bolstered by recent ... Jan 7, 2023 ... Overall, Tesla reported delivering about 1.25 million Model Y and Model 3 vehicles globally in 2022. The Model 3 ranked 13th in sales at 211,641 ... Jan 3, 2023 ... The electric automaker delivered 1.3 million vehicles in 2022, up 40% from 2021. It produced nearly 1.4 million vehicles, up 47% from 

'2,620,000 Teslas were sold in 2022.'

In [131]:
agent.run("Multiply by 2 the population of the capital of Frannce")



[1m> Entering new  chain...[0m
[32;1m[1;3m I need to find the population of the capital of France first
Action: google_search
Action Input: population of the capital of France[0m
Observation: [36;1m[1;3mParis is the capital and most populous city of France, with an official estimated population of 2,102,650 residents as of 1 January 2023 in an area of more ... France, officially French Republic, French France or République Française, country of northwestern Europe. Historically and culturally among the most ... France officially the French Republic is a country located primarily in Western Europe. ... France is a unitary semi-presidential republic with its capital in Paris, ... 4 days ago ... Paris, city and capital of France, situated in the north-central part of the country. People were living on the site of the present-day city ... Including population figures, maps and links to official or near official ... Buildings of European Capitals: Athens, Greece; London, UK; Paris,

'The population of the capital of France multiplied by 2 is 4205300.'

In [132]:
agent.run("""Who is the current prime minister of France. 
Is he or she younger than the President?""") 



[1m> Entering new  chain...[0m
[32;1m[1;3m I need to find out who the current prime minister is and then compare their age to the President.
Action: google_search
Action Input: "current prime minister of France"[0m
Observation: [36;1m[1;3mPresentEdit. Élisabeth Borne has served as Prime Minister since 16 May 2022. Fifth Republic records ... May 16, 2022 ... Who is France's new Prime Minister Elisabeth Borne? French President Emmanuel Macron picked Labour Minister Elisabeth Borne as his new prime ... The current Prime Minister of France is Élisabeth Borne. She was given the job by President Emmanuel Macron on 16 May 2022. May 16, 2022 ... President Emmanuel Macron has named Labour Minister Elisabeth Borne as prime minister to lead his ambitious reform plans, the first woman to ... Feb 8, 2023 ... France's prime minister, Élisabeth Borne, sat on a recent, rainy evening in a dim room at a Red Cross shelter, listening to young women ... May 2, 2014 ... On the recommendation of the

'The current Prime Minister of France is Élisabeth Borne, who is 61 years old. The President of France is Emmanuel Macron, who is 44 years old. Therefore, the Prime Minister is older than the President.'

In [135]:
if False:
    # too complex
    # either fails because it tries to add dates and nulber
    # or give weird results like
    # 'Élisabeth Borne will be 70 in the year 2215.'
    agent.run("""Who is the current prime minister of France. 
    When will he or she be 70?""") 



[1m> Entering new  chain...[0m
[32;1m[1;3m I need to find out who the current prime minister is and when they will be 70.
Action: google_search
Action Input: "current prime minister of France"[0m
Observation: [36;1m[1;3mPresentEdit. Élisabeth Borne has served as Prime Minister since 16 May 2022. Fifth Republic records ... May 16, 2022 ... Who is France's new Prime Minister Elisabeth Borne? French President Emmanuel Macron picked Labour Minister Elisabeth Borne as his new prime ... The current Prime Minister of France is Élisabeth Borne. She was given the job by President Emmanuel Macron on 16 May 2022. May 16, 2022 ... President Emmanuel Macron has named Labour Minister Elisabeth Borne as prime minister to lead his ambitious reform plans, the first woman to ... Feb 8, 2023 ... France's prime minister, Élisabeth Borne, sat on a recent, rainy evening in a dim room at a Red Cross shelter, listening to young women ... May 2, 2014 ... On the recommendation of the Prime Minister, Pr

'Élisabeth Borne will be 70 in the year 2215.'