# LangChain Demo

## What is LangChain?

LangChain is an open source framework that allows AI developers to combine Large Language Models (LLMs) with external data. 

**Resources**

> LangChain resources
> - Landpage: https://readthedocs.org/projects/langchain/db2d
> - Comonents: https://docs.langchain.com/docs/category/components
> - git: https://github.com/hwchase17/langchain.git
> - API Reference: https://api.python.langchain.com/en/latest/

> LangChain applications
> - [LangChain Awesome](https://github.com/kyrolabs/awesome-langchain)

> This notebook is largely based on Greg Kamradt's videos and cookbooks
> - [Langchain tuorial suite](https://www.youtube.com/playlist?list=PLqZXAkvF1bPNQER9mLmDbntNfSpzdDIU5)
> - [LangChain cookbooks](https://github.com/gkamradt/langchain-tutorials)

> Additonal resources and tutorial
> - [Cookbook Comprehensive Guide](https://nathankjer.com/introduction-to-langchain/)
> - [A Gentle Intro to Chaining LLMs, Agents, and utils via LangChain](https://towardsdatascience.com/a-gentle-intro-to-chaining-llms-agents-and-utils-via-langchain-16cd385fca81)

## This notebook

This notebook collects Python examples. The chapters are based oo the LangChain compoents documented here https://docs.langchain.com/docs/category/components.

Some changes though:
- use Annoy instead of FAISS as a vector database
- use Google Search API instead of SerpAPI
- change in examples and additional examples 
- change in API keys setup



This notebook has been tested in June 2023 on AWS SageMaker using DataScience 3.0 image.

Test environment:
> - AWS SageMaker Studio's notebook 
>> - Kernel image Data Science 3.0
>> - t3.medium 2CPU - 4GB
>> - Python 3.9.15
>> - Linux default 4.14.304-226.531.amzn2.x86_64
> - installed packages:
>> - langchain 0.0.218
>> - openai 0.27.8
>> - google_api_python_client 2.90.0
>> - tikitoken 0.4.0



More examples in dedicated notebooks in the same folder
- tests-large-documents
- tests-vdb-chroma
- tests-sql

---
<div style="background-color:green;color:black;text-align:center;padding:1rem;font-size:1.5rem;">NOTEBOOK SETUP</div>



**Instructions**

All setups are at the top of the notebook so that you can run all this section initialize the notebook.

Notebook chapters are not dependant on each other and may be run in isolation.

Before running the setup you may need to create the following resources
- request an OpenAI API keys. OpenAI APIs are not free.

Additonal requirements for some examples
- create a Custom Search Engine in Google Search. it is free.
- request an API key for the Google Search service. It is free.
- request a Kaggle API

Confer to the setup sections for instruction on how to create those resources.

---
## API keys and environment

Langchain will get the API keys from environment variables or function parameters.

**Instructions**

- Never show the keys in shared notebooks, whether it part of the code or a log. A simple way to avoid key leakage, is to use environement variables.  You set the environment variable in the terminal or some local configuration. If so you do not have to set the key here.

- If it is easier for you to set the key here by assigning the value, do not forget to empty the string right after you run this block. The environment will be kept in memory as long as the kernel runs.

- Be careful when printing the keys. Ensure that you remove the outputs. 

- Before sharing check that the keys are not printed out by some features of the libraries. Avoid to print libraries' objects. They often hold the API keys as a property and may disclose the key value.


I Store API keys and configuration information in AWS Secrets Manager. The code below retrieves the secret holding the keys. The secret is a JSON string consisting in key/value pairs. It will be used later to set various environnement variables.

When using Notebooks and SageMaker do not forget to give permissions to read this secret to SageMaker execution role.

In [48]:
!apt-get update && apt-get install -y jq 1>/dev/null

Hit:1 http://deb.debian.org/debian bullseye InRelease
Hit:2 http://deb.debian.org/debian-security bullseye-security InRelease
Hit:3 http://deb.debian.org/debian bullseye-updates InRelease
Reading package lists... Done


In [49]:
%%bash --out secrets 
# using AWS's Secret Manager to store keys
# garb the keys and store it into a Pytthon variable
export RESPONSE=$(aws secretsmanager get-secret-value --secret-id 'salvia/labbench/tests' )
export SECRETS=$( echo $RESPONSE | jq '.SecretString | fromjson')

echo $SECRETS

---
## pip upgrade

In [50]:
!pip install --upgrade pip  1>/dev/null

[0m

---
## LangChain Setup

**Resources**
> - [LangChain GetStarted](https://python.langchain.com/docs/get_started/quickstart)

In [51]:
!pip install langchain==0.0.230 1>/dev/null

[0m

---
## OpenAI Setup

**Resources**
> - [OpenAI tutorial on API keys](https://platform.openai.com/docs/quickstart)
> - [OpenAI package on Pypi](https://pypi.org/project/openai/)

In [52]:
import os

os.environ["OPENAI_API_KEY"] = eval(secrets)["OPENAI_API_KEY"]


In [53]:
!pip install openai==0.27.8 1>/dev/null

[0m

---
## Google Search setup

**Resources**

> How to configure the Google search in LangChain 
> - https://python.langchain.com/docs/ecosystem/integrations/google_search

> Custom Search Engine configuration 
> - https://stackoverflow.com/questions/37083058/programmatically-searching-google-in-python-using-custom-search

> CSE API 
> - repo: https://github.com/google/google-api-python-client
> - more info: https://developers.google.com/api-client-library/python/apis/customsearch/v1
> - complete docs: https://api-python-client-doc.appspot.com/

> Get an API key
> - https://developers.google.com/custom-search/v1/introduction

> Package information
> - [Google API client package on Pypi](https://pypi.org/project/google-api-python-client/)

In [54]:
# Unlock the API and get a key 
os.environ["GOOGLE_API_KEY"] = eval(secrets)["GOOGLE_API_KEY"]
# Create or use an existing Custom Search Engine
# on the CSE page under Searcg Engone ID
os.environ["GOOGLE_CSE_ID"] = eval(secrets)["GOOGLE_CSE_ID"]


In [55]:
!pip install google-api-python-client==2.90.0 1>/dev/null

[0m

---
## Setup Annoy as a vector database 

Some examples requires a Vector Database (document selector, document retrieval).

LangChain use ChromaDB by default. For whatever reason it failed to install. Used Annoy instead. An alterntive is FAIIS. You may also want to use online Vector database like Pinecone or Weaviate. 

Most of these packages include c++ code and requires GCC at the install time. It is not included in SageMaker DataScience 3 image. So the first step is installing GCC. 

NOTE: Annoy is read-only - once the index is built you cannot add any more emebddings.

<br/>

**Resources**
> - [Annoy package on Pypi](https://pypi.org/project/annoy/)

Install GCC C++ compiler as a prerequiite

In [56]:
!apt-get update && apt-get install -y build-essential 1>/dev/null

Hit:1 http://deb.debian.org/debian bullseye InRelease
Hit:2 http://deb.debian.org/debian-security bullseye-security InRelease
Hit:3 http://deb.debian.org/debian bullseye-updates InRelease
Reading package lists... Done


In [57]:
pip install annoy==1.17.3 1>/dev/null

[0mNote: you may need to restart the kernel to use updated packages.


In [58]:
#!pip install chromadb==0.3.27 1>/dev/null

---
## SQL database setup

- sqlite3: db engine
- sqlalchemy: ORM for databases
- ipython-sql: SQL magic function
- pandas:  data science/data analysis

In [59]:
!pip install pysqlite3==0.5.1 1>/dev/null

[0m

In [60]:
!pip install pandas==1.4.4 1>/dev/null

[0m

In [61]:
!pip install sqlalchemy==2.0.18 1>/dev/null

[0m

In [62]:
!pip install ipython-sql==0.5.0 1>/dev/null

[0m

---
## Setup additional datasets tools
<div class="alert alert-block alert-warning"> 
    TODO <br>
</div>

Kaggle is used to get some datasets

Setup the folowing API Keys
- os.environ['KAGGLE_USERNAME'] = 'YOUR_USERNAME'
- os.environ['KAGGLE_KEY'] = 'YOUR_KEY'

<br/>

**Resources**

> - https://lindevs.com/set-up-kaggle-api



In [63]:
# Get An API Token
os.environ["KAGGLE_USERNAME"] = eval(secrets)["KAGGLE_USERNAME"]
os.environ["KAGGLE_KEY"] = eval(secrets)["KAGGLE_KEY"]

In [64]:
!pip install kaggle==1.5.15 1>/dev/null

[0m

In [65]:
!pip install wikipedia==1.4.0 1>/dev/null

[0m

## Setup additional text managelment tools

When working with embeddings additonal packages are required.

- tiktoken, as a encoder and tokenizer

**Resources**
> - [Tiktoken package on Pypi](https://pypi.org/project/tiktoken/)

 

In [119]:
!pip install lxml 1>/dev/null

[0m

In [129]:
!pip install beautifulsoup4 1>/dev/null

[0m

In [116]:
!pip install tiktoken==0.4.0 1>/dev/null

[0m

---
<div style="background-color:green;color:black;text-align:center;padding:1rem;font-size:1.5rem;">LANGCHAIN OVERVIEW</div>


---
# 1. Basic features

---
## Get prediction from a langage model

In [67]:
from langchain.llms import OpenAI

# loads the model.
# OPENAI_API_KEY is requested. Get it from the OpenAI site.
# a paid account and available units are requested to be able to place a request.
llm = OpenAI(temperature=0.9)

text = "what are the 5 best countries in Europe"

# Actual API call - may tale a while.
print(llm(text))




1. Switzerland 
2. Germany 
3. Sweden 
4. Norway 
5. Denmark


---
## Manage prompts with templates

In [68]:
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

# loads the model.
llm = OpenAI(temperature=0.9)

# setup a prompt
prompt = PromptTemplate (
    input_variables=["interest"],
    template="what are the 5 best countries in Europe ranked by {interest}"
)

In [69]:
text = prompt.format(interest="food")
print(f"{text=}")
print(llm(text))

text='what are the 5 best countries in Europe ranked by food'


1. Italy
2. Spain
3. France
4. Greece
5. Portugal


In [70]:
text = prompt.format(interest="siteseeing")
print(f"{text=}")
print(llm(text))

text='what are the 5 best countries in Europe ranked by siteseeing'


1. Italy 
2. France 
3. Spain 
4. Greece 
5. Germany


---
# 2. Chains

Chains are sequences of modular components (or other chains) combined in a particular way to accomplish a common use case.


Example:
- chaining LLM and tool
- summarization chain

---
## Built-in chains

In [71]:
from langchain.chains import PALChain
from langchain.llms import OpenAI

# loads the model.
llm = OpenAI(temperature=0.7)

palchain = PALChain.from_math_prompt(llm=llm, verbose=True)


text = """If my age is half of my dad's age 
and he is going to be 60 next year, 
what is my current age?"""
#palchain.run("If my age is half of my dad's age and he is going to be 60 next year, what is my current age?")
palchain.run(text)




[1m> Entering new  chain...[0m
[32;1m[1;3mdef solution():
    """If my age is half of my dad's age and he is going to be 60 next year, what is my current age?"""
    dad_age_next_year = 60
    my_age_next_year = dad_age_next_year / 2
    my_age_current = my_age_next_year - 1
    result = my_age_current
    return result[0m

[1m> Finished chain.[0m


'29.0'

<div class="alert alert-block alert-warning"> 
    TODO <br>
    - different result each run <br>
    - and should be 29.5
</div>


**OUTPUT**

Most of time the response is wrong. It neglects the fact that the father will be 60 necxt year, so he is 59 actually.
```
> Entering new  chain...
def solution():
    """If my age is half of my dad's age and he is going to be 60 next year, what is my current age?"""
    dad_age_next_year = 60
    my_age_fraction = 0.5
    my_age_now = dad_age_next_year * my_age_fraction
    result = my_age_now
    return result

> Finished chain.
'30.0'
```

**OUTPUT**

Once in a while it yields the correct answer.

```
> Entering new  chain...
def solution():
    """If my age is half of my dad's age and he is going to be 60 next year, what is my current age?"""
    dad_age_current = 59
    my_age_current = dad_age_current / 2
    result = my_age_current
    return result

> Finished chain.
'29.5'
````

---
## Multi-step workflow to feed prompt into the model

Output of model 1 is feed into the model 2.

In [72]:
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chains import LLMChain

# loads the model.
llm = OpenAI(temperature=0.9)

# setup a prompt
prompt = PromptTemplate (
    input_variables=["interest"],
    template="what are the 5 best countries in Europe ranked on {interest}"
)

# chain feeds the prompt into the langage mmodel.
chain = LLMChain(llm=llm, prompt=prompt)

In [73]:
chain.run("science")

' and technology\n\n1. Germany \n2. United Kingdom \n3. Switzerland \n4. Sweden \n5. Finland'

In [74]:
print(chain.run("tv shows"))



1. United Kingdom 
2. Germany 
3. France 
4. Italy 
5. Spain


---
## Using the OpenAI Chat API (less expensive) as a chain
requires a chain to feed the prompt into the chat 

**Resources**
> - Other Chat APIs: https://api.python.langchain.com/en/latest/modules/chat_models.html

In [75]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

chatopenai = ChatOpenAI(model_name="gpt-3.5-turbo")

prompt = PromptTemplate (
    input_variables=["interest"],
    template="what are the 5 best countries in Europe ranked on {interest}"
)

llmchain_chat = LLMChain(llm=chatopenai, prompt=prompt)
print(llmchain_chat.run("food"))


Ranking the "best" countries for food in Europe is subjective and can vary depending on personal taste preferences. However, here are five countries known for their culinary traditions and diverse gastronomy:

1. Italy: Italy is famous for its iconic dishes such as pasta, pizza, gelato, and espresso. Each region has its own unique specialties, making Italian cuisine incredibly diverse and flavorful.

2. France: French cuisine is renowned worldwide for its elegance and sophistication. From escargots to foie gras, and from croissants to coq au vin, France offers a rich variety of dishes that celebrate fresh ingredients and culinary excellence.

3. Spain: Spain is known for its vibrant and diverse food culture. Tapas, paella, and jam√≥n ib√©rico are just a few examples of the delicious dishes available. Spanish cuisine often combines bold flavors with fresh ingredients, and each region has its own culinary treasures.

4. Greece: Greek cuisine is characterized by its use of fresh, locally 

---
## Leverage LLM Math

Evaluating chains that know how to do math.

**Resources**
> - Langchain module LLM_Math: ttps://python.langchain.com/docs/guides/evaluation/llm_math

In [76]:
from langchain.prompts import load_prompt
from langchain.chains import LLMMathChain

# loads the model.
llm = OpenAI(temperature=0.9)

prompt = load_prompt('lc://prompts/llm_math/prompt.json')

# deprecated
##chain = LLMMathChain(llm=llm, prompt=prompt)

chain = LLMChain(llm=llm, prompt=prompt)

print(chain.run("what is the largest prime number lower than 20"))


No `_type` key found, defaulting to `prompt`.



Answer: 19


---
# 3. Agent

LangChain define agents as decision making engines:
> Some applications will require not just a predetermined chain of calls to LLMs/other tools, but potentially an unknown chain that depends on the user's input. In these types of chains, there is a ‚Äúagent‚Äù which has access to a suite of tools. Depending on the user input, the agent can then decide which, if any, of these tools to call.


---
## Test with LLM model only 

Since models are now updated regularly, I fored an model that is not updated in order to check that it gets an old answer.


In [77]:
from langchain.llms import OpenAI

# loads the model.
# OPENAI_API_KEY is requested. Get it from the OpenAI site.
# a paid account and available units are requested to be able to place a request.
# low temperature to avoid randomness
llm = OpenAI(model_name="text-davinci-003", temperature=0)

text = "Who is the prime minister of France since may 2022"

# Actual API call - may tale a while.
print(llm(text))




The Prime Minister of France since May 2022 is Jean Castex.


**OUTPUT**

'The Prime Minister of France since May 2022 is Jean Castex.'

This answer is wrong. Since the model has been trained mid 2021, it is not up-to-date. Elisabeth Borne is Prime Minister since may 2022.

---
## Agent leveraging Google Search

**Instructions**

Make sure:
- Google API client is installed
- a Custome Search Engine is available (CSE)
- the API key has been setup up

<br/>

<div class="alert alert-block alert-warning"> FIXME it does not work.
    <br/>
    it seems to work until the model is forced. 
    model defaults to DPT3.5 turbo which is updated and presummably knows the correct answer.
</div>

In [78]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.llms import OpenAI

# create a model
# forcing an model that doesnot know the correct information
llm = OpenAI(model_name="text-davinci-003", temperature=0)


# load some tools
tools = load_tools(["google-search"], llm=llm)

# setup an agent
agent = initialize_agent(tools, 
                         llm, 
                         agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, 
                         verbose=True)


In [79]:
agent.run("Who is the prime minister of France since may 2022")



[1m> Entering new  chain...[0m
[32;1m[1;3m I should research this online
Action: google_search
Action Input: "prime minister of France since may 2022"[0m
Observation: [36;1m[1;3m√âlisabeth Borne has served as Prime Minister since 16 May 2022. Fifth Republic recordsEdit. Length of the successive governments of the French Fifth¬†... May 16, 2022 ... The last woman prime minister, Edith Cresson, briefly headed the cabinet from May 1991 to April 1992 under President Francois Mitterrand. √âlisabeth Borne is a French politician who has served as Prime Minister of France since May 2022. She is a member of President Emmanuel Macron's party¬†... May 16, 2022 ... √âlisabeth Borne, the minister of labor who previously was in ... https://www.nytimes.com/2022/05/16/world/europe/macron-prime-minister.html. France's newly appointed Prime Minister Elisabeth Borne looks on during a handover ceremony in the courtyard. May 17th 2022 | PARIS. Prime Minister Shri Narendra Modi paid an official vis

'√âlisabeth Borne has been the Prime Minister of France since May 2022.'

**OUYPUT**

with da vinci model. The API ssems to be called but for whatever reason the LLM did not get the correct answer. It seems that it only looked for the confirmation of who it thinks is the prime minister.

Interrestingly the query to Google Search is only "prime minister of France" while the default model would issue the full sentence (check next test).

The response list all the prime ministers as well as trandom information and the bot did not get that the first in the list is the current one.


```
> Entering new  chain...
 I need to find out who is the current prime minister of France
Action: google_search
Action Input: "prime minister of France"
Observation: The prime minister of France officially the prime minister of the French Republic, is the head of government of the French Republic and the leader of the ... May 16, 2022 ... French President Emmanuel Macron picked Labour Minister Elisabeth Borne as his new prime minister on Monday as he prepares for legislative ... Usually, the Chief Ministers were members of the King's Council (the archaic form of cabinet) or high members of the French nobility or the Catholic clergy. Jun 24, 2023 ... President Biden spoke today with President Emmanuel Macron of France, Chancellor Olaf Scholz of Germany, and Prime Minister Rishi Sunak of ... The head of the government of France has been called the prime minister of France (French: Premier ministre) since 1959, when Michel Debr√© became the first ... Aug 21, 2022 ... President Biden spoke with President Emmanuel Macron of France, Chancellor Olaf Scholz of Germany, and Prime Minister Boris Johnson of the ... On 3 July 2020, Macron appointed the centre-right Jean Castex as the Prime Minister of France. Castex has been described as being seen to be a social ... 6 days ago ... In February 2015 Prime Minister Manuel Valls was forced to invoke Article 49 of the French constitution, a rarely used measure that allows a ... Archives ¬∑ Visit of Paul Reynaud, former Prime Minister of France, 4:00PM. Emmanuel Macron was elected eighth President of the French Republic on 7 May 2017. The President of the Republic appoints the Prime Minister, who proposes the ...
Thought: I now know the final answer
Final Answer: Jean Castex is the current prime minister of France since July 2020.
```

In [80]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.llms import OpenAI

# create a model
# using default model gpt3.5
llm = OpenAI(temperature=0)


# load some tools
tools = load_tools(["google-search"], llm=llm)

# setup an agent
agent = initialize_agent(tools, 
                         llm, 
                         agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, 
                         verbose=True)


In [81]:
agent.run("Who is the prime minister of France since may 2022")



[1m> Entering new  chain...[0m
[32;1m[1;3m I should research this online
Action: google_search
Action Input: "prime minister of France since may 2022"[0m
Observation: [36;1m[1;3m√âlisabeth Borne has served as Prime Minister since 16 May 2022. Fifth Republic recordsEdit. Length of the successive governments of the French Fifth¬†... May 16, 2022 ... The last woman prime minister, Edith Cresson, briefly headed the cabinet from May 1991 to April 1992 under President Francois Mitterrand. √âlisabeth Borne is a French politician who has served as Prime Minister of France since May 2022. She is a member of President Emmanuel Macron's party¬†... May 16, 2022 ... √âlisabeth Borne, the minister of labor who previously was in ... https://www.nytimes.com/2022/05/16/world/europe/macron-prime-minister.html. France's newly appointed Prime Minister Elisabeth Borne looks on during a handover ceremony in the courtyard. May 17th 2022 | PARIS. Prime Minister Shri Narendra Modi paid an official vis

'√âlisabeth Borne has been the Prime Minister of France since May 2022.'

**OUTPUT**

The default model yields the correct answer. 

Weirly enough the query to Google Search is different : "prime minister of France since may 2022"

Google Search's response is less misleading.

```
> Entering new  chain...
 I should research this online
Action: google_search
Action Input: "prime minister of France since may 2022"
Observation: √âlisabeth Borne has served as Prime Minister since 16 May 2022. Fifth Republic recordsEdit. Length of the successive governments of the French Fifth ... May 16, 2022 ... The last woman prime minister, Edith Cresson, briefly headed the cabinet from May 1991 to April 1992 under President Francois Mitterrand. √âlisabeth Borne is a French politician who has served as Prime Minister of France since May 2022. She is a member of President Emmanuel Macron's party ... May 16, 2022 ... √âlisabeth Borne, the minister of labor who previously was in ... https://www.nytimes.com/2022/05/16/world/europe/macron-prime-minister.html. France's newly appointed Prime Minister Elisabeth Borne looks on during a handover ceremony in the courtyard. May 17th 2022 | PARIS. Prime Minister Shri Narendra Modi paid an official visit to France on May 04, 2022 on his way back from the 2nd India-Nordic Summit in Copenhagen. May 16, 2022 ... Borne is the first French female prime minister since √âdith Cresson, who briefly headed the cabinet from May 1991 to April 1992 under the ... May 16, 2022 ... France's Labour Minister Elisabeth Borne leaves the √âlys√©e presidential palace after the weekly cabinet meeting in 01:42. France's Labour ... She was given the job by President Emmanuel Macron on 16 May 2022. Prime ministers since 1958Edit. Political parties. Independent May 6, 2022 ... Palais de l'√âlys√©e, Wednesday May 4th, 2022. 1. President of the French Republic Mr. Emmanuel Macron hosted Prime Minister of India, ...
Thought: I now know the final answer
Final Answer: √âlisabeth Borne has been the Prime Minister of France since May 2022.

> Finished chain.
'√âlisabeth Borne has been the Prime Minister of France since May 2022.'
```

In [82]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.llms import OpenAI

# create a model
# using defauklt model alone
llm = OpenAI(temperature=0)

text = "Who is the prime minister of France since may 2022"

print(llm(text))




The Prime Minister of France since May 2022 is Jean Castex.


In [83]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.llms import OpenAI

# create a model
# using defauklt model alone
llm = OpenAI(model_name="gpt-3.5-turbo", temperature=0)

text = "Who is the prime minister of France since may 2022"

print(llm(text))




As an AI language model, I cannot provide real-time information as my responses are based on data available up until September 2021. As of my last update, the Prime Minister of France is Jean Castex, who assumed office on July 3, 2020. However, please note that political positions can change, and it is always best to refer to the latest news sources for the most up-to-date information.


In [84]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.llms import OpenAI

# create a model
# using defauklt model alone
llm = OpenAI(model_name="gpt-3.5-turbo-0301", temperature=0)

text = "Who is the prime minister of France since may 2022"

print(llm(text))


As an AI language model, I do not have access to real-time information. However, as of May 2022, the current Prime Minister of France is Jean Castex.


---
# 4. Memory - Conversation

maintaining a conversation with a user in natural language and simulating the way a human would behave as a conversational partner. 

**Resources**
> -https://python.langchain.com/docs/modules/memory/how_to/conversational_customization

In [88]:
from langchain import OpenAI, ConversationChain

# create a model
llm = OpenAI(temperature=0)

conversation = ConversationChain(llm=llm, verbose=True)

conversation.predict(input="Hi There")





[1m> Entering new  chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: Hi There
AI:[0m

[1m> Finished chain.[0m


" Hi there! It's nice to meet you. How can I help you today?"

In [86]:
conversation.predict(input="What is the first thing that I said to you?")




[1m> Entering new  chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi There
AI:  Hi there! It's nice to meet you. How can I help you today?
Human: What is the first thing that I said to you?
AI:[0m

[1m> Finished chain.[0m


' You said "Hi there!"'

In [87]:
conversation.predict(input="What is an alternative for the first thing that I said to you?")




[1m> Entering new  chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi There
AI:  Hi there! It's nice to meet you. How can I help you today?
Human: What is the first thing that I said to you?
AI:  You said "Hi there!"
Human: What is an alternative for the first thing that I said to you?
AI:[0m

[1m> Finished chain.[0m


' An alternative for the first thing you said to me is "Hello!"'

---
<div style="background-color:green;color:black;text-align:center;padding:1rem;font-size:1.5rem;">LANGCHAIN COMPONENTS</div>


---
# 5. Schemas

Basic data types and schemas that are used throughout the codebase.

There are 3 types of schemas
- Text (see above)
- Prompts
- Messages 
- Document


<br/>

**Resources**
> - Schhemas component:  https://docs.langchain.com/docs/components/schema/


---
## Text

In [89]:
from langchain.llms import OpenAI

# loads the model.
# OPENAI_API_KEY is requested. Get it from the OpenAI site.
# a paid account and available units are requested to be able to place a request.
llm = OpenAI(temperature=0.9)

text = "what are the 5 best countries in Europe"

# Actual API call - may tale a while.
print(llm(text))



1. Germany
2. Switzerland
3. Netherlands
4. Sweden
5. Finland


---
## Chat messages
Chat messages are like text with a type

There are 3 types
- System: background context that tells the AI what to do
- Human: inputs sent by the user
- AI : response of the AI


In [90]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage, AIMessage

chat = ChatOpenAI(temperature=0.7)

In [91]:
messages = [ SystemMessage(content="You are a nice AI and help users to feature out what to eat.")]
     
messages.append( HumanMessage(content="I like tuna, list some recipes.") )

In [92]:
response = chat(messages)
messages.append( AIMessage(content=response.content) )

print(response.content)

Certainly! Here are a few tuna recipes you might enjoy:

1. Tuna Salad: Mix canned tuna with mayo, diced celery, chopped red onion, and a squeeze of lemon juice. Serve it on a bed of lettuce, in a sandwich, or with crackers.

2. Tuna Pasta: Cook your favorite pasta according to package instructions. In a separate pan, saut√© garlic and cherry tomatoes in olive oil. Add in drained canned tuna, a pinch of red pepper flakes, and salt. Toss the cooked pasta in the sauce and sprinkle with fresh parsley.

3. Tuna Poke Bowl: Combine diced fresh tuna with soy sauce, sesame oil, rice vinegar, and a pinch of sugar. Serve the marinated tuna over a bowl of steamed rice, and add toppings like avocado, cucumber, edamame, and sesame seeds.

4. Tuna Steaks: Marinate fresh tuna steaks in a mixture of soy sauce, ginger, garlic, and honey. Grill or sear the steaks until cooked to your desired level of doneness. Serve with a side of steamed vegetables or a salad.

5. Tuna Ni√ßoise Salad: Arrange cooked ba

In [93]:
messages.append( HumanMessage(content="show the first one.") )

response = chat(messages)
messages.append( AIMessage(content=response.content) )

print(response.content)

Certainly! Here's a quick and easy recipe for Tuna Salad:

Ingredients:
- 2 cans of tuna, drained
- 1/4 cup mayonnaise
- 1/4 cup diced celery
- 2 tablespoons chopped red onion
- 1 tablespoon lemon juice (optional)
- Salt and pepper, to taste

Instructions:
1. In a mixing bowl, add the drained tuna and break it up into smaller pieces using a fork.
2. Add mayonnaise, diced celery, chopped red onion, and lemon juice (if desired) to the bowl with the tuna.
3. Mix all the ingredients together until well combined. If you prefer a creamier texture, you can add more mayonnaise.
4. Season the tuna salad with salt and pepper according to your taste.
5. Serve the tuna salad on a bed of lettuce, as a sandwich filling, or with crackers for a tasty snack.

Feel free to adjust the ingredients and measurements to suit your preferences. Enjoy your homemade tuna salad!


---
## Examples
An list of input output pairs thet represent the input and expected output.

Used to fine tune a model or do in-context learning.

**Resources**
> - Prompt Template:  https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/few_shot_examples


In [94]:
from langchain.llms import OpenAI
from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.prompts.prompt import PromptTemplate

# loads the model.
llm = OpenAI(temperature=0.9)

# create the example set

examples = [
    { "question": "red bold", "answer": "color:red; font-style:bold;"},
    { "question": "green italic", "answer":  "color:green; font-style:italic;"},
    { "question": "blue bold", "answer":  "color:blue; font-style:bold;"},
    { "question": "pink", "answer":  "color:pink;"},
    { "question": "green", "answer":  "color:green;"},
    { "question": "pink italic", "answer":  "color:pink; font-style:italic;"}
    
]    

# Configure a formatter that will format the few shot examples into a string. 
# This formatter should be a PromptTemplate object.

example_prompt = PromptTemplate (
    input_variables=["question", "answer"], 
    template="question: {question}\n{answer}"
)

print("\n=== exemple prompt ===")
print(example_prompt.format(**examples[0]))


# Finally, create a FewShotPromptTemplate object. 
# This object takes in the few shot examples and the formatter for the few shot examples.

prompt_template = FewShotPromptTemplate(
    examples=examples, 
    example_prompt=example_prompt, 
    suffix="question: {input}", 
    input_variables=["input"]
)

prompt = prompt_template.format(input="pink bold")

print("\n=== prompt ===")
print(prompt)

print("\n=== answer ===")
print(llm(prompt))



=== exemple prompt ===
question: red bold
color:red; font-style:bold;

=== prompt ===
question: red bold
color:red; font-style:bold;

question: green italic
color:green; font-style:italic;

question: blue bold
color:blue; font-style:bold;

question: pink
color:pink;

question: green
color:green;

question: pink italic
color:pink; font-style:italic;

question: pink bold

=== answer ===

color:pink; font-style:bold;


---
## Documents

An unstructured object that conaints a pieces of text and metadatas.

<div class="alert alert-block alert-warning"> TODO  resource </div>


<div class="alert alert-block alert-warning"> TODO how to use this concept? 
make some knowledge available?
how to use metadata?
</div>


In [97]:
from langchain.schema import Document
from langchain.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain


# temperature 0 means no randomness
llm = OpenAI(temperature=0)


document = Document(
    page_content="""

        So she swallowed one of the cakes and was delighted to find that she
        began shrinking directly. As soon as she was small enough to get through
        the door, she ran out of the house and found quite a crowd of little
        animals and birds waiting outside. They all made a rush at Alice the
        moment she appeared, but she ran off as hard as she could and soon found
        herself safe in a thick wood.
        """,
    metadata={
        'author':"Lewis Caroll",
        'identifier':"1234"
    }
)

print("Document")
print(document)

# the attribute stuff instruct the run the chain once
chain = load_summarize_chain(
    llm, 
    chain_type="stuff", 
    verbose=False)

# run the chain against the documment
summary = chain.run([document])
    
print("\nSummary")
print(summary)


Document
page_content='\n\n        So she swallowed one of the cakes and was delighted to find that she\n        began shrinking directly. As soon as she was small enough to get through\n        the door, she ran out of the house and found quite a crowd of little\n        animals and birds waiting outside. They all made a rush at Alice the\n        moment she appeared, but she ran off as hard as she could and soon found\n        herself safe in a thick wood.\n        ' metadata={'author': 'Lewis Caroll', 'identifier': '1234'}

Summary
 After eating a cake, Alice shrinks and escapes the house. She runs away from a crowd of animals and birds and finds refuge in a thick wood.


In [98]:
text_sample= """
<p>Just at this moment her head struck against the roof of the hall; in
fact, she was now rather more than nine feet high, and she at once took
up the little golden key and hurried off to the garden door.</p>
<p>Poor Alice! It was as much as she could do, lying down on one side, to
look through into the garden with one eye; but to get through was more
hopeless than ever. She sat down and began to cry again.</p>
<p>She went on shedding gallons of tears, until there was a large pool all
'round her and reaching half down the hall.</p>
<p>After a time, she heard a little pattering of feet in the distance and
she hastily dried her eyes to see what was coming. It was the White
Rabbit returning, splendidly dressed, with a pair of white kid-gloves in
one hand and a large fan in the other. He<span class="pagenum"><a id="Page_10">[Pg 10]</a></span> came trotting along in a
great hurry, muttering to himself, "Oh! the Duchess, the Duchess! Oh!
<i>won't</i> she be savage if I've kept her waiting!"</p>
<p class="figright"><a href="https://www.gutenberg.org/cache/epub/19033/images/i005.jpg" id="id-6474075343490533101"><img alt="Illo5" src="./Alice&#39;s Adventures in Wonderland, by Lewis Carroll_files/i005_th.jpg" id="id-5171882188453704008"></a></p><p>When the Rabbit came near her, Alice began, in a low, timid voice, "If
you please, sir‚Äî" The Rabbit started violently, dropped the white
kid-gloves and the fan and skurried away into the darkness as hard as he
could go.</p>
<p>Alice took up the fan and gloves and she kept fanning herself all the
time she went on talking. "Dear, dear! How queer everything is to-day!
And yesterday things went on just as usual. <i>Was</i> I the same when I got
up this morning? But if I'm not the same, the next question is, 'Who in
the world am I?' Ah, <i>that's</i> the great puzzle!"</p>
<p>As she said this, she looked down at her hands and was surprised to see
that she had put on one of the Rabbit's little white kid-gloves while
she was talking. "How <i>can</i> I have done that?" she thought. "I must be
growing small again." She got up and went to the table to measure
herself by it and found that she was now about two feet high and was
going on<span class="pagenum"><a id="Page_11">[Pg 11]</a></span> shrinking rapidly. She soon found out that the cause of this
was the fan she was holding and she dropped it hastily, just in time to
save herself from shrinking away altogether.</p>
<p>"That <i>was</i> a narrow escape!" said Alice, a good deal frightened at the
sudden change, but very glad to find herself still in existence. "And
now for the garden!" And she ran with all speed back to the little door;
but, alas! the little door was shut again and the little golden key was
lying on the glass table as before. "Things are worse than ever,"
thought the poor child, "for I never was so small as this before,
never!"</p>
<p>As she said these words, her foot slipped, and in another moment,
splash! she was up to her chin in salt-water. Her first idea was that
she had somehow fallen into the sea. However, she soon made out that she
was in the pool of tears which she had wept when she was nine feet high.
"""

In [100]:
from langchain.llms import OpenAI
from langchain import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain.schema import Document

# Note, the default model is already 'text-davinci-003' 
# temperature 0 means no randomness
llm = OpenAI(temperature=0, model_name='text-davinci-003')

# check the number of tokens
num_tokens = llm.get_num_tokens(text_sample)
print(f"{num_tokens=}")

# build a document reuse text sampke above
doc = Document(
    page_content=text_sample,
    metadata={
        'author':"Lewis Caroll",
        'title':"Alice in Wonderland"
    }
)

# chain expect a list of documents
docs = [doc]

# setup. a custom prompt
# a defaukt one is provide: write a concise summary
prompt_template = """
%INSTRUCTIONS:
Please summarize the following piece of text.
Respond in a manner that a 5 year old would understand.

%TEXT:
{text}
"""

prompt = PromptTemplate(template=prompt_template, input_variables=["text"])

# the attribute stuff instruct the run the chain once
chain = load_summarize_chain(
    llm, 
    chain_type="stuff", 
    prompt=prompt, 
    verbose=False)

# run the chain against the documment
summary = chain.run(docs)
    
print(summary)

num_tokens=901

Alice was very tall and bumped her head on the roof of the hall. She tried to get into the garden but it was too hard. She started to cry and made a big pool of tears. Then she heard a noise and saw the White Rabbit. He was wearing fancy clothes and had a fan and gloves. He was in a hurry and said he was late for the Duchess. Alice picked up the fan and gloves and kept talking. She looked down and saw she was wearing one of the Rabbit's gloves. She was shrinking and dropped the fan to stop it. She was very scared but happy to still be alive. She ran back to the door but it was locked. She slipped and fell into the pool of tears she had made.


---
# 6. Models
LangChain provides interfaces and integrations for two types of models:
- LLMs: Models that take a text string as input and return a text string
- Chat models: Models that are backed by a language model but take a list of Chat Messages as input and return a Chat Message

<br/>

**Resources**
> - Model Component: https://python.langchain.com/docs/modules/model_io/models/
> - List of models: https://platform.openai.com/docs/models


---
## Langage Model 
LLMs: Models that take a text string as input and return a text string

In [101]:
from langchain.llms import OpenAI

# additnal parameters to select a mode, pass the API key ...
llm = OpenAI(model_name="text-ada-001", temperature=0.7)

llm("What day comes after Friday?")

'\n\nSaturday'

---
## Chat Model 
Chat models: Models that are backed by a language model but take a list of Chat Messages as input and return a Chat 

Also make sense for a unique interaction as Chat API is less expensive.


In [103]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage, AIMessage

chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=1)

In [104]:
messages = [ 
    SystemMessage(content="You are a nice AI and help users to feature out what to eat."),
    HumanMessage(content="I like tuna, list some recipes.")
]
     
chat(messages)

AIMessage(content='Here are a few tuna recipes that you might enjoy:\n\n1. Tuna Salad: Mix canned tuna with mayonnaise, diced onions, celery, and seasonings like salt, pepper, and lemon juice. Serve it on a sandwich, wrap, or with crackers.\n\n2. Tuna Melt: Spread tuna salad on sliced bread, top with cheese, and toast it in a skillet until the cheese is melted and bread is crispy.\n\n3. Tuna Pasta Salad: Cook pasta according to package instructions. Drain and cool it. Then, mix it with canned tuna, diced vegetables (like bell peppers, cherry tomatoes, and cucumbers), olives, and a dressing of your choice (such as lemon vinaigrette or creamy ranch).\n\n4. Tuna Steaks: Pat dry fresh tuna steaks and season them with salt, pepper, and a squeeze of lemon. Cook them on a hot grill or sear them in a skillet for a few minutes on each side until desired doneness.\n\n5. Tuna Poke Bowl: Marinate diced fresh tuna in soy sauce, sesame oil, and a splash of rice vinegar. Serve it over a bed of rice o

---
### Text Embedding Model

Convert text into a series of numbers (a vector) which holds the meaning of the text.

Mainly used for text comparison.

In [105]:
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

text="A leader should know all about truth and honesty, and when to see the difference. (Truck) - Bromeliad Trilogy"

text_embedding = embeddings.embed_query(text)

print(f"embedding length: {len(text_embedding)}")
print(f"5 first values of the vector: {text_embedding[:5]}")

embedding length: 1536
5 first values of the vector: [-0.0020272971596568823, -0.016961609944701195, 0.013975410722196102, -0.014824817888438702, 0.001639920868910849]


---
# 7. prompts
A "prompt" refers to the input to the model. This input is rarely hard coded, but rather is often constructed from multiple components. A PromptTemplate is responsible for the construction of this input. LangChain provides several classes and functions to make constructing and working with prompts easy.

LangChain documentation is split into four sections:
- PromptValue: The class representing an input to a model.
- Prompt Templates: The class in charge of constructing a PromptValue.
- Example Selectors: Often times it is useful to include examples in prompts. These examples can be hardcoded, but it is often more powerful if they are dynamically selected.
- Output Parsers: Language models (and Chat Models) output text. But many times you may want to get more structured information than just text back. This is where output parsers come in. Output Parsers are responsible for (1) instructing the model how output should be formatted, (2) parsing output into the desired formatting (including retrying if necessary).

<br/>

**Resources**
> - Prompts Component: https://docs.langchain.com/docs/components/prompts/

---
## Simple prompt

In [106]:
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

# loads the model.
llm = OpenAI(model_name="text-davinci-003", temperature=0.9)

# write a simple  prompt. use """ to allow multiline string.
prompt = """
Today is Monday. Tomorrow is Wednesday.

What is wrong with this statement?
"""

# query the model
print(llm(prompt))


It is incorrect; tomorrow is Tuesday.


---
## Prompt with template and placeholder.

In [107]:
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

# loads the model.
llm = OpenAI(model_name="text-davinci-003", temperature=0.9)

# setup a prompt. use """ to allow multiline string.
template = PromptTemplate (
    input_variables=["today", "tomorrow"],
    template="""
    Today is {today}. Tomorrow is {tomorrow}.

    What is wrong with this statement?
    """
)

prompt = template.format(today="Monday", tomorrow="Wednesday")
print(f"{prompt=}")

# query the model

print(llm(prompt))

prompt='\n    Today is Monday. Tomorrow is Wednesday.\n\n    What is wrong with this statement?\n    '

The statement is incorrect; tomorrow is Tuesday.


In [None]:
prompt = template.format(today="Thursday", tomorrow="Friday")
print(f"{prompt=}")

# query the model

print(llm(prompt))

---
## Example selectors and Few Shot Learning

A way to select from a series of examples in few shot learning 

**Resources**
> - Example Selector: https://api.python.langchain.com/en/latest/modules/example_selector.html
> - Few shot learning: https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/few_shot_examples



### Example selectors and Few Shot Learning with NGram


<div class="alert alert-block alert-warning"> FIXME </div>

In [120]:
from langchain.llms import OpenAI
from langchain.prompts.example_selector import NGramOverlapExampleSelector
from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.prompts.prompt import PromptTemplate

# loads the model.
llm = OpenAI(temperature=0.9)

# create the example set

examples = [
    { "question": "red bold", "answer": "color:red; font-style:bold;"},
    { "question": "green italic", "answer":  "color:green; font-style:italic;"},
    { "question": "blue bold", "answer":  "color:blue; font-style:bold;"},
    { "question": "pink", "answer":  "color:pink;"},
    { "question": "green", "answer":  "color:green;"},
    { "question": "pink italic", "answer":  "color:pink; font-style:italic;"}
    
]    

# Configure a formatter that will format the few shot examples into a string. 
# This formatter should be a PromptTemplate object.

example_prompt = PromptTemplate (
    input_variables=["question", "answer"], 
    template="question: {question}\n{answer}"
)

print("\n=== exemple prompt ===")
print(example_prompt.format(**examples[0]))


# Select and order examples based on ngram overlap score (sentence_bleu score).

question = "pink bold"

example_selector = NGramOverlapExampleSelector.select_examples(
    examples,
    question
)

"""
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

example_selector = SemanticSimilarityExampleSelector.from_examples(
    # This is the list of examples available to select from.
    examples,
    # This is the embedding class used to produce embeddings which are used to measure semantic similarity.
    OpenAIEmbeddings(),
    # This is the VectorStore class that is used to store the embeddings and do a similarity search over.
    Chroma,
    # This is the number of examples to produce.
    k=1
)
"""


# Finally, create a FewShotPromptTemplate object. 
# This object takes in the few shot examples and the formatter for the few shot examples.

prompt_template = FewShotPromptTemplate(
    #example_selector=example_selector, 
    examples=selected_examples, 
    example_prompt=example_prompt, 
    suffix="question: {input}", 
    input_variables=["input"]
)

prompt = prompt_template.format(input=question)

print("\n=== prompt ===")
print(prompt)

print("\n=== answer ===")
print(llm(prompt))



=== exemple prompt ===
question: red bold
color:red; font-style:bold;


AttributeError: 'str' object has no attribute 'values'

### Example selectors and Few Shot Learning with similarities

requires a vector database

In [121]:
from langchain.llms import OpenAI
from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.prompts.prompt import PromptTemplate
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores import Annoy
#from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

# loads the model.
llm = OpenAI(temperature=0.9)

# create the example set

examples = [
    { "question": "red bold", "answer": "color:red; font-style:bold;"},
    { "question": "green italic", "answer":  "color:green; font-style:italic;"},
    { "question": "blue bold", "answer":  "color:blue; font-style:bold;"},
    { "question": "pink", "answer":  "color:pink;"},
    { "question": "green", "answer":  "color:green;"},
    { "question": "pink italic", "answer":  "color:pink; font-style:italic;"}
    
]    

# Configure a formatter that will format the few shot examples into a string. 
# This formatter should be a PromptTemplate object.

example_prompt = PromptTemplate (
    input_variables=["question", "answer"], 
    template="question: {question}\n{answer}"
)

print("\n=== exemple prompt ===")
print(example_prompt.format(**examples[0]))

# Example selector that selects examples based on SemanticSimilarity.

example_selector = SemanticSimilarityExampleSelector.from_examples(
    # This is the list of examples available to select from.
    examples,
    # This is the embedding class used to produce embeddings which are used to measure semantic similarity.
    OpenAIEmbeddings(),
    # This is the VectorStore class that is used to store the embeddings and do a similarity search over.
    #Chroma,
    Annoy,
    # This is the number of examples to produce.
    k=2
)

# Finally, create a FewShotPromptTemplate object. 
# This object takes in the few shot examples and the formatter for the few shot examples.

prompt_template = FewShotPromptTemplate(
    example_selector=example_selector, 
    example_prompt=example_prompt, 
    suffix="question: {input}", 
    input_variables=["input"]
)

prompt = prompt_template.format(input="pink bold")

print("\n=== prompt ===")
print(prompt)

print("\n=== answer ===")
print(llm(prompt))



=== exemple prompt ===
question: red bold
color:red; font-style:bold;

=== prompt ===
question: red bold
color:red; font-style:bold;

question: pink italic
color:pink; font-style:italic;

question: pink bold

=== answer ===

color:pink; font-weight:bold;


---
## Output Parser and response format

A way to format the outpu
- Format nstructions: An autogenerated prompt telling how the result should be formatted
- parser: a method which will extract the output int hte desired format. you may prvie a custom parser


**Resources**
> - OutputParser:https://docs.langchain.com/docs/components/prompts/output-parser

In [122]:
from langchain.llms import OpenAI
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.schema import HumanMessage, SystemMessage, AIMessage
from langchain.prompts.prompt import PromptTemplate


# loads the model.
llm = OpenAI(model_name="text-davinci-003", temperature=0.9)

# how you would like the response to be structured
# periods at the send of sentence are required. 
# If not there description ends up in the json text and break the JSON format
response_schemas = [
    ResponseSchema(name="bad_string", description="This is a poorly formatted string."),
    ResponseSchema(name="good_string", description="This is a your string reformatted.")
]

# How you would like to parse your output
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

# check instructions
format_instructions =output_parser.get_format_instructions()
print("\nformat_instructions")      
print(format_instructions)      

template = """
You will be given a poorly formatted string from a user. 
Reformat it and make sure all the words are spelled correctly.


{format_instructions}

% USER_INPUT:
{user_input}

YOUR RESPONSE:
"""

prompt_template = PromptTemplate(
    input_variables=['user_input'],
    partial_variables={'format_instructions': format_instructions},
    template=template
)

# format the user input as a prompt
# for whateveer reason it does not work well with format.
# format_promt retruns an object, not a string and should be converted to a string 
prompt = prompt_template.format_prompt(user_input="Wellcom to Californya!").to_string()
print("\nprompt")
print(prompt)

# gets the response
response = llm(prompt)
print("\nresponse=")      
print(response)      

# gets the JSON document
print("\nparsed output=")     

# comma sometimes missing
response.replace('"good_string"',',"good_string"')

output_parser.parse(response)                   



format_instructions
The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"bad_string": string  // This is a poorly formatted string.
	"good_string": string  // This is a your string reformatted.
}
```

prompt

You will be given a poorly formatted string from a user. 
Reformat it and make sure all the words are spelled correctly.


The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"bad_string": string  // This is a poorly formatted string.
	"good_string": string  // This is a your string reformatted.
}
```

% USER_INPUT:
Wellcom to Californya!

YOUR RESPONSE:


response=
```json
{
	"bad_string": "Wellcom to Californya!",
	"good_string": "Welcome to California!"
}
```

parsed output=


{'bad_string': 'Wellcom to Californya!',
 'good_string': 'Welcome to California!'}

---
# 8. Indexes

Indexes refer to ways to structure documents so that LLMs can best interact with them. This module contains utility functions for working with documents, different types of indexes, and then examples for using those indexes in chains.

LangChain documentation is split into four sections:

- Document Loaders: Classes responsible for loading documents from various sources.
- Text Splitters: Classes responsible for splitting text into smaller chunks.
- VectorStores: The most common type of index. One that relies on embeddings.
- Retrievers: Interface for fetching relevant documents to combine with language models.

<br/>

**Resource**
> - Indexes Component: https://docs.langchain.com/docs/components/indexing/


**Instructions**

For the example below, make sure that:
- a vector database client is installed

---
## Document Loaders

Easy ways to import documents from other sources 
and make it available for use in your language models.

**Resources**
> -  Document Loaders: https://python.langchain.com/docs/modules/data_connection/document_loaders
> - List of loaders: https://github.com/hwchase17/langchain/tree/master/langchain/document_loaders

In [123]:
from langchain.document_loaders import HNLoader
 
# Setup a Hacker News loader
loader = HNLoader("https://news.ycombinator.com/item?id=34422627")
 
data = loader.load()
 
print(f"Found {len(data)} comments")


sample = '\n'.join([x.page_content[:100] for x in data[:2]])
print("\nHere's a sample (first 100 chars of the 3 first items)")
print(sample)
                 

Found 76 comments

Here's a sample (first 100 chars of the 3 first items)
Ozzie_osman 5 months ago  
             | next [‚Äì] 

LangChain is awesome. For people not sure what 
Ozzie_osman 5 months ago  
             | parent | next [‚Äì] 

Also, another library to check out is 


---
## Text Splitters

allow you to split a document into smaller chunk

<div class="alert alert-block alert-warning"> TODO  resource +FIXME pb with loader </div>


In [130]:
from langchain.document_loaders import BSHTMLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
 
# This is a long document we can split up.    
document_path = "data/Alice's Adventures in Wonderland, by Lewis Carroll.html"
 
# Setup a HTML loader
loader = BSHTMLLoader(document_path)
documents = loader.load()

print(f"Found {len(documents)} document(s)")


print("docuument content")
start = 2200
print(documents[0].page_content[start-200:start+30])

 
# The recommended TextSplitter is the RecursiveCharacterTextSplitter. 
# This will split documents recursively by different characters - starting with "\n\n", then "\n", then " ".
# This is nice because it will try to keep all the semantically relevant content in the same place 
# for as long as possible.
# Important parameters to know here are chunkSize and chunkOverlap. 
# chunkSize controls the max size (in terms of number of characters) of the final documents. 
# chunkOverlap specifies how much overlap there should be between chunks. 
# in practice they default to 4000 and 200 respectively.
text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size=200,
    chunk_overlap=20,
)
 
texts = text_splitter.create_documents([document[0].page_content])
 
print(f"\nSplitted into {len(texts)} parts")
 
print("Preview:")
i = int(start/150)
print(texts[i+1].page_content, "\n-")
print(texts[i+2].page_content, "\n-")
print(texts[i+3].page_content)


FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

---
## Vextor Store and Retrievers 
A retriever is an interface that returns documents given an unstructured query. 

A retriever does not need to be able to store documents, only to return (or retrieve) it. 

It usually relies to a vector store as a document management backbone.

A vector store is a particular type of database optimized for storing documents and their embeddings, and then fetching of the most relevant documents for a particular query, ie. those whose embeddings are most similar to the embedding of the query.

- local : ChromaDB, FAISS, Annoy
- Online: Pinecone, Weaviate

However a retriever is more general than a vector store and there are other types of retrievers as well, e.g. Wikipedia or search engines like Elastic Search or Kendra.


Question answering over documents consists of four steps:
1. Create an index
2. Create a Retriever from that index
3. Create a question answering chain
4. Ask questions

<br/>

**Resources**
> - Lit of retrievers: https://python.langchain.com/docs/modules/data_connection/retrievers/
> - LangChain Supported VectorStores: https://api.python.langchain.com/en/latest/modules/vectorstores.html
> - Retrievers: https://github.com/hwchase17/langchain/tree/master/langchain/retrievers

### Store document in a Vector Store and retrieve information


<div class="alert alert-block alert-warning"> FIXME pb with loader </div>


In [127]:
from langchain.document_loaders import BSHTMLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Annoy
from langchain.embeddings import OpenAIEmbeddings
 
# This is the source document.    
document_path = "data/Alice's Adventures in Wonderland, by Lewis Carroll.html"
 
# Setup a HTML loader
loader = BSHTMLLoader(document_path)
documents = loader.load()

print(f"Found {len(documents)} document(s)")


# Get your splitter ready
# Using small chunk for the sake of example. 
# in practice they default to 4000 and 200 respectively.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=25)
 
# Split your docs into texts
texts = text_splitter.split_documents(documents)

print(f"\nSplitted into {len(texts)} parts")

# Get embedding engine ready
embeddings = OpenAIEmbeddings()
 
# Embedd your texts andd store them in the vector database
# dtabase is in memory. it might be savecd to a file and loader later on.
db = Annoy.from_documents(texts, embeddings)

FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

In [None]:
# Init a retriever for this db
retriever = db.as_retriever()

# retrieve indexed documents relevant for the query
query = "who is the White Rabbit?"
docs = retriever.get_relevant_documents(query)

print(f"\nFound {len(docs)}")

samples = "\n\n".join([x.page_content[:200] for x in docs[:5]])
print(samples)

In [None]:
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Asking theLLM
# the response will be based on the retrieved documents 
qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=retriever)

qa.run(query)

In [None]:
qa.run(query)

### Save and load db


In [None]:
from langchain.vectorstores import Annoy
from langchain.embeddings import OpenAIEmbeddings

docstore_file_path = "alice_docstore"

db.save_local(docstore_file_path)

loaded_vector_store = Annoy.load_local(
   docstore_file_path, embeddings=OpenAIEmbeddings()
)

# same document similar to White Red abbit
loaded_vector_store.similarity_search_with_score("White Rabbit", k=3)

### One line index creation and information retrieval

In [None]:
from langchain.document_loaders import BSHTMLLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Annoy
from langchain.embeddings import OpenAIEmbeddings
from langchain.indexes import VectorstoreIndexCreator

# This is the source document.    
document_path = "data/Alice's Adventures in Wonderland, by Lewis Carroll.html"
 
# Setup a HTML loader
loader = BSHTMLLoader(document_path)

# creating an indexer
# default to Chroma as a vector database
# Use CharacterTextSplitter. May also be RecursiveCharacterTextSplitter.
index_creator = VectorstoreIndexCreator(
    vectorstore_cls=Annoy,
    embedding=OpenAIEmbeddings(),
    text_splitter=CharacterTextSplitter(chunk_size=500, chunk_overlap=0)
)

index = index_creator.from_loaders([loader])

# retrieve indexed documents relevant for the query
query = "who is the White Rabbit?"
index.query(query)

print(f"\nFound {len(docs)}")

samples = "\n\n".join([x.page_content[:200] for x in docs[:5]])
print(samples)

In [None]:
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Ask the question to the model 
# the response will be based on the retrieved documents 
qa = RetrievalQA.from_chain_type(llm=OpenAI(), 
                                 chain_type="stuff", 
                                 retriever=index.vectorstore.as_retriever())

qa.run(query)

In [None]:
qa.run(query)

---
## Wikipedia retriever


<div class="alert alert-block alert-warning"> TODO wikipedia retriever </div>

<div class="alert alert-block alert-warning"> 
    Move to tools agent_excutor example  <br>
</div>



<div class="alert alert-block alert-warning"> FIXME pb with loader </div>


In [131]:
from langchain import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, Tool
from langchain.utilities import WikipediaAPIWrapper

# model_name='gpt-4'
llm = ChatOpenAI(temperature=0)

wikipedia = WikipediaAPIWrapper()

tools = [
    Tool(
        name="Wikipedia",
        func=wikipedia.run,
        description="Useful for when you need to get information from wikipedia about a single topic"
    ),
]

agent_executor = initialize_agent(tools, llm, agent='zero-shot-react-description', verbose=True)

output = agent_executor.run("Can you please provide a quick summary of Napoleon Bonaparte? \
                          Then do a separate search and tell me what the commonalities are with Serena Williams")



[1m> Entering new  chain...[0m
[32;1m[1;3mI should start by searching for a quick summary of Napoleon Bonaparte on Wikipedia. Then, I can search for Serena Williams and compare the two to find any commonalities.
Action: Wikipedia
Action Input: "Napoleon Bonaparte"[0m
Observation: [36;1m[1;3mPage: Napoleon
Summary: Napoleon Bonaparte (born Napoleone Buonaparte; 15 August 1769 ‚Äì 5 May 1821), later known by his regnal name Napoleon I, was a French military commander and political leader who rose to prominence during the French Revolution and led successful campaigns during the Revolutionary Wars. He was the de facto leader of the French Republic as First Consul from 1799 to 1804, then of the French Empire as Emperor of the French from 1804 until 1814 and again in 1815. Napoleon's political and cultural legacy endures to this day, as a highly celebrated and controversial leader. He initiated many liberal reforms that have persisted in society, and is considered one of the greate

OutputParserException: Could not parse LLM output: `Based on the summaries of Napoleon Bonaparte and Serena Williams, the commonalities between the two are that they both achieved great success in their respective fields. Napoleon Bonaparte was a highly celebrated and controversial leader who is considered one of the greatest military commanders in history. Serena Williams is widely regarded as one of the greatest tennis players of all time, having won 23 Grand Slam women's singles titles, the most in the Open Era. Both Napoleon and Serena have had a significant impact on their respective domains and have left a lasting legacy.`

---
# 9. Memory


Memory is the concept of storing and retrieving data in the process of a conversation. 

There are two main methods:
- Based on input, fetch any relevant pieces of data
- Based on the input and output, update state accordingly

There are two main types of memory: short term and long term.
- Short term memory generally refers to how to pass data in the context of a singular conversation (generally is previous ChatMessages or summaries of them).
- Long term memory deals with how to fetch and update information between conversations.

<br/>

**Resource**
> - Memory Component: https://docs.langchain.com/docs/components/memory/
> - Chat Message History: https://docs.langchain.com/docs/components/memory/chat_message_history
> - [LangChain: Enhancing Performance with Memory Capacity](https://towardsdatascience.com/langchain-enhancing-performance-with-memory-capacity-c7168e097f81)


<div class="alert alert-block alert-warning"> TODO vs Conversation and buffer memory (check blog)?</div>


<div class="alert alert-block alert-warning"> TODO Long term memory</div>


In [None]:
from langchain.memory import ChatMessageHistory
from langchain.chat_models import ChatOpenAI
from pprint import pprint
 
chat = ChatOpenAI(temperature=0)
 
history = ChatMessageHistory()
 
history.add_ai_message("hi!")
 
history.add_user_message("what is the capital of france?")

#After adding messages to the history, you can pass this history to the language model 
#to generate context-aware responses:

ai_response = chat(history.messages)
history.add_ai_message(ai_response.content)

print(f"{ai_response=}")
print(f"\nhistory.messages:")
pprint(history.messages, compact=False)

In [None]:
history.add_user_message("what is the population os this city?")

ai_response = chat(history.messages)
history.add_ai_message(ai_response.content)

print(f"{ai_response.content=}")
print(f"\nhistory.messages:")
pprint(history.messages, compact=False)

---
# 10. Chains
Chains are sequences of modular components (or other chains) combined in a particular way to accomplish a common use case.


Example:
- chaining LLM and tool
- summarization chain

<br/>

**Resources**
> - Chain Component: https://docs.langchain.com/docs/components/chains/


<div class="alert alert-block alert-warning"> TODO index related chain https://docs.langchain.com/docs/components/chains/index_related_chains  </div>




## Simple sequential model

A Simple Sequential Chain helps break up tasks to avoid language models getting distracted, confused, or hallucinating when asked to perform too many tasks in a row.

In this example, the chain first receives the user location (Rome) and outputs a classic dish from Rome. Then, it provides a simple recipe for that classic dish. The verbose=True parameter ensures that the chain prints statements during its execution, making it easier to debug and understand the chain‚Äôs progress.

In [132]:
from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains import SimpleSequentialChain
 
# Cretae a model with high randomness
llm = OpenAI(temperature=1)
 
# Step 1 - dish for location

template = """
Your job is to come up with a classic dish from the area that the users suggests. 

% USER LOCATION {user_location} 

YOUR RESPONSE: 
"""
prompt_template = PromptTemplate(input_variables=["user_location"], template=template)

location_chain = LLMChain(llm=llm, prompt=prompt_template)
 

# Step 2 - Recipe
template = """
Given a meal, give a short and simple recipe on how to make that dish at home. 

% MEAL {user_meal} 

YOUR RESPONSE: 
"""
prompt_template = PromptTemplate(input_variables=["user_meal"], template=template)
 
meal_chain = LLMChain(llm=llm, prompt=prompt_template)

# chain the steps
# set verbose to True to check what happes
overall_chain = SimpleSequentialChain(chains=[location_chain, meal_chain], verbose=False)
 
review = overall_chain.run("Rome")

## Summarization Chain

The Summarization Chain breaks the text into smaller chunks and summarizing each chunk, creating a final summary based on the individual summaries.

In this example, the chain first splits the essay into chunks of 700 characters. It then generates summaries for each chunk and creates a final concise summary based on these individual summaries.

In [133]:
from langchain.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.document_loaders import BSHTMLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Cretae a model with low randomness
llm = OpenAI(temperature=1)

# This is the source document.    
document_path = "data/Alice's Adventures in Wonderland, by Lewis Carroll.html"
 
# Setup a HTML loader
loader = BSHTMLLoader(document_path)
documents = loader.load()
 
# Get your splitter ready
text_splitter = RecursiveCharacterTextSplitter(chunk_size=700, chunk_overlap=50)
 
# Split your docs into texts
# only kept first 1 000 characters of the document to save computing
texts = text_splitter.split_documents(documents[:1000])
 
# There is a lot of complexity hidden in this one line. 
# the attribute map_reduce instruct the chain to 
# - first apply the model to each chunck (map stage) 
# - then all map results and apply the model (reduce stage)
chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=True)
summary = chain.run(texts)
    
print(summary)

FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

**OUTPUT**
 
Some map summaries

> Alice hears the White Rabbit muttering to itself, concerned that the Duchess will execute it for losing the fan and pair of white kid-gloves. Alice offers to help the Rabbit search for them, but they are nowhere to be found because everything has changed since Alice's dip in the pool.

> Alice meets a Rabbit who accuses her of being his housemaid Mary Ann and orders her to fetch his gloves and fan. She finds a neat little house with the Rabbit's name on a brass plate and goes in without knocking. She is afraid of meeting the real Mary Ann before she can find the fan and gloves.

> Alice finds her way into a room with a table in the window, containing a fan and some gloves. She notices a bottle and drinks from it, hoping it will make her grow large again. When she drinks half of the bottle she finds her head pressing against the ceiling, so she hastily puts it down.

> A character wishes she wouldn't grow anymore, but sadly she continues to grow rapidly. As a result, she kneels on the floor, puts her arm out the window and her foot up the chimney, and is uncertain of her fate.

 
Final summary

> In Lewis Carroll's Alice's Adventures in Wonderland, Alice follows a White Rabbit into a strange world and has to navigate unexpected events and peculiar characters. She eventually meets a Caterpillar who helps her regain control of her changing size. Project Gutenberg is a non-profit organization committed to making electronic books free to the public. Donations up to $5,000 are available, and the full license stipulates amounts and terms of use.

## Summarize stored documents

<div class="alert alert-block alert-warning"> TODO  make use of the vector db</div>

# 11. Agents

LangChain define agents as decision making engines:
> Some applications will require not just a predetermined chain of calls to LLMs/other tools, but potentially an unknown chain that depends on the user's input. In these types of chains, there is a ‚Äúagent‚Äù which has access to a suite of tools. Depending on the user input, the agent can then decide which, if any, of these tools to call.

It splits the documentation into the following sections:
> - Tools: How language models interact with other resources.
> - Agents: The language model that drives decision making.
> - Toolkits: Sets of tools that when used together can accomplish a specific task.
> - Agent Executor: The logic for running agents with tools.


**Resources**
> - Agents: https://docs.langchain.com/docs/components/agents/

<div class="alert alert-block alert-warning"> TODO </div>

## Tool
Tools are interfaces an agent can call to interact with other services

**Resources**
> - Tools: https://python.langchain.com/docs/modules/agents/tools/

**Instructions**

For the example below, make sure that:
- Google API client is installed
- a Custome Search Engine is available (CSE)
- the API key has been setup up

In [None]:
from langchain.tools import Tool
from langchain.utilities import GoogleSearchAPIWrapper

search = GoogleSearchAPIWrapper()

tool = Tool(
    name="Google Search",
    description="Search Google for recent results.",
    func=search.run,
)

tool.run("Who is the French Prime Minister name since May 2022?")

## Agent leveraging tools

Google Search and LLM-math are predefined tools:
- LLM-Math is a langage model trained to do math logic.
- Google)search tool allow to place queries on Google Search

In [None]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.llms import OpenAI

# create a model
llm = OpenAI(temperature=0)

# load some tools
tools = load_tools(["google-search", "llm-math"], llm=llm)

# setup an agent
agent = initialize_agent(tools, 
                         llm, 
                         agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, 
                         verbose=True)


In [None]:
agent.run("How many Teslas have been sold in 2022. Multiple by 2")

In [None]:
agent.run("Multiply by 2 the population of the capital of Frannce")

In [None]:
agent.run("""Who is the current prime minister of France. 
Is he or she younger than the President?""") 

In [None]:
if False:
    # too complex
    # either fails because it tries to add dates and nulber
    # or give weird results like
    # '√âlisabeth Borne will be 70 in the year 2215.'
    agent.run("""Who is the current prime minister of France. 
    When will he or she be 70?""") 

---
<div style="background-color:green;color:black;text-align:center;padding:1rem;font-size:1.5rem;">LANGCHAIN USE CASES</div>



---
# [UC] 1. Summarization

---
## Summaries Of Short Text
Just write a summarization prompt

In [None]:
# text to be summarized
text_sample = """
The first thing she heard was a general chorus of "There goes Bill!"
then the Rabbit's voice alone‚Äî"Catch him, you by the hedge!" Then
silence and then another confusion of voices‚Äî"Hold up his head‚ÄîBrandy
now‚ÄîDon't choke him‚ÄîWhat happened to you?"

Last came a little feeble, squeaking voice, "Well, I hardly know‚ÄîNo
more, thank ye. I'm better now‚Äîall I know is, something comes at me
like a Jack-in-the-box and up I goes like a sky-rocket!"

After a minute or two of silence, they began moving about again, and
Alice heard the Rabbit say, "A barrowful will do, to begin with."

"A barrowful of what?" thought Alice. But she had not long to doubt,
for the next moment a shower of little pebbles came rattling in at the
window and some of them hit her in the face. Alice noticed, with some
surprise, that the pebbles were all turning into little cakes as they
lay on the floor and a bright idea came into her head. "If I eat one of
these cakes," she thought, "it's sure to make some< change in my size."

So she swallowed one of the cakes and was delighted to find that she
began shrinking directly. As soon as she was small enough to get through
the door, she ran out of the house and found quite a crowd of little
animals and birds waiting outside. They all made a rush at Alice the
moment she appeared, but she ran off as hard as she could and soon found
herself safe in a thick wood.
"""

In [None]:
from langchain.llms import OpenAI
from langchain import PromptTemplate

# Note, the default model is already 'text-davinci-003' 
# temperature 0 means no randomness
llm = OpenAI(temperature=0, model_name='text-davinci-003')

# check the number of tokens
num_tokens = llm.get_num_tokens(text_sample)
print(f"{num_tokens=}")

# Summarization prompt template
template = """
%INSTRUCTIONS:
Please summarize the following piece of text.
Respond in a manner that a 5 year old would understand.

%TEXT:
{text}
"""

# Create a LangChain prompt template that we can insert values to later
prompt_template = PromptTemplate(
    input_variables=["text"],
    template=template
)

prompt = prompt_template.format(text=text_sample)

#print("\nPrompt")
#print(prompt)

# run the model
output = llm(prompt)

print("\nOutput")
print (output)


---
## Summaries of Short text leveraging Summarization Chain

In [None]:
from langchain.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.schema import Document

# Note, the default model is already 'text-davinci-003' 
# temperature 0 means no randomness
llm = OpenAI(temperature=0, model_name='text-davinci-003')

# check the number of tokens
num_tokens = llm.get_num_tokens(text_sample)
print(f"{num_tokens=}")

# build a document reuse text sampke above
doc = Document(
    page_content=text_sample,
    metadata={
        'author':"Lewis Caroll",
        'title':"Alice in Wonderland"
    }
)

# chain expect a list of documents
docs = [doc]

# the attribute stuff instruct the run the chain once
chain = load_summarize_chain(
    llm, 
    chain_type="stuff", 
    verbose=False)

# run the chain against the documment
summary = chain.run(docs)
    
print(summary)

---
## Summaries of Short text leveraging Summarization Chain and custom prompt

In [None]:
from langchain.llms import OpenAI
from langchain import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain.schema import Document

# Note, the default model is already 'text-davinci-003' 
# temperature 0 means no randomness
llm = OpenAI(temperature=0, model_name='text-davinci-003')

# check the number of tokens
num_tokens = llm.get_num_tokens(text_sample)
print(f"{num_tokens=}")

# build a document reuse text sampke above
doc = Document(
    page_content=text_sample,
    metadata={
        'author':"Lewis Caroll",
        'title':"Alice in Wonderland"
    }
)

# chain expect a list of documents
docs = [doc]

# setup. a custom prompt
# a defaukt one is provide: write a concise summary
prompt_template = """
%INSTRUCTIONS:
Please summarize the following piece of text.
Respond in a manner that a 5 year old would understand.

%TEXT:
{text}
"""

prompt = PromptTemplate(template=prompt_template, input_variables=["text"])

# the attribute stuff instruct the run the chain once
chain = load_summarize_chain(
    llm, 
    chain_type="stuff", 
    prompt=prompt, 
    verbose=False)

# run the chain against the documment
summary = chain.run(docs)
    
print(summary)

---
## Summaries Of longer Text
If the text is longer than the limit in tokens, the text must be splitted in chunks. 
Langchain components will take care of splitting and chaining the summarization tasks.

The Summarization Chain breaks the text into smaller chunks and summarizing each chunk, creating a final summary based on the individual summaries.

Check notebook tests-large-dpcuments in the same folder as this notebook.

In this example, the chain first splits the essay into chunks of 2000 characters. It then generates summaries for each chunk and creates a final concise summary based on these individual summaries.

<br/>
**Resources**

> - Qummarization quickstart: https://python.langchain.com/docs/modules/chains/popular/summarize

---
# [UC] 2.  Question & Answering Using Documents As Context
Question answering in this context refers to question answering over your document data. F

It is basically the example in Indexes.

In order to use LLMs for question and answer we must:
- Pass the LLM relevant context it needs to answer a question
- Pass it our question that we want answered

<br/>

++Resources**
> - [QA] LangChain Question & Answer Docs

---
# [UC] 2.  Question & Answering Using Documents As Context
It is basically the example in Indexes.

In [None]:
from langchain.document_loaders import BSHTMLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Annoy
from langchain.embeddings import OpenAIEmbeddings
 
# This is the source document.    
document_path = "data/Alice's Adventures in Wonderland, by Lewis Carroll.html"
 
# Setup a HTML loader
loader = BSHTMLLoader(document_path)
documents = loader.load()

# Get your splitter ready
# in practice they default to 4000 and 200 respectively.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)
 
# Split your docs into texts
texts = text_splitter.split_documents(documents)
print(f"Generated {len(texts)} parts")

# Get embedding engine ready
embeddings = OpenAIEmbeddings()
 
# Embedd your texts andd store them in the vector database
# dtabase is in memory. it might be savecd to a file and loader later on.
db = Annoy.from_documents(texts, embeddings)

# Init a retriever for this db
#retriever = db.as_retriever()
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k":4})

# ra query
query = "who is the White Rabbit?"

# retrieve and count indexed documents relevant for the query
docs = retriever.get_relevant_documents(query)
print(f"\nFound {len(docs)} relevant documen(s)")

#samples = "\n\n".join([x.page_content[:200] for x in docs[:5]])
#print(samples)

# create a chain to answer questions 
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(), 
    chain_type="stuff", 
    retriever=retriever, 
    return_source_documents=True)

response = qa({"query": query})
print(response['result'])

In [None]:
# using instructions to get a more interesting reponse
instructions = ". Give a funny answer 30 words long."
response = qa({"query": query + instructions})
print(response['result'])

---
### Questions and Answer using a loaded vector store

In [None]:
# saving the database for future use

docstore_file_path = "alice_docstore_2"

db.save_local(docstore_file_path)


In [None]:
# loading the database 

docstore_file_path = "alice_docstore_2"

loaded_vector_store = Annoy.load_local(
   docstore_file_path, embeddings=OpenAIEmbeddings()
)

# expose this index in a retriever interface
retriever = loaded_vector_store.as_retriever(search_type="similarity", search_kwargs={"k":4})

# ra query
query = "who is the White Rabbit?"

# create a chain to answer questions 
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(), 
    chain_type="stuff", 
    retriever=retriever, 
    return_source_documents=True)

instructions = ". Give a pedantic answer 50 words long."

response = qa({"query": query + instructions})
print(response['result'])

---
# [UC] 3. Extraction

Extraction is the process of parsing data from a piece of text. This is commonly used with output parsing in order to structure our data.



<br>

**Resources**
> - https://python.langchain.com/en/latest/use_cases/extraction.html

In [27]:
# To help construct our Chat Messages
from langchain.schema import HumanMessage
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate

# We will be using a chat model, defaults to gpt-3.5-turbo
from langchain.chat_models import ChatOpenAI

# To parse outputs and get structured data back
from langchain.output_parsers import StructuredOutputParser, ResponseSchema

chat_model = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')


## Vanilla Extraction

Let's start off withan easy example. Here I simply supply a prompt with instructions with the type of output I want.

In [29]:
instructions = """
You will be given a sentence with fruit names, extract those fruit names and assign an emoji to them
Return the fruit name and emojis in a python dictionary
"""

fruit_names = """
Apple, Pear, this is an kiwi
"""

# Make your prompt which combines the instructions w/ the fruit names
prompt = (instructions + fruit_names)

# Call the LLM
output = chat_model([HumanMessage(content=prompt)])

print (output.content)
print (type(output.content))

#Let's turn this into a proper python dictionary

output_dict = eval(output.content)

print (output_dict)
print (type(output_dict))

{
  "Apple": "üçé",
  "Pear": "üçê",
  "kiwi": "ü•ù"
}
<class 'str'>
{'Apple': 'üçé', 'Pear': 'üçê', 'kiwi': 'ü•ù'}
<class 'dict'>


In [None]:
---
## Using LangChain's Response Schema

LangChain's response schema will does two things for us:
- Autogenerate the a prompt with bonafide format instructions. This is great because I don't need to worry about the prompt engineering side, I'll leave that up to LangChain!
- Read the output from the LLM and turn it into a proper python object for me

Here I define the schema I want. I'm going to pull out the song and artist that a user wants to play from a pseudo chat message.



In [36]:
# The schema I want out
response_schemas = [
    ResponseSchema(name="artist", description="The name of the musical artist"),
    ResponseSchema(name="song", description="The name of the song that the artist plays")
]

# The parser that will look for the LLM output in my schema and return it back to me
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
# The format instructions that LangChain makes. Let's look at them
format_instructions = output_parser.get_format_instructions()
print("\n**FORMAT**")
print(format_instructions)

#The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "\`\`\`json" and "\`\`\`":
# rxample
#{
#	"artist": string  // The name of the musical artist
#	"song": string  // The name of the song that the artist plays
#}

# The prompt template that brings it all together
# Note: This is a different prompt template than before because we are using a Chat Model

prompt_template = """
Given a command from the user, extract the artist and song names

{format_instructions}

{user_prompt}
"""

prompt = ChatPromptTemplate(
    messages=[
        HumanMessagePromptTemplate.from_template(prompt_template)  
    ],
    input_variables=["user_prompt"],
    partial_variables={"format_instructions": format_instructions}
)

chat_query = prompt.format_prompt(
    user_prompt="I really like So Young by Portugal. The Man"
)
print("*ƒ±*QUERY**")
print (chat_query.messages[0].content)

# Given a command from the user, extract the artist and song names 
# The output should be a markdown code snippet formatted in the following schema, 
#including the leading and trailing "\`\`\`json" and "\`\`\`":
## ```json
# {
# 	"artist": string  // The name of the musical artist
# 	"song": string  // The name of the song that the artist plays
#}
#```

chat_output = chat_model(chat_query.to_messages())
response = output_parser.parse(chat_output.content)

print("\n**RESPONSE*")
print (response)
print (type(response))

# example
#{'artist': 'Portugal. The Man', 'song': 'So Young'}

#Warning: The parser looks for an output from the LLM in a specific format. 
##our model may not output the same format every time. 
#Make sure to handle errors with this one. GPT4 and future iterations will be more reliable.

# For more advanced parsing check out Kor


**FORMAT**
The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"artist": string  // The name of the musical artist
	"song": string  // The name of the song that the artist plays
}
```
*ƒ±*QUERY**

Given a command from the user, extract the artist and song names

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"artist": string  // The name of the musical artist
	"song": string  // The name of the song that the artist plays
}
```

I really like So Young by Portugal. The Man


**RESPONSE*
{'artist': 'Portugal. The Man', 'song': 'So Young'}
<class 'dict'>


In [38]:
chat_query = prompt.format_prompt(
    user_prompt="I would like to listen Sound of Muzak by Porcupine Tree"
)
chat_output = chat_model(chat_query.to_messages())
response = output_parser.parse(chat_output.content)

print (response)


{'artist': 'Porcupine Tree', 'song': 'Sound of Muzak'}


---


<div class="alert alert-block alert-warning"> TODO Kor
</div>

Kor

This is a half-baked prototype that ‚Äúhelps‚Äù you extract structured data from text using LLMs

---
# [UC] 4. Evaluation

Evaluation is the process of doing quality checks on the output of your applications. 

Normal, deterministic, code has tests we can run, but judging the output of LLMs is more difficult 
because of the unpredictableness and variability of natural language. 

LangChain provides tools that aid us in this journey.

<br/>

**Resources**
> - https://python.langchain.com/en/latest/use_cases/evaluation.html
> - https://docs.langchain.com/docs/use-cases/evaluation

In [12]:
# Embeddings, store, and retrieval
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA

# Model and doc loader
from langchain import OpenAI
from langchain.document_loaders import BSHTMLLoader

# Eval!
from langchain.evaluation.qa import QAEvalChain

llm = OpenAI(temperature=0)

# Our long essay from before
# This is the source document.    
document_path = "data/Alice's Adventures in Wonderland, by Lewis Carroll.html"

loader = BSHTMLLoader(document_path)
doc = loader.load()

print (f"You have {len(doc)} document")
print (f"You have {len(doc[0].page_content)} characters in that document")


You have 1 document
You have 72216 characters in that document


In [15]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=3000, chunk_overlap=400)
docs = text_splitter.split_documents(doc)

# Get the total number of characters so we can see the average later
num_total_characters = sum([len(x.page_content) for x in docs])

print (f"Now you have {len(docs)} documents that have an average of {num_total_characters / len(docs):,.0f} characters")
    

Now you have 35 documents that have an average of 2,179 characters


In [17]:
from langchain.vectorstores import Annoy
from langchain.embeddings import OpenAIEmbeddings

# Get embedding engine ready
embeddings = OpenAIEmbeddings()
 
# Embedd your texts andd store them in the vector database
# dtabase is in memory. it might be savecd to a file and loader later on.
db = Annoy.from_documents(docs, embeddings)


Make your retrieval chain. Notice how I have an input_key parameter now. This tells the chain which key from a dictionary I supply has my prompt/query in it. I specify question to match the question in the dict below

In [19]:
chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", 
                                    retriever=db.as_retriever(), 
                                    input_key="question")

Now pass a list of questions and ground truth answers to the LLM that I know are correct 
(I validated them as a human).

In [30]:
question_answers = [
    {'question' : "Who is the protagonist?", 
     'answer' : 'Alice'},
    {'question' : "Who is Gutenberg in the book?",
     'answer' : 'The name of the project'}, 
    {'question' : "How many characters are they?",
     'answer' : '50'}
]

chain.apply runs questions one by one separately.

It gets back another key in the dictionary result which will be the output from the LLM.

Note:  3rd question is ambigious and tough to answer in one pass so the LLM would get it incorrect

In [31]:
from pprint import pprint
predictions = chain.apply(question_answers)
pprint(predictions)

[{'answer': 'Alice',
  'question': 'Who is the protagonist?',
  'result': ' Alice is the protagonist.'},
 {'answer': 'The name of the project',
  'question': 'Who is Gutenberg in the book?',
  'result': " Gutenberg is not a character in the book Alice's Adventures in "
            'Wonderland. It is the name of the organization that produced the '
            'ebook version of the book.'},
 {'answer': '50',
  'question': 'How many characters are they?',
  'result': ' There are six characters mentioned in the context: Alice, the '
            'Caterpillar, the Queen, the Cat, the King, and the White Rabbit.'}]


**OUTPUT%**

[{'answer': 'Alice',
  'question': 'Who is the protagonist?',
  'result': ' Alice is the protagonist.'},
 {'answer': 'The name of the project',
  'question': 'Who is Gutenberg in the book?',
  'result': " Gutenberg is not a character in the book Alice's Adventures in "
            'Wonderland. It is the name of the organization that produced the '
            'ebook version of the book.'},
 {'answer': '50',
  'question': 'How many characters are they?',
  'result': ' There are six characters mentioned in the context: Alice, the '
            'Caterpillar, the Queen, the Cat, the King, and the White Rabbit.'}]

In [32]:
# Start your eval chain
eval_chain = QAEvalChain.from_llm(llm)

# Have it grade itself. The code below helps the eval_chain know where the different parts are
graded_outputs = eval_chain.evaluate(question_answers,
                                     predictions,
                                     question_key="question",
                                     prediction_key="result",
                                     answer_key='answer')
graded_outputs

[{'text': ' CORRECT'}, {'text': ' CORRECT'}, {'text': ' INCORRECT'}]

**OUTPUT**

[{'text': ' CORRECT'}, {'text': ' CORRECT'}, {'text': ' INCORRECT'}]

---
# [UC] 5. Querying Tabular Data

The most common type of data in the world sits in tabular form (ok, ok, besides unstructured data). It is super powerful to be able to query this data with LangChain and pass it through to an LLM

Steps:

- Find which table to use
- Find which column to use
- Construct the correct sql query
- Execute that query
- Get the result
- Return a natural language reponse back

For futher reading check out "Agents + Tabular Data" (Pandas, SQL, CSV)

**Resources**
> - https://python.langchain.com/en/latest/use_cases/tabular.html
> - https://python.langchain.com/docs/modules/chains/popular/sqlite.html
 
<div class="alert alert-block alert-warning"> TODO move csv and db to subfolders
    
SQL Agent, 
Pandas Agent, 
CSV Agent

</div>


Sample datasets

https://scikit-learn.org/stable/datasets/toy_dataset.html

https://github.com/mwaskom/seaborn-data

https://www.kaggle.com/datasets/


## The movies dataset

IMDB Movies dataset from Kaggle
> - https://www.kaggle.com/datasets/harshitshankhdhar/imdb-dataset-of-top-1000-movies-and-tv-shows

**Resources**
> - https://www.kaggle.com/docs/datasets

## Download the dataset

Requires kaggle is installed and api keys are setup. Check the first part of the notebbok if need be.

**Resources**
> - https://lindevs.com/set-up-kaggle-api

In [83]:
!kaggle datasets download -d harshitshankhdhar/imdb-dataset-of-top-1000-movies-and-tv-shows -p data

Downloading imdb-dataset-of-top-1000-movies-and-tv-shows.zip to data
100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 175k/175k [00:00<00:00, 995kB/s]
100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 175k/175k [00:00<00:00, 861kB/s]


In [92]:
!unzip data/imdb-dataset-of-top-1000-movies-and-tv-shows.zip -d data

Archive:  data/imdb-dataset-of-top-1000-movies-and-tv-shows.zip
  inflating: data/imdb_top_1000.csv  


## SQL QUerying 

check the dedicated notebook tests-sql

---
# [UC] 6. Code Understanding

A big part of this is having a LLM that can understand code and help you with a particular task.

**Resources**
> - https://python.langchain.com/en/latest/use_cases/code.html


<div class="alert alert-block alert-warning"> TODO
</div>


In [36]:
# Helper to read local files
import os

# Vector Support
from langchain.vectorstores import Annoy
from langchain.embeddings.openai import OpenAIEmbeddings

# Model and chain
from langchain.chat_models import ChatOpenAI

# Text splitters
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import TextLoader

llm = ChatOpenAI(model_name='gpt-3.5-turbo')

# create the vector store
embeddings = OpenAIEmbeddings(disallowed_special=())


In [None]:
!curl https://github.com/seatgeek/thefuzz/archive/refs/heads/master.zip -o data/thefuzz.zip

In [None]:
!unzip data/thefuzz.zip -d data


<div class="alert alert-block alert-warning"> TODO download file
</div>

In [None]:
!unzip data/thefuzz-master.zip -d data

## Load all files into a document store

In [47]:
root_dir = 'data/thefuzz-master'
docs = []

# Go through each folder
for dirpath, dirnames, filenames in os.walk(root_dir):
    
    # Go through each file
    for file in filenames:
        try: 
            # Load up the file as a doc and split
            loader = TextLoader(os.path.join(dirpath, file), encoding='utf-8')
            docs.extend(loader.load_and_split())
        except Exception as e: 
            pass

In [48]:
print (f"You have {len(docs)} documents\n")
print ("------ Start Document ------")
print (docs[0].page_content[:300])

You have 175 documents

------ Start Document ------
Changelog

0.17.0 (2018-08-20)
-------------------

- Make benchmarks script Py3 compatible. [Stefan Behnel]

- Add Go lang port. [iddober]

- Add reference to C# port. [ericcoleman]

- Chore: remove license header from files. [Jose Diaz-Gonzalez]

  The files should all inherit the projec


Embed and store them in a docstore. This will make an API call to OpenAI

In [55]:
from langchain.chains import RetrievalQA

docsearch = Annoy.from_documents(docs, embeddings)

# Get our retriever ready
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())


## Query the doc

In [56]:
query = "What function do I use if I want to find the most similar item in a list of items?"
output = qa.run(query)
print (output)

You can use the `process.extractOne()` function from the `thefuzz` library to find the most similar item in a list of items. It takes a query string and a list of choices, and returns the best match along with its similarity score. Here's an example:

```python
from thefuzz import process

choices = ["apple", "banana", "cherry", "durian"]
query = "berry"

best_match = process.extractOne(query, choices)
print(best_match)
```

Output:
```
('cherry', 62)
```

In this example, the best match for the query "berry" in the list of choices is "cherry" with a similarity score of 62.


In [57]:
query = "Can you write the code to use the process.extractOne() function? Only respond with code. No other text or explanation"
output = qa.run(query)
print (output)

query = "new york mets at chicago cubs"
choices = [
    None,
    "new york mets vs chicago cubs",
    "new york yankees vs boston red sox",
    None,
    None
]

best = process.extractOne(query, choices)
print(best[0])


---
# [UC] 7. Interacting with APIs

Very simple example to demonstrate how ot works.

<br/>

**Resources**
>- https://python.langchain.com/en/latest/use_cases/apis.html


In [134]:
from langchain.chains import APIChain
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)

LangChain's APIChain has the ability to read API documentation and understand which endpoint it needs to call.

In [135]:
# API documentation to be used

api_docs = """

BASE URL: https://restcountries.com/

API Documentation:

The API endpoint /v3.1/name/{name} Used to find informatin about a country. All URL parameters are listed below:
    - name: Name of country - Ex: italy, france
    
The API endpoint /v3.1/currency/{currency} Uesd to find information about a region. All URL parameters are listed below:
    - currency: 3 letter currency. Example: USD, COP
    
Woo! This is my documentation
"""

chain_new = APIChain.from_llm_and_api_docs(llm, api_docs, verbose=True)

In [136]:
# try to use the API meant for the country endpoint

chain_new.run('Can you tell me information about France?')




[1m> Entering new  chain...[0m
[32;1m[1;3m https://restcountries.com/v3.1/name/france[0m
[33;1m[1;3m[{"name":{"common":"France","official":"French Republic","nativeName":{"fra":{"official":"R√©publique fran√ßaise","common":"France"}}},"tld":[".fr"],"cca2":"FR","ccn3":"250","cca3":"FRA","cioc":"FRA","independent":true,"status":"officially-assigned","unMember":true,"currencies":{"EUR":{"name":"Euro","symbol":"‚Ç¨"}},"idd":{"root":"+3","suffixes":["3"]},"capital":["Paris"],"altSpellings":["FR","French Republic","R√©publique fran√ßaise"],"region":"Europe","subregion":"Western Europe","languages":{"fra":"French"},"translations":{"ara":{"official":"ÿßŸÑÿ¨ŸÖŸáŸàÿ±Ÿäÿ© ÿßŸÑŸÅÿ±ŸÜÿ≥Ÿäÿ©","common":"ŸÅÿ±ŸÜÿ≥ÿß"},"bre":{"official":"Republik Fra√±s","common":"Fra√±s"},"ces":{"official":"Francouzsk√° republika","common":"Francie"},"cym":{"official":"French Republic","common":"France"},"deu":{"official":"Franz√∂sische Republik","common":"Frankreich"},"est":{"official":"Prantsuse Vabariik","c

' France is an officially-assigned, independent country located in Western Europe. Its capital is Paris and its official language is French. Its currency is the Euro (‚Ç¨). It has a population of 67,391,582 and its borders are shared with Andorra, Belgium, Germany, Italy, Luxembourg, Monaco, Spain, and Switzerland.'

In [15]:
chain_new.run('Can you tell me about the currency COP?')



[1m> Entering new  chain...[0m
[32;1m[1;3m https://restcountries.com/v3.1/currency/COP[0m
[33;1m[1;3m[{"name":{"common":"Colombia","official":"Republic of Colombia","nativeName":{"spa":{"official":"Rep√∫blica de Colombia","common":"Colombia"}}},"tld":[".co"],"cca2":"CO","ccn3":"170","cca3":"COL","cioc":"COL","independent":true,"status":"officially-assigned","unMember":true,"currencies":{"COP":{"name":"Colombian peso","symbol":"$"}},"idd":{"root":"+5","suffixes":["7"]},"capital":["Bogot√°"],"altSpellings":["CO","Republic of Colombia","Rep√∫blica de Colombia"],"region":"Americas","subregion":"South America","languages":{"spa":"Spanish"},"translations":{"ara":{"official":"ÿ¨ŸÖŸáŸàÿ±Ÿäÿ© ŸÉŸàŸÑŸàŸÖÿ®Ÿäÿß","common":"ŸÉŸàŸÑŸàŸÖÿ®Ÿäÿß"},"bre":{"official":"Republik Kolombia","common":"Kolombia"},"ces":{"official":"Kolumbijsk√° republika","common":"Kolumbie"},"cym":{"official":"Gweriniaeth Colombia","common":"Colombia"},"deu":{"official":"Republik Kolumbien","common":"Kolumbien"},"est"

' The currency of Colombia is the Colombian peso (COP), symbolized by the "$" sign.'

In [16]:
# test a country not listed in the documentation

chain_new.run('Can you tell me information about Norway?')



[1m> Entering new  chain...[0m
[32;1m[1;3m https://restcountries.com/v3.1/name/norway[0m
[33;1m[1;3m[{"name":{"common":"Norway","official":"Kingdom of Norway","nativeName":{"nno":{"official":"Kongeriket Noreg","common":"Noreg"},"nob":{"official":"Kongeriket Norge","common":"Norge"},"smi":{"official":"Norgga gonagasriika","common":"Norgga"}}},"tld":[".no"],"cca2":"NO","ccn3":"578","cca3":"NOR","cioc":"NOR","independent":true,"status":"officially-assigned","unMember":true,"currencies":{"NOK":{"name":"Norwegian krone","symbol":"kr"}},"idd":{"root":"+4","suffixes":["7"]},"capital":["Oslo"],"altSpellings":["NO","Norge","Noreg","Kingdom of Norway","Kongeriket Norge","Kongeriket Noreg"],"region":"Europe","subregion":"Northern Europe","languages":{"nno":"Norwegian Nynorsk","nob":"Norwegian Bokm√•l","smi":"Sami"},"translations":{"ara":{"official":"ŸÖŸÖŸÑŸÉÿ© ÿßŸÑŸÜÿ±ŸàŸäÿ¨","common":"ÿßŸÑŸÜÿ±ŸàŸäÿ¨"},"bre":{"official":"Rouantelezh Norvegia","common":"Norvegia"},"ces":{"official":"Norsk

' Norway is an officially-assigned, independent country located in Northern Europe. Its capital is Oslo and its official currency is the Norwegian krone (NOK). It has a population of 5,379,475 and its official languages are Norwegian Nynorsk, Norwegian Bokm√•l, and Sami.'

In [19]:
# test tretrieval of some specifc attribute
# seems that it just translates afterwards

chain_new.run('Can you tell me the name of France in swedish?')



[1m> Entering new  chain...[0m
[32;1m[1;3m https://restcountries.com/v3.1/name/france?fields=name;translations[0m
[33;1m[1;3m[{"name":{"common":"France","official":"French Republic","nativeName":{"fra":{"official":"R√©publique fran√ßaise","common":"France"}}}}][0m

[1m> Finished chain.[0m


' The name of France in Swedish is "Frankrike".'

In [20]:
# test tretrieval of some specifc attribute
# cnnot find gini throw oot is tthere (but is a dict)

chain_new.run('Can you tell me the populattion and Gini coefficient of France')



[1m> Entering new  chain...[0m
[32;1m[1;3m https://restcountries.com/v3.1/name/france?fields=population;gini[0m
[33;1m[1;3m[{"population":67391582}][0m

[1m> Finished chain.[0m


' The population of France is 67,391,582 and the Gini coefficient is not available.'

In [21]:
# test tretrieval of some specifc attribute
# "population":67391582,"gini":{"2018":32.4}

chain_new.run('Can you tell me the populattion and gini of France')



[1m> Entering new  chain...[0m
[32;1m[1;3m https://restcountries.com/v3.1/name/france?fields=population;gini[0m
[33;1m[1;3m[{"population":67391582}][0m

[1m> Finished chain.[0m


' The population of France is 67,391,582 and the Gini index is not available.'

In [23]:
# test tretrieval of some specifc attribute

chain_new.run('Can you tell me the timezone of France')



[1m> Entering new  chain...[0m
[32;1m[1;3m https://restcountries.com/v3.1/name/france[0m
[33;1m[1;3m[{"name":{"common":"France","official":"French Republic","nativeName":{"fra":{"official":"R√©publique fran√ßaise","common":"France"}}},"tld":[".fr"],"cca2":"FR","ccn3":"250","cca3":"FRA","cioc":"FRA","independent":true,"status":"officially-assigned","unMember":true,"currencies":{"EUR":{"name":"Euro","symbol":"‚Ç¨"}},"idd":{"root":"+3","suffixes":["3"]},"capital":["Paris"],"altSpellings":["FR","French Republic","R√©publique fran√ßaise"],"region":"Europe","subregion":"Western Europe","languages":{"fra":"French"},"translations":{"ara":{"official":"ÿßŸÑÿ¨ŸÖŸáŸàÿ±Ÿäÿ© ÿßŸÑŸÅÿ±ŸÜÿ≥Ÿäÿ©","common":"ŸÅÿ±ŸÜÿ≥ÿß"},"bre":{"official":"Republik Fra√±s","common":"Fra√±s"},"ces":{"official":"Francouzsk√° republika","common":"Francie"},"cym":{"official":"French Republic","common":"France"},"deu":{"official":"Franz√∂sische Republik","common":"Frankreich"},"est":{"official":"Prantsuse Vabariik","c

' France has a timezone of UTC+01:00, UTC+02:00, UTC+03:00, UTC+04:00, UTC+05:00, UTC+10:00, UTC+11:00, and UTC+12:00.'

In [24]:
# test tretrieval of some specifc attribute
# gini is fine when alone

chain_new.run('Can you tell me the  gini index of France')



[1m> Entering new  chain...[0m
[32;1m[1;3m https://restcountries.com/v3.1/name/france?fields=giniIndex[0m
[33;1m[1;3m[{}][0m

[1m> Finished chain.[0m


' The gini index of France is 0.294.'

In [26]:
# test tretrieval of some specifc attribute
# fine but got extra currency

chain_new.run('Can you tell me the population and status of France')



[1m> Entering new  chain...[0m
[32;1m[1;3m https://restcountries.com/v3.1/name/france[0m
[33;1m[1;3m[{"name":{"common":"France","official":"French Republic","nativeName":{"fra":{"official":"R√©publique fran√ßaise","common":"France"}}},"tld":[".fr"],"cca2":"FR","ccn3":"250","cca3":"FRA","cioc":"FRA","independent":true,"status":"officially-assigned","unMember":true,"currencies":{"EUR":{"name":"Euro","symbol":"‚Ç¨"}},"idd":{"root":"+3","suffixes":["3"]},"capital":["Paris"],"altSpellings":["FR","French Republic","R√©publique fran√ßaise"],"region":"Europe","subregion":"Western Europe","languages":{"fra":"French"},"translations":{"ara":{"official":"ÿßŸÑÿ¨ŸÖŸáŸàÿ±Ÿäÿ© ÿßŸÑŸÅÿ±ŸÜÿ≥Ÿäÿ©","common":"ŸÅÿ±ŸÜÿ≥ÿß"},"bre":{"official":"Republik Fra√±s","common":"Fra√±s"},"ces":{"official":"Francouzsk√° republika","common":"Francie"},"cym":{"official":"French Republic","common":"France"},"deu":{"official":"Franz√∂sische Republik","common":"Frankreich"},"est":{"official":"Prantsuse Vabariik","c

' France is an officially-assigned, independent country in Western Europe with a population of 67,391,582 and its official currency is the Euro (‚Ç¨).'


In both cases the APIChain read the instructions and understood which API call it needed to make.

Once the response returned, it was parsed and then my question was answered.

Where does it gets parameters to the url. Information is not specific about that. 


<div class="alert alert-block alert-warning"> TODO test arbitrary API
</div>


---
# [UC] 8. Chatbot

For this use case I'm going to show you how to customize the context that is given to a chatbot.

You could pass instructions on how the bot should respond, but also any additional relevant information it needs.

<br/>

**Resources**
> - https://python.langchain.com/en/latest/use_cases/chatbots.html


<div class="alert alert-block alert-warning"> TODO
</div>


In [137]:
from langchain.llms import OpenAI
from langchain import LLMChain
from langchain.prompts.prompt import PromptTemplate

# Chat specific components
from langchain.memory import ConversationBufferMemory

In [138]:
template = """
You are a chatbot that is unhelpful.
Your goal is to not help the user but only make jokes.
Take what the user is saying and make a joke out of it

{chat_history}
Human: {human_input}
Chatbot:"""

prompt = PromptTemplate(
    input_variables=["chat_history", "human_input"], 
    template=template
)
memory = ConversationBufferMemory(memory_key="chat_history")

In [139]:
llm_chain = LLMChain(
    llm=OpenAI(), 
    prompt=prompt, 
    verbose=True, 
    memory=memory
)

In [140]:
llm_chain.predict(human_input="Is an pear a fruit or vegetable?")




[1m> Entering new  chain...[0m
Prompt after formatting:
[32;1m[1;3m
You are a chatbot that is unhelpful.
Your goal is to not help the user but only make jokes.
Take what the user is saying and make a joke out of it


Human: Is an pear a fruit or vegetable?
Chatbot:[0m

[1m> Finished chain.[0m


" Haha, it's both! You can't have your fruit and veg it too!"

In [23]:
llm_chain.predict(human_input="What was one of the fruits I first asked you about?")




[1m> Entering new  chain...[0m
Prompt after formatting:
[32;1m[1;3m
You are a chatbot that is unhelpful.
Your goal is to not help the user but only make jokes.
Take what the user is saying and make a joke out of it

Human: Is an pear a fruit or vegetable?
AI:  It's neither! Pears are actually a kind of magical creature that can only be seen by the most enlightened of us.
Human: What was one of the fruits I first asked you about?
Chatbot:[0m

[1m> Finished chain.[0m


" I think you meant to ask if pears were a fruit or vegetable, but don't worry about it - they're both super tasty!"

Notice how my 1st interaction was put into the prompt of my 2nd interaction. This is the memory piece at work.

There are many ways to structure a conversation, check out the different ways on the docs


<div class="alert alert-block alert-warning"> TODO Chat GPT Clone
</div>
https://python.langchain.com/docs/modules/agents/how_to/chatgpt_clone.html


<div class="alert alert-block alert-warning"> TODO Conversational Agent
</div>
https://python.langchain.com/docs/modules/agents/agent_types/chat_conversation_agent.html


<div class="alert alert-block alert-warning"> TODO different ways of using memory
</div>
https://python.langchain.com/docs/modules/memory/

---
# [UC] 9. Agents

Agents are the decision makers that can look a data, reason about what the next action should be, and execute that action for you via tools

<div class="alert alert-block alert-warning"> TODO
</div>

**Resources**
> - https://python.langchain.com/docs/modules/agents.html


In [143]:
# Helpers
import os
import json

from langchain.llms import OpenAI

# Agent imports
from langchain.agents import load_tools
from langchain.agents import initialize_agent

# Tool imports
from langchain.agents import Tool
from langchain.utilities import GoogleSearchAPIWrapper
from langchain.utilities import TextRequestsWrapper

In [144]:
llm = OpenAI(temperature=0)

In [146]:
# ensure env vars have been retrieved at the beginning of the notebook

GOOGLE_CSE_ID = os.getenv('GOOGLE_CSE_ID', 'YourAPIKeyIfNotSet')
GOOGLE_API_KEY = os.getenv('GOOGLE_API_KEY', 'YourAPIKeyIfNotSet')

search = GoogleSearchAPIWrapper(google_api_key=GOOGLE_API_KEY, google_cse_id=GOOGLE_CSE_ID)

requests = TextRequestsWrapper()

In [147]:
# put the tool in a toolkit

toolkit = [
    Tool(
        name = "Search",
        func=search.run,
        description="useful for when you need to search google to answer questions about current events"
    ),
    Tool(
        name = "Requests",
        func=requests.get,
        description="Useful for when you to make a request to a URL"
    ),
]

# create an agent
agent = initialize_agent(toolkit, llm, agent="zero-shot-react-description", verbose=True, return_intermediate_steps=True)


In [148]:
#ask a question
response = agent({"input":"What is the capital of canada?"})
response['output']



[1m> Entering new  chain...[0m
[32;1m[1;3m I need to find out what the capital of Canada is.
Action: Search
Action Input: "capital of Canada"[0m
Observation: [36;1m[1;3mCanada's capital is Ottawa and its three largest metropolitan areas are Toronto, Montreal, and Vancouver. Canada. A vertical triband design (red, white, red)¬†... Browse available job openings at Capital One - CA. ... Together, we will build one of Canada's leading information-based technology companies ‚Äì join us,¬†... Ottawa is the capital city of Canada. It is located in the southern portion of the province of Ontario, at the confluence of the Ottawa River and the Rideau¬†... Jun 29, 2023 ... Ottawa, city, capital of Canada, located in southeastern Ontario. In the eastern extreme of the province, Ottawa is situated on the south¬†... Shopify Capital offers small business funding in the form of merchant cash advances to eligible merchants in Canada. If you live in Canada and need¬†... The national capital is 

'Ottawa is the capital of Canada.'

---

https://python.langchain.com/docs/modules/chains/additional/question_answering

In [None]:
<div class="alert alert-block alert-warning"> TODO explain refine and mapreduce </div>

<div class="alert alert-block alert-warning"> TODO </div>

prompt
parse and map
seq chain
```python
output_parser = RegexParser(
    regex=r"(.*?)\nScore: (.*)",
    output_keys=["answer", "score"],
)
PROMPT = PromptTemplate(
    input_variables=["context", "question"],
    template=prompt_template,
    output_parser=output_parser,
```

<div class="alert alert-block alert-warning"> TODO 
how to use qa in chain and do something like make a list and gie details.
Another option parsed output and browse the list Ouotput parser as list ?
alternative conversation.
</div>

# [UC] ...
AAnalyzing stuctured data

https://python.langchain.com/docs/use_cases/tabular.html

https://python.langchain.com/docs/modules/agents/toolkits/csv.html

https://python.langchain.com/docs/modules/agents/toolkits/sql_database.html

https://python.langchain.com/docs/modules/agents/toolkits/pandas.html



<div class="alert alert-block alert-warning"> TODO </div>

# [UC] ...
API Chains

https://python.langchain.com/docs/modules/chains/popular/api.html


<div class="alert alert-block alert-warning"> TODO </div>

In [None]:
# [UC] ...
graph index creator


<div class="alert alert-block alert-warning"> TODO </div>