# Langchain Demo

## What is LangChain?

LangChain is an open source framework that allows AI developers to combine Large Language Models (LLMs) with external data. 

**Resources**

> LangChain resources
> - Landpage: https://readthedocs.org/projects/langchain/db2d
> - Comonents: https://docs.langchain.com/docs/category/components
> - git: https://github.com/hwchase17/langchain.git
> - API Reference: https://api.python.langchain.com/en/latest/

> LangChain applications
> - [LangChain Awesome](https://github.com/kyrolabs/awesome-langchain)

> This notebook is largely based on Greg Kamradt's videos and cookbooks
> - [Langchain tuorial suite](https://www.youtube.com/playlist?list=PLqZXAkvF1bPNQER9mLmDbntNfSpzdDIU5)
> - [LangChain cookbooks](https://github.com/gkamradt/langchain-tutorials)

> Additonal resources and tutorial
> - [Cookbook Comprehensive Guide](https://nathankjer.com/introduction-to-langchain/)
> - [A Gentle Intro to Chaining LLMs, Agents, and utils via LangChain](https://towardsdatascience.com/a-gentle-intro-to-chaining-llms-agents-and-utils-via-langchain-16cd385fca81)

## This notebook

This notebook collects Python examples. The chapters are based oo the LangChain compoents documented here https://docs.langchain.com/docs/category/components.

Some changes though:
- use Annoy instead of FAISS as a vector database
- use Google Search API instead of SerpAPI
- change in examples and additional examples 
- change in API keys setup



This notebook has been tested in June 2023 on AWS SageMaker using DataScience 3.0 image.

Test environment:
> - AWS SageMaker Studio's notebook 
>> - Kernel image Data Science 3.0
>> - t3.medium 2CPU - 4GB
>> - Python 3.9.15
>> - Linux default 4.14.304-226.531.amzn2.x86_64
> - installed packages:
>> - langchain 0.0.218
>> - openai 0.27.8
>> - google_api_python_client 2.90.0
>> - tikitoken 0.4.0



---
<div style="background-color:green;color:black;text-align:center;padding:1rem;font-size:1.5rem;">NOTEBOOK SETUP</div>



**Instructions**

All setups are at the top of the notebook so that you can run all this section initialize the notebook.

Notebook chapters are not dependant on each other and may be run in isolation.

Before running the setup you may need to create the following resources
- request an OpenAI API keys. OpenAI APIs are not free.
- create a Custom Search Engine in Google Search. it is free.
- request an API key for the Google Search service. It is free.

Confer to the setup sections for instruction on how to create those resources.

---
## API keys and environment

Langchain will get the API keys from environment variables or function parameters.

**Instructions**

- Never show the keys in shared notebooks, whether it part of the code or a log. A simple way to avoid key leakage, is to use environement variables.  You set the environment variable in the terminal or some local configuration. If so you do not have to set the key here.

- If it is easier for you to set the key here by assigning the value, do not forget to empty the string right after you run this block. The environment will be kept in memory as long as the kernel runs.

- Be careful when printing the keys. Ensure that you remove the outputs. 

- Before sharing check that the keys are not printed out by some features of the libraries. Avoid to print libraries' objects. They often hold the API keys as a property and may disclose the key value.


I Store API keys and configuration information in AWS Secrets Manager. The code below retrieves the secret holding the keys. The secret is a JSON string consisting in key/value pairs. It will be used later to set various environnement variables.

When using Notebooks an SageMaker do not forget to give permissions to read this secret to SageMaker execution role.

In [None]:
%%bash --out secrets 
# using AWS's Secret Manager to store keys
# garb the keys and store it into a Pytthon variable
export RESPONSE=$(aws secretsmanager get-secret-value --secret-id 'salvia/labbench/tests' )
export SECRETS=$( echo $RESPONSE | jq '.SecretString | fromjson')

echo $SECRETS

---
## LangChain Setup

**Resources**
> - [LangChain GetStarted](https://python.langchain.com/docs/get_started/quickstart)

In [None]:
pip install langchain==0.0.218


---
## OpenAI Setup

**Resources**
> - [OpenAI tutorial on API keys](https://platform.openai.com/docs/quickstart)
> - [OpenAI package on Pypi](https://pypi.org/project/openai/)

In [None]:
import os

os.environ["OPENAI_API_KEY"] = eval(secrets)["OPENAI_API_KEY"]


In [None]:
pip install openai==0.27.8


---
## Google Search setup

**Resources**

> How to configure the Google search in LangChain 
> - https://python.langchain.com/docs/ecosystem/integrations/google_search

> Custom Search Engine configuration 
> - https://stackoverflow.com/questions/37083058/programmatically-searching-google-in-python-using-custom-search

> CSE API 
> - repo: https://github.com/google/google-api-python-client
> - more info: https://developers.google.com/api-client-library/python/apis/customsearch/v1
> - complete docs: https://api-python-client-doc.appspot.com/

> Get an API key
> - https://developers.google.com/custom-search/v1/introduction

> Package information
> - [Google API client package on Pypi](https://pypi.org/project/google-api-python-client/)

In [None]:
# Unlock the API and get a key 
os.environ["GOOGLE_API_KEY"] = eval(secrets)["GOOGLE_API_KEY"]
# Create or use an existing Custom Search Engine
# on the CSE page under Searcg Engone ID
os.environ["GOOGLE_CSE_ID"] = eval(secrets)["GOOGLE_CSE_ID"]


In [None]:
pip install google-api-python-client==2.90.0

## Setup Annoy as a vector database 

Some examples requires a Vector Database (document selector, document retrieval).

LangChain use ChromaDB by default. For whatever reason it failed to install. Used Annoy instead. An alterntive is FAIIS. You may also want to use online Vector database like Pinecone or Weaviate. 

Most of these packages include c++ code and requires GCC at the install time. It is not included in SageMaker DataScience 3 image. So the first step is installing GCC. 

NOTE: Annoy is read-only - once the index is built you cannot add any more emebddings.

<br/>

**Resources**
> - [Annoy package on Pypi](https://pypi.org/project/annoy/)

In [None]:
!apt-get update && apt-get install -y build-essential

In [None]:
pip install annoy==1.17.3

# Setup additionalm API tools
<div class="alert alert-block alert-warning"> 
    TODO <br>
</div>



In [None]:
pip install wikipedia

## Setup additional tools for embeddings

When working with embeddings additonal packages are required.

- tiktoken, as a encoder and tokenizer

**Resources**
> - [Tiktoken package on Pypi](https://pypi.org/project/tiktoken/)

 

In [None]:
pip install tiktoken==0.4.0

---
<div style="background-color:green;color:black;text-align:center;padding:1rem;font-size:1.5rem;">LANGCHAIN OVERVIEW</div>


---
# 1. Basic features

---
## Get prediction from a langage model

In [None]:
from langchain.llms import OpenAI

# loads the model.
# OPENAI_API_KEY is requested. Get it from the OpenAI site.
# a paid account and available units are requested to be able to place a request.
llm = OpenAI(temperature=0.9)

text = "what are the 5 best countries in Europe"

# Actual API call - may tale a while.
print(llm(text))


---
## Manage prompts with templates

In [None]:
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

# loads the model.
llm = OpenAI(temperature=0.9)

# setup a prompt
prompt = PromptTemplate (
    input_variables=["interest"],
    template="what are the 5 best countries in Europe ranked by {interest}"
)

In [None]:
text = prompt.format(interest="food")
print(f"{text=}")
print(llm(text))

In [None]:
text = prompt.format(interest="siteseeing")
print(f"{text=}")
print(llm(text))

---
# 2. Chains

Chains are sequences of modular components (or other chains) combined in a particular way to accomplish a common use case.


Example:
- chaining LLM and tool
- summarization chain

---
## Built-in chains

In [None]:
from langchain.chains import PALChain
from langchain.llms import OpenAI

# loads the model.
llm = OpenAI(temperature=0.7)

palchain = PALChain.from_math_prompt(llm=llm, verbose=True)


text = """If my age is half of my dad's age 
and he is going to be 60 next year, 
what is my current age?"""
#palchain.run("If my age is half of my dad's age and he is going to be 60 next year, what is my current age?")
palchain.run(text)


<div class="alert alert-block alert-warning"> 
    TODO <br>
    - different result each run <br>
    - and should be 29.5
</div>


> Entering new  chain...
def solution():
    """If my age is half of my dad's age and he is going to be 60 next year, what is my current age?"""
    dad_age_next_year = 60
    my_age_fraction = 0.5
    my_age_now = dad_age_next_year * my_age_fraction
    result = my_age_now
    return result

> Finished chain.
'30.0'

> Entering new  chain...
def solution():
    """If my age is half of my dad's age and he is going to be 60 next year, what is my current age?"""
    dad_age_current = 59
    my_age_current = dad_age_current / 2
    result = my_age_current
    return result

> Finished chain.
'29.5'

---
## Multi-step workflow to feed prompt into the model

In [None]:
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chains import LLMChain

# loads the model.
llm = OpenAI(temperature=0.9)

# setup a prompt
prompt = PromptTemplate (
    input_variables=["interest"],
    template="what are the 5 best countries in Europe ranked on {interest}"
)

# chain feeds the prompt into the langage mmodel.
chain = LLMChain(llm=llm, prompt=prompt)

In [None]:
chain.run("science")

In [None]:
print(chain.run("tv shows"))

---
## Using OpenAI Chat API (less expensive)
requires a chain to feed the prompt into the chat 

<div class="alert alert-block alert-warning"> TODO  move to components + desribe resource </div>

**Resources**
> - Other Chat APIs: https://api.python.langchain.com/en/latest/modules/chat_models.html

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

chatopenai = ChatOpenAI(model_name="gpt-3.5-turbo")

prompt = PromptTemplate (
    input_variables=["interest"],
    template="what are the 5 best countries in Europe ranked on {interest}"
)

llmchain_chat = LLMChain(llm=chatopenai, prompt=prompt)
print(llmchain_chat.run("food"))


---
## Leverage LLM Math

Evaluating chains that know how to do math.

**Resources**
> - Langchain module LLM_Math: ttps://python.langchain.com/docs/guides/evaluation/llm_math

In [None]:
from langchain.prompts import load_prompt
from langchain.chains import LLMMathChain

# loads the model.
llm = OpenAI(temperature=0.9)

prompt = load_prompt('lc://prompts/llm_math/prompt.json')

# deprecated
##chain = LLMMathChain(llm=llm, prompt=prompt)

chain = LLMChain(llm=llm, prompt=prompt)

print(chain.run("what is the largest prime number lower than 20"))


---
# 3. Agent

LangChain define agents as decision making engines:
> Some applications will require not just a predetermined chain of calls to LLMs/other tools, but potentially an unknown chain that depends on the user's input. In these types of chains, there is a “agent” which has access to a suite of tools. Depending on the user input, the agent can then decide which, if any, of these tools to call.


---
## Test with LLM model only 


In [None]:
from langchain.llms import OpenAI

# loads the model.
# OPENAI_API_KEY is requested. Get it from the OpenAI site.
# a paid account and available units are requested to be able to place a request.
# low temperature to avoid randomness
llm = OpenAI(model_name="text-davinci-003", temperature=0)

text = "Who is the prime minister of France since may 2022"

# Actual API call - may tale a while.
print(llm(text))


**OUTPUT**

'The Prime Minister of France since May 2022 is Jean Castex.'

This answer is wrong. Since the model has been trained mid 2021, it is not up-to-date. Elisabeth Borne is Prime Minister since may 2022.

---
## Agent leveraging Google Search

**Instructions**

Make sure:
- Google API client is installed
- a Custome Search Engine is available (CSE)
- the API key has been setup up

In [None]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.llms import OpenAI

# create a model
llm = OpenAI(temperature=0)

# load some tools
tools = load_tools(["google-search"], llm=llm)

# setup an agent
agent = initialize_agent(tools, 
                         llm, 
                         agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, 
                         verbose=True)


In [None]:
agent.run("Who is the prime minister of France since may 2022")

**OUTPUT**

'Élisabeth Borne is the prime minister of France since May 16, 2022.'

This is true.

---
# 4. Memory - Conversation

<div class="alert alert-block alert-warning"> TODO  what is a conversation </div>


In [None]:
from langchain import OpenAI, ConversationChain

# create a model
llm = OpenAI(temperature=0)

conversation = ConversationChain(llm=llm, verbose=True)

conversation.predict(input="Hi There")



In [None]:
conversation.predict(input="What is the first thing that I said to you?")


In [None]:
conversation.predict(input="What is an alternative for the first thing that I said to you?")


---
<div style="background-color:green;color:black;text-align:center;padding:1rem;font-size:1.5rem;">LANGCHAIN COMPONENTS</div>


---
# 5. Schemas

Basic data types and schemas that are used throughout the codebase.

There are 3 types of schemas
- Text (see above)
- Prompts
- Messages 
- Document


<br/>

**Resources**
> - Schhemas component:  https://docs.langchain.com/docs/components/schema/


---
## Text

In [None]:
from langchain.llms import OpenAI

# loads the model.
# OPENAI_API_KEY is requested. Get it from the OpenAI site.
# a paid account and available units are requested to be able to place a request.
llm = OpenAI(temperature=0.9)

text = "what are the 5 best countries in Europe"

# Actual API call - may tale a while.
print(llm(text))

---
## Chat messages
Chat messages are like text with a type

There are 3 types
- System: background context that tells the AI what to do
- Human: inputs sent by the user
- AI : response of the AI


In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage, AIMessage

chat = ChatOpenAI(temperature=0.7)

In [None]:
messages = [ SystemMessage(content="You are a nice AI and help users to feature out what to eat.")]
     
messages.append( HumanMessage(content="I like tuna, list some recipes.") )

In [None]:
response = chat(messages)
messages.append( AIMessage(content=response.content) )

print(response.content)

In [None]:
messages.append( HumanMessage(content="show the first one.") )

response = chat(messages)
messages.append( AIMessage(content=response.content) )

print(response.content)

---
## Examples
An list of input output pairs thet represent the input and expected output.

Used to fine tune a model or do in-context learning.

**Resources**
> - Prompt Template:  https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/few_shot_examples


In [None]:
from langchain.llms import OpenAI
from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.prompts.prompt import PromptTemplate

# loads the model.
llm = OpenAI(temperature=0.9)

# create the example set

examples = [
    { "question": "red bold", "answer": "color:red; font-style:bold;"},
    { "question": "green italic", "answer":  "color:green; font-style:italic;"},
    { "question": "blue bold", "answer":  "color:blue; font-style:bold;"},
    { "question": "pink", "answer":  "color:pink;"},
    { "question": "green", "answer":  "color:green;"},
    { "question": "pink italic", "answer":  "color:pink; font-style:italic;"}
    
]    

# Configure a formatter that will format the few shot examples into a string. 
# This formatter should be a PromptTemplate object.

example_prompt = PromptTemplate (
    input_variables=["question", "answer"], 
    template="question: {question}\n{answer}"
)

print("\n=== exemple prompt ===")
print(example_prompt.format(**examples[0]))


# Finally, create a FewShotPromptTemplate object. 
# This object takes in the few shot examples and the formatter for the few shot examples.

prompt_template = FewShotPromptTemplate(
    examples=examples, 
    example_prompt=example_prompt, 
    suffix="question: {input}", 
    input_variables=["input"]
)

prompt = prompt_template.format(input="pink bold")

print("\n=== prompt ===")
print(prompt)

print("\n=== answer ===")
print(llm(prompt))


---
## Documents

An unstructured object that conaints a pieces of text and metadatas.

<div class="alert alert-block alert-warning"> TODO  resource </div>


<div class="alert alert-block alert-warning"> TODO how to use this concept? 
make some knowledge available?
how to use metadata?
</div>


In [None]:
from langchain.schema import Document
from langchain.llms import OpenAI

# temperature 0 means no randomness
llm = OpenAI(temperature=0)


document = Document(
    page_content="""

        So she swallowed one of the cakes and was delighted to find that she
        began shrinking directly. As soon as she was small enough to get through
        the door, she ran out of the house and found quite a crowd of little
        animals and birds waiting outside. They all made a rush at Alice the
        moment she appeared, but she ran off as hard as she could and soon found
        herself safe in a thick wood.
        """,
    metadata={
        'author':"Lewis Caroll",
        'identifier':"1234"
    }
)

print("Document")
print(document)

# the attribute stuff instruct the run the chain once
chain = load_summarize_chain(
    llm, 
    chain_type="stuff", 
    verbose=False)

# run the chain against the documment
summary = chain.run([document])
    
print("\nSummary")
print(summary)


In [None]:
from langchain.llms import OpenAI
from langchain import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain.schema import Document

# Note, the default model is already 'text-davinci-003' 
# temperature 0 means no randomness
llm = OpenAI(temperature=0, model_name='text-davinci-003')

# check the number of tokens
num_tokens = llm.get_num_tokens(text_sample)
print(f"{num_tokens=}")

# build a document reuse text sampke above
doc = Document(
    page_content=text_sample,
    metadata={
        'author':"Lewis Caroll",
        'title':"Alice in Wonderland"
    }
)

# chain expect a list of documents
docs = [doc]

# setup. a custom prompt
# a defaukt one is provide: write a concise summary
prompt_template = """
%INSTRUCTIONS:
Please summarize the following piece of text.
Respond in a manner that a 5 year old would understand.

%TEXT:
{text}
"""

prompt = PromptTemplate(template=prompt_template, input_variables=["text"])

# the attribute stuff instruct the run the chain once
chain = load_summarize_chain(
    llm, 
    chain_type="stuff", 
    prompt=prompt, 
    verbose=False)

# run the chain against the documment
summary = chain.run(docs)
    
print(summary)

---
# 6. Models
LangChain provides interfaces and integrations for two types of models:
- LLMs: Models that take a text string as input and return a text string
- Chat models: Models that are backed by a language model but take a list of Chat Messages as input and return a Chat Message

<br/>

**Resources**
> - Model Component: https://python.langchain.com/docs/modules/model_io/models/
> - List of models: https://platform.openai.com/docs/models


---
## Langage Model 
LLMs: Models that take a text string as input and return a text string

In [None]:
from langchain.llms import OpenAI

# additnal parameters to select a mode, pass the API key ...
llm = OpenAI(model_name="text-ada-001", temperature=0.7)

llm("What day comes after Friday?")

---
## Chat Model 
Chat models: Models that are backed by a language model but take a list of Chat Messages as input and return a Chat 

Also make sense for a unique interaction as Chat API is less expensive.


In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage, AIMessage

chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=1)

In [None]:
messages = [ 
    SystemMessage(content="You are a nice AI and help users to feature out what to eat."),
    HumanMessage(content="I like tuna, list some recipes.")
]
     
chat(messages)

---
### Text Embedding Model

Convert text into a series of numbers (a vector) which holds the meaning of the text.

Mainly used for text comparison.

In [None]:
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

text="A leader should know all about truth and honesty, and when to see the difference. (Truck) - Bromeliad Trilogy"

text_embedding = embeddings.embed_query(text)

print(f"embedding length: {len(text_embedding)}")
print(f"5 first values of the vector: {text_embedding[:5]}")

---
# 7. prompts
A "prompt" refers to the input to the model. This input is rarely hard coded, but rather is often constructed from multiple components. A PromptTemplate is responsible for the construction of this input. LangChain provides several classes and functions to make constructing and working with prompts easy.

LangChain documentation is split into four sections:
- PromptValue: The class representing an input to a model.
- Prompt Templates: The class in charge of constructing a PromptValue.
- Example Selectors: Often times it is useful to include examples in prompts. These examples can be hardcoded, but it is often more powerful if they are dynamically selected.
- Output Parsers: Language models (and Chat Models) output text. But many times you may want to get more structured information than just text back. This is where output parsers come in. Output Parsers are responsible for (1) instructing the model how output should be formatted, (2) parsing output into the desired formatting (including retrying if necessary).

<br/>

**Resources**
> - Prompts Component: https://docs.langchain.com/docs/components/prompts/

---
## Simple prompt

In [None]:
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

# loads the model.
llm = OpenAI(model_name="text-davinci-003", temperature=0.9)

# write a simple  prompt. use """ to allow multiline string.
prompt = """
Today is Monday. Tomorrow is Wednesday.

What is wrong with this statement?
"""

# query the model
print(llm(prompt))

---
## Prompt with template and placeholder.

In [None]:
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

# loads the model.
llm = OpenAI(model_name="text-davinci-003", temperature=0.9)

# setup a prompt. use """ to allow multiline string.
template = PromptTemplate (
    input_variables=["today", "tomorrow"],
    template="""
    Today is {today}. Tomorrow is {tomorrow}.

    What is wrong with this statement?
    """
)

prompt = template.format(today="Monday", tomorrow="Wednesday")
print(f"{prompt=}")

# query the model

print(llm(prompt))

In [None]:
prompt = template.format(today="Thursday", tomorrow="Friday")
print(f"{prompt=}")

# query the model

print(llm(prompt))

---
## Example selectors and Few Shot Learning

A way to select from a series of examples in few shot learning 

**Resources**
> - Example Selector: https://api.python.langchain.com/en/latest/modules/example_selector.html
> - Few shot learning: https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/few_shot_examples



### Example selectors and Few Shot Learning with NGram


<div class="alert alert-block alert-warning"> FIXME </div>

In [None]:
from langchain.llms import OpenAI
from langchain.prompts.example_selector import NGramOverlapExampleSelector
from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.prompts.prompt import PromptTemplate

# loads the model.
llm = OpenAI(temperature=0.9)

# create the example set

examples = [
    { "question": "red bold", "answer": "color:red; font-style:bold;"},
    { "question": "green italic", "answer":  "color:green; font-style:italic;"},
    { "question": "blue bold", "answer":  "color:blue; font-style:bold;"},
    { "question": "pink", "answer":  "color:pink;"},
    { "question": "green", "answer":  "color:green;"},
    { "question": "pink italic", "answer":  "color:pink; font-style:italic;"}
    
]    

# Configure a formatter that will format the few shot examples into a string. 
# This formatter should be a PromptTemplate object.

example_prompt = PromptTemplate (
    input_variables=["question", "answer"], 
    template="question: {question}\n{answer}"
)

print("\n=== exemple prompt ===")
print(example_prompt.format(**examples[0]))


# Select and order examples based on ngram overlap score (sentence_bleu score).

question = "pink bold"

example_selector = NGramOverlapExampleSelector.select_examples(
    examples,
    question
)

"""
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

example_selector = SemanticSimilarityExampleSelector.from_examples(
    # This is the list of examples available to select from.
    examples,
    # This is the embedding class used to produce embeddings which are used to measure semantic similarity.
    OpenAIEmbeddings(),
    # This is the VectorStore class that is used to store the embeddings and do a similarity search over.
    Chroma,
    # This is the number of examples to produce.
    k=1
)
"""


# Finally, create a FewShotPromptTemplate object. 
# This object takes in the few shot examples and the formatter for the few shot examples.

prompt_template = FewShotPromptTemplate(
    #example_selector=example_selector, 
    examples=selected_examples, 
    example_prompt=example_prompt, 
    suffix="question: {input}", 
    input_variables=["input"]
)

prompt = prompt_template.format(input=question)

print("\n=== prompt ===")
print(prompt)

print("\n=== answer ===")
print(llm(prompt))


### Example selectors and Few Shot Learning with similarities

requires a vector database

In [None]:
from langchain.llms import OpenAI
from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.prompts.prompt import PromptTemplate
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores import Annoy
#from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

# loads the model.
llm = OpenAI(temperature=0.9)

# create the example set

examples = [
    { "question": "red bold", "answer": "color:red; font-style:bold;"},
    { "question": "green italic", "answer":  "color:green; font-style:italic;"},
    { "question": "blue bold", "answer":  "color:blue; font-style:bold;"},
    { "question": "pink", "answer":  "color:pink;"},
    { "question": "green", "answer":  "color:green;"},
    { "question": "pink italic", "answer":  "color:pink; font-style:italic;"}
    
]    

# Configure a formatter that will format the few shot examples into a string. 
# This formatter should be a PromptTemplate object.

example_prompt = PromptTemplate (
    input_variables=["question", "answer"], 
    template="question: {question}\n{answer}"
)

print("\n=== exemple prompt ===")
print(example_prompt.format(**examples[0]))

# Example selector that selects examples based on SemanticSimilarity.

example_selector = SemanticSimilarityExampleSelector.from_examples(
    # This is the list of examples available to select from.
    examples,
    # This is the embedding class used to produce embeddings which are used to measure semantic similarity.
    OpenAIEmbeddings(),
    # This is the VectorStore class that is used to store the embeddings and do a similarity search over.
    #Chroma,
    Annoy,
    # This is the number of examples to produce.
    k=2
)

# Finally, create a FewShotPromptTemplate object. 
# This object takes in the few shot examples and the formatter for the few shot examples.

prompt_template = FewShotPromptTemplate(
    example_selector=example_selector, 
    example_prompt=example_prompt, 
    suffix="question: {input}", 
    input_variables=["input"]
)

prompt = prompt_template.format(input="pink bold")

print("\n=== prompt ===")
print(prompt)

print("\n=== answer ===")
print(llm(prompt))


---
## Output Parser and response format

A way to format the outpu
- Format nstructions: An autogenerated prompt telling how the result should be formatted
- parser: a method which will extract the output int hte desired format. you may prvie a custom parser


**Resources**
> - OutputParser:https://docs.langchain.com/docs/components/prompts/output-parser

In [75]:
from langchain.llms import OpenAI
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.schema import HumanMessage, SystemMessage, AIMessage
from langchain.prompts.prompt import PromptTemplate


# loads the model.
llm = OpenAI(model_name="text-davinci-003", temperature=0.9)

# how you would like the response to be structured
# periods at the send of sentence are required. 
# If not there description ends up in the json text and break the JSON format
response_schemas = [
    ResponseSchema(name="bad_string", description="This is a poorly formatted string."),
    ResponseSchema(name="good_string", description="This is a your string reformatted.")
]

# How you would like to parse your output
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

# check instructions
format_instructions =output_parser.get_format_instructions()
print("\nformat_instructions")      
print(format_instructions)      

template = """
You will be given a poorly formatted string from a user. 
Reformat it and make sure all the words are spelled correctly.


{format_instructions}

% USER_INPUT:
{user_input}

YOUR RESPONSE:
"""

prompt_template = PromptTemplate(
    input_variables=['user_input'],
    partial_variables={'format_instructions': format_instructions},
    template=template
)

# format the user input as a prompt
# for whateveer reason it does not work well with format.
# format_promt retruns an object, not a string and should be converted to a string 
prompt = prompt_template.format_prompt(user_input="Wellcom to Californya!").to_string()
print("\nprompt")
print(prompt)

# gets the response
response = llm(prompt)
print("\nresponse=")      
print(response)      

# gets the JSON document
print("\nparsed output=")     

# comma sometimes missing
response.replace('"good_string"',',"good_string"')

output_parser.parse(response)                   



format_instructions
The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"bad_string": string  // This is a poorly formatted string.
	"good_string": string  // This is a your string reformatted.
}
```

prompt

You will be given a poorly formatted string from a user. 
Reformat it and make sure all the words are spelled correctly.


The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"bad_string": string  // This is a poorly formatted string.
	"good_string": string  // This is a your string reformatted.
}
```

% USER_INPUT:
Wellcom to Californya!

YOUR RESPONSE:


response=
```json
{
	"bad_string": "Wellcom to Californya!",
	"good_string": "Welcome to California!"
}
```

parsed output=


{'bad_string': 'Wellcom to Californya!',
 'good_string': 'Welcome to California!'}

---
# 8. Indexes

Indexes refer to ways to structure documents so that LLMs can best interact with them. This module contains utility functions for working with documents, different types of indexes, and then examples for using those indexes in chains.

LangChain documentation is split into four sections:

- Document Loaders: Classes responsible for loading documents from various sources.
- Text Splitters: Classes responsible for splitting text into smaller chunks.
- VectorStores: The most common type of index. One that relies on embeddings.
- Retrievers: Interface for fetching relevant documents to combine with language models.

<br/>

**Resource**
> - Indexes Component: https://docs.langchain.com/docs/components/indexing/


**Instructions**

For the example below, make sure that:
- a vector database client is installed

---
## Document Loaders

Easy ways to import documents from other sources 
and make it available for use in your language models.

**Resources**
> -  Document Loaders: https://python.langchain.com/docs/modules/data_connection/document_loaders
> - List of loaders: https://github.com/hwchase17/langchain/tree/master/langchain/document_loaders

In [None]:
from langchain.document_loaders import HNLoader
 
# Setup a Hacker News loader
loader = HNLoader("https://news.ycombinator.com/item?id=34422627")
 
data = loader.load()
 
print(f"Found {len(data)} comments")


sample = '\n'.join([x.page_content[:100] for x in data[:2]])
print("\nHere's a sample (first 100 chars of the 3 first items)")
print(sample)
                 

---
## Text Splitters

allow you to split a document into smaller chunk

<div class="alert alert-block alert-warning"> TODO  resource </div>


In [None]:
from langchain.document_loaders import BSHTMLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
 
# This is a long document we can split up.    
document_path = "data/Alice's Adventures in Wonderland, by Lewis Carroll.html"
 
# Setup a HTML loader
loader = BSHTMLLoader(document_path)
documents = loader.load()

print(f"Found {len(documents)} document(s)")


print("docuument content")
start = 2200
print(documents[0].page_content[start-200:start+300])

 
# The recommended TextSplitter is the RecursiveCharacterTextSplitter. 
# This will split documents recursively by different characters - starting with "\n\n", then "\n", then " ".
# This is nice because it will try to keep all the semantically relevant content in the same place 
# for as long as possible.
# Important parameters to know here are chunkSize and chunkOverlap. 
# chunkSize controls the max size (in terms of number of characters) of the final documents. 
# chunkOverlap specifies how much overlap there should be between chunks. 
# in practice they default to 4000 and 200 respectively.
text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size=200,
    chunk_overlap=20,
)
 
texts = text_splitter.create_documents([document[0].page_content])
 
print(f"\nSplitted into {len(texts)} parts")
 
print("Preview:")
i = int(start/150)
print(texts[i+1].page_content, "\n-")
print(texts[i+2].page_content, "\n-")
print(texts[i+3].page_content)


---
## Vextor Store and Retrievers 
A retriever is an interface that returns documents given an unstructured query. 

A retriever does not need to be able to store documents, only to return (or retrieve) it. 

It usually relies to a vector store as a document management backbone.

A vector store is a particular type of database optimized for storing documents and their embeddings, and then fetching of the most relevant documents for a particular query, ie. those whose embeddings are most similar to the embedding of the query.

- local : ChromaDB, FAISS, Annoy
- Online: Pinecone, Weaviate

However a retriever is more general than a vector store and there are other types of retrievers as well, e.g. Wikipedia or search engines like Elastic Search or Kendra.


Question answering over documents consists of four steps:
1. Create an index
2. Create a Retriever from that index
3. Create a question answering chain
4. Ask questions

<br/>

**Resources**
> - Lit of retrievers: https://python.langchain.com/docs/modules/data_connection/retrievers/
> - LangChain Supported VectorStores: https://api.python.langchain.com/en/latest/modules/vectorstores.html
> - Retrievers: https://github.com/hwchase17/langchain/tree/master/langchain/retrievers

### Store document in a Vector Store and retrieve information

In [None]:
from langchain.document_loaders import BSHTMLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Annoy
from langchain.embeddings import OpenAIEmbeddings
 
# This is the source document.    
document_path = "data/Alice's Adventures in Wonderland, by Lewis Carroll.html"
 
# Setup a HTML loader
loader = BSHTMLLoader(document_path)
documents = loader.load()

print(f"Found {len(documents)} document(s)")


# Get your splitter ready
# Using small chunk for the sake of example. 
# in practice they default to 4000 and 200 respectively.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=25)
 
# Split your docs into texts
texts = text_splitter.split_documents(documents)

print(f"\nSplitted into {len(texts)} parts")

# Get embedding engine ready
embeddings = OpenAIEmbeddings()
 
# Embedd your texts andd store them in the vector database
# dtabase is in memory. it might be savecd to a file and loader later on.
db = Annoy.from_documents(texts, embeddings)

In [None]:
# Init a retriever for this db
retriever = db.as_retriever()

# retrieve indexed documents relevant for the query
query = "who is the White Rabbit?"
docs = retriever.get_relevant_documents(query)

print(f"\nFound {len(docs)}")

samples = "\n\n".join([x.page_content[:200] for x in docs[:5]])
print(samples)

In [None]:
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Asking theLLM
# the response will be based on the retrieved documents 
qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=retriever)

qa.run(query)

In [None]:
qa.run(query)

### Save and load db


In [None]:
from langchain.vectorstores import Annoy
from langchain.embeddings import OpenAIEmbeddings

docstore_file_path = "alice_docstore"

db.save_local(docstore_file_path)

loaded_vector_store = Annoy.load_local(
   docstore_file_path, embeddings=OpenAIEmbeddings()
)

# same document similar to White Red abbit
loaded_vector_store.similarity_search_with_score("White Rabbit", k=3)

### One line index creation and information retrieval

In [None]:
from langchain.document_loaders import BSHTMLLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Annoy
from langchain.embeddings import OpenAIEmbeddings
from langchain.indexes import VectorstoreIndexCreator

# This is the source document.    
document_path = "data/Alice's Adventures in Wonderland, by Lewis Carroll.html"
 
# Setup a HTML loader
loader = BSHTMLLoader(document_path)

# creating an indexer
# default to Chroma as a vector database
# Use CharacterTextSplitter. May also be RecursiveCharacterTextSplitter.
index_creator = VectorstoreIndexCreator(
    vectorstore_cls=Annoy,
    embedding=OpenAIEmbeddings(),
    text_splitter=CharacterTextSplitter(chunk_size=500, chunk_overlap=0)
)

index = index_creator.from_loaders([loader])

# retrieve indexed documents relevant for the query
query = "who is the White Rabbit?"
index.query(query)

print(f"\nFound {len(docs)}")

samples = "\n\n".join([x.page_content[:200] for x in docs[:5]])
print(samples)

In [None]:
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Ask the question to the model 
# the response will be based on the retrieved documents 
qa = RetrievalQA.from_chain_type(llm=OpenAI(), 
                                 chain_type="stuff", 
                                 retriever=index.vectorstore.as_retriever())

qa.run(query)

In [None]:
qa.run(query)

---
## Wikipedia retriever


<div class="alert alert-block alert-warning"> TODO wikipedia retriever </div>

<div class="alert alert-block alert-warning"> 
    Move to tools agent_excutor example  <br>
</div>


In [None]:
from langchain import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, Tool
from langchain.utilities import WikipediaAPIWrapper

# model_name='gpt-4'
llm = ChatOpenAI(temperature=0)

wikipedia = WikipediaAPIWrapper()

tools = [
    Tool(
        name="Wikipedia",
        func=wikipedia.run,
        description="Useful for when you need to get information from wikipedia about a single topic"
    ),
]

agent_executor = initialize_agent(tools, llm, agent='zero-shot-react-description', verbose=True)

output = agent_executor.run("Can you please provide a quick summary of Napoleon Bonaparte? \
                          Then do a separate search and tell me what the commonalities are with Serena Williams")

---
# 9. Memory


Memory is the concept of storing and retrieving data in the process of a conversation. 

There are two main methods:
- Based on input, fetch any relevant pieces of data
- Based on the input and output, update state accordingly

There are two main types of memory: short term and long term.
- Short term memory generally refers to how to pass data in the context of a singular conversation (generally is previous ChatMessages or summaries of them).
- Long term memory deals with how to fetch and update information between conversations.

<br/>

**Resource**
> - Memory Component: https://docs.langchain.com/docs/components/memory/
> - Chat Message History: https://docs.langchain.com/docs/components/memory/chat_message_history
> - [LangChain: Enhancing Performance with Memory Capacity](https://towardsdatascience.com/langchain-enhancing-performance-with-memory-capacity-c7168e097f81)


<div class="alert alert-block alert-warning"> TODO vs Conversation and buffer memory (check blog)?</div>


<div class="alert alert-block alert-warning"> TODO Long term memory</div>


In [None]:
from langchain.memory import ChatMessageHistory
from langchain.chat_models import ChatOpenAI
from pprint import pprint
 
chat = ChatOpenAI(temperature=0)
 
history = ChatMessageHistory()
 
history.add_ai_message("hi!")
 
history.add_user_message("what is the capital of france?")

#After adding messages to the history, you can pass this history to the language model 
#to generate context-aware responses:

ai_response = chat(history.messages)
history.add_ai_message(ai_response.content)

print(f"{ai_response=}")
print(f"\nhistory.messages:")
pprint(history.messages, compact=False)

In [None]:
history.add_user_message("what is the population os this city?")

ai_response = chat(history.messages)
history.add_ai_message(ai_response.content)

print(f"{ai_response.content=}")
print(f"\nhistory.messages:")
pprint(history.messages, compact=False)

---
# 10. Chains
Chains are sequences of modular components (or other chains) combined in a particular way to accomplish a common use case.


Example:
- chaining LLM and tool
- summarization chain

<br/>

**Resources**
> - Chain Component: https://docs.langchain.com/docs/components/chains/


<div class="alert alert-block alert-warning"> TODO index related chain https://docs.langchain.com/docs/components/chains/index_related_chains  </div>




## Simple sequential model

A Simple Sequential Chain helps break up tasks to avoid language models getting distracted, confused, or hallucinating when asked to perform too many tasks in a row.

In this example, the chain first receives the user location (Rome) and outputs a classic dish from Rome. Then, it provides a simple recipe for that classic dish. The verbose=True parameter ensures that the chain prints statements during its execution, making it easier to debug and understand the chain’s progress.

In [None]:
from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains import SimpleSequentialChain
 
# Cretae a model with high randomness
llm = OpenAI(temperature=1)
 
# Step 1 - dish for location

template = """
Your job is to come up with a classic dish from the area that the users suggests. 

% USER LOCATION {user_location} 

YOUR RESPONSE: 
"""
prompt_template = PromptTemplate(input_variables=["user_location"], template=template)

location_chain = LLMChain(llm=llm, prompt=prompt_template)
 

# Step 2 - Recipe
template = """
Given a meal, give a short and simple recipe on how to make that dish at home. 

% MEAL {user_meal} 

YOUR RESPONSE: 
"""
prompt_template = PromptTemplate(input_variables=["user_meal"], template=template)
 
meal_chain = LLMChain(llm=llm, prompt=prompt_template)

# chain the steps
# set verbose to True to check what happes
overall_chain = SimpleSequentialChain(chains=[location_chain, meal_chain], verbose=False)
 
review = overall_chain.run("Rome")

## Summarization Chain

The Summarization Chain breaks the text into smaller chunks and summarizing each chunk, creating a final summary based on the individual summaries.

In this example, the chain first splits the essay into chunks of 700 characters. It then generates summaries for each chunk and creates a final concise summary based on these individual summaries.

In [None]:
from langchain.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.document_loaders import BSHTMLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Cretae a model with low randomness
llm = OpenAI(temperature=1)

# This is the source document.    
document_path = "data/Alice's Adventures in Wonderland, by Lewis Carroll.html"
 
# Setup a HTML loader
loader = BSHTMLLoader(document_path)
documents = loader.load()
 
# Get your splitter ready
text_splitter = RecursiveCharacterTextSplitter(chunk_size=700, chunk_overlap=50)
 
# Split your docs into texts
# only kept first 1 000 characters of the document to save computing
texts = text_splitter.split_documents(documents[:1000])
 
# There is a lot of complexity hidden in this one line. 
# the attribute map_reduce instruct the chain to 
# - first apply the model to each chunck (map stage) 
# - then all map results and apply the model (reduce stage)
chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=True)
summary = chain.run(texts)
    
print(summary)

**OUTPUT**
 
Some map summaries

> Alice hears the White Rabbit muttering to itself, concerned that the Duchess will execute it for losing the fan and pair of white kid-gloves. Alice offers to help the Rabbit search for them, but they are nowhere to be found because everything has changed since Alice's dip in the pool.

> Alice meets a Rabbit who accuses her of being his housemaid Mary Ann and orders her to fetch his gloves and fan. She finds a neat little house with the Rabbit's name on a brass plate and goes in without knocking. She is afraid of meeting the real Mary Ann before she can find the fan and gloves.

> Alice finds her way into a room with a table in the window, containing a fan and some gloves. She notices a bottle and drinks from it, hoping it will make her grow large again. When she drinks half of the bottle she finds her head pressing against the ceiling, so she hastily puts it down.

> A character wishes she wouldn't grow anymore, but sadly she continues to grow rapidly. As a result, she kneels on the floor, puts her arm out the window and her foot up the chimney, and is uncertain of her fate.

 
Final summary

> In Lewis Carroll's Alice's Adventures in Wonderland, Alice follows a White Rabbit into a strange world and has to navigate unexpected events and peculiar characters. She eventually meets a Caterpillar who helps her regain control of her changing size. Project Gutenberg is a non-profit organization committed to making electronic books free to the public. Donations up to $5,000 are available, and the full license stipulates amounts and terms of use.

## Summarize stored documents

<div class="alert alert-block alert-warning"> TODO  make use of the vector db</div>

# 11. Agents

LangChain define agents as decision making engines:
> Some applications will require not just a predetermined chain of calls to LLMs/other tools, but potentially an unknown chain that depends on the user's input. In these types of chains, there is a “agent” which has access to a suite of tools. Depending on the user input, the agent can then decide which, if any, of these tools to call.

It splits the documentation into the following sections:
> - Tools: How language models interact with other resources.
> - Agents: The language model that drives decision making.
> - Toolkits: Sets of tools that when used together can accomplish a specific task.
> - Agent Executor: The logic for running agents with tools.


**Resources**
> - Agents: https://docs.langchain.com/docs/components/agents/

<div class="alert alert-block alert-warning"> TODO </div>

## Tool
Tools are interfaces an agent can call to interact with other services

**Resources**
> - Tools: https://python.langchain.com/docs/modules/agents/tools/

**Instructions**

For the example below, make sure that:
- Google API client is installed
- a Custome Search Engine is available (CSE)
- the API key has been setup up

In [None]:
from langchain.tools import Tool
from langchain.utilities import GoogleSearchAPIWrapper

search = GoogleSearchAPIWrapper()

tool = Tool(
    name="Google Search",
    description="Search Google for recent results.",
    func=search.run,
)

tool.run("Who is the French Prime Minister name since May 2022?")

## Agent leveraging tools

Google Search and LLM-math are predefined tools:
- LLM-Math is a langage model trained to do math logic.
- Google)search tool allow to place queries on Google Search

In [None]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.llms import OpenAI

# create a model
llm = OpenAI(temperature=0)

# load some tools
tools = load_tools(["google-search", "llm-math"], llm=llm)

# setup an agent
agent = initialize_agent(tools, 
                         llm, 
                         agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, 
                         verbose=True)


In [None]:
agent.run("How many Teslas have been sold in 2022. Multiple by 2")

In [None]:
agent.run("Multiply by 2 the population of the capital of Frannce")

In [None]:
agent.run("""Who is the current prime minister of France. 
Is he or she younger than the President?""") 

In [None]:
if False:
    # too complex
    # either fails because it tries to add dates and nulber
    # or give weird results like
    # 'Élisabeth Borne will be 70 in the year 2215.'
    agent.run("""Who is the current prime minister of France. 
    When will he or she be 70?""") 

---
<div style="background-color:green;color:black;text-align:center;padding:1rem;font-size:1.5rem;">LANGCHAIN USE CASES</div>



---
# [UC] 1. Summarization

---
## Summaries Of Short Text
Just write a summarization prompt

In [None]:
# text to be summarized
text_sample = """
The first thing she heard was a general chorus of "There goes Bill!"
then the Rabbit's voice alone—"Catch him, you by the hedge!" Then
silence and then another confusion of voices—"Hold up his head—Brandy
now—Don't choke him—What happened to you?"

Last came a little feeble, squeaking voice, "Well, I hardly know—No
more, thank ye. I'm better now—all I know is, something comes at me
like a Jack-in-the-box and up I goes like a sky-rocket!"

After a minute or two of silence, they began moving about again, and
Alice heard the Rabbit say, "A barrowful will do, to begin with."

"A barrowful of what?" thought Alice. But she had not long to doubt,
for the next moment a shower of little pebbles came rattling in at the
window and some of them hit her in the face. Alice noticed, with some
surprise, that the pebbles were all turning into little cakes as they
lay on the floor and a bright idea came into her head. "If I eat one of
these cakes," she thought, "it's sure to make some< change in my size."

So she swallowed one of the cakes and was delighted to find that she
began shrinking directly. As soon as she was small enough to get through
the door, she ran out of the house and found quite a crowd of little
animals and birds waiting outside. They all made a rush at Alice the
moment she appeared, but she ran off as hard as she could and soon found
herself safe in a thick wood.
"""

In [None]:
from langchain.llms import OpenAI
from langchain import PromptTemplate

# Note, the default model is already 'text-davinci-003' 
# temperature 0 means no randomness
llm = OpenAI(temperature=0, model_name='text-davinci-003')

# check the number of tokens
num_tokens = llm.get_num_tokens(text_sample)
print(f"{num_tokens=}")

# Summarization prompt template
template = """
%INSTRUCTIONS:
Please summarize the following piece of text.
Respond in a manner that a 5 year old would understand.

%TEXT:
{text}
"""

# Create a LangChain prompt template that we can insert values to later
prompt_template = PromptTemplate(
    input_variables=["text"],
    template=template
)

prompt = prompt_template.format(text=text_sample)

#print("\nPrompt")
#print(prompt)

# run the model
output = llm(prompt)

print("\nOutput")
print (output)


---
## Summaries of Short text leveraging Summarization Chain

In [None]:
from langchain.llms import OpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.schema import Document

# Note, the default model is already 'text-davinci-003' 
# temperature 0 means no randomness
llm = OpenAI(temperature=0, model_name='text-davinci-003')

# check the number of tokens
num_tokens = llm.get_num_tokens(text_sample)
print(f"{num_tokens=}")

# build a document reuse text sampke above
doc = Document(
    page_content=text_sample,
    metadata={
        'author':"Lewis Caroll",
        'title':"Alice in Wonderland"
    }
)

# chain expect a list of documents
docs = [doc]

# the attribute stuff instruct the run the chain once
chain = load_summarize_chain(
    llm, 
    chain_type="stuff", 
    verbose=False)

# run the chain against the documment
summary = chain.run(docs)
    
print(summary)

---
## Summaries of Short text leveraging Summarization Chain and custom prompt

In [None]:
from langchain.llms import OpenAI
from langchain import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain.schema import Document

# Note, the default model is already 'text-davinci-003' 
# temperature 0 means no randomness
llm = OpenAI(temperature=0, model_name='text-davinci-003')

# check the number of tokens
num_tokens = llm.get_num_tokens(text_sample)
print(f"{num_tokens=}")

# build a document reuse text sampke above
doc = Document(
    page_content=text_sample,
    metadata={
        'author':"Lewis Caroll",
        'title':"Alice in Wonderland"
    }
)

# chain expect a list of documents
docs = [doc]

# setup. a custom prompt
# a defaukt one is provide: write a concise summary
prompt_template = """
%INSTRUCTIONS:
Please summarize the following piece of text.
Respond in a manner that a 5 year old would understand.

%TEXT:
{text}
"""

prompt = PromptTemplate(template=prompt_template, input_variables=["text"])

# the attribute stuff instruct the run the chain once
chain = load_summarize_chain(
    llm, 
    chain_type="stuff", 
    prompt=prompt, 
    verbose=False)

# run the chain against the documment
summary = chain.run(docs)
    
print(summary)

---
## Summaries Of longer Text
If the text is longer than the limit in tokens, the text must be splitted in chunks. 
Langchain components will take care of splitting and chaining the summarization tasks.

The Summarization Chain breaks the text into smaller chunks and summarizing each chunk, creating a final summary based on the individual summaries.

In this example, the chain first splits the essay into chunks of 700 characters. It then generates summaries for each chunk and creates a final concise summary based on these individual summaries.

<br/>
**Resources**
> - Qummarization quickstart: https://python.langchain.com/docs/modules/chains/popular/summarize

In [None]:
from langchain.llms import OpenAI
from langchain import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain.document_loaders import BSHTMLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
 
# Note, the default model is already 'text-davinci-003' 
# temperature 0 means no randomness
llm = OpenAI(temperature=0, model_name='text-davinci-003')

# This is the source document.    
document_path = "data/Alice's Adventures in Wonderland, by Lewis Carroll.html"
 
# Setup a HTML loader
loader = BSHTMLLoader(document_path)
documents = loader.load()
 
# check the number of tokens
num_tokens = llm.get_num_tokens(documents[0].page_content)
print(f"{num_tokens=}")

# Get your splitter ready
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)
 
# Split your docs into texts
texts = text_splitter.split_documents(documents)
 
# setup. a custom prompt
# the Summarization Chain provides a defaults prompt: write a concise summary.
prompt_template = """Write a concise summary of the following text. 
Focus on the story and ignore details of Project Gutenberg. 

% TEXT:

{text}
"""

prompt = PromptTemplate(template=prompt_template, input_variables=["text"])

# the attribute map_reduce instruct the chain to 
# - first apply the model to each chunck (map stage) 
# - then all map results and apply the model (reduce stage)
chain = load_summarize_chain(
    llm, 
    chain_type="map_reduce", 
    map_prompt=prompt, 
    combine_prompt=prompt, 
    verbose=False)

# run the chain against all the document chunks
summary = chain.run(texts)

# save the final summary
with open('alice_summary.txt', 'w') as file:
    file.write(summary)
    
print(summary)

**OUTPUT (default prompt)**

Alice's Adventures in Wonderland is a classic novel by Lewis Carroll, originally published in 1916. 
It follows Alice as she falls down a rabbit hole and embarks on a series of strange and wonderful adventures 
in the magical world of Wonderland. 
Project Gutenberg is a library of free electronic works owned by the Project Gutenberg Literary Archive Foundation, 
which allows users to copy, distribute, perform, display or create derivative works based on the work 
as long as all references to Project Gutenberg are removed. 
Professor Michael S. Hart was the originator of the concept and has been producing and distributing 
Project Gutenberg eBooks for 40 years.


**OUTPUT (custom prompt)**

Alice visits the Queen's Croquet Ground and is asked to play a game of croquet with the Queen. 
Alice is surprised to find that the balls are live hedgehogs and the mallets are flamingos. 
After winning the game, Alice is invited to join the Queen's procession and finds herself in a court of justice, 
where she is put on trial for stealing the Queen's tarts. 
Alice is defended by the White Rabbit and the jury is made up of animals and birds. 
Alice is found not guilty, but the Queen is furious and orders Alice to leave. 
Alice is saved by the Cheshire Cat who appears and tells the Queen that she can't do anything to Alice. 
Alice then meets the Duchess who is out of prison and they walk off together, 
but the Queen appears and gives Alice a warning. 
Alice then has a dream in which she encounters a pack of cards that come to life and try to attack her. 
She wakes up to find her sister and they go home.

In [None]:
---
# [UC] 2.  Question & Answering Using Documents As Context
Question answering in this context refers to question answering over your document data. F

It is basically the example in Indexes.

In order to use LLMs for question and answer we must:
- Pass the LLM relevant context it needs to answer a question
- Pass it our question that we want answered

<br/>

++Resources**
> - [QA] LangChain Question & Answer Docs

---
# [UC] 2.  Question & Answering Using Documents As Context
It is basically the example in Indexes.

In [None]:
from langchain.document_loaders import BSHTMLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Annoy
from langchain.embeddings import OpenAIEmbeddings
 
# This is the source document.    
document_path = "data/Alice's Adventures in Wonderland, by Lewis Carroll.html"
 
# Setup a HTML loader
loader = BSHTMLLoader(document_path)
documents = loader.load()

# Get your splitter ready
# in practice they default to 4000 and 200 respectively.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)
 
# Split your docs into texts
texts = text_splitter.split_documents(documents)
print(f"Generated {len(texts)} parts")

# Get embedding engine ready
embeddings = OpenAIEmbeddings()
 
# Embedd your texts andd store them in the vector database
# dtabase is in memory. it might be savecd to a file and loader later on.
db = Annoy.from_documents(texts, embeddings)

# Init a retriever for this db
#retriever = db.as_retriever()
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k":4})

# ra query
query = "who is the White Rabbit?"

# retrieve and count indexed documents relevant for the query
docs = retriever.get_relevant_documents(query)
print(f"\nFound {len(docs)} relevant documen(s)")

#samples = "\n\n".join([x.page_content[:200] for x in docs[:5]])
#print(samples)

# create a chain to answer questions 
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(), 
    chain_type="stuff", 
    retriever=retriever, 
    return_source_documents=True)

response = qa({"query": query})
print(response['result'])

In [None]:
# using instructions to get a more interesting reponse
instructions = ". Give a funny answer 30 words long."
response = qa({"query": query + instructions})
print(response['result'])

---
### Questions and Answer using a loaded vector store

In [None]:
# saving the database for future use

docstore_file_path = "alice_docstore_2"

db.save_local(docstore_file_path)


In [None]:
# loading the database 

docstore_file_path = "alice_docstore_2"

loaded_vector_store = Annoy.load_local(
   docstore_file_path, embeddings=OpenAIEmbeddings()
)

# expose this index in a retriever interface
retriever = loaded_vector_store.as_retriever(search_type="similarity", search_kwargs={"k":4})

# ra query
query = "who is the White Rabbit?"

# create a chain to answer questions 
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(), 
    chain_type="stuff", 
    retriever=retriever, 
    return_source_documents=True)

instructions = ". Give a pedantic answer 50 words long."

response = qa({"query": query + instructions})
print(response['result'])

---
### Complex search on large document


In [None]:
from langchain.llms import OpenAI
from langchain import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain.document_loaders import BSHTMLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
 
# Note, the default model is already 'text-davinci-003' 
# temperature 0 means no randomness
llm = OpenAI(temperature=0.3, model_name='text-davinci-003')

# This is the source document.    
document_path = "data/Alice's Adventures in Wonderland, by Lewis Carroll.html"
 
# Setup a HTML loader
loader = BSHTMLLoader(document_path)
documents = loader.load()

# Get your splitter ready
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)
 
# Split your docs into texts
texts = text_splitter.split_documents(documents)
print(f"\nFound {len(texts)} part(s)")


# setup. a custom prompt
# the Summarization Chain provides a defaults prompt: write a concise summary.
map_prompt_template = """
Make a detailled summary.
Focus on the story and ignore details of Project Gutenberg.
List all the characters.
Output the list of characters as a bullet points lust which showsthe name and description of the characters. 

% TEXT:

{text}
"""

map_prompt = PromptTemplate(template=map_prompt_template, input_variables=["text"])


# setup. a custom prompt
# the Summarization Chain provides a defaults prompt: write a concise summary.
# elements of the list of sample will diseapper from the list 
combine_prompt_template = """
Make a summary of summaries and merge all character lists.
List all the characters.
Output the list of characters as a bullet points list which shows the name and description of the characters. 

% TEXT:

{text}
"""

combine_prompt = PromptTemplate(template=combine_prompt_template, input_variables=["text"])


# the attribute map_reduce instruct the chain to 
# - first apply the model to each chunck (map stage) 
# - then all map results and apply the model (reduce stage)
chain = load_summarize_chain(
    llm, 
    chain_type="map_reduce", 
    map_prompt=map_prompt, 
    combine_prompt=combine_prompt, 
    verbose=False)

# run the chain against all the document chunks
summary = chain.run(texts)

print(summary)

Merged List of Characters: 
```
• Alice – A young girl who falls down a rabbit hole into a fantasy world.
• White Rabbit – A rabbit who wears a waistcoat and is always in a hurry.
• The Cheshire Cat – A mysterious cat with a wide grin who can disappear and reappear at will.
• The Mad Hatter – A strange man who wears a top hat and throws tea parties.
• The Queen of Hearts – A tyrannical ruler who is always shouting "Off with their heads!"
• The March Hare – A hare who is always late and attends the Mad Hatter's tea parties.
• The Caterpillar – A wise creature who smokes a hookah and gives Alice advice.
• The Duchess – A rude woman who lives in a chaotic house.
• The Mock Turtle – A sad creature who tells Alice stories of his past.
• The Gryphon – A strange creature with the head of an eagle and the body of a lion.
• Dinah – Alice's cat who she misses and hopes will get her saucer of milk at tea-time.
• Little Table – A table made of solid glass with a
• King of Hearts – Character in the trial mentioned in
```

- Alice: A young girl who falls down a rabbit hole and enters a strange and magical world.
- White Rabbit: A white rabbit who Alice follows down the rabbit hole.
- Mad Hatter: A strange character who hosts a tea party with the March Hare and the Dormouse.
- March Hare: A hare who attends the Mad Hatter's tea party.
- Dormouse: A sleepy mouse who attends the Mad Hatter's tea party.
- Queen of Hearts: A tyrannical ruler who orders the beheading of anyone who offends her.
- King of Hearts: The Queen of Hearts' husband who is easily manipulated by her.
- Caterpillar: A large caterpillar who smokes a hookah and speaks in riddles.
- Gryphon: A creature with the head and wings of an eagle and the body of a lion.
- Mock Turtle: A sad creature who tells Alice a story about his life.

- Caterpillar: A wise caterpillar who smokes a hookah and offers advice to Alice.
- Queen of Hearts: A tyrannical queen who demands to have everyone executed for trivial offenses.
- Cheshire Cat: A mysterious, mischievous cat with a wide grin that appears and disappears at will.
- Mad Hatter: An eccentric character who hosts a tea party and speaks in nonsensical riddles.
- Dormouse: A sleepy character that sits near the Mad Hatter at the tea party.
- March Hare: A hare that is often seen with the Mad Hatter and Dormouse at the tea party.
- Mock Turtle: A turtle with the head of an ox that talks of its school days.

- Luke Skywalker – A young farm boy from Tatooine who discovers his destiny as a Jedi Knight.
- Princess Leia – A brave and resourceful leader of the Rebel Alliance.
- Han Solo – A roguish smuggler and pilot who joins forces with the Rebel Alliance.
- Obi-Wan Kenobi – A wise and powerful Jedi Master who mentors Luke Skywalker.
- Darth Vader – A powerful Sith Lord and the main antagonist of the original trilogy.
- C-3PO – A protocol droid built by Anakin Skywalker and programmed for etiquette and protocol.
- R2-D2 – An astromech droid who serves as a companion to Luke Skywalker.
- Chewbacca – A loyal Wookiee and Han Solo's co-pilot.
- Yoda – A wise and powerful Jedi Master who trains Luke Skywalker in the ways of the Force.
- Jabba the Hutt – A powerful crime lord who controls much of the criminal underworld in the galaxy.
- Boba Fett – A bounty hunter hired by Darth Vader to capture Han Solo.
- Lando Calrissian – A smooth-talking smuggler and former friend of Han Solo.
- Emperor Palpatine – The evil Sith Lord who

---
### Complex search on large document using DB

In [None]:
from langchain.llms import OpenAI
from langchain import PromptTemplate
from langchain.vectorstores import Annoy
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains.summarize import load_summarize_chain
 
# Note, the default model is already 'text-davinci-003' 
# temperature 0 means no randomness
#model_name="gpt-3.5-turbo" # fdoes not work with map reduce
model_name='text-davinci-003'
llm = OpenAI(temperature=0.5, model_name=model_name)

# loading the database 

docstore_file_path = "alice_docstore_2"

loaded_vector_store = Annoy.load_local(
   docstore_file_path, embeddings=OpenAIEmbeddings()
)

# expose this index in a retriever interface
# retriev all the docuùents
retriever = loaded_vector_store.as_retriever(search_type="similarity", 
                                             search_kwargs={"k":2000, 
                                                           "score_threshold": 0})

# retrieve and count indexed documents to ensure all the documents are selected
docs = retriever.get_relevant_documents("all the story")
print(f"\nFound {len(docs)} documen(s)")
 
# setup. a custom prompt
# the Summarization Chain provides a defaults prompt: write a concise summary.
map_prompt_template = """
Make a detailled summary.
Focus on the story and ignore details of Project Gutenberg.
List all the characters.
Output the list of characters as a bullet points lust which showsthe name and description of the characters. 

% TEXT:

{text}
"""

map_prompt = PromptTemplate(template=map_prompt_template, input_variables=["text"])


# setup. a custom prompt
# the Summarization Chain provides a defaults prompt: write a concise summary.
combine_prompt_template = """
Make a summary of summaries and merge all character lists.
List all the characters.
Output the list of characters as a bullet points list which shows the name and description of the characters. 

% TEXT:

{text}
"""

combine_prompt = PromptTemplate(template=combine_prompt_template, input_variables=["text"])


# the attribute map_reduce instruct the chain to 
# - first apply the model to each chunck (map stage) 
# - then all map results and apply the model (reduce stage)
chain = load_summarize_chain(
    llm, 
    chain_type="map_reduce", 
    map_prompt=map_prompt, 
    combine_prompt=combine_prompt, 
    verbose=False)

# run the chain against all the document chunks
summary = chain.run(docs)
    
print(summary)

- Lory: A bird who is looking for a way out of the wood
- Duck: A bird who is looking for a way out of the wood
- Eaglet: A bird who is looking for a way out of the wood
- Dodo: A bird who is looking for a way out of the wood
- Caterpillar: A creature who is smoking a hookah
- King and Queen of Hearts: The rulers of the court
- Knave of Hearts: A character who is accused of stealing the tarts
- Three Gardeners: Characters who are painting white roses red
- Five and Seven: Two characters who are guarding the Queen
- Two: A character who is the White Rabbit's servant
- Ten Soldiers: Characters who are guarding the Queen
- Ten Courtiers: Characters who are attending the Queen
- Royal Children: Characters who are attending the Queen
- Guests: Characters who are attending the Queen
- Small Door in a Wall: A mysterious door that Alice finds
- Mushroom: A mysterious mushroom that Alice finds
- Bottle with the Words "DRINK ME" Printed on It: A mysterious bottle that Alice finds
- Fan: A mysterious fan that Alice finds
- Pair of White Kid-Gloves


Characters:
- Alice: The protagonist of the story. She is a young girl who finds herself in a strange world.
- White Rabbit: A talking rabbit who is always in a hurry and is late for important appointments.
- Cake: A cake with the words "Eat Me" written on it.
- Key: A tiny golden key found under the table. 
- Rabbit-Hole: A deep dark hole leading into a corridor.


### get the list then query the characters


In [60]:
from langchain.llms import OpenAI
from langchain import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain.document_loaders import BSHTMLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.output_parsers import CommaSeparatedListOutputParser


# Note, the default model is already 'text-davinci-003' 
# temperature 0 means no randomness
#model_name="gpt-3.5-turbo" # fdoes not work with map reduce
model_name='text-davinci-003'
llm = OpenAI(temperature=0.3, model_name=model_name)

# This is the source document.    
document_path = "data/Alice's Adventures in Wonderland, by Lewis Carroll.html"
 
# Setup a HTML loader
loader = BSHTMLLoader(document_path)
documents = loader.load()

# Get your splitter ready
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)
 
# Split your docs into texts
texts = text_splitter.split_documents(documents)
print(f"\nFound {len(texts)} part(s)")

output_parser = CommaSeparatedListOutputParser()
format_instructions = output_parser.get_format_instructions()

# setup. a custom prompt
# the Summarization Chain provides a defaults prompt: write a concise summary.
map_prompt_template = """
Make a detailled summary.
Focus on the story and ignore details of Project Gutenberg.
Find the characters and list their names.

{format_instructions}

% TEXT:

{text}
"""

map_prompt = PromptTemplate(template=map_prompt_template, 
                        input_variables=["text"],
                        partial_variables={"format_instructions": format_instructions}
                       )

print("\nMap Prompt")
print(map_prompt)

# setup. a custom prompt
# the Summarization Chain provides a defaults prompt: write a concise summary.
combine_prompt_template = """
To make a summary, keep all lines starting with the word characters.

% TEXT:

{text}
"""

combine_prompt = PromptTemplate(template=combine_prompt_template, 
                        input_variables=["text"]
                        #partial_variables={"format_instructions": format_instructions}
                       )

print("\nCombine Prompt")
print(combine_prompt)


# the attribute map_reduce instruct the chain to 
# - first apply the model to each chunck (map stage) 
# - then all map results and apply the model (reduce stage)
chain = load_summarize_chain(
    llm, 
    chain_type="map_reduce", 
    map_prompt=map_prompt, 
    combine_prompt=combine_prompt, 
    verbose=False)

# run the chain against all the document chunks
response = chain.run(texts)

print("\nResponse")
print(response)

#characters = output_parser.parse(response)

print("\nCharacters")
print(characters)


Found 50 part(s)

Map Prompt
input_variables=['text'] output_parser=None partial_variables={'format_instructions': 'Your response should be a list of comma separated values, eg: `foo, bar, baz`'} template='\nMake a detailled summary.\nFocus on the story and ignore details of Project Gutenberg.\nFind the characters and list thier names.\n\n{format_instructions}\n\n% TEXT:\n\n{text}\n' template_format='f-string' validate_template=True

Combine Prompt
input_variables=['text'] output_parser=None partial_variables={} template='\nTo make a summary, keep all lines starting with the word characters.\n\n% TEXT:\n\n{text}\n' template_format='f-string' validate_template=True

Response
No summary.

Characters
['No answer required']


In [83]:
from langchain.llms import OpenAI
from langchain import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain.document_loaders import BSHTMLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.output_parsers import CommaSeparatedListOutputParser
from langchain.schema import Document

# Note, the default model is already 'text-davinci-003' 
# temperature 0 means no randomness
#model_name="gpt-3.5-turbo" # fdoes not work with map reduce
model_name='text-davinci-003'
llm = OpenAI(temperature=0.3, model_name=model_name)

# This is the source document.    
document_path = "data/Alice's Adventures in Wonderland, by Lewis Carroll.html"
 
# Setup a HTML loader
loader = BSHTMLLoader(document_path)
documents = loader.load()

# Get your splitter ready
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=100)
 
# Split your docs into texts
texts = text_splitter.split_documents(documents)
print(f"\nFound {len(texts)} part(s)")

output_parser = CommaSeparatedListOutputParser()
format_instructions = output_parser.get_format_instructions()

# setup. a custom prompt
# the Summarization Chain provides a defaults prompt: write a concise summary.
map_prompt_template = """
ou will be given a text.
Extract the characters's names.
Ignore details of Project Gutenberg.

{format_instructions}. Add "characters:" in front of the list.

% TEXT:

{text}
"""

map_prompt = PromptTemplate(template=map_prompt_template, 
                        input_variables=["text"],
                        partial_variables={"format_instructions": format_instructions}
                       )

print("\nMap Prompt")
print(map_prompt)


# the attribute map_reduce instruct the chain to 
# - first apply the model to each chunck (map stage) 
# - then all map results and apply the model (reduce stage)
chain = load_summarize_chain(
    llm, 
    chain_type="stuff", 
    prompt=map_prompt,  
    verbose=False)

i = 0
characters = []
prefix = "Characters:"
for text in texts:
    i += 1
    
    # run the chain against all the document chunks
    # chain expect a list of documents
    response = chain.run([text])

    #print(f"\nResponse {i}")
    #print(response)

    lines = response.split('\n')
    #print(f"\nlines {i}")
    #print(lines)
    for line in lines:
        #print(f"\nline {i}")
        #print(line)
        if line.startswith(prefix):
            part_characters = output_parser.parse(line.replace(prefix, ''))

            print(f"\nPart Characters {i}")
            print(part_characters)

            characters.extend(part_characters)
    

print(f"\nAll Characters {i}")
print(set(characters))



Found 50 part(s)

Map Prompt
input_variables=['text'] output_parser=None partial_variables={'format_instructions': 'Your response should be a list of comma separated values, eg: `foo, bar, baz`'} template='\nou will be given a text.\nExtract the characters\'s names.\nIgnore details of Project Gutenberg.\n\n{format_instructions}. Add "characters:" in front of the list.\n\n% TEXT:\n\n{text}\n' template_format='f-string' validate_template=True

Part Characters 1
['Alice', 'The Duchess', 'The White Rabbit', 'The Cheshire Cat', 'The Mad Hatter', 'The March Hare', 'The Queen of Hearts.']

Part Characters 2
['Alice', 'White Rabbit', "Alice's sister"]

Part Characters 3
['Alice', 'Dinah', 'White Rabbit']

Part Characters 4
['Alice']

Part Characters 5
['Alice,']

Part Characters 6
['Alice', 'White Rabbit']

Part Characters 7
['Alice', 'Mouse']

Part Characters 8
['Alice', 'William the Conqueror', 'Mouse', 'Dinah']

Part Characters 9
['Duck', 'Dodo', 'Lory', 'Eaglet']

Part Characters 10
['III

<div class="alert alert-block alert-warning"> TODO lookup each character + filter relevant part of the document</div>


Characters:
- Alice: The protagonist of the story. She is a young girl who finds herself in a strange world.
- White Rabbit: A talking rabbit who is always in a hurry and is late for important appointments.
- Cake: A cake with the words "Eat Me" written on it.
- Key: A tiny golden key found under the table. 
- Rabbit-Hole: A deep dark hole leading into a corridor.

----


In [None]:
<div class="alert alert-block alert-warning"> TODO essayer sequence mapper list + qa par personnage</div>

<div class="alert alert-block alert-warning"> TODO and FIXME</div>

vector store backed retriever 

https://python.langchain.com/docs/modules/data_connection/retrievers/how_to/vectorstore

https://python.langchain.com/docs/modules/chains/additional/question_answering

In [None]:
<div class="alert alert-block alert-warning"> TODO explain refine and mapreduce </div>

### TOdo get a list and show each character to demo output prser and seq chain

<div class="alert alert-block alert-warning"> TODO </div>

prompt
parse and map
seq chain
```python
output_parser = RegexParser(
    regex=r"(.*?)\nScore: (.*)",
    output_keys=["answer", "score"],
)
PROMPT = PromptTemplate(
    input_variables=["context", "question"],
    template=prompt_template,
    output_parser=output_parser,
```

<div class="alert alert-block alert-warning"> TODO 
how to use qa in chain and do something like make a list and gie details.
Another option parsed output and browse the list Ouotput parser as list ?
alternative conversation.
</div>

# [UC] ...
AAnalyzing stuctured data

https://python.langchain.com/docs/use_cases/tabular.html

https://python.langchain.com/docs/modules/agents/toolkits/csv.html

https://python.langchain.com/docs/modules/agents/toolkits/sql_database.html

https://python.langchain.com/docs/modules/agents/toolkits/pandas.html



<div class="alert alert-block alert-warning"> TODO </div>

# [UC] ...
API Chains

https://python.langchain.com/docs/modules/chains/popular/api.html


<div class="alert alert-block alert-warning"> TODO </div>

In [None]:
# [UC] ...
graph index creator


<div class="alert alert-block alert-warning"> TODO </div>