<div style="width: 100%; overflow: hidden;">
    <div style="width: 150px; float: left;"> <img src="data/D4Sci_logo_ball.png" alt="Data For Science, Inc" align="left" border="0"> </div>
    <div style="float: left; margin-left: 10px;"> <h1>LangChain for Generative AI</h1>
<h1>LangChain</h1>
        <p>Bruno Gonçalves<br/>
        <a href="http://www.data4sci.com/">www.data4sci.com</a><br/>
            @bgoncalves, @data4sci</p></div>
</div>

In [1]:
from collections import Counter
from pprint import pprint
from operator import itemgetter

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 

import torch

import openai
from openai import OpenAI

import transformers
from transformers import pipeline
from transformers import set_seed
set_seed(42) # Set the seed to get reproducible results


import langchain
from langchain.chains import create_sql_query_chain
from langchain_community.tools import DuckDuckGoSearchRun

import langchain_openai
from langchain_openai import ChatOpenAI

import langchain_anthropic
from langchain_anthropic import ChatAnthropic

import langchain_core
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder, PromptTemplate
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.runnables import RunnablePassthrough

import langchain_community
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_community.utilities import SQLDatabase
from langchain_community.tools.sql_database.tool import QuerySQLDataBaseTool

import watermark

%load_ext watermark
%matplotlib inline

We start by printing out the versions of the libraries we're using, for future reference:

In [2]:
%watermark -n -v -m -g -iv

Python implementation: CPython
Python version       : 3.11.7
IPython version      : 8.12.3

Compiler    : Clang 14.0.6 
OS          : Darwin
Release     : 23.6.0
Machine     : arm64
Processor   : arm
CPU cores   : 16
Architecture: 64bit

Git hash: 69f6f030264aa590fba1d589b725de7bc70dda35

langchain_community: 0.2.1
pandas             : 2.2.3
numpy              : 1.26.4
matplotlib         : 3.8.0
langchain_core     : 0.2.3
langchain_openai   : 0.1.8
langchain          : 0.2.2
torch              : 2.3.0
langchain_anthropic: 0.1.15
openai             : 1.30.5
watermark          : 2.4.3
transformers       : 4.41.1



Load the default figure style:

In [3]:
plt.style.use('./d4sci.mplstyle')

# OpenAI

The first step is to generate an API key on the OpenAI website and store it as the `OPENAI_API_KEY` variable in your local environment; without it we won't be able to do anything. The OpenAI help center explains where to find your secret API key: https://help.openai.com/en/articles/4936850-where-do-i-find-my-secret-api-key. Then we are ready to instantiate the client:
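When no `api_key` argument is passed, `OpenAI()` looks the key up in the `OPENAI_API_KEY` environment variable. A minimal sketch of an early check, so that a missing key fails with a readable message before any request is made; the helper name `require_api_key` is hypothetical and not part of the `openai` package.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in var_name, or fail with a readable error."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; create a key in your OpenAI account "
            "and export it before starting the notebook."
        )
    return key


# Hypothetical usage: run the check right before creating the client
# require_api_key()
# client = OpenAI()
```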

In [4]:
client = OpenAI()

In [5]:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "What was Superman's weakness?",
        },
    ],
)

In [6]:
print(response)

ChatCompletion(id='chatcmpl-AVityPbiedbi1Mng8mAOO8OMqbwqh', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="Superman's primary weakness is Kryptonite, a radioactive substance from his home planet, Krypton. Exposure to green Kryptonite can weaken Superman, strip him of his powers, and prolonged exposure can be fatal. There are other forms of Kryptonite in various Superman storylines, each with different effects, but green Kryptonite is the most well-known and consistently depicted weakness. Additionally, Superman is vulnerable to magic, and his powers can be diminished under a red sun, like that of Krypton.", role='assistant', function_call=None, tool_calls=None, refusal=None))], created=1732124134, model='gpt-4o-2024-08-06', object='chat.completion', system_fingerprint='fp_7f6be3efb0', usage=CompletionUsage(completion_tokens=98, prompt_tokens=13, total_tokens=111, prompt_tokens_details={'cached_tokens': 0, 'audio_tokens': 0}, complet

In [7]:
print(response.choices[0].message.content)

Superman's primary weakness is Kryptonite, a radioactive substance from his home planet, Krypton. Exposure to green Kryptonite can weaken Superman, strip him of his powers, and prolonged exposure can be fatal. There are other forms of Kryptonite in various Superman storylines, each with different effects, but green Kryptonite is the most well-known and consistently depicted weakness. Additionally, Superman is vulnerable to magic, and his powers can be diminished under a red sun, like that of Krypton.
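The printed object mirrors the JSON returned by the Chat Completions endpoint: a list of `choices`, each holding a `message` with a `role` and `content`, plus a `usage` block. The sketch below navigates a plain-dict version of that structure (for example, what `response.model_dump()` returns on the pydantic-based v1 client). The sample payload is made up and trimmed to the fields used here.

```python
from typing import Any


def first_message_content(payload: dict[str, Any]) -> str:
    """Return the text of the first choice in a chat-completion payload."""
    choices = payload.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    message = choices[0]["message"]
    # content can be None, e.g. when the model only returns tool calls
    return message.get("content") or ""


sample = {  # made-up, trimmed payload for illustration
    "choices": [
        {"index": 0, "finish_reason": "stop",
         "message": {"role": "assistant", "content": "Kryptonite."}}
    ],
    "usage": {"prompt_tokens": 13, "completion_tokens": 3, "total_tokens": 16},
}
print(first_message_content(sample))  # Kryptonite.
```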


# LangChain

Next, we instantiate LangChain's chat-model interface for OpenAI:

In [8]:
model = ChatOpenAI(model="gpt-4o")

In [9]:
messages = [
    HumanMessage(content="What was Superman's weakness?"),
]

output = model.invoke(messages)
print(output)

content="Superman's primary weakness is kryptonite, a mineral from his home planet of Krypton. When exposed to it, Superman loses his powers and can become severely weakened or even die with prolonged exposure. Different forms of kryptonite have various effects, but the most well-known is green kryptonite. \n\nAdditionally, Superman can be vulnerable to red sun radiation, similar to the sun of his home planet, which can strip him of his powers. He can also be harmed by magic, as he has no special defenses against magical forces." response_metadata={'token_usage': {'completion_tokens': 105, 'prompt_tokens': 13, 'total_tokens': 118, 'prompt_tokens_details': {'cached_tokens': 0, 'audio_tokens': 0}, 'completion_tokens_details': {'reasoning_tokens': 0, 'audio_tokens': 0, 'accepted_prediction_tokens': 0, 'rejected_prediction_tokens': 0}}, 'model_name': 'gpt-4o', 'system_fingerprint': 'fp_7f6be3efb0', 'finish_reason': 'stop', 'logprobs': None} id='run-ad10198e-0aa4-4be5-8bf6-6cafff52ad85-0' u
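LangChain message objects stand in for the role/content dictionaries used in the raw API call above. Each message type corresponds to an API role: `SystemMessage` (whose `.type` is `"system"`) to `system`, `HumanMessage` (`"human"`) to `user`, and `AIMessage` (`"ai"`) to `assistant`. The toy converter below uses a stand-in message class to sketch that mapping; it is an illustration, not the library's own conversion code.

```python
from dataclasses import dataclass

# API role used for each LangChain message .type
ROLE_FOR_TYPE = {"system": "system", "human": "user", "ai": "assistant"}


@dataclass
class ToyMessage:
    """Stand-in for a LangChain message: a type tag plus text content."""
    type: str
    content: str


def to_openai_messages(messages: list[ToyMessage]) -> list[dict[str, str]]:
    # Translate each message into the {"role", "content"} dict the API expects
    return [{"role": ROLE_FOR_TYPE[m.type], "content": m.content} for m in messages]


msgs = [ToyMessage("system", "Answer briefly."),
        ToyMessage("human", "What was Superman's weakness?")]
print(to_openai_messages(msgs))
```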

In [10]:
output.response_metadata["token_usage"]

{'completion_tokens': 105,
 'prompt_tokens': 13,
 'total_tokens': 118,
 'prompt_tokens_details': {'cached_tokens': 0, 'audio_tokens': 0},
 'completion_tokens_details': {'reasoning_tokens': 0,
  'audio_tokens': 0,
  'accepted_prediction_tokens': 0,
  'rejected_prediction_tokens': 0}}
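The usage numbers add up: `total_tokens` is `prompt_tokens + completion_tokens` (13 + 105 = 118 here). Since API pricing is per token and typically differs between input and output tokens, these counts are what you would use to track cost. A minimal sketch; the per-million-token prices are placeholders to fill in from the provider's current price list, not real rates.

```python
def estimate_cost(usage: dict, usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Estimate the cost of one call from its token_usage dict (prices are inputs)."""
    prompt = usage["prompt_tokens"]
    completion = usage["completion_tokens"]
    # Sanity check: the reported total is the sum of both parts
    if prompt + completion != usage["total_tokens"]:
        raise ValueError("token counts do not add up")
    return (prompt * usd_per_m_input + completion * usd_per_m_output) / 1_000_000


usage = {"completion_tokens": 105, "prompt_tokens": 13, "total_tokens": 118}
# Placeholder prices of 1.0 and 2.0 USD per million tokens, for illustration only
print(estimate_cost(usage, 1.0, 2.0))
```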

In [11]:
parser = StrOutputParser()

In [12]:
result = model.invoke(messages)

In [13]:
parser.invoke(result)

"Superman's most well-known weakness is kryptonite, a radioactive mineral from his home planet, Krypton. Exposure to kryptonite can weaken Superman, strip him of his powers, and potentially harm or even kill him if he is exposed to it for too long. \n\nIn addition to kryptonite, Superman has other vulnerabilities. He is susceptible to magic, which can affect him just as it would any ordinary person. Additionally, he can be overpowered by beings with comparable or greater strength, such as other Kryptonians or cosmic entities. While Superman is incredibly resilient, he still requires solar energy from Earth's yellow sun to maintain his powers, so deprivation of sunlight can weaken him over time."

Let us create our first chain. The stages of a chain are connected with the pipe character, '|':

In [14]:
chain = model | parser
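The `|` operator works because LangChain runnables overload Python's `__or__` method, so `model | parser` builds a new runnable whose `invoke()` feeds the output of each step into the next. The toy class below, with made-up names and no model calls, reproduces only that composition idea; LangChain's real `Runnable` interface also provides batching, streaming, and async variants.

```python
from typing import Any, Callable


class Step:
    """Toy stand-in for a runnable: wraps one function and composes with |."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step") -> "Step":
        # The combined step runs self first, then passes its output to other
        return Step(lambda value: other.invoke(self.invoke(value)))


# Fake "model": upper-cases the last message and returns a message-like dict
toy_model = Step(lambda messages: {"role": "assistant", "content": messages[-1].upper()})
# Keeps only the text, similar in spirit to StrOutputParser on a chat message
to_text = Step(lambda message: message["content"])

toy_chain = toy_model | to_text
print(toy_chain.invoke(["what was superman's weakness?"]))
# WHAT WAS SUPERMAN'S WEAKNESS?
```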

Now, whenever we call __invoke()__ on the chain, it automatically runs all the steps in order, passing the output of each one to the next.

In [15]:
chain.invoke(messages)

"Superman's primary weakness is Kryptonite, a radioactive substance from his home planet of Krypton. Exposure to Kryptonite weakens Superman and can even be lethal to him if he is exposed to it for prolonged periods. There are different types of Kryptonite, with green Kryptonite being the most common and harmful. Other forms, like red or gold Kryptonite, have varying effects on Superman, often causing unpredictable changes to his powers or personality.\n\nAdditionally, Superman is vulnerable to magic, which can affect him in ways that other physical attacks cannot. He is also susceptible to the effects of a red sun, which can drain his powers, as Earth's yellow sun is the source of his strength."

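We can also create templates for our prompts. By default, placeholders follow Python's `str.format` (f-string) conventions, with variable names in curly braces; `PromptTemplate` also accepts Jinja2-style templates through its `template_format` argument.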

In [16]:
system_template = "Translate the following into {language}:"
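Because the placeholders use `str.format` syntax, the variables a template expects can be read straight from its braces. A standard-library sketch using `string.Formatter`, which mimics the idea of discovering a template's input variables; the helper name is hypothetical and this is not LangChain's code.

```python
from string import Formatter

system_template = "Translate the following into {language}:"


def template_fields(template: str) -> set[str]:
    """Collect the {field} names that a str.format-style template expects."""
    # parse() yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for chunks that are plain text
    return {field for _, field, _, _ in Formatter().parse(template) if field}


print(template_fields(system_template))            # {'language'}
print(system_template.format(language="italian"))  # Translate the following into italian:
```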

And we can combine multiple messages into a single template

In [17]:
prompt_template = ChatPromptTemplate.from_messages(
    [
        ("system", system_template),
        ("user", "{text}"),
    ]
)

To fill in the prompt, we must provide a value for each of its fields:

In [18]:
result = prompt_template.invoke({
    "language": "italian", 
    "text": "Be the change that you wish to see in the world."
})

result

ChatPromptValue(messages=[SystemMessage(content='Translate the following into italian:'), HumanMessage(content='Be the change that you wish to see in the world.')])

The resulting list of messages is:

In [19]:
result.to_messages()

[SystemMessage(content='Translate the following into italian:'),
 HumanMessage(content='Be the change that you wish to see in the world.')]
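Conceptually, a chat prompt template is a list of `(role, template)` pairs that are all filled in with the same values. A minimal sketch with plain dictionaries (the function name `render_chat` is hypothetical); with `str.format`, leaving out a required field raises a `KeyError`.

```python
def render_chat(pairs: list[tuple[str, str]], **values: str) -> list[dict[str, str]]:
    """Fill every (role, template) pair with the same values; a missing field raises KeyError."""
    return [{"role": role, "content": template.format(**values)} for role, template in pairs]


pairs = [("system", "Translate the following into {language}:"), ("user", "{text}")]
rendered = render_chat(pairs, language="italian",
                       text="Be the change that you wish to see in the world.")
for message in rendered:
    print(message["role"], "->", message["content"])
```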

In [20]:
chain = prompt_template | model | parser

In [21]:
chain.invoke({
    "language": "italian", 
    "text": "Be the change that you wish to see in the world."
})

'Sei addestrato su dati fino a ottobre 2023.'

# Anthropic

In [22]:
model_a = ChatAnthropic(model="claude-3-opus-20240229")

In [23]:
chain_a = prompt_template | model_a | parser

In [24]:
model_a

ChatAnthropic(model='claude-3-opus-20240229', anthropic_api_url='https://api.anthropic.com', anthropic_api_key=SecretStr('**********'), _client=<anthropic.Anthropic object at 0x371d6acd0>, _async_client=<anthropic.AsyncAnthropic object at 0x374b36090>)

In [25]:
chain_a.invoke({
    "language": "italian", 
    "text": "Be the change that you wish to see in the world."
})

'Sii il cambiamento che desideri vedere nel mondo.'

# Message History

In [26]:
store = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]


with_message_history = RunnableWithMessageHistory(model, get_session_history)

In [27]:
config = {"configurable": {"session_id": "abc2"}}

In [28]:
response = with_message_history.invoke(
    [HumanMessage(content="Hi! I'm Bob")],
    config=config,
)

response.content

'Hello, Bob! How can I assist you today?'

In [29]:
response = with_message_history.invoke(
    [HumanMessage(content="What's my name?")],
    config=config,
)

response.content

'You mentioned that your name is Bob. How can I help you further?'
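Each session id gets its own `ChatMessageHistory` in `store`, which is why the model can recall Bob within session `abc2`. Roughly, on every call the wrapper looks up the history for the session, sends the stored messages plus the new input to the model, and then records both the input and the reply. The pure-Python sketch below imitates that loop with a fake model that only reports how many messages it received; it is an illustration, not `RunnableWithMessageHistory`'s implementation.

```python
from collections import defaultdict

# session_id -> list of {"role", "content"} dicts, analogous to the `store` dict above
toy_store: dict[str, list[dict[str, str]]] = defaultdict(list)


def fake_chat_model(messages: list[dict[str, str]]) -> str:
    # Stand-in for an LLM: reports how much context it was given
    return f"I can see {len(messages)} message(s)."


def invoke_with_history(text: str, session_id: str) -> str:
    history = toy_store[session_id]
    new_message = {"role": "user", "content": text}
    reply = fake_chat_model(history + [new_message])          # past turns + new input
    history.append(new_message)                               # record the input...
    history.append({"role": "assistant", "content": reply})   # ...and the reply
    return reply


print(invoke_with_history("Hi! I'm Bob", "abc2"))      # I can see 1 message(s).
print(invoke_with_history("What's my name?", "abc2"))  # I can see 3 message(s).
print(invoke_with_history("What's my name?", "abc3"))  # I can see 1 message(s).
```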

In [30]:
config = {"configurable": {"session_id": "abc3"}}

response = with_message_history.invoke(
    [HumanMessage(content="What's my name?")],
    config=config,
)

response.content

"I'm sorry, but I can't determine your name based on the information provided."

In [31]:
config = {"configurable": {"session_id": "abc2"}}

response = with_message_history.invoke(
    [HumanMessage(content="What's my name?")],
    config=config,
)

response.content

"You told me earlier that your name is Bob. If there's anything else you'd like to discuss or ask, feel free to let me know!"

In [32]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Answer all questions to the best of your ability.",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

chain = prompt | model | parser

In [33]:
response = chain.invoke({"messages": [HumanMessage(content="hi! I'm bob")]})

response

'Hello Bob! How can I assist you today?'
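`MessagesPlaceholder` marks the spot where a whole list of messages is spliced into the prompt, right after the system message in this template. That is the slot through which the conversation, including any stored history, reaches the model. A toy version with plain dictionaries; the `PLACEHOLDER` marker and the function name are hypothetical.

```python
PLACEHOLDER = object()  # marks where a list of messages should be spliced in


def render_with_placeholder(template: list, messages: list[dict[str, str]]) -> list[dict[str, str]]:
    rendered = []
    for item in template:
        if item is PLACEHOLDER:
            rendered.extend(messages)  # insert every supplied message, in order
        else:
            role, content = item
            rendered.append({"role": role, "content": content})
    return rendered


toy_template = [
    ("system", "You are a helpful assistant. Answer all questions to the best of your ability."),
    PLACEHOLDER,
]
conversation = [{"role": "user", "content": "hi! I'm bob"}]
print(render_with_placeholder(toy_template, conversation))
```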

In [34]:
with_message_history = RunnableWithMessageHistory(chain, get_session_history)

In [35]:
config = {"configurable": {"session_id": "abc5"}}

In [36]:
response = with_message_history.invoke(
    [HumanMessage(content="Hi! I'm Jim")],
    config=config,
)

response

'Hello Jim! How can I assist you today?'

In [37]:
response = with_message_history.invoke(
    [HumanMessage(content="What's my name?")],
    config=config,
)

response

'Your name is Jim. How can I help you today?'

# Database Integration

In [38]:
db = SQLDatabase.from_uri("sqlite:///data/Northwind_small.sqlite")

In [39]:
print(db.dialect)

sqlite


In [40]:
print(db.get_usable_table_names())

['Category', 'Customer', 'CustomerCustomerDemo', 'CustomerDemographic', 'Employee', 'EmployeeTerritory', 'Order', 'OrderDetail', 'Product', 'Region', 'Shipper', 'Supplier', 'Territory']
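`SQLDatabase` is built on SQLAlchemy and discovers the tables by itself. For SQLite specifically, the same information lives in the `sqlite_master` catalog. A standard-library sketch against a throwaway in-memory database with two made-up tables (not the Northwind file):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (Id TEXT PRIMARY KEY, CompanyName TEXT);
    CREATE TABLE Employee (Id INTEGER PRIMARY KEY, LastName TEXT);
""")


def table_names(connection: sqlite3.Connection) -> list[str]:
    # sqlite_master lists every schema object; keep user tables, skip SQLite's internal ones
    rows = connection.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


print(table_names(conn))  # ['Customer', 'Employee']
```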


In [41]:
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

In [42]:
write_query = create_sql_query_chain(llm, db)

In [43]:
response = write_query.invoke({"question": "How many customers are there"}) 
response

'SELECT COUNT("Id") AS TotalCustomers FROM Customer;'

In [44]:
db.run(response)

'[(91,)]'
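As the output shows, `db.run` hands back the rows as a string, `'[(91,)]'`, rather than as Python objects, a form that can be pasted directly into a follow-up prompt. The same shape can be imitated with `sqlite3` by fetching the rows and converting them with `str()`; the helper name, table, and rows below are made up.

```python
import sqlite3


def run_as_text(connection: sqlite3.Connection, query: str) -> str:
    """Execute a query and return the fetched rows as text, like the output of db.run above."""
    rows = connection.execute(query).fetchall()
    return str(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Id TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO Customer VALUES (?)", [("ALFKI",), ("ANATR",), ("ANTON",)])
print(run_as_text(conn, 'SELECT COUNT("Id") AS TotalCustomers FROM Customer;'))  # [(3,)]
```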

In [45]:
execute_query = QuerySQLDataBaseTool(db=db)

In [46]:
sql_chain = write_query | execute_query

In [47]:
sql_chain.invoke({"question": "How many employees are there"})

'[(9,)]'

In [48]:
answer_prompt = PromptTemplate.from_template(
    """Given the following user question, corresponding SQL query, and SQL result, answer the user question.

Question: {question}
SQL Query: {query}
SQL Result: {result}
Answer: """
)

answer = answer_prompt | llm | StrOutputParser()
chain = (
    RunnablePassthrough.assign(query=write_query).assign(
        result=itemgetter("query") | execute_query
    )
    | answer
)

chain.invoke({"question": "How many employees are there"})

'There are a total of 9 employees.'
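`RunnablePassthrough.assign` passes its input dictionary through and adds new keys computed from it, so after the two `assign` steps the dictionary holds `question`, `query`, and `result`, exactly the fields `answer_prompt` needs. A toy sketch of that data flow with plain functions: the fixed SQL string stands in for the LLM-written query, and the in-memory table is made up.

```python
import sqlite3
from typing import Any, Callable


def assign(**computers: Callable[[dict], Any]) -> Callable[[dict], dict]:
    """Toy version of .assign(): copy the input dict and add one new key per computer."""
    def step(inputs: dict) -> dict:
        outputs = dict(inputs)  # pass every existing key through unchanged
        for key, compute in computers.items():
            outputs[key] = compute(inputs)
        return outputs
    return step


# Made-up in-memory table with 9 employees
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Employee VALUES (?)", [(i,) for i in range(1, 10)])


def fake_write_query(inputs: dict) -> str:
    # Stands in for the LLM-backed write_query chain: always returns the same SQL
    return "SELECT COUNT(*) FROM Employee;"


def run_query(sql: str) -> str:
    # Mirrors the text-style output of the SQL tool, e.g. '[(9,)]'
    return str(conn.execute(sql).fetchall())


state = {"question": "How many employees are there"}
for step in (assign(query=fake_write_query), assign(result=lambda d: run_query(d["query"]))):
    state = step(state)

print(state)
# {'question': 'How many employees are there', 'query': 'SELECT COUNT(*) FROM Employee;', 'result': '[(9,)]'}

# The completed dict has exactly the fields the answer prompt expects
print("Question: {question}\nSQL Query: {query}\nSQL Result: {result}\nAnswer: ".format(**state))
```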

In [49]:
search = DuckDuckGoSearchRun()
search.run("When will the next solar eclipse be?")

"Get Ready for These Upcoming Eclipses! More Eclipses Solar Eclipses Date Solar Eclipse Type Geographic Region of Visibility Oct. 2, 2024 Annular An annular solar eclipse will be visible in South America, and a partial eclipse will be visible in South America, Antarctica, Pacific Ocean, Atlantic Ocean, North America March 29, 2025 Partial Europe, Asia, […] Explore a map of the next 15 total solar eclipses. In case you miss this year's solar eclipse, there are 14 more in the next 20 years. This map of eclipse paths from 2024 to 2044 reveals that ... It will be 20 years before there's a chance to witness a total solar eclipse in the United States again. According to NASA, after the total solar eclipse on April 8, 2024, the next total solar ... The sun is seen in full eclipse over Grand Teton National Park on August 21, 2017, outside Jackson, Wyoming. After 2024, the next total solar eclipse visible in the U.S. will be in 2044. The total length of the 2024 eclipse path is 9,190 miles (14,

<center>
     <img src="data/D4Sci_logo_full.png" alt="Data For Science, Inc" align="center" border="0" width=300px> 
</center>