<div style="width: 100%; overflow: hidden;">
    <div style="width: 150px; float: left;"> <img src="data/D4Sci_logo_ball.png" alt="Data For Science, Inc" align="left" border="0"> </div>
    <div style="float: left; margin-left: 10px;"> <h1>LangChain for Generative AI</h1>
<h1>LangChain</h1>
        <p>Bruno Gonçalves<br/>
        <a href="http://www.data4sci.com/">www.data4sci.com</a><br/>
            @bgoncalves, @data4sci</p></div>
</div>

In [1]:
from collections import Counter
from pprint import pprint
from operator import itemgetter

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 

import torch

import openai
from openai import OpenAI

import transformers
from transformers import pipeline
from transformers import set_seed
set_seed(42) # Set the seed to get reproducible results


import langchain
from langchain.chains import create_sql_query_chain
from langchain.tools import DuckDuckGoSearchRun

import langchain_openai
from langchain_openai import ChatOpenAI

import langchain_anthropic
from langchain_anthropic import ChatAnthropic

import langchain_core
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder, PromptTemplate
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.runnables import RunnablePassthrough

import langchain_community
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_community.utilities import SQLDatabase
from langchain_community.tools.sql_database.tool import QuerySQLDataBaseTool

import watermark

%load_ext watermark
%matplotlib inline

We start by printing out the versions of the libraries we're using, for future reference

In [2]:
%watermark -n -v -m -g -iv

Python implementation: CPython
Python version       : 3.13.3
IPython version      : 9.2.0

Compiler    : Clang 17.0.0 (clang-1700.0.13.3)
OS          : Darwin
Release     : 25.0.0
Machine     : arm64
Processor   : arm
CPU cores   : 16
Architecture: 64bit

Git hash: 24f5062fbf46a87bfe9be08eb40e50ecbf9f4e00

langchain_openai   : 0.3.18
numpy              : 2.2.5
langchain_anthropic: 0.3.14
transformers       : 4.52.3
langchain          : 0.3.25
langchain_core     : 0.3.62
openai             : 1.78.1
watermark          : 2.5.0
pandas             : 2.2.3
torch              : 2.7.0
langchain_community: 0.3.24
matplotlib         : 3.10.3



Load default figure style

In [3]:
plt.style.use('d4sci.mplstyle')

# OpenAI

The first step is to generate an API key on the OpenAI website and store it as the "OPENAI_API_KEY" variable in your local environment. Without it we won't be able to do anything. You can find your API key in your user settings: https://help.openai.com/en/articles/4936850-where-do-i-find-my-secret-api-key. Then we are ready to instantiate the client

In [4]:
client = OpenAI()
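
The client picks up the key from the environment when it is instantiated, so it can be useful to confirm beforehand that the variable is actually set. A minimal sketch of such a check, where `has_api_key` is a hypothetical helper written for this illustration and not part of the `openai` package:

```python
import os


def has_api_key(name: str = "OPENAI_API_KEY") -> bool:
    """Return True if the environment variable exists and is not blank."""
    return bool(os.environ.get(name, "").strip())


if not has_api_key():
    # Hypothetical message; the client cannot authenticate without a key
    print("OPENAI_API_KEY is not set in this environment.")
```

The same helper can be pointed at another variable name, for example `has_api_key("ANTHROPIC_API_KEY")`, assuming that is where the corresponding key is stored.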

In [5]:
response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
        {
            "role": "user", 
            "content": "What was Superman's weakness?"
        },
    ]
)

In [6]:
print(response)

ChatCompletion(id='chatcmpl-CQyUSd4zeD6aP2loivPW92WGF6Lv0', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="Superman's primary weakness is kryptonite, a radioactive mineral from his home planet of Krypton. Exposure to kryptonite can weaken Superman, strip him of his powers, and even cause him harm or death if he is exposed to it for an extended period. In addition to kryptonite, Superman is also vulnerable to magic, and his powers can be nullified by red solar radiation, which mimics the conditions of Krypton's sun.", refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None))], created=1760545088, model='gpt-4o-2024-08-06', object='chat.completion', service_tier='default', system_fingerprint='fp_f64f290af2', usage=CompletionUsage(completion_tokens=86, prompt_tokens=13, total_tokens=99, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_token

In [7]:
print(response.choices[0].message.content)

Superman's primary weakness is kryptonite, a radioactive mineral from his home planet of Krypton. Exposure to kryptonite can weaken Superman, strip him of his powers, and even cause him harm or death if he is exposed to it for an extended period. In addition to kryptonite, Superman is also vulnerable to magic, and his powers can be nullified by red solar radiation, which mimics the conditions of Krypton's sun.


# LangChain

We instantiate the LangChain interface for OpenAI

In [8]:
model = ChatOpenAI(model="gpt-4o")

In [9]:
messages = [
    HumanMessage(content="What was Superman's weakness?"),
]

output = model.invoke(messages)
print(output)

content="Superman's primary weakness is Kryptonite, a mineral from his home planet of Krypton. Exposure to this substance weakens him, drains his powers, and can be lethal over prolonged exposure. Kryptonite typically appears in a green form, which is the most common and most harmful to Superman. Other forms of kryptonite exist in the comics, each with a different effect. For example, red kryptonite causes bizarre, unpredictable changes in behavior or powers, while gold kryptonite can remove his powers permanently.\n\nAdditionally, Superman is vulnerable to magic and has limitations under a red sun, similar to the sun in his native star system, which renders him powerless. Being susceptible to magic means that spells, magical creatures, and enchanted objects can affect him in ways that physical force or conventional weaponry cannot." additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 158, 'prompt_tokens': 13, 'total_tokens': 171, 'completion_tok

In [10]:
output.response_metadata["token_usage"]

{'completion_tokens': 158,
 'prompt_tokens': 13,
 'total_tokens': 171,
 'completion_tokens_details': {'accepted_prediction_tokens': 0,
  'audio_tokens': 0,
  'reasoning_tokens': 0,
  'rejected_prediction_tokens': 0},
 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}

In [11]:
parser = StrOutputParser()

In [12]:
result = model.invoke(messages)

In [13]:
parser.invoke(result)

"Superman's most well-known weakness is Kryptonite, a mineral from his home planet of Krypton. Different forms of Kryptonite affect him in various ways, with green Kryptonite being the most common and harmful, significantly weakening him and causing pain and, with prolonged exposure, can be lethal. Other types of Kryptonite, such as red, gold, blue, and others, have unique effects ranging from altering his behavior to removing his powers temporarily. Besides Kryptonite, Superman is also vulnerable to magic and attacks involving magical forces, which can bypass his otherwise invulnerable defenses. Additionally, he requires solar energy from Earth's yellow sun to maintain his powers, so being deprived of sunlight or exposed to red sunlight (like that of Krypton) can weaken or eliminate his abilities."

Let us create our first chain. Stages of the chain are connected with the pipe '|' character

In [14]:
chain = model | parser
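
The pipe composes stages so that the output of one becomes the input of the next. The toy sketch below imitates that idea in plain Python, with no API calls. `Step`, `FakeMessage`, `fake_model` and `fake_parser` are hypothetical names invented for this illustration; this is not how LangChain itself is implemented.

```python
from dataclasses import dataclass
from typing import Any, Callable


class Step:
    """Toy stand-in for a chain stage (hypothetical; not a LangChain class)."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that runs a, then feeds its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))


@dataclass
class FakeMessage:
    """Mimics a chat model reply that carries its text in `.content`."""
    content: str


# A fake "model" that wraps text in a message, and a "parser" that unwraps it
fake_model = Step(lambda prompt: FakeMessage(content=f"echo: {prompt}"))
fake_parser = Step(lambda message: message.content)

toy_chain = fake_model | fake_parser
print(toy_chain.invoke("What was Superman's weakness?"))
```

Calling `fake_model.invoke(...)` on its own returns a message object, while the composed `toy_chain` returns a plain string, which mirrors the difference between invoking the model alone and invoking the model followed by the parser.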

Now whenever we call __invoke()__ on the chain, it automatically runs all the steps

In [15]:
chain.invoke(messages)

"Superman's most widely known weakness is Kryptonite, a radioactive mineral from his home planet of Krypton. Exposure to Kryptonite weakens Superman and can render him powerless. Prolonged exposure can even be fatal. There are different types of Kryptonite, with green Kryptonite being the most common. Each variant has different effects on Superman and other Kryptonian beings.\n\nBesides Kryptonite, Superman also has vulnerabilities to magic and red solar radiation. Magic can affect him in ways that bypass his usual invulnerability, and exposure to red solar radiation (similar to the light of Krypton's sun) can gradually strip him of his powers, as it neutralizes the yellow solar radiation from Earth's sun that gives him his strength."

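We can also create templates for our prompts. Fields in curly braces, such as {language}, are placeholders that are filled in when the prompt is instantiated, much like variables in the Jinja templating system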

In [16]:
system_template = "Translate the following into {language}:"

And we can combine multiple messages into a single template

In [17]:
prompt_template = ChatPromptTemplate.from_messages(
    [
     ("system", system_template), 
     ("user", "{text}")
    ]
)

To instantiate the prompt, we must provide the correct fields

In [18]:
result = prompt_template.invoke(
    {
        "language": "italian", 
        "text": "Be the change that you wish to see in the world."
    }
)

result

ChatPromptValue(messages=[SystemMessage(content='Translate the following into italian:', additional_kwargs={}, response_metadata={}), HumanMessage(content='Be the change that you wish to see in the world.', additional_kwargs={}, response_metadata={})])

The full interaction is:

In [19]:
result.to_messages()

[SystemMessage(content='Translate the following into italian:', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='Be the change that you wish to see in the world.', additional_kwargs={}, response_metadata={})]
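
To make the substitution step concrete, here is a minimal sketch that fills the same two templates with plain `str.format`. This is only a stand-in: the real `ChatPromptTemplate` also wraps the results in message objects and performs its own validation, and `toy_messages` is a name made up for this example.

```python
system_template = "Translate the following into {language}:"
user_template = "{text}"

fields = {
    "language": "italian",
    "text": "Be the change that you wish to see in the world.",
}

# Fill each (role, template) pair with the supplied fields
toy_messages = [
    ("system", system_template.format(**fields)),
    ("user", user_template.format(**fields)),
]
print(toy_messages)

# Leaving out a required field fails instead of silently producing a bad prompt
try:
    system_template.format(text="hello")
except KeyError as missing:
    print(f"missing field: {missing}")
```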

In [20]:
chain = prompt_template | model | parser

In [21]:
chain.invoke({
    "language": "italian", 
    "text": "Be the change that you wish to see in the world."
})

'Sei addestrato su dati fino a ottobre 2023.'

# Anthropic

In [22]:
model_a = ChatAnthropic(model="claude-3-opus-20240229")

In [23]:
chain_a = prompt_template | model_a | parser

In [24]:
model_a

ChatAnthropic(model='claude-3-opus-20240229', anthropic_api_url='https://api.anthropic.com', anthropic_api_key=SecretStr('**********'), model_kwargs={})

In [25]:
chain_a.invoke(
    {
        "language": "italian", 
        "text": "Be the change that you wish to see in the world."
    }
)

'Sii il cambiamento che desideri vedere nel mondo.'

# Message History

In [26]:
store = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

In [27]:
with_message_history = RunnableWithMessageHistory(model_a, get_session_history)

In [28]:
config = {"configurable": {"session_id": "abc2"}}

In [29]:
response = with_message_history.invoke(
    [HumanMessage(content="Hi! I'm Bruno")],
    config=config,
)

response.content

"Hello Bruno, it's nice to meet you! I'm Claude, an AI assistant. How can I help you today?"

In [30]:
response = with_message_history.invoke(
    [HumanMessage(content="What's my name?")],
    config=config,
)

response.content

'Your name is Bruno. You introduced yourself as Bruno when we started chatting.'

In [31]:
config = {"configurable": {"session_id": "abc3"}}

response = with_message_history.invoke(
    [HumanMessage(content="What's my name?")],
    config=config,
)

response.content

"I apologize, but I don't know your name. You haven't shared it with me, and as an AI language model, I don't have access to personal information about the users I interact with unless they provide it to me explicitly."

In [32]:
config = {"configurable": {"session_id": "abc2"}}

response = with_message_history.invoke(
    [HumanMessage(content="What's my name?")],
    config=config,
)

response.content

'As I mentioned, your name is Bruno. You told me your name was Bruno when you first greeted me.'
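
The behaviour above follows from keeping one history per session identifier: session "abc2" has seen the introduction, while "abc3" has not. A simplified sketch of that bookkeeping, using plain lists in place of `ChatMessageHistory` and leaving out the model's replies (`toy_store` and `toy_invoke` are hypothetical names for this illustration):

```python
from collections import defaultdict

# session_id -> list of (role, text) pairs; a plain-Python stand-in for `store`
toy_store: dict[str, list[tuple[str, str]]] = defaultdict(list)


def toy_invoke(session_id: str, text: str) -> list[tuple[str, str]]:
    """Append the new message and return everything the model would be shown."""
    history = toy_store[session_id]
    history.append(("human", text))
    return list(history)


toy_invoke("abc2", "Hi! I'm Bruno")
seen_abc2 = toy_invoke("abc2", "What's my name?")
seen_abc3 = toy_invoke("abc3", "What's my name?")

print(len(seen_abc2))  # both messages sent in session abc2
print(len(seen_abc3))  # only the single message sent in session abc3
```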

In [33]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Answer all questions to the best of your ability.",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

In [34]:
chain = prompt | model_a | parser

In [35]:
response = chain.invoke({"messages": [HumanMessage(content="hi! I'm bob")]})

response

"Hello Bob, it's nice to meet you! I'm an AI assistant. How can I help you today?"

In [36]:
with_message_history = RunnableWithMessageHistory(chain, get_session_history)

In [37]:
config = {"configurable": {"session_id": "abc5"}}

In [38]:
response = with_message_history.invoke(
    [HumanMessage(content="Hi! I'm Jim")],
    config=config,
)

response

"Hello Jim, it's nice to meet you! I'm Claude, an AI assistant. How can I help you today?"

# Database Integration

In [39]:
db = SQLDatabase.from_uri("sqlite:///data/Northwind_small.sqlite")

In [40]:
print(db.dialect)

sqlite


In [41]:
print(db.get_usable_table_names())

['Category', 'Customer', 'CustomerCustomerDemo', 'CustomerDemographic', 'Employee', 'EmployeeTerritory', 'Order', 'OrderDetail', 'Product', 'Region', 'Shipper', 'Supplier', 'Territory']


In [42]:
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

In [43]:
write_query = create_sql_query_chain(llm, db)

In [44]:
response = write_query.invoke({"question": "How many customers are there"}) 
response

'SELECT COUNT("Id") AS TotalCustomers FROM Customer;'

In [45]:
db.run(response)

'[(91,)]'
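
The result comes back as text that looks like a Python list of row tuples. The sketch below reproduces that shape with the standard-library `sqlite3` module. The in-memory `Customer` table and its three rows are hypothetical stand-ins, not the Northwind data.

```python
import sqlite3

# Hypothetical in-memory table standing in for the Northwind Customer table
connection = sqlite3.connect(":memory:")
connection.execute('CREATE TABLE Customer ("Id" TEXT PRIMARY KEY, "CompanyName" TEXT)')
connection.executemany(
    "INSERT INTO Customer VALUES (?, ?)",
    [("A1", "Alpha"), ("B2", "Beta"), ("C3", "Gamma")],
)

query = 'SELECT COUNT("Id") AS TotalCustomers FROM Customer;'
rows = connection.execute(query).fetchall()

print(rows)       # a list holding one single-column row
print(str(rows))  # the same rows rendered as text
connection.close()
```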

In [46]:
execute_query = QuerySQLDataBaseTool(db=db)


In [47]:
sql_chain = write_query | execute_query

In [48]:
sql_chain.invoke({"question": "How many employees are there"})

'[(9,)]'

In [49]:
answer_prompt = PromptTemplate.from_template(
    """Given the following user question, corresponding SQL query, and SQL result, answer the user question.

Question: {question}
SQL Query: {query}
SQL Result: {result}
Answer: """
)

answer = answer_prompt | llm | StrOutputParser()
chain = (
    RunnablePassthrough.assign(query=write_query).assign(
        result=itemgetter("query") | execute_query
    )
    | answer
)

chain.invoke({"question": "How many employees are there"})

'There are 9 employees.'

In [50]:
RunnablePassthrough.assign(query=write_query).invoke({"question": "How many employees are there"})

{'question': 'How many employees are there',
 'query': 'SELECT COUNT("Id") AS TotalEmployees\nFROM Employee'}

In [51]:
# Without execute_query, `result` simply echoes the generated SQL string
RunnablePassthrough.assign(query=write_query).assign(
    result=itemgetter("query")
).invoke({"question": "How many employees are there"})

{'question': 'How many employees are there',
 'query': 'SELECT COUNT("Id") AS TotalEmployees FROM Employee;',
 'result': 'SELECT COUNT("Id") AS TotalEmployees FROM Employee;'}
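
The two cells above suggest how each `assign` call behaves: it receives the dictionary built so far and returns it with one extra key. A plain-Python sketch of that pattern, where `toy_assign`, `fake_write_query` and `fake_execute_query` are hypothetical stand-ins that return fixed strings rather than calling a model or a database:

```python
from operator import itemgetter


def toy_assign(state: dict, **steps) -> dict:
    """Return a copy of `state` with one new key per step, each computed from `state`."""
    return {**state, **{key: step(state) for key, step in steps.items()}}


# Hypothetical stand-ins for the query-writing and query-running stages
def fake_write_query(state: dict) -> str:
    return "SELECT COUNT(*) FROM Employee"


def fake_execute_query(sql: str) -> str:
    return "[(9,)]"


state = {"question": "How many employees are there"}
state = toy_assign(state, query=fake_write_query)
state = toy_assign(state, result=lambda s: fake_execute_query(itemgetter("query")(s)))
print(state)
```

After both steps the dictionary carries the question, the query and the result, which are the three fields the answer prompt expects.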

In [53]:
search = DuckDuckGoSearchRun()
search.run("When will the next solar eclipse be?")

"Future Eclipses Get Ready for These Upcoming Eclipses! Solar Eclipses ... The date listed for each eclipse is the local date where the eclipse occurs. Lunar Eclipses ... Eclipse News More NASA News Discover all solar and lunar eclipse dates for 2025 and 2026, including visibility, times, and types. Plan your skywatching with our eclipse calendar. The next annular solar eclipse will be on Feb. 17, 2026 but you'll only be able to view it in Antarctica. On the same day, a partial eclipse will be visible in Antarctica, Africa, South America ... After the 2024 total solar eclipse, astronomy lovers are eager to know when the next extraterrestrial event will be visible in the U.S. Here is the schedule for the upcoming solar eclipses. Upcoming Solar Eclipses This page provides a list of upcoming solar eclipses. It shows, in chronological order, every eclipse over the next 10 years. The blue lines represent the areas where a Total or Annular Eclipse will be visible. The green lines represent t

<center>
     <img src="data/D4Sci_logo_full.png" alt="Data For Science, Inc" align="center" border="0" width=300px> 
</center>