<div style="width: 100%; overflow: hidden;">
    <div style="width: 150px; float: left;"> <img src="data/D4Sci_logo_ball.png" alt="Data For Science, Inc" align="left" border="0"> </div>
    <div style="float: left; margin-left: 10px;"> <h1>LangChain for Generative AI</h1>
<h1>LangChain</h1>
        <p>Bruno Gonçalves<br/>
        <a href="http://www.data4sci.com/">www.data4sci.com</a><br/>
            @bgoncalves, @data4sci</p></div>
</div>

In [1]:
from collections import Counter
from pprint import pprint
from operator import itemgetter

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 

import torch

import openai
from openai import OpenAI

import transformers
from transformers import pipeline
from transformers import set_seed
set_seed(42) # Set the seed to get reproducible results


import langchain
from langchain.chains import create_sql_query_chain
from langchain_community.tools import DuckDuckGoSearchRun

import langchain_openai
from langchain_openai import ChatOpenAI

import langchain_anthropic
from langchain_anthropic import ChatAnthropic

import langchain_core
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder, PromptTemplate
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.runnables import RunnablePassthrough

import langchain_community
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_community.utilities import SQLDatabase
from langchain_community.tools.sql_database.tool import QuerySQLDataBaseTool

import watermark

%load_ext watermark
%matplotlib inline

We start by printing out the versions of the libraries we're using, for future reference.

In [2]:
%watermark -n -v -m -g -iv

Python implementation: CPython
Python version       : 3.11.7
IPython version      : 8.12.3

Compiler    : Clang 14.0.6 
OS          : Darwin
Release     : 23.6.0
Machine     : arm64
Processor   : arm
CPU cores   : 16
Architecture: 64bit

Git hash: 5f98e46f7abc79011d7877a17645c79f3e7672ab

torch              : 2.3.0
langchain_openai   : 0.1.8
matplotlib         : 3.8.0
transformers       : 4.41.1
numpy              : 1.26.4
langchain          : 0.2.2
langchain_core     : 0.2.3
langchain_community: 0.2.1
watermark          : 2.4.3
openai             : 1.30.5
pandas             : 1.5.3
langchain_anthropic: 0.1.15



Load the default figure style

In [3]:
plt.style.use('./d4sci.mplstyle')

# OpenAI

The first step is to generate an API key on the OpenAI website and store it in the "OPENAI_API_KEY" environment variable on your local machine. Without it, we won't be able to make any API calls. You can find your API key in your user settings (see https://help.openai.com/en/articles/4936850-where-do-i-find-my-secret-api-key). Then we are ready to instantiate the client:

In [4]:
client = OpenAI()
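`OpenAI()` picks up `OPENAI_API_KEY` from the environment automatically and raises an error if it cannot find a key. If you prefer to check up front with a more explicit message, a minimal standard-library sketch is below; the helper name `require_api_key` is hypothetical and not part of the `openai` package.

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in environment variable `name`, or fail with a clear message.

    Hypothetical helper: not part of the openai library.
    """
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; export it in your shell and restart the kernel."
        )
    return key


# Hypothetical usage: check the key first, then pass it to the client explicitly
# client = OpenAI(api_key=require_api_key())
```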

In [5]:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "What was Superman's weakness?"
        },
    ]
)

In [6]:
print(response)

ChatCompletion(id='chatcmpl-ABi1E8X8wxKn1HNmlluhm3viW0Crx', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="Superman's primary weakness is kryptonite, a mineral from his home planet, Krypton. When exposed to kryptonite, Superman's powers are weakened, and prolonged exposure can be deadly to him. There are different types of kryptonite, each with its own effects. For example, green kryptonite weakens and can potentially kill him, while red kryptonite causes unpredictable and often bizarre changes in his behavior or abilities.\n\nAdditionally, Superman is vulnerable to magic and can be harmed by magical spells and artifacts, which bypass his typical invulnerability. He also requires solar energy from Earth's yellow sun to maintain his superpowers, so being deprived of sunlight for extended periods can weaken him.\n\nLastly, despite his superhuman abilities, Superman's strong moral code and empathy for others can be considered weaknesse

In [7]:
print(response.choices[0].message.content)

Superman's primary weakness is kryptonite, a mineral from his home planet, Krypton. When exposed to kryptonite, Superman's powers are weakened, and prolonged exposure can be deadly to him. There are different types of kryptonite, each with its own effects. For example, green kryptonite weakens and can potentially kill him, while red kryptonite causes unpredictable and often bizarre changes in his behavior or abilities.

Additionally, Superman is vulnerable to magic and can be harmed by magical spells and artifacts, which bypass his typical invulnerability. He also requires solar energy from Earth's yellow sun to maintain his superpowers, so being deprived of sunlight for extended periods can weaken him.

Lastly, despite his superhuman abilities, Superman's strong moral code and empathy for others can be considered weaknesses in a strategic sense, as his enemies often exploit his compassion to manipulate him.
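The printed `ChatCompletion` object is deeply nested, and the text we usually want sits at `choices[0].message.content`, next to a `finish_reason` that tells us whether the reply ended naturally (`"stop"`) or was cut off (`"length"`). The client's response objects are Pydantic models, so `response.model_dump()` should give the same structure as plain dictionaries. The sketch below walks such a dictionary; `first_reply` is a hypothetical helper and `sample` is a hand-written, heavily trimmed stand-in for real output.

```python
def first_reply(completion: dict) -> tuple[str, str]:
    """Return (content, finish_reason) for the first choice of a chat completion dict."""
    choice = completion["choices"][0]
    # finish_reason == "length" means the reply was truncated by the token limit
    return choice["message"]["content"], choice["finish_reason"]


# Hand-written, trimmed stand-in for response.model_dump()
sample = {
    "id": "chatcmpl-example",
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {"role": "assistant", "content": "Kryptonite."},
        }
    ],
}

reply_text, reason = first_reply(sample)
print(reason, "->", reply_text)  # stop -> Kryptonite.
```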


# LangChain

Next, we instantiate LangChain's chat model interface for OpenAI:

In [8]:
model = ChatOpenAI(model="gpt-4o")

In [9]:
messages = [
    # The question comes from the user, so it belongs in a HumanMessage
    HumanMessage(content="What was Superman's weakness?"),
]

output = model.invoke(messages)
print(output)

content="Superman's primary weakness is Kryptonite, a radioactive mineral from his home planet, Krypton. Exposure to Kryptonite weakens Superman significantly, stripping him of his powers and making him vulnerable. There are different types of Kryptonite, each with varying effects. The most common type is green Kryptonite, which weakens and can potentially kill him with prolonged exposure. Other types, such as red, blue, gold, and black Kryptonite, have different and sometimes unpredictable effects on Superman and other Kryptonians." response_metadata={'token_usage': {'completion_tokens': 101, 'prompt_tokens': 13, 'total_tokens': 114, 'completion_tokens_details': {'reasoning_tokens': 0}}, 'model_name': 'gpt-4o', 'system_fingerprint': 'fp_52a7f40b0b', 'finish_reason': 'stop', 'logprobs': None} id='run-6ff54646-7c59-490f-b561-f8a0ef450f6c-0' usage_metadata={'input_tokens': 13, 'output_tokens': 101, 'total_tokens': 114}


In [10]:
output.response_metadata["token_usage"]

{'completion_tokens': 101,
 'prompt_tokens': 13,
 'total_tokens': 114,
 'completion_tokens_details': {'reasoning_tokens': 0}}
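Since API usage is billed per token, the `token_usage` dictionary can be turned into a rough cost estimate. A minimal sketch: the per-million-token prices below are placeholders, not OpenAI's actual rates, so check the current pricing page before relying on the result.

```python
def estimate_cost(token_usage: dict, usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Rough cost in USD from a token_usage dict; prices are per million tokens."""
    prompt = token_usage["prompt_tokens"]
    completion = token_usage["completion_tokens"]
    return (prompt * usd_per_m_input + completion * usd_per_m_output) / 1_000_000


usage = {"completion_tokens": 101, "prompt_tokens": 13, "total_tokens": 114}

# Placeholder prices, not current OpenAI pricing
cost = estimate_cost(usage, usd_per_m_input=5.0, usd_per_m_output=15.0)
print(f"${cost:.6f}")  # 13*5 + 101*15 = 1580 -> $0.001580
```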

In [11]:
parser = StrOutputParser()

In [12]:
result = model.invoke(messages)

In [13]:
parser.invoke(result)

"Superman's most well-known weakness is Kryptonite, a radioactive material from his home planet of Krypton. Exposure to Kryptonite weakens Superman, strips him of his powers, and prolonged exposure can be fatal. There are various forms of Kryptonite, each with different effects, but green Kryptonite is the most common and harmful to him. Additionally, Superman is vulnerable to magic and can be harmed by it, unlike most other forms of attack which he can usually withstand due to his superhuman abilities."

Let us create our first chain. The stages of a chain are connected with the pipe character, '|':

In [14]:
chain = model | parser

Now, whenever we call __invoke()__ on the chain, it automatically runs every stage in order, feeding the output of each stage into the next:

In [15]:
chain.invoke(messages)

"Superman's primary weakness is Kryptonite, a mineral from his home planet, Krypton. Exposure to Kryptonite, especially the green variety, can weaken him, strip him of his powers, and, with prolonged exposure, even be fatal. Different forms of Kryptonite have varying effects; for example, Red Kryptonite causes unpredictable changes in his physiology, and Gold Kryptonite can permanently remove his powers. Additionally, Superman is vulnerable to magic, which can harm him in ways that physical force cannot. His powers also diminish under a red sun, similar to the red sun of Krypton, unlike Earth's yellow sun, which gives him his superhuman abilities."

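The pipe works because LangChain's runnables overload Python's `|` operator (`__or__`), and chaining them produces a sequence whose `invoke()` runs each stage on the previous stage's output. The toy class below is not LangChain's implementation, only a sketch of the same idea; `Stage`, `fake_model`, and `fake_parser` are hypothetical stand-ins for the model and the `StrOutputParser`.

```python
from typing import Any, Callable


class Stage:
    """Toy stand-in for a runnable: wraps functions and supports `|` chaining."""

    def __init__(self, *funcs: Callable[[Any], Any]):
        self.funcs = funcs

    def __or__(self, other: "Stage") -> "Stage":
        # Composing two stages concatenates their function lists
        return Stage(*self.funcs, *other.funcs)

    def invoke(self, value: Any) -> Any:
        # Feed each function's output into the next one
        for func in self.funcs:
            value = func(value)
        return value


# Hypothetical stand-ins: a "model" returning a message-like dict, and a "parser" extracting its text
fake_model = Stage(lambda prompt: {"content": f"Answer to: {prompt}"})
fake_parser = Stage(lambda message: message["content"])

toy_chain = fake_model | fake_parser
print(toy_chain.invoke("What was Superman's weakness?"))
```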
We can also create templates for our prompts. Placeholders are written in single curly braces, such as {language}, following Python's f-string conventions (LangChain's default template format):

In [16]:
system_template = "Translate the following into {language}:"

We can also combine multiple messages into a single template:

In [17]:
prompt_template = ChatPromptTemplate.from_messages(
    [
        ("system", system_template),
        ("user", "{text}"),
    ]
)

To fill in the prompt, we must provide a value for every placeholder field:

In [18]:
result = prompt_template.invoke({
    "language": "italian", 
    "text": "Be the change that you wish to see in the world."
})

result

ChatPromptValue(messages=[SystemMessage(content='Translate the following into italian:'), HumanMessage(content='Be the change that you wish to see in the world.')])

Converting the prompt value to a list shows the messages that will be sent to the model:

In [19]:
result.to_messages()

[SystemMessage(content='Translate the following into italian:'),
 HumanMessage(content='Be the change that you wish to see in the world.')]
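Because the placeholders follow Python's `str.format` conventions, the standard library can illustrate what the template does: find the field names it requires, then fill each message in turn. A sketch with hypothetical helpers `required_fields` and `fill_messages`, using plain (role, template) tuples; LangChain performs its own, more thorough validation, but it likewise raises an error when a variable is missing.

```python
from string import Formatter


def required_fields(messages: list[tuple[str, str]]) -> set[str]:
    """Collect the {placeholder} names used across (role, template) pairs."""
    fields = set()
    for _role, template in messages:
        for _literal, field_name, _spec, _conversion in Formatter().parse(template):
            if field_name:  # None for plain text segments
                fields.add(field_name)
    return fields


def fill_messages(messages: list[tuple[str, str]], values: dict) -> list[tuple[str, str]]:
    """Fill every template, raising KeyError if any required field is missing."""
    missing = required_fields(messages) - set(values)
    if missing:
        raise KeyError(f"Missing values for: {sorted(missing)}")
    return [(role, template.format(**values)) for role, template in messages]


templates = [("system", "Translate the following into {language}:"), ("user", "{text}")]
print(sorted(required_fields(templates)))  # ['language', 'text']

filled = fill_messages(
    templates,
    {"language": "italian", "text": "Be the change that you wish to see in the world."},
)
print(filled)
```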

In [20]:
chain = prompt_template | model | parser

In [21]:
chain.invoke({
    "language": "italian", 
    "text": "Be the change that you wish to see in the world."
})

'Sii il cambiamento che vuoi vedere nel mondo.'

# Anthropic

In [22]:
model_a = ChatAnthropic(model="claude-3-opus-20240229")

In [23]:
chain_a = prompt_template | model_a | parser

In [24]:
model_a

ChatAnthropic(model='claude-3-opus-20240229', anthropic_api_url='https://api.anthropic.com', anthropic_api_key=SecretStr('**********'), _client=<anthropic.Anthropic object at 0x174e9d2d0>, _async_client=<anthropic.AsyncAnthropic object at 0x174e722d0>)

In [25]:
chain_a.invoke({
    "language": "italian", 
    "text": "Be the change that you wish to see in the world."
})

'Sii il cambiamento che desideri vedere nel mondo.'

# Message History

In [26]:
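store = {}  # session_id -> ChatMessageHistory

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    # Create an empty history the first time a session_id is seen
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]


# Wrap the model so each call loads and updates the history for its session
with_message_history = RunnableWithMessageHistory(model, get_session_history)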

In [27]:
config = {"configurable": {"session_id": "abc2"}}

In [28]:
response = with_message_history.invoke(
    [HumanMessage(content="Hi! I'm Bob")],
    config=config,
)

response.content

'Hi Bob! How can I assist you today?'

In [29]:
response = with_message_history.invoke(
    [HumanMessage(content="What's my name?")],
    config=config,
)

response.content

'Your name is Bob! How can I help you today?'

In [30]:
config = {"configurable": {"session_id": "abc3"}}

response = with_message_history.invoke(
    [HumanMessage(content="What's my name?")],
    config=config,
)

response.content

"I'm sorry, but I don't have access to personal information about individuals unless it has been shared with me in the course of our conversation. How can I assist you today?"

In [31]:
config = {"configurable": {"session_id": "abc2"}}

response = with_message_history.invoke(
    [HumanMessage(content="What's my name?")],
    config=config,
)

response.content

'You mentioned that your name is Bob. How can I assist you further?'
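The `store` dictionary is what keeps sessions apart: each `session_id` maps to its own history, which is why session "abc3" knows nothing about Bob while "abc2" still does. Around each call, `RunnableWithMessageHistory` roughly loads the session's earlier messages, prepends them to the new input, calls the wrapped runnable, and appends both the input and the reply to the history. The plain-Python sketch below mirrors that bookkeeping; `fake_reply` is a hypothetical stand-in for the model, and everything lives in memory only.

```python
from collections import defaultdict

histories = defaultdict(list)  # session_id -> list of (role, text) pairs, in memory only


def fake_reply(messages: list[tuple[str, str]]) -> str:
    """Hypothetical stand-in for the model: looks for a name earlier in the conversation."""
    last = messages[-1][1]
    if last == "What's my name?":
        for role, text in messages[:-1]:
            if role == "human" and text.startswith("Hi! I'm "):
                return "Your name is " + text[len("Hi! I'm "):]
        return "I don't know your name."
    return "Hello!"


def chat(session_id: str, user_text: str) -> str:
    history = histories[session_id]            # earlier turns for this session only
    messages = history + [("human", user_text)]
    reply = fake_reply(messages)
    # Record both sides of the turn so the next call can see them
    history.append(("human", user_text))
    history.append(("ai", reply))
    return reply


chat("abc2", "Hi! I'm Bob")
print(chat("abc2", "What's my name?"))  # Your name is Bob
print(chat("abc3", "What's my name?"))  # I don't know your name.
```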

In [32]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant. Answer all questions to the best of your ability.",
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

chain = prompt | model | parser

In [33]:
response = chain.invoke({"messages": [HumanMessage(content="hi! I'm bob")]})

response

'Hi Bob! How can I help you today?'
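`MessagesPlaceholder` reserves a slot in the prompt where a whole list of messages is spliced in, rather than a single string being substituted. That is what lets the conversation history end up after the system message. A standard-library sketch of the splicing, using (role, text) tuples; `PLACEHOLDER` and `render` are hypothetical names, not LangChain's internals.

```python
PLACEHOLDER = object()  # marks where the message list is spliced in

prompt_parts = [
    ("system", "You are a helpful assistant. Answer all questions to the best of your ability."),
    PLACEHOLDER,
]


def render(parts, messages):
    """Expand the placeholder into the given messages, keeping everything else in order."""
    rendered = []
    for part in parts:
        if part is PLACEHOLDER:
            rendered.extend(messages)
        else:
            rendered.append(part)
    return rendered


for role, text in render(prompt_parts, [("human", "hi! I'm bob")]):
    print(f"{role}: {text}")
```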

In [34]:
with_message_history = RunnableWithMessageHistory(chain, get_session_history)

In [35]:
config = {"configurable": {"session_id": "abc5"}}

In [36]:
response = with_message_history.invoke(
    [HumanMessage(content="Hi! I'm Jim")],
    config=config,
)

response

'Hi Jim! How can I assist you today?'

In [37]:
response = with_message_history.invoke(
    [HumanMessage(content="What's my name?")],
    config=config,
)

response

'Your name is Jim. How can I help you today, Jim?'

# Database Integration

In [38]:
db = SQLDatabase.from_uri("sqlite:///data/Northwind_small.sqlite")

In [39]:
print(db.dialect)

sqlite


In [40]:
print(db.get_usable_table_names())

['Category', 'Customer', 'CustomerCustomerDemo', 'CustomerDemographic', 'Employee', 'EmployeeTerritory', 'Order', 'OrderDetail', 'Product', 'Region', 'Shipper', 'Supplier', 'Territory']
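For SQLite, the table list is also available directly from the `sqlite_master` catalog. A standard-library sketch against a throwaway in-memory database; the two tables created here are made up and are not the Northwind schema. Pointing `sqlite3.connect` at `data/Northwind_small.sqlite` should list the tables shown above, although `get_usable_table_names()` may also filter them through any include or ignore lists passed to `SQLDatabase`.

```python
import sqlite3

# Throwaway in-memory database with two made-up tables
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Id TEXT PRIMARY KEY, CompanyName TEXT)")
# "Order" is an SQL keyword, so the table name must be quoted
conn.execute('CREATE TABLE "Order" (Id INTEGER PRIMARY KEY, CustomerId TEXT)')

rows = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
table_names = [name for (name,) in rows]
print(table_names)  # ['Customer', 'Order']
conn.close()
```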


In [41]:
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

In [42]:
write_query = create_sql_query_chain(llm, db)

In [43]:
response = write_query.invoke({"question": "How many customers are there"}) 
response

'SELECT COUNT("Id") AS TotalCustomers FROM Customer;'

In [44]:
db.run(response)

'[(91,)]'
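Note that `db.run()` returns the result as a string, '[(91,)]', which is convenient to paste into a prompt but not to compute with. When the string contains only Python literals, `ast.literal_eval` can safely turn it back into objects (results with dates or decimals would not parse this way). Alternatively, the generated SQL can be run directly with `sqlite3` to get typed rows. A sketch; the in-memory `Customer` table with three made-up rows is a hypothetical stand-in for Northwind.

```python
import ast
import sqlite3

# Parse the string returned by db.run() back into Python objects
raw = "[(91,)]"
rows = ast.literal_eval(raw)
customer_count = rows[0][0]
print(customer_count + 1)  # 92, an int rather than text

# Alternatively, run the generated SQL directly to get typed rows
conn = sqlite3.connect(":memory:")  # hypothetical stand-in for Northwind
conn.execute("CREATE TABLE Customer (Id TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO Customer VALUES (?)", [("C1",), ("C2",), ("C3",)])
query = 'SELECT COUNT("Id") AS TotalCustomers FROM Customer;'
direct_count = conn.execute(query).fetchone()[0]
print(direct_count)  # 3
conn.close()
```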

In [45]:
execute_query = QuerySQLDataBaseTool(db=db)

In [46]:
sql_chain = write_query | execute_query

In [47]:
sql_chain.invoke({"question": "How many employees are there"})

'[(9,)]'

In [48]:
answer_prompt = PromptTemplate.from_template(
    """Given the following user question, corresponding SQL query, and SQL result, answer the user question.

Question: {question}
SQL Query: {query}
SQL Result: {result}
Answer: """
)

answer = answer_prompt | llm | StrOutputParser()
chain = (
    RunnablePassthrough.assign(query=write_query).assign(
        result=itemgetter("query") | execute_query
    )
    | answer
)

chain.invoke({"question": "How many employees are there"})

'There are a total of 9 employees.'
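`RunnablePassthrough.assign` passes its input dictionary through and adds new keys computed from it, so by the time the answer prompt runs it can see `question`, `query`, and `result` together. The plain-Python sketch below mirrors that data flow; `write_sql`, `run_sql`, and `answer_from` are hypothetical stubs standing in for `write_query`, `execute_query`, and the `answer` chain, and the `Employee` table is made up.

```python
import sqlite3

# Made-up two-row table standing in for the Northwind Employee table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Id INTEGER PRIMARY KEY, LastName TEXT)")
conn.executemany("INSERT INTO Employee (LastName) VALUES (?)", [("A",), ("B",)])


def write_sql(inputs: dict) -> str:
    # Stub for the LLM-backed write_query chain
    return "SELECT COUNT(*) FROM Employee;"


def run_sql(query: str) -> str:
    # Like db.run(), return the rows as a string
    return str(conn.execute(query).fetchall())


def answer_from(inputs: dict) -> str:
    # Stub for answer_prompt | llm | StrOutputParser()
    return f"Q: {inputs['question']} | SQL: {inputs['query']} | result: {inputs['result']}"


state = {"question": "How many employees are there"}
state = {**state, "query": write_sql(state)}          # like .assign(query=write_query)
state = {**state, "result": run_sql(state["query"])}  # like .assign(result=itemgetter("query") | execute_query)
print(answer_from(state))
```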

In [49]:
search = DuckDuckGoSearchRun()
search.run("When will the next solar eclipse be?")

'Explore an interactive map of the paths and dates of the next 15 total solar eclipses, based on NASA\'s database of five millennium canon. Find out when and where you can see totality or partial eclipse in your region or around the world. Total Solar Eclipse - April 8, 2024. On April 8, 2024, a total solar eclipse will cross North America, passing over Mexico, the United States, and Canada. A total solar eclipse happens when the Moon passes between the Sun and Earth, completely blocking the face of the Sun. The sky will darken as if it were dawn or dusk. The first solar eclipse since April 8\'s total in North America, the "ring of fire" on October 2, 2024, will be seen from the Pacific, South America and the Atlantic. The total length of the 2024 eclipse path is 9,190 miles (14,790 km). The magnitude of this eclipse is 1.0565, which means the Moon\'s diameter is 5.65 percent larger than the Sun\'s. Only when ... The global path of the \'ring of fire\' annular solar eclipse on Oct. 2, 

<center>
     <img src="data/D4Sci_logo_full.png" alt="Data For Science, Inc" align="center" border="0" width=300px> 
</center>