<a href="https://colab.research.google.com/github/cooolbabu/GoogleGemini101/blob/main/OpenAI/11_LangChain_SQL_Queries_Postgres_Claude.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# This is a follow-up to Justin Grammens' LetsLearnLangChain presentation.
# Here we use Claude for comparison with OpenAI and other LLMs
- [Course content on GitHub](https://github.com/justingrammens/LetsLearnLangChain)
- [YouTube link](https://www.youtube.com/watch?v=QT3wALFDZBo)



## Installation


In [8]:
!pip install langchain langchain_experimental langchain-anthropic -q
!pip install supabase -q
!pip install pyodbc -q

## Helper functions

In [53]:
#
# Retrieve a file from a GitHub repository

import requests

def get_ChinookDB_ER_from_github(file_url):
  """Return the text of the file at file_url, raising on HTTP errors."""
  # Send a GET request to the file URL; the timeout avoids hanging forever
  response = requests.get(file_url, timeout=30)

  # Raise requests.HTTPError on a 4xx/5xx status instead of returning the
  # error text, which would otherwise end up inside the system prompt
  response.raise_for_status()

  # Return the content of the file as text
  return response.text

## Get API Keys and setup

In [29]:
from langchain import PromptTemplate, LLMChain, ConversationChain
from langchain_core.messages import SystemMessage
from langchain_core.prompts import ChatPromptTemplate
from langchain.agents import create_sql_agent
from langchain.agents.agent_types import AgentType
from langchain.sql_database import SQLDatabase

from langchain_experimental.sql import SQLDatabaseChain
from langchain_community.agent_toolkits import SQLDatabaseToolkit

from langchain_anthropic import ChatAnthropic

from supabase import create_client, Client

from sqlalchemy import create_engine

import requests
import json
import pandas as pd
import pyodbc

from google.colab import userdata

# setup supabase
supabase_url = userdata.get('SUPABASE_URL')
supabase_key = userdata.get('SUPABASE_KEY')
supabase_client = create_client(supabase_url, supabase_key)

# Anthropic (Claude) API key
claude_api_key = userdata.get('Claude_api_key')

In [10]:
# Test connection to Supabase using Supabase client

# There are 59 records. Displaying the first 3

response = supabase_client.table('Customer').select("*").execute()

# The response object has a data attribute that contains the result:
# response.data
response_df = pd.DataFrame(response.data)
response_df.head(3)


| | CustomerId | FirstName | LastName | Company | Address | City | State | Country | PostalCode | Phone | Fax | Email | SupportRepId |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | Luís | Gonçalves | Embraer - Empresa Brasileira de Aeronáutica S.A. | Av. Brigadeiro Faria Lima, 2170 | São José dos Campos | SP | Brazil | 12227-000 | +55 (12) 3923-5555 | +55 (12) 3923-5566 | luisg@embraer.com.br | 3 |
| 1 | 2 | Leonie | Köhler | | Theodor-Heuss-Straße 34 | Stuttgart | | Germany | 70174 | +49 0711 2842222 | | leonekohler@surfeu.de | 5 |
| 2 | 3 | François | Tremblay | | 1498 rue Bélanger | Montréal | QC | Canada | H2G 1A7 | +1 (514) 721-4711 | | ftremblay@gmail.com | 3 |


In [65]:
# Connecting to Supabase using SQLAlchemy

# Make sure the connection string is a proper string within quotes
supabase_conn_str = userdata.get('supabase_conn_str')
#supabase_conn_str = "postgresql+psycopg2://[USERID:PASSWORD]@[HOSTNAME:PORT_NUMBER]/[DATABASE_NAME]"


# Use the connection string with SQLAlchemy's create_engine function
db_engine = create_engine(supabase_conn_str)
db = SQLDatabase(db_engine)

# test the connection
print(db.dialect)
print(db.get_usable_table_names())


postgresql
['Album', 'Artist', 'Customer', 'Employee', 'Genre', 'Invoice', 'InvoiceLine', 'MediaType', 'Playlist', 'PlaylistTrack', 'Track', 'countries']
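
The advice above about the connection string being "a proper string" matters most when the password contains characters such as `@`, `/` or `#`, which need percent-encoding inside a URL. Here is a minimal sketch using only the standard library. The helper `build_postgres_url` and every value passed to it are hypothetical placeholders, not part of the original notebook.

```python
from urllib.parse import quote_plus

def build_postgres_url(user, password, host, port, database):
    """Assemble a SQLAlchemy URL for the psycopg2 driver.

    User and password are percent-encoded so that characters like '@'
    or '/' are not mistaken for URL delimiters.
    """
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )

# Placeholder credentials for illustration only
url = build_postgres_url("postgres", "p@ss/word#1", "db.example.com", 5432, "postgres")
print(url)
```

A string built this way could be stored as the `supabase_conn_str` secret and passed to `create_engine` exactly as in the cell above. SQLAlchemy's own `URL.create` is another option if you prefer not to format the string by hand.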


In [60]:


chinookDB_ER_from_github = get_ChinookDB_ER_from_github("https://raw.githubusercontent.com/cooolbabu/GoogleGemini101/main/OpenAI/ChinookDB_ER.txt")
# Markdown(chinookDB_ER_from_github) ## uncomment to view the file from GitHub (needs: from IPython.display import Markdown)

## Build the prompt
dba_instructions="""
1. You are a Data Architect for ChinookDB.
2. An entity-relationship diagram for ChinookDB, in the form of a Mermaid.js script, is provided to you
3. You will use this ER diagram to deliver SQL queries.
4. Database dialect is PostgreSQL
5. Use quotes for database object names in SQL statements

"""

dba_system_message = SystemMessage(content=dba_instructions + chinookDB_ER_from_github)

prompt = ChatPromptTemplate.from_messages([
    dba_system_message,
    ("user", "{input}")
])


## Let's run some chains

In [69]:

llm = ChatAnthropic(model="claude-3-sonnet-20240229", temperature=0, max_tokens=1024, anthropic_api_key=claude_api_key)
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)


In [71]:
db_chain.invoke("How many employees are there?")



> Entering new SQLDatabaseChain chain...
How many employees are there?
SQLQuery:
SELECT COUNT(*) AS "Number of Employees"
FROM "Employee";
SQLResult: [(8,)]
Answer: There are 8 employees.
> Finished chain.


{'query': 'How many employees are there?', 'result': 'There are 8 employees.'}

In [72]:
sales_by_genre = db_chain.invoke("Give me total sales by genre for all genres")



> Entering new SQLDatabaseChain chain...
Give me total sales by genre for all genres
SQLQuery:
SELECT g."Name" AS "Genre", SUM(il."UnitPrice" * il."Quantity") AS "Total Sales"
FROM "InvoiceLine" il
JOIN "Track" t ON il."TrackId" = t."TrackId"
JOIN "Genre" g ON t."GenreId" = g."GenreId"
GROUP BY g."Name"
ORDER BY "Total Sales" DESC
LIMIT 5;
SQLResult: [('Rock', Decimal('826.65')), ('Latin', Decimal('382.14')), ('Metal', Decimal('261.36')), ('Alternative & Punk', Decimal('241.56')), ('TV Shows', Decimal('93.53'))]
Answer: The total sales by genre, ordered from highest to lowest, are:

1. Rock: $826.65
2. Latin: $382.14
3. Metal: $261.36
4. Alternative & Punk: $241.56
5. TV Shows: $93.53
> Finished chain.
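
Note that the generated query ends with `LIMIT 5` even though the question asked for all genres. This is probably the chain's default prompt at work: as far as I know, `SQLDatabaseChain` asks the model to cap results at `top_k` rows (5 by default) unless the question names a number. The sketch below only shows the effect of that clause. It runs the same query shape against an in-memory SQLite database instead of PostgreSQL, and the rows are made up for illustration.

```python
import sqlite3

# Toy stand-in for ChinookDB: table and column names follow the notebook,
# but the rows are invented and the engine is SQLite, not PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Genre" ("GenreId" INTEGER PRIMARY KEY, "Name" TEXT);
CREATE TABLE "Track" ("TrackId" INTEGER PRIMARY KEY, "GenreId" INTEGER);
CREATE TABLE "InvoiceLine" ("TrackId" INTEGER, "UnitPrice" REAL, "Quantity" INTEGER);
""")

genres = ["Rock", "Latin", "Metal", "Jazz", "Blues", "Pop", "Classical"]
for genre_id, name in enumerate(genres, start=1):
    conn.execute('INSERT INTO "Genre" VALUES (?, ?)', (genre_id, name))
    conn.execute('INSERT INTO "Track" VALUES (?, ?)', (genre_id, genre_id))
    # One invoice line per genre; the quantity varies so the totals differ.
    conn.execute('INSERT INTO "InvoiceLine" VALUES (?, ?, ?)', (genre_id, 0.99, 8 - genre_id))

base_query = """
SELECT g."Name" AS "Genre", SUM(il."UnitPrice" * il."Quantity") AS "Total Sales"
FROM "InvoiceLine" il
JOIN "Track" t ON il."TrackId" = t."TrackId"
JOIN "Genre" g ON t."GenreId" = g."GenreId"
GROUP BY g."Name"
ORDER BY "Total Sales" DESC
"""

limited_rows = conn.execute(base_query + " LIMIT 5").fetchall()
all_rows = conn.execute(base_query).fetchall()

print(len(limited_rows), "rows with LIMIT 5")
print(len(all_rows), "rows without LIMIT")
```

If you really want every genre, it may help to say so explicitly in the question (for example "do not limit the results") and then check the generated SQL in the verbose output.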


In [75]:
# And yes, the table has sales going into 2025. That is not an error.
# Run this answer through text-to-speech and we don't need BI dashboards.
sales_by_quarter = db_chain.invoke("Give me total sales for each year by quarter")




> Entering new SQLDatabaseChain chain...
Give me total sales for each year by quarter
SQLQuery:
SELECT
    EXTRACT(YEAR FROM "InvoiceDate") AS "Year",
    EXTRACT(QUARTER FROM "InvoiceDate") AS "Quarter",
    SUM("Total") AS "Total Sales"
FROM
    "Invoice"
GROUP BY
    EXTRACT(YEAR FROM "InvoiceDate"),
    EXTRACT(QUARTER FROM "InvoiceDate")
ORDER BY
    "Year", "Quarter";
SQLResult: [(Decimal('2021'), Decimal('1'), Decimal('110.88')), (Decimal('2021'), Decimal('2'), Decimal('112.86')), (Decimal('2021'), Decimal('3'), Decimal('112.86')), (Decimal('2021'), Decimal('4'), Decimal('112.86')), (Decimal('2022'), Decimal('1'), Decimal('143.86')), (Decimal('2022'), Decimal('2'), Decimal('112.86')), (Decimal('2022'), Decimal('3'), Decimal('111.87')), (Decimal('2022'), Decimal('4'), Decimal('112.86')), (Decimal('2023'), Decimal('1'), Decimal('112.86')), (Decimal('2023'), Decimal('2'), Decimal('144.86')), (Decimal('2023'), Decimal('3'), Decimal('1

## Oops!!! Out of tokens :(
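
The model was created with `max_tokens=1024`, so an answer that restates every year and quarter can plausibly run past that budget. One way around it is to let the model write the SQL and format the rows yourself. Below is a minimal sketch of that idea, using only the complete rows visible in the SQLResult above. The helper name `pivot_by_year` is hypothetical and not part of the original notebook.

```python
from collections import defaultdict
from decimal import Decimal

# The complete (year, quarter, total sales) rows visible in the SQLResult above.
rows = [
    (Decimal("2021"), Decimal("1"), Decimal("110.88")),
    (Decimal("2021"), Decimal("2"), Decimal("112.86")),
    (Decimal("2021"), Decimal("3"), Decimal("112.86")),
    (Decimal("2021"), Decimal("4"), Decimal("112.86")),
    (Decimal("2022"), Decimal("1"), Decimal("143.86")),
    (Decimal("2022"), Decimal("2"), Decimal("112.86")),
    (Decimal("2022"), Decimal("3"), Decimal("111.87")),
    (Decimal("2022"), Decimal("4"), Decimal("112.86")),
    (Decimal("2023"), Decimal("1"), Decimal("112.86")),
    (Decimal("2023"), Decimal("2"), Decimal("144.86")),
]

def pivot_by_year(result_rows):
    """Group (year, quarter, total) rows into {year: {quarter: total}}."""
    table = defaultdict(dict)
    for year, quarter, total in result_rows:
        table[int(year)][int(quarter)] = total
    return dict(table)

sales = pivot_by_year(rows)

# Print one compact line per year instead of asking the LLM to restate the rows.
for year, quarters in sorted(sales.items()):
    cells = "  ".join(f"Q{q}: {quarters[q]}" for q in sorted(quarters))
    print(year, cells)
```

Raising `max_tokens` on the `ChatAnthropic` model is the other obvious knob, at the cost of longer and more expensive responses.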

In [76]:
db_chain.run("Give me total sales by quarter for each genre")



> Entering new SQLDatabaseChain chain...
Give me total sales by quarter for each genre
SQLQuery:
SELECT g."Name" AS "Genre", DATE_PART('quarter', i."InvoiceDate") AS "Quarter", SUM(il."UnitPrice" * il."Quantity") AS "Total Sales"
FROM "Genre" g
JOIN "Track" t ON g."GenreId" = t."GenreId"
JOIN "InvoiceLine" il ON t."TrackId" = il."TrackId"
JOIN "Invoice" i ON il."InvoiceId" = i."InvoiceId"
GROUP BY g."Name", DATE_PART('quarter', i."InvoiceDate")
ORDER BY "Genre", "Quarter";
SQLResult: [('Alternative', 1.0, Decimal('5.94')), ('Alternative', 2.0, Decimal('3.96')), ('Alternative', 4.0, Decimal('3.96')), ('Alternative & Punk', 1.0, Decimal('60.39')), ('Alternative & Punk', 2.0, Decimal('60.39')), ('Alternative & Punk', 3.0, Decimal('45.54')), ('Alternative & Punk', 4.0, Decimal('75.24')), ('Blues', 1.0, Decimal('15.84')), ('Blues', 2.0, Decimal('10.89')), ('Blues', 3.0, Decimal('14.85')), ('Blues', 4.0, Decimal('18.81')), ('Bossa Nova', 1.0, 

'The query groups the sales data by genre and quarter, and calculates the total sales for each combination. The results show the total sales amount for each genre, broken down by quarter.\n\nSome key observations from the results:\n\n- Rock genre had the highest sales across all quarters, ranging from around $200 to $220 per quarter.\n- Latin genre also had consistently high sales, around $85 to $110 per quarter.\n- Metal genre sales were around $55 to $80 per quarter.\n- Genres like Alternative & Punk, Blues, Jazz also had decent sales numbers across quarters.\n- Genres like Comedy, Easy Listening, Sci Fi & Fantasy, Science Fiction had relatively lower sales.\n\nThe breakdown by quarter allows analyzing sales trends over time for each genre. This type of analysis can help identify popular genres, plan inventory, and make data-driven decisions for the music business.'

# So what do you think? What is the toughest query you can come up with? Post it on the Applied AI Slack channel.