Create a test SQL database from the ocean dataset (`ocean_2.csv`).

https://python.langchain.com/docs/use_cases/sql/csv/

In [1]:
import pandas as pd
from pyprojroot import here

In [2]:
df = pd.read_csv(here("data/for_upload/ocean_2.csv"))
print(df.shape)
print(df.columns.tolist())
display(df.head(3))

(19, 9)
['Location', 'Depth', 'Temperature', 'Salinity', 'Pressure', 'Dissolved Oxygen', 'Sea Level', 'Tsunami Risk Level', 'Conductivity']


Unnamed: 0,Location,Depth,Temperature,Salinity,Pressure,Dissolved Oxygen,Sea Level,Tsunami Risk Level,Conductivity
0,Paradip Coast,2628.04,15.98,31.67,359135.72,7.49,1.085,Medium,32.03698
1,Haldia Coast,6655.49,1.55,31.68,754228.57,5.2,1.316,Low,54.468067
2,Daman Coast,5126.64,10.1,30.37,604248.38,2.95,1.387,Low,51.024729


### **SQL**

Using SQL to interact with CSV data is the recommended approach because it is easier to limit permissions and sanitize queries than with arbitrary Python.
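As a rough illustration of the "limit permissions" point, the sketch below opens a SQLite file read-only, so any write issued through that connection is rejected by the database itself. It uses only the standard-library `sqlite3` module; the file name `readonly_demo.db` and the table `readings` are hypothetical and exist only for this example.

```python
import os
import sqlite3
import tempfile

# Hypothetical throwaway database file, used only for this illustration.
db_file = os.path.join(tempfile.mkdtemp(), "readonly_demo.db")

# Create and populate the database with a normal read-write connection.
rw = sqlite3.connect(db_file)
rw.execute('CREATE TABLE readings ("Location" TEXT, "Depth" REAL)')
rw.execute("INSERT INTO readings VALUES ('Paradip Coast', 2628.04)")
rw.commit()
rw.close()

# Reopen the same file read-only through a SQLite URI (mode=ro).
ro = sqlite3.connect(f"file:{db_file}?mode=ro", uri=True)
rows = ro.execute("SELECT * FROM readings").fetchall()

# Reads succeed, but any write through this connection raises an error.
try:
    ro.execute("DELETE FROM readings")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True
finally:
    ro.close()

print(rows, write_blocked)
```

A read-only connection is only one possible safeguard; it does not replace reviewing what an LLM-generated query actually reads.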

Most SQL databases make it easy to load a CSV file in as a table (DuckDB, SQLite, etc.). Once you’ve done this you can use all of the chain and agent-creating techniques outlined in the SQL use case guide. Here’s a quick example of how we might do this with SQLite:

In [3]:
from langchain_community.utilities import SQLDatabase
from sqlalchemy import create_engine
db_path = str(here("data")) + "/test_sqldb.db"
db_path = f"sqlite:///{db_path}"

engine = create_engine(db_path)
# df.to_sql("titanic", engine, index=False)
df.to_sql("test_sqldb2", engine, index=False)

19

For multiple CSV files, we can create a SQL database with one table per file:
```
df1.to_sql("csv1_name", engine, index=False)
df2.to_sql("csv2_name", engine, index=False)
```
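A self-contained sketch of the same idea, assuming two small made-up CSVs held in memory instead of files on disk. The table names `locations` and `depth_readings` and the `Region` column are hypothetical. Here `to_sql` is given a plain `sqlite3` connection, which pandas also accepts, so no SQLAlchemy engine is needed for the demo.

```python
import io
import sqlite3

import pandas as pd

# Two tiny in-memory CSV "files" (made-up rows) standing in for real files.
csv_files = {
    "locations": "Location,Region\nParadip Coast,East\nDaman Coast,West\n",
    "depth_readings": "Location,Depth\nParadip Coast,2628.04\nDaman Coast,5126.64\n",
}

conn = sqlite3.connect(":memory:")

# One table per CSV; the dict key becomes the table name.
for table_name, csv_text in csv_files.items():
    frame = pd.read_csv(io.StringIO(csv_text))
    frame.to_sql(table_name, conn, index=False, if_exists="replace")

# List the tables that now exist in the database.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)

# With both tables in one database, they can be joined in a single query.
joined = conn.execute(
    'SELECT l."Region", d."Depth" FROM locations AS l '
    'JOIN depth_readings AS d ON l."Location" = d."Location" '
    'ORDER BY d."Depth"'
).fetchall()
print(joined)
```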

In [4]:
db = SQLDatabase(engine=engine)
print(db.dialect)
print(db.get_usable_table_names())
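# Double quotes mark "Tsunami Risk Level" as a column name. With single quotes it is a
# string literal, the comparison is never true, and db.run returns an empty string ('').
db.run("""SELECT * FROM test_sqldb2 WHERE "Tsunami Risk Level" = 'High';""")

sqlite
['test_sqldb2']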
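Column names that contain spaces need identifier quoting. In SQLite, single quotes delimit string literals and double quotes delimit identifiers, so a filter written with single quotes around the column name compares two constant strings and matches nothing. The sketch below shows the difference on a small hypothetical table `readings`, using only the standard-library `sqlite3` module and three rows taken from the dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE readings ("Location" TEXT, "Tsunami Risk Level" TEXT)')
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("Paradip Coast", "Medium"), ("Kochi Coast", "High"), ("Goa Coast", "High")],
)

# Single quotes: compares the literal 'Tsunami Risk Level' with 'High', never true.
literal_rows = conn.execute(
    "SELECT * FROM readings WHERE 'Tsunami Risk Level' = 'High'"
).fetchall()

# Double quotes: refers to the column, so the filter works as intended.
column_rows = conn.execute(
    """SELECT * FROM readings WHERE "Tsunami Risk Level" = 'High'"""
).fetchall()

print(len(literal_rows), len(column_rows))
```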

**Equivalent in Pandas**

In [5]:
df[df["Tsunami Risk Level"] == 'High']

Unnamed: 0,Location,Depth,Temperature,Salinity,Pressure,Dissolved Oxygen,Sea Level,Tsunami Risk Level,Conductivity
4,Kochi Coast,1100.57,1.9,36.64,209290.92,6.76,0.543,High,34.865852
5,Visakhapatnam Coast,1100.4,29.7,34.75,209274.24,3.9,1.455,High,36.001406
6,Goa Coast,416.0,9.67,38.44,142134.6,7.14,1.015,High,52.468499
14,Haldia Coast,1280.96,10.47,35.31,226987.18,5.93,0.63,High,30.099893
15,Haldia Coast,1292.0,27.89,39.37,228070.2,2.91,1.043,High,51.223348


### **Create an agent to interact with the Database**

In [6]:
import os
from dotenv import load_dotenv
import warnings
warnings.filterwarnings("ignore")
print("Environment variables are loaded:", load_dotenv())
# Confirm the key is present without printing the secret itself.
print("GOOGLE_API_KEY is set:", os.getenv("GOOGLE_API_KEY") is not None)


Environment variables are loaded: True
GOOGLE_API_KEY is set: True


In [7]:
from langchain_google_genai import ChatGoogleGenerativeAI

google_api_key = os.environ["GOOGLE_API_KEY"]
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",  
    google_api_key=google_api_key,
    temperature=0.0
)

In [8]:
from langchain_community.agent_toolkits import create_sql_agent
agent_executor = create_sql_agent(llm, db=db, agent_type="tool-calling", verbose=True)

In [9]:
agent_executor.invoke({"input": "what is the average value of depth"})



> Entering new SQL Agent Executor chain...

Invoking: `sql_db_list_tables` with `{}`


test_sqldb2
Invoking: `sql_db_schema` with `{'table_names': 'test_sqldb2'}`



CREATE TABLE test_sqldb2 (
	"Location" TEXT, 
	"Depth" FLOAT, 
	"Temperature" FLOAT, 
	"Salinity" FLOAT, 
	"Pressure" FLOAT, 
	"Dissolved Oxygen" FLOAT, 
	"Sea Level" FLOAT, 
	"Tsunami Risk Level" TEXT, 
	"Conductivity" FLOAT
)

/*
3 rows from test_sqldb2 table:
Location	Depth	Temperature	Salinity	Pressure	Dissolved Oxygen	Sea Level	Tsunami Risk Level	Conductivity
Paradip Coast	2628.04	15.98	31.67	359135.72	7.49	1.085	Medium	32.03698004
Haldia Coast	6655.49	1.55	31.68	754228.57	5.2	1.316	Low	54.46806663
Daman Coast	5126.64	10.1	30.37	604248.38	2.95	1.387	Low	51.02472877
*/
Invoking: `sql_db_query` with `{'query': 'SELECT AVG(Depth) FROM test_sqldb2'}`


[(3270.583684210527,)]
The average depth is 3270.58.

{'input': 'what is the average value of depth',
 'output': 'The average depth is 3270.58.'}
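An agent's numeric answer can be cross-checked by running the same aggregate directly. The sketch below is a minimal illustration of that check, not part of the agent API. It loads only the three `Depth` values visible in the data preview into a hypothetical table `sample`, so its average differs from the full-table figure of 3270.58 reported above.

```python
import sqlite3
from statistics import mean

# First three Depth values from the data preview (a subset, not all 19 rows).
depths = [2628.04, 6655.49, 5126.64]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE sample ("Depth" REAL)')
conn.executemany("INSERT INTO sample VALUES (?)", [(d,) for d in depths])

# The same kind of aggregate the agent issued, run directly against the table.
sql_avg = conn.execute('SELECT AVG("Depth") FROM sample').fetchone()[0]

# Independent computation in plain Python for comparison.
py_avg = mean(depths)

agreement = abs(sql_avg - py_avg) < 1e-9
print(agreement)
```

On the real table, the equivalent check would compare the agent's output with `df["Depth"].mean()`.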