# Use LangChain to interact with a SQL database

The following code demonstrates the Databricks SQL Agent. With this agent, any Databricks user can interact with a specified schema in Databricks Unity Catalog and generate insights on their data.

## Requirements

- An OpenAI API token, which you provide in the notebook.
- Databricks Runtime 13.3 ML or above.

### Imports

Databricks recommends the latest versions of `langchain` and `databricks-sql-connector`.

In [0]:
%pip install --upgrade langchain databricks-sql-connector sqlalchemy langchain-openai

Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.


In [0]:
dbutils.library.restartPython()

In [0]:
import os
os.environ["OPENAI_API_KEY"] = ""
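The cell above leaves the key empty for you to fill in. As a minimal sketch, a small check like the following could fail fast with a clear message when the key is still blank, rather than failing later inside the agent. The helper name `require_env` is hypothetical and is not part of LangChain or Databricks.

```python
import os


def require_env(name: str, env=None) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; provide a value before creating the LLM.")
    return value
```

Calling `require_env("OPENAI_API_KEY")` before constructing the LLM would surface a missing token immediately.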

### SQL Database Agent

This example shows how to interact with a specific schema in Unity Catalog. Note that the agent cannot create or delete tables; it can only query them.
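The notebook does not show how the query-only restriction is enforced. If an extra client-side check is wanted, one possible approach is to inspect a statement's leading keyword before running it. The sketch below is an assumption for illustration: `is_read_only` and `READ_ONLY_KEYWORDS` are hypothetical names, not LangChain or Databricks APIs. The check is deliberately naive and is no substitute for granting the agent's credentials read-only permissions.

```python
# Leading keywords treated as read-only in this sketch (an assumption, not a LangChain rule).
READ_ONLY_KEYWORDS = ("select", "show", "describe", "explain")


def is_read_only(sql: str) -> bool:
    """Return True if `sql` is a single statement that starts with a read-only keyword."""
    statement = sql.strip().rstrip(";").strip()
    # Reject empty input and anything that looks like multiple statements.
    # A ';' inside a string literal is also rejected; this errs on the side of caution.
    if not statement or ";" in statement:
        return False
    return statement.split()[0].lower() in READ_ONLY_KEYWORDS


print(is_read_only("SELECT MAX(trip_distance) FROM trips;"))
print(is_read_only("DROP TABLE trips"))
```

The first call prints `True` and the second prints `False`.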

The database instance is created with:
```
db = SQLDatabase.from_databricks(catalog="...", schema="...")
```
The agent and its required tools are created with:
```
toolkit = SQLDatabaseToolkit(db=db, llm=llm)
agent = create_sql_agent(llm=llm, toolkit=toolkit, **kwargs)
```

In [0]:
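from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.sql_database import SQLDatabase
from langchain_openai import OpenAI

# Connect to the `nyctaxi` schema in the `samples` catalog.
db = SQLDatabase.from_databricks(catalog="samples", schema="nyctaxi")
# The LLM that plans and writes the SQL queries.
llm = OpenAI(temperature=0.7)
# The toolkit exposes the database tools that the agent calls.
toolkit = SQLDatabaseToolkit(db=db, llm=llm)
# verbose=True prints each intermediate thought, action, and observation.
agent = create_sql_agent(llm=llm, toolkit=toolkit, verbose=True)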

In [0]:
agent.invoke("What is the longest trip distance and how long did it take?")



> Entering new SQL Agent Executor chain...
This information is likely stored in a table with trip distances and durations.
Action: sql_db_query
Action Input: SELECT MAX(trip_distance), MAX(trip_duration) FROM trips;
Error: (databricks.sql.exc.ServerOperationError) [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `trip_duration` cannot be resolved. Did you mean one of the following? [`trip_distance`, `pickup_zip`, `dropoff_zip`, `fare_amount`, `tpep_pickup_datetime`]. SQLSTATE: 42703; line 1 pos 31
[SQL: SELECT MAX(trip_distance), MAX(trip_duration) FROM trips;]
(Background on this error at: https://sqlalche.me/e/20/4xp6)
It seems like the column `trip_duration` does not exist in the `trips` table. I should use the `sql_db_schema` tool to check the correct table fields.
Action: sql_db_schema
Action Input: trips
CREATE TABLE trips (
	tpep_pickup_datetime TIMESTAMP, 
	tpep_dropo

{'input': 'What is the longest trip distance and how long did it take?',
 'output': 'The longest trip distance is 30.6 miles and it took $275.'}
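Note that the final answer reports a dollar amount where the question asked for a duration, so agent output is worth checking against a hand-written query. The sketch below illustrates one way to compute the duration of the longest trip. It runs against an in-memory SQLite table, not Databricks, and the rows and the `dropoff_ts` column name are made up for illustration. SQLite's `julianday` function is used for the timestamp difference, and Databricks SQL would need its own date functions.

```python
import sqlite3

# In-memory stand-in for a trips table. The rows and the `dropoff_ts`
# column name are hypothetical and exist only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (trip_distance REAL, fare_amount REAL, "
    "tpep_pickup_datetime TEXT, dropoff_ts TEXT)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?)",
    [
        (2.5, 10.0, "2016-01-01 10:00:00", "2016-01-01 10:12:00"),
        (12.0, 40.0, "2016-01-01 11:00:00", "2016-01-01 11:45:00"),
        (5.0, 18.5, "2016-01-01 12:00:00", "2016-01-01 12:20:00"),
    ],
)


def longest_trip(connection):
    """Return (distance, duration in whole minutes) for the row with the largest trip_distance."""
    row = connection.execute(
        "SELECT trip_distance, "
        "(julianday(dropoff_ts) - julianday(tpep_pickup_datetime)) * 24 * 60 "
        "FROM trips ORDER BY trip_distance DESC LIMIT 1"
    ).fetchone()
    return row[0], round(row[1])


distance, minutes = longest_trip(conn)
print(f"Longest trip: {distance} miles, {minutes} minutes")
```

The query selects the duration from the same row as the maximum distance, rather than taking two independent `MAX` aggregates as in the agent's first attempt.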