# Use LangChain to interact with a SQL database

The following code demonstrates the Databricks SQL Agent. With this agent, any Databricks user can interact with a specified schema in Databricks Unity Catalog and generate insights from their data.

## Requirements

- An OpenAI API key, which you provide in the notebook.
- Databricks Runtime 13.3 ML or above.

### Imports

Databricks recommends the latest versions of `langchain` and `databricks-sql-connector`.

In [0]:
%pip install --upgrade langchain databricks-sql-connector sqlalchemy langchain-openai

Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.


In [0]:
dbutils.library.restartPython()

In [0]:
import os
os.environ["OPENAI_API_KEY"] = ""
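An empty key will not authenticate, so the cell above only works once a real value is filled in. As a minimal sketch, a small check can fail early with a clear message before the agent is built. The helper name `require_env` is hypothetical and not part of LangChain or Databricks.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; provide it before creating the agent.")
    return value


# Example use in a notebook cell (commented out so this block runs anywhere):
# api_key = require_env("OPENAI_API_KEY")
#
# On Databricks, one option is to load the key from a secret scope instead of
# typing it into the notebook. The scope and key names below are placeholders:
# os.environ["OPENAI_API_KEY"] = dbutils.secrets.get(scope="<scope>", key="<key>")
```

Reading the key from a secret scope keeps it out of the notebook source and its revision history.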

### SQL Database Agent

This example shows how to interact with a specific schema in Unity Catalog. Note that the agent cannot create or delete tables. It can only query them.

The database instance is created with:
```
db = SQLDatabase.from_databricks(catalog="...", schema="...")
```
The agent and its required tools are created with:
```
toolkit = SQLDatabaseToolkit(db=db, llm=llm)
agent = create_sql_agent(llm=llm, toolkit=toolkit, **kwargs)
```

In [0]:
from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.sql_database import SQLDatabase
from langchain_openai import OpenAI

db = SQLDatabase.from_databricks(catalog="samples", schema="nyctaxi")
llm = OpenAI(temperature=.7)
toolkit = SQLDatabaseToolkit(db=db, llm=llm)
agent = create_sql_agent(llm=llm, toolkit=toolkit, verbose=True)

In [0]:
agent.invoke("What is the longest trip distance and how long did it take?")

> Entering new SQL Agent Executor chain...
 This question requires data from two different tables, so I will need to use a join statement in my SQL query.
Action: sql_db_query
Action Input: SELECT MAX(trip_distance) AS longest_trip_distance, trip_duration AS trip_duration FROM trips JOIN durations ON trips.trip_id = durations.trip_id
Error: (databricks.sql.exc.ServerOperationError) [TABLE_OR_VIEW_NOT_FOUND] The table or view `samples`.`nyctaxi`.`durations` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS. SQLSTATE: 42P01
[SQL: SELECT MAX(trip_distance) AS longest_trip_distance, trip_duration AS trip_duration FROM trips JOIN durations ON trips.trip_id = durations.trip_id]
(Background on this error at: ht

{'input': 'What is the longest trip distance and how long did it take?',
 'output': 'Agent stopped due to iteration limit or time limit.'}
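
In the trace above, the agent assumed a `durations` table that does not exist in the schema and then ran out of iterations. For comparison, the sketch below shows the shape of a single-table query that answers the same question. It runs against an in-memory SQLite table with made-up rows, not against Databricks. The column names `tpep_pickup_datetime`, `tpep_dropoff_datetime`, and `trip_distance` are an assumption about the `trips` table, and the duration arithmetic uses SQLite's `julianday`, which would need a different date function in Databricks SQL.

```python
import sqlite3

# In-memory stand-in for the trips table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE trips (
        tpep_pickup_datetime TEXT,
        tpep_dropoff_datetime TEXT,
        trip_distance REAL
    )
    """
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        ("2016-02-14 16:52:13", "2016-02-14 17:16:04", 4.94),
        ("2016-02-04 18:44:19", "2016-02-04 18:46:00", 0.28),
        ("2016-02-17 17:13:57", "2016-02-17 17:43:57", 12.5),
    ],
)

# Pick the row with the largest distance and derive its duration from the
# pickup and dropoff timestamps, so no second table or join is needed.
query = """
    SELECT trip_distance,
           ROUND(
               (julianday(tpep_dropoff_datetime) - julianday(tpep_pickup_datetime))
               * 24 * 60,
               1
           ) AS duration_minutes
    FROM trips
    ORDER BY trip_distance DESC
    LIMIT 1
"""
longest_distance, duration_minutes = conn.execute(query).fetchone()
print(longest_distance, duration_minutes)
```

Ordering by distance and taking one row keeps the distance and the duration tied to the same trip, which an aggregate such as `MAX(trip_distance)` next to a non-aggregated column does not guarantee.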