# Create a LangChain NL2SQL Agent using Azure OpenAI and Azure SQL Database
This notebook goes through the process of creating a LangChain SQL Agent using Azure OpenAI as the LLM against an Azure SQL Database.

## Install the required python libraries
Start by installing the required libraries. Run the following at the terminal in the project folder so it references the project's requirements.txt:

```bash
pip install -r requirements.txt
```


## ODBC Driver for MS SQL Install

Use the **odbcDriverInstallUbuntu.txt** script to install the Microsoft ODBC Driver for MS SQL (version 18).

If you are not using GitHub Codespaces or Ubuntu, you can find the correct script to install the driver for Linux [here](https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server), for Windows [here](https://learn.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server), and for macOS [here](https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/install-microsoft-odbc-driver-sql-server-macos).

## Create the table in the database
(all SQL commands are in the database.sql script)

In the database that will be used for this notebook, run the following:

(Create table permission and access to the dbo schema are needed. It's best to keep roles and permissions to a minimum when working with NL2SQL.)

```SQL
create table [dbo].[langtable] (id int Identity, username nvarchar(100))
GO

insert into [dbo].[langtable] (username) values('sammy')
insert into [dbo].[langtable] (username) values('mary')
insert into [dbo].[langtable] (username) values('jane')
insert into [dbo].[langtable] (username) values('fred')
insert into [dbo].[langtable] (username) values('billy')
insert into [dbo].[langtable] (username) values('jonny')
insert into [dbo].[langtable] (username) values('kenny')
insert into [dbo].[langtable] (username) values('dan')
insert into [dbo].[langtable] (username) values('frank')
insert into [dbo].[langtable] (username) values('jenny')
GO

select * from [dbo].[langtable]
GO
```
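
As a complement to the advice above about keeping permissions minimal, the following is a minimal sketch of a client-side check that rejects anything that does not look like a read-only query. The function name `looks_read_only` and the keyword list are assumptions made for illustration; they are not part of LangChain or of this project. It is a conservative heuristic and not a substitute for restricting the database login itself.

```python
import re

# Hypothetical helper: a coarse, conservative filter for generated SQL.
# A statement must start with SELECT or WITH ...
_READ_ONLY_START = re.compile(r"^\s*(select|with)\b", re.IGNORECASE)
# ... and must not contain any keyword that writes data or changes schema.
_WRITE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|merge|drop|alter|create|truncate|"
    r"exec|execute|grant|revoke)\b",
    re.IGNORECASE,
)


def looks_read_only(sql: str) -> bool:
    """Return True only if the statement looks like a plain read query.

    The check is keyword based, so it can reject harmless queries
    (for example a SELECT whose string literal contains 'drop').
    """
    return bool(_READ_ONLY_START.match(sql)) and not _WRITE_KEYWORDS.search(sql)
```

For example, `looks_read_only("select * from [dbo].[langtable]")` returns `True`, while the `insert` and `create table` statements from the script above return `False`.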


## .env file
Fill out the .env file with your server and key values. For this notebook, you need to add your values to the **AZURE_OPENAI_API_KEY**, **AZURE_OPENAI_ENDPOINT** and **py-connectionString** variables.

```BASH
AZURE_OPENAI_API_KEY="" 
AZURE_OPENAI_ENDPOINT="" 
OPENAI_API_KEY="" 
py-connectionString="mssql+pyodbc://USERNAME:PASSWORD@SERVER_NAME.database.windows.net/DATABASE_NAME?driver=ODBC+Driver+18+for+SQL+Server"
```
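
If the user name or password contains characters such as `@` or `/`, they need to be percent-encoded before they are placed in the connection URL. The sketch below shows one way to build the string; the function name `build_connection_string` and its parameters are hypothetical, and the URL layout simply mirrors the template above.

```python
from urllib.parse import quote, quote_plus


def build_connection_string(username, password, server, database,
                            driver="ODBC Driver 18 for SQL Server"):
    """Build an mssql+pyodbc URL in the same shape as the .env template."""
    # Percent-encode credentials so characters like '@' and '/' do not
    # break URL parsing.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    # The driver name is a query parameter, where spaces become '+'.
    drv = quote_plus(driver)
    return (
        f"mssql+pyodbc://{user}:{pwd}@{server}.database.windows.net/"
        f"{database}?driver={drv}"
    )
```

The returned value is what would go into the **py-connectionString** variable.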

## Notebook Kernel
Be sure to select a kernel for the python notebook by using the **Select Kernel** button in the upper right of the notebook.

## Starting the Example
The first section sets up the python environment and gets any environment variables that were set.

In [1]:

import os

import pyodbc  # not called directly; the mssql+pyodbc connection string needs it installed
from dotenv import load_dotenv
from langchain_community.agent_toolkits import SQLDatabaseToolkit, create_sql_agent
from langchain_community.utilities import SQLDatabase
from langchain_openai import AzureChatOpenAI

# load the values from the .env file into the process environment
load_dotenv()

True
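
`load_dotenv()` returning `True` does not guarantee that each value was actually filled in. A small check such as the following sketch can catch an empty setting early. The helper name `missing_env_vars` is an assumption for illustration; the variable names are the ones listed in the .env section.

```python
import os

# Variable names taken from the .env section of this notebook.
REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "py-connectionString",
)


def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the required names that are unset or blank, in order."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]
```

Calling `missing_env_vars()` after `load_dotenv()` returns an empty list when every required value is present.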

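Next, create the database connection and test it. The `include_tables` argument limits the agent to the listed tables; replace the names with tables from your own database (for example, `langtable` from the script above).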

In [2]:
# connect to the Azure SQL database

from sqlalchemy import create_engine

connectionString=os.environ["py-connectionString"]

db_engine = create_engine(connectionString)

db = SQLDatabase(db_engine, view_support=True, schema="dbo", include_tables=['rpt_mvd_metrics_score_estate', 'rpt_mvd_metrics_score_psm', 'rpt_mvd_metrics_score_region'])
# db = SQLDatabase(db_engine, view_support=True, schema="dbo", include_tables=['FFB_Production_block', 'metric_FFB_Production_block'])

# test the connection
print(db.dialect)
print(db.get_usable_table_names())
db.run("select convert(varchar(25), getdate(), 120)")


mssql
['rpt_mvd_metrics_score_estate', 'rpt_mvd_metrics_score_psm', 'rpt_mvd_metrics_score_region']


"[('2024-03-01 06:35:32',)]"

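Create a reference to Azure OpenAI as the LLM to be used with the SQL agent. Set `azure_deployment` to the name of your Azure OpenAI chat model deployment; this example uses a deployment named gpt-4. The API key and endpoint are read from the **AZURE_OPENAI_API_KEY** and **AZURE_OPENAI_ENDPOINT** environment variables loaded earlier.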

In [3]:
azurellm = AzureChatOpenAI(
    azure_deployment="gpt-4",
    model="gpt-4",
    api_version="2024-02-15-preview"
)

Run the following to create the SQL agent:

In [4]:
toolkit = SQLDatabaseToolkit(db=db, llm=azurellm)

agent_executor = create_sql_agent(
    llm=azurellm,
    toolkit=toolkit,
    verbose=True,
    agent_type="openai-tools",
)

print(agent_executor)

name='SQL Agent Executor' verbose=True agent=RunnableAgent(runnable=RunnableAssign(mapper={
  agent_scratchpad: RunnableLambda(lambda x: format_log_to_str(x['intermediate_steps']))
})
| PromptTemplate(input_variables=['agent_scratchpad', 'input'], partial_variables={'tools': "sql_db_query: Input to this tool is a detailed and correct SQL query, output is a result from the database. If the query is not correct, an error message will be returned. If an error is returned, rewrite the query, check the query, and try again. If you encounter an issue with Unknown column 'xxxx' in 'field list', use sql_db_schema to query the correct table fields.\nsql_db_schema: Input to this tool is a comma-separated list of tables, output is the schema and sample rows for those tables. Be sure that the tables actually exist by calling sql_db_list_tables first! Example Input: table1, table2, table3\nsql_db_list_tables: Input is an empty string, output is a comma separated list of tables in the database.\nsql

Now, test the agent by creating a prompt using natural language asking about a database object.

In [5]:
agent_executor.invoke("Give me FFB Production in January 2022 in BAME")

> Entering new SQL Agent Executor chain...
To answer this question, I need to know the structure of the relevant tables in the database that contain the data for FFB (Fresh Fruit Bunch) Production, particularly for BAME (which might be a location or a company acronym) in January 2022.

Action: sql_db_list_tables
Action Input:
rpt_mvd_metrics_score_estate, rpt_mvd_metrics_score_psm, rpt_mvd_metrics_score_region
Given the list of tables, I need to determine which table contains the FFB Production data for BAME. The table names suggest they are related to metrics scores for estates, psm, and regions. I will need to check the schema of these tables to find out which one contains the required data.

Action: sql_db_schema
Action Input: rpt_mvd_metrics_score_estate, rpt_mvd_metrics_score_psm, rpt_mvd_metrics_score_region
CREATE TABLE dbo.rpt_mvd_metrics_score_estate (
	year VARCHAR(50) COLLATE SQL_Latin1_General_CP1_

{'input': 'Give me FFB Production in January 2022 in BAME',
 'output': 'The FFB Production in January 2022 in BAME was 1799.65.'}

In [6]:
agent_executor.invoke("Give me FFB Production in January 2022 in PSM 3")

> Entering new SQL Agent Executor chain...
To answer this question, I will need to know the structure of the database to find out which table contains the FFB (Fresh Fruit Bunch) Production data and specifically filter for January 2022 in PSM 3. I will start by listing the tables to understand what tables are available.

Action: sql_db_list_tables
Action Input: (empty string)
rpt_mvd_metrics_score_estate, rpt_mvd_metrics_score_psm, rpt_mvd_metrics_score_region
The tables listed might contain the data I need, but it's not immediately clear which one contains the FFB Production data for PSM 3. I need to look at the schema of these tables to find out which one has the relevant data.

Action: sql_db_schema
Action Input: rpt_mvd_metrics_score_estate, rpt_mvd_metrics_score_psm, rpt_mvd_metrics_score_region
CREATE TABLE dbo.rpt_mvd_metrics_score_estate (
	year VARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, 
	m

{'input': 'Give me FFB Production in January 2022 in PSM 3',
 'output': 'For PSM 3 in January 2022, there are three distinct FFB production values recorded in the database: 14,440.74, 55,482.1, and 69,922.84. Additional context is required to determine the significance of each distinct value.'}

In [7]:
agent_executor.invoke("Give me FFB Production in January 2022 in KALSEL1")

> Entering new SQL Agent Executor chain...
Thought: To answer the question, I need to know the structure of the database to find out which table contains information about FFB (Fresh Fruit Bunch) Production in January 2022 in KALSEL1. I will first list all the available tables to find the one that is likely to have this data.

Action: sql_db_list_tables
Action Input: (no input required)
rpt_mvd_metrics_score_estate, rpt_mvd_metrics_score_psm, rpt_mvd_metrics_score_region
Thought: The tables listed seem to be related to reporting metrics, and since the question is about production data, one of these tables could contain the required information. I will need to check the schema of these tables to determine which one is likely to have the FFB Production data for KALSEL1.

Action: sql_db_schema
Action Input: rpt_mvd_metrics_score_estate, rpt_mvd_metrics_score_psm, rpt_mvd_metrics_score_region
CREATE TABLE dbo.rpt_m

{'input': 'Give me FFB Production in January 2022 in KALSEL1',
 'output': 'The FFB Production in January 2022 in KALSEL1 was 1,070,309.4 (rounded to one decimal place).'}

In [8]:
agent_executor.invoke("list all region with their FFB Production")

> Entering new SQL Agent Executor chain...
To list all regions with their FFB (Fresh Fruit Bunch) production, I will need to know which table contains this information. First, I'll list all the tables to find out which ones are available.

Action: sql_db_list_tables
Action Input: (empty string)
rpt_mvd_metrics_score_estate, rpt_mvd_metrics_score_psm, rpt_mvd_metrics_score_region
I see there are three tables related to metrics scores for different scopes: estate, psm, and region. It's likely that FFB production data might be in the region-related table. I should check the schema of the rpt_mvd_metrics_score_region table to confirm if it contains FFB production data.

Action: sql_db_schema
Action Input: rpt_mvd_metrics_score_region
CREATE TABLE dbo.rpt_mvd_metrics_score_region (
	year VARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, 
	month VARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, 
	psmid VARC

{'input': 'list all region with their FFB Production',
 'output': 'The regions and their Fresh Fruit Bunch (FFB) production are as follows:\n\n- BABEL: Various entries with different FFB production values.\n- KETAPANG2: Various entries with different FFB production values.\n- KALTIM1: Various entries with different FFB production values.\n- SUMSEL1: Various entries with different FFB production values.\n- KALSEL2: Various entries with different FFB production values.\n- LAMPUNG: Various entries with different FFB production values.\n- KETAPANG1: Various entries with different FFB production values.\n- SUMSEL2: Various entries with different FFB production values.\n- SEMITAU: Various entries with different FFB production values.\n- KALSEL1: Various entries with different FFB production values.\n- KALTIM2: Various entries with different FFB production values.\n- PAPUA: Various entries with different FFB production values.\n\nEach region has multiple entries with different FFB production 