# Create a LangChain NL2SQL Agent using Azure OpenAI and Azure SQL Database
This notebook walks through creating a LangChain SQL agent that uses Azure OpenAI as the LLM to answer natural-language questions against an Azure SQL Database.

## Install the required python libraries
Start by installing the required libraries. Run the following at the terminal in the project folder so it references the project's requirements.txt:

```bash
pip install -r requirements.txt
```
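To confirm the install worked before opening the notebook, a quick check along these lines may help. It is a minimal sketch that uses only the standard library; the module list is taken from the notebook's import cell, and `missing_modules` is a hypothetical helper name, not part of any library.

```python
from importlib.util import find_spec

# Top-level modules imported by the notebook (import names, not pip package names).
REQUIRED_MODULES = [
    "pyodbc",
    "dotenv",
    "sqlalchemy",
    "langchain",
    "langchain_community",
    "langchain_openai",
]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

If anything is reported missing, re-run the `pip install` command in the same environment the notebook kernel uses.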


## ODBC Driver for MS SQL Install

Use the **odbcDriverInstallUbuntu.txt** script to install the Microsoft ODBC Driver for MS SQL (version 18).

If you are not using a codespace or Ubuntu, you can find the correct script to install the driver for Linux [here](https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server), for Windows [here](https://learn.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server), and for macOS [here](https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/install-microsoft-odbc-driver-sql-server-macos).
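The driver name used in the connection settings has to match a driver that is actually installed. Note that the install script targets version 18, while the connection cell later in this notebook names "ODBC Driver 17 for SQL Server". The following is a minimal sketch for checking what is available, assuming `pyodbc` is installed; `pick_sql_server_driver` is a hypothetical helper written for this example.

```python
import re


def pick_sql_server_driver(driver_names):
    """Return the newest 'ODBC Driver NN for SQL Server' name, or None if absent."""
    pattern = re.compile(r"^ODBC Driver (\d+) for SQL Server$")
    versions = []
    for name in driver_names:
        match = pattern.match(name)
        if match:
            versions.append((int(match.group(1)), name))
    return max(versions)[1] if versions else None


if __name__ == "__main__":
    try:
        import pyodbc  # third-party; installed via requirements.txt
    except ImportError:
        print("pyodbc is not installed")
    else:
        # pyodbc.drivers() lists the ODBC drivers registered on this machine.
        print(pick_sql_server_driver(pyodbc.drivers()))
```

Whatever name this prints is the value to use for the `driver` setting when connecting.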

## Create the table in the database
All SQL commands are in the database.sql script.

The database used for this notebook needs a `market_industry` table with the following columns:

```
id
year
month
market
industry
link_clicks
impression
spend
cpc
cpm
ctr
```


## .env file
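Fill out the .env file with your server and key values. For this notebook, set **AZURE_OPENAI_API_KEY** and **AZURE_OPENAI_ENDPOINT**, plus the database settings. The connection cell in this notebook reads **AZURE_SQL_USER**, **AZURE_SQL_PASSWORD**, **AZURE_SQL_SERVER** and **AZURE_SQL_DATABASE**; **py-connectionString** is only needed if you switch to the commented-out connection-string alternative in that cell.

```bash
AZURE_OPENAI_API_KEY=""
AZURE_OPENAI_ENDPOINT=""
OPENAI_API_KEY=""
AZURE_SQL_USER="USERNAME"
AZURE_SQL_PASSWORD="PASSWORD"
AZURE_SQL_SERVER="SERVER_NAME.database.windows.net"
AZURE_SQL_DATABASE="DATABASE_NAME"
py-connectionString="mssql+pyodbc://USERNAME:PASSWORD@SERVER_NAME.database.windows.net/DATABASE_NAME?driver=ODBC+Driver+18+for+SQL+Server"
```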
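Because the `py-connectionString` value is a URL, a user name or password containing characters such as `@` or `/` must be URL-encoded before it is placed in the string. The sketch below shows one way to build such a string with the standard library. `build_connection_string` is a hypothetical helper and the credentials are placeholders.

```python
from urllib.parse import quote_plus


def build_connection_string(user, password, server, database,
                            driver="ODBC Driver 18 for SQL Server"):
    """Build an mssql+pyodbc URL with user, password, and driver name URL-encoded."""
    return (
        f"mssql+pyodbc://{quote_plus(user)}:{quote_plus(password)}"
        f"@{server}/{database}?driver={quote_plus(driver)}"
    )


# Placeholder values only; the password shows how '@' and '/' get escaped.
print(
    build_connection_string(
        "USERNAME",
        "p@ss/word",
        "SERVER_NAME.database.windows.net",
        "DATABASE_NAME",
    )
)
```

The connection cell in this notebook avoids the issue by building the URL with SQLAlchemy's `URL.create`, which takes the raw values and escapes them when the URL is rendered.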

## Notebook Kernel
Be sure to select a kernel for the python notebook by using the **Select Kernel** button in the upper right of the notebook.

## Starting the Example
The first cell imports the required libraries and loads the environment variables defined in the .env file.

In [1]:

import os
import pyodbc  # ODBC bindings used by SQLAlchemy's mssql+pyodbc dialect
from dotenv import load_dotenv
from langchain.agents.agent_types import AgentType
from langchain_community.agent_toolkits import SQLDatabaseToolkit, create_sql_agent
from langchain_community.utilities import SQLDatabase
from langchain_openai import AzureOpenAI
load_dotenv()  # returns True when a .env file was found and loaded

True
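`load_dotenv()` returning `True` only means a .env file was found, not that every value is filled in. A small check such as the following can catch empty settings early. It is a sketch under the assumption that the variable names below (taken from the .env section and the connection cell) are the ones your setup uses; `missing_env_vars` is a hypothetical helper.

```python
import os

# Names read by this notebook; adjust the list to match your own .env file.
REQUIRED_VARS = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_SQL_USER",
    "AZURE_SQL_PASSWORD",
    "AZURE_SQL_SERVER",
    "AZURE_SQL_DATABASE",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or blank in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED_VARS)
    if missing:
        print("Missing or empty settings:", ", ".join(missing))
```

Run it after `load_dotenv()` so that values from the .env file are already present in `os.environ`.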

Next, create the database connection and test it.

In [2]:
# connect to the Azure SQL database

from sqlalchemy import create_engine
from sqlalchemy.engine import URL

# Alternative: build the engine from the full connection string stored in .env
# connectionString = os.environ["py-connectionString"]
# db_engine = create_engine(connectionString)

engine_str = URL.create(
    drivername="mssql+pyodbc",
    username=os.environ["AZURE_SQL_USER"],
    password=os.environ["AZURE_SQL_PASSWORD"],
    host=os.environ["AZURE_SQL_SERVER"],
    port=1433,
    database=os.environ["AZURE_SQL_DATABASE"],
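    query={
        # Must match an installed driver; use "ODBC Driver 18 for SQL Server"
        # if you installed version 18 with the script above.
        "driver": "ODBC Driver 17 for SQL Server",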
        "TrustServerCertificate": "no",
        "Connection Timeout": "30",
        "Encrypt": "yes",
    },
)

db_engine = create_engine(engine_str)
db = SQLDatabase(db_engine, view_support=True, schema="dbo")

# test the connection
print(db.dialect)
print(db.get_usable_table_names())
db.run("select convert(varchar(25), getdate(), 120)")


mssql
['market_industry']


"[('2024-04-08 20:39:39',)]"

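Create a reference to Azure OpenAI as the LLM for the SQL agent. Replace the `deployment_name` value (`jhl-35-ti` in this example) with the name of your own Azure OpenAI gpt-3.5-turbo-instruct deployment. The API key and endpoint are read from the **AZURE_OPENAI_API_KEY** and **AZURE_OPENAI_ENDPOINT** environment variables.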

In [3]:
azurellm = AzureOpenAI(
    deployment_name="jhl-35-ti",
    model_name="gpt-35-turbo-instruct",
    api_version="2024-02-15-preview"
)

Run the following to create the SQL agent:

In [4]:
toolkit = SQLDatabaseToolkit(db=db, llm=azurellm)

agent_executor = create_sql_agent(
    llm=azurellm,
    toolkit=toolkit,
    verbose=True,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
)


Now test the agent with a natural-language question about the data in the table.

In [5]:
agent_executor.invoke("For Automotive industries in Germany, how much did CPMs changed since 2023? give me a by month break down?") 



> Entering new SQL Agent Executor chain...
I need to query the database for CPMs and filter for Automotive industries in Germany. I also need to specify a date range and group the results by month.
Action: sql_db_query
Action Input: SELECT CPM, date FROM Automotive WHERE industry = 'Germany' AND date >= '2023-01-01' AND date <= '2023-12-31' GROUP BY MONTH(date)
Error: (pyodbc.ProgrammingError) ('42S02', "[42S02] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Invalid object name 'Automotive'. (208) (SQLExecDirectW)")
[SQL: SELECT CPM, date FROM Automotive WHERE industry = 'Germany' AND date >= '2023-01-01' AND date <= '2023-12-31' GROUP BY MONTH(date)]
(Background on this error at: https://sqlalche.me/e/20/f405)
I need to check the table name to make sure it is correct.
Action: sql_db_list_tables
Action Input: ""
market_industry
I need to check the schema for the market_industry table to

{'input': 'For Automotive industries in Germany, how much did CPMs changed since 2023? give me a by month break down?',
 'output': 'Based on the sample data from market_industry, the average CPM for Automotive industries in Germany in 2023 is 3.7799034118652344.'}
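The agent's final answer here is a single average rather than the by-month breakdown the prompt asked for, so it is worth knowing what a suitable query could look like. The sketch below illustrates one against the `market_industry` columns listed earlier. It uses Python's built-in `sqlite3` in place of Azure SQL so that it runs anywhere; the column types and every row are invented for demonstration and are not taken from the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Subset of the market_industry columns; the types are assumptions.
conn.execute(
    "CREATE TABLE market_industry ("
    "id INTEGER PRIMARY KEY, year INTEGER, month INTEGER, "
    "market TEXT, industry TEXT, cpm REAL)"
)

# Made-up rows, for illustration only.
rows = [
    (1, 2023, 1, "Germany", "Automotive", 4.0),
    (2, 2023, 1, "Germany", "Automotive", 2.0),
    (3, 2023, 2, "Germany", "Automotive", 5.0),
    (4, 2023, 2, "France", "Automotive", 9.0),
    (5, 2022, 12, "Germany", "Automotive", 1.0),
    (6, 2023, 1, "Germany", "Retail", 7.0),
]
conn.executemany("INSERT INTO market_industry VALUES (?, ?, ?, ?, ?, ?)", rows)

# Filter on market and industry, then average CPM per (year, month).
MONTHLY_CPM_SQL = """
SELECT year, month, AVG(cpm) AS avg_cpm
FROM market_industry
WHERE market = ? AND industry = ? AND year >= ?
GROUP BY year, month
ORDER BY year, month
"""

monthly_cpm = conn.execute(
    MONTHLY_CPM_SQL, ("Germany", "Automotive", 2023)
).fetchall()

for year, month, avg_cpm in monthly_cpm:
    print(year, month, avg_cpm)
```

Comparing a hand-written query like this with the SQL shown in the agent's verbose trace is a simple way to judge whether the agent filtered on the right columns. Averaging the `cpm` column is a simplification; recomputing CPM from `spend` and `impression` per month may be more accurate, depending on how the table is populated.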