# Part 1: Using LangChain with local AI models to query an arbitrary database
In this notebook I will try to use LangChain with a local model to generate a SQL query, run the query, and then format the results as natural language. I will be using the other, unnumbered LangChain notebooks as a reference, along with some of the logic from the previous AutoGen notebooks.

In [None]:
%pip install python-dotenv
%pip install langchain
%pip install pyodbc

## Connect to database

In [1]:
# connect to MSSQL database
import pyodbc

SERVER = '127.0.0.1'
DATABASE = 'TimeBasedCommitments'
USERNAME = 'sa'
PASSWORD = 'BadDefaultPassword!'

connectionString = f'DRIVER={{ODBC Driver 17 for SQL Server}};SERVER={SERVER};DATABASE={DATABASE};UID={USERNAME};PWD={PASSWORD}'

conn = pyodbc.connect(connectionString)

# conn.close()

## Fetch database schema to use as context for AI prompts

In [2]:
# Get table names
get_all_tables_stmt = "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES;"
cursor = conn.cursor()
cursor.execute(get_all_tables_stmt)
table_names = [row[0] for row in cursor.fetchall()]

# Get all table definitions and format them into CREATE TABLE statements
get_def_stmt = """
SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = ?
"""

definitions = []
for table_name in table_names:
    cursor.execute(get_def_stmt, (table_name,))
    rows = cursor.fetchall()

    create_table_stmt = f"CREATE TABLE {table_name} (\n"
    for row in rows:
        create_table_stmt += f"{row[0]} {row[1]},\n"
    create_table_stmt = create_table_stmt.rstrip(",\n") + "\n);"
    
    definitions.append(create_table_stmt)

table_definitions = "\n\n".join(definitions)

## Get environment variables

In [3]:
import os
from dotenv import load_dotenv

load_dotenv()  # take environment variables from .env.
KOBOLD_API_BASE = os.getenv('KOBOLD_API_BASE')
print(KOBOLD_API_BASE)

http://localhost:5001/api/


## Build AI prompt used to request a SQL query
This prompt seemed to work well with Mixtral. It uses the recommended prompt format from https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF.
I may want to enhance it by specifying that the query should return names rather than IDs.
I may also want to split up the template so I don't rebuild it each time I create a prompt.
The model keeps escaping `_` characters with `\` in its response, so I strip the backslashes out afterwards. Asking it in the prompt not to use `\` only helped some of the time.
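
Stripping every backslash is a blunt fix: it would also remove a backslash that legitimately belongs in the query, for example inside a string literal. A narrower option, sketched here, removes a backslash only when it directly precedes an underscore. The helper name `unescape_underscores` is hypothetical and not part of LangChain, and the sample query is made up for illustration.

```python
import re

def unescape_underscores(sql: str) -> str:
    """Drop a backslash only when it directly precedes an underscore."""
    # The lookahead (?=_) matches the backslash without consuming the underscore.
    return re.sub(r"\\(?=_)", "", sql)

# Made-up model response: one unwanted escape, one backslash that should stay.
raw = "SELECT completed\\_by FROM tblSpecificCommitments WHERE note LIKE 'a\\b%'"
cleaned = unescape_underscores(raw)
print(cleaned)
```

This would still alter a `LIKE` pattern that escapes an underscore on purpose, so it is a trade-off rather than a complete fix.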

In [18]:
from langchain.llms import KoboldApiLLM
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

llm = KoboldApiLLM(endpoint=KOBOLD_API_BASE, temperature=0.1)

template = """[INST]
Create a SQL query that can be used to answer this question: {question}

Use these TABLE_DEFINITIONS to satisfy the database query.

[TABLE_DEFINITIONS]
{table_definitions}
[/TABLE_DEFINITIONS]

Respond only with the SQL query as raw text. I need to be able to easily parse the sql query from your response.
[/INST]
"""

prompt = PromptTemplate(template=template, input_variables=["question", "table_definitions"])

llm_chain = LLMChain(prompt=prompt, llm=llm)

question = "Who completed the most specific commitments?"

sql_query = llm_chain.run(question = question, table_definitions = table_definitions)

# remove all instances of \ in the SQL query
sql_query = sql_query.replace("\\", "")

print(sql_query)

SELECT TOP 1 completed_by, COUNT(*) as num_completions
FROM tblSpecificCommitments
WHERE completed IS NOT NULL
GROUP BY completed_by
ORDER BY num_completions DESC;


## Execute the SQL query
I should wrap this in a loop so that, if the query fails to run, I can generate a new one and try again.
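
The retry loop described above might look something like this. It is a minimal sketch under several assumptions: `run_with_retries` and `generate_query` are hypothetical names, `generate_query` stands in for the `llm_chain.run(...)` call plus the backslash cleanup, and an in-memory SQLite database with a made-up one-row table replaces the SQL Server connection so the example runs on its own. With pyodbc, the exception to catch would be `pyodbc.Error` rather than `sqlite3.Error`.

```python
import sqlite3

def run_with_retries(cursor, generate_query, max_attempts=3):
    """Generate and execute a query, regenerating it whenever execution fails."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        # The previous error is passed in so the generator can react to it.
        candidate = generate_query(last_error)
        try:
            cursor.execute(candidate)
            return candidate, cursor.fetchall()
        except sqlite3.Error as exc:  # with pyodbc, catch pyodbc.Error instead
            last_error = exc
            print(f"Attempt {attempt} failed: {exc}")
    raise RuntimeError(f"No runnable query after {max_attempts} attempts") from last_error

# Stand-in database (assumption: a made-up table, not the real schema).
demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()
demo_cursor.execute("CREATE TABLE demo (completed_by INTEGER)")
demo_cursor.execute("INSERT INTO demo VALUES (1)")

# Stand-in for the LLM: the first answer has a syntax error, the second is valid.
candidates = iter(["SELEC completed_by FROM demo", "SELECT completed_by FROM demo"])

def generate_query(last_error):
    return next(candidates)

demo_query, demo_rows = run_with_retries(demo_cursor, generate_query)
print(demo_query, demo_rows)
demo_conn.close()
```

Passing the last error into `generate_query` leaves room to include it in the next prompt so the model can try to correct itself, and `max_attempts` keeps a model that never produces valid SQL from looping forever.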

In [19]:
cursor = conn.cursor()
cursor.execute(sql_query)
rows = cursor.fetchall()
print(rows)

[(1, 1)]


## Format a response

In [8]:
template = """[INST]
Answer this question: {question}

Assume this SQL query was run: {sql_query}

Assume this was the data returned from the query: {data}
[/INST]
"""

prompt = PromptTemplate(template=template, input_variables=["question", "sql_query", "data"])

llm_chain = LLMChain(prompt=prompt, llm=llm)

response = llm_chain.run(question = question, sql_query = sql_query, data = rows)

print(response)

Based on the information provided, the individual who completed the most specific commitments is the one with the ID of 1, who has completed a total of 1 commitment. Since there are no other records in the result set, it can be assumed that no other individuals have completed any specific commitments at this time.


## Cleanup

In [38]:
conn.close()