# This notebook is part 2 of a series on using LangChain with local AI models to query an arbitrary database.
In this notebook I will try to use LangChain with two local models to generate a SQL query, run the query, and then format the results into natural language. I will be using the other, unnumbered LangChain notebooks as a reference, and I'll also reuse some of the logic from the previous AutoGen notebooks.

In [None]:
%pip install langchain
%pip install pyodbc

## Connect to database

In [None]:
# connect to MSSQL database
import pyodbc

SERVER = '127.0.0.1'
DATABASE = 'TimeBasedCommitments'
USERNAME = 'sa'
PASSWORD = 'BadDefaultPassword!'

connectionString = f'DRIVER={{ODBC Driver 17 for SQL Server}};SERVER={SERVER};DATABASE={DATABASE};UID={USERNAME};PWD={PASSWORD}'

conn = pyodbc.connect(connectionString)

# conn.close()

## Fetch database schema to use as context for AI prompts

In [None]:
# Get table names
get_all_tables_stmt = "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES;"
cursor = conn.cursor()
cursor.execute(get_all_tables_stmt)
table_names = [row[0] for row in cursor.fetchall()]

# Get all table definitions and format them into CREATE TABLE statements
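# Columns are ordered by ORDINAL_POSITION so each CREATE TABLE lists them in declaration order
get_def_stmt = """
SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = ?
ORDER BY ORDINAL_POSITION
"""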

definitions = []
for table_name in table_names:
    cursor.execute(get_def_stmt, (table_name,))
    rows = cursor.fetchall()

    create_table_stmt = f"CREATE TABLE {table_name} (\n"
    for row in rows:
        create_table_stmt += f"{row[0]} {row[1]},\n"
    create_table_stmt = create_table_stmt.rstrip(",\n") + "\n);"
    
    definitions.append(create_table_stmt)

table_definitions = "\n\n".join(definitions)
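The loop above issues one `INFORMATION_SCHEMA.COLUMNS` query per table. A possible alternative, sketched here as an assumption rather than the method this notebook uses, is to fetch every column in a single query sorted by table and column position, then group the rows in Python. `ALL_COLUMNS_STMT` and `build_table_definitions` are hypothetical names; the helper produces the same `CREATE TABLE` text format as the loop.

```python
from itertools import groupby
from typing import Iterable, Sequence

# Hypothetical single-query alternative to the per-table loop: one row per column,
# sorted so all columns of a table are adjacent and in declaration order.
ALL_COLUMNS_STMT = """
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
ORDER BY TABLE_NAME, ORDINAL_POSITION
"""


def build_table_definitions(rows: Iterable[Sequence[str]]) -> str:
    """Turn (table, column, data type) rows into CREATE TABLE statements.

    groupby only merges consecutive rows, which is why the query sorts by TABLE_NAME.
    """
    definitions = []
    for table_name, columns in groupby(rows, key=lambda row: row[0]):
        column_lines = ",\n".join(f"{column} {data_type}" for _, column, data_type in columns)
        definitions.append(f"CREATE TABLE {table_name} (\n{column_lines}\n);")
    return "\n\n".join(definitions)
```

In the notebook this would replace both queries: `cursor.execute(ALL_COLUMNS_STMT)` followed by `table_definitions = build_table_definitions(cursor.fetchall())`.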

## Set model API endpoints
These should be read from environment variables instead of being hard-coded.

In [None]:
import os

DEEPSEEK_CODER_API_BASE = 'http://localhost:5001/v1'
MISTRAL_API_BASE = 'http://localhost:5002/api/'
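A minimal sketch of reading the endpoints from environment variables, falling back to the current hard-coded values. The variable names `DEEPSEEK_CODER_API_BASE` and `MISTRAL_API_BASE` are assumptions chosen to match the Python names; nothing else in the notebook sets them.

```python
import os

# Hypothetical environment variable names; the defaults are the local endpoints used above.
DEEPSEEK_CODER_API_BASE = os.environ.get("DEEPSEEK_CODER_API_BASE", "http://localhost:5001/v1")
MISTRAL_API_BASE = os.environ.get("MISTRAL_API_BASE", "http://localhost:5002/api/")

print(f"DeepSeek Coder endpoint: {DEEPSEEK_CODER_API_BASE}")
print(f"Mistral endpoint: {MISTRAL_API_BASE}")
```

The same pattern would suit the connection cell, which currently hard-codes the `sa` password.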

## Build AI prompt used to request a SQL query
This prompt works well with https://huggingface.co/TheBloke/deepseek-coder-6.7B-instruct-GGUF.  It uses the recommended prompt template.

I had to add some constraints to the prompt, such as including every required column when using GROUP BY and not using LIMIT, because the model kept producing queries that MSSQL doesn't support.

I told it not to return bare ids, because many of the answers were just user ids that wouldn't mean anything to the user.

The model tends to wrap the SQL query in a Markdown code fence, so the query has to be extracted from the response before it can be run.

In [None]:

from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
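try:
    from langchain_community.llms import KoboldApiLLM  # newer LangChain releases
except ImportError:
    from langchain.llms import KoboldApiLLM  # older releases, as originally used

deepseek_llm = KoboldApiLLM(endpoint=DEEPSEEK_CODER_API_BASE, temperature=0.1, max_context_length=2048, max_length=1024)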

template = """You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer.
### Instruction:
Create a SQL query that can be used to answer this question: {question}

The query should respond with information other than ids.  For instance instead of returning id return email.

If using GROUP BY be sure to include all required columns

Do not use the LIMIT keyword because it is not supported in Microsoft SQL

Use these TABLE_DEFINITIONS to satisfy the database query.

TABLE_DEFINITIONS

{table_definitions}
### Response:
"""

prompt = PromptTemplate(template=template, input_variables=["question", "table_definitions"])

llm_chain = LLMChain(prompt=prompt, llm=deepseek_llm)

question = "Who has the most specific commitments assigned to them?"

response = llm_chain.run(question = question, table_definitions = table_definitions)

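# The model usually wraps the query in a ```sql fence; fall back to the raw response if it doesn't
sql_query = response.split('```sql')[1].split('```')[0] if '```sql' in response else response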

print("#### RESPONSE ####")
print(response)

print("#### SQL QUERY ####")
print(sql_query)
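Splitting on the fence only works when the model uses a lowercase `sql` tag. A slightly more tolerant sketch, using a regular expression (an assumption, not a LangChain feature), also accepts an unlabeled fence or an uppercase `SQL` tag and falls back to the whole response when there is no fence at all. `extract_sql` is a hypothetical helper name.

```python
import re

# Matches an opening fence with an optional "sql" tag (any case) followed by a newline,
# then lazily captures everything up to the next closing fence.
SQL_FENCE = re.compile(r"```[ \t]*(?:sql)?[ \t]*\n(.*?)```", re.DOTALL | re.IGNORECASE)


def extract_sql(response: str) -> str:
    """Return the first fenced SQL block, or the whole response if none is found."""
    match = SQL_FENCE.search(response)
    return (match.group(1) if match else response).strip()
```

`sql_query = extract_sql(response)` would then take the place of the split-based line.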

## Execute the SQL query
I should wrap this in a loop so that if the query fails to run, I can generate a new one and try again.

In [None]:
cursor = conn.cursor()
cursor.execute(sql_query)
rows = cursor.fetchall()
print(rows)
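The retry loop described above might look like the following sketch. `run_with_retries` and its two callables are hypothetical: one asks the model for a query and receives the previous error so it can be added to the prompt, the other executes the query. Catching every exception is a simplification; with pyodbc, catching `pyodbc.Error` would be narrower.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def run_with_retries(
    generate_sql: Callable[[Optional[str]], str],
    execute_sql: Callable[[str], List[Sequence]],
    max_attempts: int = 3,
) -> Tuple[str, List[Sequence]]:
    """Generate a query and run it, asking for a new query whenever execution fails.

    generate_sql receives the previous error message (None on the first attempt)
    so the caller can feed it back into the prompt.
    """
    last_error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        sql = generate_sql(last_error)
        try:
            return sql, execute_sql(sql)
        except Exception as exc:  # simplification: pyodbc.Error would be narrower
            last_error = str(exc)
            print(f"Attempt {attempt} failed: {last_error}")
    raise RuntimeError(f"No working query after {max_attempts} attempts: {last_error}")
```

In this notebook, `execute_sql` could be `lambda q: conn.cursor().execute(q).fetchall()` (pyodbc's `execute` returns the cursor), and `generate_sql` could call `llm_chain.run(...)` and extract the fenced query, appending the error text to the question when it is not `None`.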

## Connect to the second model and craft a response based on question, query, and results
For this I am using https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF

In [None]:
mistral_llm = KoboldApiLLM(endpoint=MISTRAL_API_BASE, temperature=0.1, max_context_length=2048, max_length=1024)

template = """[INST]
Assume this SQL query was run: {sql_query}

Assume this was the data returned from the query: {data}

Answer this question without mentioning the query: {question}
[/INST]
"""

prompt = PromptTemplate(template=template, input_variables=["question", "sql_query", "data"])

llm_chain = LLMChain(prompt=prompt, llm=mistral_llm)

response = llm_chain.run(question = question, sql_query = sql_query, data = rows)

print(response)
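Passing `rows` straight into the prompt gives the model a list of tuple-like rows with no column names. One option, sketched here as an assumption rather than something tested with Mistral, is to render the results with a header row built from `cursor.description`, which DB-API drivers such as pyodbc fill in after `execute`. `format_results` is a hypothetical helper.

```python
from typing import Sequence


def format_results(description: Sequence[Sequence], rows: Sequence[Sequence]) -> str:
    """Render query results as pipe-separated text with a header row.

    description follows the DB-API convention: one entry per column,
    whose first item is the column name.
    """
    headers = [column[0] for column in description]
    lines = [" | ".join(headers)]
    lines.extend(" | ".join(str(value) for value in row) for row in rows)
    return "\n".join(lines)
```

For example, `data = format_results(cursor.description, rows)` after the query runs, then `llm_chain.run(question=question, sql_query=sql_query, data=data)`.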

## Cleanup

In [None]:
conn.close()