# LLM-Powered SQL Query Assistant using Llama and LangChain

**Objectives:**
1. Create a SQL agent that can query a database
2. Use LangSmith to trace the agent
3. Use LangSmith to evaluate the agent with metrics
4. Visualise the output with dataframes and plots
5. Create a chat interface for the agent

## Install Dependencies


In [9]:
# !pip install -U langchain langchain_community langchain-ollama langchain_huggingface faiss-cpu pymysql

## Import Ollama and set up LangSmith Integration


In [3]:
# Environment variables must be set before the LLM and agent are created,
# otherwise their runs will not be traced
from dotenv import load_dotenv
import os

# Load environment variables from .env file
env_path = './.env'
load_dotenv(dotenv_path=env_path)

# Verify the environment variables are set
print("LANGCHAIN_TRACING_V2:", os.getenv("LANGCHAIN_TRACING_V2"))
print("LANGCHAIN_API_KEY:", "***" if os.getenv("LANGCHAIN_API_KEY") else "Not set")
print("LANGCHAIN_PROJECT:", os.getenv("LANGCHAIN_PROJECT"))

LANGCHAIN_TRACING_V2: true
LANGCHAIN_API_KEY: ***
LANGCHAIN_PROJECT: pr-ajar-upward-57
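
Printing the variables works for a quick look, but a missing value only shows up as `None` or `Not set` in the output. The sketch below is one possible way to make that check explicit. The helper name `missing_env_vars` and the `REQUIRED_VARS` tuple are illustrative assumptions, not part of LangSmith or `python-dotenv`.

```python
import os

# The three variables printed in the cell above (assumed to be the full set needed here)
REQUIRED_VARS = ("LANGCHAIN_TRACING_V2", "LANGCHAIN_API_KEY", "LANGCHAIN_PROJECT")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing LangSmith settings:", ", ".join(missing))
else:
    print("All LangSmith settings are present.")
```

Running this right after `load_dotenv` reports which settings are missing without printing the API key itself.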


In [4]:
# Import Ollama and test connection
from langchain_ollama import ChatOllama

# Create the chat model (the SQL agent is built on top of it later)
llm = ChatOllama(model='llama3', base_url='http://localhost:11434')
llm.invoke('Hello, world!') # The run should be traced and appear in LangSmith

AIMessage(content="Hello there! It's great to meet you! I'm your friendly AI assistant, and I'm here to help with any questions or topics you'd like to discuss. What brings you to this corner of the internet today?", additional_kwargs={}, response_metadata={'model': 'llama3', 'created_at': '2024-12-20T09:18:56.5303299Z', 'done': True, 'done_reason': 'stop', 'total_duration': 4408384600, 'load_duration': 2997534400, 'prompt_eval_count': 14, 'prompt_eval_duration': 170000000, 'eval_count': 46, 'eval_duration': 1239000000, 'message': Message(role='assistant', content='', images=None, tool_calls=None)}, id='run-f2148272-284e-49b8-bf6c-6fc296e997b6-0', usage_metadata={'input_tokens': 14, 'output_tokens': 46, 'total_tokens': 60})

## Create SQL agent

In [19]:
from langchain_community.utilities import SQLDatabase

db = SQLDatabase.from_uri("mysql+pymysql://root:admin@localhost/test_data")

# Inspect the database
print("Dialect:", db.dialect)
print("Tables:", db.get_usable_table_names())

# Test query
db.run("select * from sales")


Dialect: mysql
Tables: ['sales']


"[(1, 'Laptop', 2, Decimal('1200.00'), datetime.date(2024, 1, 15)), (2, 'Headphones', 5, Decimal('200.00'), datetime.date(2024, 2, 10)), (3, 'Monitor', 3, Decimal('300.00'), datetime.date(2024, 2, 20)), (4, 'Keyboard', 10, Decimal('50.00'), datetime.date(2024, 3, 5)), (5, 'Mouse', 8, Decimal('25.00'), datetime.date(2024, 3, 10)), (6, 'Smartphone', 4, Decimal('800.00'), datetime.date(2024, 4, 1)), (7, 'Tablet', 6, Decimal('500.00'), datetime.date(2024, 4, 15)), (8, 'Printer', 2, Decimal('150.00'), datetime.date(2024, 5, 5)), (9, 'Scanner', 1, Decimal('100.00'), datetime.date(2024, 5, 15)), (10, 'Camera', 3, Decimal('750.00'), datetime.date(2024, 6, 1))]"
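
Note that `db.run` returns the whole result as a single string, which is awkward to post-process or load into a dataframe. The sketch below shows one way to get rows back as structured records instead. It is an illustration under assumptions: it uses Python's built-in `sqlite3` with an in-memory copy of the first three `sales` rows so that it runs without the MySQL server, and the function name `revenue_by_product` is hypothetical.

```python
import sqlite3

# In-memory stand-in for the MySQL `sales` table (assumption: SQLite types replace MySQL ones)
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sales (
        sale_id INTEGER PRIMARY KEY,
        product_name TEXT,
        quantity INTEGER,
        price_per_unit REAL,
        sale_date TEXT
    )
    """
)

# First three rows taken from the query output above
rows = [
    (1, "Laptop", 2, 1200.00, "2024-01-15"),
    (2, "Headphones", 5, 200.00, "2024-02-10"),
    (3, "Monitor", 3, 300.00, "2024-02-20"),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)


def revenue_by_product(connection):
    """Return one dict per product with its revenue, highest revenue first."""
    cursor = connection.execute(
        "SELECT product_name, quantity * price_per_unit AS revenue "
        "FROM sales ORDER BY revenue DESC"
    )
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


for record in revenue_by_product(conn):
    print(record)
```

A list of dicts like this can be passed directly to `pandas.DataFrame` for the visualisation objective.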

In [14]:
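# create_sql_agent now lives in langchain_community; the langchain.agents path is deprecated
from langchain_community.agent_toolkits import create_sql_agent
from langchain.agents.agent_types import AgentType

# SQLDatabase was already imported from langchain_community.utilities above,
# so `db` is reused here without re-importing it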

# Create SQL agent
agent_executor = create_sql_agent(
    llm=llm,
    db=db,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

# Run the agent with a query
response = agent_executor.invoke(
    "What are all the sales records in the database?"
)

print(response['output'])




> Entering new SQL Agent Executor chain...
Let's start by listing the tables in the database.

Thought: I need to know the list of tables in the database before I can proceed.
Action: sql_db_list_tables
Action Input: (empty string)
sales
Thought: Now that I have the list of tables, I should query the schema of the most relevant tables.

Action: sql_db_schema
Action Input: sales
CREATE TABLE sales (
	sale_id INTEGER NOT NULL AUTO_INCREMENT, 
	product_name VARCHAR(255), 
	quantity INTEGER, 
	price_per_unit DECIMAL(10, 2), 
	sale_date DATE, 
	PRIMARY KEY (sale_id)
)ENGINE=InnoDB COLLATE utf8mb4_0900_ai_ci DEFAULT CHARSET=utf8mb4

/*
3 rows from sales table:
sale_id	product_name	quantity	price_per_unit	sale_date
1	Laptop	2	1200.00	2024-01-15
2	Headphones	5	200.00	2024-02-10
3	Monitor	3	300.00	2024-02-20
*/[0m[32;1m[1;3mWhat a great start!

Thought: Now that I know the schema of the sales table, I should construc

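## Create few-shot examples
Few-shot examples show the model the SQL that is expected for typical questions. Selecting only the examples most similar to each question, rather than sending all of them every time, keeps the prompt short and saves tokens.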

In [26]:
# Create some few shot examples

examples = [
    {
        "input": "What are all the sales records in the database?",
        "output": "SELECT * FROM sales"
    }
]

In [24]:
from langchain_huggingface import HuggingFaceEmbeddings
# Sentence-embedding model used to compare questions with the stored examples
embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')

In [31]:
from langchain_community.vectorstores import FAISS
from langchain_core.example_selectors import SemanticSimilarityExampleSelector

example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples, # Few shot examples
    embeddings, # Embedding model
    FAISS, # Vector store
    k=2, # Number of examples to return
    input_keys=["input"] # Input keys from examples
)

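# Sanity check: query the underlying vector store with a new question
# (maximal marginal relevance search) to see which examples come back
example_selector.vectorstore.search("How many sales are there?", search_type="mmr")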


[Document(id='2c231c85-c9c1-4024-9012-1b9b63abe09a', metadata={'input': 'What are all the sales records in the database?', 'output': 'SELECT * FROM sales'}, page_content='What are all the sales records in the database?')]
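
To illustrate what happens with the selected examples, the sketch below picks the closest example for a question and formats it into a few-shot prompt. It is a simplified stand-in, not LangChain's implementation: word overlap (Jaccard similarity) replaces the embedding similarity used by `SemanticSimilarityExampleSelector`, the second example is a hypothetical addition, and the names `select_examples` and `build_prompt` are illustrative.

```python
import re

examples = [
    {
        "input": "What are all the sales records in the database?",
        "output": "SELECT * FROM sales",
    },
    {
        # Hypothetical extra example, added only so the selector has a choice to make
        "input": "How many sales are there?",
        "output": "SELECT COUNT(*) FROM sales",
    },
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_examples(question, pool, k=1):
    """Return the k examples whose input shares the most words with the question."""
    question_words = tokenize(question)

    def score(example):
        example_words = tokenize(example["input"])
        union = question_words | example_words
        return len(question_words & example_words) / len(union) if union else 0.0

    return sorted(pool, key=score, reverse=True)[:k]


def build_prompt(question, pool, k=1):
    """Format the selected examples, followed by the new question, as one prompt string."""
    shots = select_examples(question, pool, k)
    blocks = [f"User input: {ex['input']}\nSQL query: {ex['output']}" for ex in shots]
    blocks.append(f"User input: {question}\nSQL query:")
    return "\n\n".join(blocks)


print(build_prompt("How many sales were made?", examples))
```

Only the selected example reaches the prompt, so the token cost stays roughly constant as the example pool grows.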