<h1>Connect to Llama 3 over Docker</h1>

The notebook now uses a local model instead. To switch back to the Docker-hosted model, uncomment the MODEL_URL import and the base_url argument below, and set the Docker host's IP address in settings.py.

In [4]:
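from langchain_community.llms.ollama import Ollama
from modules.settings import MODEL  # , MODEL_URL

# Local Ollama model; pass base_url=MODEL_URL to target the Docker instance
llm = Ollama(model=MODEL, temperature=0)  # , base_url=MODEL_URL)

# Smoke test: send a single prompt to the model
llm.invoke("Pretend you are a program. Say 'Hello world!'")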


'I am not capable of displaying output on a screen, but if I were a program written in python, my first line would be:\n\nprint("hello world!")\n\nwhen this line is executed by the python interpreter, it will print the string "hello world!" to the console or terminal window.'
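Switching between the local and the Docker-hosted model comes down to whether a base_url is passed to the client. Below is a minimal sketch of that toggle. The helper ollama_kwargs, the model name "llama3", and the example address are assumptions made for illustration and are not part of the project. Ollama listens on port 11434 by default.

```python
def ollama_kwargs(model, base_url=None, temperature=0):
    """Build keyword arguments for an Ollama client (hypothetical helper)."""
    kwargs = {"model": model, "temperature": temperature}
    if base_url:
        # Only set base_url for a remote/Docker instance; otherwise the
        # client falls back to its default local endpoint.
        kwargs["base_url"] = base_url.rstrip("/")
    return kwargs


# "llama3" and the IP address are placeholder values
local = ollama_kwargs("llama3")
docker = ollama_kwargs("llama3", base_url="http://192.168.0.10:11434/")
print(local)
print(docker)
```

With such a helper the cell above could be written as Ollama(**ollama_kwargs(MODEL, MODEL_URL)), which avoids commenting code in and out.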

<h1>Get table schema</h1>

In [5]:
from langchain_community.utilities import SQLDatabase
from modules.settings import DB_FILE
# Database connection
db = SQLDatabase.from_uri(f"sqlite:///{DB_FILE}")

schema = db.get_table_info()
print(schema)


CREATE TABLE crime_scene_report (
	date INTEGER, 
	type TEXT, 
	description TEXT, 
	city TEXT
)

/*
3 rows from crime_scene_report table:
date	type	description	city
20180115	robbery	A Man Dressed as Spider-Man Is on a Robbery Spree	NYC
20180115	murder	Life? Dont talk to me about life.	Albany
20180115	murder	Mama, I killed a man, put a gun against his head...	Reno
*/


CREATE TABLE drivers_license (
	id INTEGER, 
	age INTEGER, 
	height INTEGER, 
	eye_color TEXT, 
	hair_color TEXT, 
	gender TEXT, 
	plate_number TEXT, 
	car_make TEXT, 
	car_model TEXT, 
	PRIMARY KEY (id)
)

/*
3 rows from drivers_license table:
id	age	height	eye_color	hair_color	gender	plate_number	car_make	car_model
100280	72	57	brown	red	male	P24L4U	Acura	MDX
100460	63	72	brown	brown	female	XF02T6	Cadillac	SRX
101029	62	74	green	green	female	VKY5KR	Scion	xB
*/


CREATE TABLE facebook_event_checkin (
	person_id INTEGER, 
	event_id INTEGER, 
	event_name TEXT, 
	date INTEGER, 
	FOREIGN KEY(person_id) REFERENCES person (id
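get_table_info() returns the CREATE TABLE statements together with a few sample rows. The DDL part can also be read directly from SQLite's sqlite_master catalog. Below is a small sketch using only the standard library, against an in-memory database with one table copied from the output above. The helper name table_ddl is hypothetical.

```python
import sqlite3


def table_ddl(conn):
    """Return {table_name: CREATE TABLE statement} from the SQLite catalog."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return dict(rows.fetchall())


# In-memory stand-in for the game database
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crime_scene_report "
    "(date INTEGER, type TEXT, description TEXT, city TEXT)"
)
ddl = table_ddl(conn)
print(ddl["crime_scene_report"])
```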

<h1>Extract SQL query from model response</h1>

In [47]:
from modules.ai.rag_pipeline import extract_query_from_model
model_response = """
Question: How many crimes were reported on January 15, 2018?

SQLQuery: SELECT COUNT(*) FROM crime_scene_report WHERE date = 20180115;

Answer: According to the crime_scene_report table, there were three crimes reported on January 15, 2018.

Result: The query returns an integer value representing the count of rows in the crime_scene_report table where the date column matches 20180115. In this case, it returns 3.
Generated SQL Query: SELECT COUNT(*) FROM crime_scene_report WHERE date = 20180115;
"""

# Extract SQL query from the response
extracted_query = extract_query_from_model(model_response)
print(f"Extracted query:\n {extracted_query}")

Extracted query:
 SELECT COUNT(*) FROM crime_scene_report WHERE date = 20180115;
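The implementation of extract_query_from_model is not shown in this notebook. Below is one possible approach, given purely as an illustration. The function extract_sql is hypothetical: it looks for text after a SQLQuery: label and otherwise falls back to the first SELECT statement that ends with a semicolon. A semicolon inside a string literal would cut the query short, so treat it as a sketch rather than a parser.

```python
import re

# "SQLQuery:" label, optionally followed by markdown bold markers
_LABELLED = re.compile(r"SQLQuery:\*{0,2}\s*(.+?;)", re.IGNORECASE | re.DOTALL)
# Fallback: the first SELECT statement that ends with a semicolon
_BARE = re.compile(r"\b(SELECT\b.+?;)", re.IGNORECASE | re.DOTALL)


def extract_sql(text):
    """Return the first SQL query found in a model response, or None."""
    for pattern in (_LABELLED, _BARE):
        match = pattern.search(text)
        if match:
            # Collapse line breaks and repeated spaces into single spaces
            return " ".join(match.group(1).split())
    return None


sample = """
Question: How many crimes were reported on January 15, 2018?

SQLQuery: SELECT COUNT(*) FROM crime_scene_report WHERE date = 20180115;

Answer: three crimes were reported.
"""
print(extract_sql(sample))
```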


<h1>Try building a prompt template</h1>

In [49]:
from langchain.chains import create_sql_query_chain
from langchain_community.llms.ollama import Ollama
from modules.settings import MODEL
from modules.database.db import database_connect
# Database connection
db = database_connect()
# Ollama model
llm = Ollama(model=MODEL)
chain = create_sql_query_chain(llm, db)
# Static instructions for the model
STATIC_MESSAGE = 'You are an experienced data scientist. Answer the question verbally and add the SQL query to prove it. Make the query one line so it is easy to test. Use the simplest and most efficient query'
# Main question
QUESTION = "How many people are there?"
# Generate response
response = chain.invoke({"question": f"{QUESTION} {STATIC_MESSAGE}"})
print(response)

# Extract SQL query from the response
from modules.ai.rag_pipeline import extract_query_from_model
extracted_query = extract_query_from_model(response)
print(f"Extracted query:\n {extracted_query}")

Question: How many people live in New York City?

To answer this question, we'll need to join the crime_scene_report and person tables. Since we don't know how many results we want to retrieve, let's set a LIMIT of 5 for demonstration purposes. We also only need to query the columns necessary to answer the question: the city name from the crime_scene_report table, and the COUNT function from the person table to find the number of people who live in New York City (NYC).

SQLQuery: SELECT DISTINCT p.city AS city, COUNT(DISTINCT pe.id) AS num_people FROM crime_scene_report cr JOIN person pe ON cr.city = pe.address_street_name WHERE cr.city = 'NYC' LIMIT 5;
Extracted query:
 SELECT DISTINCT p.city AS city, COUNT(DISTINCT pe.id) AS num_people FROM crime_scene_report cr JOIN person pe ON cr.city = pe.address_street_name WHERE cr.city = 'NYC' LIMIT 5;
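The query above selects p.city, but no table is aliased as p, so SQLite would reject it. Errors like this can be caught before execution by asking SQLite to compile the statement. Below is a sketch using EXPLAIN on an in-memory database. The helper name sql_error and the reduced person table are assumptions made for illustration.

```python
import sqlite3


def sql_error(conn, query):
    """Return None if SQLite can compile the query, else the error message."""
    try:
        # EXPLAIN compiles the statement without running it against the data
        conn.execute(f"EXPLAIN {query}")
    except sqlite3.Error as exc:
        return str(exc)
    return None


conn = sqlite3.connect(":memory:")
# Reduced, assumed schema; the real person table has more columns
conn.executescript("""
    CREATE TABLE crime_scene_report (date INTEGER, type TEXT, description TEXT, city TEXT);
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, address_street_name TEXT);
""")

bad = ("SELECT DISTINCT p.city AS city, COUNT(DISTINCT pe.id) AS num_people "
       "FROM crime_scene_report cr JOIN person pe ON cr.city = pe.address_street_name "
       "WHERE cr.city = 'NYC' LIMIT 5;")
print(sql_error(conn, bad))
print(sql_error(conn, "SELECT COUNT(*) FROM person;"))
```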


<h1>Working prompt template v1</h1>

In [51]:
# Working template v1
from langchain_core.prompts import PromptTemplate
from modules.database.db import get_db_schema
dialect="sqlite"
table_info=get_db_schema()
question = "How many people are there in total?"

template = '''Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
Use the following format:

Question: "Question here"
SQLQuery: "SQL Query to run"
SQLResult: "Result of the SQLQuery"
Answer: "Final answer here"

Only use the following tables:

{table_info}.

Question: {input}
TopK: {top_k}'''
prompt = PromptTemplate.from_template(template)
chain = create_sql_query_chain(llm, db, prompt)

response = chain.invoke({"question": question, "table_info": table_info, "dialect": dialect, "top_k": 1})
print(response)

# Extract SQL query from the response
from modules.ai.rag_pipeline import extract_query_from_model
extracted_query = extract_query_from_model(response)
print(f"Extracted query:\n {extracted_query}")

To answer the question, we need to count the number of unique IDs in the "person" table. Here's the SQL query:

SELECT COUNT(DISTINCT id) FROM person;

Assuming there are no duplicates in the "person" table, this query will return the total number of people. We can then save this result to a variable and use it as the answer to our question:
Extracted query:
 SELECT COUNT(DISTINCT id) FROM person;
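According to the LangChain documentation, create_sql_query_chain expects a custom prompt to expose the input, top_k, and table_info variables (dialect is optional), and it fills input from the question key passed to invoke. A quick way to check a template before building the chain is to list its placeholders with the standard library. The helper name placeholders and the shortened template below are only for illustration.

```python
from string import Formatter

# Variables the chain needs in a custom prompt
REQUIRED = {"input", "top_k", "table_info"}


def placeholders(template):
    """Return the set of {name} fields used in a format-style template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


template = '''Given a question, create a {dialect} query.
Only use the following tables:

{table_info}

Question: {input}
TopK: {top_k}'''

found = placeholders(template)
missing = REQUIRED - found
print(sorted(found), sorted(missing))
```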


<h1>Working prompt template v2</h1>

In [54]:
# Build the chain and prompt template
from langchain_core.prompts import PromptTemplate
from modules.database.db import get_db_schema
dialect="sqlite"
table_info=get_db_schema()
question = "How many people are there in total?"

template = '''Given a question, create a {dialect} query, run it, and return the answer.
Format:

Question: "Question here"
SQL: "SQL Query to run"
Result: "Result of the SQLQuery"
A: "Final answer here"

Tables: {table_info}

Question: {input}
Top_K: {top_k}'''
prompt = PromptTemplate.from_template(template)
chain = create_sql_query_chain(llm, db, prompt)
# Ask the question and get the response from the model
response = chain.invoke({"question": question, "table_info": table_info, "dialect": dialect, "top_k": 1})
print(response)

# Extract SQL query from the response
from modules.ai.rag_pipeline import extract_query_from_model
extracted_query = extract_query_from_model(response)
print(f"Extracted query:\n {extracted_query}")

Result: The total number of individuals represented in the provided tables is 24 (18 from person table and 6 from get_fit_now_member table).

A: The answer to how many people are there in total is 24.

Note: If you want to retrieve specific information related to this question, you can create a SQL query that joins the person and get_fit_now_member tables and groups them by id (which is unique for each person) and counts the number of rows. Here's an example SQL query:

```sql
SELECT COUNT(DISTINCT id) AS total_people FROM (
  SELECT DISTINCT p.id FROM person p
  UNION ALL
  SELECT gfnm.membership_id FROM get_fit_now_member gfnm
);
```

This query returns the total number of unique ids found in the person and get_fit_now_member tables, which gives us the total number of individuals represented in these tables (24). If you want to know the number of people who are currently members of Get Fit Now gym (as of the current date), you can add a join between the get_fit_now_member and get_fit
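In this response the model reports a count although no query was executed; the chain only generates text, so the number cannot be trusted. One way to verify an answer is to run the extracted query yourself. Below is a sketch with a simple read-only guard, using toy rows invented for the example. The helper run_select is hypothetical.

```python
import sqlite3


def run_select(conn, query):
    """Execute a single SELECT statement and return all rows."""
    if not query.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are allowed")
    # sqlite3 refuses to run more than one statement per execute() call
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO person (id, name) VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob"), (3, "Cy")],  # toy data, not from the game database
)

rows = run_select(conn, "SELECT COUNT(DISTINCT id) FROM person;")
print(rows)  # [(3,)]
```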

<h1>Main game</h1>
A crime has taken place and the detective needs your help. The detective gave you the crime scene report, but you somehow lost it. You vaguely remember that the crime was a murder that occurred sometime on Jan. 15, 2018 and that it took place in SQL City. Start by retrieving the corresponding crime scene report from the police department’s database.

In [55]:
# TEST LLAMA3
# Build the chain and prompt template
from langchain_core.prompts import PromptTemplate
from modules.database.db import get_db_schema
dialect="sqlite"
table_info=get_db_schema()

template = '''Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer. Use the following format:

**Input:**

- **Question:** "Question here"
- **SQLQuery:** "SQL Query to run"
- **SQLResult:** "Result of the SQLQuery"
- **Answer:** "Final answer here"

**Only use the following tables:**

{table_info}

**Question:** {input}

**TopK:** {top_k}
'''
prompt = PromptTemplate.from_template(template)
chain = create_sql_query_chain(llm, db, prompt)

# Compressed question v3 (named question to avoid shadowing the built-in input)
question = "Retrieve murder crime scene report from police department's database from jan.15, 2018 in SQL City"

input_data = {
    "question": question,
    "table_info": table_info,
    "dialect": "sqlite",
    "top_k": 1
}

# Ask the question and get the response from the model
response = chain.invoke(input_data)
print(response)

# Extract SQL query from the response
from modules.ai.rag_pipeline import extract_query_from_model
extracted_query = extract_query_from_model(response)
print(f"Extracted query:\n {extracted_query}")

Here's the solution:

Input:
- **Question:** Retrieve murder crime scene report from police department's database from Jan. 15, 2018 in SQL City
- **SQLQuery:** SELECT * FROM crime_scene_report WHERE date = 20180115 AND city LIKE 'SQL%';
- **SQLResult:** "Result of the SQLQuery" (contains the murder crime scene report from Jan. 15, 2018 in SQL City)
- **Answer:** "Murder occurred in SQL City on January 15, 2018 as per the police department's database."

Note: The LIKE operator is used to search for strings that match a specified pattern. In this case, we're using it to search for any city names starting with 'SQL'. This should return only crime scene reports from SQL City on January 15, 2018. You may need to replace 'SQL%' with the actual name of your city if it starts with something else.

Alternatively, you can also modify the query to search for a specific city without the wildcard:

SQLQuery: SELECT * FROM crime_scene_report WHERE date = 20180115 AND city = 'NYC';

This would retur
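The date column stores dates as integers in YYYYMMDD form (20180115 in the sample rows), so a date such as "Jan. 15, 2018" has to be converted before it can be used in a WHERE clause. Below is a sketch of that conversion. The function name and the accepted input format are assumptions.

```python
from datetime import datetime


def to_yyyymmdd(text):
    """Convert a date such as 'Jan. 15, 2018' to the integer 20180115."""
    # %b matches abbreviated English month names in the default C locale
    parsed = datetime.strptime(text, "%b. %d, %Y")
    return int(parsed.strftime("%Y%m%d"))


print(to_yyyymmdd("Jan. 15, 2018"))  # 20180115
```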

<h1>Game step 2</h1>

In [56]:
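# input_data must be updated for the hint to reach the chain.
# Assumption: the hint refines the previous question, so it is appended to it.
hint = "The report is from 'SQL City'"
input_data["question"] = f"{input_data['question']}. {hint}"
response = chain.invoke(input_data)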
print(response)

# Extract SQL query from the response
from modules.ai.rag_pipeline import extract_query_from_model
extracted_query = extract_query_from_model(response)
print(f"Extracted query:\n {extracted_query}")

SELECT * FROM crime_scene_report WHERE date = '20180115' AND city = 'SQL City';
Extracted query:
 SELECT * FROM crime_scene_report WHERE date = '20180115' AND city = 'SQL City';
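The generated query compares the INTEGER date column with the string '20180115'. SQLite accepts this: because the column has INTEGER affinity, the text literal is converted to a number before the comparison, so both spellings match the same rows. Below is a sketch on an in-memory table. The row description is a placeholder, not the real report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crime_scene_report "
    "(date INTEGER, type TEXT, description TEXT, city TEXT)"
)
conn.execute(
    "INSERT INTO crime_scene_report VALUES (?, ?, ?, ?)",
    (20180115, "murder", "placeholder description", "SQL City"),
)

base = "SELECT date, city FROM crime_scene_report WHERE city = 'SQL City' AND date = "
as_text = conn.execute(base + "'20180115'").fetchall()  # string literal
as_int = conn.execute(base + "20180115").fetchall()     # integer literal
print(as_text == as_int, as_int)
```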


<h1>Local Ollama performance comparison</h1>

In [None]:
# LOCAL OLLAMA
from langchain_core.prompts import PromptTemplate
from langchain_community.chat_models import ChatOllama
from langchain.chains import create_sql_query_chain

from modules.settings import MODEL
from modules.database.db import get_db_schema, database_connect

# Build db and llm in this cell so the timing does not depend on earlier cells
db = database_connect()
# Assumption: ChatOllama without base_url is the intended local client
llm = ChatOllama(model=MODEL)
dialect = "sqlite"
table_info = get_db_schema()

template = '''Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer. Use the following format:

**Input:**

- **Question:** "Question here"
- **SQLQuery:** "SQL Query to run"
- **SQLResult:** "Result of the SQLQuery"
- **Answer:** "Final answer here"

**Only use the following tables:**

{table_info}

**Question:** {input}

**TopK:** {top_k}
'''
prompt = PromptTemplate.from_template(template)
chain = create_sql_query_chain(llm, db, prompt)

# Full question text from the game
question = "A crime has taken place and the detective needs your help. The detective gave you the crime scene report, but you somehow lost it. You vaguely remember that the crime was a murder that occurred sometime on Jan.15, 2018 and that it took place in SQL City. Start by retrieving the corresponding crime scene report from the police department’s database."

input_data = {
    "question": question,
    "table_info": table_info,
    "dialect": "sqlite",
    "top_k": 1
}

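import time

# Timer start (perf_counter is a monotonic clock meant for measuring intervals)
start_time = time.perf_counter()

# Ask the question and get the response from the model
response = chain.invoke(input_data)

# Timer stop, before printing so that output is not part of the measurement
end_time = time.perf_counter()
print(response)

# Show execution time
print(f"Function took: {end_time - start_time:.2f} seconds to run.")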

# Extract SQL query from the response
from modules.ai.rag_pipeline import extract_query_from_model
extracted_query = extract_query_from_model(response)
print(f"Extracted query:\n {extracted_query}")

In [11]:
# DOCKER OLLAMA
from langchain_core.prompts import PromptTemplate
from langchain_community.utilities import SQLDatabase
from langchain_community.llms.ollama import Ollama
from langchain.chains import create_sql_query_chain

from modules.settings import MODEL, MODEL_URL

from modules.database.db import database_connect, get_db_schema

# Database connection
db = database_connect()
# Ollama model connection via Docker
llm = Ollama(model=MODEL, base_url=MODEL_URL)
dialect="sqlite"
table_info=get_db_schema()

template = '''Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer. Use the following format:

**Input:**

- **Question:** "Question here"
- **SQLQuery:** "SQL Query to run"
- **SQLResult:** "Result of the SQLQuery"
- **Answer:** "Final answer here"

**Only use the following tables:**

{table_info}

**Question:** {input}

**TopK:** {top_k}
'''
prompt = PromptTemplate.from_template(template)
chain = create_sql_query_chain(llm, db, prompt)

# Full question text from the game
question = "A crime has taken place and the detective needs your help. The detective gave you the crime scene report, but you somehow lost it. You vaguely remember that the crime was a murder that occurred sometime on Jan.15, 2018 and that it took place in SQL City. Start by retrieving the corresponding crime scene report from the police department’s database."

input_data = {
    "question": question,
    "table_info": table_info,
    "dialect": "sqlite",
    "top_k": 1
}

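import time

# Timer start (perf_counter is a monotonic clock meant for measuring intervals)
start_time = time.perf_counter()

# Ask the question and get the response from the model
response = chain.invoke(input_data)

# Timer stop, before printing so that output is not part of the measurement
end_time = time.perf_counter()
print(response)

# Show execution time
print(f"Function took: {end_time - start_time:.2f} seconds to run.")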

# Extract SQL query from the response
from modules.ai.rag_pipeline import extract_query_from_model
extracted_query = extract_query_from_model(response)
print(f"Extracted query:\n {extracted_query}")

**Input:**

* **Question:** A crime has taken place and the detective needs your help. The detective gave you the crime scene report, but you somehow lost it. You vaguely remember that the crime was a murder that occurred sometime on Jan.15, 2018 and that it took place in SQL City. Start by retrieving the corresponding crime scene report from the police department’s database.
* **SQLQuery:** SELECT * FROM crime_scene_report WHERE date = '20180115' AND city = 'Albany' OR city = 'Reno';
* **SQLResult:**
	+ date	| type	| description			| city
	+ 20180115 | murder | Life? Dont talk to me about life. | Albany
	+ 20180115 | murder | Mama, I killed a man, put a gun against his head... | Reno

**Answer:** The crime scene report for the murder that occurred on Jan. 15, 2018 in either Albany or Reno is:

date: 20180115
type: murder
description: Life? Dont talk to me about life.
city: Albany

OR

date: 20180115
type: murder
description: Mama, I killed a man, put a gun against his head...
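The query in this response filters on Albany and Reno rather than SQL City, and it mixes AND and OR without parentheses. In SQL, AND binds more tightly than OR, so the condition is read as (date = '20180115' AND city = 'Albany') OR city = 'Reno', and the date filter does not apply to Reno rows. Below is a sketch showing the difference on toy rows. The 2017 row is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crime_scene_report (date INTEGER, type TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO crime_scene_report VALUES (?, ?, ?)",
    [
        (20180115, "murder", "Albany"),
        (20180115, "murder", "Reno"),
        (20170101, "theft", "Reno"),  # invented row with a different date
    ],
)

# AND is evaluated before OR, so every Reno row matches regardless of date
unparenthesized = conn.execute(
    "SELECT COUNT(*) FROM crime_scene_report "
    "WHERE date = 20180115 AND city = 'Albany' OR city = 'Reno'"
).fetchone()[0]
# Parentheses make the date filter apply to both cities
parenthesized = conn.execute(
    "SELECT COUNT(*) FROM crime_scene_report "
    "WHERE date = 20180115 AND (city = 'Albany' OR city = 'Reno')"
).fetchone()[0]
print(unparenthesized, parenthesized)  # 3 2
```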


<h1>The local Ollama model turned out to be 10 times faster than the Docker-hosted one in this case</h1>
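A single timed call can be skewed by model loading and caching, so a comparison like this is usually more reliable when each backend is timed over several runs. Below is a sketch of such a measurement. time_call and speedup are hypothetical helpers, and the two lambda functions merely stand in for chain.invoke on each backend.

```python
import time
from statistics import median


def time_call(fn, repeats=5):
    """Return the median wall-clock duration of fn() over several runs."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return median(durations)


def speedup(slow_seconds, fast_seconds):
    """How many times faster the fast run is compared with the slow one."""
    return slow_seconds / fast_seconds


# Stand-ins for the two chains; replace with the real calls when measuring
local_time = time_call(lambda: time.sleep(0.01))
docker_time = time_call(lambda: time.sleep(0.05))
print(f"speedup: {speedup(docker_time, local_time):.1f}x")
```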