## Divide-and-Conquer CoT

In [1]:
import os

from dotenv import load_dotenv
from openai import OpenAI

from src.cand_gen.utils import divide_and_conquer, parse_sql_cand
from src.model.inference_endpoints import LLM

# Read BASE_URL, API_KEY, and DATABASE_ROOT_PATH from a local .env file.
load_dotenv()

model = 'tgi'

client = OpenAI(
    base_url=os.environ['BASE_URL'],
    api_key=os.environ['API_KEY']
)

llm = LLM(
    client=client,
    model=model,
    gen_params={
        'STREAM': False,
        'TEMPERATURE': 0,
        'MAX_NEW_TOKENS': 2048,
    },
)

database = "california_schools"
database_path = f"{os.environ['DATABASE_ROOT_PATH']}/{database}"
ir = [
    "`schools`.`City`.`San Diego`",
    "`frpm`.`Low Grade`",
    "`frpm`.`School Name`.`Vidya Mandir`",
    "`frpm`.`CDSCode`",
    "`schools`.`CDSCode`",
    "`schools`.`State`",
    "`schools`.`Latitude`",
]
test_question = "In which city can you find the school in the state of California with the lowest latitude coordinates and what is its lowest grade? Indicate the school name."
test_hint = "State of California refers to state = 'CA'"
test_ground_truth = "SELECT T2.City, T1.`Low Grade`, T1.`School Name` FROM frpm AS T1 INNER JOIN schools AS T2 ON T1.CDSCode = T2.CDSCode WHERE T2.State = 'CA' ORDER BY T2.Latitude ASC LIMIT 1"

answer = divide_and_conquer(
    database_name=database,
    database_path=database_path,
    ir=ir,
    question=test_question,
    hint=test_hint,
    model=llm,
)


In [2]:
print(parse_sql_cand(answer))

with the lowest latitude coordinates and what is its lowest grade? Indicate the school name.
* **Analysis:** We need to find the school with the lowest latitude in the state of California. The school name and lowest grade are required as output.
* **Pseudo SQL:** SELECT 'T1'.'City', 'T1'.'Low Grade', 'T2'.'Latitude' FROM 'frpm' AS 'T1' INNER JOIN 'schools' AS 'T2' ON 'T1'.'CDSCode' = 'T2'.'CDSCode' WHERE 'T2'.'State' = 'CA' AND 'T2'.'Latitude' = (SELECT MIN('T2'.'Latitude') FROM 'schools' AS 'T2' WHERE 'T2'.'State' = 'CA')
* **Sub-question 1:** school with the lowest latitude in the state of California
* **Analysis:** To get the lowest latitude, we can use the MIN() function on the 'Latitude' column and filter the rows with 'State' = 'CA'.
* **Pseudo SQL:** SELECT MIN('T2'.'Latitude') FROM 'schools' AS 'T2' WHERE 'T2'.'State' = 'CA'
* **Sub-question 2:** school name and lowest grade in the city with the lowest latitude
* **Analysis:** The school name and lowest grade can be obtained fr

In [3]:
print(answer)

* **Main Question:** In which city can you find the school in the state of California with the lowest latitude coordinates and what is its lowest grade? Indicate the school name.
* **Analysis:** We need to find the school with the lowest latitude in the state of California. The school name and lowest grade are required as output.
* **Pseudo SQL:** SELECT 'T1'.'City', 'T1'.'Low Grade', 'T2'.'Latitude' FROM 'frpm' AS 'T1' INNER JOIN 'schools' AS 'T2' ON 'T1'.'CDSCode' = 'T2'.'CDSCode' WHERE 'T2'.'State' = 'CA' AND 'T2'.'Latitude' = (SELECT MIN('T2'.'Latitude') FROM 'schools' AS 'T2' WHERE 'T2'.'State' = 'CA')
* **Sub-question 1:** school with the lowest latitude in the state of California
* **Analysis:** To get the lowest latitude, we can use the MIN() function on the 'Latitude' column and filter the rows with 'State' = 'CA'.
* **Pseudo SQL:** SELECT MIN('T2'.'Latitude') FROM 'schools' AS 'T2' WHERE 'T2'.'State' = 'CA'
* **Sub-question 2:** school name and lowest grade in the city with t
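
The decomposition above reaches the lowest latitude through a `MIN()` subquery, while `test_ground_truth` uses `ORDER BY ... LIMIT 1`. A minimal sketch of comparing the two forms by execution result follows. It assumes a toy in-memory SQLite schema with made-up rows (the real `california_schools` tables have many more columns), and the `MIN()` version reuses the ground truth's column list rather than the columns in the pseudo SQL.

```python
import sqlite3


def run(conn, sql):
    """Execute sql and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# Toy schema and rows: assumptions for illustration, not the real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schools (CDSCode TEXT, City TEXT, State TEXT, Latitude REAL);
    CREATE TABLE frpm (CDSCode TEXT, "School Name" TEXT, "Low Grade" TEXT);
    INSERT INTO schools VALUES
        ('1', 'Toy City A', 'CA', 32.5),
        ('2', 'Toy City B', 'CA', 40.0);
    INSERT INTO frpm VALUES
        ('1', 'Toy School A', 'K'),
        ('2', 'Toy School B', '9');
""")

# Same text as test_ground_truth in the first cell.
order_by_sql = (
    "SELECT T2.City, T1.`Low Grade`, T1.`School Name` "
    "FROM frpm AS T1 INNER JOIN schools AS T2 ON T1.CDSCode = T2.CDSCode "
    "WHERE T2.State = 'CA' ORDER BY T2.Latitude ASC LIMIT 1"
)

# MIN() subquery form, following the structure of the decomposition.
min_subquery_sql = (
    "SELECT T2.City, T1.`Low Grade`, T1.`School Name` "
    "FROM frpm AS T1 INNER JOIN schools AS T2 ON T1.CDSCode = T2.CDSCode "
    "WHERE T2.State = 'CA' AND T2.Latitude = "
    "(SELECT MIN(Latitude) FROM schools WHERE State = 'CA')"
)

agree_without_nulls = run(conn, order_by_sql) == run(conn, min_subquery_sql)

# Add a California school whose latitude is missing.
conn.execute("INSERT INTO schools VALUES ('3', 'Toy City C', 'CA', NULL)")
conn.execute("INSERT INTO frpm VALUES ('3', 'Toy School C', '6')")

agree_with_nulls = run(conn, order_by_sql) == run(conn, min_subquery_sql)

print(agree_without_nulls, agree_with_nulls)
```

On this toy data the two forms agree until a `NULL` latitude appears: SQLite sorts `NULL` first in ascending order, whereas `MIN()` ignores it. Whether that matters for the real database depends on its contents, which this sketch does not inspect.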

## Query Plan CoT

In [4]:
from src.cand_gen.utils import query_plan_cot

answer = query_plan_cot(
    database_name=database,
    database_path=database_path,
    ir=ir,
    question=test_question,
    hint=test_hint,
    model=llm,
)

In [5]:
print(answer)

**Preparation Steps:**
1. Initialize the process: Start preparing to execute the query.
2. Prepare storage: Set up storage space (registers) to hold temporary results, initializing them to NULL.
3. Open the schools table: Open the schools table so we can read from it.
4. Open the coordinates table: Open the coordinates table so we can read from it.
**Finding the School with Lowest Latitude:**
1. Start reading the coordinates table: Move to the first row in the coordinates table.
2. Check if the state matches: Look at the state column of the current row in coordinates. If it's not 'CA', skip this row.
3. Identify the matching row: Store the identifier (row ID) of this coordinates entry.
4. Find the corresponding school row: Use the row ID from coordinates to directly find the matching row in schools.
5. Check if the current school has the lowest latitude: Compare the latitude value with the current minimum value. If it's lower, store the school ID, name, and grade as the new minimum.
6.

In [6]:
print(parse_sql_cand(answer))

with Lowest Latitude:**
1. Start reading the coordinates table: Move to the first row in the coordinates table.
2. Check if the state matches: Look at the state column of the current row in coordinates. If it's not 'CA', skip this row.
3. Identify the matching row: Store the identifier (row ID) of this coordinates entry.
4. Find the corresponding school row: Use the row ID from coordinates to directly find the matching row in schools.
5. Check if the current school has the lowest latitude: Compare the latitude value with the current minimum value. If it's lower, store the school ID, name, and grade as the new minimum.
6. Move to the next row in coordinates: Go back to the coordinates table and move to the next row, repeating the process until all rows are checked.
**Delivering the Result:**
1. Output the result: Return the city, school name, and lowest grade for the school with the lowest latitude coordinates in California.
2. End the process: Stop the query execution process.
3. Setup
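
The plan above describes a row-by-row scan that filters on the state, looks up the matching row in the second table, and keeps a running minimum. A rough Python sketch of that scan is shown here, under some assumptions: it uses the `schools` and `frpm` tables from `ir` (the "coordinates" table named in the output does not appear in `ir`), and all rows are made-up toy values.

```python
# Toy rows (hypothetical values, not taken from the real database).
schools = [
    {"CDSCode": "1", "City": "Toy City A", "State": "CA", "Latitude": 32.5},
    {"CDSCode": "2", "City": "Toy City B", "State": "CA", "Latitude": 40.0},
    {"CDSCode": "3", "City": "Toy City C", "State": "NV", "Latitude": 30.0},
]
frpm = {
    "1": {"School Name": "Toy School A", "Low Grade": "K"},
    "2": {"School Name": "Toy School B", "Low Grade": "9"},
    "3": {"School Name": "Toy School C", "Low Grade": "6"},
}


def lowest_latitude_school(schools, frpm, state="CA"):
    """Scan schools once and return (city, school name, low grade) for the
    lowest-latitude school in the given state, or None if nothing matches."""
    best = None  # running minimum: (latitude, city, school name, low grade)
    for row in schools:
        if row["State"] != state:
            continue  # skip rows from other states
        match = frpm.get(row["CDSCode"])  # look up the joined row by key
        if match is None or row["Latitude"] is None:
            continue  # no joined row, or no latitude to compare
        if best is None or row["Latitude"] < best[0]:
            best = (row["Latitude"], row["City"],
                    match["School Name"], match["Low Grade"])
    return None if best is None else best[1:]


result = lowest_latitude_school(schools, frpm)
print(result)
```

The Nevada row has the smallest latitude overall but is skipped by the state filter, which mirrors step 2 of the plan.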

## Synthetic Example Generation

In [7]:
from src.cand_gen.utils import run_synth_gen_pipeline

answer = run_synth_gen_pipeline(
    database_name=database,
    database_path=database_path,
    ir=ir,
    question=test_question,
    hint=test_hint,
    model=llm,
)

In [8]:
print(parse_sql_cand(answer))

SELECT s.City, f.School Name, f.Low Grade
FROM schools s
JOIN frpm f ON s.CDSCode = f.CDSCode
WHERE s.State = 'CA'
AND s.Latitude = (SELECT MIN(Latitude) FROM schools WHERE State = 'CA')
LIMIT 1
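
The candidate above leaves `School Name` and `Low Grade` unquoted, so SQLite would most likely read `f.School Name` as a column `School` aliased to `Name` and reject the query. One way to catch this before scoring is to execute each candidate and record whether it runs. The sketch below assumes a toy in-memory schema with made-up rows, and `check_candidate` is a hypothetical helper, not part of `src.cand_gen.utils`.

```python
import sqlite3


def check_candidate(conn, sql):
    """Return (True, rows) if sql executes, else (False, error message)."""
    try:
        return True, conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return False, str(exc)


# Toy schema and rows: assumptions for illustration, not the real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schools (CDSCode TEXT, City TEXT, State TEXT, Latitude REAL);
    CREATE TABLE frpm (CDSCode TEXT, "School Name" TEXT, "Low Grade" TEXT);
    INSERT INTO schools VALUES
        ('1', 'Toy City A', 'CA', 32.5),
        ('2', 'Toy City B', 'CA', 40.0);
    INSERT INTO frpm VALUES
        ('1', 'Toy School A', 'K'),
        ('2', 'Toy School B', '9');
""")

# Candidate exactly as printed above.
raw = """SELECT s.City, f.School Name, f.Low Grade
FROM schools s
JOIN frpm f ON s.CDSCode = f.CDSCode
WHERE s.State = 'CA'
AND s.Latitude = (SELECT MIN(Latitude) FROM schools WHERE State = 'CA')
LIMIT 1"""

# Same query with the two space-containing column names quoted.
quoted = (
    raw.replace("f.School Name", "f.`School Name`")
       .replace("f.Low Grade", "f.`Low Grade`")
)

raw_ok, raw_result = check_candidate(conn, raw)
quoted_ok, quoted_result = check_candidate(conn, quoted)

print(raw_ok, raw_result)
print(quoted_ok, quoted_result)
```

The string replacement here is specific to this one candidate; a general fix would need the schema's column names to decide what to quote.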
