# SQL Query Engine Sample
This notebook demonstrates the implementation and usage of the SQL query engine.

## IMPORTANT
If you are going to use this notebook, please make a copy of it and rename the copy so that this notebook is preserved for others as an example!

In [1]:
from currensee.schema.schema import PostgresTables
from currensee.query_engines.sql_query_engine.query_engine import create_sql_workflow
from currensee.utils.db_utils import create_pg_engine

In [2]:
# nest_asyncio lets asyncio event loops be nested, so the workflow's async code can run inside Jupyter's already-running event loop

import nest_asyncio

nest_asyncio.apply()
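Jupyter already runs an asyncio event loop, and plain asyncio refuses to start a second loop inside a running one. The minimal sketch below uses only the standard library (it does not touch currensee) to reproduce the error that `nest_asyncio.apply()` works around; `inner` and `outer` are illustrative names. Run as a plain Python script, it prints the error shown; after `nest_asyncio.apply()` the nested `asyncio.run()` call is allowed instead.

```python
import asyncio


async def inner():
    return 42


async def outer():
    # We are now inside a running event loop, just like a Jupyter cell.
    coro = inner()
    try:
        # Without nest_asyncio, asyncio.run() refuses to nest a second loop.
        return asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"


print(asyncio.run(outer()))
# As a plain script, prints: RuntimeError: asyncio.run() cannot be called from a running event loop
```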

## Create the SQL Workflow

The SQL workflow can take the following parameters:

1. source_db: the name of the database where the table(s) are stored (e.g. `crm`)
2. table_description_mapping: a dictionary that maps the name of each table the query engine should have access to onto a description of that table
   * table names must be lowercase
   * multiple tables can be passed if you want the query engine to try to join tables that may be related to one another in its queries
   * **THIS IS LEVEL 2!** Do not attempt it until you get the hang of using one table at a time!
3. text_to_sql_tmpl: a string containing the prompt that tells the LLM how to turn the input question into a SQL query
   * defaults to the variable `text_to_sql_tmpl` defined in the `currensee.query_engines.prompting` module
   * you may override this by passing in your own string
4. response_synthesis_prompt_str: a string containing the prompt that tells the LLM how to synthesize the final response from the SQL query results
   * defaults to the variable `response_synthesis_prompt_str` defined in the `currensee.query_engines.prompting` module
   * you may override this by passing in your own string
5. model: the name of the model to use for all of the tasks
   * defaults to `gemini-1.5-flash`
   * you may override this with any of the models listed at https://ai.google.dev/gemini-api/docs/models#model-variations, using the dashed model name from the "Model variant" column
   * **BE VERY CAREFUL TO CHECK THE PRICING!** I recommend using the default model until you understand the other models better.
6. temperature: the temperature parameter to pass to the model
   * defaults to 0.0
   * the higher the temperature, the more varied ("creative") the output; keep it low for SQL query generation
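To make the temperature parameter concrete, the sketch below shows the standard way temperature reshapes next-token probabilities: the raw scores (logits) are divided by the temperature before a softmax. This is a generic illustration with made-up scores, not a description of Gemini's internal sampling; treating 0.0 as greedy decoding (always pick the top-scoring token) is a common convention that is assumed here.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities, scaled by temperature."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0.0:
        # Assumed convention: temperature 0 means greedy decoding, so all
        # probability mass goes to the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token, which is why a low value suits SQL generation, where you want the same syntactically valid query every time.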

In [3]:
employee_table_description = """
    Contains data about each financial advisor working at Bankwell Financial
    Columns:
     - employee_id (PK)
     - employee_first_name
     - employee_last_name
     - title
     - email
     - phone
     - hire_date
     - department
"""

### Below is the default `text_to_sql_tmpl` defined in `prompting.py`

In [4]:
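# raw string so the '\y' below stays a literal backslash-y without Python's invalid-escape-sequence warning
text_to_sql_tmpl = r"""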
    Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
    You can order the results by the find_date column (from earliest to latest) to return the most interesting examples in the database.

    GUIDELINES:
    * Never query for all the columns from a specific table, only ask for a few relevant columns given the question.
    * Pay attention to use only the column names that you can see in the schema description.
    * Be careful to not query for columns that do not exist.
    * Pay attention to which column is in which table.
    * Make sure to filter on all criteria mentioned in the query.
    * If using a LIMIT to restrict the results, make sure it comes only at the end of the query.

    IMPORTANT NOTE:
    * Use the ~* operator instead of = when filtering with WHERE on text columns.
    * Add word boundaries '\y' to the beginning and end of each search term in the query.

    You are required to use the following format, each taking one line:

    Question: Question here
    SQLQuery: SQL Query to run
    SQLResult: Result of the SQLQuery
    Answer: Final answer here

    Only use tables listed below.
    {schema}

    Question: {query_str}
    SQLQuery:

"""
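The two IMPORTANT NOTE rules ask for PostgreSQL's case-insensitive regex operator `~*` with `\y` word boundaries, e.g. `WHERE employee_last_name ~* '\ySmith\y'`, instead of an exact `=` match. The sketch below approximates that behaviour with Python's `re` module, where `\b` plays the role of PostgreSQL's `\y`. The helper name `pg_word_match` and the sample names are illustrative, and the two regex dialects differ in some edge cases.

```python
import re


def pg_word_match(term: str, text: str) -> bool:
    r"""Approximate PostgreSQL's  text ~* '\yterm\y'  in Python.

    ~* is a case-insensitive regex match, and \y marks a word boundary in
    PostgreSQL; Python's re module spells the word boundary \b.
    """
    pattern = rf"\b{re.escape(term)}\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# "Smithson" is a made-up name added to show the word boundary at work.
last_names = ["Smith", "Johnson", "Smithson"]

# An exact, case-sensitive comparison (like SQL '=') misses "Smith".
print([name for name in last_names if name == "smith"])  # → []

# The ~* / \y style match is case-insensitive but still whole-word.
print([name for name in last_names if pg_word_match("smith", name)])  # → ['Smith']
```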

### Below is the default `response_synthesis_prompt_str` defined in `prompting.py`

In [5]:
response_synthesis_prompt_str = """

    Query: {query_str}
    SQL: {sql_query}
    SQL Response: {context_str}

    IMPORTANT INSTRUCTIONS:
    * If SQL Response is empty or 0, apologise and mention that you could not find
     examples to answer the query.
    * In such cases, kindly nudge the user towards providing more details or refining
    their search.
    * Additionally, you can tell them to rephrase specific keywords.
    * Do not explicitly state phrases such as 'based on the SQL query executed' or related
     references to context in your Response.
    * Never mention the underlying SQL query, or the underlying SQL tables and other database elements
    * Never mention that sql was used to answer this question

    Considering the IMPORTANT INSTRUCTIONS above, create a response using the information
    returned from the database and no prior knowledge.


    Response:
"""
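If you override either prompt, your string presumably needs to keep the same placeholders as the defaults above: `{dialect}`, `{schema}` and `{query_str}` for `text_to_sql_tmpl`, and `{query_str}`, `{sql_query}` and `{context_str}` for `response_synthesis_prompt_str`. The sketch below lists the placeholders in a template with the standard library's `string.Formatter` and fills one by hand with `str.format`. It assumes the engine uses plain `{name}` substitution, which matches the templates above but has not been checked against the currensee source; `missing_placeholders` is a hypothetical helper, and the filled-in SQL result is made up.

```python
from string import Formatter

# Placeholders used by the default templates shown in this notebook.
TEXT_TO_SQL_FIELDS = {"dialect", "schema", "query_str"}
RESPONSE_FIELDS = {"query_str", "sql_query", "context_str"}


def missing_placeholders(template: str, required: set[str]) -> set[str]:
    """Return the required {placeholders} that a custom template lacks (hypothetical helper)."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion).
    found = {field for _, field, _, _ in Formatter().parse(template) if field}
    return required - found


my_response_prompt = """
    Query: {query_str}
    SQL Response: {context_str}
    Answer in one short paragraph.
    Response:
"""

print(missing_placeholders(my_response_prompt, RESPONSE_FIELDS))  # → {'sql_query'}

# Filling the placeholders by hand shows roughly what the LLM would receive.
print(my_response_prompt.format(
    query_str="Who works for our company?",
    context_str="[('Jane', 'Moneypenny'), ('Amanda', 'Pennington')]",  # made-up SQL result
))
```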

### Define the DB information
**IMPORTANT**: The table names (the keys of `table_description_mapping`) MUST be lowercase in order for the engine to find them.

In [6]:
source_db = 'crm'
table_description_mapping = {
    'employees': employee_table_description,
}
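Because the engine only finds lowercase table names, a small guard such as the hypothetical `lowercase_table_mapping` helper below could normalise the keys before they are passed to `create_sql_workflow`. It is a sketch, not part of currensee; it raises an error if two keys would collide once lowercased rather than silently dropping one.

```python
def lowercase_table_mapping(mapping: dict[str, str]) -> dict[str, str]:
    """Return a copy of mapping whose table names are lowercase (hypothetical helper)."""
    normalized: dict[str, str] = {}
    for table_name, description in mapping.items():
        key = table_name.lower()
        if key in normalized:
            # e.g. 'Employees' and 'employees' would otherwise overwrite each other
            raise ValueError(f"duplicate table name after lowercasing: {key!r}")
        normalized[key] = description
    return normalized


mapping = lowercase_table_mapping({"Employees": "Contains data about each financial advisor ..."})
print(list(mapping))  # → ['employees']
```

In this notebook that would look like `table_description_mapping = lowercase_table_mapping(table_description_mapping)`.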

In [7]:
sql_workflow = create_sql_workflow(
    source_db=source_db,
    table_description_mapping=table_description_mapping,
    text_to_sql_tmpl=text_to_sql_tmpl,
    response_synthesis_prompt_str=response_synthesis_prompt_str,
)

## Define the Query

In [8]:
query = "Who works for our company?"

## Run the Query and Output the Response

In [9]:
result = await sql_workflow.run(query=query)

Running step generate_sql_response
Step generate_sql_response produced event StopEvent
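The top-level `await` above works because Jupyter (IPython) runs cells inside an event loop. In a plain Python script, `await` outside a function is a syntax error, so you would put the `await sql_workflow.run(query=query)` call inside an `async def main()` and start it with `asyncio.run(main())`. The sketch below shows that pattern with a hypothetical `FakeWorkflow` stand-in, so it runs without a database or API key.

```python
import asyncio


class FakeWorkflow:
    """Hypothetical stand-in for the object returned by create_sql_workflow."""

    async def run(self, query: str) -> str:
        await asyncio.sleep(0)  # placeholder for the real async LLM and database calls
        return f"(pretend answer to: {query})"


async def main() -> str:
    workflow = FakeWorkflow()
    # Awaiting inside a coroutine works both in scripts and in notebooks.
    return await workflow.run(query="Who works for our company?")


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Wrapping the `await` in a coroutine, rather than passing the workflow's return value straight to `asyncio.run()`, keeps the pattern valid even if `run()` returns an awaitable handler instead of a plain coroutine.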


In [10]:
result

Response(response="Here's a list of employees who work for the company: Jane Moneypenny, Amanda Pennington, Hayley Garcia, Dylan Williams, John Montes, Andrew Guerra, Samantha Reynolds, James Craig, Eric Ortiz, Kristina Glover, Dawn Reed, Regina Smith, Sara Johnson, Hayley Howard, Michelle Benson, Joseph Taylor, Steven Massey, Jessica Howard, Gavin Church, Peter Williams, Amanda Harrison, Kaylee Cruz, Lauren Hale, Michael Long, Joshua Farmer, Johnny Vargas, Joseph Harrison, Caitlyn Alexander, Marcus Bailey, John Murray, Brian Bradshaw, Daniel Taylor, James Rivera, Andrew Snow, Melissa Lee, Lisa Morris, Phyllis Wells, Stephanie Fisher, Jamie Campbell, Kari Johnson, Kristin Murray, Erica Franklin, Joseph Hamilton, James Strong, Nancy Jackson, Paul Klein, Katherine Walsh, Pamela Rosario, Richard Mccormick, Timothy Poole, Jennifer Gill, Walter Carpenter, Beth Cook, Michael Norton, Douglas Adams, Sara Ramirez, Bridget Velez, Crystal Lopez, April Shelton, Steven Spencer, Carol Gomez, Maria H