# SQL Query Engine Sample
This notebook demonstrates the implementation and usage of the SQL query engine.

## IMPORTANT
Before using this notebook, please make a copy of it under a new name so that
the original is preserved as an example for others!

In [1]:
from currensee.schema.schema import PostgresTables
from currensee.query_engines.sql_query_engine.query_engine import create_sql_workflow
from currensee.utils.db_utils import create_pg_engine
from currensee.query_engines.workflow_descriptions import outlook_email_table_desc, outlook_meeting_table_desc

In [2]:
# required to run asynchronous code

import nest_asyncio

nest_asyncio.apply()
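
For context, Jupyter already runs an event loop, which is presumably why `nest_asyncio` is applied here before the workflow is awaited. In a plain Python script there is no running loop and top-level `await` is unavailable, so a coroutine would be driven with `asyncio.run` instead. A minimal sketch of that alternative, where `fake_workflow_run` is a hypothetical stand-in for an async call and not part of this project:

```python
import asyncio


async def fake_workflow_run(query: str) -> str:
    """Hypothetical stand-in for an async workflow call; not the real engine."""
    await asyncio.sleep(0)  # yield to the event loop once, as real I/O would
    return f"echo: {query}"


def run_in_script(query: str) -> str:
    # asyncio.run creates a fresh event loop, runs the coroutine, then closes the loop
    return asyncio.run(fake_workflow_run(query))


script_result = run_in_script("Show all emails from Jane")
print(script_result)
```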

## Create the SQL Workflow

The SQL workflow can take the following parameters:

1. source_db: the name of the database where the table is stored (e.g. `crm`)
2. source_tables: a list of the name(s) of the table(s) that we want the query engine to have access to
  * multiple tables can be passed if you want the query engine to try joining
    related tables in its queries
  * THIS IS LEVEL 2!! Do not attempt it until you have the hang of using one table at a time!!
    
3. table_descriptions: a list of the description(s) of the table(s) passed above
4. text_to_sql_tmpl: a string containing the prompt telling the LLM how to produce the SQL query from the text given
   * defaults to the variable `text_to_sql_tmpl` defined in `currensee.query_engines.prompting.py`
   * you may override this by passing in your own string
5. response_synthesis_prompt_str: a string containing the prompt telling the LLM how to synthesize the final response from the SQL table(s)
   * defaults to the variable `response_synthesis_prompt_str` defined in `currensee.query_engines.prompting.py`
   * you may override this by passing in your own string
6. model: the name of the model to use for all of the tasks
   * defaults to `gemini-1.5-flash`
   * you may override this with any of the models defined at https://ai.google.dev/gemini-api/docs/models#model-variations using the string with dashes defined in the "Model variant" column.
   * **BE VERY CAREFUL TO PAY ATTENTION TO THE PRICING!!!!!** I recommend that you use the default model until you understand the other models better!!!
7. temperature: the temperature parameter to pass to the model
   * default is 0.0
   * higher temperatures make the output more varied ("creative"); keep it low for SQL query generation, where repeatable output matters more.

In [3]:
from sqlalchemy import inspect, text

# connect to the 'outlook' database and list the tables it contains
engine = create_pg_engine('outlook')
inspector = inspect(engine)
tables = inspector.get_table_names()
print(tables)


['email_data', 'meeting_data']


In [4]:
# print the column names of each table
for table in ['email_data', 'meeting_data']:
    columns = [col['name'] for col in inspect(engine).get_columns(table)]
    print(f"{table}: {columns}")

email_data: ['email_timestamp', 'to_names', 'to_emails', 'from_name', 'from_email', 'email_subject', 'email_body']
meeting_data: ['meeting_timestamp', 'host', 'host_email', 'invitees', 'invitee_emails', 'meeting_subject']


### Below is the default `text_to_sql_tmpl` defined in `prompting.py`

In [5]:
# raw string, so that the regex word boundary '\y' below is kept literally
text_to_sql_tmpl = r"""
    Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
    You can order the results by the find_date column (from earliest to latest) to return the most interesting examples in the database.

    GUIDELINES:
    * Never query for all the columns from a specific table, only ask for a few relevant columns given the question.
    * Pay attention to use only the column names that you can see in the schema description.
    * Be careful to not query for columns that do not exist.
    * Pay attention to which column is in which table.
    * Make sure to filter on all criteria mentioned in the query.
    * If using a LIMIT to restrict the results, make sure it comes only in the end of the query.

    IMPORTANT NOTE:
    * Use the ~* operator instead of = when filtering with WHERE on text columns.
    * Add word boundaries '\y' to the beginning and end of each search term in the query.

    You are required to use the following format, each taking one line:

    Question: Question here
    SQLQuery: SQL Query to run
    SQLResult: Result of the SQLQuery
    Answer: Final answer here

    Only use tables listed below.
    {schema}

    Question: {query_str}
    SQLQuery:

"""


### Below is the default `response_synthesis_prompt_str` defined in `prompting.py`

In [6]:
response_synthesis_prompt_str = """

    Query: {query_str}
    SQL: {sql_query}
    SQL Response: {context_str}

    IMPORTANT INSTRUCTIONS:
    * If SQL Response is empty or 0, apologise and mention that you could not find
     examples to answer the query.
    * In such cases, kindly nudge the user towards providing more details or refining
    their search.
    * Additionally, you can tell them to rephrase specific keywords.
    * Do not explicitly state phrases such as 'based on the SQL query executed' or related
     references to context in your Response.
    * Never mention the underlying sql query, or the underlying sql tables and other database elements
    * Never mention that sql was used to answer this question

    Considering the IMPORTANT INSTRUCTIONS above, create a response using the information
    returned from the database and no prior knowledge.


    Response:
"""

### Define the DB information
**IMPORTANT**: The table names MUST be lowercase in order for the engine to find them.

In [7]:
source_db = 'outlook'
table_description_mapping = {
    'email_data': outlook_email_table_desc,
    'meeting_data': outlook_meeting_table_desc
}
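
Because the table names must be lowercase, it can help to normalise the mapping keys before handing them to the engine. A small sketch, assuming the descriptions are plain strings; `normalize_table_keys` is a hypothetical helper and not part of `currensee`:

```python
def normalize_table_keys(mapping: dict[str, str]) -> dict[str, str]:
    """Return a copy of the mapping with every table name lowercased."""
    normalized: dict[str, str] = {}
    for table_name, description in mapping.items():
        key = table_name.strip().lower()
        if key in normalized:
            # two keys that differ only by case would silently overwrite each other
            raise ValueError(f"duplicate table name after lowercasing: {key!r}")
        normalized[key] = description
    return normalized


# illustrative input only; the descriptions here are placeholders
example_mapping = {
    "Email_Data": "emails sent and received",
    "MEETING_DATA": "calendar meetings",
}
clean_mapping = normalize_table_keys(example_mapping)
print(clean_mapping)
```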

In [8]:
sql_workflow = create_sql_workflow(
    source_db=source_db,
    table_description_mapping=table_description_mapping,
    text_to_sql_tmpl=text_to_sql_tmpl,
    response_synthesis_prompt_str=response_synthesis_prompt_str
)
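
Both templates passed here are ordinary strings with named placeholders (`{dialect}`, `{schema}` and `{query_str}` in the text-to-SQL template). How the engine fills them internally is not shown in this notebook; the sketch below only illustrates the idea with plain `str.format`, using a shortened stand-in template and a schema string built by hand from the column lists printed earlier. `short_tmpl` and `build_schema_text` are hypothetical names.

```python
# shortened stand-in for the text-to-SQL template; same placeholders as the real one
short_tmpl = (
    "Create a syntactically correct {dialect} query.\n"
    "Only use tables listed below.\n"
    "{schema}\n"
    "Question: {query_str}\n"
    "SQLQuery:"
)

# column names copied from the inspector output earlier in this notebook
table_columns = {
    "email_data": ["email_timestamp", "to_names", "to_emails", "from_name",
                   "from_email", "email_subject", "email_body"],
    "meeting_data": ["meeting_timestamp", "host", "host_email", "invitees",
                     "invitee_emails", "meeting_subject"],
}


def build_schema_text(columns_by_table: dict[str, list[str]]) -> str:
    """Render one 'table(col1, col2, ...)' line per table."""
    return "\n".join(
        f"{table}({', '.join(cols)})" for table, cols in columns_by_table.items()
    )


prompt = short_tmpl.format(
    dialect="postgresql",
    schema=build_schema_text(table_columns),
    query_str="Show all the email correspondence Jane had with Cynthia Hobbs",
)
print(prompt)
```

A practical consequence is that any literal brace in a custom template would need to be doubled (`{{` and `}}`) if it is filled this way.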

## Define the Query

In [15]:
query = "Show all the email correspondence Jane had with Cynthia Hobbs"

## Retrieve and Output the Query

In [16]:
result = await sql_workflow.run(query=query)

Running step generate_sql_response
Step generate_sql_response produced event StopEvent


In [17]:
result

Response(response='Here is the email correspondence between Jane Moneypenny and Cynthia Hobbs:\n\n**2018-07-15 09:30:00:**  Jane introduced herself and Bankwell Financial Services, proposing a call to discuss corporate financial solutions for AbbVie.\n\n**2018-07-16 11:15:00:** Jane confirmed a meeting time with Cynthia.\n\n**2019-06-12 09:45:00:** Jane followed up on the implementation of a new cash concentration structure at AbbVie.\n\n**2020-06-15 11:30:00:** Jane proposed scheduling a quarterly relationship review call.\n\n**2020-06-16 13:15:00:** Jane confirmed the time for the quarterly relationship review call.\n\n**2021-06-10 14:00:00:** Jane offered information on International ACH (IAT) payments.\n\n**2021-06-11 11:30:00:** Jane followed up on the IAT information, attaching a service overview.\n\n**2022-06-22 11:30:00:** Jane discussed setting up IAT payments for AbbVie.\n\n**2022-06-23 10:15:00:** Jane confirmed that an IAT specialist would contact Cynthia to schedule a setu

In [12]:
query = text("""
SELECT *
FROM email_data
WHERE to_names LIKE '%Cynthia Hobbs%';
""")

with engine.connect() as conn:
    result = conn.execute(query)
    for row in result:
        print(row)

('2018-07-15 09:30:00', 'Cynthia Hobbs', 'cynthia.hobbs@abbvie.com', 'Jane Moneypenny', 'jane.moneypenny1@outlook.com', 'Introduction - Bankwell Financial Services for AbbVie', "Hi Cynthia,\\n\\nMy name is Jane Moneypenny, and I'm a Financial Advisor with Bankwell Financial, specializing in corporate financial solutions for t ... (260 characters truncated) ... n\\nBest regards,\\n\\n--\\nJane Moneypenny\\nFinancial Advisor\\nBankwell Financial\\nPhone: (555) 123-4567\\nEmail: jane.moneypenny1@outlook.com\\n")
('2018-07-16 11:15:00', 'Cynthia Hobbs', 'cynthia.hobbs@abbvie.com', 'Jane Moneypenny', 'jane.moneypenny1@outlook.com', 'Re: Introduction - Bankwell Financial Services for AbbVie', "Hi Cynthia,\\n\\nTuesday at 2 PM CT works perfectly. I'll send a calendar invitation shortly. Looking forward to our conversation.\\n\\nBest regards,\\n\\n--\\nJane Moneypenny\\nFinancial Advisor\\nBankwell Financial\\nPhone: (555) 123-4567\\nEmail: jane.moneypenny1@outlook.com\\n")
('2019-06-12 09:45:
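
The manual check above uses `LIKE '%Cynthia Hobbs%'`, a case-sensitive substring match, whereas the prompt asks the LLM to use PostgreSQL's case-insensitive regex operator `~*` with `\y` word boundaries. The difference can be illustrated without a database using Python's `re` module, where `\b` plays roughly the role of `\y`. This is an analogy under that assumption, not a reproduction of PostgreSQL's regex engine, and the variant names in `recipients` are made up for illustration.

```python
import re

# made-up variants of a recipient name, for illustration only
recipients = ["Cynthia Hobbs", "cynthia hobbs", "Cynthia Hobbson"]
term = "Cynthia Hobbs"

# analogue of LIKE '%Cynthia Hobbs%': case-sensitive substring test
like_matches = [name for name in recipients if term in name]

# analogue of ~* '\yCynthia Hobbs\y': case-insensitive match on whole words
pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
regex_matches = [name for name in recipients if pattern.search(name)]

print(like_matches)   # ['Cynthia Hobbs', 'Cynthia Hobbson']
print(regex_matches)  # ['Cynthia Hobbs', 'cynthia hobbs']
```

The substring test misses the lowercase spelling and wrongly accepts the longer surname, which is the kind of mismatch the word-boundary rule in the prompt is meant to avoid.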

## Test Query

In [18]:
query = "What did Jane discuss with Jennifer Phelps?"

In [19]:
result = await sql_workflow.run(query=query)
result

Running step generate_sql_response
Step generate_sql_response produced event StopEvent


Response(response="Jane and Jennifer Phelps discussed AeroVironment's financial strategy, focusing on FX risk management for international components sourcing.  Their conversations also covered currency hedging solutions, specifically EUR/USD forwards for 3 and 6-month terms.  They successfully executed EUR/USD forward contracts for Q3.  Further discussions included AeroVironment's potential implementation of a supply chain finance (SCF) program,  reviewing program details, scheduling calls to discuss potential benefits and implementation, and following up on the progress of identifying a pilot supplier group for the SCF program.\n", source_nodes=[NodeWithScore(node=TextNode(id_='146260ef-634b-4c29-902f-2d2ae91ea053', embedding=None, metadata={'sql_query': "SELECT email_body FROM email_data WHERE (from_email ~* '\\yjane\\.moneypenny1\\@outlook\\.com\\y' AND to_names ~* '\\yJennifer Phelps\\y') OR (to_emails ~* '\\yjennifer\\.phelps\\@example\\.com\\y' AND from_name ~* '\\yJane Moneypen

In [20]:
query = "When did Jane meet with Jennifer Phelps and what did they discuss?"
result = await sql_workflow.run(query=query)
result

Running step generate_sql_response
Step generate_sql_response produced event StopEvent


Response(response='Jane met with Jennifer Phelps on several occasions to discuss matters related to AeroVironment.  Their meetings covered a range of topics including initial discussions on financial needs, the supply chain finance program, supplier onboarding, and updates on a pilot program for supply chain finance.  Specific meeting dates and times are available: July 24th, 2018; July 22nd, 2020; July 28th, 2021; August 1st, 2023; and February 6th, 2024.\n', source_nodes=[NodeWithScore(node=TextNode(id_='b49c6569-0565-49c6-8f92-2abb515f3d75', embedding=None, metadata={'sql_query': "SELECT meeting_timestamp, meeting_subject FROM meeting_data WHERE host ~* '\\yJane\\y' AND invitees ~* '\\yJennifer Phelps\\y'", 'result': [('2018-07-24 14:00:00', 'AeroVironment - Initial Discussion on Financial Needs'), ('2020-07-22 16:00:00', 'AeroVironment - Discuss Supply Chain Finance Program'), ('2021-07-28 14:00:00', 'AeroVironment - Discuss SCF Supplier Onboarding'), ('2023-08-01 14:00:00', 'AeroV
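
The printed `Response` suggests that the generated SQL and the raw rows travel in the first source node's metadata, under the keys `sql_query` and `result`. Assuming that shape holds, they can be pulled out for debugging. The sketch below uses `SimpleNamespace` stand-ins that mimic the printed structure rather than the real response classes, so it runs on its own; `extract_sql` is a hypothetical helper, and the SQL string in the stand-in is shortened.

```python
from types import SimpleNamespace


def extract_sql(response) -> tuple[str, list]:
    """Return (sql_query, rows) from the first source node's metadata."""
    if not response.source_nodes:
        raise ValueError("response has no source nodes")
    metadata = response.source_nodes[0].node.metadata
    return metadata["sql_query"], metadata.get("result", [])


# stand-in mimicking Response(..., source_nodes=[NodeWithScore(node=TextNode(...))])
stand_in = SimpleNamespace(
    response="Jane met with Jennifer Phelps on several occasions ...",
    source_nodes=[
        SimpleNamespace(
            node=SimpleNamespace(
                metadata={
                    # shortened version of the query shown in the output
                    "sql_query": "SELECT meeting_timestamp, meeting_subject FROM meeting_data",
                    "result": [("2018-07-24 14:00:00",
                                "AeroVironment - Initial Discussion on Financial Needs")],
                }
            )
        )
    ],
)

sql, rows = extract_sql(stand_in)
print(sql)
print(rows[0])
```

If the real object has the same attributes, calling `extract_sql(result)` on the workflow output should work in the same way.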