# SQL Query Engine Sample
This notebook demonstrates the implementation and usage of the SQL query engine.

## IMPORTANT
If you are going to use this notebook, please make a copy and rename it so that this notebook is preserved as an example for others!

In [1]:
from currensee.schema.schema import PostgresTables
from currensee.query_engines.sql_query_engine.query_engine import create_sql_workflow
from currensee.utils.db_utils import create_pg_engine
from currensee.query_engines.workflow_descriptions import outlook_email_table_desc, outlook_meeting_table_desc

In [2]:
# Required to run asynchronous code in Jupyter: nest_asyncio allows nested use of the event loop that Jupyter is already running

import nest_asyncio

nest_asyncio.apply()
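Outside Jupyter (e.g. in a plain `.py` script) no event loop is already running, so instead of a top-level `await` you would typically drive the workflow with `asyncio.run`. The sketch below assumes only that the workflow exposes an async `run(query=...)` method, as used later in this notebook; `StubWorkflow` is a hypothetical stand-in so the example runs without a database or an LLM.

```python
import asyncio


class StubWorkflow:
    """Hypothetical stand-in for the object returned by create_sql_workflow."""

    async def run(self, query: str) -> str:
        await asyncio.sleep(0)  # yield to the event loop, as a real async call would
        return f"answer for: {query}"


async def main() -> str:
    workflow = StubWorkflow()  # in a real script: create_sql_workflow(...)
    return await workflow.run(query="Show all the email correspondence Jane had with Cynthia Hobbs")


if __name__ == "__main__":
    # asyncio.run creates a fresh event loop, runs main() to completion, then closes the loop.
    print(asyncio.run(main()))
```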

## Create the SQL Workflow

The SQL workflow can take the following parameters:

1. source_db: the name of the database where the table is stored (e.g. `crm`)
2. source_tables: a list of the name(s) of the table(s) the query engine should have access to
   * multiple tables can be passed if you want the query engine to try to join related tables in its queries
   * THIS IS LEVEL 2! Do not attempt it until you get the hang of using one table at a time!
3. table_descriptions: a list of the description(s) of the table(s) passed above
   * in the `create_sql_workflow` call below, the tables and their descriptions are passed together as `table_description_mapping`, a dict mapping each table name to its description
4. text_to_sql_tmpl: a string containing the prompt telling the LLM how to produce the SQL query from the given text
   * defaults to the variable `text_to_sql_tmpl` defined in `currensee/query_engines/prompting.py`
   * you may override this by passing in your own string
5. response_synthesis_prompt_str: a string containing the prompt telling the LLM how to synthesize the final response from the SQL query results
   * defaults to the variable `response_synthesis_prompt_str` defined in `currensee/query_engines/prompting.py`
   * you may override this by passing in your own string
6. model: the name of the model to use for all of the tasks
   * defaults to `gemini-1.5-flash`
   * you may override this with any of the models defined at https://ai.google.dev/gemini-api/docs/models#model-variations using the string with dashes defined in the "Model variant" column.
   * **BE VERY CAREFUL TO PAY ATTENTION TO THE PRICING!** I recommend using the default model until you understand the other models better.
7. temperature: the temperature parameter to pass to the model
   * defaults to 0.0
   * higher temperatures make the output more varied ("creative"); keep it low for SQL query generation
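To keep these settings in one place, you could collect them in a small config object and unpack it into the call. This is a minimal sketch, not part of `currensee`: `SQLWorkflowConfig` is a hypothetical name, and it assumes `create_sql_workflow` accepts `model` and `temperature` as keyword arguments, as the list above describes.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional


@dataclass
class SQLWorkflowConfig:
    """Hypothetical container mirroring the documented create_sql_workflow parameters."""

    source_db: str
    table_description_mapping: Dict[str, str] = field(default_factory=dict)
    text_to_sql_tmpl: Optional[str] = None               # None -> keep the library default
    response_synthesis_prompt_str: Optional[str] = None  # None -> keep the library default
    model: str = "gemini-1.5-flash"                      # documented default
    temperature: float = 0.0                             # documented default; keep low for SQL

    def to_kwargs(self) -> dict:
        # Drop unset prompt overrides so the library defaults apply.
        return {k: v for k, v in asdict(self).items() if v is not None}
```

Usage, assuming the keyword names match: `cfg = SQLWorkflowConfig(source_db="outlook", table_description_mapping=table_description_mapping)`, then `create_sql_workflow(**cfg.to_kwargs())`. Leaving the prompt fields as `None` means they are simply not passed.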

In [3]:
from sqlalchemy import inspect, text
engine = create_pg_engine('outlook')
inspector = inspect(engine)
tables = inspector.get_table_names()
print(tables)


['email_data', 'meeting_data']


In [4]:
for table in tables:
    columns = [col['name'] for col in inspector.get_columns(table)]
    print(f"{table}: {columns}")

email_data: ['email_timestamp', 'to_names', 'to_emails', 'from_name', 'from_email', 'email_subject', 'email_body']
meeting_data: ['meeting_timestamp', 'host', 'host_email', 'invitees', 'invitee_emails', 'meeting_subject']
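The two cells above use SQLAlchemy's `inspect()` to list tables and columns. If you want to experiment without access to the `outlook` Postgres database, the sketch below recreates the same column names in an in-memory SQLite database and lists them with the stdlib `sqlite3` module, using SQLite's `sqlite_master` table and `PRAGMA table_info` in place of the SQLAlchemy inspector. The `TEXT` column types are an assumption and may not match the real schema.

```python
import sqlite3

# Column names copied from the inspector output above; TEXT types are an assumption.
SCHEMA = {
    "email_data": ["email_timestamp", "to_names", "to_emails", "from_name",
                   "from_email", "email_subject", "email_body"],
    "meeting_data": ["meeting_timestamp", "host", "host_email", "invitees",
                     "invitee_emails", "meeting_subject"],
}


def build_sandbox() -> sqlite3.Connection:
    """Create an empty in-memory database with the same tables and columns."""
    conn = sqlite3.connect(":memory:")
    for table, columns in SCHEMA.items():
        cols_sql = ", ".join(f"{c} TEXT" for c in columns)
        conn.execute(f"CREATE TABLE {table} ({cols_sql})")
    return conn


def describe(conn: sqlite3.Connection) -> dict:
    """Map each table name to its column names, in declaration order."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    return {t: [row[1] for row in conn.execute(f"PRAGMA table_info({t})")] for t in tables}


conn = build_sandbox()
for table, columns in describe(conn).items():
    print(f"{table}: {columns}")
```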


### Default `text_to_sql_tmpl` defined in `prompting.py`

In [5]:
# Raw string: keeps the regex word boundary '\y' literal without an invalid-escape warning
text_to_sql_tmpl = r"""
    Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
    You can order the results by the find_date column (from earliest to latest) to return the most interesting examples in the database.

    GUIDELINES:
    * Never query for all the columns from a specific table, only ask for a few relevant columns given the question.
    * Pay attention to use only the column names that you can see in the schema description.
    * Be careful to not query for columns that do not exist.
    * Pay attention to which column is in which table.
    * Make sure to filter on all criteria mentioned in the query.
    * If using a LIMIT to restrict the results, make sure it comes only at the end of the query.

    IMPORTANT NOTE:
    * Use the ~* operator instead of = when filtering with WHERE on text columns.
    * Add word boundaries '\y' to the beginning and end of each search term in the query.

    You are required to use the following format, each taking one line:

    Question: Question here
    SQLQuery: SQL Query to run
    SQLResult: Result of the SQLQuery
    Answer: Final answer here

    Only use tables listed below.
    {schema}

    Question: {query_str}
    SQLQuery:

"""
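The template's IMPORTANT NOTE asks for PostgreSQL's case-insensitive regex operator `~*` with `\y` word boundaries. To get a feel for which values such a filter matches, here is a rough Python emulation: Python's `re` spells the word boundary `\b` instead of `\y`, and `re.IGNORECASE` plays the role of the `*` in `~*`. The two regex engines are not identical, so treat this as an approximation rather than a guarantee of what Postgres returns; `pg_word_match` is a hypothetical helper.

```python
import re


def pg_word_match(value: str, term: str) -> bool:
    """Approximate `value ~* '\\y<term>\\y'` in PostgreSQL using Python's re module."""
    # re.escape treats the search term literally (e.g. the '.' in an email address).
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, value, flags=re.IGNORECASE) is not None


names = ["Jennifer Phelps", "jennifer.phelps@aerovironment.com", "Jenniferson", "Adam Clay"]
print([n for n in names if pg_word_match(n, "jennifer")])
```

The word boundaries are what keep `Jenniferson` out of the results, while the case-insensitive match still catches the lowercase email address.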


### Default `response_synthesis_prompt_str` defined in `prompting.py`

In [6]:
response_synthesis_prompt_str = """

    Query: {query_str}
    SQL: {sql_query}
    SQL Response: {context_str}

    IMPORTANT INSTRUCTIONS:
    * If SQL Response is empty or 0, apologise and mention that you could not find
     examples to answer the query.
    * In such cases, kindly nudge the user towards providing more details or refining
    their search.
    * Additionally, you can tell them to rephrase specific keywords.
    * Do not explicitly state phrases such as 'based on the SQL query executed' or related
     references to context in your Response.
    * Never mention the underlying SQL query, or the underlying SQL tables and other database elements
    * Never mention that SQL was used to answer this question

    Considering the IMPORTANT INSTRUCTIONS above, create a response using the information
    returned from the database and no prior knowledge.


    Response:
"""
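The synthesis prompt is a plain template with three placeholders: `{query_str}`, `{sql_query}` and `{context_str}`. How the workflow fills them internally isn't shown in this notebook; as a rough illustration of what the LLM receives, the sketch below fills a shortened copy of the template with Python's `str.format`, using the query, SQL and result row from the "Adam" example at the end of this notebook. `build_synthesis_prompt` is a hypothetical helper.

```python
# Shortened copy of response_synthesis_prompt_str (IMPORTANT INSTRUCTIONS omitted for brevity).
SYNTHESIS_TMPL = """
    Query: {query_str}
    SQL: {sql_query}
    SQL Response: {context_str}

    Response:
"""


def build_synthesis_prompt(query_str: str, sql_query: str, rows: list) -> str:
    """Hypothetical helper: fill the synthesis template with a query, its SQL and the rows."""
    # Rows are rendered as a Python list of tuples, like the notebook output shows them.
    return SYNTHESIS_TMPL.format(query_str=query_str, sql_query=sql_query, context_str=str(rows))


prompt = build_synthesis_prompt(
    "What's most recent meeting Jane had with Adam",
    "SELECT meeting_timestamp, invitees, invitee_emails, meeting_subject FROM meeting_data "
    "WHERE invitees ~* '\\yAdam\\y' ORDER BY meeting_timestamp DESC LIMIT 1;",
    [("2024-03-26 11:00:00", "Adam Clay", "adam.clay@compass.com",
      "Compass - Annual Credit Facility Review Meeting")],
)
print(prompt)
```

An empty result renders as `SQL Response: []`, which is the case the IMPORTANT INSTRUCTIONS tell the LLM to handle with an apology and a nudge to refine the search.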

### Define the DB information
**IMPORTANT**: The table names MUST be lowercase in order for the engine to find them.

In [7]:
source_db = 'outlook'
table_description_mapping = {
    'email_data': outlook_email_table_desc,
    'meeting_data': outlook_meeting_table_desc
}
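A cheap guard against the lowercase requirement is to check the mapping before building the workflow. The inspector output above reports the tables as lowercase names (`email_data`, `meeting_data`), so a key such as `Email_Data` would not match them exactly. The sketch below is a hypothetical helper, not part of `currensee`.

```python
def validate_table_mapping(mapping: dict, known_tables: list) -> dict:
    """Hypothetical pre-flight check: keys must be lowercase and exist in the database."""
    bad_case = [name for name in mapping if name != name.lower()]
    if bad_case:
        raise ValueError(f"Table names must be lowercase: {bad_case}")
    missing = sorted(set(mapping) - set(known_tables))
    if missing:
        raise ValueError(f"Tables not found in the database: {missing}")
    return mapping


validate_table_mapping(
    {"email_data": "Outlook emails", "meeting_data": "Outlook meetings"},
    ["email_data", "meeting_data"],  # e.g. the `tables` list from inspector.get_table_names()
)
```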

In [8]:
sql_workflow = create_sql_workflow(
    source_db=source_db,
    table_description_mapping=table_description_mapping,
    text_to_sql_tmpl=text_to_sql_tmpl,
    response_synthesis_prompt_str=response_synthesis_prompt_str
)

## Define the Query

In [9]:
query = "Show all the email correspondence Jane had with Cynthia Hobbs"

## Run the Query and Output the Result

In [10]:
result = await sql_workflow.run(query=query)

Running step generate_sql_response
Step generate_sql_response produced event StopEvent


In [11]:
result

Response(response="Here's the email correspondence between Jane Moneypenny and Cynthia Hobbs:\n\n**July 15, 2018:** Jane introduced Bankwell Financial Services to Cynthia.\n\n**July 16, 2018:** Cynthia responded, scheduling a call.  Jane confirmed the meeting time.\n\n**June 12, 2019:** Jane provided an update on the cash management implementation.\n\n**June 13, 2019:** Cynthia confirmed the successful implementation.\n\n**June 15, 2020:** Jane proposed a quarterly relationship review.\n\n**June 16, 2020:** Cynthia suggested a date and time for the review, which Jane confirmed.\n\n**June 10, 2021:** Jane shared information about International ACH (IAT) payments.\n\n**June 11, 2021:** Cynthia expressed interest and requested more details. Jane sent the information.\n\n**June 22, 2022:** Jane followed up on setting up IAT payments.\n\n**June 23, 2022:** Cynthia agreed and requested a call with the operations team. Jane confirmed this would be arranged.\n\n**July 6, 2023:** Jane proposed 

In [12]:
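# Manual check: this only matches emails sent TO Cynthia Hobbs; add `OR from_name LIKE '%Cynthia Hobbs%'` to include her replies
query = text("""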
SELECT *
FROM email_data
WHERE to_names LIKE '%Cynthia Hobbs%';
""")

with engine.connect() as conn:
    result = conn.execute(query)
    for row in result:
        print(row)

('2018-07-15 09:30:00', 'Cynthia Hobbs', 'cynthia.hobbs@abbvie.com', 'Jane Moneypenny', 'jane.moneypenny1@outlook.com', 'Introduction - Bankwell Financial Services for AbbVie', "Hi Cynthia,\\n\\nMy name is Jane Moneypenny, and I'm a Financial Advisor with Bankwell Financial, specializing in corporate financial solutions for t ... (260 characters truncated) ... n\\nBest regards,\\n\\n--\\nJane Moneypenny\\nFinancial Advisor\\nBankwell Financial\\nPhone: (555) 123-4567\\nEmail: jane.moneypenny1@outlook.com\\n")
('2018-07-16 11:15:00', 'Cynthia Hobbs', 'cynthia.hobbs@abbvie.com', 'Jane Moneypenny', 'jane.moneypenny1@outlook.com', 'Re: Introduction - Bankwell Financial Services for AbbVie', "Hi Cynthia,\\n\\nTuesday at 2 PM CT works perfectly. I'll send a calendar invitation shortly. Looking forward to our conversation.\\n\\nBest regards,\\n\\n--\\nJane Moneypenny\\nFinancial Advisor\\nBankwell Financial\\nPhone: (555) 123-4567\\nEmail: jane.moneypenny1@outlook.com\\n")
('2019-06-12 09:45:
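The manual check above only looks at `to_names`, so it returns emails sent to Cynthia Hobbs but not her replies, while the workflow's answer covers both directions. The sketch below shows a two-direction filter with a bound parameter instead of a string literal, run against a throwaway in-memory SQLite table so it is self-contained; the rows are made up for illustration. Against the real Postgres database you would pass the same `WHERE` clause to `text()` with a `:pattern` bind parameter; note that SQLite's `LIKE` is case-insensitive for ASCII by default, while in PostgreSQL you would need `ILIKE` for that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE email_data (email_timestamp TEXT, to_names TEXT, from_name TEXT, email_subject TEXT)"
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO email_data VALUES (?, ?, ?, ?)",
    [
        ("2018-07-15 09:30:00", "Cynthia Hobbs", "Jane Moneypenny", "Introduction"),
        ("2018-07-15 16:00:00", "Jane Moneypenny", "Cynthia Hobbs", "Re: Introduction"),
        ("2018-08-01 14:00:00", "Jennifer Phelps", "Jane Moneypenny", "Follow Up"),
    ],
)

# Match the person on either side of the conversation; the name is a bound parameter.
sql = """
SELECT email_timestamp, from_name, to_names, email_subject
FROM email_data
WHERE to_names LIKE :pattern OR from_name LIKE :pattern
ORDER BY email_timestamp
"""
rows = conn.execute(sql, {"pattern": "%Cynthia Hobbs%"}).fetchall()
for row in rows:
    print(row)
```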

## Test Query

In [18]:
query = "give me all the emails with Jennifer Phelps"  # TODO: figure out why the response to this one is so messy

In [19]:
result = await sql_workflow.run(query=query)
result

Running step generate_sql_response
Step generate_sql_response produced event StopEvent


Response(response='Here are the email exchanges between Jennifer Phelps and Jane Moneypenny:  Their communication spans several years, covering topics such as currency hedging, FX contract execution, and a potential supply chain finance program.\n', source_nodes=[NodeWithScore(node=TextNode(id_='aaa9d2b5-29d7-4185-8733-3a10498e653e', embedding=None, metadata={'sql_query': "SELECT email_timestamp, from_name, from_email, to_names, to_emails, email_subject, email_body FROM email_data WHERE to_names ~* '\\yJennifer\\y Phelps\\y' OR from_name ~* '\\yJennifer\\y Phelps\\y' OR to_emails ~* '\\yjennifer\\.phelps\\y' OR from_email ~* '\\yjennifer\\.phelps\\y' ORDER BY email_timestamp", 'result': [('2018-08-01 14:00:00', 'Jane Moneypenny', 'jane.moneypenny1@outlook.com', 'Jennifer Phelps', 'jennifer.phelps@aerovironment.com', 'Follow Up: AeroVironment Financial Needs', "Hi Jennifer,\\n\\nIt was a pleasure speaking with you last week regarding AeroVironment's financial strategy, particularly arou
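When an answer looks messy, as with the Jennifer Phelps query above, a quick first check is the SQL the LLM actually generated. The printed `Response` shows each source node carrying `sql_query`, `result` and `col_keys` in its `metadata`; the hypothetical helper below reads only those fields. It uses `SimpleNamespace` stand-ins (values from the "Adam" example at the end of this notebook) so it runs without the workflow; with the real object you would call `show_generated_sql(result)`.

```python
from types import SimpleNamespace


def show_generated_sql(response) -> list:
    """Hypothetical helper: return (sql_query, col_keys, row_count) for each source node."""
    summaries = []
    for node_with_score in response.source_nodes:
        meta = node_with_score.node.metadata
        rows = meta.get("result", [])
        summaries.append((meta.get("sql_query"), meta.get("col_keys"), len(rows)))
    return summaries


# Stand-in mimicking the structure of the printed Response objects.
fake_response = SimpleNamespace(
    source_nodes=[SimpleNamespace(node=SimpleNamespace(metadata={
        "sql_query": "SELECT meeting_timestamp, invitees, invitee_emails, meeting_subject "
                     "FROM meeting_data WHERE invitees ~* '\\yAdam\\y' "
                     "ORDER BY meeting_timestamp DESC LIMIT 1;",
        "result": [("2024-03-26 11:00:00", "Adam Clay", "adam.clay@compass.com",
                    "Compass - Annual Credit Facility Review Meeting")],
        "col_keys": ["meeting_timestamp", "invitees", "invitee_emails", "meeting_subject"],
    }))],
)

for sql, cols, n_rows in show_generated_sql(fake_response):
    print(sql)
    print(cols, n_rows, "row(s)")
```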

In [15]:
with engine.connect() as conn:
    result = conn.execute(text("SELECT * FROM meeting_data ORDER BY meeting_timestamp DESC LIMIT 10"))
    rows = result.fetchall()
    for row in rows:
        print(row)

('2024-03-26 11:00:00', 'Jane Moneypenny', 'jane.moneypenny1@outlook.com', 'Adam Clay', 'adam.clay@compass.com', 'Compass - Annual Credit Facility Review Meeting')
('2024-03-19 16:00:00', 'Jane Moneypenny', 'jane.moneypenny1@outlook.com', 'Kelly Smith', 'kelly.smith@presidiopropertytrust.com', 'Presidio Property Trust - Annual Relationship Review & Market Update')
('2024-03-12 09:00:00', 'Jane Moneypenny', 'jane.moneypenny1@outlook.com', 'David Moreno', 'david.moreno@manpowergroup.com', 'ManpowerGroup - Review H1 2024 FX Hedging / Discuss H2 2024 Strategy')
('2024-03-05 13:00:00', 'Jane Moneypenny', 'jane.moneypenny1@outlook.com', 'Lisa Kennedy', 'lisa.kennedy@lockheedmartincorporation.com', 'Lockheed Martin - Follow-up Discussion on ESG Investment Solutions')
('2024-02-27 10:00:00', 'Jane Moneypenny', 'jane.moneypenny1@outlook.com', 'Timothy Ochoa', 'timothy.ochoa@hyatthotels.com', 'Hyatt Hotels - Check Status of 401k Advisory RFI Launch')
('2024-02-20 11:00:00', 'Jane Moneypenny', 'j

In [16]:
#Check meeting_data
query = "when did Jane meet jennifer.phelps@aerovironment.com and what did they discussed"
result = await sql_workflow.run(query=query)
result

Running step generate_sql_response
Step generate_sql_response produced event StopEvent


Response(response='Jane met Jennifer Phelps from AeroVironment on several occasions.  Their most recent meeting was on February 6th, 2024, to discuss the AeroVironment SCF Pilot Program Status Update.  Previous meetings covered topics such as SCF supplier onboarding strategies and the overall Supply Chain Finance program.  The earliest recorded meeting was on July 24th, 2018, for an initial discussion of financial needs.\n', source_nodes=[NodeWithScore(node=TextNode(id_='d61d880c-147e-45f3-a9ba-43547e1a7756', embedding=None, metadata={'sql_query': "SELECT meeting_timestamp, meeting_subject FROM meeting_data WHERE invitee_emails ~* '\\yjennifer\\.phelps\\@aerovironment\\.com\\y' ORDER BY meeting_timestamp DESC", 'result': [('2024-02-06 15:00:00', 'AeroVironment - SCF Pilot Program Status Update'), ('2023-08-01 14:00:00', 'AeroVironment - Discuss SCF Supplier Onboarding Strategy'), ('2021-07-28 14:00:00', 'AeroVironment - Discuss SCF Supplier Onboarding'), ('2020-07-22 16:00:00', 'AeroVi

In [17]:
query = "when did Jane meet with Jennifer"
result = await sql_workflow.run(query=query)
result

Running step generate_sql_response
Step generate_sql_response produced event StopEvent


Response(response='Jennifer Phelps had several meetings:  July 24, 2018; July 22, 2020; July 28, 2021; August 1, 2023; and February 6, 2024.  The subjects of these meetings all relate to AeroVironment and its supply chain finance program.  To find meetings with a different person named Jennifer, or to narrow down the results further, please provide more details, such as the date range or a more specific description of the meeting topic.\n', source_nodes=[NodeWithScore(node=TextNode(id_='203eaf62-e3b8-457a-8c50-f388d56f52c0', embedding=None, metadata={'sql_query': "SELECT meeting_timestamp, invitees, invitee_emails, meeting_subject FROM meeting_data WHERE invitees ~* '\\yJennifer\\y' ORDER BY meeting_timestamp DESC", 'result': [('2024-02-06 15:00:00', 'Jennifer Phelps', 'jennifer.phelps@aerovironment.com', 'AeroVironment - SCF Pilot Program Status Update'), ('2023-08-01 14:00:00', 'Jennifer Phelps', 'jennifer.phelps@aerovironment.com', 'AeroVironment - Discuss SCF Supplier Onboarding St

In [20]:
query = "What's most recent meeting Jane had with Adam"
result = await sql_workflow.run(query=query)
result

Running step generate_sql_response
Step generate_sql_response produced event StopEvent


Response(response='Jane\'s most recent meeting with Adam was on March 26th, 2024 at 11:00 AM.  The meeting was titled "Compass - Annual Credit Facility Review Meeting".\n', source_nodes=[NodeWithScore(node=TextNode(id_='35f9bad7-1dc2-4a6b-80e9-819e77dd6c8c', embedding=None, metadata={'sql_query': "SELECT meeting_timestamp, invitees, invitee_emails, meeting_subject FROM meeting_data WHERE invitees ~* '\\yAdam\\y' ORDER BY meeting_timestamp DESC LIMIT 1;", 'result': [('2024-03-26 11:00:00', 'Adam Clay', 'adam.clay@compass.com', 'Compass - Annual Credit Facility Review Meeting')], 'col_keys': ['meeting_timestamp', 'invitees', 'invitee_emails', 'meeting_subject']}, excluded_embed_metadata_keys=['sql_query', 'result', 'col_keys'], excluded_llm_metadata_keys=['sql_query', 'result', 'col_keys'], relationships={}, metadata_template='{key}: {value}', metadata_separator='\n', text="[('2024-03-26 11:00:00', 'Adam Clay', 'adam.clay@compass.com', 'Compass - Annual Credit Facility Review Meeting')]"