# CRM Query Engine Test
This notebook demonstrates how to configure and use the SQL query engine with the CRM database.

In [14]:
from currensee.schema.schema import PostgresTables
from currensee.query_engines.sql_query_engine.query_engine import create_sql_workflow
from currensee.utils.db_utils import create_pg_engine
from currensee.query_engines.workflow_descriptions import (
    crm_client_alignment_table_desc,
    crm_client_info_table_desc,
    crm_employees_table_desc,
    crm_fund_details_desc,
    crm_portfolio_table_desc,
)

In [15]:
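# Jupyter already runs an asyncio event loop; nest_asyncio patches asyncio so that
# code which starts its own event loop (e.g. via asyncio.run) can run inside the notebook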

import nest_asyncio

nest_asyncio.apply()
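To see why the cell above is needed, the sketch below reproduces the underlying failure with plain `asyncio` (no currensee code involved): calling `asyncio.run` while an event loop is already running, which is the situation notebook code is always in, raises a `RuntimeError`. `inner` and `outer` are hypothetical names for illustration. Run it as a standalone script rather than in this notebook, since `nest_asyncio.apply()` has already changed the behavior here.

```python
import asyncio


async def inner() -> int:
    return 42


async def outer() -> str:
    # We are already inside a running event loop here, just like notebook code.
    coro = inner()
    try:
        asyncio.run(coro)  # tries to start a second, nested event loop
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)
    return "no error"


message = asyncio.run(outer())
print(message)
```

Without `nest_asyncio`, this prints the `RuntimeError` message explaining that `asyncio.run()` cannot be called from a running event loop.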

## Create the SQL Workflow

The SQL workflow can take the following parameters:

1. source_db: the name of the database where the tables are stored (e.g. `crm`)
2. table_description_mapping: a dictionary mapping the name of each table the query engine should have access to onto that table's description
   * multiple tables can be passed if you want the query engine to try to join related tables in its queries
   * **This is Level 2!** Don't attempt it until you get the hang of using one table at a time.

3. text_to_sql_tmpl: a string containing the prompt that tells the LLM how to turn the input question into a SQL query
   * defaults to the variable `text_to_sql_tmpl` defined in `currensee.query_engines.prompting`
   * you may override this by passing in your own string
4. response_synthesis_prompt_str: a string containing the prompt that tells the LLM how to synthesize the final response from the SQL results
   * defaults to the variable `response_synthesis_prompt_str` defined in `currensee.query_engines.prompting`
   * you may override this by passing in your own string
5. model: the name of the model used for all of the tasks
   * defaults to `gemini-1.5-flash`
   * you may override this with any of the models listed at https://ai.google.dev/gemini-api/docs/models#model-variations, using the dashed string from the "Model variant" column
   * **Pay close attention to pricing!** I recommend using the default model until you understand the other models better.
6. temperature: the temperature parameter passed to the model
   * defaults to 0.0
   * higher temperatures produce more varied ("creative") and less deterministic output; keep it low for SQL query generation, where you want consistent, valid queries
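The temperature bullet can be made concrete with the standard softmax-with-temperature formulation used in LLM sampling. Gemini's exact sampling pipeline is not described in this notebook, so treat this as a sketch of the general idea rather than how the model is implemented: the raw token scores (logits) are divided by the temperature before the softmax, so low temperatures concentrate probability on the top token and high temperatures flatten the distribution. `softmax_with_temperature` is a hypothetical helper, and treating a temperature of 0 as greedy decoding is an assumption of this sketch.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw token scores (logits) into sampling probabilities at a given temperature.

    temperature == 0 is treated as greedy decoding: all probability goes to the
    highest-scoring token.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.0, 0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

At 0.0 all probability sits on the first token; as the temperature rises, lower-scoring tokens get a larger share, which is what "more creative" means in practice and why a low value suits SQL generation.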

### Default `text_to_sql_tmpl` (as defined in `prompting.py`)

In [16]:
text_to_sql_tmpl = r"""
    Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
    You can order the results by the find_date column (from earliest to latest) to return the most interesting examples in the database.

    GUIDELINES:
    * Never query for all the columns from a specific table, only ask for a few relevant columns given the question.
    * Pay attention to use only the column names that you can see in the schema description.
    * Be careful to not query for columns that do not exist.
    * Pay attention to which column is in which table.
    * Make sure to filter on all criteria mentioned in the query.
    * If using a LIMIT to restrict the results, make sure it comes only at the end of the query.

    IMPORTANT NOTE:
    * Use the ~* operator instead of = when filtering with WHERE on text columns.
    * Add word boundaries '\y' to the beginning and end of each search term in the query.

    You are required to use the following format, each taking one line:

    Question: Question here
    SQLQuery: SQL Query to run
    SQLResult: Result of the SQLQuery
    Answer: Final answer here

    Only use tables listed below.
    {schema}

    Question: {query_str}
    SQLQuery:

"""

### Default `response_synthesis_prompt_str` (as defined in `prompting.py`)

In [17]:
response_synthesis_prompt_str = """

    Query: {query_str}
    SQL: {sql_query}
    SQL Response: {context_str}

    IMPORTANT INSTRUCTIONS:
    * If SQL Response is empty or 0, apologise and mention that you could not find
     examples to answer the query.
    * In such cases, kindly nudge the user towards providing more details or refining
    their search.
    * Additionally, you can tell them to rephrase specific keywords.
    * Do not explicitly state phrases such as 'based on the SQL query executed' or related
     references to context in your Response.
    * Never mention the underlying SQL query, or the underlying SQL tables and other database elements
    * Never mention that SQL was used to answer this question

    Considering the IMPORTANT INSTRUCTIONS above, create a response using the information
    returned from the database and no prior knowledge.


    Response:
"""

### Define the DB information
**IMPORTANT**: The table names (the keys of `table_description_mapping`) MUST be lowercase in order for the engine to find them.

In [29]:
source_db = 'crm'
table_description_mapping = {
    'employees': crm_employees_table_desc,
    'portfolio': crm_portfolio_table_desc,
    'fund_detail': crm_fund_details_desc,
    'client_alignment': crm_client_alignment_table_desc,
    # 'clients_contact': crm_client_info_table_desc,
}
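Since the table names must be lowercase (presumably because PostgreSQL folds unquoted identifiers to lowercase), one option is to normalize the mapping's keys before passing it to `create_sql_workflow`. `normalize_table_mapping` is a hypothetical helper, not part of currensee; it also refuses keys that collide after lowercasing, so one description can't silently overwrite another. The descriptions in the usage line are placeholder strings.

```python
def normalize_table_mapping(mapping: dict[str, str]) -> dict[str, str]:
    """Return a copy of `mapping` with lowercase table names.

    Raises ValueError if two keys collapse to the same lowercase name.
    """
    normalized: dict[str, str] = {}
    for name, description in mapping.items():
        key = name.lower()
        if key in normalized:
            raise ValueError(f"duplicate table name after lowercasing: {key!r}")
        normalized[key] = description
    return normalized


mapping = normalize_table_mapping({"Employees": "employee records", "portfolio": "fund holdings"})
print(list(mapping))  # ['employees', 'portfolio']
```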

In [30]:
sql_workflow = create_sql_workflow(
    source_db=source_db,
    table_description_mapping=table_description_mapping,
    text_to_sql_tmpl=text_to_sql_tmpl,
    response_synthesis_prompt_str=response_synthesis_prompt_str,
)

## Define the Query

In [31]:
query = "Who works for bankwell?"

## Run the Query and Output the Result

In [32]:
result = await sql_workflow.run(query=query)

Running step generate_sql_response
Step generate_sql_response produced event StopEvent


In [33]:
result

Response(response="Here's a list of employees who work at Bankwell:\n\nJane Moneypenny, Russell Sherman, Veronica West, Marie Howard, Debra Kelly, Meghan Wiggins, Christopher Flores, Brandon Hernandez, Kevin Rocha, Karen Smith, Jennifer Ayala, Mark Davis, Kenneth Padilla, Mary Martinez, Jay Baker, Thomas Walter, Timothy Dyer, Teresa Carroll, Jacob Jennings, Daniel Wallace, Tonya Kidd, David Jones, Pamela Spencer, Amanda Allen, Adam Rodriguez, Thomas Watkins, Sabrina Gregory, Daniel Sawyer, Kathleen Kelley, Alexandria Collins, James Wolfe, Janice Tucker, Laura Brown, Lisa Mccormick, Michelle Mcbride, Nicholas Garcia, Garrett Swanson, Michael Wilson, Jessica Hanson, Michelle Smith, Jennifer Miller, Julie Jimenez, Alicia Peterson, Rhonda Jenkins, Peter Marshall, Laurie Turner, Benjamin Gilbert, Stephanie Gutierrez, Kelsey Charles, Jesus Yates, Matthew Stephenson, Samantha Barnes, Ann Mcdaniel, Catherine Hunter, Stephanie Garner, Anthony Mitchell, Julie Harmon, Amanda Gonzales, Karen Flore
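The test queries below each repeat the same run-and-display pattern. A small helper can run a list of questions one after another and keep the responses together. `run_queries` and `EchoWorkflow` are hypothetical: the stand-in only mimics the `await workflow.run(query=...)` call used above, so the sketch runs without a database or API key. In the notebook you would instead write `results = await run_queries(sql_workflow, [...])`.

```python
import asyncio
from typing import Any


async def run_queries(workflow: Any, queries: list[str]) -> dict[str, Any]:
    """Run each query through workflow.run(query=...) and collect the responses."""
    results: dict[str, Any] = {}
    for query in queries:
        # Sequential rather than concurrent: every call hits the (paid) LLM API,
        # so one request at a time keeps usage predictable.
        results[query] = await workflow.run(query=query)
    return results


class EchoWorkflow:
    """Offline stand-in exposing the same run(query=...) coroutine."""

    async def run(self, query: str) -> str:
        return f"echo: {query}"


responses = asyncio.run(
    run_queries(EchoWorkflow(), ["Who works for bankwell?", "how many funds does Broadcom own?"])
)
for question, response in responses.items():
    print(question, "->", response)
```

Running the queries sequentially is a deliberate choice here; `asyncio.gather` would be faster but sends all requests to the model at once.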

## Test Queries

In [34]:
query = "what financial instruments does Mariott own?"
result = await sql_workflow.run(query=query)
result

Running step generate_sql_response
Step generate_sql_response produced event StopEvent


Response(response='Marriott appears to hold investments in Bond Funds (specifically, BND and TLT) and an Equity Fund (VSMPX).\n', source_nodes=[NodeWithScore(node=TextNode(id_='65f70a92-43fc-462e-9247-4091e8ebde6a', embedding=None, metadata={'sql_query': "SELECT symbol, fund_type FROM portfolio WHERE company ~* '\\yMariott\\y'", 'result': [('BND', 'Bond Fund'), ('VSMPX', 'Equity Fund'), ('TLT', 'Bond Fund')], 'col_keys': ['symbol', 'fund_type']}, excluded_embed_metadata_keys=['sql_query', 'result', 'col_keys'], excluded_llm_metadata_keys=['sql_query', 'result', 'col_keys'], relationships={}, metadata_template='{key}: {value}', metadata_separator='\n', text="[('BND', 'Bond Fund'), ('VSMPX', 'Equity Fund'), ('TLT', 'Bond Fund')]", mimetype='text/plain', start_char_idx=None, end_char_idx=None, metadata_seperator='\n', text_template='{metadata_str}\n\n{content}'), score=None)], metadata={'65f70a92-43fc-462e-9247-4091e8ebde6a': {'sql_query': "SELECT symbol, fund_type FROM portfolio WHERE co

In [35]:
query = "how many funds does Broadcom own? What types of funds are they?"
result = await sql_workflow.run(query=query)
result

Running step generate_sql_response
Step generate_sql_response produced event StopEvent


Response(response='Broadcom owns 5 funds.  These funds are categorized as Bond Funds and Equity Funds.\n', source_nodes=[NodeWithScore(node=TextNode(id_='44dd1e35-ca85-4395-a650-5e4f723824bc', embedding=None, metadata={'sql_query': "SELECT COUNT(DISTINCT symbol), array_agg(DISTINCT fund_type) FROM portfolio WHERE company ~* '\\yBroadcom\\y';", 'result': [(5, ['Bond Fund', 'Equity Fund'])], 'col_keys': ['count', 'array_agg']}, excluded_embed_metadata_keys=['sql_query', 'result', 'col_keys'], excluded_llm_metadata_keys=['sql_query', 'result', 'col_keys'], relationships={}, metadata_template='{key}: {value}', metadata_separator='\n', text="[(5, ['Bond Fund', 'Equity Fund'])]", mimetype='text/plain', start_char_idx=None, end_char_idx=None, metadata_seperator='\n', text_template='{metadata_str}\n\n{content}'), score=None)], metadata={'44dd1e35-ca85-4395-a650-5e4f723824bc': {'sql_query': "SELECT COUNT(DISTINCT symbol), array_agg(DISTINCT fund_type) FROM portfolio WHERE company ~* '\\yBroadco
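Each printed `Response` carries the generated SQL and the raw rows in `source_nodes[...].node.metadata`, under the keys `sql_query`, `result` and `col_keys`, as the outputs above show. Pulling these out makes it easier to check what the engine actually ran. `sql_details` is a hypothetical helper that relies only on that visible attribute layout; the `SimpleNamespace` objects merely imitate it, using the Broadcom result above, so the sketch runs offline. In the notebook you would call `sql_details(result)`.

```python
from types import SimpleNamespace


def sql_details(response) -> list[dict]:
    """Collect the generated SQL, column names and rows from each source node.

    Assumes the layout visible in the printed Response: each entry of
    response.source_nodes has .node.metadata with 'sql_query', 'result', 'col_keys'.
    """
    details = []
    for node_with_score in response.source_nodes:
        meta = node_with_score.node.metadata
        details.append({
            "sql_query": meta.get("sql_query"),
            "columns": meta.get("col_keys", []),
            "rows": meta.get("result", []),
        })
    return details


# Offline stand-in shaped like the Broadcom response printed above.
fake_node = SimpleNamespace(metadata={
    "sql_query": "SELECT COUNT(DISTINCT symbol), array_agg(DISTINCT fund_type) "
                 "FROM portfolio WHERE company ~* '\\yBroadcom\\y';",
    "result": [(5, ["Bond Fund", "Equity Fund"])],
    "col_keys": ["count", "array_agg"],
})
fake_response = SimpleNamespace(source_nodes=[SimpleNamespace(node=fake_node, score=None)])

details = sql_details(fake_response)
print(details[0]["sql_query"])
print(details[0]["columns"], details[0]["rows"])
```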

In [36]:
query = "what employees work on the mariott client account?"
result = await sql_workflow.run(query=query)
result

Running step generate_sql_response
Step generate_sql_response produced event StopEvent


Response(response="I apologize, but I couldn't find any employees currently assigned to the Marriott client account.  To help me find the information you need, could you please provide more details, perhaps a different way of specifying the client name, or a different search term?\n", source_nodes=[NodeWithScore(node=TextNode(id_='e456e386-315a-48f8-bc06-99396a31de50', embedding=None, metadata={'sql_query': "SELECT employee_first_name, employee_last_name FROM client_alignment WHERE company ~* '\\yMarriott\\y'", 'result': [], 'col_keys': ['employee_first_name', 'employee_last_name']}, excluded_embed_metadata_keys=['sql_query', 'result', 'col_keys'], excluded_llm_metadata_keys=['sql_query', 'result', 'col_keys'], relationships={}, metadata_template='{key}: {value}', metadata_separator='\n', text='[]', mimetype='text/plain', start_char_idx=None, end_char_idx=None, metadata_seperator='\n', text_template='{metadata_str}\n\n{content}'), score=None)], metadata={'e456e386-315a-48f8-bc06-99396a

In [38]:
query = "what is the contact email on the Marriot client account?"
result = await sql_workflow.run(query=query)
result

Running step generate_sql_response
Step generate_sql_response produced event StopEvent


Response(response="I apologize, but I couldn't find a contact email for Marriott in our records.  To help me locate the correct information, could you please provide any additional details, such as a specific Marriott property or contact person?  Perhaps rephrasing your search terms might also be helpful.\n", source_nodes=[NodeWithScore(node=TextNode(id_='b5c9d0b1-0f37-44b7-b990-1ab6c5c90d33', embedding=None, metadata={'sql_query': "SELECT contact_email FROM client_alignment WHERE company ~* '\\yMarriot\\y'", 'result': [], 'col_keys': ['contact_email']}, excluded_embed_metadata_keys=['sql_query', 'result', 'col_keys'], excluded_llm_metadata_keys=['sql_query', 'result', 'col_keys'], relationships={}, metadata_template='{key}: {value}', metadata_separator='\n', text='[]', mimetype='text/plain', start_char_idx=None, end_char_idx=None, metadata_seperator='\n', text_template='{metadata_str}\n\n{content}'), score=None)], metadata={'b5c9d0b1-0f37-44b7-b990-1ab6c5c90d33': {'sql_query': "SELECT
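The Marriott queries illustrate how sensitive the `~* '\y...\y'` filter is to spelling: the portfolio query matched rows using `Mariott`, while the generated queries for `Marriott` and `Marriot` returned nothing. They also ran against a different table (`client_alignment`), so spelling may not be the only cause. The sketch below mimics the filter in Python, where PostgreSQL's `\y` word boundary corresponds to `\b` and `~*` is a case-insensitive match, and shows one possible mitigation: suggesting close spellings from a list of known company names, which in practice might come from a `SELECT DISTINCT company ...` query. Both functions are hypothetical and not part of the engine; the company list is illustrative.

```python
import re
from difflib import get_close_matches


def pg_word_match(word: str, text: str) -> bool:
    """Approximate PostgreSQL's  text ~* '\\yWORD\\y'  using Python's re module.

    \\y (PostgreSQL word boundary) corresponds to \\b in Python, and ~* is a
    case-insensitive regex match, hence re.IGNORECASE.
    """
    pattern = rf"\b{re.escape(word)}\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def suggest_company(name: str, known_companies: list[str], cutoff: float = 0.8) -> list[str]:
    """Return known company names spelled similarly to `name` (case-insensitive)."""
    by_lower = {company.lower(): company for company in known_companies}
    matches = get_close_matches(name.lower(), list(by_lower), n=3, cutoff=cutoff)
    return [by_lower[m] for m in matches]


companies = ["Mariott", "Broadcom"]  # illustrative; spellings taken from the queries above
print(pg_word_match("Mariott", "Mariott"))     # True
print(pg_word_match("Marriott", "Mariott"))    # False: the extra "r" breaks the match
print(suggest_company("Marriott", companies))  # ['Mariott']
```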