# CRM Query Engine Test
This notebook demonstrates the implementation and usage of the SQL query engine.

In [1]:
from currensee.schema.schema import PostgresTables
from currensee.query_engines.sql_query_engine.query_engine import create_sql_workflow
from currensee.utils.db_utils import create_pg_engine
from currensee.query_engines.workflow_descriptions import (
    crm_client_alignment_table_desc,
    crm_client_info_table_desc,
    crm_employees_table_desc,
    crm_portfolio_table_desc,
)

In [2]:
# Jupyter already runs an asyncio event loop; nest_asyncio patches asyncio so that code which starts its own loop can run inside it

import nest_asyncio

nest_asyncio.apply()
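Why the patch is needed: a Jupyter kernel already has an asyncio event loop running, and plain asyncio refuses to start another loop from inside a running one. The stdlib-only sketch below (independent of `currensee`) reproduces that error; `nest_asyncio.apply()` patches asyncio so that such nested calls are allowed instead. That the workflow's dependencies actually make nested calls is an assumption here. Run the sketch as a plain script: inside this notebook, after `nest_asyncio.apply()`, the nested call would be permitted and the error would not appear.

```python
import asyncio


async def inner() -> str:
    return "done"


async def outer() -> str:
    # We are now inside a running event loop, just like code in a Jupyter cell.
    coro = inner()
    try:
        # Without nest_asyncio, asyncio refuses to start a nested loop here.
        return asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # the coroutine never ran; close it to avoid a "never awaited" warning
        return f"RuntimeError: {exc}"


message = asyncio.run(outer())
print(message)
```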

## Create the SQL Workflow

The SQL workflow can take the following parameters:

1. source_db: the name of the database where the tables are stored (e.g. `crm`)
2. table_description_mapping: a dictionary that maps the name of each table the query engine should have access to onto that table's description
   * multiple tables can be passed if you want the query engine to try to join related tables in its queries
   * THIS IS LEVEL 2!! Do not attempt it until you get the hang of using just one table at a time!!
3. text_to_sql_tmpl: a string containing the prompt that tells the LLM how to produce a SQL query from the input text
   * defaults to the variable `text_to_sql_tmpl` defined in `currensee.query_engines.prompting`
   * you may override this by passing in your own string
4. response_synthesis_prompt_str: a string containing the prompt that tells the LLM how to synthesize the final response from the SQL query results
   * defaults to the variable `response_synthesis_prompt_str` defined in `currensee.query_engines.prompting`
   * you may override this by passing in your own string
5. model: the name of the model used for all of the tasks
   * defaults to `gemini-1.5-flash`
   * you may override this with any of the models listed at https://ai.google.dev/gemini-api/docs/models#model-variations, using the hyphenated name shown in the "Model variant" column.
   * **BE VERY CAREFUL TO PAY ATTENTION TO THE PRICING!!!!!** I recommend using the default model until you understand the other models better!!!
6. temperature: the temperature parameter passed to the model
   * defaults to 0.0
   * the higher the temperature, the more random ("creative") the output. Keep it low for SQL query generation.
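To make the temperature setting concrete: in the usual sampling scheme, the model's scores (logits) for candidate next tokens are divided by the temperature before a softmax turns them into probabilities, and a temperature of 0 is conventionally treated as always picking the top-scoring token. The sketch below shows that general technique with made-up logits; it is not Gemini's internal implementation, which this notebook does not describe.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw token scores into sampling probabilities at a given temperature."""
    if temperature <= 0:
        # Temperature 0: greedy decoding, all probability on the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Higher temperatures flatten the distribution, so less likely tokens are sampled more often; that is why a low value suits SQL generation, where one exact query is wanted.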

### Below is the default `text_to_sql_tmpl` defined in `prompting.py`

In [3]:
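# raw string (r"""...""") so the regex word boundary \y is kept literally without an invalid-escape warning
text_to_sql_tmpl = r"""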
    Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
    You can order the results by the find_date column (from earliest to latest) to return the most interesting examples in the database.

    GUIDELINES:
    * Never query for all the columns from a specific table, only ask for a few relevant columns given the question.
    * Pay attention to use only the column names that you can see in the schema description.
    * Be careful to not query for columns that do not exist.
    * Pay attention to which column is in which table.
    * Make sure to filter on all criteria mentioned in the query.
    * If using a LIMIT to restrict the results, make sure it comes only at the end of the query.

    IMPORTANT NOTE:
    * Use the ~* operator instead of = when filtering with WHERE on text columns.
    * Add word boundaries '\y' to the beginning and end of each search term in the query.

    You are required to use the following format, each taking one line:

    Question: Question here
    SQLQuery: SQL Query to run
    SQLResult: Result of the SQLQuery
    Answer: Final answer here

    Only use tables listed below.
    {schema}

    Question: {query_str}
    SQLQuery:

"""

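The IMPORTANT NOTE in the template asks for `~*` plus `\y` because in PostgreSQL `=` is an exact, case-sensitive comparison, while `~*` is a case-insensitive regular-expression match and `\y` matches only at the beginning or end of a word. The sketch below mimics these behaviours with Python's `re` module (`\b` plays the role of Postgres's `\y`) on made-up sample strings; it approximates the Postgres semantics rather than reproducing them exactly.

```python
import re

# Made-up sample column values (not taken from the crm database)
values = ["Stock", "Mutual Fund", "Apple Inc", "Pineapple Co"]


def equals(term: str, text: str) -> bool:
    # Like SQL: column = 'term'  (exact, case-sensitive)
    return text == term


def iregex_word(term: str, text: str) -> bool:
    # Like Postgres: column ~* '\yterm\y'  (case-insensitive, whole word only)
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def iregex_substring(term: str, text: str) -> bool:
    # Like Postgres: column ~* 'term'  (case-insensitive, also matches inside words)
    return re.search(re.escape(term), text, flags=re.IGNORECASE) is not None


print([v for v in values if equals("stock", v)])
print([v for v in values if iregex_word("stock", v)])
print([v for v in values if iregex_word("apple", v)])
print([v for v in values if iregex_substring("apple", v)])
```

The exact comparison misses `Stock` because of its capital letter, and the word boundaries keep `apple` from matching inside `Pineapple`.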
### Below is the default `response_synthesis_prompt_str` defined in `prompting.py`

In [4]:
response_synthesis_prompt_str = """

    Query: {query_str}
    SQL: {sql_query}
    SQL Response: {context_str}

    IMPORTANT INSTRUCTIONS:
    * If SQL Response is empty or 0, apologise and mention that you could not find
     examples to answer the query.
    * In such cases, kindly nudge the user towards providing more details or refining
    their search.
    * Additionally, you can tell them to rephrase specific keywords.
    * Do not explicitly state phrases such as 'based on the SQL query executed' or related
     references to context in your Response.
    * Never mention the underlying sql query, or the underling sql tables and other database elements
    * Never mention that sql was used to answer this question

    Considering the IMPORTANT INSTRUCTIONS above, create an response using the information
    returned from the database and no prior knowledge.


    Response:
"""

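Both templates are plain strings with curly-brace placeholders (`{dialect}`, `{schema}` and `{query_str}` in the first; `{query_str}`, `{sql_query}` and `{context_str}` in the second) that the workflow fills in before calling the LLM. The sketch below assumes plain `str.format` substitution, which may differ from what `create_sql_workflow` does internally, and fills an abbreviated copy of the response template with the SQL and result shown for the Broadcom test query in this notebook.

```python
# Abbreviated copy of response_synthesis_prompt_str (instructions omitted for brevity)
response_tmpl = """
    Query: {query_str}
    SQL: {sql_query}
    SQL Response: {context_str}

    Response:
"""

# Values taken from the Broadcom test query in this notebook
rows = [(0,)]
prompt = response_tmpl.format(
    query_str="how many stocks does Broadcom own?",
    sql_query=(
        "SELECT count(*) FROM portfolio "
        "WHERE company ~* '\\yBroadcom\\y' AND instrument_type = 'stock';"
    ),
    context_str=str(rows),  # assumption: the raw rows are rendered as text, e.g. "[(0,)]"
)
print(prompt)
```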
### Define the DB information
**IMPORTANT**: The table names MUST be lowercase in order for the engine to find them.

In [5]:
source_db = 'crm'
table_description_mapping = {
    'employees': crm_employees_table_desc,
    'portfolio': crm_portfolio_table_desc,
    #'client_alignment': crm_client_alignment_table_desc,
    #'clients_contact': crm_client_info_table_desc
}
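One plausible reason for the lowercase rule is that PostgreSQL folds unquoted identifiers to lower case, so a table created as `Employees` without quotes is stored as `employees`, and a lookup by the mixed-case name may not find it. A small hypothetical guard (`check_table_names` is not part of `currensee`) can catch the mistake before the workflow is built; placeholder strings stand in for the `crm_*_table_desc` descriptions so the snippet runs on its own.

```python
def check_table_names(mapping: dict[str, str]) -> None:
    """Raise if any table name in the mapping is not all lowercase (hypothetical helper)."""
    bad = [name for name in mapping if name != name.lower()]
    if bad:
        raise ValueError(f"Table names must be lowercase, got: {bad}")


# Placeholder descriptions stand in for crm_employees_table_desc etc.
check_table_names({"employees": "employee table", "portfolio": "portfolio table"})  # passes

try:
    check_table_names({"Employees": "employee table"})
except ValueError as exc:
    print(exc)
```

In the notebook you would call `check_table_names(table_description_mapping)` before `create_sql_workflow(...)`.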

In [6]:
sql_workflow = create_sql_workflow(
    source_db=source_db,
    table_description_mapping=table_description_mapping,
    text_to_sql_tmpl=text_to_sql_tmpl,
    response_synthesis_prompt_str=response_synthesis_prompt_str,
)

## Define the Query

In [7]:
query = "Who works for bankwell?"

## Run the Query and Output the Response

In [8]:
result = await sql_workflow.run(query=query)

Running step generate_sql_response
Step generate_sql_response produced event StopEvent
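Top-level `await` works in this cell because Jupyter executes cells inside a running event loop. In a plain Python script, the same call has to live inside a coroutine that is started with `asyncio.run()`. The sketch below shows that shape with a hypothetical `StubWorkflow` standing in for `sql_workflow`; it only mimics the `await workflow.run(query=...)` call, so it runs without the database or an API key.

```python
import asyncio


class StubWorkflow:
    """Hypothetical stand-in that mimics only `await workflow.run(query=...)`."""

    async def run(self, query: str) -> str:
        await asyncio.sleep(0)  # pretend to do asynchronous work
        return f"stub answer for: {query}"


async def main(workflow, queries: list[str]) -> list[str]:
    # Run the queries one after another, awaiting each result.
    results = []
    for q in queries:
        results.append(await workflow.run(query=q))
    return results


if __name__ == "__main__":
    answers = asyncio.run(main(StubWorkflow(), ["Who works for bankwell?"]))
    for answer in answers:
        print(answer)
```

In a real script, replace `StubWorkflow()` with the `sql_workflow` returned by `create_sql_workflow(...)`.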


In [9]:
result

Response(response='The following individuals work at Bankwell: Jane Moneypenny, Michael Walker, Jennifer Duran, Tyler Mullen, John Crawford, Melissa Kramer, Christina Morse, Kim Hill, Justin Anderson, Jessica Webster, Anthony Frost, Fernando Anderson, Stephen Cooper, Jeremiah Gates, Sharon Bryan, Kristopher Gilbert, Sara Warren, Vincent Hughes, Breanna Owen, Timothy Hartman, Heather Bass, Erica David, Arthur Gilbert, Robert Pennington, Joanne Taylor, Mark Moore, Caleb Patel, Leah Coffey, William Brown, Angela Reid, David Freeman, Amy Parker, Patricia Beard, Amber Williams, Sue Johnson, Jamie Doyle, James Hill, Mark Johnson, Tyler Davis, Alyssa Michael, Tiffany Patel, Megan Green, Troy Fletcher, Megan Harris, Margaret Edwards, Donna Oconnell, Lauren Nelson, Kenneth Wolfe, Travis Chan, Mike Wilson, Peter Nelson, Tonya Little, Joseph Austin, Raymond Jackson, Stephanie Anderson, Vanessa Ferguson, Joseph Warner, Anthony Fritz, George Pearson, Bonnie Merritt, Kelly Clark, Stephen Snyder, Jes

## Test Queries

In [13]:
query = "what financial instruments does Mariott own?"
result = await sql_workflow.run(query=query)
result

Running step generate_sql_response
Step generate_sql_response produced event StopEvent


Response(response="I found that Marriott appears to hold stocks in Amazon (AMZN), Microsoft (MSFT), Meta (META), and Apple (AAPL), as well as mutual funds SWPPX and VFIAX.  However, please note that this information may not be entirely comprehensive.  To get a more precise picture of Marriott's financial holdings, you might need to provide more specific details in your request.\n", source_nodes=[NodeWithScore(node=TextNode(id_='d351279b-098b-4dd3-b8d0-cfc5cf2b8b67', embedding=None, metadata={'sql_query': "SELECT instrument_type, symbol FROM portfolio WHERE company ~* '\\yMariott\\y'", 'result': [('Stock', 'AMZN'), ('Stock', 'MSFT'), ('Stock', 'META'), ('Mutual Fund', 'SWPPX'), ('Mutual Fund', 'VFIAX'), ('Stock', 'AAPL')], 'col_keys': ['instrument_type', 'symbol']}, excluded_embed_metadata_keys=['sql_query', 'result', 'col_keys'], excluded_llm_metadata_keys=['sql_query', 'result', 'col_keys'], relationships={}, metadata_template='{key}: {value}', metadata_separator='\n', text="[('Stock'

In [17]:
query = "how many stocks does Broadcom own?"
result = await sql_workflow.run(query=query)
result

Running step generate_sql_response
Step generate_sql_response produced event StopEvent


Response(response="I apologize, but I couldn't find any information about the number of stocks owned by Broadcom.  To help me find the answer, could you please provide more details or refine your request?  Perhaps rephrasing your search terms might be helpful.\n", source_nodes=[NodeWithScore(node=TextNode(id_='91dec136-3a35-4676-8b3d-f9f37763374f', embedding=None, metadata={'sql_query': "SELECT count(*) FROM portfolio WHERE company ~* '\\yBroadcom\\y' AND instrument_type = 'stock';", 'result': [(0,)], 'col_keys': ['count']}, excluded_embed_metadata_keys=['sql_query', 'result', 'col_keys'], excluded_llm_metadata_keys=['sql_query', 'result', 'col_keys'], relationships={}, metadata_template='{key}: {value}', metadata_separator='\n', text='[(0,)]', mimetype='text/plain', start_char_idx=None, end_char_idx=None, metadata_seperator='\n', text_template='{metadata_str}\n\n{content}'), score=None)], metadata={'91dec136-3a35-4676-8b3d-f9f37763374f': {'sql_query': "SELECT count(*) FROM portfolio W

In [41]:
query = "what employees work on the mariott client account?"
result = await sql_workflow.run(query=query)
result

Running step generate_sql_response
Step generate_sql_response produced event StopEvent


Response(response="I apologize, but I couldn't find any employees currently assigned to the Marriott client account.  To help me find the information you need, could you please provide more details, perhaps refining your search terms or specifying a particular Marriott property or project?  Rephrasing your request might also be helpful.\n", source_nodes=[NodeWithScore(node=TextNode(id_='aeef48d6-2fda-44d0-b485-67879fbeae15', embedding=None, metadata={'sql_query': "SELECT e.first_name, e.last_name FROM employees e JOIN portfolio p ON e.employee_id = p.account_id WHERE p.company ~* '\\yMarriott\\y'", 'result': [], 'col_keys': ['first_name', 'last_name']}, excluded_embed_metadata_keys=['sql_query', 'result', 'col_keys'], excluded_llm_metadata_keys=['sql_query', 'result', 'col_keys'], relationships={}, metadata_template='{key}: {value}', metadata_separator='\n', text='[]', mimetype='text/plain', start_char_idx=None, end_char_idx=None, metadata_seperator='\n', text_template='{metadata_str}\
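The printed `Response` objects are long and get truncated, but the generated SQL and the raw rows sit in each source node's metadata under `sql_query`, `result` and `col_keys`, as the outputs above show. A small hypothetical helper (`summarize_sql_result` is not part of `currensee` or LlamaIndex) can pull them out for debugging, for example to inspect the join the model chose in the last query. To keep the sketch runnable on its own, it is fed a `SimpleNamespace` stand-in that copies only those attributes, with a subset of the rows, from the Mariott output; in the notebook you would pass `result` directly.

```python
from types import SimpleNamespace
from typing import Any


def summarize_sql_result(result: Any) -> list[dict]:
    """Collect the SQL and the rows (as dicts) stored in each source node's metadata."""
    summaries = []
    for node_with_score in result.source_nodes:
        meta = node_with_score.node.metadata
        # Pair each row tuple with the column names reported in col_keys
        rows = [dict(zip(meta["col_keys"], row)) for row in meta["result"]]
        summaries.append({"sql_query": meta["sql_query"], "rows": rows})
    return summaries


# Stand-in mimicking only the attributes visible in the Mariott output (subset of rows)
fake_result = SimpleNamespace(
    source_nodes=[
        SimpleNamespace(
            node=SimpleNamespace(
                metadata={
                    "sql_query": "SELECT instrument_type, symbol FROM portfolio WHERE company ~* '\\yMariott\\y'",
                    "result": [("Stock", "AMZN"), ("Mutual Fund", "SWPPX")],
                    "col_keys": ["instrument_type", "symbol"],
                }
            )
        )
    ]
)

for summary in summarize_sql_result(fake_result):
    print(summary["sql_query"])
    for row in summary["rows"]:
        print(row)
```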