# Initialize the Connector

All testing requires a Snowflake connection. The cell below connects using your credentials and the database and warehouse to query.
Note that you will need `snowflake_utils.py` and a `creds.json` file containing the relevant Snowflake credentials.

In [None]:

from snowflake_utils import *

import bodo

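# Only rank 0 opens a Snowflake connection; every later cell guards its queries with the same check.
if bodo.get_rank() == 0:
    conn = get_snowflake_connection('creds.json', database="E3_PROD", warehouse="BODOW01")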
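
`snowflake_utils.py` is not shown here, so the following is only a sketch of how credentials could be turned into connection arguments. It assumes `creds.json` holds a flat JSON object with keys such as `user`, `password`, and `account`; those key names and the `load_connection_params` helper are hypothetical and are not the contents of `snowflake_utils.py`.

```python
import json
from typing import Any, Dict


def load_connection_params(creds_path: str, **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: read credentials from a JSON file and merge in
    per-session settings such as database and warehouse."""
    with open(creds_path, "r") as f:
        params = json.load(f)
    # Assumed required keys; adjust to whatever your creds.json actually contains.
    missing = [k for k in ("user", "password", "account") if k not in params]
    if missing:
        raise KeyError(f"creds file is missing keys: {missing}")
    # Explicit keyword arguments (e.g. database, warehouse) take precedence over the file.
    params.update(overrides)
    return params
```

On rank 0 the resulting dictionary could then be passed to `snowflake.connector.connect(**params)`, which accepts `user`, `password`, `account`, `database`, and `warehouse` keyword arguments.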

# Testing Setup

In this notebook we assume that you have two files: `perf_query.sql` and `correctness_query.sql`.

`perf_query.sql` is the original query, but its destination has been replaced with a Bodo-specific target
table, along with any other changes needed to satisfy Bodo's SQL syntax constraints.

`correctness_query.sql` is similar to `perf_query.sql`, but it adds filters so that it can run on a single
small node and just verify correctness. Determine these filters by profiling the tables the query uses,
usually starting from the join conditions.
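
The extra filters in `correctness_query.sql` are often just `IN` lists over a join key. A small helper (hypothetical, not part of `snowflake_utils.py`) can render such a list so the same IDs are pasted consistently into every table's filter. It only quotes strings by doubling single quotes, so it is meant for hand-picked test values, not untrusted input.

```python
from typing import Iterable, Union

Value = Union[int, str]


def sql_literal(v: Value) -> str:
    # Integers are emitted as-is; strings are single-quoted with embedded quotes doubled.
    if isinstance(v, bool):
        raise TypeError("booleans are not supported")
    if isinstance(v, int):
        return str(v)
    return "'" + str(v).replace("'", "''") + "'"


def in_filter(column: str, values: Iterable[Value], negate: bool = False) -> str:
    """Render `column in (...)` (or `column not in (...)`) for a list of test keys."""
    vals = list(values)
    if not vals:
        raise ValueError("an IN list needs at least one value")
    op = "not in" if negate else "in"
    return f"{column} {op} ({', '.join(sql_literal(v) for v in vals)})"
```

For example, `in_filter("ret_product_id", [57735178, 75779006, 17384476])` produces `ret_product_id in (57735178, 75779006, 17384476)`, the filter used in the probes further down.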

# Table Setup

For testing you will need to create (and later drop) tables for correctness and/or performance runs. If the query
uses `INSERT INTO`, you need a table that matches the original target. We recommend `CREATE TABLE ... LIKE`, which
copies the exact table metadata but not the data. To copy the data as well, use `CREATE TABLE ... CLONE`, although you
may need to copy the table clustering manually. When you are finished testing correctness, drop the tables.

In [None]:
# Create tables
if bodo.get_rank() == 0:
    # Correctness tables
    conn.cursor().execute("CREATE TABLE BODO.q27_correctness_output_snowflake like ...")
    conn.cursor().execute("CREATE TABLE BODO.q27_correctness_output_bodo like ...")
    # Performance tables
    conn.cursor().execute("CREATE TABLE BODO.q27_output_snowflake like ...")
    conn.cursor().execute("CREATE TABLE BODO.q27_output_bodo like ...")

In [None]:
# Drop tables
if bodo.get_rank() == 0:
    # Correctness tables
    conn.cursor().execute("DROP TABLE BODO.q27_correctness_output_snowflake")
    conn.cursor().execute("DROP TABLE BODO.q27_correctness_output_bodo")
    # Performance tables
    conn.cursor().execute("DROP TABLE BODO.q27_output_snowflake")
    conn.cursor().execute("DROP TABLE BODO.q27_output_bodo")
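
Because the four tables follow one naming pattern, the create and drop statements can be generated rather than written by hand. The sketch below assumes the `BODO.q27_...` naming used above; the source table stays a parameter because the original target is not named here. `IF NOT EXISTS` and `IF EXISTS` keep re-runs from failing, and `clone=True` emits Snowflake's `CREATE TABLE ... CLONE` form when you also need the data.

```python
from typing import List

SUFFIXES = [
    "correctness_output_snowflake",
    "correctness_output_bodo",
    "output_snowflake",
    "output_bodo",
]


def table_names(schema: str, query_id: str) -> List[str]:
    # e.g. BODO.q27_output_bodo
    return [f"{schema}.{query_id}_{s}" for s in SUFFIXES]


def create_statements(schema: str, query_id: str, source_table: str, clone: bool = False) -> List[str]:
    # LIKE copies the table definition only; CLONE also copies the data.
    mode = "CLONE" if clone else "LIKE"
    return [f"CREATE TABLE IF NOT EXISTS {t} {mode} {source_table}" for t in table_names(schema, query_id)]


def drop_statements(schema: str, query_id: str) -> List[str]:
    return [f"DROP TABLE IF EXISTS {t}" for t in table_names(schema, query_id)]
```

On rank 0, each statement can be passed to `conn.cursor().execute(...)` in a loop, exactly like the hand-written cells above.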

# Execute the Snowflake Query

To run the queries on Snowflake, execute `correctness_query.sql` or `perf_query.sql` here. Before running, change
each query's destination to the matching `_snowflake` table so the Snowflake output does not overwrite the Bodo output.

In [None]:
# Run the correctness query on Snowflake (only rank 0 holds a connection)
if bodo.get_rank() == 0:
    with open("correctness_query.sql", "r") as f:
        query = f.read()
    x = conn.cursor().execute(query)
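
For `perf_query.sql` you usually also want the elapsed time. The sketch below times a single `execute` call with `time.perf_counter`. It assumes the file holds one statement (the Snowflake connector's `cursor.execute` expects a single statement unless multi-statement execution is enabled) and that the destination table name appears verbatim, so it can be swapped with a plain string replace. `run_sql_file` is a hypothetical helper; any object with an `execute(sql)` method can stand in for the cursor.

```python
import time
from typing import Any, Optional, Tuple


def run_sql_file(cursor: Any, path: str, old_target: Optional[str] = None,
                 new_target: Optional[str] = None) -> Tuple[Any, float]:
    """Read a SQL file, optionally retarget its destination table, run it,
    and return (execute result, elapsed seconds)."""
    with open(path, "r") as f:
        query = f.read()
    if old_target is not None and new_target is not None:
        # Fail loudly instead of silently writing to the wrong table.
        if old_target not in query:
            raise ValueError(f"{old_target!r} not found in {path}")
        query = query.replace(old_target, new_target)
    start = time.perf_counter()
    result = cursor.execute(query)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

For example, `run_sql_file(conn.cursor(), "perf_query.sql", "BODO.q27_output_bodo", "BODO.q27_output_snowflake")` on rank 0 would run the performance query into the Snowflake output table, assuming `perf_query.sql` names `BODO.q27_output_bodo` as its target.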

# Check correctness results

To check correctness you should compare the result in the Snowflake table with the result in the Bodo table.
We recommend first verifying sizes and then checking that all entries match.

In [None]:
# Check bodo output size
if bodo.get_rank() == 0:
    x = conn.cursor().execute("select count(*) from BODO.q27_correctness_output_bodo")
    res = x.fetch_pandas_all()
else:
    res = None
res

In [None]:
# Check snowflake output size
if bodo.get_rank() == 0:
    x = conn.cursor().execute("select count(*) from BODO.q27_correctness_output_snowflake")
    res = x.fetch_pandas_all()
else:
    res = None
res
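
The two size checks can also be folded into one query with scalar subqueries, so both counts appear side by side. The sketch below demonstrates the SQL pattern on an in-memory `sqlite3` database as a stand-in for Snowflake; the tables `bodo_out` and `snowflake_out` and their rows are made up. The same `SELECT` shape should apply to the `BODO.q27_correctness_output_*` tables through the Snowflake cursor, which also provides `execute` and `fetchone`.

```python
import sqlite3

COUNT_SQL = """
select (select count(*) from {bodo}) as bodo_cnt,
       (select count(*) from {snowflake}) as snowflake_cnt
"""


def compare_counts(conn, bodo_table: str, snowflake_table: str):
    """Return (bodo_cnt, snowflake_cnt) using any DB-API connection."""
    cur = conn.cursor()
    cur.execute(COUNT_SQL.format(bodo=bodo_table, snowflake=snowflake_table))
    return tuple(cur.fetchone())


# Stand-in data: sqlite3 in memory instead of Snowflake.
demo = sqlite3.connect(":memory:")
demo.executescript("""
create table bodo_out (id integer, val text);
create table snowflake_out (id integer, val text);
insert into bodo_out values (1, 'a'), (2, 'b');
insert into snowflake_out values (1, 'a'), (2, 'b'), (3, 'c');
""")
print(compare_counts(demo, "bodo_out", "snowflake_out"))  # (2, 3)
```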

In [None]:
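# Directly compare results in both directions: rows Snowflake produced that Bodo did not, and vice versa.
# EXCEPT removes duplicate rows, so also check the output sizes above. If there are floating point columns
# you may need to exclude them, since the results may not match exactly. The same is true for any data that
# depends on exactly when the query is run (e.g. current_time()).
if bodo.get_rank() == 0:
    cur = conn.cursor()
    only_snowflake = cur.execute("select * from BODO.q27_correctness_output_snowflake except select * from BODO.q27_correctness_output_bodo").fetch_pandas_all()
    only_bodo = cur.execute("select * from BODO.q27_correctness_output_bodo except select * from BODO.q27_correctness_output_snowflake").fetch_pandas_all()
    res = (only_snowflake, only_bodo)
else:
    res = None
res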
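
When float columns make an exact `EXCEPT` too strict, one alternative is to fetch both tables (each `fetch_pandas_all()` call returns a pandas DataFrame) and compare them with a tolerance in pandas. The sketch below sorts both frames, non-float columns first, so row order does not matter, then uses `pandas.testing.assert_frame_equal` with a relative tolerance. The `frames_match` name and the default `rtol` are assumptions, and pulling whole tables to the client is only practical for the small correctness outputs.

```python
from typing import List, Optional

import pandas as pd
from pandas.testing import assert_frame_equal


def frames_match(left: pd.DataFrame, right: pd.DataFrame, rtol: float = 1e-9,
                 sort_by: Optional[List[str]] = None) -> bool:
    """Compare two query results ignoring row order, with a tolerance on floats."""
    if sorted(left.columns) != sorted(right.columns) or len(left) != len(right):
        return False
    cols = sorted(left.columns)
    if sort_by is None:
        # Sort on exact-typed columns first, using float columns only as tie-breakers.
        floats = [c for c in cols if pd.api.types.is_float_dtype(left[c])]
        sort_by = [c for c in cols if c not in floats] + floats
    lhs = left[cols].sort_values(sort_by).reset_index(drop=True)
    rhs = right[cols].sort_values(sort_by).reset_index(drop=True)
    try:
        assert_frame_equal(lhs, rhs, check_exact=False, rtol=rtol, check_dtype=False)
    except AssertionError:
        return False
    return True
```

If the non-float columns do not uniquely identify rows, pass an explicit `sort_by` that does; otherwise two rows with nearly equal floats could be paired in a different order on each side.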

# Finding additional filters

Here are some example queries used to find additional filters. For each table we mainly check `count(*)` and
`select A, count(*) as cnt from table group by A order by cnt DESC`. Looking at the clustering keys can help derive
simple filters for testing correctness. We also copy any static filters from the original query so that we are
looking at the same data it reads.

In [None]:
if bodo.get_rank() == 0:
    x = conn.cursor().execute("""select comp_mst_product_id, count(*) as cnt from link.links
        WHERE link_type in ('EXACT','GOOD', 'FAIR') and link_status = 'APPROVED'
        and client_id not in ('7c58cb6f-26c5-469d-8796-940c67cf2051', 'c098b626-4f3b-4db1-8526-a802d5573f7c',
                              'ec380af9-7e76-403a-8cec-7ff125a99f3d', '88251f63-0680-4b59-8029-3b37a821b0a8')
        and ret_product_id in (57735178, 75779006, 17384476) group by comp_mst_product_id order by cnt desc""")
    res = x.fetch_pandas_all()
else:
    res = None
res
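
Each probe in this section follows the same two-query pattern, a total `count(*)` and a per-key `group by ... order by cnt desc`, with the query's static filters copied in. A hypothetical helper that emits both strings from a table, a key column, and an optional `WHERE` clause keeps the copied filters identical across probes.

```python
from typing import Optional, Tuple


def profile_queries(table: str, column: str, where: Optional[str] = None) -> Tuple[str, str]:
    """Return (total count query, per-key count query) for one table and key column."""
    where_sql = f" where {where}" if where else ""
    total = f"select count(*) from {table}{where_sql}"
    per_key = (
        f"select {column}, count(*) as cnt from {table}{where_sql} "
        f"group by {column} order by cnt desc"
    )
    return total, per_key
```

Both strings can be run on rank 0 with `conn.cursor().execute(...)` followed by `fetch_pandas_all()`, as in the surrounding cells.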

In [None]:
if bodo.get_rank() == 0:
    x = conn.cursor().execute("""select count(*) from product.ret_product where ret_product_id in (57735178, 75779006, 17384476)""")
    res = x.fetch_pandas_all()
else:
    res = None
res

In [None]:
if bodo.get_rank() == 0:
    x = conn.cursor().execute("""select count(*) from product.ret_product where mst_product_id in (-1, 1007388, 1195330, 1159305)""")
    res = x.fetch_pandas_all()
else:
    res = None
res

In [None]:
if bodo.get_rank() == 0:
    x = conn.cursor().execute("""select ret_product_id, count(ret_product_id) as cnt from product.ret_product group by ret_product_id
    order by cnt DESC""")
    res = x.fetch_pandas_all()
else:
    res = None
res
# Result: every ret_product_id is unique (all counts are 1)
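
Scanning a sorted `group by` result works, but uniqueness can also be checked directly by comparing `count(*)` with `count(distinct ...)`. Because `count(distinct ...)` ignores NULLs, the two only match when every row has a different, non-NULL value. The sketch demonstrates this on an in-memory `sqlite3` table as a stand-in for Snowflake; `ret_product_demo` and its rows are made up, and `is_unique` is a hypothetical helper.

```python
import sqlite3


def is_unique(conn, table: str, column: str) -> bool:
    """True when every row has a different, non-NULL value in `column`."""
    cur = conn.cursor()
    cur.execute(f"select count(*), count(distinct {column}) from {table}")
    total, distinct = cur.fetchone()
    return total == distinct


demo = sqlite3.connect(":memory:")
demo.execute("create table ret_product_demo (ret_product_id integer, mst_product_id integer)")
demo.executemany(
    "insert into ret_product_demo values (?, ?)",
    [(1, 10), (2, 10), (3, 11)],
)
print(is_unique(demo, "ret_product_demo", "ret_product_id"))  # True
print(is_unique(demo, "ret_product_demo", "mst_product_id"))  # False
```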

In [None]:
if bodo.get_rank() == 0:
    x = conn.cursor().execute("""select mst_product_id, count(mst_product_id) as cnt from product.ret_product group by mst_product_id
    order by cnt DESC""")
    res = x.fetch_pandas_all()
else:
    res = None
res

In [None]:
if bodo.get_rank() == 0:
    x = conn.cursor().execute("""select ret_product_id, count(ret_product_id) as cnt from link.links
        WHERE link_type in ('EXACT','GOOD', 'FAIR') and link_status = 'APPROVED'
        and client_id not in ('7c58cb6f-26c5-469d-8796-940c67cf2051', 'c098b626-4f3b-4db1-8526-a802d5573f7c',
                              'ec380af9-7e76-403a-8cec-7ff125a99f3d', '88251f63-0680-4b59-8029-3b37a821b0a8')
        group by ret_product_id order by cnt DESC""")
    res = x.fetch_pandas_all()
else:
    res = None
res

In [None]:
if bodo.get_rank() == 0:
    x = conn.cursor().execute("""select count(*) from coverage.product_coverage_store_set where 
    store_set_id = '93d5112a-47ef-4103-b94e-7aacd22f6012' and level = 0 and ret_product_id in (57735178, 75779006, 17384476)""")
    res = x.fetch_pandas_all()
else:
    res = None
res
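
Once a per-key count like the ones above has been fetched, choosing filter values is a judgment call. One possible heuristic, offered as an assumption rather than the procedure used here, is to walk the `order by cnt desc` result and keep the heaviest keys whose combined row count still fits a row budget. Snowflake uppercases unquoted identifiers, so `fetch_pandas_all()` typically returns columns such as `CNT`; the column names, budget, and the `pick_filter_keys` name are all hypothetical parameters.

```python
from typing import Any, List

import pandas as pd


def pick_filter_keys(counts: pd.DataFrame, key_col: str, cnt_col: str = "CNT",
                     row_budget: int = 100_000, max_keys: int = 5) -> List[Any]:
    """Greedily pick the most frequent keys whose combined row count fits the budget."""
    chosen: List[Any] = []
    total = 0
    ordered = counts.sort_values(cnt_col, ascending=False)[[key_col, cnt_col]]
    for key, cnt in ordered.itertuples(index=False):
        if len(chosen) == max_keys:
            break
        # Skip keys that would push the filtered data past the budget.
        if total + cnt <= row_budget:
            chosen.append(key)
            total += cnt
    return chosen
```

The chosen keys could then be turned into an `IN` filter for `correctness_query.sql`, keeping in mind that a join can multiply the rows each key contributes.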