#### SQL-PYDOUGH CODE TESTING NOTEBOOK

The next cell sets up the PyDough package. Run it to import the necessary packages.

In [73]:
import pydough

%load_ext pydough.jupyter_extensions

#Necessary for comparison
import pandas as pd
from pandas.testing import assert_frame_equal, assert_series_equal
import re
import dfcompare

import collections
import numpy as np
import sqlite3 as sql
import os


The pydough.jupyter_extensions extension is already loaded. To reload it, use:
  %reload_ext pydough.jupyter_extensions


### Set up the SQLite database and connect it to PyDough
Change the strings in the next cell to match your setup:
1. .sql filename used to initialize the database
2. Metadata path for the graphs
3. Name of the graph you want to use
4. Desired database name

In [65]:
#YOUR .SQL FILE TO CREATE THE DATABASE, COPY IT TO THIS FOLDER.
SQL_filename = 'broker_sqlite.sql'

#METADATA FOR THE GRAPH .JSON
metadata_path = "../../tests/test_metadata/defog_graphs.json"

#GRAPH NAME
graph_name = "Broker"

#DESIRED DATABASE NAME
DB_name = "DATABASE.db"



with open(SQL_filename, 'r') as sql_file:
    sql_script = sql_file.read()

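#Start from a fresh database; skip the removal if the file does not exist yet
if os.path.exists(DB_name):
    os.remove(DB_name)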
connection = sql.connect(DB_name)
cursor = connection.cursor()
cursor.executescript(sql_script)

pydough.active_session.load_metadata_graph(metadata_path, graph_name)
pydough.active_session.connect_database("sqlite", database=DB_name)

DatabaseContext(connection=<pydough.database_connectors.database_connector.DatabaseConnection object at 0x7f8707c65a90>, dialect=<DatabaseDialect.SQLITE: 'sqlite'>)
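
The rebuild step in the cell above can also be wrapped in a small function so it is easy to reuse with other .sql files. The following is a minimal sketch using only the standard library; the function name `rebuild_database`, the `demo_customer` table, and the temporary path are hypothetical and not part of the notebook.

```python
import os
import sqlite3
import tempfile


def rebuild_database(db_path: str, sql_script: str) -> sqlite3.Connection:
    """Delete any existing database file, then recreate it from a SQL script."""
    # Remove the old file only if it exists, so a first run does not fail.
    if os.path.exists(db_path):
        os.remove(db_path)
    connection = sqlite3.connect(db_path)
    # executescript runs every statement in the script, one after another.
    connection.executescript(sql_script)
    connection.commit()
    return connection


# Hypothetical stand-in for the contents of a .sql file.
demo_script = """
CREATE TABLE demo_customer (cust_id TEXT PRIMARY KEY, cust_name TEXT);
INSERT INTO demo_customer VALUES ('C001', 'john doe');
"""

demo_path = os.path.join(tempfile.mkdtemp(), "demo.db")
demo_connection = rebuild_database(demo_path, demo_script)
demo_rows = demo_connection.execute("SELECT * FROM demo_customer").fetchall()
print(demo_rows)  # [('C001', 'john doe')]
```

Calling the function a second time with the same path starts again from an empty file, which keeps repeated runs of the notebook from accumulating duplicate rows.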

### Graph Structure
To see the structure of the graph and understand its relations and names, run the next cell. If the result does not display the complete list, select "View as a scrollable element" at the bottom of the output to see the full structure.

In [66]:
graph = pydough.active_session.metadata
print(pydough.explain_structure(graph))

Structure of PyDough graph: Broker

  Customers
  ├── _id
  ├── address1
  ├── address2
  ├── city
  ├── country
  ├── email
  ├── join_date
  ├── name
  ├── phone
  ├── postal_code
  ├── state
  ├── status
  └── transactions_made [multiple Transactions] (reverse of Transactions.customer)

  DailyPrices
  ├── close
  ├── date
  ├── epoch_ms
  ├── high
  ├── low
  ├── open
  ├── source
  ├── ticker_id
  ├── volume
  └── ticker [one member of Tickers] (reverse of Tickers.historical_prices)

  Tickers
  ├── _id
  ├── currency
  ├── db2x
  ├── exchange
  ├── is_active
  ├── name
  ├── symbol
  ├── ticker_type
  ├── historical_prices [multiple DailyPrices] (reverse of DailyPrices.ticker)
  └── transactions_of [multiple Transactions] (reverse of Transactions.ticker)

  Transactions
  ├── amount
  ├── commission
  ├── currency
  ├── customer_id
  ├── date_time
  ├── kpx
  ├── price
  ├── settlement_date_str
  ├── shares
  ├── status
  ├── tax
  ├── ticker_id
  ├── transaction_id
  ├── transac

### SQL Test template
You can use this template to run your SQL code and visually compare its results to those of the PyDough code.
Just paste your SQL code between the ''' delimiters. You can also copy this template and paste it wherever you need to.
Remember to use the column and table names from the original .sql file.

In [92]:
query = '''
 SELECT
    *
 FROM
    sbCustomer
'''
sql_output = pd.read_sql_query(query, connection)
sql_output

Unnamed: 0,sbCustId,sbCustName,sbCustEmail,sbCustPhone,sbCustAddress1,sbCustAddress2,sbCustCity,sbCustState,sbCustCountry,sbCustPostalCode,sbCustJoinDate,sbCustStatus
0,C001,john doe,john.doe@email.com,555-123-4567,123 Main St,,Anytown,CA,USA,90001,2020-01-01,active
1,C002,Jane Smith,jane.smith@email.com,555-987-6543,456 Oak Rd,,Someville,NY,USA,10002,2019-03-15,active
2,C003,Bob Johnson,bob.johnson@email.com,555-246-8135,789 Pine Ave,,Mytown,TX,USA,75000,2022-06-01,inactive
3,C004,Samantha Lee,samantha.lee@email.com,555-135-7902,246 Elm St,,Yourtown,CA,USA,92101,2018-09-22,suspended
4,C005,Michael Chen,michael.chen@email.com,555-864-2319,159 Cedar Ln,,Anothertown,FL,USA,33101,2021-02-28,active
5,C006,Emily Davis,emily.davis@email.com,555-753-1904,753 Maple Dr,,Mytown,TX,USA,75000,2020-07-15,active
6,C007,David Kim,david.kim@email.com,555-370-2648,864 Oak St,,Anothertown,FL,USA,33101,2022-11-05,active
7,C008,Sarah Nguyen,sarah.nguyen@email.com,555-623-7419,951 Pine Rd,,Yourtown,CA,USA,92101,2019-04-01,closed
8,C009,William Garcia,william.garcia@email.com,555-148-5326,258 Elm Ave,,Anytown,CA,USA,90001,2021-08-22,active
9,C010,Jessica Hernandez,jessica.hernandez@email.com,555-963-8520,147 Cedar Blvd,,Someville,NY,USA,10002,2020-03-10,inactive
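
Since SQL queries must use the table and column names from the original .sql file, it can help to list them straight from the database. The sketch below is one possible way to do that with SQLite's `sqlite_master` table and `PRAGMA table_info`; the helper name `list_tables_and_columns` and the demo tables are hypothetical, and in the notebook you would pass the existing `connection` instead of the in-memory one.

```python
import sqlite3


def list_tables_and_columns(connection: sqlite3.Connection) -> dict:
    """Map each user table name to the list of its column names."""
    tables = [
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 holds the name.
        info = connection.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [column[1] for column in info]
    return schema


# Hypothetical in-memory database used only for illustration.
demo_connection = sqlite3.connect(":memory:")
demo_connection.executescript(
    """
    CREATE TABLE demo_customer (cust_id TEXT, cust_name TEXT);
    CREATE TABLE demo_ticker (ticker_id TEXT, symbol TEXT);
    """
)
schema = list_tables_and_columns(demo_connection)
for table_name, column_names in schema.items():
    print(table_name, column_names)
```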


### PyDough template
The important part of this template is to run the PyDough code and store the result in a variable called pydough_output for later comparison.

In [94]:
%%pydough

#Setting up the tables that we will need information from in the context
tables = Customers

#The condition we would like the content to fulfill
filter = Customers

#The information we want to receive in the resulting table
output = filter

#Execute the PyDough code
pydough_output = pydough.to_df(output)
pydough_output

Unnamed: 0,_id,name,email,phone,address1,address2,city,state,country,postal_code,join_date,status
0,C001,john doe,john.doe@email.com,555-123-4567,123 Main St,,Anytown,CA,USA,90001,2020-01-01,active
1,C002,Jane Smith,jane.smith@email.com,555-987-6543,456 Oak Rd,,Someville,NY,USA,10002,2019-03-15,active
2,C003,Bob Johnson,bob.johnson@email.com,555-246-8135,789 Pine Ave,,Mytown,TX,USA,75000,2022-06-01,inactive
3,C004,Samantha Lee,samantha.lee@email.com,555-135-7902,246 Elm St,,Yourtown,CA,USA,92101,2018-09-22,suspended
4,C005,Michael Chen,michael.chen@email.com,555-864-2319,159 Cedar Ln,,Anothertown,FL,USA,33101,2021-02-28,active
5,C006,Emily Davis,emily.davis@email.com,555-753-1904,753 Maple Dr,,Mytown,TX,USA,75000,2020-07-15,active
6,C007,David Kim,david.kim@email.com,555-370-2648,864 Oak St,,Anothertown,FL,USA,33101,2022-11-05,active
7,C008,Sarah Nguyen,sarah.nguyen@email.com,555-623-7419,951 Pine Rd,,Yourtown,CA,USA,92101,2019-04-01,closed
8,C009,William Garcia,william.garcia@email.com,555-148-5326,258 Elm Ave,,Anytown,CA,USA,90001,2021-08-22,active
9,C010,Jessica Hernandez,jessica.hernandez@email.com,555-963-8520,147 Cedar Blvd,,Someville,NY,USA,10002,2020-03-10,inactive


### Comparison template
Run this to compare the two DataFrames obtained from the queries.

In [95]:
dfcompare.compare_df(pydough_output, sql_output, query_category="a", question="a")


True
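
The implementation of `dfcompare.compare_df` is not shown in this notebook. As a rough illustration of what such a check can look like, the sketch below compares two DataFrames by value while ignoring column names and row order, which matters here because the SQL output uses names like `sbCustId` while the PyDough output uses `_id`. The function `frames_match` and the demo frames are hypothetical, and this is not a description of how `dfcompare` works.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def frames_match(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Return True when both frames hold the same values,
    ignoring column names and row order."""
    if left.shape != right.shape:
        return False
    if left.shape[1] == 0:
        return True  # No columns to compare.

    def normalize(frame: pd.DataFrame) -> pd.DataFrame:
        result = frame.copy()
        # Positional column labels, so differing names are ignored.
        result.columns = range(result.shape[1])
        # Sort by every column, so row order is ignored.
        result = result.sort_values(by=list(result.columns))
        return result.reset_index(drop=True)

    try:
        assert_frame_equal(normalize(left), normalize(right), check_dtype=False)
    except AssertionError:
        return False
    return True


# Hypothetical frames standing in for sql_output and pydough_output.
sql_demo = pd.DataFrame(
    {"sbCustId": ["C002", "C001"], "sbCustName": ["Jane Smith", "john doe"]}
)
pydough_demo = pd.DataFrame(
    {"_id": ["C001", "C002"], "name": ["john doe", "Jane Smith"]}
)

same = frames_match(sql_demo, pydough_demo)
different = frames_match(sql_demo, pydough_demo.assign(name="x"))
print(same, different)  # True False
```

Ignoring row order is a design choice that suits queries without an ORDER BY clause; for queries where ordering is part of the expected answer, the sorting step would have to be skipped.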