# Load Prep

I'm using this notebook to test the structure of our data and how it loads into the Postgres tables.

## Imports

```python
import pandas as pd
import psycopg2
import os
import time
from dotenv import load_dotenv
```

## Get into format for load

For this test, I'll load one of the sample CSVs into a DataFrame and loop through its rows to build INSERT statements. The real workflow will differ: it will start from the API instead of a CSV.

I'll test with a table that has a date value so I can see how dates get prepped. FCT_BETTING is a good candidate.

### Load DataFrame

```python
# read in fct_betting sample CSV
df = pd.read_csv('../data/Exports/fct_betting.csv')

# parse the last_update strings into datetime64 values
df['last_update'] = pd.to_datetime(df['last_update'])

df.head()
```

|   | Unnamed: 0 | game_id | sportsbook_id | last_update | total_over_under | over_odds | under_odds | home_spread | home_odds | away_spread | away_odds | home_ml_odds | away_ml_odds |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 20230924_DAL@ARI | 1 | 2023-09-24 13:26:08.483308800 | 43.0 | -110 | -110 | 12.5 | -105 | -12.5 | -115 | 525 | -750 |
| 1 | 1 | 20230924_DAL@ARI | 2 | 2023-09-24 13:26:08.483308800 | 43.0 | -110 | -110 | 13.0 | -110 | -13.0 | -110 | 525 | -750 |
| 2 | 2 | 20230924_DAL@ARI | 3 | 2023-09-24 13:26:08.483308800 | 43.5 | -105 | -115 | 12.5 | -110 | -12.5 | -110 | 520 | -720 |
| 3 | 3 | 20230924_DAL@ARI | 5 | 2023-09-24 13:26:08.483308800 | 43.0 | -112 | -109 | 13.0 | -110 | -13.0 | -110 | 500 | -770 |
| 4 | 4 | 20230924_DAL@ARI | 6 | 2023-09-24 13:26:08.483308800 | 43.5 | -110 | -110 | 12.5 | -110 | -12.5 | -110 | 550 | -800 |
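The `last_update` values above carry nanosecond precision (`.483308800`), while a PostgreSQL `timestamp` column keeps at most microseconds, so the last digits should get rounded away on insert. A minimal sketch, assuming you want the generated SQL text to match what Postgres will actually store, is to round in pandas first; `last_update_us` is a hypothetical name used only for illustration:

```python
import pandas as pd

# Same timestamp as the sample rows; pandas stores it at nanosecond resolution
last_update = pd.Series(pd.to_datetime(['2023-09-24 13:26:08.483308800']))

# Round to microseconds, the finest resolution a Postgres timestamp column keeps
last_update_us = last_update.dt.round('us')

print(last_update.iloc[0])     # nanosecond value as pandas displays it
print(last_update_us.iloc[0])  # same instant with six fractional digits
```

In the notebook this would be a one-liner before building statements: `df['last_update'] = df['last_update'].dt.round('us')`.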


```python
# drop unnamed column
df = df.drop(columns='Unnamed: 0')

df.head()
```

|   | game_id | sportsbook_id | last_update | total_over_under | over_odds | under_odds | home_spread | home_odds | away_spread | away_odds | home_ml_odds | away_ml_odds |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20230924_DAL@ARI | 1 | 2023-09-24 13:26:08.483308800 | 43.0 | -110 | -110 | 12.5 | -105 | -12.5 | -115 | 525 | -750 |
| 1 | 20230924_DAL@ARI | 2 | 2023-09-24 13:26:08.483308800 | 43.0 | -110 | -110 | 13.0 | -110 | -13.0 | -110 | 525 | -750 |
| 2 | 20230924_DAL@ARI | 3 | 2023-09-24 13:26:08.483308800 | 43.5 | -105 | -115 | 12.5 | -110 | -12.5 | -110 | 520 | -720 |
| 3 | 20230924_DAL@ARI | 5 | 2023-09-24 13:26:08.483308800 | 43.0 | -112 | -109 | 13.0 | -110 | -13.0 | -110 | 500 | -770 |
| 4 | 20230924_DAL@ARI | 6 | 2023-09-24 13:26:08.483308800 | 43.5 | -110 | -110 | 12.5 | -110 | -12.5 | -110 | 550 | -800 |


### Create INSERT Statements

pandas has a `DataFrame.to_sql` method, but I want to see a little more of what happens under the hood. So I'll iterate through the records to create SQL INSERT statements that can then be run with SQLAlchemy or psycopg2.
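For comparison with hand-built strings, a DB-API driver such as psycopg2 can take a single INSERT template with `%s` placeholders plus one tuple of values per row, and then does the quoting itself. The sketch below builds that template and the row tuples from a DataFrame; `build_parameterized_insert` is a hypothetical helper, not part of pandas or psycopg2:

```python
import pandas as pd


def build_parameterized_insert(df: pd.DataFrame, schema_table: str):
    """Return (template, params): one INSERT with %s placeholders and one tuple per row."""
    columns = df.columns.tolist()
    # Identifiers can't be sent as parameters, so table/column names are still
    # interpolated; they come from our own DataFrame, not from user input.
    placeholders = ', '.join(['%s'] * len(columns))
    template = f"INSERT INTO {schema_table} ({', '.join(columns)}) VALUES ({placeholders});"
    # astype(object) yields plain Python ints/floats and pandas Timestamps;
    # missing values (NaN/NaT) become None, which the driver sends as NULL.
    params = [
        tuple(None if pd.isna(val) else val for val in row)
        for row in df.astype(object).itertuples(index=False, name=None)
    ]
    return template, params
```

With a psycopg2 cursor this would run as `cur.executemany(template, params)`, assuming the driver adapts the `Timestamp` values (pandas `Timestamp` is a `datetime` subclass).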

```python
# set data table name
schema_table = 'nfl.fct_betting'
```

```python
# save column names to a list
column_names = df.columns.tolist()
```

```python
# create empty list to store insert statements
insert_statements = []

for _, row in df.iterrows():
    # quote strings AND timestamps so Postgres reads them as literals; numbers stay bare
    values = ', '.join(f"'{val}'" if isinstance(val, (str, pd.Timestamp)) else str(val) for val in row)
    insert_statement = f"INSERT INTO {schema_table} ({', '.join(column_names)}) VALUES ({values});"
    insert_statements.append(insert_statement)
```

```python
for sql in insert_statements:
    print(sql)
```

```
INSERT INTO nfl.fct_betting (game_id, sportsbook_id, last_update, total_over_under, over_odds, under_odds, home_spread, home_odds, away_spread, away_odds, home_ml_odds, away_ml_odds) VALUES ('20230924_DAL@ARI', 1, '2023-09-24 13:26:08.483308800', 43.0, -110, -110, 12.5, -105, -12.5, -115, 525, -750);
INSERT INTO nfl.fct_betting (game_id, sportsbook_id, last_update, total_over_under, over_odds, under_odds, home_spread, home_odds, away_spread, away_odds, home_ml_odds, away_ml_odds) VALUES ('20230924_DAL@ARI', 2, '2023-09-24 13:26:08.483308800', 43.0, -110, -110, 13.0, -110, -13.0, -110, 525, -750);
INSERT INTO nfl.fct_betting (game_id, sportsbook_id, last_update, total_over_under, over_odds, under_odds, home_spread, home_odds, away_spread, away_odds, home_ml_odds, away_ml_odds) VALUES ('20230924_DAL@ARI', 3, '2023-09-24 13:26:08.483308800', 43.5, -105, -115, 12.5, -110, -12.5, -110, 520, -720);
INSERT INTO nfl.fct_betting (game_id, sportsbook_id, last_update, total_over_under, over_odds, under
```
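Hand-built literals still break in two cases the sample data doesn't hit: a missing value prints as `nan` (which Postgres would read as a column name), and a string containing `'` ends the literal early. A slightly more defensive formatter, offered as a sketch (`sql_literal` is a hypothetical name), maps missing values to `NULL`, writes numbers bare, and quotes everything else with embedded single quotes doubled, which is how standard SQL escapes a quote inside a string literal:

```python
import numbers

import pandas as pd


def sql_literal(val) -> str:
    """Render one value as a SQL literal (sketch; driver parameters are safer for real loads)."""
    # Missing values (None, NaN, NaT) become SQL NULL
    if pd.api.types.is_scalar(val) and pd.isna(val):
        return 'NULL'
    # bool is a subclass of int, so handle it before the numeric branch
    if isinstance(val, bool):
        return 'TRUE' if val else 'FALSE'
    # Python and numpy numbers are written bare
    if isinstance(val, numbers.Number):
        return str(val)
    # Everything else (str, Timestamp, date) is quoted; embedded ' is doubled to ''
    return "'" + str(val).replace("'", "''") + "'"
```

Inside the loop this would replace the inline conditional: `values = ', '.join(sql_literal(val) for val in row)`.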

Looks good! Now for the actual load test.

### Test Load to PostgreSQL Database

```python
# setup connection variables
# Load .env to get PostgreSQL user login info
load_dotenv()

# set postgres access variables
db_host = 'localhost'
db_user = os.getenv('psql_username')
db_password = os.getenv('psql_password')
db_name = 'team_flow'

# os.getenv returns None for a missing key; fail early instead of sending the string 'None'
if db_user is None or db_password is None:
    raise RuntimeError('psql_username / psql_password not found in the environment or .env')

# put connection details into params variable (keys are psycopg2.connect keyword arguments)
db_params = {
    'host': db_host,
    'database': db_name,
    'user': db_user,
    'password': db_password,
}
```
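The cell above only builds `db_params`. One way to run the generated statements, sketched here rather than taken from the notebook, is a helper that relies only on the DB-API 2.0 methods psycopg2 connections provide (`cursor()`, `execute()`, `commit()`, `rollback()`), so one bad row rolls back the whole batch instead of leaving a partial load. `execute_in_transaction` is a hypothetical name:

```python
def execute_in_transaction(conn, statements):
    """Run each SQL string on one DB-API 2.0 connection: commit all of them or none.

    Returns the number of statements executed.
    """
    cur = conn.cursor()
    try:
        for sql in statements:
            cur.execute(sql)
        # Nothing is committed until every statement has succeeded
        conn.commit()
    except Exception:
        # Undo the partial batch so a re-run starts from a clean table
        conn.rollback()
        raise
    finally:
        cur.close()
    return len(statements)


# Hypothetical usage with the notebook's variables (needs a running Postgres):
# conn = psycopg2.connect(**db_params)
# execute_in_transaction(conn, insert_statements)
# conn.close()
```

Because it only touches the generic DB-API interface, the helper can be exercised against an in-memory SQLite connection before pointing it at Postgres.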