## Outline of this code

The function of this code is to create a database suited to this project's purposes from the data [provided by Wyatt Walsh on Kaggle](https://www.kaggle.com/datasets/wyattowalsh/basketball).




## Reflect original NBA database (did not work)

```python
#   Dependencies: SQLAlchemy
import sqlalchemy
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
from sqlalchemy import create_engine, func, inspect
```

```python
#   Create engine to nba.sqlite
engine = create_engine("sqlite:///../nba.sqlite")

#   Reflect existing database into a new model
Base = automap_base()

#   Reflect the tables
Base.prepare(autoload_with = engine)
```

```python
# View all the classes automap found
Base.classes.keys()
```

```
[]
```

This is an unexpected result: automap found no classes at all. [This Stack Overflow thread](https://stackoverflow.com/questions/42946174/sqlalchemy-automap-not-generating-base-classes-table-name) discusses a likely cause: `automap_base()` only generates classes for tables that have a primary key, and the tables in this SQLite file apparently do not define one. The Kaggle dataset also provides CSV files, so I will set up the database manually instead.
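The primary-key explanation can be checked on a throwaway database. The sketch below assumes SQLAlchemy 1.4 or later and uses two hypothetical tables, `with_pk` and `without_pk`, in an in-memory SQLite database: the inspector sees both tables, but automap only maps the one that declares a primary key.

```python
from sqlalchemy import create_engine, inspect, text
from sqlalchemy.ext.automap import automap_base

# In-memory SQLite database; nothing is written to disk
engine = create_engine("sqlite://")

# Two hypothetical tables: one with a primary key, one without
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE with_pk (id INTEGER PRIMARY KEY, name TEXT)"))
    conn.execute(text("CREATE TABLE without_pk (id INTEGER, name TEXT)"))

# The inspector reports every table in the database...
table_names = inspect(engine).get_table_names()

# ...but automap only generates classes for tables with a primary key
Base = automap_base()
Base.prepare(autoload_with=engine)
mapped = list(Base.classes.keys())

print(sorted(table_names))  # ['with_pk', 'without_pk']
print(mapped)               # ['with_pk']
```

Running the same two checks (`inspect(engine).get_table_names()` against `Base.classes.keys()`) on `nba.sqlite` would show whether its tables exist but are being skipped.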

# Database preparation

## Database Modelling

#### Draft 1

![alt](QuickDBD-export.png)

The ERD above is the one I sketched from the provided CSV files.

- Only the variables of interest were kept.

- This proved to be quite out of scope for the task at hand (which involves SQLAlchemy and SQLite), so a simpler single table will be used instead.

#### Draft 2

![alt](QuickDBD-export2.png)

This single table is better suited to facilitating the analysis.

- Note the underscore and `(imported_from_______)` "columns": these only denote columns that have to be brought in through a join.

## CSV Creation

```python
#   Dependencies
import pandas as pd
```


```python
#   Read in CSVs
team_details_df = pd.read_csv("./../csv/team_details.csv")
game_info_df = pd.read_csv("./../csv/game_info.csv")
game_df = pd.read_csv("./../csv/game.csv")
```

```python
#   Select only columns of interest:
team_details_df = team_details_df[["team_id", "arena"]]
game_info_df = game_info_df[["game_id", "attendance"]]
game_df = game_df[["game_id", "game_date", "team_id_home", "team_id_away", "team_name_home", "team_abbreviation_home", "team_name_away", "team_abbreviation_away",
                     "matchup_home", "wl_home", "wl_away", "fgm_home", "fga_home", "fg_pct_home", "ftm_home", "fta_home", "ft_pct_home", "pts_home", "fgm_away", "fga_away",
                     "fg_pct_away", "ftm_away", "fta_away", "ft_pct_away", "pts_away"]]
```
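A possible alternative to reading whole files and then subsetting them is `pd.read_csv(..., usecols=[...])`, which parses only the listed columns. The sketch below uses a small hypothetical stand-in for `game_info.csv`, not the real file's full layout. One caveat: `usecols` ignores element order and returns columns in file order, so the explicit `[[...]]` selection is still the way to fix an order such as the one used for `game_df`.

```python
import io

import pandas as pd

# Hypothetical stand-in for game_info.csv; the real file has other columns
csv_text = (
    "game_id,game_date,attendance,game_time\n"
    "1,2020-01-01,18000,2:10\n"
    "2,2020-01-02,,2:05\n"
)

# Only game_id and attendance are parsed; the other columns are skipped
game_info_df = pd.read_csv(io.StringIO(csv_text), usecols=["game_id", "attendance"])

print(game_info_df.columns.tolist())  # ['game_id', 'attendance']
```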

```python
#   Merge the dataframes into a single df
new_game_df = pd.merge(game_df, game_info_df, how = "left", on = "game_id")

#   team_details_df is merged on the home team's id because each game is assumed to be played in the home team's arena
#   (rename returns a new DataFrame, so re-running this cell does not depend on earlier in-place changes)
team_details_df = team_details_df.rename(columns = {"team_id": "team_id_home"})

new_game_df = pd.merge(new_game_df, team_details_df, how = "left", on = "team_id_home")

# Reorder columns
new_game_df = new_game_df[["game_id", "game_date", "arena", "attendance", "team_id_home", "team_id_away", "team_name_home", "team_abbreviation_home", "team_name_away", "team_abbreviation_away",
                     "matchup_home", "wl_home", "wl_away", "fgm_home", "fga_home", "fg_pct_home", "ftm_home", "fta_home", "ft_pct_home", "pts_home", "fgm_away", "fga_away",
                     "fg_pct_away", "ftm_away", "fta_away", "ft_pct_away", "pts_away"]]

new_game_df.head()
```



| | game_id | game_date | arena | attendance | team_id_home | team_id_away | team_name_home | team_abbreviation_home | team_name_away | team_abbreviation_away | ... | fta_home | ft_pct_home | pts_home | fgm_away | fga_away | fg_pct_away | ftm_away | fta_away | ft_pct_away | pts_away |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 24600001 | 1946-11-01 00:00:00 | | | 1610610035 | 1610612752 | Toronto Huskies | HUS | New York Knicks | NYK | ... | 29.0 | 0.552 | 66 | 24.0 | | | 20.0 | 26.0 | 0.769 | 68 |
| 1 | 24600003 | 1946-11-02 00:00:00 | | | 1610610034 | 1610610031 | St. Louis Bombers | BOM | Pittsburgh Ironmen | PIT | ... | | | 56 | 16.0 | 72.0 | 0.222 | 19.0 | | | 51 |
| 2 | 24600002 | 1946-11-02 00:00:00 | | | 1610610032 | 1610612738 | Providence Steamrollers | PRO | Boston Celtics | BOS | ... | | | 59 | 21.0 | | | 11.0 | | | 53 |
| 3 | 24600004 | 1946-11-02 00:00:00 | | | 1610610025 | 1610612752 | Chicago Stags | CHS | New York Knicks | NYK | ... | | | 63 | 16.0 | | | 15.0 | | | 47 |
| 4 | 24600005 | 1946-11-02 00:00:00 | | | 1610610028 | 1610610036 | Detroit Falcons | DEF | Washington Capitols | WAS | ... | | | 33 | 18.0 | | | 14.0 | | | 50 |
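The blank `arena` cells in the 1946 rows above are what a left merge produces when a key has no match on the right-hand side (presumably those early franchises have no row in `team_details.csv`). The toy sketch below, with hypothetical team ids and arena names, shows that behaviour. It also shows pandas' `validate="many_to_one"` option, which raises an error if the right-hand table repeats a key. Such repeats would otherwise silently duplicate game rows, so the same check may be worth adding to both merges above.

```python
import pandas as pd

# Hypothetical toy data: home team 99 has no row in the team table
games = pd.DataFrame({"game_id": [1, 2, 3], "team_id_home": [10, 20, 99]})
teams = pd.DataFrame({"team_id": [10, 20], "arena": ["Arena A", "Arena B"]})

# Left merge keeps every game; unmatched home teams get NaN for arena.
# validate="many_to_one" checks that team_id_home is unique in the right table.
merged = pd.merge(
    games,
    teams.rename(columns={"team_id": "team_id_home"}),
    how="left",
    on="team_id_home",
    validate="many_to_one",
)
print(merged)

# A duplicated team_id on the right-hand side fails validation
dup_teams = pd.concat([teams, teams.iloc[[0]]], ignore_index=True)
try:
    pd.merge(
        games,
        dup_teams.rename(columns={"team_id": "team_id_home"}),
        how="left",
        on="team_id_home",
        validate="many_to_one",
    )
    duplicate_detected = False
except ValueError:  # pandas raises MergeError, a subclass of ValueError
    duplicate_detected = True

print(duplicate_detected)  # True
```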


```python
#   Export data frame as csv
new_game_df.to_csv("./csv/nba_games.csv", index = False)
```
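The `index = False` argument matters when the CSV is read back later. With the default, `to_csv` also writes the DataFrame's row index as an unnamed first column, which `read_csv` then loads as `Unnamed: 0`. A minimal round trip on two sample rows shows the difference:

```python
import io

import pandas as pd

# Two sample rows (game_id and pts_home values taken from the preview above)
df = pd.DataFrame({"game_id": [24600001, 24600003], "pts_home": [66, 56]})

# Default to_csv writes the row index as an unnamed first column...
with_index = pd.read_csv(io.StringIO(df.to_csv()))
# ...while index=False writes only the data columns
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(with_index.columns.tolist())     # ['Unnamed: 0', 'game_id', 'pts_home']
print(without_index.columns.tolist())  # ['game_id', 'pts_home']
```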

---

## Database Creation

```python
#   Dependencies
import pandas as pd

# SQLAlchemy
import sqlalchemy
from sqlalchemy import create_engine, inspect
# declarative_base lives in sqlalchemy.orm since SQLAlchemy 1.4; the sqlalchemy.ext.declarative import is deprecated
from sqlalchemy.orm import declarative_base, Session
from sqlalchemy import Column, Integer, String, Float, DateTime
```

```python
#   Read in csv
games = pd.read_csv("./csv/nba_games.csv")

#   Set game_id as the index
games.set_index("game_id", inplace = True)

#   Preview
games.head()
```

| game_id | game_date | arena | attendance | team_id_home | team_id_away | team_name_home | team_abbreviation_home | team_name_away | team_abbreviation_away | matchup_home | ... | fta_home | ft_pct_home | pts_home | fgm_away | fga_away | fg_pct_away | ftm_away | fta_away | ft_pct_away | pts_away |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 24600001 | 1946-11-01 00:00:00 | | | 1610610035 | 1610612752 | Toronto Huskies | HUS | New York Knicks | NYK | HUS vs. NYK | ... | 29.0 | 0.552 | 66 | 24.0 | | | 20.0 | 26.0 | 0.769 | 68 |
| 24600003 | 1946-11-02 00:00:00 | | | 1610610034 | 1610610031 | St. Louis Bombers | BOM | Pittsburgh Ironmen | PIT | BOM vs. PIT | ... | | | 56 | 16.0 | 72.0 | 0.222 | 19.0 | | | 51 |
| 24600002 | 1946-11-02 00:00:00 | | | 1610610032 | 1610612738 | Providence Steamrollers | PRO | Boston Celtics | BOS | PRO vs. BOS | ... | | | 59 | 21.0 | | | 11.0 | | | 53 |
| 24600004 | 1946-11-02 00:00:00 | | | 1610610025 | 1610612752 | Chicago Stags | CHS | New York Knicks | NYK | CHS vs. NYK | ... | | | 63 | 16.0 | | | 15.0 | | | 47 |
| 24600005 | 1946-11-02 00:00:00 | | | 1610610028 | 1610610036 | Detroit Falcons | DEF | Washington Capitols | WAS | DEF vs. WAS | ... | | | 33 | 18.0 | | | 14.0 | | | 50 |


```python
#   Establish Base for table's class construction
Base = declarative_base()

# games table class construction
class Games(Base):
    __tablename__ = "games"

    game_id = Column(Integer, primary_key = True)
    game_date = Column(DateTime)
    arena = Column(String)
    attendance = Column(Integer)
    team_id_home = Column(Integer)
    team_id_away = Column(Integer)
    team_name_home = Column(String)
    team_abbreviation_home = Column(String)
    team_name_away = Column(String)
    team_abbreviation_away = Column(String)
    matchup_home = Column(String)
    wl_home = Column(String)
    wl_away = Column(String)
    fgm_home = Column(Integer)
    fga_home = Column(Integer)
    fg_pct_home = Column(Float)    # percentages such as 0.552 are fractional, so Float rather than Integer
    ftm_home = Column(Integer)
    fta_home = Column(Integer)
    ft_pct_home = Column(Float)
    pts_home = Column(Integer)
    fgm_away = Column(Integer)
    fga_away = Column(Integer)
    fg_pct_away = Column(Float)
    ftm_away = Column(Integer)
    fta_away = Column(Integer)
    ft_pct_away = Column(Float)
    pts_away = Column(Integer)
```
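Before writing any data, one way to check what a declarative class will create is to compile its `CREATE TABLE` statement for the SQLite dialect. The sketch below uses a hypothetical, trimmed-down `GamesSketch` model rather than the full `Games` class. It shows that `game_id` becomes the table's primary key and that a `Float` column such as `fg_pct_home` maps to SQLite's `FLOAT`.

```python
from sqlalchemy import Column, DateTime, Float, Integer, String
from sqlalchemy.dialects import sqlite
from sqlalchemy.orm import declarative_base
from sqlalchemy.schema import CreateTable

Base = declarative_base()


class GamesSketch(Base):
    # Hypothetical trimmed-down version of the Games class, for illustration only
    __tablename__ = "games_sketch"

    game_id = Column(Integer, primary_key=True)
    game_date = Column(DateTime)
    arena = Column(String)
    fg_pct_home = Column(Float)  # fractional percentages need a floating-point type


# Compile the DDL for SQLite without connecting to any database
ddl = str(CreateTable(GamesSketch.__table__).compile(dialect=sqlite.dialect()))
print(ddl)
```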

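```python
#   Create engine
engine = create_engine("sqlite:///NBA.sqlite")

#   Create the games table within the database (skipped if it already exists)
Base.metadata.create_all(engine)

#   Import CSV into the database table; engine.begin() commits when the block exits
#   (and rolls back on error), so the rows are saved under SQLAlchemy 1.4 and 2.0 alike
with engine.begin() as con:
    games.to_sql("games", con, if_exists = "append")
```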


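- If an error occurs in the cell above, check whether it comes from the last call, `to_sql`, and its `if_exists` argument.

    - Re-running the cell against an existing `NBA.sqlite` tries to append rows whose `game_id` values are already in the table. Because `game_id` is the primary key, SQLite rejects these duplicates, so consider a way of updating the table instead.

        -   I would simply delete the table and recreate it with `Base.metadata.create_all()`, but I haven't tested this approach for updating.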
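Both the failure and one possible way of updating can be tried on a throwaway database. This sketch assumes SQLAlchemy 1.4+ and a pandas version with SQLAlchemy support. `GamesSketch` and the two sample rows are hypothetical stand-ins for `Games` and `nba_games.csv`. It appends the same rows twice to trigger the primary-key violation, then clears the table and reloads it inside a single transaction, which leaves the table definition intact. Note that `if_exists="replace"` is not an equivalent shortcut: pandas drops the table and recreates it from the DataFrame's dtypes, so the primary key declared in the class would be lost.

```python
import pandas as pd
from sqlalchemy import Column, Integer, create_engine, func, select
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class GamesSketch(Base):
    # Hypothetical two-column stand-in for the Games class
    __tablename__ = "games"
    game_id = Column(Integer, primary_key=True)
    pts_home = Column(Integer)


engine = create_engine("sqlite://")  # in-memory database for the demo
Base.metadata.create_all(engine)

# Index named game_id, as after games.set_index("game_id")
games = pd.DataFrame({"game_id": [24600001, 24600003], "pts_home": [66, 56]}).set_index("game_id")

# First load succeeds; engine.begin() commits on success
with engine.begin() as con:
    games.to_sql("games", con, if_exists="append")

# Loading the same rows again violates the game_id primary key
try:
    with engine.begin() as con:
        games.to_sql("games", con, if_exists="append")
    rerun_failed = False
except IntegrityError:
    rerun_failed = True

# One way to "update": delete the existing rows and reload in one transaction
with engine.begin() as con:
    con.execute(GamesSketch.__table__.delete())
    games.to_sql("games", con, if_exists="append")
    row_count = con.execute(
        select(func.count()).select_from(GamesSketch.__table__)
    ).scalar_one()

print(rerun_failed, row_count)  # True 2
```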
