## Outline of this code

The purpose of this code is to create a database suited to this project from the data [provided by Wyatt Walsh on Kaggle](https://www.kaggle.com/datasets/wyattowalsh/basketball).

In [1]:
###     Dependencies



## Reflect original NBA database (did not work)

In [57]:
#   Dependencies: SQLAlchemy
import sqlalchemy
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session
from sqlalchemy import create_engine, func, inspect

In [67]:
#   Create engine to nba.sqlite
engine = create_engine("sqlite:///../nba.sqlite")

#   Reflect existing database into a new model
Base = automap_base()

#   Reflect the tables
Base.prepare(autoload_with = engine)

In [69]:
# View all the classes automap found
Base.classes.keys()

[]

This result is unexpected: the empty list means automap did not generate any classes. [This Stack Overflow thread](https://stackoverflow.com/questions/42946174/sqlalchemy-automap-not-generating-base-classes-table-name) discusses a potential issue with `automap_base()` not generating classes for SQLite tables that have no primary key. The Kaggle dataset also provides CSV files, so I will set up a database manually instead.
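
One way to check whether missing primary keys could be the cause is to ask SQLite directly which tables have none. The following is a minimal sketch using the standard-library `sqlite3` module. The in-memory database and the `game` and `team` tables are hypothetical stand-ins, not the actual schema of `nba.sqlite`, and `tables_without_primary_key` is a helper name introduced here for illustration.

```python
import sqlite3


def tables_without_primary_key(conn):
    """Return the names of tables in which no column is part of a primary key."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    missing = []
    for name in names:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
        if not any(col[5] for col in columns):
            missing.append(name)
    return missing


# Hypothetical in-memory database standing in for nba.sqlite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE game (game_id TEXT, pts_home REAL)")        # no primary key
conn.execute("CREATE TABLE team (id INTEGER PRIMARY KEY, name TEXT)")  # has one
print(tables_without_primary_key(conn))
```

Pointing `sqlite3.connect` at the real database file instead would list the tables that automap is likely to skip.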

# Database preparation

## Database Modelling

#### Draft 1

![alt](QuickDBD-export.png)

The above is the ERD I sketched based on the CSV files I was provided.

- Only the variables of interest were kept.

- This proved to be well out of scope for the task at hand (which involves SQLAlchemy and SQLite), so a simpler single table will be used instead.

#### Draft 2

![alt](QuickDBD-export2.png)

This single table is better suited to facilitating the analysis.

- Note the underscore and (imported_from_______) "columns": these are not real columns; they only denote that the columns in question have to be joined in.

## CSV Creation

In [11]:
#   Dependencies
import pandas as pd


In [12]:
#   Read in CSVs
team_details_df = pd.read_csv("./../csv/team_details.csv")
game_info_df = pd.read_csv("./../csv/game_info.csv")
game_df = pd.read_csv("./../csv/game.csv")

In [13]:
#   Select only columns of interest:
#   .copy() gives independent frames, so adding columns later does not trigger SettingWithCopyWarning
team_details_df = team_details_df[["team_id", "arena"]].copy()
game_info_df = game_info_df[["game_id", "attendance"]].copy()
game_df = game_df[["game_id", "game_date", "team_id_home", "team_id_away", "team_name_home", "team_abbreviation_home", "team_name_away", "team_abbreviation_away",
                     "matchup_home", "wl_home", "wl_away", "fgm_home", "fga_home", "fg_pct_home", "ftm_home", "fta_home", "ft_pct_home", "pts_home", "fgm_away", "fga_away",
                     "fg_pct_away", "ftm_away", "fta_away", "ft_pct_away", "pts_away"]].copy()

##### Manually create latitude and longitude columns for team_details_df

In [18]:
#   Arena coordinates as [latitude, longitude], listed in the same row order as team_details_df
coords = [
    [33.757222, -84.916944],
    [42.366303,-71.062228],
    [41.496389, -81.688056],
    [29.948889, -90.081944],
    [41.880556, -87.674167],
    [32.790556, -96.810278],    
    [39.748611, -105.0075],
    [37.768056, -122.3875],
    [29.750833, -95.362222],
    [34.043056, -118.267222],
    [34.043056, -118.267222],
    [25.781389, -80.188056],
    [43.043611, -87.916944],
    [44.979444, -93.276111],
    [40.68265, -73.974689],
    [40.750556, -73.993611],
    [28.539167, -81.383611],
    [39.763889, -86.155556],
    [39.901111, -75.171944],
    [33.445833, -112.071389],
    [45.531667, -122.666667],
    [38.649167, -121.518056],
    [29.426944, -98.4375],
    [35.463333, -97.515],
    [43.643333, -79.379167],
    [40.768333, -111.901111],
    [35.138333, -90.050556],
    [38.898056, -77.020833],
    [42.341111, -83.055],
    [35.225, -80.839167]
]

lat = [x[0] for x in coords]
lon = [x[1] for x in coords]


team_details_df["arena_lat"] = lat
team_details_df["arena_lon"] = lon

team_details_df.head()

| | team_id | arena | arena_lat | arena_lon |
|---|---|---|---|---|
| 0 | 1610612737 | State Farm Arena | 33.757222 | -84.916944 |
| 1 | 1610612738 | TD Garden | 42.366303 | -71.062228 |
| 2 | 1610612739 | Rocket Mortgage FieldHouse | 41.496389 | -81.688056 |
| 3 | 1610612740 | Smoothie King Center | 29.948889 | -90.081944 |
| 4 | 1610612741 | United Center | 41.880556 | -87.674167 |
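
Assigning the coordinates by position only works while the list order matches the row order of `team_details_df`. A keyed lookup makes that dependency explicit. This is a rough sketch rather than the method used in this notebook: the two-row `teams` frame and the `arena_coords` dictionary are hypothetical, and only the coordinate values are taken from the list above.

```python
import pandas as pd

# Hypothetical two-team excerpt, deliberately not in the same order as the list.
teams = pd.DataFrame({
    "team_id": [1610612738, 1610612737],
    "arena": ["TD Garden", "State Farm Arena"],
})

# Key each coordinate pair by arena name instead of relying on list position.
arena_coords = {
    "State Farm Arena": (33.757222, -84.916944),
    "TD Garden": (42.366303, -71.062228),
}

# Fail early if an arena has no coordinates, rather than silently misaligning.
missing = set(teams["arena"]) - set(arena_coords)
assert not missing, f"No coordinates for: {missing}"

teams["arena_lat"] = teams["arena"].map(lambda name: arena_coords[name][0])
teams["arena_lon"] = teams["arena"].map(lambda name: arena_coords[name][1])
print(teams)
```

Keying by arena also handles teams that share a venue: the two identical coordinate pairs in the list would collapse into a single dictionary entry.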


----

In [32]:
#   Merge the dataframes into a single df
new_game_df = pd.merge(game_df, game_info_df, how = "left", on = "game_id")

#   Merge team_details_df on the home team's ID, on the assumption that each game is played at the home team's arena
#   (an approximation: franchises have changed arenas over time)
team_details_df = team_details_df.rename(columns = {"team_id": "team_id_home"})

new_game_df = pd.merge(new_game_df, team_details_df, how = "left", on = "team_id_home")

# Reorder columns
new_game_df = new_game_df[["game_id", "game_date", "arena","arena_lat", "arena_lon", "attendance", "team_id_home", "team_id_away", "team_name_home", "team_abbreviation_home", "team_name_away", "team_abbreviation_away",
                     "matchup_home", "wl_home", "wl_away", "fgm_home", "fga_home", "fg_pct_home", "ftm_home", "fta_home", "ft_pct_home", "pts_home", "fgm_away", "fga_away", 
                     "fg_pct_away", "ftm_away", "fta_away", "ft_pct_away", "pts_away"]]

#   Drop rows with a missing value in any column
new_game_df = new_game_df.dropna()

new_game_df.head()



| | game_id | game_date | arena | arena_lat | arena_lon | attendance | team_id_home | team_id_away | team_name_home | team_abbreviation_home | ... | fta_home | ft_pct_home | pts_home | fgm_away | fga_away | fg_pct_away | ftm_away | fta_away | ft_pct_away | pts_away |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3460 | 25600076 | 1956-12-07 00:00:00 | State Farm Arena | 33.757222 | -84.916944 | 5174.0 | 1610612737 | 1610612752 | St. Louis Hawks | STL | ... | 45.0 | 0.778 | 101 | 33.0 | 76.0 | 0.434 | 41.0 | 55.0 | 0.745 | 107 |
| 3575 | 25600194 | 1957-02-02 00:00:00 | Crypto.com Arena | 34.043056 | -118.267222 | 7123.0 | 1610612747 | 1610612737 | Minneapolis Lakers | MNL | ... | 58.0 | 0.776 | 97 | 38.0 | 95.0 | 0.4 | 30.0 | 43.0 | 0.698 | 106 |
| 3609 | 25600226 | 1957-02-15 00:00:00 | State Farm Arena | 33.757222 | -84.916944 | 6328.0 | 1610612737 | 1610612738 | St. Louis Hawks | STL | ... | 43.0 | 0.698 | 116 | 44.0 | 119.0 | 0.37 | 35.0 | 48.0 | 0.729 | 123 |
| 3617 | 25600235 | 1957-02-19 00:00:00 | State Farm Arena | 33.757222 | -84.916944 | 4128.0 | 1610612737 | 1610612765 | St. Louis Hawks | STL | ... | 29.0 | 0.759 | 96 | 29.0 | 97.0 | 0.299 | 25.0 | 41.0 | 0.61 | 83 |
| 3674 | 25700001 | 1957-10-22 00:00:00 | State Farm Arena | 33.757222 | -84.916944 | 9024.0 | 1610612737 | 1610612738 | St. Louis Hawks | STL | ... | 54.0 | 0.741 | 90 | 44.0 | 111.0 | 0.396 | 27.0 | 38.0 | 0.711 | 115 |
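
Because both merges are left joins, a game whose home team has no match keeps its row but gets missing values, and the later "drop NA" step then removes it. A small sketch of how this could be checked, using pandas' `validate` and `indicator` merge options. The `games` and `arenas` frames below are hypothetical miniatures, not the real data.

```python
import pandas as pd

# Hypothetical miniature versions of game_df and team_details_df.
games = pd.DataFrame({
    "game_id": [1, 2, 3],
    "team_id_home": [10, 20, 99],   # 99 has no entry in the arena table
})
arenas = pd.DataFrame({
    "team_id_home": [10, 20],
    "arena": ["Arena A", "Arena B"],
})

# validate="many_to_one" raises pandas.errors.MergeError if a team ID appears
# more than once in `arenas`, which would otherwise duplicate game rows.
# indicator=True adds a "_merge" column showing which rows found a match.
merged = pd.merge(games, arenas, how="left", on="team_id_home",
                  validate="many_to_one", indicator=True)

unmatched = merged[merged["_merge"] == "left_only"]
print(len(merged), "rows after merge;", len(unmatched), "without an arena")
```

Counting the unmatched rows before dropping missing values gives a sense of how many games are lost to missing arena or attendance data.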


In [33]:
#   Export data frame as csv
new_game_df.to_csv("./csv/nba_games.csv", index = False)

---

## Database Creation

In [34]:
#   Dependencies
import pandas as pd

# SQLAlchemy
import sqlalchemy
from sqlalchemy import create_engine, inspect
#   declarative_base is importable from sqlalchemy.orm since SQLAlchemy 1.4
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy import Column, Integer, String, Float, DateTime

In [35]:
#   Read in csv
games = pd.read_csv("./csv/nba_games.csv")

#   Set game_id as the index
games.set_index("game_id", inplace = True)

#Preview
games.head()

| game_id | game_date | arena | arena_lat | arena_lon | attendance | team_id_home | team_id_away | team_name_home | team_abbreviation_home | team_name_away | ... | fta_home | ft_pct_home | pts_home | fgm_away | fga_away | fg_pct_away | ftm_away | fta_away | ft_pct_away | pts_away |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25600076 | 1956-12-07 00:00:00 | State Farm Arena | 33.757222 | -84.916944 | 5174.0 | 1610612737 | 1610612752 | St. Louis Hawks | STL | New York Knicks | ... | 45.0 | 0.778 | 101 | 33.0 | 76.0 | 0.434 | 41.0 | 55.0 | 0.745 | 107 |
| 25600194 | 1957-02-02 00:00:00 | Crypto.com Arena | 34.043056 | -118.267222 | 7123.0 | 1610612747 | 1610612737 | Minneapolis Lakers | MNL | St. Louis Hawks | ... | 58.0 | 0.776 | 97 | 38.0 | 95.0 | 0.4 | 30.0 | 43.0 | 0.698 | 106 |
| 25600226 | 1957-02-15 00:00:00 | State Farm Arena | 33.757222 | -84.916944 | 6328.0 | 1610612737 | 1610612738 | St. Louis Hawks | STL | Boston Celtics | ... | 43.0 | 0.698 | 116 | 44.0 | 119.0 | 0.37 | 35.0 | 48.0 | 0.729 | 123 |
| 25600235 | 1957-02-19 00:00:00 | State Farm Arena | 33.757222 | -84.916944 | 4128.0 | 1610612737 | 1610612765 | St. Louis Hawks | STL | Ft. Wayne Zollner Pistons | ... | 29.0 | 0.759 | 96 | 29.0 | 97.0 | 0.299 | 25.0 | 41.0 | 0.61 | 83 |
| 25700001 | 1957-10-22 00:00:00 | State Farm Arena | 33.757222 | -84.916944 | 9024.0 | 1610612737 | 1610612738 | St. Louis Hawks | STL | Boston Celtics | ... | 54.0 | 0.741 | 90 | 44.0 | 111.0 | 0.396 | 27.0 | 38.0 | 0.711 | 115 |


In [36]:
#   Establish Base for table's class construction
Base = declarative_base()

# games table class construction
class Games(Base):
    __tablename__ = "games"

    game_id = Column(Integer, primary_key = True)
    game_date = Column(DateTime)
    arena = Column(String)
    arena_lat = Column(Float)
    arena_lon = Column(Float)
    attendance = Column(Integer)
    team_id_home = Column(Integer) 
    team_id_away = Column(Integer)
    team_name_home = Column(String)
    team_abbreviation_home = Column(String)
    team_name_away = Column(String)
    team_abbreviation_away = Column(String)
    matchup_home = Column(String)
    wl_home = Column(String)
    wl_away = Column(String)
    fgm_home = Column(Integer)
    fga_home = Column(Integer)
    fg_pct_home = Column(Float)     # percentages are fractions (e.g. 0.778), so Float rather than Integer
    ftm_home = Column(Integer)
    fta_home = Column(Integer)
    ft_pct_home = Column(Float)
    pts_home = Column(Integer)
    fgm_away = Column(Integer)
    fga_away = Column(Integer)
    fg_pct_away = Column(Float)
    ftm_away = Column(Integer)
    fta_away = Column(Integer)
    ft_pct_away = Column(Float)
    pts_away = Column(Integer)
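
SQLite applies type affinity rather than strict column types, which is worth keeping in mind for columns such as `fg_pct_home` that hold fractional values like 0.778. The sketch below uses the standard-library `sqlite3` module and a hypothetical `demo` table (not part of this project) to show what is actually stored when the same values go into an INTEGER column and a REAL column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one column with INTEGER affinity, one with REAL affinity.
conn.execute("CREATE TABLE demo (as_integer INTEGER, as_real REAL)")
conn.execute("INSERT INTO demo VALUES (?, ?)", (0.778, 0.778))  # fractional value
conn.execute("INSERT INTO demo VALUES (?, ?)", (45.0, 45.0))    # whole-number float

# typeof() reports the storage class SQLite chose for each stored value.
stored_types = conn.execute(
    "SELECT typeof(as_integer), typeof(as_real) FROM demo ORDER BY rowid"
).fetchall()
print(stored_types)
```

In SQLite the fractional value survives in an INTEGER column, but other database backends may round or reject it, so declaring fractional columns as floating point is the safer choice.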

  


In [38]:
#   Create engine
engine = create_engine("sqlite:///NBA.sqlite")

#   Connect to database
con = engine.connect()

#   Create the games table within the database
Base.metadata.create_all(con)

#    Import CSV into the database table
games.to_sql("games", con, if_exists = "append")


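- If an error occurs in the cell above, check whether it comes from the final `to_sql` call and its `if_exists` argument.

    - With `if_exists = "append"`, rows whose `game_id` already exists in the table cannot be appended again, because `game_id` is the primary key. Rerunning the cell against an existing database therefore needs some way of "updating" the table.

        -   I would simply delete the table and recreate it with `Base.metadata.create_all` before importing again, but I haven't tested this for updating.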
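
The duplicate-key behaviour and the drop-and-recreate idea can be sketched with the standard-library `sqlite3` module. This is an illustration under simplifying assumptions: an in-memory database and a hypothetical two-column `games` table stand in for `NBA.sqlite` and the full schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
create_sql = "CREATE TABLE games (game_id INTEGER PRIMARY KEY, pts_home INTEGER)"
rows = [(25600076, 101), (25600194, 97)]

conn.execute(create_sql)
conn.executemany("INSERT INTO games VALUES (?, ?)", rows)

# Appending the same rows again violates the primary key on game_id.
try:
    conn.executemany("INSERT INTO games VALUES (?, ?)", rows)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# One way to "update": drop the table, recreate it, and load the data afresh.
conn.execute("DROP TABLE IF EXISTS games")
conn.execute(create_sql)
conn.executemany("INSERT INTO games VALUES (?, ?)", rows)
row_count = conn.execute("SELECT COUNT(*) FROM games").fetchone()[0]
print(duplicate_rejected, row_count)
```

In the SQLAlchemy setup used here, `Base.metadata.drop_all(engine)` followed by `Base.metadata.create_all(engine)` should play the same role, though that combination is untested in this notebook.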

#