<a href="https://colab.research.google.com/github/wyattowalsh/sports-analytics/blob/main/basketball/notebooks/data_collection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1 align='center'> Basketball Data Collection </h1>

This notebook contains the work needed to collect the data that makes up the [***Kaggle Basketball Dataset*** (wyattowalsh/basketball)](https://www.kaggle.com/wyattowalsh/basketball) and serves as the foundation for the [basketball-related projects](https://github.com/wyattowalsh/sports-analytics/tree/main/basketball) within my [sports analytics GitHub repository](https://github.com/wyattowalsh/sports-analytics).

One of the goals for the data collection component of this project is to produce a `robust`, *organized* dataset that can grow to be **as large as possible**. An explanation of my approach to storing the files for the [***Basketball Dataset***](https://www.kaggle.com/wyattowalsh/basketball) follows below.

<img src="https://unsplash.com/photos/Kv-gAzpUSRg/download?force=true">

## Overview

***Kaggle*** supports many file formats for datasets, including `CSV`, `JSON`, `SQLite`, and archives, among others. The platform acts much like industrial cloud storage such as *Google Cloud Platform's* (**GCP**) ***Cloud Storage*** or *Amazon Web Services'* (**AWS**) ***S3***, albeit with a **100GB** storage capacity. Both ***Kaggle*** datasets and these industrial solutions can be considered general object/file storage, and in certain data engineering paradigms they can serve as data lakes.

It seems that many state-of-the-art (SOTA) data storage solutions pivot around an organization-wide data lake (which itself allows for general object storage) fed by multiple inputs (*"tributaries"*), some streaming in continuously and others added on a routine schedule. One benefit of this paradigm is that the lake can store both structured (tabular) and unstructured (image, video, audio, text, etc.) data. This can prove useful because, as time progresses, new techniques for extracting useful information from unstructured data may become available. Thus it also seems like a good idea to hold onto all extracted data, if possible.

***Kaggle*** datasets can serve as data lakes, either through archives or simply by storing data files in their raw file format. This is a strong foundation for building what may one day become a <b><i>"big data"</i></b> collection.

However, further work can be done to configure ***Kaggle*** datasets for additional platform functionality and improved storage efficiency. Structured data, whether structured upon extraction or through some pre-processing, can be stored in a ***SQLite*** database (`.sqlite` file type) rather than as individual files such as `CSVs` or `JSONs` within the dataset. A single database file is then stored as an object within the dataset, enabling additional functionality. One readily apparent advantage of storing data in ***SQLite*** is that histograms of the distributions of continuous variables are shown directly within ***Kaggle***.

As this project moves forward, I hope to gather a large collection of both structured and unstructured data. I hope that the ***SQLite*** database (`basketball.sqlite`) can house the structured data in an efficient, useful format, similar to the [***European Soccer Database***](https://www.kaggle.com/hugomathien/soccer).
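
As a rough illustration of the single-file approach described above, the sketch below stores two small tables in one SQLite database and joins them using only Python's built-in `sqlite3` module. The table names, columns, and rows are made up for illustration and are not the actual schema of `basketball.sqlite`; an in-memory database stands in for the file.

```python
import sqlite3

# An in-memory database stands in for a file such as basketball.sqlite.
conn_demo = sqlite3.connect(":memory:")

# Hypothetical tables and rows, for illustration only.
conn_demo.execute("CREATE TABLE demo_team (id TEXT PRIMARY KEY, nickname TEXT)")
conn_demo.execute(
    "CREATE TABLE demo_player (id TEXT PRIMARY KEY, full_name TEXT, team_id TEXT)"
)
conn_demo.executemany(
    "INSERT INTO demo_team VALUES (?, ?)",
    [("t1", "Owls"), ("t2", "Foxes")],
)
conn_demo.executemany(
    "INSERT INTO demo_player VALUES (?, ?, ?)",
    [("p1", "Ada Example", "t1"), ("p2", "Bo Sample", "t1"), ("p3", "Cy Placeholder", "t2")],
)
conn_demo.commit()

# Both tables live in the same database, so a join needs no extra file handling.
roster_sizes = conn_demo.execute(
    """
    SELECT t.nickname, COUNT(p.id) AS n_players
    FROM demo_team AS t
    LEFT JOIN demo_player AS p ON p.team_id = t.id
    GROUP BY t.id
    ORDER BY t.nickname
    """
).fetchall()
print(roster_sizes)
```

Keeping related tables in one database object is what makes this kind of cross-table query possible without first loading several separate `CSV` files.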

## View System Information

In [None]:
print("********************** CUDA Version ********************** \n - \n")
!nvcc --version
print("********************** CPU Info ********************** \n - \n")
!cat /proc/cpuinfo
print("********************** CPU Count ********************** \n - \n")
import os
print(os.cpu_count())
print("********************** GPU Info ********************** \n - \n")
!nvidia-smi
print("********************** Python Version ********************** \n - \n")
!python -V

********************** CUDA Version ********************** 
 - 

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0
********************** CPU Info ********************** 
 - 

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 63
model name	: Intel(R) Xeon(R) CPU @ 2.30GHz
stepping	: 0
microcode	: 0x1
cpu MHz		: 2299.998
cache size	: 46080 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_singl
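
The cell above relies on shell commands that exist on a Colab Linux runtime. As a hedged alternative, the sketch below gathers similar information from the Python standard library and only queries `nvidia-smi` when that tool is found on the `PATH`. The function name `collect_system_info` and the dictionary keys are illustrative choices, not part of the original notebook.

```python
import os
import platform
import shutil
import subprocess


def collect_system_info():
    """Return a small dict describing the current runtime."""
    info = {
        "python_version": platform.python_version(),
        "cpu_count": os.cpu_count(),
        "machine": platform.machine(),
        "gpu": None,
    }
    # nvidia-smi only exists on machines with NVIDIA drivers installed.
    nvidia_smi = shutil.which("nvidia-smi")
    if nvidia_smi is not None:
        try:
            result = subprocess.run(
                [nvidia_smi, "-L"],  # -L lists the available GPUs
                capture_output=True,
                text=True,
                timeout=10,
                check=False,
            )
            info["gpu"] = result.stdout.strip() or None
        except (OSError, subprocess.SubprocessError):
            # Leave "gpu" as None if the tool cannot be run.
            pass
    return info


system_info = collect_system_info()
for key, value in system_info.items():
    print(f"{key}: {value}")
```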

## Prepare Development Environment

### Clone Project Repository and Install Dependencies

In [None]:
# remove sample data and clone repo
!rm -r sample_data/
!rm -r sports-analytics/
!git clone https://github.com/wyattowalsh/sports-analytics.git

# change directory to directory that contains this notebook
%cd /content/sports-analytics/basketball/notebooks/

# install dependencies
!pip install -r ../../dependencies/basketball/data_collection.txt

rm: cannot remove 'sample_data/': No such file or directory
rm: cannot remove 'sports-analytics/': No such file or directory
Cloning into 'sports-analytics'...
remote: Enumerating objects: 387, done.
remote: Counting objects: 100% (387/387), done.
remote: Compressing objects: 100% (278/278), done.
remote: Total 387 (delta 144), reused 268 (delta 64), pack-reused 0
Receiving objects: 100% (387/387), 768.27 KiB | 8.00 MiB/s, done.
Resolving deltas: 100% (144/144), done.
/content/sports-analytics/basketball/notebooks


### Import Dependencies and Enable Tools

In [1]:
# nba_api dependencies
from nba_api.stats.static import players, teams
from nba_api.stats.endpoints import commonplayerinfo, playercareerstats, teamdetails, leaguegamelog, boxscoresummaryv2

# datascience stack
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import seaborn
import sqlite3 as sql

# system utility stack
import os
import time
import urllib.request  # import the submodule explicitly; urllib.request.urlopen is used below
from functools import partial

pd.options.display.max_columns = None

# # Upload kaggle.json to /content/
# from google.colab import files 
# uploaded = files.upload()

# Move and change permissions as needed, allowing for import
# !mkdir -p ~/.kaggle/ && cp kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json
import kaggle

# # change directory to directory that contains this notebook
# %cd /content/sports-analytics/basketball/notebooks/

# # utilize Colab Monitor
# from urllib.request import urlopen
# exec(urlopen("http://colab-monitor.smankusors.com/track.py").read())
# _colabMonitor = ColabMonitor().start()

## Collect Data

### Connect to Database

In [2]:
conn = sql.connect('../data/basketball.sqlite')

### Players

#### Get Players DataFrame and Type ID as String

In [None]:
df_players = pd.DataFrame(players.get_players()).astype({'id': 'str'})
df_players

Unnamed: 0,id,full_name,first_name,last_name,is_active
0,76001,Alaa Abdelnaby,Alaa,Abdelnaby,False
1,76002,Zaid Abdul-Aziz,Zaid,Abdul-Aziz,False
2,76003,Kareem Abdul-Jabbar,Kareem,Abdul-Jabbar,False
3,51,Mahmoud Abdul-Rauf,Mahmoud,Abdul-Rauf,False
4,1505,Tariq Abdul-Wahad,Tariq,Abdul-Wahad,False
...,...,...,...,...,...
4496,1627790,Ante Zizic,Ante,Zizic,True
4497,78647,Jim Zoet,Jim,Zoet,False
4498,78648,Bill Zopf,Bill,Zopf,False
4499,1627826,Ivica Zubac,Ivica,Zubac,True


In [None]:
df_players.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4501 entries, 0 to 4500
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   id          4501 non-null   object
 1   full_name   4501 non-null   object
 2   first_name  4501 non-null   object
 3   last_name   4501 non-null   object
 4   is_active   4501 non-null   bool  
dtypes: bool(1), object(4)
memory usage: 145.2+ KB


#### Add Dataframe as Table to Database, Unless it Already Exists

In [None]:
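try:
    # to_sql defaults to if_exists='fail', which raises ValueError if the table exists
    df_players.to_sql('Player', conn, index=False)
except ValueError:
    # table already exists, so leave it untouched
    pass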
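
An alternative to catching the error is to check for the table explicitly before writing. The sketch below does this by querying SQLite's `sqlite_master` catalog with the built-in `sqlite3` module; `table_exists` and `demo_conn` are hypothetical names introduced here, and an in-memory database is used so the example does not touch `basketball.sqlite`.

```python
import sqlite3


def table_exists(connection, table_name):
    """Return True if a table with this name is present in the SQLite database."""
    row = connection.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    return row is not None


demo_conn = sqlite3.connect(":memory:")
before = table_exists(demo_conn, "Player")
demo_conn.execute("CREATE TABLE Player (id TEXT, full_name TEXT)")
after = table_exists(demo_conn, "Player")
print(before, after)
```

An explicit check makes the "unless it already exists" intent visible in the code, whereas the `try`/`except` form is shorter.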

### Teams

#### Get Teams DataFrame, Type ID as String and Extract Founding Year as Integer

In [None]:
df_teams = pd.DataFrame(teams.get_teams()).astype({'id': 'str'})
df_teams['year_founded'] = pd.to_datetime(df_teams['year_founded'], format='%Y').dt.year # parse as a datetime to validate, then keep the integer year
df_teams.head()

Unnamed: 0,id,full_name,abbreviation,nickname,city,state,year_founded
0,1610612737,Atlanta Hawks,ATL,Hawks,Atlanta,Atlanta,1949
1,1610612738,Boston Celtics,BOS,Celtics,Boston,Massachusetts,1946
2,1610612739,Cleveland Cavaliers,CLE,Cavaliers,Cleveland,Ohio,1970
3,1610612740,New Orleans Pelicans,NOP,Pelicans,New Orleans,Louisiana,2002
4,1610612741,Chicago Bulls,CHI,Bulls,Chicago,Illinois,1966


In [None]:
df_teams.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   id            30 non-null     object
 1   full_name     30 non-null     object
 2   abbreviation  30 non-null     object
 3   nickname      30 non-null     object
 4   city          30 non-null     object
 5   state         30 non-null     object
 6   year_founded  30 non-null     int64 
dtypes: int64(1), object(6)
memory usage: 1.8+ KB


#### Add Dataframe as Table to Database, Unless it Already Exists

In [None]:
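try:
    # to_sql defaults to if_exists='fail', which raises ValueError if the table exists
    df_teams.to_sql('Team', conn, index=False)
except ValueError:
    # table already exists, so leave it untouched
    pass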

### Common Player Information

In [None]:
# define function to extract common player info for a single player
def get_common_player_info(player_id, proxies):
    # define helpful variables
    no_res = True
    proxy_collection_counter = 0
    proxy_index = 0
    # while no response
    while no_res:
        # try getting a response without a proxy
        try:
            res = commonplayerinfo.CommonPlayerInfo(player_id=player_id, timeout=3).get_data_frames()
            no_res = False
            print(player_id)
            break
        except Exception:  # a bare except would also swallow KeyboardInterrupt
            # if that fails
            while no_res:
                # try getting with a certain proxy
                try: 
                    res = commonplayerinfo.CommonPlayerInfo(player_id=player_id, proxy="http://" + proxies[proxy_index], timeout=3).get_data_frames()
                    no_res = False
                    break
                except Exception:
                    # if that fails, move on to next proxy unless out of proxies
                    if (proxy_index + 1) >= len(proxies):
                        # unless the proxy list has already been refreshed 5 times
                        if proxy_collection_counter < 5:
                            # if out of proxies: get more proxies, fix counters, and try without a proxy again
                            proxy_index = 0
                            proxy_collection_counter = proxy_collection_counter + 1
                            print(player_id, ' failed {} times'.format(proxy_collection_counter))
                            proxies = [str(proxy).split('\\')[0][2:] for proxy in urllib.request.urlopen("https://api.proxyscrape.com/v2/?request=getproxies&protocol=http&timeout=1000&country=all&ssl=yes&anonymity=all&simplified=true").readlines()]
                            break
                        else:
                            return None
                    else:
                        proxy_index = proxy_index + 1
                        
    # merge the common player info and player headline stats and drop timeframe                   
    res_df = pd.merge(res[0], res[1], how='left', left_on=['PERSON_ID', 'DISPLAY_FIRST_LAST'], right_on=['PLAYER_ID', 'PLAYER_NAME'])
    res_df = res_df.drop(['TimeFrame'], axis=1)
    return res_df

# get proxies
proxies = [str(proxy).split('\\')[0][2:] for proxy in urllib.request.urlopen("https://api.proxyscrape.com/v2/?request=getproxies&protocol=http&timeout=1000&country=all&ssl=yes&anonymity=all&simplified=true").readlines()]

# get common player info for each player in the db
dfs = []
player_ids = pd.read_sql('SELECT id FROM Player', conn)['id'].values
dfs = [get_common_player_info(player_id=player_id, proxies=proxies) for player_id in player_ids]
df = pd.concat(dfs)
df.head()

76001
76002
76003
51
1505
949
76005
76006
76007
203518
101165
76008
76009
76010
203112
76011
76012
200801
1629121
203919
149
203500
912
1628389
1629061
76015
202399
201167
200772
76016
201336
76017
201582
76018
203006
1629152
202374
76019
76020
203128
202332
200746
76021
1626146
724
2042
76022
201570
1629734
2349
1629638
76024
1628959
76028
1628960
1628386
706
1628443
202730
76027
2124
76025
951
2754
76029
200984
76030
201165
308
1747
1824
680
732
202329
200811
76034
2365
2431
101187
202079
76035
76036
1507
76037
944
246
202341
76040
1626147
72
76041
203937
76042
76043
98
76045
76046
201583
1000
335
76048
101149
76049
1628387
76050
1512
203507
1628961
203648
2546
21
201202
203544
1628384
203951
2737
76053
76054
2425
1627853
76055
2240
2772
76056
76057
769
76061
2220
353
200756
76060
76059
76062
76063
76064
2306
201589
76065
1628503
201600
355
173
76068
76069
1088
76070
76071
278
201571
200788
1134
76073
76074
138
76076
1895
76078
202970
201965
1629028
101235
203569
202337
76079
76080
1

203114
77589
2036
77590
600012
77591
77593
77594
101223
101139
2032
77596
1527
2545
77598
1889
292
77605
77603
1802
203121
77602
77599
77600
77601
1626259
2034
932
203113
397
77604
168
77606
201988
371
202407
200794
1629003
114
77609
77610
65
77612
202703
77613
1954
1628378
77615
77616
417
77614
203183
203502
1628513
77618
77619
1749
211
77621
77622
1737
2040
77623
1629690
77624
77625
77626
77627
2752
1628370
600006
202328
77628
1626242
77630
376
200081
77631
1629760
77632
1628500
202734
77634
77633
77635
1630
77636
77637
929
1629630
203961
77639
77640
1629752
77641
77642
77644
356
202721
77647
1628537
202693
77646
1628420
201043
2242
202694
200747
77648
77649
201627
132
1972
77652
77653
77654
734
202700
1628405
2069
203102
297
202389
77657
1626144
77658
203498
77659
77660
201957
904
77662
942
204098
77664
77665
785
49
77668
77669
77672
203513
77670
77671
203122
77673
77674
2211
1627749
2436
1627750
77675
441
145
77676
203315
1629058
203488
77677
1054
87
982
939
1629004
1626122  failed

Unnamed: 0,PERSON_ID,FIRST_NAME,LAST_NAME,DISPLAY_FIRST_LAST,DISPLAY_LAST_COMMA_FIRST,DISPLAY_FI_LAST,PLAYER_SLUG,BIRTHDATE,SCHOOL,COUNTRY,LAST_AFFILIATION,HEIGHT,WEIGHT,SEASON_EXP,JERSEY,POSITION,ROSTERSTATUS,GAMES_PLAYED_CURRENT_SEASON_FLAG,TEAM_ID,TEAM_NAME,TEAM_ABBREVIATION,TEAM_CODE,TEAM_CITY,PLAYERCODE,FROM_YEAR,TO_YEAR,DLEAGUE_FLAG,NBA_FLAG,GAMES_PLAYED_FLAG,DRAFT_YEAR,DRAFT_ROUND,DRAFT_NUMBER,PLAYER_ID,PLAYER_NAME,PTS,AST,REB,ALL_STAR_APPEARANCES,PIE
0,76001,Alaa,Abdelnaby,Alaa Abdelnaby,"Abdelnaby, Alaa",A. Abdelnaby,alaa-abdelnaby,1968-06-24T00:00:00,Duke,USA,Duke/USA,6-10,240,4,30,Forward,Inactive,N,1610612757,Trail Blazers,POR,blazers,Portland,HISTADD_alaa_abdelnaby,1990,1994,N,Y,Y,1990,1,25,76001,Alaa Abdelnaby,5.7,0.3,3.3,0,
0,76002,Zaid,Abdul-Aziz,Zaid Abdul-Aziz,"Abdul-Aziz, Zaid",Z. Abdul-Aziz,zaid-abdul-aziz,1946-04-07T00:00:00,Iowa State,USA,Iowa State/USA,6-9,235,9,54,Center,Inactive,N,1610612745,Rockets,HOU,rockets,Houston,HISTADD_zaid_abdul-aziz,1968,1977,N,Y,Y,1968,1,5,76002,Zaid Abdul-Aziz,9.0,1.2,8.0,0,
0,76003,Kareem,Abdul-Jabbar,Kareem Abdul-Jabbar,"Abdul-Jabbar, Kareem",K. Abdul-Jabbar,kareem-abdul-jabbar,1947-04-16T00:00:00,UCLA,USA,UCLA/USA,7-2,225,19,33,Center,Inactive,N,1610612747,Lakers,LAL,lakers,Los Angeles,HISTADD_kareem_abdul-jabbar,1969,1988,N,Y,Y,1969,1,1,76003,Kareem Abdul-Jabbar,24.6,3.6,11.2,18,
0,51,Mahmoud,Abdul-Rauf,Mahmoud Abdul-Rauf,"Abdul-Rauf, Mahmoud",M. Abdul-Rauf,mahmoud-abdul-rauf,1969-03-09T00:00:00,Louisiana State,USA,Louisiana State/USA,6-1,162,8,1,Guard,Inactive,N,1610612743,Nuggets,DEN,nuggets,Denver,mahmoud_abdul-rauf,1990,2000,N,Y,Y,1990,1,3,51,Mahmoud Abdul-Rauf,14.6,3.5,1.9,0,
0,1505,Tariq,Abdul-Wahad,Tariq Abdul-Wahad,"Abdul-Wahad, Tariq",T. Abdul-Wahad,tariq-abdul-wahad,1974-11-03T00:00:00,San Jose State,France,San Jose State/France,6-6,235,6,9,Forward-Guard,Inactive,N,1610612758,Kings,SAC,kings,Sacramento,tariq_abdul-wahad,1997,2003,N,Y,Y,1997,1,11,1505,Tariq Abdul-Wahad,7.8,1.1,3.3,0,
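
The nested loops in `get_common_player_info` implement a "direct request first, then each proxy, then refresh the proxy list" strategy. The sketch below restates that control flow as a flat helper, under the assumption that a request can be wrapped in a callable taking a proxy URL (or `None` for a direct call). `fetch_with_fallback`, `flaky_fetch`, and the proxy addresses are hypothetical and stand in for the real `nba_api` calls, so the example runs without network access.

```python
def fetch_with_fallback(fetch, proxies, refresh_proxies, max_refreshes=5):
    """Call fetch(None) first, then fetch(proxy) for each proxy in turn.

    fetch: callable taking a proxy URL, or None for a direct request.
    refresh_proxies: callable returning a fresh list of 'host:port' strings.
    Returns the first successful result, or None once the proxy list has
    been exhausted and refreshed max_refreshes times.
    """
    refreshes = 0
    while True:
        # Candidate order: direct request, then every proxy in the list.
        candidates = [None] + ["http://" + p for p in proxies]
        for proxy in candidates:
            try:
                return fetch(proxy)
            except Exception:
                continue  # try the next candidate
        if refreshes >= max_refreshes:
            return None
        refreshes += 1
        proxies = refresh_proxies()


calls = []


def flaky_fetch(proxy):
    # Hypothetical stand-in for an API call: only one proxy "works".
    calls.append(proxy)
    if proxy != "http://10.0.0.2:8080":
        raise ConnectionError("simulated timeout")
    return {"via": proxy}


result = fetch_with_fallback(
    flaky_fetch,
    ["10.0.0.1:8080", "10.0.0.2:8080"],
    refresh_proxies=lambda: [],
)
print(result, len(calls))
```

Passing the request in as a callable would let one helper serve the player, team, and league game cells, though the notebook itself repeats the loop in each function.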


In [None]:
dff = df.copy()
df.rename(columns = {'PERSON_ID':'ID'}, inplace = True)
df['ID'] = df["ID"].astype(str)
df['BIRTHDATE'] = df['BIRTHDATE'].astype(str)
df['HEIGHT'] = df['HEIGHT'].apply(lambda x: int(x.split('-')[0])*12 + int(x.split('-')[1]) \
                                  if len(x.strip()) > 0 else np.nan)
df['WEIGHT'] = df['WEIGHT'].apply(lambda x: int(x) if len(x.strip()) > 0 else np.nan)
df['TEAM_ID'] = df['TEAM_ID'].astype(str)
df['FROM_YEAR'] = df['FROM_YEAR'].astype(str)
df['TO_YEAR'] = df['TO_YEAR'].astype(str)
df['DRAFT_YEAR'] = df['DRAFT_YEAR'].astype(str)
df['DRAFT_ROUND'] = df['DRAFT_ROUND'].astype(str)
df['DRAFT_NUMBER'] = df['DRAFT_NUMBER'].astype(str)
df = df.drop(['PLAYER_ID', 'PLAYER_NAME'], axis=1)
df['PTS'] = pd.to_numeric(df['PTS'], errors='coerce')
df['AST'] = pd.to_numeric(df['AST'], errors='coerce')
df['REB'] = pd.to_numeric(df['REB'], errors='coerce')
df['ALL_STAR_APPEARANCES'] = pd.to_numeric(df['ALL_STAR_APPEARANCES'], errors='coerce')
df['PIE'] = pd.to_numeric(df['PIE'], errors='coerce')
df = df.reset_index(drop=True)
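
The `HEIGHT` conversion above turns a feet-inches string into total inches. The same arithmetic as a standalone function may be easier to read and test; `height_to_inches` is a name introduced here for illustration, and it returns `math.nan` for blank input to mirror the `np.nan` fallback in the cell.

```python
import math


def height_to_inches(height):
    """Convert a 'feet-inches' string such as '6-10' to total inches.

    Blank strings are returned as NaN, mirroring the np.nan fallback above.
    """
    text = height.strip()
    if not text:
        return math.nan
    feet, inches = text.split("-")
    return int(feet) * 12 + int(inches)


print(height_to_inches("6-10"))  # 6 * 12 + 10 = 82
print(height_to_inches("7-2"))   # 7 * 12 + 2 = 86
```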

In [None]:
df.describe()

Unnamed: 0,HEIGHT,WEIGHT,SEASON_EXP,PTS,AST,REB,ALL_STAR_APPEARANCES,PIE
count,4403.0,4399.0,4500.0,4485.0,4485.0,4193.0,4056.0,429.0
mean,78.070634,211.146852,4.295556,6.357101,1.436299,2.981588,0.343195,0.084002
std,3.638698,27.075019,4.492292,4.92501,1.409078,2.288022,1.474006,0.088903
min,65.0,133.0,0.0,0.0,0.0,0.0,0.0,-1.5
25%,75.0,190.0,1.0,2.8,0.5,1.4,0.0,0.067
50%,78.0,210.0,3.0,5.1,1.0,2.4,0.0,0.089
75%,81.0,230.0,7.0,8.6,1.9,4.0,0.0,0.112
max,91.0,360.0,22.0,31.8,11.2,22.9,18.0,0.211


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4500 entries, 0 to 4499
Data columns (total 37 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   ID                                4500 non-null   object 
 1   FIRST_NAME                        4500 non-null   object 
 2   LAST_NAME                         4500 non-null   object 
 3   DISPLAY_FIRST_LAST                4500 non-null   object 
 4   DISPLAY_LAST_COMMA_FIRST          4500 non-null   object 
 5   DISPLAY_FI_LAST                   4500 non-null   object 
 6   PLAYER_SLUG                       4500 non-null   object 
 7   BIRTHDATE                         4500 non-null   object 
 8   SCHOOL                            4497 non-null   object 
 9   COUNTRY                           4500 non-null   object 
 10  LAST_AFFILIATION                  4500 non-null   object 
 11  HEIGHT                            4403 non-null   float64
 12  WEIGHT

In [None]:
df.to_sql('Player_Attributes', conn, index=False, if_exists='replace')

In [None]:
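try:
    # to_sql defaults to if_exists='fail', which raises ValueError if the table exists
    df.to_sql('Player_Attributes', conn, index=False)
except ValueError:
    # table already exists, so leave it untouched
    pass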

#### Upload to Kaggle

In [None]:
!kaggle datasets version -p ../data -m "adding common player info"

Starting upload for file basketball.sqlite
100%|███████████████████████████████████████| 1.70M/1.70M [00:07<00:00, 225kB/s]
Upload successful: basketball.sqlite (2MB)
Dataset version creation error: You have exceeded the max category limit


### Team Details

In [None]:
# define function to extract team details for a single team
def get_team_details(team_id, proxies):
    # define helpful variables
    no_res = True
    proxy_collection_counter = 0
    proxy_index = 0
    # while no response
    while no_res:
        # try getting a response without a proxy
        try:
            res = teamdetails.TeamDetails(team_id, timeout=3).get_data_frames()
            no_res = False
            print(team_id)
            break
        except Exception:  # a bare except would also swallow KeyboardInterrupt
            # if that fails
            while no_res:
                # try getting with a certain proxy
                try: 
                    res = teamdetails.TeamDetails(team_id, proxy="http://" + proxies[proxy_index], timeout=3).get_data_frames()
                    no_res = False
                    break
                except Exception:
                    # if that fails, move on to next proxy unless out of proxies
                    if (proxy_index + 1) >= len(proxies):
                        # unless tried proxies 5 times
                        if proxy_collection_counter < 5:
                            # if out of proxies: get more proxies, fix counters, and try without a proxy again
                            proxy_index = 0
                            proxy_collection_counter = proxy_collection_counter + 1
                            print(team_id, ' failed {} times'.format(proxy_collection_counter))
                            proxies = [str(proxy).split('\\')[0][2:] for proxy in urllib.request.urlopen("https://api.proxyscrape.com/v2/?request=getproxies&protocol=http&timeout=1000&country=all&ssl=yes&anonymity=all&simplified=true").readlines()]
                            break
                        else:
                            return None
                    else:
                        proxy_index = proxy_index + 1
                        
    # add the social media links to the team background frame and normalize column types
    dfs = res
    df = dfs[0]
    # .values[0] raises IndexError when a team has no account of the given type
    try:
        df['FACEBOOK_WEBSITE_LINK'] = dfs[2].loc[dfs[2]['ACCOUNTTYPE'] == 'Facebook']['WEBSITE_LINK'].values[0]
    except (IndexError, KeyError):
        df['FACEBOOK_WEBSITE_LINK'] = np.nan
    try:
        df['INSTAGRAM_WEBSITE_LINK'] = dfs[2].loc[dfs[2]['ACCOUNTTYPE'] == 'Instagram']['WEBSITE_LINK'].values[0]
    except (IndexError, KeyError):
        df['INSTAGRAM_WEBSITE_LINK'] = np.nan
    try:
        df['TWITTER_WEBSITE_LINK'] = dfs[2].loc[dfs[2]['ACCOUNTTYPE'] == 'Twitter']['WEBSITE_LINK'].values[0]
    except (IndexError, KeyError):
        df['TWITTER_WEBSITE_LINK'] = np.nan
    df.rename(columns = {'TEAM_ID':'ID'}, inplace = True)
    df['ID'] = df["ID"].astype(str)
    df['YEARFOUNDED'] = df['YEARFOUNDED'].astype(str)
    df['ARENACAPACITY'] = pd.to_numeric(df['ARENACAPACITY'], errors='coerce')
    df_1 = dfs[1]
    df_1.rename(columns = {'TEAM_ID':'ID'}, inplace = True)
    df_1['ID'] = df_1["ID"].astype(str)
    df_1['YEARFOUNDED'] = df_1['YEARFOUNDED'].astype(str)
    df_1['YEARACTIVETILL'] = df_1['YEARACTIVETILL'].astype(str)
    return [df, df_1]

# get proxies
proxies = [str(proxy).split('\\')[0][2:] for proxy in urllib.request.urlopen("https://api.proxyscrape.com/v2/?request=getproxies&protocol=http&timeout=1000&country=all&ssl=yes&anonymity=all&simplified=true").readlines()]

# get team details for each team in the db
dfs = []
team_ids = pd.read_sql('SELECT id FROM Team', conn)['id'].values
dfs = [get_team_details(team_id, proxies=proxies) for team_id in team_ids]
df_0 = pd.concat([df[0] for df in dfs])
df_1 = pd.concat([df[1] for df in dfs])
display(df_0.head())
df_1.head()

1610612737
1610612738
1610612739
1610612740
1610612741
1610612742
1610612743
1610612744
1610612745
1610612746
1610612747
1610612748
1610612749
1610612750
1610612751
1610612752
1610612753
1610612754
1610612755
1610612756
1610612757
1610612758
1610612759
1610612760
1610612761
1610612762
1610612763
1610612764
1610612765
1610612766


Unnamed: 0,ID,ABBREVIATION,NICKNAME,YEARFOUNDED,CITY,ARENA,ARENACAPACITY,OWNER,GENERALMANAGER,HEADCOACH,DLEAGUEAFFILIATION,FACEBOOK_WEBSITE_LINK,INSTAGRAM_WEBSITE_LINK,TWITTER_WEBSITE_LINK
0,1610612737,ATL,Hawks,1949,Atlanta,State Farm Arena,18729.0,Tony Ressler,Travis Schlenk,Nate McMillan,Erie Bayhawks,https://www.facebook.com/hawks,https://instagram.com/atlhawks,https://twitter.com/ATLHawks
0,1610612738,BOS,Celtics,1946,Boston,TD Garden,18624.0,Wyc Grousbeck,Danny Ainge,Brad Stevens,Maine Red Claws,https://www.facebook.com/bostonceltics,https://instagram.com/celtics,https://twitter.com/celtics
0,1610612739,CLE,Cavaliers,1970,Cleveland,Rocket Mortgage FieldHouse,20562.0,Dan Gilbert,Koby Altman,JB Bickerstaff,Canton Charge,https://www.facebook.com/Cavs,https://instagram.com/cavs,https://twitter.com/cavs
0,1610612740,NOP,Pelicans,2002,New Orleans,Smoothie King Center,,Tom Benson,Trajan Langdon,Stan Van Gundy,No Affiliate,https://www.facebook.com/PelicansNBA,https://instagram.com/pelicansnba,https://twitter.com/PelicansNBA
0,1610612741,CHI,Bulls,1966,Chicago,United Center,21711.0,Jerry Reinsdorf,Arturas Karnisovas,Billy Donovan,Windy City Bulls,https://www.facebook.com/chicagobulls,https://instagram.com/chicagobulls,https://twitter.com/chicagobulls


Unnamed: 0,ID,CITY,NICKNAME,YEARFOUNDED,YEARACTIVETILL
0,1610612737,Atlanta,Hawks,1968,2019
1,1610612737,St. Louis,Hawks,1955,1967
2,1610612737,Milwaukee,Hawks,1951,1954
3,1610612737,Tri-Cities,Blackhawks,1949,1950
0,1610612738,Boston,Celtics,1946,2019
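
The collection cells build their proxy list by slicing the string form of each raw response line (`str(proxy).split('\\')[0][2:]`). A possibly clearer equivalent is to decode the bytes and strip the line ending, as sketched below. `parse_proxy_lines` and the sample addresses are made up for illustration; the input is assumed to be a list of bytes lines like those returned by `readlines()` on the HTTP response.

```python
def parse_proxy_lines(raw_lines):
    """Turn raw response lines (bytes) into clean 'host:port' strings."""
    proxies = []
    for raw in raw_lines:
        address = raw.decode("utf-8", errors="ignore").strip()
        if address:  # skip blank lines
            proxies.append(address)
    return proxies


# Made-up sample lines in the shape returned by readlines() on an HTTP response.
sample = [b"10.0.0.1:8080\r\n", b"10.0.0.2:3128\r\n", b"\r\n"]
parsed = parse_proxy_lines(sample)
print(parsed)
```

Unlike the slicing approach, this version also drops empty lines, so a trailing blank line does not become an empty proxy address.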


In [None]:
df_0.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 30 entries, 0 to 0
Data columns (total 14 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   ID                      30 non-null     object 
 1   ABBREVIATION            30 non-null     object 
 2   NICKNAME                30 non-null     object 
 3   YEARFOUNDED             30 non-null     object 
 4   CITY                    30 non-null     object 
 5   ARENA                   30 non-null     object 
 6   ARENACAPACITY           20 non-null     float64
 7   OWNER                   30 non-null     object 
 8   GENERALMANAGER          30 non-null     object 
 9   HEADCOACH               30 non-null     object 
 10  DLEAGUEAFFILIATION      30 non-null     object 
 11  FACEBOOK_WEBSITE_LINK   30 non-null     object 
 12  INSTAGRAM_WEBSITE_LINK  30 non-null     object 
 13  TWITTER_WEBSITE_LINK    30 non-null     object 
dtypes: float64(1), object(13)
memory usage: 3.5+ 

In [None]:
df_1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 60 entries, 0 to 2
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   ID              60 non-null     object
 1   CITY            60 non-null     object
 2   NICKNAME        60 non-null     object
 3   YEARFOUNDED     60 non-null     object
 4   YEARACTIVETILL  60 non-null     object
dtypes: object(5)
memory usage: 2.8+ KB


In [None]:
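# to_sql defaults to if_exists='fail', which raises ValueError if the table exists
try:
    df_0.to_sql('Team_Attributes', conn, index=False)
except ValueError:
    pass
try:
    df_1.to_sql('Team_History', conn, index=False)
except ValueError:
    pass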

#### Upload to Kaggle

In [None]:
!kaggle datasets version -p ../data -m "adding team details"

Starting upload for file basketball.sqlite
100%|███████████████████████████████████████| 1.70M/1.70M [00:04<00:00, 435kB/s]
Upload successful: basketball.sqlite (2MB)
Dataset version creation error: You have exceeded the max category limit


### League Games

In [21]:
def get_league_games(season_id, proxies):
    # define helpful variables
    no_res = True
    proxy_collection_counter = 0
    proxy_index = 0
    # while no response
    while no_res:
        # try getting a response without a proxy
        try:
            res = leaguegamelog.LeagueGameLog(season=season_id, timeout=5).get_data_frames()
            no_res = False
            print(season_id)
            break
        except Exception:  # a bare except would also swallow KeyboardInterrupt
            # if that fails
            while no_res:
                # try getting with a certain proxy
                try: 
                    res = leaguegamelog.LeagueGameLog(season=season_id, proxy="http://" + proxies[proxy_index], timeout=5).get_data_frames()
                    no_res = False
                    break
                except Exception:
                    # if that fails, move on to next proxy unless out of proxies
                    if (proxy_index + 1) >= len(proxies):
                        # unless tried proxies 5 times
                        if proxy_collection_counter < 5:
                            # if out of proxies: get more proxies, fix counters, and try without a proxy again
                            proxy_index = 0
                            proxy_collection_counter = proxy_collection_counter + 1
                            print(season_id, ' failed {} times'.format(proxy_collection_counter))
                            proxies = [str(proxy).split('\\')[0][2:] for proxy in urllib.request.urlopen("https://api.proxyscrape.com/v2/?request=getproxies&protocol=http&timeout=1500&country=all&ssl=yes&anonymity=all&simplified=true").readlines()]
                            break
                        else:
                            return None
                    else:
                        proxy_index = proxy_index + 1
                        
  
    df = res[0]
    df["TEAM_ID"] = df["TEAM_ID"].astype(str)
    game_ids = df["GAME_ID"].unique()         

    def get_df(df, game_id):
        season_id = df['SEASON_ID'].values[0]
        game_date = df['GAME_DATE'].values[0]
        rows = df.loc[df["GAME_ID"] == game_id].drop(['GAME_ID', "SEASON_ID", "GAME_DATE"], axis=1)
        row_0 = rows.iloc[[0]]
        row_1 = rows.iloc[[1]]

        def rename_cols(df):
            if "vs" in df['MATCHUP'].values[0]:
                df.columns = [x + '_HOME' for x in df.columns]
            else:
                df.columns = [x + '_AWAY' for x in df.columns]
            return df
        
        row_0 = rename_cols(row_0).reset_index(drop=True)
        row_1 = rename_cols(row_1).reset_index(drop=True)
        df = pd.concat([row_0, row_1], axis=1)
        cols = list(df.columns.values)
        cols = ['GAME_ID', "SEASON_ID"] + cols
        df["GAME_ID"] = game_id
        df["SEASON_ID"] = season_id
        df = df[cols]
        return df
    
    df = pd.concat([get_df(df, game_id) for game_id in game_ids], axis=0)
    return df

# get proxies
proxies = [str(proxy).split('\\')[0][2:] for proxy in urllib.request.urlopen("https://api.proxyscrape.com/v2/?request=getproxies&protocol=http&timeout=1500&country=all&ssl=yes&anonymity=all&simplified=true").readlines()]

# get league game logs for each season
dfs = []
player_ids = pd.read_sql('SELECT id FROM Player', conn)['id'].values
season_ids = [str(x) + "-" + str(x+1)[2:4] for x in range(1946, 2021)]
dfs = [get_league_games(season_id=season_id , proxies=proxies) for season_id in season_ids]
df = pd.concat(dfs)
df.head()

1946-47
1947-48
1948-49
1949-50
1950-51
1951-52
1952-53
1953-54
1954-55
1955-56
1956-57
1957-58
1958-59
1959-60
1960-61
1961-62
1962-63
1963-64
1964-65
1965-66
1966-67
1967-68
1968-69
1969-70
1970-71
1971-72
1972-73
1973-74
1974-75
1975-76
1976-77
1977-78
1978-79
1979-80
1980-81
1981-82
1982-83
1983-84
1984-85
1985-86
1986-87
1987-88
1988-89
1989-90
1990-91
1991-92
1992-93
1993-94
1994-95
1995-96
1996-97
1997-98
1998-99
1999-00
2000-01
2001-02
2002-03
2003-04
2004-05
2005-06
2006-07
2007-08
2008-09
2009-10
2010-11
2011-12
2012-13
2013-14
2014-15
2015-16
2016-17
2017-18
2018-19
2019-20
2020-21


Unnamed: 0,GAME_ID,SEASON_ID,TEAM_ID_HOME,TEAM_ABBREVIATION_HOME,TEAM_NAME_HOME,GAME_DATE_HOME,MATCHUP_HOME,WL_HOME,MIN_HOME,FGM_HOME,FGA_HOME,FG_PCT_HOME,FG3M_HOME,FG3A_HOME,FG3_PCT_HOME,FTM_HOME,FTA_HOME,FT_PCT_HOME,OREB_HOME,DREB_HOME,REB_HOME,AST_HOME,STL_HOME,BLK_HOME,TOV_HOME,PF_HOME,PTS_HOME,PLUS_MINUS_HOME,VIDEO_AVAILABLE_HOME,TEAM_ID_AWAY,TEAM_ABBREVIATION_AWAY,TEAM_NAME_AWAY,GAME_DATE_AWAY,MATCHUP_AWAY,WL_AWAY,MIN_AWAY,FGM_AWAY,FGA_AWAY,FG_PCT_AWAY,FG3M_AWAY,FG3A_AWAY,FG3_PCT_AWAY,FTM_AWAY,FTA_AWAY,FT_PCT_AWAY,OREB_AWAY,DREB_AWAY,REB_AWAY,AST_AWAY,STL_AWAY,BLK_AWAY,TOV_AWAY,PF_AWAY,PTS_AWAY,PLUS_MINUS_AWAY,VIDEO_AVAILABLE_AWAY
0,24600001,21946,1610610035,HUS,Toronto Huskies,1946-11-01,HUS vs. NYK,L,0,25.0,,,,,,16.0,29.0,0.552,,,,,,,,,66,-2,0,1610612752,NYK,New York Knicks,1946-11-01,NYK @ HUS,W,0,24.0,,,,,,20.0,26.0,0.769,,,,,,,,,68,2,0
0,24600003,21946,1610610034,BOM,St. Louis Bombers,1946-11-02,BOM vs. PIT,W,0,20.0,59.0,0.339,,,,16.0,,,,,,,,,,21.0,56,5,0,1610610031,PIT,Pittsburgh Ironmen,1946-11-02,PIT @ BOM,L,0,16.0,72.0,0.222,,,,19.0,,,,,,,,,,25.0,51,-5,0
0,24600004,21946,1610610025,CHS,Chicago Stags,1946-11-02,CHS vs. NYK,W,0,21.0,,,,,,21.0,,,,,,,,,,20.0,63,16,0,1610612752,NYK,New York Knicks,1946-11-02,NYK @ CHS,L,0,16.0,,,,,,15.0,,,,,,,,,,22.0,47,-16,0
0,24600002,21946,1610610032,PRO,Providence Steamrollers,1946-11-02,PRO vs. BOS,W,0,21.0,,,,,,17.0,,,,,,,,,,,59,6,0,1610612738,BOS,Boston Celtics,1946-11-02,BOS @ PRO,L,0,21.0,,,,,,11.0,,,,,,,,,,,53,-6,0
0,24600005,21946,1610610028,DEF,Detroit Falcons,1946-11-02,DEF vs. WAS,L,0,10.0,,,,,,13.0,,,,,,,,,,,33,-17,0,1610610036,WAS,Washington Capitols,1946-11-02,WAS @ DEF,W,0,18.0,,,,,,14.0,,,,,,,,,,,50,17,0
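
The season identifiers printed above follow a `YYYY-YY` pattern: the starting year, a hyphen, and the last two digits of the following year. A minimal sketch of that formatting as a standalone helper, assuming a hypothetical function name `season_id_for` (it is not part of this notebook or of `nba_api`):

```python
def season_id_for(start_year: int) -> str:
    """Format a season's starting year as 'YYYY-YY', e.g. 1999 -> '1999-00'."""
    # zero-pad the two-digit suffix so 1999 maps to '1999-00' rather than '1999-0'
    return f"{start_year}-{(start_year + 1) % 100:02d}"


# same range as the notebook: 1946-47 through 2020-21
season_ids = [season_id_for(year) for year in range(1946, 2021)]
```

The modulo form avoids slicing a string and makes the century rollover (`1999-00`) explicit.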


In [22]:
df

Unnamed: 0,GAME_ID,SEASON_ID,TEAM_ID_HOME,TEAM_ABBREVIATION_HOME,TEAM_NAME_HOME,GAME_DATE_HOME,MATCHUP_HOME,WL_HOME,MIN_HOME,FGM_HOME,FGA_HOME,FG_PCT_HOME,FG3M_HOME,FG3A_HOME,FG3_PCT_HOME,FTM_HOME,FTA_HOME,FT_PCT_HOME,OREB_HOME,DREB_HOME,REB_HOME,AST_HOME,STL_HOME,BLK_HOME,TOV_HOME,PF_HOME,PTS_HOME,PLUS_MINUS_HOME,VIDEO_AVAILABLE_HOME,TEAM_ID_AWAY,TEAM_ABBREVIATION_AWAY,TEAM_NAME_AWAY,GAME_DATE_AWAY,MATCHUP_AWAY,WL_AWAY,MIN_AWAY,FGM_AWAY,FGA_AWAY,FG_PCT_AWAY,FG3M_AWAY,FG3A_AWAY,FG3_PCT_AWAY,FTM_AWAY,FTA_AWAY,FT_PCT_AWAY,OREB_AWAY,DREB_AWAY,REB_AWAY,AST_AWAY,STL_AWAY,BLK_AWAY,TOV_AWAY,PF_AWAY,PTS_AWAY,PLUS_MINUS_AWAY,VIDEO_AVAILABLE_AWAY
0,0024600001,21946,1610610035,HUS,Toronto Huskies,1946-11-01,HUS vs. NYK,L,0,25.0,,,,,,16.0,29.0,0.552,,,,,,,,,66,-2,0,1610612752,NYK,New York Knicks,1946-11-01,NYK @ HUS,W,0,24.0,,,,,,20.0,26.0,0.769,,,,,,,,,68,2,0
0,0024600003,21946,1610610034,BOM,St. Louis Bombers,1946-11-02,BOM vs. PIT,W,0,20.0,59.0,0.339,,,,16.0,,,,,,,,,,21.0,56,5,0,1610610031,PIT,Pittsburgh Ironmen,1946-11-02,PIT @ BOM,L,0,16.0,72.0,0.222,,,,19.0,,,,,,,,,,25.0,51,-5,0
0,0024600004,21946,1610610025,CHS,Chicago Stags,1946-11-02,CHS vs. NYK,W,0,21.0,,,,,,21.0,,,,,,,,,,20.0,63,16,0,1610612752,NYK,New York Knicks,1946-11-02,NYK @ CHS,L,0,16.0,,,,,,15.0,,,,,,,,,,22.0,47,-16,0
0,0024600002,21946,1610610032,PRO,Providence Steamrollers,1946-11-02,PRO vs. BOS,W,0,21.0,,,,,,17.0,,,,,,,,,,,59,6,0,1610612738,BOS,Boston Celtics,1946-11-02,BOS @ PRO,L,0,21.0,,,,,,11.0,,,,,,,,,,,53,-6,0
0,0024600005,21946,1610610028,DEF,Detroit Falcons,1946-11-02,DEF vs. WAS,L,0,10.0,,,,,,13.0,,,,,,,,,,,33,-17,0,1610610036,WAS,Washington Capitols,1946-11-02,WAS @ DEF,W,0,18.0,,,,,,14.0,,,,,,,,,,,50,17,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
0,0022000672,22020,1610612745,HOU,Houston Rockets,2021-03-24,HOU vs. CHA,L,240,33.0,86,0.384,16,49,0.327,15.0,22.0,0.682,10,29,39,24,6,2,7,19.0,97,-25,1,1610612766,CHA,Charlotte Hornets,2021-03-24,CHA @ HOU,W,240,41.0,80,0.513,18,38,0.474,22.0,29.0,0.759,10,40,50,27,4,3,14,20.0,122,25,1
0,0022000673,22020,1610612750,MIN,Minnesota Timberwolves,2021-03-24,MIN vs. DAL,L,240,38.0,85,0.447,12,38,0.316,20.0,26.0,0.769,6,31,37,24,6,2,13,25.0,108,-20,1,1610612742,DAL,Dallas Mavericks,2021-03-24,DAL @ MIN,W,240,48.0,99,0.485,16,44,0.364,16.0,22.0,0.727,14,38,52,27,6,5,10,22.0,128,20,1
0,0022000675,22020,1610612759,SAS,San Antonio Spurs,2021-03-24,SAS vs. LAC,L,240,39.0,85,0.459,7,22,0.318,16.0,19.0,0.842,6,34,40,22,6,6,16,24.0,101,-33,1,1610612746,LAC,LA Clippers,2021-03-24,LAC @ SAS,W,240,49.0,88,0.557,17,33,0.515,19.0,30.0,0.633,5,39,44,23,10,3,8,20.0,134,33,1
0,0022000677,22020,1610612758,SAC,Sacramento Kings,2021-03-24,SAC vs. ATL,,120,24.0,51,0.471,7,19,0.368,10.0,11.0,0.909,7,16,23,11,4,4,8,8.0,65,4,0,1610612737,ATL,Atlanta Hawks,2021-03-24,ATL @ SAC,,120,24.0,51,0.471,2,12,0.167,11.0,13.0,0.846,8,20,28,11,2,1,8,10.0,61,-4,0


In [23]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 61951 entries, 0 to 0
Data columns (total 56 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   GAME_ID                 61951 non-null  object 
 1   SEASON_ID               61951 non-null  object 
 2   TEAM_ID_HOME            61951 non-null  object 
 3   TEAM_ABBREVIATION_HOME  61951 non-null  object 
 4   TEAM_NAME_HOME          61951 non-null  object 
 5   GAME_DATE_HOME          61951 non-null  object 
 6   MATCHUP_HOME            61951 non-null  object 
 7   WL_HOME                 61948 non-null  object 
 8   MIN_HOME                61951 non-null  int64  
 9   FGM_HOME                61933 non-null  float64
 10  FGA_HOME                44089 non-null  object 
 11  FG_PCT_HOME             44050 non-null  object 
 12  FG3M_HOME               46294 non-null  object 
 13  FG3A_HOME               41030 non-null  object 
 14  FG3_PCT_HOME            40644 non-null  ob

In [None]:
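try:
    # default if_exists='fail': raises ValueError when the 'Game' table already exists
    df.to_sql('Game', conn, index=False)
except ValueError as err:
    print(err)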
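
`DataFrame.to_sql` defaults to `if_exists='fail'`, so rerunning the cell above against an existing `Game` table raises a `ValueError` and writes nothing. A minimal sketch of the three `if_exists` modes, using an in-memory SQLite database and a small illustrative frame (the connection name `conn_demo` and the two rows are assumptions made for the example, not the notebook's data):

```python
import sqlite3

import pandas as pd

conn_demo = sqlite3.connect(":memory:")
games = pd.DataFrame({"GAME_ID": ["0024600001", "0024600003"], "PTS_HOME": [66, 56]})

# first write creates the table
games.to_sql("Game", conn_demo, index=False)

# second write with the default if_exists='fail' is rejected
try:
    games.to_sql("Game", conn_demo, index=False)
    second_write_failed = False
except ValueError:
    second_write_failed = True

# 'append' adds the rows to the existing table
games.to_sql("Game", conn_demo, index=False, if_exists="append")
n_after_append = int(pd.read_sql("SELECT COUNT(*) AS n FROM Game", conn_demo)["n"].iloc[0])

# 'replace' drops the table and writes it again from scratch
games.to_sql("Game", conn_demo, index=False, if_exists="replace")
n_after_replace = int(pd.read_sql("SELECT COUNT(*) AS n FROM Game", conn_demo)["n"].iloc[0])
```

Which mode fits depends on whether a rerun should extend the table or rebuild it; `append` can duplicate games if the same season is collected twice.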

#### Upload to Kaggle

In [25]:
!kaggle datasets version -p ../data -m "adding league game log to Game table"

Starting upload for file basketball.sqlite
100%|███████████████████████████████████████| 17.2M/17.2M [00:29<00:00, 614kB/s]
Upload successful: basketball.sqlite (17MB)
Dataset version creation error: You have exceeded the max category limit


### Box Score Summaries

In [None]:
def get_box_score_summaries(game_id, proxies):
    # define helpful variables
    no_res = True
    proxy_collection_counter = 0
    proxy_index = 0
    # while no response
    while no_res:
        # try getting a response without a proxy
        try:
            res = boxscoresummaryv2.BoxScoreSummaryV2(game_id, timeout=5).get_data_frames()
            no_res = False
            print(game_id)
            break
        except Exception:
            # if that fails
            while no_res:
                # try getting with a certain proxy
                try: 
                    res = boxscoresummaryv2.BoxScoreSummaryV2(game_id, proxy="http://" + proxies[proxy_index], timeout=5).get_data_frames()
                    no_res = False
                    break
                except Exception:
                    # if that fails, move on to next proxy unless out of proxies
                    if (proxy_index + 1) >= len(proxies):
                        # unless tried proxies 5 times
                        if proxy_collection_counter < 5:
                            # if out of proxies: get more proxies, fix counters, and try without a proxy again
                            proxy_index = 0
                            proxy_collection_counter = proxy_collection_counter + 1
                            print(game_id, ' failed {} times'.format(proxy_collection_counter))
                            proxies = [proxy.decode().strip() for proxy in urllib.request.urlopen("https://api.proxyscrape.com/v2/?request=getproxies&protocol=http&timeout=1500&country=all&ssl=yes&anonymity=all&simplified=true").readlines()]
                            break
                        else:
                            return None
                    else:
                        proxy_index = proxy_index + 1
                        
  
    game_summary = res[0]
    # cast the IDs to str so they match the str-typed TEAM_ID columns filtered below
    home_team_id = str(game_summary['HOME_TEAM_ID'].values[0])
    visitor_team_id = str(game_summary['VISITOR_TEAM_ID'].values[0])
    game_summary['GAME_STATUS_ID'] = game_summary['GAME_STATUS_ID'].astype(str)
    game_summary['HOME_TEAM_ID'] = game_summary['HOME_TEAM_ID'].astype(str)
    game_summary['VISITOR_TEAM_ID'] = game_summary['VISITOR_TEAM_ID'].astype(str)
    
    other_stats = res[1]
    other_stats['TEAM_ID'] = other_stats['TEAM_ID'].astype(str)
    other_stats_home = other_stats.loc[other_stats['TEAM_ID'] == home_team_id].drop(['LEAGUE_ID'], axis=1).reset_index(drop=True)
    other_stats_visitor = other_stats.loc[other_stats['TEAM_ID'] == visitor_team_id].drop(['LEAGUE_ID'], axis=1).reset_index(drop=True)
    try:
        league_id = other_stats['LEAGUE_ID'].values[0]
    except (KeyError, IndexError):
        # column missing or no rows returned for this game
        league_id = None

    other_stats_home.columns = [col + '_HOME' for col in other_stats_home.columns]
    other_stats_visitor.columns = [col + '_AWAY' for col in other_stats_visitor.columns]
    other_stats = pd.concat([other_stats_home, other_stats_visitor], axis = 1)
    other_stats['LEAGUE_ID'] = league_id
    
    df = pd.concat([game_summary, other_stats], axis=1)
    
    officials = res[2]
    officials["GAME_ID"] = game_id
    officials['OFFICIAL_ID'] = officials['OFFICIAL_ID'].astype(str)
    
    inactive_players = res[3]
    inactive_players["GAME_ID"] = game_id
    inactive_players["PLAYER_ID"] = inactive_players["PLAYER_ID"].astype(str)
    inactive_players["TEAM_ID"] = inactive_players["TEAM_ID"].astype(str)
    
    
    
    game_info = res[4]
    
    df = pd.concat([df, game_info], axis=1)
    
    line_score = res[5]
    line_score['TEAM_ID'] = line_score['TEAM_ID'].astype(str)
    line_score = line_score.drop(['GAME_DATE_EST', "GAME_SEQUENCE", "GAME_ID"], axis=1)
    line_score_home = line_score.loc[line_score['TEAM_ID'] == home_team_id].reset_index(drop=True)
    line_score_visitor = line_score.loc[line_score['TEAM_ID'] == visitor_team_id].reset_index(drop=True)
    line_score_home.columns = [col + '_HOME' for col in line_score.columns]
    line_score_visitor.columns = [col + '_AWAY' for col in line_score.columns]
    line_score = pd.concat([line_score_home, line_score_visitor], axis = 1)
    
    df = pd.concat([df, line_score], axis=1)
    
    
    last_meeting = res[6]
    last_meeting = last_meeting.drop(['GAME_ID'], axis=1)
    # 'LAST_GAME_VISITOR_TEAM_CITY1' lives in this result set and holds the visitor abbreviation (e.g. 'LAC')
    last_meeting = last_meeting.rename({'LAST_GAME_VISITOR_TEAM_CITY1': 'LAST_GAME_VISITOR_TEAM_ABBREVIATION'}, axis=1)
    last_meeting['LAST_GAME_HOME_TEAM_ID'] = last_meeting['LAST_GAME_HOME_TEAM_ID'].astype(str)
    last_meeting['LAST_GAME_VISITOR_TEAM_ID'] = last_meeting['LAST_GAME_VISITOR_TEAM_ID'].astype(str)
    
    df = pd.concat([df, last_meeting], axis=1)
    
    season_series = res[7]
    season_series = season_series.drop(['GAME_ID', 'HOME_TEAM_ID', 'VISITOR_TEAM_ID', 'GAME_DATE_EST'], axis=1)
    
    df = pd.concat([df, season_series], axis=1)
    
    available_video = res[8]
    available_video = available_video.drop(['GAME_ID'], axis=1)
    
    df = pd.concat([df, available_video], axis=1)
    
    to_return = {}
    to_return['df'] = df
    to_return['officials'] = officials
    to_return['inactive_players'] = inactive_players
    return to_return

# get a fresh list of HTTP proxies as "host:port" strings
proxies = [proxy.decode().strip() for proxy in urllib.request.urlopen("https://api.proxyscrape.com/v2/?request=getproxies&protocol=http&timeout=1500&country=all&ssl=yes&anonymity=all&simplified=true").readlines()]

# get the box score summary for each game in the db
game_ids = pd.read_sql('SELECT GAME_ID FROM Game', conn)['GAME_ID'].values
dfs = [get_box_score_summaries(game_id, proxies=proxies) for game_id in game_ids]
# drop games for which every attempt failed (the function returns None for those)
dfs = [d for d in dfs if d is not None]
df = pd.concat([d['df'] for d in dfs], axis=0)
display(df)
officials = pd.concat([d['officials'] for d in dfs], axis=0)
display(officials)
inactive_players = pd.concat([d['inactive_players'] for d in dfs], axis=0)
display(inactive_players)

0024600001
0024600003
0024600004
0024600002
0024600005
0024600006
0024600007
0024600008
0024600009
0024600010
0024600012
0024600011
0024600013
0024600015
0024600016
0024600017
0024600014
0024600018
0024600020
0024600019
0024600022
0024600021
0024600023
0024600024
0024600025
0024600030
0024600028
0024600029
0024600027
0024600026
0024600031
0024600032
0024600033
0024600035
0024600034
0024600036
0024600038
0024600037
0024600041
0024600039
0024600040
0024600042
0024600045
0024600046
0024600044
0024600043
0024600048
0024600047
0024600049
0024600050
0024600052
0024600051
0024600053
0024600056
0024600054
0024600055
0024600059
0024600060
0024600057
0024600058
0024600061
0024600063
0024600062
0024600065
0024600064
0024600066
0024600067
0024600068
0024600069
0024600071
0024600070
0024600072
0024600074
0024600076
0024600075
0024600073
0024600078
0024600080
0024600079
0024600077
0024600081
0024600083
0024600082
0024600084
0024600085
0024600086
0024600088
0024600087
0024600089
0024600091
0024600092

0024800221
0024800222
0024800223
0024800224
0024800225
0024800229
0024800226
0024800230
0024800228
0024800227
0024800231
0024800232
0024800235
0024800233
0024800234
0024800238
0024800237
0024800236
0024800240
0024800241
0024800239
0024800242
0024800243
0024800244
0024800247
0024800246
0024800245
0024800248
0024800250
0024800251
0024800249
0024800252
0024800253
0024800254
0024800256
0024800255
0024800259
0024800258
0024800257
0024800262
0024800260
0024800261
0024800264
0024800263
0024800269
0024800268
0024800266
0024800265
0024800267
0024800270
0024800271
0024800273
0024800272
0024800275
0024800274
0024800276
0024800277
0024800279
0024800280
0024800278
0024800281
0024800282
0024800285
0024800286
0024800284
0024800283
0024800287
0024800289
0024800288
0024800291
0024800290
0024800292
0024800294
0024800293
0024800295
0024800297
0024800298
0024800296
0024800299
0024800303
0024800304
0024800302
0024800301
0024800300
0024800305
0024800306
0024800307
0024800309
0024800308
0024800310
0024800311

0025000044
0025000041
0025000045
0025000047
0025000049
0025000048
0025000046
0025000054
0025000053
0025000050
0025000052
0025000051
0025000055
0025000057
0025000056
0025000058
0025000061
0025000059
0025000060
0025000063
0025000064
0025000065
0025000062
0025000066
0025000067
0025000068
0025000070
0025000069
0025000073
0025000072
0025000074
0025000071
0025000075
0025000076
0025000077
0025000079
0025000078
0025000081
0025000080
0025000084
0025000082
0025000083
0025000085
0025000088
0025000086
0025000087
0025000090
0025000091
0025000092
0025000093
0025000089
0025000094
0025000098
0025000096
0025000095
0025000097
0025000100
0025000101
0025000099
0025000102
0025000104
0025000103
0025000105
0025000108
0025000106
0025000110
0025000109
0025000107
0025000113
0025000111
0025000112
0025000114
0025000116
0025000117
0025000115
0025000118
0025000119
0025000120
0025000123
0025000122
0025000125
0025000121
0025000124
0025000128
0025000129
0025000126
0025000127
0025000131
0025000132
0025000130
0025000135

0025200106
0025200107
0025200108
0025200110
0025200109
0025200112
0025200114
0025200111
0025200113
0025200115
0025200118
0025200117
0025200116
0025200119
0025200120
0025200121
0025200124
0025200125
0025200123
0025200122
0025200127
0025200126
0025200128
0025200129
0025200130
0025200131
0025200133
0025200135
0025200134
0025200132
0025200136
0025200137
0025200141
0025200138
0025200140
0025200139
0025200145
0025200142
0025200143
0025200144
0025200147
0025200146
0025200148
0025200151
0025200149
0025200150
0025200152
0025200005
0025200153
0025200154
0025200157
0025200156
0025200162
0025200161
0025200158
0025200159
0025200160
0025200163
0025200165
0025200166
0025200164
0025200167
0025200169
0025200168
0025200171
0025200170
0025200173
0025200174
0025200172
0025200175
0025200176
0025200179
0025200180
0025200178
0025200177
0025200181
0025200183
0025200185
0025200184
0025200182
0025200186
0025200187
0025200188
0025200189
0025200191
0025200190
0025200193
0025200192
0025200194
0025200197
0025200199

0025400173
0025400175
0025400172
0025400174
0025400176
0025400178
0025400179
0025400177
0025400182
0025400180
0025400181
0025400183
0025400184
0025400185
0025400186
0025400187
0025400190
0025400189
0025400188
0025400192
0025400191
0025400193
0025400195
0025400196
0025400194
0025400197
0025400198
0025400199
0025400200
0025400201
0025400203
0025400202
0025400204
0025400205
0025400206
0025400207
0025400208
0025400209
0025400210
0025400212
0025400213
0025400211
0025400214
0025400215
0025400216
0025400217
0025400218
0025400220
0025400219
0025400221
0025400222
0025400223
0025400225
0025400226
0025400224
0025400227
0025400229
0025400228
0025400230
0025400231
0025400232
0025400234
0025400233
0025400235
0025400236
0025400237
0025400238
0025400241
0025400240
0025400239
0025400242
0025400243
0025400244
0025400246
0025400245
0025400249
0025400247
0025400248
0025400253
0025400250
0025400252
0025400251
0025400254
0025400255
0025400256
0025400258
0025400257
0025400260
0025400259
0025400261
0025400262

0025700051
0025700049
0025700050
0025700052
0025700053
0025700054
0025700056
0025700055
0025700058
0025700057
0025700061
0025700059
0025700060
0025700062
0025700064
0025700063
0025700066
0025700065
0025700068
0025700067
0025700069
0025700072
0025700073
0025700070
0025700071
0025700074
0025700075
0025700076
0025700077
0025700079
0025700078
0025700081
0025700080
0025700082
0025700084
0025700083
0025700086
0025700087
0025700085
0025700088
0025700090
0025700091
0025700089
0025700092
0025700093
0025700096
0025700095
0025700094
0025700097
0025700099
0025700098
0025700100
0025700101
0025700102
0025700103
0025700105
0025700106
0025700104
0025700108
0025700110
0025700109
0025700107
0025700112
0025700113
0025700111
0025700114
0025700116
0025700115
0025700117
0025700118
0025700119
0025700123
0025700120
0025700121
0025700122
0025700125
0025700124
0025700126
0025700127
0025700128
0025700129
0025700130
0025700131
0025700132
0025700134
0025700133
0025700136
0025700135
0025700137
0025700138
0025700139

0025900216
0025900217
0025900219
0025900218
0025900220
0025900221
0025900224
0025900222
0025900223
0025900228
0025900225
0025900226
0025900227
0025900230
0025900229
0025900232
0025900231
0025900233
0025900235
0025900234
0025900236
0025900237
0025900239
0025900238
0025900240
0025900241
0025900242
0025900243
0025900244
0025900246
0025900245
0025900248
0025900247
0025900250
0025900249
0025900252
0025900251
0025900255
0025900253
0025900254
0025900256
0025900257
0025900260
0025900258
0025900259
0025900261
0025900263
0025900264
0025900262
0025900267
0025900266
0025900265
0025900269
0025900271
0025900268
0025900270
0025900272
0025900275
0025900274
0025900273
0025900276
0025900277
0025900278
0025900279
0025900280
0025900281
0025900282
0025900283
0025900284
0025900287
0025900285
0025900286
0025900288
0025900289
0025900290
0025900292
0025900291
0025900294
0025900293
0025900295
0025900296
0025900297
0025900298
0025900299
0025900300
0026000001
0026000002
0026000005
0026000003
0026000006
0026000004

0026100345
0026100344
0026100348
0026100349
0026100347
0026100352
0026100350
0026100351
0026100354
0026100353
0026100356
0026100355
0026100357
0026100360
0026100361
0026100358
0026100359
0026200001
0026200002
0026200003
0026200004
0026200007
0026200005
0026200006
0026200008
0026200010
0026200011
0026200009
0026200012
0026200013
0026200016
0026200017
0026200014
0026200015
0026200021
0026200019
0026200020
0026200018
0026200022
0026200024
0026200023
0026200025
0026200027
0026200026
0026200028
0026200029
0026200030
0026200033
0026200031
0026200032
0026200034
0026200035
0026200036
0026200037
0026200038
0026200041
0026200040
0026200039
0026200042
0026200045
0026200044
0026200043
0026200048
0026200047
0026200046
0026200049
0026200053
0026200050
0026200052
0026200051
0026200055
0026200054
0026200056
0026200057
0026203001
0026200058
0026200059
0026200063
0026200062
0026200060
0026200061
0026200065
0026200066
0026200064
0026200067
0026200068
0026200072
0026200071
0026200069
0026200070
0026200074

0026400010
0026400009
0026400012
0026400011
0026400013
0026400016
0026400015
0026400014
0026400019
0026400018
0026400017
0026400020
0026400021
0026400022
0026400024
0026400023
0026400025
0026400027
0026400028
0026400026
0026400030
0026400032
0026400031
0026400029
0026400033
0026400034
0026400036
0026400035
0026400037
0026400039
0026400038
0026400042
0026400043
0026400040
0026400041
0026400044
0026400045
0026400046
0026400047
0026400048
0026400049
0026400051
0026400050
0026400054
0026400053
0026400052
0026400055
0026400056
0026400058
0026400057
0026400060
0026400059
0026400062
0026400061
0026400063
0026400064
0026400065
0026400067
0026400001
0026400066
0026400069
0026400068
0026400071
0026400070
0026400075
0026400074
0026400072
0026400073
0026400076
0026400077
0026400078
0026400080
0026400082
0026400079
0026400081
0026400084
0026400083
0026400086
0026400085
0026400088
0026400087
0026400090
0026400089
0026400091
0026400093
0026400092
0026400095
0026400097
0026400094
0026400096
0026400098

0026600033
0026600030
0026600034
0026600035
0026600037
0026600038
0026600036
0026600041
0026600040
0026600039
0026600044
0026600043
0026600042
0026600045
0026600048
0026600046
0026600047
0026600049
0026600050
0026600054
0026600052
0026600051
0026600053
0026600055
0026600056
0026600057
0026600058
0026600060
0026600062
0026600061
0026600059
0026600066
0026600065
0026600063
0026600064
0026600068
0026600067
0026600069
0026600071
0026600070
0026600074
0026600073
0026600072
0026600075
0026600076
0026600077
0026600078
0026600079
0026600082
0026600084
0026600083
0026600080
0026600081
0026600085
0026600086
0026600088
0026600087
0026600089
0026600092
0026600093
0026600090
0026600091
0026600095
0026600094
0026600096
0026600099
0026600097
0026600098
0026600101
0026600104
0026600100
0026600103
0026600102
0026600108
0026600107
0026600106
0026600105
0026600109
0026600111
0026600110
0026600114
0026600113
0026600112
0026600117
0026600118
0026600119
0026600115
0026600116
0026600120
0026600123
0026600122
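
The retry logic inside `get_box_score_summaries` (try a direct request, fall back to each proxy in turn, refresh the proxy list when it is exhausted, give up after a few rounds) can be separated from the endpoint call itself. The following is a simplified sketch of that pattern, not the notebook's exact control flow: the names `parse_proxy_lines`, `fetch_with_proxies`, and `fake_fetch` are hypothetical, and the fetcher is simulated so the example runs without network access.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def parse_proxy_lines(lines: Iterable[bytes]) -> List[str]:
    """Decode raw response lines such as b'1.2.3.4:8080\\r\\n' into 'host:port' strings."""
    return [line.decode().strip() for line in lines if line.strip()]


def fetch_with_proxies(
    fetch: Callable[[Optional[str]], T],
    get_proxies: Callable[[], List[str]],
    max_rounds: int = 5,
) -> Optional[T]:
    """Try a direct request, then each proxy; reload the proxy list every round.

    `fetch(None)` means "no proxy"; `fetch('http://host:port')` means "use this proxy".
    Returns None if every attempt in every round fails.
    """
    for _ in range(max_rounds):
        candidates: List[Optional[str]] = [None] + ["http://" + p for p in get_proxies()]
        for proxy in candidates:
            try:
                return fetch(proxy)
            except Exception:
                continue  # this candidate failed; move on to the next one
    return None


# usage with a simulated fetcher that only succeeds through one particular proxy
def fake_fetch(proxy: Optional[str]) -> str:
    if proxy != "http://10.0.0.2:8080":
        raise TimeoutError("simulated timeout")
    return "ok via " + proxy


raw = [b"10.0.0.1:8080\r\n", b"10.0.0.2:8080\r\n", b"\r\n"]
result = fetch_with_proxies(fake_fetch, lambda: parse_proxy_lines(raw))
```

Passing the fetcher in as a callable keeps the rotation logic testable and reusable across endpoints; in the notebook's setting the callable would wrap the `boxscoresummaryv2.BoxScoreSummaryV2(...)` call.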

In [26]:
df.info(max_cols=150)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 0
Data columns (total 105 columns):
 #    Column                            Non-Null Count  Dtype  
---   ------                            --------------  -----  
 0    GAME_DATE_EST                     2 non-null      object 
 1    GAME_SEQUENCE                     2 non-null      int64  
 2    GAME_ID                           2 non-null      object 
 3    GAME_STATUS_ID                    2 non-null      object 
 4    GAME_STATUS_TEXT                  2 non-null      object 
 5    GAMECODE                          2 non-null      object 
 6    HOME_TEAM_ID                      2 non-null      object 
 7    VISITOR_TEAM_ID                   2 non-null      object 
 8    SEASON                            2 non-null      object 
 9    LIVE_PERIOD                       2 non-null      int64  
 10   LIVE_PC_TIME                      2 non-null      object 
 11   NATL_TV_BROADCASTER_ABBREVIATION  0 non-null      object 
 1

In [27]:
officials.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6 entries, 0 to 2
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   OFFICIAL_ID  6 non-null      object
 1   FIRST_NAME   6 non-null      object
 2   LAST_NAME    6 non-null      object
 3   JERSEY_NUM   6 non-null      object
 4   GAME_ID      6 non-null      object
dtypes: object(5)
memory usage: 288.0+ bytes


In [28]:
inactive_players.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9 entries, 0 to 4
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   PLAYER_ID          9 non-null      object
 1   FIRST_NAME         9 non-null      object
 2   LAST_NAME          9 non-null      object
 3   JERSEY_NUM         9 non-null      object
 4   TEAM_ID            9 non-null      object
 5   TEAM_CITY          9 non-null      object
 6   TEAM_NAME          9 non-null      object
 7   TEAM_ABBREVIATION  9 non-null      object
 8   GAME_ID            9 non-null      object
dtypes: object(9)
memory usage: 720.0+ bytes
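
`get_box_score_summaries` turns per-team result sets (one row per team) into a single wide row by selecting the home and visitor rows and suffixing their columns with `_HOME` and `_AWAY`. A minimal sketch of that reshaping, assuming a hypothetical helper name `to_home_away_row` and a three-column frame whose values are taken from the 2021-03-24 LAC @ SAS sample output in this notebook. The team IDs are cast to `str` on both sides of the comparison, because filtering a `str`-typed `TEAM_ID` column with an integer ID selects no rows.

```python
import pandas as pd


def to_home_away_row(per_team: pd.DataFrame, home_id, visitor_id) -> pd.DataFrame:
    """Collapse a one-row-per-team frame into one row with _HOME/_AWAY columns."""
    per_team = per_team.copy()
    per_team["TEAM_ID"] = per_team["TEAM_ID"].astype(str)
    # compare str against str so the filters actually match
    home = per_team.loc[per_team["TEAM_ID"] == str(home_id)].reset_index(drop=True)
    away = per_team.loc[per_team["TEAM_ID"] == str(visitor_id)].reset_index(drop=True)
    return pd.concat([home.add_suffix("_HOME"), away.add_suffix("_AWAY")], axis=1)


line_score = pd.DataFrame({
    "TEAM_ID": [1610612746, 1610612759],
    "TEAM_ABBREVIATION": ["LAC", "SAS"],
    "PTS": [134, 101],
})
wide = to_home_away_row(line_score, home_id=1610612759, visitor_id=1610612746)
```

Resetting the index on both halves before the column-wise `concat` is what lines the home and away values up on a single row.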


In [None]:
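try:
    # default if_exists='fail': raises ValueError when the 'Game' table already exists
    df.to_sql('Game', conn, index=False)
except ValueError as err:
    print(err)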

#### Upload to Kaggle

In [25]:
!kaggle datasets version -p ../data -m "adding league game log to Game table"

Starting upload for file basketball.sqlite
100%|███████████████████████████████████████| 17.2M/17.2M [00:29<00:00, 614kB/s]
Upload successful: basketball.sqlite (17MB)
Dataset version creation error: You have exceeded the max category limit


In [30]:
dfs[1]

Unnamed: 0,LEAGUE_ID,TEAM_ID,TEAM_ABBREVIATION,TEAM_CITY,PTS_PAINT,PTS_2ND_CHANCE,PTS_FB,LARGEST_LEAD,LEAD_CHANGES,TIMES_TIED,TEAM_TURNOVERS,TOTAL_TURNOVERS,TEAM_REBOUNDS,PTS_OFF_TO
0,0,1610612746,LAC,LA,42,8,10,35,0,0,0,8,8,7
1,0,1610612759,SAS,San Antonio,58,6,12,0,0,0,0,16,7,28


In [12]:
dfs[1].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   LEAGUE_ID          2 non-null      object
 1   TEAM_ID            2 non-null      int64 
 2   TEAM_ABBREVIATION  2 non-null      object
 3   TEAM_CITY          2 non-null      object
 4   PTS_PAINT          2 non-null      int64 
 5   PTS_2ND_CHANCE     2 non-null      int64 
 6   PTS_FB             2 non-null      int64 
 7   LARGEST_LEAD       2 non-null      int64 
 8   LEAD_CHANGES       2 non-null      int64 
 9   TIMES_TIED         2 non-null      int64 
 10  TEAM_TURNOVERS     2 non-null      int64 
 11  TOTAL_TURNOVERS    2 non-null      int64 
 12  TEAM_REBOUNDS      2 non-null      int64 
 13  PTS_OFF_TO         2 non-null      int64 
dtypes: int64(11), object(3)
memory usage: 352.0+ bytes


In [31]:
dfs[2]

Unnamed: 0,OFFICIAL_ID,FIRST_NAME,LAST_NAME,JERSEY_NUM
0,2715,Eric,Lewis,42
1,201245,Marat,Kogut,32
2,202901,Matt,Myers,43


In [13]:
dfs[2].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   OFFICIAL_ID  3 non-null      int64 
 1   FIRST_NAME   3 non-null      object
 2   LAST_NAME    3 non-null      object
 3   JERSEY_NUM   3 non-null      object
dtypes: int64(1), object(3)
memory usage: 224.0+ bytes


In [32]:
dfs[3]

Unnamed: 0,PLAYER_ID,FIRST_NAME,LAST_NAME,JERSEY_NUM,TEAM_ID,TEAM_CITY,TEAM_NAME,TEAM_ABBREVIATION
0,201586,Serge,Ibaka,9,1610612746,LA,Clippers,LAC
1,201976,Patrick,Beverley,21,1610612746,LA,Clippers,LAC
2,1630206,Jay,Scrubb,4,1610612746,LA,Clippers,LAC
3,1628966,Keita,Bates-Diop,31,1610612759,San Antonio,Spurs,SAS
4,200746,LaMarcus,Aldridge,12,1610612759,San Antonio,Spurs,SAS


In [14]:
dfs[3].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   PLAYER_ID          5 non-null      int64 
 1   FIRST_NAME         5 non-null      object
 2   LAST_NAME          5 non-null      object
 3   JERSEY_NUM         5 non-null      object
 4   TEAM_ID            5 non-null      int64 
 5   TEAM_CITY          5 non-null      object
 6   TEAM_NAME          5 non-null      object
 7   TEAM_ABBREVIATION  5 non-null      object
dtypes: int64(2), object(6)
memory usage: 448.0+ bytes


In [35]:
dfs[4]

Unnamed: 0,GAME_DATE,ATTENDANCE,GAME_TIME
0,"WEDNESDAY, MARCH 24, 2021",3224,2:08


In [15]:
dfs[4].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   GAME_DATE   1 non-null      object
 1   ATTENDANCE  1 non-null      int64 
 2   GAME_TIME   1 non-null      object
dtypes: int64(1), object(2)
memory usage: 152.0+ bytes


In [36]:
dfs[5]

Unnamed: 0,GAME_DATE_EST,GAME_SEQUENCE,GAME_ID,TEAM_ID,TEAM_ABBREVIATION,TEAM_CITY_NAME,TEAM_NICKNAME,TEAM_WINS_LOSSES,PTS_QTR1,PTS_QTR2,PTS_QTR3,PTS_QTR4,PTS_OT1,PTS_OT2,PTS_OT3,PTS_OT4,PTS_OT5,PTS_OT6,PTS_OT7,PTS_OT8,PTS_OT9,PTS_OT10,PTS
0,2021-03-24T00:00:00,9,22000675,1610612746,LAC,LA,Clippers,29-16,41,26,35,32,0,0,0,0,0,0,0,0,0,0,134
1,2021-03-24T00:00:00,9,22000675,1610612759,SAS,San Antonio,Spurs,22-19,29,24,31,17,0,0,0,0,0,0,0,0,0,0,101


In [16]:
dfs[5].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 23 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   GAME_DATE_EST      2 non-null      object
 1   GAME_SEQUENCE      2 non-null      int64 
 2   GAME_ID            2 non-null      object
 3   TEAM_ID            2 non-null      int64 
 4   TEAM_ABBREVIATION  2 non-null      object
 5   TEAM_CITY_NAME     2 non-null      object
 6   TEAM_NICKNAME      2 non-null      object
 7   TEAM_WINS_LOSSES   2 non-null      object
 8   PTS_QTR1           2 non-null      int64 
 9   PTS_QTR2           2 non-null      int64 
 10  PTS_QTR3           2 non-null      int64 
 11  PTS_QTR4           2 non-null      int64 
 12  PTS_OT1            2 non-null      int64 
 13  PTS_OT2            2 non-null      int64 
 14  PTS_OT3            2 non-null      int64 
 15  PTS_OT4            2 non-null      int64 
 16  PTS_OT5            2 non-null      int64 
 17  P

In [37]:
dfs[6]

Unnamed: 0,GAME_ID,LAST_GAME_ID,LAST_GAME_DATE_EST,LAST_GAME_HOME_TEAM_ID,LAST_GAME_HOME_TEAM_CITY,LAST_GAME_HOME_TEAM_NAME,LAST_GAME_HOME_TEAM_ABBREVIATION,LAST_GAME_HOME_TEAM_POINTS,LAST_GAME_VISITOR_TEAM_ID,LAST_GAME_VISITOR_TEAM_CITY,LAST_GAME_VISITOR_TEAM_NAME,LAST_GAME_VISITOR_TEAM_CITY1,LAST_GAME_VISITOR_TEAM_POINTS
0,22000675,22000105,2021-01-05T00:00:00,1610612759,San Antonio,Spurs,SAS,116,1610612746,LA,Clippers,LAC,113


In [17]:
dfs[6].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 13 columns):
 #   Column                            Non-Null Count  Dtype 
---  ------                            --------------  ----- 
 0   GAME_ID                           1 non-null      object
 1   LAST_GAME_ID                      1 non-null      object
 2   LAST_GAME_DATE_EST                1 non-null      object
 3   LAST_GAME_HOME_TEAM_ID            1 non-null      int64 
 4   LAST_GAME_HOME_TEAM_CITY          1 non-null      object
 5   LAST_GAME_HOME_TEAM_NAME          1 non-null      object
 6   LAST_GAME_HOME_TEAM_ABBREVIATION  1 non-null      object
 7   LAST_GAME_HOME_TEAM_POINTS        1 non-null      int64 
 8   LAST_GAME_VISITOR_TEAM_ID         1 non-null      int64 
 9   LAST_GAME_VISITOR_TEAM_CITY       1 non-null      object
 10  LAST_GAME_VISITOR_TEAM_NAME       1 non-null      object
 11  LAST_GAME_VISITOR_TEAM_CITY1      1 non-null      object
 12  LAST_GAME_VISITOR_TEAM_POI

In [38]:
dfs[7]

Unnamed: 0,GAME_ID,HOME_TEAM_ID,VISITOR_TEAM_ID,GAME_DATE_EST,HOME_TEAM_WINS,HOME_TEAM_LOSSES,SERIES_LEADER
0,22000675,1610612759,1610612746,2021-03-24T00:00:00,1,1,Tied


In [19]:
dfs[7].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 7 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   GAME_ID           1 non-null      object
 1   HOME_TEAM_ID      1 non-null      int64 
 2   VISITOR_TEAM_ID   1 non-null      int64 
 3   GAME_DATE_EST     1 non-null      object
 4   HOME_TEAM_WINS    1 non-null      int64 
 5   HOME_TEAM_LOSSES  1 non-null      int64 
 6   SERIES_LEADER     1 non-null      object
dtypes: int64(4), object(3)
memory usage: 184.0+ bytes


In [39]:
dfs[8]

Unnamed: 0,GAME_ID,VIDEO_AVAILABLE_FLAG,PT_AVAILABLE,PT_XYZ_AVAILABLE,WH_STATUS,HUSTLE_STATUS,HISTORICAL_STATUS
0,22000675,1,0,0,1,1,0


In [18]:
dfs[8].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 7 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   GAME_ID               1 non-null      object
 1   VIDEO_AVAILABLE_FLAG  1 non-null      int64 
 2   PT_AVAILABLE          1 non-null      int64 
 3   PT_XYZ_AVAILABLE      1 non-null      int64 
 4   WH_STATUS             1 non-null      int64 
 5   HUSTLE_STATUS         1 non-null      int64 
 6   HISTORICAL_STATUS     1 non-null      int64 
dtypes: int64(6), object(1)
memory usage: 184.0+ bytes


In [46]:
boxscoresummaryv2.BoxScoreSummaryV2('0022000675', timeout=5).game_summary.data

{'headers': ['GAME_DATE_EST',
  'GAME_SEQUENCE',
  'GAME_ID',
  'GAME_STATUS_ID',
  'GAME_STATUS_TEXT',
  'GAMECODE',
  'HOME_TEAM_ID',
  'VISITOR_TEAM_ID',
  'SEASON',
  'LIVE_PERIOD',
  'LIVE_PC_TIME',
  'NATL_TV_BROADCASTER_ABBREVIATION',
  'LIVE_PERIOD_TIME_BCAST',
  'WH_STATUS'],
 'data': [['2021-03-24T00:00:00',
   9,
   '0022000675',
   3,
   'Final',
   '20210324/LACSAS',
   1610612759,
   1610612746,
   '2020',
   4,
   '     ',
   None,
   'Q4       - ',
   1]]}
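
The raw result set above is a dictionary with a `headers` list and a `data` list of rows in the same column order. A minimal sketch of pairing the two into one dictionary per row, assuming a hypothetical helper name `result_set_to_records` and a trimmed copy of the output shown above (only four of the fourteen columns are kept for brevity):

```python
from typing import Any, Dict, List


def result_set_to_records(result_set: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Zip each data row with the headers to get one dict per row."""
    headers = result_set["headers"]
    return [dict(zip(headers, row)) for row in result_set["data"]]


# trimmed copy of the game summary result set printed above
game_summary = {
    "headers": ["GAME_ID", "GAME_STATUS_TEXT", "HOME_TEAM_ID", "VISITOR_TEAM_ID"],
    "data": [["0022000675", "Final", 1610612759, 1610612746]],
}
records = result_set_to_records(game_summary)
```

Note that the raw payload keeps `GAME_ID` as a zero-padded string while the team IDs are integers, which is why the collection code casts the IDs to `str` before storing or comparing them.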