Here we extract team statistics for every available basketball season from the [NBA Site](https://www.nba.com/stats/teams/traditional?Outcome=&SeasonType=Regular+Season&Season=2023-24).

1. To extract the dataset from the site, we use a web scraper; we chose Selenium. <br/>
   Read the docs to learn more about [Selenium](https://selenium-python.readthedocs.io/installation.html)


In [257]:
# Import all necessary packages
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

import pandas as pd
from openpyxl import Workbook, load_workbook
from buckets import dimes

In [248]:
# Webdriver: Chrome | Site: NBA site
driver = webdriver.Chrome()
url = "https://www.nba.com/stats/teams/traditional/"
driver.get(url)
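
The link at the top of this notebook carries the season in its query string, so a season-specific URL can in principle be built directly. The sketch below only reproduces that query format with the standard library; `season_url` and `BASE_URL` are hypothetical names, and whether the site honours the `Season` parameter for every season is an assumption that has not been checked here.

```python
from urllib.parse import urlencode

# Base path taken from the link at the top of the notebook.
BASE_URL = "https://www.nba.com/stats/teams/traditional"


def season_url(season: str, season_type: str = "Regular Season") -> str:
    """Build a stats URL for one season (hypothetical helper, not part of the notebook)."""
    # urlencode turns the space in "Regular Season" into "+", matching the original link.
    query = urlencode({"Outcome": "", "SeasonType": season_type, "Season": season})
    return f"{BASE_URL}?{query}"


print(season_url("2023-24"))
```

The notebook itself drives the season dropdown instead, which does not rely on the URL format staying stable.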

In [249]:
# Create an Excel workbook to receive the scraped data
wb = Workbook()
wb.save('nba_stats.xlsx')
ws = wb.active
dfs = []

In [250]:
# Capture every season listed in the page's season dropdown
season_drpdwn = Select(driver.find_element(By.CLASS_NAME,"DropDown_select__4pIg9"))
seasons_arr = [sn.text for sn in season_drpdwn.options]
xpath_table = dimes.get('XPATH')

In [251]:
# Fetch each season's table and export it to its own Excel sheet
from io import StringIO

for sn in seasons_arr:
    season_drpdwn.select_by_visible_text(sn)
    # Wait up to 40 s for the stats table to become visible, then grab its HTML
    table = WebDriverWait(driver, 40).until(
        EC.visibility_of_element_located((By.XPATH, xpath_table))
    ).get_attribute("outerHTML")
    # Wrap in StringIO: pandas 2.1+ deprecates passing a literal HTML string
    df = pd.read_html(StringIO(table))[0]
    df.dropna(how='all', axis=1, inplace=True)
    df = df.rename(columns={df.columns[0]: 'Rank'})
    df.insert(2, "Season", sn)
    dfs.append(df)
    with pd.ExcelWriter(path='nba_stats.xlsx', engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:
        df.to_excel(writer, sheet_name=sn, index=False, header=True)
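
The loop uses each season label directly as a worksheet name. Excel limits sheet names to 31 characters and rejects the characters `[ ] : * ? / \`, so a label such as `2023-24` is fine, but a differently formatted label might not be. A minimal sketch of a guard, assuming a hypothetical helper named `safe_sheet_name` that is not part of the notebook:

```python
import re

# Characters Excel does not allow in worksheet names.
_INVALID_SHEET_CHARS = re.compile(r"[\[\]:*?/\\]")
# Excel's maximum worksheet-name length.
MAX_SHEET_NAME_LEN = 31


def safe_sheet_name(label: str) -> str:
    """Return a worksheet-safe version of a season label (hypothetical helper)."""
    # Replace each forbidden character with a dash, then trim surrounding whitespace.
    cleaned = _INVALID_SHEET_CHARS.sub("-", label).strip()
    # Truncate to the length limit; fall back to a generic name if nothing is left.
    return cleaned[:MAX_SHEET_NAME_LEN] or "Sheet"


print(safe_sheet_name("2023-24"))
print(safe_sheet_name("2023/24"))
```

It could be applied as `sheet_name=safe_sheet_name(sn)` when writing each sheet.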



In [256]:
# Combine all seasons' DataFrames and sort by win rate
full_df = pd.concat(dfs, ignore_index=True)
full_df = full_df.sort_values(by="WIN%", ascending=False).reset_index(drop=True)
# Re-rank across all seasons (1 = highest win rate)
full_df["Rank"] = full_df.index + 1
with pd.ExcelWriter(path='nba_stats.xlsx', engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:
    full_df.to_excel(writer, sheet_name='Sheet', index=False, header=True)
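
Because the combined table is ranked by `WIN%`, it can help to cross-check that column against `W` and `L`. The sketch below assumes `WIN%` is wins divided by games played, rounded to three decimals (the site's exact rounding is not confirmed here). `win_rate` is a hypothetical helper, and the inputs are illustrative values, not rows read from the dataset.

```python
def win_rate(wins: int, losses: int) -> float:
    """Wins divided by games played, rounded to three decimals (assumed convention)."""
    games = wins + losses
    if games == 0:
        raise ValueError("win rate is undefined for zero games")
    return round(wins / games, 3)


# Illustrative inputs only.
print(win_rate(73, 9))   # 73 / 82 rounds to 0.89
print(win_rate(41, 41))  # an even record gives 0.5
```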

In [253]:
# Close the workbook and shut down the browser session
wb.close()
driver.quit()  # quit() ends the whole WebDriver session; close() only closes the current window

In [260]:
full_df.describe()

| | Rank | GP | W | L | WIN% | Min | PTS | FGM | FGA | FG% | ... | DREB | REB | AST | TOV | STL | BLK | BLKA | PF | PFD | +/- |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 832.0 | 832.0 | 832.0 | 832.0 | 832.0 | 832.0 | 832.0 | 832.0 | 832.0 | 832.0 | ... | 832.0 | 832.0 | 832.0 | 832.0 | 832.0 | 832.0 | 832.0 | 832.0 | 832.0 | 832.0 |
| mean | 416.5 | 79.153846 | 39.576923 | 39.576923 | 0.499837 | 48.347356 | 101.415745 | 37.764784 | 83.027885 | 45.467668 | ... | 31.44363 | 42.635817 | 22.435216 | 14.517067 | 7.664663 | 4.913221 | 4.913942 | 20.987139 | 14.118029 | -0.005048 |
| std | 240.322006 | 7.041249 | 12.644595 | 12.584209 | 0.152629 | 0.178016 | 7.75069 | 2.673842 | 4.263605 | 1.72171 | ... | 2.343331 | 2.09198 | 2.37143 | 1.217105 | 0.887214 | 0.816684 | 0.706424 | 1.80299 | 9.427177 | 4.6213 |
| min | 1.0 | 50.0 | 7.0 | 9.0 | 0.106 | 48.0 | 81.9 | 30.8 | 71.2 | 40.1 | ... | 24.9 | 35.8 | 15.6 | 11.1 | 5.5 | 2.4 | 3.0 | 15.8 | 0.0 | -13.9 |
| 25% | 208.75 | 82.0 | 30.0 | 30.0 | 0.39 | 48.2 | 95.675 | 35.9 | 79.8 | 44.3 | ... | 29.7 | 41.2 | 20.8 | 13.7 | 7.1 | 4.3 | 4.4 | 19.7 | 0.2 | -3.1 |
| 50% | 416.5 | 82.0 | 41.0 | 39.0 | 0.512 | 48.4 | 99.9 | 37.4 | 82.6 | 45.4 | ... | 31.25 | 42.6 | 22.1 | 14.5 | 7.6 | 4.9 | 4.9 | 20.9 | 19.5 | 0.3 |
| 75% | 624.25 | 82.0 | 49.0 | 49.0 | 0.61 | 48.5 | 106.6 | 39.5 | 86.2 | 46.6 | ... | 33.1 | 44.0 | 23.9 | 15.225 | 8.2 | 5.4 | 5.4 | 22.2 | 20.8 | 3.3 |
| max | 832.0 | 82.0 | 73.0 | 72.0 | 0.89 | 49.0 | 123.3 | 46.9 | 94.4 | 50.7 | ... | 42.2 | 51.7 | 30.7 | 19.0 | 12.0 | 8.2 | 6.9 | 27.1 | 25.7 | 11.6 |
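
One thing to keep in mind when reading this summary: `Rank` was reassigned as 1 to N after sorting, so its statistics describe only the row count, not team performance. The standard-library sketch below rebuilds that column from the reported count of 832 and recomputes its mean, sample standard deviation, and quartiles. Variable names are illustrative, and the `"inclusive"` quantile method is used on the assumption that it matches the linear interpolation pandas applies by default.

```python
import statistics

# Rank runs from 1 to N after re-ranking, so its summary depends only on N.
n_rows = 832  # row count reported in the summary
ranks = list(range(1, n_rows + 1))

mean_rank = statistics.mean(ranks)
# Sample standard deviation (n - 1 in the denominator), as in describe().
std_rank = statistics.stdev(ranks)
# "inclusive" interpolates linearly between the two neighbouring data points.
q1, q2, q3 = statistics.quantiles(ranks, n=4, method="inclusive")

print(mean_rank, round(std_rank, 6), q1, q2, q3)
```

The recomputed values agree with the `Rank` column of the summary, which is a quick way to confirm that the re-ranking step ran over all rows.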


2. Now that we have extracted our data into a df we're going to 
