# Dataset

This dataset contains 12,844 records of NBA player statistics spanning multiple seasons. It includes demographic details such as age, height, weight, college, and country of origin, as well as draft information and seasonal performance metrics like points, rebounds, assists, and advanced analytics. The data was collected using the NBA Stats API and supplemented with missing values from Basketball Reference. It can be used to analyze player career trajectories, historical trends in playing styles, and the impact of international talent on the league.


In [None]:
# Import pandas
import pandas as pd

In [None]:
# Import the All Seasons dataset (CSV file) directly from its URL
all_seasons_df = pd.read_csv('https://raw.githubusercontent.com/RyanHinshaw/csc442_group_project/refs/heads/main/all_seasons.csv')

# Print the first five rows of the DataFrame
print(all_seasons_df.head())

   Unnamed: 0    player_name team_abbreviation  age  player_height  \
0       10227   James Harden               HOU   29         195.58   
1        4163    Kobe Bryant               LAL   27         198.12   
2       10634   James Harden               HOU   30         195.58   
3       12839    Joel Embiid               PHI   29         213.36   
4        4302  Allen Iverson               PHI   31         182.88   

   player_weight        college   country draft_year draft_round  ...   pts  \
0       99.79024  Arizona State       USA       2009           1  ...  36.1   
1       99.79024            NaN       USA       1996           1  ...  35.4   
2       99.79024  Arizona State       USA       2009           1  ...  34.3   
3      127.00576         Kansas  Cameroon       2014           1  ...  33.1   
4       74.84268     Georgetown       USA       1996           1  ...  33.0   

    reb  ast  net_rating  oreb_pct  dreb_pct  usg_pct  ts_pct  ast_pct  \
0   6.6  7.5         6.3     0

# 1. Data Cleaning

## 1.1 Get to know data

## Data Dictionary

- **Unnamed: 0**: Index value assigned to each player record (appears to be an auto-generated index).
- **player_name**: Name of the player.
- **team_abbreviation**: Abbreviation of the team the player was on for that season.
- **age**: Player’s age during the recorded season.
- **player_height**: Height of the player (in cm).
- **player_weight**: Weight of the player (in kg).
- **college**: College the player attended, if applicable.
- **country**: Country of origin of the player.
- **draft_year**: Year in which the player was drafted into the NBA.
- **draft_round**: The round in which the player was selected in the draft.
- **draft_number**: Overall pick number of the player in the draft.
- **gp**: Number of games played during the season.
- **pts**: Average points scored per game.
- **reb**: Average rebounds per game.
- **ast**: Average assists per game.
- **net_rating**: Player’s net efficiency rating (offensive rating - defensive rating).
- **oreb_pct**: Offensive rebound percentage, indicating the proportion of available offensive rebounds secured by the player.
- **dreb_pct**: Defensive rebound percentage, showing the proportion of available defensive rebounds secured by the player.
- **usg_pct**: Usage percentage, representing the percentage of team plays used by the player while on the floor.
- **ts_pct**: True shooting percentage, measuring shooting efficiency by incorporating field goals, free throws, and three-pointers.
- **ast_pct**: Assist percentage, showing the proportion of teammate field goals assisted by the player.
- **season**: The NBA season in which the statistics were recorded (e.g., "2018-19").
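
A column named `Unnamed: 0` often appears when a DataFrame was saved to CSV together with its index and then read back without declaring that index. The sketch below reproduces the effect on a small made-up frame (`toy_df` and its values are illustrative, not taken from the dataset). Whether `all_seasons.csv` was produced this way is an assumption.

```python
import io
import pandas as pd

# Toy data standing in for the real file (illustrative values only)
toy_df = pd.DataFrame({"player_name": ["A", "B"], "pts": [10.5, 7.2]})

# to_csv() writes the row labels as a first column with an empty header
csv_with_index = toy_df.to_csv()

# Reading it back without further options yields an "Unnamed: 0" column
naive = pd.read_csv(io.StringIO(csv_with_index))
print(naive.columns.tolist())

# Option 1: declare the first column as the index when reading
as_index = pd.read_csv(io.StringIO(csv_with_index), index_col=0)
print(as_index.columns.tolist())

# Option 2: do not write the index in the first place
clean = pd.read_csv(io.StringIO(toy_df.to_csv(index=False)))
print(clean.columns.tolist())
```

Either option keeps the stray column out of later steps such as the numerical/categorical classification.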


In [None]:
# find shape of the data
print(all_seasons_df.shape)

(12844, 22)


In [None]:
# print columns labels
print(all_seasons_df.columns)

Index(['Unnamed: 0', 'player_name', 'team_abbreviation', 'age',
       'player_height', 'player_weight', 'college', 'country', 'draft_year',
       'draft_round', 'draft_number', 'gp', 'pts', 'reb', 'ast', 'net_rating',
       'oreb_pct', 'dreb_pct', 'usg_pct', 'ts_pct', 'ast_pct', 'season'],
      dtype='object')


In [None]:
# check row labels
print(all_seasons_df.index)

RangeIndex(start=0, stop=12844, step=1)


In [None]:
# find the column that is unique to each row (unit of observation)
# HINT: such a column has as many unique values as the DataFrame has rows
# it could be a name or an ID
len(all_seasons_df["player_name"].unique())

2551
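
Since 2,551 unique names is far fewer than 12,844 rows, `player_name` alone does not identify a row. A minimal sketch of how a single column or a combination of columns could be tested is shown below, on a made-up frame. Whether the pair `player_name` and `season` is actually unique in the real data is an assumption that has not been checked here.

```python
import pandas as pd

# Toy frame: player "A" appears in two seasons (illustrative values only)
toy_df = pd.DataFrame({
    "player_name": ["A", "A", "B"],
    "season": ["2018-19", "2019-20", "2018-19"],
    "pts": [10.0, 12.0, 8.0],
})

# A single column identifies rows only if it has no repeated values
name_is_unique = toy_df["player_name"].is_unique
print(name_is_unique)

# A combination of columns identifies rows if no two rows share all of them
pair_has_duplicates = toy_df.duplicated(subset=["player_name", "season"]).any()
print(pair_has_duplicates)
```

On the real data the same two checks would be run against `all_seasons_df`.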

In [None]:
# check type of variables
all_seasons_df.dtypes

Unnamed: 0,0
Unnamed: 0,int64
player_name,object
team_abbreviation,object
age,int64
player_height,float64
player_weight,float64
college,object
country,object
draft_year,object
draft_round,object


Strings are usually represented with the `object` dtype, so check a few rows to learn what an object column actually holds. Object columns can also contain compound data types such as lists and dictionaries, which may need more cleaning.

You can find the datatype of a single column through its `dtype` attribute.

In [None]:
# similarly to know the datatype of a single column in the dataframe
# HINT: use dtype
all_seasons_df['player_name'].dtype

dtype('O')
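
`dtype('O')` only says the column holds Python objects. One way to see what those objects are is to count the type of each element. The sketch below does this on a made-up Series (`toy_col` is hypothetical and mixes types on purpose).

```python
import pandas as pd

# Toy object column mixing strings, a missing value and numbers
toy_col = pd.Series(["Kansas", None, "Duke", 2009, 1.5], dtype="object")

# Map every element to the name of its Python type, then count the names
type_counts = toy_col.map(lambda value: type(value).__name__).value_counts()
print(type_counts)
```

A column that reports only `str` (plus missing values) is plain text. Anything else signals mixed content that may need cleaning.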

In [None]:
# display the head
all_seasons_df.head()

Unnamed: 0.1,Unnamed: 0,player_name,team_abbreviation,age,player_height,player_weight,college,country,draft_year,draft_round,...,pts,reb,ast,net_rating,oreb_pct,dreb_pct,usg_pct,ts_pct,ast_pct,season
0,10227,James Harden,HOU,29,195.58,99.79024,Arizona State,USA,2009,1,...,36.1,6.6,7.5,6.3,0.023,0.157,0.396,0.616,0.394,2018-19
1,4163,Kobe Bryant,LAL,27,198.12,99.79024,,USA,1996,1,...,35.4,5.3,4.5,4.7,0.026,0.127,0.384,0.559,0.228,2005-06
2,10634,James Harden,HOU,30,195.58,99.79024,Arizona State,USA,2009,1,...,34.3,6.6,7.5,5.8,0.026,0.139,0.356,0.626,0.366,2019-20
3,12839,Joel Embiid,PHI,29,213.36,127.00576,Kansas,Cameroon,2014,1,...,33.1,10.2,4.2,8.8,0.057,0.243,0.37,0.655,0.233,2022-23
4,4302,Allen Iverson,PHI,31,182.88,74.84268,Georgetown,USA,1996,1,...,33.0,3.2,7.4,0.8,0.016,0.071,0.354,0.543,0.331,2005-06


## 1.2 Identify which numerical columns and categorical columns

## Document!!!

It is usually a good idea to note which columns you expect, and then find, to be categorical.



In [None]:
import numpy as np
import pandas as pd

# Initialize lists
numerical = []
categorical = []
string = []

for column in all_seasons_df.columns:
    # Check if the column holds strings (object dtype or another string dtype)
    if all_seasons_df[column].dtype == 'object' or pd.api.types.is_string_dtype(all_seasons_df[column]):
        if all_seasons_df[column].nunique() <= 10:
            categorical.append(column)  # Categorical columns (fewer unique values)
        else:
            string.append(column)  # String columns (many unique values)
    # Check if the column contains numerical values (integer or float)
    elif np.issubdtype(all_seasons_df[column].dtype, np.number):
        numerical.append(column)  # Numerical columns

# Print the classified columns
print("Numerical columns:", numerical)
print("Categorical columns:", categorical)
print("String columns:", string)


Numerical columns: ['Unnamed: 0', 'age', 'player_height', 'player_weight', 'gp', 'pts', 'reb', 'ast', 'net_rating', 'oreb_pct', 'dreb_pct', 'usg_pct', 'ts_pct', 'ast_pct']
Categorical columns: ['draft_round']
String columns: ['player_name', 'team_abbreviation', 'college', 'country', 'draft_year', 'draft_number', 'season']


### 1.2.1 Categorical columns coding

Sometimes you need some categorical columns to be numbers, or encoded as numbers.

In [None]:
# convert all the columns in the categorical list to be of the type category

for column in categorical:
  all_seasons_df[column] = all_seasons_df[column].astype('category')

print(all_seasons_df.dtypes)

Unnamed: 0              int64
player_name            object
team_abbreviation      object
age                     int64
player_height         float64
player_weight         float64
college                object
country                object
draft_year             object
draft_round          category
draft_number           object
gp                      int64
pts                   float64
reb                   float64
ast                   float64
net_rating            float64
oreb_pct              float64
dreb_pct              float64
usg_pct               float64
ts_pct                float64
ast_pct               float64
season                 object
dtype: object
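
Once a column has the `category` dtype, pandas keeps an integer code for each category, which is one way to get the numeric encoding mentioned above. A small sketch on a made-up Series follows. The values are illustrative stand-ins for `draft_round`, not a claim about its exact contents.

```python
import pandas as pd

# Toy stand-in for a low-cardinality string column
toy_round = pd.Series(["1", "2", "Undrafted", "1"], dtype="category")

# Categories are stored once, in sorted order
categories = toy_round.cat.categories.tolist()
print(categories)

# Each row holds an integer code pointing at its category
codes = toy_round.cat.codes
print(codes.tolist())
```

The codes follow the sorted category order and carry no numeric meaning of their own, so they suit grouping or memory savings more than arithmetic.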


Most of the commands shown up to this point are ones you would have used in homework assignment 4.

### 1.2.2. Numerical column consistency



In [None]:
# draft_year is stored as object because some entries are not numeric
# (e.g. rows for undrafted players), so convert it to a numeric type

all_seasons_df['draft_year'] = pd.to_numeric(all_seasons_df['draft_year'], errors='coerce')  # Convert numeric, set others to NaN

# Now convert to 'Int64' to handle NaNs, if any
all_seasons_df['draft_year'] = all_seasons_df['draft_year'].astype('Int64')

print(all_seasons_df.dtypes)

Unnamed: 0              int64
player_name            object
team_abbreviation      object
age                     int64
player_height         float64
player_weight         float64
college                object
country                object
draft_year              Int64
draft_round          category
draft_number           object
gp                      int64
pts                   float64
reb                   float64
ast                   float64
net_rating            float64
oreb_pct              float64
dreb_pct              float64
usg_pct               float64
ts_pct                float64
ast_pct               float64
season                 object
dtype: object


What is the difference between int and Int?
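
A short sketch of the difference, on made-up values: NumPy's `int64` has no way to represent a missing value, while pandas' nullable `Int64` stores missing entries as `<NA>`. The strings in `raw` are illustrative only.

```python
import pandas as pd

# Toy column with one non-numeric entry
raw = pd.Series(["2009", "1996", "Undrafted"], dtype="object")

# errors="coerce" turns the non-numeric entry into NaN, which forces float64
numeric = pd.to_numeric(raw, errors="coerce")
print(numeric.dtype)

# The nullable Int64 dtype keeps whole numbers and marks the gap as <NA>
nullable = numeric.astype("Int64")
print(nullable)

# Plain int64 cannot hold the missing value, so the cast raises an error
try:
    numeric.astype("int64")
    cast_failed = False
except (ValueError, TypeError):
    cast_failed = True
print(cast_failed)
```

This is why the conversion above goes through `to_numeric` first and then `astype('Int64')` rather than `astype(int)`.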

### 1.2.3 Working with date and time columns

In [None]:
all_seasons_df["draft_year"].head(20)

Unnamed: 0,draft_year
0,2009
1,1996
2,2009
3,2014
4,1996
5,2018
6,2012
7,1997
8,2007
9,2009


In [None]:


# If the values were messy, we would want to keep only the year.
# How could you filter only the year? Think about how strings can be accessed:
# traverse each string and capture the last four characters that are digits

import pandas as pd
def extract_last_4_digits(text):
    """Extracts the last 4 digits from a string without regular expressions.

    Args:
    text: The input value (converted to a string if it is not missing).

    Returns:
    The last 4 digits of the string, or None if the value is missing
    or contains fewer than 4 digits.
    """
    # Test for missing values before converting to a string:
    # str(np.nan) is the string 'nan', so a check made afterwards never fires
    if pd.isna(text):
        return None
    digits = [char for char in reversed(str(text)) if char.isdigit()]
    if len(digits) >= 4:
        return "".join(reversed(digits[:4]))
    return None


# identify digits in the string -

# all_seasons_df['draft_year'] = all_seasons_df['draft_year'].apply(
#     extract_last_4_digits
# )

# # verify
# print(all_seasons_df["draft_year"].head(20))

In [None]:
# sanity check
all_seasons_df.info()

all_seasons_df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12844 entries, 0 to 12843
Data columns (total 22 columns):
 #   Column             Non-Null Count  Dtype   
---  ------             --------------  -----   
 0   Unnamed: 0         12844 non-null  int64   
 1   player_name        12844 non-null  object  
 2   team_abbreviation  12844 non-null  object  
 3   age                12844 non-null  int64   
 4   player_height      12844 non-null  float64 
 5   player_weight      12844 non-null  float64 
 6   college            10990 non-null  object  
 7   country            12844 non-null  object  
 8   draft_year         10486 non-null  Int64   
 9   draft_round        12844 non-null  category
 10  draft_number       12844 non-null  object  
 11  gp                 12844 non-null  int64   
 12  pts                12844 non-null  float64 
 13  reb                12844 non-null  float64 
 14  ast                12844 non-null  float64 
 15  net_rating         12844 non-null  float64 
 16  oreb

Unnamed: 0.1,Unnamed: 0,player_name,team_abbreviation,age,player_height,player_weight,college,country,draft_year,draft_round,...,pts,reb,ast,net_rating,oreb_pct,dreb_pct,usg_pct,ts_pct,ast_pct,season
0,10227,James Harden,HOU,29,195.58,99.79024,Arizona State,USA,2009,1,...,36.1,6.6,7.5,6.3,0.023,0.157,0.396,0.616,0.394,2018-19
1,4163,Kobe Bryant,LAL,27,198.12,99.79024,,USA,1996,1,...,35.4,5.3,4.5,4.7,0.026,0.127,0.384,0.559,0.228,2005-06
2,10634,James Harden,HOU,30,195.58,99.79024,Arizona State,USA,2009,1,...,34.3,6.6,7.5,5.8,0.026,0.139,0.356,0.626,0.366,2019-20
3,12839,Joel Embiid,PHI,29,213.36,127.00576,Kansas,Cameroon,2014,1,...,33.1,10.2,4.2,8.8,0.057,0.243,0.37,0.655,0.233,2022-23
4,4302,Allen Iverson,PHI,31,182.88,74.84268,Georgetown,USA,1996,1,...,33.0,3.2,7.4,0.8,0.016,0.071,0.354,0.543,0.331,2005-06


The non-null counts are not equal across columns: `college` has 10,990 non-null values and `draft_year` has 10,486, out of 12,844 rows.

Check the rows with missing values. It is always a good idea to start with the column that has more null values.
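
One way to do that check is to count nulls per column and then pull out the affected rows. The sketch below uses a made-up frame. The column names mirror the dataset but the values are illustrative.

```python
import pandas as pd

# Toy frame with gaps in two columns
toy_df = pd.DataFrame({
    "player_name": ["A", "B", "C"],
    "college": ["Duke", None, "Kansas"],
    "draft_year": pd.array([2009, None, None], dtype="Int64"),
})

# Number of missing values per column, largest first
null_counts = toy_df.isna().sum().sort_values(ascending=False)
print(null_counts)

# Rows where the column with the most gaps is missing
missing_draft = toy_df[toy_df["draft_year"].isna()]
print(missing_draft)
```

Looking at the selected rows side by side often shows whether the gaps have a common cause, such as undrafted players.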

## 1.3. Remove unnecessary values

### 1.3.1. Check for duplicate rows, remove them if needed

In [None]:
# check for duplicate rows using the duplicated().sum() functions - returns number of duplicate rows
duplicate_count = all_seasons_df.duplicated().sum()

# Print the number of duplicate rows
print(f"Number of duplicate rows: {duplicate_count}")

Number of duplicate rows: 0


If there were any, we would drop them with the `drop_duplicates()` method called on the entire DataFrame.
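
Since this dataset has no duplicates, here is a minimal sketch of that step on a made-up frame with one repeated row.

```python
import pandas as pd

# Toy frame whose first two rows are identical
toy_df = pd.DataFrame({"player_name": ["A", "A", "B"], "pts": [10.0, 10.0, 8.0]})

# duplicated() flags every repeat of an earlier row
n_dupes = toy_df.duplicated().sum()
print(n_dupes)

# drop_duplicates() keeps the first occurrence; reset_index tidies the labels
deduped = toy_df.drop_duplicates().reset_index(drop=True)
print(deduped)
```

By default both methods compare all columns. The `subset` parameter restricts the comparison to chosen columns.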

### 1.3.2. Removing unnecessary columns

We can reduce the size of our combined dataset by removing columns that are not important for our analyses. Columns can be "dropped" from a DataFrame using the DataFrame method `drop()`.

In [None]:
# Print out the column labels, non-null counts and dtypes for the full dataset
all_seasons_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12844 entries, 0 to 12843
Data columns (total 22 columns):
 #   Column             Non-Null Count  Dtype   
---  ------             --------------  -----   
 0   Unnamed: 0         12844 non-null  int64   
 1   player_name        12844 non-null  object  
 2   team_abbreviation  12844 non-null  object  
 3   age                12844 non-null  int64   
 4   player_height      12844 non-null  float64 
 5   player_weight      12844 non-null  float64 
 6   college            10990 non-null  object  
 7   country            12844 non-null  object  
 8   draft_year         10486 non-null  Int64   
 9   draft_round        12844 non-null  category
 10  draft_number       12844 non-null  object  
 11  gp                 12844 non-null  int64   
 12  pts                12844 non-null  float64 
 13  reb                12844 non-null  float64 
 14  ast                12844 non-null  float64 
 15  net_rating         12844 non-null  float64 
 16  oreb

We will not be using the offensive and defensive rebound percentages, so we can remove the columns `oreb_pct` and `dreb_pct`.

In [None]:
# Remove specified columns from the dataset using "drop()"
all_seasons_df = all_seasons_df.drop(columns=['oreb_pct', 'dreb_pct'])

# Print out the column labels from the new DataFrame
print(all_seasons_df.columns)

Index(['Unnamed: 0', 'player_name', 'team_abbreviation', 'age',
       'player_height', 'player_weight', 'college', 'country', 'draft_year',
       'draft_round', 'draft_number', 'gp', 'pts', 'reb', 'ast', 'net_rating',
       'usg_pct', 'ts_pct', 'ast_pct', 'season'],
      dtype='object')


### 1.4.1. Consistency of missing values


In [None]:
all_seasons_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12844 entries, 0 to 12843
Data columns (total 20 columns):
 #   Column             Non-Null Count  Dtype   
---  ------             --------------  -----   
 0   Unnamed: 0         12844 non-null  int64   
 1   player_name        12844 non-null  object  
 2   team_abbreviation  12844 non-null  object  
 3   age                12844 non-null  int64   
 4   player_height      12844 non-null  float64 
 5   player_weight      12844 non-null  float64 
 6   college            10990 non-null  object  
 7   country            12844 non-null  object  
 8   draft_year         10486 non-null  Int64   
 9   draft_round        12844 non-null  category
 10  draft_number       12844 non-null  object  
 11  gp                 12844 non-null  int64   
 12  pts                12844 non-null  float64 
 13  reb                12844 non-null  float64 
 14  ast                12844 non-null  float64 
 15  net_rating         12844 non-null  float64 
 16  usg_

## 1.6. Data Wrangling

> Wrangling is the process of changing data into a usable format. It includes merging, subsetting and transformation.



### 1.6.1. Replacing values in a column

We can replace values in a column by first accessing that column and using the Series method `replace()` (*remember accessing one column from a DataFrame returns a pandas Series*). The `replace()` method can accept a dictionary of items in which the dictionary keys are the values to be replaced and the dictionary values are the new values to be inserted.

For example, a dictionary could map the values `Y` and `N` in a hypothetical `Cataloged` column (not part of this dataset) to the more explicit values `Yes` and `No`. The keyword argument `inplace=True` modifies the Series directly instead of returning a new one.
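
A minimal sketch of dictionary-based replacement, using a made-up frame with a hypothetical `Cataloged` column. It assigns the result back to the column rather than using `inplace=True`, which is a design choice made for this sketch because it makes the update explicit.

```python
import pandas as pd

# Toy frame with a hypothetical Y/N column
toy_df = pd.DataFrame({"Cataloged": ["Y", "N", "Y"]})

# Dictionary keys are the old values, dictionary values are the new ones
toy_df["Cataloged"] = toy_df["Cataloged"].replace({"Y": "Yes", "N": "No"})
print(toy_df)
```

Values that do not appear as dictionary keys are left unchanged.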

In [None]:
# print the columns and the number of values and datatypes for reference
all_seasons_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12844 entries, 0 to 12843
Data columns (total 20 columns):
 #   Column             Non-Null Count  Dtype   
---  ------             --------------  -----   
 0   Unnamed: 0         12844 non-null  int64   
 1   player_name        12844 non-null  object  
 2   team_abbreviation  12844 non-null  object  
 3   age                12844 non-null  int64   
 4   player_height      12844 non-null  float64 
 5   player_weight      12844 non-null  float64 
 6   college            10990 non-null  object  
 7   country            12844 non-null  object  
 8   draft_year         10486 non-null  Int64   
 9   draft_round        12844 non-null  category
 10  draft_number       12844 non-null  object  
 11  gp                 12844 non-null  int64   
 12  pts                12844 non-null  float64 
 13  reb                12844 non-null  float64 
 14  ast                12844 non-null  float64 
 15  net_rating         12844 non-null  float64 
 16  usg_

In [None]:
all_seasons_df.head()

Unnamed: 0.1,Unnamed: 0,player_name,team_abbreviation,age,player_height,player_weight,college,country,draft_year,draft_round,draft_number,gp,pts,reb,ast,net_rating,usg_pct,ts_pct,ast_pct,season
0,10227,James Harden,HOU,29,195.58,99.79024,Arizona State,USA,2009,1,3,78,36.1,6.6,7.5,6.3,0.396,0.616,0.394,2018-19
1,4163,Kobe Bryant,LAL,27,198.12,99.79024,,USA,1996,1,13,80,35.4,5.3,4.5,4.7,0.384,0.559,0.228,2005-06
2,10634,James Harden,HOU,30,195.58,99.79024,Arizona State,USA,2009,1,3,68,34.3,6.6,7.5,5.8,0.356,0.626,0.366,2019-20
3,12839,Joel Embiid,PHI,29,213.36,127.00576,Kansas,Cameroon,2014,1,3,66,33.1,10.2,4.2,8.8,0.37,0.655,0.233,2022-23
4,4302,Allen Iverson,PHI,31,182.88,74.84268,Georgetown,USA,1996,1,1,72,33.0,3.2,7.4,0.8,0.354,0.543,0.331,2005-06


In [None]:
# Reformat columns to use imperial units

# Convert weight from kg to lb (1 kg = 2.20462 lb)
all_seasons_df['player_weight'] = (all_seasons_df['player_weight'] * 2.20462).round(1)

# Convert height from cm to in (1 cm = 0.393701 in)
all_seasons_df['player_height'] = (all_seasons_df['player_height'] * 0.393701).round(1)



In [None]:
# Inspect every season recorded for one player to confirm the unit conversion
print(all_seasons_df[all_seasons_df['player_name'] == 'Stephen Curry'])


      Unnamed: 0    player_name team_abbreviation  age  player_height  \
9          11537  Stephen Curry               GSW   33           75.0   
25          8930  Stephen Curry               GSW   28           75.0   
38         12434  Stephen Curry               GSW   35           74.0   
95         10390  Stephen Curry               GSW   31           75.0   
139         9801  Stephen Curry               GSW   30           75.0   
182        12250  Stephen Curry               GSW   34           74.0   
193         9257  Stephen Curry               GSW   29           75.0   
263         7941  Stephen Curry               GSW   26           75.0   
271         8303  Stephen Curry               GSW   27           75.0   
339         7213  Stephen Curry               GSW   25           75.0   
569        10961  Stephen Curry               GSW   32           75.0   
943         6471  Stephen Curry               GSW   23           75.0   
1168        5876  Stephen Curry               GSW  

### 1.6.5 groupby and Aggregation
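
The cell in this section keeps one row per player by combining `groupby()`, `idxmax()` and `loc`. The pattern can be sketched on a made-up frame as follows (`toy_df` and its values are illustrative).

```python
import pandas as pd

# Toy frame: two seasons for each of two players
toy_df = pd.DataFrame({
    "player_name": ["A", "A", "B", "B"],
    "season": ["2018-19", "2019-20", "2018-19", "2019-20"],
    "pts": [10.0, 12.0, 8.0, 7.5],
})

# For each player, idxmax() returns the row label of the highest 'pts'
best_idx = toy_df.groupby("player_name")["pts"].idxmax()
print(best_idx)

# loc then selects those full rows, ordered by player name
best_rows = toy_df.loc[best_idx]
print(best_rows)
```

If a player has two seasons tied for the maximum, `idxmax()` returns the first one it encounters. The result keeps the original row labels, which is why the index of the reduced DataFrame is no longer a simple range.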

In [None]:
# Keep one row per player: the season with the highest points per game
# idxmax() returns, for each player, the row label of the maximum 'pts' value
all_seasons_df = all_seasons_df.loc[all_seasons_df.groupby('player_name')['pts'].idxmax()]


In [None]:
print(all_seasons_df[all_seasons_df['player_name'] == 'Stephen Curry'])
all_seasons_df.info()


   Unnamed: 0    player_name team_abbreviation  age  player_height  \
9       11537  Stephen Curry               GSW   33           75.0   

   player_weight   college country  draft_year draft_round draft_number  gp  \
9          185.0  Davidson     USA        2009           1            7  63   

    pts  reb  ast  net_rating  usg_pct  ts_pct  ast_pct   season  
9  32.0  5.5  5.8         4.6    0.331   0.655    0.283  2020-21  
<class 'pandas.core.frame.DataFrame'>
Index: 2551 entries, 5871 to 10365
Data columns (total 20 columns):
 #   Column             Non-Null Count  Dtype   
---  ------             --------------  -----   
 0   Unnamed: 0         2551 non-null   int64   
 1   player_name        2551 non-null   object  
 2   team_abbreviation  2551 non-null   object  
 3   age                2551 non-null   int64   
 4   player_height      2551 non-null   float64 
 5   player_weight      2551 non-null   float64 
 6   college            2208 non-null   object  
 7   country       

In [None]:
all_seasons_df

Unnamed: 0.1,Unnamed: 0,player_name,team_abbreviation,age,player_height,player_weight,college,country,draft_year,draft_round,draft_number,gp,pts,reb,ast,net_rating,usg_pct,ts_pct,ast_pct,season
5871,667,A.C. Green,DAL,34,81.0,225.0,Oregon State,USA,1985,1,23,82,7.3,8.1,1.5,-7.2,0.118,0.496,0.074,1997-98
12305,1625,A.J. Bramlett,CLE,23,82.0,227.0,Arizona,USA,1999,2,39,8,1.0,2.8,0.0,-32.6,0.146,0.190,0.000,1999-00
7047,1970,A.J. Guyton,CHI,23,73.0,180.0,Indiana,USA,2000,2,32,33,6.0,1.1,1.9,-12.4,0.169,0.495,0.198,2000-01
9578,12588,A.J. Lawson,DAL,22,78.0,179.0,South Carolina,Canada,,Undrafted,Undrafted,15,3.7,1.4,0.1,-20.1,0.189,0.589,0.032,2022-23
8774,12589,AJ Green,MIL,23,77.0,190.0,Northern Iowa,USA,,Undrafted,Undrafted,35,4.4,1.3,0.6,-4.9,0.159,0.607,0.092,2022-23
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
110,11329,Zion Williamson,NOP,20,79.0,284.0,Duke,USA,2019,1,1,61,27.0,7.2,3.7,2.1,0.287,0.649,0.188,2020-21
11688,8349,Zoran Dragic,MIA,26,77.0,200.0,,Slovenia,,Undrafted,Undrafted,16,1.8,0.5,0.3,-15.3,0.217,0.435,0.116,2014-15
8041,3666,Zoran Planinic,NJN,22,79.0,200.0,,Croatia,2003,1,22,43,5.0,1.6,1.0,-4.8,0.227,0.534,0.185,2004-05
1227,2821,Zydrunas Ilgauskas,CLE,28,87.0,260.0,,Lithuania,1996,1,20,81,17.2,7.5,1.6,-7.9,0.279,0.516,0.101,2002-03


In [None]:
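from google.colab import files

# Save the one-row-per-player DataFrame to a CSV file in the Colab runtime
all_seasons_df.to_csv('best_seasons.csv')

# Download the file from the Colab runtime to the local machine
files.download('best_seasons.csv')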


# Credits

This workshop was created by Aditi Mallavarapu, Claire Cahoon and Walt Gurley, adapted from previous workshop materials by Scott Bailey and Simon Wiles, of Stanford Libraries.