<div>
    <img src="https://i.imgur.com/1X95zGn.png">
 </div>

> During my first glance at the competition data, I noticed a huge number of files in the dataset.
> 
> Learning more about the game was essential to understand the data for this competition and the [NCAA Tournament Glossary](https://www.ncaa.com/news/basketball-men/article/2019-04-03/march-madness-terms-ncaa-tournament-dictionary) came in handy for this purpose✨

<center><h3><a href="https://www.ncaa.com/brackets/basketball-women/d1/2021">Official NCAA bracket for the 2021 Division I women's basketball tournament</a></h3></center>
<img src="https://i.imgur.com/Axcz5YO.png" class="center">

**🎯Goal:**

> - `Stage 1` - Predict the point spread for every possible matchup in the past 5 NCAA tournaments (seasons 2015-2019).
> 
> - `Stage 2` - Predict the point spread for every possible matchup before the 2021 tournament begins.

<center><h1>Importing libraries📚</h1></center>

In [None]:
import numpy as np
import pandas as pd
import os
import glob
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.lines as lines
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import wandb
import lightgbm as lgb

from plotly.subplots import make_subplots
from matplotlib_venn import venn2, venn2_circles
from sklearn.model_selection import train_test_split 
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
from tqdm.notebook import tqdm

In [None]:
#set context to customize and style plots
sns.set_context("poster", font_scale = 0.6, rc={"grid.linewidth": 0.4})

#set font family
sns.set_style({'font.family':'serif'})

<img src="https://camo.githubusercontent.com/dd842f7b0be57140e68b2ab9cb007992acd131c48284eaf6b1aca758bfea358b/68747470733a2f2f692e696d6775722e636f6d2f52557469567a482e706e67">

> Super excited to integrate W&B for visualizations and tracking model performance!

> [NCAAW Project on W&B Dashboard](https://wandb.ai/ruchi798/ncaaw?workspace=user-ruchi798)  🏋️‍♀️ 

- To get an API key, first create an account on the [website](https://wandb.ai/site).
- Next, store the key as a Kaggle secret so that it never appears in the notebook🤫

In [None]:
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
api_key = user_secrets.get_secret("api_key")

os.environ["WANDB_SILENT"] = "true"

In [None]:
! wandb login $api_key

<center><h1>Reading and displaying csv files 📖</h1></center>

<img src="https://i.imgur.com/wmcVZDC.png">

> Since we have many csv files, I've written a code block to read multiple csv files and store them in a dictionary.

In [None]:
path = '../input/ncaaw-march-mania-2021-spread/WDataFiles_Stage2_Spread' 
all_files = glob.glob(path + "/*.csv")

df = {}
li = []

for x in all_files:
    # use the file name without its extension as the dictionary key
    name = os.path.splitext(os.path.basename(x))[0]
    try:
        df[name] = pd.read_csv(x)
    except UnicodeDecodeError:
        # some files are not UTF-8 encoded, so fall back to cp1252
        df[name] = pd.read_csv(x, encoding='cp1252')
    li.append(name)
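
> A minimal sketch of the same read-with-fallback idea on a throwaway file. The helper name `read_csv_with_fallback`, the file `DemoTeams.csv`, and its content are made up for illustration and are not part of the competition data.

```python
import os
import tempfile

import pandas as pd


def read_csv_with_fallback(csv_path, fallback_encoding="cp1252"):
    """Hypothetical helper: try the default UTF-8 decoding, then a fallback encoding."""
    try:
        return pd.read_csv(csv_path)
    except UnicodeDecodeError:
        return pd.read_csv(csv_path, encoding=fallback_encoding)


# Write a tiny cp1252-encoded file (made-up content) to exercise the fallback path
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "DemoTeams.csv")
    with open(demo_path, "wb") as handle:
        handle.write("TeamName\nCafé\n".encode("cp1252"))

    # Key each table by its file name without the extension
    table_name = os.path.splitext(os.path.basename(demo_path))[0]
    demo_tables = {table_name: read_csv_with_fallback(demo_path)}
```

Catching only `UnicodeDecodeError` keeps other problems, such as a missing file, visible instead of silently retrying them.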

>Here's a glimpse of the dataframes 🔍

In [None]:
def display(i):
    print(i)
    return df[i].head(10)

In [None]:
len(li)

In [None]:
display(li[0])

In [None]:
display(li[1])

In [None]:
display(li[2])

In [None]:
display(li[3])

In [None]:
display(li[4])

> 🏀 **Slot** - uniquely identifies one of the tournament games
> - First two characters: round of the game
> - Last two characters: expected seed of the favored team
> 
> 🏀 **StrongSeed** - the expected stronger-seeded team that plays in this game
> 
> 🏀 **WeakSeed** - the expected weaker-seeded team that plays in this game
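
> A small sketch of how a slot string could be split according to the description above. The function `parse_slot` and the example value `R1W1` are hypothetical, and the sketch assumes every slot is exactly four characters long.

```python
def parse_slot(slot):
    """Split a four-character slot into its round and favored-seed parts (illustrative only)."""
    if len(slot) != 4:
        raise ValueError(f"expected a four-character slot, got {slot!r}")
    return {"round": slot[:2], "strong_seed": slot[2:]}


# Hypothetical slot: round 'R1', favored seed 'W1'
example = parse_slot("R1W1")
```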

In [None]:
display(li[5])

In [None]:
display(li[6])

In [None]:
display(li[7])

In [None]:
display(li[8])

> To find the exact date a game was played on, we can combine the game's 
> * "DayNum" from WNCAATourneyCompactResults 🔝 
> * and the season's "DayZero" from WSeasons ⬇️
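
> The date arithmetic described above can be sketched like this. The two toy tables stand in for WSeasons and WNCAATourneyCompactResults; the values are invented, and the `MM/DD/YYYY` format of "DayZero" is an assumption to check against the real file.

```python
import pandas as pd

# Toy stand-ins for the real tables (illustrative values only)
seasons = pd.DataFrame({"Season": [2019], "DayZero": ["11/05/2018"]})
games = pd.DataFrame({"Season": [2019, 2019], "DayNum": [0, 137]})

# Attach each season's DayZero to its games, then add DayNum as a day offset
merged = games.merge(seasons, on="Season", how="left")
merged["GameDate"] = (
    pd.to_datetime(merged["DayZero"], format="%m/%d/%Y")
    + pd.to_timedelta(merged["DayNum"], unit="D")
)
```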

In [None]:
display(li[9])

> 🏀 A **seed** in basketball is the number that corresponds to a team's ranking.
> 
> Seeding prevents high-ranking teams from meeting each other early in the tournament and encourages healthy competition!

In [None]:
display(li[10])

In [None]:
display(li[11])

In [None]:
display(li[12])

In [None]:
display(li[13])

> There are 369 unique teams in this dataset.

In [None]:
df['WTeams']['TeamID'].nunique()

<center><h1>EDA 📊</h1></center>

In [None]:
new_teams = pd.DataFrame({"Team ID": [3468,3469,3470,3471]},
                  index=["Bellarmine","North Alabama","Tarleton State","UC San Diego"])

custom_colors = ["#FFC8FB","#9B287B","#00B295","#D5FFF3"]
colormap = matplotlib.colors.LinearSegmentedColormap.from_list("", custom_colors)

fig = plt.figure(figsize=(12,8), facecolor="white") 
gs = fig.add_gridspec(1,1)
gs.update(wspace=0.1, hspace=0.5)
ax = fig.add_subplot(gs[0, :])

sns.heatmap(new_teams,annot=True, fmt='g',cmap=colormap, linewidths=1.5,cbar=False,annot_kws={"fontsize":15})

ax.text(0.25,-0.4,'Teams that are new to Division 1',fontfamily='serif',fontsize=20,fontweight='bold')

ax.set_xticklabels(ax.get_xticklabels(), fontfamily='serif',fontsize=15,fontweight='bold')
ax.set_yticklabels(ax.get_yticklabels(), fontfamily='serif',rotation=0,fontsize=15,fontweight='bold')
        
plt.show()

> This dataset covers 24 seasons.

In [None]:
df['WSeasons']['Season'].nunique()

> Logging a **dictionary of custom objects** 🏋️‍♀️

In [None]:
run = wandb.init(project='ncaaw', name='count')

b = df['WSeasons']['Season'].nunique()
c = df['WTeams']['TeamID'].nunique()

wandb.log({'No. of files in the data folder': len(li), 
           'No. of seasons' : b,
           'No. of unique teams' : c })
run.finish()

In [None]:
df['WNCAATourneySeeds']

> Let's separate the region and seed into different columns.

In [None]:
# seeds look like 'W01': the first character is the region, the next two digits are the seed number
df['WNCAATourneySeeds']['Region'] = df['WNCAATourneySeeds']['Seed'].str[0]
df['WNCAATourneySeeds']['Seed'] = df['WNCAATourneySeeds']['Seed'].str[1:3].astype(int)

> 📌 **Special note about "Season" numbers**: 

> The season year is the year in which the season ends, not the year in which it starts.
> 
> Thus the current season is:
> - the 2021 season ✅
> - and not the 2020 season / 2020-2021 season / 2020-21 season ❌

In [None]:
df['WNCAATourneySeeds']

In [None]:
df['WNCAATourneySeeds'] = pd.merge(df['WNCAATourneySeeds'], df['WTeams'],on='TeamID')
df['WNCAATourneySeeds']

In [None]:
data = df['WNCAATourneySeeds'][df['WNCAATourneySeeds']['Seed'] ==1]['TeamName'].value_counts()[:10]

fig = plt.gcf()
fig.set_size_inches(12, 8)
ax = sns.barplot(x=data.index, y=data, 
       edgecolor='darkgray',
       linewidth=0.6,palette="spring")

plt.title('Highest seeded teams', fontsize=15, fontweight='bold',horizontalalignment='center',fontfamily='serif')
ax.set_xticklabels(data.index, rotation=45,fontfamily='serif',fontsize=12) 
ax.set(xlabel='Team Name', ylabel='count')

plt.show()

> Logging a **custom bar chart** for Highest Seeded Teams🏋️‍♀️

In [None]:
run = wandb.init(project='ncaaw', job_type='image-visualization',name='seed')

data1 = df['WNCAATourneySeeds'][df['WNCAATourneySeeds']['Seed'] ==1]['TeamName'].value_counts()[:10]
labels = data1.index
values = data1.values
dt = [[label, val] for (label, val) in zip(labels, values)]
table = wandb.Table(data=dt, columns = ["Team name", "Count"])
wandb.log({"highest_seeded_teams" : wandb.plot.bar(table, "Team name", "Count",title="Highest seeded teams")})

run.finish()

run

In [None]:
data = df['WNCAATourneySeeds'][df['WNCAATourneySeeds']['Seed'] ==16]['TeamName'].value_counts()[:10]

fig = plt.gcf()
fig.set_size_inches(12, 8)
ax = sns.barplot(x=data.index, y=data, 
       edgecolor='darkgray',
       linewidth=0.6,palette="cool")

plt.title('Lowest seeded teams', fontsize=15, fontweight='bold',horizontalalignment='center',fontfamily='serif')
ax.set_xticklabels(data.index, rotation=45,fontfamily='serif',fontsize=12) 
ax.set(xlabel='Team Name', ylabel='count')

plt.show()

> Logging a **custom bar chart** for Lowest Seeded Teams🏋️‍♀️

In [None]:
run = wandb.init(project='ncaaw', job_type='image-visualization',name='seed')

data2 = df['WNCAATourneySeeds'][df['WNCAATourneySeeds']['Seed'] ==16]['TeamName'].value_counts()[:10]
labels = data2.index
values = data2.values
dt = [[label, val] for (label, val) in zip(labels, values)]
table = wandb.Table(data=dt, columns = ["Team name", "Count"])
wandb.log({"lowest_seeded_teams" : wandb.plot.bar(table, "Team name", "Count",title="Lowest seeded teams")})


run.finish()

run

In [None]:
df['WNCAATourneyCompactResults']

> Logging an **image** for Distribution of Winning and Losing Scores🏋️‍♀️

In [None]:
run = wandb.init(project='ncaaw', job_type='image-visualization',name='score_diff')
fig = plt.gcf()
fig.set_size_inches(14, 6)

plt.hist(df['WNCAATourneyCompactResults']['WScore'],label="Winning Score",color='#c5ffff',alpha=0.9)
plt.hist(df['WNCAATourneyCompactResults']['LScore'],label="Losing Score",color='#eca3fc',alpha=0.6)
plt.title('Distribution of Winning and Losing Scores',fontsize=15, fontweight='bold',horizontalalignment='center',fontfamily='serif')
plt.xlabel('')
plt.ylabel('')
plt.legend();
plt.grid()

wandb.log({"Distribution of Winning and Losing Scores": [wandb.Image(plt)]})
run.finish()

run

> Logging **histograms** for Distribution of Winning and Losing Scores🏋️‍♀️

In [None]:
run = wandb.init(project='ncaaw', job_type='image-visualization',name='score_diff')

data3 = [[s] for s in df['WNCAATourneyCompactResults']['WScore']]
table1 = wandb.Table(data=data3, columns=["w_score"])

data4 = [[s] for s in df['WNCAATourneyCompactResults']['LScore']]
table2 = wandb.Table(data=data4, columns=["l_score"])

wandb.log({'winning_score': wandb.plot.histogram(table1, "w_score",title="Winning Score Distribution")})
wandb.log({'losing_score': wandb.plot.histogram(table2, "l_score",title="Losing Score Distribution")})
run.finish()

run

In [None]:
data = df['WNCAATourneyCompactResults'].groupby('Season')[['WScore','LScore']].mean()

fig = plt.gcf()
fig.set_size_inches(16, 8)

sns.lineplot(x=data.index,y=data['WScore'],color='#52D1DC',label="Winning Score")
sns.lineplot(x=data.index,y=data['LScore'],color='#d216d2',label="Losing Score")

plt.title('Distribution of Winning and Losing Scores Average' ,fontsize=15, fontweight='bold',horizontalalignment='center',fontfamily='serif')
plt.xlabel('Season')
plt.ylabel('Score')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.grid()
plt.show()

> Logging a **custom line plot** for Winning Score and Losing Score Average🏋️‍♀️

In [None]:
run = wandb.init(project='ncaaw', job_type='image-visualization',name='score_avg')

data = df['WNCAATourneyCompactResults'].groupby('Season')[['WScore','LScore']].mean()

data5 = [[x, y] for (x, y) in zip(data.index,data['WScore'])]
table1 = wandb.Table(data=data5, columns=["year","w_score_avg"])

data6 = [[x, y] for (x, y) in zip(data.index,data['LScore'])]
table2 = wandb.Table(data=data6, columns=["year","l_score_avg"])

wandb.log({'winning_score_avg': wandb.plot.line(table1,"year","w_score_avg",title="Winning Score Average")})
wandb.log({'losing_score_avg': wandb.plot.line(table2, "year","l_score_avg",title="Losing Score Average")})
run.finish()

run

**Home advantage 🏠** – describes the benefit that the home team is said to gain over the visiting team. This benefit has been attributed to psychological effects supporting fans have on the competitors or referees; to psychological or physiological advantages of playing near home in familiar situations; to the disadvantages away teams suffer from changing time zones or climates, or from the rigors of travel; and in some sports, to specific rules that favor the home team directly or indirectly. 

[Reference](https://en.wikipedia.org/wiki/Home_advantage)

In [None]:
fig, ax  = plt.subplots(figsize=(16, 8))
fig.suptitle('Winning Location Distribution', fontsize=15, fontweight='bold',horizontalalignment='center',fontfamily='serif')
sizes = df['WNCAATourneyCompactResults']['WLoc'].value_counts()
# value_counts() sorts by frequency, so derive the labels from its index instead of hard-coding their order
labels = sizes.index.map({"N": "Neutral", "H": "Home", "A": "Away"})
ax.pie(sizes, explode=(0.05, 0.05, 0.2), colors=["#ffdf64","#b98de8","#ff74d4"], startangle=60, labels=labels,autopct='%1.0f%%', pctdistance=0.6)
ax.add_artist(plt.Circle((0,0),0.4,fc='white'))
plt.show()

In [None]:
df['WNCAATourneyCompactResults'] = pd.merge(df['WNCAATourneyCompactResults'], df['WNCAATourneySeeds'], left_on=['Season', 'WTeamID'], right_on=['Season', 'TeamID'], how='left')
df['WNCAATourneyCompactResults'].rename(columns={'Seed':'WSeed'}, inplace=True)
df['WNCAATourneyCompactResults'] = df['WNCAATourneyCompactResults'].drop(columns=['Region','TeamID','DayNum','NumOT','TeamName'],axis=1)

df['WNCAATourneyCompactResults'] = pd.merge(df['WNCAATourneyCompactResults'], df['WNCAATourneySeeds'], left_on=['Season', 'LTeamID'], right_on=['Season', 'TeamID'], how='left')
df['WNCAATourneyCompactResults'].rename(columns={'Seed':'LSeed'}, inplace=True)
df['WNCAATourneyCompactResults'] = df['WNCAATourneyCompactResults'].drop(columns=['TeamName','Region'],axis=1)

In [None]:
df['WNCAATourneyCompactResults']

In [None]:
df_w_s = df['WNCAATourneyCompactResults'].copy()
df_l_s = df_w_s.copy()

In [None]:
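def score_seed(games,a):
    # a == 1 keeps the winner's perspective; any other value flips the signs for the loser's perspective
    # (the argument is named `games` so that it does not shadow the global `df` dictionary)
    if a==1:
        games['score_diff'] = games.WScore-games.LScore
        games['seed_diff'] = games.WSeed-games.LSeed
        games['result'] = 1
    else:
        games['score_diff'] = -(games.WScore-games.LScore)
        games['seed_diff'] = -(games.WSeed-games.LSeed)
        games['result'] = 0
    return games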

In [None]:
df_w_s = score_seed(df_w_s,1)
df_l_s = score_seed(df_l_s,0)

In [None]:
train_df = pd.concat([df_w_s,df_l_s])
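
> A toy illustration of the mirroring step above, assuming a single made-up game: each game contributes one row from the winner's point of view and one from the loser's, so the two differences are symmetric around zero.

```python
import pandas as pd

# One invented game: a 1-seed beats a 16-seed 80-70
games = pd.DataFrame({"WScore": [80], "LScore": [70], "WSeed": [1], "LSeed": [16]})

# Winner's perspective: positive score difference, negative seed difference
winner_view = games.assign(
    score_diff=games.WScore - games.LScore,
    seed_diff=games.WSeed - games.LSeed,
    result=1,
)

# Loser's perspective: the same game with both signs flipped
loser_view = games.assign(
    score_diff=-(games.WScore - games.LScore),
    seed_diff=-(games.WSeed - games.LSeed),
    result=0,
)

mirrored = pd.concat([winner_view, loser_view], ignore_index=True)
```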

> Tourney Seeds and Scores

In [None]:
pairplot_df=train_df.copy()
pairplot_df['Season'] = pairplot_df['Season'].astype(str)
fig = px.scatter_matrix(pairplot_df,
    dimensions=['WScore','LScore','score_diff','WSeed','LSeed','Season'],
    color="Season",template="none",color_discrete_sequence=px.colors.qualitative.Pastel)
fig.show()

In [None]:
fig = plt.gcf()
fig.set_size_inches(16, 8)
sns.kdeplot(data=train_df, x="score_diff",color='#5bffdd',fill=True)
plt.title('Distribution of difference in scores' ,fontsize=15, fontweight='bold',horizontalalignment='center',fontfamily='serif')
plt.xlim(0, None);

In [None]:
fig = plt.gcf()
fig.set_size_inches(14, 6)

d1 = train_df[train_df['WLoc'] == 'N']['score_diff']
d2 = train_df[train_df['WLoc'] == 'H']['score_diff']
d3 = train_df[train_df['WLoc'] == 'A']['score_diff']

ax = sns.histplot(d1,label="Neutral",color='#EAC435',alpha=0.2,element="step")
ax = sns.histplot(d2,label="Home",color='#7353BA',alpha=0.2,element="step")
ax = sns.histplot(d3,label="Away",color='#FAA6FF',alpha=0.5,element="step")

plt.title('Distribution of difference in scores vs Winning Location',fontsize=15, fontweight='bold',horizontalalignment='center',fontfamily='serif')
plt.xlim(0, None);
ax.set(xlabel='', ylabel='')
plt.grid(axis = 'y')
ax.legend();

In [None]:
train_df = train_df.drop(columns=['TeamID','Season','WTeamID','LTeamID','WLoc'],axis=1)
train_df

In [None]:
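# copy the slices so that the in-place renames below do not trigger SettingWithCopyWarning
df_w_sc = df['WRegularSeasonCompactResults'][['Season', 'WTeamID', 'WScore']].copy()
df_l_sc = df['WRegularSeasonCompactResults'][['Season', 'LTeamID', 'LScore']].copy()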

In [None]:
df_w_sc.rename(columns={'WTeamID':'TeamID', 'WScore':'Score'}, inplace=True)
df_l_sc.rename(columns={'LTeamID':'TeamID', 'LScore':'Score'}, inplace=True)

In [None]:
scores = pd.concat([df_w_sc,df_l_sc])

In [None]:
test_df = df['WSampleSubmissionStage2']
test_df['Season'] = test_df['ID'].map(lambda x: int(x[:4]))
test_df['WTeamID'] = test_df['ID'].map(lambda x: int(x[5:9]))
test_df['LTeamID'] = test_df['ID'].map(lambda x: int(x[10:14]))
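
> An alternative sketch for parsing the submission IDs, assuming they follow the `Season_Team1_Team2` pattern implied by the slicing above. The ID value and the column names `Team1ID`/`Team2ID` are hypothetical.

```python
import pandas as pd

# Made-up ID in the assumed 'Season_Team1_Team2' layout
sample = pd.DataFrame({"ID": ["2021_3104_3112"]})

# Split on underscores rather than relying on fixed character positions
parts = sample["ID"].str.split("_", expand=True).astype(int)
parts.columns = ["Season", "Team1ID", "Team2ID"]
parsed = pd.concat([sample, parts], axis=1)
```

Splitting on the separator keeps working if an ID component ever changes length, which fixed slices would not.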

Merging dataframes to get the desired columns in one place.

[Documentation](https://pypi.org/project/matplotlib-venn/)

In [None]:
fig = plt.figure(figsize=(8,8))

v1 = venn2(subsets = (3, 1, 2),
          set_labels = ( '', ''),
          set_colors=( 'lightcoral', 'linen'),
           alpha=1)
v1.get_patch_by_id('11').set_color('mistyrose')
v1.get_patch_by_id('11').set_alpha(1)

v2 = venn2_circles(subsets = (3, 1, 2),color='linen')

plt.annotate('Included', xy=v1.get_label_by_id('10').get_position() - np.array([0, -0.05]), xytext=(-80,55),
ha='center', textcoords='offset points', 
arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0.5',color='gray'))

plt.annotate('Not included', xy=v1.get_label_by_id('01').get_position() - np.array([0, -0.05]), xytext=(70,60),
ha='center', textcoords='offset points', 
arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=-0.5',color='gray'))

plt.annotate('Included', xy=v1.get_label_by_id('11').get_position() - np.array([0, -0.05]), xytext=(0,60),
ha='center', textcoords='offset points',
arrowprops=dict(arrowstyle='->', color='gray'))

plt.text(1, 0.05, '''
Left Join clause returns 
- all the rows from the left table 
                AND                
- matched records from the right table OR
- returns Null if no matching record found. 

'''
         , fontsize=12, fontweight='light', fontfamily='serif')

l1 = lines.Line2D([1, 1], [0, 1], transform=fig.transFigure, figure=fig,color='black',lw=0.2)
fig.lines.extend([l1])

plt.title("Left join",fontsize=15, fontweight='bold',horizontalalignment='center',fontfamily='serif')
plt.show()

In [None]:
test_df = pd.merge(test_df, df['WNCAATourneySeeds'], left_on=['Season', 'WTeamID'], right_on=['Season', 'TeamID'], how='left')
test_df.rename(columns={'Seed':'WSeed'}, inplace=True)
test_df = test_df.drop(columns=['Region','TeamID','TeamName'],axis=1)

test_df = pd.merge(test_df, df['WNCAATourneySeeds'], left_on=['Season', 'LTeamID'], right_on=['Season', 'TeamID'], how='left')
test_df.rename(columns={'Seed':'LSeed'}, inplace=True)
test_df = test_df.drop(columns=['Region','TeamID','TeamName'],axis=1)
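
> A tiny sketch of left-join behaviour with made-up tables: every row of the left table survives, and rows without a match get `NaN` in the right-hand columns. The `indicator=True` flag is an addition here that makes unmatched rows easy to spot.

```python
import pandas as pd

# Invented data: two teams appear in matchups, only one has a seed
matchups = pd.DataFrame({"Season": [2021, 2021], "TeamID": [1, 2]})
seeds = pd.DataFrame({"Season": [2021], "TeamID": [1], "Seed": [4]})

# Left join keeps both matchup rows; the unseeded team gets NaN in 'Seed'
joined = matchups.merge(seeds, on=["Season", "TeamID"], how="left", indicator=True)
unmatched = joined[joined["_merge"] == "left_only"]
```

Checking `unmatched` after each merge is a cheap way to catch teams whose seed was not found before the `NaN` reaches the model.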

In [None]:
ws=[]
ls=[]
for ii, row in test_df.iterrows():
    t1 = row['WTeamID']
    t2 = row['LTeamID']
    year = row['Season']
    # .values[0] takes the first listed regular-season score for that team and season
    w_score = scores[(scores.TeamID == t1) & (scores.Season == year)].Score.values[0]
    l_score = scores[(scores.TeamID == t2) & (scores.Season == year)].Score.values[0]
    ws.append(w_score)
    ls.append(l_score)

In [None]:
test_df['WScore'] = ws
test_df['LScore'] = ls

test_df['score_diff'] = test_df.WScore - test_df.LScore
test_df['seed_diff'] = test_df.WSeed - test_df.LSeed

In [None]:
train_df= train_df[['seed_diff','score_diff']]
test_df = test_df[['seed_diff']]

<center><h1>Model training 🛠️</h1></center>

In [None]:
X = train_df['seed_diff'].values.reshape(-1, 1)
y = train_df['score_diff']

In [None]:
train_oof = np.zeros((X.shape[0],))
test_preds = 0

> Logging sklearn plots to **visualize model performance** 🏋️‍♀️

In [None]:
def train_plot_regressor(model,name,v_mse_l,v_mean_mse):
    run = wandb.init(project='ncaaw', name=name+" with KFold")

    mse_l = []
    NUM_FOLDS = 10
    kf = KFold(n_splits=NUM_FOLDS, shuffle=True, random_state=0)
    #fold-averaged test predictions, initialised once outside the loop
    test_preds = 0

    for f, (trn_idx, val_idx) in tqdm(enumerate(kf.split(X, y))):
            print('\nFold {}'.format(f))
            X_train, X_val = X[trn_idx], X[val_idx]
            y_train, y_val = y.iloc[trn_idx], y.iloc[val_idx]

            model.fit(X_train, y_train)
            temp_oof = model.predict(X_val)
            temp_test = model.predict(test_df)
            
            wandb.sklearn.plot_regressor(model, X_train, X_val, y_train, y_val)
            
            train_oof[val_idx] = temp_oof
            test_preds += temp_test/NUM_FOLDS

            #square root of the MSE, i.e. the RMSE (same value as squared=False)
            mse = np.sqrt(mean_squared_error(y_val, temp_oof))
            mse_l.append(mse)
            print("RMSE: ",mse)
            
    mean_mse = np.mean(mse_l, axis=0)
    v_mse_l.append(mse_l)
    v_mean_mse.append(mean_mse)
    print("\nMean RMSE of ",name,mean_mse)
    run.finish()
    return v_mse_l,v_mean_mse

In [None]:
v_mse_l = []
v_mean_mse = []

v_mse_l,v_mean_mse = train_plot_regressor(Ridge(),'Ridge',v_mse_l,v_mean_mse)
v_mse_l,v_mean_mse = train_plot_regressor(KNeighborsRegressor(n_neighbors=99),'K Neighbors Regressor',v_mse_l,v_mean_mse)
v_mse_l,v_mean_mse = train_plot_regressor(RandomForestRegressor(random_state =21),'Random Forest Regressor',v_mse_l,v_mean_mse)
v_mse_l,v_mean_mse = train_plot_regressor(lgb.LGBMRegressor(),'LGBM Regressor',v_mse_l,v_mean_mse)
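
> A stripped-down sketch of the K-fold pattern without W&B, on synthetic data. The arrays `X_demo`, `y_demo`, and `X_new` are invented, and the slope and noise level are arbitrary assumptions. It shows out-of-fold predictions, per-fold RMSE, and test predictions averaged over the folds.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

# Synthetic stand-in for seed_diff -> score_diff (arbitrary slope and noise)
rng = np.random.default_rng(0)
X_demo = rng.integers(-15, 16, size=(200, 1)).astype(float)
y_demo = -1.2 * X_demo[:, 0] + rng.normal(0, 5, size=200)
X_new = np.array([[-15.0], [0.0], [15.0]])

n_splits = 5
oof = np.zeros(len(X_demo))        # one out-of-fold prediction per training row
new_preds = np.zeros(len(X_new))   # initialised once, outside the loop
fold_rmse = []

kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
for trn_idx, val_idx in kf.split(X_demo):
    model = Ridge().fit(X_demo[trn_idx], y_demo[trn_idx])
    oof[val_idx] = model.predict(X_demo[val_idx])
    # each fold contributes an equal share to the averaged prediction
    new_preds += model.predict(X_new) / n_splits
    fold_rmse.append(float(np.sqrt(np.mean((y_demo[val_idx] - oof[val_idx]) ** 2))))

mean_rmse = float(np.mean(fold_rmse))
```

Because `new_preds` is created before the loop, it ends up as the mean of the five fold models rather than the last fold's share.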

In [None]:
run = wandb.init(project='ncaaw', job_type='image-visualization',name='Mean RMSE')
values = v_mean_mse
labels = ["Ridge","K Neighbors Regressor","Random Forest Regressor","LGBM Regressor"]
dt = [[label, val] for (label, val) in zip(labels, values)]
#the stored values are root mean squared errors, so label them as RMSE
table = wandb.Table(data=dt, columns = ["Model name", "Mean RMSE"])
wandb.log({"Mean RMSE" : wandb.plot.bar(table,"Model name", "Mean RMSE",title="Mean RMSE of models")})

run.finish()

run

Here's a snapshot of my [project](https://wandb.ai/ruchi798/ncaaw?workspace=user-ruchi798) ⬇️

<img src="https://i.imgur.com/FYoTeYp.png">

In [None]:
def train_all(model):
    
    #fit model on entire training data
    model.fit(X, y)        
    
    #test predictions
    test_preds = model.predict(test_df)
    
    return test_preds

#choosing LGBM Regressor as the baseline model
test_preds = train_all(lgb.LGBMRegressor())

<center><h1>Creating a submission file 🔖</h1></center>

In [None]:
df_submission = pd.read_csv("../input/ncaaw-march-mania-2021-spread/WDataFiles_Stage2_Spread/WSampleSubmissionStage2.csv")
df_submission['Pred'] = test_preds
df_submission.to_csv('/kaggle/working/Predictions.csv',index=False)

Inspiration:
* [Netflix Data Visualization](https://www.kaggle.com/joshuaswords/netflix-data-visualization)
* [Trends in 2020 with Advice from Top Kagglers](https://www.kaggle.com/iamleonie/trends-in-2020-with-advice-from-top-kagglers)

Illustrations tools:

- [Canva](https://www.canva.com/) 🖌️

<img src="https://i.imgur.com/pl3FhXV.png">