# March Machine Learning Madness 2024: 
### Predicting NCAA Basketball Tournament Results
##### By David Hartsman

<hr style="border: 4px solid blue">

In [1]:
# Imports 

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

import os

### Exploration of the Various Data Files
There are 32 separate data sources to explore and understand. I will begin the project by inspecting the contents of these files to understand the information they contain. The first file contains this year's (2024) tournament seeds.

<hr style="border: 2px solid blue">

##### Seeds:
This data contains 128 rows, 64 for each of the men's and women's tournaments. The columns are:
- `Tournament`, which designates M/W for men or women,
- `Seed`, which contains the bracket region and numeric seed, and
- `TeamID`, which contains what may be a unique identifier for merging

In [17]:
path = '/Users/samalainabayeva/Desktop/FLAT_IRON!!!/NCAA_KAGGLE/march-machine-learning-mania-2024'

In [20]:
# Load data
tournament_seeds = pd.read_csv(os.path.join(path, '2024_tourney_seeds.csv'))

# Inspect the DF contents
tournament_seeds.shape, tournament_seeds.head()

((128, 3),
   Tournament Seed  TeamID
 0          M  W01    1345
 1          M  W02    1266
 2          M  W03    1243
 3          M  W04    1397
 4          M  W05    1181)
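
The `Seed` strings shown above pack the bracket region and the numeric seed into one value. As a minimal sketch, assuming every seed starts with a single region letter followed by a two-digit number (as in `W01`), the column could be split as follows. The helper name `split_seed` and the toy frame are hypothetical and are not part of the original notebook; the pattern only anchors at the start of the string, so any trailing characters would be ignored.

```python
import pandas as pd

def split_seed(seeds: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with `Region` and `SeedNum` columns parsed from `Seed`."""
    # Named groups become the column names of the extracted frame
    parts = seeds["Seed"].str.extract(r"^(?P<Region>[A-Z])(?P<SeedNum>\d{2})")
    out = seeds.copy()
    out["Region"] = parts["Region"]
    out["SeedNum"] = parts["SeedNum"].astype(int)
    return out

# Toy rows copied from the head() output; not the full file
toy_seeds = pd.DataFrame({
    "Tournament": ["M", "M", "M"],
    "Seed": ["W01", "W02", "W05"],
    "TeamID": [1345, 1266, 1181],
})
parsed = split_seed(toy_seeds)
print(parsed)
```

Keeping the region and the number in separate columns would make it easier to compare seeds numerically later on, if that turns out to be useful.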

##### Cities: 
This df has 483 rows and 3 columns. It contains:
- `CityID`, a key which may be a unique identifier that can be used for merging tables,
- `City` name, and
- `State` name

In [22]:
# Load data
cities = pd.read_csv(os.path.join(path, 'Cities.csv'))

# Inspect the DF contents
cities.shape, cities

((483, 3),
      CityID          City State
 0      4001       Abilene    TX
 1      4002         Akron    OH
 2      4003        Albany    NY
 3      4004   Albuquerque    NM
 4      4005     Allentown    PA
 ..      ...           ...   ...
 478    4501        Destin    FL
 479    4502  South Easton    MA
 480    4503      Cherokee    NC
 481    4504    St. Joseph    MO
 482    4505         Paris    FR
 
 [483 rows x 3 columns])
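
Whether `CityID` really is a unique identifier can be checked rather than assumed. The sketch below is one possible check, run on a toy frame instead of the real file; the helper name `is_valid_key` is hypothetical.

```python
import pandas as pd

def is_valid_key(df: pd.DataFrame, column: str) -> bool:
    """True when `column` has no missing values and no duplicates."""
    return bool(df[column].notna().all() and df[column].is_unique)

# Toy rows copied from the output above; not the full file
toy_cities = pd.DataFrame({
    "CityID": [4001, 4002, 4003],
    "City": ["Abilene", "Akron", "Albany"],
    "State": ["TX", "OH", "NY"],
})

unique_result = is_valid_key(toy_cities, "CityID")
# Same frame with a deliberately duplicated ID, to show the failing case
duplicate_result = is_valid_key(toy_cities.assign(CityID=[4001, 4001, 4003]), "CityID")
print(unique_result, duplicate_result)
```

Running the same function on the loaded `cities` frame would confirm or refute the "may be unique" assumption before any merge relies on it.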

##### Conferences:
51 rows and 2 columns, with one row for each distinct conference:
- `ConfAbbrev`, containing a lowercase abbreviation, and
- `Description`, containing the full-length conference name.

In [23]:
# Load Data
conferences = pd.read_csv(os.path.join(path, 'Conferences.csv'))

# Inspect Data
conferences.shape, conferences.head()

((51, 2),
   ConfAbbrev                   Description
 0      a_sun       Atlantic Sun Conference
 1      a_ten        Atlantic 10 Conference
 2        aac  American Athletic Conference
 3        acc     Atlantic Coast Conference
 4        aec       America East Conference)

##### Men's Conference Tourney Games:
- `Season`: year of record
- `ConfAbbrev`: same as the `ConfAbbrev` column in the "conferences" df
- `DayNum`: an integer that always ranges from 0 to 132; day 132 is always Selection Sunday, and day 0 is the earliest day on the scale, which is shared by all seasons in the data
- `WTeamID`: winning team ID, for merging data
- `LTeamID`: losing team ID, also for merging data

In [25]:
# Load Data
mens_games = pd.read_csv(os.path.join(path, 'MConferenceTourneyGames.csv'))

# Inspect Data
mens_games.shape, mens_games.head(6)

((6189, 5),
    Season ConfAbbrev  DayNum  WTeamID  LTeamID
 0    2001      a_sun     121     1194     1144
 1    2001      a_sun     121     1416     1240
 2    2001      a_sun     122     1209     1194
 3    2001      a_sun     122     1359     1239
 4    2001      a_sun     122     1391     1273
 5    2001      a_sun     122     1407     1416)
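
Since `ConfAbbrev` matches the column of the same name in the conferences table, the full conference name can be attached to each game with a merge, and the stated `DayNum` range can be verified at the same time. This is a minimal sketch on toy frames built from the outputs above; the names `toy_conferences`, `toy_games`, `games_named`, and `daynum_ok` are hypothetical.

```python
import pandas as pd

# Toy lookup table: one row per conference
toy_conferences = pd.DataFrame({
    "ConfAbbrev": ["a_sun", "a_ten"],
    "Description": ["Atlantic Sun Conference", "Atlantic 10 Conference"],
})

# Toy games copied from the head() output; not the full file
toy_games = pd.DataFrame({
    "Season": [2001, 2001, 2001],
    "ConfAbbrev": ["a_sun", "a_sun", "a_sun"],
    "DayNum": [121, 121, 122],
    "WTeamID": [1194, 1416, 1209],
    "LTeamID": [1144, 1240, 1194],
})

# Left join keeps every game; validate raises if ConfAbbrev repeats in the lookup
games_named = toy_games.merge(
    toy_conferences, on="ConfAbbrev", how="left", validate="many_to_one"
)

# Check the stated DayNum range (0 through 132, inclusive)
daynum_ok = bool(games_named["DayNum"].between(0, 132).all())

print(games_named[["Season", "DayNum", "Description"]])
print(daynum_ok)
```

Using `how="left"` means a game with an unknown abbreviation would show up with a missing `Description` instead of silently disappearing.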

##### Men's City Game Locations:

In [27]:
# Load Data
mens_city_games = pd.read_csv(os.path.join(path, 'MGameCities.csv'))

# Inspect Data
mens_city_games.shape, mens_city_games.head()

((80170, 6),
    Season  DayNum  WTeamID  LTeamID   CRType  CityID
 0    2010       7     1143     1293  Regular    4027
 1    2010       7     1314     1198  Regular    4061
 2    2010       7     1326     1108  Regular    4080
 3    2010       7     1393     1107  Regular    4340
 4    2010       9     1143     1178  Regular    4027)