# NCAA to NFL Draft Predictions – Exploratory Data Analysis (EDA)

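This notebook loads 2021 college football (NCAA) passing, receiving, and rushing stats together with 2023 NFL Draft results, inspects them, and begins cleaning the data ahead of visualizing trends related to draft outcomes.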

## 1. Import Libraries

In [7]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# visualization style
sns.set_theme(style="whitegrid")


## 2. Load Data

In [8]:
passing = pd.read_csv('../data/raw/CFB_Passing_2021.csv')
receiving = pd.read_csv('../data/raw/CFB_Receiving_2021.csv')
rushing = pd.read_csv('../data/raw/CFB_Rushing_2021.csv')
draft = pd.read_csv('../data/raw/NFL_Draft_2023.csv')

## 3. Inspect Data
- Look at shape, data types, and missing values.
- Summarize numeric stats.

In [9]:
passing.info()
receiving.info()
rushing.info()
draft.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 532 entries, 0 to 531
Data columns (total 20 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Rk                 532 non-null    int64  
 1   Player             532 non-null    object 
 2   Team               532 non-null    object 
 3   Conf               532 non-null    object 
 4   G                  532 non-null    int64  
 5   Cmp                532 non-null    int64  
 6   Att                532 non-null    int64  
 7   Cmp%               532 non-null    float64
 8   Yds                532 non-null    int64  
 9   TD                 532 non-null    int64  
 10  TD%                532 non-null    float64
 11  Int                532 non-null    int64  
 12  Int%               532 non-null    float64
 13  Y/A                532 non-null    float64
 14  AY/A               532 non-null    float64
 15  Y/C                465 non-null    float64
 16  Y/G                532 non
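
The `info()` output covers data types and non-null counts; for example, `Y/C` has 465 non-null values out of 532 rows, so 67 are missing. The remaining checks listed for this section (shape, per-column missing counts, numeric summary) could be sketched as below. The `summarize` helper and the tiny `demo` frame are hypothetical stand-ins and not part of the original notebook; in practice you would pass `passing`, `receiving`, `rushing`, or `draft`.

```python
import numpy as np
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Collect shape, per-column missing counts, and numeric summary stats."""
    missing = df.isna().sum()
    return {
        "shape": df.shape,
        # Keep only the columns that actually contain missing values
        "missing": missing[missing > 0].to_dict(),
        # describe() summarizes numeric columns only by default
        "numeric_summary": df.describe(),
    }


# Hypothetical stand-in data, not taken from the real CSV files
demo = pd.DataFrame({
    "Player": ["A", "B", "C"],
    "Att": [10, 0, 25],
    "Y/C": [12.5, np.nan, 9.0],
})

report = summarize(demo)
print(report["shape"])
print(report["missing"])
print(report["numeric_summary"])
```

Returning a dictionary rather than printing inside the helper makes it easy to compare the four datasets side by side.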

## 4a. Cleaning Draft Data
- Handle missing values
- Rename columns for consistency
- Drop duplicates

In [None]:
# Clean draft data:
# drop irrelevant columns, then rename the ambiguous stat columns for clarity

# Work on a copy so the raw `draft` frame stays untouched
draft_clean = draft.copy()

# Drop empty or irrelevant columns.
# errors='ignore' silently skips any listed name that is not in the frame.
draft_clean = draft_clean.drop(columns=[
    'To', 'AP1', 'PB', 'St', 'wAV', 'DrAV', 'G',
    'Cmp', 'Rec', 'Yds.2', 'TD.2',
    'Solo', 'Int.1', 'Sk', 'Unnamed: 28', '-9999'
    ], errors='ignore')

# Rename the remaining stat columns. pandas suffixes repeated CSV headers
# with .1, .2, ..., so 'Att' is passing attempts and 'Att.1' is rushing attempts.
# Columns dropped above (Cmp, Rec, Yds.2, TD.2, Int.1) need no rename.
draft_clean = draft_clean.rename(columns={
    'Att': 'Pass_Att',
    'Yds': 'Pass_Yds',
    'TD': 'Pass_TD',
    'Int': 'Pass_Int',
    'Att.1': 'Rush_Att',
    'Yds.1': 'Rush_Yds',
    'TD.1': 'Rush_TD',
})

# Check result
draft_clean.head()

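| Unnamed: 0 | Rnd | Pick | Tm | Player | Pos | Age | Pass_Att | Pass_Yds | Pass_TD | Pass_Int | Rush_Att | Rush_Yds | Rush_TD | College/Univ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 1 | CAR | Bryce Young | QB | 22.0 | 1055.0 | 6033.0 | 31.0 | 22.0 | 92.0 | 555.0 | 7.0 | Alabama |
| 1 | 1 | 2 | HOU | C.J. Stroud | QB | 21.0 | 1148.0 | 8667.0 | 47.0 | 20.0 | 108.0 | 492.0 | 3.0 | Ohio St. |
| 2 | 1 | 3 | HOU | Will Anderson | LB | 22.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Alabama |
| 3 | 1 | 4 | IND | Anthony Richardson | QB | 21.0 | 348.0 | 2391.0 | 11.0 | 13.0 | 115.0 | 634.0 | 10.0 | Florida |
| 4 | 1 | 5 | SEA | Devon Witherspoon | DB | 22.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Illinois |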
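
The cleaning cell handles dropping and renaming columns. The other two steps listed for this section, handling missing values and dropping duplicates, might look like the sketch below. It rests on two assumptions the notebook does not confirm: that a missing stat can be treated as 0 (a player with no recorded passes threw none), and that a duplicate row is one repeating the same `Player` and `Pick`. The `finish_cleaning` helper and the `demo` frame are hypothetical.

```python
import numpy as np
import pandas as pd


def finish_cleaning(df: pd.DataFrame, stat_cols: list) -> pd.DataFrame:
    """Fill missing stats with 0 and drop repeated (Player, Pick) rows."""
    out = df.copy()
    # Only touch the stat columns that are actually present
    present = [c for c in stat_cols if c in out.columns]
    if present:
        out[present] = out[present].fillna(0)
    # Assumption: Player + Pick uniquely identifies a draft selection
    out = out.drop_duplicates(subset=["Player", "Pick"]).reset_index(drop=True)
    return out


# Hypothetical stand-in rows, not real draft data
demo = pd.DataFrame({
    "Pick": [1, 2, 2, 3],
    "Player": ["A", "B", "B", "C"],
    "Pass_Att": [100.0, 50.0, 50.0, np.nan],
    "Rush_Att": [np.nan, 5.0, 5.0, 20.0],
})

cleaned = finish_cleaning(demo, ["Pass_Att", "Rush_Att"])
print(cleaned)
```

With the notebook's data this would be called as `finish_cleaning(draft_clean, [...])` with the `Pass_*` and `Rush_*` columns. Whether 0 is the right fill value depends on how the source encodes "no stats", so it is worth checking before applying.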


## 5. Next Steps
- Clean passing, rushing, and receiving datasets
- Merge college data with draft data
- Begin exploratory visualizations (e.g., pick vs. yards)
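
As a starting point for the merge step, here is a minimal sketch. It assumes the player name is the join key, which the notebook does not confirm. Names often differ slightly between sources, so the hypothetical `normalize_name` helper applies light cleanup first. The `college` and `picks` frames hold made-up rows standing in for the cleaned college stats and `draft_clean`.

```python
import pandas as pd


def normalize_name(name: str) -> str:
    """Lowercase, remove periods, and trim whitespace (hypothetical helper)."""
    return name.lower().replace(".", "").strip()


# Made-up stand-in rows, not real player data
college = pd.DataFrame({
    "Player": ["A.J. Smith", "Bo Jones", "Cy Lee"],
    "Yds": [3000, 2500, 1200],
})
picks = pd.DataFrame({
    "Player": ["AJ Smith", "Bo Jones"],
    "Pick": [1, 2],
})

# Build a normalized join key on both sides
college["key"] = college["Player"].map(normalize_name)
picks["key"] = picks["Player"].map(normalize_name)

# Left join keeps every college player; indicator=True records match status
merged = college.merge(picks[["key", "Pick"]], on="key", how="left", indicator=True)
merged["Drafted"] = merged["_merge"] == "both"
merged = merged.drop(columns=["key", "_merge"])
print(merged)
```

A left join keeps undrafted college players in the result, so the `Drafted` flag can later serve as a label, and `Pick` stays missing for players who were not selected. Matching on name alone can mis-join players who share a name; adding the school as a second key is one possible refinement.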
