# NCAA to NFL Draft Predictions – Exploratory Data Analysis (EDA)

This notebook loads 2021 college football (CFB) passing, receiving, and rushing stats together with the 2023 NFL Draft results, cleans the data, and prepares it for visualizing trends related to draft outcomes.

## 1. Import Libraries

In [6]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# visualization style
sns.set_theme(style="whitegrid")


## 2. Load Data

In [7]:
passing = pd.read_csv('../data/raw/CFB_Passing_2021.csv')
receiving = pd.read_csv('../data/raw/CFB_Receiving_2021.csv')
rushing = pd.read_csv('../data/raw/CFB_Rushing_2021.csv')
draft = pd.read_csv('../data/raw/NFL_Draft_2023.csv')

## 3. Inspect Data
- Look at shape, data types, and missing values.
- Summarize numeric stats.

In [None]:
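# Print the shape, dtypes, and non-null counts for each dataset, with a label
for name, df in [('passing', passing), ('receiving', receiving),
                 ('rushing', rushing), ('draft', draft)]:
    print(f'--- {name}: {df.shape[0]} rows x {df.shape[1]} columns ---')
    df.info()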
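
`info()` reports dtypes and non-null counts. The other items in the list above (missing values per column and numeric summaries) could be covered with something like the sketch below. The `summarize` helper and the toy frame are hypothetical and only illustrate the idea; in the notebook the calls would be `summarize(passing)`, `summarize(draft)`, and so on.

```python
import numpy as np
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, and missing share."""
    missing = df.isna().sum()
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": missing,
        "pct_missing": (missing / len(df) * 100).round(1),
    })


# Toy stand-in for one of the loaded tables (hypothetical values)
toy = pd.DataFrame({
    "Player": ["A", "B", "C", "D"],
    "Yds": [3000.0, np.nan, 1500.0, 2200.0],
    "TD": [25, 10, 8, 17],
})

report = summarize(toy)
print(toy.shape)       # (rows, columns)
print(report)
print(toy.describe())  # count, mean, std, and quartiles for numeric columns
```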

## 4a. Cleaning Draft Data
- Handle missing values
- Rename columns for consistency
- Drop duplicates

In [None]:
# Clean draft data: drop irrelevant columns and rename
# the duplicated stat columns so each name is unambiguous

# Make a copy to avoid overwriting the original
draft_clean = draft.copy()

# Drop empty or irrelevant columns (errors='ignore' skips any name not present)
draft_clean = draft_clean.drop(columns=[
    'To', 'AP1', 'PB', 'St', 'wAV', 'DrAV', 'Unnamed: 28', '-9999'
], errors='ignore')

# Rename ambiguous stat columns (pandas appends .1, .2 to repeated header names)
draft_clean = draft_clean.rename(columns={
    'Cmp': 'Pass_Cmp',
    'Att': 'Pass_Att',
    'Yds': 'Pass_Yds',
    'TD': 'Pass_TD',
    'Int': 'Pass_Int',
    'Att.1': 'Rush_Att',
    'Yds.1': 'Rush_Yds',
    'TD.1': 'Rush_TD',
    'Rec': 'Rec_Rec',
    'Yds.2': 'Rec_Yds',
    'TD.2': 'Rec_TD',
    'Int.1': 'Def_Int',
})

# Check result
draft_clean.head()
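
The cell above drops and renames columns, but the other two items in the section list (missing values and duplicates) are not handled yet. The sketch below shows one possible approach. It assumes that a missing value in a stat column means the player recorded nothing in that category, so filling with `0` is reasonable; that assumption should be checked against the data before use. The toy frame is hypothetical and stands in for `draft_clean`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for draft_clean (hypothetical rows, not real draft data)
toy_draft = pd.DataFrame({
    "Player": ["A", "B", "B", "C"],
    "Pass_Yds": [3200.0, np.nan, np.nan, np.nan],
    "Rush_Yds": [np.nan, 850.0, 850.0, np.nan],
})

# Drop exact duplicate rows, keeping the first occurrence
deduped = toy_draft.drop_duplicates().reset_index(drop=True)

# Assumption: NaN in a stat column means "no stats recorded", so fill with 0
stat_cols = [c for c in deduped.columns
             if c.startswith(("Pass_", "Rush_", "Rec_"))]
deduped[stat_cols] = deduped[stat_cols].fillna(0)

print(deduped)
```

Filling with `0` changes averages and correlations, so leaving the values as `NaN` may be the safer choice if a missing entry could also mean "not recorded".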

## 4b. Cleaning Passing Data

In [None]:
print("Next step")

## 4c. Cleaning Receiving Data

In [None]:
print("Next step")

## 4d. Cleaning Rushing Data

In [None]:
print("Next step")
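
The placeholder cells in 4b through 4d could eventually share one helper, since the three college tables come from the same kind of source. The sketch below is only a starting point: `clean_stats` is a hypothetical name, the `Player` and `Team` text columns are assumed, and it has not been confirmed that the CSV files contain stray header whitespace or numbers stored as strings with thousands separators.

```python
import pandas as pd


def clean_stats(df: pd.DataFrame, text_cols=("Player", "Team")) -> pd.DataFrame:
    """Tidy headers, drop empty columns, and coerce stat columns to numbers."""
    out = df.copy()
    # Remove leading/trailing whitespace from column names
    out.columns = out.columns.str.strip()
    # Drop columns that contain no data at all
    out = out.dropna(axis=1, how="all")
    for col in out.columns:
        if col in text_cols:
            continue
        # Remove thousands separators, then convert; unparseable values become NaN
        as_text = out[col].astype(str).str.replace(",", "", regex=False)
        out[col] = pd.to_numeric(as_text, errors="coerce")
    return out


# Toy stand-in for a raw college stats table (hypothetical values)
toy = pd.DataFrame({
    " Player": ["A", "B"],
    "Yds ": ["1,234", "987"],
    "Empty": [None, None],
})
cleaned = clean_stats(toy)
print(cleaned.dtypes)
```

In the notebook this would be applied as `passing_clean = clean_stats(passing)`, with `text_cols` adjusted to whatever non-numeric columns the files actually contain.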

## 5. Next Steps
- Clean passing, rushing, and receiving datasets
- Run EDA on the cleaned college data
- Begin exploratory visualizations (e.g., pick vs. yards)
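
As a starting point for the last item, a pick-versus-yards scatter plot might look like the sketch below. The column name `Pick` is an assumption about the draft table and has not been confirmed; `Rec_Yds` comes from the rename in section 4a. The function name and the toy values are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_pick_vs_yards(df: pd.DataFrame, yards_col: str = "Rec_Yds"):
    """Scatter draft pick number against a yardage column and return the axes."""
    fig, ax = plt.subplots(figsize=(7, 4))
    ax.scatter(df["Pick"], df[yards_col], alpha=0.7)
    ax.set_xlabel("Draft pick (lower is earlier)")
    ax.set_ylabel(yards_col)
    ax.set_title(f"Pick vs. {yards_col}")
    return ax


# Toy stand-in for draft_clean (hypothetical values)
toy = pd.DataFrame({
    "Pick": [3, 20, 75, 150],
    "Rec_Yds": [1200, 950, 700, 400],
})
ax = plot_pick_vs_yards(toy)
```

Passing `yards_col="Pass_Yds"` or `yards_col="Rush_Yds"` would reuse the same plot for the other stat groups.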
