# NCAA to NFL Draft Predictions – Exploratory Data Analysis (EDA)

This notebook explores NCAA player stats, cleans the data, and visualizes key trends related to NFL Draft outcomes.

## Environment Check

In [3]:
import sys
import pandas as pd
import numpy as np
import matplotlib
import seaborn as sns
import sklearn
import statsmodels.api as sm

print("Python version:", sys.version)
print("Pandas:", pd.__version__)
print("NumPy:", np.__version__)
print("Matplotlib:", matplotlib.__version__)
print("Seaborn:", sns.__version__)
print("scikit-learn:", sklearn.__version__)
print("Statsmodels:", sm.__version__)


Python version: 3.10.18 (main, Jun  5 2025, 08:37:47) [Clang 14.0.6 ]
Pandas: 2.3.2
NumPy: 2.0.1
Matplotlib: 3.10.6
Seaborn: 0.13.2
scikit-learn: 1.7.2
Statsmodels: 0.14.5


## 1. Import Libraries

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Settings for cleaner visuals
sns.set_theme(style="whitegrid")  # set_theme is the current name; sns.set is an older alias
plt.rcParams["figure.figsize"] = (10, 6)

## 2. Load Data

In [None]:
# Example: load sample data (replace with actual NCAA stats later)
df = pd.read_csv("data/raw/sample_cfb_stats.csv")

df.head()

## 3. Inspect Data
- Look at shape, data types, and missing values.
- Summarize numeric stats.

In [None]:
print("Shape:", df.shape)
print("\nData Types:\n", df.dtypes)

print("\nMissing Values:\n", df.isnull().sum())

df.describe().T
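
Raw missing-value counts are easier to act on when shown next to the share of rows affected. A minimal sketch, assuming a small stand-in frame (`sample` and its column names are hypothetical, not the real NCAA data):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data; column names are illustrative only.
sample = pd.DataFrame({
    "player": ["A", "B", "C", "D"],
    "passing_yards": [3200.0, np.nan, 2100.0, 2850.0],
    "draft_round": [1.0, 3.0, np.nan, np.nan],
})

# Count and percentage of missing values per column, worst first.
missing_summary = (
    pd.DataFrame({
        "n_missing": sample.isna().sum(),
        "pct_missing": sample.isna().mean().mul(100).round(1),
    })
    .sort_values("pct_missing", ascending=False)
)
print(missing_summary)
```

Sorting by `pct_missing` puts the columns most in need of a cleaning decision at the top.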

## 4. Data Cleaning
- Handle missing values
- Rename columns for consistency
- Drop duplicates

In [None]:
# Example cleaning steps
df = df.dropna()               # drop every row that has at least one missing value (aggressive)
df = df.drop_duplicates()      # remove exact duplicate rows

# Normalize column names to snake_case (e.g. "Passing Yards" -> "passing_yards")
df.columns = [col.strip().lower().replace(" ", "_") for col in df.columns]
df.head()
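
A blanket `dropna()` can discard most of a sparse dataset, so it may be worth handling missing values column by column. The sketch below is one possible approach, not the project's chosen method. The frame `raw`, its column names, the assumption that a missing draft round means "undrafted", and the median fill are all hypothetical choices for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with messy headers, a duplicate row, and gaps.
raw = pd.DataFrame({
    "Player ": ["A", "B", "B", "C", None],
    "Passing Yards": [3200.0, np.nan, np.nan, 2100.0, 1500.0],
    "Draft Round": [1.0, 3.0, 3.0, np.nan, 2.0],
})

cleaned = (
    raw
    # Normalize headers to snake_case.
    .rename(columns=lambda c: c.strip().lower().replace(" ", "_"))
    # Remove exact duplicate rows.
    .drop_duplicates()
    # Drop rows only when the identifying column is missing.
    .dropna(subset=["player"])
    .assign(
        # Assumption: a missing draft round means the player went undrafted.
        was_drafted=lambda d: d["draft_round"].notna(),
        # Assumption: filling stat gaps with the column median is acceptable here.
        passing_yards=lambda d: d["passing_yards"].fillna(d["passing_yards"].median()),
    )
)
print(cleaned)
```

Keeping undrafted players as rows, rather than dropping them, preserves the negative examples a draft model would need.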

## 5. Exploratory Visualizations
- Distribution of key stats
- Correlations between performance and draft position

In [None]:
# Example histogram
sns.histplot(df["passing_yards"], bins=30, kde=True)
plt.title("Distribution of Passing Yards")
plt.show()

# Example correlation heatmap (numeric columns only; pandas 2.x raises on string columns otherwise)
sns.heatmap(df.corr(numeric_only=True), cmap="coolwarm", annot=False)
plt.title("Correlation Heatmap")
plt.show()
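
To look at correlations between performance and draft position directly, it can help to pull out a single column of the correlation matrix and rank it. A minimal sketch, assuming a hypothetical `draft_pick` column (overall pick number) and made-up values. Spearman rank correlation is used here as an assumption because pick number is ordinal:

```python
import pandas as pd

# Hypothetical data; "draft_pick" and the stat columns are assumed names.
sample = pd.DataFrame({
    "player": ["A", "B", "C", "D", "E"],
    "passing_yards": [4100, 3600, 3000, 2500, 1900],
    "interceptions": [6, 9, 8, 12, 14],
    "draft_pick": [3, 20, 45, 110, 200],
})

# Keep numeric columns only, then rank each stat by its correlation with pick number.
numeric = sample.select_dtypes(include="number")
corr_with_pick = (
    numeric.corr(method="spearman")["draft_pick"]
    .drop("draft_pick")
    .sort_values()
)
print(corr_with_pick)
```

Because a lower pick number means an earlier selection, a negative value indicates that a higher stat goes with being drafted earlier.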

## 6. Initial Insights
- Note any interesting patterns (e.g., higher passing yards correlating with earlier draft rounds).
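
One way to check a pattern like this is to summarize a stat per draft round. The sketch below uses made-up numbers and assumed column names (`draft_round`, `passing_yards`), so it shows the technique rather than any real finding:

```python
import pandas as pd

# Hypothetical values for illustration only.
sample = pd.DataFrame({
    "draft_round": [1, 1, 2, 2, 3, 3],
    "passing_yards": [4100, 3800, 3500, 3100, 2600, 2900],
})

# Per-round sample size and central tendency of passing yards.
by_round = (
    sample.groupby("draft_round")["passing_yards"]
    .agg(["count", "median", "mean"])
)
print(by_round)
```

Including `count` alongside the averages makes it clear when a round has too few players to support a conclusion.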