
# **ANALYZING NFL PLAYERS DATASET**

## 1. BUSINESS UNDERSTANDING

#### a) PERSPECTIVE

In the world of professional American football, understanding players' **physical and demographic attributes** is essential for making informed decisions in coaching, recruitment, player development, and sports analytics. Each player's position demands a different set of physical characteristics, and patterns in height, weight, and age can offer insights into the strategies behind team formation and talent scouting.

This project explores the **NFL player dataset** to uncover patterns in player profiles, including physical measurements, age, and educational background. Such exploratory analysis helps establish a baseline understanding of the kinds of athletes found in each position, which schools produce NFL talent, and how player size and age vary by position.

This type of analysis also supports **sports scientists, coaches, and analysts** in:

• Comparing physical attributes across positions

• Identifying outliers or unique players

• Understanding talent pipelines (e.g., top colleges)

#### b) OBJECTIVES

The main objective of this project is to perform a thorough **Exploratory Data Analysis (EDA)** of NFL players using structured, interpretable visual and statistical tools.

This analysis aims to:

• Understand the **distribution of player positions**

• Examine **average height and weight** and how they vary by position

• Calculate player **ages** from birthdates

• Analyze the **educational background** by identifying colleges producing the most NFL athletes

Key goals include:

• Identifying **position-specific physical patterns**

• Providing a **visual profile** of player attributes

• Highlighting potential **outliers or unique cases**

• Building a **strong foundation** for future predictive modelling tasks (e.g., injury risk, performance, or draft value)

This project also helps aspiring sports data scientists build domain familiarity and practise data storytelling.
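The aims listed above can be sketched in a few lines of pandas. This is a minimal illustration, not the analysis itself: it uses only the first five rows of `players.csv` (as previewed in the Data Understanding section), assumes `height` is already numeric, and measures age against a hypothetical reference date of 2021-01-01, which is an assumption and not something the dataset specifies.

```python
import pandas as pd

# First five rows of players.csv, copied from the preview in section 2.
players = pd.DataFrame({
    "height": [72, 70, 69, 73, 75],
    "weight": [190, 186, 186, 227, 232],
    "birthDate": ["1990-09-10", "1988-11-01", "1991-12-18",
                  "1994-11-04", "1993-07-01"],
    "collegeName": ["Washington", "Southeastern Louisiana", "Purdue",
                    "Louisiana State", "Minnesota"],
    "position": ["CB", "CB", "SS", "MLB", "OLB"],
})

# Distribution of player positions.
position_counts = players["position"].value_counts()

# Average height (inches) and weight (pounds) per position.
size_by_position = players.groupby("position")[["height", "weight"]].mean()

# Age in whole years on an ASSUMED reference date (hypothetical choice).
REFERENCE_DATE = pd.Timestamp("2021-01-01")
birth = pd.to_datetime(players["birthDate"])
had_birthday = (birth.dt.month < REFERENCE_DATE.month) | (
    (birth.dt.month == REFERENCE_DATE.month)
    & (birth.dt.day <= REFERENCE_DATE.day)
)
players["age"] = REFERENCE_DATE.year - birth.dt.year - (~had_birthday).astype(int)

# Colleges ranked by number of players.
top_colleges = players["collegeName"].value_counts().head(3)

print(position_counts, size_by_position, players[["birthDate", "age"]], sep="\n\n")
```

On the full dataset the same calls apply once `height` and `birthDate` have been converted to numeric and datetime types.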


In [1]:
import pandas as pd
import numpy as np

## 2. DATA UNDERSTANDING

In [3]:
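# Load the player roster; the path assumes the file was uploaded to the Colab session
Data = pd.read_csv("/content/players.csv")
# Preview the first five rows
Data.head()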

index,nflId,height,weight,birthDate,collegeName,position,displayName
0,2539334,72,190,1990-09-10,Washington,CB,Desmond Trufant
1,2539653,70,186,1988-11-01,Southeastern Louisiana,CB,Robert Alford
2,2543850,69,186,1991-12-18,Purdue,SS,Ricardo Allen
3,2555162,73,227,1994-11-04,Louisiana State,MLB,Deion Jones
4,2555255,75,232,1993-07-01,Minnesota,OLB,De'Vondre Campbell


In [4]:
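# Column dtypes, non-null counts, and memory footprint
Data.info()
# Summary statistics for numeric and non-numeric columns alike
Data.describe(include="all")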

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1303 entries, 0 to 1302
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   nflId        1303 non-null   int64 
 1   height       1303 non-null   object
 2   weight       1303 non-null   int64 
 3   birthDate    1303 non-null   object
 4   collegeName  1303 non-null   object
 5   position     1303 non-null   object
 6   displayName  1303 non-null   object
dtypes: int64(2), object(5)
memory usage: 71.4+ KB


statistic,nflId,height,weight,birthDate,collegeName,position,displayName
count,1303.0,1303.0,1303.0,1303,1303,1303,1303
unique,,29.0,,1150,251,21,1298
top,,73.0,,1989-04-26,Alabama,WR,Isaiah Johnson
freq,,159.0,,4,33,228,2
mean,2416518.0,,222.537222,,,,
std,533333.5,,29.476747,,,,
min,252.0,,159.0,,,,
25%,2539662.0,,200.0,,,,
50%,2553658.0,,216.0,,,,
75%,2558184.0,,242.0,,,,
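The summary above reports 1,298 unique `displayName` values across 1,303 rows, with "Isaiah Johnson" appearing twice. A repeated name may be two different players or a duplicated record, so it is worth checking against `nflId`. The sketch below shows one way to do that on toy data; the two "Isaiah Johnson" rows, their ids, and their positions are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy rows. The last two ids and positions are HYPOTHETICAL, made up
# only to demonstrate the check; they are not values from players.csv.
players = pd.DataFrame({
    "nflId": [2539334, 2539653, 1001, 1002],
    "displayName": ["Desmond Trufant", "Robert Alford",
                    "Isaiah Johnson", "Isaiah Johnson"],
    "position": ["CB", "CB", "CB", "WR"],
})

# Every row whose name occurs more than once (keep=False marks all copies).
dup_names = players[players.duplicated("displayName", keep=False)]

# If ids are unique, repeated names point to distinct players, not duplicate rows.
ids_unique = players["nflId"].is_unique

print(dup_names)
print("nflId unique:", ids_unique)
```

The same two expressions can be run on `Data` directly to decide whether any rows need to be dropped.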


In [5]:
Data.dtypes

column,dtype
nflId,int64
height,object
weight,int64
birthDate,object
collegeName,object
position,object
displayName,object
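The dtypes above show `height` and `birthDate` stored as `object`, so neither can be averaged or used in date arithmetic as loaded. The output does not say why `height` is non-numeric; one possible cause, assumed here, is a mix of plain inches ("72") and feet-inches strings ("6-1"). The helper name `height_to_inches` and the "6-1" sample value are hypothetical, and this is a sketch of one conversion approach, not necessarily the one used in this notebook.

```python
import pandas as pd

def height_to_inches(value):
    """Convert a height given as inches ("72") or feet-inches ("6-2") to whole inches."""
    text = str(value).strip()
    if "-" in text:
        feet, inches = text.split("-")
        return int(feet) * 12 + int(inches)
    return int(float(text))

sample = pd.DataFrame({
    # "6-1" is a HYPOTHETICAL mixed-format entry used to exercise the helper.
    "height": ["72", "70", "6-1"],
    "birthDate": ["1990-09-10", "1988-11-01", "1994-11-04"],
})

# Normalise heights to integer inches.
sample["height"] = sample["height"].map(height_to_inches)

# Parse dates; unparseable strings become NaT instead of raising.
sample["birthDate"] = pd.to_datetime(sample["birthDate"], errors="coerce")

print(sample.dtypes)
```

Using `errors="coerce"` makes bad dates visible as `NaT`, which can then be counted with `isna().sum()` before deciding how to handle them.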


## 3. DATA CLEANING