# Baby Project
## Players Data Analysis

In this notebook, we’ll explore the **players_data-2024_2025.csv** dataset.


### Loading the Dataset
We import the packages we need and load the dataset up front.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Loading the dataset
df = pd.read_csv('players_data-2024_2025.csv')

# Display first rows
df.head()

### Checking for Missing Data
Now we check for null values in the dataset.

In [2]:
df.isnull().sum()
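Raw null counts are easier to judge as a share of all rows. The sketch below is one possible way to report that; `missing_summary` is a hypothetical helper, and the `demo` frame is made-up data used only so the example runs on its own.

```python
import pandas as pd

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return null count and percentage per column, most-missing first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    # Keep only the columns that actually contain nulls
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

# Made-up illustration data, not taken from the players CSV
demo = pd.DataFrame({
    "player_name": ["A", "B", "C", "D"],
    "goals": [1, None, 3, None],
    "assists": [0, 2, None, 1],
})
report = missing_summary(demo)
print(report)
```

In the notebook, the same helper could be called as `missing_summary(df)`.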



### Summary Statistics
`df.describe()` summarizes the numeric columns: count, mean, standard deviation, minimum, quartiles, and maximum.

In [3]:
df.describe()

### Cleaning the Duplicate Data
We drop duplicates to keep only unique rows. If there's a timestamp or date column, we can convert it to datetime as well.

In [4]:
# Drop fully duplicated rows (reassigning is generally preferred over inplace=True)
df = df.drop_duplicates()

# Example: if there's a match_date column
# df['match_date'] = pd.to_datetime(df['match_date'])

df.info()
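Before dropping anything, it can help to check how many rows are exact duplicates. A minimal sketch, using a made-up `demo` frame in place of the real dataset:

```python
import pandas as pd

# Made-up illustration data; the third row repeats the first
demo = pd.DataFrame({
    "player_name": ["A", "B", "A", "C"],
    "goals": [1, 2, 1, 0],
})

# duplicated() flags every repeat of an earlier row
n_dupes = int(demo.duplicated().sum())

# Drop the repeats and renumber the index from 0
deduped = demo.drop_duplicates().reset_index(drop=True)

print(f"{n_dupes} duplicate row(s) removed, {len(deduped)} rows remain")
```

The same two calls work on `df`. If rows should count as duplicates when only some columns match, `drop_duplicates` accepts a `subset` argument.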

### Creating Data Visualizations
We’ll create some graphs to visually explore the players dataset.

#### Graph 1 (Top 10 Players)
This example counts how many rows each player appears in; you can instead rank players by total goals, minutes, and so on.

In [5]:
top_players = df['player_name'].value_counts().head(10)

plt.figure(figsize=(12,6))
top_players.plot(kind='bar', color='blue')
plt.xlabel('Player Name')
plt.ylabel('Count')
plt.title('Top 10 Players')
plt.xticks(rotation=45)
plt.show()
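Ranking by a total instead of a row count could look like the sketch below. It assumes a numeric `goals` column exists, which this notebook has not confirmed, and uses a made-up `demo` frame so the example is self-contained.

```python
import pandas as pd

# Made-up illustration data; `goals` is an assumed column name
demo = pd.DataFrame({
    "player_name": ["A", "B", "A", "C", "B"],
    "goals": [2, 1, 3, 0, 2],
})

# Sum goals per player, then keep the highest totals
top_scorers = demo.groupby("player_name")["goals"].sum().nlargest(2)
print(top_scorers)
```

With the real data, `nlargest(10)` gives the top ten, and the resulting Series can be passed to `.plot(kind='bar')` exactly as in the cell above.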

#### Graph 2 (Distribution of Minutes Played)
We can plot a histogram to see how playing time is distributed across players.

In [6]:
plt.figure(figsize=(8,5))
sns.histplot(df['minutes_played'], kde=True, bins=20, color='green')
plt.title('Distribution of Minutes Played')
plt.xlabel('Minutes Played')
plt.ylabel('Frequency')
plt.show()

#### Graph 3 (Correlation Heatmap)
If the dataset has several numeric columns (goals, assists, etc.), a heatmap shows how strongly each pair of columns is correlated.

In [7]:
plt.figure(figsize=(10,6))
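# Correlate numeric columns only; text columns such as player names raise an error in pandas 2.x
corr = df.select_dtypes(include='number').corr()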
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
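A heatmap with many columns can be hard to read, so listing the strongest pairs as a table is a useful complement. The sketch below is one way to do that; `top_correlations` is a hypothetical helper, and the `demo` frame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

def top_correlations(frame: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n column pairs with the largest absolute correlation."""
    corr = frame.select_dtypes(include="number").corr()
    # Keep the upper triangle only, so each pair appears once and the diagonal is skipped
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    # Sort by magnitude so strong negative correlations also rank high
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)

# Made-up illustration data; assists is exactly twice goals
demo = pd.DataFrame({
    "player_name": ["A", "B", "C", "D"],
    "goals": [1, 2, 3, 4],
    "assists": [2, 4, 6, 8],
    "minutes_played": [90, 10, 80, 20],
})
top = top_correlations(demo, n=3)
print(top)
```

Calling `top_correlations(df)` in the notebook would list the five strongest pairs in the players data.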