# 🧱 Premier League 2025 – Data Mining & Data Preparation
This notebook documents the data mining process applied to the Premier League 2025 records. It focuses on loading, cleaning, transforming, and standardizing multiple data sources to prepare them for exploratory analysis and downstream dashboards.

## 📚 Imported Libraries

First, I import the libraries used in this notebook.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Next, I go through the tables one by one: cleaning, mining, sorting, standardizing, and packaging each one so it is ready for use:

In [None]:
team_stats = pd.read_csv('team_stats.csv')
display(team_stats)

In [None]:
# Drop the last 4 columns
team_stats = team_stats.drop(team_stats.columns[-4:], axis=1)

# Display the updated DataFrame
display(team_stats)
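
Dropping columns by position removes whatever happens to sit at the end of the file, so a change in the CSV layout would silently drop the wrong data. A minimal sketch of one way to guard against that is to record the names of the dropped columns. The helper `drop_trailing_columns` and the toy column names are hypothetical and not part of the original notebook.

```python
import pandas as pd

def drop_trailing_columns(df: pd.DataFrame, n: int):
    """Drop the last n columns and return (trimmed_df, dropped_names)."""
    if not 0 <= n <= df.shape[1]:
        raise ValueError(f"cannot drop {n} columns from a frame with {df.shape[1]}")
    # Slice from an explicit start index so that n == 0 drops nothing.
    dropped = list(df.columns[df.shape[1] - n:])
    return df.drop(columns=dropped), dropped

# Toy data; these column names are made up for illustration.
demo = pd.DataFrame({
    "team": ["A", "B"],
    "goals": [10, 12],
    "extra_1": [0, 0],
    "extra_2": [1, 1],
})

trimmed, dropped = drop_trailing_columns(demo, 2)
print("Dropped:", dropped)
print("Kept:", trimmed.columns.tolist())
```

Printing the dropped names makes it easy to confirm by eye that the intended columns were removed.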

In [None]:
team_stats.info()

In [None]:
# Check for null values in each column
null_values = team_stats.isnull().sum()

# Display the count of null values per column
display(null_values)
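
The same null check is repeated for every table in this notebook. As a sketch, it could be wrapped in a small helper that lists only the columns that actually contain nulls, with their share of rows. The function name `null_report` and the toy data are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def null_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return null counts and percentages for columns that contain nulls."""
    counts = df.isnull().sum()
    report = pd.DataFrame({
        "nulls": counts,
        "pct": (counts / len(df) * 100).round(2),
    })
    # Keep only columns with at least one null, largest count first.
    return report[report["nulls"] > 0].sort_values("nulls", ascending=False)

# Toy data standing in for one of the tables.
demo = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "nation": ["x", None, None, "y"],
    "age": [20, np.nan, 22, 23],
})

report = null_report(demo)
print(report)
```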

In [None]:
team_salary = pd.read_csv('team_salary.csv')
display(team_salary)

In [None]:
team_salary.info()

In [None]:
# Check for null values in each column
null_values = team_salary.isnull().sum()

# Display the count of null values per column
display(null_values)

In [None]:
team_possession_stats = pd.read_csv('team_possession_stats.csv')
display(team_possession_stats)

In [None]:
# Drop the last 7 columns from team_possession_stats
team_possession_stats = team_possession_stats.drop(team_possession_stats.columns[-7:], axis=1)

# Display the updated DataFrame
display(team_possession_stats)

In [None]:
# Check for null values in each column
null_values = team_possession_stats.isnull().sum()

# Display the count of null values per column
display(null_values)

In [None]:
standings = pd.read_csv('standings.csv')
display(standings)

In [None]:
# Drop the 'last5' column from the standings DataFrame
standings = standings.drop('last5', axis=1)

# Display the updated DataFrame
display(standings)

In [None]:
standings.info()

In [None]:
player_stats = pd.read_csv('player_stats.csv')
display(player_stats)

In [None]:
# Check for null values in each column
null_values = player_stats.isnull().sum()

# Display the count of null values per column
display(null_values)

In [None]:
# Find rows where the 'nation' column is null in player_stats
null_nation_rows = player_stats[player_stats['nation'].isnull()]

# Display the rows with null values in 'nation'
display(null_nation_rows)

In [None]:
# Get the number of unique values in the 'position' column
num_unique_positions = player_stats['position'].nunique()

# Display the result
print(f"There are {num_unique_positions} unique position types in the DataFrame.")

In [None]:
# Get the unique values in the 'position' column
unique_positions = player_stats['position'].unique()

# Display the unique positions
print("The unique position types are:")
for position in unique_positions:
    print(position)
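
The two cells above report how many distinct positions exist and what they are. A compact alternative, sketched here on toy data, is `value_counts()`, which also shows how many players fall under each label. The position labels below are placeholders, not values taken from the dataset.

```python
import pandas as pd

# Toy data; the position labels are placeholders.
demo = pd.DataFrame({"position": ["FW", "MF", "FW", "DF", "MF", "FW"]})

# One call gives the distinct labels and the number of rows for each.
position_counts = demo["position"].value_counts()
print(position_counts)
print("Distinct positions:", len(position_counts))
```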

In [None]:
# Fill null values in the 'nation' column with 'England' for the identified rows
player_stats.loc[player_stats['nation'].isnull(), 'nation'] = 'England'

# Verify the changes by displaying the rows that previously had nulls in 'nation'
display(player_stats.loc[null_nation_rows.index])
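
Filling nulls with a constant makes imputed values indistinguishable from original ones once the file is saved. One possible way to keep that information, sketched on toy data, is to store a boolean flag before filling. The column name `nation_imputed` is hypothetical and is not produced by this notebook.

```python
import pandas as pd

# Toy data standing in for player_stats.
demo = pd.DataFrame({
    "Name": ["P1", "P2", "P3"],
    "nation": ["Spain", None, None],
})

# Remember which rows were missing before overwriting them.
was_missing = demo["nation"].isnull()
demo["nation_imputed"] = was_missing
demo.loc[was_missing, "nation"] = "England"

print(demo)
```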

In [None]:
# Find the row for Jeremy Monga
jeremy_monga_row_index = player_stats[player_stats['Name'] == 'Jeremy Monga'].index

# Update the 'born' and 'age' values for Jeremy Monga
player_stats.loc[jeremy_monga_row_index, 'born'] = 2009
player_stats.loc[jeremy_monga_row_index, 'age'] = 15

# Display the updated row for verification
display(player_stats.loc[jeremy_monga_row_index])

In [None]:
# Find the row for Mateus Mane
mateus_mane_row_index = player_stats[player_stats['Name'] == 'Mateus Mane'].index

# Update the 'born' and 'age' values for Mateus Mane
player_stats.loc[mateus_mane_row_index, 'born'] = 2007
player_stats.loc[mateus_mane_row_index, 'age'] = 17

# Display the updated row for verification
display(player_stats.loc[mateus_mane_row_index])

In [None]:
# Find the row for Jake Evans
jake_evans_row_index = player_stats[player_stats['Name'] == 'Jake Evans'].index

# Update the 'born' and 'age' values for Jake Evans
player_stats.loc[jake_evans_row_index, 'born'] = 2008
player_stats.loc[jake_evans_row_index, 'age'] = 17

# Display the updated row for verification
display(player_stats.loc[jake_evans_row_index])
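
The three manual fixes above follow the same pattern, so they could be driven from a single dictionary. The sketch below reuses the same names and values as the cells above. The helper `apply_corrections` is hypothetical, and it additionally reports any name that matches no row, which a plain `.loc` assignment would skip silently.

```python
import numpy as np
import pandas as pd

# Same corrections as the three cells above, collected in one place.
corrections = {
    "Jeremy Monga": {"born": 2009, "age": 15},
    "Mateus Mane": {"born": 2007, "age": 17},
    "Jake Evans": {"born": 2008, "age": 17},
}

def apply_corrections(df, corrections, name_col="Name"):
    """Apply per-player column fixes; return (fixed_df, names_not_found)."""
    out = df.copy()
    not_found = []
    for name, values in corrections.items():
        mask = out[name_col] == name
        if not mask.any():
            not_found.append(name)
            continue
        for column, value in values.items():
            out.loc[mask, column] = value
    return out, not_found

# Toy data: one of the three players is deliberately absent.
demo = pd.DataFrame({
    "Name": ["Jeremy Monga", "Mateus Mane", "Someone Else"],
    "born": [np.nan, np.nan, 1999.0],
    "age": [np.nan, np.nan, 25.0],
})

fixed, not_found = apply_corrections(demo, corrections)
print(fixed)
print("Not found:", not_found)
```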

In [None]:
# Check for null values in each column
null_values_after_update = player_stats.isnull().sum()

# Display the count of null values per column
display(null_values_after_update)

In [None]:
player_stats.info()

In [None]:
# Convert 'age' and 'born' to int64 (astype raises if any nulls remain, so run this after the fixes above)
player_stats['age'] = player_stats['age'].astype('int64')
player_stats['born'] = player_stats['born'].astype('int64')

# Verify the data types
player_stats.info()
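
The integer conversion above only works once every null in `age` and `born` has been filled. If some values had to stay missing, pandas' nullable `Int64` dtype would be one alternative. The sketch below uses toy data and is not what this notebook does.

```python
import numpy as np
import pandas as pd

# Toy data with one player whose age and birth year are unknown.
demo = pd.DataFrame({
    "age": [17.0, np.nan, 30.0],
    "born": [2008.0, np.nan, 1995.0],
})

# "Int64" (capital I) is the nullable integer dtype; NaN becomes <NA>.
for column in ["age", "born"]:
    demo[column] = demo[column].astype("Int64")

print(demo.dtypes)
print(demo)
```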

In [None]:
display(player_stats)

In [None]:
player_salaries = pd.read_csv('player_salaries.csv')
display(player_salaries)

In [None]:
player_possession_stats = pd.read_csv('player_possession_stats.csv')
display(player_possession_stats)

In [None]:
# Drop the specified columns from player_possession_stats
columns_to_drop = ['nation', 'position', 'age', '90s', 'deffensive_touches', 'middle_touches', 'attacking_touches']
player_possession_stats = player_possession_stats.drop(columns=columns_to_drop)

# Display the first few rows of the updated DataFrame
display(player_possession_stats.head())
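
`drop(columns=...)` raises a `KeyError` if any listed column is absent, for example when a name is spelled differently in the file. A minimal sketch of a more tolerant variant reports the missing names and then drops the rest with `errors="ignore"`. The toy frame and the name `not_in_file` are made up for illustration.

```python
import pandas as pd

# Toy data standing in for player_possession_stats.
demo = pd.DataFrame({
    "Name": ["P1"],
    "nation": ["X"],
    "position": ["FW"],
    "age": [20],
    "touches": [50],
})

columns_to_drop = ["nation", "position", "age", "not_in_file"]

# Report names that do not exist instead of failing on them.
missing = [c for c in columns_to_drop if c not in demo.columns]
if missing:
    print(f"Columns not found, skipped: {missing}")

trimmed = demo.drop(columns=columns_to_drop, errors="ignore")
print(trimmed.columns.tolist())
```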

In [None]:
# Check for null values in each column
null_values_possession = player_possession_stats.isnull().sum()

# Display the count of null values per column
display(null_values_possession)

In [None]:
# Save each DataFrame to a CSV file

# Team DataFrames
team_stats.to_csv('team_stats_cleaned.csv', index=False)
team_salary.to_csv('team_salary_cleaned.csv', index=False)
team_possession_stats.to_csv('team_possession_stats_cleaned.csv', index=False)
standings.to_csv('standings_cleaned.csv', index=False)

# Player DataFrames
player_stats.to_csv('player_stats_cleaned.csv', index=False)
player_salaries.to_csv('player_salaries_cleaned.csv', index=False)
player_possession_stats.to_csv('player_possession_stats_cleaned.csv', index=False)

print("All DataFrames have been saved as CSV files in your Colab environment.")
print("You can download them from the file explorer (the folder icon on the left).")
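
The export step repeats one `to_csv` call per table. As a sketch, the tables could be kept in a dictionary and written in a loop, which makes it harder to forget one. The helper `save_frames` is hypothetical, and the example writes toy frames to a temporary directory rather than to the Colab working directory.

```python
import tempfile
from pathlib import Path

import pandas as pd

def save_frames(frames, out_dir):
    """Write each DataFrame to '<name>_cleaned.csv' and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, df in frames.items():
        path = out_dir / f"{name}_cleaned.csv"
        df.to_csv(path, index=False)
        written.append(path)
    return written

# Toy frames standing in for the cleaned tables.
frames = {
    "team_stats": pd.DataFrame({"team": ["A"], "goals": [1]}),
    "standings": pd.DataFrame({"team": ["A"], "points": [3]}),
}

with tempfile.TemporaryDirectory() as tmp:
    written = save_frames(frames, tmp)
    names = [p.name for p in written]
    reloaded = pd.read_csv(written[0])

print(names)
```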

## ✅ Conclusion
- Multiple datasets related to teams and players were loaded and standardized.
- Columns were renamed, duplicates were removed, and the transformations needed for cleaning were applied.
- Identifiers (`player_id`) were generated to allow relationships between tables.
- The resulting structure is ready for explanatory analysis, clustering, and visualization in Python, Power BI, or Tableau.