# Humza Khalid, 11-7-2025

-_Real Madrid League Soccer Analysis (2024-2025)_


-_Web Scraping and Data Analysis Project_



-_This project scrapes Real Madrid's 2024-2025 squad statistics from FBref and uses visualizations to answer the four questions below._

In [None]:
import requests
import pandas as pd
from bs4 import BeautifulSoup
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
import time
warnings.filterwarnings('ignore')


sns.set_style("whitegrid")
sns.set_palette("husl")


In [None]:
url = "https://fbref.com/en/squads/53a2f082/2025/Real-Madrid-Stats"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/120.0.0.0 Safari/537.36"
}


_Data Collection - Web Scraping_

In [None]:


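from io import StringIO  # recent pandas versions expect a file-like object in read_html

response = requests.get(url, headers=headers, timeout=30)

if response.status_code != 200:
    print(f"Oops! Received status code {response.status_code}")
else:
    print("✅ Page fetched successfully!")
    soup = BeautifulSoup(response.text, "html.parser")

    # Pull the STANDARD STATS TABLE (main table)
    table = soup.find("table", {"id": "stats_standard_53a2f082"})
    if table is None:
        raise ValueError("Table 'stats_standard_53a2f082' was not found in the page")

    # Convert to DataFrame
    df = pd.read_html(StringIO(str(table)))[0]

    # Drop multi-level header (FBref format)
    df.columns = df.columns.droplevel(0)

    # Dropping the top level can leave repeated names (assumption: e.g. "Gls"
    # under more than one header group); keep the first occurrence so that
    # df["Gls"] selects a single column
    df = df.loc[:, ~df.columns.duplicated()]

    # Remove extra header rows inside table
    df = df[df["Player"] != "Player"]

    # Assumption: the table may end with "Squad Total" / "Opponent Total"
    # summary rows; drop them so only individual players remain
    df = df[~df["Player"].isin(["Squad Total", "Opponent Total"])]
    df.reset_index(drop=True, inplace=True)

    # Convert important columns to numeric
    numeric_cols = ["Gls", "Ast", "Sh", "Cmp%", "90s"]
    for col in numeric_cols:
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors="coerce")

    # Show first few rows
    display(df.head())

    # Save file
    df.to_csv("real_madrid_stats.csv", index=False)
    print("📁 Saved as real_madrid_stats.csv")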
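The scraping cell drops the top level of the two-level header, which discards the group each statistic belongs to. A minimal sketch of an alternative that joins both levels into one name, so repeated second-level names stay distinguishable. The toy frame, its column names, and the helper `flatten_header` are illustrative assumptions, not FBref's actual layout.

```python
import pandas as pd

# Toy frame with a two-level header; names and numbers are made up
columns = pd.MultiIndex.from_tuples([
    ("Unnamed: 0_level_0", "Player"),
    ("Performance", "Gls"),
    ("Per 90 Minutes", "Gls"),
])
toy = pd.DataFrame([["A", 10, 0.50], ["B", 4, 0.25]], columns=columns)


def flatten_header(frame):
    """Join header levels with '_', skipping pandas' 'Unnamed: ...' placeholders."""
    flat = []
    for top, bottom in frame.columns:
        if str(top).startswith("Unnamed"):
            flat.append(bottom)
        else:
            flat.append(f"{top}_{bottom}")
    out = frame.copy()
    out.columns = flat
    return out


flat_toy = flatten_header(toy)
print(list(flat_toy.columns))
```

With names flattened this way, a column is selected by its full name (for example `flat_toy["Performance_Gls"]`), so later cells would need their column names adjusted to match.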

_Question 1) Which player has the highest average goals per 90 minutes in the squad in the 24-25 season?_

In [None]:
df["Goals_per_90"] = df["Gls"] / df["90s"]

top_goal_rate = df[["Player", "Pos", "Gls", "90s", "Goals_per_90"]] \
                    .sort_values("Goals_per_90", ascending=False).head(5)

print(" Top Players by Goals per 90 Minutes:")
display(top_goal_rate)

#Visual
plt.figure(figsize=(8,5))
ax = sns.barplot(data=top_goal_rate, x="Player", y="Goals_per_90", palette="Reds_d")
plt.title("Top 5 Real Madrid Players – Goals per 90 Minutes", fontsize=14, weight="bold")
plt.xlabel("Player")
plt.ylabel("Goals per 90 min")
for i, v in enumerate(top_goal_rate["Goals_per_90"]):
    ax.text(i, v + 0.02, f"{v:.2f}", ha='center', fontweight='bold')
plt.show()


_This plot shows the five players with the highest goals per 90 minutes; the leftmost bar is the player with the highest rate in the squad._
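A per-90 rate can be misleading for players with very little playing time: one goal in half a match becomes 2.00 goals per 90, and zero minutes leads to a division by zero. A minimal sketch of ranking only players above a playing-time cutoff. The data is made up, and `MIN_90S` and `goals_per_90` are hypothetical names introduced for illustration.

```python
import pandas as pd

# Made-up numbers for illustration only
toy = pd.DataFrame({
    "Player": ["A", "B", "C", "D"],
    "Gls": [20, 1, 8, 0],
    "90s": [30.0, 0.5, 16.0, 0.0],
})

MIN_90S = 5.0  # hypothetical cutoff; pick one that suits the question


def goals_per_90(frame, min_90s=MIN_90S):
    """Return players with at least min_90s played, ranked by goals per 90."""
    eligible = frame[frame["90s"] >= min_90s].copy()
    eligible["Goals_per_90"] = eligible["Gls"] / eligible["90s"]
    ranked = eligible.sort_values("Goals_per_90", ascending=False)
    return ranked.reset_index(drop=True)


ranked = goals_per_90(toy)
print(ranked)
```

Here player B (0.5 matches played) and player D (no minutes) are excluded before the rate is computed, so the ranking contains only A and C.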

_Question 2) Is there a correlation between total shots and total goals scored per player in the 24-25 season?_

In [None]:
corr = df["Sh"].corr(df["Gls"])
print(f" Correlation between Shots and Goals: {corr:.2f}")

#Visual
plt.figure(figsize=(6,5))
sns.regplot(data=df, x="Sh", y="Gls", scatter_kws={"s":60}, line_kws={"color":"red"})
plt.title(f"Correlation Between Shots and Goals (r = {corr:.2f})", fontsize=14, weight="bold")
plt.xlabel("Total Shots")
plt.ylabel("Total Goals")
plt.show()

_This plot shows the relationship between each player's total shots and total goals, with a fitted regression line; the r in the title is the Pearson correlation coefficient._
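`Series.corr` uses the Pearson coefficient by default. As a worked example of what that number measures, here is a minimal sketch that computes r by hand on made-up shot and goal counts. The function `pearson_r` is a hypothetical helper, and it assumes there are no missing values in either list.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # r is undefined when one variable is constant
    return cov / math.sqrt(var_x * var_y)


# Made-up numbers for illustration only
shots = [10, 20, 30, 40]
goals = [1, 3, 2, 6]

r = pearson_r(shots, goals)
print(f"r = {r:.2f}")
```

A value near 1 means players who shoot more also tend to score more; it describes a linear association and says nothing about cause.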

_Question 3) Which players contribute most to goal creation (goals + assists) per 90 minutes within the team in the 24-25 season?_

In [None]:
df["GoalContrib_per_90"] = (df["Gls"] + df["Ast"]) / df["90s"]

top_contrib = df.sort_values("GoalContrib_per_90", ascending=False).head(5)
print(" Top Players by Goal Contributions per 90 Minutes:")
display(top_contrib[["Player", "Pos", "Gls", "Ast", "90s", "GoalContrib_per_90"]])

#Visual
plt.figure(figsize=(8,5))
ax = sns.barplot(data=top_contrib, x="Player", y="GoalContrib_per_90", palette="Blues_d")
plt.title("Top 5 Players – (Goals + Assists) per 90 Minutes", fontsize=14, weight="bold")
plt.xlabel("Player")
plt.ylabel("Contributions per 90")
for i, v in enumerate(top_contrib["GoalContrib_per_90"]):
    ax.text(i, v + 0.02, f"{v:.2f}", ha='center', fontweight='bold')
plt.show()

_This plot shows the five players with the most goals plus assists per 90 minutes played._

_Question 4) How does passing accuracy differ across players in the 24-25 season?_

In [None]:
pass_acc = df.sort_values("Cmp%", ascending=False).head(10)
print(" Top 10 Players by Passing Accuracy:")
display(pass_acc[["Player", "Pos", "Cmp%"]])

#Visual
plt.figure(figsize=(8,5))
ax = sns.barplot(data=pass_acc, x="Player", y="Cmp%", palette="Greens_d")
plt.title("Passing Accuracy (Top 10 Players) – Real Madrid 2024–25", fontsize=14, weight="bold")
plt.xlabel("Player")
plt.ylabel("Pass Completion %")
for i, v in enumerate(pass_acc["Cmp%"]):
    ax.text(i, v + 0.2, f"{v:.1f}%", ha='center', fontweight='bold')
plt.show()

_This plot shows the pass completion percentage of the ten most accurate passers in the squad._
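A top-10 list shows who passes most accurately but not how much accuracy varies across the squad. A minimal sketch of summarising that spread overall and by position, on made-up data. It assumes each player has a single position label in `Pos`; real data may contain combined labels that would need handling first.

```python
import pandas as pd

# Made-up numbers for illustration only
toy = pd.DataFrame({
    "Player": ["A", "B", "C", "D", "E", "F"],
    "Pos": ["DF", "DF", "MF", "MF", "FW", "FW"],
    "Cmp%": [92.0, 90.0, 88.0, 86.0, 78.0, 74.0],
})

# Overall spread: gap between the most and least accurate passer
overall = toy["Cmp%"].agg(["min", "max", "mean"])
spread = overall["max"] - overall["min"]

# Average completion rate per position, highest first
by_pos = toy.groupby("Pos")["Cmp%"].mean().sort_values(ascending=False)

print(f"Range: {spread:.1f} percentage points")
print(by_pos)
```

Comparing the per-position means against the overall range gives a rough sense of how much of the difference between players comes from the role they play.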
