<div style="
    padding: 30px;
    margin: 0 auto;
    font-size: 270%;
    text-align: center;
    border-radius: 10px;
    background-color: #040D12; /* Updated background color */
    color: #ffffff; /* Text color */
    box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1); /* Adding a little shadow */
    font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif; /* Updated font-family */
    font-weight: bold;
    line-height: 1.3;
">
    From Sporting to Al-Nassr<br/>
    <div style = "text-align:center;">Cristiano Ronaldo: All Club Goals and Stats</div>
</div>

# 1. Introduction

<div style="">
    <div>
        <img style="width:23%;float: left;margin:5px 30px 5px 1px;height:100%" src="https://pbs.twimg.com/media/FmMUBAVWYAAp9yZ.jpg:large"  />
    </div>
    <div style="margin-left: 20px;">
        <p style="font-weight: bold; font-size: 20px;">Cristiano Ronaldo dos Santos Aveiro</p>
        <p style="font-size: 14px; line-height: 1.6;">
            A Portuguese professional footballer who plays as a forward for Saudi Pro League club <strong>Al Nassr</strong> and captains the <strong>Portugal national team</strong>. He is considered one of the greatest footballers of all time.
        </p>
        <p style="font-size: 14px; font-style: italic;">
            Here is a brief overview of Ronaldo's career:
        </p>
        <ul style="font-size: 14px; line-height: 1.6; position:relative;right:-15px">
            <li>Born: February 5, 1985 (age 38 years)</li>
            <li>Current team: Al Nassr FC (#7 / Forward)</li>
            <li>Height: 1.87 m</li>
            <li>Partner: Georgina Rodríguez </li>
            <li>Salary: 200 million EUR (2023-2024)</li>
            <li>Ballon d'Or awards: 5 (2008, 2013, 2014, 2016, and 2017)</li>
        </ul>
    </div>
        <p style="font-size: 14px; line-height: 1.6;">
            In this notebook, we will explore Ronaldo's goals and stats in detail. We will use data visualization techniques to identify trends and patterns in his performance.
        </p>
</div>


<div style="padding:20px;color:white;margin:0;font-size:220%;text-align:center;display:fill;border-radius:5px;background-color:#183D3D;overflow:hidden;font-weight:500">Importing Libraries & Data Loading</div>

# 2. Data loading

In [None]:
# NumPy is used for numerical operations and working with arrays
import numpy as np

# Pandas is used for data manipulation and analysis
import pandas as pd

# Matplotlib is used for creating static, interactive, and animated visualizations
import matplotlib.pyplot as plt
%matplotlib inline

# Plotly's graph_objs provides graph objects for designing rich interactive visualizations
import plotly.graph_objs as go

# Plotly's offline mode allows you to create, render, and display Plotly figures without an internet connection
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

# Useful for producing easy-to-make plots
import plotly.express as px

# datetime is used for working with dates, times, and durations
import datetime as dt

# Suppressing warning messages to ensure a clean output
import warnings
warnings.filterwarnings('ignore')

In [None]:
df = pd.read_csv("data.csv")

<div style="padding:20px;color:white;margin:0;font-size:220%;text-align:center;display:fill;border-radius:5px;background-color:#183D3D;overflow:hidden;font-weight:500">Discovering the dataset</div>

# 3. Initial Discovery of the data

In [None]:
pd.set_option('display.max_columns', None) # Display all columns
df.head(5) #First five rows

Unnamed: 0,Season,Competition,Matchday,Date,Venue,Club,Opponent,Result,Playing_Position,Minute,At_score,Type,Goal_assist
0,02/03,Liga Portugal,6,10-07-02,H,Sporting CP,Moreirense FC,3:00,LW,34,2:00,Solo run,
1,02/03,Liga Portugal,6,10-07-02,H,Sporting CP,Moreirense FC,3:00,LW,90+5,3:00,Header,Rui Jorge
2,02/03,Liga Portugal,8,10/26/02,A,Sporting CP,Boavista FC,1:02,,88,1:02,Right-footed shot,Carlos Martins
3,02/03,Taca de Portugal Placard,Fourth Round,11/24/02,H,Sporting CP,CD Estarreja,4:01,,67,3:00,Left-footed shot,Cesar Prates
4,02/03,Taca de Portugal Placard,Fifth Round,12/18/02,H,Sporting CP,FC Oliveira do Hospital,8:01,,13,3:00,,


In [None]:
print(f'Shape of DataFrame: {df.shape}\n')


Shape of DataFrame: (710, 13)



In [None]:
df.info() #Some nulls in Type, Goal_assist, and Playing_Position. No need to rename columns

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 710 entries, 0 to 709
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Season            710 non-null    object
 1   Competition       710 non-null    object
 2   Matchday          710 non-null    object
 3   Date              710 non-null    object
 4   Venue             710 non-null    object
 5   Club              710 non-null    object
 6   Opponent          710 non-null    object
 7   Result            710 non-null    object
 8   Playing_Position  652 non-null    object
 9   Minute            710 non-null    object
 10  At_score          710 non-null    object
 11  Type              695 non-null    object
 12  Goal_assist       464 non-null    object
dtypes: object(13)
memory usage: 72.2+ KB


In [None]:
null_percentage = (df.isnull().mean() * 100).round(2).astype(str) + '%'
print("Percentage of null values in each column:")
print(null_percentage)

Percentage of null values in each column:
Season                0.0%
Competition           0.0%
Matchday              0.0%
Date                  0.0%
Venue                 0.0%
Club                  0.0%
Opponent              0.0%
Result                0.0%
Playing_Position     8.17%
Minute                0.0%
At_score              0.0%
Type                 2.11%
Goal_assist         34.65%
dtype: object


In [None]:
pd.DataFrame(df.describe(include = 'object').T)
#Interesting: both the Date column and the Minute column are stored as object (string) dtype

Unnamed: 0,count,unique,top,freq
Season,710,21,14/15,61
Competition,710,17,LaLiga,311
Matchday,710,52,Group Stage,75
Date,710,468,09-12-15,5
Venue,710,2,H,404
Club,710,5,Real Madrid,450
Opponent,710,129,Sevilla FC,27
Result,710,57,3:00,49
Playing_Position,652,5,LW,356
Minute,710,106,90,17


In [None]:
df.duplicated().any()

False

<div style="padding:20px;color:white;margin:0;font-size:220%;text-align:center;display:fill;border-radius:5px;background-color:#183D3D;overflow:hidden;font-weight:500">Data Types and Conversion</div>

# 4. Data Types and Conversion

In [None]:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=False) #Converting object to datetime; the dates are month-first (e.g. 10/26/02)
type(df['Date'][0]) #Checking the data type after conversion

pandas._libs.tslibs.timestamps.Timestamp
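
The sample rows shown earlier contain strings such as `10-07-02` and `10/26/02`, and the second can only be read month-first. The sketch below, in plain Python so it runs without the dataset, shows how one ambiguous string reads under each convention. The helper name `parse_two_ways` is hypothetical and not part of the notebook.

```python
from datetime import datetime

def parse_two_ways(text):
    """Return the (month-first, day-first) readings of a two-digit-year date string."""
    normalized = text.replace('/', '-')  # accept both separators seen in the data
    month_first = datetime.strptime(normalized, '%m-%d-%y').date()
    day_first = datetime.strptime(normalized, '%d-%m-%y').date()
    return month_first, day_first

month_first, day_first = parse_two_ways('10-07-02')
print(month_first)  # 2002-10-07
print(day_first)    # 2002-07-10
```

The two readings are three months apart, so the convention matters for any chart that uses the date axis.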

In [None]:
df['Season'] = df['Season'].apply(lambda x: '12/13' if x == 'Dec-13' else x) #It should be 12/13, like the rest of the Season column
#df['Season'] = df['Season'].str.replace('Dec-13','12/13') #Another way to achieve the same result

In [None]:
df['Playing_Position'] = df['Playing_Position'].ffill() # Fill NaN with the previous row's value (assumes he kept the position recorded for his previous goal)

In [None]:
df['Playing_Position'] = df['Playing_Position'].str.strip() #Some values have extra spaces that might affect our analysis
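
As a rough illustration of the two cleaning steps above, here is a toy Series standing in for the `Playing_Position` column. The values are made up; only the forward-fill and strip behaviour is the point.

```python
import pandas as pd

# Toy stand-in for the Playing_Position column (values are made up)
positions = pd.Series(['LW', None, 'LW ', None, 'CF'])

filled = positions.ffill()    # each gap takes the last value seen above it
cleaned = filled.str.strip()  # drop stray leading/trailing spaces

print(cleaned.tolist())  # ['LW', 'LW', 'LW', 'LW', 'CF']
```

Forward fill is only a reasonable guess when consecutive rows tend to share a position, which is the assumption made here.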

In [None]:
#Creating a win/draw/loss column based on the final result (e.g. 3:00) and the venue (H, A)
def check_win(row):
    raw_result = ''.join([c for c in row['Result'] if c.isdigit() or c == ':'])  # Removing any extra text such as AET
    home_goals, away_goals = map(int, raw_result.split(':'))  # Splitting the score into home and away goals

    if home_goals == away_goals:
        return 'draw'
    # Ronaldo's club won if the winning side matches the venue it played at
    home_won = home_goals > away_goals
    return 'win' if home_won == (row['Venue'] == 'H') else 'loss'


In [None]:
df['winLoss'] = df.apply(check_win, axis=1)
df[['Result','Venue', 'winLoss']].sample(3)

Unnamed: 0,Result,Venue,winLoss
42,3:02,H,win
162,6:01,H,win
391,5:00,H,win
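
The random sample above happens to show only home wins, so a quick check of the other branches may be useful. This is a standalone sketch of the same outcome logic, assuming the score is written as home:away. The function name `outcome` is hypothetical and the inputs are made up.

```python
def outcome(result, venue):
    """Classify a 'home:away' score for a team that played at venue 'H' or 'A'."""
    digits = ''.join(c for c in result if c.isdigit() or c == ':')  # drop text such as AET
    home_goals, away_goals = map(int, digits.split(':'))
    if home_goals == away_goals:
        return 'draw'
    home_won = home_goals > away_goals
    return 'win' if home_won == (venue == 'H') else 'loss'

print(outcome('3:02', 'H'))      # win
print(outcome('1:02', 'A'))      # win
print(outcome('1:02', 'H'))      # loss
print(outcome('2:02 AET', 'A'))  # draw
```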


<div style="padding:20px;color:white;margin:0;font-size:220%;text-align:center;display:fill;border-radius:5px;background-color:#183D3D;overflow:hidden;font-weight:500">Exploratory Data Analysis </div>

# 5. Exploratory Data Analysis

In [None]:
# Group goals by club and count the number of goals
goals_per_club = df.groupby('Club').size().reset_index(name='Goals').sort_values(by = "Goals", ascending=False)

color_map ={'Real Madrid':'#4D2DB7', 'Juventus FC':'#040D12', 'Manchester United':'#C70039', 'Al-Nassr FC':'#F8DE22','Sporting CP': 'green'}

fig = px.bar(goals_per_club, x='Club', y='Goals', title='Ronaldo Goals per Club',
             labels={'Club': 'Football Club', 'Goals': 'Number of Goals'}, text= 'Goals',
              color_discrete_map = color_map, color = 'Club')
# Show the plot
fig.show()

In [None]:
#Ronaldo's goals percentage per club
fig = px.pie(goals_per_club,
             names='Club',
             values='Goals',
             title="Ronaldo's goals percentage per club %⚽%",
             labels={'Club':'Club', 'Goals':'Number of Goals'},
             color_discrete_map= color_map, color = 'Club')

# Create a custom text label that combines the club name and the number of goals
custom_label = goals_per_club.apply(lambda row: f"{row['Club']}<br>{row['Goals']}", axis=1)

# Show the custom label (club name and goal count) together with the percentage
fig.update_traces(textinfo="text+percent", text=custom_label)

# Show the plot
fig.show()

In [None]:
# Plotting a histogram of CR7's goals per season with the Plotly Express library
# It simply counts how many times each season appears in the dataset, since each row of the data frame is one goal.
px.histogram(
    data_frame=df,              # Data source
    x='Season',                 # Column for x-axis
    title="CR7 goals per season 📊⚽",   # Chart title
    height=500,                 # Height of the chart
    color='Club' , # Color based on 'Club' column
    color_discrete_map= color_map, #A specific color for each club
    hover_name='Club',          # Display 'Club' name on hover
    hover_data=['Competition', 'Season', 'Club']  # Additional hover data
)

In [None]:
top_seasons = df.groupby(['Season', 'Club']).size().reset_index(name = 'Goals').sort_values(by = 'Goals', ascending=False)[:5]
fig = px.bar(top_seasons, x='Season', y='Goals', title="CR7's Highest-Scoring Seasons ⚽🔝",
             labels={'Club': 'Football Club', 'Goals': 'Number of Goals⚽'}, text= 'Club')

# A purple palette, darkest for the highest-scoring season
colors =  ['#645CBB', '#7C6CC1', '#9481C7', '#AB97CD', '#d6caed']

# Manually update bar colors
fig.update_traces(marker=dict(color=colors))
# Show the plot
fig.show()

In [None]:
#Ronaldo's Goals percentage by Type %🥅%.
goals_type = df.groupby('Type').size().reset_index(name = 'Goals') #counting how many goals for each type
types = ['Penalty', 'Header', 'Right-footed shot', 'Left-footed shot', 'Long distance kick'] #the most common types to compare
#This will group the goal types as specified and sum up their respective goals.
goals_type['Type'] = goals_type['Type'].apply(lambda x: 'Other' if x not in types else x)
goals_type = goals_type.groupby('Type').sum().reset_index()

fig = px.pie(goals_type,
             names='Type',
             values='Goals',
             title="CR7 Goals percentage by Type %⚽🥅%",
             labels={'Type':'Type', 'Goals':'Number of Goals'})

# Create a custom text label that combines the goal type and the number of goals
custom_label = goals_type.apply(lambda row: f"{row['Type']}<br>{row['Goals']}", axis=1)

# Show the custom label (goal type and goal count) together with the percentage
fig.update_traces(textinfo="text+percent", text=custom_label)

# Show the plot
fig.show()
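
The regrouping step in the cell above keeps a few named goal types and folds everything else into "Other". A minimal sketch of that idea in plain Python, with made-up goal types, could look like this. Note that pandas `groupby` skips missing values by default, so goals with no recorded type are left out of the chart rather than counted as "Other".

```python
from collections import Counter

# Made-up goal types, for illustration only
goal_types = ['Header', 'Penalty', 'Tap-in', 'Header', 'Solo run', 'Penalty', 'Header']
keep = {'Penalty', 'Header'}  # categories to show individually

# Count each goal under its own type if kept, otherwise under 'Other'
grouped = Counter(t if t in keep else 'Other' for t in goal_types)
print(grouped['Header'], grouped['Penalty'], grouped['Other'])  # 3 2 2
```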

In [None]:
#CR7 goals per Venue: Home VS Away
px.histogram(
    data_frame=df,              # Data source
    x='Venue',                 # Column for x-axis
    title="CR7 goals per Venue: Home VS Away 🏠",   # Chart title
    height=500,                 # Height of the chart
    width = 800,                # Width of the chart
    color='Club' , # Color based on 'Club' column
    color_discrete_map= color_map, #A specific color for each club
    hover_name='Club',          # Display 'Club' name on hover
    hover_data=['Club'] # Additional hover data (must be a list of column names)

)

In [None]:
#CR7 Goals per Playing Position
goals_by_pos = df.groupby('Playing_Position').size().reset_index(name = 'Goals').sort_values(by = 'Goals', ascending= False)
fig = px.bar(goals_by_pos, x='Playing_Position', y='Goals', title="CR7's Goals per Playing Position ⚽7️⃣",
             labels={'Playing_Position': 'Position', 'Goals': 'Number of Goals⚽'}, text= 'Goals', color = 'Playing_Position',height=450 )

# Show the plot
fig.show()

In [None]:
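#Share of wins, draws, and losses across the matches in which CR7 scored (the dataset has one row per goal, so matches without a goal are absent)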
winLoss_counts = df.drop_duplicates(subset='Date').reset_index()['winLoss'].value_counts()
fig = px.pie(winLoss_counts, values=winLoss_counts.values, names=winLoss_counts.index, title="Ratio of victories to games played by CR7 ✅", hole=0.4,height=500 )
fig.show()

In [None]:
#CR7 Goals per time ⌚⚽
#let's do some reshaping
def time_period(minute):
    """Map a minute string such as '34', '45+2' or '90+5' to a period of the match."""
    base, _, added = minute.partition('+')  # '90+5' -> ('90', '+', '5')
    base = int(base)
    if base <= 45:
        return 'First half'   # first-half stoppage time (45+x) counts as first half
    if base == 90 and added:
        return 'Plus 90'      # second-half stoppage time
    if base <= 90:
        return 'Second half'
    return 'Extra Time'

#applying time interval categories to the data
mins = df['Minute'].apply(time_period)

#creating a dataframe that will be visualized using plotly
goals_by_time = pd.DataFrame(mins.value_counts()).reset_index()

# Create a bar plot using Plotly Express
fig = px.bar(goals_by_time, x='Minute', y='count', title="CR7 Goals per time ⌚⚽",
            labels={'Minute': 'Time', 'count': 'Number of Goals⚽'}, text= 'count',
            color = 'Minute')

# Show the plot
fig.show()

In [None]:
#CR7 goals trend over time (a running count of rows, which assumes the data is in chronological order)
df["Cumulative_Goals"] = df['Date'].expanding().count()

fig = px.line(df, x="Date", y="Cumulative_Goals", title="CR7 Goals Trend Over Time 🐐")
fig.update_xaxes(range=['2005-01-01', '2023-12-31'])


# Display the figure
fig.show()

In [None]:
# CR7's top assists come from...
assists = df.groupby('Goal_assist').size().reset_index(name = 'assists').sort_values(by = 'assists', ascending= False)[:5]
# Create a bar plot using Plotly Express
fig = px.bar(assists, x='Goal_assist', y='assists', title="CR7's top assists come from 🅰",
            labels={'Goal_assist': 'Player', 'assists': 'Number of Assists 🅰'}, text= 'assists',
            color = 'Goal_assist')

# Show the plot
fig.show()

In [None]:
#CR7 Hattricks Per Competition 🎯🐐⚽ (3 Goals or more)
hattricks =df[df.groupby('Date')['Date'].transform('count') > 2 ].drop_duplicates(subset='Date')
px.histogram(
    data_frame=hattricks,              # Data source
    x='Competition',                 # Column for x-axis
    title="CR7 Hattricks Per Competition 🎯🐐⚽",   # Chart title
    height=500,                 # Height of the chart
    color='Club' , # Color based on 'Club' column
    color_discrete_map= color_map, #A specific color for each club
    hover_name='Club',          # Display 'Club' name on hover
    hover_data=['Competition', 'Season', 'Club']  # Additional hover data
)
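
The hat-trick filter above counts rows per date and keeps dates with more than two goals. A minimal sketch of the same idea in plain Python follows, assuming one match per date. The dates below are illustrative and not read from the dataset.

```python
from collections import Counter

# Illustrative goal dates, one entry per goal
goal_dates = (['2015-09-12'] * 5 + ['2015-09-15'] * 3
              + ['2015-09-19'] * 2 + ['2015-09-26'])

goals_per_match = Counter(goal_dates)  # date -> goals scored that day
hattrick_dates = sorted(d for d, n in goals_per_match.items() if n > 2)
print(hattrick_dates)  # ['2015-09-12', '2015-09-15']
```

Each qualifying date appears once in the result, which mirrors the `drop_duplicates(subset='Date')` step in the cell.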

In [None]:
#CR7 Champions League Analysis
cl = df.query('Competition == "UEFA Champions League"') #Champions League data

#Summary statistics
cl_goals_per_stage = cl.groupby('Matchday').size().reset_index(name = 'Goals').sort_values(by = 'Goals', ascending=False)

# Create a bar plot using Plotly Express
fig = px.bar(cl_goals_per_stage, x='Matchday', y='Goals', title="Champions league goals per stage ⌚⚽",
            labels={'Matchday': 'Stage', 'Goals': 'Number of Goals⚽'}, text= 'Goals',
            color = 'Matchday')

# Show the plot
fig.show()

<div style="">
    <div>
        <img style="width:30%;float: left;margin:5px 30px 5px 1px;height:100%" src="https://pbs.twimg.com/media/EaEhEzIXgAAzWMx.jpg:large"  />
    </div>
    <div style="margin-left: 20px; font-size:15px; line-height:1.2">
After an extensive exploratory data analysis of the available data on CR7, it is clear that he stands tall among the legends of football. The metrics, statistics, and visualizations explored here capture his exceptional talent on the field and underline the consistency, determination, and passion he brings to every game.
<br>
From his early days at Sporting Lisbon to his legendary stints at clubs like Manchester United, Real Madrid, Juventus, and beyond, CR7 has showcased a unique ability to adapt, evolve, and excel. His countless awards, accolades, and records speak for themselves, solidifying his place in the pantheon of football greats.
<br>
However, numbers alone cannot encapsulate the electrifying moments he has provided fans worldwide, from breathtaking goals to crucial assists and game-changing plays. Cristiano Ronaldo's impact on the game transcends mere statistics, as he has not only set the bar for excellence but has also inspired countless young players to pursue their dreams in football.
<br><br>
In conclusion, while debates may continue about the 'greatest of all time' in football, one thing is clear: Cristiano Ronaldo, CR7, is one of the most influential and exceptional talents the sport has ever witnessed.
    </div>
</div>