<div>    
<img src="https://data1.ibtimes.co.in/en/full/721695/fifa-2022-emblem.jpg?w=1199&h=399&l=50&t=40" align="center"/>    
</div>

[Image Source: International Business Times](https://www.ibtimes.co.in/watch-fifa-world-cup-2022-official-emblem-unveiled-all-you-need-know-about-upcoming-event-804737)

# FIFA World Cup Qatar 2022: A Case For Morocco
---
Despite the [controversies](https://nypost.com/2022/11/15/all-the-controversies-surrounding-the-qatar-world-cup-in-2022/) surrounding this World Cup, Qatar 2022 has given us many memorable moments: the Iranian team not singing their national anthem in support of women's rights in Iran, Morocco's giant-killing run, Argentina's defeat by Saudi Arabia, Japan's win against four-time winners Germany, Uruguay's karma against Ghana, and many more. This World Cup has also seen firsts and new records. Qatar became the first host nation to lose its opening game; Brazil lost to Cameroon, their first World Cup defeat to an African team; Morocco became the first African nation to reach a World Cup semi-final; Cristiano Ronaldo became the first male player to score at five World Cups; a men's FIFA World Cup game had an all-female refereeing team for the first time; and, also for the first time, teams from all continents qualified for the knockout stage [[7](https://nypost.com/2022/11/15/all-the-controversies-surrounding-the-qatar-world-cup-in-2022/),
[8](https://www.ghanaweb.com/GhanaHomePage/SportsArchive/Key-unforgettable-moments-of-2022-World-Cup-Group-stage-in-Qatar-1673927),
[9](https://abcnews.go.com/Sports/wireStory/ronaldo-1st-male-player-score-5-world-cups-93923484),
[11](https://www.timeoutdoha.com/sport-wellbeing/records-broken-qatar-2022)].

So in this notebook I will explore the Qatar 2022 FIFA World Cup games dataset, highlighting the key features/attributes of teams. In addition to the obvious stats that contribute to a team's success, I will look for less obvious game attributes that may matter, in an attempt to make a case for **Morocco**, the surprise team of the tournament.

## The dataset
I used the [teams statistics dataset](https://www.kaggle.com/datasets/swaptr/fifa-world-cup-2022-statistics) collected and created by [Swaptr](https://www.kaggle.com/swaptr). It contains game statistics for all the teams competing at the FIFA World Cup Qatar 2022. Each row represents a country and each column is a game statistic. The dataset has 32 rows and 189 columns.


In [1]:
```python
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

team_data = pd.read_csv('/kaggle/input/fifa-world-cup-2022-statistics/team_data.csv')

print(team_data.shape)

team_data.head()
```

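```
(32, 189)
```

| Unnamed: 0 | team | players_used | avg_age | possession | games | games_starts | minutes | minutes_90s | goals | assists | ... | fouls | fouled | offsides | pens_won | pens_conceded | own_goals | ball_recoveries | aerials_won | aerials_lost | aerials_won_pct |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Argentina | 24 | 28.4 | 57.4 | 7 | 77 | 690 | 7.7 | 15 | 8 | ... | 100 | 115 | 23 | 5 | 2 | 1 | 357 | 83 | 90 | 48.0 |
| 1 | Australia | 20 | 28.7 | 37.8 | 4 | 44 | 360 | 4.0 | 3 | 3 | ... | 52 | 34 | 1 | 0 | 0 | 0 | 200 | 72 | 72 | 50.0 |
| 2 | Belgium | 20 | 30.6 | 57.0 | 3 | 33 | 270 | 3.0 | 1 | 1 | ... | 30 | 35 | 3 | 0 | 1 | 0 | 132 | 33 | 28 | 54.1 |
| 3 | Brazil | 26 | 28.5 | 56.2 | 5 | 55 | 480 | 5.3 | 8 | 6 | ... | 63 | 74 | 8 | 1 | 0 | 0 | 271 | 43 | 56 | 43.4 |
| 4 | Cameroon | 22 | 28.0 | 41.7 | 3 | 33 | 270 | 3.0 | 4 | 4 | ... | 32 | 38 | 2 | 0 | 0 | 0 | 142 | 42 | 36 | 53.8 |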


## Players' Age

According to the [source](https://www.soccerseattlestyle.com/what-is-the-average-age-of-a-professional-soccer-player), the age of professional footballers ranges from 16 to 43 years, with an average age of 25.5 years. The source also observed that around 80% of football players are between 21 and 29 years old. 

Summary of the players' ages at the World Cup:

- The average age across all teams is ~28 years.
- The oldest teams are Belgium and Costa Rica, both averaging 30.6 years.
- The youngest team is the United States, averaging 25.4 years.

**Note**: Incidentally, the two oldest teams (Belgium and Costa Rica) did not make the knockout stage. Right after Belgium's exit from the tournament, there was a debate over whether Roberto Martinez's (Belgium's coach) loyalty to his older players was partly to blame for their failure. Who knows what would have happened had he given the younger players a chance to show what they can do? Hindsight is a wonderful thing, isn't it?


Morocco, the surprise package of this World Cup, are the **sixth** youngest team.
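Reading "sixth youngest" off a sorted bar chart works, but the rank can also be computed directly. Below is a minimal sketch in plain Python using hypothetical ages rather than the dataset's values; ties share the lowest rank, which is what `team_data['avg_age'].rank(method='min')` does in pandas.

```python
def min_rank(values: dict[str, float], team: str) -> int:
    """1-based rank of `team` when values are sorted ascending.

    Tied values share the lowest rank, as with pandas' rank(method="min").
    """
    target = values[team]
    return 1 + sum(1 for v in values.values() if v < target)


# Hypothetical average ages, for illustration only
ages = {"Team A": 25.4, "Team B": 26.1, "Team C": 26.1, "Team D": 27.0}
print(min_rank(ages, "Team D"))  # 4
print(min_rank(ages, "Team C"))  # 2
```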


In [2]:
```python
team_data['avg_age'].describe()
```

```
count    32.000000
mean     28.062500
std       1.166674
min      25.400000
25%      27.475000
50%      28.100000
75%      28.725000
max      30.600000
Name: avg_age, dtype: float64
```

In [3]:
```python
# Bar chart of each team's average age, youngest at the bottom
fig = px.bar(team_data.sort_values('avg_age', ascending=True),
             y="team", x="avg_age",
             width=750,
             height=600,
             )
fig.update_layout(
    template='ggplot2',
    title="<b>Teams' average age</b>",
    title_font={'size': 24},
)
age_avg = np.mean(team_data['avg_age'])

# Dotted vertical line at the tournament-wide average age
fig.add_shape(
    type="line", line_color="black", line_width=3, opacity=1, line_dash="dot",
    y0=0, y1=1, yref="paper", x0=age_avg, x1=age_avg, xref="x"
)

# Highlight Morocco: index 5 is its position in the ascending sort (sixth youngest)
colors = ['lightseagreen'] * 32
colors[5] = 'crimson'

fig.update_traces(marker_color=colors, marker_line_width=2.5)
fig.show()
```
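The cell above highlights a bar by hard-coding its position (`colors[5]`), which silently breaks if the data or the sort order changes. Here is a small sketch of a name-based alternative; `highlight_colors` is a hypothetical helper, not part of Plotly.

```python
def highlight_colors(teams_in_plot_order, highlights, default="lightseagreen"):
    """Return one colour per bar, chosen by team name rather than by position.

    `teams_in_plot_order` must follow the bar order, e.g. the "team" column of
    the sorted DataFrame that is passed to px.bar.
    """
    return [highlights.get(team, default) for team in teams_in_plot_order]


order = ["United States", "Ghana", "Morocco"]  # hypothetical bar order
print(highlight_colors(order, {"Morocco": "crimson"}))
# ['lightseagreen', 'lightseagreen', 'crimson']
```

In the notebook this could be called as `highlight_colors(sorted_df['team'], {'Morocco': 'crimson'})`, where `sorted_df` (a hypothetical name) is the same sorted frame given to `px.bar`.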

## Pool of Talents

Due to the unusual nature of the Qatar World Cup, FIFA changed a few rules compared to previous editions: squad size increased from 23 to 26, the number of bench players from 13 to 15, and the allowed substitutions from 3 to 5 [[1](https://www.sportingnews.com/uk/soccer/news/how-many-players-world-cup-squads-rosters-fifa-qatar-2022/ic04lajzekgfsivpnoiv5hc0)]. However, depending on squad quality, game strategy and other circumstances, the number of players used can differ from team to team. The highlights in Qatar 2022:

- On average, each team used 21 players.
- Brazil used the most players: 26.
- The fewest players used was 18 (Wales and Ecuador).

Among the four semi-finalists, Croatia used the fewest players. It seems their coach has a set of trusted players and knows his squad well. Brazil undoubtedly had depth of talent and were among the favorites to extend their own record with a sixth title, but Croatia, having used 8 fewer players, had other ideas (figures correct before the semi-finals).

Note: After the third-place play-off, Croatia had used a total of 21 players.


### Number of Players Involved

In [4]:
```python
team_data['players_used'].describe()
```

```
count    32.000000
mean     21.250000
std       1.951013
min      18.000000
25%      20.000000
50%      21.000000
75%      22.000000
max      26.000000
Name: players_used, dtype: float64
```

In [5]:
```python
# Number of players used by each team, fewest at the bottom
fig = px.bar(team_data.sort_values('players_used', ascending=True),
             y="team", x="players_used",
             width=750,
             height=600,
             )
fig.update_layout(
    template='ggplot2',
    title="<b>Number of Players Used</b>",
    title_font={'size': 24},
)
players_avg = np.mean(team_data['players_used'])

# Dotted vertical line at the average number of players used
fig.add_shape(
    type="line", line_color="black", line_width=3, opacity=1, line_dash="dot",
    y0=0, y1=1, yref="paper", x0=players_avg, x1=players_avg, xref="x"
)

# Highlight one bar by its position in the ascending sort (index is data-dependent)
colors = ['lightseagreen'] * 32
colors[30] = 'crimson'

fig.update_traces(marker_color=colors, marker_line_width=2.5)
fig.show()
```

## The Ultimate *Goal* of a Football Game

The ultimate goal of a football game is to win, and scoring goals gives you a higher chance of achieving that. However, goals do not come out of the blue: teams need a strategy, players need talent, composure and resilience under pressure, and of course there is also luck. Among other things, xG (expected goals), shots (on target), ball possession and effective passing could give us a good indication of whether a goal is likely. Here I explore a few of these game attributes.


### Goals and xG's

Expected goals (xG) estimates how many goals a team should have scored based on the quality of the chances it created. It is a more accurate and fairer assessment than shots on target [[2](https://footballxg.com/)].

In an ideal scenario xG equals goals: teams/players score the chances they are expected to convert. However, we do not live in an ideal world, and that doesn't always happen. More xG than goals scored implies wastefulness; the opposite may suggest that teams/players were more clinical than expected. Of course, there are exceptions to this assumption, such as unexpected errors, extraordinary defending or goalkeeping, a moment of genius, etc.

We observe the following from the data.

- Germany had the highest xG, at 3.35 per game, but scored only 2 goals per game, making them the most wasteful team. If you don't score the chances you are expected to score, you probably won't win the games you are expected to win either. Remember Japan vs Germany?
- Of all the teams below the dotted line (higher xG than goals), only Brazil made it out of the group stage.
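The "below the dotted line" reading can be expressed as a single number per team: goals minus xG, per game. A minimal sketch follows; Germany's total xG here is back-calculated from the rounded 3.35-per-game figure, so it is approximate.

```python
def finishing_delta(goals: float, xg: float, games: int) -> float:
    """Goals minus expected goals, per game.

    Negative: fewer goals than xG (below the y = x line, wasteful finishing).
    Positive: more goals than xG (above the line, clinical finishing).
    """
    if games <= 0:
        raise ValueError("games must be positive")
    return (goals - xg) / games


# Germany: 6 goals in 3 games; total xG approximated as 3.35 * 3
print(round(finishing_delta(goals=6, xg=3.35 * 3, games=3), 2))  # -1.35
```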


In [6]:
```python
team_data['goal_per_game'] = team_data['goals'] / team_data['games']
team_data['xg_per_game'] = team_data['xg'] / team_data['games']
team_data['npxg_per_game'] = team_data['npxg'] / team_data['games']


def goals_vs_expected(x_col, title):
    """Scatter goals per game against an xG measure, with a y = x reference line."""
    fig = px.scatter(team_data,
                     y='goal_per_game',
                     x=x_col,
                     color='team',
                     size='goal_per_game',
                     width=850,
                     height=500,
                     text=team_data['team'])
    # Draw y = x in data coordinates (a paper-coordinate diagonal only matches
    # y = x when both axes happen to share the same range)
    lim = max(team_data[x_col].max(), team_data['goal_per_game'].max())
    fig.add_shape(
        type="line", line_color="black", line_width=0.5, opacity=1, line_dash="dot",
        x0=0, y0=0, x1=lim, y1=lim, xref="x", yref="y",
    )
    fig.update_layout(showlegend=False, title=title, template='ggplot2', title_font={'size': 24})
    fig.show()


goals_vs_expected('xg_per_game', '<b>Goals vs xG (expected goals)</b>')
goals_vs_expected('npxg_per_game', '<b>Goals vs npxG (non-penalty expected goals)</b>')
```

### Shots on Target

Opta [[3]](https://www.statsperform.com/opta-event-definitions/) defines shots on target as follows:

> A shot on target is defined as any goal attempt that:
> 
> - Goes into the net regardless of intent – For Goals only.
> - Is a clear attempt to score that would have gone into the net but for being saved by the goalkeeper or is stopped by a player who is the last-man with the goalkeeper having no chance of preventing the goal (last line block).
> 
> Shots directly hitting the frame of the goal are not counted as shots on target, unless the ball goes in and is awarded as a goal. Shots blocked by another player, who is not the last man, are not counted as shots on target.
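The Opta rule above can be read as a simple classification. Here is a sketch, assuming a simplified and hypothetical set of outcome labels (real event feeds use their own codes):

```python
# Hypothetical outcome labels. A shot off the woodwork that goes in is
# labelled "goal"; "blocked" means blocked by a player who is not the last man.
ON_TARGET = {"goal", "saved", "last_line_block"}
NOT_ON_TARGET = {"woodwork", "blocked", "off_target"}


def is_shot_on_target(outcome: str) -> bool:
    """Apply the Opta shot-on-target rule to a simplified shot outcome."""
    if outcome in ON_TARGET:
        return True
    if outcome in NOT_ON_TARGET:
        return False
    raise ValueError(f"unknown outcome: {outcome!r}")


shots = ["goal", "woodwork", "saved", "blocked", "last_line_block"]
print(sum(is_shot_on_target(s) for s in shots))  # 3
```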

**Observations**:
- The most notable observation is that **Germany** under-performed in this tournament. Although they lead the pack in both `shots_per_game` and `shots_on_target_per_game`, they are 12th from the **bottom** (roughly the bottom third) in goal conversion rate.
- The Netherlands had the highest goals per shot on target ratio.
- Morocco's ratio is right in the middle of the pack. In fact, among the four semi-finalists they are second, behind France. Not bad!
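The conversion ratio in the next cell is computed as goals per game divided by shots on target per game. Because the number of games cancels, this equals total goals divided by total shots on target. The minimal sketch below uses hypothetical totals and also guards against a team with no shots on target; in pandas, a zero denominator would produce `inf` or `NaN` rather than an error.

```python
def goals_per_shot_on_target(goals: int, shots_on_target: int) -> float:
    """Share of shots on target that were scored; 0.0 if there were none."""
    return goals / shots_on_target if shots_on_target else 0.0


goals, shots_on_target, games = 6, 20, 3  # hypothetical totals
per_game_version = (goals / games) / (shots_on_target / games)
print(goals_per_shot_on_target(goals, shots_on_target))  # 0.3
print(abs(per_game_version - 0.3) < 1e-12)  # True
print(goals_per_shot_on_target(0, 0))  # 0.0
```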


In [7]:
```python
team_data['shots_per_game'] = team_data['shots'] / team_data['games']
team_data['shots_on_target_per_game'] = team_data['shots_on_target'] / team_data['games']

fig = px.scatter(team_data,
                 x='shots_per_game',
                 y='shots_on_target_per_game',
                 color='team',
                 size='goal_per_game',
                 width=850,
                 height=750,
                 text=team_data['team'],
                 )

fig.update_layout(showlegend=False, title='<b>Shots on target</b>', template='ggplot2', title_font={'size': 24})
fig.show()


# Goals per shot on target (the per-game factors cancel, so this equals goals / shots_on_target)
team_data['goal_to_shot_ratio'] = team_data['goal_per_game'] / team_data['shots_on_target_per_game']

fig = px.bar(team_data.sort_values('goal_to_shot_ratio', ascending=True),
             y="team", x='goal_to_shot_ratio',
             width=750,
             height=600,
             )
fig.update_layout(
    template='ggplot2',
    title="<b>Goals per shot on target</b>",
    title_font={'size': 24},
)

# Highlight three bars by their position in the ascending sort (indices are data-dependent)
colors = ['lightseagreen'] * 32
colors[14] = 'crimson'
colors[11] = 'black'
colors[7] = 'gold'

fig.update_traces(marker_color=colors, marker_line_width=2.5)
fig.show()
```


### Possessions/Passes/Dribbles

Let's start with the definitions of these terms (from Opta, except where another source is cited) before we go into the observations:

- **Possessions:** One or more sequences in a row belonging to the same team. A possession is ended by the opposition gaining control of the ball.
- **Pass:** Any intentionally played ball from one player to another. Passes include open play passes, goal kicks, corners and free kicks played as pass – but exclude crosses, keeper throws and throw-ins.
- **Progressive pass:** A forward pass that attempts to advance a team significantly closer to the opponent’s goal [[10](https://dataglossary.wyscout.com/progressive_pass/)].
- **Dribbles:** An attempt by a player to beat an opponent when they have possession of the ball. A successful dribble means the player beats the defender while retaining possession; unsuccessful ones are where the dribbler is tackled.
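The possession definition above (a possession ends when the opposition gains control) can be illustrated by counting runs in a sequence that records which team has the ball. This is only a sketch on a made-up sequence with hypothetical team labels; the dataset's `possession` column is a percentage, not a count of possessions.

```python
from itertools import groupby


def count_possessions(controlling_team: list[str]) -> dict[str, int]:
    """Count possessions from a sequence of which team controls the ball.

    A possession ends when the opposition gains control, so each run of
    consecutive identical labels counts as one possession.
    """
    counts: dict[str, int] = {}
    for team, _run in groupby(controlling_team):
        counts[team] = counts.get(team, 0) + 1
    return counts


print(count_possessions(["MAR", "MAR", "ESP", "ESP", "ESP", "MAR"]))
# {'MAR': 2, 'ESP': 1}
```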

**Observations**:

For a fair comparison, we'll consider per-game stats here.

- Spain had by far the highest average possession (75.8%), about 13 percent more than second-placed England.
- Spain are also pass masters. They led by quite a margin, completing more than 300 more passes per game than Germany, second on the list.
- On progressive passing, three teams stand out: Denmark, Germany and Spain. However, having failed to reach the last 16, Denmark and Germany must be wondering whether they could have used their progressive passes more effectively.
- The ratio of progressive to completed passes tells an interesting story: Spain's impressive possession and passing wasn't that effective at moving the ball forward and finding the key passes to score goals.
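To see why the progressive-to-completed ratio can tell a different story from raw counts, here is a sketch with hypothetical per-game numbers: a high-volume passing side can make more progressive passes in absolute terms while a smaller share of its passes are progressive.

```python
def progressive_share(progressive: int, completed: int) -> float:
    """Fraction of completed passes that were progressive."""
    if completed <= 0:
        raise ValueError("completed must be positive")
    return progressive / completed


# Hypothetical per-game numbers
high_volume = progressive_share(progressive=60, completed=900)
direct = progressive_share(progressive=40, completed=300)
print(round(high_volume, 3), round(direct, 3))  # 0.067 0.133
```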


In [8]:
```python
# Totals: completed passes vs progressive passes (bubble size = possession %)
fig = px.scatter(team_data,
                 x='passes_completed',
                 y='progressive_passes',
                 color='team',
                 size='possession',
                 width=850,
                 height=750,
                 text=team_data['team']
                 )

fig.update_layout(showlegend=False, title='<b>Passes completed vs progressive passes</b>', template='ggplot2', title_font={'size': 24})
fig.show()

# The same comparison per game, so teams that played more matches are not favored
team_data['passes_completed_pergame'] = team_data['passes_completed'] / team_data['games']
team_data['progressive_passes_pergame'] = team_data['progressive_passes'] / team_data['games']

fig = px.scatter(team_data,
                 x='passes_completed_pergame',
                 y='progressive_passes_pergame',
                 color='team',
                 size='possession',
                 width=850,
                 height=750,
                 text=team_data['team']
                 )

fig.update_layout(showlegend=False, title='<b>Passes completed vs progressive passes (per game)</b>', template='ggplot2', title_font={'size': 24})
fig.show()

team_data['dribbles_completed_pergame'] = team_data['dribbles_completed'] / team_data['games']

fig = px.bar(team_data.sort_values('dribbles_completed_pergame', ascending=True),
             y="team", x='dribbles_completed_pergame',
             width=750,
             height=600,
             )
fig.update_layout(
    template='ggplot2',
    title="<b>Dribbles Completed Per Game</b>",
    title_font={'size': 24},
)

# Highlight one bar by its position in the ascending sort (index is data-dependent)
colors = ['lightseagreen'] * 32
colors[25] = 'crimson'

fig.update_traces(marker_color=colors, marker_line_width=2.5)
fig.show()
```

In [9]:
```python
# Share of completed passes that were progressive
team_data['progressive_passes_ratio'] = team_data['progressive_passes'] / team_data['passes_completed']

fig = px.bar(team_data.sort_values('progressive_passes_ratio', ascending=True),
             y="team", x='progressive_passes_ratio',
             width=750,
             height=600,
             )

fig.update_layout(
    template='ggplot2',
    title="<b>Progressive to Completed Passes Ratio</b>",
    title_font={'size': 24},
)

# Highlight two bars by their position in the ascending sort (indices are data-dependent)
colors = ['lightseagreen'] * 32
colors[5] = 'crimson'
colors[4] = 'gold'

fig.update_traces(marker_color=colors, marker_line_width=2.5)
fig.show()
```

**The Messi Effect?**

> In the graph below, total passes completed are plotted against the total distance covered by those passes. There is a roughly linear relationship: more passes means more distance (fairly obvious), and a straight line can be drawn through the dots. However, if we look closely, Argentina deviates from the rest: they seem to have made more short passes, so their overall distance is a little lower than their number of completed passes would suggest.

> As his team's playmaker, Messi is likely involved in many of the passes Argentina make. Moreover, watching his games over the years, he plays a lot of one-twos and short, intricate passes. So it seems reasonable to describe what we see here as **"the Messi effect"**.
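"Deviates from the line" can be made concrete by fitting a straight line and looking at the residuals. Here is a sketch using ordinary least squares on synthetic data (not the dataset); on the real columns, `np.polyfit(team_data['passes_completed'], team_data['passes_total_distance'], 1)` would return the same slope and intercept.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys):
    """Observed minus fitted y; large negative values sit well below the line."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Synthetic data: four teams at 18 m per pass, the last one at 15.5 m per pass
passes = [1000, 2000, 3000, 4000, 5000]
distance = [18.0 * p for p in passes[:4]] + [15.5 * passes[4]]
res = residuals(passes, distance)
print(res.index(min(res)))  # 4
```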

In [10]:
```python
# Completed passes vs their total distance (bubble size = possession %)
fig = px.scatter(team_data,
                 x='passes_completed',
                 y='passes_total_distance',
                 color='team',
                 size='possession',
                 width=850,
                 height=750,
                 text=team_data['team']
                 )

# Hand-placed guide line in paper coordinates (not a fitted regression line)
fig.add_shape(
    type="line", line_color="black", line_width=0.5, opacity=1, line_dash="dot",
    y0=0, y1=0.99, yref="paper", x0=0, x1=0.9, xref="paper",
)

# Circle marking the deviating point discussed above (position set by hand)
fig.add_shape(type="circle",
              xref="paper", yref="paper",
              opacity=0.2,
              x0=0.89, y0=0.84, x1=0.98, y1=0.94,
              line_color="black",
              )

fig.update_layout(showlegend=False, title='<b>Total Passes Completed vs Total Distance of Passes</b>', template='ggplot2', title_font={'size': 24})
fig.show()
```

The following table confirms that Argentina is indeed the team with the shortest average pass distance, 15.5 m per pass. The average across teams is 17.7 m, and the highest, almost 20 m, was recorded by Wales.
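Note that the 17.7 m figure is the mean of the 32 team averages, so each team counts equally regardless of how many passes it made. A tournament-wide distance per pass would instead pool all passes. Here is a sketch contrasting the two with hypothetical numbers:

```python
def mean_of_ratios(distances, passes):
    """Average of each team's distance per pass (each team weighted equally)."""
    return sum(d / p for d, p in zip(distances, passes)) / len(passes)


def pooled_ratio(distances, passes):
    """Tournament-wide distance per pass (each pass weighted equally)."""
    return sum(distances) / sum(passes)


# Hypothetical: one team plays many short passes, another a few long ones
distances = [15_500.0, 4_000.0]
passes = [1_000, 200]
print(mean_of_ratios(distances, passes))  # 17.75
print(pooled_ratio(distances, passes))  # 16.25
```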

In [11]:
```python
from tabulate import tabulate

# Average distance per completed pass for each team
team_data['avg_dist_per_pass'] = np.round(team_data['passes_total_distance'] / team_data['passes_completed'], 2)
df = team_data.loc[:, ['team', 'avg_dist_per_pass']].sort_values(by='avg_dist_per_pass', ascending=True)

print('Average Passing Distance Statistics')
print(tabulate(df.describe(), tablefmt='psql'))

print('Average Passing Distance (m)')
print(tabulate(df, tablefmt='psql'))
```

```
Average Passing Distance Statistics
+-------+----------+
| count | 32       |
| mean  | 17.7369  |
| std   |  0.92504 |
| min   | 15.5     |
| 25%   | 17.1125  |
| 50%   | 17.67    |
| 75%   | 18.2425  |
| max   | 19.99    |
+-------+----------+
Average Passing Distance (m)
+----+----------------+-------+
|  0 | Argentina      | 15.5  |
|  3 | Brazil         | 16.45 |
| 12 | Germany        | 16.53 |
|  7 | Croatia        | 16.6  |
|  5 | Canada         | 16.88 |
| 19 | Netherlands    | 16.93 |
| 18 | Morocco        | 17.02 |
| 23 | Saudi Arabia   | 17.06 |
| 13 | Ghana          | 17.13 |
|  9 | Ecuador        | 17.28 |
| 15 | Japan          | 17.43 |
| 24 | Senegal        | 17.48 |
| 27 | Switzerland    | 17.55 |
| 22 | Qatar          | 17.58 |
| 26 | Spain          | 17.63 |
| 25 | Serbia         | 17.65 |
| 30 | Uruguay        | 17.69 |
| 21 | Portugal       | 17.72 |
| 29 | United States  | 17.79 |
|  1 | Australia      | 17.93 |
|  8 | Denmark        | 17.99 |
| 11 | France        
```

### Touches

Touches are defined as the sum of all events where a player touches the ball, excluding things like aerial duels lost or challenges lost. Knowing where on the pitch those touches occur helps us identify which teams are attack-minded and which are not. A football pitch has three zones: the defensive third, the middle third and the attacking third. Since much of the game is contested in the middle of the park, the attacking and defensive thirds are perhaps the more telling ones. The following picture shows the three zones of the pitch (the blue line is the direction of play).

![](https://4.bp.blogspot.com/-ZmFrMnhI7rg/Vbn8cUxOsVI/AAAAAAAAGWg/xtlOOcJTGFg/s1600/pitch%2Bzone.png)

[Picture Source](https://objective-football.blogspot.com/2015/07/territory.html)

**KEY**: `D Zone => defensive third`; `N Zone => Neutral Zone (middle third)` and `A Zone => Attacking third`
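The three zones are equal thirds along the direction of play. The sketch below maps a position to a zone. It assumes the position is measured in metres from the team's own goal line and uses an assumed 105 m pitch length; pitch lengths vary, and the dataset does not provide positions.

```python
def zone_of(x: float, pitch_length: float = 105.0) -> str:
    """Map a distance from a team's own goal line to a pitch third.

    x runs along the direction of play, from 0 (own goal line) to
    pitch_length (opponent's goal line). The 105 m default is an assumption.
    """
    if not 0 <= x <= pitch_length:
        raise ValueError("x must lie on the pitch")
    third = pitch_length / 3
    if x < third:
        return "D Zone"
    if x < 2 * third:
        return "N Zone"
    return "A Zone"


print([zone_of(x) for x in (10, 52.5, 90)])  # ['D Zone', 'N Zone', 'A Zone']
```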

**Observations**:
- Costa Rica is the most defensive team of all, with ~48% of their touches in the defensive third (Argentina and France made 26.4% and 26.9% of their touches in the defensive third, respectively).
- Spain loves the neutral zone: 60.4% of their touches happened there (Argentina and France made 50.4% of their touches in the middle).
- Germany are the most attacking team, with 31% of their touches in the attacking third (Argentina and France made ~24% of their touches in the final third).
- Morocco is firmly on the defensive side, ranking second in defensive-third touches and third from bottom in attacking-third touches.
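The percentages in the next cell divide each zone's touches by the team's total `touches`. Here is a minimal sketch of the same arithmetic on hypothetical counts. If the zone counts do not add up to the total (not checked here for this dataset), the three shares will not sum to exactly 100%.

```python
def zone_shares(touches_by_zone: dict[str, int], total_touches: int) -> dict[str, float]:
    """Percentage of a team's touches that happened in each zone."""
    if total_touches <= 0:
        raise ValueError("total_touches must be positive")
    return {zone: 100 * n / total_touches for zone, n in touches_by_zone.items()}


shares = zone_shares({"def_3rd": 480, "mid_3rd": 350, "att_3rd": 170}, 1000)
print(shares)  # {'def_3rd': 48.0, 'mid_3rd': 35.0, 'att_3rd': 17.0}
```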


In [12]:
```python
# Share of each team's touches by pitch zone (percent of total touches)
for zone in ['def_pen_area', 'def_3rd', 'mid_3rd', 'att_3rd', 'att_pen_area']:
    team_data[f'touches_{zone}_percent'] = team_data[f'touches_{zone}'] / team_data['touches'] * 100

# One frame per zone, sorted from the highest share to the lowest
df1 = team_data[['team', 'touches_def_3rd_percent']].sort_values('touches_def_3rd_percent', ascending=False)
df2 = team_data[['team', 'touches_mid_3rd_percent']].sort_values('touches_mid_3rd_percent', ascending=False)
df3 = team_data[['team', 'touches_att_3rd_percent']].sort_values('touches_att_3rd_percent', ascending=False)

# Highlighted bars are picked by position in each sorted frame (indices are data-dependent)
colors = ['lightseagreen'] * 32
colors[1] = 'crimson'
colors[0] = 'gray'

colors1 = ['lightseagreen'] * 32
colors1[27] = 'crimson'
colors1[0] = 'gold'

colors2 = ['lightseagreen'] * 32
colors2[29] = 'crimson'
colors2[0] = 'black'

fig = go.Figure(data=[
    go.Bar(name='def_3rd', y=df1['touches_def_3rd_percent'], x=df1['team'], marker_color=colors),
    go.Bar(name='mid_3rd', y=df2['touches_mid_3rd_percent'], x=df2['team'], marker_color=colors1),
    go.Bar(name='att_3rd', y=df3['touches_att_3rd_percent'], x=df3['team'], marker_color=colors2),
])

# Grey subtitle appended to a title string
SUBTITLE = '<br><span style="font-size:16px; color: darkgray">{}</span>'

# Buttons that show one zone at a time, or all three.
# "update" takes [trace changes, layout changes]; "title.text" targets the title string.
fig.update_layout(
    updatemenus=[
        dict(
            type="buttons",
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=-0.2,
            xanchor="left",
            y=1.0,
            yanchor="top",
            buttons=[
                dict(label="All zones",
                     method="update",
                     args=[{"visible": [True, True, True]},
                           {"title.text": "Percentage of touches: all zones of the pitch"}]),
                dict(label="def_3rd",
                     method="update",
                     args=[{"visible": [True, False, False]},
                           {"title.text": "Percentage of touches: defensive 3rd of the pitch"
                            + SUBTITLE.format("Costa Rica, with ~48% of their touches in the defensive 3rd, are the most defensive team.")}]),
                dict(label="mid_3rd",
                     method="update",
                     args=[{"visible": [False, True, False]},
                           {"title.text": "Percentage of touches: middle 3rd of the pitch"
                            + SUBTITLE.format("With ~60% of their touches in the middle of the pitch, Spain loved the neutral zone!")}]),
                dict(label="att_3rd",
                     method="update",
                     args=[{"visible": [False, False, True]},
                           {"title.text": "Percentage of touches: attacking 3rd of the pitch"
                            + SUBTITLE.format("Germany is the only team with more than 30% of its touches in the attacking 3rd.")}]),
            ],
        )
    ])

fig.update_layout(
    template='ggplot2',
    title="<b>Percentage of touches in various zones of the pitch</b>"
          + SUBTITLE.format("A football pitch is divided into three zones: the defensive, middle and attacking 3rd."),
    title_font={'size': 24},
    width=1000,
    height=600,
)

fig.update_traces(showlegend=True, marker_line_width=2.5)
fig.update_yaxes(title='Touches (percentage of total touches)')
fig.update_xaxes(title='Teams')
fig.show()
```


### Through Balls
Another key part of the game is a type of pass called the *through ball*. According to [[12]](https://soccerknowledgehub.com/through-ball-in-soccer/), a through ball is "a pass that is played by splitting at least two opposing defenders in an effort to penetrate a gap and advance into a more threatening position". Executed to perfection, through balls can be very important: they create goal-scoring opportunities by unsettling the defense and forcing defenders into last-ditch tackles. So who had the edge on this stat, and did they make the most of it?

**Observations**:
- There are three clusters: above 2.5 TBPG\*, below 0.5 TBPG, and a middle cluster.
- Except for Morocco, which is in the bottom cluster, the other semi-finalists are in the middle group; none are in the top group, an indication that through balls aren't easy to execute.
- This stat also magnifies the underachievement, or inefficiency in the final third, of Belgium and Spain (the top two teams on this metric).
- Denmark occupy the last spot, having made no through balls at all, a little strange for a team that includes Christian Eriksen (one of the most creative midfielders in Europe).

\***TBPG**: Through balls per game.
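The clusters above are defined by TBPG thresholds, while the shaded bands in the next chart are placed by bar position, so the two must be kept in sync by hand. The sketch below assigns clusters from the values themselves, using the 2.5 and 0.5 thresholds from the text and hypothetical counts; treating boundary values as "middle" is an arbitrary choice.

```python
def tbpg_cluster(through_balls: int, games: int, high: float = 2.5, low: float = 0.5) -> str:
    """Assign a team to the top, middle or bottom through-ball cluster."""
    tbpg = through_balls / games
    if tbpg > high:
        return "top"
    if tbpg < low:
        return "bottom"
    return "middle"


print(tbpg_cluster(18, 6), tbpg_cluster(4, 4), tbpg_cluster(0, 3))  # top middle bottom
```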

In [13]:
```python
team_data['through_balls_pergame'] = team_data['through_balls'] / team_data['games']

fig = px.bar(team_data.sort_values('through_balls_pergame', ascending=True),
             y="team", x='through_balls_pergame',
             width=750,
             height=600,
             )

# Shaded bands for the three clusters; y0/y1 are bar positions in the sorted
# order (category indices), so they must be updated by hand if the data change
fig.add_hrect(y0=27.5, y1=31.5, fillcolor="gold", opacity=0.7, layer="below", line_width=0)
fig.add_hrect(y0=8.5, y1=27.5, fillcolor="gray", opacity=0.2, layer="below", line_width=0)
fig.add_hrect(y0=-0.5, y1=8.5, fillcolor="#CD7F32", opacity=0.7, layer="below", line_width=0)

fig.update_layout(
    template='ggplot2',
    title="<b>Through Balls Per Game</b>",
    title_font={'size': 24},
)

# Highlight one bar by its position in the ascending sort (index is data-dependent)
colors = ['lightseagreen'] * 32
colors[8] = 'crimson'

fig.update_traces(marker_color=colors, marker_line_width=2.5)
fig.show()
```

## A case for Morocco


Drawn in the same group as Belgium and Croatia, Morocco were not expected by anybody to progress to the knockout stages, let alone reach the semi-finals of this World Cup. The whole African continent is proud to finally be represented in the semi-finals of a World Cup, for the first time in the history of the competition.

So knocking out **THREE European footballing heavyweights [1]** can't be a fluke. They must have done something very right. In the following sections, I will try to identify game attributes that might have contributed to their success so far.


> [1]. Morocco defeated Belgium (group stage), Spain (last 16) and Portugal (quarter-finals) on the road to the semi-finals. These teams are ranked 2nd, 7th and 9th respectively [[4](https://www.fifa.com/fifa-world-ranking/men?dateId=id13792)].


### Do not concede goals!
Football is a simple game: pass, pass, pass... score more goals than your opponent and you win! But in tournament football, not conceding goals is equally important.

> At Euro 2016, Portugal reached the semi-finals without winning a game in regulation (normal) time. Granted, they had two draws in which they scored goals, but you get the idea: they had a net goal difference of zero, and that shows how important not conceding goals is [[5](https://www.skysports.com/football/news/11096/10330603/when-teams-progress-in-major-tournaments-without-winning)].

Conceding only 0.71 goals per game (in normal time), Morocco occupy 5th spot. Considering that the other semi-finalists (the teams who played the maximum number of games, 7) conceded 1 or more goals per game, this is an impressive stat! It appears they are a team built on a strong defensive line.
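The 0.71 figure is goals conceded divided by games played; over seven games it corresponds to about 5 goals. Here is a sketch that ranks teams this way, with one hypothetical entry:

```python
def conceded_per_game(conceded: dict[str, tuple[int, int]]) -> list[tuple[str, float]]:
    """Sort teams from fewest to most goals conceded per game.

    `conceded` maps team -> (goals conceded, games played).
    """
    rates = {team: ga / games for team, (ga, games) in conceded.items()}
    return sorted(rates.items(), key=lambda item: item[1])


# 5 goals in 7 games is what the 0.71 figure implies; "Team X" is hypothetical
table = conceded_per_game({"Morocco": (5, 7), "Team X": (3, 3)})
print([(team, round(rate, 2)) for team, rate in table])
# [('Morocco', 0.71), ('Team X', 1.0)]
```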




In [14]:
```python
team_data['on_goals_against_pergame'] = team_data['on_goals_against'] / team_data['games']

# Sorted descending, so the teams conceding the fewest goals appear at the top
fig = px.bar(team_data.sort_values('on_goals_against_pergame', ascending=False),
             y="team", x="on_goals_against_pergame",
             width=750,
             height=600,
             )
fig.update_layout(
    template='ggplot2',
    title="<b>Goals conceded (per game)</b>",
    title_font={'size': 24},
)

# Highlight Morocco: index 27 in the descending sort is the fifth-lowest rate
colors = ['lightseagreen'] * 32
colors[27] = 'crimson'

fig.update_traces(marker_color=colors, marker_line_width=2.5)
fig.show()
```

### Blocked Passes/Shots
A blocked shot is defined as any clear attempt to score that:
- Is going on target and is blocked by an outfield player, where there are other defenders or a goalkeeper behind the blocker.
- Includes shots blocked unintentionally by the shooter’s own team mate.

*Clearances off the line by an opposition player (last line blocks) are counted as shots on target and do not get counted as a blocked shot.*

From Opta's definition above, we see that a block can sometimes save a goal. This metric can tell us something about the quality of the defensive line, and the tenacity and (perhaps) positional awareness of the midfield players.

**Morocco's** stats on both metrics are respectable (above average). To put this in perspective, among the other semi-finalists only Argentina in `blocked passes` and France in `blocked shots` were better than Morocco.


In [15]:
team_data['blocked_passes_pergame'] = team_data['blocked_passes']/team_data['games']
team_data['blocked_shots_pergame'] = team_data['blocked_shots']/team_data['games']

fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=('Blocked Passes', 'Blocked Shots'),
    print_grid=False)

fig.add_trace(go.Bar(x=team_data['blocked_passes_pergame'],
                     y=team_data['team'], orientation='h', showlegend=False), row=1, col=1)

fig.add_trace(go.Bar(x=team_data['blocked_shots_pergame'],
                     y=team_data['team'], orientation='h', showlegend=False), row=1, col=2)

fig.update_yaxes(categoryorder='total ascending')  # sort the bars by value in every subplot

colors = ['crimson' if team == 'Morocco' else 'lightseagreen' for team in team_data['team']]  # highlight Morocco by name, not by row index

fig.update_traces(marker_color=colors, marker_line_width=2.5)

fig.update_layout(height=750, width=1100, template='ggplot2')
fig.show()

### Recoveries/Interceptions

Ball recoveries and interceptions are defined by Opta as follows [[3](https://www.statsperform.com/opta-event-definitions/)]:

**Ball Recovery:** This is where a player recovers the ball in a situation where neither team has possession or where the ball has been played directly to him by an opponent, thus securing possession for their team.

**Interception:** This is where a player reads an opponent’s pass and intercepts the ball by moving into the line of the intended pass.

>**Observations**:
>
>We can see that **Morocco** also executes this aspect of the game well compared with the rest of the teams at the World Cup.
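
The concluding remarks summarise these three metrics as a single average rank ("9th avg."). Assuming that means the mean of the per-metric ranks, it could be computed as sketched below; the teams and values are hypothetical, and only the column names follow the ones created in the cell that follows.

```python
import pandas as pd

# Hypothetical per-game values for illustration only
df = pd.DataFrame({
    "team": ["Team A", "Team B", "Team C", "Team D"],
    "ball_recoveries_pergame": [48.0, 52.5, 40.0, 45.0],
    "interceptions_pergame": [9.0, 7.5, 11.0, 8.0],
    "tackles_interceptions_pergame": [25.0, 22.0, 27.0, 20.0],
})

metrics = ["ball_recoveries_pergame", "interceptions_pergame", "tackles_interceptions_pergame"]

# Rank each metric separately: the highest per-game value gets rank 1
ranks = df[metrics].rank(ascending=False, method="min")

# Average the three ranks into one summary number per team
df["avg_rank"] = ranks.mean(axis=1)
print(df[["team", "avg_rank"]].sort_values("avg_rank"))
```

An average rank rewards consistency: a team that is solid on all three metrics can match one that tops a single metric but trails on the others, as Teams A and C do here.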

In [16]:
team_data['ball_recoveries_pergame'] = team_data['ball_recoveries']/team_data['games']
team_data['interceptions_pergame'] = team_data['interceptions']/team_data['games']
team_data['tackles_interceptions_pergame'] = team_data['tackles_interceptions']/team_data['games']

fig = make_subplots(
    rows=1, cols=3,
    subplot_titles=('Ball Recoveries', 'Interceptions', 'Tackle Interceptions'),
    print_grid=False)

fig.add_trace(go.Bar(x=team_data['ball_recoveries_pergame'],
                     y=team_data['team'], orientation='h', showlegend=False), row=1, col=1)

fig.add_trace(go.Bar(x=team_data['interceptions_pergame'],
                     y=team_data['team'], orientation='h', showlegend=False), row=1, col=2)

fig.add_trace(go.Bar(x=team_data['tackles_interceptions_pergame'],
                     y=team_data['team'], orientation='h', showlegend=False), row=1, col=3)

fig.update_yaxes(categoryorder='total ascending')  # sort the bars by value in every subplot

colors = ['crimson' if team == 'Morocco' else 'lightseagreen' for team in team_data['team']]  # highlight Morocco by name, not by row index

fig.update_traces(marker_color=colors, marker_line_width=2.5)

fig.update_layout(height=750, width=1300, template='ggplot2')
fig.show()

### Clearances

> This is a defensive action where a player kicks the ball away from his own goal with no intended recipient [[Opta](https://www.statsperform.com/opta-event-definitions/)].

Booting the ball out of the danger zone when necessary is also a good strategy. It relieves the pressure on the defensive line and gives the defensive unit a breather and time to reorganize. **Morocco** leads the way in this aspect of the game. They know how to **clear** the danger, so it appears.

In [17]:
team_data['clearances_pergame'] = team_data['clearances']/team_data['games']
sorted_df = team_data.sort_values('clearances_pergame', ascending=True)  # sort once and reuse for the bars and the colours

fig = px.bar(sorted_df,
             y="team", x="clearances_pergame",
             width=750,
             height=600,
             )
fig.update_layout(
             template='ggplot2',
             title="<b>Clearances Per Game</b>",
             title_font={'size': 24},
)

colors = ['crimson' if team == 'Morocco' else 'lightseagreen' for team in sorted_df['team']]  # highlight Morocco by name, not by a hard-coded row index

fig.update_traces(marker_color=colors, marker_line_width=2.5)
fig.show()

### Possessions Lost

If a team loses possession frequently, chances are high that it will be exposed to counter-attacks (especially if the ball is lost in the final third of the pitch), which can lead to goals conceded and hence a higher chance of losing the game. Morocco is a team that does not lose possession that often.

What is interesting is that **Morocco** had the sixth-lowest average ball possession, which is not impressive for a semi-finalist, one would think. However, they are 9th from the bottom (only bettered by 8 teams) when it comes to `lost possessions`. It appears that their strategy is to make the most of the ball when they have it and, more importantly, not to lose it cheaply.
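
One way to probe the "make the most of the ball" reading is to scale possessions lost by how much of the ball a team actually had. The ratio below (possessions lost per 10 percentage points of possession) is a hypothetical metric introduced only for illustration, not one computed elsewhere in the notebook, and the numbers are made up; the column names follow the cell below.

```python
import pandas as pd

# Hypothetical values: 'possession' is average ball possession in percent
df = pd.DataFrame({
    "team": ["Low-possession side", "High-possession side"],
    "dispossessed_pergame": [5.0, 10.0],
    "possession": [40.0, 70.0],
})

# Hypothetical metric: possessions lost per 10 percentage points of possession.
# Lower values suggest a team is more careful with the ball it does have.
df["losses_per_10pct_possession"] = df["dispossessed_pergame"] / (df["possession"] / 10)

print(df.sort_values("losses_per_10pct_possession"))
```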


In [18]:
team_data['dispossessed_pergame'] = team_data['dispossessed']/team_data['games']

fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=('Possession Lost (per game)', 'Ball Possession (%)'),
    print_grid=False)

fig.add_trace(go.Bar(x=team_data['dispossessed_pergame'],
                     y=team_data['team'], orientation='h', showlegend=False), row=1, col=1)

fig.add_trace(go.Bar(x=team_data['possession'],
                     y=team_data['team'], orientation='h', showlegend=False), row=1, col=2)

fig.update_yaxes(categoryorder='total ascending')  # sort the bars by value in every subplot

colors = ['crimson' if team == 'Morocco' else 'lightseagreen' for team in team_data['team']]  # highlight Morocco by name, not by row index

fig.update_traces(marker_color=colors, marker_line_width=2.5)

fig.update_layout(height=750, width=1100, template='ggplot2')
fig.show()

## Concluding Remarks:

Our exploration of the team statistics from the Qatar World Cup games has come to an end. Along the way, we have tried to see which game attributes affect a team's performance and results, covering both sides of the game: attacking and defensive attributes. As the title of this notebook indicates, we have kept an eye on Morocco from start to finish and tried to make a case for them. We chose them simply because they were the surprise package of the World Cup; they defied logic and expectations. To avoid redundancy, I will not list every highlight explored; instead, here are a few points highlighting the failures of some of the game's heavyweights and the miracle of Morocco.

- The `Spanish` team showed us that possession is overrated: despite the impressive ball possession they posted during the tournament (75.8% on average), they were knocked out by `Morocco, which averaged 39% ball possession`.

- `Germany` paid the biggest price for their poor goal conversion rate. Although they topped both `shots_per_game` and `shot_on_target_per_game`, they were 12th from the bottom (roughly the bottom third) in goal conversion rate. Germany's shots on target per game were roughly 3× Morocco's (7.67 vs 2.43), yet they did not make it out of the group stage!

- Age is not always a disadvantage, as Croatia (the third-oldest team in the semi-finals) showed. For `Belgium`, however, it seems they paid the price for relying too much on ageing squad members: Belgium and Costa Rica, the oldest teams, were both eliminated in the group stage.

- `Morocco` were the surprise team of the tournament. However, when we look closely at the match statistics, it appears that they did the **not-so-visible** parts of the game efficiently. In particular, they stood out on the following stats: `goals conceded per game (5th)`, `lost ball possession (9th)` and `interception/tackle-interception/ball recovery (9th avg.)`. They deserve their history-making 4th-place finish. They are the crown of African football, and they have set the target for other teams (African teams and other underdogs) to match.
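
The Germany point rests on two pieces of arithmetic: the ratio of shots on target per game, and a conversion rate. The sketch below reproduces the ratio from the two figures quoted above. For the rate it assumes conversion rate means goals divided by shots (the notebook may define it differently, e.g. per shot on target), and the `conversion_rate` helper and its inputs are hypothetical.

```python
def conversion_rate(goals: int, shots: int) -> float:
    """Share of shots that end up as goals (0.0 when no shots were taken)."""
    return goals / shots if shots else 0.0


# Shots on target per game, as quoted in the concluding remarks
germany_sot_pergame = 7.67
morocco_sot_pergame = 2.43
sot_ratio = germany_sot_pergame / morocco_sot_pergame
print(f"Germany / Morocco shots on target per game: {sot_ratio:.2f}x")

# Hypothetical totals, only to show how the rate is computed
print(f"{conversion_rate(goals=3, shots=24):.1%}")
```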





## Reference

1. https://www.sportingnews.com/uk/soccer/news/how-many-players-world-cup-squads-rosters-fifa-qatar-2022/ic04lajzekgfsivpnoiv5hc0
2. https://footballxg.com/
3. https://www.statsperform.com/opta-event-definitions/
4. https://www.fifa.com/fifa-world-ranking/men?dateId=id13792
5. https://www.skysports.com/football/news/11096/10330603/when-teams-progress-in-major-tournaments-without-winning
6. https://www.kaggle.com/datasets/swaptr/fifa-world-cup-2022-statistics
7. https://nypost.com/2022/11/15/all-the-controversies-surrounding-the-qatar-world-cup-in-2022/
8. https://www.ghanaweb.com/GhanaHomePage/SportsArchive/Key-unforgettable-moments-of-2022-World-Cup-Group-stage-in-Qatar-1673927
9. https://abcnews.go.com/Sports/wireStory/ronaldo-1st-male-player-score-5-world-cups-93923484
10. https://dataglossary.wyscout.com/progressive_pass/
11. https://www.timeoutdoha.com/sport-wellbeing/records-broken-qatar-2022
12. https://soccerknowledgehub.com/through-ball-in-soccer/
13. https://www.soccerseattlestyle.com/what-is-the-average-age-of-a-professional-soccer-player


---
### End of Notebook!

### ⚽⚽⚽ Thank you for reading my notebook! ⚽⚽⚽

