# Goal kick analysis

In this notebook we will identify together the best teams at taking and defending goal kicks in the 2017/2018 Bundesliga season. We will do so by going through the following sections:

1. [Number of goal kicks per team](#goal_kicks_per_team)<br>Get a first understanding of the data and look at the average number of goal kicks for each team


2. [End points of goal kicks](#end_points)<br>Analyze the end positions of the goal kicks. Were most of them played short? More to the left? In this chapter we will find answers to those questions.


3. [Accurate goal kicks](#events_after)<br>In this chapter we search for a good definition of accuracy for goal kicks. Furthermore, we are going to learn about different helper functions to visualize the event data in a nice way.


4. [Team performance](#team_performance)<br>After we have done all the pre-work, we can now finally look into the goal kick performance of all teams and identify the ones that did extraordinarily well.

Notice that while we are going to look into the goal kick performance of the different teams, the underlying idea of the notebook is to introduce you to different helper functions that are implemented in this project. Having said this, I highly encourage you to start playing around with the helper functions and maybe even start your own analysis. Also, if you are interested in the technical aspects, feel free to jump into the source code and start to understand how they work under the hood :-)

Ok, enough of the talking, let's jump right into it!

In [63]:
import os
import pandas as pd
import numpy as np
import plotly
import plotly.express as px

if os.getcwd().split(os.sep)[-1] == "notebooks":
    os.chdir("../")

# import helper functions coming with this project
import helper.plotly as py_help
import helper.event_data as ed_help
import helper.io as io
import helper.general as gen_help

# this ensures that all columns and rows of a data frame are always displayed
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

## Loading the data

Read all the event data from the German league. Notice that by specifying the notebook, we only read the event data required for this notebook.

In [64]:
df_events = io.read_event_data("germany", notebook="goal_kick_analysis")
df_events.head()

| | id | matchId | matchPeriod | eventSec | eventName | subEventName | teamId | posBeforeXMeters | posBeforeYMeters | posAfterXMeters | posAfterYMeters | playerId | playerName | playerPosition | homeTeamId | awayTeamId |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 179896442 | 2516739 | 1H | 2.409746 | Pass | Simple pass | 2446 | 52.5 | 34.0 | 52.5 | 32.64 | 15231 | K. Volland | FW | 2444 | 2446 |
| 1 | 179896443 | 2516739 | 1H | 2.506082 | Pass | Simple pass | 2446 | 52.5 | 32.64 | 23.1 | 14.96 | 14786 | K. Bellarabi | MD | 2444 | 2446 |
| 2 | 179896444 | 2516739 | 1H | 6.946706 | Pass | Simple pass | 2446 | 23.1 | 14.96 | 6.3 | 31.28 | 14803 | S. Bender | DF | 2444 | 2446 |
| 3 | 179896445 | 2516739 | 1H | 10.786491 | Pass | Simple pass | 2446 | 6.3 | 31.28 | 21.0 | 6.8 | 14768 | B. Leno | GK | 2444 | 2446 |
| 4 | 179896446 | 2516739 | 1H | 12.684514 | Pass | Simple pass | 2446 | 21.0 | 6.8 | 28.35 | 2.72 | 14803 | S. Bender | DF | 2444 | 2446 |


Read the team data. In fact, we are displaying the final league table, which also helps us understand how well each team did in the 2017/2018 season.

In [65]:
df_league = io.read_team_data("germany")
df_league

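| | position | teamId | teamName | matches | goals | concededGoals | goalsDiff | points |
|---|---|---|---|---|---|---|---|---|
| 4 | 1 | 2444 | Bayern München | 34 | 92 | 28 | 64 | 84 |
| 6 | 2 | 2449 | Schalke 04 | 34 | 53 | 37 | 16 | 63 |
| 12 | 3 | 2482 | Hoffenheim | 34 | 66 | 48 | 18 | 55 |
| 3 | 4 | 2447 | Borussia Dortmund | 34 | 64 | 47 | 17 | 55 |
| 2 | 5 | 2446 | Bayer Leverkusen | 34 | 58 | 44 | 14 | 55 |
| 17 | 6 | 2975 | RB Leipzig | 34 | 57 | 53 | 4 | 53 |
| 5 | 7 | 2445 | Stuttgart | 34 | 36 | 36 | 0 | 51 |
| 15 | 8 | 2462 | Eintracht Frankfurt | 34 | 45 | 45 | 0 | 49 |
| 0 | 9 | 2454 | Borussia M'gladbach | 34 | 47 | 52 | -5 | 47 |
| 8 | 10 | 2457 | Hertha BSC | 34 | 43 | 46 | -3 | 43 |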


<a id="goal_kicks_per_team"></a>

# 1. Number of goal kicks per team

The aim of this section is to get a first general idea about goal kicks. How many does each team have per game? Are there any differences in the number of goal kicks between the teams?

Ok, we have loaded all the data required for this notebook. Let's start our analysis by extracting all the goal kicks of the season and computing the average number of goal kicks per team and match.

In [66]:
df_goal_kick = df_events[df_events["subEventName"] == "Goal kick"].copy()

print(f"Number of goal kicks per team and match: {(len(df_goal_kick) / df_goal_kick['matchId'].nunique() / 2):.1f}")

Number of goal kicks per team and match: 7.9
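
Dividing by the number of matches and then by 2 works because every match involves exactly two teams. A quick sanity check, shown here on a small made-up table (hypothetical data, not the Bundesliga events), is to count goal kicks per (match, team) pair and average those counts. The two numbers agree as long as every team takes at least one goal kick in every match, since `groupby(...).size()` has no row for a pair with zero goal kicks.

```python
import pandas as pd

# Hypothetical toy data: two matches, each with two teams taking goal kicks
df_toy = pd.DataFrame({
    "matchId": [1, 1, 1, 2, 2, 2, 2],
    "teamId":  [10, 10, 20, 10, 30, 30, 30],
})

# Formula used in the notebook: total / number of matches / 2 teams per match
avg_formula = len(df_toy) / df_toy["matchId"].nunique() / 2

# Alternative: count goal kicks per (match, team) pair and average the counts
avg_grouped = df_toy.groupby(["matchId", "teamId"]).size().mean()

print(avg_formula, avg_grouped)  # 1.75 1.75
```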


On average, each team has almost 8 goal kicks per match. While this is not a huge number, it is still considerable, isn't it? Let's see how the number of goal kicks differs between teams.

In [67]:
# get total number of goal kicks per team
df_nb_gk = df_goal_kick.groupby("teamId").agg(nbGoalKicks=("teamId", "count")).reset_index().sort_values("nbGoalKicks")

# merge information of team such as team name and number of matches
df_nb_gk = pd.merge(df_nb_gk, df_league, on="teamId")

# compute number of goal kicks per match
df_nb_gk["goalKicksPerMatch"] = df_nb_gk["nbGoalKicks"] / df_nb_gk["matches"]

# plot bar chart with number of goal kicks per team
fig = px.bar(df_nb_gk, x="teamName", y="goalKicksPerMatch", 
             labels={"goalKicksPerMatch":"Avg. # of goal kicks per match", "teamName": "Team"},
             title="Number of goal kicks per team")
fig.show()

Hm, there are two teams that really stand out: Bayern and Freiburg. Remember those two teams as they will stand out again below!

Honestly, I did not draw the chart above because of the insights, but because I wanted to introduce you to a really nice plotting package called *plotly*. You see how we came up with this rudimentary bar chart in only a couple of lines of code? And try hovering with your mouse over the graph. Isn't that cool? :-) To learn more about it, I recommend visiting their [website](https://plotly.com/python/).

<a id="end_points"></a>

# 2. End points of the goal kicks

Nice, in the previous section we got some high-level insights on the goal kicks. Let's take it one step further and analyze where the goal kicks ended most often. Were they played long or short? More to the right or to the left? Those are exactly the questions we are going to answer now.

Let's start by dividing the field into zones and drawing a heatmap indicating the number of goal kicks ending in each of the zones. With the helper function *prepare_heatmap* we can divide the field into 40 zones (the length is divided into 8 buckets and the width into 5 buckets) and count the number of goal kicks in each zone in a single line of code.

The function returns 4 values:
1. The number of goal kicks ending in each zone. This corresponds to the "z-value" in our heatmap and is an array of shape 5x8 (rows correspond to the y-direction, columns to the x-direction)
2. The center point of each zone in the x-direction (array of length 8)
3. The center point of each zone in the y-direction (array of length 5)
4. A data frame which corresponds to the *df_goal_kick* data frame but has two additional columns (*posAfterXMetersZone* and *posAfterYMetersZone*) indicating the zone in which each point lies. Notice, however, that the data frame is only returned if the *return_df* parameter is set to *True*
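
The source of *prepare_heatmap* is not reproduced here. As a rough sketch of the counting idea, assuming a 105 m x 68 m pitch (the first events in the table above start at (52.5, 34.0), the centre of such a pitch) and equally sized zones, each zone is 105/8 = 13.125 m long and 68/5 = 13.6 m wide, and the counts can be obtained with `np.histogram2d`. The function name `zone_counts` is hypothetical and not part of `helper.plotly`.

```python
import numpy as np

def zone_counts(x_values, y_values, nb_x, nb_y, length=105.0, width=68.0):
    """Count points per zone on a length x width pitch split into nb_x x nb_y zones.

    Returns the counts with shape (nb_y, nb_x), i.e. rows follow the y-direction
    and columns the x-direction, plus the zone centres in x and y.
    """
    x_edges = np.linspace(0, length, nb_x + 1)
    y_edges = np.linspace(0, width, nb_y + 1)
    # histogram2d bins its first argument along axis 0, so y is passed first
    counts, _, _ = np.histogram2d(y_values, x_values, bins=[y_edges, x_edges])
    x_centres = (x_edges[:-1] + x_edges[1:]) / 2
    y_centres = (y_edges[:-1] + y_edges[1:]) / 2
    return counts, x_centres, y_centres

# Three made-up end points: one short, two just beyond the halfway line
counts, x_c, y_c = zone_counts(np.array([10.0, 60.9, 62.0]),
                               np.array([5.0, 20.4, 21.0]), 8, 5)
print(counts.shape)   # (5, 8)
print(counts[1, 4])   # 2.0
```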

In [93]:
nb_goal_kick, x, y, df_goal_kick = py_help.prepare_heatmap(df_goal_kick, "posAfterXMeters", "posAfterYMeters", 
                                                           8, 5, return_df=True)

Let's see how many goal kicks ended in each zone. It seems that the majority of the goal kicks ended just beyond the halfway line or were played short to the full-backs.

In [94]:
nb_goal_kick

array([[209., 160., 145., 114., 262., 117.,  11.,   0.],
       [ 41., 104.,  98.,  72., 414., 273.,  13.,   0.],
       [ 28.,  93.,  73.,  47., 208., 237.,  15.,   0.],
       [ 36.,  81.,  81.,  56., 426., 297.,  15.,   0.],
       [172., 158., 111., 156., 294., 139.,  18.,   0.]])

Notice that there are quite a few goal kicks ending in the bottom right, and there are way more in the top left than in the bottom left. This looks somewhat strange, doesn't it? However, these zones contain the default points (0,0) and (105,68), which seem to be dummy points. Let's therefore exclude those and see what happens.

In [95]:
nb_goal_kicks = len(df_goal_kick)

# only consider goal kicks that do not end in the top left or the bottom right point of the field
df_goal_kick = df_goal_kick[~((df_goal_kick["posAfterXMeters"]==0) & (df_goal_kick["posAfterYMeters"]==0)) & 
                            ~((df_goal_kick["posAfterXMeters"]==105) & (df_goal_kick["posAfterYMeters"]==68))].copy()
print(f"Deleted {nb_goal_kicks - len(df_goal_kick)} goal kicks")

Deleted 0 goal kicks


In [96]:
nb_goal_kick, x, y, df_goal_kick = py_help.prepare_heatmap(df_goal_kick, "posAfterXMeters", "posAfterYMeters", 
                                                           8, 5, return_df=True)
nb_goal_kick

array([[209., 160., 145., 114., 262., 117.,  11.,   0.],
       [ 41., 104.,  98.,  72., 414., 273.,  13.,   0.],
       [ 28.,  93.,  73.,  47., 208., 237.,  15.,   0.],
       [ 36.,  81.,  81.,  56., 426., 297.,  15.,   0.],
       [172., 158., 111., 156., 294., 139.,  18.,   0.]])

Now, let's make the visualisation a little bit more appealing.

In [97]:
# Define what is shown when hovering over a zone
dict_info = {"Goal kicks": {"values": nb_goal_kick, "display_type": ".0f"}}

field = py_help.create_heatmap(x, y, nb_goal_kick, dict_info, title_name="Number of goal kicks")
field.show()

When hovering over the upper yellow zone, we see that 414 goal kicks ended there. But what does that mean in terms of the share of all goal kicks? Let's quickly add this information, as well as an identifier for each zone.

In [99]:
# compute the share (in %) of goal kicks ending in each zone
share_goal_kick = nb_goal_kick / len(df_goal_kick) * 100

# prepare identifiers for the zones to be displayed (notice that this code can be copied for each heatmap)
x_index = np.tile(np.arange(len(x)),(len(y),1))
y_index = np.transpose(np.tile(np.arange(len(y)),(len(x),1)))

# update the information to be shown when hovering over a zone
dict_info = {"Goal kicks": {"values": nb_goal_kick, "display_type": ".0f"}, 
             "Share goal kick (%)": {"values": share_goal_kick, "display_type": ".1f"},
             "Index x": {"values": x_index, "display_type": ".0f"},
             "Index y": {"values": y_index, "display_type": ".0f"}
            }

field = py_help.create_heatmap(x, y, share_goal_kick, dict_info, title_name="Share of goal kicks")
field.show()
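
The `np.tile`/`np.transpose` construction in the cell above builds two 5x8 grids holding the x (column) and y (row) index of every zone. If you prefer, `np.indices` produces the same two grids in one call; a small check, assuming 8 zones in x and 5 in y as above:

```python
import numpy as np

nb_x, nb_y = 8, 5

# Construction used in the notebook
x_index = np.tile(np.arange(nb_x), (nb_y, 1))
y_index = np.transpose(np.tile(np.arange(nb_y), (nb_x, 1)))

# Equivalent: np.indices returns the row-index grid first, then the column-index grid
y_alt, x_alt = np.indices((nb_y, nb_x))

print(np.array_equal(x_index, x_alt), np.array_equal(y_index, y_alt))  # True True
```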

Nice, now we can see that the 414 goal kicks translate into 8.6% of all goal kicks. Moreover, the hover information now shows that the zone we are referring to has x-index 4 and y-index 1. With this we can easily extract all goal kicks ending there and study them in a little more detail, which is what we are going to do in the next chapter. But before that, let's quickly extract all goal kicks ending in our favourite zone ;-)

In [74]:
# Notice that df_goal_kick was returned by the *prepare_heatmap* function above 
df_zone = df_goal_kick[(df_goal_kick["posAfterXMetersZone"] == 4) & (df_goal_kick["posAfterYMetersZone"] == 1)].copy()

A quick check shows that we indeed have 414 goal kicks.

In [75]:
print(f"Number of goal kicks ending in zone: {len(df_zone)}")

Number of goal kicks ending in zone: 414


<a id="events_after"></a>

# 3. Accurate goal kicks

In the previous section we learned about the position on the field where most goal kicks ended. The natural question to ask now is: "Were these goal kicks accurate?" This is what we want to look at in this chapter. 

To do so, however, we first need to get a better understanding of the events *after* the goal kick. So let's start by looking at the events following one sample goal kick made by Ulreich. 

In [76]:
df_zone.head(1)

| | id | matchId | matchPeriod | eventSec | eventName | subEventName | teamId | posBeforeXMeters | posBeforeYMeters | posAfterXMeters | posAfterYMeters | playerId | playerName | playerPosition | homeTeamId | awayTeamId | posAfterXMetersZone | posAfterYMetersZone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1046 | 179897516 | 2516739 | 2H | 678.384382 | Free Kick | Goal kick | 2444 | 5.25 | 34.0 | 60.9 | 20.4 | 14736 | S. Ulreich | GK | 2444 | 2446 | 4 | 1 |


We can easily extract all events after the goal kick using the matchId, the match period and the time of the goal kick.

In [77]:
# get all the information from our df_zone above
event_id = 179897516
match_id = 2516739
event_sec = 678
match_period = "2H"

# we are interested in the 20 seconds after the goal kick
duration_after_event = 20

df_events_after_gk = df_events[(df_events["matchId"] == match_id) & 
                               (df_events["matchPeriod"] == match_period) & 
                               (df_events["eventSec"] >= event_sec) & 
                               (df_events["eventSec"] <= (event_sec + duration_after_event))].copy()

# only display the columns we are interested in
cols = ["eventSec", "subEventName", "playerName", "playerPosition", "teamId", "posBeforeXMeters", "posBeforeYMeters", 
        "posAfterXMeters", "posAfterYMeters"]

df_events_after_gk[cols]

| | eventSec | subEventName | playerName | playerPosition | teamId | posBeforeXMeters | posBeforeYMeters | posAfterXMeters | posAfterYMeters |
|---|---|---|---|---|---|---|---|---|---|
| 1046 | 678.384382 | Goal kick | S. Ulreich | GK | 2444 | 5.25 | 34.0 | 60.9 | 20.4 |
| 1047 | 682.102811 | Air duel | A. Vidal | MD | 2444 | 60.9 | 20.4 | 75.6 | 11.56 |
| 1048 | 682.132346 | Air duel | D. Kohr | MD | 2446 | 44.1 | 47.6 | 29.4 | 56.44 |
| 1049 | 685.392092 | Simple pass | A. Dragović | DF | 2446 | 29.4 | 56.44 | 7.35 | 38.76 |
| 1050 | 689.194186 | Simple pass | B. Leno | GK | 2446 | 7.35 | 38.76 | 30.45 | 53.04 |
| 1051 | 691.500076 | Simple pass | J. Tah | DF | 2446 | 30.45 | 53.04 | 32.55 | 51.68 |
| 1052 | 692.670984 | Simple pass | C. Aránguiz | MD | 2446 | 32.55 | 51.68 | 14.7 | 63.92 |
| 1053 | 693.139654 | Simple pass | A. Dragović | DF | 2446 | 14.7 | 63.92 | 26.25 | 65.28 |
| 1054 | 696.506104 | Simple pass | A. Mehmedi | FW | 2446 | 26.25 | 65.28 | 28.35 | 59.16 |


Cool, that gives us a pretty good idea of what happened after the goal kick: Ulreich kicked the ball, there was an air duel between Vidal and Kohr, and the ball ended up with Leverkusen.

This is already great information. Nevertheless, I find it quite hard to understand how exactly Leverkusen played the ball once they had it. It looks like they played it back to the goalkeeper and then between defense and midfield, but I'm not completely sure how exactly.

Fortunately, we can use a helper function to make this more visual.
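
The first step of such a visualization is collecting the events that follow the goal kick. The source of `ed_help.get_time_around_special_event` is not shown in this notebook; a minimal sketch of what such a lookup could do, assuming it locates the event by its `id` and keeps the events of the same match and period within the time window, might look like this (the function `events_after` and the toy data are hypothetical). Unlike the hard-coded `event_sec = 678` above, it starts the window at the exact time of the goal kick.

```python
import pandas as pd

def events_after(df_events, event_id, secs_after):
    """Return the event with id `event_id` plus all events of the same match and
    period that happen at most `secs_after` seconds after it (hypothetical helper)."""
    event = df_events.loc[df_events["id"] == event_id].iloc[0]
    start = event["eventSec"]
    mask = ((df_events["matchId"] == event["matchId"]) &
            (df_events["matchPeriod"] == event["matchPeriod"]) &
            (df_events["eventSec"] >= start) &
            (df_events["eventSec"] <= start + secs_after))
    return df_events.loc[mask].sort_values("eventSec")

# Made-up events: the goal kick (id 1), two follow-ups, one event outside
# the 20-second window and one event in the other half
df_toy = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "matchId": [7, 7, 7, 7, 7],
    "matchPeriod": ["2H", "2H", "2H", "2H", "1H"],
    "eventSec": [678.4, 682.1, 689.2, 705.0, 680.0],
})
print(events_after(df_toy, event_id=1, secs_after=20)["id"].tolist())  # [1, 2, 3]
```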

In [100]:
# this extracts all events between *event_id* and *duration_after_event* seconds after the event
df_event_after_gk = ed_help.get_time_around_special_event(df_events, event_id, secs_after=duration_after_event)

# helper function to prepare the event plot 
df = py_help.prepare_event_plot(df_event_after_gk, "posBeforeXMeters", "posBeforeYMeters")
fig = py_help.create_event_plot(df, "posBeforeXMeters", "posBeforeYMeters")
fig.show()

Now it is much easier to understand what was going on, wouldn't you agree? There was a goal kick by Ulreich, then a duel between Vidal and Kohr and ... Wait, how do we know it was a duel between Vidal and Kohr? Again, you can just hover over the dots to get information about each event.

We can now easily tell that Leverkusen built up play over their right side. The question is: can we visualize this even more appealingly? Let's check out the *create_event_animation* function!

In [79]:
df = py_help.prepare_event_animation(df_event_after_gk, 
                                     "posBeforeXMeters", 
                                     "posAfterXMeters", 
                                     "posBeforeYMeters", 
                                     "posAfterYMeters")

animation = py_help.create_event_animation(df, 
                                           total_seconds = duration_after_event, 
                                           fps = 10,
                                           x_col_bef = "posBeforeXMeters", 
                                           x_col_aft = "posAfterXMeters", 
                                           y_col_bef = "posBeforeYMeters", 
                                           y_col_aft = "posAfterYMeters")

plotly.offline.iplot(animation, validate=False, auto_play=False)


divide by zero encountered in double_scalars



Ok, ok, I admit, I got a little bit distracted from the actual question we wanted to answer: when should a goal kick count as accurate? In my opinion, a goal kick should be marked as accurate if the team taking the goal kick is in possession of the ball afterwards. But what does possession mean? If we look at the events above, we notice that the first player on the ball after the goal kick was Vidal, i.e. a Bayern player. However, would you consider the goal kick above an accurate one? I wouldn't...

I would rather propose the following definition: a goal kick was accurate if the first pass, shot or free kick after the goal kick was made by the team taking the goal kick.

That way, the goal kick in the example above would have been inaccurate, as the first pass after the goal kick was the one from Dragović to Leno. Makes sense, doesn't it? However, before we use this definition, let's look at a couple more examples of goal kicks...

In [80]:
duration_after_event = 20
df_example_gk = df_zone.head(10)

for _, row in df_example_gk.iterrows():
    
    # this extracts all events between *event_id* and *duration_after_event* seconds after the event
    df_event_after_gk = ed_help.get_time_around_special_event(df_events, row["id"], secs_after=duration_after_event)

    # Notice that with the *left_team* argument we can make sure that the goal kick is always displayed from 
    # the left hand side. This makes it way easier to look at.
    df = py_help.prepare_event_plot(df_event_after_gk, "posBeforeXMeters", "posBeforeYMeters", left_team=row["teamId"])
    fig = py_help.create_event_plot(df, "posBeforeXMeters", "posBeforeYMeters")
    fig.show()

Ok, one might argue that the third plot actually shows an accurate goal kick even though Langkamp passed the ball first. Still, I think it makes sense to start with the definition above.

Let's therefore compute for each goal kick which team made the first pass / shot / free kick after the goal kick by using yet another helper function.

In [81]:
df_goal_kick = ed_help.get_event_after(df_goal_kick, df_events, 
                                       considered_events=["Pass", "Free Kick", "Shot"],
                                       cols_return = {"teamId": "teamNextPass"})
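
`get_event_after` is again a project helper whose internals are not shown. Assuming its job is to find, for every goal kick, the next event of one of the listed types in the same match and period, a vectorized sketch with `pd.merge_asof` might look as follows (`next_event_team` and the toy data are hypothetical). `direction="forward"` picks the next event in time, and `allow_exact_matches=False` skips the goal kick itself, which is stored as a *Free Kick* event. If no later event exists, `teamNextPass` is NaN and the goal kick counts as inaccurate.

```python
import pandas as pd

def next_event_team(df_kicks, df_events, considered_events):
    """Attach, for every goal kick, the teamId of the next considered event
    (same match and period, strictly later eventSec) as 'teamNextPass'."""
    df_next = (df_events[df_events["eventName"].isin(considered_events)]
               [["matchId", "matchPeriod", "eventSec", "teamId"]]
               .rename(columns={"teamId": "teamNextPass"})
               .sort_values("eventSec"))
    # merge_asof needs both frames sorted by the 'on' key
    return pd.merge_asof(df_kicks.sort_values("eventSec"), df_next,
                         on="eventSec", by=["matchId", "matchPeriod"],
                         direction="forward", allow_exact_matches=False)

# Made-up events: a Bayern goal kick lost after a duel, then a Leverkusen goal kick
df_toy = pd.DataFrame({
    "matchId": [7, 7, 7, 7, 7],
    "matchPeriod": ["2H"] * 5,
    "eventSec": [678.4, 682.1, 685.4, 900.0, 903.0],
    "eventName": ["Free Kick", "Duel", "Pass", "Free Kick", "Pass"],
    "subEventName": ["Goal kick", "Air duel", "Simple pass", "Goal kick", "Simple pass"],
    "teamId": [2444, 2444, 2446, 2446, 2446],
})
df_kicks = df_toy[df_toy["subEventName"] == "Goal kick"]
df_kicks = next_event_team(df_kicks, df_toy, ["Pass", "Free Kick", "Shot"])
df_kicks["accurate"] = (df_kicks["teamId"] == df_kicks["teamNextPass"]).astype(int)
print(df_kicks[["teamId", "teamNextPass", "accurate"]].values.tolist())
# [[2444, 2446, 0], [2446, 2446, 1]]
```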

We can now easily add whether a goal kick was accurate (i.e. the next pass, free kick or shot was made by the same team) or not.

In [82]:
df_goal_kick["accurate"] = 1*(df_goal_kick["teamId"] == df_goal_kick["teamNextPass"])

Let's look at the share of accurate goal kicks

In [83]:
print(f"Share of accurate goal kicks: {df_goal_kick['accurate'].mean()*100:.1f}%")

Share of accurate goal kicks: 60.2%


We see that only about 6 out of 10 goal kicks end up with the team taking the goal kick... That does not seem like much.

Just out of interest: what is the accuracy rate in the different zones?

In [84]:
# we first extract all accurate goal kicks
df_acc_goal_kick = df_goal_kick[df_goal_kick["accurate"] == 1].copy()

# get the number of accurate goal kicks per zone
nb_acc_goal_kick, x, y = py_help.prepare_heatmap(df_acc_goal_kick, "posAfterXMeters", "posAfterYMeters", 8, 5, return_df=False)

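# compute the share of accurate goal kicks per zone (zones without any goal kick give NaN,
# which is replaced below, so the division warnings are silenced here)
with np.errstate(divide="ignore", invalid="ignore"):
    share_acc_goal_kick = nb_acc_goal_kick / nb_goal_kick * 100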

# let's set the zones with < 10 kicks ending there to 0
share_acc_goal_kick = np.where(nb_goal_kick < 10, 0, share_acc_goal_kick)

# display the information
# update the information to be shown when hovering over a zone
dict_info = {"Share accurate goal kick (%)": {"values": share_acc_goal_kick, "display_type": ".1f"},
             "Goal kicks": {"values": nb_goal_kick, "display_type": ".0f"}, 
             "Share goal kick (%)": {"values": share_goal_kick, "display_type": ".1f"},
             "Index x": {"values": x_index, "display_type": ".0f"},
             "Index y": {"values": y_index, "display_type": ".0f"}
            }

# create a heatmap with the share of accurate goal kicks for each zone
field = py_help.create_heatmap(x, y, share_acc_goal_kick, dict_info, title_name="Share of accurate goal kicks")
field.show()

# we also create a heatmap with the share of goal kicks ending in each zone (now it has the accuracy information on it
# when hovering over the zone)
field = py_help.create_heatmap(x, y, share_goal_kick, dict_info, title_name="Share of goal kicks")
field.show()

This is very interesting! In the zones where most of the goal kicks end, there is only about a 40% chance that the team taking the goal kick makes the next pass. When the ball is played short, by contrast, the first pass almost always stays with the team taking the goal kick.
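
The per-zone accuracy above is an element-wise division of two count grids followed by blanking out sparse zones. A sketch of the same computation that never evaluates a division by zero, using the `where` argument of `np.divide`, is shown below with made-up counts and the same threshold of 10 goal kicks (`zone_accuracy` is a hypothetical name):

```python
import numpy as np

def zone_accuracy(nb_accurate, nb_total, min_kicks=10):
    """Share (in %) of accurate goal kicks per zone.

    Zones with fewer than `min_kicks` goal kicks (including empty zones) stay
    at 0, so the division is only evaluated where it is well defined.
    """
    nb_accurate = np.asarray(nb_accurate, dtype=float)
    nb_total = np.asarray(nb_total, dtype=float)
    share = np.zeros_like(nb_total)
    np.divide(nb_accurate, nb_total, out=share, where=nb_total >= min_kicks)
    return share * 100

# Made-up counts for a 2x3 grid of zones (not the Bundesliga numbers)
nb_total = np.array([[20, 5, 0], [50, 12, 0]])
nb_accurate = np.array([[18, 5, 0], [20, 6, 0]])
print(zone_accuracy(nb_accurate, nb_total))
```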

Ok, one might now argue that kicking the ball further away from my own goal leads to

1. a lower likelihood of conceding a goal within the next seconds
2. a higher likelihood of scoring a goal within the next seconds

That might be an interesting analysis for the future...

Let us, however, get back to our initial question and try to find the teams that did especially well or badly on goal kicks.

<a id="team_performance"></a>

# 4. Goal kick team performance

Now that we have a clear definition of an accurate goal kick, we can finally look at the goal kick accuracy of each team.

In [85]:
df_accuracy = df_goal_kick.groupby("teamId").agg(shareAccurate=("accurate","mean")).\
                           sort_values("shareAccurate").reset_index()
df_accuracy["shareAccurate"] *= 100
df_accuracy = pd.merge(df_accuracy, df_league, how="left", on="teamId")

# plot bar chart with the goal kick accuracy per team
fig = px.bar(df_accuracy, x="teamName", y="shareAccurate", 
             labels={"shareAccurate":"Accuracy in %", "teamName": "Team"},
             title="Goal kick accuracy")
# update the hover template to only show one decimal for the accuracy
fig.update_traces(hovertemplate= "Team: %{x}" + "<br>Accuracy (in %): %{y:.1f}")
fig.show()

Wow, there is a huge difference in accuracy between the teams. While Bayern has an accuracy of almost 80%, Hertha, Stuttgart and Freiburg lose the ball more often than they keep it after a goal kick.

However, there may also be differences in how each team plays its goal kicks. Let's therefore add the average length of each team's goal kicks to the chart, measured as the mean x-coordinate (in metres) at which the goal kicks ended.

In [86]:
df_length_gk = df_goal_kick.groupby("teamId").agg(lengthGoalKick=("posAfterXMeters", "mean")).reset_index()
df_accuracy = pd.merge(df_accuracy, df_length_gk, how="left", on="teamId")

# plot bar chart with the accuracy per team, colored by the average goal kick length
fig = px.bar(df_accuracy, 
             x="teamName", 
             y="shareAccurate", 
             color="lengthGoalKick",
             labels={"shareAccurate":"Accuracy in %", 
                     "teamName": "Team",
                     "lengthGoalKick": "Length of goal kick (in m)"},
             title="Goal kick accuracy")

fig.show()

Ok, there is definitely a clear difference in how the teams play their goal kicks. While Bayern only averages ~35m, Freiburg averages ~60m. Moreover, not surprisingly, teams with longer goal kicks tend to have lower accuracy.

There are some interesting examples, however:
1. Schalke seems to be relatively bad at goal kicks: their goal kicks are rather short, but they still only have an accuracy of ~59%
2. Wolfsburg, on the other hand, has long goal kicks but still an acceptable accuracy (~63%)
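
The "length" in the chart above is the mean end x-coordinate `posAfterXMeters`, i.e. the distance from the team's own goal line, assuming the coordinates are oriented so that every team attacks from left to right (as the goal kick by Ulreich and Leno's pass near x = 6 m in the tables above suggest). Since goal kicks start a few metres from the goal line (5.25 m in the sample goal kick), the distance actually covered in x is slightly shorter. A sketch computing both measures per team on made-up numbers:

```python
import pandas as pd

# Made-up goal kicks for two teams (coordinates in metres)
df_gk = pd.DataFrame({
    "teamId": [1, 1, 2, 2],
    "posBeforeXMeters": [5.25, 5.25, 5.25, 5.25],
    "posAfterXMeters": [20.0, 45.0, 60.0, 65.0],
})
# distance covered in x-direction between start and end of the goal kick
df_gk["xDistance"] = df_gk["posAfterXMeters"] - df_gk["posBeforeXMeters"]

df_length = (df_gk.groupby("teamId")
                  .agg(lengthGoalKick=("posAfterXMeters", "mean"),
                       xDistance=("xDistance", "mean"))
                  .reset_index())
print(df_length)
```

Because the start point barely varies between goal kicks, both measures should rank the teams in roughly the same way; the choice mainly matters when quoting distances.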

Before we dive into a "goodness" measure for goal kicks, let's first quickly visualize the different ways Bayern (shortest goal kicks) and Freiburg (longest goal kicks) play them

In [87]:
teams = {2444:"Bayern", 2453:"Freiburg"}

# loop over the different teams
for team_id in teams:
    df_goal_kick_team = df_goal_kick[df_goal_kick["teamId"] == team_id].copy()
    nb_goal_kick_team, x, y = py_help.prepare_heatmap(df_goal_kick_team, "posAfterXMeters", "posAfterYMeters", 6, 4)
    share_goal_kick_team = nb_goal_kick_team / len(df_goal_kick_team) * 100

    # set the hover information
    dict_info = {"Goal kicks": {"values": nb_goal_kick_team, "display_type": ".0f"}, 
                 "Share of goal kicks (%)": {"values": share_goal_kick_team, "display_type": ".1f"}
                }

    # create a heatmap with the share of the team's goal kicks ending in each zone
    field = py_help.create_heatmap(x, y, share_goal_kick_team, dict_info, title_name=f"Goal kicks: {teams[team_id]}")
    field.show()

Ok, this indeed looks completely different. Bayern likes to play out from the back, while Freiburg likes to put its goal kicks just beyond the halfway line...

This is very interesting in itself, and if I were preparing for a match I would definitely look deeper into how exactly the opponent likes to play its goal kicks. For now, however, let's go back to measuring how good each team is at goal kicks.

We have already seen that accuracy alone does not make a whole lot of sense: Bayern has a very good accuracy, but they are also playing the "easy" balls. We therefore need a way to measure whether a team does better than the other teams when playing the ball into the same zones. For example, if a team has an accuracy of 60% when playing the ball into zone A while the rest of the teams only reach 55% there, the team is doing well in that zone.

Let's therefore measure exactly that...

In [88]:
# we first compute the accuracy for each zone for all teams
df_zone_accuracy = df_goal_kick.groupby(["posAfterXMetersZone", "posAfterYMetersZone"]).\
                                agg(meanAccuracy=("accurate","mean"),
                                    nbGoalKicks=("accurate", "count")).reset_index()

# only keep zones with at least 10 goal kicks ending there
df_zone_accuracy = df_zone_accuracy[df_zone_accuracy["nbGoalKicks"] >= 10].copy()

# attach the mean accuracy for the zone in which the goal kick ended
# (inner merge: goal kicks ending in zones with fewer than 10 goal kicks are dropped)
df_goal_kick = pd.merge(df_goal_kick, df_zone_accuracy, on=["posAfterXMetersZone", "posAfterYMetersZone"])

Ok nice, we have now attached to each goal kick the mean accuracy of the zone the ball landed in. Let's now compute the "goodness" of each goal kick by subtracting the mean accuracy of the zone from the actual accuracy (1 for an accurate goal kick, 0 otherwise). We can then get the team goodness by averaging the goodness over all of the team's goal kicks.

Example: Assume a team had 2 goal kicks (one accurate, one not) that both ended up in a zone with an overall accuracy of 60%. We would first calculate the goodness of the 2 goal kicks, which is 0.4 for the accurate one and -0.6 for the inaccurate one. Their average is -0.1, which indicates that the team did worse than average.

Notice that this is the same as averaging the differences between the team's zone accuracies and the overall zone accuracies, weighted by the number of the team's goal kicks in each zone (I just find it easier to calculate this way :-)). In other words: the goodness of a team is the percentage-point uplift it has on goal kicks compared to the league-wide accuracy in the same zones.
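
Written as a formula (the notation is introduced here for illustration): let $a_i \in \{0, 1\}$ indicate whether goal kick $i$ was accurate, $z(i)$ the zone it ended in, $\bar{a}_z$ the league-wide accuracy in zone $z$, $K_T$ the set of the $n_T$ goal kicks of team $T$, $n_{T,z}$ the number of those ending in zone $z$ and $\mathrm{acc}_{T,z}$ the team's accuracy there. Then

$$
g_i = a_i - \bar{a}_{z(i)}, \qquad
G_T = \frac{1}{n_T} \sum_{i \in K_T} g_i
    = \sum_{z} \frac{n_{T,z}}{n_T} \left( \mathrm{acc}_{T,z} - \bar{a}_z \right),
$$

where the second equality follows from grouping the sum by zone, since the accurate goal kicks of team $T$ in zone $z$ add up to $n_{T,z}\,\mathrm{acc}_{T,z}$. For the example: $g = 1 - 0.6 = 0.4$ and $g = 0 - 0.6 = -0.6$, so $G_T = (0.4 - 0.6)/2 = -0.1$. The code below multiplies $G_T$ by 100 to express it in percentage points.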
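
A quick numerical check of this equivalence on made-up data (two teams, two zones). It also illustrates that, because each zone's mean is taken over all goal kicks in that zone, the goodness values of all goal kicks sum to zero across the league.

```python
import pandas as pd

# Made-up goal kicks: team, zone and whether the kick was accurate
df = pd.DataFrame({
    "teamId":   ["A", "A", "A", "B", "B", "B", "B"],
    "zone":     [1,   1,   2,   1,   2,   2,   2],
    "accurate": [1,   0,   1,   1,   0,   1,   0],
})

# Per-kick goodness: accuracy minus the league-wide accuracy of the zone
df["meanAccuracy"] = df.groupby("zone")["accurate"].transform("mean")
df["goodness"] = df["accurate"] - df["meanAccuracy"]
goodness_per_kick = df.groupby("teamId")["goodness"].mean()

# Same number via zone accuracies, weighted by the team's kicks per zone
zone_stats = (df.groupby(["teamId", "zone"])
                .agg(n=("accurate", "size"), acc=("accurate", "mean"),
                     zoneMean=("meanAccuracy", "first"))
                .reset_index())
zone_stats["weighted"] = zone_stats["n"] * (zone_stats["acc"] - zone_stats["zoneMean"])
goodness_weighted = (zone_stats.groupby("teamId")["weighted"].sum()
                     / df.groupby("teamId").size())

print(goodness_per_kick.round(3).to_dict())   # {'A': 0.056, 'B': -0.042}
print(goodness_weighted.round(3).to_dict())   # {'A': 0.056, 'B': -0.042}
print(df["goodness"].sum())                   # zero up to floating-point error
```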

In [89]:
# compute the goodness for each goal kick
df_goal_kick["goodness"] = df_goal_kick["accurate"] - df_goal_kick["meanAccuracy"]

# take the average for each team
df_goodness = df_goal_kick.groupby("teamId").agg(goodness=("goodness","mean")).sort_values("goodness").reset_index()
df_goodness["goodness"] = df_goodness["goodness"] * 100

# add the team name
df_goodness = pd.merge(df_goodness, df_league, on="teamId")

# plot bar chart with the goodness, using a color scale from red to green
fig = px.bar(df_goodness, 
             x="teamName", 
             y="goodness", 
             color="goodness",
             color_continuous_scale=["red", "green"],
             labels={"goodness":"Goodness", 
                     "teamName": "Team"},
             title="Team performance on goal kicks")

# update the hover template to only show one decimal for the goodness
fig.update_traces(hovertemplate= "Team: %{x}" + "<br>Goodness: %{y:.1f}")
fig.show()

What we see above is that Schalke is 4.7pp worse on its goal kicks than the average, whereas Leipzig does very well at 3.6pp better than the average. If I were the coach of Schalke, Bremen or Hertha, I would definitely take a closer look at how their goal kicks could be improved. And with the tools we have learned above, we could now easily do that... :-)

As a last analysis, let's quickly look at how the teams did defensively. This analysis is very similar to the one above, with the only difference that a team is doing well in defending if the opponent is doing badly on its goal kicks.

In [90]:
df_goal_kick["defendingTeamId"] = np.where(df_goal_kick["teamId"] == df_goal_kick["homeTeamId"], 
                                           df_goal_kick["awayTeamId"], df_goal_kick["homeTeamId"])

In [91]:
# take the average for each team
df_goodness = df_goal_kick.groupby("defendingTeamId").agg(goodness=("goodness","mean")).reset_index()

# A team is doing well in defending if the opponent is doing bad in their goal kick. We therefore multiply the goodness
# by -1 to get to the defending goodness
df_goodness["defensiveGoodness"] = df_goodness["goodness"] * -1 * 100

# sort by defensive goodness
df_goodness.sort_values("defensiveGoodness", inplace=True)

# add the team name
df_goodness = pd.merge(df_goodness.rename(columns={"defendingTeamId":"teamId"}), df_league, on="teamId")

# plot bar chart with the defensive goodness, using a color scale from red to green
fig = px.bar(df_goodness, 
             x="teamName", 
             y="defensiveGoodness", 
             color="defensiveGoodness",
             color_continuous_scale=["red", "green"],
             labels={"defensiveGoodness":"Goodness in defense", 
                     "teamName": "Team"},
             title="Team performance on defending goal kicks")

# update the hover template to only show one decimal for the goodness
fig.update_traces(hovertemplate= "Team: %{x}" + "<br>Goodness in defense: %{y:.1f}")
fig.show()

Wow, there are two teams that are extremely good at defending goal kicks: not surprisingly Bayern, but Augsburg is doing very well too! Given that Augsburg is also the third best when taking goal kicks, we might call them the best team in the Bundesliga when it comes to goal kicks.

## Summary

I hope you found this notebook entertaining and had fun going through it. Let me quickly summarize what we learned:
- Usage of plotly bar plots and some of their features (changing color, changing hover information, ...)
- Drawing heat maps over a soccer field using the *create_heatmap* function
- Visualization of events using the *create_event_plot* function
- Animation of events using the *create_event_animation* function
- Leipzig is the best team in goal kicks and Bayern the best in defending them :-)
