# **05 - Football Dataset**
___

#### **Author(s):**
- Peitsch, Pablo [[1]](#1)

**Date:** 2022-09-06

## **DATASETS**
___

### **1. Data exploration**

After importing the required packages, the datasets are explored ahead of the analysis.

In [None]:
# Library imports
library(tidyverse)
# Folder containing the datasets
data_folder <- "datasets"
# List of files in the datasets folder
list_files <- list.files(data_folder)
# Number of datasets
n <- length(list_files)
# Set the plot size
options(repr.plot.width=25, repr.plot.height=12)

In [None]:
# Read every file in the datasets folder into a data frame named after the file
d_names <- list()
f_names <- list()
for (i in seq_len(n)){
    f_names[[i]] <- list_files[i]
    # Dataset name = file name without its extension
    d_names[[i]] <- tools::file_path_sans_ext(list_files[i])
    # [[i]] extracts the file name as a string rather than a one-element list
    temp_df <- read.csv(file.path(data_folder, f_names[[i]]), dec=",")
    assign(d_names[[i]], temp_df)
}
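
The loop above creates one global variable per file with `assign`. As an alternative sketch, the same files could be read into a single named list, which keeps the workspace tidy and makes it easy to iterate over the datasets. The folder `demo_folder`, the two tiny CSV files, and the names `csv_files` and `datasets` are hypothetical and only serve to make the example self-contained; they are not the real datasets.

```r
# Hypothetical demo folder with two tiny CSV files (not the real datasets)
demo_folder <- file.path(tempdir(), "demo_datasets")
dir.create(demo_folder, showWarnings = FALSE)
write.csv(data.frame(teamID = 1:2, name = c("A", "B")),
          file.path(demo_folder, "teams.csv"), row.names = FALSE)
write.csv(data.frame(teamID = c(1, 1, 2), goals = c(2, 0, 3)),
          file.path(demo_folder, "teamstats.csv"), row.names = FALSE)

# Read every CSV into one named list instead of separate global variables
csv_files <- list.files(demo_folder, pattern = "\\.csv$", full.names = TRUE)
datasets <- lapply(csv_files, read.csv)
names(datasets) <- tools::file_path_sans_ext(basename(csv_files))

# Inspect each dataset by name
for (nm in names(datasets)) {
    cat(nm, ":", nrow(datasets[[nm]]), "rows\n")
}
```

With this layout a dataset is accessed as `datasets[["teams"]]` rather than through `get()`.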

In [None]:
# Structure of datasets, data variables and types
for (i in 1:n){
    print(d_names[[i]])
    str(get(d_names[[i]]))
    print(colnames(get(d_names[[i]])))
    print("----------")
}

### **2. Foreign and primary keys**

The primary and foreign keys of each dataset are evaluated; the generated images are then shown.

In [None]:
# Folder containing the key diagrams
keys_folder <- "keys"
# List of files in the keys folder
list_pdf <- list.files(keys_folder)
list_pdf

<img src="keys/appearances.jpg" alt="appearances"  width="900" height="600">
<img src="keys/players.jpg" alt="players"  width="900" height="600">
<img src="keys/teams.jpg" alt="teams"  width="900" height="600">
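
The key evaluation described in this section can also be checked in code. The sketch below is one possible approach, not necessarily how the diagrams above were produced: a primary key is verified by counting duplicated values, and a foreign key by looking for rows with no match in the referenced table. The tables `teams_demo` and `stats_demo` and their values are hypothetical toy data.

```r
library(dplyr)

# Toy tables (hypothetical values) mirroring a teams/teamstats relationship
teams_demo <- tibble(teamID = c(1, 2, 3), name = c("A", "B", "C"))
stats_demo <- tibble(gameID = c(10, 10, 11, 11),
                     teamID = c(1, 2, 1, 4),
                     goals  = c(2, 1, 0, 3))

# Primary key check: a key value must not appear more than once
pk_duplicates <- teams_demo %>% count(teamID) %>% filter(n > 1)

# Foreign key check: rows in stats_demo whose teamID has no match in teams_demo
fk_orphans <- anti_join(stats_demo, teams_demo, by = "teamID")

nrow(pk_duplicates)  # 0 here: teamID identifies each team uniquely
fk_orphans           # the row with teamID 4 has no matching team
```

An empty `pk_duplicates` supports treating the column as a primary key, and an empty `fk_orphans` would mean every foreign key value resolves.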

## **RELATIONAL DATA**
___

### **1. Top 10 scoring teams**

In [None]:
df_team <- inner_join(teams, teamstats, by="teamID")

In [None]:
str(df_team)

In [None]:
# Total goals per team over all games; columns not used later are dropped
team_goals <- group_by(df_team, teamID) %>% mutate(total_goals=sum(goals)) %>% arrange(desc(total_goals)) %>%
    subset(select=-c(yellowCards, redCards, result, season, gameID, date, location, goals, fouls, corners, ppda))

In [None]:
# Keep one row per team, then take the 10 teams with the highest totals
best_10_goals <- distinct(team_goals, teamID, .keep_all=TRUE)
best_10_goals <- head(best_10_goals, 10) %>% arrange(desc(total_goals))

In [None]:
best_10_goals
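
The cells above build the ranking with `mutate`, `distinct` and `head`. A more compact sketch of the same idea collapses the data to one row per team with `summarise` and then keeps the highest totals with `slice_max`. The table `games_demo` and its values are hypothetical, and `n = 2` is used only because the toy data has three teams.

```r
library(dplyr)

# Toy per-game data (hypothetical values), one row per team and game
games_demo <- tibble(teamID = c(1, 1, 2, 2, 3, 3),
                     name   = c("A", "A", "B", "B", "C", "C"),
                     goals  = c(2, 1, 0, 1, 3, 3))

# Collapse to one row per team, then keep the n highest totals
top_scorers <- games_demo %>%
    group_by(teamID, name) %>%
    summarise(total_goals = sum(goals), .groups = "drop") %>%
    slice_max(total_goals, n = 2)

top_scorers
```

Note that `slice_max` keeps ties by default, so it can return more than `n` rows when totals are equal.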

### **2. Plot of the top 10 scoring teams**

In [None]:
ggplot(best_10_goals, aes(x=reorder(name, total_goals), y=total_goals)) + 
    geom_bar(stat = "identity") +
    geom_label(aes(label=total_goals),
                #vjust=-0.9, 
                color="dark orange", 
                hjust="center", 
                angle=0, 
                size=6.0,
                fontface="bold"
            ) +
    coord_cartesian(ylim=c(450, 720)) +
    theme(axis.line = element_line(colour = "black", size = 1), text = element_text(size = 24)) +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1), plot.title = element_text(color="Black", size=28, face="bold")) +
    labs(x="Teams", y="Total goals", title="Top 10 scoring teams")

### **3. Plot of the 10 teams with the most shots on goal**

In [None]:
# One row per team with its total shots, deep total and shots on target
best_10_shots <- group_by(team_goals, teamID) %>%
    summarise(name=first(name), total_goals=first(total_goals), total_shots=sum(shots),
              total_deep=sum(deep), total_OnTarget=sum(shotsOnTarget)) %>%
    arrange(desc(total_shots))

In [None]:
best_10_shots <- head(best_10_shots, 10)
best_10_shots

In [None]:
ggplot(best_10_shots, aes(x=reorder(name, total_shots), y=total_shots, fill=total_OnTarget)) + 
    geom_bar(stat = "identity") +
    geom_label(aes(label=total_goals),
                color="white", 
                hjust="center", 
                angle=0, 
                size=6.0,
                fontface="bold"
            ) +
    coord_cartesian(ylim=c(3900, 4700)) +
    theme(axis.line = element_line(colour = "black", size = 1), text = element_text(size = 24)) +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1), plot.title = element_text(color="Dark Blue", size=28, face="bold")) +
    labs(x="Teams", y="Total shots", fill="On Target Shots", title="Top 10 teams with the most shots on goal")

### **4. Plot of the 10 teams with the most goals and the most shots on goal**

In [None]:
# One row per team with its totals, ordered by total shots
# head(10) is an assumption based on the section title (10 teams)
best_10 <- group_by(team_goals, teamID) %>%
    summarise(name=first(name), total_goals=first(total_goals), total_shots=sum(shots),
              total_deep=sum(deep), total_OnTarget=sum(shotsOnTarget)) %>%
    arrange(desc(total_shots)) %>% head(10)

In [None]:
ggplot(best_10, aes(x=reorder(name, total_shots))) + 
    geom_point(aes(y=total_shots), size=5, color="dark green") +
    geom_label(aes(label=total_shots, y=total_shots),
                vjust="bottom",
                color="dark green", 
                hjust="left", 
                angle=0, 
                size=6.0,
                fontface="bold",
                nudge_x=0.05,
            ) +
    geom_point(aes(y=total_goals), size=5, color="dark orange") +
    geom_label(aes(label=total_goals, y=total_goals),
                vjust="bottom",
                color="dark orange", 
                hjust="left", 
                angle=0, 
                size=6.0,
                fontface="bold",
                nudge_x=0.05,
            ) +
    theme(axis.line = element_line(colour = "black", size = 1), text = element_text(size = 24)) +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1), plot.title = element_text(color="Dark Blue", size=28, face="bold")) +
    labs(x="Teams", y="Total shots and goals", title="Top 10 scoring teams and with the most shots on goal")
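
The plot above draws goals and shots as two separately coloured layers, so no legend is produced and the reader has to infer which colour is which. One possible alternative, sketched below under assumed toy data, reshapes the totals to long format with `pivot_longer` and maps the metric to colour, which lets ggplot2 build the legend. The table `totals_demo`, its values, and the column names `metric` and `total` are hypothetical.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Toy totals (hypothetical values), one row per team
totals_demo <- tibble(name = c("A", "B"),
                      total_goals = c(60, 45),
                      total_shots = c(500, 430))

# Long format: one row per team and metric
totals_long <- totals_demo %>%
    pivot_longer(c(total_goals, total_shots),
                 names_to = "metric", values_to = "total")

# Mapping metric to colour produces a legend automatically
p <- ggplot(totals_long, aes(x = name, y = total, color = metric)) +
    geom_point(size = 5) +
    labs(x = "Teams", y = "Total", color = "Metric")
```

Printing `p` in a notebook cell renders the plot.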

### **Information:**

#### <a id="1"></a> Pablo Peitsch
#### My GitHub repositories: <a href="https://github.com/PPeitsch">@PPeitsch</a>
#### The datasets were obtained from: <a href="https://www.kaggle.com/datasets/technika148/football-database">Football dataset</a>
#### Information about the xStats variable, Expected Goals (xG):
- <a href="https://understat.com/">understat.com</a>
- <a href="https://onefootball.com/es/noticias/que-son-los-expected-goals-xg-30199741">onefootball.com</a>
