# Football Data Analysis Project

This project involves analyzing football data to gain insights into team performance and scoring patterns. The data includes information about matches, teams, goals, and season details.

## Data Description

The dataset consists of the following columns:

- **Match_ID**: Unique identifier of the match.
- **Div**: Division of the league in which the match took place.
- **Season**: Season in which the match was played.
- **Date**: Date of the match.
- **HomeTeam**: Home team.
- **AwayTeam**: Away team.
- **FTHG**: Number of goals scored by the home team at the end of the match.
- **FTAG**: Number of goals scored by the away team at the end of the match.
- **FTR**: Final result of the match: 'H' (home team victory), 'A' (away team victory), or 'D' (draw).
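
As a rough illustration of how these columns relate, the sketch below models one row as a plain Python dataclass and derives FTR from the two goal counts. The `Match` class and the `result_from_goals` helper are hypothetical names introduced only for this example; they are not part of the project code. The sample values are copied from the first row of the matches table shown later in the notebook.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Match:
    """One row of the dataset (hypothetical helper type, for illustration only)."""
    match_id: int
    div: str
    season: int
    match_date: date
    home_team: str
    away_team: str
    fthg: int  # full-time home goals
    ftag: int  # full-time away goals
    ftr: str   # 'H', 'A', or 'D'


def result_from_goals(fthg: int, ftag: int) -> str:
    """Derive the full-time result code from the two goal counts."""
    if fthg > ftag:
        return "H"
    if fthg < ftag:
        return "A"
    return "D"


m = Match(1, "D2", 2009, date(2010, 4, 4), "Oberhausen", "Kaiserslautern", 2, 1, "H")

# FTR should agree with the goal counts of the same row
print(result_from_goals(m.fthg, m.ftag) == m.ftr)
```

A check like this can be useful before the analysis, since every later step trusts FTR to be consistent with FTHG and FTAG.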

## Data Analysis Steps

The analysis is performed with PySpark, the Python API for Apache Spark. The key steps are:

1. **Calculating Home Team Points**: 
   - Filter the matches to select those from Division "D1" and seasons between 2007 and 2016.
   - Group the matches by HomeTeam and Season.
   - Sum the points per match: 3 for a home win (H), 1 for a draw (D), and 0 for an away win (A).
   - Order the results by Season and, within each season, by points in descending order.

2. **Calculating Away Team Points**: 
   - Filter the matches to select those from Division "D1" and seasons between 2007 and 2016.
   - Group the matches by AwayTeam and Season.
   - Sum the points per match: 3 for an away win (A), 1 for a draw (D), and 0 for a home win (H).
   - Order the results by Season and, within each season, by points in descending order.

3. **Points Calculation**: 
   - Combine the home team points and away team points with a union to create a unified points table. The union matches columns by position, so the away teams end up in the HomeTeam column.

4. **Classification of Teams**: 
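   - Group the points table by HomeTeam and Season. After the union, the HomeTeam column holds every team, whether its points were earned at home or away.
   - Sum the points for each team in each season.
   - Order the results by Season and, within each season, by Points in descending order, and rename HomeTeam to Team.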

5. **Adding Position Column**: 
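   - Define a window partitioned by Season and ordered by Points in descending order.
   - Add the "Position" column to the classification table using the row_number function over the window. Teams with equal points get consecutive positions in no guaranteed order, because the window has no tie-breaker.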
   - Display the classification table, showing the team, season, Points, and position.

6. **Selecting Winner Teams**: 
   - Add a temporary "row_number" column to the classification table using the row_number function over the same window.
   - Keep only the team with the highest Points in each season (where row_number is equal to 1).
   - Select the team, season, and Points columns, which also leaves the "row_number" column out.
   - Display the winners' table, showing the team, season, and Points.

7. **Scoring Goals Analysis**:
   - Calculate the home scored goals (HSG) of each team in each season:
     - Group the matches by HomeTeam and Season.
     - Calculate the sum of FTHG (HSG).
     - Order the results by HSG in descending order.
   - Calculate the away scored goals (ASG) of each team in each season:
     - Group the matches by AwayTeam and Season.
     - Calculate the sum of FTAG (ASG).
     - Order the results by ASG in descending order.

8. **Total Goals Calculation**:
   - Join the HSG and ASG tables on the team and season columns using an inner join.
   - Add the "TSG" (total scored goals) column by summing HSG and ASG.
   - Display the results, showing the team, season, HSG, ASG, and TSG.

P.S. To keep the output readable, all tables are limited to 40 rows, except for the dtClassification table.
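
Steps 1 to 4 above can be sketched without Spark to make the points logic easier to follow. This is a minimal plain-Python illustration, not the notebook's implementation: the sample matches, the team names, and the `classification` helper are all hypothetical. It applies the same rule (3 points for a win, 1 for a draw, 0 for a loss) from both the home and the away perspective and then sums per team and season.

```python
from collections import defaultdict

# Hypothetical sample rows: (Season, HomeTeam, AwayTeam, FTR)
matches = [
    (2007, "Team A", "Team B", "H"),
    (2007, "Team B", "Team A", "D"),
    (2007, "Team A", "Team C", "A"),
    (2007, "Team C", "Team B", "D"),
]

# Points awarded for each result code, seen from either side
HOME_POINTS = {"H": 3, "D": 1, "A": 0}
AWAY_POINTS = {"A": 3, "D": 1, "H": 0}


def classification(rows):
    """Sum home and away points per (team, season), then sort by season and points."""
    totals = defaultdict(int)
    for season, home, away, ftr in rows:
        totals[(home, season)] += HOME_POINTS[ftr]
        totals[(away, season)] += AWAY_POINTS[ftr]
    # Season ascending, points descending within each season
    return sorted(
        ((team, season, pts) for (team, season), pts in totals.items()),
        key=lambda r: (r[1], -r[2]),
    )


table = classification(matches)
for row in table:
    print(row)
```

Accumulating both sides into one dictionary plays the role of the union followed by the group-by: each team ends up with a single total per season regardless of where the points were earned.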

In [2]:
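# Import only the functions used below; note that this `sum` shadows Python's built-in sum
from pyspark.sql.functions import col, desc, row_number, sum, when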
from pyspark.sql import Window

In [3]:
def legend():
    """Print the meaning of the column abbreviations used in the final table."""
    print('HSG = Home Scored Goals')
    print('ASG = Away Scored Goals')
    print('TSG = Total Scored Goals')
    print('HCG = Home Conceded Goals')
    print('ACG = Away Conceded Goals')
    print('TCG = Total Conceded Goals')
    print('GD = Difference of Goals')


In [4]:
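# Load a Delta table into a DataFrame.
# `spark` is the SparkSession provided by the notebook environment.
def loadDf(fileName):
    # Delta tables carry their own schema, so no CSV-style header option is needed
    dt = spark.read.format('delta').load(fileName)
    return dt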


In [5]:
# Matches

dtMatches = loadDf("dbfs:/user/hive/warehouse/matches")

dtMatches.show(n=40)

# Match_ID: Unique identifier of the match.
# Div: Division of the league in which the game took place.
# Season: Season in which the game was held.
# Date: Date of the game.
# HomeTeam: Home team.
# AwayTeam: Away team.
# FTHG: Number of goals scored by the home team (HomeTeam) at the end of the match.
# FTAG: Number of goals scored by the away team (AwayTeam) at the end of the match.
# FTR: Final result of the match. It can be 'H' (home team victory), 'A' (away team victory), or 'D' (draw).



+--------+---+------+----------+------------------+--------------+----+----+---+
|Match_ID|Div|Season|      Date|          HomeTeam|      AwayTeam|FTHG|FTAG|FTR|
+--------+---+------+----------+------------------+--------------+----+----+---+
|       1| D2|  2009|2010-04-04|        Oberhausen|Kaiserslautern|   2|   1|  H|
|       2| D2|  2009|2009-11-01|       Munich 1860|Kaiserslautern|   0|   1|  A|
|       3| D2|  2009|2009-10-04|     Frankfurt FSV|Kaiserslautern|   1|   1|  D|
|       4| D2|  2009|2010-02-21|     Frankfurt FSV|     Karlsruhe|   2|   1|  H|
|       5| D2|  2009|2009-12-06|             Ahlen|     Karlsruhe|   1|   3|  A|
|       6| D2|  2009|2010-04-03|      Union Berlin|     Karlsruhe|   1|   1|  D|
|       7| D2|  2009|2009-08-14|         Paderborn|     Karlsruhe|   2|   0|  H|
|       8| D2|  2009|2010-03-08|         Bielefeld|     Karlsruhe|   0|   1|  A|
|       9| D2|  2009|2009-09-26|    Kaiserslautern|     Karlsruhe|   2|   0|  H|
|      10| D2|  2009|2009-11

In [6]:
# Home points per team and season for Div "D1", seasons 2007 to 2016:
# 3 points for a home win (H), 1 for a draw (D), 0 otherwise
dtHomePoints = (
    dtMatches
    .filter((col("Div") == "D1") & col("Season").between(2007, 2016))
    .groupBy("HomeTeam", "Season")
    .agg(
        sum(when(col("FTR") == "H", 3).when(col("FTR") == "D", 1).otherwise(0)).alias("Points")
    )
    .orderBy("Season", desc("Points"))
    .select("HomeTeam", "Season", "Points")
)

# Display dtHomePoints table
dtHomePoints.show(n=40)



+-------------+------+------+
|     HomeTeam|Season|Points|
+-------------+------+------+
|Bayern Munich|  2007|    41|
|Werder Bremen|  2007|    39|
|    Stuttgart|  2007|    38|
|   Schalke 04|  2007|    34|
|      Hamburg|  2007|    32|
|   Leverkusen|  2007|    31|
|       Hertha|  2007|    30|
|     Hannover|  2007|    29|
|Ein Frankfurt|  2007|    28|
|    Wolfsburg|  2007|    27|
|      Cottbus|  2007|    26|
|     Dortmund|  2007|    26|
|    Bielefeld|  2007|    25|
|       Bochum|  2007|    24|
|    Karlsruhe|  2007|    24|
|     Nurnberg|  2007|    22|
|Hansa Rostock|  2007|    19|
|     Duisburg|  2007|    12|
|    Wolfsburg|  2008|    49|
|      Hamburg|  2008|    41|
|    Stuttgart|  2008|    39|
|       Hertha|  2008|    39|
|Bayern Munich|  2008|    38|
|Werder Bremen|  2008|    34|
|     Dortmund|  2008|    33|
|   Hoffenheim|  2008|    32|
|     Hannover|  2008|    31|
|   Schalke 04|  2008|    30|
|   Leverkusen|  2008|    23|
|    Karlsruhe|  2008|    19|
|   M'glad

In [7]:
# Away points per team and season for Div "D1", seasons 2007 to 2016:
# 3 points for an away win (A), 1 for a draw (D), 0 otherwise
dtAwayPoints = (
    dtMatches
    .filter((col("Div") == "D1") & col("Season").between(2007, 2016))
    .groupBy("AwayTeam", "Season")
    .agg(
        sum(when(col("FTR") == "A", 3).when(col("FTR") == "D", 1).otherwise(0)).alias("Points")
    )
    .orderBy("Season", desc("Points"))
    .select("AwayTeam", "Season", "Points")
)

# Display dtAwayPoints table
dtAwayPoints.show(n=40)



+-------------+------+------+
|     AwayTeam|Season|Points|
+-------------+------+------+
|Bayern Munich|  2007|    35|
|   Schalke 04|  2007|    30|
|Werder Bremen|  2007|    27|
|    Wolfsburg|  2007|    27|
|      Hamburg|  2007|    22|
|   Leverkusen|  2007|    20|
|     Hannover|  2007|    20|
|    Karlsruhe|  2007|    19|
|Ein Frankfurt|  2007|    18|
|       Bochum|  2007|    17|
|     Duisburg|  2007|    17|
|     Dortmund|  2007|    14|
|    Stuttgart|  2007|    14|
|       Hertha|  2007|    14|
|Hansa Rostock|  2007|    11|
|      Cottbus|  2007|    10|
|    Bielefeld|  2007|     9|
|     Nurnberg|  2007|     9|
|Bayern Munich|  2008|    29|
|     Dortmund|  2008|    26|
|   Leverkusen|  2008|    26|
|    Stuttgart|  2008|    25|
|       Hertha|  2008|    24|
|   Hoffenheim|  2008|    23|
|      FC Koln|  2008|    22|
|      Hamburg|  2008|    20|
|   Schalke 04|  2008|    20|
|    Wolfsburg|  2008|    20|
|    Bielefeld|  2008|    14|
|Ein Frankfurt|  2008|    14|
|       Bo

In [8]:
# Union of all points earned at home and away.
# union() matches columns by position, so AwayTeam values land in the HomeTeam column.
dtTotalPoints = dtHomePoints.union(dtAwayPoints)

# Group by "HomeTeam" and "Season", aggregate sum of "Points" as "Points"
# Order by "Season" and "Points" in descending order
# Select columns "HomeTeam" as "Team", "Season", and "Points"

dtClassification = (
    dtTotalPoints
    .groupBy("HomeTeam", "Season")
    .agg(sum("Points").alias("Points"))
    .orderBy("Season", desc("Points"))
    .select(col("HomeTeam").alias("Team"), "Season", "Points")
)

# Display dtClassification table
dtClassification.show(n=40)

# Define a window partitioned by "Season" and ordered by "Points" in descending order
window = Window.partitionBy("Season").orderBy(col("Points").desc())

# Add a "row_number" column and keep the top-ranked team of each season (row_number == 1)
# Select columns "Team", "Season", and "Points", which also leaves "row_number" out

dtWinner = (
    dtClassification
    .withColumn("row_number", row_number().over(window))
    .filter(col("row_number") == 1)
    .select("Team", "Season", "Points")
)

# Display dtWinner table without truncating column values
dtWinner.display(truncate=False)


Team,Season,Points
Bayern Munich,2007,76
Wolfsburg,2008,69
Bayern Munich,2009,70
Dortmund,2010,75
Dortmund,2011,81
Bayern Munich,2012,91
Bayern Munich,2013,90
Bayern Munich,2014,79
Bayern Munich,2015,88
Bayern Munich,2016,82
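
The ranking and winner selection can also be sketched in plain Python to show what row_number does within each season. This is an illustration under stated assumptions, not the notebook's code: the rows, team names, and the `add_positions` helper are hypothetical. As with row_number, tied teams still receive distinct consecutive positions. In this sketch the tie order follows the input order because Python's sort is stable, whereas Spark gives no such guarantee unless the window has an additional ordering column.

```python
from itertools import groupby

# Hypothetical classification rows: (Team, Season, Points)
rows = [
    ("Team A", 2007, 76), ("Team B", 2007, 54), ("Team C", 2007, 54),
    ("Team D", 2008, 69), ("Team A", 2008, 67),
]


def add_positions(classification):
    """Number teams 1..n within each season, highest points first."""
    ordered = sorted(classification, key=lambda r: (r[1], -r[2]))
    ranked = []
    # groupby needs its input sorted by the grouping key (Season)
    for _, group in groupby(ordered, key=lambda r: r[1]):
        for position, (team, season, points) in enumerate(group, start=1):
            ranked.append((team, season, points, position))
    return ranked


ranked = add_positions(rows)

# The winner of each season is the row in position 1
winners = [(team, season, points) for team, season, points, pos in ranked if pos == 1]
print(winners)
```

If tied teams should share a position instead, PySpark also offers `rank` and `dense_rank` in `pyspark.sql.functions`; whether that suits this analysis depends on how ties are meant to be resolved.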


In [9]:
# Filter dtMatches for Div "D1", group by HomeTeam and Season, and calculate the sum of FTHG as "HSG"

dtScoredGoalsHome = (
    dtMatches
    .filter(col("Div") == "D1")
    .groupBy("HomeTeam", "Season")
    .agg(
        sum(col("FTHG")).alias("HSG"),
    )
    .select(
        col("HomeTeam").alias("Team"),  # Rename HomeTeam column to Team
        col("HSG"),
        col("Season")
    )
    .orderBy(desc("HSG"))  # Order by HSG in descending order
)

# Display dtScoredGoalsHome table
dtScoredGoalsHome.show(n=40)


+-------------+---+------+
|         Team|HSG|Season|
+-------------+---+------+
|Bayern Munich| 56|  2012|
|Bayern Munich| 55|  2016|
|    Wolfsburg| 51|  2008|
|Bayern Munich| 51|  2015|
|Werder Bremen| 50|  2005|
|Bayern Munich| 50|  1999|
|Bayern Munich| 49|  2011|
|     Dortmund| 49|  2015|
|Werder Bremen| 48|  2007|
|Bayern Munich| 48|  2010|
|Bayern Munich| 48|  2013|
|Bayern Munich| 48|  1998|
|   Schalke 04| 47|  2011|
|   Leverkusen| 47|  1996|
|   Leverkusen| 46|  2001|
|    Stuttgart| 46|  1996|
|Bayern Munich| 46|  2014|
|Bayern Munich| 45|  2008|
|     Dortmund| 45|  1995|
|Werder Bremen| 44|  1994|
|   Leverkusen| 44|  1999|
|Bayern Munich| 44|  1993|
|     Dortmund| 44|  2011|
|Bayern Munich| 44|  2004|
|   Hoffenheim| 43|  2013|
|Bayern Munich| 43|  2003|
|Werder Bremen| 43|  2008|
|   Leverkusen| 43|  1997|
|   Leverkusen| 43|  2003|
|Werder Bremen| 42|  2003|
|   Leverkusen| 42|  2004|
|   M'gladbach| 42|  2015|
|Bayern Munich| 42|  2001|
|Bayern Munich| 42|  2005|
|

In [10]:
# Filter dtMatches for Div equal to "D1", group by "AwayTeam" and "Season", aggregate sum of "FTAG" as "ASG"
# Select columns "AwayTeam" as "Team", "ASG", and "Season"
# Order by "ASG" in descending order

dtScoredGoalsAway = (
    dtMatches
    .filter(col("Div") == "D1")
    .groupBy("AwayTeam", "Season")
    .agg(
        sum(col("FTAG")).alias("ASG"),
    )
    .select(
        col("AwayTeam").alias("Team"),
        col("ASG"),
        col("Season")
    )
    .orderBy(desc("ASG"))
)

# Display dtScoredGoalsAway table
dtScoredGoalsAway.show(n=40)


+-------------+---+------+
|         Team|ASG|Season|
+-------------+---+------+
|Bayern Munich| 46|  2013|
|Werder Bremen| 43|  2006|
|Bayern Munich| 42|  2012|
|     Dortmund| 41|  2012|
|     Dortmund| 39|  2013|
|Werder Bremen| 37|  2003|
|Werder Bremen| 37|  2009|
|     Dortmund| 36|  2011|
|Werder Bremen| 35|  2004|
|Bayern Munich| 34|  2016|
|Bayern Munich| 34|  2014|
|   Leverkusen| 34|  2005|
|    Wolfsburg| 34|  2014|
|Bayern Munich| 33|  2010|
|   Hoffenheim| 33|  2008|
|Bayern Munich| 33|  2009|
|Bayern Munich| 33|  1997|
|     Dortmund| 33|  2015|
|Bayern Munich| 33|  2002|
|Werder Bremen| 33|  2016|
|     Dortmund| 32|  1994|
|    Stuttgart| 32|  1996|
|     Dortmund| 32|  2010|
|     Dortmund| 31|  2016|
|   Leverkusen| 31|  2001|
|   Leverkusen| 31|  2010|
|Bayern Munich| 31|  1996|
|    Stuttgart| 31|  2006|
|Bayern Munich| 31|  1995|
|   RB Leipzig| 31|  2016|
|   Leverkusen| 31|  1994|
|     Dortmund| 31|  2001|
|     Dortmund| 31|  1995|
|      Hamburg| 31|  2009|
|

In [11]:
# Filter dtScoredGoalsHome for Season not equal to 2017, select columns "Team", "HSG", "Season"
# Join with dtScoredGoalsAway on columns "Team" and "Season" using inner join
# Add a new column "TSG" (total scored goals) by summing "HSG" and "ASG"

dtScoredGoals = (
    dtScoredGoalsHome
    .filter(col("Season") != 2017)
    .select("Team", "HSG", "Season")
    .join(dtScoredGoalsAway, ["Team", "Season"], "inner")
    .withColumn("TSG", col("HSG") + col("ASG"))
)

# Display dtScoredGoals table
dtScoredGoals.show(n=40)


+--------------+------+---+---+---+
|          Team|Season|HSG|ASG|TSG|
+--------------+------+---+---+---+
|    M'gladbach|  1996| 34| 12| 46|
| Bayern Munich|  2011| 49| 28| 77|
| Ein Frankfurt|  1995| 29| 14| 43|
|    M'gladbach|  2013| 38| 21| 59|
|       FC Koln|  1994| 31| 23| 54|
| Werder Bremen|  2003| 42| 37| 79|
|      Augsburg|  2015| 18| 24| 42|
|    M'gladbach|  1994| 37| 29| 66|
|    M'gladbach|  2012| 27| 18| 45|
|     Bielefeld|  2007| 21| 14| 35|
| Bayern Munich|  1994| 32| 23| 55|
|       Dresden|  1994| 17| 16| 33|
|    Leverkusen|  2001| 46| 31| 77|
| Ein Frankfurt|  2003| 25| 11| 36|
| Werder Bremen|  2007| 48| 27| 75|
|Kaiserslautern|  1994| 32| 26| 58|
| Ein Frankfurt|  1998| 26| 18| 44|
| Werder Bremen|  2011| 31| 18| 49|
|      Freiburg|  2014| 21| 15| 36|
|       Hamburg|  2007| 30| 17| 47|
| Werder Bremen|  1994| 44| 26| 70|
|    Schalke 04|  1995| 28| 17| 45|
|      Freiburg|  2011| 24| 21| 45|
|       FC Koln|  1997| 34| 15| 49|
|       FC Koln|  2016| 29| 

In [12]:
# Filter dtMatches for Div "D1", group by HomeTeam and Season, and calculate the sum of FTAG as "HCG"
# (goals scored by the away side are goals conceded by the home team)
dtConcededGoalsHome = (
    dtMatches
    .filter(col("Div") == "D1")
    .groupBy("HomeTeam", "Season")
    .agg(
        sum(col("FTAG")).alias("HCG"),
    )
    .select(
        col("HomeTeam").alias("Team"),  # Rename HomeTeam column to Team
        col("HCG"),
        col("Season")
    )
    .orderBy(desc("HCG"))  # Order by HCG in descending order
)

# Display dtConcededGoalsHome table
dtConcededGoalsHome.show(n=40)

+--------------+---+------+
|          Team|HCG|Season|
+--------------+---+------+
|     Wolfsburg| 39|  2009|
|        Aachen| 37|  2006|
|Greuther Furth| 36|  2012|
|      St Pauli| 35|  2010|
|        Bochum| 35|  2009|
|    M'gladbach| 35|  1998|
|       Hamburg| 34|  2013|
|Kaiserslautern| 33|  2005|
|      Hannover| 33|  2009|
|   Munich 1860| 33|  2001|
|      Hannover| 33|  2002|
|      Nurnberg| 32|  2013|
|     Stuttgart| 32|  2015|
|     Uerdingen| 32|  1995|
|    Hoffenheim| 31|  2013|
|      Freiburg| 31|  2004|
|    M'gladbach| 31|  2010|
| Hansa Rostock| 31|  2004|
|     Paderborn| 31|  2014|
|       FC Koln| 30|  1996|
| Ein Frankfurt| 30|  2006|
|    Ingolstadt| 30|  2016|
|      Freiburg| 30|  2013|
|    Hoffenheim| 30|  2012|
|       FC Koln| 30|  1997|
| Werder Bremen| 30|  2013|
| Werder Bremen| 30|  2015|
|      Hannover| 30|  2015|
|        Bochum| 30|  2006|
| Werder Bremen| 30|  2012|
|     Wolfsburg| 30|  2012|
|       Cottbus| 30|  2002|
|        Hertha| 29|

In [13]:
# Filter dtMatches for Div equal to "D1", group by "AwayTeam" and "Season", aggregate sum of "FTHG" as "ACG"
# (goals scored by the home side are goals conceded by the away team)
# Select columns "AwayTeam" as "Team", "ACG", and "Season"
# Order by "ACG" in descending order

dtConcededGoalsAway = (
    dtMatches
    .filter(col("Div") == "D1")
    .groupBy("AwayTeam", "Season")
    .agg(
        sum(col("FTHG")).alias("ACG"),
    )
    .select(
        col("AwayTeam").alias("Team"),
        col("ACG"),
        col("Season")
    )
    .orderBy(desc("ACG"))
)

# Display dtConcededGoalsAway table
dtConcededGoalsAway.show(n=40)

+--------------+---+------+
|          Team|ACG|Season|
+--------------+---+------+
| Ein Frankfurt| 46|  2000|
|       FC Koln| 46|  2011|
|      Duisburg| 46|  1999|
|      Freiburg| 44|  2004|
|      Hannover| 44|  2008|
|        Bochum| 44|  1994|
|    M'gladbach| 44|  1998|
|  Wattenscheid| 43|  1993|
|     Stuttgart| 43|  2015|
|Kaiserslautern| 43|  2003|
|      Freiburg| 43|  1996|
|      St Pauli| 42|  2001|
|       Dresden| 42|  1994|
|      Freiburg| 42|  2003|
|       FC Koln| 42|  2005|
|     Bielefeld| 42|  2007|
|       Hamburg| 41|  2013|
|       FC Koln| 41|  2010|
| Werder Bremen| 41|  2014|
|      Freiburg| 41|  2011|
|      St Pauli| 41|  1996|
|    RB Leipzig| 41|  1993|
| Ein Frankfurt| 40|  1995|
|      Duisburg| 40|  2005|
|       FC Koln| 40|  2001|
|     Stuttgart| 40|  1994|
| Werder Bremen| 40|  2016|
|     Wolfsburg| 39|  2005|
|  Unterhaching| 39|  2000|
|        Bochum| 39|  2000|
|    Hoffenheim| 39|  2013|
|      Hannover| 39|  2012|
|        Bochum| 39|

In [14]:
dtConcededGoals = (
    dtConcededGoalsHome
    .filter(col("Season") != 2017)
    .join(dtConcededGoalsAway, ["Team", "Season"], "inner")
    .withColumn("TCG", col("HCG") + col("ACG"))
    .select("Team", "HCG", "ACG","TCG","Season")
)
# Display dtConcededGoals table
dtConcededGoals.show(n=40)

+--------------+---+---+---+------+
|          Team|HCG|ACG|TCG|Season|
+--------------+---+---+---+------+
|    M'gladbach| 17| 31| 48|  1996|
| Bayern Munich|  6| 16| 22|  2011|
| Ein Frankfurt| 28| 40| 68|  1995|
|    M'gladbach| 17| 26| 43|  2013|
|       FC Koln| 28| 26| 54|  1994|
| Werder Bremen| 21| 17| 38|  2003|
|      Augsburg| 27| 25| 52|  2015|
|    M'gladbach| 16| 25| 41|  1994|
|    M'gladbach| 20| 29| 49|  2012|
|     Bielefeld| 18| 42| 60|  2007|
|       Dresden| 26| 42| 68|  1994|
| Bayern Munich| 19| 22| 41|  1994|
|    Leverkusen| 13| 25| 38|  2001|
| Ein Frankfurt| 24| 29| 53|  2003|
| Werder Bremen| 19| 26| 45|  2007|
|Kaiserslautern| 16| 25| 41|  1994|
| Ein Frankfurt| 21| 33| 54|  1998|
| Werder Bremen| 23| 35| 58|  2011|
|      Freiburg| 22| 25| 47|  2014|
|       Hamburg| 11| 15| 26|  2007|
| Werder Bremen| 16| 23| 39|  1994|
|    Schalke 04| 16| 20| 36|  1995|
|      Freiburg| 20| 41| 61|  2011|
|       FC Koln| 30| 34| 64|  1997|
|       FC Koln| 17| 25| 42|

In [15]:
dtGdGoals = (
    dtScoredGoals
    .filter(col("Season") != 2017)
    .join(dtConcededGoals, ["Team", "Season"], "inner")
    .withColumn("GD", col("TSG") - col("TCG"))
    .select("Team","GD","Season")
)
# Display dtGdGoals table
dtGdGoals.show(n=40)

+--------------+---+------+
|          Team| GD|Season|
+--------------+---+------+
|    M'gladbach| -2|  1996|
| Bayern Munich| 55|  2011|
| Ein Frankfurt|-25|  1995|
|    M'gladbach| 16|  2013|
|       FC Koln|  0|  1994|
| Werder Bremen| 41|  2003|
|      Augsburg|-10|  2015|
|    M'gladbach| 25|  1994|
|    M'gladbach| -4|  2012|
|     Bielefeld|-25|  2007|
| Bayern Munich| 14|  1994|
|       Dresden|-35|  1994|
|    Leverkusen| 39|  2001|
| Ein Frankfurt|-17|  2003|
| Werder Bremen| 30|  2007|
|Kaiserslautern| 17|  1994|
| Ein Frankfurt|-10|  1998|
| Werder Bremen| -9|  2011|
|      Freiburg|-11|  2014|
|       Hamburg| 21|  2007|
| Werder Bremen| 31|  1994|
|    Schalke 04|  9|  1995|
|      Freiburg|-16|  2011|
|       FC Koln|-15|  1997|
|       FC Koln|  9|  2016|
|  Wattenscheid|-22|  1993|
|      Hannover|-14|  2003|
|      Nurnberg|-33|  2013|
| Bayern Munich| 34|  1996|
|      Dortmund| 24|  2002|
|    Hoffenheim|  2|  2013|
|    Leverkusen| 19|  2013|
| Bayern Munich| 67|
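
The scored, conceded, and goal-difference columns built in the cells above can be summarized in one plain-Python pass over the matches. This is only a sketch under stated assumptions: the sample rows, team names, and the `goal_table` helper are hypothetical. Unlike the inner joins in the notebook, it keeps a team even if it has only home or only away matches in a season, counting the missing side as zero.

```python
from collections import defaultdict

# Hypothetical sample rows: (Season, HomeTeam, AwayTeam, FTHG, FTAG)
matches = [
    (2007, "Team A", "Team B", 2, 1),
    (2007, "Team B", "Team A", 0, 0),
    (2007, "Team A", "Team C", 1, 3),
    (2007, "Team C", "Team A", 2, 2),
]


def goal_table(rows):
    """Accumulate HSG, ASG, HCG, ACG per (team, season) and derive TSG, TCG, GD."""
    stats = defaultdict(lambda: {"HSG": 0, "ASG": 0, "HCG": 0, "ACG": 0})
    for season, home, away, fthg, ftag in rows:
        stats[(home, season)]["HSG"] += fthg  # home team scored FTHG
        stats[(home, season)]["HCG"] += ftag  # home team conceded FTAG
        stats[(away, season)]["ASG"] += ftag  # away team scored FTAG
        stats[(away, season)]["ACG"] += fthg  # away team conceded FTHG
    for s in stats.values():
        s["TSG"] = s["HSG"] + s["ASG"]
        s["TCG"] = s["HCG"] + s["ACG"]
        s["GD"] = s["TSG"] - s["TCG"]
    return dict(stats)


goals = goal_table(matches)
print(goals[("Team A", 2007)])
```

One property worth checking on real data: every goal scored by one team is conceded by another, so the GD values of all teams within a season should sum to zero.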

In [16]:
# Define a window partitioned by "Season" and ordered by "Points" in descending order
window = Window.partitionBy("Season").orderBy(col("Points").desc())

# Add the "Position" column to dtClassification
dtClassification = dtClassification.withColumn("Position", row_number().over(window))

# Join the classification with the scored, conceded, and goal-difference tables on Team and Season.
# Starting from dtClassification keeps every team, not only each season's winner.
# Column names are unique after the joins, so no table aliases are needed.
dtFinalTable = (
    dtClassification
    .join(dtScoredGoals, ["Team", "Season"])
    .join(dtConcededGoals, ["Team", "Season"])
    .join(dtGdGoals, ["Team", "Season"])
    .orderBy("Season", "Position")
    .select("Season", "Position", "Team", "Points",
            "HSG", "ASG", "TSG",
            "HCG", "ACG", "TCG",
            "GD")
)


# # Selects all columns from dtFinalTable except 'Position'
# columns = [c for c in dtFinalTable.columns if c != 'Position']


# # Reorders the columns with 'Position' as the first column
# dtFinalTable = dtFinalTable.select(col('Position'), *columns)

# columns = [c for c in dtFinalTable.columns if c != 'Points']

# dtFinalTable = dtFinalTable.select(*columns,col('Points'))


# Displays the final table

legend()
dtFinalTable.show(n=40)


HSG = Home Scored Goals
ASG = Away Scored Goals
TSG = Total Scored Goals
HCG = Home Conceded Goals
ACG = Away Conceded Goals
TCG = Total Conceded Goals
GD = Difference of Goals
+------+--------+-------------+------+---+---+---+---+---+---+---+
|Season|Position|         Team|Points|HSG|ASG|TSG|HCG|ACG|TCG| GD|
+------+--------+-------------+------+---+---+---+---+---+---+---+
|  2007|       1|Bayern Munich|    76| 41| 27| 68|  8| 13| 21| 47|
|  2007|       2|Werder Bremen|    66| 48| 27| 75| 19| 26| 45| 30|
|  2007|       3|   Schalke 04|    64| 29| 26| 55| 13| 19| 32| 23|
|  2007|       4|      Hamburg|    54| 30| 17| 47| 11| 15| 26| 21|
|  2007|       5|    Wolfsburg|    54| 28| 30| 58| 17| 29| 46| 12|
|  2007|       6|    Stuttgart|    52| 39| 18| 57| 19| 38| 57|  0|
|  2007|       7|   Leverkusen|    51| 32| 25| 57| 13| 27| 40| 17|
|  2007|       8|     Hannover|    49| 32| 22| 54| 27| 29| 56| -2|
|  2007|       9|Ein Frankfurt|    46| 24| 19| 43| 24| 26| 50| -7|
|  2007|      10|  