### Prepping Data Challenge: The Super League (week 16)
This week we want to build the current league table from the season's match results, and then see how the table would change if the 'Big 6' clubs had not played any games. We are going to be using calculations, pivots and aggregations, so lots of the fundamental techniques that are used within data prep!

#### Requirements:
 - Input the data
 - Calculate the Total Points for each team. The points are as follows: 
   - Win - 3 Points
   - Draw - 1 Point
   - Loss - 0 Points
 - Calculate the goal difference for each team. Goal difference is the difference between goals scored and goals conceded. 
 - Calculate the current rank/position of each team. This is based on Total Points (high to low) and, in the case of a tie, Goal Difference (high to low).
 - The current league table is our first output.

 - Assuming that the 'Big 6' didn't play any games this season, recalculate the league table.
 - After removing the 6 clubs, how has the position changed for the remaining clubs?
 - The updated league table is the second output.
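
As a small worked illustration of the scoring rules above, the sketch below applies them to three made-up teams. The team names and scores are hypothetical and do not come from the input file; the three teams are deliberately tied on points so that goal difference decides the order.

```python
import pandas as pd

# Hypothetical per-team match rows (not taken from WK16-Input.csv):
# A beat B 1-0, B beat C 3-0, C beat A 2-1
games = pd.DataFrame({
    'Team':           ['A', 'A', 'B', 'B', 'C', 'C'],
    'Goals Scored':   [1, 1, 0, 3, 0, 2],
    'Goals Conceded': [0, 2, 1, 0, 3, 1],
})

diff = games['Goals Scored'] - games['Goals Conceded']
games['Goal Difference'] = diff
# Win = 3 points, draw = 1 point, loss = 0 points
games['Total Points'] = diff.gt(0) * 3 + diff.eq(0) * 1

table = games.groupby('Team')[['Total Points', 'Goal Difference']].sum()
# Order by points first, then goal difference, both high to low
table = table.sort_values(['Total Points', 'Goal Difference'], ascending=False)
# Simple 1..n numbering; this toy data has no exact ties on both columns
table['Position'] = range(1, len(table) + 1)
print(table)
```

Every team here finishes on 3 points, so the order is decided by goal difference alone.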

### Input the data

In [1]:
import pandas as pd
import numpy as np

In [2]:
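# dates in the file are day-first (dd/mm/yyyy), so parse them that way;
# without dayfirst=True, 12/09/2020 is read as 9 December instead of 12 September
df = pd.read_csv('WK16-Input.csv', parse_dates=['Date'], dayfirst=True)\
       .dropna(subset=['Result'])

In [3]:
df.head()

| | Round Number | Date | Location | Home Team | Away Team | Result |
|---|---|---|---|---|---|---|
| 0 | 1 | 2020-09-12 12:30:00 | Craven Cottage | Fulham | Arsenal | 0 - 3 |
| 1 | 1 | 2020-09-12 15:00:00 | Selhurst Park | Crystal Palace | Southampton | 1 - 0 |
| 2 | 1 | 2020-09-12 17:30:00 | Anfield | Liverpool | Leeds | 4 - 3 |
| 3 | 1 | 2020-09-12 20:00:00 | London Stadium | West Ham | Newcastle | 0 - 2 |
| 4 | 1 | 2020-09-13 14:00:00 | The Hawthorns | West Brom | Leicester | 0 - 3 |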


In [4]:
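# identify games involving the big 6
# the team names in the data can be listed with:
# sorted(pd.concat([df['Home Team'], df['Away Team']]).unique())
# 'Spurs' is assumed to be the spelling used in the input; confirm with the line above
big6 = ['Arsenal', 'Chelsea', 'Liverpool', 'Man Utd', 'Man City', 'Spurs']
df['big6'] = np.where(df['Home Team'].isin(big6) | df['Away Team'].isin(big6), 1, 0)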
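
Because `isin` silently ignores names that never occur, a misspelled club in the list would go unnoticed. A minimal sketch of a check for that, assuming the same `Home Team` and `Away Team` columns; the helper name `missing_teams` and the sample fixtures are hypothetical:

```python
import pandas as pd

def missing_teams(frame, names):
    """Return the names that appear in neither 'Home Team' nor 'Away Team'."""
    seen = set(frame['Home Team']) | set(frame['Away Team'])
    return sorted(set(names) - seen)

# Hypothetical fixtures for illustration only
sample = pd.DataFrame({'Home Team': ['Fulham', 'Liverpool'],
                       'Away Team': ['Arsenal', 'Leeds']})

unmatched = missing_teams(sample, ['Arsenal', 'Liverpool', 'Chelsea'])
print(unmatched)
```

On the real data, an empty list from `missing_teams(df, big6)` would suggest that every name in `big6` matches at least one fixture.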

In [5]:
# split the Result into home and away goals
df[['Home Goal','Away Goal']] = df['Result'].str.split('-', expand=True).astype(int)

In [6]:
df['outcome'] = np.where(df['Away Goal'] == df['Home Goal'], 'Draw',
                 np.where(df['Away Goal'] > df['Home Goal'], 'Away', 'Home'))

In [7]:
# unpivot the Home Team / Away Team columns so each game has one row per team
value_vars = ['Home Team', 'Away Team']
df = df.melt(id_vars=[c for c in df.columns if c not in value_vars], value_name='Team')
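
To make the effect of the `melt` step concrete, here is a small sketch on a single made-up fixture (the frame below is hypothetical and only mirrors the column names used in this notebook). One match row becomes two rows, one per team, with the `variable` column recording whether the team was home or away.

```python
import pandas as pd

# Hypothetical single fixture for illustration
match = pd.DataFrame({'Home Team': ['Fulham'], 'Away Team': ['Arsenal'],
                      'Home Goal': [0], 'Away Goal': [3]})

long_form = match.melt(id_vars=['Home Goal', 'Away Goal'],
                       value_vars=['Home Team', 'Away Team'],
                       value_name='Team')
print(long_form)
```

The goal columns are repeated on both rows, which is why the later cells use `variable` to decide which side's goals count as scored and which as conceded.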

### Calculate the Total Points and goal difference for each team

Goal difference is the difference between goals scored and goals conceded.

In [8]:
# calculate the goals scored or conceded and total points for each team
df['Goals Scored'] = np.where(df['variable'] == 'Home Team', df['Home Goal'],
                             df['Away Goal'])

df['Goals Conceded'] = np.where(df['variable'] == 'Home Team', df['Away Goal'],
                               df['Home Goal'])

df['Total Points'] = np.where(df['Goals Scored'] == df['Goals Conceded'], 1,
                         np.where(df['Goals Scored'] > df['Goals Conceded'], 3, 0))

In [9]:
df['Goal Difference'] = df['Goals Scored'] - df['Goals Conceded']

In [10]:
df['Total Games Played'] = 1
Total_cols = ['Goal Difference', 'Total Games Played','Total Points']

ranks = df.groupby('Team')[Total_cols].sum().reset_index()

### Calculate the current rank/position of each team

This is based on Total Points (high to low) and, in the case of a tie, Goal Difference (high to low).

In [11]:
cols = ['Total Points','Goal Difference']

ranks['Position'] = ranks[cols].apply(tuple, axis=1)\
                                   .rank(method='min', ascending=False).astype(int)
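
An alternative that avoids ranking a Series of tuples is to sort on both columns and number the rows, letting rows that tie on every key share the position of the first row in the tie (the same behaviour as `method='min'`). This is only a sketch: the helper name `add_position` and the sample table are hypothetical.

```python
import pandas as pd

def add_position(table, by=('Total Points', 'Goal Difference')):
    """Sort high to low on `by` and add a 'Position' column (ties share the lowest number)."""
    keys = list(by)
    out = table.sort_values(keys, ascending=False).reset_index(drop=True)
    # True for the first row of each distinct (points, goal difference) pair
    first_in_tie = ~out.duplicated(subset=keys)
    row_number = pd.Series(range(1, len(out) + 1), index=out.index)
    # Later rows in a tie inherit the row number of the first row in that tie
    out['Position'] = row_number.where(first_in_tie).ffill().astype(int)
    return out

# Hypothetical table for illustration only
sample = pd.DataFrame({
    'Team': ['A', 'B', 'C', 'D'],
    'Total Points': [3, 3, 3, 1],
    'Goal Difference': [0, 2, 2, -4],
})
ranked = add_position(sample)
print(ranked)
```

Note that this returns the table sorted by position, whereas the cell above keeps the original alphabetical order.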

### Recalculate the league table without the 'Big 6'

 - Assuming that the 'Big 6' didn't play any games this season, recalculate the league table.
 - After removing the 6 clubs, how has the position changed for the remaining clubs?
 - The updated league table is the second output.

In [12]:
ranks2 = df[df['big6'] == 0].groupby('Team')[Total_cols].sum().reset_index()

In [13]:
ranks2['Position'] = ranks2[cols].apply(tuple, axis=1)\
                                   .rank(method='min', ascending=False).astype(int)

In [14]:
# find the change in position: original position minus new position,
# so a positive value means the team has moved up the table
ranks2 = pd.merge(ranks2, ranks[['Team', 'Position']], on='Team', how='left', suffixes=('', '_1'))
ranks2['Position Change'] = ranks2['Position_1'] - ranks2['Position']
ranks2 = ranks2.drop(columns=['Position_1'])
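
The sign convention of `Position Change` can be checked on a toy case. In the sketch below (team names and positions are hypothetical), the old position is merged onto the new table and subtracted the same way as above.

```python
import pandas as pd

# Hypothetical positions for illustration only
before = pd.DataFrame({'Team': ['X', 'Y', 'Z'], 'Position': [4, 7, 9]})
after = pd.DataFrame({'Team': ['X', 'Y', 'Z'], 'Position': [1, 2, 3]})

merged = after.merge(before, on='Team', how='left', suffixes=('', '_before'))
# Old position minus new position: positive means the team climbed
merged['Position Change'] = merged['Position_before'] - merged['Position']
print(merged[['Team', 'Position', 'Position Change']])
```

Team X moving from 4th to 1st gives a change of 3, so positive numbers read as places gained.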

In [15]:
ranks.head()

| | Team | Goal Difference | Total Games Played | Total Points | Position |
|---|---|---|---|---|---|
| 0 | Arsenal | 8 | 32 | 46 | 9 |
| 1 | Aston Villa | 10 | 30 | 44 | 11 |
| 2 | Brighton | -5 | 31 | 33 | 16 |
| 3 | Burnley | -19 | 32 | 33 | 17 |
| 4 | Chelsea | 19 | 31 | 54 | 5 |


In [16]:
ranks2.head()

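| | Team | Goal Difference | Total Games Played | Total Points | Position | Position Change |
|---|---|---|---|---|---|---|
| 0 | Aston Villa | 5 | 23 | 34 | 6 | 5 |
| 1 | Brighton | 0 | 24 | 29 | 11 | 5 |
| 2 | Burnley | -6 | 23 | 26 | 12 | 5 |
| 3 | Crystal Palace | -3 | 24 | 33 | 8 | 5 |
| 4 | Everton | 5 | 23 | 38 | 5 | 3 |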


### Output the Data

In [17]:
ranks.to_csv('WK16-output1.csv', index=False,
               columns=['Position', 'Team', 'Total Games Played', 'Total Points', 'Goal Difference'])
ranks2.to_csv('WK16-output2.csv', index=False,
               columns=['Position Change', 'Position', 'Team', 'Total Games Played',
                        'Total Points', 'Goal Difference'])