### Prepping Data Challenge: House of Games Winners (Week 14)

Each day, the player with the highest score receives 4 points, 2nd place receives 3 points, 3rd place receives 2 points and last place receives 1 point. These points are added up across the week to determine the overall winner, but with a twist! Each Friday, double points are awarded, so 1st place receives 8 points and so on. This leads me to wonder:

- Would there be a different winner if there was no double points Friday?
- What about if participants weren't ranked at the end of each day and had a running total score across the week instead? Would that lead to a different winner?
- What about doubling the scores on the Friday, instead of the points awarded?
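
The scoring scheme described above can be sketched on a toy week. The daily scores and player labels below are hypothetical (they are not taken from the input file), and the sketch assumes there are no ties on any day, since the tie-breaking rule is not stated here.

```python
import pandas as pd

# Hypothetical daily scores for four players (illustration only).
scores = pd.DataFrame(
    {
        "M": [10, 8, 6, 4],
        "Tu": [5, 9, 7, 3],
        "W": [6, 4, 8, 2],
        "Th": [7, 5, 3, 9],
        "F": [2, 4, 6, 8],
    },
    index=["A", "B", "C", "D"],
)

# Rank each day separately (1 = highest score), then turn ranks into points:
# 1st -> 4, 2nd -> 3, 3rd -> 2, 4th -> 1.
daily_rank = scores.rank(method="min", ascending=False).astype(int)
daily_points = 5 - daily_rank

# Weekly points without the twist, then with Friday counted twice.
points_single_friday = daily_points.sum(axis=1)
weekly_points = points_single_friday + daily_points["F"]

print(weekly_points)
```

In this made-up week, player C wins on points only because of the Friday doubling; without it, A, B and C are level.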

### Requirements
- Input the data
- Only keep relevant fields and rename certain fields to remove duplication
  - Ser. becomes Series
  - Wk. becomes Week
  - T becomes Tu
  - T 1 becomes Th
  - Total becomes Score
  - Week becomes Points
  - Week 1 becomes Rank
- Filter the data to remove Series that have a null value, or that start with an 'N'
- Calculate the Points without double points Friday
  - Rank the players based on this new field
  - Create a field to determine if there has been a change in winner for that particular Series and Week
- Rank the players based on their Score instead
  - Create a field to determine if there has been a change in winner for that particular Series and Week
- Calculate the Score if the score on Friday was doubled (instead of the Points)
  - Rank the players based on this new field
  - Create a field to determine if there has been a change in winner for that particular Series and Week
- Remove unnecessary fields
- Output the data

In [1]:
import pandas as pd

In [2]:
# Input the data.
df = pd.read_csv('WK14-Input.csv')

In [3]:
df.columns

Index(['Player', 'Ser.', 'Wk.', 'Seat', 'M', 'T', 'W', 'T.1', 'F', 'Total',
       'Avg', 'Rate*', 'M.1', 'T.2', 'W.1', 'T.3', 'F.1', '1st', '2nd', '3rd',
       '4th', 'Week', 'Week.1',
       '*Scoring Rate = % of Total Daily Points Scored Across Week',
       'Unnamed: 24', 'Unnamed: 25', 'Unnamed: 26', 'Ser..1', 'Wk..1',
       '4-Player Total'],
      dtype='object')

In [4]:
# Drop the fields that are not needed for the analysis.
df = df.drop(["*Scoring Rate = % of Total Daily Points Scored Across Week",
              'Avg', 'Rate*', 'M.1', 'T.2', 'W.1', 'T.3', '1st', '2nd', '3rd',
              '4th', 'Ser..1', 'Wk..1', '4-Player Total', 'Seat', 'Unnamed: 24', 'Unnamed: 25', 'Unnamed: 26'], axis=1)

In [5]:
# Rename certain fields to remove duplication. The mapping is applied in one pass,
# so 'Wk.' -> 'Week' and the original 'Week' -> 'Points' do not collide.
df = df.rename(columns={'Ser.': 'Series', 'Wk.': 'Week', 'T': 'Tu', 'T.1': 'Th', 'F': 'Fri', 'Total': 'Score',
                        'F.1': 'Fri_rank', 'Week': 'Points', 'Week.1': 'Original Rank'})

In [6]:
df.head()

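| | Player | Series | Week | M | Tu | W | Th | Fri | Score | Fri_rank | Points | Original Rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Angela Barnes | 4 | 6.0 | 15.0 | 14.0 | 14.0 | 20.0 | 19.0 | 82.0 | 1st | 24.0 | 1st |
| 1 | Ed Gamble | 3 | 1.0 | 16.0 | 11.0 | 10.0 | 14.0 | 12.0 | 63.0 | 1st | 23.0 | 1st |
| 2 | Simon Hickson | 5 | 16.0 | 12.0 | 15.0 | 6.0 | 14.0 | 15.0 | 62.0 | 1st | 23.0 | 1st |
| 3 | Steve Pemberton | 2 | 2.0 | 12.0 | 11.0 | 15.0 | 14.0 | 8.0 | 60.0 | 2nd | 22.0 | 1st |
| 4 | Shaun Williamson | 4 | 20.0 | 14.0 | 13.0 | 10.0 | 11.0 | 11.0 | 59.0 | 1st | 23.0 | 1st |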


In [7]:
#Filter the data to remove Series that have a null value, or are preceded by an 'N'
df = df.dropna(subset=['Series'])
df = df[~df['Series'].str.startswith('N')]
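
The order of the two filter steps matters: `str.startswith` returns a missing value for a null Series, and negating that with `~` fails, so the nulls are dropped first. The sketch below shows this on a hypothetical frame (the `'N1'` value is an assumption about what an 'N' series looks like) and, as an alternative, a single-step filter using the `na` argument of `str.startswith`.

```python
import pandas as pd

# Hypothetical rows: one null Series and one Series starting with 'N'.
toy = pd.DataFrame({"Player": ["P1", "P2", "P3", "P4"],
                    "Series": ["4", None, "N1", "3"]})

# Two-step version, as in the cell above: drop nulls, then drop 'N' series.
two_step = toy.dropna(subset=["Series"])
two_step = two_step[~two_step["Series"].str.startswith("N")]

# One-step alternative: treat nulls as matches so the negation removes them too.
one_step = toy[~toy["Series"].str.startswith("N", na=True)]

print(two_step["Player"].tolist())
print(one_step["Player"].tolist())
```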

In [8]:
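# Convert rank labels such as '1st' to integers by taking the first character,
# and cast Week from float to int.
df['Original Rank'] = df['Original Rank'].str[0].astype(int)
df['Fri_rank'] = df['Fri_rank'].str[0].astype(int)
df['Week'] = df['Week'].astype(int)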
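
Taking the first character works here because there are only four places, so every rank is a single digit. As a hedged alternative for data with ranks of 10 or more, the digits could be pulled out with a regular expression instead; the labels below are illustrative only.

```python
import pandas as pd

labels = pd.Series(["1st", "2nd", "3rd", "4th"])

# Approach used in the cell above: first character only.
by_first_char = labels.str[0].astype(int)

# Alternative sketch: extract all leading digits, which also handles e.g. '10th'.
by_regex = labels.str.extract(r"(\d+)", expand=False).astype(int)

print(by_first_char.tolist(), by_regex.tolist())
```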

In [9]:
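# Calculate the Points without double points Friday.
# Friday's doubled points are 2 * (5 - Fri_rank), so subtracting (5 - Fri_rank) leaves single points.
df['Points without double points Friday'] = df['Points'] - (5 - df['Fri_rank'])
# Calculate the Score if the Friday score was doubled (ranked in the next cell).
df['Score if double score Friday'] = df['Score'] + df['Fri']
# Rank the players on the new Points field within each Series and Week.
df['Rank without double points Friday'] = df.groupby(['Week','Series'])['Points without double points Friday'].rank(method='min',ascending=False).astype(int)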
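
The subtraction in this cell can be checked by hand. A player in Friday place `r` earns `5 - r` points on a normal day and `2 * (5 - r)` on a double-points Friday, so removing the doubling means subtracting `5 - r` once. The helper name below is hypothetical; the two checks use rows from the `df.head()` output shown earlier.

```python
def points_without_double_friday(points, fri_rank):
    """Remove the extra Friday points: the doubled amount minus the single amount."""
    friday_single = 5 - fri_rank  # 1st -> 4, 2nd -> 3, 3rd -> 2, 4th -> 1
    return points - friday_single

# Steve Pemberton: 22 points, 2nd on Friday.
print(points_without_double_friday(22.0, 2))
# Angela Barnes: 24 points, 1st on Friday.
print(points_without_double_friday(24.0, 1))
```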

In [10]:
#Rank the players based on their Score instead
df['Rank based on Score'] = df.groupby(['Week','Series'])['Score'].rank(method='min',ascending=False).astype(int)
#Rank the players on the doubled-Friday Score (calculated in the previous cell)
df['Rank if Double Score Friday'] = df.groupby(['Week','Series'])['Score if double score Friday'].rank(method='min',ascending=False).astype(int)
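
To see what `rank(method='min', ascending=False)` does within each group, here is a small sketch on hypothetical scores. With `method='min'`, tied players share the best available rank and the next rank is skipped, so a week can have more than one player ranked 1.

```python
import pandas as pd

# Two hypothetical Series/Week groups; the first has a tie for the top score.
toy = pd.DataFrame({
    "Series": ["1", "1", "1", "1", "2", "2", "2", "2"],
    "Week": [1, 1, 1, 1, 1, 1, 1, 1],
    "Score": [50, 60, 60, 40, 30, 20, 10, 25],
})

# Rank within each Week and Series, highest score first.
toy["Rank"] = (
    toy.groupby(["Week", "Series"])["Score"]
    .rank(method="min", ascending=False)
    .astype(int)
)

print(toy)
```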

In [11]:
#Create a field to determine if there has been a change in winner for that particular Series and Week
df['Change in winner with no double points Friday?'] = df.apply(lambda x : x['Original Rank'] == 1 and
                                                                x['Rank without double points Friday'] > 1, axis=1)\
                                                         .groupby([df['Week'],df['Series']]).transform('max')

In [12]:
df['Change in winner based on Score?'] = df.apply(lambda x : x['Original Rank'] == 1 and
                                                                x['Rank based on Score'] > 1, axis=1)\
                                                         .groupby([df['Week'],df['Series']]).transform('max')

In [13]:
df['Change in winner if Double Score Friday?'] = df.apply(lambda x : x['Original Rank'] == 1 and
                                                                x['Rank if Double Score Friday'] > 1, axis=1)\
                                                         .groupby([df['Week'],df['Series']]).transform('max')
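
The three change-in-winner cells follow the same pattern: flag any original winner who is no longer ranked 1, then spread that flag to every row in the same Series and Week. A minimal sketch of that pattern on hypothetical data, using a boolean mask in place of `apply` (the column names `New Rank` and `Changed?` are made up for the example):

```python
import pandas as pd

# Hypothetical ranks: the winner changes in Series '1' but not in Series '2'.
toy = pd.DataFrame({
    "Series": ["1", "1", "1", "1", "2", "2", "2", "2"],
    "Week": [1, 1, 1, 1, 1, 1, 1, 1],
    "Original Rank": [1, 2, 3, 4, 1, 2, 3, 4],
    "New Rank": [2, 1, 3, 4, 1, 2, 3, 4],
})

# True only on rows where an original winner has dropped below first place.
lost_top = (toy["Original Rank"] == 1) & (toy["New Rank"] > 1)

# 'max' over booleans acts as "any", so the whole group is flagged.
toy["Changed?"] = lost_top.groupby([toy["Week"], toy["Series"]]).transform("max")

print(toy)
```

Note that this test only detects an original winner losing first place. If another player draws level so that both are ranked 1, the flag stays False.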

In [14]:
df = df[['Series', 'Week', 'Player', 'Original Rank', 'Rank without double points Friday',
    'Rank based on Score', 'Change in winner based on Score?', 'Rank if Double Score Friday',
    'Change in winner if Double Score Friday?', 'Points', 'Score', 'Points without double points Friday',
    'Score if double score Friday']]

In [15]:
df.head()

| | Series | Week | Player | Original Rank | Rank without double points Friday | Rank based on Score | Change in winner based on Score? | Rank if Double Score Friday | Change in winner if Double Score Friday? | Points | Score | Points without double points Friday | Score if double score Friday |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4 | 6 | Angela Barnes | 1 | 1 | 1 | False | 1 | False | 24.0 | 82.0 | 20.0 | 101.0 |
| 1 | 3 | 1 | Ed Gamble | 1 | 1 | 1 | False | 1 | False | 23.0 | 63.0 | 19.0 | 75.0 |
| 2 | 5 | 16 | Simon Hickson | 1 | 1 | 1 | False | 1 | False | 23.0 | 62.0 | 19.0 | 77.0 |
| 3 | 2 | 2 | Steve Pemberton | 1 | 1 | 1 | False | 1 | False | 22.0 | 60.0 | 19.0 | 68.0 |
| 4 | 4 | 20 | Shaun Williamson | 1 | 1 | 1 | False | 1 | False | 23.0 | 59.0 | 19.0 | 70.0 |


In [16]:
#output the data
df.to_csv('wk14-output.csv', index=False)