### Prepping Data Challenge: Southend Stats (week 41)

### Requirements
- Input the data
- Rename the penultimate column from P 1 (as it appears in Prep) to Pts
- Exclude null rows
- Create a Special Circumstances field with the following categories
  - Incomplete (for the most recent season)
  - Abandoned due to WW2 (for the 1939 season)
  - N/A for full seasons
- Ensure the POS field only has values for full seasons
- Extract the numeric values from the leagues
  - FL-CH should be assigned a value of 0 
  - NAT-P should be assigned a value of 5
- Create an Outcome field with 3 potential values. (Note: this should apply to all seasons in the data order regardless of any gaps. The current season will have a null value)
  - Promoted, where they are in a league higher than their current league in the following season
  - Relegated, where they are in a league lower than their current league in the following season
  - Same League, where they do not change leagues between seasons
- Create new rows for seasons that were missed due to WW1 and WW2
- Update the fields with relevant values for these new rows
  - e.g. change their Special Circumstances value to WW1/WW2
- Output the data

In [1]:
import pandas as pd
import numpy as np

In [2]:
#Input the data
#Rename the penultimate column from P 1 (as it appears in Prep) to Pts
df = pd.read_csv("wk41-Southend Stats.csv", sep=r'\s+').rename(columns={'P.1' : 'Pts'})
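
The "Exclude null rows" requirement has no dedicated cell in this notebook. A minimal sketch of one way to handle it, assuming a null row means a row where every column is missing; the `raw` frame below is toy data, not the real input:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the raw input (assumption, not the real file);
# the second row is entirely null.
raw = pd.DataFrame({
    'Season': ['1906-07', np.nan, '1907-08'],
    'League': ['SOUTH-2', np.nan, 'SOUTH-2'],
    'P': [22.0, np.nan, 18.0],
})

# how='all' drops a row only when every column is null,
# so rows with a single missing value are kept.
cleaned = raw.dropna(how='all').reset_index(drop=True)
print(cleaned)
```

If null rows in the real file still carry a value in some column, `dropna(subset=['Season'])` may be the better fit.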

In [3]:
df.columns = [c if c == 'POS' else c.title() for c in df.columns]

In [4]:
df.head(10)

Unnamed: 0,Season,League,P,W,D,L,F,A,Pts,POS
0,1906-07,SOUTH-2,22.0,14.0,5.0,3.0,58.0,23.0,33.0,1/12
1,1907-08,SOUTH-2,18.0,13.0,3.0,2.0,47.0,16.0,29.0,1/10
2,1908-09,SOUTH-1,40.0,14.0,10.0,16.0,52.0,54.0,38.0,12/21
3,1909-10,SOUTH-1,42.0,12.0,9.0,21.0,51.0,90.0,33.0,20/22
4,1910-11,SOUTH-1,38.0,10.0,9.0,19.0,47.0,64.0,29.0,19/20
5,1911-12,SOUTH-2,26.0,16.0,1.0,9.0,73.0,24.0,33.0,4/14
6,1912-13,SOUTH-2,24.0,14.0,6.0,4.0,43.0,23.0,34.0,2/13
7,1913-14,SOUTH-1,38.0,10.0,12.0,16.0,41.0,66.0,32.0,16/20
8,1914-15,SOUTH-1,38.0,10.0,8.0,20.0,44.0,64.0,28.0,18/20
9,1919-20,SOUTH-1,42.0,13.0,17.0,12.0,46.0,48.0,43.0,11/22


In [5]:
#Create a Special Circumstances field with the following categories
df['Special Circumstances'] = np.where(df['Season'] == df['Season'].max(), 'Incomplete',
                                      np.where(df['Season'] == '1939-40', 'Abandoned due to WW2', 'N/A'))
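
Nested `np.where` calls get harder to read as categories are added. As an alternative sketch, `np.select` takes the conditions and labels as parallel lists; the `seasons` frame here is toy data chosen only to exercise each branch:

```python
import numpy as np
import pandas as pd

# Toy seasons (assumption): one ordinary, the 1939 season, and two recent ones.
seasons = pd.DataFrame({'Season': ['1938-39', '1939-40', '2019-20', '2020-21']})

# String max works here because every label starts with a four-digit year.
latest = seasons['Season'].max()

conditions = [
    seasons['Season'] == latest,      # most recent season
    seasons['Season'] == '1939-40',   # season abandoned due to WW2
]
labels = ['Incomplete', 'Abandoned due to WW2']

# Rows matching no condition fall through to the default.
seasons['Special Circumstances'] = np.select(conditions, labels, default='N/A')
print(seasons)
```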

In [6]:
#Ensure the POS field only has values for full seasons
df['POS'] = np.where(df['Special Circumstances'] == 'N/A', df['POS'], np.nan)

In [7]:
#Extract the numeric values from the leagues
df['num_values'] = np.where(df['League'] == 'FL-CH', 0,
                     np.where(df['League'] == 'NAT-P', 5,
                           df['League'].str.extract(r'.*-(\d+)', expand=False).astype(float)))
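
To make the mapping rule easier to check in isolation, here is a plain-Python sketch of the same logic. `league_tier` and `SPECIAL_TIERS` are hypothetical names introduced for illustration, and `'UNKNOWN'` is a made-up code showing the fallback:

```python
import re

# The two league codes that carry no digit, with the values from the requirements.
SPECIAL_TIERS = {'FL-CH': 0, 'NAT-P': 5}

def league_tier(league):
    """Return the numeric tier for a league code, or None if none can be found."""
    if league in SPECIAL_TIERS:
        return SPECIAL_TIERS[league]
    # Otherwise take the digits that follow a hyphen, e.g. 'SOUTH-2' -> 2.
    match = re.search(r'-(\d+)', league)
    return int(match.group(1)) if match else None

for code in ['SOUTH-1', 'SOUTH-2', 'FL-CH', 'NAT-P', 'UNKNOWN']:
    print(code, league_tier(code))
```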

In [8]:
#Create an Outcome field with 3 potential values.
df = df.sort_values(by='Season')
df['Outcome'] = np.where(df['num_values'].shift(-1) < df['num_values'], 'Promoted',
                  np.where(df['num_values'].shift(-1) > df['num_values'], 'Relegated',
                    np.where(df['num_values'].shift(-1) == df['num_values'], 'Same League', 'N/A')))
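
The comparison relies on `shift(-1)` pulling the next season's tier onto the current row, with a lower number meaning a higher league. A small sketch on toy tiers (the `toy` frame is illustrative, not the real data) shows each branch, including the last row, which has no following season:

```python
import numpy as np
import pandas as pd

# Toy tiers (assumption): lower number = higher league.
toy = pd.DataFrame({
    'Season': ['1907-08', '1908-09', '1909-10', '1910-11', '1911-12'],
    'num_values': [2, 1, 1, 1, 2],
})

# Next season's tier, aligned to the current row; NaN on the last row.
next_tier = toy['num_values'].shift(-1)

toy['Outcome'] = np.select(
    [next_tier < toy['num_values'],    # moving to a lower number: promoted
     next_tier > toy['num_values'],    # moving to a higher number: relegated
     next_tier == toy['num_values']],  # unchanged
    ['Promoted', 'Relegated', 'Same League'],
    default='N/A',                     # comparisons with NaN are all False
)
print(toy)
```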

In [9]:
#df.head(10)

In [10]:
#Create new rows for seasons that were missed due to WW1 and WW2
#Update the fields with relevant values for these new rows
missing_years = [*range(1915, 1919), *range(1940, 1946)]
df2 = pd.DataFrame({'Season' : [f'{c}-{(c+1) % 100}' for c in missing_years],
                     'Special Circumstances' : ['WW1' if c <= 1919 else 'WW2' for c in missing_years],
                     'Outcome' : ['N/A']*len(missing_years)})
#DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
df = pd.concat([df, df2])
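
The season labels above are built with `(c+1) % 100`, which is fine for the war years but would give `1999-0` at a century boundary. A hedged sketch of a zero-padded variant follows; `season_label` and `war_label` are hypothetical helper names, not part of the notebook:

```python
def season_label(start_year):
    """Format a season as 'YYYY-YY', zero-padding the two-digit end year."""
    return f'{start_year}-{(start_year + 1) % 100:02d}'

def war_label(start_year):
    """Label a missed season by war, using the same cutoff as the cell above."""
    return 'WW1' if start_year <= 1919 else 'WW2'

# Same start years as the cell above: 1915-1918 and 1940-1945.
missing_years = [*range(1915, 1919), *range(1940, 1946)]
for year in missing_years:
    print(season_label(year), war_label(year))
```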

In [11]:
output = df[['Season','Outcome','Special Circumstances','League','P','W','D','L','F','A','Pts','POS']]

In [12]:
output.head(10)

Unnamed: 0,Season,Outcome,Special Circumstances,League,P,W,D,L,F,A,Pts,POS
0,1906-07,Same League,,SOUTH-2,22.0,14.0,5.0,3.0,58.0,23.0,33.0,1/12
1,1907-08,Promoted,,SOUTH-2,18.0,13.0,3.0,2.0,47.0,16.0,29.0,1/10
2,1908-09,Same League,,SOUTH-1,40.0,14.0,10.0,16.0,52.0,54.0,38.0,12/21
3,1909-10,Same League,,SOUTH-1,42.0,12.0,9.0,21.0,51.0,90.0,33.0,20/22
4,1910-11,Relegated,,SOUTH-1,38.0,10.0,9.0,19.0,47.0,64.0,29.0,19/20
5,1911-12,Same League,,SOUTH-2,26.0,16.0,1.0,9.0,73.0,24.0,33.0,4/14
6,1912-13,Promoted,,SOUTH-2,24.0,14.0,6.0,4.0,43.0,23.0,34.0,2/13
7,1913-14,Same League,,SOUTH-1,38.0,10.0,12.0,16.0,41.0,66.0,32.0,16/20
8,1914-15,Same League,,SOUTH-1,38.0,10.0,8.0,20.0,44.0,64.0,28.0,18/20
9,1919-20,Relegated,,SOUTH-1,42.0,13.0,17.0,12.0,46.0,48.0,43.0,11/22


In [13]:
#Output the data
output.to_csv('wk41-output.csv', index=False)