### Prepping Data Challenge: Premier League Statistics (week 13)
 

#### Requirement:
 
Open play goal scoring prowess in the Premier League 2015-2020
 1. Input all the files
 2. Remove all goalkeepers from the data set
 3. Remove all records where appearances = 0	
 4. In this challenge we are interested in the goals scored from open play
    - Create a new “Open Play Goals” field (open play goals are the goals scored that weren’t penalties or free kicks)
    - Note some players will have scored free kicks or penalties with their left or right foot
    - Be careful how Prep handles null fields! (have a look at those penalty and free kick fields) 
    - Rename the original Goals scored field to Total Goals Scored
 5. Calculate the totals for each of the key metrics across the whole time period for each player (be careful not to lose their position)
 6. Create an open play goals per appearance field across the whole time period
 7. Rank the players by the number of open play goals scored across the whole time period; we are only interested in the top 20 (including those that are tied for position) – Output 1
 8. Rank the players by the number of open play goals scored across the whole time period by position; we are only interested in the top 20 (including those that are tied for position) – Output 2
 9. Output the data – in your solution on Twitter / the forums, state the name of the player who was the only non-forward to make it into the overall top 20 for open play goals scored

###  1. Input all the files

In [1]:
#import libraries
import pandas as pd

In [2]:
df = pd.concat(map(pd.read_csv,['WK13-pl_15-16.csv','WK13-pl_16-17.csv','WK13-pl_17-18.csv','WK13-pl_18-19.csv','WK13-pl_19-20.csv']))
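
A minimal sketch of the same load that also records which season each row came from, in case that is useful later. The `load_seasons` helper and the `Season` column are assumptions for illustration and are not required by the challenge; the toy rows are made up and read from memory so the snippet runs without the CSV files.

```python
import io
import pandas as pd

def load_seasons(sources):
    """Concatenate season files, tagging each row with its season label.

    `sources` maps a season label to a path or file-like object
    (hypothetical helper, not part of the original notebook).
    """
    frames = [pd.read_csv(src).assign(Season=label) for label, src in sources.items()]
    # ignore_index avoids repeated index values after stacking the files
    return pd.concat(frames, ignore_index=True)

# In-memory stand-ins for two season files (made-up rows).
demo = {
    '15-16': io.StringIO("Name,Position,Appearances,Goals\nPlayer A,Forward,10,4\n"),
    '16-17': io.StringIO(
        "Name,Position,Appearances,Goals\n"
        "Player A,Forward,12,6\n"
        "Player B,Goalkeeper,38,0\n"
    ),
}
seasons = load_seasons(demo)
```

With the real files, the dictionary values would be the file names used above, for example `{'15-16': 'WK13-pl_15-16.csv'}`.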

In [None]:
#df.head()

### 2 & 3. Remove all goalkeepers from the data set & remove all records where appearances = 0

In [4]:
# Keep outfield players only, and only records with at least one appearance
df = df[(df['Position'] != 'Goalkeeper') & (df['Appearances'] != 0)]

### 4. In this challenge we are interested in the goals scored from open play

In [5]:
# Create a new “Open Play Goals” field (open play goals are the goals scored
# that weren’t penalties or free kicks)
# Note some players will have scored free kicks or penalties with their left or right foot
# Be careful how Prep handles null fields! (have a look at those penalty and free kick fields)
# In pandas, nulls propagate through arithmetic, so replace them with 0 before subtracting
df = df.fillna(0)
df['Open Play Goals'] = df['Goals'] - (df['Penalties scored'] + df['Freekicks scored'])
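
To see why the nulls matter: in pandas, arithmetic with `NaN` yields `NaN`, so a player with no recorded penalties or free kicks would end up with no open play goals at all. A small sketch on made-up rows; filling only the two set-piece columns (rather than the whole frame, as the cell above does) is a choice made for this example.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Name': ['Player A', 'Player B'],
    'Goals': [10, 5],
    'Penalties scored': [2, np.nan],
    'Freekicks scored': [np.nan, np.nan],
})

# Without filling, NaN propagates through the subtraction.
naive = toy['Goals'] - (toy['Penalties scored'] + toy['Freekicks scored'])

# Fill only the set-piece columns, add them up per row, then subtract.
set_pieces = toy[['Penalties scored', 'Freekicks scored']].fillna(0).sum(axis=1)
toy['Open Play Goals'] = toy['Goals'] - set_pieces
```

Here `naive` is null for both players, while `Open Play Goals` comes out as 8 and 5.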

In [6]:
# Rename the original Goals field (the brief calls it "Total Goals Scored"; this notebook uses 'Total Goals' throughout)
df.rename(columns={'Goals':'Total Goals'}, inplace=True)

### 5. Calculate the totals for each of the key metrics across the whole time period for each player (be careful not to lose their position)

In [7]:
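# Exploratory check: number of season records per (Name, Position) pair and per player
df.groupby(['Name','Position']).size().reset_index()  # not displayed; a notebook only shows the last expression
df.groupby('Name').size().reset_index().sort_values(by=0, ascending=False)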

| | Name | 0 |
|---|---|---|
| 428 | Jamie Vardy | 5 |
| 728 | Moussa Sissoko | 5 |
| 973 | Victor Wanyama | 5 |
| 895 | Simon Francis | 5 |
| 519 | Joshua King | 5 |
| ... | ... | ... |
| 469 | Joelinton | 1 |
| 470 | Joey Barton | 1 |
| 471 | John Egan | 1 |
| 472 | John Fleck | 1 |


In [8]:
df_Total = df.groupby(['Name','Position'])[['Appearances','Open Play Goals','Goals with right foot','Goals with left foot',
                                           'Headed goals','Total Goals']].sum().reset_index()
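
Grouping by both `Name` and `Position` keeps the position, but it would split a player into several rows if their listed position changed between seasons. A quick check for that, sketched on made-up rows (the names `positions_per_player` and `split_players` are only for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    'Name': ['Player A', 'Player A', 'Player B', 'Player B'],
    'Position': ['Forward', 'Forward', 'Midfielder', 'Forward'],
    'Appearances': [30, 28, 20, 25],
})

# Count distinct positions per player; anything above 1 will be split by the groupby.
positions_per_player = toy.groupby('Name')['Position'].nunique()
split_players = positions_per_player[positions_per_player > 1].index.tolist()

# Same aggregation pattern as above: Player B ends up with two rows.
totals = toy.groupby(['Name', 'Position'])['Appearances'].sum().reset_index()
```

If `split_players` is empty on the real data, the two-column groupby is safe; otherwise a rule for choosing a single position per player would be needed.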

### 6. Create an open play goals per appearance field across the whole time period

In [9]:
df_Total['Open Play Goals/Game'] = df_Total['Open Play Goals']/df_Total['Appearances'].astype(float)

### 7 & 8. Rank the players by the number of open play goals scored across the whole time period; we are only interested in the top 20 (including those that are tied for position) – Output 1

### Rank the players by the number of open play goals scored across the whole time period by position; we are only interested in the top 20 (including those that are tied for position) – Output 2

In [10]:
df_Total['Position Rank'] = df_Total.groupby('Position')['Open Play Goals'].rank(ascending=False,method='min').astype(int)
df_Total['Rank'] = df_Total['Open Play Goals'].rank(ascending=False,method='min').astype(int)
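
With `method='min'`, tied players share the lowest rank of their group and the following ranks are skipped, so filtering on `<= 20` keeps everyone tied for 20th place. A short sketch on made-up goal counts, with `method='dense'` shown only for comparison:

```python
import pandas as pd

goals = pd.Series([30, 25, 25, 20], index=['A', 'B', 'C', 'D'])

# 'min': ties share the lowest rank, and the next rank is skipped (1, 2, 2, 4).
min_rank = goals.rank(ascending=False, method='min').astype(int)

# 'dense': ties share a rank, and no rank is skipped (1, 2, 2, 3).
dense_rank = goals.rank(ascending=False, method='dense').astype(int)
```

Under `'dense'`, a cut-off of 20 could let in more than 20 distinct score levels' worth of players, which is why the choice of method matters for a "top 20 including ties" filter.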

### 9. Output the data 

In [11]:
# Column order for each output file
overall_cols = ['Open Play Goals','Goals with right foot','Goals with left foot','Position','Appearances','Rank','Total Goals','Open Play Goals/Game','Headed goals','Name']
position_cols = ['Position Rank','Open Play Goals','Goals with right foot','Goals with left foot','Position','Appearances','Total Goals','Open Play Goals/Game','Headed goals','Name']

In [12]:
# Output 1: overall top 20; Output 2: top 20 within each position (as numbered in the requirements)
df_Total[df_Total['Rank'] <= 20].sort_values(by='Rank').to_csv('WK13-PLS output1.csv', index=False, columns=overall_cols)
df_Total[df_Total['Position Rank'] <= 20].sort_values(by=['Position','Position Rank']).to_csv('WK13-PLS output2.csv', index=False, columns=position_cols)
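
For the question in step 9, the overall top 20 can be filtered on position. A sketch on made-up totals, using a top 3 instead of a top 20; it assumes forwards are labelled `'Forward'` in the `Position` column, which should be checked against the real data.

```python
import pandas as pd

toy_total = pd.DataFrame({
    'Name': ['Player A', 'Player B', 'Player C', 'Player D'],
    'Position': ['Forward', 'Midfielder', 'Forward', 'Defender'],
    'Open Play Goals': [40, 35, 30, 5],
})
toy_total['Rank'] = toy_total['Open Play Goals'].rank(ascending=False, method='min').astype(int)

top_n = 3  # the challenge uses 20
top = toy_total[toy_total['Rank'] <= top_n]

# Players inside the cut-off whose position is not Forward
non_forwards = top.loc[top['Position'] != 'Forward', 'Name'].tolist()
```

On the real `df_Total`, the same filter with `top_n = 20` would list the non-forwards in the overall top 20.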