# Python for data analytics – Project assessment

## Task 1 – Load your data

You should take your data from three files:
  * data/afl/stats.csv
  * data/afl/players.csv
  * data/afl/games.csv

Load them into a single DataFrame by merging the three data sets.


In [None]:
import pandas as pd

stats = pd.read_csv("data/afl/stats.csv")
players = pd.read_csv("data/afl/players.csv", index_col="playerId")
games = pd.read_csv("data/afl/games.csv", index_col="gameId")

games_stats = games.join(stats.set_index("gameId"), on='gameId', lsuffix='_games', rsuffix='_stats')
games_stats_players = games_stats.join(players, on='playerId', lsuffix='_stats', rsuffix='_players')

# loaded into a single dataframe by merging the three data sets
games_stats_players
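
The same three-way merge can also be written with `DataFrame.merge`, which keeps the join keys as ordinary columns. The sketch below uses tiny made-up frames rather than the AFL files: the column names `gameId`, `playerId`, `displayName` and `Goals` mirror those used in this notebook, while `year` and every value are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for the three CSV files (all values are invented).
games = pd.DataFrame({"gameId": ["g1", "g2"], "year": [2020, 2021]})
players = pd.DataFrame({"playerId": [1, 2], "displayName": ["A, Ann", "B, Bob"]})
stats = pd.DataFrame({
    "gameId": ["g1", "g1", "g2"],
    "playerId": [1, 2, 1],
    "Goals": [3, 0, 2],
})

# stats has one row per player per game, so start from it and attach
# the game and player details with left joins on the shared keys.
merged = (
    stats
    .merge(games, on="gameId", how="left")
    .merge(players, on="playerId", how="left")
)
print(merged)
```

Starting from the per-player-per-game table means the result has exactly one row per stats record, which is the shape the later tasks filter and accumulate over.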

## Task 2 – Plot one player
For a particular player, say "Franklin, Lance", plot their accumulation of goals over time. The x-axis should be the number of games played and the y-axis should be the number of goals accumulated. We therefore expect a line that never decreases (it is monotonically non-decreasing), but its exact shape will depend on the player's career.

In [None]:
#!Note: modify below to choose a specific player
player = 'Franklin, Lance'

#alternatively, pick a random player: sample() returns a Series, so take its first value
# player = games_stats_players["displayName_stats"].sample().iloc[0]

#create mask for filtering
player_mask = games_stats_players["displayName_stats"] == player

#create dataframe & sort to allow for accumulation
player_stats_desc = games_stats_players[player_mask].sort_values(by="gameNumber", ascending=True)

#add a cumulative goals column
player_stats_desc["Goals (Cumulative)"] = player_stats_desc["Goals"].cumsum()

#create player series for graphing. reset_index(drop=True) so the x-axis counts games from 0 instead of using gameId
player_series = player_stats_desc["Goals (Cumulative)"].reset_index(drop=True)

#graph series
player_series.plot(xlabel="Number of games", ylabel="Goals", title=f"{player} - Accumulation of goals over time")
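
To see what the sort-then-`cumsum` step produces, here is a minimal sketch on invented numbers for a hypothetical player. The column names `gameNumber` and `Goals` follow the notebook; the values are made up, and shifting the index to start at 1 is an optional choice so the x-axis reads as "games played".

```python
import pandas as pd

# Invented per-game goals for one hypothetical player, deliberately out of order.
toy = pd.DataFrame({"gameNumber": [3, 1, 2, 4], "Goals": [0, 2, 1, 4]})

# Sort by game order first: cumsum() accumulates in row order.
toy = toy.sort_values("gameNumber")
cumulative = toy["Goals"].cumsum().reset_index(drop=True)

# Shift the index so the x-axis reads "games played" starting at 1.
cumulative.index = cumulative.index + 1

print(cumulative.tolist())
print(cumulative.is_monotonic_increasing)
```

A game with zero goals leaves the running total flat, which is why the curve can plateau but never dips.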

## Task 3 – Plot multiple players
In a single chart, plot the lines for the following players:
  * "Franklin, Lance"
  * "Papley, Tom"
  * "Mumford, Shane"
  * "Hooker, Cale"

Plot each in a different colour so they can be distinguished and add a legend.

In [None]:
#create masks for filtering
lance_mask = games_stats_players["displayName_stats"] == 'Franklin, Lance'
tom_mask = games_stats_players["displayName_stats"] == 'Papley, Tom'
shane_mask = games_stats_players["displayName_stats"] == 'Mumford, Shane'
cale_mask = games_stats_players["displayName_stats"] == 'Hooker, Cale'

#create dataframes per player & sort to allow for accumulation
lance_stats_desc = games_stats_players[lance_mask].sort_values(by="gameNumber", ascending=True)
tom_stats_desc = games_stats_players[tom_mask].sort_values(by="gameNumber", ascending=True)
shane_stats_desc = games_stats_players[shane_mask].sort_values(by="gameNumber", ascending=True)
cale_stats_desc = games_stats_players[cale_mask].sort_values(by="gameNumber", ascending=True)

#add a cumulative goals column per dataframe
lance_stats_desc["Goals (Cumulative)"] = lance_stats_desc["Goals"].cumsum()
tom_stats_desc["Goals (Cumulative)"] = tom_stats_desc["Goals"].cumsum()
shane_stats_desc["Goals (Cumulative)"] = shane_stats_desc["Goals"].cumsum()
cale_stats_desc["Goals (Cumulative)"] = cale_stats_desc["Goals"].cumsum()

#create player series for graphing. reset_index() to normalise
lance_series = lance_stats_desc["Goals (Cumulative)"].reset_index(drop=True)
tom_series = tom_stats_desc["Goals (Cumulative)"].reset_index(drop=True)
shane_series = shane_stats_desc["Goals (Cumulative)"].reset_index(drop=True)
cale_series = cale_stats_desc["Goals (Cumulative)"].reset_index(drop=True)

#graph all 4 series on a single view
combined_player_graph = lance_series.plot(label="Franklin, Lance", legend=True)
tom_series.plot(label="Papley, Tom", legend=True, ax=combined_player_graph)
shane_series.plot(label="Mumford, Shane", legend=True, ax=combined_player_graph)
cale_series.plot(label="Hooker, Cale", legend=True, ax=combined_player_graph,
                 xlabel="Number of games", ylabel="Goals",
                 title="Players - Accumulation of goals over time")
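
The four near-identical blocks above could be folded into one helper and a loop. The following is only a sketch of that idea: `cumulative_goals` and `plot_players` are hypothetical helper names, the toy rows are invented, and the column names are assumed to match `games_stats_players`. matplotlib is imported inside the plotting helper so the accumulation step can be checked on its own.

```python
import pandas as pd

def cumulative_goals(df, name, name_col="displayName_stats"):
    """Return one player's running goal total, indexed by games played (1-based)."""
    rows = df[df[name_col] == name].sort_values("gameNumber")
    series = rows["Goals"].cumsum().reset_index(drop=True)
    series.index = series.index + 1
    return series.rename(name)

def plot_players(df, names):
    """Draw one cumulative-goals line per player on shared axes."""
    import matplotlib.pyplot as plt  # only needed when actually plotting
    fig, ax = plt.subplots()
    for name in names:
        # Each call draws in the next colour of matplotlib's default cycle.
        cumulative_goals(df, name).plot(ax=ax, label=name, legend=True)
    ax.set(xlabel="Number of games", ylabel="Goals",
           title="Players - Accumulation of goals over time")
    return ax

# Invented rows standing in for games_stats_players.
toy = pd.DataFrame({
    "displayName_stats": ["A, Ann", "B, Bob", "A, Ann", "B, Bob"],
    "gameNumber": [2, 1, 1, 2],
    "Goals": [1, 0, 3, 2],
})
curves = {name: cumulative_goals(toy, name) for name in ["A, Ann", "B, Bob"]}
print(curves["A, Ann"].tolist())
```

In the notebook this would be called as `plot_players(games_stats_players, ["Franklin, Lance", "Papley, Tom", "Mumford, Shane", "Hooker, Cale"])`.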

## Task 4 – Linear Regression
Create a second plot showing just "Franklin, Lance" and "Hooker, Cale", but include the linear regression line for each. In other words, as well as showing their actual cumulative goals over time, plot their predicted goals over time, where the prediction is made via linear regression. Be sure to use different colours for each line and include a legend.

In [None]:
#todo
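
One possible way to approach this task is sketched below. It fits an ordinary least-squares line with `numpy.polyfit` (degree 1), which is a substitute for whichever regression tool the course intends; a library such as scikit-learn should give the same line. `fit_line` and `plot_with_regression` are hypothetical helper names, the x values are assumed to be games played counted from 1, and the toy series is invented.

```python
import numpy as np
import pandas as pd

def fit_line(cumulative):
    """Least-squares line through (games played, cumulative goals)."""
    x = np.arange(1, len(cumulative) + 1)
    slope, intercept = np.polyfit(x, cumulative.to_numpy(dtype=float), deg=1)
    predicted = pd.Series(slope * x + intercept, index=x)
    return slope, intercept, predicted

def plot_with_regression(curves):
    """Plot each actual curve (solid) and its fitted line (dashed)."""
    import matplotlib.pyplot as plt  # only needed when actually plotting
    fig, ax = plt.subplots()
    for name, cumulative in curves.items():
        x = np.arange(1, len(cumulative) + 1)
        _, _, predicted = fit_line(cumulative)
        # Separate plot calls take separate colours from the default cycle.
        ax.plot(x, cumulative.to_numpy(), label=f"{name} (actual)")
        ax.plot(x, predicted.to_numpy(), linestyle="--", label=f"{name} (predicted)")
    ax.set(xlabel="Number of games", ylabel="Goals")
    ax.legend()
    return ax

# Invented cumulative totals for a hypothetical player scoring 2 goals every game.
toy = pd.Series([2, 4, 6, 8, 10])
slope, intercept, predicted = fit_line(toy)
print(round(slope, 3), round(intercept, 3))
```

With the series built in Task 3, the call would be `plot_with_regression({"Franklin, Lance": lance_series, "Hooker, Cale": cale_series})`. The slope of each fitted line can be read as that player's average goals per game over their career.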