# RLCS 2021-22 Dataset Demo: Player profiles explained with score

This notebook with outputs is available here: ["RLCS 2021-22 - Demo" on Kaggle](https://www.kaggle.com/dylanmonfret/rlcs-2021-22-demo).
<br>
<hr>

Ok, so I do not really know what to put in this notebook to show what is possible with the data available [here](https://www.kaggle.com/dylanmonfret/rlcs-202122), because we could basically do any kind of data analysis or machine learning process compatible with tabular data. For example, we could remake some ballchasing.com data visualization elements, show head-to-head results between teams or players through the whole season, or build predictive models based on what we have in our hands (I am currently working on this).

The first two options could be interesting for practicing data manipulation, and the last one might take quite long and need some reflection to establish the right methodology.

So, let's try something easier but still interesting to analyse: "player types".

We are not going to explain entirely what Rocket League is again, since it was already done [here](https://www.kaggle.com/dylanmonfret/rlcs-202122). But let's just make a quick reminder: two teams of 3 players face each other in a 5-minute game (+ overtime in case of a tie after 5 minutes), and the team that scored the most goals wins the game.

This is basically football with cars (and not "soccer", and EU > NA, always), meaning goals, assists and saves are counted like in actual football games. This way, we can define simple player types by the area in which they perform the most:

* __Scorer / Striker__: a player contributing to the team by scoring.
* __Passer / Support__: a player contributing to the team by making assists.
* __Keeper / Defender__: a player contributing to the team by saving goals.

Let's now try to illustrate this with the `by_players.csv`, `by_teams.csv` and `general.csv` files.

## Library imports

In [None]:
import numpy as np
import pandas as pd
import plotly.express as px

## Dataframe imports

In [None]:
players = pd.read_csv('../data/retrieved/by_players.csv', low_memory=False, encoding='utf8')
teams = pd.read_csv('../data/retrieved/by_teams.csv', low_memory=False, encoding='utf8')
general = pd.read_csv('../data/retrieved/general.csv', low_memory=False, encoding='utf8')

In [None]:
players.head()

In [None]:
teams.head()

In [None]:
general.head()

## Preparations

We actually do not need all the features from each dataframe for what follows, so we will keep what (I believe) is useful or interesting to define player profiles (Strikers, Passers and Goalkeepers).

In [None]:
# general.columns.tolist()
general = general[['ballchasing_id', 'region', 'split', 'event', 'phase', 'stage', 'round', 'created']]
general.head()

In [None]:
# players.columns.tolist()
players = players[['ballchasing_id','color', 'name', 'p_name', 'p_platform', 'p_platform_id', 'p_car_name', 'p_core_goals', 'p_core_saves', 'p_core_assists',
                   'p_core_score', 'p_core_mvp']]
players.head()

In [None]:
# teams.columns.tolist()
teams = teams[['ballchasing_id','color', 'name', 'core_goals']]
teams.head()

At this point, the `teams` dataframe looks useless, since we are focusing on players, but we can use it to retrieve game results through the `ballchasing_id` feature, to know how players perform when they are winning or losing. So first I filter `teams` by color/side (blue or orange)...

In [None]:
blue_side = teams.loc[teams.color=='blue']
blue_side.head()

In [None]:
orange_side = teams.loc[teams.color == 'orange']
orange_side.head()

We now have to delete the `color` columns and rename `name` and `core_goals` in order to merge both dataframes and know which team won in a single dataframe, `match_results`. Then, we will be able to drop the useless variables from `match_results` for the upcoming join with the `players` dataframe, and finally tell for each player whether their team won the game or not.

In [None]:
blue_side = blue_side.rename(columns={'name': 'blue_team', 'core_goals': 'blue_goals'}).drop('color', axis=1)
orange_side = orange_side.rename(columns={'name': 'orange_team', 'core_goals': 'orange_goals'}).drop('color', axis=1)

In [None]:
blue_side.head()

In [None]:
orange_side.head()

In [None]:
match_results = blue_side.merge(orange_side, on='ballchasing_id')  # one row per game, both sides side by side
match_results['winner'] = np.where(match_results.orange_goals > match_results.blue_goals, 'orange', 'blue')
match_results.head()
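To make the reshaping step concrete, here is a minimal sketch on a made-up two-game table (the game IDs, team names and goal counts below are invented for illustration, not taken from the dataset). The `side` helper is hypothetical, and `validate='one_to_one'` is an optional safety check that raises if a game appears more than once on either side; the notebook itself does not rely on it.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `teams`: two games, one row per side (values are made up).
toy_teams = pd.DataFrame({
    'ballchasing_id': ['g1', 'g1', 'g2', 'g2'],
    'color': ['blue', 'orange', 'blue', 'orange'],
    'name': ['TEAM A', 'TEAM B', 'TEAM C', 'TEAM A'],
    'core_goals': [3, 1, 0, 2],
})

def side(df, color):
    """Keep one side and prefix its team/goals columns with the side color."""
    return (df.loc[df.color == color]
              .rename(columns={'name': f'{color}_team', 'core_goals': f'{color}_goals'})
              .drop(columns='color'))

# One row per game, with both sides next to each other.
toy_results = side(toy_teams, 'blue').merge(side(toy_teams, 'orange'),
                                            on='ballchasing_id',
                                            validate='one_to_one')
toy_results['winner'] = np.where(toy_results.orange_goals > toy_results.blue_goals, 'orange', 'blue')
print(toy_results)
```

Since a Rocket League game cannot end in a tie, the two-way `np.where` is enough to label the winner.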

In [None]:
match_results.drop(['blue_team', 'blue_goals', 'orange_team', 'orange_goals'], axis=1, inplace=True)
match_results.head()

In [None]:
players = players.merge(match_results, on='ballchasing_id').rename(columns={'winner': 'win'})
players['win'] = players.color == players.win  # True when the player's side is the winning side

players.head()  # Note: 'p_core_mvp' could also be used to tell which team won, because each game's MVP is on the winning side.
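The remark about `p_core_mvp` can be checked rather than taken on faith. The sketch below does it on a single made-up game and assumes the MVP column is boolean (I have not verified how it is encoded in the CSV); the `toy_players`, `mvp_side` and `win_side` names are only for illustration.

```python
import pandas as pd

# Toy stand-in for `players` after the merge: one game, six players (values are made up).
toy_players = pd.DataFrame({
    'ballchasing_id': ['g1'] * 6,
    'color': ['blue'] * 3 + ['orange'] * 3,
    'p_core_mvp': [False, True, False, False, False, False],
    'win': [True, True, True, False, False, False],
})

# Color of each game's MVP...
mvp_side = toy_players.loc[toy_players.p_core_mvp].set_index('ballchasing_id')['color']
# ...compared with the color of the winning side of the same game.
win_side = (toy_players.loc[toy_players.win]
                       .drop_duplicates('ballchasing_id')
                       .set_index('ballchasing_id')['color'])

mvp_matches_winner = mvp_side.eq(win_side)
print(mvp_matches_winner.all())
```

On the real data, any game where the comparison is `False` (or where no MVP is flagged) would be worth a closer look.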

This being done, we can make a last join between `general` and `players` to have information about each game (event, round, region, etc.). We will also concatenate the platform columns to have both platform name and ID in a single column, and sort games by date (from oldest to latest) to keep the most recent player names.

In [None]:
data = general.merge(players).rename(columns={'created': 'game_time', 'name': 'team', 'p_name': 'name', 'p_platform': 'platform', 'p_platform_id': 'platform_id',
                                              'p_car_name': 'car_name', 'p_core_goals': 'goals', 'p_core_saves': 'saves', 'p_core_assists': 'assists',
                                              'p_core_score': 'score', 'p_core_mvp': 'mvp'})

data.platform = data['platform'] + '_' + data['platform_id'].astype(str)

data = data.rename(columns={'platform': 'player_id'}).drop('platform_id', axis=1).sort_values(['game_time', 'color', 'name'])

data.head()

In [None]:
player_name_db = data[['name', 'team', 'player_id']] \
                    .drop_duplicates('player_id', keep='last') \
                    .sort_values('name', key=lambda col: col.str.lower()) \
                    .reset_index(drop=True)
player_name_db
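Why the sort by date matters for `keep='last'` is easier to see on a tiny example. This is a minimal sketch with invented names and IDs, assuming the rows are already ordered from oldest to latest game:

```python
import pandas as pd

# Toy rows already sorted from oldest to latest game (values are made up).
toy = pd.DataFrame({
    'name': ['old_nick', 'other', 'new_nick'],
    'team': ['TEAM A', 'TEAM B', 'TEAM C'],
    'player_id': ['steam_1', 'epic_2', 'steam_1'],
})

# keep='last' retains the latest row for each player_id,
# so a renamed or transferred player shows up under the most recent name and team.
latest = toy.drop_duplicates('player_id', keep='last').reset_index(drop=True)
print(latest)
```

Here `steam_1` ends up listed as `new_nick` on `TEAM C`, which is why the team column is later labelled "Last team" in the plots.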

In [None]:
data.drop(['team', 'name', 'game_time', 'color'], axis=1, inplace=True)
data

## Filtering & Averaging variables

### Filtering outlier

Since we are going to average variables over the whole dataset and then plot them, it would be interesting to get rid of the few "outliers" present in the data. We are talking here about players with extreme values due to a lower-than-normal number of games played. So let's count the games played by each player during the season.



In [None]:
counting = data[['player_id', 'score']] \
            .groupby('player_id', as_index=False) \
            .count() \
            .rename(columns={'score': 'count'}) \
            .sort_values('count', ascending=False)

Then let's keep players with a minimum of 9 games played since the beginning of RLCS. Now, you may ask: "why 9 games specifically?" Well, because at the moment (with the Fall Split finished and the Winter Split ongoing), 9 is the minimum number of games a starting player can play during a Main Event: going 0-3 in the Swiss Stage during Fall, or 0-3 in the Group Stage during Winter, while getting swept in each best-of-five series, gives 3 series × 3 games = 9. This gets rid of substitute / stand-in players without enough games to be relevant for further analysis.

In [None]:
counting = counting[counting['count'] >= 9].reset_index(drop=True)
counting

In [None]:
validate = set(counting.player_id.tolist())  # Creating a set to apply the isin() function afterwards.

### Averaging

To compare the players, we will average their statistics over all available games, using `player_id` as the grouping key for the average.

In [None]:
avg_all = data[data.player_id.isin(validate)][['player_id', 'goals', 'saves', 'assists', 'score']].groupby('player_id', as_index=False).mean()
avg_all = player_name_db.merge(avg_all).merge(counting).sort_values(['team', 'name'])
avg_all

Let's check **Team Vitality** & **The General NRG** players and see how they performed during the season.

In [None]:
avg_all.loc[avg_all.team == 'TEAM VITALITY']

In [None]:
avg_all.loc[avg_all.team == 'THE GENERAL NRG']

**Alpha54** and **justin** seem to stand out from their teammates with a high score, due to the number of goals scored and saves made per game. And this is completely normal, because in game:

* A goal gives 100 points to a player.
* An assist gives 50 points to a player.
* A simple save gives 50 points to a player, and a "miracle" save 75.

They are what we usually call the "super-carries" of their team: the most clutch players, very offensive and able to score a lot, but also able to make saves in very difficult situations.
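To get a feel for how much of a player's score these three actions explain, the point values listed above can be applied to the per-game averages. This is only a rough sketch: the two players and their numbers are made up, every save is counted as a simple 50-point save (so the result is a lower bound), and the rest of the score comes from other in-game actions that are not listed here.

```python
import pandas as pd

# Made-up per-game averages for two hypothetical players.
toy_avg = pd.DataFrame({
    'name': ['player_x', 'player_y'],
    'goals': [1.0, 0.5],
    'assists': [0.5, 0.5],
    'saves': [2.0, 1.0],
    'score': [400.0, 250.0],
})

# Point values from the list above; every save is treated as a simple save.
POINTS = {'goals': 100, 'assists': 50, 'saves': 50}

# Points per game explained by goals, assists and saves, and their share of the score.
toy_avg['explained'] = sum(toy_avg[col] * pts for col, pts in POINTS.items())
toy_avg['explained_share'] = toy_avg['explained'] / toy_avg['score']
print(toy_avg)
```

The same two lines could be applied to `avg_all`, since it carries `goals`, `assists`, `saves` and `score` columns.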

Let's now present the data to compare all players to each other.

In [None]:
avg_describe = avg_all.describe()
avg_describe

## Data visualization

![And Here We Go!](https://c.tenor.com/x-FL-l7ERS4AAAAC/and-here-we-go-joker.gif)

### Goals / Score

In [None]:
fig_11 = px.scatter(avg_all,
                 x="score",
                 y="goals",
                 color='team',
                 hover_data=['name', 'count'],
                 labels={"team": "Last team",
                         "score": "Average Score per game",
                         "goals": "Average Goals per game",
                         "name": "Player name",
                         "count": "Games played"},
                 title='Score visualisation with goals per game',
                 width=960,
                 height=720)

fig_11.add_vline(x=avg_describe.loc['mean', 'score'], line_width=1, line_dash="dash")
fig_11.add_hline(y=avg_describe.loc['mean', 'goals'], line_width=1, line_dash="dash")

fig_11.show()
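The two dashed lines (mean score and mean goals) split the scatter plot into four quadrants. As a possible complement to reading them off the figure, the sketch below labels each player with its quadrant; the four players and their values are invented, and the label wording is my own choice.

```python
import numpy as np
import pandas as pd

# Made-up per-game averages for four hypothetical players.
toy_avg = pd.DataFrame({
    'name': ['p1', 'p2', 'p3', 'p4'],
    'score': [450.0, 300.0, 420.0, 310.0],
    'goals': [0.9, 0.4, 0.5, 0.8],
})

# Same reference lines as on the plot: the mean of each axis.
high_score = toy_avg['score'] > toy_avg['score'].mean()
high_goals = toy_avg['goals'] > toy_avg['goals'].mean()

# np.select picks the label of the first matching condition, else the default.
toy_avg['quadrant'] = np.select(
    [high_score & high_goals, high_score & ~high_goals, ~high_score & high_goals],
    ['high score, high goals', 'high score, low goals', 'low score, high goals'],
    default='low score, low goals',
)
print(toy_avg)
```

Swapping `goals` for `assists` or `saves` gives the matching labels for the other two sections.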

In [None]:
fig_12 = px.scatter(avg_all,
                    x="score",
                    y="goals",
                    hover_data=['name', 'count'],
                    marginal_x="box",
                    marginal_y="box",
                    labels={"team": "Last team",
                            "score": "Average Score per game",
                            "goals": "Average Goals per game",
                            "name": "Player name",
                            "count": "Games played"},
                    title='Score visualisation with goals per game',
                    width=960,
                    height=960)
fig_12.show()

### Assists / Score

In [None]:
fig_21 = px.scatter(avg_all,
                 x="score",
                 y="assists",
                 color='team',
                 hover_data=['name', 'count'],
                 labels={"team": "Last team",
                         "score": "Average Score per game",
                         "assists": "Average Assists per game",
                         "name": "Player name",
                         "count": "Games played"},
                 title='Score visualisation with assists per game',
                 width=960,
                 height=720)

fig_21.add_vline(x=avg_describe.loc['mean', 'score'], line_width=1, line_dash="dash")
fig_21.add_hline(y=avg_describe.loc['mean', 'assists'], line_width=1, line_dash="dash")

fig_21.show()

In [None]:
fig_22 = px.scatter(avg_all,
                    x="score",
                    y="assists",
                    hover_data=['name', 'count'],
                    marginal_x="box",
                    marginal_y="box",
                    labels={"team": "Last team",
                            "score": "Average Score per game",
                            "assists": "Average Assists per game",
                            "name": "Player name",
                            "count": "Games played"},
                    title='Score visualisation with assists per game',
                    width=960,
                    height=960)
fig_22.show()

### Saves / Score

In [None]:
fig_31 = px.scatter(avg_all,
                 x="score",
                 y="saves",
                 color='team',
                 hover_data=['name', 'count'],
                 labels={"team": "Last team",
                         "score": "Average Score per game",
                         "saves": "Average Saves per game",
                         "name": "Player name",
                         "count": "Games played"},
                 title='Score visualisation with saves per game',
                 width=960,
                 height=720)

fig_31.add_vline(x=avg_describe.loc['mean', 'score'], line_width=1, line_dash="dash")
fig_31.add_hline(y=avg_describe.loc['mean', 'saves'], line_width=1, line_dash="dash")

fig_31.show()

In [None]:
fig_32 = px.scatter(avg_all,
                    x="score",
                    y="saves",
                    hover_data=['name', 'count'],
                    marginal_x="box",
                    marginal_y="box",
                    labels={"team": "Last team",
                            "score": "Average Score per game",
                            "saves": "Average Saves per game",
                            "name": "Player name",
                            "count": "Games played"},
                    title='Score visualisation with saves per game',
                    width=960,
                    height=960)

fig_32.show()