# Cloud Academy Data Science Webinar
## Boost your Data Science career with Data Visualization
### Speaker: Andrea Giussani

Original Source Data: https://www.kaggle.com/stefanoleone992/fifa-21-complete-player-dataset

I suggest running this notebook inside a Colab session. Please follow the instructions available [here](https://github.com/cloudacademy/ca-webinar-data-science-visualization) to get the most from this webinar.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
import pandas as pd

In [None]:
df_all = pd.read_csv('/content/drive/MyDrive/ca.webinars/data_viz/archive-2/all_fifa_data.zip')

In [None]:
df_all.head(2)

We retain only a few columns.

In [None]:
filtered_df = df_all[['sofifa_id', 'short_name', 'age', 'nationality', 'club_name', 'overall', 'potential', 'value_eur', 'wage_eur', 'year' ]]

Bokeh has a submodule called `plotting`, which provides `figure`.
Calling `figure` creates the object you need to build and customize a plot in Bokeh.
That object exposes several methods that draw shapes on the figure; the shapes are called glyphs, and the methods that draw them (such as `line`) are the glyph methods.

In [None]:
# TO BE FILLED
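
As a rough sketch of the `figure`-plus-glyph workflow described above, the snippet below creates a figure and draws two glyphs on it. The numbers are made up for illustration and are not taken from the FIFA dataset; the title and axis labels are likewise assumptions.

```python
from bokeh.plotting import figure

# Illustrative values only (not taken from the FIFA dataset).
years = [2017, 2018, 2019, 2020, 2021]
overall = [80, 82, 85, 86, 88]

# Create an empty figure, then draw glyphs on it.
p = figure(width=400, height=400, title="Overall rating by year",
           x_axis_label="year", y_axis_label="overall")
line = p.line(years, overall, line_width=2)  # line glyph
dots = p.scatter(years, overall, size=8)     # marker glyph on the same figure

# In a notebook, display the figure with:
# from bokeh.io import output_notebook, show
# output_notebook(); show(p)
```

Each glyph method returns a renderer, so `line` and `dots` can be customized later if needed.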

We can add a legend to the plot.

In [None]:
# TO BE FILLED
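
One possible way to add a legend is to pass `legend_label` to the glyph method and then adjust the legend through `p.legend`. This is only a sketch: "Player A" is a hypothetical name and the ratings are made up.

```python
from bokeh.plotting import figure

# Illustrative values only (not taken from the FIFA dataset).
years = [2017, 2018, 2019, 2020, 2021]
overall = [80, 82, 85, 86, 88]

p = figure(width=400, height=400)
# legend_label creates a legend entry for this glyph.
p.line(years, overall, line_width=2, legend_label="Player A")

# The legend can then be repositioned and given a title.
p.legend.location = "top_left"
p.legend.title = "Player"
```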

### Adding Multiple Players

Let us now plot the series for three distinct players.

In [None]:
from bokeh.palettes import Colorblind3 # palette of three colorblind-friendly colors

In [None]:
# `width`/`height` replace `plot_width`/`plot_height`, which were removed in Bokeh 3.0
p = figure(width=400, height=400)
# TO BE FILLED
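
A minimal sketch of how the three series could be drawn, pairing each one with a color from `Colorblind3`. The player names and ratings are placeholders, not values from the dataset.

```python
from bokeh.palettes import Colorblind3
from bokeh.plotting import figure

# Placeholder players and ratings, for illustration only.
years = [2019, 2020, 2021]
ratings = {
    "Player A": [80, 82, 85],
    "Player B": [88, 87, 86],
    "Player C": [75, 79, 83],
}

p = figure(width=400, height=400)
# One line glyph per player, each with its own palette color and legend entry.
for (name, values), color in zip(ratings.items(), Colorblind3):
    p.line(years, values, line_width=2, color=color, legend_label=name)
```

Zipping the series with the palette keeps the color assignment in one place, so adding a player only requires a longer palette.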

#### Interactivity with Legend

Legends added to Bokeh plots can be made interactive, in case one needs to hide or mute a certain glyph in a plot. These modes are activated by setting the `click_policy` property on a `Legend` to either `"hide"` or `"mute"`.

In [None]:
# TO BE FILLED
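
The following sketch shows one way to enable the `"mute"` policy. The two players and their ratings are hypothetical; `muted_alpha` controls how faint a glyph becomes once its legend entry is clicked.

```python
from bokeh.plotting import figure

# Placeholder data, for illustration only.
years = [2019, 2020, 2021]
player_a = [80, 82, 85]
player_b = [88, 87, 86]

p = figure(width=400, height=400)
p.line(years, player_a, line_width=2, color="navy",
       legend_label="Player A", muted_alpha=0.2)
p.line(years, player_b, line_width=2, color="firebrick",
       legend_label="Player B", muted_alpha=0.2)

# Clicking a legend entry now fades the matching line;
# use "hide" instead to remove it from view entirely.
p.legend.click_policy = "mute"
```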

### Adding Interactivity with Hover


Bokeh comes with a number of interactive tools. Some of them are the so-called gestures, which are tools that respond to a single user gesture, such as `Pan` or `WheelZoom`. For each type of gesture, only one tool can be active at any given time, and the active tool is indicated on the toolbar by a highlight next to its icon.

But there are other types of tools in Bokeh. An example is the family of `Inspectors`.

Inspectors are passive tools that report information about the plot, based on the current cursor position. Any number of inspectors may be active at any given time. The inspectors menu in the toolbar allows users to toggle the active state of any inspector. The best-known member of this family is by far the `HoverTool`.

We also introduce a new concept here, the `ColumnDataSource` (CDS): this is Bokeh's counterpart of a pandas DataFrame, a column-oriented data store that is an efficient way to hand data to Bokeh glyphs.

In [None]:
from bokeh.models import ColumnDataSource

In [None]:
# TO BE FILLED
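
To illustrate the idea, the sketch below wraps a small DataFrame in a `ColumnDataSource` and lets a glyph refer to its columns by name. The DataFrame is a made-up stand-in for `filtered_df`, and `toy_df` is a hypothetical name.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Made-up stand-in for the filtered FIFA DataFrame.
toy_df = pd.DataFrame({
    "short_name": ["Player A", "Player B", "Player C"],
    "age": [21, 27, 33],
    "overall": [78, 85, 81],
})

# Each DataFrame column becomes a named column of the data source.
source = ColumnDataSource(toy_df)

p = figure(width=400, height=400, x_axis_label="age", y_axis_label="overall")
# Glyphs reference columns by name instead of receiving raw lists.
p.scatter(x="age", y="overall", size=10, source=source)
```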


The hover tool generates a "tabular" tooltip containing information for a particular row of the dataset. Typically, the labels and values are supplied as a list of `(label, value)` tuples, where a value such as `@age` refers to the `age` column of the data source.


In [None]:
from bokeh.models import HoverTool

tooltips = [
            ('Player', '@short_name'),
            ('Age', '@age'),
            ('Club', '@club_name'),
            ('Mkt Value', '@value_eur'),
            ('Wage', '@wage_eur')   
           ]


# TO BE FILLED
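
A possible way to wire tooltips into a plot is sketched below: build a `HoverTool` from the list of tuples and attach it with `add_tools`. The data is invented for illustration, and the `{0,0}` suffix is an optional number format added here as an assumption.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Invented rows, for illustration only.
source = ColumnDataSource(data={
    "short_name": ["Player A", "Player B", "Player C"],
    "age": [21, 27, 33],
    "overall": [78, 85, 81],
    "value_eur": [12000000, 45000000, 8000000],
})

# "@column" pulls the value from the hovered row of the data source.
hover = HoverTool(tooltips=[
    ("Player", "@short_name"),
    ("Age", "@age"),
    ("Mkt Value", "@value_eur{0,0}"),
])

p = figure(width=400, height=400, tools="pan,wheel_zoom,reset")
p.add_tools(hover)
p.scatter(x="age", y="overall", size=10, source=source)
```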

### Plotting Categorical Data

In [None]:
def get_top_countries(df, year_filter, top_n=3):
  df_tmp = df.query('year==@year_filter')[['short_name', 'nationality']].groupby('nationality').count().rename(
    columns={'short_name': 'cnt'}
    ).sort_values(
    by='cnt', ascending=False
    )
  top_countries = df_tmp.head(top_n).reset_index()
  return top_countries


In [None]:
top10_countries =  get_top_countries(filtered_df, year_filter=2021, top_n=10)

In [None]:
top10_countries

We want to plot the top 10 countries by number of players. We set the argument `x_range` equal to the series `top10_countries.nationality`, which is already sorted, so the bars are shown in descending order.

In [None]:
# TO BE FILLED
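
As a sketch of a categorical bar chart, the snippet below passes the ordered category names as `x_range` and draws the bars with `vbar`. The country names and counts are placeholders, and a plain list is used here in place of the pandas series.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Placeholder counts, already sorted in descending order.
counts = {
    "nationality": ["Country A", "Country B", "Country C"],
    "cnt": [300, 200, 100],
}
source = ColumnDataSource(data=counts)

# The order of the factors in x_range is the order of the bars.
p = figure(x_range=counts["nationality"], height=300,
           title="Players per country")
p.vbar(x="nationality", top="cnt", width=0.8, source=source)

p.y_range.start = 0               # bars start from the axis
p.xgrid.grid_line_color = None    # vertical grid lines add little here
```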

Now we want to investigate how the average wage for three countries (England, Spain and Italy) evolved over the last three years. This requires a little data wrangling, which has already been done for you below.

In [None]:
pivot_table_wages = pd.pivot_table(
    data=filtered_df,
    index='nationality',
    columns='year',
    aggfunc='mean',
    values='wage_eur'
)

We cast the column names to strings.

In [None]:
pivot_table_wages.columns = pivot_table_wages.columns.map(str)

In [None]:
countries = pivot_table_wages.loc[['England', 'Italy', 'Spain']].index.to_list()

In [None]:
years = pivot_table_wages.loc[['England', 'Italy', 'Spain']].columns.to_list()[4:]

In [None]:
pivot_table_wages.loc[['England', 'Italy', 'Spain'], years]

In [None]:
data = {
    'country' : countries,
    '2019'   : pivot_table_wages.loc[['England', 'Italy', 'Spain'], '2019'].to_list(),
    '2020'   : pivot_table_wages.loc[['England', 'Italy', 'Spain'], '2020'].to_list(),
    '2021'   : pivot_table_wages.loc[['England', 'Italy', 'Spain'], '2021'].to_list()
    }

In [None]:
from bokeh.models import FactorRange

# TO BE FILLED
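
One way to draw grouped bars is to build `(country, year)` pairs and pass them to `FactorRange`, which nests the years under each country on the x axis. The sketch below assumes a dictionary shaped like `data` above; the wage figures are placeholders, not the real averages.

```python
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure

countries = ["England", "Italy", "Spain"]
years = ["2019", "2020", "2021"]
# Placeholder wages, one list per year, ordered like `countries`.
toy = {
    "country": countries,
    "2019": [10, 11, 12],
    "2020": [13, 14, 15],
    "2021": [16, 17, 18],
}

# One (country, year) factor per bar, grouped by country.
x = [(country, year) for country in countries for year in years]
wages = [toy[year][i] for i in range(len(countries)) for year in years]
source = ColumnDataSource(data={"x": x, "wage": wages})

p = figure(x_range=FactorRange(*x), height=300,
           title="Average wage by country and year")
p.vbar(x="x", top="wage", width=0.9, source=source)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1  # rotate year labels (radians)
```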

**END**