In [5]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

In [6]:
df = pd.read_csv("wdi.csv")
df20 = df.query("year==2020").reset_index(drop=True)
germany = df.query("country=='Germany'").reset_index(drop=True)

# 1. The Plotly figure

- Plotly Express is especially convenient when we are working with **Pandas DataFrames**.
- However, it also accepts **Pandas Series**, **NumPy arrays** or plain **lists**.
- A minimal scatter plot needs only the `x` and `y` values.
- By default, the **figure is scaled to the size of the window**. Thus, if you resize your window, the appearance of the plot may change drastically. We can change this default behaviour by setting the `height` and `width` parameters (in pixels).
- The **most important parameters** (`title`, `labels`, ...) are usually available directly in the main Plotly Express functions (e.g. `px.scatter`). Further aspects of the plot can be changed afterwards, for instance via the `update_traces` or `update_layout` methods.

In [8]:
fig = px.scatter(data_frame = df20, x='gdp_capita', y = 'life_expectancy', 
                 title = 'Life expectancy in 2020', 
                 labels = {'gdp_capita':'GDP per capita (current US $)', 'life_expectancy':'Life expectancy at birth (years)'}, 
                 width = 800, height = 500)
fig.show()

- We can inspect the internals of a Plotly figure via `print` (most important attributes) and `to_dict` (all attributes).
- This helps us understand how a Plotly figure is structured and what we can modify.
- Each figure has a `layout` attribute, which defines the title, axis settings, etc., and a `data` attribute, which contains the data traces of the figure.
- In our simple example, we have only one trace (a scatter plot).
- Note that most attributes are left at their default values; we can modify them to customize the figure.

In [9]:
print(fig)    # Prints the most important information about the figure

Figure({
    'data': [{'hovertemplate': ('GDP per capita (current US $)=' ... 'th (years)=%{y}<extra></extra>'),
              'legendgroup': '',
              'marker': {'color': '#636efa', 'symbol': 'circle'},
              'mode': 'markers',
              'name': '',
              'orientation': 'v',
              'showlegend': False,
              'type': 'scatter',
              'x': array([ 2078.59508637, 14064.03861499, 11452.22662377, ...,            nan,
                           3361.97886896,  2101.80459679]),
              'xaxis': 'x',
              'y': array([62.575, 76.989, 74.453, ..., 64.65 , 62.38 , 61.124]),
              'yaxis': 'y'}],
    'layout': {'height': 500,
               'legend': {'tracegroupgap': 0},
               'template': '...',
               'title': {'text': 'Life expectancy in 2020'},
               'width': 800,
               'xaxis': {'anchor': 'y', 'domain': [0.0, 1.0], 'title': {'text': 'GDP per capita (current US $)'}},
               ... (output truncated)

In [10]:
fig.to_dict()

{'data': [{'hovertemplate': 'GDP per capita (current US $)=%{x}<br>Life expectancy at birth (years)=%{y}<extra></extra>',
   'legendgroup': '',
   'marker': {'color': '#636efa', 'symbol': 'circle'},
   'mode': 'markers',
   'name': '',
   'orientation': 'v',
   'showlegend': False,
   'x': array([  2078.59508637,  14064.03861499,  11452.22662377,             nan,
                      nan,   6367.43731676,  19839.92558905,  20787.85787061,
           14105.91125253,  35012.39146639,  53066.49098867,  57258.69022723,
           14495.65697213,  27224.47085662,  50682.75956475,   5904.56737548,
           14579.09038031,  20317.23192048,  54569.92538502,   8134.59827209,
            3364.97321818,  80381.44549332,  11137.83100468,   8110.11475624,
           15860.10452215,  14303.93611302,  14900.10421632,             nan,
           65054.19432127,  25296.07093659,   2211.01143963,    751.20091187,
            6577.30601375,   4515.70950577,   3870.82943304,  47226.36514033,
          ... (output truncated)

We see that a Plotly figure is a tree of nested, dictionary-like objects. This means that we can inspect or change any aspect of the figure by accessing the respective key (or attribute) in this hierarchy.

In [11]:
fig['data'][0]['marker']['color'] = 'green'    # [] notation
fig.data[0].marker.size = 15                   # .  notation
fig.show()

There are also dedicated methods to modify several figure attributes at once:

- `update_traces` updates the data traces of the figure
- `update_layout` updates the layout of the figure

In [12]:
fig.update_traces(marker=dict(color='red', size=10, symbol='star', opacity=0.5))
fig.update_layout(font=dict(family='Consolas'), title_font=dict(size=30, color='blue'))

# 2. Encoding


Encoding means representing columns of our DataFrame using visual properties like:

- `x`: position on the x-axis
- `y`: position on the y-axis
- `size`: size of the marker (area (default) or diameter)
- `symbol`: shape of the marker (circle, square, diamond, ...) 
- `color`: color of the marker
- `text`: text label displayed on or next to the marker
- `hover_name`: the name that appears (in bold) when hovering over a point
- `hover_data`: list of additional columns whose values appear when hovering over a point

Not all choices are equally effective. Try it out! We will cover this in more detail in the next lectures.

In [None]:
fig = px.scatter(df20, x='gdp_capita', y='life_expectancy', size='population', color='continent', 
                 size_max=70, hover_name='country', hover_data=['fertility'])
fig.show()

- If we use a **categorical** column such as `continent` in our encoding, Plotly Express creates **a separate trace for each of its values** (here: one per continent).
- We can show or hide individual traces by **clicking on the legend**.
- If we use a **numerical** column such as `population`, Plotly Express interprets it as a continuous measure and keeps all points in a single trace.

In [18]:
print(f'Inspecting our figure, we can verify that it has {len(fig.data)} traces')

Inspecting our figure, we can verify that it has 5 traces


- As before, we can use the `update_traces` method to modify the appearance of the traces. By default, it changes all traces.
- But we can also change the appearance of selected traces only, by specifying the `selector` parameter. The selector is a flexible way to pick traces based on their attributes (e.g. their name, type, index, color, ...).

In [19]:
fig.update_traces(marker=dict(symbol='star'))
fig.update_traces(selector=dict(name='Europe'), marker=dict(symbol='square',line=dict(color='black')))

In [20]:
# Try out other visual channels: text, hover_name, symbol, ...

# 3. Chart Types

- Plotly express offers a wide range of **different chart types**: scatter plots, line plots, bar plots, pie charts, histograms, box plots, violin plots, sunburst charts, ...
- For a given encoding (e.g. x='year' and y='population'), **we may have multiple choices of chart types**. 
- **Not all choices are equally effective** ... Try it out. We will cover this in more detail in the next lectures. 

In [21]:
germany = df[df.country == "Germany"].copy(deep=True)

# The same encoding rendered as four different chart types
fig1 = px.scatter(germany, x='year', y='gdp_capita')
fig2 = px.area(germany, x='year', y='gdp_capita')
fig3 = px.line(germany, x='year', y='gdp_capita')
fig4 = px.bar(germany, x='year', y='gdp_capita')

Here we arrange the figures in a 1x4 grid using the `make_subplots` function, to compare them side by side:

In [22]:
from plotly.subplots import make_subplots
fig = make_subplots(rows=1, cols=4, subplot_titles=['Scatter', 'Area', 'Line','Bar'])
fig.add_trace(fig1.data[0], row=1, col=1)
fig.add_trace(fig2.data[0], row=1, col=2)
fig.add_trace(fig3.data[0], row=1, col=3)
fig.add_trace(fig4.data[0], row=1, col=4)
fig.show()


- Side remark: observe that the first three charts (scatter, area and line) are all represented internally by a `scatter` trace.
- They differ only in how the points are drawn: as individual markers, connected by lines, or with the area below the line filled.
- If we create these plots with the lower-level `go.Scatter` class, we have to specify the `mode` parameter (and, for filled areas, e.g. the `fill` parameter) to distinguish between these types of plots.

In [None]:
for figure in [fig1, fig2, fig3, fig4]:
    print(figure.data[0].type)

In [None]:
fig.update_traces(selector=dict(type='scatter'), mode='markers+lines', line=dict(color='red'))  # Here we update all 3 scatter traces at once

# 4. Scaling

- For **each encoding channel** (x, y, color, size, symbol), we can choose between **different scaling or mapping options**
- For **numerical columns**, we can typically choose minimum and maximum values and the scale type (linear, log, date, ...)
- For **categorical columns**, we can e.g. choose which color or symbol is assigned to which category
- These choices have a big impact on the **readability and interpretability** of the plot. Try it out. We will cover this in more detail in the next lectures.

## 4.1 Axis Scaling

In [None]:
px.scatter(df20, x='gdp_capita',  y='life_expectancy', log_x=True, range_x = [200, 5e5], hover_name='country')


## 4.2 Color Scaling

- Color is one of the most important visual channels in data visualization
- When setting colors or color scales, we can use **named colors** (e.g. `'green'`), **RGB** colors (e.g. `'rgb(57, 104, 55)'`), **HEX** colors (e.g. `'#84a7d6'`), or **predefined color scales**
- The `px.colors` module provides several helpers to browse and choose colors and color scales

In [None]:
px.colors?

In [None]:
px.colors.sequential.swatches()

- By default, **numerical data** is represented by a **continuous color scale**
- **Categorical data** is represented by a **discrete color sequence**
- We can change the continuous scale via `color_continuous_scale`, and the discrete colors via `color_discrete_sequence` (an ordered list of colors) or `color_discrete_map` (an explicit category-to-color mapping)

In [None]:
px.scatter(df20, x='gdp_capita', y='life_expectancy', color='fertility', color_continuous_scale=px.colors.sequential.Greens)

In [None]:
px.colors.qualitative.swatches()

In [None]:
px.scatter(df20, x='gdp_capita', y='life_expectancy', color='continent', color_discrete_sequence=px.colors.qualitative.Bold)

We can also define our own discrete or continuous color mappings:

In [None]:
px.bar(df20[df20.country.isin(['China','Argentina','Saudi Arabia'])], 
       x='country', y='gdp_capita', color='country', 
       color_discrete_map={'China': 'red', 'Argentina': '#84a7d6', 'Saudi Arabia': 'rgb(57, 104, 55)'})


In [None]:
color_scale = [[0, 'red'], [0.5, 'yellow'], [1, 'blue']]
px.scatter(df20, x='gdp_capita', y='life_expectancy', color='fertility', color_continuous_scale=color_scale)

# 5. Gridded and layered plots


Plotly Express provides us with easy ways to create more complex plots, by creating and organizing multiple traces in a single figure:

- We can add **marginals** to a scatter plot to show the **distribution of the data** along the x and y axes
- We can create **facets** (small multiples) by creating **subplots** for different categories or values of a column
- Complex plots can be created by **layering** multiple traces on top of each other. Here, Plotly Graph Objects (`go`) are more flexible than Plotly Express (`px`), although more verbose.

In [None]:
px.scatter(df20, x='gdp_capita', y='life_expectancy', marginal_x='box', marginal_y='box')

In [None]:
px.scatter(df20.dropna(), x='gdp_capita', y='life_expectancy', 
           facet_col = 'continent', 
           facet_col_wrap=3, 
           facet_col_spacing=0.05, facet_row_spacing=0.3)

In [None]:
fig = go.Figure()
fig.add_scatter(x=df20.gdp_capita, 
                y=df20.life_expectancy, 
                mode='markers', 
                marker=dict(color='lightgrey', size=10, opacity=0.5))
fig.add_scatter(x=df20.query("country=='Germany'").gdp_capita, 
                y=df20.query("country=='Germany'").life_expectancy,
                text='Germany',
                mode='markers+text', 
                marker=dict(color='blue', size=20),
                textposition='top center')
fig.update_layout(showlegend=False)

# 6. Layouts

- We will cover layout aspects in more detail later on
- First of all, you can choose one of the predefined templates (e.g. `plotly`, `plotly_dark`, `plotly_white`, `ggplot2`, `seaborn`, ...)
- Secondly, you can further customize each aspect of the layout

In [None]:
fig = px.scatter(data_frame = df20, x='gdp_capita', y = 'life_expectancy', 
                 title = 'Life expectancy in 2020', 
                 labels = {'gdp_capita':'GDP per capita (current US $)', 'life_expectancy':'Life expectancy at birth (years)'}, 
                 width = 800, height = 500)
fig.update_layout(template='plotly_white')

In [None]:
fig = px.scatter(data_frame = df20, x='gdp_capita', y = 'life_expectancy', 
                 title = 'Life expectancy in 2020', 
                 labels = {'gdp_capita':'GDP per capita (current US $)', 'life_expectancy':'Life expectancy at birth (years)'}, 
                 width = 800, height = 500)

fig.update_layout(
    plot_bgcolor='rgba(240, 240, 240, 0.9)', 
    font_family='Consolas',
    title={
        'x': 0.5,  # Center the title
        'xanchor': 'center',
        'font': {'size': 24, 'color': '#6770f6'}
    }
)

fig.show()
