<a href="https://colab.research.google.com/github/JARIN-TIAS/Data-Visualization/blob/main/plotly_tutorial_be_a_visualization_grandmaster_v2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div
     style="padding: 20px;
            color: black;
            margin: 0;
            font-size: 250%;
            text-align: center;
            display: fill;
            border-radius: 5px;
            background-color: #f79d28;
            overflow: hidden;
            font-weight: 700;
            border: 5px solid black;"
     >
    Plotly Tutorial
</div>

We'll be taking our data visualization skills to the next level with plotly.  
plotly is a great package that allows you to create dynamic data visualizations in Python.  
Here's an example of what we'll be able to create by the end of this tutorial.

The prerequisites for this tutorial are intermediate knowledge of Python and beginner knowledge of pandas and numpy.

During the course of this tutorial we'll be exploring 2 plotly modules: plotly.express and plotly.graph_objects.  
plotly.express (px) is a fast and easy way to create dynamic data visualizations.  
plotly.graph_objects (go) is the lower level API that grants more control over your visualizations, but is more code intensive.

Shoutout to Derek Banas: I created this Kaggle Notebook in part by following his [tutorial on YouTube](https://www.youtube.com/watch?v=GGL6U0k8WYA).  


# 🚚 Import

In [None]:
import numpy as np  # linear algebra
import pandas as pd  # data processing
import seaborn as sns  # datasets
import itertools  # iteration utils
from scipy.interpolate import griddata  # for 3d surface plot

import plotly.express as px
import plotly.graph_objects as go
from plotly import subplots

# 📊 Bar Chart

Let's start by creating a few bar charts.  
We'll be analyzing the built-in plotly.express "gapminder" dataset, which contains a few metrics for each country of the world.

In [None]:
df_world = px.data.gapminder()
df_world.head()

Unnamed: 0,country,continent,year,lifeExp,pop,gdpPercap,iso_alpha,iso_num
0,Afghanistan,Asia,1952,28.801,8425333,779.445314,AFG,4
1,Afghanistan,Asia,1957,30.332,9240934,820.85303,AFG,4
2,Afghanistan,Asia,1962,31.997,10267083,853.10071,AFG,4
3,Afghanistan,Asia,1967,34.02,11537966,836.197138,AFG,4
4,Afghanistan,Asia,1972,36.088,13079460,739.981106,AFG,4


In [None]:
px.bar(df_world, x='year', y='pop', color='continent', hover_name='country', title='World Population Growth')

Next let's take a more detailed look at Europe's population distribution by country in 2007.  
We can filter the dataset with the query() method, passing in a boolean expression as a string parameter.

In [None]:
df_europe = px.data.gapminder().query("continent == 'Europe' and year == 2007 ")
df_europe.head()

Unnamed: 0,country,continent,year,lifeExp,pop,gdpPercap,iso_alpha,iso_num
23,Albania,Europe,2007,76.423,3600523,5937.029526,ALB,8
83,Austria,Europe,2007,79.829,8199783,36126.4927,AUT,40
119,Belgium,Europe,2007,79.441,10392226,33692.60508,BEL,56
155,Bosnia and Herzegovina,Europe,2007,74.852,4552198,7446.298803,BIH,70
191,Bulgaria,Europe,2007,73.005,7322858,10680.79282,BGR,100
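
To see how query() relates to ordinary boolean indexing, here's a minimal sketch on a tiny hand-made frame. The name `df_demo` is hypothetical and its rows are copied from the outputs above; the point is only that both approaches should select the same rows.

```python
import pandas as pd

# Tiny stand-in for the gapminder frame (values copied from the outputs above)
df_demo = pd.DataFrame({
    'country': ['Afghanistan', 'Albania', 'Austria'],
    'continent': ['Asia', 'Europe', 'Europe'],
    'year': [1952, 2007, 2007],
    'pop': [8425333, 3600523, 8199783],
})

# query() takes the filter as a string expression...
via_query = df_demo.query("continent == 'Europe' and year == 2007")

# ...which selects the same rows as a boolean mask
mask = (df_demo['continent'] == 'Europe') & (df_demo['year'] == 2007)
via_mask = df_demo[mask]

# query() can also reference Python variables with the @ prefix
target_year = 2007
via_variable = df_demo.query("year == @target_year")
```

The string form is usually shorter to read, while the mask form is easier to build up programmatically.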


We can instantiate any chart/plot as an object and access its methods for more granular control over how our data is displayed.

In [None]:
fig = px.bar(df_europe, x='country', y='pop', text='pop', color='country')
fig.update_traces(texttemplate='%{text:.2s}', textposition='outside')
fig.show()

# 📈 Line Plot

Let's create a few line plots.  
Line plots are generally used to visualize a variable y that changes along the x axis (usually time or space, but not necessarily).  
We'll be analyzing the built-in plotly.express "stocks" dataset.

In [None]:
df_stocks = px.data.stocks()
df_stocks.head()

Unnamed: 0,date,GOOG,AAPL,AMZN,FB,NFLX,MSFT
0,2018-01-01,1.0,1.0,1.0,1.0,1.0,1.0
1,2018-01-08,1.018172,1.011943,1.061881,0.959968,1.053526,1.015988
2,2018-01-15,1.032008,1.019771,1.05324,0.970243,1.04986,1.020524
3,2018-01-22,1.066783,0.980057,1.140676,1.016858,1.307681,1.066561
4,2018-01-29,1.008773,0.917143,1.163374,1.018357,1.273537,1.040708


We'll instantiate the go.Figure() class as fig, which will handle the graph layout.  
Using the fig.add_trace() method we'll be able to add plots to our layout.  
With graph_objects, a line chart is created with the go.Scatter() class by setting mode='lines'.  
Each plot object (go.Scatter() in this case) is passed as the first parameter of the fig.add_trace() method.

In [None]:
fig = go.Figure()
# the prices start at 1.0, so subtracting 1 shows the relative performance
fig.add_trace(go.Scatter(x=df_stocks.date, y=df_stocks.GOOG - 1, mode='lines', name='Google'))
fig.add_trace(go.Scatter(x=df_stocks.date, y=df_stocks.AAPL - 1, mode='lines', name='Apple'))
fig.add_trace(go.Scatter(x=df_stocks.date, y=df_stocks.AMZN - 1, mode='lines+markers', name='Amazon'))

fig.update_layout(
    title='Some Tech Stocks Performance (Jan 2018 - Jan 2020)',
    xaxis_title='Date', yaxis_title='Price'
)
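
The stocks dataset appears to be normalized so that every series starts at 1.0, which is why subtracting 1 gives a performance figure. Here's a rough sketch of how raw prices could be brought into that shape. The tickers `STOCK_A` and `STOCK_B` and their prices are made up for illustration.

```python
import pandas as pd

# Hypothetical raw closing prices (not taken from the dataset)
raw = pd.DataFrame({
    'date': pd.to_datetime(['2018-01-01', '2018-01-08', '2018-01-15']),
    'STOCK_A': [50.0, 55.0, 52.5],
    'STOCK_B': [200.0, 190.0, 210.0],
})

prices = raw.set_index('date')

# Divide by the first row so every series starts at 1.0
normalized = prices / prices.iloc[0]

# Subtracting 1 turns the index into a return relative to the first date
performance = normalized - 1
```

Here `STOCK_A` ends the second week at roughly +10% and `STOCK_B` at roughly -5%, which is the kind of value plotted on the y axis above.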

# 🤔 Advanced Styling 1 | update_layout()

Now we'll be going bonkers on the previous graph's styling by adding a lot of parameters to the go.Figure.update_layout() method.

In [None]:
fig = go.Figure()
# the prices start at 1.0, so subtracting 1 shows the relative performance
fig.add_trace(go.Scatter(x=df_stocks.date, y=df_stocks.GOOG - 1, mode='markers', name='Google'))
fig.add_trace(go.Scatter(x=df_stocks.date, y=df_stocks.AAPL - 1, mode='lines', name='Apple'))
fig.add_trace(go.Scatter(x=df_stocks.date, y=df_stocks.AMZN - 1, mode='lines+markers', name='Amazon'))

fig.update_layout(
    title='Some Tech Stocks Performance (Jan 2018 - Jan 2020)',
    xaxis=dict(
        showline=True, showgrid=False, showticklabels=True,
        linecolor='rgb(204, 204, 204)', linewidth=2, ticks='outside',
        tickfont=dict(
            family='Arial', size=12, color='rgb(82, 82, 82)'
        )
    ),
    yaxis=dict(
        showgrid=False, showline=False, tickformat='.2%'
    ),
    margin=dict(
        autoexpand=False, l=100, r=20, t=110
    ),
    showlegend=False,
    plot_bgcolor='white'
)

# 🔵 Scatter Plot

We already introduced scatter plots in the line plot section, so let's build upon that.  
The main difference between a scatter plot and a line plot is that a scatter plot is generally used to explore the correlation between 2 variables, or how the data points cluster into discrete groups.

In [None]:
df_iris = px.data.iris()
df_iris.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species,species_id
0,5.1,3.5,1.4,0.2,setosa,1
1,4.9,3.0,1.4,0.2,setosa,1
2,4.7,3.2,1.3,0.2,setosa,1
3,4.6,3.1,1.5,0.2,setosa,1
4,5.0,3.6,1.4,0.2,setosa,1


In [None]:
fig = px.scatter(
    df_iris, x='sepal_width', y='sepal_length',
    color='species', size='petal_width', hover_data=['petal_length']
)
fig.update_layout(width= 1000, height=600)
fig.show()

We can see there is some correlation between sepal_width and sepal_length, and the "setosa" species is in a cluster that's distinct from "versicolor" and "virginica".

For large datasets (more than roughly 10,000 rows), the Scattergl class, which renders with WebGL, is faster than Scatter, but it has fewer features.

In [None]:
fig = go.Figure()
fig.add_trace(
    go.Scattergl(
        x=np.random.randn(100000),
        y=np.random.randn(100000),
        mode='markers',
        marker=dict(
            color=np.random.randn(100000),
            colorscale='sunsetdark',
            line_width=1
        )
    )
)
fig.show()

# 🍩 Donut Chart (a.k.a. Pie Chart But Better)

Donut Charts and Pie Charts are useful when identifying which categories of data represent a bigger slice of the whole, based on a numeric variable.  
For example, let's see how the population of Asia (numeric variable) is distributed among Asia's countries (categorical variable).

In [None]:
df_asia = px.data.gapminder().query("continent == 'Asia' and year == 2007")
df_asia.head()

Unnamed: 0,country,continent,year,lifeExp,pop,gdpPercap,iso_alpha,iso_num
11,Afghanistan,Asia,2007,43.828,31889923,974.580338,AFG,4
95,Bahrain,Asia,2007,75.635,708573,29796.04834,BHR,48
107,Bangladesh,Asia,2007,64.062,150448339,1391.253792,BGD,50
227,Cambodia,Asia,2007,59.723,14131858,1713.778686,KHM,116
299,China,Asia,2007,72.961,1318683096,4959.114854,CHN,156


In [None]:
fig = px.pie(
    df_asia, values='pop', names='country',
    hole=0.5
)
fig.update_layout(height=1000, title='Population of Asia')
fig.show()

# 🤔 Advanced Styling 2 | Data Manipulation

To reach our visualization goals, we'll sometimes need to manipulate the datasets we're given as input.  
To further style the previous chart, let's define an additional array that will specify how much we should pull each sector from the chart, based on its percentage value.  
Let's say that our rule is that if a sector's percentage value is less than 1.5%, we pull it away from the center of the donut by 60%.

In [None]:
tot_asia_pop = df_asia['pop'].sum()
asia_pop_perc = df_asia['pop'] / tot_asia_pop
pull_array = asia_pop_perc.map(lambda x: 0.6 if x < 0.015 else 0).values
pull_array

array([0.6, 0.6, 0. , 0.6, 0. , 0.6, 0. , 0. , 0. , 0.6, 0.6, 0. , 0.6,
       0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0. , 0. , 0.6, 0.6,
       0.6, 0.6, 0.6, 0. , 0. , 0.6, 0.6])
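
The same rule can also be written without a lambda. Here's a small sketch using np.where on made-up shares; `shares` and `pull_demo` are hypothetical names and the numbers are not taken from the dataset.

```python
import numpy as np

# Hypothetical population shares that sum to 1 (not taken from the dataset)
shares = np.array([0.50, 0.30, 0.12, 0.07, 0.01])

# Pull a slice out by 0.6 when its share is below 1.5%, otherwise leave it in place
pull_demo = np.where(shares < 0.015, 0.6, 0.0)
```

Only the last slice (1%) falls below the threshold, so it's the only one that gets pulled.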

We'll also be doing some extra styling in the update_traces and update_layout methods.

In [None]:
fig = go.Figure()
fig.add_trace(
    go.Pie(
        values=df_asia['pop'], labels=df_asia['country'], hole=0.5
    )
)
fig.update_traces(
    pull=pull_array,
    rotation=135,
    hoverinfo='label+value+percent',
    marker=dict(
        colors=px.colors.sequential.Agsunset
    )
)
fig.update_layout(
    height=1000, title='Population of Asia'
)
fig.show()

# 🎲 Histogram

Let's analyze the probability of each total when rolling 2 six-sided dice.  
We'll do it with a histogram.

In [None]:
die1 = np.random.randint(1, 7, 100000)
die2 = np.random.randint(1, 7, 100000)
dice = die1 + die2

In [None]:
fig = px.histogram(
    dice, nbins=11, labels={'value': 'Dice Roll'},
    title='100000 Dice Roll Histogram',
    color_discrete_sequence=['darkred']
)
fig.update_layout(showlegend=False)
fig.show()
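
As a sanity check for the histogram, the exact probabilities can be worked out by enumerating all 36 outcomes. This is a small standard-library sketch; the names `totals` and `exact_prob` are just for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two six-sided dice
totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Exact probability of each total from 2 to 12
exact_prob = {total: Fraction(count, 36) for total, count in sorted(totals.items())}

for total, prob in exact_prob.items():
    print(f'{total:>2}: {prob} = {float(prob):.2%}')
```

With enough simulated rolls, the bar heights should approach these values, with 7 being the most likely total.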

# 🤔 Advanced Styling 3 | Subplots

Let's say that we want to show both value counts and the probability distribution (value_count / total).  
We'll need to generate 2 plots. We'll do that by using subplots.  
We imported subplots from plotly at the start of the notebook, and we'll be using that module here.

In [None]:
fig = subplots.make_subplots(
    rows=1, cols=2,
    subplot_titles=['Value Counts', 'Value Probability']
)

fig.add_trace(
    go.Histogram(
        x=dice, nbinsx=11,
        marker=dict(
            color=['darkred'] * 11
        ),
        hoverinfo='x+y'
    ),
    row=1, col=1
)

fig.add_trace(
    go.Histogram(
        x=dice, nbinsx=11, histnorm='probability',
        marker=dict(
            color=['darkblue'] * 11
        ),
        hoverinfo='x+y'
    ),
    row=1, col=2
)

fig.update_layout(
    showlegend=False,
    yaxis2_tickformat='.2%'
)

fig.show()

Since the 2 distributions are the same but with different y-axis scaling, we can consolidate the 2 subplots into 1 by using the secondary_y attribute.

In [None]:
fig = subplots.make_subplots(
    rows=1, cols=1,
    subplot_titles=['100000 Dice Roll Distribution'],
    specs=[[{'secondary_y': True}]]
)

fig.add_trace(
    go.Histogram(
        x=dice, nbinsx=11,
        marker=dict(
            color=['darkred'] * 11
        ),
        hoverinfo='x+y'
    ),
    row=1, col=1, secondary_y=False
)

fig.add_trace(
    go.Histogram(
        x=dice, nbinsx=11, histnorm='probability',
        marker=dict(
            color=['darkred'] * 11
        ),
        hoverinfo='x+y'
    ),
    row=1, col=1, secondary_y=True
)

fig.update_layout(
    showlegend=False,
    yaxis2_tickformat='.2%'
)

fig.show()

# 🗳️ Box Plot

A Box Plot is used to visualize a numeric variable's quartiles.  
When you have a sorted array of numeric values, the quartiles are defined as follows.  
Q0 is the lowest data point, excluding outliers, so equal or close to 0% of the data.  
Q1 is the point that splits off the bottom 25% of the data.  
Q2 (better known as the median) is the point that splits off the bottom 50% of the data.  
Q3 is the point that splits off the bottom 75% of the data.  
Q4 is the highest data point, excluding outliers, so equal or close to 100% of the data.  
In the Box Plot, the box spans the data between Q1 and Q3.  
The line inside the box is the median (or Q2).  
The bottom whisker spans the data between Q0 and Q1.  
The upper whisker spans the data between Q3 and Q4.  
Outliers are displayed as data points outside the whiskers.
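
To make these definitions concrete, here's a minimal NumPy sketch on a made-up sample. It assumes the common 1.5 × IQR (Tukey) rule for deciding what counts as an outlier; plotly's own quartile method may differ slightly, so treat the numbers as illustrative rather than as what the chart will show.

```python
import numpy as np

# Hypothetical sample of tip amounts (not taken from the dataset)
tips_demo = np.array([1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 10.0])

# Quartiles with NumPy's default linear interpolation
q1, q2, q3 = np.percentile(tips_demo, [25, 50, 75])
iqr = q3 - q1

# Tukey fences (assumed rule): points beyond 1.5 * IQR from the box are outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inliers = tips_demo[(tips_demo >= lower_fence) & (tips_demo <= upper_fence)]
outliers = tips_demo[(tips_demo < lower_fence) | (tips_demo > upper_fence)]

# The whiskers end at the most extreme points that are not outliers
whisker_low, whisker_high = inliers.min(), inliers.max()
```

In this sample the value 10.0 lands outside the upper fence, so the upper whisker stops at 4.0 and 10.0 would be drawn as a separate point.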

In [None]:
df_tips = px.data.tips()
df_tips.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


In [None]:
px.box(df_tips, x='day', y='tip', color='sex')

In [None]:
col_cycle = itertools.cycle(px.colors.qualitative.Alphabet)
fig = go.Figure()
for sex in df_tips['sex'].unique():
    df_plot = df_tips.loc[df_tips['sex'] == sex]
    color = next(col_cycle)

    fig.add_trace(
        go.Box(
            x=df_plot['day'], y=df_plot['tip'],
            line=dict(
                color=color
            ),
            notched=True,
            name=sex,
            boxmean='sd'
        )
    )
fig.update_layout(
    boxmode='group',
    title='Tips by Day and Sex',
    yaxis_tickformat='$'
)
fig.show()

# 🎻 Violin Plot

A Violin plot is similar to a Box Plot, but it applies a Kernel Density Estimation (KDE) to the data, thereby making the visualization smoother.
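
To get a feel for what a KDE does, here's a rough hand-rolled sketch with a Gaussian kernel. The function name `gaussian_kde_1d`, the sample values and the bandwidth are all assumptions made for illustration; plotly picks its own bandwidth and does this work for us.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average one Gaussian bump per sample, evaluated at each grid point."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Pairwise scaled distances, shape (len(grid), len(samples))
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

# Hypothetical bill amounts (not taken from the dataset)
bills = [10.0, 12.0, 13.0, 20.0, 22.0, 40.0]
grid = np.linspace(-20, 80, 1001)
density = gaussian_kde_1d(bills, grid, bandwidth=4.0)

# A density should integrate to roughly 1 over a wide enough grid
area = float(np.sum(density) * (grid[1] - grid[0]))
```

The width of the violin at each y value corresponds to a curve like `density`: wide where many data points sit close together, narrow where they're sparse.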

In [None]:
df_tips.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


In [None]:
# let's compare it to the box plot by setting box=True
px.violin(df_tips, y='total_bill', box=True, points='all')

Let's look at a two-sided Violin Plot, a way to compare 2 distributions of the same numeric variable.

In [None]:
fig = go.Figure()

df_yes = df_tips.loc[df_tips['smoker'] == 'Yes']
fig.add_trace(
    go.Violin(
        x=df_yes['day'], y=df_yes['total_bill'], name='Smoker',
        legendgroup='Yes', scalegroup='Yes',
        side='negative', line_color='red'
    )
)

df_no = df_tips.loc[df_tips['smoker'] == 'No']
fig.add_trace(
    go.Violin(
        x=df_no['day'], y=df_no['total_bill'], name='Non Smoker',
        legendgroup='No', scalegroup='No',
        side='positive', line_color='green'
    )
)

fig.update_layout(
    title='Total Bill by Day | Smokers vs. Non Smokers',
    violinmode='overlay'  # draw both halves at the same x position
)

fig.show()

# 🟧 Density Heatmap (or 2D Histogram)

A Density Heatmap is kinda like a 2D Histogram, where the x and y axes are dedicated to 2 variables and the "z axis" (or rather the color of the Heatmap cell) is dedicated to the count/sum of occurrences.

Next let's overlay the number of passengers in the Heatmap cells and show the Year over Year growth as a Bar Chart.  
We'll be doing some data transformation to get the YoY growth.
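
Before moving on to the flights data, here's a minimal sketch of the 2D histogram idea itself, using np.histogram2d on a handful of made-up points. The arrays `x` and `y` are hypothetical and only there to show how points are counted per cell.

```python
import numpy as np

# Hypothetical paired observations (not taken from any dataset in this tutorial)
x = np.array([0.5, 0.6, 1.5, 1.7, 1.8, 0.2])
y = np.array([0.1, 0.4, 1.2, 1.9, 1.5, 1.6])

# Count how many points fall into each cell of a 2 x 2 grid over [0, 2] x [0, 2]
counts, x_edges, y_edges = np.histogram2d(x, y, bins=2, range=[[0, 2], [0, 2]])
```

Each entry of `counts` is the number of points in one cell (first index along x, second along y), and that count is what a heatmap would encode as color.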

In [None]:
df_flights = sns.load_dataset("flights")
df_flights.head(5)

Unnamed: 0,year,month,passengers
0,1949,Jan,112
1,1949,Feb,118
2,1949,Mar,132
3,1949,Apr,129
4,1949,May,121


Data for Heatmap

In [None]:
flights = df_flights.pivot(index='month', columns='year', values='passengers')

In [None]:
flights.head(5)

month,1949,1950,1951,1952,1953,1954,1955,1956,1957,1958,1959,1960
Jan,112,115,145,171,196,204,242,284,315,340,360,417
Feb,118,126,150,180,196,188,233,277,301,318,342,391
Mar,132,141,178,193,236,235,267,317,356,362,406,419
Apr,129,135,163,181,235,227,269,313,348,348,396,461
May,121,125,172,183,229,234,270,318,355,363,420,472


Data for Year on Year Growth

In [None]:
# sum only the passengers column for each year
df_yoy_passengers = df_flights.groupby('year', as_index=False)['passengers'].sum()
df_yoy_passengers.head(5)

Unnamed: 0,year,passengers
0,1949,1520
1,1950,1676
2,1951,2042
3,1952,2364
4,1953,2700


In [None]:
# work on a copy so the yearly totals stay untouched
df_growth = df_yoy_passengers.copy()
df_growth['prev_passengers'] = df_growth['passengers'].shift(1)
df_growth['passenger_growth_yoy'] = (df_growth['passengers'] - df_growth['prev_passengers']) / df_growth['prev_passengers']
df_growth.head(5)

Unnamed: 0,year,passengers,prev_passengers,passenger_growth_yoy
0,1949,1520,,
1,1950,1676,1520.0,0.102632
2,1951,2042,1676.0,0.218377
3,1952,2364,2042.0,0.157689
4,1953,2700,2364.0,0.142132
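
The shift-and-divide approach can also be expressed with pandas' built-in pct_change(). Here's a small sketch on the yearly totals copied from the output above; `yearly`, `growth_manual` and `growth_builtin` are names made up for this example.

```python
import pandas as pd

# Yearly totals copied from the output above
yearly = pd.DataFrame({
    'year': [1949, 1950, 1951, 1952, 1953],
    'passengers': [1520, 1676, 2042, 2364, 2700],
})

# Manual version: compare each year with the previous one
prev = yearly['passengers'].shift(1)
growth_manual = (yearly['passengers'] - prev) / prev

# Built-in version: pct_change() computes the same ratio
growth_builtin = yearly['passengers'].pct_change()
```

Both versions leave the first year as NaN, since there's no previous year to compare against.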


Plotly Express Heatmap

In [None]:
fig = px.imshow(
    flights,
    text_auto=True,
    title='Monthly and Yearly Distribution'
)
fig.show()

Plotly Graph Objects Heatmap

In [None]:
fig = go.Figure(
    data=go.Heatmap(
        z=df_flights['passengers'],
        x=df_flights['year'],
        y=df_flights['month'],
        text=df_flights['passengers'],
        texttemplate="%{text}",
        textfont={"size": 10}
    )
)

fig.show()

Custom Graph containing subplots of Bar Chart and Heatmap

In [None]:
fig = subplots.make_subplots(
    rows=2, cols=1, row_heights=[0.3, 0.7],
    subplot_titles=['YoY Growth', 'Monthly and Yearly Distribution']
)

fig.add_trace(
    go.Bar(
        x=df_growth['year'], y=df_growth['passenger_growth_yoy'], name='YoY Growth',
        marker_color='green'
    ),
    row=1, col=1
)

fig.add_trace(
    go.Heatmap(
        z=df_flights['passengers'],
        x=df_flights['year'],
        y=df_flights['month'],
        text=df_flights['passengers'],
        texttemplate="%{text}",
        textfont={"size": 10},
        name='Distribution'
    ),
    row=2, col=1
)

fig.update_layout(
    title='Flight Passengers Evolution 1949-1960',
    height=800,
    yaxis1_tickformat='.2%',
    xaxis2_title='Year', yaxis2_title='Month',
    plot_bgcolor='white'
)

fig.show()

# 🤖 3D Plots

3D Plots allow you to add a 3rd axis to your plots and rotate/pan/zoom on the plot with your mouse and keyboard.  
Let's see them in action on the flights dataset.

In [None]:
df_flights = sns.load_dataset("flights")
df_flights

Unnamed: 0,year,month,passengers
0,1949,Jan,112
1,1949,Feb,118
2,1949,Mar,132
3,1949,Apr,129
4,1949,May,121
...,...,...,...
139,1960,Aug,606
140,1960,Sep,508
141,1960,Oct,461
142,1960,Nov,390


## 3D Scatter Plot

In [None]:
fig = px.scatter_3d(
    df_flights, x='year', y='month', z='passengers',
    color='month', opacity=0.7
)

fig.show()





## 3D Line Plot

In [None]:
fig = px.line_3d(
    df_flights, x='year', y='month', z='passengers',
    color='month'
)

fig.show()





## 3D Surface Plot

Surface Plots require data to be formatted in a grid-like fashion. We'll be using numpy and the griddata function in scipy.interpolate for the heavy lifting.

In [None]:
month_to_int = {
    'Jan': 1,
    'Feb': 2,
    'Mar': 3,
    'Apr': 4,
    'May': 5,
    'Jun': 6,
    'Jul': 7,
    'Aug': 8,
    'Sep': 9,
    'Oct': 10,
    'Nov': 11,
    'Dec': 12
}
# map() avoids relying on replace() for a categorical column
df_flights['month'].map(month_to_int).astype(int)

Unnamed: 0,month
0,1
1,2
2,3
3,4
4,5
...,...
139,8
140,9
141,10
142,11


In [None]:
fig = go.Figure()

# convert the month labels to integers once and reuse the result
month_num = df_flights['month'].map(month_to_int).astype(int)

# 12 evenly spaced points along each axis
xi = np.linspace(df_flights['year'].min(), df_flights['year'].max(), num=12)
yi = np.linspace(month_num.min(), month_num.max(), num=12)

x_grid, y_grid = np.meshgrid(xi, yi)

# interpolate the passenger counts onto the grid
z_grid = griddata(
    (df_flights['year'], month_num),
    df_flights['passengers'],
    (x_grid, y_grid),
    method='cubic'
)

fig.add_trace(
    go.Surface(
        x=x_grid, y=y_grid, z=z_grid,
        colorscale='viridis'
    )
)

fig.show()
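
When every (x, y) pair appears exactly once, as seems to be the case for the flights data, the grid can also be built with a plain pivot and no interpolation. Here's a sketch on a small stand-in frame; `df_demo` is a hypothetical name and its values are copied from the pivot output shown earlier.

```python
import numpy as np
import pandas as pd

# Small stand-in for the flights data: every (year, month) pair appears once
df_demo = pd.DataFrame({
    'year': [1949, 1949, 1949, 1950, 1950, 1950],
    'month': [1, 2, 3, 1, 2, 3],
    'passengers': [112, 118, 132, 115, 126, 141],
})

# Rows become months, columns become years, cells hold the z values
grid = df_demo.pivot(index='month', columns='year', values='passengers')

z_grid = grid.to_numpy()  # shape (n_months, n_years)
x_grid, y_grid = np.meshgrid(grid.columns.to_numpy(), grid.index.to_numpy())
```

The three arrays have matching shapes, so they could be passed to go.Surface as x, y and z. Interpolation with griddata is still the better fit when the points are scattered or a finer grid is wanted.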

# 🤯 Scatter Matrix

A scatter matrix creates one 2D scatter plot for every pair of columns, so we can explore the relationships that might exist between the variables.  
$$n = \#\,\text{columns} \implies \#\,\text{plots} = n^2$$

In [None]:
fig = px.scatter_matrix(df_flights)
fig.show()

You can highlight a certain variable with the color parameter.

In [None]:
fig = px.scatter_matrix(df_flights, color='passengers')
fig.show()

This tool is very useful for quickly exploring correlation between variables.

# 🌍 Map Scatter Plot

The map plot is my favorite plot, because it allows us to see data on our home, Planet Earth.

In [None]:
df_world_now = df_world.query("year == 2007")
df_world_now.head()

Unnamed: 0,country,continent,year,lifeExp,pop,gdpPercap,iso_alpha,iso_num
11,Afghanistan,Asia,2007,43.828,31889923,974.580338,AFG,4
23,Albania,Europe,2007,76.423,3600523,5937.029526,ALB,8
35,Algeria,Africa,2007,72.301,33333216,6223.367465,DZA,12
47,Angola,Africa,2007,42.731,12420476,4797.231267,AGO,24
59,Argentina,Americas,2007,75.32,40301927,12779.37964,ARG,32


In [None]:
fig = px.scatter_geo(
    df_world_now, locations="iso_alpha", color="continent",
    size="pop", hover_name="country",
    projection="orthographic"
)
fig.show()

# ⭕ Polar Charts

Polar Charts display data in polar coordinates.  
Any time you have variables dependent on directions/angles, polar charts are a good idea.
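
As a reminder of what polar coordinates mean, here's a small sketch that turns a compass direction and a radius into regular x/y coordinates. It assumes the usual 16-point compass, measured clockwise from north; the helper names are made up for this example.

```python
import math

# The 16 compass points in clockwise order starting from north
COMPASS = ['N', 'NNE', 'NE', 'ENE', 'E', 'ESE', 'SE', 'SSE',
           'S', 'SSW', 'SW', 'WSW', 'W', 'WNW', 'NW', 'NNW']

def compass_to_degrees(direction):
    """Angle in degrees, measured clockwise from north (22.5 degrees per step)."""
    return COMPASS.index(direction) * 22.5

def polar_to_cartesian(r, direction):
    """Convert (r, compass direction) to x (east) and y (north)."""
    theta = math.radians(compass_to_degrees(direction))
    return r * math.sin(theta), r * math.cos(theta)

x, y = polar_to_cartesian(0.5, 'E')
```

A point with radius 0.5 pointing east ends up at roughly (0.5, 0), which is where a polar chart would place it relative to the center.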

In [None]:
df_wind = px.data.wind()
df_wind.head()

Unnamed: 0,direction,strength,frequency
0,N,0-1,0.5
1,NNE,0-1,0.6
2,NE,0-1,0.5
3,ENE,0-1,0.4
4,E,0-1,0.4


## Scatter Polar

In [None]:
px.scatter_polar(
    df_wind, r='strength', theta='direction',
    color='frequency', size='frequency'
)

## Line Polar

In [None]:
px.line_polar(
    df_wind, r='frequency', theta='direction',
    color='strength', line_close=True, template='plotly_dark'
)

# 🔺 Ternary Plot

Ternary Plot tries to have its cake and eat it too.  
It projects 3D space into 2D space by placing three axes (a, b and c) on the sides of an equilateral triangle, so each point shows the relative proportions of 3 variables.
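
Here's a rough sketch of the math behind that projection: the three values are normalized so they sum to 1, then mapped into a triangle. The corner layout (a on top, b bottom-left, c bottom-right) is an assumption made for illustration, and `ternary_to_xy` is a hypothetical helper, not a plotly function.

```python
import math

def ternary_to_xy(a, b, c):
    """Map three non-negative components to a point inside a unit triangle."""
    total = a + b + c
    # Only the proportions matter, so normalize the components to sum to 1
    a, b, c = a / total, b / total, c / total
    # Assumed corner layout: a at the top, b at bottom-left, c at bottom-right
    x = c + 0.5 * a
    y = (math.sqrt(3) / 2) * a
    return x, y

# First row of the experiment dataset shown in the output below
x, y = ternary_to_xy(96.876065, 93.417942, 73.033193)
```

A row where all three values are equal lands in the center of the triangle, and a row dominated by one value moves towards that value's corner.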

In [None]:
df_exp = px.data.experiment()
df_exp.head()

Unnamed: 0,experiment_1,experiment_2,experiment_3,gender,group
0,96.876065,93.417942,73.033193,male,control
1,87.301336,129.603395,66.056554,female,control
2,97.691312,106.187916,103.422709,male,treatment
3,102.978152,93.814682,56.99587,female,treatment
4,87.106993,107.019985,72.140292,male,control


In [None]:
fig = px.scatter_ternary(
    df_exp, a='experiment_1', b='experiment_2', c='experiment_3',
    color='group', hover_name='gender',
)

fig.show()

# 🤔 Advanced Styling 4 | Facets

Facets allow us to easily create multiple versions of the same plot, depending on a certain facet variable.  
Let's look at a simple example.

In [None]:
df_tips.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


In [None]:
px.scatter(
    df_tips, x='total_bill', y='tip', color='smoker',
    facet_col= 'sex'
)

From a higher-dimensional geometry point of view, facets let us see discrete slices along the dimension of the facet variable.
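
One way to picture those slices is as the groups of a pandas groupby. Here's a minimal sketch using the first rows of the tips dataset, copied from the output above; `df_demo`, `panels` and `panel_sizes` are hypothetical names.

```python
import pandas as pd

# First rows of the tips dataset, copied from the output above
df_demo = pd.DataFrame({
    'total_bill': [16.99, 10.34, 21.01, 23.68, 24.59],
    'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
    'sex': ['Female', 'Male', 'Male', 'Male', 'Female'],
})

# Each facet panel is drawn from one of these per-category subsets
panels = {value: subset for value, subset in df_demo.groupby('sex')}
panel_sizes = {value: len(subset) for value, subset in panels.items()}
```

With facet_col='sex', each panel would plot the rows of one of these subsets on its own set of axes.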

We can define an additional facet with the facet_row parameter.

In [None]:
px.histogram(
    df_tips, x='total_bill', y='tip', color='sex',
    facet_row='time', facet_col='day',
    category_orders={
        "day": ["Thur", "Fri", "Sat", "Sun"],
        "time": ["Lunch", "Dinner"]
    }
)

Last example

In [None]:
df_att = sns.load_dataset("attention")
df_att.head()

Unnamed: 0.1,Unnamed: 0,subject,attention,solutions,score
0,0,1,divided,1,2.0
1,1,2,divided,1,3.0
2,2,3,divided,1,3.0
3,3,4,divided,1,5.0
4,4,5,divided,1,4.0


In [None]:
df_att = df_att.groupby(['attention', 'solutions', 'subject'], as_index=False)['score'].sum()
df_att.head()

Unnamed: 0,attention,solutions,subject,score
0,divided,1,1,2.0
1,divided,1,2,3.0
2,divided,1,3,3.0
3,divided,1,4,5.0
4,divided,1,5,4.0


In [None]:
fig = px.bar(
    df_att, x='solutions', y='score', color='attention', facet_col='subject',
    facet_col_wrap=5, title='Scores Based on Student Attention'
)
fig.show()

# 🤔 Advanced Styling 5 | Animation

Animation lets you animate your plots by changing a certain variable over time.  
animation_frame will be the variable that evolves with time.

In [None]:
df_world.head()

Unnamed: 0,country,continent,year,lifeExp,pop,gdpPercap,iso_alpha,iso_num
0,Afghanistan,Asia,1952,28.801,8425333,779.445314,AFG,4
1,Afghanistan,Asia,1957,30.332,9240934,820.85303,AFG,4
2,Afghanistan,Asia,1962,31.997,10267083,853.10071,AFG,4
3,Afghanistan,Asia,1967,34.02,11537966,836.197138,AFG,4
4,Afghanistan,Asia,1972,36.088,13079460,739.981106,AFG,4


In [None]:
px.scatter(
    df_world, x='gdpPercap', y='lifeExp',
    animation_frame='year',
    size='pop', color='continent', hover_name='country',
    log_x=True, size_max=50,
    range_x=[100, 100_000], range_y=[25, 90]
)

In [None]:
px.bar(
    df_world, x='continent', y='pop', color='continent',
    animation_frame='year',
    range_y=[0, 4_000_000_000]
)

# 🙇‍♂️ Conclusion

This marks the end of our tutorial.  
I hope you found this notebook useful/fun!  
For more information visit the [plotly python documentation](https://plotly.com/python/).