<a href="https://colab.research.google.com/github/EmAchieng/DataSciencePracticeSeries/blob/master/Plotly.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a id='Q0'></a>
<center><a target="_blank" href="http://www.propulsion.academy"><img src="https://drive.google.com/uc?id=1McNxpNrSwfqu1w-QtlOmPSmfULvkkMQV" width="200" style="background:none; border:none; box-shadow:none;" /></a> </center>
<center> <h1> Day 2, Tutorial 5: Introduction to Plotly </h1> </center>
<p style="margin-bottom:1cm;"></p>
<center><h4>Propulsion Academy, 2021</h4></center>
<p style="margin-bottom:1cm;"></p>

<div style="background:#EEEDF5;border-top:0.1cm solid #EF475B;border-bottom:0.1cm solid #EF475B;color:#303030">
    <div style="margin-left: 0.5cm;margin-top: 0.5cm;margin-bottom: 0.5cm">
        <p><strong>Goal:</strong> Learn how to produce interactive graphs with Python, using Plotly </p>
        <strong> Outline:</strong>
        <a id="P0" name="P0"></a>
        <ol>
            <li> <a href='#I'>Introduction </a> </li>
            <li> <a href='#SU'>Set up</a></li>
            <li> <a href='#P1'>Create and Save Plotly Figures</a></li>
            <li> <a href='#P2'>Common Plots</a></li>
            <li> <a href='#CL'>Conclusion</a></li>
        </ol>
        <strong>Topics Trained:</strong> data input/output, data cleaning, operations on data
    </div>
</div>

<a id='I' name="I"></a>
## [Introduction](#P0)

We saw how easy it is to import data in Python using pandas. We also learned how to explore data, perform simple statistical operations and produce some basic plots. In this session, we will focus on one specific package that allows us to produce interactive visualizations for reports, webpages and presentations: Plotly.

<a id='SU' name="SU"></a>
## [Set up](#P0)

### Package updates and installations

We upgrade plotly to the latest version:

In [None]:
!pip install --upgrade plotly

Collecting plotly
  Downloading https://files.pythonhosted.org/packages/1f/f6/bd3c17c8003b6641df1228e80e1acac97ed8402635e46c2571f8e1ef63af/plotly-4.14.3-py2.py3-none-any.whl (13.2MB)
     |████████████████████████████████| 13.2MB 326kB/s 
Installing collected packages: plotly
  Found existing installation: plotly 4.4.1
    Uninstalling plotly-4.4.1:
      Successfully uninstalled plotly-4.4.1
Successfully installed plotly-4.14.3


We install kaleido to be able to export plotly figures in a static format:

In [None]:
!pip install -U kaleido

Collecting kaleido
  Downloading https://files.pythonhosted.org/packages/ae/b3/a0f0f4faac229b0011d8c4a7ee6da7c2dca0b6fd08039c95920846f23ca4/kaleido-0.2.1-py2.py3-none-manylinux1_x86_64.whl (79.9MB)
     |████████████████████████████████| 79.9MB 48kB/s 
Installing collected packages: kaleido
Successfully installed kaleido-0.2.1


Let's now check plotly's version:

In [None]:
import plotly
plotly.__version__

'4.14.3'

In [None]:
import pandas as pd

### Google Drive Connection

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Other Packages imports

In [None]:
import plotly.graph_objects as go
import plotly.io as pio
import plotly.express as px
from plotly.subplots import make_subplots

import numpy as np
import os

### User Defined variables

In [None]:
tutorial_path = "/content/drive/MyDrive/Introduction2DataScience/tutorials/"

In [None]:
# makedirs also creates any missing parent folders (os.mkdir fails if they are absent);
# exist_ok=True avoids an error when the folder already exists
os.makedirs(f"{tutorial_path}images", exist_ok=True)

<a id='P1' name="P1"></a>
## [Create and Save Plotly Figures](#P0)

In this section, we will see how to create, modify and save a plotly figure. As an example, we will make a scatter plot from these two vectors:

In [None]:
x = [0, 1, 2, 3, 4]
y = [10., 12.5, 4.5, 5.6, 20.]

### Plotly figures: graph_objects.Figure

The simplest way to create a figure with plotly is to instantiate the `Figure` class from the submodule `graph_objects` (we already imported this module as `go`):



In [None]:
fig = go.Figure()

Once you have created the figure `fig`, you can add the data to plot using the `add_trace` method:

In [None]:
_ = fig.add_trace(go.Scatter(
    x=x,
    y=y,
    ))

Finally, you can use the method `show()` to display the figure:

In [None]:
fig.show()

Alternatively, the plotly team has implemented a series of functions that let you create common plots much faster. They are defined in the submodule `plotly.express` (we already imported it as `px`):

In [None]:
fig = px.scatter(x=x,y=y)

Likewise, you can display the figure in your notebook by typing:

In [None]:
fig.show()

### Saving a Figure

You can save the figure in the most common static formats using the figure method `write_image` (this relies on the kaleido package installed above). To do so, specify the full path of your image, including its name and extension. Plotly deduces the format in which the image needs to be saved from the extension:

In [None]:
fig.write_image(f"{tutorial_path}images/fig1.png")

In [None]:
fig.write_image(f"{tutorial_path}images/fig1.jpeg")

In [None]:
fig.write_image(f"{tutorial_path}images/fig1.svg")

In [None]:
fig.write_image(f"{tutorial_path}images/fig1.pdf")

Alternatively, you can save your figure as HTML, which preserves its interactive elements. To do this, use the method `write_html()` with the full path where you want the file to be saved:

In [None]:
fig.write_html(f"{tutorial_path}images/fig1.html")

### Common layout adjustments

#### main title and axis titles

In [None]:
fig.update_layout(title="Evolution of y as a function of x", yaxis_title='value', xaxis_title="time")

#### figure size 

Width and height are given in pixels.


In [None]:
fig.update_layout(width=700, height=300)

#### templates

You can quickly adapt the style of the plots by using templates:

In [None]:
pio.templates

In [None]:
template = ["plotly", "plotly_white", "plotly_dark", "ggplot2", "seaborn", "simple_white", "none"]

In [None]:
fig.update_layout(template=template[1])

Further information on templates is available [here](https://plotly.com/python/templates/).

<a id='P2' name="P2"></a>
## [Common plots](#P0)

As we progress in the course, we will encounter various plots. Let's check out some standard ones. For this, we will use the tips example data set that ships with plotly:

In [None]:
df = px.data.tips()

In [None]:
df.head()

### Single Variable Distribution

Use a histogram to plot the distribution of either continuous variables:

In [None]:
fig = px.histogram(df, x="tip", nbins=20)
fig.show()

or categories:

In [None]:
fig = px.histogram(df, x="time")
fig.show()

### Relations Between Variables

The function `px.scatter_matrix()` plots a grid of scatter plots, one for each pair of variables:

In [None]:
fig = px.scatter_matrix(df)
fig.show()

The function `px.density_heatmap()` shows a two-dimensional histogram of two numerical variables, along with their marginal 1D histograms:

In [None]:
fig = px.density_heatmap(df, x="total_bill", y="tip", marginal_x="histogram", marginal_y="histogram")
fig.show()

You can also plot the data in 3D and use the interaction functionality to visualize the clouds of points:

In [None]:
fig = px.scatter_3d(df, x='total_bill', y='tip', z='size',
              color='sex')
fig.show()

### Numeric Variables as a Function of Categorical Variables

There are several plots you can make to look at how numerical data vary as a function of categorical variables. The simplest one is a scatter plot in which each category has a different color:

In [None]:
fig = px.scatter(df, x="total_bill", y="tip", color="day")
fig.show()

You can also plot each category in a separate panel:

In [None]:
fig = px.scatter(df, x="total_bill", y="tip", facet_col="day")
fig.show()

You can plot summary statistics for each category using a box plot, which shows the median, the first and third quartiles, and either the minimum and maximum or the lower and upper fences:

In [None]:
fig = px.box(df, y="total_bill", x="day", color='time')
fig.show()

Alternatively, you can represent the estimated density of each category using violin plots:

In [None]:
fig = px.violin(df, y="total_bill", x="day", color = 'sex')
fig.show()

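A parallel categories plot shows how the categorical variables relate to one another, here with the ribbons colored by `total_bill`:

In [None]: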
fig = px.parallel_categories(df, color='total_bill')
fig.show()

<a id='CL' name="CL"></a>
## [Conclusion](#P0)

Plotly implements many more visualizations behind a minimalistic interface. For example, notice how few lines of code are needed to produce this animated scatter plot of GDP per capita against life expectancy:

In [None]:
df = px.data.gapminder()
fig = px.scatter(df, x="gdpPercap", y="lifeExp", animation_frame="year", animation_group="country",
           size="pop", color="continent", hover_name="country",
           log_x=True, size_max=55, range_x=[100,100000], range_y=[25,90])
fig.show()

The best way to learn about Plotly is to check their [documentation page](https://plotly.com/python/), and especially their gallery of examples!

Now, let's practice by revisiting our exploratory data analysis notebook with Plotly visualizations!