
# Introduction to Plotly

## Imports

In [51]:
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd

In [52]:
from google.colab import drive

drive.mount('/content/gdrive')

drive_path = '/content/gdrive/MyDrive/Colab Notebooks/Data Science/DataCamp Introduction to Data Visualization with Plotly in Python/data/'

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


## Plotly and the Plotly Figure

- Plotly is a JavaScript graphing library with a Python wrapper
- `plotly.express` is the low-code, low-effort option
- figures are customizable
- figures are interactive by default
- Plotly graphs can be created with:
    1. `plotly.express` (`px`)
    2. `plotly.graph_objects` (`go`) for more customization
    3. `plotly.figure_factory` for specific, advanced figures
- Documentation
    - https://plotly.com/python/
    - https://plotly.com/python/graph-objects/ (`go`)
    - https://plotly.com/python/creating-and-updating-figures/
- Three main components of a Plotly figure:
    1. `layout`: a dictionary controlling the style of the figure
        - one `layout` per figure
    2. `data`: a list of dictionaries, each setting a graph type and the data itself
        - graph type + data = a `trace`; there are over 40 trace types
        - a figure can have multiple traces
    3. `frames`: for animated plots (beyond the scope of this course)
- Printing a figure shows its internal structure:
    ```python
    print(fig)
    ```


In [53]:
figure_config = dict({
    'data': [{'type': 'bar',
              'x': ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'],
              'y': [28, 27, 25, 31, 32, 35, 36]}],
    'layout': {'title': {'text': 'Temperatures of the week',
                         'x': 0.5,
                         'font': {'color': 'red', 'size': 15}}}
    })

In [54]:
fig = go.Figure(figure_config)
print(fig)

Figure({
    'data': [{'type': 'bar',
              'x': [Monday, Tuesday, Wednesday, Thursday, Friday, Saturday,
                    Sunday],
              'y': [28, 27, 25, 31, 32, 35, 36]}],
    'layout': {'template': '...',
               'title': {'font': {'color': 'red', 'size': 15}, 'text': 'Temperatures of the week', 'x': 0.5}}
})


In [55]:
fig.show()

### Fixing a Plotly figure

In [56]:
monthly_sales = {'data': [{'type': '',
                           'x': ['Jan', 'Feb', 'March'],
                           'y': [450, 475, 400]}],
                 'layout': {'title': {'text': ''}}}

In [57]:
monthly_sales['data'][0]['type'] = 'bar'
monthly_sales

{'data': [{'type': 'bar', 'x': ['Jan', 'Feb', 'March'], 'y': [450, 475, 400]}],
 'layout': {'title': {'text': ''}}}

In [58]:
monthly_sales['layout']['title']['text'] = 'Sales for Jan-Mar 2020'
monthly_sales

{'data': [{'type': 'bar', 'x': ['Jan', 'Feb', 'March'], 'y': [450, 475, 400]}],
 'layout': {'title': {'text': 'Sales for Jan-Mar 2020'}}}

In [59]:
fig = go.Figure(monthly_sales)
print(fig)

Figure({
    'data': [{'type': 'bar', 'x': ['Jan', 'Feb', 'March'], 'y': [450, 475, 400]}],
    'layout': {'template': '...', 'title': {'text': 'Sales for Jan-Mar 2020'}}
})


In [60]:
fig.show()

## Univariate Visualizations

- Plotly shortcut methods:
    1. `plotly.express`
        - specify a DataFrame and its columns as arguments
        - quick, but less customization
    2. `graph_objects`
        - `go.X()` constructors, one per trace type:
            - `go.Bar()`
            - `go.Scatter()`
            - ...
        - many more customization options, but more code needed
- Univariate plots display only one variable
    - for analyzing ***distribution*** of a variable
    - common univariate plots:
        - Bar chart
        - Histogram
        - Box plot
        - Density plots

### Bar plots

In [61]:
weekly_temps = pd.DataFrame({
    'day': ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'],
    'temp': [28, 27, 25, 31, 32, 35, 36]
})

weekly_temps

|   | day | temp |
|---|---|---|
| 0 | Monday | 28 |
| 1 | Tuesday | 27 |
| 2 | Wednesday | 25 |
| 3 | Thursday | 31 |
| 4 | Friday | 32 |
| 5 | Saturday | 35 |
| 6 | Sunday | 36 |


In [62]:
fig = px.bar(data_frame=weekly_temps,
             x='day',
             y='temp')
print(fig)

Figure({
    'data': [{'alignmentgroup': 'True',
              'hovertemplate': 'day=%{x}<br>temp=%{y}<extra></extra>',
              'legendgroup': '',
              'marker': {'color': '#636efa', 'pattern': {'shape': ''}},
              'name': '',
              'offsetgroup': '',
              'orientation': 'v',
              'showlegend': False,
              'textposition': 'auto',
              'type': 'bar',
              'x': array(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday',
                          'Sunday'], dtype=object),
              'xaxis': 'x',
              'y': array([28, 27, 25, 31, 32, 35, 36]),
              'yaxis': 'y'}],
    'layout': {'barmode': 'relative',
               'legend': {'tracegroupgap': 0},
               'margin': {'t': 60},
               'template': '...',
               'xaxis': {'anchor': 'y', 'domain': [0.0, 1.0], 'title': {'text': 'day'}},
               'yaxis': {'anchor': 'x', 'domain': [0.0, 1.0], 'title': {'tex

In [63]:
fig.show()

### Histograms

In [64]:
?pd.read_csv

In [65]:
penguins = pd.read_csv(drive_path + 'penguins.csv', index_col=0)
penguins.head(3)

|   | studyName | Sample Number | Species | Region | Island | Stage | Individual ID | Clutch Completion | Date Egg | Culmen Length (mm) | Culmen Depth (mm) | Flipper Length (mm) | Body Mass (g) | Sex | Delta 15 N (o/oo) | Delta 13 C (o/oo) | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | PAL0708 | 1 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N1A1 | Yes | 2007-11-11 | 39.1 | 18.7 | 181.0 | 3750.0 | MALE |  |  | Not enough blood for isotopes. |
| 2 | PAL0708 | 2 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N1A2 | Yes | 2007-11-11 | 39.5 | 17.4 | 186.0 | 3800.0 | FEMALE | 8.94956 | -24.69454 |  |
| 3 | PAL0708 | 3 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N2A1 | Yes | 2007-11-16 | 40.3 | 18.0 | 195.0 | 3250.0 | FEMALE | 8.36821 | -25.33302 |  |


In [66]:
penguins.info()

<class 'pandas.core.frame.DataFrame'>
Index: 344 entries, 1 to 344
Data columns (total 17 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   studyName            344 non-null    object 
 1   Sample Number        344 non-null    int64  
 2   Species              344 non-null    object 
 3   Region               344 non-null    object 
 4   Island               344 non-null    object 
 5   Stage                344 non-null    object 
 6   Individual ID        344 non-null    object 
 7   Clutch Completion    344 non-null    object 
 8   Date Egg             344 non-null    object 
 9   Culmen Length (mm)   342 non-null    float64
 10  Culmen Depth (mm)    342 non-null    float64
 11  Flipper Length (mm)  342 non-null    float64
 12  Body Mass (g)        342 non-null    float64
 13  Sex                  333 non-null    object 
 14  Delta 15 N (o/oo)    330 non-null    float64
 15  Delta 13 C (o/oo)    331 non-null    float64


In [67]:
fig = px.histogram(data_frame=penguins,
                   x='Body Mass (g)',
                   nbins=10)
print(fig)

Figure({
    'data': [{'alignmentgroup': 'True',
              'bingroup': 'x',
              'hovertemplate': 'Body Mass (g)=%{x}<br>count=%{y}<extra></extra>',
              'legendgroup': '',
              'marker': {'color': '#636efa', 'pattern': {'shape': ''}},
              'name': '',
              'nbinsx': 10,
              'offsetgroup': '',
              'orientation': 'v',
              'showlegend': False,
              'type': 'histogram',
              'x': array([3750., 3800., 3250., ..., 3775., 4100., 3775.]),
              'xaxis': 'x',
              'yaxis': 'y'}],
    'layout': {'barmode': 'relative',
               'legend': {'tracegroupgap': 0},
               'margin': {'t': 60},
               'template': '...',
               'xaxis': {'anchor': 'y', 'domain': [0.0, 1.0], 'title': {'text': 'Body Mass (g)'}},
               'yaxis': {'anchor': 'x', 'domain': [0.0, 1.0], 'title': {'text': 'count'}}}
})


In [68]:
fig.show()

#### Useful histogram arguments

- `orientation`: draw the bars vertically (`'v'`) or horizontally (`'h'`)
- `histfunc`: the function used to aggregate values within each bin; the default is `'count'`


### Box (and whisker) plots

In [69]:
penguins.head(1)

|   | studyName | Sample Number | Species | Region | Island | Stage | Individual ID | Clutch Completion | Date Egg | Culmen Length (mm) | Culmen Depth (mm) | Flipper Length (mm) | Body Mass (g) | Sex | Delta 15 N (o/oo) | Delta 13 C (o/oo) | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | PAL0708 | 1 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N1A1 | Yes | 2007-11-11 | 39.1 | 18.7 | 181.0 | 3750.0 | MALE |  |  | Not enough blood for isotopes. |


In [81]:
fig = px.box(data_frame=penguins,
             y='Flipper Length (mm)')
print(fig)

Figure({
    'data': [{'alignmentgroup': 'True',
              'hovertemplate': 'Flipper Length (mm)=%{y}<extra></extra>',
              'legendgroup': '',
              'marker': {'color': '#636efa'},
              'name': '',
              'notched': False,
              'offsetgroup': '',
              'orientation': 'v',
              'showlegend': False,
              'type': 'box',
              'x0': ' ',
              'xaxis': 'x',
              'y': array([181., 186., 195., ..., 193., 210., 198.]),
              'y0': ' ',
              'yaxis': 'y'}],
    'layout': {'boxmode': 'group',
               'legend': {'tracegroupgap': 0},
               'margin': {'t': 60},
               'template': '...',
               'xaxis': {'anchor': 'y', 'domain': [0.0, 1.0]},
               'yaxis': {'anchor': 'x', 'domain': [0.0, 1.0], 'title': {'text': 'Flipper Length (mm)'}}}
})


In [82]:
fig.show()

#### Useful box plot arguments

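- `hover_data`: a list of column names to display on hover
    - useful for identifying which rows the outliers come from
- `points`: which points to draw alongside the box: `'outliers'` (the default), `'suspectedoutliers'`, `'all'`, or `False`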
### Student scores bar graph