# Visualization with `Plotly Express`

Reference: https://plotly.com/python/plotly-express/

"Plotly Express provides more than 30 functions for creating different types of figures. The API for these functions was carefully designed to be as consistent and easy to learn as possible, making it easy to switch from a scatter plot to a bar chart to a histogram to a sunburst chart throughout a data exploration session."

**Table of Contents**

1. [Create a bar chart](#sec1)
2. [Create a scatter plot](#sec2)
3. [Create a heatmap](#sec3)

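Before starting, check whether the `plotly` package is installed. Plotly Express ships with it as the `plotly.express` module. If it is missing, install it with `pip install plotly`.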

In [1]:
import plotly.express as px
import json

Plotly Express comes with several built-in demo datasets that we can use. The following cell lists them:

In [None]:
print(dir(px.data))

In [3]:
# we are going to use the tips dataset
tips = px.data.tips()

The `tips` dataset is loaded as a pandas DataFrame.

In [4]:
type(tips)

pandas.core.frame.DataFrame

In [5]:
tips.head() # to see what our dataset looks like

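|   | total_bill | tip  | sex    | smoker | day | time   | size |
|---|------------|------|--------|--------|-----|--------|------|
| 0 | 16.99      | 1.01 | Female | No     | Sun | Dinner | 2    |
| 1 | 10.34      | 1.66 | Male   | No     | Sun | Dinner | 3    |
| 2 | 21.01      | 3.5  | Male   | No     | Sun | Dinner | 3    |
| 3 | 23.68      | 3.31 | Male   | No     | Sun | Dinner | 2    |
| 4 | 24.59      | 3.61 | Female | No     | Sun | Dinner | 4    |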


In [6]:
tips.shape # the dimensions: (rows, columns)

(244, 7)

<a id="sec1"></a>
## 1. Create a bar chart

First we prepare the data we want to visualize, then we call the plotting function.

In [7]:
# we want to plot the total bills paid by female and male customers
# (despite their names, these variables hold sums of total_bill, not of tip)
female_tips = sum(tips[tips["sex"]=="Female"]["total_bill"]) # notice that we're filtering by sex first
male_tips = sum(tips[tips["sex"]=="Male"]["total_bill"])

In [8]:
female_tips

1570.9499999999998

In [9]:
male_tips

3256.8200000000024

Plotly Express functions provide defaults for most of their arguments, so a graph can be created with very few of them. Notice below that we only provide the values for `x` and `y`.

In [10]:
fig = px.bar(x=["Female", "Male"], y=[female_tips, male_tips])
fig.show() # to show the plot

We can also pass the DataFrame directly and name the columns to use for `x` and `y`. Plotly Express then draws one segment per row (observation) and stacks the segments within each category.

In [11]:
fig = px.bar(tips, x="sex", y="total_bill")
fig.show() # to show the plot

Notice that the bars have the same heights in both bar charts, but in the second one Plotly added up the bills for us; we did not have to compute the bar heights ourselves.

Also, the thin white lines inside each bar separate the individual data points (rows) that make up the bar. Hover over a segment to see the values of that data point.

<a id="sec2"></a>
## 2. Create a scatter plot

For this, let's use the `election` dataset.

In [12]:
election = px.data.election()
election.head()

Below, we create a scatter plot of the number of votes for Coderre against the number of votes for Bergeron. Scatter plots are a good way to compare two numeric variables against each other.

One great thing about Plotly is that its figures are interactive: they can show information as we hover, include sliders, and so on.

In [13]:
fig1 = px.scatter(election,
                  x="Bergeron",
                  y="Coderre",
                  hover_data=["district"],
                  color="winner", # one color per distinct value of the "winner" column
                  size="total")   # marker size scales with the "total" column
fig1.show()

`px.scatter` accepts many optional arguments for features such as regression lines (trendlines) and facets. Complete reference: https://plotly.com/python/line-and-scatter/

<a id="sec3"></a>
## 3. Create a heatmap

For this, we will import our own dataset, `positional.json`. This dataset is a dictionary with ten news sources as its keys. The value of each key is a list of ten likelihoods: the k-th value is the likelihood of seeing an article from that news source given that we are looking at position k (1 to 10) of Google's Top Stories panel.

This dataset was created from data similar to the "Women for President" data (recall the Analysis Question). In addition to each article's media source, its 'position' key was used.
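
The exact procedure used to build `positional.json` is not shown here. A plausible sketch of the computation, P(source | position) = count(source, position) / count(position), is given below. The article records are hypothetical toy data, and the `"source"` field name is an assumption; only the `'position'` key comes from the text above:

```python
from collections import Counter

# Hypothetical toy records; "source" is an assumed field name.
articles = [
    {"source": "A", "position": 1},
    {"source": "B", "position": 1},
    {"source": "A", "position": 2},
    {"source": "A", "position": 1},
]


def positional_likelihood(articles, n_positions=10):
    """Return {source: [P(source | position) for position 1..n_positions]}."""
    per_position = Counter(a["position"] for a in articles)
    joint = Counter((a["source"], a["position"]) for a in articles)
    sources = sorted({a["source"] for a in articles})
    result = {}
    for src in sources:
        row = []
        for position in range(1, n_positions + 1):
            total = per_position[position]
            # Positions with no articles get a likelihood of 0.0.
            row.append(joint[(src, position)] / total if total else 0.0)
        result[src] = row
    return result


pos = positional_likelihood(articles, n_positions=3)
print(pos)
```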

In [14]:
with open('positional.json', 'r') as infile:
    pos = json.load(infile)
print(pos)

For the heatmap, the `x` axis shows the article positions (1 to 10) in Google's Top Stories panel, and the `y` axis shows the names of the news sources. The color of each cell (`z`) represents the likelihood of seeing an article from news source `y` given position `x`.

`px.imshow` takes the values as a matrix (a list of rows), so we pass one row per news source.

In [15]:
fig3 = px.imshow(list(pos.values()),                               # one row per news source
                 color_continuous_scale=px.colors.diverging.Fall,  # the color palette (a list of colors)
                 x=list(range(1, 11)),                             # positions 1 to 10
                 y=list(pos.keys()),
                 labels={
                     'x': "article position",
                     'y': "news source"
                 })
fig3.update_xaxes(side="top")  # place the position labels above the heatmap

fig3.show()

If you don't like the colors, you can pick another color palette. The following cell lists the available diverging palettes:

In [16]:
dir(px.colors.diverging)

### Your Turn

Try out different palettes and show your favorite below.

There are many other cool things that can be done with Plotly, such as maps and animations. Be sure to check out https://plotly.com/python/plotly-express/.