# Interactive Visualization with Plotly - 2

## Advanced plots

This is the second lesson on Plotly visualization, in which we will learn some advanced plots. These are specialized plots that convey statistical observations about the data, such as its distribution, rather than simply placing data points on a graph. That is why this course calls them advanced plots.

### Histograms

Histograms are among the most common and basic plots used to show the distribution of a dataset/feature. A histogram is built from a frequency chart:
* Let's say you have 100 observations in the range A to B.
* The range A to B is divided into equally sized buckets (or intervals, as they are called in statistics). If there are 'n' such buckets, we say that the range A to B has "n classes".
* Observations belonging to each interval are counted. The total number of observations in a given interval/bucket is its frequency count.
* Intervals are plotted on the x-axis and the frequency is plotted on the y-axis. Each interval has a bar, the height of which is determined by the frequency count.
* The bars do not have any gap between them, which denotes a continuous range broken into intervals. This distinguishes a histogram from a bar chart, where the gaps between bars denote that each bar is a separate category.
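
The counting steps above can be sketched with NumPy alone, before any plotting. This is only an illustration: the range endpoints, the number of classes and the randomly generated observations are assumptions made for the example.

```python
import numpy as np

# Hypothetical sample: 100 observations drawn uniformly from the range A to B
rng = np.random.default_rng(seed=0)
A, B = 10.0, 20.0
observations = rng.uniform(A, B, size=100)

# Divide the range into n equally sized classes and count the observations in each
n_classes = 5
counts, edges = np.histogram(observations, bins=n_classes, range=(A, B))

# Each (left edge, right edge, count) triple corresponds to one bar of the histogram
# (NumPy's last class also includes its right edge)
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.1f} to {right:.1f}: {count}")
```

The `counts` array holds the bar heights and `edges` holds the class boundaries, so there is always one more edge than there are classes.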

The histogram is easy to construct. Pass the data series/feature to a 'Histogram' object from the graph objects module, wrap the resulting trace in a list, and pass that list to go.Figure to visualize the plot.

Here is sample code.

```python
# Importing libraries
import plotly.graph_objects as go
import numpy as np

# Creating data
x = np.random.randn(500)
data = [go.Histogram(x=x)]

# Visualizing plot
go.Figure(data)
```

#### Exercise

Create a histogram for the following data:

<u>Data History:</u>
* A large lighting company has recently created a new LED light which is pending a patent.
* In order to test the design, the company has manufactured a batch of 1000 samples and has recorded various metrics for each bulb - lumen intensity, color temperature, energy consumption etc.
* The data dictionary is as given below:
    1. <i>Tester_ID</i> - This is the unique id assigned to each test unit/bulb.
    2. <i>lumen_intensity</i> - This is the average intensity of light emitted by the test unit over the period of testing. It is measured in lumens.
    3. <i>color_temperature</i> - This is the color of the light emitted by the test unit. It is measured in degrees Kelvin.
    4. <i>energy_consumption</i> - This is the amount of energy consumed by the test unit. It is measured in Watts.
    5. <i>component_temperature</i> - This is the temperature of surrounding components that dissipate energy in the form of heat. It is measured in degrees Celsius.
    6. <i>time_to_failure</i> - This is the estimated lifespan of the test unit. It is measured in years and is calculated using a function dependent upon many variables, including but not limited to lumen_intensity, energy_consumption and component_temperature.
* In this example we will use the "energy_consumption" metric for a histogram visualization.
* Refer to above example code for construction of the histogram.

```python
# Link to dataset
import pandas as pd
led_data = pd.read_csv("../../../data/LED_bulb.csv")
led_data.head()
# Write your code below
```

| Unnamed: 0 | Tester_ID | lumen_intensity | color_temperature | energy_consumption | component_temperature | time_to_failure |
|---|---|---|---|---|---|---|
| 0 | 1 | 1627 | 4952 | 16.43 | 21.1 | 13.77 |
| 1 | 2 | 1744 | 4952 | 19.38 | 21.38 | 13.31 |
| 2 | 3 | 1763 | 4956 | 21.5 | 21.15 | 13.52 |
| 3 | 4 | 1765 | 4969 | 20.76 | 21.94 | 13.71 |
| 4 | 5 | 1784 | 4953 | 20.99 | 21.03 | 13.35 |


```python
# Importing libraries
import plotly.graph_objects as go
import numpy as np

# Creating data
x = list(led_data['energy_consumption'])
data = [go.Histogram(x=x)]

# Visualizing plot
go.Figure(data)
```

### Box plots

Box plots, or box and whisker plots, are among the most common plots for showing the distribution of a data set/feature. As the name suggests, the plot has a box and whiskers. The box depicts the middle half of the data, i.e., the 25th to 75th percentile, while the whiskers extend over the bottom 25% and the top 25% of the data. Together, the box and whiskers depict the range of the dataset, and their lengths are determined by the spread of the data: the longer the box/whisker, the more spread out the data. Note that Plotly by default ends each whisker at the most extreme observation within 1.5 times the box length from the box and draws any observations beyond that as individual outlier points. There is a line within the box, which marks the 50th percentile of the data. This mark denotes the 'median', one of the measures of central tendency which you will learn later in this course.
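
The percentiles that define the box can be computed directly, which may help when reading a box plot. The sketch below assumes NumPy and uses a randomly generated sample purely for illustration; it is not taken from the course data.

```python
import numpy as np

# Hypothetical sample data, generated only for illustration
rng = np.random.default_rng(seed=1)
values = rng.normal(loc=20.0, scale=2.0, size=1000)

# The box spans q1 to q3 and the line inside it marks the median
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1  # length of the box

# Share of observations that fall inside the box
inside_box = np.mean((values >= q1) & (values <= q3))

print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}, box length={iqr:.2f}")
print(f"Share of observations inside the box: {inside_box:.2f}")
```

About half of the observations fall between `q1` and `q3`, which is why the box is described as the middle half of the data.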

In order to create a box plot, we create a trace using the 'Box' object from the graph objects module. It works like most other Plotly graph objects. Refer to the example code below, adapted from the Plotly documentation:

```python
# Importing graph objects, and numpy to help create sample data
import plotly.graph_objects as go
import numpy as np

# Creating sample data to plot
y0 = np.random.randn(50)-1
y1 = np.random.randn(50)+1

# Creating trace using 'Box' graph object
trace0 = go.Box(
    y=y0
)

# Creating a second 'Box' object - you can expect to see two box plots in the output plot
trace1 = go.Box( 
    y=y1
)

# Creating data object
data = [trace0, trace1]

# Visualizing plot
go.Figure(data)
```

We can also create grouped box plots to compare the distributions of several features and to see how they change with another variable. Here is an example of a grouped box plot. Note that the marker dictionary is used to set the color of each box plot.

```python
# Importing graph objects
import plotly.graph_objects as go

# Defining x-values
x = ['day 1', 'day 1', 'day 1', 'day 1', 'day 1', 'day 1',
     'day 2', 'day 2', 'day 2', 'day 2', 'day 2', 'day 2']

# Defining first trace object - boxplot of category 1, kale
trace0 = go.Box(
    y=[0.2, 0.2, 0.6, 1.0, 0.5, 0.4, 0.2, 0.7, 0.9, 0.1, 0.5, 0.3],
    x=x,
    name='kale',
    marker=dict(
        color='#3D9970'
    )
)

# Defining second trace object - boxplot of category 2, radishes
trace1 = go.Box(
    y=[0.6, 0.7, 0.3, 0.6, 0.0, 0.5, 0.7, 0.9, 0.5, 0.8, 0.7, 0.2],
    x=x,
    name='radishes',
    marker=dict(
        color='#FF4136'
    )
)

# Defining third trace object - boxplot of category 3, carrots
trace2 = go.Box(
    y=[0.1, 0.3, 0.1, 0.9, 0.6, 0.6, 0.9, 1.0, 0.3, 0.6, 0.8, 0.5],
    x=x,
    name='carrots',
    marker=dict(
        color='#FF851B'
    )
)

# Defining the data object with all 3 traces
data = [trace0, trace1, trace2]

# Editing layout to tell plotly that it is a grouped box plot
layout = go.Layout(
    yaxis=dict(
        title='normalized moisture',
        zeroline=False
    ),
    boxmode='group'
)

# Creating figure object
fig = go.Figure(data=data, layout=layout)

# Visualizing the plot
fig.show()
```
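
To see what each box in the grouped plot summarizes, the y-values can be grouped by their x label by hand. This is a minimal sketch using only the Python standard library and the kale values from the example above. The names `groups` and `medians` are introduced here for illustration.

```python
from statistics import median

# Same x labels and kale values as in the grouped example above
x = ['day 1'] * 6 + ['day 2'] * 6
kale = [0.2, 0.2, 0.6, 1.0, 0.5, 0.4, 0.2, 0.7, 0.9, 0.1, 0.5, 0.3]

# Collect the y-values that share an x label; each group becomes one box
groups = {}
for day, value in zip(x, kale):
    groups.setdefault(day, []).append(value)

# The median of each group is the line drawn inside the corresponding box
medians = {day: median(values) for day, values in groups.items()}
for day, m in medians.items():
    print(f"kale, {day}: median = {m:.2f}")
```

Each trace contributes one box per distinct x label, so three traces and two days give six boxes in total.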

#### Exercise

Create a boxplot for the feature 'energy_consumption' from the above LED_bulb dataset.

```python
# Link to dataset
import pandas as pd
led_data = pd.read_csv("../../../data/LED_bulb.csv")
led_data.head()
# Write your code below
```




```python
# Importing libraries
import pandas as pd
import plotly.graph_objects as go

# Loading the data set
led_data = pd.read_csv("../../../data/LED_bulb.csv")

trace_energy = go.Box(
    y=led_data['energy_consumption']
)

# Creating data object
data = [trace_energy]

# Visualizing plot
go.Figure(data)
```

### Dist plots

Distribution plots (dist plots) are another way to show the spread of a data set. A distribution plot typically overlays a histogram, which shows the frequency of observations in discrete classes, with a distribution curve that gives a continuous view of the same distribution.
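
A histogram and a curve can only share one plot if they use the same vertical scale, so the bar heights are usually rescaled from raw counts to a probability density. The sketch below shows that rescaling with NumPy on a randomly generated sample. It is assumed to mirror the `histnorm='probability density'` default of `create_distplot`, and it does not compute the curve itself.

```python
import numpy as np

# Hypothetical sample data, generated only for illustration
rng = np.random.default_rng(seed=2)
x = rng.normal(size=1000)

# Raw frequency counts per class, as in an ordinary histogram
counts, edges = np.histogram(x, bins=20)

# The same classes rescaled to a probability density
density, _ = np.histogram(x, bins=20, density=True)
widths = np.diff(edges)

# The total area of the bars is 1, the same as the area under a density curve
total_area = np.sum(density * widths)
print(f"Total observations: {counts.sum()}, total bar area: {total_area:.3f}")
```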

In order to create a distribution plot, we need to:
* Import the figure factory module from plotly
* Identify the data and labels (one label for each data series whose distribution is being plotted)
* Create a distplot figure using the create_distplot function from the figure factory module, passing the data and labels as parameters
* Visualize the plot

Example code from the Plotly documentation for creating a distplot is given below.

```python
# Import figure factory module. Provide ff as an alias
import plotly.figure_factory as ff

# Importing numpy to create data
import numpy as np

# Creating data and labels
x = np.random.randn(1000)  
hist_data = [x]
group_labels = ['distplot']

# Creating a distplot using create_distplot method
fig = ff.create_distplot(hist_data, group_labels)

# Visualize the plot
fig.show()
```

We can also create a distplot with multiple datasets/features (referred to earlier as multiple data series). Below is an example of a distplot with multiple data series.

```python
# Import figure factory module. Provide ff as an alias
import plotly.figure_factory as ff

# Importing numpy to create data
import numpy as np

# Adding histogram data
x1 = np.random.randn(200)-2  
x2 = np.random.randn(200)  
x3 = np.random.randn(200)+2  
x4 = np.random.randn(200)+4  

# Grouping data together and assigning labels to each data series
hist_data = [x1, x2, x3, x4]

group_labels = ['Group 1', 'Group 2', 'Group 3', 'Group 4']

# Create distplot with custom bin_size
fig = ff.create_distplot(hist_data, group_labels, bin_size=.2)

# Visualize the Plot!
fig.show()
```

#### Exercise

Plot the distributions of 'energy_consumption' and 'component_temperature' features from the LED_bulb data set.

```python
# Link to dataset
import pandas as pd
led_data = pd.read_csv("../../../data/LED_bulb.csv")
led_data.head()
# Write your code below
```




```python
# Solution

# Import figure factory module. Provide ff as an alias
import plotly.figure_factory as ff

# Extracting data from source
x_1 = led_data['energy_consumption']
x_2 = led_data['component_temperature']

# Defining data and labels
hist_data = [x_1, x_2]

labels = ['energy consumption', 'component temperature']

# Creating distplot
fig = ff.create_distplot(hist_data, labels)

# Visualize the Plot!
fig.show()
```

### Heatmap

A heatmap is a cross-table-like grid, where each cell of the grid corresponds to an x, y and z value, i.e., each cell can represent a 3-dimensional data point. The x value is on the x-axis, the y value is on the y-axis, and the z value is generally represented by a color. The x and y values determine the position on the grid, i.e., an observation with x,y values of (2,3) would be the cell in the second column and third row of the grid. A common convention is for the color to run from blue for low z values to red for high z values, similar to cool and hot temperatures, hence the name heatmap. The color scale can be changed to suit your needs. The varying intensity of color shows the varying intensity of the z-feature.

<b>Note:</b> The x and y values on the heatmap are generally discrete or categorical in nature. 

In order to create a heatmap, we create a trace using the 'Heatmap' object from the graph objects module. The z values should be supplied as a 2-dimensional matrix. The position of each value in the 2D matrix determines its cell in the grid: the value at z[i][j] is placed at the i-th y position (row) and the j-th x position (column). We may also set x and y labels by passing 1D arrays.

Basic Heatmap example from plotly:

```python
# Importing graph objects
import plotly.graph_objects as go

# Create trace using the Heatmap object. z is the 2D array
trace = go.Heatmap(z=[[1, 20, 30],
                      [20, 1, 60],
                      [30, 60, 1]])
# Visualize plot
go.Figure([trace])
```

Heatmap with custom x and y labels, where x and y are categorical. Example from plotly documentation:

```python
# Importing graph objects
import plotly.graph_objects as go

# Create trace. x and y have custom labels defined using a 1D array
trace = go.Heatmap(z=[[1, 20, 30, 50, 1], [20, 1, 60, 80, 30], [30, 60, 1, -10, 20]],
                   x=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'],
                   y=['Morning', 'Afternoon', 'Evening'])
# Visualize plot
go.Figure([trace])
```
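
The way the z matrix lines up with the labels can be checked without plotting. The sketch below reuses the z, x and y values from the labeled example above and builds a lookup from each (y label, x label) pair to its z value. The name `cells` is introduced here for illustration only.

```python
# Same z, x and y as in the labeled heatmap example above
z = [[1, 20, 30, 50, 1], [20, 1, 60, 80, 30], [30, 60, 1, -10, 20]]
x = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
y = ['Morning', 'Afternoon', 'Evening']

# z[i][j] is the value for row label y[i] and column label x[j]
cells = {(y[i], x[j]): z[i][j] for i in range(len(y)) for j in range(len(x))}

print(cells[('Afternoon', 'Thursday')])
print(cells[('Evening', 'Thursday')])
```

Each inner list of z is therefore one row of the grid, with as many entries as there are x labels.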

#### Exercise

Create a heatmap using the following data:
* z = [[2.51,2.13,2.57,2.78,2.43],[1.05,1.22,1.34,0.98,0.95],[14.85,15.34,16.01,15.87,15.65]]
* x = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
* y = ['Copper','Aluminum','Silver']

The z values are the traded prices of each of the 3 metals listed in the y array on each of the days listed in the x array.

```python
# Write your code below
```


```python
# Importing graph objects
import plotly.graph_objects as go

# Create trace. z has one row per metal in y and one column per day in x
trace = go.Heatmap(z=[[2.51,2.13,2.57,2.78,2.43],[1.05,1.22,1.34,0.98,0.95],[14.85,15.34,16.01,15.87,15.65]],
                   x=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'],
                   y=['Copper','Aluminum','Silver'])
# Visualize plot
go.Figure([trace])
```

### Time series plot

A time series plot is a graph where time is the feature represented on the x-axis. The plot shows how a feature trends over time and is a key part of many time-based predictive models.

A time series plot can be created using the Scatter object from the graph objects module. The x values can be supplied either as datetime objects or as date strings. Let us see examples of both from the Plotly documentation.

Example of datetime object
```python
# Importing datetime to create datetime objects, and graph objects to plot
from datetime import datetime
import plotly.graph_objects as go
import pandas_datareader.data as web

# Importing the dataset (The IEX Cloud API key can be obtained on https://iexcloud.io/)
df = web.DataReader("C", 'iex', datetime(2015, 1, 1),
                    datetime(2016, 7, 1))

#df.head()
# Creating Scatter object with the date index as x and the daily closing prices of the "C" ticker as y
data = [go.Scatter(x=df.index, y=df.close)]

# Visualizing the plot
go.Figure(data)
```

Example of date strings
```python
# Importing pandas to read from csv file, and graph objects to plot
import pandas as pd
import plotly.graph_objects as go

df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv")

# Creating Scatter object with series of date strings as x and series of closing price of Apple's stock as y
data = [go.Scatter(
          x=df.Date,
          y=df['AAPL.Close'])]

go.Figure(data)
```
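
Both forms carry the same information, so one can be converted into the other when needed. The sketch below uses only the standard library and a few hypothetical date strings in "YYYY-MM-DD" form. The strings are made up for the example and are not read from the CSV file above.

```python
from datetime import datetime

# Hypothetical date strings in "YYYY-MM-DD" form
date_strings = ['2015-02-17', '2015-02-18', '2015-02-19']

# The same dates as datetime objects
date_objects = [datetime.strptime(s, '%Y-%m-%d') for s in date_strings]

# Formatting the objects back gives the original strings
round_trip = [d.strftime('%Y-%m-%d') for d in date_objects]

print(date_objects[0], round_trip == date_strings)
```

Converting to datetime objects is mainly useful when you need date arithmetic, such as stepping forward one day at a time.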

#### Exercise

Create a time series plot by following the instructions below:
* Consider the LED_bulb data set
* Assume that each bulb in the data set was tested on its own day, with exactly one bulb tested per day
* Let's say the bulb with Tester_ID 1 was tested on January 1st 2014, the bulb with Tester_ID 2 on January 2nd 2014, and so on
* Generate the list of test dates separately and plot it on the x-axis, with the lumen_intensity feature from the LED_bulb dataset on the y-axis, to show the trend in lumen_intensity, if any

```python
# Link to dataset
import pandas as pd
led_data = pd.read_csv("../../../data/LED_bulb.csv")
led_data.head()
# Write your code below
```




```python
# Import datetime and timedelta to create and increment dates, and graph objects to plot
from datetime import datetime, timedelta
import plotly.graph_objects as go

# Creating the time feature: one date per bulb, starting from January 1st 2014
x=[]
d = datetime(2014,1,1)
for i in range(0,len(led_data)):
    x.append(d+timedelta(days=i))

# Creating data object using time and lumen intensity features
data = [go.Scatter(
          x=x,
          y=led_data['lumen_intensity'])]

# Customizing y-axis range to focus on fluctuations of lumen intensity
layout = dict(
    yaxis = dict(
        range = [1775,1825])
)

# Creating figure dictionary with data and layout
fig = dict(data=data, layout=layout)

# Visualizing the plot
go.Figure(fig)
```