In [0]:
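import pandas as pd
import numpy as np
import plotly
import plotly.graph_objects as go      # trace and layout classes (graph_objs is the older name of this module)
import plotly.figure_factory as ff     # high-level helpers such as create_distplot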

np.random.seed(42)

## In this notebook we will look at the basic plots offered by the *plotly* data visualization library

### Some important pointers:
* At a very high level, plotting any figure requires 2 things:
  1. data: the data you want to plot and the type of plot to use.
  2. layout: how the plot will look, e.g. the plot title, x-axis title, y-axis title, etc.

### Enough theory, let's get our hands dirty

---

#### Scatter Plot: compares 2 variables across a set of data points. Depending on the trend of the scatter points, we may be able to interpret a *correlation*.

In [2]:
# Let's create some random data: 100 integers between 1 and 100 for each axis

random_X = np.random.randint(1,101,100)
random_Y = np.random.randint(1,101,100)

# The data object: the plot type (here go.Scatter) holding the data to draw.
# A scatter plot also needs a mode; the common modes are
# 'markers', 'lines' and 'lines+markers'.

# plotly relies heavily on dicts: most properties of an object
# (marker, axes, ...) are set by passing a dictionary.

# the data object 
data = go.Scatter(x=random_X,y=random_Y,mode = 'markers',
	marker = dict(
		size = 14, # size of marker
		color = 'rgb(220,154,96)', # color of marker
		symbol = 'circle', # symbol of marker
		line  = dict(width =2) # width of the marker's outline
		)
	)

# the layout object 
layout = go.Layout(
	title = "random X-Y scatter plot", # plot title
	xaxis = dict(title='random X'),# x axis title
	yaxis = dict(title ='random Y'), # y axis title
	hovermode = 'closest' # 'closest': show the hover label of the data point nearest the mouse pointer
	)

fig = go.Figure(data=data,layout = layout) # construct figure with data and layout.
# Passing data= and layout= as keyword arguments keeps the call explicit and avoids mixing up the argument order.

fig.show() # finally show the plot

#### Line plot: A line chart displays a series of data points (markers) connected by line segments. It is similar to a scatter plot, except that the measurement points are ordered (typically by x-axis value) and joined with straight line segments.
Line charts are often used to visualize a trend in data over intervals of time, i.e. time series data.

* There are 3 different line styles: lines, lines+markers, markers

In [3]:
x_values  = np.linspace(0,1,100)
y_values = np.random.randn(100)

# To plot more than one set of data, we use traces.
# A single trace contains a plot type and the data required by that plot.

# we are going to show all 3 line styles.
# we shift the y values (+5, 0, -5) so that the three
# traces don't end up on top of each other

trace_1 = go.Scatter(x=x_values,y=y_values+5,mode = 'markers',name='markers')

trace_2 = go.Scatter(x=x_values,y=y_values,mode = 'lines',name='lines')

trace_3 = go.Scatter(x=x_values,y=y_values-5,mode = 'markers+lines',
	name='markers+lines')


data = [trace_1,trace_2,trace_3]

layout = go.Layout(title = 'Line charts')

fig = go.Figure(data = data,layout  = layout)

fig.show()

#### Bar charts: A bar chart presents **categorical data** with rectangular bars whose heights (or lengths) are proportional to the values they represent.

#### **Remember: use bar charts for categorical data, not continuous data** (use a histogram for that)

In [4]:
# First, let's download the dataset we will use.

!wget https://raw.githubusercontent.com/biku1998/Plotly-Practice/master/Data/2018WinterOlympics.csv 

--2020-01-23 10:03:28--  https://raw.githubusercontent.com/biku1998/Plotly-Practice/master/Data/2018WinterOlympics.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 651 [text/plain]
Saving to: ‘2018WinterOlympics.csv.1’


2020-01-23 10:03:28 (128 MB/s) - ‘2018WinterOlympics.csv.1’ saved [651/651]



In [5]:
df = pd.read_csv('2018WinterOlympics.csv')
print(df.columns)
df.head()

Index(['Rank', 'NOC', 'Gold', 'Silver', 'Bronze', 'Total'], dtype='object')


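|   | Rank | NOC | Gold | Silver | Bronze | Total |
|---|---|---|---|---|---|---|
| 0 | 1 | Norway | 14 | 14 | 11 | 39 |
| 1 | 2 | Germany | 14 | 10 | 7 | 31 |
| 2 | 3 | Canada | 11 | 8 | 10 | 29 |
| 3 | 4 | United States | 9 | 8 | 6 | 23 |
| 4 | 5 | Netherlands | 8 | 6 | 6 | 20 |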


In [6]:

# Let's start with a simple bar chart: total medals per country

data = go.Bar(x=df['NOC'],y=df['Total'])

layout = go.Layout(title = "Winter Olympics Analysis")

fig = go.Figure(data = data,layout = layout)

fig.show()

In [7]:
# Let's show several values per category: one bar trace per medal type,
# drawn side by side for each country (the default barmode, 'group')

df = pd.read_csv('2018WinterOlympics.csv')


trace_1 = go.Bar(x=df['NOC'],y=df['Gold'],marker = dict(color = '#FFD700'),
		name = 'Gold')

trace_2 = go.Bar(x=df['NOC'],y=df['Silver'],marker = dict(color = '#9EA0A1'),
		name = 'Silver')

trace_3 = go.Bar(x=df['NOC'],y=df['Bronze'],marker = dict(color = '#CD7F32'),
		name = 'Bronze')


data = [trace_1,trace_2,trace_3]


layout = go.Layout(title = "Winter Olympics Medals Analysis")

fig = go.Figure(data = data,layout = layout)

fig.show()

In [8]:
# The layout's barmode controls how multiple bar traces are drawn:
# 'group' (the default) puts them side by side, 'stack' piles them on top of each other

df = pd.read_csv('2018WinterOlympics.csv')


trace_1 = go.Bar(x=df['NOC'],y=df['Gold'],marker = dict(color = '#FFD700'),
		name = 'Gold')

trace_2 = go.Bar(x=df['NOC'],y=df['Silver'],marker = dict(color = '#9EA0A1'),
		name = 'Silver')

trace_3 = go.Bar(x=df['NOC'],y=df['Bronze'],marker = dict(color = '#CD7F32'),
		name = 'Bronze')


data = [trace_1,trace_2,trace_3]


layout = go.Layout(title = "Winter Olympics Medals Analysis",barmode ='stack')

fig = go.Figure(data = data,layout = layout)

fig.show()

#### Bubble Plots: Bubble charts are very similar to scatter plots, except that a 3rd variable is conveyed through the size of the markers. We can add further variables by coloring the points, for example by category.

In [9]:
# get the dataset

!wget https://raw.githubusercontent.com/biku1998/Plotly-Practice/master/Data/mpg.csv

--2020-01-23 10:03:31--  https://raw.githubusercontent.com/biku1998/Plotly-Practice/master/Data/mpg.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 17727 (17K) [text/plain]
Saving to: ‘mpg.csv.1’


2020-01-23 10:03:31 (1.42 MB/s) - ‘mpg.csv.1’ saved [17727/17727]



In [10]:
df = pd.read_csv('mpg.csv')
print(df.columns)
df.head()

Index(['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
       'acceleration', 'model_year', 'origin', 'name'],
      dtype='object')


|   | mpg | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350.0 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318.0 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304.0 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302.0 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |


In [11]:
# Now let's build the bubble chart.
# The marker's size property encodes the 3rd variable (weight),
# and its color encodes a 4th (cylinders).
# Marker sizes are in pixels, so rescale the 3rd variable to a sensible range:
# multiply small values by a suitable factor and divide large ones.
data = [go.Scatter(
	x = df['horsepower'],
	y = df['mpg'],
	text = df['name'],
	mode = 'markers',
	marker = dict(size = df['weight']/150,color = df['cylinders'],
		showscale = True) # shows a colorbar for the marker colors (cylinders)
	)]

layout = go.Layout(title = 'Bubble chart for MPG dataset',
	xaxis = dict(title = 'horsepower'),
	yaxis = dict(title = 'mpg'),
	hovermode = 'closest'
	)

fig = go.Figure(data = data, layout = layout)

fig.show()

#### Box Plots: used to visualize the spread of a continuous numerical feature through its quartiles. We can then separate the data by a categorical feature to compare the continuous feature across categories.

In [12]:

sample_dist_A = [.209,.205,.196,.210,.202,.207,.224,.223,.220,.201]
sample_dist_B = [.225,.262,.217,.240,.230,.229,.235,.217]

# let's plot both of them, one box per sample

data = [
go.Box(y = sample_dist_A,name = 'sample_dist_A'),
go.Box(y = sample_dist_B, name = 'sample_dist_B')
]

layout = go.Layout(title = 'Comparison of 2 random samples')
fig = go.Figure(data = data, layout = layout)  # pass the layout so the title is shown

fig.show()
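To see what the boxes summarize, the sketch below uses NumPy to compute the median, the quartiles and the usual 1.5 × IQR fences for the same two samples. It uses `np.percentile` with its default linear interpolation. plotly's own quartile computation may use a different method, so its box edges can differ slightly, but the idea is the same: points beyond the fences are the ones drawn individually as outliers.

```python
import numpy as np

def box_stats(values):
    """Median, quartiles, 1.5*IQR fences and outliers of a 1-D sample."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1                       # height of the box
    lower_fence = q1 - 1.5 * iqr        # whiskers stop at the last point inside the fences
    upper_fence = q3 + 1.5 * iqr
    outliers = v[(v < lower_fence) | (v > upper_fence)]
    return {"q1": q1, "median": median, "q3": q3,
            "lower_fence": lower_fence, "upper_fence": upper_fence,
            "outliers": outliers.tolist()}

sample_dist_A = [.209, .205, .196, .210, .202, .207, .224, .223, .220, .201]
sample_dist_B = [.225, .262, .217, .240, .230, .229, .235, .217]

stats_A = box_stats(sample_dist_A)
stats_B = box_stats(sample_dist_B)
print(stats_B["outliers"])   # [0.262]
```

In sample B, 0.262 lies above the upper fence (about 0.256), so it is the value a box plot would show as a separate point.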

#### Histogram: A histogram displays the distribution of a continuous feature. To create a histogram we divide the entire range of values of the continuous feature into a series of intervals called '*bins*'. We then count the number of occurrences in each bin (each interval).
* We can change the bin size to show more or less detail.

In [13]:
df = pd.read_csv('mpg.csv')

# Simple histogram

data = [go.Histogram(x = df['mpg']
	)]

layout = go.Layout(title = 'Histogram of mpg')

fig = go.Figure(data = data, layout = layout)

fig.show()

In [14]:
# a more controlled histogram: explicit bins from 0 to 50, each 2 mpg wide

df = pd.read_csv('mpg.csv')

data = [go.Histogram(x = df['mpg'],
	xbins = dict(start = 0, end = 50,size  = 2)
	)]

layout = go.Layout(title = 'Histogram of mpg')

fig = go.Figure(data = data, layout = layout)

fig.show()
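`xbins=dict(start=0, end=50, size=2)` describes 25 bins of width 2. This sketch reproduces the counting with `np.histogram`, using the five `mpg` values from `df.head()` earlier. NumPy treats each bin as half-open `[a, b)`, except the last, which is closed. plotly's edge handling is not guaranteed to be identical, so a value sitting exactly on a bin edge could land in a neighboring bin.

```python
import numpy as np

mpg = np.array([18.0, 15.0, 18.0, 16.0, 17.0])   # mpg values from df.head() above

# edges 0, 2, 4, ..., 50 -> 25 bins of width 2, like xbins=dict(start=0, end=50, size=2)
edges = np.arange(0, 52, 2)
counts, _ = np.histogram(mpg, bins=edges)

# print only the non-empty bins
for left, count in zip(edges[:-1], counts):
    if count:
        print(f"[{left}, {left + 2}): {count}")
# [14, 16): 1
# [16, 18): 2
# [18, 20): 2
```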

#### Distribution Plots (DistPlots): A distplot typically stacks 3 layers on top of each other:

* histogram: the data is divided into bins.
* rug plot: a mark is placed along the x axis for every data point, which lets you see how the values are distributed inside each bin.
* KDE: a 'kernel density estimate', a smooth curve that estimates the shape of the distribution.
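A KDE places a small smooth bump (a kernel) on every data point and averages the bumps. The minimal hand-rolled Gaussian KDE below illustrates the idea in NumPy. It is not necessarily the exact computation `ff.create_distplot` performs. The bandwidth uses the normal-reference rule of thumb `1.06 * std * n**(-1/5)`, which is an assumption of this sketch. A larger bandwidth gives a smoother curve, and a smaller one a bumpier curve.

```python
import numpy as np

def gaussian_kde(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` at every point of `grid`."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    if bandwidth is None:
        # normal-reference rule of thumb (an assumption for this sketch)
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    # scaled distance from every grid point to every sample, shape (len(grid), n)
    u = (grid[:, None] - samples[None, :]) / bandwidth
    # one standard normal bump per sample
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    # average the bumps; dividing by the bandwidth keeps the total area at 1
    return kernels.sum(axis=1) / (n * bandwidth)

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)       # like the np.random.randn(10000) cell below, but smaller
grid = np.linspace(-6, 6, 1201)     # evaluation points, spacing 0.01
density = gaussian_kde(x, grid)

# a density should integrate to (about) 1
area = density.sum() * (grid[1] - grid[0])
```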

In [15]:
# Let's First plot a simple one

# unlike the other plots, create_distplot takes hist_data instead of data:
# a list of arrays (one per group), plus group_labels with one name per array.
x = np.random.randn(10000)

hist_data  = [x]

group_labels = ['rand-normal values']

fig = ff.create_distplot(hist_data,group_labels)

fig.show()

In [16]:
# Let's plot multiple histograms

x_1 = np.random.randn(300)+3
x_2 = np.random.randn(300)
x_3 = np.random.randn(300)-3
x_4 = np.random.randn(300)+4

hist_data  = [x_1,x_2,x_3,x_4]


group_labels = ['rand-normal x_1','rand-normal x_2','rand-normal x_3'
,'rand-normal x_4']


fig = ff.create_distplot(hist_data,group_labels,
	bin_size = [0.45,0.24,4,3])

fig.show()


In [17]:
# Let's plot the famous iris dataset.

!wget https://raw.githubusercontent.com/biku1998/Plotly-Practice/master/Data/iris.csv

df  = pd.read_csv('iris.csv')

# print(df.head())

# we will make a dist plot for petal_length for each class

classes = df['class'].unique()

hist_data = [
df[df['class'] == clas]['petal_length']
for clas in classes
]


fig = ff.create_distplot(hist_data,group_labels = classes)
fig['layout'].update(title = 'Iris Petal length distplot')
fig.show()

--2020-01-23 10:03:38--  https://raw.githubusercontent.com/biku1998/Plotly-Practice/master/Data/iris.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4464 (4.4K) [text/plain]
Saving to: ‘iris.csv.1’


2020-01-23 10:03:38 (79.5 MB/s) - ‘iris.csv.1’ saved [4464/4464]



#### Heatmaps: Heatmaps allow the visualization of 3 variables.
#### Categorical or continuous features along the x and y axes form the grid, and a 3rd continuous feature is displayed through color.

In [18]:
# Let's First download the datasets we need.
!wget https://raw.githubusercontent.com/biku1998/Plotly-Practice/master/Data/2010SantaBarbaraCA.csv
!wget https://raw.githubusercontent.com/biku1998/Plotly-Practice/master/Data/2010SitkaAK.csv
!wget https://raw.githubusercontent.com/biku1998/Plotly-Practice/master/Data/2010YumaAZ.csv
!wget https://raw.githubusercontent.com/biku1998/Plotly-Practice/master/Data/flights.csv

--2020-01-23 10:03:41--  https://raw.githubusercontent.com/biku1998/Plotly-Practice/master/Data/2010SantaBarbaraCA.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4689 (4.6K) [text/plain]
Saving to: ‘2010SantaBarbaraCA.csv.1’


2020-01-23 10:03:41 (59.9 MB/s) - ‘2010SantaBarbaraCA.csv.1’ saved [4689/4689]

--2020-01-23 10:03:44--  https://raw.githubusercontent.com/biku1998/Plotly-Practice/master/Data/2010SitkaAK.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4583 (4.5K) [text/plain]
Saving to: ‘2010SitkaAK.csv.1’


2020-01-23 10:03

In [19]:
df = pd.read_csv('2010SantaBarbaraCA.csv')
df.head()

|   | LST_DATE | DAY | LST_TIME | T_HR_AVG |
|---|---|---|---|---|
| 0 | 20100601 | TUESDAY | 0:00 | 12.7 |
| 1 | 20100601 | TUESDAY | 1:00 | 12.7 |
| 2 | 20100601 | TUESDAY | 2:00 | 12.3 |
| 3 | 20100601 | TUESDAY | 3:00 | 12.5 |
| 4 | 20100601 | TUESDAY | 4:00 | 12.7 |


In [20]:
# Let's start with a simple heatmap:
# day on the x axis, time of day on the y axis, temperature as color

data = [go.Heatmap(
	x = df['DAY'],
	y = df['LST_TIME'],
	z = df['T_HR_AVG'].values.tolist()
	)]

layout = go.Layout(title = 'Heatmap Santa Barbara Temperature',
	xaxis = dict(title = 'day'),yaxis = dict(title = 'time'))

fig = go.Figure(data = data,layout = layout)

fig.show()

In [21]:
df = pd.read_csv('2010YumaAZ.csv')

# let's build our heat map 

data = [go.Heatmap(
	x = df['DAY'],
	y = df['LST_TIME'],
	z = df['T_HR_AVG'].values.tolist(),
	colorscale = 'Jet' # we can use different colorscales
	)]

layout = go.Layout(title = 'Heatmap Yuma Arizona Temperature',
	xaxis = dict(title = 'day'),yaxis = dict(title = 'time'))

fig = go.Figure(data = data,layout = layout)

fig.show()


In [22]:
# Let's plot flight passenger counts by year and month as a heatmap

df = pd.read_csv('flights.csv')
df.head()

|   | year | month | passengers |
|---|---|---|---|
| 0 | 1949 | January | 112 |
| 1 | 1949 | February | 118 |
| 2 | 1949 | March | 132 |
| 3 | 1949 | April | 129 |
| 4 | 1949 | May | 121 |


In [23]:
# Let's build the heatmap: year on x, month on y, passenger count as color.

data = [
	go.Heatmap(
		x = df['year'],
		y = df['month'],
		z = df['passengers'].tolist()
	)
]

layout = go.Layout(title = 'Flight Passengers heatmap',
	xaxis = dict(title = 'year'),
	yaxis = dict(title = 'month'))

fig = go.Figure(data = data,layout = layout)

fig.show()

In [24]:
# Now let's see how we can plot multiple heatmaps in the same figure
# using plotly's subplots.

# load the data into 3 separate DataFrames

# we use zmin and zmax to fix the color range of the 3rd variable,
# so the same color means the same temperature in all 3 heatmaps

df1 = pd.read_csv('2010SitkaAK.csv')
df2 = pd.read_csv('2010SantaBarbaraCA.csv')
df3 = pd.read_csv('2010YumaAZ.csv')


# making 3 different traces 

trace1 = go.Heatmap(
    x=df1['DAY'],
    y=df1['LST_TIME'],
    z=df1['T_HR_AVG'],
    colorscale='Jet',
    zmin = 5, zmax = 40 # add max/min color values to make each plot consistent
)
trace2 = go.Heatmap(
    x=df2['DAY'],
    y=df2['LST_TIME'],
    z=df2['T_HR_AVG'],
    colorscale='Jet',
    zmin = 5, zmax = 40
)
trace3 = go.Heatmap(
    x=df3['DAY'],
    y=df3['LST_TIME'],
    z=df3['T_HR_AVG'],
    colorscale='Jet',
    zmin = 5, zmax = 40
)

# build the figure with plotly's subplots module.
# the 3 heatmaps share the same y axis (time of day)
from plotly.subplots import make_subplots

fig = make_subplots(rows=1, cols=3,
    subplot_titles=['Sitka, AK','Santa Barbara, CA', 'Yuma, AZ'],
    shared_yaxes = True,
)


# rows=1 and cols=3 produce a 1x3 grid of subplots.
# subplot positions are 1-indexed: the first one is (row=1, col=1), not (0, 0)
# add each trace to its subplot

fig.add_trace(trace1, row=1, col=1)
fig.add_trace(trace2, row=1, col=2)
fig.add_trace(trace3, row=1, col=3)

fig['layout'].update(      # access the layout directly!
    title='Hourly Temperatures, June 1-7, 2010'
)

fig.show()