# Data Visualization in Jupyter <a class="tocSkip">

A common use for notebooks is data visualization. The workspace makes this easy: several charting and visualization libraries come pre-installed and can be imported directly in Python.

**In this notebook:**

* [Matplotlib](#Matplotlib)
* [Seaborn](#Seaborn)
* [Altair](#Altair)
* [Plotly](#Plotly)
* [Bokeh](#Bokeh)
* [Chartify](#Chartify)
* [HoloViews](#HoloViews)
* [Bqplot](#Bqplot)
* [UMAP](#UMAP)
* [Pandas-Profiling](#Pandas-Profiling)
* [Missingno](#Missingno)
* [SHAP](#SHAP)
* [Pillow](#Pillow)

## Matplotlib

[Matplotlib](http://matplotlib.org/) is the most common charting package. See its [documentation](http://matplotlib.org/api/pyplot_api.html) for details and its [examples](http://matplotlib.org/gallery.html#statistics) for inspiration.

### Activate Matplotlib

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams["figure.figsize"] = (12,6)
%config InlineBackend.figure_format='retina'  # adapt plots for retina displays

### Line Plots

In [None]:
import matplotlib.pyplot as plt

x  = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y1 = [1, 3, 5, 3, 1, 3, 5, 3, 1]
y2 = [2, 4, 6, 4, 2, 4, 6, 4, 2]
plt.plot(x, y1, label="line L")
plt.plot(x, y2, label="line H")

plt.xlabel("x axis")
plt.ylabel("y axis")
plt.title("Line Graph Example")
plt.legend()
plt.show()

### Bar Plots

In [None]:
import matplotlib.pyplot as plt

# Both series have a bar at x = 4 and x = 6, so those bars overlap.
x1 = [1, 3, 4, 5, 6, 7, 9]
y1 = [4, 7, 2, 4, 7, 8, 3]

x2 = [2, 4, 6, 8, 10]
y2 = [5, 6, 2, 6, 2]

# Colors: https://matplotlib.org/api/colors_api.html

plt.bar(x1, y1, label="Blue Bar", color='b')
plt.bar(x2, y2, label="Green Bar", color='g')

plt.xlabel("bar number")
plt.ylabel("bar height")
plt.title("Bar Chart Example")
plt.legend()
plt.show()

### Histograms

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Use numpy to generate a bunch of random data in a bell curve around 5.
n = 5 + np.random.randn(1000)

m = list(range(len(n)))
plt.bar(m, n)
plt.title("Raw Data")
plt.show()

plt.hist(n, bins=20)
plt.title("Histogram")
plt.show()

plt.hist(n, cumulative=True, bins=20)
plt.title("Cumulative Histogram")
plt.show()
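
The relationship between the plain and the cumulative histogram can also be checked numerically. The following is a minimal sketch, assuming a seeded NumPy generator (`rng`) instead of the unseeded `np.random.randn` call above so that the run is repeatable; it uses `np.histogram` to compute the bin counts and `np.cumsum` to accumulate them.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator (an assumption, for repeatability)
samples = 5 + rng.standard_normal(1000)

# Bin counts, as drawn by plt.hist(samples, bins=20)
counts, bin_edges = np.histogram(samples, bins=20)

# Running total of the counts, as drawn by plt.hist(..., cumulative=True)
cumulative_counts = np.cumsum(counts)

print(counts)
print(cumulative_counts)
```

The last value of `cumulative_counts` equals the number of samples, because every sample falls into one of the 20 bins.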

### Scatter Plots

In [None]:
import matplotlib.pyplot as plt

x1 = [2, 3, 4]
y1 = [5, 5, 5]

x2 = [1, 2, 3, 4, 5]
y2 = [2, 3, 2, 3, 4]
y3 = [6, 8, 7, 8, 7]

# Markers: https://matplotlib.org/api/markers_api.html

plt.scatter(x1, y1)
plt.scatter(x2, y2, marker='v', color='r')
plt.scatter(x2, y3, marker='^', color='m')
plt.title('Scatter Plot Example')
plt.show()

### Pie Charts

In [None]:
import matplotlib.pyplot as plt

labels = 'S1', 'S2', 'S3'
sections = [56, 66, 24]
colors = ['c', 'g', 'y']

plt.pie(sections, labels=labels, colors=colors,
        startangle=90,
        explode = (0, 0.1, 0),
        autopct = '%1.2f%%')

plt.axis('equal') # Try commenting this out.
plt.title('Pie Chart Example')
plt.show()

### fill_between and alpha

In [None]:
import matplotlib.pyplot as plt
import numpy as np

ys = 200 + np.random.randn(100)
x = list(range(len(ys)))

plt.plot(x, ys, '-')
plt.fill_between(x, ys, 195, where=(ys > 195), facecolor='g', alpha=0.6)

plt.title("Fills and Alpha Example")
plt.show()

### Subplotting using Subplot2grid

In [None]:
import matplotlib.pyplot as plt
import numpy as np

def random_plots():
    """Return x values 0..19 and a random integer in 0..9 for each."""
    xs = list(range(20))
    ys = [np.random.randint(10) for _ in xs]
    return xs, ys

fig = plt.figure()
ax1 = plt.subplot2grid((5, 2), (0, 0), rowspan=1, colspan=2)
ax2 = plt.subplot2grid((5, 2), (1, 0), rowspan=3, colspan=2)
ax3 = plt.subplot2grid((5, 2), (4, 0), rowspan=1, colspan=1)
ax4 = plt.subplot2grid((5, 2), (4, 1), rowspan=1, colspan=1)

x, y = random_plots()
ax1.plot(x, y)

x, y = random_plots()
ax2.plot(x, y)

x, y = random_plots()
ax3.plot(x, y)

x, y = random_plots()
ax4.plot(x, y)

plt.tight_layout()
plt.show()

### Matplotlib styles

To customize styling further, see the [Matplotlib docs](https://matplotlib.org/users/style_sheets.html).
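
As a minimal sketch of what a style sheet looks like in practice, the cell below applies the built-in `ggplot` style temporarily with `plt.style.context`. The sine and cosine data are made up for illustration, and `plt.style.available` lists the style names that ship with your Matplotlib version.

```python
import matplotlib.pyplot as plt
import numpy as np

# Names of the style sheets installed with this Matplotlib version
print(plt.style.available)

x = np.linspace(0, 2 * np.pi, 100)

# The style applies only inside the "with" block; later plots keep the default look.
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x), label="sin(x)")
    ax.plot(x, np.cos(x), label="cos(x)")
    ax.set_title("Temporary 'ggplot' style")
    ax.legend()
    plt.show()
```

To change the style for the rest of the notebook instead, `plt.style.use("ggplot")` can be called once at the top.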

## Seaborn

There are several libraries layered on top of Matplotlib that you can use in the workspace. One that is worth highlighting is [Seaborn](http://seaborn.pydata.org). Here's a Seaborn [heatmap](https://seaborn.pydata.org/generated/seaborn.heatmap.html):

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Make a 10 x 10 heatmap of some random data
side_length = 10
# Start with a 10 x 10 matrix with values randomized around 5
data = 5 + np.random.randn(side_length, side_length)
# The next two lines make the values larger as we get closer to (9, 9)
data += np.arange(side_length)
data += np.reshape(np.arange(side_length), (side_length, 1))
# Generate the heatmap
sns.heatmap(data)
plt.show()
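
The two `+=` lines in the heatmap cell rely on NumPy broadcasting. The sketch below isolates that step on a smaller 4 x 4 matrix of zeros (the size and the variable names are chosen here for readability, not taken from the cell above) so the resulting gradient is easy to read.

```python
import numpy as np

side_length = 4  # smaller than the 10 used for the heatmap, to keep the output readable

column_index = np.arange(side_length)                       # shape (4,)
row_index = np.arange(side_length).reshape(side_length, 1)  # shape (4, 1)

gradient = np.zeros((side_length, side_length))
gradient += column_index  # broadcast over rows: adds j to every element [i, j]
gradient += row_index     # broadcast over columns: adds i to every element [i, j]

print(gradient)
```

Each element ends up as `i + j`, so values grow toward the bottom-right corner, which is the trend visible in the heatmap.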

## Altair


[Altair](https://altair-viz.github.io/index.html) is a declarative statistical visualization library for Python, based on Vega and Vega-Lite. Altair's API is simple, friendly, and consistent, and it is built on top of the Vega-Lite visualization grammar.

In [None]:
import altair as alt
# Only required for the classic Jupyter Notebook; in JupyterLab you can remove this line:
alt.renderers.enable('notebook')

from vega_datasets import data
cars = data.cars()

alt.Chart(cars).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
).interactive()

### PDVega

[pdvega](https://github.com/altair-viz/pdvega) is a library that allows you to quickly create interactive Vega-Lite plots from Pandas dataframes, using an API that is nearly identical to Pandas' built-in visualization tools, and designed for easy use within the Jupyter notebook.

In [None]:
import pdvega  # import adds vgplot attribute to pandas
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': np.random.randn(100), 'y': np.random.randn(100)})
df.vgplot.scatter(x='x', y='y')

### NX Altair

[nx_altair](https://github.com/Zsailer/nx_altair) offers a similar draw API to NetworkX but returns Altair Charts instead.

In [None]:
import networkx as nx
import numpy as np
import nx_altair as nxa

# Generate a random graph
G = nx.fast_gnp_random_graph(n=20, p=0.25)

# Compute positions for viz.
pos = nx.spring_layout(G)

# Add weights to nodes and edges
for n in G.nodes():
    G.nodes[n]['weight'] = np.random.randn()

for e in G.edges():
    G.edges[e]['weight'] = np.random.uniform(1, 10)


# Draw the graph using Altair
viz = nxa.draw_networkx(
    G, pos=pos,
    node_color='weight',
    cmap='viridis',
    width='weight',
    edge_color='black',
)

# Show it as an interactive plot!
viz.interactive()

## Plotly

[Plotly](https://github.com/plotly/plotly.py) is an interactive, open-source, and browser-based graphing library for Python.

In [None]:
import plotly
plotly.offline.init_notebook_mode(connected=False)
from plotly.graph_objs import *

import pandas as pd
import numpy as np

# Create dataframe with random data
df = pd.DataFrame(np.random.randint(0,100,size=(10, 4)), columns=list('ABCD'))

data = [Bar(x=df.A,
            y=df.B)]

plotly.offline.iplot(data)

In [None]:
import plotly
plotly.offline.init_notebook_mode()
import numpy as np
from plotly.offline import iplot
from plotly.graph_objs import *

x = np.random.randn(2000)
y = np.random.randn(2000)
iplot([Histogram2dContour(x=x, y=y, contours=histogram2dcontour.Contours(coloring='heatmap')),
       Scatter(x=x, y=y, mode='markers', marker=scatter.Marker(color='white', size=3, opacity=0.3))], show_link=False)

## Bokeh

[Bokeh](https://github.com/bokeh/bokeh) is an interactive visualization library for Python that enables beautiful and meaningful visual presentation of data in modern web browsers. 

In [None]:
# For more information, see https://bokeh.pydata.org/en/latest
from bokeh.io import output_notebook, show
output_notebook()

In [None]:
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource

# Create dataframe with random data
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,100,size=(10, 4)), columns=list('ABCD'))

source = ColumnDataSource(df)

p = figure()
p.circle(x='A', y='B', source=source)

show(p)

## Chartify

[Chartify](https://github.com/spotify/chartify) is a Python library built on top of Bokeh that makes it easy for data scientists to create charts.

In [None]:
import chartify

# Generate example data
data = chartify.examples.example_data()
    
# Plot the data
ch = chartify.Chart(blank_labels=True, x_axis_type='density')
ch.set_title("Horizontal histogram with grouping")
ch.set_subtitle("")
ch.plot.histogram(
    data_frame=data,
    values_column='unit_price',
    color_column='fruit')
ch.show()

## HoloViews


[HoloViews](https://github.com/ioam/holoviews) lets you annotate your data and have it visualize itself. You can express what you want in very few lines of code, so you can focus on what you are trying to explore and convey rather than on the process of plotting.

In [None]:
import pandas as pd 
import numpy as np
import holoviews as hv
hv.extension('bokeh')

# Load sample dataset
from sklearn.datasets import load_iris
iris = load_iris()
iris_df = pd.DataFrame(data= np.c_[iris['data'], iris['target']],
                 columns= list(iris['feature_names']) + ['species'])

# Declaring Data
from holoviews.operation import gridmatrix

ds = hv.Dataset(iris_df)

grouped_by_species = ds.groupby('species', container_type=hv.NdOverlay)
grid = gridmatrix(grouped_by_species, diagonal_type=hv.Scatter)

# Plot
grid.options('Scatter', tools=['hover', 'box_select'], bgcolor='#efe8e2', fill_alpha=0.2, size=4)

## Bqplot

[Bqplot](https://github.com/bloomberg/bqplot) is a 2-D visualization system for Jupyter, based on the constructs of the Grammar of Graphics.

In [None]:
import numpy as np
from bqplot import pyplot as plt

plt.figure(1, title='Line Chart')
np.random.seed(0)
n = 200
x = np.linspace(0.0, 10.0, n)
y = np.cumsum(np.random.randn(n))
plt.plot(x, y)
plt.show()

## UMAP

Uniform Manifold Approximation and Projection ([UMAP](https://github.com/lmcinnes/umap)) is a dimension reduction technique that can be used for visualisation similarly to t-SNE, but also for general non-linear dimension reduction. 

In [None]:
import umap
import numpy as np
from sklearn.datasets import load_digits
import matplotlib.pyplot as plt

digits = load_digits()

embedding = umap.UMAP().fit_transform(digits.data)

plt.scatter(embedding[:, 0], embedding[:, 1], c=digits.target, cmap='Spectral', s=5)
plt.gca().set_aspect('equal', 'datalim')
plt.colorbar(boundaries=np.arange(11)-0.5).set_ticks(np.arange(10))
plt.title('UMAP projection of the Digits dataset', fontsize=24);

## Pandas-Profiling

[pandas-profiling](https://github.com/pandas-profiling/pandas-profiling) creates HTML profiling reports from pandas `DataFrame` objects.

In [None]:
# Load sample dataset
import pandas as pd
meteorite_df = pd.read_csv('https://data.nasa.gov/api/views/gh4g-9sfh/rows.csv')
meteorite_df.head()

In [None]:
# Load pandas_profiling
import pandas_profiling

# Some irrelevant warnings may appear; filter them out if desired
import warnings
warnings.filterwarnings("ignore")

# Generate report for dataframe
pandas_profiling.ProfileReport(meteorite_df)

## Missingno

[Missingno](https://github.com/ResidentMario/missingno) provides a small toolset of flexible and easy-to-use missing data visualizations and utilities that allows you to get a quick visual summary of the completeness (or lack thereof) of your dataset.

In [None]:
# Load missingno
import missingno as msno

%matplotlib inline

# Load sample dataset
import pandas as pd
meteorite_df = pd.read_csv('https://data.nasa.gov/api/views/gh4g-9sfh/rows.csv')
meteorite_df.head()

### Matrix

The `msno.matrix` nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

In [None]:
msno.matrix(meteorite_df)

### Heatmap

The `msno.heatmap` visualizes nullity correlation, that is, how strongly the presence or absence of one variable affects the presence of another:

In [None]:
msno.heatmap(meteorite_df)
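
To make "nullity correlation" concrete, the sketch below computes it directly with pandas on a small made-up dataframe (`toy_df` is hypothetical, not the meteorite data). It is assumed here to be roughly the quantity the heatmap displays: the correlation between the boolean missing-value masks of each pair of columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "a" and "b" are always missing together,
# while "c" is missing exactly where "a" is present.
toy_df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan, 5.0, 6.0],
    "b": [1.0, np.nan, 3.0, np.nan, 5.0, 6.0],
    "c": [np.nan, 2.0, np.nan, 4.0, np.nan, np.nan],
})

nullity = toy_df.isnull()      # True where a value is missing
nullity_corr = nullity.corr()  # pairwise correlation of the missing-value masks

print(nullity_corr)
```

A value near 1 means two columns tend to be missing in the same rows (`a` and `b`), and a value near -1 means one is present whenever the other is missing (`a` and `c`). Columns without any missing values have a constant mask, so their correlation is undefined.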

### Bar Chart

The `msno.bar` is a simple visualization of nullity by column:

In [None]:
msno.bar(meteorite_df)

### Dendrogram

The `msno.dendrogram` allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

In [None]:
msno.dendrogram(meteorite_df)

## SHAP

[SHAP](https://github.com/slundberg/shap) is a unified approach to explain the output of any machine learning model. SHAP connects game theory with local explanations, uniting several previous methods and representing the only possible consistent and locally accurate additive feature attribution method based on expectations.

In [None]:
# Load SHAP
import shap
shap.initjs() # load JS visualization code to notebook

# train XGBoost model
import xgboost
X,y = shap.datasets.boston()
model = xgboost.train({"learning_rate": 0.01}, xgboost.DMatrix(X, label=y), 100)

# explain the model's predictions using SHAP values
# (same syntax works for LightGBM, CatBoost, and scikit-learn models)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

In [None]:
# visualize the first prediction's explanation
shap.force_plot(explainer.expected_value, shap_values[0,:], X.iloc[0,:])

In [None]:
# visualize the training set predictions
shap.force_plot(explainer.expected_value, shap_values, X)

In [None]:
# create a SHAP dependence plot to show the effect of a single feature across the whole dataset
shap.dependence_plot("RM", shap_values, X)

In [None]:
# summarize the effects of all the features
shap.summary_plot(shap_values, X)

In [None]:
shap.summary_plot(shap_values, X, plot_type="bar")
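
The additive attribution idea behind SHAP can be checked by hand on a tiny example. The sketch below computes exact Shapley values for a hypothetical three-feature function (`toy_model` and `exact_shapley` are illustrative names, not part of the `shap` package) and confirms that the base value plus the attributions equals the model output. It assumes a single all-zeros baseline rather than the expectation over a background dataset that SHAP uses, so treat it as an illustration of the principle only.

```python
from itertools import permutations

def toy_model(features):
    """Hypothetical model: a linear part plus one interaction term."""
    return 2.0 * features["a"] + 1.0 * features["b"] + 3.0 * features["a"] * features["c"]

def exact_shapley(model, instance, baseline):
    """Average each feature's marginal contribution over all feature orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for ordering in orderings:
        current = dict(baseline)            # start from the baseline input
        previous_output = model(current)
        for name in ordering:
            current[name] = instance[name]  # switch one feature to its actual value
            output = model(current)
            totals[name] += output - previous_output
            previous_output = output
    return {name: total / len(orderings) for name, total in totals.items()}

instance = {"a": 1.0, "b": 2.0, "c": 1.0}
baseline = {"a": 0.0, "b": 0.0, "c": 0.0}

phi = exact_shapley(toy_model, instance, baseline)
base_value = toy_model(baseline)

print(phi)
# Local accuracy: base value + sum of attributions equals the model output.
print(base_value + sum(phi.values()), toy_model(instance))
```

The interaction term is split evenly between `a` and `c`, while `b` receives exactly its own linear contribution. Enumerating all orderings grows factorially with the number of features, which is why this brute-force approach is only practical for toy cases.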

## Pillow

[Pillow](https://pillow.readthedocs.io/en/5.3.x/index.html) adds image processing capabilities to your Python interpreter. This library provides extensive file format support, an efficient internal representation, and fairly powerful image processing capabilities. You can use it as a very fast and simple way to visualize `numpy` arrays:

In [None]:
import numpy as np
from PIL import Image

def display_image(x):
    x_scaled = np.uint8(255 * (x - x.min()) / (x.max() - x.min()))
    return Image.fromarray(x_scaled)

display_image(np.random.rand(200,200))
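
The min-max scaling inside `display_image` divides by `x.max() - x.min()`, which is zero for a constant array. The following is one possible guarded variant of that scaling step, shown with NumPy only; `scale_to_uint8` is a hypothetical helper name, and mapping a constant array to black is an assumption made for this sketch.

```python
import numpy as np

def scale_to_uint8(x):
    """Min-max scale an array to the range 0..255 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    value_range = x.max() - x.min()
    if value_range == 0:
        # A constant array has no contrast; return black instead of dividing by zero.
        return np.zeros(x.shape, dtype=np.uint8)
    return np.uint8(255 * (x - x.min()) / value_range)

print(scale_to_uint8(np.linspace(0.0, 1.0, 5)))
print(scale_to_uint8(np.full((2, 2), 7.0)))
```

The result can be passed to `Image.fromarray` in the same way as `x_scaled` in the cell above.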

## Next Steps

- [Rich Output in Jupyter](./jupyter-basics-tutorial.ipynb#Rich-Output): Learn how you can use the Jupyter display system to incorporate a broad range of content into your Notebooks.
- [Jupyter Tips & Tricks](./jupyter-tipps.ipynb): Explore some useful functionality that you can use with Jupyter within the workspace.
- [Introduction to Numpy](./numpy-tutorial.ipynb): Introduction to datatypes, arrays, and mathematical operations of the numpy library.
- [Introduction to Pandas](./pandas-tutorial.ipynb): Introduction to the data structures and functionalities of the pandas library.