# Session 10

## Data Visualization

![Course Hero](images/hero.png)

Data visualization is the practice of displaying data graphically to identify trends, draw conclusions, or communicate information quickly. It is a powerful tool for transmitting knowledge because it simplifies complicated data and summarizes large volumes of it.

The data is organized into charts, graphs, and other visual representations, which makes it easier to understand.


In [None]:
import numpy as np
import pandas as pd
import plotly.express as px

senses = pd.DataFrame(
    {
        "labels": ["Sight", "Hearing", "Smell", "Taste", "Touch"],
        "values": [40, 30, 5, 1, 1],
    }
)

senses.head(5)


In [None]:
fig = px.treemap(
    senses,
    path=["labels"],
    values="values",
    width=800,
    height=400,
    title="Conscious Senses Bandwidth (in bits per second)",
)

fig.update_layout(margin=dict(t=50, l=25, r=25, b=25))

fig.show()


Some great examples of how visualization can communicate hard concepts are found in [Randall Munroe's XKCD Webcomic](https://xkcd.com/).

[A Timeline of Earth's Average Temperature](https://xkcd.com/1732/).

Of course, we have to be careful. Visualizations can also be used to hide information or to mislead the reader.

[![Lies](images/lies.png)](https://en.wikipedia.org/wiki/Lies,_damned_lies,_and_statistics)
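
One common way a chart can mislead is a truncated axis. The following is a minimal sketch with made-up numbers (the labels and scores are purely illustrative): it draws the same two values twice, once with the y-axis starting at zero and once with the axis cut off just below the smaller value.

```python
import matplotlib.pyplot as plt

# Made-up values, for illustration only
labels = ["Product A", "Product B"]
scores = [96, 98]

fig, (ax_honest, ax_truncated) = plt.subplots(1, 2, figsize=(10, 4))

# Axis starting at zero: the two bars look almost identical
ax_honest.bar(labels, scores)
ax_honest.set_ylim(0, 100)
ax_honest.set_title("y-axis starts at 0")

# Truncated axis: the same 2-point gap now dominates the chart
ax_truncated.bar(labels, scores)
ax_truncated.set_ylim(95, 99)
ax_truncated.set_title("y-axis starts at 95")
```

Both panels show exactly the same data; only the axis range differs.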

There are many great data visualization libraries in Python.

- [Matplotlib](https://matplotlib.org/)
- [Seaborn](http://seaborn.pydata.org/index.html)
- [Plotly](https://plot.ly/python/)
- [Bokeh](http://bokeh.pydata.org/en/latest/)
- [Altair](https://altair-viz.github.io/)
- [Geoplotlib](https://github.com/andrea-cuttone/geoplotlib)
- [Gleam](https://github.com/dgrtwo/gleam)
- [Folium](https://github.com/python-visualization/folium)
- And many more...

We will look at three of them (Matplotlib, Seaborn, and Plotly), after a quick look at the plotting built into pandas.


## pandas visualization

![pandas logo](images/pandas.png)

pandas can create visualizations on its own, directly from a Series or DataFrame.

Note: Under the hood it uses Matplotlib, so Matplotlib must be installed.
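
Because pandas delegates to Matplotlib, the object returned by `.plot()` is a regular Matplotlib `Axes`. The short sketch below uses a made-up series (`demo_series` is an illustrative name, not part of the course data) to show that the result can be refined with ordinary Matplotlib methods.

```python
import matplotlib.axes
import pandas as pd

# Small made-up series, for illustration only
demo_series = pd.Series([3, 1, 4, 1, 5, 9, 2, 6])

# Series.plot() returns the Matplotlib Axes it drew on
ax = demo_series.plot(kind="line")
print(isinstance(ax, matplotlib.axes.Axes))

# The usual Matplotlib methods can then adjust the chart
ax.set_title("Customized through Matplotlib")
ax.set_xlabel("Position")
ax.set_ylabel("Value")
```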


In [None]:
import pandas as pd

simple_series = pd.Series([40, 34, 30, 22, 28, 17, 19, 20, 13, 9, 15, 10, 7, 3])

simple_series.plot()


In [None]:
simple_series.plot(
    kind="line",  # bar, barh, hist, box, area, pie
    figsize=(12, 6),
    color="green",
    title="Simple Series Plot",
    grid=False,
    style=["-"],
    xlim=(0, 15),
    xlabel="Independent variable",
    ylim=(0, 50),
    ylabel="Dependent variable",
    fontsize=10,
    table=False,
)


In [None]:
penguins_df = pd.read_csv("penguins.csv")

penguins_df.head(5)


In [None]:
penguins_df["Species"].value_counts()


In [None]:
penguins_df["Species"].str.split(" ").str[0]


In [None]:
penguins_df["short_species"] = penguins_df["Species"].str.split(" ").str[0]

penguins_df["short_species"].value_counts()


In [None]:
penguins_df["short_species"].value_counts().plot(kind="bar")


## [Matplotlib](https://matplotlib.org)

[![Matplotlib logo](images/matplotlib.png)](https://matplotlib.org)

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible.

- Create publication quality plots.
- Make interactive figures that can zoom, pan, update.
- Customize visual style and layout.
- Export to many file formats.
- Embed in JupyterLab and Graphical User Interfaces.
- Use a rich array of third-party packages built on Matplotlib.
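
As a small illustration of the export feature listed above, this sketch renders one figure as both PNG and SVG. It writes to in-memory buffers so that nothing is left on disk; the data and the buffer names are made up for the example.

```python
import io

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
ax.set_title("Export demo")

# Render the same figure into two in-memory buffers.
# Passing a file name such as "plot.png" instead would write to disk,
# with the format inferred from the file extension.
png_buffer = io.BytesIO()
fig.savefig(png_buffer, format="png", dpi=150)

svg_buffer = io.BytesIO()
fig.savefig(svg_buffer, format="svg")

print(len(png_buffer.getvalue()), "bytes of PNG")
print(len(svg_buffer.getvalue()), "bytes of SVG")
```

PNG is a raster format, so `dpi` controls its resolution, while SVG is a vector format that scales without loss.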

Matplotlib is the base of many of the other visualization libraries. But it is great by itself, and in many cases you don't need anything else.

[Tutorial](https://matplotlib.org/stable/tutorials/index)

[Examples](https://matplotlib.org/stable/plot_types/index.html)

[Cheatsheets](https://matplotlib.org/cheatsheets/)


In [None]:
import matplotlib.pyplot as plt

plt.hist(
    penguins_df["Culmen Length (mm)"],  # "Culmen Depth (mm)", "Flipper Length (mm)"
    bins=10,
    # bins=[30, 35, 40, 45, 50, 55, 60],
    cumulative=False,
    histtype="bar",  # step
    orientation="horizontal",
    rwidth=0.75,
    color="Green",
)


In [None]:
penguins_adelie = penguins_df[penguins_df["short_species"] == "Adelie"][
    "Culmen Length (mm)"
]
penguins_gentoo = penguins_df[penguins_df["short_species"] == "Gentoo"][
    "Culmen Length (mm)"
]
penguins_chinstrap = penguins_df[penguins_df["short_species"] == "Chinstrap"][
    "Culmen Length (mm)"
]

# Overlay one semi-transparent histogram per species on the same axes
for culmen_lengths, color in [
    (penguins_adelie, "Red"),
    (penguins_gentoo, "Green"),
    (penguins_chinstrap, "Blue"),
]:
    plt.hist(
        culmen_lengths,
        bins=10,
        cumulative=False,
        histtype="bar",
        rwidth=0.75,
        color=color,
        alpha=0.5,
    )


In [None]:
plt.title("Penguin comparison")

plt.hist(
    [penguins_adelie, penguins_gentoo, penguins_chinstrap],
    bins=[30, 35, 40, 45, 50, 55, 60],
    stacked=True,
)

plt.legend(["Adelie", "Gentoo", "Chinstrap"])


In [None]:
# Based on "Stove Ownership" from XKCD by Randall Munroe
# https://xkcd.com/418/

# with plt.xkcd():
fig = plt.figure()
ax = fig.add_axes((0.1, 0.2, 0.8, 0.7))
# ax.spines.right.set_color("none")
# ax.spines.top.set_color("none")
# ax.set_xticks([])
# ax.set_yticks([])
ax.set_ylim([-30, 10])

data = np.ones(100)
data[70:] -= np.arange(30)

ax.annotate(
    "THE DAY I REALIZED\nI COULD COOK BACON\nWHENEVER I WANTED",
    xy=(70, 0),
    arrowprops=dict(arrowstyle="->"),
    xytext=(15, -10),
)

ax.plot(data)

ax.set_xlabel("time")
ax.set_ylabel("my overall health")
fig.text(0.5, 0.05, '"Stove Ownership" from xkcd by Randall Munroe', ha="center")


## [Seaborn](http://seaborn.pydata.org/index.html)

[![Seaborn Logo](images/seaborn.png)](http://seaborn.pydata.org/index.html)

Seaborn is a Python data visualization library **based on matplotlib**. It provides a high-level interface for drawing attractive and informative statistical graphics.

[Tutorial](http://seaborn.pydata.org/tutorial.html)

[Examples](http://seaborn.pydata.org/examples/index.html)


In [None]:
import seaborn as sns

plt.figure(figsize=(12, 6))

sns.histplot(
    data=penguins_df,
    x="Culmen Length (mm)",
    binrange=(30, 60),
    binwidth=5,
    stat="percent",  # count, probability
    # cumulative=True,
    # kde=True,
    # hue="short_species",
    # multiple="dodge",
)


In [None]:
plt.figure(figsize=(12, 6))

sns.scatterplot(
    data=penguins_df,
    x="Culmen Length (mm)",
    y="Culmen Depth (mm)",
    # hue=penguins_df["short_species"],
    s=70,
)


In [None]:
plt.figure(figsize=(12, 6))

sns.histplot(
    penguins_df,
    x="Culmen Length (mm)",
    y="Culmen Depth (mm)",
    hue=penguins_df["short_species"],
)


In [None]:
sns.pairplot(
    penguins_df,
    hue="short_species",
    height=3,
    diag_kind="hist",
)


## [Plotly](https://plotly.com/python/)

[![Plotly Logo](images/plotly.png)](https://plotly.com/python/)

Plotly's Python graphing library makes interactive, publication-quality graphs, including line plots, scatter plots, area charts, bar charts, error bars, box plots, histograms, heatmaps, subplots, multiple-axes, polar charts, and bubble charts.
Plotly.py is free and open source: you can view the source, report issues, or contribute on GitHub.

[Tutorial](https://plotly.com/python/getting-started/)

[Examples](https://plotly.com/python/#fundamentals)

[Cheatsheet](https://images.plot.ly/plotly-documentation/images/python_cheat_sheet.pdf)


In [None]:
import plotly.express as px

df = px.data.gapminder()

fig = px.scatter(
    df.query("year==2007"),
    x="gdpPercap",
    y="lifeExp",
    size="pop",
    color="continent",
    hover_name="country",
    log_x=True,
    size_max=60,
)

fig.show()


In [None]:
fig = px.scatter(
    df,
    x="gdpPercap",
    y="lifeExp",
    animation_frame="year",
    animation_group="country",
    size="pop",
    color="continent",
    hover_name="country",
    facet_col="continent",
    log_x=True,
    size_max=45,
    range_x=[100, 100000],
    range_y=[25, 90],
)
fig.show()


In [None]:
px.scatter_geo(
    df,
    locations="iso_alpha",
    color="continent",
    hover_name="country",
    size="pop",
    animation_frame="year",
    projection="natural earth",
)


## What visualization to use?

This is a handy reference for choosing which visualization works best in each situation:

[Cheatsheet](https://s3.amazonaws.com/assets.datacamp.com/email/other/Data+Visualizations+-+DataCamp.pdf)


## [Dash](https://dash.plotly.com)

[![Dash Logo](images/dash.png)](https://dash.plotly.com)

Dash is the original low-code framework for rapidly building data apps in Python, R, Julia, and F# (experimental).

Written on top of Plotly.js and React.js, Dash is ideal for building and deploying data apps with customized user interfaces. It's particularly suited for anyone who works with data.

[Develop Data Visualization Interfaces in Python With Dash](https://realpython.com/python-dash/)
