## Lighthouse Labs
### W03D1 Data Visualization I
Instructor: Socorro Dominguez  
July 05, 2021

## Agenda
1. Why effective visualization?
2. Different types of visualization
3. Principles of effective visualization
4. Demo of popular libraries

## Why Data Viz?

- Humans have evolved with excellent vision
    - Size and proportion
    - Space and spatial relationships
    - Colour
    - Approximate quantity
- Humans have not evolved to read numbers or tables
    - In scientific publications and presentations, most people only look at figures


## Motivation: Ugly, bad, wrong

### What is wrong with these charts?

<img src='imgs/ugly-bad-wrong.png' width=700>

## Motivation: Ugly, bad, wrong

- **ugly**: A figure that has aesthetic problems but otherwise is clear and informative.
- **bad**: A figure that has problems related to perception; it may be unclear, confusing, overly complicated, or deceiving.
- **wrong**: A figure that has problems related to mathematics; it is objectively incorrect.

## It happens in real life too


### What's wrong with this picture?


<img src='imgs/wrong_chart.png' width=800>

## Types of visualizations

- **Visualizing x-y relationships**: scatterplot, bubble chart
- **Visualizing distributions**: histograms, density plots, boxplots
- **Categorical plots**: bar plots, boxplots
- **Visualizing geospatial data**: choropleth, cartogram

There are a lot of types!

## Relational Plots

- Relational plots visualize relationships between two numeric variables
- There are two types of relational plots in `seaborn`: scatter plots and line plots

![relational](imgs/seaborn_relational.png)

## Distribution Plots

- Distribution plots visualize how one or more variables are distributed
- There are many types of distribution plots in `seaborn` &mdash; a few examples are shown below

![distributions](imgs/seaborn_distributions.png)

## Categorical Plots

- Categorical plots visualize relationships between two variables where one of the variables (x- or y-axis) is categorical (divided into discrete groups)
- There are many types of categorical plots in `seaborn` &mdash; a few examples are shown below

![categorical](imgs/seaborn_categorical.png)
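A small sketch of two categorical plots, with a categorical variable on the x-axis and a numeric variable on the y-axis. The groups and values are hypothetical placeholders.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical measurements for three discrete groups.
df = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "value": [4.1, 4.8, 5.0, 5.3, 6.0,
              6.2, 6.9, 7.4, 7.5, 8.1,
              3.0, 3.4, 3.9, 4.4, 4.6],
})

fig, (ax_bar, ax_box) = plt.subplots(1, 2, figsize=(8, 3.5))

# Bar plot: by default each bar shows the mean of `value` within a group.
sns.barplot(data=df, x="group", y="value", ax=ax_bar)
ax_bar.set_title("Bar plot")

# Boxplot: the spread of `value` within each group.
sns.boxplot(data=df, x="group", y="value", ax=ax_box)
ax_box.set_title("Boxplot")

fig.tight_layout()
```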

## Visualizing amounts

### Simple

<img src='imgs/amounts-1.png' width=500>

- Example: Population of countries
- For the vertical bar chart (the first one on the left):
    - x-axis: country (represents each bar)
    - y-axis: population (represents size of bar)
    - color: can be used to highlight a particular country I'm talking about
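The mapping above (country on the x-axis, population on the y-axis, color for emphasis) might look like this in `matplotlib`. The country names and population figures are placeholders, not real statistics.

```python
import matplotlib.pyplot as plt

# Hypothetical populations in millions; placeholders, not real statistics.
countries = ["Country A", "Country B", "Country C", "Country D"]
population_millions = [38, 126, 67, 25]
highlight = "Country C"  # the bar we want the audience to look at

# One muted color for context and one accent color for the highlighted bar.
colors = ["tab:orange" if c == highlight else "lightgray" for c in countries]

fig, ax = plt.subplots()
bars = ax.bar(countries, population_millions, color=colors)
ax.set_xlabel("Country")
ax.set_ylabel("Population (millions)")
ax.set_title("Population by country (illustrative values)")
```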



## Visualizing amounts

### More complex

<img src='imgs/amounts_multi-1.png' width=500>

- Example: Sales by business unit by continent
- For the vertical grouped bars (the first one on the left):
    - x-axis: Business unit by continent, grouped by continent
    - y-axis: Sales in USD
    - color: Each business unit gets a different color

## Multi-panel figures

<img src='imgs/BA-degrees-variable-y-lims-1.png' width=600>

* Why is this confusing?

## Multi-panel figures - the fix

<img src='imgs/BA-degrees-fixed-y-lims-1.png' width=600>

* Here, every panel uses the same y-axis limits, so the panels are directly comparable
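One way to get comparable panels in `matplotlib` is to pass `sharey=True` to `plt.subplots`. This is a sketch with hypothetical series, not the data behind the figure above.

```python
import matplotlib.pyplot as plt

# Hypothetical percentages for two fields of study; placeholders only.
years = [1970, 1980, 1990, 2000, 2010]
series = {
    "Field A": [4.0, 5.5, 6.0, 7.5, 8.0],
    "Field B": [20.0, 22.0, 19.0, 21.0, 23.0],
}

# sharey=True forces every panel onto the same y-axis limits.
fig, axes = plt.subplots(1, len(series), sharey=True, figsize=(8, 3))
for ax, (name, values) in zip(axes, series.items()):
    ax.plot(years, values)
    ax.set_title(name)
    ax.set_xlabel("Year")

axes[0].set_ylabel("Share of degrees (%)")
axes[0].set_ylim(bottom=0)  # applies to all panels because the axis is shared
```

Without `sharey=True`, each panel picks its own limits, and a small change in one panel can look as large as a big change in another.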

## Principles
1. Principle of proportional ink
2. Pick colours that have meaning
3. Use encodings to your advantage
4. Label clearly
5. Be mindful and inclusive

Where do these principles come from?

[Calling BS: The Art of Skepticism in a Data-Driven World, Bergstrom and West](https://www.penguinrandomhouse.com/books/563882/calling-bullshit-by-carl-t-bergstrom-and-jevin-d-west/)

[The Visual Display of Quantitative Information, Tufte](https://www.edwardtufte.com/tufte/books_vdqi)

## Principle 1: Principle of proportional ink

*"When a shaded region is used to represent a numerical value, the area of that shaded region should be directly proportional to the corresponding value."*

#### Example 1

<img src='imgs/top10books.jpg' width=600>


Source: [callingbullshit.org](https://callingbullshit.org/tools/tools_proportional_ink.html)

## Principle 1: Principle of proportional ink

#### Example 2

<img src='imgs/effective_tax_rate.png' width=600>


Source: [callingbullshit.org](https://callingbullshit.org/tools/tools_proportional_ink.html)
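To see why a truncated axis violates proportional ink, compare the ratio of two values with the ratio of the bar heights actually drawn. The helper `ink_ratio` and the numbers are hypothetical, written only for this illustration.

```python
def ink_ratio(value_a: float, value_b: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn bar heights when the value axis starts at `baseline`."""
    if baseline >= min(value_a, value_b):
        raise ValueError("baseline must be below both values")
    return (value_a - baseline) / (value_b - baseline)


# Hypothetical values: 40 versus 35.
true_ratio = ink_ratio(40, 35)        # axis starts at 0
drawn_ratio = ink_ratio(40, 35, 30)   # axis truncated at 30

print(f"true ratio: {true_ratio:.2f}, drawn ratio: {drawn_ratio:.2f}")
# prints "true ratio: 1.14, drawn ratio: 2.00"
```

With the axis starting at 30, the first bar is drawn twice as tall as the second even though the value is only about 14% larger.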

## Principle 2: Wise use of color that adds meaning

#### Example 1: Using color to represent discrete items or groups that don't have an intrinsic order

<img src='imgs/popgrowth-US-1.png' width=500>

## Principle 2: Wise use of color that adds meaning

#### Example 2: Using color to represent data values

<img src='imgs/map-Texas-income-1.png' width=500>

## Principle 2: Wise use of color that adds meaning

#### Example 3: Using color to represent data values - divergent scale

<img src='imgs/map-Texas-race-1.png' width=500>

## Principle 2: Wise use of color that adds meaning

#### Example 4: Using color as a tool to highlight

<img src='imgs/Aus-athletes-track-1.png' width=500>

## Principle 2: Wise use of color that adds meaning

#### Example 5: Bad use of color

Encoding too much or irrelevant information.

<img src='imgs/popgrowth-vs-popsize-colored-1.png' width=500>

## Principle 2: Wise use of color that adds meaning

#### Example 6: Bad use of color

Which pink?!

<img src='imgs/popgrowth-US-rainbow-1.png' width=500>

## Principle 3: Use encodings to your advantage
- Pick the right encoding for your purpose

<img src='imgs/encodings.png' width=600>

Source: [Designing Data Visualizations](https://www.oreilly.com/library/view/designing-data-visualizations/9781449314774/ch04.html)

## Principle 3: Use encodings to your advantage

#### Example 1: Redundant encoding

<img src='imgs/iris-redundant-encoding.png' width=500>
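Redundant encoding means mapping the same variable to more than one visual channel. In this sketch each group gets both its own color and its own marker shape. The group names and points are hypothetical, not the iris measurements.

```python
import matplotlib.pyplot as plt

# Hypothetical measurements for three groups; placeholders, not the iris data.
groups = {
    "Group A": {"x": [1.4, 1.5, 1.3], "y": [0.2, 0.3, 0.2]},
    "Group B": {"x": [4.5, 4.7, 4.0], "y": [1.5, 1.4, 1.3]},
    "Group C": {"x": [6.0, 5.6, 5.9], "y": [2.5, 2.2, 2.1]},
}

# Each group is encoded twice: once by color and once by marker shape.
styles = {
    "Group A": {"color": "#0072B2", "marker": "o"},
    "Group B": {"color": "#E69F00", "marker": "s"},
    "Group C": {"color": "#009E73", "marker": "^"},
}

fig, ax = plt.subplots()
for name, points in groups.items():
    ax.scatter(points["x"], points["y"], label=name, **styles[name])

ax.set_xlabel("x (illustrative units)")
ax.set_ylabel("y (illustrative units)")
ax.legend(title="Group")
```

If the figure is printed in grayscale or viewed by someone who cannot distinguish the colors, the marker shapes still separate the groups.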

## Principle 4: Label clearly

#### Example 1a: Every figure needs a title
<br>
<br>
<img src='imgs/corruption-development-1.png' width=600>


## Principle 4: Label clearly

#### Example 1b: Every figure needs a title
<br>
<br>
<img src='imgs/corruption-development-infographic-1.png' width=600>


## Principle 4: Label clearly

#### Example 2a: Axes labels
<br>
<br>


<img src='imgs/tech-stocks-minimal-labeling-bad-1.png' width=500>


## Principle 4: Label clearly

#### Example 2b: Axes labels
<br>
<br>

<img src='imgs/tech-stocks-minimal-labeling-1.png' width=500>


## Principle 4: Label clearly

#### Example 3: Useful legends
<br>
<br>


<img src='imgs/blue-jays-scatter-bubbles2-1.png' width=500>
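The labeling pieces from the examples above (title, axis labels, legend) can be added with a few `matplotlib` calls. The company names and prices here are hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical indexed prices; placeholders, not real market data.
years = [2016, 2017, 2018, 2019, 2020]
prices = {
    "Company A": [100, 120, 150, 170, 210],
    "Company B": [100, 105, 130, 125, 160],
}

fig, ax = plt.subplots()
for company, values in prices.items():
    ax.plot(years, values, marker="o", label=company)

ax.set_title("Stock price over time (illustrative values)")  # what the figure shows
ax.set_xlabel("Year")                                        # what x means
ax.set_ylabel("Price, indexed to 100 in 2016")               # what y means, with units
ax.set_xticks(years)
ax.legend(title="Company")                                   # what each line means
```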


## Principle 5: Be mindful AND inclusive

#### Design for color-vision deficiency

A palette of eight colors that stay distinguishable under color-vision deficiency lets your figures reach most of your audience.
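One widely used eight-color palette designed for color-vision deficiency is the Okabe-Ito palette. This sketch assumes that is the kind of palette meant here and applies it to a single plot with `Axes.set_prop_cycle`. The plotted lines are dummy data.

```python
import matplotlib.pyplot as plt

# Okabe-Ito palette: eight colors chosen to remain distinguishable
# for people with common forms of color-vision deficiency.
OKABE_ITO = [
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermillion
    "#CC79A7",  # reddish purple
    "#000000",  # black
]

fig, ax = plt.subplots()
ax.set_prop_cycle(color=OKABE_ITO)  # affects only this Axes

# Dummy data: one line per palette color.
for i in range(len(OKABE_ITO)):
    ax.plot([0, 1], [i, i + 1], label=f"series {i + 1}")

ax.legend(ncol=2)
```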

For additional resources, check out the following:
- `pandas`: [visualization tutorial](https://pandas.pydata.org/docs/getting_started/intro_tutorials/04_plotting.html#min-tut-04-plotting)
- `seaborn`: [quick intro tutorial](https://seaborn.pydata.org/introduction.html) and [detailed tutorials](https://seaborn.pydata.org/tutorial.html)
- `plotly`: [Plotly Express tutorial](https://plotly.com/python/plotly-express/)
- Source for this lecture: [Fundamentals of Data Visualization](https://serialmentor.com/dataviz/index.html)
- `altair`: [tutorial](https://viz-learn.mds.ubc.ca/)

plotly graph objects 


plotly express