# A Very, Very Brief Introduction to Data Visualization

# About Me
<br>
<div style="font-size: larger;">
Chandrasekhar (Sekhar) Ramakrishnan<br>
<a href="https://twitter.com/ciyer">@ciyer</a><br>
<br>
<a href="https://datascience.ch">Swiss Data Science Center</a> and freelance data scientist; teach data viz at <a href="https://propulsion.academy">Propulsion Academy</a>

<a href="https://illposed.com"><img alt="illposed logo" src="images/illposed-logo.svg" height="75%"/></a>

</div>

# The Goal of Data Visualization

## The goal of visualizing data is to enable quantitative reasoning with your eyes

There are many valid answers to the question of what data visualization is for. The statement above is one of them. It is not the only answer, but it is the one we will pursue here.

## One data set, two goals

<table>
    <tbody>
        <tr>
            <td><img width="505px" alt="GQ Waistline Viz" src="https://i1.wp.com/flowingdata.com/wp-content/uploads/2010/09/waistline-measurement-chart-for-men.jpg?w=614&ssl=1"></td>
            <td><img width="550px" alt="FD Waistline Viz" src="https://i2.wp.com/flowingdata.com/wp-content/uploads/2010/10/Pants-Size-Chart.png?w=550&ssl=1"></td>
        </tr>
    </tbody>
</table>

From FlowingData https://flowingdata.com/2010/09/30/advertised-vs-actual-waistline/

Consider these two visualizations of the same data set. They are both good, but they look very different because they pursue different goals.

The one on the right is a visualization of the kind we will be talking about: one designed to enable quantitative reasoning. The one on the left has a different goal: to be entertaining. This is of course fine, and the visualization does a good job of realizing that goal, but it is not the kind of visualization we will be talking about.

# Table of Contents

* Tools for making visualizations
* Thinking about visualizations
* Selecting mappings
* Providing context

This introduction is made up of several sections. We will first survey some tools for making visualizations. Then we will develop some concepts and terminology for thinking about visualizations. The bulk of the material is in the final two sections, which focus on choosing mappings from data to visuals and on ways of providing context in data visualizations, respectively.

# References

<div style="display: flex; flex-direction: row;  justify-content: space-around">
    
<div>

## Edward Tufte

- [Visual Display of Quantitative Information](https://www.amazon.com/Visual-Display-Quantitative-Information/dp/0961392142/)
- [Envisioning Information](https://www.amazon.com/Envisioning-Information-Edward-R-Tufte/dp/0961392118/)
- [Visual Explanations](https://www.amazon.com/Visual-Explanations-Quantities-Evidence-Narrative/dp/0961392126/)

</div>

<div>
    
## Online
- [Maneesh Agrawala’s Visualization Course](https://magrawala.github.io/cs448b-fa17/)
- [Jeffrey Heer’s Visualization Course](https://courses.cs.washington.edu/courses/cse442/17au/)
- [Jock Mackinlay’s Designing Great Visualizations](https://www.tableau.com/sites/default/files/media/designing-great-visualizations.pdf)

</div>

</div>



<!-- * Müller-Brockmann -->

These materials borrow extensively from the works listed above.

# Tools for Visualization

There are many tools out there for making visualizations. You may already be very familiar with one. In this presentation, we will not go into any of them in detail, but you may be interested to know what is out there.

## Non-Programming

* Excel / Numbers / etc.
* [Tableau](https://www.tableau.com)
* [Spotfire](https://www.tibco.com/products/tibco-spotfire)
* [Power BI](https://powerbi.microsoft.com/en-us/)

Widely used spreadsheet software, like Excel, offers tools for making visualizations from tables of data. These tools have limited flexibility, but are very easy to use.

Tableau, Spotfire, and Power BI are all GUI-driven tools specifically for data visualization.

<div style="display: flex; flex-direction: row;  justify-content: space-around">

<div style="width: 300px">

## R
* ggplot

</div>

<div style="width: 430px">
    
## Python
* matplotlib
* seaborn
* bokeh
* altair
* HoloViews

</div>

<div style="width: 430px">

## JavaScript
* **[Vega](https://vega.github.io/vega/) / [Vega-Lite](https://vega.github.io/vega-lite/)**
* [D3](https://d3js.org)
* [C3](https://c3js.org)
* [plot.ly](https://plot.ly)

</div>

</div>

Programming languages also provide frameworks for data visualization, including, of course, languages widely used for data analysis such as R and Python. The landscape in R is cleaner; in Python, there has been much recent activity to develop modern, web-friendly alternatives to the classic matplotlib.

JavaScript, though not typically an environment for data analysis, is the primary programming language of the web and has powerful frameworks for visualizing data.

I am not going to discuss these in detail, but I want to highlight Vega and Vega-Lite, which are declarative specifications for describing data visualizations.
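
To give a feel for what "declarative" means here, the following is a minimal sketch of a Vega-Lite specification, written as a Python dictionary and serialized to JSON. The records and the field names `a`, `b`, and `group` are placeholders invented for this illustration; they are not data from this presentation.

```python
import json

# Placeholder records; the field names and values are invented for illustration.
records = [
    {"a": 1, "b": 2.0, "group": "first"},
    {"a": 2, "b": 3.5, "group": "first"},
    {"a": 3, "b": 1.5, "group": "second"},
]

# A Vega-Lite spec describes WHAT to show: the data, the mark type,
# and how data fields are encoded as visual channels.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "description": "Scatter plot described declaratively",
    "data": {"values": records},
    "mark": "point",
    "encoding": {
        "x": {"field": "a", "type": "quantitative"},
        "y": {"field": "b", "type": "quantitative"},
        "color": {"field": "group", "type": "nominal"},
    },
}

# The spec is plain JSON, so it can be handed to any Vega-Lite renderer.
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

The specification states which fields map to x, y, and color, and leaves the drawing itself to the renderer. The printed JSON can be pasted into the online Vega editor to see the result.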

# Thinking About Visualizations

Our goal of supporting quantitative reasoning visually has implications for how we build visualizations.

## The parts of a visualization

![visualization](images/context.png)

## Marks

![marks](images/marks.png)

The core of a visualization is made up of the marks that represent data.

## Context: axes, tick marks, title (legend)

![axes](images/axes.png)

## Context: model, predictions, labels

![context](images/context.png)

## Marks are defined by mapping


## Visualization maps from data to marks in an image

![mapping](images/mapping.png)

Visualizations are realized by mapping data variables to visual variables.

Visual variables include position, shape, brightness, hue (color), and transparency. Doing this well requires being aware of the visual variables at our disposal and drawing upon knowledge from graphic design (colors, typography, layout), psychology/human factors, and statistics. It may be necessary to apply a transformation to the data along the way.
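
As a rough sketch of what such a mapping can look like in code, the functions below turn data values into visual values: a linear scale for position, a log scale as an example of transforming the data along the way, and a lookup from categories to colors. The function names, the field names (`amount`, `count`, `kind`), the pixel ranges, and the color names are all assumptions made up for this example, not part of any particular library.

```python
import math

def linear_scale(domain, range_):
    """Map a data interval (domain) linearly onto a visual interval (range_)."""
    d0, d1 = domain
    r0, r1 = range_
    def scale(value):
        t = (value - d0) / (d1 - d0)  # fraction of the way through the domain
        return r0 + t * (r1 - r0)
    return scale

def log_scale(domain, range_):
    """Apply a log10 transformation to the data, then map linearly."""
    d0, d1 = domain
    inner = linear_scale((math.log10(d0), math.log10(d1)), range_)
    return lambda value: inner(math.log10(value))

def ordinal_scale(categories, values):
    """Map each category to a fixed visual value, for example a hue."""
    lookup = dict(zip(categories, values))
    return lambda category: lookup[category]

# Hypothetical mapping: two quantitative fields to position, one nominal field to hue.
x = linear_scale((0, 100), (0, 400))   # data units -> horizontal pixels
y = log_scale((1, 1000), (300, 0))     # pixel y grows downward, so the range is flipped
hue = ordinal_scale(["a", "b", "c"], ["steelblue", "orange", "green"])

def to_mark(row):
    """Turn one data record into the visual properties of one mark."""
    return {"x": x(row["amount"]), "y": y(row["count"]), "fill": hue(row["kind"])}

mark = to_mark({"amount": 25, "count": 10, "kind": "b"})
print(mark)
```

Each data variable is assigned to exactly one visual variable, and the choice of scale (linear, log, categorical lookup) is where knowledge about the data enters the design.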

## Visual inferences should be valid data inferences

<img alt="jet-spread" width="400px" src="images/jet-spread.png" />

To make quantitative reasoning possible through our eyes, we need to choose mappings that ensure that visual inferences are valid data inferences. Look at the above visualization. How do you expect the underlying data to be distributed? You probably see contours and bands of similar values. We will later look back at this and see if the data matches.
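
One way a mapping can break this rule is when brightness does not follow the data. The sketch below compares a grayscale ramp with a rainbow ramp and checks whether luminance increases steadily with the data value. It rests on several assumptions: the image file name suggests a jet-style rainbow colormap, `jet_like` is a common piecewise-linear approximation rather than the exact colormap of any library, and the luminance formula uses the Rec. 709 weights on the raw RGB values as a simplification.

```python
def clamp01(v):
    """Clamp a number to the interval [0, 1]."""
    return max(0.0, min(1.0, v))

def jet_like(t):
    """Approximate rainbow colormap (assumed stand-in for jet): t in [0, 1] -> (r, g, b)."""
    r = clamp01(1.5 - abs(4.0 * t - 3.0))
    g = clamp01(1.5 - abs(4.0 * t - 2.0))
    b = clamp01(1.5 - abs(4.0 * t - 1.0))
    return r, g, b

def gray(t):
    """Grayscale ramp: t in [0, 1] -> (r, g, b)."""
    return t, t, t

def relative_luminance(rgb):
    """Weighted sum of channels (Rec. 709 weights), used as a rough brightness measure."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def is_monotonic(values):
    """True if the sequence never decreases."""
    return all(a <= b for a, b in zip(values, values[1:]))

# Evenly spaced data values from 0 to 1.
steps = [i / 20 for i in range(21)]
gray_lum = [relative_luminance(gray(t)) for t in steps]
jet_lum = [relative_luminance(jet_like(t)) for t in steps]

print("gray luminance monotonic:", is_monotonic(gray_lum))
print("jet-like luminance monotonic:", is_monotonic(jet_lum))
```

With the rainbow ramp, the middle of the data range is rendered brighter than both ends, so a viewer who reads brightness as magnitude draws an inference the data does not support.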

# Selecting Mappings

## Jacques Bertin, *Sémiologie Graphique* (1967)

<img alt="bertin mappings" src="http://3.bp.blogspot.com/-CChUqYR6DVc/T04XRZV1owI/AAAAAAAAACQ/3ftIrpYZj-g/s640/les_variables.jpg">

Image from http://pauline-blot.blogspot.ch/2012/02/jacques-bertin.html

## Jacques Bertin, *Sémiologie Graphique* (1967)

<div style="display: flex; flex-direction: row;  justify-content: space-around">

<div>
    <img width="300px" alt="bertin mappings" src="http://3.bp.blogspot.com/-CChUqYR6DVc/T04XRZV1owI/AAAAAAAAACQ/3ftIrpYZj-g/s640/les_variables.jpg" />
</div>

<div style="width: 20px">
    &nbsp;
</div>

<div style="width: 600px">
<table class="table table-sm" style="font-size: 18px">
    <thead>
        <tr>
            <th>Visual Variable</th>
            <th>Kind of Data</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Position</td>
            <td>Nominal, Ordinal, Interval, Ratio</td>
        </tr>
        <tr>
            <td>Size</td>
            <td>Nominal, Ordinal, Interval, Ratio</td>
        </tr>
        <tr>
            <td>Brightness</td>
            <td>Nominal, Ordinal, Interval, Ratio</td>
        </tr>
        <tr>
            <td>Texture</td>
            <td>Nominal, Ordinal</td>
        </tr>
        <tr>
            <td>Hue</td>
            <td>Nominal</td>
        </tr>
        <tr>
            <td>Shape</td>
            <td>Nominal</td>
        </tr>
    </tbody>
</table>
</div>
</div>
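
The table above can be treated as a lookup when choosing a mapping. The following sketch transcribes it into a Python dictionary and lists the visual variables that the table marks as suitable for a given kind of data. The names `SUITABLE_FOR` and `candidates` are made up for this example, and the entries are copied from the table as shown, not independently checked against Bertin's text.

```python
# Transcription of the table above: visual variable -> kinds of data it can carry.
SUITABLE_FOR = {
    "position": {"nominal", "ordinal", "interval", "ratio"},
    "size": {"nominal", "ordinal", "interval", "ratio"},
    "brightness": {"nominal", "ordinal", "interval", "ratio"},
    "texture": {"nominal", "ordinal"},
    "hue": {"nominal"},
    "shape": {"nominal"},
}

def candidates(kind):
    """Return the visual variables listed as suitable for this kind of data."""
    kind = kind.lower()
    return [variable for variable, kinds in SUITABLE_FOR.items() if kind in kinds]

print("ratio:  ", candidates("ratio"))
print("ordinal:", candidates("ordinal"))
print("nominal:", candidates("nominal"))
```

The lookup makes one pattern in the table easy to see: the more structure the data has, the fewer visual variables remain as candidates.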

## Cleveland & McGill

## Mackinlay

# Designing Visualizations

* Choosing among mappings
* Tufte's rules
* The importance of context
* Layering information

# Hands-On

https://www.fueleconomy.gov/feg/download.shtml

Questions:
- Are 1999 autos more or less efficient than 2008 autos?
- Is a given manufacturer more or less efficient than the average?
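
Before drawing anything, it can help to compute the summaries these questions ask about. The sketch below groups records and compares means. The rows are made-up placeholders, not real fuel-economy measurements, and the column names `year`, `make`, and `mpg` are assumptions that would need to be matched to the columns in the downloaded file.

```python
from collections import defaultdict
from statistics import mean

# Made-up placeholder rows, NOT real measurements; column names are assumed.
rows = [
    {"year": 1999, "make": "A", "mpg": 20.0},
    {"year": 1999, "make": "B", "mpg": 24.0},
    {"year": 2008, "make": "A", "mpg": 22.0},
    {"year": 2008, "make": "B", "mpg": 26.0},
]

def mean_by(rows, key, value="mpg"):
    """Group rows by `key` and return the mean of `value` for each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {group: mean(values) for group, values in groups.items()}

# Question 1: compare mean efficiency across model years.
by_year = mean_by(rows, "year")

# Question 2: compare each manufacturer with the overall mean.
overall = mean(row["mpg"] for row in rows)
diff_from_avg = {make: m - overall for make, m in mean_by(rows, "make").items()}

print(by_year)
print(diff_from_avg)
```

A comparison of means is only a starting point; a visualization of the full distributions shows whether the difference between groups is large relative to the spread within them.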

Burtin (transforms)
