# Design of Visualizations

Visualizations are bad when they *hide*, *distract*, or *mislead*.

### Types of Data

Before building a visualization, it is important to know the type of data you are dealing with.

##### Qualitative vs Quantitative

* Qualitative/Categorical
    * Nominal
        * No inherent order to data
        * Examples: Gender, marital status, zip code, breakfast items, emotions, etc.
    * Ordinal
        * Intrinsic order or ranking but no well-defined magnitude of difference between each value
        * Examples: Letter grade, podium ranking, survey ratings, etc.
* Quantitative/Numeric
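    * Interval
        * Numeric values where absolute differences between values are meaningful (i.e., can add and subtract)
        * Examples: Temperature in degrees Celsius, calendar year, etc.
    * Ratio
        * Numeric values where relative differences between values are meaningful (i.e., can multiply and divide), because the scale has a true zero
        * Examples: Height, weight, duration, etc.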
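
As a rough illustration of the four levels above, the sketch below applies to each level only the summary that its scale supports. All sample values and variable names are hypothetical, and the choice of summaries is one common convention rather than a fixed rule.

```python
from statistics import mean, mode

# Hypothetical sample data, one list per level of measurement.
breakfast = ["eggs", "cereal", "eggs", "toast"]  # nominal
grades = ["B", "A", "C", "B", "A"]               # ordinal
temps_c = [10.0, 20.0, 30.0]                     # interval
heights_cm = [150.0, 160.0, 180.0]               # ratio

# Nominal: values can only be counted, so the mode is the natural summary.
most_common_breakfast = mode(breakfast)

# Ordinal: values can be ranked, so a median is meaningful once an order is given.
grade_order = ["C", "B", "A"]  # assumed ordering, worst to best
ranks = sorted(grade_order.index(g) for g in grades)
median_grade = grade_order[ranks[len(ranks) // 2]]  # middle rank (odd sample size)

# Interval: differences are meaningful, so subtraction and means are fine,
# but ratios are not: 20 degrees Celsius is not "twice as hot" as 10.
mean_temp = mean(temps_c)
temp_diff = temps_c[2] - temps_c[0]

# Ratio: the scale has a true zero, so ratios are meaningful as well.
height_ratio = heights_cm[2] / heights_cm[0]
```

Here the tallest person is 1.2 times as tall as the shortest, a statement that has no counterpart for the temperature data.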

Quantitative variables can also be divided into *discrete* and *continuous*.

##### Quantitative - Discrete vs Continuous

* Discrete
    * Quantitative variables can only take on a specific set of values at some maximum level of precision (e.g., integers)
    * Examples: Pages in a book, trees in a yard, dogs at a coffee shop
* Continuous
    * Quantitative variables can (hypothetically) take on values to any level of precision (e.g., real numbers)
    * Examples: Height, age, income

##### Likert Scale

You may also encounter Likert scale data. Technically, responses to these types of questions should be considered *ordinal*, because the differences between consecutive levels are not necessarily consistent in size. However, in order to simplify analyses, Likert data is often treated as *interval* data.
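
The two treatments can be compared on a small, made-up set of responses. This is only a sketch: the level labels, the 1 to 5 coding, and the sample responses are assumptions chosen for illustration.

```python
from statistics import mean, median_low

# Hypothetical five-point scale, ordered from most negative to most positive.
LEVELS = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]
CODES = {label: i + 1 for i, label in enumerate(LEVELS)}  # codes 1..5

responses = ["Agree", "Neutral", "Strongly agree", "Agree", "Disagree"]
scores = [CODES[r] for r in responses]

# Ordinal treatment: the median relies only on the ordering of the levels.
# median_low always returns an observed value, so it maps back to a label.
median_label = LEVELS[median_low(scores) - 1]

# Interval treatment: the mean assumes equal spacing between adjacent levels.
mean_score = mean(scores)
```

The median stays on the original scale ("Agree"), while the mean (3.6) is only interpretable if the step from one level to the next is assumed to be the same size everywhere.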

<div align="center">
    <figure>
        <img src="./images/likertscale-1.png" width="400">
        <figcaption><em>This Likert scale, which happens to be graphical, has five points, allowing for neutrality (source: <a href="https://www.surveygizmo.com/survey-blog/likert-scale-what-is-it-how-to-analyze-it-and-when-to-use-it/">surveygizmo</a>)</em>
        </figcaption>
    </figure>
</div>
<div align="center">
    <figure>
        <img src="./images/6-point-likert-scale-even-survey.png" width="400">
        <figcaption><em>This Likert scale has six points, not allowing for neutrality (source: <a href="https://www.fieldboom.com/blog/likert-scale/">fieldboom</a>)</em>
        </figcaption>
    </figure>
</div>

### Display Elements

Data can be encoded visually in a number of ways, including by
* position along the x- and y-axes,
* size,
* shape,
* texture,
* angle, and
* length.

In general, humans are able to best understand data encoded with **positional changes** (differences in x- and y-position, as we see with scatterplots) and **length changes** (differences in box heights, as we see with bar charts and histograms).

In contrast, humans *struggle* to understand data encoded with **color hue changes** (which are, unfortunately, commonly used to encode an additional variable in scatterplots; we'll study this in upcoming concepts) and **area** or **area changes** (as we see in pie charts, which is why they are often not the best plot choice).
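
One way to see why area is a harder encoding than length is to work through the arithmetic for a value that doubles. The sketch below uses circles as the area mark; the helper `radius_for_value` and the sample values are hypothetical and only meant to illustrate the geometry.

```python
import math

def radius_for_value(value: float, scale: float = 1.0) -> float:
    """Radius of a circle whose AREA equals value * scale."""
    return math.sqrt(value * scale / math.pi)

small, large = 10.0, 20.0  # the larger value is exactly twice the smaller one

# Length encoding (bar chart): the bar is simply twice as long.
length_ratio = large / small

# Area encoding done correctly: the area doubles, but the radius grows
# only by a factor of sqrt(2), so the visible change is less pronounced.
r_small, r_large = radius_for_value(small), radius_for_value(large)
radius_ratio = r_large / r_small
area_ratio = (math.pi * r_large**2) / (math.pi * r_small**2)

# Area encoding done incorrectly (radius proportional to the value):
# the area grows with the square of the value and overstates the change.
naive_area_ratio = (math.pi * large**2) / (math.pi * small**2)
```

With a bar, a doubled value looks doubled. With a circle, the reader must either judge a 2x area from a roughly 1.41x wider shape or, if the radius was scaled directly, is shown a 4x area for a 2x change.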

Color should only be used to draw attention to a key finding.

### Chart Junk

A chart should include only the minimum number of elements needed to communicate its message effectively. Anything above this minimum is considered *[chart junk](https://en.wikipedia.org/wiki/Chartjunk)*.

Examples include:
* Heavy grid lines
* Unnecessary text
* Pictures surrounding the visual
* Shading or 3D components
* Ornamented chart axes

##### Data-Ink Ratio

The data-ink ratio was introduced by Edward Tufte in 1983 and is defined as

$${\frac{\text{amount of ink used to describe the data}}{\text{total amount of ink used to print the graphic}}.}$$

In general, the higher the data-ink ratio, the better.
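
As a small worked example, the ratio can be computed from rough ink estimates, taking it to be the share of a graphic's total ink that is devoted to the data (Tufte's formulation). The function name `data_ink_ratio` and the ink amounts below are hypothetical; in practice "ink" might be approximated by pixel counts.

```python
def data_ink_ratio(data_ink: float, non_data_ink: float) -> float:
    """Share of a graphic's total ink that presents data (between 0 and 1)."""
    total_ink = data_ink + non_data_ink
    if total_ink <= 0:
        raise ValueError("total ink must be positive")
    return data_ink / total_ink

# Hypothetical ink estimates for two versions of the same chart.
before = data_ink_ratio(data_ink=300, non_data_ink=700)  # heavy grid, 3D shading
after = data_ink_ratio(data_ink=300, non_data_ink=100)   # chart junk removed
```

Removing chart junk leaves the data ink unchanged but shrinks the total, which raises the ratio from 0.3 to 0.75 in this made-up case.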
