# Data Viz Intro

Welcome to our presentation! We will cover some of the theory and principles of data visualization that will help you make better use of the visualization libraries available in Python and R.

### Introductory Section

In [2]:
%%html
<iframe src="https://docs.google.com/presentation/d/e/2PACX-1vRrJokBMohWj0oLmUkPM3SPZO4dKnNKko1nUzoXujuVdhDnJunHl2HPbq_EgAFeBQ/embed?start=false&loop=false&slide=id.p1" frameborder="0" width="100%" height="569" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>

## Exploring Voyager and Raw Graphs
Now that we have experimented with visualizations by hand and critiqued a few, it's time to work with some of the programmatic visualization systems available. For this section, please visit https://tinyurl.com/25h699h2 and download the CSV file. We will then take some time to explore:

* [Voyager](https://vega.github.io/voyager/)
* [Raw Graphs](https://www.rawgraphs.io/)


In [4]:
%%html
<iframe src="https://docs.google.com/presentation/d/e/2PACX-1vRrJokBMohWj0oLmUkPM3SPZO4dKnNKko1nUzoXujuVdhDnJunHl2HPbq_EgAFeBQ/embed?start=false&loop=false&slide=id.g257a93d3aef_1_67" frameborder="0" width="100%" height="569" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>

## Data Visualization in Python

Visualization tools like Voyager and Raw Graphs are great for "Thinking Visually" and prototyping. Beyond prototyping, it is often important to work with a programming language, because data usually needs preparation steps before it can be visualized, and those steps are easy to apply programmatically. The visualization capabilities of a language are also much greater than any individual web tool can provide. Let's get started using Python to plot some of our data.
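
As a rough illustration of the kind of preparation step meant here, the sketch below filters a small table down to one city and derives a new column before any plotting happens. The column names (`names`, `year`, `tmax`, `tmin`) and the four rows are borrowed from the Arizona dataset used later in this notebook; the `daily_range` column is a hypothetical addition made up for this example.

```python
import pandas as pd

# Four rows copied from the Arizona dataset shown later in this notebook
temps = pd.DataFrame({
    "names": ["Tucson", "Tucson", "Flagstaff", "Flagstaff"],
    "year": [1895, 1896, 2017, 2018],
    "tmax": [104.0, 102.0, 92.0, 91.0],
    "tmin": [58.0, 75.0, 54.0, 45.0],
})

# Step 1: keep only the rows for one city
tucson_only = temps[temps["names"] == "Tucson"].copy()

# Step 2: derive a column that is not in the raw data
# (daily_range is a hypothetical name chosen for this sketch)
tucson_only["daily_range"] = tucson_only["tmax"] - tucson_only["tmin"]

print(tucson_only)
```

Steps like these are awkward to repeat by hand in a point-and-click tool, but in code they can be rerun whenever the data changes.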

In this example, we will be plotting temperature data for the first day of summer in Tucson, Arizona.

We start by importing additional packages necessary for this visualization.

In [None]:
import pandas
import seaborn

print("Packages loaded!")

Next, we load data and look at the first few rows.

In [None]:
tucson = pandas.read_csv("data/tucson-summer.csv")
tucson.head()

Our first plot shows the daily minimum temperature on the summer solstice over time. The seaborn function `regplot` draws the data points and automatically adds the line from a linear regression.

In [None]:
seaborn.regplot(x = "year", y = "tmin", data = tucson)

We should update our axis labels using the `set` method. Note that here we also assign the plot to a variable (`tucson_plot`) so that we can manipulate it.

In [None]:
tucson_plot = seaborn.regplot(x = "year", y = "tmin", data = tucson)
tucson_plot.set(ylabel = "Minimum temperature (F)")

Next, we will change what we plot on the y-axis: the maximum temperature (`tmax`) instead of the minimum. Update the code below to plot `tmax` on the y-axis, and adjust the axis label to match.

In [None]:
tucson_plot = seaborn.regplot(x = "year", y = "tmin", data = tucson)
tucson_plot.set(ylabel = "Minimum temperature (F)")

By default, `regplot` adds a linear regression line and a confidence interval. The relationship may not be linear, so try a polynomial fit by adding `order = 2` to the `regplot` call (immediately following the data specification).

In [None]:
# Paste your code from above here, and update
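
To get a feel for what `order = 2` changes, the sketch below fits a straight line and a second-degree polynomial to the same points with `numpy.polyfit`, which is the routine the seaborn documentation says `regplot` uses when `order` is greater than 1. The data here are synthetic and made up for illustration (they are not the Tucson measurements), and the variable names are hypothetical.

```python
import numpy as np

# Synthetic, made-up data with a gently curved trend (not real measurements)
years = np.arange(1900, 2021, 10)
x = years - years.mean()  # center the years so the fit is well conditioned
tmin = 70.0 + 0.05 * x + 0.001 * x**2

# Fit a straight line (degree 1) and a quadratic (degree 2) to the same points
linear_coefs = np.polyfit(x, tmin, deg=1)
quadratic_coefs = np.polyfit(x, tmin, deg=2)

# Residuals: how far each fitted curve is from the data
linear_resid = tmin - np.polyval(linear_coefs, x)
quadratic_resid = tmin - np.polyval(quadratic_coefs, x)

print("largest miss, linear fit:   ", np.abs(linear_resid).max())
print("largest miss, quadratic fit:", np.abs(quadratic_resid).max())
```

Because the synthetic data are exactly quadratic, the second-degree fit follows them closely while the straight line cannot. Real temperature data are noisy, so the difference between the two fits will usually be less clear-cut.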

To finish off this plot, we want to write the plot to a png file.

In [None]:
tucson_plot = # Copy and paste the plot code from the code block above

# Leave this line as is
tucson_plot.get_figure().savefig("output/tucson-plot.png")

After updating and running this last block of code, you can click the Jupyter logo in the top-left part of this notebook to see files that you can download (including the one we just saved). You can open a new tab with this view by right-clicking or control-clicking the icon.

## What How Why
### Important general principles in visualization

The following section of the presentation is based on slides from [Dr. Joshua Levine](https://jalevine.bitbucket.io/) (University of Arizona), and ideas from [Dr. Tamara Munzner](https://www.cs.ubc.ca/~tmm/) (University of British Columbia).

In [12]:
%%html
<iframe src="https://docs.google.com/presentation/d/e/2PACX-1vSPDaygXMyuS_h0DLZb0-xfjkbTYVttXfb7u4SkbLAvpys3pEKMnothjKz6wMLpvw/embed?start=false&loop=false&delayms=3000&slide=id.g293948eab5e_0_0" frameborder="0" width="100%" height="569" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>

### Practice Identifying Data Attribute types

In the cells below, please load a dataframe of temperatures for our Arizona cities. Then form guesses about which columns are:

* Categorical
* Ordinal
* Quantitative

Identifying the attribute type is often the starting point for the visualization process.

In [14]:
import pandas as pd
csv_path = "data/arizona-heat.csv"
df = pd.read_csv(csv_path)
df

| Unnamed: 0.2 | Unnamed: 0.1 | Unnamed: 0 | Date | tmax | tmin | prcp | id | year | month | day | names | Lats | Lons |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 171 | 172 | 1895-06-21 | 104.0 | 58.0 | 0.00 | USW00023160 | 1895 | 6 | 21 | Tucson | 32.1314 | -112.0039 |
| 1 | 537 | 538 | 1896-06-21 | 102.0 | 75.0 | 0.00 | USW00023160 | 1896 | 6 | 21 | Tucson | 32.1314 | -112.0039 |
| 2 | 902 | 903 | 1897-06-21 | 105.0 | 73.0 | 0.00 | USW00023160 | 1897 | 6 | 21 | Tucson | 32.1314 | -112.0039 |
| 3 | 1267 | 1268 | 1898-06-21 | 94.0 | 75.0 | 0.03 | USW00023160 | 1898 | 6 | 21 | Tucson | 32.1314 | -112.0039 |
| 4 | 1632 | 1633 | 1899-06-21 | 101.0 | 71.0 | 0.00 | USW00023160 | 1899 | 6 | 21 | Tucson | 32.1314 | -112.0039 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 371 | 43270 | 43271 | 2017-06-21 | 92.0 | 54.0 | 0.00 | USW00003103 | 2017 | 6 | 21 | Flagstaff | 35.1442 | -87.5997 |
| 372 | 43635 | 43636 | 2018-06-21 | 91.0 | 45.0 | 0.00 | USW00003103 | 2018 | 6 | 21 | Flagstaff | 35.1442 | -87.5997 |
| 373 | 44000 | 44001 | 2019-06-21 | 73.0 | 44.1 | 0.00 | USW00003103 | 2019 | 6 | 21 | Flagstaff | 35.1442 | -87.5997 |
| 374 | 44366 | 44367 | 2020-06-21 | 84.9 | 48.9 | 0.00 | USW00003103 | 2020 | 6 | 21 | Flagstaff | 35.1442 | -87.5997 |
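
Before guessing, it can help to look at how pandas stores each column and how many distinct values it holds. The sketch below does this for a handful of rows copied from the output above (a subset of columns only); the `summary` table and its column labels are hypothetical names chosen for this example. Treat the result as a hint rather than an answer: a column stored as a number is not necessarily quantitative.

```python
import pandas as pd

# Four rows and six columns copied from the output above
sample = pd.DataFrame({
    "Date": ["1895-06-21", "1896-06-21", "2017-06-21", "2018-06-21"],
    "tmax": [104.0, 102.0, 92.0, 91.0],
    "id": ["USW00023160", "USW00023160", "USW00003103", "USW00003103"],
    "year": [1895, 1896, 2017, 2018],
    "month": [6, 6, 6, 6],
    "names": ["Tucson", "Tucson", "Flagstaff", "Flagstaff"],
})

# One row per column: is it stored as a number, and how many distinct values?
summary = pd.DataFrame(
    {
        "stored_as_number": [
            pd.api.types.is_numeric_dtype(sample[col]) for col in sample.columns
        ],
        "distinct_values": [sample[col].nunique() for col in sample.columns],
    },
    index=sample.columns,
)

print(summary)
```

To run the same check on the full dataset, replace `sample` with `df`.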


Write your guesses as comma-separated strings in each list below. For example, `"id"` has already been identified as a categorical column.

In [17]:
categorical_columns = ["id"]
ordinal_columns = []
quantitative_columns = []

print("categorical data")
print(df[categorical_columns])

print("ordinal data")
print(df[ordinal_columns])

print("quantitative data")
print(df[quantitative_columns])


categorical data
              id
0    USW00023160
1    USW00023160
2    USW00023160
3    USW00023160
4    USW00023160
..           ...
371  USW00003103
372  USW00003103
373  USW00003103
374  USW00003103
375  USW00003103

[376 rows x 1 columns]
ordinal data
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...]

[376 rows x 0 columns]
quantitative data
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,

### Conceptual Models & Data Models

In [18]:
%%html
<iframe src="https://docs.google.com/presentation/d/e/2PACX-1vSPDaygXMyuS_h0DLZb0-xfjkbTYVttXfb7u4SkbLAvpys3pEKMnothjKz6wMLpvw/embed?start=false&loop=false&delayms=3000&slide=id.g2941087898c_2_76" frameborder="0" width="100%" height="569" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>

In the space below, fill out your guesses about the data model and the conceptual model in play on slide 23.

* Data Model:
* Conceptual Model:
* Attribute Type:


## How: Visual Encoding

Here we will discuss, in general terms, the options available for pairing your data and its attribute types with the visual primitives common to all plotting libraries.

In [19]:
%%html
<iframe src="https://docs.google.com/presentation/d/e/2PACX-1vSPDaygXMyuS_h0DLZb0-xfjkbTYVttXfb7u4SkbLAvpys3pEKMnothjKz6wMLpvw/embed?start=false&loop=false&delayms=3000&slide=id.g294755a5a36_1_4" frameborder="0" width="100%" height="569" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>

For each plot (1-4) on slide 29, list the marks and channels you believe are in use.

* 1
    * Marks
    * Channels
* 2
    * Marks
    * Channels
* 3
    * Marks
    * Channels
* 4
    * Marks
    * Channels
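
As one concrete illustration of marks and channels in code (separate from the plots on slide 29), the sketch below draws a scatter plot with seaborn's `scatterplot` function. The marks are points, and three channels are in use: horizontal position encodes `year`, vertical position encodes `tmax`, and color hue encodes `names`. The four rows are copied from the Arizona dataset shown earlier; the variable names are hypothetical.

```python
import pandas as pd
import seaborn

# Four rows copied from the Arizona dataset shown earlier in this notebook
sample = pd.DataFrame({
    "names": ["Tucson", "Tucson", "Flagstaff", "Flagstaff"],
    "year": [1895, 1896, 2017, 2018],
    "tmax": [104.0, 102.0, 92.0, 91.0],
})

# Marks: points. Channels: x position, y position, and color hue.
ax = seaborn.scatterplot(data=sample, x="year", y="tmax", hue="names")
ax.set(xlabel="Year", ylabel="Maximum temperature (F)")
```

Swapping `hue` for `style` in the call would encode the city with marker shape instead of color, which changes the channel while keeping the same mark.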
  

## Why: Task Abstraction and Action Target Pairs

Now that we have an understanding of the general categories of data and how to connect them with visual primitives, let's talk about how to frame the objectives that a visualization is meant to support. These objectives are often referred to as **"Tasks"** that a viewer is meant to accomplish via the visualization.

Dr. Tamara Munzner put forward an abstraction in which each Task is made up of two parts: an Action and a Target. Together these form a Pair, and giving them some thought often helps organize our thinking when developing a visualization.

In [20]:
%%html
<iframe src="https://docs.google.com/presentation/d/e/2PACX-1vSPDaygXMyuS_h0DLZb0-xfjkbTYVttXfb7u4SkbLAvpys3pEKMnothjKz6wMLpvw/embed?start=false&loop=false&delayms=3000&slide=id.g2941087898c_2_132" frameborder="0" width="100%" height="569" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>

See if you can identify some of the tasks that the visualization above (slide 36) was made to accomplish. Write them out as Action Target pairs below. One example is provided to get you started. If you're feeling constrained, just write the task in plain language first, and then see whether actions and targets come to you afterward.

* Task 1:
    * A viewer might be interested in identifying the high and low temperatures for a day in Tucson
    * Action: Explore
    * Target: Outliers
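
If it helps to keep your tasks organized, an Action Target pair can be written down as a small record. The sketch below is only a note-taking aid built on Python's standard `dataclasses` module; the `Task` class and its field names are hypothetical and are not part of any visualization library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """A visualization task written as a plain description plus an Action Target pair."""

    description: str
    action: str
    target: str

    def as_pair(self) -> str:
        # Render the pair in a compact "Action -> Target" form
        return f"{self.action} -> {self.target}"


# Task 1 from the list above
task_1 = Task(
    description="Identify the high and low temperatures for a day in Tucson",
    action="Explore",
    target="Outliers",
)

print(task_1.as_pair())
```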

## Getting more hands-on with the data

At this point in the presentation, we would like you to go through the visualization design process using either of the CSV files provided.

Try to form a statement about both the **what** and the **why** (either one first).

Then brainstorm with hand drawings, Voyager, or Raw Graphs. Once you have completed this, attempt to break the visualization down into the marks and channels at play.

## Final Section
### Additional considerations for visualization, and Community Events

In [23]:
%%html
<iframe src="https://docs.google.com/presentation/d/e/2PACX-1vSPDaygXMyuS_h0DLZb0-xfjkbTYVttXfb7u4SkbLAvpys3pEKMnothjKz6wMLpvw/embed?start=false&loop=false&delayms=3000&slide=id.p16" frameborder="0" width="100%" height="569" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>
