# SI649-24 Fall -> Altair I
## Overview 
We're going to re-create some of the visualizations we did in Tableau, but this time using Altair, for the article [“The Dollar-And-Cents Case Against Hollywood’s Exclusion of Women”](https://fivethirtyeight.com/features/the-dollar-and-cents-case-against-hollywoods-exclusion-of-women/). We'll be teaching you different pieces of Altair over the next few weeks, so this time we'll focus on just a few visualizations:

1.   Replicate 1 visualization from the original article  
2.   Implement 2 new visualizations according to our specifications

**For this lab, we have done all of the necessary data transformation for you. You do not need to modify any DataFrame. You only need to write Altair code.**

### Lab Instructions (read the full version on the handout of the previous lab)

*   Save, rename, and submit the ipynb file (use your username in the name).
*   Run every cell (do Runtime -> Restart and run all to make sure you have a clean working version), print to pdf, submit the pdf file. 
*   For each visualization, we will ask you to write down a "Grammar of Graphics" plan first (basically a description of what you'll code).
*   If you end up stuck, show us your work by including links (URLs) that you have searched for. You'll get partial credit for showing your work in progress. 
*   There are many bonus point opportunities in this lab. 

We encourage you to go through the Altair tutorials before next week:
- [UW Course](https://github.com/uwdata/visualization-curriculum)
- [Altair tutorial](https://github.com/altair-viz/altair-tutorial)

### Resources
- [Altair Documentation](https://altair-viz.github.io/index.html)
- [Markdown Cheatsheet](https://www.markdownguide.org/cheat-sheet/)
- [Pandas DataFrame Introduction](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html)


In [None]:
```python
# imports we will use
import altair as alt
import pandas as pd
from collections import defaultdict
alt.renderers.enable('html')  # run this line if you are running Jupyter Notebook
```

In [None]:
```python
# load data and perform basic data processing
# get the CSV
datasetURL = "https://raw.githubusercontent.com/dallascard/SI649_public/master/altair_hw1/movies_individual_task.csv"
movieDF = pd.read_csv(datasetURL, encoding="latin-1")

# fix the result column, rename the values, and combine "dubious" with "ok" as "Passes Bechdel Test"
movieDF['test_result'] = movieDF['clean_test'].map({
    "ok": "Passes Bechdel Test",
    "men": "Women only talk about men",
    "notalk": "Women don't talk to each other",
    "nowomen": "Fewer than two women",
    "dubious": "Passes Bechdel Test",
})

# fix the location column to combine US and Canada
locationDict = defaultdict(lambda: 'International')
locationDict["United States"] = "U.S. and Canada"
locationDict["Canada"] = "U.S. and Canada"
movieDF["country_binary"] = movieDF["country"].map(locationDict)

# calculate ROI (Return on Investment), both domestic (US and Canada) and international;
# international ROI uses international-only gross (intgross minus domgross)
movieDF["roi_dom"] = movieDF["domgross_2013$"] / movieDF["budget_2013$"]
movieDF["int_only_gross"] = movieDF["intgross_2013$"] - movieDF["domgross_2013$"]
movieDF["roi_int"] = movieDF["int_only_gross"] / movieDF["budget_2013$"]

# drop the columns we won't need
movieDF = movieDF.drop(columns=["Unnamed: 0", "test", "budget", "domgross", "intgross", "code", "period code", "decade code", "director", "imdb"])

# Make a copy of the data frame that excludes movies from before 1990
movieDF_since_1990 = movieDF[movieDF.year > 1989]

# take a look at the new dataset
movieDF_since_1990.sample(3)
```

## Visualization 1: Replicate this visualization

![](https://github.com/dallascard/SI649_public/blob/main/altair_hw1/viz1-orig.png?raw=True)





### Step 1: Write down your plan for each part of this chart:

For each chart, we are asking you to write a Grammar of Graphics plan for the chart. This involves writing down 1) the dataset you will use; 2) the type of mark you will use (e.g., bar, line, point, etc.), and 3) for each visual channel (e.g., position, color, etc.), the corresponding variable name (e.g., year, ROI, etc.) and data type (i.e., ordinal, nominal, or quantitative). Please use the following format:

*   Data Name: _dataset_
*   Mark type: _mark type_
*   Encoding Specification: 
    * _channel:variable:datatype_
    * _channel:variable:datatype_
    * ...
 
Hint: you should provide encoding specifications for both x and y, using the format channel:variable:datatype.

For example, if we wanted to encode a nominal variable called "movietype" as the color, we would write:
- color : movietype : nominal



### *** Edit this cell to be your visualization plan (required) ***:

Left Chart:
*   Data Name: movieDF_since_1990
*   mark type: *TODO: write your answer here (e.g., point or area)*
*   Encoding Specification:  
    * *TODO: write your answer here* (channel : variable : datatype)
    * *TODO: write your answer here* (channel : variable : datatype)

Right Chart:

* *TODO: fill in this part*

Compound Method (how to join these charts together?): *TODO: write your answer here*



### Step 2: Create your chart. 

Please use the checkpoints below to work through the problem step-by-step. You can search for the keyword "TODO" to locate cells that need your edits.


### Visualization 1 Checkpoints


#### checkpoint 1: create the left chart as a basic bar chart (Domestic ROI by Bechdel test category)
 
*  Specify the correct mark 
*  Use the correct x and y encoding 
*  Plot the right data (hint: make sure you examine the data frame and use the correct columns)


Your chart will look like:

![](https://github.com/dallascard/SI649_public/blob/main/altair_hw1/viz1-checkpoint1.png?raw=true&1) 



In [None]:
```python
# TODO: Replicate checkpoint 1
```


#### checkpoint 2: sort the categories on the y-axis
 
* completed checkpoint 1
* applied the correct sort order to the values on the y-axis (i.e., from top to bottom, the order of the bars is "Passes Bechdel Test", "Women only talk about men", "Women don't talk to each other", "Fewer than two women")

Your chart will look like:

![](https://github.com/dallascard/SI649_public/blob/main/altair_hw1/viz1-checkpoint2.png?raw=true&1) 

*Hint*: [Sort](https://altair-viz.github.io/user_guide/generated/core/altair.Sort.html)


In [None]:
```python
# TODO: Replicate checkpoint 2
```


#### checkpoint 3: Add a chart title, and remove axis labels and x-axis
 
* completed checkpoint 2
* add a chart title
* remove the x and y-axis labels
* remove the x-axis tick marks

Your chart will look like:

![](https://github.com/dallascard/SI649_public/blob/main/altair_hw1/viz1-checkpoint3.png?raw=True) 

*Hint*: [Axis](https://altair-viz.github.io/user_guide/generated/core/altair.Axis.html)


In [None]:
```python
# TODO: Replicate checkpoint 3
```


#### checkpoint 4: Reshape the plot
 
* completed checkpoint 3
* Reshape the plot to have both width and height equal to 100

Your chart will look like:

![](https://github.com/dallascard/SI649_public/blob/main/altair_hw1/viz1-checkpoint4.png?raw=True&1) 

Hint: set the width and height properties of the chart

In [None]:
```python
# TODO: Replicate checkpoint 4
```


#### checkpoint 5: Add a text layer with the numbers for each bar
 
* completed checkpoint 4
* add the numbers for each bar with correct formatting (two decimal places)

Your chart will look like:

![](https://github.com/dallascard/SI649_public/blob/main/altair_hw1/viz1-checkpoint5.png?raw=True&1) 

Hint 1: In Altair you can overlay two charts on top of each other using the "+" notation (e.g., `chart1 + chart2`)

Hint 2: You can create a text layer that inherits everything from the base layer by using `base.mark_text().encode(text="...")`, where `base` is the name of the base bar chart, and the "..." is the data to show as text

Hint 3: Use the "dx" property of mark_text() to nudge the text left or right (see https://altair-viz.github.io/gallery/bar_chart_with_labels.html)

In [None]:
```python
# TODO: Replicate checkpoint 5
```


#### checkpoint 6: remove the x-axis line and chart box, and increase the padding between bars
 
* completed checkpoint 5
* remove the x-axis line
* remove the box around the figure
* increase the padding between bars

Your chart will look like:

![](https://github.com/dallascard/SI649_public/blob/main/altair_hw1/viz1-checkpoint6.png?raw=True) 

Hint 1: You can make use of `configure_axis()` and `configure_view()` of the overall view (base and text layer combined).

Hint 2: There are multiple ways to increase the spacing between bars.

In [None]:
```python
# TODO: Replicate checkpoint 6
```


#### checkpoint 7: create the right chart using International ROI with the same styling as the left chart
 
* completed checkpoint 6
* create the right chart with the same styling as the left chart
    * correct data
    * correct mark
    * correct encoding
    * apply correct sort order
    * no x- and y-axis labels
    * no x-axis
    * no box on chart
    * text labels with proper formatting and alignment
    * include the title for International ROI

Your chart will look like:

![](https://github.com/dallascard/SI649_public/blob/main/altair_hw1/viz1-checkpoint7.png?raw=True) 


In [None]:
```python
# TODO: Replicate checkpoint 7
```


#### checkpoint 8: remove y-axis labels and change color
 
* completed checkpoint 7
* remove y-axis labels
* set the bar color

Your chart will look like:

![](https://github.com/dallascard/SI649_public/blob/main/altair_hw1/viz1-checkpoint8.png?raw=True) 



In [None]:
```python
# TODO: Replicate checkpoint 8
```


#### checkpoint 9: combine the two charts together
 
* display both completed charts side by side

Your chart will look like:

![](https://github.com/dallascard/SI649_public/blob/main/altair_hw1/viz1-checkpoint9.png?raw=True) 

Hint: You will need to move your view and axis configurations to the overall combined chart!

In [None]:
```python
# TODO: Replicate checkpoint 9
```


#### BONUS: add an overall title and add dollar symbols to text marks
 
* complete checkpoint 9
* add an overall title
* add dollar signs to text labels

Your chart will look like:

![](https://github.com/dallascard/SI649_public/blob/main/altair_hw1/viz1-bonus-checkpoint.png?raw=True) 



In [None]:
```python
# BONUS: replicate the bonus checkpoint (optional)
```


## Visualization 2: Replicate this visualization

![](https://github.com/dallascard/SI649_public/blob/main/altair_hw1/viz2-checkpoint3.png?raw=True) 

### *** Step 1: Write down your plan for the visualization (required) ***

*   Data Name: *movieDF*
*   mark type: *TODO: write your answer here*
*   Encoding Specification (1st chart):  
    * *TODO: write your answer here* (channel : variable : datatype)
    * *TODO: write your answer here* (channel : variable : datatype)
*   Encoding Specification (2nd chart):  
    * *TODO: write your answer here* (channel : variable : datatype)
    * *TODO: write your answer here* (channel : variable : datatype)
*   Encoding Specification (3rd chart):  
    * *TODO: write your answer here* (channel : variable : datatype)
    * *TODO: write your answer here* (channel : variable : datatype)

Compound Method (how to join these charts together?): *TODO: write your answer here*

### Step 2: Create your chart. 
Please use the checkpoints below to work through the problem step-by-step. You can search for the keyword "TODO" to locate cells that need your edits.


### Visualization 2 Checkpoints

#### checkpoint 1: line chart for average, median, and max of budget 
 
You will get full points if you 
*  Specify the correct mark 
*  Use the correct x and y encoding 
*  Plot the right data 
*  Produce 3 line charts concatenated vertically


Your chart will look like:

![](https://github.com/dallascard/SI649_public/blob/main/altair_hw1/viz2-checkpoint1.png?raw=True) 


In [None]:
```python
# TODO: Replicate checkpoint 1
```


#### checkpoint 2: adjust width, height, and color
Each chart should be 500x100 and plotted in a different color.
 
*  Complete checkpoint 1
*  Adjust chart width and height
*  Plot charts with different colors

Your chart will look like:

![](https://github.com/dallascard/SI649_public/blob/main/altair_hw1/viz2-checkpoint2.png?raw=True) 


In [None]:
```python
# TODO: Replicate checkpoint 2
```


#### checkpoint 3: remove duplicated x-axis and adjust tick spacing

You will get full points if you 
*  Complete checkpoint 2
*  Remove the duplicate x-axes from the top and middle charts
*  Set the bottom x-axis to have ticks every 5 years

Your chart will look like:

![](https://github.com/dallascard/SI649_public/blob/main/altair_hw1/viz2-checkpoint3.png?raw=True) 


In [None]:
```python
# TODO: Replicate checkpoint 3
```


## Visualization 3: Replicate this visualization


![](https://github.com/dallascard/SI649_public/blob/main/altair_hw1/viz3-checkpoint4.png?raw=True) 

### *** Step 1: Write down your plan for the visualization, for all four channels (required) ***

*   Data Name: *movieDF*
*   mark type: *TODO: write your answer here*
*   Encoding Specification:  
    * *TODO: write your answer here* (channel : variable : datatype)
    * *TODO: write your answer here* (channel : variable : datatype)
    * *TODO: write your answer here* (channel : variable : datatype)
    * *TODO: write your answer here* (channel : variable : datatype)






### Step 2: Create your chart. 
Please use the checkpoints below to work through the problem step-by-step. You can search for the keyword "TODO" to locate cells that need your edits.

#### checkpoint 1: scatter plot of IMDB rating vs budget (in 2013 dollars)
 
You will get full points if you 
*  Specify the correct mark 
*  Plot the right data 
*  Use the correct x and y encoding 

Your chart will look like:

![](https://github.com/dallascard/SI649_public/blob/main/altair_hw1/viz3-checkpoint1.png?raw=True) 


In [None]:
```python
# TODO: replicate checkpoint 1
```


#### checkpoint 2: add the color and size channels
 
*  Complete checkpoint 1
*  Add the color channel
*  Add the size channel
*  Plot the right data

Your chart will look like:

![](https://github.com/dallascard/SI649_public/blob/main/altair_hw1/viz3-checkpoint2.png?raw=True&1) 


In [None]:
```python
# TODO: replicate checkpoint 2
```


#### checkpoint 3: adjust the Legend for the size channel
 
*  Complete checkpoint 2
*  set the legend for the Size channel to explicitly include 0

Your chart will look like:

![](https://github.com/dallascard/SI649_public/blob/main/altair_hw1/viz3-checkpoint3.png?raw=True) 

Hint: You will need to adjust the legend using `alt.Legend()` within `alt.Size()`.


In [None]:
```python
# TODO: replicate checkpoint 3
```


#### checkpoint 4: adjust the Scale for the Size channel
 
*  Complete checkpoint 3
*  set the scale of the size channel to map the data onto the range 10 to 300 (this maps 0 in the data to a size of 10)

Your chart will look like:

![](https://github.com/dallascard/SI649_public/blob/main/altair_hw1/viz3-checkpoint4.png?raw=True) 

Hint: You will need to adjust the scale of the size channel using `alt.Scale()`.


In [None]:
```python
# TODO: replicate checkpoint 4
```


*End of Lab*

To submit your assignment:

1. Please run all cells (Runtime > Run all), and make sure all the cells ran properly!
2. Make sure you have named your .ipynb file with your uniqname: i.e., uniqname.ipynb
3. Upload your .ipynb file to Canvas. 
