# Altair Example 1 - Star Wars Movies Seen

This notebook demonstrates one way to create an Altair graphic, a heatmap built on a simple set of data, as an alternative visualization that complements the source article. The data is a subset of the [FiveThirtyEight](https://fivethirtyeight.com) survey data behind the article [America’s Favorite ‘Star Wars’ Movies (And Least Favorite Characters)](https://fivethirtyeight.com/features/americas-favorite-star-wars-movies-and-least-favorite-characters/) (Hickey, 2014). The original dataset can be found at [FiveThirtyEight Star Wars Survey](https://github.com/fivethirtyeight/data/tree/master/star-wars-survey).


This notebook is an attempt to recreate the following image created by Nicholas Miller using Altair:

![nmill_ans_1.2](images/nmill_ans_1.2.png)


```python
# The code in this cell was written and provided by the instruction team of 
# University of Michigan - School of Information - SIADS-522 - Information Visualization
# Taught by Professor Eytan Adar (2020)

import pandas as pd
import altair as alt
import numpy as np
import math

# enable correct rendering
alt.renderers.enable('default')

# uses intermediate json files to speed things up
alt.data_transformers.enable('json')

sw = pd.read_csv('datasets/StarWars.csv', encoding='latin1')

#--------------------------------------------------------------------------------------------------

# The raw survey columns need renaming; this produces the formatted dataset in a dataframe
sw = sw.rename(columns={'Have you seen any of the 6 films in the Star Wars franchise?':'seen_any_movie',
                        'Do you consider yourself to be a fan of the Star Wars film franchise?': 'fan',
                        'Which of the following Star Wars films have you seen? Please select all that apply.' : 'seen_EI',
                        'Unnamed: 4' : 'seen_EII',
                        'Unnamed: 5' : 'seen_EIII',
                        'Unnamed: 6' : 'seen_EIV',
                        'Unnamed: 7' : 'seen_EV',
                        'Unnamed: 8' : 'seen_EVI',
                        'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.' : 'rank_EI',
                        'Unnamed: 10' : 'rank_EII',
                        'Unnamed: 11' : 'rank_EIII',
                        'Unnamed: 12' : 'rank_EIV',
                        'Unnamed: 13' : 'rank_EV',
                        'Unnamed: 14' : 'rank_EVI',
                        'Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.' : 'Han Solo',
                        'Unnamed: 16' : 'Luke Skywalker',
                        'Unnamed: 17' : 'Princess Leia Organa',
                        'Unnamed: 18' : 'Anakin Skywalker',
                        'Unnamed: 19' : 'Obi Wan Kenobi',
                        'Unnamed: 20' : 'Emperor Palpatine',
                        'Unnamed: 21' : 'Darth Vader',
                        'Unnamed: 22' : 'Lando Calrissian',
                        'Unnamed: 23' : 'Boba Fett',
                        'Unnamed: 24' : 'C-3PO',
                        'Unnamed: 25' : 'R2 D2',
                        'Unnamed: 26' : 'Jar Jar Binks',
                        'Unnamed: 27' : 'Padme Amidala',
                        'Unnamed: 28' : 'Yoda',
                       })
# the first data row holds sub-question labels (film titles, "Response"), not answers, so drop it
sw = sw.drop([0])

#--------------------------------------------------------------------------------------------------

# Sample visualization

# We're going to fix the labels a bit, so we create a mapping to the full names
episodes = ['EI', 'EII', 'EIII', 'EIV', 'EV', 'EVI']
names = {
    'EI' : 'The Phantom Menace', 'EII' : 'Attack of the Clones', 'EIII' : 'Revenge of the Sith', 
    'EIV': 'A New Hope', 'EV': 'The Empire Strikes Back', 'EVI' : 'The Return of the Jedi'
}

# we're also going to use this order to sort, so names_l will now have our sort order
names_l = [names[ep] for ep in episodes]

#--------------------------------------------------------------------------------------------------

# let's do some data pre-processing... sw (star wars) has everything

# We only want people who have seen at least one movie: keep those people, toss the NAs,
# and get the total count

# find people who have at least one of the columns (seen_*) not NaN
seen_at_least_one = sw.dropna(subset=['seen_' + ep for ep in episodes], how='all')
total = len(seen_at_least_one)

#--------------------------------------------------------------------------------------------------

# for each movie, we're going to calculate the percents and generate a new data frame
percs = []

# loop over each column and calculate the share of people who have seen the movie:
# drop the people who are *NaN* for a specific episode (e.g., seen_EII), count the rest,
# and divide by the total
for seen_ep in ['seen_' + ep for ep in episodes]:
    perc = len(seen_at_least_one[~ pd.isna(seen_at_least_one[seen_ep])]) / total
    percs.append(perc)

# at this point percs is holding our percentages

# now we use "zip" to make tuples (pairing names with percents) and then make a dataframe
tuples = list(zip([names[ep] for ep in episodes], percs))
seen_per_df = pd.DataFrame(tuples, columns = ['Name', 'Percentage'])
```
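The loop above counts the non-missing `seen_*` values one column at a time. The same calculation can be written in vectorized pandas. The sketch below assumes, as the loop does, that a non-null `seen_*` entry means the respondent saw that film. The `toy` DataFrame is hypothetical data, not the survey. The `toy_`/`titles` names are illustrative and were chosen so the cell does not overwrite the notebook's own variables.

```python
import numpy as np
import pandas as pd

# Hypothetical toy responses (NOT the survey data): any non-null value means
# the respondent checked that film; NaN means they did not.
titles = {
    'EI': 'The Phantom Menace', 'EII': 'Attack of the Clones', 'EIII': 'Revenge of the Sith',
    'EIV': 'A New Hope', 'EV': 'The Empire Strikes Back', 'EVI': 'The Return of the Jedi',
}
seen_cols = ['seen_' + ep for ep in titles]
toy = pd.DataFrame({
    'seen_EI':   ['x', np.nan, 'x', np.nan],
    'seen_EII':  ['x', np.nan, np.nan, np.nan],
    'seen_EIII': [np.nan, np.nan, np.nan, np.nan],
    'seen_EIV':  ['x', np.nan, 'x', 'x'],
    'seen_EV':   ['x', np.nan, 'x', 'x'],
    'seen_EVI':  ['x', np.nan, 'x', np.nan],
})

# Same filter as the notebook: keep rows where at least one seen_* column is filled.
toy_seen_any = toy.dropna(subset=seen_cols, how='all')

# notna() yields a boolean frame; each column's mean is (count seen) / (rows kept),
# which is exactly what the loop computes with len(...) / total.
toy_percs = toy_seen_any[seen_cols].notna().mean()

toy_per_df = pd.DataFrame({
    'Name': list(titles.values()),
    'Percentage': toy_percs.to_numpy(),
})
print(toy_per_df)
```

Because booleans average to the fraction of `True` values, `.notna().mean()` replaces the explicit counting and division. The result keeps the same `Name`/`Percentage` layout that the chart cell expects.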

The visualization below does not appear in the article. It offers a contrasting way to present the data and adds context to the article. An alternative color scheme was used for aesthetic reasons.

```python
# This is code written by Nicholas Miller

base = alt.Chart(seen_per_df).encode(
    alt.Y('Name:O',
          sort=names_l)
)

heatmap = base.mark_rect().encode(
    color=alt.Color('Percentage:Q',
                    #scale=alt.Scale(scheme='redyellowgreen'),
                    scale=alt.Scale(scheme='lightgreyteal'),
                    legend=None
                   )
).properties(
    width=500,
    height=500
)

text = base.mark_text().encode(
    text=alt.Text('Percentage:Q',
                  format='.0%')
)

(heatmap + text).configure(
    background='#F0F0F0',
    padding=10                    # Add some padding around the edge
).properties(
    # add a title
    title={"text": "Which 'Star Wars' Movies Have You Seen?",
           "subtitle": "Of {} respondents who have seen any film".format(len(seen_at_least_one)),
           "fontSize":24,
           "subtitleFontSize":18,
           "anchor":"start",      # Make the text left justified
           "offset":15            # Add some padding between the title and the graph below
          }
).configure_axis(
    ticks=False,                  # Remove the ticks
    labelFontSize=20,
    #labelFontStyle='bold',
    labelLimit=300,
    domain=False,                 # Remove the axis line
    offset=-470,                  # Negative offset pulls the right-side axis labels inward over the bars
    title=None,
    orient='right',
    translate=-25
).configure_text(
    dir='rtl',
    dx=190,
    dy=20,
    fontSize=50,
    fontWeight='bold',            # Vega font weight; fontStyle only accepts values like 'italic'
    stroke='white',
    strokeOpacity=0.7
)
```

![Example 1](images/example1.png)