Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All). Make sure your notebook executed to the end.

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE." Please remember that homeworks are to be completed independently. You may not share code with others.

In [1]:
NAME = "Kevin Borah"

---

# Information Visualization I 
## School of Information, University of Michigan

## Week 3: 
- Perception / Cognition

## Assignment Overview
### This assignment's objectives include:

- Review, reflect, and apply the concepts of the perception pipeline. Justify how different encodings impact the effectiveness of a visualization depending on the human perception process.

!["Drawing"](assets/preattentive_resized.png)

<p style="text-align: center;"> Preattentive Processing </p>

- Recreate visualizations and propose new and alternative visualizations using [Altair](https://altair-viz.github.io/) 

### The total score of this assignment will be 100 points consisting of:
- Case study reflection: America’s Favorite ‘Star Wars’ Movies (And Least Favorite Characters) (30 points)
- Altair programming exercise (70 points)

### Resources:
- Article by [FiveThirtyEight](https://fivethirtyeight.com) available  [online](https://fivethirtyeight.com/features/americas-favorite-star-wars-movies-and-least-favorite-characters/) (Hickey, 2014)  
- Datasets from FiveThirtyEight; we have downloaded a subset of this data into the folder [./assets](assets)
    - The original dataset can be found at [FiveThirtyEight Star Wars Survey](https://github.com/fivethirtyeight/data/tree/master/star-wars-survey)
    
    
### Important notes:
1) Grading for this assignment is entirely done by manual inspection. Focus on getting the visualization to look like our example. It doesn't need to be pixel perfect (e.g., you may not always know what our example is scaled by), but it should be pretty close. Hint: go back to lab in week 2 on altair for some styling help. A *lot* of the look and feel can be done in one line of code.

2) There will be a couple of places where the numbers you get when you select rows may be a little different than 538, but the percents should still work (e.g., 828 instead of 834). You'll see this in our examples. If you can somehow get the data to match exactly, that's great too.

3) When turning in your PDF, please use the File -> Print -> Save as PDF option ***from your browser***. Do ***not*** use the File->Download as->PDF option. Complete instructions for this are under Resources in the Coursera page for this class.

## Part 1. Perception and Cognition (30 points)
Read the article ["America’s Favorite ‘Star Wars’ Movies (And Least Favorite Characters),"](https://fivethirtyeight.com/features/americas-favorite-star-wars-movies-and-least-favorite-characters/) and answer the following questions:


### 1.1 List the different data types in the following visualizations and their encodings (10 points)
Look at the following visualizations. For each, list the variable, their type, and the encoding used (e.g., Weight, quantitative, color, ...)

!["Drawing"](assets/how_rate_resized.png)
!["Drawing"](assets/have_seen_resized.png)

YOUR ANSWER HERE

    Vis1: Movie, Nominal, Position
          Third Category, Nominal, Position & Color
          Percent, Quantitative, Text & Length
          
    Vis2: Movie, Nominal, Position
          Percent, Quantitative, Text & Length

### 1.2 Propose an alternative encoding for the following visualization. Compare the visualizations based on perception. (10 points)
Either hand-draw or use an application to create a sketched solution. Upload an image and describe the differences between your solution and the FiveThirtyEight image in terms of perception (specifically for the task of comparing one movie to another).
!["Drawing"](assets/have_seen_resized.png)

YOUR ANSWER HERE

![answer1.2](assets/my_pic.jpg)

My image does a much worse job of helping the viewer compare one movie to another. Any viewer would have a tough time comparing the areas of six boxes, especially when the boxes are not ordered in any particular way (for example, from largest to smallest). In terms of perception, length is a much more effective encoding than area for comparing several values quickly. 538's chart can be understood with very little processing.

### 1.3 Propose an alternative encoding for the following visualization. Compare the visualizations based on perception. (10 points)
Again, either hand-draw or use an application to create a sketched solution. Upload an image and describe the differences between your solution and the FiveThirtyEight image in terms of perception (specifically for the task of comparing one movie to another).
!["Drawing"](assets/how_rate_resized.png)

YOUR ANSWER HERE

![answer1.3](assets/my_pic2.jpg)

My alternative vis still requires the viewer to compare areas. Any advantage it has in presenting the percentages as parts of a whole (for instance, the stacked bars that share borders clearly add to 100) is rendered moot when you try to compare how the movies were ranked. Even if I included the percentages as text where I have "Top", "Mid", and "Bott", the comparisons between movies would not be as effective, and that comes down to humans having a difficult time comparing areas. If the task were to examine how only one movie did in the top, middle, and bottom third, I might prefer the stacked bar approach. But for comparing between movies, the more efficiently the viewer can process what is in front of them, the better.

## Part 2. Altair programming exercise (70 points)
We have provided you with some code and parts of the article [America’s Favorite ‘Star Wars’ Movies (And Least Favorite Characters)](https://fivethirtyeight.com/features/americas-favorite-star-wars-movies-and-least-favorite-characters/). This article is based on the dataset:

1. [StarWars](data/StarWars.csv): created by FiveThirtyEight based on a survey run through SurveyMonkey Audience, surveying 1,186 respondents from June 3 to 6, 2014. Available [online](https://github.com/fivethirtyeight/data/tree/master/star-wars-survey)

To earn points for this assignment, you must:

- Recreate the visualizations in the article (replace the images in the article with a code cell that creates a visualization). We provide one example. Each visualization is worth 10 points (40 points/ 10 each x 4 total ).

    - _Partial credit can be granted for each visualization (up to 5 points) if you provide the grammar of graphics description of the visualization without a functional Altair implementation_


- Propose one alternative visualization for one of the article visualizations. Add a short paragraph describing why your visualization is more *effective* based on principles of perception/cognition. (15 points/ 10 points plot + 5 justification)


- Propose a new visualization to complement a part of the article. Add a short paragraph justifying your decisions in terms of Perception/Cognition processes.  (15 points/ 10 points plot + 5 justification)


In [2]:
import pandas as pd
import altair as alt
import numpy as np
import math

In [3]:
# enable correct rendering
alt.renderers.enable('default')

RendererRegistry.enable('default')

In [4]:
# uses intermediate json files to speed things up
alt.data_transformers.enable('json')

DataTransformerRegistry.enable('json')

In [5]:
sw = pd.read_csv('assets/StarWars.csv', encoding='latin1')

In [6]:
# Some format is needed for the survey dataframe, we provide the formatted dataset in a dataframe 
sw = sw.rename(columns={'Have you seen any of the 6 films in the Star Wars franchise?':'seen_any_movie',
                        'Do you consider yourself to be a fan of the Star Wars film franchise?': 'fan',
                        'Which of the following Star Wars films have you seen? Please select all that apply.' : 'seen_EI',
                        'Unnamed: 4' : 'seen_EII',
                        'Unnamed: 5' : 'seen_EIII',
                        'Unnamed: 6' : 'seen_EIV',
                        'Unnamed: 7' : 'seen_EV',
                        'Unnamed: 8' : 'seen_EVI',
                        'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.' : 'rank_EI',
                        'Unnamed: 10' : 'rank_EII',
                        'Unnamed: 11' : 'rank_EIII',
                        'Unnamed: 12' : 'rank_EIV',
                        'Unnamed: 13' : 'rank_EV',
                        'Unnamed: 14' : 'rank_EVI',
                        'Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.' : 'Han Solo',
                        'Unnamed: 16' : 'Luke Skywalker',
                        'Unnamed: 17' : 'Princess Leia Organa',
                        'Unnamed: 18' : 'Anakin Skywalker',
                        'Unnamed: 19' : 'Obi Wan Kenobi',
                        'Unnamed: 20' : 'Emperor Palpatine',
                        'Unnamed: 21' : 'Darth Vader',
                        'Unnamed: 22' : 'Lando Calrissian',
                        'Unnamed: 23' : 'Boba Fett',
                        'Unnamed: 24' : 'C-3P0',
                        'Unnamed: 25' : 'R2 D2',
                        'Unnamed: 26' : 'Jar Jar Binks',
                        'Unnamed: 27' : 'Padme Amidala',
                        'Unnamed: 28' : 'Yoda',
                       })
sw = sw.drop([0])

In [7]:
# take a peek at the data
sw.sample(5)

Unnamed: 0,RespondentID,seen_any_movie,fan,seen_EI,seen_EII,seen_EIII,seen_EIV,seen_EV,seen_EVI,rank_EI,...,Yoda,Which character shot first?,Are you familiar with the Expanded Universe?,Do you consider yourself to be a fan of the Expanded Universe?æ,Do you consider yourself to be a fan of the Star Trek franchise?,Gender,Age,Household Income,Education,Location (Census Region)
742,3289821000.0,Yes,Yes,Star Wars: Episode I The Phantom Menace,Star Wars: Episode II Attack of the Clones,Star Wars: Episode III Revenge of the Sith,Star Wars: Episode IV A New Hope,Star Wars: Episode V The Empire Strikes Back,Star Wars: Episode VI Return of the Jedi,1,...,Very favorably,I don't understand this question,No,,Yes,Female,> 60,"$0 - $24,999",Some college or Associate degree,East North Central
876,3289490000.0,Yes,Yes,Star Wars: Episode I The Phantom Menace,Star Wars: Episode II Attack of the Clones,Star Wars: Episode III Revenge of the Sith,Star Wars: Episode IV A New Hope,Star Wars: Episode V The Empire Strikes Back,Star Wars: Episode VI Return of the Jedi,3,...,Very favorably,Greedo,No,,Yes,Female,45-60,"$25,000 - $49,999",Bachelor degree,West North Central
6,3292719000.0,Yes,Yes,Star Wars: Episode I The Phantom Menace,Star Wars: Episode II Attack of the Clones,Star Wars: Episode III Revenge of the Sith,Star Wars: Episode IV A New Hope,Star Wars: Episode V The Empire Strikes Back,Star Wars: Episode VI Return of the Jedi,1,...,Very favorably,Han,Yes,No,Yes,Male,18-29,"$25,000 - $49,999",Bachelor degree,Middle Atlantic
355,3290718000.0,Yes,Yes,Star Wars: Episode I The Phantom Menace,Star Wars: Episode II Attack of the Clones,Star Wars: Episode III Revenge of the Sith,Star Wars: Episode IV A New Hope,Star Wars: Episode V The Empire Strikes Back,Star Wars: Episode VI Return of the Jedi,4,...,Very favorably,Greedo,No,,Yes,Male,45-60,"$150,000+",Bachelor degree,East North Central
195,3291015000.0,Yes,Yes,Star Wars: Episode I The Phantom Menace,Star Wars: Episode II Attack of the Clones,Star Wars: Episode III Revenge of the Sith,Star Wars: Episode IV A New Hope,Star Wars: Episode V The Empire Strikes Back,Star Wars: Episode VI Return of the Jedi,6,...,Very favorably,I don't understand this question,No,,No,Female,> 60,"$100,000 - $149,999",Graduate degree,Mountain


# America’s Favorite ‘Star Wars’ Movies (And Least Favorite Characters)

_Original article available at [FiveThirtyEight](https://fivethirtyeight.com/features/americas-favorite-star-wars-movies-and-least-favorite-characters/)_

By [Walt Hickey](https://fivethirtyeight.com/contributors/walt-hickey/)

Filed under [Movies](https://fivethirtyeight.com/tag/movies/)

Get the data on [GitHub](https://github.com/fivethirtyeight/data/tree/master/star-wars-survey)

This week, I caught a sneak peek [of the X-Wing fighter](http://www.wired.com/2014/07/star-wars-episode-vii-x-wing/) from the new “Star Wars” films in production. The forthcoming movies — and the middling response to the most recent trilogy — provide a perfect excuse to examine some questions I’ve long wanted answers to: How many people are “Star Wars” fans? Does the rest of America realize that “The Empire Strikes Back” is clearly the best of the bunch? Which characters are most well-liked and most hated? And who shot first, Han Solo or Greedo?

We ran a poll through [SurveyMonkey Audience](https://www.surveymonkey.com/mp/audience/), surveying 1,186 respondents from June 3 to 6 (the [data](https://github.com/fivethirtyeight/data/tree/master/star-wars-survey) is available [on GitHub](https://github.com/fivethirtyeight/data)). Seventy-nine percent of those respondents said they had watched at least one of the “Star Wars” films. This question, incidentally, had a substantial difference by gender: 85 percent of men have seen at least one “Star Wars” film compared to 72 percent of women. Of people who have seen a film, men were also more likely to consider themselves a fan of the franchise: 72 percent of men compared to 60 percent of women.

We then asked respondents which of the films they had seen. With 835 people responding, here’s the probability that someone has seen a given “Star Wars” film given that they have seen any Star Wars film:

!["Sol1"](assets/have_seen_resized.png)
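
As a side note, the same conditional share can be computed without an explicit loop. The sketch below is a minimal illustration on a made-up frame (`toy` and its values are assumptions, not survey rows); it relies on the convention used in this dataset that a missing value in a `seen_*` column means "not seen".

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a film title means "seen", NaN means "not seen".
toy = pd.DataFrame({
    "seen_EI": ["Episode I", np.nan, "Episode I", np.nan],
    "seen_EV": ["Episode V", "Episode V", np.nan, np.nan],
})

# Keep respondents who have seen at least one film (drop rows that are all NaN).
seen_any = toy.dropna(how="all")

# notna() gives True/False per cell; the column mean is the share of True values,
# i.e. P(seen this film | seen at least one film).
share_seen = seen_any.notna().mean()
print(share_seen)
```

The denominator is the number of rows left after `dropna(how="all")`, which plays the role of the 835 respondents mentioned above.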

In [8]:
# Sample visualization

# We're going to fix the labels a bit so will create a mapping to the full names
episodes = ['EI', 'EII', 'EIII', 'EIV', 'EV', 'EVI']
names = {
    'EI' : 'The Phantom Meanance', 'EII' : 'Attack of the Clones', 'EIII' : 'Revenge of the Sith', 
    'EIV': 'A New Hope', 'EV': 'The Empire Strikes Back', 'EVI' : 'The Return of the Jedi'
}

# we're also going to use this order to sort, so names_l will now have our sort order
names_l = [names[ep] for ep in episodes]

print("sort order: ",names_l)

sort order:  ['The Phantom Meanance', 'Attack of the Clones', 'Revenge of the Sith', 'A New Hope', 'The Empire Strikes Back', 'The Return of the Jedi']


In [9]:
# let's do some data pre-processing... sw (star wars) has everything

# We want to only use those people who have seen at least one movie, let's get the people, toss NAs
# and get the total count

# find people who have at least one of the columns (seen_*) not NaN
seen_at_least_one = sw.dropna(subset=['seen_' + ep for ep in episodes],how='all')
total = len(seen_at_least_one)

print("total who have seen at least one: ", total)

total who have seen at least one:  835


In [10]:
# for each movie, we're going to calculate the percents and generate a new data frame
percs = []

# loop over each column and calculate the number of people who have seen the movie
# specifically, keep the people who are *not NaN* for a specific episode (e.g., seen_EII), count them
# and divide by the total
for seen_ep in ['seen_' + ep for ep in episodes]:
    perc = len(seen_at_least_one[~ pd.isna(seen_at_least_one[seen_ep])]) / total
    percs.append(perc)
    
# at this point percs is holding our percentages

# now we're going to use a trick to make tuples--pairing names with percents--using "zip" and then make a dataframe
tuples = list(zip([names[ep] for ep in episodes],percs))
seen_per_df = pd.DataFrame(tuples, columns = ['Name', 'Percentage'])
seen_per_df

Unnamed: 0,Name,Percentage
0,The Phantom Meanance,0.805988
1,Attack of the Clones,0.683832
2,Revenge of the Sith,0.658683
3,A New Hope,0.726946
4,The Empire Strikes Back,0.907784
5,The Return of the Jedi,0.883832


In [11]:
# ok, time to make the chart... let's make a bar chart (use mark_bar)
bars = alt.Chart(seen_per_df).mark_bar(size=20).encode(
    # encode x as the percent, and hide the axis
    x=alt.X(
        'Percentage',
        axis=None),
    y=alt.Y(
        # encode y using the name, use the movie name to label the axis, sort using the names_l
        'Name:N',
         axis=alt.Axis(tickCount=5, title=''),
         # we give the sorting order to avoid alphabetical order
         sort=names_l
    )
)

# at this point we don't really have a great plot (it's missing the annotations, titles, etc.)
bars


In [25]:
# we're going to overlay the text with the percentages, so let's make another visualization
# that's just text labels

text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    # we'll use the percentage as the text
    text=alt.Text('Percentage:Q',format='.0%')
)

# finally, we're going to combine the bars and the text and do some styling
alt.themes.enable('fivethirtyeight')

seen_movies = (text + bars).configure_mark(
    # we don't love the blue
    color='#6baed6'
).configure_view(
    # we don't want a stroke around the bars
    strokeWidth=0


).properties(
    # add a title
    title="Which 'Star Wars' Movies Have you Seen?"
)

seen_movies = seen_movies.configure(
    background = "#EEEEEE"
)


seen_movies

# note that we are NOT formatting this in the Five Thirty Eight Style yet... we'll leave that to you to figure out

So we can see that “Star Wars: Episode V — The Empire Strikes Back” is the film seen by the most number of people, followed by “Star Wars: Episode VI — Return of the Jedi.” Appallingly, more people reported seeing “Star Wars: Episode I — The Phantom Menace” than the original “Star Wars” (renamed “Star Wars: Episode IV — A New Hope”).

So, which movie is the best? We asked the subset of 471 respondents who indicated they have seen every “Star Wars” film to rank them from best to worst. From that question, we calculated the share of respondents who rated each film as their favorite.

!["Sol1"](assets/best_movie_article_resized.png)


** Homework note: Click [here](assets/best_movie.png) to see a version of this plot generated in Altair.
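
One way to think about "share of respondents who rated each film as their favorite" is as the mean of a boolean column. The following is a small sketch on invented rankings (the `toy_ranks` frame is an assumption for illustration); the ranks are stored as strings here because that is how they arrive when the CSV is read.

```python
import pandas as pd

# Hypothetical toy rankings stored as strings, as they are when read from the CSV.
toy_ranks = pd.DataFrame({
    "rank_EIV": ["1", "2", "1", "3"],
    "rank_EV":  ["2", "1", "2", "1"],
    "rank_EVI": ["3", "3", "3", "2"],
})

# Convert to numbers, then compare against 1; the mean of the booleans is the share
# of respondents who ranked that film first.
first_place_share = toy_ranks.apply(pd.to_numeric).eq(1).mean()
print(first_place_share)
```

Because every respondent has exactly one first-place film, the shares add up to 1, which is a quick sanity check on the real data as well.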

### 2.1 What's the best 'Star Wars' movie? Recreate the above image using altair (10 POINTS)

In [26]:
# Recreate this image using Altair
# try to match the "538 style" as best you can (hint: look at the altair lab at the start of the semester)


##GET DATA IN ORDER

# find people who have seen all six movies (drop rows with a NaN in any seen_* column)
seen_all_six = sw.dropna(subset=['seen_' + ep for ep in episodes],how='any')
total = len(seen_all_six)
seen_all_six.head()

# for each movie, we're going to calculate the percent of #1 votes
favorites = []

for rank_six in ['rank_' + ep for ep in episodes]:
    #favorite = seen_all_six[rank_six].value_counts()
    #favorite = favorite[1]
    
    favorite = seen_all_six[seen_all_six[rank_six]=="1"]
    favorites.append(len(favorite)/len(seen_all_six))
    
# now we're going to use a trick to make tuples--pairing names with percents--using "zip" and then make a dataframe
tuples = list(zip([names[ep] for ep in episodes],favorites))
favorites_df = pd.DataFrame(tuples, columns = ['Name', 'Percentage_Fave'])
favorites_df    
  

##READY CHART   

fave_bars = alt.Chart(favorites_df).mark_bar(size=20).encode(
    # encode x as the percent, and hide the axis
    x=alt.X(
        'Percentage_Fave',
        axis=None),
    y=alt.Y(
        # encode y using the name, use the movie name to label the axis, sort using the names_l
        'Name:N',
         axis=alt.Axis(tickCount=5, title='', labelColor = "black", labelFontSize = 15, grid = False),
         # we give the sorting order to avoid alphabetical order
         sort=names_l
    )
)
     

fave_text = fave_bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    # we'll use the percentage as the text
    text=alt.Text('Percentage_Fave:Q',format='.0%')
)

# finally, we're going to combine the bars and the text and do some styling
fave_movies = (fave_text + fave_bars).configure_mark(
    # 'source' is not a valid color value; use an explicit hex blue instead
    color='#008fd5'
).configure_view(
    # we don't want a stroke around the bars
    strokeWidth=0
).configure_scale(
    # add some padding
    bandPaddingInner=0.2
).properties(
    # set the dimensions of the visualization
    width=500,
    height=180
).properties(
    # add a title
    title={
        "text": ["What's The Best 'Star Wars' Movie?"],
        "subtitle": ["Of 471 respondents who have seen all six films"],
        "subtitleFontSize": 18
    }
)

fave_movies = fave_movies.configure(
    background = "#f0f0f0"
).configure_view(
    strokeWidth=0
)


fave_movies




# YOUR CODE HERE
#raise NotImplementedError()

## Make sure to *style* your visualization to match the original the best you can

We can also drill down and find out, generally, how people rate the films. Overall, fans broke into two camps: those who preferred the original three movies and those who preferred the three prequels. People who said “The Empire Strikes Back” was their favorite were also likely to rate “A New Hope” and “Return of the Jedi” higher as well. Those who rated “The Phantom Menace” as the best film were more likely to rate prequels higher.

This chart shows how often each film was rated in the top third (best or second-best), the middle third (third or fourth) or the bottom third (second-worst or worst). It’s a more nuanced take on the series:

!["Sol1"](assets/how_rate_resized.png)

** Homework note: Click [here](assets/people_rate.png) to see a version of this plot generated in Altair.
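
The top/middle/bottom grouping described above is a binning step, and `pd.cut` is one possible way to express it. This is a hedged sketch on made-up numeric ranks (`toy_ranks`, `labels`, and the bin edges are illustrative assumptions), not the required solution.

```python
import pandas as pd

# Hypothetical numeric ranks (1 = favorite, 6 = least favorite).
toy_ranks = pd.DataFrame({
    "rank_EI": [6, 5, 3, 1],
    "rank_EV": [1, 2, 4, 2],
})

labels = ["Top third", "Middle third", "Bottom third"]

# Bins are right-inclusive by default: (0, 2], (2, 4], (4, 6].
thirds = toy_ranks.apply(
    lambda col: pd.cut(col, bins=[0, 2, 4, 6], labels=labels).astype(str)
)

# Share of respondents per third, one column per film; thirds nobody chose become 0.
share = (
    thirds.apply(lambda col: col.value_counts(normalize=True))
    .reindex(labels)
    .fillna(0)
)
print(share)
```

Each column of `share` sums to 1, so the three bars for a film always account for every respondent.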

### 2.2 How people rate the 'Star Wars' movie? Recreate the above image using altair (10 POINTS)

In [27]:
# Recreate this image using altair here (10 POINTS)

sw_rank = seen_all_six[["rank_EI", 'rank_EII', 'rank_EIII', 'rank_EIV', 'rank_EV', 'rank_EVI']]
sw_rank = sw_rank.apply(pd.to_numeric)

# sw_rank = sw_rank.rename(columns={
#     "rank_EI":names_l[0],
#     'rank_EII':names_l[1],
#     'rank_EIII':names_l[2],
#     'rank_EIV':names_l[3],
#     'rank_EV':names_l[4],
#     'rank_EVI':names_l[5],
# })

top_thirds=[]
mid_thirds=[]
bott_thirds=[]


for rank_in_thirds in ['rank_' + ep for ep in episodes]:
   
    top_third = sw_rank[sw_rank[rank_in_thirds]<3]
    top_thirds.append(len(top_third)/len(sw_rank))
    
    # between() includes both endpoints by default
    mid_third = sw_rank[sw_rank[rank_in_thirds].between(3,4)]
    mid_thirds.append(len(mid_third)/len(sw_rank))
    
    bott_third = sw_rank[sw_rank[rank_in_thirds].between(5,6)]
    bott_thirds.append(len(bott_third)/len(sw_rank))    

# now we're going to use a trick to make tuples--pairing names with percents--using "zip" and then make a dataframe
tuples_thirds = list(zip([names[ep] for ep in episodes],top_thirds,mid_thirds,bott_thirds))
rank_thirds_df = pd.DataFrame(tuples_thirds, columns = ['Name', 'Top third', "Middle third","Bottom third"])


##READY CHART   

scale = 70
scaleh = 150

top_bars = alt.Chart(rank_thirds_df).mark_bar(size=20, color = "#4da74a").encode(
    x=alt.X(
        'Top third:Q',
        title = "Top third",
        axis=alt.Axis(grid=False, titleFontSize = 15,orient="top",ticks=False,domain=False,labels = False)
    ),
    y=alt.Y(
        # encode y using the name, use the movie name to label the axis, sort using the names_l
        'Name:N',
         axis=alt.Axis(tickCount=5, title='', labelColor = "black", labelFontSize = 15, grid = False),
         # we give the sorting order to avoid alphabetical order
         sort=names_l
    )
).properties(
    width = scale,
    height = scaleh
)

top_text = top_bars.mark_text(
    align='left',
    baseline='middle',
    dx=3,  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    # we'll use the percentage as the text
    text=alt.Text('Top third:Q',format='.0%')
).properties(
    width = scale,
    height = scaleh
)

top_bars = top_bars + top_text

mid_bars = alt.Chart(rank_thirds_df).mark_bar(size=20, color = "#008fd5").encode(
    # encode x as the percent, and hide the axis
    x=alt.X(
        'Middle third:Q',
        axis=alt.Axis(grid=False, titleFontSize = 15,orient="top",ticks=False,domain=False,labels = False)
    ),
    y=alt.Y(
        # encode y using the name, use the movie name to label the axis, sort using the names_l
        'Name:N',
         axis=None,
         # we give the sorting order to avoid alphabetical order
         sort=names_l
    )
).properties(
    width = scale,
    height = scaleh
)

mid_text = mid_bars.mark_text(
    align='left',
    baseline='middle',
    dx=3,  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    # we'll use the percentage as the text
    text=alt.Text('Middle third:Q',format='.0%')
).properties(
    width = scale,
    height = scaleh
)

mid_bars = mid_bars + mid_text


bott_bars = alt.Chart(rank_thirds_df).mark_bar(size=20, color= "#ff2600").encode(
    # encode x as the percent, and hide the axis
    x=alt.X(
        'Bottom third:Q',
        axis=alt.Axis(grid=False, titleFontSize = 15,orient="top",ticks=False,domain=False,labels = False)
    ),
    y=alt.Y(
        # encode y using the name, use the movie name to label the axis, sort using the names_l
        'Name:N',
         axis=None,
         # we give the sorting order to avoid alphabetical order
         sort=names_l
    )
).properties(
    width = scale,
    height = scaleh
)

bott_text = bott_bars.mark_text(
    align='left',
    baseline='middle',
    dx=3,  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    # we'll use the percentage as the text
    text=alt.Text('Bottom third:Q',format='.0%')
).properties(
    width = scale,
    height = scaleh
)

bott_bars = bott_bars + bott_text

ranked_thirds = (top_bars | mid_bars | bott_bars
).configure_scale(
    barBandPaddingInner = .2
).configure_view(
    # we don't want a stroke around the bars
    strokeWidth=0
).properties(
    title={
        "text": ["How People Rate the 'Star Wars' Movies"],
        "subtitle":["How often each film was rated in the top, middle and bottom third",
                    "(by 471 respondents who have seen all six films)"],
        "subtitleFontSize":15
    }
)

ranked_thirds


# YOUR CODE HERE
#raise NotImplementedError()

# See critique below

Finally, we took a boilerplate format used by political favorability polls — “Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her” — and asked respondents to rate characters in the series.

!["Sol1"](assets/char_ranking_resized.png)


** Homework note. Here's an example solution generated in Altair:
!["Sol1"](assets/people_rate_s.png)
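
The chart collapses the survey's answer options into four groups. A minimal sketch of that collapsing step, assuming a single made-up column of responses (`responses` is hypothetical; the answer strings are the ones that appear in the survey columns), could look like this:

```python
import pandas as pd

# Hypothetical toy responses for a single character column.
responses = pd.Series([
    "Very favorably", "Somewhat favorably", "Very unfavorably",
    "Unfamiliar (N/A)", None,
])

# Map each survey answer to the group shown in the chart.
collapse = {
    "Very favorably": "Favorable",
    "Somewhat favorably": "Favorable",
    "Neither favorably nor unfavorably (neutral)": "Neutral",
    "Somewhat unfavorably": "Unfavorable",
    "Very unfavorably": "Unfavorable",
    "Unfamiliar (N/A)": "Unfamiliar",
}

# map() leaves missing answers as NaN, and value_counts() skips NaN by default,
# so the shares are taken over the people who answered.
shares = responses.map(collapse).value_counts(normalize=True)
print(shares)
```

Whether unanswered rows belong in the denominator is a design choice; this sketch excludes them, which is one possible reason row counts can differ slightly from the article.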


### 2.3 Star Wars' Characters Favorability Ratings. Recreate the above image using altair (10 POINTS)

In [28]:
# Recreate this image using altair here (10 POINTS)

sw_char = sw[['Luke Skywalker',
            'Han Solo', 
            'Princess Leia Organa',
            'Anakin Skywalker',
            'Obi Wan Kenobi',
            'Emperor Palpatine',
            'Darth Vader',
            'Lando Calrissian',
            'Boba Fett',
            'C-3P0',
            'R2 D2',
            'Jar Jar Binks',
            'Padme Amidala',
            'Yoda']]

sw_char = sw_char.dropna(how = "all")

#seen_all_six = sw.dropna(subset=['seen_' + ep for ep in episodes],how='any')

char_list = ['Luke Skywalker',
             'Han Solo',
            'Princess Leia Organa',
            'Anakin Skywalker',
            'Obi Wan Kenobi',
            'Emperor Palpatine',
            'Darth Vader',
            'Lando Calrissian',
            'Boba Fett',
            'C-3P0',
            'R2 D2',
            'Jar Jar Binks',
            'Padme Amidala',
            'Yoda']

favorable = []
neutral = []
unfavorable = []
unfamiliar = []

# for each character, compute the share of non-missing responses in each category
for x in char_list:
    answered = sw_char[x].count()  # number of non-NaN responses for this character
    favorable.append(sw_char[x].isin(["Very favorably", "Somewhat favorably"]).sum() / answered)
    neutral.append((sw_char[x] == 'Neither favorably nor unfavorably (neutral)').sum() / answered)
    unfavorable.append(sw_char[x].isin(['Very unfavorably', 'Somewhat unfavorably']).sum() / answered)
    unfamiliar.append((sw_char[x] == 'Unfamiliar (N/A)').sum() / answered)


tuples_chars = list(zip([chars for chars in char_list],favorable,neutral,unfavorable,unfamiliar))
feelings_df = pd.DataFrame(tuples_chars, columns = ['Name', 'Favorable', "Neutral","Unfavorable","Unfamiliar"])
feelings_df = feelings_df.sort_values(by="Favorable",ascending=False)
feelings_df

##READY CHART   

scale = 85
scaleh = 300

sort_char = feelings_df["Name"].tolist()

fave_bars = alt.Chart(feelings_df).mark_bar(size=20, color = "#4da74a").encode(
    x=alt.X(
        'Favorable:Q',
        title = "Favorable",
        axis=alt.Axis(grid=False, titleFontSize = 15,orient="top",ticks=False,domain=False,labels = False)
    ),
    y=alt.Y(
        
        'Name:N',
         axis=alt.Axis(tickCount=5, title='', labelColor = "black", labelFontSize = 15, grid = False),
         sort = sort_char
    )
).properties(
    width = scale,
    height = scaleh
)

fave_text = fave_bars.mark_text(
    align='left',
    baseline='middle',
    dx=3,  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    # we'll use the percentage as the text
    text=alt.Text('Favorable:Q',format='.0%')
).properties(
    width = scale,
    height = scaleh
)

fave_bars = fave_bars + fave_text

neut_bars = alt.Chart(feelings_df).mark_bar(size=20, color = "#008fd5").encode(
    # encode x as the percent, and hide the axis
    x=alt.X(
        'Neutral:Q',
        axis=alt.Axis(grid=False, titleFontSize = 15,orient="top",ticks=False,domain=False,labels = False)
    ),
    y=alt.Y(
        'Name:N',
        axis=None,
        sort = sort_char
    )
).properties(
    width = scale,
    height = scaleh
)

neut_text = neut_bars.mark_text(
    align='left',
    baseline='middle',
    dx=3,  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    # we'll use the percentage as the text
    text=alt.Text('Neutral:Q',format='.0%')
).properties(
    width = scale,
    height = scaleh
)

neut_bars = neut_bars + neut_text


unfave_bars = alt.Chart(feelings_df).mark_bar(size=20, color= "#ff2600").encode(
    # encode x as the percent, and hide the axis
    x=alt.X(
        'Unfavorable:Q',
        axis=alt.Axis(grid=False, titleFontSize = 15,orient="top",ticks=False,domain=False,labels = False)
    ),
    y=alt.Y(
        'Name:N',
         axis=None,
        sort = sort_char
    )
).properties(
    width = scale,
    height = scaleh
)

unfave_text = unfave_bars.mark_text(
    align='left',
    baseline='middle',
    dx=3,  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    # we'll use the percentage as the text
    text=alt.Text('Unfavorable:Q',format='.0%')
).properties(
    width = scale,
    height = scaleh
)

unfave_bars = unfave_bars + unfave_text

unfam_bars = alt.Chart(feelings_df).mark_bar(size=20, color= "#999999").encode(
    # encode x as the percent, and hide the axis
    x=alt.X(
        'Unfamiliar:Q',
        axis=alt.Axis(grid=False, titleFontSize = 15,orient="top",ticks=False,domain=False,labels = False)
    ),
    y=alt.Y(
        'Name:N',
        axis=None,
        sort = sort_char
    )
).properties(
    width = scale,
    height = scaleh
)

unfam_text = unfam_bars.mark_text(
    align='left',
    baseline='middle',
    dx=3,  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    # we'll use the percentage as the text
    text=alt.Text('Unfamiliar:Q',format='.0%')
).properties(
    width = scale,
    height = scaleh
)

unfam_bars = unfam_bars + unfam_text

fav_ratings = (fave_bars | neut_bars | unfave_bars | unfam_bars
).configure_scale(
    barBandPaddingInner = .2
).configure_view(
    # we don't want a stroke around the bars
    strokeWidth=0
).properties(
    title={
        "text": ["'Star Wars' Characters Favorability Ratings"],
        "subtitle":["By 834 respondents"],
        "subtitleFontSize":15
    }
)


fav_ratings




# YOUR CODE HERE
#raise NotImplementedError()

# So close.... yet so far.

Throughout this assignment I could tell I was writing too much code, but I could not figure out how to condense it. For this vis (and the one before it), I know I should have (or at least could have) faceted the charts, but I couldn't work out how. I also tried some variations of `resolve_scale()`, but never got them to work.

From a grammar of graphics perspective, the x-axis across the concatenated charts is not consistent; it needed to be adjusted so that 10%, for instance, is drawn at the same length in all four categories.

You read that correctly. Jar Jar Binks has a lower favorability rating than the actual personification of evil in the galaxy.

And for those of you who want to know the impact that [historical revisionism](http://en.wikipedia.org/wiki/Han_shot_first) can have on a society:

!["Sol1"](assets/shot_first_article_resized.png)


** Homework note: Click [here](assets/shot_first.png) to see a version of this plot generated in Altair. You may find that you don't get 834 rows (as 538 did) but the percents should still work.

### 2.4 Who shot first? Recreate the above image using altair (10 POINTS)

In [29]:
# Recreate this image using altair here (10 POINTS)

shot_first = sw[["Which character shot first?","seen_any_movie"]]
shot_first = shot_first.dropna()


options_l = ["Han","Greedo","I don't understand this question"]

names_p = ["han_p","greedo_p","huh"]

percents = []

for name in options_l:
    # count within shot_first so the numerator and denominator use the same rows
    percent = len(shot_first[shot_first["Which character shot first?"]==name])/len(shot_first)
    percents.append(percent)
    
percents  

zipem = list(zip(options_l, percents))
shot_first_df = pd.DataFrame(zipem, columns = ["Name","Percent"])

shot_first_df


##READY CHART   

shot_bars = alt.Chart(shot_first_df).mark_bar(size=20, color = "#008fd5").encode(
    # encode x as the percent, and hide the axis
    x=alt.X(
        'Percent:Q',
        axis=None),
    y=alt.Y(
        # encode y using the answer, drop the axis title, sort using options_l
        'Name:N',
         axis=alt.Axis(tickCount=0, title='', labelColor = "black", labelFontSize = 12, grid = False),
         sort = options_l
    )
)

shot_bars


shot_text = shot_bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    # we'll use the percentage as the text
    text=alt.Text('Percent:Q',format='.0%')
)

# finally, we're going to combine the bars and the text and do some styling
shot_comb = (shot_text + shot_bars).configure_mark(
    # we don't love the blue
    color='source'
).configure_view(
    # we don't want a stroke around the bars
    strokeWidth=0
).configure_scale(
    # add some padding
    bandPaddingInner=0.2
).properties(
    # set the dimensions of the visualization
    width=500,
    height=100
).properties(
    # add a title
    title={
        "text": ["Who Shot First?"],
        "subtitle": ["According to 828 respondents"],
        "subtitleFontSize": 18
    }
)

shot_comb = shot_comb.configure(
    background = "#f0f0f0"
).configure_view(
    strokeWidth=0
)


shot_comb


In [30]:
print(total)

471


### 2.5.1 Make your own (15 points/ 10 points plot + 5 justification)

Propose and code an alternative visualization for one of the visualizations *already in the article*. Add a short paragraph describing why your visualization is more (or less) *effective* based on principles of perception/cognition. 

If you feel your visualization is worse, that's ok! Just tell us why.

In [31]:
# YOUR CODE HERE

shot_first["a"] = "a"
shot_first

sorted_l = ["I don't understand this question","Han","Greedo"]

shot_bars = alt.Chart(shot_first_df).mark_bar(size=20, color = "#008fd5").encode(
    # encode x as the percent, and hide the axis
    x=alt.X(
        'Percent:Q',
        axis=None),
    y=alt.Y(
        # encode y using the answer, drop the axis title, sort using options_l
        'Name:N',
         axis=alt.Axis(tickCount=0, title='', labelColor = "black", labelFontSize = 12, grid = False),
         sort = options_l
    )
)

shot_bars
shot_first_df["a"]="Who shot first?"
shot_first_trans = shot_first_df.T
shot_first_trans

han_bar = alt.Chart(shot_first_df).mark_bar(size=50).encode(
    # encode x as the percent, and hide the axis
    x=alt.X(
        'Percent:Q',
        axis = None
        ),
    y=alt.Y(
        # encode y using the constant column 'a' so every answer lands in one stacked row
        'a:N',
         axis=None,
         sort = options_l),
    color=alt.Color(
         "Name:N",
         legend = alt.Legend(title="Answers",labelFontSize = 11)
    )
)

text = han_bar.mark_text(dx=-75, dy=35,fontSize = 15).encode(
    x=alt.X(
        'Percent:Q',
        stack = "zero",
        axis = None
        ),
    y=alt.Y(
        # encode y using the constant column 'a' so the labels sit on the stacked row
        'a:N',
         axis=None,
         sort = options_l),
    color=alt.Color(
         "Name:N",
         legend = alt.Legend(title="Answers",labelFontSize = 11)),
    text=alt.Text('Percent:Q',
                  format='.0%',
                 )
)

han_bar = han_bar + text

han_bar = han_bar.configure_mark(
    color='source'
).configure_view(
    # we don't want a stroke around the bars
    strokeWidth=0
).configure_scale(
    # add some padding
    bandPaddingInner=0.2
).properties(
    # set the dimensions of the visualization
    width=600,
    height=100
).properties(
    # add a title
    title={
        "text": ["Which Character Shot First?"],
        "subtitle": ["According to 828 respondents"],
        "subtitleFontSize": 18
    }
)


han_bar

# shot_bars_v = alt.Chart(shot_first_df).mark_bar(size=20, color = "#008fd5").encode(
#     # encode x as the percent, and hide the axis
#     x=alt.X(
#         # encode y using the name, use the movie name to label the axis, sort using the names_l
#         'Name:N',
#          axis=alt.Axis(tickCount=0, title='', labelColor = "black", labelFontSize = 12, grid = False),
#          sort = options_l
#         'Percent:Q',
#         axis=None),
#     y=alt.Y(


#raise NotImplementedError()

*Provide your justification here*

My visualization is an alternative to the vis in 2.4. Cosmetically, mine is not as visually appealing as 2.4; however, I think its layout has some perception/cognition advantages. A viewer of either vis can quickly tell that "I don't understand" > "Han" > "Greedo", but the scale of 2.4 is confusing: the 39% bar takes up almost the entire width of the chart, a length we are used to reading as 100%. This forces the viewer to read the text to the right of each bar, re-orient their expectations of the chart's scale, then read again. I'd like to think my vis does a better job of promoting parallel processing for the viewer: because the width of the entire bar is 100%, viewers can immediately see that roughly a third of participants gave each response. I believe Gestalt psychology also supports my vis, since it is simpler and easier to interpret.
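The scale argument can be checked with a little arithmetic. The sketch below assumes a linear x-scale that starts at zero; the 0.40 upper bound for the auto-fitted axis is an assumption for illustration, not a value read from the chart in 2.4, and `bar_width_fraction` is a hypothetical helper.

```python
def bar_width_fraction(value, domain_max):
    """Fraction of the plot width a bar fills on a linear scale from 0 to domain_max."""
    return value / domain_max


top_share = 0.39  # the largest bar mentioned in the text

# Assumption: an auto-fitted axis ends just past the largest value, at 0.40.
auto_fitted = bar_width_fraction(top_share, domain_max=0.40)

# A part-to-whole axis that ends at 100%.
part_to_whole = bar_width_fraction(top_share, domain_max=1.0)

print(f"auto-fitted axis: bar fills {auto_fitted:.1%} of the width")
print(f"0-100% axis:      bar fills {part_to_whole:.1%} of the width")
```

Under these assumptions the same 39% bar fills nearly the whole plot on the auto-fitted axis, but only its true share of the width on the 0-100% axis.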

### 2.5.2 Make your own (15 points/ 10 points plot + 5 justification)
Propose and code a *new visualization* to complement a part of the article. Add a short paragraph justifying your decisions in terms of Perception/Cognition processes.

If you feel your visualization is worse, that's ok! Just tell us why.

In [32]:
# YOUR CODE HERE


# seen_all_six = sw.dropna(subset=['seen_' + ep for ep in episodes],how='any')
# total = len(seen_all_six)
# seen_all_six.head()

# keep respondents who saw at least one episode; copy so new columns can be added safely
sw_jjar = sw.dropna(subset=['seen_' + ep for ep in episodes],how='all').copy()




#

sw_jjar[["seen1","seen2","seen3","seen4","seen5","seen6"]] = sw_jjar[["seen_EI","seen_EII","seen_EIII","seen_EIV","seen_EV","seen_EVI"]].notna().astype(int)

sw_jjar["ep_seen"] = sw_jjar["seen1"]+sw_jjar["seen2"]+sw_jjar["seen3"]+sw_jjar["seen4"]+sw_jjar["seen5"]+sw_jjar["seen6"]
sw_jjar

seen_count = [1,2,3,4,5,6]

favorability = []

for x in seen_count:
    temp_df = sw_jjar[sw_jjar["ep_seen"]==x]
    
    favor = temp_df[temp_df["Jar Jar Binks"]=="Very favorably"]
    somewhat_favor = temp_df[temp_df["Jar Jar Binks"]=="Somewhat favorably"]
    # add the two row counts; adding the DataFrames themselves would align them on index
    percent_favor = (len(favor)+len(somewhat_favor))/temp_df.count()["Jar Jar Binks"]
    favorability.append(percent_favor)
    
    
favorability

zipem = list(zip(seen_count, favorability))
jjar_likes = pd.DataFrame(zipem, columns = ["Total Episodes Seen","Percent",])
jjar_likes["Name"] = "Jar Jar Binks"
jjar_likes

jjar_chart = alt.Chart(jjar_likes).mark_bar(size=20, color = "#008fd5").encode(
    # encode y as the number of episodes seen
    y=alt.Y(
        'Total Episodes Seen:N',
        ),
    x=alt.X(
        # encode x as the percent who view Jar Jar Binks favorably
        'Percent:Q',
         axis=alt.Axis(tickCount=0, labelColor = "black", labelFontSize = 12, grid = False)
    )
)

shot_text = shot_bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    # we'll use the percentage as the text
    text=alt.Text('Percent:Q',format='.0%')
)

jjar_text = jjar_chart.mark_text(
    align="left",
    baseline="middle",
    dx=7
).encode(
    text=alt.Text("Percent:Q",format=".0%")
)

jjar_chart = jjar_chart + jjar_text

jjar_chart = jjar_chart.configure_mark(
    color='source'
).configure_view(
    # we don't want a stroke around the bars
    strokeWidth=0
).configure_scale(
    # add some padding
    bandPaddingInner=0.2
).properties(
    # set the dimensions of the visualization
    width=500,
    height=150
).properties(
    # add a title
    title={
        "text": ["Who Likes Jar Jar Binks Least?"],
        "subtitle": ["According to 834 respondents"],
        "subtitleFontSize": 18
    }
)

jjar_chart

#raise NotImplementedError()

*Provide your justification here*

I wanted to learn more about who views Jar Jar Binks least favorably, i.e., does watching more episodes of Star Wars influence a viewer's distaste for the character? From a perception/cognition standpoint, my vis keeps the simple horizontal bar chart theme used throughout the article. This consistency lets viewers get comfortable interpreting one style of chart (with small variations) while reading the article, instead of having to stop at each one and figure out what is going on.
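The question above boils down to a favorable share per number of episodes seen. As an alternative to the loop in the code cell, this could be sketched with a pandas `groupby`; the `toy` frame, its `rating` column, and its values below are hypothetical and only illustrate the calculation.

```python
import pandas as pd

# Hypothetical toy responses; column names and values are illustrative only.
toy = pd.DataFrame({
    "ep_seen": [1, 1, 3, 3, 6, 6, 6],
    "rating": [
        "Very favorably", "Somewhat unfavorably",
        "Somewhat favorably", None,
        "Very unfavorably", "Very favorably", "Somewhat unfavorably",
    ],
})

favorable = {"Very favorably", "Somewhat favorably"}

# Only respondents who rated the character count toward the denominator.
answered = toy.dropna(subset=["rating"])

# The mean of a boolean column is the share of True values in each group.
share = (
    answered["rating"].isin(favorable)
    .groupby(answered["ep_seen"])
    .mean()
    .rename("Percent")
    .reset_index()
)
print(share)
```

The resulting `share` frame has one row per episode count, which is the shape the bar chart expects.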