## Data Visualization: Effectiveness, Expressiveness and an Alternate Encoding

Visual 2:

!["Viz"](assets/have_seen_resized.png)

We'll focus on achieving a similar or better level of effectiveness, and we'll create a synthetic dataset for the purpose of generating the visual.

In [3]:
import pandas as pd
import altair as alt
import numpy as np
import math

In [8]:
sw = pd.read_csv('assets/StarWars.csv', encoding='latin1')

In [9]:
sw = sw.rename(columns={'Have you seen any of the 6 films in the Star Wars franchise?':'seen_any_movie',
                        'Do you consider yourself to be a fan of the Star Wars film franchise?': 'fan',
                        'Which of the following Star Wars films have you seen? Please select all that apply.' : 'seen_EI',
                        'Unnamed: 4' : 'seen_EII',
                        'Unnamed: 5' : 'seen_EIII',
                        'Unnamed: 6' : 'seen_EIV',
                        'Unnamed: 7' : 'seen_EV',
                        'Unnamed: 8' : 'seen_EVI',
                        'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.' : 'rank_EI',
                        'Unnamed: 10' : 'rank_EII',
                        'Unnamed: 11' : 'rank_EIII',
                        'Unnamed: 12' : 'rank_EIV',
                        'Unnamed: 13' : 'rank_EV',
                        'Unnamed: 14' : 'rank_EVI',
                        'Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.' : 'Han Solo',
                        'Unnamed: 16' : 'Luke Skywalker',
                        'Unnamed: 17' : 'Princess Leia Organa',
                        'Unnamed: 18' : 'Anakin Skywalker',
                        'Unnamed: 19' : 'Obi Wan Kenobi',
                        'Unnamed: 20' : 'Emperor Palpatine',
                        'Unnamed: 21' : 'Darth Vader',
                        'Unnamed: 22' : 'Lando Calrissian',
                        'Unnamed: 23' : 'Boba Fett',
                        'Unnamed: 24' : 'C-3P0',
                        'Unnamed: 25' : 'R2 D2',
                        'Unnamed: 26' : 'Jar Jar Binks',
                        'Unnamed: 27' : 'Padme Amidala',
                        'Unnamed: 28' : 'Yoda',
                       })
sw = sw.drop([0])
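
The `Unnamed: N` keys in the mapping above follow a positional pattern, so that part of the dictionary could be generated rather than typed out. A minimal sketch, assuming the column positions shown in the cell above (`Unnamed: 4` to `Unnamed: 8` for the seen columns, `Unnamed: 10` to `Unnamed: 14` for the rank columns). `positional_renames` is a hypothetical helper name, not part of the original notebook:

```python
episodes = ['EI', 'EII', 'EIII', 'EIV', 'EV', 'EVI']

def positional_renames(prefix, first_unnamed, eps):
    """Map 'Unnamed: N' columns to '<prefix>_<episode>' names.

    Hypothetical helper: assumes the first episode's column keeps the long
    question text as its header, and the remaining episodes sit in
    consecutive 'Unnamed: N' columns starting at first_unnamed.
    """
    return {
        f'Unnamed: {first_unnamed + i}': f'{prefix}_{ep}'
        for i, ep in enumerate(eps[1:])
    }

generated = {
    **positional_renames('seen', 4, episodes),
    **positional_renames('rank', 10, episodes),
}
print(generated)
```

The two question-text headers (for `seen_EI` and `rank_EI`) and the character columns would still need explicit entries, so this only shortens the repetitive middle of the mapping.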

In [10]:
episodes = ['EI', 'EII', 'EIII', 'EIV', 'EV', 'EVI']
names = {
    'EI' : 'The Phantom Menace', 'EII' : 'Attack of the Clones', 'EIII' : 'Revenge of the Sith',
    'EIV': 'A New Hope', 'EV': 'The Empire Strikes Back', 'EVI' : 'The Return of the Jedi'
}

names_l = [names[ep] for ep in episodes]

print("sort order: ",names_l)

sort order:  ['The Phantom Menace', 'Attack of the Clones', 'Revenge of the Sith', 'A New Hope', 'The Empire Strikes Back', 'The Return of the Jedi']


In [11]:
# Keep only respondents who marked at least one of the six films as seen
seen_at_least_one = sw.dropna(subset=['seen_' + ep for ep in episodes],how='all')
total = len(seen_at_least_one)

In [17]:
percs = []

# Share of those respondents who have seen each film (a missing value means "not seen")
for seen_ep in ['seen_' + ep for ep in episodes]:
    perc = seen_at_least_one[seen_ep].notna().sum() / total
    percs.append(perc)

tuples = list(zip([names[ep] for ep in episodes],percs))
seen_per_df = pd.DataFrame(tuples, columns = ['Name', 'Percentage'])
seen_per_df.Percentage = seen_per_df.Percentage.apply(lambda x:np.round(x,2))
seen_per_df

Unnamed: 0,Name,Percentage
0,The Phantom Menace,0.81
1,Attack of the Clones,0.68
2,Revenge of the Sith,0.66
3,A New Hope,0.73
4,The Empire Strikes Back,0.91
5,The Return of the Jedi,0.88
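
The per-film loop above can also be written without explicit iteration, because the mean of a boolean mask is the share of `True` values. A minimal sketch on a made-up three-respondent frame (the rows and the `'yes'` marker are illustrative, not taken from `StarWars.csv`), assuming, as in the cells above, that an unseen film is stored as a missing value:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the survey: one row per respondent, NaN = film not seen.
toy = pd.DataFrame({
    'seen_EI':  ['yes', np.nan, np.nan],
    'seen_EIV': ['yes', 'yes', np.nan],
})

# Drop respondents who marked no film at all (every column missing).
watchers = toy.dropna(how='all')

# notna() yields a True/False mask; its column-wise mean is the share seen.
shares = watchers.notna().mean().round(2)
print(shares)
```

On the real data, the same idea would be applied to the six `seen_*` columns of `seen_at_least_one`.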


In [4]:
df = pd.DataFrame(
    [['The Phantom Menace', .80], ['Attack of the Clones', .68], ['Revenge of the Sith', .66],
     ['A New Hope', .73], ['The Empire Strikes Back', .91], ['Return of the Jedi', .88]],
    columns=['Movie Names', 'View perc'])
df

Unnamed: 0,Movie Names,View perc
0,The Phantom Menace,0.8
1,Attack of the Clones,0.68
2,Revenge of the Sith,0.66
3,A New Hope,0.73
4,The Empire Strikes Back,0.91
5,Return of the Jedi,0.88


In [19]:
# Layer 1: grey bars, sorted from highest to lowest share (sort='-x');
# axis labels and ticks are switched off on this layer
a1 = alt.Chart(seen_per_df).mark_bar(color='grey').encode(
    y=alt.Y('Name', sort='-x', stack='zero', axis=alt.Axis(labels=False, ticks=False), title=None),
    x=alt.X('Percentage', axis=alt.Axis(labels=False, ticks=False), title=None)
).transform_window(
    rank='rank(Percentage)',
    sort=[alt.SortField('Percentage', order='descending')]
).properties(height=250)

# Layer 2: movie names in white, right-aligned so they sit inside the end of each bar
text = alt.Chart(seen_per_df).mark_text(
    dx=-15, dy=3, color='white', align='right', fontWeight='bold', fontSize=15
).encode(
    y=alt.Y('Name', stack='zero', sort='-x'),
    x=alt.X('Percentage'),
    text=alt.Text('Name')
).transform_window(
    rank='rank(Percentage)',
    sort=[alt.SortField('Percentage', order='descending')]
)

# Layer 3: percentage labels in grey, just past the end of each bar
text2 = alt.Chart(seen_per_df).mark_text(
    color='grey', fontWeight='bold', fontSize=16,
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    y=alt.Y('Name', stack='zero', sort='-x'),
    x=alt.X('Percentage'),
    text=alt.Text('Percentage:Q', format='1.0%')
).transform_window(
    rank='rank(Percentage)',
    sort=[alt.SortField('Percentage', order='descending')]
)

# Combine the three layers into one chart
a1+text+text2

![1.2](assets/1pt_2.png)

1. The FiveThirtyEight viz has no ordering, while the alternative visual does.
    - Ordering the bars shows the hierarchy at a glance and makes values that are close together easier to distinguish.
    - The printed percentages partly solve this, but if numbers are shown anyway, ordering them makes for a quicker visual understanding.
2. The distance between the movie names and the percentages.
    - This matters when comparing movies with each other, as one has to go back and forth between the percentage and the movie name, hold it in mind, move down to the next movie and repeat.
    - I personally felt that having the percentages close to the names helped me evaluate one movie against another much faster (the eyes don't have to travel as much).

- The created viz would do better on preattentive processing.
- Both maintain simplicity (Law of Prägnanz). In the created viz, having the names inscribed in the bars makes comparison easier.
- The additional ordering from high to low makes differences between close values quite straightforward to see compared to the FiveThirtyEight viz.
- Given its simple approach, the created viz also does not run into any other visual limitations.
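
To make the ordering argument concrete, here is a minimal sketch using the hard-coded values from cell `In [4]`. It sorts the films from high to low and computes the gap, in percentage points, between each film and the next one down. The `gap_pp` column name is made up for this illustration:

```python
import pandas as pd

df = pd.DataFrame(
    [['The Phantom Menace', 0.80], ['Attack of the Clones', 0.68],
     ['Revenge of the Sith', 0.66], ['A New Hope', 0.73],
     ['The Empire Strikes Back', 0.91], ['Return of the Jedi', 0.88]],
    columns=['Movie Names', 'View perc'],
)

# Order the films the way the sorted bar chart does: highest share first.
ranked = df.sort_values('View perc', ascending=False).reset_index(drop=True)

# Gap to the next film down the ranking, in percentage points
# (the last film has no successor, so its gap is NaN).
ranked['gap_pp'] = (ranked['View perc'].diff(-1) * 100).round(0)
print(ranked)
```

In the sorted order, each film sits directly next to its nearest neighbours in value, so the smallest gaps are compared between adjacent bars rather than across the chart.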