## Data Visualization: Effectiveness, Expressiveness and an Alternate Encoding 

Visual 3:

!["viz3"](assets/char_ranking_resized.png)

We'll build an alternate visual and then discuss why the original does such a good job of conveying its message.

Let's also take a look at what a recreation of this viz looks like in Altair.

In [3]:
import pandas as pd
import altair as alt
import numpy as np
import math

In [4]:
sw = pd.read_csv('assets/StarWars.csv', encoding='latin1')

In [5]:
sw = sw.rename(columns={'Have you seen any of the 6 films in the Star Wars franchise?':'seen_any_movie',
                        'Do you consider yourself to be a fan of the Star Wars film franchise?': 'fan',
                        'Which of the following Star Wars films have you seen? Please select all that apply.' : 'seen_EI',
                        'Unnamed: 4' : 'seen_EII',
                        'Unnamed: 5' : 'seen_EIII',
                        'Unnamed: 6' : 'seen_EIV',
                        'Unnamed: 7' : 'seen_EV',
                        'Unnamed: 8' : 'seen_EVI',
                        'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.' : 'rank_EI',
                        'Unnamed: 10' : 'rank_EII',
                        'Unnamed: 11' : 'rank_EIII',
                        'Unnamed: 12' : 'rank_EIV',
                        'Unnamed: 13' : 'rank_EV',
                        'Unnamed: 14' : 'rank_EVI',
                        'Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.' : 'Han Solo',
                        'Unnamed: 16' : 'Luke Skywalker',
                        'Unnamed: 17' : 'Princess Leia Organa',
                        'Unnamed: 18' : 'Anakin Skywalker',
                        'Unnamed: 19' : 'Obi Wan Kenobi',
                        'Unnamed: 20' : 'Emperor Palpatine',
                        'Unnamed: 21' : 'Darth Vader',
                        'Unnamed: 22' : 'Lando Calrissian',
                        'Unnamed: 23' : 'Boba Fett',
                        'Unnamed: 24' : 'C-3P0',
                        'Unnamed: 25' : 'R2 D2',
                        'Unnamed: 26' : 'Jar Jar Binks',
                        'Unnamed: 27' : 'Padme Amidala',
                        'Unnamed: 28' : 'Yoda',
                       })
sw = sw.drop([0])

In [6]:
episodes = ['EI', 'EII', 'EIII', 'EIV', 'EV', 'EVI']
names = {
    'EI' : 'The Phantom Menace', 'EII' : 'Attack of the Clones', 'EIII' : 'Revenge of the Sith',
    'EIV': 'A New Hope', 'EV': 'The Empire Strikes Back', 'EVI' : 'The Return of the Jedi'
}

names_l = [names[ep] for ep in episodes]

print("sort order: ",names_l)

sort order:  ['The Phantom Menace', 'Attack of the Clones', 'Revenge of the Sith', 'A New Hope', 'The Empire Strikes Back', 'The Return of the Jedi']


In [7]:
seen_at_least_one = sw.dropna(subset=['seen_' + ep for ep in episodes],how='all')
total = len(seen_at_least_one)

In [9]:
sw.columns
alt.themes.enable('fivethirtyeight')
# .ix was removed in pandas 1.0; .loc does the same label-based column slice
actor_df = sw.loc[:, 'Han Solo':'Yoda']
actor_df = actor_df.dropna(subset=['Han Solo',
       'Luke Skywalker', 'Princess Leia Organa', 'Anakin Skywalker',
       'Obi Wan Kenobi', 'Emperor Palpatine', 'Darth Vader',
       'Lando Calrissian', 'Boba Fett', 'C-3P0', 'R2 D2', 'Jar Jar Binks',
       'Padme Amidala', 'Yoda'],how='all')
actor = actor_df.melt()
types=['Very favorably', 'Somewhat favorably',
       'Neither favorably nor unfavorably (neutral)',
       'Somewhat unfavorably', 'nan', 'Unfamiliar (N/A)',
       'Very unfavorably']
fav = actor[(actor.value == 'Very favorably') | (actor.value =='Somewhat favorably')]
unfav = actor[(actor.value == types[3]) | (actor.value ==types[6])]
neutral = actor[(actor.value == types[2])]
unfam = actor[(actor.value == types[5])]

sorter=['Luke Skywalker','Han Solo','Princess Leia Organa','Obi Wan Kenobi','Yoda','R2 D2','C-3P0'
       ,'Anakin Skywalker','Darth Vader','Lando Calrissian','Padme Amidala','Boba Fett','Emperor Palpatine',
       'Jar Jar Binks']



fav_perc=[]
for name in sorter:
    fav_perc.append(round((len(fav[fav.variable==name]))/(len(actor[actor.variable==name])-actor[actor.variable==name].value.isna().sum()),2))
tuples= list(zip(sorter,fav_perc))   
fav = pd.DataFrame(tuples, columns = ['Name', 'Percentage'])
fav

unfav_perc=[]
for name in sorter:
    unfav_perc.append(round((len(unfav[unfav.variable==name]))/(len(actor[actor.variable==name])-actor[actor.variable==name].value.isna().sum()),2))
tuples= list(zip(sorter,unfav_perc))   
unfav = pd.DataFrame(tuples, columns = ['Name', 'Percentage'])

neutral_perc=[]
for name in sorter:
    neutral_perc.append(round((len(neutral[neutral.variable==name]))/(len(actor[actor.variable==name])-actor[actor.variable==name].value.isna().sum()),2))
tuples= list(zip(sorter,neutral_perc))   
neutral = pd.DataFrame(tuples, columns = ['Name', 'Percentage'])

unfam_perc=[]
for name in sorter:
    unfam_perc.append(round((len(unfam[unfam.variable==name]))/(len(actor[actor.variable==name])-actor[actor.variable==name].value.isna().sum()),2))
tuples= list(zip(sorter,unfam_perc))   
unfam = pd.DataFrame(tuples, columns = ['Name', 'Percentage'])

########################################################################################################################
hi=alt.Chart(fav).mark_bar(
    opacity=0.8,
    stroke='black',
    strokeWidth=0,
    color='green'
).encode(
    y = alt.Y('Name:N',axis=alt.Axis(ticks=True,gridOpacity=0),title=None,sort=sorter),
    x = alt.X('Percentage:Q',axis=alt.Axis(labels=False,ticks=False,gridOpacity=0, domain=False),title=None)
)
hello = alt.Chart(fav).mark_text(dx=3, dy=0, color='green',align='left',fontSize=13).encode(
    y=alt.Y('Name', stack='zero',sort=sorter),
    x=alt.X('Percentage:Q'),
    text=alt.Text('Percentage:Q',format="1.0%")
)
f = (hi+hello).properties(height=260,width=80,
    # add a title
    title={"text":"Favorable",
           "fontSize":12,
           "anchor":"middle",
           "fontWeight":"normal"
           })
########################################################################################################################
hi1=alt.Chart(neutral).mark_bar(
    opacity=0.8,
    stroke='black',
    strokeWidth=0,
    color='#008fd5'
).encode(
    y = alt.Y('Name:N',axis=alt.Axis(ticks=False,gridOpacity=0,labels=False),title=None,sort=sorter),
    x = alt.X('Percentage:Q',axis=alt.Axis(labels=False,ticks=False,gridOpacity=0,domain=False),title=None)
)
hello1 = alt.Chart(neutral).mark_text(dx=3, dy=0, color='#008fd5',align='left',fontSize=13).encode(
    y=alt.Y('Name', stack='zero',sort=sorter,axis=alt.Axis(ticks=False,grid=False,domain=False)),
    x=alt.X('Percentage:Q',axis=alt.Axis(ticks=False,grid=False,domain=False)),
    text=alt.Text('Percentage:Q',format="1.0%")
)
n = (hi1+hello1).properties(height=260,width=30,
    # add a title
    title={"text":"Neutral",
           "fontSize":12,
           "anchor":"middle",
           "fontWeight":"normal"
           })
########################################################################################################################
hi1=alt.Chart(unfav).mark_bar(
    opacity=0.8,
    stroke='black',
    strokeWidth=0,
    color='red'
).encode(
    y = alt.Y('Name:N',axis=alt.Axis(ticks=False,gridOpacity=0,labels=False),title=None,sort=sorter),
    x = alt.X('Percentage:Q',axis=alt.Axis(labels=False,ticks=False,gridOpacity=0),title=None)
)
hello1 = alt.Chart(unfav).mark_text(dx=3, dy=0, color='red',align='left',fontSize=13).encode(
    y=alt.Y('Name', stack='zero',sort=sorter,axis=alt.Axis(ticks=False,grid=False,domain=False)),
    x=alt.X('Percentage:Q',axis=alt.Axis(ticks=False,grid=False,domain=False)),
    text=alt.Text('Percentage:Q',format="1.0%")
)
u = (hi1+hello1).properties(height=260,width=30,
    # add a title
    title={"text":"Unfavorable",
           "fontSize":12,
           "anchor":"middle",
           "fontWeight":"normal"
           })
########################################################################################################################
hi1=alt.Chart(unfam).mark_bar(
    opacity=0.8,
    stroke='black',
    strokeWidth=0,
    color='grey'
).encode(
    y = alt.Y('Name:N',axis=alt.Axis(ticks=False,gridOpacity=0,labels=False),title=None,sort=sorter),
    x = alt.X('Percentage:Q',axis=alt.Axis(labels=False,ticks=False,gridOpacity=0),title=None)
)
hello1 = alt.Chart(unfam).mark_text(dx=3, dy=0, color='grey',align='left',fontSize=13).encode(
    y=alt.Y('Name', stack='zero',sort=sorter,axis=alt.Axis(ticks=False,grid=False,domain=False)),
    x=alt.X('Percentage:Q',axis=alt.Axis(ticks=False,grid=False,domain=False)),
    text=alt.Text('Percentage:Q',format="1.0%")
)
m = (hi1+hello1).properties(height=260,width=30,
    # add a title
    title={"text":"Unfamiliar",
           "fontSize":12,
           "anchor":"middle",
           "fontWeight":"normal"
           })
########################################################################################################################
(f|n|u|m).configure_axis(
    labelFontSize=11,
    titleFontSize=20,
    grid=False).configure_view(
    strokeWidth=0).properties(
    title={"text":"'Star Wars' Characters Favorability Rating",
           
           "subtitle":["By 834 respondents"],
           "subtitleColor": "black",
           "subtitleFontSize":18
          })

![2.3](assets/33.png)
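
The four percentage loops in the cell above all divide a category count by the number of non-missing answers per character. As a hedged sketch, the same shares can be computed in one pass with `pd.crosstab`. The names `actor_demo`, `category_of` and `category_shares` are introduced here for illustration only, and the toy rows are made up, not taken from the survey.

```python
import pandas as pd

# Toy stand-in for the melted `actor` frame (columns: variable, value).
# These rows are invented for illustration, not survey data.
actor_demo = pd.DataFrame({
    "variable": ["Yoda"] * 4 + ["Jar Jar Binks"] * 4,
    "value": ["Very favorably", "Somewhat favorably", "Unfamiliar (N/A)", None,
              "Very unfavorably", "Somewhat unfavorably", "Very favorably", None],
})

# Map the raw survey answers onto the four chart categories.
category_of = {
    "Very favorably": "Favorable",
    "Somewhat favorably": "Favorable",
    "Neither favorably nor unfavorably (neutral)": "Neutral",
    "Somewhat unfavorably": "Unfavorable",
    "Very unfavorably": "Unfavorable",
    "Unfamiliar (N/A)": "Unfamiliar",
}

def category_shares(melted):
    """Return one row per character with the share of each category.

    Missing answers are dropped first, so the denominator is the number
    of answers given for that character, as in the loops above.
    """
    answered = melted.dropna(subset=["value"])
    cats = answered["value"].map(category_of)
    # normalize="index" divides each row by its own total
    shares = pd.crosstab(answered["variable"], cats, normalize="index")
    return shares.round(2)

shares_demo = category_shares(actor_demo)
print(shares_demo)
```

On the real data one would call `category_shares(actor)` and pick a column such as `shares['Favorable']` for each panel. This assumes every non-missing answer appears in `category_of`; any other value would be silently left out of the denominator.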

The alternate visual:

In [10]:
sw.columns
# .ix was removed in pandas 1.0; .loc does the same label-based column slice
actor_df = sw.loc[:, 'Han Solo':'Yoda']
actor_df = actor_df.dropna(subset=['Han Solo',
       'Luke Skywalker', 'Princess Leia Organa', 'Anakin Skywalker',
       'Obi Wan Kenobi', 'Emperor Palpatine', 'Darth Vader',
       'Lando Calrissian', 'Boba Fett', 'C-3P0', 'R2 D2', 'Jar Jar Binks',
       'Padme Amidala', 'Yoda'],how='all')
actor = actor_df.melt()
types=['Very favorably', 'Somewhat favorably',
       'Neither favorably nor unfavorably (neutral)',
       'Somewhat unfavorably', 'nan', 'Unfamiliar (N/A)',
       'Very unfavorably']
fav = actor[(actor.value == 'Very favorably') | (actor.value =='Somewhat favorably')]
unfav = actor[(actor.value == types[3]) | (actor.value ==types[6])]
neutral = actor[(actor.value == types[2])]
unfam = actor[(actor.value == types[5])]

sorter=['Luke Skywalker','Han Solo','Princess Leia Organa','Obi Wan Kenobi','Yoda','R2 D2','C-3P0'
       ,'Anakin Skywalker','Darth Vader','Lando Calrissian','Padme Amidala','Boba Fett','Emperor Palpatine',
       'Jar Jar Binks']



fav_perc=[]
for name in sorter:
    fav_perc.append(round((len(fav[fav.variable==name]))/(len(actor[actor.variable==name])-actor[actor.variable==name].value.isna().sum()),2))
tuples= list(zip(sorter,fav_perc))   
fav = pd.DataFrame(tuples, columns = ['Name', 'Percentage'])
fav

unfav_perc=[]
for name in sorter:
    unfav_perc.append(round((len(unfav[unfav.variable==name]))/(len(actor[actor.variable==name])-actor[actor.variable==name].value.isna().sum()),2))
tuples= list(zip(sorter,unfav_perc))   
unfav = pd.DataFrame(tuples, columns = ['Name', 'Percentage'])
unfav = unfav.sort_values(by='Percentage',ascending=False)
unfavorder=list(unfav.Name)

neutral_perc=[]
for name in sorter:
    neutral_perc.append(round((len(neutral[neutral.variable==name]))/(len(actor[actor.variable==name])-actor[actor.variable==name].value.isna().sum()),2))
tuples= list(zip(sorter,neutral_perc))   
neutral = pd.DataFrame(tuples, columns = ['Name', 'Percentage'])
neutral = neutral.sort_values(by='Percentage',ascending=False)
neutralorder=list(neutral.Name)

unfam_perc=[]
for name in sorter:
    unfam_perc.append(round((len(unfam[unfam.variable==name]))/(len(actor[actor.variable==name])-actor[actor.variable==name].value.isna().sum()),2))
tuples= list(zip(sorter,unfam_perc))   
unfam = pd.DataFrame(tuples, columns = ['Name', 'Percentage'])
unfam = unfam.sort_values(by='Percentage',ascending=False)
unfamorder=list(unfam.Name)

########################################################################################################################
text = alt.Chart(fav).mark_text(dx=-6, dy=0, color='green',align='right',fontWeight='bold',fontSize=15).encode(
    y=alt.Y('Name', stack='zero',axis=alt.Axis(ticks=False,gridOpacity=0,labels=False,domain=False),title=None,sort=sorter),
    x=alt.X('Percentage',axis=alt.Axis(ticks=False,gridOpacity=0,labels=False,domain=False),title=None),
    text=alt.Text('Name')
)
hello = alt.Chart(fav).mark_text(dx=3, dy=0, color='green',align='left',fontSize=13).encode(
    y=alt.Y('Name', stack='zero',sort=sorter),
    x=alt.X('Percentage:Q'),
    text=alt.Text('Percentage:Q',format="1.0%")
)
f = (hello+text).properties(height=260,width=80,
    title={"text":"Favorable",
           "fontSize":18,
           "anchor":"middle",
           "fontWeight":"bold",
           "color":'green'
           
           })
########################################################################################################################
text = alt.Chart(neutral).mark_text(dx=-6, dy=0, color='#008fd5',align='right',fontWeight='bold',fontSize=15).encode(
    y=alt.Y('Name', stack='zero',axis=alt.Axis(ticks=False,gridOpacity=0,labels=False,domain=False),title=None,sort=neutralorder),
    x=alt.X('Percentage',axis=alt.Axis(ticks=False,gridOpacity=0,labels=False,domain=False),title=None),
    text=alt.Text('Name')
)
hello1 = alt.Chart(neutral).mark_text(dx=3, dy=0, color='#008fd5',align='left',fontSize=13).encode(
    y=alt.Y('Name', stack='zero',sort=neutralorder,axis=alt.Axis(ticks=False,grid=False,domain=False)),
    x=alt.X('Percentage:Q',axis=alt.Axis(ticks=False,grid=False,domain=False)),
    text=alt.Text('Percentage:Q',format="1.0%")
)
n = (hello1+text).properties(height=260,width=80,
    title={"text":"Neutral",
           "fontSize":18,
           "anchor":"middle",
           "fontWeight":"bold",
           "color":'#008fd5'
           })
########################################################################################################################
text = alt.Chart(unfav).mark_text(dx=-6, dy=0, color='red',align='right',fontWeight='bold',fontSize=15).encode(
    y=alt.Y('Name', stack='zero',axis=alt.Axis(ticks=False,gridOpacity=0,labels=False,domain=False),title=None,sort=unfavorder),
    x=alt.X('Percentage',axis=alt.Axis(ticks=False,gridOpacity=0,labels=False,domain=False),title=None),
    text=alt.Text('Name')
)
hello1 = alt.Chart(unfav).mark_text(dx=3, dy=0, color='red',align='left',fontSize=13).encode(
    y=alt.Y('Name', stack='zero',sort=unfavorder,axis=alt.Axis(ticks=False,gridOpacity=0,labels=False)),
    x=alt.X('Percentage:Q',axis=alt.Axis(ticks=False,grid=False,domain=False)),
    text=alt.Text('Percentage:Q',format="1.0%")
)
u = (hello1+text).properties(height=260,width=80,
    title={"text":"Unfavorable",
           "fontSize":18,
           "anchor":"middle",
           "fontWeight":"bold",
           "color":'red'
           })
########################################################################################################################
text = alt.Chart(unfam).mark_text(dx=-6, dy=0, color='grey',align='right',fontWeight='bold',fontSize=15).encode(
    y=alt.Y('Name', stack='zero',axis=alt.Axis(ticks=False,gridOpacity=0,labels=False,domain=False),title=None,sort=unfamorder),
    x=alt.X('Percentage',axis=alt.Axis(ticks=False,gridOpacity=0,labels=False,domain=False),title=None),
    text=alt.Text('Name')
)
hello1 = alt.Chart(unfam).mark_text(dx=3, dy=0, color='grey',align='left',fontSize=13).encode(
    y=alt.Y('Name', stack='zero',sort=unfamorder,axis=alt.Axis(ticks=False,grid=False,domain=False)),
    x=alt.X('Percentage:Q',axis=alt.Axis(ticks=False,grid=False,domain=False)),
    text=alt.Text('Percentage:Q',format="1.0%")
)
m = (hello1+text).properties(height=260,width=80,
    title={"text":"Unfamiliar",
           "fontSize":18,
           "anchor":"middle",
           "fontWeight":"bold",
           "color":'grey'
           })
########################################################################################################################
((f|u)&(n|m)).configure_axis(
    labelFontSize=11,
    titleFontSize=20,
    grid=False).configure_view(
    strokeWidth=0).properties(
    # add a title
    title={"text":"Star Wars Characters Favorability Rating",
           
           "subtitle":["By 834 respondents"],
           "subtitleColor": "black",
           "subtitleFontSize":18
          })

![2.51](assets/2pt51.png)

1. Let's first address the differences between the two visuals.
    - 538 viz:
        - Bars encode the percentages.
        - Only the first column (Favorable) is ordered.
        - All categories are placed in a single row.
    - Created viz:
        - Text position replaces the bars as the encoding.
        - Every column is ordered from highest to lowest.
        - The categories are laid out across two rows.

2. Let's compare the effects.
    - The visual channel is limited and favors simple presentations.
        - The 538 viz does not overload the visual channel; it is simple and concise.
        - The created viz, on the other hand, carries an overwhelming amount of text and looks overly elaborate.
    - Time to process.
        - The 538 viz orders only the first column. That is understandable, since viewers are likely more interested in favorability than in the other columns. The other columns would still benefit from ordering, because high favorability does not map exactly onto low unfavorability once the neutral and unfamiliar categories are present.
        - The created viz orders every column from high to low and also moves Unfavorable next to Favorable for quicker comparison.
    - Preattentive processing.
        - In both visuals the category colors match the associations we commonly attach to them (green: success/positive/favorability; red: danger/risky/not right; blue: cool/calm/neutral; grey: misc/other).
    - Weber's law.
        - With bars, percentages that fall in a close range become hard to tell apart. The percentage text labels compensate for this. The created viz has the same problem, as text position alone does not convey the percentages well.
    - Gestalt psychology.
        - Both visuals use color to convey similarity and position to convey proximity of entries, and both label each category to reinforce the grouping.
        
3. Summary
    - In the created viz, the overwhelming amount of text creates a feeling of clutter.
    - The 538 viz lacks ordering in the other categories, but for an understandable reason.
    - Considering everything above, the 538 viz is the more effective visual for the purpose it is meant to serve.
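
The Weber's law point above can be made concrete with a small sketch. Weber's law says the just-noticeable difference grows in proportion to the baseline magnitude, so the same two-point gap is easier to see between short bars than between long ones. The threshold `WEBER_FRACTION` below is a hypothetical value chosen for illustration, not a measured constant, and the percentages are made up.

```python
# Hypothetical Weber fraction, chosen only to illustrate the idea.
WEBER_FRACTION = 0.05

def relative_difference(a, b):
    """Difference between two bar lengths relative to the shorter (baseline) bar."""
    baseline = min(a, b)
    if baseline == 0:
        # Any non-zero bar is trivially distinguishable from an empty one.
        return float('inf') if a != b else 0.0
    return abs(a - b) / baseline

def bars_look_different(a, b, k=WEBER_FRACTION):
    """True if the relative difference reaches the assumed threshold k."""
    return relative_difference(a, b) >= k

# The same two-point gap at two different baselines (made-up percentages).
long_bars = bars_look_different(0.93, 0.91)   # gap is about 2% of the baseline
short_bars = bars_look_different(0.12, 0.10)  # gap is about 20% of the baseline
print(long_bars, short_bars)
```

Under this assumed threshold the long bars are judged indistinguishable while the short ones are not, which is why the printed percentage next to each bar carries the precise value.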