## Data Visualization: Effectiveness, Expressiveness and an Alternate Encoding 

Visual 4:

!["viz4"](assets/char_ranking_resized.png)

In this visualization, we'll use the same data to look for an alternative insight. Then we'll compare its effectiveness and expressiveness, using the 538 viz as a reference.

In [1]:
import pandas as pd
import altair as alt
import numpy as np
import math

In [2]:
sw = pd.read_csv('assets/StarWars.csv', encoding='latin1')

In [3]:
sw = sw.rename(columns={'Have you seen any of the 6 films in the Star Wars franchise?':'seen_any_movie',
                        'Do you consider yourself to be a fan of the Star Wars film franchise?': 'fan',
                        'Which of the following Star Wars films have you seen? Please select all that apply.' : 'seen_EI',
                        'Unnamed: 4' : 'seen_EII',
                        'Unnamed: 5' : 'seen_EIII',
                        'Unnamed: 6' : 'seen_EIV',
                        'Unnamed: 7' : 'seen_EV',
                        'Unnamed: 8' : 'seen_EVI',
                        'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.' : 'rank_EI',
                        'Unnamed: 10' : 'rank_EII',
                        'Unnamed: 11' : 'rank_EIII',
                        'Unnamed: 12' : 'rank_EIV',
                        'Unnamed: 13' : 'rank_EV',
                        'Unnamed: 14' : 'rank_EVI',
                        'Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.' : 'Han Solo',
                        'Unnamed: 16' : 'Luke Skywalker',
                        'Unnamed: 17' : 'Princess Leia Organa',
                        'Unnamed: 18' : 'Anakin Skywalker',
                        'Unnamed: 19' : 'Obi Wan Kenobi',
                        'Unnamed: 20' : 'Emperor Palpatine',
                        'Unnamed: 21' : 'Darth Vader',
                        'Unnamed: 22' : 'Lando Calrissian',
                        'Unnamed: 23' : 'Boba Fett',
                        'Unnamed: 24' : 'C-3P0',
                        'Unnamed: 25' : 'R2 D2',
                        'Unnamed: 26' : 'Jar Jar Binks',
                        'Unnamed: 27' : 'Padme Amidala',
                        'Unnamed: 28' : 'Yoda',
                       })
sw = sw.drop([0])

In [4]:
episodes = ['EI', 'EII', 'EIII', 'EIV', 'EV', 'EVI']
names = {
    'EI' : 'The Phantom Menace', 'EII' : 'Attack of the Clones', 'EIII' : 'Revenge of the Sith',
    'EIV': 'A New Hope', 'EV': 'The Empire Strikes Back', 'EVI' : 'The Return of the Jedi'
}

names_l = [names[ep] for ep in episodes]

print("sort order: ",names_l)

sort order:  ['The Phantom Menace', 'Attack of the Clones', 'Revenge of the Sith', 'A New Hope', 'The Empire Strikes Back', 'The Return of the Jedi']


In [5]:
seen_at_least_one = sw.dropna(subset=['seen_' + ep for ep in episodes],how='all')
total = len(seen_at_least_one)
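
The `how='all'` argument above is what keeps respondents who ticked at least one film. As a minimal sketch of that behavior, assuming a made-up three-column frame (the `toy` data below is hypothetical and not taken from the survey):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: NaN means the film was not selected.
toy = pd.DataFrame({
    "seen_EI":   ["Episode I", np.nan, np.nan],
    "seen_EII":  [np.nan, np.nan, "Episode II"],
    "seen_EIII": ["Episode III", np.nan, np.nan],
})

# how='all' drops a row only when every column in `subset` is missing,
# so rows with at least one selected film survive.
seen_any = toy.dropna(subset=["seen_EI", "seen_EII", "seen_EIII"], how="all")

print(len(seen_any))  # 2: only the middle row (all NaN) is dropped
```

With the default `how='any'`, every row of this toy frame would be dropped, because each one has at least one missing value.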

In [6]:
actor_df=sw[['Han Solo',
       'Luke Skywalker', 'Princess Leia Organa', 'Anakin Skywalker',
       'Obi Wan Kenobi', 'Emperor Palpatine', 'Darth Vader',
       'Lando Calrissian', 'Boba Fett', 'C-3P0', 'R2 D2', 'Jar Jar Binks',
       'Padme Amidala', 'Yoda','Age']]
actor_df = actor_df.dropna(subset=['Han Solo',
       'Luke Skywalker', 'Princess Leia Organa', 'Anakin Skywalker',
       'Obi Wan Kenobi', 'Emperor Palpatine', 'Darth Vader',
       'Lando Calrissian', 'Boba Fett', 'C-3P0', 'R2 D2', 'Jar Jar Binks',
       'Padme Amidala', 'Yoda'],how='all')
actor = actor_df.melt(id_vars=['Age'])

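# Split the responses by age bracket; `mid` is 45-60 and `old` is > 60,
# matching the panel titles used in the charts below.
young = actor[(actor.Age == '18-29')]
adult = actor[(actor.Age == '30-44')]
mid = actor[(actor.Age == '45-60')]
old = actor[(actor.Age == '> 60')]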

sorter=['Luke Skywalker','Han Solo','Princess Leia Organa','Obi Wan Kenobi','Yoda','R2 D2','C-3P0'
       ,'Anakin Skywalker','Darth Vader','Lando Calrissian','Padme Amidala','Boba Fett','Emperor Palpatine',
       'Jar Jar Binks']

fav_young = young[(young.value == 'Very favorably') | (young.value =='Somewhat favorably')]
fav_adult = adult[(adult.value == 'Very favorably') | (adult.value =='Somewhat favorably')]
fav_mid = mid[(mid.value == 'Very favorably') | (mid.value =='Somewhat favorably')]
fav_old = old[(old.value == 'Very favorably') | (old.value =='Somewhat favorably')]

def favorable_share(group, favorable):
    """Share of favorable answers per character, ignoring unanswered rows."""
    rows = []
    for name in sorter:
        # Denominator: respondents in this age group who rated the character.
        answered = group[group.variable == name].value.notna().sum()
        # Numerator: those who rated the character favorably.
        liked = len(favorable[favorable.variable == name])
        rows.append((name, round(liked / answered, 2)))
    return pd.DataFrame(rows, columns=['Name', 'Percentage'])

fav_young = favorable_share(young, fav_young)
fav_adult = favorable_share(adult, fav_adult)
fav_mid = favorable_share(mid, fav_mid)
fav_old = favorable_share(old, fav_old)

alt.themes.enable('fivethirtyeight')

########################################################################################################################
hi=alt.Chart(fav_young).mark_bar(
    opacity=0.8,
    stroke='black',
    strokeWidth=0,
    color='#9CBA7F'
).encode(
    y = alt.Y('Name:N',axis=alt.Axis(ticks=False,gridOpacity=0, domain=False),title=None,sort=sorter),
    x = alt.X('Percentage:Q',axis=alt.Axis(labels=False,ticks=False,gridOpacity=0,domain=False),title=None)
)
hello = alt.Chart(fav_young).mark_text(dx=3, dy=0, color='#9CBA7F',align='left',fontSize=13).encode(
    y=alt.Y('Name', stack='zero',sort=sorter),
    x=alt.X('Percentage:Q'),
    text=alt.Text('Percentage:Q',format="1.0%")
)
f = (hi+hello).properties(height=260,width=20,
    title={"text":"AGE: 18-29",
           "fontSize":16,
           "anchor":"middle",
           "fontWeight":"bold"
           })
########################################################################################################################
hi1=alt.Chart(fav_adult).mark_bar(
    opacity=0.8,
    stroke='black',
    strokeWidth=0,
    color='#66CD00'
).encode(
    y = alt.Y('Name:N',axis=alt.Axis(ticks=False,gridOpacity=0,labels=False),title=None,sort=sorter),
    x = alt.X('Percentage:Q',axis=alt.Axis(labels=False,ticks=False,gridOpacity=0),title=None)
)
hello1 = alt.Chart(fav_adult).mark_text(dx=3, dy=0, color='#66CD00',align='left',fontSize=13).encode(
    y=alt.Y('Name', stack='zero',sort=sorter,axis=alt.Axis(ticks=False,grid=False,domain=False)),
    x=alt.X('Percentage:Q',axis=alt.Axis(ticks=False,grid=False,domain=False)),
    text=alt.Text('Percentage:Q',format="1.0%")
)
n = (hi1+hello1).properties(height=260,width=20,
    title={"text":"30-44",
           "fontSize":16,
           "anchor":"middle",
           "fontWeight":"bold"
           })
########################################################################################################################
hi1=alt.Chart(fav_mid).mark_bar(
    opacity=0.8,
    stroke='black',
    strokeWidth=0,
    color='#78AB46'
).encode(
    y = alt.Y('Name:N',axis=alt.Axis(ticks=False,gridOpacity=0,labels=False),title=None,sort=sorter),
    x = alt.X('Percentage:Q',axis=alt.Axis(labels=False,ticks=False,gridOpacity=0),title=None)
)
hello1 = alt.Chart(fav_mid).mark_text(dx=3, dy=0, color='#78AB46',align='left',fontSize=13).encode(
    y=alt.Y('Name', stack='zero',sort=sorter,axis=alt.Axis(ticks=False,grid=False,domain=False)),
    x=alt.X('Percentage:Q',axis=alt.Axis(ticks=False,grid=False,domain=False)),
    text=alt.Text('Percentage:Q',format="1.0%")
)
u = (hi1+hello1).properties(height=260,width=20,
    title={"text":"45-60",
           "fontSize":16,
           "anchor":"middle",
           "fontWeight":"bold"
           })
########################################################################################################################
hi1=alt.Chart(fav_old).mark_bar(
    opacity=0.8,
    stroke='black',
    strokeWidth=0,
    color='#636F57'
).encode(
    y = alt.Y('Name:N',axis=alt.Axis(ticks=False,gridOpacity=0,labels=False),title=None,sort=sorter),
    x = alt.X('Percentage:Q',axis=alt.Axis(labels=False,ticks=False,gridOpacity=0),title=None)
)
hello1 = alt.Chart(fav_old).mark_text(dx=3, dy=0, color='#636F57',align='left',fontSize=13).encode(
    y=alt.Y('Name', stack='zero',sort=sorter,axis=alt.Axis(ticks=False,grid=False,domain=False)),
    x=alt.X('Percentage:Q',axis=alt.Axis(ticks=False,grid=False,domain=False)),
    text=alt.Text('Percentage:Q',format="1.0%")
)
m = (hi1+hello1).properties(height=260,width=20,
    # add a title
    title={"text":"> 60",
           "fontSize":16,
           "anchor":"middle",
           "fontWeight":"bold"
           })
########################################################################################################################
z=alt.Chart(pd.concat([fav_young,fav_adult,fav_mid,fav_old])).mark_boxplot(color='#52a3a1').encode(
    y=alt.Y('Name:O',axis=alt.Axis(ticks=False,gridOpacity=0,labels=False),title=None,sort=sorter),
    x=alt.X('Percentage:Q',axis=alt.Axis(ticks=False,gridOpacity=0,labels=False,domain=False),title=None)
).properties(height=260,width=400,
            title={"text":"Let's understand the spread",
           "fontSize":16,
           "anchor":"middle",
           "fontWeight":"bold"
           })

(f|n|u|m|z).configure_axis(
    labelFontSize=11,
    titleFontSize=20,
    grid=False).configure_view(
    strokeWidth=0).properties(
    title={"text":["Star Wars Characters Favorable Rating Among Different Age Groups"],
           
           "subtitle":["Variation in favorableness recorded from 834 respondents"],
           "subtitleColor": "black",
           "subtitleFontSize":18
          })


![2.52](assets/2pt52.png)
1. Let's first address the differences between the two visuals.
    - Created viz:
        - Uses bars for encoding.
        - Orders rows by a single, fixed ordering of character names.
        - Places the different age categories side by side in a single row.
        - Adds boxplots, with an interactive component, to show the variation in favorability across the age categories.
   

2. Let's compare the effects.
    - The visual channel is limited and favors simple presentations.
        - The created viz does not overload the visual channel; it is simple and concise.
    - The time to process.
        - This visual has two components: a bare representation of the numbers, and another to show the spread.
        - The first part is straightforward, since every panel follows the same ordering of names.
        - The second part (boxplots) further helps the viewer perceive the variance in favorability among the 4 age categories.
    - Weber's law.
        - Bars make it hard to distinguish between percentages that are close together. The text label for each percentage compensates for this, and the boxplot also helps convey the magnitude of the differences.
    - Gestalt psychology.
        - Both visuals use color to convey similarity and position to show proximity of entries, in addition to labeling the category to indicate the same.
        - It's easy to infer which boxplot corresponds to which name.
        - There is also an interactive feature on the boxplots, which would help viewers without a statistics background understand the viz.
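
The spread that the boxplots are meant to convey can also be read off numerically. The following is a minimal sketch, assuming long-format data shaped like `actor` after `melt()`; the `toy` rows are invented for illustration and are not survey results, and `share` and `spread` are hypothetical names.

```python
import pandas as pd

# Hypothetical long-format responses (same columns as `actor` after melt()).
toy = pd.DataFrame({
    "Age": ["18-29", "18-29", "18-29", "> 60", "> 60", "> 60", "> 60"],
    "variable": ["Yoda"] * 7,
    "value": ["Very favorably", "Somewhat favorably", "Very unfavorably",
              "Very favorably", "Somewhat unfavorably", None, "Very unfavorably"],
})

favorable = ["Very favorably", "Somewhat favorably"]

# Keep only answered rows, so the denominator excludes missing ratings.
answered = toy.dropna(subset=["value"])

# Favorable share per character and age group (mean of a boolean column).
share = (
    answered.assign(favorable=answered["value"].isin(favorable))
    .groupby(["variable", "Age"])["favorable"]
    .mean()
    .round(2)
    .unstack("Age")
)

# Range of the share across age groups: a rough numeric analogue of the
# width of each character's boxplot.
spread = share.max(axis=1) - share.min(axis=1)
print(share)
print(spread)
```

A character with a small `spread` is rated similarly by all age groups, which is the case where bars alone are hardest to tell apart.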