## Data Visualisation: Expressiveness, Effectiveness and Alternate Representations of Visuals

#### In this notebook, we'll take existing visuals, critique them, and try to recreate them and/or code alternate representations (for better or worse!)

#### Visualisation:

!["article"](assets/article_2_resized.png)

#### Let's try to make an alternative visual with the same degree of expressiveness

In [3]:
import altair as alt
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

In [1]:
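# We'll manually recreate the data for starters: each row is a movie title
# followed by 13 pass (1) / fail (0) flags, in the column order listed in dff1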

new_df=[['Bad Moms',1,0,0,0,1,1,0,1,1,1,1,1,0],
        ['Hidden Figures',1,0,0,0,1,1,0,1,1,1,0,0,1],
        ['Independence Day: Resurgence',1,0,0,0,1,1,0,1,1,0,0,1,1],
        ['Finding Dory',1,0,1,0,0,0,0,1,1,0,1,0,1],
        ['Ghostbusters',1,0,0,0,0,1,0,1,1,1,0,0,1],
        ['Allegiant',1,0,0,0,0,1,0,1,1,0,0,0,1],
        ['Arrival',1,0,0,0,0,0,0,1,1,0,0,1,1],
        ['Ice Age: Collision Course',1,0,1,0,0,1,0,0,0,1,0,1,0]]

dff1 = ['Movies', 'The Bechdel Test', 'The Uphold Test', 'The Rees Davies test', 'The White Test',
        'The Waithe Test', 'The Ko Test', 'The Villalobos test',
        'The Peirce Test', 'The Villarreal Test', 'The Landau Test',
        'The Hagen Test', 'The Koeze-Dottle', 'The Feldman Score']

In [4]:
x1 = pd.DataFrame(new_df, columns=dff1)
x1

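| | Movies | The Bechdel Test | The Uphold Test | The Rees Davies test | The White Test | The Waithe Test | The Ko Test | The Villalobos test | The Peirce Test | The Villarreal Test | The Landau Test | The Hagen Test | The Koeze-Dottle | The Feldman Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bad Moms | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 |
| 1 | Hidden Figures | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 |
| 2 | Independence Day: Resurgence | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| 3 | Finding Dory | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| 4 | Ghostbusters | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 |
| 5 | Allegiant | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 |
| 6 | Arrival | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| 7 | Ice Age: Collision Course | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |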


In [5]:
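# Reshape from wide to long: one row per (movie, test) pair, with the test
# name in 'variable' and its 0/1 result in 'value'
x1 = x1.melt(id_vars=['Movies'])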

In [6]:
def conditions(row):
    """Map a row's test name (the 'variable' column) to its test category."""
    if row['variable'] in ['The Uphold Test', 'The Rees Davies test', 'The White Test']:
        return 'Behind The Camera Test'
    elif row['variable'] in ['The Waithe Test', 'The Ko Test', 'The Villalobos test']:
        return 'Intersectional Test'
    elif row['variable'] in ['The Peirce Test', 'The Villarreal Test', 'The Landau Test']:
        return 'Protagonists Test'
    elif row['variable'] in ['The Hagen Test', 'The Koeze-Dottle', 'The Feldman Score']:
        return 'Supporting Cast Test'
    else:
        # Only 'The Bechdel Test' is left over
        return 'Bechdel Test'

In [7]:
# Label every (movie, test) row with its category, then check the distinct labels
x1['type'] = x1.apply(conditions, axis=1)
x1['type'].unique()

array(['Bechdel Test', 'Behind The Camera Test', 'Intersectional Test',
       'Protagonists Test', 'Supporting Cast Test'], dtype=object)
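
The if/elif chain above can also be expressed as a plain lookup table. The following is a minimal sketch rather than the notebook's own method: `CATEGORY_TESTS`, `TEST_TO_CATEGORY` and `categorise` are hypothetical names introduced here, and the test names are copied as spelled in the column list.

```python
# Hypothetical lookup table: category -> test names (spelled as in the column list)
CATEGORY_TESTS = {
    'Behind The Camera Test': ['The Uphold Test', 'The Rees Davies test', 'The White Test'],
    'Intersectional Test': ['The Waithe Test', 'The Ko Test', 'The Villalobos test'],
    'Protagonists Test': ['The Peirce Test', 'The Villarreal Test', 'The Landau Test'],
    'Supporting Cast Test': ['The Hagen Test', 'The Koeze-Dottle', 'The Feldman Score'],
}

# Invert it once so that each test name points at its category
TEST_TO_CATEGORY = {
    test: category
    for category, tests in CATEGORY_TESTS.items()
    for test in tests
}


def categorise(test_name):
    """Return the category of a test; unlisted names fall back to 'Bechdel Test'."""
    return TEST_TO_CATEGORY.get(test_name, 'Bechdel Test')


print(categorise('The Ko Test'))       # Intersectional Test
print(categorise('The Bechdel Test'))  # Bechdel Test
```

With the long-format frame, something like `x1['variable'].map(categorise)` should give the same `type` column as the row-wise `apply`, while reading only the one column it needs.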

In [15]:
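# Fix the category order and give each category its own color
domains = ['Bechdel Test', 'Behind The Camera Test', 'Intersectional Test',
           'Protagonists Test', 'Supporting Cast Test']
color_scale = alt.Scale(
    domain=domains,
    range=['rgb(64,160,152)', 'rgb(194,81,64)', 'rgb(172,102,96)',
           'rgb(217,132,155)', 'rgb(10,120,150)']
)

# Stacked bars: every passed test adds one unit to its movie's bar
p1 = alt.Chart(x1).mark_bar().encode(
    x=alt.X('Movies:O'),
    y=alt.Y('value', stack='zero', title='Tests Passed'),
    color=alt.Color('type', scale=color_scale),
    tooltip=[alt.Tooltip('variable', title='Test Name')]
).properties(
    width=800,
    height=480)

# White ticks at the stack boundaries separate the individual tests
p2 = alt.Chart(x1).mark_tick(thickness=6, color='white').encode(
    x=alt.X('Movies:O'),
    y=alt.Y('value', stack='zero')
).properties(
    width=800,
    height=480)  # same size as p1: layers share one plot area, and mismatched sizes can raise an error in recent Altair versions

# Layer the ticks on top of the bars
p1 + p2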

Expressiveness:

- In both visuals, the type of test can be determined from the color coding.
- Hovering over a block shows the name of the test, in both visuals alike.
- The names of the movies are clearly listed in both visuals.
- Both visuals convey the same details, except for the images of the test creators.
- The difference is that, in the self-created viz, the values are stacked vertically on top of each other, with ticks marking the separation between tests within a category (Behind The Camera, Intersectional, Protagonists, etc.).

Since "Effectiveness" depends on context, let's consider a task where the goal is to understand which movies passed the most tests and how many tests of each category they passed.
- Counting the tests of each type (Intersectional, Protagonists, etc.) is fairly straightforward in both visuals, given their proximity to each other and the color coding, so the two visuals are even here.
- But when we want the total number of tests a movie passes, the self-created visual does a better job, for 2 reasons that both concern the original visual:
    1. Counting horizontally is tougher than counting vertically, especially with gaps in between. The spaces force a near-serial processing to count the total tests passed.  
    2. There is no scale to help with counting.
- The self-created viz makes counting easy because the stacking is vertical and has no gaps, and because the total number of tests passed can be read directly off the y-axis scale (+1 for every test passed).
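
The totals that the y-axis makes readable can also be checked numerically. The sketch below is only an illustration: it copies three rows from the manually recreated data and uses hypothetical names (`rows`, `totals`, `ranked`) that are not part of the notebook.

```python
# Three rows copied from the manually recreated data:
# movie title followed by 13 pass (1) / fail (0) flags
rows = [
    ['Bad Moms', 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0],
    ['Finding Dory', 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1],
    ['Arrival', 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1],
]

# Total tests passed = sum of the flags, i.e. the height of the stacked bar
totals = {title: sum(flags) for title, *flags in rows}

# Rank the movies from most to fewest tests passed
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

for title, total in ranked:
    print(f'{title}: {total}')
# Bad Moms: 8
# Finding Dory: 6
# Arrival: 5
```

Each printed total is the value one would read off the y-axis at the top of that movie's bar, which is the lookup the horizontal layout makes harder.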