## Data Visualization: Effectiveness, Expressiveness and an Alternate Encoding 

Visual 9:

We'll use the same data as in the previous visuals and try and create a simple new visual

In [1]:
import pandas as pd
import altair as alt
import numpy as np
import math

In [2]:
tweets = pd.read_csv('assets/tweets.csv')
tweets = pd.read_csv('assets/tweets.csv')

tweets['emojis'] = tweets['text'].str.findall(r'[^\w\s.,"@\'?/#!$%\^&\*;:{}=\-_`~()\U0001F1E6-\U0001F1FF]').str.len()

boxer_emojis = ['☘️','🇮🇪','🍀','💸','🤑','💰','💵','😴','😂','🤣','🥊','👊','👏','🇮🇪','💪','🔥','😭','💰']
for emoji in boxer_emojis:
    tweets[emoji] = tweets.text.str.count(emoji)
    
tweets['irish_pride'] = tweets['☘️'] + tweets['🇮🇪'] + tweets['🍀']
tweets['money_team'] = tweets['💸'] + tweets['🤑'] + tweets['💰'] +  tweets['💵']
tweets['datetime'] = pd.to_datetime(tweets['created_at'])
tweets = tweets.set_index('datetime')


teams = tweets.copy()
teams = teams.resample('1s').sum()
teams  = teams[(teams['💸']>0) | (teams['🤑']>0) | (teams['💰']>0) | (teams['💵']>0) | (teams['☘️']>0) | (teams['🍀']>0) | (teams['🇮🇪']>0) ]

mdf = teams['money_team'].rolling('4Min').mean().reset_index()
mdf['team'] = '💸🤑💰💵'
mdf = mdf.rename(columns={'money_team':'tweet_count'})

idf = teams['irish_pride'].rolling('4Min').mean().reset_index()
idf['team'] = '☘️🍀🇮🇪'
idf = idf.rename(columns={'irish_pride':'tweet_count'})

ndf = pd.concat([mdf,idf])

In [3]:
annotations = [['2017-08-27 00:15:00',4, 'Fight begins'],
               ['2017-08-27 00:22:00',5, 'McGregor does OK \nin the early rounds'],
               ['2017-08-27 00:53:00',4, 'Mayweather takes \nover and wins by \nTKO']]
a_df = pd.DataFrame(annotations, columns=['date','count','note'])
a_df['date'] = pd.to_datetime(a_df['date'])

In [6]:
tweets['money'] = tweets['text'].str.findall(r'[\U0001f911\U0001f4b0\U0001f4b8]*([^\#]+([Ff]+(loyd|LOYD)+)|([Mm]+(ayweather|AYTHER)+))*').str.len()
tweets['irish'] = tweets['text'].str.findall(r'[\U0001f1ee\U0001f340]*([^r]+([Cc]+onor+)|([Mm]+.(gregor|GREGOR)+))').str.len()

tweets.shape

use_irish = tweets[tweets['irish']>0]
use_money = tweets[tweets['money']>0]

use_irish.columns

use_money.columns

use_irish['😴'].sum()/len(use_irish),use_money['😴'].sum()/len(use_money)

use_irish['😂'].sum()/len(use_irish),use_money['😂'].sum()/len(use_money)

use_irish['🤣'].sum()/len(use_irish),use_money['🤣'].sum()/len(use_money)

use_irish['🥊'].sum()/len(use_irish),use_money['🥊'].sum()/len(use_money)

use_irish['👊'].sum()/len(use_irish),use_money['👊'].sum()/len(use_money)

use_irish['👏'].sum()/len(use_irish),use_money['👏'].sum()/len(use_money)

use_irish['💪'].sum()/len(use_irish),use_money['💪'].sum()/len(use_money)

use_irish['🔥'].sum()/len(use_irish),use_money['🔥'].sum()/len(use_money)

use_irish['😭'].sum()/len(use_irish),use_money['😭'].sum()/len(use_money)

f1 = pd.DataFrame([['😴',use_irish['😴'].sum()/len(use_irish),use_money['😴'].sum()/len(use_money)],['😂',use_irish['😂'].sum()/len(use_irish),use_money['😂'].sum()/len(use_money)],['🤣',use_irish['🤣'].sum()/len(use_irish),use_money['🤣'].sum()/len(use_money)],['🥊',use_irish['🥊'].sum()/len(use_irish),use_money['🥊'].sum()/len(use_money)],['👊',use_irish['👊'].sum()/len(use_irish),use_money['👊'].sum()/len(use_money)],['👏',use_irish['👏'].sum()/len(use_irish),use_money['💪'].sum()/len(use_money)],['🔥',use_irish['🔥'].sum()/len(use_irish),use_money['🔥'].sum()/len(use_money)],['😭',use_irish['😭'].sum()/len(use_irish),use_money['😭'].sum()/len(use_money)]],columns=['emoji','McGregor','Mayweather'])

x=f1.melt(id_vars=['emoji'])
x['value']=x.value*100

inter = alt.Chart(x).mark_bar(strokeWidth=0).encode(
    x=alt.X('variable',axis=alt.Axis(ticks=False,labels=False,gridOpacity=0,domain=False),title=None),
    y=alt.Y('value',axis=alt.Axis(ticks=True,gridOpacity=0, domain=False),title=None),
    color=alt.Color('variable:N',scale=alt.Scale(range=[ 'green','#ff9900']),title=None),
    column=alt.Column('emoji:N',header=alt.Header(labelFontSize=20,title=''))
)


(inter).configure_mark(
    color='#008fd5'
).configure_view(
    strokeWidth=0,
    strokeOpacity=0
).properties(
    title={"text":"Irish Pride VS The Money Team: Emoji War",
           
           "subtitle":["Which team dominated the emoji battle?"],
           "subtitleColor": "black",
           "subtitleFontSize":18
          }).configure_scale(
    bandPaddingInner=0.2
).configure_axis(
    labelFontSize=11,
    titleFontSize=20,    
).configure_legend(
                   
                    labelFontSize=20
                   
                  )

![2.6](assets/2pt6.png)

Perception and Cognition:
- The visual is simple and concise. (large bars would violate the data-ink ratio but they are more easier to read than their stripped down version. It's okay to violate the principle in this case.)
- The visual processing time is quick.
- The encoding in unabiguious, there isn't any presence of double encoding for the 2 categories being compared.
- Considering Weber's law and the perception of difference, that is a minor challenge in this case. There is no way for the viewer to know the exact percentage associated to each bar. Even when there is a real difference, the stats corresponding to the clap emoji are indistinguishable. This drastically increases the time to perceive, and then to map it appropriately. In this case, the viewer can't perceive differences throughout the graph even if she/he wanted to.
- Bars do a good job at maintaining the integrity of the information being conveyed without it being altered.
- Adding percentages to the bars would help.
- Finally, to address effectiveness, if the viewer wanted to know for exactly how many emojis a particular team was ahead on, due to all shortcomings in perception and interpretation, this visual would be low on effectiveness too.