## Data Visualization: Effectiveness, Expressiveness and an Alternate Encoding 

Visual 8:

!["viz8"](assets/chart_4.png) 

In this viz, along with the recreation, we'll also create an alternative look for the same viz and then compare the two.

In [1]:
import pandas as pd
import altair as alt
import numpy as np
import math

In [2]:
tweets = pd.read_csv('assets/tweets.csv')

tweets['emojis'] = tweets['text'].str.findall(r'[^\w\s.,"@\'?/#!$%\^&\*;:{}=\-_`~()\U0001F1E6-\U0001F1FF]').str.len()

boxer_emojis = ['☘️','🇮🇪','🍀','💸','🤑','💰','💵','😴','😂','🤣','🥊','👊','👏','🇮🇪','💪','🔥','😭','💰']
for emoji in boxer_emojis:
    tweets[emoji] = tweets.text.str.count(emoji)
    
tweets['irish_pride'] = tweets['☘️'] + tweets['🇮🇪'] + tweets['🍀']
tweets['money_team'] = tweets['💸'] + tweets['🤑'] + tweets['💰'] +  tweets['💵']
tweets['datetime'] = pd.to_datetime(tweets['created_at'])
tweets = tweets.set_index('datetime')


teams = tweets.copy()
teams = teams.resample('1s').sum()
teams  = teams[(teams['💸']>0) | (teams['🤑']>0) | (teams['💰']>0) | (teams['💵']>0) | (teams['☘️']>0) | (teams['🍀']>0) | (teams['🇮🇪']>0) ]

mdf = teams['money_team'].rolling('4Min').mean().reset_index()
mdf['team'] = '💸🤑💰💵'
mdf = mdf.rename(columns={'money_team':'tweet_count'})

idf = teams['irish_pride'].rolling('4Min').mean().reset_index()
idf['team'] = '☘️🍀🇮🇪'
idf = idf.rename(columns={'irish_pride':'tweet_count'})

ndf = pd.concat([mdf,idf])

In [3]:
annotations = [['2017-08-27 00:15:00',4, 'Fight begins'],
               ['2017-08-27 00:22:00',5, 'McGregor does OK \nin the early rounds'],
               ['2017-08-27 00:53:00',4, 'Mayweather takes \nover and wins by \nTKO']]
a_df = pd.DataFrame(annotations, columns=['date','count','note'])
a_df['date'] = pd.to_datetime(a_df['date'])

In [4]:
alt.themes.enable('fivethirtyeight')

# tweets is already indexed by datetime (see In [2]), so it can be resampled directly.
teams = tweets.resample('1s').sum()
# Keep only the seconds in which at least one of the two emoji appears.
teams = teams[(teams['😭'] > 0) | (teams['🤣'] > 0)]

# Four-minute rolling mean of each emoji's per-second count, in long form.
t1 = teams[['😭']].rename(columns={'😭': 'tweet_count'})
t1 = t1.rolling('4Min').mean().reset_index()
t1['sym'] = '😭'

t2 = teams[['🤣']].rename(columns={'🤣': 'tweet_count'})
t2 = t2.rolling('4Min').mean().reset_index()
t2['sym'] = '🤣'

ndf = pd.concat([t1, t2])

bars=alt.Chart(ndf).mark_line(
    opacity=1,
    strokeWidth=1.4,
    fontSize=70,
    fontWeight='bold',
    size=2.0
).encode(
    y = alt.Y('tweet_count',axis=alt.Axis(tickCount=4,domain=True),title="Four minute rolling average"),
    x = alt.X('datetime',axis=alt.Axis(tickCount=4,domain=True,format = ("%I:%M")),title=None),
    color=alt.Color('sym',
                    scale=alt.Scale(
            range=[ '#00cccc','#ff8c1a']),title=''
                   ),
).properties(width=500, height=300)



df=pd.DataFrame([['2017-08-27 00:24:00',1.65],['2017-08-27 00:29:00',1.80]],columns=['a','b'])
df.a=pd.to_datetime(df.a)
dfr=alt.Chart(df).mark_line(
    opacity=1,
    stroke='black',
    strokeWidth=1.6,
    fontSize=70
).encode(
    y = alt.Y('b',axis=alt.Axis(tickCount=8,domain=True)),
    x = alt.X('a',axis=alt.Axis(tickCount=8,domain=True))
)

df1=pd.DataFrame([['2017-08-27 00:15:00',1.4],['2017-08-27 00:15:00',2.1]],columns=['a','b'])
df1.a=pd.to_datetime(df1.a)
dfr1=alt.Chart(df1).mark_line(
    opacity=1,
    stroke='black',
    strokeWidth=1.6,
    width=4.6,
    fontWeight='bold'
).encode(
    y = alt.Y('b',axis=alt.Axis(tickCount=8,domain=True)),
    x = alt.X('a',axis=alt.Axis(tickCount=8,domain=True))
)

df2=pd.DataFrame([['2017-08-27 00:55:00',.30],['2017-08-27 00:55:00',.85]],columns=['a','b'])
df2.a=pd.to_datetime(df2.a)
dfr2=alt.Chart(df2).mark_line(
    opacity=1,
    stroke='black',
    strokeWidth=1.6,
    fontWeight='bold'
).encode(
    y = alt.Y('b',axis=alt.Axis(tickCount=8,domain=True)),
    x = alt.X('a',axis=alt.Axis(tickCount=8,domain=True))
)

ad_df=pd.DataFrame([['2017-08-27 00:10:00',2.2,'Fight begins'],['2017-08-27 00:30:00',2.1,'McGregor \nimpresses \nearly'],['2017-08-27 00:50:00',.1,'Fight ends']],columns=['date','count','note'])
ad_df.date=pd.to_datetime(ad_df.date)
ad = alt.Chart(ad_df).mark_text(
    opacity=0.9,
    strokeWidth=1.2,
    lineBreak='\n',
    size=14,
    align='left'
).encode(
    y=alt.Y('count', stack='zero',axis=alt.Axis(tickCount=5,)),
    x=alt.X('date',axis=alt.Axis(tickCount=5),title=None),
    text=alt.Text('note')
).properties(width=500, height=300)

(bars+dfr1+dfr+dfr2+ad).configure_axis(
    labelFontSize=11,
    titleFontSize=16,
    ).configure_view(
    strokeWidth=0).properties(
    title={"text":"Tears were shed - of joy and sorrow",
           
           "subtitle":["Four minute rolling average of the number of uses of selected emoji in","sampled tweets during the Mayweather-McGregor fight"],
           "subtitleColor": "black",
           "subtitleFontSize":16,
           "fontSize":26,
           "align":'left'
          }).configure_scale(
    bandPaddingInner=0.2
).configure_legend(orient='top',
                   symbolType='stroke',
                    labelFontSize=25,
                   symbolSize=165
                   
                  ).configure_title(anchor='start').configure_axisLeft(titleFontSize=14)

![2.4](assets/2pt4_.png)
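
The chart's y-axis is a four-minute rolling average computed with a time-based pandas window. As a minimal sketch of how such a window behaves, the toy example below uses made-up per-minute counts (an assumption for illustration, not values from `tweets.csv`) and the default window closure, in which each row averages the rows whose timestamps fall in the interval (t - 4 minutes, t].

```python
import pandas as pd

# Toy per-minute counts (made-up numbers, not taken from tweets.csv).
idx = pd.date_range('2017-08-27 00:00:00', periods=6, freq='1min')
counts = pd.Series([0, 4, 8, 0, 4, 8], index=idx, name='tweet_count')

# A time-based window of '4min' averages every row whose timestamp lies in
# (t - 4 minutes, t], so the earliest rows average over fewer points.
rolled = counts.rolling('4min').mean()
print(rolled)
```

With these toy values the rolling means are 0, 2, 4, 3, 4 and 5. Note that in the notebook the seconds in which neither emoji appears are dropped before the rolling step, so each window there averages only over the remaining rows.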

Let's try to increase the ink in the viz and see how that affects our perception.

In [5]:
alt.themes.enable('fivethirtyeight')
# tweets is already indexed by datetime (see In [2]), so it can be resampled directly.
teams = tweets.resample('1s').sum()
# Keep only the seconds in which at least one of the two emoji appears.
teams = teams[(teams['😭'] > 0) | (teams['🤣'] > 0)]

# Four-minute rolling mean of each emoji's per-second count, in long form.
t1 = teams[['😭']].rename(columns={'😭': 'tweet_count'})
t1 = t1.rolling('4Min').mean().reset_index()
t1['sym'] = '😭'

t2 = teams[['🤣']].rename(columns={'🤣': 'tweet_count'})
t2 = t2.rolling('4Min').mean().reset_index()
t2['sym'] = '🤣'

ndf = pd.concat([t1, t2])
# The circle layer plots the same data as the area layers.
xr = ndf

bars=alt.Chart(xr).mark_circle(
    opacity=1,
    strokeWidth=1.4,
    fontSize=70,
    fontWeight='bold',
    size=50.0
).encode(
    y = alt.Y('tweet_count',axis=alt.Axis(tickCount=4,domain=True),title="Four minute rolling average"),
    x = alt.X('datetime',axis=alt.Axis(tickCount=4,domain=True,format = ("%I:%M")),title=None),
    color=alt.Color('sym',
                    scale=alt.Scale(
            range=[ '#00cccc','#ff8c1a']),title=''
                   ),
).properties(width=600, height=300)

barsn=alt.Chart(ndf).transform_filter(
    alt.datum.sym == '🤣'
).mark_area(
    opacity=0.3,
    strokeWidth=1.4,
    fontSize=70,
    fontWeight='bold',
    size=2.0
).encode(
    y = alt.Y('tweet_count',axis=alt.Axis(tickCount=4,domain=True),title="Four minute rolling average"),
    x = alt.X('datetime',axis=alt.Axis(tickCount=4,domain=True,format = ("%I:%M")),title=None),
    color=alt.Color('sym',
                    scale=alt.Scale(
            range=[ '#00cccd','#ff8c1b']),title=''
                   ),
).properties(width=600, height=300)

barsn2=alt.Chart(ndf).transform_filter(
    alt.datum.sym == '😭'
).mark_area(
    opacity=0.3,
    strokeWidth=1.4,
    fontSize=70,
    fontWeight='bold',
    size=2.0
).encode(
    y = alt.Y('tweet_count',axis=alt.Axis(tickCount=4,domain=True),title="Four minute rolling average"),
    x = alt.X('datetime',axis=alt.Axis(tickCount=4,domain=True,format = ("%I:%M")),title=None),
    color=alt.Color('sym',
                    scale=alt.Scale(
            range=[ '#00cccd','#ff8c1b']),title=''
                   ),
).properties(width=600, height=300)



df=pd.DataFrame([['2017-08-27 00:24:00',1.65],['2017-08-27 00:29:00',1.80]],columns=['a','b'])
df.a=pd.to_datetime(df.a)
dfr=alt.Chart(df).mark_line(
    opacity=1,
    stroke='black',
    strokeWidth=1.6,
    fontSize=70
).encode(
    y = alt.Y('b',axis=alt.Axis(tickCount=8,domain=True)),
    x = alt.X('a',axis=alt.Axis(tickCount=8,domain=True))
)

df1=pd.DataFrame([['2017-08-27 00:15:00',1.4],['2017-08-27 00:15:00',2.1]],columns=['a','b'])
df1.a=pd.to_datetime(df1.a)
dfr1=alt.Chart(df1).mark_line(
    opacity=1,
    stroke='black',
    strokeWidth=1.6,
    width=4.6,
    fontWeight='bold'
).encode(
    y = alt.Y('b',axis=alt.Axis(tickCount=8,domain=True)),
    x = alt.X('a',axis=alt.Axis(tickCount=8,domain=True))
)

df2=pd.DataFrame([['2017-08-27 00:55:00',.30],['2017-08-27 00:55:00',.85]],columns=['a','b'])
df2.a=pd.to_datetime(df2.a)
dfr2=alt.Chart(df2).mark_line(
    opacity=1,
    stroke='black',
    strokeWidth=1.6,
    fontWeight='bold'
).encode(
    y = alt.Y('b',axis=alt.Axis(tickCount=8,domain=True)),
    x = alt.X('a',axis=alt.Axis(tickCount=8,domain=True))
)

ad_df=pd.DataFrame([['2017-08-27 00:10:00',2.2,'Fight begins'],['2017-08-27 00:30:00',2.1,'McGregor \nimpresses \nearly'],['2017-08-27 00:50:00',.1,'Fight ends']],columns=['date','count','note'])
ad_df.date=pd.to_datetime(ad_df.date)
ad = alt.Chart(ad_df).mark_text(
    opacity=0.9,
    strokeWidth=1.2,
    lineBreak='\n',
    size=14,
    align='left'
).encode(
    y=alt.Y('count', stack='zero',axis=alt.Axis(tickCount=5,),title=None),
    x=alt.X('date',axis=alt.Axis(tickCount=5),title=None),
    text=alt.Text('note')
).properties(width=600, height=300)

(barsn2+barsn+bars+dfr1+dfr+dfr2+ad).configure_axis(
    labelFontSize=11,
    titleFontSize=16,
    ).configure_view(
    strokeWidth=0).properties(
    title={"text":"Tears were shed - of joy and sorrow",
           
           "subtitle":["Four minute rolling average of the number of uses of selected emoji in","sampled tweets during the Mayweather-McGregor fight"],
           "subtitleColor": "black",
           "subtitleFontSize":16
          }).configure_scale(
    bandPaddingInner=0.2
).configure_legend(orient='top',
                    labelFontSize=25,
                   symbolSize=15
                   
                  )

![2.5](assets/2pt5_.png)

Perception/cognition
- original vs alternative
    - The original viz is simple and concise, while the alternative viz has a lot of overlap.
    - The overlap adds no value in this case and also violates the data-ink principle.
    - Processing time is higher for the alternative viz: the dots cause clutter and the areas create overlap that carries no meaning. When the circles overlap, the exact value of a point becomes difficult to estimate, which increases the viewer's cognitive load.
    - Both visuals use appropriate color encoding, but in the alternative viz the semi-transparent areas blend into a mixed color that encodes nothing.
    - The original viz does a very good job by using lines rather than areas; the lines convey exactly what is intended to the viewer.
    - In many ways the original viz is in accordance with Gestalt psychology (e.g. it is simple and concise, and it relies on color, lines and their continuity).

The original viz is more effective than the alternative viz at showing the reactions (sad/happy) over time: it is quicker to process, has no unnecessary redundancy, and is simple and concise.