# Altair Example 3 - Mayweather-McGregor Fight - Much hype, some boredom

This notebook demonstrates a way to create an Altair graphic that closely resembles the theme of a published article.  It is meant to offer an alternative visualization that complements the source article.  The data behind this visualization comes from [FiveThirtyEight](https://fivethirtyeight.com) and was used in the article [The Mayweather-McGregor Fight, As Told Through Emojis](https://fivethirtyeight.com/features/the-mayweather-mcgregor-fight-as-told-through-emojis/) (Mehta et al., 2017).  The original dataset can be found at [FiveThirtyEight's Github: Mayweather vs McGregor](https://github.com/fivethirtyeight/data/tree/master/mayweather-mcgregor).

This notebook is an attempt to recreate a visualization from the article in a way that shows the data more expressively.
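
The setup cell that follows builds one count column per emoji with `Series.str.count`. As a minimal sketch of that step, the snippet below applies the same call to a few made-up tweets; the `toy` frame and its text are illustrative assumptions, not part of the FiveThirtyEight data.

```python
import pandas as pd

# Made-up tweets for illustration only (not the FiveThirtyEight dataset)
toy = pd.DataFrame({'text': ['What a fight 🔥🔥', 'zzz 😴', 'no emoji here']})

# One new column per emoji, holding how often it appears in each tweet.
# str.count treats its argument as a regular expression; these emoji
# contain no regex metacharacters, so they match literally.
for emoji in ['🔥', '😴']:
    toy[emoji] = toy['text'].str.count(emoji)

print(toy)
```

Each emoji becomes its own numeric column, which is what later allows the counts to be summed per second.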

In [1]:
# The code in this cell was written and provided by the instruction team of 
# University of Michigan - School of Information - SIADS-522 - Information Visualization
# Taught by Professor Eytan Adar (2020)

# start with the setup
import pandas as pd
import altair as alt
import numpy as np

# enable correct rendering
alt.renderers.enable('default')

# uses intermediate json files to speed things up
alt.data_transformers.enable('json')

#--------------------------------------------------------------------------------------------------

# we're going to do some setup here in anticipation of needing the data in 
# a specific format. We moved it all up here so everything is in one place.

# load the tweets
tweets = pd.read_csv('datasets/tweets.csv')

# we're going to process the data in a couple of ways
# first, we want to know how many emojis are in each tweet so we'll create a new column
# that counts them
tweets['emojis'] = tweets['text'].str.findall(r'[^\w\s.,"@\'?/#!$%\^&\*;:{}=\-_`~()\U0001F1E6-\U0001F1FF]').str.len()

# next, there are a few specific emojis that we care about, we're going to create
# a column for each one and indicate how many times it showed up in the tweet
boxer_emojis = ['☘️','🇮🇪','🍀','💸','🤑','💰','💵','😴','😂','🤣','🥊','👊','👏','🇮🇪','💪','🔥','😭','💰']
for emoji in boxer_emojis:
    # here's a different way to get the counts
    tweets[emoji] = tweets.text.str.count(emoji)

#--------------------------------------------------------------------------------------------------

tweets['datetime'] = pd.to_datetime(tweets['created_at'])
tweets = tweets.set_index('datetime')

The visualizations below do not appear in the article.  They offer some contrast to the ways the data could have been presented and add more context to the article.
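
The next cell smooths per-second emoji counts with a time-based rolling window. The snippet below is a small sketch of that pattern on hypothetical counts (the timestamps and values are invented for illustration), showing how `resample` and `rolling` interact.

```python
import pandas as pd

# Hypothetical per-tweet counts of one emoji, indexed by timestamp (toy data)
idx = pd.to_datetime([
    '2017-08-27 00:00:00',
    '2017-08-27 00:00:00',
    '2017-08-27 00:01:00',
    '2017-08-27 00:05:00',
])
counts = pd.Series([1, 2, 5, 4], index=idx)

# Sum the counts within each second; seconds without tweets become 0
per_second = counts.resample('1s').sum()

# Keep only the seconds that saw activity
active = per_second[per_second > 0]

# Time-based window: each value averages the rows that fall within the
# four minutes ending at that row's timestamp (left edge excluded)
smoothed = active.rolling('4min').mean()
print(smoothed)
```

Because the window is defined in time rather than in rows, the number of observations averaged at each point varies with how busy that stretch of the fight was.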

In [2]:
# This is code written by Nicholas Miller

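# Work on a copy so the original tweets frame is left untouched
teams2 = tweets.copy()
teams2['much_hype'] = tweets['🔥']
teams2['some_boredom'] = tweets['😴']

# Sum the counts within each second (numeric columns only), then keep only
# the seconds in which at least one of the two emojis appeared
teams2 = teams2.resample('1s').sum(numeric_only=True)
teams2 = teams2[(teams2['🔥'] > 0) | (teams2['😴'] > 0)]

# Four-minute rolling average of each emoji's per-second count
hype_df = teams2['much_hype'].rolling('4min').mean().reset_index()
hype_df['team'] = '🔥'
hype_df = hype_df.rename(columns={'much_hype': 'tweet_count'})

bore_df = teams2['some_boredom'].rolling('4min').mean().reset_index()
bore_df['team'] = '😴'
bore_df = bore_df.rename(columns={'some_boredom': 'tweet_count'})

# Stack both series into one long-format frame for Altair
hbdf = pd.concat([hype_df, bore_df])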

x_grid= ['2017-08-27 00:05:00',
         '2017-08-27 00:15:00',
         '2017-08-27 00:25:00',
         '2017-08-27 00:35:00',
         '2017-08-27 00:45:00',
         '2017-08-27 00:55:00',
         '2017-08-27 01:05:00']

line_df = pd.DataFrame({
    'x': ['2017-08-27 00:15:00', '2017-08-27 00:15:00',
          '2017-08-27 00:30:00', '2017-08-27 00:30:00',
          '2017-08-27 00:30:00', '2017-08-27 00:32:00',
          '2017-08-27 00:45:00', '2017-08-27 00:45:00',
          '2017-08-27 00:29:30', '2017-08-27 00:30:00',
          '2017-08-27 00:30:00', '2017-08-27 00:30:30',
          '2017-08-27 00:44:30', '2017-08-27 00:45:00',
          '2017-08-27 00:45:00', '2017-08-27 00:45:30'],
    'y': [0.8, 0.25, 0.35, 0.25, 0.25, 0.2, 0.4, 0.25, 0.32, 0.35, 0.35, 0.32, 0.37, 0.4, 0.4, 0.37],
    'class': ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D', 'B1', 'B1', 'B2', 'B2', 'D1', 'D1', 'D2', 'D2']
})

ant_df = pd.DataFrame({
    'x': ['2017-08-27 00:07:00',
          '2017-08-27 00:33:00', '2017-08-27 00:33:00', '2017-08-27 00:33:00'],
    'y': [0.2, 0.2, 0.125, 0.05],
    'note': ['Fight begins', 'Mayweather', 'takes control in', 'middle rounds']
})
#--------------------------------------------------------------
base = alt.Chart(hbdf).transform_joinaggregate(
    TotalTweets='sum(tweet_count)',
    groupby=['datetime']
).transform_calculate(
    PercentOfTotal="datum.tweet_count / datum.TotalTweets"
).mark_area(
    line={'color': '#8b2019'},
).encode(
    x=alt.X('datetime:T',
            axis=alt.Axis(grid=True,
                          gridColor='#d6d6d6',
                          values=x_grid,
                          tickSize=10,
                          tickColor='#d6d6d6',
                          labelColor='#b9b9b9',
                          title=None,
                         )
           ),
    y=alt.Y('PercentOfTotal:Q',
            axis=alt.Axis(grid=True,
                          domainColor='#d6d6d6',
                          gridColor='#d6d6d6',
                          values=[0, 0.25, 0.5, 0.75, 1],
                          tickSize=20,
                          tickColor='#d6d6d6',
                          labelColor='#b9b9b9',
                          title='Four-minute rolling average',
                          titlePadding=15,
                          titleFontSize=15,
                          format='.0%',
                         )
           ),
    color=alt.Color('team',
                    scale=alt.Scale(domain=['🔥', '😴'],  # This sets the colors of each line
                                    range=['#f6392b', '#4cbfc4']
                                   ),
                    legend=alt.Legend(title=None,
                                      orient='top',  # This puts the legend on top
                                      labelFontSize=24,
                                      symbolSize=500,
                                     )
                   ),
    order=alt.Order('team', sort='ascending')
)

lines = alt.Chart(line_df).mark_line(color='black', size=1).encode(
    x='x:T',
    y='y:Q',
    detail='class'
)

text = alt.Chart(ant_df).mark_text(
    align='left',
    fontSize=18,
).encode(
    x='x:T',
    y='y:Q',
    text=alt.Text('note:N')
)


(base + lines + text).configure(
    background='#F0F0F0',
    padding=15                    # Add some padding around the edge
).properties(
    # add a title
    title={"text": "Much hype, some boredom",
           "subtitle": ["Four-minute rolling average of the number of uses of selected emoji in",
                        "sampled tweets during the Mayweather-McGregor fight"],
           "fontSize":22,
           "subtitleFontSize":17,
           "anchor":"start",      # Make the text left justified
           "offset":35            # add some padding between title and below graph
          },
    width=450
).configure_view(
    strokeOpacity=0               # Remove the boundary box
)

**Justification**

This is a remake of the "Much hype, some boredom" graphic in which I use the percentage of 🔥 emoji versus 😴 emoji to measure hype versus boredom.  Presented this way, the graphic provides some insights that are not easily observed in the original.  For starters, this version is more effective because instead of two lines competing with one another, there is only one line (the boundary between the two stacked areas), which depicts the comparison more concisely and clearly.  The single line makes the comparison between these emojis easier to follow and understand.  The key areas called out with arrows and text in the original are still evident, and the point where Mayweather takes control is actually clearer.

Some information is lost, namely the raw volume of tweets.  While that does carry some useful information, it isn't very useful when the focus is on comparing the usage of these two emojis over time.  Viewed this way, it is also easy to see that 🔥 was used much more than 😴 for the majority of the match, which suggests that the audience in general may have found the match exciting.  The deep depressions pointed out by the arrows could also suggest that the McGregor fans outnumbered the Mayweather fans, as the article suggested.  While the original visual is more expressive because it carries more data, it is less effective at comparing the two emojis.
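
To make the percentage concrete, here is a rough pandas equivalent of the chart's `transform_joinaggregate` and `transform_calculate` steps. The column names match those used in the chart, while the `toy` frame and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical smoothed counts in the same long format as hbdf (toy values)
toy = pd.DataFrame({
    'datetime': pd.to_datetime(['2017-08-27 00:15:00'] * 2
                               + ['2017-08-27 00:30:00'] * 2),
    'team': ['🔥', '😴', '🔥', '😴'],
    'tweet_count': [3.0, 1.0, 1.0, 1.0],
})

# Like transform_joinaggregate: attach each timestamp's total to every row
toy['TotalTweets'] = toy.groupby('datetime')['tweet_count'].transform('sum')

# Like transform_calculate: each emoji's share of that total
toy['PercentOfTotal'] = toy['tweet_count'] / toy['TotalTweets']

print(toy[['datetime', 'team', 'PercentOfTotal']])
```

At every timestamp the two shares sum to 1, which is why the stacked areas always fill the chart and only the boundary between them moves.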