# Altair Example 4 - Mayweather-McGregor Fight - Positive Fan Support

This notebook demonstrates a method for creating an Altair graphic that closely resembles the theme of an article.  It is meant to create an alternative visualization that would complement the source article.  The data used to create this visualization is [FiveThirtyEight](https://fivethirtyeight.com)'s data from the article [The Mayweather-McGregor Fight, As Told Through Emojis](https://fivethirtyeight.com/features/the-mayweather-mcgregor-fight-as-told-through-emojis/) (Mehta et al., 2017).  The original dataset can be found at FiveThirtyEight's [GitHub: Mayweather vs McGregor](https://github.com/fivethirtyeight/data/tree/master/mayweather-mcgregor).

This notebook is an attempt to create a new visualization showing information that is missing from the article.

In [1]:
# The code in this cell was written and provided by the instruction team of 
# University of Michigan - School of Information - SIADS-522 - Information Visualization
# Taught by Professor Eytan Adar (2020)

# start with the setup
import pandas as pd
import altair as alt
import numpy as np

# enable correct rendering
alt.renderers.enable('default')

# uses intermediate json files to speed things up
alt.data_transformers.enable('json')

#--------------------------------------------------------------------------------------------------

# we're going to do some setup here in anticipation of needing the data in 
# a specific format. We moved it all up here so everything is in one place.

# load the tweets
tweets = pd.read_csv('datasets/tweets.csv')

# we're going to process the data in a couple of ways
# first, we want to know how many emojis are in each tweet so we'll create a new column
# that counts them
tweets['emojis'] = tweets['text'].str.findall(r'[^\w\s.,"@\'?/#!$%\^&\*;:{}=\-_`~()\U0001F1E6-\U0001F1FF]').str.len()

# next, there are a few specific emojis that we care about, we're going to create
# a column for each one and indicate how many times it showed up in the tweet
boxer_emojis = ['☘️','🇮🇪','🍀','💸','🤑','💰','💵','😴','😂','🤣','🥊','👊','👏','🇮🇪','💪','🔥','😭','💰']
for emoji in boxer_emojis:
    # here's a different way to get the counts
    tweets[emoji] = tweets.text.str.count(emoji)

#--------------------------------------------------------------------------------------------------

tweets['datetime'] = pd.to_datetime(tweets['created_at'])
tweets = tweets.set_index('datetime')
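
As a rough illustration of the per-emoji counting in the setup cell, the sketch below applies `str.count` to a toy frame. The sample tweets and the names `sample`, `IRELAND_FLAG`, and `MONEY_BAG` are made up for this example and are not part of the dataset. It also shows that a flag emoji is two code points, which `str.count` matches as a sequence.

```python
import pandas as pd

# Hypothetical sample tweets (not from the FiveThirtyEight data).
IRELAND_FLAG = '\U0001F1EE\U0001F1EA'  # regional indicators I + E
MONEY_BAG = '\U0001F4B0'

sample = pd.DataFrame({
    'text': [
        f'Come on Conor {IRELAND_FLAG}{IRELAND_FLAG}',
        f'Money team {MONEY_BAG}',
        'No emoji here',
    ]
})

# Same approach as the loop over boxer_emojis: one count column per emoji.
for emoji in [IRELAND_FLAG, MONEY_BAG]:
    sample[emoji] = sample['text'].str.count(emoji)

flag_counts = sample[IRELAND_FLAG].tolist()
money_counts = sample[MONEY_BAG].tolist()
print(flag_counts, money_counts)
```

Note that `str.count` interprets its argument as a regular expression, so an emoji sequence containing a regex metacharacter would need `re.escape` first.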

The visualizations below are original: they do not appear in the article, but they offer some contrast to the ways the data could have been presented and add more context to the article.

In [2]:
# This is code written by Nicholas Miller

# Get all hashtags and their counts
# This was used to build the mayteam_hash and mcgteam_hash lists manually
from collections import defaultdict

hashtag_list = defaultdict(int)
all_hashtags = tweets['text'].str.findall(r'#.*?(?=\s|$|#)').to_list()
for sub in all_hashtags:
    for x in sub:
        hashtag_list[x] += 1

hash_df = pd.DataFrame.from_dict(hashtag_list, orient='index')
hash_df.columns=['count']
hash_df.index.rename('hashtag', inplace=True)
hash_df.sort_values(by='count', ascending=False, inplace=True)
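
The hashtag tally above could also be written without an explicit loop. The following is a minimal sketch on made-up tweets (the `sample_text` series is hypothetical), using the same regular expression together with `explode` and `value_counts`; it is an alternative, not the method used in this notebook.

```python
import pandas as pd

# Hypothetical sample tweets (not from the FiveThirtyEight data).
sample_text = pd.Series([
    '#McGregor looks sharp #TMT',
    'All in on #TMT',
    'no hashtags in this one',
])

# Same pattern as the cell above: a '#' followed by anything up to
# whitespace, end of string, or the next '#'.
tags = sample_text.str.findall(r'#.*?(?=\s|$|#)')

# explode() gives one row per hashtag; tweets with no hashtags become NaN.
tag_counts = tags.explode().dropna().value_counts()
print(tag_counts)
```

`value_counts` already returns the tags sorted by frequency, which matches the descending sort applied to `hash_df`.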

In [3]:
# This is code written by Nicholas Miller

# Overview: Preparing our data
#  Step 1 - Add columns that represent if each tweet is an expression of joy or sadness
#  Step 2 - Determine fans based on emoji
#  Step 3 - Determine fans based on hashtags
#  Step 4 - Add columns that represent if each tweet is a fan of one player or the other
#  Step 5 - Build the dataframes

teams4 = tweets.copy()

# Step 1 - Add Joy & Sadness Columns
joy_list = ['🤤','🤣','🤠','🤗','🙃','🙂','😺','😸','😹','😝','😜','😛','🤪',
            '😚','😙','😘','😗','😏','😎','😍','😌','😋','😊','😉','😇','😆',
            '😅','😄','😃','😂','😁','😀','👍','🌝','🌚','👌', '🤘']
sad_list = ['🤧','🙁','😿','😰','😯','😭','😬','😫','😩','😧','😦','😥','😡',
            '😠','😟','😞','😖','😕','😔','😓','😒','👎','🤢','🤕','🤒','😷',
            '🙀','😳','😲','😱','😮','😨', '😴', '🤦', '😬', '😤']
teams4['joy'] = teams4['text'].str.findall(f"({'|'.join(joy_list)})").str.len()
teams4['sad'] = teams4['text'].str.findall(f"({'|'.join(sad_list)})").str.len()

# Step 2 - Get fans by emoji
mcgregor_fans = teams4[teams4['☘️'] + teams4['🇮🇪'] + teams4['🍀'] > 0 ]['screen_name'].to_list()
mayweather_fans = teams4[teams4['💸'] + teams4['🤑'] + teams4['💰'] + teams4['💵'] > 0 ]['screen_name'].to_list()
mcgfan_list = list(set(mcgregor_fans))
mayfan_list = list(set(mayweather_fans))  # this is a quick way to remove duplicates in the list

# Step 3 - Get fans by hashtag
mayteam_hash = ['#Mayweather', '#mayweather', '#FloydMayweather', '#TMT', '#MAYWEATHER',
                '#TeamMayweather', '#MoneyTeam', '#TBE', '#MayWeather', '#TheMoneyTeam',
                '#floydmayweather', '#50-0']
mcgteam_hash = ['#McGregor', '#mcgregor', '#Mcgregor', '#ConorMcGregor', '#TeamMcGregor',
                '#MCGREGOR', '#MacGregor', '#McGregor!']
teams4['hashtags'] = teams4['text'].str.findall(r'#.*?(?=\s|$|#)')
# Update the fan lists
for index,row in teams4.iterrows():
    for h in row['hashtags']:
        if h in mayteam_hash and row['screen_name'] not in mcgfan_list:
            mayfan_list.append(row['screen_name'])
        elif h in mcgteam_hash and row['screen_name'] not in mayfan_list:
            mcgfan_list.append(row['screen_name'])

# Step 4 - Assign fan allegiance
teams4['mcgregor_fan'] = teams4['screen_name'].apply(lambda x: x in mcgfan_list and x not in mayfan_list)
teams4['mayweather_fan'] = teams4['screen_name'].apply(lambda x: x in mayfan_list and x not in mcgfan_list)

# Step 5 - Build our dataframes to be used in the graph
mcfan_pre_df = teams4[teams4['mcgregor_fan'] == True]
# keep only the numeric 'joy' column so text and list columns are not summed
mcfan_pre_df = mcfan_pre_df[['joy']].resample('1s').sum()

mafan_pre_df = teams4[teams4['mayweather_fan'] == True]
mafan_pre_df = mafan_pre_df[['joy']].resample('1s').sum()

mcfan_joy_df = mcfan_pre_df['joy'].rolling('4Min').mean().reset_index()
mcfan_joy_df['team'] = 'McGregor Fans'
mcfan_joy_df = mcfan_joy_df.rename(columns={'joy':'tweet_count'})

mafan_joy_df = mafan_pre_df['joy'].rolling('4Min').mean().reset_index()
mafan_joy_df['team'] = 'Mayweather Fans'
mafan_joy_df = mafan_joy_df.rename(columns={'joy':'tweet_count'})

joy_df = pd.concat([mcfan_joy_df, mafan_joy_df])
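
To make Step 5 more concrete, here is a small sketch of what resampling to one-second bins followed by a time-based rolling mean does. The timestamps and counts are invented, and the window is shortened to three seconds (the notebook uses four minutes) so the numbers can be checked by hand.

```python
import pandas as pd

# Hypothetical per-tweet joy counts; two tweets share the 00:00:02 timestamp.
stamps = pd.to_datetime([
    '2017-08-27 00:00:00',
    '2017-08-27 00:00:02',
    '2017-08-27 00:00:02',
    '2017-08-27 00:00:05',
])
joy = pd.Series([2, 4, 2, 1], index=stamps)

# One row per second; seconds with no tweets get a sum of 0.
per_second = joy.resample('1s').sum()

# Time-based window: each value averages the rows in the preceding 3 seconds
# (the notebook uses a 4-minute window instead).
smoothed = per_second.rolling('3s').mean()
print(pd.DataFrame({'per_second': per_second, 'smoothed': smoothed}))
```

Because empty seconds are filled with zeros before averaging, quiet stretches pull the rolling mean down rather than being skipped.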

In [4]:
# This is code written by Nicholas Miller

x_grid= ['2017-08-27 00:05:00',
         '2017-08-27 00:15:00',
         '2017-08-27 00:25:00',
         '2017-08-27 00:35:00',
         '2017-08-27 00:45:00',
         '2017-08-27 00:55:00',
         '2017-08-27 01:05:00']

line_df = pd.DataFrame({
    'x': ['2017-08-27 00:15:00', '2017-08-27 00:15:00',
          '2017-08-27 00:30:00', '2017-08-27 00:30:00',
          '2017-08-27 00:55:00', '2017-08-27 00:55:00'],
    'y': [0.5, 0.15, 0.7, 0.4, 0.3, 0.15],
    'class': ['A', 'A', 'B', 'B', 'C', 'C']
})

ant_df = pd.DataFrame({
    'x': ['2017-08-27 00:07:00',
          '2017-08-27 00:20:00', '2017-08-27 00:20:00',
          '2017-08-27 00:34:30', '2017-08-27 00:34:30', '2017-08-27 00:34:30',
          '2017-08-27 00:49:00'],
    #'y': [0.1, 0.2, 0.125, 0.05],
    'y': [0.1, 0.35, 0.275, 0.95, 0.875, 0.8, 0.1],
    'note': ['Fight begins', 'McGregor does OK', 'in the early rounds', 
             'Mayweather', 'takes control in', 'middle rounds', 'Fight ends']
})
#--------------------------------------------------------------
base = alt.Chart(joy_df).transform_joinaggregate(
    TotalTweets='sum(tweet_count)',
    groupby=['datetime']
).transform_calculate(
    PercentOfTotal="datum.tweet_count / datum.TotalTweets"
).mark_area(
    line={'color': '#8b2019', 'size': 3},
#     color=alt.Gradient(
#         gradient='linear',
#         stops=[alt.GradientStop(color='white', offset=0),
#                alt.GradientStop(color='#f6392b', offset=1)],
#         x1=1,
#         x2=1,
#         y1=1,
#         y2=0
#     )
).encode(
    x=alt.X('datetime:T',
            axis=alt.Axis(grid=True,
                          gridColor='#d6d6d6',
                          values=x_grid,
                          tickSize=10,
                          tickColor='#d6d6d6',
                          labelColor='#b9b9b9',
                          title=None,
                         )
           ),
    y=alt.Y('PercentOfTotal:Q',
            axis=alt.Axis(grid=True,
                          domainColor='#d6d6d6',
                          gridColor='#d6d6d6',
                          #tickMinStep=2,     # This makes the y-axis grid go up by 2's
                          values=[0, 0.25, 0.5, 0.75, 1],
                          tickSize=20,
                          tickColor='#d6d6d6',
                          labelColor='#b9b9b9',
                          title='Four-minute rolling average',
                          titlePadding=15,
                          titleFontSize=15,
                          format='.0%',
                         )
           ),
    color=alt.Color('team',
                    scale=alt.Scale(domain=['McGregor Fans', 'Mayweather Fans'],  # This sets the colors of each line
                                    range=['#4aa84a', '#fdcc28']
                                   ),
                    legend=alt.Legend(title=None,
                                      orient='top',  # This puts the legend on top
                                      labelFontSize=18,
                                      symbolSize=500,
                                     )
                   ),
    order=alt.Order('team', sort='descending')
)

lines = alt.Chart(line_df).mark_line(color='black', size=1).encode(
    x='x:T',
    y='y:Q',
    detail='class'
)

text = alt.Chart(ant_df).mark_text(
    align='left',
    fontSize=14,
    #fontStyle='bold',
    #stroke='#F0F0F0',
    #strokeWidth=1,
    #strokeCap='round',
).encode(
    x='x:T',
    y='y:Q',
    text=alt.Text('note:N')
)

(base + lines + text).configure(
    background='#F0F0F0',
    padding=15                    # Add some padding around the edge
).properties(
    # add a title
    title={"text": ["Irish pride vs. The Money Team","Which team was more 😀😂👍🤪 positive?"],
           "subtitle": ["Four-minute rolling average of the number of uses of positive emoji in",
                        "sampled tweets during the Mayweather-McGregor fight"],
           "fontSize":22,
           "subtitleFontSize":17,
           "anchor":"start",      # Make the text left justified
           "offset":35            # add some padding between title and below graph
          },
    #width=450
).configure_view(
    strokeOpacity=0               # Remove the boundary box
)

![Example 4](images/example4.png)
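
The `transform_joinaggregate` and `transform_calculate` steps in the chart turn each team's rolling average into a share of the combined total at that timestamp. A rough pandas equivalent is sketched below on invented values (the `toy` frame is hypothetical); it is only meant to show the arithmetic, not to replace the Altair transforms.

```python
import pandas as pd

# Hypothetical rolling averages for two timestamps (not real results).
toy = pd.DataFrame({
    'datetime': pd.to_datetime(['2017-08-27 00:10:00'] * 2
                               + ['2017-08-27 00:20:00'] * 2),
    'team': ['McGregor Fans', 'Mayweather Fans'] * 2,
    'tweet_count': [3.0, 1.0, 2.0, 2.0],
})

# joinaggregate: attach the per-timestamp total to every row.
toy['TotalTweets'] = toy.groupby('datetime')['tweet_count'].transform('sum')

# calculate: each team's share of that total.
toy['PercentOfTotal'] = toy['tweet_count'] / toy['TotalTweets']
print(toy[['datetime', 'team', 'PercentOfTotal']])
```

In this pandas version, a timestamp where both teams have a value of zero yields `NaN`, since the share is undefined there.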

**Justification**

In this graph I wanted to see whether I could reveal the joy each team of supporters expressed throughout the fight by identifying which team each individual supported and comparing the usage of positive emoji.  I ran into a problem when using only emoji to identify team support.  It turned out that the number of positive emoji in the dataset was much lower than I expected, resulting in a visual with many plateaus and dead spaces (the number of tweets assigned to a team was very small).  To address this, I incorporated hashtags, since many supporters of a boxer did not use those emoji.  Although this is an article on emoji and not hashtags, team loyalty is not the main focus of this graph (positive emoji are), so I felt it was an acceptable fix to my problem.  In the end, 19% of the tweets were assigned to McGregor fans and 18% to Mayweather fans (see the calculation in the last cell).  I then compared usage of positive emoji across the two teams of supporters.  I considered the following as "positive":

'🤤','🤣','🤠','🤗','🙃','🙂','😺','😸','😹','😝','😜','😛','🤪','😚','😙','😘','😗','😏','😎',
'😍','😌','😋','😊','😉','😇','😆','😅','😄','😃','😂','😁','😀','👍','🌝','🌚','👌', '🤘'
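
As a rough illustration of how a list like this becomes a per-tweet count, the sketch below joins a short subset into a single regex alternation and counts the matches. The three-emoji `positive` list and the sample tweets are hypothetical; the `re.escape` call is an extra precaution that the notebook's own code does not use.

```python
import re
import pandas as pd

# A short, hypothetical subset of the positive list.
positive = ['\U0001F602', '\U0001F44D', '\U0001F600']  # 😂 👍 😀
CRYING = '\U0001F62D'  # 😭, not in the positive list

sample_text = pd.Series([
    f'What a round {positive[0]}{positive[0]} {positive[1]}',
    f'Heartbroken {CRYING}',
    'No emoji at all',
])

# Join the emoji into one alternation; re.escape guards against any
# entry that happens to contain a regex metacharacter.
pattern = '|'.join(re.escape(e) for e in positive)
joy_counts = sample_text.str.findall(pattern).str.len().tolist()
print(joy_counts)
```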

The end result is an insightful visual that reveals some interesting dynamics that played out during the time period.  The fight started with about a 50/50 split of positive emoji between the two teams.  McGregor supporters maintain a higher ratio early on, which starts to wane in the middle rounds when Mayweather takes control.  There's a late burst from McGregor fans before a steep decline towards the end of the match, where Mayweather fans show their first noticeable advantage in positive ratio.  After the fight, McGregor fans remain positive but noticeably less so than before.

The number of tweets is not visible, so it's probably not as expressive as the graphs in the article, but the content is very effective in comparing positive emotion through emoji between the two teams.  This would complement the article and give readers another angle from which to visualize the data.

In [5]:
# This is code written by Nicholas Miller
print('McGregor Fan Tweets: {:.2%}'.format(len(teams4[teams4['mcgregor_fan'] == True]) / len(teams4)))
print('Mayweather Fan Tweets: {:.2%}'.format(len(teams4[teams4['mayweather_fan'] == True]) / len(teams4)))

McGregor Fan Tweets: 19.44%
Mayweather Fan Tweets: 17.82%
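
The same percentages can be obtained a little more directly, since the mean of a boolean column is the fraction of `True` values. The sketch below uses a made-up five-row frame (`flags` is hypothetical) rather than the real `teams4` data.

```python
import pandas as pd

# Hypothetical fan flags for five tweets (not the real data).
flags = pd.DataFrame({'mcgregor_fan': [True, False, False, True, False]})

# The mean of a boolean column is the fraction of True values,
# equivalent to len(df[df[col] == True]) / len(df).
share = flags['mcgregor_fan'].mean()
label = 'McGregor Fan Tweets: {:.2%}'.format(share)
print(label)
```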
