# Irish Media Twitter Activity

In [1]:
import json
import glob
import pandas as pd
import altair as alt
import pprint

The data was to comprise tweets taken from the Twitter accounts of the five biggest media organisations in Ireland:
* RTÉ News
* ~~The Irish Times~~
* The Irish Independent
* The Irish Examiner
* The Journal (Dot IE)


### Missing Irish Times Data

This work was done in the last week in May, 2022. At the start of the week, there was no problem accessing the Irish Times tweets. At the end of the week, when testing had finished and I looked at a bigger tweet selection, all I got back from the Irish Times was `[{"meta": {"result_count": 0}}]`. I have no idea why, and had no option but to proceed without them.
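
An empty result set like this one can be caught before any analysis starts. What follows is a minimal sketch, assuming each saved file is a list of response pages carrying a `meta.result_count` field, as in the response quoted above. `total_result_count` is a hypothetical helper and not part of the original notebook.

```python
import json


def total_result_count(pages):
    """Sum `meta.result_count` over a list of response pages (hypothetical helper)."""
    return sum(page.get("meta", {}).get("result_count", 0) for page in pages)


# The empty response quoted above, parsed from its JSON text.
empty_pages = json.loads('[{"meta": {"result_count": 0}}]')
has_tweets = total_result_count(empty_pages) > 0
print(has_tweets)  # False
```

A check like this only reports that nothing came back; it does not explain why.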

## The Individual Accounts

In [2]:
files = glob.glob('tweets/newspapers/newspaper_account_details/*.json')
media_details = []
for f in files:
    with open(f) as g:
        temp = json.load(g)
        media_details.append(temp)

A look at how the information is organised.

In [3]:
pprint.pprint(media_details[0]['data'])

{'created_at': '2010-05-31T13:08:52.000Z',
 'description': 'Providing open access to valuable journalism | Support our '
                'work so we can keep questioning, debunking, explaining and '
                'informing https://t.co/yrQeirmt7y',
 'id': '150246405',
 'location': 'Ireland',
 'name': 'TheJournal.ie',
 'public_metrics': {'followers_count': 706618,
                    'following_count': 746,
                    'listed_count': 2439,
                    'tweet_count': 314417},
 'url': 'http://t.co/nM175qvbzR',
 'username': 'thejournal_ie',
 'verified': True}


In [4]:
holder = []
for detail in media_details:
    temp = {"created_at": detail['data']['created_at'],
            "username": detail['data']['username'],
            "name": detail['data']['name'],
            "id": detail['data']['id'],
            "followers": detail['data']['public_metrics']['followers_count'],
            "following": detail['data']['public_metrics']['following_count'],
            "listed": detail['data']['public_metrics']['listed_count'],
            "tweets": detail['data']['public_metrics']['tweet_count'],
            "location": detail['data']['location'],
            "verified": detail['data']['verified']}
    holder.append(temp)
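
The nested `public_metrics` dictionary can also be flattened with `pd.json_normalize`, which joins nested keys with a dot. This is only a sketch of an alternative to the loop above, run on a single trimmed record; `sample_details` and `flat` are illustrative names.

```python
import pandas as pd

# One record shaped like `media_details[0]`, trimmed to a few fields.
sample_details = [
    {"data": {"id": "150246405",
              "username": "thejournal_ie",
              "public_metrics": {"followers_count": 706618,
                                 "tweet_count": 314417}}}
]

# Nested keys become dotted column names, e.g. "public_metrics.followers_count".
flat = pd.json_normalize([d["data"] for d in sample_details])
flat = flat.rename(columns={"public_metrics.followers_count": "followers",
                            "public_metrics.tweet_count": "tweets"})
print(flat.columns.tolist())
```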

In [5]:
media_df = pd.DataFrame(holder)
media_df.index = media_df['id']
del(media_df['id'])
media_df['created_at'] = pd.to_datetime(media_df['created_at'])
media_df

id,created_at,username,name,followers,following,listed,tweets,location,verified
150246405,2010-05-31 13:08:52+00:00,thejournal_ie,TheJournal.ie,706618,746,2439,314417,Ireland,True
15084853,2008-06-11 13:54:36+00:00,IrishTimes,The Irish Times,670120,156,4014,680596,Ireland,True
8973062,2007-09-19 12:52:21+00:00,rtenews,RTÉ News,1103818,308,4041,206940,Ireland,True
19903360,2009-02-02 12:01:29+00:00,irishexaminer,Irish Examiner,241613,602,1750,443308,Ireland,True
91334232,2009-11-20 12:55:30+00:00,Independent_ie,Independent.ie,712402,134,2650,457011,Dublin,True


Add a followers per tweet metric, and sort by followers descending.

In [6]:
media_df['followers_per_tweet'] = media_df['followers'] / media_df['tweets']
media_df.sort_values('followers', ascending=False)

id,created_at,username,name,followers,following,listed,tweets,location,verified,followers_per_tweet
8973062,2007-09-19 12:52:21+00:00,rtenews,RTÉ News,1103818,308,4041,206940,Ireland,True,5.334
91334232,2009-11-20 12:55:30+00:00,Independent_ie,Independent.ie,712402,134,2650,457011,Dublin,True,1.558829
150246405,2010-05-31 13:08:52+00:00,thejournal_ie,TheJournal.ie,706618,746,2439,314417,Ireland,True,2.247391
15084853,2008-06-11 13:54:36+00:00,IrishTimes,The Irish Times,670120,156,4014,680596,Ireland,True,0.984608
19903360,2009-02-02 12:01:29+00:00,irishexaminer,Irish Examiner,241613,602,1750,443308,Ireland,True,0.545023


## Charts for Followers per Account, Followers per Tweet per Account

In [7]:
bar = alt.Chart(media_df).mark_bar().encode(
    x='name',
    y='followers',
    tooltip=['name', 'followers']
).properties(title='Followers per Account', width=300)
bar2 = alt.Chart(media_df).mark_bar(color='crimson').encode(
    x='name',
    y='followers_per_tweet',
    tooltip=['name', 'followers_per_tweet']
).properties(title='Followers per Tweet per Account', width=300)
bar | bar2

Examine correlation, if any, in the numeric data.

In [8]:
media_df.corr()

,followers,following,listed,tweets,verified,followers_per_tweet
followers,1.0,-0.371797,0.769819,-0.482069,,0.882506
following,-0.371797,1.0,-0.629878,-0.440747,,-0.068733
listed,0.769819,-0.629878,1.0,0.082764,,0.557065
tweets,-0.482069,-0.440747,0.082764,1.0,,-0.778783
verified,,,,,,
followers_per_tweet,0.882506,-0.068733,0.557065,-0.778783,,1.0


## Observations

* RTÉ is the big beast of Irish media Twitter, just as it’s the big beast of Irish media in general. It has the most followers and the longest-established account.
* The Pearson coefficient of 0.88 is between follower count and followers per tweet, which is unsurprising because the second is derived from the first. Follower count and number of tweets are weakly negatively correlated (-0.48), and with only five accounts it is not reasonable to suggest that a relationship exists between tweets and followers. Closer examination of the growth of the accounts may be more illustrative.
* The Journal follows the most accounts, and the Independent the fewest.
* The Independent lists its location as Dublin, while the other four list theirs as Ireland.
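
The two coefficients involving follower count can be rechecked from the figures in the account table. This is a small sketch that copies those figures by hand rather than loading the JSON files; `accounts`, `r_tweets` and `r_ratio` are names chosen for the example.

```python
import pandas as pd

# Follower and tweet counts copied from the account table above.
accounts = pd.DataFrame(
    {"followers": [706618, 670120, 1103818, 241613, 712402],
     "tweets": [314417, 680596, 206940, 443308, 457011]},
    index=["thejournal_ie", "IrishTimes", "rtenews",
           "irishexaminer", "Independent_ie"],
)
accounts["followers_per_tweet"] = accounts["followers"] / accounts["tweets"]

# Series.corr uses the Pearson method by default.
r_tweets = accounts["followers"].corr(accounts["tweets"])
r_ratio = accounts["followers"].corr(accounts["followers_per_tweet"])
print(round(r_tweets, 2), round(r_ratio, 2))  # -0.48 0.88
```

Five points are far too few to read much into either number.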

# The Tweets

In [9]:
with open('tweets/newspapers/recent_tweets/independent_ie.json') as f:
    indo = json.load(f)
type(indo)

list

In [10]:
pprint.pprint(indo[0]['data'][0])

{'author_id': '91334232',
 'context_annotations': [{'domain': {'description': 'Entity Service related '
                                                    'Events domain',
                                     'id': '29',
                                     'name': 'Events [Entity Service]'},
                          'entity': {'id': '1511289118348640256',
                                     'name': 'Eurovision 2022'}},
                         {'domain': {'description': 'Award shows, like the '
                                                    'Oscars, Grammys, or VMAs',
                                     'id': '118',
                                     'name': 'Award Show'},
                          'entity': {'id': '1511289118348640256',
                                     'name': 'Eurovision 2022'}},
                         {'domain': {'description': 'Entity Service related '
                                                    'Events domain',
                           

## Assembling the Tweet Data

In [11]:
files = glob.glob("tweets/newspapers/recent_tweets/*.json")
holder = []
for f in files:
    with open(f) as g:
        tweets = json.load(g)
        for item in tweets:
            for t in item['data']:
                temp = {"id": t["id"],
                        "account": f.split("/")[-1][:-5],
                        "created_at": t["created_at"],
                        "likes": t['public_metrics']["like_count"],
                        "quotes": t['public_metrics']["quote_count"],
                        "replies": t['public_metrics']["reply_count"],
                        "retweets": t['public_metrics']["retweet_count"],
                        "reply_settings": t['reply_settings'],
                        "source": t['source'],
                        "text": t['text']}
                holder.append(temp)
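
The account name is taken from the file name by splitting on `/` and dropping the last five characters (`.json`). As a hedged alternative, `pathlib.Path.stem` gives the same result without depending on the separator or the extension length. The path below is the one opened earlier in the notebook.

```python
from pathlib import Path

path = "tweets/newspapers/recent_tweets/independent_ie.json"

by_slicing = path.split("/")[-1][:-5]  # approach used in the cell above
by_stem = Path(path).stem              # same result, no hard-coded lengths

print(by_slicing, by_stem)  # independent_ie independent_ie
```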

In [12]:
tweets = pd.DataFrame(holder)
tweets['created_at'] = pd.to_datetime(tweets['created_at'])
tweets.index = tweets['id']
del(tweets['id'])
print(tweets.info())

<class 'pandas.core.frame.DataFrame'>
Index: 1995 entries, 1523076460780556288 to 1520623310442143744
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype              
---  ------          --------------  -----              
 0   account         1995 non-null   object             
 1   created_at      1995 non-null   datetime64[ns, UTC]
 2   likes           1995 non-null   int64              
 3   quotes          1995 non-null   int64              
 4   replies         1995 non-null   int64              
 5   retweets        1995 non-null   int64              
 6   reply_settings  1995 non-null   object             
 7   source          1995 non-null   object             
 8   text            1995 non-null   object             
dtypes: datetime64[ns, UTC](1), int64(4), object(4)
memory usage: 155.9+ KB
None


In [13]:
tweets.head()

id,account,created_at,likes,quotes,replies,retweets,reply_settings,source,text
1523076460780556288,thejournal_ie,2022-05-07 23:05:00+00:00,13,0,8,6,everyone,TweetDeck,"Drugs, prostitution and extortion: The Longfor..."
1523022857596403712,thejournal_ie,2022-05-07 19:32:00+00:00,5,0,1,0,everyone,TweetDeck,What are the issues surrounding the National M...
1523014808265797632,thejournal_ie,2022-05-07 19:00:01+00:00,2,1,2,0,everyone,TweetDeck,Quiz: How well do you know these Star Wars cha...
1522993282191478784,thejournal_ie,2022-05-07 17:34:28+00:00,3,1,0,0,everyone,Twitter Web App,SDLP deputy leader Nichola Mallon has lost her...
1522991507963203584,thejournal_ie,2022-05-07 17:27:25+00:00,0,0,0,2,everyone,TweetDeck,RT @rugby_ie: It was a cruel ending for Munste...


## Tweets per Account

In [14]:
tpa = pd.DataFrame(tweets.account.value_counts())
tpa.rename(columns={"account":"tweets"}, inplace=True)
tpa

account,tweets
independent_ie,623
irishexaminer,622
rtenews,475
thejournal_ie,275


## Sources

In [15]:
pd.crosstab(tweets.source, tweets.account)

source,independent_ie,irishexaminer,rtenews,thejournal_ie
Buffer,593,0,0,0
Hootsuite Inc.,24,0,0,0
Sendible,0,18,0,0
TweetDeck,0,79,0,261
Twitter Media Studio,0,0,43,0
Twitter Media Studio - LiveCut,0,0,46,0
Twitter Web App,6,70,386,13
Twitter for iPhone,0,3,0,1
dlvr.it,0,452,0,0
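
Because the accounts tweet at different volumes, shares can be easier to compare than raw counts. A minimal sketch using the `normalize="columns"` option of `pd.crosstab` on toy rows; the toy values are illustrative and are not the real data.

```python
import pandas as pd

# Toy rows standing in for `tweets`; the values are illustrative only.
toy = pd.DataFrame({
    "account": ["a", "a", "a", "a", "b", "b"],
    "source": ["Buffer", "Buffer", "Buffer", "Twitter Web App",
               "TweetDeck", "Twitter Web App"],
})

# Each column (account) now sums to 1.
shares = pd.crosstab(toy["source"], toy["account"], normalize="columns")
print(shares.round(2))
```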


In [16]:
def tweet_finder(data, key, value, account):
    sieve = data[data[key]==value]
    holder = []
    with open(f'tweets/newspapers/recent_tweets/{account}.json') as f:
        temp = json.load(f)
    for item in temp:
        for t in item['data']:
            if t['id'] in sieve.index:
                holder.append(t)
    print(f'The result set has {len(holder)} tweets.')
    return holder

In [17]:
results = tweet_finder(tweets, 'source', 'Twitter for iPhone', 'irishexaminer')

The result set has 3 tweets.


In [18]:
for r in results:
    print(r['created_at'])
    print(r['text'])
    print("-"*80)

2022-05-07T11:01:29.000Z
RT @ExaminerSport: Larry Ryan: Kammy bantered responsibly and he made us smile https://t.co/rjPsnSaoRK
--------------------------------------------------------------------------------
2022-05-07T11:01:22.000Z
RT @ExaminerSport: What really goes on in a huddle of hurling selectors? https://t.co/Subr7ozzqS
--------------------------------------------------------------------------------
2022-05-07T11:01:08.000Z
RT @ExaminerSport: Patrick Kelly: A positive performance is a pre-requisite from Cork https://t.co/uvKT0ZHYxv
--------------------------------------------------------------------------------


## Reply Settings

In [19]:
pd.crosstab(tweets.reply_settings, tweets.account)

reply_settings,independent_ie,irishexaminer,rtenews,thejournal_ie
everyone,623,622,475,271
mentionedUsers,0,0,0,4


### Replies Limited to Mentioned Users

`thejournal_ie` limits the reply settings for four of its tweets to "mentionedUsers". Why would this be? To find out, we pull those four tweets with `tweet_finder`, which slices `tweets` on the `reply_settings` column.
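
The slice itself is a plain boolean mask on the dataframe. A minimal sketch on a toy frame with the same column names; the rows are made up for illustration and `mu` is just a name for the slice.

```python
import pandas as pd

# Toy frame with the same columns as `tweets`; the rows are illustrative only.
toy = pd.DataFrame(
    {"account": ["thejournal_ie", "thejournal_ie", "rtenews"],
     "reply_settings": ["everyone", "mentionedUsers", "everyone"],
     "source": ["TweetDeck", "Twitter Web App", "Twitter Web App"]},
    index=pd.Index(["1", "2", "3"], name="id"),
)

# Keep only the rows whose reply setting is "mentionedUsers".
mu = toy[toy["reply_settings"] == "mentionedUsers"]
print(mu)
```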

In [20]:
results = tweet_finder(tweets, 'reply_settings', 'mentionedUsers', 'thejournal_ie')

The result set has 4 tweets.


In [21]:
for r in results:
    print(r['created_at'])
    print(r['text'])
    print("-"*80)

2022-05-05T16:42:23.000Z
An 84-year-old man who admits sexually assaulting his young granddaughter has been jailed for 21 months

https://t.co/FzTGvn3L7f
--------------------------------------------------------------------------------
2022-05-05T14:01:18.000Z
A store assistant was sexually harassed when a supermarket manager exposed himself to her in the workplace and sent her ‘dirty pictures’, the Workplace Relations Commission has determined https://t.co/7fmueRZRIZ
--------------------------------------------------------------------------------
2022-05-04T16:13:30.000Z
The UK has included an Irish journalist in its latest round of financial sanctions against Russia https://t.co/jUHaLePH5o
--------------------------------------------------------------------------------
2022-05-03T14:17:04.000Z
A witness in the trial of a woman accused of murdering a two-year-old girl has said in evidence that she heard the defendant say “I am telling, I am telling” on the morning the toddler was found

These are sensitive stories, but they are not flagged as such in `entities`. It may be an in-the-moment authorial decision - all four tweets are from the `Twitter Web App` source, which suggests they are not automated.

## Retweets, Quote Tweets, Likes and Replies

In [22]:
grouper = tweets.groupby('account')
holder = []
for a, b in grouper:
    temp = b[['likes', 'quotes', 'replies', 'retweets']].copy()
    temp['account'] = a
    holder.append(temp)

In [23]:
metrics = pd.concat(holder)
metrics.describe()

,likes,quotes,replies,retweets
count,1995.0,1995.0,1995.0,1995.0
mean,15.224561,1.32381,6.062155,3.549875
std,114.336811,11.69087,36.323855,20.348146
min,0.0,0.0,0.0,0.0
25%,1.0,0.0,0.0,0.0
50%,3.0,0.0,1.0,1.0
75%,9.0,1.0,4.0,3.0
max,3825.0,403.0,1214.0,786.0


In [24]:
grouper = tweets.groupby('account')
holder = []
for a, b in grouper:
    temp = pd.DataFrame(b[['likes', 'quotes', 'replies', 'retweets']].unstack())
    temp.reset_index(inplace=True)
    temp['account'] = a
    holder.append(temp)

In [25]:
chart_metrics = pd.concat(holder)

chart_metrics.rename(columns={0:"value", "level_0":"metric"}, inplace=True)
del(chart_metrics['id'])
chart_metrics.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7980 entries, 0 to 1099
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   metric   7980 non-null   object
 1   value    7980 non-null   int64 
 2   account  7980 non-null   object
dtypes: int64(1), object(2)
memory usage: 249.4+ KB
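
The same long-format table (one row per tweet per metric) can probably be built in one step with `DataFrame.melt`. This is a sketch of an alternative to the `unstack` loop, shown on a toy frame; the numbers are illustrative only.

```python
import pandas as pd

# Toy frame shaped like `tweets`, with two tweets from one account.
toy = pd.DataFrame(
    {"account": ["rtenews", "rtenews"],
     "likes": [3, 9], "quotes": [0, 1], "replies": [1, 4], "retweets": [1, 3]},
    index=pd.Index(["1", "2"], name="id"),
)

# One row per (tweet, metric); the tweet id index is dropped by default.
long_form = toy.melt(id_vars="account",
                     value_vars=["likes", "quotes", "replies", "retweets"],
                     var_name="metric", value_name="value")
print(long_form.shape)  # (8, 3)
```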


In [26]:
alt.data_transformers.disable_max_rows()

DataTransformerRegistry.enable('default')

In [27]:
metrics_boxplot = alt.Chart(chart_metrics).mark_boxplot().encode(x='account:N',
                                                                 y='value:Q',
                                                                 column='metric:N',
                                                                tooltip=["account",
                                                                         "metric",
                                                                         "value"]).properties(width=200,
                                                                                             title="Public Metrics")
metrics_boxplot

### Most Likes

In [28]:
tweets[tweets.likes==tweets.likes.max()]

id,account,created_at,likes,quotes,replies,retweets,reply_settings,source,text
1522223815044050944,rtenews,2022-05-05 14:36:53+00:00,3825,403,824,786,everyone,Twitter Media Studio,President Michael D Higgins has described the ...


In [29]:
print(tweets.loc["1522223815044050944"]['text'])

President Michael D Higgins has described the purchase of Twitter by tech billionaire Elon Musk as a 'manifestation of an incredible and dangerous narcissism' | Read more: https://t.co/UcW55R2V7G https://t.co/5b9CZCjL3E


In [30]:
tweets[(tweets.account=='independent_ie')&(tweets.likes==2603)]

id,account,created_at,likes,quotes,replies,retweets,reply_settings,source,text
1521055649748160512,independent_ie,2022-05-02 09:15:01+00:00,2603,83,341,307,everyone,Buffer,A Catholic couple say they will go to jail rat...


In [31]:
print(tweets.loc["1521055649748160512"]['text'])

A Catholic couple say they will go to jail rather than pay a €300 fine for travelling 70km to attend Mass during lockdown https://t.co/Msgt6bUEUL
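
The lookups above hard-code the like count found by hand. A hedged alternative is `groupby(...).idxmax()`, which returns the index label of the top tweet for each account. The toy frame below reuses the two top like counts from the output above; the other values and the ids are made up.

```python
import pandas as pd

# 3825 and 2603 are the top like counts shown above; the rest is illustrative.
toy = pd.DataFrame(
    {"account": ["rtenews", "rtenews", "independent_ie", "independent_ie"],
     "likes": [3825, 12, 2603, 40]},
    index=pd.Index(["t1", "t2", "t3", "t4"], name="id"),
)

# Index label of the most-liked tweet in each account, then the rows themselves.
top_ids = toy.groupby("account")["likes"].idxmax()
top_tweets = toy.loc[top_ids]
print(top_tweets)
```

Swapping `likes` for `replies` gives the most-replied tweet per account in the same way.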


### Most Replies

In [32]:
tweets[(tweets.account=='independent_ie')&(tweets.replies==1214)]

id,account,created_at,likes,quotes,replies,retweets,reply_settings,source,text
1522224352930013185,independent_ie,2022-05-05 14:39:01+00:00,1513,297,1214,31,everyone,Buffer,‘We are pregnant!’- Brian Dowling and partner ...


In [33]:
print(tweets.loc["1522224352930013185"]['text'])

‘We are pregnant!’- Brian Dowling and partner Arthur Gourounlian announce they’re expecting their first child. https://t.co/RV6q7AOEw6


In [34]:
tweets[(tweets.account=='rtenews')&(tweets.replies==824)]

id,account,created_at,likes,quotes,replies,retweets,reply_settings,source,text
1522223815044050944,rtenews,2022-05-05 14:36:53+00:00,3825,403,824,786,everyone,Twitter Media Studio,President Michael D Higgins has described the ...


In [35]:
print(tweets.loc["1522223815044050944"]['text'])

President Michael D Higgins has described the purchase of Twitter by tech billionaire Elon Musk as a 'manifestation of an incredible and dangerous narcissism' | Read more: https://t.co/UcW55R2V7G https://t.co/5b9CZCjL3E


## Tweet Dates and Times

### Starts and Finishes

### Hourly

In [36]:
date_range = pd.date_range(start="2022-05-01 00:00:00+00:00", end=tweets.created_at.max(), freq='H')
len(date_range)

168

In [37]:
holder = []
grouper = tweets.groupby('account')
for a, b in grouper:
    temp = b[['created_at', 'likes', 'quotes', 'replies', 'retweets']].copy()
    temp.index=temp['created_at']
    del(temp['created_at'])
    temp = temp.resample('H').sum()
    temp = temp.reindex(date_range)
    temp.fillna(0, inplace=True)
    temp['account'] = a
    temp[['likes', 'quotes', 'replies', 'retweets']] = temp[['likes', 'quotes', 'replies', 'retweets']].cumsum() 
    holder.append(temp)

In [38]:
time_df = pd.concat(holder)
time_df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 672 entries, 2022-05-01 00:00:00+00:00 to 2022-05-07 23:00:00+00:00
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   likes     672 non-null    float64
 1   quotes    672 non-null    float64
 2   replies   672 non-null    float64
 3   retweets  672 non-null    float64
 4   account   672 non-null    object 
dtypes: float64(4), object(1)
memory usage: 31.5+ KB
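
The steps in the loop (hourly resample, reindex onto the full range, fill with zeros, cumulative sum) are easier to follow on a toy series. This is a sketch rather than the notebook's own code: it passes a `pd.Timedelta` instead of the `'H'` alias, which should behave the same way, and the like counts are made up.

```python
import pandas as pd

hour = pd.Timedelta(hours=1)

# The full week of hourly slots, as in the notebook.
week = pd.date_range("2022-05-01 00:00", "2022-05-07 23:05", freq=hour, tz="UTC")
print(len(week))  # 168, i.e. 7 days x 24 hours

# Toy like counts for one account; the values are illustrative only.
toy = pd.DataFrame(
    {"likes": [2, 3, 5]},
    index=pd.to_datetime(
        ["2022-05-01 00:10", "2022-05-01 00:50", "2022-05-01 03:05"], utc=True),
)

slots = week[:5]                          # 00:00 to 04:00 on 1 May
hourly = toy.resample(hour).sum()         # hours with no tweets sum to 0
hourly = hourly.reindex(slots).fillna(0)  # pad hours after the last tweet
running = hourly["likes"].cumsum()
print(running.tolist())  # [5.0, 5.0, 5.0, 10.0, 10.0]
```

The reindex is what turns the columns into floats: the padded hours are briefly `NaN` before being filled.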


In [39]:
grouper = time_df.groupby('account')
holder = []
for a, b in grouper:
    temp = pd.DataFrame(b.unstack())
    temp.reset_index(inplace=True)
    temp.columns = ["metric", "created_at", "value"]
    temp = temp[temp['metric'] != "account"]
    temp['account'] = a
    holder.append(temp)
    
chart_df = pd.concat(holder)
chart_df['value'] = chart_df['value'].astype(int)
chart_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2688 entries, 0 to 671
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype              
---  ------      --------------  -----              
 0   metric      2688 non-null   object             
 1   created_at  2688 non-null   datetime64[ns, UTC]
 2   value       2688 non-null   int64              
 3   account     2688 non-null   object             
dtypes: datetime64[ns, UTC](1), int64(1), object(2)
memory usage: 105.0+ KB


In [40]:
alt.Chart(chart_df).mark_line().encode(x='created_at:T',
                                       y='value:Q',
                                       color='account',
                                       column='metric',
                                      tooltip=["created_at",
                                               "value",
                                               "account"]).interactive()

## Tweet Times

In [41]:
tweets.head()

id,account,created_at,likes,quotes,replies,retweets,reply_settings,source,text
1523076460780556288,thejournal_ie,2022-05-07 23:05:00+00:00,13,0,8,6,everyone,TweetDeck,"Drugs, prostitution and extortion: The Longfor..."
1523022857596403712,thejournal_ie,2022-05-07 19:32:00+00:00,5,0,1,0,everyone,TweetDeck,What are the issues surrounding the National M...
1523014808265797632,thejournal_ie,2022-05-07 19:00:01+00:00,2,1,2,0,everyone,TweetDeck,Quiz: How well do you know these Star Wars cha...
1522993282191478784,thejournal_ie,2022-05-07 17:34:28+00:00,3,1,0,0,everyone,Twitter Web App,SDLP deputy leader Nichola Mallon has lost her...
1522991507963203584,thejournal_ie,2022-05-07 17:27:25+00:00,0,0,0,2,everyone,TweetDeck,RT @rugby_ie: It was a cruel ending for Munste...


In [42]:
temp = tweets.copy()
temp['hour'] = temp.created_at.apply(lambda x: x.strftime('%H:00'))
holder = []
grouper = temp.groupby(['account', 'hour'])
for a, b in grouper:
    holder.append([a[0], a[1], b.shape[0]])

In [43]:
tweet_times = pd.DataFrame(holder, columns=["account", "time", "tweets"])
tweet_times.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 85 entries, 0 to 84
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   account  85 non-null     object
 1   time     85 non-null     object
 2   tweets   85 non-null     int64 
dtypes: int64(1), object(2)
memory usage: 2.1+ KB
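
The per-hour counts can also be produced without an explicit loop, using `groupby(...).size()`. A minimal sketch on toy rows; the timestamps echo the format of the real data, but which account they belong to is made up.

```python
import pandas as pd

# Toy rows standing in for `tweets`; account assignments are illustrative only.
toy = pd.DataFrame({
    "account": ["rtenews", "rtenews", "rtenews", "thejournal_ie"],
    "created_at": pd.to_datetime(
        ["2022-05-07 17:27:25", "2022-05-07 17:34:28",
         "2022-05-07 19:00:01", "2022-05-07 19:32:00"], utc=True),
})
toy["time"] = toy["created_at"].dt.strftime("%H:00")

# One row per (account, hour) with the number of tweets in that hour.
counts = (toy.groupby(["account", "time"])
             .size()
             .reset_index(name="tweets"))
print(counts)
```

Note that the hours are in UTC, as the timestamps are.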


In [44]:
len(tweet_times.time.unique())

24

In [45]:
heatmap = alt.Chart(tweet_times).mark_rect().encode(x='time:O',
                                                    y='account:N',
                                                    color='tweets:Q',
                                                   tooltip=['time',
                                                            'account',
                                                            'tweets']).properties(width=600,
                                                                                  height=200,
                                                                                  title='Tweets per Hour')

heatmap

## Tweets by Account by Source

In [46]:
temp = tweets.copy()
temp['hour'] = temp.created_at.apply(lambda x: x.strftime('%H:00'))
holder = []
grouper = temp.groupby(['account','source', 'hour'])
for a, b in grouper:
    holder.append([a[0], a[1], a[2], b.shape[0]])

In [47]:
tweet_times = pd.DataFrame(holder, columns=["account", "source", "time", "tweets"])
tweet_times.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 173 entries, 0 to 172
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   account  173 non-null    object
 1   source   173 non-null    object
 2   time     173 non-null    object
 3   tweets   173 non-null    int64 
dtypes: int64(1), object(3)
memory usage: 5.5+ KB


In [48]:
source_df = pd.pivot_table(tweet_times, index=['account', 'source'], columns=['time'], values=['tweets'])
source_df.reset_index(inplace=True)
source_df.head()

,account,source,00:00,01:00,02:00,03:00,04:00,05:00,06:00,07:00,...,14:00,15:00,16:00,17:00,18:00,19:00,20:00,21:00,22:00,23:00
0,independent_ie,Buffer,14.0,14.0,13.0,13.0,10.0,20.0,42.0,27.0,...,29.0,28.0,29.0,30.0,33.0,27.0,28.0,31.0,31.0,14.0
1,independent_ie,Hootsuite Inc.,,,,,,,1.0,,...,5.0,2.0,1.0,,,,,2.0,1.0,
2,independent_ie,Twitter Web App,,,,,,,,1.0,...,,,1.0,,,,,,,
3,irishexaminer,Sendible,,,,,,,,,...,,1.0,1.0,1.0,,,8.0,,,
4,irishexaminer,TweetDeck,,,,,,2.0,,12.0,...,27.0,3.0,,,,8.0,3.0,1.0,,
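
The floats and empty cells come from `pivot_table`'s defaults: it aggregates with the mean and leaves missing combinations as `NaN`. If whole-number counts with explicit zeros are preferred, `aggfunc="sum"` and `fill_value=0` are one option. A sketch on toy rows with illustrative values:

```python
import pandas as pd

# Toy rows shaped like `tweet_times`; the values are illustrative only.
toy_times = pd.DataFrame({
    "account": ["a", "a", "a", "b"],
    "source": ["Buffer", "Buffer", "Twitter Web App", "TweetDeck"],
    "time": ["06:00", "07:00", "07:00", "06:00"],
    "tweets": [42, 27, 1, 18],
})

# Sum the counts and show 0 where an account/source pair has no tweets.
wide = pd.pivot_table(toy_times, index=["account", "source"], columns="time",
                      values="tweets", aggfunc="sum", fill_value=0)
print(wide)
```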


In [49]:
grouper = tweet_times.groupby('account')
holder = []
for a, b in grouper:
    heatmap = alt.Chart(b).mark_rect().encode(x='time:O',
                                                    y='source:N',
                                                    color='tweets:Q',
                                                   tooltip=['time',
                                                            'source',
                                                            'tweets']).properties(title=a)
    holder.append(heatmap)    

In [50]:
b.head()

,account,source,time,tweets
146,thejournal_ie,TweetDeck,05:00,3
147,thejournal_ie,TweetDeck,06:00,18
148,thejournal_ie,TweetDeck,07:00,15
149,thejournal_ie,TweetDeck,08:00,17
150,thejournal_ie,TweetDeck,09:00,16


In [51]:
base = alt.Chart(tweet_times).mark_rect().encode(
    x=alt.X('time:O', axis=alt.Axis(title="")),
    y=alt.Y('source:O', axis=alt.Axis(title="")),
    color='tweets:Q',
    tooltip=["time", "source", "account", "tweets"]
).properties(
    width=350,
    height=200,
    title='Tweets per Hour by Source'
).facet('account:N', columns=2)

base