# The Tweets

In [1]:
import json
import glob
import pandas as pd
import altair as alt
import pprint

In [2]:
with open('tweets/newspapers/recent_tweets/independent_ie.json') as f:
    indo = json.load(f)
type(indo)

list

In [3]:
pprint.pprint(indo[0]['data'][0])

{'author_id': '91334232',
 'context_annotations': [{'domain': {'description': 'Entity Service related '
                                                    'Events domain',
                                     'id': '29',
                                     'name': 'Events [Entity Service]'},
                          'entity': {'id': '1511289118348640256',
                                     'name': 'Eurovision 2022'}},
                         {'domain': {'description': 'Award shows, like the '
                                                    'Oscars, Grammys, or VMAs',
                                     'id': '118',
                                     'name': 'Award Show'},
                          'entity': {'id': '1511289118348640256',
                                     'name': 'Eurovision 2022'}},
                         {'domain': {'description': 'Entity Service related '
                                                    'Events domain',
                           

## Assembling the Tweet Data

In [4]:
files = glob.glob("tweets/newspapers/recent_tweets/*.json")
holder = []
for f in files:
    # The account handle is the file name minus its ".json" extension
    account = f.split("/")[-1][:-5]
    with open(f) as g:
        pages = json.load(g)
    # Each file is a list of API response pages; each page has a 'data' list of tweets
    for item in pages:
        for t in item['data']:
            metrics = t['public_metrics']
            holder.append({"id": t["id"],
                           "account": account,
                           "created_at": t["created_at"],
                           "likes": metrics["like_count"],
                           "quotes": metrics["quote_count"],
                           "replies": metrics["reply_count"],
                           "retweets": metrics["retweet_count"],
                           "reply_settings": t['reply_settings'],
                           "source": t['source'],
                           "text": t['text']})
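The loop above flattens each tweet by hand. As an alternative sketch, assuming each file has the same page structure seen in `indo` (a list of pages, each holding a `data` list), `pd.json_normalize` can do the flattening. The `pages` list below is a hypothetical, heavily trimmed stand-in for one account's file, and the `"example_account"` label is a placeholder for the name the notebook takes from the file path.

```python
import pandas as pd

# Hypothetical stand-in for one account's JSON file: a list of API response
# pages, each holding a 'data' list of tweet objects (fields trimmed).
pages = [
    {"data": [{"id": "1", "created_at": "2022-05-07T11:01:08.000Z",
               "public_metrics": {"like_count": 3, "quote_count": 0,
                                  "reply_count": 1, "retweet_count": 2},
               "reply_settings": "everyone", "source": "TweetDeck",
               "text": "First tweet"}]},
    {"data": [{"id": "2", "created_at": "2022-05-07T11:01:22.000Z",
               "public_metrics": {"like_count": 5, "quote_count": 1,
                                  "reply_count": 0, "retweet_count": 0},
               "reply_settings": "mentionedUsers", "source": "Buffer",
               "text": "Second tweet"}]},
]

# record_path="data" stacks the 'data' lists of all pages into one table;
# the nested public_metrics dict becomes dotted columns.
flat = pd.json_normalize(pages, record_path="data")
flat = flat.rename(columns={"public_metrics.like_count": "likes",
                            "public_metrics.quote_count": "quotes",
                            "public_metrics.reply_count": "replies",
                            "public_metrics.retweet_count": "retweets"})
# Hypothetical label; the notebook derives the account from the file name
flat["account"] = "example_account"
print(flat[["id", "account", "likes", "retweets", "reply_settings", "source"]])
```

On real files, `json_normalize` would also produce columns for every other nested field (for example `context_annotations`, left as lists), so selecting the wanted columns afterwards keeps the frame comparable to `holder`.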

In [5]:
tweets = pd.DataFrame(holder)
tweets['created_at'] = pd.to_datetime(tweets['created_at'])
# Use the tweet id as the index (this also removes the 'id' column)
tweets = tweets.set_index('id')
# info() prints its summary directly and returns None
tweets.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1995 entries, 1523076460780556288 to 1520623310442143744
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype              
---  ------          --------------  -----              
 0   account         1995 non-null   object             
 1   created_at      1995 non-null   datetime64[ns, UTC]
 2   likes           1995 non-null   int64              
 3   quotes          1995 non-null   int64              
 4   replies         1995 non-null   int64              
 5   retweets        1995 non-null   int64              
 6   reply_settings  1995 non-null   object             
 7   source          1995 non-null   object             
 8   text            1995 non-null   object             
dtypes: datetime64[ns, UTC](1), int64(4), object(4)
memory usage: 155.9+ KB


These are the first five of the 1,995 tweets, with their metrics.

In [6]:
tweets.head()

| id | account | created_at | likes | quotes | replies | retweets | reply_settings | source | text |
|---|---|---|---|---|---|---|---|---|---|
| 1523076460780556288 | thejournal_ie | 2022-05-07 23:05:00+00:00 | 13 | 0 | 8 | 6 | everyone | TweetDeck | Drugs, prostitution and extortion: The Longfor... |
| 1523022857596403712 | thejournal_ie | 2022-05-07 19:32:00+00:00 | 5 | 0 | 1 | 0 | everyone | TweetDeck | What are the issues surrounding the National M... |
| 1523014808265797632 | thejournal_ie | 2022-05-07 19:00:01+00:00 | 2 | 1 | 2 | 0 | everyone | TweetDeck | Quiz: How well do you know these Star Wars cha... |
| 1522993282191478784 | thejournal_ie | 2022-05-07 17:34:28+00:00 | 3 | 1 | 0 | 0 | everyone | Twitter Web App | SDLP deputy leader Nichola Mallon has lost her... |
| 1522991507963203584 | thejournal_ie | 2022-05-07 17:27:25+00:00 | 0 | 0 | 0 | 2 | everyone | TweetDeck | RT @rugby_ie: It was a cruel ending for Munste... |


## Tweets per Account

These are the total tweet counts per account.

In [7]:
# Count tweets per account; to_frame() names the column 'tweets' in both pandas 1.x and 2.x
tpa = tweets.account.value_counts().to_frame('tweets')
tpa

| account | tweets |
|---|---|
| independent_ie | 623 |
| irishexaminer | 622 |
| rtenews | 475 |
| thejournal_ie | 275 |


## Tweets by Day by Account

And these are the tweets by account by day.

In [8]:
dates = pd.date_range(start='2022-05-01', end='2022-05-07', freq='D')
# Day labels in calendar order, e.g. 'Sun, May 01'
date_order = list(dates.strftime('%a, %b %d'))

In [9]:
tbdba = tweets.copy()
# Label each tweet with its (UTC) calendar day, e.g. 'Sun, May 01'
tbdba['date'] = tbdba.created_at.dt.strftime('%a, %b %d')
temp = pd.crosstab(tbdba.date, tbdba.account)
# Sort the days in calendar order rather than alphabetically
temp.index = pd.Categorical(temp.index, date_order, ordered=True)
temp.sort_index(inplace=True)
temp

| date | independent_ie | irishexaminer | rtenews | thejournal_ie |
|---|---|---|---|---|
| Sun, May 01 | 85 | 60 | 39 | 26 |
| Mon, May 02 | 73 | 69 | 31 | 39 |
| Tue, May 03 | 89 | 110 | 83 | 51 |
| Wed, May 04 | 99 | 112 | 75 | 47 |
| Thu, May 05 | 109 | 90 | 81 | 50 |
| Fri, May 06 | 89 | 112 | 117 | 40 |
| Sat, May 07 | 79 | 69 | 49 | 22 |
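The day labels in this table come from `created_at`, which the API returns in UTC. Irish time in May is one hour ahead (IST, UTC+1), so anything tweeted between midnight and 1 a.m. Irish time is counted under the previous day. The first tweet in the `head()` output, for instance, went out at 23:05 UTC on Saturday, which was 00:05 on Sunday in Dublin. A minimal sketch of labelling days by Irish local time instead, assuming `Europe/Dublin` is the appropriate reference zone and using two timestamps from the `head()` output:

```python
import pandas as pd

# Two created_at values from the head() output, stored in UTC
created_at = pd.Series(pd.to_datetime(["2022-05-07 23:05:00+00:00",
                                       "2022-05-07 19:32:00+00:00"], utc=True))

fmt = "%a, %b %d"
# Day label as computed in the notebook (UTC)
utc_days = created_at.dt.strftime(fmt)
# Convert to Irish local time before taking the day label
local_days = created_at.dt.tz_convert("Europe/Dublin").dt.strftime(fmt)

print(pd.DataFrame({"utc_day": utc_days, "dublin_day": local_days}))
```

Applied to the full data, `date_order` would also need to include May 08, since late-Saturday tweets would otherwise not match any of the ordered day categories.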


## Sources

Most tweets from these news organisations are posted through scheduling or auto-posting tools rather than by hand, and each organisation favours a different platform.

[Buffer](https://developer.twitter.com/en/community/toolbox/buffer), social media management software used for building brands, is used only by the Irish Independent. [dlvr.it](https://dlvrit.com/), a similar automated posting tool, is used only by the Examiner. [TweetDeck](https://tweetdeck.twitter.com/) is the favourite of The Journal and is also used by the Examiner, while RTÉ mostly tweet from the Twitter Web App, that is, twitter.com in a browser, which the API reports as a separate source from TweetDeck.

In [10]:
pd.crosstab(tweets.source, tweets.account)

| source | independent_ie | irishexaminer | rtenews | thejournal_ie |
|---|---|---|---|---|
| Buffer | 593 | 0 | 0 | 0 |
| Hootsuite Inc. | 24 | 0 | 0 | 0 |
| Sendible | 0 | 18 | 0 | 0 |
| TweetDeck | 0 | 79 | 0 | 261 |
| Twitter Media Studio | 0 | 0 | 43 | 0 |
| Twitter Media Studio - LiveCut | 0 | 0 | 46 | 0 |
| Twitter Web App | 6 | 70 | 386 | 13 |
| Twitter for iPhone | 0 | 3 | 0 | 1 |
| dlvr.it | 0 | 452 | 0 | 0 |
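To put numbers on the claims about sources, the sketch below rebuilds the source-by-account counts from this crosstab and converts them to each account's share per source. Which sources count as scheduling or auto-posting tools is an assumption: here Buffer, Hootsuite Inc., Sendible and dlvr.it.

```python
import pandas as pd

# Tweet counts by source and account, copied from the crosstab above
counts = pd.DataFrame(
    {"independent_ie": [593, 24, 0, 0, 0, 0, 6, 0, 0],
     "irishexaminer":  [0, 0, 18, 79, 0, 0, 70, 3, 452],
     "rtenews":        [0, 0, 0, 0, 43, 46, 386, 0, 0],
     "thejournal_ie":  [0, 0, 0, 261, 0, 0, 13, 1, 0]},
    index=["Buffer", "Hootsuite Inc.", "Sendible", "TweetDeck",
           "Twitter Media Studio", "Twitter Media Studio - LiveCut",
           "Twitter Web App", "Twitter for iPhone", "dlvr.it"])

# Share of each account's tweets sent from each source (each column sums to 1)
shares = (counts / counts.sum()).round(3)
print(shares)

# Assumption: these four sources are scheduling / auto-posting tools
scheduling_tools = ["Buffer", "Hootsuite Inc.", "Sendible", "dlvr.it"]
scheduled_share = counts.loc[scheduling_tools].to_numpy().sum() / counts.to_numpy().sum()
print(f"Sent via scheduling tools: {scheduled_share:.1%}")
```

With the full `tweets` DataFrame, `pd.crosstab(tweets.source, tweets.account, normalize='columns')` gives the same per-account shares directly.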


In [11]:
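def tweet_finder(data, key, value, account):
    """Return the raw tweet objects from `account`'s JSON file whose
    `key` column in `data` equals `value`."""
    # Rows (across all accounts) where the column matches
    sieve = data[data[key]==value]
    holder = []
    with open(f'tweets/newspapers/recent_tweets/{account}.json') as f:
        temp = json.load(f)
    # Keep only the tweets in this account's file whose id was matched
    for item in temp:
        for t in item['data']:
            if t['id'] in sieve.index:
                holder.append(t)
    print(f'The result set has {len(holder)} tweets.')
    return holder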
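`tweet_finder` re-reads the account's JSON file so that it can return complete tweet objects, including fields such as `context_annotations` that never made it into the DataFrame. When the DataFrame columns are enough, a boolean mask does the same selection without touching the disk, and it filters on the account explicitly rather than relying on which file is opened. A sketch with a hypothetical three-row `sample` frame and a hypothetical `find_tweets` helper:

```python
import pandas as pd

# Hypothetical miniature of the tweets DataFrame, indexed by tweet id
sample = pd.DataFrame(
    {"account": ["irishexaminer", "irishexaminer", "thejournal_ie"],
     "source": ["Twitter for iPhone", "dlvr.it", "Twitter for iPhone"],
     "text": ["RT @ExaminerSport: ...", "A dlvr.it story", "A Journal story"]},
    index=pd.Index(["101", "102", "103"], name="id"))

def find_tweets(data, key, value, account):
    """Hypothetical helper: rows of `data` where `key` == `value` for one account."""
    mask = (data[key] == value) & (data["account"] == account)
    return data[mask]

results = find_tweets(sample, "source", "Twitter for iPhone", "irishexaminer")
print(f"The result set has {len(results)} tweets.")
print(results["text"])
```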

### Tweets from Twitter for iPhone

The Examiner bypassed its automated tools to send three tweets from an iPhone during the week. Let's look at them.

In [12]:
results = tweet_finder(tweets, 'source', 'Twitter for iPhone', 'irishexaminer')

The result set has 3 tweets.


In [13]:
for r in results:
    print(r['created_at'])
    print(r['text'])
    print("-"*80)

2022-05-07T11:01:29.000Z
RT @ExaminerSport: Larry Ryan: Kammy bantered responsibly and he made us smile https://t.co/rjPsnSaoRK
--------------------------------------------------------------------------------
2022-05-07T11:01:22.000Z
RT @ExaminerSport: What really goes on in a huddle of hurling selectors? https://t.co/Subr7ozzqS
--------------------------------------------------------------------------------
2022-05-07T11:01:08.000Z
RT @ExaminerSport: Patrick Kelly: A positive performance is a pre-requisite from Cork https://t.co/uvKT0ZHYxv
--------------------------------------------------------------------------------


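All three are retweets of @ExaminerSport promoting sports columns, sent within 21 seconds of one another shortly after 11:00 UTC on Saturday, May 7. One plausible explanation is that the person holding the iPhone wrote one or more of those columns, though the data alone cannot confirm it.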
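A quick check on the three `created_at` values printed above shows how tightly the retweets were bunched together:

```python
import pandas as pd

# created_at values of the three iPhone retweets, as printed above
times = pd.to_datetime(["2022-05-07T11:01:29.000Z",
                        "2022-05-07T11:01:22.000Z",
                        "2022-05-07T11:01:08.000Z"])

# Time between the first and the last of the three retweets
span = times.max() - times.min()
print(span)
print(span.total_seconds())
```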

## Reply Settings

To give users more control over unwanted or abusive replies, Twitter lets them restrict who can reply to an individual tweet: everyone (the default), only accounts the author follows, or only accounts mentioned in the tweet. The API reports this in the `reply_settings` field as `everyone`, `following` or `mentionedUsers`. These are the settings for the tweets in the data set.

In [14]:
pd.crosstab(tweets.reply_settings, tweets.account)

| reply_settings | independent_ie | irishexaminer | rtenews | thejournal_ie |
|---|---|---|---|---|
| everyone | 623 | 622 | 475 | 271 |
| mentionedUsers | 0 | 0 | 0 | 4 |
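For scale, the counts in this crosstab can be expanded back into one row per tweet and summarised with `margins=True`, which appends an `All` row and column of totals. The `sample` frame below is reconstructed from the table, not loaded from the raw data.

```python
import pandas as pd

# Reply-settings counts per account, copied from the crosstab above
counts = {("independent_ie", "everyone"): 623,
          ("irishexaminer", "everyone"): 622,
          ("rtenews", "everyone"): 475,
          ("thejournal_ie", "everyone"): 271,
          ("thejournal_ie", "mentionedUsers"): 4}

# Expand the counts back into one row per tweet
rows = [(acct, setting) for (acct, setting), n in counts.items() for _ in range(n)]
sample = pd.DataFrame(rows, columns=["account", "reply_settings"])

# margins=True adds an 'All' row and column holding the totals
table = pd.crosstab(sample.reply_settings, sample.account, margins=True)
print(table)

# Fraction of all tweets that restrict who may reply
restricted = (sample.reply_settings != "everyone").mean()
print(f"Tweets with restricted replies: {restricted:.2%}")
```

Only 4 of the 1,995 tweets, about 0.2%, restrict who may reply.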


### Replies Limited to Mentioned Users

`thejournal_ie` limits the reply settings for four of its tweets to "mentionedUsers". Why would this be? Again, we can look at the tweets.

In [15]:
results = tweet_finder(tweets, 'reply_settings', 'mentionedUsers', 'thejournal_ie')

The result set has 4 tweets.


In [16]:
for r in results:
    print(r['created_at'])
    print(r['text'])
    print("-"*80)

2022-05-05T16:42:23.000Z
An 84-year-old man who admits sexually assaulting his young granddaughter has been jailed for 21 months

https://t.co/FzTGvn3L7f
--------------------------------------------------------------------------------
2022-05-05T14:01:18.000Z
A store assistant was sexually harassed when a supermarket manager exposed himself to her in the workplace and sent her ‘dirty pictures’, the Workplace Relations Commission has determined https://t.co/7fmueRZRIZ
--------------------------------------------------------------------------------
2022-05-04T16:13:30.000Z
The UK has included an Irish journalist in its latest round of financial sanctions against Russia https://t.co/jUHaLePH5o
--------------------------------------------------------------------------------
2022-05-03T14:17:04.000Z
A witness in the trial of a woman accused of murdering a two-year-old girl has said in evidence that she heard the defendant say “I am telling, I am telling” on the morning the toddler was found