# Bots in Science 🧪

This notebook measures the impact of bots on tweet counts at the paper level, at three levels of aggregation:
+ overall 🌍
+ by ESI field 📚
+ by Web of Science category 📖

## Libraries

In [8]:
import numpy as np
import pandas as pd
import scipy.stats

# Project-local helpers (functions module shipped alongside this notebook)
from functions import bot_mentions, gini

## 1. Overall

Because the analysis targets tweets, the mentions are first filtered to the rows flagged `Original == 1`. Bot tweets are then counted for each paper, and several metrics are computed to quantify the impact of bots on this altmetric.

In [2]:
データ_tw_men = pd.read_csv('data/final_mentions_full_bots.tsv', sep='\t', encoding='UTF-8',
                         dtype={'Outlet or Author':str, 'External Mention ID':str})
データ_tw_men = データ_tw_men[データ_tw_men.Original==1].copy()
データ_tw_men.shape

(17511547, 6)

In [3]:
データ_tw_men_paper = bot_mentions(データ_tw_men, agg_by='DOI')
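
`bot_mentions` comes from the project's `functions` module and its code is not shown in this notebook. The sketch below is only a guess at the kind of aggregation it performs, inferred from the columns used later (`tweets`, `tweets_bot`, `bot_tweets_p`). The function name `bot_mentions_sketch` and the `is_bot` flag column are hypothetical and do not come from the original data.

```python
import pandas as pd


def bot_mentions_sketch(mentions: pd.DataFrame, agg_by: str = "DOI",
                        bot_col: str = "is_bot") -> pd.DataFrame:
    """Hypothetical stand-in for functions.bot_mentions (not the authors' code).

    Assumes one row per tweet and a 0/1 column `bot_col` marking bot accounts.
    """
    out = (
        mentions.groupby(agg_by)[bot_col]
        .agg(tweets="size", tweets_bot="sum")  # total tweets and bot tweets per paper
        .reset_index()
    )
    # Percentage of each paper's tweets that come from bots (0-100)
    out["bot_tweets_p"] = 100 * out["tweets_bot"] / out["tweets"]
    return out


# Toy data: paper a has 1 bot tweet out of 4, b is bot-only, c has no bot tweets
toy = pd.DataFrame({
    "DOI": ["10.1/a", "10.1/a", "10.1/a", "10.1/a", "10.1/b", "10.1/c", "10.1/c"],
    "is_bot": [1, 0, 0, 0, 1, 0, 0],
})
toy_paper = bot_mentions_sketch(toy)
print(toy_paper)
```

On the toy data, the three papers get bot shares of 25%, 100% and 0%, which is the scale that the `== 100`, `>= 50` and `> 0` thresholds below rely on.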

### 1.1. Impact metrics

In [4]:
# Mean percentage of a paper's tweets that come from bots
データ_tw_men_paper['bot_tweets_p'].mean()

20.254773443890613

In [9]:
# Gini coefficient of the per-paper bot share (how unevenly bot activity is spread across papers)
gini(np.array(データ_tw_men_paper['bot_tweets_p']))

0.7706832026443643
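
The `gini` helper is also imported from `functions`, so its exact definition is not visible here. As a reference point, a common formulation for non-negative values sorted in ascending order is $G = \frac{2\sum_i i\,x_{(i)}}{n\sum_i x_{(i)}} - \frac{n+1}{n}$. The function below, `gini_sketch`, is a hypothetical implementation of that formula; the authors' version may differ (for example, in how it handles ties or small samples).

```python
import numpy as np


def gini_sketch(values) -> float:
    """Gini coefficient of non-negative values (hypothetical, not the authors' helper)."""
    x = np.sort(np.asarray(values, dtype=float))  # ascending order
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        return 0.0  # convention chosen here for empty or all-zero input
    ranks = np.arange(1, n + 1)
    return float(2 * np.sum(ranks * x) / (n * total) - (n + 1) / n)


equal_share = gini_sketch([5, 5, 5, 5])     # every paper has the same bot share
concentrated = gini_sketch([0, 0, 0, 10])   # all bot activity sits on one paper
print(equal_share, concentrated)
```

Values near 0 mean the bot share is similar across papers; values near 1 mean it is concentrated in a few papers.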

In [10]:
# Percentage of papers whose tweets come exclusively from bots
100*データ_tw_men_paper[データ_tw_men_paper['bot_tweets_p']==100].shape[0]/データ_tw_men_paper.shape[0]

12.687799811811258

In [11]:
# Percentage of papers with at least half of their tweets from bots
100*データ_tw_men_paper[データ_tw_men_paper['bot_tweets_p']>=50].shape[0]/データ_tw_men_paper.shape[0]

21.335657260448283

In [12]:
# Percentage of papers with at least one bot tweet
100*データ_tw_men_paper[データ_tw_men_paper['bot_tweets_p']>0].shape[0]/データ_tw_men_paper.shape[0]

33.9914038300808

In [13]:
# Tweet count once bot tweets are removed
データ_tw_men_paper['tweets_no_bot'] = データ_tw_men_paper['tweets']-データ_tw_men_paper['tweets_bot']
# Percentile rank (0-1) of each paper, with and without bot tweets
データ_tw_men_paper['tweets_per'] = データ_tw_men_paper['tweets'].rank(pct=True)
データ_tw_men_paper['tweets_no_bot_per'] = データ_tw_men_paper['tweets_no_bot'].rank(pct=True)
# Absolute change in percentile rank caused by removing bots
データ_tw_men_paper['per_diff'] = abs(データ_tw_men_paper['tweets_per']-データ_tw_men_paper['tweets_no_bot_per'])

In [14]:
# Mean absolute percentile shift, in percentage points
100*データ_tw_men_paper['per_diff'].mean()

10.545640726942056
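
To make the percentile-shift metric concrete, here is a small worked example on invented numbers (the four papers below are illustrative, not taken from the dataset). It follows the same steps as the cell above: rank papers by all tweets, rank them again without bot tweets, and average the absolute difference.

```python
import pandas as pd

# Four made-up papers; the first one owes 9 of its 10 tweets to bots
toy = pd.DataFrame({
    "tweets":     [10, 4, 3, 1],
    "tweets_bot": [9, 0, 1, 0],
})
toy["tweets_no_bot"] = toy["tweets"] - toy["tweets_bot"]

# rank(pct=True) returns rank / n; ties receive the average rank
toy["tweets_per"] = toy["tweets"].rank(pct=True)
toy["tweets_no_bot_per"] = toy["tweets_no_bot"].rank(pct=True)
toy["per_diff"] = (toy["tweets_per"] - toy["tweets_no_bot_per"]).abs()

mean_shift = 100 * toy["per_diff"].mean()
print(toy)
print(mean_shift)  # 31.25
```

The bot-heavy paper drops from the top percentile to a tie for last place, and the other papers move up to fill the gap, so every paper contributes to the mean shift.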

In [15]:
# Spearman rank correlation between tweet counts with and without bots
scipy.stats.spearmanr(データ_tw_men_paper['tweets'], データ_tw_men_paper['tweets_no_bot'])[0]

0.8675767468675512
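
Spearman's coefficient is the Pearson correlation of the two rank vectors, with tied values sharing their average rank. The sketch below reproduces that definition with pandas only, on invented numbers, to show what the coefficient responds to; the variable names are illustrative.

```python
import pandas as pd

# Made-up tweet counts for four papers, with and without bot tweets
tweets = pd.Series([10, 4, 3, 1])
tweets_no_bot = pd.Series([1, 4, 2, 1])

# Rank both series (average rank for ties), then take the Pearson correlation
rho = tweets.rank().corr(tweets_no_bot.rank())
print(rho)
```

Only the ordering of papers matters here. A coefficient close to 1 means that removing bot tweets barely reorders the papers, while lower values indicate that bots change which papers appear most tweeted.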

## 2. ESI field

The same process is repeated within each ESI field, so percentile ranks are computed among papers of the same field.

In [16]:
データ_tw_men_esi = pd.read_csv('data/final_mentions_full_bots_esi.tsv', sep='\t', encoding='UTF-8',
                             dtype={'Outlet or Author':str, 'External Mention ID':str})
データ_tw_men_esi = データ_tw_men_esi.loc[データ_tw_men_esi.Original==1, ['ESI', 'DOI']].drop_duplicates().copy()
データ_tw_men_esi.shape

(5212927, 2)

In [17]:
データ_tw_men_esi = データ_tw_men_esi.merge(データ_tw_men_paper, how='inner', on='DOI')

### 2.1. Impact metrics

In [18]:
データ_tw_men_esi['tweets_per'] = None
データ_tw_men_esi['tweets_no_bot_per'] = None
データ_tw_men_esi['per_diff'] = None

In [19]:
# Percentile rank (0-100) of each paper within its own ESI field, with and without bot tweets
for esi in list(set(データ_tw_men_esi['ESI'])):
    データ_tw_men_esi.loc[データ_tw_men_esi.ESI==esi, 'tweets_per'] = 100*データ_tw_men_esi.loc[データ_tw_men_esi.ESI==esi, 'tweets'].rank(pct=True)
    データ_tw_men_esi.loc[データ_tw_men_esi.ESI==esi,'tweets_no_bot_per'] = 100*データ_tw_men_esi.loc[データ_tw_men_esi.ESI==esi, 'tweets_no_bot'].rank(pct=True)
データ_tw_men_esi['per_diff'] = abs(データ_tw_men_esi['tweets_per']-データ_tw_men_esi['tweets_no_bot_per'])
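
As a possible alternative to looping over fields, `groupby(...).rank(pct=True)` computes the within-group percentile in one pass and keeps the columns as floats. The sketch below uses invented data with the same column names as the notebook; it is an illustration, not the code that produced the results here.

```python
import pandas as pd

# Made-up papers in two fields
toy = pd.DataFrame({
    "ESI": ["Chemistry", "Chemistry", "Chemistry", "Geosciences", "Geosciences"],
    "tweets": [8, 2, 5, 1, 3],
    "tweets_no_bot": [2, 2, 5, 1, 0],
})

# Percentile rank (0-100) within each field; ties share the average rank
toy["tweets_per"] = 100 * toy.groupby("ESI")["tweets"].rank(pct=True)
toy["tweets_no_bot_per"] = 100 * toy.groupby("ESI")["tweets_no_bot"].rank(pct=True)
toy["per_diff"] = (toy["tweets_per"] - toy["tweets_no_bot_per"]).abs()
print(toy)
```

Because the ranking is done per group, a paper is only compared with papers of the same field, which is what the loop above does one field at a time.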

In [20]:
データ_tw_men_esi_paper = データ_tw_men_esi.groupby('ESI').agg({'bot_tweets_p':'mean', 'per_diff':'mean'}).reset_index()

In [21]:
データ_tw_men_esi_paper_aux = データ_tw_men_esi.groupby('ESI').agg({
    'bot_tweets_p': [
        lambda x: 100*(x > 0).sum()/len(x),
        lambda x: 100*(x >= 50).sum()/len(x),
        lambda x: 100*(x == 100).sum()/len(x)
    ]
}).reset_index()
データ_tw_men_esi_paper_aux.columns = ['ESI', 'any', 'half', 'all']
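
The three lambdas above give, per field, the share of papers with any bot tweet, with at least half of their tweets from bots, and with only bot tweets. One way to obtain the same shares with named columns from the start is to build boolean flags and average them per group, since the mean of a boolean column is the fraction of `True` values. The data below is invented for illustration.

```python
import pandas as pd

# Made-up per-paper bot shares (0-100) in two fields
toy = pd.DataFrame({
    "ESI": ["Chemistry"] * 4 + ["Geosciences"] * 2,
    "bot_tweets_p": [0.0, 25.0, 50.0, 100.0, 0.0, 100.0],
})

p = toy["bot_tweets_p"]
flags = pd.DataFrame({
    "ESI": toy["ESI"],
    "any": p > 0,      # at least one bot tweet
    "half": p >= 50,   # at least half of the tweets from bots
    "all": p == 100,   # tweeted exclusively by bots
})

# Mean of a boolean flag = fraction of papers meeting the condition
shares = (100 * flags.groupby("ESI").mean()).reset_index()
print(shares)
```

This avoids renaming the multi-level columns afterwards, at the cost of building an intermediate frame.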

In [22]:
データ_tw_men_esi_paper = データ_tw_men_esi_paper.merge(データ_tw_men_esi_paper_aux, how='inner', on='ESI')

In [23]:
データ_tw_men_esi_gini = データ_tw_men_esi.groupby('ESI').apply(
    lambda x: gini(np.array(x['bot_tweets_p']))
).reset_index().rename({0:'gini'}, axis=1)

In [24]:
データ_tw_men_esi_paper = データ_tw_men_esi_paper.merge(データ_tw_men_esi_gini, how='inner', on='ESI')

In [25]:
データ_tw_men_esi_corr = データ_tw_men_esi.groupby('ESI').apply(
    lambda x: scipy.stats.spearmanr(x['tweets'], x['tweets_no_bot'])[0]
).reset_index().rename({0:'corr'}, axis=1)

In [26]:
データ_tw_men_esi_paper = データ_tw_men_esi_paper.merge(データ_tw_men_esi_corr, how='inner', on='ESI')

In [27]:
データ_tw_men_esi_paper

Unnamed: 0,ESI,bot_tweets_p,per_diff,any,half,all,gini,corr
0,Agricultural Sciences,20.345658,11.914281,29.736095,21.636132,14.517683,0.781953,0.824404
1,Arts & Humanities,4.243537,2.797029,8.017828,4.192296,2.621624,0.950239,0.970478
2,Biology & Biochemistry,25.958696,11.092892,45.811471,26.836957,15.542072,0.698965,0.866728
3,Chemistry,17.449191,11.773101,25.205331,19.295065,12.017849,0.809873,0.819194
4,Clinical Medicine,17.857444,8.763823,31.292803,18.535176,11.086549,0.79572,0.903549
5,Computer Science,19.91206,13.004691,31.466988,21.397204,11.589382,0.774748,0.779765
6,Economics & Business,6.649696,4.074,12.325064,6.664181,4.133214,0.922921,0.958192
7,Engineering,20.499761,13.512228,26.529433,22.168287,15.997536,0.786891,0.759326
8,Environment/Ecology,13.160968,7.63918,24.436036,13.715583,7.328715,0.842851,0.916404
9,Geosciences,17.211262,10.946679,28.635705,18.773111,9.939035,0.801064,0.846024


## 3. Categories

In [29]:
データ_tw_men_cat = pd.read_csv('data/final_mentions_full_bots_category.tsv', sep='\t', encoding='UTF-8',
                             dtype={'Outlet or Author':str, 'External Mention ID':str})
データ_tw_men_cat = データ_tw_men_cat[['subject_category', 'DOI']].drop_duplicates().copy()
データ_tw_men_cat.shape

(6267964, 2)

In [30]:
データ_tw_men_cat = データ_tw_men_cat.merge(データ_tw_men_paper, how='inner', on='DOI')

### 3.1. Impact metrics

In [31]:
データ_tw_men_cat['tweets_per'] = None
データ_tw_men_cat['tweets_no_bot_per'] = None
データ_tw_men_cat['per_diff'] = None

In [32]:
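# Percentile rank (0-100) of each paper within its own Web of Science category, with and without bot tweets
for category in list(set(データ_tw_men_cat['subject_category'])):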
    データ_tw_men_cat.loc[データ_tw_men_cat.subject_category==category, 'tweets_per'] = 100*データ_tw_men_cat.loc[データ_tw_men_cat.subject_category==category, 'tweets'].rank(pct=True)
    データ_tw_men_cat.loc[データ_tw_men_cat.subject_category==category,'tweets_no_bot_per'] = 100*データ_tw_men_cat.loc[データ_tw_men_cat.subject_category==category, 'tweets_no_bot'].rank(pct=True)
データ_tw_men_cat['per_diff'] = abs(データ_tw_men_cat['tweets_per']-データ_tw_men_cat['tweets_no_bot_per'])

In [33]:
データ_tw_men_cat_paper = データ_tw_men_cat.groupby('subject_category').agg({'bot_tweets_p':'mean', 'per_diff':'mean'}).reset_index()

In [34]:
データ_tw_men_cat_paper_aux = データ_tw_men_cat.groupby('subject_category').agg({
    'bot_tweets_p': [
        lambda x: 100*(x > 0).sum()/len(x),
        lambda x: 100*(x >= 50).sum()/len(x),
        lambda x: 100*(x == 100).sum()/len(x)
    ]
}).reset_index()
データ_tw_men_cat_paper_aux.columns = ['subject_category', 'any', 'half', 'all']

In [35]:
データ_tw_men_cat_paper = データ_tw_men_cat_paper.merge(データ_tw_men_cat_paper_aux, how='inner', on='subject_category')

In [36]:
データ_tw_men_cat_gini = データ_tw_men_cat.groupby('subject_category').apply(
    lambda x: gini(np.array(x['bot_tweets_p']))
).reset_index().rename({0:'gini'}, axis=1)

In [37]:
データ_tw_men_cat_paper = データ_tw_men_cat_paper.merge(データ_tw_men_cat_gini, how='inner', on='subject_category')

In [38]:
データ_tw_men_cat_corr = データ_tw_men_cat.groupby('subject_category').apply(
    lambda x: scipy.stats.spearmanr(x['tweets'], x['tweets_no_bot'])[0]
).reset_index().rename({0:'corr'}, axis=1)

In [39]:
データ_tw_men_cat_paper = データ_tw_men_cat_paper.merge(データ_tw_men_cat_corr, how='inner', on='subject_category')

In [41]:
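# Categories where removing bots reorders papers the most (lowest Spearman correlation first)
データ_tw_men_cat_paper.sort_values('corr').head(25)

Unnamed: 0,subject_category,bot_tweets_p,per_diff,any,half,all,gini,corr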
148,"Mathematics, Applied",58.91808,30.274198,64.2048,61.3504,52.7168,0.407596,0.022913
147,Mathematics,70.908004,29.502504,75.429482,73.476032,65.343585,0.288677,0.04057
150,Mechanics,41.941993,23.076289,48.797954,44.947022,35.652174,0.574131,0.4117
193,"Physics, Mathematical",63.810002,20.711085,80.165887,70.684473,46.9505,0.335258,0.482352
239,Telecommunications,18.404655,16.544852,21.337766,19.391747,15.382359,0.813263,0.544313
18,Automation & Control Systems,18.573556,16.254131,21.24175,19.92178,15.472989,0.811639,0.559103
192,"Physics, Fluids & Plasmas",45.227702,19.970798,57.466063,49.032444,34.884952,0.530071,0.569387
72,"Engineering, Electrical & Electronic",20.111171,16.761985,24.841597,21.887257,14.937645,0.792126,0.581343
2,Agricultural Engineering,29.370928,18.577972,33.542714,31.281407,25.53392,0.702456,0.594718
138,"Materials Science, Biomaterials",39.833317,18.439857,48.070993,42.79061,32.727044,0.592283,0.625495


### 3.2. Categories by main areas

In [46]:
データ_esi = pd.read_csv('data/mapping.csv', sep=';')
データ_esi.rename({'WC':'subject_category'}, axis=1, inplace=True)
データ_esi

Unnamed: 0,subject_category,SC,ESI,Category
0,Agricultural Economics & Policy,Agriculture,Agricultural Sciences,Life Sciences & Biomedicine
1,Agricultural Engineering,Agriculture,Agricultural Sciences,Life Sciences & Biomedicine
2,"Agriculture, Dairy & Animal Science",Agriculture,Agricultural Sciences,Life Sciences & Biomedicine
3,"Agriculture, Multidisciplinary",Agriculture,Agricultural Sciences,Life Sciences & Biomedicine
4,Agronomy,Agriculture,Agricultural Sciences,Life Sciences & Biomedicine
...,...,...,...,...
249,Urban Studies,Urban Studies,"Social Sciences, General",Social Sciences
250,Women's Studies,Women's Studies,"Social Sciences, General",Social Sciences
251,Astronomy & Astrophysics,Astronomy & Astrophysics,Space Sciences,Physical Sciences
252,Tropical Medicine,Tropical Medicine,Clinical Medicine,Life Sciences & Biomedicine


In [49]:
データ_tw_men_cat_paper_area = データ_tw_men_cat_paper.merge(データ_esi[['subject_category', 'Category']], on='subject_category')

In [51]:
データ_tw_men_cat_paper_area.Category.value_counts()

Life Sciences & Biomedicine    95
Technology                     51
Social Sciences                46
Physical Sciences              34
Arts & Humanities              27
Multidisciplinary               1
Name: Category, dtype: int64

In [60]:
# Technology categories with the lowest Spearman correlation
データ_tw_men_cat_paper_area[データ_tw_men_cat_paper_area.Category=='Technology'].sort_values('corr').head(5)

Unnamed: 0,subject_category,bot_tweets_p,per_diff,any,half,all,gini,corr,Category
150,Mechanics,41.941993,23.076289,48.797954,44.947022,35.652174,0.574131,0.4117,Technology
239,Telecommunications,18.404655,16.544852,21.337766,19.391747,15.382359,0.813263,0.544313,Technology
18,Automation & Control Systems,18.573556,16.254131,21.24175,19.92178,15.472989,0.811639,0.559103,Technology
72,"Engineering, Electrical & Electronic",20.111171,16.761985,24.841597,21.887257,14.937645,0.792126,0.581343,Technology
138,"Materials Science, Biomaterials",39.833317,18.439857,48.070993,42.79061,32.727044,0.592283,0.625495,Technology
