# Bots in Science 🧪

This notebook identifies mentions made by bots at three levels:
+ (1) globally 🌍
+ (2) by Web of Science category 📖
+ (3) by ESI field 📚

## Libraries

In [1]:
import pandas as pd

## 1. Overall bots

### 1.1. Preprocessing

Only mentions made by tweeters for whom a bot score could be estimated are kept (98.5% of all mentions).

In [2]:
データ_tw_men = pd.read_csv('data/final_mentions_full.tsv', sep='\t',
                         dtype={'Outlet or Author':str, 'External Mention ID':str},
                         encoding='UTF-8')
データ_tw_men.shape

(51999245, 5)

In [3]:
データ_botscore = pd.read_csv('data/full_botometer_results.tsv', sep='\t', dtype={'user_id':str})
データ_botscore.shape

(4872369, 2)

In [5]:
# Keep only mentions whose tweeter has an estimated bot score
データ_tw_men = データ_tw_men[データ_tw_men['Outlet or Author'].isin(データ_botscore.user_id.tolist())].copy()
データ_tw_men.shape

(51230936, 5)
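The filtering step above can be sketched on toy data as follows. The frames `mentions` and `botscore`, their values, and the `botscore` column name are hypothetical; only the column names `Outlet or Author`, `DOI` and `user_id` come from the notebook. `Series.isin` accepts a Series directly, so converting to a list first is optional.

```python
import pandas as pd

# Hypothetical toy data; only 'Outlet or Author', 'DOI' and 'user_id' mirror the notebook.
mentions = pd.DataFrame({
    'Outlet or Author': ['a', 'b', 'c', 'a'],
    'DOI': ['10.1/x', '10.1/y', '10.1/z', '10.1/y'],
})
botscore = pd.DataFrame({'user_id': ['a', 'b'], 'botscore': [0.1, 0.9]})

# Keep only mentions whose tweeter has an estimated bot score.
has_score = mentions['Outlet or Author'].isin(botscore['user_id'])
kept = mentions[has_score].copy()

# Share of mentions retained after filtering.
retained_share = len(kept) / len(mentions)
print(kept.shape, retained_share)
```

On the real data the same ratio is 51,230,936 / 51,999,245, which matches the 98.5% reported above.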

### 1.2. Bots

Each mention is labeled according to whether the tweeter who made it is a bot (`bot = 1`) or not (`bot = 0`).

In [4]:
データ_bots = pd.read_csv('results/bots_list.tsv', sep='\t', dtype={'Outlet or Author':str})
データ_bots.shape

(11073, 1)

In [7]:
# Default label: not a bot
データ_tw_men['bot'] = 0
# Flag mentions whose tweeter appears in the bot list
データ_tw_men.loc[データ_tw_men['Outlet or Author'].isin(データ_bots['Outlet or Author'].tolist()), 'bot'] = 1

In [8]:
データ_tw_men[['bot']].value_counts()/データ_tw_men.shape[0]

bot
0      0.95275
1      0.04725
dtype: float64
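A compact alternative for the labeling and the share computation might look like the sketch below. The toy frames `mentions` and `bots` are hypothetical; the column names follow the notebook. It builds the flag in one step with `isin(...).astype(int)` and uses `value_counts(normalize=True)` instead of dividing by the row count.

```python
import pandas as pd

# Hypothetical toy data; column names follow the notebook.
mentions = pd.DataFrame({'Outlet or Author': ['a', 'b', 'c', 'a']})
bots = pd.DataFrame({'Outlet or Author': ['b']})

# 1 if the tweeter appears in the bot list, 0 otherwise.
mentions['bot'] = mentions['Outlet or Author'].isin(bots['Outlet or Author']).astype(int)

# Relative frequency of each label.
bot_share = mentions['bot'].value_counts(normalize=True)
print(bot_share)
```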

<div class="alert-warning">
    <strong>Warning:</strong> The following line is commented out to avoid generating a new version of the file when reviewing the code.
</div>

In [14]:
#データ_tw_men.to_csv('data/final_mentions_full_bots.tsv', index=False, sep='\t', encoding='UTF-8')

## 2. Bots by category

As above, but with mentions classified by Web of Science category.

In [11]:
データ_alt_sub = pd.read_csv('data/altmetric_subjects_cat_esi.tsv', sep='\t', encoding='UTF-8')
データ_alt_sub

| Unnamed: 0 | DOI | subject_category | ESI |
|---|---|---|---|
| 0 | 10.1001/jama.2016.19627 | Medicine, General & Internal | Clinical Medicine |
| 1 | 10.1001/jama.2016.19720 | Medicine, General & Internal | Clinical Medicine |
| 2 | 10.1001/jama.2016.19976 | Medicine, General & Internal | Clinical Medicine |
| 3 | 10.1001/jama.2017.10569 | Medicine, General & Internal | Clinical Medicine |
| 4 | 10.1001/jama.2017.1363 | Medicine, General & Internal | Clinical Medicine |
| ... | ... | ... | ... |
| 6763065 | 10.1177/0021989417726107 | Literature, African, Australian, Canadian | Arts & Humanities |
| 6763066 | 10.1177/0021989419854507 | Literature, African, Australian, Canadian | Arts & Humanities |
| 6763067 | 10.1177/0021989420962785 | Literature, African, Australian, Canadian | Arts & Humanities |
| 6763068 | 10.1353/cal.2017.0135 | Literature, African, Australian, Canadian | Arts & Humanities |


In [12]:
# DOIs are case-insensitive: lower-case both sides so the join keys match
データ_tw_men.DOI = データ_tw_men.DOI.str.lower()
データ_alt_sub.DOI = データ_alt_sub.DOI.str.lower()

In [13]:
データ_alt_sub_cat = データ_alt_sub[['DOI', 'subject_category']].drop_duplicates()
データ_alt_sub_cat.shape

(6763070, 2)

Because 64 publications have no category, 524 mentions are lost (51,230,936 - 51,230,412 = 524).

In [14]:
データ_tw_men[データ_tw_men.DOI.isin(データ_alt_sub.DOI.tolist())].shape

(51230412, 6)
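One way to count both the lost mentions and the publications behind them is sketched below on toy data. The frames `mentions` and `subjects` and their values are hypothetical; the column names `DOI` and `subject_category` follow the notebook.

```python
import pandas as pd

# Hypothetical toy data; column names follow the notebook.
mentions = pd.DataFrame({'DOI': ['10.1/a', '10.1/a', '10.1/b', '10.1/c', '10.1/c']})
subjects = pd.DataFrame({
    'DOI': ['10.1/a', '10.1/b'],
    'subject_category': ['Oncology', 'Ecology'],
})

# Mentions whose DOI has no entry in the subject table.
missing = mentions[~mentions['DOI'].isin(subjects['DOI'])]

lost_mentions = len(missing)                  # mentions an inner merge would drop
lost_publications = missing['DOI'].nunique()  # distinct publications without a category
print(lost_publications, lost_mentions)
```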

In [15]:
データ_tw_men_cat = データ_tw_men.merge(データ_alt_sub_cat, how='inner', on='DOI')
データ_tw_men_cat.shape

(77622030, 7)
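The merged table has more rows than the mention table (77,622,030 against 51,230,412 matching mentions). This is the expected behavior of an inner merge when a DOI appears with several categories, which is presumably the case here: each mention is repeated once per category of its publication. A minimal sketch with hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data; column names follow the notebook.
mentions = pd.DataFrame({'DOI': ['10.1/a', '10.1/b'], 'bot': [0, 1]})
subjects = pd.DataFrame({
    'DOI': ['10.1/a', '10.1/a', '10.1/b'],
    'subject_category': ['Oncology', 'Immunology', 'Ecology'],
})

# Inner merge: one output row per (mention, category) pair.
merged = mentions.merge(subjects, how='inner', on='DOI')

# Number of rows each DOI contributes after the merge.
rows_per_doi = merged.groupby('DOI').size()
print(merged.shape)
print(rows_per_doi)
```

Any count computed on the merged table is therefore a count of mention-category pairs, not of unique mentions.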

<div class="alert-warning">
    <strong>Warning:</strong> The following line is commented out to avoid generating a new version of the file when reviewing the code.
</div>

In [24]:
#データ_tw_men_cat.to_csv('data/final_mentions_full_bots_category.tsv', index=False, sep='\t', encoding='UTF-8')

## 3. Bots by ESI field

In [16]:
データ_alt_sub_esi = データ_alt_sub[['DOI', 'ESI']].drop_duplicates()
データ_alt_sub_esi.shape

(5639267, 2)
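The DOI-ESI table is smaller than the DOI-category table (5,639,267 against 6,763,070 rows). A plausible explanation, illustrated below with hypothetical toy data, is that several categories of one publication can map to the same ESI field, so `drop_duplicates` collapses them into a single row.

```python
import pandas as pd

# Hypothetical toy data; column names follow the notebook.
subjects = pd.DataFrame({
    'DOI': ['10.1/a', '10.1/a', '10.1/b'],
    'subject_category': ['Oncology', 'Surgery', 'Ecology'],
    'ESI': ['Clinical Medicine', 'Clinical Medicine', 'Environment/Ecology'],
})

# Two categories for 10.1/a stay distinct at category level...
by_category = subjects[['DOI', 'subject_category']].drop_duplicates()
# ...but collapse to one row at ESI level.
by_esi = subjects[['DOI', 'ESI']].drop_duplicates()
print(by_category.shape, by_esi.shape)
```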

In [17]:
データ_tw_men_esi = データ_tw_men.merge(データ_alt_sub_esi, how='inner', on='DOI')
データ_tw_men_esi.shape

(66269090, 7)
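Once mentions carry both an `ESI` field and a `bot` flag, the share of bot mentions per field could be summarized as sketched below. This summary is not part of the notebook; the frame `mentions_esi` and its values are hypothetical. Because `bot` is coded 0/1, its mean within a group equals the proportion of bot mentions.

```python
import pandas as pd

# Hypothetical toy data; column names follow the notebook.
mentions_esi = pd.DataFrame({
    'ESI': ['Clinical Medicine', 'Clinical Medicine',
            'Chemistry', 'Chemistry', 'Chemistry', 'Chemistry'],
    'bot': [1, 0, 0, 0, 0, 1],
})

# Mean of a 0/1 flag per group = share of bot mentions in that field.
bot_share_by_field = mentions_esi.groupby('ESI')['bot'].mean()
print(bot_share_by_field)
```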

<div class="alert-warning">
    <strong>Warning:</strong> The following line is commented out to avoid generating a new version of the file when reviewing the code.
</div>

In [28]:
#データ_tw_men_esi.to_csv('data/final_mentions_full_bots_esi.tsv', index=False, sep='\t', encoding='UTF-8')