# Bots in Science 🧪

This notebook calculates bot presence statistics at three levels:
+ (1) overall 🌍
+ (2) by Web of Science category (Mathematics is explored individually) 📖
+ (3) by ESI field 📚

## Libraries

In [2]:
import pandas as pd
from functions import bot_mentions

## 1. Overall

In [2]:
データ_tw_men = pd.read_csv('data/final_mentions_full_bots.tsv', sep='\t', encoding='UTF-8',
                         dtype={'Outlet or Author':str, 'External Mention ID':str})
データ_tw_men.shape

(51230936, 6)

In [3]:
データ_tw_men_freq = bot_mentions(データ_tw_men)
データ_tw_men_freq

Unnamed: 0,papers_mentioned,tweeters,bots,mentions,mentions_bot,tweets,tweets_bot,retweet,retweet_bot,bots_p,bot_mentions_p,bot_tweets_p,bot_retweet_p
True,3744002,4872369,11073,51230936,2420650,17511547,2018607,33719389,402043,0.227261,4.724977,11.527291,1.19232
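
The `bot_mentions` helper is imported from `functions` and its source is not shown here. The percentage columns in its output are nevertheless consistent with simple ratios of the count columns, which the following minimal sketch reproduces from the overall row above. The function name `add_bot_shares` is hypothetical and is not part of the original code.

```python
import pandas as pd


def add_bot_shares(counts: pd.DataFrame) -> pd.DataFrame:
    """Derive the four percentage columns from the count columns.

    Hypothetical helper: it assumes each *_p column is a plain ratio
    of the matching count columns, expressed as a percentage.
    """
    out = counts.copy()
    out['bots_p'] = 100 * out['bots'] / out['tweeters']
    out['bot_mentions_p'] = 100 * out['mentions_bot'] / out['mentions']
    out['bot_tweets_p'] = 100 * out['tweets_bot'] / out['tweets']
    out['bot_retweet_p'] = 100 * out['retweet_bot'] / out['retweet']
    return out


# Counts copied from the overall output row above
overall = pd.DataFrame([{
    'tweeters': 4872369, 'bots': 11073,
    'mentions': 51230936, 'mentions_bot': 2420650,
    'tweets': 17511547, 'tweets_bot': 2018607,
    'retweet': 33719389, 'retweet_bot': 402043,
}])

shares = add_bot_shares(overall)
print(shares[['bots_p', 'bot_mentions_p', 'bot_tweets_p', 'bot_retweet_p']].round(6))
```

The counts are also internally consistent: `tweets + retweet` equals `mentions`, and `tweets_bot + retweet_bot` equals `mentions_bot`.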


In [4]:
#データ_tw_men_freq.to_csv('results/agg_bot_mentions.tsv', sep='\t', index=False)

## 2. Web of Science categories

In [3]:
データ_tw_men_cat = pd.read_csv('data/final_mentions_full_bots_category.tsv', sep='\t', encoding='UTF-8',
                             dtype={'Outlet or Author':str, 'External Mention ID':str})
データ_tw_men_cat.shape

(77622030, 7)
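
The category file has more rows (77,622,030) than the overall file (51,230,936). A likely explanation, which the notebook does not state, is that a paper assigned to several Web of Science categories contributes one row per category, so per-category counts should not be summed to obtain overall totals. The toy sketch below illustrates that effect. The data are invented, and treating the pair (`External Mention ID`, `DOI`) as the identifier of a mention is an assumption.

```python
import pandas as pd

# Invented example: tweet t1 mentions a paper listed in two categories
toy_cat = pd.DataFrame({
    'External Mention ID': ['t1', 't1', 't2'],
    'DOI': ['10.1/a', '10.1/a', '10.1/b'],
    'subject_category': ['Mathematics', 'Acoustics', 'Mathematics'],
})

# One row per (mention, category) pair
rows_with_categories = len(toy_cat)

# Collapsing the category dimension recovers the distinct mentions
# (assumption: a mention is identified by tweet ID plus DOI)
unique_mentions = len(toy_cat.drop_duplicates(['External Mention ID', 'DOI']))

print(rows_with_categories, unique_mentions)
```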

In [3]:
データ_tw_men_cat_freq = bot_mentions(データ_tw_men_cat, agg_by='subject_category')
データ_tw_men_cat_freq

subject_category,papers_mentioned,tweeters,bots,mentions,mentions_bot,tweets,tweets_bot,retweet,retweet_bot,bots_p,bot_mentions_p,bot_tweets_p,bot_retweet_p
Acoustics,6329,13025,389,41353,2197,15296,1840,26057,357,2.986564,5.312795,12.029289,1.370073
Agricultural Economics & Policy,2144,9306,95,21686,401,7543,277,14143,124,1.020847,1.849119,3.672279,0.876759
Agricultural Engineering,3189,3480,162,7882,1397,4883,1343,2999,54,4.655172,17.723928,27.503584,1.800600
"Agriculture, Dairy & Animal Science",16174,24236,525,80709,10470,38475,10060,42234,410,2.166199,12.972531,26.146849,0.970782
"Agriculture, Multidisciplinary",8875,23105,435,49197,4180,21643,3894,27554,286,1.882709,8.496453,17.991960,1.037962
...,...,...,...,...,...,...,...,...,...,...,...,...,...
Veterinary Sciences,41019,46498,810,188334,26650,105384,25543,82950,1107,1.742010,14.150392,24.238025,1.334539
Virology,21470,147607,1354,375512,20392,134715,18083,240797,2309,0.917301,5.430452,13.423153,0.958899
Water Resources,31153,48305,692,156027,6567,68250,5723,87777,844,1.432564,4.208887,8.385348,0.961528
Women's Studies,6459,48401,356,101528,1819,33822,1143,67706,676,0.735522,1.791624,3.379457,0.998434


<div class="alert-warning">
    <strong>Warning:</strong> The export below is commented out so that reviewing the notebook does not generate a new version of the results file.
</div>

In [7]:
#データ_tw_men_cat_freq.to_csv('results/agg_bot_mentions_cat.tsv', sep='\t', index=True)

### 2.1. Mathematics

Mathematics is examined separately because bots generate an unusually high share of its mentions (57.4%, or 23,605 of 41,141) and those mentions are concentrated in a few accounts.

In [4]:
データ_tw_men_cat_freq.loc['Mathematics']

papers_mentioned    14463.000000
tweeters             6874.000000
bots                  275.000000
mentions            41141.000000
mentions_bot        23605.000000
tweets              29332.000000
tweets_bot          21027.000000
retweet             11809.000000
retweet_bot          2578.000000
bots_p                  4.000582
bot_mentions_p         57.375854
bot_tweets_p           71.686213
bot_retweet_p          21.830807
Name: Mathematics, dtype: float64

In [24]:
# Load account metadata and keep one row per account ID
データ_tweeters = pd.read_csv('data/tweeters_metadata.tsv', sep='\t', encoding='UTF-8', dtype={'id_str':str})
データ_tweeters = データ_tweeters.groupby('id_str').first().reset_index()
データ_tweeters.shape

(4875054, 20)
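
One detail of this de-duplication is worth keeping in mind: `groupby(...).first()` returns the first non-null value of each column within a group, so the resulting row can combine values from different source rows. `drop_duplicates` keeps the first row as it is. The sketch below contrasts the two on invented data; the `name` and `followers` columns are hypothetical and need not match the real metadata file.

```python
import pandas as pd

# Invented metadata with a duplicated account whose first row lacks a name
toy_meta = pd.DataFrame({
    'id_str': ['1', '1', '2'],
    'name': [None, 'b', 'c'],
    'followers': [10, 20, 30],
})

# first(): first non-null value per column, possibly mixing source rows
by_first = toy_meta.groupby('id_str').first().reset_index()

# drop_duplicates: keeps the first row per account unchanged, nulls included
by_drop = toy_meta.drop_duplicates('id_str', keep='first').reset_index(drop=True)

print(by_first)
print(by_drop)
```

For account `1`, `by_first` pairs the name from the second row with the follower count from the first, while `by_drop` keeps the missing name. Which behaviour is preferable depends on whether filling gaps across duplicate rows is intended.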

In [25]:
データ_mat = データ_tw_men_cat[データ_tw_men_cat['subject_category']=='Mathematics']
データ_mat.shape

(41141, 7)

In [34]:
データ_mat_bot = データ_mat[データ_mat.bot==1].copy()
データ_mat_bot = bot_mentions(データ_mat_bot, agg_by='Outlet or Author')

In [37]:
sum(データ_mat_bot.mentions)

23605
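
The total matches `mentions_bot` for Mathematics, so the per-account table covers every bot mention. To quantify how concentrated those mentions are, one option is a cumulative share over accounts ranked by mention count. The following is a minimal sketch on invented counts; `cumulative_share` is a hypothetical helper and is not part of the original notebook.

```python
import pandas as pd


def cumulative_share(per_account: pd.DataFrame, count_col: str = 'mentions') -> pd.DataFrame:
    """Rank accounts by count and add their individual and cumulative share (%)."""
    ranked = per_account.sort_values(count_col, ascending=False).reset_index(drop=True)
    ranked['share_p'] = 100 * ranked[count_col] / ranked[count_col].sum()
    ranked['cum_share_p'] = ranked['share_p'].cumsum()
    return ranked


# Invented per-account counts, not taken from the real data
toy_accounts = pd.DataFrame({
    'Outlet or Author': ['a', 'b', 'c', 'd'],
    'mentions': [70, 20, 5, 5],
})

ranked = cumulative_share(toy_accounts)
print(ranked)
```

Applied to the per-account table, the first rows of `cum_share_p` would show what fraction of bot mentions the most active accounts produce.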

In [27]:
データ_mat_bot = データ_mat[データ_mat.bot==1].copy()
データ_mat_bot = bot_mentions(データ_mat_bot, agg_by='Outlet or Author')
データ_mat_bot = データ_mat_bot.reset_index()
データ_mat_bot.shape

(275, 14)

In [29]:
データ_mat_bot = データ_mat_bot.merge(データ_tweeters, how='inner', left_on='Outlet or Author', right_on='id_str')
データ_mat_bot.shape

(275, 34)
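
The row count is unchanged after the inner merge, which indicates that every bot account found a metadata match. A more explicit check, sketched below on invented data, uses the `validate` and `indicator` parameters of `DataFrame.merge` to confirm that keys are unique on both sides and to list any unmatched accounts. The `screen_name` column is hypothetical.

```python
import pandas as pd

# Invented inputs: account '3' has no metadata row
toy_bots = pd.DataFrame({'Outlet or Author': ['1', '2', '3'], 'mentions': [5, 3, 1]})
toy_meta = pd.DataFrame({'id_str': ['1', '2', '4'], 'screen_name': ['x', 'y', 'z']})

checked = toy_bots.merge(
    toy_meta,
    how='left',                # keep every bot, matched or not
    left_on='Outlet or Author',
    right_on='id_str',
    validate='one_to_one',     # raises MergeError if a key repeats on either side
    indicator=True,            # adds a '_merge' column describing each row's origin
)

# Accounts present in the bot table but absent from the metadata
unmatched = checked[checked['_merge'] == 'left_only']
print(unmatched['Outlet or Author'].tolist())
```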

<div class="alert-warning">
    <strong>Warning:</strong> The export below is commented out so that reviewing the notebook does not generate a new version of the results file.
</div>

In [31]:
#データ_mat_bot.to_csv('results/mathematics.tsv', sep='\t', index=False, encoding='UTF-8')

## 3. ESI field

In [38]:
データ_tw_men_esi = pd.read_csv('data/final_mentions_full_bots_esi.tsv', sep='\t', encoding='UTF-8',
                             dtype={'Outlet or Author':str, 'External Mention ID':str})
データ_tw_men_esi.shape

(66269090, 7)

In [39]:
データ_tw_men_esi_freq = bot_mentions(データ_tw_men_esi, agg_by='ESI')
データ_tw_men_esi_freq

ESI,papers_mentioned,tweeters,bots,mentions,mentions_bot,tweets,tweets_bot,retweet,retweet_bot,bots_p,bot_mentions_p,bot_tweets_p,bot_retweet_p
Agricultural Sciences,121913,264669,2318,976556,59230,390331,52965,586225,6265,0.875811,6.065192,13.569253,1.068702
Arts & Humanities,61725,205128,1281,636914,11709,221536,6689,415378,5020,0.624488,1.838396,3.019374,1.208538
Biology & Biochemistry,278533,779387,5578,4178047,236912,1422288,207488,2755759,29424,0.715691,5.670401,14.588325,1.067728
Chemistry,401034,258478,2943,1771965,144980,859634,129507,912331,15473,1.138588,8.181877,15.065365,1.695985
Clinical Medicine,1051725,2472525,8012,18374693,687015,6017715,536665,12356978,150350,0.324041,3.73892,8.918086,1.216721
Computer Science,92399,191562,2309,676381,69246,294407,60606,381974,8640,1.205354,10.237721,20.585788,2.261934
Economics & Business,86408,326141,1841,1081441,27539,374871,17771,706570,9768,0.56448,2.54651,4.740564,1.382453
Engineering,316740,487948,4009,1708945,138947,746333,119545,962612,19402,0.821604,8.130572,16.017649,2.015558
Environment/Ecology,258850,620641,3951,3376896,117919,1088946,92694,2287950,25225,0.6366,3.491935,8.512268,1.102515
Geosciences,161684,348299,2240,1606015,82966,557406,71858,1048609,11108,0.643126,5.165954,12.891501,1.059308


<div class="alert-warning">
    <strong>Warning:</strong> The export below is commented out so that reviewing the notebook does not generate a new version of the results file.
</div>

In [40]:
#データ_tw_men_esi_freq.to_csv('results/agg_bot_mentions_esi.tsv', sep='\t', index=True)

In [7]:
# Inspect the bot mentions in the 'Literature, Slavic' category, ordered by account
データ_tw_men_cat[(データ_tw_men_cat['subject_category']=='Literature, Slavic') & (データ_tw_men_cat.bot==1)].sort_values('Outlet or Author')

Unnamed: 0,Outlet or Author,External Mention ID,DOI,Details Page URL,Original,bot,subject_category
60123026,1298687874112671751,1440677040785530892,10.1016/j.ruslit.2021.07.005,https://www.altmetric.com/details/113867134,0,1,"Literature, Slavic"
60123027,1298687874112671751,1451090725546905604,10.1016/j.ruslit.2021.07.005,https://www.altmetric.com/details/113867134,0,1,"Literature, Slavic"
61559196,1298687874112671751,1445025693478293511,10.1016/j.ruslit.2021.07.010,https://www.altmetric.com/details/114474786,0,1,"Literature, Slavic"
63115716,1298687874112671751,1441008975646953473,10.1016/j.ruslit.2021.07.006,https://www.altmetric.com/details/113916638,0,1,"Literature, Slavic"
70687568,207581304,1425761644609773572,10.31860/0131-6095-2020-2-183-200,https://www.altmetric.com/details/111697951,1,1,"Literature, Slavic"
69643483,2924860811,1002604803199991809,10.1016/j.ruslit.2018.05.013,https://www.altmetric.com/details/43136895,0,1,"Literature, Slavic"
60123032,967503330066010112,1440681442795667459,10.1016/j.ruslit.2021.07.005,https://www.altmetric.com/details/113867134,0,1,"Literature, Slavic"
61559200,967503330066010112,1445030082930692097,10.1016/j.ruslit.2021.07.010,https://www.altmetric.com/details/114474786,0,1,"Literature, Slavic"
63115719,967503330066010112,1441013598122283012,10.1016/j.ruslit.2021.07.006,https://www.altmetric.com/details/113916638,0,1,"Literature, Slavic"
60123033,987712090071814144,1440679519229136916,10.1016/j.ruslit.2021.07.005,https://www.altmetric.com/details/113867134,0,1,"Literature, Slavic"
