For this notebook to work, download the dataset from [this link](https://www.kaggle.com/datasets/prathamsharma123/farmers-protest-tweets-dataset-raw-json) and place the downloaded file in the `dataset` folder without unzipping it. The loading cell expects it at `dataset/archive.zip`.
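A quick check before loading can save a confusing traceback later. The sketch below is not part of the original notebook; `check_dataset` is a hypothetical helper that only confirms the archive is where the loading cell expects it and lists its contents. pandas can read a zip archive directly only when it contains a single data file, so the returned list should have one entry.

```python
from pathlib import Path
import zipfile


def check_dataset(path="dataset/archive.zip"):
    """Return the file names inside the archive, or raise if it is missing or not a zip.

    Hypothetical helper: the default path is the one used by the loading cell.
    """
    archive = Path(path)
    if not archive.is_file():
        raise FileNotFoundError(f"Expected the Kaggle download at {archive}")
    if not zipfile.is_zipfile(archive):
        raise ValueError(f"{archive} exists but is not a zip archive")
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()
```

Calling `check_dataset()` with no arguments checks the default location and returns the names of the files stored in the archive.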

Next, the dataset is imported into a pandas DataFrame and the data is cleaned. The code was taken from https://www.kaggle.com/code/prathamsharma123/clean-raw-json-tweets-data

In [1]:
import pandas as pd
from pandas import json_normalize  # public location since pandas 1.0 (was pandas.io.json)
import warnings
warnings.filterwarnings("ignore")

The following step can take several minutes.

In [2]:
raw_tweets = pd.read_json(r'dataset/archive.zip', lines=True)
raw_tweets = raw_tweets[raw_tweets['lang']=='en']
print("Shape: ", raw_tweets.shape)
raw_tweets.head(5)

Shape:  (417511, 21)


Unnamed: 0,url,date,content,renderedContent,id,user,outlinks,tcooutlinks,replyCount,retweetCount,...,quoteCount,conversationId,lang,source,sourceUrl,sourceLabel,media,retweetedTweet,quotedTweet,mentionedUsers
0,https://twitter.com/ShashiRajbhar6/status/1376...,2021-03-30 03:33:46+00:00,Support 👇\n\n#FarmersProtest,Support 👇\n\n#FarmersProtest,1376739399593910273,"{'username': 'ShashiRajbhar6', 'displayname': ...",[],[],0,0,...,0,1376739399593910273,en,"<a href=""http://twitter.com/download/android"" ...",http://twitter.com/download/android,Twitter for Android,,,,
1,https://twitter.com/kaursuk06272818/status/137...,2021-03-30 03:33:23+00:00,Supporting farmers means supporting our countr...,Supporting farmers means supporting our countr...,1376739306287427584,"{'username': 'kaursuk06272818', 'displayname':...",[],[],0,0,...,0,1376739306287427584,en,"<a href=""http://twitter.com/download/android"" ...",http://twitter.com/download/android,Twitter for Android,[{'previewUrl': 'https://pbs.twimg.com/media/E...,,,
2,https://twitter.com/kaursuk06272818/status/137...,2021-03-30 03:31:00+00:00,Support farmers if you are related to food #St...,Support farmers if you are related to food #St...,1376738704128020488,"{'username': 'kaursuk06272818', 'displayname':...",[],[],0,0,...,0,1376738704128020488,en,"<a href=""http://twitter.com/download/android"" ...",http://twitter.com/download/android,Twitter for Android,[{'previewUrl': 'https://pbs.twimg.com/media/E...,,,
3,https://twitter.com/SukhdevSingh_/status/13767...,2021-03-30 03:30:45+00:00,#StopHateAgainstFarmers support #FarmersProtes...,#StopHateAgainstFarmers support #FarmersProtes...,1376738640542400518,"{'username': 'SukhdevSingh_', 'displayname': '...",[],[],0,1,...,0,1376738640542400518,en,"<a href=""http://twitter.com/download/android"" ...",http://twitter.com/download/android,Twitter for Android,,,,
4,https://twitter.com/Davidmu66668113/status/137...,2021-03-30 03:30:30+00:00,"You hate farmers I hate you, \nif you love the...","You hate farmers I hate you, \nif you love the...",1376738579171344386,"{'username': 'Davidmu66668113', 'displayname':...",[],[],0,0,...,0,1376738579171344386,en,"<a href=""http://twitter.com/download/android"" ...",http://twitter.com/download/android,Twitter for Android,,,,
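If holding the whole file in memory at once is a problem, the same language filter can be applied while reading in chunks. This is a minimal sketch under assumptions, not the method used above: `read_english_tweets` and the `chunksize` value are made up for illustration, and chunked parsing may infer column dtypes per chunk. It bounds peak memory; it is not expected to make parsing faster.

```python
import pandas as pd


def read_english_tweets(path, chunksize=50_000):
    """Read a JSON Lines file chunk by chunk, keeping only rows with lang == 'en'.

    Hypothetical helper: `chunksize` is an arbitrary illustrative value.
    """
    kept = []
    # With lines=True and a chunksize, read_json returns an iterator of DataFrames
    with pd.read_json(path, lines=True, chunksize=chunksize) as reader:
        for chunk in reader:
            kept.append(chunk[chunk["lang"] == "en"])
    return pd.concat(kept)
```

With the dataset in place, `read_english_tweets('dataset/archive.zip')` would be the chunked counterpart of the cell above.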


In [3]:
# Extract the numeric user id from the nested `user` dict of each tweet
raw_tweets = raw_tweets.assign(userId=[user['id'] for user in raw_tweets['user']])

# Keep only the columns needed for the analysis
cols = ['url', 'date', 'renderedContent', 'id', 'userId', 'replyCount', 'retweetCount', 'likeCount', 'quoteCount', 'source', 'media', 'retweetedTweet', 'quotedTweet', 'mentionedUsers']
# Select and rename in one step so `tweets` is a new DataFrame rather than a slice of raw_tweets
tweets = raw_tweets[cols].rename(columns={'id':'tweetId', 'url':'tweetUrl'})
tweets.head(5)

Unnamed: 0,tweetUrl,date,renderedContent,tweetId,userId,replyCount,retweetCount,likeCount,quoteCount,source,media,retweetedTweet,quotedTweet,mentionedUsers
0,https://twitter.com/ShashiRajbhar6/status/1376...,2021-03-30 03:33:46+00:00,Support 👇\n\n#FarmersProtest,1376739399593910273,1015969769760096256,0,0,0,0,"<a href=""http://twitter.com/download/android"" ...",,,,
1,https://twitter.com/kaursuk06272818/status/137...,2021-03-30 03:33:23+00:00,Supporting farmers means supporting our countr...,1376739306287427584,1332937272581263362,0,0,0,0,"<a href=""http://twitter.com/download/android"" ...",[{'previewUrl': 'https://pbs.twimg.com/media/E...,,,
2,https://twitter.com/kaursuk06272818/status/137...,2021-03-30 03:31:00+00:00,Support farmers if you are related to food #St...,1376738704128020488,1332937272581263362,0,0,0,0,"<a href=""http://twitter.com/download/android"" ...",[{'previewUrl': 'https://pbs.twimg.com/media/E...,,,
3,https://twitter.com/SukhdevSingh_/status/13767...,2021-03-30 03:30:45+00:00,#StopHateAgainstFarmers support #FarmersProtes...,1376738640542400518,1308356658582618112,0,1,3,0,"<a href=""http://twitter.com/download/android"" ...",,,,
4,https://twitter.com/Davidmu66668113/status/137...,2021-03-30 03:30:30+00:00,"You hate farmers I hate you, \nif you love the...",1376738579171344386,1357311756532649985,0,0,1,0,"<a href=""http://twitter.com/download/android"" ...",,,,
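The `user` column holds one dict per tweet, and the cell above pulls out only its `id`. When more of those nested fields are needed, `pd.json_normalize` can flatten the dicts into columns. The sketch below uses made-up rows, keeping only the `id` and `username` keys visible in the output above; the `user_` prefix is an arbitrary choice.

```python
import pandas as pd

# Made-up sample mimicking the shape of the `user` column
sample = pd.DataFrame({
    "id": [101, 102],
    "user": [
        {"id": 1, "username": "alice"},
        {"id": 2, "username": "bob"},
    ],
})

# Flatten the list of dicts into one column per key
users = pd.json_normalize(sample["user"].tolist())
# Align with the original index so the join matches row by row
users.index = sample.index
# Prefix the new columns to avoid clashing with the tweet's own `id`
flat = sample.drop(columns="user").join(users.add_prefix("user_"))
print(flat)
```

Resetting `users.index` matters on the real data, because the filtered DataFrame no longer has a contiguous 0..n-1 index.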


In [4]:
# Drop repeated tweets, keeping the first occurrence of each tweetId
tweets = tweets.drop_duplicates(subset=['tweetId'])
print("Shape: ", tweets.shape)
tweets.head(5)

Shape:  (417511, 14)


Unnamed: 0,tweetUrl,date,renderedContent,tweetId,userId,replyCount,retweetCount,likeCount,quoteCount,source,media,retweetedTweet,quotedTweet,mentionedUsers
0,https://twitter.com/ShashiRajbhar6/status/1376...,2021-03-30 03:33:46+00:00,Support 👇\n\n#FarmersProtest,1376739399593910273,1015969769760096256,0,0,0,0,"<a href=""http://twitter.com/download/android"" ...",,,,
1,https://twitter.com/kaursuk06272818/status/137...,2021-03-30 03:33:23+00:00,Supporting farmers means supporting our countr...,1376739306287427584,1332937272581263362,0,0,0,0,"<a href=""http://twitter.com/download/android"" ...",[{'previewUrl': 'https://pbs.twimg.com/media/E...,,,
2,https://twitter.com/kaursuk06272818/status/137...,2021-03-30 03:31:00+00:00,Support farmers if you are related to food #St...,1376738704128020488,1332937272581263362,0,0,0,0,"<a href=""http://twitter.com/download/android"" ...",[{'previewUrl': 'https://pbs.twimg.com/media/E...,,,
3,https://twitter.com/SukhdevSingh_/status/13767...,2021-03-30 03:30:45+00:00,#StopHateAgainstFarmers support #FarmersProtes...,1376738640542400518,1308356658582618112,0,1,3,0,"<a href=""http://twitter.com/download/android"" ...",,,,
4,https://twitter.com/Davidmu66668113/status/137...,2021-03-30 03:30:30+00:00,"You hate farmers I hate you, \nif you love the...",1376738579171344386,1357311756532649985,0,0,1,0,"<a href=""http://twitter.com/download/android"" ...",,,,
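The row count is the same before and after this step (417511), which suggests that no English tweet shared a `tweetId` with another. To see what `drop_duplicates` does when duplicates exist, here is a small sketch on made-up data; counting with `duplicated` first is an optional extra, not something the original notebook does.

```python
import pandas as pd

# Made-up rows: tweetId 2 appears twice
toy = pd.DataFrame({
    "tweetId": [1, 2, 2, 3],
    "retweetCount": [5, 7, 9, 1],
})

# Count rows whose tweetId was already seen earlier in the frame
n_dupes = int(toy.duplicated(subset=["tweetId"]).sum())

# Keep the first row for each tweetId and drop the later repeats
deduped = toy.drop_duplicates(subset=["tweetId"])

print(n_dupes)
print(deduped)
```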


In [7]:
def tten_tweets():
    """Return the 10 tweets with the highest retweetCount; ties keep the earlier row."""
    return tweets.nlargest(10, "retweetCount", keep="first")

In [8]:
tten_tweets()

Unnamed: 0,tweetUrl,date,renderedContent,tweetId,userId,replyCount,retweetCount,likeCount,quoteCount,source,media,retweetedTweet,quotedTweet,mentionedUsers
408128,https://twitter.com/rihanna/status/13566258896...,2021-02-02 15:29:51+00:00,why aren’t we talking about this?! #FarmersPro...,1356625889602199552,79293791,163065,315547,944307,45832,"<a href=""http://twitter.com/download/iphone"" r...",,,,
395142,https://twitter.com/GretaThunberg/status/13566...,2021-02-02 20:04:01+00:00,We stand in solidarity with the #FarmersProtes...,1356694884615340037,1006419421244678144,49793,103957,319363,13815,"<a href=""http://twitter.com/download/iphone"" r...",,,,
266196,https://twitter.com/GretaThunberg/status/13572...,2021-02-04 10:59:01+00:00,I still #StandWithFarmers and support their pe...,1357282507616645122,1006419421244678144,39596,67694,234676,10587,"<a href=""http://twitter.com/download/iphone"" r...",,,,
366579,https://twitter.com/miakhalifa/status/13568483...,2021-02-03 06:14:01+00:00,"“Paid actors,” huh? Quite the casting director...",1356848397899112448,2835653131,15569,35921,139959,5681,"<a href=""http://twitter.com/download/iphone"" r...",[{'previewUrl': 'https://pbs.twimg.com/media/E...,,,
372793,https://twitter.com/miakhalifa/status/13568277...,2021-02-03 04:51:48+00:00,What in the human rights violations is going o...,1356827705161879553,2835653131,9082,26972,99227,4606,"<a href=""http://twitter.com/download/iphone"" r...",[{'previewUrl': 'https://pbs.twimg.com/media/E...,,,
314192,https://twitter.com/TeamJuJu/status/1357048037...,2021-02-03 19:27:19+00:00,"Happy to share that I’ve donated $10,000 to pr...",1357048037302960129,733170759829327874,7683,23251,59248,4082,"<a href=""http://twitter.com/download/iphone"" r...",,,,
215034,https://twitter.com/BobBlackman/status/1357755...,2021-02-05 18:19:19+00:00,There has been much social media coverage arou...,1357755699162398720,805185025,1845,20132,42779,1592,"<a href=""https://mobile.twitter.com"" rel=""nofo...",[{'previewUrl': 'https://pbs.twimg.com/media/E...,,,
398011,https://twitter.com/vanessa_vash/status/135668...,2021-02-02 19:09:23+00:00,Farmers feed the world. Fight for them. Protec...,1356681136655769605,1134059457191776257,1301,18744,67986,820,"<a href=""http://twitter.com/download/android"" ...",,,,
325261,https://twitter.com/kylekuzma/status/135700972...,2021-02-03 16:55:04+00:00,Should be talking about this! #FarmersProtest\...,1357009721090138112,272616327,4167,17368,39653,2505,"<a href=""http://twitter.com/download/iphone"" r...",,,,
163689,https://twitter.com/AmandaCerny/status/1359013...,2021-02-09 05:36:49+00:00,To all of my influencer/celeb friends- read up...,1359013362881994752,104856942,2028,15677,81375,813,"<a href=""http://twitter.com/download/iphone"" r...",,,,
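The same `nlargest` call can rank by any numeric column. The sketch below is an illustrative generalization, not part of the original notebook: `top_n` is a hypothetical helper that takes the DataFrame, the column, and the number of rows as parameters. The made-up data shows how `keep="first"` resolves ties.

```python
import pandas as pd


def top_n(df, column, n=10):
    """Return the n rows with the largest values in `column`; ties keep the earlier row."""
    return df.nlargest(n, column, keep="first")


# Made-up rows: tweets 2 and 3 tie on retweetCount
toy = pd.DataFrame({
    "tweetId": [1, 2, 3, 4],
    "retweetCount": [5, 9, 9, 1],
    "likeCount": [10, 3, 8, 20],
})

top1_retweets = top_n(toy, "retweetCount", n=1)  # the tie goes to the row that appears first
top2_likes = top_n(toy, "likeCount", n=2)
print(top1_retweets)
print(top2_likes)
```

On the real data, `top_n(tweets, "likeCount")` would rank by likes instead of retweets.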
