## Preliminar steps

In [327]:
import pandas as pd
import numpy as np

import os
import re

import datetime
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

pd.set_option('max_colwidth', 280) # For printing the full text of the twee

In [328]:
# List of the file anmes to load (the ones that finish in s10.csv) from the data folder
files = [f for f in os.listdir('data/') if re.match(r'.*\_s10.csv', f)]
files

['tweets_2110_s10.csv',
 'tweets_2109_s10.csv',
 'tweets_2108_s10.csv',
 'tweets_2107_s10.csv']

In [329]:
data = [pd.read_csv('data/' + df) for df in files] # load the dataframes in a list
data = pd.concat(data, axis = 0) # concatenates the files to get and unique file
print(data.shape)

(650833, 20)


## Exploring some probable filters to apply on the dataset

### Searching for spam tweets

Twitter is widely used for bots and other types of users to use spam


Given that nowadays creating a new project of cryptocurrencies has became easy, many new projects are created every day and twitter is used as a way to spread information about the new project. Spammers try to use popular hashtags or cashtags such as Bitcoin to reach some public interested in investing in new projects that is promoted as the next bitcoin, for its capacity to increase its price in the future.

Common words on this regard are: amazing/excellent/amazing project. Here some examples of tweets wi

In [330]:
# Some examples of spam tweets
data['text_l'] = data['text'].str.lower()
data[data['text_l'].str.match(r'^(?=.*\b(?:best|strong|awesome|amazing|great|big|excellent|good|nice)\b).*project')].sample(5)

Unnamed: 0.1,Unnamed: 0,id,conversation_id,author_id,created_at,text,lang,source,public_metrics.like_count,public_metrics.quote_count,...,public_metrics.retweet_count,entities.annotations,entities.cashtags,author.username,author.public_metrics.followers_count,author.public_metrics.following_count,author.public_metrics.listed_count,geo.country,geo.country_code,text_l
75201,582767,1428301324362428418,1428288993184174081,1412799954058244102,2021-08-19T10:22:26.000Z,@AirdropStario good project \n@kmmanik33 \n@HaridashPaul \n@KSolymon \n\n#cryptocurrency #Crypto #Bitcoin #Ethereum #ETH #Airdrop #bounty #BSC #BNB #ALPHADOGE,en,Twitter for Android,0,0,...,0,,,KingSoly9,143,3380,15,,,@airdropstario good project \n@kmmanik33 \n@haridashpaul \n@ksolymon \n\n#cryptocurrency #crypto #bitcoin #ethereum #eth #airdrop #bounty #bsc #bnb #alphadoge
99566,372733,1418833057407062016,1418831910189486081,1413861853189394435,2021-07-24T07:18:56.000Z,Great Project. Keep it up team.\n\n@joyBasa27074877\n\n@Amit32116\n\n@sweetsh93416220\n\n#KITE #Airdrop #Binance #Bitcoin #BakerySwap #BNB #PanCakeSwap #cryptocurrency #NFT #DeFi #BTC #BinanceSmartChain #BSC https://t.co/ye9giIe9Za…\nQuote Tweet,en,Twitter Web App,0,0,...,0,,,Mamun90508981,16,446,1,,,great project. keep it up team.\n\n@joybasa27074877\n\n@amit32116\n\n@sweetsh93416220\n\n#kite #airdrop #binance #bitcoin #bakeryswap #bnb #pancakeswap #cryptocurrency #nft #defi #btc #binancesmartchain #bsc https://t.co/ye9giie9za…\nquote tweet
24158,637545,1439050594526711809,1438836691503132673,772629764011995138,2021-09-18T02:16:12.000Z,@keplerswap @airdropinspect Good project and solid team❤️❤️❤️ i think in the near future i will see an unprecedentet growth of this project 🚀🚀🚀\n\n@anambrother \n@kusumayanto1 \n@MemedHariadi \n\n#BTC #ETH #KeplerSwap #SDS #Airdrops #DeFi,en,Twitter for Android,0,0,...,0,,,tonisunduk_tony,92,1741,5,,,@keplerswap @airdropinspect good project and solid team❤️❤️❤️ i think in the near future i will see an unprecedentet growth of this project 🚀🚀🚀\n\n@anambrother \n@kusumayanto1 \n@memedhariadi \n\n#btc #eth #keplerswap #sds #airdrops #defi
50998,180081,1442236528168562693,1441734385397428228,1394471630378651649,2021-09-26T21:15:58.000Z,"@OfficialThemis with the opportunity to take part in this airdrop, I am very enthusiastic because this is an excellent project, let's take it to the moon.\n\n@rrahiqo \n@najwazakiyyah20 \n@iqbalwarwerwor \n\n#Airdrops #ThemisProtocol #Bitcoin #AirdropDet #crypto #Blockchain",en,Twitter for Android,0,0,...,0,,,PurnaasihEri,170,2106,9,,,"@officialthemis with the opportunity to take part in this airdrop, i am very enthusiastic because this is an excellent project, let's take it to the moon.\n\n@rrahiqo \n@najwazakiyyah20 \n@iqbalwarwerwor \n\n#airdrops #themisprotocol #bitcoin #airdropdet #crypto #blockchain"
89295,65498,1431984901008629762,1431974512006873089,1410811107011223555,2021-08-29T14:19:40.000Z,@CoinRizen Excellent and great projects...\nI have participated in following the guidelines and rules of this airdrop.I am sure that it will develop and be successful in the future.\n\n@Rajvai22\n@ray84_x\n@mo\n\n#Airdrops #Bitcoin #crypto #Blockchain #RizenCoin,en,Twitter for Android,0,0,...,0,,,NewRX8,137,1041,4,,,@coinrizen excellent and great projects...\ni have participated in following the guidelines and rules of this airdrop.i am sure that it will develop and be successful in the future.\n\n@rajvai22\n@ray84_x\n@mo\n\n#airdrops #bitcoin #crypto #blockchain #rizencoin


Similarly Airdrop is another token that is linked with the development of a new crypto project

Given that usually tweets associated with the promotion of a new project (airdrop) are highly possitive, this can bias our measure of sentiments over Bitcoin. For this reason we delete the tweet that contain a positive adjective before the work project. The same way we delete tweets with

In [331]:
# adjective project
data['delete_project'] = np.where(data['text_l'].str.contains(r'^(?=.*\b(?:best|strong|awesome|amazing|great|big|excellent|good|nice)\b).*project'), 1, 0)

# Airdrop word
data['delete_airdrop'] = np.where(data['text_l'].str.contains(r'airdrop'), 1, 0)

print('Proportion of positive project tweets: ', data['delete_project'].mean())
print('Proportion of tweets with the word aidrop:  ',data['delete_airdrop'].mean())

Proportion of positive project tweets:  0.14631710438776152
Proportion of tweets with the word aidrop:   0.1963237881299811


Another easy way to identify tweets is by searchin with the word referral since those tweets are also about some people trying  capture other to be part of a investment and potencial scam. See the follow examples.

In [332]:
data[data['text'].str.contains('referral')].sample(5)

Unnamed: 0.1,Unnamed: 0,id,conversation_id,author_id,created_at,text,lang,source,public_metrics.like_count,public_metrics.quote_count,...,entities.cashtags,author.username,author.public_metrics.followers_count,author.public_metrics.following_count,author.public_metrics.listed_count,geo.country,geo.country_code,text_l,delete_project,delete_airdrop
124960,1319789,1411748935249563656,1411748935249563656,617853906,2021-07-04T18:09:09.000Z,#freelance #freelancer #businessgrowth #grow #build #affiliatemarketing Check my website for #affiliate #referral #cashback #crypto #shopping #travel #business #dogecoin #bitcoin https://t.co/vBHLD11zwQ,en,ContentStudio.io,1,0,...,,bmurphypointman,78399,18419,546,,,#freelance #freelancer #businessgrowth #grow #build #affiliatemarketing check my website for #affiliate #referral #cashback #crypto #shopping #travel #business #dogecoin #bitcoin https://t.co/vbhld11zwq,0,0
181834,1445584,1446557616113586176,1446557616113586176,347865202,2021-10-08T19:26:26.000Z,@airdropinspect \n\n@airdropinspect\n#Bitcoin\nChannel: https://t.co/LYIc063wan\n\nBuild bots with the referral system\nPromote Airdrop/Bounty/Giveaway/Referral Link\n\nContact: https://t.co/a49FMbzsFa,en,Twitter Web App,0,0,...,,ErnestWeudji,21,426,1,,,@airdropinspect \n\n@airdropinspect\n#bitcoin\nchannel: https://t.co/lyic063wan\n\nbuild bots with the referral system\npromote airdrop/bounty/giveaway/referral link\n\ncontact: https://t.co/a49fmbzsfa,0,1
121151,423129,1429567107629473792,1429567107629473792,617853906,2021-08-22T22:12:13.000Z,#Onlinemarketing #NetworkingPays #Online #Marketing #Networking #incomeforlife #payout #affiliatemarketing Check my #website #makemoney #earnmoney #affiliate #referral #cashback #crypto #shopping #travel #business #bitcoin #dogecoin https://t.co/vBHLD11zwQ,en,ContentStudio.io,0,0,...,,bmurphypointman,78402,18419,546,,,#onlinemarketing #networkingpays #online #marketing #networking #incomeforlife #payout #affiliatemarketing check my #website #makemoney #earnmoney #affiliate #referral #cashback #crypto #shopping #travel #business #bitcoin #dogecoin https://t.co/vbhld11zwq,0,0
90140,1269509,1412155864350347272,1412155864350347272,617853906,2021-07-05T21:06:09.000Z,"#Smart #seamless #freelancer #tools to #build, #manage, and #grow your #business #affiliatemarketing Check my #website #makemoney #earnmoney #affiliate #referral #cashback #crypto #shopping #travel #business #bitcoin #dogecoin https://t.co/vBHLD11zwQ",en,ContentStudio.io,1,0,...,,bmurphypointman,78400,18419,546,,,"#smart #seamless #freelancer #tools to #build, #manage, and #grow your #business #affiliatemarketing check my #website #makemoney #earnmoney #affiliate #referral #cashback #crypto #shopping #travel #business #bitcoin #dogecoin https://t.co/vbhld11zwq",0,0
27451,358577,1430048931996598274,1430048931996598274,990831118072078341,2021-08-24T06:06:49.000Z,"#POND / #USDT acheived all take profit targets.\n\n#Congratulations.\n\nToday Accuracy So Far=100.0%, Qty =6 Signals\n\nYesterday Accuracy=78.95%, Qty =19 Signals\n\nMake money via our #trading and #referral plans (link in profile).\n\n#cryptocurrencies #Bitcoin #Binance #Alt...",en,cryptoscalper,0,0,...,,scalperSignals,1535,54,23,,,"#pond / #usdt acheived all take profit targets.\n\n#congratulations.\n\ntoday accuracy so far=100.0%, qty =6 signals\n\nyesterday accuracy=78.95%, qty =19 signals\n\nmake money via our #trading and #referral plans (link in profile).\n\n#cryptocurrencies #bitcoin #binance #alt...",0,0


Usually tweets that contain the word refferal are also spam, so lets include this as a term to delete tweets

In [333]:
data['delete_referral'] = np.where(data['text_l'].str.contains(r'referral'), 1, 0)
print('Proportion of tweets with the word referral:  ',data['delete_referral'].mean())

Proportion of tweets with the word referral:   0.0429910591503504


Now lets see what are some examples of tweets to delete and to keep

Another common word is marketing. So lets do the same process as above.

In [334]:
data['delete_marketing'] = np.where(data['text_l'].str.contains(r'marketing'), 1, 0)
print('Proportion of tweets with the word marketing:  ',data['delete_marketing'].mean())

Proportion of tweets with the word marketing:   0.06402256800131524


In [335]:
# Example of Tweets to remove
data[(data['delete_airdrop'] == 1) | (data['delete_referral'] == 1) | (data['delete_project'] == 1) | (data['delete_marketing'] == 1)].sample(5)

Unnamed: 0.1,Unnamed: 0,id,conversation_id,author_id,created_at,text,lang,source,public_metrics.like_count,public_metrics.quote_count,...,author.public_metrics.followers_count,author.public_metrics.following_count,author.public_metrics.listed_count,geo.country,geo.country_code,text_l,delete_project,delete_airdrop,delete_referral,delete_marketing
49380,7419,1421210904750919684,1421210904750919684,1389184389406470150,2021-07-30T20:47:39.000Z,Airdrop #telegram #btc @BabyDogeArt #bsc #follow\n👇👇👇\n\nhttps://t.co/03W5O1UlF7,en,Twitter for Android,0,0,...,14,220,1,,,airdrop #telegram #btc @babydogeart #bsc #follow\n👇👇👇\n\nhttps://t.co/03w5o1ulf7,0,1,0,0
118704,670606,1416753888569401345,1416709761777618946,1297815908052738048,2021-07-18T13:37:03.000Z,@airdropinspect Thanks for giving us such a great opportunity. I am supporting it. always success for the development team to the moon Nice project.\n@alif6244 \n@vkatikk\n@WhiteAp79686089\n#Airdrop #Airdrops #Airdropinspector #AntEx #ANT #VNDT #Crypto #Bitcoin,en,Twitter for Android,0,0,...,45,1657,5,,,@airdropinspect thanks for giving us such a great opportunity. i am supporting it. always success for the development team to the moon nice project.\n@alif6244 \n@vkatikk\n@whiteap79686089\n#airdrop #airdrops #airdropinspector #antex #ant #vndt #crypto #bitcoin,1,1,0,0
69461,513532,1417889043786833921,1417889043786833921,1386566605983739904,2021-07-21T16:47:45.000Z,Awesome project\n\n@QuantumLeapers @PhongLee86 @RiskAc13 \n\n#Airdrop #Airdrops #Airdropinspector #Larix #LARIX #Solana #SPL #Sollet #Crypto #Bitcoin https://t.co/08z1HXKEXj,en,Twitter Web App,0,0,...,132,910,1,,,awesome project\n\n@quantumleapers @phonglee86 @riskac13 \n\n#airdrop #airdrops #airdropinspector #larix #larix #solana #spl #sollet #crypto #bitcoin https://t.co/08z1hxkexj,1,1,0,0
67122,1182553,1424344228402970625,1424344228402970625,1293420361762009088,2021-08-08T12:18:21.000Z,#makemoneyonline #affiliatemarketing #networkmarketing #digitalmarketing #makemoney #socialmedia #affiliatemarketing #gift #shopping #gifts #onlineshopping #dogecoin #bitcoin free shipping on purchases over $25 at #Amazon https://t.co/E0egX01ArP,en,ContentStudio.io,0,0,...,475,333,7,,,#makemoneyonline #affiliatemarketing #networkmarketing #digitalmarketing #makemoney #socialmedia #affiliatemarketing #gift #shopping #gifts #onlineshopping #dogecoin #bitcoin free shipping on purchases over $25 at #amazon https://t.co/e0egx01arp,0,0,0,1
81556,672180,1438811977099341830,1438440496028868608,1417132958570356736,2021-09-17T10:28:01.000Z,"@AirdropStario I really like this airdrop, it's like a good project and in the future it will be a big project and fly high to the moon, congratulations \n@SatifFahreza \n@iv_secret \n@Muldans98 \n\n#cryptocurrency #Airdrop #BSC #Bitcoin #ETH #Amplify #AMPT #Airdrop...",en,Twitter for Android,0,0,...,83,1260,1,,,"@airdropstario i really like this airdrop, it's like a good project and in the future it will be a big project and fly high to the moon, congratulations \n@satiffahreza \n@iv_secret \n@muldans98 \n\n#cryptocurrency #airdrop #bsc #bitcoin #eth #amplify #ampt #airdrop...",1,1,0,0


In [336]:
# Examples of tweets after removing
data[(data['delete_airdrop'] == 0) & (data['delete_referral'] == 0) & (data['delete_project'] == 0) | (data['delete_marketing'] == 0)].sample(5)

Unnamed: 0.1,Unnamed: 0,id,conversation_id,author_id,created_at,text,lang,source,public_metrics.like_count,public_metrics.quote_count,...,author.public_metrics.followers_count,author.public_metrics.following_count,author.public_metrics.listed_count,geo.country,geo.country_code,text_l,delete_project,delete_airdrop,delete_referral,delete_marketing
83577,1160072,1424529982974578692,1424529982974578692,261960635,2021-08-09T00:36:29.000Z,Bitcoin can't be viewed as an untraceable 'crime coin' anymore #Bitcoin via https://t.co/0NnSfBI7JQ https://t.co/3Is8FmnhuW,en,TwinyBots,0,0,...,575,0,8,,,bitcoin can't be viewed as an untraceable 'crime coin' anymore #bitcoin via https://t.co/0nnsfbi7jq https://t.co/3is8fmnhuw,0,0,0,0
143306,86789,1442885284174123008,1442885284174123008,1357886427770535938,2021-09-28T16:13:53.000Z,"It is a really LEGIT project. This is an excellent iniative with super technical team. guys let's go support this project,next big project in crypto World🔥🔥\n\n@Poll07647395\n@Eman27791242\n@Bhupindrasahu5\n\n#Airdrop #SINGIN #Bitcoin #AirdropDet #NFT #BSC #Crypto #Blockchain...",en,Twitter for Android,0,0,...,14,260,9,,,"it is a really legit project. this is an excellent iniative with super technical team. guys let's go support this project,next big project in crypto world🔥🔥\n\n@poll07647395\n@eman27791242\n@bhupindrasahu5\n\n#airdrop #singin #bitcoin #airdropdet #nft #bsc #crypto #blockchain...",1,1,0,0
77918,921073,1437048689411903496,1435817095988609024,1428326174556577800,2021-09-12T13:41:21.000Z,"@frxresearch @DerivedFinance is a decentralised multi chain assets protocols that features farming #staking/minting, fees and governance.\n#derived aims to propel the usage of synthetic assets in #polkadot.\ncheck out @DerivedFinance \n#bitcoin #cryptocurrency #DEX $DVD http...",en,Twitter for Android,0,0,...,84,187,7,,,"@frxresearch @derivedfinance is a decentralised multi chain assets protocols that features farming #staking/minting, fees and governance.\n#derived aims to propel the usage of synthetic assets in #polkadot.\ncheck out @derivedfinance \n#bitcoin #cryptocurrency #dex $dvd http...",0,0,0,0
36382,181212,1431067980754927619,1431067980754927619,1200994323141054466,2021-08-27T01:36:09.000Z,POCO is a good project &amp; have a strong team. I think in the near future I will see an unprecedented growth of this project. Let's build a good and strong community.\n\n@fiqrinurhafizh\n@gepsif\n@rExile_0\n \n#cryptocurrency #Bitcoin #Airdrop #nftgames #PocoLand https://...,en,Twitter for Android,0,0,...,5,266,0,,,poco is a good project &amp; have a strong team. i think in the near future i will see an unprecedented growth of this project. let's build a good and strong community.\n\n@fiqrinurhafizh\n@gepsif\n@rexile_0\n \n#cryptocurrency #bitcoin #airdrop #nftgames #pocoland https://...,1,1,0,0
127910,220784,1419875233884819456,1419875233884819456,1419201618357100544,2021-07-27T04:20:10.000Z,"All coin follow #BTC everyday was dump pump dump pump, all people can predicted when buying or selling. If any take HODL, long time to be rich. #xrp #dogecoin #ETHEREUM #TFUEL",en,Twitter for Android,0,0,...,14,125,3,,,"all coin follow #btc everyday was dump pump dump pump, all people can predicted when buying or selling. if any take hodl, long time to be rich. #xrp #dogecoin #ethereum #tfuel",0,0,0,0


### Searching for spam accounts

Another common way to filter tweets is by taking into account the user variables, ofter new accounts are created to being spam for this reason 

In [337]:
data[data['author.public_metrics.followers_count'] < 50].sample(5)

Unnamed: 0.1,Unnamed: 0,id,conversation_id,author_id,created_at,text,lang,source,public_metrics.like_count,public_metrics.quote_count,...,author.public_metrics.followers_count,author.public_metrics.following_count,author.public_metrics.listed_count,geo.country,geo.country_code,text_l,delete_project,delete_airdrop,delete_referral,delete_marketing
54686,742665,1438291282221092865,1438291282221092865,1403583480827498497,2021-09-15T23:58:58.000Z,Good projects and i Will be supported on this projects\n\n@nraykov66\n \n@coin_running\n \n@rasita\n\n#Airdrop #Airdrops #Airdropinspector #BinanceSmartChain #BSC #HOS #FastHorse #Crypto #Bitcoin https://t.co/DnIxRiB3iT,en,Twitter Web App,0,0,...,12,303,0,,,good projects and i will be supported on this projects\n\n@nraykov66\n \n@coin_running\n \n@rasita\n\n#airdrop #airdrops #airdropinspector #binancesmartchain #bsc #hos #fasthorse #crypto #bitcoin https://t.co/dnixrib3it,1,1,0,0
148755,1425276,1446738430717685763,1446738430717685763,1303152744807583744,2021-10-09T07:24:55.000Z,@Lonesai6 @Nuopie1 @ChueCrypto great project #BTC #Solana #Bsc #ETH #Polygon #Polkadot #HecoChain #Launchpad #investing #DeFi https://t.co/lVHbYyb3mB,en,Twitter Web App,0,0,...,23,1155,4,,,@lonesai6 @nuopie1 @chuecrypto great project #btc #solana #bsc #eth #polygon #polkadot #hecochain #launchpad #investing #defi https://t.co/lvhbyyb3mb,1,0,0,0
97459,1481501,1422211150691799044,1421926394238439425,1414603696999043082,2021-08-02T15:02:16.000Z,@project_euro Potential project and I know this Project will success\nI believe this is a faithful project.\n\n@AtiKHaS45145424\n@Joysaha07526272\n@Aiden5_31\n\n#beuropro #bitcoin #btc #cryptocurrency #cryptonews \n#finance #forex #invest #investing #investment \n#investor #t...,en,Twitter for Android,0,0,...,20,741,2,,,@project_euro potential project and i know this project will success\ni believe this is a faithful project.\n\n@atikhas45145424\n@joysaha07526272\n@aiden5_31\n\n#beuropro #bitcoin #btc #cryptocurrency #cryptonews \n#finance #forex #invest #investing #investment \n#investor #t...,0,0,0,0
59141,159287,1420318395573014534,1420318395573014534,1372834025715343360,2021-07-28T09:41:08.000Z,I believe this is a faithful project.The projector has a lot of attractions so hopefully the project will be better in the future and will be the best.\n@asepyusuf\n@Agunghasanu271 \n@CRDTpay \n\n#bsc #BTC #Airdrop https://t.co/b3BEadyBhM,en,Twitter for Android,0,0,...,49,2459,8,,,i believe this is a faithful project.the projector has a lot of attractions so hopefully the project will be better in the future and will be the best.\n@asepyusuf\n@agunghasanu271 \n@crdtpay \n\n#bsc #btc #airdrop https://t.co/b3beadybhm,1,1,0,0
99348,1382359,1447061289717620739,1447061233723711488,1446670186677997575,2021-10-10T04:47:51.000Z,"@Co2Heal https://t.co/o92SE3U2J3\n\nI'm farming my bitcoin on this app, feels good to invest in #shibainu with this application, do like me i do not regret since the first day.\n\n#shib #NIO #shibainu #MOASS #GME #AMCSTRONG #AMCSqueeze #AMCtothemoon #Bitcoin #etherium https:...",en,Twitter Web App,0,0,...,0,5,0,,,"@co2heal https://t.co/o92se3u2j3\n\ni'm farming my bitcoin on this app, feels good to invest in #shibainu with this application, do like me i do not regret since the first day.\n\n#shib #nio #shibainu #moass #gme #amcstrong #amcsqueeze #amctothemoon #bitcoin #etherium https:...",0,0,0,0


In [338]:
data['delete_followers'] = np.where(data['author.public_metrics.followers_count'] < 30, 1, 0)
print(data['delete_followers'].mean())

0.17034477354405816


In [339]:
### Count and delete the spam tweets
print('Project:', data['delete_project'].sum(), data['delete_project'].mean())
print('Airdrop:', data['delete_airdrop'].sum(), data['delete_airdrop'].mean())
print('Referal:', data['delete_referral'].sum(), data['delete_referral'].mean())
print('Followers:', data['delete_followers'].sum(), data['delete_followers'].mean())


ini_number = data.shape[0]
print('Initial number of tweets:', ini_number)

data = data[(data['delete_airdrop'] == 0) & (data['delete_referral'] == 0) & (data['delete_project'] == 0) & (data['delete_marketing'] == 0) & (data['delete_followers'] == 0)].copy()
fin_number = data.shape[0]

print('Final number of tweets:', fin_number)
print('Total tweets removed:', ini_number - fin_number,  fin_number/ini_number)

Project: 95228 0.14631710438776152
Airdrop: 127774 0.1963237881299811
Referal: 27980 0.0429910591503504
Followers: 110866 0.17034477354405816
Initial number of tweets: 650833
Final number of tweets: 384647
Total tweets removed: 266186 0.5910072169051047


In [341]:
# Summary of the code to delete tweets
data['text_l'] = data['text'].str.lower()

data['delete_project'] = np.where(data['text_l'].str.contains(r'^(?=.*\b(?:best|strong|awesome|amazing|great|big|excellent|good|nice)\b).*project'), 1, 0)
data['delete_airdrop'] = np.where(data['text_l'].str.contains(r'airdrop'), 1, 0)
data['delete_referral'] = np.where(data['text_l'].str.contains(r'referral'), 1, 0)
data['delete_marketing'] = np.where(data['text_l'].str.contains(r'marketing'), 1, 0)
data['delete_followers'] = np.where(data['author.public_metrics.followers_count'] < 30, 1, 0)

data = data[(data['delete_airdrop'] == 0) & (data['delete_referral'] == 0) & (data['delete_project'] == 0) & (data['delete_marketing'] == 0) & (data['delete_followers'] == 0)].copy()
print(data.shape)

(384647, 26)
