## Scrape Depressive tweets from Twitter for 2020

I would like to gather data from Twitter based on depressive hashtags, such as #depressed, #anxiety, #depression, #suicide, #mentalhealth, #loneliness, #hopelessness and #itsokaynottobeokay, and then apply various techniques to remove non-depressive messages. The result of this script is a dataset containing a filtered collection of potentially depressive tweets; the cells below run the search for #depressed only. All hashtags should also be removed from the tweets, so that a machine learning model cannot cheat by simply looking for depressive hashtags. The final dataset will be reviewed and labelled manually, so that both the depressive and the non-depressive messages in it are correctly marked.
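Stripping the hashtags can be done with a regular expression. The sketch below is one possible version of that step, not code from this notebook (which imports `re` but does not use it); the pattern and the `strip_hashtags` name are hypothetical. A hashtag is assumed to be `#` followed by any run of characters that are neither whitespace nor another `#`, so emoji hashtags such as `#😔🔫` are removed as well.

```python
import re

# Assumed pattern: "#" plus any run of non-whitespace, non-"#" characters,
# so emoji hashtags and back-to-back tags like "#a#b" are both caught.
HASHTAG_RE = re.compile(r"#[^\s#]+")
WHITESPACE_RE = re.compile(r"\s+")


def strip_hashtags(text: str) -> str:
    """Remove every hashtag token from a tweet and tidy the leftover spaces."""
    without_tags = HASHTAG_RE.sub("", text)
    return WHITESPACE_RE.sub(" ", without_tags).strip()


print(strip_hashtags("@Drewzz_ #depressed #😔🔫"))  # -> @Drewzz_
print(strip_hashtags("i can tell my bf is #depressed but is trying"))
# -> i can tell my bf is but is trying
```

The second example shows the trade-off of removing every hashtag: when a tag is part of the sentence, the word disappears too. An alternative is to drop only the `#` character for in-sentence tags. Either function could be applied to the `tweet` column with `.map(...)` before the dataset is saved.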

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
!pip install -U git+https://github.com/cyxv/twint.git@master

Collecting git+https://github.com/cyxv/twint.git@master
  Cloning https://github.com/cyxv/twint.git (to revision master) to /tmp/pip-req-build-zy5_r29h
  Running command git clone -q https://github.com/cyxv/twint.git /tmp/pip-req-build-zy5_r29h
Collecting dataclasses
  Downloading dataclasses-0.6-py3-none-any.whl (14 kB)
Building wheels for collected packages: twint
  Building wheel for twint (setup.py) ... done
  Created wheel for twint: filename=twint-2.1.21-py3-none-any.whl size=39036 sha256=94625f9586fbf7baf895b49099eab69d1be50b194410e335e23dc1a61cbd4c48
  Stored in directory: /tmp/pip-ephem-wheel-cache-ghrb08va/wheels/01/5d/f2/931521a600472aa963d669aa165dcafc68c3e208ee8c4b3f7c
Successfully built twint
Installing collected packages: dataclasses, twint
  Attempting uninstall: twint
    Found existing installation: twint 2.1.20
    Uninstalling twint-2.1.20:
      Successfully uninstalled twint-2.1.20
Successfully installed dataclasses-0.6 twint-2.1.21


In [2]:
!pip3 install -qq twint
!pip install -qq whatthelang

  Building wheel for twint (setup.py) ... done
  Building wheel for fake-useragent (setup.py) ... done
  Building wheel for googletransx (setup.py) ... done
  Building wheel for whatthelang (setup.py) ... done
  Building wheel for cysignals (setup.py) ... done
  Building wheel for pyfasttext (setup.py) ... done


In [3]:
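# Import libraries
import twint
import nest_asyncio
# Colab/Jupyter already runs an asyncio event loop; nest_asyncio allows
# twint's own asyncio calls to run nested inside it
nest_asyncio.apply()
import pandas as pd
import re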

In [23]:
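# Instantiate and configure the twint object
c = twint.Config()
c.Store_object = True    # keep the scraped tweets in memory
c.Pandas = True          # also collect them in twint.storage.panda.Tweets_df
c.Search = "#depressed"  # search query: only this one hashtag for now
c.Limit = 100000         # upper bound on the number of tweets to fetch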
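The title targets 2020 and the introduction lists eight hashtags, but the configuration above searches `#depressed` only and sets no date range, so the results come from October 2021. `twint.Config` has `Since` and `Until` attributes for a date window; the sketch below only assembles the values, with hypothetical helper names (`build_search_settings`, `apply_settings`). Joining the hashtags with ` OR ` relies on Twitter's search syntax and is an untested assumption here.

```python
# Hypothetical helpers: the hashtag list comes from the introduction; the
# " OR " query and the calendar-year window are assumptions, not settings
# used in this notebook.
HASHTAGS = [
    "#depressed", "#anxiety", "#depression", "#suicide",
    "#mentalhealth", "#loneliness", "#hopelessness", "#itsokaynottobeokay",
]


def build_search_settings(hashtags, year, limit=100000):
    """Return twint.Config attribute names mapped to the values to set."""
    return {
        "Search": " OR ".join(hashtags),  # match tweets carrying any of the hashtags
        "Since": f"{year}-01-01",         # start of the window
        "Until": f"{year + 1}-01-01",     # start of the following year
        "Limit": limit,
        "Store_object": True,
        "Pandas": True,
    }


def apply_settings(config, settings):
    """Copy each setting onto a config object (e.g. twint.Config()) and return it."""
    for name, value in settings.items():
        setattr(config, name, value)
    return config


settings = build_search_settings(HASHTAGS, 2020)
print(settings["Search"])
print(settings["Since"], settings["Until"])
```

In the notebook this would become `c = apply_settings(twint.Config(), build_search_settings(HASHTAGS, 2020))` followed by `twint.run.Search(c)`. A more conservative option is one search per hashtag, concatenating the resulting frames before `drop_duplicates(subset=['id'])`, since a tweet carrying several of the hashtags would otherwise be counted more than once.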

In [24]:
# Run search
twint.run.Search(c)

1448808022814433294 2021-10-15 00:28:44 +0000 <wannamakeyousob> #crying #depressed #needhelp #adele #iloveyoumydearihopeidkeepyousafebutiloveyou  https://t.co/5tbvBBhtwO
1448804627097407489 2021-10-15 00:15:15 +0000 <lighthousegh0st> When the depression's at its worst, I lose my sense of taste (and other fun things), and I'm wondering if anyone seeing this has experienced that *and* the loss of taste due to Covid, and noted a difference between the two? #depression  #depressed
1448793239847202825 2021-10-14 23:30:00 +0000 <KennethMGrimes1> Think to yourself in the past week how many times you may have been anxious about something and leery about it.   https://t.co/zLDxk5Yu7h #depression #depressed #deep #stress #stressed #stressmanagement #stressrelief #SuicidePrevention #SuicideAwareness #help #women #men #teens  https://t.co/St65mOoOGO
1448792278844203011 2021-10-14 23:26:11 +0000 <QUEEN_JADASHAY> Why do I feel like I’m normal when I’m high? It’s like a temporarily I’m an actual huma

In [8]:
# Quick check
twint.storage.panda.Tweets_df.head()

,id,conversation_id,created_at,date,timezone,place,tweet,language,hashtags,cashtags,user_id,user_id_str,username,name,day,hour,link,urls,photos,video,thumbnail,retweet,nlikes,nreplies,nretweets,quote_url,search,near,geo,source,user_rt_id,user_rt,retweet_id,reply_to,retweet_date,translate,trans_src,trans_dest
0,1448793239847202825,1448793239847202825,1634254000000.0,2021-10-14 23:30:00,0,,Think to yourself in the past week how many ti...,en,"[depression, depressed, deep, stress, stressed...",[],1306668993403662344,1306668993403662344,KennethMGrimes1,Kenneth M Grimes,4,23,https://twitter.com/KennethMGrimes1/status/144...,[https://superiordivisionbrandz.blogspot.com/2...,[https://pbs.twimg.com/tweet_video_thumb/FBsOv...,1,https://pbs.twimg.com/tweet_video_thumb/FBsOv2...,False,0,0,0,,#depressed,,,,,,,[],,,,
1,1448792278844203011,1448792278844203011,1634254000000.0,2021-10-14 23:26:11,0,,Why do I feel like I’m normal when I’m high? I...,en,"[depressed, anxiety, depression, mentalhealthm...",[],1348210112847085570,1348210112847085570,QUEEN_JADASHAY,JADASHAY 🎙,4,23,https://twitter.com/QUEEN_JADASHAY/status/1448...,[],[],0,,False,0,0,0,,#depressed,,,,,,,[],,,,
2,1448791989546364929,1448791989546364929,1634254000000.0,2021-10-14 23:25:02,0,,I would have to sell most of it to cover gas f...,en,"[shiboshis, depressed]",[],1387603268239560708,1387603268239560708,Ezekielf11,Ezekiel f.,4,23,https://twitter.com/Ezekielf11/status/14487919...,[],[],0,,False,0,0,0,https://twitter.com/ShibaInuHodler/status/1447...,#depressed,,,,,,,[],,,,
3,1448783648937873413,1448783648937873413,1634252000000.0,2021-10-14 22:51:53,0,,Currently sitting in the dark drinking Baja Bl...,en,[depressed],[],1107142973446184960,1107142973446184960,gremln_princess,gremlin princess94💞,4,22,https://twitter.com/gremln_princess/status/144...,[],[],0,,False,1,0,0,,#depressed,,,,,,,[],,,,
4,1448781523339993089,1448781523339993089,1634251000000.0,2021-10-14 22:43:26,0,,I legit thought about buying chucks today…. Fu...,en,[depressed],[],2255941682,2255941682,jjdsolano,serial is my calling♡,4,22,https://twitter.com/jjdsolano/status/144878152...,[],[],0,,False,0,0,0,,#depressed,,,,,,,[],,,,


In [9]:
# Cleanup: drop duplicate tweets (rows with the same tweet id)
tweets_depressed = twint.storage.panda.Tweets_df.drop_duplicates(subset=['id'])

In [11]:
# Reindex: renumber the rows 0..n-1 after dropping duplicates
tweets_depressed = tweets_depressed.reset_index(drop=True)

In [12]:
# Load the language detector used to drop non-English tweets
from whatthelang import WhatTheLang
wtl = WhatTheLang()

In [13]:
# Wrap the prediction so that a failure on one tweet (e.g. no text where text
# should be) returns the placeholder 'exp' instead of stopping the whole run.
# Not strictly needed, but it can be useful.

def detect_lang(text):
    try:
        return wtl.predict_lang(text)
    except Exception:
        return 'exp'

In [14]:
%%time
# %%time reports how long the cell takes; as a cell magic it must be the first
# line of the cell, and it can be left out entirely.

tweets_depressed['lang'] = tweets_depressed['tweet'].map(detect_lang)

CPU times: user 51.2 ms, sys: 706 µs, total: 51.9 ms
Wall time: 55 ms


In [15]:
# Keep only the tweets that whatthelang detected as English

tweets_depressed = tweets_depressed[tweets_depressed.lang == 'en']
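The detector and Twitter's own metadata do not always agree: in the output below, tweet 1448780189341822977 (`@Drewzz_ #depressed #😔🔫`) has `language` set to `und` by Twitter but was labelled `en` by whatthelang, so it passes the filter above. A stricter variant, shown here as an assumption rather than the notebook's method, keeps a tweet only when both sources say English. The function name is hypothetical and the rows are reduced to the relevant fields.

```python
# Hypothetical stricter filter: keep a tweet only if whatthelang's guess
# ("lang") and Twitter's own "language" field both say English.
def is_confident_english(row):
    return row.get("lang") == "en" and row.get("language") == "en"


rows = [
    {"id": 1448783648937873413, "language": "en", "lang": "en"},
    {"id": 1448780189341822977, "language": "und", "lang": "en"},
]
kept = [row["id"] for row in rows if is_confident_english(row)]
print(kept)  # -> [1448783648937873413]
```

On the DataFrame the same rule reads `tweets_depressed[(tweets_depressed['lang'] == 'en') & (tweets_depressed['language'] == 'en')]`. Requiring agreement drops more borderline rows, which suits a dataset that will be reviewed by hand anyway.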

In [16]:
tweets_depressed

,id,conversation_id,created_at,date,timezone,place,tweet,language,hashtags,cashtags,user_id,user_id_str,username,name,day,hour,link,urls,photos,video,thumbnail,retweet,nlikes,nreplies,nretweets,quote_url,search,near,geo,source,user_rt_id,user_rt,retweet_id,reply_to,retweet_date,translate,trans_src,trans_dest,lang
0,1448793239847202825,1448793239847202825,1.634254e+12,2021-10-14 23:30:00,+0000,,Think to yourself in the past week how many ti...,en,"[depression, depressed, deep, stress, stressed...",[],1306668993403662344,1306668993403662344,KennethMGrimes1,Kenneth M Grimes,4,23,https://twitter.com/KennethMGrimes1/status/144...,[https://superiordivisionbrandz.blogspot.com/2...,[https://pbs.twimg.com/tweet_video_thumb/FBsOv...,1,https://pbs.twimg.com/tweet_video_thumb/FBsOv2...,False,0,0,0,,#depressed,,,,,,,[],,,,,en
1,1448792278844203011,1448792278844203011,1.634254e+12,2021-10-14 23:26:11,+0000,,Why do I feel like I’m normal when I’m high? I...,en,"[depressed, anxiety, depression, mentalhealthm...",[],1348210112847085570,1348210112847085570,QUEEN_JADASHAY,JADASHAY 🎙,4,23,https://twitter.com/QUEEN_JADASHAY/status/1448...,[],[],0,,False,0,0,0,,#depressed,,,,,,,[],,,,,en
2,1448791989546364929,1448791989546364929,1.634254e+12,2021-10-14 23:25:02,+0000,,I would have to sell most of it to cover gas f...,en,"[shiboshis, depressed]",[],1387603268239560708,1387603268239560708,Ezekielf11,Ezekiel f.,4,23,https://twitter.com/Ezekielf11/status/14487919...,[],[],0,,False,0,0,0,https://twitter.com/ShibaInuHodler/status/1447...,#depressed,,,,,,,[],,,,,en
3,1448783648937873413,1448783648937873413,1.634252e+12,2021-10-14 22:51:53,+0000,,Currently sitting in the dark drinking Baja Bl...,en,[depressed],[],1107142973446184960,1107142973446184960,gremln_princess,gremlin princess94💞,4,22,https://twitter.com/gremln_princess/status/144...,[],[],0,,False,1,0,0,,#depressed,,,,,,,[],,,,,en
5,1448780189341822977,1448748319338082310,1.634251e+12,2021-10-14 22:38:08,+0000,,@Drewzz_ #depressed #😔🔫,und,[depressed],[],348738481,348738481,NelsonWest_,Nielzon,4,22,https://twitter.com/NelsonWest_/status/1448780...,[],[],0,,False,0,1,0,,#depressed,,,,,,,"[{'screen_name': 'Drewzz_', 'name': '👤', 'id':...",,,,,en
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
700,1445580348956442633,1445580348956442633,1.633488e+12,2021-10-06 02:43:07,+0000,,Realized that Mikey can't sleep now becoz his ...,en,"[depressed, tokyorevengers, anitwt]",[],1400020843116171264,1400020843116171264,Mar_randomsimp,Mar,3,02,https://twitter.com/Mar_randomsimp/status/1445...,[],[],0,,False,0,0,0,,#depressed,,,,,,,[],,,,,en
701,1445576589706551300,1445576589706551300,1.633487e+12,2021-10-06 02:28:11,+0000,,i can tell my bf is #depressed but is trying t...,en,[depressed],[],1349579296973725698,1349579296973725698,egrimmitt12,evie,3,02,https://twitter.com/egrimmitt12/status/1445576...,[],[],0,,False,1,0,0,,#depressed,,,,,,,[],,,,,en
702,1445570662731370503,1445570662731370503,1.633486e+12,2021-10-06 02:04:38,+0000,,Morgan wallets having a tour and I didn’t get ...,en,[depressed],[],3142773487,3142773487,alicialguezabal,𝐀𝐋𝐈𝐂𝐈𝐀,3,02,https://twitter.com/alicialguezabal/status/144...,[],[],0,,False,1,1,0,,#depressed,,,,,,,[],,,,,en
703,1445569628021813251,1445569628021813251,1.633486e+12,2021-10-06 02:00:31,+0000,,Gracious. These “new” artist are awful. Everyb...,en,"[depressed, hiphopawards]",[],129340952,129340952,MoVega_,Mo Vega,3,02,https://twitter.com/MoVega_/status/14455696280...,[],[],0,,False,2,0,1,,#depressed,,,,,,,[],,,,,en
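Several rows above use #depressed next to unrelated topics (#shiboshis, #tokyorevengers, #hiphopawards), which hints at casual rather than depressive use of the tag. Counting the hashtags that co-occur with the search tag is one way to find such noise before the manual review; the sketch below uses `collections.Counter` on hashtag lists copied from the rows shown above (the truncated first row is left out).

```python
from collections import Counter

# Hashtag lists copied from rows 2, 3, 700, 701 and 703 of the output above.
hashtag_lists = [
    ["shiboshis", "depressed"],
    ["depressed"],
    ["depressed", "tokyorevengers", "anitwt"],
    ["depressed"],
    ["depressed", "hiphopawards"],
]

SEARCH_TAG = "depressed"  # the tag used in c.Search, excluded from the counts
co_occurring = Counter(
    tag for tags in hashtag_lists for tag in tags if tag != SEARCH_TAG
)
print(co_occurring.most_common(3))
```

Assuming the `hashtags` column holds Python lists, as the display suggests, the same count over the full frame is `Counter(tag for tags in tweets_depressed['hashtags'] for tag in tags if tag != 'depressed')`.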


In [19]:
# Keep only the tweet id and text
tweets_depressed_new = tweets_depressed[['id','tweet']]

In [20]:
tweets_depressed_new

,id,tweet
0,1448793239847202825,Think to yourself in the past week how many ti...
1,1448792278844203011,Why do I feel like I’m normal when I’m high? I...
2,1448791989546364929,I would have to sell most of it to cover gas f...
3,1448783648937873413,Currently sitting in the dark drinking Baja Bl...
5,1448780189341822977,@Drewzz_ #depressed #😔🔫
...,...,...
700,1445580348956442633,Realized that Mikey can't sleep now becoz his ...
701,1445576589706551300,i can tell my bf is #depressed but is trying t...
702,1445570662731370503,Morgan wallets having a tour and I didn’t get ...
703,1445569628021813251,Gracious. These “new” artist are awful. Everyb...
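Some remaining rows, such as `@Drewzz_ #depressed #😔🔫`, contain no actual text once mentions and hashtags are ignored, so there is nothing for a reviewer or a model to judge. A minimal filter, assuming a threshold of three ordinary words (`MIN_WORDS` and `has_enough_text` are hypothetical names, not part of the notebook), could look like this:

```python
import re

# Assumed heuristic: a tweet needs at least MIN_WORDS ordinary words once
# URLs, @mentions and hashtags are removed.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#[^\s#]+")
WORD_RE = re.compile(r"[A-Za-z']+")
MIN_WORDS = 3


def has_enough_text(tweet: str, min_words: int = MIN_WORDS) -> bool:
    """Return True if the tweet still has min_words words after removing noise tokens."""
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE):
        tweet = pattern.sub(" ", tweet)
    return len(WORD_RE.findall(tweet)) >= min_words


print(has_enough_text("@Drewzz_ #depressed #😔🔫"))  # -> False
print(has_enough_text("i can tell my bf is #depressed but is trying"))  # -> True
```

Applied to the frame: `tweets_depressed_new[tweets_depressed_new['tweet'].map(has_enough_text)]`. The word pattern only counts ASCII letters, which is acceptable here because the frame has already been restricted to English.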


In [22]:
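# Save the id/tweet pairs to Google Drive; index=False keeps the row index
# out of the file (read back with pd.read_csv it would appear as "Unnamed: 0")

tweets_depressed_new.to_csv('/content/drive/MyDrive/NLP/Depression_Detection/tweets_depressed_new.csv', index=False)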
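For the manual review and labelling described in the introduction, the exported file could carry an empty `label` column for the annotator to fill in. The sketch below shows that layout with the standard `csv` module; the `write_labelling_sheet` helper and the `label` column name are hypothetical, and the sample text is a shortened tweet from the output above.

```python
import csv
import io


# Hypothetical export for manual review: one row per tweet plus an empty
# "label" column that the annotator fills in (e.g. depressive / non-depressive).
def write_labelling_sheet(rows, out):
    writer = csv.DictWriter(out, fieldnames=["id", "tweet", "label"])
    writer.writeheader()
    for tweet_id, text in rows:
        writer.writerow({"id": tweet_id, "tweet": text, "label": ""})


buffer = io.StringIO()
write_labelling_sheet(
    [(1445576589706551300, "i can tell my bf is #depressed but is trying")],
    buffer,
)
print(buffer.getvalue())
```

With pandas the same layout comes from `tweets_depressed_new.assign(label='').to_csv(path, index=False)`, where `path` is whatever file the reviewer should receive.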