#### The COVID-19 pandemic has changed our lifestyles, habits, and daily routines. Although some of the impacts of COVID-19 have already been widely reported, many effects of the pandemic are still to be discovered.

#### Back pain is one of the most common chronic disorders, which not only can reduce work productivity and negatively affect the quality of life, but can also increase the economic and societal burden.

#### Due to the lockdown and the necessity to work from home, concerns and complaints of back pain have dramatically emerged.

## Goal: Assess the changes in the frequency of physical back pain complaints reported during the COVID-19 pandemic.

#### Specifically, I am going to investigate differences in the number of back pain complaints between the pre-pandemic period and the pandemic period.

#### To do this, I am going to test the hypothesis: There is a statistically significant difference between the number of complaints regarding back pain during and before the COVID-19 pandemic.
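
As a minimal sketch of how such a hypothesis could be tested, assuming daily complaint counts are available for both periods, a two-sided permutation test on the difference in mean counts might look as follows. The function name `permutation_test` and the sample counts are made up for illustration; this section does not state which statistical test is actually used.

```python
import random


def permutation_test(before, during, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in mean counts."""
    rng = random.Random(seed)
    n_before = len(before)
    n_during = len(during)
    observed = sum(during) / n_during - sum(before) / n_before
    pooled = list(before) + list(during)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign the period labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = sum(pooled[n_before:]) / n_during - sum(pooled[:n_before]) / n_before
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up daily complaint counts, for illustration only.
before_counts = [12, 15, 11, 14, 13, 12, 16]
during_counts = [18, 21, 17, 22, 19, 20, 23]
observed_diff, p_value = permutation_test(before_counts, during_counts)
print(f"difference in means: {observed_diff:.2f}, p = {p_value:.4f}")
```

A permutation test is used here only because it needs no distributional assumptions and no third-party packages; a t-test or a Mann-Whitney U test would be equally reasonable choices.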



#### This study investigates changes in the number of actual back pain complaints reported on Twitter over time. For this purpose, the following research steps have been followed: 
1. Conducting exploratory Twitter data analysis regarding back pain.
2. Downloading relevant Twitter data.
3. Defining and training an intelligent data filter based on tools from the machine learning (ML), deep learning (DL), and natural language processing (NLP) domain.
4. Applying the trained filter to all data instances.
5. Creating visualizations of filtered data.
6. Testing the research hypotheses.


# Data Acquisition: 

### - Only English-language tweets were considered. I followed the Twitter Developer Policy and used the official Twitter API to download the necessary data.
### - I considered Twitter data for the years 2019 and 2020. The year 2019 served as a baseline before the pandemic, and data gathered in 2020 was labeled as COVID-19 related.
### - I searched for the term ‘back pain’ with localization set to the USA over two periods: from 1 March 2019 to 1 December 2019, and from 1 March 2020 to 1 December 2020. A total of 15663 and 14634 USA-localized tweets were collected for the selected periods in 2019 and 2020, respectively. After dropping tweets with duplicate texts, the final tweet numbers for this research are 26635 and 24497 for 2019 and 2020, respectively.
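
The two selection steps described above (restricting tweets to the two collection windows and dropping duplicate texts) can be sketched with the standard library. The record layout, the helper names `window_of` and `drop_duplicate_texts`, and the sample tweets are assumptions for illustration, and the end dates are treated as inclusive here.

```python
from datetime import date

# Collection windows as stated above (end dates treated as inclusive here).
WINDOWS = {
    2019: (date(2019, 3, 1), date(2019, 12, 1)),
    2020: (date(2020, 3, 1), date(2020, 12, 1)),
}


def window_of(day):
    """Return the window year a date falls in, or None if outside both."""
    for year, (start, end) in WINDOWS.items():
        if start <= day <= end:
            return year
    return None


def drop_duplicate_texts(tweets):
    """Keep the first tweet for each distinct text, preserving order."""
    seen = set()
    unique = []
    for tweet in tweets:
        if tweet["text"] not in seen:
            seen.add(tweet["text"])
            unique.append(tweet)
    return unique


# Hypothetical records, for illustration only.
tweets = [
    {"date": date(2019, 4, 2), "text": "my back pain is back"},
    {"date": date(2019, 4, 3), "text": "my back pain is back"},  # duplicate text
    {"date": date(2020, 1, 15), "text": "back pain again"},  # outside both windows
    {"date": date(2020, 5, 9), "text": "working from home, back pain all day"},
]
in_window = [t for t in tweets if window_of(t["date"]) is not None]
unique = drop_duplicate_texts(in_window)
counts = {year: sum(window_of(t["date"]) == year for t in unique) for year in WINDOWS}
print(counts)  # {2019: 1, 2020: 1}
```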


# Preprocessing of Tweets:

The downloaded Twitter data contained many posts that were unrelated to back pain or did not necessarily express complaints about the presence of back pain. Therefore, in order to filter out the unwanted tweets and assess the true number of back pain complaints, I need to classify each tweet as either ‘complaining on back pain’ or ‘other’.
Since manually labeling all of the tweets is not an option, I want to develop an automatic filtering method using ML, DL, and NLP. I am going to apply a model based on the BERT pretraining approach and a machine learning classifier such as gradient boosting (XGBoost).

### Also, tweet texts were preprocessed according to the following procedure:
1. The links to images were replaced with the “_IMAGE” token.

2. Redundant/repeating characters were removed (for example a ten times repeated ‘a’ was converted to ‘aa’).

3. Textual elements representing retweets were converted to “_RETWEET” token.

4. Other textual elements beginning with “http” or “https” or “youtu.be” were converted to “_URL” tokens.

5. The language of each tweet was detected with the langdetect module, and all non-English tweets were removed.

6. All emojis were converted to textual representations with the emoji module.

### In order to prepare the data for supervised training of the automatic filtering method, I drew a random sample of 5000 tweets from the whole data corpus and labeled all selected tweets manually.


### The preprocessing code is written in Python 3 using Google Colab. The link is:

https://colab.research.google.com/drive/1brqBum8DDSZbeFqqKo1YfnFEM3CaKFB5

In [None]:
# Some modules need to be imported in advance to create connection between Colab and Drive
!pip install -U -q PyDrive

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

In [None]:
# Authenticate e-mail ID
#!pip install -U tensorflow-gpu==2.0.0 grpcio
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)


import os
os.chdir('/content')



file_variable_name = drive.CreateFile({'id':'1jrtWTv7EdTAowlCB6eUz_TSV0SFU28-A'})

file_variable_name.GetContentFile('csv_date_renderedContent.zip')
!unzip csv_date_renderedContent.zip


Archive:  csv_date_renderedContent.zip
   creating: csv_date_renderedContent/
  inflating: csv_date_renderedContent/200901-200930_dc.csv  
  inflating: csv_date_renderedContent/200801-200831_dc.csv  
  inflating: csv_date_renderedContent/200701-200731_dc.csv  
  inflating: csv_date_renderedContent/200601-200630_dc.csv  
  inflating: csv_date_renderedContent/200501-200531_dc.csv  
  inflating: csv_date_renderedContent/200401-200430_dc.csv  
  inflating: csv_date_renderedContent/200301-200331_dc.csv  
  inflating: csv_date_renderedContent/190901-190930_dc.csv  
  inflating: csv_date_renderedContent/190801-190831_dc.csv  
  inflating: csv_date_renderedContent/190701-190731_dc.csv  
  inflating: csv_date_renderedContent/190601-190630_dc.csv  
  inflating: csv_date_renderedContent/190501-190531_dc.csv  
  inflating: csv_date_renderedContent/190401-190430_dc.csv  
  inflating: csv_date_renderedContent/190301-190331_dc.csv  


In [None]:
import pandas as pd

!pip3 install datatable
import datatable as dt
import os

Collecting datatable
  Downloading https://files.pythonhosted.org/packages/80/07/7ca237758866497cbe076e31920a0320e28808f54fa75a5e2b0348d7aa8a/datatable-0.11.0-cp36-cp36m-manylinux2010_x86_64.whl (83.7MB)
Installing collected packages: datatable
Successfully installed datatable-0.11.0


In [None]:
file_list = ['200901-200930',
             '200801-200831',
             '200701-200731',
             '200601-200630',
             '200501-200531',
             '200401-200430',
             '200301-200331',

             '190901-190930',
             '190801-190831',
             '190701-190731',
             '190601-190630',
             '190501-190531',
             '190401-190430',
             '190301-190331',
             ]

In [None]:
flist = [f'{x}_dc.csv' for x in file_list]
flist

['200901-200930_dc.csv',
 '200801-200831_dc.csv',
 '200701-200731_dc.csv',
 '200601-200630_dc.csv',
 '200501-200531_dc.csv',
 '200401-200430_dc.csv',
 '200301-200331_dc.csv',
 '190901-190930_dc.csv',
 '190801-190831_dc.csv',
 '190701-190731_dc.csv',
 '190601-190630_dc.csv',
 '190501-190531_dc.csv',
 '190401-190430_dc.csv',
 '190301-190331_dc.csv']

In [None]:
!pwd

/content


In [None]:
df = pd.DataFrame()
for ff in flist:
  df_local = dt.fread(os.path.join('/content/csv_date_renderedContent',ff)).to_pandas()
  df = pd.concat([df,df_local])

df.columns = ['dummy','date','text','url']
df.drop(['dummy','url'],axis=1, inplace=True)
df.drop(0, axis=0, inplace=True)
df.reset_index(drop=True, inplace=True)
df['tweet_id'] = df.index
df = df[['tweet_id','date','text']]

In [None]:
df

Unnamed: 0,tweet_id,date,text
0,0,2020-09-29 23:59:17+00:00,My lower back pain came out of no where tonigh...
1,1,2020-09-29 23:58:59+00:00,@FruitKace For me it is my hips. Lower back pa...
2,2,2020-09-29 23:58:54+00:00,"""We don't need bodyguard.\nWe don't need coock..."
3,3,2020-09-29 23:58:34+00:00,@AlexMLeo Rolfing saved me from back pain.
4,4,2020-09-29 23:57:11+00:00,Back pain and insomnia 🥴 https://t.co/3n0xEIkT21
...,...,...,...
530305,530305,2019-03-01 00:10:02+00:00,💵🎧Magno Garcia x General Back Pain “Ballin On ...
530306,530306,2019-03-01 00:08:36+00:00,@COMPLEXcsgo You still have back pain?
530307,530307,2019-03-01 00:05:01+00:00,This is everything I ever wanted and more. My ...
530308,530308,2019-03-01 00:01:56+00:00,A short video to kick off this month's campaig...


In [None]:
len(df)

530310

In [None]:
df['date'] = df['date'].apply(lambda a: pd.to_datetime(a).date())
df.head()

Unnamed: 0,tweet_id,date,text
0,0,2020-09-29,My lower back pain came out of no where tonigh...
1,1,2020-09-29,@FruitKace For me it is my hips. Lower back pa...
2,2,2020-09-29,"""We don't need bodyguard.\nWe don't need coock..."
3,3,2020-09-29,@AlexMLeo Rolfing saved me from back pain.
4,4,2020-09-29,Back pain and insomnia 🥴 https://t.co/3n0xEIkT21


In [None]:
import datetime
len(df[df['date'] <= datetime.date(2019, 12, 1)])/len(df)

0.5484716486583319

In [None]:
len(df[df['date'] >= datetime.date(2019, 12, 1)])/len(df)

0.4515283513416681

In [None]:
df['month'] = pd.to_datetime(df['date']).dt.to_period('M')

# Get the start and end months
months = df['month'].sort_values()
start_month = months.iloc[0]
end_month = months.iloc[-1]

#index = pd.period_range(start=start_month, end=end_month, freq='M')

#df.groupby('month')['text'].count().reindex(index).plot.bar()
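
The idea behind the commented-out lines (counting tweets per month over a continuous range of months, so that months without tweets still appear) can be sketched without pandas. The helper `month_range` and the sample dates are hypothetical and serve only as an illustration.

```python
from collections import Counter
from datetime import date


def month_range(start, end):
    """Yield (year, month) pairs from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


# Hypothetical tweet dates, for illustration only.
dates = [date(2019, 3, 1), date(2019, 3, 9), date(2019, 5, 2), date(2019, 6, 30)]
counts = Counter((d.year, d.month) for d in dates)
first, last = min(counts), max(counts)

# Months with no tweets get an explicit zero instead of being skipped.
monthly = {f"{y}-{m:02d}": counts.get((y, m), 0) for y, m in month_range(first, last)}
print(monthly)  # {'2019-03': 2, '2019-04': 0, '2019-05': 1, '2019-06': 1}
```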

In [None]:
pd.set_option('display.max_colwidth', None)
df[df['date']>= datetime.date(2019, 12, 1)]


Unnamed: 0,tweet_id,date,text,month
0,0,2020-09-29,My lower back pain came out of no where tonight jeeeeeeeeez https://t.co/OHpzYJjY7p,2020-09
1,1,2020-09-29,"@FruitKace For me it is my hips. Lower back pain = stretch, stretch, stretch.",2020-09
2,2,2020-09-29,"""We don't need bodyguard.\nWe don't need coocker.\nWe don't need house cleaner.\n\nWe need a fucking massager. Like, I'd literally do anything for some, people have no idea how painful it is to lay down and feel that back pain when all you want to do is to sleep. """,2020-09
3,3,2020-09-29,@AlexMLeo Rolfing saved me from back pain.,2020-09
4,4,2020-09-29,Back pain and insomnia 🥴 https://t.co/3n0xEIkT21,2020-09
...,...,...,...,...
290855,290855,2020-03-01,#ArtificialIntelligence Can Scan Doctors’ Notes to Distinguish Between Types of Back Pain | @newswise ow.ly/swHW30qlEZt,2020-03
290856,290856,2020-03-01,"Educating Chronic Back Pain Patients About Their Pain Increases Recovery \n\nThe Patient Learns About What Movements He May Do, Gradually Increasing The Activity \n\nChronic back pain patients are usually ... - bit.ly/2SLzXQ3",2020-03
290857,290857,2020-03-01,@atinykoo SO TELL ME AND SAVE ME THE BACK PAIN 😇😇😇 \n\npls 🥺 https://t.co/JpezxYcDUe,2020-03
290858,290858,2020-03-01,Standing For Excessive Time At Work Can Be Even Worse Than Excess Sitting Down \n\nThe Blood Pools In The Legs And The Heart Has To Fight Against Gravity \n\nSitting for too much time has been named “the n ... - justnobackpain.com/news/standing-… https://t.co/FOlh9Buaqz,2020-03


In [None]:
# Replace image links with an _IMAGE token

df["tidy_text"] = df["text"].str.replace(r'https://t.co/(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\), ]|(?:%[0-9a-fA-F][0-9a-fA-F]))\w+', ' _IMAGE ', regex=True)
df

Unnamed: 0,tweet_id,date,text,month,tidy_text
0,0,2020-09-29,My lower back pain came out of no where tonight jeeeeeeeeez https://t.co/OHpzYJjY7p,2020-09,My lower back pain came out of no where tonight jeeeeeeeeez _IMAGE
1,1,2020-09-29,"@FruitKace For me it is my hips. Lower back pain = stretch, stretch, stretch.",2020-09,"@FruitKace For me it is my hips. Lower back pain = stretch, stretch, stretch."
2,2,2020-09-29,"""We don't need bodyguard.\nWe don't need coocker.\nWe don't need house cleaner.\n\nWe need a fucking massager. Like, I'd literally do anything for some, people have no idea how painful it is to lay down and feel that back pain when all you want to do is to sleep. """,2020-09,"""We don't need bodyguard.\nWe don't need coocker.\nWe don't need house cleaner.\n\nWe need a fucking massager. Like, I'd literally do anything for some, people have no idea how painful it is to lay down and feel that back pain when all you want to do is to sleep. """
3,3,2020-09-29,@AlexMLeo Rolfing saved me from back pain.,2020-09,@AlexMLeo Rolfing saved me from back pain.
4,4,2020-09-29,Back pain and insomnia 🥴 https://t.co/3n0xEIkT21,2020-09,Back pain and insomnia 🥴 _IMAGE
...,...,...,...,...,...
530305,530305,2019-03-01,💵🎧Magno Garcia x General Back Pain “Ballin On A Budget” Prod by ⁦@charliesdizz⁩ 🎧💵 soundcloud.com/magnogarcia/ma…,2019-03,💵🎧Magno Garcia x General Back Pain “Ballin On A Budget” Prod by ⁦@charliesdizz⁩ 🎧💵 soundcloud.com/magnogarcia/ma…
530306,530306,2019-03-01,@COMPLEXcsgo You still have back pain?,2019-03,@COMPLEXcsgo You still have back pain?
530307,530307,2019-03-01,This is everything I ever wanted and more. My heart is healed. My acne is gone. My back pain has faded. I am at peace. twitter.com/midnigtartist/…,2019-03,This is everything I ever wanted and more. My heart is healed. My acne is gone. My back pain has faded. I am at peace. twitter.com/midnigtartist/…
530308,530308,2019-03-01,A short video to kick off this month's campaign on back pain bit.ly/2SyjZoI co-kinetic.com/twittervideo/2…,2019-03,A short video to kick off this month's campaign on back pain bit.ly/2SyjZoI co-kinetic.com/twittervideo/2…


In [None]:
!pip install demoji

import demoji
demoji.download_codes()

Downloading emoji data ...
... OK (Got response in 0.38 seconds)
Writing emoji data to /root/.demoji/codes.json ...
... OK


In [None]:
## Check if there is an emoji contained in a tweet and extract them:

def extract_emojis(text):

  return [emoji for emoji in demoji._EMOJI_PAT.findall(text)]



def contains_emoji(text):
  """return True if text contains an emoji, False otherwise."""

  return bool(demoji.findall(text))


# Example:
sample = ' my ✨ back pain ✨ : 📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈'

extract_emojis(sample)
#contains_emoji(sample)

['✨',
 '✨',
 '📈',
 '📈',
 '📈',
 '📈',
 '📈',
 '📈',
 '📈',
 '📈',
 '📈',
 '📈',
 '📈',
 '📈',
 '📈',
 '📈',
 '📈',
 '📈',
 '📈',
 '📈',
 '📈',
 '📈',
 '📈',
 '📈',
 '📈',
 '📈',
 '📈',
 '📈',
 '📈',
 '📈',
 '📈']

In [None]:
!pip install emoji
import emoji



In [None]:
def interpret_emoji(text):
  string = ""
  d = dict()
  for i in range(len(text)):
    if text[i] in emoji.UNICODE_EMOJI:
      if text[i] in d.keys():
        if d[text[i]] < 2:
          string = string + text[i]
          d[text[i]] += 1
      else:
        string = string + text[i]
        d[text[i]] = 1
    else:
      string = string + text[i]
    

  return demoji.replace_with_desc(string)


In [None]:
sample = ' my ✨ back pain ✨ : 📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈📈'

interpret_emoji(sample)

' my :sparkles: back pain :sparkles: : :chart increasing::chart increasing:'
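
For comparison, the same idea (cap each repeated symbol at two occurrences, then replace it with a textual description) can be sketched with the standard library alone. This is an assumption-laden alternative, not the notebook's method: `describe_symbols` is a hypothetical helper, it treats every character in Unicode category "So" as an emoji (which also covers symbols such as ©), it does not handle multi-codepoint emoji, and the Unicode character names it produces can differ from the descriptions returned by demoji.

```python
import unicodedata


def describe_symbols(text, max_repeats=2):
    """Replace 'Symbol, other' characters with :name:, keeping at most
    max_repeats occurrences of each distinct symbol."""
    seen = {}
    out = []
    for ch in text:
        if unicodedata.category(ch) == "So":
            seen[ch] = seen.get(ch, 0) + 1
            if seen[ch] <= max_repeats:
                name = unicodedata.name(ch, "unknown symbol").lower()
                out.append(f":{name}:")
        else:
            out.append(ch)
    return "".join(out)


sample = " my ✨ back pain ✨ : 📈📈📈📈"
print(describe_symbols(sample))
```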

In [None]:
df['tidy_text'] = df['tidy_text'].map(lambda x: interpret_emoji(x))
df

Unnamed: 0,tweet_id,date,text,month,tidy_text
0,0,2020-09-29,My lower back pain came out of no where tonight jeeeeeeeeez https://t.co/OHpzYJjY7p,2020-09,My lower back pain came out of no where tonight jeeeeeeeeez _IMAGE
1,1,2020-09-29,"@FruitKace For me it is my hips. Lower back pain = stretch, stretch, stretch.",2020-09,"@FruitKace For me it is my hips. Lower back pain = stretch, stretch, stretch."
2,2,2020-09-29,"""We don't need bodyguard.\nWe don't need coocker.\nWe don't need house cleaner.\n\nWe need a fucking massager. Like, I'd literally do anything for some, people have no idea how painful it is to lay down and feel that back pain when all you want to do is to sleep. """,2020-09,"""We don't need bodyguard.\nWe don't need coocker.\nWe don't need house cleaner.\n\nWe need a fucking massager. Like, I'd literally do anything for some, people have no idea how painful it is to lay down and feel that back pain when all you want to do is to sleep. """
3,3,2020-09-29,@AlexMLeo Rolfing saved me from back pain.,2020-09,@AlexMLeo Rolfing saved me from back pain.
4,4,2020-09-29,Back pain and insomnia 🥴 https://t.co/3n0xEIkT21,2020-09,Back pain and insomnia :woozy face: _IMAGE
...,...,...,...,...,...
530305,530305,2019-03-01,💵🎧Magno Garcia x General Back Pain “Ballin On A Budget” Prod by ⁦@charliesdizz⁩ 🎧💵 soundcloud.com/magnogarcia/ma…,2019-03,:dollar banknote::headphone:Magno Garcia x General Back Pain “Ballin On A Budget” Prod by ⁦@charliesdizz⁩ :headphone::dollar banknote: soundcloud.com/magnogarcia/ma…
530306,530306,2019-03-01,@COMPLEXcsgo You still have back pain?,2019-03,@COMPLEXcsgo You still have back pain?
530307,530307,2019-03-01,This is everything I ever wanted and more. My heart is healed. My acne is gone. My back pain has faded. I am at peace. twitter.com/midnigtartist/…,2019-03,This is everything I ever wanted and more. My heart is healed. My acne is gone. My back pain has faded. I am at peace. twitter.com/midnigtartist/…
530308,530308,2019-03-01,A short video to kick off this month's campaign on back pain bit.ly/2SyjZoI co-kinetic.com/twittervideo/2…,2019-03,A short video to kick off this month's campaign on back pain bit.ly/2SyjZoI co-kinetic.com/twittervideo/2…


In [None]:
#Reduce the number of redundant characters to 2

import re

def remove_redundant_char(text):

  text_new = ""
  words = text.split()

  for word in words:
    text_new += re.sub(r'([a-z])\1\w+(\1{2,})', r'\2', word) + ' '
  return text_new.strip()


# Example 
tweet = 'My lower back pain came out of no where tonight jeeeeeeeeez'

remove_redundant_char(tweet)

'My lower back pain came out of no where tonight jeez'

In [None]:
df['tidy_text'] = df['tidy_text'].map(lambda x: remove_redundant_char(x))

In [None]:
df

Unnamed: 0,tweet_id,date,text,month,tidy_text
0,0,2020-09-29,My lower back pain came out of no where tonight jeeeeeeeeez https://t.co/OHpzYJjY7p,2020-09,My lower back pain came out of no where tonight jeez _IMAGE
1,1,2020-09-29,"@FruitKace For me it is my hips. Lower back pain = stretch, stretch, stretch.",2020-09,"@FruitKace For me it is my hips. Lower back pain = stretch, stretch, stretch."
2,2,2020-09-29,"""We don't need bodyguard.\nWe don't need coocker.\nWe don't need house cleaner.\n\nWe need a fucking massager. Like, I'd literally do anything for some, people have no idea how painful it is to lay down and feel that back pain when all you want to do is to sleep. """,2020-09,"""We don't need bodyguard. We don't need coocker. We don't need house cleaner. We need a fucking massager. Like, I'd literally do anything for some, people have no idea how painful it is to lay down and feel that back pain when all you want to do is to sleep. """
3,3,2020-09-29,@AlexMLeo Rolfing saved me from back pain.,2020-09,@AlexMLeo Rolfing saved me from back pain.
4,4,2020-09-29,Back pain and insomnia 🥴 https://t.co/3n0xEIkT21,2020-09,Back pain and insomnia :woozy face: _IMAGE
...,...,...,...,...,...
530305,530305,2019-03-01,💵🎧Magno Garcia x General Back Pain “Ballin On A Budget” Prod by ⁦@charliesdizz⁩ 🎧💵 soundcloud.com/magnogarcia/ma…,2019-03,:dollar banknote::headphone:Magno Garcia x General Back Pain “Ballin On A Budget” Prod by ⁦@charliesdizz⁩ :headphone::dollar banknote: soundcloud.com/magnogarcia/ma…
530306,530306,2019-03-01,@COMPLEXcsgo You still have back pain?,2019-03,@COMPLEXcsgo You still have back pain?
530307,530307,2019-03-01,This is everything I ever wanted and more. My heart is healed. My acne is gone. My back pain has faded. I am at peace. twitter.com/midnigtartist/…,2019-03,This is everything I ever wanted and more. My heart is healed. My acne is gone. My back pain has faded. I am at peace. twitter.com/midnigtartist/…
530308,530308,2019-03-01,A short video to kick off this month's campaign on back pain bit.ly/2SyjZoI co-kinetic.com/twittervideo/2…,2019-03,A short video to kick off this month's campaign on back pain bit.ly/2SyjZoI co-kinetic.com/twittervideo/2…


In [None]:
# This function does the same job as remove_redundant_char. Some words in the tweets are handled
# better by this function and others by remove_redundant_char. (Combining the two into a single function is in progress.)

# Compiled once: a non-whitespace sequence followed by two or more repeats of itself
repeat_regexp = re.compile(r'(\S+)(\1{2,})')

def repoo(x):
    # Replace the match with group 2 (the repeats), which drops one copy of the repeated unit per pass
    return repeat_regexp.sub(r'\2', x)


In [None]:
for i in range(10):
  df['tidy_text'] = df['tidy_text'].map(lambda x: repoo(str(x)))
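
One possible way to get the stated behavior (a long run of the same character reduced to two) in a single pass is a backreference that matches three or more identical characters. This is only a sketch under stated assumptions: `collapse_runs` is a hypothetical helper, it skips digits so that numbers such as 1000 stay intact, and unlike `repoo` it does not shorten repeated multi-character units such as "hahahaha".

```python
import re

# Any non-digit character repeated three or more times in a row.
_RUN = re.compile(r"(\D)\1{2,}")


def collapse_runs(text):
    """Shorten runs of 3+ identical non-digit characters to exactly two."""
    return _RUN.sub(r"\1\1", text)


print(collapse_runs("jeeeeeeeeez my back pain!!! 1000 steps"))
# jeez my back pain!! 1000 steps
```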

In [None]:
# Replace the remaining image links with an _IMAGE token

df['tidy_text'] = df['tidy_text'].str.replace(r'https://pbs.twimg.com/media/(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\), ]|(?:%[0-9a-fA-F][0-9a-fA-F]))\w+', ' _IMAGE', regex=True)

In [None]:
# Replace instances of twitter.com/"Username" with _RETWEET

df['tidy_text'] = df['tidy_text'].str.replace('twitter.com(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\), ]|(?:%[0-9a-fA-F][0-9a-fA-F]))+|twitter.com(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\), ]|(?:%[0-9a-fA-F][0-9a-fA-F])|\u00E2\u20AC)+\¦', ' _RETWEET', regex=True)

In [None]:
# Replace all URLs with the "_URL" token

df['tidy_text'] = df['tidy_text'].str.replace(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\), ]|(?:%[0-9a-fA-F][0-9a-fA-F]))\w+', ' _URL', regex=True)
df['tidy_text'] = df['tidy_text'].str.replace(r'URL#(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\), ]|(?:%[0-9a-fA-F][0-9a-fA-F]))\w+', ' _URL', regex=True)

# All other URLs contain ".com"
df['tidy_text'] = df['tidy_text'].str.replace(r'[a-zA-Z]\w+\.com\/[a-zA-Z0-9]\w+([\-\.\/]\w+)+', ' _URL', regex=True)

# URLs like bit.ly/3hQ1ee9
df['tidy_text'] = df['tidy_text'].str.replace(r'bit.ly(\/\d+\w+)+', ' _URL', regex=True)

# facebook.com & youtu.be (raw string so that \b is a word boundary, not a backspace character)
df['tidy_text'] = df['tidy_text'].str.replace(r'youtu.be(\/[A-Z]\w+)+|\bfacebook.com\b', ' _URL', regex=True)

In [None]:
# Replace instagram.com links with _URL (sample: instagram.com/p/B_Uvw2vl_Lh/)

df['tidy_text'] = df['tidy_text'].str.replace(r'instagram.com\/[a-zA-Z0-9]([\-\.\/\_]\w+)+', ' _URL', regex=True)
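
The link handling above depends on the order in which the patterns are applied: the specific rules (images, retweets) have to run before the generic URL rule. A compact way to make that order explicit is an ordered rule table. The sketch below uses simplified patterns that are my own assumptions rather than the notebook's regular expressions, and, like the notebook, it treats every t.co link as an image; `tokenize_links` and `RULES` are hypothetical names.

```python
import re

# Optional scheme and "www." in front of a host name.
_PREFIX = r"\b(?:https?://)?(?:www\.)?"

# Rules are applied top to bottom, most specific first.
RULES = [
    (re.compile(_PREFIX + r"pbs\.twimg\.com/media/\S+"), " _IMAGE "),
    (re.compile(r"https?://t\.co/\w+"), " _IMAGE "),
    (re.compile(_PREFIX + r"twitter\.com/\S+"), " _RETWEET "),
    (re.compile(r"https?://\S+"), " _URL "),
    (re.compile(r"\b(?:bit\.ly|youtu\.be|instagram\.com)/\S+"), " _URL "),
]


def tokenize_links(text):
    """Replace links with placeholder tokens and normalize whitespace."""
    for pattern, token in RULES:
        text = pattern.sub(token, text)
    return " ".join(text.split())


print(tokenize_links("Back pain and insomnia https://t.co/3n0xEIkT21"))
# Back pain and insomnia _IMAGE
```

Keeping the rules in one list makes it easier to see which pattern wins when two could match the same link.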

In [None]:
df

Unnamed: 0,tweet_id,date,text,month,tidy_text
0,0,2020-09-29,My lower back pain came out of no where tonight jeeeeeeeeez https://t.co/OHpzYJjY7p,2020-09,My lower back pain came out of no where tonight jeez _IMAGE
1,1,2020-09-29,"@FruitKace For me it is my hips. Lower back pain = stretch, stretch, stretch.",2020-09,"@FruitKace For me it is my hips. Lower back pain = stretch, stretch, stretch."
2,2,2020-09-29,"""We don't need bodyguard.\nWe don't need coocker.\nWe don't need house cleaner.\n\nWe need a fucking massager. Like, I'd literally do anything for some, people have no idea how painful it is to lay down and feel that back pain when all you want to do is to sleep. """,2020-09,"""We don't need bodyguard. We don't need coocker. We don't need house cleaner. We need a fucking massager. Like, I'd literally do anything for some, people have no idea how painful it is to lay down and feel that back pain when all you want to do is to sleep. """
3,3,2020-09-29,@AlexMLeo Rolfing saved me from back pain.,2020-09,@AlexMLeo Rolfing saved me from back pain.
4,4,2020-09-29,Back pain and insomnia 🥴 https://t.co/3n0xEIkT21,2020-09,Back pain and insomnia :woozy face: _IMAGE
...,...,...,...,...,...
530305,530305,2019-03-01,💵🎧Magno Garcia x General Back Pain “Ballin On A Budget” Prod by ⁦@charliesdizz⁩ 🎧💵 soundcloud.com/magnogarcia/ma…,2019-03,:dollar banknote::headphone:Magno Garcia x General Back Pain “Ballin On A Budget” Prod by ⁦@charliesdizz⁩ :headphone::dollar banknote: _URL…
530306,530306,2019-03-01,@COMPLEXcsgo You still have back pain?,2019-03,@COMPLEXcsgo You still have back pain?
530307,530307,2019-03-01,This is everything I ever wanted and more. My heart is healed. My acne is gone. My back pain has faded. I am at peace. twitter.com/midnigtartist/…,2019-03,This is everything I ever wanted and more. My heart is healed. My acne is gone. My back pain has faded. I am at peace. _RETWEET…
530308,530308,2019-03-01,A short video to kick off this month's campaign on back pain bit.ly/2SyjZoI co-kinetic.com/twittervideo/2…,2019-03,A short video to kick off this month's campaign on back pain _URL co- _URL…


In [None]:
!pip install langdetect
from langdetect import detect

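def kfdetect(x):
  """Return the language code detected by langdetect, or 'NA' if detection fails."""
  try:
    lang = detect(x)
  except Exception:
    # detect() raises when the text has no usable features (e.g. only symbols)
    lang = 'NA'
  return lang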




In [None]:
df['language'] = df['tidy_text'].map(lambda x:kfdetect(x))
df

Unnamed: 0,tweet_id,date,text,month,tidy_text,language
0,0,2020-09-29,My lower back pain came out of no where tonight jeeeeeeeeez https://t.co/OHpzYJjY7p,2020-09,My lower back pain came out of no where tonight jeez _IMAGE,en
1,1,2020-09-29,"@FruitKace For me it is my hips. Lower back pain = stretch, stretch, stretch.",2020-09,"@FruitKace For me it is my hips. Lower back pain = stretch, stretch, stretch.",en
2,2,2020-09-29,"""We don't need bodyguard.\nWe don't need coocker.\nWe don't need house cleaner.\n\nWe need a fucking massager. Like, I'd literally do anything for some, people have no idea how painful it is to lay down and feel that back pain when all you want to do is to sleep. """,2020-09,"""We don't need bodyguard. We don't need coocker. We don't need house cleaner. We need a fucking massager. Like, I'd literally do anything for some, people have no idea how painful it is to lay down and feel that back pain when all you want to do is to sleep. """,en
3,3,2020-09-29,@AlexMLeo Rolfing saved me from back pain.,2020-09,@AlexMLeo Rolfing saved me from back pain.,en
4,4,2020-09-29,Back pain and insomnia 🥴 https://t.co/3n0xEIkT21,2020-09,Back pain and insomnia :woozy face: _IMAGE,en
...,...,...,...,...,...,...
530305,530305,2019-03-01,💵🎧Magno Garcia x General Back Pain “Ballin On A Budget” Prod by ⁦@charliesdizz⁩ 🎧💵 soundcloud.com/magnogarcia/ma…,2019-03,:dollar banknote::headphone:Magno Garcia x General Back Pain “Ballin On A Budget” Prod by ⁦@charliesdizz⁩ :headphone::dollar banknote: _URL…,en
530306,530306,2019-03-01,@COMPLEXcsgo You still have back pain?,2019-03,@COMPLEXcsgo You still have back pain?,en
530307,530307,2019-03-01,This is everything I ever wanted and more. My heart is healed. My acne is gone. My back pain has faded. I am at peace. twitter.com/midnigtartist/…,2019-03,This is everything I ever wanted and more. My heart is healed. My acne is gone. My back pain has faded. I am at peace. _RETWEET…,en
530308,530308,2019-03-01,A short video to kick off this month's campaign on back pain bit.ly/2SyjZoI co-kinetic.com/twittervideo/2…,2019-03,A short video to kick off this month's campaign on back pain _URL co- _URL…,en


In [None]:
normal_tweet = df[df['language'] != 'NA']
len(normal_tweet)

530291

In [None]:
abnormal_tweet = df[df['language'] == 'NA']
abnormal_tweet

Unnamed: 0,tweet_id,date,text,month,tidy_text,language
70168,70168,2020-08-13,https://tfclarkfitnessmagazine.jcom/the-top-3-exercises-for-lower-back-pain/,2020-08,https://tfclarkfitnessmagazine.jcom/the-top-3-exercises-for-lower-back-pain/,
75940,75940,2020-08-10,𝗧𝗲𝘀𝘁𝗶𝗺𝗼𝗻𝗶𝗮𝗹\n𝗡𝗮𝗺𝗲 𝗼𝗳 𝗣𝗮𝘁𝗶𝗲𝗻𝘁: 𝗠𝗿𝘀 𝗦𝘂𝗻𝗶𝘁𝗮 𝗝𝗼𝘀𝗵𝗶\n𝗔𝗴𝗲: 𝟲𝟭 𝘆𝗲𝗮𝗿𝘀\n𝗗𝗶𝘀𝗼𝗿𝗱𝗲𝗿 𝗖𝘂𝗿𝗲𝗱: 𝗦𝘁𝗼𝗺𝗮𝗰𝗵 𝗜𝗻𝗳𝗲𝗰𝘁𝗶𝗼𝗻\n𝗟𝗶𝗻𝗸 𝗳𝗼𝗿 𝗧𝗲𝘀𝘁𝗶𝗺𝗼𝗻𝗶𝗮𝗹 𝗩𝗶𝗱𝗲𝗼:youtu.be/HbjoY2nqMnM\n#backpain #kneepain #healthy https://t.co/zuixjJcNc4,2020-08,𝗧𝗲𝘀𝘁𝗶𝗺𝗼𝗻𝗶𝗮𝗹 𝗡𝗮𝗺𝗲 𝗼𝗳 𝗣𝗮𝘁𝗶𝗲𝗻𝘁: 𝗠𝗿𝘀 𝗦𝘂𝗻𝗶𝘁𝗮 𝗝𝗼𝘀𝗵𝗶 𝗔𝗴𝗲: 𝟲𝟭 𝘆𝗲𝗮𝗿𝘀 𝗗𝗶𝘀𝗼𝗿𝗱𝗲𝗿 𝗖𝘂𝗿𝗲𝗱: 𝗦𝘁𝗼𝗺𝗮𝗰𝗵 𝗜𝗻𝗳𝗲𝗰𝘁𝗶𝗼𝗻 𝗟𝗶𝗻𝗸 𝗳𝗼𝗿 𝗧𝗲𝘀𝘁𝗶𝗺𝗼𝗻𝗶𝗮𝗹 𝗩𝗶𝗱𝗲𝗼: _URL #backpain #kneepain #healthy _IMAGE,
91494,91494,2020-07-30,"@dushnivi @daughterislife සාමාන්‍යයෙන් ස්ථූල අයට, වැඩි වේලාවක් වාඩි වෙලා (පරිගණක ආශ්‍රිතව වගේ) වැඩ කරන අයට, ව්‍යායාම් අඩු අයට, mechanical low back pain එකක් ඇති වීමේ සම්භාවිතාවය අඩුයි.",2020-07,"@dushnivi @daughterislife සාමාන්‍යයෙන් ස්ථූල අයට, වැඩි වේලාවක් වාඩි වෙලා (පරිගණක ආශ්‍රිතව වගේ) වැඩ කරන අයට, ව්‍යායාම් අඩු අයට, mechanical low back pain එකක් ඇති වීමේ සම්භාවිතාවය අඩුයි.",
108846,108846,2020-07-20,#𝗧𝗲𝘀𝘁𝗶𝗺𝗼𝗻𝗶𝗮𝗹\n𝗡𝗮𝗺𝗲 𝗼𝗳 𝗣𝗮𝘁𝗶𝗲𝗻𝘁: 𝗥𝗮𝗷𝘃𝗲𝗲𝗿 𝗚𝗮𝗱𝗲𝗸𝗮𝗿\n𝗔𝗴𝗲: 𝟮.𝟱 𝘆𝗲𝗮𝗿𝘀\n𝗗𝗶𝘀𝗼𝗿𝗱𝗲𝗿 𝗖𝘂𝗿𝗲𝗱: 𝗟𝗼𝘀𝘀 𝗼𝗳 𝗛𝗲𝗮𝗿𝗶𝗻𝗴\n𝗟𝗶𝗻𝗸 𝗳𝗼𝗿 𝗧𝗲𝘀𝘁𝗶𝗺𝗼𝗻𝗶𝗮𝗹 𝗩𝗶𝗱𝗲𝗼 -youtu.be/NDMmTr6zhVw\n#Backpain #JointPain #ayurvedic https://t.co/yaUqCV8KtU,2020-07,#𝗧𝗲𝘀𝘁𝗶𝗺𝗼𝗻𝗶𝗮𝗹 𝗡𝗮𝗺𝗲 𝗼𝗳 𝗣𝗮𝘁𝗶𝗲𝗻𝘁: 𝗥𝗮𝗷𝘃𝗲𝗲𝗿 𝗚𝗮𝗱𝗲𝗸𝗮𝗿 𝗔𝗴𝗲: 𝟮.𝟱 𝘆𝗲𝗮𝗿𝘀 𝗗𝗶𝘀𝗼𝗿𝗱𝗲𝗿 𝗖𝘂𝗿𝗲𝗱: 𝗟𝗼𝘀𝘀 𝗼𝗳 𝗛𝗲𝗮𝗿𝗶𝗻𝗴 𝗟𝗶𝗻𝗸 𝗳𝗼𝗿 𝗧𝗲𝘀𝘁𝗶𝗺𝗼𝗻𝗶𝗮𝗹 𝗩𝗶𝗱𝗲𝗼 - _URL #Backpain #JointPain #ayurvedic _IMAGE,
123564,123564,2020-07-11,Ｂａｃｋｐａｉｎ,2020-07,Ｂａｃｋｐａｉｎ,
128726,128726,2020-07-08,ʰᵉˡˡᵒ ᵇᵃᶜᵏ ᵖᵃⁱⁿ,2020-07,ʰᵉˡˡᵒ ᵇᵃᶜᵏ ᵖᵃⁱⁿ,
135206,135206,2020-07-04,ＢＡＣＫＰＡＩＮ,2020-07,ＢＡＣＫＰＡＩＮ,
273877,273877,2020-03-15,"Back pain? I don’t know her. ⁽ʰᵃʰᵃ ʰᵃʰᵃ ⁿᵒᵗ ʳᵉᵃˡˡʸ ᴵ ᵃᵐ ʷᵉˡˡ ᵃᵠᵘᵃᵗᶦⁿᵗᵉᵈ ᵃⁿᵈ ᵃᵐ ˢᵁᶠᶠᴱᴿᴵᴺᴳ,,, ᵖˡᵉᵃˢᵉ ᶠᵒʳ ᵗʰᵉ ˡᵒᵛᵉ ᵒᶠ ˢᵒᵘⁿᵈ ˢˡᵉᵉᵖ ᵃⁿᵈ ʸᵒᵘʳ ᴸᵁᴺᴳˢ ᵖʳᵃᶜᵗᶦᶜᵉ ˢᵃᶠᵉ ᵇᶦⁿᵈᶦⁿᵍ⁾ https://t.co/sSpxZjpR7Z",2020-03,"Back pain? I don’t know her. ⁽ʰᵃʰᵃ ʰᵃʰᵃ ⁿᵒᵗ ʳᵉᵃˡˡʸ ᴵ ᵃᵐ ʷᵉˡˡ ᵃᵠᵘᵃᵗᶦⁿᵗᵉᵈ ᵃⁿᵈ ᵃᵐ ˢᵁᶠᶠᴱᴿᴵᴺᴳ,, ᵖˡᵉᵃˢᵉ ᶠᵒʳ ᵗʰᵉ ˡᵒᵛᵉ ᵒᶠ ˢᵒᵘⁿᵈ ˢˡᵉᵉᵖ ᵃⁿᵈ ʸᵒᵘʳ ᴸᵁᴺᴳˢ ᵖʳᵃᶜᵗᶦᶜᵉ ˢᵃᶠᵉ ᵇᶦⁿᵈᶦⁿᵍ⁾ _IMAGE",
331091,331091,2019-08-25,【Ｂａｃｋ　Ｐａｉｎ】,2019-08,【Ｂａｃｋ Ｐａｉｎ】,
358836,358836,2019-07-30,▬▬▬.◙.▬▬▬ \n═▂▄▄▓▄▄▂ \n◢◤ █▀▀████▄▄▄▄◢◤ \n█▄ █ █▄ ███▀▀▀▀▀▀▀╬ \n◥█████◤ \n══╩══╩══\n╬═╬\n╬═╬\n╬═╬\n╬═╬ Dropping in to\n╬═╬ take away your \n╬═╬ back pain\n╬═╬ \n╬═╬☻/ \n╬═╬/▌ \n╬═╬/,2019-07,▬▬.◙.▬▬ ═▂▄▄▓▄▄▂ ◢◤ █▀▀██▄▄◢◤ █▄ █ █▄ ██▀▀╬ ◥██◤ ══╩══╩══ ╬═╬ ╬═╬ ╬═╬ ╬═╬ Dropping in to ╬═╬ take away your ╬═╬ back pain ╬═╬ ╬═╬☻/ ╬═╬/▌ ╬═╬/,


In [None]:
df[df['language'] == 'NA'].count()

tweet_id     19
date         19
text         19
month        19
tidy_text    19
language     19
dtype: int64

In [None]:
english_tweet = df[df['language'] == 'en']
len(english_tweet)

499868

In [None]:
non_english_tweet = df[df['language'] != 'en']
len(non_english_tweet)

30442

In [None]:
non_english_tweet

Unnamed: 0,tweet_id,date,text,month,tidy_text,language
10,10,2020-09-29,@QnA18MENFESS Hai nder. Kamu ada low back pain kayanya ya? Coba baca2 tentang itu. Trus coba diperhatiin kamu ada tanda2 skolio ga.,2020-09,@QnA18MENFESS Hai nder. Kamu ada low back pain kayanya ya? Coba baca2 tentang itu. Trus coba diperhatiin kamu ada tanda2 skolio ga.,id
49,49,2020-09-29,Badtrip gyud kay ni akong maduf*kin back pain bay,2020-09,Badtrip gyud kay ni akong maduf*kin back pain bay,tl
59,59,2020-09-29,本当に結果だけが重要です。 いくら頑張っても受からなかったらうんこなんです。 #共感したらRT,2020-09,本当に結果だけが重要です。 いくら頑張っても受からなかったらうんこなんです。 #共感したらRT,ja
83,83,2020-09-29,maika's back = pain twitter.com/eli_miranduh/s…,2020-09,maika's back = pain _RETWEET…,id
87,87,2020-09-29,Pain. \n\nBack pain.,2020-09,Pain. Back pain.,id
...,...,...,...,...,...,...
530220,530220,2019-03-01,Back pain + Migraine 😭 1 linggo ng laging ganto ah.,2019-03,Back pain + Migraine :loudly crying face: 1 linggo ng laging ganto ah.,tl
530231,530231,2019-03-01,Di makabangon sa sobrang sakit. 😭 #Backpain,2019-03,Di makabangon sa sobrang sakit. :loudly crying face: #Backpain,tl
530235,530235,2019-03-01,Back pain!!!,2019-03,Back pain!!,id
530245,530245,2019-03-01,腰痛や背中、肩や首、Back Painにはパーフェクトチェア。\n運動後のリカバリーにも最適、アメリカではオリンピックメダリストやトップアスリートが使用している高級リクライニングチェア！\n\nhumantouchjapan.com/product/perfec…,2019-03,腰痛や背中、肩や首、Back Painにはパーフェクトチェア。 運動後のリカバリーにも最適、アメリカではオリンピックメダリストやトップアスリートが使用している高級リクライニングチェア！ _URL…,ja


In [None]:
# Save non_english_tweet as an Excel file

non_english_tweet.to_excel('non_english_tweets.xlsx', index=False)

In [None]:
back_pain_random_sample = english_tweet.sample(n=5000, random_state=1)
back_pain_random_sample.to_excel('back_pain_tweets.xlsx', index=False)
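
Fixing the seed is what makes the labeled sample reproducible: rerunning the cell selects the same tweets, so the manual labels stay aligned with the data. The same principle, sketched with the standard library (the helper name and the ID list are hypothetical, and a seed used with `random` does not select the same rows as `random_state=1` in pandas):

```python
import random


def reproducible_sample(items, n, seed=1):
    """Draw n items without replacement using a fixed seed."""
    rng = random.Random(seed)
    return rng.sample(items, n)


tweet_ids = list(range(100))  # stand-in for the English tweet IDs
first = reproducible_sample(tweet_ids, 5)
second = reproducible_sample(tweet_ids, 5)
print(first == second)  # True
```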

In [None]:
# Further cleaning of the non-English tweets

non_english_tweet

Unnamed: 0,tweet_id,date,text,month,tidy_text,language
10,10,2020-09-29,@QnA18MENFESS Hai nder. Kamu ada low back pain kayanya ya? Coba baca2 tentang itu. Trus coba diperhatiin kamu ada tanda2 skolio ga.,2020-09,@QnA18MENFESS Hai nder. Kamu ada low back pain kayanya ya? Coba baca2 tentang itu. Trus coba diperhatiin kamu ada tanda2 skolio ga.,id
49,49,2020-09-29,Badtrip gyud kay ni akong maduf*kin back pain bay,2020-09,Badtrip gyud kay ni akong maduf*kin back pain bay,tl
59,59,2020-09-29,本当に結果だけが重要です。 いくら頑張っても受からなかったらうんこなんです。 #共感したらRT,2020-09,本当に結果だけが重要です。 いくら頑張っても受からなかったらうんこなんです。 #共感したらRT,ja
83,83,2020-09-29,maika's back = pain twitter.com/eli_miranduh/s…,2020-09,maika's back = pain _RETWEET…,id
87,87,2020-09-29,Pain. \n\nBack pain.,2020-09,Pain. Back pain.,id
...,...,...,...,...,...,...
530220,530220,2019-03-01,Back pain + Migraine 😭 1 linggo ng laging ganto ah.,2019-03,Back pain + Migraine :loudly crying face: 1 linggo ng laging ganto ah.,tl
530231,530231,2019-03-01,Di makabangon sa sobrang sakit. 😭 #Backpain,2019-03,Di makabangon sa sobrang sakit. :loudly crying face: #Backpain,tl
530235,530235,2019-03-01,Back pain!!!,2019-03,Back pain!!,id
530245,530245,2019-03-01,腰痛や背中、肩や首、Back Painにはパーフェクトチェア。\n運動後のリカバリーにも最適、アメリカではオリンピックメダリストやトップアスリートが使用している高級リクライニングチェア！\n\nhumantouchjapan.com/product/perfec…,2019-03,腰痛や背中、肩や首、Back Painにはパーフェクトチェア。 運動後のリカバリーにも最適、アメリカではオリンピックメダリストやトップアスリートが使用している高級リクライニングチェア！ _URL…,ja


In [None]:
non_english_tweet = non_english_tweet.dropna(axis=0)

In [None]:
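def CleanText(text):
    """Normalize a tweet: remove newlines and lowercase the text."""
    text = str(text)
    text = text.strip('\n')          # remove newlines at the start and end
    text = text.replace('\n', ' ')   # replace inner newlines with a space so words do not merge
    
    return text.lower()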

In [None]:
non_english_tweet['tidy_text'] = non_english_tweet['tidy_text'].apply(CleanText)

In [None]:
#non_english_tweet['tidy_text'] = non_english_tweet['tidy_text'].str.replace('back pain(?:|[$%!;+])', 'backpain...', regex=True)

In [None]:
# Remove @mentions; the raw string avoids an invalid-escape warning for \S
non_english_tweet['tidy_text'] = non_english_tweet['tidy_text'].str.replace(r'@\S+', '', regex=True)
non_english_tweet

Unnamed: 0,tweet_id,date,text,month,tidy_text,language
10,10,2020-09-29,@QnA18MENFESS Hai nder. Kamu ada low back pain kayanya ya? Coba baca2 tentang itu. Trus coba diperhatiin kamu ada tanda2 skolio ga.,2020-09,hai nder. kamu ada low back pain kayanya ya? coba baca2 tentang itu. trus coba diperhatiin kamu ada tanda2 skolio ga.,id
49,49,2020-09-29,Badtrip gyud kay ni akong maduf*kin back pain bay,2020-09,badtrip gyud kay ni akong maduf*kin back pain bay,tl
59,59,2020-09-29,本当に結果だけが重要です。 いくら頑張っても受からなかったらうんこなんです。 #共感したらRT,2020-09,本当に結果だけが重要です。 いくら頑張っても受からなかったらうんこなんです。 #共感したらrt,ja
83,83,2020-09-29,maika's back = pain twitter.com/eli_miranduh/s…,2020-09,maika's back = pain _retweet…,id
87,87,2020-09-29,Pain. \n\nBack pain.,2020-09,pain. back pain.,id
...,...,...,...,...,...,...
530220,530220,2019-03-01,Back pain + Migraine 😭 1 linggo ng laging ganto ah.,2019-03,back pain + migraine :loudly crying face: 1 linggo ng laging ganto ah.,tl
530231,530231,2019-03-01,Di makabangon sa sobrang sakit. 😭 #Backpain,2019-03,di makabangon sa sobrang sakit. :loudly crying face: #backpain,tl
530235,530235,2019-03-01,Back pain!!!,2019-03,back pain!!,id
530245,530245,2019-03-01,腰痛や背中、肩や首、Back Painにはパーフェクトチェア。\n運動後のリカバリーにも最適、アメリカではオリンピックメダリストやトップアスリートが使用している高級リクライニングチェア！\n\nhumantouchjapan.com/product/perfec…,2019-03,腰痛や背中、肩や首、back painにはパーフェクトチェア。 運動後のリカバリーにも最適、アメリカではオリンピックメダリストやトップアスリートが使用している高級リクライニングチェア！ _url…,ja
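
The mention-removal step can be tried on a single string with the standard `re` module. This is a minimal sketch, not the notebook's pandas call: `remove_mentions` and `MENTION_PATTERN` are hypothetical names introduced here, and the whitespace collapsing is an extra step that the cell above does not perform.

```python
import re

# "@" followed by one or more non-whitespace characters
MENTION_PATTERN = re.compile(r"@\S+")

def remove_mentions(text):
    # Delete each mention, then collapse the whitespace it leaves behind
    without_mentions = MENTION_PATTERN.sub("", text)
    return " ".join(without_mentions.split())

print(remove_mentions("@proudlynalayak i just don't want back pain"))
# i just don't want back pain
```

Note that the pattern matches any `@` followed by non-whitespace characters, so it would also cut the tail of an e-mail address appearing in a tweet.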


In [None]:
# Common Twitter abbreviations and English contractions mapped to their expansions

contractions = {
"afaik": "as far as i know","fr": "friend", "b/c": "because","bfn": "bye for now", "yo": "years old", "y/o": "years old", "iz": "is", "izz":"is", "br": "best regards", "btw": "by the way", 
"dm": "direct message", "em": "email", "fb": "facebook", "ftf": "face to face", "ftl": "for the loss",
"ftw": "for the win", "fwiw": "for what it is worth", "hth": "hope that helps", "imho": "in my humble opinion",
"imo": "in my opinion", "irl": "in real life", "jv": "joint venture", "j/k": "just kidding", "li": "linkedin",
"lmao": "laughing my ass off", "lmk": "let me know", "lol": "laughing out loud", "mt": "modified tweet",
"nsfw": "not safe for work", "oh": "overheard", "omfg": "oh my fuking god", "omg": "oh my god", "prt": "partial retweet",
"rthx": "thanks for the retweet", "sob": "son of a bitch", "tmb": "tweet me back", "tmi": "too much information",
"wth": "what the hell", "ymmv": "your mileage may vary", "yw": "you are welcome", "til": "today i learned",
"cx": "correction", "rtq": "read the question", "mf": "mother fucker", "im": "i am", "youre": "you are",
"can't": "cannot", "cant": "cannot", "could've": "could have", "couldn't": "could not", "couldnt": "could not",
"didn't": "did not", "doesn't": "does not", "don't": "do not", "didnt": "did not", "doesnt": "does not",
"dont": "do not", "hadn't": "had not", "hadnt": "had not", "hasn't": "has not", "haven't": "have not",
"hasnt": "has not", "havent": "have not", "he'll": "he will", "how'd": "how did", 
"how'd'y": "how do you", "how'll": "how will", "how's": "how is", "i'd": "i would",
"i'd've": "i would have", "i'll": "i will", "i'm": "i am", "i've": "i have", "isn't": "is not", "isnt": "is not"
}
# Notes: "hell" is not mapped, because it is an ordinary word and expanding it to
# "he will" would corrupt tweets. Ambiguous contractions use a single expansion,
# and all values are lowercase to match the lowercased tidy_text.

In [None]:
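def ReplaceWord(text):
    """Expand abbreviations and contractions token by token."""
    # Replacing whole tokens (instead of calling str.replace on the full text)
    # avoids corrupting longer words that contain an abbreviation,
    # e.g. "im" inside "image".
    words = text.split()
    expanded = [contractions.get(word.lower(), word) for word in words]
    
    return ' '.join(expanded)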
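
One caveat when expanding abbreviations is that `str.replace` on the full text also rewrites longer words that merely contain an abbreviation. The sketch below compares substring replacement with whole-token replacement. It uses a hypothetical two-entry dictionary (`demo_contractions`) rather than the full list above, and the function names are illustrative only.

```python
# Hypothetical mini-dictionary, for illustration only
demo_contractions = {"im": "i am", "yo": "years old"}

def expand_by_substring(text):
    # str.replace rewrites every occurrence, including inside longer words
    for word in text.split():
        if word in demo_contractions:
            text = text.replace(word, demo_contractions[word])
    return text

def expand_by_token(text):
    # Look up each whitespace-separated token and rebuild the sentence
    return " ".join(demo_contractions.get(word, word) for word in text.split())

sample = "im sending an image"
print(expand_by_substring(sample))  # i am sending an i amage
print(expand_by_token(sample))      # i am sending an image
```

Whole-token lookup leaves "image" intact, at the cost of normalizing runs of whitespace to single spaces.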

In [None]:
non_english_tweet['tidy_text'] = non_english_tweet['tidy_text'].apply(ReplaceWord)

In [None]:
# Re-run language detection on the cleaned text
non_english_tweet['language'] = non_english_tweet['tidy_text'].apply(kfdetect)
non_english_tweet

Unnamed: 0,tweet_id,date,text,month,tidy_text,language
10,10,2020-09-29,@QnA18MENFESS Hai nder. Kamu ada low back pain kayanya ya? Coba baca2 tentang itu. Trus coba diperhatiin kamu ada tanda2 skolio ga.,2020-09,hai nder. kamu ada low back pain kayanya ya? coba baca2 tentang itu. trus coba diperhatiin kamu ada tanda2 skolio ga.,id
49,49,2020-09-29,Badtrip gyud kay ni akong maduf*kin back pain bay,2020-09,badtrip gyud kay ni akong maduf*kin back pain bay,tl
59,59,2020-09-29,本当に結果だけが重要です。 いくら頑張っても受からなかったらうんこなんです。 #共感したらRT,2020-09,本当に結果だけが重要です。 いくら頑張っても受からなかったらうんこなんです。 #共感したらrt,ja
83,83,2020-09-29,maika's back = pain twitter.com/eli_miranduh/s…,2020-09,maika's back = pain _retweet…,en
87,87,2020-09-29,Pain. \n\nBack pain.,2020-09,pain. back pain.,id
...,...,...,...,...,...,...
530220,530220,2019-03-01,Back pain + Migraine 😭 1 linggo ng laging ganto ah.,2019-03,back pain + migraine :loudly crying face: 1 linggo ng laging ganto ah.,tl
530231,530231,2019-03-01,Di makabangon sa sobrang sakit. 😭 #Backpain,2019-03,di makabangon sa sobrang sakit. :loudly crying face: #backpain,tl
530235,530235,2019-03-01,Back pain!!!,2019-03,back pain!!,en
530245,530245,2019-03-01,腰痛や背中、肩や首、Back Painにはパーフェクトチェア。\n運動後のリカバリーにも最適、アメリカではオリンピックメダリストやトップアスリートが使用している高級リクライニングチェア！\n\nhumantouchjapan.com/product/perfec…,2019-03,腰痛や背中、肩や首、back painにはパーフェクトチェア。 運動後のリカバリーにも最適、アメリカではオリンピックメダリストやトップアスリートが使用している高級リクライニングチェア！ _url…,ja


In [None]:
# Tweets still detected as non-English after cleaning
non_english = non_english_tweet[non_english_tweet['language'] != 'en']
len(non_english)

25538

In [None]:
# Tweets re-classified as English after cleaning
english = non_english_tweet[non_english_tweet['language'] == 'en']
english

Unnamed: 0,tweet_id,date,text,month,tidy_text,language
83,83,2020-09-29,maika's back = pain twitter.com/eli_miranduh/s…,2020-09,maika's back = pain _retweet…,en
284,284,2020-09-29,@eishakhaliq FR HELP THE BACK PAIN TOO,2020-09,friend help the back pain too,en
402,402,2020-09-29,@proudlynalayak i just don't want back pain,2020-09,i just do not want back pain,en
570,570,2020-09-29,okay bACK PAIN IN YOUR AREA IT IS,2020-09,okay back pain in your area it is,en
576,576,2020-09-29,Backpain izz real🤧,2020-09,backpain is real:sneezing face:,en
...,...,...,...,...,...,...
529826,529826,2019-03-01,@OSCARanking A back pain.,2019-03,a back pain.,en
529878,529878,2019-03-01,@shyamMSDian07 @msdhoni 2007 nundi MS ki back pain start ayyndhi ala ani skip cheyakunda max matches aadadu.. On field lo unappudu okkasari ayna stretch chesthune untadu.. Ee injury lite repu aadathadu💪,2019-03,2007 nundi ms ki back pain start ayyndhi ala ani skip cheyakunda max matches aadadu.. on field lo unappudu okkasari ayna stretch chesthune untadu.. ee injury lite repu aadathadu:flexed biceps:,en
529963,529963,2019-03-01,Backpain https://t.co/FFjIdXQX4m,2019-03,backpain _image,en
530110,530110,2019-03-01,@salonpas No more back pain!!,2019-03,no more back pain!!,en


In [None]:
non_english.to_excel('non_english_tweets.xlsx', index=False)

In [None]:
english.to_excel('english_tweets.xlsx', index=False)