# Text Processing: Covid19 Positive
### Author: Ehsan Gharib-Nezhad


<!-- Let's review some of the pre-processing steps for text data:

- Remove special characters
- Tokenizing
- Lemmatizing/Stemming
- Stop word removal

`CountVectorizer` actually can do a lot of this for us! It is important to keep these steps in mind in case you want to change the default methods used for each of these. -->

In [1]:
# Load Libraries
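from myfunctions import *
# Explicit imports for the names used below, in case myfunctions does not re-export them
import re
import numpy as np
import pandas as pd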
from bs4 import BeautifulSoup #Function for removing html
from nltk.stem.porter import PorterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize, RegexpTokenizer


In [2]:
# Load datasets
df = pd.read_csv('../datasets/preprocessed_covid19positive_reddit_LAST.csv',index_col=0)

In [3]:
df.head()

Unnamed: 0,title,selftext,subreddit,created_utc,author,num_comments,score,is_self,timestamp
0,I am constantly seeing people asking about people’s experiences so I figured now we can go to one place and stop spamming the AMA and ask Reddit’s subs.,For those that have tested positive I hope every single one of you feels better soon!,COVID19positive,1584148032,the1andonlyjoja,191,1,True,2020-03-13 18:07:12
1,How severe are your guys' symptoms?,"I have no idea if I've got the coronavirus, or it's just a cold etc. I've got a runny nose and coughing quite a lot, and a bit of a headache. I don't want to go out in case I do have it and I give it to other people. Are your symptoms debilitating?",COVID19positive,1584358828,RocketFrasier,0,1,True,2020-03-16 04:40:28
2,How severe are your symptoms?,"I have no idea if I've got the coronavirus, or it's just a cold etc. I've got a runny nose and coughing quite a lot, and a bit of a headache. I don't want to go out in case I do have it and I give it to other people. My symptoms feel practically the same as a normal cold/flu, other than last night where my eyes hurt a bit, so I wasn't able to sleep. Are your symptoms debilitating?",COVID19positive,1584358937,RocketFrasier,42,1,True,2020-03-16 04:42:17
3,Shortness of breath as first symptom?,"I live in a Canadian province with only 3 presumed cases, all related to travel. I haven’t travelled outside my city, and haven’t been in contact with anyone who has a known case of covid19. However, 11 days ago I was interacting with someone who travelled from the US and had been travelling in international airports. \n\nI’m 21F and feel like I’m experiencing shortness of breath, but unsure b...",COVID19positive,1584375181,_haligirl98_,77,1,True,2020-03-16 09:13:01
4,"I need advice, please!","Yesterday I woke up and noticed I had a shortness of breath a few hours after waking up. At work I went home early because I felt weak/tired. \n\nThis morning, I don’t feel as weak anymore, but I still have the shortness of breath and slight chest pain. Could this be allergies? Did any of you have these symptoms?\n\nI called my doctors and they will not test me unless I have a cough with a fev...",COVID19positive,1584382909,monkcell,11,2,True,2020-03-16 11:21:49


### Data shape

In [4]:
df.shape

(30815, 9)

In [5]:
df['selftext'].str.len().sum()

23208602

### Drop rows with selftext equal to '[removed]'

In [6]:
# percentage of rows with "[removed]" word
print(f"percentage of rows with '[removed]' word: \
      {np.round(len(df[df['selftext']=='[removed]'])*100/len(df),2)}%")

percentage of rows with '[removed]' word:       0.0%


In [7]:
# remove all rows with selftext = "[removed]"
df.drop(index=df[df['selftext']=='[removed]'].index, inplace=True)

In [8]:
df.reset_index(drop=True, inplace=True)

In [9]:
df.head()

Unnamed: 0,title,selftext,subreddit,created_utc,author,num_comments,score,is_self,timestamp
0,I am constantly seeing people asking about people’s experiences so I figured now we can go to one place and stop spamming the AMA and ask Reddit’s subs.,For those that have tested positive I hope every single one of you feels better soon!,COVID19positive,1584148032,the1andonlyjoja,191,1,True,2020-03-13 18:07:12
1,How severe are your guys' symptoms?,"I have no idea if I've got the coronavirus, or it's just a cold etc. I've got a runny nose and coughing quite a lot, and a bit of a headache. I don't want to go out in case I do have it and I give it to other people. Are your symptoms debilitating?",COVID19positive,1584358828,RocketFrasier,0,1,True,2020-03-16 04:40:28
2,How severe are your symptoms?,"I have no idea if I've got the coronavirus, or it's just a cold etc. I've got a runny nose and coughing quite a lot, and a bit of a headache. I don't want to go out in case I do have it and I give it to other people. My symptoms feel practically the same as a normal cold/flu, other than last night where my eyes hurt a bit, so I wasn't able to sleep. Are your symptoms debilitating?",COVID19positive,1584358937,RocketFrasier,42,1,True,2020-03-16 04:42:17
3,Shortness of breath as first symptom?,"I live in a Canadian province with only 3 presumed cases, all related to travel. I haven’t travelled outside my city, and haven’t been in contact with anyone who has a known case of covid19. However, 11 days ago I was interacting with someone who travelled from the US and had been travelling in international airports. \n\nI’m 21F and feel like I’m experiencing shortness of breath, but unsure b...",COVID19positive,1584375181,_haligirl98_,77,1,True,2020-03-16 09:13:01
4,"I need advice, please!","Yesterday I woke up and noticed I had a shortness of breath a few hours after waking up. At work I went home early because I felt weak/tired. \n\nThis morning, I don’t feel as weak anymore, but I still have the shortness of breath and slight chest pain. Could this be allergies? Did any of you have these symptoms?\n\nI called my doctors and they will not test me unless I have a cough with a fev...",COVID19positive,1584382909,monkcell,11,2,True,2020-03-16 11:21:49


### Drop rows with nan in the selftext

In [10]:
# null percentage
df.isnull().sum()*100/len(df)

title           0.0
selftext        0.0
subreddit       0.0
created_utc     0.0
author          0.0
num_comments    0.0
score           0.0
is_self         0.0
timestamp       0.0
dtype: float64

In [11]:
#drop all rows with nulls
df.dropna(inplace=True)

In [12]:
# resetting the index
df.reset_index(inplace=True, drop = True)

In [13]:
# check for any remaining nulls
df.isna().sum()

title           0
selftext        0
subreddit       0
created_utc     0
author          0
num_comments    0
score           0
is_self         0
timestamp       0
dtype: int64

In [14]:
df.head()

Unnamed: 0,title,selftext,subreddit,created_utc,author,num_comments,score,is_self,timestamp
0,I am constantly seeing people asking about people’s experiences so I figured now we can go to one place and stop spamming the AMA and ask Reddit’s subs.,For those that have tested positive I hope every single one of you feels better soon!,COVID19positive,1584148032,the1andonlyjoja,191,1,True,2020-03-13 18:07:12
1,How severe are your guys' symptoms?,"I have no idea if I've got the coronavirus, or it's just a cold etc. I've got a runny nose and coughing quite a lot, and a bit of a headache. I don't want to go out in case I do have it and I give it to other people. Are your symptoms debilitating?",COVID19positive,1584358828,RocketFrasier,0,1,True,2020-03-16 04:40:28
2,How severe are your symptoms?,"I have no idea if I've got the coronavirus, or it's just a cold etc. I've got a runny nose and coughing quite a lot, and a bit of a headache. I don't want to go out in case I do have it and I give it to other people. My symptoms feel practically the same as a normal cold/flu, other than last night where my eyes hurt a bit, so I wasn't able to sleep. Are your symptoms debilitating?",COVID19positive,1584358937,RocketFrasier,42,1,True,2020-03-16 04:42:17
3,Shortness of breath as first symptom?,"I live in a Canadian province with only 3 presumed cases, all related to travel. I haven’t travelled outside my city, and haven’t been in contact with anyone who has a known case of covid19. However, 11 days ago I was interacting with someone who travelled from the US and had been travelling in international airports. \n\nI’m 21F and feel like I’m experiencing shortness of breath, but unsure b...",COVID19positive,1584375181,_haligirl98_,77,1,True,2020-03-16 09:13:01
4,"I need advice, please!","Yesterday I woke up and noticed I had a shortness of breath a few hours after waking up. At work I went home early because I felt weak/tired. \n\nThis morning, I don’t feel as weak anymore, but I still have the shortness of breath and slight chest pain. Could this be allergies? Did any of you have these symptoms?\n\nI called my doctors and they will not test me unless I have a cough with a fev...",COVID19positive,1584382909,monkcell,11,2,True,2020-03-16 11:21:49


### Lower Casing

In [15]:
df['post']  = df['selftext'].str.lower()

In [16]:
df['post']

0                                                                                                                                                                                                                                                                                                                                  for those that have tested positive i hope every single one of you feels better soon!
1                                                                                                                                                               i have no idea if i've got the coronavirus, or it's just a cold etc. i've got a runny nose and coughing quite a lot, and a bit of a headache. i don't want to go out in case i do have it and i give it to other people. are your symptoms debilitating?
2                       i have no idea if i've got the coronavirus, or it's just a cold etc. i've got a runny nose and coughing quite a lot, and a bit of a headache. i don't want to 

### Remove URLs / Website addresses

In [17]:
# Function for removing URLs (http/https links and bare www. addresses)
def remove_urls(text):
    url_pattern = re.compile(r'https?://\S+|www\.\S+')
    return url_pattern.sub(r'', text)

In [18]:
df['post'] = df['post'].map( remove_urls )
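
As a quick check of what the pattern matches, the sketch below applies the same regular expression to a made-up string (the sample text and the helper name `strip_urls` are assumptions, not taken from the dataset). Because `\S+` runs to the next whitespace, punctuation attached to a URL is removed along with it.

```python
import re

# Same pattern as remove_urls above: http(s) links or bare www. addresses
URL_PATTERN = re.compile(r'https?://\S+|www\.\S+')

def strip_urls(text):
    """Remove URL-like substrings from a string."""
    return URL_PATTERN.sub('', text)

# Hypothetical sample post
sample = "see https://example.com/page?id=1 or www.example.org, thanks"
cleaned = strip_urls(sample)
print(repr(cleaned))
```

The surrounding spaces are left in place, so a later whitespace-normalizing step may still be useful.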

### Removing special characters

In [19]:
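# Raw strings keep the regex escapes (\S, \*) from being read as string escapes.
# Note: r'\n\n\S+' removes each paragraph break together with the word that follows it.
df['post'] = (df['post']
              .replace(r'http\S+', '', regex=True)
              .replace(r'www\S+', '', regex=True)
              .replace(r'\n\n\S+', '', regex=True)
              .replace(r'\n', '', regex=True)
              .replace(r'\*', '', regex=True))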
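
One side effect of the chain above is worth seeing in isolation: the pattern `\n\n\S+` deletes the paragraph break and also the first word after it (in the `df.head()` output further down, "This morning" is reduced to "morning"). The sketch below reproduces this with plain `re` on a made-up string. The alternative pattern is an assumption about what may have been intended, not the author's method.

```python
import re

# Hypothetical post with a paragraph break, as in the raw selftext
sample = "i felt weak.\n\nthis morning i feel better."

# Pattern from the chain above: the break AND the next word are removed
dropped = re.sub(r'\n\n\S+', '', sample)

# Alternative (an assumption): replace any run of newlines and the
# whitespace around it with a single space, keeping every word
kept = re.sub(r'\s*\n+\s*', ' ', sample)

print(dropped)
print(kept)
```

If the first word of each paragraph matters for the downstream model, the second form keeps it.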

In [20]:
df['post']

0                                                                                                                                                                                                                                                                                                                                  for those that have tested positive i hope every single one of you feels better soon!
1                                                                                                                                                               i have no idea if i've got the coronavirus, or it's just a cold etc. i've got a runny nose and coughing quite a lot, and a bit of a headache. i don't want to go out in case i do have it and i give it to other people. are your symptoms debilitating?
2                       i have no idea if i've got the coronavirus, or it's just a cold etc. i've got a runny nose and coughing quite a lot, and a bit of a headache. i don't want to 

In [33]:
df['post'].str.len().sum()

22779514

### Find/Count emoji

In [22]:
import demoji

In [23]:
def find_emoji(series, print_option=False):
    # True for every post in which demoji finds at least one emoji
    has_emoji = series.map(demoji.findall) != {}
    if print_option:
        print(series[has_emoji])
    return has_emoji.sum()

In [24]:
find_emoji(df['post'])

847

### Remove emoji

In [25]:
def remove_emoji(dataframe):
    return dataframe.map(demoji.replace)

In [26]:
df['post'] = remove_emoji(df['post'])

### Convert emoji to text
All emojis are removed for the first part of the project, which is distinguishing between two subreddits.
However, emojis are converted to text for sentiment analysis.

In [27]:
import emoji
def convert_emoji_to_text(text):
    return emoji.demojize(text)

In [28]:
# df['selftext'].iloc[0:10].map(convert_emoji_to_text)

### Removal of HTML tags

In [29]:
from bs4 import BeautifulSoup #Function for removing html

In [30]:
def remove_html(text):
    return BeautifulSoup(text, "lxml").text

In [31]:
df['post'] = df['post'].map(remove_html)

In [32]:
df.head()

Unnamed: 0,title,selftext,subreddit,created_utc,author,num_comments,score,is_self,timestamp,post
0,I am constantly seeing people asking about people’s experiences so I figured now we can go to one place and stop spamming the AMA and ask Reddit’s subs.,For those that have tested positive I hope every single one of you feels better soon!,COVID19positive,1584148032,the1andonlyjoja,191,1,True,2020-03-13 18:07:12,for those that have tested positive i hope every single one of you feels better soon!
1,How severe are your guys' symptoms?,"I have no idea if I've got the coronavirus, or it's just a cold etc. I've got a runny nose and coughing quite a lot, and a bit of a headache. I don't want to go out in case I do have it and I give it to other people. Are your symptoms debilitating?",COVID19positive,1584358828,RocketFrasier,0,1,True,2020-03-16 04:40:28,"i have no idea if i've got the coronavirus, or it's just a cold etc. i've got a runny nose and coughing quite a lot, and a bit of a headache. i don't want to go out in case i do have it and i give it to other people. are your symptoms debilitating?"
2,How severe are your symptoms?,"I have no idea if I've got the coronavirus, or it's just a cold etc. I've got a runny nose and coughing quite a lot, and a bit of a headache. I don't want to go out in case I do have it and I give it to other people. My symptoms feel practically the same as a normal cold/flu, other than last night where my eyes hurt a bit, so I wasn't able to sleep. Are your symptoms debilitating?",COVID19positive,1584358937,RocketFrasier,42,1,True,2020-03-16 04:42:17,"i have no idea if i've got the coronavirus, or it's just a cold etc. i've got a runny nose and coughing quite a lot, and a bit of a headache. i don't want to go out in case i do have it and i give it to other people. my symptoms feel practically the same as a normal cold/flu, other than last night where my eyes hurt a bit, so i wasn't able to sleep. are your symptoms debilitating?"
3,Shortness of breath as first symptom?,"I live in a Canadian province with only 3 presumed cases, all related to travel. I haven’t travelled outside my city, and haven’t been in contact with anyone who has a known case of covid19. However, 11 days ago I was interacting with someone who travelled from the US and had been travelling in international airports. \n\nI’m 21F and feel like I’m experiencing shortness of breath, but unsure b...",COVID19positive,1584375181,_haligirl98_,77,1,True,2020-03-16 09:13:01,"i live in a canadian province with only 3 presumed cases, all related to travel. i haven’t travelled outside my city, and haven’t been in contact with anyone who has a known case of covid19. however, 11 days ago i was interacting with someone who travelled from the us and had been travelling in international airports. 21f and feel like i’m experiencing shortness of breath, but unsure because ..."
4,"I need advice, please!","Yesterday I woke up and noticed I had a shortness of breath a few hours after waking up. At work I went home early because I felt weak/tired. \n\nThis morning, I don’t feel as weak anymore, but I still have the shortness of breath and slight chest pain. Could this be allergies? Did any of you have these symptoms?\n\nI called my doctors and they will not test me unless I have a cough with a fev...",COVID19positive,1584382909,monkcell,11,2,True,2020-03-16 11:21:49,"yesterday i woke up and noticed i had a shortness of breath a few hours after waking up. at work i went home early because i felt weak/tired. morning, i don’t feel as weak anymore, but i still have the shortness of breath and slight chest pain. could this be allergies? did any of you have these symptoms? called my doctors and they will not test me unless i have a cough with a fever and shortn..."


### Replace all non-letters with space

In [34]:
def replace_all_non_letters_with_space(text):
    return re.sub(r"[^a-zA-Z]",  # Search for all non-letters
                  " ",           # Replace all non-letters with spaces
                  str(text))

In [35]:
df['post'] = df['post'].map(replace_all_non_letters_with_space)
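
To make the effect of this step concrete, the sketch below runs the same substitution on a made-up string (the sample and the helper name `letters_only` are assumptions). Digits and apostrophes become spaces too, so "covid19" loses its number and "i'm" splits into two fragments.

```python
import re

def letters_only(text):
    """Replace every character outside a-z / A-Z with a space."""
    return re.sub(r"[^a-zA-Z]", " ", str(text))

# Hypothetical sample post
sample = "i'm 21f, covid19 positive"
spaced = letters_only(sample)

# Optional follow-up (an assumption): collapse the runs of spaces left behind
collapsed = " ".join(spaced.split())

print(repr(spaced))
print(collapsed)
```

Each replaced character is swapped one-for-one, so the string length is unchanged until the spaces are collapsed.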

In [36]:
df.head()

Unnamed: 0,title,selftext,subreddit,created_utc,author,num_comments,score,is_self,timestamp,post
0,I am constantly seeing people asking about people’s experiences so I figured now we can go to one place and stop spamming the AMA and ask Reddit’s subs.,For those that have tested positive I hope every single one of you feels better soon!,COVID19positive,1584148032,the1andonlyjoja,191,1,True,2020-03-13 18:07:12,for those that have tested positive i hope every single one of you feels better soon
1,How severe are your guys' symptoms?,"I have no idea if I've got the coronavirus, or it's just a cold etc. I've got a runny nose and coughing quite a lot, and a bit of a headache. I don't want to go out in case I do have it and I give it to other people. Are your symptoms debilitating?",COVID19positive,1584358828,RocketFrasier,0,1,True,2020-03-16 04:40:28,i have no idea if i ve got the coronavirus or it s just a cold etc i ve got a runny nose and coughing quite a lot and a bit of a headache i don t want to go out in case i do have it and i give it to other people are your symptoms debilitating
2,How severe are your symptoms?,"I have no idea if I've got the coronavirus, or it's just a cold etc. I've got a runny nose and coughing quite a lot, and a bit of a headache. I don't want to go out in case I do have it and I give it to other people. My symptoms feel practically the same as a normal cold/flu, other than last night where my eyes hurt a bit, so I wasn't able to sleep. Are your symptoms debilitating?",COVID19positive,1584358937,RocketFrasier,42,1,True,2020-03-16 04:42:17,i have no idea if i ve got the coronavirus or it s just a cold etc i ve got a runny nose and coughing quite a lot and a bit of a headache i don t want to go out in case i do have it and i give it to other people my symptoms feel practically the same as a normal cold flu other than last night where my eyes hurt a bit so i wasn t able to sleep are your symptoms debilitating
3,Shortness of breath as first symptom?,"I live in a Canadian province with only 3 presumed cases, all related to travel. I haven’t travelled outside my city, and haven’t been in contact with anyone who has a known case of covid19. However, 11 days ago I was interacting with someone who travelled from the US and had been travelling in international airports. \n\nI’m 21F and feel like I’m experiencing shortness of breath, but unsure b...",COVID19positive,1584375181,_haligirl98_,77,1,True,2020-03-16 09:13:01,i live in a canadian province with only presumed cases all related to travel i haven t travelled outside my city and haven t been in contact with anyone who has a known case of covid however days ago i was interacting with someone who travelled from the us and had been travelling in international airports f and feel like i m experiencing shortness of breath but unsure because ...
4,"I need advice, please!","Yesterday I woke up and noticed I had a shortness of breath a few hours after waking up. At work I went home early because I felt weak/tired. \n\nThis morning, I don’t feel as weak anymore, but I still have the shortness of breath and slight chest pain. Could this be allergies? Did any of you have these symptoms?\n\nI called my doctors and they will not test me unless I have a cough with a fev...",COVID19positive,1584382909,monkcell,11,2,True,2020-03-16 11:21:49,yesterday i woke up and noticed i had a shortness of breath a few hours after waking up at work i went home early because i felt weak tired morning i don t feel as weak anymore but i still have the shortness of breath and slight chest pain could this be allergies did any of you have these symptoms called my doctors and they will not test me unless i have a cough with a fever and shortn...


### Remove Stop Words

In [37]:
from nltk.corpus import stopwords

# Build the stop-word set once; set lookups are much faster than
# calling stopwords.words('english') for every token
STOPWORDS = set(stopwords.words('english'))

def remove_stop_words(tokens):
    """Return the tokens that are not English stop words."""
    return [token for token in tokens if token not in STOPWORDS]

In [38]:
# Named so that it does not shadow the imported nltk `stopwords` module
def remove_stopwords_from_text(text):
    """Drop English stop words from a whitespace-delimited string."""
    return " ".join(word for word in str(text).split() if word not in STOPWORDS)

df["post"] = df["post"].apply(remove_stopwords_from_text)
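
The filtering logic can be tried without NLTK. The sketch below uses a tiny hand-picked stop-word set (an assumption for illustration; NLTK's English list is much larger) and a hypothetical helper `drop_stop_words`.

```python
# Tiny hand-picked stop-word set for illustration only
TOY_STOPWORDS = {"i", "have", "no", "if", "the", "a", "of", "and", "it", "to"}

def drop_stop_words(text, stop_words=TOY_STOPWORDS):
    """Keep only the whitespace-separated words that are not in stop_words."""
    return " ".join(word for word in str(text).split() if word not in stop_words)

# Hypothetical sample post
sample = "i have no idea if i have the coronavirus"
filtered = drop_stop_words(sample)
print(filtered)
```

Negations such as "no" are treated as stop words here, just as "i have no idea" becomes "idea" in the output below. That may be acceptable for telling subreddits apart, but it is worth revisiting before sentiment analysis.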

In [39]:
df.head()

Unnamed: 0,title,selftext,subreddit,created_utc,author,num_comments,score,is_self,timestamp,post
0,I am constantly seeing people asking about people’s experiences so I figured now we can go to one place and stop spamming the AMA and ask Reddit’s subs.,For those that have tested positive I hope every single one of you feels better soon!,COVID19positive,1584148032,the1andonlyjoja,191,1,True,2020-03-13 18:07:12,tested positive hope every single one feels better soon
1,How severe are your guys' symptoms?,"I have no idea if I've got the coronavirus, or it's just a cold etc. I've got a runny nose and coughing quite a lot, and a bit of a headache. I don't want to go out in case I do have it and I give it to other people. Are your symptoms debilitating?",COVID19positive,1584358828,RocketFrasier,0,1,True,2020-03-16 04:40:28,idea got coronavirus cold etc got runny nose coughing quite lot bit headache want go case give people symptoms debilitating
2,How severe are your symptoms?,"I have no idea if I've got the coronavirus, or it's just a cold etc. I've got a runny nose and coughing quite a lot, and a bit of a headache. I don't want to go out in case I do have it and I give it to other people. My symptoms feel practically the same as a normal cold/flu, other than last night where my eyes hurt a bit, so I wasn't able to sleep. Are your symptoms debilitating?",COVID19positive,1584358937,RocketFrasier,42,1,True,2020-03-16 04:42:17,idea got coronavirus cold etc got runny nose coughing quite lot bit headache want go case give people symptoms feel practically normal cold flu last night eyes hurt bit able sleep symptoms debilitating
3,Shortness of breath as first symptom?,"I live in a Canadian province with only 3 presumed cases, all related to travel. I haven’t travelled outside my city, and haven’t been in contact with anyone who has a known case of covid19. However, 11 days ago I was interacting with someone who travelled from the US and had been travelling in international airports. \n\nI’m 21F and feel like I’m experiencing shortness of breath, but unsure b...",COVID19positive,1584375181,_haligirl98_,77,1,True,2020-03-16 09:13:01,live canadian province presumed cases related travel travelled outside city contact anyone known case covid however days ago interacting someone travelled us travelling international airports f feel like experiencing shortness breath unsure never felt way felt slight pressure tightness right side chest feel like get much air lungs usual pains sharp feelings yet fever cough headache symptom rea...
4,"I need advice, please!","Yesterday I woke up and noticed I had a shortness of breath a few hours after waking up. At work I went home early because I felt weak/tired. \n\nThis morning, I don’t feel as weak anymore, but I still have the shortness of breath and slight chest pain. Could this be allergies? Did any of you have these symptoms?\n\nI called my doctors and they will not test me unless I have a cough with a fev...",COVID19positive,1584382909,monkcell,11,2,True,2020-03-16 11:21:49,yesterday woke noticed shortness breath hours waking work went home early felt weak tired morning feel weak anymore still shortness breath slight chest pain could allergies symptoms called doctors test unless cough fever shortness breath cough fever


### Spelling Correction

In [40]:
def compare(corrected_text, original_text):
    # Compare the first corrected post with the first original post, word by word
    corrected_words = list(corrected_text)[0].split(' ')
    original_words = list(original_text)[0].split(' ')
    good = 0
    bad = 0
    # zip stops at the shorter list, so a length mismatch cannot raise IndexError
    for corrected, original in zip(corrected_words, original_words):
        if corrected != original:
            bad += 1
            print(corrected, original)
        else:
            good += 1
    total = good + bad
    print(f'Number of unchanged words = {good},'
          f'\nNumber of corrected words = {bad},'
          f'\nCorrection percentage = {np.round(bad*100/max(total, 1), 1)}%')


In [41]:
from textblob import TextBlob

def correct_spell(original_text_df):
    # Run TextBlob's spelling correction on every post; this can be slow on a large corpus
    return original_text_df.apply(lambda x: str(TextBlob(x).correct()))

In [42]:
# df['post'] = correct_spell(original_text_df=df['post'])

In [43]:
df.head()

Unnamed: 0,title,selftext,subreddit,created_utc,author,num_comments,score,is_self,timestamp,post
0,I am constantly seeing people asking about people’s experiences so I figured now we can go to one place and stop spamming the AMA and ask Reddit’s subs.,For those that have tested positive I hope every single one of you feels better soon!,COVID19positive,1584148032,the1andonlyjoja,191,1,True,2020-03-13 18:07:12,tested positive hope every single one feels better soon
1,How severe are your guys' symptoms?,"I have no idea if I've got the coronavirus, or it's just a cold etc. I've got a runny nose and coughing quite a lot, and a bit of a headache. I don't want to go out in case I do have it and I give it to other people. Are your symptoms debilitating?",COVID19positive,1584358828,RocketFrasier,0,1,True,2020-03-16 04:40:28,idea got coronavirus cold etc got runny nose coughing quite lot bit headache want go case give people symptoms debilitating
2,How severe are your symptoms?,"I have no idea if I've got the coronavirus, or it's just a cold etc. I've got a runny nose and coughing quite a lot, and a bit of a headache. I don't want to go out in case I do have it and I give it to other people. My symptoms feel practically the same as a normal cold/flu, other than last night where my eyes hurt a bit, so I wasn't able to sleep. Are your symptoms debilitating?",COVID19positive,1584358937,RocketFrasier,42,1,True,2020-03-16 04:42:17,idea got coronavirus cold etc got runny nose coughing quite lot bit headache want go case give people symptoms feel practically normal cold flu last night eyes hurt bit able sleep symptoms debilitating
3,Shortness of breath as first symptom?,"I live in a Canadian province with only 3 presumed cases, all related to travel. I haven’t travelled outside my city, and haven’t been in contact with anyone who has a known case of covid19. However, 11 days ago I was interacting with someone who travelled from the US and had been travelling in international airports. \n\nI’m 21F and feel like I’m experiencing shortness of breath, but unsure b...",COVID19positive,1584375181,_haligirl98_,77,1,True,2020-03-16 09:13:01,live canadian province presumed cases related travel travelled outside city contact anyone known case covid however days ago interacting someone travelled us travelling international airports f feel like experiencing shortness breath unsure never felt way felt slight pressure tightness right side chest feel like get much air lungs usual pains sharp feelings yet fever cough headache symptom rea...
4,"I need advice, please!","Yesterday I woke up and noticed I had a shortness of breath a few hours after waking up. At work I went home early because I felt weak/tired. \n\nThis morning, I don’t feel as weak anymore, but I still have the shortness of breath and slight chest pain. Could this be allergies? Did any of you have these symptoms?\n\nI called my doctors and they will not test me unless I have a cough with a fev...",COVID19positive,1584382909,monkcell,11,2,True,2020-03-16 11:21:49,yesterday woke noticed shortness breath hours waking work went home early felt weak tired morning feel weak anymore still shortness breath slight chest pain could allergies symptoms called doctors test unless cough fever shortness breath cough fever


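### Stemming
When we stem text, we reduce each word to a base form by stripping suffixes with fixed rules. The result is not always a dictionary word (in the output below, "positive" becomes "posit" and "every" becomes "everi"), so stemming tends to be cruder than lemmatization.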
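
To illustrate the idea of rule-based suffix stripping, here is a deliberately naive sketch. It is not the Porter algorithm used below, which applies a much longer sequence of rules. The function name `toy_stem` and its suffix list are assumptions made for this example.

```python
def toy_stem(word):
    """Strip one common suffix; a naive stand-in, not the Porter algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical sample words
words = ["coughing", "tested", "symptoms", "fever"]
stems = [toy_stem(w) for w in words]
print(stems)
```

Rules like these never consult a dictionary, which is why a stemmer can return fragments that are not real words.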

In [44]:
Pstemmizer = PorterStemmer()

In [45]:
def make_token(post):
    tokenizer = RegexpTokenizer(r'\w+')  # keep runs of word characters, dropping punctuation
    post_tokens = tokenizer.tokenize(post)
    stemmed_tokens = [Pstemmizer.stem(token) for token in post_tokens]
    return ' '.join(stemmed_tokens)
    

In [46]:
df['token'] = df['post'].map(make_token)

In [47]:
df

Unnamed: 0,title,selftext,subreddit,created_utc,author,num_comments,score,is_self,timestamp,post,token
0,I am constantly seeing people asking about people’s experiences so I figured now we can go to one place and stop spamming the AMA and ask Reddit’s subs.,For those that have tested positive I hope every single one of you feels better soon!,COVID19positive,1584148032,the1andonlyjoja,191,1,True,2020-03-13 18:07:12,tested positive hope every single one feels better soon,test posit hope everi singl one feel better soon
1,How severe are your guys' symptoms?,"I have no idea if I've got the coronavirus, or it's just a cold etc. I've got a runny nose and coughing quite a lot, and a bit of a headache. I don't want to go out in case I do have it and I give it to other people. Are your symptoms debilitating?",COVID19positive,1584358828,RocketFrasier,0,1,True,2020-03-16 04:40:28,idea got coronavirus cold etc got runny nose coughing quite lot bit headache want go case give people symptoms debilitating,idea got coronaviru cold etc got runni nose cough quit lot bit headach want go case give peopl symptom debilit
2,How severe are your symptoms?,"I have no idea if I've got the coronavirus, or it's just a cold etc. I've got a runny nose and coughing quite a lot, and a bit of a headache. I don't want to go out in case I do have it and I give it to other people. My symptoms feel practically the same as a normal cold/flu, other than last night where my eyes hurt a bit, so I wasn't able to sleep. Are your symptoms debilitating?",COVID19positive,1584358937,RocketFrasier,42,1,True,2020-03-16 04:42:17,idea got coronavirus cold etc got runny nose coughing quite lot bit headache want go case give people symptoms feel practically normal cold flu last night eyes hurt bit able sleep symptoms debilitating,idea got coronaviru cold etc got runni nose cough quit lot bit headach want go case give peopl symptom feel practic normal cold flu last night eye hurt bit abl sleep symptom debilit
3,Shortness of breath as first symptom?,"I live in a Canadian province with only 3 presumed cases, all related to travel. I haven’t travelled outside my city, and haven’t been in contact with anyone who has a known case of covid19. However, 11 days ago I was interacting with someone who travelled from the US and had been travelling in international airports. \n\nI’m 21F and feel like I’m experiencing shortness of breath, but unsure b...",COVID19positive,1584375181,_haligirl98_,77,1,True,2020-03-16 09:13:01,live canadian province presumed cases related travel travelled outside city contact anyone known case covid however days ago interacting someone travelled us travelling international airports f feel like experiencing shortness breath unsure never felt way felt slight pressure tightness right side chest feel like get much air lungs usual pains sharp feelings yet fever cough headache symptom rea...,live canadian provinc presum case relat travel travel outsid citi contact anyon known case covid howev day ago interact someon travel us travel intern airport f feel like experienc short breath unsur never felt way felt slight pressur tight right side chest feel like get much air lung usual pain sharp feel yet fever cough headach symptom read stori peopl symptom seem start least fever cough sh...
4,"I need advice, please!","Yesterday I woke up and noticed I had a shortness of breath a few hours after waking up. At work I went home early because I felt weak/tired. \n\nThis morning, I don’t feel as weak anymore, but I still have the shortness of breath and slight chest pain. Could this be allergies? Did any of you have these symptoms?\n\nI called my doctors and they will not test me unless I have a cough with a fev...",COVID19positive,1584382909,monkcell,11,2,True,2020-03-16 11:21:49,yesterday woke noticed shortness breath hours waking work went home early felt weak tired morning feel weak anymore still shortness breath slight chest pain could allergies symptoms called doctors test unless cough fever shortness breath cough fever,yesterday woke notic short breath hour wake work went home earli felt weak tire morn feel weak anymor still short breath slight chest pain could allergi symptom call doctor test unless cough fever short breath cough fever
...,...,...,...,...,...,...,...,...,...,...,...
30810,"Just tested positive for COVID fully vaccinated, when my isolation is over am I able to see my boyfriend who tested positive a few days later than me and will still be in isolation?",I am not sure how this all works. I tested positive Sunday for COVID despite being fully vaccinated. My boyfriend started showing symptoms later and tested positive Tuesday. Am I able to see him when my isolation is over and he is still in isolation or should I wait until he is out?,COVID19positive,1631227446,bunnygirl1716,7,1,True,2021-09-09 15:44:06,sure works tested positive sunday covid despite fully vaccinated boyfriend started showing symptoms later tested positive tuesday able see isolation still isolation wait,sure work test posit sunday covid despit fulli vaccin boyfriend start show symptom later test posit tuesday abl see isol still isol wait
30811,"Has anyone tried taking cough syrup/Buckleys with Covid, and their body just throwing it back up?","Back in 2019, December 27 to be exact, I was the sickest I had ever been in my entire life. My chest hurt, it was hard to breathe, I coughed so much I popped blood vessels in my eyes, and I had no sense of smell. Completely bed ridden for 2 days, needing help to walk to the bathroom, sleeping 20h of the day.\n\nI was 25.\n\nThere's a part of me that thinks it was Covid, but seeing how it was i...",COVID19positive,1631227752,PsydemonCat,5,1,True,2021-09-09 15:49:12,back december exact sickest ever entire life chest hurt hard breathe coughed much popped blood vessels eyes sense smell completely bed ridden days needing help walk bathroom sleeping h day part thinks covid seeing december world tells thing also lived heavy tourism city especially china typical tourists end day wondering whenever tried taking buckleys kind cough syrup body would puke minute la...,back decemb exact sickest ever entir life chest hurt hard breath cough much pop blood vessel eye sens smell complet bed ridden day need help walk bathroom sleep h day part think covid see decemb world tell thing also live heavi tourism citi especi china typic tourist end day wonder whenev tri take buckley kind cough syrup bodi would puke minut later anyon els answer
30812,Alcohol after covid,"Just tried having a drink for the first time after recovering from covid about a month ago. Had food with the alcohol, but it seems to be hitting harder than normal. Curious what others have experienced.",COVID19positive,1631229654,waster02,12,1,True,2021-09-09 16:20:54,tried drink first time recovering covid month ago food alcohol seems hitting harder normal curious others experienced,tri drink first time recov covid month ago food alcohol seem hit harder normal curiou other experienc
30813,Covid,Can you get reinfected again with covid after recently recovering from covid?My sister tested negative for covid like in the first or second week of August after dealing with it and her friend was tested positive today. She’s been spending time with her at the gym and at work since they both work together. My mom is still recovering from pneumonia but we were all tested negative a couple of we...,COVID19positive,1631230273,Huge_Commercial_9976,7,1,True,2021-09-09 16:31:13,get reinfected covid recently recovering covid sister tested negative covid like first second week august dealing friend tested positive today spending time gym work since work together mom still recovering pneumonia tested negative couple weeks ago,get reinfect covid recent recov covid sister test neg covid like first second week august deal friend test posit today spend time gym work sinc work togeth mom still recov pneumonia test neg coupl week ago


In [49]:
df['post'].str.len().sum(), df['token'].str.len().sum()

(13885968, 12191584)
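
The two totals above are easier to compare as a relative reduction: 1 - 12191584/13885968 is roughly 0.12, so the `token` column is about 12% shorter than `post`. The sketch below shows one way to compute that ratio on a toy frame. The helper name `char_reduction` and the toy rows are illustrative assumptions, not part of the original notebook, and what each column holds is inferred from the rows displayed above.

```python
import pandas as pd

# Toy frame standing in for the real data; the column names follow the
# notebook, the two rows are shortened excerpts used only for illustration.
toy = pd.DataFrame({
    "post": ["runny nose coughing quite lot", "symptoms debilitating"],
    "token": ["runni nose cough quit lot", "symptom debilit"],
})

def char_reduction(frame, before_col="post", after_col="token"):
    """Return the fraction of characters removed between two text columns.

    Hypothetical helper: it only wraps the .str.len().sum() pattern above.
    """
    before_chars = frame[before_col].str.len().sum()
    after_chars = frame[after_col].str.len().sum()
    return 1 - after_chars / before_chars

reduction = char_reduction(toy)
print(f"Characters removed: {reduction:.1%}")
```

Calling `char_reduction(df)` on the full frame should reproduce the roughly 12% figure, provided both columns are free of nulls.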

In [97]:
## Save the text-processed tokens as CSV

In [98]:
# Save processed data to be used for distinguishing
df.to_csv('../datasets/text_processed_covid19positive.csv')

# Restrict the time range to March 2020 through March 2021

In [99]:
df[ df['timestamp'] < '2021-03-09 16:20:54' ]['timestamp']

0        2020-03-13 18:07:12
1        2020-03-16 04:40:28
2        2020-03-16 04:42:17
3        2020-03-16 09:13:01
4        2020-03-16 11:21:49
                ...         
20613    2021-03-09 15:14:00
20614    2021-03-09 15:43:36
20615    2021-03-09 16:00:08
20616    2021-03-09 16:08:21
20617    2021-03-09 16:10:00
Name: timestamp, Length: 20618, dtype: object
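
The output above has `dtype: object`, so the filter compares strings rather than dates. That works here only because every value uses the same zero-padded `YYYY-MM-DD HH:MM:SS` layout, which sorts lexicographically in chronological order. A minimal sketch of a more explicit alternative, assuming the same format throughout, is to parse the column first. The `timestamp_dt` column and the toy rows are illustrative and not part of the original notebook.

```python
import pandas as pd

# Toy frame with the same timestamp layout as the notebook's data.
toy = pd.DataFrame({
    "timestamp": [
        "2020-03-13 18:07:12",
        "2021-03-09 16:10:00",
        "2021-09-09 16:20:54",
    ],
})

# Parse the strings into datetime64 values so comparisons are chronological
# by construction rather than by string ordering.
toy["timestamp_dt"] = pd.to_datetime(toy["timestamp"], format="%Y-%m-%d %H:%M:%S")

# Same cutoff as in the cell above, expressed as a Timestamp.
cutoff = pd.Timestamp("2021-03-09 16:20:54")
first_year = toy.loc[toy["timestamp_dt"] < cutoff]
print(first_year)
```

Parsing also makes it possible to use the `.dt` accessor later, for example to group posts by month.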

## Save text-processed doc

Check for nulls one last time before saving.

In [100]:
df.isnull().sum()

title           0
selftext        0
subreddit       0
created_utc     0
author          0
num_comments    0
score           0
is_self         0
timestamp       0
post            0
token           0
dtype: int64

In [101]:
df_limitedTime = df[ df['timestamp'] < '2021-03-09 16:20:54' ]

In [102]:
# Save the time-limited subset (not the full df) under the Mar2020_Mar2021 name
df_limitedTime.to_csv('../datasets/text_processed_covid19positive_Mar2020_Mar2021.csv')
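
A quick way to confirm that a saved file holds the intended rows is to read it back and check the row count and the latest timestamp. The sketch below does this on a toy frame written to a temporary directory. The file name `subset.csv` and the toy rows are assumptions for illustration only.

```python
import os
import tempfile

import pandas as pd

# Toy frame standing in for the processed data.
toy = pd.DataFrame({
    "timestamp": [
        "2020-03-13 18:07:12",
        "2021-03-09 16:10:00",
        "2021-09-09 16:20:54",
    ],
    "token": ["feel better soon", "test posit today", "drink first time"],
})

cutoff = "2021-03-09 16:20:54"
subset = toy[toy["timestamp"] < cutoff]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "subset.csv")
    subset.to_csv(out_path)
    # index_col=0 restores the saved index instead of adding an
    # "Unnamed: 0" column on reload.
    reloaded = pd.read_csv(out_path, index_col=0)

rows_match = len(reloaded) == len(subset)
latest_ok = reloaded["timestamp"].max() < cutoff
print(rows_match, latest_ok)
```

On the real data, the reloaded file would be expected to contain the 20618 rows reported by the filter above.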