# In the Time of COVID: A text and sentiment analysis of individual stories in the time of the SARS-CoV-2 pandemic
#### by Diedre Brown, Pratt Institute, Spring 2021

Inspired by the Date Paintings of Japanese conceptual artist On Kawara and by the (Süd-) Koreanischer Kalender / (South) Korean Calendar (1991) of German conceptual artist Hanne Darboven, this project used an online survey to collect pertinent dates and stories from thirteen (13) individuals during the COVID-19 pandemic (1 January 2020). The data were cleaned and then examined with text and sentiment analysis. As part of a prototype for visualizing both quantitative and qualitative data, this analysis will contribute to an ongoing "data display" project that will let users trace not only the reported COVID cases and death tolls by date, but also people's personal sentiments and reflections.

### Import Libraries

In [1]:
# standard libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import itertools
import os
import re

In [2]:
%matplotlib inline
# render inline plots as PNG images
%config InlineBackend.figure_format = 'png'
# set the default style to be colorblind friendly
# (Matplotlib 3.6+ renames this style to "seaborn-v0_8-colorblind")
plt.style.use("seaborn-colorblind")

In [3]:
# silence pandas' SettingWithCopyWarning for chained assignment
pd.set_option('mode.chained_assignment', None)

In [4]:
# scikit-learn
import sklearn
# quick version check; note that this compares strings, not numeric version parts
sklearn.__version__>="0.20"

True
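The check above compares version strings character by character, which can give the wrong answer once a component has more than one digit. A minimal sketch of a numeric comparison follows; `version_tuple` and `version_at_least` are hypothetical helper names, not part of scikit-learn, and the parsing simply stops at the first non-numeric component.

```python
import re


def version_tuple(version):
    """Turn a string such as '0.24.1' into (0, 24, 1).

    Parsing stops at the first component that does not start with a digit,
    so suffixes such as 'dev0' are ignored.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def version_at_least(version, minimum):
    """Compare two version strings component by component, as integers."""
    return version_tuple(version) >= version_tuple(minimum)


# String comparison treats "0.9" as newer than "0.20" because "9" > "2".
print("0.9" >= "0.20")
# Numeric comparison sees 9 < 20.
print(version_at_least("0.9", "0.20"))
```

In the notebook this could be called as `version_at_least(sklearn.__version__, "0.20")`.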

In [5]:
from sklearn import linear_model
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.linear_model import LinearRegression,ElasticNetCV
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

In [6]:
# Natural Language Toolkit Library - NLTK
import nltk
nltk.download("words")
nltk.download("stopwords")

[nltk_data] Error loading words: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data]     unable to get local issuer certificate (_ssl.c:1123)>
[nltk_data] Error loading stopwords: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data]     unable to get local issuer certificate (_ssl.c:1123)>


False
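The `False` above is the return value of the last `nltk.download` call: the function reports failure through its return value rather than by raising, so the notebook continues without the `words` and `stopwords` corpora. A minimal sketch of checking those return values is shown below. `download_resources` is a hypothetical helper, and `fake_download` is a stand-in used only so the example runs without network access; in the notebook, `nltk.download` would be passed instead. The certificate error itself is commonly reported on macOS Python installs whose root certificates have not been installed, though that is an assumption about this environment.

```python
def download_resources(names, download):
    """Call download(name) for each name and return the names that failed.

    `download` is expected to behave like nltk.download: return True on
    success and False on failure.
    """
    failed = []
    for name in names:
        try:
            ok = download(name)
        except Exception:
            # Treat an unexpected exception the same as a failed download.
            ok = False
        if not ok:
            failed.append(name)
    return failed


# Stand-in downloader (an assumption for illustration): pretends that
# "stopwords" cannot be fetched.
def fake_download(name):
    return name != "stopwords"


failed = download_resources(["words", "stopwords"], fake_download)
if failed:
    print("could not download:", ", ".join(failed))
```

Stopping early when `failed` is non-empty makes a missing corpus visible at this cell instead of surfacing later as a `LookupError`.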

In [7]:
%pip install spacy

Defaulting to user installation because normal site-packages is not writeable
Collecting spacy
  Using cached spacy-3.0.5-cp39-cp39-macosx_10_9_x86_64.whl (12.2 MB)
Collecting thinc<8.1.0,>=8.0.2
  Using cached thinc-8.0.2-cp39-cp39-macosx_10_9_x86_64.whl (1.1 MB)
Collecting preshed<3.1.0,>=3.0.2
  Using cached preshed-3.0.5-cp39-cp39-macosx_10_9_x86_64.whl (106 kB)
Collecting cymem<2.1.0,>=2.0.2
  Using cached cymem-2.0.5-cp39-cp39-macosx_10_9_x86_64.whl (32 kB)
Collecting blis<0.8.0,>=0.4.0
  Using cached blis-0.7.4-cp39-cp39-macosx_10_9_x86_64.whl (5.8 MB)
Collecting wasabi<1.1.0,>=0.8.1
  Using cached wasabi-0.8.2-py3-none-any.whl (23 kB)
Collecting pydantic<1.8.0,>=1.7.1
  Using cached pydantic-1.7.3-cp39-cp39-macosx_10_9_x86_64.whl (2.4 MB)
Collecting typer<0.4.0,>=0.3.0
  Using cached typer-0.3.2-py3-none-any.whl (21 kB)
Collecting spacy-legacy<3.1.0,>=3.0.0
  Using cached spacy_legacy-3.0.2-py2.py3-none-any.whl (7.8 kB)
Collecting catalogue<2.1.0,>=2.0.1
  Using cached catalogu

In [8]:
# spaCy is another natural language processing library, written in Python/Cython
# spaCy is designed for fast syntactic parsing and is used here alongside NLTK
import string
import spacy


In [9]:
# import the standard logging library used throughout the notebook
import logging
# logging levels
# CRITICAL (50) - A serious error, indicating that the program itself may be unable to continue running.
# ERROR (40) - Due to a more serious problem, the software has not been able to perform some function.
# WARNING (30) - An indication that something unexpected happened, or indicative of some problem in the near future (e.g. 'disk space low'). The software is still working as expected.
# INFO (20) - Confirmation that things are working as expected.
# DEBUG (10) - Detailed information, typically of interest only when diagnosing problems.
# NOTSET (0) - No level set; the logger defers to the level of its parent logger.

In [10]:
logFormatter = '%(asctime)s - %(levelname)s - %(message)s' # log records formatted as time, level name, and message
logging.basicConfig(format=logFormatter, level=logging.INFO) # sets the default logging level and the log format
logger = logging.getLogger(__name__) # logger named after the current module
# run first log
logger.info("initial log")



2021-04-18 11:30:48,137 - INFO - initial log
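With the level set to `INFO`, messages logged at `DEBUG` are dropped while `INFO` and above pass through. The sketch below illustrates that threshold on a separate logger that writes to an in-memory buffer; the logger name `itc_demo` and the messages are hypothetical and are not part of the original notebook.

```python
import io
import logging

# Capture log output in memory so the filtering is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))

demo_logger = logging.getLogger("itc_demo")  # hypothetical logger name
demo_logger.setLevel(logging.INFO)
demo_logger.addHandler(handler)
demo_logger.propagate = False  # keep demo records out of the root logger

demo_logger.debug("row-level detail")           # below INFO: dropped
demo_logger.info("survey loaded")               # at INFO: kept
demo_logger.warning("missing comment field")    # above INFO: kept

captured = stream.getvalue().splitlines()
print(captured)
```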


### Load Data

In [12]:
df_survey_raw = pd.read_csv("/Users/diedrebrown/Desktop/GitHubRepos/inthetimeofcovid/data/ITCResponses-clean.csv")

In [13]:
df_survey_raw.head()

| Unnamed: 0 | T | P | Consent | Age | Gender | CDate | CLocation | Mem | 3Wrds | ColorB | ColorHex | ColorRes | Comment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3/12/21 16:56 | A | Yes | 25-40 | Female | 3/11/20 | Surrey, British Columbia, Canada | I recall having a team meeting at work with lo... | Fun, overwhelming, tiring | I have no color deficiency. | NONE | NONE | |
| 1 | 3/13/21 16:45 | B | Yes | 25-40 | Female | 1/24/20 | Brooklyn, NY, USA | A lot of things came to my mind. It was hard t... | Chinese, friendly, funny | I have no color deficiency. | CC0000 | Red is the official color of Chinese New Year:) | 🥸 good luck with the study |
| 2 | 3/14/21 11:55 | C | Yes | 41-55 | Female | 3/13/20 | Brooklyn, NY, USA | It was the day that I learned that public scho... | disbelief, confusion, concern | I have no color deficiency. | 660000 | Red reminds me of school, and concern | Curious about color being associated with feel... |
| 3 | 3/14/21 12:44 | D | Yes | 25-40 | Female | 3/30/20 | Radford, VA, USA | My Grandmother passed away and we couldn’t gri... | Angry, Sunken, bitter | I have no color deficiency. | 330066 | It feels like a somber color | |
| 4 | 3/17/21 03:33 | E | Yes | 25-40 | Male | 3/8/20 | Newark, NJ, USA | I hope this doesn’t feel like too ‘silly’ of a... | Exhilarating, overindulgent, inspiring | I have no color deficiency. | CC0000 | The concert design largely used this color, so... | Again, this feels like such a silly answer giv... |


In [15]:
df_survey_raw.shape

(14, 13)
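The `T` (response timestamp) and `CDate` (remembered date) columns are read in as plain strings. Since the project is organized around dates, one possible next step is to parse them into datetimes. The sketch below works on two rows abridged from the preview, keeping three columns only. The formats (month/day/two-digit year, with a 24-hour time for `T`) are assumptions inferred from those rows, and `days_since` is a hypothetical column name.

```python
import io

import pandas as pd

# Two rows abridged from the survey preview, three columns only.
sample_csv = io.StringIO(
    "T,P,CDate\n"
    "3/12/21 16:56,A,3/11/20\n"
    "3/13/21 16:45,B,1/24/20\n"
)
df_sample = pd.read_csv(sample_csv)

# Assumed formats: month/day/two-digit year, plus hours:minutes for T.
df_sample["T"] = pd.to_datetime(df_sample["T"], format="%m/%d/%y %H:%M")
df_sample["CDate"] = pd.to_datetime(df_sample["CDate"], format="%m/%d/%y")

# Whole days between the remembered date and the survey response.
df_sample["days_since"] = (df_sample["T"] - df_sample["CDate"]).dt.days
print(df_sample)
```

Passing an explicit `format` avoids day/month ambiguity and fails loudly if a response uses a different date layout.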