Multi-modal-Emotion-Analysis-on-COVID-19

An Emotion Care Model using Multimodal Textual Analysis on COVID-19

A novel emotion care scheme has been proposed in this project to analyze multimodal textual data contained in real-time tweets related to COVID-19.

Demonstration of Project:

Link for Research Paper: https://www.sciencedirect.com/science/article/abs/pii/S0960077921000618

Other research contributions: https://scholar.google.com/citations?user=73b_WZcAAAAJ&hl=en

Link to the EmotionofIndia.com : http://emotionofindia.herokuapp.com/#

Link to Download Complete Dataset: https://github.com/Piyush2912/Twitter_dataset

How to use?

Download the required files into a directory of your choice.
Open the required code in Jupyter Notebook.
Install the dependencies mentioned in the code.
Execute the code block by block.
Get visualized results.
Done

1. Abstract

At the dawn of 2020, the world was hit by the COVID-19 pandemic, which traumatized the entire planet.
The infection spread by leaps and bounds and forced policymakers and governments to impose lockdowns.
The lockdown compelled people to stay confined to their homes, which in turn resulted in an outbreak of emotions on social media platforms.
Perceiving people's emotional state during these times is critically and strategically important for the government and the policymakers.
In this regard, a novel emotion care scheme has been proposed in this project to analyze multimodal textual data contained in real-time tweets related to COVID-19.

2. Motivation

In a rapidly developing world, technology is advancing quickly, and so is the spread of diseases such as COVID-19.
To prevent the spread of disease, it is necessary to analyze the situation and act correctly.
Perceiving people's emotional state during these times is therefore critically and strategically important for the government and the policymakers.

3. Problem Statement

The goal is to identify the emotions of people living in different states of India during the phases of lockdown.
A huge amount of data is available on Twitter and needs to be analyzed.
To analyze it efficiently, the tweets must be classified into categories.
To categorize them, an effective algorithm for investigating the data needs to be developed.

4. Introduction

This project studies 8-scale emotions (Anger, Anticipation, Disgust, Fear, Joy, Sadness, Surprise, and Trust) over multiple categories such as nature, lockdown, health, education, market, and politics.
To the best of our understanding, this is the first linguistic analysis of its kind on multiple modes pertaining to the pandemic.
An interactive internet application has also been developed for the same.

Figure 1: Web Application

5. Requirements

Jupyter Notebook version 6.1 or above
Python version 3.8 or above
NRC Emotion Lexicons - http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.html
Python Libraries Used:
- numpy https://numpy.org/doc/
- pandas https://pandas.pydata.org/docs/
- string https://docs.python.org/3/library/string.html
- matplotlib https://matplotlib.org/stable/users/index.html
- re https://docs.python.org/3/library/re.html
- nltk https://www.nltk.org/api/nltk.html
- wordcloud https://pypi.org/project/wordcloud/
- stopwords https://pypi.org/project/stop-words/
- tweepy https://docs.tweepy.org/en/stable/index.html
- seaborn https://seaborn.pydata.org/tutorial.html

6. Dataset Creation

The data was retrieved from Twitter through the Twitter API using the Tweepy library for Python.

Dataset : https://github.com/Piyush2912/Twitter_dataset

Dataset Description

Figure 2: Dataset First 5 rows

Figure 3: Dataset Last 5 rows

Figure 4: Word Cloud of tweeted words

The dataset consists of 8,14,887 (814,887) tweets from all around India.
Figure 2 shows the first 5 rows of the dataset.
Figure 3 shows the last 5 rows of the dataset.
Figure 4 represents the most used words in tweets from the complete dataset.
As these figures show, the dataset has the following columns:
- "S.No." depicting the serial number of the tweet, in increasing order.
- "Tweet Posted Time (UTC)" depicting the time of posting of that particular tweet.
- "tweets" depicting textual content of that tweet.
- "Tweet Location" depicting the location from which the tweet has been posted.
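
As a rough illustration of this column layout, the sketch below reads a tiny in-memory CSV with the four columns and parses the posting time. The sample rows, the `load_tweets` helper, and the timestamp format are assumptions made for illustration only; the real files may use a different time format.

```python
import csv
import io
from datetime import datetime

# Hypothetical two-row sample that mimics the four columns described above.
SAMPLE_CSV = """S.No.,Tweet Posted Time (UTC),tweets,Tweet Location
1,2020-04-01 10:15:00,Stay home stay safe,"Mumbai, India"
2,2020-04-01 10:16:30,Schools are moving online,"New Delhi, India"
"""


def load_tweets(handle, time_format="%Y-%m-%d %H:%M:%S"):
    """Read rows into dicts and parse the UTC timestamp column.

    The default time_format is an assumption about how timestamps are stored.
    """
    rows = []
    for row in csv.DictReader(handle):
        row["Tweet Posted Time (UTC)"] = datetime.strptime(
            row["Tweet Posted Time (UTC)"], time_format
        )
        rows.append(row)
    return rows


tweets = load_tweets(io.StringIO(SAMPLE_CSV))
for tweet in tweets:
    print(tweet["S.No."], tweet["Tweet Location"], tweet["tweets"])
```

To read a real file, an opened file handle can be passed to `load_tweets` in place of the `io.StringIO` object.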

7. Generic Methodology

Figure 5: Data Pipeline

Figure 5 represents the sequential steps performed to reach the end goal.
Firstly, the data is scraped from Twitter using the Twitter streaming API.
Secondly, the dataset is curated and saved in CSV format for each state.
Thirdly, the tweets are cleaned using basic NLP operations like:
- converting text into lowercase
- removing user mentions
- removing re-tweets
- removing special characters except [a-zA-Z]
- removing hyperlinks starting with https
- removing punctuation
- removing stopwords
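
The cleaning operations listed above can be sketched with the standard `re` module as follows. This is a minimal sketch, not the project's actual clean function: the `clean_tweet` name, the tiny stopword set, and the choice to drop retweets entirely (rather than strip the "RT" marker) are assumptions. The link pattern also matches plain `http` links.

```python
import re

# Tiny illustrative stopword set (assumption); a fuller list would come from
# a stopword package such as the ones listed in the requirements.
STOPWORDS = {"a", "an", "the", "is", "are", "in", "at", "to", "of", "and", "for", "on"}


def clean_tweet(text):
    """Return the cleaned tweet, or None if the tweet is a retweet."""
    if re.match(r"\s*RT\b", text):
        return None  # assumption: retweets are dropped entirely
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # hyperlinks
    text = re.sub(r"@\w+", " ", text)          # user mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # punctuation, digits, special characters
    words = [word for word in text.split() if word not in STOPWORDS]
    return " ".join(words)


raw = [
    "RT @someone: Lockdown extended again!",
    "@user The lockdown is extended, schools are closed :( https://t.co/abc123",
]
cleaned = [c for c in (clean_tweet(t) for t in raw) if c is not None]
print(cleaned)
```

Links and mentions are removed before the character filter so that their fragments do not survive as stray words.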
After that, POS (part-of-speech) tagging, also called grammatical tagging, is applied to the data; it is the process of marking up each word in a text (corpus) with its part of speech.
After that, multimodal vectors are created using tuples of aspect categories, each of which stores its aspect terms.
After that, an algorithm is developed using the NRC emotion lexicon and the data from each state to create a multimodal emotion scoring system based on word count frequency.
Lastly, the state-wise emotion plots for India are analyzed using different visualization techniques such as bar plots, doughnut plots and bubble charts.
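
One possible way to picture the multimodal vectors is a mapping from each aspect category to a tuple of aspect terms, which is then used to group cleaned tweets by category. The sketch below assumes this shape; the aspect terms and the `split_by_category` helper are hypothetical, since the project's actual term lists are not given here.

```python
# Hypothetical aspect terms per category (the six category names come from the
# introduction; the terms themselves are illustrative assumptions).
ASPECT_TERMS = {
    "nature": ("pollution", "air", "river"),
    "lockdown": ("lockdown", "quarantine", "curfew"),
    "health": ("hospital", "doctor", "vaccine"),
    "education": ("school", "teacher", "exam"),
    "market": ("economy", "job", "price"),
    "politics": ("government", "minister", "policy"),
}


def split_by_category(cleaned_tweets, aspect_terms=ASPECT_TERMS):
    """Map each category to the tweets that mention at least one of its terms."""
    by_category = {category: [] for category in aspect_terms}
    for tweet in cleaned_tweets:
        words = set(tweet.split())
        for category, terms in aspect_terms.items():
            if words.intersection(terms):
                by_category[category].append(tweet)
    return by_category


sample = [
    "lockdown extended school closed",
    "air pollution lower",
    "government announces new policy",
]
grouped = split_by_category(sample)
for category, members in grouped.items():
    print(category, members)
```

A tweet that mentions terms from several categories is counted in each of them in this sketch; whether the project does the same is not stated.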

8. Developed Algorithm:

This algorithm is used for calculating the score of each multimodal category.
In this algorithm, ‘T’ is the list of ‘t’ tweets, ‘Ct’ is the list obtained after cleaning the ‘t’ tweets, ‘w’ is the set of tokenized words from the ‘t’ tweets, ‘At’ is the set of multimodal terms, and ‘csv’ is the two-dimensional data frame which, after processing, yields the category-wise tweets.
The clean function is used for cleaning of tweets.
The category wise function is used for separating tweets on the basis of multimodal terms.
The POS tagging function is used for each category to determine adjectives and adverbs in ‘t’ tweets.
The lexicon function is used for generating a list of common words from POS tagged words and lexicon words.
The count frequency function is used for generating a dictionary where the key is a word and the value is the frequency of that word in the ‘tw’ list of tweets.
The scoring system function is used for calculating the emotion score for each multimodal category.
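
The last four functions can be sketched as below, under stated assumptions. The input is a list of already tagged `(word, tag)` pairs using Penn Treebank tags (the tagset that `nltk.pos_tag` returns by default), so the sketch runs without NLTK data. `TOY_LEXICON` holds made-up entries that only imitate the word-to-emotions shape of the NRC lexicon, and the percentage normalization is one plausible reading of the scoring step, not necessarily the project's exact formula.

```python
from collections import Counter

# Made-up entries for illustration only; these are NOT taken from the NRC lexicon.
TOY_LEXICON = {
    "afraid": {"fear"},
    "happy": {"joy", "trust"},
    "terrible": {"fear", "sadness", "anger"},
    "safely": {"trust"},
}


def adjectives_and_adverbs(tagged_words):
    """Keep words whose Penn Treebank tag marks an adjective (JJ*) or adverb (RB*)."""
    return [word for word, tag in tagged_words if tag.startswith(("JJ", "RB"))]


def emotion_scores(tagged_words, lexicon=TOY_LEXICON):
    """Sum word frequencies per emotion for tagged words found in the lexicon."""
    candidates = adjectives_and_adverbs(tagged_words)
    frequency = Counter(word for word in candidates if word in lexicon)
    scores = Counter()
    for word, count in frequency.items():
        for emotion in lexicon[word]:
            scores[emotion] += count
    return scores


def as_percentages(scores):
    """Express each emotion score as a share of the total (assumed normalization)."""
    total = sum(scores.values())
    if not total:
        return {}
    return {emotion: 100 * score / total for emotion, score in scores.items()}


tagged = [
    ("people", "NNS"), ("afraid", "JJ"), ("afraid", "JJ"),
    ("happy", "JJ"), ("safely", "RB"), ("lockdown", "NN"),
]
scores = emotion_scores(tagged)
shares = as_percentages(scores)
print(dict(scores))
print(shares)
```

Running this once per multimodal category would give one score vector per category, which could then feed the plots described in the results.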

9. Results

I. Analysis from bar plots:

The emotions of the population in different states of India are analyzed, and Maharashtra can be clearly seen at the top of every emotion score, followed by NCT of Delhi and Karnataka.
These states were worst hit by the pandemic, which led to a sudden and massive rise in coronavirus cases.
This could be the reason for the higher fear, anger, disgust and sadness scores in these states.
At the same time, they have also seen immense growth in the number of recovered patients, which raises the level of trust and joy among the people.
This same trend is followed in other states like Tamil Nadu, West Bengal, Andhra Pradesh too.
The Indian State-wise 8 Emotion (Anger, Anticipation, Disgust, Fear, Joy, Sadness, Surprise, and Trust) bar plots are as follows:

II. Analysis from doughnut plots:

People have mostly been very patient during the lockdown period and have done their bit to break the chain of COVID-19 transmission.
They have shown trust in the governmental policies and decisions as well as in their fellow citizens.
Even after all this, due to the rapid spread of the virus, people have been quite fearful about catching infection.
The rising death toll can be another cause of distress and sadness.
Most people have been fearful and unsure about the outcomes of lockdown.
Unemployment, working from home, a falling economy and losing close ones have taken a toll on people’s emotional health.
Despite all the shortcomings, people have been trying to look at the bright side of things.
They have widely trusted the capabilities of teachers and schools to cope with the changing education system.
Reduction in pollution levels has been another cause of contentment amongst people.
There has been a lot of anticipation regarding the final outcomes of lockdown and its impact on the economy.
People have been looking forward to the governmental policies towards the unlocking procedure.
The 6 Multimodal Emotion doughnut plots are as follows:

III. Analysis from Bubble charts:

Figure 7 depicts mode-wise emotion distribution during the lockdown period.
COVID-19 has clearly not just taken away jobs and hurt the economy but has also affected people’s mental health adversely.
Living away from family while trapped in foreign lands, losing loved ones and being unable to attend their funerals, and wearing a mask on every trip outside are not easy to adapt to.
The new normal is definitely not normal for people.
The Emotion intensity bubble plots pertaining to 6 modes are as follows:

10. Summary and Conclusion

In this project, we present a web-based emotion care scheme to recognize the emotional state of Indian citizens throughout the ongoing COVID-19 crisis.
Taking India as a case study, we inferred from this textual analysis that ‘joy’ has been lower towards every category (~9-15%) except nature (~17%), owing to reduced pollution.
The education system entailed more trust (~29%) due to the consistent efforts of the teaching fraternity.
The health sector witnessed sadness (~16%) and fear (~18%) as the dominant emotions among the masses as human lives were at stake.
With the help of this research, health organizations and higher authorities will be able to have a better insight into the emotional health of people and will also be able to interpret the way people react to various day-to-day decisions.
Additionally, the state-wise and emotion-wise depiction is also provided.
An interactive internet application has also been developed for the same.

11. Limitations/ Challenges faced during the project

Collecting a labeled dataset was a problem because no properly labeled dataset was available.
The proposed scheme currently works only on Twitter data.
The model is fully functional but it cannot be used for other languages.
The emotions considered in this project are limited to the 8-scale emotions (Anger, Anticipation, Disgust, Fear, Joy, Sadness, Surprise, and Trust).

12. Future Scope

The proposed scheme is scalable if data from different social media platforms is incorporated.
The model is fully functional but its horizon can be widened by including different languages.
The number of emotions considered can be increased to have an even more fine-grained analysis.
Usage of deep learning approaches might also fine-tune the current scheme.
New features can be investigated to improve the existing model.

13. Credits:

Thanking my project mentors and teammates for caring and supporting me wholeheartedly. The role you played in my life is invaluable. I’m grateful for all of your help and continued support.

Mentor : Dr. Vedika Gupta : https://www.linkedin.com/in/drvedikagupta/

Team Mate : Adarsh Kumar : https://www.linkedin.com/in/adarsh-kumar-5b1a1719b/

Team Mate : Rohan Arora : https://www.linkedin.com/in/rohanarora18/

Team Mate : Shreya Dhingra : https://www.linkedin.com/in/shreya-dhingra-927b19190

14. License:

Apache License 2.0
