A novel emotion care scheme has been proposed in this project to analyze multimodal textual data contained in real-time tweets related to COVID-19.


Piyush2912/Multi-modal-Emotion-Analysis-on-COVID-19


Multi-modal-Emotion-Analysis-on-COVID-19

An Emotion Care Model using Multimodal Textual Analysis on COVID-19


Demonstration of Project:

Link to EmotionofIndia.com: http://emotionofindia.herokuapp.com/#

Link to download the complete dataset: https://github.com/Piyush2912/Twitter_dataset

How to use?

  1. Download the required files into a directory of your choice.
  2. Open the required code files in Jupyter Notebook.
  3. Install the dependencies as mentioned in the code.
  4. Execute the code block by block.
  5. View the visualized results.
  6. Done.

Table of Contents:

  1. Abstract
  2. Motivation
  3. Problem Statement
  4. Introduction
  5. Requirements
  6. Dataset Creation
  7. Generic Methodology
  8. Developed Algorithm
  9. Results
  10. Summary and Conclusion
  11. Limitations
  12. Future Scope
  13. Credits
  14. License

1. Abstract

  • At the dawn of 2020, the world was hit by a major pandemic, COVID-19, that traumatized the entire planet.
  • The infection spread by leaps and bounds and forced policymakers and governments to impose lockdowns.
  • The lockdown compelled people to stay confined to their homes, which in turn resulted in an outbreak of emotions on social media platforms.
  • Perceiving people's emotional state during these times is critically and strategically important for the government and policymakers.
  • In this regard, this project proposes a novel emotion care scheme to analyze multimodal textual data contained in real-time tweets related to COVID-19.

2. Motivation

  • In a rapidly developing world, diseases such as COVID-19 are on the rise even as technology advances.
  • To prevent the spread of disease, it is necessary to analyze the situation and act correctly.
  • Perceiving people's emotional state during these times is critically and strategically important for the government and policymakers.

3. Problem Statement

  • The goal is to identify the emotions of people living in different states of India during the phases of lockdown.
  • A huge amount of data is available on Twitter and needs to be analyzed.
  • The tweets must be classified into categories in order to be analyzed efficiently.
  • To categorize them, an effective algorithm for investigating the data needs to be developed.

4. Introduction

  • This project studies 8-scale emotions (Anger, Anticipation, Disgust, Fear, Joy, Sadness, Surprise, and Trust) over multiple categories such as nature, lockdown, health, education, market, and politics.
  • To the best of our understanding, this is the first linguistic analysis of its kind on multiple modes pertaining to the pandemic.
  • An interactive web application has also been developed for the same purpose.

Figure 1: Web Application

5. Requirements

6. Dataset Creation

The data is retrieved from Twitter through the Twitter API, using the Tweepy library for Python.

Dataset Description

Figure 2: Dataset First 5 rows

Figure 3: Dataset Last 5 rows

Figure 4: Word Cloud of tweeted words

  • The dataset consists of 8,14,887 (814,887) tweets from all around India.
  • Figure 2 shows the first 5 rows of the dataset.
  • Figure 3 shows the last 5 rows of the dataset.
  • Figure 4 shows the most used words in tweets from the complete dataset.
  • As these figures show, the dataset has the following columns:
    • "S.No.": the serial number of the tweet, in increasing order.
    • "Tweet Posted Time (UTC)": the time at which the tweet was posted.
    • "tweets": the textual content of the tweet.
    • "Tweet Location": the location from which the tweet was posted.
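The column layout above can be explored with a few lines of Python. The following is a minimal sketch, assuming the dataset is stored as a CSV file with exactly the four documented column names. The sample rows and the helper names `load_tweets` and `tweets_per_location` are made up for illustration and are not taken from the project.

```python
import csv
import io
from collections import Counter

# Made-up rows that only mimic the four documented columns; not real data.
SAMPLE_CSV = """S.No.,Tweet Posted Time (UTC),tweets,Tweet Location
1,2020-04-01 10:00:00,Stay home stay safe,Maharashtra
2,2020-04-01 10:05:00,Schools are moving online,Karnataka
3,2020-04-01 10:09:00,Lockdown extended again,Maharashtra
"""


def load_tweets(handle):
    """Read rows into dictionaries keyed by the documented column names."""
    return list(csv.DictReader(handle))


def tweets_per_location(rows):
    """Count how many tweets were posted from each location."""
    return Counter(row["Tweet Location"] for row in rows)


rows = load_tweets(io.StringIO(SAMPLE_CSV))
location_counts = tweets_per_location(rows)
print(location_counts.most_common())
```

To try this on the real dataset, pass an opened file handle to `load_tweets` instead of the in-memory sample.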

7. Generic Methodology

Figure 5: Data Pipeline

  • Figure 5 shows the sequential steps performed to reach the end goal.
  • Firstly, the data is scraped from Twitter using the Twitter streaming API.
  • Secondly, the dataset is curated and stored in CSV format for each state.
  • Thirdly, the tweets are cleaned using basic NLP operations such as:
    • converting text to lowercase
    • removing user mentions
    • removing re-tweets
    • removing all characters except letters [a-zA-Z]
    • removing hyperlinks starting with https
    • removing punctuation
    • removing stopwords
  • After that, POS (part-of-speech) tagging, also called grammatical tagging, is applied to the data. It marks each word in a text (corpus) with its part of speech.
  • After that, multimodal vectors are created using tuples of aspect categories, each of which stores its aspect terms.
  • After that, an algorithm is developed that uses the NRC emotion lexicon and the data from each state to create a multimodal emotion scoring system based on word count frequency.
  • Lastly, the state-wise emotion plots for India are analyzed using different visualization techniques such as bar plots, doughnut plots and bubble charts.
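The cleaning step of this pipeline can be sketched with the standard library alone. This is only an illustration, not the project's code: the stopword list is a tiny stand-in, treating tweets that start with "RT @" as re-tweets is an assumption, and the function names `is_retweet`, `clean_tweet` and `clean_corpus` are hypothetical.

```python
import re

# Tiny illustrative stopword list; the project's actual list is not given.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "and", "at"}


def is_retweet(text):
    """Treat tweets beginning with 'RT @' as re-tweets (an assumed convention)."""
    return text.lstrip().startswith("RT @")


def clean_tweet(text):
    """Apply the listed cleaning steps to one tweet and return its tokens."""
    text = text.lower()                      # lowercase
    text = re.sub(r"https\S+", " ", text)    # hyperlinks starting with https
    text = re.sub(r"@\w+", " ", text)        # user mentions
    text = re.sub(r"[^a-z]+", " ", text)     # punctuation, digits, other symbols
    return [word for word in text.split() if word not in STOPWORDS]


def clean_corpus(tweets):
    """Drop re-tweets, then clean every remaining tweet."""
    return [clean_tweet(t) for t in tweets if not is_retweet(t)]


sample = [
    "RT @news: Cases are rising",
    "@WHO The lockdown is extended to May 3! https://t.co/xyz #StayHome",
]
cleaned = clean_corpus(sample)
print(cleaned)
```

Mentions and hyperlinks are removed before the non-letter filter, because that filter would otherwise strip the `@` and `:` characters the earlier patterns rely on.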

8. Developed Algorithm:

[Two images showing the developed algorithm]

  • This algorithm calculates the emotion score of each multimodal category.
  • In this algorithm, ‘T’ is the list of ‘t’ tweets, ‘Ct’ is the list obtained after cleaning the ‘t’ tweets, ‘w’ is the set of tokenized words from the ‘t’ tweets, ‘At’ is the set of multimodal terms, and ‘csv’ is the two-dimensional data frame that, after processing, holds the tweets of each category.
  • The clean function cleans the tweets.
  • The category-wise function separates tweets on the basis of the multimodal terms.
  • The POS tagging function is applied to each category to identify the adjectives and adverbs in the ‘t’ tweets.
  • The lexicon function generates the list of words common to the POS-tagged words and the lexicon words.
  • The count frequency function generates a dictionary in which each key is a word and each value is the frequency of that word in the ‘tw’ list of tweets.
  • The scoring system function calculates the emotion score for each multimodal category.
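The functions described above can be approximated in a short sketch. Several things here are assumptions rather than the project's actual method: the aspect terms and the word-to-emotion mapping are toy examples (the entries are not taken from the NRC lexicon), the POS tagging step is left out to keep the example dependency-free, and the score of an emotion is taken to be the summed frequency of the lexicon words linked to it.

```python
from collections import Counter

# Hypothetical aspect terms per category; the project's real lists are not given.
ASPECT_TERMS = {
    "health": {"hospital", "doctor", "vaccine"},
    "education": {"school", "teacher", "exam"},
}

# Toy word-to-emotions mapping in the style of an emotion lexicon.
# These entries are illustrative and are NOT taken from the NRC lexicon.
TOY_LEXICON = {
    "afraid": {"fear"},
    "crowded": {"fear", "anger"},
    "grateful": {"joy", "trust"},
    "helpful": {"trust"},
}


def category_wise(token_lists, aspect_terms):
    """Group tokenized tweets under every category whose terms they mention."""
    grouped = {category: [] for category in aspect_terms}
    for tokens in token_lists:
        for category, terms in aspect_terms.items():
            if terms & set(tokens):
                grouped[category].append(tokens)
    return grouped


def count_frequency(token_lists, lexicon):
    """Count occurrences of the words that also appear in the lexicon."""
    return Counter(w for tokens in token_lists for w in tokens if w in lexicon)


def emotion_scores(frequencies, lexicon):
    """Add each word's frequency to every emotion the lexicon links it to."""
    scores = Counter()
    for word, freq in frequencies.items():
        for emotion in lexicon[word]:
            scores[emotion] += freq
    return dict(scores)


def score_categories(token_lists, aspect_terms=ASPECT_TERMS, lexicon=TOY_LEXICON):
    """Return {category: {emotion: score}} for already cleaned, tokenized tweets."""
    grouped = category_wise(token_lists, aspect_terms)
    return {
        category: emotion_scores(count_frequency(tweets, lexicon), lexicon)
        for category, tweets in grouped.items()
    }


cleaned_tweets = [
    ["hospital", "crowded", "afraid"],
    ["grateful", "doctor", "helpful"],
    ["teacher", "helpful", "grateful"],
]
scores = score_categories(cleaned_tweets)
print(scores)
```

A tweet that mentions terms from several categories is counted under each of them in this sketch; whether the project does the same is not stated.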

9. Results

I. Analysis from bar plots:

  • The emotions of the population of different Indian states are analyzed. Maharashtra is clearly at the top of every emotion score, followed by NCT of Delhi and Karnataka.
  • These states were the worst hit by the pandemic, which led to a sudden and massive rise in coronavirus cases.
  • This could be the reason for the higher fear, anger, disgust and sadness scores in these states.
  • At the same time, they have also seen immense growth in the number of recovered patients, which raises the level of trust and joy among the people.
  • The same trend is seen in other states such as Tamil Nadu, West Bengal and Andhra Pradesh.
  • The Indian state-wise bar plots for the 8 emotions (Anger, Anticipation, Disgust, Fear, Joy, Sadness, Surprise, and Trust) are as follows:

[Bar plots: India state-wise anger score, anticipation score, disgust score, fear score, joy score, sadness score, surprise score, trust score]

II. Analysis from doughnut plots:

  • People have mostly been very patient during the lockdown period and have done their bit to break the chain of COVID-19 transmission.
  • They have shown trust in governmental policies and decisions as well as in their fellow citizens.
  • Even so, due to the rapid spread of the virus, people have been quite fearful of catching the infection.
  • The rising death toll may be another cause of distress and sadness.
  • Most people have been fearful and unsure about the outcomes of the lockdown.
  • Unemployment, working from home, a falling economy and losing close ones have taken a toll on people’s emotional health.
  • Despite all the shortcomings, people have been trying to look at the bright side of things.
  • They have widely trusted the capabilities of teachers and schools to cope with the changing education system.
  • The reduction in pollution levels has been another cause of contentment among people.
  • There has been a lot of anticipation regarding the final outcomes of the lockdown and its impact on the economy.
  • People have been looking forward to the governmental policies on the unlocking procedure.
  • The doughnut plots for the 6 multimodal categories are as follows:

[Doughnut plots: Education, Health, Lockdown, Market, Nature, Politics]

III. Analysis from Bubble charts:

  • Figure 7 depicts mode-wise emotion distribution during the lockdown period.
  • COVID-19 has clearly not just taken away jobs and hurt the economy but has also adversely affected people’s mental health.
  • Living away from family while trapped in foreign lands, losing loved ones and being unable to attend their funerals, and wearing a mask on every trip outside are not easy to adapt to.
  • The new normal is definitely not normal for people.
  • The emotion intensity bubble plots for the 6 modes are as follows:

[Bubble plots: ANGER, ANTICIPATION, DISGUST, FEAR, JOY, SADNESS, SURPRISE, TRUST]

10. Summary and Conclusion

  • In this project, we present a web-based emotion care platform to recognize the emotional state of Indian citizens throughout the ongoing COVID-19 crisis.
  • Taking India as a case study, we inferred from this textual analysis that ‘joy’ was lower across all categories (~9-15%) except nature (~17%), apparently owing to reduced pollution.
  • The education system attracted more trust (~29%), owing to the consistent efforts of the teaching fraternity.
  • The health sector witnessed sadness (~16%) and fear (~18%) as the dominant emotions among the masses, as human lives were at stake.
  • With the help of this research, health organizations and higher authorities will be able to gain better insight into the emotional health of people and to interpret the way people react to various day-to-day decisions.
  • Additionally, state-wise and emotion-wise depictions are provided.
  • An interactive web application has also been developed for the same purpose.

11. Limitations/ Challenges faced during the project

  • Collecting a labeled dataset was a problem because no properly labeled dataset was available.
  • The proposed scheme currently works only on Twitter data.
  • The model is fully functional, but it cannot yet be used for other languages.
  • Only the 8-scale emotions (Anger, Anticipation, Disgust, Fear, Joy, Sadness, Surprise, and Trust) are considered in this project.

12. Future Scope

  • The proposed scheme can be scaled by incorporating data from other social media platforms.
  • The model is fully functional but its horizon can be widened by including different languages.
  • The number of emotions considered can be increased to have an even more fine-grained analysis.
  • Usage of deep learning approaches might also fine-tune the current scheme.
  • New features can be investigated to improve the existing model.

13. Credits:

Thanking my project mentors and teammates for caring and supporting me wholeheartedly. The role you played in my life is invaluable. I’m grateful for all of your help and continued support.

14. License:

  • Apache License 2.0
