Sentiment Analysis of COVID-19 Tweets

Hello there! The proposed approach is divided into four phases: 1) pre-processing, 2) keyword trend analysis, 3) word embeddings for feature extraction, and 4) classification. The COVIDSenti dataset is split into two parts, training and testing. We account for dataset issues such as over-fitting, noise, and dataset size. The main objective of this study is to evaluate the classification performance of state-of-the-art classifiers on the COVIDSenti dataset and then attempt to improve that performance by extracting key features of tweets. The proposed technique aims to classify the COVIDSenti dataset, which contains COVID-19-related Twitter posts, with higher accuracy.
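As a rough illustration of splitting a labelled dataset into training and testing parts, here is a minimal sketch using only the Python standard library. The 80/20 ratio, the fixed seed, the `train_test_split` helper, and the sample tweets are all assumptions made for this example; the repository may split the data differently.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and split them into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                    # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)    # seeded shuffle gives a repeatable split
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical (tweet, label) pairs, for illustration only.
tweets = [
    ("stay safe everyone", "positive"),
    ("cases are rising again", "negative"),
    ("new guidelines published today", "neutral"),
    ("so grateful for our nurses", "positive"),
    ("lockdown is exhausting", "negative"),
    ("press briefing at noon", "neutral"),
    ("vaccine trial looks promising", "positive"),
    ("hospitals are overwhelmed", "negative"),
    ("schools reopen next week", "neutral"),
    ("thank you to all volunteers", "positive"),
]

train_rows, test_rows = train_test_split(tweets, test_fraction=0.2)
print(len(train_rows), "training rows,", len(test_rows), "testing rows")
```

Keeping the test rows completely separate from training is what makes the later accuracy figure a fair estimate of how the model behaves on unseen tweets.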

About Us

Development Phase

For model training and validation, the fastai approach was used along with Keras (TensorFlow 2.0) and PyTorch. The development phase of the project is divided into four phases:

Data Collection and Cleaning.
Exploratory Data Analysis and Preprocessing.
Model Training and Sentiment Extractor
Create a Web File

Website Development

The development of the website is divided into four phases:

Public Sentiment Analysis
Real Time Sentiment Analysis
Twitter Live Feed Analysis
Live Case count

Notebooks

Data Collection and Data Cleaning

Collecting data for training the ML model is the first step in the machine learning pipeline. The predictions made by ML systems can only be as good as the data on which they have been trained. The following are some of the problems that can arise in data collection:

Inaccurate data. The collected data could be unrelated to the problem statement.
Missing data. Parts of the data could be missing, for example empty values in columns or missing images for some prediction class.

Data cleaning is one of the most important parts of machine learning and plays a significant part in building a model. It is not the fanciest part of machine learning, and there are no hidden tricks or secrets to uncover. However, the success or failure of a project relies on proper data cleaning.
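A minimal sketch of what tweet cleaning can look like, using only the standard library. The specific rules (lowercasing, removing URLs and @mentions, dropping punctuation and emoji, discarding empty rows) and the `clean_tweet` helper are assumptions for illustration, not necessarily the steps used in the notebooks.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # links
MENTION_RE = re.compile(r"@\w+")                # @user handles
NON_TEXT_RE = re.compile(r"[^a-z0-9\s]")        # '#', punctuation, emoji

def clean_tweet(text):
    """Return a normalised tweet, or None if nothing usable remains."""
    if text is None:                 # missing value
        return None
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_TEXT_RE.sub(" ", text)
    text = " ".join(text.split())    # collapse repeated whitespace
    return text or None

# Hypothetical raw rows, including a missing and an empty value.
raw = [
    "Stay home! https://t.co/abc #COVID19 @WHO",
    None,
    "   ",
    "Cases are rising again...",
]

cleaned = [c for c in (clean_tweet(t) for t in raw) if c is not None]
print(cleaned)
```

Rows that are missing or reduce to an empty string are dropped rather than passed on, which addresses the "missing data" problem described above.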

Exploratory Data Analysis

Exploratory Data Analysis (EDA) is an approach for analyzing a dataset to summarise its main features. Dataset summaries can be of two types:

Numerical Summary: A summary expressed in numbers, for example the mean (average) or median. It can be either a) univariate, where the measure relies on only one variable, or b) bivariate, where the measure relies on two variables.
Graphical Summary: A summary expressed in the form of graphs, for example a histogram or box plot.
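The difference between univariate and bivariate numerical summaries can be sketched as follows. The two columns (`word_counts` and `retweets`) and their values are hypothetical, and the `pearson` helper is written out by hand for this example.

```python
from math import sqrt
from statistics import mean, median

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-tweet measurements, for illustration only.
word_counts = [4, 7, 5, 9, 10]
retweets = [1, 3, 2, 5, 4]

# Univariate: each measure looks at a single variable.
print("mean words:", mean(word_counts))
print("median words:", median(word_counts))

# Bivariate: the measure relates two variables to each other.
r = pearson(word_counts, retweets)
print("correlation:", round(r, 3))
```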

The dataset obtained after cleaning was then subjected to Exploratory Data Analysis (EDA) by plotting various types of graphs based on the sentiments and sentiment triggers, to gain valuable insights from the data. The frequency distribution graphs give a good perspective on the dataset and also give insight into the model's likely generalization capability. Plotting the graphs by sentiment and sentiment trigger made it clear that the sentiment trend in tweets prevalent in India did not differ much from that of the rest of the world.
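A frequency distribution of sentiment labels can be sketched with `collections.Counter`. The labels below are made up, and the text bar chart stands in for the plots produced in the notebooks.

```python
from collections import Counter

# Hypothetical sentiment labels, for illustration only.
labels = ["negative", "neutral", "neutral", "positive", "neutral", "negative"]

freq = Counter(labels)

# Print a simple text histogram, most frequent label first.
for label, count in freq.most_common():
    print(f"{label:<9} {'#' * count} {count}")
```

A strongly skewed distribution is worth noticing early, since a model trained on it may generalize poorly to the under-represented sentiments.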

Model Training & Sentiment Extractor

Training data is the dataset used to train an ML algorithm. It consists of sample outputs and the corresponding sets of inputs that influence those outputs. During training, the input data is run through the algorithm and the processed output is compared against the sample output. The result of this comparison is used to modify the model. This iterative process is called “model fitting”. The quality of the training and validation datasets is critical for the accuracy of the model.

Model training in machine learning is the process of feeding an ML algorithm with data so that it can identify and learn good values for all the parameters involved. There are several types of machine learning, of which the most common are supervised and unsupervised learning. Supervised learning is possible when the training data contains both the input and output values. In each such example, the expected output is called the supervisory signal. Training is driven by the deviation of the model's output from the expected output when the inputs are fed into the model.
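The fitting loop described above can be sketched in plain Python. This is a deliberately tiny logistic regression over bag-of-words features, swapped in for illustration; it is not the fastai, Keras, or PyTorch model used in the project, and the samples, the `featurize`, `predict`, and `fit` helpers, and the hyperparameters are all assumptions.

```python
import math

def featurize(text, vocab):
    """Binary bag-of-words vector: 1.0 if the vocabulary word occurs in the text."""
    words = set(text.split())
    return [1.0 if w in words else 0.0 for w in vocab]

def predict(weights, bias, x):
    """Probability that the input is positive (sigmoid of a weighted sum)."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def fit(samples, vocab, epochs=200, lr=0.5):
    """Repeatedly nudge the parameters against the prediction error."""
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for text, target in samples:
            x = featurize(text, vocab)
            # Deviation of the model's output from the expected output.
            error = predict(weights, bias, x) - target
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

# Hypothetical labelled tweets: 1 = positive, 0 = negative.
samples = [
    ("vaccine news is great", 1),
    ("so grateful for nurses", 1),
    ("lockdown is awful", 0),
    ("cases rising this is terrible", 0),
]

vocab = sorted({w for text, _ in samples for w in text.split()})
weights, bias = fit(samples, vocab)

for text, target in samples:
    p = predict(weights, bias, featurize(text, vocab))
    print(f"{text!r}: predicted {p:.2f}, expected {target}")
```

Each pass over the data moves the weights a little in the direction that reduces the error, which is the iterative "model fitting" process in miniature.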

Website

We create the website for:

Public Sentiment Analysis
Real Time Sentiment Analysis
Twitter Live Feed Analysis
Live Case count

Thanks for checking out the repo!

ThisIs-Developer/Sentiment-Analysis-of-Covid-19-Tweets
