## Business Understanding
Work has already begun towards developing a COVID-19 vaccine. From measles to the common flu, vaccines have lowered the risk of illness and death, and have saved countless lives around the world. Unfortunately, in some countries the 'anti-vaxxer' movement has led to lower rates of vaccination and new outbreaks of old diseases.

Although it may be many months before COVID-19 vaccines are available on a global scale, it is important to monitor public sentiment towards vaccinations now, and especially later when COVID-19 vaccines are offered to the public. Anti-vaccination sentiment could pose a serious threat to global efforts to get COVID-19 under control in the long term.

The objective of this challenge is to develop a machine learning model to assess if a Twitter post related to vaccinations is positive, neutral, or negative. This solution could help governments and other public health actors monitor public sentiment towards COVID-19 vaccinations and help improve public health policy, vaccine communication strategies, and vaccination programs across the world.

### Project Objective
This project is an entry in a Zindi challenge. Its main objective is to develop a recurrent neural network (RNN) that predicts the sentiment of a tweet.

### Data Source
The data comes from the Zindi website.

### Evaluation Metric
The evaluation metric for the project is the Root Mean Squared Error (RMSE).
The target at the end of the project is an RMSE below 0.5.
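
As a minimal sketch of how RMSE is computed, the snippet below scores a few made-up labels and predictions in the -1/0/1 label encoding used by the dataset. The helper name `rmse_score` and the sample values are assumptions for illustration only; the `root_mean_squared_error` function from scikit-learn, which this notebook imports, computes the same quantity.

```python
import math


def rmse_score(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up labels and predictions, not taken from the challenge data.
y_true = [1.0, -1.0, 0.0, 1.0]
y_pred = [0.5, -1.0, 0.5, 1.0]

score = rmse_score(y_true, y_pred)
print(f"RMSE: {score:.3f}")  # about 0.354, under the 0.5 target
```

Because the errors are squared before averaging, a prediction of the opposite sentiment (an error of 2) costs four times as much as a prediction that is off by one class.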

#### Import Necessary Libraries

In [1]:
# Dataframe
import pandas as pd

# Matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

# Numpy
import numpy as np

#Scikit-learn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import confusion_matrix, classification_report,accuracy_score
from sklearn.metrics import root_mean_squared_error as rmse
from sklearn.feature_extraction.text import CountVectorizer,TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation, NMF

# TensorFlow
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras import utils


# Keras layers
from tensorflow.keras.layers import Activation, Dense, Embedding,Dropout, Flatten, Conv1D, MaxPooling3D, LSTM
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

# NLTK

import nltk
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')
nltk.download('omw-1.4')
from nltk.corpus import stopwords, wordnet
from nltk.stem import SnowballStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize
from nltk.probability import FreqDist

# Word2Vec
import gensim
from gensim.test.utils import common_texts
from gensim.models import Word2Vec

# Utility
import string
import re
import os
import logging
import pickle
import datetime
import itertools
import random
from collections import Counter, defaultdict

# Wordcloud
from PIL import Image
from wordcloud import WordCloud, ImageColorGenerator, STOPWORDS

# Warnings
import warnings
warnings.filterwarnings('ignore')

# Set up logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)


[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Alphagoal\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Alphagoal\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\Alphagoal\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     C:\Users\Alphagoal\AppData\Roaming\nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


#### Data Understanding

In [4]:
train_df = pd.read_csv("../data/Train.csv")
train_df.head()

|   | tweet_id | safe_text | label | agreement |
|---|----------|-----------|-------|-----------|
| 0 | CL1KWCMY | Me &amp; The Big Homie meanboy3000 #MEANBOY #M... | 0.0 | 1.0 |
| 1 | E3303EME | I'm 100% thinking of devoting my career to pro... | 1.0 | 1.0 |
| 2 | M4IVFSMS | #whatcausesautism VACCINES, DO NOT VACCINATE Y... | -1.0 | 1.0 |
| 3 | 1DR6ROZ4 | I mean if they immunize my kid with something ... | -1.0 | 1.0 |
| 4 | J77ENIIE | Thanks to <user> Catch me performing at La Nui... | 0.0 | 1.0 |


In [5]:
# Drop neutral tweets (label == 0.0), keeping only positive (1.0) and negative (-1.0)
train_df = train_df[train_df['label'] != 0.0]

train_df.head()

|   | tweet_id | safe_text | label | agreement |
|---|----------|-----------|-------|-----------|
| 1 | E3303EME | I'm 100% thinking of devoting my career to pro... | 1.0 | 1.0 |
| 2 | M4IVFSMS | #whatcausesautism VACCINES, DO NOT VACCINATE Y... | -1.0 | 1.0 |
| 3 | 1DR6ROZ4 | I mean if they immunize my kid with something ... | -1.0 | 1.0 |
| 5 | OVNPOAUX | <user> a nearly 67 year old study when mental ... | 1.0 | 0.666667 |
| 6 | JDA2QDV5 | Study of more than 95,000 kids finds no link b... | 1.0 | 0.666667 |
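
The filter above leaves gaps in the index (row 4 is gone), and a comparison such as `label != 0.0` also keeps any row whose label is missing, because `NaN != 0.0` evaluates to `True` in pandas. The sketch below illustrates both points on a made-up toy frame. The names `toy_df`, `filtered`, and `cleaned` are hypothetical, and whether `Train.csv` actually contains missing labels is not shown in this notebook, so the `dropna` step is a precaution rather than a confirmed requirement.

```python
import pandas as pd

# Toy frame mimicking the columns shown above; every row is made up.
toy_df = pd.DataFrame({
    "tweet_id": ["A1", "B2", "C3", "D4"],
    "safe_text": ["neutral text", "pro vaccine", "anti vaccine", "unlabelled"],
    "label": [0.0, 1.0, -1.0, None],
})

# Same filter as in the notebook: the unlabelled row survives it,
# because NaN != 0.0 is True.
filtered = toy_df[toy_df["label"] != 0.0]

# Drop missing labels explicitly and rebuild a contiguous index.
cleaned = filtered.dropna(subset=["label"]).reset_index(drop=True)

# Class balance after filtering.
label_counts = cleaned["label"].value_counts().to_dict()
print(label_counts)
```

Checking the class counts at this point is a cheap way to spot imbalance between positive and negative tweets before any model is trained.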
