# Natural Language Processing Project - Part 2

First, install the vaderSentiment library. [VADER](https://github.com/cjhutto/vaderSentiment) (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based sentiment analysis tool that is specifically attuned to sentiment expressed in social media.

In [None]:
!pip install vaderSentiment



In [None]:
# import dependencies
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import pandas as pd
from bs4 import BeautifulSoup

In [None]:
# import dataset
imdb = pd.read_csv('/content/drive/My Drive/NLP/IMDB Dataset.csv')

# preview the first rows of the dataset
imdb.head()

Unnamed: 0,review,sentiment
0,One of the other reviewers has mentioned that ...,positive
1,A wonderful little production. <br /><br />The...,positive
2,I thought this was a wonderful way to spend ti...,positive
3,Basically there's a family where a little boy ...,negative
4,"Petter Mattei's ""Love in the Time of Money"" is...",positive


In [None]:
def pre_processing(text: str) -> str:
  """Remove the HTML tags from a string.

  Args:
    text (str): text with or without HTML tags
  Returns:
    str: the text without HTML tags
  """
  # create the soup object; naming the parser explicitly avoids bs4's
  # parser-guessing warning and keeps results consistent across environments
  soup = BeautifulSoup(text, "html.parser")

  # keep only the text content, dropping the HTML tags
  text_without_tags = soup.get_text()

  return text_without_tags

In [None]:
imdb['review_without_tags'] = imdb['review'].apply(pre_processing)

In [None]:
imdb.head()

Unnamed: 0,review,sentiment,review_without_tags
0,One of the other reviewers has mentioned that ...,positive,One of the other reviewers has mentioned that ...
1,A wonderful little production. <br /><br />The...,positive,A wonderful little production. The filming tec...
2,I thought this was a wonderful way to spend ti...,positive,I thought this was a wonderful way to spend ti...
3,Basically there's a family where a little boy ...,negative,Basically there's a family where a little boy ...
4,"Petter Mattei's ""Love in the Time of Money"" is...",positive,"Petter Mattei's ""Love in the Time of Money"" is..."
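
One detail of tag removal is worth checking: if a `<br />` tag sits directly between two sentences with no surrounding whitespace, dropping the tag alone fuses the neighbouring words. The sketch below illustrates the idea with Python's standard `html.parser` module instead of BeautifulSoup, inserting a space wherever a tag was. The names `_TextCollector` and `strip_tags` and the sample string are hypothetical and only serve the illustration.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # called for every run of text found between tags
        self.parts.append(data)


def strip_tags(text: str) -> str:
    """Return the text without HTML tags, with one space where tags were."""
    collector = _TextCollector()
    collector.feed(text)
    collector.close()  # flush any text still buffered by the parser
    # join with a space so "film.<br />Loved" does not become "film.Loved",
    # then collapse repeated whitespace
    return " ".join(" ".join(collector.parts).split())


# illustrative string, not taken from the dataset
sample = "Great film.<br /><br />Loved it."
print(strip_tags(sample))  # Great film. Loved it.
```

With BeautifulSoup, a similar effect should be achievable by passing a separator to `get_text`, for example `soup.get_text(separator=" ")`.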


## VADER sentiment analysis

In [None]:
analyser = SentimentIntensityAnalyzer()

# helper that returns VADER's compound score for a text,
# a normalized value between -1 (most negative) and +1 (most positive)
def sentimentAnalysis(text: str) -> float:
  return analyser.polarity_scores(text)['compound']

In [None]:
imdb['vaderScore'] = imdb['review_without_tags'].apply(sentimentAnalysis)

In [None]:
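# label a review positive only when the compound score is above 0;
# a score of exactly 0 (neutral) is therefore labelled negative
imdb['vaderScoreText'] = imdb['vaderScore'].apply(lambda score: "positive" if score > 0 else "negative")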

In [None]:
imdb.head()

Unnamed: 0,review,sentiment,review_without_tags,vaderScore,vaderScoreText
0,One of the other reviewers has mentioned that ...,positive,One of the other reviewers has mentioned that ...,-0.9916,negative
1,A wonderful little production. <br /><br />The...,positive,A wonderful little production. The filming tec...,0.967,positive
2,I thought this was a wonderful way to spend ti...,positive,I thought this was a wonderful way to spend ti...,0.9519,positive
3,Basically there's a family where a little boy ...,negative,Basically there's a family where a little boy ...,-0.9213,negative
4,"Petter Mattei's ""Love in the Time of Money"" is...",positive,"Petter Mattei's ""Love in the Time of Money"" is...",0.9744,positive
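
The labelling rule used here splits the compound score at 0. The VADER README suggests a three-way convention instead: positive for a compound score of at least 0.05, negative for a score of at most -0.05, and neutral in between. A minimal sketch of that convention follows; the function name `label_from_compound` and its `threshold` parameter are hypothetical. Because the IMDB labels are binary, reviews falling in the neutral band would have to be dropped or assigned to one class before comparing against `sentiment`.

```python
def label_from_compound(score: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to a three-way sentiment label."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    # scores inside (-threshold, threshold) carry no clear polarity
    return "neutral"


# two scores from the table above, plus an exactly neutral score
for score in (0.967, -0.9213, 0.0):
    print(score, label_from_compound(score))
```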


In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score

In [None]:
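def validation(y_test, y_hat):
  """Print accuracy, precision and recall for binary sentiment labels.

  Args:
    y_test: true labels ("negative" / "positive")
    y_hat: predicted labels
  """
  acc = accuracy_score(y_test, y_hat)
  # labels are sorted alphabetically, so "negative" is class 0 and
  # "positive" is class 1; ravel() then yields tn, fp, fn, tp
  tn, fp, fn, tp = confusion_matrix(y_test, y_hat).ravel()

  precision = tp / (tp + fp)
  recall = tp / (tp + fn)
  fpr = fp / (fp + tn)  # false positive rate (computed but not printed)

  print("Accuracy: ", acc)
  print("Precision: ", precision)
  print("Recall: ", recall)

In [None]:
validation(imdb['sentiment'], imdb['vaderScoreText'])

Accuracy:  0.69528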
Precision:  0.6470039144835893
Recall:  0.85948
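
Recall is noticeably higher than precision, which means most truly positive reviews are found but a fair share of negative reviews are also labelled positive. To make the relationship between the confusion-matrix counts and these metrics concrete, here is a small pure-Python sketch on toy labels. The function `binary_counts` and the toy lists are hypothetical and are not part of the notebook's data.

```python
def binary_counts(y_true, y_pred, positive="positive"):
    """Count tn, fp, fn, tp for a binary task (hypothetical helper)."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1  # positive review predicted positive
        elif truth == positive:
            fn += 1  # positive review predicted negative
        elif pred == positive:
            fp += 1  # negative review predicted positive
        else:
            tn += 1  # negative review predicted negative
    return tn, fp, fn, tp


# toy labels: 4 positive and 4 negative reviews
y_true = ["positive"] * 4 + ["negative"] * 4
y_pred = ["positive", "positive", "positive", "negative",
          "positive", "positive", "negative", "negative"]

tn, fp, fn, tp = binary_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of true positives that are found

print(tn, fp, fn, tp)               # 2 2 1 3
print(accuracy, precision, recall)  # 0.625 0.6 0.75
```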
