## What is NLP (Natural Language Processing)?
<img src='../Images/Others/nlp.jpg' width='320' height='200' style="float: left; margin:5px 22px 3px 1px">

Natural language processing (NLP) is the branch of computer science, and more specifically of artificial intelligence (AI), concerned with giving computers the ability to understand text and spoken words in much the same way human beings can. Common applications include:  

**Spam detection:** You may not think of spam detection as an NLP solution, but the best spam detection technologies use NLP's text classification capabilities to scan emails for language that often indicates spam or phishing.  

**Machine translation:** Google Translate is an example of widely available NLP technology at work.  

**Virtual agents and chatbots:** Virtual agents such as Apple's Siri and Amazon's Alexa use speech recognition to recognize patterns in voice commands and natural language generation to respond with appropriate action or helpful comments.  

**Social media sentiment analysis:** NLP has become an essential business tool for uncovering hidden data insights from social media channels. Sentiment analysis can analyze the language used in social media posts, responses, reviews, and more to extract attitudes and emotions in response to products, promotions, and events. Companies can use this information in product design, advertising campaigns, and more.

In this notebook we will perform sentiment analysis on a dataset of product reviews, using NLTK to clean the text and VADER to score it.

In [8]:
import pandas as pd 
import numpy as np 

path = '../Datasets/Reviews.csv'
df = pd.read_csv(path, index_col=[0])

df.head()

Unnamed: 0,rating,date,variation,verified_reviews,feedback
0,5,31-Jul-18,Charcoal Fabric,Love my Echo!,1
1,5,31-Jul-18,Charcoal Fabric,Loved it!,1
2,4,31-Jul-18,Walnut Finish,"Sometimes while playing a game, you can answer...",1
3,5,31-Jul-18,Charcoal Fabric,I have had a lot of fun with this thing. My 4 ...,1
4,5,31-Jul-18,Charcoal Fabric,Music,1


In [9]:
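# Dates look like '31-Jul-18'; the explicit format assumes every row follows that pattern
df['date'] = pd.to_datetime(df['date'], format='%d-%b-%y')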
df.head()

Unnamed: 0,rating,date,variation,verified_reviews,feedback
0,5,2018-07-31,Charcoal Fabric,Love my Echo!,1
1,5,2018-07-31,Charcoal Fabric,Loved it!,1
2,4,2018-07-31,Walnut Finish,"Sometimes while playing a game, you can answer...",1
3,5,2018-07-31,Charcoal Fabric,I have had a lot of fun with this thing. My 4 ...,1
4,5,2018-07-31,Charcoal Fabric,Music,1


In [10]:
from nltk.corpus import stopwords
import nltk 
import re
import string

nltk.download('stopwords')

stemmer = nltk.SnowballStemmer('english')
stopword = set(stopwords.words('english'))

def clean_text(text):
    text = str(text).lower()
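    # Raw strings avoid invalid-escape warnings for patterns such as \[ and \S
    text = re.sub(r'\[.*?\]', '', text)                # drop text in square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # drop URLs
    text = re.sub(r'<.*?>+', '', text)                 # drop HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # drop punctuation
    text = re.sub(r'\n', '', text)                     # drop newlines
    text = re.sub(r'\w*\d\w*', '', text)               # drop tokens containing digits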
    text = [word for word in text.split(' ') if word not in stopword]
    text = " ".join(text)
    text = [stemmer.stem(word) for word in text.split(' ')]
    text = " ".join(text)
    return text

df["clean_reviews"] = df["verified_reviews"].apply(clean_text)
print('Cleaning is completed!')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\ramaz\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


Cleaning is completed!
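
The cleaning steps above can be traced on a single string. The following is a minimal sketch that uses only the standard library: `demo_clean`, `DEMO_STOPWORDS`, and the sample sentence are made up for illustration, the stop-word list is a tiny stand-in for NLTK's English list, and stemming is left out.

```python
import re
import string

# Hypothetical, tiny stop-word list used only for this illustration;
# the notebook itself uses NLTK's full English list.
DEMO_STOPWORDS = {"i", "my", "the", "a", "it", "at", "this", "is"}

def demo_clean(text):
    """Apply the same regex steps as clean_text, without stemming."""
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)                # text in square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # URLs
    text = re.sub(r'<.*?>+', '', text)                 # HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub(r'\n', '', text)
    text = re.sub(r'\w*\d\w*', '', text)               # tokens containing digits
    # Unlike clean_text, split() without arguments also discards the
    # empty tokens left behind by removed words.
    words = [w for w in text.split() if w not in DEMO_STOPWORDS]
    return " ".join(words)

sample = "Love my Echo! Bought it at https://example.com for my 4 yr old <br>"
print(demo_clean(sample))
```

With this stand-in list the sample reduces to `love echo bought for yr old`. NLTK's full list would remove more words, and the stemmer would further shorten some of the remaining ones.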


In [11]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

# Score each review once and reuse the result for all three columns
scores = [analyzer.polarity_scores(review) for review in df['clean_reviews']]
df['Positive'] = [s['pos'] for s in scores]
df['Negative'] = [s['neg'] for s in scores]
df['Neutral'] = [s['neu'] for s in scores]

df.head()

Unnamed: 0,rating,date,variation,verified_reviews,feedback,clean_reviews,Positive,Negative,Neutral
0,5,2018-07-31,Charcoal Fabric,Love my Echo!,1,love echo,0.808,0.0,0.192
1,5,2018-07-31,Charcoal Fabric,Loved it!,1,love,1.0,0.0,0.0
2,4,2018-07-31,Walnut Finish,"Sometimes while playing a game, you can answer...",1,sometim play game answer question correct alex...,0.223,0.141,0.636
3,5,2018-07-31,Charcoal Fabric,I have had a lot of fun with this thing. My 4 ...,1,lot fun thing yr old learn dinosaur control l...,0.564,0.0,0.436
4,5,2018-07-31,Charcoal Fabric,Music,1,music,0.0,0.0,1.0


In [12]:
x = df['Positive'].sum()
y = df['Negative'].sum()
z = df['Neutral'].sum()

def sentiment(x,y,z):
    if x>y and x>z:
        print("Positive :)")
    elif y>x and y>z:
        print("Negative >_<")
    else:
        print("Neutral :|")

sentiment(x,y,z)

Neutral :|
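
To see how this comparison of column sums behaves, it can be repeated by hand on the five preview rows shown earlier. This is only a sketch over those five rows, not the full dataset; `preview_scores` and `overall_label` are names introduced here for illustration.

```python
# (pos, neg, neu) scores copied from the five preview rows shown earlier
preview_scores = [
    (0.808, 0.0, 0.192),
    (1.0, 0.0, 0.0),
    (0.223, 0.141, 0.636),
    (0.564, 0.0, 0.436),
    (0.0, 0.0, 1.0),
]

def overall_label(pos, neg, neu):
    """Same comparison as sentiment(), but returning the label instead of printing."""
    if pos > neg and pos > neu:
        return "Positive"
    elif neg > pos and neg > neu:
        return "Negative"
    return "Neutral"

# Sum each score column, as the cell above does with df[...].sum()
pos_total = sum(p for p, _, _ in preview_scores)
neg_total = sum(n for _, n, _ in preview_scores)
neu_total = sum(u for _, _, u in preview_scores)

print(round(pos_total, 3), round(neg_total, 3), round(neu_total, 3))
print(overall_label(pos_total, neg_total, neu_total))
```

On these five rows the positive total is the largest, whereas the full dataset was labelled Neutral. One plausible reason is that the neutral share is high for any review whose words mostly carry no sentiment, so it can dominate the sum over many reviews.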


In [13]:
comment = 'I love programming'

print("Let's calculate the sentiment of your comment:", comment)
print('_'*50)

def sentiment_scores(sentence):
    # Build a VADER analyzer and score the sentence
    sid_obj = SentimentIntensityAnalyzer()
    sentiment_dict = sid_obj.polarity_scores(sentence)

    print("{0:.2f}% Positive".format(sentiment_dict['pos']*100))
    print("{0:.2f}% Neutral".format(sentiment_dict['neu']*100))
    print("{0:.2f}% Negative".format(sentiment_dict['neg']*100))
    print("Sentence Overall Rated As", end=" ")

    # Label the sentence by its compound score using the +/-0.05 cut-offs
    if sentiment_dict['compound'] >= 0.05:
        print("Positive")
    elif sentiment_dict['compound'] <= -0.05:
        print("Negative")
    else:
        print("Neutral")

sentiment_scores(comment)

Let's calculate the sentiment of your comment: I love programming
__________________________________________________
67.70% Positive
32.30% Neutral
0.00% Negative
Sentence Overall Rated As Positive
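
The same cut-offs on the compound score could also be applied to many texts at once to count how many fall under each label. The sketch below assumes the compound scores have already been computed; `label_from_compound` is a hypothetical helper, and the values in `example_compounds` are made up for illustration rather than taken from the dataset.

```python
from collections import Counter

def label_from_compound(compound):
    """Map a compound score to a label using the +/-0.05 cut-offs."""
    if compound >= 0.05:
        return "Positive"
    elif compound <= -0.05:
        return "Negative"
    return "Neutral"

# Hypothetical compound scores, made up for illustration only
example_compounds = [0.64, 0.0, -0.32, 0.05, -0.04]

# Count how many scores fall under each label
label_counts = Counter(label_from_compound(c) for c in example_compounds)
print(label_counts)
```

Counting labels per text gives a different summary from comparing summed proportions, since each text contributes exactly one vote.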


In [15]:
# Length of each cleaned review in characters (not words)
df['reviews_length'] = df['clean_reviews'].apply(len)
df.head()

Unnamed: 0,rating,date,variation,verified_reviews,feedback,clean_reviews,Positive,Negative,Neutral,reviews_length
0,5,2018-07-31,Charcoal Fabric,Love my Echo!,1,love echo,0.808,0.0,0.192,9
1,5,2018-07-31,Charcoal Fabric,Loved it!,1,love,1.0,0.0,0.0,4
2,4,2018-07-31,Walnut Finish,"Sometimes while playing a game, you can answer...",1,sometim play game answer question correct alex...,0.223,0.141,0.636,99
3,5,2018-07-31,Charcoal Fabric,I have had a lot of fun with this thing. My 4 ...,1,lot fun thing yr old learn dinosaur control l...,0.564,0.0,0.436,101
4,5,2018-07-31,Charcoal Fabric,Music,1,music,0.0,0.0,1.0,5
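
Note that `len` applied to a string counts characters, spaces included. The short sketch below, using three cleaned reviews from the preview rows, contrasts that with a word count; the word count is shown only as a possible alternative, not something the notebook computes.

```python
# Cleaned reviews taken from the preview rows shown above
cleaned = ["love echo", "love", "music"]

# What .apply(len) computes: number of characters, including spaces
char_lengths = [len(text) for text in cleaned]

# Possible alternative: number of whitespace-separated tokens
word_counts = [len(text.split()) for text in cleaned]

print(char_lengths)  # [9, 4, 5]
print(word_counts)   # [2, 1, 1]
```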


In [17]:
df = df[['rating', 'feedback', 'clean_reviews', 'Positive', 'Negative', 'Neutral', 'reviews_length']]
df.to_csv('../Datasets/cleaned_data.csv', index=False)