# Instagram review sentiment analysis
___

> ### If you are using Jupyter Notebook, install the package below by running this cell

In [None]:
# Run this cell and wait until the installation finishes successfully!
!pip install vaderSentiment

> ### If you are using Python from the command prompt (cmd), install the package from the terminal using

<br>

```pip install vaderSentiment```

<br>
                                 
                                 

<p>

# 1. Single review prediction
___

In [2]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyser = SentimentIntensityAnalyzer()

def sentiment_analyzer_scores(sentence):
    score = analyser.polarity_scores(sentence)
    return score


sentiment_analyzer_scores('PUBLICATION: A few reasons why the Pacific Islands are at high risk of debt distress: 📈Modest long-term economic growth prospects 🌀High vulnerability to natural disasters 🚧 High costs for public services/infrastructure. ◾️◾️◾️◾️◾️Read more by copying this link or type "East Asia & Pacific Economic Update 2019" on your search browser. #EAPUpdate')


{'neg': 0.218, 'neu': 0.663, 'pos': 0.118, 'compound': -0.7351}

# 2. Multiple review predictions at once
___

In [3]:
# Importing necessary libraries
import numpy as np
import pandas as pd

from collections import defaultdict
import re

In [4]:
# Load the dataset
df = pd.read_excel('insta-data-clean.xlsx')

# Report and drop rows with a missing caption, because they cannot be scored
missing = df[df["Post content"].isna()]
if not missing.empty:
    print('NONE value present at location {}'.format(list(missing.index)))
    df = df.drop(index=missing.index)

# Collect each caption together with its sentiment scores, one list per column
# (the 'Possitive' column name is kept as it appears in the saved output)
collect = defaultdict(list)
for _, row in df.iterrows():
    caption = row["Post content"]
    review = sentiment_analyzer_scores(caption)
    collect["Caption post"].append(caption)
    collect["Negative"].append(review["neg"])
    collect["Neutral"].append(review["neu"])
    collect["Possitive"].append(review["pos"])
    collect["Overall Sentiment"].append(review["compound"])

# Build the result DataFrame from the collected columns
new_col = pd.DataFrame(collect)

  


In [5]:
# First 5 records
new_col.head(5)

| | Caption post | Negative | Neutral | Possitive | Overall Sentiment |
|---|---|---|---|---|---|
| 0 | PUBLICATION: A few reasons why the Pacific Isl... | 0.218 | 0.663 | 0.118 | -0.7351 |
| 1 | With 6 million #solar panels, a new solar plan... | 0.027 | 0.911 | 0.062 | 0.4404 |
| 2 | Millions of people in #Mozambique, #Malawi and... | 0.127 | 0.823 | 0.05 | -0.8519 |
| 3 | What kind of #Data interests you? Tell us in t... | 0.091 | 0.891 | 0.017 | -0.8119 |
| 4 | CALLING ALL ARTISTS! This one's for you! 🎨🎵🎞️🖼... | 0.042 | 0.817 | 0.142 | 0.9182 |


In [6]:
# last 5 records
new_col.tail(5)

| | Caption post | Negative | Neutral | Possitive | Overall Sentiment |
|---|---|---|---|---|---|
| 889 | We took a look at the US economy. Our findings... | 0.0 | 1.0 | 0.0 | 0.0 |
| 890 | #tbt British economist John Maynard Keynes and... | 0.058 | 0.813 | 0.129 | 0.743 |
| 891 | Global economy on firm footing; growth project... | 0.0 | 0.867 | 0.133 | 0.3818 |
| 892 | Indonesia’s Minister of Finance Sri Mulyani an... | 0.0 | 1.0 | 0.0 | 0.0 |
| 893 | Just completed Germany’s annual economic revie... | 0.0 | 1.0 | 0.0 | 0.0 |


### a) Saving the results to a CSV file
___

In [7]:
new_col.to_csv('Saving_all_prediction_in_CSV.csv')

### b) Saving the results to an Excel file
___

In [None]:
# to_excel writes a real .xlsx workbook (pandas needs the openpyxl package for this)
new_col.to_excel('Saving_all_prediction_in_EXCEL.xlsx')

# SO HOW DOES IT WORK?
___

<img src="https://www.lexalytics.com/images/extra/sentiment1.png" />

<p style="font-family:poppins">
    <b>VADER (Valence Aware Dictionary and sEntiment Reasoner)</b> is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. 

</p>

<p style="font-family:poppins">
The VADER demo has examples of typical use cases for sentiment analysis, including proper handling of sentences with:

> typical negations (e.g., "not good")

> use of contractions as negations (e.g., "wasn't very good")

> conventional use of punctuation to signal increased sentiment intensity (e.g., "Good!!!")

> conventional use of word-shape to signal emphasis (e.g., using ALL CAPS for words/phrases)

> using degree modifiers to alter sentiment intensity (e.g., intensity boosters such as "very" and intensity dampeners such as "kind of")

> understanding many sentiment-laden slang words (e.g., 'sux')

> understanding many sentiment-laden slang words as modifiers such as 'uber' or 'friggin' or 'kinda'

> understanding many sentiment-laden emoticons such as :) and :D

> translating utf-8 encoded emojis such as cupid and kiss and grin

> understanding sentiment-laden initialisms and acronyms (for example: 'lol')

The demo also includes:

> more examples of tricky sentences that confuse other sentiment analysis tools

> an example of how VADER can work in conjunction with NLTK to do sentiment analysis on longer texts, i.e., decomposing paragraphs, articles/reports/publications, or novels into sentence-level analyses

> an example of a concept for assessing the sentiment of images, video, or other tagged multimedia content

> if you have access to the Internet, an example of how VADER can analyze the sentiment of texts in other languages (non-English text sentences)

</p>
</p>

<img src="https://ai2-s2-public.s3.amazonaws.com/figures/2017-08-08/bb5704679d1aaafd4fabfbe8b34930a98d40714f/2-Figure1-1.png">

<img src="https://www.mathworks.com/products/text-analytics/_jcr_content/mainParsys/band_copy_688706585__253862225/mainParsys/columns/2/image_copy_copy.adapt.full.high.gif/1559064584467.gif" />

# subgroup: face-positive

1F600                                      ; fully-qualified     # 😀 grinning face

1F601                                      ; fully-qualified     # 😁 beaming face with smiling eyes

1F602                                      ; fully-qualified     # 😂 face with tears of joy

1F923                                      ; fully-qualified     # 🤣 rolling on the floor laughing

1F603                                      ; fully-qualified     # 😃 grinning face with big eyes

.

.

.

(more emoji code)

# There is a numerical meaning for every expression

( '}{' )	1.6	0.66332	[1, 2, 2, 1, 1, 2, 2, 1, 3, 1]

(%	-0.9	0.9434	[0, 0, 1, -1, -1, -1, -2, -2, -1, -2]
 
('-:	2.2	1.16619	[4, 1, 4, 3, 1, 2, 3, 1, 2, 1]

(':	2.3	0.9	[1, 3, 3, 2, 2, 4, 2, 3, 1, 2]

((-:	2.1	0.53852	[2, 2, 2, 1, 2, 3, 2, 2, 3, 2]

(*	1.1	1.13578	[2, 1, 1, -1, 1, 2, 2, -1, 2, 2]

.

.

.

(more lexicon entries)
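To make the numbers in these entries easier to read, here is a minimal sketch that splits one entry into its parts. It assumes the four fields are separated by tabs, as in the excerpt above, and that they are the token, the mean rating, the standard deviation, and the list of individual human ratings. The helper name `parse_lexicon_line` is made up for this illustration and is not part of vaderSentiment.

```python
import ast
import statistics

def parse_lexicon_line(line):
    """Split one tab-separated lexicon entry into its four fields (hypothetical helper)."""
    token, mean, std, ratings = line.rstrip("\n").split("\t")
    # The last field is written like a Python list, so literal_eval can read it safely
    return token, float(mean), float(std), ast.literal_eval(ratings)

# One of the entries shown above, with the tabs written out explicitly
line = "(':\t2.3\t0.9\t[1, 3, 3, 2, 2, 4, 2, 3, 1, 2]"
token, mean, std, ratings = parse_lexicon_line(line)

# Recompute the summary statistics from the raw ratings to compare with the stored values
recomputed_mean = statistics.mean(ratings)
recomputed_std = statistics.pstdev(ratings)  # population standard deviation
print(token, mean, std)
print(recomputed_mean, round(recomputed_std, 5))
```

For this entry the recomputed mean and population standard deviation agree with the second and third fields, which suggests those fields simply summarize the ratings list.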



<img src ='http://www.joshuakim.io/wp-content/uploads/2017/12/filtering2.jpg'/>

<img src="https://www.researchgate.net/profile/Suthendran_Kannan/publication/325896826/figure/fig5/AS:639911610818561@1529578225737/Sentiment-Analysis-vs-VADER-Sentiment-Analysis.png"/>

# About Scoring 
____

The compound score is computed by summing the valence scores of each word in the lexicon, adjusted according to the rules, and then normalized to be between -1 (most extreme negative) and +1 (most extreme positive). 

This is the most useful metric if you want a single unidimensional measure of sentiment for a given sentence. Calling it a 'normalized, weighted composite score' is accurate.
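The normalization step can be sketched in a few lines. This is an illustration under an assumption: to my understanding, the VADER reference code squashes the summed valence `x` with `x / sqrt(x*x + alpha)` and a default `alpha` of 15, but please check the installed source before relying on the exact constant. The function name `normalize` below is only for this sketch.

```python
import math

def normalize(score, alpha=15):
    """Squash an unbounded valence sum into the interval (-1, 1).

    Assumption: alpha=15 mirrors the default used by VADER.
    """
    return score / math.sqrt(score * score + alpha)

# Larger sums move toward -1 or +1 but never reach them
for raw in (-8.0, -1.0, 0.0, 1.0, 8.0):
    print(raw, round(normalize(raw), 4))
```

Because of the square root in the denominator, a few strongly valenced words already push the compound score close to the extremes, while a sum of zero stays exactly at zero.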

It is also useful for researchers who would like to set standardized thresholds for classifying sentences as either positive, neutral, or negative. Typical threshold values are:

> ### positive sentiment: compound score >= 0.05

> ### neutral sentiment: compound score > -0.05 and compound score < 0.05

> ### negative sentiment: compound score <= -0.05
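The thresholds above translate directly into a small helper. This is a sketch, not part of vaderSentiment: the name `classify_compound` and the `threshold` parameter are chosen here for illustration.

```python
def classify_compound(compound, threshold=0.05):
    """Turn a compound score into a label using the typical thresholds."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Compound scores taken from the result table earlier in this notebook
for score in (-0.7351, 0.0, 0.4404):
    print(score, classify_compound(score))
```

With the results from section 2, such a helper could be applied to the whole column, for example `new_col["Overall Sentiment"].apply(classify_compound)`.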

<hr>

The ```pos```, ```neu```, and ```neg``` scores are the proportions of the text that fall in each category, so they should add up to 1 (or very close to it, because of floating-point rounding).

These are the most useful metrics if you want multidimensional measures of sentiment for a given sentence.
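As a quick check of that claim, the three ratios from the single review prediction in section 1 can be added up. The tolerance of 0.005 is an assumption chosen here to allow for scores rounded to three decimals.

```python
import math

# Scores copied from the single review prediction in section 1
scores = {'neg': 0.218, 'neu': 0.663, 'pos': 0.118, 'compound': -0.7351}

# Add only the three ratios; the compound score is a separate measure
total = scores['neg'] + scores['neu'] + scores['pos']

# Each ratio is rounded to three decimals, so the sum can land slightly off 1
print(round(total, 3), math.isclose(total, 1.0, abs_tol=0.005))
```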