In [1]:
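import json
import os

import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Requires the VADER lexicon: run nltk.download('vader_lexicon') once beforehand.
# Build the analyzer once instead of on every call.
sid = SentimentIntensityAnalyzer()

def score_text(text):
    """Return VADER's compound sentiment score for `text`, in [-1, 1]."""
    return sid.polarity_scores(text)["compound"]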



## 1. Lexicon-based sentiment scoring (VADER) is applied to the query "Amazon Company"
## Positive and negative test paragraphs are defined for this query

In [2]:
query = "Amazon Company"

negative_paragraph = """{0} is very bad. And author doesn't provide any justification. People don't like {0}. 
Some even hate {0}, because {0} is evil. Some groups believe {0} is their main enemy.
""".format(query)

positive_paragraph = """{0} is very good. And author doesn't provide any justification. People like {0}. 
Some even love {0}, because {0} is honest. Some groups believe {0} is their best friend.""".format(query)

negative_paragraph

"Amazon Company is very bad. And author doesn't provide any justification. People don't like Amazon Company. \nSome even hate Amazon Company, because Amazon Company is evil. Some groups believe Amazon Company is their main enemy.\n"

## 2. For simplicity, only the top 5 relevant articles were retrieved.
## These articles are loaded from disk.

In [3]:
folder = "./amazon_company/"
files = os.listdir(folder)

In [4]:
df = pd.DataFrame(index=list(range(0,len(files))), columns=['title', 'link', 'text'])

In [5]:
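for i, filename in enumerate(files):
    # Each file holds one article as JSON with 'title', 'link' and 'text' keys
    with open(os.path.join(folder, filename), 'r', encoding='utf-8') as f:
        data = json.load(f)
    df.loc[i] = data['title'], data['link'], data['text']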
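
As an alternative to pre-allocating the DataFrame and filling it row by row, the articles could be collected into a list of records first. The following is only a sketch: `load_articles` is a hypothetical helper, and it assumes the files carry a `.json` extension, which the notebook does not state.

```python
import json
from pathlib import Path

def load_articles(folder):
    """Read every *.json file in `folder` into a list of dicts (hypothetical helper)."""
    records = []
    # Sort the paths so the row order is reproducible across runs and platforms.
    for path in sorted(Path(folder).glob("*.json")):  # assumption: files end in .json
        with open(path, "r", encoding="utf-8") as f:
            data = json.load(f)
        # Keep only the three fields the notebook uses.
        records.append({key: data[key] for key in ("title", "link", "text")})
    return records
```

The result can be passed straight to `pd.DataFrame(load_articles("./amazon_company/"))`, which yields the same three columns.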

## 3. The articles are then scored and a ranking is built

In [6]:
ranking = df.copy()
ranking['score'] = df['text'].apply(score_text)
ranking = ranking.sort_values(by='score', ascending=False)
ranking

Unnamed: 0,title,link,text,score
0,Amazon (company),https://en.wikipedia.org/wiki/Amazon_(company),"Amazon (company)\nAmazon.com, Inc.[6] (/ˈæməzɒ...",0.9998
3,Prime Video,https://en.wikipedia.org/wiki/Prime_Video,"Prime Video, also marketed as Amazon Prime Vid...",0.9996
1,History of Amazon,https://en.wikipedia.org/wiki/History_of_Amazon,Founding\nThe company was founded as a result ...,0.9993
4,Amazon Web Services,https://en.wikipedia.org/wiki/Amazon_Web_Services,Amazon Web Services (AWS) is a subsidiary of A...,0.9993
2,Amazon Prime,https://en.wikipedia.org/wiki/Amazon_Prime,Amazon Prime is a paid subscription service of...,0.9982


## 4. The article with the median score is chosen as the reference

In [7]:
reference = ranking.iloc[len(files)//2]
reference

title                                    History of Amazon
link       https://en.wikipedia.org/wiki/History_of_Amazon
text     Founding\nThe company was founded as a result ...
score                                               0.9993
Name: 1, dtype: object
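
Two articles share the score 0.9993, so which one lands in the middle position depends on how ties are ordered. The sketch below reproduces the median pick in plain Python, using the titles and scores from the ranking above in their original row order. It relies on `sorted()` being stable; whether the pandas sort in the notebook breaks ties the same way on every run is an assumption.

```python
# (title, score) pairs in the original row order of df, scores taken from the ranking above
articles = [
    ("Amazon (company)", 0.9998),
    ("History of Amazon", 0.9993),
    ("Amazon Prime", 0.9982),
    ("Prime Video", 0.9996),
    ("Amazon Web Services", 0.9993),
]

# sorted() is stable, so tied scores keep their original relative order
ranked = sorted(articles, key=lambda item: item[1], reverse=True)

# Integer division picks the middle row (the lower-ranked of the two middle rows when the count is even)
reference = ranked[len(ranked) // 2]
print(reference)
```

If reproducible tie-breaking matters in pandas, passing `kind="stable"` to `sort_values` is one option.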

## 5. Two copies of the reference article are added: one with the negative paragraph appended to its text and one with the positive paragraph appended

In [8]:
df_extended = df.copy()
df_extended

Unnamed: 0,title,link,text
0,Amazon (company),https://en.wikipedia.org/wiki/Amazon_(company),"Amazon (company)\nAmazon.com, Inc.[6] (/ˈæməzɒ..."
1,History of Amazon,https://en.wikipedia.org/wiki/History_of_Amazon,Founding\nThe company was founded as a result ...
2,Amazon Prime,https://en.wikipedia.org/wiki/Amazon_Prime,Amazon Prime is a paid subscription service of...
3,Prime Video,https://en.wikipedia.org/wiki/Prime_Video,"Prime Video, also marketed as Amazon Prime Vid..."
4,Amazon Web Services,https://en.wikipedia.org/wiki/Amazon_Web_Services,Amazon Web Services (AWS) is a subsidiary of A...


In [9]:
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
edits = pd.DataFrame(data={'title': ['neg_edit', 'pos_edit'],
                           'link': ['', ''],
                           'text': [reference['text'] + negative_paragraph,
                                    reference['text'] + positive_paragraph]})
df_extended = pd.concat([df_extended, edits])
df_extended

Unnamed: 0,title,link,text
0,Amazon (company),https://en.wikipedia.org/wiki/Amazon_(company),"Amazon (company)\nAmazon.com, Inc.[6] (/ˈæməzɒ..."
1,History of Amazon,https://en.wikipedia.org/wiki/History_of_Amazon,Founding\nThe company was founded as a result ...
2,Amazon Prime,https://en.wikipedia.org/wiki/Amazon_Prime,Amazon Prime is a paid subscription service of...
3,Prime Video,https://en.wikipedia.org/wiki/Prime_Video,"Prime Video, also marketed as Amazon Prime Vid..."
4,Amazon Web Services,https://en.wikipedia.org/wiki/Amazon_Web_Services,Amazon Web Services (AWS) is a subsidiary of A...
0,neg_edit,,Founding\nThe company was founded as a result ...
1,pos_edit,,Founding\nThe company was founded as a result ...


## 6. A new ranking is built

In [10]:
ranking = df_extended.copy()
ranking['score'] = df_extended['text'].apply(score_text)
ranking = ranking.sort_values(by='score', ascending=False)
ranking

Unnamed: 0,title,link,text,score
0,Amazon (company),https://en.wikipedia.org/wiki/Amazon_(company),"Amazon (company)\nAmazon.com, Inc.[6] (/ˈæməzɒ...",0.9998
3,Prime Video,https://en.wikipedia.org/wiki/Prime_Video,"Prime Video, also marketed as Amazon Prime Vid...",0.9996
1,pos_edit,,Founding\nThe company was founded as a result ...,0.9996
1,History of Amazon,https://en.wikipedia.org/wiki/History_of_Amazon,Founding\nThe company was founded as a result ...,0.9993
4,Amazon Web Services,https://en.wikipedia.org/wiki/Amazon_Web_Services,Amazon Web Services (AWS) is a subsidiary of A...,0.9993
0,neg_edit,,Founding\nThe company was founded as a result ...,0.9991
2,Amazon Prime,https://en.wikipedia.org/wiki/Amazon_Prime,Amazon Prime is a paid subscription service of...,0.9982
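
To make the comparison explicit, the position of each edited copy can be measured relative to the reference article. This is a minimal sketch: `rank_offsets` is a hypothetical helper, and the title list is copied by hand from the ranking above.

```python
def rank_offsets(ranked_titles, reference, edits):
    """Return {edit: position(edit) - position(reference)}.

    A negative offset means the edit ranks above the reference.
    """
    ref_pos = ranked_titles.index(reference)
    return {title: ranked_titles.index(title) - ref_pos for title in edits}

# Titles in ranked order, best score first, as shown in the table above
ranked_titles = [
    "Amazon (company)", "Prime Video", "pos_edit", "History of Amazon",
    "Amazon Web Services", "neg_edit", "Amazon Prime",
]

offsets = rank_offsets(ranked_titles, "History of Amazon", ["pos_edit", "neg_edit"])
print(offsets)  # {'pos_edit': -1, 'neg_edit': 2}
```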


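## 7. We observe that, as expected, the article with the positive edit ranks above the reference article (0.9996 vs. 0.9993), while the article with the negative edit ranks below it (0.9991). That is, although the absolute sentiment scores are all saturated near 1 and differ only in the fourth decimal place, the relative ranking still shifts in the expected direction.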
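
One likely reason all scores sit so close to 1 is the way the compound score is normalized. To our understanding, the reference VADER implementation sums the valences of the matched words and squashes the sum with x / sqrt(x^2 + alpha), with alpha = 15; treat that formula and constant as an assumption to verify against the installed NLTK version. The sketch below re-implements only this normalization step, not VADER itself, to show how quickly it saturates for long texts.

```python
import math

def normalize(raw_sum, alpha=15):
    """Squash an unbounded valence sum into (-1, 1).

    Assumed to mirror VADER's compound normalization; alpha=15 is the
    constant we believe the reference implementation uses.
    """
    return raw_sum / math.sqrt(raw_sum * raw_sum + alpha)

# A long article accumulates a large valence sum, so its score is already near 1
# and a few extra sentences can only move it in the later decimal places.
for raw_sum in (1, 5, 50, 100, 200):
    print(raw_sum, round(normalize(raw_sum), 4))
```

Under this assumption, a few appended sentences change a long article's score only slightly, which matches the small differences seen above.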