This is a quick foray into basic NLP. I recently read a Pew report on how people around the world view the state of their nations compared with fifty years ago. Being aware of and interested in Natural Language Processing, though not extremely experienced with it, I decided to look at what sentiment analysis would say about the survey report and the question being asked. There is a wide perception that news media should be objective and neutral. Can reporting on global attitudes maintain this tone when discussing people's subjective feelings? How might the design of the question factor into this? At the end, I will try a few rewordings of the question to see how the score changes. The sentiment analysis used here scores polarity on a scale from -1 (most negative) to +1 (most positive), with 0 being perfectly neutral.

h/t to NeuralNine on YouTube for his quick introduction to and walkthrough of sentiment analysis using Python, NLTK, and TextBlob.

Link to NeuralNine's video: [Simple Sentiment Text Analysis in Python](https://www.youtube.com/watch?v=tXuvh5_Xyrw)

Link to the article: [Worldwide, People Divided on Whether Life Today Is Better Than in the Past](https://www.pewresearch.org/global/2017/12/05/worldwide-people-divided-on-whether-life-today-is-better-than-in-the-past/)

First, we begin with a trial run based on NeuralNine's introduction to NLTK, TextBlob and Newspaper...

In [None]:
##Installing necessary NLP libraries
!pip install nltk 
!pip install textblob 
!pip install newspaper3k

In [13]:
##Importing NLP Libraries
import nltk
nltk.download('punkt')
from textblob import TextBlob
from newspaper import Article

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


In [7]:
##Trial URL
url = 'https://en.wikipedia.org/wiki/Mathematics'
article = Article(url)

In [14]:
##Downloading article and preprocessing for NLP
article.download()
article.parse()
article.nlp()

In [None]:
##Setting 'text' to full-length article from Wikipedia
text = article.text
print(text)

In [None]:
##Setting up a second variable as a summary of the Wikipedia 'Mathematics' article -- We'll compare how they rate later
textsum = article.summary
print(textsum)

Here I diverge from the video because I wanted to see if there would be a difference between the summary of the article and the fuller article in terms of sentiment analysis. I hypothesize that the inclusion of more text would have a tendency to dilute the sentiment an article contains. 

In [17]:
##Full length article sentiment analysis
blob1 = TextBlob(text)
sentiment1 = blob1.sentiment.polarity
print(sentiment1)

0.08933739084512281


In [18]:
##Summary of article sentiment analysis
blob2 = TextBlob(textsum)
sentiment2 = blob2.sentiment.polarity
print(sentiment2)

0.14705627705627705


As we can see from the scores above, the result is consistent with my hypothesis, at least for this one article. The summary score (about 0.147) is more than one and a half times the full-length score (about 0.089), and both sit on the positive side of the sentiment scale.
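To make the "one and a half times" comparison explicit, here is a minimal sketch of the arithmetic. It uses the two polarity values printed above; the helper name `polarity_ratio` is my own and not part of TextBlob.

```python
# Polarity values copied from the two outputs above.
full_polarity = 0.08933739084512281     # full-length Wikipedia article
summary_polarity = 0.14705627705627705  # newspaper3k summary of the same article


def polarity_ratio(summary_score, full_score):
    """Return how many times larger the summary score is than the full score."""
    if full_score == 0:
        raise ValueError("ratio is undefined when the full-text score is 0")
    return summary_score / full_score


ratio = polarity_ratio(summary_polarity, full_polarity)
difference = summary_polarity - full_polarity
print(f"ratio: {ratio:.2f}, absolute difference: {difference:.3f}")
```

A ratio like this is only meaningful when both scores have the same sign; otherwise the absolute difference is the safer number to report.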

Next, I try the same approach on a recent Pew report about global sentiment, scoring both the report itself and the survey question. Note: the question was asked in over thirty countries and was presumably localized, so we do not have those localized versions for our analysis. Examining the sentiment of the phrasing of the question in each of the target languages is beyond the scope of this project.

In [19]:
##Loading the URL and converting it into an Article object
pew_url = 'https://www.pewresearch.org/global/2017/12/05/worldwide-people-divided-on-whether-life-today-is-better-than-in-the-past/'
pew_article = Article(pew_url)

In [20]:
##Downloading and preprocessing the article
pew_article.download()
pew_article.parse()
pew_article.nlp()

In [21]:
##Saving the article as a text object
pew_text = pew_article.text

In [27]:
##Setting up a second variable as a summary of the Pew article for later comparison
pew_text_sum = pew_article.summary

In [22]:
##Running and printing our sentiment analysis FULL article
pew_blob = TextBlob(pew_text)
pew_sentiment = pew_blob.sentiment.polarity
print(pew_sentiment)

0.14013994194630036


In [28]:
##Running and printing out sentiment analysis for the article summary
pew_sum_blob = TextBlob(pew_text_sum)
pew_sum_sentiment = pew_sum_blob.sentiment.polarity
print(pew_sum_sentiment)

0.20666666666666667


In [26]:
##Finally, we perform the same analysis on the question as asked in English -- Note: respondents were asked to fill in a blank
pew_question = 'life in our country today is than it was 50 years ago for people like me'
pqblob = TextBlob(pew_question)
pq_sentiment = pqblob.sentiment.polarity
print(pq_sentiment)

0.0


In [30]:
##Testing the question with a hypothetically positive response
pew_question_pos = 'life in our country today is better than it was 50 years ago for people like me'
pqposblob = TextBlob(pew_question_pos)
pqpos_sentiment = pqposblob.sentiment.polarity
print(pqpos_sentiment)

0.5


In [34]:
##Testing the question with a hypothetically negative response
pew_question_neg = 'life in our country today is worse than it was 50 years ago for people like me'
pqnegblob = TextBlob(pew_question_neg)
pqneg_sentiment = pqnegblob.sentiment.polarity
print(pqneg_sentiment)

-0.4


In [37]:
##Testing the question with a hypothetically neutral response
pew_question_same = 'life in our country today is the same as it was 50 years ago for people like me'
pqsameblob = TextBlob(pew_question_same)
pqsame_sentiment = pqsameblob.sentiment.polarity
print(pqsame_sentiment)

0.0
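The four cells above repeat the same steps with different wording, so they could be folded into one loop. This is a sketch rather than the method used above: the names `polarity_by_phrase`, `textblob_polarity`, `PEW_TEMPLATE`, and `PHRASES` are hypothetical, and the scorer is passed in as a parameter so the helper itself does not depend on any library.

```python
def polarity_by_phrase(template, phrases, scorer):
    """Fill the template with each phrase and return {label: score}."""
    return {label: scorer(template.format(phrase))
            for label, phrase in phrases.items()}


def textblob_polarity(text):
    """Score text with the same TextBlob call used in the cells above."""
    from textblob import TextBlob  # imported here so the helper above stays dependency-free
    return TextBlob(text).sentiment.polarity


# The question with a slot for the comparison phrase.
PEW_TEMPLATE = "life in our country today is {} it was 50 years ago for people like me"

# Label -> phrase; "blank" reproduces the original fill-in-the-blank wording.
PHRASES = {
    "blank": "than",
    "better": "better than",
    "worse": "worse than",
    "same": "the same as",
}
```

In the notebook, `polarity_by_phrase(PEW_TEMPLATE, PHRASES, textblob_polarity)` builds the same four sentences as the cells above, so it should return the same four scores in a single dictionary.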


Here we can see that the phrasing of the question does matter: filling the blank with "better" moves the score to 0.5, and "worse" moves it to -0.4. The hypothetically neutral wording gave the same score (0.0) as the original question, which contains no descriptive sentiment. One caveat is that, to an actual human respondent, "the same" could refer to many things, not simply the material experience of the citizens' group.
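The scores for the reworded questions are what one would expect if a lexicon-based scorer averages only over the words it has polarity values for, so that the surrounding neutral words add nothing. The sketch below is a toy illustration of that idea, not TextBlob's implementation. The two lexicon entries are simply the scores printed above, and the names `TOY_LEXICON` and `toy_polarity` are hypothetical.

```python
# Toy lexicon: the two values are the polarity scores printed above for the
# "better" and "worse" wordings. This is an assumption for illustration only.
TOY_LEXICON = {"better": 0.5, "worse": -0.4}


def toy_polarity(text, lexicon=TOY_LEXICON):
    """Average the polarity of words found in the lexicon; 0.0 if none are found."""
    scores = [lexicon[word] for word in text.lower().split() if word in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


question = "life in our country today is {} than it was 50 years ago for people like me"
for filler in ("better", "worse"):
    print(filler, toy_polarity(question.format(filler)))
```

Under this toy model, a sentence with no lexicon words scores exactly 0.0, which matches the result for both the original question and the "the same as" wording.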

As in the trial run, the summary scores higher on the positive end of the sentiment scale than the full-length article does. We also find that the sentiment in the report skews more positive than the question itself. This probably does not betray a lack of objectivity so much as reflect that many countries have made tremendous strides toward a better quality of life. It is important to note that, of the top five countries, three were undergoing post-war reconstruction fifty years ago, one was suffering domestic turmoil and has seen its fortunes improve with market reforms, and the fifth was mired in a war. As such, there was a great deal of room for improvement.

As alluded to earlier, this project could be improved and expanded by looking at how the question is phrased in each individual language. Not every language has a grammatical structure as modular as that of English, so a fill-in-the-blank question may not translate cleanly. The ordering of options (e.g., "Better," "The Same," or "Worse") could also bias responses by priming the individuals being interviewed. It is almost certain that Pew takes these factors into account, but start-up organizations looking to gauge public opinion will need to consider these dynamics. Consulting linguists and native speakers would help in this process.

Finally, TextBlob is not an especially sophisticated NLP library. More advanced models will likely take a more granular view of sentiment and dig out greater insights, but for our simple purposes here, it suffices.