# **Sentiment Analysis**
With transformers from HuggingFace: https://huggingface.co/transformers/quicktour.html

Sentiment analysis, also called opinion mining, is the process of understanding an author's opinion about a subject. In other words: "What is the emotion or opinion of the author of the text about the subject discussed?"

In a sentiment analysis system, depending on the context, we usually have three elements. The first is the **opinion or emotion**. An opinion (also called "polarity") can be positive, neutral, or negative. An emotion could be qualitative (like joy, surprise, or anger) or quantitative (like rating a movie on a scale from 1 to 10).

The second element in a sentiment analysis system is the **subject that is being talked about**, such as a book, a movie, or a product. Sometimes one opinion could discuss multiple aspects of the same subject. For example: "The camera on this phone is great but its battery life is rather disappointing."

The third element is the **opinion holder**, or entity, expressing the opinion.

Sentiment analysis has many practical applications. In social media monitoring, we want to know how people are talking about a brand. We can also find sentiment on forums, blogs, and the news. Most brands analyze all of these sources to enrich their understanding of how customers interact with their brand, what they are happy or unhappy about, and what matters most to consumers. Sentiment analysis is thus very important in brand monitoring, and in fields such as customer and product analytics and market research and analysis.

## **Sentiment analysis types and approaches**

Sentiment analysis tasks can be carried out at different levels of granularity. 
The first is the document level, where we look at a whole text, for example the entire review of a product.

Second is the sentence level. This refers to determining whether the opinion expressed in each sentence is positive, negative, or neutral. 

The last level of granularity is the aspect level. Here we determine the opinion expressed about each individual feature (aspect) of a product.

Imagine a sentence such as "The camera in this phone is pretty good but the battery life is disappointing." It expresses both positive and negative opinions about a phone and we might want to be able to say which features of the product clients like and which they don't.
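As a rough illustration of aspect-level analysis, the sketch below splits the example sentence into clauses and attaches a score to each aspect it finds. The word scores in `LEXICON`, the `ASPECTS` list, and the clause-splitting rule are all assumptions made up for this example; a real system would need far more robust parsing.

```python
import re

# Hypothetical toy resources, invented for illustration only.
LEXICON = {"good": 1, "great": 2, "disappointing": -2}
ASPECTS = ["camera", "battery"]


def aspect_sentiment(sentence):
    """Score each known aspect using the lexicon words found in its clause."""
    scores = {}
    # Naive clause split on "but", commas, and semicolons.
    clauses = re.split(r"\bbut\b|[,;]", sentence.lower())
    for clause in clauses:
        words = re.findall(r"[a-z']+", clause)
        clause_score = sum(LEXICON.get(word, 0) for word in words)
        for aspect in ASPECTS:
            if aspect in words:
                scores[aspect] = scores.get(aspect, 0) + clause_score
    return scores


review = "The camera in this phone is pretty good but the battery life is disappointing."
result = aspect_sentiment(review)
print(result)  # {'camera': 1, 'battery': -2}
```

A document-level or sentence-level score would collapse these two values into one number, which is exactly the information an aspect-level analysis tries to keep apart.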

The algorithms used for sentiment analysis could be split into 2 main categories. 

- The first is **rule or lexicon based**. Such methods most commonly have a predefined list of words with a valence score. For example, nice could be +2, good +1, terrible -3, and so on. The algorithm then matches the words from the lexicon to the words in the text and either sums or averages the scores in some way. 

As an example, let's take the sentence, 'Today was a good day.' Each word gets a score (words that are not in the lexicon count as 0), and to get the total valence we sum the word scores. In this case, 'good' contributes +1, so we have a positive sentence.

- A second category is automated systems, which are based on machine learning. 

The task is usually modeled as a classification problem where using some historical data with known sentiment, we need to predict the sentiment of a new piece of text.
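The rule-based approach described above can be sketched in a few lines. This is a minimal illustration, assuming a tiny lexicon that contains only the three example words and scores mentioned earlier; the function name `valence` and the choice to average over matched words only are assumptions of this sketch.

```python
import re

# Toy lexicon using the example scores from the text; not a real published lexicon.
VALENCE = {"nice": 2, "good": 1, "terrible": -3}


def valence(text, average=False):
    """Sum (or average) the lexicon scores of the words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    # Only words present in the lexicon contribute; all others count as 0.
    matched = [VALENCE[word] for word in words if word in VALENCE]
    if not matched:
        return 0
    total = sum(matched)
    return total / len(matched) if average else total


total = valence("Today was a good day")
print(total)  # 1
```

Summing rewards longer texts with many opinion words, while averaging keeps scores comparable across texts of different lengths.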

In [None]:
text = "Today was a good day"

from textblob import TextBlob

my_valence = TextBlob(text)
my_valence.sentiment

# Sentiment(polarity=0.7, subjectivity=0.60)

## **Valence of a sentence**

We can calculate the valence score of a text using Python's textblob library. We continue working with our 'Today was a good day' string. We import the TextBlob class from the textblob package and pass our string to it. The resulting TextBlob object exposes several natural language processing features.

We are interested in its sentiment, so we access the sentiment property of our TextBlob. It returns a named tuple with two elements. The first is the polarity, measured on a scale from -1.0 to 1.0, where -1.0 is very negative, 0 is neutral, and +1.0 is very positive.

Our example 'Today was a good day' carries positive emotion and thus has a positive **polarity** score: 0.7. The second element in the tuple is the **subjectivity**, measured on a scale from 0.0 to 1.0, where 0.0 is very objective and 1.0 is very subjective. So our example is rather positive and subjective.
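If a categorical label is needed rather than a number, the polarity score can be mapped onto the positive, neutral, and negative classes. The sketch below is one possible way to do this; the function `label_polarity` and the cut-off of 0.1 are assumptions chosen for illustration and are not part of textblob.

```python
def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    The threshold is a hypothetical cut-off: scores within
    [-threshold, threshold] are treated as neutral.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must be between -1.0 and 1.0")
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


label = label_polarity(0.7)  # the polarity of 'Today was a good day'
print(label)  # positive
```

In practice the threshold would be tuned on the data at hand, since what counts as neutral depends on the domain.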

### **Automated or rule-based?**

A machine learning approach to sentiment analysis relies on having labeled historical data, whereas lexicon-based methods rely on having manually created rules or dictionaries.

Lexicon-based methods fail at certain tasks because the polarity of a word can change with the problem domain, and a predefined dictionary will not reflect that change.

However, lexicon-based approaches can be quite fast, whereas machine learning models might take a while to train. At the same time, machine learning models can be quite powerful. Neither approach is a clear winner, and many people find that a hybrid approach tends to work best, especially in complex scenarios.
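To make the machine learning side of this comparison concrete, here is a minimal sketch of sentiment analysis as a classification problem. It implements a small multinomial Naive Bayes classifier with add-one smoothing using only the standard library. The four labeled training sentences are invented for illustration, and Naive Bayes is just one possible choice of model.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical labeled "historical" data, invented for illustration.
TRAIN = [
    ("what a good and nice day", "positive"),
    ("the camera is great", "positive"),
    ("a terrible experience", "negative"),
    ("the battery life is disappointing", "negative"),
]


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count documents and words per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def predict(text, model):
    """Return the label with the highest log posterior score."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior plus log likelihoods with add-one (Laplace) smoothing.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen during training
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
prediction = predict("Today was a good day", model)
print(prediction)  # positive
```

Unlike a fixed lexicon, the word weights here come entirely from the labeled examples, so retraining on data from another domain changes how each word is scored.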

# **Word Cloud**

A word cloud is an image composed of words with different sizes and colors. Word clouds can be especially useful in sentiment analysis.

Word clouds (also called tag clouds) are used across different contexts. In the most common type of word clouds the size of the text corresponds to the frequency of the word. The more frequent a word is, the bigger and bolder it will appear on the word cloud.
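The frequency counts that drive word size can be sketched with the standard library alone. The snippet below is only an illustration of the counting step, not of how the wordcloud package works internally; the `stop_words` set is a hypothetical, hand-picked list.

```python
import re
from collections import Counter

text = "It was the best of times, it was the worst of times"

# Hypothetical stop word list, for illustration only.
stop_words = {"it", "was", "the", "of"}

# Lowercase the text and count how often each word occurs.
words = re.findall(r"[a-z']+", text.lower())
frequencies = Counter(words)

# Dropping very common words lets the informative ones stand out.
filtered = Counter(word for word in words if word not in stop_words)

print(frequencies.most_common(3))
print(filtered.most_common(3))
```

In a word cloud built from the filtered counts, "times" would be drawn larger than "best" and "worst" because it occurs twice.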

Why are word clouds so popular? First of all, they can reveal the essential. In the word cloud we saw, the word Titanic really popped out. Second, unless told otherwise, they will plot all the words in a text, and a quick scan of the image can provide an overall sense of the text. Last but not least, they are easy to understand and quite fun.

However, they have their drawbacks. First, all the words plotted on the cloud might seem unrelated, and it can be difficult to draw a conclusion from a crowded word cloud. Second, if the text we work with is large, a word cloud might require quite a lot of preprocessing steps before it appears sensible and uncluttered.

We can use the WordCloud class from the wordcloud package. We will have to import matplotlib.pyplot as well, which we use to display the generated image.

In [1]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt 

two_cities = (
    "It was the best of times, it was the worst of times, "
    "it was the age of wisdom, it was the age of foolishness, "
    "it was the epoch of belief, it was the epoch of incredulity, "
    "it was the season of Light, it was the season of Darkness, "
    "it was the spring of hope, it was the winter of despair, "
    "we had everything before us, we had nothing before us, "
    "we were all going direct to Heaven, we were all going "
    "direct the other way – in short, the period was so far "
    "like the present period, that some of its noisiest "
    "authorities insisted on its being received, for good "
    "or for evil, in the superlative degree of comparison only."
)


In [None]:
cloud_two_cities = WordCloud().generate(two_cities)
plt.imshow(cloud_two_cities, interpolation='bilinear')
plt.axis('off')
plt.show()

In [None]:
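# Import the WordCloud class and matplotlib for display
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# NOTE: `descriptions` (a single string of text) and `my_stopwords`
# (a set of words to exclude) are assumed to be defined earlier.

# Create and generate a word cloud image
my_cloud = WordCloud(background_color='white', stopwords=my_stopwords).generate(descriptions)

# Display the generated word cloud image
plt.imshow(my_cloud, interpolation='bilinear')
plt.axis("off")

# Don't forget to show the final image
plt.show()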