# Sentiment Analysis

Sentiment analysis is the process of identifying and extracting subjective information from text data, which can include opinions, attitudes, emotions, and other similar aspects of the writer's experience. The goal of sentiment analysis is to determine the overall sentiment of a piece of text, whether it is positive, negative, or neutral, and to identify the specific aspects of the text that contribute to that sentiment.

There are several ways to perform sentiment analysis, some of which include:

- Rule-based approach: This involves the use of a set of pre-defined rules to identify and classify sentiment in text. For example, a rule-based system might identify the presence of certain words or phrases that are indicative of positive or negative sentiment, and use those to assign a sentiment score to the text.

- Machine learning approach: This involves training a machine learning algorithm on a set of labeled data, where each piece of text is associated with a sentiment label (positive, negative, or neutral). The algorithm learns to identify patterns in the data that are indicative of each sentiment label and can then be used to classify new, unlabeled text data.

- Hybrid approach: This combines the rule-based and machine learning approaches, using a set of pre-defined rules to identify sentiment in text and then using a machine learning algorithm to refine and improve the sentiment analysis.

- Lexicon-based approach: This involves the use of sentiment lexicons or dictionaries, which are pre-built lists of words and phrases that are associated with positive or negative sentiment. The sentiment of a given text is then determined by finding the sentiment words it contains and combining their polarities, for example by summing or averaging them.

- Deep learning approach: This involves the use of neural networks to learn and classify sentiment in text. Deep learning models can process large amounts of text data and identify complex patterns that may be difficult to identify using other methods.

Overall, sentiment analysis can be performed using a variety of methods, each with its own strengths and weaknesses. The choice of approach will depend on the specific needs of the application and the resources available for implementation.
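To make the lexicon-based approach described above more concrete, here is a minimal sketch in plain Python. The lexicon, the negation list, and the function name `lexicon_score` are all hypothetical and chosen for illustration; real lexicons contain thousands of entries and real systems handle negation far more carefully.

```python
import re

# Hypothetical toy lexicon (word -> polarity weight); not taken from any real resource.
LEXICON = {
    "love": 1.0, "great": 1.0, "good": 0.5,
    "bad": -0.5, "hate": -1.0, "terrible": -1.0,
}
# Hypothetical negation words that flip the polarity of the next word.
NEGATIONS = {"not", "no", "never"}


def lexicon_score(text):
    """Return the average polarity of the lexicon words found in `text`.

    A lexicon word directly preceded by a negation has its polarity flipped.
    Texts without any lexicon word score 0.0 (treated as neutral).
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    weights = []
    for i, token in enumerate(tokens):
        if token in LEXICON:
            weight = LEXICON[token]
            if i > 0 and tokens[i - 1] in NEGATIONS:
                weight = -weight
            weights.append(weight)
    return sum(weights) / len(weights) if weights else 0.0


print(lexicon_score("I love pizza"))      # 1.0
print(lexicon_score("This is not good"))  # -0.5
print(lexicon_score("The sky is blue"))   # 0.0
```

The sketch also shows the main weakness of this family of methods: any word missing from the lexicon contributes nothing to the score.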

Python provides several libraries for performing sentiment analysis. Some popular libraries and tools for sentiment analysis in Python include:

- TextBlob: TextBlob is a Python library that provides a simple API for common natural language processing (NLP) tasks such as sentiment analysis, part-of-speech tagging, and noun phrase extraction.

- NLTK: The Natural Language Toolkit (NLTK) is a widely used Python library for NLP. It provides various tools and methods for text processing, including sentiment analysis.

- VADER: VADER (Valence Aware Dictionary and sEntiment Reasoner), available as the Python package `vaderSentiment`, is specifically designed for sentiment analysis of social media text. It combines a sentiment lexicon with rules and can handle emoticons and slang.

- Scikit-learn: Scikit-learn is a machine learning library for Python that can be used for various NLP tasks including sentiment analysis. It provides various algorithms for text classification such as Naive Bayes, Support Vector Machines, and Logistic Regression.

- Hugging Face Transformers: The Transformers library from Hugging Face works on top of deep learning frameworks such as PyTorch and provides pre-trained models that can be used for tasks like sentiment analysis, such as BERT (Bidirectional Encoder Representations from Transformers), RoBERTa (a robustly optimized variant of BERT), and DistilBERT, among others. These models can be fine-tuned on specific datasets to improve their performance for sentiment analysis in specific domains or languages.
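As a rough sketch of the machine learning approach with scikit-learn from the list above, the example below trains a logistic regression classifier on TF-IDF features. The eight training sentences and their labels are made up for illustration; a usable model needs far more labeled data, and the choice of `TfidfVectorizer` with `LogisticRegression` is just one reasonable combination among several.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up toy training data; far too small for a real classifier.
train_texts = [
    "I love this movie",
    "What a great day",
    "This is wonderful news",
    "Such a good experience",
    "I hate this movie",
    "What an awful day",
    "This is terrible news",
    "Such a bad experience",
]
train_labels = ["positive"] * 4 + ["negative"] * 4

# The vectorizer turns text into TF-IDF features; the classifier learns from them.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# Classify new, unlabeled text.
predictions = model.predict(["a great movie", "an awful experience"])
print(list(predictions))
```

Wrapping both steps in one pipeline means the same vectorizer fitted on the training data is applied automatically at prediction time.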

To perform sentiment analysis using these libraries, you will typically first need to preprocess your text data, for example with tokenization, stop word removal, and stemming. How much preprocessing is needed depends on the library. Then, you can use the sentiment analysis functions or methods provided by the library of your choice to obtain a sentiment score or label for your text data.
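The three preprocessing steps named above can be sketched in plain Python as follows. The stop word list, the suffix list, and the helper names `naive_stem` and `preprocess` are hypothetical and deliberately crude; in practice you would more likely use the tokenizers, stop word lists, and stemmers that ship with a library such as NLTK.

```python
import re

# Hypothetical, deliberately tiny stop word list.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were",
              "i", "you", "it", "and", "of", "to"}
# Suffixes tried in this order by the toy stemmer below.
SUFFIXES = ("ing", "ed", "ly", "s")


def naive_stem(word):
    """Strip at most one common suffix, keeping a stem of at least 3 letters.

    This is a crude stand-in for a real stemming algorithm.
    """
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, remove stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())         # tokenization
    kept = [t for t in tokens if t not in STOP_WORDS]    # stop word removal
    return [naive_stem(t) for t in kept]                 # stemming


print(preprocess("The cats were chasing the dogs"))  # ['cat', 'chas', 'dog']
```

Note that `chasing` becomes `chas` rather than `chase`: stemmers produce truncated stems, not dictionary words, which is usually fine when the stems are only used as features.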

For example, using TextBlob, you can perform sentiment analysis on a sentence as follows:

In [None]:
from textblob import TextBlob

text = "I love pizza"
blob = TextBlob(text)
sentiment_score = blob.sentiment.polarity
print(sentiment_score)

In [None]:
help(TextBlob.sentiment)

The first cell prints the polarity of the given sentence as a floating-point value between -1 (negative) and 1 (positive). In this case the value is positive, indicating positive sentiment. The second cell shows the documentation of the `sentiment` property.
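If you need a label rather than a number, the polarity can be mapped onto positive, negative, or neutral with a simple cutoff. The function name `polarity_to_label` and the threshold of 0.1 are assumptions made for this sketch, not part of TextBlob; a suitable threshold depends on your data.

```python
def polarity_to_label(polarity, threshold=0.1):
    """Map a polarity in [-1, 1] to a label.

    Values within `threshold` of zero are treated as neutral.
    The default threshold is an arbitrary illustrative choice.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


print(polarity_to_label(0.5))    # positive
print(polarity_to_label(-0.5))   # negative
print(polarity_to_label(0.05))   # neutral
```

With TextBlob this could be called as `polarity_to_label(blob.sentiment.polarity)`.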

Transformers, developed by Hugging Face, is a popular open-source library for natural language processing (NLP) tasks that provides easy-to-use interfaces to pre-trained transformer models, such as BERT and RoBERTa. These pre-trained models can be fine-tuned on specific NLP tasks, such as sentiment analysis, with relatively little code.

The `pipeline` function is a simple and convenient way to perform a wide range of NLP tasks, including sentiment analysis, using pre-trained transformer models. A pipeline lets you perform these tasks without fine-tuning a model or writing complex code. If no model is specified, a default pre-trained model for the task is downloaded the first time the pipeline is created.

`pip install -q transformers`

In [None]:
from transformers import pipeline
sentiment_pipeline = pipeline("sentiment-analysis")
data = ["I love you", "I hate you"]
sentiment_pipeline(data)
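The pipeline returns one dictionary per input text, with a `label` and a `score` giving the model's confidence in that label. To compare texts on a single scale, it can be handy to fold the two into one signed number. The sketch below assumes the labels are `POSITIVE` and `NEGATIVE`, as with the default English sentiment model; the helper name `signed_score` is hypothetical, and the example dictionaries are hand-written placeholders in the pipeline's output shape, not actual model output.

```python
def signed_score(result):
    """Turn {'label': ..., 'score': ...} into a value in [-1, 1].

    Assumes a two-class model whose labels are 'POSITIVE' and 'NEGATIVE';
    any label other than 'POSITIVE' is treated as negative.
    """
    sign = 1.0 if result["label"].upper() == "POSITIVE" else -1.0
    return sign * result["score"]


# Hand-written placeholders, NOT real model output.
example_results = [
    {"label": "POSITIVE", "score": 0.9},
    {"label": "NEGATIVE", "score": 0.8},
]
print([signed_score(r) for r in example_results])  # [0.9, -0.8]
```

Models with other label sets, for example ones that include a neutral class, would need a different mapping.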

Let's load some data.

The file `tweets.csv` contains tweets from Hillary Clinton and Donald Trump from the 2016 US presidential election.

In [None]:
import pandas as pd
df = pd.read_csv('data/tweets.csv')

In [None]:
df.info()

Transformer models can be computationally heavy, so we will only run the pipeline on a random sample of 100 tweets.

In [None]:
df_sample = df.sample(100, ignore_index=True)

In [None]:
df_sample.head()

We'll use the sentiment pipeline to perform sentiment analysis on the sampled tweets.

In [None]:
sent = pd.DataFrame(sentiment_pipeline(list(df_sample.text)))

In [None]:
sent.head()

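We then combine the sampled tweets and the predictions column by column. This works because both DataFrames share the same index (0 to 99): the sample was drawn with `ignore_index=True`, and the predictions were built from a plain list.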

In [None]:
final = pd.concat([df_sample, sent], axis=1)
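Because `pd.concat(..., axis=1)` aligns rows by index label, the column-wise combination only lines up when both DataFrames carry the same index. The toy example below illustrates what would happen without resetting the index. The data and variable names are made up, and `iloc` stands in for `sample` so that the result is deterministic.

```python
import pandas as pd

# Toy stand-ins for the tweets and the pipeline predictions.
tweets = pd.DataFrame({"text": ["t0", "t1", "t2", "t3"]})
scores = pd.DataFrame({"label": ["POSITIVE", "NEGATIVE"]})  # index 0, 1

# Pretend rows 2 and 3 were sampled; their original index labels are kept.
kept_index = tweets.iloc[[2, 3]]
# Same rows, relabelled 0, 1 (what ignore_index=True does in sample).
reset_index = kept_index.reset_index(drop=True)

misaligned = pd.concat([kept_index, scores], axis=1)
aligned = pd.concat([reset_index, scores], axis=1)

print(misaligned)  # 4 rows, with NaN where the indexes do not match
print(aligned)     # 2 rows, each text next to its label
```

If you ever sample without `ignore_index=True`, calling `reset_index(drop=True)` before concatenating gives the same alignment.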

In [None]:
final[['text','label', 'score']].head(10)

In [None]:
final.text[0]

In [None]:
final.text[7]

Return to [overview](../00_overview.ipynb)