# Task 6 - Twitter Sentiment Classification

*by Lukas Dötlinger*

In [1]:
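```python
import nltk
import pandas as pd
from nltk.sentiment import SentimentIntensityAnalyzer

# VADER needs its lexicon; this is a no-op if it is already downloaded
nltk.download('vader_lexicon')

# Sentiment140 columns: polarity (0 = negative, 4 = positive), id, date, query flag, user, text
columns = ['target', 'ids', 'date', 'flag', 'user', 'text']
# The file has no header row and is not valid UTF-8, so read it as Latin-1
data = pd.read_csv('res/training.1600000.processed.noemoticon.csv',
                   names=columns, encoding='latin-1')

negative_tweets = data[data['target'] == 0]['text'].tolist()
positive_tweets = data[data['target'] == 4]['text'].tolist()
```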


We split the tweets into two groups based on their label in the dataset (`target` 0 for negative, 4 for positive). We can then use an existing classifier such as NLTK's `SentimentIntensityAnalyzer` (VADER) to score each text. To get baseline values we use the `compute_results()` function below, which relies on the compound polarity score. This score lies between -1 and 1; a value greater than 0 indicates positive sentiment and a value less than 0 negative sentiment. Note that `compute_results()` only counts a tweet as misclassified if its score has the wrong sign, so a score of exactly 0 is counted as correct for both classes.
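The decision rule used by `compute_results()` can be isolated in a small pure-Python sketch. `count_errors` is a hypothetical helper that takes precomputed compound scores instead of calling VADER, so it runs without NLTK; it shows that a score of exactly 0 is never counted as an error.

```python
def count_errors(neg_scores, pos_scores):
    """Count misclassifications the same way compute_results() does.

    neg_scores / pos_scores are compound scores of tweets labelled
    negative / positive. A tweet only counts as an error if the sign of its
    score is wrong, so a score of exactly 0 is accepted for both classes.
    """
    n_errors = sum(1 for s in neg_scores if s > 0)
    p_errors = sum(1 for s in pos_scores if s < 0)
    return n_errors, p_errors


# Hypothetical scores, not taken from the dataset
neg_scores = [-0.6, 0.0, 0.4]
pos_scores = [0.8, 0.0, -0.2]
print(count_errors(neg_scores, pos_scores))  # (1, 1)
```

If neutral scores should count as errors instead, the comparisons can be changed to `s >= 0` and `s <= 0`. The VADER documentation also suggests treating compound scores between -0.05 and 0.05 as neutral.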

In [2]:
```python
def compute_results(n_tweets, p_tweets):
    sia = SentimentIntensityAnalyzer()

    # A negative tweet is misclassified if VADER's compound score is positive
    n_count = len(n_tweets)
    n_errors = 0
    for n_t in n_tweets:
        if sia.polarity_scores(n_t)['compound'] > 0:
            n_errors += 1

    # A positive tweet is misclassified if the compound score is negative.
    # Note: a score of exactly 0 is counted as correct for both classes.
    p_count = len(p_tweets)
    p_errors = 0
    for p_t in p_tweets:
        if sia.polarity_scores(p_t)['compound'] < 0:
            p_errors += 1

    # Confusion matrix with "positive" as the positive class
    tp = p_count - p_errors
    fn = p_errors
    fp = n_errors
    tn = n_count - n_errors

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)

    print(f'Precision: {precision}')
    print(f'Recall: {recall}')
    print(f'F1: {(2 * precision * recall) / (precision + recall)}')
    print(f'Accuracy: {(tp + tn) / (tp + fp + tn + fn)}')

compute_results(negative_tweets, positive_tweets)
```

```
Precision: 0.7560344827586207
Recall: 0.877
F1: 0.812037037037037
Accuracy: 0.797
```
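As a worked example of the formulas in `compute_results()`, the sketch below computes the metrics from raw confusion-matrix counts. The counts are hypothetical, chosen for 1000 tweets per class so that they reproduce the values printed above; the notebook itself does not report them.

```python
def metrics(tp, fp, tn, fn):
    """Precision, recall, F1 and accuracy with 'positive' as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, f1, accuracy


# Hypothetical counts: 1000 positive tweets (877 correct, 123 wrong)
# and 1000 negative tweets (717 correct, 283 scored as positive)
p, r, f1, acc = metrics(tp=877, fp=283, tn=717, fn=123)
print(f'{p:.4f} {r:.4f} {f1:.4f} {acc:.4f}')  # 0.7560 0.8770 0.8120 0.7970
```

Equivalently, F1 can be written as 2·tp / (2·tp + fp + fn), which here is 1754 / 2160.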


The results show that the sentiment classifier already performs well out of the box, with an accuracy of about 80 percent (0.797). We can try to improve this further by preprocessing the tweets. I tried four variants: keeping only lowercased alphabetic tokens, removing stopwords, lemmatizing, and combining lowercasing, stopword removal and lemmatization.
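The variants differ only in which token-level steps are applied. A minimal sketch of composing such steps, assuming `str.split` instead of NLTK's `word_tokenize` and a tiny hypothetical stop-word set instead of NLTK's list, so that it runs without any downloads (lemmatization is left out because it needs WordNet):

```python
STOP_WORDS = {'i', 'is', 'the', 'not', 'no'}  # hypothetical subset of a stop-word list


def lowercase(tokens):
    return [t.lower() for t in tokens]


def keep_alpha(tokens):
    # Drops any token containing non-letters, e.g. "day!" or "2nite"
    return [t for t in tokens if t.isalpha()]


def drop_stopwords(tokens):
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def preprocess(text, steps):
    """Apply each token-level step in order and join the tokens again."""
    tokens = text.split()
    for step in steps:
        tokens = step(tokens)
    return ' '.join(tokens)


tweet = 'The day is NOT good !'
print(preprocess(tweet, [lowercase, keep_alpha]))                  # the day is not good
print(preprocess(tweet, [drop_stopwords]))                         # day good !
print(preprocess(tweet, [lowercase, keep_alpha, drop_stopwords]))  # day good
```

The last line shows that combining the steps also removes the negation, so a negative statement ends up looking positive.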

In [3]:
```python
from nltk.tokenize import word_tokenize

def filter_tweets(t):
    # Keep only purely alphabetic tokens, lowercased; punctuation, numbers
    # and tokens mixing letters with other characters are dropped
    return ' '.join([w.lower() for w in word_tokenize(t) if w.isalpha()])

def lowercase_alpha(tweets):
    return list(map(filter_tweets, tweets))

la_negative = lowercase_alpha(negative_tweets)
la_positive = lowercase_alpha(positive_tweets)
compute_results(la_negative, la_positive)
```

```
Precision: 0.7469982847341338
Recall: 0.871
F1: 0.804247460757156
Accuracy: 0.788
```


In [4]:
```python
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

# NLTK's English list is lowercase and also contains negations such as "not" and "no"
stop_words = set(stopwords.words('english'))

def remove_stopwords(t):
    # The check is case-sensitive, so capitalized stop words like "The" are kept
    return ' '.join([w for w in word_tokenize(t) if w not in stop_words])

def no_stopwords(tweets):
    return list(map(remove_stopwords, tweets))

nos_negative = no_stopwords(negative_tweets)
nos_positive = no_stopwords(positive_tweets)
compute_results(nos_negative, nos_positive)
```

```
Precision: 0.741306191687871
Recall: 0.874
F1: 0.8022028453419
Accuracy: 0.7845
```


In [5]:
```python
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

def lemmatize_tweet(t):
    # Without a POS tag, lemmatize() treats every token as a noun, so plural
    # nouns are reduced while verb forms such as "running" usually stay unchanged
    return ' '.join([lemmatizer.lemmatize(w) for w in word_tokenize(t)])

def lemmatized(tweets):
    return list(map(lemmatize_tweet, tweets))

lem_negative = lemmatized(negative_tweets)
lem_positive = lemmatized(positive_tweets)
compute_results(lem_negative, lem_positive)
```

```
Precision: 0.7425658453695837
Recall: 0.874
F1: 0.802939825447864
Accuracy: 0.7855
```


In [7]:
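```python
def clean(t):
    # Lowercase, drop stop words and lemmatize in a single pass.
    # Note: the stop-word check runs on the original token, so capitalized
    # stop words (e.g. "The") are kept, and non-alphabetic tokens are not filtered.
    return ' '.join([lemmatizer.lemmatize(w.lower())
                     for w in word_tokenize(t)
                     if w not in stop_words])

def combined(tweets):
    return list(map(clean, tweets))

clean_negative = combined(negative_tweets)
clean_positive = combined(positive_tweets)
compute_results(clean_negative, clean_positive)
```

```
Precision: 0.7208192573056208
Recall: 0.901225
F1: 0.8009896551704984
Accuracy: 0.776085625
```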


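As the results show, none of the preprocessing steps improves the classification. A plausible explanation is that VADER is a rule- and lexicon-based model designed specifically for social media text, so it already handles raw tweets well. Some of its rules also rely on exactly the information the preprocessing removes: it boosts the intensity of words written in ALL CAPS and of exclamation marks, and it inverts the polarity of words that follow a negation such as "not" or "no", both of which are in NLTK's English stopword list. Combining the approaches even gives the lowest accuracy of all. However, precision drops by only about 3.5 percentage points (from 0.756 to 0.721) while recall actually rises, which shows that the changes have little overall impact. Note also that the metrics of the combined run have more decimal places (e.g. a recall of 0.901225), which suggests it was evaluated on the full dataset rather than on the smaller sample the earlier figures appear to come from, so that comparison is only approximate.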
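To illustrate why lowercasing and stop-word removal can hurt a rule-based scorer, here is a deliberately tiny toy scorer. It is not VADER: the lexicon, the one-word negation window and the capitalization boost are hypothetical simplifications of the kinds of rules VADER applies, and VADER's actual weights and windows differ.

```python
LEXICON = {'good': 2.0, 'bad': -2.0, 'love': 3.0}  # hypothetical word valences
NEGATIONS = {'not', 'no', 'never'}
CAPS_BOOST = 0.5  # hypothetical extra intensity for words written in ALL CAPS


def toy_score(text):
    """Sum word valences; boost ALL-CAPS words and flip a word preceded by a negation."""
    tokens = text.split()
    total = 0.0
    for i, tok in enumerate(tokens):
        valence = LEXICON.get(tok.lower(), 0.0)
        if valence == 0.0:
            continue
        if tok.isupper():
            valence += CAPS_BOOST if valence > 0 else -CAPS_BOOST
        if i > 0 and tokens[i - 1].lower() in NEGATIONS:
            valence = -valence
        total += valence
    return total


print(toy_score('this is not good'))  # -2.0
print(toy_score('this is good'))      # 2.0, the same tweet after dropping the stop word "not"
print(toy_score('I LOVE it'))         # 3.5
print(toy_score('i love it'))         # 3.0, the same tweet after lowercasing
```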