Student Name: SYAM IMMANUEL PAUL BONDADA

Student Number: 230853737

# NLP Assignment 1 (40% of grade): Sentiment Analysis from Tweets

This coursework will involve you implementing functions for a text classifier, which you will train to identify the **sentiment expressed in a text** in a dataset of approx. 27,000 entries, which will be split into a 80%/20% training/test split. 

In this template you are given the basis for that implementation, though some of the functions are missing, which you have to fill in.

Follow the instructions file **NLP_Assignment_1_Instructions.pdf** for details of each question - the outline of what needs to be achieved for each question is as below.

You must submit all **ipython notebooks and extra resources you need to run the code if you've added them** in the code submission, and a **2 page report (pdf)** in the report submission on QMPlus where you report your methods and findings according to the instructions file for each question.

In [1]:
import warnings
warnings.filterwarnings('ignore')
import csv                               # csv reader
from sklearn.svm import LinearSVC
from nltk.classify import SklearnClassifier
from sklearn.pipeline import Pipeline
from sklearn.metrics import precision_recall_fscore_support # to report on precision and recall
from sklearn.metrics import classification_report
from sklearn.model_selection import StratifiedKFold
import numpy as np

In [2]:
def load_data(path):
    """Load data from a tab-separated file and append it to raw_data."""
    with open(path) as f:
        reader = csv.reader(f, delimiter='\t')
        for line in reader:
            if line[0] == "Id":  # skip header
                continue
            (label, text) = parse_data_line(line)
            raw_data.append((text, label))

def split_and_preprocess_data(percentage):
    """Split the data between train_data and test_data according to the percentage
    and performs the preprocessing."""
    num_samples = len(raw_data)
    num_training_samples = int((percentage * num_samples))
    for (text, label) in raw_data[:num_training_samples]:
        train_data.append((to_feature_vector(pre_process(text)),label))
    for (text, label) in raw_data[num_training_samples:]:
        test_data.append((to_feature_vector(pre_process(text)),label))

# Question 1: Input and Basic preprocessing (10 marks)

The parse_data_line function extracts label and text information from a data line, preparing it for further processing. It uses regular expressions to separate punctuation at the beginning and end of words, ensuring punctuation marks are treated as separate tokens during tokenization. The function returns a tuple containing the label and text.

The pre_process function further prepares text for analysis. It utilizes regular expressions to separate punctuation, tokenizes the text using whitespace, and normalizes tokens to lowercase. The result is a list of pre-processed tokens ready for natural language processing tasks.


In [3]:
def parse_data_line(data_line):
    # Should return a tuple of the label as just FAKE or REAL and the statement
    # e.g. (label, statement)
    label = data_line[1]
    text = data_line[2]
    pre_process(text)
    return label, text

In [4]:
import re
# Input: a string of one statement
def pre_process(text):
    # Should return a list of tokens
    # DESCRIBE YOUR METHOD IN WORDS
    text = re.sub(r"(\w)([.,;:!?'\"”\)])", r"\1 \2", text) # separates punctuation at ends of strings
    text = re.sub(r"([.,;:!?'\"“\(\)])(\w)", r"\1 \2", text) # separates punctuation at beginning of strings
    # print("tokenising:", text) # uncomment for debugging
    tokens = re.split(r"\s+",text)
    # normalisation - only by lower casing for now
    tokens = [t.lower() for t in tokens]
    return tokens

# Question 2: Basic Feature Extraction (20 marks)

The function initializes an empty dictionary called feature_vector that will store the features (tokens) and their corresponding weights (occurrence counts). It then iterates through each token in the input list of tokens. For each token, it updates the feature_vector by counting the occurrences of that token. The get method is used to retrieve the current count of the token. If the token is not in the dictionary, it defaults to 0, and 1 is added to it. 


Additionally, the function updates a global feature dictionary (global_feature_dict) with the counts of each token across all feature vectors. This dictionary is global, meaning it accumulates counts across different calls to the to_feature_vector function.


In [5]:
global_feature_dict = {} # A global dictionary of features

def to_feature_vector(tokens):
    # Should return a dictionary containing features as keys, and weights as values
    # DESCRIBE YOUR METHOD IN WORDS
    feature_vector = {}

    for token in tokens:
        # Update the feature vector by counting the occurrences of each token
        feature_vector[token] = feature_vector.get(token, 0) + 1

        # Update the global feature dictionary
        global_feature_dict[token] = global_feature_dict.get(token, 0) + 1
   
    return feature_vector

The function begins by printing a message to indicate that the training of the classifier is starting.

It creates a pipeline using the Pipeline class from scikit-learn. The pipeline is a way to streamline a lot of the routine processes, and in this case, it consists of a single step, which is a Linear Support Vector Classifier (LinearSVC).

In [6]:
# TRAINING AND VALIDATING OUR CLASSIFIER

def train_classifier(data):
    print("Training Classifier...")
    pipeline =  Pipeline([('svc', LinearSVC())])
    return SklearnClassifier(pipeline).train(data)

# Question 3: Cross-validation (20 marks)

The cross_validate function is designed for k-fold cross-validation, a common technique in machine learning for assessing the performance of a model on a dataset. The purpose of cross-validation is to provide a more robust estimate of a model's performance by training and evaluating the model multiple times on different subsets of the data. Here's a detailed report on what the cross_validate function is doing:

1. Input Parameters:
dataset: The input dataset, which is assumed to be a list of samples where each sample is a tuple containing features and labels.
folds: The number of folds for cross-validation, determining how many subsets the dataset is divided into for training and testing.
2. Initialization:
results: A dictionary to store performance metrics (precision, recall, F1-score, and accuracy) for each fold.
fold_size: The size of each fold, calculated based on the length of the dataset and the specified number of folds.
3. Cross-Validation Loop:
The function uses a loop to iterate through each fold, selecting different subsets for training and testing in each iteration.
4. Train-Test Split:
For each iteration, the dataset is split into training and testing folds.
test_data_fold: The subset of the dataset used for testing the classifier.
train_data_fold: The remaining data used for training the classifier.
5. Classifier Training:
The classifier is trained using the train_classifier function on the training fold.
6. Classifier Testing:
The trained classifier is used to predict labels on the testing fold (test_samples).
The true labels (true_labels) are extracted from the testing fold.
7. Classifier Evaluation:
The performance of the classifier is evaluated using the classification_report function from scikit-learn, producing precision, recall, F1-score, and accuracy.
The results are stored in the fold_results dictionary.
8. Results Storage and Printing:
The fold_results dictionary, containing performance metrics for the current fold, is appended to the results dictionary.
A print statement displays the performance metrics for each fold.
9. Average Performance Calculation:
After all folds are processed, the function calculates the average performance metrics across all folds.
The average metrics are stored in the avg_results dictionary.
10. Return:
The function returns the avg_results dictionary containing the average precision, recall, F1-score, and accuracy across all folds.
Summary:
The cross_validate function facilitates the training and evaluation of a classifier using k-fold cross-validation. It provides a comprehensive assessment of the classifier's performance by reporting average metrics across all folds, helping to gauge the model's generalization ability on the given dataset.

In [7]:
from sklearn.metrics import confusion_matrix, classification_report
import numpy as np

def cross_validate(dataset, folds):
    results = {
        'precision': [],
        'recall': [],
        'f1-score': [],
        'accuracy': []
    }

    fold_size = int(len(dataset) / folds) + 1

    for i in range(0, len(dataset), fold_size):
        test_data_fold = dataset[i:i + fold_size]
        train_data_fold = dataset[:i] + dataset[i + fold_size:]

        # Train the classifier
        classifier = train_classifier(train_data_fold)

        # Test the classifier
        test_samples, true_labels = zip(*test_data_fold)
        predicted_labels = predict_labels(test_samples, classifier)

        # Evaluate the classifier
        report = classification_report(true_labels, predicted_labels, output_dict=True)

        # Store and print results for each fold
        fold_results = {
            'precision': report['weighted avg']['precision'],
            'recall': report['weighted avg']['recall'],
            'f1-score': report['weighted avg']['f1-score'],
            'accuracy': report['accuracy']
        }

        print(f"Fold {i}: {fold_results}")

        results['precision'].append(fold_results['precision'])
        results['recall'].append(fold_results['recall'])
        results['f1-score'].append(fold_results['f1-score'])
        results['accuracy'].append(fold_results['accuracy'])

        
    # Calculate average scores
    avg_results = {
        'precision': np.mean(results['precision']),
        'recall': np.mean(results['recall']),
        'f1-score': np.mean(results['f1-score']),
        'accuracy': np.mean(results['accuracy'])
    }

    return avg_results


In [8]:
# PREDICTING LABELS GIVEN A CLASSIFIER

def predict_labels(samples, classifier):
    """Assuming preprocessed samples, return their predicted labels from the classifier model."""
    return classifier.classify_many(samples)

def predict_label_from_raw(sample, classifier):
    """Assuming raw text, return its predicted label from the classifier model."""
    return classifier.classify(to_feature_vector(preProcess(reviewSample)))

In [9]:
# MAIN

# loading reviews
# initialize global lists that will be appended to by the methods below
raw_data = []          # the filtered data from the dataset file
train_data = []        # the pre-processed training data as a percentage of the total dataset
test_data = []         # the pre-processed test data as a percentage of the total dataset



# references to the data files
data_file_path = 'sentiment-dataset.tsv'

# Do the actual stuff (i.e. call the functions we've made)
# We parse the dataset and put it in a raw data list
print("Now %d rawData, %d trainData, %d testData" % (len(raw_data), len(train_data), len(test_data)),
      "Preparing the dataset...",sep='\n')

load_data(data_file_path) 

# We split the raw dataset into a set of training data and a set of test data (80/20)
# You do the cross validation on the 80% (training data)
# We print the number of training samples and the number of features before the split
print("Now %d rawData, %d trainData, %d testData" % (len(raw_data), len(train_data), len(test_data)),
      "Preparing training and test data...",sep='\n')

split_and_preprocess_data(0.8)

# We print the number of training samples and the number of features after the split
print("After split, %d rawData, %d trainData, %d testData" % (len(raw_data), len(train_data), len(test_data)),
      "Training Samples: ", len(train_data), "Features: ", len(global_feature_dict), sep='\n')




Now 0 rawData, 0 trainData, 0 testData
Preparing the dataset...
Now 33540 rawData, 0 trainData, 0 testData
Preparing training and test data...
After split, 33540 rawData, 26832 trainData, 6708 testData
Training Samples: 
26832
Features: 
64640


In [10]:
cross_validate(train_data, 10)  # will work and output overall performance of p, r, f-score when cv implemented

Training Classifier...
Fold 0: {'precision': 0.8556451869824183, 'recall': 0.8580476900149031, 'f1-score': 0.8561527728398788, 'accuracy': 0.8580476900149031}
Training Classifier...
Fold 2684: {'precision': 0.8537736021083886, 'recall': 0.8539493293591655, 'f1-score': 0.8538608827002553, 'accuracy': 0.8539493293591655}
Training Classifier...
Fold 5368: {'precision': 0.8092523177051866, 'recall': 0.8088673621460507, 'f1-score': 0.8090459372479645, 'accuracy': 0.8088673621460507}
Training Classifier...
Fold 8052: {'precision': 0.8455935833804838, 'recall': 0.8450074515648286, 'f1-score': 0.8449050295777984, 'accuracy': 0.8450074515648286}
Training Classifier...
Fold 10736: {'precision': 0.8457964548383669, 'recall': 0.8472429210134128, 'f1-score': 0.8462251370530897, 'accuracy': 0.8472429210134128}
Training Classifier...
Fold 13420: {'precision': 0.8661026134928785, 'recall': 0.8673621460506706, 'f1-score': 0.8661904275725697, 'accuracy': 0.8673621460506706}
Training Classifier...
Fold 1

{'precision': 0.8496823305968872,
 'recall': 0.8506290947406878,
 'f1-score': 0.8498381112413093,
 'accuracy': 0.8506290947406878}

# Question 4: Error Analysis (20 marks)

confusion_matrix_heatmap Function:
The confusion_matrix_heatmap function is designed to create a visually appealing and informative heatmap representation of a confusion matrix, a common evaluation metric in classification problems. Here's a detailed report on what this function does:

Confusion Matrix Calculation:
The metrics.confusion_matrix function from scikit-learn is used to calculate the confusion matrix based on the true labels (y_test) and predicted labels (preds).
The confusion matrix represents the counts of true positive, true negative, false positive, and false negative predictions.

Plotting the Heatmap:
A matplotlib figure is created with a specified size (figsize).
The confusion matrix is visualized as a heatmap using the matshow function, with the color intensity representing the count of each class combination.

cross_validate_with_error_analysis Function:
The cross_validate_with_error_analysis function combines the training, testing, and evaluation of a classifier in a k-fold cross-validation setup. Additionally, it performs error analysis by displaying a confusion matrix heatmap and printing false positives and false negatives for the positive label. Here's a detailed report on what this function does:

Error Analysis - False Positives and False Negatives:
False positives and false negatives for the positive label are identified and printed for each fold.

The confusion_matrix_heatmap function provides a clear visualization of the classifier's performance through a heatmap representation of the confusion matrix. The cross_validate_with_error_analysis function extends the analysis by incorporating k-fold cross-validation, providing a more robust assessment of the classifier's generalization performance and including error analysis for the positive label. Together, these functions support comprehensive evaluation and diagnostic analysis of a text classification model.


In [11]:
from sklearn import metrics
import matplotlib.pyplot as plt
# a function to make the confusion matrix readable and pretty
def confusion_matrix_heatmap(y_test, preds, labels):
    """Function to plot a confusion matrix"""
    cm = metrics.confusion_matrix(y_test, preds, labels=labels)
    fig = plt.figure(figsize=(10, 10))
    ax = fig.add_subplot(111)
    cax = ax.matshow(cm)
    plt.title('Confusion matrix of the classifier')
    fig.colorbar(cax)
    ax.set_xticks(np.arange(len(labels)))
    ax.set_yticks(np.arange(len(labels)))
    ax.set_xticklabels(labels, rotation=45)
    ax.set_yticklabels(labels)

    for i in range(len(cm)):
        for j in range(len(cm)):
            text = ax.text(j, i, cm[i, j],
                           ha="center", va="center", color="w")

    plt.xlabel('Predicted')
    plt.ylabel('True')

    # fix for mpl bug that cuts off top/bottom of seaborn viz:
    b, t = plt.ylim()  # discover the values for bottom and top
    b += 0.5  # Add 0.5 to the bottom
    t -= 0.5  # Subtract 0.5 from the top
    plt.ylim(b, t)  # update the ylim(bottom, top) values
    plt.show()  # ta-da!

In [12]:
from sklearn.metrics import confusion_matrix, classification_report
import numpy as np

def cross_validate_x(dataset, folds):
    results = {
        'precision': [],
        'recall': [],
        'f1-score': [],
        'accuracy': []
    }

    fold_size = int(len(dataset) / folds) + 1

    for i in range(0, len(dataset), fold_size):
        test_data_fold = dataset[i:i + fold_size]
        train_data_fold = dataset[:i] + dataset[i + fold_size:]

        # Train the classifier
        classifier = train_classifier(train_data_fold)

        # Test the classifier
        test_samples, true_labels = zip(*test_data_fold)
        predicted_labels = predict_labels(test_samples, classifier)

        # Evaluate the classifier
        report = classification_report(true_labels, predicted_labels, output_dict=True)

        # Store and print results for each fold
        fold_results = {
            'precision': report['weighted avg']['precision'],
            'recall': report['weighted avg']['recall'],
            'f1-score': report['weighted avg']['f1-score'],
            'accuracy': report['accuracy']
        }

        print(f"Fold {i}: {fold_results}")

        results['precision'].append(fold_results['precision'])
        results['recall'].append(fold_results['recall'])
        results['f1-score'].append(fold_results['f1-score'])
        results['accuracy'].append(fold_results['accuracy'])
        
        # Error analysis - Print false positives and false negatives for the positive label
        false_positives = [(sample, true_label, predicted_label) for sample, true_label, predicted_label in zip(test_samples, true_labels, predicted_labels) if true_label == 'positive' and predicted_label == 'negative']
        false_negatives = [(sample, true_label, predicted_label) for sample, true_label, predicted_label in zip(test_samples, true_labels, predicted_labels) if true_label == 'positive' and predicted_label == 'positive']
        print("\nFalse Positives:", len(false_positives))
        print("False Negatives:", len(false_negatives), "\n")
        
        # Print false positives and false negatives
        print("\nFalse Positives:")
        for sample, true_label, predicted_label in false_positives:
            print(f"Sample: {sample}, True Label: {true_label}, Predicted Label: {predicted_label}")

        print("\nFalse Negatives:")
        for sample, true_label, predicted_label in false_negatives:
            print(f"Sample: {sample}, True Label: {true_label}, Predicted Label: {predicted_label}")


        
    # Calculate average scores
    avg_results = {
        'precision': np.mean(results['precision']),
        'recall': np.mean(results['recall']),
        'f1-score': np.mean(results['f1-score']),
        'accuracy': np.mean(results['accuracy'])
    }

    return avg_results


In [13]:
cross_validate_x(train_data, 10)

Training Classifier...
Fold 0: {'precision': 0.8556451869824183, 'recall': 0.8580476900149031, 'f1-score': 0.8561527728398788, 'accuracy': 0.8580476900149031}

False Positives: 155
False Negatives: 1711 


False Positives:
Sample: {'@jacknonce': 1, 'sorry': 1, ',': 3, 'the': 1, 'blacks': 1, 'i': 1, 'know': 1, 'are': 2, 'actually': 1, 'great': 1, 'successful': 1, 'people': 1, '.': 2, 'they': 2, 'aren': 1, "'": 1, 't': 1, 'taken': 1, 'in': 1, 'by': 1, 'dems': 1, 'bs': 1, 'very': 1, 'smart': 1, 'educated': 1, '!': 1}, True Label: positive, Predicted Label: negative
Sample: {'i': 1, 'have': 1, 'heard': 1, 'it': 1, 'said': 1, 'that': 1, 'the': 1, '2nd': 1, 'g': 1, 'in': 1, 'snoop': 1, 'dogg': 1, 'represents': 1, 'extra': 1, 'gangsterishness': 1, ',': 1, 'for': 1, 'how': 1, 'else': 1, 'do': 1, 'you': 1, 'explain': 1, 'this': 1, 'particular': 1, 'redundancy': 1, '?': 1}, True Label: positive, Predicted Label: negative
Sample: {'what': 1, 'i': 1, 'love': 1, 'most': 1, 'about': 1, 'this': 1, 'i

Fold 2684: {'precision': 0.8537736021083886, 'recall': 0.8539493293591655, 'f1-score': 0.8538608827002553, 'accuracy': 0.8539493293591655}

False Positives: 195
False Negatives: 1896 


False Positives:
Sample: {'@michaela13181': 1, 'i': 2, 'shall': 1, 'be': 2, 'there': 1, 'saturday': 1, ',': 1, 'prepared': 1, 'for': 1, 'me': 1, 'to': 1, 'give': 1, 'lap': 1, 'dances': 1, '.': 1, 'will': 1, 'expect': 1, 'a': 1, 'chair': 1, 'in': 1, 'the': 2, 'middle': 1, 'of': 1, 'living': 1, 'room': 1, '..': 1}, True Label: positive, Predicted Label: negative
Sample: {'no': 2, 'white': 1, 't': 1, "'": 1, 's': 1, ',': 1, 'ball': 1, 'cap': 1, '.': 2, '$500': 1, 'cash': 1, 'to': 1, 'the': 2, 'women': 1, 'in': 1, 'sexiest': 1, 'red': 2, 'dress': 1, '...': 1, 'raffles': 1, 'for': 1, 'moet': 1, 'rose': 1, '&': 1, 'a': 1, 'pair': 1, 'of': 1, 'bottoms': 1, 'september': 1, '1st': 1}, True Label: positive, Predicted Label: negative
Sample: {'it': 1, "'": 2, 's': 1, 'funny': 1, 'how': 1, 'i': 1, 'm': 1, 'like': 1

Fold 5368: {'precision': 0.8092523177051866, 'recall': 0.8088673621460507, 'f1-score': 0.8090459372479645, 'accuracy': 0.8088673621460507}

False Positives: 262
False Negatives: 1377 


False Positives:
Sample: {'seriously': 1, 'nothing': 1, 'better': 1, 'than': 1, 'playing': 1, 'under': 1, 'the': 1, 'friday': 1, 'night': 1, 'lights': 1}, True Label: positive, Predicted Label: negative
Sample: {'it': 1, 'may': 1, 'not': 1, 'be': 1, 'the': 2, 'right': 1, 'fit': 1, 'for': 1, 'everyone': 1, ',': 2, 'but': 1, 'prep': 1, 'therapy': 1, 'can': 1, 'prevent': 1, 'spread': 1, 'of': 1, 'hiv': 1, 'read': 1, 'here': 1, ':': 1, 'http': 1, '://t': 1, '.': 1, 'co/fopwpx3rvz': 1, '#hivprevention': 1}, True Label: positive, Predicted Label: negative
Sample: {'there': 1, "'": 1, 's': 1, 'something': 1, 'about': 2, 'friday': 1, 'night': 1, 'lights': 1, ',': 1, 'you': 1, 'just': 1, 'get': 1, 'chills': 1, 'thinking': 1, 'it': 1, '.': 1}, True Label: positive, Predicted Label: negative
Sample: {'3rd': 1, 'ha

Fold 8052: {'precision': 0.8455935833804838, 'recall': 0.8450074515648286, 'f1-score': 0.8449050295777984, 'accuracy': 0.8450074515648286}

False Positives: 179
False Negatives: 1175 


False Positives:
Sample: {'@gas8128': 1, 'because': 1, 'of': 1, 'obamacare': 1, 'my': 1, 'mother': 1, 'was': 1, 'able': 1, 'to': 1, 'have': 1, 'health': 1, 'insurance': 1, 'for': 1, 'the': 1, 'first': 1, 'time': 1, 'in': 1, 'about': 1, '15': 1, 'years': 1, '.': 1}, True Label: positive, Predicted Label: negative
Sample: {'i': 2, 'think': 1, 'have': 1, 'spent': 1, 'quite': 1, 'enough': 1, 'on': 1, 'persona': 1, '5': 1, '.': 1}, True Label: positive, Predicted Label: negative
Sample: {'@parismarx': 1, 'its': 1, 'about': 1, 'automation': 1, '.': 4, 'e': 1, 'g': 1, 'i': 1, 'keep': 1, 'meeting': 1, 'drivers': 1, 'who': 1, 'have': 1, 'never': 1, 'heard': 1, 'of': 1, 'self': 1, 'driving': 1, 'cars': 1, 'the': 1, 'silver': 1, 'collar': 1, 'worker': 1, 'is': 1, 'coming': 1, '!': 1}, True Label: positive, Predict

Fold 10736: {'precision': 0.8457964548383669, 'recall': 0.8472429210134128, 'f1-score': 0.8462251370530897, 'accuracy': 0.8472429210134128}

False Positives: 181
False Negatives: 1582 


False Positives:
Sample: {'friday': 1, 'is': 1, 'significant': 1, 'and': 1, 'more': 1, 'beneficial': 1, 'than': 1, 'any': 1, 'other': 1, 'day': 2, 'of': 1, 'the': 1, 'week': 1, 'for': 1, 'muslims': 2, '.': 1, 'this': 1, 'gather': 1, 'together': 1, 'to': 1, 'pray': 1, 'in': 1, 'congregatin': 1}, True Label: positive, Predicted Label: negative
Sample: {'studs': 1, 'terkel': 1, 'to': 1, 'bob': 1, 'dylan': 1, 'in': 1, '1963': 1, 'on': 1, 'radio': 1, ',': 1, 'about': 1, 'song': 1, '"': 3, 'hard': 1, 'rain': 1, 'gonna': 2, 'fall': 1, '":': 1, 'i': 1, 'think': 1, 'that': 1, 'one': 1, "'": 1, 's': 1, 'be': 1, 'a': 1, 'classic': 1}, True Label: positive, Predicted Label: negative
Sample: {'i': 1, 'think': 1, 'after': 1, 'tonight': 1, ',': 2, 'we': 1, 'must': 1, 'add': 1, 'a': 1, '4th': 1, 'dark': 1, 'horse': 1,

Fold 13420: {'precision': 0.8661026134928785, 'recall': 0.8673621460506706, 'f1-score': 0.8661904275725697, 'accuracy': 0.8673621460506706}

False Positives: 146
False Negatives: 1607 


False Positives:
Sample: {'mushu': 1, 'at': 1, 'the': 1, 'hurricane': 1, 'wearing': 1, 'tressa': 1, "'": 1, 's': 1, 'rockstar': 1, 'bday': 1, 'foam': 1, 'guitar': 1, 'after': 1, 'our': 1, 'show': 1, 'last': 1, 'saturday': 1, '.': 2, 'we': 1, 'killed': 1, 'it': 1, 'http': 1, '://t': 1, 'co/hs178hfp': 1}, True Label: positive, Predicted Label: negative
Sample: {'feeling': 1, 'frisky': 1, '?': 1, 'thursday': 1, 'is': 1, 'national': 1, 'hot': 1, 'dog': 2, 'day': 1, '.': 3, 'we': 1, 'have': 1, 'a': 1, 'to': 1, 'fit': 1, 'whatever': 1, 'mood': 1, 'you': 1, "'": 1, 're': 1, 'in': 1, 'put': 1, 'your': 1, '...': 1, 'http': 1, '://t': 1, 'co/rqdj9cipo1': 1}, True Label: positive, Predicted Label: negative
Sample: {'@timwelcomed': 1, 'since': 1, 'there': 1, 'is': 1, 'a': 2, 'wwe': 1, 'games': 1, 'group': 1, 'now'

Fold 16104: {'precision': 0.8661591924155752, 'recall': 0.8673621460506706, 'f1-score': 0.8665035803162581, 'accuracy': 0.8673621460506706}

False Positives: 156
False Negatives: 1630 


False Positives:
Sample: {'caitlyn': 1, 'jenner': 1, 'dons': 1, 'shift': 1, 'dress': 1, 'with': 1, 'a': 1, 'leather': 1, 'jacket': 1, 'for': 1, 'kylie': 1, "'": 1, 's': 1, '18th': 1, 'birthday': 1, 'http': 1, '://t': 1, '.': 1, 'co/3bmpuo0f8g': 1}, True Label: positive, Predicted Label: negative
Sample: {'@beanbagboy': 1, 'you': 1, 'and': 1, 'me': 1, 'both': 1, '.': 1, 'was': 1, 'chatting': 1, 'to': 1, '@olwenhoff': 1, 'about': 1, '4th': 1, 'wall': 1, 'breaking': 1, 'after': 1, 'ant-man': 1}, True Label: positive, Predicted Label: negative
Sample: {'@edgarntege': 1, 'i': 2, 'am': 1, 'waiting': 1, 'for': 1, 'that': 1, 'dubai': 1, 'festival': 1, 'in': 1, 'oct': 1, ',': 1, 'want': 1, 'u': 1, 'to': 1, 'come': 1, 'back': 1, 'with': 1, 'also': 1, 'the': 2, 'galaxy': 1, 'note': 1, '5': 1, 'wen': 1, 't': 1, 'c

Fold 18788: {'precision': 0.8473423987310903, 'recall': 0.8483606557377049, 'f1-score': 0.8477442403590995, 'accuracy': 0.8483606557377049}

False Positives: 189
False Negatives: 1587 


False Positives:
Sample: {'just': 1, '...': 1, 'when': 1, '?': 1, 'i': 1, 'died': 1, '#onedirection': 1, '#mtvstarsniallhoran': 1, '#3yearsofmidnightmemories': 1, '#niallonamas': 1, 'https': 1, '://t': 1, '.': 1, 'co/ekttvitpta': 1}, True Label: positive, Predicted Label: negative
Sample: {'egypt&#039': 1, ';': 1, 's': 1, 'president': 1, 'mohamed': 1, 'morsi': 1, 'on': 1, 'monday': 1, 'pardoned': 1, 'all': 1, 'those': 1, 'arrested': 1, 'between': 1, 'the': 2, 'start': 1, 'of': 1, 'revolution': 1, 't': 1, '...': 1, 'http': 1, '://t': 1, '.': 1, 'co/emfvhotu': 1}, True Label: positive, Predicted Label: negative
Sample: {'my': 1, 'miami': 1, 'heat': 1, 'are': 1, 'going': 1, 'to': 1, 'demolish': 1, 'the': 1, 'new': 2, 'york': 2, 'knicks': 2, 'tomorrow': 1, '.': 2, 'i': 1, 'am': 1, 'from': 1, 'but': 1, 'sor

Fold 21472: {'precision': 0.8479233285941491, 'recall': 0.849478390461997, 'f1-score': 0.8483519378938857, 'accuracy': 0.849478390461997}

False Positives: 176
False Negatives: 1598 


False Positives:
Sample: {'tom': 1, 'brady': 1, '.': 3, 'the': 2, 'only': 1, 'man': 1, 'with': 1, 'class': 1, 'and': 1, 'integrity': 1, 'throughout': 1, '#framegate': 1, 'saga': 1, 'https': 1, '://t': 1, 'co/pkkveftiiz': 1}, True Label: positive, Predicted Label: negative
Sample: {'i': 1, 'think': 1, '$msft': 1, 'might': 1, "'": 2, 've': 1, 'finally': 1, '"': 2, 'got': 1, 'it': 1, 'with': 1, 'windows': 1, '10': 1, '...': 1, 'or': 1, 'at': 1, 'least': 1, 'may': 1, 'be': 1, 'getting': 1, 'closer-it': 1, 'ain': 1, 't': 1, 'mad': 2, 'me': 1, 'yet': 1, 'lol': 1}, True Label: positive, Predicted Label: negative
Sample: {'sat': 1, 'in': 1, 'the': 1, 'hard': 1, 'rock': 1, 'cafe': 1, ',': 3, 'with': 1, 'my': 1, 'new': 1, 'star': 1, 'wars': 1, 't-shirt': 1, 'on': 1, 'and': 1, 'apparently': 1, 'i': 1, 'just': 1, 'm

Fold 24156: {'precision': 0.8592346277203355, 'recall': 0.8606128550074739, 'f1-score': 0.859401166852293, 'accuracy': 0.8606128550074739}

False Positives: 155
False Negatives: 1592 


False Positives:
Sample: {'@skinnyz0mbie': 1, 'maby': 1, 'cuz': 1, 'u': 3, 'are': 2, 'sooo': 1, 'good': 1, 'at': 1, 'bowling': 1, 'and': 1, 'they': 2, 'now': 1, 'scared': 1, 'from': 2, ':': 1, 'p': 1, 'got': 1, 'sick': 1, 'taking': 1, 'the': 1, '1st': 1, 'place': 1, 'everytime': 1}, True Label: positive, Predicted Label: negative
Sample: {'i': 1, 'managed': 1, 'to': 1, 'go': 1, 'through': 1, 'the': 1, 'entire': 1, 'weekend': 1, ',': 3, 'including': 1, 'labor': 1, 'day': 1, 'and': 2, 'tuesday': 1, 'without': 1, 'seeing': 1, 'any': 1, '#cornhusker': 1, 'highlights': 1, '.': 2, '@pti': 1, 'ruined': 1, 'it': 1}, True Label: positive, Predicted Label: negative
Sample: {'@_tomcc': 1, 'hey': 1, 'tommaso': 1, 'when': 1, 'is': 2, 'the': 3, 'release': 1, 'date': 1, 'for': 1, '0': 1, '.': 1, '12': 1, 'on': 1, 'ios

{'precision': 0.8496823305968872,
 'recall': 0.8506290947406878,
 'f1-score': 0.8498381112413093,
 'accuracy': 0.8506290947406878}