# Soft Replication of Hemker (2018)

The goal of this notebook is to follow the methodology described in Hemker (2018) to replicate his results. The original source code is not available, which makes this task harder.

### Data Retrieval

In [168]:
# Source: Davidson et al. (2017)

import pandas as pd
import numpy as np

df = pd.read_csv("./data/labeled_data.csv", index_col=0)
raw_tweets = df.tweet
raw_labels = df["class"].values

In [169]:
df.head()

Unnamed: 0,count,hate_speech,offensive_language,neither,class,tweet
0,3,0,0,3,2,!!! RT @mayasolovely: As a woman you shouldn't...
1,3,0,3,0,1,!!!!! RT @mleew17: boy dats cold...tyga dwn ba...
2,3,0,3,0,1,!!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...
3,3,0,2,1,1,!!!!!!!!! RT @C_G_Anderson: @viva_based she lo...
4,6,0,6,0,1,!!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...


## Data Preprocessing
---

### Data Cleaning

In [170]:
# Source: Davidson et al. (2017)

import re
import html
from string import punctuation

def preprocess(text_string):
    
    # Casing should not make a difference in our case
    text_string = text_string.lower()
    
    # Regex patterns (raw strings avoid invalid-escape warnings for \s, \w, \( ...)
    html_pattern = r'(&(?:\#(?:(?:[0-9]+)|[Xx](?:[0-9A-Fa-f]+))|(?:[A-Za-z0-9]+));)'
    space_pattern = r'\s+'
    giant_url_regex = (r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|'
                       r'[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+')
    mention_regex = r'@[\w\-]+'
    hashtag_regex = r'#[\w\-]+'
    
    # First, add spaces around HTML entities
    text_string = re.sub(html_pattern, r' \1 ', text_string)
    
    # Now, if we wish to find hashtags, we have to unescape HTML entities
    text_string = html.unescape(text_string)
    
    # From Udacity TV script generation project
    # Replace some punctuation by dedicated tokens
    symbol_to_token = {
        '.' : '||Period||',
        ',' : '||Comma||',
        '"' : '||Quotation_Mark||',
        ';' : '||Semicolon||',
        '!' : '||Exclamation_Mark||',
        '?' : '||Question_Mark||',
        '(' : '||Left_Parenthesis||',
        ')' : '||Right_Parenthesis||',
        '-' : '||Dash||',
        '\n' : '||Return||'
    }
    
    # Next, find URLs
    text_string = re.sub(giant_url_regex, ' URLHERE ', text_string)
    
    # Then, tokenize punctuation
    for key, token in symbol_to_token.items():
        text_string = text_string.replace(key, ' {} '.format(token))

    # Finally, replace hashtags and mentions, then collapse repeated whitespace
    text_string = re.sub(hashtag_regex, ' HASHTAGHERE ', text_string)
    text_string = re.sub(mention_regex, ' MENTIONHERE ', text_string)
    text_string = re.sub(space_pattern, ' ', text_string)
    
    return text_string

def _test_preprocess():
    
    assert " HASHTAGHERE " == preprocess("#iam1hashtag")
    assert " URLHERE " == preprocess("https://seminar.minerva.kgi.edu")
    assert " MENTIONHERE " == preprocess("@vinimiranda")
    assert ' ' == preprocess("        ")
    assert " & MENTIONHERE URLHERE HASHTAGHERE " == \
        preprocess("&amp;@vinimiranda    https://seminar.minerva.kgi.edu     #minerva    ")
    
_test_preprocess()

print("Example of a raw tweet:\n{}".format(raw_tweets[68]))
print("\nIts cleaned version is:\n{}".format(preprocess(raw_tweets[68])))

Example of a raw tweet:
"@Almightywayne__: @JetsAndASwisher @Gook____ bitch fuck u http://t.co/pXmGA68NC1" maybe you'll get better. Just http://t.co/TPreVwfq0S

Its cleaned version is:
 ||Quotation_Mark|| MENTIONHERE : MENTIONHERE MENTIONHERE bitch fuck u URLHERE ||Quotation_Mark|| maybe you'll get better ||Period|| just URLHERE 


In [171]:
tweets = raw_tweets.map(preprocess)

### Sentiment Analysis

In [172]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer as VS

sentiment_analyzer = VS()

# Example
sentiment_analyzer.polarity_scores(tweets[68])

{'neg': 0.329, 'neu': 0.541, 'pos': 0.131, 'compound': -0.6597}
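The `compound` entry is VADER's single summary score in [-1, 1]. As far as I can tell from the vaderSentiment source, it sums the (heuristically adjusted) valences of the words in the text and squashes that sum with x / sqrt(x^2 + alpha), with alpha = 15. A minimal sketch of just that normalization step; the valences below are made-up numbers, not values from VADER's lexicon:

```python
import math

def normalize(score, alpha=15):
    """Squash an unbounded valence sum into [-1, 1], VADER-style."""
    norm_score = score / math.sqrt(score * score + alpha)
    # Clip as a safeguard; the formula itself already stays within [-1, 1]
    return max(-1.0, min(1.0, norm_score))

# Hypothetical per-word valences for a short insulting tweet (not real lexicon values)
valences = [-2.5, -3.1, 1.2]
print(round(normalize(sum(valences)), 4))
```

A large negative sum approaches -1 without reaching it, which is why strongly negative tweets such as the one above end up with a compound score well below zero but never exactly -1.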

### Checking for Outliers

In [173]:
# Get cleaned tweets
df["clean_tweet"] = tweets

# Get their word count
df["word_count"] = df.clean_tweet.apply(lambda x : len(x.split()))

df.word_count.describe()

count    24783.000000
mean        16.729936
std          8.445555
min          1.000000
25%         10.000000
50%         16.000000
75%         23.000000
max         95.000000
Name: word_count, dtype: float64

In [174]:
# Check tweets with the minimum word count
df.loc[df.word_count == df.word_count.min(),]

Unnamed: 0,count,hate_speech,offensive_language,neither,class,tweet,clean_tweet,word_count
821,3,0,0,3,2,#Yankees,HASHTAGHERE,1
24147,3,0,3,0,1,bitches,bitches,1
24218,3,3,0,0,0,coons,coons,1
24869,3,0,3,0,1,pussy,pussy,1


Looks good. Let's check the tweet(s) with the maximum word count.

In [175]:
# Check tweets with the maximum word count
df.loc[df.word_count == df.word_count.max(),]

Unnamed: 0,count,hate_speech,offensive_language,neither,class,tweet,clean_tweet,word_count
22953,3,0,0,3,2,Was finna slit my eyebrows up in the shop but ...,was finna slit my eyebrows up in the shop but ...,95


A 95-token tweet is suspiciously long. Let's look at its raw text.

In [176]:
df.loc[df.word_count == df.word_count.max(),].tweet.values

array(['Was finna slit my eyebrows up in the shop but nahhhhhh.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.\r\n.'],
      dtype=object)

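The tweet ends with a long run of `\r\n.` fragments, and `preprocess` turns each of them into a `||Return||` and a `||Period||` token. It's hard to know why they are there, but I'll remove them by truncating the tweet at its first carriage return.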
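A stripped-down sketch of the punctuation step in `preprocess` shows where the extra tokens come from. It keeps only the `.` and `\n` mappings, and `tokenize_punct` is a hypothetical helper, not part of the notebook. Since `\r` is ordinary whitespace, every `\r\n.` fragment yields exactly two tokens:

```python
import re

# Subset of the notebook's symbol_to_token mapping
SYMBOLS = {'.': '||Period||', '\n': '||Return||'}

def tokenize_punct(text):
    """Hypothetical helper mirroring the punctuation step of `preprocess`."""
    for key, token in SYMBOLS.items():
        text = text.replace(key, ' {} '.format(token))
    # Collapse all whitespace (including the leftover '\r') into single spaces
    return re.sub(r'\s+', ' ', text)

tail = '\r\n.' * 10
print(len(tokenize_punct('nahhhhhh' + tail).split()))  # 21 (1 word + 10 * 2 tokens)

# Truncating at the first carriage return keeps only the real text
raw = 'nahhhhhh' + tail
print(raw[:raw.find('\r')])  # nahhhhhh
```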

In [177]:
# Truncate the outlier at its first carriage return and recompute its derived columns
mask = df.word_count == df.word_count.max()
old_tweet = df.loc[mask, "tweet"].values[0]
new_tweet = old_tweet[:old_tweet.find("\r")]
df.loc[mask, "tweet"] = new_tweet
df.loc[mask, "clean_tweet"] = preprocess(new_tweet)
df.loc[mask, "word_count"] = len(preprocess(new_tweet).split())

# Keep the standalone `tweets` Series in sync with the cleaned column
tweets = df.clean_tweet

Let's check it again.

In [178]:
# Check tweets with the maximum word count
df.loc[df.word_count == df.word_count.max(),]

Unnamed: 0,count,hate_speech,offensive_language,neither,class,tweet,clean_tweet,word_count
18267,3,0,3,0,1,RT @TrxllLegend: One good girl is worth a thou...,rt MENTIONHERE : one good girl is worth a thou...,91


In [179]:
df.loc[df.word_count == df.word_count.max(),].clean_tweet.values[0]

'rt MENTIONHERE : one good girl is worth a thousand bitches ||Return|| ||Return|| 👰 = 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 👭 … '

Sigh. Well, format-wise it is okay.

### Lookup Tables



In [180]:
# From Udacity script project

def create_lookup_tables(text):
    """
    Create lookup tables for vocabulary
    :param text: Iterable of cleaned tweets (whitespace-separated tokens)
    :return: A tuple of dicts (vocab_to_int, int_to_vocab)
    """
    
    # Generate vocabulary from the tweets passed in (not from a global variable)
    vocab = set()
    for tweet in text:
        vocab.update(tweet.split())
    
    # Generate lookup tables; ids start at 1 because 0 is reserved for padding
    vocab_to_int = {word : ii for ii, word in enumerate(vocab, 1)}
    int_to_vocab = {ii : word for word, ii in vocab_to_int.items()}
    
    # Add padding special word
    vocab_to_int['<PAD>'] = 0
    int_to_vocab[0] = '<PAD>'
    
    return (vocab_to_int, int_to_vocab)

def _test_lookup_tables():
    
    text = np.array(["this is a toy", "I mean not really a toy", "I mean a toy vocabulary"])
    vocab_to_int, int_to_vocab = create_lookup_tables(text)
    
    # Make sure the dicts make the same lookup
    mismatches = [(word, idx, idx, int_to_vocab[idx]) for word, idx in vocab_to_int.items() if int_to_vocab[idx] != word]
    
    assert not mismatches, \
        'Found {} mismatch(es). First mismatch: vocab_to_int[{}] = {} and int_to_vocab[{}] = {}'.format(
            len(mismatches), *mismatches[0])
    
_test_lookup_tables()
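The tables reserve id 0 for `<PAD>` and number real tokens from 1, so padding can never collide with a word. A self-contained sketch of that convention on the toy corpus from the test above, with a round trip from text to ids and back. `build_tables`, `encode` and `decode` are hypothetical names, and unlike `create_lookup_tables` the vocabulary is sorted here so the ids are reproducible:

```python
def build_tables(texts):
    """Map each distinct whitespace token to an id >= 1; id 0 is '<PAD>'."""
    vocab = sorted({word for text in texts for word in text.split()})
    vocab_to_int = {word: ii for ii, word in enumerate(vocab, 1)}
    vocab_to_int['<PAD>'] = 0
    int_to_vocab = {ii: word for word, ii in vocab_to_int.items()}
    return vocab_to_int, int_to_vocab

def encode(text, vocab_to_int):
    """Convert a whitespace-tokenized string into a list of ids."""
    return [vocab_to_int[word] for word in text.split()]

def decode(ids, int_to_vocab):
    """Convert a list of ids back into a whitespace-joined string."""
    return ' '.join(int_to_vocab[i] for i in ids)

corpus = ["this is a toy", "I mean not really a toy", "I mean a toy vocabulary"]
v2i, i2v = build_tables(corpus)
ids = encode("<PAD> a toy", v2i)
print(ids, decode(ids, i2v))  # [0, 2, 8] <PAD> a toy
```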

In [181]:
vocab_to_int, int_to_vocab = create_lookup_tables(tweets)

In [182]:
print("The size of the vocabulary is: {} tokens.".format(len(vocab_to_int)))
vocab = list(vocab_to_int.keys())
np.random.shuffle(vocab)
print("These are 10 randomly sampled words from the vocabulary:\n{}".format(vocab[:10]))
del vocab

The size of the vocabulary is: 21134 tokens.
These are 10 randomly sampled words from the vocabulary:
['afl', 'seat', 'creatures', 'purnt', 'faggots', 'lyrical', 'aliens', 'suit', 'ne', 'larger']


### Padding the Data

In [224]:
MAX_LENGTH = df.word_count.max()

def pad_tweets(tweet, max_length=MAX_LENGTH):
    """Left-pad a tweet with '<PAD>' tokens up to max_length words; longer tweets are not truncated."""

    # Retrieve tweet word count
    word_count = len(tweet.split())
    
    # Check how much padding will be needed
    n = max_length - word_count if word_count < max_length else 0

    # Pad tweet
    padded_tweet = ''.join(['<PAD> '] * n + [tweet])
   
    return padded_tweet

def _test_pad_tweets():
    
    assert pad_tweets('hi', 0) == 'hi'
    assert pad_tweets('hi', 1) == 'hi'
    assert pad_tweets('hi', 2) == '<PAD> hi'
    assert len(pad_tweets('hi', 10).split()) == 10
    assert len(pad_tweets('hi', 100).split()) == 100
    assert pad_tweets('this sentence is a bit longer', 1) == 'this sentence is a bit longer'
    
_test_pad_tweets()
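`pad_tweets` pads at the string level, before the tweets are converted to ids. The same left-padding can be done on the integer sequences instead, using the `<PAD>` id 0; `left_pad` below is a hypothetical helper sketching that alternative. Padding on the left keeps the real tokens at the end of the sequence, which matters because the model defined later only reads the output at the last time step.

```python
def left_pad(ids, max_length, pad_id=0):
    """Prepend pad_id until the sequence has max_length ids; never truncate."""
    n = max(max_length - len(ids), 0)
    return [pad_id] * n + list(ids)

print(left_pad([5903, 18881], 5))  # [0, 0, 0, 5903, 18881]
print(left_pad([1, 2, 3], 2))      # [1, 2, 3] (a longer input is kept whole)
```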

In [225]:
df["padded_tweets"] = df.clean_tweet.map(pad_tweets)
print(df.padded_tweets[10])

<PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD>  ||Quotation_Mark|| keeks is a bitch she curves everyone ||Quotation_Mark|| lol i walked into a conversation like this ||Period|| smh


### Tokenizing the Data

In [233]:
tweets_ints = np.array([[vocab_to_int[word] for word in tweet.split()] for tweet in df.padded_tweets.values])
print(tweets_ints[10])

[    0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
  5903 18881  7952  5450 15483 11510 20674 12309  5903  1633 19264 14962
  9032  5450 10578 19895 14569  9270 21019]


### Hate Subclass Extraction

In [183]:
# Partly from https://stackoverflow.com/questions/31836058/nltk-named-entity-recognition-to-a-python-list
# I do not implement co-reference resolution since a single NE is sufficient to label a tweet as directed hate speech.
# Requires NLTK data for the sentence tokenizer, POS tagger, NE chunker and 'words' corpus (see nltk.download).
from nltk import sent_tokenize, word_tokenize, pos_tag, ne_chunk

def hate_classification(hate_tweet):
    '''Receives a hateful tweet.
       Returns 3 for directed hate speech and 4 otherwise.'''
    
    # A mention targets a specific user, so the tweet is directed
    if "MENTIONHERE" in hate_tweet:
        return 3
    
    # Remove ||Token|| placeholders since they would confuse the POS tagger
    token_regex = r'\|\|\w+\|\|'
    hate_tweet = re.sub(token_regex, "", hate_tweet)
    
    # The POS tagger treats URLHERE as a proper noun,
    # so strip punctuation and URL placeholders before looking for named entities.
    # Note: `preprocess` lowercases tweets, which likely makes the NE chunker
    # miss many entities; the tests below use capitalized input.
    no_punct_hate = ''.join([char for char in hate_tweet if char not in punctuation])
    no_URL_hate = ' '.join([token for token in no_punct_hate.split() if token != "URLHERE"])
    for sent in sent_tokenize(no_URL_hate):
        for chunk in ne_chunk(pos_tag(word_tokenize(sent))):
            if hasattr(chunk, 'label'):
                return 3  # Named entity found

    return 4
        
def _test_hate_classification():
    assert hate_classification("MENTIONHERE") == 3
    assert hate_classification("Karen is absolutely crazy") == 3
    assert hate_classification("Karen is his sister. She's absolutely crazy") == 3
    assert hate_classification("They should all be sent to Mexico") == 3
    assert hate_classification("They should all leave the country") == 4
    assert hate_classification("some hate speech stuff") == 4
    assert hate_classification("") == 4

_test_hate_classification()

In [184]:
hate_tweets = tweets[df["class"] == 0].values
_hate_prnt = lambda x : "Generalized" if hate_classification(x) == 4 else "Directed"

print("Example of a hateful tweet: \n{}".format(hate_tweets[20]))
print("Its type of hate speech is: {}\n".format(_hate_prnt(hate_tweets[20])))

print("Example of a hateful tweet:\n{}".format(hate_tweets[10]))
print("Its type of hate speech is: {}\n".format(_hate_prnt(hate_tweets[10])))

Example of a hateful tweet: 
 ||Quotation_Mark|| we're out here ||Comma|| and we're queer ||Exclamation_Mark|| ||Quotation_Mark|| ||Return|| ||Quotation_Mark|| 2 ||Comma|| 4 ||Comma|| 6 ||Comma|| hut ||Exclamation_Mark|| we like it in our butt ||Exclamation_Mark|| ||Quotation_Mark|| 
Its type of hate speech is: Generalized

Example of a hateful tweet:
 ||Quotation_Mark|| MENTIONHERE : jackies a retard HASHTAGHERE ||Quotation_Mark|| at least i can make a grilled cheese ||Exclamation_Mark|| 
Its type of hate speech is: Directed



In [185]:
# Change hate speech labels (0) to directed (3) or generalized (4) labels
labels = raw_labels.copy()
for i, (tweet, label) in enumerate(zip(tweets, raw_labels)):
    
    if label == 0:  # If hate speech
        labels[i] = hate_classification(tweet)

def _test_labels():
    # Every hate speech label (0) should have been replaced by 3 or 4
    assert 0 not in pd.Series(labels).value_counts().index
    assert 3 in pd.Series(labels).value_counts().index
    assert 4 in pd.Series(labels).value_counts().index

_test_labels()

In [186]:
pd.Series(labels).value_counts()

1    19190
2     4163
3      954
4      476
dtype: int64
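After relabeling, the classes are 1 (offensive), 2 (neither), 3 (directed hate) and 4 (generalized hate). The training loop is not shown in this section, but if it uses a loss such as PyTorch's `nn.CrossEntropyLoss`, the targets must be integer class indices in `[0, output_size)`, so label 4 would be out of range for a 4-class output. A minimal remapping sketch, assuming that setup (`LABEL_TO_INDEX` and `to_class_indices` are hypothetical names):

```python
import numpy as np

# Hypothetical mapping: notebook label -> contiguous class index
# 1 offensive, 2 neither, 3 directed hate, 4 generalized hate
LABEL_TO_INDEX = {1: 0, 2: 1, 3: 2, 4: 3}

def to_class_indices(labels):
    """Convert labels in {1, 2, 3, 4} to int64 indices in {0, 1, 2, 3}."""
    return np.array([LABEL_TO_INDEX[int(label)] for label in labels], dtype=np.int64)

toy_labels = np.array([1, 2, 1, 3, 4, 2])  # toy stand-in for the real label array
print(to_class_indices(toy_labels))  # [0 1 0 2 3 1]
```

The int64 dtype is chosen because `nn.CrossEntropyLoss` expects `torch.long` class indices once the array is wrapped with `torch.from_numpy`.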

## Build the Neural Network
---
### Check Access to GPU

In [160]:
import torch

# Check for a GPU
train_on_gpu = torch.cuda.is_available()
if not train_on_gpu:
    print('No GPU found. Please use a GPU to train your neural network.')

### Creating the Training, Validation, and Test Sets

All code in this section is adapted from the Udacity script generator project.

In [242]:
from sklearn.utils import shuffle

tweets_ints, labels = shuffle(tweets_ints, labels)
split_frac = 0.8

## split data into training, validation, and test data (features and labels, x and y)
split_idx = int(tweets_ints.shape[0]*split_frac)
train_x, remaining_x = tweets_ints[:split_idx], tweets_ints[split_idx:]
train_y, remaining_y = labels[:split_idx], labels[split_idx:]

test_idx = int(len(remaining_x)*0.5)
val_x, test_x = remaining_x[:test_idx], remaining_x[test_idx:]
val_y, test_y = remaining_y[:test_idx], remaining_y[test_idx:]

## print out the shapes of your resultant feature data
print("\t\t\tFeature Shapes:")
print("Train set: \t\t{}".format(train_x.shape), 
      "\nValidation set: \t{}".format(val_x.shape),
      "\nTest set: \t\t{}".format(test_x.shape))

			Feature Shapes:
Train set: 		(19826, 91) 
Validation set: 	(2478, 91) 
Test set: 		(2479, 91)
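These shapes follow from the integer truncation in the split code. A quick check with the 24,783 tweets counted earlier (`split_sizes` is a hypothetical helper that mirrors the cell above):

```python
def split_sizes(n, split_frac=0.8):
    """Reproduce the train / validation / test sizes of the split above."""
    split_idx = int(n * split_frac)   # training rows
    remaining = n - split_idx
    test_idx = int(remaining * 0.5)   # validation rows
    return split_idx, test_idx, remaining - test_idx

print(split_sizes(24783))  # (19826, 2478, 2479)
```

Because the split is random rather than stratified, the number of generalized hate tweets (476 in total, so roughly 48 expected in each of the validation and test sets) will vary between runs.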


In [240]:
import torch
from torch.utils.data import TensorDataset, DataLoader

# create Tensor datasets
train_data = TensorDataset(torch.from_numpy(train_x), torch.from_numpy(train_y))
valid_data = TensorDataset(torch.from_numpy(val_x), torch.from_numpy(val_y))
test_data = TensorDataset(torch.from_numpy(test_x), torch.from_numpy(test_y))

# dataloaders
batch_size = 64

# make sure to SHUFFLE your training data; evaluation order does not matter
train_loader = DataLoader(train_data, shuffle=True, batch_size=batch_size)
valid_loader = DataLoader(valid_data, shuffle=False, batch_size=batch_size)
test_loader = DataLoader(test_data, shuffle=False, batch_size=batch_size)
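With `batch_size = 64`, the 19,826 training samples do not divide evenly, so the final batch is smaller. That matters if the hidden state is created once with `init_hidden(batch_size)` and reused, because its batch dimension would no longer match. The arithmetic, assuming `DataLoader`'s default `drop_last=False` (`batch_layout` is a hypothetical helper):

```python
import math

def batch_layout(n_samples, batch_size):
    """Number of batches and size of the last one when drop_last=False."""
    n_batches = math.ceil(n_samples / batch_size)
    last = n_samples - (n_batches - 1) * batch_size
    return n_batches, last

print(batch_layout(19826, 64))  # (310, 50)
```

Two common ways around this are passing `drop_last=True` to the training `DataLoader`, or calling `init_hidden(inputs.size(0))` for each batch.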

In [241]:
# obtain one batch of training data
dataiter = iter(train_loader)
sample_x, sample_y = next(dataiter)

print('Sample input size: ', sample_x.size()) # batch_size, seq_length
print('Sample input: \n', sample_x)
print()
print('Sample label size: ', sample_y.size()) # batch_size
print('Sample label: \n', sample_y)

Sample input size:  torch.Size([64, 91])
Sample input: 
 tensor([[    0,     0,     0,  ...,  7758,  1750, 19304],
        [    0,     0,     0,  ..., 10960,  5172,  6663],
        [    0,     0,     0,  ...,  6091,  3135, 21049],
        ...,
        [    0,     0,     0,  ...,  5611, 15878, 15048],
        [    0,     0,     0,  ..., 20384,  7678,  9270],
        [    0,     0,     0,  ...,  8381,  6820,  5578]], dtype=torch.int32)

Sample label size:  torch.Size([64])
Sample label: 
 tensor([1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 3, 1, 1, 1, 1, 1, 1, 1, 2,
        1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 3, 4, 2, 1, 1, 2, 2, 3, 3, 1, 2, 1, 1,
        1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1])


### Define the Architecture

In [274]:
import torch.nn as nn
import torch.nn.functional as F

class HateSpeechClassifier(nn.Module):

    def __init__(self, vocab_size, output_size, embedding_dim, cnn_params, pool_params,
                 hidden_dim, n_layers, dropout=0.5):
        """
        Initialize the CNN-LSTM classifier (embedding -> Conv1d -> MaxPool1d -> LSTM -> linear)
        :param vocab_size: The number of input dimensions of the neural network (the size of the vocabulary)
        :param output_size: The number of output dimensions of the neural network (the number of classes)
        :param embedding_dim: The size of the word embeddings
        :param cnn_params: A 4-element tuple containing the number 
            of feature maps, kernel size, stride and padding of a Conv1d layer. 
        :param pool_params: A 3-element tuple containing the kernel size, stride and padding of a MaxPool1d layer. 
        :param hidden_dim: The size of the LSTM hidden state
        :param n_layers: The number of stacked LSTM layers
        :param dropout: Dropout applied between stacked LSTM layers
        """
        super(HateSpeechClassifier, self).__init__()
       
        # set class variables
        self.output_size = output_size
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim
        
        # define model layers
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        
        self.conv = nn.Conv1d(embedding_dim, *cnn_params)
        
        self.pool = nn.MaxPool1d(*pool_params)
        
        n_maps, _, _, _ = cnn_params
        self.lstm = nn.LSTM(n_maps, hidden_dim, n_layers, 
                            dropout=dropout, batch_first=True)
        
        self.dropout = nn.Dropout(0.2)
        
        self.fc = nn.Linear(hidden_dim, output_size)
    
    
    def forward(self, nn_input, hidden, test_print=False):
        """
        Forward propagation of the neural network
        :param nn_input: The input to the neural network
        :param hidden: The hidden state        
        :return: Two Tensors, the output of the neural network and the latest hidden state
        """
        batch_size = nn_input.size(0)

        # embeddings
        nn_input = nn_input.long()
        embeds = self.embedding(nn_input)
        
        # Change axes. embedding_dim (in_channels) should be in the middle
        # [batch_size, seq_length, embedding_dim] -> [batch_size, embedding_dim, seq_length]
        embeds_t = embeds.permute(0, 2, 1)
        
        # conv
        conv_out = self.conv(embeds_t)
        
        # pool
        pool_out = self.pool(F.relu(conv_out))
        
        # Change axes. lstm expects features to be the last channel
        # [batch_size, n_maps, down_sampled_seq] -> [batch_size, down_sampled_seq, n_maps]
        pool_out_t = pool_out.permute(0, 2, 1)
        
        # lstm
        lstm_out, hidden = self.lstm(pool_out_t, hidden)
    
        # stack up lstm outputs: [batch_size * down_sampled_seq, hidden_dim]
        lstm_flat = lstm_out.contiguous().view(-1, self.hidden_dim)
        
        # fully-connected layer (self.dropout is defined but currently not applied)
        # out = self.dropout(lstm_flat)
        fc_out = self.fc(lstm_flat)
        
        # reshape to be batch_size first
        fc_out_t = fc_out.view(batch_size, -1, self.output_size)
        
        out = fc_out_t[:, -1]  # keep only the scores at the last time step
        
        if test_print:
            print("nn_input.\nexpected : [batch_size, seq_length].\nshape: {}\n".format(nn_input.shape))
            print("embeds.\nexpected : [batch_size, seq_length, embedding_dim].\nshape: {}\n".format(embeds.shape))
            print("embeds_t.\nexpected : [batch_size, embedding_dim, seq_length].\nshape: {}\n".format(embeds_t.shape))
            print("conv_out.\nexpected : [batch_size, n_maps, seq_length].\nshape: {}\n".format(conv_out.shape))
            print("pool_out.\nexpected : [batch_size, n_maps, down_sampled_seq].\nshape: {}\n".format(pool_out.shape))
            print("pool_out_t.\nexpected : [batch_size, down_sampled_seq, n_maps].\nshape: {}\n".format(pool_out_t.shape))
            print("lstm_out.\nexpected : [batch_size, down_sampled_seq, hidden_dim].\nshape: {}\n".format(lstm_out.shape))
            print("lstm_flat.\nexpected : [batch_size * down_sampled_seq, hidden_dim].\nshape: {}\n".format(lstm_flat.shape))
            print("fc_out.\nexpected : [batch_size * down_sampled_seq, output_size].\nshape: {}\n".format(fc_out.shape))
            print("fc_out_t.\nexpected : [batch_size, down_sampled_seq, output_size].\nshape: {}\n".format(fc_out_t.shape))
            print("out.\nexpected : [batch_size, output_size].\nshape: {}\n".format(out.shape))

        # return one batch of class scores and the hidden state
        return out, hidden
    
    def init_hidden(self, batch_size):
        '''
        Initialize the hidden state of an LSTM/GRU
        :param batch_size: The batch_size of the hidden state
        :return: hidden state of dims (n_layers, batch_size, hidden_dim)
        '''
        # initialize hidden and cell states with zeros; new_zeros places them on the
        # same device (CPU or GPU) and dtype as the model's parameters
        weight = next(self.parameters()).data
        hidden = (weight.new_zeros((self.n_layers, batch_size, self.hidden_dim)),
                  weight.new_zeros((self.n_layers, batch_size, self.hidden_dim)))
        
        return hidden


def _test_HateSpeechClassifier(test_print):
    batch_size = 20
    sequence_length = 14
    vocab_size = 25
    output_size = 4
    embedding_dim = 16
    hidden_dim = 12
    n_layers = 2
    cnn_params = (5, 3, 1, 1)
    pool_params = (2, 2, 0)
    
    # Initialize model
    test_classifier = HateSpeechClassifier(vocab_size, output_size, embedding_dim, 
                                           cnn_params, pool_params, hidden_dim, n_layers)
    
    # create test input
    X_npy = np.random.randint(vocab_size, size=(batch_size, sequence_length))
    X = torch.from_numpy(X_npy)
    
    # Move to GPU if available
    if(train_on_gpu):
        test_classifier.cuda()
        X = X.cuda()
    
    # Compute
    hidden = test_classifier.init_hidden(batch_size)
    out, hidden_out = test_classifier(X, hidden, test_print=test_print)
    
    # Test output and hidden state shapes
    assert out.shape == (batch_size, output_size)
    assert hidden_out[0].size() == (n_layers, batch_size, hidden_dim)
    
_test_HateSpeechClassifier(False)
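The `down_sampled_seq` length in the debug prints follows the output-length formula given in the PyTorch docs for `Conv1d` and `MaxPool1d` (the latter with the default `ceil_mode=False`): L_out = floor((L_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride) + 1. The sketch below applies it to the test configuration and to the 91-token padded tweets. The 91-token case reuses the test's `cnn_params` and `pool_params` purely for illustration, since the hyperparameters used for training are not given here; `out_length` and `down_sampled_seq` are hypothetical names.

```python
def out_length(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of Conv1d / MaxPool1d (ceil_mode=False) along the sequence axis."""
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

def down_sampled_seq(seq_length, cnn_params, pool_params):
    """cnn_params = (n_maps, kernel, stride, padding); pool_params = (kernel, stride, padding)."""
    _, k, s, p = cnn_params
    conv_len = out_length(seq_length, k, s, p)
    pk, ps, pp = pool_params
    return out_length(conv_len, pk, ps, pp)

print(down_sampled_seq(14, (5, 3, 1, 1), (2, 2, 0)))  # 7
print(down_sampled_seq(91, (5, 3, 1, 1), (2, 2, 0)))  # 45
```

With kernel 3, stride 1 and padding 1 the convolution preserves the sequence length, so only the pooling layer shortens it, roughly halving the number of steps the LSTM has to process.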