#### Problem statement

Predict the political party from the tweet text and the handle.

#### Data description
This dataset has three columns: label (party name), Twitter handle, and tweet text.


#### Problem Description:

Design a feed-forward deep neural network to predict the political party using PyTorch or TensorFlow.
Build two models:

1. Without using the handle

2. Using the handle


#### Deliverables

- Report the performance on the test set.

- Try multiple models with different hyperparameters. Present the results of each model on the test set. No need to create a dev set.

- Experiment with:
    - L2 and dropout regularization techniques
    - SGD, RMSProp, and Adam optimization techniques
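
As a rough sketch of how these options map onto PyTorch: L2 regularization is commonly applied through the optimizers' `weight_decay` argument, dropout through `nn.Dropout`, and the three optimizers are available as `optim.SGD`, `optim.RMSprop`, and `optim.Adam`. The helper name `make_optimizer`, the small model, and the values below are illustrative assumptions, not part of the assignment.

```python
import torch.nn as nn
import torch.optim as optim

def make_optimizer(name, params, lr=0.001, l2=1e-4):
    """Return an optimizer by name; weight_decay adds an L2 penalty on the weights."""
    factories = {
        "sgd": optim.SGD,
        "rmsprop": optim.RMSprop,
        "adam": optim.Adam,
    }
    return factories[name](params, lr=lr, weight_decay=l2)

# Hypothetical small network: dropout is active in train() mode and off in eval() mode
model = nn.Sequential(
    nn.Linear(50, 100),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(100, 2),
)

# One optimizer per technique, all sharing the same L2 strength
optimizers = {
    name: make_optimizer(name, model.parameters())
    for name in ("sgd", "rmsprop", "adam")
}
```

Setting `l2=0.0` turns the penalty off, which gives a baseline for judging whether the regularization helps.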



- Creating a fixed-sized vocabulary: Give a unique id to each word in your selected vocabulary and use it as the input to the network

    - Option 1: Feedforward networks can only handle fixed-sized inputs. You can choose a fixed number K of words from the tweet text (e.g. the first K words, K randomly selected words, etc.). K can be a hyperparameter.

    - Option 2: You can choose the top N (e.g. N=1000) most frequent words from the dataset and use an N-sized input layer. If a word is present in a tweet, pass its id, and 0 otherwise.

    - Clearly state your design choices and assumptions. Think about the pros and cons of each option.
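
The two options can be sketched in plain Python as follows. This is a minimal illustration under assumed choices (whitespace tokenization, lowercasing, id 0 reserved for padding or absent words); the function names and sample tweets are hypothetical.

```python
from collections import Counter

def build_vocab(tweets, n):
    """Map the n most frequent words to ids 1..n (0 is reserved for padding/absent)."""
    counts = Counter(word for tweet in tweets for word in tweet.lower().split())
    return {word: idx for idx, (word, _) in enumerate(counts.most_common(n), start=1)}

def encode_first_k(tweet, vocab, k):
    """Option 1: ids of the first k in-vocabulary words, right-padded with 0."""
    ids = [vocab[w] for w in tweet.lower().split() if w in vocab][:k]
    return ids + [0] * (k - len(ids))

def encode_top_n(tweet, vocab):
    """Option 2: one slot per vocabulary word; the word's id if present, else 0."""
    present = set(tweet.lower().split())
    vec = [0] * len(vocab)
    for word, idx in vocab.items():
        if word in present:
            vec[idx - 1] = idx
    return vec

tweets = ["tax cuts help families", "health care for families", "tax reform now"]
vocab = build_vocab(tweets, n=4)
option_1 = encode_first_k(tweets[0], vocab, k=6)   # length-6 list of word ids
option_2 = encode_top_n(tweets[2], vocab)          # length-4 presence vector
```

Option 1 keeps word order but drops everything after the K-th word; Option 2 covers the whole tweet but loses word order, and its input size grows with N.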

 

<b> Tabulate your results, either at the end of the code file or in the text box on the submission page. The final result should have:</b>

1. Experiment description

2. Hyperparameters used and their values

3. Performance on the test set
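
For reference, the usual test-set metrics for a two-class problem can be computed from the predictions as in the sketch below. It is a plain-Python illustration (the function name `binary_metrics` and the toy labels are assumptions); in practice `sklearn.metrics` provides equivalent functions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# A classifier that always predicts the positive class on a balanced test set
y_true = [1, 0] * 50
always_positive = binary_metrics(y_true, [1] * 100)
```

Here the constant predictor has 50% accuracy but an F1 of about 0.67 (precision 0.5, recall 1.0), so F1 is best read alongside accuracy rather than on its own.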

 

Answer starts here

#### Design Choice and Implementation

I used PyTorch to implement this feed-forward network. I designed a deep network with two hidden layers, each with 100 nodes. I applied dropout regularization with several dropout probabilities to see how the test accuracy varies. I also tried different optimization techniques to improve accuracy, and used the ReLU and tanh activation functions. I tuned the other hyperparameters, such as learning rate, embedding size, number of epochs, and batch size, to improve the results further.

To create the fixed-size vocabulary, I chose Option 2: I selected the top N most frequent words and used an input layer of the corresponding size.

I created two models: one predicts the political party from the tweet text, and the other from the tweet handle. Both use nearly the same implementation, with the tweet text as input in the first case and the handle in the second. I one-hot encoded the party names as 0 or 1.
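
A minimal sketch of the one-hot encoding step, in plain Python. The label strings `"Democrat"` and `"Republican"` are assumed for illustration (the actual values in the `Party` column may differ), and `one_hot` is a hypothetical helper. The point is that the position of each class is fixed by the sorted class names, so the same list must be used when turning predicted indices back into party names.

```python
def one_hot(labels):
    """Return (class_names, vectors); class_names[i] is the meaning of position i."""
    class_names = sorted(set(labels))
    index = {name: i for i, name in enumerate(class_names)}
    vectors = [
        [1 if index[label] == i else 0 for i in range(len(class_names))]
        for label in labels
    ]
    return class_names, vectors

# Assumed label strings, for illustration only
labels = ["Republican", "Democrat", "Democrat"]
class_names, vectors = one_hot(labels)

# Decode with the same class_names list that defined the columns
decoded = [class_names[v.index(1)] for v in vectors]
```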

I experimented with multiple hyperparameter settings; the corresponding results are tabulated at the end of the page.
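
One way to organize such a set of experiments is to enumerate the hyperparameter combinations up front and log one result row per configuration. The sketch below only generates the configurations; the grid values and the helper name `iter_configs` are illustrative assumptions, and this is not necessarily how the runs reported here were produced.

```python
from itertools import product

# Candidate values (illustrative; extend or trim as needed)
grid = {
    "dropout": [0.2, 0.25, 0.5],
    "optimizer": ["SGD", "RMSProp", "Adam"],
    "learning_rate": [0.01, 0.001],
}

def iter_configs(grid):
    """Yield one dict per combination of hyperparameter values."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(iter_configs(grid))
```

Each dict in `configs` can then be passed to a training function and stored next to its test accuracy and F1-score.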

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import keras
from sklearn.metrics import precision_score, recall_score, f1_score

# Define the neural network architecture
class PartyClassifier(nn.Module):
    def __init__(self, num_words, embedding_size, hidden_size, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(num_words, embedding_size)
        self.fc1 = nn.Linear(embedding_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, num_classes)
        self.fc3 = nn.Linear(num_classes, num_classes)
        self.dropout = nn.Dropout(0.2) # a single module, reused after fc1 and fc2
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.embedding(x)
        x = torch.mean(x, dim=1) # average the word embeddings to get a fixed-sized vector for the tweet
        x = self.fc1(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc3(x)
        x = self.sigmoid(x) # per-class probabilities for binary cross-entropy
        return x

# Define the hyperparameters
num_words = 1000 # the number of top frequent words to use
embedding_size = 50 # the size of the word embeddings
hidden_size = 150 # the number of neurons in the hidden layer
num_classes = 2 # the number of political parties

# Define the training parameters
learning_rate = 0.001
num_epochs = 32
batch_size = 32

# Load the dataset and preprocess it
# Here we assume that the dataset is stored in a CSV file with three columns: Party, Handle, Tweet
# We will only use the Party and Tweet columns
import pandas as pd
data = pd.read_csv('/var/train.csv')
test_data = pd.read_csv('/var/test.csv')

# print(len(data))
# print(data)
# Cast to str so that missing tweets (read as float NaN) can be tokenized
tweets = data['Tweet'].astype(str)
parties = pd.get_dummies(data['Party']) # one-hot encode the parties

# Tokenize the tweets and assign word IDs
from keras.preprocessing.text import Tokenizer
tokenizer = Tokenizer(num_words=num_words)
tokenizer.fit_on_texts(tweets)
sequences = tokenizer.texts_to_sequences(tweets)

# Pad the sequences to a fixed length
from tensorflow.keras.preprocessing.sequence import pad_sequences
max_length = 100 # the maximum length of a tweet
padded_sequences = pad_sequences(sequences, maxlen=max_length)

# Split the dataset into training and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(padded_sequences, parties, test_size=0.3)

#Convert the training and test sets to PyTorch tensors
x_train = torch.tensor(x_train, dtype=torch.long)
y_train = torch.tensor(y_train.values, dtype=torch.float32)
x_test = torch.tensor(x_test, dtype=torch.long)
y_test = torch.tensor(y_test.values, dtype=torch.float32)

#Create the neural network and the optimizer
model = PartyClassifier(num_words, embedding_size, hidden_size, num_classes)
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

#Train the neural network
# label_names is assumed to match the column order of `parties`; check parties.columns
label_names = ['Rep', 'Dem']
for epoch in range(num_epochs):
    model.train() # enable dropout while training
    running_loss = 0.0
    for i in range(0, len(x_train), batch_size):
        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        inputs = x_train[i:i+batch_size]
        labels = y_train[i:i+batch_size]
        outputs = model(inputs)
        loss = nn.functional.binary_cross_entropy(outputs, labels)

        # Backward pass and optimization
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    # Compute the accuracy on the test set after each epoch
    model.eval() # disable dropout while evaluating
    with torch.no_grad():
        test_outputs = model(x_test)
        test_predictions = torch.argmax(test_outputs, dim=1)
        test_labels = torch.argmax(y_test, dim=1)
        num_correct = torch.sum(test_predictions == test_labels)
        accuracy = num_correct.item() / len(x_test)

    print('Epoch %d, loss: %.3f, accuracy: %.3f' % (epoch+1, running_loss, accuracy))

# Map the class indices from the final epoch back to party names
predicted_labels = [label_names[i] for i in test_predictions.tolist()]
test_labels = [label_names[i] for i in test_labels.tolist()]

print('Predicted labels:', predicted_labels)
print('True labels:', test_labels)
print('Accuracy:', accuracy)

#Evaluate the model on the test set

# Compute precision, recall, and F1-score
model.eval() # disable dropout so the evaluation is deterministic
with torch.no_grad():
    test_outputs = model(x_test)
    test_predictions = torch.argmax(test_outputs, dim=1)
    test_labels = torch.argmax(y_test, dim=1)
    num_correct = torch.sum(test_predictions == test_labels)
    accuracy = num_correct.item() / len(x_test)
    # The positive class is index 1, i.e. the second one-hot column
    precision = precision_score(test_labels.numpy(), test_predictions.numpy())
    recall = recall_score(test_labels.numpy(), test_predictions.numpy())
    f1 = f1_score(test_labels.numpy(), test_predictions.numpy())

print('Test set accuracy: %.3f, precision: %.3f, recall: %.3f, F1-score: %.3f' % (accuracy, precision, recall, f1))


Predicted labels: ['Dem', 'Dem', 'Rep', 'Rep', 'Rep', 'Rep', 'Rep', 'Rep', 'Rep', 'Rep', 'Dem', 'Rep', 'Dem', 'Rep', 'Rep', 'Rep', 'Rep', 'Dem', 'Rep', 'Dem', 'Rep', 'Dem', 'Rep', 'Rep', 'Dem', 'Dem', 'Dem', 'Rep', 'Dem', 'Rep', 'Rep', 'Rep', 'Rep', 'Rep', 'Dem', 'Rep', 'Rep', 'Rep', 'Rep', 'Rep', 'Rep', 'Rep', 'Rep', 'Dem', 'Rep', 'Rep', 'Rep', 'Dem', 'Rep', 'Rep', 'Dem', 'Dem', 'Rep', 'Rep', 'Rep', 'Rep', 'Rep', 'Dem', 'Rep', 'Dem', 'Rep', 'Rep', 'Dem', 'Rep', 'Rep', 'Dem', 'Rep', 'Dem', 'Rep', 'Rep', 'Dem', 'Rep', 'Rep', 'Rep', 'Rep', 'Rep', 'Rep', 'Rep', 'Dem', 'Rep', 'Dem', 'Rep', 'Rep', 'Rep', 'Rep', 'Dem', 'Rep', 'Dem', 'Rep', 'Rep', 'Rep', 'Rep', 'Rep', 'Rep', 'Rep', 'Rep', 'Dem', 'Rep', 'Dem', 'Rep', 'Rep', 'Rep', 'Rep', 'Rep', 'Dem', 'Rep', 'Dem', 'Dem', 'Dem', 'Rep', 'Rep', 'Rep', 'Rep', 'Dem', 'Rep', 'Dem', 'Dem', 'Rep', 'Dem', 'Rep', 'Rep', 'Rep', 'Rep', 'Dem', 'Rep', 'Rep', 'Rep', 'Rep', 'Dem', 'Rep', 'Rep', 'Rep', 'Rep', 'Rep', 'Rep', 'Dem', 'Rep', 'Dem', 'Rep', 'Rep', '

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import keras
from sklearn.metrics import precision_score, recall_score, f1_score

# Define the neural network architecture
class HandleClassifier(nn.Module):
    def __init__(self, num_words, embedding_size, hidden_size, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(num_words, embedding_size)
        self.fc1 = nn.Linear(embedding_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, num_classes)
        self.fc3 = nn.Linear(num_classes, num_classes)
        self.dropout = nn.Dropout(0.25) # a single module, reused after fc1 and fc2
        self.relu = nn.ReLU()
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.embedding(x)
        x = torch.mean(x, dim=1) # average the token embeddings to get a fixed-sized vector for the handle
        x = self.fc1(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc3(x)
        x = self.softmax(x) # class probabilities that sum to 1
        return x

# Define the hyperparameters
num_words = 1000 # the number of top frequent words to use
embedding_size = 50 # the size of the word embeddings
hidden_size = 100 # the number of neurons in the hidden layer
num_classes = 2 # the number of political parties

# Define the training parameters
learning_rate = 0.001
num_epochs = 10
batch_size = 32

# Load the dataset and preprocess it
# Here we assume that the dataset is stored in a CSV file with three columns: Party, Handle, Tweet
# We will only use the Party and Handle columns
import pandas as pd
data = pd.read_csv('/var/train.csv')
test_data = pd.read_csv('/var/test.csv')
# print(len(data))
# print(data)
# Cast to str so that missing handles (read as float NaN) can be tokenized
handle = data['Handle'].astype(str)
parties = pd.get_dummies(data['Party']) # one-hot encode the parties

# Tokenize the handles and assign token IDs
from keras.preprocessing.text import Tokenizer
tokenizer = Tokenizer(num_words=num_words)
tokenizer.fit_on_texts(handle) # fit on the handles, not on the tweets from the previous cell
sequences = tokenizer.texts_to_sequences(handle)

# Pad the sequences to a fixed length
from tensorflow.keras.preprocessing.sequence import pad_sequences
max_length = 100 # the padded sequence length
padded_sequences = pad_sequences(sequences, maxlen=max_length)

# Split the dataset into training and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(padded_sequences, parties, test_size=0.3)

#Convert the training and test sets to PyTorch tensors
x_train = torch.tensor(x_train, dtype=torch.long)
y_train = torch.tensor(y_train.values, dtype=torch.float32)
x_test = torch.tensor(x_test, dtype=torch.long)
y_test = torch.tensor(y_test.values, dtype=torch.float32)

#Create the neural network and the optimizer
model = HandleClassifier(num_words, embedding_size, hidden_size, num_classes)
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

#Train the neural network
# label_names is assumed to match the column order of `parties`; check parties.columns
label_names = ['Rep', 'Dem']
for epoch in range(num_epochs):
    model.train() # enable dropout while training
    running_loss = 0.0
    for i in range(0, len(x_train), batch_size):
        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        inputs = x_train[i:i+batch_size]
        labels = y_train[i:i+batch_size]
        outputs = model(inputs)
        loss = nn.functional.binary_cross_entropy(outputs, labels)

        # Backward pass and optimization
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    # Compute the accuracy on the test set after each epoch
    model.eval() # disable dropout while evaluating
    with torch.no_grad():
        test_outputs = model(x_test)
        test_predictions = torch.argmax(test_outputs, dim=1)
        test_labels = torch.argmax(y_test, dim=1)
        num_correct = torch.sum(test_predictions == test_labels)
        accuracy = num_correct.item() / len(x_test)

    print('Epoch %d, loss: %.3f, accuracy: %.3f' % (epoch+1, running_loss, accuracy))

# Map the class indices from the final epoch back to party names
predicted_labels = [label_names[i] for i in test_predictions.tolist()]
test_labels = [label_names[i] for i in test_labels.tolist()]

print('Predicted labels:', predicted_labels)
print('True labels:', test_labels)
print('Accuracy:', accuracy)

#Evaluate the model on the test set

# Compute precision, recall, and F1-score
model.eval() # disable dropout so the evaluation is deterministic
with torch.no_grad():
    test_outputs = model(x_test)
    test_predictions = torch.argmax(test_outputs, dim=1)
    test_labels = torch.argmax(y_test, dim=1)
    num_correct = torch.sum(test_predictions == test_labels)
    accuracy = num_correct.item() / len(x_test)
    # The positive class is index 1, i.e. the second one-hot column
    precision = precision_score(test_labels.numpy(), test_predictions.numpy())
    recall = recall_score(test_labels.numpy(), test_predictions.numpy())
    f1 = f1_score(test_labels.numpy(), test_predictions.numpy())

print('Test set accuracy: %.3f, precision: %.3f, recall: %.3f, F1-score: %.3f' % (accuracy, precision, recall, f1))



Predicted labels: ['Dem', 'Dem', 'Dem', 'Dem', 'Dem', 'Rep', 'Dem', 'Dem', 'Dem', 'Dem', 'Rep', 'Rep', 'Rep', 'Dem', 'Rep', 'Dem', 'Rep', 'Dem', 'Dem', 'Dem', 'Rep', 'Rep', 'Dem', 'Dem', 'Rep', 'Dem', 'Rep', 'Rep', 'Rep', 'Dem', 'Rep', 'Dem', 'Dem', 'Rep', 'Dem', 'Dem', 'Dem', 'Dem', 'Dem', 'Dem', 'Rep', 'Rep', 'Rep', 'Rep', 'Dem', 'Rep', 'Rep', 'Dem', 'Dem', 'Dem', 'Dem', 'Rep', 'Dem', 'Rep', 'Dem', 'Dem', 'Rep', 'Rep', 'Rep', 'Dem', 'Dem', 'Rep', 'Rep', 'Dem', 'Dem', 'Rep', 'Dem', 'Dem', 'Dem', 'Rep', 'Dem', 'Rep', 'Dem', 'Rep', 'Rep', 'Dem', 'Dem', 'Rep', 'Dem', 'Dem', 'Dem', 'Rep', 'Rep', 'Dem', 'Rep', 'Dem', 'Dem', 'Dem', 'Dem', 'Rep', 'Dem', 'Rep', 'Dem', 'Rep', 'Dem', 'Dem', 'Dem', 'Rep', 'Rep', 'Dem', 'Rep', 'Rep', 'Rep', 'Rep', 'Dem', 'Dem', 'Dem', 'Dem', 'Dem', 'Dem', 'Dem', 'Rep', 'Rep', 'Dem', 'Rep', 'Dem', 'Dem', 'Dem', 'Rep', 'Rep', 'Dem', 'Dem', 'Dem', 'Dem', 'Dem', 'Dem', 'Dem', 'Rep', 'Dem', 'Dem', 'Rep', 'Rep', 'Dem', 'Rep', 'Dem', 'Rep', 'Dem', 'Dem', 'Dem', 'Dem', '

In [None]:
from prettytable import PrettyTable

# Create a table object with column names
table = PrettyTable()
table.field_names = ["Experiment","Predicting from Tweet/Handle","FC Layers" ,"Dropout Probability", "Optimizer", "Learning Rate", "Batch Size", "Num of epochs","Test Accuracy","F1-Score"]

# Add rows to the table
table.add_row(["1", "Tweet", "3",  "0.2", "Adam", "0.01", "500","50", "64.7%", "70.5%"])
table.add_row(["2", "Handle", "3",  "0.2", "Adam", "0.001", "500","50", "64.1%", "70.9%"])
table.add_row(["3", "Handle", "3",  "0.2", "SGD", "0.001", "500","50", "51.4%", "67.7%"])
table.add_row(["4", "Tweet", "3",  "0.2", "SGD", "0.01", "500","50", "49.9%", "54.4%"])
table.add_row(["5", "Tweet", "3",  "0.2", "RMSProp", "0.01", "500","50", "64.33%", "70.6%"])
table.add_row(["6", "Handle", "3",  "0.2", "RMSProp", "0.001", "500","50", "51.3%", "67.8%"])
table.add_row(["7", "Tweet", "3",  "0.5", "Adam", "0.01", "500","50", "59.2%", "66.6%"])
table.add_row(["8", "Handle", "3",  "0.5", "Adam", "0.001", "500","50", "63.0%", "70.9%"])
table.add_row(["9", "Tweet", "3",  "0.25", "Adam", "0.01", "500","100", "51.4%", "67.6%"])
table.add_row(["10", "Handle", "3",  "0.5", "Adam", "0.001", "500","100", "51%", "67.5%"])
table.add_row(["11", "Tweet", "2",  "0.25", "Adam", "0.01", "500","100", "66%", "67.8%"])
table.add_row(["12", "Handle", "2",  "0.5", "Adam", "0.001", "500","100", "67.4%", "68.7%"])
table.add_row(["13", "Tweet", "2",  "0.25", "SGD", "0.01", "500","100", "52.29%", "67.5%"])
table.add_row(["14", "Handle", "2",  "0.5", "SGD", "0.001", "500","100", "51.1%", "64.8%"])
table.add_row(["15", "Tweet", "2",  "0.25", "Adam", "0.01", "500","100", "67.7%", "69.4%"])
table.add_row(["16", "Handle", "2",  "0.25", "Adam", "0.01", "500","100", "51.6%", "67.1%"])
table.add_row(["17", "Tweet", "2",  "0.25", "Adam", "0.01", "32","10", "68.0%", "70%"])
table.add_row(["18", "Tweet", "2",  "0.25", "Adam", "0.01", "64","10", "67.9%", "70.5%"])
table.add_row(["19", "Tweet", "2",  "0.2", "Adam", "0.01", "128","10", "67.5%", "68.6%"])
table.add_row(["20", "Tweet", "2",  "0.2", "RMSProp", "0.01", "32","10", "63.6%", "52.6%"])
table.add_row(["21", "Handle", "2",  "0.25", "SGD", "0.01", "32","100", "55.1%", "56.7%"])
table.add_row(["22", "Handle", "2",  "0.25", "Adam", "0.001", "32","10", "67.6%", "70.2%"])
table.add_row(["23", "Handle", "3",  "0.25", "Adam", "0.001", "32","10", "65.5%", "62.9%"])
table.add_row(["24", "Tweet", "3",  "0.25", "Adam", "0.001", "32","10", "51.4%", "67.9%"])
table.add_row(["25", "Tweet", "3",  "0.25", "Adam", "0.001", "250","100", "63.5%", "61.5%"])


# Print the table
print(table)


+------------+------------------------------+-----------+---------------------+-----------+---------------+------------+---------------+---------------+----------+
| Experiment | Predicting from Tweet/Handle | FC Layers | Dropout Probability | Optimizer | Learning Rate | Batch Size | Num of epochs | Test Accuracy | F1-Score |
+------------+------------------------------+-----------+---------------------+-----------+---------------+------------+---------------+---------------+----------+
|     1      |            Tweet             |     3     |         0.2         |    Adam   |      0.01     |    500     |       50      |     64.7%     |  70.5%   |
|     2      |            Handle            |     3     |         0.2         |    Adam   |     0.001     |    500     |       50      |     64.1%     |  70.9%   |
|     3      |            Handle            |     3     |         0.2         |    SGD    |     0.001     |    500     |       50      |     51.4%     |  67.7%   |
|     4      |  