# Text Classification with a Single-Head Model

This notebook trains a single-head neural network to classify text by the sex of the author (male or female), using precomputed TF-IDF features. The code is divided into the following sections:

**Setting Up Environment:**<br>

 - Importing the necessary libraries: NumPy for numerical operations, warnings for suppressing warnings, random for setting random seeds, scikit-learn for splitting the data and computing the F1 score, and PyTorch for building and training neural networks.
 - Setting up random seeds for reproducibility to ensure consistent results across runs.

**Loading and Preprocessing Data:**<br>

 - Loading the TF-IDF features from a NumPy file ('tf-idf.npy') into the variable tf_fit.
 - Reading the sex labels from a text file ('sex.txt'), subtracting 1 from each label, and storing them in sex_list.
 - Assigning sex_list to labels, the target vector for classification.
 - Splitting the dataset into training (70%) and testing (30%) sets using the train_test_split function.
 - Converting both sets to PyTorch tensors and wrapping them in data loaders with a batch size of 16.
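
The label handling described above can be sketched as follows. This is a minimal illustration, assuming the file stores one label per line with the values 1 and 2 (the notebook does not state the encoding); `raw_lines` is a hypothetical in-memory stand-in for the contents of 'sex.txt'. `nn.CrossEntropyLoss` expects class indices in the range 0 to C-1, which is a likely reason for subtracting 1.

```python
# Hypothetical stand-in for the lines of 'sex.txt' (assumed to hold 1 or 2)
raw_lines = ["1\n", "2\n", "2\n", "1\n"]

# Same transformation as in the notebook: strip the newline, cast, shift by 1
sex_list = [int(line.rstrip("\n")) - 1 for line in raw_lines]

# With a two-unit output layer, the valid class indices are 0 and 1
assert set(sex_list) <= {0, 1}
print(sex_list)
```

If the real file used a different encoding, the shifted labels would fall outside this range and the loss would raise an error during training.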

**Defining the Neural Network Model:**<br>

 - Creating a neural network class named MyNet that inherits from nn.Module.
 - The neural network has one hidden layer (fc1) with ReLU activation and an output layer (fc_sex) for binary sex classification.
 - Initializing an instance of the neural network, named net.

**Training the Neural Network:**<br>

 - Defining the loss function as cross-entropy loss (nn.CrossEntropyLoss) and the optimizer as Adam (optim.Adam).
 - Iterating over the training batches for 20 epochs, performing a forward pass, a backward pass, and a weight update on each batch.
 - The training process aims to minimize the cross-entropy loss.
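
To make the loss concrete, the sketch below compares `nn.CrossEntropyLoss` with a manual computation on toy logits. The values are illustrative and not taken from the notebook's data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy logits for a batch of 3 samples and 2 classes (illustrative values)
logits = torch.tensor([[2.0, 0.5], [0.1, 1.2], [0.3, 0.3]])
targets = torch.tensor([0, 1, 0])

# Built-in loss: applies log-softmax internally, then averages over the batch
builtin_loss = nn.CrossEntropyLoss()(logits, targets)

# Manual computation: negative log-probability of the true class, averaged
log_probs = F.log_softmax(logits, dim=1)
manual_loss = -log_probs[torch.arange(len(targets)), targets].mean()

print(builtin_loss.item(), manual_loss.item())
assert torch.allclose(builtin_loss, manual_loss)
```

Because the loss applies the softmax itself, the model's output layer returns raw logits and does not need a softmax of its own.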

**Testing the Neural Network:**<br>

 - Evaluating the model on the test set at the end of every epoch.
 - Calculating accuracy and the macro-averaged F1 score for the sex predictions.
 - Printing the accuracy and F1 score for each epoch.
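
As a rough illustration of the two metrics, the sketch below computes accuracy and macro F1 by hand on a toy example. The helper `macro_f1` is hypothetical and written only for this illustration; the notebook itself relies on scikit-learn's `f1_score` with `average='macro'`, which should give the same value on this input.

```python
import numpy as np

def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))  # true positives for class c
        fp = np.sum((y_pred == c) & (y_true != c))  # false positives
        fn = np.sum((y_pred != c) & (y_true == c))  # false negatives
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

y_true = [0, 0, 0, 1, 1]
y_pred = [0, 0, 1, 1, 0]

# Accuracy as a percentage: 3 of the 5 predictions match
accuracy = 100 * np.mean(np.array(y_true) == np.array(y_pred))
print(accuracy)
print(macro_f1(y_true, y_pred))  # (2/3 + 1/2) / 2 = 7/12
```

Macro averaging weights both classes equally, so it is more sensitive than accuracy to poor performance on a minority class.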

## Part 0. Setting Up Environment

In [None]:
# Import necessary libraries
import random
import warnings

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data as Data
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Suppress warnings
warnings.filterwarnings('ignore')

In [None]:
# Set random seed for reproducibility
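def setup_seed(seed):
    torch.manual_seed(seed)               # Seed the CPU generator
    torch.cuda.manual_seed_all(seed)      # Seed every GPU generator
    np.random.seed(seed)                  # Seed NumPy
    random.seed(seed)                     # Seed Python's built-in generator
    torch.backends.cudnn.deterministic = True  # Use deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False     # Disable non-deterministic autotuning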

# Set random seed
setup_seed(20)
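
A quick way to check that seeding behaves as intended is to seed twice and compare the random draws. The sketch below repeats the helper so that it is self-contained; it only verifies the three generators, not full end-to-end determinism of GPU training.

```python
import random

import numpy as np
import torch

def setup_seed(seed):
    # Copy of the notebook's seeding helper so this sketch is self-contained
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    random.seed(seed)
    torch.backends.cudnn.deterministic = True

setup_seed(20)
first = (torch.rand(3), np.random.rand(3), random.random())

setup_seed(20)
second = (torch.rand(3), np.random.rand(3), random.random())

# Re-seeding should reproduce the same draws from all three generators
assert torch.equal(first[0], second[0])
assert np.array_equal(first[1], second[1])
assert first[2] == second[2]
```

The draws advance the generators, so run this in a scratch session or call `setup_seed(20)` again afterwards.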

## Part 1. Loading and Preprocessing Data

In [None]:
# Load TF-IDF matrix
tf_fit = np.load('tf-idf.npy', allow_pickle=True)
print(tf_fit.shape)

In [None]:
# Load sex labels
with open('sex.txt', 'r') as f:
    sex_list = [line.rstrip('\n') for line in f]
sex_list = [int(x)-1 for x in sex_list]
print(len(sex_list))

In [None]:
# Use the sex labels as the classification targets
labels = sex_list
print(labels[0])

In [None]:
# Split the dataset
x_train, x_test, y_train, y_test = train_test_split(tf_fit, labels, test_size=0.3, random_state=2024)

In [None]:
# Choose device (CPU or GPU)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [None]:
# Convert data to PyTorch tensors
X_trn_torch = torch.from_numpy(x_train)
Y_trn_torch_sex = torch.from_numpy(np.array(y_train))
X_tst_torch = torch.from_numpy(x_test)
Y_tst_torch_sex = torch.from_numpy(np.array(y_test))

# Create PyTorch datasets
torch_trn_dataset = Data.TensorDataset(X_trn_torch, Y_trn_torch_sex)
torch_tst_dataset = Data.TensorDataset(X_tst_torch, Y_tst_torch_sex)

# Batch size
bsize = 16

# Create PyTorch data loaders
trainloader = Data.DataLoader(
    dataset=torch_trn_dataset,
    batch_size=bsize,
    shuffle=True,
    num_workers=2,
)

testloader = Data.DataLoader(
    dataset=torch_tst_dataset,
    batch_size=bsize,
    shuffle=False,  # Evaluation does not need shuffling
    num_workers=2,
)
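
Before training, it can help to pull one batch from a loader and inspect its shape and dtype. The sketch below uses synthetic arrays (`x_demo`, `y_demo`, both hypothetical) in place of the real features, and assumes the features are float64 like a default NumPy array; that would explain why the training loop later casts inputs to float32, the default dtype of `nn.Linear` weights.

```python
import numpy as np
import torch
import torch.utils.data as Data

# Synthetic stand-ins for x_train / y_train (shapes and values are illustrative)
x_demo = np.random.rand(64, 50)  # float64 by default
y_demo = np.random.randint(0, 2, size=64).astype(np.int64)

demo_dataset = Data.TensorDataset(torch.from_numpy(x_demo), torch.from_numpy(y_demo))
demo_loader = Data.DataLoader(demo_dataset, batch_size=16, shuffle=True, num_workers=0)

# Pull a single batch and inspect it
xb, yb = next(iter(demo_loader))
print(xb.shape, xb.dtype)  # features keep NumPy's float64 dtype
print(yb.shape, yb.dtype)
print(len(demo_loader))    # number of batches per epoch
```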

## Part 2. Defining the Neural Network Model

In [None]:
# Define the neural network model
class MyNet(nn.Module):
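    def __init__(self, input_dim=10000):
        # input_dim must equal the number of TF-IDF features, i.e. tf_fit.shape[1]
        super().__init__()
        self.fc1 = nn.Linear(input_dim, 100)  # Hidden layer
        self.fc_sex = nn.Linear(100, 2)       # Output layer: one logit per class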

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x_sex = self.fc_sex(x)
        return x_sex
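
A dummy forward pass is a cheap sanity check on the architecture. The sketch below uses `SingleHeadNet`, a hypothetical stand-in with the same layer sizes as `MyNet`, and confirms the output shape and the number of trainable parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleHeadNet(nn.Module):
    # Hypothetical stand-in with the same layer sizes as MyNet
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10000, 100)
        self.fc_sex = nn.Linear(100, 2)

    def forward(self, x):
        return self.fc_sex(F.relu(self.fc1(x)))

demo_net = SingleHeadNet()
dummy_batch = torch.zeros(4, 10000)  # 4 fake samples with 10000 features each

with torch.no_grad():
    logits = demo_net(dummy_batch)

n_params = sum(p.numel() for p in demo_net.parameters())
print(logits.shape)  # one row of 2 logits per sample
print(n_params)      # 10000*100 + 100 + 100*2 + 2 = 1000302
```

Almost all of the parameters sit in the first layer, so its input size has to match the width of the TF-IDF matrix exactly.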

In [None]:
# Create an instance of the neural network
net = MyNet().to(device)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.01)

## Part 3. Training and Testing the Neural Network

In [None]:
# Train the network
for epoch in range(20):  # Iterate over 20 epochs
    net.train()  # Training mode (matters once dropout or batch-norm layers are added)
    for inputs, slabels in trainloader:
        # Cast to the dtypes the model and loss expect, then move to the device
        inputs = inputs.to(device, dtype=torch.float32)
        slabels = slabels.to(device, dtype=torch.int64)
        optimizer.zero_grad()  # Clear gradients
        outputs = net(inputs)  # Forward pass
        loss = criterion(outputs, slabels)  # Calculate loss
        loss.backward()  # Backward pass
        optimizer.step()  # Update weights

    # Test the network after each epoch
    net.eval()  # Evaluation mode
    correct = 0
    total = 0
    all_predicted = []
    all_labels = []
    with torch.no_grad():  # During testing, we don't need to compute gradients
        for inputs, slabels in testloader:
            # Cast dtypes and move the batch to the same device as the model
            inputs = inputs.to(device, dtype=torch.float32)
            slabels = slabels.to(device, dtype=torch.int64)

            outputs = net(inputs)  # Forward pass
            _, predicted = torch.max(outputs, 1)  # Index of the largest logit per row
            total += slabels.size(0)
            correct += (predicted == slabels).sum().item()

            # Save predicted results and true labels for calculating F1 score
            all_predicted.extend(predicted.cpu().numpy())
            all_labels.extend(slabels.cpu().numpy())

    accuracy = 100 * correct / total
    f1 = f1_score(all_labels, all_predicted, average='macro')
    print(f'Epoch {epoch+1}, Accuracy: {accuracy:.2f}%, F1 Score: {f1:.4f}')