# Label flipping attack

We use the iris dataset to train and test a binary classification model, and then perform a label flipping attack on that model.

In [9]:
import numpy as np

import torch
import torch.optim as optim
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.parameter import Parameter
from torch.utils.data import DataLoader, TensorDataset

from sklearn import datasets
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix
from sklearn.model_selection import train_test_split

## Load iris dataset from scikit-learn and convert the data to PyTorch tensors

To turn this into a binary classification task, treat “virginica” as the positive class (1) and the other two species, “setosa” and “versicolor”, as the negative class (0, non-virginica).

In [10]:
iris = datasets.load_iris()

X = iris.data
y = (iris.target == 2).astype(int)  # if virginica, then 1; if setosa or versicolor, then 0

#Split the data into two sets: 75% for training and 25% for testing (test_size=0.25)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
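
As a quick sanity check on the labels and the split above, the sketch below reproduces the class counts and split sizes with NumPy alone. It assumes the standard iris layout of 150 samples with 50 per class; `target` is a stand-in for `iris.target`, and the rounding (test size rounded up) reflects my understanding of how `train_test_split` handles a fractional `test_size`, not something stated in this notebook.

```python
import math
import numpy as np

# Stand-in for iris.target: 50 samples each of classes 0, 1, 2 (assumed layout)
target = np.repeat([0, 1, 2], 50)

# Same binarization as above: virginica (class 2) -> 1, everything else -> 0
y_binary = (target == 2).astype(int)
class_counts = np.bincount(y_binary)   # [number of negatives, number of positives]

# Split sizes for test_size=0.25 (assumed rounding: test size rounded up)
n_samples = len(y_binary)
n_test = math.ceil(0.25 * n_samples)
n_train = n_samples - n_test

print(class_counts, n_train, n_test)
```

The negative class is twice as large as the positive class, which is one reason to report the F1 score alongside accuracy.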

## Train a binary classification model

Because the focus here is on performing and evaluating the label flipping attack, we keep model training simple and use `LogisticRegression` from `sklearn.linear_model` directly to train the binary classification model.

In [11]:
from sklearn.linear_model import LogisticRegression

binary_model = LogisticRegression(solver="newton-cg", random_state=42)
binary_model.fit(X_train, y_train)
y_pred = binary_model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print("Test accuracy: %.4f, F1_score: %.4f" % (accuracy, f1))

Test accuracy: 1.0000, F1_score: 1.0000


**<font color='red'>Before the label flipping attack, the test accuracy is 1.0000 and the F1 score is 1.0000.</font>**

## Poisoning attack with label flipping

#### Use closed form method
For a linear classifier trained with mean squared loss, the weight matrix has the closed-form least-squares solution $W=(X^T\cdot X)^{-1}X^T\hat{y}$. Substituting this into the classifier gives $f_W(X) = X \cdot W = X \cdot (X^T\cdot X)^{-1}X^T\hat{y}$. Since the training features $X$ are fixed, $\hat{y}$ is the only free parameter in this function. We treat $\hat{y}$ as a vector of per-sample scores, later normalized to [0, 1], that indicate how likely each label is to be flipped to 1, and we optimize $\hat{y}$ with gradient descent. The new linear classifier is defined as follows:

In [12]:
class LinearClassificationNet(nn.Module):
    def __init__(self, train_num):
        super(LinearClassificationNet, self).__init__()
        self.y_hat = torch.ones(train_num, 1, dtype=torch.float32) #Each training sample has a probability for label flipping 
        self.y_hat = 0.5 * self.y_hat                  #Initialize y_hat = 0.5
        self.y_hat = Parameter(self.y_hat, requires_grad=True)

    def closedform(self, x):
        x_t = torch.transpose(x, 0, 1)
        x_x = torch.mm(x_t, x)               #X^T.X
        x_x_1 = torch.inverse(x_x)           #(X^T.X)^-1
        x_x_1_t = torch.mm(x_x_1, x_t)       #(X^T.X)^-1.X^T
        
        return torch.mm(x_x_1_t, self.y_hat) #(X^T.X)^-1.X^T.y_hat
    
    def forward(self, x):
        #Linear model is implemented as matrix multiplication between X and W (f(X) = X.W)
        #Here W is represented using closed form
        closedform = self.closedform(x)
        y = torch.mm(x, closedform)
        
        return y
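
To make the closed form above more concrete, here is a minimal NumPy sketch that checks $W=(X^T\cdot X)^{-1}X^T\hat{y}$ against NumPy's least-squares solver and shows that the predictions are a fixed linear map of $\hat{y}$. The random `X_demo` and `y_hat_demo` arrays are hypothetical stand-ins, not the iris data, and like the class above the sketch uses no intercept term.

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 4))        # hypothetical features: 20 samples, 4 columns
y_hat_demo = rng.uniform(size=(20, 1))   # hypothetical soft labels in [0, 1)

# Closed form: W = (X^T X)^{-1} X^T y_hat
W_closed = np.linalg.inv(X_demo.T @ X_demo) @ X_demo.T @ y_hat_demo

# Reference solution from NumPy's least-squares solver
W_lstsq, *_ = np.linalg.lstsq(X_demo, y_hat_demo, rcond=None)

# Predictions are linear in y_hat: f_W(X) = H @ y_hat with H = X (X^T X)^{-1} X^T
H = X_demo @ np.linalg.inv(X_demo.T @ X_demo) @ X_demo.T
preds = X_demo @ W_closed

print(np.allclose(W_closed, W_lstsq), np.allclose(preds, H @ y_hat_demo))
```

Because `H` depends only on the features, gradients with respect to $\hat{y}$ flow through a constant matrix, which is what lets $\hat{y}$ be optimized directly. For ill-conditioned features, a solver such as `np.linalg.lstsq` is usually a safer choice than an explicit inverse.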

#### Set up hyperparameters

In [13]:
torch.manual_seed(42)
epochs = 100
learning_rate = 0.01
weight_decay = 5e-4
lossfunction = nn.BCEWithLogitsLoss()

train_num = X_train.shape[0]

linear_model = LinearClassificationNet(train_num)
linear_model_optimizer = optim.Adam(linear_model.parameters(), lr=learning_rate, weight_decay=weight_decay)

#Convert the data to tensors that can be used by PyTorch
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32).view(-1, 1)

#### Maximize the loss to obtain $\hat{y}$

Gradient-based optimizers minimize their objective by default. To maximize `lossfunction`, we therefore minimize `-lossfunction` with gradient descent.
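
The sign trick can be illustrated on a toy problem, independent of the classifier. The sketch below maximizes a hypothetical concave function `toy_objective` by running plain gradient descent on its negation; the function, the starting point, and the step size are all illustrative assumptions.

```python
def toy_objective(theta):
    """Concave toy objective with its maximum at theta = 3."""
    return -(theta - 3.0) ** 2

def grad_of_negated_objective(theta):
    """Gradient of -toy_objective(theta) = (theta - 3)^2 with respect to theta."""
    return 2.0 * (theta - 3.0)

theta = 0.0
step_size = 0.1
for _ in range(100):
    # A gradient descent step on the negated objective
    # is a gradient ascent step on the objective itself
    theta -= step_size * grad_of_negated_objective(theta)

print(theta, toy_objective(theta))
```

The same pattern explains why the loss values printed during training are negative and keep decreasing: the underlying loss is increasing.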

In [14]:
#Define the function for linear model training
def train_linear_model(epoch, linear_model, linear_model_optimizer, X, y, lossfunction):
    linear_model.train()
    linear_model_optimizer.zero_grad()
    linear_model_outputs = linear_model(X)
    linear_model_loss = -lossfunction(linear_model_outputs, y) #Need to place "-" before loss function
    linear_model_loss.backward()
    linear_model_optimizer.step()

    print('Epoch: {:d}'.format(epoch+1),
          'linear_model_loss: {:.4f}'.format(linear_model_loss.item()))

#Define the function that returns model parameters
def weight_parameters(model):
    model.eval()
    with torch.no_grad():
        parameters = list(model.parameters())[0]
    
    return parameters.detach().squeeze()

#Train the model
for epoch in range(epochs):
    train_linear_model(epoch, linear_model, linear_model_optimizer, X_train_tensor, y_train_tensor, lossfunction)

#Obtain y_hat
#y_hat is the parameters optimized by maximizing the loss function  
y_hat = weight_parameters(linear_model)
#Convert the parameters to the label flipping probabilities by normalizing the parameters to [0, 1]
y_hat = nn.Sigmoid()(y_hat)
print(y_hat)

Epoch: 1 linear_model_loss: -0.7991
Epoch: 2 linear_model_loss: -0.8027
Epoch: 3 linear_model_loss: -0.8063
Epoch: 4 linear_model_loss: -0.8099
Epoch: 5 linear_model_loss: -0.8136
Epoch: 6 linear_model_loss: -0.8173
Epoch: 7 linear_model_loss: -0.8210
Epoch: 8 linear_model_loss: -0.8247
Epoch: 9 linear_model_loss: -0.8284
Epoch: 10 linear_model_loss: -0.8322
Epoch: 11 linear_model_loss: -0.8359
Epoch: 12 linear_model_loss: -0.8397
Epoch: 13 linear_model_loss: -0.8435
Epoch: 14 linear_model_loss: -0.8474
Epoch: 15 linear_model_loss: -0.8512
Epoch: 16 linear_model_loss: -0.8551
Epoch: 17 linear_model_loss: -0.8590
Epoch: 18 linear_model_loss: -0.8629
Epoch: 19 linear_model_loss: -0.8668
Epoch: 20 linear_model_loss: -0.8708
Epoch: 21 linear_model_loss: -0.8748
Epoch: 22 linear_model_loss: -0.8788
Epoch: 23 linear_model_loss: -0.8828
Epoch: 24 linear_model_loss: -0.8868
Epoch: 25 linear_model_loss: -0.8909
Epoch: 26 linear_model_loss: -0.8950
Epoch: 27 linear_model_loss: -0.8991
Epoch: 28 

#### Based on the obtained $\hat{y}$, select top training samples to perform label flipping

Because $\hat{y}$ indicates how likely each sample's label is to be flipped to 1, we use these probabilities directly to select the top training samples for label flipping.

In [15]:
#The number of labels to flip: epsilon
epsilon = 20
cnt = 0

#Sort the sample indices by flipping probability, from highest to lowest
indices = torch.argsort(y_hat, descending=True)
flipped_labels = y_train.copy()

#Select the training samples with the largest probabilities and flip them to 1 
for idx in indices:
    if flipped_labels[idx] != 1 and y_hat[idx] > 0.5:
        flipped_labels[idx] = 1
        cnt += 1
        if cnt == epsilon:
            break
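
The selection loop above can also be written as a small reusable function. The sketch below is a NumPy-only variant with the same rule: flip at most `k` labels from 0 to 1, taking the highest scores first and skipping scores at or below a threshold. The name `flip_top_k` and the toy arrays are hypothetical and are not part of the notebook.

```python
import numpy as np

def flip_top_k(labels, scores, k, threshold=0.5):
    """Hypothetical helper: flip up to k labels from 0 to 1, highest scores first."""
    flipped = labels.copy()
    order = np.argsort(-scores)   # indices sorted by descending score
    # Keep only samples that are currently 0 and whose score exceeds the threshold
    eligible = [i for i in order if flipped[i] == 0 and scores[i] > threshold]
    chosen = np.array(eligible[:k], dtype=int)
    flipped[chosen] = 1
    return flipped, chosen

toy_labels = np.array([0, 1, 0, 0, 1, 0])
toy_scores = np.array([0.9, 0.8, 0.4, 0.7, 0.95, 0.6])
toy_flipped, toy_chosen = flip_top_k(toy_labels, toy_scores, k=2)
print(toy_flipped, toy_chosen)   # prints [1 1 0 1 1 0] [0 3]
```

Like the loop above, this flips labels in one direction only (0 to 1), and it may flip fewer than `k` labels when too few samples pass the threshold.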

## Use the flipped labels to create poisoned training data and retrain the model

The poisoned training data is `(X_train, flipped_labels)`.

In [16]:
poisoned_binary_model = LogisticRegression(solver="newton-cg", random_state=42)
poisoned_binary_model.fit(X_train, flipped_labels)
y_pred_poisoned = poisoned_binary_model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred_poisoned)
f1 = f1_score(y_test, y_pred_poisoned)

print("Test accuracy: %.4f, F1_score: %.4f" % (accuracy, f1))

Test accuracy: 0.7368, F1_score: 0.7059


**<font color='red'>Before the label flipping attack, the test accuracy is 1.0000 and the F1 score is 1.0000. After the attack (20 flipped training labels), the test accuracy drops to 0.7368 and the F1 score drops to 0.7059, a significant decrease in classification performance.</font>**
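
To relate the two reported scores to each other, the sketch below computes accuracy and F1 from confusion-matrix counts. With 38 test samples, the reported values correspond to 28 correct predictions, 12 true positives, and 10 errors in total. How those 10 errors split into false positives and false negatives is not shown in the output above, so the split used here (10 and 0) is an assumption chosen only because flipping labels to 1 tends to push the model toward positive predictions.

```python
def accuracy_and_f1(tp, tn, fp, fn):
    """Compute accuracy and positive-class F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1_value = 2 * tp / (2 * tp + fp + fn)
    return accuracy, f1_value

# Counts consistent with the reported scores on 38 test samples.
# The FP/FN split (10 and 0) is an assumption; only FP + FN = 10 is implied.
acc, f1_value = accuracy_and_f1(tp=12, tn=16, fp=10, fn=0)
print("%.4f %.4f" % (acc, f1_value))   # prints 0.7368 0.7059
```

Calling `confusion_matrix(y_test, y_pred_poisoned)`, which is already imported at the top of the notebook, would show the actual split.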