# Colab Setup  
> Make sure you configure the notebook to use a GPU: click Edit -> Notebook settings -> Hardware accelerator -> GPU.

> Uncomment the following cell after opening the notebook in Google Colab. (Do not uncomment it in a local setup.)  

<a target="_blank" href="https://colab.research.google.com/github/SEED-VT/FedDebug/blob/main/fault-localization/Reproduce_Table1-Table2.ipynb"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> </a>


In [1]:
# !pip install pytorch-lightning
# !pip install diskcache
# !pip install dotmap
# !pip install torch torchvision torchaudio
# !pip install matplotlib
# !git clone https://github.com/SEED-VT/FedDebug.git
# # appending the path
# import sys
# sys.path.append("FedDebug/fault-localization/")

# Description

- The notebook defines the simulation parameters, such as the learning rate, batch size, noise rate, number of clients, and number of epochs.

- It then runs the simulation for each configuration (`args`) from Table 1 and Table 2.

- Finally, it prints the faulty-client localization accuracy, along with the data distribution, the number of faulty clients, the total number of clients, the architecture, and the dataset used in the simulation.
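
The next cell sets `args.faulty_clients_ids` as a comma-separated string, and its inline comment limits the IDs to values below `args.clients` and the number of faulty clients to fewer than 7. A minimal sketch of validating that string before launching a long simulation; `parse_faulty_client_ids` is a hypothetical helper that is not part of FedDebug, and `max_faulty=6` simply mirrors that comment.

```python
from typing import List


def parse_faulty_client_ids(ids: str, num_clients: int, max_faulty: int = 6) -> List[int]:
    """Parse a string such as "0,1,2" into [0, 1, 2] and enforce the notebook's limits.

    Hypothetical helper (not part of FedDebug); max_faulty=6 mirrors the
    "fewer than 7 faulty clients" guidance in the configuration cell.
    """
    # int() tolerates surrounding whitespace, so "0, 1,2" also parses.
    parsed = [int(tok) for tok in ids.split(",") if tok.strip()]
    if not parsed:
        raise ValueError("at least one faulty client ID is required")
    if len(set(parsed)) != len(parsed):
        raise ValueError(f"duplicate client IDs in {ids!r}")
    if len(parsed) > max_faulty:
        raise ValueError(f"{len(parsed)} faulty clients exceeds the limit of {max_faulty}")
    out_of_range = [c for c in parsed if not 0 <= c < num_clients]
    if out_of_range:
        raise ValueError(f"client IDs {out_of_range} are outside 0..{num_clients - 1}")
    return parsed


print(parse_faulty_client_ids("0", num_clients=30))      # [0]
print(parse_faulty_client_ids("0,1,2", num_clients=10))  # [0, 1, 2]
```

Failing fast on a bad ID is cheaper than discovering it after every client has been trained.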
 

In [2]:
import logging
import matplotlib.pyplot as plt
import time
from dotmap import DotMap
from pytorch_lightning import seed_everything
from torch.nn.init import kaiming_uniform_ 
from utils.faulty_client_localization.FaultyClientLocalization import FaultyClientLocalization
from utils.faulty_client_localization.InferenceGuidedInputs import InferenceGuidedInputs
from utils.FLSimulation import trainFLMain
from utils.fl_datasets import initializeTrainAndValidationDataset
from utils.util import aggToUpdateGlobalModel
from utils.util import testAccModel



logging.basicConfig(filename='example.log', level=logging.ERROR)
logger = logging.getLogger("pytorch_lightning")
seed_everything(786)



def evaluateFaultLocalization(predicted_faulty_clients_on_each_input, true_faulty_clients):
    """Return the mean percentage of true faulty clients localized per generated input."""
    true_faulty_clients = set(true_faulty_clients)
    detection_acc = 0
    for pred_faulty_clients in predicted_faulty_clients_on_each_input:
        print(f"+++ Faulty Clients {pred_faulty_clients}")
        # Share of the true faulty clients that appear in this input's prediction.
        correct_localize_faults = len(
            true_faulty_clients.intersection(pred_faulty_clients))
        acc = (correct_localize_faults / len(true_faulty_clients)) * 100
        detection_acc += acc
    # Average the per-input scores over all generated inputs.
    fault_localization_acc = detection_acc / len(predicted_faulty_clients_on_each_input)
    return fault_localization_acc


def runFaultyClientLocalization(client2models, exp2info, num_bugs, random_generator=kaiming_uniform_,
                                apply_transform=True, k_gen_inputs=10, na_threshold=0.003, use_gpu=True):
    print(">  Running FaultyClientLocalization ..")
    input_shape = list(exp2info['data_config']['single_input_shape'])
    # Step 1: generate k_gen_inputs test inputs with InferenceGuidedInputs (generation time is returned).
    generate_inputs = InferenceGuidedInputs(client2models, input_shape, randomGenerator=random_generator, apply_transform=apply_transform,
                                            dname=exp2info['data_config']['name'], min_nclients_same_pred=5, k_gen_inputs=k_gen_inputs)
    selected_inputs, input_gen_time = generate_inputs.getInputs()

    # Step 2: localize faulty clients on the generated inputs and time it.
    start = time.time()
    faultyclientlocalization = FaultyClientLocalization(
        client2models, selected_inputs, use_gpu=use_gpu)

    # One set of suspected faulty client IDs per generated input
    # (na_threshold is the neuron-activation threshold).
    potential_faulty_clients_for_each_input = faultyclientlocalization.runFaultLocalization(
        na_threshold, num_bugs=num_bugs)
    fault_localization_time = time.time() - start
    return potential_faulty_clients_for_each_input, input_gen_time, fault_localization_time



results = {}

# ====== Simulation ===== 

args = DotMap()
args.lr = 0.001
args.weight_decay = 0.0001
args.batch_size = 512

args.noise_rate = 1  # noise rate applied to the faulty client(s), from 0 to 1
args.clients = 30  # on Colab, use at most 30 clients with resnet18, resnet34, or densenet121
args.epochs = 10  # recommended range: 10-25
args.faulty_clients_ids = "0"  # comma-separated IDs, e.g. "0,1,2"; each ID must be < args.clients, and use fewer than 7 faulty clients


Global seed set to 786
Global seed set to 786
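
`evaluateFaultLocalization` above scores each generated input by the percentage of true faulty clients that appear in its prediction, then averages over inputs. With true faulty clients {0, 1} and predictions {0, 1}, {0}, and {2}, the per-input scores are 100, 50, and 0, for a mean of 50. The standalone restatement below reproduces that arithmetic without the notebook's dependencies; `fault_localization_accuracy` is a hypothetical name, and its empty-input check is an addition the original function does not have.

```python
from statistics import mean
from typing import Iterable, List, Set


def fault_localization_accuracy(predictions: List[Set[int]], true_faulty: Iterable[int]) -> float:
    """Mean, over generated inputs, of the percentage of true faulty clients recovered."""
    truth = set(true_faulty)
    if not predictions or not truth:
        raise ValueError("need at least one prediction and one true faulty client")
    # Per input: |predicted ∩ true| / |true| * 100, as in evaluateFaultLocalization.
    per_input = [len(truth & set(pred)) / len(truth) * 100 for pred in predictions]
    return mean(per_input)


# Two true faulty clients; three generated inputs with different predictions.
print(fault_localization_accuracy([{0, 1}, {0}, {2}], true_faulty=[0, 1]))  # 50.0
# Ten inputs that all flag client 0 (as in the Table 1 run) score 100.0.
print(fault_localization_accuracy([{0}] * 10, true_faulty=[0]))  # 100.0
```

Because only hits are counted, a prediction that also lists benign clients is not penalized: `{0, 5, 7}` against true faulty clients `{0}` still scores 100.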


> Note: You can comment out an entire cell to skip its execution and evaluate only the configuration you are interested in.

 ### Table 1: resnet18, cifar10, iid distribution and 30 clients

In [3]:
args.model = "resnet18" # [resnet18, resnet34, resnet50, densenet121, vgg16]
args.dataset = "cifar10" # ['cifar10', 'femnist']
args.sampling = "iid" # ["iid", "niid"]
args.clients = 30 # on Colab, use at most 30 clients with resnet18, resnet34, or densenet121

# FL training
c2ms, exp2info = trainFLMain(args)
client2models = {k: v.model.eval() for k, v in c2ms.items()}


# Fault localization
potential_faulty_clients, _, _ = runFaultyClientLocalization(
    client2models=client2models, exp2info=exp2info, num_bugs=len(exp2info['faulty_clients_ids']))
fault_acc = evaluateFaultLocalization(
    potential_faulty_clients, exp2info['faulty_clients_ids'])
# print(f"Fault Localization Acc: {fault_acc}")

print(f"#Fault Localization Accuracy: {fault_acc}, Distribution: {args.sampling},  Faulty clients: {len(args.faulty_clients_ids.split(','))}, Total Clients: {args.clients}, Architecture: {args.model}, Dataset: {args.dataset}")



  ***Simulating FL setup iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001 ***
Files already downloaded and verified
Files already downloaded and verified
Spliting Datasets 50000 into parts:[1686, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666]
input shape, torch.Size([1, 3, 32, 32])
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/faulty_client_0_noise_rate_1_classes.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Train mod batch = 150, and drop_last = False


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Epoch 6: 100%|██████████| 4/4 [00:01<00:00,  2.73it/s, loss=2.32, train_acc=0.108, train_loss=2.320, val_acc=0.0818, val_loss=2.410] Epoch 00007: reducing learning rate of group 0 to 2.5000e-04.
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.48it/s, loss=2.31, train_acc=0.096, train_loss=2.300, val_acc=0.0928, val_loss=2.330] 

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.47it/s, loss=2.31, train_acc=0.096, train_loss=2.300, val_acc=0.0928, val_loss=2.330]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_1.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.84it/s, loss=0.726, train_acc=0.836, train_loss=0.570, val_acc=0.666, val_loss=1.360]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.82it/s, loss=0.726, train_acc=0.836, train_loss=0.570, val_acc=0.666, val_loss=1.360]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_2.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.65it/s, loss=0.682, train_acc=0.850, train_loss=0.565, val_acc=0.665, val_loss=1.340]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.63it/s, loss=0.682, train_acc=0.850, train_loss=0.565, val_acc=0.665, val_loss=1.340]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_3.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.20it/s, loss=0.7, train_acc=0.765, train_loss=0.534, val_acc=0.649, val_loss=1.520]  

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.19it/s, loss=0.7, train_acc=0.765, train_loss=0.534, val_acc=0.649, val_loss=1.520]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_4.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.53it/s, loss=0.673, train_acc=0.826, train_loss=0.503, val_acc=0.677, val_loss=1.340]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.52it/s, loss=0.673, train_acc=0.826, train_loss=0.503, val_acc=0.677, val_loss=1.340]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_5.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 6: 100%|██████████| 4/4 [00:01<00:00,  3.87it/s, loss=0.987, train_acc=0.671, train_loss=0.883, val_acc=0.565, val_loss=1.740]Epoch 00007: reducing learning rate of group 0 to 2.5000e-04.
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.45it/s, loss=0.67, train_acc=0.838, train_loss=0.449, val_acc=0.681, val_loss=1.270] 

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.43it/s, loss=0.67, train_acc=0.838, train_loss=0.449, val_acc=0.681, val_loss=1.270]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_6.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.72it/s, loss=0.671, train_acc=0.761, train_loss=0.672, val_acc=0.670, val_loss=1.450]Epoch 00010: reducing learning rate of group 0 to 2.5000e-04.


`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.70it/s, loss=0.671, train_acc=0.761, train_loss=0.672, val_acc=0.670, val_loss=1.450]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_7.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.98it/s, loss=0.745, train_acc=0.788, train_loss=0.599, val_acc=0.623, val_loss=1.580]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.98it/s, loss=0.745, train_acc=0.788, train_loss=0.599, val_acc=0.623, val_loss=1.580]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_8.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Train mod batch = 130, and drop_last = False


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Epoch 6: 100%|██████████| 4/4 [00:01<00:00,  3.41it/s, loss=1, train_acc=0.667, train_loss=0.852, val_acc=0.571, val_loss=1.690]   Epoch 00007: reducing learning rate of group 0 to 2.5000e-04.
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.66it/s, loss=0.683, train_acc=0.842, train_loss=0.551, val_acc=0.678, val_loss=1.300]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.66it/s, loss=0.683, train_acc=0.842, train_loss=0.551, val_acc=0.678, val_loss=1.300]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_9.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.97it/s, loss=0.678, train_acc=0.842, train_loss=0.517, val_acc=0.686, val_loss=1.360]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.97it/s, loss=0.678, train_acc=0.842, train_loss=0.517, val_acc=0.686, val_loss=1.360]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_10.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.55it/s, loss=0.644, train_acc=0.746, train_loss=0.669, val_acc=0.671, val_loss=1.430]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.54it/s, loss=0.644, train_acc=0.746, train_loss=0.669, val_acc=0.671, val_loss=1.430]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_11.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.68it/s, loss=0.692, train_acc=0.828, train_loss=0.533, val_acc=0.643, val_loss=1.730]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 00010: reducing learning rate of group 0 to 2.5000e-04.
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.67it/s, loss=0.692, train_acc=0.828, train_loss=0.533, val_acc=0.643, val_loss=1.730]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_12.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.26it/s, loss=0.666, train_acc=0.837, train_loss=0.510, val_acc=0.671, val_loss=1.410]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.25it/s, loss=0.666, train_acc=0.837, train_loss=0.510, val_acc=0.671, val_loss=1.410]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_13.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.08it/s, loss=0.681, train_acc=0.858, train_loss=0.472, val_acc=0.657, val_loss=1.590]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.08it/s, loss=0.681, train_acc=0.858, train_loss=0.472, val_acc=0.657, val_loss=1.590]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_14.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 6: 100%|██████████| 4/4 [00:01<00:00,  2.62it/s, loss=0.959, train_acc=0.738, train_loss=0.718, val_acc=0.588, val_loss=1.740]Epoch 00007: reducing learning rate of group 0 to 2.5000e-04.
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.97it/s, loss=0.685, train_acc=0.760, train_loss=0.601, val_acc=0.689, val_loss=1.350]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.96it/s, loss=0.685, train_acc=0.760, train_loss=0.601, val_acc=0.689, val_loss=1.350]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_15.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 6: 100%|██████████| 4/4 [00:01<00:00,  3.39it/s, loss=0.948, train_acc=0.803, train_loss=0.659, val_acc=0.598, val_loss=1.960]Epoch 00007: reducing learning rate of group 0 to 2.5000e-04.
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.01it/s, loss=0.638, train_acc=0.807, train_loss=0.601, val_acc=0.690, val_loss=1.330]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.00it/s, loss=0.638, train_acc=0.807, train_loss=0.601, val_acc=0.690, val_loss=1.330]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_16.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.59it/s, loss=0.615, train_acc=0.860, train_loss=0.461, val_acc=0.676, val_loss=1.440]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.59it/s, loss=0.615, train_acc=0.860, train_loss=0.461, val_acc=0.676, val_loss=1.440]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_17.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.02it/s, loss=0.657, train_acc=0.855, train_loss=0.495, val_acc=0.674, val_loss=1.450]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.01it/s, loss=0.657, train_acc=0.855, train_loss=0.495, val_acc=0.674, val_loss=1.450]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_18.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.09it/s, loss=0.696, train_acc=0.784, train_loss=0.720, val_acc=0.642, val_loss=1.900]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 00010: reducing learning rate of group 0 to 2.5000e-04.
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.08it/s, loss=0.696, train_acc=0.784, train_loss=0.720, val_acc=0.642, val_loss=1.900]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_19.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.97it/s, loss=0.635, train_acc=0.820, train_loss=0.532, val_acc=0.661, val_loss=1.470]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.96it/s, loss=0.635, train_acc=0.820, train_loss=0.532, val_acc=0.661, val_loss=1.470]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_20.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.05it/s, loss=0.657, train_acc=0.816, train_loss=0.669, val_acc=0.662, val_loss=1.420]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.04it/s, loss=0.657, train_acc=0.816, train_loss=0.669, val_acc=0.662, val_loss=1.420]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_21.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.79it/s, loss=0.678, train_acc=0.841, train_loss=0.497, val_acc=0.659, val_loss=1.540]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.78it/s, loss=0.678, train_acc=0.841, train_loss=0.497, val_acc=0.659, val_loss=1.540]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_22.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.25it/s, loss=0.684, train_acc=0.808, train_loss=0.533, val_acc=0.656, val_loss=1.460]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.23it/s, loss=0.684, train_acc=0.808, train_loss=0.533, val_acc=0.656, val_loss=1.460]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_23.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 6: 100%|██████████| 4/4 [00:01<00:00,  2.55it/s, loss=0.915, train_acc=0.774, train_loss=0.624, val_acc=0.598, val_loss=1.560]Epoch 00007: reducing learning rate of group 0 to 2.5000e-04.
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.63it/s, loss=0.623, train_acc=0.811, train_loss=0.599, val_acc=0.692, val_loss=1.220]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.62it/s, loss=0.623, train_acc=0.811, train_loss=0.599, val_acc=0.692, val_loss=1.220]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_24.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.93it/s, loss=0.635, train_acc=0.838, train_loss=0.496, val_acc=0.662, val_loss=1.720]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.92it/s, loss=0.635, train_acc=0.838, train_loss=0.496, val_acc=0.662, val_loss=1.720]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_25.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 6: 100%|██████████| 4/4 [00:01<00:00,  2.74it/s, loss=0.924, train_acc=0.783, train_loss=0.900, val_acc=0.584, val_loss=1.760]Epoch 00007: reducing learning rate of group 0 to 2.5000e-04.
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.49it/s, loss=0.618, train_acc=0.843, train_loss=0.372, val_acc=0.677, val_loss=1.430]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.48it/s, loss=0.618, train_acc=0.843, train_loss=0.372, val_acc=0.677, val_loss=1.430]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_26.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.24it/s, loss=0.636, train_acc=0.810, train_loss=0.559, val_acc=0.683, val_loss=1.540]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  3.23it/s, loss=0.636, train_acc=0.810, train_loss=0.559, val_acc=0.683, val_loss=1.540]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_27.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.93it/s, loss=0.681, train_acc=0.737, train_loss=0.631, val_acc=0.686, val_loss=1.320]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.92it/s, loss=0.681, train_acc=0.737, train_loss=0.631, val_acc=0.686, val_loss=1.320]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_28.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.70it/s, loss=0.624, train_acc=0.827, train_loss=0.503, val_acc=0.677, val_loss=1.520]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 00010: reducing learning rate of group 0 to 2.5000e-04.
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.69it/s, loss=0.624, train_acc=0.827, train_loss=0.503, val_acc=0.677, val_loss=1.520]
Training : .storage/checkpoints/iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_29.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 6: 100%|██████████| 4/4 [00:01<00:00,  2.58it/s, loss=0.988, train_acc=0.639, train_loss=0.877, val_acc=0.561, val_loss=1.830]Epoch 00007: reducing learning rate of group 0 to 2.5000e-04.
Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.45it/s, loss=0.678, train_acc=0.784, train_loss=0.569, val_acc=0.663, val_loss=1.330]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:01<00:00,  2.44it/s, loss=0.678, train_acc=0.784, train_loss=0.569, val_acc=0.663, val_loss=1.330]
Total clients: 30
++Training is done: iid_resnet18_cifar10_clients_30_faulty_[0]_bsize_512_epochs_10_lr_0.001
>  Running FaultyClientLocalization ..
Same prediction threshold 5
+++ Faulty Clients {0}
+++ Faulty Clients {0}
+++ Faulty Clients {0}
+++ Faulty Clients {0}
+++ Faulty Clients {0}
+++ Faulty Clients {0}
+++ Faulty Clients {0}
+++ Faulty Clients {0}
+++ Faulty Clients {0}
+++ Faulty Clients {0}
#Fault Localization Accuracy: 100.0, Distribution: iid,  Faulty clients: 1, Total Clients: 30, Architecture: resnet18, Dataset: cifar10
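
The numbers in the training log above can be cross-checked by hand. Splitting the 50,000 CIFAR-10 training images across 30 clients gives 50000 // 30 = 1666 samples per client with a remainder of 20, and the logged split is consistent with that remainder going to the first client (1686). With `batch_size = 512` and `drop_last = False`, each client runs ceil(n / 512) = 4 steps per epoch (the `4/4` in the progress bars), and `Train mod batch` matches n % 512 (1686 % 512 = 150, 1666 % 512 = 130). The same arithmetic holds for the non-IID split below; for example, 6237 samples give 13 batches with a final batch of 93. A sketch reproducing these values; `iid_split_sizes` and `batches_per_epoch` are hypothetical helpers, and the remainder-to-first-client rule is inferred from the log rather than taken from FedDebug's source.

```python
import math
from typing import List


def iid_split_sizes(num_samples: int, num_clients: int) -> List[int]:
    """Equal shares, with leftover samples added to the first client (inferred from the log)."""
    base, remainder = divmod(num_samples, num_clients)
    return [base + remainder] + [base] * (num_clients - 1)


def batches_per_epoch(n: int, batch_size: int) -> int:
    # drop_last=False keeps the final partial batch, so round up.
    return math.ceil(n / batch_size)


sizes = iid_split_sizes(50_000, 30)
print(sizes[:3], sum(sizes))  # [1686, 1666, 1666] 50000
for n in sizes[:2]:
    # samples, steps per epoch, size of the last (partial) batch
    print(n, batches_per_epoch(n, 512), n % 512)
# 1686 4 150
# 1666 4 130
```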


 ### Table 1: densenet121, cifar10, niid distribution and 10 clients

In [4]:
args.model = "densenet121" # [resnet18, resnet34, resnet50, densenet121, vgg16]
args.dataset = "cifar10" # ['cifar10', 'femnist']
args.sampling = "niid" # ["iid", "niid"]
args.clients = 10


# FL training
c2ms, exp2info = trainFLMain(args)
client2models = {k: v.model.eval() for k, v in c2ms.items()}



# Fault localization
potential_faulty_clients, _, _ = runFaultyClientLocalization(
    client2models=client2models, exp2info=exp2info, num_bugs=len(exp2info['faulty_clients_ids']))
fault_acc = evaluateFaultLocalization(
    potential_faulty_clients, exp2info['faulty_clients_ids'])
# print(f"Fault Localization Acc: {fault_acc}")

print(f"#Fault Localization Accuracy: {fault_acc}, Distribution: {args.sampling},  Faulty clients: {len(args.faulty_clients_ids.split(','))}, Total Clients: {args.clients}, Architecture: {args.model}, Dataset: {args.dataset}")



  ***Simulating FL setup niid_densenet121_cifar10_clients_10_faulty_[0]_bsize_512_epochs_10_lr_0.001 ***
Files already downloaded and verified
Files already downloaded and verified
Spliting Datasets 50000 into parts:[6237, 5701, 4138, 4019, 5802, 3981, 4709, 5651, 3541, 6221]
input shape, torch.Size([1, 3, 32, 32])
Training : .storage/checkpoints/niid_densenet121_cifar10_clients_10_faulty_[0]_bsize_512_epochs_10_lr_0.001/faulty_client_0_noise_rate_1_classes.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 93, and drop_last = False
Epoch 9: 100%|██████████| 13/13 [00:04<00:00,  2.65it/s, loss=2.31, train_acc=0.151, train_loss=2.280, val_acc=0.101, val_loss=2.840]  

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 00010: reducing learning rate of group 0 to 2.5000e-04.
Epoch 9: 100%|██████████| 13/13 [00:04<00:00,  2.64it/s, loss=2.31, train_acc=0.151, train_loss=2.280, val_acc=0.101, val_loss=2.840]
Training : .storage/checkpoints/niid_densenet121_cifar10_clients_10_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_1.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 69, and drop_last = False
Epoch 9: 100%|██████████| 12/12 [00:05<00:00,  2.24it/s, loss=0.331, train_acc=0.917, train_loss=0.363, val_acc=0.840, val_loss=0.965]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 00010: reducing learning rate of group 0 to 2.5000e-04.
Epoch 9: 100%|██████████| 12/12 [00:05<00:00,  2.24it/s, loss=0.331, train_acc=0.917, train_loss=0.363, val_acc=0.840, val_loss=0.965]
Training : .storage/checkpoints/niid_densenet121_cifar10_clients_10_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_2.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 42, and drop_last = False
Epoch 9: 100%|██████████| 9/9 [00:04<00:00,  1.89it/s, loss=0.538, train_acc=0.765, train_loss=0.527, val_acc=0.756, val_loss=1.040]  

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 00010: reducing learning rate of group 0 to 2.5000e-04.
Epoch 9: 100%|██████████| 9/9 [00:04<00:00,  1.89it/s, loss=0.538, train_acc=0.765, train_loss=0.527, val_acc=0.756, val_loss=1.040]
Training : .storage/checkpoints/niid_densenet121_cifar10_clients_10_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_3.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 435, and drop_last = False
Epoch 9: 100%|██████████| 8/8 [00:05<00:00,  1.49it/s, loss=0.211, train_acc=0.938, train_loss=0.167, val_acc=0.848, val_loss=1.030]  

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 00010: reducing learning rate of group 0 to 2.5000e-04.
Epoch 9: 100%|██████████| 8/8 [00:05<00:00,  1.49it/s, loss=0.211, train_acc=0.938, train_loss=0.167, val_acc=0.848, val_loss=1.030]
Training : .storage/checkpoints/niid_densenet121_cifar10_clients_10_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_4.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Train mod batch = 170, and drop_last = False


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Epoch 9: 100%|██████████| 12/12 [00:07<00:00,  1.68it/s, loss=0.286, train_acc=0.889, train_loss=0.279, val_acc=0.854, val_loss=0.921]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 00010: reducing learning rate of group 0 to 2.5000e-04.
Epoch 9: 100%|██████████| 12/12 [00:07<00:00,  1.67it/s, loss=0.286, train_acc=0.889, train_loss=0.279, val_acc=0.854, val_loss=0.921]
Training : .storage/checkpoints/niid_densenet121_cifar10_clients_10_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_5.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 397, and drop_last = False
Epoch 9: 100%|██████████| 8/8 [00:04<00:00,  1.67it/s, loss=0.224, train_acc=0.933, train_loss=0.232, val_acc=0.849, val_loss=0.947]  

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 00010: reducing learning rate of group 0 to 2.5000e-04.
Epoch 9: 100%|██████████| 8/8 [00:04<00:00,  1.66it/s, loss=0.224, train_acc=0.933, train_loss=0.232, val_acc=0.849, val_loss=0.947]
Training : .storage/checkpoints/niid_densenet121_cifar10_clients_10_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_6.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Train mod batch = 101, and drop_last = False


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Epoch 9: 100%|██████████| 10/10 [00:06<00:00,  1.62it/s, loss=0.313, train_acc=0.911, train_loss=0.342, val_acc=0.826, val_loss=1.050]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 00010: reducing learning rate of group 0 to 2.5000e-04.
Epoch 9: 100%|██████████| 10/10 [00:06<00:00,  1.62it/s, loss=0.313, train_acc=0.911, train_loss=0.342, val_acc=0.826, val_loss=1.050]
Training : .storage/checkpoints/niid_densenet121_cifar10_clients_10_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_7.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 19, and drop_last = False
Epoch 9: 100%|██████████| 12/12 [00:06<00:00,  1.82it/s, loss=0.758, train_acc=0.558, train_loss=1.650, val_acc=0.724, val_loss=0.998]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 12/12 [00:06<00:00,  1.82it/s, loss=0.758, train_acc=0.558, train_loss=1.650, val_acc=0.724, val_loss=0.998]
Training : .storage/checkpoints/niid_densenet121_cifar10_clients_10_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_8.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 469, and drop_last = False
Epoch 9: 100%|██████████| 7/7 [00:03<00:00,  1.96it/s, loss=0.206, train_acc=0.928, train_loss=0.201, val_acc=0.841, val_loss=1.020]  

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 00010: reducing learning rate of group 0 to 2.5000e-04.
Epoch 9: 100%|██████████| 7/7 [00:03<00:00,  1.95it/s, loss=0.206, train_acc=0.928, train_loss=0.201, val_acc=0.841, val_loss=1.020]
Training : .storage/checkpoints/niid_densenet121_cifar10_clients_10_faulty_[0]_bsize_512_epochs_10_lr_0.001/client_9.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Train mod batch = 77, and drop_last = False


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Epoch 9: 100%|██████████| 13/13 [00:06<00:00,  1.93it/s, loss=0.322, train_acc=0.908, train_loss=0.303, val_acc=0.848, val_loss=0.857]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 13/13 [00:06<00:00,  1.93it/s, loss=0.322, train_acc=0.908, train_loss=0.303, val_acc=0.848, val_loss=0.857]
Total clients: 10
++Training is done: niid_densenet121_cifar10_clients_10_faulty_[0]_bsize_512_epochs_10_lr_0.001
>  Running FaultyClientLocalization ..
Same prediction threshold 5
+++ Faulty Clients {0}
+++ Faulty Clients {0}
+++ Faulty Clients {0}
+++ Faulty Clients {0}
+++ Faulty Clients {0}
+++ Faulty Clients {0}
+++ Faulty Clients {0}
+++ Faulty Clients {0}
+++ Faulty Clients {0}
+++ Faulty Clients {0}
#Fault Localization Accuracy: 100.0, Distribution: niid,  Faulty clients: 1, Total Clients: 10, Architecture: densenet121, Dataset: cifar10
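The per-client progress bars and the `Train mod batch` lines above match plain batching arithmetic at batch size 512 (the `bsize_512` in the run name) with `drop_last=False`. For example, client 0's 6,237 samples give ceil(6237 / 512) = 13 batches per epoch, and 6237 mod 512 = 93 samples in the last, partial batch. The sketch below recomputes both numbers from the logged split sizes. Reading `Train mod batch` as the size of the last batch is our interpretation of the log, and `batch_stats` is a hypothetical helper, not part of FedDebug.

```python
import math

BATCH_SIZE = 512  # matches "bsize_512" in the run name


def batch_stats(num_samples: int, batch_size: int = BATCH_SIZE) -> tuple[int, int]:
    """Return (batches per epoch with drop_last=False, size of the last partial batch).

    A remainder of 0 means the data divides evenly and every batch is full.
    """
    batches = math.ceil(num_samples / batch_size)
    remainder = num_samples % batch_size
    return batches, remainder


# Per-client split sizes copied from the niid "Spliting Datasets" log line above.
splits = [6237, 5701, 4138, 4019, 5802, 3981, 4709, 5651, 3541, 6221]
for client_id, n in enumerate(splits):
    batches, remainder = batch_stats(n)
    print(f"client {client_id}: {n} samples -> {batches} batches, last batch has {remainder}")
```

Every row reproduces the `N/N` step count in that client's progress bar and its `Train mod batch` value.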


### Table 2: Five Faulty Clients, densenet121, cifar10, and 30 clients
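The `args.faulty_clients_ids` setting packs three rules into one string: the IDs are comma-separated, each must be smaller than `args.clients`, and there should be fewer than 7 faulty clients. A minimal sketch of a validator for those rules follows. `parse_faulty_client_ids` is a hypothetical helper written for illustration; `trainFLMain` may parse and check the string differently.

```python
def parse_faulty_client_ids(spec: str, num_clients: int, max_faulty: int = 6) -> list[int]:
    """Parse a string such as "0,1,3,4,7" into a sorted list of unique client IDs.

    Hypothetical helper: it only enforces the constraints stated in the cell comment
    (every ID below num_clients, fewer than 7 faulty clients). Duplicates are dropped.
    """
    ids = sorted({int(part) for part in spec.split(",") if part.strip()})
    if not ids:
        raise ValueError("at least one faulty client ID is required")
    out_of_range = [i for i in ids if not 0 <= i < num_clients]
    if out_of_range:
        raise ValueError(f"IDs {out_of_range} are outside 0..{num_clients - 1}")
    if len(ids) > max_faulty:
        raise ValueError(f"{len(ids)} faulty clients given; use at most {max_faulty}")
    return ids


print(parse_faulty_client_ids("0,1,3,4,7", num_clients=30))  # [0, 1, 3, 4, 7]
```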

In [5]:
args.sampling = "iid"
args.faulty_clients_ids = "0,1,3,4,7" # comma-separated client IDs, e.g. "0,1,2"; each ID must be below args.clients, and use fewer than 7 faulty clients
args.model = "densenet121" # [resnet18, resnet34, resnet50, densenet121, vgg16]
args.dataset = "cifar10" # ['cifar10', 'femnist']
args.clients = 30 

# FL training
c2ms, exp2info = trainFLMain(args)
client2models = {k: v.model.eval() for k, v in c2ms.items()}

# Fault localization
potential_faulty_clients, _, _ = runFaultyClientLocalization(
    client2models=client2models, exp2info=exp2info, num_bugs=len(exp2info['faulty_clients_ids']))
fault_acc = evaluateFaultLocalization(
    potential_faulty_clients, exp2info['faulty_clients_ids'])
# print(f"Fault Localization Acc: {fault_acc}")

print(f"#Table 2: Fault Localization Accuracy: {fault_acc}, Distribution: {args.sampling},  Faulty clients: {len(args.faulty_clients_ids.split(','))}, Total Clients: {args.clients}, Architecture: {args.model}, Dataset: {args.dataset}")



  ***Simulating FL setup iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001 ***
Files already downloaded and verified
Files already downloaded and verified
Spliting Datasets 50000 into parts:[1686, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666, 1666]
input shape, torch.Size([1, 3, 32, 32])
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/faulty_client_0_noise_rate_1_classes.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Train mod batch = 150, and drop_last = False


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Epoch 6: 100%|██████████| 4/4 [00:02<00:00,  1.39it/s, loss=2.35, train_acc=0.105, train_loss=2.300, val_acc=0.0883, val_loss=3.840]
Epoch 00007: reducing learning rate of group 0 to 2.5000e-04.
Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.52it/s, loss=2.33, train_acc=0.125, train_loss=2.310, val_acc=0.0907, val_loss=2.500] 

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.52it/s, loss=2.33, train_acc=0.125, train_loss=2.310, val_acc=0.0907, val_loss=2.500]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/faulty_client_1_noise_rate_1_classes.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.48it/s, loss=2.33, train_acc=0.106, train_loss=2.320, val_acc=0.0767, val_loss=2.380]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.48it/s, loss=2.33, train_acc=0.106, train_loss=2.320, val_acc=0.0767, val_loss=2.380]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/faulty_client_3_noise_rate_1_classes.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Train mod batch = 130, and drop_last = False


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.48it/s, loss=2.33, train_acc=0.0916, train_loss=2.340, val_acc=0.0748, val_loss=2.400]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 00010: reducing learning rate of group 0 to 2.5000e-04.
Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.48it/s, loss=2.33, train_acc=0.0916, train_loss=2.340, val_acc=0.0748, val_loss=2.400]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/faulty_client_4_noise_rate_1_classes.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 6: 100%|██████████| 4/4 [00:03<00:00,  1.10it/s, loss=2.35, train_acc=0.0867, train_loss=2.320, val_acc=0.0989, val_loss=2.660]
Epoch 00007: reducing learning rate of group 0 to 2.5000e-04.
Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.25it/s, loss=2.32, train_acc=0.126, train_loss=2.290, val_acc=0.0912, val_loss=2.340] 

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.24it/s, loss=2.32, train_acc=0.126, train_loss=2.290, val_acc=0.0912, val_loss=2.340]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/faulty_client_7_noise_rate_1_classes.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 6: 100%|██████████| 4/4 [00:03<00:00,  1.09it/s, loss=2.35, train_acc=0.112, train_loss=2.330, val_acc=0.077, val_loss=2.660]
Epoch 00007: reducing learning rate of group 0 to 2.5000e-04.
Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.63it/s, loss=2.32, train_acc=0.119, train_loss=2.310, val_acc=0.0674, val_loss=2.370] 

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.62it/s, loss=2.32, train_acc=0.119, train_loss=2.310, val_acc=0.0674, val_loss=2.370]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/client_2.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.62it/s, loss=0.456, train_acc=0.927, train_loss=0.301, val_acc=0.722, val_loss=1.220]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.61it/s, loss=0.456, train_acc=0.927, train_loss=0.301, val_acc=0.722, val_loss=1.220]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/client_5.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Train mod batch = 130, and drop_last = False


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.32it/s, loss=0.477, train_acc=0.922, train_loss=0.305, val_acc=0.727, val_loss=1.170]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.32it/s, loss=0.477, train_acc=0.922, train_loss=0.305, val_acc=0.727, val_loss=1.170]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/client_6.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Train mod batch = 130, and drop_last = False


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.23it/s, loss=0.481, train_acc=0.890, train_loss=0.368, val_acc=0.726, val_loss=1.250]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.23it/s, loss=0.481, train_acc=0.890, train_loss=0.368, val_acc=0.726, val_loss=1.250]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/client_8.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Train mod batch = 130, and drop_last = False


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.43it/s, loss=0.479, train_acc=0.906, train_loss=0.302, val_acc=0.734, val_loss=1.120]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.43it/s, loss=0.479, train_acc=0.906, train_loss=0.302, val_acc=0.734, val_loss=1.120]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/client_9.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Train mod batch = 130, and drop_last = False


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.53it/s, loss=0.421, train_acc=0.866, train_loss=0.365, val_acc=0.748, val_loss=1.180]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.53it/s, loss=0.421, train_acc=0.866, train_loss=0.365, val_acc=0.748, val_loss=1.180]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/client_10.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Train mod batch = 130, and drop_last = False


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.24it/s, loss=0.468, train_acc=0.880, train_loss=0.358, val_acc=0.740, val_loss=1.120]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.23it/s, loss=0.468, train_acc=0.880, train_loss=0.358, val_acc=0.740, val_loss=1.120]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/client_11.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.22it/s, loss=0.404, train_acc=0.915, train_loss=0.296, val_acc=0.732, val_loss=1.200]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.22it/s, loss=0.404, train_acc=0.915, train_loss=0.296, val_acc=0.732, val_loss=1.200]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/client_12.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.44it/s, loss=0.511, train_acc=0.879, train_loss=0.351, val_acc=0.732, val_loss=1.210]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.44it/s, loss=0.511, train_acc=0.879, train_loss=0.351, val_acc=0.732, val_loss=1.210]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/client_13.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.40it/s, loss=0.413, train_acc=0.906, train_loss=0.313, val_acc=0.744, val_loss=1.150]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 00010: reducing learning rate of group 0 to 2.5000e-04.
Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.40it/s, loss=0.413, train_acc=0.906, train_loss=0.313, val_acc=0.744, val_loss=1.150]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/client_14.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Train mod batch = 130, and drop_last = False


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.33it/s, loss=0.484, train_acc=0.874, train_loss=0.362, val_acc=0.722, val_loss=1.200]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.33it/s, loss=0.484, train_acc=0.874, train_loss=0.362, val_acc=0.722, val_loss=1.200]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/client_15.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Train mod batch = 130, and drop_last = False


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.61it/s, loss=0.456, train_acc=0.891, train_loss=0.337, val_acc=0.739, val_loss=1.150]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.60it/s, loss=0.456, train_acc=0.891, train_loss=0.337, val_acc=0.739, val_loss=1.150]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/client_16.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Train mod batch = 130, and drop_last = False


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.33it/s, loss=0.467, train_acc=0.931, train_loss=0.266, val_acc=0.737, val_loss=1.170]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.33it/s, loss=0.467, train_acc=0.931, train_loss=0.266, val_acc=0.737, val_loss=1.170]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/client_17.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.62it/s, loss=0.514, train_acc=0.875, train_loss=0.389, val_acc=0.721, val_loss=1.280]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.61it/s, loss=0.514, train_acc=0.875, train_loss=0.389, val_acc=0.721, val_loss=1.280]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/client_18.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.42it/s, loss=0.478, train_acc=0.873, train_loss=0.318, val_acc=0.715, val_loss=1.310]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.41it/s, loss=0.478, train_acc=0.873, train_loss=0.318, val_acc=0.715, val_loss=1.310]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/client_19.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.29it/s, loss=0.477, train_acc=0.883, train_loss=0.332, val_acc=0.727, val_loss=1.190]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.29it/s, loss=0.477, train_acc=0.883, train_loss=0.332, val_acc=0.727, val_loss=1.190]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/client_20.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Train mod batch = 130, and drop_last = False


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.18it/s, loss=0.45, train_acc=0.903, train_loss=0.356, val_acc=0.736, val_loss=1.150] 

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.18it/s, loss=0.45, train_acc=0.903, train_loss=0.356, val_acc=0.736, val_loss=1.150]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/client_21.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Train mod batch = 130, and drop_last = False


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Epoch 9: 100%|██████████| 4/4 [00:04<00:00,  1.00s/it, loss=0.435, train_acc=0.863, train_loss=0.366, val_acc=0.725, val_loss=1.180]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:04<00:00,  1.00s/it, loss=0.435, train_acc=0.863, train_loss=0.366, val_acc=0.725, val_loss=1.180]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/client_22.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.25it/s, loss=0.488, train_acc=0.881, train_loss=0.393, val_acc=0.716, val_loss=1.320]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.25it/s, loss=0.488, train_acc=0.881, train_loss=0.393, val_acc=0.716, val_loss=1.320]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/client_23.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.07it/s, loss=0.455, train_acc=0.925, train_loss=0.284, val_acc=0.720, val_loss=1.280]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.07it/s, loss=0.455, train_acc=0.925, train_loss=0.284, val_acc=0.720, val_loss=1.280]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/client_24.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.34it/s, loss=0.459, train_acc=0.918, train_loss=0.295, val_acc=0.730, val_loss=1.150]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.33it/s, loss=0.459, train_acc=0.918, train_loss=0.295, val_acc=0.730, val_loss=1.150]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/client_25.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.44it/s, loss=0.475, train_acc=0.865, train_loss=0.339, val_acc=0.729, val_loss=1.190]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.43it/s, loss=0.475, train_acc=0.865, train_loss=0.339, val_acc=0.729, val_loss=1.190]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/client_26.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Train mod batch = 130, and drop_last = False


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.28it/s, loss=0.454, train_acc=0.899, train_loss=0.354, val_acc=0.737, val_loss=1.140]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.28it/s, loss=0.454, train_acc=0.899, train_loss=0.354, val_acc=0.737, val_loss=1.140]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/client_27.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Train mod batch = 130, and drop_last = False


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.17it/s, loss=0.426, train_acc=0.892, train_loss=0.348, val_acc=0.751, val_loss=1.100]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.17it/s, loss=0.426, train_acc=0.892, train_loss=0.348, val_acc=0.751, val_loss=1.100]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/client_28.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Train mod batch = 130, and drop_last = False
Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.28it/s, loss=0.516, train_acc=0.933, train_loss=0.287, val_acc=0.703, val_loss=1.260]

`Trainer.fit` stopped: `max_epochs=10` reached.


Epoch 9: 100%|██████████| 4/4 [00:03<00:00,  1.28it/s, loss=0.516, train_acc=0.933, train_loss=0.287, val_acc=0.703, val_loss=1.260]
Training : .storage/checkpoints/iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001/client_29.ckpt


Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Train mod batch = 130, and drop_last = False


LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Epoch 9: 100%|██████████| 4/4 [00:02<00:00,  1.65it/s, loss=0.429, train_acc=0.885, train_loss=0.393, val_acc=0.739, val_loss=1.190]

`Trainer.fit` stopped: `max_epochs=10` reached.


Total clients: 30
++Training is done: iid_densenet121_cifar10_clients_30_faulty_[0, 1, 3, 4, 7]_bsize_512_epochs_10_lr_0.001
>  Running FaultyClientLocalization ..
Same prediction threshold 5
+++ Faulty Clients {0, 1, 3, 4, 7}
+++ Faulty Clients {0, 1, 3, 4, 7}
+++ Faulty Clients {0, 1, 3, 4, 7}
+++ Faulty Clients {0, 1, 3, 4, 7}
+++ Faulty Clients {0, 1, 3, 4, 7}
+++ Faulty Clients {0, 1, 3, 4, 7}
+++ Faulty Clients {0, 1, 3, 4, 7}
+++ Faulty Clients {0, 1, 3, 4, 7}
+++ Faulty Clients {0, 1, 3, 4, 7}
+++ Faulty Clients {0, 1, 3, 4, 7}
#Table 2: Fault Localization Accuracy: 100.0, Distribution: iid,  Faulty clients: 5, Total Clients: 30, Architecture: densenet121, Dataset: cifar10
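The log above prints one `+++ Faulty Clients` set per test input and then a single accuracy figure. The notebook's own `evaluateFaultLocalization` (defined earlier) is not reproduced here, so the sketch below only illustrates one plausible scoring rule. It assumes that each input scores the percentage of true faulty clients found in its predicted set and that the final number is the mean over inputs. `localization_accuracy` is a hypothetical name, not part of the repository.

```python
def localization_accuracy(predicted_per_input, true_faulty_ids):
    """Hypothetical scorer (an assumption, not the notebook's code):
    mean per-input recall of the true faulty clients, as a percentage."""
    true_set = set(true_faulty_ids)
    if not predicted_per_input or not true_set:
        raise ValueError("need at least one prediction and one true faulty client")
    # Per input: share of the true faulty clients that appear in the predicted set.
    per_input = [
        100.0 * len(true_set & set(pred)) / len(true_set)
        for pred in predicted_per_input
    ]
    # Average over all test inputs.
    return sum(per_input) / len(per_input)


# Ten inputs, each localized to the same five clients, as in the log above.
print(localization_accuracy([{0, 1, 3, 4, 7}] * 10, [0, 1, 3, 4, 7]))  # 100.0
# One input misses client 7 (and flags 9 instead): (100 + 80) / 2.
print(localization_accuracy([{0, 1, 3, 4, 7}, {0, 1, 3, 4, 9}], [0, 1, 3, 4, 7]))  # 90.0
```

Under this assumed rule, a wrongly flagged client is penalized only by displacing a true one. If the localizer always returns exactly `num_bugs` clients, recall and precision coincide, so the distinction does not matter.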


### Table 2: Three Faulty Clients, resnet-50, cifar10, and 30 clients

In [7]:
# args.sampling = "iid"
# args.faulty_clients_ids = "0,1,3" # one or more comma-separated client ids, e.g. "0,1,2"; each id must be below args.clients, and use fewer than 7 faulty clients
# args.model = "resnet50" # [resnet18, resnet34, resnet50, densenet121, vgg16]
# args.dataset = "cifar10" # ['cifar10', 'femnist']
# args.clients = 30 

# # FL training
# c2ms, exp2info = trainFLMain(args)
# client2models = {k: v.model.eval() for k, v in c2ms.items()}

# # Fault localization
# potential_faulty_clients, _, _ = runFaultyClientLocalization(
#     client2models=client2models, exp2info=exp2info, num_bugs=len(exp2info['faulty_clients_ids']))
# fault_acc = evaluateFaultLocalization(
#     potential_faulty_clients, exp2info['faulty_clients_ids'])
# # print(f"Fault Localization Acc: {fault_acc}")

# print(f"#Table 2: Fault Localization Accuracy: {fault_acc}, Distribution: {args.sampling},  Faulty clients: {len(args.faulty_clients_ids.split(','))}, Total Clients: {args.clients}, Architecture: {args.model}, Dataset: {args.dataset}")
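The cell passes `args.faulty_clients_ids` as a comma-separated string and later counts faulty clients with `len(args.faulty_clients_ids.split(','))`. The following is a minimal sketch, assuming that the comment's constraints mean "every id is below `args.clients`" and "at most 6 faulty clients". It shows how such a string could be parsed and checked before a long training run. `parse_faulty_clients` is a hypothetical helper, not a function from the repository.

```python
def parse_faulty_clients(ids: str, num_clients: int, max_faulty: int = 6) -> list:
    """Hypothetical helper: turn "0,1,3" into [0, 1, 3] and enforce the
    constraints assumed from the cell comment (ids below num_clients,
    fewer than 7 faulty clients)."""
    # int() tolerates surrounding whitespace, so "0, 1, 3" also parses.
    parsed = [int(tok) for tok in ids.split(",") if tok.strip()]
    if len(set(parsed)) != len(parsed):
        raise ValueError(f"duplicate client ids in {ids!r}")
    if any(i < 0 or i >= num_clients for i in parsed):
        raise ValueError(f"client ids must lie in [0, {num_clients})")
    if not 1 <= len(parsed) <= max_faulty:
        raise ValueError(f"expected 1..{max_faulty} faulty clients, got {len(parsed)}")
    return parsed


print(parse_faulty_clients("0,1,3", 30))  # [0, 1, 3]
```

The length of the returned list matches the "Faulty clients" count printed by the cell. Validating up front fails fast on a typo such as `"0,1,30"` with 30 clients, rather than after the clients have been trained.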