Repository for a reliable Deep Neural Network (DNN) model. DieHardNet stands for "Design improved Hardened neural Network".
The directories are organized as follows:
- `hg_noise_injector` - A module that injects realistic errors into the training process (see the sketch after this list).
- `eval_fault_injection_cfg` - Configuration files for fault injection with NVBITFI.
- `pytorch_scripts` - PyTorch scripts for training and inference with the DNNs used in this work. For more information, read the README in that directory.
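
The injection behavior is controlled by the `--inject_p` and `--inject_epoch` options of `main.py` (see the help output below). As a rough illustration of the idea, and not the actual `hg_noise_injector` API, a wrapper module could corrupt one activation per forward pass with a given probability once training has passed a warm-up epoch; the class name and the fault model (a single spike at twice the maximum magnitude) are assumptions made for this sketch:

```python
import random

import torch
import torch.nn as nn


class NoiseInjector(nn.Module):
    """Illustrative wrapper: corrupts one output value of a layer at training
    time, loosely mimicking a transient hardware fault (hypothetical sketch,
    not the real hg_noise_injector implementation)."""

    def __init__(self, layer: nn.Module, inject_p: float = 0.1, inject_epoch: int = 0):
        super().__init__()
        self.layer = layer
        self.inject_p = inject_p          # probability of injecting per forward pass
        self.inject_epoch = inject_epoch  # epochs to wait before injecting
        self.current_epoch = 0            # updated by the training loop

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.layer(x)
        if (self.training
                and self.current_epoch >= self.inject_epoch
                and random.random() < self.inject_p):
            # Additive "fault": a single large spike at a random position.
            fault = torch.zeros_like(out).flatten()
            fault[random.randrange(fault.numel())] = out.detach().abs().max() * 2.0
            out = out + fault.view_as(out)
        return out
```

The training loop would set `current_epoch` at the start of every epoch, so injection only begins after `--inject_epoch` epochs and the network first learns the clean task before being exposed to faults.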
Usage of the `main.py` training and evaluation script:

```
usage: main.py [-h] [--name NAME] [--mode MODE] [--ckpt CKPT] [--dataset DATASET]
               [--data_dir DATA_DIR] [--device DEVICE] [--loss LOSS] [--clip CLIP]
               [--epochs EPOCHS] [--batch_size BATCH_SIZE] [--lr LR]
               [--optimizer OPTIMIZER] [--model MODEL] [--order ORDER]
               [--affine AFFINE] [--activation ACTIVATION] [--nan NAN]
               [--error_model ERROR_MODEL] [--inject_p INJECT_P]
               [--inject_epoch INJECT_EPOCH] [--wd WD] [--rand_aug RAND_AUG]
               [--rand_erasing RAND_ERASING] [--mixup_cutmix MIXUP_CUTMIX]
               [--jitter JITTER] [--label_smooth LABEL_SMOOTH] [--seed SEED]
               [--comment COMMENT]

PyTorch Training

optional arguments:
  -h, --help            show this help message and exit
  --name NAME           Experiment name.
  --mode MODE           Mode: train/training or validation/validate.
  --ckpt CKPT           Pass the name of a checkpoint to resume training.
  --dataset DATASET     Dataset name: cifar10 or cifar100.
  --data_dir DATA_DIR   Path to dataset.
  --device DEVICE       Device number.
  --loss LOSS           Loss: bce, ce, or sce.
  --clip CLIP           Gradient clipping value.
  --epochs EPOCHS       Number of epochs.
  --batch_size BATCH_SIZE
                        Batch size.
  --lr LR               Learning rate.
  --optimizer OPTIMIZER
                        Optimizer name: adamw or sgd.
  --model MODEL         Network name. ResNets only for now.
  --order ORDER         Order of activation and normalization: bn-relu or relu-bn.
  --affine AFFINE       Whether to use an affine transform after normalization.
  --activation ACTIVATION
                        Non-linear activation: relu or relu6.
  --nan NAN             Whether to convert NaNs to 0.
  --error_model ERROR_MODEL
                        Error model used for the noise injection.
  --inject_p INJECT_P   Probability of noise injection at training time.
  --inject_epoch INJECT_EPOCH
                        How many epochs before starting the injection.
  --wd WD               Weight decay.
  --rand_aug RAND_AUG   RandAugment magnitude and std.
  --rand_erasing RAND_ERASING
                        Random Erasing probability.
  --mixup_cutmix MIXUP_CUTMIX
                        Whether to use mixup/cutmix.
  --jitter JITTER       Color jitter.
  --label_smooth LABEL_SMOOTH
                        Label smoothing.
  --seed SEED           Random seed for reproducibility.
  --comment COMMENT     Optional comment.
```
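
For example, a CIFAR-10 run could be launched as follows. The flags are taken from the help output above, but every value, including the model name, is an illustrative assumption rather than a recommended setting:

```bash
# Train with the hardened configuration and noise injection (values are illustrative).
python main.py --name diehardnet_cifar10 --mode train \
    --dataset cifar10 --data_dir ./data --model resnet44 \
    --epochs 100 --batch_size 128 --optimizer sgd --lr 0.1 \
    --order bn-relu --activation relu6 --nan 1 \
    --inject_p 0.1 --inject_epoch 0

# Validate a saved checkpoint (checkpoint name is a placeholder).
python main.py --mode validation --ckpt <checkpoint> \
    --dataset cifar10 --data_dir ./data
```

The `--order`, `--activation`, and `--nan` options select the fault-aware building block: normalization before a bounded activation, with NaNs zeroed so a corrupted value cannot propagate through the network. A minimal sketch of such a block, assuming a standard PyTorch convolutional layer (an illustration, not the repository's actual layer implementation):

```python
import torch
import torch.nn as nn


class HardenedBlock(nn.Module):
    """Conv block in the bn-relu order with a bounded activation and NaN
    filtering, as selected by --order bn-relu --activation relu6 --nan."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)  # --affine toggles its affine parameters
        self.act = nn.ReLU6()             # bounded output limits fault propagation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.act(self.bn(self.conv(x)))
        # --nan: replace NaNs produced by a fault with 0 so they cannot spread.
        return torch.nan_to_num(x, nan=0.0)
```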
The paper that describes the DieHardNet concept, published at the 2022 IEEE 28th International Symposium on On-Line Testing and Robust System Design (IOLTS):
```bibtex
@inproceedings{diehardnetIOLTS2022,
  author    = {Cavagnero, Niccol{\`o} and Santos, Fernando Dos and Ciccone, Marco and Averta, Giuseppe and Tommasi, Tatiana and Rech, Paolo},
  booktitle = {2022 IEEE 28th International Symposium on On-Line Testing and Robust System Design (IOLTS)},
  title     = {Transient-Fault-Aware Design and Training to Enhance DNNs Reliability with Zero-Overhead},
  year      = {2022},
  pages     = {1-7},
  doi       = {10.1109/IOLTS56730.2022.9897813}
}
```
The paper that presents the neutron beam validation of DieHardNet, published in IEEE Transactions on Emerging Topics in Computing:
```bibtex
@article{diehardnetTETC2024,
  title       = {{Improving Deep Neural Network Reliability via Transient-Fault-Aware Design and Training}},
  author      = {Fernandes dos Santos, Fernando and Cavagnero, Niccol{\`o} and Ciccone, Marco and Averta, Giuseppe and Kritikakou, Angeliki and Sentieys, Olivier and Rech, Paolo and Tommasi, Tatiana},
  journal     = {{IEEE Transactions on Emerging Topics in Computing}},
  publisher   = {{Institute of Electrical and Electronics Engineers}},
  year        = {2024},
  pages       = {1-12},
  url         = {https://hal.science/hal-04818068},
  pdf         = {https://hal.science/hal-04818068v1/file/tetc_2023_diehardnet.pdf},
  keywords    = {Deep Learning; Reliability; Neutrons; GPUs; Radiation-induced faults},
  hal_id      = {hal-04818068},
  hal_version = {v1}
}
```

The setup files and scripts for validating with neutron beams are available at diehardnetradsetup.