# Trainers

## Multi-GPU on a Single Node
## (1) Torchrun with Distributed Data Parallel (DDP)
```bash
torchrun --standalone --nproc_per_node=gpu main.py --model Keci --trainer torchDDP --num_epochs 1 --scoring_technique KvsAll --embedding_dim 256 --p 1 --q 2 --path_dataset_folder "KGs/YAGO3-10" --eval_mode None
# or
torchrun --standalone --nproc_per_node=gpu main.py --model Pykeen_ComplEx --trainer torchDDP --num_epochs 1 --scoring_technique KvsAll --embedding_dim 256 --path_dataset_folder "KGs/YAGO3-10" --eval_mode None
```
Note: PyKEEN models appear to leak memory; memory usage seems to increase during training.
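One way to check the suspected leak is to sample GPU memory while training runs and compare the first and last readings. The sketch below is not part of the project: `mem_growth` is a hypothetical helper name, and it assumes a log with one used-memory value in MiB per line, such as the output of `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`.

```bash
# mem_growth (hypothetical helper): read one used-memory sample in MiB per line
# from stdin and print the difference between the last and the first sample.
mem_growth() {
  awk 'NR == 1 { first = $1 } { last = $1 } END { if (NR > 0) print last - first }'
}

# Self-check with made-up samples (not measurements): prints 50.
printf '100\n120\n150\n' | mem_growth

# On a machine with an NVIDIA driver, sample GPU 0 every 5 seconds during training,
# stop the sampler when training ends, then inspect the growth:
#   nvidia-smi -i 0 --query-gpu=memory.used --format=csv,noheader,nounits -l 5 > gpu0_mem.log
#   mem_growth < gpu0_mem.log
```

A value that keeps growing from epoch to epoch would support the leak hypothesis, whereas a plateau after the first epoch may simply reflect memory cached by the framework.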
## (2) PyTorch Lightning (PL) with DDP
```bash
python main.py --trainer 'PL' --accelerator "gpu" --strategy "ddp" --model Keci --num_epochs 1 --scoring_technique KvsAll --embedding_dim 256 --p 1 --q 2 --path_dataset_folder "KGs/YAGO3-10" --eval_mode None
```

## (3) PL with DDP and Low Precision
The following command runs on two RTX 3090 GPUs (24 GB each) and uses 7484 MiB / 24576 MiB of GPU memory.
```bash
python main.py --trainer 'PL' --accelerator "gpu" --strategy "ddp" --model Keci --num_epochs 1 --scoring_technique KvsAll --embedding_dim 256 --p 1 --q 2 --path_dataset_folder "KGs/YAGO3-10" --eval_mode None --batch_size 1024
```
With `--precision 16`, the following command uses 7082 MiB / 24576 MiB of GPU memory.
```bash
python main.py --trainer 'PL' --accelerator "gpu" --strategy "ddp" --model Keci --num_epochs 1 --scoring_technique KvsAll --embedding_dim 256 --p 1 --q 2 --path_dataset_folder "KGs/YAGO3-10" --eval_mode None --batch_size 1024 --precision 16
```
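To put the two memory figures in this section side by side, the arithmetic can be sketched as follows. This assumes that 7484 MiB belongs to the default-precision run and 7082 MiB to the `--precision 16` run; `precision_savings` is a hypothetical helper name used only for illustration.

```bash
# precision_savings (hypothetical helper): given baseline and reduced memory usage
# in MiB, print the absolute and relative saving.
precision_savings() {
  awk -v base="$1" -v reduced="$2" \
    'BEGIN { printf "%d MiB saved (%.1f%%)\n", base - reduced, 100 * (base - reduced) / base }'
}

# Figures quoted in this section; assumes 7484 MiB is the default-precision run
# and 7082 MiB is the --precision 16 run.
precision_savings 7484 7082   # prints: 402 MiB saved (5.4%)
```

Under that assumption the saving is small relative to the total, so half precision alone is unlikely to make room for a much larger model on the same GPUs.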
## (4) PyTorch Lightning (PL) with DeepSpeed Stages 1-3
We did not observe any reduction in GPU memory usage with DeepSpeed.
```bash
python main.py --trainer 'PL' --accelerator "gpu" --strategy "deepspeed_stage_3" --model Keci --num_epochs 1 --scoring_technique KvsAll --embedding_dim 256 --p 1 --q 2 --path_dataset_folder "KGs/YAGO3-10" --eval_mode None --batch_size 1024 --precision 16
```
## Multi-GPU on Multi-Node

## (1) Torchrun and DDP

Execute the following command on node 1.
```bash
torchrun --nnodes 2 --nproc_per_node=gpu  --node_rank 0 --rdzv_id 455 --rdzv_backend c10d --rdzv_endpoint=nebula main.py --model 'ComplEx' --embedding_dim 32 --num_epochs 100 --path_dataset_folder 'KGs/UMLS' --trainer torchDDP
```
Execute the following command on node 2.
```bash
torchrun --nnodes 2 --nproc_per_node=gpu  --node_rank 1 --rdzv_id 455 --rdzv_backend c10d --rdzv_endpoint=nebula main.py --model 'ComplEx' --embedding_dim 32 --num_epochs 100 --path_dataset_folder 'KGs/UMLS' --trainer torchDDP
```
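The two commands above differ only in `--node_rank`; `--nnodes`, `--rdzv_id`, `--rdzv_backend` and `--rdzv_endpoint` are the same on both nodes. A small wrapper can make that explicit. The sketch below only prints the command instead of running it, and `build_torchrun_cmd` is a hypothetical helper name, not something the project provides.

```bash
# build_torchrun_cmd (hypothetical helper): print the torchrun command for one node.
#   $1 = node rank (0 for the first node, 1 for the second)
#   $2 = rendezvous host, defaults to the host name used above
build_torchrun_cmd() {
  node_rank="$1"
  endpoint="${2:-nebula}"
  echo "torchrun --nnodes 2 --nproc_per_node=gpu --node_rank ${node_rank}" \
       "--rdzv_id 455 --rdzv_backend c10d --rdzv_endpoint=${endpoint}" \
       "main.py --model ComplEx --embedding_dim 32 --num_epochs 100" \
       "--path_dataset_folder KGs/UMLS --trainer torchDDP"
}

# Dry run: show what each node would execute.
build_torchrun_cmd 0
build_torchrun_cmd 1
```

To actually launch, one option is to run `eval "$(build_torchrun_cmd 0)"` on the first node and `eval "$(build_torchrun_cmd 1)"` on the second.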

The same setup works with a PyKEEN model, here with `felis` as the rendezvous host. Execute the following command on node 1.
```bash
torchrun --nnodes 2 --nproc_per_node=gpu  --node_rank 0 --rdzv_id 455 --rdzv_backend c10d --rdzv_endpoint=felis main.py --model 'Pykeen_QuatE' --embedding_dim 32 --num_epochs 100 --path_dataset_folder 'KGs/UMLS' --trainer torchDDP
```
Execute the following command on node 2.
```bash
torchrun --nnodes 2 --nproc_per_node=gpu  --node_rank 1 --rdzv_id 455 --rdzv_backend c10d --rdzv_endpoint=felis main.py --model 'Pykeen_QuatE' --embedding_dim 32 --num_epochs 100 --path_dataset_folder 'KGs/UMLS' --trainer torchDDP
```

## Multi-GPU on Multi-Node with Model Parallel
```bash
python main.py --trainer 'PL' --accelerator "gpu" --strategy "deepspeed" --model Pykeen_QuatE --num_epochs 1 --embedding_dim 1 --batch_size 256 --scoring_technique KvsAll --path_dataset_folder "KGs/YAGO3-10" --eval_model None
```