In [1]:
# 06_Train_DistMult
#
# created by LuYF-Lemon-love <luyanfeng_nlp@qq.com> on February 27, 2023
# updated by LuYF-Lemon-love <luyanfeng_nlp@qq.com> on February 27, 2023
#
# This script shows how to train a model (DistMult) on DRKG and use grid search to find the best hyperparameters.
#
# Required packages:
#          torch
#          dgl, version: 0.4.3
#          dglke
#          numpy
#
# Required files:
#          ./dataset
#
# Source tutorial: https://github.com/gnn4dr/DRKG/blob/master/embedding_analysis/Train_embeddings.ipynb

# Training DRKG Using DistMult

This notebook shows how to train a DistMult model on DRKG and use grid search to find the best hyperparameters.

## Import the required libraries

In [2]:
import numpy as np

## Grid search parameters

We can train the DistMult model with the DGL-KE command-line tool. For more information on how to use DGL-KE, see https://github.com/awslabs/dgl-ke.

Here we train the model on two GPUs. In each run below, the value shown in bold is the one used for that run.
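The runs below cover every combination of `hidden_dim` (200, 400), `gamma` (50, 125, 200) and `lr` (0.01, 0.05, 0.1), with all other flags fixed. As a minimal sketch, the same grid could be enumerated in Python instead of being written out by hand. The names `TEMPLATE` and `build_grid_commands` are hypothetical helpers introduced here for illustration; the command text itself is copied from the training cells in this notebook.

```python
from itertools import product

# Hyperparameter values taken from the run lists in this notebook.
HIDDEN_DIMS = [200, 400]
GAMMAS = [50.0, 125.0, 200.0]
LRS = [0.01, 0.05, 0.1]

# Command copied from the training cells; only the three searched
# values are left as placeholders.
TEMPLATE = (
    "DGLBACKEND=pytorch dglke_train --dataset DRKG --data_path ./dataset "
    "--data_files drkg_train.tsv drkg_valid.tsv drkg_test.tsv --format 'raw_udd_hrt' "
    "--model_name DistMult "
    "--batch_size 4096 --neg_sample_size 256 --hidden_dim {hidden_dim} "
    "--gamma {gamma} --lr {lr} --max_step 100000 -adv --regularization_coef 1.00E-07 "
    "--gpu 0 1 --num_proc 2 --mix_cpu_gpu --async_update --force_sync_interval 1000 "
    "--valid --test "
    "--batch_size_eval 128 --neg_sample_size_eval 10000 "
    "--log_interval 20000 --eval_interval 50000 --num_thread 32"
)


def build_grid_commands():
    """Return one dglke_train command string per grid point."""
    commands = []
    # product() varies lr fastest, then gamma, then hidden_dim,
    # which matches the order of runs 1 to 18 in this notebook.
    for hidden_dim, gamma, lr in product(HIDDEN_DIMS, GAMMAS, LRS):
        commands.append(TEMPLATE.format(hidden_dim=hidden_dim, gamma=gamma, lr=lr))
    return commands


commands = build_grid_commands()
for run_id, command in enumerate(commands, start=1):
    print(run_id, command)
```

The sketch only prints the commands. Each string could then be pasted into a notebook cell after `!`, which keeps the per-run output separate as in the cells below.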

### 1

- batch_size: **4096**

- neg_sample_size: **256**

- hidden_dim: **200**, 400

- gamma: **50**, 125, 200

- lr: **0.01**, 0.05, 0.1

In [3]:
!DGLBACKEND=pytorch dglke_train --dataset DRKG --data_path ./dataset \
--data_files drkg_train.tsv drkg_valid.tsv drkg_test.tsv --format 'raw_udd_hrt' \
--model_name DistMult \
--batch_size 4096 --neg_sample_size 256 --hidden_dim 200 \
--gamma 50.0 --lr 0.01 --max_step 100000 -adv --regularization_coef 1.00E-07 \
--gpu 0 1 --num_proc 2 --mix_cpu_gpu --async_update --force_sync_interval 1000 \
--valid --test \
--batch_size_eval 128 --neg_sample_size_eval 10000 \
--log_interval 20000 --eval_interval 50000 --num_thread 32

Reading train triples....
Finished. Read 5286834 train triples.
Reading valid triples....
Finished. Read 293713 valid triples.
Reading test triples....
Finished. Read 293714 test triples.
|Train|: 5286834
random partition 5286834 edges into 2 parts
part 0 has 2643417 edges
part 1 has 2643417 edges
|valid|: 293713
|test|: 293714
Total initialize time 16.312 seconds
[proc 1][Train](20000/100000) average pos_loss: 0.42179756767898796
[proc 0][Train](20000/100000) average pos_loss: 0.4218365873351693
[proc 1][Train](20000/100000) average neg_loss: 0.6085648582309484
[proc 0][Train](20000/100000) average neg_loss: 0.6083566954851151
[proc 1][Train](20000/100000) average loss: 0.5151812129944563
[proc 0][Train](20000/100000) average loss: 0.5150966415151954
[proc 1][Train](20000/100000) average regularization: 0.020644050214713206
[proc 1][Train] 20000 steps take 355.567 seconds
[proc 1]sample: 64.617, forward: 152.108, backward: 48.961, update: 83.055
[proc 0][Train](20000/100000) average r
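Each training process prints its own `average pos_loss`, `average neg_loss`, `average loss` and `average regularization` lines, so comparing runs by eye gets tedious. The following is a rough sketch of how such lines could be parsed and averaged over processes. The regular expression is inferred from the log lines shown in this notebook rather than from any documented DGL-KE log format, and `parse_train_log` and `mean_over_procs` are hypothetical helper names.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [proc 1][Train](20000/100000) average loss: 0.5151812129944563
LOG_PATTERN = re.compile(
    r"\[proc (\d+)\]\[Train\]\((\d+)/(\d+)\) average (\w+): "
    r"([0-9]+\.?[0-9]*(?:[eE][+-]?[0-9]+)?)"
)


def parse_train_log(text):
    """Collect {(step, metric): {proc: value}} from dglke_train log text."""
    metrics = defaultdict(dict)
    for line in text.splitlines():
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # skip other log lines and lines cut off before the value
        proc, step, _max_step, name, value = match.groups()
        metrics[(int(step), name)][int(proc)] = float(value)
    return metrics


def mean_over_procs(metrics, step, name):
    """Average one metric over all processes that reported it."""
    values = metrics[(step, name)].values()
    return sum(values) / len(values)


# Lines copied from the output of run 1 above.
sample_log = """\
[proc 1][Train](20000/100000) average pos_loss: 0.42179756767898796
[proc 0][Train](20000/100000) average pos_loss: 0.4218365873351693
[proc 1][Train](20000/100000) average neg_loss: 0.6085648582309484
[proc 0][Train](20000/100000) average neg_loss: 0.6083566954851151
[proc 1][Train](20000/100000) average loss: 0.5151812129944563
[proc 0][Train](20000/100000) average loss: 0.5150966415151954
"""

metrics = parse_train_log(sample_log)
print(mean_over_procs(metrics, 20000, "loss"))
```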

### 2

- batch_size: **4096**

- neg_sample_size: **256**

- hidden_dim: **200**, 400

- gamma: **50**, 125, 200

- lr: 0.01, **0.05**, 0.1

In [4]:
!DGLBACKEND=pytorch dglke_train --dataset DRKG --data_path ./dataset \
--data_files drkg_train.tsv drkg_valid.tsv drkg_test.tsv --format 'raw_udd_hrt' \
--model_name DistMult \
--batch_size 4096 --neg_sample_size 256 --hidden_dim 200 \
--gamma 50.0 --lr 0.05 --max_step 100000 -adv --regularization_coef 1.00E-07 \
--gpu 0 1 --num_proc 2 --mix_cpu_gpu --async_update --force_sync_interval 1000 \
--valid --test \
--batch_size_eval 128 --neg_sample_size_eval 10000 \
--log_interval 20000 --eval_interval 50000 --num_thread 32

Reading train triples....
Finished. Read 5286834 train triples.
Reading valid triples....
Finished. Read 293713 valid triples.
Reading test triples....
Finished. Read 293714 test triples.
|Train|: 5286834
random partition 5286834 edges into 2 parts
part 0 has 2643417 edges
part 1 has 2643417 edges
|valid|: 293713
|test|: 293714
Total initialize time 16.438 seconds
[proc 1][Train](20000/100000) average pos_loss: 0.40029544595479966
[proc 0][Train](20000/100000) average pos_loss: 0.40069517919272185
[proc 0][Train](20000/100000) average neg_loss: 0.5240841154903173
[proc 1][Train](20000/100000) average neg_loss: 0.525641973157227
[proc 0][Train](20000/100000) average loss: 0.46238964741081
[proc 1][Train](20000/100000) average loss: 0.46296870943009855
[proc 0][Train](20000/100000) average regularization: 0.0254632799379935
[proc 1][Train](20000/100000) average regularization: 0.025405513223598245
[proc 0][Train] 20000 steps take 425.216 seconds
[proc 0]sample: 68.003, forward: 153.047, 

### 3

- batch_size: **4096**

- neg_sample_size: **256**

- hidden_dim: **200**, 400

- gamma: **50**, 125, 200

- lr: 0.01, 0.05, **0.1**

In [5]:
!DGLBACKEND=pytorch dglke_train --dataset DRKG --data_path ./dataset \
--data_files drkg_train.tsv drkg_valid.tsv drkg_test.tsv --format 'raw_udd_hrt' \
--model_name DistMult \
--batch_size 4096 --neg_sample_size 256 --hidden_dim 200 \
--gamma 50.0 --lr 0.1 --max_step 100000 -adv --regularization_coef 1.00E-07 \
--gpu 0 1 --num_proc 2 --mix_cpu_gpu --async_update --force_sync_interval 1000 \
--valid --test \
--batch_size_eval 128 --neg_sample_size_eval 10000 \
--log_interval 20000 --eval_interval 50000 --num_thread 32

Reading train triples....
Finished. Read 5286834 train triples.
Reading valid triples....
Finished. Read 293713 valid triples.
Reading test triples....
Finished. Read 293714 test triples.
|Train|: 5286834
random partition 5286834 edges into 2 parts
part 0 has 2643417 edges
part 1 has 2643417 edges
|valid|: 293713
|test|: 293714
Total initialize time 16.386 seconds
[proc 0][Train](20000/100000) average pos_loss: 0.3697479482918978
[proc 1][Train](20000/100000) average pos_loss: 0.3705442703962326
[proc 0][Train](20000/100000) average neg_loss: 0.5007138380482793
[proc 1][Train](20000/100000) average neg_loss: 0.5011041799411178
[proc 0][Train](20000/100000) average loss: 0.43523089328855274
[proc 1][Train](20000/100000) average loss: 0.4358242250487208
[proc 0][Train](20000/100000) average regularization: 0.028166449290263698
[proc 0][Train] 20000 steps take 372.339 seconds
[proc 0]sample: 68.729, forward: 159.020, backward: 52.152, update: 92.166
[proc 1][Train](20000/100000) average r

### 4

- batch_size: **4096**

- neg_sample_size: **256**

- hidden_dim: **200**, 400

- gamma: 50, **125**, 200

- lr: **0.01**, 0.05, 0.1

In [6]:
!DGLBACKEND=pytorch dglke_train --dataset DRKG --data_path ./dataset \
--data_files drkg_train.tsv drkg_valid.tsv drkg_test.tsv --format 'raw_udd_hrt' \
--model_name DistMult \
--batch_size 4096 --neg_sample_size 256 --hidden_dim 200 \
--gamma 125.0 --lr 0.01 --max_step 100000 -adv --regularization_coef 1.00E-07 \
--gpu 0 1 --num_proc 2 --mix_cpu_gpu --async_update --force_sync_interval 1000 \
--valid --test \
--batch_size_eval 128 --neg_sample_size_eval 10000 \
--log_interval 20000 --eval_interval 50000 --num_thread 32

Reading train triples....
Finished. Read 5286834 train triples.
Reading valid triples....
Finished. Read 293713 valid triples.
Reading test triples....
Finished. Read 293714 test triples.
|Train|: 5286834
random partition 5286834 edges into 2 parts
part 0 has 2643417 edges
part 1 has 2643417 edges
|valid|: 293713
|test|: 293714
Total initialize time 16.263 seconds
[proc 1][Train](20000/100000) average pos_loss: 0.4636830395489931
[proc 0][Train](20000/100000) average pos_loss: 0.4631375725805759
[proc 1][Train](20000/100000) average neg_loss: 0.6498785514950752
[proc 0][Train](20000/100000) average neg_loss: 0.6498611712157726
[proc 1][Train](20000/100000) average loss: 0.5567807957112789
[proc 1][Train](20000/100000) average regularization: 0.023433260954963044
[proc 0][Train](20000/100000) average loss: 0.5564993717148901
[proc 1][Train] 20000 steps take 378.413 seconds
[proc 1]sample: 64.727, forward: 151.948, backward: 46.615, update: 87.780
[proc 0][Train](20000/100000) average re

### 5

- batch_size: **4096**

- neg_sample_size: **256**

- hidden_dim: **200**, 400

- gamma: 50, **125**, 200

- lr: 0.01, **0.05**, 0.1

In [7]:
!DGLBACKEND=pytorch dglke_train --dataset DRKG --data_path ./dataset \
--data_files drkg_train.tsv drkg_valid.tsv drkg_test.tsv --format 'raw_udd_hrt' \
--model_name DistMult \
--batch_size 4096 --neg_sample_size 256 --hidden_dim 200 \
--gamma 125.0 --lr 0.05 --max_step 100000 -adv --regularization_coef 1.00E-07 \
--gpu 0 1 --num_proc 2 --mix_cpu_gpu --async_update --force_sync_interval 1000 \
--valid --test \
--batch_size_eval 128 --neg_sample_size_eval 10000 \
--log_interval 20000 --eval_interval 50000 --num_thread 32

Reading train triples....
Finished. Read 5286834 train triples.
Reading valid triples....
Finished. Read 293713 valid triples.
Reading test triples....
Finished. Read 293714 test triples.
|Train|: 5286834
random partition 5286834 edges into 2 parts
part 0 has 2643417 edges
part 1 has 2643417 edges
|valid|: 293713
|test|: 293714
Total initialize time 16.298 seconds
[proc 0][Train](20000/100000) average pos_loss: 0.40686404414772986
[proc 1][Train](20000/100000) average pos_loss: 0.4068410370916128
[proc 0][Train](20000/100000) average neg_loss: 0.5964312821388245
[proc 1][Train](20000/100000) average neg_loss: 0.5967325039222836
[proc 0][Train](20000/100000) average loss: 0.5016476630613208
[proc 1][Train](20000/100000) average loss: 0.5017867705523967
[proc 0][Train](20000/100000) average regularization: 0.030246382289892064
[proc 0][Train] 20000 steps take 364.646 seconds
[proc 0]sample: 68.648, forward: 154.113, backward: 50.609, update: 90.963
[proc 1][Train](20000/100000) average r

### 6

- batch_size: **4096**

- neg_sample_size: **256**

- hidden_dim: **200**, 400

- gamma: 50, **125**, 200

- lr: 0.01, 0.05, **0.1**

In [8]:
!DGLBACKEND=pytorch dglke_train --dataset DRKG --data_path ./dataset \
--data_files drkg_train.tsv drkg_valid.tsv drkg_test.tsv --format 'raw_udd_hrt' \
--model_name DistMult \
--batch_size 4096 --neg_sample_size 256 --hidden_dim 200 \
--gamma 125.0 --lr 0.1 --max_step 100000 -adv --regularization_coef 1.00E-07 \
--gpu 0 1 --num_proc 2 --mix_cpu_gpu --async_update --force_sync_interval 1000 \
--valid --test \
--batch_size_eval 128 --neg_sample_size_eval 10000 \
--log_interval 20000 --eval_interval 50000 --num_thread 32

Reading train triples....
Finished. Read 5286834 train triples.
Reading valid triples....
Finished. Read 293713 valid triples.
Reading test triples....
Finished. Read 293714 test triples.
|Train|: 5286834
random partition 5286834 edges into 2 parts
part 0 has 2643417 edges
part 1 has 2643417 edges
|valid|: 293713
|test|: 293714
Total initialize time 16.173 seconds
[proc 1][Train](20000/100000) average pos_loss: 0.39375939021855594
[proc 0][Train](20000/100000) average pos_loss: 0.39479281584471465
[proc 1][Train](20000/100000) average neg_loss: 0.5217447260841728
[proc 0][Train](20000/100000) average neg_loss: 0.5194535019516945
[proc 1][Train](20000/100000) average loss: 0.4577520582139492
[proc 0][Train](20000/100000) average loss: 0.4571231588438153
[proc 1][Train](20000/100000) average regularization: 0.02917581582288258
[proc 1][Train] 20000 steps take 421.509 seconds
[proc 1]sample: 65.513, forward: 149.961, backward: 47.434, update: 88.523
[proc 0][Train](20000/100000) average r

### 7

- batch_size: **4096**

- neg_sample_size: **256**

- hidden_dim: **200**, 400

- gamma: 50, 125, **200**

- lr: **0.01**, 0.05, 0.1

In [9]:
!DGLBACKEND=pytorch dglke_train --dataset DRKG --data_path ./dataset \
--data_files drkg_train.tsv drkg_valid.tsv drkg_test.tsv --format 'raw_udd_hrt' \
--model_name DistMult \
--batch_size 4096 --neg_sample_size 256 --hidden_dim 200 \
--gamma 200.0 --lr 0.01 --max_step 100000 -adv --regularization_coef 1.00E-07 \
--gpu 0 1 --num_proc 2 --mix_cpu_gpu --async_update --force_sync_interval 1000 \
--valid --test \
--batch_size_eval 128 --neg_sample_size_eval 10000 \
--log_interval 20000 --eval_interval 50000 --num_thread 32

Reading train triples....
Finished. Read 5286834 train triples.
Reading valid triples....
Finished. Read 293713 valid triples.
Reading test triples....
Finished. Read 293714 test triples.
|Train|: 5286834
random partition 5286834 edges into 2 parts
part 0 has 2643417 edges
part 1 has 2643417 edges
|valid|: 293713
|test|: 293714
Total initialize time 16.299 seconds
[proc 0][Train](20000/100000) average pos_loss: 0.5182063047111034
[proc 1][Train](20000/100000) average pos_loss: 0.5181976607844233
[proc 0][Train](20000/100000) average neg_loss: 0.7419600798189641
[proc 1][Train](20000/100000) average neg_loss: 0.7419044578105212
[proc 0][Train](20000/100000) average loss: 0.6300831921756268
[proc 0][Train](20000/100000) average regularization: 0.039160269008763134
[proc 1][Train](20000/100000) average loss: 0.6300510591447354
[proc 0][Train] 20000 steps take 373.205 seconds
[proc 0]sample: 68.089, forward: 159.606, backward: 50.808, update: 94.344
[proc 1][Train](20000/100000) average re

### 8

- batch_size: **4096**

- neg_sample_size: **256**

- hidden_dim: **200**, 400

- gamma: 50, 125, **200**

- lr: 0.01, **0.05**, 0.1

In [10]:
!DGLBACKEND=pytorch dglke_train --dataset DRKG --data_path ./dataset \
--data_files drkg_train.tsv drkg_valid.tsv drkg_test.tsv --format 'raw_udd_hrt' \
--model_name DistMult \
--batch_size 4096 --neg_sample_size 256 --hidden_dim 200 \
--gamma 200.0 --lr 0.05 --max_step 100000 -adv --regularization_coef 1.00E-07 \
--gpu 0 1 --num_proc 2 --mix_cpu_gpu --async_update --force_sync_interval 1000 \
--valid --test \
--batch_size_eval 128 --neg_sample_size_eval 10000 \
--log_interval 20000 --eval_interval 50000 --num_thread 32

Reading train triples....
Finished. Read 5286834 train triples.
Reading valid triples....
Finished. Read 293713 valid triples.
Reading test triples....
Finished. Read 293714 test triples.
|Train|: 5286834
random partition 5286834 edges into 2 parts
part 0 has 2643417 edges
part 1 has 2643417 edges
|valid|: 293713
|test|: 293714
Total initialize time 16.385 seconds
[proc 0][Train](20000/100000) average pos_loss: 0.4436071756899357
[proc 1][Train](20000/100000) average pos_loss: 0.44374067816734314
[proc 1][Train](20000/100000) average neg_loss: 0.6223357828766107
[proc 0][Train](20000/100000) average neg_loss: 0.6227474636226893
[proc 1][Train](20000/100000) average loss: 0.5330382306039333
[proc 0][Train](20000/100000) average loss: 0.5331773195847869
[proc 1][Train](20000/100000) average regularization: 0.038742702197469774
[proc 0][Train](20000/100000) average regularization: 0.03875473017254844
[proc 1][Train] 20000 steps take 360.922 seconds
[proc 1]sample: 64.849, forward: 150.290

### 9

- batch_size: **4096**

- neg_sample_size: **256**

- hidden_dim: **200**, 400

- gamma: 50, 125, **200**

- lr: 0.01, 0.05, **0.1**

In [11]:
!DGLBACKEND=pytorch dglke_train --dataset DRKG --data_path ./dataset \
--data_files drkg_train.tsv drkg_valid.tsv drkg_test.tsv --format 'raw_udd_hrt' \
--model_name DistMult \
--batch_size 4096 --neg_sample_size 256 --hidden_dim 200 \
--gamma 200.0 --lr 0.1 --max_step 100000 -adv --regularization_coef 1.00E-07 \
--gpu 0 1 --num_proc 2 --mix_cpu_gpu --async_update --force_sync_interval 1000 \
--valid --test \
--batch_size_eval 128 --neg_sample_size_eval 10000 \
--log_interval 20000 --eval_interval 50000 --num_thread 32

Reading train triples....
Finished. Read 5286834 train triples.
Reading valid triples....
Finished. Read 293713 valid triples.
Reading test triples....
Finished. Read 293714 test triples.
|Train|: 5286834
random partition 5286834 edges into 2 parts
part 0 has 2643417 edges
part 1 has 2643417 edges
|valid|: 293713
|test|: 293714
Total initialize time 16.140 seconds
[proc 1][Train](20000/100000) average pos_loss: 0.42567392603605986
[proc 0][Train](20000/100000) average pos_loss: 0.42609611642956735
[proc 1][Train](20000/100000) average neg_loss: 0.5435431803569197
[proc 0][Train](20000/100000) average neg_loss: 0.543165649817884
[proc 1][Train](20000/100000) average loss: 0.48460855323672297
[proc 0][Train](20000/100000) average loss: 0.4846308830395341
[proc 1][Train](20000/100000) average regularization: 0.032344718007743356
[proc 0][Train](20000/100000) average regularization: 0.03235437264582142
[proc 1][Train] 20000 steps take 366.254 seconds
[proc 1]sample: 68.019, forward: 156.58

### 10

- batch_size: **4096**

- neg_sample_size: **256**

- hidden_dim: 200, **400**

- gamma: **50**, 125, 200

- lr: **0.01**, 0.05, 0.1

In [12]:
!DGLBACKEND=pytorch dglke_train --dataset DRKG --data_path ./dataset \
--data_files drkg_train.tsv drkg_valid.tsv drkg_test.tsv --format 'raw_udd_hrt' \
--model_name DistMult \
--batch_size 4096 --neg_sample_size 256 --hidden_dim 400 \
--gamma 50.0 --lr 0.01 --max_step 100000 -adv --regularization_coef 1.00E-07 \
--gpu 0 1 --num_proc 2 --mix_cpu_gpu --async_update --force_sync_interval 1000 \
--valid --test \
--batch_size_eval 128 --neg_sample_size_eval 10000 \
--log_interval 20000 --eval_interval 50000 --num_thread 32

Reading train triples....
Finished. Read 5286834 train triples.
Reading valid triples....
Finished. Read 293713 valid triples.
Reading test triples....
Finished. Read 293714 test triples.
|Train|: 5286834
random partition 5286834 edges into 2 parts
part 0 has 2643417 edges
part 1 has 2643417 edges
|valid|: 293713
|test|: 293714
Total initialize time 16.431 seconds
[proc 1][Train](20000/100000) average pos_loss: 0.4158593353509903
[proc 0][Train](20000/100000) average pos_loss: 0.41543584139496087
[proc 1][Train](20000/100000) average neg_loss: 0.5876080206394195
[proc 0][Train](20000/100000) average neg_loss: 0.5878328572645783
[proc 1][Train](20000/100000) average loss: 0.5017336778134107
[proc 0][Train](20000/100000) average loss: 0.5016343495354056
[proc 1][Train](20000/100000) average regularization: 0.020581230253275136
[proc 0][Train](20000/100000) average regularization: 0.02059113002899685
[proc 1][Train] 20000 steps take 624.818 seconds
[proc 1]sample: 71.123, forward: 226.880

### 11

- batch_size: **4096**

- neg_sample_size: **256**

- hidden_dim: 200, **400**

- gamma: **50**, 125, 200

- lr: 0.01, **0.05**, 0.1

In [13]:
!DGLBACKEND=pytorch dglke_train --dataset DRKG --data_path ./dataset \
--data_files drkg_train.tsv drkg_valid.tsv drkg_test.tsv --format 'raw_udd_hrt' \
--model_name DistMult \
--batch_size 4096 --neg_sample_size 256 --hidden_dim 400 \
--gamma 50.0 --lr 0.05 --max_step 100000 -adv --regularization_coef 1.00E-07 \
--gpu 0 1 --num_proc 2 --mix_cpu_gpu --async_update --force_sync_interval 1000 \
--valid --test \
--batch_size_eval 128 --neg_sample_size_eval 10000 \
--log_interval 20000 --eval_interval 50000 --num_thread 32

Reading train triples....
Finished. Read 5286834 train triples.
Reading valid triples....
Finished. Read 293713 valid triples.
Reading test triples....
Finished. Read 293714 test triples.
|Train|: 5286834
random partition 5286834 edges into 2 parts
part 0 has 2643417 edges
part 1 has 2643417 edges
|valid|: 293713
|test|: 293714
Total initialize time 16.483 seconds
[proc 0][Train](20000/100000) average pos_loss: 0.383358545742929
[proc 1][Train](20000/100000) average pos_loss: 0.383545296856761
[proc 0][Train](20000/100000) average neg_loss: 0.5061397173285485
[proc 1][Train](20000/100000) average neg_loss: 0.5061507267206907
[proc 0][Train](20000/100000) average loss: 0.4447491315081716
[proc 1][Train](20000/100000) average loss: 0.44484801180809735
[proc 1][Train](20000/100000) average regularization: 0.02617844171752222
[proc 0][Train](20000/100000) average regularization: 0.02617931953517691
[proc 1][Train] 20000 steps take 485.211 seconds
[proc 1]sample: 69.725, forward: 240.174, b

### 12

- batch_size: **4096**

- neg_sample_size: **256**

- hidden_dim: 200, **400**

- gamma: **50**, 125, 200

- lr: 0.01, 0.05, **0.1**

In [14]:
!DGLBACKEND=pytorch dglke_train --dataset DRKG --data_path ./dataset \
--data_files drkg_train.tsv drkg_valid.tsv drkg_test.tsv --format 'raw_udd_hrt' \
--model_name DistMult \
--batch_size 4096 --neg_sample_size 256 --hidden_dim 400 \
--gamma 50.0 --lr 0.1 --max_step 100000 -adv --regularization_coef 1.00E-07 \
--gpu 0 1 --num_proc 2 --mix_cpu_gpu --async_update --force_sync_interval 1000 \
--valid --test \
--batch_size_eval 128 --neg_sample_size_eval 10000 \
--log_interval 20000 --eval_interval 50000 --num_thread 32

Reading train triples....
Finished. Read 5286834 train triples.
Reading valid triples....
Finished. Read 293713 valid triples.
Reading test triples....
Finished. Read 293714 test triples.
|Train|: 5286834
random partition 5286834 edges into 2 parts
part 0 has 2643417 edges
part 1 has 2643417 edges
|valid|: 293713
|test|: 293714
Total initialize time 16.583 seconds
[proc 0][Train](20000/100000) average pos_loss: 0.3405415379628539
[proc 1][Train](20000/100000) average pos_loss: 0.340508358488977
[proc 0][Train](20000/100000) average neg_loss: 0.48535354651361706
[proc 0][Train](20000/100000) average loss: 0.4129475421488285
[proc 1][Train](20000/100000) average neg_loss: 0.48885482498258354
[proc 0][Train](20000/100000) average regularization: 0.03479059743471444
[proc 0][Train] 20000 steps take 626.788 seconds
[proc 0]sample: 71.626, forward: 244.257, backward: 55.004, update: 255.613
[proc 1][Train](20000/100000) average loss: 0.41468159174621105
[proc 1][Train](20000/100000) average 
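The timing line in the log above splits each logging interval into `sample`, `forward`, `backward` and `update` seconds. A small sketch like the one below could turn such a line into per-phase shares, which makes it easier to see where the extra time goes when `hidden_dim` is raised from 200 to 400. The pattern is inferred from the log lines in this notebook, and `phase_shares` is a hypothetical helper name.

```python
import re

# Matches the per-phase timings printed after each logging interval, e.g.
# [proc 0]sample: 71.626, forward: 244.257, backward: 55.004, update: 255.613
TIMING_PATTERN = re.compile(r"(sample|forward|backward|update): ([0-9]+(?:\.[0-9]+)?)")


def phase_shares(line):
    """Return (total_seconds, {phase: share_of_total}) for one timing line."""
    seconds = {name: float(value) for name, value in TIMING_PATTERN.findall(line)}
    total = sum(seconds.values())
    shares = {name: value / total for name, value in seconds.items()}
    return total, shares


# Timing line copied from the output of run 12 above.
line = "[proc 0]sample: 71.626, forward: 244.257, backward: 55.004, update: 255.613"
total, shares = phase_shares(line)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Phases missing from a truncated line are simply left out of the result, so shares computed from an incomplete line are not comparable with those from a complete one.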

### 13

- batch_size: **4096**

- neg_sample_size: **256**

- hidden_dim: 200, **400**

- gamma: 50, **125**, 200

- lr: **0.01**, 0.05, 0.1

In [15]:
!DGLBACKEND=pytorch dglke_train --dataset DRKG --data_path ./dataset \
--data_files drkg_train.tsv drkg_valid.tsv drkg_test.tsv --format 'raw_udd_hrt' \
--model_name DistMult \
--batch_size 4096 --neg_sample_size 256 --hidden_dim 400 \
--gamma 125.0 --lr 0.01 --max_step 100000 -adv --regularization_coef 1.00E-07 \
--gpu 0 1 --num_proc 2 --mix_cpu_gpu --async_update --force_sync_interval 1000 \
--valid --test \
--batch_size_eval 128 --neg_sample_size_eval 10000 \
--log_interval 20000 --eval_interval 50000 --num_thread 32

Reading train triples....
Finished. Read 5286834 train triples.
Reading valid triples....
Finished. Read 293713 valid triples.
Reading test triples....
Finished. Read 293714 test triples.
|Train|: 5286834
random partition 5286834 edges into 2 parts
part 0 has 2643417 edges
part 1 has 2643417 edges
|valid|: 293713
|test|: 293714
Total initialize time 16.627 seconds
[proc 1][Train](20000/100000) average pos_loss: 0.4095848518535495
[proc 0][Train](20000/100000) average pos_loss: 0.40914382050037384
[proc 1][Train](20000/100000) average neg_loss: 0.5988265433490276
[proc 1][Train](20000/100000) average loss: 0.5042056974336505
[proc 0][Train](20000/100000) average neg_loss: 0.5997970301076769
[proc 1][Train](20000/100000) average regularization: 0.023655453068576752
[proc 1][Train] 20000 steps take 606.669 seconds
[proc 1]sample: 70.234, forward: 217.795, backward: 54.631, update: 263.666
[proc 0][Train](20000/100000) average loss: 0.5044704253330827
[proc 0][Train](20000/100000) average 

### 14

- batch_size: **4096**

- neg_sample_size: **256**

- hidden_dim: 200, **400**

- gamma: 50, **125**, 200

- lr: 0.01, **0.05**, 0.1

In [16]:
!DGLBACKEND=pytorch dglke_train --dataset DRKG --data_path ./dataset \
--data_files drkg_train.tsv drkg_valid.tsv drkg_test.tsv --format 'raw_udd_hrt' \
--model_name DistMult \
--batch_size 4096 --neg_sample_size 256 --hidden_dim 400 \
--gamma 125.0 --lr 0.05 --max_step 100000 -adv --regularization_coef 1.00E-07 \
--gpu 0 1 --num_proc 2 --mix_cpu_gpu --async_update --force_sync_interval 1000 \
--valid --test \
--batch_size_eval 128 --neg_sample_size_eval 10000 \
--log_interval 20000 --eval_interval 50000 --num_thread 32

Reading train triples....
Finished. Read 5286834 train triples.
Reading valid triples....
Finished. Read 293713 valid triples.
Reading test triples....
Finished. Read 293714 test triples.
|Train|: 5286834
random partition 5286834 edges into 2 parts
part 0 has 2643417 edges
part 1 has 2643417 edges
|valid|: 293713
|test|: 293714
Total initialize time 16.706 seconds
[proc 0][Train](20000/100000) average pos_loss: 0.37238707480877636
[proc 1][Train](20000/100000) average pos_loss: 0.3717193614915013
[proc 0][Train](20000/100000) average neg_loss: 0.4951714565634727
[proc 0][Train](20000/100000) average loss: 0.43377926576137543
[proc 1][Train](20000/100000) average neg_loss: 0.49749376702308656
[proc 0][Train](20000/100000) average regularization: 0.029983778939163312
[proc 1][Train](20000/100000) average loss: 0.43460656431913375
[proc 0][Train] 20000 steps take 632.778 seconds
[proc 0]sample: 70.859, forward: 231.346, backward: 52.323, update: 277.957
[proc 1][Train](20000/100000) avera

### 15

- batch_size: **4096**

- neg_sample_size: **256**

- hidden_dim: 200, **400**

- gamma: 50, **125**, 200

- lr: 0.01, 0.05, **0.1**

In [17]:
!DGLBACKEND=pytorch dglke_train --dataset DRKG --data_path ./dataset \
--data_files drkg_train.tsv drkg_valid.tsv drkg_test.tsv --format 'raw_udd_hrt' \
--model_name DistMult \
--batch_size 4096 --neg_sample_size 256 --hidden_dim 400 \
--gamma 125.0 --lr 0.1 --max_step 100000 -adv --regularization_coef 1.00E-07 \
--gpu 0 1 --num_proc 2 --mix_cpu_gpu --async_update --force_sync_interval 1000 \
--valid --test \
--batch_size_eval 128 --neg_sample_size_eval 10000 \
--log_interval 20000 --eval_interval 50000 --num_thread 32

Reading train triples....
Finished. Read 5286834 train triples.
Reading valid triples....
Finished. Read 293713 valid triples.
Reading test triples....
Finished. Read 293714 test triples.
|Train|: 5286834
random partition 5286834 edges into 2 parts
part 0 has 2643417 edges
part 1 has 2643417 edges
|valid|: 293713
|test|: 293714
Total initialize time 16.310 seconds
[proc 0][Train](20000/100000) average pos_loss: 0.3483994608566165
[proc 1][Train](20000/100000) average pos_loss: 0.34877421112954615
[proc 0][Train](20000/100000) average neg_loss: 0.493459678901732
[proc 1][Train](20000/100000) average neg_loss: 0.49351727146953345
[proc 0][Train](20000/100000) average loss: 0.4209295699775219
[proc 1][Train](20000/100000) average loss: 0.421145741315186
[proc 0][Train](20000/100000) average regularization: 0.03380008188744541
[proc 1][Train](20000/100000) average regularization: 0.03388066169428639
[proc 0][Train] 20000 steps take 492.715 seconds
[proc 0]sample: 69.655, forward: 242.307, 

### 16

- batch_size: **4096**

- neg_sample_size: **256**

- hidden_dim: 200, **400**

- gamma: 50, 125, **200**

- lr: **0.01**, 0.05, 0.1

In [18]:
!DGLBACKEND=pytorch dglke_train --dataset DRKG --data_path ./dataset \
--data_files drkg_train.tsv drkg_valid.tsv drkg_test.tsv --format 'raw_udd_hrt' \
--model_name DistMult \
--batch_size 4096 --neg_sample_size 256 --hidden_dim 400 \
--gamma 200.0 --lr 0.01 --max_step 100000 -adv --regularization_coef 1.00E-07 \
--gpu 0 1 --num_proc 2 --mix_cpu_gpu --async_update --force_sync_interval 1000 \
--valid --test \
--batch_size_eval 128 --neg_sample_size_eval 10000 \
--log_interval 20000 --eval_interval 50000 --num_thread 32

Reading train triples....
Finished. Read 5286834 train triples.
Reading valid triples....
Finished. Read 293713 valid triples.
Reading test triples....
Finished. Read 293714 test triples.
|Train|: 5286834
random partition 5286834 edges into 2 parts
part 0 has 2643417 edges
part 1 has 2643417 edges
|valid|: 293713
|test|: 293714
Total initialize time 16.444 seconds
[proc 1][Train](20000/100000) average pos_loss: 0.41796293351352215
[proc 0][Train](20000/100000) average pos_loss: 0.417397934910655
[proc 1][Train](20000/100000) average neg_loss: 0.6264719388902187
[proc 0][Train](20000/100000) average neg_loss: 0.6264910311222076
[proc 1][Train](20000/100000) average loss: 0.522217436374724
[proc 1][Train](20000/100000) average regularization: 0.0276744533224497
[proc 0][Train](20000/100000) average loss: 0.5219444829180837
[proc 1][Train] 20000 steps take 519.393 seconds
[proc 1]sample: 72.909, forward: 259.388, backward: 54.352, update: 130.293
[proc 0][Train](20000/100000) average regu

### 17

- batch_size: **4096**

- neg_sample_size: **256**

- hidden_dim: 200, **400**

- gamma: 50, 125, **200**

- lr: 0.01, **0.05**, 0.1

In [19]:
!DGLBACKEND=pytorch dglke_train --dataset DRKG --data_path ./dataset \
--data_files drkg_train.tsv drkg_valid.tsv drkg_test.tsv --format 'raw_udd_hrt' \
--model_name DistMult \
--batch_size 4096 --neg_sample_size 256 --hidden_dim 400 \
--gamma 200.0 --lr 0.05 --max_step 100000 -adv --regularization_coef 1.00E-07 \
--gpu 0 1 --num_proc 2 --mix_cpu_gpu --async_update --force_sync_interval 1000 \
--valid --test \
--batch_size_eval 128 --neg_sample_size_eval 10000 \
--log_interval 20000 --eval_interval 50000 --num_thread 32

Reading train triples....
Finished. Read 5286834 train triples.
Reading valid triples....
Finished. Read 293713 valid triples.
Reading test triples....
Finished. Read 293714 test triples.
|Train|: 5286834
random partition 5286834 edges into 2 parts
part 0 has 2643417 edges
part 1 has 2643417 edges
|valid|: 293713
|test|: 293714
Total initialize time 16.350 seconds
[proc 0][Train](20000/100000) average pos_loss: 0.39820926177799704
[proc 1][Train](20000/100000) average pos_loss: 0.3997192317456007
[proc 0][Train](20000/100000) average neg_loss: 0.5146068893954158
[proc 1][Train](20000/100000) average neg_loss: 0.514714073894918
[proc 0][Train](20000/100000) average loss: 0.45640807555168866
[proc 0][Train](20000/100000) average regularization: 0.031306551433261484
[proc 1][Train](20000/100000) average loss: 0.45721665294617414
[proc 0][Train] 20000 steps take 495.305 seconds
[proc 0]sample: 69.624, forward: 238.967, backward: 50.915, update: 134.523
[proc 1][Train](20000/100000) average

### 18

- batch_size: **4096**

- neg_sample_size: **256**

- hidden_dim: 200, **400**

- gamma: 50, 125, **200**

- lr: 0.01, 0.05, **0.1**

In [20]:
!DGLBACKEND=pytorch dglke_train --dataset DRKG --data_path ./dataset \
--data_files drkg_train.tsv drkg_valid.tsv drkg_test.tsv --format 'raw_udd_hrt' \
--model_name DistMult \
--batch_size 4096 --neg_sample_size 256 --hidden_dim 400 \
--gamma 200.0 --lr 0.1 --max_step 100000 -adv --regularization_coef 1.00E-07 \
--gpu 0 1 --num_proc 2 --mix_cpu_gpu --async_update --force_sync_interval 1000 \
--valid --test \
--batch_size_eval 128 --neg_sample_size_eval 10000 \
--log_interval 20000 --eval_interval 50000 --num_thread 32

Reading train triples....
Finished. Read 5286834 train triples.
Reading valid triples....
Finished. Read 293713 valid triples.
Reading test triples....
Finished. Read 293714 test triples.
|Train|: 5286834
random partition 5286834 edges into 2 parts
part 0 has 2643417 edges
part 1 has 2643417 edges
|valid|: 293713
|test|: 293714
Total initialize time 16.318 seconds
[proc 0][Train](20000/100000) average pos_loss: 0.3673332645520568
[proc 1][Train](20000/100000) average pos_loss: 0.3676285136580467
[proc 0][Train](20000/100000) average neg_loss: 0.5051938116028905
[proc 1][Train](20000/100000) average neg_loss: 0.5057804387018084
[proc 0][Train](20000/100000) average loss: 0.43626353810876606
[proc 1][Train](20000/100000) average loss: 0.43670447624772785
[proc 0][Train](20000/100000) average regularization: 0.034539960472658275
[proc 1][Train](20000/100000) average regularization: 0.034630826802179215
[proc 0][Train] 20000 steps take 491.451 seconds
[proc 0]sample: 70.626, forward: 236.7