This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

Update for Block API #1261

Merged
merged 7 commits into from Jul 17, 2020

Conversation

@leezu (Contributor) commented Jul 10, 2020

  • Remove params and prefix arguments for MXNet 2 and update
    parameter sharing implementation
  • Remove Block.name_scope() for MXNet 2
  • Remove self.params.get() and self.params.get_constant()
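
A minimal, hypothetical sketch (not code from this PR) of how these removals change a typical Gluon Block:

from mxnet import gluon
from mxnet.gluon import nn

# MXNet 1.x style, removed by this change:
#
# class Net(nn.HybridBlock):
#     def __init__(self, prefix=None, params=None):
#         super().__init__(prefix=prefix, params=params)
#         with self.name_scope():
#             self.proj = nn.Dense(16)
#             self.scale = self.params.get('scale', shape=(16,))

# MXNet 2 style: no prefix/params constructor arguments, no name_scope();
# parameters are plain attributes created via gluon.Parameter.
class Net(nn.HybridBlock):
    def __init__(self):
        super().__init__()
        self.proj = nn.Dense(16)
        self.scale = gluon.Parameter('scale', shape=(16,))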

CI will pass once apache/mxnet#18619 is merged

Please review the API changes. Scripts are not updated by this PR, but at least some will be updated and included here following verification of fine-tuning performance and NMT training. Note that it's not required to re-generate the parameter files.

Thanks to @acphile for his hard work on the Gluon API refactor on the MXNet side (apache/mxnet@cb54a4a)

@sxjscience (Member):

@zheyuye @hymzoque Would you also help review?

@leezu (Contributor, Author) commented Jul 11, 2020

% export VERSION=2.0
export MODEL_NAME=google_albert_base_v2
python -m pdb -c continue run_squad.py \
    --model_name ${MODEL_NAME} \
    --data_dir squad \
    --output_dir fintune_${MODEL_NAME}_squad_${VERSION} \
    --version ${VERSION} \
    --do_eval \
    --do_train \
    --batch_size 4 \
    --num_accumulated 3 \
    --gpus 0,1,2,3 \
    --epochs 3 \
    --lr 2e-5 \
    --warmup_ratio 0.1 \
    --wd=0.01 \
    --max_seq_length 512 \
    --max_grad_norm 0.1 \
    --overwrite_cache
All Logs will be saved to fintune_google_albert_base_v2_squad_2.0/finetune_squad2.0.log
[00:17:07] ../src/storage/storage.cc:110: Using GPUPooledRoundedStorageManager.
[00:17:09] ../src/storage/storage.cc:110: Using GPUPooledRoundedStorageManager.
[00:17:12] ../src/storage/storage.cc:110: Using GPUPooledRoundedStorageManager.
[00:17:14] ../src/storage/storage.cc:110: Using GPUPooledRoundedStorageManager.
2020-07-11 00:17:19,894 - root - INFO - Loading Backbone Model from /home/ubuntu/.mxnet/models/nlp/google_albert_base_v2/model-125be477.params, with total/fixd parameters=11092992/0
/home/ubuntu/src/mxnet-master/python/mxnet/gluon/block.py:568: UserWarning: Parameter 'weight' is already initialized, ignoring. Set force_reinit=True to re-initialize.
  v.initialize(None, ctx, init, force_reinit=force_reinit)
/home/ubuntu/src/mxnet-master/python/mxnet/gluon/block.py:568: UserWarning: Parameter 'bias' is already initialized, ignoring. Set force_reinit=True to re-initialize.
  v.initialize(None, ctx, init, force_reinit=force_reinit)
/home/ubuntu/src/mxnet-master/python/mxnet/gluon/block.py:568: UserWarning: Parameter 'gamma' is already initialized, ignoring. Set force_reinit=True to re-initialize.
  v.initialize(None, ctx, init, force_reinit=force_reinit)
/home/ubuntu/src/mxnet-master/python/mxnet/gluon/block.py:568: UserWarning: Parameter 'beta' is already initialized, ignoring. Set force_reinit=True to re-initialize.
  v.initialize(None, ctx, init, force_reinit=force_reinit)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 442/442 [00:00<00:00, 1235.96it/s]
2020-07-11 00:17:20,959 - root - INFO - Load data from squad, Version=2.0
2020-07-11 00:17:20,959 - root - INFO - Tokenize Training Data:
2020-07-11 00:17:43,094 - root - INFO - Done! Time spent:22.14 seconds
2020-07-11 00:17:55,256 - root - INFO - Processing the Training data:
2020-07-11 00:18:02,455 - root - INFO - Done! #Unreliable Span=18 / #Mismatched Answer=30 / #Total=130319
2020-07-11 00:18:02,518 - root - INFO - Before Chunking, #Train/Is Impossible = 130319/43498
2020-07-11 00:18:02,519 - root - INFO - After Chunking, #Train Sample/Is Impossible = 130614/43737
2020-07-11 00:18:02,519 - root - INFO - Using gradient accumulation. Effective global batch size = 48
2020-07-11 00:18:02,570 - root - INFO - #Total Training Steps=8164, Warmup=816, Save Interval=2721
[00:18:04] ../src/kvstore/././comm.h:757: only 0 out of 12 GPU pairs are enabled direct access. It may affect the performance. You can set MXNET_ENABLE_GPU_P2P=0 to turn it off
[00:18:04] ../src/kvstore/././comm.h:766: ....
[00:18:04] ../src/kvstore/././comm.h:766: ....
[00:18:04] ../src/kvstore/././comm.h:766: ....
[00:18:04] ../src/kvstore/././comm.h:766: ....
2020-07-11 00:19:27,266 - root - INFO - Epoch: 1, Batch: 300/8164, Loss span/answer/total=3.5320/0.3241/3.8561, LR=0.00000245, grad_norm=45.7169. Time cost=84.68, Throughput=56.68 samples/s ETA=1.90h
2020-07-11 00:20:55,263 - root - INFO - Epoch: 1, Batch: 600/8164, Loss span/answer/total=1.5043/0.3069/1.8112, LR=0.00000490, grad_norm=28.4342. Time cost=88.00, Throughput=54.55 samples/s ETA=1.91h
2020-07-11 00:22:27,184 - root - INFO - Epoch: 1, Batch: 900/8164, Loss span/answer/total=1.1916/0.2865/1.4781, LR=0.00000735, grad_norm=25.8383. Time cost=91.92, Throughput=52.22 samples/s ETA=1.93h
2020-07-11 00:23:56,512 - root - INFO - Epoch: 1, Batch: 1200/8164, Loss span/answer/total=1.0523/0.2417/1.2940, LR=0.00000980, grad_norm=33.3647. Time cost=89.33, Throughput=53.73 samples/s ETA=1.91h
2020-07-11 00:25:25,283 - root - INFO - Epoch: 1, Batch: 1500/8164, Loss span/answer/total=0.9825/0.2295/1.2121, LR=0.00001225, grad_norm=34.0635. Time cost=88.77, Throughput=54.07 samples/s ETA=1.88h
2020-07-11 00:26:55,038 - root - INFO - Epoch: 1, Batch: 1800/8164, Loss span/answer/total=0.9403/0.2245/1.1648, LR=0.00001471, grad_norm=31.0572. Time cost=89.75, Throughput=53.48 samples/s ETA=1.86h
2020-07-11 00:28:24,433 - root - INFO - Epoch: 1, Batch: 2100/8164, Loss span/answer/total=0.8861/0.2158/1.1019, LR=0.00001716, grad_norm=22.4757. Time cost=89.39, Throughput=53.69 samples/s ETA=1.84h
2020-07-11 00:29:53,440 - root - INFO - Epoch: 1, Batch: 2400/8164, Loss span/answer/total=0.8841/0.2107/1.0948, LR=0.00001961, grad_norm=24.0204. Time cost=89.01, Throughput=53.93 samples/s ETA=1.82h
2020-07-11 00:31:22,515 - root - INFO - Epoch: 1, Batch: 2700/8164, Loss span/answer/total=0.8298/0.2144/1.0442, LR=0.00001977, grad_norm=30.5798. Time cost=89.07, Throughput=53.89 samples/s ETA=1.79h
2020-07-11 00:32:50,504 - root - INFO - Epoch: 1, Batch: 3000/8164, Loss span/answer/total=0.8359/0.1936/1.0295, LR=0.00001950, grad_norm=22.0126. Time cost=87.99, Throughput=54.55 samples/s ETA=1.77h
2020-07-11 00:34:18,924 - root - INFO - Epoch: 1, Batch: 3300/8164, Loss span/answer/total=0.7798/0.1823/0.9621, LR=0.00001923, grad_norm=23.6958. Time cost=88.42, Throughput=54.29 samples/s ETA=1.74h
2020-07-11 00:35:47,793 - root - INFO - Epoch: 1, Batch: 3600/8164, Loss span/answer/total=0.7839/0.1830/0.9670, LR=0.00001895, grad_norm=16.4897. Time cost=88.87, Throughput=54.01 samples/s ETA=1.72h
2020-07-11 00:37:18,464 - root - INFO - Epoch: 1, Batch: 3900/8164, Loss span/answer/total=0.7729/0.1773/0.9502, LR=0.00001868, grad_norm=21.5925. Time cost=90.67, Throughput=52.94 samples/s ETA=1.70h
2020-07-11 00:38:47,375 - root - INFO - Epoch: 1, Batch: 4200/8164, Loss span/answer/total=0.7711/0.1762/0.9473, LR=0.00001841, grad_norm=20.0461. Time cost=88.91, Throughput=53.99 samples/s ETA=1.67h
2020-07-11 00:40:14,275 - root - INFO - Epoch: 1, Batch: 4500/8164, Loss span/answer/total=0.7453/0.1691/0.9144, LR=0.00001814, grad_norm=30.3925. Time cost=86.90, Throughput=55.24 samples/s ETA=1.64h
2020-07-11 00:41:35,982 - root - INFO - Epoch: 1, Batch: 4800/8164, Loss span/answer/total=0.7216/0.1741/0.8957, LR=0.00001787, grad_norm=18.9427. Time cost=81.71, Throughput=58.75 samples/s ETA=1.61h
2020-07-11 00:42:55,136 - root - INFO - Epoch: 1, Batch: 5100/8164, Loss span/answer/total=0.7020/0.1604/0.8624, LR=0.00001759, grad_norm=23.6912. Time cost=79.15, Throughput=60.64 samples/s ETA=1.58h
2020-07-11 00:44:15,843 - root - INFO - Epoch: 1, Batch: 5400/8164, Loss span/answer/total=0.7096/0.1699/0.8795, LR=0.00001732, grad_norm=21.2201. Time cost=80.71, Throughput=59.47 samples/s ETA=1.55h
2020-07-11 00:45:40,243 - root - INFO - Epoch: 1, Batch: 5700/8164, Loss span/answer/total=0.6996/0.1570/0.8566, LR=0.00001705, grad_norm=26.3110. Time cost=84.40, Throughput=56.87 samples/s ETA=1.52h
2020-07-11 00:47:10,151 - root - INFO - Epoch: 1, Batch: 6000/8164, Loss span/answer/total=0.7188/0.1689/0.8877, LR=0.00001678, grad_norm=22.5355. Time cost=89.91, Throughput=53.39 samples/s ETA=1.50h
2020-07-11 00:48:41,080 - root - INFO - Epoch: 1, Batch: 6300/8164, Loss span/answer/total=0.6882/0.1557/0.8439, LR=0.00001651, grad_norm=29.5829. Time cost=90.93, Throughput=52.79 samples/s ETA=1.47h
2020-07-11 00:50:09,563 - root - INFO - Epoch: 1, Batch: 6600/8164, Loss span/answer/total=0.6520/0.1574/0.8094, LR=0.00001623, grad_norm=23.4113. Time cost=88.48, Throughput=54.25 samples/s ETA=1.45h
2020-07-11 00:51:37,894 - root - INFO - Epoch: 1, Batch: 6900/8164, Loss span/answer/total=0.6585/0.1519/0.8104, LR=0.00001596, grad_norm=26.2093. Time cost=88.33, Throughput=54.34 samples/s ETA=1.43h
2020-07-11 00:53:08,867 - root - INFO - Epoch: 1, Batch: 7200/8164, Loss span/answer/total=0.6442/0.1497/0.7940, LR=0.00001569, grad_norm=20.8646. Time cost=90.97, Throughput=52.76 samples/s ETA=1.41h
2020-07-11 00:54:36,239 - root - INFO - Epoch: 1, Batch: 7500/8164, Loss span/answer/total=0.6603/0.1520/0.8124, LR=0.00001542, grad_norm=19.0104. Time cost=87.37, Throughput=54.94 samples/s ETA=1.38h
2020-07-11 00:56:03,071 - root - INFO - Epoch: 1, Batch: 7800/8164, Loss span/answer/total=0.6424/0.1477/0.7901, LR=0.00001514, grad_norm=17.0360. Time cost=86.83, Throughput=55.28 samples/s ETA=1.36h
2020-07-11 00:57:33,741 - root - INFO - Epoch: 1, Batch: 8100/8164, Loss span/answer/total=0.6654/0.1567/0.8222, LR=0.00001487, grad_norm=23.4842. Time cost=90.67, Throughput=52.94 samples/s ETA=1.33h
2020-07-11 00:57:52,195 - root - INFO - Params saved in: fintune_google_albert_base_v2_squad_2.0/google_albert_base_v2_squad2.0_2721.params
2020-07-11 00:57:52,471 - root - INFO - Epoch: 1, #Samples: 130614, Throughput=54.65 samples/s
2020-07-11 00:59:02,331 - root - INFO - Epoch: 2, Batch: 234/8164, Loss span/answer/total=0.5789/0.1344/0.7133, LR=0.00001460, grad_norm=16.2924. Time cost=69.86, Throughput=68.11 samples/s ETA=1.31h
2020-07-11 01:00:32,047 - root - INFO - Epoch: 2, Batch: 534/8164, Loss span/answer/total=0.5617/0.1154/0.6771, LR=0.00001433, grad_norm=20.1867. Time cost=89.72, Throughput=53.50 samples/s ETA=1.29h
2020-07-11 01:02:02,831 - root - INFO - Epoch: 2, Batch: 834/8164, Loss span/answer/total=0.5653/0.1218/0.6871, LR=0.00001406, grad_norm=22.1036. Time cost=90.78, Throughput=52.87 samples/s ETA=1.26h
2020-07-11 01:03:30,693 - root - INFO - Epoch: 2, Batch: 1134/8164, Loss span/answer/total=0.5546/0.1236/0.6782, LR=0.00001378, grad_norm=30.1931. Time cost=87.86, Throughput=54.63 samples/s ETA=1.24h
2020-07-11 01:05:00,728 - root - INFO - Epoch: 2, Batch: 1434/8164, Loss span/answer/total=0.5737/0.1272/0.7009, LR=0.00001351, grad_norm=22.7749. Time cost=90.03, Throughput=53.31 samples/s ETA=1.21h
2020-07-11 01:06:30,931 - root - INFO - Epoch: 2, Batch: 1734/8164, Loss span/answer/total=0.5298/0.1088/0.6386, LR=0.00001324, grad_norm=24.9865. Time cost=90.20, Throughput=53.21 samples/s ETA=1.19h
2020-07-11 01:07:59,987 - root - INFO - Epoch: 2, Batch: 2034/8164, Loss span/answer/total=0.5410/0.1196/0.6605, LR=0.00001297, grad_norm=15.9100. Time cost=89.06, Throughput=53.90 samples/s ETA=1.17h
2020-07-11 01:09:27,384 - root - INFO - Epoch: 2, Batch: 2334/8164, Loss span/answer/total=0.5435/0.1265/0.6700, LR=0.00001269, grad_norm=27.5773. Time cost=87.40, Throughput=54.92 samples/s ETA=1.14h
2020-07-11 01:10:56,891 - root - INFO - Epoch: 2, Batch: 2634/8164, Loss span/answer/total=0.5214/0.1204/0.6418, LR=0.00001242, grad_norm=20.9572. Time cost=89.51, Throughput=53.63 samples/s ETA=1.12h
2020-07-11 01:12:26,424 - root - INFO - Epoch: 2, Batch: 2934/8164, Loss span/answer/total=0.5504/0.1133/0.6637, LR=0.00001215, grad_norm=20.2055. Time cost=89.53, Throughput=53.61 samples/s ETA=1.09h
2020-07-11 01:13:55,887 - root - INFO - Epoch: 2, Batch: 3234/8164, Loss span/answer/total=0.5361/0.1150/0.6511, LR=0.00001188, grad_norm=17.7540. Time cost=89.46, Throughput=53.65 samples/s ETA=1.07h
2020-07-11 01:15:25,570 - root - INFO - Epoch: 2, Batch: 3534/8164, Loss span/answer/total=0.5298/0.1160/0.6458, LR=0.00001161, grad_norm=25.7555. Time cost=89.68, Throughput=53.52 samples/s ETA=1.05h
2020-07-11 01:16:55,303 - root - INFO - Epoch: 2, Batch: 3834/8164, Loss span/answer/total=0.5634/0.1215/0.6848, LR=0.00001133, grad_norm=16.5189. Time cost=89.73, Throughput=53.49 samples/s ETA=1.02h
2020-07-11 01:18:24,537 - root - INFO - Epoch: 2, Batch: 4134/8164, Loss span/answer/total=0.5251/0.1104/0.6355, LR=0.00001106, grad_norm=33.4706. Time cost=89.23, Throughput=53.79 samples/s ETA=1.00h
2020-07-11 01:19:57,105 - root - INFO - Epoch: 2, Batch: 4434/8164, Loss span/answer/total=0.5365/0.1209/0.6574, LR=0.00001079, grad_norm=19.0388. Time cost=92.57, Throughput=51.85 samples/s ETA=0.97h
2020-07-11 01:21:26,840 - root - INFO - Epoch: 2, Batch: 4734/8164, Loss span/answer/total=0.5208/0.1207/0.6414, LR=0.00001052, grad_norm=19.5382. Time cost=89.73, Throughput=53.49 samples/s ETA=0.95h
2020-07-11 01:22:57,169 - root - INFO - Epoch: 2, Batch: 5034/8164, Loss span/answer/total=0.5307/0.1079/0.6386, LR=0.00001024, grad_norm=27.5455. Time cost=90.33, Throughput=53.14 samples/s ETA=0.93h
2020-07-11 01:24:23,953 - root - INFO - Epoch: 2, Batch: 5334/8164, Loss span/answer/total=0.5372/0.1120/0.6493, LR=0.00000997, grad_norm=26.2957. Time cost=86.78, Throughput=55.31 samples/s ETA=0.90h
2020-07-11 01:25:52,240 - root - INFO - Epoch: 2, Batch: 5634/8164, Loss span/answer/total=0.5274/0.1071/0.6344, LR=0.00000970, grad_norm=24.8889. Time cost=88.29, Throughput=54.37 samples/s ETA=0.88h
2020-07-11 01:27:21,081 - root - INFO - Epoch: 2, Batch: 5934/8164, Loss span/answer/total=0.5284/0.1129/0.6413, LR=0.00000943, grad_norm=18.0035. Time cost=88.84, Throughput=54.03 samples/s ETA=0.85h
2020-07-11 01:28:49,901 - root - INFO - Epoch: 2, Batch: 6234/8164, Loss span/answer/total=0.5144/0.1081/0.6225, LR=0.00000916, grad_norm=21.3835. Time cost=88.82, Throughput=54.04 samples/s ETA=0.83h
2020-07-11 01:30:20,586 - root - INFO - Epoch: 2, Batch: 6534/8164, Loss span/answer/total=0.5259/0.1100/0.6360, LR=0.00000888, grad_norm=18.5121. Time cost=90.68, Throughput=52.93 samples/s ETA=0.80h
2020-07-11 01:31:49,923 - root - INFO - Epoch: 2, Batch: 6834/8164, Loss span/answer/total=0.5204/0.1085/0.6289, LR=0.00000861, grad_norm=21.6928. Time cost=89.34, Throughput=53.73 samples/s ETA=0.78h
2020-07-11 01:33:19,798 - root - INFO - Epoch: 2, Batch: 7134/8164, Loss span/answer/total=0.4833/0.1047/0.5880, LR=0.00000834, grad_norm=24.0331. Time cost=89.87, Throughput=53.41 samples/s ETA=0.75h
2020-07-11 01:34:49,695 - root - INFO - Epoch: 2, Batch: 7434/8164, Loss span/answer/total=0.5246/0.1093/0.6339, LR=0.00000807, grad_norm=24.4552. Time cost=89.90, Throughput=53.39 samples/s ETA=0.73h
2020-07-11 01:36:22,013 - root - INFO - Epoch: 2, Batch: 7734/8164, Loss span/answer/total=0.5304/0.1147/0.6451, LR=0.00000780, grad_norm=19.8521. Time cost=92.32, Throughput=51.99 samples/s ETA=0.71h
2020-07-11 01:37:51,216 - root - INFO - Epoch: 2, Batch: 8034/8164, Loss span/answer/total=0.5066/0.1048/0.6114, LR=0.00000752, grad_norm=15.9122. Time cost=89.20, Throughput=53.81 samples/s ETA=0.68h
2020-07-11 01:38:29,329 - root - INFO - Params saved in: fintune_google_albert_base_v2_squad_2.0/google_albert_base_v2_squad2.0_5442.params
2020-07-11 01:38:30,315 - root - INFO - Epoch: 2, #Samples: 130614, Throughput=53.58 samples/s
2020-07-11 01:39:20,575 - root - INFO - Epoch: 3, Batch: 168/8164, Loss span/answer/total=0.4661/0.0828/0.5489, LR=0.00000725, grad_norm=24.4564. Time cost=50.26, Throughput=94.67 samples/s ETA=0.66h
2020-07-11 01:40:49,763 - root - INFO - Epoch: 3, Batch: 468/8164, Loss span/answer/total=0.4246/0.0746/0.4992, LR=0.00000698, grad_norm=22.9640. Time cost=89.19, Throughput=53.82 samples/s ETA=0.63h
2020-07-11 01:42:18,911 - root - INFO - Epoch: 3, Batch: 768/8164, Loss span/answer/total=0.4387/0.0672/0.5058, LR=0.00000671, grad_norm=23.1538. Time cost=89.15, Throughput=53.84 samples/s ETA=0.61h
2020-07-11 01:43:48,681 - root - INFO - Epoch: 3, Batch: 1068/8164, Loss span/answer/total=0.4437/0.0735/0.5172, LR=0.00000643, grad_norm=28.1258. Time cost=89.77, Throughput=53.47 samples/s ETA=0.58h
2020-07-11 01:45:18,576 - root - INFO - Epoch: 3, Batch: 1368/8164, Loss span/answer/total=0.4201/0.0618/0.4819, LR=0.00000616, grad_norm=16.5424. Time cost=89.89, Throughput=53.40 samples/s ETA=0.56h
2020-07-11 01:46:48,423 - root - INFO - Epoch: 3, Batch: 1668/8164, Loss span/answer/total=0.4129/0.0766/0.4895, LR=0.00000589, grad_norm=32.6500. Time cost=89.85, Throughput=53.42 samples/s ETA=0.53h
2020-07-11 01:48:16,620 - root - INFO - Epoch: 3, Batch: 1968/8164, Loss span/answer/total=0.4147/0.0735/0.4882, LR=0.00000562, grad_norm=15.0941. Time cost=88.20, Throughput=54.42 samples/s ETA=0.51h
2020-07-11 01:49:45,692 - root - INFO - Epoch: 3, Batch: 2268/8164, Loss span/answer/total=0.4097/0.0627/0.4724, LR=0.00000535, grad_norm=23.8714. Time cost=89.07, Throughput=53.89 samples/s ETA=0.48h
2020-07-11 01:51:12,197 - root - INFO - Epoch: 3, Batch: 2568/8164, Loss span/answer/total=0.4157/0.0662/0.4819, LR=0.00000507, grad_norm=39.7953. Time cost=86.50, Throughput=55.49 samples/s ETA=0.46h
2020-07-11 01:52:42,568 - root - INFO - Epoch: 3, Batch: 2868/8164, Loss span/answer/total=0.4254/0.0686/0.4940, LR=0.00000480, grad_norm=26.7246. Time cost=90.37, Throughput=53.11 samples/s ETA=0.43h
2020-07-11 01:54:10,287 - root - INFO - Epoch: 3, Batch: 3168/8164, Loss span/answer/total=0.4014/0.0623/0.4637, LR=0.00000453, grad_norm=27.3254. Time cost=87.72, Throughput=54.72 samples/s ETA=0.41h
2020-07-11 01:55:41,791 - root - INFO - Epoch: 3, Batch: 3468/8164, Loss span/answer/total=0.4173/0.0689/0.4862, LR=0.00000426, grad_norm=22.7920. Time cost=91.50, Throughput=52.46 samples/s ETA=0.39h
2020-07-11 01:57:12,912 - root - INFO - Epoch: 3, Batch: 3768/8164, Loss span/answer/total=0.4064/0.0651/0.4715, LR=0.00000398, grad_norm=15.8398. Time cost=91.12, Throughput=52.68 samples/s ETA=0.36h
2020-07-11 01:58:41,666 - root - INFO - Epoch: 3, Batch: 4068/8164, Loss span/answer/total=0.4026/0.0652/0.4678, LR=0.00000371, grad_norm=26.2612. Time cost=88.75, Throughput=54.08 samples/s ETA=0.34h
2020-07-11 02:00:10,853 - root - INFO - Epoch: 3, Batch: 4368/8164, Loss span/answer/total=0.4241/0.0716/0.4958, LR=0.00000344, grad_norm=21.3658. Time cost=89.19, Throughput=53.82 samples/s ETA=0.31h
2020-07-11 02:01:41,080 - root - INFO - Epoch: 3, Batch: 4668/8164, Loss span/answer/total=0.4013/0.0650/0.4663, LR=0.00000317, grad_norm=28.3836. Time cost=90.23, Throughput=53.20 samples/s ETA=0.29h
2020-07-11 02:03:10,414 - root - INFO - Epoch: 3, Batch: 4968/8164, Loss span/answer/total=0.4187/0.0633/0.4820, LR=0.00000290, grad_norm=18.7006. Time cost=89.33, Throughput=53.73 samples/s ETA=0.26h
2020-07-11 02:04:39,266 - root - INFO - Epoch: 3, Batch: 5268/8164, Loss span/answer/total=0.4143/0.0760/0.4903, LR=0.00000262, grad_norm=26.4106. Time cost=88.85, Throughput=54.02 samples/s ETA=0.24h
2020-07-11 02:06:10,332 - root - INFO - Epoch: 3, Batch: 5568/8164, Loss span/answer/total=0.4046/0.0643/0.4688, LR=0.00000235, grad_norm=19.0901. Time cost=91.07, Throughput=52.71 samples/s ETA=0.21h
2020-07-11 02:07:38,445 - root - INFO - Epoch: 3, Batch: 5868/8164, Loss span/answer/total=0.3997/0.0698/0.4695, LR=0.00000208, grad_norm=23.9702. Time cost=88.11, Throughput=54.48 samples/s ETA=0.19h
2020-07-11 02:09:06,757 - root - INFO - Epoch: 3, Batch: 6168/8164, Loss span/answer/total=0.4099/0.0662/0.4761, LR=0.00000181, grad_norm=10.8356. Time cost=88.31, Throughput=54.35 samples/s ETA=0.16h
2020-07-11 02:10:36,438 - root - INFO - Epoch: 3, Batch: 6468/8164, Loss span/answer/total=0.4174/0.0617/0.4791, LR=0.00000154, grad_norm=24.9383. Time cost=89.68, Throughput=53.52 samples/s ETA=0.14h
2020-07-11 02:12:06,180 - root - INFO - Epoch: 3, Batch: 6768/8164, Loss span/answer/total=0.3898/0.0629/0.4527, LR=0.00000126, grad_norm=20.0646. Time cost=89.74, Throughput=53.49 samples/s ETA=0.11h
2020-07-11 02:13:35,502 - root - INFO - Epoch: 3, Batch: 7068/8164, Loss span/answer/total=0.3961/0.0620/0.4581, LR=0.00000099, grad_norm=17.5214. Time cost=89.32, Throughput=53.74 samples/s ETA=0.09h
2020-07-11 02:15:06,476 - root - INFO - Epoch: 3, Batch: 7368/8164, Loss span/answer/total=0.4007/0.0623/0.4631, LR=0.00000072, grad_norm=23.8710. Time cost=90.97, Throughput=52.76 samples/s ETA=0.07h
2020-07-11 02:16:37,243 - root - INFO - Epoch: 3, Batch: 7668/8164, Loss span/answer/total=0.3994/0.0652/0.4646, LR=0.00000045, grad_norm=23.1507. Time cost=90.77, Throughput=52.88 samples/s ETA=0.04h
2020-07-11 02:18:04,668 - root - INFO - Epoch: 3, Batch: 7968/8164, Loss span/answer/total=0.4066/0.0667/0.4733, LR=0.00000017, grad_norm=26.5468. Time cost=87.42, Throughput=54.90 samples/s ETA=0.02h
2020-07-11 02:18:59,955 - root - INFO - Params saved in: fintune_google_albert_base_v2_squad_2.0/google_albert_base_v2_squad2.0_8163.params
2020-07-11 02:19:01,060 - root - INFO - Params saved in: fintune_google_albert_base_v2_squad_2.0/google_albert_base_v2_squad2.0_8164.params
2020-07-11 02:19:01,060 - root - INFO - Finish training step: 8164
2020-07-11 02:19:01,061 - root - INFO - Epoch: 3, #Samples: 130560, Throughput=53.71 samples/s
2020-07-11 02:19:02,585 - root - INFO - Loading Backbone Model from /home/ubuntu/.mxnet/models/nlp/google_albert_base_v2/model-125be477.params, with total/fixd parameters=11092992/0
/home/ubuntu/src/mxnet-master/python/mxnet/gluon/block.py:568: UserWarning: Parameter 'weight' is already initialized, ignoring. Set force_reinit=True to re-initialize.
  v.initialize(None, ctx, init, force_reinit=force_reinit)
/home/ubuntu/src/mxnet-master/python/mxnet/gluon/block.py:568: UserWarning: Parameter 'bias' is already initialized, ignoring. Set force_reinit=True to re-initialize.
  v.initialize(None, ctx, init, force_reinit=force_reinit)
/home/ubuntu/src/mxnet-master/python/mxnet/gluon/block.py:568: UserWarning: Parameter 'gamma' is already initialized, ignoring. Set force_reinit=True to re-initialize.
  v.initialize(None, ctx, init, force_reinit=force_reinit)
/home/ubuntu/src/mxnet-master/python/mxnet/gluon/block.py:568: UserWarning: Parameter 'beta' is already initialized, ignoring. Set force_reinit=True to re-initialize.
  v.initialize(None, ctx, init, force_reinit=force_reinit)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 35/35 [00:00<00:00, 1828.18it/s]
2020-07-11 02:19:02,666 - root - INFO - Tokenize Dev Data:
2020-07-11 02:19:05,886 - root - INFO - Done! Time spent:3.22 seconds
2020-07-11 02:19:07,676 - root - INFO - Starting evaluate the checkpoint google_albert_base_v2_squad2.0_8164.params
2020-07-11 02:21:22,663 - root - INFO - [batch 100], Time cost=134.79, Throughput=47.48 samples/s, ETA=0.03h
2020-07-11 02:23:13,264 - root - INFO - Time cost=245.395731 s, Thoughput=49.02 samples/s
2020-07-11 02:23:19,879 - root - INFO - The evaluated results are {"exact": 41.767034447907015, "f1": 45.275399862631694, "total": 11873, "HasAns_exact": 83.65384615384616, "HasAns_f1": 90.68063808519335, "HasAns_total": 5928, "NoAns_exact": 0.0, "NoAns_f1": 0.0, "NoAns_total": 5945, "best_exact": 79.27229849237766, "best_exact_thresh": -2.0645291805267334, "best_f1": 82.0919530894437, "best_f1_thresh": -1.8526396751403809}
2020-07-11 02:23:19,879 - root - INFO - The evaluated files are saved in fintune_google_albert_base_v2_squad_2.0
2020-07-11 02:23:20,927 - root - INFO - The best evaluated results are {"exact": 41.767034447907015, "f1": 45.275399862631694, "total": 11873, "HasAns_exact": 83.65384615384616, "HasAns_f1": 90.68063808519335, "HasAns_total": 5928, "NoAns_exact": 0.0, "NoAns_f1": 0.0, "NoAns_total": 5945, "best_exact": 79.27229849237766, "best_exact_thresh": -2.0645291805267334, "best_f1": 82.0919530894437, "best_f1_thresh": -1.8526396751403809, "best_ckpt": "google_albert_base_v2_squad2.0_8164.params"}

self.mlm_decoder[-1].share_parameters(word_embed_params)
self.backbone_model.token_type_embed.share_parameters(token_type_embed_params)
self.backbone_model.token_pos_embed.share_parameters(token_pos_embed_params)
self.backbone_model.embed_layer_norm.share_parameters(embed_layer_norm_params)
Member:

From the above code, we can share weights in two ways: (1) weight = weight, and (2) share_parameters. Is that correct? Do we need consistency on this part?

Contributor (Author):

Yes, both are supported. The first way (weight = weight) works well if you only need to replace a single parameter and you know its location. The second way is more general and doesn't require users to specify the position of the weight; they just pass a dictionary containing the weights with proper names.
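
For illustration, a hypothetical sketch of both styles (toy Dense blocks, not code from this PR):

from mxnet.gluon import nn

src = nn.Dense(16, in_units=16)
dst = nn.Dense(16, in_units=16)

# 1) Direct assignment: replace a single, known parameter in place.
dst.weight = src.weight

# 2) share_parameters: pass a dict keyed by structured parameter name;
#    every matching parameter in dst is replaced, no positions needed.
dst.share_parameters(src.collect_params('.*weight'))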

@zheyuye (Member) left a comment:

Thanks to @leezu for revising. This is generally good, but it may require some extra effort on the conversion toolkits, which depend heavily on the prefix, as in https://github.com/leezu/gluon-nlp/blob/a79101da3a40d5212e419fa1f46a40e9ad3e7eb3/scripts/conversion_toolkits/convert_tf_hub_model.py#L134-L166

params=embed_layer.collect_params('(.*_embed|.*_inter_proj)'))
div_val=div_val)
layer_with_shared_proj_embed.share_parameters(embed_layer.collect_params('(.*_embed|.*_inter_proj)'))
Member:

It seems that we can't locate these parameters since the prefixes were removed:
https://github.com/leezu/gluon-nlp/blob/a79101da3a40d5212e419fa1f46a40e9ad3e7eb3/src/gluonnlp/layers.py#L916-L917

Contributor (Author):

It's because the regex needs to be updated: it contains the _ character, which is no longer used and should be replaced by \. instead.
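
For example, a sketch assuming the new structured parameter names are '.'-separated as in MXNet 2:

# Old regex, written for prefix-based names such as 'foo0_inter_proj_weight':
embed_layer.collect_params('(.*_embed|.*_inter_proj)')

# Updated regex for structured names such as 'inter_proj.weight',
# with the _ separator replaced by \.:
embed_layer.collect_params(r'(.*\.embed|.*\.inter_proj)')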

@szha added the release focus label Jul 12, 2020
@szha (Member) commented Jul 12, 2020

@dmlc/gluon-nlp-committers let's halt other merges in the numpy branch to yield for this change.

@szha added this to In Progress in Numpy Refactor via automation Jul 12, 2020
net.hybridize()
num_params, num_fixed_params = count_parameters(net.collect_params())
assert num_params > 0
@pytest.mark.parametrize('name', list_backbone_names())
Member:

I intentionally moved to a sequential test because running multiple tests in parallel may cause memory issues.

Contributor (Author):

In that case we may mark it as serial. Having a single large test makes it very hard to reproduce failures of specific models, because the test will always run all models. It's not a good development experience.
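
A sketch of that option (pytest.mark.serial is a hypothetical custom marker that would need to be registered and honored by the CI runner; get_backbone stands in for whatever loads one named model):

import pytest

@pytest.mark.serial  # run these memory-hungry tests on a single worker
@pytest.mark.parametrize('name', list_backbone_names())
def test_backbone_forward(name):
    net = get_backbone(name)  # hypothetical loader for a single model
    net.initialize()
    net.hybridize()
    num_params, num_fixed_params = count_parameters(net.collect_params())
    assert num_params > 0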

Member:

Maybe we should remove the forward test of xlmr, which is too large.

@@ -120,6 +120,7 @@ def test_adaptive_embedding(vocab_size, cutoffs, embed_size, units, div_val):
[1000, None, 1.0]])
@pytest.mark.parametrize('embed_size', [128])
@pytest.mark.parametrize('in_units', [16])
# TODO This test even passes without sharing the parameters. It needs to be improved.
Member:

Why?

Member:

If that's the case, we should revise the test (maybe in a later PR).

Contributor (Author):

Yes, we should revise the test in a later PR. I just noticed that the test previously passed even when I disabled parameter sharing, or when parameter sharing was handled wrongly (such as using an invalid regex in collect_params).
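
A hypothetical sketch of a stronger check, asserting that the shared parameter is the very same object and that writes propagate:

from mxnet.gluon import nn

def test_parameter_sharing_effective():
    src = nn.Dense(8, in_units=8)
    dst = nn.Dense(8, in_units=8)
    dst.share_parameters(src.collect_params())
    src.initialize()
    # Shared parameters must be the same Parameter objects,
    # so a write through src is visible through dst.
    assert dst.weight is src.weight
    src.weight.data()[:] = 1
    assert (dst.weight.data().asnumpy() == 1).all()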

Member:

We need to investigate the test after this gets merged. I'm also having issues with parameter sharing in the layout PR, so I need to wait for this one.

@szha (Member) commented Jul 13, 2020

2020-07-11 02:23:20,927 - root - INFO - The best evaluated results are {"exact": 41.767034447907015, "f1": 45.275399862631694, "total": 11873, "HasAns_exact": 83.65384615384616, "HasAns_f1": 90.68063808519335, "HasAns_total": 5928, "NoAns_exact": 0.0, "NoAns_f1": 0.0, "NoAns_total": 5945, "best_exact": 79.27229849237766, "best_exact_thresh": -2.0645291805267334, "best_f1": 82.0919530894437, "best_f1_thresh": -1.8526396751403809, "best_ckpt": "google_albert_base_v2_squad2.0_8164.params"}

@sxjscience is this performance as expected?

@sxjscience (Member):

2020-07-11 02:23:20,927 - root - INFO - The best evaluated results are {"exact": 41.767034447907015, "f1": 45.275399862631694, "total": 11873, "HasAns_exact": 83.65384615384616, "HasAns_f1": 90.68063808519335, "HasAns_total": 5928, "NoAns_exact": 0.0, "NoAns_f1": 0.0, "NoAns_total": 5945, "best_exact": 79.27229849237766, "best_exact_thresh": -2.0645291805267334, "best_f1": 82.0919530894437, "best_f1_thresh": -1.8526396751403809, "best_ckpt": "google_albert_base_v2_squad2.0_8164.params"}

@sxjscience is this performance as expected?

Yes, we need to check "best_exact" and "best_f1".

@zheyuye (Member) commented Jul 13, 2020

@szha @sxjscience Yes, the best_f1 and best_exact are reasonable compared to the previous results: https://github.com/dmlc/gluon-nlp/tree/numpy/scripts/question_answering#results

@leezu (Contributor, Author) commented Jul 13, 2020

Thanks to @leezu for revising. This is generally good, but it may require some extra effort on the conversion toolkits, which depend heavily on the prefix, as in https://github.com/leezu/gluon-nlp/blob/a79101da3a40d5212e419fa1f46a40e9ad3e7eb3/scripts/conversion_toolkits/convert_tf_hub_model.py#L134-L166

@zheyuye Generally the scripts can be updated by replacing the prefix with the respective attribute name in the Python block. In this codebase they are mostly the same or very similar, such as _rel_pos_embed vs rel_pos_embed:

self._rel_pos_embed = BucketPositionalEmbedding(
                    units=num_heads,
                    num_buckets=self._num_buckets,
                    max_distance=self._max_distance,
                    bidirectional=self._bidirectional,
                    prefix='rel_pos_embed_',
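
Under the new API, a sketch of the same construction: the prefix argument is simply dropped, and the parameter names follow the attribute path (something like _rel_pos_embed.weight):

self._rel_pos_embed = BucketPositionalEmbedding(
                    units=num_heads,
                    num_buckets=self._num_buckets,
                    max_distance=self._max_distance,
                    bidirectional=self._bidirectional)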

It can be done in a separate PR if we decide to keep the conversion scripts in the release (which may require adding tests).

@zheyuye (Member) commented Jul 13, 2020

@leezu This sounds great. Let's leave the conversion scripts alone for now and re-revise them with some useful test cases once this PR is merged.

@leezu (Contributor, Author) commented Jul 16, 2020

2020-07-16 18:12:12,077 - root - INFO - Time Spent: 212.10484337806702, #Sent=2737, SacreBlEU=26.621931302568633 Avg NLL=1.37640975370957, Perplexity=3.9606563390205163

train_transformer.log

@sxjscience (Member):

@leezu It's as expected. I propose to merge this in.

@codecov bot commented Jul 16, 2020

Codecov Report

Merging #1261 into numpy will increase coverage by 0.01%.
The diff coverage is 86.55%.


@@            Coverage Diff             @@
##            numpy    #1261      +/-   ##
==========================================
+ Coverage   82.52%   82.53%   +0.01%     
==========================================
  Files          38       38              
  Lines        5500     5446      -54     
==========================================
- Hits         4539     4495      -44     
+ Misses        961      951      -10     
Impacted Files Coverage Δ
src/gluonnlp/lr_scheduler.py 45.45% <0.00%> (ø)
src/gluonnlp/models/bert.py 84.42% <66.66%> (+0.10%) ⬆️
src/gluonnlp/models/mobilebert.py 81.35% <76.81%> (-0.10%) ⬇️
src/gluonnlp/attention_cell.py 79.91% <80.95%> (-0.09%) ⬇️
src/gluonnlp/models/transformer_xl.py 82.71% <81.25%> (-0.22%) ⬇️
src/gluonnlp/models/electra.py 78.86% <88.23%> (+1.61%) ⬆️
src/gluonnlp/layers.py 86.78% <92.98%> (-0.45%) ⬇️
src/gluonnlp/models/transformer.py 95.95% <94.91%> (-0.04%) ⬇️
src/gluonnlp/models/roberta.py 93.64% <95.65%> (+0.38%) ⬆️
src/gluonnlp/data/loading.py 83.39% <100.00%> (ø)
... and 6 more

@leezu merged commit 70a1887 into dmlc:numpy Jul 17, 2020
Numpy Refactor automation moved this from In Progress to Done Jul 17, 2020
@leezu deleted the numpygluonblock branch July 17, 2020 00:07
@leezu mentioned this pull request Jul 17, 2020
zheyuye added a commit to zheyuye/gluon-nlp that referenced this pull request Jul 17, 2020
commit 35a586676036f627bffd0d3c753c6cd0a70d63cf
Author: ZheyuYe <zheyu.ye1995@gmail.com>
Date:   Fri Jul 17 10:10:14 2020 +0800

    Squashed commit of the following:

    commit 673344d
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Wed Jul 15 22:43:07 2020 +0800

        CharTokenizer

    commit 8dabfd6
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Wed Jul 15 15:47:24 2020 +0800

        lowercase

    commit f5c94a6
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Tue Jul 14 17:45:28 2020 +0800

        test

    commit dc55fc9
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Tue Jul 14 05:45:01 2020 +0800

        tiny update on run_squad

    commit 4defc7a
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Mon Jul 13 23:18:08 2020 +0800

        update testings

    commit 2719e81
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Mon Jul 13 23:08:32 2020 +0800

        re-upload xlmr

    commit cd0509d
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Mon Jul 13 22:30:47 2020 +0800

        fix get_pretrained

    commit 8ed8a72
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Mon Jul 13 22:28:13 2020 +0800

        re-upload roberta

    commit 5811d40
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Mon Jul 13 18:27:23 2020 +0800

        update

    commit 44a09a3
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Sat Jul 11 15:06:33 2020 +0800

        fix

    commit 4074a26
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Fri Jul 10 16:08:49 2020 +0800

        inference without horovod

    commit 31cb953
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Thu Jul 9 18:41:55 2020 +0800

        update

    commit 838be2a
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Thu Jul 9 15:14:39 2020 +0800

        horovod for squad

    commit 1d374a2
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Thu Jul 9 12:09:19 2020 +0800

        fix

    commit e4fba39
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Thu Jul 9 10:35:08 2020 +0800

        remove multiply_grads

    commit 007f07e
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Tue Jul 7 11:26:38 2020 +0800

        multiply_grads

    commit b8c85bb
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Mon Jul 6 12:28:56 2020 +0800

        fix ModelForQABasic

    commit 0e13a58
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Sat Jul 4 18:42:12 2020 +0800

        clip_grad_global_norm with zeros max_grad_norm

    commit bd270f2
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Fri Jul 3 20:21:31 2020 +0800

        fix roberta

    commit 4fc564c
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Fri Jul 3 19:36:08 2020 +0800

        update hyper-parameters of adamw

    commit 59cffbf
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Fri Jul 3 16:25:46 2020 +0800

        try

    commit a84f782
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Thu Jul 2 20:39:03 2020 +0800

        fix mobilebert

    commit 4bc3a96
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Thu Jul 2 11:14:39 2020 +0800

        layer-wise decay

    commit 07186d5
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Thu Jul 2 02:14:43 2020 +0800

        revise

    commit a5a6475
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Wed Jul 1 19:50:20 2020 +0800

        topk

    commit 34ee884
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Wed Jul 1 19:25:09 2020 +0800

        index_update

    commit 74178e2
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Wed Jul 1 00:48:32 2020 +0800

        rename

    commit fa011aa
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Tue Jun 30 23:40:28 2020 +0800

        update

    commit 402d625
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Tue Jun 30 21:40:30 2020 +0800

        multiprocessing for wiki

    commit ddbde75
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Tue Jun 30 20:41:35 2020 +0800

        fix bookcorpus

    commit 6cc5ccd
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Tue Jun 30 16:39:12 2020 +0800

        fix wiki

    commit 9773efd
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Tue Jun 30 15:52:13 2020 +0800

        fix openwebtext

    commit 1fb8eb8
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Mon Jun 29 19:51:25 2020 +0800

        upload gluon_electra_small_owt

    commit ca83fac
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Mon Jun 29 18:09:48 2020 +0800

        revise train_transformer

    commit 1450f5c
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Mon Jun 29 18:07:04 2020 +0800

        revise

    commit b460bbe
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Mon Jun 29 17:24:00 2020 +0800

        repeat for pretraining

    commit 8ee381b
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Mon Jun 29 17:06:43 2020 +0800

        repeat

    commit aea936f
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Mon Jun 29 16:39:22 2020 +0800

        fix mobilebert

    commit eead164
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Sun Jun 28 18:44:28 2020 +0800

        fix

    commit 8645115
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Sun Jun 28 17:27:43 2020 +0800

        update

    commit 2b7f7a3
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Sun Jun 28 17:18:00 2020 +0800

        fix roberta

    commit 86702fe
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Sun Jun 28 16:27:43 2020 +0800

        use_segmentation

    commit 6d03d7a
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Sun Jun 28 15:52:40 2020 +0800

        fix

    commit 5c0ca43
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Sun Jun 28 15:49:48 2020 +0800

        fix token_ids

    commit ff7aae8
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Sun Jun 28 13:56:07 2020 +0800

        fix xlmr

    commit 2070b86
    Author: ZheyuYe <zheyu.ye1995@gmail.com>
    Date:   Sun Jun 28 13:54:26 2020 +0800

        fix roberta

commit 70a1887
Author: Leonard Lausen <lausen@amazon.com>
Date:   Fri Jul 17 00:07:08 2020 +0000

    Update for Block API (dmlc#1261)

    - Remove params and prefix arguments for MXNet 2 and update
      parameter sharing implementation
    - Remove Block.name_scope() for MXNet 2
    - Remove self.params.get() and self.params.get_constant()

commit ea9152b
Author: Xingjian Shi <xshiab@connect.ust.hk>
Date:   Thu Jul 16 15:42:04 2020 -0700

    Fixes to make the CI more stable (dmlc#1265)

    * Some fixes to make the CI more stable

    * add retries

    * Update tokenizers.py

commit a646c34
Author: ht <wawawa@akane.waseda.jp>
Date:   Sun Jul 12 02:49:53 2020 +0800

    [FEATURE] update backtranslation and add multinomial sampler (dmlc#1259)

    * back translation bash

    * split "lang-pair" para in clean_tok_para_corpus

    * added clean_tok_mono_corpus

    * fix

    * add num_process para

    * fix

    * fix

    * add yml

    * rm yml

    * update cfg name

    * update evaluate

    * added max_update / save_interval_update params

    * fix

    * fix

    * multi gpu inference

    * fix

    * update

    * update multi gpu inference

    * fix

    * fix

    * split evaluate and parallel infer

    * fix

    * test

    * fix

    * update

    * add comments

    * fix

    * remove todo comment

    * revert remove todo comment

    * raw lines remove duplicated '\n'

    * update multinomaial sampler

    * fix

    * fix

    * fix

    * fix

    * sampling

    * update script

    * fix

    * add test_case with k > 1 in topk sampling

    * fix multinomial sampler

    * update docs

    * comments situation eos_id = None

    * fix

    Co-authored-by: Hu <huta@a483e74650ff.ant.amazon.com>

commit 83e1f13
Author: Leonard Lausen <lausen@amazon.com>
Date:   Thu Jul 9 20:57:55 2020 -0700

    Use Amazon S3 Transfer Acceleration (dmlc#1260)

commit cd48efd
Author: Leonard Lausen <lausen@amazon.com>
Date:   Tue Jul 7 17:39:42 2020 -0700

    Update codecov action to handle different OS and Python versions (dmlc#1254)

    codecov/codecov-action#80 (comment)

commit 689eba9
Author: Sheng Zha <szha@users.noreply.github.com>
Date:   Tue Jul 7 09:55:34 2020 -0700

    [CI] AWS batch job tool for GluonNLP (Part I) (dmlc#1251)

    * AWS batch job tool for GluonNLP

    * limit range

    Co-authored-by: Xingjian Shi <xshiab@connect.ust.hk>

commit e06ff01
Author: Leonard Lausen <lausen@amazon.com>
Date:   Tue Jul 7 08:36:24 2020 -0700

    Pin mxnet version range on CI (dmlc#1257)
@zheyuye mentioned this pull request Jul 22, 2020
sxjscience pushed a commit that referenced this pull request Aug 1, 2020
* fix roberta

* fix xlmr

* fix token_ids

* fix

* use_segmentation

* fix roberta

* update

* fix

* fix mobilebert

* repeat

* repeat for pretraining

* revise

* revise train_transformer

* upload gluon_electra_small_owt

* fix openwebtext

* fix wiki

* fix bookcorpus

* multiprocessing for wiki

* update

* rename

* index_update

* topk

* revise

* layer-wise decay

* fix mobilebert

* try

* update hyper-parameters of adamw

* fix roberta

* clip_grad_global_norm with zeros max_grad_norm

* fix ModelForQABasic

* multiply_grads

* remove multiply_grads

* fix

* horovod for squad

* update

* inference without horovod

* fix

* update

* re-upload roberta

* fix get_pretrained

* re-upload xlmr

* update testings

* tiny update on run_squad

* test

* lowercase

* CharTokenizer

* Squashed commit of the following: (identical to the squashed commit history quoted above)

* frozen_params

* remove conversion to a sperate pr

* fix

* fix

* update

* test

* revise

* update performance numbers

* update apply_layerwisw_decay

* use shuffle

* fix mobilebert

* fix vocab_file