This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[CI] Update GPU Test Workflow + Update Some Tests and README #1316

Merged
39 commits merged on Aug 28, 2020
Changes shown from 34 commits

Commits (39)
b45fda2
Merge pull request #1 from dmlc/master
barry-jin Aug 21, 2020
ad7c2ee
[CI] Add GPU pytest + Submit jobs to AWS Batch through GitHub Actions
barry-jin Aug 21, 2020
e9901c2
[CI] Update GPU tests and parameters use
barry-jin Aug 21, 2020
84fac91
[CI] Update CI pipeline
barry-jin Aug 21, 2020
c2f80d9
[CI] Add new line
barry-jin Aug 21, 2020
e5ab220
[CI] Update pytest command for cpu test
barry-jin Aug 21, 2020
0a6a1d3
[CI] Update use_gpu to ctx + add permissions to test.sh
barry-jin Aug 21, 2020
92b9e85
[CI] Update submitted command
barry-jin Aug 21, 2020
749acec
[CI] De-stringify input to mxnet attribute
barry-jin Aug 22, 2020
44d0c5b
[CI] Change pull_request event to pull_request_target event
barry-jin Aug 23, 2020
3e02d5f
[CI] Add new workflow for GPU unit tests
barry-jin Aug 23, 2020
d174fcf
[CI] Update unittests-gpu.yml
barry-jin Aug 24, 2020
a73161a
[CI] Update unittests-gpu.yml
barry-jin Aug 24, 2020
2587f2b
[CI] Update unittests-gpu
barry-jin Aug 24, 2020
8908e71
Merge pull request #2 from dmlc/master
barry-jin Aug 24, 2020
994c2c1
[CI] Update path of test.sh
barry-jin Aug 24, 2020
39d2351
[CI] Update path of /test
barry-jin Aug 24, 2020
43bb922
[CI] Update remote to barry-jin/gluon-nlp
barry-jin Aug 24, 2020
0063052
[CI] Update remote to dmlc/gluon-nlp
barry-jin Aug 24, 2020
68f814f
[CI] Add gpu tests for attention cells, bert, electra + Update README
barry-jin Aug 25, 2020
76cf1c4
[CI] Change remote from dmlc to barry-jin
barry-jin Aug 25, 2020
43aadab
Merge pull request #3 from dmlc/master
barry-jin Aug 25, 2020
c0bfc6d
[CI] Bug Fix
barry-jin Aug 25, 2020
92950e0
[CI] Bug Fix
barry-jin Aug 25, 2020
b134ac1
[CI] Truncate logs + Add failure test
barry-jin Aug 25, 2020
91cd6f0
[CI] Duplicate script to submit test and get logs
barry-jin Aug 25, 2020
837903d
[CI] Update unittest-gpu
barry-jin Aug 25, 2020
074880c
[CI] Quiet the pip install + Redirect the logs to script.log
barry-jin Aug 26, 2020
061cdfb
[CI] Remove asserts
barry-jin Aug 26, 2020
f8b87f4
[CI] Simplify ctx statement
barry-jin Aug 26, 2020
86a4ff2
[CI] Simplify ctx statement
barry-jin Aug 26, 2020
b3c017a
[CI] test_multi_head_rel_attn_score failed for gpu test
barry-jin Aug 26, 2020
5c3a099
[CI] Finalize gpu test - change remote from barry-jin to dmlc
barry-jin Aug 26, 2020
c866de3
Delete submit-test.py
barry-jin Aug 26, 2020
67f2e38
[CI] Update test working directory
barry-jin Aug 26, 2020
2a9b3ba
Merge pull request #4 from dmlc/master
barry-jin Aug 26, 2020
9cbaaf5
Merge branch 'master' of https://github.com/barry-jin/gluon-nlp
barry-jin Aug 26, 2020
0792dea
[CI] Update AWS Batch job type
barry-jin Aug 27, 2020
9d4c459
[CI] Allow test logs downloading
barry-jin Aug 27, 2020
11 changes: 4 additions & 7 deletions .github/workflows/unittests-gpu.yml
@@ -26,8 +26,8 @@ jobs:

     - name: Install Other Dependencies
       run: |
-        python -m pip install --user --upgrade pip
-        python -m pip install --user -e .[extras]
+        python -m pip install --user --quiet --upgrade pip
+        python -m pip install --user --quiet -e .[extras]

     - name: Configure AWS Credentials
       uses: aws-actions/configure-aws-credentials@v1
@@ -38,9 +38,6 @@ jobs:

     - name: Test project on AWS Batch
       run: |
-        python ./tools/batch/submit-job.py --region us-east-1 --job-type p3.2x --source-ref ${{ github.ref }} --work-dir tools/batch --remote https://github.com/dmlc/gluon-nlp --command "./test.sh" --wait
+        python ./tools/batch/submit-job.py --region us-east-1 --job-type p3.2x --source-ref ${{ github.ref }} --work-dir tools/batch --remote https://github.com/dmlc/gluon-nlp --command "../../test.sh" --wait | tee > script.log

Review comment (Member): Is there a way to manually upload the coverage in AWS Batch?

Review comment (Member): Also use g4.2x.

     - name: Upload coverage to Codecov
       uses: codecov/codecov-action@v1.0.10
       with:
         env_vars: OS,PYTHON
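The `submit-job.py` helper invoked in the step above is not part of this diff. As a rough sketch of the mechanism it implements, assuming the script wraps AWS Batch through boto3 (the queue and job-definition names below are placeholders, not the repo's actual configuration):

```python
# Hypothetical sketch of what a submit-job.py-style helper does with boto3.
# Job queue and job definition names are placeholders, not the repo's values.
import time
import boto3

batch = boto3.client('batch', region_name='us-east-1')
logs = boto3.client('logs', region_name='us-east-1')

def submit_and_wait(command, job_queue='gluon-nlp-jobs', job_definition='gluon-nlp-p3-2x'):
    job = batch.submit_job(
        jobName='gluon-nlp-gpu-test',
        jobQueue=job_queue,
        jobDefinition=job_definition,
        containerOverrides={'command': ['/bin/bash', '-c', command]},
    )
    job_id = job['jobId']
    while True:  # poll until the batch job reaches a terminal state
        desc = batch.describe_jobs(jobs=[job_id])['jobs'][0]
        if desc['status'] in ('SUCCEEDED', 'FAILED'):
            break
        time.sleep(30)
    # The container's stdout/stderr lands in a CloudWatch log stream.
    stream = desc['container']['logStreamName']
    events = logs.get_log_events(logGroupName='/aws/batch/job', logStreamName=stream)
    for event in events['events']:
        print(event['message'])
    return desc['status']
```

Under that reading, `--wait` corresponds to the polling loop, and piping the command's stdout into `script.log` in the workflow is what preserves the test log for later use (cf. the "[CI] Allow test logs downloading" commit).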
2 changes: 1 addition & 1 deletion conftest.py
@@ -212,4 +212,4 @@ def pytest_addoption(parser):

 def pytest_generate_tests(metafunc):
     if 'ctx' in metafunc.fixturenames:
-        metafunc.parametrize("ctx", metafunc.config.option.device)
+        metafunc.parametrize("ctx", [getattr(mx, device)() for device in metafunc.config.option.device])
14 changes: 8 additions & 6 deletions test.sh
@@ -3,10 +3,12 @@

 # alias python3='/usr/bin/python3'

Review comment (Member): why comment?

 echo $PWD

 sudo apt-get install libopenblas-dev
-python3 -m pip install --user -upgrade pip
-python3 -m pip install --user setuptools pytest pytest-cov contextvars
-python3 -m pip install --upgrade cython
-python3 -m pip install --pre --user "mxnet-cu102>=2.0.0b20200802" -f https://dist.mxnet.io/python
-python3 -m pip install --user -e .[extras]
-python3 -m pytest --cov=./ --cov-report=xml --durations=50 --device="gpu" tests/
+python3 -m pip install --user --quiet --upgrade pip
+python3 -m pip install --user --quiet setuptools pytest pytest-cov contextvars
+python3 -m pip install --upgrade --quiet cython
+python3 -m pip install --pre --user --quiet "mxnet-cu102>=2.0.0b20200802" -f https://dist.mxnet.io/python
+python3 -m pip install --user --quiet -e .[extras]
+python3 -m pytest --cov=./ --cov-report=xml --durations=50 --device="gpu" ../../tests/
Review comment (Member): this script is already at the root level and looking in paths further up would be confusing.
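Two review threads here concern the same thing: because the workflow passes `--work-dir tools/batch`, the batch job executes test.sh from inside tools/batch, which is why the script reaches the repository root through `../../`. A small illustration of the resulting path arithmetic (the checkout location `/workdir` is hypothetical):

```python
# Hypothetical illustration of the batch job's path layout: the repo is
# cloned somewhere (here /workdir/gluon-nlp) and the job runs from
# --work-dir tools/batch, so repo-root paths are reached via ../../
import os.path as osp

work_dir = '/workdir/gluon-nlp/tools/batch'
print(osp.normpath(osp.join(work_dir, '../..')))        # /workdir/gluon-nlp
print(osp.normpath(osp.join(work_dir, '../../tests')))  # /workdir/gluon-nlp/tests
```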

16 changes: 14 additions & 2 deletions tests/README.md
@@ -3,13 +3,25 @@
 To run the unittests, use the following command
 
 ```bash
-python3 -m pytest .
+python3 -m pytest --device="cpu" .
 ```
 
 To test for certain file, e.g., the `test_models_transformer.py`, use the following command
 
 ```bash
-python3 -m pytest test_models_transformer
+python3 -m pytest --device="cpu" test_models_transformer.py
 ```
 
+To test only for gpu device, use the following command
+
+```bash
+python3 -m pytest --device="gpu" test_models_transformer.py
+```
+
+To test both for cpu and gpu device, use the following command
+
+```bash
+python3 -m pytest --device="cpu" --device="gpu" test_models_transformer.py
+```
 
 Refer to the [official guide of pytest](https://docs.pytest.org/en/latest/) for more details.
413 changes: 208 additions & 205 deletions tests/test_attention_cell.py

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion tests/test_models.py
@@ -13,7 +13,7 @@ def test_list_backbone_names():

 @pytest.mark.parametrize('name', list_backbone_names())
 def test_get_backbone(name, ctx):
-    with tempfile.TemporaryDirectory() as root, getattr(mx, ctx)():
+    with tempfile.TemporaryDirectory() as root, ctx:
         model_cls, cfg, tokenizer, local_params_path, _ = get_backbone(name, root=root)
         net = model_cls.from_cfg(cfg)
         net.load_parameters(local_params_path)
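This one-line simplification works because the `ctx` fixture now already holds an `mx.Context` object (see the conftest.py change above), and MXNet contexts act as context managers that set the default device for arrays created inside the block. A standalone illustration, not taken from the diff:

```python
# Standalone illustration of using an mx.Context as a context manager.
import mxnet as mx

ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu()
with ctx:
    x = mx.np.ones((2, 3))  # allocated on the default device set by ctx
print(x.ctx)                # gpu(0) when a GPU is available, else cpu(0)
```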
143 changes: 72 additions & 71 deletions tests/test_models_bert.py
@@ -12,87 +12,88 @@ def test_list_pretrained_bert():


 @pytest.mark.parametrize('compute_layout', ['auto', 'NT', 'TN'])
-def test_bert_small_cfg(compute_layout):
-    cfg = BertModel.get_cfg()
-    cfg.defrost()
-    cfg.MODEL.vocab_size = 100
-    cfg.MODEL.units = 12 * 4
-    cfg.MODEL.hidden_size = 64
-    cfg.MODEL.num_layers = 2
-    cfg.MODEL.num_heads = 2
-    cfg.MODEL.compute_layout = compute_layout
-    cfg.freeze()
+def test_bert_small_cfg(compute_layout, ctx):
Review comment (Contributor, PR author): Yes. We can use this like mx.context._current.set(mx.Context('gpu', 0)) without the with statement.

Review comment (Contributor): This is an internal API and may change in the future. It may be better to use the with statement inside conftest.py.
+    with ctx:
+        cfg = BertModel.get_cfg()
+        cfg.defrost()
+        cfg.MODEL.vocab_size = 100
+        cfg.MODEL.units = 12 * 4
+        cfg.MODEL.hidden_size = 64
+        cfg.MODEL.num_layers = 2
+        cfg.MODEL.num_heads = 2
+        cfg.MODEL.compute_layout = compute_layout
+        cfg.freeze()
 
-    # Generate TN layout
-    cfg_tn = cfg.clone()
-    cfg_tn.defrost()
-    cfg_tn.MODEL.layout = 'TN'
-    cfg_tn.freeze()
+        # Generate TN layout
+        cfg_tn = cfg.clone()
+        cfg_tn.defrost()
+        cfg_tn.MODEL.layout = 'TN'
+        cfg_tn.freeze()
 
-    # Sample data
-    batch_size = 4
-    sequence_length = 8
-    num_mask = 3
-    inputs = mx.np.random.randint(0, 10, (batch_size, sequence_length))
-    token_types = mx.np.random.randint(0, 2, (batch_size, sequence_length))
-    valid_length = mx.np.random.randint(3, sequence_length, (batch_size,))
-    masked_positions = mx.np.random.randint(0, 3, (batch_size, num_mask))
+        # Sample data
+        batch_size = 4
+        sequence_length = 8
+        num_mask = 3
+        inputs = mx.np.random.randint(0, 10, (batch_size, sequence_length))
+        token_types = mx.np.random.randint(0, 2, (batch_size, sequence_length))
+        valid_length = mx.np.random.randint(3, sequence_length, (batch_size,))
+        masked_positions = mx.np.random.randint(0, 3, (batch_size, num_mask))
 
-    # Test for BertModel
-    bert_model = BertModel.from_cfg(cfg)
-    bert_model.initialize()
-    bert_model.hybridize()
-    contextual_embedding, pooled_out = bert_model(inputs, token_types, valid_length)
-    bert_model_tn = BertModel.from_cfg(cfg_tn)
-    bert_model_tn.share_parameters(bert_model.collect_params())
-    bert_model_tn.hybridize()
-    contextual_embedding_tn, pooled_out_tn = bert_model_tn(inputs.T, token_types.T, valid_length)
-    assert_allclose(contextual_embedding.asnumpy(),
-                    mx.np.swapaxes(contextual_embedding_tn, 0, 1).asnumpy(),
-                    1E-4, 1E-4)
-    assert_allclose(pooled_out.asnumpy(), pooled_out_tn.asnumpy(), 1E-4, 1E-4)
+        # Test for BertModel
+        bert_model = BertModel.from_cfg(cfg)
+        bert_model.initialize()
+        bert_model.hybridize()
+        contextual_embedding, pooled_out = bert_model(inputs, token_types, valid_length)
+        bert_model_tn = BertModel.from_cfg(cfg_tn)
+        bert_model_tn.share_parameters(bert_model.collect_params())
+        bert_model_tn.hybridize()
+        contextual_embedding_tn, pooled_out_tn = bert_model_tn(inputs.T, token_types.T, valid_length)
+        assert_allclose(contextual_embedding.asnumpy(),
+                        mx.np.swapaxes(contextual_embedding_tn, 0, 1).asnumpy(),
+                        1E-4, 1E-4)
+        assert_allclose(pooled_out.asnumpy(), pooled_out_tn.asnumpy(), 1E-4, 1E-4)
 
-    # Test for BertForMLM
-    bert_mlm_model = BertForMLM(cfg)
-    bert_mlm_model.initialize()
-    bert_mlm_model.hybridize()
-    contextual_embedding, pooled_out, mlm_score = bert_mlm_model(inputs, token_types,
-                                                                 valid_length, masked_positions)
-    bert_mlm_model_tn = BertForMLM(cfg_tn)
-    bert_mlm_model_tn.share_parameters(bert_mlm_model.collect_params())
-    bert_mlm_model_tn.hybridize()
-    contextual_embedding_tn, pooled_out_tn, mlm_score_tn =\
-        bert_mlm_model_tn(inputs.T, token_types.T, valid_length, masked_positions)
-    assert_allclose(contextual_embedding.asnumpy(),
-                    mx.np.swapaxes(contextual_embedding_tn, 0, 1).asnumpy(),
-                    1E-4, 1E-4)
-    assert_allclose(pooled_out.asnumpy(), pooled_out_tn.asnumpy(), 1E-4, 1E-4)
-    assert_allclose(mlm_score.asnumpy(), mlm_score_tn.asnumpy(), 1E-4, 1E-4)
+        # Test for BertForMLM
+        bert_mlm_model = BertForMLM(cfg)
+        bert_mlm_model.initialize()
+        bert_mlm_model.hybridize()
+        contextual_embedding, pooled_out, mlm_score = bert_mlm_model(inputs, token_types,
+                                                                     valid_length, masked_positions)
+        bert_mlm_model_tn = BertForMLM(cfg_tn)
+        bert_mlm_model_tn.share_parameters(bert_mlm_model.collect_params())
+        bert_mlm_model_tn.hybridize()
+        contextual_embedding_tn, pooled_out_tn, mlm_score_tn =\
+            bert_mlm_model_tn(inputs.T, token_types.T, valid_length, masked_positions)
+        assert_allclose(contextual_embedding.asnumpy(),
+                        mx.np.swapaxes(contextual_embedding_tn, 0, 1).asnumpy(),
+                        1E-4, 1E-4)
+        assert_allclose(pooled_out.asnumpy(), pooled_out_tn.asnumpy(), 1E-4, 1E-4)
+        assert_allclose(mlm_score.asnumpy(), mlm_score_tn.asnumpy(), 1E-4, 1E-4)
 
-    # Test for BertForPretrain
-    bert_pretrain_model = BertForPretrain(cfg)
-    bert_pretrain_model.initialize()
-    bert_pretrain_model.hybridize()
-    contextual_embedding, pooled_out, nsp_score, mlm_scores =\
-        bert_pretrain_model(inputs, token_types, valid_length, masked_positions)
-    bert_pretrain_model_tn = BertForPretrain(cfg_tn)
-    bert_pretrain_model_tn.share_parameters(bert_pretrain_model.collect_params())
-    bert_pretrain_model_tn.hybridize()
-    contextual_embedding_tn, pooled_out_tn, nsp_score_tn, mlm_scores_tn = \
-        bert_pretrain_model_tn(inputs.T, token_types.T, valid_length, masked_positions)
-    assert_allclose(contextual_embedding.asnumpy(),
-                    mx.np.swapaxes(contextual_embedding_tn, 0, 1).asnumpy(),
-                    1E-4, 1E-4)
-    assert_allclose(pooled_out.asnumpy(), pooled_out_tn.asnumpy(), 1E-4, 1E-4)
-    assert_allclose(nsp_score.asnumpy(), nsp_score_tn.asnumpy(), 1E-4, 1E-4)
-    assert_allclose(mlm_score.asnumpy(), mlm_score_tn.asnumpy(), 1E-4, 1E-4)
+        # Test for BertForPretrain
+        bert_pretrain_model = BertForPretrain(cfg)
+        bert_pretrain_model.initialize()
+        bert_pretrain_model.hybridize()
+        contextual_embedding, pooled_out, nsp_score, mlm_scores =\
+            bert_pretrain_model(inputs, token_types, valid_length, masked_positions)
+        bert_pretrain_model_tn = BertForPretrain(cfg_tn)
+        bert_pretrain_model_tn.share_parameters(bert_pretrain_model.collect_params())
+        bert_pretrain_model_tn.hybridize()
+        contextual_embedding_tn, pooled_out_tn, nsp_score_tn, mlm_scores_tn = \
+            bert_pretrain_model_tn(inputs.T, token_types.T, valid_length, masked_positions)
+        assert_allclose(contextual_embedding.asnumpy(),
+                        mx.np.swapaxes(contextual_embedding_tn, 0, 1).asnumpy(),
+                        1E-4, 1E-4)
+        assert_allclose(pooled_out.asnumpy(), pooled_out_tn.asnumpy(), 1E-4, 1E-4)
+        assert_allclose(nsp_score.asnumpy(), nsp_score_tn.asnumpy(), 1E-4, 1E-4)
+        assert_allclose(mlm_scores.asnumpy(), mlm_scores_tn.asnumpy(), 1E-4, 1E-4)


 @pytest.mark.remote_required
 @pytest.mark.parametrize('model_name', list_pretrained_bert())
-def test_bert_get_pretrained(model_name):
+def test_bert_get_pretrained(model_name, ctx):
     assert len(list_pretrained_bert()) > 0
-    with tempfile.TemporaryDirectory() as root:
+    with tempfile.TemporaryDirectory() as root, ctx:
         cfg, tokenizer, backbone_params_path, mlm_params_path =\
             get_pretrained_bert(model_name, load_backbone=True, load_mlm=True, root=root)
         assert cfg.MODEL.vocab_size == len(tokenizer.vocab)
67 changes: 34 additions & 33 deletions tests/test_models_electra.py
@@ -26,47 +26,48 @@ def get_test_cfg():


 @pytest.mark.parametrize('compute_layout', ['auto', 'NT', 'TN'])
-def test_electra_model(compute_layout):
-    cfg = get_test_cfg()
-    cfg.defrost()
-    cfg.MODEL.compute_layout = compute_layout
-    cfg.freeze()
+def test_electra_model(compute_layout, ctx):
+    with ctx:
+        cfg = get_test_cfg()
+        cfg.defrost()
+        cfg.MODEL.compute_layout = compute_layout
+        cfg.freeze()
 
-    # Generate TN layout
-    cfg_tn = cfg.clone()
-    cfg_tn.defrost()
-    cfg_tn.MODEL.layout = 'TN'
-    cfg_tn.freeze()
+        # Generate TN layout
+        cfg_tn = cfg.clone()
+        cfg_tn.defrost()
+        cfg_tn.MODEL.layout = 'TN'
+        cfg_tn.freeze()
 
-    # Sample data
-    batch_size = 4
-    sequence_length = 16
-    num_mask = 3
-    inputs = mx.np.random.randint(0, 10, (batch_size, sequence_length))
-    token_types = mx.np.random.randint(0, 2, (batch_size, sequence_length))
-    valid_length = mx.np.random.randint(3, sequence_length, (batch_size,))
-    masked_positions = mx.np.random.randint(0, 3, (batch_size, num_mask))
+        # Sample data
+        batch_size = 4
+        sequence_length = 16
+        num_mask = 3
+        inputs = mx.np.random.randint(0, 10, (batch_size, sequence_length))
+        token_types = mx.np.random.randint(0, 2, (batch_size, sequence_length))
+        valid_length = mx.np.random.randint(3, sequence_length, (batch_size,))
+        masked_positions = mx.np.random.randint(0, 3, (batch_size, num_mask))
 
-    electra_model = ElectraModel.from_cfg(cfg)
-    electra_model.initialize()
-    electra_model.hybridize()
-    contextual_embedding, pooled_out = electra_model(inputs, token_types, valid_length)
-    electra_model_tn = ElectraModel.from_cfg(cfg_tn)
-    electra_model_tn.share_parameters(electra_model.collect_params())
-    electra_model_tn.hybridize()
-    contextual_embedding_tn, pooled_out_tn = electra_model_tn(inputs.T, token_types.T, valid_length)
-    assert_allclose(contextual_embedding.asnumpy(),
-                    np.swapaxes(contextual_embedding_tn.asnumpy(), 0, 1),
-                    1E-4, 1E-4)
-    assert_allclose(pooled_out.asnumpy(), pooled_out_tn.asnumpy(),
-                    1E-4, 1E-4)
+        electra_model = ElectraModel.from_cfg(cfg)
+        electra_model.initialize()
+        electra_model.hybridize()
+        contextual_embedding, pooled_out = electra_model(inputs, token_types, valid_length)
+        electra_model_tn = ElectraModel.from_cfg(cfg_tn)
+        electra_model_tn.share_parameters(electra_model.collect_params())
+        electra_model_tn.hybridize()
+        contextual_embedding_tn, pooled_out_tn = electra_model_tn(inputs.T, token_types.T, valid_length)
+        assert_allclose(contextual_embedding.asnumpy(),
+                        np.swapaxes(contextual_embedding_tn.asnumpy(), 0, 1),
+                        1E-4, 1E-4)
+        assert_allclose(pooled_out.asnumpy(), pooled_out_tn.asnumpy(),
+                        1E-4, 1E-4)


 @pytest.mark.remote_required
 @pytest.mark.parametrize('model_name', list_pretrained_electra())
-def test_electra_get_pretrained(model_name):
+def test_electra_get_pretrained(model_name, ctx):
     assert len(list_pretrained_electra()) > 0
-    with tempfile.TemporaryDirectory() as root:
+    with tempfile.TemporaryDirectory() as root, ctx:
         cfg, tokenizer, backbone_params_path, (disc_params_path, gen_params_path) =\
             get_pretrained_electra(model_name, root=root,
                                    load_backbone=True, load_disc=True, load_gen=True)
2 changes: 1 addition & 1 deletion tests/test_optimizer.py
@@ -7,7 +7,7 @@


 def test_adam(ctx):
-    with getattr(mx, ctx)():
+    with ctx:
         opt1 = AdamW
         opt2 = AdamW
         shapes = [(3, 4, 5), (10, 4), (7,)]