This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[Bug Fix] trainer.update(1) should be used after loss.mean() is called #1000
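For context on the title: in MXNet Gluon, `Trainer.step()` and `Trainer.update()` take a normalization constant by which the gradients are rescaled. When the per-sample loss is first reduced with `loss.mean()`, that constant should be 1 rather than the batch size, otherwise the update is effectively divided by the batch size twice. The snippet below is a minimal illustrative sketch of that pattern only; it is not taken from this PR's diff, and the model and data in it are placeholders.

```python
# Illustrative sketch (not part of this PR's diff): normalize the update with
# a factor of 1 when the loss has already been averaged with .mean().
import mxnet as mx
from mxnet import autograd, gluon

net = gluon.nn.Dense(1)
net.initialize()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
loss_fn = gluon.loss.L2Loss()

x = mx.nd.random.uniform(shape=(32, 10))   # placeholder batch
y = mx.nd.random.uniform(shape=(32, 1))    # placeholder labels

with autograd.record():
    loss = loss_fn(net(x), y).mean()       # loss averaged over the batch
loss.backward()

# Since the loss is already a mean, do not divide the gradient by the batch
# size again; pass 1 as the normalization factor.
trainer.step(1)
# Equivalently, when gradients are aggregated manually (update_on_kvstore=False):
#     trainer.allreduce_grads()
#     trainer.update(1)
```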

Open · wants to merge 49 commits into base: v0.x

Commits (49)
de7b23d clean slate for 1.x (szha, Mar 18, 2020)
01122db [Numpy] Numpy version of GluonNLP (#1225) (sxjscience, Jun 10, 2020)
982a416 Fix bert cfg (#1245) (zheyuye, Jun 11, 2020)
789e2b9 fix download (sxjscience, Jun 11, 2020)
b714eac [Numpy] Try to fix the CI (#1248) (sxjscience, Jun 11, 2020)
85b6f09 [Numpy] Add "match_tokens_with_char_spans" + Enable downloading from … (sxjscience, Jun 16, 2020)
ee1f0e3 [Numpy] Update QA Dataset and revise run_squad (#1250) (zheyuye, Jun 18, 2020)
e06ff01 Pin mxnet version range on CI (#1257) (leezu, Jul 7, 2020)
689eba9 [CI] AWS batch job tool for GluonNLP (Part I) (#1251) (szha, Jul 7, 2020)
cd48efd Update codecov action to handle different OS and Python versions (#1254) (leezu, Jul 8, 2020)
83e1f13 Use Amazon S3 Transfer Acceleration (#1260) (leezu, Jul 10, 2020)
a646c34 [FEATURE] update backtranslation and add multinomial sampler (#1259) (hutao965, Jul 11, 2020)
ea9152b Fixes to make the CI more stable (#1265) (sxjscience, Jul 16, 2020)
70a1887 Update for Block API (#1261) (leezu, Jul 17, 2020)
9d83fe6 Fix parameter share regex (#1267) (leezu, Jul 17, 2020)
4743afc Add fp16 support for Bert QA inference (#1264) (MoisesHer, Jul 17, 2020)
e78a24e [CI] update batch to gluonnlp-dev (#1268) (szha, Jul 18, 2020)
3a0ed9f [Numpy] Refactor Roberta (#1269) (zheyuye, Jul 21, 2020)
f407b8e [CI] Batch cpu version (#1275) (szha, Jul 22, 2020)
57eb411 [Numpy] Fix conversion toolkits (#1274) (zheyuye, Jul 23, 2020)
74bd2ce [Feature] Add FP16 inference support to NMT + Add BoundedBudgetSample… (hutao965, Jul 23, 2020)
d76897b Add embedding related methods in numpy version (#1263) (acphile, Jul 28, 2020)
4d43f82 add subversion/wget to docker, add readme (#1279) (szha, Jul 28, 2020)
3c87457 Add layout + compute_layout support: TransformerNMT, BERT, ALBERT, EL… (sxjscience, Jul 29, 2020)
033214e [Numpy] Fix SQuAD + Fix GLUE downloading (#1280) (sxjscience, Jul 29, 2020)
2294421 [Numpy Refactor] BART (#1282) (zheyuye, Jul 30, 2020)
1f9ad44 Horovod support for pretraining and fune-tuning squad (#1276) (zheyuye, Aug 1, 2020)
7e1f9d0 [DOC] Add the basic documentation for the embedding API (#1281) (acphile, Aug 4, 2020)
20af58f Fix gelu (#1287) (zheyuye, Aug 5, 2020)
ded0f99 fix prepare_openwebtext (#1289) (ZiyueHuang, Aug 6, 2020)
c33e62e [FEATURE]Horovod support for training transformer + add mirror data f… (hutao965, Aug 7, 2020)
9e268c0 Fix electra (#1291) (zheyuye, Aug 8, 2020)
32e87d4 [Numpy] Benchmark the backbone models + Some fixes + Always use pytho… (sxjscience, Aug 14, 2020)
6ae558e [FEATURE]Horovod support for training transformer (PART 2) (#1301) (hutao965, Aug 20, 2020)
d8b68c6 [Numpy] Fix AWS Batch + Add Docker Support (#1302) (sxjscience, Aug 20, 2020)
d17ec4c minor fix for run_electra.py & remove hybridization in the constructi… (ZiyueHuang, Aug 22, 2020)
99b35d8 Add Intro for batch + upload squad traininng command (#1305) (zheyuye, Aug 22, 2020)
d93356f [MODEL] make beam search a hybrid block (#1310) (szha, Aug 23, 2020)
210dd0c [Numpy] [Fix] Update README.md (#1306) (sxjscience, Aug 23, 2020)
b324ee6 [CI] Add GPU pytest + Append AWS Batch job submission to current pipe… (barry-jin, Aug 24, 2020)
3b14d69 [CI] Update unittests-gpu (#1313) (barry-jin, Aug 24, 2020)
dca17ee automatically generate date suffix for dev versions (#1314) (szha, Aug 25, 2020)
39ec921 fix typo (#1317) (liuzh47, Aug 26, 2020)
970318d fix typo (#1318) (liuzh47, Aug 26, 2020)
bba8697 [CI] Update GPU Test Workflow + Update Some Tests and README (#1316) (barry-jin, Aug 28, 2020)
66e5e05 fix https://github.com/dmlc/gluon-nlp/issues/1315 (#1319) (ZiyueHuang, Aug 28, 2020)
ff95fb4 [CI] Fix Source Reference Issues (#1332) (barry-jin, Sep 1, 2020)
1bd85b6 [BUGFIX] fix valid candidates issue (#1323) (liuzh47, Sep 1, 2020)
189bbdc [MODEL] convert gpt2 model (#1328) (hutao965, Sep 1, 2020)
3 changes: 1 addition & 2 deletions .coveragerc
@@ -2,8 +2,7 @@
 [run]
 omit =
     tests/*
-    conda/*
-    scripts/tests/*
+    scripts/*
 concurrency =
     multiprocessing
     thread
4 changes: 4 additions & 0 deletions .flake8
@@ -0,0 +1,4 @@
[flake8]
max-line-length = 100
max-complexity = 18
exclude = tests,__init__.py
60 changes: 60 additions & 0 deletions .github/workflows/unittests-gpu.yml
@@ -0,0 +1,60 @@
name: continuous build - gpu

on: [push, pull_request_target]

defaults:
  run:
    shell: bash

jobs:
  unittest-gpu:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
    steps:
      - name: Checkout repository
        uses: actions/checkout@v2

      - name: Install Linux dependencies
        run: sudo apt-get install libopenblas-dev

      - name: Setup python
        uses: actions/setup-python@v2
        with:
          python-version: 3.7
          architecture: x64

      - name: Install Other Dependencies
        run: |
          python -m pip install --user --quiet --upgrade pip
          python -m pip install --user --quiet -e .[extras]

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Extract branch name
        shell: bash
        run: echo "##[set-output name=branch;]$(echo ${GITHUB_REF#refs/heads/})"
        id: extract_branch

      - name: Test project on AWS Batch(For push)
        if: startsWith(steps.extract_branch.outputs.branch, 'PR-') != true
        run: |
          python ./tools/batch/submit-job.py --region us-east-1 --job-type g4dn.4x --source-ref ${{ github.ref }} --work-dir tools/batch --remote https://github.com/dmlc/gluon-nlp --command "/batch_states/test.sh" --wait | tee > script.log

      - name: Test project on AWS Batch(For pull request)
        if: startsWith(steps.extract_branch.outputs.branch, 'PR-') == true
        run: |
          python ./tools/batch/submit-job.py --region us-east-1 --job-type g4dn.4x --source-ref ${{ github.event.pull_request.head.ref }} --work-dir tools/batch --remote https://github.com/${{ github.event.pull_request.head.repo.full_name }} --command "/batch_states/test.sh" --wait | tee > script.log

      - name: Upload log file for AWS Batch test results
        uses: actions/upload-artifact@v2
        with:
          name: GPU_Test_Results
          path: script.log


47 changes: 47 additions & 0 deletions .github/workflows/unittests.yml
@@ -0,0 +1,47 @@
name: continuous build

on: [push, pull_request]

defaults:
  run:
    shell: bash

jobs:
  unittest:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        # TODO Add windows test by using "windows-latest"
        os: [macos-latest, ubuntu-latest]
        python-version: [ '3.6', '3.7', '3.8']
    steps:
      - name: Checkout repository
        uses: actions/checkout@v2

      # Install OS specific dependencies
      - name: Install Linux dependencies
        if: matrix.os == 'ubuntu-latest'
        # TODO https://github.com/apache/incubator-mxnet/issues/18293
        run: sudo apt-get install libopenblas-dev

      - name: Setup python
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
          architecture: x64
      - name: Install Other Dependencies
        run: |
          python -m pip install --user --upgrade pip
          python -m pip install --user setuptools pytest pytest-cov contextvars
          python -m pip install --upgrade cython
          python -m pip install --pre --user "mxnet>=2.0.0b20200802" -f https://dist.mxnet.io/python
          python -m pip install --user -e .[extras]
      - name: Test project
        run: |
          python -m pytest --cov=./ --cov-report=xml --device="cpu" --durations=50 tests/
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v1.0.10
        with:
          env_vars: OS,PYTHON

10 changes: 0 additions & 10 deletions .gitmodules

This file was deleted.

2 changes: 1 addition & 1 deletion .pytype.cfg
@@ -5,4 +5,4 @@ inputs =
     src/gluonnlp

 # Python version (major.minor) of the target code.
-python_version = 3.5
+python_version = 3.6
4 changes: 2 additions & 2 deletions CODEOWNERS
@@ -1,9 +1,9 @@
-# Watchers and contributors to Apache MXNet repo directories/packages/files
+# Watchers and contributors to DMLC GluonNLP repo directories/packages/files
 # Please see documentation of use of CODEOWNERS file at
 # https://help.github.com/articles/about-codeowners/ and
 # https://github.com/blog/2392-introducing-code-owners
 #
-# Anybody can add themselves or a team as additional watcher or contributor
+# Anybody can add themselves or a team as additional watcher or contributor
 # to get notified about changes in a specific package.
 # See https://help.github.com/articles/about-teams how to setup teams.
1 change: 0 additions & 1 deletion CONTRIBUTING.md

This file was deleted.

5 changes: 0 additions & 5 deletions MANIFEST.in

This file was deleted.

113 changes: 0 additions & 113 deletions Makefile

This file was deleted.

111 changes: 111 additions & 0 deletions README.md
@@ -0,0 +1,111 @@
<h3 align="center">
GluonNLP: Your Choice of Deep Learning for NLP
</h3>

<p align="center">
<a href="https://github.com/dmlc/gluon-nlp/actions"><img src="https://github.com/dmlc/gluon-nlp/workflows/continuous%20build/badge.svg"></a>
<a href="https://codecov.io/gh/dmlc/gluon-nlp"><img src="https://codecov.io/gh/dmlc/gluon-nlp/branch/master/graph/badge.svg"></a>
<a href="https://github.com/dmlc/gluonnlp/actions"><img src="https://img.shields.io/badge/python-3.6%2C3.8-blue.svg"></a>
<a href="https://pypi.org/project/gluonnlp/#history"><img src="https://img.shields.io/pypi/v/gluonnlp.svg"></a>
</p>

GluonNLP is a toolkit that enables easy text preprocessing, dataset
loading, and neural model building to help you speed up your Natural
Language Processing (NLP) research.

# Features

For NLP Practitioners
- Easy-to-use Data Pipeline
- Automatically Train Models via AutoNLP (TODO)

For Researchers
- Pretrained Model Zoo
- Programming with numpy-like API

For Engineers
- Fast Deployment
- [TVM](https://tvm.apache.org/) (TODO)
- AWS Integration


# Installation
First of all, install the latest MXNet. You may use the following commands:

```bash
# Install the version with CUDA 10.0
python3 -m pip install -U --pre "mxnet-cu100>=2.0.0b20200802" -f https://dist.mxnet.io/python

# Install the version with CUDA 10.1
python3 -m pip install -U --pre "mxnet-cu101>=2.0.0b20200802" -f https://dist.mxnet.io/python

# Install the version with CUDA 10.2
python3 -m pip install -U --pre "mxnet-cu102>=2.0.0b20200802" -f https://dist.mxnet.io/python

# Install the cpu-only version
python3 -m pip install -U --pre "mxnet>=2.0.0b20200802" -f https://dist.mxnet.io/python
```


To install GluonNLP, use

```bash
python3 -m pip install -U -e .

# Also, you may install all the extra requirements via
python3 -m pip install -U -e ."[extras]"
```

If you do not have the permission, you can also install to the user folder:

```bash
python3 -m pip install -U -e . --user
```

For Windows users, we recommend using the [Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/about).


# Access the Command-line Toolkits

To help researchers and engineers, we provide command-line toolkits for
downloading and preprocessing NLP datasets. For more details, you may refer to
[GluonNLP Datasets](./scripts/datasets) and [GluonNLP Preprocessing Tools](./scripts/preprocess).

```bash
# CLI for downloading / preparing the dataset
nlp_data help

# CLI for accessing some common data preprocessing scripts
nlp_preprocess help

# Also, you can use `python -m` to access the toolkits
python3 -m gluonnlp.cli.data help
python3 -m gluonnlp.cli.preprocess help

```

### Frequently Asked Questions
- **Question**: I cannot access the command line toolkits. Running `nlp_data` reports `nlp_data: command not found`.

This is sometimes because you have installed gluonnlp to the user folder and
the executables are installed to `~/.local/bin`. You can change the `PATH` variable to
also include `~/.local/bin`:

```
export PATH=${PATH}:~/.local/bin
```


# Run Unittests
You may go to [tests](tests) to see how to run the unittests.


# Use Docker
You can use Docker to launch a JupyterLab development environment with GluonNLP installed.

```
docker pull gluonai/gluon-nlp:gpu-latest
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 --shm-size=4g gluonai/gluon-nlp:gpu-latest
```

For more details, you can refer to the guidance in [tools/docker](tools/docker).