Merge pull request #161 from didi/update_pip
Update pip
Showing 11 changed files with 357 additions and 41 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
# Install from the source code | ||
|
||
To install from the source code, we use [conda](https://conda.io/) to | ||
install the required packages. Please | ||
[install conda](https://conda.io/en/latest/miniconda.html) if you do not | ||
have it on your system. | ||
|
||
We provide two options for installing DELTA: the `nlp` version or the `full` | ||
version. The `nlp` version has minimal requirements and only installs | ||
NLP-related packages: | ||
|
||
```shell | ||
# Run the installation script for NLP version, with CPU or GPU. | ||
cd tools | ||
./install/install-delta.sh nlp [cpu|gpu] | ||
``` | ||
|
||
**Note**: Users in mainland China may need to set up conda mirror sources; see [./tools/install/install-delta.sh](tools/install/install-delta.sh) for details. | ||
|
||
If you want to use both NLP and speech packages, you can install the `full` version. The full version needs the [Kaldi](https://github.com/kaldi-asr/kaldi) library, which can be pre-installed or installed using our installation script. | ||
|
||
```shell | ||
cd tools | ||
# If you have installed Kaldi | ||
KALDI=/your/path/to/Kaldi ./install/install-delta.sh full [cpu|gpu] | ||
# If you have not installed Kaldi, use the following command | ||
# ./install/install-delta.sh full [cpu|gpu] | ||
``` | ||
|
||
To verify the installation, run: | ||
|
||
```shell | ||
# Activate conda environment | ||
conda activate delta-py3.6-tf2.0.0 | ||
# Or use the following command if your conda version is < 4.6 | ||
# source activate delta-py3.6-tf2.0.0 | ||
|
||
# Add DELTA environment | ||
source env.sh | ||
|
||
# Generate mock data for text classification. | ||
pushd egs/mock_text_cls_data/text_cls/v1 | ||
./run.sh | ||
popd | ||
|
||
# Train the model | ||
python3 delta/main.py --cmd train_and_eval --config egs/mock_text_cls_data/text_cls/v1/config/han-cls.yml | ||
``` |
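
Before running the verification commands above, it can help to confirm that the active interpreter matches what the environment name implies. The following is a minimal sketch, not part of DELTA: `parse_env_name` and `interpreter_matches` are hypothetical helpers, and the naming scheme `delta-py<python>-tf<tensorflow>` is an assumption inferred only from the name `delta-py3.6-tf2.0.0`.

```python
import re
import sys


def parse_env_name(env_name):
    """Split a name like 'delta-py3.6-tf2.0.0' into (python, tensorflow) versions.

    Assumption: the environment name follows 'delta-py<X.Y>-tf<A.B.C>'.
    """
    match = re.fullmatch(r"delta-py(\d+\.\d+)-tf(\d+(?:\.\d+)*)", env_name)
    if match is None:
        raise ValueError("unexpected environment name: " + env_name)
    return match.group(1), match.group(2)


def interpreter_matches(env_name, version_info=sys.version_info):
    """Return True if the interpreter's major.minor equals the one named in env_name."""
    expected_python, _ = parse_env_name(env_name)
    actual = "{}.{}".format(version_info[0], version_info[1])
    return actual == expected_python


if __name__ == "__main__":
    print(parse_env_name("delta-py3.6-tf2.0.0"))
    print(interpreter_matches("delta-py3.6-tf2.0.0"))
```

Passing `version_info` explicitly keeps the check easy to exercise without switching interpreters.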
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
# Pick an installation method | ||
|
||
## Multiple installation methods | ||
|
||
Currently we support multiple ways to install `DELTA`. Please choose the | ||
one that best fits your usage and needs. | ||
|
||
## Install by pip | ||
|
||
For a **quick demo of the features**, or for **pure NLP users**, you can | ||
install the `nlp` version of `DELTA` by pip with a simple command: | ||
|
||
```bash | ||
pip install delta-nlp | ||
``` | ||
|
||
Check here for | ||
[the tutorial for usage of `delta-nlp`](tutorials/training/text_class_pip_example). | ||
|
||
**Requirements**: You need `tensorflow==2.0.0` and `python==3.6` on | ||
macOS or Linux. | ||
|
||
## Install from the source code | ||
|
||
For users who need the **full functionality of DELTA** (including speech and | ||
NLP), you can clone our repository and install from the source code. | ||
|
||
Please follow the steps here: [Install from the source code](installation/install_from_source) | ||
|
||
## Use docker | ||
|
||
For users who are **familiar with Docker**, you can pull our images | ||
directly. This may be the best choice for Docker users. | ||
|
||
Please follow the steps here: | ||
[Installation using Docker](installation/using_docker) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,71 @@ | ||
# How to build the wheel file | ||
|
||
## Intro | ||
|
||
To give users a simpler way to install `DELTA`, we need to | ||
build the wheel file (`.whl`) and upload it to PyPI. | ||
Once the wheel file is uploaded, all that users need to do is | ||
type `pip install delta-nlp`. | ||
|
||
**Notice**: Installation by pip currently supports NLP tasks only. If you need the | ||
full version of DELTA (with speech tasks), you should install the | ||
platform from source. | ||
|
||
## Prepare | ||
|
||
Before building the wheel file, you need to install `DELTA` first. | ||
|
||
```bash | ||
bash ./tools/install/install-delta.sh nlp gpu | ||
``` | ||
|
||
To build the Linux wheel, you will need the Docker image: | ||
|
||
```bash | ||
docker pull didi0speech0nlu/delta_pip:tf2_ub16 | ||
``` | ||
|
||
## Start to build | ||
|
||
### MacOS | ||
|
||
```bash | ||
bash ./tools/install/build_pip_pkg.sh | ||
``` | ||
|
||
The generated wheel will be under `dist`, with a name like | ||
`delta_nlp-0.2-cp36-cp36m-macosx_10_7_x86_64.whl` | ||
|
||
### Linux | ||
|
||
Building the wheel on Linux is more complicated. You need to run a Docker container: | ||
|
||
```bash | ||
docker run --name delta_pip_tf2_u16 -it -v $PWD:/delta tensorflow/tensorflow:custom-op-ubuntu16 /bin/bash | ||
``` | ||
|
||
In the docker environment, run: | ||
|
||
```bash | ||
bash ./tools/install/build_pip_pkg.sh | ||
``` | ||
|
||
The generated wheel will be under `dist`, with a name like | ||
`delta_nlp-0.2-cp36-cp36m-linux_x86_64.whl` | ||
|
||
Repair the wheel file so that it supports multiple Linux platforms: | ||
|
||
```bash | ||
auditwheel repair dist/xxx.whl | ||
``` | ||
|
||
The final wheel will be under `wheelhouse`, with a name like | ||
`delta_nlp-0.2-cp36-cp36m-manylinux1_x86_64.whl`. | ||
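
The wheel names above follow the standard wheel filename layout, `{distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl`. As a rough illustration of how to read them, here is a minimal sketch; `parse_wheel_filename` is a hypothetical helper, and it assumes a filename without the optional build tag.

```python
from collections import namedtuple

WheelName = namedtuple(
    "WheelName", "distribution version python_tag abi_tag platform_tag")


def parse_wheel_filename(filename):
    """Split '<dist>-<version>-<python>-<abi>-<platform>.whl' into its fields.

    Assumption: the filename carries no optional build tag.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError("expected 5 dash-separated fields, got %d" % len(parts))
    return WheelName(*parts)


if __name__ == "__main__":
    print(parse_wheel_filename("delta_nlp-0.2-cp36-cp36m-manylinux1_x86_64.whl"))
```

Comparing the `platform_tag` before and after `auditwheel repair` shows the change from `linux_x86_64` to `manylinux1_x86_64`.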
|
||
## Upload | ||
|
||
After building the wheel files, upload them to PyPI: | ||
|
||
``` | ||
twine upload xxx.whl | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,124 @@ | ||
# A Text Classification Usage Example for pip users | ||
|
||
## Intro | ||
|
||
In this tutorial, we demonstrate a text classification task with a | ||
mock demo dataset **for users who installed DELTA by pip**. | ||
|
||
A complete process contains the following steps: | ||
|
||
- Prepare the data set. | ||
- Develop custom modules (optional). | ||
- Set the config file. | ||
- Train a model. | ||
- Export a model. | ||
|
||
Please clone our demo repository: | ||
|
||
```bash | ||
git clone --depth 1 https://github.com/applenob/delta_demo.git | ||
cd ./delta_demo | ||
``` | ||
|
||
## A quick review for installation | ||
|
||
If you haven't installed `delta-nlp` yet, run: | ||
|
||
```bash | ||
pip install delta-nlp | ||
``` | ||
|
||
**Requirements**: You need `tensorflow==2.0.0` and `python==3.6` on | ||
macOS or Linux. | ||
|
||
## Prepare the Data Set | ||
|
||
Run the script: | ||
|
||
``` | ||
./gen_data.sh | ||
``` | ||
|
||
The generated data is in the `data` directory. | ||
|
||
The generated data should be in the standard format for text classification, which is "label\tdocument" (the label and the document separated by a tab). | ||
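
To make the "label\tdocument" format concrete, here is a minimal sketch of reading such lines. `parse_line` and `read_examples` are hypothetical helpers, not part of `delta-nlp`, and the sample labels and documents are invented for illustration.

```python
def parse_line(line):
    """Split one 'label<TAB>document' line into a (label, document) pair.

    Only the first tab separates the fields, so tabs inside the document
    are preserved. A line without any tab raises ValueError.
    """
    label, document = line.rstrip("\n").split("\t", 1)
    return label, document


def read_examples(lines):
    """Parse an iterable of lines, skipping blank ones."""
    return [parse_line(line) for line in lines if line.strip()]


if __name__ == "__main__":
    sample = ["pos\tgreat movie\n", "\n", "neg\tboring\tplot\n"]
    for label, document in read_examples(sample):
        print(repr(label), repr(document))
```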
|
||
## Develop custom modules (optional) | ||
|
||
Before you decide to develop your own modules, please check that `DELTA` | ||
does not already provide the modules you need. | ||
|
||
```python | ||
@registers.model.register | ||
class TestHierarchicalAttentionModel(HierarchicalModel): | ||
  """Hierarchical text classification model with attention.""" | ||
|
||
  def __init__(self, config, **kwargs): | ||
    super().__init__(config, **kwargs) | ||
|
||
    logging.info("Initialize HierarchicalAttentionModel...") | ||
|
||
    self.vocab_size = config['data']['vocab_size'] | ||
    self.num_classes = config['data']['task']['classes']['num_classes'] | ||
    self.use_true_length = config['model'].get('use_true_length', False) | ||
    if self.use_true_length: | ||
      self.split_token = config['data']['split_token'] | ||
    self.padding_token = utils.PAD_IDX | ||
``` | ||
|
||
You need to register the path of this module file in the config file | ||
`config/han-cls.yml` (relative to the current working directory). | ||
|
||
```yml | ||
custom_modules: | ||
- "test_model.py" | ||
``` | ||
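
For readers unfamiliar with registration decorators, the general pattern behind a decorator such as `@registers.model.register` can be sketched as follows. This is a generic illustration under our own assumptions, not DELTA's actual implementation: `Registry`, `model_registry`, and `ToyModel` are hypothetical names.

```python
class Registry:
    """Maps class names to classes so they can be looked up by a config string."""

    def __init__(self, kind):
        self._kind = kind
        self._classes = {}

    def register(self, cls):
        """Decorator: record cls under its class name and return it unchanged."""
        if cls.__name__ in self._classes:
            raise KeyError("%s already registered: %s" % (self._kind, cls.__name__))
        self._classes[cls.__name__] = cls
        return cls

    def get(self, name):
        """Look up a previously registered class by name."""
        return self._classes[name]


model_registry = Registry("model")


@model_registry.register
class ToyModel:
    """Stand-in for a custom model class."""


if __name__ == "__main__":
    # A config value such as `name: ToyModel` could then be resolved like this.
    print(model_registry.get("ToyModel"))
```

In a pattern like this, the class is only registered once its module has been imported, which is why the module file has to be listed in the config.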
|
||
## Set the Config File | ||
|
||
The config file of this example is `config/han-cls.yml`. | ||
|
||
In the config file, we set the task to be `TextClsTask` and the model to be `TestHierarchicalAttentionModel`. | ||
|
||
### Config Details | ||
|
||
The config is composed of 3 parts: `data`, `model`, `solver`. | ||
|
||
Data-related configs are under `data`. | ||
You can set the data paths (for the training set, dev set and test set). | ||
The data processing configs can also be found here (mainly under `task`). | ||
For example, we set `use_dense: false` since no dense input is used here. | ||
We set `language: chinese` since the text is Chinese. | ||
|
||
Model parameters are under `model`. The most important config here is | ||
`name: TestHierarchicalAttentionModel`, which specifies the model to | ||
use. Detailed structure configs are under `net->structure`. Here, | ||
`max_sen_len` is 32 and `max_doc_len` is 32. | ||
|
||
The configs under `solver` are used by the solver class, including the training optimizer, evaluation metrics and checkpoint saver. | ||
Here the class is `RawSolver`. | ||
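
The arrow notation used above (for example `net->structure`) denotes a path through nested config sections. The sketch below shows one way to read it, using a small dictionary that mirrors only the values mentioned in this section. The `lookup` helper is hypothetical, and the exact nesting (including the `name` key under `solver`) is an assumption; consult `config/han-cls.yml` for the real layout.

```python
# Assumed, heavily trimmed mirror of the three top-level config parts.
config = {
    "data": {"task": {"use_dense": False, "language": "chinese"}},
    "model": {
        "name": "TestHierarchicalAttentionModel",
        "net": {"structure": {"max_sen_len": 32, "max_doc_len": 32}},
    },
    "solver": {"name": "RawSolver"},  # key name is an assumption
}


def lookup(mapping, path):
    """Follow an arrow-separated path such as 'net->structure' through nested dicts."""
    node = mapping
    for key in path.split("->"):
        node = node[key]
    return node


if __name__ == "__main__":
    print(lookup(config["model"], "net->structure"))
    print(lookup(config, "data->task->language"))
```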
|
||
## Train a Model | ||
|
||
After setting the config file, you are ready to train a model. | ||
|
||
``` | ||
delta --cmd train_and_eval --config config/han-cls.yml | ||
``` | ||
|
||
The argument `cmd` tells the platform to train a model and also evaluate | ||
it on the dev set during the training process. | ||
|
||
After enough training steps, you will find that the model checkpoints have been saved to the directory set by `saver->model_path`, which is `exp/han-cls/ckpt` in this case. | ||
|
||
## Export a Model | ||
|
||
If you would like to export a specific checkpoint, please set `infer_model_path` in the config file. Otherwise, the platform will simply use the newest checkpoint under the directory set by `saver->model_path`. | ||
|
||
``` | ||
delta --cmd export_model --config config/han-cls.yml | ||
``` | ||
|
||
The exported models are in the directory set by config | ||
`service->model_path`, which is `exp/han-cls/service` here. | ||
|
6 changes: 5 additions & 1 deletion
.../tutorials/training/text_class_example.md → ...als/training/text_class_source_example.md