
Merge pull request #161 from didi/update_pip
Update pip
applenob committed Nov 2, 2019
2 parents 4efc814 + c4a389b commit bd8be56
Showing 11 changed files with 357 additions and 41 deletions.
34 changes: 30 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
@@ -49,12 +49,38 @@ It helps you to train, develop, and deploy NLP and/or speech models, featuring:

## Installation

### Quick Installation
### Installation by pip

We use [conda](https://conda.io/) to install required packages. Please [install conda](https://conda.io/en/latest/miniconda.html) if you do not have it in your system.
We provide pip installation support for the `nlp` version of DELTA,
intended for **pure NLP users** and for a **quick demo of the features**.

We provide two options to install DELTA, `nlp` version or `full` version.
`nlp` version needs minimal requirements and only installs NLP related packages:
**Note**: Users can still run both `nlp` and `speech` tasks by installing
from our source code.

We recommend using [conda](https://conda.io/) or
[virtualenv](https://virtualenv.pypa.io/en/latest/) when installing DELTA
from pip.
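For example, an isolated environment can be sketched with Python's built-in `venv` module (virtualenv behaves the same way); the environment name `delta-env` is just an example, and conda users would use `conda create`/`conda activate` instead:

```shell
# Create and activate an isolated environment before installing delta-nlp.
# "delta-env" is an arbitrary example name.
python3 -m venv delta-env
. delta-env/bin/activate
python3 -c 'import sys; print(sys.prefix)'   # now points inside delta-env
# pip install delta-nlp                      # then install inside it
```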

Before installing, make sure you have installed TensorFlow
(version 2.0.0 is currently required).
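As a small illustrative check (not part of DELTA itself), you can compare the installed TensorFlow version string against the required 2.0.0 release; in practice the installed version comes from `import tensorflow as tf; tf.__version__`:

```python
# Illustrative helper (not part of DELTA): check that a TensorFlow
# version string matches the required 2.0.0 release.
import re

REQUIRED = (2, 0, 0)

def parse_version(version):
    """Turn a dotted version string such as '2.0.0' into a tuple of ints."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])

def tensorflow_ok(version, required=REQUIRED):
    """Return True when the installed version matches the required one."""
    return parse_version(version) == required

print(tensorflow_ok("2.0.0"))   # True
print(tensorflow_ok("1.14.0"))  # False
```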

```bash
pip install delta-nlp
```

If you installed via pip, follow the usage steps here:
[A Text Classification Usage Example for pip users](docs/tutorials/training/text_class_pip_example.md)

### Installation from Source Code

To install from the source code, we use [conda](https://conda.io/) to
install the required packages. Please
[install conda](https://conda.io/en/latest/miniconda.html) if you do not
have it on your system.

We provide two options for installing DELTA: the `nlp` version and the
`full` version. The `nlp` version has minimal requirements and installs
only NLP-related packages:

```shell
# Run the installation script for NLP version, with CPU or GPU.
4 changes: 3 additions & 1 deletion docs/index.rst
@@ -18,6 +18,7 @@ Welcome to DELTA's documentation!
:caption: Installation
:name: sec-install

installation/pick_install
installation/using_docker
installation/manual_setup
installation/deltann_compile
@@ -31,7 +31,8 @@ Welcome to DELTA's documentation!

tutorials/training/egs
tutorials/training/speech_features
tutorials/training/text_class_example
tutorials/training/text_class_pip_example
tutorials/training/text_class_source_example
tutorials/training/data/asr_example
tutorials/training/data/emotion-speech-cls
tutorials/training/data/kws-cls
48 changes: 48 additions & 0 deletions docs/installation/install_from_source.md
@@ -0,0 +1,48 @@
# Install from the source code

To install from the source code, we use [conda](https://conda.io/) to
install the required packages. Please
[install conda](https://conda.io/en/latest/miniconda.html) if you do not
have it on your system.

We provide two options for installing DELTA: the `nlp` version and the
`full` version. The `nlp` version has minimal requirements and installs
only NLP-related packages:

```shell
# Run the installation script for NLP version, with CPU or GPU.
cd tools
./install/install-delta.sh nlp [cpu|gpu]
```

**Note**: Users from mainland China may need to set up conda mirror sources, see [./tools/install/install-delta.sh](tools/install/install-delta.sh) for details.

If you want to use both NLP and speech packages, you can install the `full` version. The full version needs [Kaldi](https://github.com/kaldi-asr/kaldi) library, which can be pre-installed or installed using our installation script.

```shell
cd tools
# If you have installed Kaldi
KALDI=/your/path/to/Kaldi ./install/install-delta.sh full [cpu|gpu]
# If you have not installed Kaldi, use the following command
# ./install/install-delta.sh full [cpu|gpu]
```

To verify the installation, run:

```shell
# Activate conda environment
conda activate delta-py3.6-tf2.0.0
# Or use the following command if your conda version is < 4.6
# source activate delta-py3.6-tf2.0.0

# Add DELTA environment
source env.sh

# Generate mock data for text classification.
pushd egs/mock_text_cls_data/text_cls/v1
./run.sh
popd

# Train the model
python3 delta/main.py --cmd train_and_eval --config egs/mock_text_cls_data/text_cls/v1/config/han-cls.yml
```
37 changes: 37 additions & 0 deletions docs/installation/pick_installation.md
@@ -0,0 +1,37 @@
# Pick an installation method for yourself

## Multiple installation methods

Currently we support multiple ways to install `DELTA`. Please choose the
installation method that fits your usage and needs.

## Install by pip

For a **quick demo of the features**, or for **pure NLP users**, you can
install the `nlp` version of `DELTA` via pip with a single command:

```bash
pip install delta-nlp
```

Check here for
[the tutorial on using `delta-nlp`](tutorials/training/text_class_pip_example).

**Requirements**: You need `tensorflow==2.0.0` and `python==3.6` on
macOS or Linux.

## Install from the source code

For users who need the **full functionality of DELTA** (including both
speech and NLP), you can clone our repository and install from the
source code.

Please follow the steps here: [Install from the source code](installation/install_from_source)

## Use docker

For users who are **familiar with Docker**, you can pull our images
directly. This may be the best choice for Docker users.

Please follow the steps here:
[Installation using Docker](installation/using_docker)

2 changes: 1 addition & 1 deletion docs/installation/using_docker.md
@@ -1,4 +1,4 @@
# Intallation using Docker
# Installation using Docker

You can directly pull the pre-built Docker images for DELTA and DELTANN. We have created the following Docker images:

71 changes: 71 additions & 0 deletions docs/installation/wheel_build.md
@@ -0,0 +1,71 @@
# How to build the wheel file

## Intro

In order to provide users a simpler way to install `DELTA`, we need to
build a wheel file (`.whl`) and upload it to PyPI. Once the wheel file
is uploaded, all users need to do is type `pip install delta-nlp`.

**Notice**: installation by pip currently supports only NLP tasks. If
you need the full version of DELTA (with speech tasks), you should
install the platform from source.

## Prepare

Before building the wheel file, you need to install `DELTA` first.

```bash
bash ./tools/install/install-delta.sh nlp gpu
```

For Linux wheel building, you will need this Docker image:

```bash
docker pull didi0speech0nlu/delta_pip:tf2_ub16
```

## Start to build

### macOS

```bash
bash ./tools/install/build_pip_pkg.sh
```

The generated wheel will be under `dist`, with a name like
`delta_nlp-0.2-cp36-cp36m-macosx_10_7_x86_64.whl`.

### Linux

Wheel building on Linux is more complicated. You need to run a Docker container:

```bash
docker run --name delta_pip_tf2_u16 -it -v $PWD:/delta tensorflow/tensorflow:custom-op-ubuntu16 /bin/bash
```

In the docker environment, run:

```bash
bash ./tools/install/build_pip_pkg.sh
```

The generated wheel will be under `dist`, with a name like
`delta_nlp-0.2-cp36-cp36m-linux_x86_64.whl`.

Repair the wheel file so it supports multiple Linux platforms (manylinux):

```bash
auditwheel repair dist/xxx.whl
```

The final wheel will be under `wheelhouse`, with a name like
`delta_nlp-0.2-cp36-cp36m-manylinux1_x86_64.whl`.
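The wheel filenames above follow the PEP 427 `name-version-python-abi-platform` pattern; a small sketch (assuming no build tag and no hyphen in the distribution name, which holds for `delta_nlp`) splits out the platform tag that `auditwheel repair` rewrites:

```python
# Sketch: split a wheel filename into its PEP 427 tags. Assumes no
# build tag and no hyphen in the distribution name.
def wheel_tags(filename):
    """Return the five dash-separated wheel filename components."""
    stem = filename[: -len(".whl")]
    dist, version, py_tag, abi_tag, plat_tag = stem.split("-")
    return {"dist": dist, "version": version, "python": py_tag,
            "abi": abi_tag, "platform": plat_tag}

before = wheel_tags("delta_nlp-0.2-cp36-cp36m-linux_x86_64.whl")
after = wheel_tags("delta_nlp-0.2-cp36-cp36m-manylinux1_x86_64.whl")
print(before["platform"], "->", after["platform"])
# linux_x86_64 -> manylinux1_x86_64
```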

## Upload

After building the wheel file, upload it to PyPI:

```bash
twine upload xxx.whl
```
124 changes: 124 additions & 0 deletions docs/tutorials/training/text_class_pip_example.md
@@ -0,0 +1,124 @@
# A Text Classification Usage Example for pip users

## Intro

In this tutorial, we demonstrate a text classification task with a
demo mock dataset **for users who installed by pip**.

A complete process contains the following steps:

- Prepare the data set.
- Develop custom modules (optional).
- Set the config file.
- Train a model.
- Export a model.

Please clone our demo repository:

```bash
git clone --depth 1 https://github.com/applenob/delta_demo.git
cd ./delta_demo
```

## A quick review for installation

If you haven't installed `delta-nlp` yet, please run:

```bash
pip install delta-nlp
```

**Requirements**: You need `tensorflow==2.0.0` and `python==3.6` on
macOS or Linux.

## Prepare the Data Set

Run the script:

```bash
./gen_data.sh
```

The generated data are in the `data` directory.

The generated data should be in the standard format for text classification: `label\tdocument`, i.e. the label and the document text separated by a tab.
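The `label\tdocument` format can be produced and parsed with a few lines of Python (a sketch with made-up labels and documents, not DELTA's own data code):

```python
# Sketch: write and read the "label\tdocument" format expected for
# text classification. Labels and documents here are made up.
samples = [("0", "this movie is great"), ("1", "terrible acting")]

with open("mock.txt", "w", encoding="utf-8") as f:
    for label, document in samples:
        f.write(f"{label}\t{document}\n")

with open("mock.txt", encoding="utf-8") as f:
    parsed = [line.rstrip("\n").split("\t", 1) for line in f]

print(parsed)  # [['0', 'this movie is great'], ['1', 'terrible acting']]
```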

## Develop custom modules (optional)

Before developing your own modules, please make sure DELTA does not
already provide the modules you need.

```python
@registers.model.register
class TestHierarchicalAttentionModel(HierarchicalModel):
  """Hierarchical text classification model with attention."""

  def __init__(self, config, **kwargs):
    super().__init__(config, **kwargs)

    logging.info("Initialize HierarchicalAttentionModel...")

    self.vocab_size = config['data']['vocab_size']
    self.num_classes = config['data']['task']['classes']['num_classes']
    self.use_true_length = config['model'].get('use_true_length', False)
    if self.use_true_length:
      self.split_token = config['data']['split_token']
    self.padding_token = utils.PAD_IDX
```

You need to register this module file path in the config file
`config/han-cls.yml` (relative to the current work directory).

```yml
custom_modules:
- "test_model.py"
```
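The `@registers.model.register` decorator above follows a common registry pattern; a minimal hypothetical re-implementation (for illustration only, not DELTA's actual code) looks like this:

```python
# Hypothetical sketch of a model registry: the decorator records each
# class by name so the framework can look it up from the config file.
class Registry:
    def __init__(self):
        self._models = {}

    def register(self, cls):
        """Decorator: store the class under its own name and return it."""
        self._models[cls.__name__] = cls
        return cls

    def get(self, name):
        return self._models[name]

registers_model = Registry()

@registers_model.register
class DemoModel:
    pass

print(registers_model.get("DemoModel") is DemoModel)  # True
```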

## Set the Config File

The config file of this example is `config/han-cls.yml`.

In the config file, we set the task to be `TextClsTask` and the model to be `TestHierarchicalAttentionModel`.

### Config Details

The config is composed of three parts: `data`, `model`, and `solver`.

Data related configs are under `data`.
You can set the data paths (including the training, dev, and test sets).
The data processing configs can also be found here (mainly under `task`).
For example, we set `use_dense: false` since no dense input is used here.
We set `language: chinese` since the text is Chinese.

Model parameters are under `model`. The most important config here is
`name: TestHierarchicalAttentionModel`, which specifies the model to
use. Detailed structure configs are under `net->structure`. Here,
`max_sen_len` is 32 and `max_doc_len` is 32.

The configs under `solver` are used by the solver class, including the training optimizer, evaluation metrics, and checkpoint saver.
Here the class is `RawSolver`.
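Putting the three parts together, the overall shape of `config/han-cls.yml` is roughly as follows. The exact nesting and the field values below are illustrative; only the fields discussed above are taken from this tutorial:

```yml
# Illustrative outline, not the complete config file.
data:
  language: chinese
  task:
    name: TextClsTask
    use_dense: false
model:
  name: TestHierarchicalAttentionModel
  net:
    structure:
      max_sen_len: 32
      max_doc_len: 32
solver:
  name: RawSolver
  saver:
    model_path: exp/han-cls/ckpt
```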

## Train a Model

After setting the config file, you are ready to train a model.

```bash
delta --cmd train_and_eval --config config/han-cls.yml
```

The argument `cmd` tells the platform to train a model and also evaluate
the dev set during the training process.

After enough training steps, you will find the model checkpoints saved to the directory set by `saver->model_path`, which is `exp/han-cls/ckpt` in this case.

## Export a Model

If you would like a specific checkpoint to be exported, please set `infer_model_path` in the config file. Otherwise, the platform will simply use the newest checkpoint under the directory set by `saver->model_path`.

```bash
delta --cmd export_model --config config/han-cls.yml
```

The exported models are in the directory set by config
`service->model_path`, which is `exp/han-cls/service` here.

@@ -1,6 +1,10 @@
# A Text Classification Usage Example

In this tutorial, we demonstrate a text classification task with an open source dataset: `yahoo answer`.
## Intro

In this tutorial, we demonstrate a text classification task with an
open source dataset, `yahoo answer`, for users who installed from the
source code.

A complete process contains following steps:

6 changes: 3 additions & 3 deletions setup.py
@@ -15,8 +15,8 @@
TF_INCLUDE = TF_INCLUDE.split('-I')[1]

TF_LIB_INC, TF_SO_LIB = tf.sysconfig.get_link_flags()
TF_SO_LIB = TF_SO_LIB.replace('-l:libtensorflow_framework.1.dylib',
'-ltensorflow_framework.1')
TF_SO_LIB = TF_SO_LIB.replace('-l:libtensorflow_framework.2.dylib',
'-ltensorflow_framework.2')
TF_LIB_INC = TF_LIB_INC.split('-L')[1]
TF_SO_LIB = TF_SO_LIB.split('-l')[1]

@@ -100,7 +100,7 @@ def get_requires():
description=SHORT_DESCRIPTION,
long_description=LONG_DESCRIPTION,
long_description_content_type="text/markdown",
version="0.2",
version="0.2.1",
author=AUTHOR,
author_email=AUTHOR_EMAIL,
maintainer=MAINTAINER,
3 changes: 3 additions & 0 deletions build_pip_pkg.sh → tools/install/build_pip_pkg.sh
@@ -6,5 +6,8 @@ echo "Uninstall ${PIP_NAME} if exist ..."
pip3 uninstall -y ${PIP_NAME}

echo "Build binary distribution wheel file ..."
BASH_DIR=`dirname "$BASH_SOURCE"`
pushd ${BASH_DIR}/../..
rm -rf build/ ${PIP_NAME}.egg-info/ dist/
python3 setup.py bdist_wheel
popd