
Update pip #161

Merged
merged 4 commits into from
Nov 2, 2019
34 changes: 30 additions & 4 deletions README.md
@@ -49,12 +49,38 @@ It helps you to train, develop, and deploy NLP and/or speech models, featuring:

## Installation

### Quick Installation
### Installation by pip

We use [conda](https://conda.io/) to install required packages. Please [install conda](https://conda.io/en/latest/miniconda.html) if you do not have it in your system.
We provide pip install support for the `nlp` version of DELTA, for
**pure NLP users** and for a **quick demo of the features**.

We provide two options to install DELTA: the `nlp` version or the `full` version.
The `nlp` version has minimal requirements and installs only NLP-related packages:
**Note**: Users can still use both `nlp` and `speech` tasks by installing
from our source code.

We recommend using [conda](https://conda.io/) or
[virtualenv](https://virtualenv.pypa.io/en/latest/) to install DELTA
from pip.

Before installation, make sure you have installed TensorFlow
(version 2.0.0 is currently required).

```bash
pip install delta-nlp
```

Follow the usage steps here if you install by pip:
[A Text Classification Usage Example for pip users](docs/tutorials/training/text_class_pip_example.md)

### Installation from Source Code

To install from the source code, we use [conda](https://conda.io/) to
install the required packages. Please
[install conda](https://conda.io/en/latest/miniconda.html) if you do not
have it on your system.

We provide two options to install DELTA: the `nlp` version or the
`full` version. The `nlp` version has minimal requirements and installs
only NLP-related packages:

```shell
# Run the installation script for NLP version, with CPU or GPU.
4 changes: 3 additions & 1 deletion docs/index.rst
@@ -18,6 +18,7 @@ Welcome to DELTA's documentation!
:caption: Installation
:name: sec-install

installation/pick_install
installation/using_docker
installation/manual_setup
installation/deltann_compile
Expand All @@ -31,7 +32,8 @@ Welcome to DELTA's documentation!

tutorials/training/egs
tutorials/training/speech_features
tutorials/training/text_class_example
tutorials/training/text_class_pip_example
tutorials/training/text_class_source_example
tutorials/training/data/asr_example
tutorials/training/data/emotion-speech-cls
tutorials/training/data/kws-cls
48 changes: 48 additions & 0 deletions docs/installation/install_from_source.md
@@ -0,0 +1,48 @@
# Install from the source code

To install from the source code, we use [conda](https://conda.io/) to
install the required packages. Please
[install conda](https://conda.io/en/latest/miniconda.html) if you do not
have it on your system.

We provide two options to install DELTA: the `nlp` version or the
`full` version. The `nlp` version has minimal requirements and installs
only NLP-related packages:

```shell
# Run the installation script for NLP version, with CPU or GPU.
cd tools
./install/install-delta.sh nlp [cpu|gpu]
```

**Note**: Users from mainland China may need to set up conda mirror sources, see [./tools/install/install-delta.sh](tools/install/install-delta.sh) for details.

If you want to use both NLP and speech packages, you can install the `full` version. The full version needs [Kaldi](https://github.com/kaldi-asr/kaldi) library, which can be pre-installed or installed using our installation script.

```shell
cd tools
# If you have installed Kaldi
KALDI=/your/path/to/Kaldi ./install/install-delta.sh full [cpu|gpu]
# If you have not installed Kaldi, use the following command
# ./install/install-delta.sh full [cpu|gpu]
```

To verify the installation, run:

```shell
# Activate conda environment
conda activate delta-py3.6-tf2.0.0
# Or use the following command if your conda version is < 4.6
# source activate delta-py3.6-tf2.0.0

# Add DELTA environment
source env.sh

# Generate mock data for text classification.
pushd egs/mock_text_cls_data/text_cls/v1
./run.sh
popd

# Train the model
python3 delta/main.py --cmd train_and_eval --config egs/mock_text_cls_data/text_cls/v1/config/han-cls.yml
```
37 changes: 37 additions & 0 deletions docs/installation/pick_installation.md
@@ -0,0 +1,37 @@
# Pick an installation method for yourself

## Multiple installation methods

Currently we support multiple ways to install `DELTA`. Please choose
the installation method that best fits your usage and needs.

## Install by pip

For a **quick demo of the features**, or for **pure NLP users**, you can
install the `nlp` version of `DELTA` by pip with a simple command:

```bash
pip install delta-nlp
```

Check here for
[the tutorial for usage of `delta-nlp`](tutorials/training/text_class_pip_example).

**Requirements**: You need `tensorflow==2.0.0` and `python==3.6` on
macOS or Linux.
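As a convenience, these version requirements can be checked with a small helper before installing (this sketch is not part of `delta-nlp`; the function name is ours):

```python
import sys

def meets_delta_nlp_requirements(py_version, tf_version):
    """Check the delta-nlp requirements: Python 3.6 and TensorFlow 2.0.0."""
    return tuple(py_version[:2]) == (3, 6) and tf_version == "2.0.0"

# Check the running interpreter; the TensorFlow version string would
# normally come from `tensorflow.__version__`.
print(meets_delta_nlp_requirements(sys.version_info, "2.0.0"))
```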

## Install from the source code

For users who need the **full functionality of DELTA** (including both
speech and NLP), you can clone our repository and install from the source code.

Please follow the steps here: [Install from the source code](installation/install_from_source)

## Use docker

For users who are **comfortable with Docker**, you can pull our images
directly. This may be the best choice for Docker users.

Please follow the steps here:
[Installation using Docker](installation/using_docker)

2 changes: 1 addition & 1 deletion docs/installation/using_docker.md
@@ -1,4 +1,4 @@
# Intallation using Docker
# Installation using Docker

You can directly pull the pre-build docker images for DELTA and DELTANN. We have created the following docker images:

Expand Down
71 changes: 71 additions & 0 deletions docs/installation/wheel_build.md
@@ -0,0 +1,71 @@
# How to build the wheel file

## Intro

In order to provide users a simpler way to install `DELTA`, we build
the wheel file (`.whl`) and upload it to PyPI. Once the wheel file is
uploaded, all that users need to do is type `pip install delta-nlp`.

**Notice**: installation by pip only supports NLP tasks for now. If you need the
full version of DELTA (with speech tasks), you should install the
platform from source.

## Prepare

Before building the wheel file, you need to install `DELTA` first.

```bash
bash ./tools/install/install-delta.sh nlp gpu
```

For Linux wheel building, you will need this docker image:

```bash
docker pull didi0speech0nlu/delta_pip:tf2_ub16
```

## Start to build

### MacOS

```bash
bash ./tools/install/build_pip_pkg.sh
```

The generated wheel will be under `dist`, e.g.
`delta_nlp-0.2-cp36-cp36m-macosx_10_7_x86_64.whl`.

### Linux

Wheel building on Linux is more complicated. You need to run a Docker container:

```bash
docker run --name delta_pip_tf2_u16 -it -v $PWD:/delta tensorflow/tensorflow:custom-op-ubuntu16 /bin/bash
```

In the docker environment, run:

```bash
bash ./tools/install/build_pip_pkg.sh
```

The generated wheel will be under `dist`, e.g.
`delta_nlp-0.2-cp36-cp36m-linux_x86_64.whl`.

Repair the wheel file so it supports multiple Linux platforms (manylinux):

```bash
auditwheel repair dist/xxx.whl
```

The final wheel will be under `wheelhouse`, e.g.
`delta_nlp-0.2-cp36-cp36m-manylinux1_x86_64.whl`.

## Upload

After building the wheel files, upload them to PyPI:

```bash
twine upload xxx.whl
```
124 changes: 124 additions & 0 deletions docs/tutorials/training/text_class_pip_example.md
@@ -0,0 +1,124 @@
# A Text Classification Usage Example for pip users

## Intro

In this tutorial, we demonstrate a text classification task with a
demo mock dataset **for users who installed DELTA by pip**.

A complete process contains the following steps:

- Prepare the data set.
- Develop custom modules (optional).
- Set the config file.
- Train a model.
- Export a model.

Please clone our demo repository:

```bash
git clone --depth 1 https://github.com/applenob/delta_demo.git
cd ./delta_demo
```

## A quick review for installation

If you haven't installed `delta-nlp` yet, please run:

```bash
pip install delta-nlp
```

**Requirements**: You need `tensorflow==2.0.0` and `python==3.6` in
MacOS or Linux.

## Prepare the Data Set

Run the script:

```bash
./gen_data.sh
```

The generated data are in the `data` directory.

The generated data should follow the standard format for text classification: `label\tdocument`, i.e. a label and a document separated by a tab.
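For reference, this format can be produced and parsed with plain Python (the file name and labels below are illustrative, not part of the demo repository):

```python
# Each line is one example: a label and a document separated by a tab.
rows = [
    ("0", "this movie is great"),
    ("1", "the plot was boring"),
]

with open("mock_train.txt", "w", encoding="utf-8") as f:
    for label, doc in rows:
        f.write(f"{label}\t{doc}\n")

# Parse it back; split on the first tab only, in case a document
# itself contains tab characters.
with open("mock_train.txt", encoding="utf-8") as f:
    parsed = [line.rstrip("\n").split("\t", 1) for line in f]

print(parsed)  # [['0', 'this movie is great'], ['1', 'the plot was boring']]
```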

## Develop custom modules (optional)

Before developing your own modules, please check whether DELTA already
provides the modules you need.

```python
@registers.model.register
class TestHierarchicalAttentionModel(HierarchicalModel):
"""Hierarchical text classification model with attention."""

def __init__(self, config, **kwargs):
super().__init__(config, **kwargs)

logging.info("Initialize HierarchicalAttentionModel...")

self.vocab_size = config['data']['vocab_size']
self.num_classes = config['data']['task']['classes']['num_classes']
self.use_true_length = config['model'].get('use_true_length', False)
if self.use_true_length:
self.split_token = config['data']['split_token']
self.padding_token = utils.PAD_IDX
```

You need to register the path of this module file in the config file
`config/han-cls.yml` (relative to the current working directory).

```yml
custom_modules:
- "test_model.py"
```

## Set the Config File

The config file of this example is `config/han-cls.yml`.

In the config file, we set the task to be `TextClsTask` and the model to be `TestHierarchicalAttentionModel`.

### Config Details

The config is composed of three parts: `data`, `model`, `solver`.

Data-related configs are under `data`.
You can set the data paths (including the training set, dev set and test set).
The data processing configs can also be found here (mainly under `task`).
For example, we set `use_dense: false` since no dense input is used here,
and `language: chinese` since the text is Chinese.

Model parameters are under `model`. The most important config here is
`name: TestHierarchicalAttentionModel`, which specifies the model to
use. Detailed structure configs are under `net->structure`. Here,
`max_sen_len` is 32 and `max_doc_len` is 32.

The configs under `solver` are used by the solver class, including the training optimizer, evaluation metrics and checkpoint saver.
Here the solver class is `RawSolver`.
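Putting the three parts together, the config skeleton looks roughly like this (a sketch assembled from the options named in this tutorial; keys not mentioned here are omitted, and the exact nesting may differ slightly in the actual `config/han-cls.yml`):

```yml
data:
  task:
    use_dense: false        # no dense input in this example
    language: chinese
model:
  name: TestHierarchicalAttentionModel
  net:
    structure:
      max_sen_len: 32
      max_doc_len: 32
solver:
  # the solver class (RawSolver) reads optimizer, metric and saver configs here
  saver:
    model_path: "exp/han-cls/ckpt"
```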

## Train a Model

After setting the config file, you are ready to train a model.

```bash
delta --cmd train_and_eval --config config/han-cls.yml
```

The argument `cmd` tells the platform to train a model and also evaluate
the dev set during the training process.

After enough training steps, you will find that the model checkpoints have been saved to the directory set by `saver->model_path`, which is `exp/han-cls/ckpt` in this case.

## Export a Model

If you would like to export a specific checkpoint, please set `infer_model_path` in the config file. Otherwise, the platform will simply pick the newest checkpoint under the directory set by `saver->model_path`.

```bash
delta --cmd export_model --config config/han-cls.yml
```

The exported models are in the directory set by config
`service->model_path`, which is `exp/han-cls/service` here.

@@ -1,6 +1,10 @@
# A Text Classification Usage Example

In this tutorial, we demonstrate a text classification task with an open source dataset: `yahoo answer`.
## Intro

In this tutorial, we demonstrate a text classification task with an
open source dataset, `yahoo answer`, for users who installed from the
source code.

A complete process contains following steps:

6 changes: 3 additions & 3 deletions setup.py
@@ -15,8 +15,8 @@
TF_INCLUDE = TF_INCLUDE.split('-I')[1]

TF_LIB_INC, TF_SO_LIB = tf.sysconfig.get_link_flags()
TF_SO_LIB = TF_SO_LIB.replace('-l:libtensorflow_framework.1.dylib',
'-ltensorflow_framework.1')
TF_SO_LIB = TF_SO_LIB.replace('-l:libtensorflow_framework.2.dylib',
'-ltensorflow_framework.2')
TF_LIB_INC = TF_LIB_INC.split('-L')[1]
TF_SO_LIB = TF_SO_LIB.split('-l')[1]

@@ -100,7 +100,7 @@ def get_requires():
description=SHORT_DESCRIPTION,
long_description=LONG_DESCRIPTION,
long_description_content_type="text/markdown",
version="0.2",
version="0.2.1",
author=AUTHOR,
author_email=AUTHOR_EMAIL,
maintainer=MAINTAINER,
3 changes: 3 additions & 0 deletions build_pip_pkg.sh → tools/install/build_pip_pkg.sh
@@ -6,5 +6,8 @@ echo "Uninstall ${PIP_NAME} if exist ..."
pip3 uninstall -y ${PIP_NAME}

echo "Build binary distribution wheel file ..."
BASH_DIR=`dirname "$BASH_SOURCE"`
pushd ${BASH_DIR}/../..
rm -rf build/ ${PIP_NAME}.egg-info/ dist/
python3 setup.py bdist_wheel
popd