
Merge pull request #161 from didi/update_pip
Update pip
applenob committed Nov 2, 2019
2 parents 4efc814 + c4a389b commit bd8be56
Showing 11 changed files with 357 additions and 41 deletions.
34 changes: 30 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
@@ -49,12 +49,38 @@ It helps you to train, develop, and deploy NLP and/or speech models, featuring:

## Installation

### Quick Installation
### Installation by pip

We use [conda](https://conda.io/) to install required packages. Please [install conda](https://conda.io/en/latest/miniconda.html) if you do not have it in your system.
We provide pip installation support for the `nlp` version of DELTA,
intended for **pure NLP users** and for a **quick demo of the features**.

We provide two options to install DELTA, `nlp` version or `full` version.
`nlp` version needs minimal requirements and only installs NLP related packages:
**Note**: Users can still run both `nlp` and `speech` tasks by installing
from our source code.

We recommend using [conda](https://conda.io/) or
[virtualenv](https://virtualenv.pypa.io/en/latest/) when installing DELTA
from pip.
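For example, an isolated environment can be sketched with Python's built-in `venv` module (virtualenv behaves the same way); the environment name `delta-env` is just an example, and conda users would use `conda create`/`conda activate` instead:

```shell
# Create and activate an isolated environment before installing delta-nlp.
# "delta-env" is an arbitrary example name.
python3 -m venv delta-env
. delta-env/bin/activate
python3 -c 'import sys; print(sys.prefix)'   # now points inside delta-env
# pip install delta-nlp                      # then install inside it
```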

Before installing, make sure you have installed TensorFlow
(version 2.0.0 is currently required).
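As a small illustrative check (not part of DELTA itself), you can compare the installed TensorFlow version string against the required 2.0.0 release; in practice the installed version comes from `import tensorflow as tf; tf.__version__`:

```python
# Illustrative helper (not part of DELTA): check that a TensorFlow
# version string matches the required 2.0.0 release.
import re

REQUIRED = (2, 0, 0)

def parse_version(version):
    """Turn a dotted version string such as '2.0.0' into a tuple of ints."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])

def tensorflow_ok(version, required=REQUIRED):
    """Return True when the installed version matches the required one."""
    return parse_version(version) == required

print(tensorflow_ok("2.0.0"))   # True
print(tensorflow_ok("1.14.0"))  # False
```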

```bash
pip install delta-nlp
```

If you installed via pip, follow the usage steps here:
[A Text Classification Usage Example for pip users](docs/tutorials/training/text_class_pip_example.md)

### Installation from Source Code

To install from the source code, we use [conda](https://conda.io/) to
install the required packages. Please
[install conda](https://conda.io/en/latest/miniconda.html) if you do not
have it on your system.

We provide two options for installing DELTA: the `nlp` version and the
`full` version. The `nlp` version has minimal requirements and installs
only NLP-related packages:

```shell
# Run the installation script for NLP version, with CPU or GPU.
4 changes: 3 additions & 1 deletion docs/index.rst
@@ -18,6 +18,7 @@ Welcome to DELTA's documentation!
:caption: Installation
:name: sec-install

installation/pick_install
installation/using_docker
installation/manual_setup
installation/deltann_compile
@@ -31,7 +31,8 @@ Welcome to DELTA's documentation!

tutorials/training/egs
tutorials/training/speech_features
tutorials/training/text_class_example
tutorials/training/text_class_pip_example
tutorials/training/text_class_source_example
tutorials/training/data/asr_example
tutorials/training/data/emotion-speech-cls
tutorials/training/data/kws-cls
48 changes: 48 additions & 0 deletions docs/installation/install_from_source.md
@@ -0,0 +1,48 @@
# Install from the source code

To install from the source code, we use [conda](https://conda.io/) to
install the required packages. Please
[install conda](https://conda.io/en/latest/miniconda.html) if you do not
have it on your system.

We provide two options for installing DELTA: the `nlp` version and the
`full` version. The `nlp` version has minimal requirements and installs
only NLP-related packages:

```shell
# Run the installation script for NLP version, with CPU or GPU.
cd tools
./install/install-delta.sh nlp [cpu|gpu]
```

**Note**: Users from mainland China may need to set up conda mirror sources, see [./tools/install/install-delta.sh](tools/install/install-delta.sh) for details.

If you want to use both NLP and speech packages, you can install the `full` version. The full version needs [Kaldi](https://github.com/kaldi-asr/kaldi) library, which can be pre-installed or installed using our installation script.

```shell
cd tools
# If you have installed Kaldi
KALDI=/your/path/to/Kaldi ./install/install-delta.sh full [cpu|gpu]
# If you have not installed Kaldi, use the following command
# ./install/install-delta.sh full [cpu|gpu]
```

To verify the installation, run:

```shell
# Activate conda environment
conda activate delta-py3.6-tf2.0.0
# Or use the following command if your conda version is < 4.6
# source activate delta-py3.6-tf2.0.0

# Add DELTA environment
source env.sh

# Generate mock data for text classification.
pushd egs/mock_text_cls_data/text_cls/v1
./run.sh
popd

# Train the model
python3 delta/main.py --cmd train_and_eval --config egs/mock_text_cls_data/text_cls/v1/config/han-cls.yml
```
37 changes: 37 additions & 0 deletions docs/installation/pick_installation.md
@@ -0,0 +1,37 @@
# Pick an installation method for yourself

## Multiple installation methods

Currently we support multiple ways to install `DELTA`. Please choose the
installation method that fits your usage and needs.

## Install by pip

For a **quick demo of the features**, or for **pure NLP users**, you can
install the `nlp` version of `DELTA` via pip with a single command:

```bash
pip install delta-nlp
```

Check here for
[the tutorial on using `delta-nlp`](tutorials/training/text_class_pip_example).

**Requirements**: You need `tensorflow==2.0.0` and `python==3.6` on
macOS or Linux.

## Install from the source code

For users who need the **full functionality of DELTA** (including both
speech and NLP), you can clone our repository and install from the
source code.

Please follow the steps here: [Install from the source code](installation/install_from_source)

## Use docker

For users who are **familiar with Docker**, you can pull our images
directly. This may be the best choice for Docker users.

Please follow the steps here:
[Installation using Docker](installation/using_docker)

2 changes: 1 addition & 1 deletion docs/installation/using_docker.md
@@ -1,4 +1,4 @@
# Intallation using Docker
# Installation using Docker

You can directly pull the pre-built Docker images for DELTA and DELTANN. We have created the following Docker images:

71 changes: 71 additions & 0 deletions docs/installation/wheel_build.md
@@ -0,0 +1,71 @@
# How to build the wheel file

## Intro

In order to provide users a simpler way to install `DELTA`, we need to
build a wheel file (`.whl`) and upload it to PyPI. Once the wheel file
is uploaded, all users need to do is type `pip install delta-nlp`.

**Notice**: installation by pip currently supports only NLP tasks. If
you need the full version of DELTA (with speech tasks), you should
install the platform from source.

## Prepare

Before building the wheel file, you need to install `DELTA` first.

```bash
bash ./tools/install/install-delta.sh nlp gpu
```

For Linux wheel building, you will need this Docker image:

```bash
docker pull didi0speech0nlu/delta_pip:tf2_ub16
```

## Start to build

### macOS

```bash
bash ./tools/install/build_pip_pkg.sh
```

The generated wheel will be under `dist`, with a name like
`delta_nlp-0.2-cp36-cp36m-macosx_10_7_x86_64.whl`.

### Linux

Wheel building on Linux is more complicated. You need to run a Docker container:

```bash
docker run --name delta_pip_tf2_u16 -it -v $PWD:/delta tensorflow/tensorflow:custom-op-ubuntu16 /bin/bash
```

In the docker environment, run:

```bash
bash ./tools/install/build_pip_pkg.sh
```

The generated wheel will be under `dist`, with a name like
`delta_nlp-0.2-cp36-cp36m-linux_x86_64.whl`.

Repair the wheel file so it supports multiple Linux platforms (manylinux):

```bash
auditwheel repair dist/xxx.whl
```

The final wheel will be under `wheelhouse`, with a name like
`delta_nlp-0.2-cp36-cp36m-manylinux1_x86_64.whl`.
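The wheel filenames above follow the PEP 427 `name-version-python-abi-platform` pattern; a small sketch (assuming no build tag and no hyphen in the distribution name, which holds for `delta_nlp`) splits out the platform tag that `auditwheel repair` rewrites:

```python
# Sketch: split a wheel filename into its PEP 427 tags. Assumes no
# build tag and no hyphen in the distribution name.
def wheel_tags(filename):
    """Return the five dash-separated wheel filename components."""
    stem = filename[: -len(".whl")]
    dist, version, py_tag, abi_tag, plat_tag = stem.split("-")
    return {"dist": dist, "version": version, "python": py_tag,
            "abi": abi_tag, "platform": plat_tag}

before = wheel_tags("delta_nlp-0.2-cp36-cp36m-linux_x86_64.whl")
after = wheel_tags("delta_nlp-0.2-cp36-cp36m-manylinux1_x86_64.whl")
print(before["platform"], "->", after["platform"])
# linux_x86_64 -> manylinux1_x86_64
```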

## Upload

After building the wheel file, upload it to PyPI:

```bash
twine upload xxx.whl
```
124 changes: 124 additions & 0 deletions docs/tutorials/training/text_class_pip_example.md
@@ -0,0 +1,124 @@
# A Text Classification Usage Example for pip users

## Intro

In this tutorial, we demonstrate a text classification task with a
demo mock dataset **for users who installed by pip**.

A complete process contains the following steps:

- Prepare the data set.
- Develop custom modules (optional).
- Set the config file.
- Train a model.
- Export a model.

Please clone our demo repository:

```bash
git clone --depth 1 https://github.com/applenob/delta_demo.git
cd ./delta_demo
```

## A quick review for installation

If you haven't installed `delta-nlp` yet, please run:

```bash
pip install delta-nlp
```

**Requirements**: You need `tensorflow==2.0.0` and `python==3.6` on
macOS or Linux.

## Prepare the Data Set

Run the script:

```bash
./gen_data.sh
```

The generated data are in the `data` directory.

The generated data should be in the standard format for text classification: `label\tdocument`, i.e. the label and the document text separated by a tab.
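The `label\tdocument` format can be produced and parsed with a few lines of Python (a sketch with made-up labels and documents, not DELTA's own data code):

```python
# Sketch: write and read the "label\tdocument" format expected for
# text classification. Labels and documents here are made up.
samples = [("0", "this movie is great"), ("1", "terrible acting")]

with open("mock.txt", "w", encoding="utf-8") as f:
    for label, document in samples:
        f.write(f"{label}\t{document}\n")

with open("mock.txt", encoding="utf-8") as f:
    parsed = [line.rstrip("\n").split("\t", 1) for line in f]

print(parsed)  # [['0', 'this movie is great'], ['1', 'terrible acting']]
```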

## Develop custom modules (optional)

Before developing your own modules, please make sure DELTA does not
already provide the modules you need.

```python
@registers.model.register
class TestHierarchicalAttentionModel(HierarchicalModel):
  """Hierarchical text classification model with attention."""

  def __init__(self, config, **kwargs):
    super().__init__(config, **kwargs)

    logging.info("Initialize HierarchicalAttentionModel...")

    self.vocab_size = config['data']['vocab_size']
    self.num_classes = config['data']['task']['classes']['num_classes']
    self.use_true_length = config['model'].get('use_true_length', False)
    if self.use_true_length:
      self.split_token = config['data']['split_token']
    self.padding_token = utils.PAD_IDX
```

You need to register this module file path in the config file
`config/han-cls.yml` (relative to the current work directory).

```yml
custom_modules:
- "test_model.py"
```
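The `@registers.model.register` decorator above follows a common registry pattern; a minimal hypothetical re-implementation (for illustration only, not DELTA's actual code) looks like this:

```python
# Hypothetical sketch of a model registry: the decorator records each
# class by name so the framework can look it up from the config file.
class Registry:
    def __init__(self):
        self._models = {}

    def register(self, cls):
        """Decorator: store the class under its own name and return it."""
        self._models[cls.__name__] = cls
        return cls

    def get(self, name):
        return self._models[name]

registers_model = Registry()

@registers_model.register
class DemoModel:
    pass

print(registers_model.get("DemoModel") is DemoModel)  # True
```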

## Set the Config File

The config file of this example is `config/han-cls.yml`.

In the config file, we set the task to be `TextClsTask` and the model to be `TestHierarchicalAttentionModel`.

### Config Details

The config is composed of three parts: `data`, `model`, and `solver`.

Data related configs are under `data`.
You can set the data paths (including the training, dev, and test sets).
The data processing configs can also be found here (mainly under `task`).
For example, we set `use_dense: false` since no dense input is used here.
We set `language: chinese` since the text is Chinese.

Model parameters are under `model`. The most important config here is
`name: TestHierarchicalAttentionModel`, which specifies the model to
use. Detailed structure configs are under `net->structure`. Here,
`max_sen_len` is 32 and `max_doc_len` is 32.

The configs under `solver` are used by the solver class, including the training optimizer, evaluation metrics, and checkpoint saver.
Here the class is `RawSolver`.
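Putting the three parts together, the overall shape of `config/han-cls.yml` is roughly as follows. The exact nesting and the field values below are illustrative; only the fields discussed above are taken from this tutorial:

```yml
# Illustrative outline, not the complete config file.
data:
  language: chinese
  task:
    name: TextClsTask
    use_dense: false
model:
  name: TestHierarchicalAttentionModel
  net:
    structure:
      max_sen_len: 32
      max_doc_len: 32
solver:
  name: RawSolver
  saver:
    model_path: exp/han-cls/ckpt
```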

## Train a Model

After setting the config file, you are ready to train a model.

```bash
delta --cmd train_and_eval --config config/han-cls.yml
```

The argument `cmd` tells the platform to train a model and also evaluate
the dev set during the training process.

After enough training steps, you will find the model checkpoints saved to the directory set by `saver->model_path`, which is `exp/han-cls/ckpt` in this case.

## Export a Model

If you would like a specific checkpoint to be exported, please set `infer_model_path` in the config file. Otherwise, the platform will simply use the newest checkpoint under the directory set by `saver->model_path`.

```bash
delta --cmd export_model --config config/han-cls.yml
```

The exported models are in the directory set by config
`service->model_path`, which is `exp/han-cls/service` here.

@@ -1,6 +1,10 @@
# A Text Classification Usage Example

In this tutorial, we demonstrate a text classification task with an open source dataset: `yahoo answer`.
## Intro

In this tutorial, we demonstrate a text classification task with an
open source dataset, `yahoo answer`, for users who installed from the
source code.

A complete process contains following steps:

6 changes: 3 additions & 3 deletions setup.py
@@ -15,8 +15,8 @@
TF_INCLUDE = TF_INCLUDE.split('-I')[1]

TF_LIB_INC, TF_SO_LIB = tf.sysconfig.get_link_flags()
TF_SO_LIB = TF_SO_LIB.replace('-l:libtensorflow_framework.1.dylib',
'-ltensorflow_framework.1')
TF_SO_LIB = TF_SO_LIB.replace('-l:libtensorflow_framework.2.dylib',
'-ltensorflow_framework.2')
TF_LIB_INC = TF_LIB_INC.split('-L')[1]
TF_SO_LIB = TF_SO_LIB.split('-l')[1]

@@ -100,7 +100,7 @@ def get_requires():
description=SHORT_DESCRIPTION,
long_description=LONG_DESCRIPTION,
long_description_content_type="text/markdown",
version="0.2",
version="0.2.1",
author=AUTHOR,
author_email=AUTHOR_EMAIL,
maintainer=MAINTAINER,
3 changes: 3 additions & 0 deletions build_pip_pkg.sh → tools/install/build_pip_pkg.sh
@@ -6,5 +6,8 @@ echo "Uninstall ${PIP_NAME} if exist ..."
pip3 uninstall -y ${PIP_NAME}

echo "Build binary distribution wheel file ..."
BASH_DIR=`dirname "$BASH_SOURCE"`
pushd ${BASH_DIR}/../..
rm -rf build/ ${PIP_NAME}.egg-info/ dist/
python3 setup.py bdist_wheel
popd