AutoML DNN-NLP

Overview

The introduction of the transformer architecture and the later development of BERT have greatly improved machine learning performance on NLP tasks. You can now take advantage of BERT and apply this powerful pretrained model to your own tasks with the Azure Machine Learning AutoML NLP capability.

Currently, our AutoML DNN-NLP service supports three scenarios:

  • multi-class classification
    • There are multiple possible classes and each sample can be classified as exactly one class. The task is to predict the correct class for each sample.
  • multi-label classification
    • There are multiple possible classes and each sample can be assigned any number of classes. The task is to predict all the classes for each sample.
  • Named Entity Recognition (NER)
    • There are multiple possible tags for tokens in sequences. The task is to predict the tags for all the tokens for each sequence.
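To make the difference between the three scenarios concrete, here is a minimal sketch of what one labeled sample looks like in each case. The field names, class names, and the B-/I-/O tag scheme are illustrative assumptions, not the dataset format the service requires; see the scenario docs for the actual format.

```python
# Toy samples only: field names, classes, and tags are hypothetical.

# Multi-class: exactly one class per sample.
multiclass_sample = {
    "text": "The battery died after two days.",
    "label": "complaint",
}

# Multi-label: any number of classes per sample (possibly none).
multilabel_sample = {
    "text": "The battery died after two days, please send my money back.",
    "labels": ["complaint", "refund_request"],
}

# NER: one tag per token in the sequence.
ner_sample = {
    "tokens": ["Ada", "Lovelace", "visited", "London"],
    "tags": ["B-PER", "I-PER", "O", "B-LOC"],
}


def is_valid_ner_sample(sample):
    """A NER sample needs exactly one tag for every token."""
    return len(sample["tokens"]) == len(sample["tags"])
```

The prediction target changes shape accordingly: a single class, a set of classes, or a tag sequence as long as the input.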

Installation and set up

  • Azure subscription. If you don't have an Azure subscription, sign up to try the free or paid version of Azure Machine Learning today.
  • A Workspace with GPUs available. Please check this page for more details on the GPU instances provided by Azure.
  • To use this feature with our SDK, please follow the setup instructions on this page. That is enough to start AutoML DNN-NLP runs from a Jupyter notebook. If you would like to explore the DNN-NLP module further, you can run pip install azureml-automl-dnn-nlp
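If you are unsure whether the optional package is already present in your environment, a small check like the following may help. It is a sketch that uses only the Python standard library (Python 3.8+); the helper name installed_version is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="azureml-automl-dnn-nlp"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "azureml-automl-dnn-nlp is not installed")
```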

Getting started

Quick Start

For a quick start with a live notebook, please refer to this example notebook, which walks through a complete AutoML DNN-NLP run for the multi-class scenario. You can also learn how to run multi-label and NER tasks from the example code snippets and sample data.

General procedure

All three scenarios share the same general procedure for setting up an AutoML DNN-NLP run:

  1. Retrieve workspace and create/choose compute instance.
  2. Prepare and register datasets.
  3. Set AutoMLConfig accordingly.
  4. Submit the run.
  5. Check result with SDK or UI.

Steps 1, 4, and 5 are exactly the same as in general AutoML runs. For step 3, you only need to choose the task parameter that matches your scenario, and we will take care of the rest.
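The five steps can be sketched roughly as follows with the Azure ML Python SDK (v1). This is a minimal sketch, not the reference implementation: the task strings in TASK_BY_SCENARIO are assumptions that should be confirmed against the scenario docs, the helper names build_automl_kwargs and run_experiment are invented for this example, and the datasets are assumed to be registered already.

```python
# ASSUMPTION: these task strings are believed to match the SDK v1 values;
# confirm them in the multi-class, multi-label, and NER docs.
TASK_BY_SCENARIO = {
    "multi-class": "text-classification",
    "multi-label": "text-classification-multilabel",
    "ner": "text-ner",
}


def build_automl_kwargs(scenario, label_column_name=None):
    """Step 3: collect the scenario-specific AutoMLConfig keyword arguments."""
    if scenario not in TASK_BY_SCENARIO:
        raise ValueError(f"Unknown scenario: {scenario!r}")
    kwargs = {"task": TASK_BY_SCENARIO[scenario]}
    if label_column_name is not None:
        kwargs["label_column_name"] = label_column_name
    return kwargs


def run_experiment(scenario, compute_name, train_name, valid_name,
                   experiment_name, label_column_name=None):
    """Steps 1 to 5 for one scenario; all names are supplied by the caller."""
    # Imported lazily so build_automl_kwargs works without the SDK installed.
    from azureml.core import Dataset, Experiment, Workspace
    from azureml.core.compute import ComputeTarget
    from azureml.train.automl import AutoMLConfig

    # Step 1: retrieve the workspace and an existing GPU compute target.
    ws = Workspace.from_config()
    compute_target = ComputeTarget(workspace=ws, name=compute_name)

    # Step 2: fetch datasets that were registered beforehand.
    train_data = Dataset.get_by_name(ws, name=train_name)
    valid_data = Dataset.get_by_name(ws, name=valid_name)

    # Step 3: only the task (and label column, if any) differs per scenario.
    automl_config = AutoMLConfig(
        compute_target=compute_target,
        training_data=train_data,
        validation_data=valid_data,
        **build_automl_kwargs(scenario, label_column_name),
    )

    # Step 4: submit the run.
    run = Experiment(ws, experiment_name).submit(automl_config)

    # Step 5: block until completion; results can also be viewed in the UI.
    run.wait_for_completion(show_output=True)
    return run
```

Keeping the scenario-specific part in one small function reflects the point above: switching between multi-class, multi-label, and NER should only change the arguments produced in step 3.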

For more details and examples of how to set AutoMLConfig and prepare datasets in the required format, please check the docs for multi-class, multi-label, and NER.

Other features

Multilingual support

The AutoML DNN-NLP service supports 104 languages. You can specify the language by its language code, or let DNN-NLP detect the language automatically. For the full list of supported languages and their language codes, please check this page.

To select the language, create a FeaturizationConfig with the language code:

from azureml.automl.core.featurization import FeaturizationConfig
featurization_config = FeaturizationConfig(dataset_language='{your language code}')

Then pass it to AutoMLConfig as the featurization argument:

from azureml.train.automl import AutoMLConfig
automl_config = AutoMLConfig(featurization=featurization_config, **other_settings)

To enable automatic language detection, set featurization to "auto":

automl_config = AutoMLConfig(featurization="auto", **other_settings)

Coming Soon

Start training with UI

We are actively working on UI support so that everyone can use the AutoML DNN-NLP feature through simple UI operations!

Distributed Training

We are working on applying Horovod to support stable, high-performance distributed training for all three scenarios.

Contact Us

For any questions, bug reports, or feature requests, please contact us at AutoMLText@microsoft.com.