The arrival of the transformer architecture and the later development of BERT have greatly improved machine learning performance on NLP tasks. You can now take advantage of BERT and apply this powerful pretrained model to your own tasks with the Azure Machine Learning AutoML NLP capability.
Currently, our AutoML DNN-NLP service supports three scenarios:
- multi-class classification
  - There are multiple possible classes and each sample can be classified as exactly one class. The task is to predict the correct class for each sample.
- multi-label classification
  - There are multiple possible classes and each sample can be assigned any number of classes. The task is to predict all the classes for each sample.
- Named Entity Recognition (NER)
  - There are multiple possible tags for tokens in sequences. The task is to predict the tags for all the tokens for each sequence.
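To make the difference between the three scenarios concrete, the sketch below shows what a prediction target looks like in each case. The class names, tags, sample sentence, and helper functions are all invented for illustration; they are not the input format the service requires.

```python
# Hypothetical classes and tags, invented for illustration only.
CLASSES = {"sports", "politics", "tech"}
TAG_SET = {"O", "B-PER", "I-PER", "B-LOC", "I-LOC"}


def is_valid_multiclass(label, classes=CLASSES):
    """Multi-class: each sample carries exactly one class."""
    return label in classes


def is_valid_multilabel(labels, classes=CLASSES):
    """Multi-label: each sample carries any number of distinct classes."""
    return len(set(labels)) == len(labels) and set(labels) <= classes


def is_valid_ner(tokens, tags, tag_set=TAG_SET):
    """NER: every token in the sequence carries exactly one tag."""
    return len(tokens) == len(tags) and set(tags) <= tag_set


# One target per scenario for comparable inputs.
multiclass_target = "sports"                      # a single class
multilabel_target = ["sports", "politics"]        # a set of classes
ner_tokens = ["Ada", "Lovelace", "visited", "Seattle"]
ner_target = ["B-PER", "I-PER", "O", "B-LOC"]     # one tag per token
```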
- Azure subscription. If you don't have an Azure subscription, sign up to try the free or paid version of Azure Machine Learning today.
- A workspace with GPUs available. Please check this page for more details on the GPU instances provided by Azure.
- To use this new feature with our SDK, please follow the setup instructions on this page. That is enough to start AutoML DNN-NLP runs from a Jupyter notebook. If you would like to explore our DNN-NLP module further, you can install it directly:
pip install azureml-automl-dnn-nlp
For a quick start with a live notebook, please refer to this example notebook, which walks through a complete AutoML DNN-NLP run for the multi-class scenario. You can also learn how to run multi-label and NER tasks from the example code snippets and sample data.
All three scenarios share the same general procedure for setting up an AutoML DNN-NLP run:
- Retrieve workspace and create/choose compute instance.
- Prepare and register datasets.
- Set AutoMLConfig accordingly.
- Submit the run.
- Check result with SDK or UI.
Steps 1, 4, and 5 are exactly the same as in general AutoML runs. For step 3, you only need to set the task parameter that matches your scenario, and we will take care of the rest.
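The steps above can be sketched roughly as follows. This is a minimal outline, not the official sample: the task strings in `TASK_BY_SCENARIO` are assumptions that should be checked against the scenario docs, and `build_automl_settings` and `submit_run` are hypothetical helper names. The SDK is imported lazily so that the settings helper can be used without it installed.

```python
# Assumed task names; verify the exact values in the scenario docs.
TASK_BY_SCENARIO = {
    "multi-class": "text-classification",
    "multi-label": "text-classification-multilabel",
    "ner": "text-ner",
}


def build_automl_settings(scenario, training_data, validation_data,
                          compute_target, label_column_name=None):
    """Step 3: collect AutoMLConfig keyword arguments for one scenario."""
    if scenario not in TASK_BY_SCENARIO:
        raise ValueError(f"Unknown scenario: {scenario!r}")
    settings = {
        "task": TASK_BY_SCENARIO[scenario],
        "training_data": training_data,
        "validation_data": validation_data,
        "compute_target": compute_target,
    }
    # Only pass a label column when the caller supplies one.
    if label_column_name is not None:
        settings["label_column_name"] = label_column_name
    return settings


def submit_run(workspace, experiment_name, settings):
    """Step 4: submit the run, exactly as for a general AutoML run."""
    # Imported here so the helper above works without the Azure ML SDK.
    from azureml.core import Experiment
    from azureml.train.automl import AutoMLConfig

    automl_config = AutoMLConfig(**settings)
    experiment = Experiment(workspace, experiment_name)
    return experiment.submit(automl_config, show_output=True)
```

In a notebook, the workspace, registered datasets, and GPU compute target from steps 1 and 2 would be passed into `build_automl_settings`, and the returned dictionary handed to `submit_run`.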
For more details and examples of how to set AutoMLConfig and prepare datasets in the required format, please check the docs for multi-class, multi-label, and NER.
The AutoML DNN-NLP service supports 104 different languages. You can specify the language by its language code, or let DNN-NLP detect the correct language for you. For the full list of supported languages and their language codes, please check this page.
To select the language, set dataset_language in a FeaturizationConfig:
from azureml.automl.core.featurization import FeaturizationConfig
featurization_config = FeaturizationConfig(dataset_language='{your language code}')
Then pass it to AutoMLConfig through the featurization keyword argument:
from azureml.train.automl import AutoMLConfig
automl_config = AutoMLConfig(featurization=featurization_config, **other_settings)
To enable automatic language detection, set featurization to "auto":
automl_config = AutoMLConfig(featurization="auto", **other_settings)
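The two options can be folded into one small helper, sketched below. `featurization_setting` is a hypothetical name, not part of the SDK; it returns the value to pass as the `featurization` argument of AutoMLConfig. The SDK import is deferred so that the auto-detection path works without it installed.

```python
def featurization_setting(language_code=None):
    """Return the value for AutoMLConfig(featurization=...).

    With no language code, fall back to automatic language detection.
    """
    if language_code is None:
        return "auto"
    # Imported lazily; only needed when a language is set explicitly.
    from azureml.automl.core.featurization import FeaturizationConfig
    return FeaturizationConfig(dataset_language=language_code)
```

For example, `AutoMLConfig(featurization=featurization_setting(), **other_settings)` would request auto detection, while passing a language code would pin the dataset language.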
We are actively working on UI support so that everyone can create and use AutoML DNN-NLP runs through simple UI operations.
We are also working on applying Horovod to support stable, high-performance distributed training for all three scenarios.
For questions, bug reports, and feature requests, please contact us at AutoMLText@microsoft.com