This repo shows how to run distributed training of an image segmentation model on Azure ML.
The training is distributed across multiple nodes, with multiple GPUs per node.
To run the notebook, you need to have or create:
- An Azure subscription
- An Azure storage account
- An Azure ML workspace
- (Optional) An Azure ML compute target (4 nodes of STANDARD_NC24). This can also be created in the notebook.
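To make the multi-node, multi-GPU setup concrete, here is a minimal sketch of how worker processes are commonly numbered in such a job. It assumes one worker process per GPU and 4 GPUs per STANDARD_NC24 node; both are assumptions for illustration, and `rank_layout` is a hypothetical helper, not part of this repo.

```python
from typing import List, Tuple


def rank_layout(num_nodes: int, gpus_per_node: int) -> List[Tuple[int, int, int]]:
    """Return (node_index, local_rank, global_rank) for every worker process."""
    layout = []
    for node in range(num_nodes):
        for local_rank in range(gpus_per_node):
            # Global rank is unique across the whole job;
            # local rank selects the GPU within one node.
            global_rank = node * gpus_per_node + local_rank
            layout.append((node, local_rank, global_rank))
    return layout


# Assumption: 4 nodes with 4 GPUs each, one process per GPU.
layout = rank_layout(num_nodes=4, gpus_per_node=4)
world_size = len(layout)
print("world size:", world_size)
print("sixth worker (node, local_rank, global_rank):", layout[5])
```

Under these assumptions the job has 16 workers in total, and each worker uses its local rank to pick a GPU on its node.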
We used data from a Kaggle competition:
https://www.kaggle.com/c/airbus-ship-detection
The competition is about segmenting ships in satellite images. We used its train_v2 data.
Before running the notebook, prepare the data:
- Create a container in Azure storage.
- Upload "train_v2" into the container, using the folder name "airbus".
We used the "Fast.AI" package (fastai). It needs very little code to create and train a deep learning model. For example, with the fastai v1 API, image classification takes 3 lines:
data = ImageDataBunch.from_folder(data_folder, train=".", valid_pct=0.2, ds_tfms=get_transforms(), size=sz, bs = bs, num_workers=8).normalize(imagenet_stats)
learn = cnn_learner(data, models.resnet34, metrics=dice)
learn.fit_one_cycle(5, slice(1e-5), pct_start=0.8)
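The `dice` metric passed to the learner above measures how well a predicted mask overlaps the ground-truth mask. As a rough illustration of the idea, here is a minimal pure-Python sketch of the Dice coefficient on flattened binary masks. It is not fastai's implementation, and `dice_coefficient` is a hypothetical name used only here.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2 * |pred AND target| / (|pred| + |target|) for 0/1 masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Both masks are empty: treat this as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Toy example: 4 pixels, prediction marks 2 as ship, ground truth marks 1.
score = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
print(round(score, 3))
```

A score of 1 means the masks match exactly, and 0 means they do not overlap at all.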
Fast.AI supports computer vision (CNN and U-Net) and NLP (transformer). See their website for details.
You can install it with:
pip install fastai
Fast.AI only supports distributed training with the NCCL backend, which Azure ML does not support natively. We used the script "azureml_adapter.py" to complete the NCCL initialization on Azure ML.
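To give a sense of what such an initialization step involves, here is a hedged sketch, not the contents of "azureml_adapter.py". PyTorch's environment-variable initialization expects MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE, while an MPI launch exposes Open MPI variables such as OMPI_COMM_WORLD_RANK. The sketch assumes the master node is published in a variable named AZ_BATCH_MASTER_NODE in "host:port" form; that name, the default port, and the helper `nccl_env_from_mpi` are assumptions for illustration.

```python
from typing import Dict, Mapping


def nccl_env_from_mpi(env: Mapping[str, str], default_port: int = 6105) -> Dict[str, str]:
    """Map MPI-style variables to the ones PyTorch's env:// init reads."""
    # Assumption: the master node is given as "host:port".
    master_addr = env["AZ_BATCH_MASTER_NODE"].split(":")[0]
    return {
        "MASTER_ADDR": master_addr,
        # Keep an explicitly set port; otherwise use the (hypothetical) default.
        "MASTER_PORT": env.get("MASTER_PORT", str(default_port)),
        "RANK": env["OMPI_COMM_WORLD_RANK"],
        "WORLD_SIZE": env["OMPI_COMM_WORLD_SIZE"],
        "LOCAL_RANK": env["OMPI_COMM_WORLD_LOCAL_RANK"],
    }


# Made-up values standing in for what one worker process might see.
example_env = {
    "AZ_BATCH_MASTER_NODE": "10.0.0.4:6000",
    "OMPI_COMM_WORLD_RANK": "5",
    "OMPI_COMM_WORLD_SIZE": "16",
    "OMPI_COMM_WORLD_LOCAL_RANK": "1",
}
derived = nccl_env_from_mpi(example_env)
for key in sorted(derived):
    print(key, "=", derived[key])
```

In a real job the returned values would be written into `os.environ` before the process group is initialized with the NCCL backend.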