distributed training of yolov5 files (#1896)

* distributed training of yolov5 files
* Add files via upload editing connecting to workspace cells.
* Add files via upload ran black on the notebook for formatting
* Add files via upload edited the handle to workspace (ml_client)
* Add files via upload renamed compute_name to "gpu-cluster"
* Delete sdk/python/jobs/single-step/pytorch/distributed-training-yolov5/yolov5/datasets directory deleting dataset directory
* Delete sdk/python/jobs/single-step/pytorch/distributed-training-yolov5/yolov5/data directory deleted data folder and is uploaded in https://azuremlexamples.blob.core.windows.net/datasets/yolov5/data/
* Add files via upload Changed the input data path to azure blob storage
* Delete .pre-commit-config.yaml
Azure · Dec 12, 2022 · 00b4d27
1 parent 753c4ca
commit 00b4d27

Showing 112 changed files with 25,132 additions and 0 deletions.
diff --git a/...flows/sdk-jobs-single-step-pytorch-distributed-training-yolov5-objectdetectionAzureML.yml b/...flows/sdk-jobs-single-step-pytorch-distributed-training-yolov5-objectdetectionAzureML.yml
@@ -0,0 +1,94 @@
+# This code is autogenerated.
+# Code is generated by running custom script: python3 readme.py
+# Any manual changes to this file may cause incorrect behavior.
+# Any manual changes will be overwritten if the code is regenerated.
+
+name: sdk-jobs-single-step-pytorch-distributed-training-yolov5-objectdetectionAzureML
+# This file is created by sdk/python/readme.py.
+# Please do not edit directly.
+on:
+  workflow_dispatch:
+  schedule:
+    - cron: "0 */8 * * *"
+  pull_request:
+    branches:
+      - main
+    paths:
+      - sdk/python/jobs/single-step/pytorch/distributed-training-yolov5/**
+      - .github/workflows/sdk-jobs-single-step-pytorch-distributed-training-yolov5-objectdetectionAzureML.yml
+      - sdk/python/dev-requirements.txt
+      - infra/**
+      - sdk/python/setup.sh
+concurrency:
+  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
+  cancel-in-progress: true
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+    - name: check out repo
+      uses: actions/checkout@v2
+    - name: setup python
+      uses: actions/setup-python@v2
+      with:
+        python-version: "3.8"
+    - name: pip install notebook reqs
+      run: pip install -r sdk/python/dev-requirements.txt
+    - name: azure login
+      uses: azure/login@v1
+      with:
+        creds: ${{secrets.AZUREML_CREDENTIALS}}
+    - name: bootstrap resources
+      run: |
+          echo '${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}';
+          bash bootstrap.sh
+      working-directory: infra
+      continue-on-error: false
+    - name: setup SDK
+      run: |
+          source "${{ github.workspace }}/infra/sdk_helpers.sh";
+          source "${{ github.workspace }}/infra/init_environment.sh";
+          bash setup.sh
+      working-directory: sdk/python
+      continue-on-error: true
+    - name: setup-cli
+      run: |
+          source "${{ github.workspace }}/infra/sdk_helpers.sh";
+          source "${{ github.workspace }}/infra/init_environment.sh";
+          bash setup.sh
+      working-directory: cli
+      continue-on-error: true
+    - name: run jobs/single-step/pytorch/distributed-training-yolov5/objectdetectionAzureML.ipynb
+      run: |
+          source "${{ github.workspace }}/infra/sdk_helpers.sh";
+          source "${{ github.workspace }}/infra/init_environment.sh";
+          bash "${{ github.workspace }}/infra/sdk_helpers.sh" generate_workspace_config "../../.azureml/config.json";
+          bash "${{ github.workspace }}/infra/sdk_helpers.sh" replace_template_values "objectdetectionAzureML.ipynb";
+          [ -f "../../.azureml/config.json" ] && cat "../../.azureml/config.json";
+          papermill -k python objectdetectionAzureML.ipynb objectdetectionAzureML.output.ipynb
+      working-directory: sdk/python/jobs/single-step/pytorch/distributed-training-yolov5
+    - name: upload notebook's working folder as an artifact
+      if: ${{ always() }}
+      uses: actions/upload-artifact@v2
+      with:
+        name: objectdetectionAzureML
+        path: sdk/python/jobs/single-step/pytorch/distributed-training-yolov5
+
+    - name: Send IcM on failure
+      if: ${{ failure() && github.ref_type == 'branch' && (github.ref_name == 'main' || contains(github.ref_name, 'release')) }}
+      uses: ./.github/actions/generate-icm
+      with:
+        host: ${{ secrets.AZUREML_ICM_CONNECTOR_HOST_NAME }}
+        connector_id: ${{ secrets.AZUREML_ICM_CONNECTOR_CONNECTOR_ID }}
+        certificate: ${{ secrets.AZUREML_ICM_CONNECTOR_CERTIFICATE }}
+        private_key: ${{ secrets.AZUREML_ICM_CONNECTOR_PRIVATE_KEY }}
+        args: |
+            incident:
+                Title: "[azureml-examples] Notebook validation failed on branch '${{ github.ref_name }}' for notebook 'jobs/single-step/pytorch/distributed-training-yolov5/objectdetectionAzureML.ipynb'"
+                Summary: |
+                    Notebook 'jobs/single-step/pytorch/distributed-training-yolov5/objectdetectionAzureML.ipynb' is failing on branch '${{ github.ref_name }}': ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
+                Severity: 4
+                RoutingId: "github://azureml-examples"
+                Status: Active
+                Source:
+                    IncidentId: "jobs/single-step/pytorch/distributed-training-yolov5/objectdetectionAzureML.ipynb[${{ github.ref_name }}]"
diff --git a/sdk/python/README.md b/sdk/python/README.md
@@ -116,6 +116,7 @@ Test Status is for branch - **_main_**
 |jobs|pipelines|[rai_pipeline_sample](jobs/pipelines/2f_rai_pipeline_sample/rai_pipeline_sample.ipynb)|Create sample RAI pipeline|[![rai_pipeline_sample](https://github.com/Azure/azureml-examples/actions/workflows/sdk-jobs-pipelines-2f_rai_pipeline_sample-rai_pipeline_sample.yml/badge.svg?branch=main)](https://github.com/Azure/azureml-examples/actions/workflows/sdk-jobs-pipelines-2f_rai_pipeline_sample-rai_pipeline_sample.yml)|
 |jobs|single-step|[debug-and-monitor](jobs/single-step/debug-and-monitor/debug-and-monitor.ipynb)|Run a Command to train a basic neural network with TensorFlow on the MNIST dataset|[![debug-and-monitor](https://github.com/Azure/azureml-examples/actions/workflows/sdk-jobs-single-step-debug-and-monitor-debug-and-monitor.yml/badge.svg?branch=main)](https://github.com/Azure/azureml-examples/actions/workflows/sdk-jobs-single-step-debug-and-monitor-debug-and-monitor.yml)|
 |jobs|single-step|[lightgbm-iris-sweep](jobs/single-step/lightgbm/iris/lightgbm-iris-sweep.ipynb)|Run **hyperparameter sweep** on a Command or CommandComponent|[![lightgbm-iris-sweep](https://github.com/Azure/azureml-examples/actions/workflows/sdk-jobs-single-step-lightgbm-iris-lightgbm-iris-sweep.yml/badge.svg?branch=main)](https://github.com/Azure/azureml-examples/actions/workflows/sdk-jobs-single-step-lightgbm-iris-lightgbm-iris-sweep.yml)|
+|jobs|single-step|[objectdetectionAzureML](jobs/single-step/pytorch/distributed-training-yolov5/objectdetectionAzureML.ipynb)|*no description*|[![objectdetectionAzureML](https://github.com/Azure/azureml-examples/actions/workflows/sdk-jobs-single-step-pytorch-distributed-training-yolov5-objectdetectionAzureML.yml/badge.svg?branch=main)](https://github.com/Azure/azureml-examples/actions/workflows/sdk-jobs-single-step-pytorch-distributed-training-yolov5-objectdetectionAzureML.yml)|
 |jobs|single-step|[distributed-cifar10](jobs/single-step/pytorch/distributed-training/distributed-cifar10.ipynb)|*no description*|[![distributed-cifar10](https://github.com/Azure/azureml-examples/actions/workflows/sdk-jobs-single-step-pytorch-distributed-training-distributed-cifar10.yml/badge.svg?branch=main)](https://github.com/Azure/azureml-examples/actions/workflows/sdk-jobs-single-step-pytorch-distributed-training-distributed-cifar10.yml)|
 |jobs|single-step|[pytorch-iris](jobs/single-step/pytorch/iris/pytorch-iris.ipynb)|Run Command to train a neural network with PyTorch on Iris dataset|[![pytorch-iris](https://github.com/Azure/azureml-examples/actions/workflows/sdk-jobs-single-step-pytorch-iris-pytorch-iris.yml/badge.svg?branch=main)](https://github.com/Azure/azureml-examples/actions/workflows/sdk-jobs-single-step-pytorch-iris-pytorch-iris.yml)|
 |jobs|single-step|[train-hyperparameter-tune-deploy-with-pytorch](jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb)|Train, hyperparameter tune, and deploy a PyTorch model to classify chicken and turkey images to build a deep learning neural network (DNN) based on PyTorch's transfer learning tutorial.|[![train-hyperparameter-tune-deploy-with-pytorch](https://github.com/Azure/azureml-examples/actions/workflows/sdk-jobs-single-step-pytorch-train-hyperparameter-tune-deploy-with-pytorch-train-hyperparameter-tune-deploy-with-pytorch.yml/badge.svg?branch=main)](https://github.com/Azure/azureml-examples/actions/workflows/sdk-jobs-single-step-pytorch-train-hyperparameter-tune-deploy-with-pytorch-train-hyperparameter-tune-deploy-with-pytorch.yml)|

diff --git a/sdk/python/jobs/single-step/pytorch/distributed-training-yolov5/README.md b/sdk/python/jobs/single-step/pytorch/distributed-training-yolov5/README.md
@@ -0,0 +1,19 @@
+### yolov5-AzureML
+#### Training YOLOv5 with custom data in Azure Machine Learning using Python SDK v2
+
+This folder contains:
+
+1) **objectdetectionAzureML.ipynb**: a notebook that runs PyTorch distributed training of YOLOv5 models on Azure Machine Learning (Python SDK v2). The input data is described by a YAML file that gives the location of the data in the layout required by "train.py" from https://github.com/ultralytics/yolov5.
+
+2) The data files provided by ultralytics/yolov5 download the data and arrange it into the folders required by "train.py".
+
+3) If you have a custom dataset, or want to use an open dataset that is not covered by the data files provided by YOLOv5, the folder also contains an example data processing script, **dataprep_yolov5_format.py**, which prepares the data in the format required by YOLOv5.
+
+4) The example data is downloaded from https://cvbp-secondary.z19.web.core.windows.net/datasets/object_detection/odFridgeObjects.zip
+
+5) The zip archive contains two folders, **annotations** and **images**. The annotations are in XML format; they are converted to the format YOLO requires and then split into train and validation folders. The flow is shown in **data_processing_yolov5_example.png**.
+
+6) A sample YAML file, "fridge.yaml", is also provided; it uses these datasets to train the model.
+
+
+
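Item 5 of the README describes converting the XML annotations to YOLO labels. A minimal worked example of that conversion for a single annotation, assuming Pascal VOC-style tags (`size`, `object`, `bndbox`) as read by `xml2yolo`. The sample XML and the helper name `voc_to_yolo_lines` are invented for illustration and are not part of the commit:

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation, invented for illustration (not taken from the dataset).
SAMPLE_XML = """
<annotation>
  <size><width>400</width><height>200</height></size>
  <object>
    <name>can</name>
    <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>300</xmax><ymax>150</ymax></bndbox>
  </object>
</annotation>
"""


def voc_to_yolo_lines(xml_text, classes):
    """Return one 'class x_center y_center width height' line per object."""
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    img_w = int(size.find("width").text)
    img_h = int(size.find("height").text)

    lines = []
    for obj in root.findall("object"):
        class_id = classes.index(obj.find("name").text)
        box = obj.find("bndbox")
        # Look each corner up by tag name so the result does not depend
        # on the order of the child elements inside <bndbox>.
        xmin, ymin, xmax, ymax = (
            int(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # YOLO boxes are centre/size values normalised by the image size.
        x_center = (xmin + xmax) / 2 / img_w
        y_center = (ymin + ymax) / 2 / img_h
        width = (xmax - xmin) / img_w
        height = (ymax - ymin) / img_h
        lines.append(f"{class_id} {x_center} {y_center} {width} {height}")
    return lines


lines = voc_to_yolo_lines(SAMPLE_XML, ["carton", "milk_bottle", "can", "water_bottle"])
print(lines)
```

The box spans half the image in each direction and is centred, so every normalised value is 0.5, and `can` is index 2 in the class list.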
diff --git a/sdk/python/jobs/single-step/pytorch/distributed-training-yolov5/classes.txt b/sdk/python/jobs/single-step/pytorch/distributed-training-yolov5/classes.txt
@@ -0,0 +1 @@
+["carton", "milk_bottle", "can", "water_bottle"]
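The order of this list defines the class indices written to the label files (`xml2yolo` uses `classes.index(label)`). A sketch of turning it into the `names` section of a YOLOv5 dataset YAML such as the README's fridge.yaml. The contents of fridge.yaml are not shown in this diff, so the `path`, `train`, and `val` values below are assumptions based on the folders created by dataprep_yolov5_format.py, and `build_dataset_yaml` is a hypothetical helper:

```python
import json

# Same content as classes.txt, inlined so the sketch runs on its own.
classes_txt = '["carton", "milk_bottle", "can", "water_bottle"]'


def build_dataset_yaml(classes, root, train="images/train", val="images/valid"):
    """Render a YOLOv5-style dataset description as YAML text.

    root/train/val are assumed values, not read from the real fridge.yaml.
    """
    lines = [f"path: {root}", f"train: {train}", f"val: {val}", "names:"]
    # The position of each class in the list is its numeric class id.
    lines += [f"  {i}: {name}" for i, name in enumerate(classes)]
    return "\n".join(lines) + "\n"


classes = json.loads(classes_txt)
yaml_text = build_dataset_yaml(classes, root="datasets/fridgedata")
print(yaml_text)
```

Reordering the list without regenerating the label files would silently remap every class id, so the two should always be produced together.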
diff --git a/...gle-step/pytorch/distributed-training-yolov5/data_processing_yolov5_example.png b/...gle-step/pytorch/distributed-training-yolov5/data_processing_yolov5_example.png
diff --git a/sdk/python/jobs/single-step/pytorch/distributed-training-yolov5/dataprep_yolov5_format.py b/sdk/python/jobs/single-step/pytorch/distributed-training-yolov5/dataprep_yolov5_format.py
@@ -0,0 +1,173 @@
+import os
+import urllib.request as request
+from zipfile import ZipFile
+import argparse
+import json
+import numpy as np
+import PIL.Image as Image
+import xml.etree.ElementTree as ET
+import glob
+import random
+import shutil
+
+url = "https://cvbp-secondary.z19.web.core.windows.net/datasets/object_detection/odFridgeObjects.zip"
+
+data_folder = "./yolov5/data"
+print(f"data_folder: {data_folder}")
+data_file = "odFridgeObjects.zip"
+f_loc = data_folder + "/" + data_file.split(".")[0]
+
+input_dir = "./yolov5/data/odFridgeObjects/annotations/"
+output_dir = "./yolov5/data/odFridgeObjects/labels/"
+image_dir = "./yolov5/data/odFridgeObjects/images/"
+
+processed_folder = "datasets"
+split_ratio = 0.8
+
+
+def downloaddata(url, data_folder, data_file):
+    # create the target folder itself (not a hard-coded "data" folder)
+    os.makedirs(data_folder, exist_ok=True)
+    fname = os.path.join(data_folder, data_file)
+    # download the archive, then unpack it next to the download
+    request.urlretrieve(url, filename=fname)
+    with ZipFile(fname, "r") as archive:
+        print("extracting files...")
+        archive.extractall(path=data_folder)
+        print("done")
+
+
+# ************************** xml2yolo from https://gist.github.com/wfng92/c77c822dad23b919548049d21d4abbb8#file-xml2yolo-py ****************
+
+
+def xml_to_yolo_bbox(bbox, w, h):
+    # xmin, ymin, xmax, ymax
+    x_center = ((bbox[2] + bbox[0]) / 2) / w
+    y_center = ((bbox[3] + bbox[1]) / 2) / h
+    width = (bbox[2] - bbox[0]) / w
+    height = (bbox[3] - bbox[1]) / h
+    return [x_center, y_center, width, height]
+
+
+def yolo_to_xml_bbox(bbox, w, h):
+    # x_center, y_center, width, height
+    w_half_len = (bbox[2] * w) / 2
+    h_half_len = (bbox[3] * h) / 2
+    xmin = int((bbox[0] * w) - w_half_len)
+    ymin = int((bbox[1] * h) - h_half_len)
+    xmax = int((bbox[0] * w) + w_half_len)
+    ymax = int((bbox[1] * h) + h_half_len)
+    return [xmin, ymin, xmax, ymax]
+
+
+def xml2yolo(input_dir, output_dir, image_dir):
+    classes = []
+    # create the labels folder (output directory)
+    dirExists(output_dir)
+    # identify all the xml files in the annotations folder (input directory)
+    files = glob.glob(os.path.join(input_dir, "*.xml"))
+    # loop through each
+    for fil in files:
+        basename = os.path.basename(fil)
+        filename = os.path.splitext(basename)[0]
+        # skip annotations that have no matching image file
+        if not os.path.exists(os.path.join(image_dir, f"{filename}.jpg")):
+            print(f"{filename} image does not exist!")
+            continue
+        result = []
+        # parse the content of the xml file
+        tree = ET.parse(fil)
+        root = tree.getroot()
+        width = int(root.find("size").find("width").text)
+        height = int(root.find("size").find("height").text)
+
+        for obj in root.findall("object"):
+            label = obj.find("name").text
+            # check for new classes and append to list
+            if label not in classes:
+                classes.append(label)
+            index = classes.index(label)
+            pil_bbox = [int(x.text) for x in obj.find("bndbox")]
+            yolo_bbox = xml_to_yolo_bbox(pil_bbox, width, height)
+            # convert data to string
+            bbox_string = " ".join([str(x) for x in yolo_bbox])
+            result.append(f"{index} {bbox_string}")
+
+        if result:
+            # generate a YOLO format text file for each xml file
+            with open(
+                os.path.join(output_dir, f"{filename}.txt"), "w", encoding="utf-8"
+            ) as f:
+                f.write("\n".join(result))
+    # generate the classes file as reference
+    with open("classes.txt", "w", encoding="utf8") as f:
+        f.write(json.dumps(classes))
+
+
+# *********************** rearranging folders for yolov5 https://stackoverflow.com/questions/66238786/splitting-image-based-dataset-for-yolov3 *******************
+
+
+def dirExists(name):
+    # create the folder (and any missing parents) if it does not exist yet
+    os.makedirs(name, exist_ok=True)
+
+
+def move(paths, folder):
+    for p in paths:
+        shutil.copy(p, folder)  # copies (does not move); the source files stay in place
+
+
+def formatFolderStruct(f_loc, processed_folder, split_ratio):
+
+    # Pair every label file with its image so both land in the same split
+
+    PATH = f_loc + "/"
+    txt_paths = sorted(glob.glob(PATH + "labels/*.txt"))
+    img_paths = [
+        PATH + "images/" + os.path.splitext(os.path.basename(t))[0] + ".jpg"
+        for t in txt_paths
+    ]
+
+    # Calculate number of files for training; the rest go to validation
+    train_size = int(len(img_paths) * split_ratio)
+
+    # Now split them
+    train_img_paths = img_paths[:train_size]
+    train_txt_paths = txt_paths[:train_size]
+    valid_img_paths = img_paths[train_size:]
+    valid_txt_paths = txt_paths[train_size:]
+
+    # Move them to train, valid folders
+    dirExists("./yolov5/datasets")
+    # newpath='datasets/fridgedata/'
+    dirExists("./yolov5/datasets/fridgedata/")
+    newpath_images = "./yolov5/datasets/fridgedata/images"
+    dirExists(newpath_images)
+    newpath_labels = "./yolov5/datasets/fridgedata/labels"
+    dirExists(newpath_labels)
+
+    # newpath='datasets/fridgedata'
+    train_images = newpath_images + "/train/"
+    valid_images = newpath_images + "/valid/"
+    train_label = newpath_labels + "/train/"
+    valid_label = newpath_labels + "/valid/"
+
+    dirExists(train_images)
+    dirExists(valid_images)
+    dirExists(train_label)
+    dirExists(valid_label)
+
+    move(train_img_paths, train_images)
+    move(train_txt_paths, train_label)
+    move(valid_img_paths, valid_images)
+    move(valid_txt_paths, valid_label)
+
+
+# Uncomment the two steps below on a first run to download and convert the data.
+# downloaddata(url, data_folder, data_file)
+# print("data downloaded")
+# xml2yolo(input_dir, output_dir, image_dir)
+# print("formatted to yolo")
+
+formatFolderStruct(f_loc, processed_folder, split_ratio)
+print("formatted folder structure")
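A positional split only works if each image and its label sit at the same index in both lists, and `glob.glob` does not guarantee any particular order. A minimal sketch of a split that builds explicit (image, label) pairs and shuffles them with a fixed seed. `split_pairs`, its parameters, and the seed are hypothetical and not part of the script, and the sketch assumes every image has a label file with the same stem:

```python
import random
from pathlib import Path


def split_pairs(image_paths, label_dir, split_ratio=0.8, seed=0):
    """Split (image, label) pairs into train and validation lists."""
    pairs = []
    for img in sorted(image_paths):
        # The label is assumed to share the image's file stem.
        label = Path(label_dir) / (Path(img).stem + ".txt")
        pairs.append((str(img), str(label)))

    # A seeded shuffle keeps the split reproducible between runs.
    random.Random(seed).shuffle(pairs)

    n_train = int(len(pairs) * split_ratio)
    return pairs[:n_train], pairs[n_train:]


# Illustrative file names only; nothing is read from disk.
images = [f"images/{i}.jpg" for i in range(1, 11)]
train, valid = split_pairs(images, "labels")
print(len(train), len(valid))
```

Because each pair travels as one tuple, an image can never end up in the training set while its label lands in the validation set.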