Commit 6c941d3: Pneumonia xray example (#164)
* refactor components to use dpv2 + remove unnecessary environments

* working dpv2 pipeline

* refactor scripts with right inputs and outputs

* fix code path

* implement fake outputs

* fix paths

* fix imports

* fix args of aggregation script

* add note, fix component args

* add checkpoint arg

* linting

* linting

* remove sdkv2 folder

* add argparse to submit script

* add docstring

* add docstring

* linting

* linting

* add staging branch to build

* rollback changes to build, leave it for another PR

* remove logging line

* remove custom uuid

* linting

* add docstring to custom path function

* polish docstring

* rename model_silo_X to input_silo_X

* rename output

* rename agg output

* Improve auto-provisioning resources (#35) (#36)

* docker file stub

* move docker file, implement feedback

* login before setting subscription

* login before setting subscription

* use default k8s version

* pin latest version since default won't work

* remove executionpolicy part, other small updates

* clarify to change job file _in docker filesystem_

* login before setting subscription

* formatting

* \ -> /

* install azureml-core in docker file

* propagate changes to section 7

* fix dataset creation command

Co-authored-by: thomasp-ms <XXX@me.com>

Co-authored-by: thomasp-ms <XXX@me.com>

* Refactor folder structure (#37)

* `plan` -> `docs`

* 'plan' -> 'docs'

* 'automated_provisioning' -> 'mlops'

* 'fl_arc_k8s' -> 'examples'

Co-authored-by: thomasp-ms <XXX@me.com>

* auto provisioning - vanilla internal silos (#41)

* split internal and external provisioning

* adjust directories after internal/external split

* introduce overall mlops readme

* first stab

* remove useless comment and my alias

Co-authored-by: thomasp-ms <XXX@me.com>

* Perform real FL training on the MNIST dataset

Added component files customized for the MNIST dataset. Set up 3
silos, each with its own compute and datastore.

* refine components and add logs

* maintain consistency b/w config files

* add requirement and env files

* add requirement and env files

* rmv redundant dependencies, rename conda envs

* Correct epoch default value

* point data asset instead of underlying URI

* beef up orchestrator cluster (#46)

Co-authored-by: thomasp-ms <XXX@me.com>

* Provision CPUs for silos (instead of GPUs) (#47)

* beef up orchestrator cluster

* gpu -> cpu

Co-authored-by: thomasp-ms <XXX@me.com>

* add preprocessing comp description, fix typo and correct default datastore name

* add integration validation test - build

* update readme file

* Move logger to the main if block, add pytorch channel in the conda env
yaml and move readme to the docs folder

* code reformatting using black

* add documentation to run an FL experiment

* add more intuitive path for aggr output dir

* Merge changes

* add more intuitive agg output dir path

* reformat using black

* add iteration2 branch for PR build testing

* reformat date and pass kwargs instead in the getUniqueIdentifier fn

* working submit

* working factory submit

* linting

* move component path

* add soft validation

* add soft validation

* Add basic tests on config

* linting

* working bicep deployment for vanilla demo

* proper orchestrator script, double containers

* fix name

* docstring

* docstring

* rollback to using only 1 container

* align naming convention

* instructions

* working submit

* set up permission model

* working orch perms

* wonky perms assignment

* working role assignments

* remove old perm model

* working except silo2orch

* fix typo

* working submit with config

* add sku as param

* use R/W for now

* fix submit to align with bicep provisioning demo

* linting

* remove dataset files

* fix docstring on permission model

* write draft docs with homepage, align structure, remove requirements, ensure demo documented

* rollback change to req

* change factory to use custom model type during validation

* linting

* Display metrics at the pipeline level (#68)

* Fix optional input yaml and mlflow log bugs (#59)

* Accommodate optional input changes and switch from mlflow autologging to manual logging

* code style

* change optional inputs syntax

Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>
Co-authored-by: Jeff Omhover <jf.omhover@gmail.com>
Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com>
Co-authored-by: thomasp-ms <XXX@me.com>

* Make changes to display all metrics at the pipeline level

* Log preprocessing metadata in mlflow

* linting

* Pass client as an arg

* Fix typo, rmv name from silo config, metric naming convention, and add
metric identifier in the preprocessing component

Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>
Co-authored-by: Jeff Omhover <jf.omhover@gmail.com>
Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com>
Co-authored-by: thomasp-ms <XXX@me.com>

* Remove redundant files from the mlops directory (#69)

* Remove internal & external dir as provisioning is taken care by bicep

* keep mnist data files

* rename demo script (#71)

Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>

* Unified documentation (#72)

* WIP: unifying docs

* Remove redundant doc file. We can always revisit if needed

* FL concepts will be covered in the glossary doc

* Remove internal and external silos docs as the code will be re-written in bicep

* provide comprehensive documentation

* rename file

* refine docs

* refine docs and rename fl_cross_silo_basic to fl_cross_silo_native

* simplify sandbox script

* simplify script, ensure it works

* align config of native submit

* align naming conventions between scripts, reinject rbac role

* create test job for quickly debugging provisioning issues

* fix tests

* linting

* move permissions to storage

* align config with bicep scripts

* Document the metrics panel of the pipeline overview in the quickstart (#76)

* WIP: unifying docs

* Remove redundant doc file. We can always revisit if needed

* FL concepts will be covered in the glossary doc

* Remove internal and external silos docs as the code will be re-written in bicep

* provide comprehensive documentation

* rename file

* refine docs

* refine docs and rename fl_cross_silo_basic to fl_cross_silo_native

* document the metrics/pipeline panel in the quickstart

* linting

* add docstrings and disclaimers

* Add instructions on how to create a custom graph  (#78)

* WIP: unifying docs

* Remove redundant doc file. We can always revisit if needed

* FL concepts will be covered in the glossary doc

* Remove internal and external silos docs as the code will be re-written in bicep

* provide comprehensive documentation

* rename file

* refine docs

* refine docs and rename fl_cross_silo_basic to fl_cross_silo_native

* document the metrics/pipeline panel in the quickstart

* add instructions on how to create a custom graph

* do better comments

* Refine native code (#82)

* fix silo name

* log only one datapoint per iteration for aggregated metrics

* Align terminology for iteration/round/num_rounds

* linting

* use storage blob data contributor

* add demoBaseName to guid name of role deployment (#85)

Co-authored-by: thomasp-ms <XXX@me.com>

* use id list, add listkeys builtin

* rename and dissociate orchestrator in resource + orchestrator

* separate orchestrator script

* draft sandbox setup

* make silo script distinct

* Update orchestrator_open.bicep

* Update internal_blob_open.bicep

* remove comments

* align hello world example with new naming conventions

* ensure uai assignments are created AFTER storage is created

* linting

* enforce precedence

* merge from secure branch

* use different regions, limit size of account

* reduce to 3 regions, add keys to guid

* substring

* align config

* do not use model

* Add msi version of scripts

* sandbox main can switch between uai and msi

* fix name

* linting

* linting

* implement ignore param, hotfix model with startswith

* Address my own comments on Jeff's PR (#96)

* remove magic number

* little improvements on some comments

* remove unused files

* put dash replacement next to length check

* don't necessarily assume USER AI

* UAI -> XAI

* revert previous UAI -> XAI changes

* move length check next to dash replacement

* typo

* try moving the dependsOn's

* RAGRS -> LRS

* revert dependsON changes

* revert another small change in a comment

Co-authored-by: thomasp-ms <XXX@me.com>

* align config of both submit scripts

* Make distinction between on-off and repeatable provisioning scripts (#99)

* clarify the role needed

* remove "custom role" line

* adjust locations

* use existing rg if not Owner of the sub

* clarify "Secure" setup

* add usage instructions in docstring

* explain what scripts are one-off (vs repeatable)

Co-authored-by: thomasp-ms <XXX@me.com>

* Align round/iteration terminology with the native code (#103)

* rename parameter in config file

* keep iterations instead of rounds

* round -> iteration

Co-authored-by: thomasp-ms <XXX@me.com>

* get all goodies from secureprovisioning branch wip

* get all goodies from secureprovisioning branch wip

* get all goodies from secureprovisioning branch wip

* align both submits to work

* add optional test

* rename native to literal

* add getting started in readme, introduce emojis

* change person

* remove emojis

* Propose rewriting of readme to highlight motivation first (#110)

* propose rewriting of readme to highlight motivation first

* minor edit

Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>

* Update README.md

* Update quickstart to mention rg clean-up

* Update quickstart.md

* Update quickstart.md

* Update quickstart.md

* Build bicep scripts as ARM template, add Azure Buttons to quickstart (#120)

* Update quickstart to lower header (hotfix) (#117)
* add arm templates, add button in quickstart
* switch to releasebranchlink

Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>

* Add subscription id, resource group and workspace name as CLI args (#122)

* add more cli args

* code style

* code style

* update quickstart doc

* update readme

* Initiate provisioning "cookbook" with list of provisioning scenarios + example (#123)

Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>

* Continuous Integration Tests (#119)

* take values of subscription id, rs grp, ws name, etc from github secrets and submit a native pipeline

* change path

* Test azure creds in the github workflow

* reformatting

* add pipeline validation and testing workflow

* add permissions

* add permissions

* check only certain dir to trigger workflows

* add soft validation for any iteration branch PR

* add provisioning script test

* testing

* create rg

* create rg

* change compute for testing

* change demoname

* delete old rg

* change demoname

* add demobasename and aml ws name as github secrets

* random demo base name

* auto generate random base name

* random demo base name

* adjust random num length

* add vnet sandbox test

* rmv dependency b/w jobs

* submit various pipelines

* change execution graph path

* add cli args in the factory code

* change compute for testing

* ignore validation - factory

* create custom action

* correct path

* correct path

* add shell in the github action

* create github actions and take required values as input params

* add shell

* add wait condition

* add logs

* linting

* correct rg name

* add azure ml extension

* handle ml extension installation error.

* add release branch test cases

* add script to delete run history

* cronjob test

* cronjob test

* checkout branch

* test run history deletion script

* test run history deletion script

* test run history deletion script

* azure login

* date format change

* remove double quotes

* date format change

* archive run history script tested

* Add vnet-based provisioning options to cookbook (#128)

Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>

* Make deployment name unique in our github actions (#135)

* set unique name for deployments
* add attempt to deployment name

Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>

* Refactor compute/storage scripts to be independent (#132)

Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>

* Provide motivation in provisioning docs for using service endpoints (#136)

* add motivation for service endpoints
* add link

Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>

* Refresh provisioning arm buttons with latest from bicep (#139)

* align names of directories
* rebuild all arm

Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>

* Update silo_vnet_newstorage.md (#141)

* Add Bicep build vs ARM template diff test  (#140)

* Add diff test for bicep vs arm

* Debug

* Debug

* fix syntax error

* redirect build output to stdout

* correct path

* trigger arm template test when pushing changes to main branch from release* branch

* remove redundant logs

* Add "open aks with cc" provision tutorial and bicep scripts (#138)

* implement bicep scripts to provision open aks with cc
* add aks cc tutorial
* build arm and add in branch
* add button

Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>

* Provide script + tutorial to attach pair with an existing storage (#142)

* provision datastore with existing storage
* add arm for existing storage, add docs
* add link in readme

Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>

* add latest arm templates to diff build (#145)

Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>

* Implements provisioning script for a confidential compute VM jumpbox inside a vnet (debug) (#146)

* add jumpbox script with tutorial
* add template to diff build

Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>

* Update jumpbox_cc.md (#147)

* update tutorials for silos to integrate feedback (#149)

Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>

* Implement option to turn orchestrator storage fully private (behind PLE) (#150)


Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>

* Tutorial on how to adapt native and factory code to write FL experiments.  (#100)

* WIP: add general information about the factory code

* moving factory-tutorial to another file

* add scenarios

* add instructions on how to adapt literal code

* rename file

* add general info and fix typos

* Jeff's feedback

* Apply code clean-up to provision scripts before bug bash (#148)

Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>

* Instructions for provisioning external silos (#101)

* very first stab, far from done

* non-secure native job using the on-prem k8s

* use on-prem silos in example factory job

* Revert "very first stab, far from done"

This reverts commit e00d882.

* Revert "use on-prem silos in example factory job"

This reverts commit e2ef884.

* Revert "non-secure native job using the on-prem k8s"

This reverts commit 923e5f3.

* restore doc stub

* Make Git ignore resources for test jobs

* fix gitignore

* typo in comment

* steps A through D

* 2 typos

* move to subdir

* fix workspace creation

* add orchestrator part, role, and timeline

* last commit before PR

* adjust to new open_azureml_workspace.bicep

* first wave after Jeff's comments

* address jeff's comments

* typo

* light trims

Co-authored-by: thomasp-ms <XXX@me.com>

* bump up every title

* skeleton

* first attempt at data prep like Harmke

* change secret name

* wrong secret name

* remove separate unzip

* change clients, create silo data assets

* different names for silo data assets, duh

* cleanup

* adjust secret name in doc

* .

* use latest literal code

* align environment with literal

* base on latest component

* one dataset, comment out 2 unused args (for now)

* introduce new arguments

* reflect modified args in component spec

* remove unused arg from config

* start hooking up to Harmke's trainer

* initialize PTLearner and include in run.py

* use same values as Harmke for epochs and lr

* attributes with _, start implementing local_train

* add logging, add test(), fix device_

* train_loader_

* align _'s

* fix transform bug

* remove unused constants

* use proper model in aggregation code

* removed unused file

* remove unused code and arguments, logging to DEBUG

* restore `metrics_prefix` parameter

* finish restoring `metrics_prefix`

* do not duplicate model code

* revert dedup attempt

* improve docstrings and descriptions

* change experiment name

* change pipeline name and docstring

* cite sources, remove wrongly added licenses

* italics

* black

Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>
Co-authored-by: Jeff Omhover <jf.omhover@gmail.com>
Co-authored-by: thomasp-ms <XXX@me.com>
Co-authored-by: unknown <Mitgarg17495@gmail.com>
5 people authored and majercakdavid committed Jan 8, 2023
1 parent e46b041 commit 6c941d3
Showing 7 changed files with 323 additions and 0 deletions.
78 changes: 78 additions & 0 deletions .github/workflows/fl-dataprep-pneumonia.yml
@@ -0,0 +1,78 @@
# This workflow is nearly identical to the one by Harmke Alkemade et al.: https://github.com/Azure/medical-imaging/blob/main/.github/workflows/fl-dataprep.yml
name: FL data preparation - pneumonia example

on:
  workflow_dispatch:
    inputs:
      workspace:
        description: 'Name of the Azure ML workspace hosting the orchestrator'
        required: true
        type: string
      resource_group:
        description: 'Resource group name'
        required: true
        type: string

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repo
        uses: actions/checkout@main
      - name: Install az ml extension
        run: az extension add -n ml -y
      - name: Azure login
        uses: azure/login@v1
        with:
          creds: ${{secrets.AZURE_CREDENTIALS_DATAPREP}}
      - name: Download, split and upload data
        run: |
          clients=( france brazil us )
          export KAGGLE_USERNAME=${{ secrets.KAGGLE_USERNAME }}
          export KAGGLE_KEY=${{ secrets.KAGGLE_KEY }}
          az configure --defaults group="${{ github.event.inputs.resource_group }}"
          az configure --defaults workspace="${{ github.event.inputs.workspace }}"
          # Download the Pneumonia dataset
          pip install kaggle
          pip install split-folders
          kaggle datasets download -d paultimothymooney/chest-xray-pneumonia -p /tmp --unzip
          # split into train, val and test
          splitfolders --output /tmp/chest_xray_tvt/ --ratio .8 .1 .1 --seed 33 --move -- /tmp/chest_xray/train
          # create a data asset with all data in case we need it later on
          az ml data create --name pneumonia-alldata \
            --path /tmp/chest_xray_tvt \
            --type uri_folder
          stages=( train test val )
          classes=( PNEUMONIA NORMAL )
          # Create folders
          for client in "${clients[@]}"; do
            mkdir /tmp/chest_xray_$client
            for stage in "${stages[@]}"; do
              mkdir /tmp/chest_xray_$client/$stage
              for class in "${classes[@]}"; do
                mkdir /tmp/chest_xray_$client/$stage/$class
              done
            done
          done
          # Copy data to client folders
          i=0
          for file in $(find /tmp/chest_xray_tvt -name '*.jpeg'); do
            classnr=$(( i % 3 ))
            cp $file ${file/chest_xray_tvt/chest_xray_${clients[classnr]}}
            i=$((i+1))
          done
          # Create a data asset for each client
          for client in "${clients[@]}"; do
            az ml data create --name pneumonia-$client \
              --path /tmp/chest_xray_$client \
              --type uri_folder
          done
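
For readers who prefer Python, the `i % 3` round-robin distribution performed by the shell loop above can be sketched as follows (a sketch only, assuming the same /tmp folder layout the workflow creates; not part of the commit):

import shutil
from pathlib import Path

# Deal every image round-robin to the three clients, preserving the
# stage/class sub-path, mirroring the `cp` loop in the workflow.
clients = ["france", "brazil", "us"]
src_root = Path("/tmp/chest_xray_tvt")

for i, jpeg in enumerate(sorted(src_root.rglob("*.jpeg"))):  # sorted for determinism
    dest = Path(f"/tmp/chest_xray_{clients[i % 3]}") / jpeg.relative_to(src_root)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(jpeg, dest)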
9 changes: 9 additions & 0 deletions examples/components/PNEUMONIA/aggregatemodelweights/conda.yaml
@@ -0,0 +1,9 @@
name: pneumonia_agg_conda_env
channels:
  - defaults
  - pytorch
dependencies:
  - python=3.7.11
  - pytorch=1.12.1
  - torchvision=0.13.1
  - cudatoolkit=11.3
40 changes: 40 additions & 0 deletions examples/components/PNEUMONIA/aggregatemodelweights/pneumonia_network.py
@@ -0,0 +1,40 @@
# This file defining the model was taken as-is from https://github.com/Azure/medical-imaging/blob/main/federated-learning/pneumonia-federated/custom/pneumonia_network.py.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PneumoniaNetwork(nn.Module):
    def __init__(self):
        super(PneumoniaNetwork, self).__init__()
        dropout = 0.2

        self.conv1 = nn.Conv2d(
            in_channels=1, out_channels=32, kernel_size=3, stride=1, padding=1
        )
        self.conv2 = nn.Conv2d(
            in_channels=32, out_channels=64, kernel_size=3, stride=1, padding=1
        )
        self.conv3 = nn.Conv2d(
            in_channels=64, out_channels=128, kernel_size=3, stride=1, padding=1
        )

        self.dropout1 = nn.Dropout(dropout)
        self.dropout2 = nn.Dropout(dropout)

        self.fc1 = nn.Linear(28 * 28 * 128, 256)
        self.fc2 = nn.Linear(256, 2)

    def forward(self, x):
        x = F.relu(self.conv1(x))  # 224 x 224 x 32
        x = F.max_pool2d(x, 2, 2)  # 112 x 112 x 32
        x = F.relu(self.conv2(x))  # 112 x 112 x 64
        x = F.max_pool2d(x, 2, 2)  # 56 x 56 x 64
        x = self.dropout1(x)
        x = F.relu(self.conv3(x))  # 56 x 56 x 128
        x = F.max_pool2d(x, 2, 2)  # 28 x 28 x 128
        x = self.dropout2(x)
        x = x.view(-1, 28 * 28 * 128)  # flatten: 28 * 28 * 128 = 100,352
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
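
A quick sanity check of the shape flow annotated in the comments above (a sketch; it assumes the 224 x 224 grayscale inputs the comments imply):

import torch
from pneumonia_network import PneumoniaNetwork

model = PneumoniaNetwork()
batch = torch.randn(4, 1, 224, 224)  # 4 single-channel 224 x 224 images
logits = model(batch)
assert logits.shape == (4, 2)  # one logit per class: PNEUMONIA, NORMAL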
146 changes: 146 additions & 0 deletions examples/components/PNEUMONIA/aggregatemodelweights/run.py
@@ -0,0 +1,146 @@
import os
import argparse
import logging
import sys

import torch
from pneumonia_network import PneumoniaNetwork


def get_arg_parser(parser=None):
    """Parse the command line arguments for merge using argparse.

    Args:
        parser (argparse.ArgumentParser or CompliantArgumentParser):
            an argument parser instance
    Returns:
        ArgumentParser: the argument parser instance
    Notes:
        if parser is None, creates a new parser instance
    """
    # add arguments that are specific to the component
    if parser is None:
        parser = argparse.ArgumentParser(description=__doc__)

    parser.add_argument("--input_silo_1", type=str, required=True, help="")
    parser.add_argument("--input_silo_2", type=str, required=False, help="")
    parser.add_argument("--input_silo_3", type=str, required=False, help="")
    parser.add_argument("--aggregated_output", type=str, required=True, help="")
    return parser


def aggregate_model_weights(global_model, client_models):
    """Aggregate the client models' weights using the 'mean' method.

    Args:
        global_model: aggregated model that is saved for each iteration
        client_models: list of client models
    """
    global_dict = global_model.state_dict()

    for k in global_dict.keys():
        global_dict[k] = torch.stack(
            [
                client_models[i].state_dict()[k].float()
                for i in range(len(client_models))
            ],
            0,
        ).mean(0)
    global_model.load_state_dict(global_dict)

    return global_model


def get_model(model_path):
    """Get the model, loading pretrained weights if available.

    Args:
        model_path: pretrained model weights file path
    """
    model = PneumoniaNetwork()
    if model_path:
        model.load_state_dict(torch.load(model_path + "/model.pt"))
    return model


def get_client_models(args):
    """Get the list of client models.

    Args:
        args: an argparse namespace instance
    """
    client_models = []
    for i in range(1, len(args.__dict__)):
        client_model_name = "input_silo_" + str(i)
        # skip optional silo inputs that were not provided (None)
        if args.__dict__.get(client_model_name) is not None:
            client_models.append(get_model(args.__dict__[client_model_name]))
    return client_models


def get_global_model(args):
    """Get the global model.

    Args:
        args: an argparse namespace instance
    """
    global_model = get_model(
        args.aggregated_output
        if args.aggregated_output
        and os.path.isfile(args.aggregated_output + "/model.pt")
        else None
    )
    return global_model


def run(args):
    """Run script with arguments (the core of the component).

    Args:
        args (argparse.Namespace): command line arguments provided to script
    """
    logger.debug("Get client models")
    client_models = get_client_models(args)
    logger.info(f"Total number of client models: {len(client_models)}")

    logger.debug("Get global model")
    global_model = get_global_model(args)

    logger.debug("Aggregate model weights")
    global_model = aggregate_model_weights(global_model, client_models)

    logger.info("Saving model weights")
    torch.save(global_model.state_dict(), args.aggregated_output + "/model.pt")


def main(cli_args=None):
    """Component main function.

    It parses arguments and executes run() with the right arguments.

    Args:
        cli_args (List[str], optional): list of args to feed script, useful for debugging. Defaults to None.
    """
    # build an arg parser
    parser = get_arg_parser()

    # run the parser on cli args
    args = parser.parse_args(cli_args)

    print(f"Running script with arguments: {args}")
    run(args)


if __name__ == "__main__":
    # Set logging to sys.stdout
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.DEBUG)
    log_format = logging.Formatter("[%(asctime)s] [%(levelname)s] - %(message)s")
    handler = logging.StreamHandler(sys.stdout)
    handler.setLevel(logging.DEBUG)
    handler.setFormatter(log_format)
    logger.addHandler(handler)

    main()
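
To see what aggregate_model_weights computes in isolation, here is a minimal sketch reusing the functions above (not part of the component; run it from the component folder so the imports resolve):

import torch
from pneumonia_network import PneumoniaNetwork
from run import aggregate_model_weights

client_a, client_b = PneumoniaNetwork(), PneumoniaNetwork()
global_model = aggregate_model_weights(PneumoniaNetwork(), [client_a, client_b])

# Every parameter of the result is the element-wise mean of the clients'.
k = "fc2.bias"
expected = (client_a.state_dict()[k] + client_b.state_dict()[k]) / 2
assert torch.allclose(global_model.state_dict()[k], expected)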
39 changes: 39 additions & 0 deletions examples/components/PNEUMONIA/aggregatemodelweights/spec.yaml
@@ -0,0 +1,39 @@

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: aggregate_model_weights
version: 0.1.0
display_name: Aggregate Model Weights (from all silos)
type: command
description: Component for aggregating model weights.
is_deterministic: true

inputs:
  input_silo_1:
    type: uri_folder
    description: input from silo 1 (e.g., model weights, or gradient updates)
    optional: false
  input_silo_2:
    type: uri_folder
    description: input from silo 2 (e.g., model weights, or gradient updates)
    optional: true
  input_silo_3:
    type: uri_folder
    description: input from silo 3 (e.g., model weights, or gradient updates)
    optional: true

outputs:
  aggregated_output:
    type: uri_folder
    description: the aggregated model or gradients, residing in the orchestrator compute.

code: .

command: >-
  python run.py --aggregated_output ${{outputs.aggregated_output}}
  --input_silo_1 ${{inputs.input_silo_1}}
  $[[--input_silo_2 ${{inputs.input_silo_2}}]]
  $[[--input_silo_3 ${{inputs.input_silo_3}}]]

environment:
  conda_file: ./conda.yaml
  image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04
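
A sketch of how this spec could be consumed from an azure-ai-ml (SDK v2) pipeline; the pipeline and parameter names below are hypothetical, not taken from this repo:

from azure.ai.ml import load_component
from azure.ai.ml.dsl import pipeline

aggregate = load_component(source="./spec.yaml")

@pipeline()
def fl_aggregation_demo(silo_1_weights, silo_2_weights, silo_3_weights):
    # Optional inputs (the $[[...]] parts of the command) may simply be omitted.
    agg_step = aggregate(
        input_silo_1=silo_1_weights,
        input_silo_2=silo_2_weights,
        input_silo_3=silo_3_weights,
    )
    return {"aggregated_model": agg_step.outputs.aggregated_output}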
8 changes: 8 additions & 0 deletions examples/pipelines/pneumonia/environment.yml
@@ -0,0 +1,8 @@
name: fl_pneumonia_env
channels:
  - defaults
dependencies:
  - python=3.10.4
  - pip=22.1.2
  - pip:
      - -r requirements.txt
3 changes: 3 additions & 0 deletions examples/pipelines/pneumonia/requirements.txt
@@ -0,0 +1,3 @@
azure-identity
azure-ai-ml==1.0.0
omegaconf
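
These pin the dependencies of the submit scripts. A minimal connection sketch using them (the subscription, resource group, and workspace values are placeholders):

from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)
# Pipelines built with the dsl sketch above are then submitted with, e.g.:
# ml_client.jobs.create_or_update(pipeline_job)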
