Align round/iteration terminology with the native code #103

thomasp-ms · 2022-10-05T17:30:09Z

This PR fixes a minor issue in the factory job part of the repo: the code expects a parameter named num_rounds, while the config only provides num_of_iterations.

The issue is solved by replacing "round" with "iteration" throughout, as was done in #82.

A successfully submitted job can be found here; it was submitted by running:

python .\examples\pipelines\fl_cross_silo_factory\submit.py --submit --ignore_validation

(which did not work before).
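
For readers who want to see the mismatch concretely, here is a minimal sketch of how this kind of failure can arise. Only the two key names, num_rounds and num_of_iterations, come from this PR. The plain dict, its value, and both function names are assumptions made for illustration; they are not the actual factory code or the actual config loader.

```python
# Hypothetical stand-in for the training section of the config.
# Only the key name "num_of_iterations" is taken from the PR description;
# the value 2 is an arbitrary illustrative choice.
training_parameters = {"num_of_iterations": 2}


def read_iterations_before_fix(params):
    """Old lookup (hypothetical): asks for a key the config does not provide."""
    return params["num_rounds"]  # raises KeyError with the config above


def read_iterations_after_fix(params):
    """Lookup aligned with the config, as in the "round" -> "iteration" rename."""
    return params["num_of_iterations"]


if __name__ == "__main__":
    try:
        read_iterations_before_fix(training_parameters)
    except KeyError as missing_key:
        print(f"before the fix: missing key {missing_key}")
    print(f"after the fix: {read_iterations_after_fix(training_parameters)} iterations")
```

Renaming on the code side rather than in the config keeps the factory job consistent with the terminology already adopted in the native code.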

* refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add more intuitive agg output dir path * reformat using black * add iteration2 branch for PR build testing * reformat date and pass kwargs instead in the getUniqueIdentifier fn * working submit * working factory submit * linting * move component path * add soft validation * add soft validation * Add basic tests on config * linting * working bicep deployment for vanilla demo * proper orchestrator script, double containers * fix name * docstring * docstring * rollback to using only 1 container * align naming convention * instructions * working submit * set up permission model * working orch perms * wonky perms assignment * working role assignments * remove old perm model * working except silo2orch * fix typo * working submit with config * add sku as param * use R/W for now * fix submit to align with bicep provisioning demo * linting * remove dataset files * fix docstring on permission model * write draft docs with homepage, align structure, remove requirements, ensure demo documented * rollback change to req * change factory to use custom model type during 
validation * linting * Display metrics at the pipeline level (#68) * Fix optional input yaml and mlflow log bugs (#59) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. 
Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * Accomodate optional input chnages and switch from mlflow autologging to manual logging * code style * change optional inputs syntax Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Make changes to display all metrics at the pipeline level * Log preprocessing metadata in mlflow * linting * Pass client as an arg * Fix typo, rmv name from silo config, metric naming convention, and add metric identifier in the preprocessing component Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Remove redundant files from the mlops directory (#69) * Remove internal & external dir as provisioning is taken care by bicep * keep mnist data files * copy files from template * draft 
orchestrator * rename demo script (#71) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Unified documentation (#72) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * simplify sandbox script * simplify script, ensure it works * align config of native submit * align naming conventions between scripts, reinject rbac role * protected sandbox draft * create test job for quickly debugging provisioning issues * fix tests * linting * move permissions to storage * align config with bicep scrits * Document the metrics panel of the pipeline overview in the quickstart (#76) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * linting * add docstrings and disclaimers * Add instructions on how to create a custom graph (#78) * WIP: unifying docs * Remove redundant doc file. 
We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * add instructions on how to create a custom graph * working deployment, wrong setup * do better comments * Refine native code (#82) * fix silo name * log only one datapoint per iteration for an aggregated metrics * Align terminology for iteration/round/num_rounds * linting * use storage blob data contibutor * add demoBaseName to guid name of role deployment (#85) Co-authored-by: thomasp-ms <XXX@me.com> * use id list, add listkeys builtin * rename and dissociate orchestrator in resource + orchestrator * separate orchestrator script * draft sandbox setup * make silo script distinct * add role * Update orchestrator_open.bicep * Update internal_blob_open.bicep * add datastore for orch, align config * remove comments * fix datastore name * align hello world example with new naming conventions * fix merge * work in progress * use mount * ensure uai assignments are created AFTER storage is created * linting * enforce precedence * merge from secure branch * use different regions, limit size of account * reduce to 3 regions, add keys to guid * substring * align config * do not use model * secure storage * submittable vnet silo * sandbox * Add msi version of scripts * sandbox main can switch between uai and msi * align orch with new design * align silo bicep * finalize vnet main * add vnet links * remove * specify dependson * fix name * linting * linting * implement ignore param, hotfix model with startswith * Address my own comments on Jeff's PR (#96) * remove magic number * little improvements on some comments * remove unused files * put dash replacement next to length check * don't necessarily assume USER AI * UAI -> XAI * 
revert previous UAI -> XAI changes * move length check next to dash replacement * typo * try movind the dependsOn's * RAGRS -> LRS * revert dependsON changes * revert another small change in a comment Co-authored-by: thomasp-ms <XXX@me.com> * align config of both submit scripts * fix * add vnet peering * fix peering * Make distinction between on-off and repeatable provisioning scripts (#99) * clarify the role needed * remove "custom role" line * adjust locations * use existing rg if not Owner of the sub * clarify "Secure" setup * add usage instructions in docstring * explain what scripts are one-off (vs repeatable) Co-authored-by: thomasp-ms <XXX@me.com> * Align round/iteration terminology with the native code (#103) * rename parameter in config file * keep iterations instead of rounds * round -> iteration Co-authored-by: thomasp-ms <XXX@me.com> * upgrade versions all around * add distinct permission * orch and silo as just a pair * orch and silo as just a pair * minor fixes * minor fixes * setname of datastore * verify all storage settings * add rules * add serice endpoint in vnet * add note in vnet * use old api * fix name * align open sandbox with vnet sandbox * align config with bicep * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * align both submits to work * add optional test * finalize * add notive * add note in quickstart * Remove unnecessary scripts * last curation * rename native to literal * add getting started in readme, introduce emojis * change person * remove emojs * Propose rewriting of readme to highlight motivation first (#110) * propose rewriting of readme to highlight motivation first * minor edit Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update README.md * Update quickstart to mention rg clean-up * Update quickstart.md * Update quickstart.md * Update quickstart.md * Add September release notes (#98) * first draft * add 
Amit's suggestions * move release notes to CHANGELOG.md * amit's comments + changes to provisioning bullet * Jeff's comments * relative link Co-authored-by: thomasp-ms <XXX@me.com> * Add comments for future self Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com>

* refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add more intuitive agg output dir path * reformat using black * add iteration2 branch for PR build testing * reformat date and pass kwargs instead in the getUniqueIdentifier fn * working submit * working factory submit * linting * move component path * add soft validation * add soft validation * Add basic tests on config * linting * working bicep deployment for vanilla demo * proper orchestrator script, double containers * fix name * docstring * docstring * rollback to using only 1 container * align naming convention * instructions * working submit * set up permission model * working orch perms * wonky perms assignment * working role assignments * remove old perm model * working except silo2orch * fix typo * working submit with config * add sku as param * use R/W for now * fix submit to align with bicep provisioning demo * linting * remove dataset files * fix docstring on permission model * write draft docs with homepage, align structure, remove requirements, ensure demo documented * rollback change to req * change factory to use custom model type during 
validation * linting * Display metrics at the pipeline level (#68) * Fix optional input yaml and mlflow log bugs (#59) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. 
Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * Accomodate optional input chnages and switch from mlflow autologging to manual logging * code style * change optional inputs syntax Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Make changes to display all metrics at the pipeline level * Log preprocessing metadata in mlflow * linting * Pass client as an arg * Fix typo, rmv name from silo config, metric naming convention, and add metric identifier in the preprocessing component Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Remove redundant files from the mlops directory (#69) * Remove internal & external dir as provisioning is taken care by bicep * keep mnist data files * rename demo script (#71) Co-authored-by: 
Jeff Omhover <jeomhove@microsoft.com> * Unified documentation (#72) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * simplify sandbox script * simplify script, ensure it works * align config of native submit * align naming conventions between scripts, reinject rbac role * create test job for quickly debugging provisioning issues * fix tests * linting * move permissions to storage * align config with bicep scrits * Document the metrics panel of the pipeline overview in the quickstart (#76) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * linting * add docstrings and disclaimers * Add instructions on how to create a custom graph (#78) * WIP: unifying docs * Remove redundant doc file. 
We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * add instructions on how to create a custom graph * do better comments * Refine native code (#82) * fix silo name * log only one datapoint per iteration for an aggregated metrics * Align terminology for iteration/round/num_rounds * linting * use storage blob data contibutor * add demoBaseName to guid name of role deployment (#85) Co-authored-by: thomasp-ms <XXX@me.com> * use id list, add listkeys builtin * rename and dissociate orchestrator in resource + orchestrator * separate orchestrator script * draft sandbox setup * make silo script distinct * Update orchestrator_open.bicep * Update internal_blob_open.bicep * remove comments * align hello world example with new naming conventions * ensure uai assignments are created AFTER storage is created * linting * enforce precedence * merge from secure branch * use different regions, limit size of account * reduce to 3 regions, add keys to guid * substring * align config * do not use model * Add msi version of scripts * sandbox main can switch between uai and msi * fix name * linting * linting * implement ignore param, hotfix model with startswith * Address my own comments on Jeff's PR (#96) * remove magic number * little improvements on some comments * remove unused files * put dash replacement next to length check * don't necessarily assume USER AI * UAI -> XAI * revert previous UAI -> XAI changes * move length check next to dash replacement * typo * try movind the dependsOn's * RAGRS -> LRS * revert dependsON changes * revert another small change in a comment Co-authored-by: thomasp-ms <XXX@me.com> * align config of both submit scripts * Make distinction between on-off 
and repeatable provisioning scripts (#99) * clarify the role needed * remove "custom role" line * adjust locations * use existing rg if not Owner of the sub * clarify "Secure" setup * add usage instructions in docstring * explain what scripts are one-off (vs repeatable) Co-authored-by: thomasp-ms <XXX@me.com> * Align round/iteration terminology with the native code (#103) * rename parameter in config file * keep iterations instead of rounds * round -> iteration Co-authored-by: thomasp-ms <XXX@me.com> * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * align both submits to work * add optional test * rename native to literal * add getting started in readme, introduce emojis * change person * remove emojs * Propose rewriting of readme to highlight motivation first (#110) * propose rewriting of readme to highlight motivation first * minor edit Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update README.md * Update quickstart to mention rg clean-up * Update quickstart.md * Update quickstart.md * Update quickstart.md * Build bicep scripts as ARM template, add Azure Buttons to quickstart (#120) * Update quickstart to lower header (hotfix) (#117) * add arm templates, add button in quickstart * switch to releasebranchlink Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Add subscription id, resource group and workspace name as CLI args (#122) * add more cli args * code style * code style * update quickstart doc * update readme * Initiate provisioning "cookbook" with list of provisioning scenarios + example (#123) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Continuous Integration Tests (#119) * take values of subscription id, rs grp, ws name, etc from github secrets and submit a native pipeline * change path * Test azure creds in the github workflow * reformatting * add pipeline validation and testing workflow * add permissions * add 
permissions * check only certain dir to trigger workflows * add soft validation for any iteration branch PR * add provisioning script test * testing * create rg * create rg * change compute for testing * change demoname * delete old rg * change demoname * add demobasename and aml ws name as github secrets * random demo base name * auto generate random base name * random demo base name * adjust random num length * add vnet sandbox test * rmv dependency b/w jobs * submit various pipelines * change execution graph path * add cli args in the factory code * change compute for testing * ignore validation - factory * create custom action * correct path * correct path * add shell in the github action * create github actions and take required values as input params * add shell * add wait condition * add logs * linting * correct rg name * add azure ml extension * handle ml extension installation error. * add release branch test cases * add script to delete run history * cronjob test * cronjob test * checkout branch * test run history deletion script * test run history deletion script * test run history deletion script * azure login * date format change * remove double quotes * date format change * archive run history script tested * Add vnet-based provisioning options to cookbook (#128) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Make deployment name unique in our github actions (#135) * set unique name for deployments * add attempt to deployment name Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refactor compute/storage scripts to be independent (#132) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide motivation in provisioning docs for using service endpoints (#136) * add motivation for service endpoints * add link Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refresh provisioning arm buttons with latest from bicep (#139) * align names of directories * rebuild all arm Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * 
Update silo_vnet_newstorage.md (#141) * Add Bicep build vs ARM template diff test (#140) * Add diff test for bicep vs arm * Debug * Debug * fix syntax error * redirect build output to stdout * coorect path * trigger arm template test when pushing changes to main branch from release* branch * remove redundant logs * Add "open aks with cc" provision tutorial and bicep scripts (#138) * implement bicep scripts to provision open aks with cc * add aks cc tutorial * build arm and add in branch * add button Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide script + tutorial to attach pair with an existing storage (#142) * provision datastore with existing storage * add arm for existing storage, add docs * add link in readme Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add latest arm templates to diff build (#145) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implements provisioning script for a confidential compute VM jumpbox inside a vnet (debug) (#146) * add jumpbox script with tutorial * add template to diff build Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update jumpbox_cc.md (#147) * update tutorials for silos to integrate feedback (#149) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implement option to turn orchestrator storage fully private (behind PLE) (#150) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Tutorial on how to adapt native and factory code to write FL experiments. 
(#100) * WIP: add general information about the factory code * moving factory-tutorial to another file * add scenarios * add instructions on how to adapt literal code * rename file * add general info and fix typos * Jeff's feedback * Apply code clean-up to provision scripts before bug bash (#148) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Instructions for provisioning external silos (#101) * very first stab, far from done * non-secure native job using the on-prem k8s * use on-prem silos in example factory job * Revert "very first stab, far from done" This reverts commit e00d882. * Revert "use on-prem silos in example factory job" This reverts commit e2ef884. * Revert "non-secure native job using the on-prem k8s" This reverts commit 923e5f3. * restore doc stub * Make Git ignore resources for test jobs * fix gitignore * typo in comment * steps A through D * 2 typos * move to subdir * fix workspace creation * add orchestrator part, role, and timeline * last commit before PR * adjust to new open_azureml_workspace.bicep * first wave after Jeff's comments * address jeff's comments * typo * light trims Co-authored-by: thomasp-ms <XXX@me.com> * Implement feedback from bugbash (#158) * fix fl_pairs path * Add comments in config yaml Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Add silo/training subgraph for a better UX (#131) * WIP: refactor components to accomodate subgraph changes * integrate subgraphs * integrate subgraphs to the factory components * add workaround as a subgraph comp does not support optional inputs * change azure-ai-ml version * make required changes to support subgraphs in the factory code * change running_checkpoint's initial data path * correct running_checkpoint path * linting * add optional inputs in the subgraph component * revert subgraph literal changes * revert subgraph literal changes * update version * add comments * add subgraph for silos * add preprocessing/training/evaluation as one subgraph (silo subgraph) * build 
subgraphs with/without preprocessing * pass silo subgraph as an argument to the pipeline fn * add silo/training subgraph * revert libraries import statements * add azure identity dependency * change iteration name to iteration number * define pipelineoutputbase component * rmv older iteration!=1 condition * rmv redundant logs * test without validation * pipeline job status check via sdk * pipeline job status check via sdk * handle credentialunavailableerror exception * reformatting * rmv cli status check code from sdk * fix typo * test token new policy * test token new policy * reformat * status check using cli * check job status after every 100 sec. * linting * revise version * load_component's params change * change classes names as per a new version * handle nonetype * Jfomhover/subgraphminimal (#156) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * remove subprocess Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * move lnks from release branch to main (#160) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * changelog * address comments * October release changelog (#161) * changelog * address comments Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: unknown <Mitgarg17495@gmail.com>

* refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add more intuitive agg output dir path * reformat using black * add iteration2 branch for PR build testing * reformat date and pass kwargs instead in the getUniqueIdentifier fn * working submit * working factory submit * linting * move component path * add soft validation * add soft validation * Add basic tests on config * linting * working bicep deployment for vanilla demo * proper orchestrator script, double containers * fix name * docstring * docstring * rollback to using only 1 container * align naming convention * instructions * working submit * set up permission model * working orch perms * wonky perms assignment * working role assignments * remove old perm model * working except silo2orch * fix typo * working submit with config * add sku as param * use R/W for now * fix submit to align with bicep provisioning demo * linting * remove dataset files * fix docstring on permission model * write draft docs with homepage, align structure, remove requirements, ensure demo documented * rollback change to req * change factory to use custom model type during 
validation * linting * Display metrics at the pipeline level (#68) * Fix optional input yaml and mlflow log bugs (#59) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. 
Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * Accomodate optional input chnages and switch from mlflow autologging to manual logging * code style * change optional inputs syntax Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Make changes to display all metrics at the pipeline level * Log preprocessing metadata in mlflow * linting * Pass client as an arg * Fix typo, rmv name from silo config, metric naming convention, and add metric identifier in the preprocessing component Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Remove redundant files from the mlops directory (#69) * Remove internal & external dir as provisioning is taken care by bicep * keep mnist data files * rename demo script (#71) Co-authored-by: 
Jeff Omhover <jeomhove@microsoft.com> * Unified documentation (#72) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * simplify sandbox script * simplify script, ensure it works * align config of native submit * align naming conventions between scripts, reinject rbac role * create test job for quickly debugging provisioning issues * fix tests * linting * move permissions to storage * align config with bicep scrits * Document the metrics panel of the pipeline overview in the quickstart (#76) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * linting * add docstrings and disclaimers * Add instructions on how to create a custom graph (#78) * WIP: unifying docs * Remove redundant doc file. 
We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * add instructions on how to create a custom graph * do better comments * Refine native code (#82) * fix silo name * log only one datapoint per iteration for an aggregated metrics * Align terminology for iteration/round/num_rounds * linting * use storage blob data contibutor * add demoBaseName to guid name of role deployment (#85) Co-authored-by: thomasp-ms <XXX@me.com> * use id list, add listkeys builtin * rename and dissociate orchestrator in resource + orchestrator * separate orchestrator script * draft sandbox setup * make silo script distinct * Update orchestrator_open.bicep * Update internal_blob_open.bicep * remove comments * align hello world example with new naming conventions * ensure uai assignments are created AFTER storage is created * linting * enforce precedence * merge from secure branch * use different regions, limit size of account * reduce to 3 regions, add keys to guid * substring * align config * do not use model * Add msi version of scripts * sandbox main can switch between uai and msi * fix name * linting * linting * implement ignore param, hotfix model with startswith * Address my own comments on Jeff's PR (#96) * remove magic number * little improvements on some comments * remove unused files * put dash replacement next to length check * don't necessarily assume USER AI * UAI -> XAI * revert previous UAI -> XAI changes * move length check next to dash replacement * typo * try movind the dependsOn's * RAGRS -> LRS * revert dependsON changes * revert another small change in a comment Co-authored-by: thomasp-ms <XXX@me.com> * align config of both submit scripts * Make distinction between on-off 
and repeatable provisioning scripts (#99) * clarify the role needed * remove "custom role" line * adjust locations * use existing rg if not Owner of the sub * clarify "Secure" setup * add usage instructions in docstring * explain what scripts are one-off (vs repeatable) Co-authored-by: thomasp-ms <XXX@me.com> * Align round/iteration terminology with the native code (#103) * rename parameter in config file * keep iterations instead of rounds * round -> iteration Co-authored-by: thomasp-ms <XXX@me.com> * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * align both submits to work * add optional test * rename native to literal * add getting started in readme, introduce emojis * change person * remove emojs * Propose rewriting of readme to highlight motivation first (#110) * propose rewriting of readme to highlight motivation first * minor edit Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update README.md * Update quickstart to mention rg clean-up * Update quickstart.md * Update quickstart.md * Update quickstart.md * Build bicep scripts as ARM template, add Azure Buttons to quickstart (#120) * Update quickstart to lower header (hotfix) (#117) * add arm templates, add button in quickstart * switch to releasebranchlink Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Add subscription id, resource group and workspace name as CLI args (#122) * add more cli args * code style * code style * update quickstart doc * update readme * Initiate provisioning "cookbook" with list of provisioning scenarios + example (#123) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Continuous Integration Tests (#119) * take values of subscription id, rs grp, ws name, etc from github secrets and submit a native pipeline * change path * Test azure creds in the github workflow * reformatting * add pipeline validation and testing workflow * add permissions * add 
permissions * check only certain dir to trigger workflows * add soft validation for any iteration branch PR * add provisioning script test * testing * create rg * create rg * change compute for testing * change demoname * delete old rg * change demoname * add demobasename and aml ws name as github secrets * random demo base name * auto generate random base name * random demo base name * adjust random num length * add vnet sandbox test * rmv dependency b/w jobs * submit various pipelines * change execution graph path * add cli args in the factory code * change compute for testing * ignore validation - factory * create custom action * correct path * correct path * add shell in the github action * create github actions and take required values as input params * add shell * add wait condition * add logs * linting * correct rg name * add azure ml extension * handle ml extension installation error. * add release branch test cases * add script to delete run history * cronjob test * cronjob test * checkout branch * test run history deletion script * test run history deletion script * test run history deletion script * azure login * date format change * remove double quotes * date format change * archive run history script tested * Add vnet-based provisioning options to cookbook (#128) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Make deployment name unique in our github actions (#135) * set unique name for deployments * add attempt to deployment name Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refactor compute/storage scripts to be independent (#132) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide motivation in provisioning docs for using service endpoints (#136) * add motivation for service endpoints * add link Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refresh provisioning arm buttons with latest from bicep (#139) * align names of directories * rebuild all arm Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * 
Update silo_vnet_newstorage.md (#141) * Add Bicep build vs ARM template diff test (#140) * Add diff test for bicep vs arm * Debug * Debug * fix syntax error * redirect build output to stdout * coorect path * trigger arm template test when pushing changes to main branch from release* branch * remove redundant logs * Add "open aks with cc" provision tutorial and bicep scripts (#138) * implement bicep scripts to provision open aks with cc * add aks cc tutorial * build arm and add in branch * add button Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide script + tutorial to attach pair with an existing storage (#142) * provision datastore with existing storage * add arm for existing storage, add docs * add link in readme Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add latest arm templates to diff build (#145) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implements provisioning script for a confidential compute VM jumpbox inside a vnet (debug) (#146) * add jumpbox script with tutorial * add template to diff build Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update jumpbox_cc.md (#147) * update tutorials for silos to integrate feedback (#149) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implement option to turn orchestrator storage fully private (behind PLE) (#150) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Tutorial on how to adapt native and factory code to write FL experiments. 
(#100) * WIP: add general information about the factory code * moving factory-tutorial to another file * add scenarios * add instructions on how to adapt literal code * rename file * add general info and fix typos * Jeff's feedback * Apply code clean-up to provision scripts before bug bash (#148) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Instructions for provisioning external silos (#101) * very first stab, far from done * non-secure native job using the on-prem k8s * use on-prem silos in example factory job * Revert "very first stab, far from done" This reverts commit e00d882. * Revert "use on-prem silos in example factory job" This reverts commit e2ef884. * Revert "non-secure native job using the on-prem k8s" This reverts commit 923e5f3. * restore doc stub * Make Git ignore resources for test jobs * fix gitignore * typo in comment * steps A through D * 2 typos * move to subdir * fix workspace creation * add orchestrator part, role, and timeline * last commit before PR * adjust to new open_azureml_workspace.bicep * first wave after Jeff's comments * address jeff's comments * typo * light trims Co-authored-by: thomasp-ms <XXX@me.com> * bump up every title * skeleton * first attempt at data prep like Harmke * change secret name * wrong secret name * remove separate unzip * change clients, create silo data assets * different names for silo data assets, duh * cleanup * adjust secret name in doc * . 
* use latest literal code * align environment with literal * base on latest component * one dataset, comment out 2 unused args (for now) * introduce new arguments * reflect modified args in component spec * remove unused arg from config * start hooking up to Harmke's trainer * initialize PTLearner and include in run.py * use same values as Harmke for epochs and lr * attributes with _, start implementing local_train * add loggings, add test(), fix device_ * train_loader_ * align _'s * fix transform bug * remove unused constants * use proper model in aggregation code * removed unused file * remove unused code and arguments, logging to DEBUG * restore `metrics_prefix` parameter * finish restoring `metrics_prefix` * do not duplicate model code * revert dedup attempt * improve docstrings and descriptions * change experiment name * change pipeline name and docstring * cite sources, remove wrongly added licenses * italics * black Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: unknown <Mitgarg17495@gmail.com>

…198) * init branch * wip data exploration * data exploration region/silo * basic model * regions * basic network and finished data processing * training * Implement generic FedAvg without model object (#167) * generic fedavg pytorch * support model classes * add docstrings Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add README * update normalization * update exploration * Thomas/small improvements (#171) * remove unused local MNIST data * add link to provisioning cookbook in docs readme * recommend creating a conda env in the quickstart Co-authored-by: thomasp-ms <XXX@me.com> * update example for finance with multiple models * successful training through lstm * revert unneeded changes * remove local exploration ipynb * fix test metric * different param value for AKS (#179) Co-authored-by: thomasp-ms <XXX@me.com> * Pneumonia xray example (#164) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in 
docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add more intuitive agg output dir path * reformat using black * add iteration2 branch for PR build testing * reformat date and pass kwargs instead in the getUniqueIdentifier fn * working submit * working factory submit * linting * move component path * add soft validation * add soft validation * Add basic 
tests on config * linting * working bicep deployment for vanilla demo * proper orchestrator script, double containers * fix name * docstring * docstring * rollback to using only 1 container * align naming convention * instructions * working submit * set up permission model * working orch perms * wonky perms assignment * working role assignments * remove old perm model * working except silo2orch * fix typo * working submit with config * add sku as param * use R/W for now * fix submit to align with bicep provisioning demo * linting * remove dataset files * fix docstring on permission model * write draft docs with homepage, align structure, remove requirements, ensure demo documented * rollback change to req * change factory to use custom model type during validation * linting * Display metrics at the pipeline level (#68) * Fix optional input yaml and mlflow log bugs (#59) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install 
azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * Accomodate optional input chnages and switch from mlflow autologging to manual logging * code style * change optional inputs syntax Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas 
<7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Make changes to display all metrics at the pipeline level * Log preprocessing metadata in mlflow * linting * Pass client as an arg * Fix typo, rmv name from silo config, metric naming convention, and add metric identifier in the preprocessing component Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Remove redundant files from the mlops directory (#69) * Remove internal & external dir as provisioning is taken care by bicep * keep mnist data files * rename demo script (#71) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Unified documentation (#72) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * simplify sandbox script * simplify script, ensure it works * align config of native submit * align naming conventions between scripts, reinject rbac role * create test job for quickly debugging provisioning issues * fix tests * linting * move permissions to storage * align config with bicep scrits * Document the metrics panel of the pipeline overview in the quickstart (#76) * WIP: unifying docs * Remove redundant doc file. 
We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * linting * add docstrings and disclaimers * Add instructions on how to create a custom graph (#78) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * add instructions on how to create a custom graph * do better comments * Refine native code (#82) * fix silo name * log only one datapoint per iteration for an aggregated metrics * Align terminology for iteration/round/num_rounds * linting * use storage blob data contibutor * add demoBaseName to guid name of role deployment (#85) Co-authored-by: thomasp-ms <XXX@me.com> * use id list, add listkeys builtin * rename and dissociate orchestrator in resource + orchestrator * separate orchestrator script * draft sandbox setup * make silo script distinct * Update orchestrator_open.bicep * Update internal_blob_open.bicep * remove comments * align hello world example with new naming conventions * ensure uai assignments are created AFTER storage is created * linting * enforce precedence * merge from secure branch * use different regions, limit size of account * reduce to 3 regions, add keys to guid * substring * align config * do not use model * Add msi version of scripts * sandbox main can switch between uai and msi * fix name * linting * linting * implement ignore param, hotfix model with startswith * Address my own comments on 
Jeff's PR (#96) * remove magic number * little improvements on some comments * remove unused files * put dash replacement next to length check * don't necessarily assume USER AI * UAI -> XAI * revert previous UAI -> XAI changes * move length check next to dash replacement * typo * try movind the dependsOn's * RAGRS -> LRS * revert dependsON changes * revert another small change in a comment Co-authored-by: thomasp-ms <XXX@me.com> * align config of both submit scripts * Make distinction between on-off and repeatable provisioning scripts (#99) * clarify the role needed * remove "custom role" line * adjust locations * use existing rg if not Owner of the sub * clarify "Secure" setup * add usage instructions in docstring * explain what scripts are one-off (vs repeatable) Co-authored-by: thomasp-ms <XXX@me.com> * Align round/iteration terminology with the native code (#103) * rename parameter in config file * keep iterations instead of rounds * round -> iteration Co-authored-by: thomasp-ms <XXX@me.com> * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * align both submits to work * add optional test * rename native to literal * add getting started in readme, introduce emojis * change person * remove emojs * Propose rewriting of readme to highlight motivation first (#110) * propose rewriting of readme to highlight motivation first * minor edit Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update README.md * Update quickstart to mention rg clean-up * Update quickstart.md * Update quickstart.md * Update quickstart.md * Build bicep scripts as ARM template, add Azure Buttons to quickstart (#120) * Update quickstart to lower header (hotfix) (#117) * add arm templates, add button in quickstart * switch to releasebranchlink Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Add subscription id, resource group and workspace name as CLI args (#122) * add more 
cli args * code style * code style * update quickstart doc * update readme * Initiate provisioning "cookbook" with list of provisioning scenarios + example (#123) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Continuous Integration Tests (#119) * take values of subscription id, rs grp, ws name, etc from github secrets and submit a native pipeline * change path * Test azure creds in the github workflow * reformatting * add pipeline validation and testing workflow * add permissions * add permissions * check only certain dir to trigger workflows * add soft validation for any iteration branch PR * add provisioning script test * testing * create rg * create rg * change compute for testing * change demoname * delete old rg * change demoname * add demobasename and aml ws name as github secrets * random demo base name * auto generate random base name * random demo base name * adjust random num length * add vnet sandbox test * rmv dependency b/w jobs * submit various pipelines * change execution graph path * add cli args in the factory code * change compute for testing * ignore validation - factory * create custom action * correct path * correct path * add shell in the github action * create github actions and take required values as input params * add shell * add wait condition * add logs * linting * correct rg name * add azure ml extension * handle ml extension installation error. 
* add release branch test cases * add script to delete run history * cronjob test * cronjob test * checkout branch * test run history deletion script * test run history deletion script * test run history deletion script * azure login * date format change * remove double quotes * date format change * archive run history script tested * Add vnet-based provisioning options to cookbook (#128) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Make deployment name unique in our github actions (#135) * set unique name for deployments * add attempt to deployment name Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refactor compute/storage scripts to be independent (#132) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide motivation in provisioning docs for using service endpoints (#136) * add motivation for service endpoints * add link Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refresh provisioning arm buttons with latest from bicep (#139) * align names of directories * rebuild all arm Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update silo_vnet_newstorage.md (#141) * Add Bicep build vs ARM template diff test (#140) * Add diff test for bicep vs arm * Debug * Debug * fix syntax error * redirect build output to stdout * coorect path * trigger arm template test when pushing changes to main branch from release* branch * remove redundant logs * Add "open aks with cc" provision tutorial and bicep scripts (#138) * implement bicep scripts to provision open aks with cc * add aks cc tutorial * build arm and add in branch * add button Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide script + tutorial to attach pair with an existing storage (#142) * provision datastore with existing storage * add arm for existing storage, add docs * add link in readme Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add latest arm templates to diff build (#145) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implements 
provisioning script for a confidential compute VM jumpbox inside a vnet (debug) (#146) * add jumpbox script with tutorial * add template to diff build Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update jumpbox_cc.md (#147) * update tutorials for silos to integrate feedback (#149) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implement option to turn orchestrator storage fully private (behind PLE) (#150) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Tutorial on how to adapt native and factory code to write FL experiments. (#100) * WIP: add general information about the factory code * moving factory-tutorial to another file * add scenarios * add instructions on how to adapt literal code * rename file * add general info and fix typos * Jeff's feedback * Apply code clean-up to provision scripts before bug bash (#148) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Instructions for provisioning external silos (#101) * very first stab, far from done * non-secure native job using the on-prem k8s * use on-prem silos in example factory job * Revert "very first stab, far from done" This reverts commit e00d882dfee6a348eb89cd63e339a051b85ce0ca. * Revert "use on-prem silos in example factory job" This reverts commit e2ef8841c6be25a6c84b57ae079cca8f361323fe. * Revert "non-secure native job using the on-prem k8s" This reverts commit 923e5f321d28b30d8cd9759c47a7ffe5457e3284. 
* restore doc stub * Make Git ignore resources for test jobs * fix gitignore * typo in comment * steps A through D * 2 typos * move to subdir * fix workspace creation * add orchestrator part, role, and timeline * last commit before PR * adjust to new open_azureml_workspace.bicep * first wave after Jeff's comments * address jeff's comments * typo * light trims Co-authored-by: thomasp-ms <XXX@me.com> * bump up every title * skeleton * first attempt at data prep like Harmke * change secret name * wrong secret name * remove separate unzip * change clients, create silo data assets * different names for silo data assets, duh * cleanup * adjust secret name in doc * . * use latest literal code * align environment with literal * base on latest component * one dataset, comment out 2 unused args (for now) * introduce new arguments * reflect modified args in component spec * remove unused arg from config * start hooking up to Harmke's trainer * initialize PTLearner and include in run.py * use same values as Harmke for epochs and lr * attributes with _, start implementing local_train * add loggings, add test(), fix device_ * train_loader_ * align _'s * fix transform bug * remove unused constants * use proper model in aggregation code * removed unused file * remove unused code and arguments, logging to DEBUG * restore `metrics_prefix` parameter * finish restoring `metrics_prefix` * do not duplicate model code * revert dedup attempt * improve docstrings and descriptions * change experiment name * change pipeline name and docstring * cite sources, remove wrongly added licenses * italics * black Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: unknown <Mitgarg17495@gmail.com> * update formatting * add readme section * rename training to traininsilo for consistency * add more comments and update docs * include urgency in PR template (#184) Co-authored-by: thomasp-ms 
<XXX@me.com> * Share resources and standardize component names (#182) * use shared agg component across all examples * only keep a single {reqs/env} * use more recent pip version * standardize component spec name * support dummy HELLOWORLD example is agg * black Co-authored-by: thomasp-ms <XXX@me.com> * Thopo/share component and environment (#185) * SHARED -> utils, rename agg env Co-authored-by: thomasp-ms <XXX@me.com> * rename config to spec and add upload data step * upload data script * use util aggregateweights * add data splitting pipeline * docs update * log pipeline level only once per silo training * do categorical encoding ahead of splitting * nit updates * update comment * update formatting * Hotfix: grant `az login` permissions to the 'clear run history' script (#166)
* simplify the job wait condition's code * add comments * trigger mnist pipeline check * test token validity * grant `az login` permissions to the clear-history script * revert to sleep wait code *
test access token validity * nit Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * fix readme * aggregate weughts on whichever device is available * update docstrings * update formatting * reduce upload pipeline file * fix datastore * add info about data upload step * fix typo * steps for changing access policies * update docs * Named Entity Recognition example (#177)
* add multinerd template files * NER components * re-structure * partition data + log metrics * add redme * add readme * restructuring * restructuring * add doc strings * train on gpus * create a separate component to upload data on silos * docs * rename * add assert statement * change upload-data job compute to orchestrator compute * remove ner from literal example choices * fix doc * add model-name, tokenizer configurable * pip version upgrade * reformatting * use shared aggregated component * rename script file * add note * create a compute that has access to silos' storage accs * change data uploading approach * update doc *
incorporate Thomas's feedback * fix typo Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Create nice-looking homepage for the examples in readme+docs (#190) * add homepage for industry examples Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Align the medical imaging data provisioning process with other examples (#191) * adjust paths in config file * support component with 1 output for Pneumonia * formatting * adjust doc to new provisioning * remove GH action for dataprep * custom component for provisioning pneumonia data * black Co-authored-by: thomasp-ms <XXX@me.com> * hot fix (#192) Co-authored-by: thomasp-ms <XXX@me.com> * Lots of micro-fixes after bug bashing all 3 industry examples (#194) related to components: * create distinct names for all components of each scenario * polish component descriptions * remove unused mnist datatransfer and postprocessing components * upgrade all MCR images to a more recent OS * cut some unnecessary dependencies * use curated environments whenever possible (to speed up job build time) related to pipelines: * fix issues with ccfraud submit script (path to shared folder) * remove unnecessary json+azure imports in submit scripts * align all 3 submissions scripts * in upload data pipeline, make --example required without default value to force intentional decision * in upload data pipeline, use scenario name in the output path to avoid collision * give each submit pipeline a distinct experiment and run name for readability Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Standardize all 3 real world example tutorials (#193) * standardize documentation on all 3 examples * change titles * fix spaces * add pip instructions * upgrade azure-ai-ml version Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * provide correct link to Kaggle dataset (#196) 
* provide correct link * . * . Co-authored-by: thomasp-ms <XXX@me.com> * Add CI tests for industry-relevant examples (#186)
* add pneumonia and ner examples tests * add ccfraud test in the CI/CD pipeline * add data upload test * trigger workflow * CI testing1 * CI testing1 * test kv kaggle creds * fix creds * fix creds * set kaggle creds * test pneumonia data-upload * test all industry relevant examples * upload data test for 3 examples * add main tests * rmv redundant chrs * fix typo * avoid industry relevant examples tests to run on the vnet setup as it is already covered by the open setup Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * CLI commands to add credentials in the workspace keyvault (#199)
* add cli cmds to set a kv secret * Jeff's feedback * Implement Thomas's feedback Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Thomas/bug bash feedback 04 (#203) * no need to navigate to a specific directory * keyvault -> key vault * improve Kaggle sections * GPU's for NER example * ARM templates with latest bicep version * bold * GPU instructions in quickstart Co-authored-by: thomasp-ms <XXX@me.com> * fix test to align with new sdk (#204) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Hotfix: DataAccessError (orchestrator access) (#205)
* fix bug * update arm template * fix a problem that was encountered during resolving merge conflicts Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Add GitHub Action workflow concurrency and implement token expiration policy workaround (#200)
* add GitHub workflow concurrency * test 1 * test 1 * test 1 * test 2 * test 3 * test 2 * test 3 * implement token expiry workaround * test 1 * workaround to handle token expiry error * fix typo Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Implement troubleshooting guide with first typical issues (#208) * write troubleshooting guide Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Fix order of precedence for AML workspace references in submit.py (#209)
* fix order of precedence * fix build Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Add data permissions issue to TSG (#210) * add permissions issue to TSG Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * November notes (#211) Co-authored-by: thomasp-ms <XXX@me.com> * upgrade all pip dependencies (#212) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: David Majercak <damajercak@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: Dávid Majerčák <david1majercak@gmail.com>

* init branch * wip data exploration * data exploration region/silo * basic model * regions * basic network and finished data processing * training * Implement generic FedAvg without model object (#167) * generic fedavg pytorch * support model classes * add docstrings Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add README * update normalization * update exploration * Thomas/small improvements (#171) * remove unused local MNIST data * add link to provisioning cookbook in docs readme * recommend creating a conda env in the quickstart Co-authored-by: thomasp-ms <XXX@me.com> * update example for finance with multiple models * successful training through lstm * revert unneeded changes * remove local exploration ipynb * fix test metric * different param value for AKS (#179) Co-authored-by: thomasp-ms <XXX@me.com> * Pneumonia xray example (#164) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker 
file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add more intuitive agg output dir path * reformat using black * add iteration2 branch for PR build testing * reformat date and pass kwargs instead in the getUniqueIdentifier fn * working submit * working factory submit * linting * move component path * add soft validation * add soft validation * Add basic tests on 
config * linting * working bicep deployment for vanilla demo * proper orchestrator script, double containers * fix name * docstring * docstring * rollback to using only 1 container * align naming convention * instructions * working submit * set up permission model * working orch perms * wonky perms assignment * working role assignments * remove old perm model * working except silo2orch * fix typo * working submit with config * add sku as param * use R/W for now * fix submit to align with bicep provisioning demo * linting * remove dataset files * fix docstring on permission model * write draft docs with homepage, align structure, remove requirements, ensure demo documented * rollback change to req * change factory to use custom model type during validation * linting * Display metrics at the pipeline level (#68) * Fix optional input yaml and mlflow log bugs (#59) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core 
in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * Accomodate optional input chnages and switch from mlflow autologging to manual logging * code style * change optional inputs syntax Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> 
Co-authored-by: thomasp-ms <XXX@me.com> * Make changes to display all metrics at the pipeline level * Log preprocessing metadata in mlflow * linting * Pass client as an arg * Fix typo, rmv name from silo config, metric naming convention, and add metric identifier in the preprocessing component Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Remove redundant files from the mlops directory (#69) * Remove internal & external dir as provisioning is taken care by bicep * keep mnist data files * rename demo script (#71) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Unified documentation (#72) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * simplify sandbox script * simplify script, ensure it works * align config of native submit * align naming conventions between scripts, reinject rbac role * create test job for quickly debugging provisioning issues * fix tests * linting * move permissions to storage * align config with bicep scrits * Document the metrics panel of the pipeline overview in the quickstart (#76) * WIP: unifying docs * Remove redundant doc file. 
We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * linting * add docstrings and disclaimers * Add instructions on how to create a custom graph (#78) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * add instructions on how to create a custom graph * do better comments * Refine native code (#82) * fix silo name * log only one datapoint per iteration for an aggregated metrics * Align terminology for iteration/round/num_rounds * linting * use storage blob data contibutor * add demoBaseName to guid name of role deployment (#85) Co-authored-by: thomasp-ms <XXX@me.com> * use id list, add listkeys builtin * rename and dissociate orchestrator in resource + orchestrator * separate orchestrator script * draft sandbox setup * make silo script distinct * Update orchestrator_open.bicep * Update internal_blob_open.bicep * remove comments * align hello world example with new naming conventions * ensure uai assignments are created AFTER storage is created * linting * enforce precedence * merge from secure branch * use different regions, limit size of account * reduce to 3 regions, add keys to guid * substring * align config * do not use model * Add msi version of scripts * sandbox main can switch between uai and msi * fix name * linting * linting * implement ignore param, hotfix model with startswith * Address my own comments on 
Jeff's PR (#96) * remove magic number * little improvements on some comments * remove unused files * put dash replacement next to length check * don't necessarily assume USER AI * UAI -> XAI * revert previous UAI -> XAI changes * move length check next to dash replacement * typo * try movind the dependsOn's * RAGRS -> LRS * revert dependsON changes * revert another small change in a comment Co-authored-by: thomasp-ms <XXX@me.com> * align config of both submit scripts * Make distinction between on-off and repeatable provisioning scripts (#99) * clarify the role needed * remove "custom role" line * adjust locations * use existing rg if not Owner of the sub * clarify "Secure" setup * add usage instructions in docstring * explain what scripts are one-off (vs repeatable) Co-authored-by: thomasp-ms <XXX@me.com> * Align round/iteration terminology with the native code (#103) * rename parameter in config file * keep iterations instead of rounds * round -> iteration Co-authored-by: thomasp-ms <XXX@me.com> * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * align both submits to work * add optional test * rename native to literal * add getting started in readme, introduce emojis * change person * remove emojs * Propose rewriting of readme to highlight motivation first (#110) * propose rewriting of readme to highlight motivation first * minor edit Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update README.md * Update quickstart to mention rg clean-up * Update quickstart.md * Update quickstart.md * Update quickstart.md * Build bicep scripts as ARM template, add Azure Buttons to quickstart (#120) * Update quickstart to lower header (hotfix) (#117) * add arm templates, add button in quickstart * switch to releasebranchlink Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Add subscription id, resource group and workspace name as CLI args (#122) * add more 
cli args * code style * code style * update quickstart doc * update readme * Initiate provisioning "cookbook" with list of provisioning scenarios + example (#123) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Continuous Integration Tests (#119) * take values of subscription id, rs grp, ws name, etc from github secrets and submit a native pipeline * change path * Test azure creds in the github workflow * reformatting * add pipeline validation and testing workflow * add permissions * add permissions * check only certain dir to trigger workflows * add soft validation for any iteration branch PR * add provisioning script test * testing * create rg * create rg * change compute for testing * change demoname * delete old rg * change demoname * add demobasename and aml ws name as github secrets * random demo base name * auto generate random base name * random demo base name * adjust random num length * add vnet sandbox test * rmv dependency b/w jobs * submit various pipelines * change execution graph path * add cli args in the factory code * change compute for testing * ignore validation - factory * create custom action * correct path * correct path * add shell in the github action * create github actions and take required values as input params * add shell * add wait condition * add logs * linting * correct rg name * add azure ml extension * handle ml extension installation error. 
* add release branch test cases * add script to delete run history * cronjob test * cronjob test * checkout branch * test run history deletion script * test run history deletion script * test run history deletion script * azure login * date format change * remove double quotes * date format change * archive run history script tested * Add vnet-based provisioning options to cookbook (#128) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Make deployment name unique in our github actions (#135) * set unique name for deployments * add attempt to deployment name Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refactor compute/storage scripts to be independent (#132) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide motivation in provisioning docs for using service endpoints (#136) * add motivation for service endpoints * add link Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refresh provisioning arm buttons with latest from bicep (#139) * align names of directories * rebuild all arm Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update silo_vnet_newstorage.md (#141) * Add Bicep build vs ARM template diff test (#140) * Add diff test for bicep vs arm * Debug * Debug * fix syntax error * redirect build output to stdout * coorect path * trigger arm template test when pushing changes to main branch from release* branch * remove redundant logs * Add "open aks with cc" provision tutorial and bicep scripts (#138) * implement bicep scripts to provision open aks with cc * add aks cc tutorial * build arm and add in branch * add button Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide script + tutorial to attach pair with an existing storage (#142) * provision datastore with existing storage * add arm for existing storage, add docs * add link in readme Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add latest arm templates to diff build (#145) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implements 
provisioning script for a confidential compute VM jumpbox inside a vnet (debug) (#146) * add jumpbox script with tutorial * add template to diff build Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update jumpbox_cc.md (#147) * update tutorials for silos to integrate feedback (#149) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implement option to turn orchestrator storage fully private (behind PLE) (#150) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Tutorial on how to adapt native and factory code to write FL experiments. (#100) * WIP: add general information about the factory code * moving factory-tutorial to another file * add scenarios * add instructions on how to adapt literal code * rename file * add general info and fix typos * Jeff's feedback * Apply code clean-up to provision scripts before bug bash (#148) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Instructions for provisioning external silos (#101) * very first stab, far from done * non-secure native job using the on-prem k8s * use on-prem silos in example factory job * Revert "very first stab, far from done" This reverts commit e00d882. * Revert "use on-prem silos in example factory job" This reverts commit e2ef884. * Revert "non-secure native job using the on-prem k8s" This reverts commit 923e5f3. * restore doc stub * Make Git ignore resources for test jobs * fix gitignore * typo in comment * steps A through D * 2 typos * move to subdir * fix workspace creation * add orchestrator part, role, and timeline * last commit before PR * adjust to new open_azureml_workspace.bicep * first wave after Jeff's comments * address jeff's comments * typo * light trims Co-authored-by: thomasp-ms <XXX@me.com> * bump up every title * skeleton * first attempt at data prep like Harmke * change secret name * wrong secret name * remove separate unzip * change clients, create silo data assets * different names for silo data assets, duh * cleanup * adjust secret name in doc * . 
* use latest literal code * align environment with literal * base on latest component * one dataset, comment out 2 unused args (for now) * introduce new arguments * reflect modified args in component spec * remove unused arg from config * start hooking up to Harmke's trainer * initialize PTLearner and include in run.py * use same values as Harmke for epochs and lr * attributes with _, start implementing local_train * add loggings, add test(), fix device_ * train_loader_ * align _'s * fix transform bug * remove unused constants * use proper model in aggregation code * removed unused file * remove unused code and arguments, logging to DEBUG * restore `metrics_prefix` parameter * finish restoring `metrics_prefix` * do not duplicate model code * revert dedup attempt * improve docstrings and descriptions * change experiment name * change pipeline name and docstring * cite sources, remove wrongly added licenses * italics * black Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: unknown <Mitgarg17495@gmail.com> * update formatting * add readme section * rename training to traininsilo for consistency * add more comments and update docs * include urgency in PR template (#184) Co-authored-by: thomasp-ms <XXX@me.com> * Share resources and standardize component names (#182) * use shared agg component across all examples * only keep a single {reqs/env} * use more recent pip version * standardize component spec name * support dummy HELLOWORLD example is agg * black Co-authored-by: thomasp-ms <XXX@me.com> * Thopo/share component and environment (#185) * use shared agg component across all examples * only keep a single {reqs/env} * use more recent pip version * standardize component spec name * support dummy HELLOWORLD example is agg * black * SHARED -> utils, rename agg env Co-authored-by: thomasp-ms <XXX@me.com> * rename config to spec and add upload data step * upload 
data script * use util aggregateweights * add data splitting pipeline * docs update * log pipeline level only once per silo training * do categorical encoding ahead of splitting * nit updates * update comment * update formatting * Hotfix: grant `az login` permissions to the 'clear run history' script (#166) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias 
Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * simplify the job wait condition's code * add comments * trigger mnist pipeline check * test token validity * grant `az login` permissions to the clear-history script * revert to sleep wait code * test access token validity * nit Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * fix readme * aggregate weughts on whichever device is available * update docstrings * update formatting * reduce upload pipeline file * fix datastore * add info about data upload step * fix typo * steps for changing access policies * update docs * Named Entity Recognition example (#177) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs 
* fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add multinerd template files * NER components * re-structure * partition data + log metrics * add redme * add readme * restructuring * restructuring * add doc strings * train on gpus * create a separate component to upload data on silos * docs * rename * add assert statement * change upload-data job compute to orchestrator compute * remove ner from literal example choices * fix doc * add model-name, tokenizer configurable * pip version upgrade * reformatting * use shared aggregated component * rename script file * add note * create a compute that has access to silos' storage accs * change data uploading approach * update doc * incorporate Thomas's feedback * fix typo Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Create nice-looking homepage for the examples in readme+docs (#190) * add homepage for industry examples Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Align the medical imaging 
data provisioning process with other examples (#191) * adjust paths in config file * support component with 1 output for Pneumonia * formatting * adjust doc to new provisioning * remove GH action for dataprep * custom component for provisioning pneumonia data * black Co-authored-by: thomasp-ms <XXX@me.com> * hot fix (#192) Co-authored-by: thomasp-ms <XXX@me.com> * Lots of micro-fixes after bug bashing all 3 industry examples (#194) related to components: * create distinct names for all components of each scenario * polish component descriptions * remove unused mnist datatransfer and postprocessing components * upgrade all MCR images to a more recent OS * cut some unnecessary dependencies * use curated environments whenever possible (to speed up job build time) related to pipelines: * fix issues with ccfraud submit script (path to shared folder) * remove unnecessary json+azure imports in submit scripts * align all 3 submissions scripts * in upload data pipeline, make --example required without default value to force intentional decision * in upload data pipeline, use scenario name in the output path to avoid collision * give each submit pipeline a distinct experiment and run name for readability Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Standardize all 3 real world example tutorials (#193) * standardize documentation on all 3 examples * change titles * fix spaces * add pip instructions * upgrade azure-ai-ml version Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * provide correct link to Kaggle dataset (#196) * provide correct link * . * . 
Co-authored-by: thomasp-ms <XXX@me.com>
* generalize aggregate
* a bit more detail on help
* reformat to black
* feedback addressed

Co-authored-by: David Majercak <damajercak@microsoft.com>
Co-authored-by: Jeff Omhover <jf.omhover@gmail.com>
Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>
Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com>
Co-authored-by: thomasp-ms <XXX@me.com>
Co-authored-by: unknown <Mitgarg17495@gmail.com>
Co-authored-by: Dávid Majerčák <david1majercak@gmail.com>

* Implement generic FedAvg without model object (#167) * generic fedavg pytorch * support model classes * add docstrings Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * modifying test job * Thomas/small improvements (#171) * remove unused local MNIST data * add link to provisioning cookbook in docs readme * recommend creating a conda env in the quickstart Co-authored-by: thomasp-ms <XXX@me.com> * change experiment name * check in k8s config file * print all files and folders in the volume * another attempt * different param value for AKS (#179) Co-authored-by: thomasp-ms <XXX@me.com> * . * Pneumonia xray example (#164) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 
'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add more intuitive agg output dir path * reformat using black * add iteration2 branch for PR build testing * reformat date and pass kwargs instead in the getUniqueIdentifier fn * working submit * working factory submit * linting * move component path * add soft validation * add soft validation * Add basic tests on config * linting * working bicep deployment for vanilla demo * proper orchestrator script, double containers * fix name * docstring * docstring * rollback to using only 1 container * align naming convention * instructions * 
working submit * set up permission model * working orch perms * wonky perms assignment * working role assignments * remove old perm model * working except silo2orch * fix typo * working submit with config * add sku as param * use R/W for now * fix submit to align with bicep provisioning demo * linting * remove dataset files * fix docstring on permission model * write draft docs with homepage, align structure, remove requirements, ensure demo documented * rollback change to req * change factory to use custom model type during validation * linting * Display metrics at the pipeline level (#68) * Fix optional input yaml and mlflow log bugs (#59) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 
'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * Accomodate optional input chnages and switch from mlflow autologging to manual logging * code style * change optional inputs syntax Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Make changes to display all metrics at the pipeline level * Log preprocessing metadata in mlflow * linting * Pass client as an arg * Fix typo, rmv name from silo config, metric 
naming convention, and add metric identifier in the preprocessing component Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Remove redundant files from the mlops directory (#69) * Remove internal & external dir as provisioning is taken care by bicep * keep mnist data files * rename demo script (#71) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Unified documentation (#72) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * simplify sandbox script * simplify script, ensure it works * align config of native submit * align naming conventions between scripts, reinject rbac role * create test job for quickly debugging provisioning issues * fix tests * linting * move permissions to storage * align config with bicep scrits * Document the metrics panel of the pipeline overview in the quickstart (#76) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * linting * add docstrings and disclaimers * Add instructions on how to create a custom graph (#78) * WIP: unifying docs * Remove redundant doc file. 
We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * add instructions on how to create a custom graph * do better comments * Refine native code (#82) * fix silo name * log only one datapoint per iteration for an aggregated metrics * Align terminology for iteration/round/num_rounds * linting * use storage blob data contibutor * add demoBaseName to guid name of role deployment (#85) Co-authored-by: thomasp-ms <XXX@me.com> * use id list, add listkeys builtin * rename and dissociate orchestrator in resource + orchestrator * separate orchestrator script * draft sandbox setup * make silo script distinct * Update orchestrator_open.bicep * Update internal_blob_open.bicep * remove comments * align hello world example with new naming conventions * ensure uai assignments are created AFTER storage is created * linting * enforce precedence * merge from secure branch * use different regions, limit size of account * reduce to 3 regions, add keys to guid * substring * align config * do not use model * Add msi version of scripts * sandbox main can switch between uai and msi * fix name * linting * linting * implement ignore param, hotfix model with startswith * Address my own comments on Jeff's PR (#96) * remove magic number * little improvements on some comments * remove unused files * put dash replacement next to length check * don't necessarily assume USER AI * UAI -> XAI * revert previous UAI -> XAI changes * move length check next to dash replacement * typo * try movind the dependsOn's * RAGRS -> LRS * revert dependsON changes * revert another small change in a comment Co-authored-by: thomasp-ms <XXX@me.com> * align config of both submit scripts * Make distinction between on-off 
and repeatable provisioning scripts (#99) * clarify the role needed * remove "custom role" line * adjust locations * use existing rg if not Owner of the sub * clarify "Secure" setup * add usage instructions in docstring * explain what scripts are one-off (vs repeatable) Co-authored-by: thomasp-ms <XXX@me.com> * Align round/iteration terminology with the native code (#103) * rename parameter in config file * keep iterations instead of rounds * round -> iteration Co-authored-by: thomasp-ms <XXX@me.com> * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * align both submits to work * add optional test * rename native to literal * add getting started in readme, introduce emojis * change person * remove emojs * Propose rewriting of readme to highlight motivation first (#110) * propose rewriting of readme to highlight motivation first * minor edit Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update README.md * Update quickstart to mention rg clean-up * Update quickstart.md * Update quickstart.md * Update quickstart.md * Build bicep scripts as ARM template, add Azure Buttons to quickstart (#120) * Update quickstart to lower header (hotfix) (#117) * add arm templates, add button in quickstart * switch to releasebranchlink Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Add subscription id, resource group and workspace name as CLI args (#122) * add more cli args * code style * code style * update quickstart doc * update readme * Initiate provisioning "cookbook" with list of provisioning scenarios + example (#123) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Continuous Integration Tests (#119) * take values of subscription id, rs grp, ws name, etc from github secrets and submit a native pipeline * change path * Test azure creds in the github workflow * reformatting * add pipeline validation and testing workflow * add permissions * add 
permissions * check only certain dir to trigger workflows * add soft validation for any iteration branch PR * add provisioning script test * testing * create rg * create rg * change compute for testing * change demoname * delete old rg * change demoname * add demobasename and aml ws name as github secrets * random demo base name * auto generate random base name * random demo base name * adjust random num length * add vnet sandbox test * rmv dependency b/w jobs * submit various pipelines * change execution graph path * add cli args in the factory code * change compute for testing * ignore validation - factory * create custom action * correct path * correct path * add shell in the github action * create github actions and take required values as input params * add shell * add wait condition * add logs * linting * correct rg name * add azure ml extension * handle ml extension installation error. * add release branch test cases * add script to delete run history * cronjob test * cronjob test * checkout branch * test run history deletion script * test run history deletion script * test run history deletion script * azure login * date format change * remove double quotes * date format change * archive run history script tested * Add vnet-based provisioning options to cookbook (#128) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Make deployment name unique in our github actions (#135) * set unique name for deployments * add attempt to deployment name Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refactor compute/storage scripts to be independent (#132) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide motivation in provisioning docs for using service endpoints (#136) * add motivation for service endpoints * add link Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refresh provisioning arm buttons with latest from bicep (#139) * align names of directories * rebuild all arm Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * 
Update silo_vnet_newstorage.md (#141) * Add Bicep build vs ARM template diff test (#140) * Add diff test for bicep vs arm * Debug * Debug * fix syntax error * redirect build output to stdout * coorect path * trigger arm template test when pushing changes to main branch from release* branch * remove redundant logs * Add "open aks with cc" provision tutorial and bicep scripts (#138) * implement bicep scripts to provision open aks with cc * add aks cc tutorial * build arm and add in branch * add button Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide script + tutorial to attach pair with an existing storage (#142) * provision datastore with existing storage * add arm for existing storage, add docs * add link in readme Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add latest arm templates to diff build (#145) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implements provisioning script for a confidential compute VM jumpbox inside a vnet (debug) (#146) * add jumpbox script with tutorial * add template to diff build Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update jumpbox_cc.md (#147) * update tutorials for silos to integrate feedback (#149) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implement option to turn orchestrator storage fully private (behind PLE) (#150) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Tutorial on how to adapt native and factory code to write FL experiments. 
(#100) * WIP: add general information about the factory code * moving factory-tutorial to another file * add scenarios * add instructions on how to adapt literal code * rename file * add general info and fix typos * Jeff's feedback * Apply code clean-up to provision scripts before bug bash (#148) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Instructions for provisioning external silos (#101) * very first stab, far from done * non-secure native job using the on-prem k8s * use on-prem silos in example factory job * Revert "very first stab, far from done" This reverts commit e00d882. * Revert "use on-prem silos in example factory job" This reverts commit e2ef884. * Revert "non-secure native job using the on-prem k8s" This reverts commit 923e5f3. * restore doc stub * Make Git ignore resources for test jobs * fix gitignore * typo in comment * steps A through D * 2 typos * move to subdir * fix workspace creation * add orchestrator part, role, and timeline * last commit before PR * adjust to new open_azureml_workspace.bicep * first wave after Jeff's comments * address jeff's comments * typo * light trims Co-authored-by: thomasp-ms <XXX@me.com> * bump up every title * skeleton * first attempt at data prep like Harmke * change secret name * wrong secret name * remove separate unzip * change clients, create silo data assets * different names for silo data assets, duh * cleanup * adjust secret name in doc * . 
* use latest literal code * align environment with literal * base on latest component * one dataset, comment out 2 unused args (for now) * introduce new arguments * reflect modified args in component spec * remove unused arg from config * start hooking up to Harmke's trainer * initialize PTLearner and include in run.py * use same values as Harmke for epochs and lr * attributes with _, start implementing local_train * add loggings, add test(), fix device_ * train_loader_ * align _'s * fix transform bug * remove unused constants * use proper model in aggregation code * removed unused file * remove unused code and arguments, logging to DEBUG * restore `metrics_prefix` parameter * finish restoring `metrics_prefix` * do not duplicate model code * revert dedup attempt * improve docstrings and descriptions * change experiment name * change pipeline name and docstring * cite sources, remove wrongly added licenses * italics * black Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: unknown <Mitgarg17495@gmail.com> * include urgency in PR template (#184) Co-authored-by: thomasp-ms <XXX@me.com> * Share resources and standardize component names (#182) * use shared agg component across all examples * only keep a single {reqs/env} * use more recent pip version * standardize component spec name * support dummy HELLOWORLD example is agg * black Co-authored-by: thomasp-ms <XXX@me.com> * Thopo/share component and environment (#185) * use shared agg component across all examples * only keep a single {reqs/env} * use more recent pip version * standardize component spec name * support dummy HELLOWORLD example is agg * black * SHARED -> utils, rename agg env Co-authored-by: thomasp-ms <XXX@me.com> * read in proper location * move to new directories * restore HELLOWORLD component * black * restore literal job * really restore literal job * eof * . 
* doc stub
* introduce new example in the readme
* inconsistencies when using different base branch
* .
* reference k8s tutorial in ext-silo doc
* yml templates and description
* provisioning sections
* .
* add note, fix typo
* remove duplicated file
* cleaned up component
* black
* pipeline
* clean up config
* fix bug of non-existing output directory
* second attempt at bug fix
* instructions on how to run test job
* clean up instructions

Co-authored-by: Jeff Omhover <jf.omhover@gmail.com>
Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>
Co-authored-by: thomasp-ms <XXX@me.com>
Co-authored-by: unknown <Mitgarg17495@gmail.com>

* refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add more intuitive agg output dir path * reformat using black * add iteration2 branch for PR build testing * reformat date and pass kwargs instead in the getUniqueIdentifier fn * working submit * working factory submit * linting * move component path * add soft validation * add soft validation * Add basic tests on config * linting * working bicep deployment for vanilla demo * proper orchestrator script, double containers * fix name * docstring * docstring * rollback to using only 1 container * align naming convention * instructions * working submit * set up permission model * working orch perms * wonky perms assignment * working role assignments * remove old perm model * working except silo2orch * fix typo * working submit with config * add sku as param * use R/W for now * fix submit to align with bicep provisioning demo * linting * remove dataset files * fix docstring on permission model * write draft docs with homepage, align structure, remove requirements, ensure demo documented * rollback change to req * change factory to use custom model type during 
validation * linting * Display metrics at the pipeline level (#68) * Fix optional input yaml and mlflow log bugs (#59) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. 
Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * Accomodate optional input chnages and switch from mlflow autologging to manual logging * code style * change optional inputs syntax Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Make changes to display all metrics at the pipeline level * Log preprocessing metadata in mlflow * linting * Pass client as an arg * Fix typo, rmv name from silo config, metric naming convention, and add metric identifier in the preprocessing component Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Remove redundant files from the mlops directory (#69) * Remove internal & external dir as provisioning is taken care by bicep * keep mnist data files * rename demo script (#71) Co-authored-by: 
Jeff Omhover <jeomhove@microsoft.com> * Unified documentation (#72) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * simplify sandbox script * simplify script, ensure it works * align config of native submit * align naming conventions between scripts, reinject rbac role * create test job for quickly debugging provisioning issues * fix tests * linting * move permissions to storage * align config with bicep scrits * Document the metrics panel of the pipeline overview in the quickstart (#76) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * linting * add docstrings and disclaimers * Add instructions on how to create a custom graph (#78) * WIP: unifying docs * Remove redundant doc file. 
We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * add instructions on how to create a custom graph * do better comments * Refine native code (#82) * fix silo name * log only one datapoint per iteration for an aggregated metrics * Align terminology for iteration/round/num_rounds * linting * use storage blob data contibutor * add demoBaseName to guid name of role deployment (#85) Co-authored-by: thomasp-ms <XXX@me.com> * use id list, add listkeys builtin * rename and dissociate orchestrator in resource + orchestrator * separate orchestrator script * draft sandbox setup * make silo script distinct * Update orchestrator_open.bicep * Update internal_blob_open.bicep * remove comments * align hello world example with new naming conventions * ensure uai assignments are created AFTER storage is created * linting * enforce precedence * merge from secure branch * use different regions, limit size of account * reduce to 3 regions, add keys to guid * substring * align config * do not use model * Add msi version of scripts * sandbox main can switch between uai and msi * fix name * linting * linting * implement ignore param, hotfix model with startswith * Address my own comments on Jeff's PR (#96) * remove magic number * little improvements on some comments * remove unused files * put dash replacement next to length check * don't necessarily assume USER AI * UAI -> XAI * revert previous UAI -> XAI changes * move length check next to dash replacement * typo * try movind the dependsOn's * RAGRS -> LRS * revert dependsON changes * revert another small change in a comment Co-authored-by: thomasp-ms <XXX@me.com> * align config of both submit scripts * Make distinction between on-off 
and repeatable provisioning scripts (#99) * clarify the role needed * remove "custom role" line * adjust locations * use existing rg if not Owner of the sub * clarify "Secure" setup * add usage instructions in docstring * explain what scripts are one-off (vs repeatable) Co-authored-by: thomasp-ms <XXX@me.com> * Align round/iteration terminology with the native code (#103) * rename parameter in config file * keep iterations instead of rounds * round -> iteration Co-authored-by: thomasp-ms <XXX@me.com> * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * align both submits to work * add optional test * rename native to literal * add getting started in readme, introduce emojis * change person * remove emojs * Propose rewriting of readme to highlight motivation first (#110) * propose rewriting of readme to highlight motivation first * minor edit Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update README.md * Update quickstart to mention rg clean-up * Update quickstart.md * Update quickstart.md * Update quickstart.md * Build bicep scripts as ARM template, add Azure Buttons to quickstart (#120) * Update quickstart to lower header (hotfix) (#117) * add arm templates, add button in quickstart * switch to releasebranchlink Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Add subscription id, resource group and workspace name as CLI args (#122) * add more cli args * code style * code style * update quickstart doc * update readme * Initiate provisioning "cookbook" with list of provisioning scenarios + example (#123) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Continuous Integration Tests (#119) * take values of subscription id, rs grp, ws name, etc from github secrets and submit a native pipeline * change path * Test azure creds in the github workflow * reformatting * add pipeline validation and testing workflow * add permissions * add 
permissions * check only certain dir to trigger workflows * add soft validation for any iteration branch PR * add provisioning script test * testing * create rg * create rg * change compute for testing * change demoname * delete old rg * change demoname * add demobasename and aml ws name as github secrets * random demo base name * auto generate random base name * random demo base name * adjust random num length * add vnet sandbox test * rmv dependency b/w jobs * submit various pipelines * change execution graph path * add cli args in the factory code * change compute for testing * ignore validation - factory * create custom action * correct path * correct path * add shell in the github action * create github actions and take required values as input params * add shell * add wait condition * add logs * linting * correct rg name * add azure ml extension * handle ml extension installation error. * add release branch test cases * add script to delete run history * cronjob test * cronjob test * checkout branch * test run history deletion script * test run history deletion script * test run history deletion script * azure login * date format change * remove double quotes * date format change * archive run history script tested * Add vnet-based provisioning options to cookbook (#128) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Make deployment name unique in our github actions (#135) * set unique name for deployments * add attempt to deployment name Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refactor compute/storage scripts to be independent (#132) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide motivation in provisioning docs for using service endpoints (#136) * add motivation for service endpoints * add link Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refresh provisioning arm buttons with latest from bicep (#139) * align names of directories * rebuild all arm Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * 
Update silo_vnet_newstorage.md (#141) * Add Bicep build vs ARM template diff test (#140) * Add diff test for bicep vs arm * Debug * Debug * fix syntax error * redirect build output to stdout * coorect path * trigger arm template test when pushing changes to main branch from release* branch * remove redundant logs * Add "open aks with cc" provision tutorial and bicep scripts (#138) * implement bicep scripts to provision open aks with cc * add aks cc tutorial * build arm and add in branch * add button Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide script + tutorial to attach pair with an existing storage (#142) * provision datastore with existing storage * add arm for existing storage, add docs * add link in readme Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add latest arm templates to diff build (#145) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implements provisioning script for a confidential compute VM jumpbox inside a vnet (debug) (#146) * add jumpbox script with tutorial * add template to diff build Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update jumpbox_cc.md (#147) * update tutorials for silos to integrate feedback (#149) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implement option to turn orchestrator storage fully private (behind PLE) (#150) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Tutorial on how to adapt native and factory code to write FL experiments. 
(#100) * WIP: add general information about the factory code * moving factory-tutorial to another file * add scenarios * add instructions on how to adapt literal code * rename file * add general info and fix typos * Jeff's feedback * Apply code clean-up to provision scripts before bug bash (#148) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Instructions for provisioning external silos (#101) * very first stab, far from done * non-secure native job using the on-prem k8s * use on-prem silos in example factory job * Revert "very first stab, far from done" This reverts commit e00d882. * Revert "use on-prem silos in example factory job" This reverts commit e2ef884. * Revert "non-secure native job using the on-prem k8s" This reverts commit 923e5f3. * restore doc stub * Make Git ignore resources for test jobs * fix gitignore * typo in comment * steps A through D * 2 typos * move to subdir * fix workspace creation * add orchestrator part, role, and timeline * last commit before PR * adjust to new open_azureml_workspace.bicep * first wave after Jeff's comments * address jeff's comments * typo * light trims Co-authored-by: thomasp-ms <XXX@me.com> * bump up every title * skeleton * first attempt at data prep like Harmke * change secret name * wrong secret name * remove separate unzip * change clients, create silo data assets * different names for silo data assets, duh * cleanup * adjust secret name in doc * . 
* use latest literal code * align environment with literal * base on latest component * one dataset, comment out 2 unused args (for now) * introduce new arguments * reflect modified args in component spec * remove unused arg from config * start hooking up to Harmke's trainer * initialize PTLearner and include in run.py * use same values as Harmke for epochs and lr * attributes with _, start implementing local_train * add loggings, add test(), fix device_ * train_loader_ * align _'s * fix transform bug * remove unused constants * use proper model in aggregation code * removed unused file * remove unused code and arguments, logging to DEBUG * restore `metrics_prefix` parameter * finish restoring `metrics_prefix` * do not duplicate model code * revert dedup attempt * improve docstrings and descriptions * change experiment name * change pipeline name and docstring * cite sources, remove wrongly added licenses * italics * black Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: unknown <Mitgarg17495@gmail.com>

* init branch * wip data exploration * data exploration region/silo * basic model * regions * basic network and finished data processing * training * Implement generic FedAvg without model object (#167) * generic fedavg pytorch * support model classes * add docstrings Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add README * update normalization * update exploration * Thomas/small improvements (#171) * remove unused local MNIST data * add link to provisioning cookbook in docs readme * recommend creating a conda env in the quickstart Co-authored-by: thomasp-ms <XXX@me.com> * update example for finance with multiple models * successful training through lstm * revert unneeded changes * remove local exploration ipynb * fix test metric * different param value for AKS (#179) Co-authored-by: thomasp-ms <XXX@me.com> * Pneumonia xray example (#164) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker 
file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add more intuitive agg output dir path * reformat using black * add iteration2 branch for PR build testing * reformat date and pass kwargs instead in the getUniqueIdentifier fn * working submit * working factory submit * linting * move component path * add soft validation * add soft validation * Add basic tests on 
config * linting * working bicep deployment for vanilla demo * proper orchestrator script, double containers * fix name * docstring * docstring * rollback to using only 1 container * align naming convention * instructions * working submit * set up permission model * working orch perms * wonky perms assignment * working role assignments * remove old perm model * working except silo2orch * fix typo * working submit with config * add sku as param * use R/W for now * fix submit to align with bicep provisioning demo * linting * remove dataset files * fix docstring on permission model * write draft docs with homepage, align structure, remove requirements, ensure demo documented * rollback change to req * change factory to use custom model type during validation * linting * Display metrics at the pipeline level (#68) * Fix optional input yaml and mlflow log bugs (#59) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core 
in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * Accomodate optional input chnages and switch from mlflow autologging to manual logging * code style * change optional inputs syntax Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> 
Co-authored-by: thomasp-ms <XXX@me.com> * Make changes to display all metrics at the pipeline level * Log preprocessing metadata in mlflow * linting * Pass client as an arg * Fix typo, rmv name from silo config, metric naming convention, and add metric identifier in the preprocessing component Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Remove redundant files from the mlops directory (#69) * Remove internal & external dir as provisioning is taken care by bicep * keep mnist data files * rename demo script (#71) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Unified documentation (#72) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * simplify sandbox script * simplify script, ensure it works * align config of native submit * align naming conventions between scripts, reinject rbac role * create test job for quickly debugging provisioning issues * fix tests * linting * move permissions to storage * align config with bicep scrits * Document the metrics panel of the pipeline overview in the quickstart (#76) * WIP: unifying docs * Remove redundant doc file. 
We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * linting * add docstrings and disclaimers * Add instructions on how to create a custom graph (#78) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * add instructions on how to create a custom graph * do better comments * Refine native code (#82) * fix silo name * log only one datapoint per iteration for an aggregated metrics * Align terminology for iteration/round/num_rounds * linting * use storage blob data contibutor * add demoBaseName to guid name of role deployment (#85) Co-authored-by: thomasp-ms <XXX@me.com> * use id list, add listkeys builtin * rename and dissociate orchestrator in resource + orchestrator * separate orchestrator script * draft sandbox setup * make silo script distinct * Update orchestrator_open.bicep * Update internal_blob_open.bicep * remove comments * align hello world example with new naming conventions * ensure uai assignments are created AFTER storage is created * linting * enforce precedence * merge from secure branch * use different regions, limit size of account * reduce to 3 regions, add keys to guid * substring * align config * do not use model * Add msi version of scripts * sandbox main can switch between uai and msi * fix name * linting * linting * implement ignore param, hotfix model with startswith * Address my own comments on 
Jeff's PR (#96) * remove magic number * little improvements on some comments * remove unused files * put dash replacement next to length check * don't necessarily assume USER AI * UAI -> XAI * revert previous UAI -> XAI changes * move length check next to dash replacement * typo * try movind the dependsOn's * RAGRS -> LRS * revert dependsON changes * revert another small change in a comment Co-authored-by: thomasp-ms <XXX@me.com> * align config of both submit scripts * Make distinction between on-off and repeatable provisioning scripts (#99) * clarify the role needed * remove "custom role" line * adjust locations * use existing rg if not Owner of the sub * clarify "Secure" setup * add usage instructions in docstring * explain what scripts are one-off (vs repeatable) Co-authored-by: thomasp-ms <XXX@me.com> * Align round/iteration terminology with the native code (#103) * rename parameter in config file * keep iterations instead of rounds * round -> iteration Co-authored-by: thomasp-ms <XXX@me.com> * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * align both submits to work * add optional test * rename native to literal * add getting started in readme, introduce emojis * change person * remove emojs * Propose rewriting of readme to highlight motivation first (#110) * propose rewriting of readme to highlight motivation first * minor edit Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update README.md * Update quickstart to mention rg clean-up * Update quickstart.md * Update quickstart.md * Update quickstart.md * Build bicep scripts as ARM template, add Azure Buttons to quickstart (#120) * Update quickstart to lower header (hotfix) (#117) * add arm templates, add button in quickstart * switch to releasebranchlink Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Add subscription id, resource group and workspace name as CLI args (#122) * add more 
cli args * code style * code style * update quickstart doc * update readme * Initiate provisioning "cookbook" with list of provisioning scenarios + example (#123) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Continuous Integration Tests (#119) * take values of subscription id, rs grp, ws name, etc from github secrets and submit a native pipeline * change path * Test azure creds in the github workflow * reformatting * add pipeline validation and testing workflow * add permissions * add permissions * check only certain dir to trigger workflows * add soft validation for any iteration branch PR * add provisioning script test * testing * create rg * create rg * change compute for testing * change demoname * delete old rg * change demoname * add demobasename and aml ws name as github secrets * random demo base name * auto generate random base name * random demo base name * adjust random num length * add vnet sandbox test * rmv dependency b/w jobs * submit various pipelines * change execution graph path * add cli args in the factory code * change compute for testing * ignore validation - factory * create custom action * correct path * correct path * add shell in the github action * create github actions and take required values as input params * add shell * add wait condition * add logs * linting * correct rg name * add azure ml extension * handle ml extension installation error. 
* add release branch test cases * add script to delete run history * cronjob test * cronjob test * checkout branch * test run history deletion script * test run history deletion script * test run history deletion script * azure login * date format change * remove double quotes * date format change * archive run history script tested * Add vnet-based provisioning options to cookbook (#128) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Make deployment name unique in our github actions (#135) * set unique name for deployments * add attempt to deployment name Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refactor compute/storage scripts to be independent (#132) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide motivation in provisioning docs for using service endpoints (#136) * add motivation for service endpoints * add link Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refresh provisioning arm buttons with latest from bicep (#139) * align names of directories * rebuild all arm Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update silo_vnet_newstorage.md (#141) * Add Bicep build vs ARM template diff test (#140) * Add diff test for bicep vs arm * Debug * Debug * fix syntax error * redirect build output to stdout * coorect path * trigger arm template test when pushing changes to main branch from release* branch * remove redundant logs * Add "open aks with cc" provision tutorial and bicep scripts (#138) * implement bicep scripts to provision open aks with cc * add aks cc tutorial * build arm and add in branch * add button Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide script + tutorial to attach pair with an existing storage (#142) * provision datastore with existing storage * add arm for existing storage, add docs * add link in readme Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add latest arm templates to diff build (#145) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implements 
provisioning script for a confidential compute VM jumpbox inside a vnet (debug) (#146) * add jumpbox script with tutorial * add template to diff build Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update jumpbox_cc.md (#147) * update tutorials for silos to integrate feedback (#149) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implement option to turn orchestrator storage fully private (behind PLE) (#150) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Tutorial on how to adapt native and factory code to write FL experiments. (#100) * WIP: add general information about the factory code * moving factory-tutorial to another file * add scenarios * add instructions on how to adapt literal code * rename file * add general info and fix typos * Jeff's feedback * Apply code clean-up to provision scripts before bug bash (#148) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Instructions for provisioning external silos (#101) * very first stab, far from done * non-secure native job using the on-prem k8s * use on-prem silos in example factory job * Revert "very first stab, far from done" This reverts commit e00d882dfee6a348eb89cd63e339a051b85ce0ca. * Revert "use on-prem silos in example factory job" This reverts commit e2ef8841c6be25a6c84b57ae079cca8f361323fe. * Revert "non-secure native job using the on-prem k8s" This reverts commit 923e5f321d28b30d8cd9759c47a7ffe5457e3284. 
* restore doc stub * Make Git ignore resources for test jobs * fix gitignore * typo in comment * steps A through D * 2 typos * move to subdir * fix workspace creation * add orchestrator part, role, and timeline * last commit before PR * adjust to new open_azureml_workspace.bicep * first wave after Jeff's comments * address jeff's comments * typo * light trims Co-authored-by: thomasp-ms <XXX@me.com> * bump up every title * skeleton * first attempt at data prep like Harmke * change secret name * wrong secret name * remove separate unzip * change clients, create silo data assets * different names for silo data assets, duh * cleanup * adjust secret name in doc * . * use latest literal code * align environment with literal * base on latest component * one dataset, comment out 2 unused args (for now) * introduce new arguments * reflect modified args in component spec * remove unused arg from config * start hooking up to Harmke's trainer * initialize PTLearner and include in run.py * use same values as Harmke for epochs and lr * attributes with _, start implementing local_train * add loggings, add test(), fix device_ * train_loader_ * align _'s * fix transform bug * remove unused constants * use proper model in aggregation code * removed unused file * remove unused code and arguments, logging to DEBUG * restore `metrics_prefix` parameter * finish restoring `metrics_prefix` * do not duplicate model code * revert dedup attempt * improve docstrings and descriptions * change experiment name * change pipeline name and docstring * cite sources, remove wrongly added licenses * italics * black Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: unknown <Mitgarg17495@gmail.com> * update formatting * add readme section * rename training to traininsilo for consistency * add more comments and update docs * include urgency in PR template (#184) Co-authored-by: thomasp-ms 
<XXX@me.com> * Share resources and standardize component names (#182) * use shared agg component across all examples * only keep a single {reqs/env} * use more recent pip version * standardize component spec name * support dummy HELLOWORLD example is agg * black Co-authored-by: thomasp-ms <XXX@me.com> * Thopo/share component and environment (#185) * use shared agg component across all examples * only keep a single {reqs/env} * use more recent pip version * standardize component spec name * support dummy HELLOWORLD example is agg * black * SHARED -> utils, rename agg env Co-authored-by: thomasp-ms <XXX@me.com> * rename config to spec and add upload data step * upload data script * use util aggregateweights * add data splitting pipeline * docs update * log pipeline level only once per silo training * do categorical encoding ahead of splitting * nit updates * update comment * update formatting * provide script to create cpu+gpu computes together * Hotfix: grant `az login` permissions to the 'clear run history' script (#166) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove 
executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * simplify the job wait condition's code * add comments * trigger mnist pipeline check * test token validity * grant `az login` 
permissions to the clear-history script * revert to sleep wait code * test access token validity * nit Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * fix readme * aggregate weughts on whichever device is available * update docstrings * update formatting * cpu gpu computes having the same uai * reduce upload pipeline file * fix datastore * add info about data upload step * fix typo * steps for changing access policies * update docs * add option to create gpu computes * Named Entity Recognition example (#177) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) 
* `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add multinerd template files * NER components * re-structure * partition data + log metrics * add redme * add readme * restructuring * restructuring * add doc strings * train on gpus * create a separate component to upload data on silos * docs * rename * add assert statement * change upload-data job compute to orchestrator compute * remove ner from literal example choices * fix doc * add model-name, tokenizer configurable * pip version upgrade * reformatting * use shared aggregated component * 
rename script file * add note * create a compute that has access to silos' storage accs * change data uploading approach * update doc * incorporate Thomas's feedback * fix typo Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Create nice-looking homepage for the examples in readme+docs (#190) * add homepage for industry examples Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Align the medical imaging data provisioning process with other examples (#191) * adjust paths in config file * support component with 1 output for Pneumonia * formatting * adjust doc to new provisioning * remove GH action for dataprep * custom component for provisioning pneumonia data * black Co-authored-by: thomasp-ms <XXX@me.com> * hot fix (#192) Co-authored-by: thomasp-ms <XXX@me.com> * Lots of micro-fixes after bug bashing all 3 industry examples (#194) related to components: * create distinct names for all components of each scenario * polish component descriptions * remove unused mnist datatransfer and postprocessing components * upgrade all MCR images to a more recent OS * cut some unnecessary dependencies * use curated environments whenever possible (to speed up job build time) related to pipelines: * fix issues with ccfraud submit script (path to shared folder) * remove unnecessary json+azure imports in submit scripts * align all 3 submissions scripts * in upload data pipeline, make --example required without default value to force intentional decision * in upload data pipeline, use scenario name in the output path to avoid collision * give each submit pipeline a distinct experiment and run name for readability Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Standardize all 3 real world example tutorials (#193) * standardize documentation on all 3 examples * change titles * fix spaces * add pip 
instructions * upgrade azure-ai-ml version Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * provide correct link to Kaggle dataset (#196) * provide correct link * . * . Co-authored-by: thomasp-ms <XXX@me.com> * Add CI tests for industry-relevant examples (#186) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> 
* Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add pneumonia and ner examples tests * add ccfraud test in the CI/CD pipeline * add data upload test * trigger workflow * CI testing1 * CI testing1 * test kv kaggle creds * fix creds * fix creds * set kaggle creds * test pneumonia data-upload * test all industry relevant examples * upload data test for 3 examples * add main tests * rmv redundant chrs * fix typo * avoid industry relevant examples tests to run on the vnet setup as it is already covered by the open setup Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * CLI commands to add credentials in the workspace keyvault (#199) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add cli cmds to set a kv secret * Jeff's feedback * Implement Thomas's feedback Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Thomas/bug bash feedback 04 (#203) * no need to navigate to a specific directory * keyvault -> key vault * improve Kaggle sections * GPU's for NER example * ARM templates with latest bicep version * bold * GPU instructions in quickstart Co-authored-by: thomasp-ms <XXX@me.com> * fix test to align with new sdk (#204) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Hotfix: DataAccessError (orchestrator access) (#205) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to 
submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * fix bug * update arm template * fix a problem that was encountered during resolving merge conflicts Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Add GitHub Action workflow concurrency and implement token expiration policy workaround (#200) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add GitHub workflow concurrency * test 1 * test 1 * test 1 * test 2 * test 3 * test 2 * test 3 * implement token expiry workaround * test 1 * workaround to handle token expiry error * fix typo Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Implement troubleshooting guide with first typical issues (#208) * write troubleshooting guide Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Fix order of precedence for AML workspace references in submit.py (#209) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to 
build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * fix order of precedence * fix build Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Add data permissions issue to TSG (#210) * add permissions issue to TSG Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * November notes (#211) Co-authored-by: thomasp-ms <XXX@me.com> * Standarize compute names and allow upto 2 computes to be created for orchestrator and silos * upgrade all pip dependencies (#212) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * generalize compute and datastore names * align config files to provisioning scripts * change compute sku in ci/cd * temp. changes to test cpu gpu computes * vnet compute 1 name change * temp. 
changes to test cpu gpu computes * change compute regions * test example pipelines * test example pipelines * test example pipelines * test example pipelines * implement compute2 settings for the vnet setup * implement compute2 settings for the vnet setup * test vnet setup * change computesku * test industry relevant examples with the vnet setup * update vnet compute with existing storage * revert back github workflow changes * final testing * revert temp. workflow changes * test kaggle kv * test open setup * update arm templates Co-authored-by: David Majercak <damajercak@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: Dávid Majerčák <david1majercak@gmail.com>

* basic tabular data analysis component * improve plots and docs in data analytics * format update * format update * init branch * wip data exploration * data exploration region/silo * basic model * regions * basic network and finished data processing * training * add README * update normalization * update exploration * update example for finance with multiple models * Implement generic FedAvg without model object (#167) * generic fedavg pytorch * support model classes * add docstrings Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * successful training through lstm * revert unneeded changes * remove local exploration ipynb * fix test metric * update formatting * Pneumonia xray example (#164) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * 
Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add more intuitive agg output dir path * reformat using black * add iteration2 branch for PR build testing * reformat date and pass kwargs instead in the getUniqueIdentifier fn * working submit * working factory submit * linting * move component path * add soft validation * add soft validation * Add basic tests on config * linting * working bicep deployment for vanilla demo * proper orchestrator script, double containers * fix name * docstring * docstring * 
rollback to using only 1 container * align naming convention * instructions * working submit * set up permission model * working orch perms * wonky perms assignment * working role assignments * remove old perm model * working except silo2orch * fix typo * working submit with config * add sku as param * use R/W for now * fix submit to align with bicep provisioning demo * linting * remove dataset files * fix docstring on permission model * write draft docs with homepage, align structure, remove requirements, ensure demo documented * rollback change to req * change factory to use custom model type during validation * linting * Display metrics at the pipeline level (#68) * Fix optional input yaml and mlflow log bugs (#59) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms 
<XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * Accomodate optional input chnages and switch from mlflow autologging to manual logging * code style * change optional inputs syntax Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Make changes to display all metrics at the pipeline level * Log preprocessing metadata in mlflow * 
linting * Pass client as an arg * Fix typo, rmv name from silo config, metric naming convention, and add metric identifier in the preprocessing component Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Remove redundant files from the mlops directory (#69) * Remove internal & external dir as provisioning is taken care by bicep * keep mnist data files * rename demo script (#71) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Unified documentation (#72) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * simplify sandbox script * simplify script, ensure it works * align config of native submit * align naming conventions between scripts, reinject rbac role * create test job for quickly debugging provisioning issues * fix tests * linting * move permissions to storage * align config with bicep scrits * Document the metrics panel of the pipeline overview in the quickstart (#76) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * linting * add docstrings and disclaimers * Add instructions on how to create a custom graph (#78) * WIP: unifying docs * Remove redundant doc file. 
We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * add instructions on how to create a custom graph * do better comments * Refine native code (#82) * fix silo name * log only one datapoint per iteration for an aggregated metrics * Align terminology for iteration/round/num_rounds * linting * use storage blob data contibutor * add demoBaseName to guid name of role deployment (#85) Co-authored-by: thomasp-ms <XXX@me.com> * use id list, add listkeys builtin * rename and dissociate orchestrator in resource + orchestrator * separate orchestrator script * draft sandbox setup * make silo script distinct * Update orchestrator_open.bicep * Update internal_blob_open.bicep * remove comments * align hello world example with new naming conventions * ensure uai assignments are created AFTER storage is created * linting * enforce precedence * merge from secure branch * use different regions, limit size of account * reduce to 3 regions, add keys to guid * substring * align config * do not use model * Add msi version of scripts * sandbox main can switch between uai and msi * fix name * linting * linting * implement ignore param, hotfix model with startswith * Address my own comments on Jeff's PR (#96) * remove magic number * little improvements on some comments * remove unused files * put dash replacement next to length check * don't necessarily assume USER AI * UAI -> XAI * revert previous UAI -> XAI changes * move length check next to dash replacement * typo * try movind the dependsOn's * RAGRS -> LRS * revert dependsON changes * revert another small change in a comment Co-authored-by: thomasp-ms <XXX@me.com> * align config of both submit scripts * Make distinction between on-off 
and repeatable provisioning scripts (#99) * clarify the role needed * remove "custom role" line * adjust locations * use existing rg if not Owner of the sub * clarify "Secure" setup * add usage instructions in docstring * explain what scripts are one-off (vs repeatable) Co-authored-by: thomasp-ms <XXX@me.com> * Align round/iteration terminology with the native code (#103) * rename parameter in config file * keep iterations instead of rounds * round -> iteration Co-authored-by: thomasp-ms <XXX@me.com> * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * align both submits to work * add optional test * rename native to literal * add getting started in readme, introduce emojis * change person * remove emojs * Propose rewriting of readme to highlight motivation first (#110) * propose rewriting of readme to highlight motivation first * minor edit Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update README.md * Update quickstart to mention rg clean-up * Update quickstart.md * Update quickstart.md * Update quickstart.md * Build bicep scripts as ARM template, add Azure Buttons to quickstart (#120) * Update quickstart to lower header (hotfix) (#117) * add arm templates, add button in quickstart * switch to releasebranchlink Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Add subscription id, resource group and workspace name as CLI args (#122) * add more cli args * code style * code style * update quickstart doc * update readme * Initiate provisioning "cookbook" with list of provisioning scenarios + example (#123) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Continuous Integration Tests (#119) * take values of subscription id, rs grp, ws name, etc from github secrets and submit a native pipeline * change path * Test azure creds in the github workflow * reformatting * add pipeline validation and testing workflow * add permissions * add 
permissions * check only certain dir to trigger workflows * add soft validation for any iteration branch PR * add provisioning script test * testing * create rg * create rg * change compute for testing * change demoname * delete old rg * change demoname * add demobasename and aml ws name as github secrets * random demo base name * auto generate random base name * random demo base name * adjust random num length * add vnet sandbox test * rmv dependency b/w jobs * submit various pipelines * change execution graph path * add cli args in the factory code * change compute for testing * ignore validation - factory * create custom action * correct path * correct path * add shell in the github action * create github actions and take required values as input params * add shell * add wait condition * add logs * linting * correct rg name * add azure ml extension * handle ml extension installation error. * add release branch test cases * add script to delete run history * cronjob test * cronjob test * checkout branch * test run history deletion script * test run history deletion script * test run history deletion script * azure login * date format change * remove double quotes * date format change * archive run history script tested * Add vnet-based provisioning options to cookbook (#128) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Make deployment name unique in our github actions (#135) * set unique name for deployments * add attempt to deployment name Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refactor compute/storage scripts to be independent (#132) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide motivation in provisioning docs for using service endpoints (#136) * add motivation for service endpoints * add link Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refresh provisioning arm buttons with latest from bicep (#139) * align names of directories * rebuild all arm Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * 
Update silo_vnet_newstorage.md (#141) * Add Bicep build vs ARM template diff test (#140) * Add diff test for bicep vs arm * Debug * Debug * fix syntax error * redirect build output to stdout * coorect path * trigger arm template test when pushing changes to main branch from release* branch * remove redundant logs * Add "open aks with cc" provision tutorial and bicep scripts (#138) * implement bicep scripts to provision open aks with cc * add aks cc tutorial * build arm and add in branch * add button Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide script + tutorial to attach pair with an existing storage (#142) * provision datastore with existing storage * add arm for existing storage, add docs * add link in readme Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add latest arm templates to diff build (#145) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implements provisioning script for a confidential compute VM jumpbox inside a vnet (debug) (#146) * add jumpbox script with tutorial * add template to diff build Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update jumpbox_cc.md (#147) * update tutorials for silos to integrate feedback (#149) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implement option to turn orchestrator storage fully private (behind PLE) (#150) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Tutorial on how to adapt native and factory code to write FL experiments. 
(#100) * WIP: add general information about the factory code * moving factory-tutorial to another file * add scenarios * add instructions on how to adapt literal code * rename file * add general info and fix typos * Jeff's feedback * Apply code clean-up to provision scripts before bug bash (#148) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Instructions for provisioning external silos (#101) * very first stab, far from done * non-secure native job using the on-prem k8s * use on-prem silos in example factory job * Revert "very first stab, far from done" This reverts commit e00d882. * Revert "use on-prem silos in example factory job" This reverts commit e2ef884. * Revert "non-secure native job using the on-prem k8s" This reverts commit 923e5f3. * restore doc stub * Make Git ignore resources for test jobs * fix gitignore * typo in comment * steps A through D * 2 typos * move to subdir * fix workspace creation * add orchestrator part, role, and timeline * last commit before PR * adjust to new open_azureml_workspace.bicep * first wave after Jeff's comments * address jeff's comments * typo * light trims Co-authored-by: thomasp-ms <XXX@me.com> * bump up every title * skeleton * first attempt at data prep like Harmke * change secret name * wrong secret name * remove separate unzip * change clients, create silo data assets * different names for silo data assets, duh * cleanup * adjust secret name in doc * . 
* use latest literal code * align environment with literal * base on latest component * one dataset, comment out 2 unused args (for now) * introduce new arguments * reflect modified args in component spec * remove unused arg from config * start hooking up to Harmke's trainer * initialize PTLearner and include in run.py * use same values as Harmke for epochs and lr * attributes with _, start implementing local_train * add loggings, add test(), fix device_ * train_loader_ * align _'s * fix transform bug * remove unused constants * use proper model in aggregation code * removed unused file * remove unused code and arguments, logging to DEBUG * restore `metrics_prefix` parameter * finish restoring `metrics_prefix` * do not duplicate model code * revert dedup attempt * improve docstrings and descriptions * change experiment name * change pipeline name and docstring * cite sources, remove wrongly added licenses * italics * black Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: unknown <Mitgarg17495@gmail.com> * add readme section * rename training to traininsilo for consistency * Share resources and standardize component names (#182) * use shared agg component across all examples * only keep a single {reqs/env} * use more recent pip version * standardize component spec name * support dummy HELLOWORLD example is agg * black Co-authored-by: thomasp-ms <XXX@me.com> * Thopo/share component and environment (#185) * use shared agg component across all examples * only keep a single {reqs/env} * use more recent pip version * standardize component spec name * support dummy HELLOWORLD example is agg * black * SHARED -> utils, rename agg env Co-authored-by: thomasp-ms <XXX@me.com> * upload data script * add data splitting pipeline * nit updates * Named Entity Recognition example (#177) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 
pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add multinerd template files * NER components * re-structure * partition data + log metrics * add redme * add readme * restructuring * restructuring * add doc strings * train on gpus * create a separate component to upload data on silos * docs * rename * add assert statement * change upload-data job compute to orchestrator compute * remove ner from literal example choices * fix doc * add model-name, tokenizer configurable * pip version upgrade * reformatting * use shared aggregated component * rename script file * add note * create a compute that has access to silos' storage accs * change data uploading approach * update doc * incorporate Thomas's feedback * fix typo Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Align the medical imaging data provisioning process with other examples (#191) * adjust paths in config file * support component with 1 output for Pneumonia * formatting * adjust doc to 
new provisioning * remove GH action for dataprep * custom component for provisioning pneumonia data * black Co-authored-by: thomasp-ms <XXX@me.com> * hot fix (#192) Co-authored-by: thomasp-ms <XXX@me.com> * Lots of micro-fixes after bug bashing all 3 industry examples (#194) related to components: * create distinct names for all components of each scenario * polish component descriptions * remove unused mnist datatransfer and postprocessing components * upgrade all MCR images to a more recent OS * cut some unnecessary dependencies * use curated environments whenever possible (to speed up job build time) related to pipelines: * fix issues with ccfraud submit script (path to shared folder) * remove unnecessary json+azure imports in submit scripts * align all 3 submissions scripts * in upload data pipeline, make --example required without default value to force intentional decision * in upload data pipeline, use scenario name in the output path to avoid collision * give each submit pipeline a distinct experiment and run name for readability Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Standardize all 3 real world example tutorials (#193) * standardize documentation on all 3 examples * change titles * fix spaces * add pip instructions * upgrade azure-ai-ml version Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * basic tabular data analysis component * improve plots and docs in data analytics * format update * format update * mlflow pipeline level * revert unwanted rebase changes --------- Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: unknown <Mitgarg17495@gmail.com>

…, differential privacy, etc (#256) * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * init branch * wip data exploration * data exploration region/silo * basic model * regions * basic network and finished data processing * training * Implement generic FedAvg without model object (#167) * generic fedavg pytorch * support model classes * add docstrings Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add README * update normalization * update exploration * Thomas/small improvements (#171) * remove unused local MNIST data * add link to provisioning cookbook in docs readme * recommend creating a conda env in the quickstart Co-authored-by: thomasp-ms <XXX@me.com> * update example for 
finance with multiple models * successful training through lstm * revert unneeded changes * remove local exploration ipynb * fix test metric * different param value for AKS (#179) Co-authored-by: thomasp-ms <XXX@me.com> * Pneumonia xray example (#164) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real 
FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add more intuitive agg output dir path * reformat using black * add iteration2 branch for PR build testing * reformat date and pass kwargs instead in the getUniqueIdentifier fn * working submit * working factory submit * linting * move component path * add soft validation * add soft validation * Add basic tests on config * linting * working bicep deployment for vanilla demo * proper orchestrator script, double containers * fix name * docstring * docstring * rollback to using only 1 container * align naming convention * instructions * working submit * set up permission model * working orch perms * wonky perms assignment * working role assignments * remove old perm model * working except silo2orch * fix typo * working submit with config * add sku as param * use R/W for now * fix submit to align with bicep provisioning demo * linting * remove dataset files * fix docstring on permission model * write draft docs with 
homepage, align structure, remove requirements, ensure demo documented * rollback change to req * change factory to use custom model type during validation * linting * Display metrics at the pipeline level (#68) * Fix optional input yaml and mlflow log bugs (#59) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * 
Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * Accomodate optional input chnages and switch from mlflow autologging to manual logging * code style * change optional inputs syntax Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Make changes to display all metrics at the pipeline level * Log preprocessing metadata in mlflow * linting * Pass client as an arg * Fix typo, rmv name from silo config, metric naming convention, and add metric identifier in the preprocessing component Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Remove redundant files from the mlops directory (#69) * Remove internal & external dir as 
provisioning is taken care by bicep * keep mnist data files * rename demo script (#71) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Unified documentation (#72) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * simplify sandbox script * simplify script, ensure it works * align config of native submit * align naming conventions between scripts, reinject rbac role * create test job for quickly debugging provisioning issues * fix tests * linting * move permissions to storage * align config with bicep scrits * Document the metrics panel of the pipeline overview in the quickstart (#76) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * linting * add docstrings and disclaimers * Add instructions on how to create a custom graph (#78) * WIP: unifying docs * Remove redundant doc file. 
We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * add instructions on how to create a custom graph * do better comments * Refine native code (#82) * fix silo name * log only one datapoint per iteration for an aggregated metrics * Align terminology for iteration/round/num_rounds * linting * use storage blob data contibutor * add demoBaseName to guid name of role deployment (#85) Co-authored-by: thomasp-ms <XXX@me.com> * use id list, add listkeys builtin * rename and dissociate orchestrator in resource + orchestrator * separate orchestrator script * draft sandbox setup * make silo script distinct * Update orchestrator_open.bicep * Update internal_blob_open.bicep * remove comments * align hello world example with new naming conventions * ensure uai assignments are created AFTER storage is created * linting * enforce precedence * merge from secure branch * use different regions, limit size of account * reduce to 3 regions, add keys to guid * substring * align config * do not use model * Add msi version of scripts * sandbox main can switch between uai and msi * fix name * linting * linting * implement ignore param, hotfix model with startswith * Address my own comments on Jeff's PR (#96) * remove magic number * little improvements on some comments * remove unused files * put dash replacement next to length check * don't necessarily assume USER AI * UAI -> XAI * revert previous UAI -> XAI changes * move length check next to dash replacement * typo * try movind the dependsOn's * RAGRS -> LRS * revert dependsON changes * revert another small change in a comment Co-authored-by: thomasp-ms <XXX@me.com> * align config of both submit scripts * Make distinction between on-off 
and repeatable provisioning scripts (#99) * clarify the role needed * remove "custom role" line * adjust locations * use existing rg if not Owner of the sub * clarify "Secure" setup * add usage instructions in docstring * explain what scripts are one-off (vs repeatable) Co-authored-by: thomasp-ms <XXX@me.com> * Align round/iteration terminology with the native code (#103) * rename parameter in config file * keep iterations instead of rounds * round -> iteration Co-authored-by: thomasp-ms <XXX@me.com> * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * align both submits to work * add optional test * rename native to literal * add getting started in readme, introduce emojis * change person * remove emojs * Propose rewriting of readme to highlight motivation first (#110) * propose rewriting of readme to highlight motivation first * minor edit Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update README.md * Update quickstart to mention rg clean-up * Update quickstart.md * Update quickstart.md * Update quickstart.md * Build bicep scripts as ARM template, add Azure Buttons to quickstart (#120) * Update quickstart to lower header (hotfix) (#117) * add arm templates, add button in quickstart * switch to releasebranchlink Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Add subscription id, resource group and workspace name as CLI args (#122) * add more cli args * code style * code style * update quickstart doc * update readme * Initiate provisioning "cookbook" with list of provisioning scenarios + example (#123) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Continuous Integration Tests (#119) * take values of subscription id, rs grp, ws name, etc from github secrets and submit a native pipeline * change path * Test azure creds in the github workflow * reformatting * add pipeline validation and testing workflow * add permissions * add 
permissions * check only certain dir to trigger workflows * add soft validation for any iteration branch PR * add provisioning script test * testing * create rg * create rg * change compute for testing * change demoname * delete old rg * change demoname * add demobasename and aml ws name as github secrets * random demo base name * auto generate random base name * random demo base name * adjust random num length * add vnet sandbox test * rmv dependency b/w jobs * submit various pipelines * change execution graph path * add cli args in the factory code * change compute for testing * ignore validation - factory * create custom action * correct path * correct path * add shell in the github action * create github actions and take required values as input params * add shell * add wait condition * add logs * linting * correct rg name * add azure ml extension * handle ml extension installation error. * add release branch test cases * add script to delete run history * cronjob test * cronjob test * checkout branch * test run history deletion script * test run history deletion script * test run history deletion script * azure login * date format change * remove double quotes * date format change * archive run history script tested * Add vnet-based provisioning options to cookbook (#128) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Make deployment name unique in our github actions (#135) * set unique name for deployments * add attempt to deployment name Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refactor compute/storage scripts to be independent (#132) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide motivation in provisioning docs for using service endpoints (#136) * add motivation for service endpoints * add link Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refresh provisioning arm buttons with latest from bicep (#139) * align names of directories * rebuild all arm Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * 
Update silo_vnet_newstorage.md (#141) * Add Bicep build vs ARM template diff test (#140) * Add diff test for bicep vs arm * Debug * Debug * fix syntax error * redirect build output to stdout * coorect path * trigger arm template test when pushing changes to main branch from release* branch * remove redundant logs * Add "open aks with cc" provision tutorial and bicep scripts (#138) * implement bicep scripts to provision open aks with cc * add aks cc tutorial * build arm and add in branch * add button Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide script + tutorial to attach pair with an existing storage (#142) * provision datastore with existing storage * add arm for existing storage, add docs * add link in readme Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add latest arm templates to diff build (#145) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implements provisioning script for a confidential compute VM jumpbox inside a vnet (debug) (#146) * add jumpbox script with tutorial * add template to diff build Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update jumpbox_cc.md (#147) * update tutorials for silos to integrate feedback (#149) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implement option to turn orchestrator storage fully private (behind PLE) (#150) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Tutorial on how to adapt native and factory code to write FL experiments. 
(#100) * WIP: add general information about the factory code * moving factory-tutorial to another file * add scenarios * add instructions on how to adapt literal code * rename file * add general info and fix typos * Jeff's feedback * Apply code clean-up to provision scripts before bug bash (#148) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Instructions for provisioning external silos (#101) * very first stab, far from done * non-secure native job using the on-prem k8s * use on-prem silos in example factory job * Revert "very first stab, far from done" This reverts commit e00d882dfee6a348eb89cd63e339a051b85ce0ca. * Revert "use on-prem silos in example factory job" This reverts commit e2ef8841c6be25a6c84b57ae079cca8f361323fe. * Revert "non-secure native job using the on-prem k8s" This reverts commit 923e5f321d28b30d8cd9759c47a7ffe5457e3284. * restore doc stub * Make Git ignore resources for test jobs * fix gitignore * typo in comment * steps A through D * 2 typos * move to subdir * fix workspace creation * add orchestrator part, role, and timeline * last commit before PR * adjust to new open_azureml_workspace.bicep * first wave after Jeff's comments * address jeff's comments * typo * light trims Co-authored-by: thomasp-ms <XXX@me.com> * bump up every title * skeleton * first attempt at data prep like Harmke * change secret name * wrong secret name * remove separate unzip * change clients, create silo data assets * different names for silo data assets, duh * cleanup * adjust secret name in doc * . 
* use latest literal code * align environment with literal * base on latest component * one dataset, comment out 2 unused args (for now) * introduce new arguments * reflect modified args in component spec * remove unused arg from config * start hooking up to Harmke's trainer * initialize PTLearner and include in run.py * use same values as Harmke for epochs and lr * attributes with _, start implementing local_train * add loggings, add test(), fix device_ * train_loader_ * align _'s * fix transform bug * remove unused constants * use proper model in aggregation code * removed unused file * remove unused code and arguments, logging to DEBUG * restore `metrics_prefix` parameter * finish restoring `metrics_prefix` * do not duplicate model code * revert dedup attempt * improve docstrings and descriptions * change experiment name * change pipeline name and docstring * cite sources, remove wrongly added licenses * italics * black Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: unknown <Mitgarg17495@gmail.com> * update formatting * add readme section * rename training to traininsilo for consistency * add more comments and update docs * include urgency in PR template (#184) Co-authored-by: thomasp-ms <XXX@me.com> * Share resources and standardize component names (#182) * use shared agg component across all examples * only keep a single {reqs/env} * use more recent pip version * standardize component spec name * support dummy HELLOWORLD example is agg * black Co-authored-by: thomasp-ms <XXX@me.com> * Thopo/share component and environment (#185) * use shared agg component across all examples * only keep a single {reqs/env} * use more recent pip version * standardize component spec name * support dummy HELLOWORLD example is agg * black * SHARED -> utils, rename agg env Co-authored-by: thomasp-ms <XXX@me.com> * rename config to spec and add upload data step * upload 
data script * use util aggregateweights * add data splitting pipeline * docs update * log pipeline level only once per silo training * do categorical encoding ahead of splitting * nit updates * update comment * update formatting * Hotfix: grant `az login` permissions to the 'clear run history' script (#166) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias 
Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * simplify the job wait condition's code * add comments * trigger mnist pipeline check * test token validity * grant `az login` permissions to the clear-history script * revert to sleep wait code * test access token validity * nit Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * fix readme * aggregate weughts on whichever device is available * update docstrings * update formatting * reduce upload pipeline file * fix datastore * add info about data upload step * fix typo * steps for changing access policies * update docs * Named Entity Recognition example (#177) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add multinerd template files * NER components * re-structure * partition data + log metrics * add redme * add readme * restructuring * restructuring * add doc strings * train on gpus * create a separate component to upload data on silos * docs * rename * add assert statement * change upload-data job compute to orchestrator compute * remove ner from literal example choices * fix doc * add model-name, tokenizer configurable * pip version upgrade * reformatting * use shared aggregated component * rename script file * add note * create a compute that has access to silos' storage accs * change data uploading approach * update doc * incorporate Thomas's feedback * fix typo Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Create nice-looking homepage for the examples in readme+docs (#190) * add homepage for industry examples Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Align the medical imaging 
data provisioning process with other examples (#191) * adjust paths in config file * support component with 1 output for Pneumonia * formatting * adjust doc to new provisioning * remove GH action for dataprep * custom component for provisioning pneumonia data * black Co-authored-by: thomasp-ms <XXX@me.com> * hot fix (#192) Co-authored-by: thomasp-ms <XXX@me.com> * Lots of micro-fixes after bug bashing all 3 industry examples (#194) related to components: * create distinct names for all components of each scenario * polish component descriptions * remove unused mnist datatransfer and postprocessing components * upgrade all MCR images to a more recent OS * cut some unnecessary dependencies * use curated environments whenever possible (to speed up job build time) related to pipelines: * fix issues with ccfraud submit script (path to shared folder) * remove unnecessary json+azure imports in submit scripts * align all 3 submissions scripts * in upload data pipeline, make --example required without default value to force intentional decision * in upload data pipeline, use scenario name in the output path to avoid collision * give each submit pipeline a distinct experiment and run name for readability Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Standardize all 3 real world example tutorials (#193) * standardize documentation on all 3 examples * change titles * fix spaces * add pip instructions * upgrade azure-ai-ml version Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * poc for ddp training * remove debug code + allow logging from multiple nodes * update formatting * provide correct link to Kaggle dataset (#196) * provide correct link * . * . 
Co-authored-by: thomasp-ms <XXX@me.com> * add DDP docs * Add CI tests for industry-relevant examples (#186) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. 
Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add pneumonia and ner examples tests * add ccfraud test in the CI/CD pipeline * add data upload test * trigger workflow * CI testing1 * CI testing1 * test kv kaggle creds * fix creds * fix creds * set kaggle creds * test pneumonia data-upload * test all industry relevant examples * upload data test for 3 examples * add main tests * rmv redundant chrs * fix typo * avoid industry relevant examples tests to run on the vnet setup as it is already covered by the open setup Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * CLI commands to add credentials in the workspace keyvault (#199) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix 
component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add cli cmds to set a kv secret * Jeff's feedback * Implement Thomas's feedback Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Thomas/bug bash feedback 04 (#203) * no need to navigate to a specific directory * keyvault -> key vault * improve Kaggle sections * GPU's for NER example * ARM templates with latest bicep version * bold * GPU instructions in quickstart Co-authored-by: thomasp-ms <XXX@me.com> * fix test to align with new sdk (#204) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Hotfix: DataAccessError (orchestrator access) (#205) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to 
submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * fix bug * update arm template * fix a problem that was encountered during resolving merge conflicts Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Add GitHub Action workflow concurrency and implement token expiration policy workaround (#200) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add GitHub workflow concurrency * test 1 * test 1 * test 1 * test 2 * test 3 * test 2 * test 3 * implement token expiry workaround * test 1 * workaround to handle token expiry error * fix typo Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Implement troubleshooting guide with first typical issues (#208) * write troubleshooting guide Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Fix order of precedence for AML workspace references in submit.py (#209) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to 
build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * fix order of precedence * fix build Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Add data permissions issue to TSG (#210) * add permissions issue to TSG Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * November notes (#211) Co-authored-by: thomasp-ms <XXX@me.com> * create instance type and select it for run for cc * upgrade all pip dependencies (#212) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * format * use azureml built in distribution fw * Test industry-relevant examples if any changes in the `utils` dir are observed (#221) * add test to validate changes in the utils dir * test1 trigger workflow * fix typo * only destroy ddp group if it was created * remove unnecessary imports * allow for mixture of ddp and non-ddp processes model aggregation * use documentation instead of ps1 script for creating instancetype for CC * add instance type assignment for all examples * 
formatting * formatting * update batch size * update model name * use older pytorch * Generalize aggregate component to Babel (#220) * init branch * wip data exploration * data exploration region/silo * basic model * regions * basic network and finished data processing * training * Implement generic FedAvg without model object (#167) * generic fedavg pytorch * support model classes * add docstrings Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add README * update normalization * update exploration * Thomas/small improvements (#171) * remove unused local MNIST data * add link to provisioning cookbook in docs readme * recommend creating a conda env in the quickstart Co-authored-by: thomasp-ms <XXX@me.com> * update example for finance with multiple models * successful training through lstm * revert unneeded changes * remove local exploration ipynb * fix test metric * different param value for AKS (#179) Co-authored-by: thomasp-ms <XXX@me.com> * Pneumonia xray example (#164) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to 
change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add more intuitive agg output dir path * reformat using black * add iteration2 branch for PR build testing * reformat date and pass kwargs instead in the getUniqueIdentifier fn * working 
submit * working factory submit * linting * move component path * add soft validation * add soft validation * Add basic tests on config * linting * working bicep deployment for vanilla demo * proper orchestrator script, double containers * fix name * docstring * docstring * rollback to using only 1 container * align naming convention * instructions * working submit * set up permission model * working orch perms * wonky perms assignment * working role assignments * remove old perm model * working except silo2orch * fix typo * working submit with config * add sku as param * use R/W for now * fix submit to align with bicep provisioning demo * linting * remove dataset files * fix docstring on permission model * write draft docs with homepage, align structure, remove requirements, ensure demo documented * rollback change to req * change factory to use custom model type during validation * linting * Display metrics at the pipeline level (#68) * Fix optional input yaml and mlflow log bugs (#59) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * 
clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * Accomodate optional input chnages and switch from mlflow autologging to manual logging * code style * change optional inputs syntax Co-authored-by: Jeff Omhover 
<jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Make changes to display all metrics at the pipeline level * Log preprocessing metadata in mlflow * linting * Pass client as an arg * Fix typo, rmv name from silo config, metric naming convention, and add metric identifier in the preprocessing component Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Remove redundant files from the mlops directory (#69) * Remove internal & external dir as provisioning is taken care by bicep * keep mnist data files * rename demo script (#71) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Unified documentation (#72) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * simplify sandbox script * simplify script, ensure it works * align config of native submit * align naming conventions between scripts, reinject rbac role * create test job for quickly debugging provisioning issues * fix tests * linting * move permissions to storage * align config with bicep scrits * Document the metrics panel of the pipeline overview in the quickstart (#76) * WIP: unifying docs * Remove redundant doc file. 
We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * linting * add docstrings and disclaimers * Add instructions on how to create a custom graph (#78) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * add instructions on how to create a custom graph * do better comments * Refine native code (#82) * fix silo name * log only one datapoint per iteration for an aggregated metrics * Align terminology for iteration/round/num_rounds * linting * use storage blob dat…

* maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * init branch * wip data exploration * data exploration region/silo * basic model * regions * basic network and finished data processing * training * Implement generic FedAvg without model object (#167) * generic fedavg pytorch * support model classes * add docstrings Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add README * update normalization * update exploration * Thomas/small improvements (#171) * remove unused local MNIST data * add link to provisioning cookbook in docs readme * recommend creating a conda env in the quickstart Co-authored-by: thomasp-ms <XXX@me.com> * update example for finance with multiple models * successful training through lstm * revert unneeded changes * remove local exploration ipynb * fix test metric * different param value for AKS (#179) Co-authored-by: thomasp-ms <XXX@me.com> * Pneumonia xray example (#164) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * 
linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add more intuitive agg output dir path * reformat using black * add iteration2 branch for PR build testing * reformat date and pass kwargs instead in the getUniqueIdentifier fn * working submit * working factory submit * linting * move component path * add soft validation * add soft validation * Add basic tests on config * linting * working bicep deployment for vanilla demo * proper orchestrator script, double containers * fix name * docstring * docstring * rollback to using only 1 container * align naming convention * instructions * working submit * set up permission model * working orch perms * wonky perms assignment * working role assignments * remove old perm model * working except silo2orch * fix typo * working submit with config * add sku as param * use R/W for now * fix submit to align with bicep provisioning demo * linting * remove dataset files * fix docstring on permission model * write draft docs with homepage, align structure, remove requirements, ensure demo documented * rollback change to req * change factory to use custom model type during 
validation * linting * Display metrics at the pipeline level (#68) * Fix optional input yaml and mlflow log bugs (#59) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. 
Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * Accomodate optional input chnages and switch from mlflow autologging to manual logging * code style * change optional inputs syntax Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Make changes to display all metrics at the pipeline level * Log preprocessing metadata in mlflow * linting * Pass client as an arg * Fix typo, rmv name from silo config, metric naming convention, and add metric identifier in the preprocessing component Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Remove redundant files from the mlops directory (#69) * Remove internal & external dir as provisioning is taken care by bicep * keep mnist data files * rename demo script (#71) Co-authored-by: 
Jeff Omhover <jeomhove@microsoft.com> * Unified documentation (#72) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * simplify sandbox script * simplify script, ensure it works * align config of native submit * align naming conventions between scripts, reinject rbac role * create test job for quickly debugging provisioning issues * fix tests * linting * move permissions to storage * align config with bicep scrits * Document the metrics panel of the pipeline overview in the quickstart (#76) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * linting * add docstrings and disclaimers * Add instructions on how to create a custom graph (#78) * WIP: unifying docs * Remove redundant doc file. 
We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * add instructions on how to create a custom graph * do better comments * Refine native code (#82) * fix silo name * log only one datapoint per iteration for an aggregated metrics * Align terminology for iteration/round/num_rounds * linting * use storage blob data contibutor * add demoBaseName to guid name of role deployment (#85) Co-authored-by: thomasp-ms <XXX@me.com> * use id list, add listkeys builtin * rename and dissociate orchestrator in resource + orchestrator * separate orchestrator script * draft sandbox setup * make silo script distinct * Update orchestrator_open.bicep * Update internal_blob_open.bicep * remove comments * align hello world example with new naming conventions * ensure uai assignments are created AFTER storage is created * linting * enforce precedence * merge from secure branch * use different regions, limit size of account * reduce to 3 regions, add keys to guid * substring * align config * do not use model * Add msi version of scripts * sandbox main can switch between uai and msi * fix name * linting * linting * implement ignore param, hotfix model with startswith * Address my own comments on Jeff's PR (#96) * remove magic number * little improvements on some comments * remove unused files * put dash replacement next to length check * don't necessarily assume USER AI * UAI -> XAI * revert previous UAI -> XAI changes * move length check next to dash replacement * typo * try movind the dependsOn's * RAGRS -> LRS * revert dependsON changes * revert another small change in a comment Co-authored-by: thomasp-ms <XXX@me.com> * align config of both submit scripts * Make distinction between on-off 
and repeatable provisioning scripts (#99) * clarify the role needed * remove "custom role" line * adjust locations * use existing rg if not Owner of the sub * clarify "Secure" setup * add usage instructions in docstring * explain what scripts are one-off (vs repeatable) Co-authored-by: thomasp-ms <XXX@me.com> * Align round/iteration terminology with the native code (#103) * rename parameter in config file * keep iterations instead of rounds * round -> iteration Co-authored-by: thomasp-ms <XXX@me.com> * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * align both submits to work * add optional test * rename native to literal * add getting started in readme, introduce emojis * change person * remove emojs * Propose rewriting of readme to highlight motivation first (#110) * propose rewriting of readme to highlight motivation first * minor edit Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update README.md * Update quickstart to mention rg clean-up * Update quickstart.md * Update quickstart.md * Update quickstart.md * Build bicep scripts as ARM template, add Azure Buttons to quickstart (#120) * Update quickstart to lower header (hotfix) (#117) * add arm templates, add button in quickstart * switch to releasebranchlink Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Add subscription id, resource group and workspace name as CLI args (#122) * add more cli args * code style * code style * update quickstart doc * update readme * Initiate provisioning "cookbook" with list of provisioning scenarios + example (#123) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Continuous Integration Tests (#119) * take values of subscription id, rs grp, ws name, etc from github secrets and submit a native pipeline * change path * Test azure creds in the github workflow * reformatting * add pipeline validation and testing workflow * add permissions * add 
permissions * check only certain dir to trigger workflows * add soft validation for any iteration branch PR * add provisioning script test * testing * create rg * create rg * change compute for testing * change demoname * delete old rg * change demoname * add demobasename and aml ws name as github secrets * random demo base name * auto generate random base name * random demo base name * adjust random num length * add vnet sandbox test * rmv dependency b/w jobs * submit various pipelines * change execution graph path * add cli args in the factory code * change compute for testing * ignore validation - factory * create custom action * correct path * correct path * add shell in the github action * create github actions and take required values as input params * add shell * add wait condition * add logs * linting * correct rg name * add azure ml extension * handle ml extension installation error. * add release branch test cases * add script to delete run history * cronjob test * cronjob test * checkout branch * test run history deletion script * test run history deletion script * test run history deletion script * azure login * date format change * remove double quotes * date format change * archive run history script tested * Add vnet-based provisioning options to cookbook (#128) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Make deployment name unique in our github actions (#135) * set unique name for deployments * add attempt to deployment name Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refactor compute/storage scripts to be independent (#132) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide motivation in provisioning docs for using service endpoints (#136) * add motivation for service endpoints * add link Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refresh provisioning arm buttons with latest from bicep (#139) * align names of directories * rebuild all arm Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * 
Update silo_vnet_newstorage.md (#141) * Add Bicep build vs ARM template diff test (#140) * Add diff test for bicep vs arm * Debug * Debug * fix syntax error * redirect build output to stdout * coorect path * trigger arm template test when pushing changes to main branch from release* branch * remove redundant logs * Add "open aks with cc" provision tutorial and bicep scripts (#138) * implement bicep scripts to provision open aks with cc * add aks cc tutorial * build arm and add in branch * add button Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide script + tutorial to attach pair with an existing storage (#142) * provision datastore with existing storage * add arm for existing storage, add docs * add link in readme Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add latest arm templates to diff build (#145) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implements provisioning script for a confidential compute VM jumpbox inside a vnet (debug) (#146) * add jumpbox script with tutorial * add template to diff build Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update jumpbox_cc.md (#147) * update tutorials for silos to integrate feedback (#149) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implement option to turn orchestrator storage fully private (behind PLE) (#150) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Tutorial on how to adapt native and factory code to write FL experiments. 
(#100) * WIP: add general information about the factory code * moving factory-tutorial to another file * add scenarios * add instructions on how to adapt literal code * rename file * add general info and fix typos * Jeff's feedback * Apply code clean-up to provision scripts before bug bash (#148) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Instructions for provisioning external silos (#101) * very first stab, far from done * non-secure native job using the on-prem k8s * use on-prem silos in example factory job * Revert "very first stab, far from done" This reverts commit e00d882dfee6a348eb89cd63e339a051b85ce0ca. * Revert "use on-prem silos in example factory job" This reverts commit e2ef8841c6be25a6c84b57ae079cca8f361323fe. * Revert "non-secure native job using the on-prem k8s" This reverts commit 923e5f321d28b30d8cd9759c47a7ffe5457e3284. * restore doc stub * Make Git ignore resources for test jobs * fix gitignore * typo in comment * steps A through D * 2 typos * move to subdir * fix workspace creation * add orchestrator part, role, and timeline * last commit before PR * adjust to new open_azureml_workspace.bicep * first wave after Jeff's comments * address jeff's comments * typo * light trims Co-authored-by: thomasp-ms <XXX@me.com> * bump up every title * skeleton * first attempt at data prep like Harmke * change secret name * wrong secret name * remove separate unzip * change clients, create silo data assets * different names for silo data assets, duh * cleanup * adjust secret name in doc * . 
* use latest literal code * align environment with literal * base on latest component * one dataset, comment out 2 unused args (for now) * introduce new arguments * reflect modified args in component spec * remove unused arg from config * start hooking up to Harmke's trainer * initialize PTLearner and include in run.py * use same values as Harmke for epochs and lr * attributes with _, start implementing local_train * add loggings, add test(), fix device_ * train_loader_ * align _'s * fix transform bug * remove unused constants * use proper model in aggregation code * removed unused file * remove unused code and arguments, logging to DEBUG * restore `metrics_prefix` parameter * finish restoring `metrics_prefix` * do not duplicate model code * revert dedup attempt * improve docstrings and descriptions * change experiment name * change pipeline name and docstring * cite sources, remove wrongly added licenses * italics * black Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: unknown <Mitgarg17495@gmail.com> * update formatting * add readme section * rename training to traininsilo for consistency * add more comments and update docs * include urgency in PR template (#184) Co-authored-by: thomasp-ms <XXX@me.com> * Share resources and standardize component names (#182) * use shared agg component across all examples * only keep a single {reqs/env} * use more recent pip version * standardize component spec name * support dummy HELLOWORLD example is agg * black Co-authored-by: thomasp-ms <XXX@me.com> * Thopo/share component and environment (#185) * use shared agg component across all examples * only keep a single {reqs/env} * use more recent pip version * standardize component spec name * support dummy HELLOWORLD example is agg * black * SHARED -> utils, rename agg env Co-authored-by: thomasp-ms <XXX@me.com> * rename config to spec and add upload data step * upload 
data script * use util aggregateweights * add data splitting pipeline * docs update * log pipeline level only once per silo training * do categorical encoding ahead of splitting * nit updates * update comment * update formatting
* Hotfix: grant `az login` permissions to the 'clear run history' script (#166) * simplify the job wait condition's code * add comments * trigger mnist pipeline check * test token validity * grant `az login` permissions to the clear-history script * revert to sleep wait code * test access token validity * nit Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com>
* fix readme * aggregate weughts on whichever device is available * update docstrings * update formatting * reduce upload pipeline file * fix datastore * add info about data upload step * fix typo * steps for changing access policies * update docs
* Named Entity Recognition example (#177) * add multinerd template files * NER components * re-structure * partition data + log metrics * add redme * add readme * restructuring * restructuring * add doc strings * train on gpus * create a separate component to upload data on silos * docs * rename * add assert statement * change upload-data job compute to orchestrator compute * remove ner from literal example choices * fix doc * add model-name, tokenizer configurable * pip version upgrade * reformatting * use shared aggregated component * rename script file * add note * create a compute that has access to silos' storage accs * change data uploading approach * update doc * incorporate Thomas's feedback * fix typo Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com>
* Create nice-looking homepage for the examples in readme+docs (#190) * add homepage for industry examples Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Align the medical imaging
data provisioning process with other examples (#191) * adjust paths in config file * support component with 1 output for Pneumonia * formatting * adjust doc to new provisioning * remove GH action for dataprep * custom component for provisioning pneumonia data * black Co-authored-by: thomasp-ms <XXX@me.com> * hot fix (#192) Co-authored-by: thomasp-ms <XXX@me.com> * Lots of micro-fixes after bug bashing all 3 industry examples (#194) related to components: * create distinct names for all components of each scenario * polish component descriptions * remove unused mnist datatransfer and postprocessing components * upgrade all MCR images to a more recent OS * cut some unnecessary dependencies * use curated environments whenever possible (to speed up job build time) related to pipelines: * fix issues with ccfraud submit script (path to shared folder) * remove unnecessary json+azure imports in submit scripts * align all 3 submissions scripts * in upload data pipeline, make --example required without default value to force intentional decision * in upload data pipeline, use scenario name in the output path to avoid collision * give each submit pipeline a distinct experiment and run name for readability Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Standardize all 3 real world example tutorials (#193) * standardize documentation on all 3 examples * change titles * fix spaces * add pip instructions * upgrade azure-ai-ml version Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * poc for ddp training * remove debug code + allow logging from multiple nodes * update formatting * provide correct link to Kaggle dataset (#196) * provide correct link * . * . 
Co-authored-by: thomasp-ms <XXX@me.com> * add DDP docs
* Add CI tests for industry-relevant examples (#186) * add pneumonia and ner examples tests * add ccfraud test in the CI/CD pipeline * add data upload test * trigger workflow * CI testing1 * CI testing1 * test kv kaggle creds * fix creds * fix creds * set kaggle creds * test pneumonia data-upload * test all industry relevant examples * upload data test for 3 examples * add main tests * rmv redundant chrs * fix typo * avoid industry relevant examples tests to run on the vnet setup as it is already covered by the open setup Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com>
* CLI commands to add credentials in the workspace keyvault (#199) * add cli cmds to set a kv secret * Jeff's feedback * Implement Thomas's feedback Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com>
* Thomas/bug bash feedback 04 (#203) * no need to navigate to a specific directory * keyvault -> key vault * improve Kaggle sections * GPU's for NER example * ARM templates with latest bicep version * bold * GPU instructions in quickstart Co-authored-by: thomasp-ms <XXX@me.com>
* fix test to align with new sdk (#204) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>
* Hotfix: DataAccessError (orchestrator access) (#205) * fix bug * update arm template * fix a problem that was encountered during resolving merge conflicts Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com>
* Add GitHub Action workflow concurrency and implement token expiration policy workaround (#200) * add GitHub workflow concurrency * test 1 * test 1 * test 1 * test 2 * test 3 * test 2 * test 3 * implement token expiry workaround * test 1 * workaround to handle token expiry error * fix typo Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com>
* Implement troubleshooting guide with first typical issues (#208) * write troubleshooting guide Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>
* Fix order of precedence for AML workspace references in submit.py (#209) * fix order of precedence * fix build Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com>
* Add data permissions issue to TSG (#210) * add permissions issue to TSG Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>
* November notes (#211) Co-authored-by: thomasp-ms <XXX@me.com> * create instance type and select it for run for cc
* upgrade all pip dependencies (#212) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * format * use azureml built in distribution fw
* Test industry-relevant examples if any changes in the `utils` dir are observed (#221) * add test to validate changes in the utils dir * test1 trigger workflow * fix typo * only destroy ddp group if it was created * remove unnecessary imports * allow for mixture of ddp and non-ddp processes model aggregation * use documentation instead of ps1 script for creating instancetype for CC * add instance type assignment for all examples *
formatting * formatting * update batch size * update model name * use older pytorch
* Generalize aggregate component to Babel (#220) * init branch * wip data exploration * data exploration region/silo * basic model * regions * basic network and finished data processing * training
* Implement generic FedAvg without model object (#167) * generic fedavg pytorch * support model classes * add docstrings Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add README * update normalization * update exploration
* Thomas/small improvements (#171) * remove unused local MNIST data * add link to provisioning cookbook in docs readme * recommend creating a conda env in the quickstart Co-authored-by: thomasp-ms <XXX@me.com> * update example for finance with multiple models * successful training through lstm * revert unneeded changes * remove local exploration ipynb * fix test metric
* different param value for AKS (#179) Co-authored-by: thomasp-ms <XXX@me.com>
* Pneumonia xray example (#164) * add more intuitive agg output dir path * reformat using black * add iteration2 branch for PR build testing * reformat date and pass kwargs instead in the getUniqueIdentifier fn * working submit * working factory submit * linting * move component path * add soft validation * add soft validation * Add basic tests on config * linting * working bicep deployment for vanilla demo * proper orchestrator script, double containers * fix name * docstring * docstring * rollback to using only 1 container * align naming convention * instructions * working submit * set up permission model * working orch perms * wonky perms assignment * working role assignments * remove old perm model * working except silo2orch * fix typo * working submit with config * add sku as param * use R/W for now * fix submit to align with bicep provisioning demo * linting * remove dataset files * fix docstring on permission model * write draft docs with homepage, align structure, remove requirements, ensure demo documented * rollback change to req * change factory to use custom model type during validation * linting
* Display metrics at the pipeline level (#68)
* Fix optional input yaml and mlflow log bugs (#59) * Accomodate optional input chnages and switch from mlflow autologging to manual logging * code style * change optional inputs syntax Co-authored-by: Jeff Omhover 
<jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Make changes to display all metrics at the pipeline level * Log preprocessing metadata in mlflow * linting * Pass client as an arg * Fix typo, rmv name from silo config, metric naming convention, and add metric identifier in the preprocessing component Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Remove redundant files from the mlops directory (#69) * Remove internal & external dir as provisioning is taken care by bicep * keep mnist data files * rename demo script (#71) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Unified documentation (#72) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * simplify sandbox script * simplify script, ensure it works * align config of native submit * align naming conventions between scripts, reinject rbac role * create test job for quickly debugging provisioning issues * fix tests * linting * move permissions to storage * align config with bicep scrits * Document the metrics panel of the pipeline overview in the quickstart (#76) * WIP: unifying docs * Remove redundant doc file. 
We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * linting * add docstrings and disclaimers * Add instructions on how to create a custom graph (#78) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * add instructions on how to create a custom graph * do better comments * Refine native code (#82) * fix silo name * log only one datapoint per iteration for an aggregated metrics * Align terminology for iteration/round/num_rounds * linting * use storage blob data contibutor * add demoBaseName to guid name of role deployment (#85) Co-authored-by: thomasp-ms <XXX@me.com> * use id list, add listkeys builtin * rename and dissociate orchestrator in resource + orchestrator * separate orchestrator script * draft sandbox setup * make silo script distinct * Update orchestrator_open.bicep * Update internal_blob_open.bicep * remove comments * align hello world example with new naming conventions * ensure uai assignments are created AFTER storage is created * linting * enforce precedence * merge from secure branch * use different regions…

* rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * init branch * wip data exploration * data exploration region/silo * basic model * regions * basic network and finished data processing * training * Implement generic FedAvg without model object (#167) * generic fedavg pytorch * support model classes * add docstrings Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add README * update normalization * update exploration * Thomas/small improvements (#171) * remove unused local MNIST data * add link to provisioning cookbook in docs readme * recommend creating a conda env in the quickstart Co-authored-by: thomasp-ms <XXX@me.com> * update example for finance with multiple models * successful training through lstm * revert unneeded changes * remove local exploration ipynb * fix test metric * different param value for AKS (#179) Co-authored-by: thomasp-ms <XXX@me.com> * Pneumonia xray example (#164) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * 
linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add more intuitive agg output dir path * reformat using black * add iteration2 branch for PR build testing * reformat date and pass kwargs instead in the getUniqueIdentifier fn * working submit * working factory submit * linting * move component path * add soft validation * add soft validation * Add basic tests on config * linting * working bicep deployment for vanilla demo * proper orchestrator script, double containers * fix name * docstring * docstring * rollback to using only 1 container * align naming convention * instructions * working submit * set up permission model * working orch perms * wonky perms assignment * working role assignments * remove old perm model * working except silo2orch * fix typo * working submit with config * add sku as param * use R/W for now * fix submit to align with bicep provisioning demo * linting * remove dataset files * fix docstring on permission model * write draft docs with homepage, align structure, remove requirements, ensure demo documented * rollback change to req * change factory to use custom model type during 
validation * linting * Display metrics at the pipeline level (#68) * Fix optional input yaml and mlflow log bugs (#59) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. 
Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * Accomodate optional input chnages and switch from mlflow autologging to manual logging * code style * change optional inputs syntax Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Make changes to display all metrics at the pipeline level * Log preprocessing metadata in mlflow * linting * Pass client as an arg * Fix typo, rmv name from silo config, metric naming convention, and add metric identifier in the preprocessing component Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Remove redundant files from the mlops directory (#69) * Remove internal & external dir as provisioning is taken care by bicep * keep mnist data files * rename demo script (#71) Co-authored-by: 
Jeff Omhover <jeomhove@microsoft.com> * Unified documentation (#72) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * simplify sandbox script * simplify script, ensure it works * align config of native submit * align naming conventions between scripts, reinject rbac role * create test job for quickly debugging provisioning issues * fix tests * linting * move permissions to storage * align config with bicep scrits * Document the metrics panel of the pipeline overview in the quickstart (#76) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * linting * add docstrings and disclaimers * Add instructions on how to create a custom graph (#78) * WIP: unifying docs * Remove redundant doc file. 
We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * add instructions on how to create a custom graph * do better comments * Refine native code (#82) * fix silo name * log only one datapoint per iteration for an aggregated metrics * Align terminology for iteration/round/num_rounds * linting * use storage blob data contibutor * add demoBaseName to guid name of role deployment (#85) Co-authored-by: thomasp-ms <XXX@me.com> * use id list, add listkeys builtin * rename and dissociate orchestrator in resource + orchestrator * separate orchestrator script * draft sandbox setup * make silo script distinct * Update orchestrator_open.bicep * Update internal_blob_open.bicep * remove comments * align hello world example with new naming conventions * ensure uai assignments are created AFTER storage is created * linting * enforce precedence * merge from secure branch * use different regions, limit size of account * reduce to 3 regions, add keys to guid * substring * align config * do not use model * Add msi version of scripts * sandbox main can switch between uai and msi * fix name * linting * linting * implement ignore param, hotfix model with startswith * Address my own comments on Jeff's PR (#96) * remove magic number * little improvements on some comments * remove unused files * put dash replacement next to length check * don't necessarily assume USER AI * UAI -> XAI * revert previous UAI -> XAI changes * move length check next to dash replacement * typo * try movind the dependsOn's * RAGRS -> LRS * revert dependsON changes * revert another small change in a comment Co-authored-by: thomasp-ms <XXX@me.com> * align config of both submit scripts * Make distinction between on-off 
and repeatable provisioning scripts (#99) * clarify the role needed * remove "custom role" line * adjust locations * use existing rg if not Owner of the sub * clarify "Secure" setup * add usage instructions in docstring * explain what scripts are one-off (vs repeatable) Co-authored-by: thomasp-ms <XXX@me.com> * Align round/iteration terminology with the native code (#103) * rename parameter in config file * keep iterations instead of rounds * round -> iteration Co-authored-by: thomasp-ms <XXX@me.com> * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * align both submits to work * add optional test * rename native to literal * add getting started in readme, introduce emojis * change person * remove emojs * Propose rewriting of readme to highlight motivation first (#110) * propose rewriting of readme to highlight motivation first * minor edit Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update README.md * Update quickstart to mention rg clean-up * Update quickstart.md * Update quickstart.md * Update quickstart.md * Build bicep scripts as ARM template, add Azure Buttons to quickstart (#120) * Update quickstart to lower header (hotfix) (#117) * add arm templates, add button in quickstart * switch to releasebranchlink Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Add subscription id, resource group and workspace name as CLI args (#122) * add more cli args * code style * code style * update quickstart doc * update readme * Initiate provisioning "cookbook" with list of provisioning scenarios + example (#123) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Continuous Integration Tests (#119) * take values of subscription id, rs grp, ws name, etc from github secrets and submit a native pipeline * change path * Test azure creds in the github workflow * reformatting * add pipeline validation and testing workflow * add permissions * add 
permissions * check only certain dir to trigger workflows * add soft validation for any iteration branch PR * add provisioning script test * testing * create rg * create rg * change compute for testing * change demoname * delete old rg * change demoname * add demobasename and aml ws name as github secrets * random demo base name * auto generate random base name * random demo base name * adjust random num length * add vnet sandbox test * rmv dependency b/w jobs * submit various pipelines * change execution graph path * add cli args in the factory code * change compute for testing * ignore validation - factory * create custom action * correct path * correct path * add shell in the github action * create github actions and take required values as input params * add shell * add wait condition * add logs * linting * correct rg name * add azure ml extension * handle ml extension installation error. * add release branch test cases * add script to delete run history * cronjob test * cronjob test * checkout branch * test run history deletion script * test run history deletion script * test run history deletion script * azure login * date format change * remove double quotes * date format change * archive run history script tested * Add vnet-based provisioning options to cookbook (#128) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Make deployment name unique in our github actions (#135) * set unique name for deployments * add attempt to deployment name Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refactor compute/storage scripts to be independent (#132) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide motivation in provisioning docs for using service endpoints (#136) * add motivation for service endpoints * add link Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refresh provisioning arm buttons with latest from bicep (#139) * align names of directories * rebuild all arm Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * 
Update silo_vnet_newstorage.md (#141) * Add Bicep build vs ARM template diff test (#140) * Add diff test for bicep vs arm * Debug * Debug * fix syntax error * redirect build output to stdout * coorect path * trigger arm template test when pushing changes to main branch from release* branch * remove redundant logs * Add "open aks with cc" provision tutorial and bicep scripts (#138) * implement bicep scripts to provision open aks with cc * add aks cc tutorial * build arm and add in branch * add button Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide script + tutorial to attach pair with an existing storage (#142) * provision datastore with existing storage * add arm for existing storage, add docs * add link in readme Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add latest arm templates to diff build (#145) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implements provisioning script for a confidential compute VM jumpbox inside a vnet (debug) (#146) * add jumpbox script with tutorial * add template to diff build Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update jumpbox_cc.md (#147) * update tutorials for silos to integrate feedback (#149) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implement option to turn orchestrator storage fully private (behind PLE) (#150) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Tutorial on how to adapt native and factory code to write FL experiments. 
(#100) * WIP: add general information about the factory code * moving factory-tutorial to another file * add scenarios * add instructions on how to adapt literal code * rename file * add general info and fix typos * Jeff's feedback * Apply code clean-up to provision scripts before bug bash (#148) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Instructions for provisioning external silos (#101) * very first stab, far from done * non-secure native job using the on-prem k8s * use on-prem silos in example factory job * Revert "very first stab, far from done" This reverts commit e00d882dfee6a348eb89cd63e339a051b85ce0ca. * Revert "use on-prem silos in example factory job" This reverts commit e2ef8841c6be25a6c84b57ae079cca8f361323fe. * Revert "non-secure native job using the on-prem k8s" This reverts commit 923e5f321d28b30d8cd9759c47a7ffe5457e3284. * restore doc stub * Make Git ignore resources for test jobs * fix gitignore * typo in comment * steps A through D * 2 typos * move to subdir * fix workspace creation * add orchestrator part, role, and timeline * last commit before PR * adjust to new open_azureml_workspace.bicep * first wave after Jeff's comments * address jeff's comments * typo * light trims Co-authored-by: thomasp-ms <XXX@me.com> * bump up every title * skeleton * first attempt at data prep like Harmke * change secret name * wrong secret name * remove separate unzip * change clients, create silo data assets * different names for silo data assets, duh * cleanup * adjust secret name in doc * . 
* use latest literal code * align environment with literal * base on latest component * one dataset, comment out 2 unused args (for now) * introduce new arguments * reflect modified args in component spec * remove unused arg from config * start hooking up to Harmke's trainer * initialize PTLearner and include in run.py * use same values as Harmke for epochs and lr * attributes with _, start implementing local_train * add loggings, add test(), fix device_ * train_loader_ * align _'s * fix transform bug * remove unused constants * use proper model in aggregation code * removed unused file * remove unused code and arguments, logging to DEBUG * restore `metrics_prefix` parameter * finish restoring `metrics_prefix` * do not duplicate model code * revert dedup attempt * improve docstrings and descriptions * change experiment name * change pipeline name and docstring * cite sources, remove wrongly added licenses * italics * black Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: unknown <Mitgarg17495@gmail.com> * update formatting * add readme section * rename training to traininsilo for consistency * add more comments and update docs * include urgency in PR template (#184) Co-authored-by: thomasp-ms <XXX@me.com> * Share resources and standardize component names (#182) * use shared agg component across all examples * only keep a single {reqs/env} * use more recent pip version * standardize component spec name * support dummy HELLOWORLD example is agg * black Co-authored-by: thomasp-ms <XXX@me.com> * Thopo/share component and environment (#185) * use shared agg component across all examples * only keep a single {reqs/env} * use more recent pip version * standardize component spec name * support dummy HELLOWORLD example is agg * black * SHARED -> utils, rename agg env Co-authored-by: thomasp-ms <XXX@me.com> * rename config to spec and add upload data step * upload 
data script * use util aggregateweights * add data splitting pipeline * docs update * log pipeline level only once per silo training * do categorical encoding ahead of splitting * nit updates * update comment * update formatting * Hotfix: grant `az login` permissions to the 'clear run history' script (#166) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias 
Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * simplify the job wait condition's code * add comments * trigger mnist pipeline check * test token validity * grant `az login` permissions to the clear-history script * revert to sleep wait code * test access token validity * nit Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * fix readme * aggregate weughts on whichever device is available * update docstrings * update formatting * reduce upload pipeline file * fix datastore * add info about data upload step * fix typo * steps for changing access policies * update docs * Named Entity Recognition example (#177) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add multinerd template files * NER components * re-structure * partition data + log metrics * add redme * add readme * restructuring * restructuring * add doc strings * train on gpus * create a separate component to upload data on silos * docs * rename * add assert statement * change upload-data job compute to orchestrator compute * remove ner from literal example choices * fix doc * add model-name, tokenizer configurable * pip version upgrade * reformatting * use shared aggregated component * rename script file * add note * create a compute that has access to silos' storage accs * change data uploading approach * update doc * incorporate Thomas's feedback * fix typo Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Create nice-looking homepage for the examples in readme+docs (#190) * add homepage for industry examples Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Align the medical imaging 
data provisioning process with other examples (#191) * adjust paths in config file * support component with 1 output for Pneumonia * formatting * adjust doc to new provisioning * remove GH action for dataprep * custom component for provisioning pneumonia data * black Co-authored-by: thomasp-ms <XXX@me.com> * hot fix (#192) Co-authored-by: thomasp-ms <XXX@me.com> * Lots of micro-fixes after bug bashing all 3 industry examples (#194) related to components: * create distinct names for all components of each scenario * polish component descriptions * remove unused mnist datatransfer and postprocessing components * upgrade all MCR images to a more recent OS * cut some unnecessary dependencies * use curated environments whenever possible (to speed up job build time) related to pipelines: * fix issues with ccfraud submit script (path to shared folder) * remove unnecessary json+azure imports in submit scripts * align all 3 submissions scripts * in upload data pipeline, make --example required without default value to force intentional decision * in upload data pipeline, use scenario name in the output path to avoid collision * give each submit pipeline a distinct experiment and run name for readability Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Standardize all 3 real world example tutorials (#193) * standardize documentation on all 3 examples * change titles * fix spaces * add pip instructions * upgrade azure-ai-ml version Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * poc for ddp training * remove debug code + allow logging from multiple nodes * update formatting * provide correct link to Kaggle dataset (#196) * provide correct link * . * . 
Co-authored-by: thomasp-ms <XXX@me.com> * add DDP docs * Add CI tests for industry-relevant examples (#186) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. 
Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add pneumonia and ner examples tests * add ccfraud test in the CI/CD pipeline * add data upload test * trigger workflow * CI testing1 * CI testing1 * test kv kaggle creds * fix creds * fix creds * set kaggle creds * test pneumonia data-upload * test all industry relevant examples * upload data test for 3 examples * add main tests * rmv redundant chrs * fix typo * avoid industry relevant examples tests to run on the vnet setup as it is already covered by the open setup Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * CLI commands to add credentials in the workspace keyvault (#199) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix 
component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add cli cmds to set a kv secret * Jeff's feedback * Implement Thomas's feedback Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Thomas/bug bash feedback 04 (#203) * no need to navigate to a specific directory * keyvault -> key vault * improve Kaggle sections * GPU's for NER example * ARM templates with latest bicep version * bold * GPU instructions in quickstart Co-authored-by: thomasp-ms <XXX@me.com> * fix test to align with new sdk (#204) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Hotfix: DataAccessError (orchestrator access) (#205) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to 
submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * fix bug * update arm template * fix a problem that was encountered during resolving merge conflicts Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Add GitHub Action workflow concurrency and implement token expiration policy workaround (#200) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add GitHub workflow concurrency * test 1 * test 1 * test 1 * test 2 * test 3 * test 2 * test 3 * implement token expiry workaround * test 1 * workaround to handle token expiry error * fix typo Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Implement troubleshooting guide with first typical issues (#208) * write troubleshooting guide Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Fix order of precedence for AML workspace references in submit.py (#209) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to 
build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * fix order of precedence * fix build Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Add data permissions issue to TSG (#210) * add permissions issue to TSG Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * November notes (#211) Co-authored-by: thomasp-ms <XXX@me.com> * create instance type and select it for run for cc * upgrade all pip dependencies (#212) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * format * use azureml built in distribution fw * Test industry-relevant examples if any changes in the `utils` dir are observed (#221) * add test to validate changes in the utils dir * test1 trigger workflow * fix typo * only destroy ddp group if it was created * remove unnecessary imports * allow for mixture of ddp and non-ddp processes model aggregation * use documentation instead of ps1 script for creating instancetype for CC * add instance type assignment for all examples * 
formatting * formatting * update batch size * update model name * use older pytorch * Generalize aggregate component to Babel (#220) * init branch * wip data exploration * data exploration region/silo * basic model * regions * basic network and finished data processing * training * Implement generic FedAvg without model object (#167) * generic fedavg pytorch * support model classes * add docstrings Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add README * update normalization * update exploration * Thomas/small improvements (#171) * remove unused local MNIST data * add link to provisioning cookbook in docs readme * recommend creating a conda env in the quickstart Co-authored-by: thomasp-ms <XXX@me.com> * update example for finance with multiple models * successful training through lstm * revert unneeded changes * remove local exploration ipynb * fix test metric * different param value for AKS (#179) Co-authored-by: thomasp-ms <XXX@me.com> * Pneumonia xray example (#164) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to 
change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add more intuitive agg output dir path * reformat using black * add iteration2 branch for PR build testing * reformat date and pass kwargs instead in the getUniqueIdentifier fn * working 
submit * working factory submit * linting * move component path * add soft validation * add soft validation * Add basic tests on config * linting * working bicep deployment for vanilla demo * proper orchestrator script, double containers * fix name * docstring * docstring * rollback to using only 1 container * align naming convention * instructions * working submit * set up permission model * working orch perms * wonky perms assignment * working role assignments * remove old perm model * working except silo2orch * fix typo * working submit with config * add sku as param * use R/W for now * fix submit to align with bicep provisioning demo * linting * remove dataset files * fix docstring on permission model * write draft docs with homepage, align structure, remove requirements, ensure demo documented * rollback change to req * change factory to use custom model type during validation * linting * Display metrics at the pipeline level (#68) * Fix optional input yaml and mlflow log bugs (#59) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * 
clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * Accomodate optional input chnages and switch from mlflow autologging to manual logging * code style * change optional inputs syntax Co-authored-by: Jeff Omhover 
<jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Make changes to display all metrics at the pipeline level * Log preprocessing metadata in mlflow * linting * Pass client as an arg * Fix typo, rmv name from silo config, metric naming convention, and add metric identifier in the preprocessing component Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Remove redundant files from the mlops directory (#69) * Remove internal & external dir as provisioning is taken care by bicep * keep mnist data files * rename demo script (#71) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Unified documentation (#72) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * simplify sandbox script * simplify script, ensure it works * align config of native submit * align naming conventions between scripts, reinject rbac role * create test job for quickly debugging provisioning issues * fix tests * linting * move permissions to storage * align config with bicep scrits * Document the metrics panel of the pipeline overview in the quickstart (#76) * WIP: unifying docs * Remove redundant doc file. 
We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * linting * add docstrings and disclaimers * Add instructions on how to create a custom graph (#78) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * add instructions on how to create a custom graph * do better comments * Refine native code (#82) * fix silo name * log only one datapoint per iteration for an aggregated metrics * Align terminology for iteration/round/num_rounds * linting * use storage blob data contibutor * add demoBaseName to guid name of role deployment (#85) Co-authored-by: thomasp-ms <XXX@me.com> * use id list, add listkeys builtin * rename and dissociate orchestrator in resource + orchestrator * separate orchestrator script * draft sandbox setup * make silo script distinct * Update orchestrator_open.bicep * Update internal_blob_open.bicep * remove comments * align hello world example with new naming conventions * ensure uai assignments are created AFTER storage is created * linting * enforce precedence * merge from secure branch * use different regions, limit size of account * reduce to 3 regions, add keys to guid * substring * align config * do not u…

* add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * init branch * wip data exploration * data exploration region/silo * basic model * regions * basic network and finished data processing * training * Implement generic FedAvg without model object (#167) * generic fedavg pytorch * support model classes * add docstrings Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add README * update normalization * update exploration * Thomas/small improvements (#171) * remove unused local MNIST data * add link to provisioning cookbook in docs readme * recommend creating a conda env in the quickstart Co-authored-by: thomasp-ms <XXX@me.com> * update example for finance with multiple models * successful training through lstm * revert unneeded changes * remove local exploration ipynb * fix test metric * different param value for AKS (#179) Co-authored-by: thomasp-ms <XXX@me.com> * Pneumonia xray example (#164) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting 
subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add more intuitive agg output dir path * reformat using black * add iteration2 branch for PR build testing * reformat date and pass kwargs instead in the getUniqueIdentifier fn * working submit * working factory submit * linting * move component path * add soft validation * add soft validation * Add basic tests on config * linting * working bicep deployment for vanilla demo * proper orchestrator script, double containers * fix name * docstring * docstring * rollback to using only 1 container * align naming convention * instructions * working submit * set up permission model * working orch perms * wonky perms assignment * working role assignments * remove old perm model * working except silo2orch * fix typo * working submit with config * add sku as param * use R/W for now * fix submit to align with bicep provisioning demo * linting * remove dataset files * fix docstring on permission model * write draft docs with homepage, align structure, remove requirements, ensure demo documented * rollback change to req * change factory to use custom model type during 
validation * linting * Display metrics at the pipeline level (#68) * Fix optional input yaml and mlflow log bugs (#59) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. 
Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * Accomodate optional input chnages and switch from mlflow autologging to manual logging * code style * change optional inputs syntax Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Make changes to display all metrics at the pipeline level * Log preprocessing metadata in mlflow * linting * Pass client as an arg * Fix typo, rmv name from silo config, metric naming convention, and add metric identifier in the preprocessing component Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Remove redundant files from the mlops directory (#69) * Remove internal & external dir as provisioning is taken care by bicep * keep mnist data files * rename demo script (#71) Co-authored-by: 
Jeff Omhover <jeomhove@microsoft.com> * Unified documentation (#72) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * simplify sandbox script * simplify script, ensure it works * align config of native submit * align naming conventions between scripts, reinject rbac role * create test job for quickly debugging provisioning issues * fix tests * linting * move permissions to storage * align config with bicep scrits * Document the metrics panel of the pipeline overview in the quickstart (#76) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * linting * add docstrings and disclaimers * Add instructions on how to create a custom graph (#78) * WIP: unifying docs * Remove redundant doc file. 
We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * add instructions on how to create a custom graph * do better comments * Refine native code (#82) * fix silo name * log only one datapoint per iteration for an aggregated metrics * Align terminology for iteration/round/num_rounds * linting * use storage blob data contibutor * add demoBaseName to guid name of role deployment (#85) Co-authored-by: thomasp-ms <XXX@me.com> * use id list, add listkeys builtin * rename and dissociate orchestrator in resource + orchestrator * separate orchestrator script * draft sandbox setup * make silo script distinct * Update orchestrator_open.bicep * Update internal_blob_open.bicep * remove comments * align hello world example with new naming conventions * ensure uai assignments are created AFTER storage is created * linting * enforce precedence * merge from secure branch * use different regions, limit size of account * reduce to 3 regions, add keys to guid * substring * align config * do not use model * Add msi version of scripts * sandbox main can switch between uai and msi * fix name * linting * linting * implement ignore param, hotfix model with startswith * Address my own comments on Jeff's PR (#96) * remove magic number * little improvements on some comments * remove unused files * put dash replacement next to length check * don't necessarily assume USER AI * UAI -> XAI * revert previous UAI -> XAI changes * move length check next to dash replacement * typo * try movind the dependsOn's * RAGRS -> LRS * revert dependsON changes * revert another small change in a comment Co-authored-by: thomasp-ms <XXX@me.com> * align config of both submit scripts * Make distinction between on-off 
and repeatable provisioning scripts (#99) * clarify the role needed * remove "custom role" line * adjust locations * use existing rg if not Owner of the sub * clarify "Secure" setup * add usage instructions in docstring * explain what scripts are one-off (vs repeatable) Co-authored-by: thomasp-ms <XXX@me.com> * Align round/iteration terminology with the native code (#103) * rename parameter in config file * keep iterations instead of rounds * round -> iteration Co-authored-by: thomasp-ms <XXX@me.com> * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * align both submits to work * add optional test * rename native to literal * add getting started in readme, introduce emojis * change person * remove emojs * Propose rewriting of readme to highlight motivation first (#110) * propose rewriting of readme to highlight motivation first * minor edit Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update README.md * Update quickstart to mention rg clean-up * Update quickstart.md * Update quickstart.md * Update quickstart.md * Build bicep scripts as ARM template, add Azure Buttons to quickstart (#120) * Update quickstart to lower header (hotfix) (#117) * add arm templates, add button in quickstart * switch to releasebranchlink Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Add subscription id, resource group and workspace name as CLI args (#122) * add more cli args * code style * code style * update quickstart doc * update readme * Initiate provisioning "cookbook" with list of provisioning scenarios + example (#123) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Continuous Integration Tests (#119) * take values of subscription id, rs grp, ws name, etc from github secrets and submit a native pipeline * change path * Test azure creds in the github workflow * reformatting * add pipeline validation and testing workflow * add permissions * add 
permissions * check only certain dir to trigger workflows * add soft validation for any iteration branch PR * add provisioning script test * testing * create rg * create rg * change compute for testing * change demoname * delete old rg * change demoname * add demobasename and aml ws name as github secrets * random demo base name * auto generate random base name * random demo base name * adjust random num length * add vnet sandbox test * rmv dependency b/w jobs * submit various pipelines * change execution graph path * add cli args in the factory code * change compute for testing * ignore validation - factory * create custom action * correct path * correct path * add shell in the github action * create github actions and take required values as input params * add shell * add wait condition * add logs * linting * correct rg name * add azure ml extension * handle ml extension installation error. * add release branch test cases * add script to delete run history * cronjob test * cronjob test * checkout branch * test run history deletion script * test run history deletion script * test run history deletion script * azure login * date format change * remove double quotes * date format change * archive run history script tested * Add vnet-based provisioning options to cookbook (#128) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Make deployment name unique in our github actions (#135) * set unique name for deployments * add attempt to deployment name Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refactor compute/storage scripts to be independent (#132) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide motivation in provisioning docs for using service endpoints (#136) * add motivation for service endpoints * add link Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refresh provisioning arm buttons with latest from bicep (#139) * align names of directories * rebuild all arm Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * 
Update silo_vnet_newstorage.md (#141) * Add Bicep build vs ARM template diff test (#140) * Add diff test for bicep vs arm * Debug * Debug * fix syntax error * redirect build output to stdout * coorect path * trigger arm template test when pushing changes to main branch from release* branch * remove redundant logs * Add "open aks with cc" provision tutorial and bicep scripts (#138) * implement bicep scripts to provision open aks with cc * add aks cc tutorial * build arm and add in branch * add button Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide script + tutorial to attach pair with an existing storage (#142) * provision datastore with existing storage * add arm for existing storage, add docs * add link in readme Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add latest arm templates to diff build (#145) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implements provisioning script for a confidential compute VM jumpbox inside a vnet (debug) (#146) * add jumpbox script with tutorial * add template to diff build Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update jumpbox_cc.md (#147) * update tutorials for silos to integrate feedback (#149) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implement option to turn orchestrator storage fully private (behind PLE) (#150) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Tutorial on how to adapt native and factory code to write FL experiments. 
(#100) * WIP: add general information about the factory code * moving factory-tutorial to another file * add scenarios * add instructions on how to adapt literal code * rename file * add general info and fix typos * Jeff's feedback * Apply code clean-up to provision scripts before bug bash (#148) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Instructions for provisioning external silos (#101) * very first stab, far from done * non-secure native job using the on-prem k8s * use on-prem silos in example factory job * Revert "very first stab, far from done" This reverts commit e00d882dfee6a348eb89cd63e339a051b85ce0ca. * Revert "use on-prem silos in example factory job" This reverts commit e2ef8841c6be25a6c84b57ae079cca8f361323fe. * Revert "non-secure native job using the on-prem k8s" This reverts commit 923e5f321d28b30d8cd9759c47a7ffe5457e3284. * restore doc stub * Make Git ignore resources for test jobs * fix gitignore * typo in comment * steps A through D * 2 typos * move to subdir * fix workspace creation * add orchestrator part, role, and timeline * last commit before PR * adjust to new open_azureml_workspace.bicep * first wave after Jeff's comments * address jeff's comments * typo * light trims Co-authored-by: thomasp-ms <XXX@me.com> * bump up every title * skeleton * first attempt at data prep like Harmke * change secret name * wrong secret name * remove separate unzip * change clients, create silo data assets * different names for silo data assets, duh * cleanup * adjust secret name in doc * . 
* use latest literal code * align environment with literal * base on latest component * one dataset, comment out 2 unused args (for now) * introduce new arguments * reflect modified args in component spec * remove unused arg from config * start hooking up to Harmke's trainer * initialize PTLearner and include in run.py * use same values as Harmke for epochs and lr * attributes with _, start implementing local_train * add loggings, add test(), fix device_ * train_loader_ * align _'s * fix transform bug * remove unused constants * use proper model in aggregation code * removed unused file * remove unused code and arguments, logging to DEBUG * restore `metrics_prefix` parameter * finish restoring `metrics_prefix` * do not duplicate model code * revert dedup attempt * improve docstrings and descriptions * change experiment name * change pipeline name and docstring * cite sources, remove wrongly added licenses * italics * black Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: unknown <Mitgarg17495@gmail.com> * update formatting * add readme section * rename training to traininsilo for consistency * add more comments and update docs * include urgency in PR template (#184) Co-authored-by: thomasp-ms <XXX@me.com> * Share resources and standardize component names (#182) * use shared agg component across all examples * only keep a single {reqs/env} * use more recent pip version * standardize component spec name * support dummy HELLOWORLD example is agg * black Co-authored-by: thomasp-ms <XXX@me.com> * Thopo/share component and environment (#185) * use shared agg component across all examples * only keep a single {reqs/env} * use more recent pip version * standardize component spec name * support dummy HELLOWORLD example is agg * black * SHARED -> utils, rename agg env Co-authored-by: thomasp-ms <XXX@me.com> * rename config to spec and add upload data step * upload 
data script * use util aggregateweights * add data splitting pipeline * docs update * log pipeline level only once per silo training * do categorical encoding ahead of splitting * nit updates * update comment * update formatting * Hotfix: grant `az login` permissions to the 'clear run history' script (#166) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias 
Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * simplify the job wait condition's code * add comments * trigger mnist pipeline check * test token validity * grant `az login` permissions to the clear-history script * revert to sleep wait code * test access token validity * nit Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * fix readme * aggregate weughts on whichever device is available * update docstrings * update formatting * reduce upload pipeline file * fix datastore * add info about data upload step * fix typo * steps for changing access policies * update docs * Named Entity Recognition example (#177) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs 
* fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add multinerd template files * NER components * re-structure * partition data + log metrics * add redme * add readme * restructuring * restructuring * add doc strings * train on gpus * create a separate component to upload data on silos * docs * rename * add assert statement * change upload-data job compute to orchestrator compute * remove ner from literal example choices * fix doc * add model-name, tokenizer configurable * pip version upgrade * reformatting * use shared aggregated component * rename script file * add note * create a compute that has access to silos' storage accs * change data uploading approach * update doc * incorporate Thomas's feedback * fix typo Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Create nice-looking homepage for the examples in readme+docs (#190) * add homepage for industry examples Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Align the medical imaging 
data provisioning process with other examples (#191) * adjust paths in config file * support component with 1 output for Pneumonia * formatting * adjust doc to new provisioning * remove GH action for dataprep * custom component for provisioning pneumonia data * black Co-authored-by: thomasp-ms <XXX@me.com> * hot fix (#192) Co-authored-by: thomasp-ms <XXX@me.com> * Lots of micro-fixes after bug bashing all 3 industry examples (#194) related to components: * create distinct names for all components of each scenario * polish component descriptions * remove unused mnist datatransfer and postprocessing components * upgrade all MCR images to a more recent OS * cut some unnecessary dependencies * use curated environments whenever possible (to speed up job build time) related to pipelines: * fix issues with ccfraud submit script (path to shared folder) * remove unnecessary json+azure imports in submit scripts * align all 3 submissions scripts * in upload data pipeline, make --example required without default value to force intentional decision * in upload data pipeline, use scenario name in the output path to avoid collision * give each submit pipeline a distinct experiment and run name for readability Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Standardize all 3 real world example tutorials (#193) * standardize documentation on all 3 examples * change titles * fix spaces * add pip instructions * upgrade azure-ai-ml version Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * poc for ddp training * remove debug code + allow logging from multiple nodes * update formatting * provide correct link to Kaggle dataset (#196) * provide correct link * . * . 
Co-authored-by: thomasp-ms <XXX@me.com> * add DDP docs * Add CI tests for industry-relevant examples (#186) * add pneumonia and ner examples tests * add ccfraud test in the CI/CD pipeline * add data upload test * trigger workflow * CI testing1 * CI testing1 * test kv kaggle creds * fix creds * fix creds * set kaggle creds * test pneumonia data-upload * test all industry relevant examples * upload data test for 3 examples * add main tests * rmv redundant chrs * fix typo * avoid industry relevant examples tests to run on the vnet setup as it is already covered by the open setup Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * CLI commands to add credentials in the workspace keyvault (#199) * add cli cmds to set a kv secret * Jeff's feedback * Implement Thomas's feedback Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Thomas/bug bash feedback 04 (#203) * no need to navigate to a specific directory * keyvault -> key vault * improve Kaggle sections * GPU's for NER example * ARM templates with latest bicep version * bold * GPU instructions in quickstart Co-authored-by: thomasp-ms <XXX@me.com> * fix test to align with new sdk (#204) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Hotfix: DataAccessError (orchestrator access) (#205) * fix bug * update arm template * fix a problem that was encountered during resolving merge conflicts Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Add GitHub Action workflow concurrency and implement token expiration policy workaround (#200) * add GitHub workflow concurrency * test 1 * test 1 * test 1 * test 2 * test 3 * test 2 * test 3 * implement token expiry workaround * test 1 * workaround to handle token expiry error * fix typo Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Implement troubleshooting guide with first typical issues (#208) * write troubleshooting guide Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Fix order of precedence for AML workspace references in submit.py (#209) * fix order of precedence * fix build Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Add data permissions issue to TSG (#210) * add permissions issue to TSG Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * November notes (#211) Co-authored-by: thomasp-ms <XXX@me.com> * create instance type and select it for run for cc * upgrade all pip dependencies (#212) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * format * use azureml built in distribution fw * Test industry-relevant examples if any changes in the `utils` dir are observed (#221) * add test to validate changes in the utils dir * test1 trigger workflow * fix typo * only destroy ddp group if it was created * remove unnecessary imports * allow for mixture of ddp and non-ddp processes model aggregation * use documentation instead of ps1 script for creating instancetype for CC * add instance type assignment for all examples *
formatting * formatting * update batch size * update model name * use older pytorch * Generalize aggregate component to Babel (#220) * init branch * wip data exploration * data exploration region/silo * basic model * regions * basic network and finished data processing * training * Implement generic FedAvg without model object (#167) * generic fedavg pytorch * support model classes * add docstrings Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add README * update normalization * update exploration * Thomas/small improvements (#171) * remove unused local MNIST data * add link to provisioning cookbook in docs readme * recommend creating a conda env in the quickstart Co-authored-by: thomasp-ms <XXX@me.com> * update example for finance with multiple models * successful training through lstm * revert unneeded changes * remove local exploration ipynb * fix test metric * different param value for AKS (#179) Co-authored-by: thomasp-ms <XXX@me.com> * Pneumonia xray example (#164) * add more intuitive agg output dir path * reformat using black * add iteration2 branch for PR build testing * reformat date and pass kwargs instead in the getUniqueIdentifier fn * working
submit * working factory submit * linting * move component path * add soft validation * add soft validation * Add basic tests on config * linting * working bicep deployment for vanilla demo * proper orchestrator script, double containers * fix name * docstring * docstring * rollback to using only 1 container * align naming convention * instructions * working submit * set up permission model * working orch perms * wonky perms assignment * working role assignments * remove old perm model * working except silo2orch * fix typo * working submit with config * add sku as param * use R/W for now * fix submit to align with bicep provisioning demo * linting * remove dataset files * fix docstring on permission model * write draft docs with homepage, align structure, remove requirements, ensure demo documented * rollback change to req * change factory to use custom model type during validation * linting * Display metrics at the pipeline level (#68) * Fix optional input yaml and mlflow log bugs (#59) * Accomodate optional input chnages and switch from mlflow autologging to manual logging * code style * change optional inputs syntax Co-authored-by: Jeff Omhover
<jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Make changes to display all metrics at the pipeline level * Log preprocessing metadata in mlflow * linting * Pass client as an arg * Fix typo, rmv name from silo config, metric naming convention, and add metric identifier in the preprocessing component Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Remove redundant files from the mlops directory (#69) * Remove internal & external dir as provisioning is taken care by bicep * keep mnist data files * rename demo script (#71) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Unified documentation (#72) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * simplify sandbox script * simplify script, ensure it works * align config of native submit * align naming conventions between scripts, reinject rbac role * create test job for quickly debugging provisioning issues * fix tests * linting * move permissions to storage * align config with bicep scrits * Document the metrics panel of the pipeline overview in the quickstart (#76) * WIP: unifying docs * Remove redundant doc file. 
We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * linting * add docstrings and disclaimers * Add instructions on how to create a custom graph (#78) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * add instructions on how to create a custom graph * do better comments * Refine native code (#82) * fix silo name * log only one datapoint per iteration for an aggregated metrics * Align terminology for iteration/round/num_rounds * linting * use storage blob data contibutor * add demoBaseName to guid name of role deployment (#85) Co-authored-by: thomasp-ms <XXX@me.com> * use id list, add listkeys builtin * rename and dissociate orchestrator in resource + orchestrator * separate orchestrator script * draft sandbox setup * make silo script distinct * Update orchestrator_open.bicep * Update internal_blob_open.bicep * remove comments * align hello world example with new naming conventions * ensure uai assignments are created AFTER storage is created * linting * enforce precedence * merge from secure branch * use different regions, limit size of account * reduce to 3 regions, add keys to guid * substring * align config * do not use model * Add msi version of scripts * sandbox main can switch between uai and msi * fix name * linting * linting * implement ignore param, hotfix model with startswith * Address my own comments on 
Jeff's PR (#96) * remove magic number * little improvements on some comments * remove unused files * put dash replacement next to length check * don't necessarily assume USER AI * UAI -…

…chmark results... (#282) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * Update release branch (#271) * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * init branch * wip data exploration * data exploration region/silo * basic model * regions * basic network and finished data processing * training * Implement generic FedAvg without model object (#167) * generic 
fedavg pytorch * support model classes * add docstrings Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add README * update normalization * update exploration * Thomas/small improvements (#171) * remove unused local MNIST data * add link to provisioning cookbook in docs readme * recommend creating a conda env in the quickstart Co-authored-by: thomasp-ms <XXX@me.com> * update example for finance with multiple models * successful training through lstm * revert unneeded changes * remove local exploration ipynb * fix test metric * different param value for AKS (#179) Co-authored-by: thomasp-ms <XXX@me.com> * Pneumonia xray example (#164) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 
'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add more intuitive agg output dir path * reformat using black * add iteration2 branch for PR build testing * reformat date and pass kwargs instead in the getUniqueIdentifier fn * working submit * working factory submit * linting * move component path * add soft validation * add soft validation * Add basic tests on config * linting * working bicep deployment for vanilla demo * proper orchestrator script, double containers * fix name * docstring * docstring * rollback to using only 1 container * align naming convention * 
instructions * working submit * set up permission model * working orch perms * wonky perms assignment * working role assignments * remove old perm model * working except silo2orch * fix typo * working submit with config * add sku as param * use R/W for now * fix submit to align with bicep provisioning demo * linting * remove dataset files * fix docstring on permission model * write draft docs with homepage, align structure, remove requirements, ensure demo documented * rollback change to req * change factory to use custom model type during validation * linting * Display metrics at the pipeline level (#68) * Fix optional input yaml and mlflow log bugs (#59) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> 
`docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * Accomodate optional input chnages and switch from mlflow autologging to manual logging * code style * change optional inputs syntax Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Make changes to display all metrics at the pipeline level * Log preprocessing metadata in mlflow * linting * Pass client as an arg * Fix typo, rmv name from silo 
config, metric naming convention, and add metric identifier in the preprocessing component Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Remove redundant files from the mlops directory (#69) * Remove internal & external dir as provisioning is taken care by bicep * keep mnist data files * rename demo script (#71) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Unified documentation (#72) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * simplify sandbox script * simplify script, ensure it works * align config of native submit * align naming conventions between scripts, reinject rbac role * create test job for quickly debugging provisioning issues * fix tests * linting * move permissions to storage * align config with bicep scrits * Document the metrics panel of the pipeline overview in the quickstart (#76) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * linting * add docstrings and disclaimers * Add instructions on how to create a custom graph (#78) * WIP: unifying docs * Remove redundant doc file. 
We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * add instructions on how to create a custom graph * do better comments * Refine native code (#82) * fix silo name * log only one datapoint per iteration for an aggregated metrics * Align terminology for iteration/round/num_rounds * linting * use storage blob data contibutor * add demoBaseName to guid name of role deployment (#85) Co-authored-by: thomasp-ms <XXX@me.com> * use id list, add listkeys builtin * rename and dissociate orchestrator in resource + orchestrator * separate orchestrator script * draft sandbox setup * make silo script distinct * Update orchestrator_open.bicep * Update internal_blob_open.bicep * remove comments * align hello world example with new naming conventions * ensure uai assignments are created AFTER storage is created * linting * enforce precedence * merge from secure branch * use different regions, limit size of account * reduce to 3 regions, add keys to guid * substring * align config * do not use model * Add msi version of scripts * sandbox main can switch between uai and msi * fix name * linting * linting * implement ignore param, hotfix model with startswith * Address my own comments on Jeff's PR (#96) * remove magic number * little improvements on some comments * remove unused files * put dash replacement next to length check * don't necessarily assume USER AI * UAI -> XAI * revert previous UAI -> XAI changes * move length check next to dash replacement * typo * try movind the dependsOn's * RAGRS -> LRS * revert dependsON changes * revert another small change in a comment Co-authored-by: thomasp-ms <XXX@me.com> * align config of both submit scripts * Make distinction between on-off 
and repeatable provisioning scripts (#99) * clarify the role needed * remove "custom role" line * adjust locations * use existing rg if not Owner of the sub * clarify "Secure" setup * add usage instructions in docstring * explain what scripts are one-off (vs repeatable) Co-authored-by: thomasp-ms <XXX@me.com> * Align round/iteration terminology with the native code (#103) * rename parameter in config file * keep iterations instead of rounds * round -> iteration Co-authored-by: thomasp-ms <XXX@me.com> * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * align both submits to work * add optional test * rename native to literal * add getting started in readme, introduce emojis * change person * remove emojs * Propose rewriting of readme to highlight motivation first (#110) * propose rewriting of readme to highlight motivation first * minor edit Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update README.md * Update quickstart to mention rg clean-up * Update quickstart.md * Update quickstart.md * Update quickstart.md * Build bicep scripts as ARM template, add Azure Buttons to quickstart (#120) * Update quickstart to lower header (hotfix) (#117) * add arm templates, add button in quickstart * switch to releasebranchlink Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Add subscription id, resource group and workspace name as CLI args (#122) * add more cli args * code style * code style * update quickstart doc * update readme * Initiate provisioning "cookbook" with list of provisioning scenarios + example (#123) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Continuous Integration Tests (#119) * take values of subscription id, rs grp, ws name, etc from github secrets and submit a native pipeline * change path * Test azure creds in the github workflow * reformatting * add pipeline validation and testing workflow * add permissions * add 
permissions * check only certain dir to trigger workflows * add soft validation for any iteration branch PR * add provisioning script test * testing * create rg * create rg * change compute for testing * change demoname * delete old rg * change demoname * add demobasename and aml ws name as github secrets * random demo base name * auto generate random base name * random demo base name * adjust random num length * add vnet sandbox test * rmv dependency b/w jobs * submit various pipelines * change execution graph path * add cli args in the factory code * change compute for testing * ignore validation - factory * create custom action * correct path * correct path * add shell in the github action * create github actions and take required values as input params * add shell * add wait condition * add logs * linting * correct rg name * add azure ml extension * handle ml extension installation error. * add release branch test cases * add script to delete run history * cronjob test * cronjob test * checkout branch * test run history deletion script * test run history deletion script * test run history deletion script * azure login * date format change * remove double quotes * date format change * archive run history script tested * Add vnet-based provisioning options to cookbook (#128) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Make deployment name unique in our github actions (#135) * set unique name for deployments * add attempt to deployment name Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refactor compute/storage scripts to be independent (#132) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide motivation in provisioning docs for using service endpoints (#136) * add motivation for service endpoints * add link Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refresh provisioning arm buttons with latest from bicep (#139) * align names of directories * rebuild all arm Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * 
Update silo_vnet_newstorage.md (#141) * Add Bicep build vs ARM template diff test (#140) * Add diff test for bicep vs arm * Debug * Debug * fix syntax error * redirect build output to stdout * coorect path * trigger arm template test when pushing changes to main branch from release* branch * remove redundant logs * Add "open aks with cc" provision tutorial and bicep scripts (#138) * implement bicep scripts to provision open aks with cc * add aks cc tutorial * build arm and add in branch * add button Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide script + tutorial to attach pair with an existing storage (#142) * provision datastore with existing storage * add arm for existing storage, add docs * add link in readme Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add latest arm templates to diff build (#145) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implements provisioning script for a confidential compute VM jumpbox inside a vnet (debug) (#146) * add jumpbox script with tutorial * add template to diff build Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update jumpbox_cc.md (#147) * update tutorials for silos to integrate feedback (#149) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implement option to turn orchestrator storage fully private (behind PLE) (#150) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Tutorial on how to adapt native and factory code to write FL experiments. 
(#100) * WIP: add general information about the factory code * moving factory-tutorial to another file * add scenarios * add instructions on how to adapt literal code * rename file * add general info and fix typos * Jeff's feedback * Apply code clean-up to provision scripts before bug bash (#148) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Instructions for provisioning external silos (#101) * very first stab, far from done * non-secure native job using the on-prem k8s * use on-prem silos in example factory job * Revert "very first stab, far from done" This reverts commit e00d882dfee6a348eb89cd63e339a051b85ce0ca. * Revert "use on-prem silos in example factory job" This reverts commit e2ef8841c6be25a6c84b57ae079cca8f361323fe. * Revert "non-secure native job using the on-prem k8s" This reverts commit 923e5f321d28b30d8cd9759c47a7ffe5457e3284. * restore doc stub * Make Git ignore resources for test jobs * fix gitignore * typo in comment * steps A through D * 2 typos * move to subdir * fix workspace creation * add orchestrator part, role, and timeline * last commit before PR * adjust to new open_azureml_workspace.bicep * first wave after Jeff's comments * address jeff's comments * typo * light trims Co-authored-by: thomasp-ms <XXX@me.com> * bump up every title * skeleton * first attempt at data prep like Harmke * change secret name * wrong secret name * remove separate unzip * change clients, create silo data assets * different names for silo data assets, duh * cleanup * adjust secret name in doc * . 
* use latest literal code * align environment with literal * base on latest component * one dataset, comment out 2 unused args (for now) * introduce new arguments * reflect modified args in component spec * remove unused arg from config * start hooking up to Harmke's trainer * initialize PTLearner and include in run.py * use same values as Harmke for epochs and lr * attributes with _, start implementing local_train * add loggings, add test(), fix device_ * train_loader_ * align _'s * fix transform bug * remove unused constants * use proper model in aggregation code * removed unused file * remove unused code and arguments, logging to DEBUG * restore `metrics_prefix` parameter * finish restoring `metrics_prefix` * do not duplicate model code * revert dedup attempt * improve docstrings and descriptions * change experiment name * change pipeline name and docstring * cite sources, remove wrongly added licenses * italics * black Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: unknown <Mitgarg17495@gmail.com> * update formatting * add readme section * rename training to traininsilo for consistency * add more comments and update docs * include urgency in PR template (#184) Co-authored-by: thomasp-ms <XXX@me.com> * Share resources and standardize component names (#182) * use shared agg component across all examples * only keep a single {reqs/env} * use more recent pip version * standardize component spec name * support dummy HELLOWORLD example is agg * black Co-authored-by: thomasp-ms <XXX@me.com> * Thopo/share component and environment (#185) * use shared agg component across all examples * only keep a single {reqs/env} * use more recent pip version * standardize component spec name * support dummy HELLOWORLD example is agg * black * SHARED -> utils, rename agg env Co-authored-by: thomasp-ms <XXX@me.com> * rename config to spec and add upload data step * upload 
data script * use util aggregateweights * add data splitting pipeline * docs update * log pipeline level only once per silo training * do categorical encoding ahead of splitting * nit updates * update comment * update formatting * Hotfix: grant `az login` permissions to the 'clear run history' script (#166) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias 
Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * simplify the job wait condition's code * add comments * trigger mnist pipeline check * test token validity * grant `az login` permissions to the clear-history script * revert to sleep wait code * test access token validity * nit Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * fix readme * aggregate weughts on whichever device is available * update docstrings * update formatting * reduce upload pipeline file * fix datastore * add info about data upload step * fix typo * steps for changing access policies * update docs * Named Entity Recognition example (#177) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add multinerd template files * NER components * re-structure * partition data + log metrics * add redme * add readme * restructuring * restructuring * add doc strings * train on gpus * create a separate component to upload data on silos * docs * rename * add assert statement * change upload-data job compute to orchestrator compute * remove ner from literal example choices * fix doc * add model-name, tokenizer configurable * pip version upgrade * reformatting * use shared aggregated component * rename script file * add note * create a compute that has access to silos' storage accs * change data uploading approach * update doc * incorporate Thomas's feedback * fix typo Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Create nice-looking homepage for the examples in readme+docs (#190) * add homepage for industry examples Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Align the medical imaging 
data provisioning process with other examples (#191) * adjust paths in config file * support component with 1 output for Pneumonia * formatting * adjust doc to new provisioning * remove GH action for dataprep * custom component for provisioning pneumonia data * black Co-authored-by: thomasp-ms <XXX@me.com> * hot fix (#192) Co-authored-by: thomasp-ms <XXX@me.com> * Lots of micro-fixes after bug bashing all 3 industry examples (#194) related to components: * create distinct names for all components of each scenario * polish component descriptions * remove unused mnist datatransfer and postprocessing components * upgrade all MCR images to a more recent OS * cut some unnecessary dependencies * use curated environments whenever possible (to speed up job build time) related to pipelines: * fix issues with ccfraud submit script (path to shared folder) * remove unnecessary json+azure imports in submit scripts * align all 3 submissions scripts * in upload data pipeline, make --example required without default value to force intentional decision * in upload data pipeline, use scenario name in the output path to avoid collision * give each submit pipeline a distinct experiment and run name for readability Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Standardize all 3 real world example tutorials (#193) * standardize documentation on all 3 examples * change titles * fix spaces * add pip instructions * upgrade azure-ai-ml version Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * poc for ddp training * remove debug code + allow logging from multiple nodes * update formatting * provide correct link to Kaggle dataset (#196) * provide correct link * . * . 
Co-authored-by: thomasp-ms <XXX@me.com> * add DDP docs * Add CI tests for industry-relevant examples (#186) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. 
Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add pneumonia and ner examples tests * add ccfraud test in the CI/CD pipeline * add data upload test * trigger workflow * CI testing1 * CI testing1 * test kv kaggle creds * fix creds * fix creds * set kaggle creds * test pneumonia data-upload * test all industry relevant examples * upload data test for 3 examples * add main tests * rmv redundant chrs * fix typo * avoid industry relevant examples tests to run on the vnet setup as it is already covered by the open setup Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * CLI commands to add credentials in the workspace keyvault (#199) * add cli cmds to set a kv secret * Jeff's feedback * Implement Thomas's feedback Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Thomas/bug bash feedback 04 (#203) * no need to navigate to a specific directory * keyvault -> key vault * improve Kaggle sections * GPU's for NER example * ARM templates with latest bicep version * bold * GPU instructions in quickstart Co-authored-by: thomasp-ms <XXX@me.com> * fix test to align with new sdk (#204) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Hotfix: DataAccessError (orchestrator access) (#205) * fix bug * update arm template * fix a problem that was encountered during resolving merge conflicts Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Add GitHub Action workflow concurrency and implement token expiration policy workaround (#200) * add GitHub workflow concurrency * test 1 * test 1 * test 1 * test 2 * test 3 * test 2 * test 3 * implement token expiry workaround * test 1 * workaround to handle token expiry error * fix typo Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Implement troubleshooting guide with first typical issues (#208) * write troubleshooting guide Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Fix order of precedence for AML workspace references in submit.py (#209) * fix order of precedence * fix build Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Add data permissions issue to TSG (#210) * add permissions issue to TSG Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * November notes (#211) Co-authored-by: thomasp-ms <XXX@me.com> * create instance type and select it for run for cc * upgrade all pip dependencies (#212) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * format * use azureml built in distribution fw * Test industry-relevant examples if any changes in the `utils` dir are observed (#221) * add test to validate changes in the utils dir * test1 trigger workflow * fix typo * only destroy ddp group if it was created * remove unnecessary imports * allow for mixture of ddp and non-ddp processes model aggregation * use documentation instead of ps1 script for creating instancetype for CC * add instance type assignment for all examples * 
formatting * formatting * update batch size * update model name * use older pytorch * Generalize aggregate component to Babel (#220) * init branch * wip data exploration * data exploration region/silo * basic model * regions * basic network and finished data processing * training * Implement generic FedAvg without model object (#167) * generic fedavg pytorch * support model classes * add docstrings Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add README * update normalization * update exploration * Thomas/small improvements (#171) * remove unused local MNIST data * add link to provisioning cookbook in docs readme * recommend creating a conda env in the quickstart Co-authored-by: thomasp-ms <XXX@me.com> * update example for finance with multiple models * successful training through lstm * revert unneeded changes * remove local exploration ipynb * fix test metric * different param value for AKS (#179) Co-authored-by: thomasp-ms <XXX@me.com> * Pneumonia xray example (#164) * add more intuitive agg output dir path * reformat using black * add iteration2 branch for PR build testing * reformat date and pass kwargs instead in the getUniqueIdentifier fn * working 
submit * working factory submit * linting * move component path * add soft validation * add soft validation * Add basic tests on config * linting * working bicep deployment for vanilla demo * proper orchestrator script, double containers * fix name * docstring * docstring * rollback to using only 1 container * align naming convention * instructions * working submit * set up permission model * working orch perms * wonky perms assignment * working role assignments * remove old perm model * working except silo2orch * fix typo * working submit with config * add sku as param * use R/W for now * fix submit to align with bicep provisioning demo * linting * remove dataset files * fix docstring on permission model * write draft docs with homepage, align structure, remove requirements, ensure demo documented * rollback change to req * change factory to use custom model type during validation * linting * Display metrics at the pipeline level (#68) * Fix optional input yaml and mlflow log bugs (#59) * Accomodate optional input chnages and switch from mlflow autologging to manual logging * code style * change optional inputs syntax Co-authored-by: Jeff Omhover 
<jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Make changes to display all metrics at the pipeline level * Log preprocessing metadata in mlflow * linting * Pass client as an arg * Fix typo, rmv name from silo config, metric naming convention, and add metric identifier in the preprocessing component Co-aut…

* Update release branch (#271) * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * init branch * wip data exploration * data exploration region/silo * basic model * regions * basic network and finished data processing * training * Implement generic FedAvg without model object (#167) * generic fedavg pytorch * support model classes * add docstrings Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add README * update normalization * update exploration * Thomas/small improvements (#171) * remove unused local MNIST data * add link to provisioning cookbook in docs readme * recommend creating a conda env in the quickstart Co-authored-by: thomasp-ms <XXX@me.com> * update example for finance with multiple models * successful training through lstm * revert unneeded changes * remove local exploration ipynb * fix test metric * different param value for AKS (#179) Co-authored-by: thomasp-ms <XXX@me.com> * Pneumonia xray example (#164) * add more intuitive agg output dir path * reformat using black * add iteration2 branch for PR build testing * reformat date and pass kwargs instead in the getUniqueIdentifier fn * working submit * working factory submit * linting * move component path * add soft validation * add soft validation * Add basic tests on config * linting * working bicep deployment for vanilla demo * proper orchestrator script, double containers * fix name * docstring * docstring * rollback to using only 1 container * align naming convention * instructions * working submit * set up permission model * working orch perms * wonky perms assignment * working role assignments * remove old perm model * working except silo2orch * fix typo * working submit with config * add sku as param * use R/W for now * fix submit to align with bicep provisioning demo * linting * remove dataset files * fix docstring on permission model * write draft docs with homepage, align structure, remove requirements, ensure demo documented * rollback change to req * change factory to use custom model type during 
validation * linting * Display metrics at the pipeline level (#68) * Fix optional input yaml and mlflow log bugs (#59) * Accomodate optional input chnages and switch from mlflow autologging to manual logging * code style * change optional inputs syntax Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Make changes to display all metrics at the pipeline level * Log preprocessing metadata in mlflow * linting * Pass client as an arg * Fix typo, rmv name from silo config, metric naming convention, and add metric identifier in the preprocessing component Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Remove redundant files from the mlops directory (#69) * Remove internal & external dir as provisioning is taken care by bicep * keep mnist data files * rename demo script (#71) Co-authored-by: 
Jeff Omhover <jeomhove@microsoft.com> * Unified documentation (#72) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * simplify sandbox script * simplify script, ensure it works * align config of native submit * align naming conventions between scripts, reinject rbac role * create test job for quickly debugging provisioning issues * fix tests * linting * move permissions to storage * align config with bicep scrits * Document the metrics panel of the pipeline overview in the quickstart (#76) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * linting * add docstrings and disclaimers * Add instructions on how to create a custom graph (#78) * WIP: unifying docs * Remove redundant doc file. 
We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * add instructions on how to create a custom graph * do better comments * Refine native code (#82) * fix silo name * log only one datapoint per iteration for an aggregated metrics * Align terminology for iteration/round/num_rounds * linting * use storage blob data contibutor * add demoBaseName to guid name of role deployment (#85) Co-authored-by: thomasp-ms <XXX@me.com> * use id list, add listkeys builtin * rename and dissociate orchestrator in resource + orchestrator * separate orchestrator script * draft sandbox setup * make silo script distinct * Update orchestrator_open.bicep * Update internal_blob_open.bicep * remove comments * align hello world example with new naming conventions * ensure uai assignments are created AFTER storage is created * linting * enforce precedence * merge from secure branch * use different regions, limit size of account * reduce to 3 regions, add keys to guid * substring * align config * do not use model * Add msi version of scripts * sandbox main can switch between uai and msi * fix name * linting * linting * implement ignore param, hotfix model with startswith * Address my own comments on Jeff's PR (#96) * remove magic number * little improvements on some comments * remove unused files * put dash replacement next to length check * don't necessarily assume USER AI * UAI -> XAI * revert previous UAI -> XAI changes * move length check next to dash replacement * typo * try movind the dependsOn's * RAGRS -> LRS * revert dependsON changes * revert another small change in a comment Co-authored-by: thomasp-ms <XXX@me.com> * align config of both submit scripts * Make distinction between on-off 
and repeatable provisioning scripts (#99) * clarify the role needed * remove "custom role" line * adjust locations * use existing rg if not Owner of the sub * clarify "Secure" setup * add usage instructions in docstring * explain what scripts are one-off (vs repeatable) Co-authored-by: thomasp-ms <XXX@me.com> * Align round/iteration terminology with the native code (#103) * rename parameter in config file * keep iterations instead of rounds * round -> iteration Co-authored-by: thomasp-ms <XXX@me.com> * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * get all goodies from secureprovisioning branch wip * align both submits to work * add optional test * rename native to literal * add getting started in readme, introduce emojis * change person * remove emojs * Propose rewriting of readme to highlight motivation first (#110) * propose rewriting of readme to highlight motivation first * minor edit Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update README.md * Update quickstart to mention rg clean-up * Update quickstart.md * Update quickstart.md * Update quickstart.md * Build bicep scripts as ARM template, add Azure Buttons to quickstart (#120) * Update quickstart to lower header (hotfix) (#117) * add arm templates, add button in quickstart * switch to releasebranchlink Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Add subscription id, resource group and workspace name as CLI args (#122) * add more cli args * code style * code style * update quickstart doc * update readme * Initiate provisioning "cookbook" with list of provisioning scenarios + example (#123) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Continuous Integration Tests (#119) * take values of subscription id, rs grp, ws name, etc from github secrets and submit a native pipeline * change path * Test azure creds in the github workflow * reformatting * add pipeline validation and testing workflow * add permissions * add 
permissions * check only certain dir to trigger workflows * add soft validation for any iteration branch PR * add provisioning script test * testing * create rg * create rg * change compute for testing * change demoname * delete old rg * change demoname * add demobasename and aml ws name as github secrets * random demo base name * auto generate random base name * random demo base name * adjust random num length * add vnet sandbox test * rmv dependency b/w jobs * submit various pipelines * change execution graph path * add cli args in the factory code * change compute for testing * ignore validation - factory * create custom action * correct path * correct path * add shell in the github action * create github actions and take required values as input params * add shell * add wait condition * add logs * linting * correct rg name * add azure ml extension * handle ml extension installation error. * add release branch test cases * add script to delete run history * cronjob test * cronjob test * checkout branch * test run history deletion script * test run history deletion script * test run history deletion script * azure login * date format change * remove double quotes * date format change * archive run history script tested * Add vnet-based provisioning options to cookbook (#128) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Make deployment name unique in our github actions (#135) * set unique name for deployments * add attempt to deployment name Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refactor compute/storage scripts to be independent (#132) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide motivation in provisioning docs for using service endpoints (#136) * add motivation for service endpoints * add link Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Refresh provisioning arm buttons with latest from bicep (#139) * align names of directories * rebuild all arm Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * 
Update silo_vnet_newstorage.md (#141) * Add Bicep build vs ARM template diff test (#140) * Add diff test for bicep vs arm * Debug * Debug * fix syntax error * redirect build output to stdout * coorect path * trigger arm template test when pushing changes to main branch from release* branch * remove redundant logs * Add "open aks with cc" provision tutorial and bicep scripts (#138) * implement bicep scripts to provision open aks with cc * add aks cc tutorial * build arm and add in branch * add button Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Provide script + tutorial to attach pair with an existing storage (#142) * provision datastore with existing storage * add arm for existing storage, add docs * add link in readme Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add latest arm templates to diff build (#145) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implements provisioning script for a confidential compute VM jumpbox inside a vnet (debug) (#146) * add jumpbox script with tutorial * add template to diff build Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Update jumpbox_cc.md (#147) * update tutorials for silos to integrate feedback (#149) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Implement option to turn orchestrator storage fully private (behind PLE) (#150) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Tutorial on how to adapt native and factory code to write FL experiments. 
(#100) * WIP: add general information about the factory code * moving factory-tutorial to another file * add scenarios * add instructions on how to adapt literal code * rename file * add general info and fix typos * Jeff's feedback * Apply code clean-up to provision scripts before bug bash (#148) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Instructions for provisioning external silos (#101) * very first stab, far from done * non-secure native job using the on-prem k8s * use on-prem silos in example factory job * Revert "very first stab, far from done" This reverts commit e00d882dfee6a348eb89cd63e339a051b85ce0ca. * Revert "use on-prem silos in example factory job" This reverts commit e2ef8841c6be25a6c84b57ae079cca8f361323fe. * Revert "non-secure native job using the on-prem k8s" This reverts commit 923e5f321d28b30d8cd9759c47a7ffe5457e3284. * restore doc stub * Make Git ignore resources for test jobs * fix gitignore * typo in comment * steps A through D * 2 typos * move to subdir * fix workspace creation * add orchestrator part, role, and timeline * last commit before PR * adjust to new open_azureml_workspace.bicep * first wave after Jeff's comments * address jeff's comments * typo * light trims Co-authored-by: thomasp-ms <XXX@me.com> * bump up every title * skeleton * first attempt at data prep like Harmke * change secret name * wrong secret name * remove separate unzip * change clients, create silo data assets * different names for silo data assets, duh * cleanup * adjust secret name in doc * . 
* use latest literal code * align environment with literal * base on latest component * one dataset, comment out 2 unused args (for now) * introduce new arguments * reflect modified args in component spec * remove unused arg from config * start hooking up to Harmke's trainer * initialize PTLearner and include in run.py * use same values as Harmke for epochs and lr * attributes with _, start implementing local_train * add loggings, add test(), fix device_ * train_loader_ * align _'s * fix transform bug * remove unused constants * use proper model in aggregation code * removed unused file * remove unused code and arguments, logging to DEBUG * restore `metrics_prefix` parameter * finish restoring `metrics_prefix` * do not duplicate model code * revert dedup attempt * improve docstrings and descriptions * change experiment name * change pipeline name and docstring * cite sources, remove wrongly added licenses * italics * black Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: unknown <Mitgarg17495@gmail.com> * update formatting * add readme section * rename training to traininsilo for consistency * add more comments and update docs * include urgency in PR template (#184) Co-authored-by: thomasp-ms <XXX@me.com> * Share resources and standardize component names (#182) * use shared agg component across all examples * only keep a single {reqs/env} * use more recent pip version * standardize component spec name * support dummy HELLOWORLD example is agg * black Co-authored-by: thomasp-ms <XXX@me.com> * Thopo/share component and environment (#185) * use shared agg component across all examples * only keep a single {reqs/env} * use more recent pip version * standardize component spec name * support dummy HELLOWORLD example is agg * black * SHARED -> utils, rename agg env Co-authored-by: thomasp-ms <XXX@me.com> * rename config to spec and add upload data step * upload 
data script * use util aggregateweights * add data splitting pipeline * docs update * log pipeline level only once per silo training * do categorical encoding ahead of splitting * nit updates * update comment * update formatting * Hotfix: grant `az login` permissions to the 'clear run history' script (#166) * simplify the job wait condition's code * add comments * trigger mnist pipeline check * test token validity * grant `az login` permissions to the clear-history script * revert to sleep wait code * test access token validity * nit Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * fix readme * aggregate weughts on whichever device is available * update docstrings * update formatting * reduce upload pipeline file * fix datastore * add info about data upload step * fix typo * steps for changing access policies * update docs
* Named Entity Recognition example (#177) * add multinerd template files * NER components * re-structure * partition data + log metrics * add redme * add readme * restructuring * restructuring * add doc strings * train on gpus * create a separate component to upload data on silos * docs * rename * add assert statement * change upload-data job compute to orchestrator compute * remove ner from literal example choices * fix doc * add model-name, tokenizer configurable * pip version upgrade * reformatting * use shared aggregated component * rename script file * add note * create a compute that has access to silos' storage accs * change data uploading approach * update doc * incorporate Thomas's feedback * fix typo Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Create nice-looking homepage for the examples in readme+docs (#190) * add homepage for industry examples Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Align the medical imaging
data provisioning process with other examples (#191) * adjust paths in config file * support component with 1 output for Pneumonia * formatting * adjust doc to new provisioning * remove GH action for dataprep * custom component for provisioning pneumonia data * black Co-authored-by: thomasp-ms <XXX@me.com> * hot fix (#192) Co-authored-by: thomasp-ms <XXX@me.com> * Lots of micro-fixes after bug bashing all 3 industry examples (#194) related to components: * create distinct names for all components of each scenario * polish component descriptions * remove unused mnist datatransfer and postprocessing components * upgrade all MCR images to a more recent OS * cut some unnecessary dependencies * use curated environments whenever possible (to speed up job build time) related to pipelines: * fix issues with ccfraud submit script (path to shared folder) * remove unnecessary json+azure imports in submit scripts * align all 3 submissions scripts * in upload data pipeline, make --example required without default value to force intentional decision * in upload data pipeline, use scenario name in the output path to avoid collision * give each submit pipeline a distinct experiment and run name for readability Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Standardize all 3 real world example tutorials (#193) * standardize documentation on all 3 examples * change titles * fix spaces * add pip instructions * upgrade azure-ai-ml version Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * poc for ddp training * remove debug code + allow logging from multiple nodes * update formatting * provide correct link to Kaggle dataset (#196) * provide correct link * . * . 
Co-authored-by: thomasp-ms <XXX@me.com> * add DDP docs * Add CI tests for industry-relevant examples (#186) * add pneumonia and ner examples tests * add ccfraud test in the CI/CD pipeline * add data upload test * trigger workflow * CI testing1 * CI testing1 * test kv kaggle creds * fix creds * fix creds * set kaggle creds * test pneumonia data-upload * test all industry relevant examples * upload data test for 3 examples * add main tests * rmv redundant chrs * fix typo * avoid industry relevant examples tests to run on the vnet setup as it is already covered by the open setup Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com>
* CLI commands to add credentials in the workspace keyvault (#199) * add cli cmds to set a kv secret * Jeff's feedback * Implement Thomas's feedback Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Thomas/bug bash feedback 04 (#203) * no need to navigate to a specific directory * keyvault -> key vault * improve Kaggle sections * GPU's for NER example * ARM templates with latest bicep version * bold * GPU instructions in quickstart Co-authored-by: thomasp-ms <XXX@me.com> * fix test to align with new sdk (#204) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>
* Hotfix: DataAccessError (orchestrator access) (#205) * fix bug * update arm template * fix a problem that was encountered during resolving merge conflicts Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com>
* Add GitHub Action workflow concurrency and implement token expiration policy workaround (#200) * add GitHub workflow concurrency * test 1 * test 1 * test 1 * test 2 * test 3 * test 2 * test 3 * implement token expiry workaround * test 1 * workaround to handle token expiry error * fix typo Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Implement troubleshooting guide with first typical issues (#208) * write troubleshooting guide Co-authored-by: Jeff Omhover <jeomhove@microsoft.com>
* Fix order of precedence for AML workspace references in submit.py (#209) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to
build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. 
git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * fix order of precedence * fix build Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Add data permissions issue to TSG (#210) * add permissions issue to TSG Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * November notes (#211) Co-authored-by: thomasp-ms <XXX@me.com> * create instance type and select it for run for cc * upgrade all pip dependencies (#212) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * format * use azureml built in distribution fw * Test industry-relevant examples if any changes in the `utils` dir are observed (#221) * add test to validate changes in the utils dir * test1 trigger workflow * fix typo * only destroy ddp group if it was created * remove unnecessary imports * allow for mixture of ddp and non-ddp processes model aggregation * use documentation instead of ps1 script for creating instancetype for CC * add instance type assignment for all examples * 
formatting * formatting * update batch size * update model name * use older pytorch * Generalize aggregate component to Babel (#220) * init branch * wip data exploration * data exploration region/silo * basic model * regions * basic network and finished data processing * training * Implement generic FedAvg without model object (#167) * generic fedavg pytorch * support model classes * add docstrings Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * add README * update normalization * update exploration * Thomas/small improvements (#171) * remove unused local MNIST data * add link to provisioning cookbook in docs readme * recommend creating a conda env in the quickstart Co-authored-by: thomasp-ms <XXX@me.com> * update example for finance with multiple models * successful training through lstm * revert unneeded changes * remove local exploration ipynb * fix test metric * different param value for AKS (#179) Co-authored-by: thomasp-ms <XXX@me.com> * Pneumonia xray example (#164) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to 
change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add more intuitive agg output dir path * reformat using black * add iteration2 branch for PR build testing * reformat date and pass kwargs instead in the getUniqueIdentifier fn * working 
submit * working factory submit * linting * move component path * add soft validation * add soft validation * Add basic tests on config * linting * working bicep deployment for vanilla demo * proper orchestrator script, double containers * fix name * docstring * docstring * rollback to using only 1 container * align naming convention * instructions * working submit * set up permission model * working orch perms * wonky perms assignment * working role assignments * remove old perm model * working except silo2orch * fix typo * working submit with config * add sku as param * use R/W for now * fix submit to align with bicep provisioning demo * linting * remove dataset files * fix docstring on permission model * write draft docs with homepage, align structure, remove requirements, ensure demo documented * rollback change to req * change factory to use custom model type during validation * linting * Display metrics at the pipeline level (#68) * Fix optional input yaml and mlflow log bugs (#59) * refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * 
clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * Accomodate optional input chnages and switch from mlflow autologging to manual logging * code style * change optional inputs syntax Co-authored-by: Jeff Omhover 
<jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Make changes to display all metrics at the pipeline level * Log preprocessing metadata in mlflow * linting * Pass client as an arg * Fix typo, rmv name from silo config, metric naming convention, and add metric identifier in the preprocessing component Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com> * Remove redundant files from the mlops directory (#69) * Remove internal & external dir as provisioning is taken care by bicep * keep mnist data files * rename demo script (#71) Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> * Unified documentation (#72) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * simplify sandbox script * simplify script, ensure it works * align config of native submit * align naming conventions between scripts, reinject rbac role * create test job for quickly debugging provisioning issues * fix tests * linting * move permissions to storage * align config with bicep scrits * Document the metrics panel of the pipeline overview in the quickstart (#76) * WIP: unifying docs * Remove redundant doc file. 
We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * linting * add docstrings and disclaimers * Add instructions on how to create a custom graph (#78) * WIP: unifying docs * Remove redundant doc file. We can always revisit if needed * FL concepts will be covered in the glossary doc * Remove internal and external silos docs as the code will be re-written in bicep * provide comprehensive documentation * rename file * refine docs * refine docs and rename fl_cross_silo_basic to fl_cross_silo_native * document the metrics/pipeline panel in the quickstart * add instructions on how to create a custom graph * do better comments * Refine native code (#82) * fix silo name * log only one datapoint per iteration for an aggregated metrics * Align terminology for iteration/round/num_rounds * linting * use storage blob data contibutor * add demoBaseName to guid name of role deployment (#85) Co-authored-by: thomasp-ms <XXX@me.com> * use id list, add listkeys builtin * rename and dissociate orchestrator in resource + orchestrator * separate orchestrator script * draft sandbox setup * make silo script distinct * Update orchestrator_open.bicep * Update internal_blob_open.bicep * remove comments * align hello world example with new naming conventions * ensure uai assignments are created AFTER storage is created * linting * enforce precedence * merge from secure …

thomasp-ms added 3 commits October 5, 2022 10:04

rename parameter in config file

7062e41

keep iterations instead of rounds

da91601

round -> iteration

75c9d51
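
The mismatch these three commits address can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's code: only the names `num_rounds` and `num_of_iterations` come from the PR description, while `training_parameters`, `epochs`, and both helper functions are hypothetical, and the config is modeled as a plain namespace rather than whatever loader `submit.py` actually uses.

```python
from types import SimpleNamespace

# Hypothetical config object: per the PR description, the config
# only provides `num_of_iterations` (the `epochs` field is made up).
training_parameters = SimpleNamespace(num_of_iterations=3, epochs=1)


def iterations_before_fix(params):
    """Pre-fix lookup: asks for a key the config never defines."""
    return params.num_rounds  # raises AttributeError on this config


def iterations_after_fix(params):
    """Post-fix lookup: code and config agree on `num_of_iterations`."""
    return params.num_of_iterations


# The old lookup fails because `num_rounds` is absent from the config.
try:
    iterations_before_fix(training_parameters)
    mismatch_detected = False
except AttributeError:
    mismatch_detected = True

# The aligned lookup succeeds.
num_iterations = iterations_after_fix(training_parameters)
```

Renaming on the code side, rather than adding a `num_rounds` alias to the config, keeps a single term ("iteration") in use, which matches the direction taken in #82.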

thomasp-ms requested a review from garg-amit October 5, 2022 17:30

thomasp-ms marked this pull request as ready for review October 5, 2022 17:30

garg-amit approved these changes Oct 5, 2022

thomasp-ms merged commit 3cae4b6 into release-sdkv2-iteration-02 Oct 5, 2022

thomasp-ms deleted the thomas/fix-config branch October 5, 2022 17:54
