Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Named Entity Recognition example (#177)
* refactor components to use dpv2 + remove unnecessary environments * working dpv2 pipeline * refactor scripts with right inputs and outputs * fix code path * implement fake outputs * fix paths * fix imports * fix args of aggregation script * add note, fix component args * add chekcpoint arg * linting * linting * remove sdkv2 folder * add argparse to submit script * add docstring * add docstring * linting * linting * add staging branch to build * rollback changes to build, leave it for another PR * remove logging lien * remove custom uuid * linting * add docstring to custom path function * polish docstring * rename model_silo_X to input_silo_X * rename output * rename agg output * Improve auto-provisioning resources (#35) (#36) * docker file stub * move docker file, implement feedback * login before setting subscription * login before setting subscription * use default k8s version * pin latest version since default won't work * remove executionpolicy part, other small updates * clarify to change job file _in docker filesystem_ * login before setting subscription * formatting * \ -> / * install azureml-core in docker file * propagate changes to section 7 * fix dataset creation command Co-authored-by: thomasp-ms <XXX@me.com> Co-authored-by: thomasp-ms <XXX@me.com> * Refactor folder structure (#37) * `plan` -> `docs` * 'plan' -> 'docs' * 'automated_provisioning' -> 'mlops' * 'fl_arc_k8s' -> 'examples' Co-authored-by: thomasp-ms <XXX@me.com> * auto provisioning - vanilla internal silos (#41) * split internal and external provisioning * adjust directories after internal/external split * introduce overall mlops readme * first stab * remove useless comment and my alias Co-authored-by: thomasp-ms <XXX@me.com> * Perform real FL training on the MNIST dataset Added component files customized for MNIST dataset. Set the setup for 3 silo having their own compute and datastore. git config --global user.email "you@example.com" * refine components and add logs * maintain consistency b/w config files * add requirement and env files * add requirement and env files * rmv redundant dependencies, rename conda envs * Correct epoch default value * point data asset instead of underlying URI * beef up orchestrator cluster (#46) Co-authored-by: thomasp-ms <XXX@me.com> * Provision CPUs for silos (instead of GPUs) (#47) * beef up orchestrator cluster * gpu -> cpu Co-authored-by: thomasp-ms <XXX@me.com> * add preprocessing comp description, fix typo and correct default datastore name * add integration validation test - build * update readme file * Move logger to the maion if block, add pytorch channel in the conda env yaml and move readme to the docs folder * code reformatting using black * add documentation to run an FL experiment * add more intuitive path for aggr output dir * Merge changes * add multinerd template files * NER components * re-structure * partition data + log metrics * add redme * add readme * restructuring * restructuring * add doc strings * train on gpus * create a separate component to upload data on silos * docs * rename * add assert statement * change upload-data job compute to orchestrator compute * remove ner from literal example choices * fix doc * add model-name, tokenizer configurable * pip version upgrade * reformatting * use shared aggregated component * rename script file * add note * create a compute that has access to silos' storage accs * change data uploading approach * update doc * incorporate Thomas's feedback * fix typo Co-authored-by: Jeff Omhover <jeomhove@microsoft.com> Co-authored-by: Jeff Omhover <jf.omhover@gmail.com> Co-authored-by: Thomas <7998422+thomasp-ms@users.noreply.github.com> Co-authored-by: thomasp-ms <XXX@me.com>
- Loading branch information