This repository has been archived by the owner on Mar 20, 2023. It is now read-only.

Microsoft Azure Distributed Linear Learner Recipe #195

Merged
merged 106 commits into from Jun 27, 2018

danyrouh (Contributor) commented May 2, 2018

No description provided.

@@ -0,0 +1,27 @@
## MADL-CPU-OpenMPI Data Shredding
We included a python script that shows how to shred and deploy your training data prior to running an Azure training job.
Collaborator:

Minor: suggest ending the sentence with "prior to running a training job on Azure VMs via Open MPI."


### Pool Configuration
The pool configuration should enable the following properties:
* `vm_size` should be a CPU-only instance, 'STANDARD_D2_V2'.
Collaborator:

Suggested wording: "be a CPU-only instance, for example, `STANDARD_D2_V2`."

* `-d` log global models to this directory at the host"
* `-b` location for the algorithm's binary"

* The training data will need to be shredded to match the number of VMs and the thread's count per VM, and then deployed to a mounted Azure blob that the VM docker images have read/write access.
Collaborator:

You should provide the configuration example for this here, e.g.:

* `shared_data_volumes` should contain the shared data volume with an `azureblob` volume driver as specified in the global configuration file.

Contributor Author:

I included the part you added with a link to the full configuration file.
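
As an illustration of the shredding requirement discussed in this thread, here is a minimal Python sketch that splits training data into one shard per worker thread (number of VMs times threads per VM). It is not the script from this pull request, which is not shown here: the function names `shred_lines` and `write_shards`, the round-robin assignment, and the `shard-NNNN.txt` file naming are all assumptions made for the example. Copying the shards to the mounted Azure blob container is left out.

```python
from pathlib import Path


def shred_lines(lines, num_vms, threads_per_vm):
    """Split lines round-robin into num_vms * threads_per_vm shards.

    Hypothetical helper: round-robin assignment is an assumption,
    not necessarily what the recipe's own script does.
    """
    if num_vms < 1 or threads_per_vm < 1:
        raise ValueError("num_vms and threads_per_vm must be positive")
    num_shards = num_vms * threads_per_vm
    shards = [[] for _ in range(num_shards)]
    for index, line in enumerate(lines):
        shards[index % num_shards].append(line)
    return shards


def write_shards(shards, output_dir, prefix="shard"):
    """Write each shard to <output_dir>/<prefix>-<NNNN>.txt and return the paths.

    The file naming scheme is an assumption for illustration only.
    Each line is expected to keep its trailing newline.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for shard_id, shard in enumerate(shards):
        path = out / f"{prefix}-{shard_id:04d}.txt"
        path.write_text("".join(shard))
        paths.append(path)
    return paths
```

With 2 VMs and 2 threads per VM, `shred_lines` produces 4 shards; `output_dir` would then point at (or be copied to) the directory backed by the shared blob volume.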

* `multi_instance` property must be defined
* `num_instances` should be set to `pool_current_dedicated`, or
`pool_current_low_priority`
* `coordination_command` should be unset or `null`.
Collaborator:

Recommend eliminating the `coordination_command` and `resource_files` bullets since they are not needed.
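
The `multi_instance` bullets quoted in the diff above can be expressed as a small check, sketched here in Python. It covers only what those bullets state; the function name `check_multi_instance` and the plain-dict representation of a task are assumptions for illustration and are not part of Batch Shipyard. The `coordination_command` check mirrors the quoted bullet, even though the reviewer recommends dropping that bullet from the README.

```python
# Values the quoted bullets allow for num_instances.
ALLOWED_NUM_INSTANCES = ("pool_current_dedicated", "pool_current_low_priority")


def check_multi_instance(task):
    """Return a list of problems with a task's multi_instance settings.

    `task` is a plain dict standing in for one task entry of the jobs
    configuration (a hypothetical representation for this sketch).
    """
    multi = task.get("multi_instance")
    if multi is None:
        return ["multi_instance property must be defined"]
    problems = []
    if multi.get("num_instances") not in ALLOWED_NUM_INSTANCES:
        problems.append(
            "num_instances should be pool_current_dedicated "
            "or pool_current_low_priority"
        )
    if multi.get("coordination_command") is not None:
        problems.append("coordination_command should be unset or null")
    return problems
```

An empty list means the task satisfies the quoted bullets.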

@@ -0,0 +1,27 @@
## MADL-CPU-OpenMPI Data Shredding
We included a python script that shows how to shred and deploy your training data prior to running a training job on Azure VMs via Open MPI.
Azure VMs via Open MPI.
Collaborator:

Remove this line since you modified above.

@@ -0,0 +1,27 @@
## MADL-CPU-OpenMPI Data Shredding
We included a python script that shows how to shred and deploy your training data prior to running a training job on Azure VMs via Open MPI.
Collaborator:

Avoid using "We". Perhaps reword as:

This Data Shredding recipe shows how to shred...

Collaborator:

Please do a quick search on all your markdown files for use of "we" and replace with something else.
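
One way to run the search requested here is a short Python script, sketched below. It reports every line in the `.md` files under a directory that contains the standalone word "we", in any letter case. The function names `find_we` and `scan_markdown` are hypothetical, and the word-boundary match is an assumption about what counts as a use of "we" (it also matches contractions such as "we're").

```python
import re
from pathlib import Path

# Match "we" as a whole word, case-insensitively ("web" and "power" do not match).
WE_PATTERN = re.compile(r"\bwe\b", re.IGNORECASE)


def find_we(text):
    """Return (line_number, line) pairs for lines containing the word "we"."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if WE_PATTERN.search(line)
    ]


def scan_markdown(root):
    """Map each .md file under root to its matching lines, skipping clean files."""
    hits = {}
    for path in sorted(Path(root).rglob("*.md")):
        found = find_we(path.read_text(encoding="utf-8"))
        if found:
            hits[path] = found
    return hits
```

Calling `scan_markdown(".")` from the repository root lists the files and line numbers that still need rewording.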

#Dockerfile for MADL (Microsoft Distributed Learners)

FROM ubuntu:16.04
MAINTAINER Saeed Maleki Todd Mytkowicz Madan Musuvathi Dany rouhana <https://github.com/Azure/batch-shipyard>
Collaborator:

Use the real URL here

@@ -0,0 +1,27 @@
## MADL-CPU-OpenMPI Data Shredding
This Data Shredding recipe shows how to shred and deploy your training data prior to running a training job on Azure VMs via Open MPI.
Azure VMs via Open MPI.
Collaborator:

Looks like you might have missed the prior comment: remove line 3.

alfpark added this to the 3.5 milestone Jun 25, 2018
alfpark changed the base branch from master to develop June 26, 2018 15:26
alfpark added the recipe label Jun 26, 2018
alfpark merged commit baf9ce0 into Azure:develop Jun 27, 2018

2 participants