Make the cluster/gpu deployment work from buildartifacts

Give up on running it locally for now.
elibarzilay committed Nov 13, 2017
1 parent 7ef335c commit 501da9dc57ff0959c81cafb86ecbc827ab57926b
@@ -91,7 +91,7 @@ notebooks. See the [documentation](docs/docker.md) for more on Docker use.
> To read the EULA for using the docker image, run \
> `docker run -it -p 8888:8888 microsoft/mmlspark eula`
#### GPU VM Setup
### GPU VM Setup
MMLSpark can be used to train deep learning models on a GPU node from a Spark
application. See the instructions for [setting up an Azure GPU
@@ -2,20 +2,23 @@
## Requirements
CNTK training using MMLSpark in Azure requires an HDInsight Spark cluster and a
GPU virtual machine (VM). The GPU VM should be reachable via SSH from the
cluster, but no public SSH access (or even a public IP address) is required.
As an example, it can be on a private Azure virtual network (VNet), and within
this VNet, it can be addressed directly by its name and access the Spark
cluster nodes (e.g., use the active NameNode RPC endpoint).
See the original [copyright and license notices](third-party-notices.txt) of
third party software used by MMLSpark.
CNTK training using MMLSpark in Azure requires an HDInsight Spark
cluster and a GPU virtual machine (VM). The GPU VM should be reachable
via SSH from the cluster, but no public SSH access (or even a public IP
address) is required, and the cluster's NameNode should be accessible
from the GPU machine via the HDFS RPC. As an example, it can be on a
private Azure virtual network (VNet), and within this VNet, it can be
addressed directly by its name and access the Spark cluster nodes (e.g.,
use the active NameNode RPC endpoint).
(See the original [copyright and license
notices](third-party-notices.txt) of third party software used by
MMLSpark.)
### Data Center Compatibility
Currently, not all data centers have GPU VMs available. See [the Linux
VMs page](https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/)
Currently, not all data centers have GPU VMs available. See [the Linux VMs
page](https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/)
to check availability in your data center.
## Connect an HDI cluster and a GPU VM via the ARM template
@@ -44,21 +47,7 @@ the associated GPU VM:
- `gpuVirtualMachineName`: The name of the GPU virtual machine to create
- `gpuVirtualMachineSize`: The size of the GPU virtual machine to create
If you need to further configure the environment (for example, to change [the
class of VM
sizes](https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/)
for HDI cluster nodes), modify the template directly before deployment. See
also [the guide for best ARM template
practices](https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-template-best-practices).
For the naming rules and restrictions for Azure resources please refer to the
[Naming conventions
article](https://docs.microsoft.com/en-us/azure/architecture/best-practices/naming-conventions).
There are actually three templates that are used for deployment:
- [`deploy-main-template.json`](https://mmlspark.azureedge.net/buildartifacts/0.9/deploy-main-template.json):
  This is the main template. It references the following two child
  templates; these are relative references, so they are expected to be
  found at the same base URL.
There are actually two additional templates that are used from this main template:
- [`spark-cluster-template.json`](https://mmlspark.azureedge.net/buildartifacts/0.9/spark-cluster-template.json):
A template for creating an HDI Spark cluster within a VNet, including
MMLSpark and its dependencies. (This template installs MMLSpark using
@@ -69,46 +58,40 @@ There are actually three templates that are used for deployment:
CNTK and other dependencies that MMLSpark needs for GPU training.
(This is done via a script action that runs
[`gpu-setup.sh`](https://mmlspark.azureedge.net/buildartifacts/0.9/gpu-setup.sh).)
Note that the last two child templates can also be deployed independently, if
Note that these child templates can also be deployed independently, if
you don't need both parts of the installation.
## Deploying an ARM template
### 1. Deploy an ARM template within the [Azure Portal](https://ms.portal.azure.com/)
An ARM template can be opened within the Azure Portal via the following REST
API:
[Click here to open the above
template](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fmmlspark.azureedge.net%2Fbuildartifacts%2F0.9%2Fdeploy-main-template.json)
in the Azure portal.
https://portal.azure.com/#create/Microsoft.Template/uri/<ARM-template-URI>
(If needed, click the **Edit template** button to view and edit the
template.)
The URI can be one for either an *Azure Blob* or a *GitHub file*. For example,
This link is using the Azure Portal API:
https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fmystorage.blob.core.windows.net%2Fdeploy-main-template.json
https://portal.azure.com/#create/Microsoft.Template/uri/〈ARM-template-URI〉
(Note that the URL is percent-encoded.) Clicking on the above link will
open the template in the Portal. If needed, click the **Edit template** button
(see screenshot below) to view and edit the template.
where the template URI is percent-encoded.
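
As a minimal sketch of the percent-encoding step, the following bash function encodes a template URI so that it can be appended to the Portal link prefix shown above. The function name `urlencode` and the variable `portal_prefix` are illustrative assumptions and are not part of the MMLSpark scripts; the sketch only handles ASCII input.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MMLSpark): percent-encode an ASCII string.
urlencode() {
  local LC_ALL=C s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      # Unreserved characters are copied through unchanged.
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      # Everything else becomes %XX, using the character's byte value.
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Assumed prefix, copied from the Portal link format described above.
portal_prefix="https://portal.azure.com/#create/Microsoft.Template/uri/"
template_uri="https://mmlspark.azureedge.net/buildartifacts/0.9/deploy-main-template.json"
echo "${portal_prefix}$(urlencode "$template_uri")"
```

Run with the 0.9 main template URL, this should reproduce the "Click here" link target given earlier in this section.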
![ARM template in Portal](http://image.ibb.co/gZ6iiF/arm_Template_In_Portal.png)
### 2. Deploy an ARM template with MMLSpark Azure CLI 2.0
### 2. Deploy an ARM template with [MMLSpark Azure CLI 2.0](https://mmlspark.azureedge.net/buildartifacts/0.9/deploy-arm.sh)
We also provide a convenient shell script to create a deployment on the
command line:
MMLSpark provides an Azure CLI 2.0 script
([`deploy-arm.sh`](../tools/deployment/deploy-arm.sh)) to deploy an ARM
template (such as
[`deploy-main-template.json`](https://mmlspark.azureedge.net/buildartifacts/0.9/deploy-main-template.json))
along with a parameter file (see
[deploy-parameters.template](../tools/deployment/deploy-parameters.template)
for a template of such a file).
* Download the [shell
script](https://mmlspark.azureedge.net/buildartifacts/0.9/deploy-arm.sh)
and make a local copy of it
> Note that you cannot use the
> [template file](../tools/deployment/deploy-main-template.json) from
> the source tree, since it requires additional resources that are
> created by the build (specifically, a working version of
> [`install-mmlspark.sh`](../tools/hdi/install-mmlspark.sh)).
* Create a JSON parameter file by downloading [this template
file](https://mmlspark.azureedge.net/buildartifacts/0.9/deploy-parameters.template)
and modify it according to your specification.
The script takes the following arguments:
You can now run the script — it takes the following arguments:
- `subscriptionId`: The GUID that identifies your subscription (e.g.,
`01234567-89ab-cdef-0123-456789abcdef`), defaults to setting in your
`az` environment.
@@ -118,29 +101,28 @@ The script take the following arguments:
`East US`), note that this is required if creating a new resource
group.
- `deploymentName`: The name for this deployment.
- `templateLocation`: The URL of an ARM template file, or the path to
one. By default, it is set to `deploy-main-template.json` in the same
directory, but note that this will normally not work without the rest
of the required resources.
- `parametersFilePath`: The path to the parameter file, which you need
to create. Use `deploy-parameters.template` as a template for
creating a parameters file.
- `templateLocation`: The URL of an ARM template file. By default, it
is set to the above main template.
- `parametersFilePath`: The path to the parameter file, which you have
created.
Run the script with a `-h` or `--help` to see the flags that are used to
set these arguments:
./deploy-arm.sh -h
If no flags are specified on the command line, the script will prompt
you for all values. If needed, install the Azure CLI 2.0 using the
instructions found in the [Azure CLI Installation
Guide](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli).
you for all needed values.
> Note that the script uses the Azure CLI 2.0; see the
> [Azure CLI Installation Guide](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli)
> if you need to install it.
### 3. Deploy an ARM template with the [MMLSpark Azure PowerShell](https://mmlspark.azureedge.net/buildartifacts/0.9/deploy-arm.ps1)
### 3. Deploy an ARM template with the MMLSpark Azure PowerShell
MMLSpark also provides a [PowerShell
script](https://mmlspark.azureedge.net/buildartifacts/0.9/deploy-arm.ps1)
to deploy ARM templates, similar to the above bash script, run it with
to deploy ARM templates, similar to the above bash script. Run it with
`-?` to see the usage instructions (or use `get-help`). If needed,
install the Azure PowerShell cmdlets using the instructions in the
[Azure PowerShell
@@ -164,7 +146,7 @@ Azure will stop billing if a VM is in a "Stopped (**Deallocated**)" state,
which is different from the "Stopped" state. So make sure it is *Deallocated*
to avoid billing. In the Azure Portal, clicking the "Stop" button will put the
VM into a "Stopped (Deallocated)" state and clicking the "Start" button brings
it VM. See "[Properly Shutdown Azure VM to Save
it back up. See "[Properly Shutdown Azure VM to Save
Money](https://buildazure.com/2017/03/16/properly-shutdown-azure-vm-to-save-money/)"
for further details.
@@ -96,12 +96,7 @@
"brainscriptText = \"\"\"\n",
" # ConvNet applied on CIFAR-10 dataset, with no data augmentation.\n",
"\n",
" command = TrainNetwork\n",
"\n",
" precision = \"double\"; traceLevel = 1 ; deviceId = \"auto\"\n",
"\n",
" rootDir = \"../../..\" ; dataDir = \"$$rootDir$$/DataSets/CIFAR-10\" ;\n",
" outputDir = \"./Output\" ;\n",
" parallelTrain = true\n",
"\n",
" TrainNetwork = {\n",
" action = \"train\"\n",
@@ -148,7 +143,7 @@
"\n",
" SGD = {\n",
" epochSize = 0\n",
" minibatchSize = 256\n",
" minibatchSize = 32\n",
"\n",
" learningRatesPerSample = 0.0015625*10:0.00046875*10:0.00015625\n",
" momentumAsTimeConstant = 0*20:607.44\n",
@@ -164,18 +159,7 @@
" dataParallelSGD = { gradientBits = 1 }\n",
" }\n",
" }\n",
"\n",
" reader = {\n",
" readerType = \"CNTKTextFormatReader\"\n",
" file = \"$$DataDir$$/Train_cntk_text.txt\"\n",
" randomize = true\n",
" keepDataInMemory = true # cache all data in memory\n",
" input = {\n",
" features = { dim = 3072 ; format = \"dense\" }\n",
" labels = { dim = 10 ; format = \"dense\" }\n",
" }\n",
" }\n",
"}\n",
" }\n",
"\"\"\""
]
},
@@ -25,9 +25,9 @@
.PARAMETER deploymentName
The deployment name.
.PARAMETER templateFilePath
Path of the template file to deploy.
Optional, defaults to deploy-main-template.json in this directory.
.PARAMETER templateLocation
URL of the template to deploy.
Optional, defaults to the one corresponding to this script.
.PARAMETER parametersFilePath
Path of the parameters file to use for the template, use
@@ -57,37 +57,48 @@ param(
[string]
$resourceGroupName,
[Parameter(Mandatory=$False)]
[string]
$resourceGroupLocation,
[Parameter(Mandatory=$False)]
[string]
$deploymentName,
[Parameter(Mandatory=$False)]
[string]
$templateFilePath = "deploy-main-template.json",
$templateLocation,
[Parameter(Mandatory=$True)]
[string]
$parametersFilePath
)
# <=<= this line is replaced with variables defined with `defvar -X` =>=>
$DOWNLOAD_URL = "$STORAGE_URL/$MML_VERSION"
# TODO: throw an error if $MML_VERSION is not defined
<#
.SYNOPSIS
Registers RPs
#>
Function RegisterRP {
Param(
[string]$ResourceProviderNamespace
)
Write-Host "Registering resource provider '$ResourceProviderNamespace'";
Register-AzureRmResourceProvider -ProviderNamespace $ResourceProviderNamespace;
Param(
[string]$ResourceProviderNamespace
)
Write-Host "Registering resource provider '$ResourceProviderNamespace'";
Register-AzureRmResourceProvider -ProviderNamespace $ResourceProviderNamespace;
}
#******************************************************************************
# Script body
# Execution begins here
#******************************************************************************
if (!$templateLocation) {
$templateLocation = $DOWNLOAD_URL + "/deploy-main-template.json";
}
$ErrorActionPreference = "Stop"
# sign in
@@ -101,29 +112,29 @@ Select-AzureRmSubscription -SubscriptionID $subscriptionId;
# Register RPs
$resourceProviders = @("microsoft.hdinsight");
if ($resourceProviders.length) {
Write-Host "Registering resource providers"
foreach ($resourceProvider in $resourceProviders) {
RegisterRP($resourceProvider);
}
Write-Host "Registering resource providers"
foreach ($resourceProvider in $resourceProviders) {
RegisterRP($resourceProvider);
}
}
#Create or check for existing resource group
$resourceGroup = Get-AzureRmResourceGroup -Name $resourceGroupName -ErrorAction SilentlyContinue
if (!$resourceGroup) {
Write-Host "Resource group '$resourceGroupName' does not exist. To create a new resource group, please enter a location.";
if (!$resourceGroupLocation) {
$resourceGroupLocation = Read-Host "resourceGroupLocation";
}
Write-Host "Creating resource group '$resourceGroupName' in location '$resourceGroupLocation'";
New-AzureRmResourceGroup -Name $resourceGroupName -Location $resourceGroupLocation
Write-Host "Resource group '$resourceGroupName' does not exist. To create a new resource group, please enter a location.";
if (!$resourceGroupLocation) {
$resourceGroupLocation = Read-Host "resourceGroupLocation";
}
Write-Host "Creating resource group '$resourceGroupName' in location '$resourceGroupLocation'";
New-AzureRmResourceGroup -Name $resourceGroupName -Location $resourceGroupLocation
} else {
Write-Host "Using existing resource group '$resourceGroupName'";
Write-Host "Using existing resource group '$resourceGroupName'";
}
# Start the deployment
Write-Host "Starting deployment...";
if (Test-Path $parametersFilePath) {
New-AzureRmResourceGroupDeployment -ResourceGroupName $resourceGroupName -TemplateFile $templateFilePath -TemplateParameterFile $parametersFilePath;
New-AzureRmResourceGroupDeployment -ResourceGroupName $resourceGroupName -TemplateUri $templateLocation -TemplateParameterFile $parametersFilePath;
} else {
New-AzureRmResourceGroupDeployment -ResourceGroupName $resourceGroupName -TemplateFile $templateFilePath;
New-AzureRmResourceGroupDeployment -ResourceGroupName $resourceGroupName -TemplateUri $templateLocation;
}
@@ -2,6 +2,15 @@
# Copyright (C) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License. See LICENSE in project root for information.
# This script deploys a Spark Cluster and a GPU, see docs/gpu-setup.md
# for details.
# <=<= this line is replaced with variables defined with `defvar -X` =>=>
DOWNLOAD_URL="$STORAGE_URL/$MML_VERSION"
if [[ -z "$MML_VERSION" ]]; then
echo "Error: this script cannot be executed as-is" 1>&2; exit 1
fi
set -euo pipefail
# -e: exit if any command has a non-zero exit status
# -u: unset variables are an error
@@ -71,7 +80,8 @@ readarg subscriptionId "Subscription ID" "$cursub"
readarg -r resourceGroupName "Resource Group Name"
readarg deploymentName "Deployment Name"
readarg resourceGroupLocation "Resource Group Location"
readarg templateLocation "Template Location (Path/URL)" "$here/deploy-main-template.json"
readarg templateLocation "Template Location URL" \
"$DOWNLOAD_URL/deploy-main-template.json"
readarg -rf parametersFilePath "Parameters File"
if [[ "$subscriptionId" != "$cursub" ]]; then
@@ -99,12 +109,7 @@ echo "Starting deployment..."
args=()
if [[ -n "$deploymentName" ]]; then args+=(--name "$deploymentName"); fi
args+=(--resource-group "$resourceGroupName")
if [[ "$templateLocation" = "http://"* ]]; then args+=(--template-uri)
elif [[ "$templateLocation" = "https://"* ]]; then args+=(--template-uri)
elif [[ -r "$templateLocation" ]]; then args+=(--template-file)
else failwith "templateLocation is neither a URL, nor does it point at a file"
fi
args+=("$templateLocation")
args+=(--template-uri "$templateLocation")
args+=(--parameters "@$parametersFilePath")
az group deployment create "${args[@]}" || failwith "Deployment failed"
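
Since this change always passes `templateLocation` via `--template-uri`, a local file path is no longer dispatched to `--template-file` and would presumably be rejected only once `az` runs. A minimal sketch of an early check follows; `is_template_url` and `check_template_location` are hypothetical helpers that are not part of `deploy-arm.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in deploy-arm.sh): succeed only for http(s) URLs.
is_template_url() {
  case "$1" in
    http://?*|https://?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical guard that could run before the `az` argument list is built.
check_template_location() {
  if ! is_template_url "$1"; then
    echo "templateLocation must be an http(s) URL: $1" 1>&2
    return 1
  fi
}
```

In the script itself, a failed check would more naturally call the existing `failwith` function; the sketch returns a status instead so that it stays self-contained.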