# Distributed XGBoost (CPU)

Scaling out on AmlCompute takes only a few changes. The code from the previous notebook has been adapted in [src/run.py](src/run.py). The changes are:

- import and initialize `dask_mpi`
- use `argparse` to accept command-line arguments
- log with `mlflow`
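
As a rough illustration of the `argparse` change, the sketch below parses the three hyperparameters that this notebook passes at submission time. It is not the contents of `src/run.py`: the function name `parse_args` and the default values are assumptions made for this example, and the `dask_mpi` initialization and `mlflow` logging are left out because they need a running cluster.

```python
import argparse


def parse_args(argv=None):
    """Parse hyperparameters from the command line (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    # Flag names match the arguments submitted from this notebook;
    # the defaults are placeholders, not values taken from src/run.py.
    parser.add_argument("--num_boost_round", type=int, default=10)
    parser.add_argument("--learning_rate", type=float, default=0.1)
    parser.add_argument("--gamma", type=float, default=0.0)
    return parser.parse_args(argv)


# Parse an explicit list instead of sys.argv so the example runs anywhere.
args = parse_args(["--num_boost_round", "100", "--learning_rate", "0.2", "--gamma", "0"])
print(vars(args))
```

Declaring a `type` for each flag means the script receives numbers rather than strings, even though every command-line value arrives as text.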

The [environment.yml](environment.yml) file contains the conda environment specification.

## Get Workspace

In [None]:
from azureml.core import Workspace

ws = Workspace.from_config()
ws

## Distributed Remotely

Create an ``MpiConfiguration`` with the desired node count and pass it to ``ScriptRunConfig`` as ``distributed_job_config``.

In [None]:
from azureml.core import ScriptRunConfig, Experiment, Environment
from azureml.core.runconfig import MpiConfiguration

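# Hyperparameters forwarded to src/run.py as command-line arguments
arguments = ["--num_boost_round", 100, "--learning_rate", 0.2, "--gamma", 0]
# Build the environment from the conda specification
env = Environment.from_conda_specification("xgboost-cpu-tutorial", "environment.yml")
# Run the script under MPI across 10 nodes
mpi_config = MpiConfiguration(node_count=10)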
src = ScriptRunConfig(
    source_directory="src",
    script="run.py",
    arguments=arguments,
    compute_target="cpu-cluster",
    environment=env,
    distributed_job_config=mpi_config,
)
run = Experiment(ws, "xgboost-cpu-tutorial").submit(src)
run
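
The node count does not translate one-to-one into Dask workers. The sketch below estimates the worker count under two assumptions that this notebook does not confirm: that `src/run.py` calls `dask_mpi.initialize()` with its default layout (MPI rank 0 runs the scheduler, rank 1 runs the client script, all remaining ranks run workers), and that the job runs one MPI process per node. The helper `dask_worker_count` is hypothetical and only does the arithmetic.

```python
def dask_worker_count(node_count, process_count_per_node=1):
    """Estimate Dask workers under the default dask_mpi rank layout (assumed)."""
    total_ranks = node_count * process_count_per_node
    # Two ranks are reserved: one for the scheduler, one for the client script.
    return max(total_ranks - 2, 0)


# With node_count=10 and one process per node, 8 ranks remain for workers.
print(dask_worker_count(10))  # prints 8
```

Under these assumptions, a job needs at least three ranks before any worker is available.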

## View Widget

Optionally, view the output in the run widget.

In [None]:
from azureml.widgets import RunDetails

RunDetails(run).show()

For testing, wait for the run to complete.

In [None]:
run.wait_for_completion(show_output=True)