
Distributed TensorFlow with Estimators

This is a tutorial on how to use TensorFlow's Estimator class, including creating an Estimator by importing a Keras model. The code uses the MNIST dataset.
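As a rough illustration of the Keras-to-Estimator step mentioned above, here is a minimal sketch. It is not necessarily the model that mnist.py builds: the layer sizes, the `build_estimator` name, and the `logs` model directory are assumptions chosen for illustration. It relies on `tf.keras.estimator.model_to_estimator`, which is available in TensorFlow 1.x but has been dropped from recent TensorFlow 2.x releases.

```python
# Illustrative sketch only; the architecture below is an assumption,
# not the model defined in mnist.py.
INPUT_SHAPE = (28, 28)  # MNIST images are 28x28 grayscale
NUM_CLASSES = 10        # digits 0-9


def build_estimator(model_dir="logs"):
    """Build a small Keras classifier and wrap it as a tf.estimator.Estimator."""
    # Imported lazily so this file can be loaded without TensorFlow installed.
    import tensorflow as tf

    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=INPUT_SHAPE),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    # The model must be compiled before it can be converted.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # Convert the compiled Keras model into an Estimator.
    return tf.keras.estimator.model_to_estimator(keras_model=model,
                                                 model_dir=model_dir)
```

The returned object can then be trained and evaluated like any other Estimator, for example by passing it to `tf.estimator.train_and_evaluate` together with input functions.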

The tutorial itself is published on our blog and can be found here.

Follow the instructions below to run the tutorial code locally and on Clusterone.

Install

To run the code locally, you need:

  • Python 3.6
  • Git
  • TensorFlow 1.5 or higher. Install it like this: pip install tensorflow
  • The Clusterone Python library. Install it with pip install clusterone
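If you want to confirm the Python and TensorFlow requirements before running anything, a small check along these lines may help. This is a sketch, not part of the repository: the helper names are hypothetical, it treats the Python version above as a minimum (an assumption), and it does not check Git or the Clusterone library.

```python
import sys


def parse_version(text):
    """Turn '1.11.0' or '1.5.0-rc1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_requirements(min_python=(3, 6), min_tensorflow=(1, 5)):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append("Python %d.%d or newer is required" % min_python)
    try:
        import tensorflow as tf
    except ImportError:
        problems.append("TensorFlow is not installed (pip install tensorflow)")
    else:
        if parse_version(tf.__version__) < min_tensorflow:
            problems.append("TensorFlow %d.%d or higher is required" % min_tensorflow)
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("WARNING:", problem)
```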

To run this project on Clusterone, you need:

  • A Clusterone account

That's all you need! Add a project by linking this GitHub repo (clusterone/clusterone-tutorials) as shown here.

Setting Up

Follow the Set Up section of the Get Started guide to add your GitHub personal access token to your Clusterone account.

Then follow the Create a project section to add the clusterone-tutorials project. Use the clusterone/clusterone-tutorials repository instead of the one shown in the guide.

Usage

You can run the tutorial code either on your local machine or on the Clusterone deep learning platform, even distributed over multiple GPUs. No code changes are necessary to switch between these modes.

Run the code locally

Start out by cloning this repository onto your local machine.

git clone https://github.com/clusterone/clusterone-tutorials

Then navigate to the tutorial directory with cd clusterone-tutorials/tf-estimator.

Make sure all the requirements listed above are installed. You can then run the script with python mnist.py. The script downloads the MNIST dataset and then starts training. To view the training results in TensorBoard, run tensorboard --logdir=logs.

Run on Clusterone

These instructions use the just command line tool. It comes with the Clusterone Python library and is installed automatically with the library.

If you have used the Clusterone library before with a different Clusterone installation, make sure it is connected to the correct endpoint by running just config endpoint https://clusterone.com.

Log in to your Clusterone account by running just login and entering your login information.

First, let's make sure that you have the project. Execute the command just get projects to see all your projects. You should see something like this:

>> just get projects
All projects:

| # | Project                       | Created at          | Description |
|---|-------------------------------|---------------------|-------------|
| 0 | username/clusterone-tutorials | 2018-11-20T00:00:00 |             |

where username should be your Clusterone account name.

Let's create a job. Make sure to replace username with your username.

just create job distributed \
  --project username/clusterone-tutorials \
  --name distributed-mnist-job \
  --worker-replicas 2 \
  --worker-type aws-t2-small \
  --docker-image tensorflow-1.11.0-cpu-py35 \
  --ps-replicas 1 \
  --ps-type aws-t2-small \
  --ps-docker-image tensorflow-1.11.0-cpu-py35 \
  --time-limit 1h \
  --command "python tf-estimator/mnist.py" \
  --setup_command "pip install -r tf-estimator/requirements.txt"

This creates a job with 2 worker nodes and 1 parameter server. See our documentation for more information on how to change the number and instance types of worker and parameter servers.
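For background, TensorFlow Estimators discover the cluster layout through the `TF_CONFIG` environment variable, which `tf.estimator.RunConfig` reads at startup. The sketch below shows what that value could look like for a job with 2 workers and 1 parameter server. The host names, the `make_tf_config` helper, and the choice to promote the first worker to "chief" are assumptions for illustration; how the platform or mnist.py actually populates `TF_CONFIG` may differ.

```python
import json
import os


def make_tf_config(worker_hosts, ps_hosts, job_name, task_index):
    """Return a TF_CONFIG JSON string for one task of the cluster.

    Assumption: the first worker acts as "chief"; the remaining workers keep
    the "worker" role with their index shifted down by one.
    """
    cluster = {"chief": [worker_hosts[0]], "ps": list(ps_hosts)}
    if len(worker_hosts) > 1:
        cluster["worker"] = list(worker_hosts[1:])

    if job_name == "worker" and task_index == 0:
        task = {"type": "chief", "index": 0}
    elif job_name == "worker":
        task = {"type": "worker", "index": task_index - 1}
    else:
        task = {"type": job_name, "index": task_index}

    return json.dumps({"cluster": cluster, "task": task}, sort_keys=True)


# Hypothetical addresses for a 2-worker, 1-parameter-server job.
WORKERS = ["worker-0:2222", "worker-1:2222"]
PS = ["ps-0:2222"]

if __name__ == "__main__":
    # Each process would export its own value before creating the Estimator.
    os.environ["TF_CONFIG"] = make_tf_config(WORKERS, PS, "worker", 1)
    print(os.environ["TF_CONFIG"])
```

Every process in the job sees the same `cluster` section and differs only in its `task` entry, which is what lets one unmodified script play the chief, worker, or parameter server role.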

Now the final step is to start the job:

just start job -p clusterone-tutorials/distributed-mnist-job

That's it! You can monitor its progress on the command line using just get events. More elaborate monitoring is available on the Matrix, Clusterone's graphical web interface.

More Info

For further information on this example, take a look at the tutorial based on this repository on the Clusterone Blog.

For a more up-to-date MNIST example, check out our MNIST repo. We also have other examples here.

For further info on the MNIST dataset, check out Yann LeCun's page about it. To learn more about TensorFlow and Deep Learning in general, take a look at the TensorFlow website.

If you have any further questions, don't hesitate to reach out on Slack!

License

MIT © Clusterone Inc.

The MNIST dataset has been created and curated by Corinna Cortes, Christopher J.C. Burges, and Yann LeCun.