Graphcore example applications: sales forecasting

This README describes how to run a sales forecasting machine learning model on Graphcore's IPUs.

Overview

This directory contains code to train a simple multi-layer perceptron (MLP) network to predict sales data.

The model predicts a store's sales on a particular day given a set of features from the original Rossmann competition dataset.

Running the model

The following files are provided for running the sales forecasting model.

  • README.md This file.
  • main.py Main training and validation loop.
  • data.py Data pipeline.
  • model.py Model file.
  • util.py Helper functions.
  • test_model.py A test script. See below for how to run it.

Quick start guide

Prepare the environment

1) Download the Poplar SDK

Install the Poplar SDK following the instructions in the Getting Started guide for your IPU system. Make sure to source the enable.sh script for Poplar.

2) Python

Create a virtualenv and install the required packages:

virtualenv venv -p python3.6
source venv/bin/activate
pip install <path to the tensorflow-1 wheel file from the Poplar SDK>
pip install -r requirements.txt

Prepare dataset

1) Download dataset from Kaggle

The data for this example comes from a Kaggle competition, so you will need a Kaggle account: https://www.kaggle.com/account/login. Then navigate to https://www.kaggle.com/c/rossmann-store-sales/data and press the "Download All" button. If you haven't already, you will be asked to verify your Kaggle account via your mobile phone; after entering a valid mobile phone number you will receive an SMS message with a verification code. Alternatively, you can use the Kaggle API (https://github.com/Kaggle/kaggle-api) to download the data via the command line with kaggle competitions download rossmann-store-sales after setting up your Kaggle API token.
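For example, a minimal sketch of the API route, assuming your API token has been saved to ~/.kaggle/kaggle.json:

pip install kaggle
kaggle competitions download rossmann-store-sales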

2) Extract the data into a folder

For example:

unzip rossmann-store-sales.zip -d rossmann-data

Run the training program

Run the program using main.py. Use the --datafolder/-d option to specify the path to the data folder. For example:

python main.py -d rossmann-data

Options

Use --help to show the available options.

--replication-factor adds data parallelism. IPU graph replication copies the graph to N IPUs, splits the data into N streams and trains the N copies on the N streams in parallel. Periodically, the replicas' gradients are averaged and applied to every copy. Set the number of replicas with this option. By default, no replication is done (i.e. the factor is 1). If validating with the --multiprocessing flag, the factor can be at most M/2, where M is the number of IPUs available, as each process needs N IPUs.

--multiprocessing is recommended if at least 2 IPUs are available. It will run the training and validation graphs in separate processes on separate IPUs, saving time loading and unloading programs from the device. By default, the same IPUs are shared for both training and validation, executed in sequence.
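For example, a sketch of a run on a system with 4 available IPUs, training and validating in separate processes with two replicas each (so N = 2 = M/2):

python main.py -d rossmann-data --replication-factor 2 --multiprocessing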

--no-validation disables validation.

--lr-schedule-type: The model can use a manual or a dynamic learning rate schedule. For manual schedules, you specify n change points, as ratios of training progress, and n+1 decay factors, one for each interval. For example, setting --learning-rate-schedule to the comma separated list 0.33,0.66 and --learning-rate-decay to the comma separated list 1,0.1,0.01 drops the learning rate by a factor of 10 at 33% of training, and by a factor of 100 at 66% of training (relative to the initial learning rate).
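A sketch of that manual schedule on the command line (assuming manual is the value --lr-schedule-type accepts; check --help for the exact choices):

python main.py -d rossmann-data --lr-schedule-type manual --learning-rate-schedule 0.33,0.66 --learning-rate-decay 1,0.1,0.01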

For dynamic learning rate schedules, the model adjusts the learning rate on the fly based on validation progress (or training progress, if validation is disabled). The dynamic scheduler has two features (see the sketch after this list):

  • Reduction on plateau - the learning rate is reduced when training stagnates. Set --lr-plateau-patience to control how much stagnation is tolerated before a reduction and --lr-schedule-plateau-factor to control how much the rate is reduced.
  • An LR warmup at the start of training, off by default - the learning rate is gradually increased to its initial value over a number of epochs. Set --lr-warmup to enable this mechanism and --lr-warmup-epochs to set the number of epochs the learning rate is warmed up over. Reduction on plateau does not apply while the learning rate is being warmed up.
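A sketch of a dynamic schedule with warmup enabled (the value dynamic and the epoch count shown are illustrative; check --help for the exact choices):

python main.py -d rossmann-data --lr-schedule-type dynamic --lr-warmup --lr-warmup-epochs 5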

--base-learning-rate specifies the exponent of the base learning rate. The learning rate is set by lr = 2^blr * batch-size. See https://arxiv.org/abs/1804.07612 for more details.
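For example, with --base-learning-rate -14 and --batch-size 64 (illustrative values), the learning rate would be 2^-14 * 64 = 2^-8 ≈ 0.0039.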

-d/--datafolder sets the directory of the data folder. This should contain a preprocessed Rossmann dataset consisting of train.csv and val.csv. Alternatively, use --use-synthetic-data to use random data generated directly on the IPU as the program needs it, removing any host <-> IPU data transfers.
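For example, a sketch of a run without any real data on the host (other options keep their defaults):

python main.py --use-synthetic-data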

--log-dir sets the directory for model logs. The model will log summaries and checkpoints.

--batch-size and --validation-batch-size set the batch sizes of the training and validation graphs respectively.

--precision sets the precision of the variables and of the calculations, respectively. Supply as a dot separated pair, e.g. 32.32 or 16.16.

--no-prng disables stochastic rounding.

--loss-scaling sets the scaling of the loss. By default this is 1, i.e. no loss scaling is done.
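Loss scaling is typically most useful with reduced precision. A sketch of a half-precision run (the scaling value 128 is illustrative):

python main.py -d rossmann-data --precision 16.16 --loss-scaling 128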

--weight-decay sets the weight decay rate applied to the dense layers' kernels in the model.

--gradient-clipping clips the gradients to the range [-1, 1] before they are applied in each update step.

--epochs sets the number of epochs to train for.

--select-ipus selects the IPUs to use. Use AUTO for automatic selection from available IPUs. Use a comma separated list of IPU IDs to specify the IPUs, by ID, for the training and validation processes to run on, respectively.
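For example, a sketch that runs the training process on IPU 0 and the validation process on IPU 1 (the IDs are illustrative; use AUTO to select automatically):

python main.py -d rossmann-data --multiprocessing --select-ipus 0,1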

--valid-per-epoch sets the number of times to validate per epoch.

--batches-per-step sets the number of batches to complete every training step.

--steps-per-log sets the number of training steps between successive logs of current training progress.

--use-init fixes the model's initial weights so that they are identical across separate runs.

--compiler-report causes the model to generate a compilation report and then terminate.

Test script

The test script performs basic tests. To run it, add the utils directory at the top level of this repository to your PYTHONPATH and install pytest with pip install pytest. Then run

pytest test_model.py