source: https://github.com/GoogleCloudPlatform/training-data-analyst/tree/master/courses/machine_learning/deepdive/09_sequence/labs

I have cut out the cloud portion so that only local training is covered here. For the ML Engine part, refer to the original source.

In [20]:
import numpy as np
import tensorflow as tf
import os

print(tf.__version__)

1.12.0


<h1> Time series prediction, end-to-end </h1>

This notebook illustrates several models to find the next value of a time-series:
<ol>
<li> Linear
<li> DNN
<li> CNN 
<li> RNN
</ol>
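
To make "find the next value" concrete, here is a minimal NumPy sketch that is not taken from `model.py`. It assumes the first `SEQ_LEN - 1` values of each sequence are the inputs and the last value is the label, and it fits an ordinary least-squares model as a stand-in for the "Linear" entry above. The helper names `make_sine`, `split_features_label`, `fit_linear` and `predict_linear` are hypothetical.

```python
import numpy as np

SEQ_LEN = 50

def make_sine(rng, seq_len=SEQ_LEN):
    # Noisy sinusoid, following the same recipe as create_time_series in this notebook
    freq = rng.random_sample() * 0.5 + 0.1
    ampl = rng.random_sample() + 0.5
    noise = rng.random_sample(seq_len) * 0.3
    return np.sin(np.arange(seq_len) * freq) * ampl + noise

def split_features_label(batch):
    # Assumption: inputs are all but the last value, the label is the last value
    return batch[:, :-1], batch[:, -1]

def fit_linear(features, labels):
    # Least-squares fit; a column of ones provides the bias term
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    weights, _, _, _ = np.linalg.lstsq(X, labels, rcond=None)
    return weights

def predict_linear(weights, features):
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    return X.dot(weights)

rng = np.random.RandomState(0)
train = np.array([make_sine(rng) for _ in range(1000)])
valid = np.array([make_sine(rng) for _ in range(250)])
x_train, y_train = split_features_label(train)
x_valid, y_valid = split_features_label(valid)

weights = fit_linear(x_train, y_train)
rmse = np.sqrt(np.mean((predict_linear(weights, x_valid) - y_valid) ** 2))
print("validation RMSE of the linear sketch:", rmse)
```

The DNN, CNN and RNN variants would consume the same inputs and label; only the function mapping inputs to the prediction changes.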

In [21]:
SEQ_LEN = 50

<h3> Simulate some time-series data </h3>

Essentially a set of sinusoids with random amplitudes and frequencies.

In [22]:
def create_time_series():
  freq = (np.random.random()*0.5) + 0.1  # 0.1 to 0.6
  ampl = np.random.random() + 0.5  # 0.5 to 1.5
  noise = [np.random.random()*0.3 for i in range(SEQ_LEN)] # 0 to 0.3, uniformly distributed
  x = np.sin(np.arange(0,SEQ_LEN) * freq) * ampl + noise
  return x

In [23]:
def to_csv(filename, N):
  with open(filename, 'w') as ofp:
    for i in range(0, N):
      seq = create_time_series()
      line = ",".join(map(str, seq))
      ofp.write(line + '\n')

In [24]:
for d in ('data/sines/', 'trained/sines/'):
  try:
    os.makedirs(d)  # create each directory independently
  except OSError:
    pass  # directory already exists

In [25]:
to_csv('data/sines/train-1.csv', 1000)  # 1000 sequences
to_csv('data/sines/valid-1.csv', 250)   # 250 sequences

In [26]:
!head -3 data/sines/*-1.csv

==> data/sines/train-1.csv <==
0.2823323020847652,0.3819824615351557,0.5264839464174602,0.8694346660221864,0.8696421622571993,0.9619108797501927,1.1762664523525654,1.0273915009028465,1.1897408981682474,0.9943064020181687,1.026994254691376,0.7687822476134952,0.7044421887371354,0.6823893405730586,0.31492712144773544,0.1750388473597263,0.028814304623702935,-0.2346364273013079,-0.4188334478071756,-0.6258252466545875,-0.5886595393231615,-0.769428968191868,-0.7450343362176094,-0.796868624259724,-0.8194373350801099,-0.6953568239592307,-0.6016374553949322,-0.3232881822196406,-0.15679788545448375,0.029247133277437526,0.15492672970016955,0.2720499729211129,0.509378462436521,0.6493107019883545,0.9558202303551813,1.1097399432275652,1.005025676533218,1.2963929337850688,1.2868931670945596,1.0863171019861375,0.8998199340199049,0.978664343928614,0.8591521593536399,0.43779757987184875,0.24551261136735308,0.2822983303966359,-0.1413927939735008,-0.21370769957221858,-0.3361523268983017,-0.627038565098445
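
A quick way to sanity-check files in this format is to read them back and confirm there is one sequence of `SEQ_LEN` values per row. The sketch below is illustrative only: `load_sequences` is a hypothetical helper, and it writes a small throwaway file to a temporary directory rather than touching `data/sines/`.

```python
import os
import tempfile
import numpy as np

SEQ_LEN = 50

def load_sequences(filename):
    # One sequence per row, comma-separated; ndmin=2 keeps a single row two-dimensional
    return np.loadtxt(filename, delimiter=',', ndmin=2)

# Write three noise-free sinusoids in the same CSV layout used by to_csv
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'check.csv')
rows = np.sin(np.arange(SEQ_LEN) * 0.3) * np.array([[0.5], [1.0], [1.5]])
with open(path, 'w') as ofp:
    for seq in rows:
        ofp.write(",".join(map(str, seq)) + '\n')

data = load_sequences(path)
print(data.shape)
```

Pointing `load_sequences` at `data/sines/train-1.csv` should give 1000 rows of 50 values if the generation cell ran as written.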

<h3> Train model locally </h3>

Make sure the code works as intended.

The `model.py` and `task.py` files containing the model code are in <a href="sinemodel">sinemodel/</a>

**Complete the TODOs in `model.py` before proceeding!**

Once you've completed the TODOs, set `--model` below to the appropriate model (`linear`, `dnn`, `cnn`, `rnn`, `rnn2` or `rnnN`) and run it locally for a few steps to test the code.
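
The flags that follow the bare `--` in the training command in this notebook are forwarded to `sinemodel.task`. As a rough illustration only, and not the actual contents of `task.py`, a module could accept them with `argparse` along these lines. The flag names come from that command; the types, defaults, `choices` list and the `parse_args` helper are assumptions.

```python
import argparse

# Model names as listed in this notebook
MODELS = ['linear', 'dnn', 'cnn', 'rnn', 'rnn2', 'rnnN']

def parse_args(argv=None):
    # Hypothetical parser; defaults are placeholders, not values from task.py
    parser = argparse.ArgumentParser()
    parser.add_argument('--train_data_path', required=True)
    parser.add_argument('--eval_data_path', required=True)
    parser.add_argument('--output_dir', required=True)
    parser.add_argument('--model', choices=MODELS, default='linear')
    parser.add_argument('--train_steps', type=int, default=10)
    parser.add_argument('--sequence_length', type=int, default=50)
    return parser.parse_args(argv)

args = parse_args([
    '--train_data_path=data/sines/train-1.csv',
    '--eval_data_path=data/sines/valid-1.csv',
    '--output_dir=trained/sines',
    '--model=rnn', '--train_steps=10', '--sequence_length=50',
])
print(args.model, args.train_steps, args.sequence_length)
```

Restricting `--model` with `choices` makes a mistyped model name fail at startup instead of partway through training.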

In [27]:
!echo $(pwd)

/home/local/git-private/Artificial-Neural-Networks/python/RNN-2layer-examples-2018DEC-B


In [49]:
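# This is a Python cell, so build the paths with os rather than shell syntax
DATADIR = os.path.join(os.getcwd(), 'data/sines')
OUTDIR = os.path.join(os.getcwd(), 'trained/sines')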


In [53]:
!echo $DATADIR
!echo $OUTDIR


/home/local/git-private/Artificial-Neural-Networks/python/RNN-2layer-examples-2018DEC-B/data/sines
/home/local/git-private/Artificial-Neural-Networks/python/RNN-2layer-examples-2018DEC-B/trained/sines


In [56]:
%%bash
# Run as a bash cell. Python variables are not visible here, so the paths and
# the sequence length are set again (SEQ_LEN must match the Python cell above).
DATADIR="$(pwd)/data/sines"
OUTDIR="$(pwd)/trained/sines"
SEQ_LEN=50

gcloud ml-engine local train \
   --module-name=sinemodel.task \
   --package-path="${PWD}/sinemodel" \
   -- \
   --train_data_path="$DATADIR/train-1.csv" \
   --eval_data_path="$DATADIR/valid-1.csv"  \
   --output_dir="$OUTDIR" \
   --model=linear --train_steps=10 --sequence_length=$SEQ_LEN

<h3> Cloud ML Engine </h3>

Now to train on Cloud ML Engine with more data.

In [None]:
import shutil
shutil.rmtree('data/sines', ignore_errors=True)
os.makedirs('data/sines/')
for i in range(0,10):
  to_csv('data/sines/train-{}.csv'.format(i), 1000)  # 1000 sequences
  to_csv('data/sines/valid-{}.csv'.format(i), 250)

In [None]:
%%bash
gsutil -m rm -rf gs://${BUCKET}/sines/*
gsutil -m cp data/sines/*.csv gs://${BUCKET}/sines

In [None]:
%%bash
for MODEL in linear dnn cnn rnn rnn2 rnnN; do
  OUTDIR=gs://${BUCKET}/sinewaves/${MODEL}
  JOBNAME=sines_${MODEL}_$(date -u +%y%m%d_%H%M%S)
  REGION=us-central1
  gsutil -m rm -rf $OUTDIR
  gcloud ml-engine jobs submit training $JOBNAME \
     --region=$REGION \
     --module-name=sinemodel.task \
     --package-path=${PWD}/sinemodel \
     --job-dir=$OUTDIR \
     --staging-bucket=gs://$BUCKET \
     --scale-tier=BASIC_GPU \
     --runtime-version=$TFVERSION \
     -- \
     --train_data_path="gs://${BUCKET}/sines/train*.csv" \
     --eval_data_path="gs://${BUCKET}/sines/valid*.csv"  \
     --output_dir=$OUTDIR \
     --train_steps=3000 --sequence_length=$SEQ_LEN --model=$MODEL
done

## Monitor training with TensorBoard

Use this cell to launch TensorBoard. If TensorBoard appears blank, try refreshing after 5 minutes.

In [None]:
from google.datalab.ml import TensorBoard
TensorBoard().start('gs://{}/sinewaves'.format(BUCKET))

In [None]:
for pid in TensorBoard.list()['pid']:
  TensorBoard().stop(pid)
  print('Stopped TensorBoard with pid {}'.format(pid))

## Results

Complete the table below with your own results, then compare them with the results in the solution notebook.

| Model | Sequence length | # of steps | Minutes | RMSE |
| --- | --- | --- | --- | --- |
| linear | 50 | 3000 | - | - |
| dnn | 50 | 3000 | - | - |
| cnn | 50 | 3000 | - | - |
| rnn | 50 | 3000 | - | - |
| rnn2 | 50 | 3000 | - | - |
| rnnN | 50 | 3000 | - | - |
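
For reference, the RMSE column is the root-mean-squared error between the predicted and actual next values. A minimal NumPy sketch of the arithmetic follows; the `rmse` helper is hypothetical and the numbers are made up purely to show the calculation, not results from any model here.

```python
import numpy as np

def rmse(predictions, labels):
    # Square the errors, average them, then take the square root
    predictions = np.asarray(predictions, dtype=float)
    labels = np.asarray(labels, dtype=float)
    return float(np.sqrt(np.mean((predictions - labels) ** 2)))

# Errors are 0, 0 and -2, so the mean squared error is 4/3
example = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(example)
```

Because the errors are squared before averaging, one large miss raises the RMSE more than several small ones.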

Copyright 2017 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License