---

![alt text](https://pbs.twimg.com/profile_images/966440541859688448/PoHJY3K8_400x400.jpg =150x150) 
![alt text](https://movielens.org/images/movielens-logo.svg =400x150)

---

# Tutorial - Google Cloud Machine Learning Engine with Colaboratory and MovieLens Dataset

a. [Google Cloud Machine Learning (ML) Engine](https://cloud.google.com/ml-engine/) is a managed service that enables developers and data scientists to build machine learning models and bring them to production. Cloud ML Engine offers training and prediction services, which can be used together or individually. Enterprises use it to solve problems ranging from identifying [clouds in satellite images](https://cloud.google.com/blog/products/gcp/google-cloud-machine-learning-now-open-to-all-with-new-professional-services-and-education-programs) to ensuring [food safety](https://blog.google/products/google-cloud/how-google-cloud-transforming-japanese-businesses/) and responding four times faster to [customer emails](https://cloud.google.com/customers/ocado/).

b. [Colaboratory](https://colab.research.google.com/) is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.

c. [MovieLens](https://movielens.org/) is a research site run by GroupLens Research at the University of Minnesota. MovieLens uses "collaborative filtering" technology to make recommendations of movies that you might enjoy, and to help you avoid the ones that you won't. Based on your movie ratings, MovieLens generates personalized predictions for movies you haven't seen yet.

## MovieLens Dataset

The MovieLens sample demonstrates how to build personalized recommendation models that recommend movies to users based on movie ratings data from the [MovieLens 20M dataset](https://grouplens.org/datasets/movielens/20m/).

**Statistics**:
20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users. Includes tag genome data with 12 million relevance scores across 1,100 tags. Released 4/2015; updated 10/2016 to update links.csv and add tag genome data.

## Data Format

The dataset contains several files in CSV format. The following files are used in the sample:

### Ratings

All ratings are contained in the file `ratings.csv`. Each line of this file after the header row represents one rating of one movie by one user, and has the following format:
```
    userId, movieId, rating, timestamp
```

The lines within this file are ordered first by userId, then, within user, by movieId.

Ratings are made on a 5-star scale, with half-star increments (0.5 stars - 5.0 stars).

Timestamps represent seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970.
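
As a minimal sketch of reading this format with the Python 3 standard library: the sample rows below are made up for illustration, and the header is assumed to contain no spaces after the commas. `read_ratings` and `ratings_by_user` are hypothetical names, not part of the sample code.

```python
import csv
import io
from datetime import datetime, timezone
from itertools import groupby

# Hypothetical rows in the ratings.csv layout; the real file ships with the dataset.
SAMPLE = """userId,movieId,rating,timestamp
1,1,4.0,964982703
1,3,4.5,964981247
2,1,3.0,1445714835
"""

def read_ratings(handle):
    """Yield (user_id, movie_id, rating, rated_at) tuples from a ratings.csv handle."""
    for row in csv.DictReader(handle):
        rating = float(row["rating"])
        # Ratings use half-star steps between 0.5 and 5.0.
        if not (0.5 <= rating <= 5.0 and (rating * 2).is_integer()):
            raise ValueError("unexpected rating: %r" % row["rating"])
        # Timestamps are seconds since midnight UTC of January 1, 1970.
        rated_at = datetime.fromtimestamp(int(row["timestamp"]), tz=timezone.utc)
        yield int(row["userId"]), int(row["movieId"]), rating, rated_at

ratings = list(read_ratings(io.StringIO(SAMPLE)))

# Rows are ordered by userId, so groupby collects each user's ratings in one pass.
ratings_by_user = {
    user_id: [(movie_id, rating) for _, movie_id, rating, _ in rows]
    for user_id, rows in groupby(ratings, key=lambda r: r[0])
}
```

Relying on the file's sort order keeps the grouping to a single pass; unsorted input would need an explicit sort first.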

### Movies

Movie information is contained in the file `movies.csv`. Each line of this file after the header row represents one movie, and has the following format:
```
    movieId, title, genres
```

Movie titles are entered manually or imported from <https://www.themoviedb.org/>, and include the year of release in parentheses. Errors and inconsistencies may exist in these titles.
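
Because the release year is embedded in the title, it has to be parsed out when needed. The sketch below assumes the year is a four-digit number in the last pair of parentheses and returns `None` otherwise, since titles may be inconsistent. `split_title` is a hypothetical helper, not part of the sample code.

```python
import re

# Matches a four-digit release year in parentheses at the end of a title.
YEAR_PATTERN = re.compile(r"\((\d{4})\)\s*$")

def split_title(title):
    """Return (name, year); year is None when the title carries no parseable year."""
    match = YEAR_PATTERN.search(title)
    if match is None:
        return title.strip(), None
    return title[:match.start()].strip(), int(match.group(1))

name, year = split_title("Toy Story (1995)")
```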

### Links

Identifiers that can be used to link to other sources of movie data are contained in the file `links.csv`. Each line of this file after the header row represents one movie, and has the following format:
```
    movieId, imdbId, tmdbId
```

**movieId** is an identifier for movies used by <https://movielens.org>. E.g., the movie Toy Story has the link <https://movielens.org/movies/1>.
 
**imdbId** is an identifier for movies used by <http://www.imdb.com>. E.g., the movie Toy Story has the link <http://www.imdb.com/title/tt0114709/>.

**tmdbId** is an identifier for movies used by <https://www.themoviedb.org>. E.g., the movie Toy Story has the link <https://www.themoviedb.org/movie/862>.
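
The three URL patterns above can be sketched as a small helper. It assumes the IMDb identifier is stored as a number that is zero-padded to at least seven digits in the URL, which matches the Toy Story example; `movie_urls` is a hypothetical name.

```python
def movie_urls(movie_id, imdb_id, tmdb_id):
    """Build the three external URLs for one links.csv row (IDs as strings or ints)."""
    return {
        "movielens": "https://movielens.org/movies/%d" % int(movie_id),
        # IMDb title IDs are written as "tt" plus the number padded to 7 digits.
        "imdb": "http://www.imdb.com/title/tt%07d/" % int(imdb_id),
        "tmdb": "https://www.themoviedb.org/movie/%d" % int(tmdb_id),
    }

toy_story = movie_urls("1", "0114709", "862")
```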

## 1 Prepare Environment

### 1.1 Set up your Google Cloud Platform

Complete the following steps to set up a GCP account and activate the Cloud ML Engine API:

* Select or create a GCP project, [here](https://console.cloud.google.com/cloud-resource-manager).

* Make sure that billing is enabled for your project, [here](https://cloud.google.com/billing/docs/how-to/modify-project).

* Enable the Cloud Machine Learning Engine and Compute Engine APIs, [here](https://console.cloud.google.com/flows/enableapi?apiid=ml.googleapis.com,compute_component&_ga=2.86255384.-1663335009.1546527394).

In [1]:
# Initialize and configure gcloud.
# Expected answers to the interactive prompts:

# Pick configuration to use: "Re-initialize this configuration [default] with new settings" (option 1)
# You must log in to continue. Would you like to log in (Y/n)?: Y
# (authorization is required)
# Choose the cloud project by its number in the list
# Do you want to configure a default Compute Region and Zone? (Y/n)?: Y
# Which Google Compute Engine zone would you like to use as project default?: enter the number of the zone (for example, "8" for us-central1-a)

# authorization will be necessary
!gcloud auth application-default login

# authorization will be necessary again
!gcloud init

Go to the following link in your browser:

    https://accounts.google.com/o/oauth2/auth?redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&prompt=select_account&response_type=code&client_id=764086051850-6qr4p6gpi6hn506pt8ejuq83di341hur.apps.googleusercontent.com&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform&access_type=offline


Enter verification code: ··········

Credentials saved to file: [/content/.config/application_default_credentials.json]

These credentials will be used by any library that requests
Application Default Credentials.

To generate an access token for other uses, run:
  gcloud auth application-default print-access-token
Welcome! This command will take you through the configuration of gcloud.

Settings from your current configuration [default] are:
component_manager:
  disable_update_check: 'true'
core:
  disable_usage_reporting: 'True'

Pick config

### 1.2 Set up your Cloud Storage Bucket

This section shows you how to create a new bucket. You can use an existing bucket, but if it is not part of the project you are using to run Cloud ML Engine, you must explicitly [grant access to the Cloud ML Engine service accounts](https://cloud.google.com/ml-engine/docs/tensorflow/working-with-cloud-storage#setup-different-project).

1. Specify a name for your new bucket. The name must be **unique across all buckets in Cloud Storage**.
2. Define the region of your new bucket. More information about Region and Zone, [here](https://cloud.google.com/compute/docs/regions-zones).

In [0]:
import os

# define the project name and region
os.environ["PROJECT_NAME"] = "movielens"
os.environ["REGION"] = "us-central1"

Create the new bucket:

In [5]:
# get project id (strip the trailing newline returned by the command)
cmd = "gcloud config list project --format 'value(core.project)'"
os.environ["PROJECT_ID"] = os.popen(cmd).read().strip()

# form bucket name
os.environ["BUCKET_NAME"] = ("%s-%s" % (os.environ["PROJECT_NAME"], os.environ["PROJECT_ID"].strip()))

# set google cloud storage path
os.environ["GCS_PATH"] = ("%s/%s" % (os.environ["BUCKET_NAME"], os.environ["PROJECT_NAME"]))

!echo "Bucket: $BUCKET_NAME"
!echo "Region: $REGION"
!echo "Storage path: $GCS_PATH"

# create bucket
!gsutil mb -l $REGION gs://$BUCKET_NAME

Bucket: movielens-poli-118822
Region: us-central1
Storage path: movielens-poli-118822/movielens
Creating gs://movielens-poli-118822/...


### 1.3 Mount Google Drive Project Path

In [7]:
from google.colab import drive

# authorization will be necessary
gdrive = os.path.join(os.sep, "content", "drive")
drive.mount(gdrive)

# change initial directory to the project (your google drive)
# default: /content/drive/My Drive/Colab Notebooks/movielens/
project_dir = os.path.join(gdrive, "My Drive", "Colab Notebooks", "movielens")
os.chdir(project_dir)

# copy local data to the cloud
!gsutil -m cp -r "$PWD" gs://$GCS_PATH

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/drive
Copying file:///content/drive/My Drive/Colab Notebooks/movielens/config.yaml [Content-Type=application/octet-stream]...
Copying file:///content/drive/My Drive/Colab Notebooks/movielens/config_hypertune.yaml [Content-Type=application/octet-stream]...
Copying file:///content/drive/My Drive/Colab Notebooks/movielens/README.md [Content-Type=text/markdown]...
Copying file:///content/drive/My Drive/Colab Notebooks/movielens/__init__.py [Content-Type=text/x-python]..

### 1.4 Install Dependencies

In [8]:
# necessary for python-snappy
!sudo apt-get install libsnappy-dev

# python-snappy
# six==1.11
# tensorflow==1.3.0
# tensorflow-transform==0.3.1
!pip install -r requirements.txt --no-cache-dir

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  libsnappy-dev
0 upgraded, 1 newly installed, 0 to remove and 13 not upgraded.
Need to get 27.2 kB of archives.
After this operation, 108 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/main amd64 libsnappy-dev amd64 1.1.7-1 [27.2 kB]
Fetched 27.2 kB in 1s (45.0 kB/s)
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 1.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 
Selecting previously unselected package libsnappy-dev:amd64.
(Reading database ... 110851 files and

## 2 Get MovieLens Dataset

The MovieLens dataset is available in many different sizes:

* [20M dataset](http://files.grouplens.org/datasets/movielens/ml-20m.zip) (198 MB).

* [MovieLens small dataset](http://files.grouplens.org/datasets/movielens/ml-latest-small.zip) (0.9 MB).

For our example, the **small dataset** is the default because it is faster to process.

In [0]:
# folder to save dataset
os.environ["DATA_DIR"] = "data"

# Get the dataset (20M or Small) passing the link to download.
# 20M: ml-20m
# Small [default]: ml-latest-small
os.environ["DATASET"] = "ml-latest-small"

Download and unzip:

In [10]:
os.environ["DATASET_URL"] = ("http://files.grouplens.org/datasets/movielens/%s.zip" % os.environ["DATASET"])

# download...
!wget -c $DATASET_URL -O tmp.zip

# unzip and clear
!unzip -o tmp.zip -d $DATA_DIR && rm tmp.zip

--2019-01-22 17:02:18--  http://files.grouplens.org/datasets/movielens/ml-latest-small.zip
Resolving files.grouplens.org (files.grouplens.org)... 128.101.34.235
Connecting to files.grouplens.org (files.grouplens.org)|128.101.34.235|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 978202 (955K) [application/zip]
Saving to: ‘tmp.zip’


2019-01-22 17:02:19 (3.11 MB/s) - ‘tmp.zip’ saved [978202/978202]

Archive:  tmp.zip
   creating: data/ml-latest-small/
  inflating: data/ml-latest-small/links.csv  
  inflating: data/ml-latest-small/tags.csv  
  inflating: data/ml-latest-small/ratings.csv  
  inflating: data/ml-latest-small/README.txt  
  inflating: data/ml-latest-small/movies.csv  


## 3 Pre-Processing Step

The pre-processing step can be performed either locally or in the cloud, _depending on the size of the input_.

We will read the above files and convert them into [TFRecords](https://www.tensorflow.org/api_guides/python/python_io) format for training.

*   Ratings are split into training and evaluation sets based on user_id. <br/> The percentage of users that are included in the evaluation set is controlled by the flag ```--percent_eval```.

*   For users in the training set, we generate one example for each rating. <br/> Each example includes the movie IDs and ratings of all movies rated by the same user except the candidate movie.

*   Additional negative examples are generated by randomly picking movies not rated by the user and creating an example with a 0-star rating for the movie. <br/> The ratio of negatives to positives is controlled by the flag ```--negative_sample_ratio```. A reasonable value of 1-3 means we want 1-3 times as many negative examples as positives.

*   For users in the eval set, we follow the same process as in training, except that we do not generate negative examples. <br/> When ```eval_type``` is ranking, movies with ratings below ```--eval_score_threshold``` are removed from the evaluation dataset.
    
*   For each example in the evaluation set, a set of ranking candidates can be generated.
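
The steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in `preprocess.py`: the hash-based user split and all function names are hypothetical, and the actual pipeline may partition and sample differently.

```python
import hashlib
import random

def in_eval_set(user_id, percent_eval, seed=0):
    """Deterministically place roughly percent_eval% of users in the eval set."""
    key = ("%d-%d" % (seed, user_id)).encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % 100 < percent_eval

def leave_one_out_examples(ratings):
    """For each rating, emit (other ratings of the user, candidate movie, rating)."""
    examples = []
    for i, (movie_id, rating) in enumerate(ratings):
        history = ratings[:i] + ratings[i + 1:]
        examples.append((history, movie_id, rating))
    return examples

def negative_examples(rated_movie_ids, all_movie_ids, negative_sample_ratio, rng):
    """Pick movies the user has not rated and label them with a 0-star rating."""
    unrated = sorted(set(all_movie_ids) - set(rated_movie_ids))
    count = min(len(unrated), negative_sample_ratio * len(rated_movie_ids))
    return [(movie_id, 0.0) for movie_id in rng.sample(unrated, count)]

def filter_for_ranking(ratings, eval_score_threshold):
    """Drop eval ratings below the threshold (applies when eval_type is ranking)."""
    return [(m, r) for m, r in ratings if r >= eval_score_threshold]

rng = random.Random(0)
user_ratings = [(1, 4.5), (3, 2.0)]
examples = leave_one_out_examples(user_ratings)
negatives = negative_examples([m for m, _ in user_ratings], range(1, 11), 1, rng)
kept = filter_for_ranking(user_ratings, 4.1)
```

Splitting by user rather than by rating keeps all of a user's ratings on one side of the split, so evaluation users are unseen during training.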

### 3.1 Local Run

First set input and output data path:

In [0]:
os.environ["INPUT_DIR"] = os.path.join(os.environ["DATA_DIR"], os.environ["DATASET"])
os.environ["OUTPUT_DIR"] = "output"

Run the pre-processing code as shown below:

In [12]:
# clear output directory (if exists)
!rm -Rf $OUTPUT_DIR

print("Local pre-processing...")

!time python preprocess.py --input_dir $INPUT_DIR \
                           --output_dir $OUTPUT_DIR \
                           --percent_eval 20 \
                           --negative_sample_ratio 1 \
                           --eval_type ranking \
                           --eval_score_threshold 4.1 \
                           --num_ranking_candidate_movie_ids 1000 \
                           --partition_random_seed 0

Local pre-processing...


### 3.2 Cloud Run

First set project, bucket, and data path:

In [14]:
# set input data on cloud
os.environ["GCS_INPUT_DIR"] = os.path.join(os.environ["GCS_PATH"], os.environ["DATA_DIR"], os.environ["DATASET"])
!echo "Input URL:" $GCS_INPUT_DIR

# set output data on cloud
os.environ["GCS_OUTPUT_DIR"] = os.path.join(os.environ["GCS_PATH"], os.environ["OUTPUT_DIR"])
!echo "Output URL:" $GCS_OUTPUT_DIR

# clear directory (if exists)
!gsutil -m rm -r gs://$GCS_INPUT_DIR

# copy local data to the cloud
!gsutil -m cp -r $INPUT_DIR gs://$GCS_INPUT_DIR

Input URL: movielens-poli-118822/movielens/data/ml-latest-small
Output URL: movielens-poli-118822/movielens/output
Removing gs://movielens-poli-118822/movielens/data/ml-latest-small/README.txt#1548176325553663...
Removing gs://movielens-poli-118822/movielens/data/ml-latest-small/movies.csv#1548176325897768...
Removing gs://movielens-poli-118822/movielens/data/ml-latest-small/links.csv#1548176325649197...
Removing gs://movielens-poli-118822/movielens/data/ml-latest-small/ratings.csv#1548176326000562...
Removing gs://movielens-poli-118822/movielens/data/ml-latest-small/tags.csv#1548176325664064...
/ [5/5 objects] 100% Done                                                       
Operation completed over 5 objects.                                              
Copying file://data/ml-latest-small/tags.csv [Content-Type=text/csv]...
Copying file://data/ml-latest-small/README.txt [Content-Type=text/plain]...
Copying file://data/ml-latest-small/movies.csv [Content-Type=text/csv]...
Copying file

We can now run the pre-processing code in the cloud as shown below:

In [17]:
# clear directory (if exists)
!gsutil -m rm -r gs://$GCS_OUTPUT_DIR

print("Cloud pre-processing...")

!time python preprocess.py --input_dir gs://"${GCS_INPUT_DIR}" \
                           --output_dir gs://"${GCS_OUTPUT_DIR}" \
                           --percent_eval 20 \
                           --project_id ${PROJECT_ID} \
                           --negative_sample_ratio 1 \
                           --eval_type ranking \
                           --eval_score_threshold 4.1 \
                           --num_ranking_candidate_movie_ids 1000 \
                           --partition_random_seed 0 \
                           --cloud

CommandException: 1 files/objects could not be removed.
Cloud pre-processing...
  options = pbegin.pipeline.options.view_as(DebugOptions)



real	14m11.279s
user	0m17.422s
sys	0m1.351s


## 4 Model Training Step

The model training step takes the pre-processed TFRecords and trains recommendation models, such as a matrix factorization model or a deep neural network model.

The matrix factorization model associates each user u with a user-factor vector -- p_u -- and each item i with an item-factor vector -- q_i. The goal is to learn the p_u and q_i that minimize the reconstruction error (L2 loss) between the true ratings and the predicted ratings (computed as p_u^T * q_i).
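
To make the objective concrete, here is a minimal pure-Python sketch that fits p_u and q_i by stochastic gradient descent on the squared error. The optimizer, learning rate, factor dimension, and toy ratings are all assumptions for illustration; the trainer in this sample may use a different algorithm.

```python
import random

def predict(p_u, q_i):
    """Predicted rating: the dot product p_u^T q_i."""
    return sum(a * b for a, b in zip(p_u, q_i))

def train_mf(ratings, num_users, num_items, dim=2, lr=0.05, epochs=200, seed=0):
    """Fit user factors P and item factors Q by SGD on the squared error."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P[u], Q[i])
            for k in range(dim):
                # Use the values from before the update for both gradient steps.
                p_uk, q_ik = P[u][k], Q[i][k]
                P[u][k] += lr * err * q_ik
                Q[i][k] += lr * err * p_uk
    return P, Q

def squared_error(ratings, P, Q):
    """Sum of squared differences between true and predicted ratings."""
    return sum((r - predict(P[u], Q[i])) ** 2 for u, i, r in ratings)

# Toy data: (user index, item index, rating).
data = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (1, 1, 1.0)]
P0, Q0 = train_mf(data, num_users=2, num_items=2, epochs=0)  # untrained factors
P, Q = train_mf(data, num_users=2, num_items=2)
```

Comparing `squared_error(data, P0, Q0)` with `squared_error(data, P, Q)` shows how training reduces the reconstruction error on the observed ratings.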

For matrix factorization, `eval_type` can be `regression` or `ranking`.

*  In regression mode, we predict the rating of a target movie. RMSE (root mean square error) and MAE (mean absolute error) metrics are used to compare model performance.

*  In ranking mode, we produce a ranked list of top K recommended movies and evaluate the models using recall@K, precision@K and MAP@K metrics.
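
The metrics named above can be sketched as follows. Several definitions of average precision at K exist; this version normalizes by `min(len(relevant), k)`, which is an assumption and may differ from what the sample's trainer computes. MAP@K is then the mean of this value over all evaluated users.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted ratings."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error between true and predicted ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommendations that are relevant (k must be positive)."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)

def average_precision_at_k(ranked, relevant, k):
    """Average of precision values at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0

ranked = [10, 20, 30, 40]   # movie IDs ordered by predicted score
relevant = {20, 40, 50}     # movies the user actually liked
```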

The deep model uses a neural network to learn low-dimensional representations of users and items (i.e., user and item embeddings). The model learns a linear embedding that maps user features -- based on rated movie IDs and their genres -- to 64-dimensional vectors, which are then passed through 2 hidden layers with ReLU activation functions. The output of the last hidden layer is fed into a softmax layer to predict the most likely item to recommend. During training, the network parameters are learned to minimize the cross-entropy loss between the predicted class and the held-out class (item ID). Note that for the DNN softmax model, `eval_type` must be set to `ranking`.
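
The building blocks of that description (a ReLU layer, a softmax over items, and the cross-entropy loss against the held-out item) can be sketched in pure Python. The function names and the tiny inputs are hypothetical; the actual model is built with TensorFlow in `trainer/`.

```python
import math

def dense_relu(inputs, weights, biases):
    """One fully connected layer followed by ReLU; weights has one row per output unit."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def softmax(logits):
    """Turn per-item logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the held-out item."""
    return -math.log(softmax(logits)[target_index])

hidden = dense_relu([1.0, -1.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
probs = softmax([2.0, 1.0, 0.0])
```

The loss shrinks as the logit of the held-out item grows relative to the others, which is what training pushes the network toward.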

### 4.1 Local Run

In [18]:
# clear the local model directory (if it exists)
!rm -rf $OUTPUT_DIR/model/dnn_softmax

print("Local training...")

!python trainer/task.py --raw_metadata_path "${OUTPUT_DIR}/raw_metadata" \
                        --transform_savedmodel "${OUTPUT_DIR}/transform_fn" \
                        --train_data_paths "${OUTPUT_DIR}/features_train*" \
                        --eval_data_paths "${OUTPUT_DIR}/features_eval*" \
                        --output_path "${OUTPUT_DIR}/model/dnn_softmax" \
                        --train_steps 500 \
                        --eval_steps 30 \
                        --model_type dnn_softmax \
                        --eval_type ranking \
                        --l2_weight_decay 0.001 \
                        --learning_rate 0.01

Local training...
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 5, '_task_type': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f68980cdf50>, '_model_dir': 'output/model/dnn_softmax', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_session_config': None, '_tf_random_seed': None, '_save_summary_steps': 100, '_environment': 'local', '_num_worker_replicas': 0, '_task_id': 0, '_log_step_count_steps': 100, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1.0
}
, '_evaluation_master': '', '_master': ''}
Instructions for updating:
Monitors are deprecated. Please use tf.train.SessionRunHook.
INFO:tensorflow:Create CheckpointSaverHook.
2019-01-22 17:33:30.841881: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are av

### 4.2 Cloud Run

Example run to train a DNN softmax model:

In [19]:
import datetime as dt
now = dt.datetime.now()

os.environ["JOB_ID"] = os.environ["PROJECT_NAME"] + now.strftime("_%Y%m%d_%H%M%S")

print("Cloud training...")

!gcloud ml-engine jobs submit training "$JOB_ID" \
                      --stream-logs \
                      --module-name trainer.task \
                      --package-path trainer \
                      --staging-bucket gs://"$BUCKET_NAME" \
                      --region "$REGION" \
                      --config config.yaml \
                      -- \
                      --raw_metadata_path gs://"${GCS_OUTPUT_DIR}/raw_metadata" \
                      --transform_savedmodel gs://"${GCS_OUTPUT_DIR}/transform_fn" \
                      --eval_data_paths gs://"${GCS_OUTPUT_DIR}/features_eval*.tfrecord.gz" \
                      --train_data_paths gs://"${GCS_OUTPUT_DIR}/features_train*.tfrecord.gz" \
                      --output_path gs://"${GCS_PATH}/model/${JOB_ID}" \
                      --train_steps 500 \
                      --eval_steps 30 \
                      --model_type dnn_softmax \
                      --eval_type ranking \
                      --l2_weight_decay 0.001 \
                      --learning_rate 0.01

Cloud training...
Job [movielens_20190122_174054] submitted successfully.
INFO	2019-01-22 17:40:59 +0000	service		Validating job requirements...
INFO	2019-01-22 17:40:59 +0000	service		Job creation request has been successfully validated.
INFO	2019-01-22 17:40:59 +0000	service		Job movielens_20190122_174054 is queued.
INFO	2019-01-22 17:40:59 +0000	service		Waiting for job to be provisioned.
INFO	2019-01-22 17:44:31 +0000	service		Waiting for training program to start.
INFO	2019-01-22 17:45:45 +0000	ps-replica-0		Running task with arguments: --cluster={"master": ["cmle-training-master-482a9b8aee-0:2222"], "ps": ["cmle-training-ps-482a9b8aee-0:2222"], "worker": ["cmle-training-worker-482a9b8aee-0:2222", "cmle-training-worker-482a9b8aee-1:2222"]} --task={"type": "ps", "index": 0} --job={
INFO	2019-01-22 17:45:45 +0000	ps-replica-0		  "scale_tier": "CUSTOM",
INFO	2019-01-22 17:45:45 +0000	ps-replica-0		  "master_type": "standard_gpu",
INFO	2019-01-22 17:45:45 +0000	ps-replica-0		  "worker

## 5 Prediction step

Once the model finishes training, we can deploy it to Cloud ML Engine for prediction.

First, select a model from the export directory:

In [47]:
# list the models
!gsutil ls gs://"${GCS_PATH}/model/${JOB_ID}/export/Servo/"

gs://movielens-poli-118822/movielens/model/movielens_20190122_174054/export/Servo/
gs://movielens-poli-118822/movielens/model/movielens_20190122_174054/export/Servo/1548179417/


In [45]:
# set the model source from the output above up to the timestamp, for example
# MODEL_SOURCE = gs://my-bucket/movielens_deep_20170704_134621/1499209413002
os.environ["MODEL_SOURCE"] = "%s/model/%s/export/Servo/%s" % (os.environ["GCS_PATH"], os.environ["JOB_ID"], "1548179417")

!echo "Model source:" $MODEL_SOURCE

Model source: movielens-poli-118822/movielens/model/movielens_20190122_174054/export/Servo/1548179417


In [0]:
# deploy the model to Cloud ML Engine
!gcloud ml-engine models create $PROJECT_NAME --regions $REGION
!gcloud ml-engine versions create "v1" --model $PROJECT_NAME --origin gs://"${MODEL_SOURCE}" --runtime-version=1.4

Now we are ready to issue online prediction requests. For each instance in the input file, the model returns `top_k_infer` related movie IDs along with their ranking scores (by default, `top_k_infer=100`).
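
The shape of such a top-k result can be illustrated with a short sketch. It only shows how a list of per-item scores reduces to the k best (index, score) pairs; how the deployed model computes and labels its scores is not shown here, and `top_k_with_scores` is a hypothetical name.

```python
import heapq

def top_k_with_scores(scores, k):
    """Return (item_index, score) pairs for the k highest scores, best first."""
    return heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])

best = top_k_with_scores([0.1, 0.7, 0.2], k=2)
```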

Select a small JSON text file from the pre-processed data. Each entry in the file is a base64-encoded tf.Example record suitable for online prediction.
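
For orientation, Cloud ML Engine's JSON convention for binary data is an object with a single `b64` key holding the base64 text. The sketch below wraps and unwraps raw bytes in that form. Whether the files produced by this sample nest the object under an input name is not shown here, so treat the exact layout as an assumption; the helper names and the placeholder bytes are hypothetical.

```python
import base64
import json

def to_json_instance(serialized_example):
    """Wrap raw bytes as a {"b64": ...} JSON object, one per line of the file."""
    encoded = base64.b64encode(serialized_example).decode("ascii")
    return json.dumps({"b64": encoded})

def from_json_instance(line):
    """Recover the raw bytes from one JSON line."""
    return base64.b64decode(json.loads(line)["b64"])

# Placeholder bytes standing in for a serialized tf.Example record.
line = to_json_instance(b"\x0a\x02hi")
```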

In [53]:
# list the existing .txt files available for prediction
!gsutil ls -lh gs://"${GCS_OUTPUT_DIR}/features_predict-*txt"

 10.73 MiB  2019-01-22T17:23:47Z  gs://movielens-poli-118822/movielens/output/features_predict-00000-of-00003.txt
 15.57 MiB  2019-01-22T17:23:47Z  gs://movielens-poli-118822/movielens/output/features_predict-00001-of-00003.txt
 18.06 MiB  2019-01-22T17:23:47Z  gs://movielens-poli-118822/movielens/output/features_predict-00002-of-00003.txt
TOTAL: 3 objects, 46509948 bytes (44.36 MiB)


In [77]:
# Copy the file locally with the following command
# !gsutil cp gs://${GCS_OUTPUT_DIR}/features_predict-00476-of-00560.txt ./

os.environ["FILE_TXT"] = "features_predict-00000-of-00003.txt"
os.environ["PREDICTION_FILE"] = "predict.txt"

!gsutil cp gs://"${GCS_OUTPUT_DIR}/$FILE_TXT" ./
  
!head -100 $FILE_TXT > $PREDICTION_FILE
  
!ls -l ./

Copying gs://movielens-poli-118822/movielens/output/features_predict-00000-of-00003.txt...
- [1 files][ 10.7 MiB/ 10.7 MiB]                                                
Operation completed over 1 objects/10.7 MiB.                                     
total 12155
-rw------- 1 root root      335 Jan 17 19:53 config_hypertune.yaml
-rw------- 1 root root      171 Jan 17 19:53 config.yaml
drwx------ 3 root root     4096 Jan 22 17:02 data
-rw------- 1 root root 11247700 Jan 22 19:34 features_predict-00000-of-00003.txt
-rw------- 1 root root        1 Jan 17 19:53 __init__.py
drwx------ 7 root root     4096 Jan 22 17:33 output
-rw------- 1 root root  1117120 Jan 22 19:34 predict.txt
drwx------ 2 root root     4096 Jan 17 19:53 preproc
-rw------- 1 root root    19381 Jan 21 16:59 preprocess.py
-rw------- 1 root root    13042 Jan 21 19:36 README.md
-rw------- 1 root root      269 Jan 18 20:05 requirements.txt
-rw------- 1 root root    24885 Jan 22 14:41 setup.ipynb
-rw------- 1 root root     

Run online prediction.

In [78]:
# predict the first 100 instances (the lines written to predict.txt above) using the deployed model
!gcloud ml-engine predict --model $PROJECT_NAME --version "v1" --json-instances "${PREDICTION_FILE}"

CLASSES                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              SCORES
[u'1', u'2', u'0', u'3', u'5', u'11', u'18', u'48', u'17', u'30', u'7', u'16', u'8', u'25', u'15', u'10', u'22', u'38', u'36', u'27', u'6', u'49', u'37', u'79', u'80', u'13', u'326', u'19', u'44', u'179', u'50', u'96', u'24', u'109', u'

Once the job is successfully submitted, you should be able to find it in the ML Engine section of the Google Cloud Platform [console](https://console.cloud.google.com/mlengine/jobs). Check the job status there. Eventually, the resulting file(s) should appear in the GCS location given as the output path.