# Training a scikit-learn model on AI Platform Training
AI Platform Training can be used to train TensorFlow, Keras, scikit-learn, and XGBoost models, as well as custom containers, on Google Cloud Platform.

First, go to the GCP console (console.cloud.google.com). In the console, use the menu on the left to navigate to AI Platform -> Jobs. This is the interface where you will monitor your training jobs.

Before we get started, have a look at the scikit-learn model code and how the folders are structured. The code can be found in the folder scikit-caip/trainer.

Before training the model, make sure the required dependencies are installed. Run the following cells to install them.

In [None]:
!pip3 install -r requirements.txt

In [None]:
!pip3 install pandas-gbq

### Local Training
First, check that the model trains locally on the Notebook instance. This helps with debugging any issues before training on Google Cloud Platform. Before running the cell, make sure that you set pathoutput:

    --pathoutput gs://path_to_your_folder

Next, have a look at how the parameters are set in the gcloud command. Everything before

    -- \

is a gcloud-specific parameter needed to submit the training job. Everything after it is passed as a command-line parameter to our application code (trainer), so those parameters are application-specific.

In [None]:
!gcloud ai-platform local train \
   --module-name=trainer.task \
   --package-path=trainer/ \
   -- \
   --pathdata gs://erwinh-mldemo/scikit/marketing-data.csv \
   --pathoutput gs://path_to_your_folder \
   --storage BQ \
   --bqtable kfp-primer-workshop.marketing_data.raw
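
To make the split at `--` concrete, here is a minimal sketch of how a trainer module could read the application-specific flags. The flag names come from the command above. The parser itself, the help texts, and the function name `build_parser` are assumptions for illustration and may not match the actual code in trainer/task.py.

```python
import argparse


def build_parser():
    """Hypothetical parser for the flags that follow `--` in the gcloud command."""
    parser = argparse.ArgumentParser(description="Trainer arguments (illustrative)")
    # Flag names are taken from the notebook; the help texts are assumptions.
    parser.add_argument("--pathdata", help="Location of the training data file")
    parser.add_argument("--pathoutput", help="Location where the output is written")
    parser.add_argument("--storage", help="Storage selector, e.g. BQ")
    parser.add_argument("--bqtable", help="BigQuery table name")
    return parser


# Parse an explicit list instead of sys.argv so the example runs anywhere.
example_args = build_parser().parse_args(
    ["--pathoutput", "gs://path_to_your_folder", "--storage", "BQ"]
)
print(example_args.pathoutput, example_args.storage)
```

gcloud consumes its own flags (such as `--module-name`) and hands everything after `--` to the module unchanged, so the module sees only the four flags defined here.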

### Train on the Cloud using AI Platform Training
If the local training works, it's time to train the model in the cloud. Change these two parameters before running the next cell:

    pathoutput gs://path_to_your_folder
    jobname <set_to_your_jobname>

Your job has been submitted successfully when you see the following:

    Job [your_jobname] submitted successfully.

Now you can go to the console to monitor your job. 

In [23]:
!gcloud ai-platform jobs submit training marketing_v1_99 \
   --staging-bucket=gs://kfp-scikit \
   --region=us-central1 \
   --module-name=trainer.task \
   --package-path=trainer \
   --runtime-version 1.14 \
   --python-version 3.5 \
   -- \
   --pathdata gs://kfp-scikit/data/scikit/marketing-data.csv \
   --pathoutput gs://kfp-scikit/model/output \
   --storage BQ \
   --bqtable kfp-primer-workshop.marketing_data.raw

Job [marketing_v1_99] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe marketing_v1_99

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs marketing_v1_99
jobId: marketing_v1_99
state: QUEUED
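
Besides the console, the job state can be checked from the notebook. The sketch below wraps the `gcloud ai-platform jobs describe` command shown in the output above and polls until the job stops. The helper names (`describe_command`, `is_finished`, `wait_for_job`) and the 60-second interval are assumptions, and it presumes that gcloud is installed and authenticated in the notebook environment.

```python
import subprocess
import time

# States in which an AI Platform Training job no longer changes.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def describe_command(job_name):
    """Build the gcloud call that prints only the job's state field."""
    return ["gcloud", "ai-platform", "jobs", "describe", job_name,
            "--format=value(state)"]


def is_finished(state):
    """Return True when the reported state is a terminal one."""
    return state.strip().upper() in TERMINAL_STATES


def wait_for_job(job_name, poll_seconds=60):
    """Poll the job until it reaches a terminal state and return that state."""
    while True:
        result = subprocess.run(describe_command(job_name),
                                capture_output=True, text=True, check=True)
        state = result.stdout.strip()
        print(f"{job_name}: {state}")
        if is_finished(state):
            return state
        time.sleep(poll_seconds)
```

For example, `wait_for_job("marketing_v1_99")` would print the state once a minute and return when the job ends.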


### Deploy model using AI Platform 
When the model has been trained successfully, we can take the trained model and publish it as an API. Change the following values; in the cell below they correspond to `--model` and `--origin`:

    MODEL_NAME="<your_model_name>"
    MODEL_LOCATION="gs://<your_model_output_path>"

After running the cell, check that your model is deployed. Go to the [console](https://console.cloud.google.com) -> AI Platform -> Models -> marketingpredictor. Here you will find your model. Creating the version can take a few minutes.

In [44]:
!gcloud ai-platform versions create marketing_v1 \
--model marketingpredictor \
--origin gs://kfp-scikit/model/output/model/ \
--runtime-version 1.14 \
--framework scikit-learn \
--python-version 3.5

Creating version (this might take a few minutes)......done.                    
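
For the scikit-learn framework, AI Platform looks in the `--origin` directory for a model file named `model.joblib` or `model.pkl`. The sketch below shows one way a trainer could write such a file with `pickle`. It is an illustration only: the function name `export_model` is hypothetical, the actual trainer code may use joblib instead, and copying the file to Cloud Storage is left out.

```python
import os
import pickle


def export_model(model, output_dir):
    """Pickle `model` as model.pkl inside output_dir and return the file path."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, "model.pkl")
    with open(path, "wb") as handle:
        pickle.dump(model, handle)
    return path
```

Any picklable fitted estimator can be passed as `model`. The resulting directory is what `--origin` should point to once it has been uploaded to a bucket.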


### Getting a prediction
After deploying the model, we can call the API to get a prediction. Before running the next cell, change the version to the one you created:

    --version <your_version_name>

After running the cell you should see something like this:

    [True, True]

In [46]:
!gcloud ai-platform predict \
  --model marketingpredictor \
  --version marketing_v1 \
  --json-instances trainer/predict.json

[True, True]
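
The `--json-instances` flag reads a file with one JSON-encoded instance per line. The contents of trainer/predict.json are not shown here, so the sketch below uses made-up feature rows to illustrate the format. The values, the file name `predict_example.json`, and the helper `write_instances` are all hypothetical.

```python
import json


def write_instances(instances, path):
    """Write one JSON-encoded instance per line, as --json-instances expects."""
    with open(path, "w") as handle:
        for instance in instances:
            handle.write(json.dumps(instance) + "\n")


# Hypothetical feature rows; the real columns depend on the training data.
example_instances = [[39, 1, 0, 2500.0], [52, 0, 1, 130.5]]
write_instances(example_instances, "predict_example.json")
```

Each line in the file yields one prediction, so a file with two instances returns a list of two results.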


Copyright 2019 Google Inc. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.