<a href="https://colab.research.google.com/github/abouslima/AI-Makerspace/blob/master/VertexAI/PyCaret_VertexAI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Custom training and prediction on Vertex AI using a PyCaret model

Loan classification problem

**Installing the Vertex AI SDK and the other dependencies in a Colab notebook**

In [None]:
import os

# The Google Cloud Notebook product has specific requirements
IS_GOOGLE_CLOUD_NOTEBOOK = os.path.exists("/opt/deeplearning/metadata/env_version")

# Google Cloud Notebook requires dependencies to be installed with '--user'
USER_FLAG = ""
if IS_GOOGLE_CLOUD_NOTEBOOK:
    USER_FLAG = "--user"

! pip install {USER_FLAG} --upgrade google-cloud-aiplatform
! pip install {USER_FLAG} --upgrade google-cloud-storage
! pip install {USER_FLAG} --upgrade pillow
! pip install {USER_FLAG} --upgrade numpy

if not os.getenv("IS_TESTING"):
    # Automatically restart kernel after installs
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

**Vertex AI must be enabled in the project.**

*   Save a timestamp to use when naming the training job and the model
*   Save the project ID, the storage bucket and the region in variables

In [1]:
from datetime import datetime
import os

TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
PROJECT_ID = "vertex-ai-makerspace"  # @param {type:"string"}
BUCKET_NAME = "gs://my-ai-makerspace-bucket-pycaret"  # @param {type:"string"}
REGION = "europe-west4"  # @param {type:"string"}
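
The timestamp keeps resource names unique across runs. A minimal sketch of that naming pattern follows; `timestamped_name` and `is_gcs_uri` are hypothetical helpers written for illustration and are not part of the Vertex AI SDK.

```python
from datetime import datetime


def timestamped_name(prefix: str, now: datetime) -> str:
    """Append a sortable YYYYmmddHHMMSS suffix to a resource name prefix."""
    return prefix + now.strftime("%Y%m%d%H%M%S")


def is_gcs_uri(uri: str) -> bool:
    """Check that a bucket reference uses the gs:// scheme and names a bucket."""
    return uri.startswith("gs://") and len(uri) > len("gs://")


# A fixed datetime is used here only to make the example reproducible.
example = timestamped_name("custom_job_pycaret_", datetime(2021, 7, 1, 12, 30, 0))
print(example)
print(is_gcs_uri("gs://my-ai-makerspace-bucket-pycaret"))
```

In the notebook itself, `datetime.now()` plays the role of the fixed datetime, so every run produces a new suffix.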

**Authenticating Google Cloud account**

In [2]:
import os
import sys

# If you are running this notebook in Colab, run this cell and follow the
# instructions to authenticate your GCP account. This provides access to your
# Cloud Storage bucket and lets you submit training jobs and prediction
# requests.

# The Google Cloud Notebook product has specific requirements
IS_GOOGLE_CLOUD_NOTEBOOK = os.path.exists("/opt/deeplearning/metadata/env_version")

# If on Google Cloud Notebooks, then don't execute this code
if not IS_GOOGLE_CLOUD_NOTEBOOK:
    if "google.colab" in sys.modules:
        from google.colab import auth as google_auth

        google_auth.authenticate_user()

    # If you are running this notebook locally, replace the string below with the
    # path to your service account key and run this cell to authenticate your GCP
    # account.
    elif not os.getenv("IS_TESTING"):
        %env GOOGLE_APPLICATION_CREDENTIALS ''

Run the following cell only if the storage bucket has not been created yet

In [3]:
! gsutil mb -p $PROJECT_ID -l $REGION $BUCKET_NAME

Creating gs://my-ai-makerspace-bucket-pycaret/...


Validating access to the Cloud Storage bucket

In [4]:
! gsutil ls -al $BUCKET_NAME

**Importing the Vertex AI SDK (`google.cloud.aiplatform`) and initializing it**

In [5]:
from google.cloud import aiplatform
#from google.cloud.aiplatform import gapic as aip

aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_NAME)

Specifying the containers for training and prediction

[Pre-Built Training Containers](https://cloud.google.com/vertex-ai/docs/training/pre-built-containers) | [Pre-Built Prediction/Deployment Containers](https://cloud.google.com/vertex-ai/docs/predictions/pre-built-containers)

In [6]:
TRAIN_VERSION = "scikit-learn-cpu.0-23" # @param {type:"string"}
DEPLOY_VERSION = "sklearn-cpu.0-23" # @param {type:"string"}

TRAIN_IMAGE = "gcr.io/cloud-aiplatform/training/{}:latest".format(TRAIN_VERSION)
DEPLOY_IMAGE = "gcr.io/cloud-aiplatform/prediction/{}:latest".format(DEPLOY_VERSION)
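
To make the string formatting above concrete, here is a small sketch of how the two image URIs resolve. The helper `image_uri` is hypothetical; it only mirrors the `gcr.io/cloud-aiplatform/...` pattern used in this notebook and does not check that an image actually exists.

```python
def image_uri(kind: str, version: str, tag: str = "latest") -> str:
    """Build a container URI following the pattern used in this notebook.

    kind is either "training" or "prediction".
    """
    if kind not in ("training", "prediction"):
        raise ValueError(f"unknown container kind: {kind!r}")
    return f"gcr.io/cloud-aiplatform/{kind}/{version}:{tag}"


train_image = image_uri("training", "scikit-learn-cpu.0-23")
deploy_image = image_uri("prediction", "sklearn-cpu.0-23")
print(train_image)
print(deploy_image)
```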

Specifying the training and deployment machine types and the number of vCPUs

In [7]:
MACHINE_TYPE = "n1-standard" # @param {type:"string"}

VCPU = "8" # @param {type:"string"}
TRAIN_COMPUTE = MACHINE_TYPE + "-" + VCPU
print("Train machine type", TRAIN_COMPUTE)

MACHINE_TYPE = "n1-standard" # @param {type:"string"}

VCPU = "8" # @param {type:"string"}
DEPLOY_COMPUTE = MACHINE_TYPE + "-" + VCPU
print("Deploy machine type", DEPLOY_COMPUTE)

Train machine type n1-standard-8
Deploy machine type n1-standard-8


In [8]:
JOB_NAME = "custom_job_pycaret_" + TIMESTAMP  # the name of the experiment

The following script loads the data, sets up the PyCaret experiment, compares and tunes the models, and saves the final model to the Cloud Storage directory given by the `AIP_MODEL_DIR` environment variable

[Model Export documentation](https://cloud.google.com/vertex-ai/docs/training/exporting-model-artifacts#tensorflow)

In [9]:
%%writefile task.py

from google.cloud import storage
import pandas as pd
from pycaret.classification import *
import os
import joblib

# Load the Universal Bank dataset and strip spaces from the column names
dataset = pd.read_csv("https://raw.githubusercontent.com/DigitalProductschool/AI-Makerspace/master/PyCaret-Classification/UniversalBank.csv")
dataset.columns = [i.replace(" ", "") for i in dataset.columns]
# Drop the identifier-like columns
dataset.drop(["ID","ZIPCode"],axis=1,inplace=True)
cat_cols = ["Family","Education","SecuritiesAccount","CDAccount","Online","CreditCard"]
# Keep 90% of the rows for modelling and hold out the remaining 10% as unseen data
data = dataset.sample(frac=0.9, random_state=786)
data_unseen = dataset.drop(data.index)

data.reset_index(drop=True, inplace=True)
data_unseen.reset_index(drop=True, inplace=True)

# Set up the PyCaret experiment with min-max normalization and a power transformation
exp_1 = setup(
    data=data,
    session_id=123,
    target='PersonalLoan',
    categorical_features=cat_cols,
    normalize=True,
    normalize_method='minmax',
    transformation=True,
    use_gpu=False,
    log_experiment=True,
    experiment_name='loan1',
    silent=True,
)

# Train and rank the candidate models, keeping the best one
best = compare_models()

# Tune the hyperparameters of the best model, using AUC as the selection metric
tuned_model = tune_model(best, optimize='AUC')
# Refit the tuned model on the full dataset, including PyCaret's hold-out split
final_model = finalize_model(tuned_model, model_only=True)


artifact_filename = 'model.joblib'

# Save model artifact to local filesystem (doesn't persist)
local_path = artifact_filename
joblib.dump(final_model, local_path)

# Upload model artifact to Cloud Storage
model_directory = os.environ['AIP_MODEL_DIR']
storage_path = os.path.join(model_directory, artifact_filename)
blob = storage.blob.Blob.from_string(storage_path, client=storage.Client())
blob.upload_from_filename(local_path)

Writing task.py
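
The upload step in `task.py` passes a full `gs://` URI to `Blob.from_string`. As a rough illustration of what such a URI contains, the sketch below joins a model directory with the artifact file name and splits the result into a bucket name and an object path. The helpers and the example directory `gs://example-bucket/some-run/model/` are hypothetical; the real value of `AIP_MODEL_DIR` is set by Vertex AI at training time.

```python
from typing import Tuple
from urllib.parse import urlparse


def artifact_uri(model_dir: str, filename: str) -> str:
    """Join a model directory and a file name with exactly one slash."""
    return model_dir.rstrip("/") + "/" + filename


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split gs://bucket/path/to/object into (bucket, path/to/object)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical directory, standing in for os.environ["AIP_MODEL_DIR"].
model_dir = "gs://example-bucket/some-run/model/"
uri = artifact_uri(model_dir, "model.joblib")
bucket_name, object_path = split_gcs_uri(uri)
print(uri)
print(bucket_name, object_path)
```

Stripping the trailing slash before joining avoids a doubled `/` in the object path when the directory already ends with one.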


Defining and running the custom training job.

In [None]:
job = aiplatform.CustomTrainingJob(
    display_name=JOB_NAME,
    script_path="task.py",
    container_uri=TRAIN_IMAGE,
    requirements=["pycaret", "numpy==1.19.5"],
    model_serving_container_image_uri=DEPLOY_IMAGE
)

MODEL_DISPLAY_NAME = "pycaret-" + TIMESTAMP

model = job.run(
    model_display_name=MODEL_DISPLAY_NAME,
    replica_count=1,
    machine_type=TRAIN_COMPUTE,
)

**Deploying the trained model**

In [11]:
DEPLOYED_NAME = "pycaret-" + TIMESTAMP

TRAFFIC_SPLIT = {"0": 100}

MIN_NODES = 1
MAX_NODES = 1


endpoint = model.deploy(
    deployed_model_display_name=DEPLOYED_NAME,
    traffic_split=TRAFFIC_SPLIT,
    machine_type=DEPLOY_COMPUTE,
    min_replica_count=MIN_NODES,
    max_replica_count=MAX_NODES
)

INFO:google.cloud.aiplatform.models:Creating Endpoint
INFO:google.cloud.aiplatform.models:Create Endpoint backing LRO: projects/249560904503/locations/europe-west4/endpoints/8426569154345041920/operations/6083812241252024320
INFO:google.cloud.aiplatform.models:Endpoint created. Resource name: projects/249560904503/locations/europe-west4/endpoints/8426569154345041920
INFO:google.cloud.aiplatform.models:To use this Endpoint in another session:
INFO:google.cloud.aiplatform.models:endpoint = aiplatform.Endpoint('projects/249560904503/locations/europe-west4/endpoints/8426569154345041920')
INFO:google.cloud.aiplatform.models:Deploying model to Endpoint : projects/249560904503/locations/europe-west4/endpoints/8426569154345041920
INFO:google.cloud.aiplatform.models:Deploy Endpoint model backing LRO: projects/249560904503/locations/europe-west4/endpoints/8426569154345041920/operations/6859275802089881600
INFO:google.cloud.aiplatform.models:Endpoint model deployed. Resource name: projects/249560
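
In `TRAFFIC_SPLIT`, each key is a deployed model ID and each value is the percentage of endpoint traffic routed to it; the key `"0"` refers to the model being deployed in this call, and the percentages must add up to 100. A minimal sketch of a pre-flight check for such a mapping, using a hypothetical `validate_traffic_split` helper that is not part of the SDK:

```python
def validate_traffic_split(split: dict) -> None:
    """Raise ValueError if the traffic percentages are malformed.

    An empty mapping is accepted and means the endpoint takes no traffic.
    """
    if not split:
        return
    for model_id, percent in split.items():
        if not isinstance(percent, int) or not 0 <= percent <= 100:
            raise ValueError(f"invalid percentage for model {model_id!r}: {percent!r}")
    total = sum(split.values())
    if total != 100:
        raise ValueError(f"traffic percentages sum to {total}, expected 100")


# The split used in this notebook: all traffic goes to the new model.
validate_traffic_split({"0": 100})

# A hypothetical split between the new model and an existing deployed model.
validate_traffic_split({"0": 20, "1234567890": 80})
print("traffic splits are valid")
```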

Loading and preprocessing the test dataset

In [12]:
import pandas as pd
dataset = pd.read_csv("https://raw.githubusercontent.com/DigitalProductschool/AI-Makerspace/master/PyCaret-Classification/UniversalBank.csv")
dataset.columns = [i.replace(" ", "") for i in dataset.columns]
dataset.drop(["ID","ZIPCode"],axis=1,inplace=True)
cat_cols = ["Family","Education"]
data = dataset.sample(frac=0.9, random_state=786)
data_unseen = dataset.drop(data.index)
data.reset_index(drop=True, inplace=True)
data_unseen.reset_index(drop=True, inplace=True)

In [13]:
X_train = pd.get_dummies(data, columns=cat_cols).drop(["PersonalLoan"], axis=1)
X_test = pd.get_dummies(data_unseen, columns=cat_cols).drop(["PersonalLoan"], axis=1)

Finding the accuracy of the predictions

In [14]:
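from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training features only, then apply it to the test features
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Send the test rows to the endpoint; predictions[0] holds the list of predicted labels
predictions = endpoint.predict(instances=X_test.tolist())
accuracy_score(data_unseen.PersonalLoan.values, predictions[0])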

0.856
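
Accuracy is the fraction of rows whose predicted label matches the true label. The sketch below computes it by hand on a few made-up labels, assuming the endpoint returns one numeric class label per instance (possibly as floats such as `1.0`). The values are illustrative only and are not taken from the dataset.

```python
def accuracy(y_true, y_pred) -> float:
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty inputs")
    # Cast to int so that a label of 1 matches a prediction returned as 1.0
    correct = sum(1 for t, p in zip(y_true, y_pred) if int(t) == int(p))
    return correct / len(y_true)


# Hypothetical labels: four of the five predictions are correct.
true_labels = [0, 0, 1, 0, 1]
predicted_labels = [0.0, 0.0, 1.0, 1.0, 1.0]
score = accuracy(true_labels, predicted_labels)
print(score)
```

On a dataset where one class dominates, a high accuracy can come from predicting the majority class alone, so it is worth checking the class balance when reading this number.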

If the model was deployed using the UI, connect to the existing endpoint through its ID

In [15]:
ENDPOINT_ID = "2310680860375908352"  # @param {type:"string"}
PROJECT_ID = "249560904503"  # @param {type:"string"}

endpoint_name = f"projects/{PROJECT_ID}/locations/europe-west4/endpoints/{ENDPOINT_ID}"
endpoint = aiplatform.Endpoint(endpoint_name=endpoint_name)
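
The endpoint resource name follows the `projects/.../locations/.../endpoints/...` pattern that also appears in the deployment logs. A small sketch of building it from its parts, using a hypothetical `endpoint_resource_name` helper so the region is passed in rather than hard-coded:

```python
def endpoint_resource_name(project: str, location: str, endpoint_id: str) -> str:
    """Build a fully qualified endpoint resource name from its three parts."""
    parts = {"project": project, "location": location, "endpoint_id": endpoint_id}
    for label, value in parts.items():
        # Each part is a single path segment, so it cannot be empty or contain "/"
        if not value or "/" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return f"projects/{project}/locations/{location}/endpoints/{endpoint_id}"


name = endpoint_resource_name("249560904503", "europe-west4", "2310680860375908352")
print(name)
```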

In [16]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
predictions = endpoint.predict(instances=X_test.tolist())
from sklearn.metrics import accuracy_score
accuracy_score(data_unseen.PersonalLoan.values, predictions[0])

0.856

Undeploying the model to release the serving resources

In [None]:
deployed_model_id = endpoint.list_models()[0].id
endpoint.undeploy(deployed_model_id=deployed_model_id)
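
The cell above undeploys only the first model on the endpoint. If several models are deployed, the same two calls can be applied in a loop. The sketch below shows that loop against a stand-in object; `FakeEndpoint` is a hypothetical stub that imitates only `list_models()` and `undeploy(deployed_model_id=...)` so the example runs without a Google Cloud project.

```python
from types import SimpleNamespace


def undeploy_all_models(endpoint) -> list:
    """Undeploy every model on the endpoint and return the IDs that were removed."""
    removed = []
    for deployed in endpoint.list_models():
        endpoint.undeploy(deployed_model_id=deployed.id)
        removed.append(deployed.id)
    return removed


class FakeEndpoint:
    """Stand-in for an endpoint, for illustration only."""

    def __init__(self, model_ids):
        self._models = [SimpleNamespace(id=model_id) for model_id in model_ids]

    def list_models(self):
        # Return a copy so callers can iterate while models are being removed
        return list(self._models)

    def undeploy(self, deployed_model_id):
        self._models = [m for m in self._models if m.id != deployed_model_id]


fake = FakeEndpoint(["111", "222"])
removed_ids = undeploy_all_models(fake)
print(removed_ids)
print(fake.list_models())
```

With a real endpoint object from the notebook, `undeploy_all_models(endpoint)` would issue one undeploy request per deployed model.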