# Deploying an Iris classification model using Vertex AI


## Overview

In this tutorial, you build a scikit-learn model and deploy it on Vertex AI using the custom container method. You use the FastAPI Python web framework to create a prediction endpoint. You also incorporate a preprocessor from the training pipeline into your online serving application.

Learn more about [Custom training](https://cloud.google.com/vertex-ai/docs/training/custom-training) and [Vertex AI Prediction](https://cloud.google.com/vertex-ai/docs/predictions/get-predictions).

### Objective

In this notebook, you learn how to create, deploy and serve a custom classification model on Vertex AI. This notebook focuses more on deploying the model than on the design of the model itself. 


This tutorial uses the following Vertex AI services and resources:

- Vertex AI models
- Vertex AI endpoints

The steps performed include:

- Train a model that uses a flower's measurements as input to predict the iris species.
- Save the model and its serialized preprocessor.
- Build a FastAPI server to handle predictions and health checks.
- Build a custom container with the model artifacts.
- Upload and deploy the custom container to Vertex AI Endpoints.
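
The prediction server in the steps above has to speak the Vertex AI request and response format: a JSON body with an `instances` list comes in, and a JSON body with a `predictions` list goes out. The following is a minimal sketch of that handler logic in plain Python, independent of FastAPI. The names `handle_predict` and `fake_predict` are hypothetical and are not part of this notebook's server code; `fake_predict` is a hand-written stand-in for `model.predict`.

```python
import json
from typing import Callable, List, Sequence


def handle_predict(
    raw_body: str,
    predict_fn: Callable[[List[Sequence[float]]], List[str]],
) -> str:
    """Turn a Vertex AI style request body into a response body."""
    body = json.loads(raw_body)
    instances = body.get("instances")
    if not isinstance(instances, list) or not instances:
        raise ValueError('request body must contain a non-empty "instances" list')
    predictions = predict_fn(instances)
    return json.dumps({"predictions": list(predictions)})


def fake_predict(instances):
    """Stand-in for model.predict: one hand-written rule, for illustration only."""
    return ["setosa" if row[2] < 2.5 else "other" for row in instances]


request = json.dumps({"instances": [[5.1, 3.5, 1.4, 0.2], [6.3, 3.3, 6.0, 2.5]]})
response = handle_predict(request, fake_predict)
print(response)
```

In a real server, the same function body would sit behind the prediction route, with the loaded scikit-learn model supplying `predict_fn`.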

### Dataset

This tutorial uses R.A. Fisher's Iris dataset, a small and popular dataset for machine learning experiments. Each instance has four numerical features, which are different measurements of a flower, and a target label that categorizes the flower into one of three species: **Iris setosa**, **Iris versicolour**, and **Iris virginica**.

This tutorial uses [a version of the Iris dataset available in the
scikit-learn library](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html#sklearn.datasets.load_iris).

### Costs 

This tutorial uses billable components of Google Cloud:

* Vertex AI
* Cloud Storage
* Artifact Registry
* Cloud Build

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing), [Cloud Storage pricing](https://cloud.google.com/storage/pricing), [Artifact Registry pricing](https://cloud.google.com/artifact-registry/pricing), and [Cloud Build pricing](https://cloud.google.com/build/pricing), and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.

## Get started

### Install Vertex AI SDK for Python and other required packages



In [1]:

# Vertex SDK for Python
! pip3 install --upgrade --quiet  google-cloud-aiplatform

### Set Google Cloud project information 
Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [2]:
PROJECT_ID = "operating-realm-460905-q8"  # @param {type:"string"}
LOCATION = "us-central1"  # @param {type:"string"}

### Create a Cloud Storage bucket

Create a storage bucket to store intermediate artifacts such as datasets.

In [3]:
BUCKET_URI = "gs://mlops-bucket-week1"  # @param {type:"string"}

**If your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [4]:
! gsutil mb -l {LOCATION} -p {PROJECT_ID} {BUCKET_URI}

Creating gs://mlops-bucket-week1/...
ServiceException: 409 A Cloud Storage bucket named 'mlops-bucket-week1' already exists. Try another name. Bucket names must be globally unique across all Google Cloud projects, including those outside of your organization.


### Initialize Vertex AI SDK for Python

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com). 

In [5]:
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=LOCATION, staging_bucket=BUCKET_URI)

### Import the required libraries

In [6]:
import os
import sys

### Configure resource names

Set a name for the following parameters:

`MODEL_ARTIFACT_DIR` - Directory path to your model artifacts within a Cloud Storage bucket, for example: "my-models/fraud-detection/trial-4"

`REPOSITORY` - Name of the Artifact Registry repository to create or use.

`IMAGE` - Name of the container image that is pushed to the repository.

`MODEL_DISPLAY_NAME` - Display name of Vertex AI model resource.

In [7]:
MODEL_ARTIFACT_DIR = "my-models/iris-classifier-week-1"  # @param {type:"string"}
REPOSITORY = "iris-classifier-repo"  # @param {type:"string"}
IMAGE = "iris-classifier-img"  # @param {type:"string"}
MODEL_DISPLAY_NAME = "iris-classifier"  # @param {type:"string"}

# Set the defaults if no names were specified
if MODEL_ARTIFACT_DIR == "[your-artifact-directory]":
    MODEL_ARTIFACT_DIR = "custom-container-prediction-model"

if REPOSITORY == "[your-repository-name]":
    REPOSITORY = "custom-container-prediction"

if IMAGE == "[your-image-name]":
    IMAGE = "sklearn-fastapi-server"

if MODEL_DISPLAY_NAME == "[your-model-display-name]":
    MODEL_DISPLAY_NAME = "sklearn-custom-container"
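
The `REPOSITORY` and `IMAGE` names are later combined with the region and project ID to address the container image in Artifact Registry, in the form `LOCATION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE:TAG`. A small sketch of that composition follows; the `image_uri` helper, the `latest` tag and the `my-project` placeholder are assumptions for illustration.

```python
def image_uri(location: str, project_id: str, repository: str,
              image: str, tag: str = "latest") -> str:
    """Compose an Artifact Registry Docker image URI."""
    return f"{location}-docker.pkg.dev/{project_id}/{repository}/{image}:{tag}"


uri = image_uri(
    "us-central1",
    "my-project",            # placeholder project ID
    "iris-classifier-repo",  # REPOSITORY from the cell above
    "iris-classifier-img",   # IMAGE from the cell above
)
print(uri)
```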

## Simple Decision Tree model
Build a decision tree model on the Iris data.

In [13]:
! mkdir -p data artifacts

In [9]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from pandas.plotting import parallel_coordinates
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn import metrics

data = pd.read_csv('iris.csv')
data.head(5)

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


In [14]:
train, test = train_test_split(data, test_size = 0.4, stratify = data['species'], random_state = 42)
X_train = train[['sepal_length','sepal_width','petal_length','petal_width']]
y_train = train.species
X_test = test[['sepal_length','sepal_width','petal_length','petal_width']]
y_test = test.species

In [15]:
mod_dt = DecisionTreeClassifier(max_depth=3, random_state=1)
mod_dt.fit(X_train, y_train)
prediction = mod_dt.predict(X_test)
# accuracy_score expects (y_true, y_pred)
print('The accuracy of the Decision Tree is', "{:.3f}".format(metrics.accuracy_score(y_test, prediction)))

The accuracy of the Decision Tree is 0.983
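
The reported accuracy is easier to interpret once you know the size of the test set. The arithmetic below is a sketch that assumes `iris.csv` holds the standard 150 rows with 50 per species; the notebook does not print the shape of this file, so treat the row count as an assumption.

```python
n_rows = 150     # assumption: standard Iris dataset size
test_size = 0.4  # value passed to train_test_split above

n_test = round(n_rows * test_size)  # rows held out for testing
n_train = n_rows - n_test           # rows used for fitting
per_class_test = n_test // 3        # stratify keeps the three classes balanced

print("train rows:", n_train, "test rows:", n_test, "test rows per class:", per_class_test)

# Accuracy for a few possible numbers of correct test predictions.
for correct in (58, 59, 60):
    print(correct, "correct ->", f"{correct / n_test:.3f}")
```

Under that assumption, 0.983 corresponds to 59 of 60 test flowers classified correctly, that is, a single misclassification.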


In [16]:
import joblib

joblib.dump(mod_dt, "artifacts/model.joblib")

['artifacts/model.joblib']

### Upload model artifacts and custom code to Cloud Storage

Before you can deploy your model for serving, Vertex AI needs access to the model artifacts in Cloud Storage:

* `model.joblib` (model artifact)
* `preprocessor.pkl` (preprocessor artifact, if your training pipeline produces one; the cells above save only `model.joblib`)

Run the following commands to upload your files:

In [17]:
!gsutil cp artifacts/model.joblib {BUCKET_URI}/{MODEL_ARTIFACT_DIR}/

Copying file://artifacts/model.joblib [Content-Type=application/octet-stream]...
/ [1 files][  2.6 KiB/  2.6 KiB]                                                
Operation completed over 1 objects/2.6 KiB.                                      
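
The upload destination is the bucket URI joined with the artifact directory. The sketch below shows how the full object URI is composed and how it splits back into a bucket name and an object path. `artifact_uri` and `split_gcs_uri` are hypothetical helpers written for illustration; they only manipulate strings and do not contact Cloud Storage.

```python
from urllib.parse import urlparse


def artifact_uri(bucket_uri: str, artifact_dir: str, filename: str = "") -> str:
    """Join a bucket URI, an artifact directory and an optional file name."""
    parts = [bucket_uri.rstrip("/"), artifact_dir.strip("/")]
    if filename:
        parts.append(filename)
    return "/".join(parts)


def split_gcs_uri(uri: str):
    """Split gs://bucket/path into (bucket, path)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs":
        raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


model_uri = artifact_uri(
    "gs://mlops-bucket-week1",
    "my-models/iris-classifier-week-1",
    "model.joblib",
)
print(model_uri)
print(split_gcs_uri(model_uri))
```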


In [16]:
# -- Step 4: Initialize DVC
!git init
!dvc init -f
!git add .dvc .gitignore
!git commit -m "Initialize DVC"

Reinitialized existing Git repository in /home/jupyter/.git/
Initialized DVC repository.

You can now commit the changes to git.

+---------------------------------------------------------------------+
|                                                                     |
|        DVC has enabled anonymous aggregate usage analytics.         |
|     Read the analytics documentation (and how to opt-out) here:     |
|             <https://dvc.org/doc/user-guide/analytics>              |
|                                                                     |
+---------------------------------------------------------------------+

What's next?
------------
- Check out the documentation: <https://dvc.org/doc>
- Get help and share ideas: <https://dvc.org/chat>
- Star us on GitHub: <https://github.com/iterative/dvc>
fatal: 

In [22]:
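# Environment setup
# Note: each `!` command runs in its own subshell, so the activation below
# does not persist and the later `pip install` runs outside the virtual environment.
!python3 -m venv .env
!source .env/bin/activate
# install -r requirements.txt
!pip install dvc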




In [23]:
# -- Step 5: Add iris dataset to DVC tracking
!dvc add data/iris.csv

⠋ Checking graph
Adding...
Collecting files and computing hashes in data/iris.csv |0.00 [00:00,     ?file/s
ERROR: unexpected error - no such column: "size" - should this be a string literal in single-quotes?

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

In [21]:
!git add data/iris.csv.dvc
!git commit -m "Track iris.csv with DVC"

⠋ Checking graph
Adding...
Collecting files and computing hashes in data/iris.csv |0.00 [00:00,     ?file/s
ERROR: unexpected error - no such column: "size" - should this be a string literal in single-quotes?

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
fatal: pathspec 'data/iris.csv.dvc' did not match any files
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.bashrc
	.cache/
	.config/
	.docker/
	.gitconfig
	.gsutil/
	.ipynb_checkpoints/
	.ipython/
	.jupyter/
	.local/
	.npm/
	SDK_Custom_Container_Prediction.ipynb
	artifacts/
	data


In [3]:
!pip install dvc


Collecting dvc
  Downloading dvc-3.60.1-py3-none-any.whl.metadata (17 kB)
Collecting celery (from dvc)
  Downloading celery-5.5.3-py3-none-any.whl.metadata (22 kB)
Collecting configobj>=5.0.9 (from dvc)
  Downloading configobj-5.0.9-py2.py3-none-any.whl.metadata (3.2 kB)
Collecting dpath<3,>=2.1.0 (from dvc)
  Downloading dpath-2.2.0-py3-none-any.whl.metadata (15 kB)
Collecting dulwich (from dvc)
  Downloading dulwich-0.22.8-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.9 kB)
Collecting dvc-data<3.17,>=3.16.2 (from dvc)
  Downloading dvc_data-3.16.10-py3-none-any.whl.metadata (5.0 kB)
Collecting dvc-http>=2.29.0 (from dvc)
  Downloading dvc_http-2.32.0-py3-none-any.whl.metadata (1.3 kB)
Collecting dvc-objects (from dvc)
  Downloading dvc_objects-5.1.1-py3-none-any.whl.metadata (3.8 kB)
Collecting dvc-render<2,>=1.0.1 (from dvc)
  Downloading dvc_render-1.0.2-py3-none-any.whl.metadata (5.4 kB)
Collecting dvc-studio-client<1,>=0.21 (from dvc)
  Downloading dvc_st

In [4]:
! dvc --version

3.60.1

In [5]:
!git init
!dvc init

hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: 
hint: 	git config --global init.defaultBranch <name>
hint: 
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint: 
hint: 	git branch -m <name>
Initialized empty Git repository in /home/jupyter/.git/
Initialized DVC repository.

You can now commit the changes to git.

+---------------------------------------------------------------------+
|                                                                     |
|        DVC has enabled anonymous aggregate usage analytics.         |
|     Read the analytics documentation (and how to opt-out) here:     |
|             <https://dv

In [8]:
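# Note: `dvc init` creates .dvc/.gitignore and .dvcignore, not a top-level .gitignore,
# so the '.gitignore' pathspec fails below; `git add .dvc .dvcignore` avoids the error.
!git add .dvc .gitignore
!git commit -m "Initialize DVC"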

fatal: pathspec '.gitignore' did not match any files
[master (root-commit) 1901ded] Initialize DVC
 3 files changed, 6 insertions(+)
 create mode 100644 .dvc/.gitignore
 create mode 100644 .dvc/config
 create mode 100644 .dvcignore


In [7]:
! git config --global user.email "21f2000243@ds.study.iitm.ac.in"
!git config --global user.name "satyamsaitama"

In [9]:
# -- Step 5: Add iris dataset to DVC tracking
!dvc add data/iris.csv
!git add data/iris.csv.dvc
!git commit -m "Track iris.csv with DVC"

⠋ Checking graph
Adding...
Collecting files and computing hashes in data/iris.csv |0.00 [00:00,     ?file/s
ERROR: unexpected error - no such column: "size" - should this be a string literal in single-quotes?

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
fatal: pathspec 'data/iris.csv.dvc' did not match any files
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.bashrc
	.cache/
	.config/
	.docker/
	.gitconfig
	.gsutil/
	.ipynb_checkpoints/
	.ipython/
	.jupyter/
	.local/
	.npm/
	SDK_Custom_Container_Prediction.ipynb
	artifacts/
	data

In [10]:
! git status

On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.bashrc
	.cache/
	.config/
	.docker/
	.gitconfig
	.gsutil/
	.ipynb_checkpoints/
	.ipython/
	.jupyter/
	.local/
	.npm/
	SDK_Custom_Container_Prediction.ipynb
	artifacts/
	data/
	notebook_template.ipynb

nothing added to commit but untracked files present (use "git add" to track)


In [11]:
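# -- Step 7: Tag this as version 1
# Pushing requires a Git remote named "origin" (for example, added with `git remote add origin <url>`).
!git tag v1
!git push origin --tags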

fatal: 'origin' does not appear to be a git repository
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.


In [13]:
# -- Step 8: Make changes for version 2
# Example: Replace iris.csv with iris_v2.csv or update scripts
!mv data/iris_v2.csv data/iris.csv
!dvc add data/iris.csv
!git add data/iris.csv.dvc
!git commit -m "Version 2: Update dataset and pipeline changes"
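
Each `dvc add` in the step above records a content hash of the data file in a small `.dvc` file, and Git tracks only that small file. When the dataset changes, the hash changes, which is how two dataset versions can be told apart. The sketch below illustrates the idea with `hashlib`; `describe_file` is a hypothetical helper, the field names only approximate what a `.dvc` file stores, and the result is not guaranteed to match the hash DVC itself computes.

```python
import hashlib
import os
import tempfile


def describe_file(path: str, chunk_size: int = 1024 * 1024) -> dict:
    """Return an MD5 content hash, size and name for a file (illustrative only)."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large datasets do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return {
        "md5": md5.hexdigest(),
        "size": os.path.getsize(path),
        "path": os.path.basename(path),
    }


with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "iris.csv")

    # "Version 1" of a tiny stand-in dataset.
    with open(csv_path, "w", newline="") as f:
        f.write("sepal_length,sepal_width,petal_length,petal_width,species\n")
        f.write("5.1,3.5,1.4,0.2,setosa\n")
    v1 = describe_file(csv_path)

    # "Version 2": the same file with one more row appended.
    with open(csv_path, "a", newline="") as f:
        f.write("4.9,3.0,1.4,0.2,setosa\n")
    v2 = describe_file(csv_path)

print(v1)
print(v2)
```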

In [13]:
# -- Step 9: Push data and code changes
!dvc push
!git push

In [13]:
# -- Step 10: Tag as version 2
!git tag v2
!git push origin --tags

In [13]:
# -- Step 11: Demonstrate switching between versions
# To revert to version 1
!git checkout v1
!dvc pull

error: pathspec 'main' did not match any file(s) known to git
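
Switching versions is a two-step operation: Git restores the `.dvc` pointer files for the tag, then DVC restores the matching data. `dvc pull` fetches data from a configured DVC remote, while `dvc checkout` restores it from the local cache, which is the fallback when no remote is set up. The sketch below wraps both variants; `switch_version_commands` and `switch_version` are hypothetical helpers, and the default dry-run mode only prints the commands rather than running them.

```python
import subprocess
from typing import List


def switch_version_commands(tag: str, use_remote: bool = False) -> List[List[str]]:
    """Build the Git and DVC commands needed to restore a tagged version."""
    sync = ["dvc", "pull"] if use_remote else ["dvc", "checkout"]
    return [["git", "checkout", tag], sync]


def switch_version(tag: str, use_remote: bool = False, dry_run: bool = True) -> List[List[str]]:
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    commands = switch_version_commands(tag, use_remote)
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


planned = switch_version("v1")
```

Calling `switch_version("v2", dry_run=False)` would run the commands in the current repository; `check=True` makes a failing `git checkout` stop the sequence before DVC touches the data.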


# FEAST [WEEK 3]

In [20]:
!pip install 'feast[gcp]' scikit-learn --quiet

In [21]:
!pip install "numpy<2" "pandas==2.2.2" --quiet

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pycaret 3.3.2 requires pandas<2.2.0, but you have pandas 2.2.2 which is incompatible.
sktime 0.26.0 requires pandas<2.2.0,>=1.1, but you have pandas 2.2.2 which is incompatible.

In [22]:
import pandas as pd
import numpy as np

from datetime import datetime, timedelta

from sklearn.preprocessing import OrdinalEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

from google.cloud import bigquery

In [26]:
df = pd.read_csv('data/iris.csv')

In [27]:
df.head()

AttributeError: 'Index' object has no attribute '_format_flat'

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

In [28]:
df["flower_id"] = np.arange(len(df))
df["event_timestamp"] = pd.date_range("2021-01-01", periods=len(df), freq="h")
df["created"] = datetime.utcnow()

df.head()

AttributeError: 'Index' object has no attribute '_format_flat'

   sepal_length  sepal_width  petal_length  petal_width species  flower_id  \
0           5.1          3.5           1.4          0.2  setosa          0   
1           4.9          3.0           1.4          0.2  setosa          1   
2           4.7          3.2           1.3          0.2  setosa          2   
3           4.6          3.1           1.5          0.2  setosa          3   
4           5.0          3.6           1.4          0.2  setosa          4   

      event_timestamp                    created  
0 2021-01-01 00:00:00 2025-06-22 17:20:35.357309  
1 2021-01-01 01:00:00 2025-06-22 17:20:35.357309  
2 2021-01-01 02:00:00 2025-06-22 17:20:35.357309  
3 2021-01-01 03:00:00 2025-06-22 17:20:35.357309  
4 2021-01-01 04:00:00 2025-06-22 17:20:35.357309  
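
The three added columns give each row what a feature store needs: `flower_id` acts as the entity key, `event_timestamp` says when the measurement is considered valid, and `created` records when the row was written. Feast uses the event timestamp for point-in-time joins. The standard-library sketch below reproduces the hourly timestamps generated by `pd.date_range` above, assuming the 150 rows reported by `df.shape`.

```python
from datetime import datetime, timedelta

start = datetime(2021, 1, 1)
n_rows = 150  # matches the row count reported by df.shape

# One timestamp per row, spaced one hour apart, like freq="h" above.
event_timestamps = [start + timedelta(hours=i) for i in range(n_rows)]
flower_ids = list(range(n_rows))

print(flower_ids[0], event_timestamps[0])
print(flower_ids[-1], event_timestamps[-1])
```

The last row lands 149 hours after the start, on 2021-01-07 at 05:00, so the whole dataset spans a little over six days of event time.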

In [29]:
df.shape

(150, 8)

In [30]:
PROJECT_ID = "operating-realm-460905-q8"
BUCKET_NAME = "bright-primacy-461312-t2-iris-feast-week3"
BIGQUERY_DATASET_NAME = "iris_dataset_week3"
AI_PLATFORM_MODEL_NAME = "iris_table_jsd_model"

In [34]:
import os

# Reuse the PROJECT_ID defined in the previous cell instead of overwriting it with a placeholder.
os.environ["GOOGLE_CLOUD_PROJECT"] = PROJECT_ID

!gcloud config set project {PROJECT_ID} --quiet
!echo project_id = {PROJECT_ID} > ~/.bigqueryrc


Updated property [core/project].


To take a quick anonymous survey, run:
  $ gcloud survey

