### Week 1 GA

### Assignment Objective


Set up the ML pipeline for the Iris classifier on the Vertex AI platform in your own GCP account, using GCS for storage, as demonstrated in the lecture (Hands-on: Introduction to Google Cloud, Vertex AI).

1) Activate your GCP trial.
2) Set up Vertex AI Workbench (enable the required services/APIs).
3) Store the training data in a Google Cloud Storage bucket.
4) Fetch the data from the bucket and successfully execute the Iris machine learning training pipeline.
5) Store the output artifacts (models, logs, etc.) in a Google Cloud Storage bucket, with folders organized by training execution timestamp.
6) Create a new script for inference and run it on the eval set after fetching the models from the output artifacts bucket.
7) Run training and inference twice, resulting in two output artifact folders in the Google Cloud Storage bucket.
8) (Optional) Run this pipeline for the two versions of the data provided in the GitHub data folder.


### Dataset
The data folder contains three subfolders: raw, v1, and v2.
The raw folder contains a file named iris.csv; the v1 and v2 folders each contain a file named data.csv.
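
The folder layout above determines which file to read for each data version. A minimal sketch of that mapping follows, assuming the same layout is mirrored under a `data/` prefix in the bucket; `resolve_data_path` is a hypothetical helper name and `gs://example-bucket` is a placeholder, neither is part of the assignment.

```python
def resolve_data_path(bucket_uri: str, data_version: str) -> str:
    """Return the GCS path of the CSV for one data version (hypothetical helper)."""
    # File names follow the folder layout described above.
    file_names = {"raw": "iris.csv", "v1": "data.csv", "v2": "data.csv"}
    if data_version not in file_names:
        raise ValueError(f"unknown data version: {data_version!r}")
    return f"{bucket_uri.rstrip('/')}/data/{data_version}/{file_names[data_version]}"


# The bucket name below is a placeholder, not a real bucket.
print(resolve_data_path("gs://example-bucket", "raw"))
print(resolve_data_path("gs://example-bucket", "v2"))
```

Rejecting unknown versions up front avoids a less obvious "file not found" error later when pandas tries to read the path.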

### Process starts here

### Prerequisites 

In [None]:
# Install Vertex AI SDK for Python and other required packages
! pip3 install --upgrade --quiet  google-cloud-aiplatform

### Set Google Cloud project information


In [None]:
PROJECT_ID = "heroic-throne-473405-m8"
LOCATION = "us-central1"

### Create a Cloud Storage bucket to store the training data.

In [None]:
BUCKET_URI = f"gs://{PROJECT_ID}-week1ga"

In [None]:
! gsutil mb -l {LOCATION} -p {PROJECT_ID} {BUCKET_URI}

Creating gs://heroic-throne-473405-m8-week1ga/...


In [None]:
!gsutil cp data/raw/iris.csv {BUCKET_URI}/data/raw/
!gsutil cp data/v1/data.csv {BUCKET_URI}/data/v1/
!gsutil cp data/v2/data.csv {BUCKET_URI}/data/v2/

Copying file://data/raw/iris.csv [Content-Type=text/csv]...
/ [1 files][  3.8 KiB/  3.8 KiB]                                                
Operation completed over 1 objects/3.8 KiB.                                      
Copying file://data/v1/data.csv [Content-Type=text/csv]...
/ [1 files][  2.5 KiB/  2.5 KiB]                                                
Operation completed over 1 objects/2.5 KiB.                                      
Copying file://data/v2/data.csv [Content-Type=text/csv]...
/ [1 files][  1.3 KiB/  1.3 KiB]                                                
Operation completed over 1 objects/1.3 KiB.                                      


### Initialize Vertex AI SDK for Python

To get started with Vertex AI, you need an existing Google Cloud project with the [Vertex AI API enabled](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

In [None]:
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=LOCATION, staging_bucket=BUCKET_URI)

### Import the required libraries

In [None]:
import os
import sys

## Decision Tree model
Build a decision tree model on the Iris data.

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from pandas.plotting import parallel_coordinates
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn import metrics
from google.cloud import storage

DATA_VERSION = "v2"  # change to "raw", "v1", or "v2"

# Load data from GCS
data_path = f"{BUCKET_URI}/data/{DATA_VERSION}/data.csv" if DATA_VERSION != "raw" else f"{BUCKET_URI}/data/raw/iris.csv"
data = pd.read_csv(data_path)

In [None]:
data.head(5)

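| Unnamed: 0 | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |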


In [None]:
train, test = train_test_split(data, test_size = 0.4, stratify = data['species'], random_state = 42)
X_train = train[['sepal_length','sepal_width','petal_length','petal_width']]
y_train = train.species
X_test = test[['sepal_length','sepal_width','petal_length','petal_width']]
y_test = test.species

In [None]:
mod_dt = DecisionTreeClassifier(max_depth = 3, random_state = 1)
mod_dt.fit(X_train,y_train)
prediction=mod_dt.predict(X_test)
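# accuracy_score expects (y_true, y_pred); the value is the same either way, but this is the documented order
print(f"The accuracy of the Decision Tree is {metrics.accuracy_score(y_test, prediction):.3f}")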

The accuracy of the Decision Tree is 1.000


### Upload model artifacts and custom code to Cloud Storage

Before you can deploy your model for serving, Vertex AI needs access to the following files in Cloud Storage:

* `model.joblib` (model artifact)

Run the following commands to upload your files:

In [None]:
import joblib
import time
from datetime import datetime

# Get the current time
timestamp = int(time.time())

# Convert to readable string (YYYY-MM-DD_HH-MM-SS)
timestamp_str = datetime.fromtimestamp(timestamp).strftime("%Y-%m-%d_%H-%M-%S")
# Use that for artifact directory
artifact_dir = f"{BUCKET_URI}/artifacts/{DATA_VERSION}/{timestamp_str}/"

joblib.dump(mod_dt, "model.joblib")

['model.joblib']
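
As an alternative to copying with `gsutil`, the upload could also be done from Python with the `google-cloud-storage` client. The sketch below is an assumption-laden illustration, not the method used in this notebook: `split_gcs_uri` and `upload_to_gcs` are hypothetical helper names, and the example URI is a placeholder.

```python
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/obj' into ('bucket', 'path/to/obj')."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket_name, _, object_name = uri[len(prefix):].partition("/")
    if not bucket_name:
        raise ValueError(f"missing bucket name in {uri!r}")
    return bucket_name, object_name


def upload_to_gcs(local_path: str, destination_uri: str, project: str) -> None:
    """Upload one local file to the object named by destination_uri."""
    # Imported here so split_gcs_uri stays usable without the GCS client installed.
    from google.cloud import storage

    bucket_name, object_name = split_gcs_uri(destination_uri)
    client = storage.Client(project=project)
    client.bucket(bucket_name).blob(object_name).upload_from_filename(local_path)


# Pure string handling; this line needs no network access or credentials.
print(split_gcs_uri("gs://example-bucket/artifacts/v2/run/model.joblib"))
```

In this notebook the call would look something like `upload_to_gcs("model.joblib", artifact_dir + "model.joblib", PROJECT_ID)`, since `artifact_dir` already ends with a slash.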

In [None]:
!gsutil cp model.joblib {artifact_dir}

Copying file://model.joblib [Content-Type=application/octet-stream]...
/ [1 files][  2.2 KiB/  2.2 KiB]                                                
Operation completed over 1 objects/2.2 KiB.                                      
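
For the inference step, the script first has to decide which timestamped artifact folder to fetch the model from. A minimal sketch of one way to pick the most recent run follows, assuming the `artifacts/<data version>/<timestamp>/` layout and the timestamp format used above. `latest_run` is a hypothetical helper, and the object names in the example are made up.

```python
from datetime import datetime
from typing import Iterable, Optional

RUN_FORMAT = "%Y-%m-%d_%H-%M-%S"  # same format as timestamp_str in the training cell


def latest_run(object_names: Iterable[str], data_version: str) -> Optional[str]:
    """Return the newest run folder name for data_version, or None if there is none."""
    prefix = f"artifacts/{data_version}/"
    runs = set()
    for name in object_names:
        if not name.startswith(prefix):
            continue
        folder = name[len(prefix):].split("/", 1)[0]
        try:
            datetime.strptime(folder, RUN_FORMAT)
        except ValueError:
            continue  # skip anything that is not a timestamped run folder
        runs.add(folder)
    # Zero-padded, year-first names sort chronologically as plain strings.
    return max(runs) if runs else None


# Made-up object names for illustration only.
names = [
    "artifacts/v2/2025-01-01_10-00-00/model.joblib",
    "artifacts/v2/2025-01-01_12-30-00/model.joblib",
    "artifacts/v1/2025-01-02_09-00-00/model.joblib",
]
print(latest_run(names, "v2"))
```

The object names could come from `storage.Client().list_blobs(...)` (each blob has a `name` attribute) or from `gsutil ls` output with the bucket prefix stripped. Once the folder is known, the model file can be copied locally and loaded with `joblib.load` before predicting on the eval set.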
