## Vertex AI: Predicting Loan Risk with AutoML


### Objectives
You will learn how to:

* Upload a dataset to Vertex AI.
* Train a machine learning model with AutoML.
* Evaluate the model performance.
* Deploy the model to an endpoint.
* Get predictions.

### Prepare data
1. In the Google Cloud Console, on the Navigation menu, click Vertex AI.
2. Click Create dataset.
3. On the Datasets page, give the dataset a name.
4. For the data type and objective, click Tabular, and then select Regression/classification.
5. Click Create.


**Upload data**

Vertex AI offers three options for importing data:

* Upload a local file from your computer.
* Select files from Cloud Storage.
* Select data from BigQuery.

For convenience, the dataset is already uploaded to Cloud Storage.

1. For the data source, select Select CSV files from Cloud Storage.
2. For Import file path, enter `spls/cbl455/loan_risk.csv`.


### Train your model

Click Train new model.

1. For Objective, select Classification. Choose classification rather than regression because you are predicting a discrete class (whether a customer will repay a loan: 0 for repay, 1 for default) rather than a continuous number.
2. Click Continue.

**Model details**

Specify the name of the model and the target column.

1. Give the model a name, such as LoanRisk.
2. For Target column, select Default.
3. (Optional) Explore Advanced options to determine how to assign the training vs. testing data and to specify the encryption.
4. Click Continue.
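
The Advanced options step controls how rows are assigned to training and testing. As a rough illustration of what a random split does, here is a minimal sketch. It assumes the random 80/10/10 train/validation/test split that the Vertex AI documentation describes as the default for tabular data. The function name `assign_splits`, the fixed seed, and the 2,000-row count are hypothetical and are not taken from the lab.

```python
import random


def assign_splits(n_rows, fractions=(0.8, 0.1, 0.1), seed=0):
    """Randomly label each row index as 'train', 'validation', or 'test'.

    The 80/10/10 fractions are an assumption mirroring the documented
    default random split; the seed only makes the sketch repeatable.
    """
    train_frac, val_frac, _ = fractions
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)

    n_train = round(n_rows * train_frac)
    n_val = round(n_rows * val_frac)

    labels = {}
    for position, row in enumerate(indices):
        if position < n_train:
            labels[row] = "train"
        elif position < n_train + n_val:
            labels[row] = "validation"
        else:
            labels[row] = "test"
    return labels


# Hypothetical dataset size, used only to show the resulting proportions.
splits = assign_splits(2000)
counts = {name: list(splits.values()).count(name)
          for name in ("train", "validation", "test")}
print(counts)
```

The test rows are held out from training, which is why the evaluation metrics later in this lab reflect performance on data the model has not seen.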

**Training options**

Specify which columns you want to include in the training model. For example, ClientID might be irrelevant for predicting loan risk.

1. Click the minus sign on the ClientID row to exclude it from the training model.
2. (Optional) Explore Advanced options to select different optimization objectives. For more information, see the Vertex AI documentation on optimization objectives for tabular AutoML models.
3. Click Continue.


**Compute and pricing**

1. For Budget, which represents the number of node hours for training, enter 1. Training your AutoML model for 1 node hour is typically a good start for understanding whether there is a relationship between the features and the label you've selected. From there, you can modify your features and train for longer to improve model performance.
2. Leave early stopping enabled.
3. Click Start training.

Depending on the data size and the training method, training can take from a few minutes to a couple of hours. Normally you would receive an email from Google Cloud when the training job is complete. However, in the Qwiklabs environment, you will not receive an email.

To avoid waiting for the model to train, you download a pre-trained model in task 5 and use it to get predictions in task 6. This pre-trained model is the result of following the same steps described in tasks 1 and 2.
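
If you later script the same training job instead of using the console, note that the Vertex AI Python SDK expresses the budget in milli node hours (the `budget_milli_node_hours` parameter of the tabular AutoML training job), so the console value of 1 corresponds to 1000. The helper below is a small sketch of that conversion. The function name is hypothetical, and the 1,000 to 72,000 bounds are an assumption that should be checked against the current documentation.

```python
def node_hours_to_milli(node_hours):
    """Convert a node-hour budget to milli node hours.

    The allowed range (1,000 to 72,000 milli node hours) is an assumption
    about the service limits, not something stated in this lab.
    """
    milli = round(node_hours * 1000)
    if not 1000 <= milli <= 72000:
        raise ValueError(f"budget out of assumed range: {milli} milli node hours")
    return milli


# The lab's budget of 1 node hour.
print(node_hours_to_milli(1))  # 1000
```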

### Evaluate the model performance

Vertex AI provides many metrics to evaluate the model performance. You focus on three:

* Precision/Recall curve
* Confusion Matrix
* Feature Importance

![evaluation](evalution_1.png)

![confusion_matrix](confusion_matrix.png)
![featureImportance](featureImportance.png)
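
To make the link between the confusion matrix and the precision/recall figures concrete, the following sketch computes the standard metrics from the four cells of a binary confusion matrix. It treats class 1 (default) as the positive class. The counts are made up for illustration and are not the values from the screenshots above.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1, and accuracy from confusion-matrix counts.

    tp: defaults correctly predicted as default
    fp: repayments wrongly predicted as default
    fn: defaults wrongly predicted as repay
    tn: repayments correctly predicted as repay
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Illustrative counts only, chosen to keep the arithmetic easy to follow.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=88)
print(metrics)
```

Raising the confidence threshold on the model page typically trades recall for precision, which is the trade-off the Precision/Recall curve visualizes.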

### Deploy the model

**Create and define an endpoint**

1. On your model page, on the Deploy and test tab, click Deploy to endpoint.
2. For Endpoint name, enter a name for your endpoint, such as LoanRisk.
3. Click Continue.

**Model settings and monitoring**

1. Leave the traffic splitting settings as-is.
2. Under Machine type, select n1-standard-8 (8 vCPUs, 30 GiB memory) for your model deployment.
3. Leave the remaining settings as-is.
4. Click Deploy.

**SML Bearer Token**

To allow the pipeline to authenticate and be authorized to call the endpoint for predictions, you need to provide your Bearer Token. Follow the steps below to retrieve it. If you have trouble getting the token, cookies in the incognito window may be the cause; in that case, try these steps in a non-incognito window.

1. Log in to https://gsp-auth-kjyo252taq-uc.a.run.app/ with your student email address and password.
2. Click the Copy button. This copies a very long token to your clipboard.

![SML_Bearer_Token](SML_Bearer_Token.png)


### Get predictions
Use the **Shared Machine Learning (SML) service** to work with an existing trained model.

| ENVIRONMENT VARIABLE | VALUE |
| -------------------- | ----- |
| AUTH_TOKEN | Use the value from the previous section |
| ENDPOINT | https://sml-api-vertex-kjyo252taq-uc.a.run.app/vertex/predict/tabular_classification |
| INPUT_DATA_FILE | INPUT-JSON |

To use the trained model, you need to create some environment variables.

1. Open a Cloud Shell window.
2. Create an AUTH_TOKEN environment variable, replacing INSERT_SML_BEARER_TOKEN with the bearer token value from the previous section: `AUTH_TOKEN="INSERT_SML_BEARER_TOKEN"`
3. Download the lab assets: `gsutil cp gs://spls/cbl455/cbl455.tar.gz .`
4. Extract the lab assets: `tar -xvf cbl455.tar.gz`
5. Create an ENDPOINT environment variable: `ENDPOINT=https://sml-api-vertex-kjyo252taq-uc.a.run.app/vertex/predict/tabular_classification`
6. Create an INPUT_DATA_FILE environment variable: `INPUT_DATA_FILE="INPUT-JSON"`

The file INPUT-JSON contains the following values:

| age   | ClientID | income   | loan    |
| ----- | -------- | -------- | ------- |
| 40.77 | 997      | 44964.01 | 3944.22 |
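
The exact layout of INPUT-JSON is not shown in this document, so the sketch below only illustrates how the four values in the table could be held as a JSON object and then edited for a different applicant. The flat structure, the helper name `make_instance`, and the numeric type of `ClientID` are all assumptions. Open the real file from the lab assets to see its actual schema before editing it.

```python
import json


def make_instance(age, client_id, income, loan):
    """Build one applicant record using the column names from the table.

    Assumption: a flat JSON object with these four keys. The real
    INPUT-JSON file may wrap the record in additional fields.
    """
    return {"age": age, "ClientID": client_id, "income": income, "loan": loan}


# The record shown in the table.
original = make_instance(40.77, 997, 44964.01, 3944.22)

# The applicant used for the second request in this section:
# age 30, income 50,000, loan 20,000 (ClientID left unchanged).
edited = dict(original, age=30, income=50000, loan=20000)

print(json.dumps(edited, indent=2))
```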


Test the SML service by passing the parameters stored in the environment variables:

    ./smlproxy tabular \
      -a $AUTH_TOKEN \
      -e $ENDPOINT \
      -d $INPUT_DATA_FILE

The request should return a response similar to this:

    SML Tabular HTTP Response:
    2022/01/10 15:04:45 {"model_class":"0","model_score":0.9999981}
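
To read the response, recall that the target column uses 0 for repay and 1 for default. The following sketch parses a response line of the shape shown above and maps the predicted class to a readable label. The function name and the assumption that the JSON payload starts at the first `{` on the line are illustrative. The sketch also assumes `model_score` is the model's confidence in the predicted class.

```python
import json

# Class meanings as defined for the Default target column in this lab.
LABELS = {"0": "repay", "1": "default"}


def parse_sml_response(line):
    """Extract the predicted label and score from one smlproxy output line.

    Assumes the line is a timestamp followed by a JSON object, as in the
    sample response in this document.
    """
    payload = json.loads(line[line.index("{"):])
    return LABELS[payload["model_class"]], payload["model_score"]


sample = '2022/01/10 15:04:45 {"model_class":"0","model_score":0.9999981}'
label, score = parse_sml_response(sample)
print(label, score)
```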



To get a prediction for a different applicant, edit the file INPUT-JSON and replace the original values, then perform another request to the SML service:

    ./smlproxy tabular \
      -a $AUTH_TOKEN \
      -e $ENDPOINT \
      -d $INPUT_DATA_FILE

In this case, assuming that the person's income is 50,000, age is 30, and loan is 20,000, the model predicts that this person will repay the loan:

    SML Tabular HTTP Response:
    2022/01/10 15:04:45 {"model_class":"0","model_score":0.9999981}


![prediction](prediction.png)