<a href="https://colab.research.google.com/github/deep-diver/Continuous-Adaptation-for-Machine-Learning-System-to-Data-Changes/blob/main/notebooks/03_Batch_Prediction_Performance.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Outline
1. Prepare data for data drift simulation
2. Upload the data to the designated GCS bucket
  - The data is stored in a GCS bucket to simulate a real-world scenario. In practice, data is collected in a central location (i.e. a GCS bucket) and then used to measure the model performance. Performance can be measured much more reliably on a batch of data than on a single (online) sample.
3. Measure the model performance on the data
4. If the model performance has degraded, trigger the pipeline to retrain the model on the data (including the previous data)
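
To illustrate why a batch gives a more reliable estimate than a single sample, here is a minimal sketch that computes the standard error of an accuracy estimate. It assumes each prediction is an independent draw with the same chance of being correct (a binomial model), which real data may not satisfy. The function name `accuracy_standard_error` is hypothetical and not part of this notebook's pipeline.

```python
import math


def accuracy_standard_error(accuracy: float, num_samples: int) -> float:
    """Standard error of an accuracy estimate under a binomial assumption."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    return math.sqrt(accuracy * (1.0 - accuracy) / num_samples)


# Compare a single online sample with batches of increasing size.
for n in (1, 10, 1000):
    print(n, round(accuracy_standard_error(0.5, n), 4))
```

Under this assumption, with a true accuracy of 0.5 the standard error falls from 0.5 for a single sample to roughly 0.16 for a batch of 10 and roughly 0.016 for a batch of 1000.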

In [21]:
!pip install -q --upgrade google-cloud-aiplatform
!pip install -q --upgrade google-cloud-storage

In [3]:
!gcloud init

Welcome! This command will take you through the configuration of gcloud.

Settings from your current configuration [default] are:
component_manager:
  disable_update_check: 'True'
compute:
  gce_metadata_read_timeout_sec: '0'

Pick configuration to use:
 [1] Re-initialize this configuration [default] with new settings 
 [2] Create a new configuration
Please enter your numeric choice:  Please enter a value between 1 and 2:  2

Enter configuration name. Names start with a lower case letter and 
contain only lower case letters a-z, digits 0-9, and hyphens '-':  gde
Your current configuration has been set to: [gde]

You can skip diagnostics next time by using the following flag:
  gcloud init --skip-diagnostics

Network diagnostic detects and fixes local network connection issues.
Reachability Check passed.
Network diagnostic passed (1/1 checks passed).

You must log in to continue. Would you like to log in (Y/n)?  Y

Go to the following link in your browser:

    https://accounts.google

In [44]:
from google.colab import auth
auth.authenticate_user()

In [57]:
GOOGLE_CLOUD_PROJECT = 'central-hangar-321813'    #@param {type:"string"}
GOOGLE_CLOUD_REGION = 'us-central1'             #@param {type:"string"}

MODEL_NAME = 'resnet_cifar_latest' #@param {type:"string"}

TEST_FILENAME = 'test-images.txt' #@param {type:"string"}
TEST_GCS_BUCKET = 'gs://batch-prediction-collection' #@param {type:"string"}
TEST_LOCAL_PATH = 'Continuous-Adaptation-for-Machine-Learning-System-to-Data-Changes/notebooks/test-images' #@param {type:"string"}


In [12]:
!git clone https://github.com/deep-diver/Continuous-Adaptation-for-Machine-Learning-System-to-Data-Changes.git

Cloning into 'Continuous-Adaptation-for-Machine-Learning-System-to-Data-Changes'...
remote: Enumerating objects: 100, done.
remote: Counting objects: 100% (100/100), done.
remote: Compressing objects: 100% (78/78), done.
remote: Total 100 (delta 59), reused 38 (delta 21), pack-reused 0
Receiving objects: 100% (100/100), 57.61 KiB | 14.40 MiB/s, done.
Resolving deltas: 100% (59/59), done.


In [58]:
from os import listdir

test_files = listdir(TEST_LOCAL_PATH)
test_files

['frog_0000.jpg',
 'truck_0000.jpg',
 'dog_0000.jpg',
 'cat_0000.jpg',
 'ship_0000.jpg',
 'deer_0000.jpg',
 'bird_0000.jpg',
 'horse_0000.jpg',
 'automobile_0000.jpg',
 'airplane_0000.jpg']

In [64]:
# Write one GCS URI per line; this file list is the batch prediction input
with open(TEST_FILENAME, "w") as f:
  for filename in test_files:
    f.write(f'{TEST_GCS_BUCKET}/{filename}\n')

In [65]:
!cat {TEST_FILENAME}

gs://batch-prediction-collection/frog_0000.jpg
gs://batch-prediction-collection/truck_0000.jpg
gs://batch-prediction-collection/dog_0000.jpg
gs://batch-prediction-collection/cat_0000.jpg
gs://batch-prediction-collection/ship_0000.jpg
gs://batch-prediction-collection/deer_0000.jpg
gs://batch-prediction-collection/bird_0000.jpg
gs://batch-prediction-collection/horse_0000.jpg
gs://batch-prediction-collection/automobile_0000.jpg
gs://batch-prediction-collection/airplane_0000.jpg


In [66]:
!gsutil -m cp -r {TEST_FILENAME} {TEST_GCS_BUCKET}
!gsutil -m cp -r {TEST_LOCAL_PATH}/*.jpg {TEST_GCS_BUCKET}

Copying file://test-images.txt [Content-Type=text/plain]...
/ [1/1 files][  480.0 B/  480.0 B] 100% Done
Operation completed over 1 objects/480.0 B.
Copying file://Continuous-Adaptation-for-Machine-Learning-System-to-Data-Changes/notebooks/test-images/dog_0000.jpg [Content-Type=image/jpeg]...
Copying file://Continuous-Adaptation-for-Machine-Learning-System-to-Data-Changes/notebooks/test-images/automobile_0000.jpg [Content-Type=image/jpeg]...
Copying file://Continuous-Adaptation-for-Machine-Learning-System-to-Data-Changes/notebooks/test-images/cat_0000.jpg [Content-Type=image/jpeg]...
Copying file://Continuous-Adaptation-for-Machine-Learning-System-to-Data-Changes/notebooks/test-images/frog_0000.jpg [Content-Type=image/jpeg]...
Copying file://Continuous-Adaptation-for-Machine-Learning-System-to-Data-Changes/notebooks/test-images/bird

In [3]:
import google.cloud.aiplatform as aiplatform
from google.protobuf import json_format
from google.protobuf.json_format import MessageToJson, ParseDict
from google.protobuf.struct_pb2 import Struct, Value

In [68]:
from typing import Union, Sequence

def create_batch_prediction_job_dedicated_resources_sample(
    project: str,
    location: str,
    model_resource_name: str,
    job_display_name: str,
    gcs_source: Union[str, Sequence[str]],
    gcs_destination: str,
    instances_format: str = "file-list",
    machine_type: str = "n1-standard-2",
    accelerator_count: int = 1,
    accelerator_type: str = "NVIDIA_TESLA_K80",
    starting_replica_count: int = 1,
    max_replica_count: int = 1,
    sync: bool = True,
):
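    # Initialize the Vertex AI SDK for the target project and region
    aiplatform.init(project=project, location=location)

    # Look up the uploaded model by its resource name or model ID
    my_model = aiplatform.Model(model_resource_name)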

    batch_prediction_job = my_model.batch_predict(
        job_display_name=job_display_name,
        instances_format=instances_format,
        gcs_source=gcs_source,
        gcs_destination_prefix=gcs_destination,
        machine_type=machine_type,
        accelerator_count=accelerator_count,
        accelerator_type=accelerator_type,
        starting_replica_count=starting_replica_count,
        max_replica_count=max_replica_count,
        sync=sync,
    )

    batch_prediction_job.wait()

    print(batch_prediction_job.display_name)
    print(batch_prediction_job.resource_name)
    print(batch_prediction_job.state)
    return batch_prediction_job

In [69]:
from datetime import datetime

TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")

In [70]:
create_batch_prediction_job_dedicated_resources_sample(
    project=GOOGLE_CLOUD_PROJECT, 
    location=GOOGLE_CLOUD_REGION,
    model_resource_name='2008244793993330688',
    job_display_name=f'{MODEL_NAME}-{TIMESTAMP}',
    gcs_source=[f'{TEST_GCS_BUCKET}/{TEST_FILENAME}'],
    gcs_destination=f'{TEST_GCS_BUCKET}/results/',
    accelerator_type=None,
    accelerator_count=None
)

INFO:google.cloud.aiplatform.jobs:Creating BatchPredictionJob
INFO:google.cloud.aiplatform.jobs:BatchPredictionJob created. Resource name: projects/31482268105/locations/us-central1/batchPredictionJobs/1680882799009071104
INFO:google.cloud.aiplatform.jobs:To use this BatchPredictionJob in another session:
INFO:google.cloud.aiplatform.jobs:bpj = aiplatform.BatchPredictionJob('projects/31482268105/locations/us-central1/batchPredictionJobs/1680882799009071104')
INFO:google.cloud.aiplatform.jobs:View Batch Prediction Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/batch-predictions/1680882799009071104?project=31482268105
INFO:google.cloud.aiplatform.jobs:BatchPredictionJob projects/31482268105/locations/us-central1/batchPredictionJobs/1680882799009071104 current state:
JobState.JOB_STATE_RUNNING
INFO:google.cloud.aiplatform.jobs:BatchPredictionJob projects/31482268105/locations/us-central1/batchPredictionJobs/1680882799009071104 current state:
JobState.JOB_STATE_RUN

<google.cloud.aiplatform.jobs.BatchPredictionJob object at 0x7f6598844910> 
resource name: projects/31482268105/locations/us-central1/batchPredictionJobs/1680882799009071104

In [72]:
import os
import json

RESULTS_DIRECTORY = "results"
RESULTS_DIRECTORY_FULL = f'{TEST_GCS_BUCKET}/{RESULTS_DIRECTORY}'

# Create missing directories
os.makedirs(RESULTS_DIRECTORY, exist_ok=True)

# Download the batch prediction results from Cloud Storage
!gsutil -m cp -r $RESULTS_DIRECTORY_FULL $RESULTS_DIRECTORY

# Get most recently modified directory
latest_directory = max(
    [
        os.path.join(RESULTS_DIRECTORY, d)
        for d in os.listdir(RESULTS_DIRECTORY)
    ],
    key=os.path.getmtime,
)

# Get downloaded results in directory
results_files = []
for dirpath, subdirs, files in os.walk(latest_directory):
    for file in files:
        if file.startswith("prediction.results"):
            results_files.append(os.path.join(dirpath, file))

# Consolidate all the results into a list
results = []
for results_file in results_files:
    # Parse each line of the downloaded file as one JSON prediction record
    with open(results_file, "r") as file:
        results.extend(json.loads(line) for line in file)

Copying gs://batch-prediction-collection/results/prediction-resnet_cifar_latest-2021_09_16T19_24_05_801Z/prediction.results-00000-of-00001...
/ [0/6 files][    0.0 B/  1.3 KiB]   0% Done                                    Copying gs://batch-prediction-collection/results/prediction-resnet_cifar_latest-2021_09_16T18_47_39_122Z/prediction.results-00000-of-00001...
/ [0/6 files][    0.0 B/  1.3 KiB]   0% Done                                    Copying gs://batch-prediction-collection/results/prediction-resnet_cifar_latest-2021_09_16T18_47_39_122Z/prediction.errors_stats-00000-of-00001...
/ [0/6 files][    0.0 B/  1.3 KiB]   0% Done                                    Copying gs://batch-prediction-collection/results/prediction-resnet_cifar_latest-2021_09_16T19_23_55_003Z/prediction.errors_stats-00000-of-00001...
/ [0/6 files][    0.0 B/  1.3 KiB]   0% Done                                    / [1/6 files][    0.0 B/  1.3 KiB]   0% Done                                    Copying gs://batc

In [73]:
results

[{'instance': 'gs://batch-prediction-collection/airplane_0000.jpg',
  'prediction': {'confidence': 0.635806859, 'label': 'ship'}},
 {'instance': 'gs://batch-prediction-collection/cat_0000.jpg',
  'prediction': {'confidence': 0.514597297, 'label': 'cat'}},
 {'instance': 'gs://batch-prediction-collection/ship_0000.jpg',
  'prediction': {'confidence': 0.944843113, 'label': 'ship'}},
 {'instance': 'gs://batch-prediction-collection/bird_0000.jpg',
  'prediction': {'confidence': 0.710508406, 'label': 'horse'}},
 {'instance': 'gs://batch-prediction-collection/truck_0000.jpg',
  'prediction': {'confidence': 0.980968714, 'label': 'truck'}},
 {'instance': 'gs://batch-prediction-collection/frog_0000.jpg',
  'prediction': {'confidence': 0.696931422, 'label': 'frog'}},
 {'instance': 'gs://batch-prediction-collection/dog_0000.jpg',
  'prediction': {'confidence': 0.382295936, 'label': 'cat'}},
 {'instance': 'gs://batch-prediction-collection/deer_0000.jpg',
  'prediction': {'confidence': 0.437720776, 

In [83]:
num_correct = 0

for result in results:
  # The ground-truth label is the filename prefix, e.g. 'frog' in 'frog_0000.jpg'
  label = os.path.basename(result['instance']).split("_")[0]
  prediction = result['prediction']['label']

  print(f'label({label})/prediction({prediction})')
  if label == prediction:
    num_correct += 1

print()
print(f'number of results: {len(results)}')
print(f'number of correct: {num_correct}')
print(f'Accuracy: {num_correct/len(results)}')

label(airplane)/prediction(ship)
label(cat)/prediction(cat)
label(ship)/prediction(ship)
label(bird)/prediction(horse)
label(truck)/prediction(truck)
label(frog)/prediction(frog)
label(dog)/prediction(cat)
label(deer)/prediction(dog)
label(automobile)/prediction(automobile)
label(horse)/prediction(dog)

number of results: 10
number of correct: 5
Accuracy: 0.5
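
Step 4 of the outline triggers retraining when performance degrades. The check itself could be sketched as below. The `ACCURACY_THRESHOLD` value and the `batch_accuracy` and `needs_retraining` helpers are assumptions made for illustration; the notebook does not define them, and actually launching the pipeline is left out. The two sample records are copied from the prediction output shown earlier.

```python
import os

# Hypothetical threshold; choose one that suits the model and dataset.
ACCURACY_THRESHOLD = 0.9


def batch_accuracy(results):
    """Fraction of records whose predicted label matches the filename prefix."""
    if not results:
        raise ValueError("no prediction results to evaluate")
    num_correct = sum(
        os.path.basename(r["instance"]).split("_")[0] == r["prediction"]["label"]
        for r in results
    )
    return num_correct / len(results)


def needs_retraining(accuracy, threshold=ACCURACY_THRESHOLD):
    """Return True when accuracy has dropped below the chosen threshold."""
    return accuracy < threshold


# Two records in the same shape as the batch prediction results.
sample_results = [
    {"instance": "gs://batch-prediction-collection/airplane_0000.jpg",
     "prediction": {"confidence": 0.635806859, "label": "ship"}},
    {"instance": "gs://batch-prediction-collection/cat_0000.jpg",
     "prediction": {"confidence": 0.514597297, "label": "cat"}},
]

accuracy = batch_accuracy(sample_results)
print(accuracy, needs_retraining(accuracy))
```

Keeping the decision in a small pure function makes it easy to test separately from whatever mechanism eventually starts the retraining pipeline.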
