# AC215 - Milestone 5

## Project Organization
```
├── LICENSE
├── notebooks
│   ├── breed_labels.txt
│   ├── DogNet_Breed_Distillation.ipynb
│   ├── ExploratoryDataAnalysis.ipynb
│   └── model_testing.ipynb
├── README.md
├── requirements.txt
└── src
    ├── api-service
    │   ├── Dockerfile
    │   ├── Pipfile
    │   ├── Pipfile.lock
    │   ├── api
    │   │   ├── model.py
    │   │   └── service.py
    │   ├── config
    │   │   ├── breed-to-index.json
    │   │   ├── index-to-breed.json
    │   │   ├── model-controller-config.json
    │   │   └── util.py
    │   ├── docker-entrypoint.sh
    │   ├── docker-shell.sh
    │   └── secrets
    │       └── wandb.json
    ├── deployment
    │   ├── Dockerfile
    │   ├── deploy-create-instance.yml
    │   ├── deploy-docker-images.yml
    │   ├── deploy-provision-instance.yml
    │   ├── deploy-setup-containers.yml
    │   ├── deploy-setup-webserver.yml
    │   ├── docker-entrypoint.sh
    │   ├── docker-shell.sh
    │   ├── inventory.yml
    │   ├── loginProfile
    │   ├── nginx-conf
    │   │   └── nginx
    │   │       └── nginx.conf
    │   └── secrets
    │       ├── deployment.json
    │       ├── gcp-service.json
    │       ├── ssh-key-deployment
    │       └── ssh-key-deployment.pub
    ├── dvc
    │   ├── Dockerfile
    │   ├── Pipfile
    │   ├── Pipfile.lock
    │   ├── docker-shell.sh
    │   └── team-engai-dogs.dvc
    ├── frontend-react
    │   ├── docker-shell.sh
    │   ├── Dockerfile
    │   ├── Dockerfile.dev
    │   ├── package.json
    │   ├── public
    │   │   ├── favicon.ico
    │   │   ├── index.html
    │   │   └── manifest.json
    │   ├── src
    │   │   ├── app
    │   │   │   ├── App.css
    │   │   │   ├── App.js
    │   │   │   ├── background.png
    │   │   │   ├── components
    │   │   │   │   ├── Footer
    │   │   │   │   │   ├── Footer.css
    │   │   │   │   │   └── Footer.js
    │   │   │   │   ├── ImageUpload
    │   │   │   │   │   ├── ImageUpload.css
    │   │   │   │   │   └── ImageUpload.js
    │   │   │   │   └── ModelToggle
    │   │   │   │       ├── ModelToggle.css
    │   │   │   │       └── ModelToggle.js
    │   │   │   └── services
    │   │   │       ├── BreedParse.js
    │   │   │       └── DataService.js
    │   │   └── index.js
    │   └── yarn.lock
    ├── model-deployment
    │   ├── Dockerfile
    │   ├── Pipfile
    │   ├── Pipfile.lock
    │   ├── cli.py
    │   ├── docker-entrypoint.sh
    │   └── docker-shell.sh
    ├── models
    │   └── resnet152v2
    │       ├── Dockerfile
    │       ├── Pipfile
    │       ├── Pipfile.lock
    │       ├── distiller.py
    │       ├── docker-shell.sh
    │       ├── dog_breed_dataset
    │       │   └── images
    │       │       └── Images
    │       ├── model_training_age_dataset.py
    │       ├── model_training_breed_dataset.py
    │       ├── model_training_breed_dataset_distillation.py
    │       ├── model_training_breed_dataset_pruned.py
    │       ├── run-model.sh
    │       ├── secrets
    │       │   └── data-service-account.json
    │       └── util.py
    ├── preprocessing
    │   ├── Dockerfile
    │   ├── Pipfile
    │   ├── Pipfile.lock
    │   ├── ResizeDogImages.ipynb
    │   ├── docker-entrypoint.sh
    │   ├── docker-shell.sh
    │   ├── preprocess_age.py
    │   ├── preprocess_breed.py
    │   └── util.py
    ├── pwd
    ├── secrets
    │   ├── data-service-account.json
    │   └── wandb.json
    ├── tensorizing
    │   ├── Dockerfile
    │   ├── Pipfile
    │   ├── Pipfile.lock
    │   ├── curr_image
    │   ├── curr_image.jpg
    │   ├── docker-entrypoint.sh
    │   ├── docker-shell.sh
    │   ├── hold_working_age.py
    │   ├── secrets
    │   │   └── data-service-account.json
    │   ├── tensorize_age_dataset.py
    │   └── tensorize_breed_dataset.py
    ├── validation
    │   ├── Dockerfile
    │   ├── Pipfile
    │   ├── Pipfile.lock
    │   ├── cv_val.py
    │   ├── cv_val_sql.py
    │   ├── docker-shell.sh
    │   └── requirements.txt
    └── workflow
        ├── Dockerfile
        ├── Pipfile
        ├── Pipfile.lock
        ├── age_model_training.yaml
        ├── cli.py
        ├── data_preprocessing.yaml
        ├── docker-entrypoint.sh
        ├── docker-shell.sh
        ├── pipeline.yaml
        ├── secrets
        │   └── compute-service-account.json
        └── tensorizing.yaml

32 directories, 109 files
```
**Team Members** Nevil George, Juan Pablo Heusser, Curren Iyer, Annie Landefeld, Abhijit Pujare

**Group Name** EngAi Group

**Project** In this project, we aim to build an application that can predict a dog's breed and age using a photo.
In this milestone we worked on multiple aspects of the project:
(1) Deployment of the web service to GCP [/src/deployment/](src/deployment/)
(2) Frontend/React container [/src/frontend-react/](src/frontend-react/)
(3) API service [/src/api-service/](src/api-service/)
(4) Add model deployment to Vertex AI [/src/model-deployment/](src/model-deployment/)
(5) Switching from model pruning to knowledge distillation as our compression technique [/notebooks/DogNet_Breed_Distillation.ipynb](notebooks/DogNet_Breed_Distillation.ipynb)
You can find the Solutions Architecture and Technical Architecture diagrams below. The two diagrams detail how the various components of the system work together to classify dog images.
We used Ansible to automate the provisioning and deployment of our frontend and backend containers to GCP. Below you can find a screenshot of the VM that's running our service on GCP.
Additionally, you can find a screenshot showing the container images we have pushed to the GCP Container Registry:
## Deployment Container [/src/deployment/](src/deployment/)

This container builds the other containers, creates and provisions a GCP compute instance, and then deploys those containers to that instance.
If you wish to run the container locally:
- Navigate to `src/deployment` in your terminal
- Run `sh docker-shell.sh`
- Build and push the Docker containers to GCR (Google Container Registry) by running the following playbook:

  `ansible-playbook deploy-docker-images.yml -i inventory.yml`

- Create the compute instance (VM) server that will host the containers:

  `ansible-playbook deploy-create-instance.yml -i inventory.yml --extra-vars cluster_state=present`

- Provision the compute instance in GCP to set up all required software:

  `ansible-playbook deploy-provision-instance.yml -i inventory.yml`

- Install the Docker containers on the compute instance:

  `ansible-playbook deploy-setup-containers.yml -i inventory.yml`

- Set up the web server on the instance:

  `ansible-playbook deploy-setup-webserver.yml -i inventory.yml`
## Model Deployment [/src/model-deployment/](src/model-deployment/)

To finish out the model pipeline that powers the ML application, we added the final step of model deployment to the Vertex AI pipeline. This step uses a command-line interface to take the model from Weights & Biases, upload it to Google Cloud Storage, and deploy it to Vertex AI. With the final step in place, the end-to-end model development process, from data preprocessing to tensorizing to model training and now model deployment, is part of a unified pipeline.
To use just the model deployment service, first launch the service with `./docker-shell.sh` to get to the interpreter. Then:
- Upload the model from Weights & Biases to GCS:

  `python3 cli.py --upload`

- Deploy the model to Vertex AI:

  `python3 cli.py --deploy`
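For reference, the sketch below shows what these two steps could look like using the `wandb` and `google-cloud-aiplatform` SDKs. The artifact path, bucket name, project ID, and serving container image are illustrative assumptions, not the exact values used in `cli.py`.

```python
import os
import wandb
from google.cloud import storage, aiplatform

def upload_model():
    # Download the trained model artifact from Weights & Biases
    # (the artifact path below is hypothetical)
    artifact = wandb.Api().artifact("team-engai/dog-breed/distilled-student:latest")
    local_dir = artifact.download()

    # Copy the SavedModel files into a GCS bucket (bucket name is hypothetical)
    bucket = storage.Client().bucket("team-engai-models")
    for root, _, files in os.walk(local_dir):
        for name in files:
            path = os.path.join(root, name)
            blob = bucket.blob(f"distilled-student/{os.path.relpath(path, local_dir)}")
            blob.upload_from_filename(path)

def deploy_model():
    aiplatform.init(project="my-gcp-project", location="us-central1")
    # Register the model with Vertex AI using a prebuilt TF serving image
    model = aiplatform.Model.upload(
        display_name="dog-breed-distilled",
        artifact_uri="gs://team-engai-models/distilled-student",
        serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-12:latest",
    )
    # Create an endpoint and deploy the registered model to it
    model.deploy(machine_type="n1-standard-4")
```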
## Knowledge Distillation [/notebooks/DogNet_Breed_Distillation.ipynb](notebooks/DogNet_Breed_Distillation.ipynb)

In Milestone 4 we used model pruning as our compression technique, but we realized that distillation was more suitable for our application since most of the model's layers were not being trained. All of the code used to test different model combinations and distillation can be found in the notebook linked above.
We tested different base architectures for both the teacher and the student model.
With the ResNet152v2 base architecture we obtained a maximum validation accuracy of 82.5% on epoch 20. The model learned fairly quickly compared to other architectures, achieving a 68% validation accuracy on the first epoch.
This base architecture did not perform well on the dogs dataset, as we only achieved a 42.25% maximum validation accuracy on epoch 27.
Using the DenseNet201 architecture we achieved very good results for such a small model, yet its maximum validation accuracy of 81.9% was still lower than ResNet152v2's. The difference is minimal, but as a team we decided to use ResNet152v2 as our teacher model.
The ConvNeXtBase architecture did not perform well on the dataset. The training accuracy was around 84% by the end of the 30 epochs, while the validation accuracy was around just 24%, meaning the model was not generalizing well and was overfitting the training data.

Similar to the ConvNeXtBase architecture, this model did not generalize well and overfit the training data, achieving a maximum training accuracy of 87.7% and a maximum validation accuracy of 56.3%.

With the DenseNet121 base architecture we achieved a maximum validation accuracy of 71.6% by epoch 17. The model learned quickly in the early epochs, and although its accuracy was significantly lower than the teacher model's, its much smaller size made it a prime candidate to be the student in model distillation.
For model distillation we used the teacher model with the ResNet152v2 base architecture and built a new student model on the DenseNet121 architecture. Following the material covered in class, we implemented the distillation training loop and trained the student model by distilling from the teacher. We obtained a 92.6% validation accuracy on epoch 28, even greater than with the teacher model. Using distillation we managed to compress the teacher model 7.65x while achieving better validation accuracy.
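The distillation loop blends the hard-label loss on the student's predictions with a temperature-softened divergence against the teacher's outputs. Below is a minimal sketch in the style of the standard Keras distillation example (TF 2.x); the class layout, hyperparameters, and variable names are illustrative and do not reproduce the notebook's `Distiller` code exactly.

```python
import tensorflow as tf

class Distiller(tf.keras.Model):
    def __init__(self, student, teacher, temperature=3.0, alpha=0.1):
        super().__init__()
        self.student = student
        self.teacher = teacher
        self.temperature = temperature  # softens logits before comparing distributions
        self.alpha = alpha              # weight between hard-label and distillation losses

    def compile(self, optimizer, metrics, student_loss_fn, distillation_loss_fn):
        super().compile(optimizer=optimizer, metrics=metrics)
        self.student_loss_fn = student_loss_fn
        self.distillation_loss_fn = distillation_loss_fn

    def train_step(self, data):
        x, y = data
        # Teacher is frozen; only used to produce soft targets
        teacher_preds = self.teacher(x, training=False)
        with tf.GradientTape() as tape:
            student_preds = self.student(x, training=True)
            student_loss = self.student_loss_fn(y, student_preds)
            # KL divergence between temperature-softened distributions,
            # scaled by T^2 as in the standard formulation
            distillation_loss = self.distillation_loss_fn(
                tf.nn.softmax(teacher_preds / self.temperature, axis=1),
                tf.nn.softmax(student_preds / self.temperature, axis=1),
            ) * (self.temperature ** 2)
            loss = self.alpha * student_loss + (1 - self.alpha) * distillation_loss
        # Only the student's weights are updated
        grads = tape.gradient(loss, self.student.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.student.trainable_variables))
        self.compiled_metrics.update_state(y, student_preds)
        return {m.name: m.result() for m in self.metrics}

# Example wiring (models assumed to be built elsewhere in the notebook):
# distiller = Distiller(student=densenet121_student, teacher=resnet152v2_teacher)
# distiller.compile(
#     optimizer=tf.keras.optimizers.Adam(),
#     metrics=["accuracy"],
#     student_loss_fn=tf.keras.losses.CategoricalCrossentropy(),
#     distillation_loss_fn=tf.keras.losses.KLDivergence(),
# )
```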
This result is extremely positive, as the distilled student model achieved a better validation accuracy than the teacher model. Moreover, this model obtained a validation accuracy similar to top SOTA models for Fine-Grained Image Classification on the Stanford Dogs dataset
(https://paperswithcode.com/sota/fine-grained-image-classification-on-stanford-1).
The No. 1 model on this list, ViT-NeT, achieved a 93.6% accuracy on the same dataset. Our results would place our distilled student model in the top 10 of this list.
Below is a comparison table obtained from the ViT-NeT paper.
Source: Kim, S., Nam, J., & Ko, B. C. (2022). ViT-NeT: Interpretable Vision Transformers with Neural Tree Decoder. In Proceedings of the 39th International Conference on Machine Learning (PMLR 162). Baltimore, Maryland, USA.
## API Service [/src/api-service/](src/api-service/)

The `api-service` container provides two endpoints: the index endpoint and the `/predict` endpoint. The `/predict` endpoint is called from the frontend with an image to run model inference. The ModelController is responsible for calling either the local model (saved in the container) or the remote model (hosted on Vertex AI).
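Conceptually, the dispatch looks like the sketch below; class, parameter, and config names here are illustrative assumptions rather than the exact code in `api/model.py`. The remote path calls a Vertex AI endpoint, while the local path runs the Keras model bundled in the container.

```python
import tensorflow as tf
from google.cloud import aiplatform

class ModelController:
    def __init__(self, local_model_path, endpoint_id, project, location="us-central1"):
        # Local copy of the distilled model, shipped inside the container
        self.local_model = tf.keras.models.load_model(local_model_path)
        # Handle to the same model deployed on Vertex AI
        aiplatform.init(project=project, location=location)
        self.endpoint = aiplatform.Endpoint(endpoint_id)

    def predict(self, image_tensor, use_remote=False):
        """image_tensor: preprocessed batch of shape (1, H, W, 3)."""
        if use_remote:
            # Vertex AI expects JSON-serializable instances
            response = self.endpoint.predict(instances=image_tensor.numpy().tolist())
            return response.predictions[0]
        # Otherwise run inference in-process with the bundled model
        return self.local_model.predict(image_tensor).tolist()[0]
```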
We have three components in the `components` directory.

Footer contains the footer, which stores the history of the past five search results (just the predicted breed, not the probabilities).

ImageUpload contains the interface for uploading an image to the website, making a call to the model (depending on ModelToggle), returning the predicted breed and confidence level (probability), and storing that predicted breed in the Footer as part of the search history.

ModelToggle has a dropdown for the user to select either our hosted or local model. We included both to show the difference in response times; the model itself is the same, so accuracy is expected to be the same as well. The selection is passed from the dropdown as part of the formData argument that is read in DataService in the services section (see below).
We have two React files in the `services` directory.

BreedParse extracts the reader-friendly version of the predicted breed name so it can be displayed in the results section of ImageUpload and appended to the history of the past five results in the Footer.

DataService makes the server call to the API endpoint, selecting the right model depending on the selection in the ModelToggle component.
## GCP Bucket Structure

```
team-engai-dogs
├── dog_age_dataset/
│   ├── Expert_Train/
│   └── PetFinder_All/
├── dog_breed_dataset/
│   ├── annotations/
│   └── images/
└── dvc_store
```
We use the same structure for the tensorized data as well, in the bucket `team-engai-dogs-tensorized`.