ML infrastructure for Crayon on GCP, built on Vertex AI
Flow:
- Pushing to git triggers GitHub Actions to run tests and checks (not implemented)
- Merging into master triggers a new pipeline build (not implemented)
- Uploading a new dataset to GCS triggers a Cloud Function, which starts the pipeline (implemented)
- Pipeline flow (implemented):
  - Train the model
  - Upload the model to the Vertex AI Model Registry
  - Evaluate the model:
    - Evaluate the model on an eval dataset
    - Write the metrics for the current training run to Firestore
    - Pull the metrics of the current best model and compare
    - If the new model performs better -> deploy it to the endpoint OR follow the next step
  - (Optional) Split traffic between the currently deployed model and the newly trained one
  - Gather data about their performance and evaluate next steps
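The compare-and-deploy decision at the end of the pipeline flow can be sketched roughly as below. This is only an illustration, not the pipeline's actual code: the `ModelMetrics` class, the `choose_action` function, the use of accuracy as the comparison metric, and the `canary_percent` parameter are all hypothetical names and assumptions. Reading and writing metrics in Firestore and the actual Vertex AI deployment calls are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelMetrics:
    model_id: str
    accuracy: float  # assumed metric, higher is better


def choose_action(
    new: ModelMetrics,
    best: Optional[ModelMetrics],
    canary_percent: int = 0,
) -> dict:
    """Decide what to do with a newly trained model.

    Returns a dict with an "action" and the intended traffic
    percentages per model id (hypothetical output format).
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")

    # The new model is not better: keep serving the current best model.
    if best is not None and new.accuracy <= best.accuracy:
        return {"action": "keep_current", "traffic": {best.model_id: 100}}

    # No previous model, or no canary requested: deploy the new model fully.
    if best is None or canary_percent == 0:
        return {"action": "deploy", "traffic": {new.model_id: 100}}

    # Optional step: split traffic between the current and the new model.
    return {
        "action": "split",
        "traffic": {
            best.model_id: 100 - canary_percent,
            new.model_id: canary_percent,
        },
    }
```

For example, `choose_action(ModelMetrics("m2", 0.9), ModelMetrics("m1", 0.8), canary_percent=20)` would send 20% of traffic to the new model and keep 80% on the current one, which corresponds to the optional traffic-split step.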
Project structure:
CloudFuncPipelines - files required to set up a Cloud Function on GCP that fires when a file is uploaded to the bucket and then starts the pipeline
sklearn-training - a package for custom-container training on Vertex AI
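As a rough idea of what the function in CloudFuncPipelines might look like, here is a minimal sketch of a GCS-triggered background Cloud Function that submits a Vertex AI pipeline run. It is not the repository's actual code. The project id, region, bucket paths, the `datasets/` prefix, the `dataset_uri` pipeline parameter, and the function names are all placeholder assumptions, and the event is assumed to carry the `bucket` and `name` fields of a storage "object finalized" event.

```python
from typing import Optional

# Hypothetical settings; replace with the project's own values.
PROJECT_ID = "my-project"
REGION = "us-central1"
PIPELINE_TEMPLATE = "gs://my-bucket/pipeline.json"
PIPELINE_ROOT = "gs://my-bucket/pipeline-root"
DATASET_PREFIX = "datasets/"  # assumed location of uploaded datasets


def pipeline_params(event: dict) -> Optional[dict]:
    """Map a GCS upload event to pipeline parameters, or None to skip it."""
    bucket = event.get("bucket", "")
    name = event.get("name", "")
    # Ignore uploads outside the dataset prefix and folder placeholder objects.
    if not bucket or not name.startswith(DATASET_PREFIX) or name.endswith("/"):
        return None
    return {"dataset_uri": f"gs://{bucket}/{name}"}


def trigger_pipeline(event: dict, context=None) -> None:
    """Entry point: start the training pipeline for a newly uploaded dataset."""
    params = pipeline_params(event)
    if params is None:
        return

    # Imported lazily so pipeline_params can be tested without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=PROJECT_ID, location=REGION)
    job = aiplatform.PipelineJob(
        display_name="training-pipeline",
        template_path=PIPELINE_TEMPLATE,
        pipeline_root=PIPELINE_ROOT,
        parameter_values=params,
    )
    job.submit()  # returns without waiting for the pipeline to finish
```

Keeping the event-to-parameters mapping in its own function makes the filtering logic easy to unit test, while the Vertex AI call stays a thin wrapper around it.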