Quality assessment form

This file contains the questions used to initially assess the technical gaps of ML models.

  1. Provide the model name(s) to be assessed with the information in this form.

  2. Is the model part of a model family? A model is part of a family if it shares the same quality attributes (e.g. codebase, monitoring, retraining) as the other models in the family. If so, please provide the name of the model family:

  3. Link to the source code in git (if there are multiple codebases, please provide all the links).

  4. Link to model documentation

  5. Is the model deployed? If so, please provide the link to the model in the ML registry.

  6. Is the model used in production?

  7. Please provide the link to the latest A/B experiment.

  8. What is (roughly) the average daily number of requests that the model receives?

  9. Are there other teams or departments besides yours relying on this model?

  10. What might be the consequences of disabling the model in production? Does this pose an existential risk to the Booking.com business?

  11. Is the model being tested in an experiment (provide relevant link)?

  12. Has the model been compared against a simple, low-cost, non-ML baseline? What is the relative improvement achieved? Please provide analysis supporting the claim (a baseline-comparison sketch follows this list).

  13. How many hours are needed for training? How many workflows are needed for the whole ML lifecycle?

  14. How easy is it to roll back to a previous model version in production? How can you achieve that?

  15. How frequently have ML system failures happened in the model’s lifetime?

  16. How frequently is the model retrained? Please provide a link to the retraining pipeline, if any.

  17. Does the model's source code (i.e. the code used for producing training data, model training, model evaluation, etc.) have automated tests (a test sketch follows this list)? What is the test coverage?

  18. How can you repeat the ML lifecycle (to deploy a new model version for example)? Is there any automation? What are the manual steps involved?

  19. Which indicators are being monitored after the model is rolled out (e.g. model performance, feature drift, feature parity, business metrics, distributions of inputs and outputs, etc.)? Please provide a link to the relevant dashboard(s) (a drift-check sketch follows this list).

  20. What are the latency and throughput requirements for the ML system? Are they met?

  21. Which metadata and artifacts are logged during the ML lifecycle (e.g. datasets, hyper-parameters, evaluation metrics, model binary, etc.) and are accessible by everyone in the team? How is this being done (a metadata-logging sketch follows this list)?

  22. Is the output of the model stored in a table for consumption? If yes, please provide a link to the table.

  23. Did you use any explainability methods on your model? If yes, which ones?

  24. Is the model checked for any undesired biases (for more info see go/ml-fairness)?

  25. Are there any standards to be met, such as PII compliance? How are they met?

  26. Do you perform input data validation (e.g. check for unexpected feature statistics, nulls and counts of input datasets, etc.)? If yes, what do you check and how (a validation sketch follows this list)?

  27. Does the model consume user-generated data or data from external sources (e.g. publicly available datasets or any dataset downloaded from outside the company ecosystem)?

  28. Do you filter out bots to reduce the likelihood of the datasets being intentionally tampered with? If yes, how (a bot-filtering sketch follows this list)?
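
The following sketch illustrates the kind of baseline comparison question 12 asks about. It is a minimal example assuming a scikit-learn workflow; the synthetic dataset, the AUC metric and the logistic-regression model are placeholders for your own setup.

```python
# Question 12: compare the candidate model against a simple non-ML baseline
# and report the relative improvement. Everything below is synthetic.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Low-cost baseline: predict the class prior, ignoring all features.
baseline = DummyClassifier(strategy="prior").fit(X_train, y_train)
model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)

auc_baseline = roc_auc_score(y_test, baseline.predict_proba(X_test)[:, 1])
auc_model = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(f"baseline AUC={auc_baseline:.3f}, model AUC={auc_model:.3f}, "
      f"relative improvement={(auc_model - auc_baseline) / auc_baseline:.1%}")
```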
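
For question 17, here is a sketch of an automated test over ML pipeline code, assuming pandas and pytest; `build_features` and its columns are hypothetical stand-ins for your own feature code.

```python
# Question 17: automated tests for the code that produces training data.
import pandas as pd


def build_features(bookings: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical feature step: derive length_of_stay and clean it."""
    out = bookings.copy()
    out["length_of_stay"] = (out["checkout"] - out["checkin"]).dt.days
    out["length_of_stay"] = out["length_of_stay"].fillna(0).clip(lower=0)
    return out


def test_build_features_schema_and_ranges():
    raw = pd.DataFrame({
        "checkin": pd.to_datetime(["2024-01-01", "2024-01-10", None]),
        "checkout": pd.to_datetime(["2024-01-03", "2024-01-09", "2024-01-05"]),
    })
    features = build_features(raw)
    assert "length_of_stay" in features.columns      # expected schema
    assert features["length_of_stay"].notna().all()  # no nulls after cleaning
    assert (features["length_of_stay"] >= 0).all()   # no negative stays
```

Coverage for tests like this can be measured with a standard tool such as pytest-cov.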
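
For question 19, one possible drift indicator is sketched below: the Population Stability Index between a feature's training-time and production distributions, computed with NumPy. The samples and the 0.2 threshold are illustrative.

```python
# Question 19: a simple feature-drift indicator (Population Stability Index).
import numpy as np


def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """PSI between two 1-D samples; values above ~0.2 are often treated as drift."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    # Clip empty bins to avoid log(0) and division by zero.
    e_frac = np.clip(e_frac, 1e-6, None)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))


rng = np.random.default_rng(0)
train_sample = rng.normal(0.0, 1.0, 10_000)  # feature at training time
live_sample = rng.normal(0.3, 1.0, 10_000)   # same feature in production
print(f"PSI = {psi(train_sample, live_sample):.3f}")
```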
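
For question 21, a sketch of lifecycle metadata and artifact logging. MLflow is used here only as an example tracker; the parameter names, metric values and table name are placeholders.

```python
# Question 21: log hyper-parameters, dataset pointers, metrics and artifacts
# to a shared tracker so the whole team can find them.
from pathlib import Path

import mlflow

# Placeholder evaluation report so the artifact call has a real file to upload.
report = Path("evaluation_report.txt")
report.write_text("AUC validation: 0.873\n")

with mlflow.start_run(run_name="weekly-retrain"):
    mlflow.log_param("learning_rate", 0.05)                         # hyper-parameter
    mlflow.log_param("training_table", "features.bookings_weekly")  # hypothetical dataset pointer
    mlflow.log_metric("auc_validation", 0.873)                      # evaluation metric
    mlflow.log_artifact(str(report))                                # report, model binary, etc.
```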
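
For question 26, a minimal input-validation sketch using pandas; the expected columns, row-count threshold and value ranges are placeholders for your own checks.

```python
# Question 26: validate input data before training or scoring.
import pandas as pd

EXPECTED_COLUMNS = {"user_id", "country", "price", "length_of_stay"}
MIN_ROWS = 1_000


def validate_inputs(df: pd.DataFrame) -> None:
    missing = EXPECTED_COLUMNS - set(df.columns)
    assert not missing, f"missing columns: {missing}"                  # schema
    assert len(df) >= MIN_ROWS, f"too few rows: {len(df)}"             # counts
    assert df["user_id"].notna().all(), "null user_id values"          # nulls
    assert (df["price"] >= 0).all(), "negative prices"                 # feature statistics
    assert df["length_of_stay"].between(0, 365).all(), "implausible length_of_stay"


frame = pd.DataFrame({
    "user_id": range(2_000),
    "country": "NL",
    "price": 120.0,
    "length_of_stay": 2,
})
validate_inputs(frame)  # raises AssertionError if any check fails
```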
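
For question 28, a sketch of a simple bot filter over raw event data using pandas; the user-agent patterns, the `requests_per_minute` column and the rate threshold are illustrative heuristics, not a prescribed method.

```python
# Question 28: drop likely bot traffic before building datasets.
import pandas as pd

BOT_UA_PATTERN = r"(?i)bot|crawler|spider|headless"


def drop_likely_bots(events: pd.DataFrame) -> pd.DataFrame:
    is_bot_ua = events["user_agent"].str.contains(BOT_UA_PATTERN, na=False)
    is_hyperactive = events["requests_per_minute"] > 100  # crude rate heuristic
    return events[~(is_bot_ua | is_hyperactive)]


events = pd.DataFrame({
    "user_agent": ["Mozilla/5.0", "Googlebot/2.1", "Mozilla/5.0", None],
    "requests_per_minute": [3, 12, 400, 5],
})
print(drop_likely_bots(events))  # keeps only the first and last rows
```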