Examples of use
We provide several examples to demonstrate the applicability of the tool to different domains. They read data from different data sources, apply several cleaning steps, and use diverse training algorithms and evaluation metrics.
The first two pipelines were developed to evaluate the tool's applicability: the Diabetes example uses a well-known toy dataset provided by the sklearn module (data read from a CSV file), and the Big Mart Sales example was developed to participate in a popular hackathon. The Iris Classification, Purchase Experience Score, and Image Classification pipelines were generated to replicate three existing ML pipelines hosted on GitHub.
Example | Domain | Cleanings | Training algorithm | Evaluation metrics | URL |
---|---|---|---|---|---|
Diabetes | Health | No cleaning | SVM | Accuracy | www4.stat.ncsu.edu/~boos/var.select/diabetes.html |
Big Mart Sales | Sales prediction | Replace nulls with mean and mode | Random Forest Regression | Accuracy | www.analyticsvidhya.com/datahack/contest/practice-problem-big-mart-sales-iii/ |
Iris classification | Botany | No cleaning | Random Forest Regression, SVM | Accuracy | www.github.com/Ernesto905/Zenml-Sentiment-Analysis-Pipeline - training_pipeline.py |
Purchase Experience Score | Sales Reviews | Replace Null by Median or Text | Linear Regression | Accuracy, MSE, MRSE, R2 | www.github.com/Akurati-Kaustiki/MLOPS-Assignment-Group17 |
Image Classification | Images | No cleaning | SVM | Confusion matrix, Precision, Recall, F1-score | [svm-image-classification](https://github.com/ahmdmohamedd/svm-image-classification/tree/main) |
To evaluate the quality of the generated code, we used two well-known tools, Pylint and Radon. Pylint provides a quality score from 0 to 10, where 0 is the worst and 10 the best. Radon computes the Cyclomatic complexity (CC) and Maintainability index (MI) metrics. CC measures how many independent paths exist in the code: the more branching logic (e.g., if, for, while, try), the higher the number and the harder the code is to understand, test, and maintain. MI is a composite score (from 0 to 100) that estimates how maintainable the code is; the higher the number, the better the maintainability. Radon complements each numeric value with a letter rank: from A (best) to F (worst) for CC, and from A (best) to C (worst) for MI.
The following table presents the results of the three code quality metrics for the provided examples generated using the MLSToolbox Code Generator. The generated code obtains good values for all three metrics: Pylint scores close to 10, Cyclomatic complexity close to 1, and a Maintainability index close to 100.
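To make the CC definition concrete, the following is a minimal sketch that approximates the metric with Python's standard `ast` module by counting decision points. It is an illustration only and not Radon's implementation: the helper names `approx_cyclomatic_complexity` and `cc_rank` are hypothetical, the set of counted node types is a simplifying assumption, and the rank thresholds follow the ranges given in Radon's documentation.

```python
import ast

# Node types that open an additional execution path. This is an
# approximation: Radon's own rules cover more cases (e.g. comprehensions).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def approx_cyclomatic_complexity(source: str) -> int:
    """Return 1 plus the number of decision points found in `source`."""
    complexity = 1  # a single straight-line path always exists
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a or b or c` contributes two extra paths
            complexity += len(node.values) - 1
    return complexity


def cc_rank(score: int) -> str:
    """Map a CC score to a letter rank (A best, F worst)."""
    for upper, rank in ((5, "A"), (10, "B"), (20, "C"), (30, "D"), (40, "E")):
        if score <= upper:
            return rank
    return "F"


SAMPLE = '''
def clean(values):
    result = []
    for v in values:
        if v is None or v < 0:
            continue
        result.append(v)
    return result
'''

score = approx_cyclomatic_complexity(SAMPLE)
print(score, cc_rank(score))  # prints: 4 A
```

The sample function has one loop, one `if`, and one `or`, so the sketch reports 1 + 3 = 4, which falls in rank A. Code with no branching at all scores 1, which is why values close to 1 indicate simple, linear code.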
Example | PyLint score | Cyclomatic complexity | Maintainability index |
---|---|---|---|
Diabetes | 9.25 | 1.40 (A) | 97.95 (A) |
Big Mart Sales | 9.03 | 1.41 (A) | 97.96 (A) |
Iris classification | 8.82 | 1.39 (A) | 97.62 (A) |
Purchase Experience Score | 8.94 | 1.39 (A) | 97.29 (A) |
Image Classification | 8.87 | 1.48 (A) | 93.36 (A) |
The following table provides the quality metric results for the example implementations that were not generated by the MLSToolbox Code Generator.
Example | Tool | PyLint score | Cyclomatic complexity | Maintainability index |
---|---|---|---|---|
Diabetes | Kubeflow | 4.29 | 1 (A) | 74.55 (A) |
Diabetes | ZenML | 7.53 | 1 (A) | 71.35 (A) |
Diabetes | scikit-learn | 3.57 | 1 (A) | 88.85 (A) |
Diabetes | -- | 0.55 | 2 (A) | 81.32 (A) |
Big Mart Sales | Kubeflow | 0 | 1 (A) | 61.77 (A) |
Big Mart Sales | ZenML | 5.49 | 2.5 (A) | 63.09 (A) |
Big Mart Sales | scikit-learn | 6.94 | 1.25 (A) | 100 (A) |
Big Mart Sales | -- | 0.55 | 1 (A) | 72.51 (A) |
Iris classification | ZenML | 0 | 1.97 (A) | 97.77 (A) |
Purchase Experience Score | -- | 5.66 | 3 (A) | 71.90 (A) |
Image Classification | -- | 7.1 | 2 (A) | 84.732 (A) |
The results consistently showed that MLSToolbox-generated code achieved the highest Pylint scores and a better MI than most other tools, highlighting its design focus on maintainability and evolution. The slightly higher CC can be attributed to a modular design that prioritizes reusability and extensibility.
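The summary above can be re-checked directly from the two tables. The following minimal sketch copies the reported scores and counts, for each metric, in how many of the 11 alternative implementations the generated code for the same example scores better. The helper name `count_wins` and the data layout are ours, introduced only for illustration.

```python
# Scores copied from the two tables above: (Pylint, CC, MI).
GENERATED = {
    "Diabetes": (9.25, 1.40, 97.95),
    "Big Mart Sales": (9.03, 1.41, 97.96),
    "Iris classification": (8.82, 1.39, 97.62),
    "Purchase Experience Score": (8.94, 1.39, 97.29),
    "Image Classification": (8.87, 1.48, 93.36),
}

# (example, tool, Pylint, CC, MI); "--" means no specific tool was used.
OTHERS = [
    ("Diabetes", "Kubeflow", 4.29, 1.0, 74.55),
    ("Diabetes", "ZenML", 7.53, 1.0, 71.35),
    ("Diabetes", "scikit-learn", 3.57, 1.0, 88.85),
    ("Diabetes", "--", 0.55, 2.0, 81.32),
    ("Big Mart Sales", "Kubeflow", 0.0, 1.0, 61.77),
    ("Big Mart Sales", "ZenML", 5.49, 2.5, 63.09),
    ("Big Mart Sales", "scikit-learn", 6.94, 1.25, 100.0),
    ("Big Mart Sales", "--", 0.55, 1.0, 72.51),
    ("Iris classification", "ZenML", 0.0, 1.97, 97.77),
    ("Purchase Experience Score", "--", 5.66, 3.0, 71.90),
    ("Image Classification", "--", 7.1, 2.0, 84.732),
]

PYLINT, CC, MI = 0, 1, 2


def count_wins(metric: int, higher_is_better: bool = True) -> int:
    """Count alternatives that the generated code beats on one metric."""
    wins = 0
    for example, _tool, *scores in OTHERS:
        ours = GENERATED[example][metric]
        theirs = scores[metric]
        better = ours > theirs if higher_is_better else ours < theirs
        wins += better
    return wins


total = len(OTHERS)
print(f"Pylint: better in {count_wins(PYLINT)} of {total}")                      # 11 of 11
print(f"MI:     better in {count_wins(MI)} of {total}")                          # 9 of 11
print(f"CC:     lower in {count_wins(CC, higher_is_better=False)} of {total}")   # 5 of 11
```

On these figures the generated code has the higher Pylint score in every comparison and the higher MI in 9 of 11, while its CC is lower in only 5 of 11, which matches the slightly higher CC noted above.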