Diabetes prediction

Lidia edited this page May 14, 2025 · 8 revisions

This page shows an example of using the MLSToolbox Code Generator tool to define an ML pipeline graphically and generate the corresponding Python code. The pipeline trains a model for diabetes prediction using the support vector machine (SVM) supervised learning algorithm from the scikit-learn library. The example uses the Diabetes dataset.

Dataset information

The dataset has 442 samples and 11 columns. The first ten columns describe patient characteristics (age, sex, body mass index, average blood pressure, and six blood serum measurements), while the last column is the target: a measure of diabetes progression over one year.
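The shape described above can be checked with a short script. This is a minimal sketch, assuming the Diabetes dataset is the one bundled with scikit-learn (its sample count matches) and that pandas is installed; the variable names are illustrative and are not taken from the generated code.

```python
from sklearn.datasets import load_diabetes

# Assumption: the example uses scikit-learn's bundled Diabetes dataset.
# as_frame=True returns the features and the target in one DataFrame.
frame = load_diabetes(as_frame=True).frame

print(frame.shape)          # number of samples and columns
print(list(frame.columns))  # feature columns followed by the target column
```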

The pipeline

Main editor

The pipeline is composed of the following stages:

  • Data Collection: loads the data from the Diabetes dataset.
  • Feature Engineering: splits the columns into the features used for training and the truth (target) values. Data cleaning is not required.
  • Model Training: splits the features into features_train and features_test, and the truth values into truth_train and truth_test. It then trains the model on features_train and truth_train, and outputs features_test, truth_test and diabetes_model for use in the next stage.
  • Model Evaluation: uses diabetes_model to make predictions from features_test and compares them with truth_test to calculate the accuracy metric.
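The four stages can be sketched by hand in plain scikit-learn. This is not the code produced by the MLSToolbox Code Generator; it is a minimal illustration under several assumptions: the data comes from scikit-learn's bundled Diabetes dataset, the SVM estimator is SVR (chosen here because the target is continuous), the split ratio and random seed are arbitrary, and the R² score stands in for the accuracy metric named above.

```python
from sklearn.datasets import load_diabetes
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Data Collection: load the dataset (assumed: scikit-learn's bundled copy).
frame = load_diabetes(as_frame=True).frame

# Feature Engineering: separate the feature columns from the truth column.
features = frame.drop(columns="target")
truth = frame["target"]

# Model Training: split the data, then fit on the training part only.
# test_size and random_state are illustrative choices, not from the source.
features_train, features_test, truth_train, truth_test = train_test_split(
    features, truth, test_size=0.2, random_state=0
)
diabetes_model = SVR()  # assumption: the pipeline's SVM variant may differ
diabetes_model.fit(features_train, truth_train)

# Model Evaluation: predict on the held-out features and compare with truth.
predictions = diabetes_model.predict(features_test)
score = r2_score(truth_test, predictions)  # stand-in for the page's metric
print(f"R^2 on the test split: {score:.3f}")
```

Keeping the names features_train, features_test, truth_train, truth_test and diabetes_model makes it easy to map each line back to the stage that produces or consumes it.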

All the information required to reproduce the diabetes example and the generated code of the ML pipeline can be found on the pipeline examples repository.