<a href="https://colab.research.google.com/github/AlineEmmer/Machine-Learning/blob/main/Machine_learning_Diabetes_Dyslipidemia_Prediction1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## In this project, a database of infrared spectra from patients was analyzed. The groups are: group 0 = patients with mixed dyslipidemia (n = 100), group 1 = patients with diabetes (n = 100), group 2 = patients with hypercholesterolemia (n = 100), group 3 = patients with hypertriglyceridemia (n = 100), group 4 = healthy volunteers (n = 100), and group 5 = patients with pre-diabetes (n = 100). The goal is to develop a machine learning model that predicts diabetes and dyslipidemia and identifies biomarkers associated with these diseases, with a view to improving diagnosis.
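
The group codes listed above can be kept in a small lookup table so that counts and plots use readable names. This is a minimal sketch that assumes the target column stores the integer codes 0 to 5 exactly as described; `CLASS_NAMES` and `summarize_groups` are illustrative names that do not appear in the original notebook, and the labels are synthetic.

```python
from collections import Counter

# Group codes as described in the project summary; the dictionary name is an assumption.
CLASS_NAMES = {
    0: "Mixed dyslipidemia",
    1: "Diabetes",
    2: "Hypercholesterolemia",
    3: "Hypertriglyceridemia",
    4: "Healthy",
    5: "Pre-diabetes",
}


def summarize_groups(labels):
    """Return {class name: count} for an iterable of integer group codes."""
    counts = Counter(labels)
    return {CLASS_NAMES[code]: counts.get(code, 0) for code in sorted(CLASS_NAMES)}


# Synthetic labels that mimic the balanced design (100 samples per group).
example_labels = [code for code in CLASS_NAMES for _ in range(100)]
summary = summarize_groups(example_labels)
print(summary)
```

Once the dataset is loaded, the same function could be applied to the target column (for example `summarize_groups(df1["Classes"])`) to confirm that the groups are balanced.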

In [None]:
# Tasks to be performed
# Step 1: Import the database
# Step 2: Import the pandas library to help manipulate the database
# Step 3: Remove unnecessary columns
# Step 4: Install the PyCaret library (automated machine learning, AutoML)
# Step 5: Import the PyCaret library
# Step 6: Pre-process the data
# Step 7: Build and compare different models
# Step 8: Train the best model based on predictive performance metrics
# Step 9: Extract the metric results from the model
# Step 10: Draw conclusions about the model
# Step 11: Save the model to make predictions in real-world use (deployment)

In [None]:
# Step 1: Import the database
from google.colab import files
uploaded = files.upload()

In [None]:
# Step 2: Import the pandas library and load the database
import pandas as pd
df1 = pd.read_excel("Dataset.xlsx")
display(df1)
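
Step 3 of the task list (removing unnecessary columns) has no cell of its own in this notebook. The sketch below shows one way it could be done with `DataFrame.drop`. The column names `Patient_ID` and `Notes` are hypothetical, since the actual columns of `Dataset.xlsx` are not shown, and the values are made up.

```python
import pandas as pd


def drop_unneeded_columns(df, columns):
    """Return a copy of df without the listed columns; names that are absent are ignored."""
    return df.drop(columns=list(columns), errors="ignore")


# Tiny synthetic frame; "Patient_ID" and "Notes" are hypothetical column names.
demo = pd.DataFrame({
    "Patient_ID": [1, 2],
    "Notes": ["a", "b"],
    "1650": [0.12, 0.15],  # absorbance at one wavenumber (made-up values)
    "Classes": [0, 1],
})
cleaned = drop_unneeded_columns(demo, ["Patient_ID", "Notes", "Not_present"])
print(list(cleaned.columns))
```

Because `drop` returns a new DataFrame, the original frame stays intact and the cell can be re-run safely.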

In [None]:
# Step 4: Installing the PyCaret library (automated machine learning, AutoML)
!pip install pycaret

In [None]:
# Step 5: Importing the classification library from PyCaret
from pycaret import classification


In [None]:
# Step 6: Pre-processing the data
classification_setup = classification.setup(data = df1, target = "Classes")

In [None]:
# Step 7: Building and comparing different models
best = classification.compare_models()

In [None]:
# Step 8: Training the best model based on predictive performance metrics
# Note: The Extra Trees classifier ranked first in the comparison. Now we create and cross-validate this model.
best_model = classification.create_model("et")

In [None]:
# Step 9: Extracting the metrics results from the model
classification.evaluate_model(best_model)

In [None]:
# Step 10: Plotting the decision boundary
class_names = ["Dyslipidemia", "Diabetes", "Hypercholesterolemia",
               "Hypertriglyceridemia", "Healthy", "Pre-diabetes"]
# Display the plot inline
classification.plot_model(best_model, plot="boundary", plot_kwargs={"classes": class_names})
# Save a copy to disk (scale sets the resolution)
classification.plot_model(best_model, plot="boundary", plot_kwargs={"classes": class_names},
                          save=True, scale=3)


In [None]:
# Step 11: Plotting the learning curve
classification.plot_model(best_model, plot ="learning")
classification.plot_model(best_model, plot ="learning", save = True, scale = 3)

In [None]:
# Step 12: Plotting the ROC curves
class_names = ["Dyslipidemia", "Diabetes", "Hypercholesterolemia",
               "Hypertriglyceridemia", "Healthy", "Pre-diabetes"]
classification.plot_model(best_model, plot="auc", plot_kwargs={"classes": class_names})
classification.plot_model(best_model, plot="auc", plot_kwargs={"classes": class_names},
                          scale=6, save=True)

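For a problem with six classes, ROC curves are typically drawn one class at a time, treating that class as positive and all others as negative (one-vs-rest). The sketch below computes the area under such a curve directly from predicted probabilities, using the fact that the AUC equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The function names and the toy probabilities are illustrative assumptions, not output of the model above.

```python
def binary_auc(scores, is_positive):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, flag in zip(scores, is_positive) if flag]
    neg = [s for s, flag in zip(scores, is_positive) if not flag]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(probabilities, labels, target_class):
    """AUC for one class against all others, using that class's predicted probability."""
    scores = [row[target_class] for row in probabilities]
    flags = [label == target_class for label in labels]
    return binary_auc(scores, flags)


# Made-up probabilities for a 3-class toy problem (rows = samples, columns = classes).
toy_probabilities = [
    [0.70, 0.20, 0.10],
    [0.30, 0.45, 0.25],
    [0.20, 0.50, 0.30],
    [0.40, 0.40, 0.20],
    [0.10, 0.20, 0.70],
]
toy_labels = [0, 0, 1, 1, 2]
auc_class_0 = one_vs_rest_auc(toy_probabilities, toy_labels, target_class=0)
auc_class_2 = one_vs_rest_auc(toy_probabilities, toy_labels, target_class=2)
print(auc_class_0, auc_class_2)
```

The pairwise loop is quadratic in the number of samples, which is fine for a dataset of a few hundred spectra but would be replaced by a rank-based formula for larger data.
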
In [None]:
# Step 13: Plotting the confusion matrix
short_names = ["Dys", "DM", "Hchol", "Htrig", "Healthy", "Pre-DM"]
classification.plot_model(best_model, plot="confusion_matrix", plot_kwargs={"classes": short_names})
classification.plot_model(best_model, plot="confusion_matrix", plot_kwargs={"classes": short_names},
                          scale=6, save=True)

In [None]:
# Step 15: Comparing predicted and actual classes (class prediction error plot)
short_names = ["Dys", "DM", "Hchol", "Htrig", "Healthy", "Pre-DM"]
classification.plot_model(best_model, plot="error", plot_kwargs={"classes": short_names})
classification.plot_model(best_model, plot="error", plot_kwargs={"classes": short_names},
                          scale=3, save=True)

In [None]:
# Step 16: Figures of merit of the model (classification report)
short_names = ["Dys", "DM", "Hchol", "Htrig", "Healthy", "Pre-DM"]
classification.plot_model(best_model, plot="class_report", plot_kwargs={"classes": short_names})
classification.plot_model(best_model, plot="class_report", plot_kwargs={"classes": short_names},
                          scale=6, save=True)
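
The per-class figures of merit in a classification report (precision, recall, F1) can be derived from the confusion matrix plotted in Step 13. The sketch below shows that calculation in plain Python. The function name and the 3-class matrix are made up for illustration and are not results from this dataset.

```python
def per_class_metrics(matrix):
    """Compute precision, recall and F1 per class from a square confusion matrix.

    matrix[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(matrix)
    report = []
    for k in range(n):
        true_positives = matrix[k][k]
        predicted_as_k = sum(matrix[i][k] for i in range(n))  # column sum
        actually_k = sum(matrix[k])                           # row sum
        precision = true_positives / predicted_as_k if predicted_as_k else 0.0
        recall = true_positives / actually_k if actually_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report.append({"precision": precision, "recall": recall, "f1": f1})
    return report


# Made-up 3-class confusion matrix, for illustration only.
toy_matrix = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
toy_report = per_class_metrics(toy_matrix)
for class_index, row in enumerate(toy_report):
    print(class_index, row)
```

Reading the report together with the confusion matrix shows which pairs of groups are confused with each other, which a single accuracy value hides.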

In [None]:
# Step 17: Plotting only the top 10 most important wavenumbers
classification.plot_model(best_model, plot="feature")
classification.plot_model(best_model, plot="feature", scale=6, save=True)
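
To work with the most important wavenumbers as data rather than as a figure, the importances can be ranked directly. This is a minimal sketch with made-up wavenumbers and importance values; `top_k_features` is an illustrative helper, not a PyCaret function.

```python
def top_k_features(names, importances, k=10):
    """Return the k (name, importance) pairs with the largest importance, best first."""
    if len(names) != len(importances):
        raise ValueError("names and importances must have the same length")
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up wavenumbers (cm^-1) and importances, for illustration only.
demo_names = ["1740", "1650", "1540", "1240", "1080"]
demo_importances = [0.10, 0.35, 0.30, 0.05, 0.20]
top3 = top_k_features(demo_names, demo_importances, k=3)
print(top3)
```

For a fitted tree ensemble such as Extra Trees, the importances would come from the estimator's `feature_importances_` attribute. Matching them to column names depends on how the data were transformed during setup, so it is worth checking that both sequences have the same length.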

In [None]:
# Step 8: Training an alternative model for comparison
# Note: The Random Forest model was one of the top three. Now we create and validate it.
# It is stored in its own variable so that best_model is not overwritten here.
rf_model = classification.create_model("rf")

In [None]:
# Step 17: Plotting only the top 10 most important wavenumbers
classification.plot_model(rf_model, plot="feature")
classification.plot_model(rf_model, plot="feature", scale=3, save=True)

In [None]:
# Step 8: Training an alternative model for comparison
# Note: The Linear Discriminant Analysis (LDA) model was one of the top three. Now we create and validate it.
# It is stored in its own variable so that best_model is not overwritten here.
lda_model = classification.create_model("lda")

In [None]:
# Step 17: Plotting only the top 10 most important wavenumbers
classification.plot_model(lda_model, plot="feature")
classification.plot_model(lda_model, plot="feature", scale=3, save=True)

In [None]:
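# Step 11: Saving the model for predictions (deployment)
# best_model must reference the Extra Trees model from Step 8; re-run that cell first if it has been overwritten.
classification.save_model(best_model, "best_model_et")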
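
Once saved, the model can be reloaded in a later session to score new spectra. The sketch below wraps PyCaret's `load_model` and `predict_model` in a helper. The helper name `predict_with_saved_model` and its parameters are illustrative, and it assumes the new data have the same feature columns as the training set.

```python
def predict_with_saved_model(new_data, model_name="best_model_et"):
    """Load a pipeline written by save_model and return predictions for new_data.

    new_data: pandas DataFrame with the same feature columns used in training.
    model_name: file name passed to save_model, without the .pkl extension.
    """
    # Imported inside the function so that defining it does not require PyCaret.
    from pycaret.classification import load_model, predict_model

    pipeline = load_model(model_name)  # reads "<model_name>.pkl" from the working directory
    return predict_model(pipeline, data=new_data)
```

The returned DataFrame contains the input columns plus the predicted class and its score; the names of those added columns depend on the installed PyCaret version.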