• Built predictive machine learning models (logistic regression, random forest, SVM, gradient boosting, neural networks) using Python and scikit-learn to classify diabetes risk from clinical data.
• Processed and analyzed a dataset of 520 observations and 16 clinical features, performing data preprocessing, categorical encoding, and feature scaling.
• Applied Recursive Feature Elimination (RFE) and statistical analysis (correlation, chi-square, t-tests) to identify key predictors.
• Evaluated model performance using accuracy, precision, recall, F1-score, and AUC-ROC, achieving 98.08% accuracy and an AUC of 0.995 with a neural network model.
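The workflow in the bullets above (encoding, scaling, RFE, then scoring on held-out data) can be sketched roughly as follows. This is a minimal illustration, not the original project code: the data is synthetic, the column layout (one numeric age column plus 15 Yes/No columns) is hypothetical, logistic regression stands in for the neural network, and keeping 8 features in RFE is an arbitrary choice. It will not reproduce the reported scores.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the clinical data (assumption: 1 numeric + 15 Yes/No features).
rng = np.random.default_rng(0)
n_rows, n_symptoms = 520, 15
age = rng.integers(20, 80, size=n_rows).astype(float)
symptom_labels = rng.choice(["No", "Yes"], size=(n_rows, n_symptoms))

# Categorical encoding: map Yes/No labels to 1/0.
symptoms = (symptom_labels == "Yes").astype(float)
X = np.column_stack([age, symptoms])

# Synthetic target driven by three of the symptom columns.
logits = 2.5 * symptoms[:, 0] + 2.0 * symptoms[:, 1] - 1.5 * symptoms[:, 2] - 1.0
y = (rng.random(n_rows) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

pipeline = Pipeline([
    # Feature scaling: standardize the age column, pass the 0/1 columns through.
    ("scale", ColumnTransformer([("age", StandardScaler(), [0])],
                                remainder="passthrough")),
    # Recursive Feature Elimination down to 8 features (arbitrary choice).
    ("select", RFE(LogisticRegression(max_iter=1000), n_features_to_select=8)),
    # Final classifier (logistic regression used here in place of a neural network).
    ("model", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X_train, y_train)

y_pred = pipeline.predict(X_test)
y_score = pipeline.predict_proba(X_test)[:, 1]
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
    "auc_roc": roc_auc_score(y_test, y_score),
}
n_selected = int(pipeline.named_steps["select"].support_.sum())
print(metrics, n_selected)
```

Fitting the scaler and RFE inside the pipeline keeps both steps restricted to the training split, so the held-out metrics are not influenced by test data.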
The DIABETES data sets in this directory are provided for use in 1994 AI in Medicine symposium submissions. Permission is granted to use the data sets for other research purposes as long as appropriate credit is given as to the source (AIM-94 data set provided by Michael Kahn, MD, PhD, Washington University, St. Louis, MO).
- Data-Codes: a listing of the codes used in the data sets.
- Domain-Description: This file describes the basic physiology and pathophysiology of diabetes mellitus and its treatment.
- data-[01-70]: data sets covering several weeks' to months' worth of outpatient care on 70 patients. An additional 10 sets will be made available two weeks prior to the symposium for interested parties. Please contact the organizers if you would like to obtain these data sets.
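As a rough sketch of how one of the patient files might be read: the record layout is not specified in this notice, so the code below assumes each line holds four tab-separated fields (date as MM-DD-YYYY, time as HH:MM, a numeric code from Data-Codes, and a value). The sample rows and the function name `parse_records` are hypothetical, made up for illustration only.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows; assumed layout is date<TAB>time<TAB>code<TAB>value.
SAMPLE = (
    "04-21-1991\t9:09\t58\t100\n"
    "04-21-1991\t9:09\t33\t9\n"
    "04-21-1991\t17:08\t62\t119\n"
)


def parse_records(text):
    """Parse records into (timestamp, code, value) tuples, skipping malformed lines."""
    records = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 4:
            continue  # not a four-field record
        date_s, time_s, code_s, value_s = parts
        try:
            timestamp = datetime.strptime(f"{date_s} {time_s}", "%m-%d-%Y %H:%M")
            records.append((timestamp, int(code_s), float(value_s)))
        except ValueError:
            continue  # unparseable date, time, code, or value
    return records


records = parse_records(SAMPLE)

# Group values by event code; the meaning of each code is listed in Data-Codes.
values_by_code = defaultdict(list)
for _, code, value in records:
    values_by_code[code].append(value)

print(len(records), dict(values_by_code))
```

Skipping malformed lines rather than raising is a design choice for exploratory use; a stricter loader would log or count the rejected lines.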
You do not need to use all the data in order to participate. Use any subset of the available data from either the ICU data set or the diabetes data set. Furthermore, do not feel constrained if your methods cannot be applied directly to these data sets. We will consider submissions on related (i.e., clinical data interpretation) topics. If in doubt, consult with us via e-mail at aim-94@camis.stanford.edu.
We realize that an accurate interpretation of clinical data requires
a thorough understanding of the physiological principles and clinical
issues involved. We also realize that many AIM researchers do not
have convenient access to medical expertise, and that a symposium
focusing on a clinical theme may catch several parties at a disadvantage.
Conversely, some clinical researchers may be interested in participating
but may not have collaborators on the computer science end of the field.
To offset such disadvantages, we will provide a simple 'Matchmaker'
service for AIM-94. The purpose of this service is to establish a medium
by which researchers can seek collaborators of complementary background
and interests for AIM-94 participation and beyond.
If you are interested in participating in this program, send a one-paragraph description of your background, research interests, and the type of collaboration you are pursuing to aim-94@camis.stanford.edu by September 20th. We will collate these entries and distribute the whole list to all participants of the program. It will be the participants' responsibility to contact others to discuss and establish collaborative efforts; AIM-94 organizers will act solely as mediators.