The dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females of Pima Indian heritage who are at least 21 years old.
Objective: We will try to build a machine learning model that accurately predicts whether or not each patient in the dataset has diabetes.
Data: The dataset consists of several medical predictor variables and one target variable, Outcome (listed as Target below). The predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on.
- A: Number of pregnancies
- B: Plasma glucose concentration 2 hours into an oral glucose tolerance test
- C: Diastolic blood pressure (mm Hg)
- D: Triceps skin fold thickness (mm)
- E: Serum insulin concentration 2 hours into the test (μU/ml)
- F: Body mass index, i.e. weight in kg divided by the square of height in m (kg/m^2)
- G: Diabetes pedigree function, a score of the likelihood of diabetes based on family history
- H: Age (years)
- Target: 0 = no diabetes, 1 = diabetes
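A minimal sketch of reading the data into feature rows and labels using Python's standard `csv` module. It assumes a CSV file whose header row uses exactly the column labels listed above (A through H and Target); the function name `load_features_and_target` is hypothetical and not part of the original notebook. It also checks that every Target value is 0 or 1, matching the description above.

```python
import csv
from typing import IO, List, Tuple

# Assumption: the CSV header uses exactly these column labels.
FEATURE_COLUMNS = ["A", "B", "C", "D", "E", "F", "G", "H"]
TARGET_COLUMN = "Target"


def load_features_and_target(f: IO[str]) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical helper: read a CSV with columns A-H and Target, return (X, y)."""
    reader = csv.DictReader(f)
    header = reader.fieldnames or []
    # Fail early if any expected column is absent from the header row.
    missing = [c for c in FEATURE_COLUMNS + [TARGET_COLUMN] if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    X: List[List[float]] = []
    y: List[int] = []
    for row in reader:
        # Predictors A-H are parsed as floats (G and F are non-integer values).
        X.append([float(row[c]) for c in FEATURE_COLUMNS])
        # The target is binary: 0 = no diabetes, 1 = diabetes.
        label = int(float(row[TARGET_COLUMN]))
        if label not in (0, 1):
            raise ValueError(f"Target must be 0 or 1, got {label}")
        y.append(label)
    return X, y
```

For example, passing a handle from `open("diabetes.csv", newline="")` (the file name is an assumption) returns a list of eight-value feature rows and a parallel list of 0/1 labels. The standard library is used only to keep the sketch dependency-free; in a notebook, `pandas.read_csv` would be the more common choice.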