- Implemented classification algorithms, including decision tree, K-nearest neighbors (KNN), logistic regression and random forest, and investigated how well they classify liver patients in a clinical data set.
- Experimented with feature subsets and identified the best features for each algorithm.
- Normalized the data using two methods: Min-Max and z-score.
- Performed N-fold cross-validation on the data set.
- Compared the precision, recall and F-score of the algorithms.
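The following sketch illustrates one of the classifiers listed above: K-nearest neighbors in plain Python, using only the standard library. It is not the project's code. The helper name `knn_predict`, the Euclidean distance and the default `k=5` are assumptions. The features are also assumed to be numeric and already normalized, because KNN is distance-based and attributes with large numeric ranges would otherwise dominate the distance.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=5):
    """Predict the label of `query` by majority vote among its k nearest
    training points. Hypothetical helper; Euclidean distance is assumed."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Pair every training label with its distance to the query point.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest points (sorted() is stable, so ties keep input order).
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    # Majority vote; equal counts are resolved by first appearance among the neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up, already-normalized 2-D points (not taken from the data set).
    X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.85, 0.95]]
    y = [0, 0, 1, 1, 1]
    print(knn_predict(X, y, [0.15, 0.15], k=3))
```

For odd `k` in a two-class problem a tie cannot occur. With the full data set, `k` would normally be tuned, for example by cross-validation.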
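The README does not say how the best features were identified. One common approach that matches "best features for different algorithms" is greedy forward selection. This is a wrapper method that scores each candidate subset with the classifier itself, so the chosen subset can differ per algorithm. The sketch below shows that technique, not necessarily the one used in this project. `forward_selection` and its `score` callback are hypothetical. In practice, `score` might return the mean cross-validated F-score of one classifier trained on the given feature subset.

```python
def forward_selection(features, score, max_features=None):
    """Greedy forward feature selection (wrapper method).

    `score(subset)` is a caller-supplied, higher-is-better function of a list
    of feature names (hypothetical). Returns (selected_features, best_score).
    """
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Score every one-feature extension of the current subset.
        candidates = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidates, key=lambda c: c[0])
        if top_score <= best_score:
            break  # no single added feature improves the score, so stop
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


if __name__ == "__main__":
    # Toy scoring function with made-up weights, purely for illustration.
    weights = {"TB": 0.3, "ALB": 0.2}

    def toy_score(subset):
        return sum(weights.get(f, 0.0) for f in subset) - 0.01 * len(subset)

    print(forward_selection(["Age", "TB", "ALB", "Sgpt"], toy_score))
```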
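The two normalization methods mentioned above rescale each numeric attribute separately. Min-Max maps x to (x - min) / (max - min), which gives values in [0, 1]. The z-score method maps x to (x - mean) / standard deviation. The following is a minimal standard-library sketch. The function names are hypothetical, and using the population standard deviation (`statistics.pstdev`) is an assumed choice. Constant columns are mapped to 0 to avoid dividing by zero.

```python
import statistics


def min_max_normalize(values, reference=None):
    """Rescale values with (x - min) / (max - min); min/max come from `reference`
    (defaults to `values` itself)."""
    ref = values if reference is None else reference
    lo, hi = min(ref), max(ref)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # constant reference column
    return [(v - lo) / span for v in values]


def z_score_normalize(values, reference=None):
    """Standardize values with (x - mean) / std, using the population standard
    deviation of `reference` (defaults to `values` itself)."""
    ref = values if reference is None else reference
    mean = statistics.fmean(ref)
    std = statistics.pstdev(ref)
    if std == 0:
        return [0.0 for _ in values]  # constant reference column
    return [(v - mean) / std for v in values]


if __name__ == "__main__":
    column = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up values
    print(min_max_normalize(column))
    print(z_score_normalize(column))
```

The optional `reference` argument lets the statistics come from a different list than the one being transformed. During cross-validation, you can compute min/max or mean/std on the training fold only and apply them to the test fold. This keeps information from the test data out of training. With Min-Max, test values can then fall slightly outside [0, 1].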
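N-fold cross-validation splits the records into N folds. Each fold serves once as the test set while the remaining N - 1 folds form the training set, and the N scores are averaged. The README does not state N, so the sketch below assumes N = 10 and a fixed shuffle seed. `k_fold_indices` is a hypothetical helper that only produces index splits; training and scoring are left to the caller.

```python
import random


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_indices, test_indices) for each of n_folds folds.

    Hypothetical helper: indices are shuffled once with a fixed seed, then cut
    into n_folds contiguous folds whose sizes differ by at most one.
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds take one additional record.
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


if __name__ == "__main__":
    # 583 records, as in this data set; N = 10 is an assumption.
    for train, test in k_fold_indices(583, n_folds=10):
        print(len(train), len(test))
```

With 583 records and 10 folds, three folds get 59 records and seven get 58. The classes are imbalanced (416 vs 167), so a common alternative is a stratified split that keeps roughly the same class ratio in every fold. This sketch does not stratify.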
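Precision, recall and F-score are computed from counts of true positives (TP), false positives (FP) and false negatives (FN):

- precision = TP / (TP + FP)
- recall = TP / (TP + FN)
- F-score, taken here to be F1, is the harmonic mean of the two: 2PR / (P + R)

The sketch treats label 1 as the positive class, assumed to mean "liver patient". It returns 0.0 when a denominator is zero. Both choices are illustrative, and `precision_recall_f1` is a hypothetical name.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up labels: 1 = liver patient (assumed), 0 = not.
    y_true = [1, 1, 1, 0, 0, 1]
    y_pred = [1, 0, 1, 1, 1, 1]
    print(precision_recall_f1(y_true, y_pred))
```

In this made-up example, TP = 3, FP = 2 and FN = 1, so precision is 3/5 and recall is 3/4.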

This data set contains 10 attributes (age, gender, total bilirubin, direct bilirubin, total proteins, albumin, A/G ratio, SGPT, SGOT and Alkphos) plus a class label, Selector.

Data Set Characteristics | Number of Instances | Area | Attribute Characteristics | Number of Attributes | Date Donated | Associated Tasks |
---|---|---|---|---|---|---|
Multivariate | 583 | Life | Integer, Real | 10 | 2012-05-21 | Classification |

The data set contains 583 records: 416 from liver patients and 167 from non-liver patients. By gender, 441 records are from male patients and 142 are from female patients. The data were collected in northeast Andhra Pradesh, India. The class label, Selector, divides the records into two groups (liver patient or not).

Any patient older than 89 is listed with an age of 90. Each record contains the following fields:

- Age: Age of the patient
- Gender: Gender of the patient
- TB: Total Bilirubin
- DB: Direct Bilirubin
- Alkphos: Alkaline Phosphatase
- Sgpt: Alanine Aminotransferase
- Sgot: Aspartate Aminotransferase
- TP: Total Proteins
- ALB: Albumin
- A/G Ratio: Albumin to Globulin Ratio
- Selector: Class label used to split the data into two sets (labeled by experts)
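Before normalization or classification, each record must be turned into a numeric feature vector and a binary label. The sketch below makes several assumptions that the README does not state:

- Fields appear in the order listed above.
- Gender is written as "Male" or "Female".
- Selector is 1 for a liver patient and 2 otherwise.
- An empty numeric field means a missing value.

`parse_record` and `COLUMNS` are hypothetical names, the sketch expects data rows only (no header), and the two sample rows are made up.

```python
import csv
import io

# Column order assumed to follow the field list above.
COLUMNS = ["Age", "Gender", "TB", "DB", "Alkphos", "Sgpt", "Sgot",
           "TP", "ALB", "A/G Ratio", "Selector"]


def parse_record(row):
    """Turn one raw CSV row into (features, label).

    Assumptions: Gender is "Male"/"Female", Selector is "1" for a liver patient
    and "2" otherwise, and an empty numeric field is returned as None.
    """
    record = dict(zip(COLUMNS, (field.strip() for field in row)))
    features = []
    for name in COLUMNS[:-1]:
        value = record[name]
        if name == "Gender":
            features.append(1.0 if value == "Male" else 0.0)  # binary encoding
        else:
            features.append(float(value) if value else None)  # None = missing
    label = 1 if record["Selector"] == "1" else 0  # 1 = liver patient (assumed)
    return features, label


if __name__ == "__main__":
    # Two made-up rows in the assumed format (not taken from the data set).
    sample = ("45,Female,0.9,0.2,180,25,30,7.0,3.5,1.0,1\n"
              "60,Male,1.1,0.3,200,40,35,6.5,3.2,,2\n")
    for row in csv.reader(io.StringIO(sample)):
        print(parse_record(row))
```

Missing values (None) would still need to be imputed or dropped before the other steps.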