ILPD-Data-Mining

Implemented and compared the performance of several classification algorithms (decision tree, k-nearest neighbors, logistic regression, and random forest) for classifying patients with liver problems in a clinical data set.
Experimented with feature selection to identify the best features for each algorithm.
Normalized the data using two methods: Min-Max and z-score.
Performed N-fold cross-validation on the data set.
Compared precision, recall and F-score of the algorithms.
The data set contains 10 variables: age, gender, total bilirubin, direct bilirubin, total proteins, albumin, A/G ratio, SGPT, SGOT, and Alkphos (alkaline phosphatase).
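
The normalization and cross-validation steps listed above can be sketched in plain Python. This is an illustrative sketch, not the project's own code: the function names (`min_max`, `z_score`, `k_fold_indices`) and the sample values are assumptions made up for the example, and the folds are contiguous, so it assumes the rows have already been shuffled.

```python
from statistics import mean, pstdev


def min_max(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center values on mean 0 and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def k_fold_indices(n_rows, n_folds):
    """Yield (train, test) index lists for n_folds contiguous folds."""
    base, extra = divmod(n_rows, n_folds)
    start = 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first folds
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, test
        start += size


# Hypothetical age values, for illustration only.
ages = [20, 35, 50, 65, 90]
print(min_max(ages))
print(z_score(ages))

# 583 records split into 10 folds (the fold count is an assumption).
for train, test in k_fold_indices(583, 10):
    print(len(train), len(test))
```

When normalization is combined with cross-validation, the scaling parameters (min and max, or mean and standard deviation) are usually computed on the training folds only and then applied to the test fold, so that no information from the test fold leaks into training.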

| Data Set Characteristics | Number of Instances | Area | Attribute Characteristics | Number of Attributes | Date Donated | Associated Tasks |
|---|---|---|---|---|---|---|
| Multivariate | 583 | Life | Integer, Real | 10 | 2012-05-21 | Classification |

Data Set Information:

This data set contains 416 liver patient records and 167 non-liver patient records. The data set was collected from the north east of Andhra Pradesh, India. Selector is a class label used to divide the records into two groups (liver patient or not). The data set contains 441 male patient records and 142 female patient records.
Any patient whose age exceeded 89 is listed as being of age "90".
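
Because the classes are imbalanced (416 liver patient records versus 167 non-liver patient records), precision, recall, and F-score are more informative than accuracy alone. The sketch below shows how the three metrics follow from confusion-matrix counts. The helper name `precision_recall_f1` is hypothetical, and the example evaluates a trivial baseline that labels every record as "liver patient", not any of the project's classifiers.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Trivial baseline: predict "liver patient" for all 583 records.
# All 416 liver patients are true positives; all 167 non-liver patients are false positives.
p, r, f1 = precision_recall_f1(tp=416, fp=167, fn=0)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
# prints: precision=0.714 recall=1.000 f1=0.833
```

A classifier is only useful on this data set if it improves on these baseline figures for the positive class.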
