1. Title: Lung Cancer Data
2. Source Information:
- Data was published in:
Hong, Z.Q. and Yang, J.Y. "Optimal Discriminant Plane for a Small
Number of Samples and Design Method of Classifier on the Plane",
Pattern Recognition, Vol. 24, No. 4, pp. 317-324, 1991.
- Donor: Stefan Aeberhard, stefan@coral.cs.jcu.edu.au
- Date: May, 1992
3. Past Usage:
- Hong, Z.Q. and Yang, J.Y. "Optimal Discriminant Plane for a Small
Number of Samples and Design Method of Classifier on the Plane",
Pattern Recognition, Vol. 24, No. 4, pp. 317-324, 1991.
- Aeberhard, S., Coomans, D., De Vel, O. "Comparisons of
Classification Methods in High Dimensional Settings",
submitted to Technometrics.
- Aeberhard, S., Coomans, D., De Vel, O. "The Dangers of
Bias in High Dimensional Settings", submitted to
Pattern Recognition.
4. Relevant Information:
- This data was used by Hong and Yang to illustrate the
power of the optimal discriminant plane even in ill-posed
settings. Applying the KNN method in the resulting plane
gave 77% accuracy. However, these results are strongly
biased (see Aeberhard's second ref. above, or email
stefan@coral.cs.jcu.edu.au). Results obtained by
Aeberhard et al. are:
RDA: 62.5%, KNN: 53.1%, Opt. Disc. Plane: 59.4%
The data describe 3 types of pathological lung cancers.
The authors give no information on the individual
variables nor on where the data was originally used.
- In the original data 4 values for the fifth attribute were -1.
These values have been changed to ? (unknown). (*)
- In the original data 1 value for the 39th attribute was 4. This
value has been changed to ? (unknown). (*)
5. Number of Instances: 32
6. Number of Attributes: 57 (1 class attribute, 56 predictive)
7. Attribute Information:
attribute 1 is the class label.
- All predictive attributes are nominal, taking on integer
values 0-3
8. Missing Attribute Values: Attributes 5 and 39 (*)
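The attribute layout above can be sketched as a small row parser. This is only an illustration: it assumes each instance is stored as one comma-separated line with the class label first, which this file does not state, and `parse_row` is a hypothetical helper name. Missing values marked `?` are mapped to `None`.

```python
from typing import List, Optional, Tuple

N_FIELDS = 57  # 1 class attribute + 56 predictive attributes


def parse_row(line: str) -> Tuple[int, List[Optional[int]]]:
    """Parse one instance into (class_label, predictive_values).

    Assumption: fields are comma-separated and the class label comes first.
    A "?" field is treated as a missing value and returned as None.
    """
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) != N_FIELDS:
        raise ValueError(f"expected {N_FIELDS} fields, got {len(fields)}")

    label = int(fields[0])
    values: List[Optional[int]] = []
    for f in fields[1:]:
        if f == "?":
            values.append(None)  # unknown value
            continue
        v = int(f)
        if not 0 <= v <= 3:
            # Predictive attributes are documented as taking values 0-3.
            raise ValueError(f"predictive value out of range 0-3: {v}")
        values.append(v)
    return label, values
```

Keeping missing values as `None` rather than substituting a number leaves the choice of imputation or row removal to whoever consumes the data.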
9. Class Distribution:
- 3 classes:
1.) 9 observations
2.) 13 observations
3.) 10 observations
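As a quick check on the class counts listed above, the following sketch computes the class proportions and the accuracy of always predicting the largest class. The variable names are illustrative only; the counts are the ones given in this file.

```python
# Class counts as listed in section 9.
class_counts = {1: 9, 2: 13, 3: 10}

total = sum(class_counts.values())  # should match the 32 instances in section 5
proportions = {c: n / total for c, n in class_counts.items()}

# Majority-class baseline: the accuracy of always predicting the largest class.
majority_class = max(class_counts, key=class_counts.get)
baseline_accuracy = class_counts[majority_class] / total

print(f"total instances: {total}")
for c, p in proportions.items():
    print(f"class {c}: {p:.1%}")
print(f"majority class: {majority_class}, baseline accuracy: {baseline_accuracy:.1%}")
```

The baseline gives a reference point when reading the accuracy figures reported in section 4.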