## Identifying Poisonous Mushrooms  
#### Exploring and preparing the data

In [None]:
mushrooms <- read.csv("mushrooms.csv", stringsAsFactors = TRUE)

In [2]:
# examine the structure of the data frame
str(mushrooms)

'data.frame':	8124 obs. of  23 variables:
 $ type                    : Factor w/ 2 levels "edible","poisonous": 2 1 1 2 1 1 1 1 2 1 ...
 $ cap_shape               : Factor w/ 6 levels "bell","conical",..: 3 3 1 3 3 3 1 1 3 1 ...
 $ cap_surface             : Factor w/ 4 levels "fibrous","grooves",..: 4 4 4 3 4 3 4 3 3 4 ...
 $ cap_color               : Factor w/ 10 levels "brown","buff",..: 1 10 9 9 4 10 9 9 9 10 ...
 $ bruises                 : Factor w/ 2 levels "no","yes": 2 2 2 2 1 2 2 2 2 2 ...
 $ odor                    : Factor w/ 9 levels "almond","anise",..: 8 1 2 8 7 1 1 2 8 1 ...
 $ gill_attachment         : Factor w/ 2 levels "attached","free": 2 2 2 2 2 2 2 2 2 2 ...
 $ gill_spacing            : Factor w/ 2 levels "close","crowded": 1 1 1 1 2 1 1 1 1 1 ...
 $ gill_size               : Factor w/ 2 levels "broad","narrow": 2 1 1 2 1 1 1 1 2 1 ...
 $ gill_color              : Factor w/ 12 levels "black","brown",..: 1 1 2 2 1 2 5 2 8 5 ...
 $ stalk_shape             : Factor w/

In [14]:
# dropping the veil_type feature because it has only one level
mushrooms$veil_type <- NULL
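
A feature with only one level carries no information for a classifier. One way to spot such columns programmatically is sketched below on a small made-up data frame; `toy` and its values are hypothetical stand-ins for `mushrooms`, not rows from the real file.

```r
# Toy data frame (hypothetical values) with one constant column
toy <- data.frame(
  type      = factor(c("edible", "poisonous", "edible")),
  veil_type = factor(c("partial", "partial", "partial")),
  odor      = factor(c("none", "foul", "almond"))
)

# Count the levels of every factor column
level_counts <- sapply(toy, nlevels)

# Columns with fewer than two levels cannot help separate the classes
constant_cols <- names(level_counts)[level_counts < 2]
print(constant_cols)

# Keep only the informative columns
toy <- toy[, level_counts >= 2, drop = FALSE]
print(names(toy))
```

Applied to the real data, the same `sapply(mushrooms, nlevels)` call would be expected to flag `veil_type`, matching the manual removal above.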

In [4]:
# examine the class distribution using the "table" function
table(mushrooms$type)


   edible poisonous 
     4208      3916 
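
The raw counts are easier to compare as percentages. A minimal sketch, re-entering the two counts printed above by hand so that it runs without the CSV file (with the real data frame, `prop.table(table(mushrooms$type))` should give the same proportions):

```r
# Class counts copied from the table() output above
type_counts <- c(edible = 4208, poisonous = 3916)

# Convert counts to percentages of the 8124 samples
type_pct <- round(100 * prop.table(type_counts), 1)
print(type_pct)
# edible 51.8, poisonous 48.2
```

The classes are close to balanced, so plain accuracy is a reasonable first measure of performance here.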

In [7]:
## Step 3: Training a model on the data ----
# install RWeka (this also pulls in the RWekajars and rJava dependencies)
install.packages("RWeka")

In [9]:
# load the package that provides OneR() and JRip()
library(RWeka)

In [10]:
# train OneR() on the data
mushroom_1R <- OneR(type ~ ., data = mushrooms)
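
To make the idea behind 1R concrete, here is a base R sketch of the selection step: for each feature, predict the majority class for every feature value, count the mistakes, and keep the feature with the fewest. This is an illustration under simplifying assumptions, not RWeka's implementation; the `toy` data and the helper `one_rule_errors()` are hypothetical.

```r
# Toy data (hypothetical values, not taken from mushrooms.csv)
toy <- data.frame(
  type      = factor(c("edible", "edible", "poisonous",
                       "poisonous", "edible", "poisonous")),
  odor      = factor(c("none", "almond", "foul",
                       "foul", "none", "pungent")),
  cap_color = factor(c("brown", "white", "brown",
                       "white", "brown", "white"))
)

# For one feature: predict the majority class for each feature value
# and count how many rows that rule gets wrong
one_rule_errors <- function(feature, target) {
  counts <- table(feature, target)
  sum(counts) - sum(apply(counts, 1, max))
}

# Score every candidate feature and pick the one with the fewest errors
features <- setdiff(names(toy), "type")
errors <- sapply(features, function(f) one_rule_errors(toy[[f]], toy$type))
best <- names(errors)[which.min(errors)]

print(errors)
print(best)
```

On this toy data `odor` separates the classes without error while `cap_color` misclassifies two rows, so `odor` is selected.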

In [11]:
## Step 4: Evaluating model performance ----
mushroom_1R
summary(mushroom_1R)

odor:
	almond	-> edible
	anise	-> edible
	creosote	-> poisonous
	fishy	-> poisonous
	foul	-> poisonous
	musty	-> poisonous
	none	-> edible
	pungent	-> poisonous
	spicy	-> poisonous
(8004/8124 instances correct)



=== Summary ===

Correctly Classified Instances        8004               98.5229 %
Incorrectly Classified Instances       120                1.4771 %
Kappa statistic                          0.9704
Mean absolute error                      0.0148
Root mean squared error                  0.1215
Relative absolute error                  2.958  %
Root relative squared error             24.323  %
Total Number of Instances             8124     

=== Confusion Matrix ===

    a    b   <-- classified as
 4208    0 |    a = edible
  120 3796 |    b = poisonous

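#### From the summary above, OneR correctly classifies 8,004 of 8,124 instances (98.52%) using odor alone, an error rate of 1.48%. All 120 errors are poisonous mushrooms classified as edible, which is the most dangerous kind of mistake.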
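
The headline figures can be recomputed directly from the confusion matrix. The sketch below re-enters the four cell counts printed above by hand (rows are the actual class, columns the predicted class); the variable names are chosen for illustration only.

```r
# Confusion matrix copied from the OneR summary: rows = actual, cols = predicted
cm <- matrix(c(4208,    0,
                120, 3796),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual    = c("edible", "poisonous"),
                             predicted = c("edible", "poisonous")))

# Overall accuracy: correct predictions sit on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)

# Poisonous mushrooms wrongly labelled edible
missed_poisonous <- cm["poisonous", "edible"]

# Share of poisonous mushrooms that the rule actually catches
recall_poisonous <- cm["poisonous", "poisonous"] / sum(cm["poisonous", ])

print(round(accuracy, 4))          # 0.9852
print(missed_poisonous)            # 120
print(round(recall_poisonous, 3))  # 0.969
```

Accuracy alone hides the asymmetry: every one of the 120 errors falls in the poisonous row.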

In [13]:
## Step 5: Improving model performance ----
mushroom_JRip <- JRip(type ~ ., data = mushrooms)
mushroom_JRip
summary(mushroom_JRip)

JRIP rules:

(odor = foul) => type=poisonous (2160.0/0.0)
(gill_size = narrow) and (gill_color = buff) => type=poisonous (1152.0/0.0)
(gill_size = narrow) and (odor = pungent) => type=poisonous (256.0/0.0)
(odor = creosote) => type=poisonous (192.0/0.0)
(spore_print_color = green) => type=poisonous (72.0/0.0)
(stalk_surface_below_ring = scaly) and (stalk_surface_above_ring = silky) => type=poisonous (68.0/0.0)
(habitat = leaves) and (cap_color = white) => type=poisonous (8.0/0.0)
(stalk_color_above_ring = yellow) => type=poisonous (8.0/0.0)
 => type=edible (4208.0/0.0)

Number of Rules : 9



=== Summary ===

Correctly Classified Instances        8124              100      %
Incorrectly Classified Instances         0                0      %
Kappa statistic                          1     
Mean absolute error                      0     
Root mean squared error                  0     
Relative absolute error                  0      %
Root relative squared error              0      %
Total Number of Instances             8124     

=== Confusion Matrix ===

    a    b   <-- classified as
 4208    0 |    a = edible
    0 3916 |    b = poisonous

#### With the JRip rule learner, all 8,124 instances are now classified correctly. Note that both summaries are computed on the same data used for training.
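
The JRIP rules printed above read as a checklist: a mushroom matching any of the eight conditions is labelled poisonous, and everything else falls through to the default `edible` rule. A base R sketch of that logic follows; the function `classify_with_rules()` and the `toy` rows are hypothetical, written only to illustrate how the rule list is applied, and they are not part of RWeka.

```r
# Apply the eight printed JRip conditions; anything unmatched is edible
classify_with_rules <- function(d) {
  poisonous <-
    d$odor == "foul" |
    (d$gill_size == "narrow" & d$gill_color == "buff") |
    (d$gill_size == "narrow" & d$odor == "pungent") |
    d$odor == "creosote" |
    d$spore_print_color == "green" |
    (d$stalk_surface_below_ring == "scaly" &
       d$stalk_surface_above_ring == "silky") |
    (d$habitat == "leaves" & d$cap_color == "white") |
    d$stalk_color_above_ring == "yellow"
  factor(ifelse(poisonous, "poisonous", "edible"),
         levels = c("edible", "poisonous"))
}

# Three made-up mushrooms (hypothetical values)
toy <- data.frame(
  odor                     = c("foul", "none", "none"),
  gill_size                = c("broad", "narrow", "broad"),
  gill_color               = c("black", "buff", "brown"),
  spore_print_color        = c("brown", "white", "black"),
  stalk_surface_below_ring = c("smooth", "smooth", "smooth"),
  stalk_surface_above_ring = c("smooth", "smooth", "smooth"),
  habitat                  = c("woods", "grasses", "woods"),
  cap_color                = c("brown", "white", "brown"),
  stalk_color_above_ring   = c("white", "white", "white"),
  stringsAsFactors = FALSE
)

predicted <- classify_with_rules(toy)
print(predicted)
```

The first row matches the odor rule, the second matches the narrow gill and buff gill color rule, and the third matches nothing, so it receives the default label.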