# Edibility of Mushrooms
## Introduction:
Mushrooms make delicious dishes, but some species are poisonous! Being able to predict edibility from appearance is essential when we encounter an unknown mushroom.

- In this project, we aim to train a classification model that predicts whether an unknown mushroom is edible or poisonous from measurements of its appearance.

#### The Dataset:
- "Secondary mushroom data" from UCI Machine Learning Repository (https://mushroom.mathematik.uni-marburg.de/files/) 
- Includes 61069 hypothetical mushrooms with caps based on 173 species (353 mushrooms per species)
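- Each mushroom is identified as edible, poisonous, or of unknown edibility; in the data the unknown group is merged with the poisonous one, so the `class` column has only two levels (`e` and `p`)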
- Of the 20 variables, 3 are continuous, and 17 are nominal (3 are binary, and 14 are categorical)
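The figures quoted in the list above can be cross-checked with a few lines of arithmetic. This is only a sanity check on the quoted numbers, not a query against the dataset itself; the variable names are made up for illustration.

```r
# Figures quoted in the dataset description (names are illustrative)
n_species   <- 173
per_species <- 353
n_mushrooms <- n_species * per_species   # total hypothetical mushrooms

n_continuous  <- 3
n_binary      <- 3
n_categorical <- 14
n_nominal     <- n_binary + n_categorical
n_variables   <- n_continuous + n_nominal

n_mushrooms   # 61069
n_nominal     # 17
n_variables   # 20
```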

## Exploratory data analysis:
#### - Read, clean and wrangle data into a tidy format

In [2]:
library(tidyverse)
library(tidymodels)
library(cowplot)
set.seed(777)
link <- "https://mushroom.mathematik.uni-marburg.de/files/SecondaryData/secondary_data_shuffled.csv"
mushrooms <- read_delim(link, delim=";") # download our dataset
head(mushrooms) # show the first 6 examples

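| class | cap-diameter | cap-shape | cap-surface | cap-color | does-bruise-or-bleed | gill-attachment | gill-spacing | gill-color | stem-height | ⋯ | stem-root | stem-surface | stem-color | veil-type | veil-color | has-ring | ring-type | spore-print-color | habitat | season |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<dbl>` | `<chr>` | `<chr>` | `<chr>` | `<lgl>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` | ⋯ | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<lgl>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` |
| e | 1.72 | x | s | y | False | d |  | w | 7.01 | ⋯ |  | t | y |  |  | False | f |  | h | u |
| e | 9.59 | f | e | b | False |  | c | b | 4.73 | ⋯ |  |  | w |  |  | True | f |  | d | a |
| p | 0.86 | x | g | p | False | a |  | p | 4.25 | ⋯ |  | s | k |  |  | False | f |  | d | s |
| p | 4.32 | x |  | e | False | x |  | w | 4.91 | ⋯ |  |  | w |  |  | False | f |  | d | u |
| e | 2.8 | x | s | w | False | d | d | w | 3.13 | ⋯ |  |  | w |  |  | False | f |  | m | a |
| p | 1.18 | s | s | y | False | f | f | f | 3.39 | ⋯ |  |  | y |  |  | False | f |  | d | u |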
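
Several cells in the preview above print as blank, presumably because those values are missing (`NA`). A minimal sketch of how the share of missing values per column could be tabulated is shown here. It assumes a small hypothetical tibble `toy` that copies four columns from the first four preview rows; the real check would run on `mushrooms` instead.

```r
library(tidyverse)

# Hypothetical excerpt: four columns of the first four preview rows,
# with blank cells written as NA
toy <- tibble(
  class        = c("e", "e", "p", "p"),
  cap_surface  = c("s", "e", "g", NA),
  gill_spacing = c(NA, "c", NA, NA),
  veil_type    = rep(NA_character_, 4)
)

# Share of missing values in each column, largest first
missing_share <- toy |>
  summarise(across(everything(), ~ mean(is.na(.x)))) |>
  pivot_longer(everything(), names_to = "column", values_to = "share_missing") |>
  arrange(desc(share_missing))
missing_share
```

Columns that are almost entirely missing are usually poor predictors, which is one possible reason to restrict a first model to the fully observed continuous columns.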


In [3]:
# Replace '-' with '_' so that every column name is a syntactic R name
new_colnames <- gsub("-", "_", colnames(mushrooms), fixed = TRUE)
# Compact the names of diameter, height, and width
new_colnames[2] <- "diameter"
new_colnames[10] <- "height"
new_colnames[11] <- "width"
colnames(mushrooms) <- new_colnames
# Convert every non-continuous column to a factor, then hold out 25% of the rows,
# stratifying by class so both parts keep the same class balance
mushroom_split <- mushrooms |>
    mutate(across(-c(diameter, height, width), factor)) |>
    initial_split(prop=0.75, strata=class)
mushroom_training_raw <- training(mushroom_split)
# Keep only the class label and the three continuous predictors
mushroom_training <- mushroom_training_raw |>
    select(class, diameter, height, width)
head(mushroom_training)

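| class | diameter | height | width |
|---|---|---|---|
| `<fct>` | `<dbl>` | `<dbl>` | `<dbl>` |
| e | 9.59 | 4.73 | 20.49 |
| e | 5.76 | 8.11 | 17.69 |
| e | 11.53 | 8.99 | 18.61 |
| e | 1.48 | 5.39 | 2.56 |
| e | 12.59 | 12.33 | 15.4 |
| e | 1.4 | 5.97 | 1.8 |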
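
The split above passes `strata=class`, which asks `initial_split()` to sample within each class so that the training and testing portions keep roughly the same class balance. A minimal sketch of how that could be checked is shown here. The tibble `toy` and its 60/40 class balance are hypothetical and do not reflect the real dataset.

```r
library(tidymodels)
set.seed(777)

# Hypothetical imbalanced data: 600 "p" rows and 400 "e" rows
toy <- tibble(
  class    = factor(rep(c("p", "e"), times = c(600, 400))),
  diameter = runif(1000, min = 1, max = 15)
)

toy_split <- initial_split(toy, prop = 0.75, strata = class)

# Share of each class in the training and testing portions
train_share <- training(toy_split) |>
  count(class) |>
  mutate(share = n / sum(n))
test_share <- testing(toy_split) |>
  count(class) |>
  mutate(share = n / sum(n))

train_share
test_share
```

If stratification works as intended, the `share` columns of the two tables should agree closely.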


In [4]:
k <- 10 # number of neighbours used for the vote
# Standardize the predictors so no single measurement dominates the distance
mushroom_recipe <- recipe(class ~ diameter + height + width, data=mushroom_training) |>
    step_scale(all_predictors()) |>
    step_center(all_predictors())
# K-nearest neighbours with equal ("rectangular") weights for all k neighbours
mushroom_spec <- nearest_neighbor(weight_func="rectangular", neighbors=k) |>
    set_engine("kknn") |>
    set_mode("classification")
mushroom_fit <- workflow() |>
    add_recipe(mushroom_recipe) |>
    add_model(mushroom_spec) |>
    fit(data = mushroom_training)
# Predict on the training rows themselves, so the metrics below are training metrics
mushroom_predicted <- mushroom_fit |>
    predict(mushroom_training) |>
    bind_cols(mushroom_training)
training_metrics <- mushroom_predicted |>
    metrics(truth=class, estimate=.pred_class)
training_metrics

| .metric | .estimator | .estimate |
|---|---|---|
| `<chr>` | `<chr>` | `<dbl>` |
| accuracy | binary | 0.8335189 |
| kap | binary | 0.6628845 |
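
The `kap` row is Cohen's kappa, which rescales accuracy by the agreement expected from chance alone: $\kappa = (p_o - p_e) / (1 - p_e)$, where $p_o$ is the observed accuracy and $p_e$ is the chance agreement implied by the row and column totals of the confusion matrix. A worked sketch with a hypothetical confusion matrix follows. The counts are invented for illustration and are not this notebook's results.

```r
# Hypothetical confusion matrix (rows = truth, columns = prediction)
conf_mat <- matrix(
  c(40, 5, 10, 45), nrow = 2,
  dimnames = list(truth = c("e", "p"), prediction = c("e", "p"))
)

n   <- sum(conf_mat)
p_o <- sum(diag(conf_mat)) / n                          # observed accuracy
p_e <- sum(rowSums(conf_mat) * colSums(conf_mat)) / n^2 # chance agreement
kappa <- (p_o - p_e) / (1 - p_e)

c(accuracy = p_o, chance = p_e, kappa = kappa)
```

With these counts the accuracy is 0.85, the chance agreement is 0.5, and kappa comes out at 0.7, so kappa is lower than accuracy whenever some agreement would be expected by chance.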

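The metrics above are computed on the same rows the model was fitted to, so they are likely to be optimistic. A minimal sketch of scoring the held-out part of a split is shown here. It assumes synthetic data in a hypothetical tibble `toy`; in the notebook the analogous step would presumably use `testing(mushroom_split)` with `mushroom_fit`.

```r
library(tidymodels)
set.seed(777)

# Synthetic stand-in for the mushroom data (hypothetical values)
n <- 400
toy <- tibble(
  class    = factor(rep(c("e", "p"), each = n / 2)),
  diameter = c(rnorm(n / 2, mean = 8, sd = 2), rnorm(n / 2, mean = 5, sd = 2)),
  height   = c(rnorm(n / 2, mean = 7, sd = 2), rnorm(n / 2, mean = 6, sd = 2)),
  width    = c(rnorm(n / 2, mean = 15, sd = 5), rnorm(n / 2, mean = 10, sd = 5))
)

toy_split <- initial_split(toy, prop = 0.75, strata = class)
toy_train <- training(toy_split)
toy_test  <- testing(toy_split)

# Same recipe and model specification as in the cell above
toy_recipe <- recipe(class ~ diameter + height + width, data = toy_train) |>
  step_scale(all_predictors()) |>
  step_center(all_predictors())
toy_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = 10) |>
  set_engine("kknn") |>
  set_mode("classification")
toy_fit <- workflow() |>
  add_recipe(toy_recipe) |>
  add_model(toy_spec) |>
  fit(data = toy_train)

# Score rows the model has not seen during fitting
test_metrics <- toy_fit |>
  predict(toy_test) |>
  bind_cols(toy_test) |>
  metrics(truth = class, estimate = .pred_class)
test_metrics
```

Comparing `test_metrics` with the training metrics gives a rough sense of how much the training figures overstate performance.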