# WhichShroom
This notebook trains a simple Keras neural network to produce prediction weights for a JavaScript web application that predicts mushroom toxicity.

## Imports

In [1]:
from IPython.display import display, HTML
import numpy as np
import pandas as pd

from keras.layers import Activation, Dense
from keras.models import Sequential

from sklearn.model_selection import train_test_split

Using TensorFlow backend.


## Data Survey

In [2]:
df = pd.read_csv("data/MushroomClassification/mushrooms.csv")

df.head()

Unnamed: 0,class,cap-shape,cap-surface,cap-color,bruises,odor,gill-attachment,gill-spacing,gill-size,gill-color,...,stalk-surface-below-ring,stalk-color-above-ring,stalk-color-below-ring,veil-type,veil-color,ring-number,ring-type,spore-print-color,population,habitat
0,p,x,s,n,t,p,f,c,n,k,...,s,w,w,p,w,o,p,k,s,u
1,e,x,s,y,t,a,f,c,b,k,...,s,w,w,p,w,o,p,n,n,g
2,e,b,s,w,t,l,f,c,b,n,...,s,w,w,p,w,o,p,n,n,m
3,p,x,y,w,t,p,f,c,n,n,...,s,w,w,p,w,o,p,k,s,u
4,e,x,s,g,f,n,f,w,b,k,...,s,w,w,p,w,o,e,n,a,g


In [3]:
list(df)

['class',
 'cap-shape',
 'cap-surface',
 'cap-color',
 'bruises',
 'odor',
 'gill-attachment',
 'gill-spacing',
 'gill-size',
 'gill-color',
 'stalk-shape',
 'stalk-root',
 'stalk-surface-above-ring',
 'stalk-surface-below-ring',
 'stalk-color-above-ring',
 'stalk-color-below-ring',
 'veil-type',
 'veil-color',
 'ring-number',
 'ring-type',
 'spore-print-color',
 'population',
 'habitat']

I originally trained this network on all 22 of these features, but two observations led me to shorten the feature list:
* The network was training to 100% accuracy in very few epochs.
* The number of features would be cumbersome for users to input.

So, the following steps shorten the feature list, spending some of that spare accuracy on a friendlier client side.

I reviewed all 22 features to determine which had the most substantial impact on classification. For readability, only two are included here as examples.

In [4]:
print("Gill Attachment")
display(pd.DataFrame({
    "Edible": df.loc[df["class"] == "e", "gill-attachment"].value_counts(),
    "Poison": df.loc[df["class"] == "p", "gill-attachment"].value_counts()
}))

print("Gill Spacing")
display(pd.DataFrame({
    "Edible": df.loc[df["class"] == "e", "gill-spacing"].value_counts(),
    "Poison": df.loc[df["class"] == "p", "gill-spacing"].value_counts()
}))

Gill Attachment


Unnamed: 0,Edible,Poison
f,4016,3898
a,192,18


Gill Spacing


Unnamed: 0,Edible,Poison
c,3008,3804
w,1200,112
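
The same per-class count can be wrapped in a small helper and looped over every feature column. The following is a minimal sketch rather than the exact review code: `class_counts` and the `toy` frame are hypothetical names, and the toy rows are made up so the block runs without `mushrooms.csv`.

```python
import pandas as pd


def class_counts(frame, feature, label_col="class"):
    """Count each value of `feature` separately for edible and poisonous rows."""
    counts = pd.DataFrame({
        "Edible": frame.loc[frame[label_col] == "e", feature].value_counts(),
        "Poison": frame.loc[frame[label_col] == "p", feature].value_counts(),
    })
    # A value seen in only one class shows up as NaN in the other; treat it as 0.
    return counts.fillna(0).astype(int)


# Made-up rows standing in for the real dataset (assumption for illustration).
toy = pd.DataFrame({
    "class": ["e", "e", "p", "p", "e"],
    "gill-attachment": ["f", "a", "f", "f", "f"],
    "gill-spacing": ["c", "w", "c", "c", "w"],
})

# Loop over every column except the label.
for feature in toy.columns.drop("class"):
    print(feature)
    print(class_counts(toy, feature))
```

A feature whose values split very unevenly between the two columns is a candidate for keeping; one with near-identical proportions in both columns carries little signal on its own.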


Features of import:
* cap-shape
* cap-surface
* cap-color
* bruises
* gill-spacing
* gill-size
* stalk-root
* ring-number
* ring-type
* spore-print-color
* population
* habitat

However, my intention is to select an even smaller set of features, ones that end users can easily identify and enter. This list is therefore shortened further in the preprocessing below.

## Preprocessing

In [5]:
labels = np.array([0 if x=="p" else 1 for x in df["class"]])

print(labels[:50])

[0 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 1 0 0 0 1 0 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1
 0 1 1 1 1 1 0 1 1 1 1 1 1]


In [6]:
prelim_features = df.copy()
prelim_features = prelim_features.drop(["class", "stalk-root", "spore-print-color", "cap-surface",
                                        "ring-number", "ring-type", "gill-spacing", "gill-size",
                                        'odor', 'gill-attachment', 'gill-color', 'stalk-shape',
                                        'stalk-surface-above-ring', 'stalk-surface-below-ring',
                                        'stalk-color-above-ring', 'stalk-color-below-ring',
                                        'veil-type', 'veil-color'], axis=1)
prelim_features = pd.get_dummies(prelim_features, prefix=list(prelim_features), columns=list(prelim_features))

features = np.array(prelim_features.values)

print(features[:2])
print(features.shape)

[[0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0]
 [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0]]
(8124, 31)


In [7]:
list(prelim_features)

['cap-shape_b',
 'cap-shape_c',
 'cap-shape_f',
 'cap-shape_k',
 'cap-shape_s',
 'cap-shape_x',
 'cap-color_b',
 'cap-color_c',
 'cap-color_e',
 'cap-color_g',
 'cap-color_n',
 'cap-color_p',
 'cap-color_r',
 'cap-color_u',
 'cap-color_w',
 'cap-color_y',
 'bruises_f',
 'bruises_t',
 'population_a',
 'population_c',
 'population_n',
 'population_s',
 'population_v',
 'population_y',
 'habitat_d',
 'habitat_g',
 'habitat_l',
 'habitat_m',
 'habitat_p',
 'habitat_u',
 'habitat_w']
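
Any client that reuses the trained weights has to build its input vector in exactly this column order. The following is a minimal sketch of that encoding step: the `categories` mapping is copied from the column list above, while `encode` and `first_row` are hypothetical names introduced for illustration.

```python
import numpy as np

# Value codes per feature, in the same order as the one-hot columns listed above.
categories = {
    "cap-shape": "bcfksx",
    "cap-color": "bcegnpruwy",
    "bruises": "ft",
    "population": "acnsvy",
    "habitat": "dglmpuw",
}
columns = [f"{feature}_{value}"
           for feature, values in categories.items()
           for value in values]


def encode(selection, columns):
    """Turn {"cap-shape": "x", ...} into a 0/1 vector aligned with `columns`."""
    wanted = {f"{feature}_{value}" for feature, value in selection.items()}
    unknown = wanted - set(columns)
    if unknown:
        raise ValueError(f"Unknown feature values: {sorted(unknown)}")
    return np.array([1 if column in wanted else 0 for column in columns])


# The first mushroom in the dataset: convex cap, brown, bruises, scattered, urban.
first_row = encode(
    {"cap-shape": "x", "cap-color": "n", "bruises": "t",
     "population": "s", "habitat": "u"},
    columns,
)
print(len(columns))
print(first_row)
```

The resulting vector matches the first row of `features` printed earlier, which is a quick way to check that the column order has been reproduced correctly.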

In [8]:
train_x, test_x, train_y, test_y = train_test_split(features, labels, test_size=0.2)

## Hyperparameters

In [9]:
epochs = 350      # full passes over the training set
batch_size = 256  # samples per gradient update

## Model

### Keras

In [10]:
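model = Sequential()

# Hidden layer: 8 units fed by the one-hot feature columns, with a sigmoid activation
model.add(Dense(8, input_dim=features.shape[1]))
model.add(Activation("sigmoid"))
# Output layer: a single unit with no activation (linear output)
model.add(Dense(1))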

In [11]:
model.compile(loss="mean_squared_error", optimizer="adam", metrics=["accuracy"])
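
Because the goal is to reuse the weights outside Keras, it helps to see the computation this architecture performs. The following NumPy sketch reproduces the forward pass with random stand-in weights of the same shapes. The names `forward`, `w1`, `b1`, `w2`, and `b2` are hypothetical, and the 0.5 decision threshold is an assumption about how the output would be turned into a class.

```python
import numpy as np


def sigmoid(z):
    """Logistic function applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, w1, b1, w2, b2):
    """Dense(8) + sigmoid, followed by a linear Dense(1)."""
    hidden = sigmoid(x @ w1 + b1)
    return hidden @ w2 + b2


# Random stand-ins with the same shapes as the model above (not trained values).
rng = np.random.default_rng(0)
n_features, n_hidden = 31, 8
w1 = rng.normal(size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
w2 = rng.normal(size=(n_hidden, 1))
b2 = np.zeros(1)

# Four fake 0/1 input rows (real rows are one-hot per feature).
x = rng.integers(0, 2, size=(4, n_features))

scores = forward(x, w1, b1, w2, b2)
# Assumed decision rule: above 0.5 means edible (label 1), otherwise poisonous (label 0).
predicted = (scores > 0.5).astype(int).ravel()
print(scores.shape, predicted)
```

The same two matrix products and one sigmoid are all a JavaScript port would need, provided it has the weights and biases of both layers.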

## Training

In [12]:
model.fit(train_x, train_y, batch_size=batch_size, epochs=epochs)

Epoch 1/350
...
Epoch 350/350


<keras.callbacks.History at 0x1a1c937208>

In [13]:
scores = model.evaluate(test_x, test_y)



In [14]:
print("%s: %.2f" % (model.metrics_names[1], scores[1]*100))

acc: 97.97


So, reducing the model to (effectively) five features reduced its accuracy by only about 2 percentage points, from 100% to 97.97%. For the usability this change adds, the reduction seems worthwhile.
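
To put the reported accuracy in concrete terms, the short calculation below estimates how many test mushrooms were misclassified. It is a rough sketch: it assumes the test set holds the ceiling of 20% of the 8,124 rows, which is an assumption about how scikit-learn rounds a fractional `test_size`.

```python
import math

n_samples = 8124            # rows in the dataset (from features.shape)
test_fraction = 0.2         # test_size passed to train_test_split
reported_accuracy = 0.9797  # accuracy printed above

# Assumed rounding rule for the test split size.
n_test = math.ceil(n_samples * test_fraction)
# Misclassified test rows implied by the reported accuracy.
n_wrong = round(n_test * (1 - reported_accuracy))

print(n_test, n_wrong)
```

Under that assumption, the score corresponds to a few dozen wrong predictions on the held-out rows.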

In [15]:
model.save("MushroomClassification.h5")
with open("MushroomClassification.json", "w") as file:
    file.write(model.to_json())

In [16]:
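# For this model, get_weights() returns
# [hidden weights, hidden biases, output weights, output biases];
# indices 0 and 1 are the hidden layer's weights and biases.
final_weights = model.get_weights()[0]
final_biases = model.get_weights()[1]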

np.savetxt("MushroomClassification_Weights.csv", final_weights, delimiter=",")
np.savetxt("MushroomClassification_Biases.csv", final_biases, delimiter=",")
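
Reproducing predictions outside Keras also requires the output layer's parameters, which are the third and fourth arrays returned by `get_weights()`. The following is a minimal sketch of exporting all four arrays in the same CSV style. It uses random stand-in arrays instead of a trained model so that it runs on its own, and the file-name suffixes and temporary output directory are hypothetical.

```python
import os
import tempfile

import numpy as np

# Stand-in for model.get_weights() on a Dense(8) -> sigmoid -> Dense(1) model
# with 31 inputs (random values, not the trained ones).
rng = np.random.default_rng(0)
all_weights = [
    rng.normal(size=(31, 8)),  # hidden-layer weights
    np.zeros(8),               # hidden-layer biases
    rng.normal(size=(8, 1)),   # output-layer weights
    np.zeros(1),               # output-layer bias
]
# Hypothetical file-name suffixes, one per array.
names = ["Hidden_Weights", "Hidden_Biases", "Output_Weights", "Output_Biases"]

out_dir = tempfile.mkdtemp()  # temporary directory, for illustration only
saved_paths = []
for name, array in zip(names, all_weights):
    path = os.path.join(out_dir, f"MushroomClassification_{name}.csv")
    np.savetxt(path, array, delimiter=",")
    saved_paths.append(path)

print(saved_paths)
```

With a real model, `all_weights = model.get_weights()` would replace the stand-in list, and the output directory would be wherever the web application reads its data from.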