# DeepShrooms

Our goal is to classify pictures of common mushrooms through a web app.

The challenge is to get good-quality training data and then to mitigate common image problems such as lighting, angle, blurriness and background noise.
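One common way to make a classifier less sensitive to these problems is data augmentation; the text above does not say which approach the project uses, so the following is only a minimal sketch under that assumption. The function name `augment` and all numeric ranges are hypothetical, and blurriness is not handled here.

```python
import numpy as np


def augment(image, rng):
    """Return a randomly perturbed copy of an H x W x 3 uint8 image."""
    out = image.astype(np.float32)
    # Lighting: scale brightness by a random factor (range is an assumption).
    out *= rng.uniform(0.7, 1.3)
    # Angle: rotate by a random multiple of 90 degrees, then maybe mirror.
    out = np.rot90(out, k=int(rng.integers(0, 4)), axes=(0, 1))
    if rng.random() < 0.5:
        out = out[:, ::-1, :]
    # Background/sensor noise: add small Gaussian noise (sigma is an assumption).
    out = out + rng.normal(0.0, 5.0, size=out.shape)
    return np.clip(out, 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
example = np.full((480, 480, 3), 128, dtype=np.uint8)  # stand-in for a real photo
augmented = augment(example, rng)
```

Because the images are square, the 90-degree rotations keep the array shape unchanged, so augmented copies can be mixed freely with the originals.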

The current plan is to classify only the poisonous mushrooms of Finland, along with some common edible and inedible ones, probably using a convolutional neural network.


Preliminary Models:

Mushroom Class
- name_fin - Finnish name
- name_eng - English name
- name_latin - Latin name
- url_mw - Mushroom-world url
- url_wiki? - Wikipedia url
- url_lajit? - Lajit.fi url
- img_urls? - List of links to its images. Probably should be deleted.
- edibility - edible/poisonous/inedible

Mushroom Image
- name_latin - Latin name
- name_img - Name of the image file
- img_url - URL to the original picture
- file_path - Path to the image from the root of its containing folder.
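
The two preliminary models above could be written down as Python dataclasses. This is only a sketch: the field names come from the lists, the fields marked with `?` are treated as optional, and the class names and the edibility check are assumptions rather than the project's actual code.

```python
from dataclasses import dataclass, field
from typing import List, Optional

EDIBILITY_VALUES = ('edible', 'poisonous', 'inedible')


@dataclass
class MushroomClass:
    name_fin: str
    name_eng: str
    name_latin: str
    url_mw: str
    edibility: str
    # Fields marked with "?" in the list are treated as optional here.
    url_wiki: Optional[str] = None
    url_lajit: Optional[str] = None
    img_urls: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject values outside edible/poisonous/inedible.
        if self.edibility not in EDIBILITY_VALUES:
            raise ValueError('unknown edibility: {}'.format(self.edibility))


@dataclass
class MushroomImage:
    name_latin: str
    name_img: str
    img_url: str
    file_path: str


# Usage with values taken from the sample rows of the image metadata.
img = MushroomImage(
    name_latin='Albatrellus ovinus',
    name_img='albatrellus_ovinus0.jpg',
    img_url='http://www.mushroom.world/data/fungi/Albatrell...',
    file_path='mushroom_img/albatrellus_ovinus0.jpg',
)
```

`name_latin` appears in both models, so it can act as the key that links an image to its class.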

## Sources of data

Mushroom World
http://www.mushroom.world

Lajit
http://tun.fi/HBF.25786?locale=fi

Luontoportti
http://www.luontoportti.com/suomi/fi/sienet/


## General guidance for classifying mushrooms from pictures

1) Note where the mushroom grows: on the ground or on wood.  
1.1) If it grows on the ground, it could be a saprotroph of detritus (karikkeen lahottaja), a mycorrhizal mushroom (juurisieni), which is a specific type of mushroom living off the roots of a tree, or a saprotroph of a tree whose wood happens to be at ground level.  
1.2) If it grows on wood, consider whether the wood is conifer (havupuu) or hardwood (lehtipuu). Some species grow only on a very specific wood, such as oak.  
2) Do not treat the time of year as an indicator of any sort. The length of the seasons differs every year, so the mushroom season might start much later one year than the next.  
3) Underneath the mushroom cap there can be different types of gills in various combinations. (TODO)  
4) The surface of the cap is also a very good identifying feature. Its structure might not be discernible from a picture, though.  
5) The color of the mushroom varies a lot with humidity and age. In rain the colors get deeper and more distinctive. Young mushrooms have stronger colors than old ones, and sunlight may also fade the colors.  
6) The stipe (jalka) can also tell a lot about the mushroom. Thin and thick shapes are more distinctive than average-sized ones.  
7) The geographic location of the mushroom could be useful, as some species probably don't grow everywhere in Finland.

There is also a lot of information, such as smell and touch, that could be used, but the user would have to input it themselves.

Sources:
* http://www.funga.fi/teema-aiheet/sienten-tunnistaminen/


# Downloading and importing the dataset

We scraped the data from the mushroom.world website beforehand using a scraper. The images and the metadata are stored in both Google Drive and Amazon S3. Google Drive doesn't support direct downloads, so I had to put the file in S3 too, with public access rights.

## About the pictures

The pictures are `.jpg` files resized to a standard 480x480 pixels.
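
How the resizing was done is not described here. As a minimal sketch, assuming Pillow and a center crop to avoid stretching (both are assumptions, and `resize_to_standard` is a hypothetical helper), a new picture could be brought to the same format like this:

```python
from PIL import Image

TARGET_SIZE = (480, 480)


def resize_to_standard(img, size=TARGET_SIZE):
    """Center-crop a PIL image to a square, then resize it to `size`."""
    width, height = img.size
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    square = img.crop((left, top, left + side, top + side))
    return square.convert('RGB').resize(size)


# Stand-in for a freshly scraped or user-uploaded photo.
original = Image.new('RGB', (640, 360), color=(120, 90, 60))
resized = resize_to_standard(original)
```

Cropping first keeps the aspect ratio of the mushroom intact, at the cost of discarding the edges of wide or tall photos.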

In [3]:
from io import BytesIO
from urllib.request import urlopen
from zipfile import ZipFile

DATASET_VERSION = 'mushroom_world_2017_16_10'
DATASET_LINK = 'https://s3.eu-central-1.amazonaws.com/deep-shrooms/{}.zip'.format(DATASET_VERSION)

with urlopen(DATASET_LINK) as zipresp:
    with ZipFile(BytesIO(zipresp.read())) as zfile:
        zfile.extractall('./data')

In [4]:
import pandas as pd
import numpy as np

DATASET_PATH = 'data/{}/'.format(DATASET_VERSION)

mushroom_classes = pd.read_json(DATASET_PATH + 'mushroom_classes.json', lines=True)
mushroom_imgs = pd.read_json(DATASET_PATH + 'mushroom_imgs.json', lines=True)
fin_names = pd.read_csv('test_labels.csv')
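
Since both metadata tables carry `name_latin`, the edibility label can be attached to each image row with a merge. The sketch below uses tiny stand-in frames instead of the real JSON files; the edibility values in it are placeholders for illustration, not values read from the dataset.

```python
import pandas as pd

# Stand-in frames with the same column names as the preliminary models.
classes = pd.DataFrame({
    'name_latin': ['Tylopilus felleus', 'Albatrellus ovinus'],
    'edibility': ['inedible', 'edible'],  # placeholder labels
})
imgs = pd.DataFrame({
    'file_path': [
        'mushroom_img/tylopilus_felleus1.jpg',
        'mushroom_img/albatrellus_ovinus0.jpg',
        'mushroom_img/albatrellus_ovinus1.jpg',
    ],
    'name_latin': [
        'Tylopilus felleus',
        'Albatrellus ovinus',
        'Albatrellus ovinus',
    ],
})

# A left merge keeps every image row; validate= raises if a Latin name
# appears more than once in the class table.
labelled = imgs.merge(
    classes[['name_latin', 'edibility']],
    on='name_latin',
    how='left',
    validate='many_to_one',
)
```

With the real data, the same call would be made on `mushroom_imgs` and `mushroom_classes`, and any image whose class is missing would show up with a `NaN` edibility.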

In [5]:
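import os

import numpy as np
from PIL import Image

def load_mushroom_images(folder_path, img_df):
    """Load every image listed in img_df['file_path'] into a dict keyed by row position."""
    img_dict = {}
    for index, path in enumerate(img_df['file_path']):
        # scipy.misc.imread was removed in SciPy 1.2, so read the file with Pillow instead
        with Image.open(os.path.join(folder_path, path)) as img:
            img_dict[index] = np.asarray(img)
    return img_dict

img_dict = load_mushroom_images(DATASET_PATH, mushroom_imgs)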
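
For training, a dict of separate arrays is usually less convenient than one batch array. A minimal sketch, assuming every image is a 480x480 RGB `uint8` array (the helper name `stack_images` is hypothetical):

```python
import numpy as np


def stack_images(img_dict):
    """Stack a dict of H x W x 3 uint8 images into one float32 array in [0, 1]."""
    keys = sorted(img_dict)  # keep a stable order that matches the row index
    batch = np.stack([img_dict[k] for k in keys])
    return batch.astype(np.float32) / 255.0


# Stand-in images: one all black, one all white.
fake_dict = {
    0: np.zeros((480, 480, 3), dtype=np.uint8),
    1: np.full((480, 480, 3), 255, dtype=np.uint8),
}
X = stack_images(fake_dict)
```

`np.stack` raises an error if the shapes differ, which doubles as a check that all pictures really share the standard size.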

In [7]:
img_dict[0]

array([[[ 59,  65,  87],
        [ 78,  85, 104],
        [103, 111, 124],
        ..., 
        [ 86,  71,  66],
        [ 96,  78,  74],
        [101,  83,  79]],

       [[ 87,  96, 113],
        [106, 115, 130],
        [129, 139, 148],
        ..., 
        [ 81,  66,  61],
        [ 89,  74,  69],
        [ 97,  79,  75]],

       [[108, 122, 131],
        [126, 141, 146],
        [147, 161, 162],
        ..., 
        [ 75,  61,  58],
        [ 85,  70,  67],
        [ 93,  78,  75]],

       ..., 
       [[ 45,  80,  60],
        [ 51,  85,  68],
        [ 58,  95,  77],
        ..., 
        [178, 209, 212],
        [180, 211, 216],
        [183, 214, 219]],

       [[ 47,  79,  58],
        [ 51,  82,  64],
        [ 55,  88,  69],
        ..., 
        [175, 208, 213],
        [178, 211, 216],
        [179, 212, 217]],

       [[ 57,  85,  63],
        [ 58,  87,  67],
        [ 58,  87,  67],
        ..., 
        [173, 208, 212],
        [177, 212, 216],
        [178, 213,

# Some formatting?

Not sure yet. Maybe add Finnish names to some classes, then subset the dataset to use only those?
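
One possible way to do the subsetting, sketched under the assumption that the chosen classes are available as a list of Latin names (the layout of `test_labels.csv` is not shown here, so the helper below takes a plain list, and its name is hypothetical):

```python
import pandas as pd


def subset_by_latin_names(img_df, latin_names):
    """Keep only the image rows whose name_latin is in latin_names (case-insensitive)."""
    wanted = {name.strip().lower() for name in latin_names}
    mask = img_df['name_latin'].str.strip().str.lower().isin(wanted)
    return img_df[mask].reset_index(drop=True)


# Stand-in for mushroom_imgs, using the sample rows from the image metadata.
sample_imgs = pd.DataFrame({
    'name_img': [
        'tylopilus_felleus1.jpg',
        'albatrellus_ovinus0.jpg',
        'albatrellus_ovinus1.jpg',
    ],
    'name_latin': [
        'Tylopilus felleus',
        'Albatrellus ovinus',
        'Albatrellus ovinus',
    ],
})
subset = subset_by_latin_names(sample_imgs, ['albatrellus ovinus'])
```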

In [8]:
#[row for row in test_labels.itertuples()]
#print(test_labels[1:2])
#dum = {row[1].lower().replace(' ', '_') for row in test_labels.itertuples()}
#print(dum)
#'http://www.mushroom.world/data/fungi/Cantharelluscibarius1.JPG'[-3:].lower()

mushroom_imgs.loc[1:3]

| | file_path | img_url | name_img | name_latin |
|---|---|---|---|---|
| 1 | mushroom_img/tylopilus_felleus1.jpg | http://www.mushroom.world/data/fungi/Tylopilus... | tylopilus_felleus1.jpg | Tylopilus felleus |
| 2 | mushroom_img/albatrellus_ovinus0.jpg | http://www.mushroom.world/data/fungi/Albatrell... | albatrellus_ovinus0.jpg | Albatrellus ovinus |
| 3 | mushroom_img/albatrellus_ovinus1.jpg | http://www.mushroom.world/data/fungi/Albatrell... | albatrellus_ovinus1.jpg | Albatrellus ovinus |
