## Flowers Recognition

In this notebook we solve an image classification task using deep learning (DL). The data set is available on Kaggle: https://www.kaggle.com/alxmamaev/flowers-recognition

## Context:
Total number of images: 4242 (the copy downloaded in this notebook contains 4323).

The pictures are divided into five classes: chamomile (stored in the `daisy` folder), tulip, rose, sunflower, and dandelion. There are about 800 photos per class. The photos are not high resolution, about 320x240 pixels, and they are not resized to a single size, so their proportions differ.

## Goal:
To build a DL model that recognises the flower in each image as accurately as possible.

## 1. Importing the data
### 1.1. Preparing the environment and importing libraries

In [0]:
try:
    %tensorflow_version 2.x
except Exception:
    pass

TensorFlow 2.x selected.


In [0]:
!pip install -q -U --pre efficientnet

In [0]:
!pip install -q toai==0.1.19

In [0]:
__import__('toai').__version__

'0.1.19'

In [0]:
import os

In [0]:
from toai.imports import *
from toai.data import Dataset, DataParams, DataContainer, split_df
from toai.models import save_keras_model, load_keras_model
from toai.metrics import sparse_top_2_categorical_accuracy
from toai.image import (
    ImageLearner,
    ImageAugmentor,
    ImageDataset,
    ImageParser,
    ImageResizer,
    LearningRateFinder,
    ImageTrainingStep,
    ImageTrainer,
)
from toai.utils import download_file, unzip, save_file, load_file
import tensorflow as tf
from tensorflow import keras
import efficientnet.tfkeras as efn



In [0]:
from typing import *

In [0]:
tf.__version__

'2.0.0-rc2'

In [0]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## 2. Data preparation and analysis

In [0]:
DATA_DIR = Path("data/flowers2")
TEMP_DIR = Path('drive/My Drive/Kiti/AI/Projects/project9_flowers2')
DATA_DIR.mkdir(parents=True, exist_ok=True)
TEMP_DIR.mkdir(parents=True, exist_ok=True)

In [0]:
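def setup_kaggle():
    # kaggle.json must be uploaded to the working directory first.
    x = !ls kaggle.json
    assert x == ['kaggle.json'], 'Upload kaggle.json'
    # -p avoids the "File exists" error when the directory is already there.
    !mkdir -p /root/.kaggle
    !mv kaggle.json /root/.kaggle
    # The Kaggle CLI expects the credentials file to be readable by the owner only.
    !chmod 600 /root/.kaggle/kaggle.json

setup_kaggle()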


In [0]:
!kaggle datasets download -q --unzip alxmamaev/flowers-recognition -p {str(DATA_DIR)}

In [0]:
IMG_DIMS = (299, 299, 3)

In [0]:
IMG_DIMS

(299, 299, 3)

We build a data frame of image paths and their labels to prepare the data for training.

In [0]:
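def make_df_from_dir(path):
    """Collect (label, image path) pairs from a directory with one sub-folder per class."""
    data = {
        'labels': [],
        'image': [],
    }
    for label in os.listdir(path):
        for image_name in os.listdir(path/label):
            image_path = path/label/image_name
            try:
                # Keep only files that PIL can open; the context manager closes the handle.
                with Image.open(str(image_path)):
                    pass
            except Exception:
                # Skip anything that is not a readable image.
                continue
            data['labels'].append(label)
            data['image'].append(str(image_path))
    return pd.DataFrame(data)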

In [0]:
full_df = make_df_from_dir(DATA_DIR/'flowers')

In [0]:
full_df.head().T

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| labels | rose | rose | rose | rose | rose |
| image | data/flowers2/flowers/rose/4675532860_890504a4... | data/flowers2/flowers/rose/15202632426_d88efb3... | data/flowers2/flowers/rose/7461896668_cfef58f8... | data/flowers2/flowers/rose/174109630_3c544b8a2... | data/flowers2/flowers/rose/5001852101_877cb2ae... |


In [0]:
full_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4323 entries, 0 to 4322
Data columns (total 2 columns):
labels    4323 non-null object
image     4323 non-null object
dtypes: object(2)
memory usage: 67.6+ KB


In [0]:
full_df.labels.unique()

array(['rose', 'dandelion', 'daisy', 'sunflower', 'tulip'], dtype=object)

In [0]:
full_df.to_csv(TEMP_DIR/'full_df.csv', index=False)

In [0]:
full_df = pd.read_csv(TEMP_DIR/'full_df.csv')

Image distribution by label

In [0]:
full_df.groupby('labels')['image'].nunique()

labels
daisy         769
dandelion    1052
rose          784
sunflower     734
tulip         984
Name: image, dtype: int64
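The counts above show a mild class imbalance. As a minimal sketch in plain Python (the numbers are copied from the output above, nothing is read from disk), they can be turned into per-class shares and a largest-to-smallest ratio:

```python
# Class counts copied from the groupby output above.
counts = {
    "daisy": 769,
    "dandelion": 1052,
    "rose": 784,
    "sunflower": 734,
    "tulip": 984,
}

total = sum(counts.values())

# Share of each class in the full data set, as a percentage.
shares = {name: 100 * n / total for name, n in counts.items()}

# Ratio between the largest and the smallest class.
imbalance_ratio = max(counts.values()) / min(counts.values())

for name, share in shares.items():
    print(f"{name:<10} {counts[name]:>5}  {share:5.1f}%")
print(f"total: {total}, largest/smallest: {imbalance_ratio:.2f}")
```

The largest class (dandelion) is less than 1.5 times the size of the smallest (sunflower), so plain accuracy remains a reasonable headline metric here.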

In [0]:
# import matplotlib.pyplot as plt

In [0]:
plt.style.use('seaborn')
sns.countplot(full_df['labels'].sort_values(), data=full_df)
plt.xticks(rotation=90)
plt.title('All images distribution by labels', fontsize=10)

Text(0.5, 1.0, 'All images distribution by labels')

## 3. Data preprocessing

In [0]:
target_col = 'labels'
image_path_col = 'image'

In [0]:
train_data, valid_data, test_data = ImageDataset.split(
    dataset=ImageDataset.from_dataframe(full_df, x_col=image_path_col, y_col=target_col),
    fracs=(0.8, 0.1, 0.1),
)

In [0]:
train_image_dataset = (
    train_data
    .dataset(batch_size=32, img_dims=IMG_DIMS, shuffle=True)
    .make_pipeline(
        image_pipeline=[
            ImageParser(),
            ImageResizer(img_dims=IMG_DIMS, resize="stretch"),
            ImageAugmentor(level=3, flips="both"),
        ],
    )
    .save_pipeline(TEMP_DIR/"train")
    .preprocess()
)
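The source photos have different proportions, while the network expects a square 299x299 input. The sketch below assumes that `resize="stretch"` scales width and height independently, without cropping or padding (an assumption about `ImageResizer`, not something confirmed here), and shows how much a typical 320x240 photo would be distorted. `stretch_factors` is a hypothetical helper written for this illustration.

```python
def stretch_factors(src_w, src_h, dst_w, dst_h):
    """Return per-axis scale factors and the resulting aspect-ratio distortion.

    Assumes "stretch" means resizing width and height independently,
    without cropping or padding (an assumption about ImageResizer).
    """
    scale_x = dst_w / src_w
    scale_y = dst_h / src_h
    # 1.0 means the aspect ratio is preserved.
    distortion = max(scale_x, scale_y) / min(scale_x, scale_y)
    return scale_x, scale_y, distortion


# A typical photo size from the Context section, mapped onto IMG_DIMS.
sx, sy, d = stretch_factors(320, 240, 299, 299)
print(f"x scale: {sx:.3f}, y scale: {sy:.3f}, distortion: {d:.3f}")
```

Under that assumption a 4:3 photo is squeezed horizontally and stretched vertically, which keeps the whole flower in the frame at the cost of changing its shape.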

In [0]:
valid_image_dataset = (
    valid_data
    .dataset(batch_size=32, img_dims=IMG_DIMS, shuffle=False)
    .make_pipeline(
        label_map=train_image_dataset.label_map,
        image_pipeline=[
            ImageParser(),
            ImageResizer(img_dims=IMG_DIMS, resize="stretch"),
        ],
    )
    .save_pipeline(TEMP_DIR/"pred")
    .preprocess()
)

In [0]:
test_image_dataset = (
    test_data
    .dataset(batch_size=32, img_dims=IMG_DIMS, shuffle=False)
    .load_pipeline(TEMP_DIR/"pred")
    .preprocess()
)

In [0]:
train_image_dataset.label_map

{'daisy': 0, 'dandelion': 1, 'rose': 2, 'sunflower': 3, 'tulip': 4}

In [0]:
test_image_dataset.label_map

{'daisy': 0, 'dandelion': 1, 'rose': 2, 'sunflower': 3, 'tulip': 4}
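Both label maps are identical because the validation pipeline reuses `train_image_dataset.label_map` and the test data set loads that saved pipeline. How `toai` builds the map internally is not shown in this notebook; the sketch below assumes codes are assigned in alphabetical order of the label names, which reproduces the map printed above.

```python
# Labels in the order returned by full_df.labels.unique() earlier in the notebook.
labels = ["rose", "dandelion", "daisy", "sunflower", "tulip"]

# Assumption: class codes are assigned in alphabetical order of the label names.
label_map = {name: code for code, name in enumerate(sorted(set(labels)))}

# Inverse map, useful for turning predicted codes back into names.
inverse_label_map = {code: name for name, code in label_map.items()}

print(label_map)
print(inverse_label_map)
```

Sharing one map across the splits matters: if each split built its own map from the labels it happens to contain, the same code could stand for different flowers.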

In [0]:
len(train_image_dataset)

3459

In [0]:
len(test_image_dataset)

431
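The validation size is not printed, but it can be inferred from the other two. A small sketch with the numbers shown above checks that the three parts follow the requested `fracs=(0.8, 0.1, 0.1)`:

```python
n_total = 4323  # rows in full_df
n_train = 3459  # len(train_image_dataset)
n_test = 431    # len(test_image_dataset)

# The validation size is not printed above; it is inferred as the remainder.
n_valid = n_total - n_train - n_test

for name, n in [("train", n_train), ("valid", n_valid), ("test", n_test)]:
    print(f"{name:<5} {n:>5}  {n / n_total:.4f}")
```

The inferred 433 validation images match the total support in the classification report printed later in the notebook.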

In [0]:
data_container = DataContainer(
    train=train_image_dataset,
    validation=valid_image_dataset,
    test=test_image_dataset,
)

In [0]:
data_container.train.show()

In [0]:
data_container.validation.show()

In [0]:
data_container.test.show()

## 4. Training the Models

In [0]:
learner = ImageLearner(
    path=TEMP_DIR/"xception_v1",
    base_model=keras.applications.Xception,
    input_shape=IMG_DIMS,
    output_shape=[data_container.train.n_classes],
    activation=keras.activations.softmax,
    loss=keras.losses.sparse_categorical_crossentropy,
    metrics=[keras.metrics.sparse_categorical_accuracy, sparse_top_2_categorical_accuracy],
    dropout=0.5,
    l1=3e-6,
    l2=3e-5,
    override=True,
)

In [0]:
steps = [
    ImageTrainingStep(
        n_epochs=5,
        lr=3e-4,
        optimizer=keras.optimizers.Adam,
        freeze=True,
        feature_pipeline=[
            ImageParser(),
            ImageResizer(img_dims=IMG_DIMS, resize="stretch"),
            ImageAugmentor(level=1),
        ],
    ),
    ImageTrainingStep(
        n_epochs=5,
        lr=3e-5,
        optimizer=keras.optimizers.Adam,
        feature_pipeline=[
            ImageParser(),
            ImageResizer(img_dims=IMG_DIMS, resize="stretch"),
            ImageAugmentor(level=5, flips="both"),
        ],
    ),
    ImageTrainingStep(
        n_epochs=5,
        lr=3e-5,
        optimizer=keras.optimizers.Adam,
        feature_pipeline=[
            ImageParser(),
            ImageResizer(img_dims=IMG_DIMS, resize="stretch"),
            ImageAugmentor(level=3, flips="both"),
        ],
    ),
    ImageTrainingStep(
        n_epochs=5,
        lr=1e-5,
        optimizer=keras.optimizers.SGD,
        feature_pipeline=[
            ImageParser(),
            ImageResizer(img_dims=IMG_DIMS, resize="stretch"),
            ImageAugmentor(level=1),
        ],
    ),
]

In [0]:
trainer = ImageTrainer(
    learner=learner,
    data_container=data_container,
    steps=steps,
)

In [0]:
trainer.train()

Train for 109 steps, validate for 14 steps
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Train for 109 steps, validate for 14 steps
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Train for 109 steps, validate for 14 steps
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Train for 109 steps, validate for 14 steps
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
--------------------------------------------------------------------------------
Name: xception Train Time: 79.9 min. Eval Time: 7.85s Loss: 0.2130 Accuracy: 93.97%
--------------------------------------------------------------------------------
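The "109 steps" and "14 steps" in the log follow from the data set sizes and the batch size of 32. A minimal sketch, assuming one epoch is one pass over the data with a partial last batch (the validation size of 433 is inferred as 4323 - 3459 - 431):

```python
import math

BATCH_SIZE = 32  # batch_size passed to .dataset(...) in the preprocessing cells


def steps_per_epoch(n_samples, batch_size=BATCH_SIZE):
    """Number of batches needed to see every sample once (last batch may be partial)."""
    return math.ceil(n_samples / batch_size)


train_steps = steps_per_epoch(3459)  # len(train_image_dataset)
valid_steps = steps_per_epoch(433)   # inferred validation size
print(train_steps, valid_steps)
```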


In [0]:
trainer.evaluate_dataset()



[0.21296972382281507, 0.93973213, 0.96875]
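Given the `metrics` list passed to `ImageLearner`, the three numbers are most likely the loss, the top-1 accuracy, and the top-2 accuracy, in that order (this assumes `evaluate_dataset()` returns values in the usual Keras order of loss followed by metrics). The sketch below illustrates top-k accuracy in plain Python; `top_k_accuracy` is a hypothetical helper, and the three rows with their true labels are copied from the `predict_dataset()` and `analyse_dataset()` outputs shown further down.

```python
def top_k_accuracy(prob_rows, labels, k=1):
    """Fraction of rows whose true label is among the k highest-scoring classes."""
    hits = 0
    for probs, label in zip(prob_rows, labels):
        # Class indices sorted by descending probability.
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Three probability rows and their true label codes.
prob_rows = [
    [6.3811064e-01, 2.6506661e-03, 7.2437629e-02, 1.7793449e-02, 2.6900768e-01],
    [1.4747172e-02, 9.8232102e-01, 2.5946329e-07, 9.5742289e-04, 1.9741512e-03],
    [1.8551377e-09, 9.9997878e-01, 8.6394891e-07, 2.0341544e-05, 4.2835985e-08],
]
labels = [0, 0, 1]  # daisy, daisy, dandelion

top1 = top_k_accuracy(prob_rows, labels, k=1)
top2 = top_k_accuracy(prob_rows, labels, k=2)
print(top1, top2)
```

The second row is a daisy predicted as a dandelion: it counts as a top-1 miss but a top-2 hit, because daisy is the model's second choice.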

In [0]:
ds_analysis = trainer.analyse_dataset()

In [0]:
print(classification_report(ds_analysis["label"].values, ds_analysis["pred"].values))

              precision    recall  f1-score   support

       daisy       0.97      0.91      0.94        81
   dandelion       0.94      0.98      0.96       113
        rose       0.91      0.95      0.93        63
   sunflower       0.96      0.93      0.94        80
       tulip       0.92      0.92      0.92        96

    accuracy                           0.94       433
   macro avg       0.94      0.94      0.94       433
weighted avg       0.94      0.94      0.94       433
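The f1-score column and the two averages can be recomputed from the precision, recall, and support columns. The sketch below uses the rounded values from the report, so it only reproduces the report to two decimals:

```python
# (precision, recall, support) per class, copied from the report above.
report = {
    "daisy": (0.97, 0.91, 81),
    "dandelion": (0.94, 0.98, 113),
    "rose": (0.91, 0.95, 63),
    "sunflower": (0.96, 0.93, 80),
    "tulip": (0.92, 0.92, 96),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1 = {name: f1_score(p, r) for name, (p, r, _) in report.items()}
total_support = sum(s for _, _, s in report.values())

# Macro average: every class counts equally.
macro_f1 = sum(f1.values()) / len(f1)
# Weighted average: every class counts in proportion to its support.
weighted_f1 = sum(f1[name] * s for name, (_, _, s) in report.items()) / total_support

for name, value in f1.items():
    print(f"{name:<10} {value:.2f}")
print(f"macro: {macro_f1:.2f}, weighted: {weighted_f1:.2f}")
```

Macro and weighted averages agree here because the classes are of similar size and similar difficulty.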



In [0]:
trainer.predict_dataset()

array([[6.3811064e-01, 2.6506661e-03, 7.2437629e-02, 1.7793449e-02,
        2.6900768e-01],
       [1.4747172e-02, 9.8232102e-01, 2.5946329e-07, 9.5742289e-04,
        1.9741512e-03],
       [1.8551377e-09, 9.9997878e-01, 8.6394891e-07, 2.0341544e-05,
        4.2835985e-08],
       ...,
       [1.6146160e-08, 2.0900940e-08, 8.6052543e-10, 1.0663957e-09,
        1.0000000e+00],
       [1.0352473e-09, 9.7991233e-08, 1.3919524e-04, 1.3943089e-08,
        9.9986064e-01],
       [9.6148205e-01, 3.2867290e-02, 7.4837313e-05, 1.7264688e-04,
        5.4031559e-03]], dtype=float32)
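Each row of the array holds one probability per class, in label-map order. A minimal sketch of turning such rows back into label names (`decode` is a hypothetical helper; the label map is the one printed in the preprocessing section):

```python
label_map = {"daisy": 0, "dandelion": 1, "rose": 2, "sunflower": 3, "tulip": 4}
code_to_name = {code: name for name, code in label_map.items()}


def decode(prob_rows):
    """Return (predicted label, its probability) for each row of class probabilities."""
    decoded = []
    for probs in prob_rows:
        # Index of the highest probability is the predicted class code.
        best = max(range(len(probs)), key=lambda i: probs[i])
        decoded.append((code_to_name[best], probs[best]))
    return decoded


# First two and last two visible rows of the array printed above.
prob_rows = [
    [6.3811064e-01, 2.6506661e-03, 7.2437629e-02, 1.7793449e-02, 2.6900768e-01],
    [1.4747172e-02, 9.8232102e-01, 2.5946329e-07, 9.5742289e-04, 1.9741512e-03],
    [1.0352473e-09, 9.7991233e-08, 1.3919524e-04, 1.3943089e-08, 9.9986064e-01],
    [9.6148205e-01, 3.2867290e-02, 7.4837313e-05, 1.7264688e-04, 5.4031559e-03],
]

predictions = decode(prob_rows)
for name, prob in predictions:
    print(f"{name:<10} {prob:.4f}")
```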

In [0]:
trainer.analyse_dataset()

| | path | image | label | label_code | pred | pred_code | label_probs | pred_probs |
|---|---|---|---|---|---|---|---|---|
| 0 | data/flowers2/flowers/daisy/4581199679_867652c... | [[[0.92549026, 0.8862746, 0.5019608], [0.92549... | daisy | 0 | daisy | 0 | 0.638113 | 0.638113 |
| 1 | data/flowers2/flowers/daisy/33830843653_ee6d79... | [[[0.121568635, 0.13333334, 0.027450982], [0.1... | daisy | 0 | dandelion | 1 | 0.014747 | 0.982321 |
| 2 | data/flowers2/flowers/dandelion/6412422565_ce6... | [[[0.5254902, 0.6, 0.24705884], [0.5724245, 0.... | dandelion | 1 | dandelion | 1 | 0.999979 | 0.999979 |
| 3 | data/flowers2/flowers/tulip/20910465721_fd8dcc... | [[[0.38823533, 0.454902, 0.32156864], [0.38080... | tulip | 4 | tulip | 4 | 0.999970 | 0.999970 |
| 4 | data/flowers2/flowers/tulip/2436998042_4906ea0... | [[[0.0014748488, 0.0014748488, 0.0013181193], ... | tulip | 4 | tulip | 4 | 0.999140 | 0.999140 |
| 5 | data/flowers2/flowers/tulip/3498663243_42b39b4... | [[[0.6784314, 0.8470589, 0.9843138], [0.680011... | tulip | 4 | tulip | 4 | 0.999658 | 0.999658 |
| 6 | data/flowers2/flowers/dandelion/18010259565_d6... | [[[0.32507706, 0.41919473, 0.2937045], [0.3128... | dandelion | 1 | dandelion | 1 | 0.999917 | 0.999917 |
| 7 | data/flowers2/flowers/dandelion/22190242684_8c... | [[[0.0, 0.33680898, 0.34073055], [0.0, 0.33699... | dandelion | 1 | dandelion | 1 | 0.999836 | 0.999836 |
| 8 | data/flowers2/flowers/sunflower/20753711039_0b... | [[[0.5786019, 0.38740906, 0.0842088], [0.53643... | sunflower | 3 | sunflower | 3 | 0.999544 | 0.999544 |
| 9 | data/flowers2/flowers/dandelion/5598591979_ed9... | [[[0.3276215, 0.3253525, 0.48083812], [0.27595... | dandelion | 1 | dandelion | 1 | 0.999996 | 0.999996 |


In [0]:
trainer.show_predictions()

In [0]:
trainer.show_predictions(correct=True)

In [0]:
steps_resnet50 = [
    ImageTrainingStep(
        n_epochs=3,
        lr=3e-4,
        optimizer=keras.optimizers.Adam,
        freeze=True,
        feature_pipeline=[
            ImageParser(),
            ImageResizer(img_dims=IMG_DIMS, resize="stretch"),
            ImageAugmentor(level=1),
        ],
    ),
    ImageTrainingStep(
        n_epochs=5,
        lr=3e-4,
        optimizer=keras.optimizers.Adam,
        feature_pipeline=[
            ImageParser(),
            ImageResizer(img_dims=IMG_DIMS, resize="stretch"),
            ImageAugmentor(level=3, flips="both"),
        ],
    ),
    ImageTrainingStep(
        n_epochs=5,
        lr=3e-5,
        optimizer=keras.optimizers.Adam,
        feature_pipeline=[
            ImageParser(),
            ImageResizer(img_dims=IMG_DIMS, resize="stretch"),
            ImageAugmentor(level=3, flips="both"),
        ],
    ),
    ImageTrainingStep(
        n_epochs=3,
        lr=1e-5,
        optimizer=keras.optimizers.SGD,
        feature_pipeline=[
            ImageParser(),
            ImageResizer(img_dims=IMG_DIMS, resize="stretch"),
            ImageAugmentor(level=1),
        ],
    ),
]
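To make the differences between the two training schedules explicit, the sketch below mirrors both step lists with a plain dataclass and reports which fields differ per stage. `Stage` and `diff_schedules` are hypothetical helpers for illustration, not part of `toai`; `freeze=False` and `flips=None` simply stand for "argument not given" in the cells above, since the library's real defaults are not shown here.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class Stage:
    """Plain stand-in for one ImageTrainingStep (hypothetical, not part of toai)."""
    n_epochs: int
    lr: float
    optimizer: str
    freeze: bool            # False here means "freeze was not passed"
    augment_level: int
    flips: Optional[str]    # None here means "flips was not passed"


xception = [
    Stage(5, 3e-4, "Adam", True, 1, None),
    Stage(5, 3e-5, "Adam", False, 5, "both"),
    Stage(5, 3e-5, "Adam", False, 3, "both"),
    Stage(5, 1e-5, "SGD", False, 1, None),
]
resnet50 = [
    Stage(3, 3e-4, "Adam", True, 1, None),
    Stage(5, 3e-4, "Adam", False, 3, "both"),
    Stage(5, 3e-5, "Adam", False, 3, "both"),
    Stage(3, 1e-5, "SGD", False, 1, None),
]


def diff_schedules(a, b):
    """Map stage number (1-based) to the fields whose values differ between a and b."""
    diffs = {}
    for i, (stage_a, stage_b) in enumerate(zip(a, b), start=1):
        fields_a, fields_b = asdict(stage_a), asdict(stage_b)
        changed = {k: (v, fields_b[k]) for k, v in fields_a.items() if v != fields_b[k]}
        if changed:
            diffs[i] = changed
    return diffs


total_xception = sum(s.n_epochs for s in xception)
total_resnet50 = sum(s.n_epochs for s in resnet50)
diffs = diff_schedules(xception, resnet50)

print(total_xception, total_resnet50)
for stage, changed in diffs.items():
    print(stage, changed)
```

The schedules differ in the epoch counts of the first and last stages and, in the second stage, in both the learning rate and the augmentation level.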

In [0]:
learner_resnet50 = ImageLearner(
    path=TEMP_DIR/"resnet50_v1",
    base_model=keras.applications.ResNet50,
    input_shape=IMG_DIMS,
    output_shape=[data_container.train.n_classes],
    activation=keras.activations.softmax,
    loss=keras.losses.sparse_categorical_crossentropy,
    metrics=[keras.metrics.sparse_categorical_accuracy, sparse_top_2_categorical_accuracy],
    dropout=0.5,
    l1=3e-6,
    l2=3e-5,
    override=True,
)

In [0]:
trainer_resnet50 = ImageTrainer(
    learner=learner_resnet50,
    data_container=data_container,
    steps=steps_resnet50,
)

In [0]:
trainer_resnet50.train()

Train for 109 steps, validate for 14 steps
Epoch 1/3
Epoch 2/3
Epoch 3/3
Train for 109 steps, validate for 14 steps
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Train for 109 steps, validate for 14 steps
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Train for 109 steps, validate for 14 steps
Epoch 1/3
Epoch 2/3
Epoch 3/3
--------------------------------------------------------------------------------
Name: resnet50 Train Time: 43.5 min. Eval Time: 5.68s Loss: 0.1463 Accuracy: 95.09%
--------------------------------------------------------------------------------


In [0]:
trainer_resnet50.evaluate_dataset()



[0.1462951713640775, 0.95089287, 0.984375]

In [0]:
ds_analysis_resnet50 = trainer_resnet50.analyse_dataset()

In [0]:
# Use the ResNet50 analysis here; ds_analysis holds the Xception predictions.
print(classification_report(ds_analysis_resnet50["label"].values, ds_analysis_resnet50["pred"].values))

Note: the output saved with this notebook was computed from `ds_analysis`, so it repeated the Xception report shown earlier; the ResNet50 report is not recorded here.



In [0]:
trainer_resnet50.predict_dataset()

array([[2.2958279e-01, 5.0620534e-03, 1.5814045e-02, 9.0595946e-04,
        7.4863511e-01],
       [1.6908677e-01, 8.3082581e-01, 5.1946984e-07, 3.7806731e-05,
        4.9108370e-05],
       [6.0218208e-06, 9.9998546e-01, 8.0363407e-07, 7.2425983e-06,
        5.3555988e-07],
       ...,
       [1.3781066e-07, 8.2751157e-09, 4.5815714e-06, 7.8002223e-08,
        9.9999523e-01],
       [7.1511913e-10, 3.7459645e-12, 1.1536716e-06, 4.8261519e-08,
        9.9999881e-01],
       [1.0000000e+00, 7.5191513e-13, 9.8632207e-11, 1.5934265e-12,
        6.1544596e-11]], dtype=float32)

In [0]:
trainer_resnet50.analyse_dataset()

| | path | image | label | label_code | pred | pred_code | label_probs | pred_probs |
|---|---|---|---|---|---|---|---|---|
| 0 | data/flowers2/flowers/daisy/4581199679_867652c... | [[[0.92549026, 0.8862746, 0.5019608], [0.92549... | daisy | 0 | tulip | 4 | 0.229583 | 0.748635 |
| 1 | data/flowers2/flowers/daisy/33830843653_ee6d79... | [[[0.121568635, 0.13333334, 0.027450982], [0.1... | daisy | 0 | dandelion | 1 | 0.169086 | 0.830826 |
| 2 | data/flowers2/flowers/dandelion/6412422565_ce6... | [[[0.5254902, 0.6, 0.24705884], [0.5724245, 0.... | dandelion | 1 | dandelion | 1 | 0.999985 | 0.999985 |
| 3 | data/flowers2/flowers/tulip/20910465721_fd8dcc... | [[[0.38823533, 0.454902, 0.32156864], [0.38080... | tulip | 4 | tulip | 4 | 0.999997 | 0.999997 |
| 4 | data/flowers2/flowers/tulip/2436998042_4906ea0... | [[[0.0014748488, 0.0014748488, 0.0013181193], ... | tulip | 4 | tulip | 4 | 0.999994 | 0.999994 |
| 5 | data/flowers2/flowers/tulip/3498663243_42b39b4... | [[[0.6784314, 0.8470589, 0.9843138], [0.680011... | tulip | 4 | tulip | 4 | 0.993852 | 0.993852 |
| 6 | data/flowers2/flowers/dandelion/18010259565_d6... | [[[0.32507706, 0.41919473, 0.2937045], [0.3128... | dandelion | 1 | dandelion | 1 | 1.000000 | 1.000000 |
| 7 | data/flowers2/flowers/dandelion/22190242684_8c... | [[[0.0, 0.33680898, 0.34073055], [0.0, 0.33699... | dandelion | 1 | dandelion | 1 | 1.000000 | 1.000000 |
| 8 | data/flowers2/flowers/sunflower/20753711039_0b... | [[[0.5786019, 0.38740906, 0.0842088], [0.53643... | sunflower | 3 | sunflower | 3 | 1.000000 | 1.000000 |
| 9 | data/flowers2/flowers/dandelion/5598591979_ed9... | [[[0.3276215, 0.3253525, 0.48083812], [0.27595... | dandelion | 1 | dandelion | 1 | 1.000000 | 1.000000 |


In [0]:
trainer_resnet50.show_predictions()

In [0]:
trainer_resnet50.show_predictions(correct=True)

## Summary:

**Accuracy:**

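Xception - 93.97 %

ResNet50 - 95.09 %

ResNet50 gives the better result. A likely reason is the lower augmentation level in its second training step (level 3 instead of 5), although that step also uses a higher learning rate (3e-4 instead of 3e-5) and the ResNet50 schedule is shorter (16 epochs instead of 20), so the gain cannot be attributed to augmentation alone.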
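To put the accuracy gap into perspective, the sketch below converts the two accuracies returned by `evaluate_dataset()` into counts of correctly classified images. It assumes the evaluation ran over 14 full batches of 32 images (the "validate for 14 steps" in the training logs), which is an inference rather than something the notebook states.

```python
BATCH_SIZE = 32
EVAL_STEPS = 14  # "validate for 14 steps" in the training logs

# Assumption: evaluation ran over 14 full batches, i.e. 448 images.
n_eval = BATCH_SIZE * EVAL_STEPS

# Accuracies copied from the two evaluate_dataset() outputs.
accuracy = {"Xception": 0.93973213, "ResNet50": 0.95089287}

# Number of correctly classified images implied by each accuracy.
correct = {name: round(acc * n_eval) for name, acc in accuracy.items()}

gap_images = correct["ResNet50"] - correct["Xception"]
gap_points = 100 * (accuracy["ResNet50"] - accuracy["Xception"])

print(correct)
print(f"gap: {gap_images} images, {gap_points:.2f} percentage points")
```

Under that assumption the two models differ by only a handful of images, so the ranking could change with a different split or random seed.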