# Plant Pathology 2020 - FGVC7

dataset - https://www.kaggle.com/c/plant-pathology-2020-fgvc7/data

<center>
<img src="leaf.jpg" width="800" height="800">

Given a photo of an apple leaf, can you accurately assess its health? This competition will challenge you to distinguish between leaves which are healthy, those which are infected with apple rust, those that have apple scab, and those with more than one disease.

Files

### train.csv
* image_id: the image identifier; it matches the file name in the images folder (e.g. Train_0 for Train_0.jpg)
* multiple_diseases: one of the target labels
* healthy: one of the target labels
* rust: one of the target labels
* scab: one of the target labels

### images
A folder containing the train and test images, in jpg format.

### test.csv
* image_id: the image identifier; it matches the file name in the images folder

### sample_submission.csv
* image_id: the image identifier; it matches the file name in the images folder
* multiple_diseases: one of the target labels
* healthy: one of the target labels
* rust: one of the target labels
* scab: one of the target labels

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [3]:
from tensorflow.python.keras.preprocessing.image import ImageDataGenerator
from tensorflow.python.keras.models import Sequential, model_from_json, load_model
from tensorflow.python.keras.layers import Conv2D, MaxPooling2D
from tensorflow.python.keras.layers import Activation, Dropout, Flatten, Dense
from tensorflow.python.keras import optimizers

In [4]:
!cat sample_submission.csv

image_id,healthy,multiple_diseases,rust,scab
Test_0,0.25,0.25,0.25,0.25
Test_1,0.25,0.25,0.25,0.25
Test_2,0.25,0.25,0.25,0.25
Test_3,0.25,0.25,0.25,0.25
Test_4,0.25,0.25,0.25,0.25
Test_5,0.25,0.25,0.25,0.25
Test_6,0.25,0.25,0.25,0.25
Test_7,0.25,0.25,0.25,0.25
Test_8,0.25,0.25,0.25,0.25
Test_9,0.25,0.25,0.25,0.25
Test_10,0.25,0.25,0.25,0.25
Test_11,0.25,0.25,0.25,0.25
Test_12,0.25,0.25,0.25,0.25
Test_13,0.25,0.25,0.25,0.25
Test_14,0.25,0.25,0.25,0.25
Test_15,0.25,0.25,0.25,0.25
Test_16,0.25,0.25,0.25,0.25
Test_17,0.25,0.25,0.25,0.25
Test_18,0.25,0.25,0.25,0.25
Test_19,0.25,0.25,0.25,0.25
Test_20,0.25,0.25,0.25,0.25
Test_21,0.25,0.25,0.25,0.25
Test_22,0.25,0.25,0.25,0.25
Test_23,0.25,0.25,0.25,0.25
Test_24,0.25,0.25,0.25,0.25
Test_25,0.25,0.25,0.25,0.25
Test_26,0.25,0.25,0.25,0.25
Test_27,0.25,0.25,0.25,0.25
Test_28,0.25,0.25,0.25,0.25
Test_29,0.25,0.25,0.25,0.25
Test_30,0.25,0.25,0.25,0.25
Test_31,0.25,0.25,0.25,0.25
Test_32,0.25,0.25,0.25,0.25
Test_33

In [5]:
test = pd.read_csv('test.csv')
train = pd.read_csv('train.csv')

In [6]:
test.shape

(1821, 1)

In [7]:
test.head()

Unnamed: 0,image_id
0,Test_0
1,Test_1
2,Test_2
3,Test_3
4,Test_4


In [8]:
train.shape

(1821, 5)

In [9]:
train.head()

Unnamed: 0,image_id,healthy,multiple_diseases,rust,scab
0,Train_0,0,0,0,1
1,Train_1,0,1,0,0
2,Train_2,1,0,0,0
3,Train_3,0,0,1,0
4,Train_4,1,0,0,0


<center> 

## Train_0 (Scab)
<img src="Train_0.jpg" width="600" height="600">

## Train_1 (Multiple_diseases)
<img src="Train_1.jpg" width="600" height="600">

## Train_2 (Healthy)
<img src="Train_2.jpg" width="600" height="600">

## Train_3 (Rust)
<img src="Train_3.jpg" width="600" height="600">

## Train_4 (Healthy)
<img src="Train_4.jpg" width="600" height="600">

In [10]:
train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1821 entries, 0 to 1820
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   image_id           1821 non-null   object
 1   healthy            1821 non-null   int64 
 2   multiple_diseases  1821 non-null   int64 
 3   rust               1821 non-null   int64 
 4   scab               1821 non-null   int64 
dtypes: int64(4), object(1)
memory usage: 71.3+ KB


In [11]:
train.describe()

Unnamed: 0,healthy,multiple_diseases,rust,scab
count,1821.0,1821.0,1821.0,1821.0
mean,0.283361,0.049973,0.341571,0.325096
std,0.450754,0.217948,0.474367,0.468539
min,0.0,0.0,0.0,0.0
25%,0.0,0.0,0.0,0.0
50%,0.0,0.0,0.0,0.0
75%,1.0,0.0,1.0,1.0
max,1.0,1.0,1.0,1.0


Here `mean` is the fraction of training images in each class, since every label column contains only 0 and 1:

* Healthy leaves - 28.3 %
* Diseased leaves - 71.7 %

Of all training images, 5.0 % have multiple diseases, 34.2 % have rust, and 32.5 % have scab.
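Because each label column holds only 0 or 1, its mean equals the share of images in that class. The following is a minimal sketch that reproduces the percentages from the means printed by `train.describe()`; the variable names are illustrative and not part of the notebook.

```python
# Column means copied from the train.describe() output above.
class_share = {
    "healthy": 0.283361,
    "multiple_diseases": 0.049973,
    "rust": 0.341571,
    "scab": 0.325096,
}

# The shares sum to ~1, consistent with each image carrying a single label.
total = sum(class_share.values())

# Share of diseased leaves = everything that is not "healthy".
diseased_share = sum(v for k, v in class_share.items() if k != "healthy")

for name, share in class_share.items():
    print(f"{name}: {share:.1%}")
print(f"diseased: {diseased_share:.1%}")
```

With the full DataFrame loaded, `train[['healthy', 'multiple_diseases', 'rust', 'scab']].mean()` gives the same per-class shares directly.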

In [12]:
# Directory with training data
train_dir = 'train'

# Directory with validation data
val_dir = 'val'

# Directory with test data
test_dir = 'test'

# Image dimensions
img_width, img_height = 225, 225

# Shape of the input tensor the network receives for each image
# TensorFlow backend, channels_last
input_shape = (img_width, img_height, 3)

# Number of epochs
epochs = 30

# Mini-batch size
batch_size = 32
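`flow_from_directory`, used further down, expects each of the `train`, `val` and `test` directories to contain one subfolder per class. The notebook does not show how these folders were produced from `train.csv`, so the sketch below is only one possible way to derive the target path of an image from its one-hot row. The helper names `label_of` and `destination`, the `.jpg` extension and the folder layout are assumptions made for illustration.

```python
from pathlib import PurePosixPath

LABELS = ["healthy", "multiple_diseases", "rust", "scab"]


def label_of(row):
    """Return the name of the label column that is set to 1 in a one-hot row."""
    active = [name for name in LABELS if int(row[name]) == 1]
    if len(active) != 1:
        raise ValueError(f"expected exactly one label, got {active}")
    return active[0]


def destination(row, split_dir):
    """Build '<split_dir>/<label>/<image_id>.jpg' (hypothetical layout)."""
    return PurePosixPath(split_dir) / label_of(row) / f"{row['image_id']}.jpg"


# First two rows of the train.head() output shown earlier.
rows = [
    {"image_id": "Train_0", "healthy": 0, "multiple_diseases": 0, "rust": 0, "scab": 1},
    {"image_id": "Train_1", "healthy": 0, "multiple_diseases": 1, "rust": 0, "scab": 0},
]
for row in rows:
    print(destination(row, "train"))
```

Copying each image to its computed destination (for example with `shutil.copy`) would then yield the per-class layout that the generators read.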

### Building a convolutional neural network


In [13]:
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())

model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5, seed=0))

model.add(Dense(4))
model.add(Activation('softmax'))

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.


### Network architecture

In [14]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 223, 223, 32)      896       
_________________________________________________________________
activation (Activation)      (None, 223, 223, 32)      0         
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 111, 111, 32)      0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 109, 109, 64)      18496     
_________________________________________________________________
activation_1 (Activation)    (None, 109, 109, 64)      0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 54, 54, 64)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 52, 52, 128)       73856     
__________
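The output shapes and parameter counts in the summary can be checked by hand. `Conv2D` with its default `'valid'` padding and a 3x3 kernel shrinks each spatial side by 2, `MaxPooling2D` with a 2x2 pool halves it (rounding down), and a convolution has `3 * 3 * in_channels * filters` weights plus one bias per filter. The sketch below uses the filter counts printed in the summary (32, 64, 128), which differ from the counts in the model cell above, so it describes the run that produced this output; the helper names are illustrative.

```python
def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool


def conv_params(in_channels, filters, kernel=3):
    """Weights (kernel*kernel*in_channels per filter) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


size, channels = 225, 3
shapes, params = [], []
# Filter counts as printed in the summary above (an assumption about that run).
for filters in (32, 64, 128):
    size = conv_out(size)
    shapes.append(size)
    params.append(conv_params(channels, filters))
    channels = filters
    size = pool_out(size)
    shapes.append(size)

print(shapes)
print(params)
```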

In [15]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

Creating the image generator

The image generator is built on the ImageDataGenerator class. It scales all pixel values by 1/255 and applies random rotations, shifts, shears, zooms and horizontal flips, so the network is trained on modified copies of the images.

In [16]:
image_gen_train = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)
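To make two of these settings concrete, here is a small NumPy sketch of what `rescale=1./255` and `horizontal_flip=True` do to pixel data. It is not how `ImageDataGenerator` is implemented internally; the tiny 2x3 array stands in for an image and is made up for illustration.

```python
import numpy as np

# A tiny 2x3 single-channel "image" with 8-bit pixel values.
image = np.array([[0, 51, 102],
                  [153, 204, 255]], dtype=np.uint8)

# rescale=1./255 multiplies every pixel by 1/255, mapping [0, 255] to [0, 1].
rescaled = image.astype(np.float32) * (1.0 / 255)

# horizontal_flip=True mirrors an image left-to-right (the generator does this at random).
flipped = rescaled[:, ::-1]

print(rescaled)
print(flipped)
```

The remaining settings (`rotation_range`, `width_shift_range`, `shear_range`, `zoom_range`) are random geometric transforms, and `fill_mode='nearest'` fills the pixels they expose with the nearest existing pixel value.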

In [17]:
train_generator = image_gen_train.flow_from_directory(
    train_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    shuffle=True,
    class_mode='categorical')

Found 1273 images belonging to 4 classes.


In [None]:
image_gen_val = ImageDataGenerator(rescale=1./255)

In [18]:
val_generator = image_gen_val.flow_from_directory(
    val_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    shuffle=True,
    class_mode='categorical')

Found 277 images belonging to 4 classes.


In [19]:
test_generator = image_gen_val.flow_from_directory(
    test_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    shuffle=True,
    class_mode='categorical')

Found 271 images belonging to 4 classes.


In [20]:
# Number of training images
nb_train_samples = 1273

# Number of validation images
nb_validation_samples = 277

# Number of test images
nb_test_samples = 271
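These counts are later turned into step counts with `// batch_size`. The sketch below works through that arithmetic for a batch size of 32 and contrasts floor division with rounding up; with floor division, one pass of that many steps does not reach the last partial batch. The dictionary and variable names are illustrative.

```python
import math

batch_size = 32
samples = {"train": 1273, "val": 277, "test": 271}

results = {}
for name, n in samples.items():
    full_batches = n // batch_size            # floor: only complete batches
    all_batches = math.ceil(n / batch_size)   # ceil: includes the last partial batch
    not_covered = n - full_batches * batch_size
    results[name] = (full_batches, all_batches, not_covered)
    print(name, full_batches, all_batches, not_covered)
```

For evaluation in particular, using `math.ceil(nb_test_samples / batch_size)` steps would score every test image once instead of leaving the remainder out.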

Train the model using the generators:
* train_generator - the training data generator
* val_generator (passed as validation_data) - the validation data generator

In [None]:
%%time
history = model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=val_generator,
    validation_steps=nb_validation_samples // batch_size)

Instructions for updating:
Use tf.cast instead.
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30

In [None]:
print(history.history.keys())

In [None]:
# Plot training & validation accuracy values
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper left')
plt.show()

# Plot training & validation loss values
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper left')
plt.show()

Evaluate the network using the test generator

In [None]:
scores = model.evaluate_generator(test_generator, nb_test_samples // batch_size)

In [None]:
print("Accuracy on test data: %.2f%%" % (scores[1]*100))

In [None]:
# Save the network for later use

In [None]:
!rm -f plantPathology_cnn.h5 plantPathology_cnn.json

In [None]:
# Serialize the model architecture to JSON
model_json = model.to_json()

with open('plantPathology_cnn.json', 'w') as json_file:
    # Write the network architecture to the file
    json_file.write(model_json)

# Write the weights to a file
model.save_weights('plantPathology_cnn.h5')

print('Model saved')

In [None]:
from tensorflow.keras.utils import plot_model
plot_model(model, to_file='model.png')


## Transfer Learning

In [220]:
from tensorflow.python.keras.applications import VGG16
from tensorflow.python.keras.optimizers import Adam

In [211]:
vgg16_net = VGG16(weights='imagenet', include_top=False, input_shape=(225, 225, 3))

Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5


"Freeze" the weights of the pretrained VGG16 network

In [212]:
vgg16_net.trainable = False

In [213]:
vgg16_net.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 225, 225, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 225, 225, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 225, 225, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
__________

### Building a composite network on top of VGG16

In [214]:
model = Sequential()

# Add the VGG16 network to the model as if it were a single layer
model.add(vgg16_net)

model.add(Flatten())

model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(0.5))

model.add(Dense(4))
model.add(Activation('softmax'))

In [215]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
vgg16 (Model)                (None, 7, 7, 512)         14714688  
_________________________________________________________________
flatten_8 (Flatten)          (None, 25088)             0         
_________________________________________________________________
dense_16 (Dense)             (None, 256)               6422784   
_________________________________________________________________
activation_60 (Activation)   (None, 256)               0         
_________________________________________________________________
dropout_8 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_17 (Dense)             (None, 4)                 1028      
_________________________________________________________________
activation_61 (Activation)   (None, 4)                 0         
Total para
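The parameter counts of the new classifier head can be verified from the rows shown above. A `Dense` layer has one weight per input-unit pair plus one bias per unit. Assuming the frozen VGG16 base contributes no trainable parameters, only the two `Dense` layers are trained. The helper name below is illustrative.

```python
# Values taken from the summary rows above.
vgg16_params = 14_714_688           # convolutional base, frozen via trainable = False
flatten_units = 7 * 7 * 512         # (None, 7, 7, 512) flattened


def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units


hidden = dense_params(flatten_units, 256)
output = dense_params(256, 4)

# Parameters that are updated during training if the base stays frozen.
trainable = hidden + output
print(flatten_units, hidden, output, trainable)
```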

Compile the composite network

In [221]:
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=1e-5), 
              metrics=['accuracy'])

In [223]:
%%time
history = model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=10,
    validation_data=val_generator,
    validation_steps=nb_validation_samples // batch_size)

Epoch 1/10
 8/40 [=====>........................] - ETA: 6:34 - loss: 1.5128 - acc: 0.3320

KeyboardInterrupt: 

In [None]:
# Plot training & validation accuracy values
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper left')
plt.show()

# Plot training & validation loss values
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper left')
plt.show()

In [None]:
scores = model.evaluate_generator(test_generator, nb_test_samples // batch_size)
print("Accuracy on test data: %.2f%%" % (scores[1]*100))

In [None]:
Fine-tuning the network -
https://github.com/sozykin/dlpython_course/blob/master/computer_vision/cats_and_dogs/cats_and_dogs_vgg16.ipynb

In [None]:
!rm -f plantPathology_cnn_TL.h5 plantPathology_cnn_TL.json

In [None]:
# Serialize the model architecture to JSON
model_json = model.to_json()

with open('plantPathology_cnn_TL.json', 'w') as json_file:
    # Write the network architecture to the file
    json_file.write(model_json)

# Write the weights to a file
model.save_weights('plantPathology_cnn_TL.h5')

print('Model saved')