<a href="https://colab.research.google.com/github/Rsimetti/cursoAP2020/blob/master/Semana06_CNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Deep Learning

## Environment setup

We use the Kaggle API to read the data, so the next steps install the client, configure it, and download the data directly from the platform.

In [None]:
!pip install kaggle

In [None]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))
  
# Then move kaggle.json into the folder where the API expects to find it.
!mkdir -p ~/.kaggle/ && mv kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json

The dataset consists of [images of dogs and cats](https://www.kaggle.com/c/dogs-vs-cats/data). The data comes already split into training and test folders.

In [9]:
!kaggle competitions download -c dogs-vs-cats

Downloading train.zip to /content
100% 543M/543M [00:05<00:00, 106MB/s]
Downloading test1.zip to /content
100% 271M/271M [00:02<00:00, 108MB/s]
Downloading sampleSubmission.csv to /content
100% 86.8k/86.8k [00:00<00:00, 25.7MB/s]


After the download, we extract the images, which are already split into training and test sets.

In [17]:
# Extract the archives into the current directory
import zipfile

with zipfile.ZipFile("train.zip","r") as z:
    z.extractall(".")

with zipfile.ZipFile("test1.zip","r") as z:
    z.extractall(".")
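
Before extracting a large archive it can be useful to check what it contains. The following is a minimal sketch, not part of the original notebook: the helper name `count_images_by_prefix` is hypothetical, and the tiny in-memory archive only stands in for `train.zip` so that the example runs on its own.

```python
import io
import zipfile
from collections import Counter


def count_images_by_prefix(archive):
    """Count .jpg entries in a zip archive, grouped by the text before the
    first dot of the file name (e.g. 'cat' for 'train/cat.0.jpg')."""
    counts = Counter()
    with zipfile.ZipFile(archive, "r") as z:
        for name in z.namelist():
            base = name.rsplit("/", 1)[-1]  # drop any folder prefix
            if base.lower().endswith(".jpg"):
                counts[base.split(".")[0]] += 1
    return counts


# Stand-in archive (assumption): three empty files named like the Kaggle data.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as z:
    for name in ["train/cat.0.jpg", "train/cat.1.jpg", "train/dog.0.jpg"]:
        z.writestr(name, b"")
buffer.seek(0)

demo_counts = count_images_by_prefix(buffer)
print(dict(demo_counts))  # {'cat': 2, 'dog': 1}
```

On the real data, passing `"train.zip"` instead of the in-memory buffer should report the number of cat and dog images per class.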

## Import libraries

In [268]:

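import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import cv2
import random
import os
import tensorflow as tf
# train_test_split and ImageDataGenerator are used further down in the notebook
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Dropout, Activation, Conv2D, MaxPooling2D, BatchNormalization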

## Reading the images

In [311]:
# Global settings
FAST_RUN = False
IMAGE_WIDTH = 128
IMAGE_HEIGHT = 128
IMAGE_SIZE = (IMAGE_WIDTH, IMAGE_HEIGHT)
IMAGE_CHANNELS = 3
# Batch size used by the data generators and by the training call
batch_size = 15

In [312]:
filenames = os.listdir("/content/train")
categories = []
for filename in filenames:
    category = filename.split('.')[0]
    if category == 'dog':
        categories.append(1)
    else:
        categories.append(0)

df = pd.DataFrame({
    'filename': filenames,
    'category': categories
})
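
The labelling rule above relies on the Kaggle naming scheme `<class>.<id>.jpg`. A minimal sketch of the same rule as a standalone function follows; the name `label_from_filename` is hypothetical and not part of the notebook. Unlike the loop above, it raises an error for unexpected names instead of silently treating them as cats.

```python
def label_from_filename(filename):
    """Return 1 for a dog image and 0 for a cat image, based on the
    prefix before the first dot of the file name."""
    prefix = filename.split('.')[0]
    if prefix == 'dog':
        return 1
    if prefix == 'cat':
        return 0
    raise ValueError(f"unexpected filename: {filename!r}")


print(label_from_filename('cat.7714.jpg'))  # 0
print(label_from_filename('dog.9582.jpg'))  # 1
```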

In [313]:
# Inspect the first rows
df.head()

|   | filename | category |
|---|----------|----------|
| 0 | cat.7714.jpg | 0 |
| 1 | dog.9582.jpg | 1 |
| 2 | cat.3025.jpg | 0 |
| 3 | dog.9559.jpg | 1 |
| 4 | dog.4744.jpg | 1 |


In [314]:
# Map the numeric labels to class-name strings (class_mode='categorical' expects string labels)
df["category"] = df["category"].replace({0: 'cat', 1: 'dog'}) 

In [315]:
train_df, validate_df = train_test_split(df, test_size=0.20, random_state=42)
train_df = train_df.reset_index(drop=True)
validate_df = validate_df.reset_index(drop=True)

In [316]:
train_datagen = ImageDataGenerator(
    rotation_range=15,
    rescale=1./255,
    shear_range=0.1,
    zoom_range=0.2,
    horizontal_flip=True,
    width_shift_range=0.1,
    height_shift_range=0.1
)

train_generator = train_datagen.flow_from_dataframe(
    train_df, 
    "/content/train/", 
    x_col='filename',
    y_col='category',
    target_size=IMAGE_SIZE,
    class_mode='categorical',
    batch_size=batch_size
)

Found 19999 validated image filenames belonging to 2 classes.




In [317]:
validation_datagen = ImageDataGenerator(rescale=1./255)
validation_generator = validation_datagen.flow_from_dataframe(
    validate_df, 
    "/content/train/", 
    x_col='filename',
    y_col='category',
    target_size=IMAGE_SIZE,
    class_mode='categorical',
    batch_size=batch_size
)

Found 5001 validated image filenames belonging to 2 classes.
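
With `class_mode='categorical'`, each label is delivered to the model as a one-hot vector, which is why the network below ends in a two-unit softmax layer. The sketch below illustrates that encoding in plain Python. It assumes the classes are indexed in alphabetical order (`cat` → 0, `dog` → 1), which is how Keras is understood to behave here but is worth confirming through `train_generator.class_indices`; the helper `one_hot` is hypothetical.

```python
def one_hot(label, classes):
    """Encode a class name as a one-hot list, indexing classes alphabetically."""
    index = {name: i for i, name in enumerate(sorted(classes))}
    vector = [0.0] * len(index)
    vector[index[label]] = 1.0
    return vector


print(one_hot('cat', ['dog', 'cat']))  # [1.0, 0.0]
print(one_hot('dog', ['dog', 'cat']))  # [0.0, 1.0]
```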


## Criação do modelo

In [318]:
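# Source: https://www.kaggle.com/uysimty/keras-cnn-dog-or-cat-classification
# Import from tensorflow.keras, consistent with the imports at the top of the notebook
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense, Activation, BatchNormalization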

model = Sequential()

model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(IMAGE_WIDTH, IMAGE_HEIGHT, IMAGE_CHANNELS)))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax')) # 2 because we have cat and dog classes

model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

model.summary()

Model: "sequential_12"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_33 (Conv2D)           (None, 126, 126, 32)      896       
_________________________________________________________________
batch_normalization_12 (Batc (None, 126, 126, 32)      128       
_________________________________________________________________
max_pooling2d_22 (MaxPooling (None, 63, 63, 32)        0         
_________________________________________________________________
dropout_22 (Dropout)         (None, 63, 63, 32)        0         
_________________________________________________________________
conv2d_34 (Conv2D)           (None, 61, 61, 64)        18496     
_________________________________________________________________
batch_normalization_13 (Batc (None, 61, 61, 64)        256       
_________________________________________________________________
max_pooling2d_23 (MaxPooling (None, 30, 30, 64)      
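
The output shapes and parameter counts in the summary above can be checked by hand. The sketch below reproduces the arithmetic for the first layers, assuming the Keras defaults used in the model (3x3 kernels with `'valid'` padding and stride 1, 2x2 pooling); the helper functions are hypothetical and written only for this illustration.

```python
def conv2d_output(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1


def conv2d_params(in_channels, filters, kernel=3):
    """kernel*kernel*in_channels weights plus one bias, per filter."""
    return (kernel * kernel * in_channels + 1) * filters


def batchnorm_params(channels):
    """gamma, beta, moving mean and moving variance for each channel."""
    return 4 * channels


def maxpool_output(size, pool=2):
    """Spatial size after non-overlapping pooling (remainder is discarded)."""
    return size // pool


size = conv2d_output(128)
print(size, conv2d_params(3, 32), batchnorm_params(32))   # 126 896 128
size = maxpool_output(size)
print(size)                                               # 63
size = conv2d_output(size)
print(size, conv2d_params(32, 64), batchnorm_params(64))  # 61 18496 256
size = maxpool_output(size)
print(size)                                               # 30
```

These values match the `Output Shape` and `Param #` columns listed in the summary.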

In [319]:
total_train = train_df.shape[0]
total_validate = validate_df.shape[0]
batch_size=15

In [321]:
# fit_generator is deprecated in TensorFlow 2.x; model.fit accepts generators directly
history = model.fit(
    train_generator,
    epochs=2,
    validation_data=validation_generator,
    validation_steps=total_validate//batch_size,
    steps_per_epoch=total_train//batch_size
)

Epoch 1/2
Epoch 2/2
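
Because `steps_per_epoch` and `validation_steps` are computed with integer division, each epoch stops a few samples short of one full pass over the data. The sketch below shows the arithmetic, taking as an assumption the image counts reported by the generators (19999 and 5001) and the batch size of 15; the helper `steps_for` is hypothetical.

```python
def steps_for(n_samples, batch_size):
    """Return (number of full batches, samples left over)."""
    return n_samples // batch_size, n_samples % batch_size


print(steps_for(19999, 15))  # (1333, 4)
print(steps_for(5001, 15))   # (333, 6)
```

Using `math.ceil(n_samples / batch_size)` instead would add one final, smaller batch so that every sample is covered.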


In [322]:
model.save_weights("model.h5")

# Final remarks

Regarding the method adopted to identify dogs and cats in images:
* The accuracy was satisfactory (> 0.70).
* The CNN's performance improved as the number of epochs increased, so it is reasonable to assume that training for more epochs would raise the accuracy further.
* Choosing the model was made easier by existing references that apply CNNs to this same dataset.
* There is little to infer about the model itself, since the weights it learns do not lend themselves to direct interpretation.

Note that the number of epochs (2) was kept low because of the long processing time required.