# CNN 6 - Do Larger Models Lead to Better Performance?
- Dataset:
    - https://www.kaggle.com/shaunthesheep/microsoft-catsvsdogs-dataset


**What you should know by now:**
- How to preprocess image data
- How to load image data from a directory
- What's a convolution, pooling, and a fully-connected layer
- Categorical vs. binary classification

<br>

- First things first, let's import the libraries
- The models we'll declare today will have more layers than the ones before
    - We'll import individual layer classes from TensorFlow

In [1]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import warnings
warnings.filterwarnings('ignore')

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense, Dropout
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.losses import categorical_crossentropy
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import BinaryAccuracy

tf.random.set_seed(42)
physical_devices = tf.config.list_physical_devices('GPU')
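try:
    # Let TensorFlow allocate GPU memory on demand instead of all at once
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
except (IndexError, RuntimeError, ValueError):
    # No GPU found, or the device was already initialized; keep the defaults
    pass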

- I'm using an Nvidia RTX 3060 Ti

In [2]:
physical_devices

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

<br>

## Load in the data
- Use `ImageDataGenerator` to rescale pixel values to the 0-1 range
- Load in the images from directories and resize them to 224x224x3
- For memory concerns, we'll lower the batch size:

In [3]:
train_datagen = ImageDataGenerator(rescale=1/255.0)
valid_datagen = ImageDataGenerator(rescale=1/255.0)

train_data = train_datagen.flow_from_directory(
    directory='data/train/',
    target_size=(224, 224),
    class_mode='categorical',
    batch_size=32,
    shuffle=True,
    seed=42
)

valid_data = valid_datagen.flow_from_directory(
    directory='data/validation/',
    target_size=(224, 224),
    class_mode='categorical',
    batch_size=32,
    seed=42
)

Found 20030 images belonging to 2 classes.
Found 2478 images belonging to 2 classes.
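
- The batch size and the image counts above determine how many batches make up one epoch
- A minimal sketch in plain Python, assuming the generator yields a smaller final batch instead of dropping it (`steps_per_epoch` is a helper name made up for this example):

```python
import math

def steps_per_epoch(n_images: int, batch_size: int) -> int:
    """Batches needed to see every image once (the last batch may be smaller)."""
    return math.ceil(n_images / batch_size)

# Image counts reported by flow_from_directory above
train_steps = steps_per_epoch(20030, 32)
valid_steps = steps_per_epoch(2478, 32)
print(train_steps, valid_steps)

# rescale=1/255.0 multiplies every pixel by this factor, mapping 0..255 to 0..1
rescale = 1 / 255.0
print(0 * rescale, 255 * rescale)
```

- Halving the batch size roughly doubles the number of steps per epoch, which is the trade-off for the lower memory use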


<br>

## Model 1
- Block 1: Conv, Conv, Pool
- Block 2: Conv, Conv, Pool
- Block 3: Flatten, Dense
- Output

<br>

- We won't mess with the hyperparameters today

In [5]:
model_1 = tf.keras.Sequential([
    Conv2D(filters=32, kernel_size=(3, 3), input_shape=(224, 224, 3), activation='relu'),
    Conv2D(filters=32, kernel_size=(3, 3), activation='relu'),
    MaxPool2D(pool_size=(2, 2), padding='same'),
    
    Conv2D(filters=64, kernel_size=(3, 3), activation='relu'),
    Conv2D(filters=64, kernel_size=(3, 3), activation='relu'),
    MaxPool2D(pool_size=(2, 2), padding='same'),
    
    Flatten(),
    Dense(units=128, activation='relu'),
    Dense(units=2, activation='softmax')
])


model_1.compile(
    loss=categorical_crossentropy,
    optimizer=Adam(),
    metrics=[BinaryAccuracy(name='accuracy')]
)
model_1_history = model_1.fit(
    train_data,
    validation_data=valid_data,
    epochs=10
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
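
- The model above pairs a two-unit softmax output with `BinaryAccuracy`, so it's worth checking that the number it reports means what we expect
- A plain-Python sketch, assuming the metric thresholds each probability at 0.5 and averages over all elements; the labels and predictions are made up for illustration:

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Element-wise accuracy after thresholding every probability."""
    hits, total = 0, 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            hits += int(t == (1 if p > threshold else 0))
            total += 1
    return hits / total

def categorical_accuracy(y_true, y_pred):
    """Share of samples whose most probable class matches the label."""
    hits = 0
    for true_row, pred_row in zip(y_true, y_pred):
        hits += int(true_row.index(max(true_row)) == pred_row.index(max(pred_row)))
    return hits / len(y_true)

# Hypothetical one-hot labels and softmax outputs, for illustration only
y_true = [[1, 0], [0, 1], [1, 0], [0, 1]]
y_pred = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7], [0.6, 0.4]]

binary = binary_accuracy(y_true, y_pred)
categorical = categorical_accuracy(y_true, y_pred)
print(binary, categorical)
```

- With two classes and one-hot labels, both elements of a row are either right or wrong together, so the two metrics agree unless a prediction is exactly 0.5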


<br>

- Not bad, but we got 75% accuracy on the validation set in notebook 010
- Will adding complexity to the model increase the accuracy?

## Model 2
- Block 1: Conv, Conv, Pool
- Block 2: Conv, Conv, Pool
- Block 3: Conv, Conv, Pool
- Block 4: Flatten, Dense
- Output

<br>

- This architecture is a bit of overkill for our dataset
- The model isn't learning at all:

In [6]:
model_2 = Sequential([
    Conv2D(filters=32, kernel_size=(3, 3), input_shape=(224, 224, 3), activation='relu'),
    Conv2D(filters=32, kernel_size=(3, 3), activation='relu'),
    MaxPool2D(pool_size=(2, 2), padding='same'),
    
    Conv2D(filters=64, kernel_size=(3, 3), activation='relu'),
    Conv2D(filters=64, kernel_size=(3, 3), activation='relu'),
    MaxPool2D(pool_size=(2, 2), padding='same'),
    
    Conv2D(filters=128, kernel_size=(3, 3), activation='relu'),
    Conv2D(filters=128, kernel_size=(3, 3), activation='relu'),
    MaxPool2D(pool_size=(2, 2), padding='same'),
    
    Flatten(),
    Dense(units=128, activation='relu'),
    Dense(units=2, activation='softmax')
])


model_2.compile(
    loss=categorical_crossentropy,
    optimizer=Adam(),
    metrics=[BinaryAccuracy(name='accuracy')]
)
model_2_history = model_2.fit(
    train_data,
    validation_data=valid_data,
    epochs=10
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<br>

- When that happens, you can try experimenting with the learning rate and other parameters
- Let's dial it down a bit next
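
- Before moving on, it helps to see what the extra block actually does to the model's size
- The sketch below traces feature-map sizes and counts trainable parameters in plain Python, assuming 3x3 convolutions with the default `'valid'` padding and 2x2 pooling with `padding='same'`, as in the cells above (`count`, `block`, and `head` are names made up for this example):

```python
import math

def count(layers, size=224, channels=3):
    """Return (flattened feature count, trainable parameters) for a layer list."""
    params, units, flat = 0, None, None
    for kind, value in layers:
        if kind == "conv":       # 3x3 kernel, stride 1, 'valid' padding
            params += (3 * 3 * channels + 1) * value
            size, channels = size - 2, value
        elif kind == "pool":     # stride equals pool size; 'same' padding rounds up
            size = math.ceil(size / value)
        elif kind == "flatten":
            flat = units = size * size * channels
        elif kind == "dense":    # weights plus one bias per unit
            params += (units + 1) * value
            units = value
    return flat, params

def block(filters):
    """Conv, Conv, Pool block used by the models in this notebook."""
    return [("conv", filters), ("conv", filters), ("pool", 2)]

head = [("flatten", None), ("dense", 128), ("dense", 2)]

model_1_size = count(block(32) + block(64) + head)
model_2_size = count(block(32) + block(64) + block(128) + head)
print(model_1_size)
print(model_2_size)
```

- Under these assumptions, the deeper Model 2 has fewer trainable parameters than Model 1, because the third pooling step shrinks the flattened vector from 179,776 to 80,000 values before the first dense layer
- `model_1.summary()` and `model_2.summary()` remain the authoritative check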

<br>

## Model 3 
- Block 1: Conv, Conv, Pool
- Block 2: Conv, Conv, Pool
- Block 3: Flatten, Dense, Dropout, Dense
- Output

<br>

- The first model was better than the second
- We can try adding a dropout layer as a regularizer and tweaking the fully connected layers:

In [7]:
model_3 = tf.keras.Sequential([
    Conv2D(filters=32, kernel_size=(3, 3), input_shape=(224, 224, 3), activation='relu'),
    Conv2D(filters=32, kernel_size=(3, 3), activation='relu'),
    MaxPool2D(pool_size=(2, 2), padding='same'),
    
    Conv2D(filters=64, kernel_size=(3, 3), activation='relu'),
    Conv2D(filters=64, kernel_size=(3, 3), activation='relu'),
    MaxPool2D(pool_size=(2, 2), padding='same'),
    
    Flatten(),
    Dense(units=512, activation='relu'),
    Dropout(rate=0.3),
    Dense(units=128),
    Dense(units=2, activation='softmax')
])

model_3.compile(
    loss=categorical_crossentropy,
    optimizer=Adam(),
    metrics=[BinaryAccuracy(name='accuracy')]
)

model_3_history = model_3.fit(
    train_data,
    validation_data=valid_data,
    epochs=10
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
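
- To see what `Dropout(rate=0.3)` does to the activations passing through it, here's a plain-Python sketch of inverted dropout
- It assumes the usual behavior: during training each value is zeroed with probability `rate` and the survivors are scaled by `1 / (1 - rate)`; at inference nothing changes. The `dropout` function and the input values are made up for illustration:

```python
import random

def dropout(values, rate, training, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    if not training:
        return list(values)           # inference: values pass through unchanged
    keep_scale = 1.0 / (1.0 - rate)   # keeps the expected activation the same
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]

rng = random.Random(42)               # fixed seed so the sketch is repeatable
activations = [1.0] * 10              # hypothetical outputs of a dense layer

train_out = dropout(activations, rate=0.3, training=True, rng=rng)
infer_out = dropout(activations, rate=0.3, training=False, rng=rng)
print(train_out)
print(infer_out)
```

- Because a different random subset of units is silenced on every training step, the network can't rely on any single unit, which is why dropout acts as a regularizer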


<br>

- It made the model worse
- More complex models don't necessarily lead to an increase in performance

<br>

## Conclusion
- There you have it - we've been focusing on the wrong thing from the start
- Our model architecture in notebook 010 was solid
    - In today's experiments, adding more layers and complexity decreased the predictive power
- We should shift our focus to improving the dataset quality
- The following notebook will teach you all about **data augmentation**, and you'll see how it increases the power of our model
- After that, you'll take your models to new heights with **transfer learning**, and you'll see why coming up with custom architectures is a waste of time in most cases