This layer performs a spatial convolution on each channel of its input, independently, before mixing output channels via a pointwise convolution (a 1 × 1 convolution).
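The two steps can be sketched in plain NumPy. This is only an illustration of the idea, not how Keras implements `SeparableConv2D`. The function name `depthwise_separable_conv`, the "valid" padding, the stride of 1, and the single spatial kernel per input channel are all assumptions made for this example.

```python
import numpy as np

def depthwise_separable_conv(x, depthwise_kernels, pointwise_weights):
    """Illustrative depthwise separable convolution ("valid" padding, stride 1).

    x:                 (height, width, in_channels)
    depthwise_kernels: (k, k, in_channels), one spatial kernel per input channel
    pointwise_weights: (in_channels, out_channels), the 1 x 1 convolution
    """
    height, width, channels = x.shape
    k = depthwise_kernels.shape[0]
    out_h, out_w = height - k + 1, width - k + 1

    # Step 1: spatial convolution applied to each channel independently.
    spatial = np.zeros((out_h, out_w, channels))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i:i + k, j:j + k, :]  # (k, k, in_channels)
            spatial[i, j, :] = np.sum(patch * depthwise_kernels, axis=(0, 1))

    # Step 2: pointwise (1 x 1) convolution that mixes the channels.
    return spatial @ pointwise_weights  # (out_h, out_w, out_channels)

# Hypothetical toy input: a 5 x 5 image with 3 channels, mapped to 2 output channels.
x = np.ones((5, 5, 3))
output = depthwise_separable_conv(x, np.ones((3, 3, 3)), np.ones((3, 2)))
print(output.shape)
```

With all-ones inputs and weights, every spatial position sums nine ones per channel, and the pointwise step then adds the three channels together.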

This is equivalent to separating the learning of spatial features and the learning of channel-wise features. In much the same way that convolution relies on the assumption that the patterns in images are not tied to specific locations, depthwise separable convolution relies on the assumption that spatial locations in intermediate activations are highly correlated, but different channels are highly independent.
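Separating the two kinds of learning also reduces the number of weights. The arithmetic below is a rough sketch that assumes no bias terms and one spatial kernel per input channel; the helper names are made up for this example.

```python
def conv2d_params(kernel_size, in_channels, out_channels):
    # A regular convolution learns one k x k kernel per (input, output) channel pair.
    return kernel_size * kernel_size * in_channels * out_channels

def separable_conv2d_params(kernel_size, in_channels, out_channels):
    # Depthwise step: one k x k kernel per input channel.
    depthwise = kernel_size * kernel_size * in_channels
    # Pointwise step: a 1 x 1 convolution that mixes channels.
    pointwise = in_channels * out_channels
    return depthwise + pointwise

regular = conv2d_params(3, 64, 64)
separable = separable_conv2d_params(3, 64, 64)
print(regular, separable)  # 36864 4672
```

For a 3 × 3 layer with 64 input and 64 output channels, the separable version needs roughly one-eighth of the weights under these assumptions.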

## 9.3.5 Putting it together: A mini Xception-like model

When it comes to larger-scale models, depthwise separable convolutions are the basis of the Xception architecture, a high-performing convnet that comes packaged with Keras. You can read more about the theoretical grounding for depthwise separable convolutions and Xception in the paper “Xception: Deep Learning with Depthwise Separable Convolutions.”

Here are the convnet architecture principles you’ve learned so far:
- Your model should be organized into repeated blocks of layers, usually made of multiple convolution layers and a max pooling layer.
- The number of filters in your layers should increase as the size of the spatial feature maps decreases.
- Deep and narrow is better than broad and shallow.
- Introducing residual connections around blocks of layers helps you train deeper networks.
- It can be beneficial to introduce batch normalization layers after your convolution layers.
- It can be beneficial to replace Conv2D layers with SeparableConv2D layers, which are more parameter-efficient.

### Applying this to the dogs vs. cats example

In [1]:
import keras
from keras import layers

In [2]:
data_augmentation = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.2),
])

In [3]:


inputs = keras.Input(shape=(180, 180, 3))
x = data_augmentation(inputs)

# Always rescale the input
x = layers.Rescaling(1./255)(x)

# The RGB color channels are NOT independent, so we use a regular Conv2D here
# and separable convolutions in the layers after it
x = layers.Conv2D(filters=32, kernel_size=5, use_bias=False) (x)

# We apply a series of convolutional blocks with increasing feature depth. Each block
# consists of two batch-normalized depthwise separable convolution layers and a
# max pooling layer, with a residual connection around the entire block

for size in [32, 64, 128, 256, 512]:
	residual = x

	x = layers.BatchNormalization()(x)
	x = layers.Activation("relu")(x)
	x = layers.SeparableConv2D(size, 3, padding="same", use_bias=False) (x)

	x = layers.BatchNormalization()(x)
	x = layers.Activation("relu")(x)
	x = layers.SeparableConv2D(size, 3, padding="same", use_bias=False) (x)

	x = layers.MaxPooling2D(3, strides=2, padding="same") (x)


	#with residual connection we set use_bias=False to avoid adding extra parameters
	residual = layers.Conv2D(
		size, 1, strides=2, padding="same", use_bias=False) (residual)

	x = layers.Add()([x, residual])

#in the original model we used a flatten layer before the dense one,
#here we use a global average pooling layer instead
x = layers.GlobalAveragePooling2D() (x)
#we add dropout layer for regularization
x = layers.Dropout(0.5) (x)

outputs = layers.Dense(1, activation="sigmoid") (x)
model = keras.Model(inputs=inputs, outputs=outputs)

model.summary()
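To check the shapes reported by `model.summary()`, the spatial size of the feature maps can be traced by hand. This is a sketch assuming the Keras defaults used above: the first `Conv2D` uses "valid" padding, and each block's stride-2 `MaxPooling2D` with `padding="same"` rounds the halved size up. The helper names are made up for this example.

```python
import math

def valid_conv_size(size, kernel_size):
    # "valid" padding, stride 1: the kernel must fit entirely inside the input.
    return size - kernel_size + 1

def same_strided_size(size, stride):
    # "same" padding with a stride: the output size is ceil(size / stride).
    return math.ceil(size / stride)

size = valid_conv_size(180, 5)  # after the initial Conv2D
sizes = [size]
for filters in [32, 64, 128, 256, 512]:
    # The pooling layer and the strided 1 x 1 residual convolution both halve
    # the size, which is what lets layers.Add() combine them.
    size = same_strided_size(size, 2)
    sizes.append(size)

print(sizes)  # [176, 88, 44, 22, 11, 6]

# Features passed to the Dense layer: Flatten versus GlobalAveragePooling2D.
flatten_features = size * size * 512
gap_features = 512
print(flatten_features, gap_features)  # 18432 512
```

The final comparison shows why global average pooling leaves far fewer inputs for the dense classifier than flattening the last feature map would.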


In [4]:
model.compile(loss="binary_crossentropy",
              optimizer="rmsprop",
              metrics=["accuracy"])

For testing, we first prepare the dataset:

In [None]:
# Build a small dataset from the downloaded images in "train"
import os, shutil, pathlib

original_dir = pathlib.Path("train")
# We create a new directory to store the small dataset
new_base_dir = pathlib.Path("cats_vs_dogs_small")

def make_subset(subset_name, start_index, end_index):
	for category in ("cat", "dog"):
		dir = new_base_dir / subset_name / category
		os.makedirs(dir)
		fnames = [f"{category}.{i}.jpg"
				for i in range(start_index, end_index)]
		for fname in fnames:
			shutil.copyfile(src=original_dir / fname,
							dst=dir / fname)

make_subset("train", start_index=0, end_index=1000)
make_subset("validation", start_index=1000, end_index=1500)
make_subset("test", start_index=1500, end_index=2500)


In [None]:
# Using image_dataset_from_directory to read images
from keras.utils import image_dataset_from_directory

train_dataset = image_dataset_from_directory(
	new_base_dir / "train",
	image_size=(180, 180),
	batch_size=32,
)

validation_dataset = image_dataset_from_directory(
	new_base_dir / "validation",
	image_size=(180, 180),
	batch_size=32,
)
test_dataset = image_dataset_from_directory(
	new_base_dir / "test",
	image_size=(180, 180),
	batch_size=32,
)

for data_batch, labels_batch in train_dataset:
	print("data batch shape:", data_batch.shape)
	print("labels batch shape:", labels_batch.shape)
	break

You’ll find that our new model achieves a test accuracy of 90.8%, compared to 83.5% for the naive model in the last chapter. As you can see, following architecture best practices does have an immediate, sizable impact on model performance!

There is one last important topic we need to cover: interpreting how a model arrives at its predictions.

## 9.4 Interpreting what convnets learn

Three of the most accessible and useful techniques for interpreting what convnets learn are:
- Visualizing intermediate convnet outputs (intermediate activations)—Useful for understanding how successive convnet layers transform their input, and for getting a first idea of the meaning of individual convnet filters
- Visualizing convnet filters—Useful for understanding precisely what visual pattern or concept each filter in a convnet is receptive to
- Visualizing heatmaps of class activation in an image—Useful for understanding which parts of an image were identified as belonging to a given class, thus allowing you to localize objects in images

## 9.4.1 Visualizing intermediate activations

Visualizing intermediate activations consists of displaying the values returned by various convolution and pooling layers in a model, given a certain input.

We want to visualize feature maps with three dimensions: width, height, and depth (channels). Each channel encodes relatively independent features, so the proper way to visualize these feature maps is by independently plotting the contents of every channel as a 2D image.
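One way to plot every channel independently is to tile the channels of a feature map into a single 2D grid. The sketch below uses NumPy only; the function name `channels_to_grid` and the random feature map are stand-ins for illustration, not part of the model above.

```python
import numpy as np

def channels_to_grid(feature_map, images_per_row=4):
    """Tile a (height, width, channels) feature map into one 2D array."""
    height, width, n_channels = feature_map.shape
    n_rows = -(-n_channels // images_per_row)  # ceiling division
    grid = np.zeros((n_rows * height, images_per_row * width))
    for channel in range(n_channels):
        row, col = divmod(channel, images_per_row)
        # Each channel becomes its own tile in the grid.
        grid[row * height:(row + 1) * height,
             col * width:(col + 1) * width] = feature_map[:, :, channel]
    return grid

# Hypothetical activation: an 8 x 8 feature map with 6 channels.
feature_map = np.random.default_rng(0).random((8, 8, 6))
grid = channels_to_grid(feature_map, images_per_row=4)
print(grid.shape)  # (16, 32)
```

The resulting array can then be displayed as a single image, for example with Matplotlib's `plt.imshow(grid, cmap="viridis")`. Tiles left over in the last row stay at zero.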