<a href="https://colab.research.google.com/github/SusheelThapa/ML-From-Scratch/blob/tensorflow/tensorflow/tensorFlow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Machine Learning Fundamentals

## Introduction to TensorFlow

### Installing TensorFlow

Use the command below to install **tensorflow** on your local machine:

```bash
pip install tensorflow
```

### Importing Tensorflow

In [None]:
import tensorflow as tf
tf.__version__  # version string of the installed TensorFlow

### What is a tensor?

A tensor is a generalization of vectors and matrices to potentially higher dimensions.

Internally, TensorFlow represents tensors as n-dimensional arrays of base data types.

Each tensor has a data type and a shape.

**Data types** include `float32`, `int32`, `string` and others.

**Shape** represents the dimensions of the data.

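### Creating tensors

Below are some examples of creating tensors:

In [None]:
# dtype must be passed by keyword: the second positional argument of tf.Variable is `trainable`
string = tf.Variable("This is a string", dtype=tf.string)
number = tf.Variable(324, dtype=tf.int16)
floating = tf.Variable(3.567, dtype=tf.float64)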

### Rank/Degree of Tensors

Another word for rank is degree. It is defined as the number of dimensions of the tensor.

In the code block above, each tensor we created is a *tensor of rank zero* (a scalar).

Now, let's create a tensor of higher rank:

In [None]:
rank1_tensor = tf.Variable(["Something", "Nothing"], dtype=tf.string)

To find the rank of a tensor, we can call the `tf.rank()` function:

In [None]:
tf.rank(rank1_tensor)

### Shape of Tensors

The shape of a tensor is simply the number of elements that exist in each dimension.

*TensorFlow will try to determine the shape of a tensor, but sometimes it may be unknown.*

To get the shape of a tensor, we can read its **shape** attribute:

In [None]:
rank1_tensor.shape
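To make the relationship between rank and shape concrete, here is a minimal sketch in plain Python that works on nested lists instead of TensorFlow tensors. The helpers `shape_of` and `rank_of` are hypothetical names used only for illustration, and the sketch assumes the nested lists are regular (every sub-list at the same depth has the same length).

```python
def shape_of(nested):
    """Return the shape of a regular (non-ragged) nested list as a tuple."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        # Descend into the first element to measure the next dimension
        nested = nested[0] if nested else None
    return tuple(shape)


def rank_of(nested):
    """The rank is the number of dimensions, i.e. the length of the shape."""
    return len(shape_of(nested))


scalar = 324
vector = ["Something", "Nothing"]
matrix = [[1, 2, 3], [4, 5, 6]]

print(shape_of(scalar), rank_of(scalar))  # () 0
print(shape_of(vector), rank_of(vector))  # (2,) 1
print(shape_of(matrix), rank_of(matrix))  # (2, 3) 2
```

The vector has the same shape, `(2,)`, and rank, 1, that TensorFlow reports for `rank1_tensor`.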

### Changing the shape

The number of elements in a tensor is the product of the sizes of all its dimensions.

Many different shapes therefore hold the same number of elements, which makes it convenient to be able to change the shape of a tensor.

Example of changing the shape of a tensor:

In [None]:
tensor1 = tf.ones([1, 2, 3])  # tf.ones creates a tensor of the given shape filled with ones

tensor2 = tf.reshape(tensor1, [3, 2, 1])  # reshape the existing tensor to shape [3, 2, 1]

tensor3 = tf.reshape(tensor2, [3, -1])  # -1 tells TensorFlow to infer the size of that dimension

# The original tensor and the reshaped tensors contain the same number of elements

Now, let's have a look at the shapes of the tensors we have created:

In [None]:
print(tensor1.shape)
print(tensor2.shape)
print(tensor3.shape)
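The rule behind `-1` is plain arithmetic: the missing size is the total number of elements divided by the product of the known sizes. The sketch below shows that arithmetic in plain Python. `resolve_shape` is a hypothetical helper written for this note, not part of TensorFlow, and it only mirrors the idea rather than TensorFlow's actual implementation.

```python
from math import prod


def resolve_shape(num_elements, shape):
    """Replace a single -1 in `shape` with the size that preserves the element count."""
    if shape.count(-1) > 1:
        raise ValueError("only one dimension may be -1")
    known = prod(d for d in shape if d != -1)
    if -1 in shape:
        if known == 0 or num_elements % known != 0:
            raise ValueError("cannot infer the missing dimension")
        return [num_elements // known if d == -1 else d for d in shape]
    if known != num_elements:
        raise ValueError("shape does not match the number of elements")
    return list(shape)


num_elements = prod([1, 2, 3])  # a tensor of shape [1, 2, 3] holds 6 elements
print(resolve_shape(num_elements, [3, 2, 1]))  # [3, 2, 1]
print(resolve_shape(num_elements, [3, -1]))    # [3, 2]
```

This matches the shapes printed above: reshaping six elements to `[3, -1]` yields a tensor of shape `(3, 2)`.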

### Types of tensors

Commonly used tensor types are as follows:
- Variable
- Constant
- Placeholder (TensorFlow 1.x; in TensorFlow 2.x only available as `tf.compat.v1.placeholder`)
- SparseTensor

## Core Learning Algorithms

We will be studying four fundamental machine learning algorithms:

- Linear Regression
- Classification
- Clustering
- Hidden Markov Models


### Linear Regression

Linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). (***Wikipedia***)

#### Setup and Imports

In [None]:
from __future__ import absolute_import, division, print_function, unicode_literals

import numpy as np # Optimized array library
import pandas as pd # Data analysis tools
import matplotlib.pyplot as plt # Visualization tools

from IPython.display import clear_output
from six.moves import urllib

import tensorflow.compat.v2.feature_column as fc # Required later in linear regression

import tensorflow as tf

#### Data

The dataset we will focus on here is the Titanic dataset. It has a lot of information about each passenger on the ship.

**Below, we will load the dataset and learn how we can explore it using some built-in tools.**

In [None]:
# Load the datasets

# Training dataset
dftrain = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv')
y_train = dftrain.pop('survived')

# Testing dataset
dftest = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv')
y_test = dftest.pop('survived')

The `pd.read_csv()` function returns a new pandas *DataFrame*. A DataFrame is like a table, and we will look at its tabular representation below.

We pop the "survived" column from each dataset and store it in a new variable, because this column tells us whether the passenger survived or not. It is the label that our model should predict.

To look at the data, we will use the `head()` method from pandas.

In [None]:
dftrain.head()

If we need a statistical summary of the data, we can use the `describe()` method:

In [None]:
dftrain.describe()

To get the data type of each column, the number of columns and their names, we can use the `info()` method of pandas:

In [None]:
dftrain.info()

Let's have a look at the shape of the dataframe

In [None]:
dftrain.shape

Now, let's visualize the data we have got.

In [None]:
dftrain.age.hist(bins=20)

In [None]:
dftrain.sex.value_counts().plot(kind='barh')

In [None]:
dftrain['class'].value_counts().plot(kind='barh')

In [None]:
pd.concat([dftrain,y_train],axis=1).groupby('sex').survived.mean().plot(kind='barh').set_xlabel('% survived')

After analyzing this information, we should notice the following:
- The majority of passengers are in their 20s or 30s
- The majority of passengers are male
- The majority of passengers are in "Third Class"
- Females have a much higher chance of survival

##### Training vs Testing Data

**Training Data** is what we feed to the model so that it can develop and learn. It is usually much larger than the testing data.

**Testing Data** is what we use to evaluate the model and see how well it is performing. It is important to evaluate the model on a separate set of data that it has not been trained on.


##### Feature Columns

In the dataset, we have two types of information:
- **Categorical**

    It is anything that isn't numerical.

    *For example, the sex column does not use numbers; it uses the words "male" and "female".*

- **Numeric**

    This is data with numeric values.

Before continuing, we need to change all our categorical data into numeric data.
To do this, TensorFlow has some tools to help us.

In [None]:
CATEGORICAL_COLUMNS = ['sex','n_siblings_spouses','parch','class','deck','embark_town','alone']
NUMERIC_COLUMNS = ['age','fare']

feature_columns =[]

for feature in CATEGORICAL_COLUMNS:
    vocabulary = dftrain[feature].unique()
    feature_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(feature,vocabulary))


for feature in NUMERIC_COLUMNS:
    feature_columns.append(tf.feature_column.numeric_column(feature,dtype=tf.float32))
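Conceptually, a vocabulary-based categorical column maps each distinct word to a position in a vocabulary, so that the model can work with numbers. The sketch below illustrates that idea in plain Python with a one-hot encoding. The sample values and the helpers `build_vocabulary` and `one_hot` are made up for illustration, and this is not a description of how TensorFlow implements feature columns internally.

```python
def build_vocabulary(values):
    """Collect the distinct values in first-seen order (similar to `Series.unique()`)."""
    vocabulary = []
    for value in values:
        if value not in vocabulary:
            vocabulary.append(value)
    return vocabulary


def one_hot(value, vocabulary):
    """Encode a value as a vector with a 1.0 at its vocabulary position."""
    return [1.0 if value == entry else 0.0 for entry in vocabulary]


sex_column = ["male", "female", "female", "male"]  # made-up sample values
vocab = build_vocabulary(sex_column)
encoded = [one_hot(value, vocab) for value in sex_column]

print(vocab)    # ['male', 'female']
print(encoded)  # [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
```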


#### Training the model

This section describes how the model is trained, specifically how data is fed to our model.

To train the model, we will feed it data in batches of 32 entries. This means we feed small batches of entries to our model multiple times, according to the number of **epochs**.

An **epoch** is one pass over our entire dataset. The number of epochs we define is the number of times our model will see the entire dataset.

*Example: If we have 10 epochs, our model will see the same dataset 10 times.*

To feed our data to the model in batches, we need an ***input function***, whose task is to convert our dataset into batches at each epoch.
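The interplay of batch size and epochs can be sketched without TensorFlow. The generator below, a hypothetical helper named `iterate_batches`, walks over a made-up dataset of 100 rows in batches of 32 for 10 epochs. The row count is an assumption chosen only to keep the arithmetic easy to follow.

```python
def iterate_batches(rows, batch_size, num_epochs):
    """Yield (epoch, batch) pairs; each epoch is one full pass over `rows`."""
    for epoch in range(num_epochs):
        for start in range(0, len(rows), batch_size):
            yield epoch, rows[start:start + batch_size]


rows = list(range(100))  # 100 hypothetical dataset rows
batches = list(iterate_batches(rows, batch_size=32, num_epochs=10))

print(len(batches))                               # 40
print([len(batch) for _, batch in batches[:4]])   # [32, 32, 32, 4]
```

Each epoch produces four batches (the last one holds the 4 leftover rows), so 10 epochs give 40 batches in total.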

##### Input function

The TensorFlow model we are going to use requires that the data we pass to it comes in as a `tf.data.Dataset` object.

This means that we must create an *input function* that can convert our current pandas DataFrame into that object.

The *input function* shown below is copied directly from the TensorFlow documentation.

In [None]:
def make_input_fn(data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):
  def input_function():
    ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df)) # create a tf.data.Dataset from the features and their labels
    if shuffle:
      ds = ds.shuffle(1000) # randomize the order of the data
    ds = ds.batch(batch_size).repeat(num_epochs) # split the dataset into batches and repeat it for the given number of epochs
    return ds # return the batched dataset
  return input_function # return the function object for later use

train_input_fn = make_input_fn(dftrain, y_train)
eval_input_fn = make_input_fn(dftest, y_test, num_epochs=1, shuffle=False) # evaluate on the held-out test data

##### Creating the model

We will use a linear estimator to apply a linear model to our data. Since the label (`survived`) is a yes/no value, the estimator used here is `LinearClassifier`.

In [None]:
linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns) # We are creating a linear estimator by passing the feature columns we created earlier.

##### Training the model

Training the model is as easy as passing the input function that we created earlier.

In [None]:
# Training the model
linear_est.train(train_input_fn) # just passing the input function

#### Testing our model
Testing works the same way as training, but here we pass the input function for the testing dataset.

In [None]:
result = linear_est.evaluate(eval_input_fn)

print("The accuracy of our model is ",result['accuracy'])

#### Predicting using our model

In [None]:
result = list(linear_est.predict(eval_input_fn))

print("Passenger's chance of survival is ",result[100]['probabilities'][1])

### Classification

#### Importing the necessary packages

In [None]:
from __future__ import absolute_import, division, print_function, unicode_literals

import tensorflow as tf
import pandas as pd

#### Datasets

This dataset separates flowers into 3 different species:
- Setosa
- Versicolor
- Virginica

The information about each flower is the following:
- sepal length
- sepal width
- petal length
- petal width


#### Loading the datasets

Next, we will be loading the datasets

In [None]:
# Defining some constant that will help later on
CSV_COLUMN_NAMES = ["SepalLength","SepalWidth","PetalLength","PetalWidth","Species"]
SPECIES = ["Setosa","Versicolor","Virginica"]

# Loading the datasets, we are using keras to grab our datasets and read them into pandas dataframe
train_path = tf.keras.utils.get_file(
    "iris_training.csv","https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv"
)
test_path = tf.keras.utils.get_file(
    "iris_test.csv","https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv"
)

train = pd.read_csv(train_path,names= CSV_COLUMN_NAMES,header =0)
test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)

Let's look at our datasets.

In [None]:
train.head()

Now, we can pop the "Species" column, as it holds the labels we want to predict.

In [None]:
y_train = train.pop('Species')
y_test = test.pop('Species')

Let's look into shape of our datasets

In [None]:
train.shape

So, we have 120 entries with 4 features each.

#### Input function

In [None]:
def input_fn(features, labels, training=True, batch_size=256):
    # Convert the inputs to a dataset
    dataset = tf.data.Dataset.from_tensor_slices((dict(features),labels))

    # Shuffle and repeat if you are in training mode
    if training:
        dataset= dataset.shuffle(1000).repeat()
    
    return dataset.batch(batch_size)

#### Feature columns

In [None]:
my_feature_columns = []

for key in train.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))

#### Building the model

For classification tasks there is a variety of estimators/models that we can pick from.

Some options are listed below:

- `DNNClassifier` (Deep Neural Network)
- `LinearClassifier`

We can choose either model, but the DNN seems to be the better choice here, as we may not be able to find a linear correspondence in our data.

In [None]:
# Building a DNN with 2 hidden layers with 30 and 10 hidden nodes respectively
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[30, 10],  # defining the two hidden layers
    n_classes=3  # the model must choose between 3 classes
)

#### Training the model

In [None]:
classifier.train(
    input_fn = lambda: input_fn(train,y_train, training=True),
    steps=5000
)

#### Testing the model

In [None]:
classifier.evaluate(
    input_fn=lambda:input_fn(test, y_test, training = False)
    )

#### Making prediction

In [None]:
def input_fn_pred(features, batch_size=256):
    # Convert the inputs to a Dataset without labels.
    return tf.data.Dataset.from_tensor_slices(dict(features)).batch(batch_size)

features = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']
predict = {}

print("Please type numeric values as prompted.")
for feature in features:
    # Keep asking until the input can be parsed as a number
    while True:
        val = input(feature + ": ")
        try:
            predict[feature] = [float(val)]
            break
        except ValueError:
            print("Please enter a numeric value.")

predictions = classifier.predict(input_fn=lambda: input_fn_pred(predict))
for pred_dict in predictions:
    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]

    print('Prediction is "{}" ({:.1f}%)'.format(
        SPECIES[class_id], 100 * probability))


### Clustering

Clustering is a machine learning technique that involves the grouping of data points. In theory, data points that are in the same group should have similar properties and/or features, while data points in different groups should have highly dissimilar properties and/or features.

#### Algorithm for K-means clustering

1. Randomly pick K points to place K centroids.
2. Assign all of the data points to the centroids by distance. The closest centroid to a point is the one it is assigned to.
3. Average all of the points belonging to each centroid to find the middle of those clusters (center of mass). Move the corresponding centroid to that position.
4. Reassign every point once again to the closest centroid.
5. Repeat steps 3-4 until no point changes which centroid it belongs to.
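The steps above can be sketched in plain Python. This is a minimal illustration rather than a production implementation: the sample points, the `seed` parameter and the `max_iterations` safety limit are assumptions added here, and the initial centroids are drawn from the data points themselves.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest(point, centroids):
    """Index of the centroid nearest to `point`."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def k_means(points, k, seed=0, max_iterations=100):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)                      # pick K starting centroids
    assignments = [closest(p, centroids) for p in points]  # assign points by distance
    for _ in range(max_iterations):
        # Move each centroid to the mean of the points assigned to it
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
        # Reassign every point to its closest centroid
        new_assignments = [closest(p, centroids) for p in points]
        if new_assignments == assignments:                 # stop when nothing changes
            break
        assignments = new_assignments
    return centroids, assignments


# Two made-up groups of 2D points
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = k_means(points, k=2)
print(assignments)
print(centroids)
```

The first three points end up sharing one cluster label and the last three share the other. Which group receives label 0 depends on the random starting centroids.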

### Hidden Markov Models

A Hidden Markov Model is a finite set of states, each of which is associated with a (generally multidimensional) probability distribution.

Transitions among the states are governed by a set of probabilities called **transition probabilities**.

A hidden Markov model works with probabilities to predict future events or states.

We will be creating a hidden markov model that can predict the weather.


#### Data in Hidden Markov Models

**States**

In each Markov model we have a finite set of states. These states could be something like "warm" and "cold" or "high" and "low".

These states are "hidden" within the model, which means we do not directly observe them.

**Observations**

Each state has a particular outcome or observation associated with it, based on a probability distribution.

*Example: On a hot day, Tim has an 80% chance of being happy and a 20% chance of being sad.*

**Transitions**

Each state has a probability defining the likelihood of transitioning to a different state.

*Example: A cold day has a 30% chance of being followed by a hot day and a 70% chance of being followed by another cold day.*

To create a hidden markov model we need:
- States
- Observation Distribution
- Transition Distribution

For our purposes, we will assume we already have this information available as we attempt to predict the weather on a given day.
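To see how these three ingredients work together, here is a small sketch in plain Python. It uses the two probabilities from the examples above (a cold day is followed by a hot day 30% of the time, and Tim is happy on 80% of hot days). Every other number, including the initial distribution, the transitions out of a hot day and the mood on cold days, is an assumption made up for illustration.

```python
STATES = ["cold", "hot"]

# Assumed: equal chance of starting on a cold or a hot day
initial = {"cold": 0.5, "hot": 0.5}

transition = {
    "cold": {"cold": 0.7, "hot": 0.3},  # from the example in the text
    "hot": {"cold": 0.2, "hot": 0.8},   # assumed values
}

# Observation distribution: probability that Tim is happy in each state
p_happy = {"cold": 0.4, "hot": 0.8}     # "hot" from the text, "cold" assumed


def next_day(dist):
    """Propagate a distribution over states one day forward."""
    return {s: sum(dist[prev] * transition[prev][s] for prev in STATES) for s in STATES}


def forecast(days):
    """Return (state distribution, P(happy)) for each of the next `days` days."""
    dist = dict(initial)
    results = []
    for _ in range(days):
        happy = sum(dist[s] * p_happy[s] for s in STATES)
        results.append((dist, happy))
        dist = next_day(dist)
    return results


for day, (dist, happy) in enumerate(forecast(3)):
    print(f"day {day}: P(cold)={dist['cold']:.3f} P(hot)={dist['hot']:.3f} P(happy)={happy:.3f}")
```

The hidden states are never observed directly. The sketch only tracks how likely each state is and what that implies for the observation.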



## Neural Networks

An artificial neural network learning algorithm, or neural network, or just neural net, is a computational learning system that uses a network of functions to understand and translate a data input of one form into a desired output, usually in another form. The concept of the artificial neural network was inspired by human biology and the way neurons of the human brain function together to understand inputs from human senses. *(https://deepai.org/machine-learning-glossary-and-terms/neural-network)*

### Imports

In [None]:
import tensorflow as tf
from tensorflow import keras

import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt

### Datasets

We will be using the Fashion MNIST dataset. This dataset is included in Keras, and it contains 60,000 images for training and 10,000 images for validation/testing.

In [None]:
# Loading the dataset
fashion_mnist = keras.datasets.fashion_mnist

# Spliting into training and testing datasets
(train_images, train_labels),(test_images, test_labels) = fashion_mnist.load_data()

Let's have a look at its shape:

In [None]:
train_images.shape

This means that we have 60,000 images of 28 * 28 pixels for training the model.

In [None]:
train_images[0,23,23]

Our pixel values are between 0 and 255, 0 being black and 255 being white. This means that we have grayscale images, as every pixel is a shade between black and white.

In [None]:
train_labels[:10]

Our labels range from 0 to 9. Each integer represents a specific article of clothing.

In [None]:
class_names = ['T-shirt/top','Trouser','Pullover','Dress','Coat','Sandal','Shirt','Sneaker','Bag','Ankle boot']

Finally, let's have a look at what some of these images look like:

In [None]:
plt.figure()
plt.imshow(train_images[1])
plt.colorbar()
plt.grid(False)
plt.show()

### Data Preprocessing

Data preprocessing means applying some prior transformation to our data before feeding it to the model. In this case, we will simply scale all of our grayscale pixel values (0-255) to be between 0 and 1.

To do this, we divide the training and testing sets by 255.0. We do this because smaller values make it easier for the model to process our data.

In [None]:
train_images =  train_images / 255.0
test_images = test_images / 255.0

### Building the model

We are going to use a Keras *sequential* model with three different layers. This model represents a feed-forward neural network (one that passes values from left to right).

In [None]:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28,28)), # input layer (1)
    keras.layers.Dense(128,activation='relu'), # hidden layer (2)
    keras.layers.Dense(10,activation='softmax')# output layer (3)
])

**Layer 1**

This is our input layer, and it consists of 784 neurons. We use the flatten layer with an input shape of (28,28) to denote that our input should come in that shape. Flattening means that the layer reshapes the (28,28) array into a vector of 784 values, so that each pixel is associated with one neuron.

**Layer 2**

This is our first and only hidden layer. *Dense* denotes that this layer is fully connected: each neuron from the previous layer connects to each neuron of this layer. It has 128 neurons and uses the rectified linear unit (ReLU) activation function.

**Layer 3**

This is our output layer, and it is also a dense layer. It has 10 neurons that we will look at to determine our model's output. Each neuron represents the probability of a given image being one of the 10 different classes. The *softmax* activation function is used on this layer to calculate a probability distribution over the classes. This means the value of any neuron in this layer will be between 0 and 1, where a value close to 1 represents a high probability of the image being that class.
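The two activation functions named above are short enough to write out by hand. The following is a plain Python sketch of the standard ReLU and softmax definitions, not Keras's own implementation, and the input values are made up for illustration.

```python
import math


def relu(values):
    """Rectified linear unit: negative inputs become 0, others pass through."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw scores into probabilities that are positive and sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


print(relu([-2.0, 0.5, 3.0]))  # [0.0, 0.5, 3.0]

probs = softmax([2.0, 1.0, 0.1])
print(probs)       # three values between 0 and 1; the largest score gets the largest probability
print(sum(probs))  # very close to 1.0
```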

#### Compile the Model

This is the last step of building the model. While compiling the model, we specify the loss function, the optimizer and the metrics we would like to track.

We will be using ***sparse_categorical_crossentropy*** as the loss function and ***adam*** as the optimizer.

In [None]:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
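For a single example, sparse categorical cross-entropy is the negative logarithm of the probability that the model assigned to the true class, where the label is given as an integer index. The sketch below shows that formula in plain Python with made-up probabilities. It leaves out details such as batching and the clipping that a library implementation may apply for numerical stability.

```python
import math


def sparse_categorical_crossentropy(true_label, probabilities):
    """Loss for one example: -log of the probability given to the true class."""
    return -math.log(probabilities[true_label])


# The true class is 0 in both cases; the probabilities are made up
confident_right = sparse_categorical_crossentropy(0, [0.90, 0.05, 0.05])
confident_wrong = sparse_categorical_crossentropy(0, [0.05, 0.90, 0.05])

print(confident_right)  # about 0.105
print(confident_wrong)  # about 2.996
```

A confident correct prediction gives a loss close to 0, while a confident wrong prediction is penalized heavily.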

### Training the model

Now that we have built the model, we will train it with the preprocessed data.

In [None]:
model.fit(train_images,train_labels,epochs=5)

### Evaluating the model

After we have trained the model, we will evaluate it on the test data.

The *verbose* argument controls the output: 0 = silent, 1 = progress bar, 2 = a single line of output.

In [None]:
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=1)

print("Test accuracy = ", test_acc)

### Making predictions

To make predictions we simply need to pass an array of data in the form we've specified in the input layer to `.predict()` method.

In [None]:
predictions = model.predict(test_images)

This method returns an array of predictions, one for each image we passed to it. Let's have a look at the predictions for the first image.

In [None]:
predictions[0]

If we want to get the class with the highest score, we can use the `argmax()` function from NumPy. It simply returns the index of the maximum value in a NumPy array.

In [None]:
label_index = np.argmax(predictions[0])  # index of the most probable class
print("Prediction is ", class_names[label_index])
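To make the lookup explicit, the following plain Python sketch takes a list of ten class probabilities and maps the position of the largest one to a class name. The `scores` values are invented for illustration and are not real model output.

```python
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']


def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# Made-up probabilities for one image, one entry per class
scores = [0.01, 0.02, 0.05, 0.02, 0.10, 0.10, 0.05, 0.10, 0.05, 0.50]

predicted_index = argmax(scores)
print(predicted_index)               # 9
print(class_names[predicted_index])  # Ankle boot
```

The index returned by `argmax` is itself the predicted label, so it is used directly to index `class_names`.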

Let's check whether the prediction is correct by looking at the image:

In [None]:
plt.imshow(test_images[0])
plt.show()

## Deep Computer Vision

We will learn how to perform **image classification** and **object detection/recognition** using *deep computer vision* with something called a ***convolutional neural network***.

The goal of the convolutional neural network is to classify and detect images or specific objects within the images. We will be using image data as our features and a label for those images as our label or output.

Apart from the basics of neural networks, we will look into the following terms:
- Image Data
- Convolutional Layer
- Pooling layer
- CNN Architectures

### Image Data
Up to now we have looked at image data with 1 or 2 dimensions. Now, we are about to deal with image data that is usually made up of 3 dimensions. The 3 dimensions are as follows:
- image height
- image width
- color channels

    Color channels represent the depth of an image and correlate to the colors used in it.

    *For example, an image with three channels is likely made up of RGB (red, green, blue) pixels.*

### Convolutional Neural Network

Each convolutional neural network is made up of one or more convolutional layers. These layers are different from the *dense* layers we have seen previously, as their goal is to find patterns within the images that can be used to classify the images or parts of them.

The fundamental difference between a dense layer and a convolutional layer is that dense layers detect patterns globally, while convolutional layers detect patterns locally.
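"Locally" means that a small filter slides over the image and only ever looks at one small patch at a time. The sketch below implements that sliding-window operation in plain Python for a single channel, with no padding and a stride of 1. The tiny image and the vertical-edge filter are made-up values, and real convolutional layers additionally learn their filter weights and add a bias.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# A 4 * 5 image that is dark (0) on the left and bright (1) on the right
image = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

# A 3 * 3 filter that responds to dark-to-bright vertical edges
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 3, 3], [0, 3, 3]]
```

The output is 0 where the patch is uniform and large where the patch contains the edge, no matter where in the image that edge appears.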

### Creating a Convnet

This section is based on the guide from the TensorFlow documentation: https://www.tensorflow.org/tutorials/images/cnn

#### Importing the necessary package

In [None]:
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt

#### Dataset

The problem we will consider here is classifying 10 different everyday objects. The dataset we will use is built into TensorFlow and is called the CIFAR-10 image dataset. It contains 60,000 32 * 32 color images, with 6,000 images of each class.

The labels in this dataset are as follows:

- Airplane
- Automobile
- Bird
- Cat
- Deer
- Dog
- Frog
- Horse
- Ship
- Truck

Now, we will be loading the dataset

#### Loading the datasets

In [None]:
# Load and split the datasets
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images/255.0 , test_images /255.0

class_names = ['airplane','automobile','bird','cat','deer','dog','frog','horse','ship','truck']

In [None]:
# Let's look at one image
plt.imshow(train_images[1],cmap=plt.cm.binary)
plt.xlabel(class_names[train_labels[1][0]])
plt.show()

### CNN Architecture

A common architecture for a CNN is a stack of Conv2D and MaxPooling2D layers followed by a few densely connected layers. The idea is that the stack of convolutional and max pooling layers extracts the features from the image. Then these features are flattened and fed to densely connected layers that determine the class of an image based on the presence of features.

Let's start by building the **Convolution Base**

In [None]:
model = models.Sequential()
model.add(layers.Conv2D(32,(3,3),activation='relu', input_shape=(32,32,3)))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64,(3,3),activation='relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64,(3,3),activation='relu'))

**Layer 1**

The input shape of our data is (32, 32, 3), and we will apply 32 filters of size 3 * 3 over our input data. We will also apply the ReLU activation function to the output of each convolution operation.

**Layer 2**

This layer performs the max pooling operation using 2 * 2 samples and a stride of 2.
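Max pooling with 2 * 2 samples and a stride of 2 keeps only the largest value in each non-overlapping 2 * 2 block, which halves the height and the width. Here is a plain Python sketch for a single channel. The `max_pool2d` helper and the input values are made up for illustration.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of every `size` * `size` window, moving `stride` cells at a time."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [7, 2, 9, 4],
    [0, 1, 3, 5],
]

pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 2], [7, 9]]
```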

**Other layer**

The next set of layers do very similar things but take as input the feature maps from the previous layer. They also increase the number of filters from 32 to 64. We can do this because our data shrinks in its spatial dimensions as it passes through the layers, meaning we can afford (computationally) to add more depth.

In [None]:
model.summary()
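The output shapes reported by the summary can be reproduced by hand. The sketch below assumes the Keras defaults used by the code above: no padding and a stride of 1 for `Conv2D`, and a stride equal to the pool size for `MaxPooling2D`. The `layer_specs` list is just a hand-written description of the five layers, not something read from the model.

```python
# (kind, kernel or pool size, number of filters) for each layer of the convolutional base
layer_specs = [
    ("conv", 3, 32),
    ("pool", 2, None),
    ("conv", 3, 64),
    ("pool", 2, None),
    ("conv", 3, 64),
]

size, channels = 32, 3  # the input images are 32 * 32 with 3 color channels
shapes = []
for kind, k, filters in layer_specs:
    if kind == "conv":
        size = size - k + 1  # a k * k filter without padding trims k - 1 pixels
        channels = filters   # one output channel per filter
    else:
        size = size // k     # pooling divides height and width by the pool size
    shapes.append((size, size, channels))

print(shapes)
# [(30, 30, 32), (15, 15, 32), (13, 13, 64), (6, 6, 64), (4, 4, 64)]

flattened = size * size * channels
print(flattened)  # 1024
```

The final feature map of shape (4, 4, 64) holds 1024 values, which is the length of the vector obtained once it is flattened.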

### Adding Dense Layers

Up to now we have completed the convolutional base. Now, we need to take these extracted features and add a way to classify them. This is why we add the following layers to the model.

In [None]:
model.add(layers.Flatten())
model.add(layers.Dense(64,activation='relu'))
model.add(layers.Dense(10))

In [None]:
model.summary()

We can see that the flatten layer changes the shape of our data so that we can feed it to the 64-node dense layer, followed by the final output layer of 10 neurons (one for each class).

### Training the model

Now we will compile and train the model using the recommended hyperparameters from TensorFlow.

*Note: This might take longer than the other models.*


In [None]:
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=10,
                    validation_data = (test_images, test_labels)
                    )

### Evaluating the model

We can determine how well the model performed by looking at its performance on the test dataset.

In [None]:
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)

print(test_acc)

### Working with Small Datasets

In situations where you don't have millions of images, it is difficult to train a CNN from scratch that performs very well. This is why we will learn a few techniques we can use to train CNNs on small datasets of just a few thousand images.

#### Data Augmentation

Data augmentation is a technique to avoid overfitting and create a larger dataset from a smaller one.

It simply means performing random transformations on our images so that our models can generalize better. These transformations can be things like compressions, rotations, stretches and even color changes.

Let's look at the code below for an example of data augmentation.

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import img_to_array

# Creates a data generator object that transforms images
datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

# Pick an image to transform

test_img = train_images[14]
img = img_to_array(test_img) # convert the image into a numpy array
img = img.reshape((1,)+img.shape) # add a leading batch dimension

i = 0

for batch in datagen.flow(img, save_prefix='test', save_format='jpeg'):
    plt.figure(i)
    plot = plt.imshow(img_to_array(batch[0]))
    i+=1
    if i>4 :
        break

plt.show()


### Pretrained Models

### Fine Tuning

### Using a Pretrained Model

In [None]:
import os
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
keras = tf.keras

#### Dataset

We will load the *cats_vs_dogs* dataset from the module tensorflow_datasets.

In [None]:
import tensorflow_datasets as tfds
tfds.disable_progress_bar()

# Split the data manually into 80% training, 10% validation and 10% testing
(raw_train, raw_validation, raw_test), metadata = tfds.load(
    'cats_vs_dogs',
    split=['train[:80%]','train[80%:90%]','train[90%:]'],
    with_info = True,
    as_supervised = True,
)

In [None]:
get_label_name = metadata.features['label'].int2str #create a function object that we can use to get labels

# display 2 images from the dataset
for image, label in raw_train.take(2):
    plt.figure()
    plt.imshow(image)
    plt.title(get_label_name(label))

#### Data Preprocessing

Since our images all have different sizes, we need to convert them to the same size. The function below does that for us.

In [None]:
IMG_SIZE = 160 # All images will be resized to 160 * 160

def format_example(image, label):
    """
    Returns the image scaled to [-1, 1] and resized to IMG_SIZE * IMG_SIZE, along with its label
    """
    image = tf.cast(image, tf.float32)
    image = (image/127.5) - 1 # map pixel values from [0, 255] to [-1, 1]
    image = tf.image.resize(image, (IMG_SIZE,IMG_SIZE))
    return image, label
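The expression `(image/127.5) - 1` is a linear rescaling: dividing by 127.5 maps the range 0 to 255 onto 0 to 2, and subtracting 1 shifts it to -1 to 1. A tiny plain Python check of the endpoints, using a hypothetical `scale_pixel` helper:

```python
def scale_pixel(value):
    """Map a pixel value from the range [0, 255] to the range [-1, 1]."""
    return value / 127.5 - 1


print(scale_pixel(0))      # -1.0
print(scale_pixel(127.5))  # 0.0
print(scale_pixel(255))    # 1.0
```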

Now we can apply this function to all our images using map.

In [None]:
train = raw_train.map(format_example)
validation = raw_validation.map(format_example)
test = raw_test.map(format_example)

Let's have a look at our image.

In [None]:
for image, label in train.take(2):
    plt.figure()
    plt.imshow((image + 1) / 2) # rescale from [-1, 1] back to [0, 1] for display
    plt.title(get_label_name(label))

Finally, we will shuffle and batch the images.

In [None]:
BATCH_SIZE = 32
SHUFFLE_BUFFER_SIZE = 1000

train_batches = train.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
validation_batches = validation.batch(BATCH_SIZE)
test_batches  = test.batch(BATCH_SIZE)

### Picking a Pretrained Model

The model we are going to use as the convolutional base for our model is **MobileNet V2**, developed by Google.

In [None]:
IMG_SHAPE = (IMG_SIZE,IMG_SIZE,3)

# Create the base model from the pre-trained model MobileNet v2
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')

In [None]:
base_model.summary()

In [None]:
for image, _ in train_batches.take(1):
    pass

feature_batch = base_model(image)
print(feature_batch.shape)

At this point the base model outputs a tensor of shape (32, 5, 5, 1280), which is a feature extraction from our original batch of shape (32, 160, 160, 3). The 32 is the batch size: each of the 32 images is turned into 1280 feature maps of size 5 * 5.

#### Freezing the Base

The term **freezing** refers to disabling the training property of a layer. It simply means we won't make any changes to the weights of any frozen layers during training. This is important, as we don't want to change the convolutional base that already has learned weights.

In [None]:
base_model.trainable = False

In [None]:
base_model.summary()

#### Adding our Classifier

Now that we have our base layer set up, we can add the classifier. Instead of flattening the feature map of the base layer, we will use a global average pooling layer that averages the entire 5 * 5 area of each 2D feature map and returns a single 1280-element vector per image.



In [None]:
global_average_layer = tf.keras.layers.GlobalAveragePooling2D()
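Global average pooling reduces every feature map to one number, its mean, so an input with C channels becomes a vector of length C. The plain Python sketch below shows this for one image stored as `feature_maps[row][column][channel]`. The tiny 2 * 2 input with 3 channels is made up to keep the arithmetic readable, whereas the real layer receives 5 * 5 maps with 1280 channels.

```python
def global_average_pool(feature_maps):
    """Average each channel over all spatial positions of one image."""
    height = len(feature_maps)
    width = len(feature_maps[0])
    channels = len(feature_maps[0][0])
    return [
        sum(feature_maps[r][c][ch] for r in range(height) for c in range(width))
        / (height * width)
        for ch in range(channels)
    ]


# One made-up image: 2 rows, 2 columns, 3 channels
feature_maps = [
    [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]],
    [[5.0, 4.0, 2.0], [7.0, 0.0, 2.0]],
]

pooled = global_average_pool(feature_maps)
print(pooled)  # [4.0, 1.0, 2.0]
```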

Finally, we will add the prediction layer, which is a single dense neuron.

We can do this because we only have two classes to predict.

In [None]:
prediction_layer = keras.layers.Dense(1)
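A single output neuron without an activation produces one raw score (a logit) per image. The usual way to read such a score is to pass it through the sigmoid function to get the probability of class 1, and to predict class 1 whenever the logit is positive. The sketch below shows that interpretation in plain Python. The logit values are made up, and the helper names are hypothetical.

```python
import math


def sigmoid(logit):
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-logit))


def predict_class(logit):
    """Class 1 if the probability is above 0.5, which happens exactly when the logit is positive."""
    return 1 if logit > 0 else 0


print(sigmoid(0.0))  # 0.5
for logit in [-1.2, 2.3]:  # made-up model outputs
    print(logit, sigmoid(logit), predict_class(logit))
```

This is also why the loss in the training cell is created with `from_logits=True`: the model outputs raw scores rather than probabilities.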

Now, we will combine these layers together in a model

In [None]:
model = tf.keras.Sequential([
    base_model,
    global_average_layer,
    prediction_layer
])

In [None]:
model.summary()

#### Training the model

In [None]:
base_learning_rate = 0.0001

model.compile(optimizer = tf.keras.optimizers.RMSprop(learning_rate=base_learning_rate),
              loss = tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])

In [None]:
initial_epochs = 3
validation_steps = 20

loss0, accuracy0 = model.evaluate(validation_batches, steps=validation_steps)

In [None]:
# Now we can train it on our images
history = model.fit(train_batches,
                    epochs=initial_epochs,
                    validation_data=validation_batches)

acc = history.history['accuracy']
print(acc)

Finally, we save the model we have trained and load it again:

In [None]:
model.save("dogs_vs_cats.h5")
new_model = tf.keras.models.load_model('dogs_vs_cats.h5')