
#Image Classification From Scratch  With TensorFlow 2.0

[Enrique Z. Losoya](https://orcid.org/0000-0001-7763-3349) based on [Jian Tao](https://orcid.org/0000-0003-4228-6089), Texas A&M University.
Updated: Jan. 2, 2023.

This notebook trains a Convolutional Neural Network (CNN) without using Keras --- it only uses Tensorflow's 2.0 APIs!

We will declare the weights, loss function and an optimizer for our CNN and train it on the Horses-Or-Humans dataset already available from the TensorFlow Datasets.


## 1) Installing TensorFlow 2.X

First, if our Colab's default TensorFlow version is not 2.0 then we need to upgrade the TF package using `pip`.

For instance, to install the 2.0 version of TensorFlow, we can run the following command:

 !pip install --upgrade tensorflow==2.0


## 2) Importing packages and declaring useful variables

We declare some useful variables like `batch_size`, `padding` for convolutional layers, the `learning_rate` and so on.


In [2]:

import tensorflow as tf
import tensorflow_datasets as tfds

padding = "SAME"  #@param ['SAME', 'VALID' ]
num_output_classes = 102  #@param {type: "number"}
batch_size = 32  #@param {type: "number"}
learning_rate = 0.001  #@param {type: "number"}



## 3) Fetching the Dataset

We fetch our dataset from TensorFlow Datasets which makes much of the data preprocessing easier ;-)


In [3]:
dataset_name = 'horses_or_humans'  #@param {type: "string"}

dataset = tfds.load( name=dataset_name , split=tfds.Split.TRAIN )
dataset = dataset.shuffle( 1024 ).batch( batch_size )

2023-01-04 09:19:17.679638: W tensorflow/core/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "Not found: Could not locate the credentials file.". Retrieving token from GCE failed with "Failed precondition: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: metadata".


[1mDownloading and preparing dataset 153.59 MiB (download: 153.59 MiB, generated: Unknown size, total: 153.59 MiB) to /Users/ezlo/tensorflow_datasets/horses_or_humans/3.0.0...[0m


Dl Completed...: 0 url [00:00, ? url/s]

Dl Size...: 0 MiB [00:00, ? MiB/s]

Generating splits...:   0%|          | 0/2 [00:00<?, ? splits/s]

Generating train examples...:   0%|          | 0/1027 [00:00<?, ? examples/s]

Shuffling /Users/ezlo/tensorflow_datasets/horses_or_humans/3.0.0.incompleteQUG6U2/horses_or_humans-train.tfrec…

Generating test examples...:   0%|          | 0/256 [00:00<?, ? examples/s]

Shuffling /Users/ezlo/tensorflow_datasets/horses_or_humans/3.0.0.incompleteQUG6U2/horses_or_humans-test.tfreco…

[1mDataset horses_or_humans downloaded and prepared to /Users/ezlo/tensorflow_datasets/horses_or_humans/3.0.0. Subsequent calls will reuse this data.[0m
Metal device set to: Apple M1 Max

systemMemory: 64.00 GB
maxCacheSize: 24.00 GB



2023-01-04 09:19:20.658970: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-01-04 09:19:20.659126: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)


## 4) Defining CNN operations

For our CNN model, we mainly use three operations (layers).

1. `conv2d` : Performs convolutions over the `inputs` matrix with kernels (`filters`) and `stride_size`. It also performs the Leaky ReLU activation function.

2. `maxpool` : Performs max pooling over the `inputs`.

3. `dense` : Dense layers for the CNN with dropout.

In [4]:

leaky_relu_alpha = 0.2 #@param {type: "number"}
dropout_rate = 0.5 #@param {type: "number"}

def conv2d( inputs , filters , stride_size ):
    out = tf.nn.conv2d( inputs , filters , strides=[ 1 , stride_size , stride_size , 1 ] , padding=padding ) 
    return tf.nn.leaky_relu( out , alpha=leaky_relu_alpha ) 

def maxpool( inputs , pool_size , stride_size ):
    return tf.nn.max_pool2d( inputs , ksize=[ 1 , pool_size , pool_size , 1 ] , padding='VALID' , strides=[ 1 , stride_size , stride_size , 1 ] )

def dense( inputs , weights ):
    x = tf.nn.leaky_relu( tf.matmul( inputs , weights ) , alpha=leaky_relu_alpha )
    return tf.nn.dropout( x , rate=dropout_rate )


## 5) Initializing CNN weights

We initialize the weights for our CNN. The shapes need to be calculated, but the `tf.nn.conv2d` expects the filters to have a shape of `[kernel_size, kernel_size, in_dims, out_dims]`.

We use the `glorot_uniform` initializer for our weights.

In [6]:
output_classes = 3
initializer = tf.initializers.glorot_uniform()
def get_weight( shape , name ):
    return tf.Variable( initializer( shape ) , name=name , trainable=True , dtype=tf.float32 )

shapes = [
    [ 3 , 3 , 3 , 16 ] , 
    [ 3 , 3 , 16 , 16 ] , 
    [ 3 , 3 , 16 , 32 ] , 
    [ 3 , 3 , 32 , 32 ] ,
    [ 3 , 3 , 32 , 64 ] , 
    [ 3 , 3 , 64 , 64 ] ,
    [ 3 , 3 , 64 , 128 ] , 
    [ 3 , 3 , 128 , 128 ] ,
    [ 3 , 3 , 128 , 256 ] , 
    [ 3 , 3 , 256 , 256 ] ,
    [ 3 , 3 , 256 , 512 ] , 
    [ 3 , 3 , 512 , 512 ] ,
    [ 8192 , 3600 ] , 
    [ 3600 , 2400 ] ,
    [ 2400 , 1600 ] , 
    [ 1600 , 800 ] ,
    [ 800 , 64 ] ,
    [ 64 , output_classes ] ,
]

weights = []
for i in range( len( shapes ) ):
    weights.append( get_weight( shapes[ i ] , 'weight{}'.format( i ) ) )



## 6) Assembling the operations

We put together all the CNN ops we defined earlier into a final model which we will use for training finally.


In [7]:

def model( x ) :
    x = tf.cast( x , dtype=tf.float32 )
    c1 = conv2d( x , weights[ 0 ] , stride_size=1 ) 
    c1 = conv2d( c1 , weights[ 1 ] , stride_size=1 ) 
    p1 = maxpool( c1 , pool_size=2 , stride_size=2 )
    
    c2 = conv2d( p1 , weights[ 2 ] , stride_size=1 )
    c2 = conv2d( c2 , weights[ 3 ] , stride_size=1 ) 
    p2 = maxpool( c2 , pool_size=2 , stride_size=2 )
    
    c3 = conv2d( p2 , weights[ 4 ] , stride_size=1 ) 
    c3 = conv2d( c3 , weights[ 5 ] , stride_size=1 ) 
    p3 = maxpool( c3 , pool_size=2 , stride_size=2 )
    
    c4 = conv2d( p3 , weights[ 6 ] , stride_size=1 )
    c4 = conv2d( c4 , weights[ 7 ] , stride_size=1 )
    p4 = maxpool( c4 , pool_size=2 , stride_size=2 )

    c5 = conv2d( p4 , weights[ 8 ] , stride_size=1 )
    c5 = conv2d( c5 , weights[ 9 ] , stride_size=1 )
    p5 = maxpool( c5 , pool_size=2 , stride_size=2 )

    c6 = conv2d( p5 , weights[ 10 ] , stride_size=1 )
    c6 = conv2d( c6 , weights[ 11 ] , stride_size=1 )
    p6 = maxpool( c6 , pool_size=2 , stride_size=2 )

    flatten = tf.reshape( p6 , shape=( tf.shape( p6 )[0] , -1 ))

    d1 = dense( flatten , weights[ 12 ] )
    d2 = dense( d1 , weights[ 13 ] )
    d3 = dense( d2 , weights[ 14 ] )
    d4 = dense( d3 , weights[ 15 ] )
    d5 = dense( d4 , weights[ 16 ] )
    logits = tf.matmul( d5 , weights[ 17 ] )

    return tf.nn.softmax( logits )



## 7) Defining the loss function and optimization using `tf.GradientTape`

We then use `tf.losses.categorical_crossentropy` as our loss function. Now comes `tf.GradientTape`, an automatic differentiation engine which records all the derivatives within its scope.

The `train_step` function takes in a batch of data, calculates the loss and gradients, and then with `optimizer.apply_gradients`, we update the `weights`.


In [8]:

def loss( pred , target ):
    return tf.losses.categorical_crossentropy( target , pred )

optimizer = tf.optimizers.Adam( learning_rate )

def train_step( model, inputs , outputs ):
    with tf.GradientTape() as tape:
        current_loss = loss( model( inputs ), outputs)
    grads = tape.gradient( current_loss , weights )
    optimizer.apply_gradients( zip( grads , weights ) )
    print( tf.reduce_mean( current_loss ) )



## 8) Final Training

We train our model for a specific number of epochs.


In [9]:

num_epochs = 256 #@param {type: "number"}

for e in range( num_epochs ):
    for features in dataset:
        image , label = features[ 'image' ] , features[ 'label' ]
        train_step( model , image , tf.one_hot( label , depth=3 ) )


2023-01-04 09:23:40.806204: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2023-01-04 09:23:40.807407: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz


tf.Tensor(0.9273989, shape=(), dtype=float32)
tf.Tensor(7.0516663, shape=(), dtype=float32)
tf.Tensor(7.0516667, shape=(), dtype=float32)
tf.Tensor(7.555357, shape=(), dtype=float32)
tf.Tensor(5.540595, shape=(), dtype=float32)
tf.Tensor(7.5553575, shape=(), dtype=float32)
tf.Tensor(6.5479765, shape=(), dtype=float32)
tf.Tensor(7.0516667, shape=(), dtype=float32)
tf.Tensor(10.07381, shape=(), dtype=float32)
tf.Tensor(6.5479765, shape=(), dtype=float32)
tf.Tensor(7.5553575, shape=(), dtype=float32)
tf.Tensor(10.577499, shape=(), dtype=float32)
tf.Tensor(10.07381, shape=(), dtype=float32)
tf.Tensor(8.059048, shape=(), dtype=float32)
tf.Tensor(6.5479765, shape=(), dtype=float32)
tf.Tensor(7.5553575, shape=(), dtype=float32)
tf.Tensor(7.5553575, shape=(), dtype=float32)
tf.Tensor(7.051667, shape=(), dtype=float32)
tf.Tensor(8.059048, shape=(), dtype=float32)
tf.Tensor(6.5479765, shape=(), dtype=float32)
tf.Tensor(7.051667, shape=(), dtype=float32)
tf.Tensor(11.08119, shape=(), dtype=float3


That is all. You can now export the graph, play around with more layers, or fine-tune the hyperparameters.