# Working with Custom Images

So far everything we've worked with has been nicely formatted for us already by Keras.

Let's explore what its like to work with a more realistic data set.

## The Data

-----------

## PLEASE NOTE: THIS DATASET IS VERY LARGE. IT CAN BE DOWNLOADED FROM THE PREVIOUS LECTURE. PLEASE WATCH THE VIDEO LECTURE ON HOW TO GET THE DATA.

## USE OUR VERSION OF THE DATA. WE ALREADY ORGANIZED IT FOR YOU!!

--------
----------
--------

ORIGINAL DATA SOURCE:

https://www.microsoft.com/en-us/download/confirmation.aspx?id=54765

-----------

The Kaggle Competition: [Cats and Dogs](https://www.kaggle.com/c/dogs-vs-cats-redux-kernels-edition) includes 25,000 images of cats and dogs. We will be building a classifier that works with these images and attempt to detect dogs versus cats!

The pictures are numbered 0-12499 for both cats and dogs, thus we have 12,500 images of Dogs and 12,500 images of Cats. This is a huge dataset!!

--------
----------
------------


**Note: We will be dealing with real image files, NOT numpy arrays. Which means a large part of this process will be learning how to work with and deal with large groups of image files. This is too much data to fit in memory as a numpy array, so we'll need to feed it into our model in batches. **

### Visualizing the Data


-------
Let's take a closer look at the data.

In [1]:
#import matplotlib.pyplot as plt
import cv2


## Preparing the Data for the model

There is too much data for us to read all at once in memory. We can use some built in functions in Keras to automatically process the data, generate a flow of batches from a directory, and also manipulate the images.

### Image Manipulation

Its usually a good idea to manipulate the images with rotation, resizing, and scaling so the model becomes more robust to different images that our data set doesn't have. We can use the **ImageDataGenerator** to do this automatically for us. Check out the documentation for a full list of all the parameters you can use here!

In [2]:
from keras.preprocessing.image import ImageDataGenerator

In [3]:
image_gen = ImageDataGenerator(rotation_range=30, # rotate the image 30 degrees
                               width_shift_range=0.1, # Shift the pic width by a max of 10%
                               height_shift_range=0.1, # Shift the pic height by a max of 10%
                               rescale=1/255, # Rescale the image by normalzing it.
                               shear_range=0.2, # Shear means cutting away part of the image (max 20%)
                               zoom_range=0.2, # Zoom in by 20% max
                               horizontal_flip=True, # Allo horizontal flipping
                               fill_mode='nearest' # Fill in missing pixels with the nearest filled value
                              )

### Generating many manipulated images from a directory


In order to use .flow_from_directory, you must organize the images in sub-directories. This is an absolute requirement, otherwise the method won't work. The directories should only contain images of one class, so one folder per class of images.

Structure Needed:

* Image Data Folder
    * Class 1
        * 0.jpg
        * 1.jpg
        * ...
    * Class 2
        * 0.jpg
        * 1.jpg
        * ...
    * ...
    * Class n

In [4]:
image_gen.flow_from_directory('E:/NU/fall 22/Grad project/datasets/TRAIN')

Found 63178 images belonging to 2 classes.


<keras.preprocessing.image.DirectoryIterator at 0x1a2260eb8e0>

In [5]:
image_gen.flow_from_directory('E:/NU/fall 22/Grad project/datasets/TEST')

Found 18699 images belonging to 2 classes.


<keras.preprocessing.image.DirectoryIterator at 0x1a2260d0430>

### Resizing Images

Let's have Keras resize all the images to 150 pixels by 150 pixels once they've been manipulated.

In [6]:
# width,height,channels
image_shape = (512,512,3)

# Creating the Model

In [7]:
from keras.models import Sequential
from keras.layers import Activation, Dropout, Flatten, Dense, Conv2D, MaxPooling2D

In [14]:
model = Sequential()

model.add(Conv2D(filters=128, kernel_size=(7,7),input_shape=(512,512,3), activation='relu',))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(filters=64, kernel_size=(5,5),input_shape=(512,512,3), activation='relu',))
model.add(MaxPooling2D(pool_size=(2, 2)))


model.add(Conv2D(filters=32, kernel_size=(3,3),input_shape=(512,512,3), activation='relu',))
model.add(MaxPooling2D(pool_size=(2, 2)))


model.add(Flatten())


model.add(Dense(128))
model.add(Activation('relu'))


model.add(Dropout(0.25))


model.add(Dense(1))
model.add(Activation('sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

### Training the Model

In [15]:
batch_size = 16

train_image_gen = image_gen.flow_from_directory('E:/NU/fall 22/Grad project/datasets/TRAIN',
                                               target_size=image_shape[:2],
                                               batch_size=batch_size,
                                               class_mode='binary')

Found 63178 images belonging to 2 classes.


In [16]:
test_image_gen = image_gen.flow_from_directory('E:/NU/fall 22/Grad project/datasets/TEST',
                                               target_size=image_shape[:2],
                                               batch_size=batch_size,
                                               class_mode='binary')

Found 18699 images belonging to 2 classes.


In [17]:
train_image_gen.class_indices

{'MYDATASET AUTHENTIC - Copy TRAINING': 0,
 'MYDATASET TAMPERED - Copy TRAINING': 1}

In [18]:
import warnings
warnings.filterwarnings('ignore')

In [19]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_5 (Conv2D)           (None, 506, 506, 128)     18944     
                                                                 
 max_pooling2d_5 (MaxPooling  (None, 253, 253, 128)    0         
 2D)                                                             
                                                                 
 conv2d_6 (Conv2D)           (None, 249, 249, 64)      204864    
                                                                 
 max_pooling2d_6 (MaxPooling  (None, 124, 124, 64)     0         
 2D)                                                             
                                                                 
 conv2d_7 (Conv2D)           (None, 122, 122, 32)      18464     
                                                                 
 max_pooling2d_7 (MaxPooling  (None, 61, 61, 32)      

In [21]:
results = model.fit_generator(train_image_gen,epochs=10,
                              steps_per_epoch=150,
                              validation_data=test_image_gen,
                             validation_steps=12)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [22]:
model.save('CNN_Image2.h5')

# Evaluating the Model

In [23]:
results.history['accuracy']

[0.512499988079071,
 0.5037500262260437,
 0.5475000143051147,
 0.6200000047683716,
 0.6512500047683716,
 0.73416668176651,
 0.7691666483879089,
 0.786967396736145,
 0.8070833086967468,
 0.8095238208770752]

In [24]:
model.save('CNN_Image_10epochs2.h5')

# Predicting on new images

In [25]:
train_image_gen.class_indices

{'MYDATASET AUTHENTIC - Copy TRAINING': 0,
 'MYDATASET TAMPERED - Copy TRAINING': 1}

In [28]:
import numpy as np
import tensorflow as tf


file = 'E:/NU/fall 22/Grad project/images/cattle.png'

img = tf.keras.utils.load_img(file, target_size=(512,512))

img = tf.keras.utils.img_to_array(img)

img = np.expand_dims(img, axis=0)
img = img/255

In [29]:
prediction_prob = model.predict(img)



In [30]:
# Output prediction
print(f'Probability that image is a tampered is: {prediction_prob} ')

Probability that image is a tampered is: [[0.42638242]] 


In [31]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
model = keras.models.load_model('CNN_Image2.h5')

In [32]:
from scipy import ndimage, misc
import os
import cv2

In [33]:

path = "E:/NU/fall 22/Grad project/images authentic"
real_label = []
predicted = []
# iterate through the names of contents of the folder
for image_path in os.listdir(path):

    # create the full input path and read the file
    input_path = os.path.join(path, image_path)
    image= cv2.imread(input_path)
    if image is None:
        print('Wrong path:', input_path)
    else:
        img = cv2.resize(image, (512,512))
        img = tf.keras.utils.img_to_array(img)
        img = np.expand_dims(img, axis=0)
        img = img/255
        prediction_prob = model.predict(img)
        real_label.append(0)
        if prediction_prob < 0.5:
            predicted.append(0)
        else:
            predicted.append(1)
        print(f'Probability that image is a tampered is: {prediction_prob} ')

Probability that image is a tampered is: [[0.42591536]] 
Probability that image is a tampered is: [[0.44839832]] 
Probability that image is a tampered is: [[0.45910746]] 
Probability that image is a tampered is: [[0.5430195]] 
Probability that image is a tampered is: [[0.6463366]] 
Probability that image is a tampered is: [[0.4222735]] 
Probability that image is a tampered is: [[0.24627315]] 
Probability that image is a tampered is: [[0.3996654]] 
Probability that image is a tampered is: [[0.8826029]] 
Probability that image is a tampered is: [[0.4732859]] 
Probability that image is a tampered is: [[0.41801155]] 
Probability that image is a tampered is: [[0.50616723]] 
Probability that image is a tampered is: [[0.71795464]] 
Probability that image is a tampered is: [[0.4078108]] 
Probability that image is a tampered is: [[0.43630466]] 
Probability that image is a tampered is: [[0.26347342]] 
Probability that image is a tampered is: [[0.94761026]] 
Probability that image is a tampered i

In [34]:

path = "E:/NU/fall 22/Grad project/images tampered"
real_label2 = []
predicted2 = []
# iterate through the names of contents of the folder
for image_path in os.listdir(path):

    # create the full input path and read the file
    input_path = os.path.join(path, image_path)
    image= cv2.imread(input_path)
    if image is None:
        print('Wrong path:', input_path)
    else:
        img = cv2.resize(image, (512,512))
        img = tf.keras.utils.img_to_array(img)
        img = np.expand_dims(img, axis=0)
        img = img/255
        prediction_prob = model.predict(img)
        real_label2.append(1)
        if prediction_prob > 0.5:
            predicted2.append(1)
        else:
            predicted2.append(0)
        print(f'Probability that image is a tampered is: {prediction_prob} ')

Probability that image is a tampered is: [[0.7411762]] 
Probability that image is a tampered is: [[0.73913693]] 
Probability that image is a tampered is: [[0.7430946]] 
Probability that image is a tampered is: [[0.7426874]] 
Probability that image is a tampered is: [[0.74018484]] 
Probability that image is a tampered is: [[0.7315102]] 
Probability that image is a tampered is: [[0.74713755]] 
Probability that image is a tampered is: [[0.74747396]] 
Probability that image is a tampered is: [[0.74514407]] 
Probability that image is a tampered is: [[0.7362174]] 
Probability that image is a tampered is: [[0.7342488]] 
Probability that image is a tampered is: [[0.749936]] 
Probability that image is a tampered is: [[0.74343556]] 
Probability that image is a tampered is: [[0.74392724]] 
Probability that image is a tampered is: [[0.9420984]] 
Probability that image is a tampered is: [[0.89095026]] 
Probability that image is a tampered is: [[0.9307284]] 
Probability that image is a tampered is: 

In [35]:
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from skimage.util import view_as_windows

In [36]:
print(accuracy_score(real_label,predicted))
print(confusion_matrix(real_label,predicted))
print(classification_report(real_label,predicted))

0.7006125574272588
[[915 391]
 [  0   0]]
              precision    recall  f1-score   support

           0       1.00      0.70      0.82      1306
           1       0.00      0.00      0.00         0

    accuracy                           0.70      1306
   macro avg       0.50      0.35      0.41      1306
weighted avg       1.00      0.70      0.82      1306



In [37]:
print(accuracy_score(real_label2,predicted2))
print(confusion_matrix(real_label2,predicted2))
print(classification_report(real_label2,predicted2))

0.4970954356846473
[[  0   0]
 [606 599]]
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         0
           1       1.00      0.50      0.66      1205

    accuracy                           0.50      1205
   macro avg       0.50      0.25      0.33      1205
weighted avg       1.00      0.50      0.66      1205



In [38]:
real_label_comp = real_label + real_label2

In [39]:
predicted_comp = predicted + predicted2

In [40]:
print(accuracy_score(real_label_comp,predicted_comp))
print(confusion_matrix(real_label_comp,predicted_comp))
print(classification_report(real_label_comp,predicted_comp))

0.60294703305456
[[915 391]
 [606 599]]
              precision    recall  f1-score   support

           0       0.60      0.70      0.65      1306
           1       0.61      0.50      0.55      1205

    accuracy                           0.60      2511
   macro avg       0.60      0.60      0.60      2511
weighted avg       0.60      0.60      0.60      2511



# Great Job!