# 3-2 Assignment: Identifying CIFAR-10 Images
---
<div class="alert alert-block alert-success" style="color:black;">
<b>To Begin:</b> Run all code blocks and observe the output. Once you have reviewed the sample output. Use the <b>LastName_FirstName_Assignment2.ipynb</b> file to complete your assignment.
</div>

<div class="alert alert-block alert-warning" style="color:black;">
    <b>Note:</b> For compatability purposes, libraries have been updated from those used in the required readings to match to current versions; hence some of the package invocations may differ slightly from the book. The affected lines of code have comments added to the right as applicable, with the old code commented out above for reference.
</div>

<div class="alert alert-block alert-danger" style="color:black;">
<b>GPU/CUDA/Memory Warnings/Errors:</b> You may receive some errors referencing that GPUs will not be used, CUDA could not be found, or free system memory allocation errors. These and a few others, are standard errors that can be ignored here as they are environment based.<br><br>
<b>Example messages:</b>
    <ul>
        <li>Could not find cuda drivers on your machine, GPU will not be used.</li>
        <li>Please check linkage and avoid linking the same target more than once.</li>
        <li>E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)</li>
        <li>Allocation of ######## exceeds 10% of free system memory</li>
    </ul>
</div>

---

### Installing Required Packages
This is to install necessary components to run the assignment

In [1]:
!pip install -r requirements.txt


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.1.2[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [2]:
from keras.datasets import cifar10
from keras.utils import to_categorical # Syntax change due to version bump
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten # Syntax change due to version bump
from keras.layers import Conv2D, MaxPooling2D
from keras.optimizers import SGD, Adam, RMSprop
import matplotlib.pyplot as plt
import tensorflow as tf
import os # For saving model purposes

<div class="alert alert-block alert-warning" style="color:black;">
    <b>Note:</b> # Logs that appear below are informational!
</div>


In [3]:
# CIFAR_10 is a set of 60K images 32 x 32 pixels on 3 channels
IMG_CHANNELS = 3
IMG_ROWS = 32
IMG_COLS = 32

# Constant 
BATCH_SIZE = 128
NB_EPOCH = 20
NB_CLASSES = 10
VERBOSE = 1
VALIDATION_SPLIT = 0.2
OPTIM = RMSprop()

I0000 00:00:1769020905.243358     616 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1769020905.390752     616 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1769020905.394810     616 cuda_executor.cc:1015] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
I0000 00:00:1769020905.398557     616 cuda_executor.cc:1015] successful NUMA node read from SysFS ha

<div class="alert alert-block alert-warning" style="color:black;">
    <b>Note:</b><br>
    If an UNKNOWN ERROR LOG DISPLAYS, it's just tensorflow trying to connect to GPU. <br>
    It'll simply redirect and use the CPU!
</div>


In [4]:
# Load the datasets
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

X_train shape: (50000, 32, 32, 3)
50000 train samples
10000 test samples


<div class="alert alert-block alert-warning" style="color:black;">
    <b>Note:</b><br>
    One-hot Encoding & Normalization of images<br>
    In the book, the older version is:<br>
    Y_train = np_utils.to_categorical(y_train, NB_CLASSES)
</div>


In [5]:
Y_train = to_categorical(y_train, NB_CLASSES)
Y_test = to_categorical(y_test, NB_CLASSES)
                        
# float and normalization
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

In [6]:
# Network
model = Sequential()
model.add(Conv2D(32, (3, 3), padding="same",
                input_shape=(IMG_ROWS, IMG_COLS, IMG_CHANNELS)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(NB_CLASSES))
model.add(Activation('softmax'))
model.summary()
# Below red is simply a userwarning. 

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


<div class="alert alert-block alert-warning" style="color:black;">
    <b>Note:</b><br>
    <b><i>Training...</i></b> <br><br>NOTE: THIS MAY TAKE SOME TIME. Go grab a cup of coffee!
</div>


In [7]:
model.compile(loss="categorical_crossentropy", optimizer=OPTIM, metrics=["accuracy"])
model.fit(X_train, Y_train, batch_size=BATCH_SIZE, epochs=NB_EPOCH, validation_split=VALIDATION_SPLIT, verbose=VERBOSE)
score = model.evaluate(X_test, Y_test, batch_size=BATCH_SIZE, verbose=VERBOSE)
print('Test Score:', score[0])
print('Test Accuracy:', score[1])

Epoch 1/20


I0000 00:00:1769020915.906136     681 service.cc:146] XLA service 0x76e8100197e0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1769020915.906190     681 service.cc:154]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5


[1m 38/313[0m [32m━━[0m[37m━━━━━━━━━━━━━━━━━━[0m [1m1s[0m 4ms/step - accuracy: 0.1420 - loss: 2.9383

I0000 00:00:1769020918.590454     681 device_compiler.h:188] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.


[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 20ms/step - accuracy: 0.2781 - loss: 2.1251 - val_accuracy: 0.4789 - val_loss: 1.4799
Epoch 2/20
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 5ms/step - accuracy: 0.4842 - loss: 1.4529 - val_accuracy: 0.5398 - val_loss: 1.3041
Epoch 3/20
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 5ms/step - accuracy: 0.5359 - loss: 1.3198 - val_accuracy: 0.5675 - val_loss: 1.2407
Epoch 4/20
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 5ms/step - accuracy: 0.5707 - loss: 1.2229 - val_accuracy: 0.5830 - val_loss: 1.1725
Epoch 5/20
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 5ms/step - accuracy: 0.5925 - loss: 1.1544 - val_accuracy: 0.6075 - val_loss: 1.1365
Epoch 6/20
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 5ms/step - accuracy: 0.6193 - loss: 1.0839 - val_accuracy: 0.6115 - val_loss: 1.1134
Epoch 7/20
[1m313/313[0m [32m━━━━━

In [8]:
# Save the model
os.makedirs('output', exist_ok=True)
model_json = model.to_json()
open('./output/cifar10_architecture.json', 'w').write(model_json)

# And the weights learned by our deep network on the training set
model.save_weights('./output/cifar10.weights.h5', overwrite=True) # NOTE, it is now cifar10.weights, not cifar10_weights

In [9]:
# Improving the CIFAR-10 performance with deeper network
model = Sequential()
model.add(Conv2D(32, (3,3), padding='same', input_shape=(IMG_ROWS, IMG_COLS, IMG_CHANNELS)))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(NB_CLASSES))
model.add(Activation('softmax'))

model.summary()

In [10]:
# Improving the CIFAR-10 performance with data augmentation
from tensorflow.keras.preprocessing.image import ImageDataGenerator # Note, we are using Tensorflow's Keras package!
from keras.datasets import cifar10
import numpy as np
NUM_TO_AUGMENT = 5

In [11]:
# load dataset
(X_train, y_train), (X_test, y_test) = cifar10.load_data()

# Augmenting
print("Augmenting training set images...")
datagen = ImageDataGenerator(rotation_range=40,
                            width_shift_range=0.2,
                            height_shift_range=0.2,
                            zoom_range=0.2,
                            horizontal_flip=True,
                            fill_mode='nearest')

Augmenting training set images...


<div class="alert alert-block alert-warning" style="color:black;">
    The Below will run to create images from the above ImageDataGenerator(). <br>
    <b>Note:</b> <b><i>If you run it, it will take awhile!</b></i>
</div>


In [12]:
xtas, ytas = [], []

for i in range(X_train.shape[0]):
    num_aug = 0
    x = X_train[i] #(3, 32, 32)
    x = x.reshape((1,) + x.shape) # (1 , 3, 32, 32)
    for x_aug in datagen.flow(x, batch_size=1, save_to_dir='preview', save_prefix='cifar', save_format='jpeg'):
        if num_aug >= NUM_TO_AUGMENT:
            break
        xtas.append(x_aug[0])
        num_aug += 1

In [13]:
optimizer = RMSprop() # Recreating the optimizer

#for the datagen
datagen.fit(X_train)

model.compile(loss="categorical_crossentropy", optimizer=optimizer, metrics=["accuracy"])

# train changed from model.fit_generator()
history = model.fit(
    datagen.flow(X_train, Y_train, batch_size=BATCH_SIZE),
    steps_per_epoch=X_train.shape[0] // BATCH_SIZE,
    epochs=NB_EPOCH,
    validation_data=tf.data.Dataset.from_tensor_slices((X_test, Y_test)).batch(BATCH_SIZE),
    verbose=VERBOSE
)

# score = model.evaluate(X_test, Y_test, batch_size=BATCH_SIZE, verbose=VERBOSE)

print("Test Score:", score[0])
print("Test Accuracy:", score[1])

Epoch 1/20


  self._warn_if_super_not_called()


[1m390/390[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m35s[0m 74ms/step - accuracy: 0.1639 - loss: 2.9001 - val_accuracy: 0.3335 - val_loss: 1.8116
Epoch 2/20
[1m390/390[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 423us/step - accuracy: 0.2109 - loss: 1.9378 - val_accuracy: 0.3421 - val_loss: 1.8330
Epoch 3/20


  self.gen.throw(typ, value, traceback)


[1m390/390[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m26s[0m 64ms/step - accuracy: 0.2745 - loss: 1.9694 - val_accuracy: 0.3635 - val_loss: 1.7070
Epoch 4/20
[1m390/390[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 450us/step - accuracy: 0.3047 - loss: 1.8942 - val_accuracy: 0.3787 - val_loss: 1.7016
Epoch 5/20
[1m390/390[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m26s[0m 65ms/step - accuracy: 0.3163 - loss: 1.8491 - val_accuracy: 0.4167 - val_loss: 1.6118
Epoch 6/20
[1m390/390[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 448us/step - accuracy: 0.3125 - loss: 1.9051 - val_accuracy: 0.4068 - val_loss: 1.6593
Epoch 7/20
[1m390/390[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m25s[0m 64ms/step - accuracy: 0.3440 - loss: 1.7852 - val_accuracy: 0.4417 - val_loss: 1.5433
Epoch 8/20
[1m390/390[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 450us/step - accuracy: 0.3438 - loss: 1.7486 - val_accuracy: 0.4478 - val_loss: 1.5472
Epoch 9/20
[1m390/390[0m 

## Identifying CIFAR-10 Images and Exploring the Ethical and Privacy Implications 

<p style="text-indent: 2em;"> 
The CIFAR-10 dataset, consisting of low-resolution images of common objects such as animals and vehicles, was used in this assignment to train a convolutional neural network (CNN). This particular use case is relatively benign, but when sensitive image datasets (images of people, for example) are used for training similar deep learning algorithms, ethical and privacy concerns emerge (Luminovo, 2019, April 24).
</p>

 <p style="text-indent: 2em;"> 
When CNNs are trained on personal image data, privacy is a paramount concern. If a CNN were trained using image datasets of people's faces, it could then be used for facial recognition systems without the consent of the individuals represented in the data (Luminovo, 2019, April 24). Additionally, machine learning (ML) models run the risk of unintentionally memorizing or exposing sensitive information from training data, thus increasing the likelihood of data leakage or re-identification, especially when datasets are very large and insufficiently anonymized (Luminovo, 2019, April 24). These concerns highlight the need for privacy-preserving measures like limiting data collection, anonymizing datasets, enforcing strict controls over data use through the model lifecycle, and implementing rigorous oversight of how training data is collected, stored, and reused (Luminovo, 2019, April 24).
 </p>

 <p style="text-indent: 2em;"> 
Beyond privacy concerns, biases that could be linked to demographic groups present a significant ethical issue. When ML systems learn from biased or unrepresentive data, they may end up producing unfair or discriminatory outcomes. For example, existing inequalities in contexts like hiring, credit decisions, or law enforcement can be reinforced when models are trained on datasets that under-represent specific demographic groups (Lumenova AI, 2024, July 23). Practices to ensure fairness in ML include requiring careful attention to data collection, transparency in model development, regular auditing for bias, and inclusive team participation during design and deployment (Lumenova AI, 2024, July 23).
 </p>

 ### <div style="text-align: center;">**References**</div> 

Lumenova AI. (2024, July 23). *Fairness and bias in machine learning.*  
    https://www.lumenova.ai/blog/fairness-bias-machine-learning/

Luminovo. (2019, April 24). *Data privacy in machine learning: A technical deep dive.*  
    https://medium.com/luminovo/data-privacy-in-machine-learning-a-technical-deep-dive-f7f0365b1d60