In [None]:
from keras.datasets import cifar10
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.optimizers import SGD, Adam, RMSprop
import matplotlib.pyplot as plt
from keras.preprocessing.image import ImageDataGenerator
import numpy as np
import os
import ssl
# Disable SSL certificate verification so the CIFAR-10 download works on machines
# with certificate problems (insecure; use only for local experiments)
ssl._create_default_https_context = ssl._create_unverified_context
os.makedirs('preview', exist_ok=True)  # directory for the augmented preview images
NUM_TO_AUGMENT = 5

# CIFAR-10 is a set of 60K images, each 32x32 pixels with 3 channels
IMG_CHANNELS = 3
IMG_ROWS = 32
IMG_COLS = 32
# training constants
BATCH_SIZE = 128
NB_EPOCH = 20
NB_CLASSES = 10
VERBOSE = 1
VALIDATION_SPLIT = 0.2
OPTIM = RMSprop()
#load dataset
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
# augmenting (the augmentation itself runs further below, after the baseline model is trained)
print("Augmenting training set images...")
# convert to categorical
Y_train = np_utils.to_categorical(y_train, NB_CLASSES)
Y_test = np_utils.to_categorical(y_test, NB_CLASSES)
# float and normalization
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
# network
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same', input_shape=(IMG_ROWS, IMG_COLS, IMG_CHANNELS)))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))  # 3x3 kernel; Conv2D(64, 3, 3) would set strides=3 in Keras 2
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(NB_CLASSES))
model.add(Activation('softmax'))
model.summary()
# train
model.compile(loss='categorical_crossentropy', optimizer=OPTIM, metrics=['accuracy'])
model.fit(X_train, Y_train, batch_size=BATCH_SIZE, epochs=NB_EPOCH, validation_split=VALIDATION_SPLIT, verbose=VERBOSE)
score = model.evaluate(X_test, Y_test, batch_size=BATCH_SIZE, verbose=VERBOSE)
print("Test score:", score[0])
print('Test accuracy:', score[1])
# save the model architecture as JSON
model_json = model.to_json()
with open('cifar10_architecture.json', 'w') as f:
    f.write(model_json)
# and the weights learned by our deep network on the training set
model.save_weights('cifar10_weights.h5', overwrite=True)
datagen = ImageDataGenerator( rotation_range=40, width_shift_range=0.2, height_shift_range=0.2, zoom_range=0.2, horizontal_flip=True, fill_mode='nearest')
# generate NUM_TO_AUGMENT augmented copies of every training image
xtas, ytas = [], []
for i in range(X_train.shape[0]):
    num_aug = 0
    x = X_train[i]  # (32, 32, 3)
    x = x.reshape((1,) + x.shape)  # (1, 32, 32, 3)
    # datagen.flow yields batches forever, so stop explicitly after NUM_TO_AUGMENT
    for x_aug in datagen.flow(x, batch_size=1, save_to_dir='preview', save_prefix='cifar', save_format='jpeg'):
        if num_aug >= NUM_TO_AUGMENT:
            break
        xtas.append(x_aug[0])
        num_aug += 1
# fit the data generator
datagen.fit(X_train)
# train on augmented batches; Keras 2 counts batches per epoch (steps_per_epoch), not samples
history = model.fit_generator(datagen.flow(X_train, Y_train, batch_size=BATCH_SIZE), steps_per_epoch=X_train.shape[0] // BATCH_SIZE, epochs=NB_EPOCH, verbose=VERBOSE)
score = model.evaluate(X_test, Y_test, batch_size=BATCH_SIZE, verbose=VERBOSE)
print("Test score:", score[0])
print('Test accuracy:', score[1])

Using TensorFlow backend.


Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
X_train shape: (50000, 32, 32, 3)
50000 train samples
10000 test samples
Augmenting training set images...




Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 32, 32, 32)        896       
_________________________________________________________________
activation_1 (Activation)    (None, 32, 32, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 32, 32, 32)        9248      
_________________________________________________________________
activation_2 (Activation)    (None, 32, 32, 32)        0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 16, 16, 32)        0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 16, 16, 32)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 16, 16, 64)       
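
The parameter counts and output shapes in the summary rows above can be checked by hand. The snippet below is a minimal sketch of that arithmetic, assuming a standard 2D convolution (one bias per filter) and non-overlapping max pooling; the helper names `conv2d_params` and `pooled_size` are made up for illustration and are not part of Keras.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Trainable parameters of a 2D convolution: weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def pooled_size(size, pool):
    """Spatial size after non-overlapping max pooling (stride equal to pool size)."""
    return size // pool


# conv2d_1: 3x3 kernel over the 3 input channels, 32 filters
print(conv2d_params(3, 3, 3, 32))   # 896
# conv2d_2: 3x3 kernel over 32 input channels, 32 filters
print(conv2d_params(3, 3, 32, 32))  # 9248
# max_pooling2d_1: 2x2 pooling applied to a 32x32 feature map
print(pooled_size(32, 2))           # 16
```

With `padding='same'` the convolutions keep the 32x32 spatial size, which is why only the pooling layer changes the shape in the rows shown.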

Applied to other data sets, this kind of image-classification algorithm raises significant ethical and privacy concerns. When it is trained on a data set composed of people’s faces, sensitive information can be obtained without authorization, stripping away public anonymity and endangering individual safety. If such a tool were publicly accessible, anyone could upload a picture of a person they intend to harm and retrieve sensitive information about them. This loss of privacy has been described as follows: "users would potentially be able to identify every person they saw. The tool could identify activists at a protest or an attractive stranger on the subway, revealing not just their names but where they lived, what they did, and whom they knew" (Hill, 2020). An individual's personally identifiable information can thus be retrieved by a third party without their knowledge. The tool can also be weaponized, for example by "a rogue law enforcement officer who wants to stalk potential romantic partners or a foreign government using this to dig up secrets about people to blackmail them or throw them in jail" (Hill, 2020). Additionally, "deep learning models appear to often memorize rare details about the training data that are completely unrelated to the intended task while the model is still learning the underlying behavior" (Carlini, Liu, Erlingsson, Kos, & Song, 2019). This raises a further privacy concern: the model may retain highly sensitive information that appears only rarely in the training data, and an adversary may later extract and misuse it.
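
One way to put a number on this kind of memorization is a rank-based exposure score in the spirit of Carlini et al. (2019): insert a known secret into the training data, then check how highly the trained model ranks it among alternative candidates. The snippet below is only a sketch under assumptions. The loss values are invented, the helper names are hypothetical, and the cited work applies the idea to generative sequence models rather than to the image classifier trained above.

```python
import math


def rank_of(target, losses):
    """1-based rank of `target` when candidates are ordered by ascending loss."""
    return 1 + sum(1 for loss in losses.values() if loss < losses[target])


def exposure(rank, num_candidates):
    """log2(number of candidates) - log2(rank); higher means more memorized."""
    return math.log2(num_candidates) - math.log2(rank)


# Hypothetical per-candidate losses (lower = the model finds the candidate more likely).
# These numbers are invented for illustration, not measured.
toy_losses = {"inserted_secret": 0.2, "alt_1": 1.5, "alt_2": 1.1, "alt_3": 2.0}

r = rank_of("inserted_secret", toy_losses)
print(r, exposure(r, len(toy_losses)))
```

A secret that ranks first among many candidates gets the maximum score, which suggests the model has memorized it rather than merely generalized from it.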

This algorithm also raises ethical concerns if used in a self-driving vehicle. Because each city has a different demographic makeup, sampling bias is a significant threat to the training data used by the algorithm. One example of this demographic difference: "a pedestrian recognition model trained on only pictures of pedestrians in rural America will not operate well in a multicultural urban city because pedestrians from the two populations would not have similar appearances" (Xiang, 2019). A model trained this way may carry the bias into how it identifies pedestrians along the road. The bias can appear as precision that differs by skin tone; one study reports "evidence that standard models for the task of object detection, trained on standard datasets, appear to exhibit higher precision on lower Fitzpatrick skin types than higher skin types. This behavior appears on large images of pedestrians and even grows when we remove occluded pedestrians" (Wilson, Hoffman, & Morgenstern, 2019). As a result, self-driving vehicles may cause accidents or fatalities because they detect pedestrians from some demographic groups less reliably than others.
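
A disparity of this kind can be checked by computing precision separately for each demographic group in a labeled evaluation set. The snippet below is a minimal sketch, assuming each detection has already been tagged with a group label and marked as a true or false positive; the group names and counts are made up for illustration and are not results from any study. It uses plain precision, which is simpler than the metric used in the cited paper.

```python
from collections import defaultdict


def precision_by_group(detections):
    """detections: iterable of (group, is_true_positive), one entry per predicted box.

    Returns a dict mapping each group to TP / (TP + FP).
    """
    true_pos = defaultdict(int)
    total = defaultdict(int)
    for group, is_tp in detections:
        total[group] += 1
        if is_tp:
            true_pos[group] += 1
    return {group: true_pos[group] / total[group] for group in total}


# Invented toy detections, NOT real measurements
toy = (
    [("group_a", True)] * 9 + [("group_a", False)] * 1
    + [("group_b", True)] * 7 + [("group_b", False)] * 3
)

per_group = precision_by_group(toy)
gap = max(per_group.values()) - min(per_group.values())
print(per_group, round(gap, 2))
```

Reporting the per-group values alongside the overall figure makes a gap visible that a single aggregate precision would hide.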

References

Carlini, N., Liu, C., Erlingsson, Ú., Kos, J., & Song, D. (2019). The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks. In SEC’19: Proceedings of the 28th USENIX Conference on Security Symposium (pp. 267–284). USENIX Association.

Hill, K. (2020, January 18). The Secretive Company That Might End Privacy as We Know It. The New York Times. https://www.nytimes.com/2020/01/18/technology/clearview-privacy-facial-recognition.html

Wilson, B., Hoffman, J., & Morgenstern, J. (2019). Predictive Inequity in Object Detection. arXiv preprint arXiv:1902.11097.

Xiang, M. (2019, March 17). Human Bias in Machine Learning. Medium. https://towardsdatascience.com/bias-what-it-means-in-the-big-data-world-6e64893e92a1