This Google Colab notebook is a short example of automatically classifying shear-sense-indicating clasts in photomicrographs using a method called "transfer learning", which is based on convolutional neural networks (CNNs). It explains the basic steps involved, using a deep learning library called Keras.

First, we have to mount Google Drive so the data can be accessed.
After running this code, a hyperlink will appear to get an authorization code. Go to that link, select the Google account you're currently using, click "Allow", then copy the authorization code into the entry box.

In [0]:
from google.colab import drive
drive.mount('/content/gdrive/')

folder_path = 'gdrive/My Drive/EC2020-SSU/data/'

Next, we need to import the libraries we use.

In [0]:
import pandas as pd
import numpy as np
import os
import keras
import matplotlib.pyplot as plt
from keras.layers import Dense,GlobalAveragePooling2D
from keras.applications import ResNet50
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input  # preprocessing that matches the ResNet50 weights
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Model
from keras.optimizers import Adam



We download an instance of ResNet50, an image recognition network pre-trained on ImageNet, with the setting include_top=False. This cuts off the top of the CNN, including the final classification layer, which takes the features computed by the previous layers and outputs a final prediction of the category the network believes the image belongs to.

The key insight of transfer learning is that, because our problem (distinguishing two types of images) is similar to recognizing images in general, we can build on a powerful pre-trained network to greatly speed up training and improve the accuracy of our own network.

In [0]:

base_model=ResNet50(weights='imagenet',include_top=False)




Now that we have a network without a final layer, we need to build one.

The most important thing to recognise is that the last layer of the new output has 2 nodes, because we have two classification categories.

In [0]:
newOutput=base_model.output
newOutput=GlobalAveragePooling2D()(newOutput)
newOutput=Dense(1024,activation='relu')(newOutput) 
newOutput=Dense(1024,activation='relu')(newOutput) 
newOutput=Dense(512,activation='relu')(newOutput)

preds=Dense(2,activation='softmax')(newOutput) #final layer with softmax activation
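To make the shapes in the new layers more concrete, here is a minimal NumPy sketch of what global average pooling and the final softmax do. It uses random numbers as a stand-in for real network activations, and the helper names `global_average_pool` and `softmax` are illustrative, not Keras functions. The 7x7x2048 feature-map size is what ResNet50 without its top produces for a 224x224 input.

```python
import numpy as np

def global_average_pool(feature_maps):
    # Average over the two spatial axes:
    # (batch, height, width, channels) -> (batch, channels)
    return feature_maps.mean(axis=(1, 2))

def softmax(logits):
    # Subtract the row maximum for numerical stability; the result is unchanged
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Stand-in for the ResNet50 output on a batch of 4 images (random, not real features)
features = rng.normal(size=(4, 7, 7, 2048))
pooled = global_average_pool(features)
print(pooled.shape)  # (4, 2048)

# Stand-in for the raw scores of the final two-node layer
logits = rng.normal(size=(4, 2))
probs = softmax(logits)
print(probs.sum(axis=1))  # every row sums to 1
```

Each row of `probs` can be read as the probabilities the network assigns to the two categories for one image.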

Now we simply stitch together the pre-trained CNN with our new last layers, and we have the form of our network.


In [0]:
model=Model(inputs=base_model.input,outputs=preds)

We can now print the layer names in our model, and we see that the last four layers are the new dense layers we just added, preceded by the global average pooling layer.

In [0]:

for i,layer in enumerate(model.layers):
  print(i,layer.name)


We use a Keras utility, ImageDataGenerator, to convert our pictures into a format the model can use.

In [0]:
train_datagen=ImageDataGenerator(preprocessing_function=preprocess_input)

train_generator=train_datagen.flow_from_directory(folder_path,
                                                 target_size=(224,224),
                                                 color_mode='rgb',
                                                 batch_size=32,
                                                 class_mode='categorical',
                                                 shuffle=True)
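flow_from_directory infers the class labels from the subfolder names inside the data folder, one subfolder per class, with label indices assigned in alphanumeric order. As a quick sanity check before training, the sketch below counts the image files in each class subfolder. The helper `count_images_per_class`, the extension list, and the class names `class_a` and `class_b` are hypothetical and only for illustration. The helper looks only at files directly inside each subfolder.

```python
import os
import tempfile

IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.bmp')  # assumed set of image types

def count_images_per_class(root):
    """Return {class_name: number_of_image_files} for each subfolder of root."""
    counts = {}
    for name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, name)
        if not os.path.isdir(class_dir):
            continue  # skip loose files next to the class folders
        files = [f for f in os.listdir(class_dir)
                 if f.lower().endswith(IMAGE_EXTENSIONS)]
        counts[name] = len(files)
    return counts

# Demonstration on a temporary folder with two made-up classes
with tempfile.TemporaryDirectory() as root:
    for class_name, n_files in [('class_a', 3), ('class_b', 2)]:
        os.makedirs(os.path.join(root, class_name))
        for i in range(n_files):
            open(os.path.join(root, class_name, f'img_{i}.jpg'), 'w').close()
    demo_counts = count_images_per_class(root)

print(demo_counts)  # {'class_a': 3, 'class_b': 2}
```

In this notebook, the same check could be run as `count_images_per_class(folder_path)`; for a two-category problem it should report exactly two subfolders.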

Now we can train the new layers we have added on our data.

In [0]:
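# Freeze the pre-trained ResNet50 layers so that only the new layers are trained
for layer in base_model.layers:
  layer.trainable=False

model.compile(optimizer='Adam',loss='categorical_crossentropy',metrics=['accuracy'])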


step_size_train=train_generator.n//train_generator.batch_size
model.fit_generator(generator=train_generator,
                   steps_per_epoch=step_size_train,
                   epochs=5)
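The steps_per_epoch value comes from a floor division, so only full batches are counted. A small sketch of the arithmetic, using a made-up dataset size of 100 images (the real value is whatever train_generator.n reports) and a hypothetical helper name:

```python
def full_batches_per_epoch(n_images, batch_size):
    # Floor division: a final, partially filled batch is not counted
    return n_images // batch_size

n_images = 100   # hypothetical dataset size
batch_size = 32  # same batch size as the generator above

steps = full_batches_per_epoch(n_images, batch_size)
print(steps)               # 3
print(steps * batch_size)  # 96
```

Note that if the dataset held fewer images than one batch, this calculation would give 0 steps, so a smaller batch size would be needed.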

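Now that we have an output, we can interpret the results. There are two metrics we get from our model: accuracy and loss. Accuracy is self-explanatory: it is the fraction of images the model categorized correctly. Loss (here, categorical cross-entropy) measures how far the predicted class probabilities are from the true labels, so confident wrong predictions are penalized most heavily; its minimum is 0. In general, lower loss is better. Both metrics are computed on the training images here, since no separate validation set is used.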
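To illustrate how loss and accuracy differ, here is a small pure-Python sketch for a single image whose true label is the first of two categories. The probability values are made up, and the helper names are illustrative; Keras additionally averages the loss over all images in a batch.

```python
import math

def categorical_crossentropy(true_one_hot, predicted_probs):
    # Negative log of the probability assigned to the true class
    return -sum(t * math.log(p)
                for t, p in zip(true_one_hot, predicted_probs) if t > 0)

def is_correct(true_one_hot, predicted_probs):
    # A prediction counts as correct when the most probable class is the true one
    predicted_class = predicted_probs.index(max(predicted_probs))
    true_class = true_one_hot.index(max(true_one_hot))
    return predicted_class == true_class

true_label = [1, 0]
examples = {
    'confident and right': [0.95, 0.05],
    'unsure but right': [0.55, 0.45],
    'confident and wrong': [0.05, 0.95],
}

losses = {}
for name, probs in examples.items():
    losses[name] = categorical_crossentropy(true_label, probs)
    print(f'{name}: loss={losses[name]:.3f}, correct={is_correct(true_label, probs)}')

# confident and right: loss=0.051, correct=True
# unsure but right: loss=0.598, correct=True
# confident and wrong: loss=2.996, correct=False
```

The first two predictions count equally towards accuracy, but the loss also rewards the more confident correct answer, which is why loss can keep improving after accuracy has levelled off.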