## Overview
1. Exploring the Data
2. Logistic Regression
3. CNN
4. __Building Test Functions__

In this section, I build the functions used to create the Sketch It game. Most of the development was done in a text editor (Atom) and saved as a .py file because OpenCV, the library that gives me access to the connected webcam, does not work in Jupyter Notebook; it causes the kernel to crash. To run the game, run the file demo_game.py found in the demo folder. To run on your laptop webcam, change the source cv2.VideoCapture(1) to cv2.VideoCapture(0) where applicable in test_functions.py and demo_game.py.
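
As an alternative to editing the source each time, the camera index could be read from the command line. This is only a sketch: the `--camera` flag and the `parse_camera_index` helper are hypothetical and are not part of test_functions.py or demo_game.py.

```python
import argparse

def parse_camera_index(argv=None):
    """Return the webcam index requested on the command line."""
    parser = argparse.ArgumentParser(description="Sketch It demo")
    # --camera is a hypothetical flag; the default of 1 matches the value
    # hard-coded in the functions in this section
    parser.add_argument("--camera", type=int, default=1,
                        help="index passed to cv2.VideoCapture")
    args = parser.parse_args(argv)
    return args.camera

# Equivalent to running: python demo_game.py --camera 0
camera_index = parse_camera_index(["--camera", "0"])
print(camera_index)
```

The returned value would then replace the literal index, as in cv2.VideoCapture(camera_index).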

In [19]:
from imutils.video import VideoStream
import argparse
import datetime
import imutils
import time
import cv2
import glob
import numpy as np
from PIL import Image
from sklearn.preprocessing import LabelEncoder
from keras.models import model_from_json

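## load_cnn_model
load_cnn_model takes two inputs, model_name (the model's JSON file) and model_weight_name (the model's weights file), and returns a loaded CNN model and a list of the categories the model was trained on. The category list is built from the names of the .npy files in the data folder.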

In [20]:
def load_cnn_model(model_name, model_weight_name):
    #read the model architecture from the JSON file
    with open(model_name) as json_file:
        loaded_model_json = json_file.read()
    loaded_model = model_from_json(loaded_model_json)
    loaded_model.load_weights(model_weight_name)
    loaded_model.compile(loss = 'sparse_categorical_crossentropy',
                      optimizer = 'Adam',
                      metrics = ['accuracy'])

    #get path names
    path = r'/Users/john/Desktop/Demo_Day/Data/npy'
    file_names = sorted(glob.glob(path +'/*.npy'))
    
    #Get the names of the classes. The hard-coded offset 56 is tied to the
    #path above and must be updated if the folder or the file naming changes.
    sketch_list = []
    for file_name in file_names:
        name = (file_name[56:file_name.find('.')])
        sketch_list.append(name)

    return(loaded_model,sketch_list)

In [21]:
#testing the function
model,sketch_list = load_cnn_model('CNN_model.json','CNN_model.h5')
print(model)
print(sketch_list)

<keras.engine.sequential.Sequential object at 0x1a4be2abe0>
['angel', 'sword', 'wine_glass', 'yoga']
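
The class names above come from slicing each file path at a fixed character offset, which only works for one specific folder. A path-independent variant could look like the sketch below. The helper `class_names_from_paths`, its `prefix` parameter, and the example paths are all hypothetical.

```python
import os

def class_names_from_paths(paths, prefix=""):
    """Derive sorted class names from .npy file paths."""
    names = []
    for p in sorted(paths):
        # keep only the file name and drop the extension
        stem = os.path.splitext(os.path.basename(p))[0]
        # optionally strip a common file-name prefix (hypothetical parameter)
        if prefix and stem.startswith(prefix):
            stem = stem[len(prefix):]
        names.append(stem)
    return names

# made-up paths for illustration only
example_paths = [
    "/data/npy/yoga.npy",
    "/data/npy/angel.npy",
    "/data/npy/wine_glass.npy",
    "/data/npy/sword.npy",
]
print(class_names_from_paths(example_paths))
```

Because the result no longer depends on the length of the folder path, the same code would work after the data is moved.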


## get_background
get_background takes three inputs (set_width, set_font, font_y_pad) and returns the processed background image, background_processed. When the function is called, the webcam is activated and each image from the webcam is stored in the frame variable. When the 'y' key is pressed, the function takes the last image the webcam has stored and processes it. The processing converts the BGR image to grayscale, then removes small noise by applying a Gaussian blur. set_width, set_font, and font_y_pad affect the size of the frame and the text that appears on the display window. It is necessary to scale the display window based on whichever monitor is being used.

In [22]:
def get_background(set_width,set_font,font_y_pad):
    #initiate webcam
    cam = cv2.VideoCapture(1)
    
    background= None
    run = True
    
    while run:
        ret, frame = cam.read()
        #resize frame based on set_width
        frame2 = imutils.resize(frame, width = set_width)
        
        cv2.putText(frame2, 'Calibrating Background',(10,font_y_pad),
        cv2.FONT_HERSHEY_SIMPLEX, set_font, (0, 0, 0), 2)

        cv2.putText(frame2, 'Press (y) to Continue',(10,frame2.shape[0]-font_y_pad),
        cv2.FONT_HERSHEY_SIMPLEX, set_font, (0, 0, 0), 2)

        cv2.imshow("CAN YOU DRAW",frame2)

        key = cv2.waitKey(1)


        print(key)
        #'y' and 'q' both end the loop; the last frame read is then used as the background
        if key == ord("y"):
            run = False
        if key == ord("q"):
            break
    #change to grayscale and apply GaussianBlur to remove small noise.
    background = frame
    background_gray = cv2.cvtColor(background, cv2.COLOR_BGR2GRAY)
    background_gray_blur = cv2.GaussianBlur(background_gray, (5,5),0)
    background_processed = background_gray_blur
    cam.release()
    #the call needs parentheses, otherwise the windows are never closed
    cv2.destroyAllWindows()
    return(background_processed)
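
In get_level and in the demo script, set_font and font_y_pad are derived from set_width as int(set_width/400) and int(set_width/10). The sketch below tabulates those two formulas for a few widths. The helper name `display_params` is hypothetical.

```python
def display_params(set_width):
    """Return (set_font, font_y_pad) for a given display width."""
    set_font = int(set_width / 400)    # font scale, truncated to an integer
    font_y_pad = int(set_width / 10)   # vertical offset of the text in pixels
    return set_font, font_y_pad

for width in (1920, 1280, 1080, 640):
    print(width, display_params(width))
```

Because the font scale is truncated, any width below 400 yields a scale of 0, so small windows would need a different formula.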

## frame_preprocessing
frame_preprocessing takes five inputs. frame_capture is the image from the webcam that is to be processed. x, y, w, and h are the values of a crop box around the object inside the image. They are found in the function get_level, which is described further down. frame_preprocessing takes an image, converts it to grayscale, and then crops the image while adding padding. Then it thresholds the values at 100. This is necessary to help when downsizing from the input frame_capture to the 28x28 array that the prediction model requires. During resizing, BICUBIC sampling is used. Multiple sampling methods were tested and I found BICUBIC to give the best results.

In [23]:
def frame_preprocessing(frame_capture,x,y,w,h):

    #convert to a PIL image for grayscaling and crop based on the crop box coordinates
    pil_image = Image.fromarray(frame_capture)
    pil_image = pil_image.convert('L')
    
    #if the image is taller than it is wide, add extra padding to the width
    if h>w:
        pil_crop = pil_image.crop((x-50-((h-w)/2),y-50,x+w+50+((h-w)/2),y+h+50))
        
    #otherwise (wider than tall, or square), add extra padding to the height
    else:
        pil_crop = pil_image.crop((x-50,y-50-((w-h)/2),x+w+50,y+h+50+((w-h)/2)))
        
    #convert to array for thresholding (cv2.threshold does the same thing)
    frame_array = np.array(pil_crop)
    frame_array = np.where(frame_array <= 100,0,frame_array)
    frame_array = np.where(frame_array > 100, 255, frame_array)

    #turn back into an image, then resize image with BICUBIC sampling and turn into an array
    resized_image = Image.fromarray(frame_array)
    test_array = np.array(resized_image.resize((28,28), Image.BICUBIC))

    #returns an array
    return(test_array)
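
The two crop branches in frame_preprocessing pad the shorter side so that the crop comes out square before it is resized to 28x28. The arithmetic can be checked in isolation with the sketch below. The helper `square_crop_box` and the sample numbers are hypothetical.

```python
def square_crop_box(x, y, w, h, pad=50):
    """Return (left, upper, right, lower) for a padded, square crop box."""
    if h > w:
        # taller than wide: widen the box by half the difference on each side
        extra = (h - w) / 2
        return (x - pad - extra, y - pad, x + w + pad + extra, y + h + pad)
    # wider than tall (or square): grow the box vertically instead
    extra = (w - h) / 2
    return (x - pad, y - pad - extra, x + w + pad, y + h + pad + extra)

box = square_crop_box(x=100, y=80, w=60, h=120)
box_width = box[2] - box[0]
box_height = box[3] - box[1]
print(box, box_width, box_height)
```

Note that the box can extend past the image borders when the object is close to an edge.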

## model_reshape_predict
model_reshape_predict returns the prediction label, as well as the softmax output from the CNN model. Its required inputs are the image to be tested, the model to be used, and a list of the possible categories to choose from.

In [24]:
def model_reshape_predict(image, model, y_unique):
    #Define the img rows and cols
    img_rows = 28
    img_cols = 28

    #Reshape the image
    test= image.reshape(1, img_rows, img_cols, 1)

    #Predict based on model
    prediction = model.predict(test)

    #Create the labels off of y_unique
    my_encoder = LabelEncoder()
    my_encoder.fit(y_unique)

    #Change prediction to label
    pred_label = my_encoder.inverse_transform([np.argmax(prediction)])[0]

    return (pred_label,prediction)
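
The last step of model_reshape_predict turns the index of the largest softmax value into a class name. LabelEncoder stores its classes in sorted order, so the lookup can be mimicked in plain Python as in the sketch below. The helper `label_from_scores` and the scores are made up, and the mapping is only meaningful if the model was trained on labels encoded in that same sorted order.

```python
def label_from_scores(scores, class_names):
    """Map the highest score to a label the way a fitted LabelEncoder would."""
    # LabelEncoder keeps its classes sorted, regardless of input order
    classes = sorted(set(class_names))
    # index of the largest score, equivalent to argmax on a flat vector
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return classes[best_index]

scores = [0.05, 0.10, 0.80, 0.05]                 # made-up softmax output
names = ["yoga", "angel", "wine_glass", "sword"]  # deliberately unsorted
predicted = label_from_scores(scores, names)
print(predicted)
```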

## get_level
get_level creates the different levels for the Sketch It game. Its inputs are:
- level_name (string) is the name of the object to be drawn.
- correct_answer (string) is the text to be displayed when the model guesses the right answer.
- background_processed (array of integers) is the processed background photo from the function get_background.
- model and sketch_list are the outputs of the function load_cnn_model.
- next_level (int) keeps track of which level will come next.
- set_width (int) determines the width of the display window.

If the correct answer was guessed, the function returns the input next_level, which allows the user to move on to the next level. Otherwise it returns 0.
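
Inside get_level, the sketch is located by comparing each blurred grayscale frame with background_processed using cv2.absdiff and cv2.threshold. The NumPy sketch below reproduces that differencing step on a tiny made-up array, assuming 8-bit grayscale inputs. The helper `foreground_mask` is hypothetical.

```python
import numpy as np

def foreground_mask(background, frame, threshold=50):
    """Return a 0/255 mask of pixels that differ from the background."""
    # widen to int16 so the subtraction cannot wrap around in uint8
    delta = np.abs(background.astype(np.int16) - frame.astype(np.int16))
    # binary threshold: 255 where the difference exceeds the threshold, else 0
    return np.where(delta > threshold, 255, 0).astype(np.uint8)

background = np.full((4, 4), 200, dtype=np.uint8)  # bright, empty page
frame = background.copy()
frame[1:3, 1:3] = 20                               # dark pen strokes
mask = foreground_mask(background, frame)
print(mask)
```

Only the four changed pixels end up in the mask, which is what lets the contour search ignore the static background.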

In [25]:
def get_level(level_name, correct_answer, background_processed, model, sketch_list,next_level,set_width):
    cv2.destroyAllWindows()
    cam = cv2.VideoCapture(1)
    
    #initiate variables
    guess_array = np.array([])
    correct = level_name
    answer = 0
    result = 0
    set_font = int(set_width/400)
    font_color = (0,0,0)
    font_y_pad = int(set_width/10)
    
    #initiate camera
    start = True
    while start == True:
        ret, frame = cam.read()
        
        #display window text shows "Undetected"
        text_detection = "Undetected"

        #Process webcam image
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray_blur = cv2.GaussianBlur(gray, (5,5),0)
        
        #Find the difference between the gray_blur image and the background_processed image, threshold and dilate it.
        frameDelta = cv2.absdiff(background_processed,gray_blur)
        #this function can be used in the earlier frame preprocessing function
        thresh = cv2.threshold(frameDelta, 50, 255, cv2.THRESH_BINARY)[1]
        #dilation thickens the contours in the image
        dilate = cv2.dilate(thresh, None, iterations = 2)

        #Find and grab contours. cnts is an array of arrays.
        cnts = cv2.findContours(dilate.copy(), cv2.RETR_EXTERNAL,
            cv2.CHAIN_APPROX_SIMPLE)
        cnts = imutils.grab_contours(cnts)
        
        if answer == 0:
            text_guess = 'Hmmmm'
        
        sketch_x = None
        sketch_y = None
        sketch_w = None
        sketch_h = None
        
        #loop through the contours
        for c in cnts:
            #ignore really small contours
            if cv2.contourArea(c) <set_width*set_width/250:
                continue
                
            #create and draw a bounding rectangle
            (x,y,w,h) = cv2.boundingRect(c)
            c_area = w*h
            #another filter to drop out small contours (this is redundant now that I see it)
            
            if c_area > 4000:
                sketch_x = x
                sketch_y = y
                sketch_w = w
                sketch_h = h
        #if there are contours that meet the requirement, return a guess of what it is
        
        try:
            #draws a bounding rectangle onto the display window
            cv2.rectangle(frame, (sketch_x-20,sketch_y-20),(sketch_x+sketch_w+20, sketch_y+sketch_h+20), (0,255,0),2)
            
            #change display window text to 'Detected'
            text_detection = "Detected"

            #Call our test functions from above to return a guess of the image
            processed_frame = frame_preprocessing(dilate, sketch_x, sketch_y, sketch_w, sketch_h)
            guess, confidence = model_reshape_predict(processed_frame, model, sketch_list)
            guess_array = np.append(guess_array, guess)
            print(guess)

            #collect the unique labels among the last 40 guesses. A single unique label shows that the model 
            #is consistently guessing the same image, as opposed to a lucky guess.
            final_guess = np.unique(guess_array[-40:])
            print(final_guess)
            
            #if the guess is correct, the display window will display a message
            if len(final_guess) == 1 and final_guess[0] == correct:
                if next_level != 5:
                    text_guess = correct_answer + " Press (n) to continue"
                else:
                    text_guess = correct_answer
                result = next_level
                answer = 1

        #if no contours have been detected, pass
        except:
            pass
        
        #Places Texts onto display window
        cv2.putText(frame, 'Lets see you draw a {}'.format(level_name),(10,font_y_pad),
                    cv2.FONT_HERSHEY_SIMPLEX, int(set_font*1.5), (0, 0, 0), 2)

        cv2.putText(frame, 'Sketch: {}'.format(text_detection),(10,2*font_y_pad),
                    cv2.FONT_HERSHEY_SIMPLEX, set_font, (0, 0, 0), 2)


        cv2.putText(frame, "{}".format(text_guess), (10, frame.shape[0]-font_y_pad),
                    cv2.FONT_HERSHEY_SIMPLEX, set_font, (0, 0, 0), 2)



        frame = imutils.resize(frame, width = set_width)
        #show display window
        cv2.imshow("CAN YOU DRAW",frame)
        
        #Optional/for development --> shows the image that is being fed into the model
        try:
            cv2.imshow("Frame_processing", processed_frame)
        except:
            pass
        key = cv2.waitKey(1)


        print(key)
        if key == ord("n"):
            start = False
        if key == ord("q"):
            result = 0
            break

    cam.release()
    return(result)
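
get_level accepts an answer only when every one of the most recent guesses names the same class. The same check can be written with a bounded deque, which also keeps the history from growing for the whole session. This is a sketch: `is_stable_guess` is a hypothetical helper, and the window of 40 mirrors the slice used in get_level.

```python
from collections import deque

def is_stable_guess(history, target):
    """True when every guess currently in the window equals target."""
    return len(history) > 0 and set(history) == {target}

history = deque(maxlen=40)   # older guesses fall out automatically
early_result = None
for i, guess in enumerate(["sword", "angel"] + ["angel"] * 40):
    history.append(guess)
    if i == 1:
        # the window still holds both 'sword' and 'angel' here
        early_result = is_stable_guess(history, "angel")

final_result = is_stable_guess(history, "angel")
print(early_result, final_result)
```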

## Sketch It, a machine learning drawing game
This is the code to run the Sketch It game.

In [None]:
#DEMO GAME
from imutils.video import VideoStream
import argparse
import datetime
import imutils
import time
import cv2
import numpy as np
from PIL import Image
from sklearn.preprocessing import LabelEncoder
from keras.models import model_from_json
import test_functions


#determine window resize (1920x1080, 1280x720, 640x480)
set_width = 1080
set_font = int(set_width/400)
font_color = (0,0,0)
font_y_pad = int(set_width/10)

#Calibrate Background
background_processed = test_functions.get_background(set_width, set_font, font_y_pad)
print(background_processed)
print(background_processed.shape)

#Load Test models
model,sketch_list = test_functions.load_cnn_model('CNN_model_demo_day2.json','CNN_model_demo_day2.h5')
print(sketch_list)

what = cv2.VideoCapture(1)
run = True
while run:
    #Load Level

    result = 0
    ret, frame = what.read()
    frame = imutils.resize(frame, width = set_width)

    cv2.putText(frame, "Welcome to 'Can you draw?'",(10,font_y_pad),
            cv2.FONT_HERSHEY_SIMPLEX, set_font, font_color, 2)

    cv2.putText(frame, "Press (s) to start",(10,frame.shape[0]-2*font_y_pad),
            cv2.FONT_HERSHEY_SIMPLEX, set_font, font_color, 2)

    cv2.putText(frame, "Press (r) to recalibrate",(10,frame.shape[0]-font_y_pad),
            cv2.FONT_HERSHEY_SIMPLEX, set_font, font_color, 2)

    cv2.imshow("CAN YOU DRAW",frame)

    key = cv2.waitKey(1)
    
    # Loads different levels, notice that the next level can only be accessed if the drawing was guessed correctly.
    if key == ord("r"):
        background_processed = test_functions.get_background(set_width, set_font, font_y_pad)

    if key == ord("s"):
        cv2.destroyAllWindows()
        result = test_functions.get_level('wine glass','Thats a wine glass alright.',background_processed, model, sketch_list,2, set_width)

    if result == 2:
        result = test_functions.get_level('sword','ZING ZING, thats a sword!',background_processed, model, sketch_list,3,set_width)

    if result == 3:
        result = test_functions.get_level('angel','What a beautiful Angel',background_processed, model, sketch_list,4,set_width)

    if result == 4:
        result = test_functions.get_level('yoga','Nice! Thats it, thanks for playing!',background_processed, model, sketch_list,5,set_width)

    if key == ord("q"):
        break

#release the webcam and close the display window
what.release()
cv2.destroyAllWindows()
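
The chain of if statements in the demo advances to a level only when the previous call to get_level returned that level's number. The same progression could be driven by a list of levels, as in the sketch below. `LEVELS`, `play_levels`, and `fake_run_level` are hypothetical names; `fake_run_level` stands in for test_functions.get_level so that the logic can run without a webcam.

```python
LEVELS = [
    ("wine glass", "Thats a wine glass alright."),
    ("sword", "ZING ZING, thats a sword!"),
    ("angel", "What a beautiful Angel"),
    ("yoga", "Nice! Thats it, thanks for playing!"),
]

def play_levels(levels, run_level):
    """Run levels in order and return how many were cleared.

    run_level(name, message, next_level) should return next_level on
    success and 0 otherwise, mirroring the return value of get_level.
    """
    completed = 0
    for number, (name, message) in enumerate(levels, start=1):
        next_level = number + 1
        if run_level(name, message, next_level) != next_level:
            break          # level not cleared, stop the progression
        completed = number
    return completed

def fake_run_level(name, message, next_level):
    # stand-in for get_level: pretend the player gives up on 'angel'
    return 0 if name == "angel" else next_level

cleared = play_levels(LEVELS, fake_run_level)
print(cleared)
```

With this layout, adding a level means adding one tuple to the list rather than another if block.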