# Car Photo Classification

Photos obtained from [Edmunds.com API](http://developer.edmunds.com/api-documentation/overview/).

The goal of this project is to perform image classification: can I build an artificial neural net that correctly labels the car in each photo? Because the photos vary widely, this may be a difficult problem. However, the images obtained from the Edmunds API are all high quality, which should help the neural net's performance. If we instead used photos of cars posted for sale on Craigslist, for example, the neural net might take longer to train. For now, we will stick to the model shots obtained from Edmunds.

<img src='FP_data/images/ford_explorer_2015/2014_ford_explorer_4dr-suv_sport_fq_oem_3_300.jpg'>

<img src='FP_data/images/toyota_tacoma/2016_toyota_tacoma_crew-cab-pickup_limited_f_oem_1_400.jpg'>

<img src='FP_data/images/bmw_3-series/2016_bmw_3-series_sedan_340i_fq_oem_4_1600.jpg'>

<img src='FP_data/images/porsche_cayman/2014_porsche_cayman_coupe_s_rq_oem_1_500.jpg'>

In [1]:
import requests
import json
from collections import defaultdict, Counter
import numpy as np
import os
from PIL import Image, ImageOps
import urllib
import re
import string
from time import sleep

In [2]:
# use these api keys when using the Media api, for obtaining car photos
edmunds_api_key_photos = ''  # insert API key
edmunds_api_secret_photos = ''  # insert API secret

# use these api keys when using the Vehicles api, for obtaining makes, models, and years
edmunds_api_key_complete = ''  # insert API key
edmunds_api_secret_complete = ''  # insert API secret

In [3]:
photos_url = 'https://api.edmunds.com/api/media/v2/photoset?'
makes_url = 'https://api.edmunds.com/api/vehicle/v2/makes?'
#models_url = 'https://api.edmunds.com/api/vehicle/v2/{0}/models?'.format(make)
#years_url = 'http://api.edmunds.com/api/vehicle/v2/{0}/{1}/years?'.format(make, model)

__Getting a list of all car makes for 2016:__

_(For exploring what makes and models are available and deciding for which to pull pictures)_

In [7]:
makes_params = {'state': 'new',
               'year': '2016',
               'view': 'basic',
               'fmt': 'json',
               'api_key': edmunds_api_key_complete}
makes_response = requests.get(makes_url, params=makes_params)
assert makes_response.status_code == 200
makes_16 = json.loads(makes_response.content) # saving as json/dict Python object

In [8]:
makes_16.keys()

[u'makesCount', u'makes']

In [10]:
makes_16['makes'][0] # an example of the info included about 1 make

{u'id': 200002038,
 u'models': [{u'id': u'Acura_ILX',
   u'name': u'ILX',
   u'niceName': u'ilx',
   u'years': [{u'id': 200713715, u'year': 2016}]},
  {u'id': u'Acura_MDX',
   u'name': u'MDX',
   u'niceName': u'mdx',
   u'years': [{u'id': 200726800, u'year': 2016}]},
  {u'id': u'Acura_RDX',
   u'name': u'RDX',
   u'niceName': u'rdx',
   u'years': [{u'id': 200727186, u'year': 2016}]},
  {u'id': u'Acura_RLX',
   u'name': u'RLX',
   u'niceName': u'rlx',
   u'years': [{u'id': 200729233, u'year': 2016}]},
  {u'id': u'Acura_TLX',
   u'name': u'TLX',
   u'niceName': u'tlx',
   u'years': [{u'id': 401583109, u'year': 2016}]}],
 u'name': u'Acura',
 u'niceName': u'acura'}

In [11]:
# Creating a dictionary of {make: [make_id, (Model_Id, model_nice_name), (Model_Id2, model_nice_name2), ...]}
# -- in case we want to more easily select a make and model to pass to the Photos API call
all_makes = defaultdict(list)
counter = 0
for make in makes_16['makes']:
    all_makes[(make['niceName'])].append(make['id'])
    for model in make['models']:
        all_makes[(make['niceName'])].append((model['id'], model['niceName']))
        counter += 1

In [9]:
all_makes

defaultdict(list,
            {u'acura': [200002038,
              (u'Acura_ILX', u'ilx'),
              (u'Acura_MDX', u'mdx'),
              (u'Acura_RDX', u'rdx'),
              (u'Acura_RLX', u'rlx'),
              (u'Acura_TLX', u'tlx')],
             u'alfa-romeo': [200464140, (u'Alfa_Romeo_4C', u'4c')],
             u'aston-martin': [200001769,
              (u'Aston_Martin_DB9_GT', u'db9-gt'),
              (u'Aston_Martin_Rapide_S', u'rapide-s'),
              (u'Aston_Martin_V12_Vantage_S', u'v12-vantage-s'),
              (u'Aston_Martin_V8_Vantage', u'v8-vantage'),
              (u'Aston_Martin_Vanquish', u'vanquish')],
             u'audi': [200000001,
              (u'Audi_A3', u'a3'),
              (u'Audi_A3_Sportback_e_tron', u'a3-sportback-e-tron'),
              (u'Audi_A4', u'a4'),
              (u'Audi_A5', u'a5'),
              (u'Audi_A6', u'a6'),
              (u'Audi_A7', u'a7'),
              (u'Audi_A8', u'a8'),
              (u'Audi_Q3', u'q3'),
            

In [10]:
print counter # 363 total models in 2016

363


In [11]:
# printing the list of all makes to see our selection of choices for which images to pull
for make in all_makes.keys():
    print make

mini
bentley
tesla
ram
subaru
alfa-romeo
buick
lexus
audi
maserati
gmc
chevrolet
porsche
volkswagen
dodge
scion
cadillac
honda
hyundai
ford
mazda
lamborghini
infiniti
aston-martin
land-rover
mercedes-benz
kia
mitsubishi
rolls-royce
fiat
lincoln
acura
jaguar
jeep
nissan
toyota
volvo
smart
chrysler
bmw


In [31]:
# all Jeep models for 2016
all_makes['jeep']

[200001510,
 (u'Jeep_Cherokee', u'cherokee'),
 (u'Jeep_Compass', u'compass'),
 (u'Jeep_Grand_Cherokee', u'grand-cherokee'),
 (u'Jeep_Grand_Cherokee_SRT', u'grand-cherokee-srt'),
 (u'Jeep_Patriot', u'patriot'),
 (u'Jeep_Renegade', u'renegade'),
 (u'Jeep_Wrangler', u'wrangler')]

## Writing the json objects to files

In [88]:
# These are the 20 cars whose images we will try to classify.
cars = [('toyota', 'tacoma'),
       ('ford', 'explorer'),
       ('bmw', '3-series'),
       ('audi', 'a4'),
       ('subaru', 'forester'),
       ('ford', 'f-150'),
        ('ram', '1500'),
       ('alfa-romeo', '4c'),
       ('honda', 'civic'),
       ('honda', 'odyssey'),
       ('toyota', 'corolla'),
       ('chevrolet', 'tahoe'),
       ('porsche', 'cayman'),
       ('chevrolet', 'colorado'),
       ('nissan', 'frontier'),
       ('nissan', 'altima'),
       ('mazda', '3'),
       ('toyota', 'prius'),
       ('chevrolet', 'volt'),
       ('jeep', 'wrangler')]

In [113]:
assert 1 == 2 # This will stop me from accidentally running an API call (don't want to waste them!)

for make,model in cars:
    parameters = {'category': 'exterior',
                  'view' : 'full',
                  'fmt' : 'json',
                  'api_key': edmunds_api_key_photos}
    # url to call the photos of this car make and model for 2016
    url = 'https://api.edmunds.com/api/media/v2/{0}/{1}/{2}/photos?'.format(make, model, '2016')
    response = requests.get(url, params=parameters)
    try:
        assert response.status_code == 200 # make sure api call was successful
        json_content = json.loads(response.content)
        filename = 'FP_data/{0}_{1}'.format(make, model)
        with open(filename, 'w') as f: # saving the json of the images to a file named after the make and model
            json.dump(json_content, f)
    except AssertionError:
        # if the api call wasn't successful, it's because I used all the calls for the day
        print 'Daily call limit reached. Start at {0} {1} tomorrow.'.format(make, model)
    sleep(0.25)
    # ^ pauses between calls to stay under the rate limit of 5 calls per second (the account also has a daily limit of 25 calls).

Now I have a folder (FP_data) of all the json files of the photos for the makes and models I selected from 2016.

Here are the 20 cars I will use for the image classification problem:

In [4]:
# looking at the files of jsons we collected that contain the images
for fname in os.listdir('FP_data'):
    if not fname.startswith('.'):
        print fname

alfa-romeo_4c
audi_a4
bmw_3-series
chevrolet_colorado
chevrolet_tahoe
chevrolet_volt
ford_explorer
ford_f-150
honda_civic
honda_odyssey
images
jeep_wrangler
mazda_3
nissan_altima
nissan_frontier
porsche_cayman
ram_1500
subaru_forester
toyota_corolla
toyota_prius
toyota_tacoma


Now I need to navigate to the images saved in the json objects.

In [11]:
assert 1 == 2 # ensures I don't run the following code over and over again...

## This navigates the jsons and pulls all the photos and saves each image to its car's folder in the directory 'images/'

photo_base_url = 'https://media.ed.edmunds-media.com'

# for each file in FP_data
for item in os.listdir('FP_data'):
    # if it's a file with the photos json
    if not (item.startswith('.') or item == 'images'):
        # we'll count the number of photos collected from the file
        counter = 0
        
        # each car will have a separate folder of images in the 'images/' directory
        im_dir = 'FP_data/images/' + item
        os.mkdir(im_dir)
        
        # load the data to a Python object we can navigate
        fname = 'FP_data/' + item
        with open(fname) as f:
            data = json.load(f)
        
        # navigate to the image link
        for photo in data['photos']:
            for source in photo['sources']:
                tail = source['link']['href']
                url = photo_base_url + tail
                # create what will be the filename for the image -- the last path segment, ending in .jpg
                im_filename = re.search(r"[^/]*\.jpg", url).group()
                im_path = '{0}/{1}'.format(im_dir, im_filename)
                
                # save the image to a file
                try:
                    urllib.urlretrieve(url, im_path)
                    counter += 1
                except IOError:
                    # if it didn't work, we'll try one more time before moving on...
                    try:
                        urllib.urlretrieve(url, im_path)
                        counter += 1
                    except IOError:
                        continue
        
        # keep record of the number of photo files saved for each car
        print '%s: \t %d photos' % (item, counter)
        
    else:
        continue

alfa-romeo_4c: 	 180 photos
audi_a4: 	 180 photos
bmw_3-series: 	 180 photos
chevrolet_colorado: 	 180 photos
chevrolet_tahoe: 	 180 photos
chevrolet_volt: 	 108 photos
ford_explorer: 	 126 photos
ford_f-150: 	 180 photos
honda_civic: 	 180 photos
honda_odyssey: 	 180 photos
jeep_wrangler: 	 180 photos
mazda_3: 	 180 photos
nissan_altima: 	 180 photos
nissan_frontier: 	 180 photos
porsche_cayman: 	 180 photos
ram_1500: 	 180 photos
subaru_forester: 	 180 photos
toyota_corolla: 	 180 photos
toyota_prius: 	 180 photos
toyota_tacoma: 	 180 photos


## Converting each photo to a matrix with 32 x 32 pixels and 3 color channels, and accumulating those into a tensor

[PIL Image documentation](http://pillow.readthedocs.io/en/3.1.x/reference/Image.html)

In [67]:
def data_processing_flat(image_file):
    '''
    Converts the image to its pixel form, with 3 color channels, and flattens it.
    INPUT: image jpg
    OUTPUT: flat array of pixels
    '''
    
    # using Image from PIL package to open the jpg and resize to 32x32 pixels.
    im = Image.open(image_file, 'r').resize((32,32))
    pixels = list(im.getdata())
    # flattens the list of 32*32 (R, G, B) tuples into one list of 3072 values
    pixels_flat = [x for colors in pixels for x in colors]
    pixels_Im = np.reshape(pixels_flat, (1,32*32*3))
    # returns a flat array of pixels
    return pixels_Im
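
The flattening order matters later, because the network reshapes each 3072-value row back into a 32 x 32 x 3 image. A minimal NumPy sketch of that round trip follows, using a synthetic array in place of a real photo. The array and variable names are illustrative assumptions, not part of the notebook.

```python
import numpy as np

# Synthetic stand-in for one resized RGB photo: height x width x channels.
fake_image = np.arange(32 * 32 * 3).reshape(32, 32, 3)

# Row-major flattening: pixel by pixel, with R, G, B adjacent for each pixel.
# Flattening a row-by-row list of (R, G, B) tuples gives the same order.
flat_row = fake_image.reshape(1, 32 * 32 * 3)

# Reshaping with the same row-major convention recovers the image exactly.
restored = flat_row.reshape(-1, 32, 32, 3)

# The three values for the pixel at row 0, column 1 sit at positions 3..5.
second_pixel = flat_row[0, 3:6]
```

If the rows had been stored channel-first instead, the same reshape would run without error but scramble the image, so it is worth checking one pixel by hand as above.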

In [12]:
# the labels for the images (car names)
labels = [fname for fname in os.listdir('FP_data') if not fname.startswith('.') and fname != 'images']
print len(labels)

20


In [13]:
def one_hot_encode(label_idx):
    '''
    One-hot encodes the image labels, which are currently strings of car names.
    INPUT: index of the label in labels list
    OUTPUT: 1x20 list of all 0's except for the index corresponding to the car's label.
    '''
    # create list of all 0's
    one_hot = [0] * 20
    # enter 1 at index corresponding to the image label
    one_hot[label_idx] = 1
    return one_hot

In [69]:
# create empty array in which to put the images' pixels
X = np.empty((1,32*32*3))
# empty list in which to store the labels of the images we add to X
y_labels = []

# for each car folder of images
for item in os.listdir('FP_data/images/'):
    if not item.startswith('.'):
        im_dir = 'FP_data/images/' + item
        print item
        # for each pic in the car folder
        for pic in os.listdir(im_dir):
            # store the car label of the pic
            y_labels.append(item)
            pic_dir = '{0}/{1}'.format(im_dir,pic)
            # convert the image file to a flat array of pixels
            pixels = data_processing_flat(pic_dir)
            # append the pixels to X
            X = np.append(X, pixels, axis=0)
X = X[1:,:] # the first row of X is from the "empty" array we initially created

# make sure there's a label for every image
assert len(X) == len(y_labels)

alfa-romeo_4c
audi_a4
bmw_3-series
chevrolet_colorado
chevrolet_tahoe
chevrolet_volt
ford_explorer
ford_f-150
honda_civic
honda_odyssey
jeep_wrangler
mazda_3
nissan_altima
nissan_frontier
porsche_cayman
ram_1500
subaru_forester
toyota_corolla
toyota_prius
toyota_tacoma


In [70]:
X.shape # 3474 pictures each with 3072 pixel/color values

(3474, 3072)

In [71]:
# one-hot encode the labels we stored for X
y_one_hot = []
for label in y_labels:
    idx = labels.index(label)
    one_hot = one_hot_encode(idx)
    y_one_hot.append(one_hot)
y_one_hot = np.asarray(y_one_hot)

In [72]:
print y_one_hot[:2]
print
print y_one_hot[-2:]

[[1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]

[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]]


In [73]:
# need to shuffle the images to ensure that the training data is representative of the whole dataset
shuffle_idx = np.random.permutation(len(X))
X_shuff, y_shuff = X[shuffle_idx], y_one_hot[shuffle_idx]

In [79]:
# training on 2474 rows, testing on 1000 rows (slightly less than 30%)
X_train, y_train, X_test, y_test = X_shuff[1000:], y_shuff[1000:], X_shuff[:1000], y_shuff[:1000]
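
The shuffle and split above can be wrapped in one helper so that the features and labels are always indexed by the same permutation. This is a minimal sketch: `shuffle_split`, `n_test`, and `seed` are hypothetical names, and the fixed seed is an added assumption for reproducibility that the original cells do not use.

```python
import numpy as np


def shuffle_split(X, y, n_test, seed=0):
    """Shuffle X and y with one shared permutation, then hold out n_test rows."""
    if len(X) != len(y):
        raise ValueError('X and y must have the same number of rows')
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(X))
    X_shuff, y_shuff = X[idx], y[idx]
    # First n_test rows become the test set, the rest the training set.
    return X_shuff[n_test:], y_shuff[n_test:], X_shuff[:n_test], y_shuff[:n_test]


# Toy data: row i of X is filled with i and y[i] == i, so alignment is easy to check.
X_demo = np.repeat(np.arange(10).reshape(10, 1), 4, axis=1)
y_demo = np.arange(10)
X_tr, y_tr, X_te, y_te = shuffle_split(X_demo, y_demo, n_test=3)
```

Because the permutation is created inside the function from `len(X)`, it cannot accidentally be reused on an array of a different length.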

# The Neural Net!

[Tensorflow Documentation](https://www.tensorflow.org/versions/r0.11/api_docs/python/index.html)
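
The shape comments in the network below can be checked by hand. With `padding='SAME'`, a convolution or pooling layer with stride s turns a spatial size n into ceil(n / s). A small sketch of that arithmetic in plain Python follows; the helper name `same_out` is hypothetical.

```python
def same_out(size, stride):
    """Spatial output size under 'SAME' padding: ceil(size / stride)."""
    return -(-size // stride)


size = 32
shapes = []
for channels in (32, 64, 128):
    size = same_out(size, 1)   # 5x5 convolution, stride 1: size unchanged
    size = same_out(size, 2)   # 2x2 max-pool, stride 2: size halved
    shapes.append((size, size, channels))

# Number of values per image after the third pooling layer.
flat_features = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
```

The final value is the input width of the first fully connected layer, written as `4 * 4 * 128` in the graph definition.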

In [86]:
### Using a convolutional network with 3 convolution and pooling layers each and 2 fully connected layers.
### Activation functions: relu. Loss function: cross entropy. Optimizer: Adam
### Batch normalization ops are created after every layer, but only the last one (h_fc2_bnorm) is wired into the output; dropout before the last fully connected layer.

# CODE ADAPTED FROM MIKE BOWLES, gU professor

import tensorflow as tf

def weight_variable(shape, name):
    initial = tf.truncated_normal(shape, stddev=0.1)
    # name must be passed by keyword; the second positional argument of tf.Variable is `trainable`
    return tf.Variable(initial, name=name)

def bias_variable(shape, name):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial, name=name)

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')

tf.reset_default_graph() 
graph = tf.Graph() 
with graph.as_default():

    #inputs
    x = tf.placeholder(tf.float32, shape=[None, 32*32*3]) # [batch size, total pixels]
    y_ = tf.placeholder(tf.float32, shape=[None, 20]) # [batch size, labels]

    #reshape to image format for conv functions
    x_image = tf.reshape(x, [-1,32,32,3]) 
    # 1st number: number of pictures (minibatch size)
    # 2nd number: number of pixels tall
    # 3rd number: number of pixels wide
    # 4th number: number of channels--in the first convolutional layer, it means number of colors

    #weight and bias for 1st conv
    W_conv1 = weight_variable([5, 5, 3, 32], 'W_conv1')
    b_conv1 = bias_variable([32], 'b_conv1')

    #conv and max-pool - layers 1 and 2
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1) ## output = -1,32,32,32
    h_pool1 = max_pool_2x2(h_conv1) ## output = -1,16,16,32
    h_bnorm1 = tf.contrib.layers.batch_norm(h_pool1)


    #weight and bias for 2nd convolution
    W_conv2 = weight_variable([5, 5, 32, 64], 'W_conv2')
    b_conv2 = bias_variable([64], 'b_conv2')

    #ops for layers 3 and 4
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2) ## output = -1,16,16,64
    h_pool2 = max_pool_2x2(h_conv2) ## output = -1,8,8,64
    h_bnorm2 = tf.contrib.layers.batch_norm(h_pool2)
    
    #weight and bias for 3rd convolution
    W_conv3 = weight_variable([5, 5, 64, 128], 'W_conv3')
    b_conv3 = bias_variable([128], 'b_conv3')
    
    #ops for layers 5 and 6
    h_conv3 = tf.nn.relu(conv2d(h_pool2, W_conv3) + b_conv3) ## output = [-1,8,8,128]
    h_pool3 = max_pool_2x2(h_conv3) ## output = -1,4,4,128
    h_bnorm3 = tf.contrib.layers.batch_norm(h_pool3)

    #reshape for FC layers
    W_fc1 = weight_variable([4 * 4 * 128, 1024], 'W_fc1')
    b_fc1 = bias_variable([1024], 'b_fc1')

    #layer 7
    h_pool3_flat = tf.reshape(h_pool3, [-1, 4*4*128]) #flattening [-1,4,4,128] to [-1, 2048]
    h_fc1 = tf.nn.relu(tf.matmul(h_pool3_flat, W_fc1) + b_fc1)
    h_fc1_bnorm = tf.contrib.layers.batch_norm(h_fc1)

    keep_prob = tf.placeholder(tf.float32)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
    # dropout in the later layers to reduce overfitting

    W_fc2 = weight_variable([1024, 20], 'W_fc2')
    b_fc2 = bias_variable([20], 'b_fc2')
    
    h_fc2 = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
    h_fc2_bnorm = tf.contrib.layers.batch_norm(h_fc2)

    y_conv=tf.nn.softmax(h_fc2_bnorm)
    # probability prediction time. For each of the 20 classes, it will give a probability.

    cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
    # this is your loss function
    train_step = tf.train.AdamOptimizer(5e-5).minimize(cross_entropy)
    # computer understands to modify all the tf.Variables to lower this loss
    correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    # nice printout for humans to read

    #histograms
    tVar = [W_conv1, b_conv1, W_conv2, b_conv2, W_fc1, b_fc1, W_fc2, b_fc2]
    tVarNames = ['W_conv1', 'b_conv1', 'W_conv2', 'b_conv2', 'W_fc1', 'b_fc1', 'W_fc2', 'b_fc2']
    #merged = tf.merge_summary([tf.histogram_summary(tv.name, tv) for (tvName, tv) in zip(tVarNames, tVar)])


#xTrain, xTest, yTrain, yTest = mnist()
with tf.Session(graph=graph) as sess:
    result = sess.run(tf.initialize_all_variables())
    writer = tf.train.SummaryWriter('logs/',graph=sess.graph)
    miniBatchSize = 40
    startEnd = zip(range(0, len(X_train), miniBatchSize), range(miniBatchSize, len(X_train) + 1, miniBatchSize))
    costList = []
    nPasses = 10
    iteration = 0
    for iPass in range(nPasses):
        for (s, e) in startEnd:
            #[cost, tbSummary] = sess.run([train_step, merged], feed_dict={x: xTrain[s:e,], y_: yTrain[s:e], keep_prob:1.0})
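            # train_step itself returns None, so also fetch cross_entropy to record the batch loss
            _, cost = sess.run([train_step, cross_entropy], feed_dict={x: X_train[s:e,], y_: y_train[s:e], keep_prob:0.5})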
            #writer.add_summary(tbSummary, iteration)
            iteration += 1
            costList.append(cost)
        testResult = sess.run([accuracy], feed_dict={x: X_test, y_: y_test, keep_prob: 1.0})
        # need a lot of memory here
        print iPass, testResult

0 [0.30700001]
1 [0.57499999]
2 [0.75300002]
3 [0.87699997]
4 [0.92500001]
5 [0.94700003]
6 [0.97899997]
7 [0.99199998]
8 [0.99900001]
9 [1.0]


__The model was extremely successful.__ It reached full accuracy in 10 passes through the data. __This seems suspicious!__ 

Let's see how it fares when we include all photos from 2015 as well...

__Will the neural net perform just as well attempting to differentiate between the same car for 2015 and 2016?__

## Gathering images from 2015 to add to the model

In [91]:
for make,model in cars:
    parameters = {'category': 'exterior',
                  'view' : 'full',
                  'fmt' : 'json',
                  'api_key': edmunds_api_key_photos}
    url = 'https://api.edmunds.com/api/media/v2/{0}/{1}/{2}/photos?'.format(make, model, '2015')
    response = requests.get(url, params=parameters)
    try:
        assert response.status_code == 200
        json_content = json.loads(response.content)
        filename = 'FP_data/{0}_{1}_2015'.format(make, model)
        with open(filename, 'w') as f:
            json.dump(json_content, f)
    except AssertionError:
        print 'Daily call limit reached. Start at {0} {1} tomorrow.'.format(make, model)
    sleep(0.25)
    # ^ pauses between calls to stay under the rate limit of 5 calls per second (the account also has a daily limit of 25 calls).

In [92]:
photo_base_url = 'https://media.ed.edmunds-media.com'
for item in os.listdir('FP_data'):
    if item.endswith('2015') and not (item.startswith('.') or item == 'images'):
        counter = 0
        im_dir = 'FP_data/images/' + item
        os.mkdir(im_dir)
        fname = 'FP_data/' + item
        with open(fname) as f:
            data = json.load(f)
        for photo in data['photos']:
            for source in photo['sources']:
                tail = source['link']['href']
                url = photo_base_url + tail
                im_filename = re.search(r"[^/]*\.jpg", url).group()
                im_path = '{0}/{1}'.format(im_dir, im_filename)
                try:
                    urllib.urlretrieve(url, im_path)
                    counter += 1
                except IOError:
                    try:
                        urllib.urlretrieve(url, im_path)
                        counter += 1
                    except IOError:
                        continue
        print '%s: \t %d photos' % (item, counter)
    else:
        continue

alfa-romeo_4c_2015: 	 180 photos
audi_a4_2015: 	 180 photos
bmw_3-series_2015: 	 180 photos
chevrolet_colorado_2015: 	 144 photos
chevrolet_tahoe_2015: 	 54 photos
chevrolet_volt_2015: 	 126 photos
ford_explorer_2015: 	 180 photos
ford_f-150_2015: 	 180 photos
honda_civic_2015: 	 180 photos
honda_odyssey_2015: 	 144 photos
jeep_wrangler_2015: 	 180 photos
mazda_3_2015: 	 180 photos
nissan_altima_2015: 	 180 photos
nissan_frontier_2015: 	 180 photos
porsche_cayman_2015: 	 180 photos
ram_1500_2015: 	 180 photos
subaru_forester_2015: 	 180 photos
toyota_corolla_2015: 	 180 photos
toyota_prius_2015: 	 180 photos
toyota_tacoma_2015: 	 180 photos


Adding the new images from 2015 to the existing X matrix with the car images from 2016:

In [94]:
labels_new = [fname for fname in os.listdir('FP_data') if fname.endswith('2015')]
print len(labels_new)

X_new = X
y_labels_new = y_labels
# adding the new images to X and the corresponding labels to y (now X_new and y_labels_new)
for item in os.listdir('FP_data/images/'):
    if item.endswith('2015'):
        im_dir = 'FP_data/images/' + item
        print item
        for pic in os.listdir(im_dir):
            y_labels_new = np.append(y_labels_new,item)
            pic_dir = '{0}/{1}'.format(im_dir,pic)
            pixels = data_processing_flat(pic_dir)
            X_new = np.append(X_new, pixels, axis=0)
assert len(X_new) == len(y_labels_new)

20
alfa-romeo_4c_2015
audi_a4_2015
bmw_3-series_2015
chevrolet_colorado_2015
chevrolet_tahoe_2015
chevrolet_volt_2015
ford_explorer_2015
ford_f-150_2015
honda_civic_2015
honda_odyssey_2015
jeep_wrangler_2015
mazda_3_2015
nissan_altima_2015
nissan_frontier_2015
porsche_cayman_2015
ram_1500_2015
subaru_forester_2015
toyota_corolla_2015
toyota_prius_2015
toyota_tacoma_2015


In [95]:
# must define new one-hot encode function to one-hot encode for 40 labels
# I could've modified the original function to include an argument for num_labels, but this will do for now...
def one_hot_encode2(label_idx):
    one_hot = [0] * 40
    one_hot[label_idx] = 1
    return one_hot
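
The two encoders differ only in the vector length, so a single function with a `num_labels` argument would cover both cases. A minimal sketch follows; `one_hot_encode_n` and `one_hot_batch` are hypothetical names, and the NumPy variant is an optional alternative rather than what the notebook uses.

```python
import numpy as np


def one_hot_encode_n(label_idx, num_labels):
    """Return a list of num_labels zeros with a 1 at label_idx."""
    if not 0 <= label_idx < num_labels:
        raise ValueError('label_idx out of range')
    one_hot = [0] * num_labels
    one_hot[label_idx] = 1
    return one_hot


def one_hot_batch(label_indices, num_labels):
    """Vectorized variant: pick rows of the identity matrix."""
    return np.eye(num_labels, dtype=int)[np.asarray(label_indices)]


single = one_hot_encode_n(2, 5)
batch = one_hot_batch([0, 39], 40)
```

The batch variant encodes a whole list of label indices in one call, which would replace the explicit loop over `y_labels_new`.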

In [96]:
# add the new labels corresponding to the image data added to X
labels2 = labels + labels_new
y_one_hot_new = []
# one-hot encode based on all 40 labels
for label in y_labels_new:
    idx = labels2.index(label)
    one_hot = one_hot_encode2(idx)
    y_one_hot_new.append(one_hot)
y_one_hot_new = np.asarray(y_one_hot_new)

In [102]:
print y_one_hot_new[:1]
print
print y_one_hot_new[-1:]

[[1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0]]

[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 1]]


In [98]:
y_one_hot_new.shape # 6822 pics, 40 labels

(6822, 40)

In [99]:
X_new.shape # 6822 pics, 3072 pixel/color columns

(6822, 3072)

In [103]:
6822*.3 # seeing how much about 30% of the dataset is, for train/test split

2046.6

In [104]:
# need to shuffle the images to ensure that the training data is representative of the whole dataset
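shuffle_idx_new = np.random.permutation(len(X_new))
# index with the new permutation (not the earlier shuffle_idx, which only covers the 3474 images from 2016)
X_shuff_new, y_shuff_new = X_new[shuffle_idx_new], y_one_hot_new[shuffle_idx_new]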

# training on 4822 rows, testing on 2000 rows (slightly less than 30%)
X_train2, y_train2, X_test2, y_test2 = X_shuff_new[2000:], y_shuff_new[2000:], X_shuff_new[:2000], y_shuff_new[:2000]

## Neural Net to classify images of 2015 and 2016 cars

In [112]:
### Using a convolutional network with 3 convolution and pooling layers each and 2 fully connected layers.
### Activation functions: relu. Loss function: cross entropy. Optimizer: Adam
### Batch normalization ops are created after every layer, but only the last one (h_fc2_bnorm) is wired into the output; dropout before the last fully connected layer.

# CODE ADAPTED FROM MIKE BOWLES, gU professor

import tensorflow as tf

tf.reset_default_graph() 
graph = tf.Graph() 
with graph.as_default():

    #inputs
    x = tf.placeholder(tf.float32, shape=[None, 32*32*3]) # [batch size, total pixels]
    y_ = tf.placeholder(tf.float32, shape=[None, 40]) # [batch size, labels]

    #reshape to image format for conv functions
    x_image = tf.reshape(x, [-1,32,32,3]) 
    # 1st number: number of pictures (minibatch size)
    # 2nd number: number of pixels tall
    # 3rd number: number of pixels wide
    # 4th number: number of channels--in the first convolutional layer, it means number of colors

    #weight and bias for 1st conv
    W_conv1 = weight_variable([5, 5, 3, 32], 'W_conv1')
    b_conv1 = bias_variable([32], 'b_conv1')

    #conv and max-pool - layers 1 and 2
    h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1) ## output = -1,32,32,32
    h_pool1 = max_pool_2x2(h_conv1) ## output = -1,16,16,32
    h_bnorm1 = tf.contrib.layers.batch_norm(h_pool1)


    #weight and bias for 2nd convolution
    W_conv2 = weight_variable([5, 5, 32, 64], 'W_conv2')
    b_conv2 = bias_variable([64], 'b_conv2')

    #ops for layers 3 and 4
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2) ## output = -1,16,16,64
    h_pool2 = max_pool_2x2(h_conv2) ## output = -1,8,8,64
    h_bnorm2 = tf.contrib.layers.batch_norm(h_pool2)
    
    #weight and bias for 3rd convolution
    W_conv3 = weight_variable([5, 5, 64, 128], 'W_conv3')
    b_conv3 = bias_variable([128], 'b_conv3')
    
    #ops for layers 5 and 6
    h_conv3 = tf.nn.relu(conv2d(h_pool2, W_conv3) + b_conv3) ## output = [-1,8,8,128]
    h_pool3 = max_pool_2x2(h_conv3) ## output = -1,4,4,128
    h_bnorm3 = tf.contrib.layers.batch_norm(h_pool3)

    #reshape for FC layers
    W_fc1 = weight_variable([4 * 4 * 128, 1024], 'W_fc1')
    b_fc1 = bias_variable([1024], 'b_fc1')

    #layer 7
    h_pool3_flat = tf.reshape(h_pool3, [-1, 4*4*128]) #flattening [-1,4,4,128] to [-1, 2048]
    h_fc1 = tf.nn.relu(tf.matmul(h_pool3_flat, W_fc1) + b_fc1)
    h_fc1_bnorm = tf.contrib.layers.batch_norm(h_fc1)

    keep_prob = tf.placeholder(tf.float32)
    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
    # dropout in the later layers to reduce overfitting

    W_fc2 = weight_variable([1024, 40], 'W_fc2')
    b_fc2 = bias_variable([40], 'b_fc2')
    
    h_fc2 = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
    h_fc2_bnorm = tf.contrib.layers.batch_norm(h_fc2)

    y_conv=tf.nn.softmax(h_fc2_bnorm)
    # probability prediction time. For each of the 40 classes, it will give a probability.

    cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
    # this is your loss function
    train_step = tf.train.AdamOptimizer(epsilon=5e-5, learning_rate=0.0001).minimize(cross_entropy)
    # computer understands to modify all the tf.Variables to lower this loss
    correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    # nice printout for humans to read

    #histograms
    tVar = [W_conv1, b_conv1, W_conv2, b_conv2, W_fc1, b_fc1, W_fc2, b_fc2]
    tVarNames = ['W_conv1', 'b_conv1', 'W_conv2', 'b_conv2', 'W_fc1', 'b_fc1', 'W_fc2', 'b_fc2']
    #merged = tf.merge_summary([tf.histogram_summary(tv.name, tv) for (tvName, tv) in zip(tVarNames, tVar)])


#xTrain, xTest, yTrain, yTest = mnist()
with tf.Session(graph=graph) as sess:
    result = sess.run(tf.initialize_all_variables())
    #writer = tf.train.SummaryWriter('logs/',graph=sess.graph)
    miniBatchSize = 40
    startEnd = zip(range(0, len(X_train2), miniBatchSize), range(miniBatchSize, len(X_train2) + 1, miniBatchSize))
    costList = []
    nPasses = 20
    iteration = 0
    for iPass in range(nPasses):
        for (s, e) in startEnd:
            #[cost, tbSummary] = sess.run([train_step, merged], feed_dict={x: X_train2[s:e,], y_: y_train2[s:e], keep_prob:1.0})
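            # train_step itself returns None, so also fetch cross_entropy to record the batch loss
            _, cost = sess.run([train_step, cross_entropy], feed_dict={x: X_train2[s:e,], y_: y_train2[s:e], keep_prob:0.5})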
            #writer.add_summary(tbSummary, iteration)
            iteration += 1
            costList.append(cost)
        testResult = sess.run([accuracy], feed_dict={x: X_test2, y_: y_test2, keep_prob: 1.0})
        # need a lot of memory here
        print iPass, testResult

0 [0.58999997]
1 [0.8355]
2 [0.94499999]
3 [0.98250002]
4 [0.99150002]
5 [0.99199998]
6 [0.99199998]
7 [0.99199998]
8 [0.99199998]
9 [0.99199998]
10 [0.99199998]
11 [0.99299997]
12 [0.995]
13 [0.99650002]
14 [0.99550003]
15 [0.99599999]
16 [0.99949998]
17 [0.99900001]
18 [0.99949998]
19 [0.99949998]


I only let it run for 20 passes through the data, but it reached an accuracy of over 0.999. For 40 classes, some of which are extremely similar because they are the 2015 and 2016 versions of the same model, this seems fishy...

After looking more closely at the images, I noticed that the API call returns multiple resolutions of each photo. For example, the same photo can appear once with width 1600 and again with width 98. Since every image is resized to 32 x 32 before training, these copies end up nearly identical. Therefore, when a given photo lands in the test set, the neural net has likely trained on at least one other version of the same photo, making it relatively easy to arrive at the correct label.
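
This kind of overlap can be checked, or avoided at split time, by grouping files according to the photo they came from. The sketch below assumes, based on the file names shown at the top of this notebook, that the trailing `_<number>.jpg` encodes the image width and that everything before it identifies the photo. That naming rule is an inference from those examples, not something confirmed by the Edmunds documentation. `photo_key` and `group_split` are hypothetical helpers, and all but the first file name in the demo list are made up in the same pattern.

```python
import random
import re
from collections import defaultdict


def photo_key(filename):
    """Strip a trailing '_<width>.jpg' so all resolutions of a photo share a key."""
    return re.sub(r'_\d+\.jpg$', '', filename)


def group_split(filenames, test_fraction=0.3, seed=0):
    """Split so that every resolution of a photo lands on the same side."""
    groups = defaultdict(list)
    for name in filenames:
        groups[photo_key(name)].append(name)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    n_test = int(round(len(keys) * test_fraction))
    test = [f for k in keys[:n_test] for f in groups[k]]
    train = [f for k in keys[n_test:] for f in groups[k]]
    return train, test


# Illustrative file names: two photos, each at two widths.
files = [
    '2016_bmw_3-series_sedan_340i_fq_oem_4_1600.jpg',
    '2016_bmw_3-series_sedan_340i_fq_oem_4_98.jpg',
    '2016_bmw_3-series_sedan_340i_fq_oem_5_1600.jpg',
    '2016_bmw_3-series_sedan_340i_fq_oem_5_98.jpg',
]
train_files, test_files = group_split(files, test_fraction=0.5)
```

Comparing the set of keys on each side of an existing split would show how many test photos also have a copy in the training set.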

The solution: image augmentation. I'll manipulate some versions of each photo to add noise to the dataset and make the classification task more difficult for the neural net. __See the notebook on image augmentation.__