# Landscape or Portrait? (Aldo 3rd Quarter Project)

Continuing on my work with paintings during the 2nd Quarter, I decided to use a neural network to distinguish between portrait and landscape images.

I trained it by using RandomForestClassifier on 11000 jpegs (resized to 100px * 100px) of paintings from over 300 different artists across multiple centuries.
- the images in this dataset were borrowed from a Kaggle competetion entitled "Painter by Numbers"
- https://www.kaggle.com/c/painter-by-numbers
    - many were sourced from https://www.wikiart.org/ and https://www.wikipedia.org/ among other websites

I decided to load them in as grayscale images, as it greatly reduces file size and training time. Although, in a further iteration, it may be interesting to train with RGB images and larger pixel dimensions to see if it will increase the accuracy.

Figuring out how to clean, process, upload, and properly format such a large dataset took quite a long time. Luckily, YouTuber 'sentdex' had a very helpful video entitled "Loading in your own data - Deep Learning basics with Python, TensorFlow and Keras p.2", which is cited at points where I adapted parts of his code.
- https://www.youtube.com/watch?v=j-3vuBynnOE

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import os
import cv2
from PIL import Image
!pip install opencv-python



In [3]:
mainpath = "/Users/aldoschwartz/Desktop/train/"
dirs = os.listdir( mainpath )
CATEGORIES = ['Landscape', 'Portrait']

In [4]:
# csv with names of all Landscape and Portrait files
df = pd.read_csv("landscape_portrait_titles.csv")

In [5]:
# column names in dataframe
df.columns

Index(['Landscapes', 'Portraits'], dtype='object')

In [6]:
# turn dataframe into lists of filenames for either 'Landscape' or 'Portrait'
landscape_filenames = df["Landscapes"].tolist()
portrait_filenames = df["Portraits"].tolist()
correct_filenames = landscape_filenames + portrait_filenames

In [37]:
# delete all paintings that aren't landscapes or portraits
def cleanup():
    for item in dirs:
        if item not in correct_filenames:
            os.remove(mainpath+item)

In [38]:
cleanup()

FileNotFoundError: [Errno 2] No such file or directory: '/Users/aldoschwartz/Desktop/train/52112.jpg'

In [7]:
import shutil

# move all landscapes to one folder, move all portraits to another folder
def sortcategory():
    for item in dirs:
        if item in landscape_filenames:
            shutil.move(mainpath+item, '/Users/aldoschwartz/Desktop/train/Landscape')
        if item in portrait_filenames:
            shutil.move(mainpath+item, '/Users/aldoschwartz/Desktop/train/Portrait')

In [42]:
sortcategory()

In [8]:
IMG_SIZE = 100    # new pixel width and height of resized image (square)

In [10]:
# code partially adapted from Youtuber 'sentdex'
# "Loading in your own data - Deep Learning basics with Python, TensorFlow and Keras p.2" 
# https://www.youtube.com/watch?v=j-3vuBynnOE

training_data = []

def create_training_data():
    for category in CATEGORIES:
        path = os.path.join(mainpath, category)    # path to 'Landscape' or 'Portrait' dir
        class_num = CATEGORIES.index(category)     # assigning 0 or 1 to 'Portrait' or 'Landscape' images respectively
        for img in os.listdir(path):
            img_array = cv2.imread(os.path.join(path,img), cv2.IMREAD_GRAYSCALE)  # grayscale to reduce file size
            new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE)) # make the image into a 200 x 200 square
            training_data.append([new_array, class_num])

create_training_data()

error: OpenCV(4.0.0) /Users/travis/build/skvark/opencv-python/opencv/modules/imgproc/src/resize.cpp:3784: error: (-215:Assertion failed) !ssize.empty() in function 'resize'


In [9]:
print(len(training_data))

NameError: name 'training_data' is not defined

In [22]:
# shuffle training data 
import random
random.shuffle(training_data)

In [23]:
# look at sample data to make sure it is properly shuffled
for sample in training_data[:10]:
    print(sample[1])

0
1
0
0
0
0
0
0
0
0


In [24]:
X = []   # feature set
y = []   # label

In [25]:
for features, label in training_data:
    X.append(features)
    y.append(label)

X = np.array(X).reshape(-1, IMG_SIZE, IMG_SIZE, 1)

In [26]:
images = X
labels = np.asarray(y)   # y was a list, needed to turn into array

In [27]:
# examine shape of images
print(images.shape)
# examine shape of labels
print(labels.shape)

(11748, 100, 100, 1)
(11748,)


In [28]:
images = images.reshape((images.shape[0], -1))
images.shape

(11748, 10000)

In [29]:
from sklearn.ensemble import RandomForestClassifier

In [30]:
classifier = RandomForestClassifier()

In [31]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(images, labels, test_size=0.20, random_state=42)

In [32]:
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(9398, 10000)
(2350, 10000)
(9398,)
(2350,)


In [33]:
classifier.fit(X_train, y_train)

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)

In [49]:
score = classifier.score(X_test,y_test)
score

0.8038297872340425

In [1]:
# This shows an example image with correct label, and then the neural network's prediction
# it prints 'Correct!' if the prediction was correct

image_number = 45

plt.gray() 
test_img = X_test[image_number].reshape(100,100)
imgplot = plt.imshow(test_img)
if y_test[image_number] == 0:
    print("label: Landscape")

if y_test[image_number] == 1:
    print("label: Portrait")
plt.show()

t = X_test[image_number].reshape(1,-1)
pred = classifier.predict(t)
if pred == 0:
    print("prediction: Landscape")
if pred == 1:
    print("prediction: Portrait")

if pred == y_test[image_number]:
    print('Correct!')

if pred != y_test[image_number]:
    print('Incorrect!')

NameError: name 'plt' is not defined