# Machine Learning on Images

In this notebook, we load an image dataset and apply machine learning algorithms directly to the images' raw pixel values. This is a simple baseline and a reasonable place to start with image classification.


For this, we use a Breast Ultrasound Image dataset that contains both benign and malignant samples.

Dataset source:  Rodrigues, Paulo Sergio (2017), “Breast Ultrasound Image”, Mendeley Data, v1 http://dx.doi.org/10.17632/wmy84gzngw.1 

First, we mount Google Drive, where our data is located.

In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


Next, we check whether we can access the image files.

In [None]:
import os

# List the file names in the benign (0) and malignant (1) folders
benignDirs = os.listdir("/content/drive/My Drive/GUIST_Webinar_Files/Jupyter_Notebook_Codes/Data/BreastUS_Images/0")
malignantDirs = os.listdir("/content/drive/My Drive/GUIST_Webinar_Files/Jupyter_Notebook_Codes/Data/BreastUS_Images/1")
print(benignDirs)
print(malignantDirs)

['us38.bmp', 'us36.bmp', 'us39.bmp', 'us37.bmp', 'us40.bmp', 'us20.bmp', 'us16.bmp', 'us18.bmp', 'us17.bmp', 'us19.bmp', 'us27.bmp', 'us29.bmp', 'us30.bmp', 'us28.bmp', 'us26.bmp', 'us42.bmp', 'us41.bmp', 'us43.bmp', 'us44.bmp', 'us45.bmp', 'us62.bmp', 'us61.bmp', 'us64.bmp', 'us63.bmp', 'us65.bmp', 'us75.bmp', 'us74.bmp', 'us73.bmp', 'us72.bmp', 'us71.bmp', 'us98.bmp', 'us99.bmp', 'us96.bmp', 'us97.bmp', 'us100.bmp', 'us4.bmp', 'us1.bmp', 'us3.bmp', 'us5.bmp', 'us2.bmp', 'us9.bmp', 'us10.bmp', 'us8.bmp', 'us6.bmp', 'us7.bmp', 'us15.bmp', 'us13.bmp', 'us11.bmp', 'us12.bmp', 'us14.bmp', 'us21.bmp', 'us25.bmp', 'us22.bmp', 'us23.bmp', 'us24.bmp', 'us32.bmp', 'us31.bmp', 'us35.bmp', 'us33.bmp', 'us34.bmp', 'us48.bmp', 'us50.bmp', 'us47.bmp', 'us49.bmp', 'us46.bmp', 'us55.bmp', 'us51.bmp', 'us52.bmp', 'us53.bmp', 'us54.bmp', 'us60.bmp', 'us59.bmp', 'us56.bmp', 'us58.bmp', 'us57.bmp', 'us69.bmp', 'us66.bmp', 'us68.bmp', 'us70.bmp', 'us67.bmp', 'us78.bmp', 'us79.bmp', 'us77.bmp', 'us76.bmp',

In [None]:
# Check the number of negative (benign) and positive (malignant) samples
print('Number of negative samples:', len(benignDirs))
print('Number of positive samples:', len(malignantDirs))

Number of negative samples: 100
Number of positive samples: 150
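
The two counts above also give a reference point for judging the accuracies later on: a classifier that always predicts the more common class is already right for that share of the samples. The following is a minimal sketch, assuming the 100 and 150 counts printed above; the names `proportions` and `majority_baseline` are illustrative and do not appear elsewhere in this notebook.

```python
from collections import Counter

# Labels rebuilt from the counts printed above: 100 benign (0), 150 malignant (1).
labels = [0] * 100 + [1] * 150

counts = Counter(labels)
total = sum(counts.values())
proportions = {label: count / total for label, count in counts.items()}

# Accuracy of a trivial classifier that always predicts the most common class.
majority_baseline = max(counts.values()) / total

print(proportions)        # {0: 0.4, 1: 0.6}
print(majority_baseline)  # 0.6
```

Any model scoring close to 0.6 on this data is therefore doing little better than guessing the majority class.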


Now, we read the images and store them in lists.

In [None]:
import imageio

# Declare the lists where the images and labels are to be stored.
allImages = list()
allLabels = list()

# The images are read from the same folders that were listed above.
# Get the benign images and label them 0
for fileName in benignDirs:
  allImages.append(imageio.imread('/content/drive/My Drive/GUIST_Webinar_Files/Jupyter_Notebook_Codes/Data/BreastUS_Images/0/' + fileName))
  allLabels.append(0)

# Get the malignant images and label them 1
for fileName in malignantDirs:
  allImages.append(imageio.imread('/content/drive/My Drive/GUIST_Webinar_Files/Jupyter_Notebook_Codes/Data/BreastUS_Images/1/' + fileName))
  allLabels.append(1)

# Both lists should have the same length: one label per image
print(len(allImages))
print(len(allLabels))
print(allLabels)

250
250
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]


Now, let's check the shapes of the images.

In [None]:
for i in range(len(allImages)):
  print(allImages[i].shape)

(65, 105)
(65, 105)
(65, 105)
(65, 105)
(65, 105)
(87, 153)
(87, 153)
(87, 153)
(87, 153)
(87, 153)
(75, 95)
(75, 95)
(75, 95)
(75, 95)
(75, 95)
(79, 127)
(79, 127)
(79, 127)
(79, 127)
(79, 127)
(105, 157)
(105, 157)
(105, 157)
(105, 157)
(105, 157)
(73, 107)
(73, 107)
(73, 107)
(73, 107)
(73, 107)
(69, 95)
(69, 95)
(69, 95)
(69, 95)
(69, 95)
(75, 95)
(75, 95)
(75, 95)
(75, 95)
(75, 95)
(69, 123)
(69, 123)
(69, 123)
(69, 123)
(69, 123)
(79, 105)
(79, 105)
(79, 105)
(79, 105)
(79, 105)
(57, 93)
(57, 93)
(57, 93)
(57, 93)
(57, 93)
(73, 107)
(73, 107)
(73, 107)
(73, 107)
(73, 107)
(79, 93)
(79, 93)
(79, 93)
(79, 93)
(79, 93)
(75, 75)
(75, 75)
(75, 75)
(75, 75)
(75, 75)
(71, 99)
(71, 99)
(71, 99)
(71, 99)
(71, 99)
(75, 143)
(75, 143)
(75, 143)
(75, 143)
(75, 143)
(93, 121)
(93, 121)
(93, 121)
(93, 121)
(93, 121)
(77, 105)
(77, 105)
(77, 105)
(77, 105)
(77, 105)
(95, 97)
(95, 97)
(95, 97)
(95, 97)
(95, 97)
(81, 157)
(81, 157)
(81, 157)
(81, 157)
(81, 157)
(99, 177)
(161, 199)
(99, 177)
(91,

Now, we resize all the images to the same shape and then flatten them.

In [None]:
# Import necessary packages
import cv2
import numpy as np

for i in range(len(allImages)):
  # Resize every image to 64x64 (cv2.resize takes the target size as (width, height))
  allImages[i] = cv2.resize(allImages[i], (64,64))
  # Flatten the 2-D image into a 1-D vector of 64*64 = 4096 pixel values
  allImages[i] = np.ravel(allImages[i])

# Check the final dimensionality
for i in range(len(allImages)):
  print(allImages[i].shape)

(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
(4096,)
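
Once every image has the same length, the list of vectors can be stacked into a single 2-D array with one row per image, which is the layout scikit-learn estimators expect. The sketch below illustrates this on synthetic data. To keep it runnable without OpenCV, it uses `resize_nearest`, a hypothetical NumPy-only nearest-neighbour helper standing in for `cv2.resize` (which interpolates bilinearly by default), so the pixel values will differ from the notebook's while the shapes will not.

```python
import numpy as np

def resize_nearest(image, out_height, out_width):
    """NumPy-only nearest-neighbour resize (illustrative stand-in for cv2.resize)."""
    in_height, in_width = image.shape
    # For each output row/column, pick the source index it maps back to.
    row_idx = np.arange(out_height) * in_height // out_height
    col_idx = np.arange(out_width) * in_width // out_width
    return image[row_idx[:, None], col_idx]

rng = np.random.default_rng(0)
# Synthetic grayscale "images" using three of the shapes printed earlier.
shapes = [(65, 105), (87, 153), (75, 95)]
images = [rng.integers(0, 256, size=s, dtype=np.uint8) for s in shapes]

# Resize, flatten, and stack into an (n_samples, n_features) matrix.
features = np.stack([resize_nearest(img, 64, 64).ravel() for img in images])
print(features.shape)  # (3, 4096)
```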


In [None]:
print(list(allImages[0]))

[122, 91, 67, 50, 46, 42, 37, 54, 65, 66, 82, 81, 102, 119, 119, 96, 73, 64, 47, 32, 32, 33, 38, 46, 65, 65, 71, 64, 71, 72, 66, 75, 78, 69, 55, 52, 63, 66, 103, 108, 82, 93, 90, 102, 106, 108, 113, 95, 81, 100, 122, 117, 121, 123, 107, 103, 110, 89, 94, 86, 98, 97, 95, 27, 103, 77, 53, 44, 47, 41, 43, 60, 72, 68, 82, 89, 106, 107, 105, 86, 65, 58, 44, 28, 34, 39, 48, 63, 78, 72, 68, 63, 70, 73, 75, 75, 71, 63, 59, 60, 70, 90, 117, 115, 96, 95, 98, 107, 106, 111, 104, 90, 71, 95, 105, 116, 116, 113, 107, 108, 103, 108, 107, 80, 83, 89, 87, 25, 78, 71, 51, 47, 48, 50, 59, 63, 72, 73, 85, 91, 102, 94, 82, 73, 50, 47, 32, 26, 41, 49, 53, 64, 70, 67, 57, 52, 66, 74, 76, 65, 74, 74, 80, 69, 79, 95, 112, 111, 116, 102, 104, 103, 97, 94, 87, 89, 67, 83, 84, 97, 94, 98, 110, 105, 83, 95, 98, 79, 73, 78, 82, 26, 68, 74, 58, 52, 42, 59, 74, 76, 71, 84, 94, 88, 92, 84, 67, 63, 43, 42, 29, 32, 50, 59, 57, 65, 60, 60, 54, 44, 63, 77, 78, 66, 82, 86, 98, 84, 84, 88, 105, 107, 124, 110, 98, 91, 92, 8

Next, we split the data into training and test sets for ML classification.

In [None]:
from sklearn.model_selection import train_test_split

# 80-20 split, stratified so that both sets keep the benign/malignant ratio
xTrain, xTest, yTrain, yTest = train_test_split(allImages, allLabels, test_size=0.2, shuffle=True, stratify=allLabels)
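
To see what stratification does, the following sketch repeats the split on placeholder data with the same class counts as this dataset (100 benign, 150 malignant). The placeholder `features` array and the `random_state=42` seed are assumptions added for illustration; the notebook's own split sets no seed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same class counts as the dataset: 100 benign (0), 150 malignant (1).
labels = np.array([0] * 100 + [1] * 150)
# Placeholder features: one dummy column, just so there is something to split.
features = np.arange(len(labels)).reshape(-1, 1)

x_train, x_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=42)

print(len(y_train), len(y_test))   # 200 50
print(np.bincount(y_train))        # [ 80 120]
print(np.bincount(y_test))         # [20 30]
```

Both subsets keep the 40/60 class ratio of the full dataset, so the test accuracy is not skewed by an unlucky draw of one class.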

Now, we fit our classification models and check their performance on the test set.

In [None]:
# Import the necessary packages
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
import pandas as pd

# Create a list of models
models = []
models.append(('KNN', KNeighborsClassifier()))
models.append(('SVC', SVC(max_iter = 100000)))
models.append(('LR', LogisticRegression(max_iter = 100000)))
models.append(('DT', DecisionTreeClassifier()))
models.append(('GNB', GaussianNB()))
models.append(('RF', RandomForestClassifier()))
models.append(('GB', GradientBoostingClassifier()))

# Classify and evaluate
names = []
scores = []
for name, model in models:
    model.fit(xTrain, yTrain)
    y_pred = model.predict(xTest)
    scores.append(accuracy_score(yTest, y_pred))
    names.append(name)

# Print the results in a pretty manner, using Pandas
tr_split = pd.DataFrame({'Name': names, 'Score': scores})
print(tr_split)

  Name  Score
0  KNN   0.94
1  SVC   1.00
2   LR   1.00
3   DT   0.98
4  GNB   0.88
5   RF   1.00
6   GB   0.98
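
With 50 test images, each prediction moves the accuracy by 0.02, so a single split gives a coarse estimate that can change from run to run (the split above sets no random seed). One common way to get a steadier number is stratified k-fold cross-validation. The sketch below is not part of the original workflow: it uses synthetic data from `make_classification` as a stand-in for the flattened images, and the fold count and seeds are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the flattened images (250 samples, roughly 40% / 60% classes).
X, y = make_classification(n_samples=250, n_features=50, n_informative=10,
                           weights=[0.4, 0.6], random_state=0)

# Five stratified folds: every sample is used for testing exactly once.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=100000), X, y,
                         cv=cv, scoring='accuracy')

print(scores)                       # one accuracy value per fold
print(scores.mean(), scores.std())  # average and spread across folds
```

Reporting the mean together with the spread across folds makes it easier to tell whether two models really differ or are within noise of each other.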


In [None]:
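# Let's compare the prediction with the actual class label for the first (index 0) test sample.
# Note: `model` is the last model fitted in the loop above, i.e. GB (GradientBoostingClassifier).
print(model.predict([xTest[0]])[0])
print(yTest[0])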

0
0
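
Because the loop variable keeps only the last fitted model, predicting with a specific classifier requires holding on to each one. A minimal sketch on toy data, assuming a dictionary named `fitted` (a hypothetical name, not used elsewhere in this notebook) that maps each model name to its fitted estimator:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy training data: two well-separated classes.
x_train = [[0, 0], [0, 1], [10, 10], [10, 11]]
y_train = [0, 0, 1, 1]

candidates = [('KNN', KNeighborsClassifier(n_neighbors=1)),
              ('DT', DecisionTreeClassifier(random_state=0))]

# Keep every fitted model so any of them can be queried later by name.
fitted = {}
for name, model in candidates:
    fitted[name] = model.fit(x_train, y_train)

# predict() expects a 2-D input, so a single sample is wrapped in a list.
sample = [10, 10]
prediction = fitted['KNN'].predict([sample])[0]
print(prediction)  # 1
```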
