# Machine Learning with Python

These lessons are written with a particular type of learner in mind. They assume you are comfortable writing and using Python code, and that you have a general understanding of statistical concepts. Nothing too esoteric is needed, but you should have a working knowledge of regression and significance.

![opencvlogo](images/OpenCV_logo.png)

## Application: Classifying Honeybee Tags

Background on project.

![](images/setup.jpg)

Control Tags             |  Treatment Tags
:-------------------------:|:-------------------------:
![control tags](images/tag1.jpg)  |  ![Treatment Tags](images/tag2.jpg)

![](images/queen.jpg)
![](images/beehive.png)
![](images/harddrives.jpg)

## What is OpenCV?

OpenCV (Open Source Computer Vision) is an open-source library that includes hundreds of computer vision algorithms. It was BSD-licensed through version 4.4 and has been released under Apache 2.0 since version 4.5. Key features:

* Cross-platform: Windows, macOS, Linux (even Raspberry Pi)
* Languages: C++, C, Python and Java
* Wrappers: MATLAB/Octave, C#, Ruby, etc.
* Fast: uses threading, CUDA and OpenCL
* Python interface uses NumPy arrays for images
* Can be integrated into iPhone and Android apps
* Includes machine learning algorithms (although we'll be using scikit-learn)

It can be downloaded for free from the [OpenCV website](http://opencv.org) or installed via a package manager such as Homebrew on macOS. The Python bindings can also be installed with `pip install opencv-python`.

## What is an image?

Images, pixels, grayscale, colour, show pic of image matrix.
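
As a minimal sketch of the idea (the arrays below are made up for illustration, not loaded from a file): a grayscale image is a 2-D NumPy array of intensities, and a colour image adds a third axis holding the channels.

```python
import numpy as np

# a tiny 3x4 grayscale "image": one unsigned 8-bit intensity per pixel
gray = np.array([[0, 64, 128, 255],
                 [0, 64, 128, 255],
                 [0, 64, 128, 255]], dtype=np.uint8)
print(gray.shape)    # (rows, columns)

# a colour image of the same size: three values per pixel
# (OpenCV orders the channels blue, green, red)
colour = np.zeros((3, 4, 3), dtype=np.uint8)
colour[:, :, 2] = 255    # fill the red channel, giving a pure red image
print(colour.shape)  # (rows, columns, channels)
print(colour[0, 0])  # the three channel values of the top-left pixel
```

Indexing is row first, then column, which is the opposite of the usual (x, y) convention.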

## What is machine learning?

Quick intro: statistics, training, un/supervised

In [1]:
# libraries we're using
import cv2
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
%matplotlib inline

## Reading Images

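The second argument to `cv2.imread` controls how the file is decoded. Here we read the image in colour; the tag images later on are read as grayscale. OpenCV stores colour images in BGR order, so we convert to RGB before displaying with matplotlib.

In [None]:
image = cv2.imread('images/hela.jpg', 1) # 0: grayscale, 1: color, -1: unchanged
if image is None: # imread returns None rather than raising if the file cannot be read
    raise FileNotFoundError('images/hela.jpg')
plt.figure(figsize = (10, 7))
plt.title('HeLa Cells')
plt.axis('off')
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB)) # matplotlib expects RGB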

## Image Properties

We can check an image's shape and data type, read or change individual pixel values, slice out a region, and split the image into its colour channels.

In [None]:
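print(image.shape)        # (rows, columns, channels)
print(image.dtype)        # data type of each value, typically uint8
print(image[55, 10])      # access pixel values by row and column (B, G, R)
print(image[55, 10, 1])   # green channel of that pixel (index 1 in BGR order)
# image[55, 10] = [11, 55, 22]  # uncomment to change the pixel value
image_region = image[55:100, 200:500]   # slice rows 55-99 and columns 200-499
b, g, r = cv2.split(image)              # separate the three colour channels
merged = cv2.merge((b, g, r))           # recombine them into one image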

## Feature Engineering

Features are important. Here we look at brightness, contrast, smoothing, etc.

## Contrast/Brightness

In [None]:
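# work in float so the arithmetic cannot overflow, then clip back to the 0-255 uint8 range
scaled = image.astype(np.float32)
increase_contrast = np.clip(scaled * 3, 0, 255).astype(np.uint8)
decrease_contrast = np.clip(scaled * 0.5, 0, 255).astype(np.uint8)
increase_brightness = np.clip(scaled + 30, 0, 255).astype(np.uint8)
decrease_brightness = np.clip(scaled - 30, 0, 255).astype(np.uint8)

plt.figure(figsize = (10, 7))
plt.title('HeLa Cells')
plt.axis('off')
# OpenCV stores colour as BGR; matplotlib expects RGB
plt.imshow(cv2.cvtColor(decrease_contrast, cv2.COLOR_BGR2RGB))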
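
One thing to watch with arithmetic on image arrays: images are usually stored as `uint8`, and plain NumPy arithmetic on that type wraps around instead of stopping at 255. The sketch below uses a made-up row of four pixel values to show the difference between wrapping and clipping.

```python
import numpy as np

pixels = np.array([10, 100, 200, 250], dtype=np.uint8)

# plain NumPy arithmetic on uint8 wraps around modulo 256
wrapped = pixels + np.uint8(30)

# widening the type, clipping to [0, 255], then converting back saturates instead
saturated = np.clip(pixels.astype(np.int16) + 30, 0, 255).astype(np.uint8)

print(wrapped)
print(saturated)
```

In the wrapped result the brightest pixel (250) ends up dark, while in the clipped result it stays at the maximum of 255.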

## Smoothing
More commonly known as applying a blur. Smoothing is used to reduce noise.

In [None]:
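# convert the colour image to grayscale before smoothing
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

mean_smoothed = cv2.blur(gray_image, (15, 15))                 # average over a 15x15 window
median_smoothed = cv2.medianBlur(gray_image, 15)               # median over a 15x15 window
gaussian_smoothed = cv2.GaussianBlur(gray_image, (15, 15), 0)  # 0: sigma is derived from the kernel size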

mean_compare = np.hstack((gray_image, mean_smoothed))
median_compare = np.hstack((gray_image, median_smoothed))
gaussian_compare = np.hstack((gray_image, gaussian_smoothed))

plt.figure(figsize = (15, 12))
plt.title('Mean')
plt.axis('off')
plt.imshow(mean_compare, cmap = cm.Greys_r) 

plt.figure(figsize = (15, 12))
plt.title('Median')
plt.axis('off')
plt.imshow(median_compare, cmap = cm.Greys_r)

plt.figure(figsize = (15, 12))
plt.title('Gaussian')
plt.axis('off')
plt.imshow(gaussian_compare, cmap = cm.Greys_r)
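
To see why the choice of filter matters, here is a small NumPy-only sketch of a mean and a median filter applied to a made-up 5x5 patch containing a single noisy pixel. It only illustrates the idea: it uses a 3x3 window and skips the border handling that the OpenCV functions perform.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# flat 5x5 patch with a single bright "salt" pixel in the middle
patch = np.full((5, 5), 10, dtype=np.uint8)
patch[2, 2] = 255

# every 3x3 neighbourhood that fits inside the patch (no border padding)
windows = sliding_window_view(patch, (3, 3))

mean_filtered = windows.mean(axis=(-2, -1))
median_filtered = np.median(windows, axis=(-2, -1))

print(mean_filtered.round(1))
print(median_filtered)
```

The mean filter smears the outlier across its neighbours, while the median filter removes it entirely, which is why median smoothing is a common choice for salt-and-pepper noise.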

## Visualising Data

Important to visualise, etc

## Dimensionality Reduction and Clustering - PCA, K-means, LDA (supervised)

## Splitting Training/Test Data

## SVM Classification

Include pic of SVM theory.

## Code from notebook talk:

In [None]:
import cv2
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np
import glob
%matplotlib inline

def read_gray_img(img_loc):
    image = cv2.imread(img_loc, 0)
    return image

def flatten_image(roi):
    flat_roi = roi.flatten()
    return flat_roi

In [None]:
tags = []
classify = []

tags.extend(glob.glob('I/*.png'))
classify.extend(len(glob.glob('I/*.png')) * [1])
tags.extend(glob.glob('O/*.png'))
classify.extend(len(glob.glob('O/*.png')) * [2])
tags.extend(glob.glob('Q/*.png'))
classify.extend(len(glob.glob('Q/*.png')) * [3])

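# read every tag image as grayscale
read_tags = list(map(read_gray_img, tags))

# show a few example tags (these indices assume the size of the original dataset)
plt.figure()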
plt.imshow(read_tags[0], cmap = cm.Greys_r)
plt.figure()
plt.imshow(read_tags[108], cmap = cm.Greys_r)
plt.figure()
plt.imshow(read_tags[211], cmap = cm.Greys_r)

In [None]:
tag_flat = list(map(flatten_image, read_tags))
X = np.array(tag_flat)
print(X.shape)
y = np.array(classify)
print(y.shape)

In [None]:
# train_test_split now lives in sklearn.model_selection (sklearn.cross_validation has been removed)
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=4)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

In [None]:
from sklearn.decomposition import PCA
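# the old sklearn.lda module has been removed; alias the new class to keep the name LDA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA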

pca = PCA(n_components=2)
X_r = pca.fit(X).transform(X)
plt.figure(figsize = (35, 20))
plt.scatter(X_r[:, 0], X_r[:, 1], c=y, s=200)

lda = LDA(n_components=2)
lda = lda.fit(X_train, y_train)
X_lda = lda.transform(X_train)
Z = lda.transform(X_test)
plt.figure(figsize = (35, 20))
plt.scatter(X_lda[:, 0], X_lda[:, 1], c=y_train, s=200)

In [None]:
from sklearn import svm
from sklearn import metrics

clf = svm.SVC(gamma=0.001, C=10)
clf.fit(X_lda, y_train)
y_pred = clf.predict(Z)

print(metrics.accuracy_score(y_test, y_pred))
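
The tag images are not bundled with these notes, so the cells above cannot be run as they stand. As a rough stand-in, the sketch below runs the same split, LDA, SVM sequence on synthetic data. The blob data, the `_demo` variable names, and the use of a pipeline are assumptions made for illustration and are not part of the original talk.

```python
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn import metrics

# synthetic stand-in for the flattened tags: 300 samples, 50 features, 3 classes
X_demo, y_demo = make_blobs(n_samples=300, n_features=50, centers=3, random_state=4)

# stratify keeps the class proportions the same in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=4, stratify=y_demo)

# chaining the steps means LDA is fitted on the training data only
model = make_pipeline(LinearDiscriminantAnalysis(n_components=2),
                      SVC(gamma=0.001, C=10))
model.fit(X_tr, y_tr)

accuracy = metrics.accuracy_score(y_te, model.predict(X_te))
print(accuracy)
```

Wrapping the two steps in a pipeline is one way to make sure the test set never influences the LDA projection; fitting LDA on `X_train` only, as in the cells above, achieves the same thing by hand.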