# Week 9 - Beyond Text

This week, we "transcend" text to explore the analysis of sound and visual content. Trillions of digital audio, image, and video files have been generated by cell phones and distributed sensors, then preserved and shared through social media, the web, and private and government administrations. In this notebook, we read in and visualize audio and image files, process them to extract relevant features and measurements, then begin to explore how to analyze and extract information from them through the same approaches to supervised and unsupervised learning we have performed throughout the quarter with text.

For this notebook we will use the following packages:

In [None]:
#All these packages need to be installed from pip
import scipy #For frequency analysis
import scipy.fftpack
import nltk #the Natural Language Toolkit
import requests #For downloading our datasets
import numpy as np #for arrays
import pandas #gives us DataFrames
import matplotlib.pyplot as plt #For graphics
import seaborn #Makes the graphics look nicer
import IPython #To show stuff

#Image handling install as Pillow
import PIL
import PIL.Image
import PIL.ImageOps

#install as scikit-image, this does the image manipulation
import skimage
from skimage.filters import threshold_otsu
from skimage.segmentation import clear_border
from skimage.measure import label, regionprops
from skimage.morphology import closing, square
from skimage.color import label2rgb
from skimage import data
from skimage.feature import blob_dog, blob_log, blob_doh
from skimage.future import graph
from skimage import data, segmentation, color, filters, io
from skimage.util.colormap import viridis
from skimage.color import rgb2gray

#these three do audio handling
import pydub #Requires ffmpeg to be installed
import speech_recognition #install as speechrecognition
import soundfile #Install as pysoundfile 

#This 'magic' command makes the plots work better
#in the notebook, don't use it outside of a notebook.
#Also you can ignore the warning it may generate.
%matplotlib inline

import os
import os.path
import csv
import re
from math import sqrt

# Audio analysis 

First we will consider media that predates written language...sound and spoken language. Audio (and video) files come in two major categories, lossy or lossless. Lossless files save all information the microphone recorded. Lossy files, by contrast, drop sections humans are unlikely to notice. Recorded frequencies for both types are then typically compressed, which introduces further loss. To work with audio files, we want a format that is preferably lossless or minimally compressed. We will work with `wav` files here. Note that `mp3` is not acceptable. If you do not have `wav` files, we can use python to convert to `wav`.

## <span style="color:red">*Your Turn*</span>

<span style="color:red">Construct cells immediately below this that read in 10 audio files (e.g., produced on your smartphone recorder?) from at least two different speakers, which include sentences of different types (e.g., question, statement, exclamation). At least two of these should include recordings of the two speakers talking to each other (e.g., a simple question/answer). Contrast the frequency distributions of the words spoken within speaker. What speaker's voice has a higher and which has lower frequency? What words are spoken at the highest and lowest frequencies? What parts-of-speech tend to be high or low? How do different types of sentences vary in their frequency differently? When people are speaking to each other, how do their frequencies change? Whose changes more?

In [None]:
samplePath = 'data/audio_samples/SBC060.mp3'
transcriptPath = 'data/audio_samples/SBC060.trn'
wavPath = '{}.wav'.format('.'.join(samplePath.split('.')[:-1]))
IPython.display.Audio(samplePath)

In [None]:
Sanders1Path = 'mydata/Sanders1.m4a'
Sanders2Path = 'mydata/Sanders2.m4a'
Sanders3Path = 'mydata/Sanders3.m4a'
Sanders4Path = 'mydata/Sanders4.m4a'
Trump1Path = 'mydata/Trump1.m4a'
Trump2Path = 'mydata/Trump2.m4a'
Trump3Path = 'mydata/Trump3.m4a'
Trump4Path = 'mydata/Trump4.m4a'
RapPath = 'mydata/Rap.m4a'
Hey_EugenePath = 'mydata/Hey_Eugene.m4a'

Sanders1trnPath = 'mydata/Sanders1.trn'
Sanders2trnPath = 'mydata/Sanders2.trn'
Sanders3trnPath = 'mydata/Sanders3.trn'
Sanders4trnPath = 'mydata/Sanders4.trn'
Trump1trnPath = 'mydata/Trump1.trn'
Trump2trnPath = 'mydata/Trump2.trn'
Trump3trnPath = 'mydata/Trump3.trn'
Trump4trnPath = 'mydata/Trump4.trn'
RaptrnPath = 'mydata/Rap.trn'
Hey_EugenetrnPath = 'mydata/Hey_Eugene.trn'

Sanders1wavPath = '{}.wav'.format('.'.join(Sanders1Path.split('.')[:-1]))
Sanders2wavPath = '{}.wav'.format('.'.join(Sanders2Path.split('.')[:-1]))
Sanders3wavPath = '{}.wav'.format('.'.join(Sanders3Path.split('.')[:-1]))
Sanders4wavPath = '{}.wav'.format('.'.join(Sanders4Path.split('.')[:-1]))
Trump1wavPath = '{}.wav'.format('.'.join(Trump1Path.split('.')[:-1]))
Trump2wavPath = '{}.wav'.format('.'.join(Trump2Path.split('.')[:-1]))
Trump3wavPath = '{}.wav'.format('.'.join(Trump3Path.split('.')[:-1]))
Trump4wavPath = '{}.wav'.format('.'.join(Trump4Path.split('.')[:-1]))
RapwavPath = '{}.wav'.format('.'.join(RapPath.split('.')[:-1]))
Hey_EugenewavPath = '{}.wav'.format('.'.join(Hey_EugenePath.split('.')[:-1]))

In [None]:
# testing
IPython.display.Audio(Trump1Path)

In [None]:
# We are using a different package to convert than in the rest of the code
def convertToWAV(sourceFile, outputFile, overwrite = False):
    if os.path.isfile(outputFile) and not overwrite:
        print("{} exists already".format(outputFile))
        return
    #Naive format extraction
    sourceFormat = sourceFile.split('.')[-1]
    sound = pydub.AudioSegment.from_file(sourceFile, format=sourceFormat)
    sound.export(outputFile, format="wav")
    print("{} created".format(outputFile))

In [None]:
# convert audio files from m4a to wav
convertToWAV(Sanders1Path, Sanders1wavPath)
convertToWAV(Sanders2Path, Sanders2wavPath)
convertToWAV(Sanders3Path, Sanders3wavPath)
convertToWAV(Sanders4Path, Sanders4wavPath)
convertToWAV(Trump1Path, Trump1wavPath)
convertToWAV(Trump2Path, Trump2wavPath)
convertToWAV(Trump3Path, Trump3wavPath)
convertToWAV(Trump4Path, Trump4wavPath)
convertToWAV(RapPath, RapwavPath)
convertToWAV(Hey_EugenePath, Hey_EugenewavPath)

Now that we have created our `wav` files, notice that they are much larger than the compressed source files. We can load them with `soundfile` and work with them as numpy data arrays.

In [None]:
San1soundArr, San1soundRate = soundfile.read(Sanders1wavPath)
San2soundArr, San2soundRate = soundfile.read(Sanders2wavPath)
San3soundArr, San3soundRate = soundfile.read(Sanders3wavPath)
San4soundArr, San4soundRate = soundfile.read(Sanders4wavPath)
Tr1soundArr, Tr1soundRate = soundfile.read(Trump1wavPath)
Tr2soundArr, Tr2soundRate = soundfile.read(Trump2wavPath)
Tr3soundArr, Tr3soundRate = soundfile.read(Trump3wavPath)
Tr4soundArr, Tr4soundRate = soundfile.read(Trump4wavPath)
RapsoundArr, RapsoundRate = soundfile.read(RapwavPath)
HEsoundArr, HEsoundRate = soundfile.read(Hey_EugenewavPath)

In [None]:
Tr1soundArr.shape

This is the raw data as a column array, which contains two channels (Left and Right) of the recording device. Some files, of course, will have more columns. The array comprises a series of numbers that measure the location of the speaker membrane (0=resting location). By quickly and rhythmically changing the location a note can be achieved. The larger the variation from the center, the louder the sound; the faster the oscillations, the higher the pitch. (The center of the oscillations does not have to be 0).
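
To make the link between oscillation, loudness, and pitch concrete, here is a minimal sketch that synthesizes two tones instead of using the recordings. The helper names `make_tone` and `dominant_frequency`, the 8000 Hz sample rate, and the chosen pitches are all illustrative assumptions, not part of the notebook's data.

```python
import numpy as np

def make_tone(freq_hz, amplitude, rate=8000, seconds=1.0):
    """Return a sine wave: `amplitude` sets loudness, `freq_hz` sets pitch."""
    t = np.arange(int(rate * seconds)) / rate
    return amplitude * np.sin(2 * np.pi * freq_hz * t)

def dominant_frequency(signal, rate):
    """Return the frequency (Hz) with the largest magnitude in the spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / rate)
    return freqs[np.argmax(spectrum)]

rate = 8000
quiet_low = make_tone(220, amplitude=0.2, rate=rate)   # small swings, slow oscillation
loud_high = make_tone(880, amplitude=0.8, rate=rate)   # large swings, fast oscillation

print(np.abs(quiet_low).max(), dominant_frequency(quiet_low, rate))
print(np.abs(loud_high).max(), dominant_frequency(loud_high, rate))
```

For a recording with two columns, the same function could be applied to a single channel, for example `dominant_frequency(Tr1soundArr[:, 0], Tr1soundRate)`, assuming the array really is two-dimensional.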

In [None]:
Tr1soundRate

The other piece of information we get is the sample rate. This tells us how many measurements are made per second, which allows us to know how long the entire recording is:

In [None]:
San1numS = San1soundArr.shape[0] // San1soundRate
San2numS = San2soundArr.shape[0] // San2soundRate
San3numS = San3soundArr.shape[0] // San3soundRate
San4numS = San4soundArr.shape[0] // San4soundRate
Tr1numS = Tr1soundArr.shape[0] // Tr1soundRate
Tr2numS = Tr2soundArr.shape[0] // Tr2soundRate
Tr3numS = Tr3soundArr.shape[0] // Tr3soundRate
Tr4numS = Tr4soundArr.shape[0] // Tr4soundRate
RapnumS = RapsoundArr.shape[0] // RapsoundRate
HEnumS = HEsoundArr.shape[0] // HEsoundRate

print("Sanders1 is {} seconds long".format(San1numS))
print("Or {:.2f} minutes".format(San1numS / 60.))
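
The `//` operator in the cell above floors each length to whole seconds. If the fractional part matters, a small helper along these lines keeps it; `duration_seconds` and the sample counts below are hypothetical values chosen only for illustration.

```python
def duration_seconds(num_samples, sample_rate):
    """Exact length in seconds: number of samples divided by samples per second."""
    return num_samples / float(sample_rate)

# Hypothetical values: 1,234,567 samples recorded at 44,100 samples per second
example_seconds = duration_seconds(1234567, 44100)
print("{:.2f} seconds, or {:.2f} minutes".format(example_seconds, example_seconds / 60.))
print("Floored to whole seconds:", 1234567 // 44100)
```

With the notebook's variables this would be called as `duration_seconds(San1soundArr.shape[0], San1soundRate)`.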

Let's look at the first five seconds of the recordings:

In [None]:
plt.plot(San1soundArr[:San1soundRate*5]) # Sanders1

In [None]:
plt.plot(San2soundArr[:San2soundRate*5]) # Sanders2

In [None]:
plt.plot(San3soundArr[:San3soundRate*5]) # Sanders3

In [None]:
plt.plot(San4soundArr[:San4soundRate*5]) # Sanders4

In [None]:
plt.plot(Tr1soundArr[:Tr1soundRate*5]) # Trump1

In [None]:
plt.plot(Tr2soundArr[:Tr2soundRate*5]) # Trump2

In [None]:
plt.plot(Tr3soundArr[:Tr3soundRate*5]) # Trump3

In [None]:
plt.plot(Tr4soundArr[:Tr4soundRate*5]) # Trump4

In [None]:
plt.plot(RapsoundArr[:RapsoundRate*5]) # Rap

In [None]:
plt.plot(HEsoundArr[:HEsoundRate*5]) # Hey Eugene!

## <span style="color:red">*Your Turn*</span>

<span style="color:red">Construct cells immediately below this that use the 10 audio files from at least two different speakers read in previously, attempt to automatically extract the words from Google, and calculate the word-error rate, as described in Chapter 9 from *Jurafsky & Martin*, page 334. How well does it do? Under what circumstances does it perform poorly? 

## Speech-to-Text

We can also do speech recognition on audio, but this requires a complex machine learning system. Luckily there are many online services to do this. We have a function that uses Google's API. There are two APIs: one is free but limited; the other is commercial, and you can provide the function `speechRec` with a file containing the API keys, using `jsonFile=`, if you wish. For more about this look [here](https://stackoverflow.com/questions/38703853/how-to-use-google-speech-recognition-api-in-python) or the `speech_recognition` [docs](https://github.com/Uberi/speech_recognition).

In [None]:
#Using another library so we need to use files again
def speechRec(targetFile, language = "en-US", raw = False, jsonFile = 'data/googleAPIKeys.json'):
    r = speech_recognition.Recognizer()
    if not os.path.isfile(jsonFile):
        jsonString = None
    else:
        with open(jsonFile) as f:
            jsonString = f.read()
    with speech_recognition.AudioFile(targetFile) as source:
        audio = r.record(source)
    try:
        if jsonString is None:
            print("Sending data to Google Speech Recognition")
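            # Pass the function's own arguments through; they were previously ignored
            dat = r.recognize_google(audio, language=language, show_all=raw)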
        else:
            print("Sending data to Google Cloud Speech")
            dat =  r.recognize_google_cloud(audio, credentials_json=jsonString)
    except speech_recognition.UnknownValueError:
        print("Google could not understand audio")
    except speech_recognition.RequestError as e:
        print("Could not request results from Google service; {0}".format(e))
    else:
        print("Success")
        return dat

We now run the recognizer on each of our ten converted `wav` files:

In [None]:
speechRec('mydata/Sanders1.wav')

In [None]:
speechRec('mydata/Sanders2.wav')

In [None]:
speechRec('mydata/Sanders3.wav')

In [None]:
speechRec('mydata/Sanders4.wav')

In [None]:
speechRec('mydata/Trump1.wav')

In [None]:
speechRec('mydata/Trump2.wav')

In [None]:
speechRec('mydata/Trump3.wav')

In [None]:
speechRec('mydata/Trump4.wav')

In [None]:
speechRec('mydata/Rap.wav')

In [None]:
speechRec('mydata/Hey_Eugene.wav')
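
The word-error rate asked for in this exercise can be computed from a reference transcript and the recognizer's output. The sketch below assumes the usual definition (substitutions, deletions, and insertions, each with cost 1, divided by the number of reference words) and simple whitespace tokenization; `word_error_rate` is a hypothetical helper and the sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # dist[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return dist[-1][-1] / float(len(ref))

# Made-up sentences for illustration only (not from the recordings)
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Note that `speechRec` returns `None` when recognition fails, so its result should be checked before being passed in, and punctuation in a transcript would need to be stripped first.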

## <span style="color:red">*Your Turn*</span>

<span style="color:red">Construct cells immediately below this that read in 10 image files (e.g., produced on your smartphone, harvested from the web, etc.) that feature different kinds of objects and settings, including at least one indoor and one outdoor setting. Perform blob detection and RAG segmentation using the approaches modeled above. How well does each algorithm identify objects or features of interest?

# Image analysis

Now we will explore image files. First, we will read in our images:

In [None]:
image1 = PIL.Image.open('mydata/image1.jpg')
image2 = PIL.Image.open('mydata/image2.jpg')
image3 = PIL.Image.open('mydata/image3.jpg')
image4 = PIL.Image.open('mydata/image4.jpg')
image5 = PIL.Image.open('mydata/image5.jpg')
image6 = PIL.Image.open('mydata/image6.jpg')
image7 = PIL.Image.open('mydata/image7.jpg')
image8 = PIL.Image.open('mydata/image8.jpg')
image9 = PIL.Image.open('mydata/image9.jpg')
image10 = PIL.Image.open('mydata/image10.jpg')

imageArr1 = np.asarray(image1)
imageArr2 = np.asarray(image2)
imageArr3 = np.asarray(image3)
imageArr4 = np.asarray(image4)
imageArr5 = np.asarray(image5)
imageArr6 = np.asarray(image6)
imageArr7 = np.asarray(image7)
imageArr8 = np.asarray(image8)
imageArr9 = np.asarray(image9)
imageArr10 = np.asarray(image10)

Each image we have loaded is a raster image, meaning it is a grid of pixels; each pixel contains 1-4 numbers giving the amounts of color in it. In this case there are 3 values per pixel: RGB, or Red, Green, and Blue. If we want to see just one color, we can look at just that channel's array:

In [None]:
plt.imshow(imageArr1[:,:,1], cmap='Greens') #The order is R G B, so 1 is the Green

In [None]:
plt.imshow(imageArr1[:,:,2], cmap='Blues') #The order is R G B, so 2 is the Blue

In [None]:
plt.imshow(imageArr1[:,:,0], cmap='Reds')

In [None]:
plt.imshow(imageArr2[:,:,1], cmap='Greens') #The order is R G B, so 1 is the Green

In [None]:
plt.imshow(imageArr2[:,:,2], cmap='Blues') #The order is R G B, so 2 is the Blue

In [None]:
plt.imshow(imageArr2[:,:,0], cmap='Reds')
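
The channel order is easy to check on a tiny synthetic image where each pixel's color is known. This is only an illustration with a made-up array `tiny`, not one of the loaded photographs.

```python
import numpy as np

# A synthetic 2x2 RGB image (height, width, channel) with values in 0..255
tiny = np.zeros((2, 2, 3), dtype=np.uint8)
tiny[0, 0] = (255, 0, 0)   # pure red pixel
tiny[0, 1] = (0, 255, 0)   # pure green pixel
tiny[1, 0] = (0, 0, 255)   # pure blue pixel

# Index 0 is red, index 1 is green, index 2 is blue
red, green, blue = tiny[:, :, 0], tiny[:, :, 1], tiny[:, :, 2]
print(green)  # nonzero only where the green pixel sits
```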

In [None]:
image_gray1 = PIL.ImageOps.invert(image1.convert('L'))
image_gray2 = PIL.ImageOps.invert(image2.convert('L'))
image_gray3 = PIL.ImageOps.invert(image3.convert('L'))
image_gray4 = PIL.ImageOps.invert(image4.convert('L'))
image_gray5 = PIL.ImageOps.invert(image5.convert('L'))
image_gray6 = PIL.ImageOps.invert(image6.convert('L'))
image_gray7 = PIL.ImageOps.invert(image7.convert('L'))
image_gray8 = PIL.ImageOps.invert(image8.convert('L'))
image_gray9 = PIL.ImageOps.invert(image9.convert('L'))
image_gray10 = PIL.ImageOps.invert(image10.convert('L'))

image_grayArr1 = np.asarray(image_gray1)
image_grayArr2 = np.asarray(image_gray2)
image_grayArr3 = np.asarray(image_gray3)
image_grayArr4 = np.asarray(image_gray4)
image_grayArr5 = np.asarray(image_gray5)
image_grayArr6 = np.asarray(image_gray6)
image_grayArr7 = np.asarray(image_gray7)
image_grayArr8 = np.asarray(image_gray8)
image_grayArr9 = np.asarray(image_gray9)
image_grayArr10 = np.asarray(image_gray10)


A grayscale image is defined by its pixel intensities (and a color image can be defined by its red, green, blue pixel intensities).
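
As a rough illustration of what `convert('L')` followed by `PIL.ImageOps.invert` does to pixel intensities, the sketch below applies the ITU-R 601-2 luma weights that Pillow documents for the `'L'` mode and then flips the scale. The function `to_inverted_gray` is a hypothetical stand-in; Pillow's own integer rounding may differ by one intensity level.

```python
import numpy as np

def to_inverted_gray(rgb):
    """Approximate PIL's convert('L') followed by ImageOps.invert on an RGB array."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # ITU-R 601-2 luma weights: green contributes most to perceived brightness
    luma = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    gray = np.clip(np.rint(luma), 0, 255).astype(np.uint8)
    return 255 - gray  # inversion: dark becomes bright and vice versa

# One row of three pixels: white, black, pure green
pixels = np.array([[[255, 255, 255], [0, 0, 0], [0, 255, 0]]], dtype=np.uint8)
print(to_inverted_gray(pixels))
```

The inversion matters for the blob detectors used later, which look for bright blobs on dark backgrounds.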

In [None]:
plt.imshow(image_gray1)

In [None]:
plt.imshow(image_gray2)

## Blob Detection

Recall our earlier use of scikit-learn for machine learning. Now we will use scikit-image to do some simple image processing. Here we will perform three operations for 'blob', or simple object, detection. In computer vision, blob detection methods aim to detect regions in a digital image that differ in properties, such as brightness or color, compared to surrounding regions. Informally, a blob is a region of an image in which some properties are approximately constant or similar to each other. We will do this in three ways.

First, we will take the Laplacian of an image, which is a 2-D isotropic (applying equally well in all directions) measure of the 2nd spatial derivative of an image. The Laplacian of an image highlights regions of rapid intensity change and is therefore often used for edge detection. This Laplacian is taken of the image once a Gaussian smoothing filter has been applied in order to reduce its sensitivity to noise.

The Laplacian $L(x,y)$ of an image with pixel intensity values $I(x,y)$ is given by: $L(x,y)=\frac{\partial^2 I}{\partial x^2} + \frac{\partial^2 I}{\partial y^2}$. A Gaussian smoothing filter takes a 2-dimensional Gaussian, $G(x,y)=\frac{1}{2 \pi \sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}}$, which looks like: <img src="http://www.librow.com/content/common/images/articles/article-9/2d_distribution.gif">

This Gaussian *kernel* is applied to the pixel intensities of the image via *convolution* -- the kernel is multiplied by the pixel intensities, while centered on each pixel, then added.
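
The multiply-and-add step can be written out directly. The following is a deliberately slow, plain-numpy sketch for illustration; `gaussian_kernel` and `convolve_same` are hypothetical helpers (scikit-image uses optimized filters instead), and because the Gaussian kernel is symmetric no kernel flip is needed.

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """Sample G(x, y) on a square grid and normalise so the weights sum to 1."""
    if radius is None:
        radius = int(3 * sigma)  # covers almost all of the Gaussian's mass
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def convolve_same(image, kernel):
    """Centre the kernel on every pixel, multiply, and add (edges are zero-padded)."""
    r = kernel.shape[0] // 2
    padded = np.pad(image.astype(np.float64), r, mode="constant")
    out = np.zeros(image.shape, dtype=np.float64)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + kernel.shape[0], j:j + kernel.shape[1]]
            out[i, j] = np.sum(window * kernel)
    return out

spike = np.zeros((9, 9))
spike[4, 4] = 1.0                      # a single bright pixel
blurred = convolve_same(spike, gaussian_kernel(sigma=1.0))
print(blurred[4, 2:7].round(3))        # intensity now spreads to the neighbours
```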

The blob detector computes the Laplacian of Gaussian (LoG) images with successively increasing standard deviation and stacks them up in a cube. Blobs are local maxima within this cube. Detecting larger blobs is slower because of larger kernel sizes during convolution. Bright blobs on dark backgrounds are detected.

In [None]:
blobs_log1 = blob_log(image_grayArr1, max_sigma=30, num_sigma=10, threshold=.1)
blobs_log1[:, 2] = blobs_log1[:, 2] * sqrt(2) #Radii
fig, ax = plt.subplots()

plt.imshow(image_gray1, interpolation='nearest')
for blob in blobs_log1:
    y, x, r = blob
    c = plt.Circle((x, y), r, linewidth=2, fill=False)
    ax.add_patch(c)

In [None]:
blobs_log2 = blob_log(image_grayArr2, max_sigma=30, num_sigma=10, threshold=.1)
blobs_log2[:, 2] = blobs_log2[:, 2] * sqrt(2) #Radii
fig, ax = plt.subplots()

plt.imshow(image_gray2, interpolation='nearest')
for blob in blobs_log2:
    y, x, r = blob
    c = plt.Circle((x, y), r, linewidth=2, fill=False)
    ax.add_patch(c)

In [None]:
blobs_log3 = blob_log(image_grayArr3, max_sigma=30, num_sigma=10, threshold=.1)
blobs_log3[:, 2] = blobs_log3[:, 2] * sqrt(2) #Radii
fig, ax = plt.subplots()

plt.imshow(image_gray3, interpolation='nearest')
for blob in blobs_log3:
    y, x, r = blob
    c = plt.Circle((x, y), r, linewidth=2, fill=False)
    ax.add_patch(c)
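
The `sqrt(2)` factor in the cells above converts the detected standard deviation into an approximate blob radius. A small numerical check of why: for an idealized bright disk, the scale-normalized LoG response at the center is strongest when sigma is about the radius divided by `sqrt(2)`. The sketch below uses the closed-form LoG kernel on a synthetic disk; `log_response_at_centre` and all parameter values are illustrative assumptions.

```python
import numpy as np

def log_response_at_centre(radius, sigma, half_width=40):
    """Scale-normalised LoG response at the centre of a bright disk."""
    ax = np.arange(-half_width, half_width + 1)
    xx, yy = np.meshgrid(ax, ax)
    rho2 = xx ** 2 + yy ** 2
    disk = (rho2 <= radius ** 2).astype(np.float64)
    # Closed-form Laplacian of a 2-D Gaussian, multiplied by sigma**2
    # so that responses at different scales are comparable
    log_kernel = (-(1.0 / (np.pi * sigma ** 4))
                  * (1.0 - rho2 / (2.0 * sigma ** 2))
                  * np.exp(-rho2 / (2.0 * sigma ** 2)))
    return sigma ** 2 * np.sum(disk * log_kernel)

radius = 10
sigmas = np.arange(3.0, 15.0, 0.5)
strength = [abs(log_response_at_centre(radius, s)) for s in sigmas]
best_sigma = sigmas[int(np.argmax(strength))]
print(best_sigma, best_sigma * np.sqrt(2))  # sigma * sqrt(2) is close to the disk radius
```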

Second, we look at Difference of Gaussian (DoG), a much faster approximation of the LoG approach in which an image is blurred with increasing standard deviations and the differences between successively blurred images are stacked up in a cube.
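
To see why a difference of two Gaussians can stand in for the LoG, one can compare the two kernels numerically. The sketch below relies on the identity that the derivative of a Gaussian with respect to sigma equals sigma times its Laplacian, so $G(k\sigma) - G(\sigma) \approx (k-1)\sigma^2 \nabla^2 G$ for `k` near 1. The values of `sigma` and `k`, and the helper `gaussian_2d`, are illustrative choices only.

```python
import numpy as np

def gaussian_2d(sigma, half_width=30):
    """Return a sampled 2-D Gaussian and the squared distance from its centre."""
    ax = np.arange(-half_width, half_width + 1)
    xx, yy = np.meshgrid(ax, ax)
    rho2 = xx ** 2 + yy ** 2
    return np.exp(-rho2 / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2), rho2

sigma, k = 4.0, 1.1   # k is the ratio between successive blur levels (illustrative)
g_small, rho2 = gaussian_2d(sigma)
g_large, _ = gaussian_2d(k * sigma)
dog = g_large - g_small               # Difference of Gaussians

# Laplacian of the Gaussian at the same sigma, scaled by (k - 1) * sigma**2
log_kernel = (rho2 / sigma ** 4 - 2.0 / sigma ** 2) * g_small
approx = (k - 1.0) * sigma ** 2 * log_kernel

similarity = np.corrcoef(dog.ravel(), approx.ravel())[0, 1]
print(round(similarity, 3))           # close to 1 when the two kernels agree in shape
```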

In [None]:
blobs_dog1 = blob_dog(image_grayArr1, max_sigma=30, threshold=.1)
blobs_dog1[:, 2] = blobs_dog1[:, 2] * sqrt(2)
fig, ax = plt.subplots()

plt.imshow(image_gray1, interpolation='nearest')
for blob in blobs_dog1:
    y, x, r = blob
    c = plt.Circle((x, y), r, linewidth=2, fill=False)
    ax.add_patch(c)

In [None]:
blobs_dog2 = blob_dog(image_grayArr2, max_sigma=30, threshold=.1)
blobs_dog2[:, 2] = blobs_dog2[:, 2] * sqrt(2)
fig, ax = plt.subplots()

plt.imshow(image_gray2, interpolation='nearest')
for blob in blobs_dog2:
    y, x, r = blob
    c = plt.Circle((x, y), r, linewidth=2, fill=False)
    ax.add_patch(c)

In [None]:
blobs_dog3 = blob_dog(image_grayArr3, max_sigma=30, threshold=.1)
blobs_dog3[:, 2] = blobs_dog3[:, 2] * sqrt(2)
fig, ax = plt.subplots()

plt.imshow(image_gray3, interpolation='nearest')
for blob in blobs_dog3:
    y, x, r = blob
    c = plt.Circle((x, y), r, linewidth=2, fill=False)
    ax.add_patch(c)

Finally, we consider the Determinant of Hessian (DoH) approach. The Hessian matrix or Hessian is a square matrix of second-order partial derivatives $\frac{\partial^2 f}{\partial x_i \partial x_j}(x_1^{*}, \ldots, x_n^{*})$ and is calculated on square pixel patches of the image. The determinant is the scaling factor of each patch. This approach is fastest and detects blobs by finding maxima in the matrix of determinants of the Hessian of the image. Detection speed is independent of the size of blobs as the implementation uses box filters, $\begin{bmatrix}1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1\end{bmatrix}$, instead of Gaussians for the convolution. As a result, small blobs (< 3 pixels) cannot be detected accurately.
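
As a small illustration of the quantity involved, the sketch below computes the determinant of the Hessian on a synthetic smooth blob and locates its maximum. It uses plain finite differences via `np.gradient` rather than the box filters that `blob_doh` relies on, so it shows the idea, not the library's implementation; the blob size and grid are made-up values.

```python
import numpy as np

half_width, s = 20, 5.0
ax = np.arange(-half_width, half_width + 1)
xx, yy = np.meshgrid(ax, ax)
bump = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * s ** 2))  # one smooth bright blob

# Second-order partial derivatives by repeated finite differences
d_row, d_col = np.gradient(bump)
d_row_row, d_row_col = np.gradient(d_row)
_, d_col_col = np.gradient(d_col)

# Determinant of the 2x2 Hessian at every pixel
det_hessian = d_row_row * d_col_col - d_row_col ** 2
peak = np.unravel_index(np.argmax(det_hessian), det_hessian.shape)
print(peak)  # index of the strongest response
```

The strongest response falls at the center of the blob, which is how maxima of this determinant mark blob locations.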

In [None]:
blobs_doh1 = blob_doh(image_gray1, max_sigma=30, threshold=.01)
fig, ax = plt.subplots()

plt.imshow(image_gray1, interpolation='nearest')
for blob in blobs_doh1:
    y, x, r = blob
    c = plt.Circle((x, y), r, linewidth=2, fill=False)
    ax.add_patch(c)

In [None]:
blobs_doh2 = blob_doh(image_gray2, max_sigma=30, threshold=.01)
fig, ax = plt.subplots()

plt.imshow(image_gray2, interpolation='nearest')
for blob in blobs_doh2:
    y, x, r = blob
    c = plt.Circle((x, y), r, linewidth=2, fill=False)
    ax.add_patch(c)

In [None]:
blobs_doh3 = blob_doh(image_gray3, max_sigma=30, threshold=.01)
fig, ax = plt.subplots()

plt.imshow(image_gray3, interpolation='nearest')
for blob in blobs_doh3:
    y, x, r = blob
    c = plt.Circle((x, y), r, linewidth=2, fill=False)
    ax.add_patch(c)

Humans possess an incredible ability to identify objects in an image. Segmentation is the process of dividing an image into meaningful regions. All pixels belonging to a region should receive a unique label in an ideal segmentation.

Region Adjacency Graphs (RAGs) are a common data structure for many segmentation algorithms. First, we define regions through the SLIC algorithm, which assigns a unique label to each region, a localized cluster of pixels sharing some similar property (e.g., color or grayscale intensity). Then we'll consider each region a node in a graph, and construct a region boundary RAG, where the edge weight between two regions is the average value of the corresponding pixels in an edge map (`edge_map`) along their shared boundary. Edges whose weight exceeds a specified threshold can then be removed, so that each remaining connected component is labeled as one region.
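
The data structure itself is simple enough to sketch by hand. Below, a toy label array and edge map (both made up) are turned into a boundary RAG stored as a dictionary, and regions separated only by a weak boundary are merged. `boundary_rag` and `merge_weak_boundaries` are hypothetical helpers written for this illustration; they are not the scikit-image functions and simplify how boundary pixels are averaged.

```python
import numpy as np

def boundary_rag(labels, edge_map):
    """Map each pair of touching regions to the mean edge strength on their border."""
    sums, counts = {}, {}
    rows, cols = labels.shape
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):          # right and down neighbours
                r2, c2 = r + dr, c + dc
                if r2 < rows and c2 < cols and labels[r, c] != labels[r2, c2]:
                    pair = tuple(sorted((int(labels[r, c]), int(labels[r2, c2]))))
                    strength = (edge_map[r, c] + edge_map[r2, c2]) / 2.0
                    sums[pair] = sums.get(pair, 0.0) + strength
                    counts[pair] = counts.get(pair, 0) + 1
    return {pair: sums[pair] / counts[pair] for pair in sums}

def merge_weak_boundaries(labels, rag, threshold):
    """Join regions whose shared boundary is weaker than `threshold` (union-find)."""
    parent = {int(l): int(l) for l in np.unique(labels)}
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for (a, b), weight in rag.items():
        if weight < threshold:
            parent[find(a)] = find(b)
    return np.vectorize(lambda l: find(int(l)))(labels)

# Toy data: three vertical strips; only the border between strips 2 and 3 is strong
labels = np.array([[1, 1, 2, 2, 3, 3]] * 3)
edge_map = np.array([[0.0, 0.1, 0.1, 0.9, 0.9, 0.0]] * 3)
rag = boundary_rag(labels, edge_map)
merged = merge_weak_boundaries(labels, rag, threshold=0.5)
print(rag)
print(np.unique(merged))
```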

In [None]:
labels1 = segmentation.slic(image_gray1, compactness=30, n_segments=100)
edges1 = filters.sobel(image_grayArr1)
edges_rgb1 = color.gray2rgb(edges1)
g1 = graph.rag_boundary(labels1, edges1)
out1 = graph.draw_rag(labels1, g1, edges_rgb1, node_color="#999999", colormap=viridis)

labels2 = segmentation.slic(image_gray2, compactness=30, n_segments=100)
edges2 = filters.sobel(image_grayArr2)
edges_rgb2 = color.gray2rgb(edges2)
g2 = graph.rag_boundary(labels2, edges2)
out2 = graph.draw_rag(labels2, g2, edges_rgb2, node_color="#999999", colormap=viridis)

labels3 = segmentation.slic(image_gray3, compactness=30, n_segments=100)
edges3 = filters.sobel(image_grayArr3)
edges_rgb3 = color.gray2rgb(edges3)
g3 = graph.rag_boundary(labels3, edges3)
out3 = graph.draw_rag(labels3, g3, edges_rgb3, node_color="#999999", colormap=viridis)

labels4 = segmentation.slic(image_gray4, compactness=30, n_segments=100)
edges4 = filters.sobel(image_grayArr4)
edges_rgb4 = color.gray2rgb(edges4)
g4 = graph.rag_boundary(labels4, edges4)
out4 = graph.draw_rag(labels4, g4, edges_rgb4, node_color="#999999", colormap=viridis)

labels5 = segmentation.slic(image_gray5, compactness=30, n_segments=100)
edges5 = filters.sobel(image_grayArr5)
edges_rgb5 = color.gray2rgb(edges5)
g5 = graph.rag_boundary(labels5, edges5)
out5 = graph.draw_rag(labels5, g5, edges_rgb5, node_color="#999999", colormap=viridis)

labels6 = segmentation.slic(image_gray6, compactness=30, n_segments=100)
edges6 = filters.sobel(image_grayArr6)
edges_rgb6 = color.gray2rgb(edges6)
g6 = graph.rag_boundary(labels6, edges6)
out6 = graph.draw_rag(labels6, g6, edges_rgb6, node_color="#999999", colormap=viridis)

labels7 = segmentation.slic(image_gray7, compactness=30, n_segments=100)
edges7 = filters.sobel(image_grayArr7)
edges_rgb7 = color.gray2rgb(edges7)
g7 = graph.rag_boundary(labels7, edges7)
out7 = graph.draw_rag(labels7, g7, edges_rgb7, node_color="#999999", colormap=viridis)

labels8 = segmentation.slic(image_gray8, compactness=30, n_segments=100)
edges8 = filters.sobel(image_grayArr8)
edges_rgb8 = color.gray2rgb(edges8)
g8 = graph.rag_boundary(labels8, edges8)
out8 = graph.draw_rag(labels8, g8, edges_rgb8, node_color="#999999", colormap=viridis)

labels9 = segmentation.slic(image_gray9, compactness=30, n_segments=100)
edges9 = filters.sobel(image_grayArr9)
edges_rgb9 = color.gray2rgb(edges9)
g9 = graph.rag_boundary(labels9, edges9)
out9 = graph.draw_rag(labels9, g9, edges_rgb9, node_color="#999999", colormap=viridis)

labels10 = segmentation.slic(image_gray10, compactness=30, n_segments=100)
edges10 = filters.sobel(image_grayArr10)
edges_rgb10 = color.gray2rgb(edges10)
g10 = graph.rag_boundary(labels10, edges10)
out10 = graph.draw_rag(labels10, g10, edges_rgb10, node_color="#999999", colormap=viridis)

In [None]:
io.imshow(out1)
io.show()

In [None]:
io.imshow(out2)
io.show()

In [None]:
io.imshow(out3)
io.show()

In [None]:
io.imshow(out4)
io.show()

In [None]:
io.imshow(out5)
io.show()

In [None]:
io.imshow(out6)
io.show()

In [None]:
io.imshow(out7)
io.show()

In [None]:
io.imshow(out8)
io.show()

In [None]:
io.imshow(out9)
io.show()

In [None]:
io.imshow(out10)
io.show()

## <span style="color:red">*Your Turn*</span>

<span style="color:red">Construct cells immediately below this that report the results from experiments in which you place each of the images taken or retrieved for the last exercise through the online demos for [caffe](http://demo.caffe.berkeleyvision.org) and [places](http://places.csail.mit.edu/demo.html). Paste the image and the output for both object detector and scene classifier below, beside one another. Calculate precision and recall for caffe's ability to detect objects of interest across your images. What do you think about Places' scene categories and their assignments to your images? What would be improved labels for your images? Could you use image classification to enhance your research project and, if so, how?

# Object Detection & Scene Classification

Modern image and video analysis is typically performed using deep learning implemented as layers of convolutional neural nets to classify scenes and to detect and label objects. To learn more about deep learning and convolutional neural networks, spend some time with Andrew Ng's excellent [tutorial](http://ufldl.stanford.edu/tutorial/). Because such algorithms require substantial computing power, none of the high-quality classifiers or detectors currently available are implemented in Python, although many can be called via API. The most popular open source image object detector is UC Berkeley's [*caffe*](http://caffe.berkeleyvision.org) library of trained and trainable neural nets written in C++. (Check out the [Python API](https://github.com/BVLC/caffe/blob/master/python/caffe/pycaffe.py).) Scene classifiers can be built on top of caffe, such as MIT's [Places](http://places.csail.mit.edu).

## Caffe classifications
The following images are screenshots of Caffe classifications of my six images.

In [None]:
IPython.display.Image('mydata/caffe1.png')

In [None]:
IPython.display.Image('mydata/caffe2.png')

In [None]:
IPython.display.Image('mydata/caffe3.png')

In [None]:
IPython.display.Image('mydata/caffe4.png')

In [None]:
IPython.display.Image('mydata/caffe5.png')

In [None]:
IPython.display.Image('mydata/caffe6.png')

In [None]:
IPython.display.Image('mydata/caffe7.png')

In [None]:
IPython.display.Image('mydata/caffe8.png')

In [None]:
IPython.display.Image('mydata/caffe9.png')

In [None]:
IPython.display.Image('mydata/caffe10.png')

In [None]:
IPython.display.Image('mydata/caffe11.png')

In [None]:
IPython.display.Image('mydata/caffe12.png')

## Places
The following images are screenshots of Places classifications of my six images.

In [None]:
IPython.display.Image('mydata/place1.png')

In [None]:
IPython.display.Image('mydata/place2.png')

In [None]:
IPython.display.Image('mydata/place3.png')

In [None]:
IPython.display.Image('mydata/place4.png')

In [None]:
IPython.display.Image('mydata/place5.png')

In [None]:
IPython.display.Image('mydata/place6.png')

## <span style="color:red">*Your Turn*</span>

Above, I have imported screenshots of the caffe and Places classification results for the same six images I used previously. For caffe, I would say that the algorithm's classification is accurate for 50% of the images: specifically, the second, third, and fourth images. The algorithm classifies the first image as 'bearskin', about which I have no clue.

Although there is not a direct way to incorporate image classification technology into my current project, it may be interesting to compare the cover pages or photographs of the same subject used by different newspapers.
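
For the precision and recall requested in this exercise, a tally along the following lines could be used once the predicted and true labels for each image are written down. The helper `precision_recall` is hypothetical, and the labels below are placeholders for illustration; they are not read from the screenshots.

```python
def precision_recall(predicted, actual):
    """Precision and recall for one image, comparing predicted and true label sets."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Precision: share of predicted labels that are correct
    precision = true_positives / float(len(predicted)) if predicted else 0.0
    # Recall: share of true labels that were found
    recall = true_positives / float(len(actual)) if actual else 0.0
    return precision, recall

# Hypothetical labels for illustration; not taken from the screenshots above
predicted = ["dog", "ball", "bearskin"]
actual = ["dog", "ball", "person", "tree"]
print(precision_recall(predicted, actual))
```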