In [3]:
from IPython.core.display import HTML

def css_styling():
    styles = open("../Data/www/styles/custom.css", "r").read()
    return HTML(styles)
css_styling()

# Image manipulation

There is an extensive amount of data that is stored in images and is available for analysis. On the web, images are everywhere and being able to algorithmically filter them (say for a search engine or to identify infringement) is an essential task. Scientifically, many studies rely on visual images to ascertain the presence or absence of some behavior (remember, a video is really just a series of images in time!).

To start we're going to work on the basics of what an image is, how to read it into code, and how to manipulate it in Python.

# What is an image?

Seems like a silly question, right? When you open an image on your computer, you simply see an image. But if I inspect the file with the terminal, I actually get something like this:
    
    head -n10 ../Data/Picasso/1907-Self-Portrait.-13.jpg
    
    ����JFIF��;CREATOR: gd-jpeg v1.0 (using IJG JPEG v62), quality = 80
    ��C
    
    �B�ݲj"ݽKj��.\�k�9��9!#��I�#���D��pBG'���Չ�I>!�!�DKc�p�v��I�2\&�i��%���H2dۑi���'�ی���	o0=Ndc]�ӑ���E�����n6��%�3���oL��&��?.���xuDd�����0�c�,}Z�ݛ8�7�2H2�i�*d	ldC,c$c'Ӎ_{��Be�.	pr���Ɔ�!�	�p�r�/�D,�5	
What we see here is that the image file is just a sequence of bytes, like every other file on the computer. The start of the file records the file type (`JFIF`, the usual container for JPEG data), how the image was compressed, and the quality setting. Another program needs these pieces of information in order to render the image from the raw data. After that comes the compressed image data itself. Some of those bytes happen to correspond to printable ASCII characters, while the rest do not decode to valid text, which is why the terminal shows � placeholders.
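
As a small illustration of how a program can recognize a file type from its first bytes, here is a minimal sketch. It relies on the fact that JPEG files begin with the two bytes `FF D8`. The helper name `looks_like_jpeg` and the sample header bytes are made up for this example.

```python
JPEG_SIGNATURE = b"\xff\xd8"  # JPEG files start with these two bytes

def looks_like_jpeg(first_bytes):
    """Hypothetical helper: True if the bytes start with the JPEG signature."""
    return first_bytes.startswith(JPEG_SIGNATURE)

# Made-up header bytes for illustration. For a real file you could read them with
#     with open(some_path, "rb") as f: first_bytes = f.read(16)
sample_header = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"

print(looks_like_jpeg(sample_header))
print(looks_like_jpeg(b"plain text"))
```

Opening the file in binary mode (`"rb"`) matters here, since the contents are not text.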

Fortunately, we don't have to learn how image information is stored, since `pylab` has a simple function, `imread`, that will read an image for us.

In [1]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import glob
import math
from pylab import imread

In [2]:
picasso_img = imread('../Data/Picasso/1907-Self-Portrait.-13.jpg')
picasso_img

array([[[151,  74,   6],
        [169,  92,  24],
        [183, 106,  38],
        ..., 
        [161, 119,   0],
        [162, 127,   1],
        [255, 229,  96]],

       [[176,  99,  31],
        [177, 100,  32],
        [177, 100,  32],
        ..., 
        [164, 120,   0],
        [163, 125,   0],
        [255, 228,  95]],

       [[187, 110,  42],
        [174,  97,  29],
        [167,  90,  22],
        ..., 
        [168, 119,   0],
        [164, 123,   0],
        [255, 225,  94]],

       ..., 
       [[176, 169, 140],
        [174, 167, 138],
        [176, 169, 140],
        ..., 
        [158, 158, 150],
        [159, 159, 151],
        [248, 248, 240]],

       [[166, 161, 132],
        [165, 160, 131],
        [167, 162, 133],
        ..., 
        [198, 198, 190],
        [178, 178, 170],
        [255, 255, 247]],

       [[163, 159, 132],
        [161, 157, 130],
        [163, 159, 132],
        ..., 
        [194, 194, 184],
        [166, 166, 156],
        [239, 239,

## Images are actually arrays (to the computer)!

An image is actually just an array once it's been decompressed. A digital image is essentially a 2-dimensional grid of values, where each element of the grid corresponds to a pixel (so the more megapixels your camera has, the bigger the grid!). In a color image, each pixel holds three values: one each for red, green, and blue.

The computer interprets these values to color the grid of pixels.
`imread` is one of the ways we can load images into Python.

In [None]:
#imread loads the image

picasso_img = imread('../Data/Picasso/1907-Self-Portrait.-13.jpg')

# imshow is a matplotlib command used to draw images.
plt.imshow(picasso_img)
plt.show()

Let's examine the image object, not as matplotlib displays it, but as Python sees it.

In [None]:
print('type is:', type(picasso_img))
print()
print(picasso_img)

NumPy is a Python package that is ubiquitous in Python's scientific, machine learning, and numerical code because it provides a way to store and manipulate large amounts of numerical data very quickly.

## NumPy Arrays are useful for calculations

You can make your own NumPy arrays by passing a list (or list of lists) to the `np.array` function.

In [None]:
odds = [1, 3, 5, 7, 9, 11, 13, 15]
odds = np.array(odds)
# evens = ?

When math is done on an array, it is applied to every element at once, so you don't have to write a loop over the elements.

In [None]:
# Numpy
evens = odds + 1
other_evens = 2 * odds

print(evens)
print(other_evens)
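
To make the contrast concrete, here is a small sketch that does the same calculation twice: once with a plain Python loop over a list, and once as a single NumPy expression. The variable names are made up for this example.

```python
import numpy as np

odds_list = [1, 3, 5, 7, 9]

# Plain Python: visit each element explicitly
evens_loop = [n + 1 for n in odds_list]

# NumPy: one expression is applied to every element
evens_vectorized = np.array(odds_list) + 1

print(evens_loop)
print(evens_vectorized)
```

Both give the same numbers. The NumPy version is shorter and, for large arrays such as images, usually much faster.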

We have a whole extra notebook on [Numpy](../Additional-Modules/Numpy.ipynb). In this notebook you can even learn how to program [Conway's Game of Life](https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life)!

## Let's examine the image array a bit more

NumPy arrays have a large collection of built-in methods
and two main attributes: `shape` and `dtype`.

In [None]:
picasso_img.min()
#picasso_img.max()

In [None]:
# Numpy array has built in attributes

In [None]:
#picasso_img.dtype
picasso_img.shape
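
If you don't have the image at hand, the same attributes and methods can be tried on a tiny array. This sketch uses a made-up 2x3 "image", so the numbers are for illustration only.

```python
import numpy as np

# A made-up color image: 2 rows, 3 columns, 3 color values per pixel
tiny_img = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]],
                     [[10, 20, 30], [40, 50, 60], [70, 80, 90]]],
                    dtype=np.uint8)

print(tiny_img.shape)   # (rows, columns, color channels)
print(tiny_img.dtype)   # uint8 stores whole numbers from 0 to 255
print(tiny_img.min(), tiny_img.max())
```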

## RGB Color Explanation

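The [RGB](https://en.wikipedia.org/wiki/RGB_color_model) color model is the usual trick for storing colors in a computer: each pixel is described by three numbers giving the intensity of red, green, and blue light. In the images used here each number is a whole number between 0 and 255, and other colors are made by mixing the three.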

In [None]:
musicians = imread('../Data/Picasso/1921-Three_Musicians.-25.jpg')

print('split representation:')

fig, ax = plt.subplots(nrows=1, ncols=3)
cmaps = [cm.Reds, cm.Greens, cm.Blues]
labels = ['red', 'green', 'blue']

for i in range(3):
    ax[i].imshow(musicians[:,:,i], cmap=cmaps[i])
    ax[i].set_title(labels[i])
    
plt.tight_layout()
plt.show()

print('combined representation:')
plt.imshow(musicians)
plt.show()

In [None]:
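# this code creates a patch of color with the specified r, g, b values
# (each value should be a whole number between 0 and 255)
r = 50
g = 0
b = 0

# make a 3D numpy array filled with ones, stored as 8-bit integers so that
# imshow reads the values on the 0-255 scale (floats are read on a 0-1 scale)
color_patch = np.ones(shape=(300, 300, 3), dtype=np.uint8)
color_patch[:,:,0] *= r   # red
color_patch[:,:,1] *= g   # green
color_patch[:,:,2] *= b   # blue

plt.imshow(color_patch) # show color
plt.show()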
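
One thing to keep in mind: matplotlib's `imshow` reads integer color arrays on a 0-255 scale and floating-point color arrays on a 0-1 scale. Here is a minimal sketch of moving between the two scales. The color values are arbitrary and the variable names are made up.

```python
import numpy as np

# A 2x2 patch of one color on the 0-255 integer scale
patch_uint8 = np.zeros((2, 2, 3), dtype=np.uint8)
patch_uint8[:, :, 1] = 200   # green
patch_uint8[:, :, 2] = 100   # blue

# The same color on the 0-1 float scale
patch_float = patch_uint8 / 255

print(patch_uint8[0, 0])
print(patch_float[0, 0])
```

Either array describes the same color. What matters is that the data type and the scale agree.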

### Exercise

In [None]:
#By modifying this code
# (each of r, g, b should be a whole number between 0 and 255)

r = 255
g = 0
b = 0

# make a 3D numpy array filled with ones, stored as 8-bit integers
color_patch = np.ones(shape=(300, 300, 3), dtype=np.uint8)
color_patch[:,:,0] *= r   # red
color_patch[:,:,1] *= g   # green
color_patch[:,:,2] *= b   # blue

plt.imshow(color_patch) # show color
plt.show()

#1. find r,g,b values that will give you yellow

#2. find r,g,b values that will give you black

#3. find r,g,b values that will give you purple

## Array Dimensions


A NumPy array can have any number of dimensions.

Black and white images have 2 dimensions. Color images have 3: row (y), column (x), and color (red, green, blue).

You can tell how many dimensions your data has by looking at the array's `shape` attribute.

In [None]:
picasso_img.shape

You can pull out any single value by giving its address as three coordinates: row, column, and color.

In [None]:
picasso_img[0, 10, 1]

### Exercise:

Using the picasso_img array, what are the largest/smallest values you can enter as an 'address' without getting an error? 

(change around the numbers and see what happens)

In [None]:
picasso_img[0, 27, 2]

## Slicing an Array

In [None]:
a = np.arange(21, 29)

print('full array:')
print(a)

print()

print('array slice:')
print(a[::2])
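
A slice is written as `start:stop:step`, and any of the three parts can be left out. This short sketch shows a few combinations on the same kind of array.

```python
import numpy as np

a = np.arange(21, 29)   # the numbers 21 through 28

print(a[2:5])    # positions 2, 3 and 4 (the stop position is excluded)
print(a[:3])     # start omitted: begin at position 0
print(a[5:])     # stop omitted: run to the end
print(a[1::3])   # every third element, starting at position 1
```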

### Exercises

In [None]:
a = np.array(['hey', 'you', 'guys', '!'])

# 1. grab just the word 'you'


# write code here

In [None]:
# 2. make an array with just ['you', 'guys', '!']

# write code here

In [None]:
# 3. create an array with the reverse order

# write code here

### Slicing N-Dimensional Data:

In [None]:
array_slice = picasso_img[50:200]

print('original shape:', picasso_img.shape)
print('slice shape:', array_slice.shape)
plt.imshow(array_slice)
plt.show()

# array_slice = picasso_img[:, 300:500]
# print('original shape:', picasso_img.shape)
# print('slice shape:', array_slice.shape)
# plt.imshow(array_slice)
# plt.show()

We can also slice off a lower dimensional portion of the array.

In [None]:
array_slice = picasso_img[0]
#array_slice = picasso_img[:, 0]
#array_slice = picasso_img[:,:, 0]

print(array_slice)
print('shape:', array_slice.shape)
plt.imshow(array_slice, cmap='Greys')
plt.show()

We can use slicing to change some portion of the data.

In [None]:
print('no red')
no_red = imread('../Data/Picasso/1921-Three_Musicians.-25.jpg')
no_red[:,:,0] = 0           # set the entire red part of the array to 0
plt.imshow(no_red)
plt.show()

print('no green')
no_green = imread('../Data/Picasso/1921-Three_Musicians.-25.jpg')
no_green[:,:,1] = 0          # set the entire green part of the array to 0
plt.imshow(no_green)
plt.show()

print('no blue')
no_blue = imread('../Data/Picasso/1921-Three_Musicians.-25.jpg')
no_blue[:,:,2] = 0       # set the entire blue part of the array to 0
plt.imshow(no_blue)
plt.show()
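
The cell above reads the file again for each version so that every change starts from a fresh array. An alternative worth knowing about is `.copy()`. A slice of a NumPy array is a view onto the same data, so writing into it changes the original, while a copy is independent. Here is a minimal sketch with a made-up gray image.

```python
import numpy as np

original = np.full((2, 2, 3), 100, dtype=np.uint8)  # made-up gray image

view = original[:, :, 0]         # a slice is a view onto the same data
working_copy = original.copy()   # an independent copy

working_copy[:, :, 0] = 0        # changes only the copy
print(original[0, 0])            # the original is untouched

view[:] = 0                      # writes through to the original
print(original[0, 0])            # the red value is now 0
```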

### Exercises

In [None]:
#Q1. reverse the x axis on the image

flipped_image = imread('../Data/Picasso/1905-Harlequin_Sitting_on_a_Red__Couch.-12.jpg')


# add code here
# flipped_image = flipped_image
# 

plt.imshow(flipped_image)
plt.show()

In [None]:
#Q2. make an image with only the rightmost musician
# (ie. slice the x and y axis so other musicians are cropped out)

just_bearded_guy = imread('../Data/Picasso/1921-Three_Musicians.-25.jpg')

# add code here
# just_bearded_guy = just_bearded_guy

plt.imshow(just_bearded_guy)
plt.show()

In [None]:
#Q3. reverse the order of colors so they go: 'blue', 'green', 'red'

recolored_painting = imread( '../Data/Picasso/1900-A_Spanish_Couple_in_front_of_an_Inn.-34.jpg')

# add code here
# recolored_painting = recolored_painting

plt.imshow(recolored_painting)
plt.show()

In [None]:
#Q4. shrink the size of the image so it is 1/2 of original size

shrunken_image = imread('../Data/Picasso/1903-The_Old_Guitarist.-7.jpg')

# add code here
# shrunken_image = shrunken_image

plt.imshow(shrunken_image)
plt.show()

# Array Methods

## Column and Row operations

Many NumPy functions (especially summary statistics) allow you to specify if the operation should be performed on the rows or columns with the `axis` keyword.

Some functions like [`np.concatenate()`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.concatenate.html) allow you to make new arrays by sticking existing arrays together.

In [None]:
a = np.array([[19.72, 20.34], 
              [21.30, 17.26]])



print(a)
print('\nmean, no axis specified')
print(a.mean())

print('\nmean, axis 0')
print(a.mean(axis=0))

print('\nmean, axis 1')
print(a.mean(axis=1))
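
The `axis` keyword works the same way for `np.concatenate()`. In this small sketch with two made-up arrays, `axis=0` sticks them together along the rows and `axis=1` along the columns.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

stacked = np.concatenate([a, b], axis=0)       # b goes below a
side_by_side = np.concatenate([a, b], axis=1)  # b goes to the right of a

print(stacked.shape)
print(side_by_side.shape)
```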

We can use this to pull out the average color along the x and y directions of an image.

### Using Column and Row operations on our images

In [None]:
music = imread('../Data/Picasso/1903-The_Old_Guitarist.-7.jpg')
plt.imshow(music)
plt.show()
colors = ['r', 'g', 'b']
fig, ax = plt.subplots(figsize=(5,3))
for i in range(3):
    ax.plot(music.mean(axis=0)[:, i], color=colors[i])
    ax.set_xlim([0, 500])
plt.show()


fig, ax = plt.subplots(figsize=(3, 5))
for i in range(3):
    ax.plot( music.mean(axis=1)[:, i],np.arange(len(music))[::-1], color=colors[i])
    ax.set_xlim([0, 180])

plt.show()
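
To see what the `axis` argument does to the shape of an image array, here is a sketch on a made-up image of 4 rows and 6 columns. The variable names are chosen for this example only.

```python
import numpy as np

# Made-up image: 4 rows, 6 columns, 3 color channels
img = np.arange(4 * 6 * 3).reshape(4, 6, 3)

column_profile = img.mean(axis=0)      # average over rows: one RGB triple per column
row_profile = img.mean(axis=1)         # average over columns: one RGB triple per row
channel_means = img.mean(axis=(0, 1))  # average over all pixels: one value per channel

print(column_profile.shape)
print(row_profile.shape)
print(channel_means)
```

Passing a tuple of axes, as in the last call, averages over rows and columns in one step and leaves one number per color channel.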

# The Full Dataset

The Picasso directory in `../Data` contains a number of Picasso images.

In [None]:
ls ../Data/

We can `glob` all the images and store them in a list.

In [None]:
images = list(glob.glob('../Data/Picasso/*.jpg'))
images[:20]

Names are formatted like this:
[year]-[painting name]-[random number].jpg

## Plotting Picasso's color usage over time

Let's pull out the year from the image name and the average amount of red, green, and blue in each image.
We can store these numbers in lists.

In [None]:
reds = []
greens = []
blues = []
years = []

import re

def get_year(path):
    """Return the year at the start of the file name, or None if there isn't one."""
    match = re.search(r"""  # let's search in the string for something that...
            (/|\\)          # starts with a slash or backslash
            (?P<year>       # now let's call this group (stuff in parentheses) "year"
                1               # starts with a "1"
                [0-9]{3}        # followed by a digit 0-9, 3 times
            )               # (end group)
            -               # ...after which is a dash
            """, path, flags=re.VERBOSE)

    if match is None:       # the file name doesn't follow the expected format
        return None
    return int(match.group('year'))

for img_path in images:
    y = get_year(img_path)

    if y is None:
        print('no year found in', img_path)
        continue

    img = imread(img_path)
    r = img[:,:,0].mean()   # average red
    g = img[:,:,1].mean()   # average green
    b = img[:,:,2].mean()   # average blue

    reds.append(r)
    greens.append(g)
    blues.append(b)
    years.append(y)
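
A regular expression is not the only way to get at the year. As a hedged alternative, this sketch splits the file name at the first dash instead. The helper name `year_from_filename` is made up here, and unlike the regular expression it accepts any four-digit start, not just years beginning with "1".

```python
import os

def year_from_filename(path):
    """Hypothetical helper: return the leading year of a file name, or None."""
    name = os.path.basename(path)        # e.g. '1907-Self-Portrait.-13.jpg'
    first_part = name.split('-', 1)[0]   # text before the first dash
    if len(first_part) == 4 and first_part.isdigit():
        return int(first_part)
    return None

print(year_from_filename('../Data/Picasso/1907-Self-Portrait.-13.jpg'))
print(year_from_filename('../Data/Picasso/notes.txt'))
```

The second path is invented to show what happens when a file name does not follow the format.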

Now we'll make three plots showing how the average color changes over time.

In [None]:
fig, ax = plt.subplots(figsize=(12, 3))
ax.plot(years, np.array(reds), '.', color='r')
ax.set_ylabel('average Red usage')
ax.set_xlabel('Year')
plt.show()


fig, ax = plt.subplots(figsize=(12, 3))
ax.plot(years, np.array(greens), '.', color='g')
ax.set_ylabel('average Green usage')
ax.set_xlabel('Year')
plt.show()

fig, ax = plt.subplots(figsize=(12, 3))
ax.plot(years, blues, '.', color='b')
ax.set_ylabel('average Blue usage')
ax.set_xlabel('Year')
plt.show()

## Picking out Picasso's Blue Period (1901-1904)

Although a number of Picasso's paintings are referred to as his 'Blue Period', it's not immediately obvious
from the graphs we just created when that period is.

What if we compared the amount of blue relative to red or green?

To do this, let's convert each of the lists into an array.

In [None]:
blue_array = np.array(blues)
red_array =  np.array(reds)
green_array = np.array(greens)

Now we can divide each number in one array by the number at the same position in another.

In [None]:
blue_red_ratio = blue_array / red_array
blue_green_ratio = blue_array / green_array
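
Here is a small sketch of that element-wise division, using made-up numbers rather than the real averages. It also uses `np.argmax` to find the position of the largest ratio, which is one possible way to pick out the "bluest" year.

```python
import numpy as np

# Made-up yearly averages, for illustration only
sample_years = np.array([1900, 1902, 1905])
sample_blues = np.array([60.0, 120.0, 90.0])
sample_reds = np.array([120.0, 80.0, 90.0])

sample_ratio = sample_blues / sample_reds            # divides position by position
bluest_year = sample_years[np.argmax(sample_ratio)]  # year with the largest ratio

print(sample_ratio)
print(bluest_year)
```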

And plot the arrays by passing them to matplotlib.

In [None]:
fig, ax = plt.subplots(figsize=(12, 3))
ax.set_title('blue / red')
ax.plot(years, blue_red_ratio , '.', color='purple')
#ax.set_xlim(1900, 1904)
plt.show()


fig, ax = plt.subplots(figsize=(12, 3))
ax.set_title('blue / green')
ax.plot(years,blue_green_ratio , '.', color='teal')
#ax.set_xlim(1901, 1904)
plt.show()

Now we can see a distinct spike in the blue / red ratios from (one image in) 1901 to 1903.

Just to make sure, let's look at all the images between 1901 and 1904.

In [None]:
for img_path in images:
    year = get_year(img_path)
    if year is not None and 1901 <= year <= 1904:
        print(year)
        img = imread(img_path)
        plt.imshow(img)
        plt.show()

# Masks and boolean Arrays

It's possible to exclude or include values in an array (in this case, image) based on boolean logic (such as greater than, less than, equal to, etc.).

This can be helpful for cutting out certain parts of the image you are most interested in. This is also very useful for other types of data.

First, let's load an image and slice out just the first layer (red) to create a single-channel image.

In [None]:
woman = imread('../Data/Picasso/1904-Woman_with_a_Crow.-4.jpg')

plt.imshow(woman)
plt.show()

woman_reds = woman[:,:, 0]
plt.imshow(woman_reds, cmap='Greys')
plt.show()

Slicing off the first layer has changed the shape of the image.

In [None]:
print('original shape', woman.shape)
print('new shape', woman_reds.shape)

This new two-dimensional image consists of many numbers between 0 and 255.

In [None]:
woman_reds

In order to create a boolean array, we just use boolean logic on the previous array.

In [None]:
image_mask = woman_reds > 100
image_mask

In [None]:
print('type:', type(image_mask))
print('dtype', image_mask.dtype)
print('shape', image_mask.shape)

plt.imshow(image_mask, cmap='Greys', interpolation='nearest')
plt.show()

By multiplying the image by the boolean mask, we can zero out any points that are `False` in the mask.
This works because `False` counts as 0 and `True` counts as 1 in arithmetic.

In [None]:
masked_image = woman_reds * image_mask
plt.imshow(masked_image, cmap='Greys')
plt.show()
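
Multiplying is not the only way to apply a mask. As a sketch on a made-up 2x2 channel, `np.where` gives the same result by choosing between the original value and 0 for each pixel.

```python
import numpy as np

# Made-up single-channel image
channel = np.array([[ 20, 150],
                    [200,  90]], dtype=np.uint8)
mask = channel > 100

by_multiplying = channel * mask        # False acts as 0, True acts as 1
by_where = np.where(mask, channel, 0)  # keep the value where mask is True, else 0

print(by_multiplying)
print(mask.sum())   # number of pixels that passed the test
```

Summing a boolean array counts its `True` values, which is a quick way to check how much of the image a mask keeps.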

We can combine multiple boolean conditions with the `&` symbol (wrap each condition in parentheses).

Three-dimensional masks work as well; they just must have the same shape as the image you want to apply them to.

In [None]:
image_mask3 = (woman > 50) & (woman < 200)

masked_image3 = woman * image_mask3
plt.imshow(masked_image3, cmap='Greys')
plt.show()
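
The mask above tests every color value separately, so a pixel can keep some of its channels and lose others. If you would rather keep or drop whole pixels, one possible variation (a sketch on a made-up two-pixel image, not part of the original notebook) is to collapse the color axis with `.all(axis=2)` and then add that axis back so the shapes line up.

```python
import numpy as np

# Made-up 1x2 color image: the first pixel is in range in all channels,
# the second is too bright in its red channel
img = np.array([[[100, 120, 140], [250, 120, 140]]], dtype=np.uint8)

in_range = (img > 50) & (img < 200)   # one True/False per color value
pixel_mask = in_range.all(axis=2)     # one True/False per pixel

# Add a length-1 color axis so the 2D mask lines up with the 3D image
masked = img * pixel_mask[:, :, np.newaxis]

print(pixel_mask)
print(masked)
```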

# Image Processing Additional Reading

## Image processing library
Pillow (a package that replaces the old Python Imaging Library)

## Image processing tools
http://scikit-image.org/

## Create Animations / Video editing with Python
http://zulko.github.io/moviepy/index.html