# Day 1

Day 1 is all about getting familiar with how image data is stored on a computer, and how to load these files in software.

Associated learning goal: Understand what a digital image is and how it is commonly represented as bits and bytes.

- Loading and viewing images
    - Load image into IPython notebook
    - View image in IPython notebook
    - Get bit depth of image
- Indexing and arrays
    - Print a subset of pixel values (such as the top right corner)
    - Set that subset of values to 0
    - View the modified image in IPython
- Histogram the pixel values in the image
- File size, disk space, and memory
    - Report the size of the file on the disk
    - Report the size of the file read into memory
    - Save the file in another format (png, maybe?) and then report the file size on disk

In [1]:
%matplotlib inline

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Now the `numpy` numerical array library is available as `np`. Plotting functions are available with `plt`, and `seaborn`'s advanced plots are accessed through `sns`. Just importing `seaborn` at all makes `Matplotlib` look nicer.

Let's set some defaults for the packages we just imported.

In [3]:
sns.set_style("dark") # Get rid of grid lines on our image plots!

## Loading and viewing images

### Load image into Jupyter notebook

In [4]:
from skimage.io import imread

In [5]:
imread?

In [6]:
data = imread("/Users/noah/Desktop/000014.tif")

In [7]:
data.shape

These are the image dimensions. Since `data` is a `numpy` array, as we will appreciate soon, its shape is described like you might describe a mathematical matrix: row-by-column. Rows are arranged vertically, and columns horizontally, so that a row-by-column description is height-by-width in image speak. Keep this in mind: for some it's more natural to refer to width-by-height, but that's not how things work here!
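To make the row-by-column convention concrete, here is a minimal sketch using a small made-up array instead of the real image. The name `toy_image` and its dimensions are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical stand-in for an image: 3 rows (height) by 5 columns (width).
toy_image = np.zeros((3, 5), dtype=np.uint16)

# shape is reported as (rows, columns), which is (height, width) in image terms
n_rows, n_cols = toy_image.shape
print("height:", n_rows, "width:", n_cols)
```

Unpacking `shape` into named variables like this is an easy way to avoid mixing up the two axes later on.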

### Get bit depth of image

In [8]:
data.dtype

`uint16` means "unsigned (not negative) integer with 16 bits".

16 bits means there are $2^{16} = 65536$ possible pixel values. This is in contrast to a typical non-scientific image, which has 8 bits per channel, or $2^{8} = 256$ levels.
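If you would rather not do the powers of two in your head, `numpy` can report the limits of an integer type through `np.iinfo`. The sketch below only inspects the types themselves, so it does not need the image; the `levels` dictionary is a name introduced here for illustration.

```python
import numpy as np

levels = {}
for dtype in (np.uint8, np.uint16):
    info = np.iinfo(dtype)  # integer type limits: bits, min, max
    n_levels = int(info.max) - int(info.min) + 1
    levels[dtype.__name__] = n_levels
    print(dtype.__name__, "bits:", info.bits, "min:", info.min, "max:", info.max, "levels:", n_levels)
```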

### View image in IPython notebook

In [9]:
plt.imshow(data, cmap='gray')

We had to specify how we wanted our colorless image to be rendered on our colorful screen, which is why we included `cmap='gray'` (cmap for colormap). Also, the brightness is scaled so the brightest pixel in our image is as bright as possible, and the darkest pixel is black. If we want to see the image in the full $2^{16}$ range, we need to ask for that explicitly.

In [10]:
plt.imshow(data, cmap='gray', vmin=0, vmax=2**16-1)

As you can see, this is a very dim image!
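The difference between the two renderings comes down to which range of pixel values gets stretched from black to white. The following is a rough `numpy` sketch of that kind of linear rescaling, assuming a simple clip-and-scale rule; `rescale` and `toy_image` are hypothetical names, and this is not a copy of Matplotlib's internals.

```python
import numpy as np

# Hypothetical dim 16-bit image: values use only a small part of the range.
toy_image = np.array([[100, 200], [300, 500]], dtype=np.uint16)

def rescale(image, vmin, vmax):
    """Linearly map [vmin, vmax] to [0, 1], clipping values outside the range."""
    vmin, vmax = float(vmin), float(vmax)
    scaled = (image.astype(np.float64) - vmin) / (vmax - vmin)
    return np.clip(scaled, 0.0, 1.0)

# Scale to the image's own min and max (darkest pixel -> 0, brightest -> 1)
auto_scaled = rescale(toy_image, toy_image.min(), toy_image.max())

# Scale to the full 16-bit range (a dim image stays close to 0)
full_range = rescale(toy_image, 0, 2**16 - 1)

print(auto_scaled.max(), full_range.max())
```

With the full range, the brightest pixel in this toy example lands at less than 1% of maximum brightness, which is why the second plot looks so dark.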

We will talk more about colormaps and channels later.

## Indexing and arrays

TODO: diagram showing how indexes are laid out in image space

### Print a subset of pixel values 

How would we index into the upper left-most pixel?

In [11]:
data[0,0]

What about the lower left?

In [12]:
data[-1,0] # row -1 aka last, column 0

What about a 10x10 slice from the upper left?

In [13]:
data[0:10,0:10] # rows from 0 to 10 (not inclusive for 10!) and columns from 0 to 10

Does this look like the upper left?

In [14]:
plt.imshow(data[0:10,0:10], cmap='gray')

Why is it so blurry? Remember we're only looking at 100 pixels here, but the image is more than 100 pixels on our screen. Matplotlib is trying to deal with this by _interpolating_, but has chosen to do this in a funny way. Let's be explicit about how we want Matplotlib to interpolate.

In [15]:
plt.imshow(data[0:10,0:10], cmap='gray', interpolation='nearest')

Note that ranges of indices are exclusive on the high side, inclusive on the low. What happens if I have a slice `1:2`?

In [16]:
data[1:2, 1:2]

We get row and column "from 1, up to (but not including) 2". This selects the same pixel as `data[1,1]`, but the slice returns a 1x1 array rather than a single number.
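One subtlety worth checking for yourself: a slice always returns an array, even when it covers a single pixel, while plain integer indices return the value itself. A small sketch on a made-up 4x4 array (`toy_image` is an assumed name):

```python
import numpy as np

toy_image = np.arange(16, dtype=np.uint16).reshape(4, 4)

as_slice = toy_image[1:2, 1:2]  # 2D array with shape (1, 1)
as_scalar = toy_image[1, 1]     # a single value

print(as_slice.shape, as_scalar)
```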

To save some typing when slicing into your data, we can leave off the value before the colon, meaning "from the beginning". Leaving off the value after the colon means "to the end".

In [17]:
data[:10,:10]

Can you use this to grab the lower right?

In [18]:
data[-10:,-10:] # row 10th-to-last to the end, column 10th-to-last to the end
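Negative indices are just shorthand for counting back from the end of an axis. The sketch below, on a small made-up array, checks that the shorthand matches the spelled-out version; `toy_image`, `lower_right`, and `same_thing` are names introduced here.

```python
import numpy as np

toy_image = np.arange(30).reshape(5, 6)  # 5 rows, 6 columns
n_rows, n_cols = toy_image.shape

# Shorthand: last 2 rows, last 2 columns
lower_right = toy_image[-2:, -2:]

# Spelled out with explicit start and stop values
same_thing = toy_image[n_rows - 2:n_rows, n_cols - 2:n_cols]

print(lower_right)
```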

### Set that subset of values to 0 

Let's make a copy so we don't ruin our original!

In [19]:
modified_data = data.copy()

We've viewed data using slicing; now let's set data using slicing!

In [20]:
modified_data[:1000,:1000] = 0

Even though `modified_data[:1000,:1000]` is a 1000x1000 array, and 0 is just a scalar, `numpy` is smart and will _broadcast_ the scalar `0` so that every element of the 1000x1000 region is set to zero.
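Here is the same copy-then-assign pattern on a tiny made-up array, small enough to print and inspect. The array contents and the names `original` and `modified` are assumptions for illustration.

```python
import numpy as np

original = np.full((4, 4), 7, dtype=np.uint16)

modified = original.copy()  # independent copy: changes do not touch `original`
modified[:2, :2] = 0        # the scalar 0 is broadcast over the 2x2 region

print(modified)
print(original)
```

Printing both arrays confirms that only the upper-left block of the copy was zeroed, and that the original is untouched.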

### View the modified image

In [21]:
plt.imshow(modified_data, cmap='gray')

## Histogram the pixel values in the image 

Pixels in an image are just represented by numbers. We can get a sense for the distribution of brightness in our image by looking at a histogram of intensities. Here we don't think about an image as representing something spatial, just as a collection of numbers.

In [22]:
# make our array into a simple 1D list of data
flat_data = data.flatten()

In [23]:
sns.distplot(flat_data)
plt.xlabel("Pixel intensity")
plt.ylabel("Frequency in the image")
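If you want the raw counts behind a histogram rather than a plot, `np.histogram` returns them directly. This is a minimal sketch on a six-pixel made-up image; the bin count and range are assumptions chosen so that each distinct value gets its own bin.

```python
import numpy as np

toy_image = np.array([[0, 1, 1], [2, 2, 2]], dtype=np.uint16)
flat = toy_image.flatten()  # 1D copy of all pixel values

# Three equal-width bins covering 0 to 3: [0, 1), [1, 2), [2, 3]
counts, bin_edges = np.histogram(flat, bins=3, range=(0, 3))

print(counts)
print(bin_edges)
```

The counts always add up to the number of pixels, which is a handy sanity check.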

## Color science 

If we wanted to visualize things in a more striking way, with false colors and more contrast, we could use a different colormap.

In [24]:
plt.imshow(data, cmap='jet')

Jet is a bad colormap because it is perceptually non-uniform: equal steps in the data do not look like equal steps in color.

In [27]:
from IPython.display import Image

In [28]:
Image("./jet.png")

The jumps in the "perceptual deltas" plot are values where this colormap makes it look like sharp transitions occur when they do not. Colorblind users rely more on lightness than hue, and will be especially misled.

In [30]:
Image("option_b.png")

This colormap is perceptually uniform by design! It looks cool too.

In [25]:
plt.imshow(data, cmap='CMRmap')

## File size, disk space, and memory 

### Report the size of the file on the disk 

Pro tip: a leading `!` in a notebook runs the rest of the line as a shell command.

In [44]:
!ls -lh "/Users/noah/Desktop/000014.tif" # a human-readable description of the image file we've been using

### Report the size of the file read into memory 

In [45]:
data.nbytes

What is this in MB?

In [47]:
data.nbytes / 1024.**2 # 1K = 1024 bytes, 1M = 1024K

Note that the `-h` flag in `ls` stands for "human-readable" and rounds 10.5 to 11, so our image takes up the same space on disk as it does loaded into Python. This tells us it was an _uncompressed_ or _raw_ TIFF. Such files are quick to read and write, but take up lots of space on your hard drive. Whatever the file format, an image loaded into memory always takes up $\text{bit depth} \times x \times y \times z$ bits (divide by 8 to get bytes).
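The in-memory size can be checked against the array's own attributes: `itemsize` is the number of bytes per pixel and `size` is the number of pixels. The dimensions below are hypothetical, picked only to give a round number, and are not those of the image used in this notebook.

```python
import numpy as np

# Hypothetical 16-bit image: 2048 rows by 2560 columns
toy_image = np.zeros((2048, 2560), dtype=np.uint16)

bytes_per_pixel = toy_image.itemsize  # 2 bytes for uint16
expected_bytes = bytes_per_pixel * toy_image.size

print(toy_image.nbytes == expected_bytes)
print(toy_image.nbytes / 1024**2, "MB")
```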

### Save an image in a different format from its original

In [48]:
from skimage.io import imsave

In [49]:
imsave?

In [50]:
imsave("/Users/noah/Desktop/000014.jpg", data)

`imsave` sees that you used the `.jpg` extension, and guesses that you want to apply jpeg compression to the image and save it in the jpeg format. There are ways to be more explicit about this (for example, specifying the bit-depth of a TIFF).

In [51]:
!ls -lh "/Users/noah/Desktop/000014.jpg"

This image is only 1.7M, but has sustained _loss_. Even if you can't see it, when you load this image, it will differ ever so slightly from the original.

The other reason jpegs may be smaller is that they only support 8-bit data, so converting a standard 16-bit TIFF from your microscope to a jpeg is probably a **Bad Idea**.
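To see why squeezing 16-bit data into 8 bits is lossy even before any jpeg compression, consider one possible conversion: keeping only the top 8 bits of each value. This rule is an assumption for illustration and is not necessarily what `imsave` does.

```python
import numpy as np

values_16bit = np.array([0, 255, 256, 511, 65535], dtype=np.uint16)

# Assumed conversion rule: drop the lowest 8 bits of each 16-bit value
values_8bit = (values_16bit >> 8).astype(np.uint8)

print(values_16bit)
print(values_8bit)
```

Distinct 16-bit intensities such as 0 and 255 collapse onto the same 8-bit value, and that information cannot be recovered afterwards.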

In [55]:
data_from_jpg = imread("/Users/noah/Desktop/000014.jpg")

In [59]:
np.all(data_from_jpg == data) # check for equality pixel by pixel. Are they all the same?

In [60]:
np.all(imread("/Users/noah/Desktop/000014.tif") == data) # check this on our original TIFF file too.

How are the two different?

In [62]:
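# cast to a signed type first: subtracting unsigned integers wraps around instead of going negative
difference_image = data_from_jpg.astype(np.int32) - data.astype(np.int32)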

There can be negative values here, so we'll manually set vmin and vmax so they're symmetric around 0, and we will use a colormap that diverges from zero: with `RdBu`, red is negative, blue is positive, and zero is white.

In [65]:
max_divergence = np.max(abs(difference_image))

In [67]:
plt.imshow(difference_image, cmap='RdBu', vmin=-max_divergence, vmax=max_divergence)
plt.colorbar()
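One caveat when subtracting image arrays, as in the difference image above: unsigned integer types cannot hold negative numbers, so a result that should be negative wraps around to a large positive value. A minimal sketch with two made-up arrays (`a` and `b` are assumed names):

```python
import numpy as np

a = np.array([5, 10], dtype=np.uint16)
b = np.array([10, 5], dtype=np.uint16)

wrapped = a - b                                   # unsigned: 5 - 10 wraps around to 65531
signed = a.astype(np.int32) - b.astype(np.int32)  # signed: 5 - 10 gives -5

print(wrapped)
print(signed)
```

Casting to a signed type that is wide enough (here `int32`) before subtracting keeps the sign of the difference intact.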