# Lesson 1: Digital Images

Our goal today is to understand what a digital image is and how it is commonly represented as bits and bytes. We will cover the following topics:

- Navigate Jupyter notebooks (5 mins)
- Load data file (10 mins)
- Load and use metadata (7 mins)
- View images (3 mins)
- Color maps and color science (10 mins) 

- Histogram the pixel values in the image (8 mins)
- Bitdepth, File size, disk space, and memory (5 mins)
- Indexing and arrays (8 mins)
- Errors and debugging (15 mins)

## 1. Navigate Jupyter notebooks
5 minutes

### Notes about jupyter notebooks:
There are two modes of the notebook cells: Edit mode and Command mode:  
   
From Edit mode (green-selected cell that you can type into)
   - Press ESC/click outside of the edit box: enters Command mode

From Command mode (blue-marked cell)
   - "a" enters a new cell above the selected cell
   - "b" enters a new cell below the selected cell
   - "dd" deletes the selected cell
   - "y" changes the cell type to python code
   - "m" changes the cell to markdown
   - Double-clicking a markdown cell puts it into Edit mode
   - "h" pulls up a help window
   - Press ENTER/click inside of the edit box: enters edit mode
   
   - SHIFT-ENTER runs the cell as code or markdown and selects the cell below 

### Exercise: Practice using & navigating jupyter notebooks

- Double-click on this cell.
- Change this cell between Edit mode and Command mode.
- Add a new cell above and below this cell, then delete them.
- Change this cell to python code and run.
- Change this cell to markdown and run.

### Challenge: Practice more shortcuts from the help window




## 2. Load data file
10 minutes

### Load libraries
1 minute

First some boilerplate code to make it easier to access useful libraries, and to make it easier to visualize data in the notebook.

`%matplotlib inline` is a magic function that sets the backend of matplotlib to the 'inline' backend. This means plots are displayed directly below the code cell that produced them, and the resulting plots are stored in the notebook.

Now the `numpy` numerical array library is available as `np`. 

Plotting functions are available with `plt`.

`seaborn`'s advanced plots are accessed through `sns`. Just importing `seaborn` at all makes `Matplotlib` look nicer.

### Build a common data path
5 minutes

Let's set some defaults for the packages we just imported, which gets rid of grid lines on our image plots!

Use bash commands to:
1. Create a folder on your desktop for this minicourse, call it "datalucence2018"
2. Create two subfolders, one called "code" and a second called "data"
    - ~/Desktop/datalucence2018/code
    - ~/Desktop/datalucence2018/data
3. Download the Data_DrugConfocalPanel folder from Canvas (under files)
4. Move this folder into ~/Desktop/datalucence2018/data
5. Navigate to datalucence2018


The images in this folder are part of an experiment to determine the effects of drug A on the cells in the image. DMSO is the solvent/vehicle for drug A. Thus the DMSO.tif file is the control condition and drugA.tif is the test condition.

Let's make a common data path for our class.
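One way to build such a path is sketched below with the standard-library `pathlib` module. The folder and file names come from the steps and description above; the variable names (`course_dir`, `data_path`, `ctl_file`, `drug_file`) are only illustrative, and the sketch assumes the two tif files sit directly inside the Data_DrugConfocalPanel folder.

```python
from pathlib import Path

# Folder layout from the steps above; adjust if your folders live elsewhere.
course_dir = Path.home() / "Desktop" / "datalucence2018"
data_path = course_dir / "data" / "Data_DrugConfocalPanel"

# Assumption: both image files sit directly inside the downloaded folder.
ctl_file = data_path / "DMSO.tif"    # control condition
drug_file = data_path / "drugA.tif"  # test condition

print(ctl_file)
print(drug_file)
```

Building every file name from one shared `data_path` means that only one line has to change if the data folder moves.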

### Load image file into Jupyter notebook
4 minutes

### Challenge: Understanding the function imread

In [None]:
imread?

## 3. Metadata
7 minutes

`data` is a `numpy` array. Its shape indicates that it has four dimensions, so this file is more complex than a single two-dimensional data array. Fortunately, our former lab member (or former self) has left behind a metadata file to decode what data are in this file.

### Load the metadata
5 minutes

Our former labmate has put the metadata into a file format called JSON. JSON files are easily loaded into python as the dictionary data type. Dictionaries in python are indexed with keys, which are strings instead of numerical indices (such as used in lists). To understand this concept, load the JSON file and examine it.

This is a bit difficult to digest, so let's work our way through the information
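To see how a dictionary loaded from JSON is indexed by key, here is a small sketch that uses made-up metadata text rather than the real metadata file. Every key and value in `example_text` is hypothetical and chosen only for illustration.

```python
import json

# Made-up metadata text, standing in for the contents of a real JSON file.
example_text = '{"microscope": "confocal", "exposure_ms": 50, "channels": ["actin", "nucleus"]}'

example_meta = json.loads(example_text)  # JSON object -> Python dict

print(type(example_meta))
print(list(example_meta.keys()))

# Values are looked up by their key (a string), not by a numerical position.
print(example_meta["channels"])
print(example_meta["exposure_ms"])
```

To read from a file instead of a string, `json.load` takes an open file object in place of the text.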

### Exercise: Look at the information imported from the metadata file

Print out the axes.

### Challenge: 
Based on the metadata and the shape, what is a likely meaning of the four axes of ctl_data?

### Use the metadata 
2 minutes

It can be useful to organize your data into a dict instead of a numerical array when one of the dimensions of the array corresponds to something that is non-numerical in nature. Here, the channel dimension is stored as another dimension in the numerical array that is ctl_data. To get the image corresponding to one of the channels, you would have to remember which of the channel slices corresponds to the channel you would like to see. Below we'll organize the data into a dict so that the channels can be indexed by an intuitive string and not a numerical index.

Create a new dictionary that refers to one slice and all channels.
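A minimal sketch of this idea follows, using a small array of zeros instead of the real image data. The axis order (slice, channel, row, column), the array shape, and the channel name "nucleus" are assumptions made for this example; check them against your own metadata.

```python
import numpy as np

# Stand-in for the real image data. The shape and the axis order
# (slice, channel, row, column) are assumptions for this sketch.
example_data = np.zeros((3, 2, 4, 5), dtype=np.uint16)
channel_names = ["actin", "nucleus"]  # "nucleus" is a made-up channel name

slice_index = 0
example_slice = {}
for channel_index, name in enumerate(channel_names):
    # Each entry is a 2D view into the big array, not a copy.
    example_slice[name] = example_data[slice_index, channel_index]

print(example_slice["actin"].shape)
```

Because each entry is a view into the original array, building the dictionary does not duplicate the pixel data in memory.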

## 4. View image in iPython notebook
3 minutes

Use the seaborn package to suppress grid lines.

Get the dimensions of this single slice in pixels.

## 5. Color maps and color science
10 mins

We had to specify how we wanted our colorless image to be rendered on our colorful screen, which is why we included `cmap='gray'` (cmap for colormap).

If we wanted to visualize things in a more striking way, with false colors and more contrast, we could use a different colormap.

### Exercise: Find your favorite color map
Which color map is better? Jet or plasma? Read the documentation for the matplotlib colormaps online and find your favorite color map. 
Here is the link: https://matplotlib.org/users/colormaps.html

Rainbow colormaps such as jet are poor choices because they are perceptually non-uniform: equal steps in the data do not look like equal steps in color or brightness.

## 6. Histogram the pixel values in the image
8 minutes

### Get bit depth of image
2 minutes

There are many kinds of data types. Here the data type is `uint16`.
 `uint16` means "unsigned (not negative) integer with 16 bits per pixel".

16 bits means there are $2^{16} = 65536$ possible pixel values. This means the values of the pixels can range from 0 to 65535.

Many scientific images use only 8 or 12 bits per pixel, which gives fewer distinct intensity levels (256 or 4096).
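The number of possible values and the valid range can be checked directly. The short sketch below uses `np.iinfo`, which reports the limits of an integer data type. Because counting starts at 0, the largest value is one less than the number of possible values.

```python
import numpy as np

bits = 16
n_values = 2 ** bits            # number of distinct pixel values
info = np.iinfo(np.uint16)      # integer limits for this data type

print(n_values)
print(info.min, info.max)
```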

### Make histogram of the pixel values
6 minutes

Pixels in an image are just represented by numbers. We can get a sense for the distribution of brightness in our image by looking at a histogram of intensities. Here we don't think about an image as representing something spatial, just as a collection of numbers.

Flatten our array into a simple 1D array of pixel values.

`distplot` stands for "distribution plot".
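As a sketch of what this step does, the example below flattens a tiny made-up image and counts its pixel values with `np.histogram`. The array values and the choice of 3 bins are arbitrary, and `flatten` is one of several ways to get a 1D array (`ravel` is another).

```python
import numpy as np

# A tiny made-up image with 2 rows and 3 columns.
small_image = np.array([[0, 5, 5],
                        [9, 5, 0]], dtype=np.uint16)

flat = small_image.flatten()   # 1D copy of all pixel values
print(flat.shape)

# Count how many pixels fall into each of 3 equal-width bins between 0 and 9.
counts, bin_edges = np.histogram(flat, bins=3, range=(0, 9))
print(counts)
print(bin_edges)
```

The position of each pixel is discarded here; only the list of intensities remains, which is all a histogram needs.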

In [None]:
sns.distplot?

### Exercise: adjust the contrast of the image

Run the following cell to look up the `imshow` documentation, then specify `vmin` and `vmax` to increase the contrast of the image.

In [None]:
plt.imshow?

## 7. File size, disk space, and memory 
5 minutes

### The size of the file read into memory

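An uncompressed image takes up $bitdepth \times x \times y \times z$ bits in memory (times the number of channels, if there is more than one). Divide by 8 to get the size in bytes.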

1K = 1024 bytes, 1M = 1024K
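The arithmetic can be sketched on a hypothetical image stack. The shape below is made up and is not the shape of the course data. `nbytes` is the size `numpy` itself reports for the array contents.

```python
import numpy as np

# Hypothetical stack: 4 z-slices of 512 x 512 pixels at 16 bits per pixel.
example_stack = np.zeros((4, 512, 512), dtype=np.uint16)

bits_per_pixel = example_stack.dtype.itemsize * 8      # itemsize is in bytes
size_in_bytes = bits_per_pixel * example_stack.size // 8
size_in_mb = size_in_bytes / 1024 / 1024               # 1M = 1024K = 1024 * 1024 bytes

print(size_in_bytes, example_stack.nbytes)
print(size_in_mb)
```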

The result should be the same as below:

### The size of the file on the disk 

Get a human-readable description of the image file we've been using.

Note the "35M". That's our file size.

So our image on-disk and loaded into Python are the same size. Therefore, this was an _uncompressed_ or _raw_ tif. Such files are quick to read and write, but take up lots of space on your hard drive. 

## 8. Indexing and arrays
10 minutes

### Print a subset of pixel values 

How would we index into the upper left-most pixel?

What about the lower left?

row -1 is the last row; column 0

What about a 10x10 slice from the upper right?

Rows `0:10` are rows 0 to 9; row 10 is not included.
Columns `-10:-1` run from the tenth-to-last column to the second-to-last column; the last column is not included.

Does this look like the upper right of the source image?

We only have 90 pixels not 100 pixels.

Note that ranges of indices are exclusive on the high side, inclusive on the low. What happens if I have a slice `1:2`?

We get row and column "from 1, up to (but not including) 2". This selects the same pixel as `data[1,1]`, although the slice returns a 1x1 array rather than a single value.

How can we get to the last column?

To save some typing when slicing into your data, we can leave off the value before the colon, meaning (to the beginning). Leaving off the value after means (to the end).
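The indexing rules in this section can be tried out on a small stand-in array. In this sketch the 20x30 array and its values are made up; each pixel value encodes its position so that the results are easy to check.

```python
import numpy as np

# A small 20 x 30 stand-in image whose pixel values encode their position.
example_image = np.arange(20 * 30).reshape(20, 30)

upper_left = example_image[0, 0]
lower_left = example_image[-1, 0]          # row -1 is the last row
too_narrow = example_image[0:10, -10:-1]   # stops before the last column
upper_right = example_image[:10, -10:]     # runs through the last column
one_pixel = example_image[1:2, 1:2]        # a 1 x 1 array, not a scalar

print(upper_left, lower_left)
print(too_narrow.shape, upper_right.shape, one_pixel.shape)
```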

###  Challenge: Set that subset of values to 0 

Let's make a copy so that we don't ruin our original! (Keep RAW DATA RAW)

In [None]:
modified_data = ctl_slice["actin"].copy()

We've viewed data using slicing, now let's set data using slicing!

Even though `modified_data[:500,:500]` is a 500x500 array, and 0 is just a scalar, `numpy` is smart and will _broadcast_ the 1x1 value `0` so that the whole 500x500 array is set to a 500x500 array of zeros.
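Here is the same idea at a smaller scale, as a sketch on a made-up 4x6 array of ones rather than the real image. It also shows why the copy matters: the original is left untouched.

```python
import numpy as np

original = np.ones((4, 6), dtype=np.uint16)   # stand-in for the raw image
modified = original.copy()                    # work on a copy, keep raw data raw

modified[:2, :3] = 0   # the scalar 0 is broadcast across the 2 x 3 region

print(modified)
print(original.sum(), modified.sum())
```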

View the modified image

## 9. Errors and debugging
15 minutes

Let's make a stand-in function that ignores the file name and returns a random image.

In [None]:
def load_image(filename):
    return np.random.rand(400, 600)

### Syntax errors

What is this code supposed to do?

In [None]:
for i in range(50)
    image_data = load_image("image_{}".format(i))
    my_images.append(image_data)

This code is trying to load a series of numbered images into a list of images.

_However_, running it gives you an error.

Python errors are given to you as a "stack trace". Let's first dissect a stack trace.

**`File "<ipython-input-some_number-some_ID_number>", line 3`** tells you what file, and where in the file, the error came from. Because we're in a notebook, instead of a file we get a message that tells us we're in a notebook using IPython and that the error was in line 3.

**`  for i in range(50)`** is where Python conveniently reminds us what code was at line 3. Sometimes it shows us a bit of code before and after to give us context.

**`^`** marks exactly _where_ in line 3 the problem was noticed. This can be tricky because a problem with a function call may not be noticed until the closing ')' at the very end of the function. Nonetheless, here it might be helpful.

**`SyntaxError: invalid syntax`**. This tells us that our problem is a SyntaxError, which is one of a large hierarchy of errors python can provide us with.

`SyntaxError`s happen when Python sees you violating the rules of the language - it's a problem with the literal code characters you have typed rather than what your code is trying to do conceptually. Therefore they are usually short errors with quick fixes.

Do you see how to fix _this_ error?

In [None]:
for i in range(50) # do something here to fix
    image_data = load_image("image_{}".format(i))
    my_images.append(image_data)

### NameError 

`NameError`s happen when you try to access a Python variable or function that does not exist yet. To get a NameError, Python has to actually try to run your code, so a NameError is a runtime error: unlike a `SyntaxError`, it is only raised when the offending line is executed.
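The sketch below illustrates that the error only appears when the offending line is reached. The name `measurements` is made up and deliberately left undefined; the lines before it run normally.

```python
# "measurements" is a made-up name that has deliberately not been defined.
steps_completed = []
try:
    steps_completed.append("before")   # this line runs normally
    measurements.append(3.2)           # the NameError is raised only here
    steps_completed.append("after")    # never reached
except NameError as error:
    error_message = str(error)

print(steps_completed)
print(error_message)
```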

Can you add in a line that fixes this code?

In [None]:
#add a line here to fix

for i in range(50):
    image_data = load_image("image_{}".format(i))
    my_images.append(image_data)

In this case `my_images` does not exist, so it cannot be appended to, thus the name error. Making an empty `my_images` list solves this problem.

Now that our images are loaded, let's check that each image's mean intensity changes by less than 10% compared with the next image. This might be a good check to make sure no one bumped our microscope or turned on a light while we were taking images.

In [None]:
for i in range(50):
    intensity = my_images[i].mean()
    next_intensity = my_images[i+1].mean()
    if abs(intensity - next_intensity) > 0.10 * intensity:
        print("Notice: intensity jumped between images {} and {}".format(i, i+1))
        break

### IndexError

`IndexError`s happen when you try to access data from a list-like object, such as a `numpy` array or image, but the location you requested does not exist, like asking for index 10 of a list of 10 items (whose valid indices are 0 to 9).

In image processing, this is often caused by switching your rows/columns or width/height. Say you have a 400x600 image (400 rows, 600 columns) and try to access a pixel in row 400 or beyond: such a number is valid as a column index but not as a row index.
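A short sketch of this mistake, using an array of zeros as a stand-in for a 400x600 image:

```python
import numpy as np

example_image = np.zeros((400, 600))   # 400 rows, 600 columns

valid_pixel = example_image[399, 599]  # last row, last column: fine

try:
    example_image[599, 399]            # row and column swapped by mistake
except IndexError as error:
    error_message = str(error)

print(error_message)
```

The message names the axis and its size, which is usually enough to spot a row/column swap.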

`IndexError`s are also common when looping through something by index. This is a good reason to use Python's `for item in collection` syntax rather than looping through indices!

In [None]:
data = [1, 3, "cat", 0.4]

# This is less clear and prone to error
for index in range(4):
    item = data[index]
    print(item)

# Than this
for item in data:
    print(item)

One interesting thing about a Python loop is that its current value, `i` in this case, remains available outside the loop and does not reset until you run the loop again. This lets us quickly check what value `i` took on when the code crashed!

In [None]:
print(i)

Now can you explain what happened here? Can you fix this code?

In [None]:
for i in range(50): # hint, you can fix it here
    intensity = my_images[i].mean()
    next_intensity = my_images[i+1].mean()
    if abs(intensity - next_intensity) > 0.10 * intensity:
        print("Notice: intensity jumped between images {} and {}".format(i, i+1))
        break

### Exercise: Sufficiency and necessity of error messages

What would have happened if instead of comparing image `i` to image `i+1` we had compared image `i-1` (previous image) to `i`?

In [None]:
for i in range(50):
    intensity = my_images[i-1].mean()
    next_intensity = my_images[i].mean()
    if abs(intensity - next_intensity) > 0.10 * intensity:
        print("Notice: intensity jumped between images {} and {}".format(i-1, i))
        break

There was no error! Did this solve our problem?

No, it did not. Consider the first pass through the loop. `i` is 0. `i-1` is -1. What is `my_images[-1]`?

In [None]:
my_images[-1]

`my_images[-1]` doesn't raise an `IndexError`, because -1 refers to the _last_ item in the list. But we don't want to compare the first and last images, so although this code runs, it is not the right behavior.
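One way to avoid both the `IndexError` and the silent wrap-around is to skip the index arithmetic altogether and let `zip` pair each image with the one after it. This is a sketch of one possible approach, not the only fix. The three constant-brightness images are made up, and unlike the loop above it records every jump instead of stopping at the first one.

```python
import numpy as np

# Three made-up images with constant brightness 1.0, 1.05 and 2.0.
example_images = [np.full((4, 6), value) for value in (1.0, 1.05, 2.0)]

jumps = []
# zip pairs each image with its successor and stops after the last real pair,
# so no index ever falls outside the list or wraps around to the end.
pairs = zip(example_images, example_images[1:])
for position, (image, next_image) in enumerate(pairs):
    intensity = image.mean()
    next_intensity = next_image.mean()
    if abs(intensity - next_intensity) > 0.10 * intensity:
        jumps.append((position, position + 1))

print(jumps)
```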

### Long stack traces

What can make the errors that we see in this course particularly daunting is that we use many libraries which use other libraries which in turn use more libraries, etc. This means that the piece of code that reports the error is often code we didn't write or didn't even know was being run, which can make errors feel unfair or unsolvable. But here's what's really happening.

Imagine that you send me on an errand to buy groceries. You give me detailed instructions (a program) describing the steps to take. So I get in the car and start driving to the store. Half way there, I notice I'm out of gas. That's OK, I have my own program for dealing with that. "Buy gas at a station" is a bit of instructions you didn't know I had, nor did you anticipate me using, but it's being used now anyway. I pull into the gas station, get out of my car, pay for the gas, and try to start filling up. However, the gas cover locks from the inside. I don't know about this (new car), and send you a text trying to precisely describe the error: "Nozzle cannot pass through solid metal".

So here you are, having sent me to buy groceries, and I tell you I can't because "nozzle cannot pass through solid metal". Python is frustrating in the same way: generally it tells you the _lowest-level problem_ when it fails. It's important to keep this in mind: the piece of code that reports an error is probably not the one that caused it!

Stack traces report errors with the first call at the top. **This means that you should read a long stack trace by starting at the bottom and working up until you see a line of code that you wrote or a function you called.** This line is likely to be the line you have to change. Maybe you passed a string to a function when it needed an integer.

Lines below the code you wrote in the stack trace may contain hints about what's wrong. Maybe the lowest error at the bottom of the stack trace is "Cannot subtract string from int", which is a clue that there was a string where an int should be.

Higher lines tell you the context that the error happened in, i.e. what code ran before the error. Maybe you have a function that you use several times - you want to know _which_ usage of the function is giving you the error. If it's the 2nd time you use the function, then either the function _can_ work, but not in a particular context, or the function is being called in the 2nd location _before_ the first: the flow of your program is not what you expected.
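To make the reading order concrete, here is a small sketch with two made-up functions, `process_pixel` and `subtract_background`. The standard-library `traceback` module is used to list the calls in the same order a stack trace prints them: the outermost call first, the line that actually failed last.

```python
import traceback

def subtract_background(pixel_value, background):
    # The error is *reported* here ...
    return pixel_value - background

def process_pixel(pixel_value):
    # ... but it is *caused* here: the background is a string, not a number.
    return subtract_background(pixel_value, "10")

try:
    process_pixel(200)
except TypeError as error:
    frames = traceback.extract_tb(error.__traceback__)
    function_names = [frame.name for frame in frames]
    error_message = str(error)

print(function_names)   # outermost call first, failing function last
print(error_message)
```

Reading from the bottom up, the first frame is the subtraction that failed, and the frame above it is the call that passed in the string, which is the line to change.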