# Proof by Pixel: Basic Image Analytics for Research on Historical Maps

## Part 1: Basics of working with images in Python

This notebook draws partly on a Data Carpentry [lesson](https://datacarpentry.org/image-processing/) and the documentation for the libraries discussed below. 

### Outline: 
This notebook covers some basic components of working with images in Python. It briefly covers:
- Basics of how images are stored on a disk 
- How to load images using Python
- Getting metadata about an image

#### Loading images
- loading images
- display images 

#### Metadata
- basic metadata 

#### Manipulating images
- Resize images
- Navigating images (coordinates)
- Converting images 

#### Saving outputs
- How to save images back to disk

### Notebook hint
- Google Colab is essentially a Jupyter notebook, so it supports most common Jupyter notebook tricks.
- The output of plotting commands is displayed inline within your notebook if you use the IPython magic command shown below. You will often want to include this when using notebooks for interactive data exploration. For more on magic commands, see https://jakevdp.github.io/PythonDataScienceHandbook/01.03-magic-commands.html

%matplotlib inline

In [None]:
# Importing required packages 
from matplotlib import pyplot as plt
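
With `pyplot` imported, an array of pixel values can be drawn directly in the notebook. The following is a minimal sketch using a synthetic gradient rather than a real map; the array and the title are made up for illustration.

```python
import numpy as np
from matplotlib import pyplot as plt

# Synthetic 50x100 grayscale gradient: dark on the left, bright on the right.
gradient = np.tile(np.linspace(0, 255, 100, dtype=np.uint8), (50, 1))

fig, ax = plt.subplots()
# cmap="gray" maps low values to black and high values to white.
ax.imshow(gradient, cmap="gray", vmin=0, vmax=255)
ax.set_title("Synthetic gradient")
ax.axis("off")  # hide the axis ticks, which are rarely useful for images
```

In a notebook with inline plotting enabled, the figure appears below the cell without an explicit call to `plt.show()`.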

# Images on computers 

## Storage 

## Pixels

## Formats 
- TIFF 
- GeoTIFF


# RGB channels 

## Take original image 
![Original Image](https://datacarpentry.org/image-processing/fig/02-chair-orig.jpg)

## RGB channels 
![RGB channels](https://datacarpentry.org/image-processing/fig/02-chair-layers.png)
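
The same idea can be sketched in code: a color image is a grid of pixels, and each pixel holds one value per channel. The tiny 2x2 image below is invented for illustration and uses NumPy, which is introduced later in this notebook.

```python
import numpy as np

# A tiny 2x2 color image: rows x columns x channels, 8-bit values (0-255).
# Pixels, left to right and top to bottom: red, green, blue, white.
tiny = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [255, 255, 255]]],
    dtype=np.uint8,
)

# Split the image into its three channels (RGB order in this sketch).
red, green, blue = tiny[:, :, 0], tiny[:, :, 1], tiny[:, :, 2]

print(tiny.shape)  # (rows, columns, channels)
print(red)         # brightness of the red channel at each pixel
```

Each channel on its own is a single-layer (grayscale) image, which is what the layered figure above shows.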

# Python packages for working with images

There are various Python libraries available for working with images using Python. These libraries all have slightly different focuses but often have overlapping functionality:

## [Pillow](https://python-pillow.org/)
- Pillow is a "friendly fork" of PIL (Python Imaging Library)
- It focuses more on 'day-to-day' image processing tasks (opening, manipulating, and converting images)
- It is fast and can be made faster with [Pillow-SIMD](https://github.com/uploadcare/pillow-simd)

## [OpenCV](https://opencv.org/) 
- OpenCV (Open Source Computer Vision Library) is a C/C++ image processing library with a Python API
- It has a wider range of implemented algorithms than Pillow 
- It is also more focused on Computer Vision tasks (for example it has models for face detection included)

## [Scikit-Image](https://scikit-image.org/)
- Scikit-image is another image processing library focused more on scientific research requirements
- If you are familiar with the rest of the [SciPy](https://www.scipy.org/) ecosystem, the API for this package should feel familiar
- It has support for working with Numpy arrays (more on this below)
- It also implements many image processing and analysis algorithms

# Importing packages
- To work with these Python libraries we need to import them into our notebook

In [None]:
# This cell imports Pillow
from PIL import Image 

In [None]:
# This cell imports OpenCV
# A common convention in Python is to import a package 'as' a shorter name to save typing out long library names
import cv2 as cv

## Loading Images 

### Notebook hint
- A nice feature of Jupyter notebooks/lab is that you can get documentation and tips as you work
- Typing ? followed by a library, method, or function name will show its documentation, e.g.

```
?cv 
```

In [None]:
?cv

In [None]:
# remove the 'hash/#' comment symbol below to see docs for cv.imread
#?cv.imread

To upload files into Colab:

In [2]:
# This cell only works inside Google Colab; elsewhere the import fails with
# "ModuleNotFoundError: No module named 'google'"
from google.colab import files
uploaded = files.upload()
# Select the files you downloaded earlier
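
In Colab, `files.upload()` returns a dictionary keyed by file name, with the raw bytes of each file as the values. The sketch below shows one way such bytes could be opened as an image with Pillow. Because the upload widget is only available in Colab, the dictionary here is built by hand; `uploaded_example` and the file name `map.png` are hypothetical stand-ins.

```python
import io

from PIL import Image

# Hand-built stand-in for the dictionary returned by files.upload():
# keys are file names, values are the raw bytes of each file.
buffer = io.BytesIO()
Image.new("RGB", (30, 20), color=(200, 180, 150)).save(buffer, format="PNG")
uploaded_example = {"map.png": buffer.getvalue()}

for name, data in uploaded_example.items():
    # Wrap the bytes in a file-like object so Pillow can read them.
    uploaded_image = Image.open(io.BytesIO(data))
    print(name, uploaded_image.size, uploaded_image.mode)
```

Note that Pillow reports `size` as (width, height).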

To load our image we can use the imread function. We need to tell this function where to find the file we want to read. 

In [None]:
image = cv.imread(filename = "")

Let's try printing the image.

In [None]:
print(image)

## Huh?
That doesn't look much like a picture! We can usually find out what type of data we are working with in Python using the 'type' function. 

In [None]:
print(type(image))

This tells us that image is a NumPy array. [NumPy](https://numpy.org/) is a library that supports scientific work in Python. A NumPy array is a container for storing information, in this case the pixel values loaded by OpenCV. These arrays will become useful again later on, but they aren't much help if you want to see the actual image. To show that images really are just pixels, let's use the Pillow library to turn the array back into an image. 

In [None]:
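# OpenCV loads color images in BGR channel order, so convert to RGB before handing the array to Pillow
# The 'fromarray' function from Pillow's Image module then generates an image from the array
pillow_image = Image.fromarray(cv.cvtColor(image, cv.COLOR_BGR2RGB))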

In [None]:
?Image

We can display Pillow images by making the image the last line of a cell:

In [None]:
pillow_image
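
A detail to watch when moving between the two libraries: OpenCV stores color channels in blue-green-red (BGR) order, while Pillow interprets a three-channel array as RGB. The sketch below uses a single made-up pixel to show what happens when the order is not converted.

```python
import numpy as np
from PIL import Image

# A 1x1 "image" holding a pure red pixel in OpenCV's BGR order.
bgr_pixel = np.array([[[0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps BGR to RGB; ascontiguousarray makes a
# fresh copy instead of a reversed view.
rgb_pixel = np.ascontiguousarray(bgr_pixel[:, :, ::-1])

wrong = Image.fromarray(bgr_pixel)  # Pillow reads the values as RGB: blue
right = Image.fromarray(rgb_pixel)  # red, as intended

print(wrong.getpixel((0, 0)))
print(right.getpixel((0, 0)))
```

If a map looks oddly tinted after conversion, swapped red and blue channels are a likely cause.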

# Access image metadata
- **Terminology alert**: in this context we mean metadata associated with the image object not metadata about the original physical item/map the image came from

- We can use OpenCV to access some metadata about images

In [None]:
print(image.shape)
# shape is a tuple of (number of rows, number of columns, number of channels); the channels entry is only present for color images

In [None]:
# size is the total number of values in the array (rows * columns * channels)
print(image.size)

### Notebook hint
Normally in Python you use print() to see a variable or the output of a function, but in notebooks the value of the last line in a cell is displayed by default, so we can skip that step. 

In [None]:
image.size
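
The attributes above can be explored without loading a file. The following sketch uses a synthetic array as a stand-in for a loaded color image; the dimensions are arbitrary.

```python
import numpy as np

# Synthetic stand-in for a loaded color image: 100 rows, 150 columns, 3 channels.
example = np.zeros((100, 150, 3), dtype=np.uint8)

height, width, channels = example.shape
print(height, width, channels)  # rows, columns, channels
print(example.size)             # total number of values: 100 * 150 * 3
print(example.dtype)            # data type of each value (8-bit unsigned integers)
print(example.nbytes)           # memory used by the pixel data, in bytes
```

For 8-bit images each value takes one byte, so `size` and `nbytes` match. This is the uncompressed size in memory, which is usually much larger than the file on disk.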

## Manipulating images
When working with images we will often have to change them in some way to make them easier to work with. In the case of maps, for example, we are working with images that have:
- very large dimensions
- large file sizes 
- 3 channels by default 

It can be useful to change all of these factors to make the images easier to work with. 
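
Because a loaded image is an array, navigating it by coordinates comes down to indexing: the first index is the row (counted from the top) and the second is the column (counted from the left). Here is a minimal sketch with a made-up 6x8 array, where each value encodes its own position so the results are easy to check.

```python
import numpy as np

# A 6x8 single-channel "image" whose values count up row by row.
page = np.arange(6 * 8, dtype=np.uint8).reshape(6, 8)

# Index as [row, column]: row 2, column 5.
value = page[2, 5]

# Slicing crops a region: rows 1 to 3 and columns 2 to 5 (end index excluded).
crop = page[1:4, 2:6]

print(value)
print(crop.shape)
```

Cropping to the region of interest is often the simplest way to cut a very large map down to a workable size.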

### Convert to grayscale

In [None]:
# cv.imread returns channels in BGR order, so use the BGR-to-gray conversion
gray_image = cv.cvtColor(image, cv.COLOR_BGR2GRAY)
gray_image.shape

In [None]:
# Plot 

In [None]:
gray_image_pillow = Image.fromarray(gray_image)
gray_image_pillow
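
Converting to grayscale collapses the three channels into one by taking a weighted sum. OpenCV's documentation gives the weighting as Y = 0.299 R + 0.587 G + 0.114 B. The NumPy version below is only an illustration of that formula, not OpenCV's implementation, and its rounding may differ slightly from `cv.cvtColor`.

```python
import numpy as np

# One row of three pixels in RGB order: pure red, pure green, pure blue.
rgb = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)

# Weights for the red, green, and blue channels.
weights = np.array([0.299, 0.587, 0.114])

# Weighted sum over the channel axis, rounded back to 8-bit values.
gray = np.rint(rgb @ weights).astype(np.uint8)

print(gray.shape)  # the channel axis is gone
print(gray)
```

Green contributes most to the result, so pure green ends up much brighter than pure blue. This is also why getting the channel order right matters before converting.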

# Resizing images
- dimensions 
- file size / compression 


### Reducing dimensions of image 

In [None]:
print(gray_image.size)
small = cv.resize(src = gray_image, dsize = None, fx = 0.1, fy = 0.1)
print(small.size)

## Compressing image and saving to a new file format 
We might want to reduce the file size of our image, change its format, and save it under a new name. To do this in Pillow, we call the 'save' method of the Image object we created above. We need to pass a filename. The format can be inferred from the extension given in the filename, but it can be helpful to be explicit about the format when using this method. For lossy formats such as JPEG, the optional 'quality' parameter can be lowered to reduce the file size further. 

In [None]:
gray_image_pillow.save('gray_image.jpg',format='jpeg', quality=50)
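
To get a feel for the effect of 'quality', one option is to save the same image several times and compare the resulting file sizes. This is a rough sketch using a synthetic noisy image and a temporary folder; the file names are hypothetical, and the exact sizes will vary with the image content.

```python
import os
import tempfile

import numpy as np
from PIL import Image

# Synthetic 200x200 grayscale image filled with random noise.
rng = np.random.default_rng(0)
noisy = Image.fromarray(rng.integers(0, 256, size=(200, 200), dtype=np.uint8))

out_dir = tempfile.mkdtemp()
sizes = {}
for quality in (95, 50, 10):
    out_path = os.path.join(out_dir, f"gray_q{quality}.jpg")
    noisy.save(out_path, format="JPEG", quality=quality)
    # Record the size on disk, in bytes, for each quality setting.
    sizes[quality] = os.path.getsize(out_path)

print(sizes)
```

Lower quality settings give smaller files at the cost of visible compression artifacts, which can matter if fine detail on a map needs to stay legible.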

### Notebook hint 
It can often be annoying to remember what arguments a function like Image.save requires. In Jupyter notebooks you can see expected function arguments by pressing: 
```
Shift+Tab
```
With the example above you could enter gray_image_pillow.save and then hit 'shift+tab' to see what arguments this function requires and which optional ones are available. 

In [None]:
# Try running Shift+Tab 
gray_image_pillow.save

# Next notebook 
The next notebook will move to exploring what features can be found by working with pixel values 

# Further resources 

For further resources on working with images in Python see:  

## Tutorials
- [Data Carpentry lesson on image analytics and processing](https://datacarpentry.org/image-processing/)
- [Scikit-image video tutorial](https://www.youtube.com/watch?v=d1CIV9irQAY)

## Documentation 
- [Pillow Documentation](https://pillow.readthedocs.io/en/stable/)
- [OpenCV Documentation](https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_tutorials.html)
- [Scikit-image documentation](https://scikit-image.org/docs/stable/index.html) 