![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)
 
<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2FML-exploration&branch=main&urlpath=image-classification/00-mathematics-of-images.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"></a>

# Mathematics of Images

Throughout human history, people have used images to convey information more quickly than speech (or, later, text) can, and in a more lasting form than speech. As the saying goes, a picture is worth a thousand words, and we'll see that a digital image file holds far more information than that.

Although it might seem strange to reduce something visual down to a series of numbers, we'll show here that it can be done in a surprisingly elegant way that allows the images to be interpreted by many programs, and to be further manipulated to learn more about their contents.

We'll start with the basics.

### Resolution and Channels

If you've spent much time around computers and digital media, you've no doubt heard the term **resolution**. Resolution is the number of **pixels** that span a 2-dimensional image, usually written as **horizontal** x **vertical**. A higher resolution means an image (or screen) has more pixels. At the same physical size, a higher-resolution image can show finer detail, or be enlarged further before the individual pixels become visible. The image below shows how much detail is available as the resolution increases while the physical size of the image stays the same:

<center><img src="https://upload.wikimedia.org/wikipedia/commons/f/f2/Resolution_illustration.png" /></center>
<center> <a href="https://en.wikipedia.org/wiki/Image_resolution">https://en.wikipedia.org/wiki/Image_resolution</a></center><br>

Resolution is a useful descriptor not just for images, but also for the screens that display them. Sometimes a resolution is given by its vertical measurement alone (e.g. 1080), or described in words (e.g. High Definition). Here are a few common resolutions you may have encountered:

- 396 x 484 Apple Watch (Series 7)
- 640 x 480 Standard Definition
- 1920 x 1080 High Definition
- 3840 x 2160 4K / Ultra HD
- 7680 x 4320 8K / Full Ultra HD

Yet another way to describe resolution, especially for image-capturing devices like cameras, is by the total number of pixels in an image. For example, an HD image has $1920 \times 1080 = 2{,}073{,}600$ pixels, or roughly 2.1 **mega**pixels (1 megapixel = $10^6$ pixels).
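The megapixel arithmetic is easy to repeat for every resolution in the list above. A short sketch (the dictionary simply retypes that list):

```python
# Resolutions from the list above, written as (horizontal, vertical)
resolutions = {
    'Apple Watch (Series 7)': (396, 484),
    'Standard Definition': (640, 480),
    'High Definition': (1920, 1080),
    '4K / Ultra HD': (3840, 2160),
    '8K / Full Ultra HD': (7680, 4320),
}

def megapixels(width: int, height: int) -> float:
    """Total pixel count expressed in megapixels (1 megapixel = 10**6 pixels)."""
    return width * height / 10**6

for name, (width, height) in resolutions.items():
    print(f'{name:24} {width} x {height} = {width * height:>10,} pixels '
          f'({megapixels(width, height):.1f} MP)')
```

Notice that 4K has exactly four times as many pixels as HD: both the horizontal and vertical counts double, so the total is multiplied by $2 \times 2 = 4$.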

We've talked about resolution in terms of pixels, but we haven't yet said *what a pixel is*. Pixels are the [smallest unit of detail available](https://en.wikipedia.org/wiki/Pixel) in a digital image: each one shows a single colour, and arranging many of them side by side forms the image. A pixel stores information about the colour it shows and, in some file formats, how transparent it is. Each piece of colour information a pixel contains is known as a **channel**, and different colour formats use different numbers of channels:

| Name | Description | Channels |
| - | - | - |
| Grayscale | Shades of grey, from black to white | 1 |
| RGB | Colour (Red/Green/Blue) | 3 |
| RGBA | Colour + Transparency (Alpha) | 4 |
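
Pillow, the image library used later in this notebook, names these formats with short mode strings: `'L'` for 8-bit grayscale, `'RGB'`, and `'RGBA'`. This sketch creates a blank 1x1 image in each mode and counts its channels (Pillow calls them "bands"):

```python
import numpy as np
from PIL import Image

# Pillow mode strings for the three formats in the table
modes = {'Grayscale': 'L', 'RGB': 'RGB', 'RGBA': 'RGBA'}

for name, mode in modes.items():
    img = Image.new(mode, (1, 1))   # a blank 1x1 image in this mode
    bands = img.getbands()          # e.g. ('R', 'G', 'B') for 'RGB'
    arr = np.array(img)             # convert to NumPy to see how the data is shaped
    print(f'{name:9} mode={mode!r:7} channels={len(bands)} '
          f'bands={bands} array shape={arr.shape}')
```

Note that a single-channel image converts to a 2-D array with no separate channel axis, while RGB and RGBA images gain a third axis of length 3 or 4.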

### Colour Depth

For an [RGB image](https://www.youtube.com/watch?v=T0jzClmP2pc), each pixel has 3 channels: one each for red, green, and blue. Each channel holds a number indicating the *intensity* of that colour, where the lowest possible value means none of that colour (black) and the highest possible value means that colour at its maximum brightness. The *range* of values available to describe a colour is known as the **colour depth**, sometimes also called the **bit depth**. Because images are stored digitally, it's convenient to describe this range in **bits**: a channel with $x$ bits can take $2^x$ different values.

A quick intro to bits: 

| Bits | Formula | Possible values |
| - | - | - |
| 1 | $2^1$ | 2 |
| 2 | $2^2$ | 4 |
| 4 | $2^4$ | 16 |
| 8 | $2^8$ | 256 |
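
The table is just repeated doubling, so it can be generated directly. One detail worth noting: because counting starts at 0, a channel with $x$ bits stores values from $0$ to $2^x - 1$, which is why 8-bit channels run from 0 to 255.

```python
def levels(bits: int) -> int:
    """Number of distinct values a channel with `bits` bits can store."""
    return 2 ** bits

for bits in (1, 2, 4, 8):
    n = levels(bits)
    print(f'{bits} bit(s): {n} values, from 0 to {n - 1}')
```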

<br>
Low colour depths tend to produce images with 'blockier' transitions between colours, while higher depths preserve the smoothness of the original image. The image below shows the effect of bit depth on a single-channel (grayscale) image:

<center><img src="https://digamation.files.wordpress.com/2008/07/digamation-bit-depth1.jpg" /></center>
<center><a href="https://digamation.wordpress.com/2008/07/18/understanding-bit-depth">https://digamation.wordpress.com/2008/07/18/understanding-bit-depth </a></center><br>

There's a noticeable difference in image quality as the colour depth changes across the image above. The left side is 1-bit, so it has only $2^1=2$ colours available: black and white. The result is a sharp contrast between the different sections of the image. The 4-bit section has $2^4=16$ shades of grey, including full black and full white. Lastly, the 8-bit section has $2^8=256$ possible values, giving a far smoother image; 8 bits per channel is also by far the most common standard on modern devices. For colour images, 8 bits for each of the three RGB channels adds up to 24-bit colour. Colour depths beyond 24-bit offer diminishing improvements in clarity while increasing file size, but some image editing software (and graphics connector specifications) support 30-bit or even 48-bit colour (10 or 16 bits per channel).
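The banding effect can be imitated numerically. The sketch below is only an assumption about how such an image could be produced, not how the pictured example was made: it takes a smooth 8-bit gradient and snaps every value to one of $2^x$ evenly spaced levels (simple uniform quantization), then counts how many shades survive.

```python
import numpy as np

def quantize(values: np.ndarray, bits: int) -> np.ndarray:
    """Reduce 8-bit values (0-255) to 2**bits evenly spaced levels, rescaled back to 0-255."""
    levels = 2 ** bits
    # Which of the `levels` bins each value falls into: 0 .. levels - 1
    index = (values.astype(np.int64) * levels) // 256
    # Spread the bins back over 0-255 so the lowest bin is black (0) and the highest is white (255)
    return (index * 255 // (levels - 1)).astype(np.uint8)

gradient = np.arange(256, dtype=np.uint8)  # one row containing every 8-bit grey value
for bits in (1, 4, 8):
    shades = np.unique(quantize(gradient, bits))
    print(f'{bits}-bit: {len(shades)} shades, from {shades.min()} to {shades.max()}')
```

Applying `quantize` to a real grayscale photo (for example `np.array(img.convert('L'))`) and turning the result back into a picture with `Image.fromarray` should show the same kind of banding as the image above.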

The diagram below represents a single pixel within a 24-bit RGB image file. A grayscale pixel would have only one channel, but the final colour of this pixel is determined by the amounts of red, green, and blue, each with an intensity on a scale from 0 to 255:

<center><img src="img/pixel-colour-diagram.png" /></center><br>

Because 24-bit colour is such a widely adopted standard, there are many tools that let you [mix your own colours](https://www.csfieldguide.org.nz/en/interactives/rgb-mixer/) by adjusting the value of each RGB channel. Entering the RGB values from the diagram above will reproduce the colour of the pixel it shows.
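Many colour mixers also display a "hex code" such as `#ff8000`. That notation writes the 24-bit idea out directly: each 8-bit channel becomes two hexadecimal digits, and the three channels together form a single 24-bit number. A small sketch, using an arbitrary orange (255, 128, 0) as the example rather than the colour in the diagram:

```python
def rgb_to_hex(r: int, g: int, b: int) -> str:
    """Write each 8-bit channel as two hex digits, e.g. (255, 128, 0) -> '#ff8000'."""
    return f'#{r:02x}{g:02x}{b:02x}'

def pack_rgb(r: int, g: int, b: int) -> int:
    """Place red in the top 8 bits, green in the middle 8 and blue in the bottom 8 of a 24-bit number."""
    return (r << 16) | (g << 8) | b

def unpack_rgb(value: int) -> tuple[int, int, int]:
    """Recover the three 8-bit channels by shifting and masking with 0xFF (binary 11111111)."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF

orange = (255, 128, 0)
print(rgb_to_hex(*orange))              # #ff8000
print(pack_rgb(*orange))                # 16744448
print(unpack_rgb(pack_rgb(*orange)))    # (255, 128, 0)
print(f'{2 ** 24:,} possible colours')  # 16,777,216 possible colours
```

The last line is where the often-quoted "16.7 million colours" of 24-bit displays comes from: $2^8 \times 2^8 \times 2^8 = 2^{24}$.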

Another way to visualize the colours in an RGB pixel is as a cube, where each dimension is one of the 3 colours. For 8-bit colour the cube has side length 255, with all three axes starting at the origin (0, 0, 0), which is pure black. The opposite corner, (255, 255, 255), is pure white, and the dashed line between these two points is the grayscale line: every point on it has equal red, green, and blue values. The image below shows the colour from the pixel diagram again, along with the corners of the cube and the colours they represent:

<center><img src="img/colour-cube.png" /></center><br>
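The corners of the cube are every combination of 0 and 255 across the three channels, and the grey diagonal is every point where all three channels are equal. A sketch that lists the eight corners (the names are the standard ones for these RGB values):

```python
from itertools import product

# Standard names for the eight corners of the 8-bit RGB cube
corner_names = {
    (0, 0, 0): 'black',
    (255, 0, 0): 'red',
    (0, 255, 0): 'green',
    (0, 0, 255): 'blue',
    (255, 255, 0): 'yellow',
    (255, 0, 255): 'magenta',
    (0, 255, 255): 'cyan',
    (255, 255, 255): 'white',
}

# Every combination of "none" (0) or "full" (255) for each channel: 2**3 = 8 corners
corners = list(product((0, 255), repeat=3))
for corner in corners:
    print(corner, corner_names[corner])

def is_grey(r: int, g: int, b: int) -> bool:
    """A colour lies on the black-to-white diagonal exactly when R = G = B."""
    return r == g == b

print(is_grey(128, 128, 128), is_grey(255, 128, 0))  # True False
```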

Included below is a colour mixer that will allow you to play around with different 8-bit values for RGB (24-bit colour) to see what effect they have!

In [None]:
import plotly.graph_objects as go
import ipywidgets as w

# One slider per 8-bit channel (0-255)
sliderRed = w.IntSlider(value=1, min=0, max=255, step=1, description='Red')
sliderGreen = w.IntSlider(value=1, min=0, max=255, step=1, description='Green')
sliderBlue = w.IntSlider(value=1, min=0, max=255, step=1, description='Blue')

# An empty interactive figure; its plot background will display the mixed colour
fig = go.FigureWidget()

def response(change):
    """Repaint the figure background using the current slider values."""
    with fig.batch_update():
        colorMix = f'rgb({sliderRed.value}, {sliderGreen.value}, {sliderBlue.value})'
        fig.update_layout(title=f'Colour Mixer ({sliderRed.value},{sliderGreen.value},{sliderBlue.value})',
                          plot_bgcolor=colorMix,  # the plotting area shows the mixed colour
                          paper_bgcolor='white',
                          # hide the axes so only the colour remains visible
                          xaxis=dict(showgrid=False, visible=False),
                          yaxis=dict(showgrid=False, visible=False))

# Re-run response() whenever any slider value changes
sliderRed.observe(response, names="value")
sliderGreen.observe(response, names="value")
sliderBlue.observe(response, names="value")
response(None)  # draw the initial colour once

# Stack the figure and the sliders vertically
w.VBox([fig, sliderRed, sliderGreen, sliderBlue])

### Header

You might also have noticed a **header** block at the beginning of the file in the pixel diagram above. Most data formats (not just images) contain [some form of header](https://en.wikipedia.org/wiki/File_format#File_header) that describes what the file contains. For images, the header will contain (among other metadata):
- File format
- Number (and type) of channels
- Colour depth
- Image dimensions

When you use software to open an image file, it first reads the header to understand how to interpret the rest of the file, because the pixel data itself is effectively stored as one long sequence of values. By knowing the resolution (dimensions), the software knows where to end one horizontal line of the image and start the next. Some file formats also include extra markers or padding after the last pixel in each row to mark the end of a line. In the diagram above, the header would determine whether the resulting resolution is 1x4, 2x2, or even 4x1. The colour channel information within each pixel determines its final colour, and the software is then able to render the image!

The diagram above is a very simplified example of a small, single-colour image, but the process is the same for larger, more complex images. Hopefully you now understand the basics of how images are stored as data; next, we'll dive into how to manipulate them.
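The idea that the header's dimensions decide how one stream of pixel values is laid out can be sketched with Pillow's `Image.frombytes`, which builds an image from raw bytes given a mode and a (width, height) size, much as a decoder would after reading a header. The four pixel values below are made up for illustration and are not the ones in the diagram:

```python
import numpy as np
from PIL import Image

# Four hypothetical RGB pixels stored back to back: 4 pixels x 3 channels = 12 bytes
raw = bytes([
    255, 0, 0,      # red
    0, 255, 0,      # green
    0, 0, 255,      # blue
    255, 255, 255,  # white
])

# The same bytes, read with three different "header" dimensions (width, height)
for size in [(4, 1), (2, 2), (1, 4)]:
    img = Image.frombytes('RGB', size, raw)
    arr = np.array(img)  # NumPy shape is (height, width, channels)
    print(f'{size[0]}x{size[1]} -> array shape {arr.shape}, first row {arr[0].tolist()}')
```

The bytes never change; only the declared width and height decide where each row ends.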

## How Images are Stored as Data

Now that you've seen how images are represented digitally, let's load an actual image into the notebook. We'll do that with a few Python libraries, primarily the Pillow library, starting with the Callysto logo:

In [None]:
from PIL import Image

callystoLogo = Image.open('img/Callysto_Icon.png')
display(callystoLogo)
print(f'Image resolution: {callystoLogo.size}')

Wow, that's a large image!

Of course, in the graphic design world it's generally more useful to start with a high-resolution image and downscale it as needed for the application. It's much more difficult to increase the resolution of an image without introducing some form of artifact.

For our purposes, an image this size could be quite unwieldy, so we'll reduce the resolution to 256x256. Note that the original image isn't square: it's approximately 5% wider than it is tall (2361 / 2254 × 100 ≈ 104.7%), so forcing it into a square shape compresses it slightly horizontally.

In [None]:
logoSmall = callystoLogo.resize((256,256))
logoSmall
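If squashing the logo is undesirable, Pillow's `Image.thumbnail` offers an alternative: it shrinks an image in place so that it fits inside a given box while preserving the aspect ratio. The sketch below uses a blank stand-in image with the logo's dimensions so it runs on its own; with the real logo you would call `thumbnail` on a copy of `callystoLogo`.

```python
from PIL import Image

# A blank stand-in with the same dimensions as the original logo (2361 x 2254)
stand_in = Image.new('RGBA', (2361, 2254))

print(f'Aspect ratio (width / height): {2361 / 2254:.3f}')

# thumbnail() shrinks the image *in place* so it fits inside 256 x 256 while
# keeping the width-to-height ratio, so only the longer side becomes 256
preview = stand_in.copy()
preview.thumbnail((256, 256))
print(f'Aspect-preserving size: {preview.size}')

# resize() uses exactly the size it is given, squashing the image if needed
print(f'Forced square size: {stand_in.resize((256, 256)).size}')
```

The trade-off is that the aspect-preserving version is no longer square, which can matter later if a program expects every image to have identical dimensions.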

Now that we know the resolution of our image, what can we learn about its colour channels? Let's see what we're working with:

In [None]:
logoSmall.mode

We can see that this image has 4 colour channels, one each for red, green, and blue, as well as a fourth channel for transparency (alpha). Depending on the colour scheme of the program you're using to read this notebook, the transparency might not be obvious. If you have the ability to switch back and forth between light and dark themes, you should be able to see that the background (i.e. outside the overlapping circles) will match whatever colour theme is being used by your software.

The `mode` label 'RGBA' also indicates that each channel has an 8-bit depth, according to Pillow's [documentation](https://pillow.readthedocs.io/en/stable/handbook/concepts.html#modes).

We can confirm that the fourth channel controls transparency by converting the image from 'RGBA' to 'RGB' (still 8 bits per channel, but without the transparency channel). A background should now be visible:

In [None]:
logoSmall.convert(mode='RGB')

We can also combine the colour channels into a single weighted average, forcing the image into 8-bit grayscale:

In [None]:
logoSmall.convert(mode='L')
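Pillow's documentation gives the formula behind this conversion: the ITU-R 601-2 luma transform, $L = 0.299R + 0.587G + 0.114B$. Green gets the largest weight because our eyes are most sensitive to it. The sketch below compares Pillow's result with that formula computed by hand, and with a plain unweighted average, on three made-up pixels. Pillow works in integer arithmetic internally, so its values may differ from the hand calculation by 1.

```python
import numpy as np
from PIL import Image

# A hypothetical 1x3 RGB image: pure red, pure green, pure blue
rgb = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)
img = Image.fromarray(rgb)  # Pillow infers mode 'RGB' from a uint8 (H, W, 3) array

# Pillow's own grayscale conversion
pillow_gray = np.array(img.convert('L'))

# The same weighted average computed by hand with the ITU-R 601-2 luma weights
weights = np.array([0.299, 0.587, 0.114])
manual_gray = np.round(rgb @ weights).astype(np.uint8)

# A plain (unweighted) average treats all three colours as equally bright
plain_avg = rgb.mean(axis=2).astype(np.uint8)

print('Pillow L   :', pillow_gray.tolist())
print('By hand    :', manual_gray.tolist())
print('Plain mean :', plain_avg.tolist())
```

With the plain average, pure red, green, and blue would all become the same grey; the weighted version keeps green noticeably brighter than red, and red brighter than blue.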

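Finally, we can reduce it to 1-bit black and white. Even though each pixel can take only one of two values, varying the density of black and white pixels gives the impression of more than two shades. This technique is called dithering, and Pillow applies it (Floyd-Steinberg dithering) by default when converting to mode '1':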

In [None]:
logoSmall.convert(mode='1')

Next up, let's look at what the underlying data actually looks like. Before we do, though, we'll downsize our image even further to 32x32, or the resulting data won't be very easy to read! We'll also drop the transparency channel.

In [None]:
logoTiny = logoSmall.resize(size=(32,32)).convert(mode='RGB')
logoTiny

Let's take a look at the image's pixel data. The output might be a little difficult to interpret, but each level of the image is 'grouped' with square brackets, `[ ]`. In the descriptions below, we've added extra spaces between the brackets to make the separation more obvious!

Working from the inside out, you have the RGB colour values for each pixel: `[#, #, #]`

Next, the pixels in each row are grouped together: `[ [#, #, #], ..., [#, #, #] ]`

Finally, the rows are grouped to form a 2D grid of pixels (stored as a 3D array of numbers: rows × columns × channels): `[ [ [#, #, #], ..., [#, #, #] ], ..., [ [#, #, #], ..., [#, #, #] ] ]`

Here's the first row:

In [None]:
import numpy as np
logoArray = np.array(logoTiny)
with np.printoptions(threshold=np.inf):
    display(logoArray[:1])

As you can see, the first row has only two unique values: pure black (`[0, 0, 0]`) or pure white (`[255, 255, 255]`).
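NumPy's ordering is worth pinning down, because it differs from Pillow's: Pillow's `size` is (width, height), while a NumPy array is indexed `[row, column, channel]`. A small hand-made image (hypothetical values) makes the difference visible, and `np.unique(..., axis=0)` counts the distinct colours in a row. Running `np.unique(logoArray[0], axis=0)` on the logo should confirm the two values above.

```python
import numpy as np
from PIL import Image

# A hypothetical image with 2 rows and 3 columns:
# the top row alternates black and white, the bottom row is red, green, blue
pixels = np.array([
    [[0, 0, 0], [255, 255, 255], [0, 0, 0]],
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],
], dtype=np.uint8)
img = Image.fromarray(pixels)

print(img.size)      # Pillow reports (width, height): (3, 2)
print(pixels.shape)  # NumPy reports (rows, columns, channels): (2, 3, 3)

# Index as [row, column]: the pixel in row 1, column 0 is red
print(pixels[1, 0].tolist())  # [255, 0, 0]

# Unique colours in the top row (axis=0 treats each [R, G, B] triple as one item)
print(np.unique(pixels[0], axis=0).tolist())  # [[0, 0, 0], [255, 255, 255]]
```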

We can crop the first row out of the image to compare it with the underlying numbers:

In [None]:
line = logoTiny.crop((0, 0, 32, 1))
line

That's somewhat hard to see, so let's blow it up 16x to make it easier:

In [None]:
line.resize((512,16), resample=Image.BOX) # BOX resample doesn't blend edges when upscaling
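For a whole-number scale factor such as 16x, this blocky enlargement amounts to copying every pixel into a 16 x 16 square. The sketch below does that with NumPy's `np.repeat` on a random stand-in strip (test data only, not the logo) and compares the result with Pillow's `Image.BOX` resize:

```python
import numpy as np
from PIL import Image

rng = np.random.default_rng(0)
# A random stand-in for a strip 1 pixel tall and 32 pixels wide, RGB
row = rng.integers(0, 256, size=(1, 32, 3), dtype=np.uint8)

scale = 16
# Repeat each pixel 16 times down (axis 0) and 16 times across (axis 1)
blocky = np.repeat(np.repeat(row, scale, axis=0), scale, axis=1)

# Pillow's version: resize to (width, height) = (512, 16) with the BOX filter
pillow = np.array(Image.fromarray(row).resize((32 * scale, 1 * scale), resample=Image.BOX))

print(blocky.shape)  # (16, 512, 3)
print(np.array_equal(blocky, pillow))
```

For non-integer scale factors the two approaches would no longer line up, since the BOX filter then averages neighbouring pixels near the block edges.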

To appreciate the colour values in the logo, let's take a slice through the middle of the logo and do the same process:

In [None]:
with np.printoptions(threshold=np.inf):
    display(logoArray[15:16])

In [None]:
line2 = logoTiny.crop((0, 15, 32, 16))
line2.resize((512,16), resample=Image.BOX)

Try playing around with the colour mixer above to recreate the colours in the image slice!

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)