# Mathematics of Images

Throughout human history, people have used images to convey information more quickly and more permanently than speech (or, later, text) could. As they say, a picture is worth a thousand words, and as we'll see, the actual information contained in a digital image file is much more than that. Though it might initially seem strange to reduce something so visual to a series of numbers, we'll show here that it can be done in a surprisingly elegant way that allows images to be interpreted by many programs, and to be further manipulated to learn more about their contents.

We'll start with the basics.

### Resolution and Channels

If you've spent much time around computers and digital media, you've no doubt heard the term **resolution** thrown around. Resolution describes the number of **pixels** that span a 2-dimensional image, conventionally written as **horizontal** x **vertical**. A higher resolution indicates that an image (or screen) has more pixels and finer detail. Sometimes the resolution is given solely by its vertical pixel count (e.g. 1080), or it can be described in words (e.g. High Definition).

<center><img src="https://upload.wikimedia.org/wikipedia/commons/f/f2/Resolution_illustration.png" /></center>
<center> <a href="https://en.wikipedia.org/wiki/Image_resolution">https://en.wikipedia.org/wiki/Image_resolution </a></center>

Here are a few common resolutions you may have encountered:

- 396 x 484 Apple Watch (Series 7)
- 640 x 480 Standard Definition
- 1920 x 1080 High Definition
- 3840 x 2160 4K / Ultra HD
- 7680 x 4320 8K / Full Ultra HD

Yet another way to describe resolution, often used for image-capturing devices like cameras, is to give the total number of pixels the image contains. For example, an HD image has $1920 \times 1080 = 2{,}073{,}600$ pixels, or roughly 2.1 **mega**pixels. Many numbers have been offered as the resolution at which the human eye can no longer tell the difference between an image and real life, with some estimates putting it as high as [576 megapixels](https://clarkvision.com/articles/eye-resolution.html). For reference, the largest TVs available on the market right now have a resolution of 8K, which is equivalent to 33.2 megapixels! We're still a long way from creating screens that approach the limits of human ability.
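The megapixel arithmetic can be checked in a few lines of Python. This is a minimal sketch: the `RESOLUTIONS` dictionary and the `megapixels` helper are names made up for illustration, and the entries are taken from the list of common resolutions above.

```python
# Common resolutions as (horizontal, vertical) pixel counts.
RESOLUTIONS = {
    "Standard Definition": (640, 480),
    "High Definition": (1920, 1080),
    "4K / Ultra HD": (3840, 2160),
    "8K / Full Ultra HD": (7680, 4320),
}


def megapixels(width, height):
    """Return the total pixel count in millions of pixels."""
    return width * height / 1_000_000


for name, (width, height) in RESOLUTIONS.items():
    total = width * height
    print(f"{name}: {width} x {height} = {total:,} pixels "
          f"({megapixels(width, height):.1f} megapixels)")
```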

So now that we know how the resolution describes the number of pixels within an image, what exactly is a pixel? Well, a pixel is the [smallest unit of detail available](https://en.wikipedia.org/wiki/Pixel) in a digital image. The pixel itself contains information about what colours it shows, or in some file formats, whether it's visible at all. Each piece of colour information a pixel contains is known as a **channel**. Different colour formats have a different number of channels:

| Name | Description | Channels |
| - | - | - |
| Grayscale | Black and white | 1 Channel |
| RGB | Colour (Red/Green/Blue) | 3 Channels |
| RGBA | Colour + Transparency (Alpha) | 4 Channels |
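One way to picture channels in code is to treat each pixel as a tuple with one number per channel. The sketch below assumes 8 bits per channel (values from 0 to 255); the example pixel values and the `values_per_image` helper are illustrative, not part of any particular file format.

```python
# One example pixel in each format, one number per channel.
grayscale_pixel = (128,)           # 1 channel: intensity
rgb_pixel = (255, 165, 0)          # 3 channels: red, green, blue
rgba_pixel = (255, 165, 0, 128)    # 4 channels: red, green, blue, alpha


def values_per_image(width, height, channels):
    """Count the numbers needed to describe every pixel of an uncompressed image."""
    return width * height * channels


for name, pixel in [("Grayscale", grayscale_pixel),
                    ("RGB", rgb_pixel),
                    ("RGBA", rgba_pixel)]:
    count = values_per_image(1920, 1080, len(pixel))
    print(f"{name}: {len(pixel)} channel(s), {count:,} values for a 1920 x 1080 image")
```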

### Colour Depth

For an RGB image, each pixel has 3 channels: one each for red, green, and blue. Each channel holds a number that indicates the *intensity* of that colour, with the lowest possible value meaning none of that colour (black) and the highest meaning its maximum brightness. The *range* of values available to describe the colour is known as the **colour depth**, sometimes also called the **bit depth**. For example, an image with a colour depth of *8 bits per channel* has $2^8 = 256$ possible values for each channel. Three channels of 8 bits each is a standard known as **24-bit colour**, and it is by far the most common colour depth on modern devices. Low colour depths tend to produce images with 'blockier' transitions between colours, while higher depths preserve the smoothness of the original image. The image below shows the effect of bit depth on a single-channel (grayscale) image:

<center><img src="https://digamation.files.wordpress.com/2008/07/digamation-bit-depth1.jpg" /></center>
<center><a href="https://digamation.wordpress.com/2008/07/18/understanding-bit-depth">https://digamation.wordpress.com/2008/07/18/understanding-bit-depth </a></center><br>

There's a noticeable difference in image quality as the colour depth increases across the image above. The left side of the image is 1-bit, so it has only $2^1=2$ colours available to detail the image: black and white. The result is a sharp contrast between the different sections of the image. The 4-bit colour depth produces $2^4=16$ shades of grey, including full black and full white. Lastly, as shown earlier, 8-bit has 256 possible values, giving a far clearer image. Colour depths beyond 24-bit (8 bits per channel) give diminishing returns in clarity, but some image editing software (and graphics connector specifications) support 30-bit or even 48-bit colour depths.
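The effect of a lower bit depth can be imitated on a single 8-bit intensity value. The sketch below is one simple way to do it, not necessarily how the image above was produced: the hypothetical `quantize` helper sorts each value from 0 to 255 into one of $2^{bits}$ evenly sized bins, then stretches the bin numbers back over 0 to 255 so that the darkest bin stays black and the brightest stays white.

```python
def levels(bits):
    """Number of distinct values a channel with `bits` bits can hold."""
    return 2 ** bits


def quantize(value, bits):
    """Reduce an 8-bit intensity (0-255) to `bits` bits, rescaled to 0-255."""
    count = levels(bits)
    # Sort the 256 input values into `count` evenly sized bins.
    index = value * count // 256
    # Stretch the bin numbers back over 0-255 for display.
    return round(index * 255 / (count - 1))


for bits in (1, 4, 8):
    shades = sorted({quantize(v, bits) for v in range(256)})
    print(f"{bits}-bit: {len(shades)} shades, intensity 200 becomes {quantize(200, bits)}")

# Three 8-bit channels together give the 24-bit colour count.
print(f"24-bit colour: {levels(8) ** 3:,} possible colours")
```

Nearby input values collapse onto the same output shade at low bit depths, which is the source of the 'blocky' transitions.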

The next diagram represents a single pixel within a 24-bit RGB image file. Whereas a grayscale image would have only one channel, the final colour of the pixel below is determined by the amounts of red, green, and blue, each with an intensity on a scale from 0 to 255:

<center><img src="img/pixel-colour-diagram.png" /></center><br>

As 24-bit is such a widely adopted standard, there are multiple resources where you can [mix your own colours](https://www.csfieldguide.org.nz/en/interactives/rgb-mixer/) by adjusting the values of each of the RGB channels. Entering the RGB values for the individual channels above will produce the same colour as the pixel in the diagram.

Another way to visualize the colours in an RGB pixel is by considering a cube, where each dimension is one of the 3 colours. The cube has side length 255 (for 8-bit colour), with all three colour axes originating at (0, 0, 0), which is pure black. The opposite corner, (255, 255, 255), is pure white, and the dashed line between the two points is the grayscale line in RGB, where all three channels are equal. Shown again is the colour from the pixel in the previous image, along with the corners of the cube and the colours they represent:

<center><img src="img/colour-cube.png" /></center><br>
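The same cube can be written down as a small lookup of its eight corners, assuming 8-bit channels and (red, green, blue) ordering. The names `CORNERS` and `is_grey` are illustrative only; a point sits on the grayscale diagonal exactly when its three channel values are equal.

```python
# The eight corners of the 8-bit RGB cube, keyed by (red, green, blue).
CORNERS = {
    (0, 0, 0): "black",
    (255, 0, 0): "red",
    (0, 255, 0): "green",
    (0, 0, 255): "blue",
    (255, 255, 0): "yellow",
    (255, 0, 255): "magenta",
    (0, 255, 255): "cyan",
    (255, 255, 255): "white",
}


def is_grey(pixel):
    """Return True when the pixel lies on the diagonal from black to white."""
    red, green, blue = pixel
    return red == green == blue


for pixel in [(0, 0, 0), (128, 128, 128), (255, 165, 0)]:
    label = CORNERS.get(pixel, "not a corner")
    print(f"{pixel}: {label}, on grayscale line: {is_grey(pixel)}")
```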



### Header

You might also have noticed the **header** block at the beginning of the file in the pixel diagram above. Most data formats (not just images) contain [some form of header](https://en.wikipedia.org/wiki/File_format#File_header) that describes what the file contains. For images, the header will contain (among other metadata):
- File format
- Number (and type) of channels
- Colour depth
- Image dimensions

When you use software to open an image file, it first reads the header to understand how to interpret the rest of the file, as the pixel data is effectively stored as one long sequence of values. By knowing the resolution (dimensions), the software knows where to end one horizontal line of the image and start the next. Alternatively, some file formats place a marker after the last pixel in a row to indicate the end of a line. In the image above, the header would determine whether the resulting resolution is 1x4, 2x2, or even 4x1. The colour channel information within each pixel determines its final colour, and the software is then able to render the image!
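The role of the header can be sketched with a toy example. Here the header is just a hypothetical Python dictionary holding `width`, `height`, and `channels` (real file formats encode these fields in their own binary layouts), and `to_rows` is an illustrative helper that cuts the same flat sequence of values into rows of pixels.

```python
def to_rows(flat_values, width, height, channels):
    """Split a flat list of channel values into rows of pixel tuples."""
    expected = width * height * channels
    if len(flat_values) != expected:
        raise ValueError(f"expected {expected} values, got {len(flat_values)}")
    row_length = width * channels  # how many values make up one horizontal line
    rows = []
    for start in range(0, expected, row_length):
        row_values = flat_values[start:start + row_length]
        # Group each run of `channels` values into one pixel.
        row = [tuple(row_values[i:i + channels])
               for i in range(0, row_length, channels)]
        rows.append(row)
    return rows


# Four identical RGB pixels stored as one flat sequence of 12 values.
data = [255, 165, 0] * 4

for width, height in [(4, 1), (2, 2), (1, 4)]:
    header = {"width": width, "height": height, "channels": 3}  # hypothetical header
    rows = to_rows(data, **header)
    print(f"{width}x{height}: {len(rows)} row(s) of {len(rows[0])} pixel(s)")
```

The pixel data never changes between the three cases; only the header tells the reader where each line ends.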

This is a very simplified example of a small, single-colour image, but the process is the same for larger, more complex images. With the basics of how images are stored as data in place, we can dive into how to manipulate them.