In [1]:
# john k johnstone; jkj at uab dot edu; CS104; MPL2 license
from datascience import *
import numpy as np

## Lab 10: grayscale image histogram

In this lab, you will build an image histogram and write it to a file.\
This will help you better understand file I/O, NumPy arrays, and histograms.

### Reading an image

We will read an image using OpenCV, as discussed in an earlier lab (Lab 5).\
We will need to handle the image parts of the code in a script, since
Jupyter notebooks do not interact well with OpenCV (vision and graphics modules
are often challenging for environments such as Jupyter notebooks and Visual Studio Code).

First import the OpenCV library using `import cv2 as cv`.\
Then read an image using `cv.imread`:
`cv.imread(<image filename>, <flag>)`.\
To read an image as a grayscale image, use the 0 flag.\
`imread` will return a NumPy array containing the image.\
If this image is grayscale, it will be an array of shape (height, width).\
For example, `img = cv.imread('apollo.jpg', 0)`
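
As a minimal sketch of what to expect from the returned array, the snippet below unpacks the shape of a small synthetic array. The array is an assumption: it stands in for the result of `cv.imread(..., 0)` so that the example runs without OpenCV or an image file.

```python
import numpy as np

# Hypothetical stand-in for a grayscale image returned by cv.imread(..., 0):
# 4 rows (height) by 6 columns (width) of 8-bit intensities.
img = np.zeros((4, 6), dtype=np.uint8)

# The shape of a grayscale image array is (rows, columns) = (height, width).
height, width = img.shape
print(f"{height} x {width}")
```

Note that the height comes first: NumPy indexes an image by row, then by column.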

Try it now: read apollo.jpg and report its dimensions below.

Dimension of apollo.jpg (h x w) = 584 * 640

### Building a histogram

A pixel in a grayscale image has 256 choices for intensity: 
0, 1, 2, ..., 255.\
256 choices are enough that the difference between adjacent intensities
is indistinguishable to the human eye.\
256 choices also fit perfectly into a byte of memory, which holds 8 bits.\
Since the pixel is recording light intensity (or luminance),\
0 represents black (no light) and 255 represents white (max light).


In [2]:
2**8

256

Use the type function to discover the type of a scalar in your grayscale image array.\
Report it below.

Type of the image array scalar = <class 'numpy.uint8'>

### Initializing the histogram

You will build an image histogram h:
an array that counts the pixels of each intensity.\
Clearly this array should be of size 256.\
But what type should it be?\
That is, you will initialize the histogram to `h = np.zeros(256, dtype=?)`\
Since this is a count, you want an unsigned integer type.\
NumPy offers various sizes for unsigned integers: 
np.uint8 (1 byte), np.uint16 (2 bytes), np.uint32 (4 bytes), and np.uint64 (8 bytes).\
The smallest integer that you can store in these types is 0.\
What is the largest integer you can store, for each type?\
Record your answer below.

In [3]:
max_uint8  = 255
max_uint16 = 65535
max_uint32 = 4294967295
max_uint64 = 18446744073709551615
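
One way to double-check these limits is to ask NumPy directly. The sketch below uses `np.iinfo` and compares each maximum against the formula 2**b - 1 for a b-bit unsigned integer; it is a verification aid, not part of the lab script.

```python
import numpy as np

# For a b-bit unsigned integer, the smallest value is 0
# and the largest value is 2**b - 1.
limits = {}
for dtype in (np.uint8, np.uint16, np.uint32, np.uint64):
    info = np.iinfo(dtype)
    assert info.min == 0
    assert info.max == 2**info.bits - 1
    limits[dtype.__name__] = info.max
    print(dtype.__name__, info.bits, info.max)
```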

Now express these more intuitively, using the short scale naming of large numbers (the American system): see the Wikipedia page `Names of large numbers` to learn more.\
You may be approximate, such as 'about three hundred' for the max uint8 value.

max_uint8_ballpark  = 'about three hundred'
max_uint16_ballpark = ''
max_uint32_ballpark = ''
max_uint64_ballpark = ''

You will be working with large leaf images, so you will want a scalar type that can hold counts up to 4 million.
What scalar type should you use? Record it below.

scalar type to use = np.uint32

In [4]:
import numpy as np

# Initialize the histogram
h = np.zeros(256, dtype=np.uint32)

Go ahead and initialize your histogram now in your Python script.

### Populating the histogram

Next you will build your histogram using a nested for loop, 
iterating over each pixel of the array.\
The body of the loop increments the appropriate intensity counter.\
For example, if the pixel contains the value 20, it increments hist[20],
the counter of the number of pixels of intensity 20.\
Write this code now in your script, for the apollo image.


In [5]:
from PIL import Image

# Load the Apollo image (assuming it's a grayscale image)
apollo_image = Image.open('apollo.jpg').convert('L')
apollo_array = np.array(apollo_image)

# Initialize the histogram array
hist = np.zeros(256, dtype=np.uint32)

# Build the histogram using a nested for loop
for row in apollo_array:
    for pixel in row:
        hist[pixel] += 1

# Get the number of pixels of intensity 0 and 20
nPixel0InApollo = hist[0]
nPixel20InApollo = hist[20]

print(f"Number of pixels of intensity 0: {nPixel0InApollo}")
print(f"Number of pixels of intensity 20: {nPixel20InApollo}")


Number of pixels of intensity 0: 88449
Number of pixels of intensity 20: 633
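
A nested-loop histogram can be cross-checked on a tiny image whose counts are easy to verify by eye. The sketch below assumes a hypothetical 3 x 4 array (not the apollo image) and compares the loop result against `np.bincount`, which counts occurrences of each non-negative integer value.

```python
import numpy as np

# Hypothetical 3 x 4 grayscale image with known pixel values.
img = np.array([[0, 0, 20, 255],
                [0, 20, 20, 7],
                [7, 0, 255, 0]], dtype=np.uint8)

# Nested-loop histogram, as in the lab.
hist_loop = np.zeros(256, dtype=np.uint32)
for row in img:
    for pixel in row:
        hist_loop[pixel] += 1

# Vectorized cross-check: one counter per intensity 0..255.
hist_check = np.bincount(img.ravel(), minlength=256)

assert np.array_equal(hist_loop, hist_check)
assert hist_loop.sum() == img.size  # every pixel is counted exactly once
```

The final assertion is a useful sanity check for any histogram: the counts must add up to the number of pixels.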


How many pixels of intensity 0 and 20 does the apollo image contain?

In [6]:
nPixel0InApollo  = 88449
nPixel20InApollo = 633

### Testing

Suppose that you had mistakenly used an array of type np.uint8.\
How many pixels of intensity 0 and 20 would you compute for the apollo image?

In [7]:
nPixel0InApolloIfTypeUint8  = 88449 % 256   # 129: an 8-bit counter wraps around modulo 256
nPixel20InApolloIfTypeUint8 = 633 % 256     # 121

This establishes the importance of testing your code.\
How could you find this error using testing?\
One idea would be to use a large black image, for which it is simple to calculate the correct answer for the 0-count.\
The code should also document the maximum size image it will handle.\
If you use np.uint16 type for your histogram, what is the largest square image you can handle?

In [9]:
max_hist_count = np.iinfo(np.uint16).max
maxSqSizeWithUint16 = int(np.sqrt(max_hist_count))
print(maxSqSizeWithUint16)  

255
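
The black-image test suggested above can be sketched as follows. The helper `loop_hist` and the 20 x 20 image size are assumptions chosen for illustration; the image is small so the loop runs quickly, yet it has more than 255 pixels so an 8-bit counter overflows.

```python
import numpy as np

def loop_hist(img, dtype):
    """Nested-loop histogram with a caller-chosen counter type (hypothetical helper)."""
    hist = np.zeros(256, dtype=dtype)
    # Some NumPy versions emit an overflow warning here; silence it
    # so that the wraparound itself can be observed.
    with np.errstate(over="ignore"):
        for row in img:
            for pixel in row:
                hist[pixel] += 1
    return hist

# A 20 x 20 all-black test image: the correct 0-count is 20 * 20 = 400.
black = np.zeros((20, 20), dtype=np.uint8)

good = loop_hist(black, np.uint32)
bad = loop_hist(black, np.uint8)

print(good[0])  # 400
print(bad[0])   # 144, because an 8-bit counter wraps around: 400 % 256
```

Because the expected answer is known in advance, a mismatch immediately exposes a counter type that is too small.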


### Capturing this code in a function

Now write a function (in your Python script) that computes the histogram of an image.\
Below is its docstring.
(Be sure to move back to the correct type after your experimentation above with the wrong type.)

In [14]:
import numpy as np
from PIL import Image

def luminanceHist(fn):
    """
    Build a histogram of the intensities of a grayscale image.

    Params:
    fn (str): image filename

    Returns:
    np.ndarray: histogram of grayscale intensities
    """
    # Load the image and convert to grayscale
    image = Image.open(fn).convert('L')
    image_array = np.array(image)

    # Initialize the histogram array
    hist = np.zeros(256, dtype=np.uint32)

    # Build the histogram
    for row in image_array:
        for pixel in row:
            hist[pixel] += 1

    return hist


#filename = 'Quercus_stellata_27.jpg'
filename = 'apollo.jpg'
histogram = luminanceHist(filename)
print(histogram)


[88449 13619  5321  2892  2826  1921  1731  1691  1345  1299  1142  1050
  1002   937   863   811   768   782   673   668   633   669   639   592
   623   620   569   601   611   590   592   579   646   633   594   613
   587   665   650   656   619   671   643   698   678   653   723   630
   658   720   746   729   773   786   810   811   809   859   837   881
   963   946   969   934   989  1030  1034  1028  1073  1103  1089  1139
  1121  1223  1223  1215  1283  1294  1312  1349  1403  1447  1438  1490
  1531  1557  1514  1621  1575  1612  1638  1630  1696  1588  1713  1705
  1650  1613  1654  1588  1599  1639  1581  1573  1564  1510  1476  1501
  1468  1537  1437  1487  1401  1427  1462  1471  1441  1398  1512  1377
  1427  1500  1405  1428  1363  1407  1417  1384  1375  1365  1342  1341
  1311  1374  1302  1334  1310  1320  1269  1344  1324  1295  1359  1264
  1277  1287  1280  1181  1200  1244  1304  1318  1284  1285  1294  1249
  1250  1189  1267  1237  1217  1228  1231  1231  1

Call this function on the leaf image I have given you:\
Quercus_stellata_27.jpg.
Then calculate the following values for the leaf.

In [11]:
filename = 'Quercus_stellata_27.jpg'
histogram = luminanceHist(filename)

nPixel0InLeaf = histogram[0]
nPixel20InLeaf = histogram[20]
maxCountInLeaf = np.max(histogram)

print(f"Number of pixels of intensity 0: {nPixel0InLeaf}")
print(f"Number of pixels of intensity 20: {nPixel20InLeaf}")
print(f"Maximum count in the leaf image: {maxCountInLeaf}")


Number of pixels of intensity 0: 15
Number of pixels of intensity 20: 2036
Maximum count in the leaf image: 3829909


### Writing the histogram array

Now that you have computed the histogram from the image,
you can write the data to a file, so that another piece of code
(e.g., a Jupyter notebook) can read and use it.
Since we like building histograms from tables, we will write it as a csv file, which is easy to read into a table.

### CSV file

A csv file uses comma-separated values.\
In our table csv files, the first line of the file is the names of the columns.\
We will have two columns: Intensity and Count.\
Then every subsequent row has these two values, separated by a comma.\
For example, here is the start of a csv file recording the apollo histogram.

Intensity,Count\
0,102963\
1,3312\
2,2770\
...
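
To make the layout concrete, here is a small sketch that builds the text of such a file as a single string. The three counts are hypothetical values, not taken from any real image.

```python
# Hypothetical counts for intensities 0, 1, and 2 (not from a real image).
counts = [5, 0, 3]

# The first line names the columns; each later line is one table row.
lines = ['Intensity,Count']
for intensity, count in enumerate(counts):
    lines.append(str(intensity) + ',' + str(count))

# Every line, including the last, ends with a newline character.
csv_text = '\n'.join(lines) + '\n'
print(csv_text)
```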

### Opening a file

The first step in writing a file is to open a file.\
The Python function is `open`.\
You can check that it is a built-in function, so it requires no imports.

The first parameter is the name of the file, a string.\
We will call our output file of the leaf histogram `hist_Quercus_stellata_27.csv`.\
The suffix indicates the type of the file.\
The filename indicates its contents.\
Avoid filenames like `paper.pdf` or `hw.py` or `histogram.csv`
that tell nothing about the file and do not distinguish it from other files.\
This is one place where a longer name is worthwhile.

The second parameter is the mode of opening: read, write, etc.\
The default mode is read, so we need to set the mode to write.\
We will use 'w' mode which will empty the file before writing to it.\
If we use 'a' mode, the file would be left as is and writing will append.

We will also set a parameter called `encoding`: we will use a standard encoding `utf-8`.\
(If you omit it, the default encoding depends on your platform, so it is safer to set it explicitly.)
We pass this parameter by name, as `encoding='utf-8'`,
since it is not the third positional parameter of `open`.
(Named parameters, also called keyword arguments, are a nice Python feature that many other languages lack.)

The open function returns a file handle, a pointer to the file that can be used to refer to the file in future commands.

Pulling all of this together, here is how we open the file.


In [12]:
f = open('hist_Quercus_stellata_27.csv', 'w', encoding='utf-8')
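
To see the difference between the 'w' and 'a' modes described above, the following sketch writes to a scratch file in a temporary directory. The filename `mode_demo.txt` and the helper `read_all` are hypothetical, introduced only for this demonstration.

```python
import os
import tempfile

def read_all(path):
    """Return the whole file as one string (small helper for this demo)."""
    f = open(path, 'r', encoding='utf-8')
    text = f.read()
    f.close()
    return text

# Work in a temporary directory so that no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'mode_demo.txt')  # hypothetical filename

    f = open(path, 'w', encoding='utf-8')  # 'w' creates the file (or empties it)
    f.write('first\n')
    f.close()

    f = open(path, 'a', encoding='utf-8')  # 'a' keeps the contents and appends
    f.write('second\n')
    f.close()
    after_append = read_all(path)

    f = open(path, 'w', encoding='utf-8')  # 'w' again discards what was there
    f.write('third\n')
    f.close()
    after_rewrite = read_all(path)

print(repr(after_append))
print(repr(after_rewrite))
```

Opening in 'w' mode is the right choice for the histogram file, since rerunning the script should replace the old data rather than add to it.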

### Writing a file

Output to a file must be a string, even if it is a number.\
Therefore, you will cast numbers to strings.\
For example, 25 is written as str(25).
(Reading a file will also use strings exclusively: everything is read as a string, and then cast to the appropriate type. 
In this case, the 25 would be read as '25' and cast using int().)

The file object returned by open (whose type depends on how open was called) will have a method called `write` with a string parameter.\
For example, the following code writes the number 42 to a file called `hitchhiker.txt`.

In [13]:
f = open('hitchhiker.txt', 'w', encoding='utf-8')
f.write(str(42) + '\n')
f.close()
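
Reading works the same way in reverse, as noted above: the file yields strings, which are then cast. A minimal round-trip sketch, assuming a scratch file with the hypothetical name `number_demo.txt` in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'number_demo.txt')  # hypothetical filename

    f = open(path, 'w', encoding='utf-8')
    f.write(str(25) + '\n')   # the integer 25 is written as the string '25'
    f.close()

    f = open(path, 'r', encoding='utf-8')
    text = f.readline()       # everything comes back as a string
    f.close()

# int() ignores surrounding whitespace, including the trailing newline.
value = int(text)
print(repr(text), value)
```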

### Adding whitespace to the output file

You will notice the addition of a newline character in this example.\
An interesting thing about writing to a file is that whitespace must be written explicitly.\
This is also true about output to the screen (printing), but here the addition is often implicit, since print automatically adds a newline character to the end of the string (by default).
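
The contrast between `write` and `print` can be seen side by side in the sketch below. It uses `io.StringIO`, an in-memory text stream, as a stand-in for a file so that nothing is written to disk.

```python
import io

buf = io.StringIO()            # in-memory text stream standing in for a file

buf.write('a')                 # write adds nothing: no newline
buf.write('b\n')               # the newline is written explicitly
print('c', file=buf)           # print appends '\n' by default
print('d', end='', file=buf)   # end='' suppresses the implicit newline

result = buf.getvalue()
print(repr(result))
```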

Whitespace is a very important component of output
(as indeed it is in writing code, for a different reason of readability).
Whitespace communicates important information about the data,
and is used in reading the file.\
For example, in writing a table to a csv file, each row of the file
is a different row in the table.

The standard whitespace added is spaces and newline characters.\
' ' is a space (a string of length one containing a space).\
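The newline character is more subtle: there are several conventions, and Windows expects a different newline sequence ('\r\n') than Unix systems ('\n').
But we will simply use the Unix newline character '\n' here (when a file is opened in text mode, Python translates it to the platform's convention by default).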

### Closing a file

A file should be closed after output is complete, for many reasons,
including security, memory management, and aesthetic symmetry.
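
One practical reason to close a file is that written data may sit in a buffer until the file is closed. As a hedged alternative to calling `close` by hand, Python's `with` statement closes the file automatically when the block ends, even if an error occurs inside it. The filename `with_demo.txt` below is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'with_demo.txt')  # hypothetical filename

    # The file is closed automatically at the end of the with block.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(str(42) + '\n')
    was_closed = f.closed

    with open(path, 'r', encoding='utf-8') as f:
        contents = f.read()

print(was_closed, repr(contents))
```

Either style works for this lab; the explicit `open` and `close` pair makes the symmetry visible, while `with` makes it impossible to forget the close.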

### Writing the histogram

Go ahead and write your leaf histogram now.\
Hint: your file should have 257 lines.

### Finished, fini, completo, fertig

Congratulations.