<a href="https://colab.research.google.com/github/besmets/RSE_course/blob/main/RSE_Lecture_01_image_processing_with_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# REMOTE SENSING OF THE ENVIRONMENT - LECTURE 1
# Introduction to raster images and image manipulation with Python
(c) Vrije Universiteit Brussel, Prof. Dr. Benoît SMETS - 2023-2024

------------
<br>

### Objectives of this tutorial:
1. TOOL: Learn how to use notebooks and Google Colab;  
2. TOOL: Learn basic Python functions to manipulate (geospatial) raster images;  
3. PROCESSING: Discover the effect(s) of image characteristics on image processing.

### PREREQUISITES
- Computer with internet connection and web browser.
- Google account (Gmail).

<br>
To make this tutorial work, run every code cell in order, from the first to the last one (press SHIFT+ENTER to run a cell).  

<br>

------------

## PART 1: Google Colab, notebooks and Python

To start using this notebook, first copy it to your Drive.  
--> On the upper toolbar, **click on "Copy to Drive"**.  

Next, we need to store the data we will use on your Google Drive, so that you can access them directly from the notebook.  

Follow these steps:  

1) In the [GitHub repository of the RSE course](https://github.com/besmets/RSE_course), copy the URL of the folder where the exercise data are stored.  
2) Open https://download-directory.github.io/ and paste the URL. Hit Enter. The folder containing the data is downloaded as a zip file.  
3) Unzip the file and move it to your Google Drive.  

Now, use Python to mount Google Drive (STEP 1) and create a ***variable*** corresponding to the directory path where the data are located (STEP 2).


In [None]:
# import useful packages and functions
from google.colab import drive
from google.colab import files

# STEP 1
drive.mount('/content/gdrive')
  # This line will ask you the permission to connect to your Google Drive. Follow the steps to grant the access.

# STEP 2
lecture01_path = '/content/gdrive/My Drive/__RSE_colab_data/Lecture_01'
  # Change directory path to the one of your project folder!

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).
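
Before going further, it can help to confirm that the path variable really points to an existing folder. The helper below is only a sketch: `check_data_folder` is a hypothetical name that does not belong to Colab or to this course, and it relies only on the standard `os` module.

```python
import os

def check_data_folder(path):
    """Return the sorted content of a folder, or raise a clear error if it is missing."""
    if not os.path.isdir(path):
        raise FileNotFoundError(
            f"Data folder not found: {path!r}. Check the path and that Drive is mounted."
        )
    return sorted(os.listdir(path))

# Hypothetical usage in this notebook (uncomment once Drive is mounted):
# print(check_data_folder(lecture01_path))
```

If the call raises an error, the most likely causes are a typo in the folder name or a Drive that is not mounted yet.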


You are now ready to start using Google Colab for this lecture. Let's play with images!

(end of Part 1)
<br>
-----------------

## PART 2: Introduction to image visualisation with Python

### 2.1.  Opening and displaying an image

Several Python packages exist to read images. We will now test some of them to understand how to handle this type of data with Python, and we will display these images in different ways.  

**Opening and displaying a simple JPEG image with matplotlib**  
A simple way to read and display an image is to use the popular plotting package MATPLOTLIB  
<span style="color:Gray">*(further information on matplotlib --> https://matplotlib.org/)*</span>

In [None]:
# First, load the useful matplotlib modules
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

# Load the image (adapt the directory path!)
directory = f"{lecture01_path}/JPG/"
image_name = 'VUB_logo.jpg'
image_path = f"{directory}{image_name}"
image_matplotlib = mpimg.imread(image_path)

# Display the loaded image
plt.imshow(image_matplotlib)

**Opening and displaying a simple JPEG image with Pillow (PIL)**  
Let's now use a package more specifically dedicated to images!  
<span style="color:Gray">*(further information on Pillow --> https://python-pillow.org/)*</span>

In [None]:
# First, load the useful Pillow module
from PIL import Image

# Load the image
directory = f"{lecture01_path}/JPG/"
image_name = 'VUB_logo.jpg'
image_path = f"{directory}{image_name}"
image_pil = Image.open(image_path)

# Display the loaded image
display(image_pil)  # To display within the Jupyter notebook
#image_pil.show()  # To display in a separate window

**Opening and displaying a simple JPEG image with OpenCV (cv2)**  
OpenCV is a very powerful library primarily dedicated to Computer Vision. It contains cutting-edge tools and algorithms for image processing, like the template-matching techniques used in digital image correlation (DIC; also called "pixel-offset tracking"). OpenCV can also be used for more basic image reading and processing.  
<span style="color:Gray">*(further information on OpenCV --> https://opencv.org/ and https://pypi.org/project/opencv-python/)*</span>

In [None]:
# First, load OpenCV
import cv2
import matplotlib.pyplot as plt  # used for the display, here

# Load the image
directory = f"{lecture01_path}/JPG/"
image_name = 'VUB_logo.jpg'
image_path = f"{directory}{image_name}"
image_cv = cv2.imread(image_path)

# Display the loaded image
plt.imshow(image_cv)

<span style="color:firebrick">... oops! The image does display, but the colors are weird.</span>  

<span style="color:firebrick">Do you have any clue on the reason why?</span>  

<span style="color:firebrick">*--> Try to find the answer and wait for instructions before filling in the code cell below!*</span>   

<br>

**SOLUTION:**  
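
One likely explanation, given here as an assumption about what the exercise is looking for: `cv2.imread` returns the channels in Blue-Green-Red (BGR) order, while `plt.imshow` expects Red-Green-Blue (RGB). The sketch below uses a tiny synthetic array instead of the logo, so it runs without any file, and swaps the channels with NumPy only.

```python
import numpy as np

# Synthetic 1 x 2 image in OpenCV's BGR order:
# the first pixel is pure blue, the second pixel is pure red
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reverse the last axis to go from BGR to RGB
rgb = bgr[:, :, ::-1]

print("Blue pixel in RGB order:", rgb[0, 0])
print("Red pixel in RGB order :", rgb[0, 1])

# With OpenCV itself, the equivalent conversion is:
# rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
```

Applied to `image_cv`, the converted array can then be passed to `plt.imshow` and should show the expected colors.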

### 2.2.  Reading image metadata

Displaying an image with Python is a first step, but we need to do more. An important step usually performed is to read the characteristics of images, e.g., their dimensions, resolution, pixel depth/encoding, etc.  

Here, we will see how to do it with the logo of the VUB, and more specifically how to extract useful values that can be used in our processing afterwards.  

**Reading image dimensions**

In [None]:
# With pillow
width, height = image_pil.size
print('Image width with Pillow  : ' + str(width))
print('Image height with Pillow : ' + str(height))
print('--------------------------------------------------')

# With OpenCV
dimensions = image_cv.shape
height = image_cv.shape[0]
width = image_cv.shape[1]
bands = image_cv.shape[2]
print('Image dimensions with OpenCV      : ', dimensions)
print('Image width with OpenCV           : ', width)
print('Image height with OpenCV          : ', height)
print('Number of image bands with OpenCV : ', bands)

**Reading image metadata**  
Images usually contain metadata with useful information. Let's see what it looks like...

In [None]:
# Import the missing Pillow module
from PIL.ExifTags import TAGS

# Select a photograph, this time
photo_name = 'DSC_0907.JPG'
photo = Image.open(directory + photo_name)

# Extracting EXIF data
exifdata = photo.getexif()

# Make the EXIF info readable
for tag_id in exifdata:
    # get the tag name
    tag = TAGS.get(tag_id, tag_id)
    data = exifdata.get(tag_id)
    # decode bytes (characters that cannot be decoded are replaced instead of raising an error)
    if isinstance(data, bytes):
        data = data.decode(errors='replace')
    print(f"{tag:25}: {data}")

Try to understand the meaning of the displayed parameters!  
- What is the difference between ***width/length*** and ***XResolution/YResolution***?
- What does the ***datetime*** mean?
- What does the ***BitsPerSample*** mean?
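
As a hint for the first question, the sketch below relates the two kinds of values: the dimensions count pixels, whereas the resolution tells how many pixels fit in one unit of length when the image is printed. It assumes that the resolution is expressed in dots per inch (the case when the EXIF `ResolutionUnit` tag equals 2). The function name and the numbers are hypothetical and are not read from `DSC_0907.JPG`.

```python
def print_size_cm(width_px, height_px, x_dpi, y_dpi):
    """Convert pixel dimensions and a resolution in dots per inch to a print size in cm."""
    inch_to_cm = 2.54
    width_cm = width_px / x_dpi * inch_to_cm
    height_cm = height_px / y_dpi * inch_to_cm
    return width_cm, height_cm

# Hypothetical photograph of 6000 x 4000 pixels printed at 300 dpi
w_cm, h_cm = print_size_cm(6000, 4000, 300, 300)
print(f"Print size: {w_cm:.1f} cm x {h_cm:.1f} cm")
```

Changing the resolution changes the print size, but not the number of pixels stored in the file.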

### 2.3.  Opening and displaying geospatial imagery

Before opening geospatial images, it is necessary to install some useful Python packages that are not necessarily available in the Colab environment.  

In [None]:
%pip install rasterio
%pip install georasters
%pip install earthpy
%pip install glob2

**a) GDAL** (https://gdal.org/index.html; https://gdal.org/python/index.html)

GDAL (<u>G</u>eospatial <u>D</u>ata <u>A</u>bstraction <u>L</u>ibrary) is a translator library for raster and vector geospatial data formats. Actually, it is two libraries – GDAL for manipulating geospatial raster data and OGR for manipulating geospatial vector data – but we’ll refer to the entire package as the GDAL library. It also comes with a variety of useful command line utilities for data translation and processing.  

GDAL is a reference library for basic geospatial raster manipulation and processing. This is the library that is actually behind the main raster functions of QGIS. A GDAL Python module exists and can be used as is. Another option is to call GDAL directly by launching a command line in a terminal (or command prompt) via a Python script and with the help of the 'subprocess' module *(not shown here)*.  
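
For readers curious about the command-line route, here is a minimal sketch of how the `gdalinfo` utility could be called with `subprocess`. The two helper functions are hypothetical names introduced for illustration, `my_image.tif` is a placeholder file name, and the sketch assumes that the GDAL command-line utilities are installed on the machine.

```python
import shutil
import subprocess

def build_gdalinfo_command(raster_path, stats=False):
    """Build the argument list for the gdalinfo command-line utility."""
    command = ["gdalinfo"]
    if stats:
        command.append("-stats")  # also compute band statistics
    command.append(raster_path)
    return command

def run_gdalinfo(raster_path, stats=False):
    """Run gdalinfo and return its text output, or None if the utility is not installed."""
    if shutil.which("gdalinfo") is None:
        return None
    result = subprocess.run(
        build_gdalinfo_command(raster_path, stats),
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Show the command that would be executed for a placeholder file
print(build_gdalinfo_command("my_image.tif", stats=True))
```

Passing the command as a list of arguments, rather than as a single string, avoids problems with spaces in paths such as `My Drive`.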

In [None]:
# Import the packages
import matplotlib.pyplot as plt
from osgeo import gdal

# Load image and specify the band (even if only one band)
directory_geo = f'{lecture01_path}/CumbreVieja/'
img_name = 'LC08_L1TP_208040_20210926_20211001_01_T1_B5.TIF'
img = gdal.Open(f"{directory_geo}{img_name}")
img_band = img.GetRasterBand(1)

# Read the selected band as a NumPy array
band_array = img_band.ReadAsArray()

# Plot the array image
fig = plt.figure()
plt.imshow(band_array, cmap='gray')   # The 'cmap' argument allows you to change the colormap used to display the image (default cmap = 'viridis')
plt.show()

**b) Rasterio** (https://rasterio.readthedocs.io/en/latest/)

Rasterio is based on GDAL and strongly simplifies the manipulation and display of geospatial raster images. It is a key module for geospatial raster manipulation with Python.

In [None]:
# Import the packages
import rasterio as rio
from rasterio.plot import show

# Load the image
directory_geo = f'{lecture01_path}/CumbreVieja/'
img_name = 'LC08_L1TP_208040_20210926_20211001_01_T1_B5.TIF'
img = rio.open(f"{directory_geo}{img_name}", 'r')   # 'r' is for 'reading mode'. It is a default argument and, consequently, it is optional to cite it in the line of code.

# Display the image
show(img, cmap='inferno')   # The 'cmap' argument allows you to change the colormap used to display the image (default cmap = 'viridis')
    # LOOK AT "https://rasterio.readthedocs.io/en/latest/api/rasterio.plot.html?highlight=rasterio.show#" FOR MORE OPTIONS!

**c) Georasters** (https://georasters.readthedocs.io/en/latest/) *!!! This is not georaster (old module) !!!*

In [None]:
# Import the packages
import georasters as gr

# Load the image
directory_geo = f'{lecture01_path}/CumbreVieja/'
img_name = 'LC08_L1TP_208040_20210926_20211001_01_T1_B5.TIF'
img = gr.from_file(f"{directory_geo}{img_name}")

# Display the image
img.plot(cmap='gray')

### 2.4.  Reading the metadata of geospatial images

**First, let's read everything!**

In [None]:
# Import useful module
from osgeo import gdal

# Image paths
directory_geo = f'{lecture01_path}/CumbreVieja/'
single_band = 'LC08_L1TP_208040_20210926_20211001_01_T1_B5.TIF'
multi_band = 'landsat_stack.tif'

# Load the single band image
single_band_img = gdal.Open(f"{directory_geo}{single_band}")

# Display all the information
print(gdal.Info(single_band_img))

**Now, let's select the information we are interested in!**

In [None]:
# Load the single band image
single_band_img = gdal.Open(f"{directory_geo}{single_band}")

# Specify the layer (band in the image)
selected_band = single_band_img.GetRasterBand(1)  # this seems useless, but a band must be specified for DataType

# Extract the pixel resolution from the geotransform
image_resolution = single_band_img.GetGeoTransform()
pixX_image = image_resolution[1]
pixY_image = -image_resolution[5]

# Display the information
print('Key information on the image')
print('----------------------------')
print("Driver = {}/{}".format(single_band_img.GetDriver().ShortName,single_band_img.GetDriver().LongName))
print("Size = {} x {} x {}".format(single_band_img.RasterXSize,single_band_img.RasterYSize,single_band_img.RasterCount))
print('X resolution = ' + str(pixX_image) + ' m')
print('Y resolution = ' + str(pixY_image) + ' m')
print('Pixel depth = {}'.format(gdal.GetDataTypeName(selected_band.DataType)))

**Finally, let's read a multi-band image!**

In [None]:
# Load the multi-band image
multi_band_img = gdal.Open(f"{directory_geo}{multi_band}")

# Specify any band to display the datatype
selected_band = multi_band_img.GetRasterBand(1)  # this seems useless, but a band must be specified for DataType

# Extract the pixel resolution from the geotransform
image_resolution = multi_band_img.GetGeoTransform()
pixX_image = image_resolution[1]
pixY_image = -image_resolution[5]

# Display the information
print('Key information on the image')
print('----------------------------')
print("Driver = {}/{}".format(multi_band_img.GetDriver().ShortName,multi_band_img.GetDriver().LongName))
print("Size = {} x {} x {}".format(multi_band_img.RasterXSize,multi_band_img.RasterYSize,multi_band_img.RasterCount))
print('X resolution = ' + str(pixX_image) + ' m')
print('Y resolution = ' + str(pixY_image) + ' m')
print('Pixel depth = {}'.format(gdal.GetDataTypeName(selected_band.DataType)))

More information here: https://automating-gis-processes.github.io/2016/Lesson7-read-raster.html  
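
The six values returned by `GetGeoTransform()` do more than give the pixel size: they link pixel positions (column, row) to map coordinates. The sketch below applies the standard GDAL affine relation. The function name and the geotransform values are hypothetical and do not come from the Landsat files used here.

```python
def pixel_to_map(geotransform, col, row):
    """Convert a pixel position (col, row) to map coordinates of its upper-left corner."""
    x = geotransform[0] + col * geotransform[1] + row * geotransform[2]
    y = geotransform[3] + col * geotransform[4] + row * geotransform[5]
    return x, y

# Hypothetical geotransform of a north-up image with 30 m pixels:
# (x origin, pixel width, row rotation, y origin, column rotation, pixel height)
gt = (200000.0, 30.0, 0.0, 3200000.0, 0.0, -30.0)

print("Upper-left corner of the image :", pixel_to_map(gt, 0, 0))
print("Pixel at column 10 and row 20  :", pixel_to_map(gt, 10, 20))
```

The pixel height is negative because row numbers increase downwards while northing decreases, which is why the cells above use `-image_resolution[5]` to report a positive Y resolution.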

### 2.5.  Influence of pixel depth on file size

In [None]:
import os
from osgeo import gdal
from osgeo import gdalconst


# Load a Landsat band (UInt16)
directory = f'{lecture01_path}/CumbreVieja/'
single_band = 'LC08_L1TP_208040_20210926_20211001_01_T1_B5.TIF'
input_image = f"{directory}{single_band}"


# Create a UInt8 image (here, save the new image in the input folder!)
# Note: no rescaling is applied here, so DN values above 255 are clipped to 255
output_image = f"{directory}band5_Byte.tif"
gdal.Translate(output_image, input_image, format='GTiff', outputType=gdalconst.GDT_Byte)

# Display the file sizes
file_size_UInt16 = os.path.getsize(input_image)
file_size_Byte = os.path.getsize(output_image)
print('-> UInt16 image = ' + str(round(file_size_UInt16/(1024*1024), 3)) + ' MB')
print('-> Byte (or UInt8) image = ' + str(file_size_Byte) + ' B' + ' or ' + str(round(file_size_Byte/1024, 1)) + ' KB' + ' or ' + str(round(file_size_Byte/(1024*1024), 3)) + ' MB')

-> UInt16 image = 1.669 MB
-> Byte (or UInt8) image = 875020 B or 854.5 KB or 0.834 MB
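
The output above shows the Byte file at roughly half the size of the UInt16 file (0.834 MB against 1.669 MB). This matches the arithmetic for uncompressed pixel data, sketched below with a hypothetical helper and hypothetical image dimensions. File headers and compression are ignored.

```python
def raw_raster_size_bytes(width, height, bands, bits_per_pixel):
    """Size of the uncompressed pixel data, in bytes."""
    return width * height * bands * bits_per_pixel // 8

# Hypothetical single-band image of 1000 x 1000 pixels
size_uint16 = raw_raster_size_bytes(1000, 1000, 1, 16)
size_uint8 = raw_raster_size_bytes(1000, 1000, 1, 8)

print("UInt16:", size_uint16, "bytes")
print("UInt8 :", size_uint8, "bytes")
print("Ratio :", size_uint16 / size_uint8)
```

Halving the pixel depth halves the storage, but it also reduces the number of distinct values a pixel can take from 65536 to 256.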


(end of Part 2)
<br>
-------------

## PART 3: Image Histogram, stretching and equalization

### 3.1.  Computing and displaying image histograms

In [None]:
import rasterio as rio
from rasterio.plot import show_hist
import matplotlib.pyplot as plt

# Image paths
directory = f'{lecture01_path}/CumbreVieja/'
single_band = 'LC08_L1TP_208040_20210926_20211001_01_T1_B5.TIF'
multi_band = 'landsat_stack.tif'

# Load the image
img = rio.open(f"{directory}{single_band}")

# Display the histogram
fig, axhist = plt.subplots(1, 1)
show_hist(source=img, bins=100, title='Histogram', histtype='stepfilled', alpha=0.5, ax=axhist)
axhist.get_legend().remove() # used to remove the legend within the histogram

In [None]:
## Let's display a histogram for a multi-band image

# Load the image
multi_band_img = rio.open(f"{directory}{multi_band}")

# Display the histogram
fig, axhist = plt.subplots(1, 1)
show_hist(source=multi_band_img, bins=100, stacked=False, title='Histogram', histtype='stepfilled', alpha=0.5, ax=axhist)
axhist.get_legend().remove() # used to remove the legend within the histogram

In [None]:
import cv2
import matplotlib.pyplot as plt

# Load the image and create a grayscale version
jpg_directory = f'{lecture01_path}/JPG/'
jpg_image = 'DSC_0907.JPG'
bgr_image = cv2.imread(f"{jpg_directory}{jpg_image}")

grayscale_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)

# Let's define a histogram function for a grayscale image
def draw_image_histogram(image, channels, color='k'):
    hist = cv2.calcHist([image], channels, None, [256], [0, 256])
    plt.plot(hist, color=color)
    plt.xlim([0, 256])
    plt.xlabel('DN')
    plt.ylabel('Count')
    plt.show()

# display the histogram
draw_image_histogram(grayscale_image, [0])

In [None]:
# Let's define a histogram function for a color image loaded with OpenCV (BGR order)
def show_color_histogram(image, hist_size):
    # Split the bands: OpenCV stores them in Blue-Green-Red order
    b, g, r = image[:, :, 0], image[:, :, 1], image[:, :, 2]
    # hist_size is the number of bins, given as a list (e.g. [256])
    hist_b = cv2.calcHist([b], [0], None, hist_size, [0, 256])
    hist_g = cv2.calcHist([g], [0], None, hist_size, [0, 256])
    hist_r = cv2.calcHist([r], [0], None, hist_size, [0, 256])
    plt.plot(hist_r, color='r', label="r")
    plt.plot(hist_g, color='g', label="g")
    plt.plot(hist_b, color='b', label="b")
    plt.xlabel('DN')
    plt.ylabel('Count')
    plt.legend()
    plt.show()

# Display the histograms (256 bins, i.e. one bin per DN value)
show_color_histogram(bgr_image, hist_size=[256])

### 3.2.  Histogram stretching

Contrast stretching, as the name suggests, is an image enhancement technique that improves the contrast by stretching the intensity values of an image to fill the entire dynamic range. The transformation function used is piecewise linear and monotonically increasing.

The figure below shows a typical transformation function used for contrast stretching.

![Sketch of histogram](https://i2.wp.com/theailearner.com/wp-content/uploads/2019/01/linear_Transform.png?w=365&ssl=1)

By changing the location of the points (r1, s1) and (r2, s2), we can control the shape of the transformation function. For example,

- When r1 = s1 and r2 = s2, the transformation becomes a **linear function** (the identity, which leaves the image unchanged).
- When r1 = r2, s1 = 0 and s2 = L-1, the transformation becomes a **thresholding function**.
- When (r1, s1) = (rmin, 0) and (r2, s2) = (rmax, L-1), this is known as **Min-Max Stretching**.
- When (r1, s1) = (rmin + c, 0) and (r2, s2) = (rmax - c, L-1), this is known as **Percentile Stretching**.
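
The last two cases of the list can be sketched with NumPy only. This is an illustration under stated assumptions, not the implementation used by OpenCV: the function names are hypothetical, the image is assumed to be 8-bit (L = 256), and the offset c of the percentile stretching is chosen here through the 2nd and 98th percentiles.

```python
import numpy as np

def linear_stretch(image, r_low, r_high, levels=256):
    """Map [r_low, r_high] linearly onto [0, levels - 1]; values outside are clipped."""
    if r_high <= r_low:
        raise ValueError("r_high must be larger than r_low")
    scaled = (image.astype(np.float64) - r_low) / (r_high - r_low) * (levels - 1)
    return np.clip(np.round(scaled), 0, levels - 1).astype(np.uint8)

def min_max_stretch(image):
    """Min-max stretching: (r1, s1) = (rmin, 0) and (r2, s2) = (rmax, L - 1)."""
    return linear_stretch(image, image.min(), image.max())

def percentile_stretch(image, lower=2, upper=98):
    """Percentile stretching: the extreme values are saturated before stretching."""
    r_low, r_high = np.percentile(image, [lower, upper])
    return linear_stretch(image, r_low, r_high)

# Synthetic low-contrast image with DN values from 100 to 150
demo = np.arange(100, 151, dtype=np.uint8).reshape(1, -1)
stretched = min_max_stretch(demo)
saturated = percentile_stretch(demo)

print("Input range          :", demo.min(), "to", demo.max())
print("After min-max stretch:", stretched.min(), "to", stretched.max())
print("Pixels set to 0 after percentile stretch:", int((saturated == 0).sum()))
```

Compared with min-max stretching, percentile stretching sacrifices a few very dark and very bright pixels so that a handful of outliers does not dictate the contrast of the whole image.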

<br>

**MIN-MAX stretching**

With OpenCV, contrast stretching can be done using the cv2.normalize() function with norm_type=cv2.NORM_MINMAX.

In [None]:
import numpy as np

# Make two normalized versions of the image
norm_img1 = cv2.normalize(grayscale_image, None, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)
norm_img2 = cv2.normalize(grayscale_image, None, alpha=0.5, beta=2, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)
    # With NORM_MINMAX, alpha is the lower boundary of the output range
    # and beta is the upper boundary of the output range


# scale to uint8
norm_img1 = (255*norm_img1).astype(np.uint8)
norm_img2 = np.clip(norm_img2, 0, 1)   # values above 1 are saturated (displayed as white)
norm_img2 = (255*norm_img2).astype(np.uint8)

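# display input and both output images
# (vmin/vmax fix the gray scale to the 8-bit range; otherwise matplotlib
#  rescales each image to its own minimum and maximum before display)
plt.title('Original')
plt.imshow(grayscale_image, cmap='gray', vmin=0, vmax=255)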

In [None]:
plt.title('Just normalized')
plt.imshow(norm_img1, cmap='gray', vmin=0, vmax=255)

In [None]:
plt.title('min-max stretching')
plt.imshow(norm_img2, cmap='gray', vmin=0, vmax=255)

**--> Modify the alpha and beta values for 'norm_img2' and watch the changes!**

### 3.3.  Histogram equalization

In [None]:
# Equalize the histogram of the grayscale image
equalized_grayscale = cv2.equalizeHist(grayscale_image)

# Display the equalized version of the image
plt.imshow(equalized_grayscale, cmap='gray')

In [None]:
# Display the histograms
draw_image_histogram(grayscale_image, [0])
draw_image_histogram(equalized_grayscale, [0])
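
To see what happens behind the equalization, here is a minimal NumPy sketch based on the cumulative histogram of the image. It is a simplified illustration for 8-bit grayscale images, with a hypothetical function name, and its results may differ from `cv2.equalizeHist` in rounding details.

```python
import numpy as np

def equalize_histogram(image):
    """Equalize an 8-bit grayscale image using its cumulative histogram."""
    hist = np.bincount(image.ravel(), minlength=256)  # number of pixels per DN
    cdf = hist.cumsum()                               # cumulative number of pixels
    cdf_min = cdf[cdf > 0][0]                         # cumulative count at the darkest DN present
    total = image.size
    if total == cdf_min:                              # constant image: nothing to equalize
        return image.copy()
    # Look-up table that spreads the cumulative counts over the 0-255 range
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[image]

# Tiny synthetic image with DN values squeezed between 50 and 80
demo = np.array([[50, 50, 60], [60, 70, 80]], dtype=np.uint8)
equalized_demo = equalize_histogram(demo)
print(equalized_demo)
```

Unlike linear stretching, the mapping depends on how many pixels share each DN value: frequent values are spread further apart than rare ones.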

(end of Part 3)
<br>
-------------

### RECOMMENDATIONS

<b><u>GDAL</u></b>

Do not hesitate to learn some GDAL options by yourself. An alternative way of understanding GDAL is to use GDAL tools in QGIS. When the tool's window opens and parameters are selected, there is a cell in the window showing the command line used for the GDAL processing. Pretty useful to understand the tools and get to know the required arguments!

<b><u>Rasterio</u></b>

If you would like to discover additional functions of rasterio and play a bit more with raster data, I suggest consulting the following links:
- https://atmamani.github.io/cheatsheets/open-geo/open-geo-raster-1/
- https://kodu.ut.ee/~kmoch/geopython2019/L4/raster.html  

Based on the information provided, write your own script and play with the data.

<b><u>Finding solutions by yourself</u></b>

If your script does not work or if you would like to add options you don't know, **"Google" your question**! If you have a problem or question, chances are very high that someone had it before you. Python is very well documented on the internet. There are lots of websites, tutorials and forums for programmers, where you can find a solution to your problem. Actually, even advanced users "google" their questions and copy-paste some lines of code they find on the internet. "This is the way!" ;-)

Keep in mind that **the best way to learn Python and its modules (here, for geospatial analysis) is to practice as much as possible.**

------------------