## Raster Algorithms

In the previous lab, we looked at band math and calculated [NDVI](https://en.wikipedia.org/wiki/Normalized_difference_vegetation_index)
for Landsat-8 imagery using the red and near-infrared bands.
In this lab, we walk you through a few raster data algorithms.
Some of these algorithms apply not only to geospatial raster data but to any type of image data.

Before we start with any of the raster algorithms, let's recalculate the **NDVI** from the red and near-infrared bands.
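
As a reminder, the standard NDVI definition is written out here, with $\mathrm{NIR}$ and $\mathrm{Red}$ denoting the near-infrared and red band values of a pixel:

$$
\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}}
$$

For non-negative band values the result lies between -1 and 1, and it is undefined where both bands are 0, which is why the code in this lab adds a very small constant to the denominator.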

**Note:** We are processing the downsampled bands you created in a prior lab, [Transforms](./Transforms.ipynb).

In [None]:
import rasterio
import numpy as np
from rasterio.plot import show
import matplotlib.pyplot as plt

%matplotlib inline


with rasterio.open('../temp/redband_downsampled.TIF') as band1:
    redband = band1.read()

with rasterio.open('../temp/nearIRband_downsampled.TIF') as band2:
    nirband = band2.read()

# Cast to float so the subtraction and division are not done in integer arithmetic
nirband = nirband.astype(np.float64)
redband = redband.astype(np.float64)
# The small constant in the denominator avoids 0/0 where both bands are zero
ndvi = (nirband - redband) / (nirband + redband + 1e-10)

fig = plt.figure(figsize=(6,6))
rasterio.plot.show(ndvi, cmap='YlGn')

The first raster algorithm we are going to learn, thresholding, makes it possible to visualize only the pixels with a high presence of live green vegetation.

### Thresholding

Simple [thresholding](https://en.wikipedia.org/wiki/Thresholding_%28image_processing%29) is a method that replaces each pixel in an image with a black pixel if the pixel value is less than some fixed constant _T_, or with a white pixel otherwise.

So, we can threshold the NDVI to obtain areas with live green vegetation.

From the previous section, we know that areas containing dense vegetation tend to have positive NDVI values
(say 0.3 to 0.8).
To isolate the areas containing vegetation from the rest,
we threshold the NDVI with _T = 0.3_: pixels with an NDVI value >= 0.3 are set to 1, and pixels with a value < 0.3 are set to 0.
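
To make the rule concrete, here is a minimal sketch on a tiny, made-up NDVI patch. The array values and the names `ndvi_patch` and `vegetation_mask` are assumptions for illustration only. Unlike the lab cell, this version builds a separate mask instead of overwriting the NDVI values.

```python
import numpy as np

# Hypothetical 3x3 NDVI patch (values invented for illustration).
ndvi_patch = np.array([
    [0.10, 0.35, 0.80],
    [-0.20, 0.30, 0.29],
    [0.55, 0.00, 0.45],
])

T = 0.3  # threshold used in this lab

# The comparison yields True where NDVI >= T; casting turns True/False into 1/0.
vegetation_mask = (ndvi_patch >= T).astype(np.uint8)
print(vegetation_mask)
```

Keeping the mask in its own array leaves the original NDVI values available for later steps.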

In [None]:
ndvi[ndvi >= 0.3] = 1   # vegetation
ndvi[ndvi < 0.3] = 0    # everything else
fig = plt.figure(figsize = (6,6))
rasterio.plot.show(ndvi)


In the above image, pixels with an NDVI value >= 0.3 are set to 1 and all others to 0. It can be observed that the coastal area accommodates most of the green vegetation. Apart from that, a few small islands of pixels also contain some vegetation.

---

Let's say we want to calculate the area of the major vegetative fields.
We can find the [convex hull](https://en.wikipedia.org/wiki/Convex_hull) that encloses each major vegetative area and compute its geometric area.
In our case, we have 2 such areas: one in the center of the image, the other in the lower left corner.
If we try to include all of the little islands in our convex hull, then the area of the convex hull would overestimate the actual vegetative area.
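
The overestimation can be illustrated with a small sketch. It assumes SciPy's `scipy.spatial.ConvexHull` is available; the synthetic mask and the helper `hull_area` are hypothetical and not part of the lab code. Areas are measured between pixel centers in pixel units, so they would still need to be scaled by the pixel size to get ground area.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Hypothetical binary mask: a 4x4 block of vegetation pixels.
mask = np.zeros((20, 20), dtype=np.uint8)
mask[2:6, 2:6] = 1

# Same mask plus one isolated "island" pixel far from the block.
island_mask = mask.copy()
island_mask[18, 18] = 1

def hull_area(binary_mask):
    """Area (pixel units) of the convex hull around foreground pixel centers."""
    rows, cols = np.nonzero(binary_mask)
    points = np.column_stack([cols, rows])
    # For 2-D points, ConvexHull.volume is the enclosed area
    # (ConvexHull.area would be the perimeter).
    return ConvexHull(points).volume

area_without_island = hull_area(mask)
area_with_island = hull_area(island_mask)
print(area_without_island, area_with_island)
```

A single stray pixel stretches the hull across the empty space between it and the block, so the second area is several times larger than the first.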

To get a reasonable approximation of the vegetative area,
we have to get rid of those little islands. This is where raster algorithms help.
We first clean the raster image to remove the small vegetative areas, and then recalculate the NDVI.

We use a specific algorithm called _Median Filtering_ to clean raster images.

### Median Filtering

[Median filtering](https://en.wikipedia.org/wiki/Median_filter) is a digital image processing technique
often used to remove noise from an image.
Such noise reduction is a typical pre-processing step to improve the results of later processing.

The main idea of the median filter is to run through the image pixel by pixel and replace each pixel's value with the median of the values in its neighborhood.

Let's consider the raster below.

<table border=1>
<tr><td>50 </td><td> 50 </td><td> 50 </td><td> 50 </td><td> 50 </td></tr>
<tr><td>50 </td><td> 50 </td><td> 50 </td><td> 50 </td><td> 50 </td></tr>
<tr><td>50 </td><td style="border:1px solid black"> 100 </td><td> 50 </td><td> 50 </td><td> 50 </td></tr>
<tr><td>50 </td><td> 50 </td><td> 50 </td><td> 50 </td><td> 50 </td></tr>
<tr><td>50 </td><td> 50 </td><td> 50 </td><td> 50 </td><td> 50 </td></tr>
</table>

_Note that every pixel has a value of 50, except for one pixel, which has a value of 100._

A neighborhood is the set of pixels surrounding a pixel.
Every pixel in the 3&times;3 matrix with highlighted borders is a neighbor of the center pixel. Here the center pixel is the pixel with the value `100`.

<table >
<tr><td>50 </td><td> 50 </td><td> 50 </td><td> 50 </td><td> 50 </td></tr>
<tr><td style="border:1px solid black">50 </td><td style="border:1px solid black"> 50 </td><td style="border:1px solid black"> 50 </td><td> 50 </td><td> 50 </td></tr>
<tr><td style="border:1px solid black">50 </td><td style="border:1px solid black"> 100 </td><td style="border:1px solid black"> 50 </td><td> 50 </td><td> 50 </td></tr>
<tr><td style="border:1px solid black">50 </td><td style="border:1px solid black"> 50 </td><td style="border:1px solid black"> 50 </td><td> 50 </td><td> 50 </td></tr>
<tr><td>50 </td><td> 50 </td><td> 50 </td><td> 50 </td><td> 50 </td></tr>
</table>


In median filtering, we sort the values of the highlighted 3&times;3 matrix, i.e.

<table>
<tr><td style="border:1px solid black">50 </td><td style="border:1px solid black"> 50 </td><td style="border:1px solid black"> 50 </td></tr>
<tr><td style="border:1px solid black">50 </td><td style="border:1px solid black"> 100 </td><td style="border:1px solid black"> 50 </td></tr>
<tr><td style="border:1px solid black">50 </td><td style="border:1px solid black"> 50 </td><td style="border:1px solid black"> 50 </td></tr>
</table>

Original values: 50, 50, 50, 50, 100, 50, 50, 50, 50

Sorted values: 50, 50, 50, 50, 50, 50, 50, 50, 100

Median: 50




The value of the central pixel is replaced with the median value of its neighborhood, so the 100 is replaced with 50.

<table border=1>
<tr><td>50 </td><td> 50 </td><td> 50 </td><td> 50 </td><td> 50 </td></tr>
<tr><td>50 </td><td> 50 </td><td> 50 </td><td> 50 </td><td> 50 </td></tr>
<tr><td>50 </td><td style="border:1px solid black"> 50 </td><td> 50 </td><td> 50 </td><td> 50 </td></tr>
<tr><td>50 </td><td> 50 </td><td> 50 </td><td> 50 </td><td> 50 </td></tr>
<tr><td>50 </td><td> 50 </td><td> 50 </td><td> 50 </td><td> 50 </td></tr>
</table>
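
The worked example above can be reproduced with a short sketch. It assumes `scipy.signal.medfilt` (the same function used later in this lab) with a 3&times;3 window; the variable names are illustrative.

```python
import numpy as np
from scipy.signal import medfilt

# The 5x5 raster from the tables above: all 50 except one pixel of 100.
raster = np.full((5, 5), 50.0)
raster[2, 1] = 100.0

# Median of the highlighted 3x3 neighborhood centered on the 100.
window = raster[1:4, 0:3]
median_value = np.median(window)

# Apply a 3x3 median filter to every pixel of the raster.
filtered = medfilt(raster, kernel_size=3)

print(median_value)    # median of the highlighted window
print(filtered[2, 1])  # the former 100 after filtering
print(filtered)
```

Note that `medfilt` pads the array with zeros at the borders, so in this tiny raster the four corner pixels come out as 0 rather than 50. On a large image this edge effect only touches a thin border.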

In our case, we have a few isolated pixels with high NDVI values, surrounded by background (pixels with low NDVI values).
So, to get rid of the isolated pixels, we perform median filtering below with a window (matrix) of size 7&times;7.

We will leverage code already created for this operation, namely the median filter from 
[SciPy - Signal Processing](https://docs.scipy.org/doc/scipy/reference/signal.html).

In [None]:
# See : https://docs.scipy.org/doc/scipy/reference/signal.html
from scipy.signal import medfilt
%matplotlib inline

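# Kernel size (1, 7, 7): 1 along the band axis, 7x7 in the spatial dimensions
redband = medfilt(redband, (1,7,7))
nirband = medfilt(nirband, (1,7,7))

nirband = nirband.astype(np.float64)
redband = redband.astype(np.float64)
# The small constant in the denominator avoids 0/0 where both bands are zero
ndvi = (nirband - redband) / (nirband + redband + 1e-10)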

fig = plt.figure(figsize=(6,6))
rasterio.plot.show(ndvi, cmap ='YlGn')


Let's threshold the obtained image to find major vegetative areas.

In [None]:
ndvi[ndvi >= 0.3] = 1   # vegetation
ndvi[ndvi < 0.3] = 0    # everything else
fig = plt.figure(figsize = (6,6))
rasterio.plot.show(ndvi)

Comparing the above image with the one obtained previously shows that far fewer isolated foreground pixels (pixels with high NDVI values) remain.
You can change the window size from (1,7,7) to (1,11,11), (1,21,21), or (1,3,3) in

``` python
redband = medfilt(redband, (1,7,7))
nirband = medfilt(nirband, (1,7,7))
```
and observe the different thresholding results.
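
To get a feel for how the window size matters before rerunning the lab cells, here is a small sketch on a synthetic binary mask. This is an illustration under assumptions: the mask is invented, and the filter is applied directly to the mask rather than to the red and near-infrared bands as in the lab.

```python
import numpy as np
from scipy.signal import medfilt

# Hypothetical binary mask (1 = vegetation), invented for illustration.
mask = np.zeros((30, 30), dtype=np.float64)
mask[5:17, 5:17] = 1.0    # 12x12 "field"
mask[24:27, 24:27] = 1.0  # 3x3 "island"

island_pixels_left = {}
field_center_kept = {}
for k in (3, 7):  # medfilt requires odd window sizes
    filtered = medfilt(mask, kernel_size=k)
    # Count foreground pixels remaining in the region around the island.
    island_pixels_left[k] = int(filtered[22:29, 22:29].sum())
    # Check that the middle of the large field is still foreground.
    field_center_kept[k] = bool(filtered[10, 10] == 1.0)
    print(k, island_pixels_left[k], field_center_kept[k])
```

In this sketch the 3&times;3 window only trims the corners of the island, while the 7&times;7 window removes it entirely and leaves the center of the large field intact. Larger windows remove larger islands, at the cost of eroding the corners of the features you want to keep.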


# Save your Notebook
## Then, Close and Halt