# Raster Data Formats
Raster data consists of rows and columns of cells or pixels.
The cell values represent the phenomenon portrayed by the raster dataset such as a category, magnitude, height, or spectral (color) value. 

 * The _category_ could be a land-use class such as grassland, forest, or road.
 * A _magnitude_ might represent gravity, noise pollution, or percent of rainfall. 
 * Height (distance) could represent surface elevation above mean sea level, which can be used to derive slope, aspect, and watershed properties. 
 * Spectral values (or color) represent the visible or non-visible "light" reflected from objects.
 
Cell values can be either positive or negative, integer, or floating point.
The next lab dives into _data types_ and how they are typically used.

Example raster:

<table border=1>
<tr><td> 1 </td><td> 1 </td><td> 1 </td><td> 2 </td></tr>
<tr><td> 1 </td><td> 1 </td><td> 2 </td><td> 2 </td></tr>
<tr><td> 1 </td><td> 3 </td><td> 3 </td><td> 2 </td></tr>
<tr><td> 4 </td><td> 4 </td><td> 3 </td><td> 2 </td></tr>
</table>
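As a minimal sketch, the example raster above can be written in plain Python as a list of rows. The variable names here are illustrative only, and real raster libraries usually store cells in array types rather than nested lists.

```python
from collections import Counter

# The 4x4 example raster from the table above, stored row by row.
raster = [
    [1, 1, 1, 2],
    [1, 1, 2, 2],
    [1, 3, 3, 2],
    [4, 4, 3, 2],
]

n_rows = len(raster)
n_cols = len(raster[0])

# Count how many cells hold each value (for instance, each land-use class).
cell_counts = Counter(value for row in raster for value in row)

print(f"{n_rows} rows x {n_cols} columns")
for value, count in sorted(cell_counts.items()):
    print(f"value {value}: {count} cells")
```

If the values were land-use categories, the counts would tell us how many cells each class occupies.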

The easiest way to think of raster data is as images, which is how they are typically represented by software. 
However, raster datasets are not necessarily stored as images. 
They can also be ASCII text files or Binary Large Objects (BLOBs) in databases.
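To illustrate the point about text storage, here is a small sketch that parses a raster held as plain text. The layout (one raster row per line, values separated by spaces) is a hypothetical one chosen for this example. Real ASCII raster formats typically also carry a header describing the grid.

```python
# Hypothetical text layout: one raster row per line, values separated by spaces.
ascii_raster = """\
1 1 1 2
1 1 2 2
1 3 3 2
4 4 3 2
"""

# Split the text into lines, then each line into integer cell values.
grid = [[int(value) for value in line.split()]
        for line in ascii_raster.splitlines()]

print(grid[3])  # last row of the raster: [4, 4, 3, 2]
```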

Raster datasets may contain multiple bands, meaning that different measurements, such as colors (wavelengths of light) can be collected at the same time over the same area. 
A typical example is a raster with Red, Green, and Blue color bands.
Rasters often have 3 to 7 bands, but hyperspectral systems can record several hundred.

Like vector data, raster data can come in a variety of formats. 
The open source raster library called Geospatial Data Abstraction Library (GDAL), 
which includes the vector OGR library mentioned earlier, 
lists over 130 supported raster formats (http://www.gdal.org/formats_list.html).

**GeoTIFF** (Geospatial Tagged Image File Format) is one of the most commonly used geospatial raster formats. 
It may have any of the following extensions: _.tiff_, _.tif_, or _.gtif_. 
The data is stored as a single file in this format.
In general, these files tend to be very large. 
Other formats, such as Multi-resolution Seamless Image Database (MrSID) and Enhanced Compression Wavelet (ECW), store a compressed version of the original data (note that compression may degrade the source data).

In the rest of this lab, we are going to acquire, open, and visualize some raster geospatial data.

## Dataset

The dataset we will be using in these tutorials is a raster teaching data subset collected over the National Ecological Observatory Network's Harvard Forest. 
Read more about the dataset at [NEON Airborne data](http://www.neonscience.org/data-collection/airborne-remote-sensing).


## Library access

Most open source libraries and software packages that interact with raster data rely on the [GDAL](http://www.gdal.org/) library.

### Rasterio
[Rasterio](https://mapbox.github.io/rasterio/installation.html) is a Python library that provides a higher-level interface to GDAL for raster data. 

The following example provides an overview of how to read and visualize raster data using Rasterio.


### Accessing NEON remote sensing data

First we will download, then unzip the raster image.

In [None]:
import urllib.request
import shutil
from pathlib import Path
from zipfile import ZipFile

file_URL = 'https://ndownloader.figshare.com/files/3701578'

local_file_name = 'NEONDSAirborneRemoteSensing.zip'

# Make sure the download directory exists before writing into it
temp_dir = Path('../temp/')
temp_dir.mkdir(parents=True, exist_ok=True)
file_path = temp_dir / local_file_name

# Stream the archive from the URL to the local ZIP file
with urllib.request.urlopen(file_URL) as response, file_path.open(mode='wb') as outfile:
    shutil.copyfileobj(response, outfile)

# Extract the archive into a folder next to the ZIP file
unzipped = temp_dir / 'NEONDSAirborneRemoteSensing_unzipped'
with ZipFile(file_path, 'r') as to_unzip:
    to_unzip.extractall(unzipped)

If we look inside the directory, we can see that we have a folder and a ZIP file now:

```BASH
$ ls temp/
NEONDSAirborneRemoteSensing_unzipped  NEONDSAirborneRemoteSensing.zip
$ ls temp/*
temp/NEONDSAirborneRemoteSensing.zip

temp/NEONDSAirborneRemoteSensing_unzipped:
NEON-DS-Airborne-Remote-Sensing
```

If we search in the folder of unzipped data, we find a lot of GeoTIFF images:

```BASH
$ find temp/NEONDSAirborneRemoteSensing_unzipped
temp/NEONDSAirborneRemoteSensing_unzipped
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/HARV
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/HARV/CHM
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/HARV/CHM/HARV_chmCrop.tif
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/HARV/DSM
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/HARV/DSM/HARV_dsmCrop.tif
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/HARV/DSM/HARV_DSMhill.tif
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/HARV/DTM
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/HARV/DTM/HARV_dtmCrop.tif
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/HARV/DTM/HARV_DTMhill_WGS84.tif
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/HARV/RGB_Imagery
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/HARV/RGB_Imagery/HARV_Ortho_wNA.tif
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/HARV/RGB_Imagery/HARV_RGB_metadata.txt
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/HARV/RGB_Imagery/HARV_RGB_Ortho.tif
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/SJER
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/SJER/DSM
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/SJER/DSM/SJER_dsmCrop.tif
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/SJER/DSM/SJER_dsmCrop.tif.aux.xml
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/SJER/DSM/SJER_dsmHill.tif
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/SJER/DSM/SJER_dsmHill.tif.aux.xml
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/SJER/DSM/SJER_DSMhill_WGS84.tif
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/SJER/DSM/SJER_DSMhill_WGS84.tif.aux.xml
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/SJER/DTM
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/SJER/DTM/SJER_dtmCrop.tif
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/SJER/DTM/SJER_dtmCrop.tif.aux.xml
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/SJER/DTM/SJER_dtmHill.tif
temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/SJER/DTM/SJER_dtmHill.tif.aux.xml
```
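The same search can be sketched in Python with `pathlib`. The helper name `find_geotiffs` is our own, not part of any library, and the directory path assumes the archive was extracted as in the download step above.

```python
from pathlib import Path


def find_geotiffs(root):
    """Return every file ending in .tif below root, sorted by path."""
    return sorted(Path(root).rglob('*.tif'))


# Assumed location of the extracted data; adjust it if you unzipped elsewhere.
unzipped = Path('../temp/NEONDSAirborneRemoteSensing_unzipped')

if unzipped.is_dir():
    for tif in find_geotiffs(unzipped):
        print(tif)
else:
    print(f"{unzipped} not found; run the download step first")
```

Because the pattern is `*.tif`, the `.tif.aux.xml` sidecar files and the metadata text file are left out of the result.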

---


### Reading raster data

Rasterio provides Python access to raster data. 
Its `open()` function returns a dataset object that gives access to the file's metadata; pixel values are loaded into memory only when we call the dataset's `read()` method.

In [None]:
import rasterio

RASTER_PATH = '../temp/NEONDSAirborneRemoteSensing_unzipped/NEON-DS-Airborne-Remote-Sensing/HARV/RGB_Imagery/'
RASTER_DATA_FILE = RASTER_PATH + 'HARV_RGB_Ortho.tif'

# Open the dataset; pixel values are not read until read() is called
raster_data = rasterio.open(RASTER_DATA_FILE)
print(type(raster_data))

### Multi-band raster data

In [None]:
print("Number of bands: {}".format(raster_data.count))

Looks like our raster data has 3 bands.

#### So, what exactly is a band?

A band is a measure of a single characteristic. 
Some rasters have a single band, or layer, of data, while others have multiple bands. 
A band is represented by a single matrix of cell values, and a raster with multiple bands contains multiple spatially coincident matrices of cell values representing the same spatial area. 

An example of a single-band raster dataset is a digital elevation model (DEM). 
Each cell in a DEM contains only one value representing surface elevation. 
A satellite image, by contrast, commonly has multiple bands representing different wavelengths from the ultraviolet through the visible and infrared portions of the electromagnetic spectrum. 
Landsat Thematic Mapper (TM) imagery, for instance, is collected in seven bands of the electromagnetic spectrum. 
Bands 1–5 and 7 represent data from the visible, near-infrared, and mid-infrared regions. 
Band 6 collects data from the thermal infrared region.

In our case, the imagery is an `RGB (Red, Green, Blue)` dataset collected by a high-resolution `RGB` camera. 
Each band holds the reflectance values for one part of the `RGB` color spectrum.
So, the first band is Red, the second Green, and the third Blue.

Each pixel position therefore has three distinct measurements: how much red, green, and blue light is reflected from that position.

Example multi-band raster:

<table border=1 cellpadding=5>
<tr><th>RED</th><th></th><th>GREEN</th><th></th><th>BLUE</th></tr>
<tr><td>
    <table border=1>
        <tr><td> 11 </td><td> 11 </td><td> 11 </td><td> 12 </td></tr>
        <tr><td> 11 </td><td> 11 </td><td> 12 </td><td> 12 </td></tr>
        <tr><td> 11 </td><td> 13 </td><td> 13 </td><td> 12 </td></tr>
        <tr><td> 14 </td><td> 14 </td><td> 13 </td><td> 12 </td></tr>
    </table>
</td><td> &nbsp; &nbsp; &nbsp; &nbsp;
</td><td>
    <table border=1>
        <tr><td> 1 </td><td> 1 </td><td> 1 </td><td> 2 </td></tr>
        <tr><td> 1 </td><td> 1 </td><td> 2 </td><td> 2 </td></tr>
        <tr><td> 1 </td><td> 3 </td><td> 3 </td><td> 2 </td></tr>
        <tr><td> 4 </td><td> 4 </td><td> 3 </td><td> 2 </td></tr>
    </table>
</td><td> &nbsp; &nbsp; &nbsp; &nbsp;
</td><td>
    <table border=1>
        <tr><td> 111 </td><td> 111 </td><td> 111 </td><td> 112 </td></tr>
        <tr><td> 111 </td><td> 111 </td><td> 112 </td><td> 112 </td></tr>
        <tr><td> 111 </td><td> 113 </td><td> 113 </td><td> 112 </td></tr>
        <tr><td> 114 </td><td> 114 </td><td> 113 </td><td> 112 </td></tr>
    </table>
</td></tr>
</table>
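As a rough sketch, the example multi-band raster above can be modeled as three spatially coincident grids, with every pixel position having one value per band. The names `bands` and `pixel_values` are hypothetical and used only for illustration.

```python
# The three bands from the example table, each a 4x4 grid of cell values.
red = [
    [11, 11, 11, 12],
    [11, 11, 12, 12],
    [11, 13, 13, 12],
    [14, 14, 13, 12],
]
green = [
    [1, 1, 1, 2],
    [1, 1, 2, 2],
    [1, 3, 3, 2],
    [4, 4, 3, 2],
]
blue = [
    [111, 111, 111, 112],
    [111, 111, 112, 112],
    [111, 113, 113, 112],
    [114, 114, 113, 112],
]

# Stack the bands in Red, Green, Blue order.
bands = [red, green, blue]


def pixel_values(bands, row, col):
    """Return the value of every band at one pixel position."""
    return tuple(band[row][col] for band in bands)


# Third row, second column (zero-based indices 2 and 1).
print(pixel_values(bands, 2, 1))  # (13, 3, 113)
```

The same row and column index is used in every band, which is what "spatially coincident" means in practice.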


### Visualizing raster data

#### How to read individual bands using Rasterio?

Rasterio identifies bands by a numerical index, starting from 1.
To read the first band of a dataset as a NumPy `ndarray`, do the following:

In [None]:
%matplotlib notebook
from rasterio.plot import show

# Read band 1 (bands are indexed from 1) as a 2-D NumPy array
with rasterio.open(RASTER_DATA_FILE) as src:
    band1 = src.read(1)

show(band1)


In the above example, we read and visualized band 1 data. 
If you hover the mouse over the image, you can see the `x,y` coordinates of the pointer and the corresponding pixel value in the lower right corner.

#### Multi-band rasters

In [None]:
%matplotlib notebook
from rasterio.plot import show

# read() with no arguments returns all bands as a (bands, rows, columns) array
show(raster_data.read())

With the above code snippet, we visualized multi-band raster data.
If you hover the mouse over the image, you can see the `x,y` coordinates of the pointer and the `Red, Green, Blue` reflectance values of that pixel.

### Georeferencing
A GIS raster dataset is different from an ordinary image; its elements (or “pixels”) are mapped to locations on the earth’s surface. 
Every pixel of a dataset is contained within a spatial bounding box.

The spacing between the centers of adjacent pixels is referred to as the _ground sample distance_, or GSD.
A raster with a 1 meter GSD has pixel centers spaced 1 meter apart: moving one pixel left, right, up, or down from a reference pixel moves 1 meter on the ground.
Similarly, a 15 m GSD has pixels spaced 15 m apart.
This is often referred to as the _spatial resolution_.
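To make the GSD idea concrete, here is a minimal sketch that converts a pixel's row and column into ground coordinates. It assumes a north-up raster with square pixels, and the corner coordinates are made-up values. The function `pixel_center` is our own helper, not a Rasterio call.

```python
def pixel_center(left, top, gsd, row, col):
    """Ground coordinates (x, y) of the center of pixel (row, col).

    Assumes a north-up raster with square pixels: x grows with the
    column index, and y shrinks as the row index grows.
    """
    x = left + (col + 0.5) * gsd
    y = top - (row + 0.5) * gsd
    return x, y


# Hypothetical raster: upper-left corner at (500000, 4200000), 15 m GSD.
first = pixel_center(500000.0, 4200000.0, 15.0, row=0, col=0)
neighbor = pixel_center(500000.0, 4200000.0, 15.0, row=0, col=1)

print(first)                    # (500007.5, 4199992.5)
print(neighbor[0] - first[0])   # 15.0, one GSD to the right
```

Moving one column to the right shifts the pixel center by exactly one GSD, which matches the 15 m spacing described above.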


In [None]:
raster_data.bounds

Our example spans 101985 meters to 339315 meters from left to right and 2611485 meters to 2826915 meters from bottom to top, so it covers a region 237.33 kilometers wide by 215.43 kilometers high. These coordinates are expressed in the units (meters, in this case) of the dataset's CRS.
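The width and height quoted above follow directly from the bounds. The sketch below repeats that arithmetic with a stand-in named tuple whose field names mirror those of the bounds object (`left`, `bottom`, `right`, `top`). The values are copied from the text rather than read from a file.

```python
from collections import namedtuple

# Stand-in for the bounds object, filled with the values quoted in the text.
BoundingBox = namedtuple('BoundingBox', ['left', 'bottom', 'right', 'top'])
bounds = BoundingBox(left=101985.0, bottom=2611485.0,
                     right=339315.0, top=2826915.0)

# Extent in each direction, converted from meters to kilometers.
width_km = (bounds.right - bounds.left) / 1000
height_km = (bounds.top - bounds.bottom) / 1000

print(f"{width_km:.2f} km wide by {height_km:.2f} km high")
```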


In [None]:
print(raster_data.crs)
print(" ------------")
print(raster_data.crs.wkt)


---

You have learned about raster-formatted data and seen examples of processing the GeoTIFF format specifically.
Numerous other formats exist, and it simply comes down to having the appropriate file drivers available on a system.

Continue on to the next lab to learn about Pixel Datatypes.