# About Landsat Data

> At over 40 years, the Landsat series of satellites provides the longest temporal record of moderate resolution multispectral data of the Earth’s surface on a global basis. The Landsat record has remained remarkably unbroken, proving a unique resource to assist a broad range of specialists in managing the world’s food, water, forests, and other natural resources for a growing world population. It is a record unmatched in quality, detail, coverage, and value. Source: USGS 


![_._](img/timeline-only-for-webRGB.png)

The 40 year history of Landsat missions. Source: USGS - USGS Landsat Timeline

Landsat data are spectral and collected using a platform mounted on a satellite in space that orbits the Earth. The spectral bands and associated spatial resolution of the first 9 bands in the Landsat 8 sensor are listed below.


![_._](img/landsat-bands.png)

Review the Landsat 8 Surface Reflectance Product Guide for more details.

There are additional collected bands that are not distributed within the Landsat 8 Surface Reflectance Product such as the panchromatic band, which provides a finer resolution, gray scale image of the landscape, and the cirrus cloud band, which is used in the quality assessment process:

![_._](img/additional-bands.png)

## Understand Landsat Data
When working with Landsat data, it is important to understand both the metadata and the file naming convention. The metadata tell you how the data were processed, where the data are from and how they are structured.

The file names tell you what sensor collected the data, the date the data were collected, and more.

![_._](img/collection-filename-diffs.png)
Landsat file names. Source: USGS Landsat - Landsat Scene Naming Conventions


## Landsat File Naming Convention
Landsat data, like many other satellite remote sensing data, are named in a way that tells you:

- When the data were collected and processed
- What sensor was used to collect the data
- What satellite was used to collect the data

And more.

Here you will learn a few key components of the Landsat 8 collection file name. The first scene that you work with below is named:

```LC080340322016072301T1-SC20180214145802```

First, you have LC08:

- **L:** Landsat
- **C:** OLI / TIRS combined sensor
- **08:** Landsat 8 (not 7)

Next, you have the path and row:

- **034032:** The next 6 digits represent the path and row of the scene. This identifies the spatial coverage of the scene.

Finally, you have a date. In this case:

- **20160723:** representing the year, month and day that the data were collected.

The second part of the file name above tells you more about when the data were last processed. You can read more about this naming convention using the link below.

Learn more about Landsat 8 file naming conventions.

As you work with these data, it is good to double check that you are working with the sensor (Landsat 8) and the time period that you intend. Having this information in the file name makes it easier to keep track of this as you process your data.
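
The components described above can be pulled out of a scene name with simple string slicing. The sketch below is a minimal illustration that assumes the fixed-width layout described in this lesson (sensor and satellite, path, row, acquisition date). The function name `parse_landsat_scene_name` is hypothetical and is not part of any library.

```python
from datetime import datetime


def parse_landsat_scene_name(scene_name):
    """Split a Landsat scene name into a few key parts.

    Hypothetical helper: assumes the fixed-width layout described above.
    """
    # The part before the dash describes the scene itself
    scene_id = scene_name.split("-")[0]
    return {
        "sensor": scene_id[1],            # "C" = OLI / TIRS combined
        "satellite": int(scene_id[2:4]),  # 8 = Landsat 8
        "path": scene_id[4:7],
        "row": scene_id[7:10],
        "acquired": datetime.strptime(scene_id[10:18], "%Y%m%d").date(),
    }


scene_info = parse_landsat_scene_name("LC080340322016072301T1-SC20180214145802")
print(scene_info)
```

A check like `scene_info["satellite"] == 8` at the top of a workflow is one way to confirm that you are working with the sensor and date that you intend.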


In [None]:
import os
from glob import glob

import matplotlib.pyplot as plt
import numpy as np
import geopandas as gpd
import xarray as xr
import rioxarray as rxr

In [None]:
# Get list of all pre-cropped data and sort the data

# Create the path to your data
landsat_post_fire_path = os.path.join("data","cold-springs-fire",
                                      "landsat_collect",
                                      "LC080340322016072301T1-SC20180214145802",
                                      "crop")

# Generate a list of tif files
post_fire_paths = glob(os.path.join(landsat_post_fire_path,
                                        "*band*.tif"))

# Sort the data to ensure bands are in the correct order
post_fire_paths.sort()
post_fire_paths

Next, open a single band from your Landsat scene. Below you use the .squeeze() method to ensure that the output xarray object has only 2 dimensions and not 3.
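
The effect of dropping a length-one dimension can be seen without any Landsat files. The sketch below uses a small synthetic NumPy array as a stand-in for a single-band raster (an assumption made so that the example runs anywhere). Calling `.squeeze()` on an xarray object removes length-one dimensions in the same way.

```python
import numpy as np

# Synthetic stand-in for a single-band raster: (band, y, x)
single_band = np.zeros((1, 4, 5))
print(single_band.shape)

# squeeze() drops every dimension of length 1, leaving (y, x)
squeezed = single_band.squeeze()
print(squeezed.shape)
```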



In [None]:
# Open a single band without squeeze - notice the first dimension is 1
band_1 = rxr.open_rasterio(post_fire_paths[0], masked=True)
band_1.shape

In [None]:
# Open a single band using squeeze - notice there are only 2 dimensions
# when you use squeeze
band_1 = rxr.open_rasterio(post_fire_paths[0], masked=True).squeeze()
band_1.shape

In [None]:
# Plot the data
f, ax = plt.subplots()
band_1.plot.imshow(ax=ax,
                   cmap="Greys_r")
ax.set_axis_off()
ax.set_title("Plot of Band 1")
plt.show()

Below is a function called ```open_clean_bands``` that opens a single tif file and returns an xarray object. In the following lessons you will build this function out to process and clean your Landsat data.



In [None]:
def open_clean_bands(band_path):
    """A function that opens a Landsat band as an (rio)xarray object

    Parameters
    ----------
    band_path : str
        The path to the tif file for the single band that you wish to open.

    Returns
    -------
    A single xarray object with the Landsat band data.

    """

    return rxr.open_rasterio(band_path, masked=True).squeeze()

The code below takes each band that you opened, and stacks it into a new single output array. NOTE: this approach is only efficient if you wish to process ALL of the bands in your data. Given the size of Landsat data, you likely will want to remove bands that you don’t need and if your study area is smaller than the entire image, you may also want to clip your data. You will learn how to clip and subset your data in the next lesson.
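
Conceptually, stacking places equally sized two-dimensional bands along a new leading dimension. Below is a minimal sketch of that idea, using synthetic NumPy arrays rather than real Landsat bands (the array sizes and values are made up for illustration).

```python
import numpy as np

# Three synthetic 2D "bands" with identical shape (y, x)
bands = [np.full((4, 5), fill_value=band_number) for band_number in (1, 2, 3)]

# Stack along a new first axis, giving (band, y, x)
stacked = np.stack(bands, axis=0)
print(stacked.shape)

# Keep only the bands you need; here the first and third
subset = stacked[[0, 2], :, :]
print(subset.shape)
```

`xr.concat` plays the same role for xarray objects, while also keeping the coordinate labels attached to each band.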

In [None]:
# Open all bands in a loop
all_bands = []
for i, aband in enumerate(post_fire_paths):
    all_bands.append(open_clean_bands(aband))
    # Assign a band number to the new xarray object
    all_bands[i]["band"]=i+1

In [None]:
# OPTIONAL: Turn list of bands into a single xarray object    
landsat_post_fire_xr = xr.concat(all_bands, dim="band") 
landsat_post_fire_xr

In [None]:
landsat_post_fire_xr.plot.imshow(col="band",
                                 col_wrap=3,
                                 cmap="Greys_r")
plt.show()

## Plot RGB image
Just like you did with NAIP data, you can plot 3 band color composite images for Landsat.

In [None]:
# Select the red, green and blue bands (zero-based indices 3, 2 and 1)
landsat_rgb = landsat_post_fire_xr[[3, 2, 1], :, :]
landsat_rgb.astype("int").plot.imshow(rgb="band", figsize=(10, 8))
plt.title("RGB Composite Image\n Post Fire Landsat Data")
plt.show()



Notice that the image above looks light. You can stretch the image as you did with the NAIP data, too.
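
One common way to stretch an image is to clip the values to a pair of percentiles and rescale the result to the 0 to 1 range, so that a few extreme pixels no longer dominate the color scale. The sketch below illustrates the idea on synthetic data with NumPy. The function name `percentile_stretch` and the 2nd and 98th percentile defaults are assumptions chosen for illustration, not the exact internals of any plotting library.

```python
import numpy as np


def percentile_stretch(values, lower=2, upper=98):
    """Clip values to the given percentiles and rescale to 0-1 (illustrative)."""
    low, high = np.nanpercentile(values, [lower, upper])
    clipped = np.clip(values, low, high)
    return (clipped - low) / (high - low)


# Synthetic band: mostly moderate values plus one very bright outlier
rng = np.random.default_rng(seed=0)
band = rng.uniform(200, 1200, size=(50, 50))
band[0, 0] = 10000

stretched = percentile_stretch(band)
print(stretched.min(), stretched.max())
```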

In [None]:
landsat_rgb = landsat_post_fire_xr[[3, 2, 1], :, :]
# robust=True stretches the image so that extreme values do not dominate
landsat_rgb.plot.imshow(rgb="band", robust=True, figsize=(10, 8))
plt.title("RGB Composite Image\n Post Fire Landsat Data")
plt.show()

### Plot CIR
Now you’ve created a red, green, blue color composite image. Remember that red, green and blue are colors that your eye can see.

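Next, create a color infrared image (CIR) using Landsat 8 bands 5, 4 and 3 (near infrared, red and green). Because Python indexing starts at 0, these bands sit at indices 4, 3 and 2 in the stacked array.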
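
Because Python indexing starts at zero while Landsat band numbers start at one, it is easy to select the wrong band. The sketch below derives the stack indices from Landsat 8 band numbers, assuming the stack holds the bands in order starting at band 1, as `landsat_post_fire_xr` does above. The dictionary and helper names are hypothetical.

```python
# Landsat 8 OLI band numbers for the bands used in this lesson
LANDSAT8_BANDS = {"blue": 2, "green": 3, "red": 4, "nir": 5}


def composite_indices(band_names, first_band=1):
    """Return zero-based stack indices for a list of band names.

    first_band is the band number stored at index 0 of the stack.
    """
    return [LANDSAT8_BANDS[name] - first_band for name in band_names]


rgb_indices = composite_indices(["red", "green", "blue"])
cir_indices = composite_indices(["nir", "red", "green"])
print(rgb_indices, cir_indices)
```

For a stack that starts at band 2, such as the one built later in this lesson, `composite_indices(["red", "green", "blue"], first_band=2)` gives the indices for that shorter stack instead.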

In [None]:
# Select the near infrared, red and green bands (zero-based indices 4, 3 and 2)
landsat_cir = landsat_post_fire_xr[[4, 3, 2], :, :]
landsat_cir.plot.imshow(rgb="band", robust=True, figsize=(10, 8))
plt.title("CIR Landsat Image Post-Cold Springs Fire")
plt.show()

## Crop a Landsat Band Using Rioxarray ```rio.clip()```
Above you opened up and plotted a single band. Often, you want to crop your data to the spatial extent of your study area. Cropping removes data that you don’t need in your analysis (data that are outside of your area of interest). You could choose to open and crop each file individually using the ```rxr.open_rasterio()``` function alongside the rioxarray ```opened_xarray.rio.clip()``` function as shown below.

In order to crop a band, you need:

1. A GeoPandas or shapely object that represents the extent of the area you want to study in the Landsat image (your crop extent).
2. The crop extent and the Landsat data to be in the same Coordinate Reference System, or CRS.

To clip an xarray DataArray to a GeoPandas extent, use the following syntax.

```clipped_xarray = xarray_name.rio.clip(geopandas_object_name.geometry)```


In [None]:
# Open up boundary extent using GeoPandas
fire_boundary_path = os.path.join("data","cold-springs-fire",
                                  "vector_layers",
                                  "fire-boundary-geomac",
                                  "co_cold_springs_20160711_2200_dd83.shp")

fire_boundary = gpd.read_file(fire_boundary_path)

In [None]:
print("Landsat crs is:", landsat_post_fire_xr.rio.crs)
print("Fire boundary crs is:", fire_boundary.crs)

In [None]:
# Reproject data to CRS of raster data
fire_boundary_utmz13 = fire_boundary.to_crs(landsat_post_fire_xr.rio.crs)
fire_boundary_utmz13.plot()
plt.show()

Once you have checked the CRS and reprojected the boundary, you can clip the data. The ideal scenario here is that you clip the data while opening it. Below you use ```from_disk=True```, which tells rioxarray to only open the data within the clip extent. This will speed up your workflow a bit.

In [None]:
landsat_post_xr_clip = rxr.open_rasterio(post_fire_paths[0]).rio.clip(
    fire_boundary_utmz13.geometry,
    from_disk=True).squeeze()

# Notice the x and y data dimensions of your data have changed
landsat_post_xr_clip

Now that your data are open, you can plot them.



In [None]:
# Plot the data
f, ax = plt.subplots(figsize=(10, 6))
landsat_post_xr_clip.plot.imshow(cmap="Greys_r",
                                 ax=ax)
ax.set_axis_off()
ax.set_title("Band 1 - Clipped To Your Study Area")
plt.show()

Plot of the clipped Landsat 8 data with the missing data values rendered as black. These values can be masked for nicer plotting.

The above plot has a large amount of “black” around the outside, representing fill values. When you clipped the data to the geometry, rioxarray filled all of the pixels outside of the geometry with a large negative number, -32768.

For plotting you may wish to clean this up by masking out values.
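
The masking idea can be sketched on a small synthetic array before applying it to the clipped band. The values below are made up: `-32768` stands in for the fill value described above and `(0, 10000)` for the valid range.

```python
import numpy as np

valid_range = (0, 10000)

# Synthetic band with fill values and one out-of-range reading
band = np.array([[-32768.0, 350.0, 420.0],
                 [510.0, 12000.0, -32768.0]])

# True where a pixel falls outside the valid range
invalid = (band < valid_range[0]) | (band > valid_range[1])

# Replace invalid pixels with NaN so they are ignored when plotting
cleaned = np.where(invalid, np.nan, band)
print(cleaned)
```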

In [None]:
# Clean the data
valid_range = (0, 10000)
# Only run this step if a valid range tuple is provided
if valid_range:
    # Flag pixels that fall outside of the valid range
    mask = ((landsat_post_xr_clip < valid_range[0]) | (
        landsat_post_xr_clip > valid_range[1]))
    # Keep the valid pixels; flagged pixels become NaN
    landsat_post_xr_clip = landsat_post_xr_clip.where(~mask)

In [None]:
f, ax = plt.subplots()
landsat_post_xr_clip.plot(ax=ax)
ax.set_title("Band 1 plot")
ax.set_axis_off()
plt.show()

## A Function to Crop and Clean Landsat Data
It would be nice to combine all of the steps above into a single workflow that clips and cleans your Landsat data. You can take the function that you started in the previous lesson and expand it to do all of this for you.

In [None]:
def open_clean_band(band_path, clip_extent, valid_range=None):
    """A function that opens a Landsat band as an (rio)xarray object

    Parameters
    ----------
    band_path : str
        The path to the tif file for the single band that you wish to open.

    clip_extent : geopandas geodataframe
        A geodataframe containing the clip extent of interest. NOTE: this will
        fail if the clip extent is in a different CRS than the raster data.

    valid_range : tuple (optional)
        The min and max valid range for the data. All pixels with values outside
        of this range will be masked.

    Returns
    -------
    A single xarray object with the Landsat band data.

    """

    try:
        clip_bound = clip_extent.geometry
    except AttributeError as err:
        # Stop here: without a geometry there is nothing to clip to
        raise TypeError(
            "Oops, I need a geodataframe object for this to work.") from err

    cleaned_band = rxr.open_rasterio(band_path,
                                     masked=True).rio.clip(clip_bound,
                                                           from_disk=True).squeeze()

    # Only mask the data if a valid range tuple is provided
    if valid_range:
        # Use the band opened in this function, not a global variable
        mask = ((cleaned_band < valid_range[0]) | (
            cleaned_band > valid_range[1]))
        cleaned_band = cleaned_band.where(~mask)

    return cleaned_band

In [None]:
cleaned_band = open_clean_band(post_fire_paths[0], fire_boundary_utmz13)

f, ax = plt.subplots()
cleaned_band.plot(ax=ax)
ax.set_title("Band 1 plot")
ax.set_axis_off()
plt.show()

## Create Your Final, Automated Workflow
Great - you now have a workflow that opens, clips and cleans a single band. However, remember that your original goal is to open, clip and clean several Landsat bands with the goal of calculating NDVI and producing some RGB and Color Infrared (CIR) plots.

Below you build out the entire workflow using a loop. The vector data step is reproduced here.

In [None]:
# Open up boundary extent using GeoPandas
fire_boundary_path = os.path.join("data","cold-springs-fire",
                                  "vector_layers",
                                  "fire-boundary-geomac",
                                  "co_cold_springs_20160711_2200_dd83.shp")

fire_boundary = gpd.read_file(fire_boundary_path)

In [None]:
# Get a list of required bands - bands 2 through 5
all_landsat_post_bands = glob(os.path.join(landsat_post_fire_path,
                                           "*band[2-5]*.tif"))
all_landsat_post_bands.sort()
all_landsat_post_bands

In [None]:
# Reproject your vector layer
landsat_crs = rxr.open_rasterio(all_landsat_post_bands[0], masked=True).squeeze().rio.crs

# Reproject fire boundary for clipping
fire_boundary_utmz13 = fire_boundary.to_crs(landsat_crs)

Loop through each band path, open the data and add it to a list.



In [None]:
post_all_bands = []
for i, aband in enumerate(all_landsat_post_bands):
    cleaned = open_clean_band(aband, fire_boundary_utmz13)
    # This line below is only needed if you wish to stack and plot your data
    cleaned["band"] = i+1
    post_all_bands.append(cleaned)

### Stack Your Final Cleaned Data
If you wish, you can stack all of the bands in your workflow by using the xr.concat function. Stacking the data will store all bands in one single xarray object. This step is optional and may be needed for some workflows but not for others.

In [None]:
# OPTIONAL - Stack the data
post_fire_stack = xr.concat(post_all_bands, dim="band")
post_fire_stack

In [None]:
# Plot the final stacked data
post_fire_stack.plot.imshow(col="band",
                            col_wrap=2,
                            cmap="Greys_r")
plt.show()

### Create an RGB Plot of Your Landsat Raster Data


In [None]:
# The stack holds bands 2-5, so red, green and blue sit at indices 2, 1 and 0
post_fire_rgb = post_fire_stack[[2, 1, 0], :, :]
post_fire_rgb.plot.imshow(rgb="band", robust=True, figsize=(10, 8))
plt.title("Cropped Post Fire Landsat Data")
plt.show()