The [data story on Lidar data](https://www.earthdatascience.org/courses/use-data-open-source-python/data-stories/lidar-raster-data/lidar-intro/) reviews the basic principles behind Lidar raster datasets.

In previous chapters you learned how to use the open source Python package Geopandas to open vector data stored in shapefile format. In this chapter you will learn how to use the open source Python package rioxarray (which builds on rasterio and xarray), combined with numpy and matplotlib, to open, manipulate and plot a lidar raster dataset in Python. You will also learn about key attributes of a raster dataset:

1. Spatial resolution
2. Spatial extent
3. Coordinate reference systems

### What is a Raster?

Raster or “gridded” data are stored as a grid of values which are rendered on a map as pixels. Each pixel value represents an area on the Earth’s surface. A raster file is composed of a regular grid of cells, all of which are the same size.

You’ve looked at and used rasters before if you’ve looked at photographs or imagery in a tool like Google Earth. However, the raster files that you will work with are different from photographs in that they are spatially referenced. Each pixel represents an area of land on the ground. That area is defined by the spatial resolution of the raster.


![_._](img/raster-concept.png)
A raster is composed of a regular grid of cells. Each cell is the same size in the x and y direction. Source: Colin Williams, NEON.

### Raster Facts

A few notes about rasters:

- Each cell is called a pixel.
- Each pixel represents an area on the ground.
- The resolution of the raster is the area that each pixel represents on the ground. So, in a 1 meter resolution raster, each pixel represents a 1 m by 1 m area on the ground.
- A raster dataset can have attributes associated with it as well. For instance, in a lidar derived digital elevation model (DEM), each cell represents an elevation value for that location on the earth. In a lidar derived intensity image, each cell represents a lidar intensity value, or the amount of light energy returned to and recorded by the sensor.

![_._](img/raster-resolution.png)
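
To make the link between resolution and ground area concrete, here is a minimal sketch. The grid dimensions and the 1 m resolution are hypothetical values chosen only for illustration. They are not read from the lesson data.

```python
# Hypothetical raster dimensions and resolution, chosen only for illustration
n_rows, n_cols = 2000, 4000      # number of pixels in the y and x directions
res_x, res_y = 1.0, 1.0          # pixel width and height in meters

pixel_area = res_x * res_y                 # ground area of one pixel (m^2)
total_area = pixel_area * n_rows * n_cols  # ground area of the full grid (m^2)

print("Area covered by one pixel (m^2):", pixel_area)
print("Area covered by the raster (km^2):", total_area / 1e6)
```

Halving the pixel size to 0.5 m would quadruple the number of pixels needed to cover the same area on the ground.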


### Open Raster Data in Open Source Python

Recall that raster or “gridded” data are stored as a regular grid of equally sized cells, which are rendered on a map as pixels. Each pixel value represents an area on the Earth’s surface. Raster data can be used to store many different types of scientific data, including:

- elevation data
- canopy height models
- surface temperature
- climate model data outputs
- landuse / landcover data

and more.

In this section you will learn more about working with lidar derived raster data that represents both terrain / elevation data (elevation of the earth’s surface), and surface elevation (elevation at the tops of trees, buildings etc above the earth’s surface). If you want to read more about how lidar data are used to derive raster based surface models, you can [check out this chapter on lidar remote sensing data and the various raster data products derived from lidar data](https://www.earthdatascience.org/courses/use-data-open-source-python/data-stories/what-is-lidar-data/).

![_._](img/lidarTree-height.png)

Digital Surface Models (DSM), Digital Elevation Models (DEM, also referred to as Digital Terrain Models or DTM) and Canopy Height Models (CHM) are the most common raster format lidar derived data products. One way to derive a CHM is to take the difference between the digital surface model (DSM, tops of trees, buildings and other objects) and the Digital Terrain Model (DTM, ground level). The CHM represents the actual height of trees, buildings, etc. with the influence of ground elevation removed. Graphic: Colin Williams, NEON.
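
The subtraction described in the caption above can be sketched with two small numpy arrays. The elevation values below are made up for illustration and do not come from the lesson data. With real rasters, the two grids would also need to share the same extent, resolution and CRS.

```python
import numpy as np

# Hypothetical 2 x 3 elevation grids in meters, made up for illustration
dsm = np.array([[1650.0, 1662.5, 1671.0],
                [1648.0, 1655.0, 1660.0]])  # tops of trees and buildings
dtm = np.array([[1650.0, 1650.5, 1651.0],
                [1648.0, 1649.0, 1650.0]])  # bare ground

# Canopy height model: surface elevation minus ground elevation, cell by cell
chm = dsm - dtm
print(chm)
```

Cells where the CHM is 0 are places where the surface and the ground are at the same elevation, such as open ground.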


In [None]:
# Import necessary packages
import os

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Use geopandas for vector data and xarray for raster data
import geopandas as gpd
import rioxarray as rxr


# Prettier plotting with seaborn
sns.set(font_scale=1.5, style="white")

In [None]:
dem_pre_path = os.path.join("data","colorado-flood",
                            "spatial",
                            "boulder-leehill-rd",
                            "pre-flood",
                            "lidar",
                            "pre_DTM.tif")

dtm_pre_arr = rxr.open_rasterio(dem_pre_path)
dtm_pre_arr

When you open raster data using **xarray** or **rioxarray** you are creating an ```xarray.DataArray```. The ```DataArray``` object stores the:

- raster data in a numpy array format
- spatial metadata including the CRS and the spatial extent of the object
- any other metadata

```xarray``` and ```numpy``` provide an efficient way to work with and process raster data. ```xarray``` also supports dask and parallel processing, which allows you to more efficiently process larger datasets using the processing power that you have on your computer.

When you add ```rioxarray``` to your package imports, you further get access to spatial data processing using xarray objects. Below, you can view the spatial extent (```bounds()```) and ```CRS``` of the data that you just opened above.



In [None]:
# View the Coordinate Reference System (CRS) 
print("The CRS for this data is:", dtm_pre_arr.rio.crs)


Next, you’ll learn about **spatial extent** of your raster data. The spatial extent of a raster or spatial object is the geographic area that the raster data covers.

![_._](img/raster-spatial-extent-coordinates.png)

The spatial extent of raster data. Notice that the spatial extent represents the rectangular area that the data cover. Thus, if the data are not rectangular (i.e. points OR an image that is rotated in some way) the spatial extent covers portions of the dataset where there are no data. Image Source: National Ecological Observatory Network (NEON).

![_._](img/spatial-extent.png)


The spatial extent of vector data, which you will learn about next week. Notice that the spatial extent represents the rectangular area that the data cover. Thus, if the data are not rectangular (i.e. points OR an image that is rotated in some way) the spatial extent covers portions of the dataset where there are no data. Image Source: National Ecological Observatory Network (NEON).

The spatial extent of a Python spatial object represents the geographic “edge” or location that is the furthest north, south, east and west. In other words, ```extent``` represents the overall geographic coverage of the spatial object.

You can access the spatial extent using the ```.rio.bounds()``` method provided by ```rioxarray```.





In [None]:
# View the spatial extent

print("The spatial extent is:", dtm_pre_arr.rio.bounds())

A raster has horizontal (x and y) resolution. This resolution represents the area on the ground that each pixel covers. The units for your data are meters, as determined by the CRS above. In this case, your data resolution is 1 x 1, which means that each pixel represents a 1 x 1 meter area on the ground. You can view the resolution of your data using the ```.rio.resolution()``` method.

In [None]:
# What is the x and y resolution for your raster data?
dtm_pre_arr.rio.resolution()
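
Extent, resolution and grid shape are tied together: if you know the upper-left corner, the pixel size and the number of rows and columns, you can work out the bounds. The sketch below uses hypothetical corner coordinates and dimensions, not the values of the lesson data, and assumes a north-up raster that is not rotated.

```python
# Hypothetical grid description, made up for illustration
left, top = 472000.0, 4436000.0   # x and y of the upper-left corner (meters)
res_x, res_y = 1.0, 1.0           # pixel width and height (meters)
n_rows, n_cols = 2000, 4000       # grid shape

# The extent follows from the corner, the pixel size and the grid shape
right = left + n_cols * res_x
bottom = top - n_rows * res_y

# (left, bottom, right, top) is the order that .rio.bounds() uses
bounds = (left, bottom, right, top)
print("The spatial extent is:", bounds)
```

Note that for north-up rasters the y resolution is often reported as a negative number, because y coordinates decrease from the top row to the bottom row.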

The ```nodata``` value (or fill value) is also stored in the ```xarray``` object.



In [None]:
print("The no data value is:", dtm_pre_arr.rio.nodata)


You can assign the CRS to a Python object, too. The example below only shows how to set a CRS for an object where it is missing and you know what the CRS should be. In this case your data already have a defined CRS, so this step is not necessary.



In [None]:
a_crs = dtm_pre_arr.rio.crs
# Assign the CRS to the raster object - this is just an example of how you would do that
dtm_pre_arr = dtm_pre_arr.rio.set_crs(a_crs, inplace=True)

The ```CRS``` EPSG code for your ```dtm_pre_arr``` object is 32613. Next, you can look that EPSG code up on the [spatialreference.org website](http://www.spatialreference.org/ref/epsg/32613/) to figure out what CRS it refers to and the associated units. In this case you are using UTM zone 13 North.
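
For WGS84 / UTM codes specifically, the zone can also be read from the EPSG number itself: codes 32601 to 32660 are the northern zones 1 to 60, and codes 32701 to 32760 are the southern zones. The helper below is a hypothetical function written for this illustration. It is not part of rioxarray or any other library, and it only handles this one family of EPSG codes.

```python
def utm_zone_from_epsg(epsg):
    """Return (zone, hemisphere) for a WGS84 / UTM EPSG code, else None."""
    if 32601 <= epsg <= 32660:
        return epsg - 32600, "N"
    if 32701 <= epsg <= 32760:
        return epsg - 32700, "S"
    # Any other code is outside the WGS84 / UTM range handled here
    return None

print(utm_zone_from_epsg(32613))
```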

Digging deeper, you can view the proj4 string, which tells you that the horizontal units of this projection are meters (```m```).


![_._](img/UTM-zones.png)


The UTM zones across the continental United States. Source: Chrismurf, wikimedia.org.

The ```CRS``` returned by Python is in EPSG format. This means that the projection information is represented by a single number. However, on the spatialreference.org website you can also view the proj4 string, which will tell you a bit more about the horizontal units that the data are in. An example proj4 string is below. Note that this example is for UTM zone 18, while your data are in zone 13.

```+proj=utm +zone=18 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0```
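
A proj4 string is a list of ```+key=value``` parameters separated by spaces. As a minimal sketch of how to read one, the snippet below splits the example string above into a dictionary using plain Python. In practice a library such as pyproj would normally do this parsing for you.

```python
proj4 = ("+proj=utm +zone=18 +datum=WGS84 +units=m +no_defs "
         "+ellps=WGS84 +towgs84=0,0,0")

# Split the string into parameters; flags such as +no_defs have no value
params = {}
for token in proj4.split():
    key, _, value = token.lstrip("+").partition("=")
    params[key] = value if value else True

print("Projection:", params["proj"])
print("Zone:", params["zone"])
print("Horizontal units:", params["units"])
```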

Once you have imported your data, you can plot it using the ```.plot()``` method of the ```DataArray```.


In [None]:
dtm_pre_arr.plot()
plt.show()

The data above should represent terrain model data. However, the range of values is not what is expected. These data are for Boulder, Colorado where the elevation may range from 1000-3000m. There may be some outlier values in the data that may need to be addressed. Below you look at the distribution of pixel values in the data by plotting a histogram.

Notice that there seem to be a lot of pixel values in the negative range in the histogram below.

In [None]:
# A histogram can also be helpful to look at the range of values in your data
# What do you notice about the histogram below?
dtm_pre_arr.plot.hist(color="purple")
plt.show()

Histogram for your lidar DTM. Notice the number of values that are below 0. This suggests that there may be nodata values in the data.


Looking at the min and max values of the data, you can see a very large negative number for the minimum. This number matches the nodata value that you looked at above.


In [None]:
print("the minimum raster value is: ", np.nanmin(dtm_pre_arr.values))
print("the maximum raster value is: ", np.nanmax(dtm_pre_arr.values))

### Raster Data Exploration - Min and Max Values


Looking at the minimum value of the data, one of two things may be going on that needs to be fixed:

1. there may be no data values in the data with a negative value that are skewing your plot colors
2. there also could be outlier data in your raster

You can explore the first option, that there are nodata values, by reading in the data and masking the nodata values using the ```masked=True``` parameter like this:

```rxr.open_rasterio(dem_pre_path, masked=True)```
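
Conceptually, masking replaces each nodata cell with ```NaN``` so that functions such as ```np.nanmin()``` ignore it. The sketch below imitates that step on a small numpy array. The nodata value and the elevations are hypothetical values made up for illustration, and this is not how rioxarray implements masking internally.

```python
import numpy as np

# Hypothetical nodata value and elevations (meters), made up for illustration
nodata = -3.4028235e38
elev = np.array([[nodata, 1680.5, 1702.0],
                 [1695.2, nodata, 1710.8]])

# Replace nodata cells with NaN so that nan-aware functions skip them
masked = np.where(elev == nodata, np.nan, elev)

print("Minimum before masking:", elev.min())
print("Minimum after masking:", np.nanmin(masked))
print("Maximum after masking:", np.nanmax(masked))
```

Before masking, the minimum is the nodata value itself. After masking, the minimum is a real elevation.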

Above you may have also noticed that the array has an additional dimension for the “band”. While the raster only has one layer - there is a 1 in the output of shape that could get in the way of processing.

You can remove that additional dimension using ```.squeeze()```

In [None]:
# Notice that the shape of this object has a 1 at the beginning
# This can cause issues with plotting
dtm_pre_arr.shape

In [None]:
# Open the data and mask no data values
# Squeeze reduces the third dimension given there is only one "band" or layer to this data
dtm_pre_arr = rxr.open_rasterio(dem_pre_path, masked=True).squeeze()
# Notice there are now only 2 dimensions to your array
dtm_pre_arr.shape
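
The effect of ```.squeeze()``` is easy to see on a plain numpy array, where the method behaves in the same way. The array shape below is a hypothetical example, not the shape of the lesson data.

```python
import numpy as np

# Hypothetical single-band raster: 1 band, 3 rows, 4 columns
arr = np.zeros((1, 3, 4))
print(arr.shape)

# squeeze() drops every dimension of length 1, here the band dimension
squeezed = arr.squeeze()
print(squeezed.shape)
```

Because ```.squeeze()``` removes every dimension of length 1, it is best suited to single-band rasters like this one.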

Plot the data again to see what has changed. Now you have a reasonable range of data values and the data plot as you might expect it to.

In [None]:
# Plot the data and notice that the scale bar looks better
# No data values are now masked
f, ax = plt.subplots(figsize=(10, 5))
dtm_pre_arr.plot(cmap="Greys_r",
                 ax=ax)
ax.set_title("Lidar Digital Elevation Model (DEM) \n Boulder Flood 2013")
ax.set_axis_off()
plt.show()

The histogram has also changed. Now, it shows a reasonable distribution of pixel values.



In [None]:
f, ax = plt.subplots(figsize=(10, 6))
# Pass ax explicitly so the histogram is drawn on the axes created above
dtm_pre_arr.plot.hist(color="purple",
                      bins=20,
                      ax=ax)
ax.set_title("Histogram of the Data with No Data Values Removed")
plt.show()


Notice that now the minimum value looks more like an elevation value (which should most often not be negative).



In [None]:
print("The minimum raster value is: ", np.nanmin(dtm_pre_arr.data))
print("The maximum raster value is: ", np.nanmax(dtm_pre_arr.data))

### Plot Raster and Vector Data Together
If you want, you can also add shapefile overlays to your raster data plot. Below you open a single shapefile using Geopandas that contains a boundary layer that you can overlay on top of your raster dataset.

In [None]:
# Open site boundary vector layer
site_bound_path = os.path.join("data",
                               "colorado-flood",
                               "spatial",
                               "boulder-leehill-rd",
                               "clip-extent.shp")
site_bound_shp = gpd.read_file(site_bound_path)

# Plot the vector data
f, ax = plt.subplots(figsize=(8,4))
site_bound_shp.plot(color='teal',
                    edgecolor='black',
                    ax=ax)
ax.set(title="Site Boundary Layer - Shapefile")
plt.show()

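Once you have your shapefile open, you can plot the two datasets together and begin to create a map. Both layers need to be in the same CRS to line up correctly, so check the CRS of the vector layer first.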

In [None]:
# Check CRS
site_bound_shp.crs

In [None]:
f, ax = plt.subplots(figsize=(11, 4))

dtm_pre_arr.plot.imshow(cmap="Greys",
                        ax=ax)
site_bound_shp.plot(color='None',
                    edgecolor='teal',
                    linewidth=2,
                    ax=ax,
                    zorder=4)

ax.set(title="Raster Layer with Vector Overlay")
ax.axis('off')
plt.show()