# Raster Intro Tutorial
This tutorial provides a brief introduction to the `Raster` class, which facilitates working with raster datasets.

## Introduction

Raster datasets are fundamental to pfdf - many routines require rasters as input, and many produce new rasters as output. In brief, a raster dataset is a rectangular grid composed of _pixels_, which are rectangular cells with assigned data values. The pixels are regularly spaced along the X and Y axes, and each axis may use its own spacing interval. A raster is usually associated with some spatial metadata, which locates the raster's pixels in space. Some rasters will also have a NoData value - when this is the case, pixels equal to the NoData value represent missing data.

A raster's spatial metadata consists of a coordinate reference system (CRS) and an affine transformation matrix (also known as the _transform_). The transform converts the data grid's row and column indices to spatial coordinates, and the CRS specifies the location of these coordinates on the Earth's surface. A transform defines a raster's resolution and alignment (the location of pixel edges) and takes the form:

$$
\begin{bmatrix}
dx & 0 & \mathrm{left}\\
0 & dy & \mathrm{top}
\end{bmatrix}
$$

Here _dx_ and _dy_ are the changes in spatial coordinate when incrementing one column or row, and their absolute values define the raster's resolution. Meanwhile, _left_ and _top_ are the spatial coordinates of the data grid's left and top edges, and they define the raster's alignment. The two remaining coefficients can be used to implement shear transforms, but pfdf only supports rectangular pixels, so these will always be 0 for our purposes.
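
As a minimal sketch of how the transform maps grid indices to coordinates, the snippet below applies the matrix by hand in plain Python. The function name `pixel_to_xy` and the coefficient values (10-unit pixels with the grid's upper-left corner at the origin) are assumptions chosen for illustration and are not part of pfdf.

```python
def pixel_to_xy(col, row, dx, dy, left, top):
    """Return the spatial (x, y) coordinate of a pixel's upper-left corner.

    Hypothetical helper for illustration only; not a pfdf function.
    """
    x = dx * col + left
    y = dy * row + top
    return x, y


# Assumed coefficients: 10-unit pixels, with Y decreasing as rows increase
dx, dy, left, top = 10, -10, 0, 0

print(pixel_to_xy(0, 0, dx, dy, left, top))  # upper-left corner of the grid
print(pixel_to_xy(3, 2, dx, dy, left, top))  # 3 columns right, 2 rows down
print(abs(dx), abs(dy))                      # resolution along each axis
```

A negative _dy_ is common because row indices increase downward while Y coordinates usually increase upward.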

To facilitate working with these datasets, pfdf provides the ``Raster`` class. In brief, the class provides routines to

* Load and build rasters from a variety of sources,
* Manage data values,
* Manage spatial metadata,
* Preprocess datasets, and
* Save rasters to file.

This tutorial provides a brief introduction to the `Raster` class, sufficient for implementing a basic hazard assessment. You can also find more detailed discussions in the [Raster Properties](06_Raster_Properties.ipynb), [Raster Factories](07_Raster_Factories.ipynb), and [Preprocessing](04_Preprocessing.ipynb) tutorials.

## Prerequisites

### Install pfdf
To run this tutorial, you must have installed [pfdf 3+ with tutorial resources](https://ghsc.code-pages.usgs.gov/lhp/pfdf/resources/installation.html#tutorials) in your Jupyter kernel. The following line checks this is the case:

In [None]:
import check_installation

### Example Rasters
Next, we'll clean our workspace of any example datasets, and then create an example raster file to use in the tutorial. This dataset is a 50x75 grid of random integers between 0 and 100 with a border of -999 NoData values along the edges. The raster is projected in EPSG:26911 with a 10 meter resolution. 

In [None]:
from tools import workspace, examples
workspace.remove_examples()
examples.build_raster()

### Imports
Finally, we'll import the ``Raster`` class from pfdf, and some small tools to help run the tutorial. We'll also use ``numpy`` to work with raster data grids. (**Note**: Importing `Raster` can take a bit, as Python needs to compile [numba](https://numba.pydata.org/) to do so).

In [None]:
from pfdf.raster import Raster
from tools import print_path
import numpy as np

## Raster Objects

The `Raster` class is used to create and manipulate `Raster` objects. Each `Raster` object holds the data grid for a raster dataset, along with associated metadata. Here, we'll use the `from_file` command to create a new `Raster` object from our example raster file. We'll discuss this command [in a later section](#Raster-Factories), but for now, just know that it's creating a `Raster` object from our example dataset:

In [None]:
raster = Raster.from_file('examples/raster.tif')

Printing the raster to the console, we can see a summary of the raster's data grid and spatial metadata:

In [None]:
print(raster)

## Raster Properties

`Raster` objects have a variety of properties that return information about the raster's data grid and metadata. This section only introduces a few common properties, but you can find a more detailed discussion in the [Raster Properties](06_Raster_Properties.ipynb) tutorial.

### Data Grid

You can use the `values` property to return a `Raster` object's data grid as a numpy array. For our example dataset, the data is an array of random integers, with a border of -999 NoData values along the edges:

In [None]:
print(raster.values)

The raster values are read-only. This means they'll work fine for most mathematical routines, but you'll need to make a copy if you want to alter the data elements directly. For example:

In [None]:
# Most routines are fine
median = np.median(raster.values)
print(median)

In [None]:
# But this will fail because it attempts to alter array elements
try:
    raster.values[0,:] = 0
except Exception as error:
    print('Failed because we attempted to alter the array directly')

In [None]:
# This is fine because we copied the array first
values = raster.values.copy()
values[0,:] = 0
print(values)
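
The read-only behavior can be reproduced with plain numpy. This is a sketch that assumes the `values` array behaves like a numpy array whose writeable flag is switched off. The stand-in array `grid` below is hypothetical and is not loaded from pfdf.

```python
import numpy as np

# Stand-in for a raster data grid (hypothetical values, not from pfdf)
grid = np.arange(12).reshape(3, 4)
grid.setflags(write=False)  # mark the array as read-only

# Reading and computing on a read-only array works as usual
total = int(grid.sum())

# Writing in place raises a ValueError
try:
    grid[0, :] = 0
    write_failed = False
except ValueError:
    write_failed = True

# A copy is a new, writeable array, so editing it leaves the original untouched
editable = grid.copy()
editable[0, :] = 0

print(total, write_failed)
print(editable.flags.writeable, grid[0, :].tolist())
```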

### Array Metadata
`Raster` objects have properties to report the data array's metadata. Some useful properties include:

* `shape`: The shape of the data array
* `dtype`: The data type of the dataset
* `nodata`: The NoData value
* `nbytes`: The size of the array in bytes.

For example, inspecting these properties for our example raster, we see the data grid is 50 x 75 pixels, uses a 64-bit integer data type, has a NoData value of -999, and uses 30 KB of memory:

In [None]:
print(raster.shape)
print(raster.dtype)
print(raster.nodata)
print(raster.nbytes)
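
The reported numbers follow from simple array arithmetic. The sketch below builds a stand-in numpy array with the same shape, dtype, and NoData value as the example raster, then checks the memory size and counts the pixels that hold data. The array contents and the one-pixel border width are assumptions for illustration; the example file itself is not read.

```python
import numpy as np

nodata = -999

# Stand-in grid: 50 rows x 75 columns of 64-bit integers, filled with NoData
grid = np.full((50, 75), nodata, dtype=np.int64)

# Fill the interior with placeholder data, leaving an assumed 1-pixel NoData border
grid[1:-1, 1:-1] = 42

# Memory use is the pixel count times the bytes per pixel
expected_nbytes = grid.shape[0] * grid.shape[1] * grid.dtype.itemsize
print(grid.nbytes == expected_nbytes, grid.nbytes)

# A boolean mask separates data pixels from NoData pixels
has_data = grid != nodata
print(has_data.sum())
```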

### Spatial Metadata

Other properties return the raster's spatial metadata. The most commonly used properties include:

* `crs`: The coordinate reference system as a [pyproj.CRS](https://pyproj4.github.io/pyproj/stable/api/crs/crs.html) object,
* `transform`: The affine transform as a [pfdf.projection.Transform](https://ghsc.code-pages.usgs.gov/lhp/pfdf/api/projection/transform.html) object, and
* `bounds`: The bounding box as a [pfdf.projection.BoundingBox](https://ghsc.code-pages.usgs.gov/lhp/pfdf/api/projection/bbox.html) object.

You can learn more about these metadata objects in the [Spatial Metadata](08_Spatial_Metadata.ipynb) tutorial.

Inspecting our example `Raster`, we can see it is projected in EPSG:26911, has a resolution of 10 CRS units (in this case, meters), and spans from 0 to 750 along the X axis, and from -500 to 0 along the Y axis:

In [None]:
raster.crs

In [None]:
raster.transform

In [None]:
raster.bounds
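
The bounds can be derived from the transform and the array shape. The following plain-Python sketch uses a hypothetical helper, `bounds_from_transform`, with coefficients assumed to match the values described above (10-unit pixels and an upper-left corner at the origin). It does not use pfdf's `Transform` or `BoundingBox` classes.

```python
def bounds_from_transform(shape, dx, dy, left, top):
    """Return (left, bottom, right, top) for a grid with the given shape.

    Hypothetical helper for illustration only; not a pfdf function.
    Assumes dx is positive and dy is negative (a north-up raster).
    """
    nrows, ncols = shape
    right = left + dx * ncols
    bottom = top + dy * nrows
    return left, bottom, right, top


# Assumed values: a 50 x 75 grid with 10-unit pixels anchored at (0, 0)
print(bounds_from_transform((50, 75), dx=10, dy=-10, left=0, top=0))
```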

## Raster Factories
To create a `Raster` object, you should use a `Raster` factory method. These methods build new `Raster` objects from different types of data sources. The factories follow the naming convention `from_<type>`, where `<type>` is a particular type of data source. Some common factories include:

* `from_file`: Loads a raster from the local filesystem,
* `from_url`: Loads a raster from a web URL,
* `from_array`: Builds a raster from a numpy array,
* `from_points`: Builds a raster from a collection of Point or MultiPoint features, and
* `from_polygons`: Builds a raster from a collection of Polygon or MultiPolygon features.

Each factory includes options for building `Raster` objects from the associated data source. For example, `from_file` includes an option to only load data in an area of interest, and `from_url` includes options for connecting to the remote server. You can find a detailed discussion of these factories in the [Raster Factory Tutorial](07_Raster_Factories.ipynb).

## Saving Rasters

It's often useful to save a `Raster` object to file, particularly when an analytical routine produces a new raster dataset as output. You can save `Raster` objects using the `save` command. This command takes a file name or path as input, and returns the path to the saved file as output. For example:

In [None]:
path = raster.save('examples/my-raster.tif')
print_path(path)

By default, the `save` command will not allow you to overwrite existing files. For example, calling the `save` command a second time with the same file name will fail because the file already exists:

In [None]:
try:
    raster.save('examples/my-raster.tif')
except FileExistsError:
    print('Failed because the file already exists')

You can permit overwriting by setting `overwrite=True`:

In [None]:
raster.save('examples/my-raster.tif', overwrite=True)
print('overwrote the existing file')

## Conclusion

In this tutorial, we've introduced the `Raster` class, which facilitates working with raster datasets. We've seen how to access a `Raster` object's data array, and examined properties with important metadata. We've learned that `Raster` objects are created using dedicated factory methods, and we've seen how to save raster datasets to file.

This tutorial was deliberately brief, and later tutorials examine the class in greater detail. As a reminder, you can learn more about `Raster` objects in the [Raster Properties](06_Raster_Properties.ipynb), [Raster Factories](07_Raster_Factories.ipynb), and [Preprocessing](04_Preprocessing.ipynb) tutorials. In the [next tutorial](03_Download_Data.ipynb), we'll see how to use the `data` package to download commonly used datasets (many of which are rasters) from the internet.