# Semantic Segmentation of Aerial Imagery with Raster Vision 
## Part 5: Overview of Raster Vision Model Configuration and Setup

This tutorial series walks through an example of using [Raster Vision](https://rastervision.io/) to train a deep learning model to identify buildings in satellite imagery.</br>

*Primary Libraries and Tools*:

|Name|Description|Link|
|-|-|-|
| `Raster Vision` | Library and framework for geospatial semantic segmentation, object detection, and chip classification with Python | https://rastervision.io/ |
| `Singularity` | Containerization software that allows for transportable and reproducible software | https://docs.sylabs.io/guides/3.5/user-guide/introduction.html |
| `pandas` | Dataframes and other datatypes for data analysis and manipulation | https://pandas.pydata.org/ |
| `geopandas` | Extends datatypes used by pandas to allow spatial operations on geometric types | https://geopandas.org/en/stable/ |
| `rioxarray` | Data structures and routines for working with gridded geospatial data | https://github.com/corteva/rioxarray |
| `plotnine` | A plotting library for Python modeled after R's [ggplot2](https://ggplot2.tidyverse.org/) | https://plotnine.readthedocs.io/en/v0.12.3/ |
| `pathlib` | A Python library for handling files and paths in the filesystem | https://docs.python.org/3/library/pathlib.html |

*Prerequisites*:
  * Basic familiarity with the Linux command line, including navigating among directories and editing text files
  * Basic Python skills, including an understanding of object-oriented programming, function calls, and basic data types
  * Basic understanding of shell scripts and job scheduling with SLURM for running code on Atlas
  * A SCINet account for running this tutorial on Atlas
  * **Completion of tutorial parts 1-4 of this series**

*Tutorials in this Series*:
  * 1\. **Tutorial Setup on SCINet**
  * 2\. **Overview of Deep Learning for Imagery and the Raster Vision Pipeline**
  * 3\. **Constructing and Exploring the Singularity Image**
  * 4\. **Exploring the dataset and problem space**
  * 5\. **Overview of Raster Vision Model Configuration and Setup <span style="color: red;">_(You are here)_</span>**
  * 6\. **Breakdown of Raster Vision Code Version 1**
  * 7\. **Evaluating training performance and visualizing predictions**

## Overview of Raster Vision Model Configuration and Setup

Raster Vision provides many classes for configuring a model, and it relies heavily on Abstract Base Classes (ABCs) and pydantic models. If you are not familiar with ABCs in Python, you can learn more about them [here](https://docs.python.org/3/library/abc.html#abc.ABC), and if you are not familiar with pydantic models, you can find a brief introduction [here](https://docs.pydantic.dev/latest/) and a thorough description of how to use them [here](https://docs.pydantic.dev/latest/concepts/models/).

One of the biggest hurdles to understanding Raster Vision code is keeping track of the many classes that Raster Vision defines. Many of these classes are subclasses of other Raster Vision classes, or hold instances of other classes as attributes. This can make the documentation confusing for a newcomer, because looking up one class often leads to several more unfamiliar classes. Here, we provide an overview of the classes and functions used to configure a basic model.

###### Note: In this tutorial, all Raster Vision class names will be hyperlinks to documentation, although they will be in code format so they won't be blue or underlined.

### 1. Config Objects and the get_config() Function

Raster Vision users can configure a model pipeline by writing a Python script that defines a function called `get_config()`. This function builds and returns an instance of [`RVPipelineConfig`](https://docs.rastervision.io/en/0.21/api_reference/_generated/rastervision.core.rv_pipeline.rv_pipeline_config.RVPipelineConfig.html). The class [`RVPipelineConfig`](https://docs.rastervision.io/en/0.21/api_reference/_generated/rastervision.core.rv_pipeline.rv_pipeline_config.RVPipelineConfig.html) is an Abstract Base Class (ABC), so users must build an instance of one of its three concrete subclasses:

- [`ChipClassificationConfig`](https://docs.rastervision.io/en/0.21/api_reference/_generated/rastervision.core.rv_pipeline.chip_classification_config.ChipClassificationConfig.html#chipclassificationconfig)
- [`ObjectDetectionConfig`](https://docs.rastervision.io/en/0.21/api_reference/_generated/rastervision.core.rv_pipeline.object_detection_config.ObjectDetectionConfig.html)
- [`SemanticSegmentationConfig`](https://docs.rastervision.io/en/0.21/api_reference/_generated/rastervision.core.rv_pipeline.semantic_segmentation_config.SemanticSegmentationConfig.html)

The [`RVPipelineConfig`](https://docs.rastervision.io/en/0.21/api_reference/_generated/rastervision.core.rv_pipeline.rv_pipeline_config.RVPipelineConfig.html) object encapsulates all the information that the Raster Vision pipeline needs to build the model, including which deep learning task to perform, where the data is stored, which model architecture to build, and the values of various hyperparameters. The Raster Vision pipeline calls the `get_config()` function defined by the user, uses the returned [`RVPipelineConfig`](https://docs.rastervision.io/en/0.21/api_reference/_generated/rastervision.core.rv_pipeline.rv_pipeline_config.RVPipelineConfig.html) object as a blueprint for how to build the desired model, and follows the steps of the pipeline as described in [step 2](#step_2).</br></br>
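
The relationship between an abstract pipeline config, its concrete subclasses, and `get_config()` can be sketched with the standard library alone. This is a minimal illustration, not Raster Vision code: `PipelineConfigSketch`, `SegmentationConfigSketch`, and the fields `root_uri` and `num_epochs` are hypothetical stand-ins chosen for this example, and the real classes take different arguments.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


# Hypothetical stand-ins: these are NOT Raster Vision classes. They only
# mirror the relationship between an abstract pipeline config and a
# concrete, task-specific subclass.
@dataclass
class PipelineConfigSketch(ABC):
    root_uri: str  # illustrative field: where outputs would be written

    @abstractmethod
    def task(self) -> str:
        """Return the name of the deep learning task this config describes."""


@dataclass
class SegmentationConfigSketch(PipelineConfigSketch):
    num_epochs: int = 1  # illustrative hyperparameter

    def task(self) -> str:
        return "semantic_segmentation"


def get_config(root_uri: str = "output/") -> PipelineConfigSketch:
    """Build and return the blueprint that a pipeline runner would consume."""
    return SegmentationConfigSketch(root_uri=root_uri, num_epochs=3)
```

Calling `PipelineConfigSketch(root_uri="output/")` directly raises a `TypeError` because the class still has an abstract method, which is the same reason a user has to pick one of the concrete subclasses listed above.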

When reading through the Raster Vision documentation and code, you will see many classes defined by Raster Vision with names that end with [`Config`](https://docs.rastervision.io/en/0.21/api_reference/_generated/rastervision.pipeline.config.Config.html), such as [`RVPipelineConfig`](https://docs.rastervision.io/en/0.21/api_reference/_generated/rastervision.core.rv_pipeline.rv_pipeline_config.RVPipelineConfig.html), [`ClassConfig`](https://docs.rastervision.io/en/0.21/api_reference/_generated/rastervision.core.data.class_config.ClassConfig.html), and [`DatasetConfig`](https://docs.rastervision.io/en/0.21/search.html?q=datasetconfig&check_keywords=yes&area=default). All of these classes are subclasses of Raster Vision's [`Config`](https://docs.rastervision.io/en/0.21/api_reference/_generated/rastervision.pipeline.config.Config.html) class, which is itself a pydantic model. Config objects exist to take advantage of pydantic's validation features: behind the scenes, Raster Vision checks that all of the parameters supplied by the user are valid. Many [`Config`](https://docs.rastervision.io/en/0.21/api_reference/_generated/rastervision.pipeline.config.Config.html) objects have associated objects. For example, [`DatasetConfig`](https://docs.rastervision.io/en/0.21/search.html?q=datasetconfig&check_keywords=yes&area=default) objects are blueprints for PyTorch [`Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset) objects, and [`SemanticSegmentationConfig`](https://docs.rastervision.io/en/0.21/api_reference/_generated/rastervision.core.rv_pipeline.semantic_segmentation_config.SemanticSegmentationConfig.html) objects are blueprints for [`SemanticSegmentation`](https://docs.rastervision.io/en/0.21/api_reference/_generated/rastervision.core.rv_pipeline.semantic_segmentation.SemanticSegmentation.html#rastervision.core.rv_pipeline.semantic_segmentation.SemanticSegmentation) objects. Separating the blueprint from the object it describes lets Raster Vision validate the user's input before creating and using that object.</br></br>
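
The "validate the blueprint first, then build the object" idea can be illustrated without pydantic. The sketch below uses a standard-library dataclass whose `__post_init__` plays the role of pydantic's validation; `LabelSetConfigSketch`, `LabelSet`, and the `build()` method shown here are hypothetical names for this example and are not taken from Raster Vision.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


class LabelSet:
    """Hypothetical runtime object that is created from a validated config."""

    def __init__(self, names: Tuple[str, ...]) -> None:
        # Map each class name to an integer id, in the order given.
        self.name_to_id: Dict[str, int] = {
            name: idx for idx, name in enumerate(names)
        }


@dataclass(frozen=True)
class LabelSetConfigSketch:
    """Hypothetical blueprint; its fields are checked before anything is built."""

    names: Tuple[str, ...]

    def __post_init__(self) -> None:
        # Stand-in for the field checks that pydantic would perform.
        if not self.names:
            raise ValueError("at least one class name is required")
        if len(set(self.names)) != len(self.names):
            raise ValueError("class names must be unique")

    def build(self) -> LabelSet:
        """Create the runtime object described by this blueprint."""
        return LabelSet(self.names)
```

With this layout, a mistake such as `LabelSetConfigSketch(names=("building", "building"))` fails immediately with a `ValueError`, before any expensive object is constructed.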

### 2. Directory Tree
There are many different ways a user can set up a directory tree to store their Singularity image, code scripts, input data, and output files. Below is the directory tree present in `/reference/workshops/rastervision`. Recall that in the setup instructions, we transferred the `model/` directory and this document to the project directory. The `model/` directory contains a symbolic link to the input directory, so we don't have to transfer all of our data files. You will have the opportunity to create the Singularity image using the `make_singularity_img.sh` script.

|-- model/ </br>
|-- |-- local/ </br>
|-- |-- src/ </br>
|-- |-- run_model.sh </br>
|-- |-- @input </br>
|-- |-- make_singularity_img.sh </br>
|-- input/ </br>
|-- |-- train/ </br>
|-- |-- test/ </br>
|-- |-- val/ </br>
|-- rastervision-0.21.2-dev.sif </br>
|-- Raster_Vision_workshop.ipynb </br>
|-- rastervision_env

The `model` directory contains all of our code. The `model/src` directory contains Python scripts that define different versions of the `get_config()` function. Later in this tutorial, we will run the Raster Vision pipeline with these different scripts to compare the outputs. The file `model/run_model.sh` is a shell script we use to execute the pipeline through SLURM. This script builds the Singularity image with the needed path bindings, invokes the Raster Vision pipeline, and refers to the desired Python script in `model/src`. The `model/local` directory is included to provide scratch space for Singularity. We don't need to put any files in this directory, but Singularity will use it when we build our container and will throw errors if it does not exist. Lastly, the `output` directory is where the Raster Vision pipeline will put the model output files, including the model evaluation metrics, the model bundle, and prediction rasters.

The `input` folder contains all of our data, separated into training, testing, and validation sets. We also have our Singularity image, `rastervision-0.21.2-dev.sif`. We will reference these files in place.
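
Before submitting a job, it can help to confirm that the layout described above is in place. The following is a small sketch using `pathlib`; the helper name `missing_paths` is hypothetical, and it assumes the symbolic link shown as `@input` in the tree is named `input` inside `model/`.

```python
from pathlib import Path
from typing import List

# Relative paths taken from the directory tree above. The data folders are
# reached through the symbolic link, assumed here to be `model/input`.
EXPECTED_DIRS = [
    "model/local",
    "model/src",
    "model/input/train",
    "model/input/test",
    "model/input/val",
]
EXPECTED_FILES = [
    "model/run_model.sh",
    "model/make_singularity_img.sh",
]


def missing_paths(project_root: Path) -> List[str]:
    """Return the expected entries that are absent under project_root."""
    missing = []
    for rel in EXPECTED_DIRS:
        # is_dir() follows symbolic links, so a broken link counts as missing.
        if not (project_root / rel).is_dir():
            missing.append(rel)
    for rel in EXPECTED_FILES:
        if not (project_root / rel).is_file():
            missing.append(rel)
    return missing
```

For example, running `print(missing_paths(Path.cwd()))` from the directory that holds `model/` prints an empty list when every expected entry is present.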