# Semantic Segmentation of Aerial Imagery with Raster Vision 
## Part 3: Constructing and Exploring the Singularity Image

This tutorial series walks through an example of using [Raster Vision](https://rastervision.io/) to train a deep learning model to identify buildings in satellite imagery.</br>

*Primary Libraries and Tools*:

|Name|Description|Link|
|-|-|-|
| `Raster Vision` | Library and framework for geospatial semantic segmentation, object detection, and chip classification with Python | https://rastervision.io/ |
| `Singularity` | Containerization software that allows for transportable and reproducible software | https://docs.sylabs.io/guides/3.5/user-guide/introduction.html |
| `pandas` | Dataframes and other datatypes for data analysis and manipulation | https://pandas.pydata.org/ |
| `geopandas` | Extends datatypes used by pandas to allow spatial operations on geometric types | https://geopandas.org/en/stable/ |
| `rioxarray` | Data structures and routines for working with gridded geospatial data | https://github.com/corteva/rioxarray |
| `plotnine` | A plotting library for Python modeled after R's [ggplot2](https://ggplot2.tidyverse.org/) | https://plotnine.readthedocs.io/en/v0.12.3/ |
| `pathlib` | A Python library for handling files and paths in the filesystem | https://docs.python.org/3/library/pathlib.html |

*Prerequisites*:
  * Basic understanding of the Linux command line, including navigating among directories and editing text files
  * Basic Python skills, including an understanding of object-oriented programming, function calls, and basic data types
  * Basic understanding of shell scripts and job scheduling with SLURM for running code on Atlas
  * A SCINet account for running this tutorial on Atlas
  * **Completion of tutorial parts 1 and 2 of this series**

*Tutorials in this Series*:
  * 1\. **Tutorial Setup on SCINet**
  * 2\. **Overview of Deep Learning for Imagery and the Raster Vision Pipeline**
  * 3\. **Constructing and Exploring the Singularity Image <span style="color: red;">_(You are here)_</span>**
  * 4\. **Exploring the dataset and problem space**
  * 5\. **Overview of Raster Vision Model Configuration and Setup**
  * 6\. **Breakdown of Raster Vision Code Version 1**
  * 7\. **Evaluating Training Performance and Visualizing Predictions**

## Constructing and Exploring the Singularity Image
#### Users who are not familiar with containerization are strongly encouraged to go through [this tutorial](https://carpentries-incubator.github.io/singularity-introduction/). </br>

### 1. Containerization Background and Setup
One of the most difficult aspects of software development is setting up the computing environment: ensuring that your code runs with the right software configuration and the right versions of every dependency installed. You may build an application on your machine but struggle to get it to work the same way on another machine because of differing software installations and configurations. Containerization addresses this by preventing dependency conflicts and improving the portability of code. A container bundles code together with all the libraries and dependencies it needs, so it can be moved easily from one machine to another. Because each container carries its own copies of the correct dependency versions, applications that need different versions of the same dependency will not conflict with one another. Docker and Singularity are two containerization platforms, each with its own pros and cons. </br>

##### Terminology note: an *image* is a snapshot of a computing environment, like a blueprint for a container. A *container* is an isolated computing environment built from the instructions in the image. Containers are running instances of images.</br>

The developers of Raster Vision publish the Raster Vision software as Docker images to simplify the process of running the Raster Vision pipeline. New versions of Raster Vision are released as Docker images [here](https://quay.io/repository/azavea/raster-vision?tab=tags). Docker is a popular containerization tool, but it requires root-level privileges, so it can't be used on a shared HPC system such as Atlas. Singularity, on the other hand, is designed to run on HPC systems, so in the following instructions we will build a Singularity image from Raster Vision's Docker image so that we can run Raster Vision on Atlas. </br></br>
First, ensure that the variables `$project_dir` and `$project_name` are available. If you have started a new Jupyter session since creating these variables, then you will need to create them again. Check to see if they are available by running:</br>
`echo $project_dir`</br>
`echo $project_name`</br>
##### If the project directory and project name do not appear, then return to the tutorial setup instructions in Part 1 of the series to create these variables before proceeding. </br>
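If you would rather have a check that names the missing variable than scan the output for a blank line, the sketch below tests both variables in one step. It assumes a bash shell, and the function name `check_project_vars` is hypothetical; it is not part of the tutorial's scripts.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the tutorial scripts): confirm that the
# variables created in Part 1 exist in the current shell session.
check_project_vars() {
    local var missing=0
    for var in project_dir project_name; do
        # ${!var} is bash indirect expansion: the value of the variable named by $var
        if [ -z "${!var:-}" ]; then
            echo "Missing: \$${var} (re-create it using the Part 1 instructions)" >&2
            missing=1
        else
            echo "${var}=${!var}"
        fi
    done
    return "$missing"
}

check_project_vars || echo "Set the missing variables before continuing." >&2
```

The function returns a non-zero status when either variable is empty, so it can also guard the start of a script.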

By default, Singularity caches all downloaded images in `$HOME/.singularity`, so if you delete an image and later download the same version again, Singularity pulls it from the local cache instead of the remote repository. This reduces network traffic, but Atlas users have limited space in their home directories, and the Singularity cache can quickly fill that space. The SCINet office recommends redirecting both the cache directory and Singularity's temporary build directory as follows to avoid filling up your home directory:</br></br>
`export SINGULARITY_CACHEDIR=$TMPDIR`</br>
`export SINGULARITY_TMPDIR=$TMPDIR`</br></br>
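These `export` commands only affect the current shell session, so they need to be repeated in any new terminal before you pull images. The sketch below is one way to do that; the fallback to `/tmp` when `$TMPDIR` is not defined is an assumption added for illustration, not part of the SCINet recommendation. It also reports how much space any existing cache in your home directory is already using.

```bash
#!/usr/bin/env bash
# Point Singularity's image cache and temporary build space at $TMPDIR.
# Assumption: fall back to /tmp if $TMPDIR is not set in this session.
scratch="${TMPDIR:-/tmp}"
export SINGULARITY_CACHEDIR="$scratch"
export SINGULARITY_TMPDIR="$scratch"

echo "Cache directory: $SINGULARITY_CACHEDIR"
echo "Temp directory:  $SINGULARITY_TMPDIR"

# Show the size of any cache already stored under the default location in $HOME
if [ -d "$HOME/.singularity" ]; then
    du -sh "$HOME/.singularity"
fi
```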
Next, navigate to the `model` directory within your project directory and submit a SLURM job that runs a script to pull the Raster Vision image from the remote repository and build a Singularity image file from it. This will take a while to run, so we recommend continuing with the reading below while the job runs. </br></br>
`cd ${project_dir}/model` </br>
`sbatch --account=$project_name make_singularity_img.sh` 
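The contents of `make_singularity_img.sh` come with the tutorial materials from Part 1 and are not reproduced here. As a rough idea of what such a job usually contains, here is a hypothetical SLURM script built around `singularity pull`. The image URI `docker://quay.io/azavea/raster-vision:pytorch-0.21`, the job name, and the time limit are assumptions inferred from the `.sif` file name used later; compare against the actual script before relying on any of them.

```bash
#!/bin/bash
# Hypothetical sketch of a job script like make_singularity_img.sh;
# the real script shipped with the tutorial may differ.
#SBATCH --job-name=make_rv_sif
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=02:00:00        # assumption: pulling a large image can be slow

module load singularity

# Keep the cache out of $HOME (same settings as the exports above)
export SINGULARITY_CACHEDIR="${TMPDIR:-/tmp}"
export SINGULARITY_TMPDIR="${TMPDIR:-/tmp}"

# Pull the Docker image (assumed tag) and convert it to a .sif file
# in the directory the job was submitted from.
singularity pull raster-vision_pytorch-0.21.sif \
    docker://quay.io/azavea/raster-vision:pytorch-0.21
```

After submitting, `squeue -u $USER` lists your pending and running jobs, and the job's log file (by default `slurm-<jobid>.out` in the directory you submitted from) shows its progress.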

### 2. Singularity File Systems
In addition to providing an isolated computing environment, Singularity containers have their own file systems, separate from the host system's file system. Directories on the host are made available inside the container by _binding_ them to paths in the container's file system. For example, say you have a directory of data files on the host file system at `/project/example/data` that you would like to access within the container. You could make this directory available by binding `/project/example/data` to a directory in the container's file system, such as `/opt/data`. Then, when you start the container, you can navigate to `/opt/data` and access the files in `/project/example/data` on the host system. If you modify files in `/opt/data` from within the container, the changes also apply to `/project/example/data` on the host, because both paths refer to the same files. This way, we can save files to the host system from within the container and access them after the container shuts down. Note that your permissions within the container are identical to your permissions on the host system, so you can't do anything to the host's file system from within a container that you couldn't otherwise do outside of it.</br></br>
Depending on the administrative configuration of the host system, certain host directories are bound into the container by default. For example, Singularity typically binds your home directory (`$HOME`) and your current working directory on the host to the same paths inside the container. If you wish to bind additional directories, you can specify them when you launch the container. We will discuss the specifics of binding directories in section 4, after we discuss how to launch a container.
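When you need more than one binding, the `-B` option also accepts a comma-separated list of `host:container` pairs, and each pair may take a third field such as `:ro` to mount it read-only. As an illustration, the hypothetical helper `join_binds` below (not part of Singularity or the tutorial scripts) assembles such a list and stops if a host path does not exist, which catches typos before the container starts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: join "host:container[:opts]" pairs into one bind list.
join_binds() {
    local spec host out=""
    for spec in "$@"; do
        host="${spec%%:*}"            # host path = text before the first colon
        if [ ! -e "$host" ]; then
            echo "Host path does not exist: $host" >&2
            return 1
        fi
        out="${out:+$out,}$spec"      # insert a comma only between entries
    done
    printf '%s\n' "$out"
}

# Example with the paths from the text (they must exist on your system):
# binds=$(join_binds /project/example/data:/opt/data)
# singularity shell -B "$binds" my_image.sif
```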

### 3. Launching a Singularity Container
There are several Singularity commands for launching a container from a Singularity image file (`.sif` file). The most common are `shell`, `run`, and `exec`. Here is a quick overview of these three commands: </br>

`singularity shell my_image.sif` starts a container from the image and launches an interactive shell inside it. This is useful for exploring the container interactively and for debugging. You can shut down the container with the `exit` command. We will use this command soon to explore the Raster Vision container.</br></br>
`singularity run my_image.sif` executes the default _runscript_ of the `my_image.sif` image. A _runscript_ is included in a Singularity image to specify the default behavior when we "run" a container. </br></br>
`singularity exec my_image.sif command` runs a specific command within the container instead of the default behavior defined by the runscript, which lets us choose which program or script to run. For example, `singularity exec my_image.sif python python_script.py` runs `python_script.py` with the container's Python interpreter. The script must be reachable from inside the container, for example in the working directory or in a bound directory. </br>
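Because `exec` takes an arbitrary command and passes that command's exit status back to the caller, it is well suited to batch scripts, whereas `shell` needs someone at the keyboard. The sketch below wraps it in a hypothetical function, `run_in_image`; the `SING_BIN` variable is also hypothetical, added only so the full command line can be previewed with `echo` before anything is launched.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around `singularity exec` for one-off commands.
# SING_BIN is an assumption added for previewing: set SING_BIN=echo to print
# the command line instead of starting a container.
run_in_image() {
    local image="$1"
    shift                                   # remaining arguments form the command
    "${SING_BIN:-singularity}" exec "$image" "$@"
}

# Preview the example from the text without launching a container:
SING_BIN=echo run_in_image my_image.sif python python_script.py
```

Drop the `SING_BIN=echo` prefix (after `module load singularity`) to actually run the command in the container.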

### 4. Exploring the Raster Vision Container

Once the `make_singularity_img.sh` script has completed running, you should see the file `raster-vision_pytorch-0.21.sif` in your `model/` directory. We will first explore the container as is, then we will bind a directory of data files from the host system to a directory within the container. First, load the singularity module:</br>
`module load singularity` </br>
Then, from your `model/` directory, run the command: </br>
`singularity shell raster-vision_pytorch-0.21.sif` </br></br>
The container will take a minute to launch. Once it does, your prompt will change to `Singularity>`. Next, run the commands: </br>
`pwd` </br>
`ls` </br></br>
The output shows that you are in the `model/` directory that you launched the container from. This directory is bound to the container by default, and its path within the container is the same as its path on the host system. Next, run the commands: </br>
`cd /opt/src` </br>
`ls` </br></br>
This directory contains the Raster Vision source code within the container. We won't need to touch these files to run the pipeline, but this is the code that runs it. When new versions of Raster Vision are released, new images are published with updated code in this directory. Next, we will launch the container with our data directory bound to it. To exit the current container, run the command:</br>
`exit`</br></br>
To bind a directory to the container, we use the option `-B` or `--bind`, followed by our binding specifications in the format `/host/system/directory/:/container/directory/`. Our input data files are stored at `/reference/workshops/rastervision/input/`. Run the following command to launch the container with the `input` directory on the host system bound to `/opt/data/input` in the container. Note that if the directory we specify does not already exist in the container, it will be created. </br>
`singularity shell -B /reference/workshops/rastervision/input/:/opt/data/input raster-vision_pytorch-0.21.sif` </br>
Once the container launches, run: </br>
`cd /opt/data/input` </br>
`ls` </br></br>
Now we can see that our data is available within our container! When we run Raster Vision, we will bind our input and output directories so the Raster Vision pipeline can access our input data, and the pipeline can store output files to a directory on the host system. This way, we can see our model output files after the container is shut down.
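The same binding can be set up without an interactive shell, which is how a batch job would use it. Singularity also reads bind specifications from the `SINGULARITY_BIND` environment variable, using the same comma-separated format as `-B`. The sketch below is a hypothetical function, `setup_rv_binds`, that binds the input directory read-only (`:ro`) and an output directory read-write; the host path `<project_dir>/output` and the container path `/opt/data/output` are assumptions for illustration, not necessarily the paths used in later parts of this series.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prepare input/output binds for a non-interactive run.
# Assumption: outputs are written to <project_dir>/output on the host.
setup_rv_binds() {
    local project="$1"
    if [ -z "$project" ]; then
        echo "usage: setup_rv_binds PROJECT_DIR" >&2
        return 1
    fi
    local input_host=/reference/workshops/rastervision/input/
    local output_host="$project/output"
    mkdir -p "$output_host" || return 1     # host-side output directory must exist
    # Read by Singularity as if passed with -B; ":ro" protects the shared input data
    export SINGULARITY_BIND="${input_host}:/opt/data/input:ro,${output_host}:/opt/data/output"
}

# Usage (requires `module load singularity`):
# setup_rv_binds "$project_dir"
# singularity exec raster-vision_pytorch-0.21.sif ls /opt/data/input
```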

#### Conclusion
Now you should have a basic understanding of the Singularity image and how to access files on the host system from within the container. In the next tutorial, we will explore the dataset used in this series and discuss the problem we are trying to solve with the Raster Vision model we are building.