# Notebook 1: An introduction to __Solaris__ and your working environment

This notebook is developed for the FOSS4G International 2019 `solaris` Workshop. If you're using it outside of that context, some of the working environment materials will be unavailable. Check the GitHub repo for instructions on how to alter the notebooks for usage outside of the workshop.

This notebook has six parts:

1. [__Checking your `solaris` Installation__](#Checking-your-solaris-installation)
2. [__Listing the data provided__](#section2)
3. [__Input tile to building footprint vectors with 7 Python commands__](#section3)
4. [__Getting your pre-trained model ready__](#section4)
5. [__Running inference with `solaris` on SpaceNet MVOI data__](#section5)
6. [__Visualizing outputs from the models__](#section6)

Let's get started!

## Checking your solaris installation

The working environment provided for this workshop has `solaris` and all its dependencies pre-installed in a conda environment. If you're using the notebook outside of the workshop and need installation instructions, [click here](https://solaris.readthedocs.io/en/latest/installation.html).

Let's import `solaris` and check the package version to make sure it's available.

In [None]:
import solaris as sol
sol.__version__

<a id="section2"></a>
## Listing the data provided

We've provided a subset of the [SpaceNet](https://spacenet.ai) dataset for use in this workshop. If you're using the notebook outside of the FOSS4G International workshop, you'll need to collect the data yourself - see the GitHub repo containing this notebook for instructions on how to get the data you'll need.

First, let's look at the data provided. Everything is stored in one directory, `/data` (unless you're viewing this outside of the workshop).

In [None]:
import os

data_path = '/data'  # NON-WORKSHOP PARTICIPANTS: change this path to point to the directory where you've stored the data.
print('{} directory contents:'.format(data_path))
print(os.listdir(data_path))
print()
print('SpaceNet MVOI data stored in the directory "MVOI_data":')
print(os.listdir(os.path.join(data_path, 'MVOI_data')))
print()
print('SpaceNet 2 Khartoum imagery stored in the directory "Khartoum_data":')
print(os.listdir(os.path.join(data_path, 'Khartoum_data')))
print()
print('Configuration files stored in the directory "workshop_configs":')
print(os.listdir(os.path.join(data_path, 'workshop_configs')))


The configurations path also contains .csv files that specify data for inference.

Here, you can see the different data that you have access to:

- Test images for the SpaceNet Off-Nadir Dataset (AKA [SpaceNet MVOI](https://arxiv.org/abs/1903.12239))
- Training images for SpaceNet Khartoum building footprint extraction
- Configuration files for a few different model training and inference processes
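
If you're working outside of the workshop, it can help to confirm that your data directory has the layout described above before going further. The following is a minimal sketch using only the standard library; the helper name `find_missing_subdirs` is made up for illustration and is not part of `solaris`, and the subdirectory names are the ones listed in the cell above.

```python
import os
import tempfile

# Subdirectory names taken from the listing cell above.
EXPECTED_SUBDIRS = ('MVOI_data', 'Khartoum_data', 'workshop_configs')


def find_missing_subdirs(data_path, expected=EXPECTED_SUBDIRS):
    """Return the expected subdirectories that are absent from data_path.

    Hypothetical helper for illustration; it is not part of solaris.
    """
    return [name for name in expected
            if not os.path.isdir(os.path.join(data_path, name))]


# Demonstrate on a throwaway directory that contains only one of the three.
with tempfile.TemporaryDirectory() as tmp_dir:
    os.mkdir(os.path.join(tmp_dir, 'MVOI_data'))
    missing = find_missing_subdirs(tmp_dir)
    print('Missing subdirectories:', missing)
```

In practice you would call `find_missing_subdirs(data_path)` with your own `data_path` and fix anything it reports before running the rest of the notebook.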

<a id="section3"></a>
## Running the full pipeline

First, we'll run the entire inference process, just to show you the end result of what you get from `solaris`. Below, we break down each step to describe what's going on.

In [None]:
import time
import skimage.io  # import the io submodule explicitly so skimage.io.imread is available
from shapely.ops import cascaded_union  # just for visualization purposes

print('Loading config...')
config = sol.utils.config.parse(os.path.join(data_path, 'workshop_configs/xdxd_workshop_infer.yml'))
print('config loaded. Initializing model...')
xdxd_inferer = sol.nets.infer.Inferer(config)
print('model initialized. Loading dataset...')
inf_df = sol.nets.infer.get_infer_df(config)
print('dataset loaded. Running inference on the image.')
start_time = time.time()
xdxd_inferer(inf_df)
end_time = time.time()
print('running inference on one image took {} seconds'.format(end_time-start_time))
print('vectorizing output...')
resulting_preds = skimage.io.imread(os.path.join('xdxd_inference_out', 'MVOI_nadir10_test_sample.tif'))
predicted_footprints = sol.vector.mask.mask_to_poly_geojson(
    pred_arr=resulting_preds,
    reference_im=os.path.join(data_path, 'MVOI_data', inf_df.loc[0, 'image']))
print('output vectorized. A few of the vector-formatted building predictions:')
predicted_footprints.head()

Excluding the printing and recording commands, __it only took 7 lines of code to run an entire inference pipeline, from input tile to output vectors!__

Let's visualize those labels alongside the source image and ground truth.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

src_im_path = os.path.join(data_path, 'MVOI_data/MVOI_nadir10_test_sample.tif')
# read the image in
im_arr = skimage.io.imread(os.path.join(data_path, 'MVOI_data/viz_version.tif'))
# rescale to min/max in each channel
# im_arr = im_arr.astype('float') - np.amin(im_arr, axis=(0,1))
# im_arr = im_arr/np.amax(im_arr, axis=(0,1))
# im_arr = (im_arr*255).astype('uint8')
# switch B and R for viz
# tmp = im_arr[:, :, 0].copy()
# im_arr[:, :, 0] = im_arr[:, :, 2]
# im_arr[:, :, 2] = tmp
# generate mask from the predictions
pred_arr = sol.vector.mask.footprint_mask(predicted_footprints,
                                          reference_im=src_im_path)
ground_truth = sol.vector.mask.footprint_mask(
    os.path.join(data_path, 'MVOI_data/MVOI_nadir10_test_sample.geojson'),
    reference_im=src_im_path)

f, axarr = plt.subplots(1, 3, figsize=(16, 12))
axarr[0].imshow(im_arr[:, :, 0:3])
axarr[0].set_title('Source image', size=14)
axarr[1].imshow(pred_arr, cmap='gray')
axarr[1].set_title('Predictions', size=14)
axarr[2].imshow(ground_truth, cmap='gray')
axarr[2].set_title('Ground Truth', size=14)

# A step-by-step walkthrough of the pipeline above

<a id="section4"></a>
## Getting your pre-trained model ready

For our first pass, we'll use a [standard configuration file for XD_XD's model](https://github.com/CosmiQ/solaris/blob/master/solaris/nets/configs/xdxd_spacenet4.yml). See [the YAML config tutorial](https://solaris.readthedocs.io/en/latest/tutorials/notebooks/creating_the_yaml_config_file.html) for a description of what each item means. We'll display the configuration below, but don't worry if you can't follow what each config parameter is - it's just in case you're curious.

In [None]:
config = sol.utils.config.parse(os.path.join(data_path, 'workshop_configs/xdxd_workshop_infer.yml'))
config

As you can see, `solaris` reads the config YAML file in as a dictionary. `solaris` uses this `config` dictionary to specify all of the parameters for model training and inference (as well as some pre-processing steps). Then, you just pass the `config` dictionary to the `Inferer` class to create an inferer:

In [None]:
xdxd_inferer = sol.nets.infer.Inferer(config)

You already have XD_XD's pretrained model stored on your EC2 instance, but if you hadn't, the above line would have downloaded the model weights for you. Note that this will happen automagically for any pre-trained SpaceNet model provided by `solaris` (if you haven't downloaded it already). If you wish to use your own model weights, you can edit the configuration YAML file so that its `"model_path"` parameter points to your weights file.
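
Because the config is an ordinary dictionary once it is loaded, one way to picture the `"model_path"` change is as a key update. The sketch below runs on a stand-in dictionary rather than a real `solaris` config. It assumes that updating the dictionary in memory has the same effect as editing the YAML file, which this notebook does not confirm. The helper name, the weights path, and every value other than the `"model_path"` key are placeholders.

```python
import copy

# Stand-in for the dictionary returned by the config parser. Only
# "model_path" is named in the text above; the other key and all values
# here are placeholders for illustration.
config = {
    'model_name': 'xdxd_spacenet4',
    'model_path': None,
}


def with_custom_weights(config, weights_path):
    """Return a copy of config whose "model_path" points at weights_path.

    Hypothetical helper; not part of solaris.
    """
    new_config = copy.deepcopy(config)
    new_config['model_path'] = weights_path
    return new_config


custom_config = with_custom_weights(config, '/path/to/my_weights.pth')
print(custom_config['model_path'])
print(config['model_path'])  # the original dictionary is left unchanged
```

Working on a copy keeps the original configuration available if you want to switch back to the pre-trained weights later.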

Next, let's load in the .csv file that specifies the image we're going to run inference on. Below the next cell, you'll see the contents of the inference target `pandas.DataFrame`: a single row specifying the path to the image you ran inference on before.

In [None]:
inf_df = sol.nets.infer.get_infer_df(config)
inf_df
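
To make the structure of that table concrete, here is a minimal sketch of what a one-row inference CSV might contain. It uses the standard-library `csv` module rather than `pandas` so that it runs without the workshop data. The `image` column name matches the column this notebook reads from `inf_df`; the exact contents of the real workshop CSV are an assumption.

```python
import csv
import io

# Hypothetical contents of an inference CSV: a header row plus one image path.
# The "image" column name matches the column this notebook reads from inf_df;
# the assumption is that the real file stores a path like this one.
csv_text = 'image\n/data/MVOI_data/MVOI_nadir10_test_sample.tif\n'

rows = list(csv.DictReader(io.StringIO(csv_text)))
print(len(rows))         # number of images queued for inference
print(rows[0]['image'])  # path of the first (and only) image
```

Adding more rows to such a file would queue more images for the same inference call.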

Now that we've loaded in the path to the image we want to analyze, we're ready to identify buildings in the image! 

<a id="section5"></a>
## Running inference

Running inference is as easy as calling your inferer (`xdxd_inferer`) with the inference target dataframe (here, `inf_df`) as an argument. This will run the entire inference process on that image and save the resulting mask as a TIFF file. _(Non-workshop participants: this may take a couple of minutes if you're not using a GPU, so be patient!)_

In [None]:
model_result_mask = xdxd_inferer(inf_df)

_The above cell won't generate any output. Watch for the asterisk to the left to turn into a number to know when it finishes._

And you're done! Simple as that. Let's check out what that mask looks like:

<a id="section6"></a>
## Visualizing inference outputs

We saw a binary black-and-white image of building footprints in the full pipeline example earlier, but that's not what comes directly out of a deep learning model. The model actually produces a continuous "probability mask", in which each pixel's value reflects how likely the neural net thinks it is that the pixel is part of a building. Run the cell below to see what that looks like.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import skimage.io  # import the io submodule explicitly so skimage.io.imread is available

resulting_preds = skimage.io.imread(os.path.join('xdxd_inference_out', 'MVOI_nadir10_test_sample.tif'))
plt.imshow(resulting_preds[:, :, 0], cmap='gray')

The above is a pixel mask where higher values indicate a higher probability that a pixel belongs to a building. What `solaris` does internally, and what we'll do below, is binarize this mask to produce a building/no-building image:

In [None]:
binary_preds = resulting_preds > 0
plt.imshow(binary_preds.astype('uint8')[:, :, 0], cmap='gray')
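
The thresholding step is easier to follow on a tiny array whose values you can read directly. The sketch below applies the same `> 0` rule as the cell above to a synthetic 3x3 array; the values are invented for illustration and do not come from the model.

```python
import numpy as np

# Synthetic 3x3 "model output"; the values are invented for illustration.
scores = np.array([[-2.0, -0.5, 0.3],
                   [-1.0, 0.8, 1.5],
                   [0.0, 2.1, -0.1]])

# Same rule as the cell above: strictly positive values count as building.
binary = (scores > 0).astype('uint8')
print(binary)
```

Note that a value of exactly 0 falls on the "no building" side, because the comparison is strict.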

All that you can get directly from this is an answer to "which pixels are part of buildings?" That isn't very useful for identifying individual buildings, so let's generate a more useful output: georegistered building footprints.

In [None]:
from shapely.ops import unary_union  # just for visualization; replaces the deprecated cascaded_union
predicted_footprints = sol.vector.mask.mask_to_poly_geojson(
    pred_arr=resulting_preds,
    reference_im=inf_df.loc[0, 'image'],
    do_transform=True)

unary_union(predicted_footprints['geometry'].values)

(Note that the above doesn't necessarily display in some Jupyter notebook environments - but it's more or less identical to the raster-formatted version above, except each polygon is separated and outlined.)
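
To see why vectorized footprints say more than a binary mask, it helps to look at how contiguous "building" pixels can be grouped into separate objects. The sketch below labels 4-connected regions in a tiny hand-written mask using a breadth-first flood fill. It is a simplified illustration of the idea and is not taken from `solaris`; the function name `label_regions` is hypothetical.

```python
from collections import deque


def label_regions(mask):
    """Label 4-connected regions of truthy cells in a 2D list of 0/1 values.

    Returns (labels, count), where labels has the same shape as mask and
    holds 0 for background and 1..count for each separate region.
    Illustrative only; not taken from solaris.
    """
    n_rows, n_cols = len(mask), len(mask[0])
    labels = [[0] * n_cols for _ in range(n_rows)]
    count = 0
    for r in range(n_rows):
        for c in range(n_cols):
            # Skip background cells and cells that already have a label.
            if not mask[r][c] or labels[r][c]:
                continue
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < n_rows and 0 <= nx < n_cols
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Two separate blobs of "building" pixels.
mask = [[1, 1, 0, 0],
        [1, 0, 0, 1],
        [0, 0, 1, 1]]
labels, n_buildings = label_regions(mask)
print(n_buildings)
```

Each labeled region corresponds to one candidate building; a vectorization step then traces the outline of each region and converts it to georeferenced coordinates.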

The building footprints are stored as polygon geometries in a `geopandas.GeoDataFrame`, and they print in WKT format. The next cell will show you what those look like:

In [None]:
print(predicted_footprints['geometry'])

__Congratulations!__ You've run an _entire_ inference pipeline to predict where buildings are using `solaris` - it's as simple as using the commands above!

_Coming up next:_ We'll talk about what's going on under the hood in the code you just ran, including a quick tutorial on how neural nets work. To start with, continue to `2_under_the_hood.ipynb` and go through the first part of the notebook until it tells you to stop.