# Getting started with the geospatial visualisation workshop



<aside style="background-color: #e7f3f7; margin: 1em 0px; padding: 0.5em 1.5em 1em 3.7em; font-size: 0.9em;">
<h3 style="position: relative;"><i style="position: absolute; left: -2.15em; top: -0.2em;"></i>By the end of this workshop, you should be able to:</h3>
<ul>
<li>describe the different types of geospatial data</li>
<li>use Python libraries to load, analyse and visualise geospatial data</li>
<li>use visualisation to analyse interactions between different geospatial datasets</li>
</ul>
</aside>

<aside style="background-color: #e7f3f7; margin: 1em 0px; padding: 0.5em 1.5em 1em 3.7em; font-size: 0.9em;">
<h3 style="position: relative;"><i style="position: absolute; left: -2.15em; top: -0.2em;"></i>The workshop is divided into:</h3>
<ul>
<li>introduction to raster data [1.5 hours]</li>
<li>introduction to vector data [1.5 hours]</li>
<li>case study which combines raster and vector data [1 hour]</li>
</ul>
</aside>

<p></p>

# Checklist to complete the workshop

- Read through the getting started notebook
- Read the paper on colour scales [here](https://www.nature.com/articles/s41467-020-19160-7)
- Follow the instructions in **Next steps - vector data**
- Follow the instructions in **Next steps - raster data**
- Follow the instructions in **Case study assignment**

*Note: The notebooks have been tested with Python 3.8*

# Introduction to geospatial data

**Geospatial data** represents information (or features) with respect to locations on Earth.  
For example, a geospatial dataset may show the population of each province in a country. Alongside the population figures, it also contains the geographical information of those provinces, such as the longitude and latitude coordinates of their boundaries.

There are two main types of geospatial data: vector and raster.  
- **Vector data** are built from points (coordinate pairs). The resulting geometries can be points, lines or polygons. For example, earthquakes can be represented as points, roads as lines and countries as polygons.
- **Raster data** are built from a regular grid of cells (pixels), each of which holds a value.

The figure below illustrates these two ways of digitising geospatial data:
  
<img src="Data/images/geospatial_data_types.png" alt="drawing" width="500"/>

# Introduction to vector data

<aside style="background-color: #e7f3f7; margin: 1em 0px; padding: 0.5em 1.5em 1em 3.7em; font-size: 0.9em;">
<h3 style="position: relative;"><i style="position: absolute; left: -2.15em; top: -0.2em;"></i>In this session, the following topics will be covered:</h3>
<ul>
<li>vector data types</li>
<li>vector data formats</li>
<li>vector data structures</li>
<li>vector data visualisation</li>
</ul>
</aside>


## Vector data types

There are three types of vector data - **Points, Lines** and **Polygons**.

The definitions of these are given below:

<img src="Data/images/vector_data.png" alt="drawing" width="500"/>

Source: National Ecological Observatory Network (NEON)
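A minimal sketch of the three vector types in Python, assuming the shapely library (which geopandas uses for its geometries) is installed. The coordinates are made-up longitude/latitude pairs, so the length and area below are in degrees, not in metres.

```python
from shapely.geometry import LineString, Point, Polygon

# A point: a single (x, y) coordinate pair, e.g. an earthquake epicentre
epicentre = Point(105.8, 21.0)

# A line: an ordered sequence of coordinates, e.g. a road segment
road = LineString([(105.8, 21.0), (106.0, 21.0), (106.0, 21.2)])

# A polygon: a closed ring of coordinates, e.g. an administrative boundary
province = Polygon([(105.5, 20.8), (106.3, 20.8), (106.3, 21.4), (105.5, 21.4)])

for geom in (epicentre, road, province):
    print(geom.geom_type, geom.bounds)

# Lines have a length and polygons an area, both in the units of the coordinates
print(road.length)
print(province.area)

# Geometries can be related to each other, e.g. point-in-polygon tests
print(province.contains(epicentre))
```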

## Vector data formats


These three types of vector data can be stored in different file formats. Some of these formats are shown below:

<img src="Data/images/vector_data_formats_sample.png" alt="drawing" width="900"/>

An exhaustive list can be found [here](https://gisgeography.com/gis-formats/). The two most commonly used formats, and the ones we will work with in this workshop, are **Shapefile** and **GeoJSON**.

General information about these two formats is given below:

<img src="Data/images/vector_data_formats.png" alt="drawing" width="500"/>

Source: GIS Geography
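Because GeoJSON is plain JSON text, the structure of a GeoJSON file can be inspected directly. The sketch below uses a small, hypothetical FeatureCollection and converts it into a geopandas `GeoDataFrame` with `GeoDataFrame.from_features`; GeoJSON coordinates are longitude/latitude on WGS84, hence EPSG:4326. For files on disk, `geopandas.read_file` reads both Shapefiles and GeoJSON files.

```python
import json

import geopandas as gpd

# A minimal GeoJSON FeatureCollection (hypothetical data): one point and one line
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"name": "station A"},
      "geometry": {"type": "Point", "coordinates": [105.8, 21.0]}
    },
    {
      "type": "Feature",
      "properties": {"name": "road 1"},
      "geometry": {"type": "LineString", "coordinates": [[105.8, 21.0], [106.0, 21.2]]}
    }
  ]
}
"""

# GeoJSON is ordinary JSON, so the standard library can parse it
collection = json.loads(geojson_text)
print(collection["features"][0]["geometry"])

# Each feature becomes a row: its properties become columns, its geometry the geometry column
gdf = gpd.GeoDataFrame.from_features(collection["features"], crs="EPSG:4326")
print(gdf)
```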

## Vector data structures

**geopandas** is an efficient Python library for loading and analysing geospatial data, especially vector data, and it is the library we will work with in this workshop. It extends pandas with two main data structures, the `GeoSeries` and the `GeoDataFrame`, which are described [here](https://geopandas.org/en/stable/docs/user_guide/data_structures.html). geopandas also has basic built-in visualisation through its `plot()` method, but the maps it produces are static rather than dynamic and interactive.
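A short sketch of the `GeoDataFrame` structure, using two hypothetical square provinces. It assumes geopandas (with shapely and pyproj) is installed, and picks UTM zone 48N (EPSG:32648) as a metric CRS purely for illustration: areas should be computed in a projected CRS rather than in longitude/latitude degrees.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Hypothetical provinces: ordinary attribute columns plus one geometry column
provinces = gpd.GeoDataFrame(
    {
        "province": ["A", "B"],
        "population": [1_200_000, 850_000],
    },
    geometry=[
        Polygon([(105.0, 20.0), (106.0, 20.0), (106.0, 21.0), (105.0, 21.0)]),
        Polygon([(106.0, 20.0), (107.0, 20.0), (107.0, 21.0), (106.0, 21.0)]),
    ],
    crs="EPSG:4326",  # longitude/latitude on the WGS84 datum
)

print(type(provinces.geometry))  # the geometry column is a GeoSeries
print(provinces.crs)

# Reproject to a metric CRS (UTM zone 48N, assumed here) before measuring areas
provinces_utm = provinces.to_crs(epsg=32648)
provinces_utm["area_km2"] = provinces_utm.area / 1e6  # m² -> km²
print(provinces_utm[["province", "population", "area_km2"]])

# Built-in static plot based on matplotlib (requires matplotlib):
# provinces.plot(column="population", legend=True)
```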

## Vector data visualisation

There are multiple Python libraries that can be used to visualise geospatial vector data. An overview of some of the best can be found [here](https://towardsdatascience.com/best-libraries-for-geospatial-data-visualisation-in-python-d23834173b35).

In this workshop, we will work with the **folium** library. Its full documentation can be found [here](https://python-visualization.github.io/folium/).

# Next steps - vector data

There are three separate notebooks on vector data: Points.ipynb, Linestrings.ipynb and Polygon.ipynb.

Each notebook covers formats, structures and visualisation. Use them to practise the different functionalities of the geopandas and folium libraries.

We recommend working through the material in the following order:
- Work through Points.ipynb
- Work through the geopandas documentation
- Work through Linestrings.ipynb
- Work through Polygons.ipynb
- Work through the folium documentation

# Introduction to raster data

The alternative to vector data is raster data. In its simplest form, a raster consists of a matrix of cells (or pixels) organized into rows and columns (a grid), where each cell contains a value representing information. In a more complex form, a raster can have multiple layers (also called bands), each of which again represents information.
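In Python, a raster maps naturally onto a NumPy array. The sketch below uses made-up values; stacking layers along the first axis as (layers, rows, columns) follows the convention used by, for example, rasterio, but other tools may order the axes differently.

```python
import numpy as np

# Single-layer raster: a 3 x 4 grid of cells (rows x columns), one value per cell
elevation = np.array(
    [
        [10.0, 12.5, 13.0, 15.2],
        [9.8, 11.0, 12.7, 14.9],
        [9.5, 10.2, 11.9, 13.3],
    ]
)
print(elevation.shape)  # (rows, columns)
print(elevation[0, 2])  # value of the cell in row 0, column 2

# Multi-layer raster: layers stacked along the first axis as (layers, rows, columns),
# e.g. the red, green and blue bands of an aerial photo (random values here)
rng = np.random.default_rng(seed=0)
rgb = rng.integers(0, 256, size=(3, 3, 4), dtype=np.uint8)
print(rgb.shape)
print(rgb[1])  # the second layer, itself a 3 x 4 grid
```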

![](Data/images/raster_data_simple.png)

In raster datasets, each cell (also known as a pixel) holds a value, often referred to as a digital number. The cell values represent the phenomenon portrayed by the raster dataset, such as a category, magnitude or spectral value. The category could be a land-use class such as grassland, forest or road. A magnitude might represent temperature or surface elevation above mean sea level. Spectral values are used in satellite imagery and aerial photography to represent light reflectance and color.

Cell values can be either positive or negative, integer, or floating point. Integer values are best used to represent categorical (discrete) data and floating-point values to represent continuous surfaces.
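The sketch below contrasts a categorical raster stored as integers with a continuous raster stored as floating-point values. The land-use codes and temperature values are hypothetical.

```python
import numpy as np

# Categorical (discrete) raster stored as small integers: hypothetical land-use codes
LAND_USE = {1: "grassland", 2: "forest", 3: "road"}
land_use = np.array(
    [
        [1, 1, 2],
        [1, 3, 2],
        [3, 3, 2],
    ],
    dtype=np.uint8,
)

# Continuous raster stored as floats, which may be negative: winter temperatures in °C
temperature = np.array(
    [
        [-2.5, -1.0, 0.5],
        [-1.8, 0.2, 1.4],
        [0.0, 1.1, 2.3],
    ],
    dtype=np.float32,
)

# Counting cells per class is meaningful for categorical data ...
codes, counts = np.unique(land_use, return_counts=True)
for code, count in zip(codes, counts):
    print(LAND_USE[int(code)], int(count))

# ... while summary statistics such as the mean suit continuous data
print(float(temperature.mean()), float(temperature.min()), float(temperature.max()))
```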

Each cell covers an area of the same width and height, and is an equal portion of the entire surface represented by the raster. For example, a raster representing elevation (that is, a digital elevation model) may cover an area of 100 square kilometers. If there were 100 cells in this raster, each cell would represent 1 square kilometer of equal width and height (that is, 1 km x 1 km).
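The arithmetic of the elevation example above, written out in Python. It assumes a square raster of 10 x 10 cells, which the example implies but does not state.

```python
import math

# Numbers from the example: a DEM covering 100 km² split into 100 cells
raster_area_km2 = 100.0
n_cells = 100

cell_area_km2 = raster_area_km2 / n_cells   # every cell covers an equal share
cell_size_km = math.sqrt(cell_area_km2)     # square cells: width == height
cells_per_side = round(math.sqrt(n_cells))  # assumes a square raster

print(f"{cell_area_km2} km² per cell, i.e. {cell_size_km} km x {cell_size_km} km")
print(f"grid of {cells_per_side} x {cells_per_side} cells")
```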

The dimension of the cells can be as large or as small as needed to represent the surface conveyed by the raster dataset and the features within the surface, such as a square kilometer, square foot, or even square centimeter. The cell size determines how coarse or fine the patterns or features in the raster will appear.
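One simple way to see the effect of cell size is to aggregate a raster into larger cells. The sketch below averages non-overlapping blocks with NumPy; `coarsen` is a hypothetical helper name, and averaging only suits continuous data (a categorical raster would need, for example, the most frequent class per block instead).

```python
import numpy as np


def coarsen(raster: np.ndarray, factor: int) -> np.ndarray:
    """Aggregate a 2D raster into larger cells by averaging factor x factor blocks.

    Assumes both dimensions of `raster` are divisible by `factor`.
    """
    rows, cols = raster.shape
    if rows % factor or cols % factor:
        raise ValueError("raster shape must be divisible by factor")
    # Split each axis into (number of blocks, cells per block), then average within blocks
    blocks = raster.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


# A 4 x 4 raster with 1 km cells (hypothetical values) ...
fine = np.arange(16, dtype=float).reshape(4, 4)
# ... becomes a 2 x 2 raster with 2 km cells covering the same area
coarse = coarsen(fine, 2)
print(fine)
print(coarse)
```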

The location of each cell is defined by the row and column in which it is located within the raster matrix. Essentially, the matrix is represented by a Cartesian coordinate system in image space, in which the rows of the matrix are parallel to the x-axis and the columns to the y-axis of the Cartesian plane. Row and column values typically begin at 0. These image coordinates are then mapped to real-world coordinates in a given projection system, which is typically called a Coordinate Reference System (CRS). Many CRSs exist, each serving different purposes; you can find an overview at [https://spatialreference.org/](https://spatialreference.org/ref/epsg/).
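For a north-up raster without rotation, the mapping between (row, column) and real-world (x, y) coordinates only needs the coordinates of the top-left corner and the cell size. The sketch below assumes hypothetical values in a projected CRS measured in metres; libraries such as rasterio store this information as an affine transform. `cell_to_coords` and `coords_to_cell` are illustrative helper names.

```python
import math
from typing import Tuple

# Hypothetical north-up raster: top-left corner at (x0, y0) in metres,
# square cells of `res` metres; rows increase southwards, columns eastwards
x0, y0 = 500_000.0, 2_350_000.0
res = 30.0


def cell_to_coords(row: int, col: int) -> Tuple[float, float]:
    """Real-world (x, y) coordinates of the centre of cell (row, col)."""
    x = x0 + (col + 0.5) * res
    y = y0 - (row + 0.5) * res
    return x, y


def coords_to_cell(x: float, y: float) -> Tuple[int, int]:
    """(row, col) of the cell that contains the point (x, y)."""
    col = math.floor((x - x0) / res)
    row = math.floor((y0 - y) / res)
    return row, col


print(cell_to_coords(0, 0))
print(coords_to_cell(500_100.0, 2_349_950.0))
```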

 ![](Data/images/raster_data_relation.gif)

## Raster data formats


A multitude of raster file formats are available for use in different software tools. Among the most common raster files used on the web are the JPEG, TIFF and PNG formats, all of which are openly documented and can be used with most software packages. However, plain JPEG, TIFF and PNG files carry no georeferencing information, so they cannot be used directly for geospatial mapping: they exist only in image space and not in coordinate space.

To use these files in a geospatial context, an image-to-coordinate translation must be available that specifies the locations and transformations needed to project the image into a standard coordinate system (e.g., Universal Transverse Mercator [UTM] or State Plane). We will therefore work with georeferenced raster data, for which GeoTIFF is a common format (see an overview of other formats [here](https://pro.arcgis.com/en/pro-app/latest/help/data/imagery/supported-raster-dataset-file-formats.htm)).

A georeferenced dataset needs at least information on the extent and the resolution of the raster:

 - Extent: The spatial extent is the geographic area that the raster data covers. The spatial extent of a spatial object is given by its furthest north, south, east and west edges. In other words, the extent represents the overall geographic coverage of the spatial object.

 - Resolution: The resolution of a raster represents the area on the ground that each pixel covers. The image below illustrates the effect of changes in resolution for images with a similar extent.

![](Data/images/raster_resolution.png)
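The sketch below computes the extent of a north-up raster from its top-left corner, cell size and shape, and then shows how many cells are needed to cover that same extent at different resolutions. `Extent`, `extent_from_grid` and `grid_shape` are hypothetical helpers, and the coordinates are illustrative values in metres.

```python
import math
from typing import NamedTuple, Tuple


class Extent(NamedTuple):
    """Bounding box of a raster in CRS units (here metres)."""

    west: float
    south: float
    east: float
    north: float


def extent_from_grid(west: float, north: float, res: float, n_rows: int, n_cols: int) -> Extent:
    """Extent of a north-up raster, given its top-left corner, cell size and shape."""
    return Extent(west, north - n_rows * res, west + n_cols * res, north)


def grid_shape(extent: Extent, res: float) -> Tuple[int, int]:
    """(rows, columns) of square cells of size `res` needed to cover `extent`.

    Rounded up, so the grid may extend slightly beyond the extent.
    """
    n_rows = math.ceil((extent.north - extent.south) / res)
    n_cols = math.ceil((extent.east - extent.west) / res)
    return n_rows, n_cols


# Hypothetical raster: 100 x 200 cells of 30 m, top-left corner at (500000, 2350000)
extent = extent_from_grid(500_000.0, 2_350_000.0, 30.0, n_rows=100, n_cols=200)
print(extent)

# The same extent at a finer, the original and a coarser resolution
for res in (10.0, 30.0, 90.0):
    print(res, grid_shape(extent, res))
```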

## Raster data visualisation


Similar to vector data, there are multiple python libraries that can be used to visualise geospatial raster data.

In this workshop, we will again work with the **folium** library. Its full documentation can be found [here](https://python-visualization.github.io/folium/), including specific guidelines for visualising [raster data](https://python-visualization.github.io/folium/modules.html#module-folium.raster_layers).

# Next steps - raster data

There is a separate notebook on raster data that walks through the different steps of handling raster data. Please follow the instructions in [4.Rasters.ipynb](4.Rasters.ipynb).

# Case study assignment

Use the workshop materials to tackle a geospatial visualisation problem. 

We have provided two datasets:
- vector data containing the road network of Vietnam and the provincial boundaries of Vietnam
- raster data containing the flood risk of Vietnam

In this assignment, you have to complete the following tasks:
- Identify provinces that are vulnerable to flooding
- Identify roads that are vulnerable to flooding
- Identify provinces that have vulnerable roads
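One possible approach to the last task, sketched with hypothetical data: assuming the roads have already been flagged as vulnerable in an earlier step (the `vulnerable` column here is made up), a spatial join with `geopandas.sjoin` finds the provinces they intersect. The `predicate` keyword assumes geopandas 0.10 or later (older versions used `op`).

```python
import geopandas as gpd
from shapely.geometry import LineString, Polygon

# Hypothetical province boundaries (lon/lat, EPSG:4326)
provinces = gpd.GeoDataFrame(
    {"province": ["A", "B", "C"]},
    geometry=[
        Polygon([(105.0, 20.0), (106.0, 20.0), (106.0, 21.0), (105.0, 21.0)]),
        Polygon([(106.0, 20.0), (107.0, 20.0), (107.0, 21.0), (106.0, 21.0)]),
        Polygon([(105.0, 21.0), (107.0, 21.0), (107.0, 22.0), (105.0, 22.0)]),
    ],
    crs="EPSG:4326",
)

# Hypothetical roads; "vulnerable" is assumed to come from comparing roads with the flood raster
roads = gpd.GeoDataFrame(
    {"road": ["r1", "r2", "r3"], "vulnerable": [True, False, True]},
    geometry=[
        LineString([(105.2, 20.5), (105.8, 20.5)]),  # inside A
        LineString([(106.2, 20.5), (106.8, 20.5)]),  # inside B
        LineString([(105.5, 21.5), (106.5, 21.5)]),  # inside C
    ],
    crs="EPSG:4326",
)

vulnerable_roads = roads[roads["vulnerable"]]

# Spatial join: attach to each vulnerable road the province(s) it intersects
joined = gpd.sjoin(vulnerable_roads, provinces, how="inner", predicate="intersects")
provinces_with_vulnerable_roads = sorted(joined["province"].unique())
print(provinces_with_vulnerable_roads)
```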


# Additional resources

- [Data carpentry workshops](https://datacarpentry.org/organization-geospatial/)
- [Carpentry lab workshops](https://carpentries-lab.github.io/python-aos-lesson/)