This notebook relies on Python 3.6.

# Pre-processing land cover data for use in deriving local wind multipliers

This notebook demonstrates a quick way to automatically assign a numeric value to (combinations of) land cover categories in a vector shapefile. It's based on the land cover data available through the [PacGeo repository](http://www.pacgeo.org/), which holds significant amounts of geospatial data for Pacific Island nations.

We need to assign a numeric value to land cover categories, so we can then convert the data into a raster layer for ingestion into the [wind multiplier code](https://github.com/GeoscienceAustralia/Wind_multipliers), for determination of local wind modification factors.

As always, start by importing the required modules.

In [None]:
%matplotlib inline

import geopandas as gpd
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns
sns.set_context("notebook")

In [None]:
inputFile = "C:/WorkSpace/data/raw/fiji_vector/fiji_vector.shp"
gdf = gpd.read_file(inputFile)

Examine the first few rows to see what the file contains. It's a fairly simple file, with only four fields, plus the `geometry` field that `GeoPandas` adds to hold the geometry of the polygons.

In [None]:
gdf.head(10)

OK, so it looks like the classification of the land cover is in the `CLASS_NAME` field. We use the `Series.unique()` method to list the distinct values in that field.

In [None]:
coverTypes = gdf['CLASS_NAME'].unique()
print(coverTypes)

Is there anything in the `SUB_CLASS` field? List the unique values in this field to find out.

In [None]:
subTypes = gdf['SUB_CLASS'].unique()
print(subTypes)

Yes, there is. So we need to handle the different combinations of `CLASS_NAME` and `SUB_CLASS`.

In [None]:
gdf[['CLASS_NAME', 'SUB_CLASS']].drop_duplicates()

In [None]:
unique_groups = gdf[['CLASS_NAME', 'SUB_CLASS']].drop_duplicates()
classes = unique_groups.to_dict('split')

Next, we need to assign a numeric value to each class, which will then be inserted into a new field in the shapefile. At a later point, we can assign the required roughness values to each class.

Here we list all the unique combinations of `CLASS_NAME` and `SUB_CLASS` in the dataset.

In [None]:
classes['data']

In [None]:
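# Map each unique (CLASS_NAME, SUB_CLASS) pair to an integer, counting from 1
classification = {}
for i, (class_name, sub_class) in enumerate(classes['data'], start=1):
    classification[(class_name, sub_class)] = i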
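
The numbering scheme can be illustrated without the shapefile. The sketch below uses made-up class and sub-class names (they are not taken from the Fiji data) and assumes a missing sub-class appears as `None`. It only shows how each unique pair receives a 1-based integer.

```python
# Hypothetical (CLASS_NAME, SUB_CLASS) pairs; the real ones come from classes['data']
pairs = [
    ["Forest", "Closed"],
    ["Forest", "Open"],
    ["Grassland", None],  # assumed representation of a missing sub-class
]

# Number each unique pair from 1, in the order it appears
classification = {
    (class_name, sub_class): number
    for number, (class_name, sub_class) in enumerate(pairs, start=1)
}

for key, number in classification.items():
    print(number, key)
```

The numbers depend on the order in which the pairs first appear, so they are only meaningful together with the lookup table written out at the end of this notebook.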

Now to insert the new field into the `GeoDataFrame`. First create a list containing the new numeric category for each row of the `GeoDataFrame`.

In [None]:
# Look up the numeric category for each row's (CLASS_NAME, SUB_CLASS) pair
newclass = []
for _, r in gdf.iterrows():
    newclass.append(classification[(r.CLASS_NAME, r.SUB_CLASS)])

Then we add the new `category` field to the `GeoDataFrame`...

In [None]:
gdf['category'] = np.asarray(newclass)

In [None]:
gdf.head()

In [None]:
gdf[['CLASS_NAME', 'SUB_CLASS', 'category']].drop_duplicates()

Write the updated `GeoDataFrame` back to a shapefile. Before writing, we use the `to_crs` method to convert the coordinate reference system to a projected system for Fiji (EPSG:32760, i.e. UTM Zone 60S), so that the output coordinates are in metres.

(Reprojecting also avoids a display problem in GIS applications, which arises because Fiji straddles the dateline.)

In [None]:
# Reproject to UTM Zone 60S (EPSG:32760), then write the result to a new shapefile
gdfb = gdf.copy()
gdfb = gdfb.to_crs(epsg=32760)
gdfb.to_file("C:/WorkSpace/data/raw/fiji_vector/fiji_vectorprj.shp")


You can then use [`gdal_rasterize`](http://www.gdal.org/gdal_rasterize.html) to convert the shapefile into a raster for use in the wind multiplier calculation:

    gdal_rasterize -a category -tr 25.0 25.0 -l fiji_vectorprj -ot Int32 fiji_vectorprj.shp landcoverprj.tif
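
If you prefer to stay inside the notebook, the same command can be assembled in Python and handed to `subprocess`. This is a sketch only: it assumes `gdal_rasterize` is on the `PATH` and that the working directory contains `fiji_vectorprj.shp`. The helper name `rasterize_command` is made up for illustration.

```python
import shlex

def rasterize_command(shapefile, raster, layer, attribute="category", cell=25.0):
    """Build the gdal_rasterize argument list shown above (hypothetical helper)."""
    return [
        "gdal_rasterize",
        "-a", attribute,              # attribute field to burn into the raster
        "-tr", str(cell), str(cell),  # output resolution, in CRS units
        "-l", layer,                  # layer name inside the shapefile
        "-ot", "Int32",               # output data type
        shapefile, raster,
    ]

cmd = rasterize_command("fiji_vectorprj.shp", "landcoverprj.tif", "fiji_vectorprj")
print(" ".join(shlex.quote(arg) for arg in cmd))

# To actually run it (requires GDAL to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```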

Just to have a cursory look at the data, we plot the number of features in each land cover class.

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(16,12))
sns.countplot(x='CLASS_NAME',  data=gdf, ax=ax)
plt.xticks(rotation=90)
ax.set_xlabel("Land cover class")

What we really should do is look at the area of each class. To calculate areas, we first transform the `GeoDataFrame` to a projected CRS with units of metres, using the `to_crs` method. Note that EPSG:3395 (World Mercator), used here, is not an equal-area projection: it inflates areas away from the equator, so the totals are approximate. An equal-area CRS would give more accurate values.

We then calculate the total area in each class, by grouping first by the `CLASS_NAME` field, then summing over each group.

In [None]:
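gdfa = gdf.copy()
# EPSG:3395 is World Mercator (units of metres, not equal-area)
gdfa = gdfa.to_crs(epsg=3395)
# Polygon areas are in square metres; convert to square kilometres
gdfa["area"] = gdfa['geometry'].area / 10**6

areasum = gdfa.groupby('CLASS_NAME')['area'].sum()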
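
Mercator-type projections such as EPSG:3395 exaggerate areas away from the equator. As a rough guide, on a spherical Mercator the area scale factor at latitude φ is sec²φ. The sketch below evaluates this for 18°S, which is an assumed representative latitude for Fiji. EPSG:3395 is ellipsoidal, so the true factor differs slightly.

```python
import math

def mercator_area_factor(lat_deg):
    """Approximate area inflation of a spherical Mercator projection at a latitude."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2

lat = -18.0  # assumed representative latitude for Fiji
factor = mercator_area_factor(lat)
print(f"Areas at {abs(lat):.0f} degrees S are inflated by about {100 * (factor - 1):.0f}%")
```

Under these assumptions the totals computed above are probably around a tenth too large. Dividing by this factor, or reprojecting to an equal-area CRS instead, should give a closer estimate.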

Now plot a simple bar chart of the area (in km$^2$) of each class of land cover.

In [None]:
fig, ax = plt.subplots(figsize=(12, 12))
areasum.plot(kind='bar', ax=ax)
ax.set_xlabel('Land cover class')
ax.set_ylabel(r"Area (km$^2$)")

### Writing out the categories 

Finally, so we can use this data in the wind multiplier code, we need to prepare a CSV file with the category description and value. We will manually edit the CSV to add in an estimated roughness length for each category at a later stage.

In [None]:
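# One row per unique (CLASS_NAME, SUB_CLASS) combination and its numeric category
catdf = gdf[['CLASS_NAME', 'SUB_CLASS', 'category']].drop_duplicates()
# Build a readable label; missing sub-classes are written as the text "None"
catdf['Description'] = catdf['CLASS_NAME'] + '-' + catdf['SUB_CLASS'].fillna("None")
header = ['category', 'Description']
catdf.to_csv("C:/WorkSpace/data/raw/fiji_vector/fiji_vector2.csv", columns=header, index=False)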
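
The `fillna("None")` call matters because concatenating a string with a missing value in pandas would normally yield a missing value, leaving the description empty. The plain-Python sketch below mimics the same table using the standard `csv` module, with made-up class names and `None` standing in for a missing sub-class.

```python
import csv
import io

# Hypothetical rows of (CLASS_NAME, SUB_CLASS, category); None marks a missing sub-class
rows = [
    ("Forest", "Closed", 1),
    ("Grassland", None, 2),
]

buffer = io.StringIO()  # in-memory stand-in for the CSV file
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["category", "Description"])
for class_name, sub_class, category in rows:
    # Same rule as fillna("None"): a missing sub-class becomes the text "None"
    description = f"{class_name}-{sub_class if sub_class is not None else 'None'}"
    writer.writerow([category, description])

print(buffer.getvalue())
```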
