# Feature Engineering

After cleaning the data, we engineer additional features. The home-range calculation is documented in a separate [notebook](EDA_home_ranges.ipynb) and must be run before the steps below. This notebook covers the remaining steps: the distance to the nearest forest and dummy variables for the categorical features. Finally, the data is saved in the final_shapefiles folder.

In [1]:
import pandas as pd
import numpy as np
import warnings
import matplotlib.pyplot as plt
import rasterio
import geopandas as geopd
import rasterio.rio
import seaborn as sns
import datetime as dt 

from rasterio.plot import show

import pyreadr



First, we load the modelling dataframes and the sample points.

In [4]:
df_all = geopd.read_file("../data/final_shapefiles/foxes_modelling_all.shp")
df_resamp = geopd.read_file("../data/final_shapefiles/foxes_modelling_resamp.shp")
sample_points = geopd.read_file("../data/cleaned_shapefiles/sample_points.shp")

## Distance to Forest
Import the shapefile containing the forest edges.

In [None]:
forest = geopd.read_file("../data/forest_study_area.shp")

Explode the multipolygon into individual polygons and transform them to the Swedish coordinate system (SWEREF99 TM, EPSG:3006).

In [None]:
forest = forest.explode(ignore_index=True)
forest = forest.to_crs(3006)

Function to calculate distance to nearest forest:

In [None]:
def distance_to_forest(forest, point):
    # Distance from the point to every forest polygon; keep the smallest one
    return forest.distance(point).min()

Create "distForest" feature and use the above function to calculate the distance for every point.

In [None]:
df_all["distForest"] = df_all.geometry
df_all.distForest = df_all.distForest.apply(lambda x: distance_to_forest(forest,x))

In [None]:
df_resamp["distForest"] = df_resamp.geometry
df_resamp.distForest = df_resamp.distForest.apply(lambda x: distance_to_forest(forest,x))

In [None]:
sample_points["distForest"] = sample_points.geometry
sample_points.distForest = sample_points.distForest.apply(lambda x: distance_to_forest(forest,x))
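The per-point `apply` above measures the distance to every forest polygon and then takes the minimum. A possible shortcut is to merge all polygons into a single geometry first, so that one distance call per point is enough. The sketch below illustrates the idea with plain shapely and toy coordinates; the polygons, the points, and the helper name `distance_to_forest_union` are made up for illustration and are not part of the project data. With GeoPandas, the same idea would presumably read `df_all.geometry.distance(forest_union)`.

```python
from shapely.geometry import Point, Polygon
from shapely.ops import unary_union

# Toy stand-ins for the forest polygons (hypothetical coordinates in metres)
forest_polygons = [
    Polygon([(0, 0), (10, 0), (10, 10), (0, 10)]),
    Polygon([(100, 0), (110, 0), (110, 10), (100, 10)]),
]

# Merge all polygons into one geometry once, outside the per-point loop
forest_union = unary_union(forest_polygons)


def distance_to_forest_union(union_geom, point):
    # Distance to the merged geometry equals the distance to the nearest polygon
    return point.distance(union_geom)


points = [Point(5, 5), Point(20, 5), Point(95, 5)]
distances = [distance_to_forest_union(forest_union, p) for p in points]
print(distances)
```

As with the original function, a point that lies inside a forest polygon gets a distance of 0.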

## Create dummy variables
### Bin aspect feature
First, we bin the aspect feature: one bin holds the -1 values (where the slope is zero), and eight bins cover the eight compass directions.

In [5]:
# In a first step, the category "N" is created twice ("N" and "N2"),
# because north covers both ends of the 0-360 degree range
df_all["aspect_bin"] = pd.cut(df_all.aspect, 
                                bins = [-1.1,0,22.5,67.5,112.5,157.5,202.5,247.5,292.5,337.5,360],
                                labels = ["None", "N", "NE", "E", "SE", "S", "SW", "W", "NW", "N2"])
# In a second step, the second category "N2" is renamed to "N" so that both ends share one label
df_all["aspect_bin"] = df_all.aspect_bin.replace("N2","N")

# Repeat for df_resamp:
df_resamp["aspect_bin"] = pd.cut(df_resamp.aspect, 
                                bins = [-1.1,0,22.5,67.5,112.5,157.5,202.5,247.5,292.5,337.5,360],
                                labels = ["None", "N", "NE", "E", "SE", "S", "SW", "W", "NW", "N2"])
df_resamp["aspect_bin"] = df_resamp.aspect_bin.replace("N2","N")
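To make the binning easier to check, here is a small sketch on toy aspect values. It uses the same bin edges as above but passes `ordered=False` to `pd.cut`, which (assuming pandas 1.1 or newer) allows the label "N" to appear twice and so avoids the separate rename step. The toy values are invented for illustration.

```python
import pandas as pd

# Toy aspect values in degrees; -1 marks cells without a slope
aspect = pd.Series([-1.0, 0.0, 10.0, 45.0, 180.0, 350.0])

bins = [-1.1, 0, 22.5, 67.5, 112.5, 157.5, 202.5, 247.5, 292.5, 337.5, 360]
# "N" appears twice because north covers both ends of the 0-360 range
labels = ["None", "N", "NE", "E", "SE", "S", "SW", "W", "NW", "N"]

# ordered=False permits duplicate labels (assumption: pandas >= 1.1)
aspect_bin = pd.cut(aspect, bins=bins, labels=labels, ordered=False)
print(aspect_bin.tolist())
```

Because the intervals are closed on the right, an aspect of exactly 0 falls into the "None" bin with these edges. If exact zeros occur in the data and should count as north, moving the second edge to a value between -1 and 0 (for example -0.5) would be one way to handle it.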


### Create dummy variables for all categorical variables

In [6]:
cat_variables = ["soil", "veg", "aspect_bin"]

In [7]:
categories_all = pd.get_dummies(df_all[cat_variables], drop_first=True)
categories_resamp = pd.get_dummies(df_resamp[cat_variables], drop_first=True)

In [8]:
df_all = pd.concat([df_all, categories_all], axis = 1)
df_resamp = pd.concat([df_resamp, categories_resamp], axis = 1)
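As a small illustration of what `pd.get_dummies` with `drop_first=True` produces, the sketch below encodes a toy dataframe. The category values ("peat", "sand") are invented and do not reflect the actual soil classes in the data.

```python
import pandas as pd

# Toy data with two categorical columns (hypothetical category values)
toy = pd.DataFrame({
    "soil": ["peat", "sand", "sand"],
    "aspect_bin": ["N", "NE", "None"],
})

# drop_first=True drops the first level of each variable, which then acts
# as the reference category and avoids perfectly collinear dummy columns
dummies = pd.get_dummies(toy[["soil", "aspect_bin"]], drop_first=True)

# Append the dummy columns to the original dataframe, as done above
encoded = pd.concat([toy, dummies], axis=1)
print(list(dummies.columns))
```

Each dummy column is named after its source column and category, so a row where all dummies of one variable are zero belongs to the dropped reference category.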

Before saving, we drop the categorical `aspect_bin` column, which is now represented by its dummy variables.

In [15]:
df_all = df_all.drop("aspect_bin", axis = 1)
df_resamp = df_resamp.drop("aspect_bin", axis = 1)

Finally, save the data to the final_shapefiles folder.

In [16]:
df_all.to_file("../data/final_shapefiles/foxes_modelling_all.shp")
df_resamp.to_file("../data/final_shapefiles/foxes_modelling_resamp.shp")
sample_points.to_file("../data/final_shapefiles/sample_points.shp")
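One thing worth checking before writing: the shapefile format limits field names to 10 characters, so longer column names such as the dummy columns are truncated on save. The sketch below uses a hypothetical helper, `truncated_name_collisions`, to list the names that would become identical after truncation. The column list is an invented example, and how the writer resolves such collisions depends on the driver.

```python
def truncated_name_collisions(columns, max_len=10):
    """Group column names that share the same first `max_len` characters."""
    groups = {}
    for name in columns:
        groups.setdefault(name[:max_len], []).append(name)
    # Keep only the truncated names that more than one column maps to
    return {short: names for short, names in groups.items() if len(names) > 1}


# Hypothetical column names resembling those created in this notebook
columns = ["distForest", "soil_sand", "aspect_bin_NE", "aspect_bin_S"]
collisions = truncated_name_collisions(columns)
print(collisions)
```

If collisions show up, a shorter prefix could be passed to `pd.get_dummies` through its `prefix` argument so that the dummy names stay distinguishable after saving.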
