"Watch out where the foxes go,
And don't you eat the yellow snow"

Whoever wrote those lines does not know half as much about Data-Scientific Research (DSR) on Polar Foxes as I first thought when I agreed to join this digital expedition.

But first things first: 
This is me, Florian Hofmann, your Storytelling Data Scientist.

And it seems like yesterday that I agreed to join a scientific research project on habitat selection of Arctic Foxes.
But now, as I'm writing these lines, this fateful decision already lies about 2 months in the past.


And what you are reading right now is my personal notebook of our mission to "predict and protect" where Arctic Foxes live.

As my grandma used to say:

"Never travel to the tundra without some freshly imported Python libraries".

As we had no reason to doubt her, this is just what we did.

So our backpacks included the following packages:

In [None]:
import sys
sys.path.append("..")
sys.path.append("../modeling")

import home_ranges as hr
import features_for_observations as f4o

from keplergl import KeplerGl

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import geopandas as geopd
import seaborn as sns
import datetime as dt 

from rasterio.plot import show

from datetime import datetime, timedelta
from shapely.geometry import Polygon
import shapely

import geopandas as gpd

The next obvious step consisted of getting as much information about the tundra as we possibly could.

Which definitely was not much.

All we had was 
- a dataset with GPS data of 12 foxes, with the foxes' ID number, sex, and of course timestamp and coordinates
- a dataset with fox dens in the whole area. Unfortunately, as foxes don't put name tags next to their doors, it was not obvious which fox was living where
- a dataset of sample points in our research area.

Even if it was not much, I was glad that my expedition members had already done a great job at gathering and cleaning the data, because when I started using it, it proved to be in perfect condition:
No missing values, no duplicates, just plain condensed information

In [None]:
foxes_all = geopd.read_file("../data/cleaned_shapefiles/foxes_all.shp")
sample_points = geopd.read_file("../data/cleaned_shapefiles/sample_points.shp")
dens_all = geopd.read_file("../data/cleaned_shapefiles/dens_norrbotten.shp")

This is where we hit our first cultural and linguistic barrier:

The GPS data of the foxes was in the Swedish coordinate system SWEREF99 TM, known as EPSG:3006.

To make handling easier, I added the same coordinates as additional columns in EPSG:4326 (WGS 84 latitude and longitude), the only coordinate system our good friend Kepler GL was able to understand.

In [None]:
# the fox positions as stored in the shapefile, declared as EPSG:3006 (SWEREF99 TM)
gdf = gpd.GeoDataFrame(foxes_all.geometry, crs=3006)

# reproject to EPSG:4326 (WGS 84), which Kepler GL expects
gdf = gdf.to_crs(epsg=4326)

foxes_all["geo_kepler_lat"] = gdf.geometry.y
foxes_all["geo_kepler_lon"] = gdf.geometry.x

So for some Data-Based Storytelling, I deemed it necessary to insert some temporal information:

First of all, as we knew Polar Foxes to be nocturnal, we wanted to look at days "as the fox does".

More practically, we introduced the concept of a "fox day", going from noon to noon of a "human day".
This would allow us to group fox activity by the foxes' own cycle of activity.

I also introduced columns that extract the month and the year of the fox day, to make it easier to group by those time categories.

In [None]:
foxes_all["fox_day"] = [str(datetime.strptime(x, '%Y-%m-%d-%H:%M:%S' ) + timedelta(hours=12))[:10]  for x in foxes_all.t_ ]

foxes_all["month"] = [x[5:7] for x in foxes_all.fox_day]
foxes_all["year"] = [x[:4] for x in foxes_all.fox_day]
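The same noon-to-noon shift can also be written in vectorized form. This is only a minimal sketch, assuming the timestamp format used in the cell above; the helper name `add_fox_day` and the three sample timestamps are made up for illustration and are not part of our expedition code.

```python
import pandas as pd


def add_fox_day(df, ts_col="t_", fmt="%Y-%m-%d-%H:%M:%S"):
    """Return a copy of df with 'fox_day', 'month' and 'year' columns.

    A fox day runs from noon to noon: adding 12 hours moves every
    observation taken after noon onto the next calendar date.
    """
    out = df.copy()
    shifted = pd.to_datetime(out[ts_col], format=fmt) + pd.Timedelta(hours=12)
    out["fox_day"] = shifted.dt.strftime("%Y-%m-%d")
    out["month"] = shifted.dt.strftime("%m")
    out["year"] = shifted.dt.strftime("%Y")
    return out


# hypothetical sample: just before noon, exactly noon, and New Year's Eve
sample = pd.DataFrame(
    {"t_": ["2019-07-14-11:59:00", "2019-07-14-12:00:00", "2019-12-31-23:30:00"]}
)
fox_days = add_fox_day(sample)
```

The last sample row shows why month and year are taken from the fox day rather than from the raw timestamp: a late evening on 31 December already belongs to the fox day of 1 January.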

After this finger exercise, which was just perfect with the cold wind in the tundra, I went for some slightly more advanced feature engineering:
As our DataFrame foxes_all contained GPS data in combination with timestamps, we decided it would be helpful to know the temporal and spatial differences between two subsequent data points of the same fox.

In [None]:
foxes_all["travel_distance"] = f4o.get_distance(foxes_all)
foxes_all["time_diff"] = f4o.get_time_diffs(foxes_all)
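The two helpers live in our own module `features_for_observations`, so their code does not show up in this notebook. As a rough idea of what such a step could look like, here is a sketch under my own assumptions: coordinates in the columns `x_` and `y_` are projected and measured in meters, time differences are expressed in seconds, and the function name `step_differences` is hypothetical. The real helpers may well work differently.

```python
import numpy as np
import pandas as pd


def step_differences(df, fmt="%Y-%m-%d-%H:%M:%S"):
    """Distance and elapsed seconds since the previous data point of the same fox."""
    out = df.copy()
    out["ts"] = pd.to_datetime(out["t_"], format=fmt)
    out = out.sort_values(["id", "ts"])

    # diff() within each fox, so the first point of every fox gets NaN
    by_fox = out.groupby("id")
    dx = by_fox["x_"].diff()
    dy = by_fox["y_"].diff()
    seconds = by_fox["ts"].diff().dt.total_seconds()

    return out.assign(travel_distance=np.hypot(dx, dy), time_diff=seconds)


# hypothetical sample with two foxes
sample = pd.DataFrame(
    {
        "id": ["A", "A", "A", "B", "B"],
        "t_": [
            "2019-07-14-12:00:00",
            "2019-07-14-12:15:00",
            "2019-07-14-12:30:00",
            "2019-07-14-12:00:00",
            "2019-07-14-13:00:00",
        ],
        "x_": [0.0, 3.0, 3.0, 100.0, 100.0],
        "y_": [0.0, 4.0, 10.0, 100.0, 130.0],
    }
)
steps = step_differences(sample)
```

Grouping by fox before taking the difference matters: without it, the first point of fox B would be compared with the last point of fox A.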

By now, it had turned out that our data on foxes was somewhat sparse. While on some days we had data points about every 15 minutes, other days barely included any data points at all.

So I added two more columns to the table that, for each "fox day", hold the number of data points (the more points, the more information about that day) and the maximum time delta between two subsequent data points on that day (the smaller the delta, the more precise the information about that day).

In [None]:
day_groups = foxes_all[["id", "time_diff", "fox_day"]].groupby(["id", "fox_day"], as_index=False)

# number of data points per fox and fox day
points_per_day = day_groups.count().rename(columns={"time_diff": "points_this_day"})

# largest gap between two subsequent data points per fox and fox day
max_window_per_day = day_groups.max().rename(columns={"time_diff": "max_window"})

foxes_all_temp = pd.merge(foxes_all, points_per_day, on=["id", "fox_day"])
foxes_all_2 = pd.merge(max_window_per_day, foxes_all_temp, on=["id", "fox_day"])
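The two merges could probably be avoided with a grouped `transform`, which writes the per-day value straight back onto every row. The following is a sketch on made-up sample data, not the code we actually ran; the helper name `add_day_quality` is hypothetical.

```python
import pandas as pd


def add_day_quality(df):
    """Add 'points_this_day' and 'max_window' per fox and fox day."""
    per_day = df.groupby(["id", "fox_day"])["time_diff"]
    counts = per_day.transform("count")  # non-null time_diff values that day
    widest = per_day.transform("max")    # largest gap that day
    return df.assign(points_this_day=counts, max_window=widest)


# hypothetical sample: three points on one fox day, one on the next
sample = pd.DataFrame(
    {
        "id": ["A", "A", "A", "A"],
        "fox_day": ["2019-07-14", "2019-07-14", "2019-07-14", "2019-07-15"],
        "time_diff": [900.0, 900.0, 1200.0, 7200.0],
    }
)
quality = add_day_quality(sample)
```

Like `count()` in the cell above, `transform("count")` only counts rows with a non-missing `time_diff`.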


As a next step, we wanted to look into what we defined as the "home range" of a fox. 
This essentially meant the minimum convex polygon that includes 95 % of the data points of the fox.

Fortunately for me, the rest of the team had once more taken great care of the implementation.

So it was rather easy to construct said polygon for each fox and insert it into a new table, together with the area of the polygon.

In [None]:
foxes_homeranges = foxes_all.groupby(["id", "sex"], as_index=False).count()[["id","sex"]]

# compute each home range polygon once and reuse it for the GeoJSON and the area
hr_polygons = [hr.hr_area(foxes_all.query('id == @x')) for x in foxes_homeranges.id]

foxes_homeranges["geometry"] = [f4o.polygon_to_geojson(poly) for poly in hr_polygons]
foxes_homeranges["hr_area"] = [poly.area for poly in hr_polygons]
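How `hr.hr_area` builds the polygon is not visible in this notebook. One common way to get a 95 % minimum convex polygon is to drop the 5 % of points farthest from the centroid and take the convex hull of the rest. The sketch below shows that variant on made-up points; the function name `mcp` is hypothetical, and our team's implementation may differ.

```python
import numpy as np
from shapely.geometry import MultiPoint


def mcp(points, percent=95):
    """Convex hull of the `percent` % of points closest to the centroid."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    dist = np.hypot(pts[:, 0] - centroid[0], pts[:, 1] - centroid[1])
    keep = dist <= np.percentile(dist, percent)
    return MultiPoint([tuple(p) for p in pts[keep]]).convex_hull


# hypothetical fox: 19 points inside a 10 x 10 square plus one far excursion
corners = [(0, 0), (10, 0), (10, 10), (0, 10)]
interior = [(i, j) for i in (2, 5, 8) for j in (1, 3, 5, 7, 9)]
excursion = [(1000, 1000)]
home_range = mcp(corners + interior + excursion)
```

With 20 points, the single excursion is exactly the 5 % that gets cut off, so the resulting polygon is the square itself.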

This allowed for some analysis, like a size comparison between the smallest and the biggest home range:

In [None]:
foxes_homeranges.hr_area.min() / foxes_homeranges.hr_area.max()

Or analyzing the mean size of home ranges by sex

In [None]:
# restrict the mean to the numeric area column; id and geometry cannot be averaged
foxes_homeranges.groupby("sex", as_index=False)["hr_area"].mean()

Now that we had a basic understanding of those home ranges, we wanted to see them on a map.
But first I decided to create the shapely objects for the analysis:

In [None]:
# circles around the dens to see how those differ from home ranges. Not yet used for lack of ideas for radius
# future work might define radius as distance from den to farthest point of polygon 
circle_all = Polygon()

# all homeranges together as one Multipolygon.
hr_all = Polygon()

# Areas belonging to more than one home range
intersect_all = Polygon()


for fox_id in foxes_homeranges.id.unique():
    fox_hr_poly = hr.hr_area(foxes_all.query("id == @fox_id"))
    x = hr_all.intersection(fox_hr_poly)
    intersect_all = intersect_all.union(x)
    # circle_all = circle_all.union(circle)
    hr_all = hr_all.union(fox_hr_poly)
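To see what the loop above accumulates, it helps to run the same pattern on a few simple shapes. The squares below are made up and stand in for home ranges; the function name `union_and_overlap` is hypothetical.

```python
from shapely.geometry import Polygon, box


def union_and_overlap(polygons):
    """Return (area covered by any polygon, area covered by at least two)."""
    covered = Polygon()  # empty geometry: nothing seen so far
    shared = Polygon()   # empty geometry: no overlap so far
    for poly in polygons:
        # whatever the new polygon shares with the area seen so far is overlap
        shared = shared.union(covered.intersection(poly))
        covered = covered.union(poly)
    return covered, shared


# two overlapping 2 x 2 squares and one isolated 1 x 1 square
squares = [box(0, 0, 2, 2), box(1, 0, 3, 2), box(10, 10, 11, 11)]
covered, shared = union_and_overlap(squares)
```

The order inside the loop matters: the intersection has to be taken before the new polygon is added to the union, otherwise every polygon would overlap completely with itself.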

With this preliminary work done, we wanted to see the result on the map:

In [None]:
map1=KeplerGl(height=500)

cols_df = ["id", "geo_kepler_lat", "geo_kepler_lon"]
cols_geo = ['fox_day']


for fox_id in foxes_all.id.unique():    
    fox_hr_poly = hr.hr_area(foxes_all.query("id == @fox_id"))
    geojson = f4o.df_to_geojson_trip(foxes_all.query("id == @fox_id "), cols_geo)
    map1.add_data(data=geojson,name='Where does fox  ' + fox_id + ' trot?')
    map1.add_data(data = f4o.polygon_to_geojson(fox_hr_poly), name='homerange' + fox_id)


map1

The next goal was to analyze how much distance a fox traveled per day.

I concluded that this would only make sense if the time windows were small enough. If data points were two hours apart, the Euclidean distance between them would not plausibly represent the actual distance the fox traveled.

Maximal time windows of about 15 minutes seemed the best we could get, and I decided to add the safety requirement of more than 80 data points that day.
This might seem redundant, but otherwise it would be possible for a day with only one data point, 14 minutes after the day started, to fulfill the criterion.

In [None]:
foxes_relevant_days = foxes_all_2.query("max_window < 1000 and points_this_day > 80")
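Why sparse data points distort the traveled distance is easiest to see on a toy example. The sketch below uses a made-up fox that trots once around a 100 m square; the function name `path_length` is hypothetical.

```python
import numpy as np


def path_length(xy):
    """Sum of the straight-line distances between subsequent points."""
    steps = np.diff(xy, axis=0)
    return float(np.hypot(steps[:, 0], steps[:, 1]).sum())


# one lap around a 100 m square, back to the starting point
track = np.array(
    [(0, 0), (100, 0), (100, 100), (0, 100), (0, 0)], dtype=float
)

full = path_length(track)             # every data point
every_second = path_length(track[::2])  # half the data points
start_and_end = path_length(track[::4])  # only first and last point
```

The fewer points survive, the more corners get cut: with every second point the fox seems to have traveled only along the diagonal and back, and with only the first and last point it seems not to have moved at all.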

Based on this reduced data set, it was possible to get min, max, mean and median of the traveled distance per fox day (in meters).

In [None]:
foxes_relevant_days[["id", "fox_day","travel_distance"] ].groupby(["id", "fox_day"]).sum().describe()

Next I created a table to compare those distances by month, to see if there were particularly "active" months.

In our initial data, it seemed that in July, foxes traveled only half the distance they traveled in September.
But our data was too sparse, with only 13 such "day trips" in July, so this was not necessarily representative.

In [None]:
# total distance per fox and fox day
c = foxes_relevant_days[["id", "fox_day", "month","travel_distance"]].groupby(["id", "fox_day", "month"], as_index=False).sum()

# statistics of those daily distances per month; "count" is the number of day trips
d = c.groupby("month")["travel_distance"].agg(["min", "max", "mean", "median", "count"])
d = d.rename(columns={"count": "no_of_observations"})
d

As further work, I decided to represent those "day trips" as polygons.

Right now, use cases were limited.

But one day, with more complete data, it would be possible to look at how much the daily polygons of the foxes overlapped,
to get an estimate of whether they tended to avoid or meet each other.

Also it seemed interesting to compare overlap of daily polygons for subsequent days - did they make the same route every day,
or did they change the areas within their home ranges every day?

In [None]:

foxes_poly_day = gpd.GeoDataFrame(
    foxes_relevant_days[["id", "fox_day", "x_", "y_"]].groupby(["id", "fox_day"], as_index=False).apply(
        lambda d: pd.Series(
            {
                "geometry": shapely.geometry.Polygon(
                    d.loc[:, ["x_", "y_"]].values
                ),
            }
        )
    )
)
# appending the area as a new column, dividing by 1,000,000 to get square kilometers rather than square meters
foxes_poly_day["trip_area"] = foxes_poly_day.geometry.area / 1_000_000

In [None]:
foxes_poly_day

In [None]:
# sample look at a polygon
foxes_poly_day.geometry[23]
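The overlap comparison mentioned above as future work could start from a simple ratio of shared area to combined area. This is only a sketch on two made-up rectangles, and the function name `overlap_ratio` is hypothetical. It assumes valid polygons; a polygon built from GPS points in visiting order can cross itself, so real day trips might need to be repaired or replaced by their convex hull first.

```python
from shapely.geometry import Polygon


def overlap_ratio(poly_a, poly_b):
    """Shared area divided by combined area (0 = disjoint, 1 = identical)."""
    union_area = poly_a.union(poly_b).area
    if union_area == 0:
        return 0.0
    return poly_a.intersection(poly_b).area / union_area


# hypothetical day trips of the same fox on two subsequent fox days
day_1 = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
day_2 = Polygon([(2, 0), (6, 0), (6, 4), (2, 4)])
ratio = overlap_ratio(day_1, day_2)
```

Applied to subsequent days of one fox, a ratio close to 1 would hint at a fixed daily route, while a ratio close to 0 would hint at a fox that rotates through its home range. Applied to two different foxes on the same day, it would hint at whether they meet or avoid each other.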