"Watch out where the foxes go,
And don't you eat the yellow snow"

Whoever wrote those lines does not know half as much about Data-Scientific Research (DSR) on Polar Foxes as I first thought when I agreed to join this digital expedition.

But first things first: 
This is me, Florian Hofmann, your Storytelling Data Scientist.

It seems like yesterday that I agreed to join a scientific research project on the habitat selection of Arctic Foxes.
But as I write these lines, that fateful decision already lies about two months in the past.


And what you are reading right now is my personal notebook of our mission to "predict and protect" where Arctic Foxes live.

As my grandma used to say:

"Never travel to the tundra without some freshly imported Python libraries".

As we had no reason to doubt her, this is just what we did.

So our backpacks included the following packages:

In [None]:
import sys
sys.path.append("..")
sys.path.append("../modeling")

import home_ranges as hr
import features_for_observations as f4o

from keplergl import KeplerGl

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import geopandas as geopd
import seaborn as sns
import datetime as dt 

from rasterio.plot import show

from datetime import datetime, timedelta
from shapely.geometry import Polygon
import shapely

import geopandas as gpd

The next obvious step was to gather as much information about the tundra as we possibly could.

Which definitely was not much.

All we had was:
- a dataset with GPS data of 12 foxes, containing each fox's ID number and sex, and of course a timestamp and coordinates for every position
- a dataset with the fox dens in the whole area. Unfortunately, as foxes don't put name tags next to their doors, it was not really obvious which fox was living where
- a dataset of sample points in our research area.

Even if it was not much, I was glad that my expedition members had already done a great job at gathering and cleaning the data, because when I started using it, it proved to be in perfect condition:
No missing values, no duplicates, just plain condensed information.

In [None]:
foxes_all = geopd.read_file("../data/cleaned_shapefiles/foxes_all.shp")
sample_points = geopd.read_file("../data/cleaned_shapefiles/sample_points.shp")
dens_all = geopd.read_file("../data/cleaned_shapefiles/dens_norrbotten.shp")

This is where we hit our first cultural and linguistic barrier:

The GPS data of the foxes was in the Swedish coordinate reference system SWEREF99 TM (EPSG:3006).

To facilitate handling, I added as additional columns the same coordinates in EPSG:4326 (WGS 84 latitude and longitude), the only system our good friend Kepler GL was able to understand.

In [None]:
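# Reproject the fox positions from SWEREF99 TM (EPSG:3006) to WGS 84 (EPSG:4326)
test = foxes_all.geometry
gdf = gpd.GeoDataFrame(test, crs=3006)

gdf = gdf.to_crs(epsg= 4326)

# Store latitude (y) and longitude (x) in degrees as plain columns for Kepler GL
foxes_all["geo_kepler_lat"] = [geo.y for geo in gdf.geometry]
foxes_all["geo_kepler_lon"] = [geo.x for geo in gdf.geometry]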

So for some Data-Based Storytelling, I deemed it necessary to add some temporal information:

First of all, as we knew Polar Foxes to be nocturnal, we wanted to look at days "as the fox does".

More practically, we introduced the concept of a "fox day", which runs from noon of one "human day" to noon of the next.
This allowed us to group fox activity by the foxes' own activity cycle, so that a single night is not split across two calendar dates.

I also added columns holding the month and the year of each fox day, to make grouping by those time categories easier.

In [None]:
foxes_all["fox_day"] = [str(datetime.strptime(x, '%Y-%m-%d-%H:%M:%S' ) + timedelta(hours=12))[:10]  for x in foxes_all.t_ ]

foxes_all["month"] = [x[5:7] for x in foxes_all.fox_day]
foxes_all["year"] = [x[:4] for x in foxes_all.fox_day]
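To make the noon-to-noon idea concrete, here is a minimal sketch of the same shift applied to a few single timestamps. The helper name `fox_day` and the sample timestamps are made up for illustration; only the timestamp format and the 12-hour shift are taken from the cell above.

```python
from datetime import datetime, timedelta

# Timestamp format used in the t_ column of the cell above
TIMESTAMP_FORMAT = "%Y-%m-%d-%H:%M:%S"


def fox_day(timestamp):
    """Return the fox day (YYYY-MM-DD) for one timestamp string.

    Shifting by 12 hours moves the day boundary from midnight to noon:
    everything from noon onwards already belongs to the next fox day.
    """
    shifted = datetime.strptime(timestamp, TIMESTAMP_FORMAT) + timedelta(hours=12)
    return shifted.date().isoformat()


# Illustrative timestamps around the noon boundary (not real fox data)
for ts in ("2019-08-01-11:59:59", "2019-08-01-12:00:00", "2019-08-02-03:00:00"):
    print(ts, "->", fox_day(ts))
```

With this labelling, the fox day "2019-08-02" covers everything from noon on 1 August up to, but not including, noon on 2 August, so a whole night of activity ends up under one label.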

After this finger exercise, which was just perfect with the cold wind in the tundra, I went for some slightly more advanced feature engineering:
As our DataFrame foxes_all contained GPS data in combination with timestamps, we decided it would be helpful to know the temporal and spatial differences between two consecutive data points of the same fox.

In [None]:
foxes_all["travel_distance"] = f4o.get_distance(foxes_all)
foxes_all["time_diff"] = f4o.get_time_diffs(foxes_all)
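The internals of the two f4o helpers are not shown in this notebook, so the following is only a sketch of how such per-fox differences could be computed with pandas. It assumes that x_ and y_ are projected coordinates in metres and that t_ uses the timestamp format seen earlier; the function name `add_step_features`, the toy data, and the choice of seconds as the unit are all assumptions, not the actual implementation.

```python
import numpy as np
import pandas as pd


def add_step_features(df):
    """Add distance and time gap to the previous position of the same fox.

    Assumes columns id, t_ (string timestamp), x_ and y_ (metres).
    """
    out = df.copy()
    out["timestamp"] = pd.to_datetime(out["t_"], format="%Y-%m-%d-%H:%M:%S")
    out = out.sort_values(["id", "timestamp"])

    # diff() runs inside each fox's group, so the first position of
    # every fox gets NaN instead of a jump from the previous animal
    delta = out.groupby("id")[["x_", "y_"]].diff()
    out["travel_distance"] = np.hypot(delta["x_"], delta["y_"])
    out["time_diff"] = out.groupby("id")["timestamp"].diff().dt.total_seconds()
    return out


# Toy data for two hypothetical foxes (not real observations)
toy = pd.DataFrame({
    "id": ["fox_a", "fox_a", "fox_a", "fox_b", "fox_b"],
    "t_": ["2019-08-01-20:00:00", "2019-08-01-20:15:00", "2019-08-01-20:45:00",
           "2019-08-01-20:00:00", "2019-08-01-21:00:00"],
    "x_": [0.0, 3.0, 3.0, 100.0, 100.0],
    "y_": [0.0, 4.0, 4.0, 100.0, 110.0],
})
steps = add_step_features(toy)
print(steps[["id", "travel_distance", "time_diff"]])
```

Grouping by id before taking the difference is the important design choice here: without it, the first point of each fox would be compared with the last point of a different animal.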

By now, it had turned out that our data on foxes was somewhat sparse. While on some days we had data points about every 15 minutes, other days barely included any data points at all.

So I added two more columns to the table: for each "fox day", one counts the number of data points (the more points, the more information about that day), and the other holds the maximum time delta between two data points on that day (the smaller the gap, the more precise the information about that day).

In [None]:
points_per_day = foxes_all[["id", "time_diff", "fox_day"]].groupby(["id","fox_day"], as_index=False ).count().rename(columns={"time_diff": "points_this_day"})

max_window_per_day = foxes_all[["id", "time_diff", "fox_day"]].groupby(["id","fox_day"], as_index=False ).max().rename(columns={"time_diff": "max_window"})

foxes_all_temp = pd.merge( foxes_all, points_per_day, left_on=["id", "fox_day"], right_on=["id", "fox_day"] )
foxes_all_2 = pd.merge( max_window_per_day, foxes_all_temp , left_on=["id", "fox_day"], right_on=["id", "fox_day"] )
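As a side note, the same two per-day columns can also be attached without intermediate tables and merges. The sketch below uses `groupby(...).transform(...)` on made-up toy data; the column names follow the cell above, while the values and the unit of time_diff are assumptions for illustration.

```python
import pandas as pd

# Toy data: two fox days for one hypothetical fox, one fox day for another
toy = pd.DataFrame({
    "id": ["fox_a"] * 5 + ["fox_b"],
    "fox_day": ["2019-08-01"] * 3 + ["2019-08-02"] * 2 + ["2019-08-01"],
    "time_diff": [900.0, 900.0, 1800.0, 3600.0, 900.0, 900.0],
})

per_day = toy.groupby(["id", "fox_day"])["time_diff"]

# transform() returns one value per original row, aligned with toy's index,
# so the results can be assigned directly as new columns
toy["points_this_day"] = per_day.transform("count")
toy["max_window"] = per_day.transform("max")

print(toy)
```

Like the `count()` in the cell above, `transform("count")` counts only non-missing time_diff values, so a row whose time difference is missing would not be counted towards its fox day.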


In [None]:
#x = pd.DataFrame(columns=["a", "b", "c"])

y = foxes_all.groupby(["id", "sex"], as_index=False).count()[["id","sex"]]

#for z in y.id:
 #   print(z)
#y["geometry"] = [polygon_to_geojson(hr.hr_area(foxes_all.query('id ==@x'))) for x in y.id ]
y["hr_area"] = [hr.hr_area(foxes_all.query('id ==@x')).area for x in y.id ]
#y.describe()


In [None]:
y.hr_area.min() / y.hr_area.max()

In [None]:

circle_all = Polygon()
hr_all = Polygon()
intersect_all = Polygon()
map1=KeplerGl(height=500)

cols_df = ["id", "geo_kepler_lat", "geo_kepler_lon"]
cols_geo = ['fox_day']


# GPS points of one fox in August 2019, queried once and reused below
fox_aug_2019 = foxes_all.query("id == '2019-FSBD609-002' and month == '08' and year =='2019' and fox_day < '2019-09-01' ")

for fox_id in fox_aug_2019.id.unique():    
    fox_hr_poly = hr.hr_area(fox_aug_2019)
  #  den = foxes_all.query('id ==@fox_id')[["id", "geo_round", "t_"]].groupby(["id", "geo_round"], as_index=False).count().sort_values(by="t_").tail(1).geo_round 
    #den_coord = Point(tuple(den)[0][0] , tuple(den)[0][1] )
    #circle = den_coord.buffer(max_dist_point_poly(den_coord, fox_hr_poly))
  #  x = hr_all.intersection(fox_hr_poly)
   # intersect_all = intersect_all.union(x)
   # circle_all = circle_all.union(circle)
    #hr_all = hr_all.union(fox_hr_poly)
    geojson = f4o.df_to_geojson_trip(fox_aug_2019, cols_geo)
   # map1.add_data(data= foxes_all.query("id == '2019-FSBD609-002' and month == '08' and year =='2019'")[cols_df], name = "where did the fox go")
    map1.add_data(data=geojson,name='Where does fox  ' + fox_id + ' trot?')

  
    
    #map1.add_data(data=g(den_coord, columns="geometry"), name = "center" + fox_id)
    map1.add_data(data = f4o.polygon_to_geojson(fox_hr_poly), name='homerange' + fox_id)
    #map1.add_data(data = polygon_to_geojson(circle), name="circle" + fox_id)


map1

In [None]:
sample_points_by_hr_with_id = pd.DataFrame(columns=["geometry", "id"])

for fox_id in foxes_all.id.unique():
    points = sample_points.intersection(hr.hr_area(foxes_all.query('id ==@fox_id')))
    df_temp = pd.DataFrame(columns=["geometry"], data=points[~points.is_empty].to_list())
    df_temp["id"] = fox_id
    sample_points_by_hr_with_id = pd.concat([sample_points_by_hr_with_id, df_temp ])

sample_points_in_hr_with_id = sample_points_by_hr_with_id.drop_duplicates().merge(sample_points, on = "geometry")

a = sample_points_in_hr_with_id[["id", "soil", "veg"]].groupby(["id", "veg"], as_index=False).count()
b = sample_points_in_hr_with_id[["id", "soil", "veg"]].groupby(["id", "soil"], as_index=False).count()

In [None]:
foxes_all.info()

In [None]:
points_per_day = foxes_all[["id", "time_diff", "fox_day"]].groupby(["id","fox_day"], as_index=False ).count().rename(columns={"time_diff": "points_this_day"})

max_window_per_day = foxes_all[["id", "time_diff", "fox_day"]].groupby(["id","fox_day"], as_index=False ).max().rename(columns={"time_diff": "max_window"})

foxes_all_temp = pd.merge( foxes_all,points_per_day, left_on=["id", "fox_day"], right_on=["id", "fox_day"] )
foxes_all = pd.merge( foxes_all_temp, max_window_per_day,  left_on=["id", "fox_day"], right_on=["id", "fox_day"] )

foxes_relevant_days = foxes_all.query("max_window < 1000 and points_this_day > 80")

#for rel_day in foxes_relevant_days:



In [None]:
foxes_all.info()

In [None]:
foxes_relevant_days = foxes_all.query("max_window < 2000 and points_this_day > 40") #.groupby(["id", "fox_day"]).count()
len(foxes_relevant_days.fox_day.value_counts())

In [None]:
foxes_relevant_days[["id", "fox_day","travel_distance"] ].groupby(["id", "fox_day"]).sum().describe()

In [None]:

foxes_poly_day = gpd.GeoDataFrame(
    foxes_relevant_days[["id", "fox_day", "x_", "y_"]].groupby(["id", "fox_day"], as_index=False).apply(
        lambda d: pd.Series(
            {
   #             "name": "|".join(d["name"].tolist()),
                "geometry": shapely.geometry.Polygon(
                    d.loc[:, ["x_", "y_"]].values
                ),
            }
        )
    )
)
foxes_poly_day.eval( "trip_area = geometry.area / 1000000", inplace=True )

In [None]:
c = foxes_relevant_days[["id", "fox_day", "month","travel_distance"]].groupby(["id", "fox_day", "month"], as_index=False).sum()

d = c[[ "month", "travel_distance"]].groupby([ "month"]).agg([np.min, np.max, np.mean, np.median, np.count_nonzero ], as_index=False)
d
#foxes_all.query("month == '01'")
#c.query("id == '2019-FSBD609-002' and month == '08'")
foxes_all.query("month == '09'").fox_day.value_counts().sort_index()
foxes_all.query("month == '09'")[["id","year"]].groupby(["id", "year"]).count()

In [None]:
foxes_relevant_days[["id","sex", "travel_distance", "fox_day", "month"]].groupby(["id","sex", "month",  "fox_day"], as_index=False).sum()

foxes_all.points_this_day.sort_values()

In [None]:
foxes_poly_day.geometry[65]

In [None]:
fox_id = '2018-FSBD619_r-gr/r-y'

map2 = KeplerGl(height=500)
#map2.add_data(polygon_to_geojson( hr.hr_area(foxes_all.query('id ==@fox_id')) ), name = "hr")
map2.add_data(data = dens_all, name = "dens")
# Add the first three daily polygons of this fox as separate layers
for i, x in enumerate(foxes_poly_day.query("id ==@fox_id")[:3].geometry):
    map2.add_data(f4o.polygon_to_geojson(x), name = "hr_" + str(i))
    

map2

In [None]:
foxes_poly_day.id.value_counts()