# Heatmap-gen
This notebook collects all the steps to generate a heatmap from a Strava archive or just a collection of gpx files.

Generating a heatmap from a Strava archive consists of the following steps:
- Convert non-gpx files to gpx, collect metadata of gpx files
- Merge all the gpx files of certain activity types together into one large gpx
- (Optional) Generate the background map for the heatmap, alternatively: use one of the pregenerated ones
- Generate the heatlayer(s) from the merged gpx files and put them on the background map

### How to use
Each step has its own section. The section starts with some explanation about what the section does and how to use it.

After the textblock, there is usually a codeblock with parameter definitions (in all caps).
You can change the behaviour of the code by changing the values of these parameters.
This should be the only part of the code where you need to make changes!
The following blocks of code execute all necessary operations for that step.

Each step can be run separately. For example, if you have already generated maps or use pregenerated ones, you can skip the step "Generate background map". If you just want to regenerate a heatmap with different colors, running only the last step is enough.

## Import libraries
Some libraries are required to process the data and generate maps.

In [None]:
# add tools folder to path
import sys
sys.path.insert(1, "../tools")

# work with the metadata csv's
import pandas as pd
# fast array manipulations
import numpy as np
# use system commands
import os

# main tool to manipulate gpx files
import gpxpy
from gpxpy import gpxxml as gpxml
import geopy.distance

# the workhorse for map generation, provides routines to download tiles and do some
# coordinate transforms
import cartopy.io.img_tiles as cartotiles
# polygons to define regions
from shapely.geometry.polygon import Polygon
# more coordinate support
import globalmaptiles as gmaptiles
# for parallel downloads
import concurrent.futures

# image manipulating and saving
from PIL import Image
# for the colormaps
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

## Extracting Strava metadata
In this step, metadata about the activities, such as the activity type, is extracted from the Strava archive.

If you do not want to select activities based on type, you can skip to the next part.

In [None]:
# folder containing the strava archive
STRAVA_FOLDER = "../stravadata/"

In [None]:
column_headers = [
    "Activity-ID", "Date", "Name", "Activitytype", "Description", "Elapsed time",
    "Distance", "Max. heartrate", "Similar attempts", "Home-work", "Private note", "Gear",
    "Filename", "Weight athlete", "Weight bike", "Elapsed time.1", "Moving time", "Distance.1",
    "Max. speed", "Average speed", "Elevation gain", "Elevation loss", "Lowest elevation", "Highest elevations",
    "Max. grade", "Average grade", "Average positive grade", "Average negative grade", "Max. cadans",
    "Average cadans", "Max. heartrate.1", "Average heartrate", "Max. power", "Average power", "Calories",
    "Max. temperature", "Average temperature", "Similar attempt.1", "Total work", "Number runningsessions",
    "Time downhill", "Time uphill", "Other time", "Pervieved effort", "Type", "Start Time",
    "Weighted average power", "Amount powerdata", "Preference percieved effort", "Percieved similar attempt",
    "Home-work.1", "Total weight", "Uploaded from", "Grade adjusted distance", "Time weather",
    "Weather", "Outside temperature", "Percieved temperature", "Dewpoint", "Humidity", "Pressure",
    "Windspeed", "Windgust", "Wind direction", "Precipation intensity", "Sunrise", "Sunset",
    "Moonphase", "Bike", "Gear 2", "Precipation probability", "Precipation type", "Cloud cover",
    "Visibility", "UV-index", "Ozonvalue", "Jump count", "Total grit", "Average flow",
    "Flagged", "Avg elapsed speed", "Dirt distance", "Newly explored distance", "Newly explored dirt distance",
    "Sport type", "Total steps", "Media"
]

df = pd.read_csv(STRAVA_FOLDER + "activities.csv")
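# rename columns by position; zip stops at the shorter of the two lists, so an
# archive with fewer columns than column_headers does not raise an IndexError
df = df.rename(columns=dict(zip(df.columns, column_headers)))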

metadata = df[["Date", "Name", "Activitytype", "Filename"]]
metadata.to_csv(STRAVA_FOLDER + "activities/metadata.csv")

# print out all activity types
print("Activity types in archive:")
print(np.unique(metadata["Activitytype"].values))

# cleanup
del column_headers
del df
del metadata

## Converting non-gpx files, moving gpx files from strava archive
In this step, non-gpx files in the Strava archive (.tcx and .fit) are converted to gpx.

All files are then copied into `../gpx/source`.

If you want to use the Strava metadata, make sure that `CONVERT_METADATA` is set to `True`.

**NOTE**: This part uses system commands to move files and relies on the GPSBabel CLI. It probably only works on Linux with GPSBabel installed!
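
As a rough illustration of what such a conversion can look like, the sketch below builds a GPSBabel command line for a `.tcx` or `.fit` file. The helper name `gpsbabel_command` and the format table are hypothetical, and this is not necessarily how `convert_to_gpx` works internally. It assumes GPSBabel's `gtrnctr` (Garmin Training Center) and `garmin_fit` input formats.

```python
import os

# Assumed mapping from file extension to GPSBabel input format name.
GPSBABEL_FORMATS = {".tcx": "gtrnctr", ".fit": "garmin_fit"}


def gpsbabel_command(src_path, dest_folder):
    """Return the gpsbabel argument list that would convert src_path to gpx."""
    stem, ext = os.path.splitext(os.path.basename(src_path))
    fmt = GPSBABEL_FORMATS.get(ext.lower())
    if fmt is None:
        raise ValueError("unsupported extension: %s" % ext)
    dest_path = os.path.join(dest_folder, stem + ".gpx")
    # -i/-f: input format and file, -o/-F: output format and file
    return ["gpsbabel", "-i", fmt, "-f", src_path, "-o", "gpx", "-F", dest_path]


# Only build and show the command; nothing is executed here.
cmd = gpsbabel_command("../stravadata/activities/123.tcx", "../gpx/source/")
print(" ".join(cmd))
```

To actually run the conversion, the list can be passed to `subprocess.run(cmd, check=True)` on a system where GPSBabel is installed.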

In [None]:
REMOVE_ORIGINALS = False
# If you extracted the metadata from the strava archive,
# set to true to update the filenames and copy to gpx folder
CONVERT_METADATA = True

SOURCE_FOLDER = STRAVA_FOLDER + "activities/"
DEST_FOLDER = "../gpx/source/"

In [None]:
from gpxtools import convert_to_gpx
convert_to_gpx(SOURCE_FOLDER, DEST_FOLDER, CONVERT_METADATA)

## Merge gpx files
This step merges gpx files into one big gpx file that will be used to generate the heatmaps.

You can either merge everything in one folder (the `GPX_FOLDER`, by default the output folder of the previous step) or use the Strava metadata to select by activity type.
See the comments in the code for how to define the dictionary to do this.

**Note**: depending on how much data you have, this can take a while, especially with `SIMPLIFY=True`. Turning this option off is nevertheless NOT recommended: it produces significantly smaller output files, resulting in faster processing in later steps with better results!

Processing 470 cycling trips for a total of 22 000 km took about 5 minutes.

### Customization
The default config removes points that are close together. Additionally, all time and elevation data, gpx extensions and waypoints will be removed.
This behaviour can be changed by setting the appropriate variables in the next codeblock.

To toggle removing points: `SIMPLIFY=True/False`

To toggle removing time data: `REMOVE_TIME=True/False`

To toggle removing elevation data: `REMOVE_ELEVATION=True/False`

To toggle removing gpx extensions (such as heartrate): `REMOVE_EXTENSIONS=True/False`

To toggle removing waypoints: `REMOVE_WAYPOINTS=True/False`
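
The point removal controlled by `SIMPLIFY` can be pictured with the Ramer-Douglas-Peucker algorithm, a common way to drop points without significantly altering a track. The sketch below is only an illustration on planar `(x, y)` coordinates in metres. The function names are hypothetical, and whether `clean_and_merge` uses exactly this method is an assumption.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b (all 2D tuples)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Projection of p onto the segment, clamped to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify_track(points, max_distance):
    """Keep only points that deviate more than max_distance from the simplified line."""
    if len(points) < 3:
        return list(points)
    # Find the point farthest from the segment between the two endpoints.
    index, farthest = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > farthest:
            index, farthest = i, d
    if farthest <= max_distance:
        return [points[0], points[-1]]
    # Otherwise keep that point and simplify both halves recursively.
    left = simplify_track(points[: index + 1], max_distance)
    right = simplify_track(points[index:], max_distance)
    return left[:-1] + right


track = [(0.0, 0.0), (10.0, 0.5), (20.0, -0.3), (30.0, 0.0), (30.0, 40.0)]
print(simplify_track(track, max_distance=2.0))
```

With a tolerance of 2 m, the two slightly wobbling points on the straight stretch are dropped while the corner at `(30.0, 0.0)` is kept.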

### Select by activity type
To select by activity type, set `SELECT_BY_TYPE` to `True`.
Additionally, you need to define the dictionary `MERGED_FILES`, which determines which types end up in which merged gpx file.
It has the following structure:

`{"merged_file_name" : ["activitytype1", "activitytype2", ...]}`

All activities whose type is in the list will be merged together into the gpx file `merged_file_name`.

A list of all activity types in the archive was printed in the step *Extracting Strava metadata*.
**Note**: Some activity type names have a trailing space for some unknown reason. Don't forget to include it!

For an example dictionary, see the block with parameter definitions below.
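
To make the effect of the dictionary concrete, here is a small self-contained sketch of the selection logic. The inline CSV is a made-up stand-in for `metadata.csv`, and `files_per_merge` is a hypothetical helper, not part of `gpxtools`. Note how the trailing space in the activity type has to match exactly.

```python
import csv
import io

# Made-up stand-in for metadata.csv; one activity type has a trailing space.
METADATA_CSV = """Date,Name,Activitytype,Filename
2023-01-02,Morning ride,Fietsrit,1.gpx
2023-01-03,Ski day,Alpineskiën ,2.gpx
2023-01-05,Walk,Wandeling,3.gpx
2023-01-07,Hike,Hiken,4.gpx
"""

MERGED_FILES = {
    "merged_ski.gpx": ["Alpineskiën "],
    "merged_stap.gpx": ["Hiken", "Wandeling"],
}


def files_per_merge(rows, merged_files):
    """Map each merged file name to the input files whose type it lists."""
    selection = {name: [] for name in merged_files}
    for row in rows:
        for name, types in merged_files.items():
            if row["Activitytype"] in types:
                selection[name].append(row["Filename"])
    return selection


rows = list(csv.DictReader(io.StringIO(METADATA_CSV)))
selection = files_per_merge(rows, MERGED_FILES)
print(selection)
```

Activity types that appear in no list (here `Fietsrit`) are simply left out of every merged file.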

In [None]:
# Name of output file, irrelevant when selecting by activitytype
MERGE_NAME = "-"

# Directory containing the input gpxfiles (don't forget trailing "/")
# default: output folder from previous step
GPX_FOLDER = "../gpx/source/"
# Directory where the output will be saved
OUT_FOLDER = "../gpx/output/"


# Parameters that determine what gets removed from the gpx
# "SIMPLIFY" removes points that are close together in the gpx if
# removing them does not significantly alter the track
SIMPLIFY = True
REMOVE_TIME = True
REMOVE_ELEVATION = True
REMOVE_EXTENSIONS = True
REMOVE_WAYPOINTS = True
# not implemented yet!
REMOVE_SPEED = False


# If the gpxfiles are extracted from a strava archive, there is a csv with metadata
# present in the archive. If this metadata was prepared and converted along with the gpx files,
# this data can be used to select files based on activitytype
SELECT_BY_TYPE = True
# MERGED_FILES is a dictionary with the following structure:
# {merged_file_name : [activitytype1, activitytype2, ...]}
# the key should be a string, and will be the name of the output gpx file
# the value is an array of activitytypes that will be merged into the corresponding gpx file
MERGED_FILES = {
        "merged_ski.gpx" : ["Alpineskiën "], 
        "merged_fiets.gpx" : ["Fietsrit"], 
        "merged_stap.gpx" : ["Hiken", "Wandeling"], 
        "merged_loop.gpx" : ["Hardloopsessie", "Skeeleren"]
        }
#MERGED_FILES = {"merged_voet.gpx" : ["Hardloopsessie", "Hiken", "Wandeling", "Skeeleren"]}

In [None]:
from gpxtools import clean_and_merge

if not SELECT_BY_TYPE:
    # get the names of all the gpx files in the GPX_FOLDER
    # (the extension check is case-insensitive, so .gpx and .GPX both match)
    filenames = [name for name in os.listdir(GPX_FOLDER) if name.lower().endswith(".gpx")]
    
    # generate merged file
    clean_and_merge(filenames, GPX_FOLDER, OUT_FOLDER, SIMPLIFY, REMOVE_TIME, REMOVE_ELEVATION, REMOVE_EXTENSIONS,
        REMOVE_WAYPOINTS, REMOVE_SPEED, MERGE_NAME, 20)

else:
    # iterate over the different output files
    df = pd.read_csv(GPX_FOLDER + "metadata.csv", index_col="Unnamed: 0")
    for merge_name, activity_types in MERGED_FILES.items():
        print("preparing %s" %merge_name)
        
        # iterate over the activities for this file and collect filenames
        filenames = []
        for activity_type in activity_types:
            filenames += list(df.loc[df["Activitytype"] == activity_type, "Filename"])

        # if any files were found, generate the merged gpx
        if filenames:
            clean_and_merge(filenames, GPX_FOLDER, OUT_FOLDER, SIMPLIFY, REMOVE_TIME, REMOVE_ELEVATION, REMOVE_EXTENSIONS,
                REMOVE_WAYPOINTS, REMOVE_SPEED, merge_name, 10)

## Generate background map
This step contains tools to fetch tiles from Mapbox, in a user-defined style, and stitch them together into one big map image.

### Usage
1) Change the variables in the next codeblock to the values you need, i.e. select the right map from the regions.csv file.

2) Run the next two codeblocks and check the info that is printed out.

3) If the info is as expected, run the next three codeblocks. For large maps this can take a while.
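
To get a feeling for how a region, zoom level, tile size and DPI translate into an image size, the sketch below uses the standard Web Mercator XYZ tile numbering. The function names are hypothetical and this is not the code in `maptools`. It assumes the tiles follow the usual XYZ scheme.

```python
import math


def latlon_to_tile(lat, lon, zoom):
    """Return the (x, y) index of the XYZ tile containing the given point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def map_size(north, west, south, east, zoom, tilesize=256, dpi=200):
    """Return ((width_px, height_px), (width_inch, height_inch)) of the stitched map."""
    x0, y0 = latlon_to_tile(north, west, zoom)  # top-left tile
    x1, y1 = latlon_to_tile(south, east, zoom)  # bottom-right tile
    width_px = (x1 - x0 + 1) * tilesize
    height_px = (y1 - y0 + 1) * tilesize
    return (width_px, height_px), (width_px / dpi, height_px / dpi)


pixels, inches = map_size(north=10, west=-10, south=-10, east=10, zoom=2)
print("image size in pixels:", pixels)
print("print size in inches:", inches)
```

Each extra zoom level roughly doubles the number of tiles along both axes, which is why large regions at high zoom take a while to download.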

In [None]:
# access token and user name for Mapbox
TOKEN = ""
USER = ""

MAP_FOLDER = "../maps/"
# csv containing information of regions
REGION_FILE = MAP_FOLDER + "regions.csv"
# descriptive name for region of interest, used to fetch map info from regions.csv
REGION_NAME = "Waasmunster-darkheat-14"

# if you intend to print the map, set the DPI value to get the physical size of the image
DPI = 200
# size in pixels of 1 tile. Default is 256 unless you made a "dodgy modification" to the cartopy source code
TILESIZE = 256


from maptools import gen_map_metadata, images_for_domain, merge_tiles, write_map_csv

In [None]:
map_tiles, area, image_pixel_size, area_tile_coords, map_coords, data, description = gen_map_metadata(
    REGION_FILE, REGION_NAME, TOKEN, USER, DPI=DPI, TILESIZE=TILESIZE)

write_map_csv(MAP_FOLDER, data)

print(description)

In [None]:
# download the tiles
tiles = images_for_domain(map_tiles, area, data["Zoom"])

In [None]:
img = merge_tiles(tiles, image_pixel_size, area_tile_coords, data["TileSize"])

In [None]:
# write image to disk
PIL_image = Image.fromarray(img)
PIL_image.save("%s%s.png"%(MAP_FOLDER, data["MapName"]), "png")

## Generate heatmap
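
The `DENOISE` option in the parameter cell below is described there as setting each pixel that lies on a track to the maximum value found in a small square around it. A minimal sketch of that idea follows. The function name `denoise` and the use of plain nested lists are assumptions for illustration, and `generate_point_density` in `heatmaptools` may implement it differently.

```python
def denoise(counts, radius):
    """Set each nonzero (track) pixel to the max count within a square window."""
    height, width = len(counts), len(counts[0])
    result = [row[:] for row in counts]
    for r in range(height):
        for c in range(width):
            if counts[r][c] == 0:
                continue  # pixels outside any track are left untouched
            # Window of (2 * radius + 1) pixels per side, clipped at the image border.
            r0, r1 = max(0, r - radius), min(height, r + radius + 1)
            c0, c1 = max(0, c - radius), min(width, c + radius + 1)
            result[r][c] = max(
                counts[i][j] for i in range(r0, r1) for j in range(c0, c1)
            )
    return result


# Two tracks share a road but one drifts into the neighbouring pixel row.
counts = [
    [0, 0, 0, 0],
    [2, 2, 1, 0],
    [0, 0, 1, 2],
    [0, 0, 0, 0],
]
denoised = denoise(counts, radius=1)
for row in denoised:
    print(row)
```

In this example the two pixels with count 1 are raised to 2, so the line keeps a constant intensity where the tracks split over adjacent pixels. Widening would additionally fill the zero pixels inside the window.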

In [None]:
# descriptive name for region of interest; used to fetch the metadata of the right map
# and in the filenames of the output files
MAP_NAME = "Leuven-darkheat-14_14"


# gpx data to use
ACTIVITY_TYPES = ["voet", "fiets"]
GPX_SOURCES = [
    ["../gpx/output/merged_stap.gpx", "../gpx/output/merged_loop.gpx"],
    ["../gpx/output/merged_fiets.gpx"]
]
# construct dictionary
SOURCES = {ACTIVITY_TYPES[i] : GPX_SOURCES[i] for i in range(len(ACTIVITY_TYPES))}

#GPX_SOURCE = "../gpx/output/merged_voet.gpx"

# folders to save the heatlayers and the final heatmaps to
HEAT_FOLDER = "../heatlayers/"
HEATMAP_FOLDER = "../heatmaps/"

# whether to create the heatmap or not, and where to look for the map
CREATE_HEATMAP = True
MAP_FOLDER = "../maps/"



# heatmap settings

# apply a (naive) denoising. Sometimes two tracks on the same road go through adjacent pixels due to small errors in the data,
# so each of the two pixels only gets half of the tracks. If the tracks do overlap a bit further on, this leads to noisy lines.
# This is fixed by setting the value of each pixel THAT IS CONTAINED IN A TRACK to the max value of all pixels in a small region.
DENOISE = True
DENOISE_RADIUS = 2  # how many pixels around every pixel to use for denoising. Radius=1 uses a 3x3 square as mask, Radius=2 a 5x5 square, etc.

# when using high resolution maps, the single pixel wide track can be hard to spot
# this option widens the track by doing a denoising, but now also setting the pixels
# OUTSIDE the track to the max of the mask, leading to a wider track.
# is only carried out together with denoising
WIDEN = False
WIDEN_RADIUS = 1

# inverse cdf transform
LINEAR_MAX = 3
LINEAR_WEIGHT = 1/3

# make sure this list is at least as long as ACTIVITY_TYPES (one colormap per activity type)
COLOR_MAPS = ["BuPu_r", "YlOrRd_r"]
CMAPS = {ACTIVITY_TYPES[i] : COLOR_MAPS[i] for i in range(len(ACTIVITY_TYPES))}
# alpha of the generated heatlayers
ALPHA = 1.0

del ACTIVITY_TYPES, GPX_SOURCES, COLOR_MAPS

from heatmaptools import LatLonToMapPixel, generate_pixel_counts, generate_point_density, generate_heat_layer

In [None]:
# read in the map metadata
df = pd.read_csv("%smaps_metadata.csv" %MAP_FOLDER, sep=",")
df = df.set_index("MapName")
MAP_DATA = df.loc[MAP_NAME]


In [None]:
# for each pixel, count how many tracks pass through it of each type
pixel_sets = { activity_type : [] for activity_type in SOURCES.keys() }

# iterate over all sources
for source in SOURCES.keys():
    for gpx_file in SOURCES[source]:
        print("Processing %s" %gpx_file)
        pixel_sets[source] += generate_pixel_counts(gpx_file, MAP_DATA)

In [None]:
# generate the heatmaps
for activity_type in SOURCES.keys():
    print("producing layer %s" %activity_type)
    point_dense = generate_point_density(pixel_sets[activity_type], MAP_DATA, DENOISE, DENOISE_RADIUS, WIDEN, WIDEN_RADIUS)
    heat_layer = generate_heat_layer(point_dense, LINEAR_MAX, LINEAR_WEIGHT, plt.get_cmap(CMAPS[activity_type]), ALPHA)
    heat_layer = np.flip(heat_layer,0)
    heat_layer = (255*heat_layer).astype(np.uint8)
    big_img = Image.fromarray(heat_layer)
    big_img.save("%s%s-%s.png" %(HEAT_FOLDER,MAP_NAME,activity_type), "png")

    del point_dense, heat_layer, big_img

In [None]:

if CREATE_HEATMAP:
    background = Image.open("%s%s.png" %(MAP_FOLDER, MAP_NAME))
    for activity_type in SOURCES.keys():
        print("pasting layer %s" %activity_type)
        heat_layer = Image.open("%s%s-%s.png" %(HEAT_FOLDER,MAP_NAME,activity_type))
        background.paste(heat_layer, (0,0), heat_layer)
    background.save("%s%s.png" %(HEATMAP_FOLDER, MAP_NAME), "png")