---
title: "Visualising WRC Rally Timing and Results Data"
subtitle: "A RallyDataJunkie Adventure"
author: "Tony Hirst"
description: "An introduction to visualising timing and results data for WRC rally events."
knit: "bookdown::render_book"
site: bookdown::bookdown_site
always_allow_html: yes
new_session: no
---
```{r}
options(pillar.sigfig = 7)
```
# Index {-}

 
<!--chapter:end:index.Rmd-->

---
output:
  pdf_document: default
  html_document:
    keep_md: true
    self_contained: true
---
# Introduction

For fans of WRC, the WRC live timing screens, along with the results published on *ewrc-results.com*, provide up-to-date timing and results information over the course of an event weekend. They also provide historical results, both for the current season (WRC) and further back into the mists of time (*ewrc-results*).

In this recipe collection, I'll describe various ways of visualising WRC rally results and timing data using data retrieved from the WRC results API. Many of the techniques should also apply directly to data retrieved from other services, such as *ewrc-results.com*, as long as the data is represented in a similar way.

<!--chapter:end:intro.Rmd-->

```{r cache = T, echo = F, message=F}
knitr::opts_chunk$set(error = TRUE)
knitr::opts_chunk$set(fig.path = "images/wrc-api-")
```
# Accessing Data from the WRC Live Timing API

We can get rally details, timing and results data from the WRC live timing service JSON API.

## Current Season Rallies
To start with, let's see what rallies are scheduled for the current, active season. The `jsonlite::fromJSON()` function retrieves a JSON (*JavaScript Object Notation*) file from a URL and attempts to unpack it into an *R* dataframe:

```{r message=F, warning=F}
library(jsonlite)
library(stringr)
library(dplyr)

season_url = "https://api.wrc.com/contel-page/83388/calendar/active-season/"

get_active_season = function(active_season_url=season_url, all=FALSE) {
  if (all)
    jsonlite::fromJSON(active_season_url)
  else
    jsonlite::fromJSON(active_season_url)$rallyEvents$items
}

s = get_active_season()

# Preview the column names of the resulting dataframe
colnames(s)
```
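To see how `jsonlite::fromJSON()` unpacks nested JSON, here is a small sketch that parses a made-up JSON string rather than the live API. The string only mimics the *rallyEvents* / *items* structure used above (an assumption about the shape of the real response); the array of objects is simplified into a dataframe with one row per object:

```{r}
library(jsonlite)

# A hypothetical, cut-down response shaped like the season calendar
toy_json = '{"rallyEvents": {"items": [
  {"id": 101, "name": "Rally A"},
  {"id": 102, "name": "Rally B"}
]}}'

toy_season = jsonlite::fromJSON(toy_json)

# The array of objects is simplified into a dataframe
toy_items = toy_season$rallyEvents$items
toy_items
```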

Let's preview the contents of a couple of those columns:

```{r}
# The magrittr pipe (%>%), re-exported by dplyr, makes things easier to read
s %>% select(c('id', 'name')) %>% head()
```

We can search the *name* column to find the unique identifier value for a particular event:

```{r}
eventId = s[s['name']=='Rallye Monte-Carlo','id']

eventId
```

Or we can be more generic with a regular expression lookup:

```{r}
get_eventId_from_name = function(season, name){
  season[str_detect(season$name,
                    regex(name, ignore_case = T)), 'id']
}

get_eventId_from_name(s, 'monte')
```
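A regular expression lookup is convenient but can be greedy. The following sketch uses a small, hypothetical season table (not real API data) to show that a loose pattern may match several events, so it is worth checking how many IDs come back before using the result:

```{r}
library(stringr)

# Hypothetical season table
toy_season_tbl = data.frame(id = c(1, 2, 3),
                            name = c('Rallye Monte-Carlo',
                                     'Arctic Rally Finland',
                                     'Rally Italia Sardegna'))

# Case-insensitive partial match: a single event
monte_id = toy_season_tbl[str_detect(toy_season_tbl$name,
                                     regex('monte', ignore_case = TRUE)), 'id']
monte_id

# 'rally' also matches 'Rallye', so all three events are returned
rally_ids = toy_season_tbl[str_detect(toy_season_tbl$name,
                                      regex('rally', ignore_case = TRUE)), 'id']
length(rally_ids)
```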

## Itinerary Lookup

We can make another call to the WRC API to look up the itinerary for the event. Each leg of the event corresponds to a particular day:

```{r}
results_api = 'https://api.wrc.com/results-api'

get_itinerary = function(eventId) {
  itinerary = jsonlite::fromJSON(paste0(results_api,"/rally-event/",
                                 eventId,
                                 "/itinerary"))$itineraryLegs
  itinerary %>% arrange(order)
}

itinerary = get_itinerary(eventId)

itinerary %>% select(-itinerarySections)
```

The *itinerarySections* column contains a dataframe for each leg, describing the sections that make up that leg.

### Leg Sections

Within each leg, the itinerary provides information about each section (that is, each "loop") of the rally. This information is returned as a dataframe in a standard format, nested inside the *itinerarySections* column. We can use the base *R* `do.call()` function to pass each of the dataframes in that column as a separate argument to `rbind()`, which binds them together into a single dataframe:

```{r}
get_sections = function(itinerary){
  sections = do.call(rbind, itinerary$itinerarySections)
  sections %>% arrange(order)
}

sections = get_sections(itinerary)

sections %>% select(-c(controls, stages))
```
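The `do.call(rbind, ...)` trick is easier to see on a toy example. In this sketch, a list of two hypothetical section dataframes stands in for the *itinerarySections* column; `do.call()` expands the list into separate arguments, so the call is equivalent to `rbind(first_df, second_df)`. Note that `rbind()` expects the dataframes to share the same column names:

```{r}
# Two hypothetical section dataframes standing in for itinerarySections
toy_sections_list = list(
  data.frame(itinerarySectionId = 1, name = 'Section 1', order = 1),
  data.frame(itinerarySectionId = 2, name = 'Section 2', order = 2)
)

# Equivalent to rbind(toy_sections_list[[1]], toy_sections_list[[2]])
toy_sections = do.call(rbind, toy_sections_list)
toy_sections
```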

In the sections dataframe we have one row per section. Two of the columns, *controls* and *stages*, each "nest" a subdataframe within each row.

For example, here's one of the *controls* dataframes that describes timing controls:

```{r}
sections$controls[[1]]
```

And an example of a dataframe from the first row of the *stages* column:

```{r}
sections$stages[[1]]
```

### Time Controls

We can look up information about each time control from data provided as part of the itinerary lookup using the same trick as before to "unroll" the contents of each dataframe in a specified column into a single dataframe.

An alternative to the `do.call()` approach is the tidyverse `dplyr::bind_rows()` function, which we can apply to the `sections$controls` column values via a pipe. To keep a reference to the original section, we name each element of the *controls* column with the corresponding *itinerarySectionId* value and then set the `.id` parameter of `bind_rows()`, which uses those names to populate an identifier column:

```{r}
get_controls = function(sections){
  # Name each row in the list of dataframes we want to bind
  names(sections$controls) = sections$itinerarySectionId
  
  controls = sections$controls %>%
    # Ensure that we create an identifier column (uses list names)
    bind_rows(.id='itinerarySectionId')
  
  controls
}

controls = get_controls(sections)

controls %>% head(2)
```
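Here is a minimal sketch of the `.id` behaviour on a named list of toy control dataframes (the codes, distances and section IDs are made up). Each element's name is copied into the new identifier column for every row that element contributes:

```{r}
library(dplyr)

toy_controls = list(
  data.frame(code = c('TC1', 'SS1'), distance = c(0, 12.3)),
  data.frame(code = c('TC2'), distance = c(0))
)
# Name each list element with a (made-up) section ID
names(toy_controls) = c('101', '102')

toy_controls_long = bind_rows(toy_controls, .id = 'itinerarySectionId')
toy_controls_long
```

Note that the identifier column is built from list names, so it is a character column; it may need converting with `as.integer()` before joining it against a numeric ID column.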

### Stage Details

We can pull stage details from the dataframes contained in the `sections` dataframe from the itinerary lookup:

```{r}
get_stages = function(sections){
  # Name each row in the list of dataframes we want to bind
  names(sections$stages) = sections$itinerarySectionId
  
  stages = sections$stages %>%
    # Ensure that we create an identifier column (uses list names)
    bind_rows(.id='itinerarySectionId')
    
  stages %>% arrange(number)
}

stages = get_stages(sections)

stages %>% head()
```

We can get a list of stage IDs from the `stageId` column (`stages$stageId`):

```{r}
get_stage_list = function(stages){
  stage_list = stages$stageId
  stage_list
}

get_stage_list(stages)
```

Perhaps more conveniently, we can create a lookup from code to stage ID:

```{r}
# https://stackoverflow.com/a/19265431/454773
get_stages_lookup = function(stages,
                             fromCol='code',  toCol='stageId'){
  stages_lookup = stages[[toCol]]
  names(stages_lookup) = stages[[fromCol]]
  stages_lookup
}

stages_lookup = get_stages_lookup(stages)
stages_lookup
# Lookup particular stage ID by stage code
#stages_lookup[['SS2']]
```
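The lookup is just a named vector. The sketch below uses hypothetical stage codes and IDs to show the difference between the two indexing styles: `[[` returns a single bare value (and errors on an unknown name), while `[` can look up several codes at once and returns `NA` for a code that is not present:

```{r}
# Hypothetical stage codes and IDs
toy_lookup = c(1741, 1742, 1743)
names(toy_lookup) = c('SS1', 'SS2', 'SS3')

# Single lookup by code
toy_lookup[['SS2']]

# Vectorised lookup; unname() drops the names carried over from the lookup
unname(toy_lookup[c('SS3', 'SS1')])

# An unknown code gives NA rather than an error
toy_lookup['SS9']
```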

From the `stages` table, we can get the identifier for a particular stage, either by code (for example, *"SS3"*) or by (partial) name match:

```{r}
ssnum = 'SS3'

get_stage_id = function(stages, sname, typ='code'){
  # code, name
  if (typ=='code')
    stageId = stages[stages[typ] == sname, 'stageId']
  else
    stageId = stages[stringr::str_detect(stages[[typ]], sname), 'stageId']
  stageId
}

stageId = get_stage_id(stages, 'Mustalampi 1', 'name')
stageId
```

And the stage distance and name:

```{r}
get_stage_info = function(stages, sid, typ='stageId', clean=TRUE){
  # stageId, code
  name=stages[stages[typ] == sid, 'name']
  distance=stages[stages[typ] == sid, 'distance']
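  # Escape the parentheses: unescaped, they would be treated as a
  # regex group and the ' (Live TV)' suffix would never match
  if (clean)
    name = stringr::str_replace(name, ' \\(Live TV\\)', '')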
  
  c(name=name, distance=distance)
}

get_stage_info(stages, stageId)
```
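Because `stringr::str_replace()` treats its pattern as a regular expression, parentheses in a stage name suffix need escaping. This sketch, using a stage name in the same style as those above, compares an unescaped pattern with an escaped one and with `stringr::fixed()`, which matches the literal string:

```{r}
library(stringr)

toy_name = 'Mustalampi 1 (Live TV)'

# Unescaped: the parentheses form a regex group, so the pattern looks
# for ' Live TV' (a space directly before 'Live'), which is not present
unescaped = str_replace(toy_name, ' (Live TV)', '')

# Escaping the parentheses, or matching a fixed string, removes the suffix
escaped = str_replace(toy_name, ' \\(Live TV\\)', '')
literal = str_replace(toy_name, fixed(' (Live TV)'), '')

c(unescaped, escaped, literal)
```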

### Road Order Start Lists

The *startListId* can be used alongside the event ID to look up the startlist for a leg. We can order the startlist by start order:

```{r}
get_startlist = function(eventId, startListId) {
  startlist_url = paste0(results_api, '/rally-event/',
                         eventId,'/start-list-external/', startListId)
  
  startlist = jsonlite::fromJSON(startlist_url)$startListItems
  
  # Order the startlist dataframe by start order
  startlist %>% arrange(order)
}

# Example startlist ID
# Use a regular expression to find the startlist ID by day
startListId = itinerary[str_detect(itinerary$name,
                                   regex('Friday', ignore_case = T)),
                       'startListId']

startlist = get_startlist(eventId, startListId)

startlist %>% head()
```

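Looking up the startlist ID for a particular section is a little fiddly: we need to find the leg that the section belongs to, and then look up the startlist ID for that leg: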

```{r}
get_startlist_id = function(itinerary, itinerarySectionId){
  sections = get_sections(itinerary)
  itineraryLegId = sections[sections$itinerarySectionId==itinerarySectionId,
                            'itineraryLegId']
  itinerary[itinerary$itineraryLegId==itineraryLegId,'startListId']
}

get_startlist_id(itinerary, stages$itinerarySectionId[[1]])
```
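An alternative to looking up one section at a time is to join the leg-level startlist IDs onto every section in one go. The following is a sketch on hypothetical legs and sections tables, assuming (as the function above does) that sections and legs share an *itineraryLegId* column:

```{r}
library(dplyr)

# Hypothetical legs and sections tables
toy_legs = data.frame(itineraryLegId = c(1, 2),
                      name = c('Friday', 'Saturday'),
                      startListId = c(501, 502))
toy_leg_sections = data.frame(itinerarySectionId = c(11, 12, 21),
                              itineraryLegId = c(1, 1, 2))

# Attach each section's startlist ID by joining on the leg ID
toy_sections_sl = toy_leg_sections %>%
  left_join(select(toy_legs, itineraryLegId, startListId),
            by = 'itineraryLegId')
toy_sections_sl
```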

## Competitor Details

Details of car entries for each event can be retrieved from the WRC live timing API given an event ID.

```{r}
get_rally_entries = function(eventId) {
  cars_url = paste0(results_api, '/rally-event/',
                  eventId,'/cars')
  jsonlite::fromJSON(cars_url)
}

entries = get_rally_entries(eventId)
# $driver, $codriver, $manufacturer, $entrant, $group, $eventClasses
# $identifier, $vehicleModel, $eligibility, $status

entries %>% head(2)
```

### Looking Up Entries by Group

We can index the entries by group to find all the WRC car `entryId` values:

```{r}
entries[entries$group$name=='WRC', 'entryId']
```
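The `entries$group$name` indexing works because `jsonlite::fromJSON()` turns a nested JSON object into a dataframe column that is itself a dataframe. This sketch builds a tiny, made-up entries table from a JSON string to show the same pattern offline:

```{r}
library(jsonlite)

# Hypothetical entries with a nested 'group' object per car
toy_entries = jsonlite::fromJSON('[
  {"entryId": 1, "group": {"name": "WRC"}},
  {"entryId": 2, "group": {"name": "WRC2"}},
  {"entryId": 3, "group": {"name": "WRC"}}
]')

# toy_entries$group is a nested dataframe, so $name gives a plain vector
wrc_ids = toy_entries[toy_entries$group$name == 'WRC', 'entryId']
wrc_ids
```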

### Driver & Codriver Details

Detailed information for each driver and codriver can be found in the corresponding sub-dataframes.

For example, we can look up the details for each driver, noting in this case that we need to column bind (`cbind()`) the subdataframes to produce the collated dataframe of driver details:

```{r}
get_drivers = function(entries){
  drivers = do.call(cbind, entries$driver)
  drivers
}

drivers = get_drivers(entries)

drivers %>% head(2)
```

We can similarly obtain data for the codrivers:

```{r}
#codrivers = do.call(cbind, entries$codriver)
# Again, there is a tidyverse approach with dplyr::bind_cols()
get_codrivers = function(entries){
  codrivers = bind_cols(entries$codriver)
  codrivers
}

codrivers = get_codrivers(entries)

codrivers %>% head(2)
```

We can conveniently obtain the identifier for a particular driver or codriver by searching against their name or three-letter code, although note that *the three-letter code may not be a unique identifier*:

```{r}
get_person_id = function(persons, sname, typ='fullName'){
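  # typ: 'code' for an exact three-letter code match,
  # otherwise a case-insensitive regex match on the named column
  if (typ=='code')
    personId = persons[persons[[typ]]==sname, 'personId']
  else
    personId = persons[str_detect(persons[[typ]],
                                  regex(sname, ignore_case = TRUE)),
                       'personId']
  personId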
}

ogierDriverId = get_person_id(drivers, 'ogier')
ogierDriverId
```

From the driver person identifier we can get the entry identifier for the rally we're exploring:

```{r}
ogierEntryId  = entries[entries['driverId']==ogierDriverId, 'entryId']
ogierEntryId
```

### Summarising Essential Entry Data

We can manually create a dataframe containing essential fields from the original cars dataframe and the dataframes contained within it:

```{r}
get_car_data = function(entries){
  cols = c('entryId', 'driverId', 'codriverId','manufacturerId',
           'vehicleModel','eligibility', 'classname','manufacturer',
           'entrantname', 'groupname', 'drivername', 'code',
           'driverfullname', 'codrivername','codriverfullname'
           )
  entries = entries %>%
                        rowwise() %>% 
                        mutate(classname = eventClasses$name) %>%
                        mutate(manufacturer = manufacturer$name) %>%
                        mutate(entrantname = entrant$name) %>%
                        mutate(groupname = group$name) %>%
                        mutate(drivername = driver$abbvName) %>%
                        mutate(driverfullname = driver$fullName) %>%
                        mutate(codrivername = codriver$abbvName) %>%
                        mutate(codriverfullname = codriver$fullName) %>%
                        mutate(code = driver$code) %>%
                        select(all_of(cols))
  
  # If we don't cast, it's a non-rankable rowwise df
  as.data.frame(entries)
}

get_car_data(entries) %>% head(2)
```

## Penalties and Retirements

We can look up *penalties* from an event ID:

```{r}
get_penalties = function(eventId) {
  penalties_url = paste0(results_api, '/rally-event/',
                       eventId, '/penalties')
  jsonlite::fromJSON(penalties_url)
}

get_penalties(eventId) %>% head(2)
```

The event ID is also all we need to request a list of *retirements*:

```{r}
get_retirements = function(eventId) {
  retirements_url = paste0(results_api, '/rally-event/',
                       eventId, '/retirements')
  jsonlite::fromJSON(retirements_url)
}

get_retirements(eventId) %>% head(2)
```

## Results and Stage Winners

As well as retrieving penalties and retirements using just the event ID as a key, we can also retrieve the overall results and the stage winners:

```{r}
get_result = function(eventId) {
  result_url = paste0(results_api, '/rally-event/',
                    eventId,'/result')
  
  jsonlite::fromJSON(result_url)
}

get_result(eventId) %>% head(2)
```

And for the stage winners:

```{r}
get_stage_winners = function(eventId) {
  stage_winners_url = paste0(results_api, '/rally-event/',
                             eventId,'/stage-winners')
  
  jsonlite::fromJSON(stage_winners_url)
}

get_stage_winners(eventId) %>% head(2)
```

## Stage Result

At the end of each stage, there are actually two different sorts of results data available: data relating to the result of the stage itself, and data relating to how the stage result affected the overall rally position.

Let's start by getting the overall rally result at the end of a particular stage. Note that the overall result does not include the stage ID in the returned data so we need to add it in:

```{r}
get_overall_result = function(eventId, stageId) {
  overall_url = paste0(results_api, '/rally-event/',
                           eventId, '/stage-result/stage-external/',
                           stageId)
  jsonlite::fromJSON(overall_url) %>%
    # Also add in the stage ID
    mutate(stageId = stageId)
}

overall_result = get_overall_result(eventId, stageId)

overall_result %>% head(2)
```

### Getting Stage Results for Multiple Stages

It will be convenient to be able to retrieve overall results for multiple stages from a single function call. One way of achieving that is to apply a function that retrieves the results for a single stage to each of the stage IDs we are interested in using `purrr::map()`. Since `map()` passes each list element as the first argument of the function it applies, we define a wrapper, `get_overall_result2()`, that takes the stage ID as its first argument:

```{r}
library(purrr)

get_overall_result2 = function(stageId, eventId) {
  get_overall_result(eventId, stageId)
}

get_multi_overall = function(stage_list, event_id = eventId){
  # event_id defaults to the eventId defined in the global environment
  multi_overall = stage_list %>%
    map(get_overall_result2, eventId = event_id) %>% 
    bind_rows()
  multi_overall
}

# Specify the stage IDs for multiple stages
stage_list = c(1747, 1743)

multi_overall_results = get_multi_overall(stage_list)
  
multi_overall_results %>% tail(2)
```
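To see the `map()` and `bind_rows()` pattern without calling the API, this sketch swaps in a hypothetical `fake_overall()` function that simply builds a tiny results table. `map()` supplies each stage ID as the first argument, while the named `eventId` argument is passed through unchanged on every call:

```{r}
library(purrr)
library(dplyr)

# Hypothetical stand-in for get_overall_result2(): no API call,
# just a two-row table tagged with the stage and event IDs
fake_overall = function(stageId, eventId) {
  data.frame(eventId = eventId, stageId = stageId,
             entryId = c(1, 2), position = c(1, 2))
}

toy_multi = c(1747, 1743) %>%
  map(fake_overall, eventId = 99) %>%
  bind_rows()
toy_multi
```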

## Stage Times

We can get the stage times for each stage on a rally by event and stage ID:

```{r}
get_stage_times = function(eventId, stageId) {
  stage_times_url = paste0(results_api, '/rally-event/',
                           eventId, '/stage-times/stage-external/',
                           stageId)
  jsonlite::fromJSON(stage_times_url)
}

stage_times = get_stage_times(eventId, stageId)

stage_times %>% head(2)
```

### Getting Stage Times for Multiple Stages

It will also be convenient to be able to retrieve stage times for multiple stages from a single function call. We can take the same approach we used previously:

```{r message=F, warning=F}
get_stage_times2 = function(stageId, eventId) {
  get_stage_times(eventId, stageId)
}

get_multi_stage_times = function(stage_list, event_id = eventId){
  # event_id defaults to the eventId defined in the global environment
  multi_stage_times = stage_list %>%
    map(get_stage_times2, eventId = event_id) %>% 
    bind_rows()
  multi_stage_times
}

multi_stage_times = get_multi_stage_times(stage_list)
  
multi_stage_times %>% tail(2)
```

### Getting Wide Stage Times for Multiple Stages

We can then widen the stage times for each driver:

```{r}
get_multi_stage_times_wide = function(multi_stage_times, stage_list){
  stage_times_cols = c('entryId', 'stageId', 'elapsedDurationMs')
  
  multi_stage_times_wide = multi_stage_times %>% 
                    select(all_of(stage_times_cols)) %>%
                    mutate(elapsedDurationS = elapsedDurationMs / 1000) %>%
                    select(-elapsedDurationMs) %>%
                    group_by(entryId) %>%
                    tidyr::spread(key = stageId,
                                  value = elapsedDurationS) %>%
                    select(c('entryId', as.character(stage_list))) %>%
                    # If we don't cast, it's a
                    # non-rankable rowwise df
                    as.data.frame()
  
  multi_stage_times_wide
}

multi_stage_times_wide = get_multi_stage_times_wide(multi_stage_times,
                                                    stage_list)

multi_stage_times_wide %>% head(2)
```
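`tidyr::spread()` is superseded in current versions of *tidyr*; `tidyr::pivot_wider()` is the recommended replacement. Here is a sketch of the same widening step using `pivot_wider()` on hypothetical long-format stage times. The new columns appear in order of first appearance, so if a particular stage order matters, reorder them afterwards with `select(all_of(...))` as in the function above:

```{r}
library(dplyr)
library(tidyr)

# Hypothetical long-format stage times (milliseconds)
toy_times = data.frame(entryId = c(1, 1, 2, 2),
                       stageId = c(1747, 1743, 1747, 1743),
                       elapsedDurationMs = c(300500, 410250, 302000, 409000))

toy_wide = toy_times %>%
  mutate(elapsedDurationS = elapsedDurationMs / 1000) %>%
  select(-elapsedDurationMs) %>%
  # One column per stage, one row per entry
  pivot_wider(names_from = stageId, values_from = elapsedDurationS) %>%
  as.data.frame()
toy_wide
```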

### Getting Wide Stage Positions

We can also get the stage positions:

```{r}
get_multi_stage_positions_wide = function(multi_stage_times, stage_list){
  stage_positions_cols = c('entryId', 'stageId', 'position')

  multi_stage_positions_wide = multi_stage_times %>% 
                    select(all_of(stage_positions_cols)) %>%
                    group_by(entryId) %>%
                    tidyr::spread(key = stageId,
                                  value = position) %>%
                    select(c('entryId', as.character(stage_list))) %>%
                    # If we don't cast, it's a
                    # non-rankable rowwise df
                    as.data.frame()
  
  multi_stage_positions_wide
}

multi_stage_positions_wide = get_multi_stage_positions_wide(multi_stage_times, stage_list)

multi_stage_positions_wide %>% head(2)
```

### Getting Generic Wide Dataframes

We can start to work up a function that is able to handle widening data frames more generally, albeit with a potential need to handle exceptions:

```{r}
get_multi_stage_generic_wide = function(multi_stage_generic, stage_list,
                                        wide_val, group_key='entryId',
                                        spread_key='stageId'){
  
  # Special case: convert milliseconds to seconds before widening
  if (wide_val=='elapsedDurationMs') {
    multi_stage_generic = multi_stage_generic %>%
      mutate(elapsedDurationS = elapsedDurationMs / 1000)
    
    wide_val = 'elapsedDurationS'
  }
  
  stage_times_cols = c(group_key, spread_key, wide_val)
  
  multi_stage_generic_wide = multi_stage_generic %>% 
    select(all_of(stage_times_cols)) %>%
    # group_by_at lets us pass in the grouping column by variable
    group_by_at(group_key) %>%
    tidyr::spread(key = spread_key,
                  value = wide_val) %>%
    select(all_of(c(group_key, as.character(stage_list)))) %>%
    # Cast to a plain dataframe to drop the grouping
    as.data.frame()
  
  multi_stage_generic_wide
}

multi_stage_positions_wide_g = get_multi_stage_generic_wide(multi_stage_times, stage_list, 'position')

multi_stage_positions_wide_g %>% head(2)
```

## Split Times

We can get split times and distance-into-stage data for each stage given the event and stage identifiers:

```{r}
get_splits = function(eventId, stageId){
  splits_url=paste0(results_api, '/rally-event/', eventId,
                    '/split-times/stage-external/', stageId)

    jsonlite::fromJSON(splits_url)
}

splits = get_splits(eventId, stageId)
# $splitPoints
# $entrySplitPointTimes
```

This includes handy information about split locations, such as distance into stage. This can also be useful for pace calculations:

```{r}
splits$splitPoints
```

We can also view the split point times for each driver. This second dataframe contains rows summarising the stage for each driver, and includes the stage start time and duration as well as a column *splitPointTimes* that itself contains a data frame of elapsed duration split point times:

```{r}
splits$entrySplitPointTimes %>% select(-splitPointTimes) %>% head(2)
```

To view the split times for a specific driver, we can index into the dataframe using the driver `entryId` value:

```{r}
splits$entrySplitPointTimes[splits$entrySplitPointTimes['entryId']==ogierEntryId,]$splitPointTimes
```

Each dataframe gives the split times on the stage for a particular driver in a long format.

Note that the split point times are strictly increasing and describe the elapsed time into the stage at each split point from the start location and time.
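Because the split point times are cumulative, the time taken over each section between timing points is the difference between consecutive elapsed times, and dividing by the section distance gives a pace. The following is a sketch using made-up times and distances, assuming the stage start is at time 0 and distance 0 and treating the last point as the finish:

```{r}
# Hypothetical cumulative elapsed times (s) at each timing point,
# with the distance into the stage (km) of each timing point
toy_split_dist = c(4.0, 9.0, 15.0)
toy_split_elapsed = c(160, 370, 610)

# Section times and distances: differences between consecutive points
section_times = diff(c(0, toy_split_elapsed))
section_dist = diff(c(0, toy_split_dist))

# Pace in seconds per kilometre for each section
section_pace = section_times / section_dist
section_pace
```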

### Driver Split Times Detail

We can get an unrolled long structure by combining the *splitPointTimes* dataframes from all the entries, also taking the opportunity to convert the elapsed duration in milliseconds to seconds along the way:

```{r}
#driver_splits = do.call(rbind, entry_splits$splitPointTimes)
# The tidyverse approach is to use dplyr::bind_rows()
# We can also construct a pipe to streamline the processing
get_driver_splits = function(splits){
  driver_splits = splits$entrySplitPointTimes$splitPointTimes %>%
                    bind_rows() %>%
                    mutate(elapsedDurationS = elapsedDurationMs / 1000) %>%
                    select(-elapsedDurationMs)
  driver_splits
}

driver_splits = get_driver_splits(splits)

driver_splits %>% head(2)
```

### Wide Driver Split Times

We can cast the data into a wide format, with splits ordered by their distance into the stage. Start by creating a function to help get the split point codes in order by distance along the stage:

```{r}
get_split_cols = function(splits){
  split_cols =  as.character(arrange(splits$splitPoints, distance)$splitPointId)
  split_cols
}
```

Now create a function to get the driver splits in a wide format using the distance-into-stage ordered split point codes as the widened columns:

```{r}
get_driver_splits_wide = function(driver_splits, splits){
    split_cols =  get_split_cols(splits)
    splits_cols = c('entryId', 'splitPointId', 'elapsedDurationS')
    
    driver_splits_wide = driver_splits %>% 
                            group_by(entryId) %>%
                            select(all_of(splits_cols)) %>%
                            tidyr::spread(key = splitPointId,
                                          value = elapsedDurationS) %>%
                            select(all_of(c('entryId', split_cols))) %>%
                            # If we don't cast, it's a
                            # non-rankable rowwise df
                            as.data.frame()
    driver_splits_wide
}

driver_splits_wide =  get_driver_splits_wide(driver_splits, splits)

driver_splits_wide %>% head(2)
```

### Multiple Stage Long Splits Data

A convenient way of working with the split times across multiple stages is to put the splits into a long form and then filter out the rows we are interested in.

We can generate a long form dataframe using the `dplyr::bind_rows()` function that we have met before:

```{r}
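get_split_times2 = function(stageId, eventId) {
  splits = get_splits(eventId, stageId)
  # Each row of entrySplitPointTimes describes one entry; its
  # splitPointTimes column holds that entry's split times
  splits$entrySplitPointTimes$splitPointTimes %>%
    bind_rows() %>%
    # Record which stage each split time belongs to
    mutate(stageId = stageId)
}

get_multi_split_times = function(stage_list, event_id = eventId){
  # event_id defaults to the eventId defined in the global environment
  multi_split_times = stage_list %>%
    map(get_split_times2, eventId = event_id) %>% 
    bind_rows()
  multi_split_times
}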

stage_list_sample = stage_list[1:2]

get_multi_split_times(stage_list_sample) %>% head(3)
```

<!--chapter:end:wrc-api.Rmd-->

```{r cache = T, echo = F, message=F}
knitr::opts_chunk$set(error = TRUE)
knitr::opts_chunk$set(fig.path = "images/itinerary-")
```
# Itinerary & Road Position

The competitive phase of a full WRC Rally event typically extends over three days (Friday to Sunday), with either a ceremonial start or a short first stage on the Thursday evening. Shorter format events are also possible.

Each day is referred to as a *leg*, and each leg is made up of one or more *sections*, each often referred to as a *loop*.

## Load Base Data

Start by loading in the base WRC API helper functions:

```{r message=F, warning=F}
source('code/wrc-api.R')
library(tidyr)
```

And grab some minimal event metadata:

```{r}
s = get_active_season()
eventId = get_eventId_from_name(s, 'arctic')
```

## Displaying the Itinerary

We can grab the full itinerary with a single function call:

```{r}
itinerary = get_itinerary(eventId)
itinerary
```

The *status* often does not get updated at the end of the event, so completed events may still describe the final day as *Running*.

```{r}
itinerary %>% select(c('name', 'legDate'))
```

### Itinerary Legs

Let's have a look at the structure of a particular day:

```{r message=FALSE, warning=FALSE}
example_section = itinerary[[1, 'itinerarySections']]
example_section
```

We can also get the full set of itinerary sections in one dataframe:

```{r}
itinerary_sections_full = do.call(rbind, itinerary$itinerarySections)

itinerary_sections_full
```

### Itinerary Controls

The *controls* column details information about all the timing controls:

```{r}
example_controls = example_section[[1, 'controls']]
example_controls
```

Let's see the key information for the controls:

```{r message=FALSE}
controls_cols = c('controlId', 'eventId', 'type',
                  'code', 'location', 'distance', 'firstCarDueDateTime')

example_controls %>% select(all_of(controls_cols))
```

We can get a list of all the controls by combining data from the separate legs dataframes:

```{r}
get_multi_controls = function(itinerary_sections){
  multi_controls = do.call(rbind, itinerary_sections$controls)

  multi_controls %>% select(all_of(controls_cols))
}
```

Let's see how it works:

```{r}
multi_controls = get_multi_controls(itinerary_sections_full)
multi_controls %>% tail()
```

### Itinerary Stages

The *stages* column provides more information about each stage:

```{r}
example_stages = example_section[[1, 'stages']]
example_stages
```

Let's focus on the key columns:

```{r message=FALSE}
stages_cols = c('stageId', 'eventId', 'number', 'name',
                'distance', 'stageType', 'code')

stage_name_cleaner = function(df) {
  df %>%
    # Keep a copy of the original name before cleaning it
    mutate(fullname = name,
           name = stringr::str_replace(name, ' \\(Live TV\\)', ''),
           name = stringr::str_replace(name, ' \\(Wolf Power Stage\\)', ''))
}

example_stages %>% 
  select(all_of(stages_cols)) %>%
  stage_name_cleaner()
```

Once again, we can pull all the information into a single dataframe:

```{r}
get_multi_stage_details = function(itinerary_sections){
  multi_stage_details = do.call(rbind, itinerary_sections$stages)

  multi_stage_details %>% 
    select(all_of(stages_cols)) %>%
    stage_name_cleaner()
}
```

Here's how it works:

```{r}
multi_stage_details = get_multi_stage_details(itinerary_sections_full)
multi_stage_details %>% tail()
```


<!--chapter:end:itinerary.Rmd-->

```{r cache = T, echo = F, message=F}
knitr::opts_chunk$set(error = TRUE)
knitr::opts_chunk$set(fig.path = "images/stage-results-")
```
# Visualising Results for a Single Stage

In this chapter, we'll introduce some basic chart and table techniques for displaying stage timing and results data.

## Load Base Data

To get the splits data from a standing start, we can load in the current season list, select the rally we want, look up the itinerary for the rally, extract the sections and then the stages, and then retrieve the stage ID for the stage we are interested in.

To begin with, load in our WRC API helper functions:

```{r message=F, warning=F}
source('code/wrc-api.R')
```

Now let's grab some data:

```{r}
s = get_active_season()
eventId = get_eventId_from_name(s, 'arctic')

entries = get_rally_entries(eventId)

itinerary = get_itinerary(eventId)
sections = get_sections(itinerary)
stages = get_stages(sections)
stages_lookup = get_stages_lookup(stages)
```

Get a sample stage ID:

```{r}
stageId = stages_lookup[['SS3']]
```

## Get Stage Results Data

Start by loading in some stage times data and previewing the columns available to us:

```{r}
stage_times = get_stage_times(eventId, stageId)

colnames(stage_times)
```

## Previewing Stage Results Data

Just using the stage results data, how might we display it?

Let's start with a view of the top 10. We can use the `knitr::kable()` function to provide a styled version of the table that slightly improves its appearance:

```{r}
library(knitr)

kable( head(stage_times, 10))
```

An alternative rich table formatter is the [`formattable`](https://github.com/renkun-ken/formattable) ([example usage](https://www.displayr.com/formattable/)) *R* package, which builds on `kable()` and provides even more comprehensive support, including cell colour highlighting, for rendering tables in a stylised way. In interactive HTML environments, the tables are rendered as an HTML widget, which allows for even more customisation, such as the inclusion of interactive HTML sparklines.

```{r}
library(formattable)

formattable( head(stage_times, 10) )
```

The data itself looks quite cryptic, so we need to convert it to something a little bit more human readable. To enrich the display, we might want to add in information relating to a stage, rather than just refer to it by stage ID, or to describe each entry in rather more detail than just by the entry ID.

Depending on how the table is presented, not all of the columns may be displayed, so reducing the number of columns should help address that, at least in part.

### Adding Entry Metadata

In the first instance, it would probably make sense to pull in some human-readable data about each entry:

```{r}
cars = get_car_data(entries)

cars %>% head(2)
```

We can then merge this data into our original table and filter out some of the less useful columns. Since the driver code may not be unique, we should retain the driver `entryId` in the table and then suppress its display when we render the dataframe. We'll also limit ourselves to just the top 10 results.

```{r}
top10_display_cols_base = c('position', 'identifier', 'code',
                            #'drivername', 'codrivername',
                            #'groupname', 'entrantname',
                            #'classname', 'eligibility',
                            #'elapsedDuration',
                            # gap is the time delta between a driver
                            # and the leader; diff (or interval)
                            # is the difference between a driver
                            # and the driver immediately ahead
                            'TimeInS', 'gap', 'diff')

top10_stage_times = stage_times %>%
                      # A minor optimisation step to 
                      # limit the amount of merging
                      arrange(position) %>%
                      head(10) %>%
                      # Merge in the entries data
                      merge(cars, by='entryId')  %>%
                      # Convert milliseconds to seconds
                      mutate(TimeInS = elapsedDurationMs/1000,
                             gap = diffFirstMs/1000,
                             diff = diffPrevMs/1000)  %>%
                      # Limit columns and set column order
                      select(all_of(top10_display_cols_base),
                             'entryId') %>%
                      # The merge may upset the row order
                      # so reset the order again
                      arrange(position) %>%
                      # Improve column names by renaming them
                      rename(Pos=position,
                             Car = identifier,
                             Code = code,
                             `Time (s)` = TimeInS,
                             Gap = gap, Diff = diff)

top10_stage_times %>% head(3) %>% formattable()
```

We can suppress the display of the *entryId* column to keep the table tidy:

```{r}
top10_stage_times %>% head(3) %>% formattable(list(entryId=FALSE))
```

## Adding Stage Metadata to Table Captions

To improve the table further, we may want to add a caption to the table describing the stage to which the results actually refer.

The caption might include the stage code and the stage name, for example, and perhaps the stage distance. It might also be handy to retrieve the stage number so that if we are displaying several tables, we can check we present the stages in the correct running order:

```{r}
stage_cols = c('stageId', 'number', 'name', 'distance', 'code')

stage_info = stages %>%
                select(all_of(stage_cols)) %>%
                # Tidy up the stage name
                mutate(name = str_replace(name, ' \\(Live TV\\)', ''))

stage_info %>% head(2)
```

We can create a caption for our selected stage using what essentially amounts to a string template:

```{r}
stage_info_ = stage_info[stage_info['stageId']==stageId,]
# paste0() ensures there are no separators between substrings
caption = paste0('Stage ', stage_info_$code,
                 ', ', stage_info_$name, ' (',
                 stage_info_$distance, 'km)')

caption
```
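The same caption can also be built with `sprintf()`, which keeps the whole template in a single string. This is a minimal sketch: `make_stage_caption()` is a hypothetical helper, and the stage values passed to it are made up for illustration rather than taken from the WRC API:

```r
# Hypothetical helper: build a stage caption from a sprintf() template
make_stage_caption = function(code, name, distance) {
  # Strip the " (Live TV)" suffix, as in the stage_info tidying step
  name = sub(" \\(Live TV\\)", "", name)
  sprintf("Stage %s, %s (%skm)", code, name, distance)
}

# Illustrative values only
make_stage_caption("SS1", "Example Stage (Live TV)", 10.5)
```

Because `sprintf()` is vectorised, passing whole columns of `stage_info` would return one caption per stage.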

We can add a caption to the table via the *caption* parameter. Using the `%>%` pipe operator to pass the dataframe as the first argument of the `formattable()` function makes it easier to see which parameter we need to set to create the caption.

The pipe operator also allows us to limit the number of rows in the dataframe passed to the `formattable()` function via the `head()` function:

```{r}
top10_stage_times %>% head(3) %>% formattable(caption = caption,
                                              list(entryId=FALSE))
```

We can also align text within the columns:

```{r}
top10_stage_times %>% head(3) %>% formattable(align='c',
                                              list(entryId=FALSE))
```

## Colour Highlighting Stage Results

The `formattable::formattable()` function is capable of highlighting cell values in a variety of customisable ways.

One straightforward way of highlighting a table is to use colour to emphasise a ranking. Trivially, and perhaps redundantly, we might highlight stage positions for example:

```{r}
top10_stage_times %>%
    head() %>%
    formattable(align='c',
                list(Pos = color_tile("#DeF7E9", "#71CA97"),
                     entryId=FALSE))
```

This may not make so much sense when the ranking we are highlighting is the sort order of the table, but it makes more sense when we want to compare two columns, such as the stage position and the start order.

So let's also pull in the start order (that is, the road order) and see how it compares to the stage position.

TO DO:
- get `itinerarySectionId` from `stages` (`stages$itinerarySectionId`)
- get `startListId` (`get_startlist_id(itinerary, itinerarySectionId)`)
- lookup startlist details (`get_startlist(eventId, startListId)[,c('entryId','order')]`)
- merge startlist data into stage result

We can also explore highlights based on conditional requirements. For example, we can emphasise differences that exceed a specific amount: 

```{r}
large_diff = 2

formattable(top10_stage_times,
            list(Diff = formatter("span",
                          style = x ~ style(font.weight =
                                              ifelse(x>=large_diff,
                                                     "bold", 'normal'))),
                 entryId=FALSE)) 
```

Alternatively, we can add a coloured bar that depicts the increasing gap time down the leaderboard. If we pass an eight digit hex colour code (`#RRGGBBAA`) rather than a six digit RGB hex colour code, the final two digits set the transparency (alpha) of the colour bar:

```{r}
#https://www.displayr.com/formattable/
unit.scale = function(x) (x - min(x)) / (max(x) - min(x))

formattable(top10_stage_times,
            list(Gap = color_bar("#FA614B66", fun = unit.scale),
                 entryId=FALSE))

```

There seems to be an edge effect for the zero gap value. Let's see if we can tidy that up a bit:

```{r}
new_color_bar <- function(color = "lightgreen", ...){
  formatter("span",
            style = function(x) style(
              display = "inline-block",
              direction = "rtl", 
              `unicode-bidi` = "plaintext",
              "border-radius" = "4px",
              "background-color" = color,
              width = percent(proportion(abs(as.numeric(x)), ...))
            ))
}

formattable(top10_stage_times,
            list(Gap =  new_color_bar("#FA614B66"),
                 entryId=FALSE))
```

The edge effect is gone, but this `color_bar()` style of bar doesn't seem to render the values very well where the bar is narrow, at least when the table is rendered to HTML using `bookdown`.

If we provide an alternative colour bar function that uses a CSS linear gradient to create the bar, rather than setting the width of the text cell and colouring its background, we can decouple the colour bar from the size of the text area:

```{r}
bg = function(start, end, color, ...) {
  paste("linear-gradient(90deg,transparent ",percent(start),",",
        color, percent(start), ",", color, percent(end),
        ", transparent", percent(end),")")
} 

color_bar2 =  function (color = "lightgray", fun = "proportion", ...) 
{
    fun <- match.fun(fun)
    formatter("span", style = function(x) style(display = "inline-block",
                `unicode-bidi` = "plaintext", 
                "background" = bg(1-fun(as.numeric(x), ...), 1, color), "width"="100%" ))
}

top10_stage_times %>% formattable(list(Gap = color_bar2("#FA614B66"),
                                       entryId=FALSE))
```
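To see the sort of CSS a `bg()`-style helper hands to the browser, here is a base *R* sketch that builds a similar gradient string. The hypothetical `gradient_css()` helper formats percentages with `sprintf()` rather than `formattable::percent()`, so its spacing and decimal places differ from `bg()`. The key idea is that repeating each stop position creates hard edges: the cell is transparent up to `start`, solid colour from `start` to `end`, and transparent after `end`:

```r
# Hypothetical base R analogue of bg(); percentages via sprintf() (assumption)
gradient_css = function(start, end, color) {
  pc = function(p) sprintf("%.1f%%", 100 * p)
  sprintf("linear-gradient(90deg, transparent %s, %s %s, %s %s, transparent %s)",
          pc(start), color, pc(start), color, pc(end), pc(end))
}

# A bar filling the right-hand 30% of the cell
gradient_css(0.7, 1, "#FA614B66")
```

In `color_bar2()`, `start` is `1 - fun(x)` and `end` is `1`, so the bar grows leftwards from the right-hand edge of the cell while the text keeps the full cell width.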

### Heatmap Style Column Cell Backgrounds

As well as in-cell bar charts, we can use a heatmap-style approach and colour the cell backgrounds of other numerical columns according to their values.

```{r}
top10_stage_times %>% 
  formattable(list(Gap = color_bar2("#FA614B66"),
                   Diff = color_tile("#DeF7E9","#71CA97"),
                   entryId=FALSE))
```

## An Aside — Calculating DIFF and GAP times

As has previously been mentioned, the *GAP* to the leader and the +/- *DIFF* time to the car placed directly ahead are typical of many forms of motorsport timing screen. In rally terms, these metrics might apply either in overall rally terms *or* in stage terms; in circuit racing, the measures might apply relative to overall race position *or* track position.

The *GAP* (time to leader) is calculated as the difference between a time associated with the current leader and a similarly measured time associated with every other driver.

The $\textrm{GAP}$ between the driver in first position, $i=1$, and the driver in the $j$'th position is given, with various abuses of notation, as:

$$\textrm{GAP}{_j}=t_{j,\textrm{GAP}}=t_{j,1,\textrm{DIFF}}=t_j-t_1$$

Alternatively, we can calculate the gap for a driver $j\neq1$ as the sum of the differences between each pair of consecutively placed drivers, from the leader down to driver $j$. The interval, or DIFF, between drivers in positions $i$ and $j$, where $i$ is ahead of $j$ (that is, $i<j$) and the driver in first position has $i=1$, is given as:

$$\textrm{DIFF}_{j,i}=t_{j,i,\textrm{DIFF}}=t_{j,i}=t_j-t_i: i<j, t_0=t_1$$

Strictly, $\textrm{GAP}_j=\textrm{DIFF}_{j,1}$.

To specify a particular stage, we might use ${_S}\textrm{GAP}{_j}$ and ${_S}\textrm{DIFF}{_j}$.

The $\textrm{GAP}$ between a driver in position $j=P$ and the leader $i=1$ is then:

$$\textrm{GAP}_j=t_{j,\textrm{GAP}}=\textrm{DIFF}_{2,1}+\textrm{DIFF}_{3,2}+\cdots+\textrm{DIFF}_{P,P-1}$$

We can write this more succinctly as:

$$\textrm{GAP}{_j}=t_{j,\textrm{GAP}}=0+\sum_{m=1}^{j}\textrm{DIFF}_{m,m-1}=\sum_{m=1}^{j}\left ( t_m-t_{m-1} \right ): j\ge1, t_0=t_1$$
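As a quick numerical check on this identity, here is a base *R* sketch using made-up stage times for four cars already sorted by position. The leader's DIFF is set to zero, matching the $t_0=t_1$ convention:

```r
# Illustrative stage times in seconds, sorted by position (hypothetical values)
example_times = c(300.0, 301.5, 303.2, 310.0)

# DIFF: interval to the car immediately ahead; 0 for the leader (t_0 = t_1)
DIFF = c(0, diff(example_times))

# GAP as the running sum of DIFFs...
GAP_cumsum = cumsum(DIFF)
# ...and directly as the difference to the leader's time
GAP_direct = example_times - example_times[1]

all.equal(GAP_cumsum, GAP_direct)
```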

We can implement these calculations directly as follows:

```{r}
top10_stage_times %>% 
  mutate(DIFF = c(0, diff(`Time (s)`)),
         GAP = cumsum(DIFF)) %>%
  select(c('Car', 'Gap', 'GAP', 'Diff', 'DIFF')) %>%
  # entryId was dropped by select(), so there is no column to hide
  formattable(caption = caption)
```

## Rebasing Stage Results

Simple as they are, the *GAP* and *DIFF* times are very powerful: for any driver, we can see how far off the stage-winning time they were (the `Gap`), and by summing the appropriate `Diff` values we can quickly determine the time difference between any two drivers.

However, if we are interested in a particular driver, we can "rebase" the table to show the time differences between that driver and the other drivers explicitly.

To rebase the times $t_i$ for a set of drivers $i$ relative to a particular driver $j$, we set:

$$
t_{i}^{j} = t_i - t_j
$$
For a stage $S$, we might extend the notation to write:

$$
{_S}t_{i}^{j} = {_S}t_{i} - {_S}t_{j}
$$

using the simpler form without the $S$ prefix where the stage is clear from context.

We might also abuse the $\textrm{GAP}$ notation to specify a rebased time, ${_S}\textrm{GAP}_{i,j}={_S}t_{i}^{j}$, noting that ${_S}\textrm{GAP}_i={_S}\textrm{GAP}_{i,1}={_S}\textrm{DIFF}_{i,1}$.

In passing, we note that we can calculate the overall rally time (without penalties) for driver $i$ up to and including stage $N$ as:

$$
{_N}T_{i}=\sum_{S=1}^{N}{_S}t_{i}
$$

The overall time at the end of the rally is then given as:

$$
T_{i}=\sum_{S=1}^{S_{\max}}{_S}t_{i} + \textrm{penalties}_i
$$
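As a minimal sketch of these two sums, assuming a hypothetical matrix of stage times with one row per driver and one column per stage, and a made-up penalty vector:

```r
# Hypothetical stage times (seconds): rows are drivers, columns are stages
toy_stage_t = matrix(c(300, 610, 450,
                       302, 605, 452),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("A", "B"), c("SS1", "SS2", "SS3")))

# Accumulated time up to and including each stage: row-wise cumulative sums
toy_overall_T = t(apply(toy_stage_t, 1, cumsum))

# Final rally time: time after the last stage plus (hypothetical) penalties
toy_penalties = c(A = 0, B = 10)
toy_final_T = toy_overall_T[, ncol(toy_overall_T)] + toy_penalties
```

Here driver B is quicker overall on stage times alone (1359s against 1360s) but drops behind once the 10s penalty is added.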

Let's see how the rebasing works.

First, look up the car identifier for a particular driver:

```{r}
ogier = get_person_id(cars, 'ogier', ret='identifier')
ogier
```

Now we can start to build up a rebase function that takes a data frame, an *entryId* and a set of columns we want to rebase.

To begin with, we note that we can rebase a single column simply by finding the value associated with a particular driver in that column and subtracting that value from each row in the column.

For example, we can get Ogier's stage time:

```{r}
ogier_time = top10_stage_times[top10_stage_times[['Car']]==ogier,
                               "Time (s)"]
ogier_time
```

And we can then subtract that time from every other car's time:

```{r}
top10_stage_times$`Time (s)` - ogier_time
```

To rebase more than one column, we can specify the columns we want to rebase, extract the specified driver's values in those columns as a named list, and then subtract each item in that list from the corresponding column in every row of the dataframe:

```{r}
#https://stackoverflow.com/a/32267785/454773
rebase_cols = c('Time (s)', 'Gap')

df = top10_stage_times

# From each row, select specific columns
# From those values subtract correspondingly named items
# representing the times in those columns for our specified driver
df[,rebase_cols] - c(df[df$Car==ogier, rebase_cols])
```
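Why convert the driver's row with `c()`? Applied to a one-row dataframe, `c()` returns a list with one element per column, and subtracting a list from a dataframe applies the $j$'th list element to the $j$'th column (matching by position, not by name). Subtracting an atomic vector instead recycles it down the columns, which misaligns the values. A small base *R* sketch with made-up data shows the difference:

```r
toy_df = data.frame(car = c("X", "Y", "Z"),
                    time = c(100, 103, 98),
                    gap = c(2, 5, 0))
toy_cols = c("time", "gap")

# Car "Y"'s values as a list: list(time = 103, gap = 5)
toy_base = c(toy_df[toy_df$car == "Y", toy_cols])

# List: element j is subtracted from column j (the intended rebasing)
toy_by_list = toy_df[, toy_cols] - toy_base

# Atomic vector: c(103, 5) is recycled down the columns (not what we want)
toy_by_vector = toy_df[, toy_cols] - unlist(toy_base)
```

In `toy_by_vector`, car "Y"'s own row does not come out as zero, which is a quick way to spot the misalignment.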

Let's put that into a function, generalised to allow us to specify which column we want to use as the identifier column for the rebasing. The function can optionally return either just the rebased columns (and identifier) or the complete dataframe, including the rebased columns, and can also "flip" the sign of the differences:

```{r}
rebase = function(df, id, rebase_cols,
                  id_col='entryId', base=FALSE,
                  base_id=FALSE, flip=FALSE) {
  
  df_ =  df
  
  rebase_cols = as.character(rebase_cols)
  
  # The rebase values are the ones
  # we want to subtract from each row
  rebase_vals = c(df[df[[id_col]]==id, rebase_cols])
  
  # Do the rebasing
  df_[,rebase_cols] =  df[,rebase_cols] - rebase_vals
  
  if (flip)
    df_[,rebase_cols] =  -df_[,rebase_cols]

  df_[[id_col]] = df[[id_col]]
  
  # Return just the rebased and identifier columns or the
  # whole dataframe
  cols = rebase_cols
  if (base_id)
    cols = c(id_col, cols)
  if (base)
    df_ %>% select(all_of(cols))
  else
    df_
}
```

We can now rebase the stage times across one or more columns relative to a specified driver:

```{r}
rebase_cols = c('Time (s)', 'Gap')

rov = get_person_id(cars, 'rov', ret='identifier')

rov_rebased_gap = rebase(top10_stage_times, rov, rebase_cols,
                           id_col='Car')
rov_rebased_gap
```

The rebased time dataframe makes it easier to see how a specified driver compares with every other driver. But can we make the differences jump out in a more striking fashion?

## Colour Highlighting Rebased Values
In the rebased tables, we are likely to be presented with a range of positive and negative values within a rebased column.

We can highlight the positive and negative values using colour. For example:

```{r}
formattable(rov_rebased_gap,
            list(Gap = formatter("span",
                          style = x ~ style(color = ifelse(x<0,
                                              "red", 
                                              ifelse(x>0, 'green', 'grey')),
                                            # Example additional style
                                            font.weight = ifelse(abs(x)>=2,
                                                            'bold',
                                                            'normal') )),
                 entryId=FALSE))
```

Although the `formattable()` function does not directly support divergent colour indicators, we can create a custom formatter that does provide such a view over the data.

For example, we can create a mapping that will display coloured backgrounds that diverge around the zero value to give distinct hues for positive and negative values.

The easiest way to render such a mapping is to map the range of values onto the unit range, with the 0 value in the original range mapping to the 0.5 value in the normalised unit range.

The following function will create a normalised range across a set of positive *and* negative values, mapping the origin (0) to the normalised 0.5 value:

```{r}
xnormalize = function(x){
  # Normalise to the full range of values about 0
  # 0 will map to 0.5 in the normalised range
  x = c(x, -max(abs(x)), max(abs(x)))
  normalize(x)[1:(length(x)-2)]
}
```

Let's see how it works:

```{r}
xnormalize(c(-1, 0, 2))
```
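For readers without `formattable` to hand, here is a base *R* sketch of the same padding trick. It assumes `formattable::normalize()` performs a plain min-max rescaling to $[0, 1]$ by default, and re-implements that as a hypothetical `normalize01()` helper. Padding with $-m$ and $+m$, where $m=\max|x|$, makes the range symmetric about zero, so zero lands exactly halfway:

```r
# Hypothetical stand-in for formattable::normalize(): min-max rescale to [0, 1]
normalize01 = function(x) (x - min(x)) / (max(x) - min(x))

xnormalize01 = function(x) {
  m = max(abs(x))
  # Pad with -m and +m so the range is symmetric about 0,
  # rescale, then drop the two padding values again
  normalize01(c(x, -m, m))[seq_along(x)]
}

xnormalize01(c(-1, 0, 2))
```

If every value is zero, `m` is zero and this sketch divides by zero, returning `NaN`.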

We can now define a custom mapping to render red and green palettised backgrounds depending on whether the value is negative or positive. To maintain contrast in the displayed text values, we can render white or black text depending on the likely intensity of the background colour:

```{r}
#https://stackoverflow.com/a/49887341/454773
color_tile2 <- function (...) {
  formatter("span", style = function(x) {
    style(display = "block",
          'text-align' = 'center',
          padding = "0 4px", 
          `border-radius` = "4px",
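          # Compare against the largest magnitude, so that tables
          # dominated by negative values are also handled sensibly
          `font.weight` = ifelse(abs(x)> 0.3*max(abs(x)), "bold", "normal"),
          color = ifelse(abs(x)> 0.3*max(abs(x)),'white',
                         ifelse(x==0,'lightgrey','black')),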
          `background-color` = csscolor(matrix(as.integer(colorRamp(...)(xnormalize(as.numeric(x)))), 
                byrow=TRUE, 
                dimnames=list(c("red","green","blue"), NULL),
                nrow=3)))
  })}

formattable(rov_rebased_gap, align='c',
            list(Gap = color_tile2(c("red",'white', "forestgreen")),
                 entryId=FALSE))
```

*For further possible discussions about divergent palette definitions, see [here](https://stackoverflow.com/questions/37482977/what-is-a-good-palette-for-divergent-colors-in-r-or-can-viridis-and-magma-b).*

```{r}
# Recall the CSS style from previously
#bg = function(start, end, color, ...) {
#  paste("linear-gradient(90deg,transparent ",percent(start),",",
#        color, percent(start), ",", color, percent(end),
#        ", transparent", percent(end),")")
#} 

pm_color_bar2 <- function(color1 = "lightgreen", color2 = "pink", ...){
  formatter("span",
            style = function(x) style(
              display = "inline-block",
              color = ifelse(x> 0,'green',ifelse(x<0,'red','lightgrey')),
              "text-align" = ifelse(x > 0, 'left', ifelse(x<0, 'right', 'center')),
              "width"='100%',
              "background" = bg(ifelse(x >= 0, 0.5,xnormalize(x)),
                                ifelse(x >= 0,xnormalize(x),0.5),
                                ifelse(x >= 0, color1, color2))
            ))
}

rov_rebased_gap %>%
  formattable(align='c',
              list(Gap = pm_color_bar2(),
                   entryId=FALSE))
```

<!--chapter:end:stage-results.Rmd-->

```{r cache = T, echo = F, message=F}
knitr::opts_chunk$set(error = TRUE)
knitr::opts_chunk$set(fig.path = "images/multi-stage-results-")
```
# Visualising Results for Multiple Stages

As well as visualising the results for a single stage, we might want to visualise the results over multiple stages. The basic overall results can be retrieved from a single call to the WRC results API, but viewing the stage times and rankings across multiple stages requires retrieving the detailed results for each stage and then combining them into a single dataframe.

## Load Base Data

To get the stage results data from a standing start, we can load in the current season list, select the rally we want, look up the itinerary for the rally, extract the sections and then the stages, and then retrieve the stage IDs for the stages we are interested in.

To begin with, load in our WRC API helper functions:

```{r message=F, warning=F}
source('code/wrc-api.R')
library(tidyr)
```

Now let's grab some data:

```{r}
s = get_active_season()
eventId = get_eventId_from_name(s, 'arctic')

entries = get_rally_entries(eventId)

itinerary = get_itinerary(eventId)
sections = get_sections(itinerary)
stages = get_stages(sections)
stages_lookup = get_stages_lookup(stages)

stage_list = get_stage_list(stages)
stage_codes = stages$code
# To generate stage codes as an ordered factor:
# factor(stages$code, levels = stages$code)
```

## Retrieving Multiple Stage Results

To begin with, let's get the overall results at the end of each stage:

```{r}
multi_overall_results = get_multi_overall(stage_list)

multi_overall_results  %>% tail(2)
```

### Reshaping Overall Position Data

We can reduce the amount of data by casting the long raw result to a wide format, widening the data on a particular field of interest. For example, we can generate a wide dataframe describing the overall positions, ${_S}o$, at the end of each stage, where a particular driver's position is given as ${_S}o_i$:

```{r}
multi_overall_wide_pos = multi_overall_results %>%
                            get_multi_stage_generic_wide(stage_list,
                                                         'position')

multi_overall_wide_pos %>% head(2)
```

### Reshaping Overall Rally Time Data

We can also create a wide format report of the overall times, where each column gives the overall, accumulated rally time up to and including each stage, ${_S}T$; each cell then represents the accumulated time for a particular driver, ${_S}T_i$.

The times themselves appear in units of milliseconds, so first create a column corresponding to time in seconds, then widen using those values:

```{r}
multi_overall_results = multi_overall_results %>%
                            mutate(totalTimeS = totalTimeMs/1000)

multi_overall_wide_time = multi_overall_results %>%
                              get_multi_stage_generic_wide(stage_list,
                                                           'totalTimeS')

multi_overall_wide_time %>% head(2)
```

We note that, with the stages presented in order, each driver's accumulated rally time is strictly increasing from left to right along their row.

We further note that we can derive the stage times from the overall rally times by calculating the columnwise differences ${_S}t={_S}T-{_{S-1}}T$ for $1<S\le N$ in an $N$ stage rally, with ${_1}t={_1}T$.
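A base *R* sketch of that columnwise differencing, using a hypothetical matrix of accumulated times (one row per driver, one column per stage):

```r
# Hypothetical accumulated rally times (seconds)
toy_overall_T = matrix(c(300, 910, 1360,
                         302, 907, 1359),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(c("A", "B"), c("SS1", "SS2", "SS3")))

# S_t = S_T - (S-1)_T for S > 1; the first stage time is the first overall time.
# The difference matrix takes its column names (SS2, SS3) from the first operand.
toy_stage_t = cbind(toy_overall_T[, 1, drop = FALSE],
                    toy_overall_T[, -1, drop = FALSE] -
                      toy_overall_T[, -ncol(toy_overall_T), drop = FALSE])
```

If a driver has no recorded time on a stage (`NA`), the stage times on either side of that stage also come out as `NA`.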

### Reshaping Time to First Data

Another useful time is the *time to first*, which is to say, the *gap*, ${_S}\textrm{GAP}_i$. Noting that the overall rally leader may change at the end of each stage, this measure is essentially a rebasing measure relative to a particular position rather than a particular driver:

```{r}
multi_overall_results = multi_overall_results %>%
                            mutate(diffFirstS = diffFirstMs/1000)

multi_overall_wide_gap = multi_overall_results %>%
                            get_multi_stage_generic_wide(stage_list,
                                                         'diffFirstS')

multi_overall_wide_gap %>% head(2)
```

However, we could also calculate the gap to leader from the overall times, first by identifying the minimum accumulated time at each stage (that is, the minimum time, excluding missing values, in each overall time column) and then by subtracting those values from each row in the overall times dataframe, which is to say:

$$
{_S}\textrm{GAP}_{i} = {_S}T_i - \min_{k}\left({_S}T_k\right)
$$

If we try to subtract a vector of values (for example, the values $\textrm{min}({_S}T)$ for $1 \le S \le N$) from an *R* dataframe, we need to tell *R* how we want that subtraction performed. The dataframe's values can be thought of as one long vector made up from the values in the first column, then the second, and so on. If we subtract a vector of *N* values from the dataframe, that vector is recycled along this long serialised version of the dataframe: its *N* values are subtracted from the first *N* items, then from the next *N* items, and so on, so the values do not line up with the stage columns.

So to subtract a "dummy" row of values from each row of the dataframe, we need another approach. The `purrr::map2_df()` function iterates over the dataframe's columns and a vector of values in parallel, applying a function (in this case, the subtraction function, `-`) to each column and its corresponding value, and returns the results as a dataframe.

So let's create a set of values representing the minimum overall time on each stage. The `matrixStats::colMins()` function finds the minimum value in each column of a matrix, so cast the stage time columns from the wide dataframe to a matrix and then find the minimum in each column, ignoring missing (`NA`) values:

```{r}
# as.matrix() takes its dimensions from the dataframe itself
overall_m = as.matrix(multi_overall_wide_time[,as.character(stage_list)])

mins_overall = matrixStats::colMins(overall_m, na.rm=TRUE)

mins_overall
```

We can now subtract this "dummy row" of values from each row in the dataframe to find the gap to leader for each row on each stage:

```{r}
purrr::map2_df(multi_overall_wide_time[,as.character(stage_list)], 
              mins_overall, `-`) %>% head(2)
```

Comparing these values with the `diffFirstS` gap times in `multi_overall_wide_gap` should show them to be the same.
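If you would rather avoid the `purrr` and `matrixStats` dependencies, base *R* offers two alternatives: `sweep()` subtracts a per-column statistic from each column of a matrix, and, as with the rebasing code earlier, subtracting a *list* (rather than an atomic vector) from a dataframe is applied column by column. A sketch with made-up times, where `NA` stands in for a missing time:

```r
# Hypothetical overall times; NA marks a car with no time on that stage
toy_overall = data.frame(SS1 = c(300, 302, 305),
                         SS2 = c(910, 907, NA))

# Column minima ignoring NAs (a base R stand-in for matrixStats::colMins())
toy_mins = apply(as.matrix(toy_overall), 2, min, na.rm = TRUE)

# sweep() subtracts toy_mins[j] from every value in column j
toy_gap_sweep = sweep(as.matrix(toy_overall), 2, toy_mins, `-`)

# Subtracting a list from a dataframe is also applied column by column
toy_gap_list = toy_overall - as.list(toy_mins)
```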

### Mapping Stage and Driver Identifiers to Meaningful Labels

To improve the look of the table, we might use stage codes and driver codes to label the columns and identify the rows.

To start with, we can rename the columns that correspond to stage IDs using a lookup list that maps stage IDs to stage codes:

```{r}
map_stage_codes = function(df, stage_list) {
  # Get stage codes lookup id->code
  stages_lookup_code = get_stages_lookup(stages, 'stageId', 'code')
  
  #https://stackoverflow.com/a/34299333/454773
  plyr::rename(df, replace = stages_lookup_code,
               warn_missing = FALSE)
}

multi_overall_wide_time = multi_overall_wide_time %>%
                            map_stage_codes(stage_list)

multi_overall_wide_time %>% head(2)
```

We can also create a function to replace the entry ID with the driver code.
Note also the select statement at the end that puts the columns into a sensible order:

```{r}
cars = get_car_data(entries)

map_driver_names = function(df, cars){
  df %>%
    merge(cars[,c('entryId','code')],
          by='entryId')  %>%
    # Drop the entryId column
    select(-'entryId') %>%
    # Move last column to first
    select('code', everything())
}

multi_overall_wide_time = multi_overall_wide_time %>%
                            map_driver_names(cars)

multi_overall_wide_time %>% head(2)
```

### Rebasing Overall Times

We can rebase the overall times with respect to a particular driver in the normal way, calculating the difference between each row and the row corresponding to a specified driver:

```{r}
example_driver = multi_overall_wide_time[2,]$code

overall_wide_time_rebased = rebase(multi_overall_wide_time,
                                   example_driver, stage_codes,
                                   id_col='code')

overall_wide_time_rebased %>% head(3)
```

### Finding Changes in Rebased Gaps Across Stages

The rebasing operation essentially allows us to select a row of times for one particular driver and then subtract that row from every other row to give us a direct comparison of the gap between a specified driver and every other driver.

But we can also perform a consecutive column-wise differencing operation on the rebased times that allows us to see how much time was gained or lost relative to a particular driver in going from one stage to the next (observant readers may note that this results in the rebased stage time for each stage...).

To subtract one column from the next, create two offset dataframes, one containing all but the first stage (first stage column) and one containing all but the last stage (final stage column). If we subtract one dataframe from the other, it gives us our column differences. Inserting the original first column back in its rightful place gives us the columnwise differences table:

```{r}
#https://stackoverflow.com/a/50411529/454773
df = overall_wide_time_rebased

# [-1] drops the first column, [-ncol()] drops the last
df_ = df[,stage_codes][-1] - df[,stage_codes][-ncol(df[,stage_codes])]

# The first stage has no preceding stage, so its difference
# is simply the rebased overall time on that stage
df_[stage_codes[1]] = df[stage_codes[1]]
# Return the dataframe in a sensible column order
df_ %>% select(all_of(stage_codes)) %>% head(3)
```

*(A similar technique could be used to recreate stage times from the overall times.)*
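The observation that column-differencing the rebased overall times yields the rebased stage times can be checked numerically. This base *R* sketch uses made-up times for three hypothetical drivers; `rebase_rows()` is a hypothetical helper built on `sweep()`, not the `rebase()` function defined earlier:

```r
# Hypothetical stage times (seconds) for three drivers over three stages
toy_stage_t = matrix(c(300, 610, 450,
                       302, 605, 452,
                       299, 612, 449),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(c("A", "B", "C"), c("SS1", "SS2", "SS3")))
toy_overall_T = t(apply(toy_stage_t, 1, cumsum))

# Rebase a matrix relative to one driver by subtracting that driver's row
rebase_rows = function(m, id) sweep(m, 2, m[id, ], `-`)
toy_overall_rebased = rebase_rows(toy_overall_T, "B")
toy_stage_rebased = rebase_rows(toy_stage_t, "B")

# Column-wise differences of the rebased overall times
toy_overall_rebased_diffs = cbind(
  toy_overall_rebased[, 1, drop = FALSE],
  toy_overall_rebased[, -1] - toy_overall_rebased[, -ncol(toy_overall_rebased)])
```

The identity holds because $({_S}T_i-{_S}T_j)-({_{S-1}}T_i-{_{S-1}}T_j)={_S}t_i-{_S}t_j$.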

## Visualising Overall Results

The overall stage results provide information about the overall times and positions at the end of each stage; the rebased overall times provide us with gap information from a specified driver to every other driver.

So how might we use exploratory data visualisation techniques to support a conversation with that data or highlight potential stories hidden within it?

### Visualising First Position

One way of enriching the wide position table might be to highlight the driver in first position at the end of each stage. We can do this using the `formattable::formattable()` function.

First, let's tidy up the overall position table:

```{r}
multi_overall_wide_pos = multi_overall_wide_pos %>%
                            map_stage_codes(stage_codes) %>%
                            map_driver_names(cars)

multi_overall_wide_pos %>% head(2)
```

We can also reorder the table by the final stage position, as described by the last stage code in the `stage_codes` list. However, because we want to sort on a column name as provided by a variable, we need to use the `!!` operator to force the evaluation of the single variable value to the column name symbol (`as.symbol()`):

```{r}
multi_overall_wide_pos = multi_overall_wide_pos %>%
              dplyr::arrange(!!as.symbol(stage_codes[length(stage_codes)]))

multi_overall_wide_pos %>% head(2)
```
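For comparison, the same "sort on a column named in a variable" step can be done in base *R* with `order()` and `[[`, which looks a column up by its name as a string; recent versions of `dplyr` also accept `arrange(.data[[col]])` as an alternative to `!!as.symbol(col)`. A sketch on a made-up wide position table:

```r
# Hypothetical wide position table
toy_pos = data.frame(code = c("AAA", "BBB", "CCC"),
                     SS1 = c(2, 1, 3),
                     SS2 = c(1, 3, 2))
toy_stage_codes = c("SS1", "SS2")

# The column to sort on is held in a variable
toy_last_stage = toy_stage_codes[length(toy_stage_codes)]

# [[ ]] accepts the column name as a string; order() returns the row order
toy_sorted = toy_pos[order(toy_pos[[toy_last_stage]]), ]
```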

Create a function to highlight the first position car:

```{r}
library(formattable)

highlight_first =  function (...) 
{
  formatter("span",
            style = function(x) ifelse(x==1,
                                       style(display = "block", 
                                             padding = "0 4px",
                                             `color` = "black",
                                             `column-width`='4em',
                                             `border-radius` = "4px",
                                             `background-color` = 'lightgrey'),
                                      style()))
}
```

And then use that function to help style the table:

```{r}
multi_overall_wide_pos %>%
      head(3) %>%
      formattable(# Align values in the center of each column
                  align='c',
                  list(area(col = stage_codes) ~ highlight_first()))
```

### Visualising Position Changes Over a Rally

Another useful way of summarising the positions is a chart showing how they evolve over the course of the rally.

The chart is constructed most straightforwardly from tidy (long format) data:

```{r}
overall_pos_long_top10 <- multi_overall_wide_pos %>%
                              head(10) %>%
                              #gather(key ="Stage",
                              #       value ="Pos",
                              #       stage_codes)
                              # pivot longer replaces gather
                              pivot_longer(all_of(stage_codes),
                                           names_to ="Stage",
                                           values_to ="Pos")

overall_pos_long_top10 %>% head(3)
```
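If `tidyr` is not available, the wide-to-long reshape can be sketched in base *R*: repeat the driver codes once per stage column, repeat each stage code once per driver, and stack the stage columns with `unlist()`, which walks the columns in order. The toy table and its names are hypothetical:

```r
# Hypothetical wide position table
toy_wide = data.frame(code = c("AAA", "BBB"),
                      SS1 = c(1, 2),
                      SS2 = c(2, 1))
toy_stages = c("SS1", "SS2")

# One row per (driver, stage) pair; Stage is an ordered-by-levels factor
toy_long = data.frame(code = rep(toy_wide$code, times = length(toy_stages)),
                      Stage = factor(rep(toy_stages, each = nrow(toy_wide)),
                                     levels = toy_stages),
                      Pos = unlist(toy_wide[toy_stages], use.names = FALSE))
```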

It will be convenient for the stage codes to be represented as ordered factors:

```{r}
overall_pos_long_top10 = overall_pos_long_top10 %>%
                             mutate(Stage = factor(Stage,
                                                   levels = stage_codes))

overall_pos_long_top10$Stage[1]
```

We can then create a bump chart style plot showing how each driver's position changed across stages:

```{r simple_pos_chart, message=FALSE, warning=FALSE}
library(ggplot2)

pos_range = 1:max(overall_pos_long_top10$Pos)
g_pos = ggplot(overall_pos_long_top10, aes(x=Stage, y=Pos)) +
                  geom_line(aes(group = code)) +
                  # Invert scale and relabel y-axis
                  # https://stackoverflow.com/a/28392170/454773
                  scale_y_continuous(trans = "reverse",
                                     breaks = pos_range) +
      theme_classic()

g_pos
```

We can produce a cleaner chart by adding driver labels to the start and end of each line using the `directlabels::geom_dl()` function, as well as dropping the axes:

```{r cleaner_pos_chart, message=FALSE, warning=FALSE}
library(directlabels)

g_pos +
    geom_dl(aes(label = paste0(' ',code)), # Add space before label
            # Add label at end of line
            method = list('last.bumpup',
                          # cex is text label size
                          cex = 0.5)) +
    geom_dl(aes(label = paste0(code, ' ')), # Add space after label
            # Add label at start of line
            method = list('first.points', cex = 0.5)) +
    theme_void()
```

In the above chart, you may notice a gap at first stage position 6 where Ogier was originally placed. A more robust way to prepare the data for this sort of chart is to filter the data by class, for example limiting it to cars in the *WRC* group/class, and then reranking the positions within that group/class. A finished position chart can then position the drivers by class ranking and use labels to overplot the actual overall rally positions on stages where the overall position differs from the class rank.
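A minimal sketch of that filter-and-rerank step in base *R*, assuming a hypothetical long-format table with a `class` column (the entries data in this chapter has fields such as `classname`, which may need to be used instead). `ave()` applies `rank()` within each stage, and a label column keeps the overall position only where it differs from the class rank:

```r
# Hypothetical long-format overall positions with a class column
toy_pos = data.frame(code  = c("AAA", "BBB", "CCC", "AAA", "BBB", "CCC"),
                     Stage = c("SS1", "SS1", "SS1", "SS2", "SS2", "SS2"),
                     Pos   = c(1, 3, 2, 2, 1, 3),
                     class = c("WRC", "WRC", "WRC2", "WRC", "WRC", "WRC2"))

# Keep only the WRC class, then rank the remaining cars within each stage
toy_wrc = toy_pos[toy_pos$class == "WRC", ]
toy_wrc$classPos = ave(toy_wrc$Pos, toy_wrc$Stage, FUN = rank)

# Overall position label, only where it differs from the class rank
toy_wrc$label = ifelse(toy_wrc$Pos != toy_wrc$classPos, toy_wrc$Pos, NA)
```

Here BBB is 3rd overall on SS1 but 2nd in class, so that is the only row that gets an overplotted label.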

### Visualising Position Gains/Losses

To visualise position changes for a driver from one stage to the next, we can create a table of position differences. Let's abstract the code we used to find differences between columns into a function:

```{r}
coldiffs = function(df, cols, dropfirst=FALSE, firstcol=NULL){
  cols = as.character(cols)
  # [-1] drops the first column, [-ncol()] drops the last
  df_ = df[,cols][-1] - df[,cols][-ncol(df[,cols])]
  
  # The first column has no preceding column, so keep its original values
  df_[cols[1]] = df[cols[1]]
  # Return the dataframe in a sensible column order
  df_ = df_ %>% select(all_of(cols))
  
  if (!is.null(firstcol))
    df_[, cols[1]] = firstcol
  
  if (dropfirst)
    df_[,cols][-1]
  else
    df_
}
```

Let's put that function through its paces. First, we can drop the first column:

```{r}
coldiffs(multi_overall_wide_pos, stage_codes, dropfirst=TRUE) %>% head(2)
```

Then we can retain the first column and replace it with a specified value:

```{r}
coldiffs(multi_overall_wide_pos, stage_codes, firstcol=999) %>% head(2)
```

We can now find the position changes from stage to stage as well as tidying up the identifiers:

```{r}
pos_diffs = multi_overall_wide_pos %>% coldiffs(stage_codes,
                                                firstcol=0)
pos_diffs$code = multi_overall_wide_pos$code

# Reorder the columns by moving last column to first
pos_diffs = pos_diffs %>% select('code', everything())

pos_diffs %>% head(3)
```

Note that we need to be careful about how we read this table: a *negative* position change means the driver has *improved* their position. It may be more meaningful to use position gain/loss columns rather than strict position difference columns, so that a positive value denotes an improved position:

```{r}
pos_gains = pos_diffs
pos_gains[,stage_codes] = -pos_gains[,stage_codes]

pos_gains %>% head(3)
```

One way of highlighting position changes is to use coloured up/down arrows:

```{r}
updown = function(...){
  formatter("span", 
            style = function(x) style(color = ifelse(x>0,
                                                     "green",
                                                     ifelse(x<0,
                                                            "red",
                                                            "lightgrey"))),          
            function(x) icontext(ifelse (x >0,
                                         # i.e. gained position
                                         "arrow-up",
                                         ifelse (x < 0,
                                                 # Lost position
                                                 "arrow-down" ,
                                                 # No position change
                                                 "resize-horizontal"))))
}
```

Let's see how that works (note that we need to cast the table `as.htmlwidget()` in order to render the arrows appropriately):

```{r}
pos_gains %>%
  head(3) %>% 
  formattable(list( 
    area(col = stage_codes) ~ updown())) %>%
  as.htmlwidget()
```

We can extend the formatter to also display the number of positions gained or lost:

```{r}
updown2 = function(...){
  formatter("span", 
            style = function(x) style(color = ifelse(x>0,
                                                     "green",
                                                     ifelse(x<0,
                                                            "red",
                                                            "lightgrey"))),          
            function(x) icontext(ifelse (x >0,
                                         # i.e. gained position
                                         "arrow-up",
                                         ifelse (x < 0,
                                                 "arrow-down" ,
                                                 "resize-horizontal")),
                                 # Add in the pos change value
                                 ifelse (x!=0, paste0('(',abs(x),')'),'')))
}
```

Let's see how it looks:

```{r}
pos_gains %>%
  head(3) %>% 
  formattable(list( 
    area(col = stage_codes) ~ updown2())) %>%
  as.htmlwidget()
```

Another way of visualising position changes is to create a simple summarising sparkline using the `sparkline::spk_chr()` function.

This requires us first to cast the data into a long format:

```{r}
library(sparkline)

pos_gain_long_top10 <- pos_gains %>%
                              head(10) %>%
                              gather(key ="Stage",
                                     value ="PosChange",
                                      stage_codes)

pos_gain_long_top10 %>% head(3)
```

We can then generate sparklines showing position changes:

```{r}
pos_gain_sparkline_top10 <- pos_gain_long_top10 %>%
                                group_by(code) %>%
                                summarize(spk_ = spk_chr(PosChange,
                                                         type ="bar"))

# We need to create an htmlwidget form of the table
out = as.htmlwidget(formattable(pos_gain_sparkline_top10))

# The table also has a requirement on the sparkline package
out$dependencies = c(out$dependencies,
                     htmlwidgets:::widget_dependencies("sparkline",
                                                       "sparkline"))
out
```

*Note that to render the sparkline, we need to cast the formatted table to an `htmlwidget` and also ensure that the required `sparkline` JavaScript package is loaded into the widget.*

One issue with the sparkline bars is that each sparkline is scaled independently. So across different drivers, a position change of +1 for one driver may be drawn with the same bar height as a position change of +2 for another driver.
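
One way to put the sparklines on a common footing is to compute a shared, symmetric value range across all drivers and apply it to every sparkline. The sketch below assumes that `spk_chr()` forwards extra arguments to the underlying jQuery Sparklines widget as options, and uses the `chartRangeMin` and `chartRangeMax` options documented for that library; the toy gains are made up:

```{r}
# Toy position gains (hypothetical) for two drivers across four stages
toy_gains = list(AAA = c(0, 1, 0, -1),
                 BBB = c(0, 2, -2, 1))

# A common, symmetric range across all drivers, so that equal bar
# heights correspond to equal position changes
toy_lim = max(abs(unlist(toy_gains)))

# Assumption: spk_chr() passes chartRangeMin/chartRangeMax through
# as jQuery Sparklines options
if (requireNamespace('sparkline', quietly = TRUE)) {
  toy_sparks = vapply(toy_gains, function(x)
                        as.character(sparkline::spk_chr(x, type = 'bar',
                                                        chartRangeMin = -toy_lim,
                                                        chartRangeMax = toy_lim)),
                      character(1))
}

toy_lim
```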

### Per Driver Position Charts

As well as generating summary charts over a set of drivers, we can also generate charts on a per driver basis (cf. the sparkline charts).

For example, we can create a simple chart that captures a single driver's position over several stages, optionally using the `gt::ggplot_image()` function to create an HTML embeddable image tag with the chart encoded as a data URI:

```{r single-driver-pos, warning=FALSE}
rovCode = get_person_id(cars, 'rov', ret='code')

get_pos_chart = function(df_long, code, embed=FALSE,
                         height=30, aspect_ratio=1, size=5) {
  # Get the data for the specified driver
  subdf = df_long[df_long['code']==code,]
  
  ymax = max(10.6, max(subdf$Pos)+0.1)
  
  g = ggplot(subdf,
             aes(x=as.integer(Stage), y=Pos, group=code)) +
      geom_step(direction='mid', color='blue', size=size) +
      geom_hline(yintercept=0.8, linetype='dotted',
                 size=size, color='black') +
      geom_hline(yintercept=3.35, linetype='dotted', 
                 size=size, color='black') +
      geom_hline(yintercept=10.5, color='darkgrey',
                 size=size) +
      #scale_y_continuous(trans = "reverse") +
      scale_y_reverse(limits=c(ymax, 0.8)) +
      theme_void() + scale_x_continuous(expand=c(0,0)) #+
      #theme(aspect.ratio=0.1)
  
  if (embed)
    gt::ggplot_image(g, height = height, aspect_ratio=aspect_ratio)
  else
    g
}

get_pos_chart(overall_pos_long_top10, rovCode)
```

If we generate an image for each driver, we can then create a column of images to show the change in position over the rally for each one on a row by row basis:

```{r}
overall_wide_pos_top5 = multi_overall_wide_pos %>% head(5)

top5codes = overall_wide_pos_top5$code

gt_pos_plots = list()
# Iterate through each driver in the top 5,
# adding an embeddable position chart for each one to the plot list
for (i in seq_along(top5codes)){
  gt_pos_plots[[i]] <-
      get_pos_chart(overall_pos_long_top10, top5codes[i],
                    embed=T, aspect_ratio=3, size=5)
}
```

We can then add the charts to the wide timing results dataframe as an extra column:

```{r warning=FALSE}
overall_wide_pos_top5$poschart = gt_pos_plots

formattable(overall_wide_pos_top5)
# How do we suppress stripes in formattable tables?
```

### Visualising Time to First

A convenient way of visualising the gap to leader across stages is to create a sparkline using the `sparkline::spk_chr()` function. This function requires data in a tidy (long) format which is then grouped for each driver. 
Let's remind ourselves of what the data looks like:

```{r}
multi_overall_wide_gap_top10 = multi_overall_wide_gap %>%
                                  map_stage_codes(stage_list) %>%
                                  map_driver_names(cars) %>%
                                  # Order by the gap on the final stage
                                  dplyr::arrange(!!as.symbol(stage_codes[length(stage_codes)])) %>%
                                  head(10)

multi_overall_wide_gap_top10
```

We can create the required long form data from the wide gap table as follows. For convenience, that table has already been limited to the top 10 drivers, ordered by their gap on the final stage:

```{r}
overall_long_gap_top10 <- multi_overall_wide_gap_top10 %>%
                            gather(key ="Stage",
                            value ="Gap", stage_codes)

overall_long_gap_top10 %>% head(3)
```

With the data in the appropriate form, we can create the sparkline using a bar chart format. Note that the gap is negated, so that larger gaps are drawn as longer bars below the axis:

```{r}
overall_long_gap_top10 <- overall_long_gap_top10 %>%
                                group_by(code) %>%
                                summarize(spk_ = spk_chr(-Gap,
                                                         type ="bar"))

spark_df = function(df){
  # We need to create an htmlwidget form of the table
  out = as.htmlwidget(formattable(df))

  # The table also has a requirement on the sparkline package
  out$dependencies = c(out$dependencies,
                     htmlwidgets:::widget_dependencies("sparkline",
                                                       "sparkline"))
  out
}

spark_df(overall_long_gap_top10)
```

### Visualising Rebased Gaps

As well as using the sparkline bar chart to visualise the gap to leader, we can use a similar approach to visualise the gap to other drivers following a rebasing step.
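
The `rebase()` helper used below is defined elsewhere in the book and is not shown in this section. As a minimal sketch of the idea, assuming a wide table with one row per driver and one numeric column per stage, rebasing subtracts the reference driver's row from every row. The `rebase_sketch()` name, its sign convention and the toy gap values are all hypothetical:

```{r}
# Hypothetical sketch of rebasing: subtract a reference driver's values
# from every driver's values, column by column
rebase_sketch = function(df, ref_id, cols, id_col = 'code') {
  ref_row = unlist(df[df[[id_col]] == ref_id, cols])
  df[cols] = sweep(as.matrix(df[cols]), 2, ref_row)
  df
}

# Toy gap-to-leader values (hypothetical), in seconds
toy_gap = data.frame(code = c('AAA', 'BBB', 'CCC'),
                     SS1  = c(0.0, 2.5, 4.0),
                     SS2  = c(1.2, 0.0, 6.1))

# Rebase relative to BBB: in this sketch, negative values mean ahead of BBB
rebase_sketch(toy_gap, 'BBB', c('SS1', 'SS2'))
```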


```{r}
overall_wide_gap_rebased = rebase(multi_overall_wide_gap_top10,
                                   example_driver, stage_codes,
                                   id_col='code')
overall_wide_gap_rebased
```

We can now generate the sparklines:

```{r}
overall_spark_gap_rebased <- overall_wide_gap_rebased %>%
                            gather(key ="Stage",
                            value ="Gap", stage_codes) %>%
                            group_by(code) %>%
                            summarize(spk_ = spk_chr(Gap, type ="bar"))

spark_df(overall_spark_gap_rebased)
```

## Retrieving Stage Times and Results

So far, we have focused on exploring the overall results data. If required, we can also retrieve detailed results for multiple stages by requesting stage results for a specified list of stages:

```{r}
multi_stage_times = get_multi_stage_times(stage_list)
  
multi_stage_times %>% tail(2)
```

We can then cast the data to a wide format and relabel the resulting dataframe.

The relabelling recipe is pretty well proven by now, so let's wrap it in a function for convenience:

```{r}
relabel_times_df = function(df, stage_list, cars) {
  df %>%  
      map_stage_codes(stage_list) %>%
      map_driver_names(cars)
}
```

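And let's create another function to relabel a dataframe, order it by the final stage column and grab the top 10:

```{r}
clean_top10 = function(df) {
  # Note: relies on the stage_list, cars and stage_codes
  # objects already defined in the global environment
  df %>% relabel_times_df(stage_list, cars) %>%
          # Order by the value in the final stage column
          dplyr::arrange(!!as.symbol(stage_codes[length(stage_codes)])) %>%
          head(10)
}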
```

We can now get a wide format dataframe containing individual stage times cleaned and reduced to the top 10:

```{r}
multi_stage_times_wide = multi_stage_times %>%
                            get_multi_stage_times_wide(stage_list) %>%
                            clean_top10()

multi_stage_times_wide
```

Or we can get the stage rankings for each stage:

```{r}
multi_stage_wide_pos = multi_stage_times %>%
                              get_multi_stage_generic_wide(stage_list,
                                                         'position')  %>%
                            clean_top10()

multi_stage_wide_pos %>% head(2)
```

Or the gap to stage winner:

```{r}
multi_stage_wide_gap = multi_stage_times %>%
                            mutate(diffFirstS = diffFirstMs/1000) %>%
                            get_multi_stage_generic_wide(stage_list,
                                                         'diffFirstS')  %>%
                            clean_top10()

multi_stage_wide_gap %>% head(2)
```

## Visualising Stage Times

Two of the most useful chart types we can generate from the detailed stage times, at least in terms of glanceable displays, work at the individual driver level: sparklines showing each driver's stage positions, and sparklines showing their gap to the stage winner.

So let's start off with those before looking at grouped and individual stage position charts.

### Stage Position Sparklines

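Our pattern is now pretty well proven, so we can routinise our sparkline production:

```{r}
generate_spark_bar = function(df, col, typ='Gap'){
  # Note: col is not currently used; the stage columns come from stage_codes
  # Cast the wide stage columns to long form, with values in a column named by typ
  df %>% gather(key ="Stage",
                value =!!typ, stage_codes) %>%
        group_by(code) %>%
        # Negate the values so that larger gaps (or worse positions)
        # are drawn as longer bars below the axis
        summarize(spk_ = spk_chr(-!!as.symbol(typ), type ="bar"))
}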
```

Let's see how it works:

```{r}
multi_stage_long_pos = generate_spark_bar(multi_stage_wide_pos)
spark_df(multi_stage_long_pos)
```

### Gap to Stage Winner Sparklines

For the gap to stage winner, we can pass in the wide stage gap dataframe:

```{r}
multi_stage_long_gap = generate_spark_bar(multi_stage_wide_gap)

spark_df(multi_stage_long_gap)
```

### Stage Position Charts

Let's reuse the approach we used before for generating the position chart:

```{r stage_position_chart, warning=FALSE}
multi_stage_long_pos = multi_stage_wide_pos %>%
                              pivot_longer(c(stage_codes),
                                           names_to ="Stage",
                                           values_to ="Pos") %>%
                          mutate(Stage = factor(Stage,
                                                levels = stage_codes))

pos_range = 1:max(multi_stage_long_pos$Pos)

ggplot(multi_stage_long_pos, aes(x=Stage, y=Pos)) +
                  geom_line(aes(group = code)) +
                  # Invert scale and relabel y-axis
                  # https://stackoverflow.com/a/28392170/454773
                  scale_y_continuous(trans = "reverse",
                                     breaks = pos_range) +
    geom_dl(aes(label = paste0(' ',code)), # Add space before label
            # Add label at end of line
            method = list('last.bumpup',
                          # cex is text label size
                          cex = 0.5)) +
    geom_dl(aes(label = paste0(code, ' ')), # Add space after label
            # Add label at start of line
            method = list('first.points', cex = 0.5)) +
    theme_void()

```

In this case we see a very obvious problem caused by drivers who fall outside the typical position range on some stages: their poor positions stretch the y-axis and squash everyone else's lines together. We really do need to organise this chart by group rank!

### Individual Stage Position Charts

We can also reuse the approach we developed for charting overall positions for individual drivers to chart their stage positions:

```{r stage_pos_chart_excample, warning=FALSE}
get_pos_chart(multi_stage_long_pos, rovCode)
```

```{r}
top10_codes = multi_stage_wide_pos$code

gt_stage_pos_plots = list()

# Iterate through each driver in the top 10
for (i in seq_along(top10_codes)){
    # Add each plot to the plot list
    gt_stage_pos_plots[[i]] <-
        get_pos_chart(multi_stage_long_pos, top10_codes[i],
                      embed=T, aspect_ratio=3, size=5)
}

multi_stage_wide_pos$poschart = gt_stage_pos_plots

formattable(multi_stage_wide_pos)
```

<!--chapter:end:multi-stage-results.Rmd-->

```{r cache = T, echo = F, message=F}
knitr::opts_chunk$set(error = TRUE)
knitr::opts_chunk$set(fig.path = "images/finding-stage-pace-")
```
# Finding Pace Across Stages

Average speed on a rally is all very well, but it's not the most useful of metrics for making sense of what's actually going on in a rally. Far more useful is the notion of *pace*: the reciprocal of a speed-like measure, which tells you how many seconds it takes each driver to cover one kilometer.

Knowing the pace allows you to make more direct comparisons between drivers, as well as simplifying rule of thumb calculations, such as what sort of pace advantage a driver needs to make up a 2s gap to the leader over the 100 kilometers remaining in the final four stages.

In this chapter, we look at some simple pace calculations, rebase pace values relative to a specified driver, and explore a couple of ways of visualising differential pace over the course of a rally in the form of *pace maps* and *off-the-pace* charts.

## Load Base Data

To get the stage data from a standing start, we can load in the current season list, select the rally we want, look up the itinerary from the rally, extract the sections and then the stages, and from that access the stage ID for the stage or stages we are interested in.

Load in the helper functions:

```{r message=F, warning=F}
source('code/wrc-api.R')
```

And get the base data:

```{r}
s = get_active_season()
eventId = get_eventId_from_name(s, 'arctic')

itinerary = get_itinerary(eventId)
sections = get_sections(itinerary)
stages = get_stages(sections)
stages_lookup = get_stages_lookup(stages)

# Driver details
entries = get_rally_entries(eventId)
cars = get_car_data(entries)
```

Get a sample stage ID:

```{r}
stageId = stages_lookup[['SS3']]
```

## Defining Pace

With variable stage distances on a stage rally, metrics such as average speed provide one way of comparing performances across stages. Average speed is calculated as $\textrm{stage distance}/\textrm{stage time}$, with units of *kilometers* or *miles per hour*.

A more useful measure, particularly in rally terms, is the notion of *pace*, typically given with units of *seconds per kilometer*. *Speed* tells us how much distance a car covers in unit time; *pace* tells us how much time is required to travel a unit distance.

When used as a rebased difference measure between drivers, pace difference allows us to rapidly calculate how much time a driver is likely to gain or lose over a particular stage distance, as per the word equation $\textrm{time gain}=\textrm{stage distance}\cdot\textrm{pace difference}$.
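
As a quick worked example of the rule of thumb mentioned at the start of this chapter: making up a 2s gap over 100 km of remaining stages requires a pace advantage of 2/100 = 0.02 s/km; conversely, a 0.1 s/km pace advantage over a 20 km stage would be expected to yield about 2s. The two helper functions below are hypothetical, and the distances and pace values are illustrative only:

```{r}
# Expected time gain (s) from a pace difference (s/km) over a distance (km)
time_gain = function(distance_km, pace_diff_s_per_km) {
  distance_km * pace_diff_s_per_km
}

# Pace advantage (s/km) needed to make up a gap (s) over a distance (km)
required_pace_diff = function(gap_s, distance_km) {
  gap_s / distance_km
}

# Illustrative values
required_pace_diff(2, 100)  # 0.02 s/km
time_gain(20, 0.1)          # 2 s
```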

Basic pace itself is given as $\textrm{pace}=\textrm{time}/\textrm{distance}$.

Developing our rally algebra, we might identify the stage distance for stage $S$ as $d_S$. For a stage time ${_S}t_i$ by driver $i$, the stage pace ${_S}p_i$ for driver $i$ on stage $S$ is then given as:

$$
{_S}p_i = \frac{{_S}t_i}{d_S}
$$
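
The same definition applies across several stages at once. As a minimal base R sketch, assuming stage times held in a matrix (drivers in rows, stages in columns) and a vector of stage distances in the same column order, dividing each column by its stage distance gives the pace matrix. The times and distances here are made up:

```{r}
# Hypothetical stage times in seconds: rows are drivers, columns are stages
toy_times = matrix(c(600, 1210,
                     606, 1200),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c('AAA', 'BBB'), c('SS1', 'SS2')))

# Hypothetical stage distances in km, in the same order as the columns
toy_dist = c(SS1 = 10, SS2 = 20)

# Divide each column by its stage distance to get pace in s/km
toy_pace = sweep(toy_times, 2, toy_dist, '/')

toy_pace
```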

## Calculating Stage Pace

We can calculate stage pace from stage times and stage distances.

We can find stage distances directly from the `stages` dataframe:

```{r}
stages %>% select(c('code','distance')) %>% head(3)
```

### Calculating Pace for a Single Stage

Let's start by looking at a single stage using a recipe we have used before:

```{r}
# Example stage code
stage_code = 'SS3'

stageId = stages_lookup[[stage_code]]

# Get the stage distance
stage_distance = stages[stages['code']==stage_code, 'distance']

# Get driver metadata
cars = get_car_data(entries)

# Create stage times with merged in driver metadata
stage_times = get_stage_times(eventId, stageId) %>%
                      arrange(position) %>%
                      head(10) %>%
                      # Merge in the entries data
                      merge(cars, by='entryId')  %>%
                      # Convert milliseconds to seconds
                      mutate(TimeInS = elapsedDurationMs/1000)  %>%
                      # Limit columns and set column order
                      select(c('position', 'identifier',
                               'code', 'TimeInS')) %>%
                      # The merge may upset the row order
                      # so reset the order again
                      arrange(position) %>%
                      # Improve column names by renaming them
                      rename(Pos=position,
                             Car = identifier,
                             Code = code,
                             `Time (s)` = TimeInS)

formattable(stage_times )
```

We can now calculate pace as the stage time divided by the stage distance:

```{r}
stage_times$pace = stage_times$`Time (s)` / stage_distance

stage_times
```

### Calculating Pace for Multiple Stages

First, let's get the data for all the stages:

```{r}
stage_list = get_stage_list(stages)

multi_stage_times = get_multi_stage_times(stage_list)
  
multi_stage_times %>% tail(2)
```

We can generate the pace by adding the stage distance as an extra column and performing the pace calculation.

We'll also take the opportunity to merge in driver metadata and limit cars to WRC group entries:

```{r}
get_multi_stage_pace = function(multi_stage_times, cars) {
  multi_stage_times %>%
                    merge(stages[,c('stageId' ,'distance',
                                    'number', 'code')],
                          by='stageId') %>%
                    mutate(elapsedDurationS = elapsedDurationMs / 1000,
                            pace = elapsedDurationS / distance) %>%
                    merge(cars[,c('entryId','drivername',
                                  'code', 'groupname')],
                                    by='entryId',
                          suffixes=c('','_driver')) %>%
                    filter(groupname=='WRC') %>%
                    select(c('stageId', 'number', 'code_driver',
                             'elapsedDurationS', 'pace', 'code'))  %>%
                    arrange(number, elapsedDurationS)
  
}

multi_stage_pace = get_multi_stage_pace(multi_stage_times, cars)

multi_stage_pace %>% head(3)
```

Create a mapping from stage ID to stage code, and use it to cast the ordered list of stage IDs to an ordered list of stage codes:

```{r}
get_stage_codes = function(stages, stage_list=get_stage_list(stages)){
    # Create a stage code mapping function
    stages_lookup_code = get_stages_lookup(stages, 'stageId', 'code')
    stage_code_map = function(stageId)
      stages_lookup_code[[as.character(stageId)]]
    
    # Map the ordered stage IDs to stage codes
    stage_codes = unlist(purrr::map(stage_list, stage_code_map))
    stage_codes 
}

stage_codes = get_stage_codes(stages)
```

Use the generic widener function to widen the pace dataframe to give the pace for each driver on each stage:

```{r}
pace_wide = get_multi_stage_generic_wide(multi_stage_pace,
                                         stage_codes, 'pace',
                                         # Unique group keys required
                                         # Driver code not guaranteed unique
                                         group_key=c('code_driver'),
                                         spread_key='code')

pace_wide %>% head(3)
```

## Rebasing Stage Pace

We can rebase the stage pace according to a specific driver:

```{r}
example_driver = pace_wide[2,]$code_driver

pace_wide_rebased = rebase(pace_wide, example_driver, stage_codes,
                           id_col='code_driver')

pace_wide_rebased %>% head(3)
```

More abstractly, the rebased pace, ${_S}p_i^j$, for driver $i$ relative to driver $j$ on stage $S$ is given as:

$$
{_S}p_i^j = {_S}p_i - {_S}p_j = \frac{{_S}t_i - {_S}t_j}{d_S} = \frac{{_S}t_i^j}{d_S}
$$
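
A quick numerical check of this identity, using made-up times for two drivers on a single stage: rebasing the paces gives the same result as dividing the rebased stage time by the stage distance.

```{r}
# Hypothetical stage distance (km) and stage times (s) for drivers i and j
d_S = 15
t_i = 912.4
t_j = 905.2

# Rebase the paces directly...
p_rebased = t_i / d_S - t_j / d_S
# ...or rebase the times first, then divide by the stage distance
p_from_times = (t_i - t_j) / d_S

# Both give a pace difference of 0.48 s/km (within floating point tolerance)
all.equal(p_rebased, p_from_times)
```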

## The Ultimate Rally

Finally, in passing, it is worth noting that we can calculate an "ultimate rally" time from the sum of the fastest stage times completed on the rally, by any driver. This gives us the fastest possible rally time from recorded stage times, against which we can compare the performance of the rally winner. (Of course, it might be that a particularly fast time on one stage by a particular driver ruined the rest of their loop!)

Furthermore, when split times are available, we can go even further and construct an ultimate ultimate (*sic*) rally time from ultimate stage times that have themselves been constructed from ultimate split times on the stage.
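
A minimal sketch of both calculations in base R, assuming wide tables with one row per driver: stage times (s) with one column per stage, and, for a single stage, split section times (s) with one column per section between consecutive split points (rather than cumulative split times). All names and values are hypothetical:

```{r}
# Hypothetical stage times (s): rows are drivers, columns are stages
toy_stage_times = matrix(c(600, 1210, 455,
                           606, 1200, 452,
                           603, 1205, 460),
                         nrow = 3, byrow = TRUE,
                         dimnames = list(c('AAA', 'BBB', 'CCC'),
                                         c('SS1', 'SS2', 'SS3')))

# Ultimate stage times: the fastest time recorded on each stage, by anyone
ultimate_stage = apply(toy_stage_times, 2, min)

# Ultimate rally time: the sum of the ultimate stage times
ultimate_rally = sum(ultimate_stage)

# The best actual rally time, for comparison
best_actual = min(rowSums(toy_stage_times))

# Hypothetical split section times (s) on SS1 for the same three drivers
# (each row sums to that driver's SS1 time)
toy_ss1_sections = matrix(c(200, 205, 195,
                            198, 210, 198,
                            201, 200, 202),
                          nrow = 3, byrow = TRUE)

# Ultimate ultimate SS1 time: the sum of the fastest time on each section
ultimate_ss1 = sum(apply(toy_ss1_sections, 2, min))

c(ultimate_rally = ultimate_rally, best_actual = best_actual,
  ultimate_ss1 = ultimate_ss1)
```

Replacing each stage's ultimate time with its split-based ultimate time in the same sum gives the ultimate ultimate rally time.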


<!--chapter:end:finding-stage-pace.Rmd-->

```{r cache = T, echo = F, message=F}
knitr::opts_chunk$set(error = TRUE)
knitr::opts_chunk$set(fig.path = "images/visualising-stage-pace-")
```
# Visualising Stage Pace

In this chapter, we'll start to explore various ways in which we might visualise pace data.

## Load Base Data

To get the stage data from a standing start, we can load in the current season list, select the rally we want, look up the itinerary from the rally, extract the sections and then the stages, and from that access the stage ID for the stage or stages we are interested in.

Load in the helper functions:

```{r message=F, warning=F}
source('code/wrc-api.R')
source('code/wrc-wrangling.R')
source('code/wrc-charts.R')
```

And get the base data:

```{r}
s = get_active_season()
eventId = get_eventId_from_name(s, 'arctic')

itinerary = get_itinerary(eventId)
sections = get_sections(itinerary)
stages = get_stages(sections)
stages_lookup = get_stages_lookup(stages)
stage_list = get_stage_list(stages)
stage_codes = get_stage_codes(stages)


# Driver details
entries = get_rally_entries(eventId)
cars = get_car_data(entries)
```

Get a sample stage ID:

```{r}
stageId = stages_lookup[['SS3']]
```

Let's also get some pace data:

```{r}
multi_stage_times = get_multi_stage_times(stage_list)
  
multi_stage_pace = get_multi_stage_pace(multi_stage_times, cars)

pace_wide = get_multi_stage_generic_wide(multi_stage_pace,
                                         stage_codes, 'pace',
                                         # Unique group keys required
                                         # Driver code not guaranteed unique
                                         group_key=c('code_driver'),
                                         spread_key='code')

example_driver = pace_wide[2,]$code_driver

pace_wide_rebased = rebase(pace_wide, example_driver, stage_codes,
                           id_col='code_driver')
```

## Pace Maps

To compare pace, it is useful to look at pace values rebased relative to a particular driver, and to indicate the stage distance over which each pace level applies.

We can do this with a chart that presents the accumulated stage distance along the horizontal x-axis and relative pace on the y-axis, using a line segment on each stage to indicate the pace of each driver relative to a specified driver.

One of the easiest ways of plotting charts is to plot from a tidy dataframe, so let's cast the rebased wide pace dataframe back to a long form and also add in the accumulated distance at the start and end of each stage:

```{r}
library(tidyr)

stage_range = c(start=stage_codes[1],
                end=stage_codes[length(stage_codes)])

stages$cum_dist = cumsum(stages$distance)
stages$start_dist = c(0, stages$cum_dist[-length(stages$cum_dist)])

pace_stage = pace_wide_rebased %>% 
                gather(code, pace,
                       stage_range['start']:stage_range['end']) %>%
                merge(stages[,c('code', 'start_dist', 'cum_dist')],
                          by='code')
  
pace_stage %>% head(3)
```

We can now construct a chart using line segments to represent the pace for each driver on each stage:

```{r basic_stage_pace, message=FALSE, warning=FALSE}
library(ggplot2)

g0 = ggplot(pace_stage, aes(group=code_driver)) +
      geom_hline(yintercept = 0,
                 colour='lightgrey', linetype='dotted') +
      geom_segment(aes(x=start_dist, xend=cum_dist,
                       y=pace, yend=pace),
                   color = 'lightgrey')

g = g0 + geom_text(aes(x=(start_dist+cum_dist)/2,y=pace+0.03,
                    label=code_driver,group=code_driver),
                position = position_dodge(15), size=1) +
                coord_cartesian(ylim=c(-0.5,2)) +
                theme_classic()

g
```

We could highlight positive and negative differences in the label colourings:

```{r posneg_highlighted_stage_pace, warning=FALSE}
g0 + geom_text(aes(x=(start_dist+cum_dist)/2,
                   y=ifelse(pace>0,pace+0.03,pace-0.03),
                    label=code_driver,group=code_driver,
                   color=pace>0),
                position = position_dodge(15), size=1) +
                coord_cartesian(ylim=c(-0.5,2)) +
                theme_classic() + theme(legend.position="none")
```


We can also highlight values for a particular driver:

```{r driver_highlighted_stage_pace, warning=FALSE}
g + geom_segment(data=pace_stage[pace_stage$code_driver=='EVA',],
                 aes(x=start_dist, xend=cum_dist,
                       y=pace, yend=pace, color = pace>0)) +
   theme(legend.position="none")
```

Or abuse the `gghighlight` package to modify the aesthetics of unselected items:

```{r driver_highlighted_stage_pace2, warning=FALSE}
g + gghighlight::gghighlight(code_driver=='EVA',
                             unhighlighted_params=list(alpha=0.1))
```

Alternatively, abuse `gghighlight()` again with a negated selection, so that the unselected driver is the one that stands out:

```{r driver_highlighted_stage_pace3, warning=FALSE}
g + gghighlight::gghighlight(code_driver!='EVA',label_key=code_driver,
                             unhighlighted_params=list(color='blue'))
```

We could even add a semi-transparent bar to highlight the pace difference compared to a particular driver:

```{r block_highlighted_stage_pace, warning=FALSE}
g + geom_rect(data=pace_stage[pace_stage$code_driver=='EVA',],
              aes(xmin=start_dist, xmax=cum_dist,
                   ymin = ifelse(pace>0,0,pace),
                   ymax = ifelse(pace>0,pace,0),
                   fill = pace>0), alpha=0.7) +
    theme(legend.position="none")
```

Could we perhaps also extend that a little to allow us to compare more drivers?

```{r}
pace_map_highlight = function(sub_df, m, n){
    # aes() is evaluated lazily, when the plot is built; by then the
    # caller's loop variable will have moved on, so take a local copy of m now
    m_ = m
    geom_rect(data=sub_df,
              aes(xmin= start_dist + (m_-1) * (cum_dist - start_dist)/n,
                  xmax= start_dist + m_ * (cum_dist - start_dist)/n,
                  ymin = ifelse(pace>0,0,pace),
                  ymax = ifelse(pace>0,pace,0),
                  fill = pace>0), alpha=0.7)
}

pace_map_highlight_many = function(df, g, codes,
                                   idcol='code_driver'){
  n = length(codes)
  for (m in 1:n){
    sub_df = df[df[idcol]==codes[m],]
    g = g + pace_map_highlight(sub_df, m, n)
  }
  
  g
}
```

Let's try it with two drivers:

```{r pace_map_dual, warning=FALSE}
pace_map_highlight_many(pace_stage, g, c('EVA', 'ROV' )) +
  theme(legend.position="none")
```

With multiple drivers, it may get difficult to see where the stages are delimited, so we might add separators to delimit them:

```{r pace_map_stage_delimited, warning=FALSE}
g + geom_vline(data = stages, aes(xintercept = cum_dist),
               color='lightgrey', linetype='dotted')
```

To highlight stages further, we could add a "banner" to the chart:

```{r banner_highlighted_stage_pace, warning=FALSE}
g + geom_rect(data=pace_stage[pace_stage$code_driver==example_driver,],
              aes(xmin=0, xmax=max(cum_dist),
                  ymin = 1.8,  ymax = 2.0,
                  alpha=0), fill = 'black') +
    geom_text(data=pace_stage[pace_stage$code_driver==example_driver,],
              aes(x=(cum_dist + start_dist)/2, label=code),
              y=1.9, color='yellow', size=3) +
    geom_segment(data=pace_stage[pace_stage$code_driver==example_driver,],
                 aes(x=cum_dist, xend=cum_dist,
                    y=1.8, yend=2.0), color='yellow') +
    theme(legend.position="none")
```

## A Pace Map Function

Let's start to work up a function based on the above sketches that will generate a pace map for us directly from a long format pace dataframe.

```{r}
pace_map = function(pace_long, limits=c(-0.5,2),
                    labels=TRUE, drivers=NULL, lines=TRUE,
                    xstart='start_dist', xend='cum_dist',
                    pace='pace', typ='bar', pace_label_offset=0.03,
                    label_dodge=15,
                    idcol='code_driver'){
  
  # There are downstream dependencies with colnames baked in atm...
  pace_long$start_dist = pace_long[[xstart]]
  pace_long$cum_dist = pace_long[[xend]]
  pace_long$pace = pace_long[[pace]]
  
  
  g0 = ggplot(pace_long, aes_string(group=idcol, label=idcol)) +
    geom_hline(yintercept = 0,
               colour='lightgrey', linetype='dotted') +
    geom_segment(aes(x=start_dist, xend=cum_dist,
                            y=pace, yend=pace),
                 color = 'lightgrey')
  
  if (lines) {
    lines_df = data.frame(cum_dist=unique(pace_long$cum_dist))
    g0 =g0 + geom_vline(data=lines_df, aes(xintercept = cum_dist),
               color='lightgrey', linetype='dotted')
  }
  
  if (labels){
    
    g0 = g0 + geom_text(aes(x= (start_dist+cum_dist)/2,
                                   y=pace+pace_label_offset),
                        position = position_dodge(label_dodge), size=1)
  }
  if (!is.null(drivers) ){
    if (typ=='bar'){
      g0 = pace_map_highlight_many(pace_long, g0,
                                   c(drivers), idcol=idcol)
    } else if (typ=='highlight')
    {
      # Use [[ ]] so %in% compares the id column values, not a dataframe
      focus = pace_long[pace_long[[idcol]] %in% c(drivers),]
      g0 = g0 + geom_segment(data=focus,
                             aes(x=start_dist, xend=cum_dist,
                                 y=pace, yend=pace, color = pace>0))
    }
  }
  
  g0 = g0 + coord_cartesian(ylim=limits)
  
  g0 + theme_classic() + theme(legend.position="none")
}

```

Let's try it:

```{r pace-map-function, warning=FALSE}
pace_map(pace_stage, drivers=c('EVA', 'ROV'))
```


## Off-the-Pace Charts

Another way of reviewing pace is to consider the gap to the leader, or the rebased gap to a particular driver, across the stages. We plot the accumulated distance into the rally along the x-axis and the gap (measured in seconds) along the y-axis. A moment's consideration suggests that the gradient of each line, $\Delta\textrm{gap}/\Delta\textrm{distance}$, is a measure of pace. The slope of the line thus indicates the relative pace between the focal driver and each of the other drivers.

As with the pace map, if we have the data in a long, tidy form, we can create charts from it quite straightforwardly. So let's add in the accumulated distance and the accumulated stage time for each driver:

```{r}
off_the_pace = multi_stage_pace %>%
                           merge(stages[,c('stageId', 'cum_dist')],
                                 by='stageId') %>%
                          arrange(number) %>%
                          group_by(code_driver) %>%
                          mutate(totalDurationS = cumsum(elapsedDurationS))

off_the_pace %>% head(3)
```
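The `arrange(number)` step matters because a cumulative sum simply runs down the rows in their current order. Here is a minimal base R sketch on hypothetical toy data (not rally data). It sorts by stage number and then accumulates within each driver group using `ave()`, giving the same kind of running total as the `arrange()`/`group_by()`/`mutate()` pipeline above:

```r
# Hypothetical toy long-form stage times: one row per driver per stage,
# deliberately not in stage order
toy_stage_times <- data.frame(
  code_driver = c("AAA", "BBB", "AAA", "BBB", "AAA", "BBB"),
  number = c(2, 2, 1, 1, 3, 3),
  elapsedDurationS = c(300, 310, 500, 495, 400, 402)
)

# Sort by stage number so the running totals follow the stage order
toy_stage_times <- toy_stage_times[order(toy_stage_times$number), ]

# Cumulative sum of stage times within each driver group
toy_stage_times$totalDurationS <- ave(toy_stage_times$elapsedDurationS,
                                      toy_stage_times$code_driver,
                                      FUN = cumsum)
toy_stage_times
```

Without the sort, the final totals would be unchanged, but the intermediate running totals would be attached to the wrong stages.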

Now we can create a basic off-the-pace chart:

```{r off-the-pace-basic, warning=FALSE}
ggplot(off_the_pace, aes(x=cum_dist, y=totalDurationS,
                         color=code_driver)) + geom_line()
```

As with the pace map, the chart is often most informative if we rebase it relative to a particular driver.

Let's create a wide dataframe to simplify the rebasing process:

```{r}
off_the_pace_wide = get_multi_stage_generic_wide(off_the_pace,
                                         stage_codes, 'totalDurationS',
                                         group_key=c('code_driver'),
                                         spread_key='code')

off_the_pace_wide %>% head(3)
```

Now we can rebase:

```{r}
off_the_pace_wide_rebased = rebase(off_the_pace_wide,
                                   example_driver, stage_codes,
                                   id_col='code_driver')

off_the_pace_wide_rebased %>% head(3)
```

And cast back to the long, tidy form:

```{r}
off_the_pace_long = off_the_pace_wide_rebased %>% 
                        gather(code, totalDurationGapS,
                               stage_range['start']:stage_range['end']) %>%
                               merge(stages[,c('code', 'cum_dist')],
                                     by='code')

off_the_pace_long %>% head(3)
```

And now we can plot the simple rebased off-the-pace chart:

```{r off-the-pace-rebased, warning=FALSE}
g_otp = ggplot(off_the_pace_long, aes(x=cum_dist,
                                      y=totalDurationGapS,
                                      color=code_driver)) +
            geom_line() +
            # Retain the points outside the limits
            # by using coord_cartesian()
            # We can also flip the coordinate axis
            coord_cartesian(ylim=c(100, -100)) + theme_classic()

g_otp
```
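As a rough worked example of reading the slope of a rebased off-the-pace chart, here is a minimal base R sketch using hypothetical toy values (not rally data). It assumes the gap is the driver's accumulated time minus the rebase driver's accumulated time, so a negative gap means the driver is ahead:

```r
# Hypothetical toy data: accumulated rally distance (km) at each stage end,
# and one driver's accumulated gap (s) to the rebase driver, where a
# negative gap means the driver is ahead of the rebase driver
toy_cum_dist <- c(0, 10, 25, 40)
toy_gap_s    <- c(0, -2, 1, 4)

# Gradient of each line segment: change in gap over change in distance,
# giving the relative pace in seconds per km on each stage
toy_relative_pace <- diff(toy_gap_s) / diff(toy_cum_dist)
toy_relative_pace
```

Here the driver gains 0.2 s/km on the first stage (a negative slope) and then gives back 0.2 s/km on each of the next two stages.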

We might also want to zero the origin, for example by adding a row for each driver with a zeroed distance and gap.

Let's create some dummy data to represent that:

```{r warning=FALSE}
zero_df = data.frame(code_driver=unique(off_the_pace_long$code_driver))  %>%
            mutate(cum_dist=0, totalDurationGapS=0, code = 'SS0')
```

We can then bind that data into our long form off-the-pace data and view the result:

```{r off-the-pace-zeroed, warning=FALSE}
g_otp = bind_rows(off_the_pace_long, zero_df) %>%
        ggplot(aes(x=cum_dist,
                   y=totalDurationGapS,
                   color=code_driver)) +
            geom_line() +
            # Retain the points outside the limits
            # by using coord_cartesian()
            # We can also flip the coordinate axis
            coord_cartesian(ylim=c(100, -100)) + theme_classic()

g_otp
```

Trivially, we might try to add labels at the end of each line:

```{r off-the-pace-end1, warning=FALSE}
off_the_pace_end = off_the_pace_long %>% filter(cum_dist == max(cum_dist))
                                                
g_otp + geom_text(data = off_the_pace_end,
                  aes(x = cum_dist+ 10, y = totalDurationGapS,
                      label = code_driver, color = code_driver)) +
        theme(legend.position="none")
```

However, there are various other packages that provide alternative ways of doing this, including `directlabels` and `ggrepel`.

For example, using `directlabels`:

```{r off-the-pace-labeled1, warning=FALSE}
library(directlabels)

g_otp +
    geom_dl(aes(label = code_driver, x=cum_dist+2),
            # cex is text label size
            method = list('last.bumpup', cex = 0.5)) +
    theme(legend.position="none")
    
```

And using `ggrepel`, which also has the advantage of adding labels for drivers whose curves are a long way off the pace, albeit not in an obviously natural order:

```{r off-the-pace-labeled2, warning=FALSE}
g_otp + ggrepel::geom_text_repel(data = off_the_pace_end,
                                 aes(label = code_driver),
                                 size = 3) +
        theme(legend.position="none")
```

The `gghighlight` package is also useful for highlighting traces, and it usefully labels the highlighted lines automatically:

```{r off-the-pace-highlight, warning=FALSE, message=FALSE}
g_otp +
    gghighlight::gghighlight(code_driver %in% c('EVA','ROV'),
                             unhighlighted_params=list(alpha=0.1)) +
    theme(legend.position="none")
  
```

Again, let's routinise the process of chart production with the beginnings of a function to generate the off-the-pace chart directly from an appropriately formed dataframe:

```{r}
off_the_pace_chart = function(pace_long, highlight=NULL,
                              label_typ='dl',
                              dist='cum_dist', t='totalDurationGapS',
                              code='code_driver', ylim=NULL){
  
  g_otp = ggplot(pace_long, aes_string(x=dist, y=t,
                                color=code)) +
            geom_line() +
            # Retain the points outside the limits
            # by using coord_cartesian()
            # We can also flip the coordinate axis
            coord_cartesian(ylim=ylim) + theme_classic()
  
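  # Use [[ ]] to compare the distance column as a vector
  off_the_pace_end = pace_long[pace_long[[dist]] == max(pace_long[[dist]]),]
  # Inject the id column name rather than hard-coding code_driver
  if (!is.null(highlight))
    g_otp = g_otp + gghighlight::gghighlight((!!rlang::sym(code)) %in% c(highlight),
                             unhighlighted_params=list(alpha=0.1))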
  else if (label_typ=='dl')
    g_otp = g_otp + geom_dl(aes_string(label = code, x=dist),
            # cex is text label size
            method = list('last.bumpup', cex = 0.5))
  else 
    g_otp = g_otp + ggrepel::geom_text_repel(data = off_the_pace_end,
                                           aes_string(label = code),
                                          size = 3)
  
  g_otp + theme(legend.position="none")
}
```

Let's quickly test it, noting how we pass the y-axis limits in reverse order to invert the y-axis, so that drivers ahead of the rebase driver appear above the x-axis:

```{r off_the_pace_function, warning=FALSE, message=FALSE}
off_the_pace_chart(off_the_pace_long, ylim=c(50,-50))
```

And with highlighting:

```{r off_pace_function_highlight, warning=FALSE, message=FALSE}
off_the_pace_chart(off_the_pace_long, highlight=c('EVA', 'ROV'),
                   ylim=c(50,-50))
```

## Comparing Pace Across Stages

One way of characterising stages is by their pace. As a quick guide to possible pace variations over the stages of a rally, we might review the typical pace on each stage. For example, here's a look at pace over the course of the rally using a box plot to summarise the (non-outlier) pace values for each stage (we should probably use an ordered categorical *stageId* basis for the x-axis):

```{r median-pace, warning=FALSE}
ggplot(off_the_pace[off_the_pace$pace<40,],
       aes(x=cum_dist, y=pace)) +
    geom_boxplot(aes(group=cum_dist))
```
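If we want the numbers behind the boxes, a quick base R sketch with `tapply()` returns the median pace per stage. The data here is a hypothetical toy example, the pace values are assumed to be in seconds per km, and the same `pace < 40` outlier filter is applied as in the plot:

```r
# Hypothetical toy data: per-driver pace (assumed seconds per km) on two stages
toy_pace <- data.frame(
  stage = c("SS1", "SS1", "SS1", "SS2", "SS2", "SS2"),
  pace  = c(30.1, 30.5, 45.0, 28.0, 28.4, 28.9)
)

# Drop outliers, mirroring the pace < 40 filter used for the box plot
toy_pace <- toy_pace[toy_pace$pace < 40, ]

# Median pace on each stage (the line across the middle of each box)
toy_median_pace <- tapply(toy_pace$pace, toy_pace$stage, median)
toy_median_pace
```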

<!--chapter:end:visualising-stage-pace.Rmd-->

```{r cache = T, echo = F, message=F}
knitr::opts_chunk$set(error = TRUE)
knitr::opts_chunk$set(fig.path = "images/split-times-")
```
```{r include=FALSE, cache=FALSE}
# Import code into empty but named chunks
knitr::read_chunk('code/wrc-wrangling.R')
```
# Working With Split Times

In this chapter we'll explore some of the ways in which we might start to work with split times data from a particular stage. On the one hand, we can consider split times purely on a time basis; on the other, we can explore the split times in terms of *pace* differences calculated by normalising the split times over split distances.

As with stage times, we can rebase split times to compare a specified driver directly with other drivers, not just the stage winner. We can also calculate how much time was spent on each split section by finding the difference between consecutive split times for a particular driver.

From the split section times, we can also calculate various derived measures, such as the ultimate possible stage time, based on the sum of fastest times to complete each split section.

Having access to split times also sets up the potential for rating the performance of a driver on each split against various stage and route metrics, such as road order or the "wiggliness" of the split section, although these will not be covered here.

## Load Base Data

To get the splits data from a standing start, we can load in the current season list, select the rally we want, look up the itinerary for the rally, extract the sections and then the stages, and then retrieve the stage ID for the stage we are interested in.

Let's start by loading in key helper libraries:

```{r message=F, warning=F}
source('code/wrc-api.R')
source('code/wrc-wrangling.R')
source('code/wrc-charts.R')
```

And getting some initial data:

```{r}
s = get_active_season()
eventId = get_eventId_from_name(s, 'arctic')

itinerary = get_itinerary(eventId)
sections = get_sections(itinerary)
stages = get_stages(sections)
stages_lookup = get_stages_lookup(stages)

# For driver details
entries = get_rally_entries(eventId)
cars = get_car_data(entries)
```

As a working example, let's define an example stage by its code:

```{r}
stage_code = 'SS3'
```

Get a sample stage ID:

```{r}
stageId = stages_lookup[[stage_code]]
```

## Get Splits Data

The split times represent the accumulated time into the stage at each split point. 

*The split times __do not__ include the overall stage time, so we need to be mindful that if we want to report on a stage, the split times in and of themselves do not contain any information about the final section of the stage between the final split point and the stage finish line.*

We can load the splits data if we know the event and stage ID:

```{r}
splits = get_splits(eventId, stageId)
```

The splits data actually comprises two dataframes in columns `splitPoints` and `entrySplitPointTimes`.

### Split Locations

The `splitPoints` dataframe contains information about the splits locations:

```{r}
splits_locations = splits$splitPoints
splits_locations %>% arrange(number) %>% head(2)
```

We can also generate a list of the split IDs:

```{r}
splits_list = splits_locations$splitPointId
splits_list
```

We can retrieve the split codes ordered by the distance into the stage of each split point from a lookup on the split points:

```{r}
get_split_cols = function(splits){
  split_cols =  as.character(arrange(splits$splitPoints,
                                     distance)$splitPointId)
  split_cols
}

split_cols = get_split_cols(splits)
split_cols
```

Since the split points dataframe does not include the final timing point location (i.e. the stage finish), we can get the full stage distance from the `stages` dataframe:

```{r}
stages[stages['code']==stage_code, 'distance']
```

### Mapping Split Codes to Split Numbers

To provide a human readable version of the split identifiers, let's map them onto a more meaningful label:

```{r}
get_split_label = function(x){
  paste0('split_', splits_locations[splits_locations$splitPointId==x,
                                    'number'])
}

splits_locations$splitname = sapply(splits_locations$splitPointId,
                                 get_split_label)

splits_locations %>% head(3)
```

We can generate a lookup list of split point names and IDs as:

```{r}
get_stages_lookup(splits_locations, 'splitPointId', 'splitname')
```

We can use this in a function that provides an annotated *and ordered* form of the split locations dataframe: 

```{r}
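get_split_locations = function(splits){
  splits_locations = splits$splitPoints
  # Build the label from this dataframe's own number column rather than
  # relying on the global splits_locations used by get_split_label()
  splits_locations$splitname = paste0('split_', splits_locations$number)
  splits_locations %>%
    arrange(number)
}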

splits_locations = get_split_locations(splits)
splits_locations
```

Let's create a convenience list of split names:

```{r}
# We could create these as ordered factors?
split_names = splits_locations$splitname

split_names
```

### Split Times

The second dataframe returned from the splits API call contains the splits times, accessed via the `get_driver_splits()` function defined previously and imported from the *wrc-api.R* file. The data is returned in a long format, with each row describing a single split time for a particular driver on the single stage the split times were retrieved for.

```{r}
driver_splits = get_driver_splits(splits)
driver_splits %>% head(2)
```

The `get_multi_split_times(stage_list)` function can also provide us the long form data for multiple stages given one or more stage IDs.

### Wide Driver Split Times

We can cast the driver split points into a wide format using the split point codes, ordered by split distance into the stage, as the widened column names:

```{r get_splits_wide}
# Loaded in from file
```

Rather than retrieve the split times into a long format, with one row per driver split, we can now retrieve the data into a wide form with one row per driver and a column for each split on the stage:

```{r}
splits_wide = get_splits_wide(splits)

splits_wide %>% head(2)
```

The following function, which closely resembles a similar function for relabeling stage codes, allows us to rename the split points with more meaningful split names:

```{r}
map_split_codes = function(df, splits_list) {
  # Get split names lookup id->splitname
  # (splits_list is unused: the lookup comes from splits_locations)
  splits_lookup_code = get_stages_lookup(splits_locations,
                                         'splitPointId', 'splitname')
  
  #https://stackoverflow.com/a/34299333/454773
  plyr::rename(df, replace = splits_lookup_code,
               warn_missing = FALSE)
}
```

For example:

```{r}
splits_wide = get_splits_wide(splits) %>%
                map_split_codes(splits_list) %>% 
                map_driver_names(cars)

splits_wide %>% head(10)
```

We can also update our helper function to relabel stages with a more general function:

```{r}
relabel_times_df2 = function(df, s_list, cars, typ='stage') {
  if (typ=='split')
    df = df %>% map_split_codes(s_list)
  else
    df = df %>% map_stage_codes(s_list)
  
  df %>%
    map_driver_names(cars)
}
```


## Rebasing Split Times

The split times record the elapsed time for each driver at each split point, but in many situations we may be interested in knowing the difference in split times for a specific driver relative to every other driver.

More formally, for driver $i$ on stage $S$ and split $s$, we find the rebased split times relative to driver $j$ as:

$$
{_{S,s}}t_{i}^{j} = {_{S,s}}t_{i} - {_{S,s}}t_{j}
$$
although we may want to negate that value depending on whether we want to focus on times from the selected driver's perspective, or from the perspective of the field of drivers they are being rebased against. 

To calculate the rebased times, we note that the wide dataframe format gives rows containing the split times for each driver, which is to say ${_{S,*}}t_i$. We can then simply subtract the row corresponding to the driver we want to rebase relative to from each of the other driver rows.

Recall the heart of the rebase function we have previously defined:

```{r}
#https://stackoverflow.com/a/32267785/454773
rebase_essence = function(df, id, rebase_cols,
                          id_col='entryId') {
  
  df_ =  df
  
  # The rebase values are the ones
  # we want to subtract from each row
  rebase_vals = c(df[df[[id_col]]==id, rebase_cols])
  
  # Do the rebasing
  df_[,rebase_cols] =  df[,rebase_cols] - rebase_vals
    
  df_
}
```
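The same subtraction can be sketched in base R with `sweep()`, which subtracts a vector of statistics (here, the rebase driver's row) from every row of a matrix. The toy split times below are hypothetical:

```r
# Hypothetical toy wide split times: one row per driver, one column per split
toy_splits <- matrix(c(60.0, 130.2, 200.5,
                       61.2, 129.8, 199.9,
                       59.5, 131.0, 203.1),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(c("AAA", "BBB", "CCC"),
                                     c("split_1", "split_2", "split_3")))

# Subtract the rebase driver's row from every row: t_i - t_j at each split
toy_rebased <- sweep(toy_splits, 2, toy_splits["BBB", ], FUN = "-")
toy_rebased
```

Drivers with negative values were ahead of BBB at that split; negating the result (`-toy_rebased`) gives the alternative perspective mentioned above.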

Let's try the original function with an example driver:

```{r}
ogierEntryId = get_person_id(cars, 'ogier', ret='code')

ogier_rebased = rebase(splits_wide, ogierEntryId,
                       split_names, id_col='code')

ogier_rebased %>% head(10)
```

## Visualising Rebased Split Times Using Sparklines

If we cast the data back to a tidy long form, we can easily generate graphical summaries from it:

```{r warning=FALSE}
library(tidyr)

ogier_rebased_long <- ogier_rebased %>%
                        gather(key ="Split",
                               value ="TimeInS",
                               all_of(split_names))

ogier_rebased_long %>% head(10)
```

For example, we can use the `sparkline::spk_chr()` function to generate an HTML sparkline widget that we can embed in a `formattable::formattable()` generated table:
 
```{r}
library(formattable)
library(sparkline)

ogier_sparkline <- ogier_rebased_long %>%
                      group_by(code) %>%
                      summarize(spk_ = spk_chr(TimeInS, type ="bar"))

# We need to create an htmlwidget form of the table
out = as.htmlwidget(formattable(head(ogier_sparkline, 5)))

# The table also has a requirement on the sparkline package
out$dependencies = c(out$dependencies,
                     htmlwidgets:::widget_dependencies("sparkline",
                                                       "sparkline"))
out
```

## Finding the Rank Position at Each Split Point

It can often be tricky to work out the rank at each split by eye, so let's create a simple function to display the rank at each split for us:

```{r}
get_split_rank = function(df, split_cols){
  # We need to drop any list names
  split_cols = as.character(split_cols)
  
  df %>% mutate(across( all_of(split_cols), dense_rank ))
}


get_split_rank(splits_wide, split_names) %>% head(5)
```
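Note that `dense_rank()` gives tied drivers the same rank and then continues with the *next consecutive* rank. Split times are often only reported to a tenth of a second, so ties are not unusual. It is worth being clear about how this differs from the "sports style" ranking provided by `dplyr::min_rank()`. Here is a small base R sketch on hypothetical toy times:

```r
# Hypothetical toy split times (s) for one split, with a tie at 60.1 s
toy_split_times <- c(AAA = 60.1, BBB = 59.8, CCC = 60.1, DDD = 60.4)

# Dense ranking, as dplyr::dense_rank(): tied drivers share a rank and
# the next distinct time takes the next consecutive rank
toy_dense <- match(toy_split_times, sort(unique(toy_split_times)))

# Minimum ("sports style") ranking, as dplyr::min_rank(): tied drivers
# share the lowest rank and the following rank is skipped
toy_min <- rank(toy_split_times, ties.method = "min")

rbind(dense = toy_dense, min = toy_min)
```

DDD has three drivers ahead of it. Dense ranking places it 3rd, whereas minimum ranking places it 4th, which is how a classification would normally report it.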

## Finding Split Section Durations

Inspection of the split times data shows the split times to be a strictly increasing function over the ordered in-stage split locations. That is, the times represent the time in stage up to that split point, rather than the fractional time taken to get from one split timing point to another.

In many cases, it will be convenient to know how much time a driver took to get from one split point to the next, not least because this allows us to identify sections of the stage where a driver may have particularly gained or lost time, or to identify where a driver may be making or losing time consistently across different parts of the stage.

In abstract terms, then, what we want to calculate is the time taken for a driver $i$ on stage $S$ to go between two timing points, $m$ and $n$, where index $0$ denotes the start (so that ${_{S,0}}t_{i}=0$), index ${_S}s_{max}+1$ denotes the stage end, and ${_S}s_{max}$ is the number of split points on stage $S$:

$$
{_{S,m,n}}t_{i} = {_{S,n}}t_{i} - {_{S,m}}t_{i}: 0{\le}m<n{\le}{_S}s_{max}+1
$$

For a specific, known stage, we might write the simpler:

$$
S={stagenum};
{_{m,n}}t_{i} = {_{n}}t_{i} - {_{m}}t_{i}: 0{\le}m<n{\le}s_{max}+1
$$

For a driver, *i*, we note that the accumulated stage time on stage $S$, obtained by summing the durations of consecutive sections, is:

$$
S={stagenum};t_{i}=\sum_{s=0}^{s{_{max}}}{_{s,s+1}}t_{i}
$$

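To get the duration between consecutive split points, we can create two temporary dataframes. One represents the original split times without the first split, $\left({_{2}}t, \ldots, {_{s_{max}}}t\right)$, and the other represents the split times without the last split, $\left({_{1}}t, \ldots, {_{s_{max}-1}}t\right)$. Subtracting one dataframe from the other finds the difference across all consecutive columns: 

$$
\left({_{2}}t, \ldots, {_{s_{max}}}t\right) - \left({_{1}}t, \ldots, {_{s_{max}-1}}t\right) = \left({_{1,2}}t, \ldots, {_{s_{max}-1,s_{max}}}t\right)
$$

The duration of the first section, ${_{0,1}}t$, is simply the first split time, ${_{1}}t$.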

Let's see how that works in practice:

```{r}
#https://stackoverflow.com/a/50411529/454773

get_split_duration = function(df, split_cols,
                              retId=TRUE, id_col='entryId') {
  
  # Drop names if they are set
  split_cols = as.character(split_cols)
  
  # [-1] drops the first column, [-ncol()] drops the last
  df_ = df[,split_cols][-1] - df[,split_cols][-ncol(df[,split_cols])]
  
  # The split time to the first split is simply the first split time
  df_[split_cols[1]] = df[split_cols[1]]
  
  if (retId) {
    # Add in the id column
    df_[[id_col]] = df[[id_col]]
  
    # Return the dataframe in a sensible column order
    df_ %>% select(c(all_of(id_col), all_of(split_cols)))
  } else {
    df_
  }
  
}
```

Let's apply it to our wide split times:

```{r}
split_durations_wide = get_split_duration(splits_wide,
                                          split_names, id_col='code')

split_durations_wide %>% head(5)
```
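As a cross-check on the differencing approach, here is a minimal base R sketch that applies `diff()` across each row of a hypothetical toy matrix of elapsed times. It assumes the overall stage time has been appended as a final `split_N` column, as is done later in this chapter, so that the final section from the last split to the finish is also recovered:

```r
# Hypothetical toy elapsed times (s): split times plus the overall stage
# time in a final split_N column, one row per driver
toy_elapsed <- matrix(c(60.0, 130.2, 200.5, 250.0,
                        61.2, 129.8, 199.9, 251.3),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(c("AAA", "BBB"),
                                      c("split_1", "split_2", "split_3", "split_N")))

# Section durations: the first section is the first split time itself;
# each later section is the difference between consecutive elapsed times
toy_durations <- t(apply(toy_elapsed, 1, function(row) c(row[1], diff(row))))
toy_durations

# The section durations sum back to the overall stage time
rowSums(toy_durations)
```

Because the first section duration is the first split time itself, the section durations for each driver sum back to their overall stage time.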

### Finding Split Section Ranks

To find the rank in terms of which driver completed each stage *section* in the quickest time, we can simply pass the `split_durations_wide` dataframe rather than the `splits_wide` dataframe to the `get_split_rank()` function:

```{r}
get_split_rank(split_durations_wide, split_names) %>% head(5)
```

## Adding Overall Stage Time to the Split Times

It is important to note that the split times data *does not* contain all the timing data for the stage, just the times relating to split points along the stage route. For a complete summary of stage timing data, we also need to add in the overall stage time from the stage results table:

```{r}
stage_times = get_stage_times(eventId, stageId)

stage_times %>% head(2)
```

*Recall that if required we can also retrieve stage time for multiple stages using the `get_multi_stage_times(stagelist)` function. The `get_stage_list(stages)` function will return a list of all stage IDs.*

If we merge each driver's stage times as an extra, final column to the wide split times dataframe, we can calculate the split section durations over the whole stage, including the time taken to get from the final split to the stage end.

Recalling that driver codes may not be unique, we should use the unique *entryId* values to create the extended dataframe:

```{r}
widen_splits_stage_times = function(splits_wide, stage_times,
                                    id_col='entryId'){
  
  results_cols = c('elapsedDurationMs', id_col,  'diffFirstMs', 'position')

  splits_wide = splits_wide %>%
                    merge(stage_times[,results_cols],
                          by = id_col) %>%
                    mutate(split_N = elapsedDurationMs/1000)

  splits_wide
}

full_splits_wide = get_splits_wide(splits) %>%
                      widen_splits_stage_times(stage_times) %>%
                      map_split_codes(splits_list) %>% 
                      map_driver_names(cars)

full_splits_wide %>% head(2)
```

To make further processing easier, we add the overall stage time to the list of split time column names. The "final split" is now the completed stage time:

```{r}
split_names = c(split_names, 'split_N')
split_names
```

## Calculating the Ultimate Stage from Ultimate Split Times

The ultimate stage time for a stage is the sum of the fastest sectional split times on the stage as recorded by any driver. Rebasing the stage winner's time against the ultimate stage time shows whether the driver potentially left time on the stage. (Of course, it might be that a very fast sectional time recorded by one driver may have wiped out their tyres and led to a relatively poor overall stage time, or risk taking that ended their stage prematurely...)

So how can we calculate the ultimate splits? For split section times ${_{S,s}}t_{i}$, the ultimate section time ${_{S,s}}u$ is given as:

$$
{_{S,s}}u = \min_{i} {_{S,s}}t_{i}
$$

We can calculate the times by casting the wide split section duration dataframe to a long form, grouping by the split name and then summarising the minimum time in each group.

Here's how we can create the long form dataset:

```{r}
full_splits_wide %>%
      #gather() is deprecated / retired...
      #gather(splitPointId, sectionDurationS,
      #                 as.character(split_names))
      select(all_of(as.character(split_names)), code) %>%
      pivot_longer(as.character(split_names),
                   names_to = "splitname",
                   values_to = "sectionDurationS") %>%
      head(3)
```

We can also get the duration of each section:

```{r}
full_durations_wide = get_split_duration(full_splits_wide,
                                          split_names, id_col='code')

full_durations_wide %>% head()
```

To rebase on an ultimate time basis, it helps to think of an ultimate driver whose time on each split section is the fastest time recorded by any driver on that section.

If we group by *splitname*, we can summarise on *sectionDurationS* to find the minimum duration for each split section; we can also take the opportunity to add an accumulated stage time column at each split point:

```{r}
ultimate_splits_long = full_durations_wide %>%
                     pivot_longer(all_of(split_names),
                                  names_to = "splitname",
                                  values_to = "sectionDurationS") %>%
                     select(splitname, sectionDurationS) %>%
                     # Defensive measure
                     filter(!is.na(sectionDurationS) & sectionDurationS>0) %>%
                     group_by(splitname) %>% 
                     summarise(ultimate = min(sectionDurationS,
                                              na.rm = TRUE)) %>%
                     mutate(ultimateElapsed = cumsum(ultimate))

ultimate_splits_long
```
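For comparison, here is a minimal base R sketch of the same calculation on a hypothetical toy matrix of section durations. Working column-wise keeps the splits in their stage order, so the cumulative sum follows the route. The grouped `summarise()` above returns its groups sorted by *splitname*, which only matches stage order while the split names happen to sort that way (for example, with fewer than ten splits):

```r
# Hypothetical toy section durations (s): one row per driver, one column
# per split section, with the final section in split_N
toy_durations <- matrix(c(60.0, 70.2, 70.3, 49.5,
                          61.2, 68.6, 70.1, 51.4,
                          59.9, 71.0, 70.8, 50.2),
                        nrow = 3, byrow = TRUE,
                        dimnames = list(c("AAA", "BBB", "CCC"),
                                        c("split_1", "split_2", "split_3", "split_N")))

# Ultimate section time: fastest time by any driver on each section
toy_ultimate <- apply(toy_durations, 2, min, na.rm = TRUE)

# Ultimate elapsed time at each split point, in stage order
toy_ultimate_elapsed <- cumsum(toy_ultimate)

# Ultimate stage time compared with the best actual stage time
toy_ultimate_stage <- sum(toy_ultimate)
toy_best_actual <- min(rowSums(toy_durations))
toy_best_actual - toy_ultimate_stage
```

In this toy example the fastest actual stage time (250.0 s) is 1.9 s slower than the ultimate stage time (248.1 s).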

Rebasing arbitrary drivers against the ultimate stage (and the distribution of times recorded by other drivers) may give an idea of which drivers were pushing on what parts of a stage and where they were being more cautious.

Note that ultimate split times can be used to create an ultimate stage time that can itself contribute to the ultimate ultimate rally time (*sic*).

### Rebasing to Ultimate Split Times

We can rebase to the ultimate split times in three senses:

- on an ideal, ultimate *per split* basis;
- on an ideal elapsed time basis (the cumulative sum of ideal ultimate split durations);
- on an actual best elapsed (stage) time basis at each split, rebasing relative to the minimum actual recorded elapsed time at each split.
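To make the three senses concrete before working through them with `rebase()`, here is a minimal base R sketch on a hypothetical toy matrix of elapsed times, with the overall stage time as `split_N`. It uses `sweep()` to subtract a reference vector of times from every driver's row:

```r
# Hypothetical toy elapsed times (s) at each split; split_N is the stage time
toy_elapsed <- matrix(c(60.0, 130.2, 200.5, 250.0,
                        61.2, 129.8, 199.9, 251.3,
                        59.9, 130.9, 201.7, 251.9),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(c("AAA", "BBB", "CCC"),
                                      c("split_1", "split_2", "split_3", "split_N")))

# Section durations: first split time, then consecutive differences
toy_durations <- t(apply(toy_elapsed, 1, function(row) c(row[1], diff(row))))

# 1. Per split: gap to the fastest duration on each section
toy_per_split <- sweep(toy_durations, 2, apply(toy_durations, 2, min))

# 2. Ideal elapsed: gap to the running sum of fastest section durations
toy_ideal_elapsed <- cumsum(apply(toy_durations, 2, min))
toy_vs_ideal <- sweep(toy_elapsed, 2, toy_ideal_elapsed)

# 3. Actual best elapsed: gap to the fastest recorded elapsed time at each split
toy_vs_actual <- sweep(toy_elapsed, 2, apply(toy_elapsed, 2, min))
```

Each column of the per split and actual best rebasings contains at least one zero and no negative values. The ideal elapsed rebasing is only guaranteed a zero at the first split (here, CCC); at later splits every driver is at or behind the ideal time.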

### Per Split Ultimate Rebasing

To rebase on a per split basis, we can simply rebase durations relative to the minimum split duration at each split.

Let's reshape the ultimate driver durations to a wide format:

```{r}
ultimate_wide_durations = ultimate_splits_long %>% 
                  select(splitname, ultimate) %>%
                  pivot_wider(names_from = splitname,
                              values_from = ultimate) %>%
                  mutate(code='ultimate')

ultimate_wide_durations
```
We can now add the ultimate driver split durations to the *full_durations_wide* data and rebase against this dummy driver:

```{r}
ultimate_between_split = full_durations_wide %>%
                            select(code, all_of(split_names)) %>% 
                            bind_rows(ultimate_wide_durations) %>%
                            rebase('ultimate', split_names,
                                   id_col='code') %>%
                            head(10)

ultimate_between_split
```

In this case we note there is at least one driver in each column with a zero gap to the ultimate driver, specifically, the driver(s) who made it between two consecutive split points in the fastest time, and no driver with a negative gap.

### Ultimate Stage Rebasing

We can add an "ultimate" driver to the wide splits dataframe and then rebase as normal. The following function routinises the widening recipe we used above:

```{r}
ultimate_widen = function(df, col, valname){
  df %>% select(splitname, all_of(col)) %>%
              pivot_wider(names_from = splitname,
                          values_from = all_of(col)) %>%
              mutate(code=valname)
}
```

We can get the wide form as:

```{r}
ultimate_wide_elapsed = ultimate_widen(ultimate_splits_long,
                                       'ultimateElapsed', 'ultimate')

ultimate_wide_elapsed
```

We can now add that to our original dataframe and rebase against the ultimate stage to show how far off the ultimate stage pace each driver was.

Again, let's make a routine of the process:

```{r}
ultimate_rebaser = function(df, ultimate_df, split_names,
                            ultimate_col ){
  df %>%
    select(code, all_of(split_names)) %>% 
  bind_rows(ultimate_df) %>%
  rebase(ultimate_col, split_names, id_col='code')
}
```

And let's use that routine:

```{r}
ultimate_accumulated = full_splits_wide %>% 
                          ultimate_rebaser(ultimate_wide_elapsed,
                                         split_names, 'ultimate') %>%
                          head(10)

ultimate_accumulated
```

We note that there is only one guaranteed difference of 0.0, from the driver with the fastest time at the first split, and that no times will be less than zero.

### Actual Best Elapsed Time Rebasing

The *full_splits_wide* dataframe contains the elapsed times for each driver at each split so we can summarise the long form of that data to get the best actual recorded elapsed times:

```{r}
actual_ultimate = full_splits_wide %>% 
                      select(code, all_of(split_names)) %>% 
                      # Make long: these values are elapsed times
                      pivot_longer(all_of(as.character(split_names)),
                                   names_to = "splitname",
                                   values_to = "elapsedS") %>%
                      group_by(splitname) %>%
                      summarise(actualUltimate = min(elapsedS,
                                                     na.rm=TRUE))

actual_ultimate
```

We can add this time to our ultimate times dataframe to provide an immediate point of reference between the actual best accumulated split times and the ultimate accumulated ideal split times:

```{r}
ultimate_splits_long$actual = actual_ultimate$actualUltimate
ultimate_splits_long
```

Let's cast the data to a wide format in readiness for rebasing it:

```{r}
ultimate_wide_actual = ultimate_widen(ultimate_splits_long,
                                      'actual',
                                      'ultimate')

ultimate_wide_actual
```

And then rebase:

```{r}
ultimate_actual = full_splits_wide %>% 
                      ultimate_rebaser(ultimate_wide_actual,
                                       split_names, 'ultimate') %>%
                      head(10)

ultimate_actual
```

In this case, there is at least one zero value per split corresponding to the driver(s) who recorded the fastest elapsed time up to each split point.

## Visualising Rebased Times

There are various quick techniques we can use to help visualise the rebased split times and try to highlight significant patterns or peculiarities. For example, we can use coloured backgrounds to highlight each cell in a table, or sparklines to summarise each row.

### Context Sensitive Cell Colouring

As a quick example, let's first look at the split duration rebasing, where we compare each driver's time in getting from one split point to the next against the fastest completion of that section.

Recall the divergent colour tile formatter we met previously, reused here with a different colour sense:

```{r}
xnormalize = function(x){
  # Normalise to the full range of values about 0:
  # pad with -max(|x|) and +max(|x|) so that 0 maps to 0.5
  # in the normalised range, then drop the two padding values
  x = c(x, -max(abs(x)), max(abs(x)))
  normalize(x)[1:(length(x)-2)]
}
color_tile2 <- function (...) {
  formatter("span", style = function(x) {
    style(display = "block",
          'text-align' = 'center',
          padding = "0 4px", 
          `border-radius` = "4px",
          # Embolden values far from zero, relative to the largest magnitude
          `font-weight` = ifelse(abs(x) > 0.3*max(abs(x)), "bold", "normal"),
          color = ifelse(abs(x) > 0.3*max(abs(x)), 'white',
                         ifelse(x==0, 'lightgrey', 'black')),
          # Map the symmetrically normalised values onto the colour ramp;
          # colorRamp() returns red, green, blue columns in that order
          `background-color` = csscolor(matrix(as.integer(colorRamp(...)(xnormalize(as.numeric(x)))), 
                byrow=TRUE, 
                dimnames=list(c("red","green","blue"), NULL),
                nrow=3)))
  })}
```
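To see what `xnormalize()` is doing, here is a base R sketch of the same padding trick. It uses a stand-in, `toy_normalize_01()`, which we assume behaves like formattable's `normalize()` with its default 0 to 1 range; both function names in the sketch are hypothetical:

```{r}
# Hypothetical stand-in for formattable::normalize(): rescale values to [0, 1]
toy_normalize_01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Same padding trick as xnormalize(): append -max(|x|) and +max(|x|) so the
# range is symmetric about zero (0 then maps to 0.5), normalise, and drop
# the two padding values again
toy_xnormalize <- function(x) {
  m <- max(abs(x))
  toy_normalize_01(c(x, -m, m))[seq_along(x)]
}

toy_xnormalize(c(0, 2, -1))  # gives 0.5, 1, 0.25
toy_xnormalize(c(0, 1, 4))   # gives 0.5, 0.625, 1
```

The second call shows why, for values rebased against the ultimate (which can't go below zero), only the upper half of the colour ramp, from the midpoint colour upwards, is ever used. Note that an all-zero input would give `NaN` values, because the padded range is then zero.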

We can use it to colour the cells as a block (although we note that values rebased against the ultimate can never go below zero). Paler colours are better (closer to the ultimate):

```{r}
formattable(ultimate_between_split, align='c',
            list(area(col = 2:7) ~ color_tile2(c("red",'white',
                                                 "forestgreen")),
                 entryId=FALSE))
```

We see that SOL was flying from the second split onwards, getting from one split to another in pretty much the fastest time after a relatively poor start.

The variation in columns may also have something interesting to say. SOL somehow made time against pretty much everyone between splits 4 and 5, but in the other sections (apart from the short final section to the finish) there is quite a lot of variability. Checking this view against a split-sectioned route map might help us understand whether there were particular features of the route that might explain these differences.

Let's compare that chart with one showing how each driver's accumulated stage time compares with the accumulated ultimate section times:

```{r}
formattable(ultimate_accumulated, align='c',
            list(area(col = 2:7) ~ color_tile2(c("red",'white',
                                                 "forestgreen")),
                 entryId=FALSE))
```

Here, we see that TAN recorded the best time compared to the ultimate time, as calculated from the sum of the best split section times, but was still off the ultimate pace: it was his first split that made the difference.
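How a driver can post the best actual time yet still be off the ultimate pace is easy to see in a small worked example. In this base R sketch with made-up section times (all `toy_` names are hypothetical), the ultimate accumulated time at each split is the running sum of the best section times:

```{r}
# Hypothetical section (split-to-split) durations in seconds:
# rows are drivers, columns are sections in route order
toy_section_times <- matrix(c(60, 70, 65,
                              58, 74, 66,
                              61, 69, 64),
                            nrow = 3, byrow = TRUE,
                            dimnames = list(c("AAA", "BBB", "CCC"),
                                            c("split_1", "split_2", "split_3")))

# Ultimate section times: the best time recorded over each section
toy_ultimate_section <- apply(toy_section_times, 2, min)

# Accumulated ultimate time at each split point
toy_ultimate_acc <- cumsum(toy_ultimate_section)

# Each driver's accumulated time at each split point (drivers as rows)
toy_driver_acc <- t(apply(toy_section_times, 1, cumsum))

# Gap to the accumulated ultimate; no driver can be ahead of it
toy_gap_to_ultimate <- sweep(toy_driver_acc, 2, toy_ultimate_acc, "-")
toy_gap_to_ultimate
```

Here CCC has the best actual time at the final split but is still 3 seconds off the ultimate, because BBB's quick first section and CCC's later sections are never combined in any single real run.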

Finally, let's see how the drivers' actual elapsed times compared with the best recorded elapsed time at each split:

```{r}
formattable(ultimate_actual, align='c',
            list(area(col = 2:7) ~ color_tile2(c("red",'white',
                                                 "forestgreen")),
                 entryId=FALSE))
```

Here, we see that TAN led the stage at each split point based on actual accumulated time.

### Using Sparklines to Summarise Rebased Deltas

A quick way of summarising rebased times in a more space-efficient way is to use a sparkline. As we have seen previously, sparklines can be added as an extra column alongside a row of data, or used as a quick, indicative visual summary of a row of values.

Let's create sparklines summarising each row of the above tables.

First, the ultimate between split rebase:

```{r}
ultimate_between_split_spk = ultimate_between_split %>%
                                # Make long: one row per driver per split
                                gather(key ="Stage",
                                       value ="Gap", split_names) %>%
                                group_by(code) %>%
                                # Negate the gaps so time lost plots below the axis
                                summarize(ultimate_section = spk_chr(-Gap,
                                                         type ="bar"))

spark_df(ultimate_between_split_spk)
```
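Behind each sparkline is simply one numeric vector per driver, in split order. Here is a minimal base R sketch of that reshaping on a hypothetical table of rebased gaps, negating the values as the `spk_chr(-Gap, ...)` calls do (we assume so that time lost is drawn as a bar below the zero line). The `toy_` names are made up:

```{r}
# Hypothetical rebased gaps (seconds) in wide form: one row per driver
toy_gaps_wide <- data.frame(code    = c("AAA", "BBB"),
                            split_1 = c(0.0, 1.2),
                            split_2 = c(0.4, 0.0),
                            split_3 = c(2.1, 0.3))
toy_gap_cols <- c("split_1", "split_2", "split_3")

# One numeric vector per driver, keeping the split (column) order,
# with the gaps negated as in spk_chr(-Gap, ...)
toy_spark_values <- lapply(setNames(seq_len(nrow(toy_gaps_wide)),
                                    toy_gaps_wide$code),
                           function(i) -unlist(toy_gaps_wide[i, toy_gap_cols]))
toy_spark_values
```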

How about for the accumulated ultimate split durations?

```{r}
ultimate_accumulated_spk = ultimate_accumulated %>%
                              gather(key ="Stage",
                              value ="Gap", split_names) %>%
                              group_by(code) %>%
                              summarize(ultimate_acc = spk_chr(-Gap,
                                                               type ="bar"))

spark_df(ultimate_accumulated_spk)
```

And finally, compared to the actual best recorded elapsed time at each split:

```{r}
ultimate_actual_spk = ultimate_actual %>%
                          gather(key ="Stage",
                                 value ="Gap", split_names) %>%
                          group_by(code) %>%
                          summarize(ultimate_actual = spk_chr(-Gap,
                                                              type ="bar"))

spark_df(ultimate_actual_spk)
```


<!--chapter:end:split-times.Rmd-->

```{r cache = T, echo = F, message=F}
knitr::opts_chunk$set(error = TRUE)
knitr::opts_chunk$set(fig.path = "images/finding-splits-pace-")
```
# Visualising Pace Across Splits

We have already seen how we can perform pace calculations on stage level data and use pace maps and off-the-pace charts to visualise pace over the course of a rally.

But in WRC rallies at least, the stages are often long enough, and the promoter well resourced enough, to merit the collection of split data at various split points along a stage. So in this chapter, we'll review how we can create pace charts and apply the techniques to plotting progress *within* a stage, across stage splits.

## Load Base Data

As ever, load in the helper functions:

```{r message=F, warning=F}
source('code/wrc-api.R')
source('code/wrc-wrangling.R')
source('code/wrc-charts.R')
```

And get the base data:

```{r}
s = get_active_season()
eventId = get_eventId_from_name(s, 'arctic')

itinerary = get_itinerary(eventId)
sections = get_sections(itinerary)
stages = get_stages(sections)
stages_lookup = get_stages_lookup(stages)

# Quick Lookups
stage_list = get_stage_list(stages)
stage_codes = stages$code

# Driver details
entries = get_rally_entries(eventId)
cars = get_car_data(entries)
```

Get a sample stage ID and associated splits:

```{r}
# Get example stage ID
stageId = stages_lookup[['SS3']]

# Get splits for the stage
splits = get_splits(eventId, stageId)
splits_locations = get_split_locations(splits)
splits_list = splits_locations$splitPointId
split_names = splits_locations$splitname

# Get wide format data
splits_wide = get_splits_wide(splits) %>%
                relabel_times_df2(splits_list, cars, typ='split')

splits_wide %>% head(2)
```

Get long form splits data for one or more stages, in this case, just a single stage:

```{r}
splits_long = get_multi_split_times(stageId)
```

### Obtaining Split Distances

We can find the length of each split section as the difference between consecutive cumulative split distances. Let's augment *splits_locations* with these section distances, as well as with the distance at which each section starts:

```{r}
splits_locations$start_dist = lag(splits_locations$distance,
                                  default=0)

splits_locations$section_dist = c(splits_locations$distance[1],
                                  diff(splits_locations$distance))

splits_locations
```
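The same two calculations can be sketched in base R on a hypothetical set of cumulative split distances (the `toy_` names and distances are made up): shifting the distances along by one gives each section's start, mirroring `lag(..., default=0)`, and differencing them gives each section's length:

```{r}
# Hypothetical cumulative distances (km) from the stage start to each split
toy_split_names <- c("split_1", "split_2", "split_3")
toy_split_dist  <- c(2.5, 6.1, 10.0)

# Start distance of each section: the previous split's distance (0 for the first)
toy_start_dist <- c(0, head(toy_split_dist, -1))

# Length of each section: difference between consecutive cumulative distances
toy_section_dist <- diff(c(0, toy_split_dist))

data.frame(splitname = toy_split_names, start_dist = toy_start_dist,
           distance = toy_split_dist, section_dist = toy_section_dist)
```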

We can also retrieve these section distances into a named vector, labelled by split name (or, alternatively, by *splitPointId*):

```{r}
split_distances = splits_locations$section_dist

# Label distances using split names
names(split_distances) = split_names

# Alternatively, label the values using splitPointId
#names(split_distances) = splits_locations$splitPointId

split_distances
```

Recall that the split points do not include the final timing line (the finish), so to obtain a complete set of section distances we also need the overall stage distance:

```{r}
stage_dist = stages[stages['stageId']==stageId,'distance']
stage_dist
```

The complete set of intermediate distances is then:

```{r}
full_split_distances = c(split_distances, stage_dist-sum(split_distances))

names(full_split_distances) = c(split_names, 'total')
  
full_split_distances
```
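A minimal base R sketch of the same step, with made-up distances, shows that the extra value is the length of the final section from the last split point to the finish. It is labelled `finish` here (the code above labels it `total`), and the `toy_` names are hypothetical:

```{r}
# Hypothetical section lengths (km) up to the last split, and the stage length
toy_sections <- c(split_1 = 2.5, split_2 = 3.6, split_3 = 3.9)
toy_stage_km <- 14.2

# The final section runs from the last split point to the finish line
toy_full_sections <- c(toy_sections, finish = toy_stage_km - sum(toy_sections))
toy_full_sections
```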

## Calculating Splits Pace

To calculate pace between two split points we need to get the elapsed time between those two points as well as the distance between split points.

We can obtain the split section durations by finding the differences between consecutive columns of the wide format dataframe of elapsed times, using the `get_split_duration()` function we created previously:

```{r}
#split_cols = get_split_cols(splits)

split_durations_wide = get_split_duration(splits_wide, split_names,
                                          id_col='code')

split_durations_wide %>% head(3)
```
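We haven't reproduced `get_split_duration()` here, but the underlying idea can be sketched in base R: each section duration is the elapsed time at a split minus the elapsed time at the previous split, with the stage start taken as zero. The data and `toy_` names below are hypothetical:

```{r}
# Hypothetical accumulated elapsed times (seconds) at each split point
toy_elapsed_splits <- matrix(c(60, 130, 195,
                               58, 132, 198),
                             nrow = 2, byrow = TRUE,
                             dimnames = list(c("AAA", "BBB"),
                                             c("split_1", "split_2", "split_3")))

# Elapsed time at the previous split point for each column (0 at the start)
toy_previous <- cbind(0, toy_elapsed_splits[, -ncol(toy_elapsed_splits),
                                            drop = FALSE])

# Section duration: elapsed time at this split minus that at the previous one
toy_section_durations <- toy_elapsed_splits - toy_previous
toy_section_durations
```

As a check, each driver's section durations sum back to their elapsed time at the last split.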

We can then find the pace by dividing the split section times through by the split distances:

```{r}
section_pace_wide = split_durations_wide

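# Use a loop variable name that doesn't overwrite the season data held in `s`
for (split_name in as.character(split_names)) {
  section_pace_wide[[split_name]] = section_pace_wide[[split_name]] /
                                      split_distances[[split_name]]
}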

section_pace_wide %>% head(2)
```
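Without the loop, the same division can be written with base R's `sweep()`, shown here on made-up data. Assuming times in seconds and distances in kilometres, the resulting pace is in seconds per km. Note that `sweep()` matches the distances to the columns by position, not by name, so the two must be in the same order. The `toy_` names are hypothetical:

```{r}
# Hypothetical section durations (seconds): rows are drivers,
# columns are split sections in route order
toy_durations <- matrix(c(60, 70, 65,
                          58, 74, 66),
                        nrow = 2, byrow = TRUE,
                        dimnames = list(c("AAA", "BBB"),
                                        c("split_1", "split_2", "split_3")))

# Hypothetical section lengths (km), in the same order as the columns
toy_section_km <- c(split_1 = 2.5, split_2 = 3.5, split_3 = 3.25)

# Pace: divide each column by its section length (seconds per km)
toy_section_pace <- sweep(toy_durations, 2, toy_section_km, "/")
round(toy_section_pace, 2)
```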

## Visualising the Splits Pace

To visualise the pace over each of the split sections, we can use exactly the same techniques that we used to visualise the stage pace, including pace maps and off-the-pace charts.

There are several different ways in which we might try to visualise pace. First, we can visualise absolute or rebased pace. Second, we can visualise pace *within* sections, using the times taken to get from one split point to the next, or across the stage as a whole using the accumulated stage time.

### Pace Over Each Section

One quick way of inspecting the pace over each section is to use a box plot. First, let's cast the pace data to long form, adding in the section distance information:

```{r}
section_pace_long = section_pace_wide %>% 
                  head(10) %>%
                gather(splitname, pace, split_names) %>%
                merge(splits_locations[,c('splitname',
                                          'start_dist', 'distance')],
                          by='splitname')

section_pace_long %>% head(3)
```

```{r pace-over-splits-box, warning=FALSE}
# Crudely drop extreme outlying pace values before plotting
ggplot(section_pace_long[section_pace_long$pace<40,],
       aes(x=distance, y=pace)) +
    geom_boxplot(aes(group=distance))
```
This suggests that the section between the first and second split may be quite technical, and the final sections much faster.

*Comparing section times against route metrics as described in [__Visualising Rally Stages__](https://rallydatajunkie.com/visualising-rally-stages/) will be the focus of a future unbook. Comparing manufacturer performance against different section and stage route types might also be worth further investigation.*

### Splits Sections Pace Maps

To generate the pace map, let's first rebase the section pace values with respect to a specified driver:

```{r}
example_driver = section_pace_wide[2,]$code

section_pace_wide_rebased = rebase(section_pace_wide, example_driver,
                                   split_names, id_col='code')

section_pace_wide_rebased %>% head(3)
```
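Conceptually, rebasing on a driver subtracts that driver's value from every driver's value, split by split, so the reference driver's row becomes all zeros. Here is a minimal base R sketch with made-up pace data. `toy_rebase_on()` is a hypothetical helper, not the `rebase()` function used above. In this sketch, positive values mean a driver was slower (more seconds per km) than the reference; `rebase()` may use the opposite convention, not least because it takes a `flip` argument:

```{r}
# Hypothetical section pace (seconds per km): rows are drivers
toy_pace <- matrix(c(24.0, 20.0, 20.0,
                     23.2, 21.0, 20.5,
                     25.0, 19.5, 21.0),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("AAA", "BBB", "CCC"),
                                   c("split_1", "split_2", "split_3")))

# Subtract the reference driver's pace from every row, column by column
toy_rebase_on <- function(m, driver) sweep(m, 2, m[driver, ], "-")

toy_pace_rebased <- toy_rebase_on(toy_pace, "BBB")
round(toy_pace_rebased, 2)
```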

To plot the pace map, we need to get the data into a long format:

```{r}
section_pace_long_rebased = section_pace_wide_rebased %>% 
                  head(10) %>%
                gather(splitname, pace,
                       as.character(split_names)) %>%
                merge(splits_locations[,c('splitname',
                                          'start_dist', 'distance')],
                      by='splitname')

section_pace_long_rebased %>% head()
```

We can now view the rebased pace over the splits:

```{r example_pace_splits, warning=FALSE}
section_pace_long_rebased %>%
    pace_map( xstart='start_dist',
         drivers=c('KAT','ROV'),
         xend='distance', id_col='code', lines=FALSE, label_dodge=2)
```

### Off-the-Pace Splits Pace Mapping

To review the off-the-pace performance over the splits on a stage, we can use the off-the-pace chart function applied to rebased elapsed time data.

Let's get some rebased data using the accumulated stage time at each split. For now, as a hack fix, we flip the sense of the rebase (`flip=TRUE`) until such time as the off-the-pace chart function is better behaved:

```{r}
wide_splits_rebased = splits_wide %>%
                      head(10) %>%
                        rebase(example_driver,
                               splits_locations$splitname,
                               id_col='code', flip=TRUE)

wide_splits_rebased %>% head(3)
```

We can convert this to long form and add in distance information:

```{r}
long_splits_rebased = wide_splits_rebased %>%
  pivot_longer(splits_locations$splitname,
                   names_to = "splitname",
                   values_to = "sectionDurationS") %>%
  merge(splits_locations[,c('splitname','distance')],
         by='splitname')

long_splits_rebased %>% head(3)
```

So that each driver's line starts from the origin of the chart, it's convenient to add a zeroed data point (zero distance, zero time difference) for each driver. Let's create a dataframe to help us add those data points:

```{r}
# One row per driver at the stage start: zero distance, zero time difference
zero_df = data.frame(code=unique(long_splits_rebased$code)) %>%
            mutate(distance=0, sectionDurationS=0, splitname = 'split_0')
```

And add them in:

```{r}
long_splits_rebased = bind_rows(long_splits_rebased, zero_df)
```

The off-the-pace chart is intended to show how much time is lost over the course of a stage, the gradient of the line in each section being an indicator of the pace differential within that section (i.e. between two consecutive split points).

The off-the-pace chart is most easily generated from a long dataframe containing the accumulated stage time rather than the sectional times.

For example, using the long form data we created above, we can co-opt the off-the-pace chart function to render the rebased times for us:

```{r rebased_splits_pace, warning=FALSE}
long_splits_rebased %>%
        off_the_pace_chart(dist='distance',
                           t='sectionDurationS',
                           label_typ='ggrepel',
                           code='code')
```
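The gradient of each section of an off-the-pace line can also be computed directly. It is the change in accumulated time gap divided by the section length. Here is a minimal base R sketch with made-up values, where the gap is measured as time behind a reference driver, so positive gradients mean time lost. The zero point at the start plays the same role as *zero_df*, and the `toy_` names are hypothetical:

```{r}
# Hypothetical accumulated time behind a reference driver (seconds) at each
# split point, and the cumulative distance (km) of each split point
toy_gap  <- c(split_1 = 1.0, split_2 = 4.0, split_3 = 4.5)
toy_dist <- c(split_1 = 2.5, split_2 = 6.0, split_3 = 10.0)

# Prepend the zeroed start point (distance 0, gap 0)
toy_gap0  <- c(0, toy_gap)
toy_dist0 <- c(0, toy_dist)

# Gradient of the off-the-pace line over each section:
# seconds lost (positive) or gained (negative) per km within that section
toy_section_gradient <- diff(toy_gap0) / diff(toy_dist0)
round(toy_section_gradient, 3)
```

In this made-up example, most of the time is lost in the second section, where the line is steepest.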


<!--chapter:end:finding-splits-pace.Rmd-->