vignettes/articles/find-dives.Rmd

---
title: "Using find_dives"
author: "tagtools project team"
date: "`r Sys.Date()`"
output: 
  html_document:
    code_folding: hide
vignette: >
  %\VignetteIndexEntry{find-dives}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(tagtools)
```

Welcome to this vignette! On behalf of the team behind tagtools, thanks for taking some time to get to know this package. We hope it is useful to you. 

In this vignette you will learn to use `find_dives` to gain insight about a depth profile, calculate the mean dive duration and mean dive depth, and mark the beginnings and endings of dives. 

*Estimated time for this vignette: 20 minutes.*

These practicals all assume that you have R/Rstudio, some basic experience working with them, and can execute provided code, making some user-specific changes along the way (e.g. to help R find a file you downloaded). We will provide you with quite a few lines. To boost your own learning, you would do well to try and write them before opening what we give, using this just to check your work.

Additionally, be careful when copy-pasting special characters such as \_underscores\_ and 'quotes'. If you get an error, one thing to check is that you have just a single, simple underscore, and `'straight quotes'`, whether `'single'` or `"double"` (rather than “smart quotes”). 

# Load the test data set

Load the test dataset `mn12_186a_raw.nc`. This dataset has already been converted from the source file that was offloaded from the tag into a NetCDF file. In doing so, some metadata was gleaned from the file and added to the data. Other metadata was added by hand. If you want to run this example, download the "mn12_186a_raw.nc" file from https://github.com/animaltags/tagtools_data and change the file path to match where you've saved the files 

Use load_nc to load a NetCDF file:

```{r, echo = TRUE, eval = FALSE}
library(tagtools)
MN_file_path <- "nc_files/mn12_186a_raw.nc"
MN <- load_nc(MN_file_path)
```

```{r, echo = FALSE, eval = TRUE}
MN_file_path <- "nc_files/mn12_186a_raw.nc"
MN <- load_nc(MN_file_path)
```

This creates an animaltag list object MN in your workspace. You can view it in the Environment tab if working in RStudio, or in the command line type:

```{r, eval = FALSE, echo = TRUE}
names(MN)
str(MN$A)
# not run because output is very long! see the whole STRucture of MN:
# str(MN)
# shorter outline version:
str(MN, max.level = 1)
```

<button class="btn btn-primary" data-toggle="collapse" data-target="#inspect_MN"> Show/Hide Results </button>  
<div id="inspect_MN" class="collapse"> 
```{r, eval = TRUE, echo = FALSE}
print("Results for `names(MN)`:")
print("------------------------")
names(MN)
print("Results for `str(MN$A)`:")
print("------------------------")
str(MN$A)
# not run because output is very long! see the whole STRucture of MN:
# str(MN)
# shorter outline version:
print("Results for `str(MN$A)`:")
print("------------------------")
str(MN, max.level = 1)
```
</div>

You should see that lists `A`, `M`, `P`, `S`, `T` and `info` are contained within the list MN.

# Exercise: Calculate the mean duration of dives deeper than 5m 

Our goal with these data is to calculate the mean duration of dives deeper than 5 m. If you can think of a way to do this already, please go ahead and try! You can then compare your answer to the step-by-step procedure below.

As with all raw depth data, there are some problems with this dive profile. Try and plot the depth profile and the temperature. Then, see if you can find evidence for each of these in the plot: 

1. Incorrect calibration of the sensor 
2. Occasional outliers 
3. Coarse depth resolution 
4. Temperature sensitivity

<button class="btn btn-primary" data-toggle="collapse" data-target="#ans_prob_with_prof"> _Show/Hide Answers_ </button>  
<div id="ans_prob_with_prof" class="collapse">
*1. Pay attention to the depths reported... While it looks like a dive profile, mostly, what looks like the "surface" is actually around some *$-10$ m *(that is, ten meters high in the air)! We know that humpback whales do not, in fact, go ten meters in the sky routinely. This suggests the sensor has not been well-calibrated.*
*2. Occasionally, there are some peaks that go higher yet than these ten meters high in the sky momentarily (more like 15 meters). These are outliers.*
*3. Zoom in to the shallow part of the dive to notice this. The individual measurements are only precise to the meter; thus it will appear to jump back and forth more suddenly than we would expect for a relaxed whale at the surface.* 
<!-- When y-axis limits in plott work: demonstrate with shallow depth. -->
*4. The temperature gradually rises as the recording nears the end, then spikes down and up. This is accompanied by a gradually decreasing depth from ten meters in the sky to the surface. Together, the two trends strongly suggest that the depth data has been affected by the changing temperature as the tag sits near the surface. *
</div>

## Hints & Tips
1. Look in `info` to find what species the data come from. Are the depth values reasonable for this species?
2. Zoom in and see what size depth steps there are in the data.
3. Use `plott` to plot both the depth and temperature:

```{r, echo = TRUE, eval = FALSE}
plott(X=list(Depth=MN$P, Temperature=MN$T), r=c(TRUE,FALSE))
# r=c(TRUE,FALSE) tells plot to reverse the y-axis for the Depth data (so that it looks like a dive profile), but not for the Temperature data (which would be silly). 
```

<button class="btn btn-primary" data-toggle="collapse" data-target="#plott_MN"> Show/Hide Results </button>  
<div id="plott_MN" class="collapse"> 

```{r, echo = FALSE, eval = TRUE}
plott(X=list(Depth=MN$P, Temperature=MN$T), r=c(TRUE,FALSE))
```

</div>

## What to do about periods of data when the tag is not on the animal

Not all tags have a way to start logging as soon as the tag has been deployed on the animal. Often data logging is started by a time trigger or alarm, and the researcher has to make a guess as to when the tag will be deployed to set its start time appropriately. Often this means that a tag is logging data before it is put on an animal. 

Equally, tags have no means of detecting when they release from the tagged animal. As a consequence, they may continue to log data after they release. In most cases, the logged data from before and after deployment has no use. To reduce the data to just the periods when the tag is on the animal, use the tool `crop`:

```{r, echo = TRUE, eval = FALSE}
Pc = crop(MN$P)
```

<button class="btn btn-primary" data-toggle="collapse" data-target="#Pc"> Show/Hide Results </button>  
<div id="Pc" class="collapse"> 

```{r, echo = FALSE, eval = TRUE}
Pc = crop(MN$P)
```

</div>

```{r, echo = FALSE, eval = TRUE}
Pc = crop_to(X = MN$P, sampling_rate = 1, tcues = c(0, 25306), times = NULL)
```

This displays an interactive depth plot. Follow the instructions to select the obvious diving section of the data and then click finish. The function returns a new data structure which contains just the selected part of the dive profile. The resulting sensor data list also contains fields that document what you just did. To see them:

```{r, echo = TRUE, eval = FALSE}
Pc$history
str(Pc, max.level = 1)
```

<button class="btn btn-primary" data-toggle="collapse" data-target="#PcHistory"> Show/Hide Results </button>  
<div id="PcHistory" class="collapse"> 

```{r, echo = FALSE, eval = TRUE}
print("Results for `Pc$history`:")
print("-------------------------------------")
Pc$history
print("Results for `str(Pc, max.level = 1)`:")
print("-------------------------------------")
str(Pc, max.level = 1)
```

</div>

The `$history` field keeps track of the operations that you perform on a data structure. This helps with traceability if you make the processed data available in an archive. The `$crop` and `$start_time` fields show how the original data was changed: the start_time is with respect to the field ‘dephist_device_datetime_start’ in the `$info` structure which says when the tag recording started. Use `plott` to plot Pc to make sure you cropped it correctly.

```{r, echo = TRUE, eval = FALSE}
plott(X = list(DepthCropped=Pc), r = TRUE)

```

Note that the results from here on out will look a little different depending on where exactly you've cropped your data, but it should come to essentially the same thing.

<button class="btn btn-primary" data-toggle="collapse" data-target="#Pc_plott"> Show/Hide Results </button>  
<div id="Pc_plott" class="collapse"> 
```{r, echo = TRUE, eval = FALSE}
plott(X = list(DepthCropped=Pc), r = TRUE)
```
</div>

## Removing outliers. 
Outliers or spikes in the data may result from errors in the tag or poor sensor performance under rapidly changing environmental conditions. For example in this data set, rapid changes in temperature and pressure as the animal surfaces cause small glitches in the data. These are not representative of the animal’s  behaviour so we need to remove them. A good way to do this is with a median filter. Type:

```{r, echo = TRUE, eval = FALSE}
?median_filter
```

to find out how this function works. You call it using:

```{r, echo = TRUE, eval = TRUE} 
Pcm = median_filter(Pc,n=3)
```

Your variable `Pcm` now contains the median-filtered, cropped depth data. Check its history to verify that the median filtering has been added. Compare it against the unfiltered data using:

```{r, echo = TRUE, eval = FALSE} 
plott(X=list(Pc=Pc, Pcm=Pcm), r=c(TRUE,TRUE)) 
```

<button class="btn btn-primary" data-toggle="collapse" data-target="#Pc_pcm_plott"> Show/Hide Results </button>  
<div id="Pc_pcm_plott" class="collapse"> 
```{r, echo = FALSE, eval = TRUE} 
plott(X=list(Pc=Pc, Pcm=Pcm), r=c(TRUE,TRUE)) 
```
</div>

This plots `Pc` in the upper panel and `Pcm` in the lower one, and reverses the y-axis labeling on both plots (by setting `r` to TRUE both times).

## Correcting pressure offsets & temperature effects
The next step is to correct the ‘0’ pressure offset of the depth sensor (so that the animal is not 10 m out of the water when it is really at the surface). We can also compensate for temperature at the same time. To do this we have to first crop the temperature data to match the pressure data. You can do this using:

```{r, echo = TRUE, eval = TRUE}
Tc <- crop_to(MN$T,tcues=Pc$crop)
```

This uses the crop information stored in Pc to do the same operation on MN$T. Plot them together to confirm that the time intervals (horizontal ticks) are the same:

```{r, echo = TRUE, eval = FALSE}
plott(X=list(Pcm=Pcm, Tc=Tc), r=c(TRUE,FALSE))
```

<button class="btn btn-primary" data-toggle="collapse" data-target="#Pc_Tc_plott"> Show/Hide Results </button>  
<div id="Pc_Tc_plott" class="collapse"> 
```{r, echo = FALSE, eval = TRUE}
plott(X=list(Pcm=Pcm, Tc=Tc), r=c(TRUE,FALSE))
```
</div>

The tag toolbox has a function to correct pressure data called ‘fix_pressure’. Type

```{r, echo = TRUE, eval = FALSE}
? fix_pressure
```

to find out what it does and what assumptions it makes about the data. Use this function by calling:

```{r, echo = TRUE, eval = TRUE}
Pcmf <- fix_pressure(Pcm, Tc)
```

Compare the compensated dive profile to the uncompensated cropped ones using plott. (The code below will produce three plots, which might make them a bit small. Hit "Zoom" to view them in a larger window.)

```{r, echo = TRUE, eval = FALSE}
plott(X=list(Pc=Pc, Pcm=Pcm, Pcmf=Pcmf$p), r=c(TRUE,TRUE,TRUE))
```

<button class="btn btn-primary" data-toggle="collapse" data-target="#Pc_Pcm_Pcmf_plott"> Show/Hide Results </button>  
<div id="Pc_Pcm_Pcmf_plott" class="collapse"> 

```{r, echo = FALSE, eval = TRUE}
plott(X=list(Pc=Pc, Pcm=Pcm, Pcmf=Pcmf$p), r=c(TRUE,TRUE,TRUE))
```

</div>

Which of the problems that we listed above have been taken care of? Any ideas what you could do about the remaining one(s)?

<button class="btn btn-primary" data-toggle="collapse" data-target="#ans_probs_addressed_or_naw"> _Show/Hide Answers_ </button>  
<div id="ans_probs_addressed_or_naw" class="collapse">
*1. Incorrect calibration of the sensor and temperature sensitivity have both been addressed by now, since the whale is actually resting and breathing at the surface, rather than ten meters in the air. Additionally, some of the occasional outliers have been removed by the median filter.* 

*However, none of this has increased the resolution of the depth data. In order to resolve this last issue, ultimately, new data would have to be taken at higher resolution.* 
</div>

## Finding dives & the mean dive duration

To find the mean dive duration for dives over 5 m in depth, you could measure each dive by hand on the `depth` plot (`ginput` is a useful function in Matlab and Octave for measuring data on a plot – there isn’t a great equivalent in R, where interactive plots are not really commonly used). But there is a toolbox function for this called `find_dives`. See the help on this function to find out what it does and what options it has. 

```{r, echo = TRUE, eval = FALSE}
? find_dives
```

To find dives deeper than 5 m in your compensated dive data, type:

```{r, echo = TRUE, eval = FALSE}
d <- find_dives(Pcmf$p,mindepth=5)
str(d, max.level = 1)
```

<button class="btn btn-primary" data-toggle="collapse" data-target="#find_dives_d"> Show/Hide Results </button>  
<div id="find_dives_d" class="collapse"> 
```{r, echo = FALSE, eval = TRUE}
d <- find_dives(Pcmf$p,mindepth=5)
str(d, max.level = 1)
```
</div>

`d` should return a data frame with the start, end, and maximum depth of about 51 dives (depending on where you cropped the data). How can you get the mean dive duration & mean (maximum) dive depth from this structure? Code below provides one possible answer.

```{r, echo = TRUE, eval = FALSE}
total_dive_duration <- matrix(0)
for(n in 1:nrow(d)) {
  total_dive_duration <- total_dive_duration + d[n,2] - d[n,1]
  }
mean_dive_duration <- total_dive_duration/nrow(d)
mean_dive_duration
# since this is using data from find_dives, the mean duration is in seconds
# and mean depth
total_dive_depth <- matrix(0)
for(n in 1:nrow(d)) {
  total_dive_depth <- total_dive_depth + d[n,3]
}
mean_dive_depth <- total_dive_depth/nrow(d)
mean_dive_depth
```

When you have got the mean dive depth, try plotting the start and end of the dives on the depth plot:

```{r, echo = TRUE, eval = FALSE}
plott(X=list(Pcmf=Pcmf$p), r=TRUE)
points(d$start/(3600*24),rep(0,nrow(d)),col='green', pch=19)
points(d$end/(3600*24),rep(0,nrow(d)),col='red', pch=17)
```

<button class="btn btn-primary" data-toggle="collapse" data-target="#start_end_dives"> Show/Hide Results </button>  
<div id="start_end_dives" class="collapse"> 
```{r, echo = FALSE, eval = TRUE}
plott(X=list(Pcmf=Pcmf$p), r=TRUE)
points(d$start/(3600*24),rep(0,nrow(d)),col='green', pch=19)
points(d$end/(3600*24),rep(0,nrow(d)),col='red', pch=17)
```
</div>

Again, this plot might be rather small. As a result it might be tricky to make sense of the markers that designate the starts and ends of dives. You should be able to click Zoom to view the plot in a larger window. 

Note: if you cropped the time such that the units of the x-axis are not in days, you will have to adjust the multipliers in the `points` code accordingly. In the example above, the start and end times returned by find_dives are in seconds so we needed to divide them by 3600*24 to match the unit (days) automatically selected for time by `plott`.

# Review

Great work! You've learned how to use `find_dives`. 

*If you'd like to continue working through these practicals, `dive-stats` might be a good option if you're especially interested in analyzing individual dives. Otherwise, consider `fine-scale-tracking` or `mahalanobis-distance`.* 

```{r, echo = TRUE, eval = FALSE}
vignette('dive-stats')
vignette('fine-scale-tracking')
vignette('mahalanobis-distance')
```

***

Animaltags home pages: http://animaltags.org/ (old), https://animaltags.netlify.app/ (new), https://github.com/stacyderuiter/TagTools (for source code)