# Ocean Temperature Data Exploring

## Setup

Analysis and visualization was done using R and various packages. The following is the script used to generate 2 scatterplot graphs.

In [1]:
library(tidyverse)
library(lubridate) 
library(ggplot2)
library(plotly)
options(repr.plot.width=10, repr.plot.height=6)

Registered S3 methods overwritten by 'ggplot2':
  method         from 
  [.quosures     rlang
  c.quosures     rlang
  print.quosures rlang
Registered S3 method overwritten by 'rvest':
  method            from
  read_xml.response xml2
-- Attaching packages --------------------------------------- tidyverse 1.2.1 --
v ggplot2 3.1.1       v purrr   0.3.2  
v tibble  2.1.1       v dplyr   0.8.0.1
v tidyr   0.8.3       v stringr 1.4.0  
v readr   1.3.1       v forcats 0.4.0  
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()

Attaching package: 'lubridate'

The following object is masked from 'package:base':

    date


Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout



## Reading and Wrangling Data

Temperature data is found in the `"data"` folder, while coordinates (and the time recorded) are in the `"data/nav"` folder.

All of the temperature data was cleaned using a `Python` script, located in the `data_cleaning` folder. No packages were used, and can be used as long as a v3.9 Python is installed (anything above or below is untested) and the scripts are pointed to the right data sources. *Unfortunately, the version where Jupyter runs on Windows (v3.7 Python) and the data cleaner (v3.9 Python) is different.*

- `data/temp_processed_summarized.csv`: mean temp over min
- `data/temp_processed.csv`: all temp
- `data/nav/nav_processed.csv`: mean GPS over min
- `data/nav_temp_joined_processed.csv`: joined mean temp & GPS over min

The dates are all of type `character`, meaning extracting any use without it being a proper `date` type is hard. Therefore, time and date must be formatted.

In [23]:
format_datetime <- function(df) {
  df_new <- df %>%
    # https://www.neonscience.org/resources/learning-hub/tutorials/dc-time-series-subset-dplyr-r
    mutate(date = as.Date(date, format = '%m/%d/%Y')) %>%
    # https://www.tidyverse.org/blog/2021/03/clock-0-1-0/
    mutate(datetime = as.POSIXct(date, "America/Vancouver")) %>%
    mutate(datetime = datetime +hour(time)+ minute(time))
  
  return(df_new)
}

The following are all of the functions needed to clean up 1 temperature file. However, we have quite a few files, and trying to clean and instantiate each by hand is cumbersome. Therefore, we will iterate through all of the files and summarize.

```{attention}
Please keep in mind that the following code blocks will take pretty long to run.
```

In [24]:
clean_SBE45_data <- function(x) {
  read <- read_delim(x, delim = ",", 
                     col_names = c("date", 
                                    "time", 
                                    "int_temp", 
                                    "conductivity",
                                    "salinity",
                                    "sound_vel",
                                    "ext_temp")) %>%
    select(date, time, ext_temp)
  return(read)
}

In [25]:
clean_STT_TSG_data <- function(x) {
  read <- read_delim(x, delim = ",",
                     col_names = c("date",
                                   "time",
                                   "type",
                                   "diff",
                                   "ext_temp",
                                   "int_temp")) %>%
    select(date, time, ext_temp)    
  return(read)
}

In [26]:
clean_temp_data <- function(x) {
  # https://stackoverflow.com/questions/10128617/test-if-characters-are-in-a-string
  if(grepl("SBE45-TSG-MSG", x, fixed = TRUE)) {
    return(clean_SBE45_data(x))
  } else {
    return(clean_STT_TSG_data(x))
  }
}

This is the data before it is summarized by the minute.

In [7]:
# https://stackoverflow.com/questions/11433432/how-to-import-multiple-csv-files-at-once
all_temperature_loaded <- list.files(path = "data/",
             pattern = "*.Raw",
             full.names = T) %>%
  map_df(~clean_temp_data(.))

Parsed with column specification:
cols(
  date = col_character(),
  time = col_time(format = ""),
  int_temp = col_double(),
  conductivity = col_double(),
  salinity = col_double(),
  sound_vel = col_double(),
  ext_temp = col_double()
)
Parsed with column specification:
cols(
  date = col_character(),
  time = col_time(format = ""),
  int_temp = col_double(),
  conductivity = col_double(),
  salinity = col_double(),
  sound_vel = col_double(),
  ext_temp = col_double()
)
"5 parsing failures.
  row       col expected     actual                                          file
61277 ext_temp  a double t215.2754  'data/SBE45-TSG-MSG_20210614-000001_NEW2.Raw'
61375 sound_vel a double s1506.032  'data/SBE45-TSG-MSG_20210614-000001_NEW2.Raw'
75629 ext_temp  a double t215.4430  'data/SBE45-TSG-MSG_20210614-000001_NEW2.Raw'
76575 ext_temp  a double t215.4469  'data/SBE45-TSG-MSG_20210614-000001_NEW2.Raw'
78521 sound_vel a double sv1507.524 'data/SBE45-TSG-MSG_20210614-000001_NEW2.Raw'
"Parsed w

"Parsed with column specification:
cols(
  date = col_character(),
  time = col_time(format = ""),
  int_temp = col_double(),
  conductivity = col_character(),
  salinity = col_double(),
  sound_vel = col_character(),
  ext_temp = col_double()
)
"856 parsing failures.
row          col expected        actual                                          file
188 ext_temp     a double               'data/SBE45-TSG-MSG_20210623-000001_NEW2.Raw'
358 conductivity          embedded null 'data/SBE45-TSG-MSG_20210623-000001_NEW2.Raw'
669 sound_vel             embedded null 'data/SBE45-TSG-MSG_20210623-000001_NEW2.Raw'
675 sound_vel             embedded null 'data/SBE45-TSG-MSG_20210623-000001_NEW2.Raw'
675 ext_temp     a double               'data/SBE45-TSG-MSG_20210623-000001_NEW2.Raw'
... ............ ........ ............. .............................................
See problems(...) for more details.
"Parsed with column specification:
cols(
  date = col_character(),
  time = col_time(format =

"Parsed with column specification:
cols(
  date = col_character(),
  time = col_time(format = ""),
  int_temp = col_double(),
  conductivity = col_double(),
  salinity = col_double(),
  sound_vel = col_double(),
  ext_temp = col_double()
)
"117 parsing failures.
 row          col               expected actual                                          file
5882 salinity     no trailing characters        'data/SBE45-TSG-MSG_20210702-000001_NEW2.Raw'
5963 salinity     no trailing characters        'data/SBE45-TSG-MSG_20210702-000001_NEW2.Raw'
5992 ext_temp     no trailing characters        'data/SBE45-TSG-MSG_20210702-000001_NEW2.Raw'
6054 sound_vel    a double                      'data/SBE45-TSG-MSG_20210702-000001_NEW2.Raw'
6129 conductivity no trailing characters        'data/SBE45-TSG-MSG_20210702-000001_NEW2.Raw'
.... ............ ...................... ...... .............................................
See problems(...) for more details.
"Parsed with column specification:
cols(
  

"Parsed with column specification:
cols(
  date = col_character(),
  time = col_time(format = ""),
  int_temp = col_double(),
  conductivity = col_double(),
  salinity = col_double(),
  sound_vel = col_double(),
  ext_temp = col_double()
)
"10 parsing failures.
  row          col               expected actual                                          file
 9331 conductivity no trailing characters        'data/SBE45-TSG-MSG_20210710-000001_NEW2.Raw'
 9601 salinity     no trailing characters        'data/SBE45-TSG-MSG_20210710-000001_NEW2.Raw'
15851 sound_vel    a double                      'data/SBE45-TSG-MSG_20210710-000001_NEW2.Raw'
42485 ext_temp     a double                      'data/SBE45-TSG-MSG_20210710-000001_NEW2.Raw'
50737 sound_vel    no trailing characters        'data/SBE45-TSG-MSG_20210710-000001_NEW2.Raw'
..... ............ ...................... ...... .............................................
See problems(...) for more details.
"Parsed with column specification:
co

"Parsed with column specification:
cols(
  date = col_character(),
  time = col_time(format = ""),
  int_temp = col_double(),
  conductivity = col_double(),
  salinity = col_double(),
  sound_vel = col_double(),
  ext_temp = col_double()
)
"67 parsing failures.
 row          col               expected actual                                          file
 754 conductivity a double                      'data/SBE45-TSG-MSG_20210719-000001_NEW2.Raw'
3167 ext_temp     a double                      'data/SBE45-TSG-MSG_20210719-000001_NEW2.Raw'
3239 ext_temp     a double                      'data/SBE45-TSG-MSG_20210719-000001_NEW2.Raw'
3720 conductivity no trailing characters        'data/SBE45-TSG-MSG_20210719-000001_NEW2.Raw'
3720 salinity     a double                    s 'data/SBE45-TSG-MSG_20210719-000001_NEW2.Raw'
.... ............ ...................... ...... .............................................
See problems(...) for more details.
"Parsed with column specification:
cols(
  d

"Parsed with column specification:
cols(
  date = col_character(),
  time = col_time(format = ""),
  type = col_character(),
  diff = col_double(),
  ext_temp = col_double(),
  int_temp = col_double()
)
"86147 parsing failures.
row col  expected    actual                                             file
  1  -- 6 columns 7 columns 'data/SST-TSG-Temp-Diff-MSG_20210614-000001.Raw'
  2  -- 6 columns 7 columns 'data/SST-TSG-Temp-Diff-MSG_20210614-000001.Raw'
  3  -- 6 columns 7 columns 'data/SST-TSG-Temp-Diff-MSG_20210614-000001.Raw'
  4  -- 6 columns 7 columns 'data/SST-TSG-Temp-Diff-MSG_20210614-000001.Raw'
  5  -- 6 columns 7 columns 'data/SST-TSG-Temp-Diff-MSG_20210614-000001.Raw'
... ... ......... ......... ................................................
See problems(...) for more details.
"Parsed with column specification:
cols(
  date = col_character(),
  time = col_time(format = ""),
  type = col_character(),
  diff = col_double(),
  ext_temp = col_double(),
  int_temp = col_doubl

"Parsed with column specification:
cols(
  date = col_character(),
  time = col_time(format = ""),
  type = col_character(),
  diff = col_double(),
  ext_temp = col_double(),
  int_temp = col_double()
)
"85615 parsing failures.
row col  expected    actual                                             file
  1  -- 6 columns 7 columns 'data/SST-TSG-Temp-Diff-MSG_20210624-000001.Raw'
  2  -- 6 columns 7 columns 'data/SST-TSG-Temp-Diff-MSG_20210624-000001.Raw'
  3  -- 6 columns 7 columns 'data/SST-TSG-Temp-Diff-MSG_20210624-000001.Raw'
  4  -- 6 columns 7 columns 'data/SST-TSG-Temp-Diff-MSG_20210624-000001.Raw'
  5  -- 6 columns 7 columns 'data/SST-TSG-Temp-Diff-MSG_20210624-000001.Raw'
... ... ......... ......... ................................................
See problems(...) for more details.
"Parsed with column specification:
cols(
  date = col_character(),
  time = col_time(format = ""),
  type = col_character(),
  diff = col_double(),
  ext_temp = col_double(),
  int_temp = col_doubl

"Parsed with column specification:
cols(
  date = col_character(),
  time = col_time(format = ""),
  type = col_character(),
  diff = col_double(),
  ext_temp = col_double(),
  int_temp = col_double()
)
"61634 parsing failures.
row col  expected    actual                                             file
  1  -- 6 columns 7 columns 'data/SST-TSG-Temp-Diff-MSG_20210703-065123.Raw'
  2  -- 6 columns 7 columns 'data/SST-TSG-Temp-Diff-MSG_20210703-065123.Raw'
  3  -- 6 columns 7 columns 'data/SST-TSG-Temp-Diff-MSG_20210703-065123.Raw'
  4  -- 6 columns 7 columns 'data/SST-TSG-Temp-Diff-MSG_20210703-065123.Raw'
  5  -- 6 columns 7 columns 'data/SST-TSG-Temp-Diff-MSG_20210703-065123.Raw'
... ... ......... ......... ................................................
See problems(...) for more details.
"Parsed with column specification:
cols(
  date = col_character(),
  time = col_time(format = ""),
  type = col_character(),
  diff = col_double(),
  ext_temp = col_double(),
  int_temp = col_doubl

"Parsed with column specification:
cols(
  date = col_character(),
  time = col_time(format = ""),
  type = col_character(),
  diff = col_double(),
  ext_temp = col_double(),
  int_temp = col_double()
)
"86225 parsing failures.
row col  expected    actual                                             file
  1  -- 6 columns 7 columns 'data/SST-TSG-Temp-Diff-MSG_20210713-000001.Raw'
  2  -- 6 columns 7 columns 'data/SST-TSG-Temp-Diff-MSG_20210713-000001.Raw'
  3  -- 6 columns 7 columns 'data/SST-TSG-Temp-Diff-MSG_20210713-000001.Raw'
  4  -- 6 columns 7 columns 'data/SST-TSG-Temp-Diff-MSG_20210713-000001.Raw'
  5  -- 6 columns 7 columns 'data/SST-TSG-Temp-Diff-MSG_20210713-000001.Raw'
... ... ......... ......... ................................................
See problems(...) for more details.
"Parsed with column specification:
cols(
  date = col_character(),
  time = col_time(format = ""),
  type = col_character(),
  diff = col_double(),
  ext_temp = col_double(),
  int_temp = col_doubl

"Parsed with column specification:
cols(
  date = col_character(),
  time = col_time(format = ""),
  type = col_character(),
  diff = col_double(),
  ext_temp = col_double(),
  int_temp = col_double()
)
"85740 parsing failures.
row col  expected    actual                                             file
  1  -- 6 columns 7 columns 'data/SST-TSG-Temp-Diff-MSG_20210723-000001.Raw'
  2  -- 6 columns 7 columns 'data/SST-TSG-Temp-Diff-MSG_20210723-000001.Raw'
  3  -- 6 columns 7 columns 'data/SST-TSG-Temp-Diff-MSG_20210723-000001.Raw'
  4  -- 6 columns 7 columns 'data/SST-TSG-Temp-Diff-MSG_20210723-000001.Raw'
  5  -- 6 columns 7 columns 'data/SST-TSG-Temp-Diff-MSG_20210723-000001.Raw'
... ... ......... ......... ................................................
See problems(...) for more details.
"Parsed with column specification:
cols(
  date = col_character(),
  time = col_time(format = ""),
  type = col_character(),
  diff = col_double(),
  ext_temp = col_double(),
  int_temp = col_doubl

In [27]:
head(all_temperature_loaded)
summary(all_temperature_loaded)

date,time,ext_temp
06/13/2021,19:42:03,19.5731
06/13/2021,19:42:04,19.524
06/13/2021,19:42:05,19.4795
06/13/2021,19:42:06,19.4724
06/13/2021,19:42:07,19.4734
06/13/2021,19:42:08,19.4718


     date               time             ext_temp    
 Length:7268078     Length:7268078    Min.   : 1.00  
 Class :character   Class1:hms        1st Qu.:13.37  
 Mode  :character   Class2:difftime   Median :15.04  
                    Mode  :numeric    Mean   :15.17  
                                      3rd Qu.:16.68  
                                      Max.   :24.22  
                                      NA's   :2317   

In [33]:
all_temperature_cleaned <- all_temperature_loaded %>%
  filter(!is.na(ext_temp)) %>%
  filter(ext_temp > 2) %>%
  format_datetime()

write_csv(all_temperature_cleaned, "data/temp_processed.csv")

all_temperature <- group_by(all_temperature_cleaned, datetime) %>%
    summarize(mean_ext = mean(ext_temp, na.rm = TRUE))

write_csv(all_temperature, "data/temp_processed_summarized.csv")

In [10]:
head(all_temperature)
summary(all_temperature)

datetime,mean_ext
2021-06-12 17:00:20,18.94975
2021-06-12 17:00:21,18.81057
2021-06-12 17:00:22,19.17494
2021-06-12 17:00:23,19.26122
2021-06-12 17:00:24,19.28124
2021-06-12 17:00:25,19.20406


    datetime                      mean_ext    
 Min.   :2021-06-12 17:00:20   Min.   :10.17  
 1st Qu.:2021-06-23 17:00:13   1st Qu.:13.35  
 Median :2021-07-04 17:00:06   Median :14.92  
 Mean   :2021-07-04 07:00:13   Mean   :15.28  
 3rd Qu.:2021-07-15 11:00:20   3rd Qu.:16.71  
 Max.   :2021-07-25 17:01:16   Max.   :21.71  

All of the navigation files will also be cleaned up and summarized in similar manner to the temperature files.

In [28]:
clean_nav_data <- function(x) {
  read <- read_csv(x, 
                   col_names = c(
                     "date",
                     "time",
                     "type",
                     "time_num",
                     "lat",
                     "lat_NS",
                     "long",
                     "long_WE",
                     "gps_quality",
                     "num_sat_view",
                     "hort_dil",
                     "ant_alt",
                     "ant_alt_unit",
                     "geoidal",
                     "geoidal_unit",
                     "age_diff",
                     "diff_station",
                     "checksum"
                   )) %>%
    select(date, time, lat, long)
    return(read)
}

In [19]:
all_nav_loaded <- list.files(path = "data/nav/",
                      pattern = "*.Raw",
                      full.names = T) %>%
  map_df(~clean_nav_data(.)) 

Parsed with column specification:
cols(
  date = col_character(),
  time = col_time(format = ""),
  type = col_character(),
  time_num = col_double(),
  lat = col_double(),
  lat_NS = col_character(),
  long = col_double(),
  long_WE = col_character(),
  gps_quality = col_double(),
  num_sat_view = col_double(),
  hort_dil = col_double(),
  ant_alt = col_double(),
  ant_alt_unit = col_character(),
  geoidal = col_logical(),
  geoidal_unit = col_character(),
  age_diff = col_logical(),
  diff_station = col_character(),
  checksum = col_character()
)
"4002 parsing failures.
row col   expected     actual                                           file
  1  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210613-154208.Raw'
  2  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210613-154208.Raw'
  3  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210613-154208.Raw'
  4  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210613-154208.Raw'
  5  -- 18 columns 17 columns 'data/

"Parsed with column specification:
cols(
  date = col_character(),
  time = col_time(format = ""),
  type = col_character(),
  time_num = col_double(),
  lat = col_double(),
  lat_NS = col_character(),
  long = col_double(),
  long_WE = col_character(),
  gps_quality = col_double(),
  num_sat_view = col_double(),
  hort_dil = col_double(),
  ant_alt = col_double(),
  ant_alt_unit = col_character(),
  geoidal = col_logical(),
  geoidal_unit = col_character(),
  age_diff = col_logical(),
  diff_station = col_character(),
  checksum = col_character()
)
"937 parsing failures.
row col   expected     actual                                           file
  1  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210613-174526.Raw'
  2  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210613-174526.Raw'
  3  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210613-174526.Raw'
  4  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210613-174526.Raw'
  5  -- 18 columns 17 columns 'data/

"Parsed with column specification:
cols(
  date = col_character(),
  time = col_time(format = ""),
  type = col_character(),
  time_num = col_character(),
  lat = col_double(),
  lat_NS = col_character(),
  long = col_double(),
  long_WE = col_character(),
  gps_quality = col_double(),
  num_sat_view = col_double(),
  hort_dil = col_double(),
  ant_alt = col_double(),
  ant_alt_unit = col_character(),
  geoidal = col_logical(),
  geoidal_unit = col_character(),
  age_diff = col_logical(),
  diff_station = col_character(),
  checksum = col_character()
)
"86428 parsing failures.
row col   expected     actual                                           file
  1  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210616-000001.Raw'
  2  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210616-000001.Raw'
  3  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210616-000001.Raw'
  4  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210616-000001.Raw'
  5  -- 18 columns 17 columns '

"Parsed with column specification:
cols(
  date = col_character(),
  time = col_time(format = ""),
  type = col_character(),
  time_num = col_character(),
  lat = col_double(),
  lat_NS = col_character(),
  long = col_double(),
  long_WE = col_character(),
  gps_quality = col_double(),
  num_sat_view = col_double(),
  hort_dil = col_double(),
  ant_alt = col_double(),
  ant_alt_unit = col_character(),
  geoidal = col_logical(),
  geoidal_unit = col_character(),
  age_diff = col_logical(),
  diff_station = col_character(),
  checksum = col_character()
)
"86579 parsing failures.
row col   expected     actual                                           file
  1  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210623-000001.Raw'
  2  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210623-000001.Raw'
  3  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210623-000001.Raw'
  4  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210623-000001.Raw'
  5  -- 18 columns 17 columns '

"Parsed with column specification:
cols(
  date = col_character(),
  time = col_time(format = ""),
  type = col_character(),
  time_num = col_character(),
  lat = col_double(),
  lat_NS = col_character(),
  long = col_double(),
  long_WE = col_character(),
  gps_quality = col_double(),
  num_sat_view = col_double(),
  hort_dil = col_double(),
  ant_alt = col_double(),
  ant_alt_unit = col_character(),
  geoidal = col_logical(),
  geoidal_unit = col_character(),
  age_diff = col_logical(),
  diff_station = col_character(),
  checksum = col_character()
)
"86420 parsing failures.
row col   expected     actual                                           file
  1  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210630-000001.Raw'
  2  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210630-000001.Raw'
  3  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210630-000001.Raw'
  4  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210630-000001.Raw'
  5  -- 18 columns 17 columns '

"Parsed with column specification:
cols(
  date = col_character(),
  time = col_time(format = ""),
  type = col_character(),
  time_num = col_character(),
  lat = col_double(),
  lat_NS = col_character(),
  long = col_double(),
  long_WE = col_character(),
  gps_quality = col_double(),
  num_sat_view = col_double(),
  hort_dil = col_double(),
  ant_alt = col_double(),
  ant_alt_unit = col_character(),
  geoidal = col_logical(),
  geoidal_unit = col_character(),
  age_diff = col_logical(),
  diff_station = col_character(),
  checksum = col_character()
)
"86416 parsing failures.
row col   expected     actual                                           file
  1  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210706-000001.Raw'
  2  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210706-000001.Raw'
  3  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210706-000001.Raw'
  4  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210706-000001.Raw'
  5  -- 18 columns 17 columns '

"Parsed with column specification:
cols(
  date = col_character(),
  time = col_time(format = ""),
  type = col_character(),
  time_num = col_character(),
  lat = col_double(),
  lat_NS = col_character(),
  long = col_double(),
  long_WE = col_character(),
  gps_quality = col_double(),
  num_sat_view = col_double(),
  hort_dil = col_double(),
  ant_alt = col_double(),
  ant_alt_unit = col_character(),
  geoidal = col_logical(),
  geoidal_unit = col_character(),
  age_diff = col_logical(),
  diff_station = col_character(),
  checksum = col_character()
)
"86418 parsing failures.
row col   expected     actual                                           file
  1  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210713-000001.Raw'
  2  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210713-000001.Raw'
  3  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210713-000001.Raw'
  4  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210713-000001.Raw'
  5  -- 18 columns 17 columns '

"Parsed with column specification:
cols(
  date = col_character(),
  time = col_time(format = ""),
  type = col_character(),
  time_num = col_character(),
  lat = col_double(),
  lat_NS = col_character(),
  long = col_double(),
  long_WE = col_character(),
  gps_quality = col_double(),
  num_sat_view = col_double(),
  hort_dil = col_double(),
  ant_alt = col_double(),
  ant_alt_unit = col_character(),
  geoidal = col_logical(),
  geoidal_unit = col_character(),
  age_diff = col_logical(),
  diff_station = col_character(),
  checksum = col_character()
)
"86564 parsing failures.
row col   expected     actual                                           file
  1  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210720-000001.Raw'
  2  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210720-000001.Raw'
  3  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210720-000001.Raw'
  4  -- 18 columns 17 columns 'data/nav/Primary-GPS-GGA_20210720-000001.Raw'
  5  -- 18 columns 17 columns '

In [21]:
all_nav <- all_nav_loaded %>%
    format_datetime() %>%
    group_by(datetime) %>%
    summarize(mean_lat = mean(lat), mean_long = mean(long)) %>%
    mutate(mean_lat = mean_lat/100, mean_long = mean_long/100) %>%
    mutate(deg_lat_int = trunc(mean_lat, 0),
           deg_long_int = trunc(mean_long, 0)) %>%
    mutate(deg_lat_dec = round((mean_lat - deg_lat_int) * 10000),
           deg_long_dec = round((mean_long - deg_long_int) * 10000)) %>%
    mutate(mean_deg_lat = deg_lat_int + deg_lat_dec/(60 * 100),
          mean_deg_long = deg_long_int + deg_long_dec / (60 * 100)) %>%
    select(-deg_lat_int, -deg_long_int, -deg_lat_dec, -deg_long_dec)


write_csv(all_nav, "data/nav/nav_processed.csv")

In [29]:
head(all_nav)
summary(all_nav)

datetime,mean_lat,mean_long,mean_deg_lat,mean_deg_long
2021-06-12 17:00:16,32.41784,117.0942,32.69633,117.157
2021-06-12 17:00:17,32.41788,117.0941,32.6965,117.1568
2021-06-12 17:00:18,32.41809,117.0938,32.69683,117.1563
2021-06-12 17:00:19,32.41575,117.1053,32.69283,117.1755
2021-06-12 17:00:20,32.40404,117.1328,32.67333,117.2213
2021-06-12 17:00:21,32.39704,117.1738,32.66167,117.2897


    datetime                      mean_lat       mean_long      mean_deg_lat  
 Min.   :2021-06-12 17:00:16   Min.   :31.32   Min.   :117.1   Min.   :31.54  
 1st Qu.:2021-06-23 17:00:10   1st Qu.:33.76   1st Qu.:120.6   1st Qu.:34.13  
 Median :2021-07-04 17:00:04   Median :37.56   Median :123.3   Median :37.93  
 Mean   :2021-07-04 06:25:58   Mean   :39.47   Mean   :123.0   Mean   :39.74  
 3rd Qu.:2021-07-14 17:01:21   3rd Qu.:45.09   3rd Qu.:124.8   3rd Qu.:45.35  
 Max.   :2021-07-25 17:01:16   Max.   :52.21   Max.   :130.5   Max.   :52.64  
 NA's   :1                     NA's   :453     NA's   :527     NA's   :453    
 mean_deg_long  
 Min.   :117.2  
 1st Qu.:121.0  
 Median :123.6  
 Mean   :123.3  
 3rd Qu.:125.1  
 Max.   :130.8  
 NA's   :527    

Since we have the date and time (by the minute) of both the temperature and it's coordinates, we can match the two variables together.

In [30]:
joined_temp_nav <- inner_join(all_temperature, 
                             all_nav,
                             by = c("datetime" = "datetime"))

write_csv(joined_temp_nav, "data/nav_temp_joined_processed.csv")

In [31]:
head(joined_temp_nav)
summary(joined_temp_nav)

datetime,mean_ext,mean_lat,mean_long,mean_deg_lat,mean_deg_long
2021-06-12 17:00:20,18.94975,32.40404,117.1328,32.67333,117.2213
2021-06-12 17:00:21,18.81057,32.39704,117.1738,32.66167,117.2897
2021-06-12 17:00:22,19.17494,32.39495,117.1988,32.65833,117.3313
2021-06-12 17:00:23,19.26122,32.39151,117.2244,32.6525,117.374
2021-06-12 17:00:24,19.28124,32.39131,117.225,32.65217,117.375
2021-06-12 17:00:25,19.20406,32.39111,117.2255,32.65183,117.3758


    datetime                      mean_ext        mean_lat       mean_long    
 Min.   :2021-06-12 17:00:20   Min.   :10.17   Min.   :31.32   Min.   :117.1  
 1st Qu.:2021-06-23 17:00:13   1st Qu.:13.35   1st Qu.:33.80   1st Qu.:120.7  
 Median :2021-07-04 17:00:06   Median :14.92   Median :37.59   Median :123.3  
 Mean   :2021-07-04 07:00:13   Mean   :15.28   Mean   :39.48   Mean   :123.0  
 3rd Qu.:2021-07-15 11:00:20   3rd Qu.:16.71   3rd Qu.:45.09   3rd Qu.:124.8  
 Max.   :2021-07-25 17:01:16   Max.   :21.71   Max.   :52.21   Max.   :130.5  
                                               NA's   :452     NA's   :526    
  mean_deg_lat   mean_deg_long  
 Min.   :31.54   Min.   :117.2  
 1st Qu.:34.14   1st Qu.:121.0  
 Median :37.97   Median :123.6  
 Mean   :39.75   Mean   :123.3  
 3rd Qu.:45.38   3rd Qu.:125.1  
 Max.   :52.64   Max.   :130.8  
 NA's   :452     NA's   :526    

## Visualize the Data

Time and mean temperature plotted on a scatterplot to see temperature changes over time. Notice that since the ship moves in 1 way vs time, the shape of the 2D scatterplot and the 3D plot is very similar.

In [None]:
time_plot <- ggplot(all_temperature, aes(x = datetime, 
                                         y = mean_ext, 
                                         colour = mean_ext)) +
  geom_point() +
  scale_colour_gradient(low = "blue", high = "red") +
  labs(x = "Date and Time PST", 
       y = "Mean (by the min) Ocean Temperature (celcius)",
       colour = "Mean External Temperature")
time_plot

In [None]:
p<- plot_ly(joined_temp_nav, 
        x = ~mean_lat, 
        y = ~mean_long,
        z = ~mean_ext,
        color = ~mean_ext) %>%
  add_markers(size = 0.7)

In [None]:
embed_notebook(p)