# 2 - Process and Merge Datasets
Next, we want to process the ADCP and Science Datasets so that we can merge them into a single unified dataset, derive the individual profile dives, and run the vendor-suggested Quality Control algorithms.

In [7]:
import os, sys, re, ast
import pandas as pd
import numpy as np
import xarray as xr

# For Dev, append the module to the working path
sys.path.append('/home/jovyan/WHOIGit/ooicgsn_glider_dvl/ooicgsn_glider_dvl/')

In [2]:
# Load the ADCP Data
adcp = xr.open_dataset("../data/raw/CP05MOAS-GL388-01-ADCPAM000.deployment0008.adcp_data.nc").load()

# Load the Science Data
gdac = xr.open_dataset("../data/raw/CP05MOAS-GL388.deployment0008.science.nc").load()

#### Merge the Datasets
Merging the glider ADCP and Science datasets is a multistep process. The Science dataset contains all of the different science sensors, along with the glider's built-in sensors and navigation, on a common timestamp. However, the different science sensors all sample at different rates. Thus, the science parameters in the dataset are sparse and filled with NaNs. In order to merge the Science and ADCP data streams, we need to do the following:
1. Split the science and glider parameters into separate datasets based on their source sensor
2. Drop the NaNs from each grouped parameter set
3. Linearly interpolate each grouped parameter set onto the ADCP timestamps
4. Merge the interpolated parameter sets into the ADCP dataset
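The four steps can be sketched with plain NumPy as below. This is a minimal illustration, not the `merge_datasets` implementation from the `merge` module. It assumes that timestamps have already been converted to float seconds and sorted, that step 1 has produced a hypothetical `groups` mapping of sensor name to `{variable: values}`, and that ADCP samples outside a sensor's sampled time range should stay NaN rather than be extrapolated.

```python
import numpy as np

def interp_sensor_group(sci_time, sci_vars, adcp_time):
    """Interpolate one sensor's variables onto the ADCP timestamps.

    sci_time  : 1-D array of science timestamps (float seconds, sorted)
    sci_vars  : dict of name -> 1-D array, NaN wherever this sensor did not sample
    adcp_time : 1-D array of ADCP timestamps (float seconds)
    """
    # Step 2: keep only the rows where every variable in this group is finite
    valid = np.ones(sci_time.shape, dtype=bool)
    for values in sci_vars.values():
        valid &= np.isfinite(values)
    t = sci_time[valid]
    # Step 3: linear interpolation; NaN outside the sensor's sampled range
    return {
        name: np.interp(adcp_time, t, values[valid], left=np.nan, right=np.nan)
        for name, values in sci_vars.items()
    }

def merge_groups(adcp_vars, groups, sci_time, adcp_time):
    """Step 4: add each group's interpolated variables next to the ADCP variables."""
    merged = dict(adcp_vars)
    for sci_vars in groups.values():
        merged.update(interp_sensor_group(sci_time, sci_vars, adcp_time))
    return merged
```

Interpolating each group only over its own valid samples keeps the NaN gaps of one sensor from blanking out the values of another.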

In [9]:
from merge import merge_datasets, split_data, sensor_variables

In [10]:
merged = merge_datasets(adcp, gdac)
merged

#### Identify Individual Profiles/Dives
The ADCP on OOI gliders is only active during the dive portion of the deployment. To separate the profiles, the algorithm looks for places where the absolute difference between two consecutive depth measurements exceeds 2 meters and the sampling interval is greater than 4 seconds (the ADCP samples every 2 seconds).
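A minimal sketch of that break-detection rule, assuming time in seconds and depth in meters as 1-D NumPy arrays. `label_profiles` is a hypothetical helper and may differ from `add_profiles` in the `profiles` module (for example, in how it treats surface intervals).

```python
import numpy as np

def label_profiles(time_s, depth_m, max_depth_jump=2.0, max_gap_s=4.0):
    """Assign an integer profile ID to each ADCP sample."""
    d_depth = np.abs(np.diff(depth_m))
    d_time = np.diff(time_s)
    # A new profile starts after sample i when BOTH thresholds are exceeded
    breaks = (d_depth > max_depth_jump) & (d_time > max_gap_s)
    # The first sample belongs to profile 0; each break increments the ID
    return np.concatenate([[0], np.cumsum(breaks)])

# Two dives separated by a long gap and a 5 m depth jump
ids = label_profiles(np.array([0, 2, 4, 6, 100, 102, 104.0]),
                     np.array([5, 6, 7, 8, 3, 4, 5.0]))
```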

In [11]:
# Next, identify the individual profiles
from profiles import add_profiles

In [12]:
merged = add_profiles(merged)
merged

#### Vendor QA/QC

First, use the TRDI Data QA-QC Model rev12-1 thresholds to derive a first-pass QC flag. We assume that the Explorer DVL 600 kHz is comparable to the WH 600 kHz (wide-band) for setting the thresholds.
We believe the configuration settings are: 2.0 m depth cell size, a ping interval time of 0.1, and 10 pings per ensemble.

| Parameter | Good | Suspect | Bad |
| --------- | ---- | ------- | --- |
| Error Velocity | <=6.3 cm/s | >6.3 cm/s, <=12.6 cm/s | >12.6 cm/s |
| Correlation Magnitude | >=115 | <115, >=63 | <63 |
| Percent Good | >=63% | <63%, >=50% | <50% |
| BIT Errors | 0 | >=1 | N/A |
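The table maps onto two kinds of threshold test plus an aggregation step. The sketch below is a hedged NumPy illustration, not the code in the `qc` module. The function names are hypothetical; it assumes error velocity is compared by magnitude in the same units as the thresholds, that NaN inputs get the QARTOD `missing` flag (9), and that the summary flag is the worst (largest) flag across tests. The BIT-error test is not shown.

```python
import numpy as np

# QARTOD-style flag values
PASS, SUSPECT, FAIL, MISSING = 1, 3, 4, 9

def flag_min_threshold(values, good_min, suspect_min):
    """Larger is better (correlation magnitude, percent good).

    >= good_min -> PASS, >= suspect_min -> SUSPECT, otherwise FAIL.
    """
    values = np.asarray(values, dtype=float)
    flags = np.full(values.shape, FAIL, dtype=np.int32)
    flags[values >= suspect_min] = SUSPECT
    flags[values >= good_min] = PASS
    flags[np.isnan(values)] = MISSING
    return flags

def flag_max_threshold(values, good_max, suspect_max):
    """Smaller magnitude is better (error velocity).

    |v| <= good_max -> PASS, |v| <= suspect_max -> SUSPECT, otherwise FAIL.
    """
    mag = np.abs(np.asarray(values, dtype=float))
    flags = np.full(mag.shape, FAIL, dtype=np.int32)
    flags[mag <= suspect_max] = SUSPECT
    flags[mag <= good_max] = PASS
    flags[np.isnan(mag)] = MISSING
    return flags

def aggregate_flags(flag_arrays):
    """Summary flag: the worst (largest) flag from any individual test."""
    return np.maximum.reduce([np.asarray(f) for f in flag_arrays])

# Example: correlation magnitude in counts, error velocity in m/s
corr = flag_min_threshold([120, 100, 50, np.nan], good_min=115, suspect_min=63)
err = flag_max_threshold([0.05, -0.10, 0.20, 0.01], good_max=0.063, suspect_max=0.126)
summary = aggregate_flags([corr, err])
```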

In [14]:
import qc

In [26]:
# Check the range (sign and units) of the error velocity before choosing thresholds
adcp['error_seawater_velocity'].min()

In [28]:
# Run the individual tests (make sure inputs and units match)
percent_good = qc.percent_good_qc(merged, 63, 50)
corr_mag = qc.correlation_magnitude_qc(merged, 115, 63)
# Error velocity thresholds are converted from cm/s to m/s
error_vel = qc.error_velocity_qc(merged, 12.6/100, 6.3/100)

# Merge the test results
qc_flag = qc.merge_qc([percent_good, corr_mag, error_vel])

In [29]:
qc_flag

array([[1, 1, 1, ..., 4, 4, 4],
       [1, 1, 1, ..., 4, 4, 4],
       [1, 1, 1, ..., 4, 4, 4],
       ...,
       [3, 3, 4, ..., 4, 4, 4],
       [3, 3, 3, ..., 4, 4, 4],
       [3, 4, 3, ..., 4, 4, 4]], shape=(890517, 30))

In [33]:
# Now add the QC flags to the merged dataset
merged['vendor_qc_flag'] = (['time', 'bin'], qc_flag)
merged['vendor_qc_flag'].attrs = {
    'long_name': 'TRDI QC Summary Flag',
    'standard_name': 'aggregate_quality_flag',
    'comment': ('Summary of the TRDI QC tests as a QARTOD style summary flag, where '
                'the values are 1 == pass, 2 == not evaluated, 3 == suspect or of high interest, '
                '4 == fail, and 9 == missing.'),
    'flag_values': np.array([1, 2, 3, 4, 9]).astype(np.int32),
    'flag_meanings': 'pass not_evaluated suspect_or_of_high_interest fail missing'
}

#### Add Waypoints
The next step is to add the waypoint data to the merged dataset. The waypoint data are recorded in the glider engineering data and are not in either the science or ADCP datasets. The waypoints are needed to calculate the net water-column velocity, which is achieved by comparing the expected waypoint with the actual surfacing location.
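As a rough illustration of that comparison, the sketch converts the offset between an expected and an actual surfacing position into an average velocity over the dive. It is assumption-laden: `net_current` is a hypothetical helper, it uses a local flat-earth approximation (adequate only for offsets of a few kilometers), and it assumes the dive duration in seconds is known. The method used later in this notebook may differ.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumed spherical Earth)

def net_current(expected_lat, expected_lon, actual_lat, actual_lon, duration_s):
    """Estimate an average (u, v) velocity in m/s from a position offset.

    u is positive eastward and v positive northward. Uses a local
    equirectangular approximation evaluated at the mean latitude.
    """
    mean_lat = math.radians(0.5 * (expected_lat + actual_lat))
    dx = math.radians(actual_lon - expected_lon) * EARTH_RADIUS_M * math.cos(mean_lat)
    dy = math.radians(actual_lat - expected_lat) * EARTH_RADIUS_M
    return dx / duration_s, dy / duration_s
```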

In [34]:
glider = pd.read_csv("../data/raw/deployment0008_CP05MOAS-GL388-00-ENG000000-recovered_host-glider_eng_recovered.csv")
glider.head()

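| | time | obs | c_air_pump | c_ballast_pumped | c_battpos | c_battroll | c_bsipar_on | c_de_oil_vol | c_dvl_on | c_flbbcd_on | ... | m_water_vx | m_water_vy | m_why_started | m_x_lmc | m_y_lmc | port_timestamp | preferred_timestamp | x_last_wpt_lat | x_last_wpt_lon | x_system_clock_adjusted |
| - | ---- | --- | ---------- | ---------------- | --------- | ---------- | ----------- | ------------ | -------- | ----------- | --- | ---------- | ---------- | ------------- | ------- | ------- | -------------- | ------------------- | -------------- | -------------- | ----------------------- |
| 0 | 2019-06-18 23:01:35.200470 | ('0',) | 1 | 260.0 | 0.738 | 0.0 | -1.0 | 260.0 | -1.0 | -1.0 | ... | 0.0 | 0.0 | 64 | 0.0 | 0.0 | 0.0 | internal_timestamp | 0.0 | 0.0 | 0.0 |
| 1 | 2019-06-18 23:02:29.318180 | ('1',) | 1 | 260.0 | 0.738 | 0.0 | -1.0 | 260.0 | -1.0 | -1.0 | ... | 0.0 | 0.0 | 64 | 0.0 | 0.0 | 0.0 | internal_timestamp | 0.0 | 0.0 | 0.0 |
| 2 | 2019-06-18 23:03:01.304530 | ('2',) | 1 | NaN | 0.738 | 0.0 | -1.0 | 260.0 | -1.0 | -1.0 | ... | NaN | NaN | -99 | NaN | NaN | 0.0 | internal_timestamp | NaN | NaN | NaN |
| 3 | 2019-06-18 23:03:05.753880 | ('3',) | 1 | NaN | 0.738 | 0.0 | -1.0 | 260.0 | -1.0 | -1.0 | ... | NaN | NaN | -99 | NaN | NaN | 0.0 | internal_timestamp | NaN | NaN | NaN |
| 4 | 2019-06-18 23:03:10.206570 | ('4',) | 1 | NaN | 0.738 | 0.0 | -1.0 | 260.0 | -1.0 | -1.0 | ... | NaN | NaN | -99 | NaN | NaN | 0.0 | internal_timestamp | NaN | NaN | NaN |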


In [36]:
waypoints = glider[['time','c_wpt_lat','c_wpt_lon']].dropna()
waypoints

| | time | c_wpt_lat | c_wpt_lon |
| - | ---- | --------- | --------- |
| 0 | 2019-06-18 23:01:35.200470 | 0.000000 | 0.000000 |
| 1 | 2019-06-18 23:02:29.318180 | 0.000000 | 0.000000 |
| 9 | 2019-06-18 23:03:32.617650 | 39.833332 | -70.833336 |
| 251 | 2019-06-18 23:33:21.277470 | 39.833332 | -70.833336 |
| 252 | 2019-06-18 23:34:15.352840 | 39.833332 | -70.833336 |
| ... | ... | ... | ... |
| 942067 | 2019-09-12 22:06:21.484130 | 39.833332 | -70.875000 |
| 943248 | 2019-09-13 05:34:31.284420 | 39.833332 | -70.875000 |
| 944280 | 2019-09-13 12:35:05.878660 | 39.333332 | -70.291664 |
| 945541 | 2019-09-13 20:29:15.227600 | 39.333332 | -70.291664 |


In [39]:
# Convert the time strings to datetimes (vectorized)
waypoints['time'] = pd.to_datetime(waypoints['time'])
waypoints

| | time | c_wpt_lat | c_wpt_lon |
| - | ---- | --------- | --------- |
| 0 | 2019-06-18 23:01:35.200470 | 0.000000 | 0.000000 |
| 1 | 2019-06-18 23:02:29.318180 | 0.000000 | 0.000000 |
| 9 | 2019-06-18 23:03:32.617650 | 39.833332 | -70.833336 |
| 251 | 2019-06-18 23:33:21.277470 | 39.833332 | -70.833336 |
| 252 | 2019-06-18 23:34:15.352840 | 39.833332 | -70.833336 |
| ... | ... | ... | ... |
| 942067 | 2019-09-12 22:06:21.484130 | 39.833332 | -70.875000 |
| 943248 | 2019-09-13 05:34:31.284420 | 39.833332 | -70.875000 |
| 944280 | 2019-09-13 12:35:05.878660 | 39.333332 | -70.291664 |
| 945541 | 2019-09-13 20:29:15.227600 | 39.333332 | -70.291664 |


In [41]:
d_wpt = waypoints.diff()
d_wpt

| | time | c_wpt_lat | c_wpt_lon |
| - | ---- | --------- | --------- |
| 0 | NaT | NaN | NaN |
| 1 | 0 days 00:00:54.117710 | 0.000000 | 0.000000 |
| 9 | 0 days 00:01:03.299470 | 39.833332 | -70.833336 |
| 251 | 0 days 00:29:48.659820 | 0.000000 | 0.000000 |
| 252 | 0 days 00:00:54.075370 | 0.000000 | 0.000000 |
| ... | ... | ... | ... |
| 942067 | 0 days 07:44:46.421140 | 0.000000 | 0.000000 |
| 943248 | 0 days 07:28:09.800290 | 0.000000 | 0.000000 |
| 944280 | 0 days 07:00:34.594240 | -0.500000 | 0.583336 |
| 945541 | 0 days 07:54:09.348940 | 0.000000 | 0.000000 |


In [43]:
mask = (d_wpt['c_wpt_lat'] == 0) & (d_wpt['c_wpt_lon'] == 0)
mask

0         False
1          True
9         False
251        True
252        True
          ...  
942067     True
943248     True
944280    False
945541     True
945542     True
Length: 1424, dtype: bool

In [None]:
# Keep only the rows where the commanded waypoint changes
waypoints = waypoints[~mask]
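Once the waypoint changes are isolated, each ADCP sample can be tagged with the waypoint that was active at that time. The sketch below does a step-function lookup with `np.searchsorted`. `active_waypoint` is a hypothetical helper; it assumes the waypoint times are sorted and that a waypoint stays in effect until the next change. With pandas, `pd.merge_asof(..., direction='backward')` is an alternative.

```python
import numpy as np

def active_waypoint(wpt_time, wpt_lat, wpt_lon, sample_time):
    """Return the (lat, lon) of the waypoint in effect at each sample time.

    Samples earlier than the first waypoint change get NaN.
    """
    wpt_time = np.asarray(wpt_time)
    sample_time = np.asarray(sample_time)
    # Index of the last waypoint change at or before each sample
    idx = np.searchsorted(wpt_time, sample_time, side="right") - 1
    valid = idx >= 0
    lat = np.full(sample_time.shape, np.nan)
    lon = np.full(sample_time.shape, np.nan)
    lat[valid] = np.asarray(wpt_lat, dtype=float)[idx[valid]]
    lon[valid] = np.asarray(wpt_lon, dtype=float)[idx[valid]]
    return lat, lon
```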

In [54]:
for n, (t, lat, lon) in enumerate(waypoints.itertuples(index=False)):
    print(n)

0
1
2
...
274
275
276
27

In [57]:
lon

-70.291664