-
Notifications
You must be signed in to change notification settings - Fork 1
/
LIOR.m
491 lines (459 loc) · 20.3 KB
/
LIOR.m
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
function [OxygenEstimates,UncertaintyEstimates,MinUncertaintyEquation]= ...
LIOR(Coordinates,Measurements,MeasIDVec, ... % Required inputs
varargin) % Optional inputs
% Version 0.0 (not yet released)
% In development 2018:
% Updated LIOR_files 2018.10.10: Fixed a bug in the training routines.
%
% Development to do list:
% 1. assess against available data
% 2. (2018.10.10 this is now done, no obvious impact on performance)
% retrain without hypoxic measurements, then rely on max(ans,0) to capture
% O2 min zones without biasing regressions... retest.
%
% Locally Interpolated Oxygen Regression (LIOR): Estimates oxygen
% and oxygen estimate uncertainty from combinations of other parameter
% measurements.
%
% Citations:
% LIARv1: Carter et al., 2016, doi: 10.1002/lom3.10087
% LIARv2, LIPHR, LINR citation: Carter et al. (2017), doi: 10.1002/lom3.10232
% LIOR, LISiR, LIPR, TBD
%
% This function needs the CSIRO seawater package to run if measurements
% are povided in molar units or if potential temperature is
% needed but not provided by the user. Scale differences from TEOS-10 are
% a negligible component of oxygen estimate error. The seawater
% package is distributed freely by CSIRO (website has citation info):
%
% http://www.cmar.csiro.au/datacentre/ext_docs/seawater.htm
%
% *************************************************************************
% Input/Output dimensions:
% n: Integer number of desired estimate locations
% e: Integer number of equations used at each location
% y: Integer number of parameter measurement types provided by the user.
% n*e: Total number of estimates returned as an n by e array
%
% *************************************************************************
% Required Inputs:
%
% Coordinates (required n by 3 array):
% Coordinates at which estimates are desired. The first column should
% be longitude (degrees E), the second column should be latitude
% (degrees N), and the third column should be depth (m).
%
% Measurements (required n by y array):
% Parameter measurements that will be used to estimate oxygen. The
% column order (y columns) is arbitrary, but specified by MeasIDVec.
% Concentrations should be expressed as micromol per kg
% seawater unless PerKgSwTF is set to false in which case they should
% be expressed as micromol per L, temperature should be expressed as
% degrees C, and salinity should be specified with the unitless
% convention. NaN inputs are acceptable, but will lead to NaN
% estimates for any equations that depend on that parameter.
%
% MeasIDVec (required 1 by y vector):
% Vector indicating which parameter is placed in each column of the
% 'Measurements' input. Note that salinity is required for all
% equations. For example, if the first three columns of 'Measurements'
% contain salinity, silicate, and temperature, then MeasIDVec should
% equal [1 5 7].
%
% Input Parameter Key:
% 1. Salinity
% 2. Potential temperature
% 3. Phosphate
% 4. Nitrate
% 5. Silicate
% 6. (not an option for LIOR, typically O2, distinct from AOU)
% 7. Temperature (recommended over theta)
%
% *************************************************************************
% Optional inputs: All remaining inputs must be specified as sequential
% input argument pairs (e.g. "...,'Equations',[1:16],'OAAdjustTF',false,
% etc...")
%
% Equations (optional 1 by e vector, default []):
% Vector indicating which equations will be used to estimate oxygen.
% This input also determines the order of the columns in the
% 'OxygenEstimates' output. If [] is input or the input is not
% specified then all 16 equations will be used and only the outputs
% from the equation with the lowest uncertainty estimate will be
% returned.
%
% (S=salinity, Theta=potential temperature, P=phosphate, Si=silicate,
% N=nitrate, T=temperature... see 'Measurements' for units).
%
% Output Equation Key:
% 1. S, Theta or T, P, N, Si
% 2. S, Theta or T, P, Si
% 3. S, Theta or T, N, Si
% 4. S, Theta or T, Si
% 5. S, Theta or T, P, N
% 6. S, Theta or T, P
% 7. S, Theta or T, N
% 8. S, Theta
% 9. S, P, N, Si
% 10. S, P, Si
% 11. S, N, Si
% 12. S, Si
% 13. S, P, N
% 14. S, P
% 15. S, N
% 16. S
%
% MeasUncerts (Optional n by y array or 1 by y vector, default: [0.003 S,
% 0.003 degrees C (T or theta), 1% P, 1% N, 1% Si]:
% Array of measurement uncertainties (see 'Measurements' for units).
% Uncertainties should be presented in order indicated by MeasIDVec.
% Providing these estimates will improve LIOR estimate uncertainties
% in 'UncertaintyEstimates'. Measurement uncertainties are a small part
% of LIOR estimate uncertainties for WOCE-quality measurements.
% However, estimate uncertainty scales with measurement uncertainty, so
% it is recommended that measurement uncertainties be specified for
% sensor measurements. If this optional input argument is not
% provided, the default WOCE-quality uncertainty is assumed. If a 1 by
% y array is provided then the uncertainty estimates are assumed to
% apply uniformly to all input parameter measurements.
%
% PerKgSwTF (Optional boolean, default true):
% Many sensors provide measurements in micromol per L (molarity)
% instead of micromol per kg seawater. Indicate false if provided
% measurements are expressed in molar units (concentrations must be
% micromol per L if so). Outputs will remain in molal units
% regardless.
%
% VerboseTF (Optional boolean, default true):
% Setting this to false will make LIOR stop printing updates to the
% command line. This behavior can also be permanently disabled below.
% *************************************************************************
% Outputs:
%
% OxygenEstimates:
% A n by e array of LIOR estimates specific to the coordinates and
% parameter measurements provided. Units are micromoles per kg
% seawater.
%
% UncertaintyEstimates:
% A n by e array of LIOR uncertainty estimates specific to the
% coordinates, parameter measurements, and parameter uncertainties
% provided.
%
% MinUncertaintyEquation:
% A n by 1 array of the index of the equation that returns the smallest
% uncertainty for each estimate location.
%
% *************************************************************************
% Missing data: should be indicated with a NaN. A NaN coordinate will
% yield NaN estimates for all equations at that coordinate. A NaN
% parameter value will yield NaN estimates for all equations that require
% that parameter.
%
% *************************************************************************
% Please send questions or related requests to brendan.carter@gmail.com.
% *************************************************************************
% Determining how chatty you want the function to be.
a=strcmpi(varargin,'VerboseTF');
if any(a)
VerboseTF=varargin{1,logical([0 a(1:end-1)])};
else
VerboseTF=true;
end
% Uncomment following line beginning with "VerboseTF" and save the function
% if you want less command line spam and you don't want to have to keep
% telling the code to be quiet.
% VerboseTF=false;
% *************************************************************************
% Parsing inputs, setting defaults, and sanity checking inputs.
%
% Starting timer
tic
% Verifying required inputs are provided
if nargin<3; error('LIOR called with too few input arguments.'); end
% Checking whether specific equations are specified.
a=strcmpi(varargin,'Equations');
if any(a)
Equations=varargin{1,logical([0 a(1:end-1)])};
SpecifiedEqn=true;
else
Equations=1:16;
SpecifiedEqn=false;
end
% Making [] argument for Equations equivalent to no argument.
if isempty(Equations);Equations=1:16; end
% Making 0 argument for Equations equivalent to no argument.
if Equations==0; Equations=1:16; end
% Checking whether the minimum uncertainty estimate is desired vs. outputs
% from all used equations.
a=strcmpi(varargin,'MinUncEstTF');
if any(a)
MinUncEst=varargin{1,logical([0 a(1:end-1)])};
else
MinUncEst=true;
if VerboseTF==true && SpecifiedEqn==false; disp('By Default, all equations with enough input variables will be used, but only the estimate with the lowest uncertainty will be returned. If outputs from multiple equations are desired, specify the desired equations with the Equations input and include the input argument pair setting MinUncEstTF to false.');end;
end
% Checking for PerKgSwTF input and setting default if not given
a=strcmpi(varargin,'PerKgSwTF');
if any(a)
PerKgSwTF=varargin{1,logical([0 a(1:end-1)])};
else
PerKgSwTF=true;
end
% Checking for MeasUncerts input and setting default if not given
a=strcmpi(varargin,'MeasUncerts');
if any(a)
MeasUncerts=varargin{1,logical([0 a(1:end-1)])};
else
MeasUncerts=[]; % This will be overriden immediately below
end
if nargin<5 || max(size(MeasUncerts))==0;
% Setting multiplicative uncertainties for P, N, O2, and Si.
DefaultUncertainties=diag([1 1 0.02 0.01 0.02 0.01 1]);
MeasUncerts=Measurements*DefaultUncertainties(MeasIDVec,MeasIDVec);
% Then setting additive uncertainties for T, S, and theta.
MeasUncerts(:,ismember(MeasIDVec,[1 2 7]))=0.003;
end
% Sanity checking the MeasUncerts argument. This also deals with the
% possibility that the user has provided a single set of uncertainties for
% all estimates.
if not(max(size(MeasUncerts)==size(Measurements))) && not(min(size(MeasUncerts)==size(Measurements))) && not(max(size(MeasUncerts))==0);
error('MeasUncerts must be undefined, a vector with the same number of elements as Measurements has columns, [] (for default values), or an array of identical dimension to Measurements.')
elseif not(min(size(MeasUncerts)==size(Measurements))) && not(max(size(MeasUncerts))==0);
if ~(size(MeasUncerts,2)==size(Measurements,2));
error('There are different numbers of columns of input uncertainties and input measurements.')
end
MeasUncerts=ones(size(Measurements(:,1)))*MeasUncerts;
end
% Making sure all provided predictors are identified.
if ~(size(MeasIDVec,2)==size(Measurements,2));
error('The MeasIDVec input does not have the same number of columns as the Measurements input. This means it is unclear which measurement is in which column.');
end
% Making sure all provided predictors are identified.
if any(MeasIDVec==6);
error('LIOR has no measurement with a 6 ID flag (reserved for O2 in other LIRs).');
end
% Making sure you downloaded the needed file and put it somewhere it
% can be found
if exist('LIOR_files.mat','file')<2; error('LIOR could not find the LIOR_files file needed to run. This mandatory file should be distributed from the same website as LIOR. Contact the corresponding author if you cannot find it there. If you do have it then make sure it is on the MATLAB path or in the active directory.'); end
% LIOR requires non-NaN coordinates to provide an estimate. This step
% eliminates NaN coordinate combinations prior to estimation. NaN
% estimates will be returned for these coordinates.
NaNCoords=max(isnan(Coordinates),[],2);
[NumCoords,~]=size(Coordinates);
n=sum(~NaNCoords);
e=size(Equations,2);
% Checking for common missing data indicator flags and warning if any are
% found. Consider adding your common ones here
if max(ismember([-999 -9 -1*10^20],Measurements))==1
warning('LIOR: A common non-NaN missing data indicator (e.g. -999, -9, -1e20) was detected in the input measurements provided. Missing data should be replaced with NaNs. Otherwise, LIOR will interpret your inputs at face value and give terrible estimates.')
end
% Checking to make sure all required parameters are provided
VarVec=[1 1 1 1 1
1 1 1 0 1
1 1 0 1 1
1 1 0 0 1
1 1 1 1 0
1 1 1 0 0
1 1 0 1 0
1 1 0 0 0
1 0 1 1 1
1 0 1 0 1
1 0 0 1 1
1 0 0 0 1
1 0 1 1 0
1 0 1 0 0
1 0 0 1 0
1 0 0 0 0];
NeedVars=any(VarVec(Equations,:),1);
HaveVars=false(1,6); HaveVars(1,MeasIDVec)=1;
%Temperature can sub for potential temperature
if ismember(7,MeasIDVec)==1; HaveVars(1,2)=true; end
% Temperature or potential temperature is required if
% measurements are provided in molar units
if PerKgSwTF==false; NeedVars(1,2)=1; end
% Eliminating equations. Earlier we used all equations if "Equations" was
% unspecified. Here we limit this input to the equations for which all
% needed inputs are supplied.
if MinUncEst==true && SpecifiedEqn==false;
Equations(:,any(VarVec(:,~HaveVars(1,1:5)),2))=[];
NeedVars=any(VarVec(Equations,:),1);
e=size(Equations,2);
end
% Making sure all needed variables are present
if ~all(HaveVars(1,NeedVars))
error('LIOR error: Check "Equations" and "MeasIDVec" inputs. One or more regression equations selected require one or more input parameters that are either not provided or not labeled correctly.')
end
% Putting measurement inputs in expected order
M=NaN(n,6); % Measurements
U=NaN(n,6); % Uncertainties
M(:,MeasIDVec)=Measurements(~NaNCoords,:);
C=Coordinates(~NaNCoords,:);
U(:,MeasIDVec)=MeasUncerts(~NaNCoords,:);
% Checking to see whether potential temperature is defined and needed.
% Defining it and subbing in temp uncert for theta uncert if necessary.
if ismember(2,MeasIDVec)==0 && NeedVars(1,2);
M(:,2)=sw_ptmp(M(:,1),M(:,7),sw_pres(C(:,3),C(:,2)),0);
U(:,2)=U(:,7);
end
% Converting units to molality if they are provided as molarity.
if PerKgSwTF==false;
densities=sw_dens(M(:,1),sw_temp(M(:,1),M(:,2), ...
sw_pres(C(:,3),C(:,2)),0),sw_pres(C(:,3),C(:,2)))/1000;
M(:,3)=M(:,3)./densities;
M(:,4)=M(:,4)./densities;
M(:,5)=M(:,5)./densities;
end
% *************************************************************************
% Beginning treatment of inputs and calculations
L=load('LIOR_files.mat');
% Adjusting negative longitudes.
C(C(:,1)>360,1)=rem(C(C(:,1)>360,1),360);
C(C(:,1)<0,1)=rem(C(C(:,1)<0,1,1),360)+360;
% We don't want constants to be interpolated across the Panama Canal (for
% instance), so data is carved into two segments... the Atlantic/Arctic
% (AA) and everything else. These interpolations are then done separately.
DepthToDegreeConversion=25; % See 2016 LIAR paper for explanation.
C(:,3)=C(:,3)/DepthToDegreeConversion;
L.Coords(:,3)=L.Coords(:,3)/DepthToDegreeConversion;
AAIndsM=or(inpolygon(C(:,1),C(:,2),L.LNAPoly(:,1),L.LNAPoly(:,2)), ...
or(inpolygon(C(:,1),C(:,2),L.LSAPoly(:,1),L.LSAPoly(:,2)), ...
or(inpolygon(C(:,1),C(:,2),L.LNAPolyExtra(:,1),L.LNAPolyExtra(:,2)), ...
or(inpolygon(C(:,1),C(:,2),L.LSAPolyExtra(:,1),L.LSAPolyExtra(:,2)), ...
inpolygon(C(:,1),C(:,2),L.LNOPoly(:,1),L.LNOPoly(:,2))))));
% Here we predefine a dummy interpolant. We fill it with the actual values
% later, but this is the slow step so we don't want it being done in a
% loop. The appended "Dummy" values are designed to return the
% interpolated value at the nearest location within the domain when the
% desired estimate is outside of the domain. Note that this is distinct
% from the scatteredInterpolant 'nearest' extrapolation option, which
% simply selectes the nearest neighbor of the input data points, and
% therefore results in discontinuities. Dummy points accomplish this by
% interpolating between the values within the dataset and the mean value
% within the dataset 'placed' nearly infinitely far away. Think of the 8
% dummy coordinates as the vertices of a very large cube around the data of
% interest.
Dummy=ones(8,1)*nanmean(nanmean(L.Cs,1),3);
Dummyset=horzcat([10^10; 10^10;10^10; 10^10;-10^10; -10^10;-10^10; ...
-10^10],[10^10;-10^10;10^10;-10^10;10^10;-10^10;10^10;-10^10], ...
[10^10;10^10;-10^10;-10^10;10^10;10^10;-10^10;-10^10]);
Dummy=horzcat(Dummyset,Dummy);
% Disambiguation:
% L.Cs (loaded from file)... pre-determined regression constants
% L.Coords (loaded from file)... coordinates of L.Cs
% C... viable subset of user-provided coordinates for LIOR estimates
% LCs... locally-interpolated L.Cs
InterpolantAA=scatteredInterpolant(vertcat(L.Coords(L.AAIndsCs,1), ...
Dummyset(:,1)),vertcat(L.Coords(L.AAIndsCs,2),Dummyset(:,2)), ...
vertcat(L.Coords(L.AAIndsCs,3),Dummyset(:,3)), ...
(1:size(vertcat(L.Coords(L.AAIndsCs,3),Dummyset(:,3))))');
InterpolantElse=scatteredInterpolant(vertcat(L.Coords(~L.AAIndsCs,1), ...
Dummyset(:,1)),vertcat(L.Coords(~L.AAIndsCs,2),Dummyset(:,2)), ...
vertcat(L.Coords(~L.AAIndsCs,3),Dummyset(:,3)), ...
(1:size(vertcat(L.Coords(~L.AAIndsCs,3),Dummyset(:,3))))');
%Preallocating for speed
if MinUncEst==true && SpecifiedEqn==false;
OxygenEstimates=NaN(NumCoords,1);
UncertaintyEstimates=NaN(NumCoords,1);
else
OxygenEstimates=NaN(NumCoords,e);
UncertaintyEstimates=NaN(NumCoords,e);
end
OxygenEst=NaN(n,e);
UncertEst=NaN(n,e);
% Disambiguation:
% Eq... a counter for which equation LIOR is on
% e... the number of equations that will be used
% Equation... the specific equation number currently being used
% Equations... the user provided list of equations that will be used.
for Eq=1:e;
Equation=Equations(1,Eq);
clear LCs
% selecting variables for this regression
if Equation==1
VarList='S Theta N P Si';
elseif Equation==2
VarList='S Theta N Si';
elseif Equation==3
VarList='S Theta P Si';
elseif Equation==4
VarList='S Theta Si';
elseif Equation==5
VarList='S Theta N P';
elseif Equation==6
VarList='S Theta N';
elseif Equation==7
VarList='S Theta P';
elseif Equation==8
VarList='S Theta';
elseif Equation==9
VarList='S N P Si';
elseif Equation==10
VarList='S N Si';
elseif Equation==11
VarList='S P Si';
elseif Equation==12
VarList='S Si';
elseif Equation==13
VarList='S N P';
elseif Equation==14
VarList='S N';
elseif Equation==15
VarList='S P';
elseif Equation==16
VarList='S';
end
UseVars=[1:5].*VarVec(Equation,:);
UseVars(UseVars==0)=[];
% Updating users if VerboseTF is set to true.
if VerboseTF==true;
disp(horzcat(num2str(floor(toc)),' seconds elapsed... now using eqn. ', ...
num2str(Equation),': ',VarList,'.'))
end
% Determining dimensions
SizUseVars=size(UseVars);
LCs=NaN(n,SizUseVars(1,2));
NumVars=1+SizUseVars(1,2);
% +1 above and below is for constant term, different in LIPHR
VarNumVec=horzcat(1,UseVars+1);
% Filling the interpolants with the predetermined regression constant
% information and then using them to interpret 'local constants' (LCs)
% at the coordinates of interest
for Var=1:NumVars;
InterpolantAA.Values=vertcat(L.Cs(L.AAIndsCs,VarNumVec(1,Var), ...
Equation),Dummy(:,VarNumVec(Var)+3));
InterpolantElse.Values=vertcat(L.Cs(~L.AAIndsCs,VarNumVec(1,Var), ...
Equation),Dummy(:,VarNumVec(Var)+3));
LCs(AAIndsM,Var)=InterpolantAA(C(AAIndsM,1),C(AAIndsM,2), ...
C(AAIndsM,3));
LCs(~AAIndsM,Var)=InterpolantElse(C(~AAIndsM,1), ...
C(~AAIndsM,2),C(~AAIndsM,3));
end
% Estimating methodological error from depth
EMLR=interp1(L.EMLRrec(:,1),L.EMLRrec(:,1+Equation),C(:,3));
% Estimating oxygen and oxygen estimate uncertainty from LCs.
OxygenEst(:,Eq)=real(sum(LCs.*horzcat(ones(n,1),M(:,UseVars)),2));
% Estimating uncertainty
UncertEst(:,Eq)=real((sum((LCs.*horzcat(zeros(n,1), ...
U(:,UseVars))).^2,2)+0.3^2+EMLR.^2).^(1/2));
end
% Setting negative LIOR values to 0
OxygenEst(OxygenEst<0)=0;
% Determines which regression has the lowest uncertainty estimate. If
% no regression was specified for outputs, this lowest-uncertainty
% oxygen and uncertainty estimate is returned.
[Temp,MinUncertaintyEquation]=min(UncertEst,[],2);
if MinUncEst==true && SpecifiedEqn==false;
if VerboseTF==true;
disp('Picking the outputs with the lowest uncertainties.')
end
OxygenEst=OxygenEst(sub2ind(size(OxygenEst),[1:1:size(MinUncertaintyEquation,1)]',MinUncertaintyEquation));
UncertEst=UncertEst(sub2ind(size(UncertEst),[1:1:size(MinUncertaintyEquation,1)]',MinUncertaintyEquation));
end
% Filling the outputs with the estimates at viable locations.
OxygenEstimates(~NaNCoords,:)=OxygenEst;
UncertaintyEstimates(~NaNCoords,:)=UncertEst;
if VerboseTF==true;
disp(horzcat('LIOR finished after ',num2str(round(toc,0)),' seconds.'));
end
end