echometrics improvements #12

Open
6 tasks done
jr3cermak opened this issue Mar 15, 2022 · 25 comments

@jr3cermak

jr3cermak commented Mar 15, 2022

Continued from PR #10

  • Use pytest test_slocum.py to exercise echometrics/pseudogram code to produce desired netCDF results
  • Always assign echometrics variables with extras dimension even if pseudogram is missing
  • Wire acoustic sensor configuration through deployment.json using extra_kwargs
  • Apply extra_kwargs to the ASCII-to-netCDF conversion
  • tests/test_slocum.py::TestEcoMetricsThree::test_pseudogram produces three netCDF files that need to be consistent
  • Manual running of ascii/netCDF produces one file; pytest produces three files [differences in json config files]

Deferred:

  • Apply extra_kwargs to the DBD-to-ASCII conversion (there are no current hooks to grab extra_kwargs from deployment.json)
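The "always assign echometrics variables with extras dimension even if pseudogram is missing" task could be sketched roughly like this; the helper name, the variable list, and the fixed fill value are illustrative only, not the GUTILS API:

```python
import numpy as np

FILL = -9999.9  # matches the _FillValue shown in the netCDF dumps below
PSEUDOGRAM_VARS = ("pseudogram_time", "pseudogram_depth", "pseudogram_sv")

def extras_variables(decoded=None, n_extras=1):
    """Return every extras-dimensioned pseudogram variable, substituting
    fill values when no pseudogram was decoded, so all output files carry
    the same set of variables."""
    decoded = decoded or {}
    out = {}
    for name in PSEUDOGRAM_VARS:
        values = decoded.get(name)
        if values is None:
            # No pseudogram in this segment: emit a fill-valued variable
            values = np.full(n_extras, FILL)
        out[name] = np.asarray(values, dtype="f8")
    return out
```

The point is only that a missing pseudogram still yields the full variable set, so downstream tooling sees a consistent schema across files.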
@jr3cermak
Author

One thing I hadn't quite figured out was how to run things within the test harness; I am familiar with pytest. To produce those results, I was manually running gutils_binary_to_ascii_watch and gutils_ascii_to_netcdf_watch.

From email:

I will make a change that allows "extra_kwargs" to be specified in the deployment.json file (top level key) and then passed into each Reader's (i.e. SlocumReader) extras(data, **kwargs) method.

This will be in preparation for moving the processing code from the merge (using *.*bd files) to the analysis/processing (using ascii/pandas). Doing that will be much easier when we move to using dbdreader, which is a great suggestion! I didn't know it existed and it will make it much easier to work with the *.*bd files.

The dbdreader has its own quirks.

@kwilcox
Member

kwilcox commented Mar 15, 2022

You can run the existing EcoMetrics tests with pytest -k TestEcoMetrics. The tests do remove any of the produced files at the end of running. I often will comment out the tearDown method of a test while I am writing the assertions so I can inspect the produced netCDF files: https://github.com/SECOORA/GUTILS/blob/master/gutils/tests/test_slocum.py#L295-L297

I pushed a pseudograms-remix branch that has your initial work from #10. You can PR against that!

@jr3cermak
Author

It seems to be working as is. With tearDown() disabled...

$ pytest -k TestEcoMetricsThree
============================================================================= test session starts ==============================================================================
platform linux -- Python 3.6.15, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /home/cermak/miniconda3/envs/glider/bin/python
cachedir: .pytest_cache
rootdir: /home/cermak/src/GUTILS, configfile: setup.cfg
plugins: anyio-2.2.0
collected 34 items / 33 deselected / 1 selected                                                                                                                                

tests/test_slocum.py::TestEcoMetricsThree::test_pseudogram 2022-03-15 21:14:42,212 - gutils.slocum - INFO - Converted unit_507-2022-042-1-2.sbd,unit_507-2022-042-1-2.tbd to unit_507_2022_042_1_2_sbd.dat
2022-03-15 21:14:42,348 - gutils.filters - INFO - ('Filtered 2/5 profiles from unit_507_2022_042_1_2_sbd.dat', 'Depth (1m): 1', 'Points (5): 1', 'Time (5s): 0', 'Distance (1m): 0')
PASSED

======================================================================= 1 passed, 33 deselected in 5.77s =======================================================================

The test produces three netCDF files. The last one has the desired information. The first two will need empty variables.

~/src/GUTILS/gutils/tests/resources/slocum/ecometrics3/rt$ ls -l netcdf/
total 652
-rw-rw-r-- 1 cermak staff 198712 Mar 15 21:14 ecometrics_1644647093_20220212T062453Z_rt.nc
-rw-rw-r-- 1 cermak staff 209647 Mar 15 21:14 ecometrics_1644647313_20220212T062833Z_rt.nc
-rw-rw-r-- 1 cermak staff 253545 Mar 15 21:14 ecometrics_1644648114_20220212T064154Z_rt.nc
netcdf ecometrics_1644648114_20220212T064154Z_rt {
dimensions:
        time = 20 ;
        extras = 2079 ;
variables:
        string trajectory ;
                trajectory:cf_role = "trajectory_id" ;
                trajectory:long_name = "Trajectory/Deployment Name" ;
                trajectory:comment = "A trajectory is a single deployment of a glider and may span multiple data files." ;
                trajectory:ioos_category = "Identifier" ;
....
        double pseudogram_time(extras) ;
                pseudogram_time:_FillValue = -9999.9 ;
                pseudogram_time:units = "seconds since 1990-01-01 00:00:00Z" ;
                pseudogram_time:calendar = "standard" ;
                pseudogram_time:long_name = "Pseudogram Time" ;
                pseudogram_time:ioos_category = "Other" ;
                pseudogram_time:standard_name = "pseudogram_time" ;
                pseudogram_time:platform = "platform" ;
                pseudogram_time:observation_type = "measured" ;
        double pseudogram_depth(extras) ;
                pseudogram_depth:_FillValue = -9999.9 ;
                pseudogram_depth:units = "m" ;
                pseudogram_depth:long_name = "Pseudogram Depth" ;
                pseudogram_depth:valid_min = 0. ;
                pseudogram_depth:valid_max = 2000. ;
                pseudogram_depth:ioos_category = "Other" ;
                pseudogram_depth:standard_name = "pseudogram_depth" ;
                pseudogram_depth:platform = "platform" ;
                pseudogram_depth:observation_type = "measured" ;
....
sci_echodroid_aggindex = _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, 0.0382824018597603, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
    _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, 
....

Continuing with other tasks... adding information seems straightforward. Do non-standard attributes cause problems? Fiddling with deployment.json and instrument.json a bit:

        double pseudogram_sv(extras) ;
                pseudogram_sv:_FillValue = -9999.9 ;
                pseudogram_sv:units = "db" ;
                pseudogram_sv:long_name = "Pseudogram SV" ;
                pseudogram_sv:colorBarMinimum = -200. ;
                pseudogram_sv:colorBarMaximum = 200. ;
                pseudogram_sv:ioos_category = "Other" ;
                pseudogram_sv:standard_name = "pseudogram_sv" ;
                pseudogram_sv:platform = "platform" ;
                pseudogram_sv:observation_type = "measured" ;
                pseudogram_sv:coordinates = "pseudogram_time pseudogram_depth" ;
                pseudogram_sv:echosounderRangeBins = 20LL ;
                pseudogram_sv:echosounderRange = 60. ;
                pseudogram_sv:echosounderRangeUnits = "meters" ;
                pseudogram_sv:echosounderDirection = "up" ;

The acoustic system has two components with separate serial numbers.
Added the acoustics instrument as:

        int instrument_acoustics ;
                instrument_acoustics:_FillValue = 0 ;
                instrument_acoustics:serial_number = "269615" ;
                instrument_acoustics:make_model = "Simrad WBT Mini" ;
                instrument_acoustics:serial_number_2 = "167" ;
                instrument_acoustics:make_model_2 = "ES200-CDK-split" ;
                instrument_acoustics:comment = "Slocum Glider UAF G507" ;
                instrument_acoustics:long_name = "Kongsberg Simrad WBT Mini" ;
                instrument_acoustics:mode_operation = "EK80" ;
                instrument_acoustics:calibration_date = "" ;
                instrument_acoustics:factory_calibrated = "" ;
                instrument_acoustics:calibration_report = "" ;
                instrument_acoustics:platform = "platform" ;
                instrument_acoustics:type = "instrument" ;

If this is ok, I can look at removing the hard-coded options.
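The suffixed attribute names (serial_number_2, make_model_2) could be generated instead of hand-written; a hypothetical helper sketch (acoustics_attributes is not part of GUTILS):

```python
def acoustics_attributes(components, **common):
    """Flatten a multi-component instrument into one netCDF attribute dict,
    suffixing repeated keys with _2, _3, ... as in the CDL above."""
    attrs = dict(common)
    for i, component in enumerate(components, start=1):
        suffix = "" if i == 1 else f"_{i}"
        for key, value in component.items():
            attrs[key + suffix] = value
    return attrs

# The two components from the CDL above
attrs = acoustics_attributes(
    [{"serial_number": "269615", "make_model": "Simrad WBT Mini"},
     {"serial_number": "167", "make_model": "ES200-CDK-split"}],
    comment="Slocum Glider UAF G507",
)
```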

@jr3cermak
Author

Moved the config options to the instrument since they impact all the eco* variables.

        int instrument_acoustics ;
                instrument_acoustics:_FillValue = 0 ;
                instrument_acoustics:serial_number = "269615" ;
                instrument_acoustics:make_model = "Simrad WBT Mini" ;
                instrument_acoustics:serial_number_2 = "167" ;
                instrument_acoustics:make_model_2 = "ES200-CDK-split" ;
                instrument_acoustics:comment = "Slocum Glider UAF G507" ;
                instrument_acoustics:long_name = "Kongsberg Simrad WBT Mini" ;
                instrument_acoustics:mode_operation = "EK80" ;
                instrument_acoustics:echosounderRangeBins = 20LL ;
                instrument_acoustics:echosounderRange = 60. ;
                instrument_acoustics:echosounderRangeUnits = "meters" ;
                instrument_acoustics:echosounderDirection = "up" ;
                instrument_acoustics:calibration_date = "" ;
                instrument_acoustics:factory_calibrated = "" ;
                instrument_acoustics:calibration_report = "" ;
                instrument_acoustics:platform = "platform" ;
                instrument_acoustics:type = "instrument" ;

@jr3cermak
Author

Thinking about grouping these so other features can be added later without getting mixed up with other provided keywords.

replace:

    "extra_kwargs": {
        "enable_pseudograms": true,
        "echosounderRange": 60.0,
        "echosounderRangeBins": 20,
        "echosounderDirection": "up",
        "echosounderRangeUnits": "meters"
    },

with?

    "extra_kwargs": {
        "pseudograms": {
               "enable": true,
               "echosounderRange": 60.0,
               "echosounderRangeBins": 20,
               "echosounderDirection": "up",
               "echosounderRangeUnits": "meters"
        }
    },
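A reader consuming the grouped form could fall back to the flat form so existing deployment.json files keep working; a sketch following the key names above (the helper itself is hypothetical, not GUTILS code):

```python
def pseudogram_config(extra_kwargs):
    """Return the pseudogram settings from extra_kwargs, accepting both the
    grouped layout ({"pseudograms": {...}}) and the older flat layout."""
    grouped = extra_kwargs.get("pseudograms")
    if grouped is not None:
        cfg = dict(grouped)
        cfg.setdefault("enable", False)
        return cfg
    # Fall back to flat keys, mapping enable_pseudograms -> enable
    cfg = {k: v for k, v in extra_kwargs.items() if k.startswith("echosounder")}
    cfg["enable"] = extra_kwargs.get("enable_pseudograms", False)
    return cfg
```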

@kwilcox
Member

kwilcox commented Mar 17, 2022

Grouping the kwargs is a great idea... extras can be used to do anything and isn't restricted to pseudogram things.

@jr3cermak
Author

jr3cermak commented Apr 8, 2022

Current tasks:

  • Try always adding the extras dimension for deployments that need it; there is a current edge condition: if no extras variables are present, the extras dimension is not added (Ben, 3/31/2022). Decision: do not use the extras dimension; save as a separate profile with time (PR Move the "extras" data to be in their own profile netCDF file #18)
  • Generate pseudogram plots from the GUTILS side of the house
  • Add controls to kwargs to allow filtering of known bad data for a deployment (i.e., for this deployment, the 20th bin is producing bad data); that was just a bug; fixed

Interim testing:

  • base ERDDAP container running via "axiom/docker-erddap"; grab setup.xml and datasets.xml
  • configure local "content" or "bigParentDirectory" outside of container
  • Add testing dataset "extras_snippet.xml" to datasets.xml for testing; even make it the only available dataset
  • Produce nc files from glider dbd files and test ERDDAP
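The first two steps above might look like the following; the host-side paths are placeholders, and the mount points are the ones conventionally used by the axiom/docker-erddap image (Tomcat content directory for setup.xml/datasets.xml, /erddapData as bigParentDirectory):

```shell
# Run a disposable ERDDAP, keeping the content and bigParentDirectory
# trees on the host so setup.xml and datasets.xml can be edited in place
docker run -d --name erddap-test -p 8080:8080 \
  -v "$PWD/content":/usr/local/tomcat/content/erddap \
  -v "$PWD/erddapData":/erddapData \
  -v "$PWD/netcdf":/datasets/gliders/ecodroid2 \
  axiom/docker-erddap

# Splice extras_snippet.xml into content/datasets.xml, then browse
# http://localhost:8080/erddap/tabledap/index.html
```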

@kwilcox
Member

kwilcox commented Apr 11, 2022

@jr3cermak I played around with hosting the datasets as-in (with the extras dimension) and it won't currently work with the DAC's setup since they are on an old version of ERDDAP. Even if they did upgrade their ERDDAP version it still doesn't work wonderfully. Requesting a subset of data where variables are dimensioned by both time and extras fails to return data. I'm sure this is something Bob Simons could advise on, but for now, we have 2 options:

  1. Remove the extras dimension and put the pseudogram data directly into the time dimension. We did this at one point, but I likely suggested splitting it out. IMO the extras dimension is much more correct.
  2. Bypass the DAC and get the pseudogram data into the AOOS data system another way. The profile netCDF files will not include the pseudogram data and it will only be available through the AOOS data portal. It won't be archived with the glider data through NCEI.

@jr3cermak
Author

I also experimented with storing the pseudogram with the time dimension. Since the pseudogram time coordinates are different from the CTD profile, the resultant netCDF files became very large. So, I would say writing the pseudogram data out to a separate file sounds like the best option at the moment.
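The blow-up is easy to reproduce: aligning two variables with disjoint time axes onto one shared time dimension pads every variable with fill values at the other's timestamps. A toy illustration with pandas (the sizes here are made up):

```python
import pandas as pd

# CTD profile sampled at one set of times, pseudogram at another
ctd = pd.DataFrame({"temperature": [10.1, 10.2, 10.3]},
                   index=pd.Index([0, 10, 20], name="time"))
psg = pd.DataFrame({"pseudogram_sv": [-60.0, -55.0]},
                   index=pd.Index([3, 7], name="time"))

# Sharing one time dimension means taking the union of the two axes ...
merged = ctd.join(psg, how="outer")

# ... so every variable carries padding at the other's times: 5 rows
# instead of 3 + 2 stored separately, and each extra pseudogram sample
# adds a fully padded row across all CTD variables.
assert len(merged) == 5
assert merged["temperature"].isna().sum() == 2
assert merged["pseudogram_sv"].isna().sum() == 3
```

With thousands of pseudogram samples per profile, that padding dominates the file size, which is why a separate file per data stream stays compact.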

@kwilcox
Member

kwilcox commented Apr 11, 2022

😒

Here is an ERDDAP Dataset that just serves the pseudogram data. I'm playing with some ideas to get this into AOOS, stay tuned.

<dataset type="EDDTableFromMultidimNcFiles" datasetID="unit_507_pseudogram" active="true">
        <!-- defaultDataQuery uses datasetID -->
        <!--
                    <defaultDataQuery>&amp;trajectory=extras_test-20220329T0000</defaultDataQuery>
                    <defaultGraphQuery>longitude,latitude,time&amp;.draw=markers&amp;.marker=2|5&.color=0xFFFFFF&.colorBar=|||||</defaultGraphQuery>
                    -->
        <reloadEveryNMinutes>1440</reloadEveryNMinutes>
        <updateEveryNMillis>-1</updateEveryNMillis>
        <!-- use datasetID as the directory name -->
        <fileDir>/datasets/gliders/ecodroid2</fileDir>
        <recursive>false</recursive>
        <fileNameRegex>.*\.nc</fileNameRegex>
        <metadataFrom>last</metadataFrom>
        <sortedColumnSourceName>pseudogram_time</sortedColumnSourceName>
        <sortFilesBySourceNames>trajectory pseudogram_time</sortFilesBySourceNames>
        <fileTableInMemory>false</fileTableInMemory>
        <accessibleViaFiles>true</accessibleViaFiles>
        <addAttributes>
            <att name="cdm_data_type">trajectoryProfile</att>
            <att name="featureType">trajectoryProfile</att>
            <!-- <att name="cdm_altitude_proxy">pseudogram_depth</att> -->
            <att name="cdm_trajectory_variables">trajectory,wmo_id</att>
            <att name="cdm_profile_variables">profile_id,profile_time,latitude,longitude</att>
            <att name="subsetVariables">trajectory,wmo_id,profile_id,profile_time,latitude,longitude</att>
            <att name="Conventions">Unidata Dataset Discovery v1.0, COARDS, CF-1.6</att>
            <att name="keywords">AUVS &gt; Autonomous Underwater Vehicles, Oceans &gt; Ocean Pressure &gt; Water Pressure, Oceans &gt; Ocean Temperature &gt; Water Temperature, Oceans &gt; Salinity/Density &gt; Conductivity, Oceans &gt; Salinity/Density &gt; Density, Oceans &gt; Salinity/Density &gt; Salinity, glider, In Situ Ocean-based platforms &gt; Seaglider, Spray, Slocum, trajectory, underwater glider, water, wmo</att>
            <att name="keywords_vocabulary">GCMD Science Keywords</att>
            <att name="Metadata_Conventions">Unidata Dataset Discovery v1.0, COARDS, CF-1.6</att>
            <att name="sourceUrl">(local files)</att>
            <att name="infoUrl">https://gliders.ioos.us/erddap/</att>
            <!-- title=datasetID -->
            <att name="title">unit_507-20220212T0000_pseudogram</att>
            <att name="ioos_dac_checksum">sdfsdf</att>
            <att name="ioos_dac_completed">False</att>
            <att name="gts_ingest">true</att>
        </addAttributes>

        <dataVariable>
            <sourceName>trajectory</sourceName>
            <destinationName>trajectory</destinationName>
            <dataType>String</dataType>
            <addAttributes>
                <att name="comment">A trajectory is one deployment of a glider.</att>
                <att name="ioos_category">Identifier</att>
                <att name="long_name">Trajectory Name</att>
            </addAttributes>
        </dataVariable>

        <dataVariable>
            <sourceName>global:wmo_id</sourceName>
            <destinationName>wmo_id</destinationName>
            <dataType>String</dataType>
            <addAttributes>
                <att name="ioos_category">Identifier</att>
                <att name="long_name">WMO ID</att>
                <att name="missing_value" type="string">none specified</att>
            </addAttributes>
        </dataVariable>

        <dataVariable>
            <sourceName>profile_id</sourceName>
            <destinationName>profile_id</destinationName>
            <dataType>int</dataType>
            <addAttributes>
                <att name="cf_role">profile_id</att>
                <att name="ioos_category">Identifier</att>
                <att name="long_name">Profile ID</att>
            </addAttributes>
        </dataVariable>

        <dataVariable>
            <sourceName>profile_time</sourceName>
            <dataType>double</dataType>
            <addAttributes>
                <att name="ioos_category">Time</att>
                <att name="long_name">Profile Time</att>
                <att name="comment">Timestamp corresponding to the mid-point of the profile.</att>
            </addAttributes>
        </dataVariable>

        <dataVariable>
            <sourceName>profile_lat</sourceName>
            <destinationName>latitude</destinationName>
            <dataType>double</dataType>
            <addAttributes>
                <att name="colorBarMaximum" type="double">90.0</att>
                <att name="colorBarMinimum" type="double">-90.0</att>
                <att name="valid_max" type="double">90.0</att>
                <att name="valid_min" type="double">-90.0</att>
                <att name="ioos_category">Location</att>
                <att name="long_name">Profile Latitude</att>
                <att name="comment">Value is interpolated to provide an estimate of the latitude at the mid-point of the profile.</att>
            </addAttributes>
        </dataVariable>

        <dataVariable>
            <sourceName>profile_lon</sourceName>
            <destinationName>longitude</destinationName>
            <dataType>double</dataType>
            <addAttributes>
                <att name="colorBarMaximum" type="double">180.0</att>
                <att name="colorBarMinimum" type="double">-180.0</att>
                <att name="valid_max" type="double">180.0</att>
                <att name="valid_min" type="double">-180.0</att>
                <att name="ioos_category">Location</att>
                <att name="long_name">Profile Longitude</att>
                <att name="comment">Value is interpolated to provide an estimate of the longitude at the mid-point of the profile.</att>
            </addAttributes>
        </dataVariable>

        <dataVariable>
            <sourceName>pseudogram_time</sourceName>
            <destinationName>time</destinationName>
            <dataType>double</dataType>
            <addAttributes>
                <att name="ioos_category">Time</att>
                <att name="long_name">Profile Time</att>
                <att name="comment">Timestamp corresponding to the mid-point of the profile.</att>
            </addAttributes>
        </dataVariable>


        <dataVariable>
            <sourceName>pseudogram_depth</sourceName>
            <destinationName>depth</destinationName>
            <dataType>double</dataType>
            <addAttributes>
                <att name="colorBarMaximum" type="double">2000.0</att>
                <att name="colorBarMinimum" type="double">0.0</att>
                <att name="colorBarPalette">OceanDepth</att>
                <att name="ioos_category">Location</att>
                <att name="long_name">Depth</att>
            </addAttributes>
            </dataVariable>
        <dataVariable>
            <sourceName>pseudogram_sv</sourceName>
            <dataType>double</dataType>
            <addAttributes>
                <att name="ioos_category">Other</att>
            </addAttributes>
        </dataVariable>

        <dataVariable>
            <sourceName>sci_echodroid_aggindex</sourceName>
            <dataType>double</dataType>
            <addAttributes>
                <att name="ioos_category">Other</att>
            </addAttributes>
        </dataVariable>
        <dataVariable>
            <sourceName>sci_echodroid_ctrmass</sourceName>
            <dataType>double</dataType>
            <addAttributes>
                <att name="ioos_category">Other</att>
            </addAttributes>
        </dataVariable>
        <dataVariable>
            <sourceName>sci_echodroid_eqarea</sourceName>
            <dataType>double</dataType>
            <addAttributes>
                <att name="ioos_category">Other</att>
            </addAttributes>
        </dataVariable>
        <dataVariable>
            <sourceName>sci_echodroid_inertia</sourceName>
            <dataType>double</dataType>
            <addAttributes>
                <att name="ioos_category">Other</att>
            </addAttributes>
        </dataVariable>
        <dataVariable>
            <sourceName>sci_echodroid_propocc</sourceName>
            <dataType>double</dataType>
            <addAttributes>
                <att name="ioos_category">Other</att>
            </addAttributes>
        </dataVariable>
        <dataVariable>
            <sourceName>sci_echodroid_sa</sourceName>
            <dataType>double</dataType>
            <addAttributes>
                <att name="ioos_category">Other</att>
            </addAttributes>
        </dataVariable>
        <dataVariable>
            <sourceName>sci_echodroid_sv</sourceName>
            <dataType>double</dataType>
            <addAttributes>
                <att name="ioos_category">Other</att>
            </addAttributes>
        </dataVariable>
    </dataset>

@jr3cermak
Author

I am just rounding the corner to where I can almost get the latest glider deployment loaded under ERDDAP. I can see it is complaining about something. This is the combined case; it does not seem happy at all with the extras dimension.

*** constructing EDDTableFromFiles unit_507_combined
dir/file table doesn't exist: /erddapData/dataset/ed/unit_507_combined/dirTable.nc
dir/file table doesn't exist: /erddapData/dataset/ed/unit_507_combined/fileTable.nc
creating new dirTable and fileTable (dirTable=null?true fileTable=null?true badFileMap=null?false)
doQuickRestart=false
574 files found in /data/combined/
regex=.*\.nc recursive=false pathRegex=.* time=22ms
old nBadFiles size=0
old fileTable size=0   nFilesMissing=0
Didn't get expected attributes because there were no previously valid files,
  or none of the previously valid files were unchanged!
EDDTableFromFiles file #0=/data/combined/G507_1644626730_20220212T004530Z_rt.nc
0 insert in fileList
0 bad file: removing fileTable row for /data/combined/G507_1644626730_20220212T004530Z_rt.nc
java.lang.RuntimeException: 
ERROR in Test.ensureEqual(Strings) line #1, col #1 'e[end]'!='t[end]':
ERROR in Table.readNDNc /data/combined/G507_1644626730_20220212T004530Z_rt.nc:
Unexpected axis#0 for variable=pseudogram_depth
Specifically, at line #1, col #1:
s1: extras[end]
s2: time[end]
    ^

 at com.cohort.util.Test.error(Test.java:43)
 at com.cohort.util.Test.ensureEqual(Test.java:340)
 at gov.noaa.pfel.coastwatch.pointdata.Table.readNDNc(Table.java:7021)
 at gov.noaa.pfel.erddap.dataset.EDDTableFromNcFiles.lowGetSourceDataFromFile(EDDTableFromNcFiles.java:211)
 at gov.noaa.pfel.erddap.dataset.EDDTableFromFiles.getSourceDataFromFile(EDDTableFromFiles.java:3270)
 at gov.noaa.pfel.erddap.dataset.EDDTableFromFiles.<init>(EDDTableFromFiles.java:1543)
 at gov.noaa.pfel.erddap.dataset.EDDTableFromNcFiles.<init>(EDDTableFromNcFiles.java:130)
 at gov.noaa.pfel.erddap.dataset.EDDTableFromFiles.fromXml(EDDTableFromFiles.java:503)
 at gov.noaa.pfel.erddap.dataset.EDD.fromXml(EDD.java:457)
 at gov.noaa.pfel.erddap.LoadDatasets.run(LoadDatasets.java:359)
netcdf G507_1644626730_20220212T004530Z_rt {
dimensions:
        time = 78 ;
        extras = 651 ;
...
        double pseudogram_time(extras) ;
                pseudogram_time:_FillValue = -9999.9 ;
                pseudogram_time:units = "seconds since 1990-01-01 00:00:00Z" ;
                pseudogram_time:calendar = "standard" ;
                pseudogram_time:long_name = "Pseudogram Time" ;
                pseudogram_time:ioos_category = "Other" ;
                pseudogram_time:standard_name = "pseudogram_time" ;
                pseudogram_time:platform = "platform" ;
                pseudogram_time:observation_type = "measured" ;

That test looks suspicious... 'e[end]' != 't[end]'. Given s1 (extras[end]) and s2 (time[end]), it appears to be comparing dimension names: ERDDAP wants pseudogram_depth to be dimensioned by time and rejects extras.

Onto the separated case...

@jr3cermak
Author

Resync branch after PR #18 and carry on.

@jr3cermak
Author

Resync with master to take a look at the new pathway.

@jr3cermak
Author

Running the latest deployment through the current code produces a single netCDF file now. Are the profiles combined?

This is quite different from what was shown in an earlier email with the tabledap link: https://gliders.ioos.us/erddap/tabledap/extras_test-20220329T0000.htmlTable?trajectory%2Cwmo_id%2Cprofile_id%2Ctime%2Clatitude%2Clongitude%2Cdepth%2Cpseudogram_depth%2Cpseudogram_sv%2Cpseudogram_time%2Csci_echodroid_aggindex%2Csci_echodroid_ctrmass%2Csci_echodroid_eqarea%2Csci_echodroid_inertia%2Csci_echodroid_propocc%2Csci_echodroid_sa%2Csci_echodroid_sv&time%3E=2021-12-02T00%3A00%3A00Z&time%3C=2021-12-09T17%3A33%3A35Z that references: https://gliders.ioos.us/erddap/files/extras_test-20220329T0000/

On the DAC for unit_507, there are two separate sets of files *_rt.nc and the _extra_rt.nc: https://gliders.ioos.us/erddap/files/unit_507-20220212T0000/

It looks like the pseudogram is folded back into the profiles as a single file now.

@jr3cermak
Author

jr3cermak commented Jul 13, 2022

The latest master of GUTILS is great for backend storage of echodroid/pseudogram data.

Tossing the _extra_rt.nc files behind an aggregated netCDF dataset in THREDDS allows for full-deployment plotting.

  <dataset name="Glider extras" ID="Gretel-NC-extra" urlPath="GretelExtra.nc">
    <serviceName>all</serviceName>
    <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
      <aggregation dimName="time" type="joinExisting">
        <scan location="/home/cermak/glider/ecometrics6/rt/netcdf/" suffix="_rt_extra.nc" subdirs="false"/>
      </aggregation>
    </netcdf>
  </dataset>

Python code to pull from the aggregation, just for reference.

#$ cat plotGretelDepl2.py 
import io, os, sys, struct, datetime
import subprocess
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib.figure import Figure
from matplotlib.colors import LinearSegmentedColormap, Colormap
import matplotlib.dates as dates
import matplotlib.ticker as mticker
from matplotlib.patches import Rectangle
import json
import xarray as xr
#get_ipython().run_line_magic('matplotlib', 'inline')

def newFigure(figsize = (10,8), dpi = 100):

    fig = Figure(figsize=figsize, dpi=dpi)

    return fig

# Fetch Sv data
def fetchSv(start_time, end_time, ds):
    # Copy data into a numpy array and resort Sv(dB) values for plotting
    # Convert TS to string
    # datetime.datetime.strftime(datetime.datetime.utcfromtimestamp(dt), "%Y-%m-%d %H:%M:%S.%f")
    # Convert string to TS
    # datetime.datetime.strptime(dtSTR, "%Y-%m-%d %H:%M:%S.%f").timestamp()
    time_dim = 'time'
    sv_ts    = np.unique(ds[time_dim])

    startDTTM = start_time
    #startVal = datetime.datetime.strptime(startDTTM, "%Y-%m-%d %H:%M:%S.%f").timestamp()
    startVal = np.datetime64(datetime.datetime.strptime(startDTTM, "%Y-%m-%d %H:%M:%S.%f"))
    endDTTM = end_time
    #endVal = datetime.datetime.strptime(endDTTM, "%Y-%m-%d %H:%M:%S.%f").timestamp()
    endVal = np.datetime64(datetime.datetime.strptime(endDTTM, "%Y-%m-%d %H:%M:%S.%f"))

    # This obtains time indices for the unique time values
    a = np.abs(sv_ts-startVal).argmin()
    b = np.abs(sv_ts-endVal).argmin()
    #print(a,b)

    #print(time_array.shape)
    #print(list(sv.variables))
    # https://xarray.pydata.org/en/v0.11.0/time-series.html
    sv_data  = ds['pseudogram_sv'].sel(time=slice(pd.Timestamp(sv_ts[a]),pd.Timestamp(sv_ts[b])))
    sv_time  = [pd.Timestamp(t.values).timestamp() for t in ds[time_dim].sel(time=slice(pd.Timestamp(sv_ts[a]),pd.Timestamp(sv_ts[b])))]
    sv_depth = ds['depth'].sel(time=slice(pd.Timestamp(sv_ts[a]),pd.Timestamp(sv_ts[b])))

    return (sv_time, sv_depth, sv_data)


# Make plots from intermediate deployment data
def makePlot(sv_time, sv_depth, sv_data):
    # Set the default SIMRAD EK500 color table plus grey for NoData.
    simrad_color_table = [(1, 1, 1),
        (0.6235, 0.6235, 0.6235),
        (0.3725, 0.3725, 0.3725),
        (0, 0, 1),
        (0, 0, 0.5),
        (0, 0.7490, 0),
        (0, 0.5, 0),
        (1, 1, 0),
        (1, 0.5, 0),
        (1, 0, 0.7490),
        (1, 0, 0),
        (0.6509, 0.3255, 0.2353),
        (0.4705, 0.2353, 0.1568)]
    simrad_cmap = (LinearSegmentedColormap.from_list
        ('Simrad', simrad_color_table))
    simrad_cmap.set_bad(color='lightgrey')

    # Convert sv_time to something useful
    svData   = np.column_stack((sv_time, sv_depth, sv_data))

    # Filter out the noisy -5.0 and -10.0 data
    svData = np.where(svData == -5.0, -60.0, svData)
    svData = np.where(svData == -15.0, -60.0, svData)

    # Sort Sv(dB) from lowest to highest so higher values are plotted last
    svData = svData[np.argsort(svData[:,2])]

    # Plot simply x, y, z data (time, depth, dB)
    #fig, ax = plt.subplots(figsize=(10,8))
    fig = newFigure()
    ax = fig.subplots()

    #ax.xaxis.set_minor_locator(dates.MinuteLocator(interval=10))   # every 10 minutes
    #ax.xaxis.set_minor_locator(dates.HourLocator(interval=3))   # every 3 hours
    #ax.xaxis.set_minor_formatter(dates.DateFormatter('%H'))  # hours
    #ax.xaxis.set_minor_formatter(dates.DateFormatter('%H:%M'))  # hours and minutes
    ax.xaxis.set_major_locator(dates.DayLocator(interval=2))    # every day
    #ax.xaxis.set_major_formatter(dates.DateFormatter('\n%m-%d-%Y'))
    ax.xaxis.set_major_formatter(dates.DateFormatter('%m/%d'))
    ax.tick_params(which='major', labelrotation=45)

    #ax.set_facecolor('lightgray')
    ax.set_facecolor('white')

    dateData = [datetime.datetime.fromtimestamp(ts) for ts in svData[:,0]]
    #im = plt.scatter(dateData, svData[:,1], c=svData[:,2], cmap=simrad_cmap, s=30.0)
    im = ax.scatter(dateData, svData[:,1], c=svData[:,2], cmap=simrad_cmap, s=30.0)

    #cbar = plt.colorbar(orientation='vertical', label='Sv (dB)', shrink=0.40)
    fig.colorbar(im, orientation='vertical', label='Sv (dB)', shrink=0.40)

    #plt.ylim(0, sv_depth.max())

    #plt.gca().invert_yaxis()

    #plt.ylabel('Depth (m)')
    #plt.xlabel('Date (UTC)')
    ax.set(ylim=[0, sv_depth.max()], xlabel='Date (UTC)', ylabel='Depth (m)')
    #plt.clim(0, -55)
    im.set_clim(0, -55)

    # Invert axis after limits are set
    im.axes.invert_yaxis()
    #plt.title("Acoustic Scattering Volume (dB) Pseudogram")
    ax.set_title("Acoustic Scattering Volume (dB) Pseudogram")

    return fig, ax

ds = xr.open_dataset('http://mom6node0:8080/thredds/dodsC/GretelExtra.nc')

# Find the timespan of the dataset
ts_min = ds['time'].min()
ts_max = ds['time'].max()

# use the entire deployment

start_dt_string = str(ts_min.dt.strftime("%Y-%m-%d %H:%M:%S.%f").values)
end_dt_string = str(ts_max.dt.strftime("%Y-%m-%d %H:%M:%S.%f").values)

(sv_time, sv_depth, sv_data) = fetchSv(start_dt_string, end_dt_string, ds)

if len(sv_data) > 100:
    (fig, ax) = makePlot(sv_time, sv_depth, sv_data)

    imageOut = "Sv_%s_all.png" % (str(ts_min.dt.strftime("%Y%m%d").values))
    fig.savefig(imageOut, bbox_inches='tight', dpi=100)

ds.close()

@kwilcox (Member) commented Jul 14, 2022

Nice, an added benefit I didn't even think about!

@jr3cermak (Author) commented Mar 18, 2023

Cycling back around to provide an update in support of future deployments. We will resync with master and move forward. Please let me know what you need in support of echometrics and low-resolution echograms (formerly pseudograms). This update will provide:

  • netcdf output
  • csv output
  • pandas data frame
  • xarray data frame
  • direct images by plot type: binned, scatter or pcolormesh
  • Sv thresholds for plotting and the default plot type, specified via the deployment.json file
  • updated documentation for the data processing tools
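For the deployment.json piece, a minimal sketch of what the acoustic configuration might look like, assuming the top-level `extra_kwargs` key mentioned earlier in this thread; the nested key names and threshold values here are hypothetical, for illustration only:

```json
{
  "glider": "gretel",
  "extra_kwargs": {
    "echograms": {
      "plot_type": "pcolormesh",
      "sv_min": -80.0,
      "sv_max": -55.0
    }
  }
}
```

GUTILS would pass `extra_kwargs` into the reader's `extras(data, **kwargs)` method, so the echogram code could pick up the Sv thresholds and default plot type from there.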

Because of GUTILS' two-stage processing, providing a data frame directly would require the 2nd-pass script to have access to the DBD files and the cache file directory so it can decode them into a data frame object. Otherwise, continue to use the 1st pass to produce the csv file, then read that csv back in the 2nd pass to recover the data frame (roughly what happens now). The intermediate output file can be anything -- e.g. a pickled object holding the data frame needed in the 2nd stage.
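As a sketch of that intermediate hand-off -- not GUTILS code, and the file name is made up -- the 1st pass could pickle the decoded data frame and the 2nd pass could reload it instead of re-parsing a csv:

```python
import pandas as pd

# 1st pass: after decoding the *.*bd files (decoding not shown), persist
# the echogram data frame so the 2nd pass can skip re-decoding.
df = pd.DataFrame({
    "time": pd.to_datetime(["2022-03-15T00:00:00", "2022-03-15T00:00:01"]),
    "depth": [5.0, 10.0],
    "sv": [-60.2, -58.7],  # acoustic scattering volume (dB)
})
df.to_pickle("echogram_stage1.pkl")  # hypothetical intermediate file

# 2nd pass: recover the exact data frame, dtypes intact.
df2 = pd.read_pickle("echogram_stage1.pkl")
assert df2.equals(df)
```

Unlike a csv round-trip, the pickle preserves dtypes (e.g. the datetime column) without re-parsing.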

We need to know what target(s) to hit for you so we can build them into the CI testing. Once everything passes again, we can move ahead with other fun things. It looks like Python 3.7 is EOL -- is there a particular version of Python we should use? We are at the stage of reworking the tests and updating code. I anticipate at least two to four weeks of additional effort on our side before a reasonable PR is ready, though that could change based on the requirements/targets provided.

@jr3cermak (Author)

Main branch readme => python 3.9 :)

@jr3cermak (Author) commented Mar 25, 2023

Unfortunately, our work has snowballed a bit, so as of this writing we will need to submit at least three PRs in total. The first is ready to go once CI tests pass.

  • current PR (Echometrics improvements early 2023 #22): improves echogram decoding and the generation of single-profile images; access to the echogram data frame within the context of GUTILS is now available as an example. Still not perfect -- but progress in the right direction after merging about four different sources of code accumulated over two years. This PR improves support for the glider's "combo" mode.
  • now included in PR #22: support for higher-resolution echograms via the glider's "egram" mode, for the UAF and USF gliders.
  • also included in PR #22: code examples that create echogram timeseries/waterfall plots, aggregating profiles over arbitrary spans of time (day, month, etc.)
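For the aggregation piece, a minimal illustration (synthetic data, not the PR's actual code) of binning profile values over an arbitrary window with pandas resampling:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for flattened echogram samples: one Sv value per hour.
rng = np.random.default_rng(0)
times = pd.date_range("2022-03-15", periods=96, freq="H")
df = pd.DataFrame({"time": times, "sv": rng.uniform(-80.0, -40.0, size=96)})

# Aggregate to one value per day; the "1D" rule could just as well be
# monthly, 6-hourly, etc., which is how arbitrary spans fall out for free.
daily = df.set_index("time")["sv"].resample("1D").mean()
assert len(daily) == 4  # 96 hourly samples span four days
```

The same `resample` call with a different rule string covers the day/month cases mentioned above.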

@jr3cermak (Author)

Latest checks have passed. I have refreshed documentation in the README.pdf and have it out on a website (that may be down at some point for an OS update).

https://nasfish.fish.washington.edu/echotools/docs/html/echotools/html/echotools/README.html

The important bit is walking from the produced netCDF files (*_extra.nc) to a time-series plot of the echogram profiles for any given time range. I think that is the target product desired on the data portal.

https://nasfish.fish.washington.edu/echotools/docs/html/echotools/html/echotools/README.html#product

That should give us the pivot point to start heading down the pyarrow rabbit hole.

@jr3cermak (Author)

Just a little more work on some additional "profile" products for echometrics. We stood up a prototype that will be used internally once implemented in some fashion on the data portal.

https://nasfish.fish.washington.edu/echotools/dppp/egramBrowser/portal.html

@kwilcox (Member) commented Apr 4, 2023

Ready for me to take a look?

@jr3cermak (Author)

There is at least one more pending update with additional "profile" products to be sent. I will post another note when things settle.

@jr3cermak (Author)

You can move ahead with the current code in the PR. This other new part needs some more R&D before it can be implemented; I originally thought it was going to be an easy drop-in addition, but that is not the case.

@jcermauwedu

A proposal for additional CF standard names has been submitted to improve standards compliance for proposed acoustic datasets, for future use in deployment.json and other configuration files.
