Echometrics improvements early 2023 #22

jr3cermak · 2023-03-20T17:26:17Z

Initiate PR with additional updates forthcoming. This is primarily to check results of upstream CI tests before proceeding with additional work.

Rename pseudogram to echogram
Rename ecometrics to echometrics
Ensure all tests pass before proceeding with additional updates
convertDBD.sh
- Better match conditions for input files (rt[tbd] or delayed mode[EBD]) to decoder script
- Remove use of slocum dbd2asc decoder in favor of dbdreader (next)
- Move echogram files separately from dba_extension files since they normally do not exist
Update templates to track updated names

* Rename pseudogram to echogram * Rename ecometrics to echometrics * Ensure all tests pass before proceeding with additional updates * convertDBD.sh * Better match conditions for input files (rt[tbd] or delayed mode[EBD]) to decoder script * Remove use of slocum dbd2asc decoder in favor of dbdreader (next) * Move echogram files separately from dba_extension files since they normally do not exist * Update templates to track updated names

jr3cermak · 2023-03-20T17:46:24Z

CI updates in progress via #20.

Tests within docker are ok and matches independent testing.

==== 27 passed, 8 deselected, 1 xpassed, 3034 warnings in 267.18s (0:04:27) ====

kwilcox · 2023-03-21T12:38:57Z

@jr3cermak Can you rebase? CI changes are merged

kwilcox · 2023-03-21T15:29:17Z

Can we completely replace convertDBD.sh (and the slocum binaries) with functionality from dbdreader at this point?

jr3cermak · 2023-03-21T16:40:20Z

It would be great to skip the ascii conversion stage and promote important bits into the 2nd stage (netcdf) code. We would then have access to direct data frames instead having to read them from intermediate files. I am not sure dbdreader can completely replace all of what the slocum binaries can do. For straight ascii conversion, yes. For our data processing pipeline, we have moved completely away from the slocum binaries. The slocum binaries ascii conversion does not provide enough floating point precision to decode the embedded echograms which is why dbdreader is necessary. After these updates (other PRs will follow), we can take a deeper dive into the ascii and convertDBD.sh code and see what can be done.

kwilcox · 2023-03-21T18:55:15Z

How about an intermediary format other than the ascii that currently exists (like 1->N parquet files)? I would like to keep a table-like serialization for other types of processing, distribution, analysis, etc. The netCDF format was really meant as the format to submit to the IOOS Glider DAC and is always going to be lossy.

jr3cermak · 2023-03-21T20:28:24Z

Switching to parquet should be fine. Any shift will be require an initial lift to get started. I think I get it now. You are also not enthralled with netcdf as the unifying backend. I had hopes seeing activity at xpublish. So, the goal is to do away with convertDBD.sh and the ascii part but make the converter more useful to serve better purpose than just throwing the ascii part away. Right now the general process is DBD->convertDBD.sh/dbdreader->ascii->netcdf->ERDDAP->data portal. The goal is DBD->parquet/dbdreader->parquet storage->netcdf. The data portal would at some point begin to pull from parquet storage. Is there a xpublish type framework that would sit on top of parquet as xpublish is envisioned for xarray/zarr?

kwilcox · 2023-03-22T12:51:15Z

Sounds like we are on the same page!

*.*bd files 
    -> dbdreader
        -> parquet files 
            -> profile netCDF files (via pocean) for glider dac
            -> backend analysis/viz (static plots, etc.)
            -> xpublish
                -> frontend analysis/viz (dynamic plots, etc.)

xpublish can sit on top of parquet files the same way it can sit upon xarray datasets... with a little plugin magic. I have a proof of concept that uses duckdb on top of parquet file served through xpublish and it is pretty nice for a quick API, that is the direction I'll be heading.

jr3cermak · 2023-03-22T16:26:45Z

Splendid. I will let you know when this PR finished and ready to go. Then we can look into pyarrow a bit more.

Documentation * Update README.pdf for echotools General Notes * Rename driver script * Remove unneeded scripts * Most of the updates are within echotools processing * Rename as we go: pseudogram => echogram * Rename as we go: echosounder => echogram * Hopefully not breaking any APIs in the process * Set title to {segment} for echograms * Note deprecated arguments * Remove debug print statement * Allow user supplied arguments to override deployment.conf arguments * Print help menu if a data type is not provided * Convert limits to python lists if provided * Use segment for filenames if "default" is part of the filename * Default plot type is "binned" * Fix up "time" axis for echogram plots * Allow deployment.conf to update arguments not provided on the command line Specific * tests: TestEchoMetricsTwo: echometrics2; test_slocum.py * Fix date in deployment.json for echometrics2 * Replace with modern example * Includes test of teledyne module and creation of numpy, pandas and xarray data frames from echogram to provide an example of how it can start to be leveraged from within the GUTILS framework. * gutils: slocum/__init__.py * move setting of enable_image to a better spot * Update calls to convertDBD.sh to match API changes * Update calls to decodePseudogram.py => processEchograms.py * gutils: tests/resources/slocum/* * Eco => Echo * Update extra_kwargs to match API changes/updates * Rename as we go: pseudogram => echogram * Rename as we go: echosounder => echogram * Add more tests to Echodroid/Echometrics tests to attempt to capture backend failures * Remove some some unneeded files * echotools: teledyne.py * specify matplotlib backend to potentially make faster; defaults tend to trigger X11 layer * rename [pP]seudo => [eE]cho * Check arguments for None * default plot type => pcolormesh => binned * set plot facecolors to lightgrey * recalculate range bin size if defaults are changed * fix an inventory filtering bug * add a function to create groups of DBD files based on a raw DBD file listing * a deployment directory is not required if not writing netcdf files * a template file is required when writing netcdf files * add a function that overrides key arguments if a deployment file is read. Mainly applies to echograms. * allow plot types to be specified in a comma delimited list or with "all" * rearrange plotting code so all three types can be produced * ensure the color bar is consistent between the three plotting techniques * Rename as we go: pseudogram => echogram * Rename as we go: echosounder => echogram * echotools: processEchograms.py * Rename as we go: pseudogram => echogram * Rename as we go: echosounder => echogram * Always look for a deployment configuration file * Update arguments based on deployment configuration file

jr3cermak · 2023-03-25T21:15:27Z

This PR is good to go anytime. gutils/tests/test_slocum.py::TestEchoMetricsTwo::test_echogram has code that will read the echogram and put the profile into a dataframe: numpy, pandas and xarray. Initial starting point for migrating away from convertDBD.sh. The teledyne.py module has morphed a few times as it has encountered various sources of code. It can probably use an overhaul when we get closer to tackling the convertDBD.sh script and friends later. In an earlier life, it also moved away from the slocum binaries. Class functions still lurk in there even after converting to the dbdreader module. Progress.

* slocum/__init__.py * Set default plot type to 'binned' * Emit a log message for debugging * convertDBD.sh * Remove other debugging materials * echotools/teledyne.py * Leave default cmap selection as ascii and convert to actual colormap when requested * remove debugging material * if the date/time ticks are days that line up with 00:00:00, omit the time portion of the string * More migration of Pesudo => Echo * Bug/Enhancement: echogram is now properly time adjusted for "combo" and "egram" glider modes

* Ability to create timeseries views of echogram profiles

* Add new default product plot type "profile" * Update examples to use default product type

* Reduce testing resource size and allow testing in Docker image * Update dependencies and Docker image version * Pseudogram -> Echogram * Whoops, the binary utilities still require focal * dbdreader to the rescue... eventually! * Run the watch tests in Docker CI testing

* gutils/__init__.py * set logging level * gutils/filters.py * pass extra_kwargs as optional argument to reader_class * extra_kwargs now part of the class, remove from extras() * gutils/nc.py * Do not emit debug log message for orphans when the set is empty * Do not cross fill variables (is this needed for the DAC?) * gutils/slocum/__init__.py * Add datetime, warnings, dbdreader and pyarrow.parquet to needed modules * Reduce logging output for dbdreader and matplotlib * Add optional kwargs to SlocumReader() * Add flag to enabling parquet processing * Set processing mode based on ascii_file * Use internal kwargs to class * Let routine fail gracefully if parquet processing is enabled but unable to find appropriate modules * Fix: echometrics data was not being propogated to _extra_*.nc du to restrictive time filter. * Be sure to drop echogram_sv from main profile data * All echo data now stored in _extras. * Add block_read() to perform same task as dbd2asc * Add read_header() to fetch certain metadata from dbd * Adapt convert() routine to work as ascii or parquet * Updates to ascii processing for echograms * gutils/slocum/bin/convertDbds.sh * Add two arguments for echogram processing via ascii method * Add some proper defaults * Update gutils/slocum/echotools/teledyne.py large amount of upstream fixes and improvements. * pytest * New default: echograms produce two image files for each profile * echometrics5 and echometrics7 use ascii method * echometrics6 and echometrics8 use parquet method

* gutils/slocum/echotools/teledyne.py now flake8 compliant * gutils/nc.py * move imports to top * add debug flag to get at bugs at deep levels * gutils/tests/test_slocum.py * update answers with bug fixes and updates * setup.cfg * remove exception for echotools

* __init__.py * SciPy 1.13.0 warnings * nc.py * Skip writing extras file for echograms that are short * slocum/__init__.py * enable deeper debugging * slocum/bin/convertDbds.sh * enforce grep to use ascii on binary files * slocum/echotools/teledyne.py * allow PYTHONWARNINGS to override coded warnings * utilize dbdreader skip_inital_line feature * numpy datetime conversion warning * flake8 file matching update * ensure file descriptors are released * merge duplicate time coordinate data from DBD * flake8 updates

* improve handling when using the parquet format * use glider terms to refer to data file sources (glider -> flight) * use python classes instead of [] and () * set output directory appropriately when argument is not set * add metadata to data dict for tracking of created files * add env var GUTILS_PYTEST_NOTEARDOWN to tests to defer removal of ascii and netcdf directories during testing (debugging) * fix checks in a few tests with recent improvements to echotools processing

jr3cermak · 2023-08-28T22:11:20Z

Finally ready to move forward with updates to echometrics processing and the addition of parquet as intermediate storage. The pytests all pass when run manually.

2023-08-28 14:06:24,221 - gutils.slocum - INFO - Converted usf-bass-2016-253-0-4.sbd,usf-bass-2016-253-0-4.tbd to usf_bass_2016_253_0_4_sbd.dat
2023-08-28 14:06:24,273 - gutils.slocum - INFO - Converted usf-bass-2016-253-0-5.sbd,usf-bass-2016-253-0-5.tbd to usf_bass_2016_253_0_5_sbd.dat
2023-08-28 14:06:24,425 - gutils.slocum - INFO - Converted usf-bass-2016-253-0-6.sbd,usf-bass-2016-253-0-6.tbd to usf_bass_2016_253_0_6_sbd.dat
PASSED
gutils/tests/test_watch.py::TestWatchClasses::test_gutils_netcdf_to_erddap_watch PASSED

================================================================================== warnings summary ==================================================================================
gutils/tests/test_nc.py: 3030 warnings
gutils/tests/test_slocum.py: 870 warnings
  /home/cermak/miniconda3/envs/gutils_py3_9/lib/python3.9/site-packages/compliance_checker/suite.py:185: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    latest_version = str(max(StrictVersion(v) for v in version_nums))

gutils/tests/test_slocum.py::TestEchoMetricsSix::test_echogram
gutils/tests/test_slocum.py::TestEchoMetricsSix::test_echogram
gutils/tests/test_slocum.py::TestEchoMetricsSix::test_echogram
gutils/tests/test_slocum.py::TestEchoMetricsSix::test_echogram
gutils/tests/test_slocum.py::TestEchoMetricsSix::test_echogram
  /home/cermak/miniconda3/envs/gutils_py3_9/lib/python3.9/site-packages/pyarrow/pandas_compat.py:354: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    if _pandas_api.is_sparse(col):

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================== 38 passed, 1 xpassed, 3905 warnings in 440.84s (0:07:20) ==============================================================

Workflow tests are not working.

Looks like the build process needs to know about dbdreader. Utilization of dbdreader will completely replace reliance on the x86 slocum binaries for decoding.

There is an odd dependency failure via conda/Docker:

E   ImportError: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /opt/conda/lib/python3.9/site-packages/scipy/fft/_pocketfft/pypocketfft.cpython-39-x86_64-linux-gnu.so)

This update has a hook to process glider data into parquet intermediate files. The parquet enabled processing a shade faster than the ascii method.

This update references issue #12, #24 and #26.

- slocum/__init__.py - Final alignment of data types: rt, rtd, delayed - bin/convertDbds.sh - platform compatability fix for unit tests (see below) - tests/test_slocum.py tests/test_watch.py - do additional clean up of cache files after test completes Unit tests failed on different platforms due to either magic or operational diffences of file. Testing for data or ASCII allows all the unit test to successfully pass. - ftype=$(file $dbdSource | grep data); + ftype=$(file $dbdSource | grep 'data\|ASCII');

jr3cermak · 2023-10-15T23:06:05Z

This update to the PR includes a small fix that was discovered when trying to run unit tests on other platforms with different versions of OS and modules. Some small differences in magic or the way file operates, the shell must test for data or ASCII to allow unit tests to pass.

jr3cermak · 2023-10-15T23:14:37Z

From our standpoint, this is ready for implementation. Not sure what needs to be done to fix it in the workflow unit tests. Unit tests pass when run manually on all our platforms we have been testing.

Merge remote-tracking branch 'dev/master' into lowresechogram

cf49b84

jr3cermak mentioned this pull request Mar 25, 2023

echometrics improvements #12

Open

6 tasks

jr3cermak added 3 commits March 25, 2023 12:42

Fixes for flake8 tests

a66e956

Pass 2 for flake8

573ad45

Pass 3 for flake8

90d9b3e

jr3cermak added 4 commits March 30, 2023 17:38

Initial documentation update

0273431

* Ability to create timeseries views of echogram profiles

Product update

677f1ea

* Add new default product plot type "profile" * Update examples to use default product type

flake8 fix

ccfdd67

jr3cermak mentioned this pull request Jun 16, 2023

Quick fix ups and dependency dumps for echogram merge jr3cermak/GUTILS#4

Merged

kwilcox and others added 9 commits June 16, 2023 10:49

flake8 fixes

e771458

* gutils/slocum/echotools/teledyne.py now flake8 compliant * gutils/nc.py * move imports to top * add debug flag to get at bugs at deep levels * gutils/tests/test_slocum.py * update answers with bug fixes and updates * setup.cfg * remove exception for echotools

Add dbdreader to requirements.txt

5f813fd

Requirements.txt and mamba

3e0ea0c

requirements

3159b77

test2

725495b

try setup.py

05fa9cb

try3

bd3737c

jr3cermak added 17 commits June 24, 2023 13:46

try4

8ca83cb

try5

26fd439

docker try #2

821dd6d

docker try#3

0feb736

docker try #4

5c9158a

docker 3.8 try #1

41a760c

docker 3.8 try #1b

a816e38

docker 3.8 try #2

c9f2a6b

docker try #1

a217e19

docker try #2

2465e7e

docker try #3

86e194f

docker try #4

2db1a29

docker try #5

ee741ee

docker try #6

dca53e2

Merge branch 'master' into lowresechogram

29beb4a

VeckoTheGecko mentioned this pull request Dec 7, 2023

pyupgrade #30

Open

3 tasks

jr3cermak closed this Apr 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Echometrics improvements early 2023 #22

Echometrics improvements early 2023 #22

jr3cermak commented Mar 20, 2023

jr3cermak commented Mar 20, 2023

kwilcox commented Mar 21, 2023

kwilcox commented Mar 21, 2023

jr3cermak commented Mar 21, 2023

kwilcox commented Mar 21, 2023

jr3cermak commented Mar 21, 2023

kwilcox commented Mar 22, 2023

jr3cermak commented Mar 22, 2023

jr3cermak commented Mar 25, 2023

jr3cermak commented Aug 28, 2023

jr3cermak commented Oct 15, 2023

jr3cermak commented Oct 15, 2023

Echometrics improvements early 2023 #22

Echometrics improvements early 2023 #22

Conversation

jr3cermak commented Mar 20, 2023

jr3cermak commented Mar 20, 2023

kwilcox commented Mar 21, 2023

kwilcox commented Mar 21, 2023

jr3cermak commented Mar 21, 2023

kwilcox commented Mar 21, 2023

jr3cermak commented Mar 21, 2023

kwilcox commented Mar 22, 2023

jr3cermak commented Mar 22, 2023

jr3cermak commented Mar 25, 2023

jr3cermak commented Aug 28, 2023

jr3cermak commented Oct 15, 2023

jr3cermak commented Oct 15, 2023