Pydmap Read #2

mts299 · 2019-01-04T00:14:00Z

First pydarn functionality!
DmapRead allows for users to read in dmap files and obtain records in a list of dictionaries.
The original code of dmap was taken from https://github.com/SuperDARNCanada/backscatter
and updated to python 3.6 and improved upon for speed and readability.

README is updated for installation notes and example code.

To test:

import pydarn
dmap = pydarn.DmapRead('20170114.map')
dmap_data = dmap.read_records()

This PR is to focus on the DmapRead functionality and the README.

Testing suite setup, performance test suite, DEVELOPERS.md, and DmapWrite are for other PRs

Fixed the file reading error with few other error in testing and the dmap file. Added some TODO's to look into for the future redesign.

Made some minor modifications to get pydmap to work with python 3.0. This included using memoryview instead of buffer for unpacking the C structs. Using encoding on chars and strings since parsing needs a byte object. Decoding was also needed for string in unpacking data. Stream test did not work and further cleanup of the code needs to happen including a better framework for testing. Performance testing also is needed to try and speed up the reading of files.

Moved the dmap data structures: - scalars - arrays - record to its own file for modularity. Cleaned up the code: - getter/setter property (pythonic) - use ChainMap for data structure on records for performance - got rid of some unused methods

Moved read and write for damp to io for better package readbility. Previous commit also moved various packages around for better organization and readibility. Started cleaning up code with better documentation, variable names, comments and functionality.

Cleaned up dmap.py some more. The following things have changed: - added several new expections for more specific errors - cleaned up error messages to be more descriptive - renamed variable for better clarity - re-wrote log messages for better readability - some code improvements for modularity and encapsulation - byte checking - format converting More work on the reading procedure then writing procedure needs to be cleaned up still.

Did the final clean up with dmap read. Time to test it!

Restructed the testing suite file directory system and file names for better readability. Also, applied pep8 style to the io.py file and restructured how to store records from a dmap file.

Modified the io.py to change the syntax for implementing the new data structures for dmap record, scalar, and array. Dmap records in DmapRead has also changed to deque for memory efficiency and performance reasons. made some minor modifications to init py for proper importing of the classes.

All test run and pass. Added an extra sanity check for empty files. All that is left is documentation.

Documented DmapRead following numpy style guide for compilation to read the docs.

Documented the dmap data structure file and did some spell checking on io.py as well.

Keith point out some speed difference in backscatter vs. pydarn. Main difference was logging being a large factor to time. Possible solution is logging on a separate thread or let it be an option. Minor improvements were also made to pydarn to improve the overall speed of the code. WARNING: logging is commented out for now.

This commit focused on decreasing the time of the DmapRead functionality as much as possible. One problem, to reach minimum amount of time to read dmap files the logging functionality had to be removed. Further investigation is needed on making this optional. Huge improvements were, removing the amount of calls to other library calls. Some fiddling with unpack and numpy.frombuffer. Comments are left in the code indicating non-pythonic coding that allowed for a speed performance. This current version has a 42% increase in performance compared to backscatters dmap in python 2.7 and a 55% increase when compared to an earlier version of pydarn's dmap reader when only what was needed to convert backscatters dmap to python 3.0.

Updated the README for pydarn installation. Added unit tests for pydarn data structures. Update documentation in unit tests. Set logging to be disabled and reduced logging information that is not needed given a enough information in the error messages stored in logging.

Updating README to be more clear on package requirements. Also, added numpy in the setup.py install for virtual environments.

ksterne · 2019-01-04T03:50:52Z

I know I'm not listed as a reviewer, but I couldn't help myself. These are mostly nit-picky comments, but hopefully helpful.

Do we need to add numpy to README.md as a prerequisite since it's been added with this PR? Should "Example" be changed to "Examples"? And then under the first example, the "DMAP files are a SuperDARN" doesn't seem like a complete thought. Is there more to that line that got misplaced?

I'll see if I can find time tomorrow to give it a whirl.

mts299 · 2019-01-04T04:46:06Z

@ksterne No worries, more the marrier for reviewing. Plus others might be busy with the Borealis project, and I would like to push this PR ASAP so I can start building off of it.

Sorry, where do you see that numpy has been added to this PR?
I added numpy into the setup.py just in case it was not installed. So it should be installed on setup if the user does not have it.
I did the same for the python yaml library, but in some cases, you also need to install the development code thus it being a prerequisite.

Yes, I must have written that sentence half awake or near the end of the day. I will re-look over it tomorrow.

Sounds awesome! Please let me know what you think. Currently, I am working on the DEVELOPERS guide, some extra documentation and how to set up a nice test suite without having large files.

egthomas · 2019-01-06T03:31:11Z

Will support for grid files be included at some point? Also, it looks like the xtd fields for power and spectral width are missing from the mapfile_types in dicts_to_file.

mts299 · 2019-01-06T03:59:56Z

Sorry, there is two branches with the same name. Please make sure you are using the one that has the complete README file that runs with the following test code I put in the description.

@egthomas you are using/looking at the dmapwrite code which is not the scope of the this PR. I left a note above dmapwrite that I need to clean it up and review it. That is legacy code from backscatter. Sorry for the confusion.

mts299 · 2019-01-06T04:04:08Z

However, I forgot to add grid files in the unit tests so thank you for pointing that out @egthomas

egthomas · 2019-01-06T04:11:06Z

Ah my mistake, thanks for the clarification!

README.md

setup.py

README.md

pydarn/logging_config.yaml

asreimer · 2019-01-07T09:50:25Z

Hey @mts299! This looks great! Most of my comments are minor things like typos. I was able to get the example code to run.

One major comment I didn't include in my review: I couldn't get the testing to work, except for test/pydmap_tests/test_datastructures.py

For both test_dmapread.py and test_dmapwrite.py:

Traceback (most recent call last):
  File "test_dmapwrite.py", line 17, in <module>
    import test_data
ModuleNotFoundError: No module named 'test_data'

And finally: python tests.py yields:

ImportError: Failed to import test module: tests
Traceback (most recent call last):
  File "tests.py", line 18, in suite
    mod = __import__(t, globals(), locals(), ['suite'])
ModuleNotFoundError: No module named 'pydarn.tests'

mts299 · 2019-01-07T15:24:57Z

@asreimer as the description states:

Testing suite setup, performance test suite, DEVELOPERS.md, and DmapWrite are for other PRs
As in python tests.py is not set up as that will be part of the Testing suite setup PR since we still need to figure out testing data files.

Also, DmapWrite is also not part of this PR so the test_dmapwrite.py would be expected to fail as I have not focused on cleaning or getting the code to work.

Will fix test_dmapread.py module problem right away, however, you need to change the data filenames in that test or download the specific files.

asreimer · 2019-01-07T16:06:34Z

Hmmm, the edit to my comment didn't save. I noticed your description after I posted that comment about the testing not working and editted my comment accordingly. 2am might have had something to do with it.

mts299 · 2019-01-07T16:09:07Z

@asreimer Okay, I mean test_dmapread.py and test_datastructures.py should work. However, there is that change you need to do with test_dmapread.py for data files. Which I will bring up sometime soon so I can figure out the test suite :)

Mainly fixed some README typos, test_dmapread.py error and added some setup features for hopefully better installation.

asreimer · 2019-01-07T23:34:21Z

I tested everything using the examples in the README.md and they work fine. Good to go.

That setup.cfg and MANIFEST.in are slick. Really nice work @mts299!

kkotyk · 2019-01-07T19:12:40Z

pydarn/exceptions/pydmap_exceptions.py

+        pydarn_logger.error(self.message)
+
+
+class ZeroByteError(Exception):


Maybe this should be renamed to 'InvalidByteError' since its raised if the bytes are 0 or less than zero.

Personally, I like Zero because it is concise and something may have set it to zero which might not have made the bytes invalid. However, I get what you mean with the less than zero cases, so I added NegativeByteError which could indicate something really bad happened :/

pydarn/exceptions/pydmap_exceptions.py

pydarn/logging_config.yaml

kkotyk · 2019-01-07T23:02:45Z

pydarn/pydmap/datastructures.py

+"""
+class DmapArray(NamedTuple):
+    name: str
+    value: np.ndarray


I've never seen a file that has this, but as part of the DMAP spec, its technically possible to store strings or even whole dmap records in a dmap value. I'm not sure if thats an issue if the type here is ndarray, but something to think about.

First off I really would like to see a DMAP example where DMAP records are stored in records. The Spec may be outdated or have legacy code that I would not like to introduce into newer code. Especially if the legacy introduces potential bugs.

I am more than happy to go back if a given example is present in the current DMAP format.

With that said, ndarrays can hold any type of object including strings and DmapRecords so there is no actual issue here (The bittersweet of python :p ). It just informs developers that the object field should be a data type of ndarray. Technically if they store a list or something else nothing will really happen because it is python. Just later on in the downstream something might throw a TypeError but that is the pythonic way.

I think in later python versions type hinting might actually enforce type checking.

https://docs.python.org/3/library/typing.html

pydarn/pydmap/datastructures.py

test/pydmap_tests/test_dmapread.py

test/pydmap_tests/test_dmapwrite.py

kkotyk · 2019-01-08T22:55:35Z

README.md

+dmap_data = dmap_reader.read_records() 
+
+# dmap_data[record number][paramter name].value
+print(dmap_data[0]['bmnum'].value) 


Is it possible to create a getter or some sort of decorator so that the user doesn't have to explicitly call .value on every data value they want? ex. if i call dmap_data[0]['bm_num'] then under the hood it returns me dmap_data[0]['bm_num'].value by default? I don't think this affects too much, but if you are writing a large script or something you are going to have a lot of extraneous looking code for .value.

Personally, I like the .value because it is explicit and shows self-documenting code (best practices). It also keeps things consistent. Especially with the complaints that RST and DaVitpy have gotten with unclear variable names and code.

For example:

for i in range(record[0]['beam'].shape[1] ): amplify *= record[0]['beam'].value

vs.

for i in range(record[0]['beam'].shape[1] ): amplify *= record[0]['beam']

The former is clearer and easier to understand at first glance (which is me in debugging mode) and the latter is more confusing or black magic.

Or to avoid extraneous looking code you can just store it in a variable ahead of time and then use it later to make the lines smaller.

For Example:

Plot (record[0][velocity'].value, record[0]['time'].value, default .... plot ... settings)

vs.

velocity_data = record[0][velocity'].value time_data = record[0]['time'].value Plot (velocity_data, time_data, default .... plot ... settings)

The latter is also considered cleaner code too. I mean at the end of the day you are going to have to access the record[0][velocity'] to get most of your data (assuming no loops are involved to cut some stuff down) so an extra .value especially with autocomplete seems okay to me :)

Also, I don't know if you can make a getter function/decorator with a collection containing a collection. You are potentially looking at some overhead with a getter method and that means a decrease in performance.

Sounds good. I think this is partly a preference thing and I don't think there are necessarily any right or wrong answers here. I see a couple of thumbs ups on my suggestion, so maybe some others can comment on what they think since this is ultimately a decision on top level functionality. Personally, I still stick by my suggestion as a preference provided there is no performance hits.

egthomas · 2019-01-09T00:12:09Z

I don't know if this is the appropriate place to post these links, but in one of the review or code comments I thought I saw the general question "What is dmap format?". Here are a few short guides/tutorials/documentation written by the original developer (Rob Barnes) which may be of some general help:

https://superdarn.github.io/rst/superdarn/src.doc/rfc/0006.html
https://github.com/SuperDARN/rst/wiki/Using-DataMap-Files-In-IDL
https://superdarn.github.io/rst/general/src.lib/dmap/index.html

kkotyk · 2019-01-09T15:52:24Z

@egthomas I'm not sure, but with Borealis, we usually use the issue tracker and use the description tags when we ask our questions or comments. "What is dmap format" would be a good issue with the question tag.

kkotyk · 2019-01-09T19:06:43Z

Other than maybe moving that data question to an issue or something, I think everything else is resolved. I can't submit an approval, so I'm just writing it here.

egthomas · 2019-01-09T22:47:29Z

I don't know where it was in the 100+ comments, but somewhere I'd like to echo Keith's sentiment that ideally the dmap I/O routines should be as general as possible and therefore able to support real-time data streams and files with different formats from the standard SuperDARN rawacf, fitacf, etc.

mts299 · 2019-01-09T22:59:33Z

@egthomas I was not disagreeing, I have not tested streaming and focused more on file reading. I will be revisiting streaming in the test suite. There is also talk about supporting other file formats.

asreimer · 2019-01-09T23:00:15Z

@egthomas we are all on the same page with this. And AFAIK, pydmap does still support data streams. @mts299 and I have been hashing out how to support multiple file types today.

Finally, can we please stop commenting on a closed PR and move this discussion into the chat?

egthomas · 2019-01-09T23:05:43Z

@mts299 sorry I wasn't trying to imply anyone was disagreeing, just having trouble keeping up with all of the messages coming through my inbox. This is my last comment on this PR request.

From Issue #2 suggestion, added `__str__` and `__repr__` for better printing of the object that is being used. Only dmap classes were changed because inherited classes will obtain the same attributes for printing of the object but the class name will be adjusted to be same as the actual class and not the dmap classes.

Pydmap Read

From Issue #2 suggestion, added `__str__` and `__repr__` for better printing of the object that is being used. Only dmap classes were changed because inherited classes will obtain the same attributes for printing of the object but the class name will be adjusted to be same as the actual class and not the dmap classes.

Pydmap Read

From Issue #2 suggestion, added `__str__` and `__repr__` for better printing of the object that is being used. Only dmap classes were changed because inherited classes will obtain the same attributes for printing of the object but the class name will be adjusted to be same as the actual class and not the dmap classes.

Pydmap Read

From Issue #2 suggestion, added `__str__` and `__repr__` for better printing of the object that is being used. Only dmap classes were changed because inherited classes will obtain the same attributes for printing of the object but the class name will be adjusted to be same as the actual class and not the dmap classes.

Pydmap Read

From Issue #2 suggestion, added `__str__` and `__repr__` for better printing of the object that is being used. Only dmap classes were changed because inherited classes will obtain the same attributes for printing of the object but the class name will be adjusted to be same as the actual class and not the dmap classes.

Marina Schmidt added 15 commits November 28, 2018 17:01

Updated to pydamp

332bde8

Fixed the file reading error with few other error in testing and the dmap file. Added some TODO's to look into for the future redesign.

Relocating files and class

6aebb5b

Moved the dmap data structures: - scalars - arrays - record to its own file for modularity. Cleaned up the code: - getter/setter property (pythonic) - use ChainMap for data structure on records for performance - got rid of some unused methods

Started Cleanup

d2581a4

Moved read and write for damp to io for better package readbility. Previous commit also moved various packages around for better organization and readibility. Started cleaning up code with better documentation, variable names, comments and functionality.

cleaned up dmap read

2cf91f1

Did the final clean up with dmap read. Time to test it!

Restructing

2b8d01a

Restructed the testing suite file directory system and file names for better readability. Also, applied pep8 style to the io.py file and restructured how to store records from a dmap file.

Final fixes

c590627

All test run and pass. Added an extra sanity check for empty files. All that is left is documentation.

Documented DmapRead

9c46c48

Documented DmapRead following numpy style guide for compilation to read the docs.

Documentation for data structures

d22a777

Documented the dmap data structure file and did some spell checking on io.py as well.

Update README and setup

344930a

Updating README to be more clear on package requirements. Also, added numpy in the setup.py install for virtual environments.

mts299 requested review from asreimer, kkotyk and billetd January 4, 2019 00:14

asreimer added the enhancement New feature or request label Jan 7, 2019

asreimer suggested changes Jan 7, 2019

View reviewed changes

asreimer mentioned this pull request Jan 7, 2019

ENH: More standardized setup.py #3

Closed

Fixed PR reviews

685bb3a

Mainly fixed some README typos, test_dmapread.py error and added some setup features for hopefully better installation.

asreimer approved these changes Jan 7, 2019

View reviewed changes

asreimer merged commit 7696edb into develop Jan 7, 2019

asreimer deleted the pydmap_fix branch January 7, 2019 23:34

kkotyk reviewed Jan 8, 2019

View reviewed changes

mts299 restored the pydmap_fix branch February 11, 2019 23:10

mts299 deleted the pydmap_fix branch July 5, 2019 20:23

mts299 added this to Done in IO - Deprecation Jan 3, 2020

mts299 added this to the v1.0.0 milestone Jan 3, 2020

mts299 pushed a commit that referenced this pull request Feb 5, 2020

Merge pull request #2 from SuperDARN/pydmap_fix

53bab09

Pydmap Read

mts299 pushed a commit that referenced this pull request Feb 26, 2020

Merge pull request #2 from SuperDARN/pydmap_fix

9a16bf2

Pydmap Read

mts299 pushed a commit that referenced this pull request Feb 26, 2020

Merge pull request #2 from SuperDARN/pydmap_fix

193fcca

Pydmap Read

mts299 pushed a commit that referenced this pull request Feb 26, 2020

Merge pull request #2 from SuperDARN/pydmap_fix

6cb97d1

Pydmap Read

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pydmap Read #2

Pydmap Read #2

mts299 commented Jan 4, 2019

ksterne commented Jan 4, 2019

mts299 commented Jan 4, 2019

egthomas commented Jan 6, 2019

mts299 commented Jan 6, 2019

mts299 commented Jan 6, 2019

egthomas commented Jan 6, 2019

asreimer commented Jan 7, 2019

mts299 commented Jan 7, 2019

asreimer commented Jan 7, 2019

mts299 commented Jan 7, 2019

asreimer commented Jan 7, 2019

kkotyk Jan 7, 2019

mts299 Jan 9, 2019

kkotyk Jan 7, 2019

mts299 Jan 9, 2019

kkotyk Jan 8, 2019

mts299 Jan 9, 2019

kkotyk Jan 9, 2019

egthomas commented Jan 9, 2019

kkotyk commented Jan 9, 2019

kkotyk commented Jan 9, 2019

egthomas commented Jan 9, 2019

mts299 commented Jan 9, 2019

asreimer commented Jan 9, 2019

egthomas commented Jan 9, 2019

		pydarn_logger.error(self.message)


		class ZeroByteError(Exception):

Pydmap Read #2

Pydmap Read #2

Conversation

mts299 commented Jan 4, 2019

ksterne commented Jan 4, 2019

mts299 commented Jan 4, 2019

egthomas commented Jan 6, 2019

mts299 commented Jan 6, 2019

mts299 commented Jan 6, 2019

egthomas commented Jan 6, 2019

asreimer commented Jan 7, 2019

mts299 commented Jan 7, 2019

asreimer commented Jan 7, 2019

mts299 commented Jan 7, 2019

asreimer commented Jan 7, 2019

kkotyk Jan 7, 2019

Choose a reason for hiding this comment

mts299 Jan 9, 2019

Choose a reason for hiding this comment

kkotyk Jan 7, 2019

Choose a reason for hiding this comment

mts299 Jan 9, 2019

Choose a reason for hiding this comment

kkotyk Jan 8, 2019

Choose a reason for hiding this comment

mts299 Jan 9, 2019

Choose a reason for hiding this comment

kkotyk Jan 9, 2019

Choose a reason for hiding this comment

egthomas commented Jan 9, 2019

kkotyk commented Jan 9, 2019

kkotyk commented Jan 9, 2019

egthomas commented Jan 9, 2019

mts299 commented Jan 9, 2019

asreimer commented Jan 9, 2019

egthomas commented Jan 9, 2019