Add I/O for masked Traces #49
Make backwards compatible with older ASDF versions I/O of masked arrays of float/integer data. Ignore some test files.
Make backwards compatible with older ASDF versions I/O of masked arrays of float/integer data. Ignore some test files. Fix broken identity test
bed89ce to 6d590f1
Heyhey,
first of all I'm really sorry for taking so long!
I like this and the implementation looks solid to me.
There is one point I'd like to discuss a bit: Instead of relying on a "magic" mask value, how about storing the actual mask as an attribute?
With some trickery it could be made so it takes 1 bit per array item. It would be a bit slow but as we are talking about masked arrays I'm not too worried here.
Aside from that, the currently chosen mask value is machine dependent - probably not relevant for the platforms we really care about, but it's a bit of a sore spot.
For unsigned integer arrays the currently chosen mask value would be zero which does not really work. Also for lowish precision integers the risk for false positives might be a bit too large - for floating points I agree that this would not be an issue.
All in all, I think that using an actual mask array would solve these issues at the expense of storage size and performance.
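To make the 1-bit-per-item idea concrete, here is a minimal sketch of how a boolean mask could be packed and restored with NumPy. The helper names `pack_mask` and `unpack_mask` are hypothetical and not part of pyASDF; how the packed bytes would actually be stored (for example as an attribute or a companion dataset) is left open here.

```python
import numpy as np


def pack_mask(masked):
    """Pack the boolean mask of a masked array into 1 bit per item.

    Hypothetical helper: returns the packed bytes and the original
    number of items, which is needed to drop the padding bits later.
    """
    mask = np.ma.getmaskarray(masked).ravel()
    return np.packbits(mask), mask.size


def unpack_mask(packed, n_items):
    """Restore the boolean mask from its packed representation."""
    # np.unpackbits pads to a multiple of 8 bits, so trim to n_items.
    return np.unpackbits(packed)[:n_items].astype(bool)


# An unsigned integer trace that legitimately contains zeros.
data = np.ma.masked_array(
    np.array([0, 2, 3, 0, 5, 6, 7, 8, 9, 10], dtype=np.uint8),
    mask=[0, 1, 0, 0, 1, 0, 0, 0, 0, 1],
)

packed, n_items = pack_mask(data)
restored = np.ma.masked_array(data.data, mask=unpack_mask(packed, n_items))
```

Because the mask is stored explicitly, no data value has to be reserved as a marker, so unsigned and low-precision integer arrays round-trip without false positives. The cost is the extra packed array (one byte per eight samples) and the pack/unpack step.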
Hi @malcolmw, is there still an interest in following up on this?
Hi @krischer, this is not a priority for me anymore, so feel free to close this PR if it isn't a significant value-adding feature for pyASDF. However, if you think it is a useful feature you would like to merge to facilitate work with long segments of potentially-gappy continuous data, I am happy to help out.
I do actually think that this would be a very nice addition to the data format. As pointed out in a comment above, I'd prefer the data model of actually carrying along a second array for the mask. This would mirror to a certain extent how numpy's masked arrays work, and it could also be properly integrated into the format. I could take care of adding it to the format definition and the validator if there is still interest in implementing this.
Sounds good. I'm happy to implement, though it will be a while before I get to this.
This pull request is for a branch that I use to work with masked Traces. Masked values are filled with a fill value automatically determined based on the data array's dtype at write-time, which is stored in the Dataset's attributes and later used to reconstruct the mask at read-time.
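The fill-value scheme described above can be sketched roughly as follows. This is not the code from this branch: `choose_fill_value`, `to_storage`, and `from_storage` are hypothetical names, a plain dict stands in for the Dataset's attributes, and picking the lowest representable value of the dtype as the fill value is an assumption made for illustration.

```python
import numpy as np


def choose_fill_value(dtype):
    """Pick a fill value from the dtype (assumed rule: lowest representable value)."""
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.integer):
        return np.iinfo(dtype).min
    return np.finfo(dtype).min


def to_storage(masked):
    """Write-time: fill masked samples and record the fill value as an attribute."""
    fill = choose_fill_value(masked.dtype)
    attrs = {"fill_value": fill}  # stand-in for the Dataset's attributes
    return masked.filled(fill), attrs


def from_storage(data, attrs):
    """Read-time: rebuild the mask wherever the data equals the stored fill value."""
    if "fill_value" not in attrs:
        return data  # data written without a mask is returned unchanged
    return np.ma.masked_equal(data, attrs["fill_value"])


# Float data: the fill value is far outside any realistic sample range.
trace = np.ma.masked_array([1.5, 2.5, 3.5], mask=[0, 1, 0])
stored, attrs = to_storage(trace)
roundtrip = from_storage(stored, attrs)

# Unsigned integers: under the assumed rule the fill value is 0,
# so a genuine zero sample comes back masked as well.
counts = np.ma.masked_array(np.array([0, 5, 7], dtype=np.uint8), mask=[0, 0, 1])
stored_u8, attrs_u8 = to_storage(counts)
roundtrip_u8 = from_storage(stored_u8, attrs_u8)
```

In this sketch the float trace round-trips cleanly, while the `uint8` example shows how a reserved data value can collide with real samples.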