Force little-endian format when writing EEG data file #80
*force-pushed from 753ff99 to 78066a5*

Force pushed to fix the flake8 tests. Should work fine now.
Codecov Report

```
@@            Coverage Diff            @@
##               main      #80   +/-   ##
=========================================
  Coverage   100.00%  100.00%
=========================================
  Files            3        3
  Lines          421      423       +2
=========================================
+ Hits           421      423       +2
=========================================
```

Continue to review the full report at Codecov.
Co-authored-by: Clemens Brunner <clemens.brunner@gmail.com>
Cool, thanks both. Then only three things remain:

- @Aniket-Pradhan please add an entry to the changelog here: https://github.com/bids-standard/pybv/blob/main/docs/changelog.rst, copying the style of the other entries (and add yourself to the list of authors)
- Please append your name and info to https://github.com/bids-standard/pybv/blob/main/CITATION.cff as well
- Please add a `# pragma: no cover` comment (or whatever it is that excludes lines from coverage analysis) to the `if` clause, because we are (deliberately) not covering that in our tests, and I don't want coverage to drop due to deliberate decisions :-)
awesome, thanks! I'll wait for a note from you that everything did indeed pass ... then we can merge and I'll make a new release.
There seems to be something wrong with the test, and I am guessing it is because we cannot pass a big-endian formatted array on a little-endian machine and expect it to carry out the scaling (and other arithmetic) operations correctly. For example, if we reverse the bytes and change the dtype of a NumPy array, the scaling operation would give a different result on a little-endian and a big-endian machine. Therefore, I don't think it would be possible to test this section on a little-endian machine. OTOH, I guess we can replace …
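To make the suspected pitfall concrete, here is a small sketch (my own illustration, not code from this PR): swapping the raw bytes without also flipping the dtype's byte-order marker changes the interpreted values, while doing both together preserves them.

```python
import numpy as np

a = np.array([1.0, 2.0], dtype=">f8")

# Swapping only the raw bytes re-interprets the values incorrectly:
swapped = a.byteswap()  # dtype is still ">f8", but bytes are now reversed
print(np.array_equal(swapped, a))  # False

# Swapping the bytes AND flipping the dtype marker preserves the values:
fixed = a.byteswap().view(a.dtype.newbyteorder())
print(np.array_equal(fixed, a))  # True
```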
I see. In that case, let's still keep the check for endianness there - even if only 1 in 10k people run into it, it's a small thing to do and it also makes the code easier to understand IMHO. Then we remove the test, and add the …
*force-pushed from 87a2edb to 2b97fbc*
Wait, that's exactly what should happen. If you create a big-endian array, NumPy knows how to handle that data and performs operations correctly. E.g. these two arrays are identical after multiplication:

```python
import numpy as np

little = np.arange(1e6, dtype="<f8").reshape(1000, 1000)
big = np.arange(1e6, dtype=">f8").reshape(1000, 1000)

little *= 7
big *= 7

np.array_equal(little, big)  # True
```

If this example does not demonstrate the issue, I'd be curious to see something that doesn't work.
When writing the arrays to disk, however, the resulting files differ:

```python
little.tofile("little.dat")
big.tofile("big.dat")
```

```
$ md5sum big.dat little.dat
5157b19a956273b22d8cd056b5eea320  big.dat
d2ae4a99442623c5ef2678a0ea5a3429  little.dat
```
You're right, NumPy is handling the operations correctly. I was also swapping the bytes along with changing the dtype, which now seems the wrong way to do it 😭. However, the byte order of the result is reset to the native machine's endianness:

```python
import numpy as np

def scale_arr(arr, scale=10):
    return arr * scale

big = np.arange(1e6, dtype=">f8").reshape(1000, 1000)
big_scaled = scale_arr(big)

print(big.dtype, big.dtype.byteorder)
print(big_scaled.dtype, big_scaled.dtype.byteorder)
```

From what I can understand, NumPy only retains the non-native byte order on the original array, not on the result of the operation. Since the byte order is reset, the scaled array is no longer big-endian, and hence the control doesn't go into the `if` clause.
@cbrnr merge when you are happy.
Indeed, it seems like byte order is not preserved when copies are involved, e.g. …
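One way to see this (a sketch I put together, not code from the PR): an out-of-place operation hands back a new array in the machine's native byte order, while an in-place operation keeps the original dtype.

```python
import numpy as np

big = np.arange(10, dtype=">f8")
print(big.dtype.byteorder)     # ">"

# An out-of-place operation allocates a new, native-order array:
result = big * 2
print(result.dtype.byteorder)  # "=" (native) on a little-endian machine

# An in-place operation keeps the big-endian dtype:
big *= 2
print(big.dtype.byteorder)     # ">"
```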
Yes, thanks for confirming - I also think we should merge. We clarified a major point in this issue/PR with Brain Products and made sure that pybv correctly writes little-endian data from now on. The only issue that remains would be a person on a little-endian machine creating a big-endian array and then passing it to …
Famous last words. The only thing that is bugging me is that we don't have a test.
Yes, fair enough (that's also why I typed …).
Ideally, we should be able to write a file (in LE) even with a BE array on both architectures. In other words, we should always write a LE data file no matter which array we throw at it, independent of the host architecture.
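Under that requirement, the core of the writer could be sketched as follows. The helper name `write_le` and the `"<f4"` dtype are my own assumptions for illustration, not pybv's API; the point is that `astype` with an explicit `"<"` dtype always produces little-endian bytes, whatever the input array's byte order or the host architecture.

```python
import numpy as np


def write_le(data, fname, fmt="<f4"):
    """Hypothetical sketch: write `data` as little-endian binary,
    regardless of the input array's byte order or the host's endianness."""
    np.asarray(data).astype(fmt).tofile(fname)
```

Writing a `">f8"` array and a `"<f8"` array through such a helper yields byte-identical files, which is exactly the invariant described above.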
I'm still -1 on adding a new CI, but the …
Okay, but that brings us back to your comment from #80 (comment)
So we would need to make sure an array keeps its byte order as it passes through … If that's possible, then yes, we can test everything and ensure that passing an array of any byte order on any architecture results in a (correctly scaled/processed) little-endian format.
One shabby solution I was thinking of is that we could reset the dtype of … I am sure we can arrive at a much better solution than this, but I cannot think of anything better right now.
It's pragmatic, I like that. And maybe sometimes we can just avoid making a copy and thus retain the dtype without the "shabby solution" :-)
You mentioned that before, could you please elaborate why that would be a problem?
OK, so I looked at the changes again and it seems like we can simplify the condition to:

```python
if sys.byteorder == "big" and byteorder == "=":  # pragma: no cover
    data = data.byteswap()
```

The case … Therefore, unless we change a lot of code just for the sake of testing on an LE architecture, we really have to mark this line as untestable. However, I think that's fine – sorry if this was already clear to you; I thought we could somehow create a test that works on LE without changing too much. We can always add this later if we think it is necessary.
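If one ever did want to exercise that branch on little-endian CI, monkeypatching could simulate a big-endian host. This is a rough sketch with a stand-in function (`maybe_byteswap` is not pybv code), and it only works if the module reads `sys.byteorder` at call time rather than via `from sys import byteorder` - whether this is worth the trouble is exactly the trade-off discussed above.

```python
import sys
from unittest import mock

import numpy as np


def maybe_byteswap(data):
    # Stand-in for the condition discussed above
    if sys.byteorder == "big" and data.dtype.byteorder == "=":
        return data.byteswap()
    return data


arr = np.arange(4, dtype="=f8")

# Patching sys.byteorder makes the big-endian branch run even on LE hosts:
with mock.patch.object(sys, "byteorder", "big"):
    swapped = maybe_byteswap(arr)

print(np.array_equal(swapped, arr))  # False: the bytes were swapped
```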
Perfect! Thanks @Aniket-Pradhan and @sappelhoff!
@Aniket-Pradhan please run your tests and make sure everything passes :-) ... then notify me and I'll release pybv 0.6.0
ohh, looks like I am late to the party 😛
As @cbrnr pointed out, … That's why I was calling it a "shabby" solution, because it is not really needed, lol. Anyway, thanks for being supportive. Let me know if you plan to go ahead with a similar solution, and I would be happy to help 😄
I basically meant that we would have to specify the endianness in the supported formats if we were to proceed with testing the code. 😛
I'll run the tests on Travis, and hopefully everything will work fine. I'll ping here as soon as the build finishes. Thanks a lot @cbrnr and @sappelhoff for your help. You all rock!
@sappelhoff everything looks green. We are good to go! :D
https://pypi.org/project/pybv/0.6.0/ 🎉 feel free to draft a tweet @Aniket-Pradhan or @cbrnr or @hoechenberger - I'll like and retweet of course 😉 (but no energy to draft one myself)
Also updated the docs to add information about this change 😸
closes #79