
Force little-endian format when writing eeg datafile #80

Merged
merged 14 commits into from Sep 29, 2021

Conversation

Aniket-Pradhan
Contributor

@Aniket-Pradhan Aniket-Pradhan commented Sep 24, 2021

Also updated the docs to add information about this change 😸

closes #79

@Aniket-Pradhan
Contributor Author

Force-pushed to fix the flake8 tests. It should work fine now.

@codecov

codecov bot commented Sep 24, 2021

Codecov Report

Merging #80 (cf048b7) into main (456d014) will not change coverage.
The diff coverage is 100.00%.


@@            Coverage Diff            @@
##              main       #80   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            3         3           
  Lines          421       423    +2     
=========================================
+ Hits           421       423    +2     
Impacted Files Coverage Δ
pybv/io.py 100.00% <100.00%> (ø)


Co-authored-by: Clemens Brunner <clemens.brunner@gmail.com>
Member

@sappelhoff sappelhoff left a comment


Cool, thanks both. Then only three things remain:

  1. @Aniket-Pradhan please add an entry to the changelog here: https://github.com/bids-standard/pybv/blob/main/docs/changelog.rst, copying the style of the other entries (and add yourself to the list of authors)
  2. Please append your name and info to https://github.com/bids-standard/pybv/blob/main/CITATION.cff as well
  3. Please add a # pragma: no cover comment (or whatever it was that prevents coverage analysis from counting those lines) to the if clause, because we are (deliberately) not covering that in our tests, and I don't want coverage to drop due to deliberate decisions :-)

@sappelhoff sappelhoff added this to the 0.6.0 milestone Sep 27, 2021
Co-authored-by: Clemens Brunner <clemens.brunner@gmail.com>
Member

@sappelhoff sappelhoff left a comment


awesome, thanks! I'll wait for a note from you that everything did indeed pass ... then we can merge and I'll make a new release.

@Aniket-Pradhan
Contributor Author

Aniket-Pradhan commented Sep 27, 2021

There seems to be something wrong with the test, and I am guessing it is because we cannot pass a big-endian formatted array on a little-endian machine and expect it to carry out the scaling (and other arithmetic) operations correctly.

For example, if we reverse the bytes and change the dtype of a NumPy array, the scaling operation would give different results on little-endian and big-endian machines.

Furthermore, numpy.tofile loses the endianness information of the NumPy array and exports the data in whatever the machine's endianness is. The test was working earlier because I had accidentally passed the wrong variable, but upon reviewing the coverage report I found this mistake.

Therefore, I don't think it would be possible to test this section on a little-endian machine. OTOH, I guess we could replace if (sys.byteorder == "big" and byteorder == "=") or byteorder == ">": with if sys.byteorder == "big":, because it would not make sense to work with a big-endian dtype on a little-endian machine.

@sappelhoff
Member

I see. In that case, let's still keep the check for endianness there - even if only 1 in 10k people run into it, it's a small thing to do and makes the code also easier to understand IMHO.

Then we remove the test and add the pragma: no cover back, to get 100% diff coverage.

@cbrnr
Collaborator

cbrnr commented Sep 27, 2021

There seems to be something wrong with the test, and I am guessing it is because we cannot pass a big-endian formatted array in a little-endian machine and expect it to carry out the scaling (and other arithmetic) operations correctly.

Wait, that's exactly what should happen. If you create a big-endian array, NumPy should know how to handle that data and perform operations correctly.

E.g. these two arrays are identical after multiplication:

import numpy as np

little = np.arange(1e6, dtype="<f8").reshape(1000, 1000)
big = np.arange(1e6, dtype=">f8").reshape(1000, 1000)

little *= 7
big *= 7

np.array_equal(little, big)

If this example does not demonstrate the issue, I'd be curious to see something that doesn't work.

@cbrnr
Collaborator

cbrnr commented Sep 27, 2021

When writing arrays to disk with .tofile(), the files are not identical, which means that they should differ in their endianness:

little.tofile("little.dat")
big.tofile("big.dat")
!md5sum big.dat little.dat 
5157b19a956273b22d8cd056b5eea320  big.dat
d2ae4a99442623c5ef2678a0ea5a3429  little.dat
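
The same point can be seen without touching the filesystem: .tofile() dumps the array's raw memory buffer, so two arrays with identical values but different byte orders carry different raw bytes (a minimal sketch of the md5sum comparison above):

```python
import numpy as np

# .tofile() writes the raw memory buffer, so the in-memory byte order
# is exactly what lands on disk.
little = np.array([1.0, 2.0], dtype="<f8")  # explicitly little-endian
big = np.array([1.0, 2.0], dtype=">f8")     # explicitly big-endian

print(np.array_equal(little, big))        # True: same logical values
print(little.tobytes() == big.tobytes())  # False: different raw bytes
```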

@Aniket-Pradhan
Contributor Author

Aniket-Pradhan commented Sep 27, 2021

You're right, NumPy is handling the operations correctly. I was also swapping the bytes along with changing the dtype, which now seems like the wrong way to do it 😭.

However, the byte order of the NumPy array is reset to the native machine's endianness "=" whenever it is reassigned (ref. lines 476, 480, and 493, where we set it manually from the supported formats).

import numpy as np

def scale_arr(arr, scale=10):
    return arr * scale  # returns a new array (a copy), not a view

big = np.arange(1e6, dtype=">f8").reshape(1000, 1000)
big_scaled = scale_arr(big)

print(big.dtype, big.dtype.byteorder)                # >f8 >
print(big_scaled.dtype, big_scaled.dtype.byteorder)  # float64 =  (on a little-endian machine)

From what I can understand, I guess NumPy only retains the dtype within the same scope as the function call.

Since the byte order is reset, the array is no longer big-endian, so control never enters the if branch, which means we cannot test that block, thereby reducing the coverage. This is why I had to revert to the old code: the coverage still dropped when I committed the test case the first time.

@sappelhoff
Member

@cbrnr merge when you are happy.

@cbrnr
Collaborator

cbrnr commented Sep 28, 2021

Indeed, it seems like byte order is not preserved when copies are involved: e.g., big = big * 7 returns a new array with native endianness, whereas in-place operations such as big *= 7 preserve endianness. Therefore, I would try to replace operations involving copies with in-place variants where possible. However, this would probably require a ton of modifications that I'm not sure we want to maintain. So I guess I'll probably merge, but I would like to confirm with @sappelhoff.
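
The copy-vs-in-place behavior described above can be checked on any host by picking whichever byte order is non-native on the current machine (a small sketch, not pybv code):

```python
import sys
import numpy as np

# Pick the byte order that is NOT native on this machine.
nonnative = ">f8" if sys.byteorder == "little" else "<f8"
arr = np.arange(10, dtype=nonnative)

copied = arr * 7  # ufunc output: a new array, allocated in native order
arr *= 7          # in-place: reuses the buffer, byte order preserved

print(copied.dtype.byteorder)  # "=" (native)
print(arr.dtype.byteorder)     # still the non-native byte order
```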

@sappelhoff
Member

However, this will probably require a ton of modifications that I'm not sure we want to maintain. So I guess I'll probably merge, but I would like to confirm with @sappelhoff.

Yes, thanks for confirming with me - I also think we should merge. We clarified a major point in this issue/PR with Brain Products and made sure that pybv correctly writes as little-endian from now on. The only issue that remains would be a person on a little-endian machine creating a big-endian array, and then passing it to write_brainvision ... and I think it's safe to say that'll never happen (hehe).

@cbrnr
Collaborator

cbrnr commented Sep 28, 2021

... and I think it's safe to say that'll never happen (hehe).

Famous last words. The only thing that is bugging me is that we don't have a test.

@sappelhoff
Member

Famous last words. The only thing that is bugging me is that we don't have a test.

yes, fair enough (that's also why I typed hehe). Could we add a check at the top of write_brainvision that errors when an array is not in native byte order? That way we would cover all cases, right? The only untested thing (pragma: no cover) would then be the if clause that converts a native byte order array to little-endian on a big-endian machine --> and that's something we could test by adding a new CI run on a big-endian architecture. (We discussed that previously but decided against it; we can reconsider.)

@cbrnr
Collaborator

cbrnr commented Sep 28, 2021

Could we add a check to the top of write_brainvision that errors when an array is not in native byte order? That way we would cover all cases, right?

Ideally, we should be able to write a file (in LE) even with a BE array on both architectures. In other words, we should always write an LE data file no matter which array we throw at it, independent of the host architecture.
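
One way to get that behavior would be to convert to an explicitly little-endian dtype just before writing. This is a hypothetical sketch, not pybv's actual implementation; the helper name and the float32 format are assumptions for illustration:

```python
import numpy as np

def write_little_endian(data, fname):
    """Hypothetical helper: write `data` as little-endian float32,
    regardless of the input byte order or the host architecture."""
    # astype with an explicit "<" dtype yields a little-endian buffer;
    # copy=False skips the copy when the data already has that layout.
    data.astype(np.dtype("<f4"), copy=False).tofile(fname)
```

With something like this, a >f8 array and a <f8 array holding the same values produce byte-identical files on any host.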

The only untested thing (pragma: no cover) would then be the if-clause to convert a native byte order array on a big-endian machine to little-endian --> and that's something we could test by adding a new CI on a big-endian architecture. (we discussed that previously but decided against it, but we can reconsider)

I'm still -1 on adding a new CI, but the if clause is also true if the array is BE (independent of the architecture), so we should be able to test it.

@sappelhoff
Member

Okay, but that brings us back to your comment from #80 (comment)

Therefore, I would try to change operations involving copies with in-place variants if possible. However, this will probably require a ton of modifications that I'm not sure we want to maintain.

So we would need to make sure an array keeps its byte order as it passes through write_brainvision, see #80 (comment)

If that's possible, then yes, we can test everything and ensure that passing any byte ordered array on any architecture results in (correctly scaled/processed) little endian format.

@Aniket-Pradhan
Contributor Author

So we would need to make sure an array keeps its byte order as it passes through write_brainvision, see #80 (comment)

One shabby solution I was thinking of is that we could reset the dtype of data every time it is assigned a new copy. That way, we retain the appropriate dtype until we reach the final if condition. However, that would be a lot of overhead for a very far-off case. Furthermore, we would have to use the appropriate dtype from the supported formats.

I am sure we can arrive at a much better solution than this, but I cannot think of anything better right now.

@sappelhoff
Member

One shabby solution I was thinking, is that we could reset the dtype of data every time it is assigned a new copy.

It's pragmatic, I like that. And maybe some of the times we can just avoid making a copy and thus retain the dtype without the "shabby solution" :-)

Furthermore, we would have to use the appropriate dtype from the supported formats.

you mentioned that before, could you please elaborate why that would be a problem?

@cbrnr
Collaborator

cbrnr commented Sep 29, 2021

OK, so I looked at the changes again and it seems like we can simplify the condition to:

if sys.byteorder == "big" and byteorder == "=":  # pragma: no cover
    data = data.byteswap()

The case byteorder == ">" never occurs because we're doing some copying before, which changes the array to native order.

Therefore, unless we change a lot of code just for the sake of testing on an LE architecture, we really have to mark this line as untestable. However, I think that's fine – sorry if this was already clear to you previously; I thought we could somehow create a test that works on LE without changing too much.

We can always add this later if we think it is necessary.
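
The byteswap() semantics behind that untestable line can at least be sanity-checked on any host: swapping a native-order array yields the same raw bytes as building the array in the opposite byte order directly (a small sketch):

```python
import sys
import numpy as np

native = np.array([1.0, 2.0], dtype="=f8")
opposite = ">f8" if sys.byteorder == "little" else "<f8"

# byteswap() reverses the bytes of every element but keeps the dtype
# label, so the swapped buffer matches an opposite-endian array.
swapped = native.byteswap()
print(swapped.tobytes() == np.array([1.0, 2.0], dtype=opposite).tobytes())  # True
```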

@cbrnr cbrnr merged commit 0088662 into bids-standard:main Sep 29, 2021
@cbrnr
Collaborator

cbrnr commented Sep 29, 2021

Perfect! Thanks @Aniket-Pradhan and @sappelhoff!

@sappelhoff
Member

@Aniket-Pradhan please run your tests and make sure everything passes :-) ... then notify me and I'll release pybv 0.6.0

@Aniket-Pradhan
Contributor Author

ohh, looks like I am late to the party 😛

It's pragmatic, I like that. And maybe some of the times we can just avoid making a copy and thus retain the dtype without the "shabby solution" :-)

As @cbrnr pointed out, data would automatically be converted to the native architecture before it reaches the if condition. Therefore, unless we want to modify a chunk of code just to test a possibility (which should work anyway, because the data is automatically converted to the native byte order beforehand), it doesn't seem feasible to me.

That's why I was calling it a "shabby" solution, because it is not really needed, lol. Anyway, thanks for being supportive. Let me know if you plan to go ahead with a similar solution, and I would be happy to help 😄

Furthermore, we would have to use the appropriate dtype from the supported formats.

you mentioned that before, could you please elaborate why that would be a problem?

I basically meant that we would have to specify the endianness in the supported formats if we were to proceed with testing the code. 😛

@Aniket-Pradhan please run your tests and make sure everything passes :-) ... then notify me and I'll release pybv 0.6.0

I'll run the tests on Travis, and hopefully everything will work fine. I'll ping here as soon as the build finishes.

Thanks a lot @cbrnr and @sappelhoff for your help. You all rock!

@Aniket-Pradhan
Contributor Author

@sappelhoff everything looks green. We are good to go! :D


@sappelhoff
Member

https://pypi.org/project/pybv/0.6.0/ 🎉 feel free to draft a tweet @Aniket-Pradhan or @cbrnr or @hoechenberger - I'll like and retweet of course 😉 (but no energy to draft one myself)

Development

Successfully merging this pull request may close these issues.

Tests fail with big-endian machines
3 participants