Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add to_dataframe() to Record. #380

Merged
merged 1 commit into from
Jun 30, 2022
Merged

Conversation

Ivorforce
Copy link
Contributor

Convenient for a quick start with records.

Copy link
Member

@tompollard tompollard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but I'll leave @bemoody and/or @cx1111 to comment too.

wfdb/io/record.py Show resolved Hide resolved
)

return pd.DataFrame(
data=self.p_signal,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be an option to set data to different signal types, e.g. d_signal?

Copy link
Contributor Author

@Ivorforce Ivorforce Jun 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps; I added a variable check to support all 4 combinations of data. I'm not sure if it makes sense for digital data though so I'll let someone else decide whether we want to keep it.

@cx1111
Copy link
Member

cx1111 commented Jun 1, 2022

It doesn't seem very efficient to store a regularly sampled signal as a dataframe. What's the use-case for this? To filter samples based on a datetime range? Perhaps it wold be better to add methods that get you the first index of the signal before, after, or between datetimes, which would incidentally solve for the d_signal/p_signal issue mentioned above.

Let me know if the dataframe is more useful for other reasons.

@tompollard
Copy link
Member

It doesn't seem very efficient to store a regularly sampled signal as a dataframe. What's the use-case for this?

Maybe not efficient, but lots of people are familiar with Pandas DataFrames which is good for accessibility. This small chunk of code allows people to quickly play around with the data in a familiar structure. e.g. perhaps not a great example of a use case, but I could do something like:

View the data:

import wfdb
import matplotlib.pyplot as plt
import seaborn as sns

record = wfdb.rdrecord('sample-data/a103l')
df = record.to_dataframe()
df.head()

                              II         V     PLETH
0 days 00:00:00        -0.023596  0.867586  0.482203
0 days 00:00:00.004000 -0.036981  0.982985  0.544374
0 days 00:00:00.008000 -0.062923  0.859791  0.478212
0 days 00:00:00.012000 -0.092452  0.788783  0.442857
0 days 00:00:00.016000 -0.094522  0.851996  0.474302

Plot the distribution of measurements:

sns.boxplot(data=df)
plt.show()

Figure_1

@cx1111
Copy link
Member

cx1111 commented Jun 2, 2022

Ok, makes sense. @Ivorforce can you please add a unit test and an example in the demo notebook?

@Ivorforce Ivorforce force-pushed the record-to-dataframe branch 2 times, most recently from 38d3679 to 88af056 Compare June 3, 2022 10:07
@Ivorforce
Copy link
Contributor Author

Ivorforce commented Jun 3, 2022

Yes, sorry for not providing motivation - @tompollard got it exactly right. I opened this PR for accessibility to new people and compatibility to existing libraries.

Editing an ipython notebook is a bit of a struggle if you don't want to change metadata, but I added a unittest and a demo like asked. I also added support for digital and expanded signals. Let me know if it's ready like this.

The demo, for quick reference:
Bildschirmfoto 2022-06-03 um 12 11 14

@bemoody
Copy link
Collaborator

bemoody commented Jun 3, 2022

This looks nice! I don't really use Pandas so I don't have much to add here. One question that comes to mind: is there a way for the dataframe to indicate the physical units of each channel/column?

@Ivorforce
Copy link
Contributor Author

No support for units in dataframes. See this issue: pandas-dev/pandas#10349

@Ivorforce
Copy link
Contributor Author

I switched to pd.Timedelta() because it has a higher precision than datetime.timedelta, resulting in a lower probability of precision loss for the index.

@tompollard
Copy link
Member

tompollard commented Jun 27, 2022

@Ivorforce it looks like one of the tests is failing. Please could you take a look when you have time? See: https://github.com/MIT-LCP/wfdb-python/runs/6904057193?check_suite_focus=true

_________________________ TestRecord.test_to_dataframe _________________________
[18](https://github.com/MIT-LCP/wfdb-python/runs/6904057193?check_suite_focus=true#step:5:19)
[19](https://github.com/MIT-LCP/wfdb-python/runs/6904057193?check_suite_focus=true#step:5:20)
self = <tests.test_record.TestRecord testMethod=test_to_dataframe>
[20](https://github.com/MIT-LCP/wfdb-python/runs/6904057193?check_suite_focus=true#step:5:21)
[21](https://github.com/MIT-LCP/wfdb-python/runs/6904057193?check_suite_focus=true#step:5:22)
    def test_to_dataframe(self):
[22](https://github.com/MIT-LCP/wfdb-python/runs/6904057193?check_suite_focus=true#step:5:23)
        record = wfdb.rdrecord("sample-data/test01_00s")
[23](https://github.com/MIT-LCP/wfdb-python/runs/6904057193?check_suite_focus=true#step:5:24)
        df = record.to_dataframe()
[24](https://github.com/MIT-LCP/wfdb-python/runs/6904057193?check_suite_focus=true#step:5:25)
[25](https://github.com/MIT-LCP/wfdb-python/runs/6904057193?check_suite_focus=true#step:5:26)
        self.assertEqual(record.sig_name, list(df.columns))
[26](https://github.com/MIT-LCP/wfdb-python/runs/6904057193?check_suite_focus=true#step:5:27)
        self.assertEqual(len(df), record.sig_len)
[27](https://github.com/MIT-LCP/wfdb-python/runs/6904057193?check_suite_focus=true#step:5:28)
>       self.assertEqual(df.index[0], pd.Timedelta())
[28](https://github.com/MIT-LCP/wfdb-python/runs/6904057193?check_suite_focus=true#step:5:29)
[29](https://github.com/MIT-LCP/wfdb-python/runs/6904057193?check_suite_focus=true#step:5:30)
tests/test_record.py:538: 
[30](https://github.com/MIT-LCP/wfdb-python/runs/6904057193?check_suite_focus=true#step:5:31)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[31](https://github.com/MIT-LCP/wfdb-python/runs/6904057193?check_suite_focus=true#step:5:32)
[32](https://github.com/MIT-LCP/wfdb-python/runs/6904057193?check_suite_focus=true#step:5:33)
>   ???
[33](https://github.com/MIT-LCP/wfdb-python/runs/6904057193?check_suite_focus=true#step:5:34)
E   ValueError: cannot construct a Timedelta without a value/unit or descriptive keywords (days,seconds....)
[34](https://github.com/MIT-LCP/wfdb-python/runs/6904057193?check_suite_focus=true#step:5:35)
[35](https://github.com/MIT-LCP/wfdb-python/runs/6904057193?check_suite_focus=true#step:5:36)
pandas/_libs/tslibs/timedeltas.pyx:986: ValueError

@Ivorforce
Copy link
Contributor Author

Whoops, looks like I missed this. Tests should pass now.

@cx1111 cx1111 merged commit 14df878 into MIT-LCP:main Jun 30, 2022
@cx1111
Copy link
Member

cx1111 commented Jun 30, 2022

Ty!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants