Data class refactor #161

Merged: 28 commits, Jun 30, 2023

@cgohil8 (Collaborator) commented Jun 29, 2023

Massive refactor of the Data class

Changes:

  • This PR contains an API change for the Data.prepare method and the load_data wrapper in the config API.
  • Standardisation no longer occurs by default when calling .prepare().
  • Can now chain different preparation methods.
  • Can easily add new (parallelised) methods for preparing the data.
  • Added new methods for downsampling, calculating amplitude envelopes, applying a sliding window, standardisation, time-delay embedding (TDE) and PCA.
  • Can set the buffer size used for shuffling.
  • Better naming of variables.
  • Updated examples and tutorials for the new data preparation.
  • Reading SPM files is no longer supported. OSL can convert SPM files to fif, which osl-dynamics can read. (MATLAB files can still be read.)
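The buffer size above controls how samples are shuffled during training. Assuming it behaves like the `buffer_size` of a streaming shuffle in the style of TensorFlow's `tf.data.Dataset.shuffle` (an assumption, not confirmed by this PR), the technique can be sketched in pure Python: fill a buffer, emit a random element, and refill its slot from the stream. `buffered_shuffle` is a hypothetical helper, not part of osl-dynamics.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Yield items in shuffled order using a fixed-size buffer (hypothetical helper).

    A larger buffer gives a more thorough shuffle at the cost of memory.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything
            buffer.append(item)
            continue
        # Emit a random buffered element and put the new item in its slot
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Drain the remaining buffered items once the stream is exhausted
    rng.shuffle(buffer)
    yield from buffer


print(list(buffered_shuffle(range(10), buffer_size=4)))
```

With `buffer_size=1` the order is unchanged; with a buffer at least as large as the dataset, every item is in the buffer at once and the result is a full shuffle.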

Preparation methods

We can now chain methods, e.g. for TDE-PCA data preparation we use:

data.tde_pca(n_embeddings=15, n_pca_components=80)
data.standardize()

# equivalently
data = data.tde_pca(n_embeddings=15, n_pca_components=80)
data = data.standardize()

# equivalently
data = data.tde_pca(n_embeddings=15, n_pca_components=80).standardize()
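All three forms above behave the same if each preparation method updates the data in place and returns the object itself. A minimal pure-Python sketch of that pattern follows; `ToyData` and its simplified `time_delay_embed` and `standardize` methods are hypothetical stand-ins, not the osl-dynamics implementations (PCA is omitted entirely).

```python
import statistics
from typing import List


class ToyData:
    """Hypothetical container for one recording, stored as rows of channel values."""

    def __init__(self, rows: List[List[float]]):
        self.rows = rows  # n_samples rows, each with n_channels values

    def time_delay_embed(self, n_embeddings: int) -> "ToyData":
        # Concatenate n_embeddings consecutive samples into one row; only full
        # windows are kept, so n_embeddings - 1 samples are lost
        self.rows = [
            [value for row in self.rows[t : t + n_embeddings] for value in row]
            for t in range(len(self.rows) - n_embeddings + 1)
        ]
        return self  # returning self is what makes chaining possible

    def standardize(self) -> "ToyData":
        # z-score each column (channel) independently
        columns = list(zip(*self.rows))
        means = [statistics.fmean(c) for c in columns]
        stds = [statistics.pstdev(c) for c in columns]
        self.rows = [
            [(v - m) / s for v, m, s in zip(row, means, stds)] for row in self.rows
        ]
        return self


# Statement by statement...
a = ToyData([[1.0], [2.0], [3.0], [4.0]])
a.time_delay_embed(n_embeddings=3)
a.standardize()

# ...or chained: both give [[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]]
b = ToyData([[1.0], [2.0], [3.0], [4.0]]).time_delay_embed(n_embeddings=3).standardize()
print(a.rows == b.rows)  # True
```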

To prepare amplitude envelope data we use:

data.filter(low_freq=1, high_freq=45)
data.amplitude_envelope()
data.sliding_window(n_window=5)
data.standardize()
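In this pipeline, `sliding_window(n_window=5)` smooths each channel. Assuming it computes a moving average over `n_window` samples and keeps only positions where the window fits entirely (a "valid" convolution, which shortens each channel by `n_window - 1` samples), the step can be sketched as below. `moving_average` is a hypothetical helper; the real method may handle edges differently.

```python
from typing import List, Sequence


def moving_average(x: Sequence[float], n_window: int) -> List[float]:
    """Average each run of n_window consecutive samples ("valid" mode)."""
    if not 1 <= n_window <= len(x):
        raise ValueError("n_window must be between 1 and len(x)")
    window_sum = sum(x[:n_window])
    out = [window_sum / n_window]
    for i in range(n_window, len(x)):
        # Slide the window by one sample: add the newest value, drop the oldest
        window_sum += x[i] - x[i - n_window]
        out.append(window_sum / n_window)
    return out


print(moving_average([1, 2, 3, 4, 5, 6], n_window=3))  # [2.0, 3.0, 4.0, 5.0]
```

Keeping a running sum makes each step O(1) instead of re-summing the whole window.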

Tests confirm that all the new data preparation methods produce the same data as the old code.

The Data.prepare method now takes a dict that maps the name of each method to call (in series) to its keyword arguments. For example:

methods = {
    "tde_pca": {"n_embeddings": 15, "n_pca_components": 80},
    "standardize": {},
}
data.prepare(methods)

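(Dicts preserve insertion order, so the methods are applied in the order they are listed. This is guaranteed by the language from Python 3.7; in CPython 3.6 it was an implementation detail.)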
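One way a `prepare` method could consume such a dict is to look up each key as a method name and call it with the associated keyword arguments, in insertion order. The sketch below assumes exactly that; `Pipeline`, `scale` and `shift` are hypothetical names used only to show the dispatch, not osl-dynamics code.

```python
from typing import Any, Dict, List


class Pipeline:
    """Hypothetical data holder whose prepare() runs methods named in a dict."""

    def __init__(self, values: List[float]):
        self.values = values

    def scale(self, factor: float) -> "Pipeline":
        self.values = [v * factor for v in self.values]
        return self

    def shift(self, offset: float) -> "Pipeline":
        self.values = [v + offset for v in self.values]
        return self

    def prepare(self, methods: Dict[str, Dict[str, Any]]) -> "Pipeline":
        # Dicts keep insertion order, so methods run in the order they were listed
        for name, kwargs in methods.items():
            method = getattr(self, name, None)
            # Reject private names, prepare itself and non-callable attributes
            if name.startswith("_") or name == "prepare" or not callable(method):
                raise ValueError(f"Unknown preparation method: {name!r}")
            method(**kwargs)
        return self


# Order matters: scaling then shifting differs from shifting then scaling
print(Pipeline([1.0, 2.0]).prepare({"scale": {"factor": 2}, "shift": {"offset": 1}}).values)  # [3.0, 5.0]
print(Pipeline([1.0, 2.0]).prepare({"shift": {"offset": 1}, "scale": {"factor": 2}}).values)  # [4.0, 6.0]
```

Validating names before calling them keeps a typo in a config file from failing silently or invoking an unrelated attribute.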

Saving and loading

Internal variable names changed significantly, and the Data.save method in particular was substantially rewritten. However, the changes to saving and loading are backwards compatible: data saved with the old code can still be read.
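Backwards-compatible loading after a renaming like this is often handled by translating legacy field names as they are read. The sketch below shows the idea; the JSON format and the names in `LEGACY_KEYS` are hypothetical placeholders, not the actual layout written by `Data.save`.

```python
import json
from typing import Any, Dict

# Hypothetical mapping from field names written by older code to current names
LEGACY_KEYS = {"sampling_freq": "sampling_frequency", "n_emb": "n_embeddings"}


def load_settings(text: str) -> Dict[str, Any]:
    """Parse saved settings, renaming any legacy keys to their current names."""
    raw = json.loads(text)
    settings: Dict[str, Any] = {}
    for key, value in raw.items():
        new_key = LEGACY_KEYS.get(key, key)
        if new_key in settings:
            # An old and a new name for the same field: refuse to guess which wins
            raise ValueError(f"Duplicate setting {new_key!r} after renaming legacy keys")
        settings[new_key] = value
    return settings


print(load_settings('{"sampling_freq": 250, "n_emb": 15}'))
```

Files written by new code pass through unchanged, because their keys are not in the mapping.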

Potential improvements

Rather than defining an _apply function inside each method, we could try using functools.partial or defining a decorator for the functions in data.processing.
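The decorator idea could look roughly like this: wrap a function that processes a single array so that it becomes a chainable method applying it to every array the object holds, with `functools.partial` binding the keyword arguments once. `ArrayCollection`, `applies_to_each_array`, `demean_array` and `clip_array` are hypothetical names; the parallelisation used in osl-dynamics is not reproduced.

```python
import functools
from typing import Callable, List

Array = List[float]


def applies_to_each_array(func: Callable[..., Array]) -> Callable[..., "ArrayCollection"]:
    """Turn a per-array function into a chainable method over all arrays."""

    @functools.wraps(func)
    def method(self: "ArrayCollection", **kwargs) -> "ArrayCollection":
        # Bind the keyword arguments once, then apply to every array
        apply = functools.partial(func, **kwargs)
        self.arrays = [apply(x) for x in self.arrays]
        return self

    return method


def demean_array(x: Array) -> Array:
    """Per-array function, standing in for one in data.processing."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def clip_array(x: Array, limit: float) -> Array:
    return [max(-limit, min(limit, v)) for v in x]


class ArrayCollection:
    def __init__(self, arrays: List[Array]):
        self.arrays = arrays

    # No hand-written _apply helper needed in each method
    demean = applies_to_each_array(demean_array)
    clip = applies_to_each_array(clip_array)


data = ArrayCollection([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
print(data.demean().clip(limit=5.0).arrays)  # [[-1.0, 0.0, 1.0], [-5.0, 0.0, 5.0]]
```

Adding a new preparation step then only requires writing the per-array function and one line in the class.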

@cgohil8 linked an issue on Jun 29, 2023 that may be closed by this pull request
@cgohil8 force-pushed the data_class branch 2 times, most recently from 89d5f99 to 6910f5e (June 29, 2023 21:15)
This commit reproduces the output of the old code for the following:
- Calling .prepare(n_embeddings=15, n_pca_components).
  This means the new tde_pca and standardize methods are working.
- Calling .filter(low_freq=1, high_freq=10).
- Calling .filter(low_freq=1, high_freq=10); .amp_env();
  .sliding_window(n_window=5); .standardize() reproduces
  .prepare(amplitude_envelope=True, low_freq=1, high_freq=10, n_window=5).

This means all the preparation methods reproduce the old code.
@cgohil8 force-pushed the data_class branch 2 times, most recently from 860a87b to d4ad2fb (June 29, 2023 22:16)
@cgohil8 force-pushed the data_class branch 2 times, most recently from dd0a6ed to 9fa5ff6 (June 30, 2023 10:44)
In particular, I checked:

    methods = {
        "tde_pca": {"n_embeddings": 15, "n_pca_components": 80},
        "standardize": {},
    }
    data.prepare(methods)

produces exactly the same data as the following call in the old code:

    data.prepare(n_embeddings=15, n_pca_components=80)

and

    methods = {
        "filter": {"low_freq": 1, "high_freq": 30},
        "amplitude_envelope": {},
        "sliding_window": {"n_window": 5},
        "standardize": {},
    }
    data.prepare(methods)

produces exactly the same data as the following call in the old code:

    data.prepare(amplitude_envelope=True, low_freq=1, high_freq=30, n_window=5)
@evanr70 (Contributor) commented Jun 30, 2023

To implement true chaining (data.method1(...).method2(...).method3(...)), we should return self from every method where it makes sense (such as .standardize).

Review comment on osl_dynamics/data/rw.py (outdated, resolved)
@evanr70 (Contributor) left a comment

I'll make a few operational changes to pipeline and rw, but excellent work overall.

@evanr70 self-requested a review (June 30, 2023 14:09)
@cgohil8 merged commit fdf57bb into main on Jun 30, 2023
1 check passed
@cgohil8 deleted the data_class branch (June 30, 2023 14:53)
Development: successfully merging this pull request may close the issue "Option to set buffer size for shuffling".

3 participants