Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: PyBIDS 1.0 layout API #989

Open
effigies opened this issue Apr 25, 2023 · 8 comments
Open

RFC: PyBIDS 1.0 layout API #989

effigies opened this issue Apr 25, 2023 · 8 comments

Comments

@effigies
Copy link
Collaborator

Context: Over the last year or so, we've looked into replacing the SQLAlchemy-based BIDSLayout with one based on ANCP-BIDS. In the process, we've found issues in the current API where we are forced to decide whether it's worth shimming the old behavior in to avoid breaking things for users (if there are any).

Ultimately, we've decided that the API is too large and has features that seem to have more maintenance burden than value to users. So we want to propose a new API.

We would like this API to be:

  1. Small. Understanding it should take ~10 minutes.
  2. Reimplementable. If you have a better implementation, it should be cheap for people to switch to yours.
  3. Schema-based. The BIDS standard now has a schema; it should be possible to swap in an updated schema to support unmerged BEPs.

PyBIDS 1.0 layout API

Base types (see PaddedInt):

Index = PaddedInt
Value = str
Entity = Literal['subject', 'session', ...]  # Could be an enum...

Data structures:

class Schema:
    """Representation of the state of BIDS schema
    
    This replaces the concept of a config file, allowing PyBIDS to accept
    BEP or application-specific schemas.
    
    For the time being, this should be considered opaque to users.
    """

class File:
    """Generic file representation
    
    This permits unparseable filenames to be represented.
    """
    path: Path
    _layout: BIDSLayout | None
    
    # Implement os.PathLike
    def __fspath__(self) -> str:
        return str(self.path)
    
    @cached_property
    def relative_path(self) -> Path:
        if not self._layout:
            raise ValueError
        return self.path.relative_to(self._layout.root)

class BIDSFile(File):
    entities: dict[Entity, Value | Index]
    datatype: str | None
    suffix: str | None
    extension: str | None

    @cached_property
    def metadata(self) -> dict[str, Any]:
        """Sidecar metadata aggregated according to inheritance principle"""
        if not self._layout:
            raise ValueError
        ...

class BIDSLayout:
    # Inspired by context: https://github.com/bids-standard/bids-specification/blob/master/src/schema/meta/context.yaml
    schema: Schema
    root: Path
    dataset_description: dict[str, Any]

    # Non-conforming files, maybe better name?
    # could be property
    ignored: list[File]
    
    @cached_property
    def files(self) -> list[BIDSFile]: ...

    @cached_property
    def datatypes(self) -> list[str]: ...

    @cached_property
    def modalities(self) -> list[str]: ...

    @cached_property
    def subjects(self) -> list[str]: ...

    @cached_property
    def entities(self) -> list[Entity]: ...
    
    def get_entities(
        self,
        entity: Entity,
        **filters,
    ) -> list[Value | Index]: ...

    def get_metadata(self, term: str, **filters) -> list[Any]: ...
    def get_files(self, **filters) -> list[BIDSFile]: ...

It can be useful to group multiple datasets into a single logical layout, for example, when querying raw data and derivatives together. Therefore we also propose an explicit collection of layouts, such that each method/property is implemented by aggregating over all of the collected layouts.

class LayoutCollection(BIDSLayout):
    primary: BIDSLayout
    layouts: list[BIDSLayout]

mylayout = LayoutCollection('/path/to/ds')
mylayout.primary.entities  # Only once
mylayout.entities  # Iterative

layout.utils

A module will be provided to provide utilities that only rely on the BIDSLayout API. The goal is to provide implementations of common tasks that will survive refactors and serve as exemplar code for working with the API.

def get_bval(x: BIDSFile, layout: BIDSLayout | None = None) -> BIDSFile: ...
def get_bvec(x: BIDSFile, layout: BIDSLayout | None = None) -> BIDSFile: ...
def get_fieldmap(x: BIDSFile, layout: BIDSLayout | None = None) -> list[BIDSFile]: ...

Some (potentially layout-free) parsing/creation seems useful:

def parse_file(x: Path | str) -> BIDSFile:
    """Construct a layout-free BIDSFile"""
    
def new_file(
    template: BIDSFile | None = None,
    layout: BIDSLayout | None = None,
    *,
    datatype: str | None = None,
    suffix: str | None = None,
    extension: str | None = None,
    **entities,
) -> BIDSFile:
    """Generate new file
        
    If given a template, additional arguments are overrides.
    """

Please let us know if there are things you need to do that cannot be done with this API. We recognize that this will break things, but if there are features you use that require more than 10-20 lines of code to reimplement without access to BIDSLayout internals, please let us know.

@effigies effigies pinned this issue Apr 25, 2023
@pvandyken
Copy link
Contributor

Is the .get() method being eliminated? Not sure if this is an oversight, but if so, it seems a shame, as it's by far the method I most use and get_files adds an extra 6 characters of typing.

@effigies
Copy link
Collaborator Author

Dropping get() was intentional because it currently is an omnibus interface that retrieves files objects, file paths, entity keys, entity values, metadata keys, metadata values, subject dirs and session dirs. It seems much cleaner to separate out the current uses that we want to keep into separate functions. We could keep one of them as get(), and I think it would have to be what we currently have as get_files(), as that was the default return type for the current .get(). Anything else would likely lead to silent failures.

@effigies
Copy link
Collaborator Author

effigies commented May 1, 2023

Question for the group: Do we want to architect this as abstract base classes, to facilitate reimplementation, or keep everything concrete for now?

@pvandyken
Copy link
Contributor

From a typing perspective, the most flexible (and Pythonic?) would probably be a Protocol, which would allow you to sub out the implementation without requiring the implementation to explicitly be a subclass of the pybids BIDSLayout. So my preference would be Protocol > ABC > Concrete implementation.

@effigies
Copy link
Collaborator Author

effigies commented May 1, 2023

Okay. I'll try coding it up as a collection of protocols. I've only used them as a hack before, I'm curious to try this.

@effigies
Copy link
Collaborator Author

effigies commented May 2, 2023

Realizing I should have tagged @erdalkaraca and @ghisvail on this. I suspect you will have opinions.

@wasciutto
Copy link
Contributor

wasciutto commented Jul 6, 2023

Often I am wanting to query and compare at the conceptual level of images instead of having to deal with particular files and their extensions.

To this end, something I would find useful is a BIDSImage class which abstracts away file-level information and lets me deal with all the information pertaining to a particular image - specifically, combining .nii images with their .json sidecar and .tsv timing files, although I could see cases where it would be nice to join more information. For example, joining .nii with its corresponding confounds in the fmriprep derivative dataset.

Rough example of what I'm thinking:

class BIDSImage():
    # least common denominator of source entities
    entities: dict[Entity, Value | Index]
    # suffix of the root image
    suffix: str | None

    def __init__(root_image: BIDSFile, related_files: BIDSFile): pass
    # Return path to the root image file
    def get_root_file() -> Path: pass
    # Return list of all files this class was derived from
    def get_files() -> list[Path]: pass
    # Shared metadata from all files aggregated
    def metadata(self) -> dict[str, Any]: pass

@effigies
Copy link
Collaborator Author

effigies commented Jul 7, 2023

Thanks for the feedback!

I'm not sure that we want this additional abstraction in pybids; more new classes is more cognitive overhead for downstream users.

But making sure you have the tools to quickly build it yourself if it is useful to you is in scope. entities, suffix and get_root_file() come for free from BIDSFile, and possibly metadata, though I'm not sure what "shared metadata from all files aggregated" means. As to get_files(), this seems like the list of inputs passed to __init__, so if we make sure you can do that, we should be good.

clane9 added a commit to childmindresearch/bids2table that referenced this issue Aug 9, 2023
Add a higher level `BIDSTable` interface inspired by the [proposed PyBIDS API redesign](bids-standard/pybids#989).

* Move entities module one level up

* Move `join_bids_path()` helper into `entities`

* Add `BIDSTable` subclass of `DataFrame`

Add `BIDSTable` subclass of `DataFrame` with convenience methods for
accessing subtables and flattened metadata.

* Add long names to entities field metadata

* Add table `filter()` method

Add `BIDSTable.filter()` which filters rows according to a condition
applied to a single column. The supported conditions follow
`pandas.Series.filter()`.

* Add `files` property returning list of `BIDSFile`s

Also change `file` column group to `finfo` to try to limit possible confusion.

* Update example and bug fixes

Bug fixes:

- Set the index of `flat_metadata` to the parent table's index.
- Treat NA in the row mask as False in `filter()`.

* Add properties for subjects, datatypes, etc

* Add `sort_entities()`

* Upgrade required python to >=3.8

* Add `filter_multi` method and documentation

PyBIDS supports querying a layout with multiple filters specified as
keyword arguments. This is a nice interface, and is also useful for
programmatic filtering. Here we add a `filter_multi()` method to do
something similar.

* Flatten JSON metadata only to first level

* Fix mypy error

* Move some things around

* Add `func` arg to `filter()`

Add a `func` arg option to `filter` for arbitrary lambda function
filtering.

Also move `join_bids_path()` into the `table` module.

* More moving around

* Don't use `removeprefix`

* Yet more moving around

* Update example

* Add a comment on the filter api

* Change arg name output -> index_path

Having the argument be `output` in `bids2table` was confusing when you
only want to load a table.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants