Skip to content

Separate DataSetBlock from DataSet#233

Merged
yousefmoazzam merged 17 commits intofeature/transparent-file-storefrom
feature/refactor-dataset-block
Mar 5, 2024
Merged

Separate DataSetBlock from DataSet#233
yousefmoazzam merged 17 commits intofeature/transparent-file-storefrom
feature/refactor-dataset-block

Conversation

@yousefmoazzam
Copy link
Copy Markdown
Collaborator

Changes

There's several ways to describe the main change at a high-level:

  • change DataSetBlock such that it is fully functional without being a child class of DataSet
  • completely decouple DataSetBlock from DataSet
  • allow the creation of blocks (instances of DataSetBlock) without a "base" (instance of DataSet)

At a lower-level, the changes are:

  • modify DataSetBlock to be fully functional (as required by methods and the task runner) without using DataSet as a base class
  • remove the use of DataSet and FullFileDataSet from the data store writer + reader and task runner
  • remove the use of FullFileDataSet from StandardTomoLoader
  • remove DataSet and FullFileDataSet completely

Motivation

This change was partly motivated by the difficulties seen in trying to use FullFileDataSet in StandardTomoLoader to create blocks (ie, to create instances of DataSetBlock).

Another reason for this change was that, as development has continued, it was recognised that the main data object that methods and the task runner work with are blocks rather than chunks. In other words, blocks are the fundamental data object in httomo, not chunks. On the other hand, the original dataset design involves a class hierarchy where a chunk (represented by DataSet) is the fundamental data object, and from which other data objects derive from to get certain behaviour (DataSetBlock and FullFileDataSet). These two choices about which data object is the fundamental one clash, and consequences of this clash are likely manifested in some of the difficulties that have been encountered during development.

team-gpu and others added 15 commits February 28, 2024 17:19
Co-authored-by: Yousef Moazzam <yousefmoazzam@users.noreply.github.com>
Refactored DataSetBlock to be standalone and more robust
Many block reading tests using the loader are now broken due to moving
away from using `FullFileDataSet` to generate blocks for the loader, but
some have been fixed in this change.

In particular, the block reading loader tests that have been fixed are
the ones that don't preview/crop the data to be loaded, and the ones
that still need to fixed are the ones that use previewing (and thus
require some offsetting to do the block reading).
Refactor loader to create blocks directly, without using `FullFileDataSet`
Co-authored-by: Yousef Moazzam <yousefmoazzam@users.noreply.github.com>
…tasetblock

Fix methods with new datasetblock (WIP)
In addition to getting this test to work with the refactored
`DataSetBlock`, the test's behaviour has also been modified slightly:
- only one method is used to process the two blocks (two methods in the
  section was unnecessary for this test's purpose of checking that the
  last block being processed triggers the task runner to update its
  collection of side outputs)
- after the first block of two has been processed, it's asserted that
  the task runner hasn't yet attempted to update its collection of side
  outputs
@yousefmoazzam yousefmoazzam force-pushed the feature/refactor-dataset-block branch 2 times, most recently from bd2c34b to 298e4f8 Compare March 5, 2024 15:00
A consequence of changes being done in parallel to how darks/flats are
handled in the pipelione mean that it'll soon no longer be necessary to
account for the darks/flats in the max slices estimation.

Doing the removal here makes it simpler to handle darks/flats from the
loader's perspective, because now that the max slices estimation doesn't
need the darks/flats, the task runner won't need to ask the loader for
the darks/flats. This means that the loader doesn't need the behaviour
of translating empty darks/flats in the form of `None` into empty arrays
(which is currently only in `DataSetBlock`).

Put another way, removing the darks/flats input to the max slices
calculation here allows the avoidance of copying the logic in
`DataSetBlock` to translate `None` darks/flats values to empty arrays,
because the task runner no longer needs dark/flats from the loader for
max slices calculation. (Putting the logic into `AuxiliaryData` to be
usable by both the loader and the block class was also an option, but
it felt messy to give the `AuxiliaryData` the slicing dimension and data
shape in order to produce the appropriately shaped empty array in the
detector y and detector x dimensions).

Now, only `DataSetBlock` has the need to be able to translate `None`
darks/flats values to appropriately shaped empty arrays.
@yousefmoazzam yousefmoazzam force-pushed the feature/refactor-dataset-block branch from 298e4f8 to f73ab5c Compare March 5, 2024 15:13
The original intention of this check was to protect against having less
angles than projections. However, in the case of a pipeline involving
360 data, the angles get truncated to half their original length, while
the reconstruction shape in the 0th dimension remains (in general) much
larger than the number of angles after truncation.

This check perhaps could be moved to the loader to check when angles are
being loaded. Alternatively, the numpy array containing the angles in
the `AuxiliaryData` could be locked/ not writeable by default to provide
a higher level of certainty that the angles are only modified in a few
exempt situations, as a protection measure against the the angles array
becoming inconsistent with the data.
@yousefmoazzam yousefmoazzam marked this pull request as ready for review March 5, 2024 16:08
@yousefmoazzam yousefmoazzam merged commit 9b1aa0a into feature/transparent-file-store Mar 5, 2024
@yousefmoazzam yousefmoazzam deleted the feature/refactor-dataset-block branch March 5, 2024 16:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Finish integration of DataSetBlock changes in feature/refactor-dataset-block

2 participants