Knowledge Base
related project: https://github.com/orgs/ilastik/projects/3
- `DataSelectionApplet` and `BatchProcessingApplet` are initialized with `syncWithImageIndex=False`, which prevents automatic lane creation: `DataSelectionApplet` signals the workflow/shell to add/remove lanes, and the other applets' slots are then resized accordingly.
- `BatchProcessingApplet`: has no top-level operator. Encapsulates the logic to drive the other applets.
- Applets can be hidden from the applet bar by setting `interactive=False`.
- `AppletGuiInterface`: defines the interface for multi-lane GUIs. Use it for normal multi-lane GUIs where image lanes are truly separate.
- `Applet.getMultiLaneGui`: provides all controls (applet GUI, centralWidget, layerControls, and menu if the applet has one) for each configured lane.
- The mixin inheritance pattern has proven useful for combining GUIs that might also be used separately, e.g. `EdgeTrainingWithMulticutGui`.
- Synchronization between applets is realized using slots in top-level operators, but an applet can also report progress and fire signals to the shell.
- `TopLevelOperator`: needs to implement three functions from `OpMultiLaneWrapper`: `addLane`, `removeLane`, `getLane`.
- `OpMultiLaneWrapper`: adds the concept of lanes around single-lane top-level operators of applets.
- Serialization only happens if a slot is dirty. To serialize data that is not part of a slot, create a fake output slot.
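The lane interface can be illustrated with a minimal sketch (class names and exact signatures here are illustrative, not ilastik's real code): a multi-lane wrapper keeps one single-lane inner operator per lane and exposes `addLane`, `removeLane`, and `getLane`.

```python
# Minimal sketch of the lane interface (hypothetical classes, not
# ilastik's actual implementation): one inner single-lane operator
# per image lane.
class OpSingleLane:
    def __init__(self):
        self.input_data = None  # stands in for a real input slot

class OpMultiLane:
    def __init__(self):
        self._lanes = []

    def addLane(self, laneIndex):
        # Lanes are only ever appended at the end.
        assert laneIndex == len(self._lanes)
        self._lanes.append(OpSingleLane())

    def removeLane(self, laneIndex, finalLength):
        del self._lanes[laneIndex]
        assert len(self._lanes) == finalLength

    def getLane(self, laneIndex):
        # Returns a view on one specific internal operator.
        return self._lanes[laneIndex]
```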
- GUI shell
  - represents the main window and menu bars
  - provides utilities like SVG operator diagram export and `numpyAllocationTracking` (in debug mode and if our package is installed, it will display our biggest allocations)
  - listens to the progress signals of applets and shows a progress bar until it is sent a progress of 100%
  - the viewer is a stacked view containing all the viewers (`centralWidgets`) corresponding to the applets in the workflow (they are constructed lazily, once opened), and it always brings the current one to the top
  - handles the `ImageNameList` that represents multiple lanes when multiple datasets are opened in the GUI, and also provides the dropdown menu to select the current image
  - holds the `ProjectManager` to interact with `.ilp` files
- Headless shell
  - implements only a barebones shell with a `ProjectManager`
- `Workflow` is just a big `Operator`. Each `Applet` inside has its own top-level operator.
- Has two names: one that is displayed to the user (`displayName`), and one used to identify the workflow type from project files.
- `defaultAppletIndex`: index of the applet in `Workflow.applets` that is shown when a new project is opened.
- The first applet is generally `DataSelectionApplet`.
  - need to provide `ROLE_NAMES` describing which data can be used as input: mandatory names need to come first. FIXME: from the GUI it is not obvious which roles are optional.
- `imageNameListSlot`: must be overridden in the workflow; should return a list of image names.
- Required axis ordering can be enforced here.
- `workflow_cmdline_args`: whatever command-line args were not consumed by the main `ArgumentParser`.
- `connectLane`: the most important function; executed once a lane has been added to every applet.
  - extra operators can be inserted between the applets here, but they should have the workflow set as their parent
  - logic for managing events between applets, or how applets affect one another, should be added here
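A sketch of what a `connectLane` implementation typically does (the applet roles and slot names here are made up): fetch each applet's per-lane view and connect their slots in sequence.

```python
# Toy slots/views to illustrate connectLane wiring (hypothetical names,
# not lazyflow's real Slot class).
class Slot:
    def __init__(self):
        self.upstream = None
    def connect(self, other):
        self.upstream = other

class LaneView:
    """Per-lane view of an applet's top-level operator."""
    def __init__(self):
        self.Input = Slot()
        self.Output = Slot()

def connectLane(data_lane, feature_lane, export_lane):
    # Data flows: data selection -> feature computation -> export.
    # Extra operators inserted here should have the workflow as parent.
    feature_lane.Input.connect(data_lane.Output)
    export_lane.Input.connect(feature_lane.Output)
```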
- `DataExportApplet.topLevelOperator.getLane(laneIndex)`: the images to export have to be connected one by one.
- Attention: if slots from multiple lanes are connected to a shared operator like a classifier, it will become dirty even if no new training data is available on the new lane yet. This can be circumvented by storing the previous state in `prepareForNewLane()` and restoring it in `handleNewLanesAdded()`.
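The save/restore pattern can be sketched like this (the classifier holder and dirty handling are hypothetical stand-ins; only the hook names come from the notes):

```python
# Sketch of the prepareForNewLane/handleNewLanesAdded pattern:
# stash state before the graph is modified, restore it afterwards.
class ClassifierApplet:
    def __init__(self):
        self.classifier = "trained-model"  # would be invalidated by dirtiness
        self._saved = None

    def prepareForNewLane(self, laneIndex):
        # Called before a lane is added: remember the current classifier.
        self._saved = self.classifier

    def on_dirty(self):
        # Adding a lane fires dirty notifications that destroy the model.
        self.classifier = None

    def handleNewLanesAdded(self):
        # Called after setup finished: put the saved classifier back.
        if self.classifier is None and self._saved is not None:
            self.classifier = self._saved
        self._saved = None
```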
- Pixel classification works on multiple lanes simultaneously -> it cannot use `OpMultiLaneWrapper`, as that would share lanes in the classifier. It was written by hand with level-1 slots.
- `handleAppletStateUpdateRequested`: signal fired by an applet; it is up to the applet writer to fire it whenever it makes sense (the workflow does not poll for applet changes). So the workflow should check the status of all applets here and enable/disable applets accordingly.
- `handleAppletChanged`: fired when the user switches applets by clicking. Most workflows don't do anything here.
- `getLane`: returns an object (view) of one specific internal operator in a `MultiLaneOperator`, with the applet inputs connected.
- `onProjectLoaded()` is the method where the workflow can be set up, also performing additional configuration based on cmdline args (which were already provided to the constructor, but at construction time nothing can be configured yet because the applets are not set up).
- In the case of exporting `N` lanes, `DataExportApplet` will call its methods `prepare_for_entire_export()`, `N` × `prepare_for_lane_export()`, `N` × `post_process_lane_export()`, and `post_process_entire_export()`. The workflow can implement them by monkey-patching them into the `DataExportApplet`.
- The multicut workflow could be seen as a reference, as it is closest to how workflows were meant to be designed.
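A sketch of the hook protocol and the monkey-patching idea. The stand-in applet and driver below are illustrative; whether the per-lane hooks are interleaved around each lane's export (as sketched here) is an assumption, since the notes only list the hooks in bulk.

```python
# Hook names from the notes; the applet/driver are hypothetical stand-ins.
class FakeDataExportApplet:
    """Stand-in with default no-op hooks."""
    def prepare_for_entire_export(self): pass
    def prepare_for_lane_export(self, lane): pass
    def post_process_lane_export(self, lane): pass
    def post_process_entire_export(self): pass

def export_all_lanes(applet, n_lanes, export_fn):
    # Assumed order: entire-export hooks bracket the per-lane hooks,
    # and each lane's hooks bracket that lane's export.
    applet.prepare_for_entire_export()
    for lane in range(n_lanes):
        applet.prepare_for_lane_export(lane)
        export_fn(lane)
        applet.post_process_lane_export(lane)
    applet.post_process_entire_export()

# Workflows customize behaviour by monkey-patching hooks on the instance:
applet = FakeDataExportApplet()
calls = []
applet.prepare_for_lane_export = lambda lane: calls.append(f"prepare:{lane}")
export_all_lanes(applet, 2, lambda lane: calls.append(f"export:{lane}"))
```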
- Command-line arguments are consumed in a hierarchical manner: first `ilastik_main.py`, then the respective workflow, then the applets. Each level provides its unused arguments to the next.
- If `freezeCache` is `True`: the cached slot will always return something (zeros). The point of freezing the cache is to avoid computation.
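The hierarchical argument consumption described above (main entry point, then workflow, then applets) maps naturally onto `argparse.ArgumentParser.parse_known_args`, which returns the parsed options together with whatever it did not recognize. The flag names below are invented for illustration; they are not ilastik's real options.

```python
import argparse

# Each level parses the flags it knows and hands the leftovers down.
main_parser = argparse.ArgumentParser()
main_parser.add_argument("--headless", action="store_true")

workflow_parser = argparse.ArgumentParser()
workflow_parser.add_argument("--retrain", action="store_true")

argv = ["--headless", "--retrain", "--applet-option", "42"]
main_opts, rest = main_parser.parse_known_args(argv)                 # consumes --headless
workflow_opts, applet_args = workflow_parser.parse_known_args(rest)  # consumes --retrain
# applet_args now holds ["--applet-option", "42"] for the applets.
```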
- `prepareForNewLane` in pixel classification includes the following hack: as a result of dirty notifications, the classifier might get destroyed when a new lane is added; so the classifier state is stored before the graph is modified, and re-inserted after setup is finished.
- In batch processing, for each file: add a lane, do the processing and export, remove the lane. Essentially it should work with a single lane as well, by just changing the file name, but in practice that somehow didn't work.
(There is an outdated wish-list by Stuart for improvements.)
- If an output of an operator should be cached, create a wrapper operator that internally feeds the original output to the cache, and serves the cached output to the outside.
- Some operators have an output and cache, where the output is connected to the cache-input internally. This is bad design and should not be done.
- `OpBlockedArrayCache`: caches results in blocks of a configured `blockShape`, so all upstream operators only see requests of this shape; requests to the operator are broken down into the required blocks.
  - Has a bypass mode for headless operation. In bypass mode it makes sure requests still respect the `blockShape` even though they are not cached.
  - block shapes look like e.g. `(1, None, None, None, 3)`: a single time slice, all spatial dimensions, and 3 channels
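The block decomposition can be illustrated in one dimension (a sketch, not the operator's actual code; real ROIs apply this per axis):

```python
# Break a requested interval [start, stop) into block-aligned pieces
# of size block_shape, as a blocked cache would (1D for clarity).
def blocks_for_roi(start, stop, block_shape):
    """Return the block-aligned (start, stop) pairs covering [start, stop)."""
    first = (start // block_shape) * block_shape  # round down to block grid
    blocks = []
    b = first
    while b < stop:
        blocks.append((b, b + block_shape))
        b += block_shape
    return blocks
```

For example, a request for `[5, 25)` with block size 10 touches the three blocks `(0, 10)`, `(10, 20)`, `(20, 30)`; upstream operators only ever see those block-shaped requests.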
- `cacheMemoryManager`: a background thread that keeps a list of all caches and manages them. Uses `psutil` via the `Memory` class in lazyflow.
- `OpSimpleBlockedArrayCache`: makes sure `OpUnblockedArrayCache` never sees a request that isn't block-aligned.
  - manages two different kinds of locks: one for the whole cache, and little locks, one per block
- `fixAtCurrent`: dirty notifications are not forwarded immediately; once this is set to `False`, they are forwarded downstream. Related to `freezeCache`.
- `CleanBlocks`: for saving the currently valid slices to the project and restoring them again.
- `OpUnblockedArrayCache`: not to be used in workflows directly. Not aware of any blocking scheme. Stores each request exactly as the provided `roi`, by saving `start`, `stop`, and `data`. LZ4-compressed `vigra.ChunkedArrays`.
- `opCompressedCache`: deprecated; should not be used for new operators (emits a deprecation warning). The only operator that still needs it is `OpCompressedUserLabelArray` -> stores brush strokes (requires `setDirty` functionality).
- `OpSlicedBlockedArrayCache` is a combination of 3 caches for the `xy`, `xz`, and `yz` planes, to serve ortho-view viewer requests as fast as possible.
- `jsonConfig`: module developed by Stuart; can specify types for fields. Should not be used anymore; use `jsonSchema` instead.
- build the docs: `push-to-origin-gh-pages.sh`
- `dirty` notifications are there predominantly for the viewer, to know what to re-request when something has been changed.
- Request cancellation is implemented by raising an exception, but that doesn't work with foreign threads, so use lazyflow's `threadPool`.
- `lazyFlowClassifiers`: two kinds, vector-wise and pixel-wise
  - vector-wise: more or less no spatial awareness; a flat list of vectors
- `Request` / `threadPool`: can be used independently; guarantees that functions run on the same threads if paused and resumed.
- `RESTFulVolume`: a huge volume in multiple h5 files, interfaced as a single volume in ilastik.
- `multiprocessHdf.py`: compressed h5py reading is slow; this utility uses the multiprocessing module to read compressed data. Could get rid of it.
- `roi.py`: `TinyVector`, not related to vigra. Motivation: numpy arrays have a large amount of overhead. Could be simplified, as it implements a subset of numpy's API. Do a timing analysis (construction vs. computation) and then maybe get rid of it.
  - did some analysis of construction times of a 3-element `TinyVector` vs. a 3-element `numpy.array`; `TinyVector` seems to be about 2.5-fold faster:

    ```
    # in ipython:
    >>> import random
    >>> from lazyflow.roi import TinyVector
    >>> import numpy
    >>> %timeit [numpy.array((random.random(), random.random(), random.random())) for i in range(int(10e5))]
    1.49 s ± 10.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    >>> %timeit [TinyVector((random.random(), random.random(), random.random())) for i in range(int(10e5))]
    608 ms ± 3.11 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    ```

  - remaining TODO: computational comparison, and actual usage in the code
- `BigRequestStreamer`: splits up a big request into small ones; either the user specifies a sub-request shape, or it is estimated automatically.
  - the estimate is based on RAM per pixel, number of threads, etc., such that all processors are busy and all requests fit in RAM. TODO: benchmark optimal request sizes
  - uses `RoiRequestBatch` to process multiple ROIs at once; can report progress
- Slots have multiple levels.
  - Level-zero means just directly connecting "values".
  - A level-one slot is a multislot of level-zero slots. When connecting a level-1 slot to another level-1 slot, only the partners (the individual slots inside those level-1 slots) are connected directly.
  - Connecting a single level-zero slot to a (multi) level-one slot is allowed; the data is duplicated for all contained level-zero slots. Going from a level-one slot to a level-zero slot is forbidden.
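These connection rules can be shown with a toy model (not lazyflow's real classes): level-0 to level-1 broadcasts, level-1 to level-1 connects the contained subslots pairwise.

```python
# Toy model of slot levels (illustrative only).
class Slot:                      # level-0: holds a plain value
    def __init__(self):
        self.value = None
    def connect(self, other):
        self.value = other.value

class MultiSlot:                 # level-1: a multislot of level-0 slots
    def __init__(self, n):
        self.subslots = [Slot() for _ in range(n)]
    def connect(self, other):
        if isinstance(other, MultiSlot):
            # level-1 to level-1: only the partners are connected pairwise
            for mine, theirs in zip(self.subslots, other.subslots):
                mine.connect(theirs)
        else:
            # level-0 to level-1: the value is duplicated to all subslots
            for sub in self.subslots:
                sub.connect(other)
```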
- Resizing a multislot means adding more lanes, which needs to be propagated, sometimes even backwards. This is currently done by recursive calls. Very naïve, and really slow, because calling `setupOutputs` on everything for every connected slot's setup change is inefficient. To prevent this snowball effect from propagating downstream all the time, there is a feature in `graph.py`:
  - methods of the `Graph` with the `@is_setup_fn` decorator mean "graph topology changes". Internally, a depth counter of changes is increased, which goes back to zero when all changes are done. A callback is executed once everything is done, because e.g. the GUI only needs to be updated once. Register for this via `call_when_setup_finished`.
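The depth-counter idea can be sketched as follows (names match the notes, but the implementation is illustrative, not `graph.py`'s real code): nested topology changes bump a counter, and registered callbacks fire exactly once when it drops back to zero.

```python
import functools

class Graph:
    def __init__(self):
        self._setup_depth = 0
        self._callbacks = []

    def call_when_setup_finished(self, fn):
        self._callbacks.append(fn)

def is_setup_fn(method):
    """Mark a Graph method as a topology change (sketch)."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        self._setup_depth += 1
        try:
            return method(self, *args, **kwargs)
        finally:
            self._setup_depth -= 1
            if self._setup_depth == 0:
                # All nested changes done: notify listeners exactly once.
                for fn in self._callbacks:
                    fn()
                self._callbacks.clear()
    return wrapper

class MyGraph(Graph):
    @is_setup_fn
    def resize(self, n):
        if n > 0:
            self.resize(n - 1)  # nested change: callback still fires once
```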
- Slots can have a single input (called `partner`), and can be connected to multiple outputs (called `partners`). (These should be renamed to upstream and downstream for clarity.)
- A `Slot` doesn't have a `parent`, but it belongs to an `Operator` and has a reference to it.
  - multi-level slots act as "operators" for the slots contained within them. To still get the surrounding operator, use `getRealOperator()`. TODO: `getRealOperator` should have a more informative name like `getParent`, and the enclosing multi-level slot needs a somewhat different name.
- Slots have many signals that one can be notified about
- `OutputSlot` implements the `execute` method so it can behave like an `Operator`, which is needed in slot-to-slot connection cases without an operator in between.
- `ValueRequest`: when using a request object (request.py) it is assumed that there is some computation involved in getting the values. But for simple values, this whole machinery with greenlets is overkill. So for value (or non-compute) slots, `ValueRequest` mimics a `Request` but doesn't trigger the machinery.
  - there should be multiple types of slots: 1) slots that are like a parameter, and 2) slots that support slicing. Of both kinds, slots can hold a simple value or be computed.
- In `setupOutputs`: never call another operator's `execute` function: it changes the `graph` while it is being set up.
ObejectclassificationWorkflow
: implemented with the ideas of usingstype
andrtype
, and made it a lot more complicated - a slot has a
MetaDict
for additional configuration, whose key/value entries can be accessed by dot notation (.attr
instead of['attr']
). Is this necessary?
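The dot-notation idea amounts to a small dict subclass; a sketch of the pattern (illustrative, not lazyflow's actual `MetaDict`, which has more behavior):

```python
# A dict whose keys are also reachable as attributes; missing keys
# read as None (assumed default for this sketch).
class MetaDict(dict):
    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        return self.get(name)

    def __setattr__(self, name, value):
        # Attribute assignment writes straight into the dict.
        self[name] = value
```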
- `Operator.setInSlot`: to shove in data from the project file at loading time. Goes through the connections in the `graph`; will call `setInSlot` for all connected slots.
- `RoiRequestBatch`: `totalVolume` is in number of pixels.
- `RequestOperationWrapper`: should only appear in tracebacks.
- `setDirty`: no change if the value is the same as before.
- `Graph`: the GUI is the most expensive listener for signals from slots. If many changes occur, the GUI should only react to the last signal of a particular kind from a particular origin; for this it uses `call_when_setup_finished`.
- `RequestExecutionWrapper`: allows requesting values from slots from multiple threads, and should be created from the main thread. Warning: it can lead to deadlocks when it performs actions during graph setup that request the execution of other operators, so at that time it may only retrieve values from parameter slots.
- `back_propagate_values`:
  - either you have a connected slot, or you call `setValue`
  - it enables setting values via `setValue`