k-dominik edited this page Sep 8, 2017 · 23 revisions

related project: https://github.com/orgs/ilastik/projects/3

Ilastik

Applet

  • DataSelectionApplet and BatchProcessingApplet are initialized with syncWithImageIndex=False which prevents automatic lane creation: DataSelectionApplet signals the workflow/shell to add/remove lanes. Other applets' slots are then resized accordingly
  • BatchProcessingApplet: has no top-level operator. Encapsulates the logic to drive the other applets
  • Applets can be hidden from the applet bar by setting interactive=False
  • AppletGuiInterface: defines interface for multi-lane guis
    • Use for normal multi-lane guis where image lanes are truly separate.
  • Applet.getMultiLaneGui: provides all controls (applet gui, centralWidget, layerControls, and menu if the applet has one) for each configured lane
  • Mixin inheritance pattern has proven useful to combine guis that might also be used separately. E.g. EdgeTrainingWithMulticutGui
  • Synchronization between applets is realized using slots in top-level operators, but an applet can also report progress and fire signals to the shell
  • TopLevelOperator: needs to implement three functions from OpMultiLaneWrapper: addLane, removeLane, getLane
  • OpMultiLaneWrapper: adds the concept of lanes around single-lane top level operators of applets.
  • Serialization only happens if a slot is dirty. To serialize data that is not part of a slot, create a fake output slot.
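The dirty-flag serialization rule above can be sketched with a toy stand-in. FakeSlot and Serializer are hypothetical names for illustration only, not lazyflow API:

```python
# Toy illustration of "serialize only if dirty" and the fake-output-slot trick.
# FakeSlot stands in for a lazyflow OutputSlot; Serializer for an applet serializer.

class FakeSlot:
    """Carries a value and a dirty flag, like a lazyflow slot."""
    def __init__(self, value=None):
        self.value = value
        self.dirty = False

    def setValue(self, value):
        # Mirror lazyflow behaviour: an identical value does not mark the slot dirty.
        if value != self.value:
            self.value = value
            self.dirty = True

class Serializer:
    """Writes the slot's value to a store only when the slot is dirty."""
    def __init__(self, slot):
        self.slot = slot
        self.store = {}

    def serialize(self):
        if not self.slot.dirty:
            return False          # nothing changed -> skip serialization
        self.store["data"] = self.slot.value
        self.slot.dirty = False   # clean again after writing
        return True

slot = FakeSlot()
ser = Serializer(slot)
assert ser.serialize() is False   # clean slot -> skipped
slot.setValue([1, 2, 3])
assert ser.serialize() is True    # dirty slot -> written
assert ser.serialize() is False   # clean again -> skipped
```

Data that is not naturally a slot output can be routed through such a fake slot purely so that this dirty-tracking machinery applies to it.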

Shell

  • GUI shell
    • represents main window and menu bars
    • provides utilities like SVG operator diagram export and numpyAllocationTracking (in debug mode, and only if the corresponding package is installed, it displays the biggest numpy allocations)
    • listens to the progress signals of applets and shows a progress bar until it is sent a progress of 100%
    • the viewer is a stacked view containing the viewers (centralWidgets) of all applets in the workflow; each viewer is constructed lazily when first opened, and the current one is always brought to the top
    • handles the ImageNameList that represents multiple lanes when multiple datasets are opened in the GUI, and also provides the dropdown menu to select the current image
    • holds the ProjectManager to interact with .ilp-files
  • Headless shell
    • implements only a barebones shell with ProjectManager

Workflow

  • Workflow is just a big Operator. Each Applet inside has its own top-level operator
  • Has two names: one that is displayed to the user (displayName), and one used to identify the workflow type from project files.
  • defaultAppletIndex: index of the applet in Workflow.applets that is shown when a new project is opened
  • First applet is generally the DataSelectionApplet
    • need to provide ROLE_NAMES describing which data can be used as input: mandatory names need to come first
    • FIXME: from the GUI it is not obvious which roles are optional
    • imageNameListSlot: must be overridden in the workflow: should return a list of image names
    • required axis ordering can be enforced here
  • workflow_cmdline_args: whatever command-line args weren't consumed by the main ArgumentParser
  • connectLane: most important function, executed once a lane has been added to every applet
    • extra operators can be inserted between the applets here, but should have the workflow set as their parent
    • logic for managing events between applets, or how applets affect one another should be added here
    • DataExportApplet.topLevelOperator.getLane(laneIndex): the images to export have to be connected one-by-one
    • Attention: if slots from multiple lanes are connected to a shared operator like a classifier, it will become dirty even if no new training data is available on the new lane yet. This can be circumvented by storing the previous state in prepareForNewLane() and restoring it in handleNewLanesAdded().
  • Pixel classification works on multiple lanes simultaneously -> cannot use OpMultiLaneWrapper, as it shares lanes in the classifier. It was written by hand with level-1 slots
  • handleAppletStateUpdateRequested: signal fired by an applet; it is up to the applet writer to fire it whenever it makes sense (the workflow does not poll for applet changes). So in the workflow, the status of all applets should be checked here, enabling/disabling them accordingly
  • handleAppletChanged: fired when the user switches applets by clicking; most workflows don't do anything here
  • getLane: returns an object (view) of one specific internal operator in a MultiLaneOperator with the applet inputs connected
  • onProjectLoaded() is where the workflow can be set up, including additional configuration based on cmdline args. The args were already provided to the constructor, but nothing can be configured there because the applets are not set up yet.
  • In the case of exporting N lanes, DataExportApplet will call its methods prepare_for_entire_export(), N x prepare_for_lane_export(), N x post_process_lane_export(), and post_process_entire_export(). The workflow can implement them by monkey patching them into the DataExportApplet.
  • The multicut workflow could be seen as a reference, as it is closest to how workflows were meant to be designed.
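The export hook order described above can be sketched as follows; the Exporter class and export_all driver are hypothetical stand-ins for DataExportApplet and its caller, kept only to show the calling sequence:

```python
# Sketch of the DataExportApplet hook order for N lanes:
# prepare_for_entire_export, then per lane prepare/export/post,
# then post_process_entire_export. The hook names match the text above.

calls = []

class Exporter:
    def prepare_for_entire_export(self):
        calls.append("entire:prepare")

    def prepare_for_lane_export(self, lane):
        calls.append(f"lane{lane}:prepare")

    def export(self, lane):
        # stand-in for the actual per-lane export work
        calls.append(f"lane{lane}:export")

    def post_process_lane_export(self, lane):
        calls.append(f"lane{lane}:post")

    def post_process_entire_export(self):
        calls.append("entire:post")

def export_all(exporter, n_lanes):
    exporter.prepare_for_entire_export()
    for lane in range(n_lanes):
        exporter.prepare_for_lane_export(lane)
        exporter.export(lane)
        exporter.post_process_lane_export(lane)
    exporter.post_process_entire_export()

export_all(Exporter(), 2)
```

A workflow that monkey-patches these hooks onto the applet only changes what each hook does, not this calling order.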

Unsorted remarks

  • command-line arguments are consumed in a hierarchical manner: first ilastik_main.py, then the respective workflow, then the applets. Each level passes its unused arguments on to the next.
  • if freezeCache is True, the cached slot will always return something (zeros for blocks that haven't been computed yet). The point of freezing a cache is to avoid triggering computation
  • prepareForNewLane in pixelClassification includes the following hack: as a result of dirty notifications the classifier might get destroyed when a new lane is added; so the classifier state is stored before the graph is modified, and inserted after setup is finished
  • in batch processing, for each file: add a lane, do the processing and export, remove the lane. In principle it should also work with a single reused lane, just changing the file name, but in practice that somehow didn't work
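The per-file batch loop can be sketched like this; StubWorkflow and its method names are hypothetical stand-ins for the real workflow/shell API:

```python
# Toy sketch of the batch-processing loop: one lane per file, added and
# removed again so applet state never accumulates between files.

class StubWorkflow:
    """Hypothetical stand-in for the workflow object driven by batch processing."""
    def __init__(self):
        self.lanes = []

    def add_lane(self, path):
        # in ilastik this triggers connectLane on every applet
        self.lanes.append(path)
        return len(self.lanes) - 1

    def process_and_export(self, lane_index):
        # stand-in for running the pipeline and exporting the result
        return f"exported:{self.lanes[lane_index]}"

    def remove_lane(self, lane_index):
        # tear the lane down again before the next file
        self.lanes.pop(lane_index)

def batch_process(workflow, files):
    results = []
    for path in files:
        lane = workflow.add_lane(path)
        results.append(workflow.process_and_export(lane))
        workflow.remove_lane(lane)
    return results

results = batch_process(StubWorkflow(), ["a.h5", "b.h5"])
```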

Lazyflow

(There is an outdated wish-list by Stuart for improvements.)

Caches

  • If an output of an operator should be cached, create a wrapper operator that internally feeds the original output to the cache, and serves the cached output to the outside.
    • Some operators have an output and cache, where the output is connected to the cache-input internally. This is bad design and should not be done.
  • OpBlockedArrayCache: caches results of a configured blockShape, so all upstream operators only see requests of this shape, and requests to the operator are broken down into the required blocks
    • Has a bypass mode for headless mode. In bypass mode it makes sure requests still respect the blockShape even though they are not cached.
    • e.g. a shape of (1, None, None, None, 3) means: a single time slice, the full extent of all spatial dimensions, and 3 channels
  • cacheMemoryManager: background thread, keeps a list of all caches and manages them. uses psutil via Memory class in lazyflow
  • OpSimpleBlockedArrayCache: makes sure OpUnblockedArrayCache never sees a request that isn't block-aligned
    • manages two kinds of locks: one big lock for the whole cache, plus a small lock for each block
    • fixAtCurrent: while True, dirty notifications are not forwarded; once it is set back to False, they are forwarded downstream. Related to freezeCache
    • CleanBlocks: for saving currently valid slices to the project and restoring again
  • OpUnblockedArrayCache: not to be used in workflows directly. Not aware of any blocking scheme; stores each request exactly as its ROI was provided, saving start, stop, and data as LZ4-compressed vigra.ChunkedArrays
  • OpCompressedCache: deprecated (emits a deprecation warning) and should not be used for new operators. The only operator that still needs it is OpCompressedUserLabelArray, which stores brushstrokes (requires the setDirty functionality).
  • OpSlicedBlockedArrayCache is a combination of 3 caches for xy, xz and yz planes to serve ortho-view viewer requests as fast as possible.
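The block-alignment idea behind the blocked caches can be sketched as a pure function: expand a requested ROI outward to block boundaries and enumerate every block it touches. This is a sketch of the idea, not lazyflow's actual implementation:

```python
# Break a requested ROI [start, stop) into the block-aligned sub-ROIs that
# cover it, given a fixed block_shape. Upstream operators then only ever see
# requests of whole blocks.
import itertools

def covered_blocks(start, stop, block_shape):
    """Yield (block_start, block_stop) for every block touching [start, stop)."""
    first = [s // b for s, b in zip(start, block_shape)]
    last = [(e - 1) // b for e, b in zip(stop, block_shape)]
    for idx in itertools.product(*[range(f, l + 1) for f, l in zip(first, last)]):
        bstart = tuple(i * b for i, b in zip(idx, block_shape))
        bstop = tuple(s + b for s, b in zip(bstart, block_shape))
        yield bstart, bstop

# A request for [(5, 5), (20, 12)) with 10x10 blocks touches 4 blocks:
blocks = list(covered_blocks((5, 5), (20, 12), (10, 10)))
```

The cache then serves the requested ROI by cutting it out of the union of these block results.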

Unsorted remarks

  • jsonConfig: a module developed by Stuart that can specify types for fields. It should not be used anymore; use jsonSchema instead

  • build the docs: push-to-origin-gh-pages.sh

  • dirty-notifications are there predominantly for the viewer; to know what to re-request when something has been changed

  • request cancellation is implemented by raising an exception, but that doesn't work with foreign threads, so use lazyflow's threadPool.

  • lazyflowClassifiers: two kinds: vector-wise and pixel-wise

    • vector-wise: more or less no spatial awareness; operates on a flat list of feature vectors
  • Request: the threadPool can be used independently; it guarantees that a function resumes on the same thread it was paused on

  • RESTFulVolume: huge volume in multiple h5 files, interfaced as a single volume in ilastik

  • multiprocessHdf.py: compressed h5py reading is slow, so this utility uses the multiprocessing module to read compressed data. Could probably be removed.

  • roi.py: TinyVector, not related to vigra. Motivation: numpy arrays have a large per-instance overhead. Could be simplified, as it implements a subset of numpy's API. Do a timing analysis (construction vs. computation) and then maybe get rid of it

    • did some analysis of construction times of 3-element TinyVector vs 3-element numpy.array. TinyVector seems to be about 2.5-fold faster:
    # in ipython:
    >>> import random
    >>> from lazyflow.roi import TinyVector
    >>> import numpy
    >>> %timeit [numpy.array((random.random(), random.random(), random.random())) for i in range(int(10e5))]
    1.49 s ± 10.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    >>> %timeit [ TinyVector((random.random(), random.random(), random.random())) for i in range(int(10e5))]
    608 ms ± 3.11 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    • remaining TODO: comparison of computation times, and a survey of actual usage in the code
  • BigRequestStreamer: splits up a big request into small ones, either user specifies sub-request shape, or it is automatically estimated

    • estimate is based on RAM per pixel, number of threads, ..., such that all processors are busy and all requests fit in RAM. TODO: benchmark optimal request sizes
    • uses RoiRequestBatch to process multiple ROIs at once, can report progress
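The sub-request size estimate can be illustrated with back-of-the-envelope arithmetic: pick a block edge length so that one in-flight request per thread still fits in the RAM budget. The ram_per_pixel figure and the cubic-block assumption here are illustrative, not the actual heuristic:

```python
# Estimate a sub-request edge length so that num_threads concurrent cubic
# requests fit within the RAM budget (all processors busy, nothing swapped).

def estimate_block_edge(ram_budget_bytes, num_threads, ram_per_pixel, ndim=3):
    # RAM available to a single in-flight request:
    per_request = ram_budget_bytes / num_threads
    # Maximum pixels per request, assuming ram_per_pixel bytes of scratch each:
    max_pixels = per_request / ram_per_pixel
    # Assume cubic blocks: edge**ndim pixels per request.
    return int(max_pixels ** (1.0 / ndim))

# e.g. an 8 GiB budget, 8 threads, 40 bytes of scratch per pixel:
edge = estimate_block_edge(8 * 1024**3, 8, 40)
```

Benchmarking real request sizes (the TODO above) would replace these guessed constants with measured ones.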

slot.py

  • Slots have multiple levels.
    • Level-zero means just directly connecting "values"
    • A level-one slot is a multislot of level-zero slots. When connecting a level-1 slot to another level-1 slot, only the partners (the individual level-0 slots inside them) are connected directly, pairwise.
    • Connecting a single level-zero slot to a (multi) level-one slot is allowed: the data is duplicated into all contained level-zero slots. Connecting a level-one slot to a level-zero slot is forbidden.
    • resizing a multislot means adding more lanes, which must be propagated, sometimes even backwards. This is currently done with recursive calls. It is very naïve and slow, because calling setupOutputs on everything for every connected slot's setup change is inefficient. To prevent this snowball effect from propagating downstream all the time, there is a feature in graph.py:
      • methods of the Graph decorated with @is_setup_fn mark "graph topology changes". Internally, a depth counter of nested changes is incremented and drops back to zero when all changes are done; at that point a callback is executed once (because e.g. the GUI only needs to be updated once). Register for this via call_when_setup_finished.
  • A slot can have a single input (called its partner) and can be connected to multiple outputs (also called partners). These should be renamed to upstream and downstream for clarity.
  • A Slot doesn't have a parent, but it belongs to an Operator and has a reference to it
    • multi-level slots act as the "operator" of the slots contained within them. To get the actual enclosing Operator, use getRealOperator(). TODO: _getRealOperator should get a more informative name (something like getParent), and the enclosing multi-level slot needs a distinct name
  • Slots have many signals that one can be notified about
  • OutputSlot implements the execute method to be able to behave like an Operator, which is needed in slot to slot connection cases without an operator in between
  • ValueRequest: when using a requestObject (request.py) it is assumed that there is some computation involved in getting the values. But for simple values, this whole machinery with greenlets is overkill. So for value (or non-compute) slots the ValueRequest mimics a Request, but doesn't trigger the machinery
    • there should be multiple types of slots: 1) slots that are like a parameter, and 2) slots that support slicing. Of both kinds, slots can be a simple value or computed.
  • in setupOutputs: never call another operator's execute function: that would change the graph while it is being set up
  • ObjectClassificationWorkflow: implemented with the ideas of using stype and rtype, which made it a lot more complicated
  • a slot has a MetaDict for additional configuration, whose key/value entries can be accessed by dot notation (.attr instead of ['attr']). Is this necessary?
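The slot-level rules above (level-0 to level-1 broadcasts; level-1 to level-1 connects pairwise) can be modeled with a toy sketch. Slot and MultiSlot here are simplified stand-ins, not lazyflow's actual classes:

```python
# Toy model of slot levels: a level-0 slot holds one value; a level-1
# multislot holds a list of level-0 subslots.

class Slot:
    """Level-0 slot: carries a single value."""
    def __init__(self, value=None):
        self.value = value

class MultiSlot:
    """Level-1 slot: a multislot of level-0 subslots."""
    def __init__(self, size):
        self.subslots = [Slot() for _ in range(size)]

    def connect(self, upstream):
        if isinstance(upstream, MultiSlot):
            # level-1 to level-1: connect the contained level-0 slots pairwise
            for mine, theirs in zip(self.subslots, upstream.subslots):
                mine.value = theirs.value
        else:
            # level-0 to level-1: the single value is duplicated into every subslot
            for sub in self.subslots:
                sub.value = upstream.value

param = Slot(42)          # e.g. a workflow-wide parameter
lanes = MultiSlot(3)      # one subslot per image lane
lanes.connect(param)      # every lane now sees the same value
```

The forbidden direction (level-1 into level-0) has no sensible meaning here: there is no rule for collapsing many lane values into one.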

Unsorted (unsure)

  • Operator.setInSlot: to shove in data from the project file at loading. Goes through the connections in the graph. Will call setInSlot for all connected slots
  • RoiRequestBatch: totalVolume: in number of pixels
  • RequestOperationWrapper: Should only appear in tracebacks
  • setDirty: no change if the value is the same as before
  • Graph: gui is most expensive listener for signals from slots. If many changes appear, the gui should only listen to the last signal of a particular kind from a particular origin, thus uses call_when_setup_finished
  • RequestExecutionWrapper allows to request values from slots from multiple threads, and should be created from the main thread. Warning: can lead to deadlocks when it performs actions during graph setup that request the execution of other operators, so may only retrieve values from parameter slots at that time.

slot.py

  • back_propagate_values:
    • normally a slot either has an upstream connection or gets its data via setValue
    • this option enables calling setValue even on a connected slot, propagating the value backwards