New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Script changes #1449

Open
wants to merge 51 commits into
base: dev
from

Conversation

Projects
None yet
3 participants
@Lestropie
Copy link
Member

Lestropie commented Sep 1, 2018

Mostly addressing #1400, but a few other things in there too; it was kind of a long steady accumulation of various things, I've tried to chop it up into a few commits as best I could.

They're significant library changes, and I don't like making changes that break compatibility with any stand-alone scripts out there, but I think there's strong justification for making the changes. After all, how many times have we modified the C++ library code in getting to the point we're at now?

I'll try to enumerate everything of interest here for comments. Not going to be merging this PR quickly, I should probably be implementing more tests before accepting such large changes; but I wanted to open the proposed changes up for discussion.
(Also I need to get this stuff out of my head so I can focus on more pertinent demands...)

  1. Completely changed how executable script files invoke the core application library.
    Previously there was a set of compulsory library functions that needed to be executed in the correct order, e.g. app.initialise(), app.parse(), app.complete(). Now, one defines two functions: usage() and execute() (couldn't have usage() and run() like in C++, since the latter would clash with the run module). After definition of these functions, app.execute() is invoked. app.execute() then sets everything up, using the inspect module to find those two defined functions and run them when appropriate.
    The biggest advantage of this is that it enables the use of exceptions. Instead of calling app.error(), library functions will throw a custom MRtrix3 exception, which can either be caught by the script programmer, or result in similar error message & cleanup behaviour as app.error(). This substantially improves the capability for scripts to catch when things don't quite work as intended and adjust appropriately.
    run.command() gets its own exception, which includes all information about the command and the stdout / stderr contents. This means that the error messaging and cleanup can occur entirely within app.execute() when the exception is called; as opposed to the current solution, which is partially managed by run.command() writing a text file that app.complete() then reads, which is kinda clunky.

  2. Moved some basic capabilities, e.g. listing available MRtrix3 binaries / importing the config file(s), to __init.py__. Anyone using the library functions but not invoking the app module in its entirety should still be able to access this information.

  3. Changes to run.command() to support multi-threaded usage.
    Note that this required the use of tempfile.mkstemp() rather than tempfile.TemporaryFile() due to an apparent race condition in the latter on Python3 on Windows (not present on Windows Python2; not yet tested on Linux). This is despite the Python documentation suggesting that TemporaryFile() uses mkstemp()... The side-effect of this is that the stdout / stderr contents of executed commands are written to files in /tmp/ and deleted after they are read; there could potentially be issues with users not having write access to that location and/or accumulation of stale files due to cleanup issues.

  4. Porting of foreach to Python. In addition to appearing in the online documentation, this also means that the terminal will present a simple progress bar showing the execution of however many instances of the command are run, and at completion only invocations that failed, along with their stdout / stderr contents, will be printed. Note that the command-line usage is fractionally different to the current bash script.

  5. New script responsemean, which has substantial changes from the basis average_response, but I think the name is more consistent with the rest of MRtrix3. In addition to making use of the libraries (and hence appearing in the online documentation), having better error messages, and being a little more Pythonic in places, the (default) averaging process is different, as proposed in #1096.

  6. Merge file and path modules into fsys. Really didn't like having to use pylint directives to hide the fact that a Python built-in was being over-ridden.

  7. Much nicer terminal feedback when running population_template, as you don't get humongous run.command() strings when running e.g. mrmath across all subjects (implemented the solution proposed in this comment in #928).

Partially addresses #692.
Hoping that #927 can be closed, but point 3 may raise other issues regarding the solution.
Closes #928.
Closes #1096.
Closes #1275.
Closes #1338 (script author can simply catch the exception along with stdout / stderr contents).
Closes #1377 (command returncode is returned in exception raised by run.command()).
Closes #1400.

Lestropie added some commits Aug 17, 2018

Major changes to Python scripting library
Previously, executable scripts would call a set of functions within the mrtrix3.app module in the appropriate order in order to initialise the consistent command-line interface, parsing, etc.. A copy-and-paste block of code at the top of each executable file would ensure that this module was present within the Python path.
This change makes the executable interface more similar to that of the C++ files within the cmd/ folder of MRtrix3. Two functions must be defined within the executable file: 'usage()' and 'execute()'. Then, at the bottom of the executable file, the copy-and-paste code block for locating and importing the mrtrix3.app module appears, but also calls the app.execute() function. This function defined within the app module sets up the interface, and uses the Python inspect module to locate and run the two functions defined within the executable file as appropriate.
A major advantage of this approach is that it allows Python exceptions to be thrown by library functions (instead of calling the deprecated app.error() function), which may optionally be caught by the calling code in various circumstances. Standard Python exceptions can also be caught by the app module, and displayed in a manner consistent with the script interface. Additionally, when commands executed via the mrtrix3.run module fail, a dedicated exception class is thrown, which allows both the catching of failed commands in the executing code where permissible, and moves all of the code responsible for handling such command failures to the app module (previously the run module would write a text file, which the app module would then load).
A couple of generic capabilities (e.g. reading the MRtrix3 config file(s)) have been moved to __init__.py, such that they are more easily accessible when using the library but not exploiting the app module interface.
Various changes to the mrtrix3.run module have been made to support multi-threaded use of the run.command() function. This includes use of tempfile.mkstemp() rather than tempfile.TemporaryFile(), as the latter was found to not be thread-safe in Python3 on Windows. The run.command() function additionally now supports list input, where individual entries may themselves be lists (allowing e.g. concatenation of multiple file paths into a command call, but the terminal display will show only the common prefix/suffix of the list rather than the entire contents). run.command() now additionally supports '||' and '&&' operators.
Minor fixes to 7617787
- dwibiascorrect: Fix cmdline function call.
- population_template: mrtrix3.app module needs to be loaded earlier for root-level functions.
- generate_user_docs.sh: Fix grep call based on new script syntax.
- 5ttgen gif: Add missing import call.
New script: responsemean
This script replaces average_response, performing the same function but with an executable name more consistent with the rest of MRtrix3.
It utilises the MRtrix3 Python scripting library, and therefore appears in the auto-generated documentation.
Additionally, while average_response simply averaged response function coefficients directly (meaning that subjects with greater image intensity would have a greater effect on the group average response function shape), responsemean (by default) takes into account any global differences in scaling between subjects when generating the mean. The former behaviour can however be accessed using the -legacy command-line option.
foreach: Port to Python
This commit contains a re-implementation of the 'foreach' script using the MRtrix3 Python scripting library, as opposed to the original implementation which is simply a Bash script. This means that the script will be included in the auto-generated documentation. It also means that the terminal output from multiple running commands can be better controlled.
This does however involve a change to the user interface of this command: Rather than placing a colon character as a 'no-op' to separate the 'foreach' invocation from the command to be executed multiple times, this new version instead expects the final positional command-line argument input to be a string that contains the command to be executed (including any substitution flags). The documentation has been updated to reflect this.
mrtrix3.file & mrtrix3.path -> mrtrix3.fsys
Python modules 'file' and 'path' have been merged into a single module 'fsys' (filesystem).
This in particular prevents overriding of the Python built-in 'file', which was very much not ideal.
@thijsdhollander

This comment has been minimized.

Copy link
Member

thijsdhollander commented Sep 3, 2018

I suppose this will still sit here for a while, so I'll comment in more detail at a later point in time, when I, well, find the time. 🙂

That said, now that I thought of it when commenting on #1445, something that might be considered at this stage: getting rid of the notfound script as a separate script, but incorporate its functionality in foreach in some way (which could then immediately also lead to a cleaner UI to use the notfound functionality within a foreach).
Now that I'm talking about foreach and its UI here anyway; I haven't looked into the details (yet), but about that UI: was the change from the colon to the string input (technically) necessary for the command to use the scripting library, or is there another motivation for the UI change for this script in particular?

@Lestropie

This comment has been minimized.

Copy link
Member

Lestropie commented Sep 4, 2018

was the change from the colon to the string input (technically) necessary for the command to use the scripting library, or is there another motivation for the UI change for this script in particular?

Partly for argparse support, partly because it's a slightly more direct interface to what's actually going on. Thinking about it again, it might be possible to revert to the colon syntax without writing complete command-line parsing code from scratch, I'm not quite sure... it'd be a bit of a hack.

But even if that were the case I'm not sure I'd actually want to. While this solution may or may not impede tab-completion of filesystem paths for particular shells, it does mean that special symbols within the desired command string don't need to be slash-escaped.

I also find this usage more clear given the fact that all other MRtrix3 commands permit arbitrary placement of command-line options, whereas the colon separation approach forbids this. For example:

Current PR code

foreach directory/* "5ttgen fsl IN PRE_5TT.mif -nthreads 0" # Forces single-threading of individual jobs
foreach directory/* "5ttgen fsl IN PRE_5TT.mif -nthreads 2" # Forces multi-threading of individual jobs
foreach directory/* -nthreads 0 "5ttgen fsl IN PRE_5TT.mif" # Forces execution of one job at a time
foreach directory/* "5ttgen fsl IN PRE_5TT.mif" -nthreads 0  # Forces execution of one job at a time
foreach directory/* -nthreads 8 "5ttgen fsl IN PRE_5TT.mif" # Forces execution of eight jobs at a time
foreach directory/* "5ttgen fsl IN PRE_5TT.mif" -nthreads 8  # Forces execution of eight jobs at a time
foreach directory/* -nthreads 8 "5ttgen fsl IN PRE_5TT.mif -nthreads 0" # Forces execution of eight single-threaded jobs at a time
foreach directory/* "5ttgen fsl -nthreads 0 IN PRE_5TT.mif" -nthreads 8  # Forces execution of eight single-threaded jobs at a time

Colon separation approach

(Hypothetical alterations to foreach to use colon separation rather than command string quotation, but not identical interface to current bash version of foreach, which uses a bit of a command-line hack to activate parallel processing):

foreach directory/* -nthreads 0 : 5ttgen fsl IN PRE_5TT.mif # Forces execution of one job at a time
foreach directory/* -nthreads 8 : 5ttgen fsl IN PRE_5TT.mif # Forces execution of eight jobs at a time
foreach directory/* : 5ttgen fsl IN PRE_5TT.mif -nthreads 0 # Forces execution of a single-threaded job at a time
foreach directory/* : 5ttgen fsl IN PRE_5TT.mif -nthreads 8 # Forces execution of one multi-threaded job at a time
foreach directory/* -nthreads 8 : 5ttgen fsl IN PRE_5TT.mif -nthreads 0 # Forces multi-threaded execution of single-threaded jobs

Personally I find that in the PR code, the encapsulation of one of the -nthreads usages within a quoted string (such that it clearly applies to the invoked function, rather than the foreach invocation) is less of a clash with MRtrix3's ability to arbitrarily place command-line options than is the colon separation approach. Not having to escape e.g. pipe symbols in the quoted string approach is also nice. But there may be variations in personal preferences / opinions here.

@jdtournier

This comment has been minimized.

Copy link
Member

jdtournier commented Sep 18, 2018

Sorry, a bit late to the party, but my 2 cents for what it's worth:

  • more than happy for foreach to absorb the functionality in notfound - I think it makes sense.

  • I'd personally rather stick with the colon syntax, but happy enough with a consensus. I think the colon makes it very clear which options apply to the foreach command, and which to the command it will invoke. Note this is not a distinction that would be required in any other regular MRtrix3 comand, so I don't think it would actually make sense to allow arbitrary positioning of arguments or options. I also reckon handling of special characters in filenames (e.g. a space) would get messy if the command itself is quoted - you'd need to escape the quotes within the quotes, or some other trick, in a way that I don't think is particularly obvious. I do agree that it would be nice not to have to escape or quote |, && or ||, though. But on balance, I prefer the current syntax...

  • 👍 for responsemean

@Lestropie

This comment has been minimized.

Copy link
Member

Lestropie commented Sep 19, 2018

I also reckon handling of special characters in filenames (e.g. a space) would get messy if the command itself is quoted - you'd need to escape the quotes within the quotes, or some other trick, in a way that I don't think is particularly obvious.

I'd thought of escaping whitespace characters in filesystem paths, but the potential use of quotes to encapsulate paths (e.g. coming from "command_history"), thus necessitating escaping of those quotation marks within the command string, would indeed be pretty messy, and so tips the balance toward the colon syntax.

I'll still have to wait and see whether it can be done easily without re-implementing subprocess though. Colon is technically just a no-op in bash, so this usage is very non-standard. I think I know how to do it though.

foreach: Port to former colon-separation syntax
As per discussions in #1449.
Other changes:- "UNI" substitution now corresponds to the unique part of the input, after removing common prefix and suffix. Previously it was the unique part of the -basename-, after removing the common -suffix- only; this to me was not a general solution, and the behaviour was not necessarily consistent with the name of the substitution.
- Raise an exception if the command string following the colon separator does not contain any instances of any of the substitutions, and therefore the exact same command would be executed for every input.
- Update foreach documentation page accordingly, including demonstration of -nthreads use both before and after the colon separator.
- run.killAll(): Fix attempting to iterate over a NoneType.
@Lestropie

This comment has been minimized.

Copy link
Member

Lestropie commented Sep 19, 2018

... Was too curious, so had to try the colon syntax. The easy solution worked. 👍 Python is awesome sometimes.

(see 5c859f5 for details)

@Lestropie

This comment has been minimized.

Copy link
Member

Lestropie commented Sep 19, 2018

Another criticism I'll throw in the ring specifically for foreach is the "NAME" substitution. This extracts specifically the basename of a filesystem path. Given that foreach technically isn't only applicable to using filesystem paths as inputs, "NAME" to me feels too non-specific: "BASE" would be better.

@thijsdhollander

This comment has been minimized.

Copy link
Member

thijsdhollander commented Sep 19, 2018

Just to chip in as the discussion is happening: after giving it some thought, and also reading @jdtournier's arguments, I'm similarly in favour of the existing colon syntax. I get what you tried to do before regarding consistency with other commands, but I don't think consistency (to other mrtrix commands) really matters here: foreach is used very differently and for different purposes anyway. Also, the colon is visually very clear, and I like how it helps to read / navigate / ... someone's long pipeline script. And finally also, I hadn't even thought yet about the issues arising with having to escape quotes in quotes. That's indeed one to really avoid, I think. So well, all that to say that I'm in support of the "original" colon syntax.

Another criticism I'll throw in the ring specifically for foreach is the "NAME" substitution. This extracts specifically the basename of a filesystem path. Given that foreach technically isn't only applicable to using filesystem paths as inputs, "NAME" to me feels too non-specific: "BASE" would be better.

Not sure whether "BASE" is really more clear than "NAME" (even though I do appreciate the argument), but I'm not to fussed about it either; happy to change if some people like it more. I'm more sensitive about breaking peoples' existing scripts, or habits/experience; but that chance should be quite low for this substitution option... Most (if not all?) of the steps in the pipelines in the manuals only use "IN", and a few "PRE". I actually don't recall myself using "NAME" or "UNI" (...well, maybe once or something, but not frequently for sure). So safe to change, I'd say.

When I find some (more) time, I still have to look into responsemean though. I'm ok with the name change, but I've got at least one minor concern with the mechanism. I've brought this up ages ago, I think, but can't find the post: it's indeed about the fact that this mechanism in principle requires to process all (different) tissue responses at once, so no relative rescaling happens. So by introducing this (pull request proposed) mechanism without that (all response functions processed together), you gain some (the different weighting due to input scaling without normalisation; i.e. the purpose you indeed introduced this mechanism for), but you also lose something else (harder to describe what, but I've got a good idea about it). I don't want to get ahead of myself at this time (so will still look into it), but I emphasised this "in principle" phrase above because it may not be much of an issue in practice. However, I believe it indeed "won't be much of an issue" under the conditions that the original problem also wasn't much of an issue (i.e. when the responses for a given tissue don't vary that much in shape (contrast, not scale) among themselves). But well, I'll do some thinking for a few typical scenarios and evaluate whether the risk of my minor concern is real (and significant), or whether it may not be much of an actual concern in practice.

That said, I've actually got a few other ways of computing a combined ("average", but not really) response or set of responses in store; but it's a bit too early to introduce them at this stage. So if my above minor concerns with the proposed change in this pull request turn out to be not really an issue in practice for the properties of most realistic scenarios, I'm happy to go with that approach for the time being though. (if used correctly, I of course do fully support the use of geometric means / arithmetic in log-space 😉 )

@thijsdhollander

This comment has been minimized.

Copy link
Member

thijsdhollander commented Sep 19, 2018

Now that we're talking script library here anyway, and when I noticed this again in the original pull request post:

Much nicer terminal feedback when running population_template, as you don't get humongous run.command() strings when running e.g. mrmath across all subjects (implemented the solution proposed in this comment in #928).

...can someone remind me again what our motivation actually is to, by default, print all the run.command strings? I'm not overly fussed about it, but maybe still: for some scripts, that really unleashed the Matrix onto peoples' terminals... I do get we're trying to be clear to the user what is exactly going on, in a transparent fashion, but it's a bit weird in contrast to (binary) commands, right? It's not like we're by default printing the full -info output for those (that's exactly why -info exists as an option). And developers of scripts can still choose to print (by default) some useful information anyway (and they are probably in the best position to decide what is useful / informative terminal output for a given script). So well, why not, e.g., only print the run.command strings when the -info option is provided to a script? Or even have a separate option altogether to print the run.command strings... i.e., a level between the default terminal output and the -info output (the latter indeed also shows the full -info per binary command within the script).

Maybe I'm overlooking a good reason to show the run.command strings by default though... so just thought I'd bring it up to figure this out (again).

@Lestropie

This comment has been minimized.

Copy link
Member

Lestropie commented Sep 24, 2018

can someone remind me again what our motivation actually is to, by default, print all the run.command strings?

Because people get nervous and think that something has crashed if nothing prints on the terminal.

for some scripts, that really unleashed the Matrix onto peoples' terminals

Use a progressBar. If you have a list of commands of known length, you could make a wrapper class that initialises the progressBar and provides a member function that both invokes run.command() and increments. When providing -info command-line option, individual Command: prompts will still be shown.

Lestropie added some commits Sep 24, 2018

dwi2response: Progress bars
For tax and tournier algorithms, use a progress bar to reduce amount of terminal output.
Also slightly re-format the printing of unhandled Python exceptions.
Scripts: Rename exceptions
Instead of MRtrixBaseException, MRtrixException, MRtrixCmdException and MRtrixFnException, use MRtrixBaseError, MRtrixError, MRtrixCmdError and MRtrixFnError. This is more consistent with the naming of existing Python objects, where only the very base classes are called "Exception" and anything remotely specific is called "Error".
@Lestropie

This comment has been minimized.

Copy link
Member

Lestropie commented Sep 26, 2018

Other point to consider with the foreach substitutions is the danger of clashes: e.g. currently any filesystem path that happens to contain the two sequential characters "IN" will be erroneously substituted. "INPUT" would be less dangerous; "NAME" could be changed to "BASENAME" rather than "BASE", which is more clear than either alternative; "PRE" could be "PREFIX"; "UNI" could be "UNIQUE". But I've no idea how frequently people may or may not have encountered such clashes in order to justify such extensions.

@jdtournier

This comment has been minimized.

Copy link
Member

jdtournier commented Sep 27, 2018

Yes, I've also wondered whether this might cause trouble. I'd be happy to come up with alternatives, but I'd favour an explicit de-referencing syntax of some form, rather than a longer name. Something like :IN: or [IN] (although there is scope for wildcard expansion with the latter), or +IN+, or ...?

I'm not a fan of using longer names even if it might makes the intention more obvious on casual inspection. I think as long as it's relatively well documented, users will appreciate shorter substitutions in routine practice. The current description in the foreach help page is clear enough, in my opinion (although it could be clearer still):

AVAILABLE SUBSTITUTIONS:
  IN:   the full matching pattern, including leading folders
        For example, if the target list contains a file "folder/image.mif",
        any occurrence of "IN" will be substituted with "folder/image.mif".
  NAME: the basename of the matching pattern
        For example, if the target list contains a file "folder/image.mif",
        any occurrence of "NAME" will be substituted with "image.mif".
  PRE:  the prefix of the matching pattern
        For example, if the target list contains a file "folder/image.mif",
        any occurrence of "PRE" will be substituted with "image".
  UNI:  the unique prefix of the matching pattern after removing the common suffix
        For example, if the target list contains files:
        "folder/001dwi.mif", "folder/002dwi.mif", "folder/003dwi.mif"
        any occurrence of "UNI" will be substituted with "001", "002", "003".
@Lestropie

This comment has been minimized.

Copy link
Member

Lestropie commented Oct 1, 2018

I'd favour an explicit de-referencing syntax of some form

That could work. Probably the closest in 'spirit' would be to use e.g. "$IN", but the dollarsigns would need to be escaped, and so would probably lead to a lot of mistakes. Given we're already kind of mis-using the colon operator in the usage of foreach, ":IN:" might be the least objectionable. Still, this PR is a fair way off merge, so further alternatives welcome.

@thijsdhollander

This comment has been minimized.

Copy link
Member

thijsdhollander commented Oct 1, 2018

I'm not a fan of using longer names even if it might makes the intention more obvious on casual inspection.

I don't always agree with this, but in this context, I do.

I'd favour an explicit de-referencing syntax of some form

I'm not even sure this is really needed, to be honest. foreach has now existed already a long time, and I've never heard anyone running into any issues caused by these keywords. Maybe one quick question though: when we detect them, is that detection case sensitive? If so, that may just be more than sufficient to avoid even a rare possible problem. If not, I do suggest it'd be case sensitive (i.e. only detect these keywords if they're full caps).

@Lestropie

This comment has been minimized.

Copy link
Member

Lestropie commented Oct 1, 2018

when we detect them, is that detection case sensitive?

Yes (both currently and in the PR). I reckon most people would conventionally use lower-case when naming their own files, hence why this hasn't really been an issue. Something like our DICOM repository has subject names in all-caps; but then maybe it's less likely for foreach to be used with such data.

I've also tried to explicitly catch and issue a warning when such an erroneous substitution might be happening here.

@thijsdhollander

This comment has been minimized.

Copy link
Member

thijsdhollander commented Oct 1, 2018

I reckon most people would conventionally use lower-case when naming their own files, hence why this hasn't really been an issue.

That's one reason indeed. But also: typically the folders at those levels (e.g. subjects) wouldn't be mentioned in most foreach scenarios; they'd actually be the ones captured by foreach to iterate over e.g. all subjects... so in this case, there's also no issue.

I've also tried to explicitly catch and issue a warning when such an erroneous substitution might be happening here.

That should indeed help, just in case.

foreach: Make use of subprocess shell=true
In run.command(), accept a parameter that specifies that the command string should be passed to subprocess as-is, and interpreted by a shell spawned specifically for that command, as opposed to the default handling. This ensures that any more advanced usages of the foreach script can be executed as the user intended (following string substitution), provided that any shell operators are appropriately escaped.

Lestropie and others added some commits Nov 20, 2018

Scripts: Enforce naming convention
Script run_pylint will now enforce PEP8-based naming conventions in all executable scripts and Python library files.
These are principally:
- CamelCase for classes;
lowercase_with_underscores for variables and functions;
- UPPERCASE_WITH_UNDERSCORES for constants.
Note that these requirements are currently not applied to the 'build' and 'configure' scripts.
Some other minor changes that were made during the conversion process:
- In both foreach script and run module, make use of a global Shared class to encapsulate features that must be accessible outside of individual function calls.
- New function app.trace(), which simply prints out the filename and line number upon execution.
- Minor tweaks to app.ProgressBar display when output is not a TTY and no count is provided.
- When -nthreads option is specified, additionally set environment variables to modulate multi-threading in OpenMP and ITK applications.
@thijsdhollander

This comment has been minimized.

Copy link
Member

thijsdhollander commented Nov 23, 2018

(performed a manual dev merge; so things don't start to diverge or otherwise get messy with the copyright update in mind)

Lestropie added some commits Nov 24, 2018

dwibiascorrect: Use algorithm module
In addition to porting to the algorithm module, the output of the fsl algorithm was changed slightly. Instead of directly using the output field from fast, which contains a value of 1.0 (i.e. no correction) outside of the mask, and hence introduces a subtle 'step' in image intensities at the outer edge of this mask, this algorithm will now instead apply a hard mask to the output DWI based on the input mask.
@Lestropie

This comment has been minimized.

Copy link
Member

Lestropie commented Dec 11, 2018

Should mention here in addition to the commit message:

In the process of 8f6dcb7 I made a change to dwibiascorrect when using fsl fast, which is related to discussion in #1270. Instead of accepting fast's bias field estimate of 1.0 outside the provided brain mask, which introduces a 'step' in image intensities at the edge of the mask, it now instead zero-fills the output bias-corrected DWI outside of the mask. This hopefully makes the influence of this limitation slightly less 'insidious'. I had the idea to instead use mrmodelfield to parameterise the bias field inside the mask and hence enable extrapolating it beyond the mask, but it seems that command was removed at some point.

@thijsdhollander

This comment has been minimized.

Copy link
Member

thijsdhollander commented Dec 11, 2018

Instead of accepting fast's bias field estimate of 1.0 outside the provided brain mask, which introduces a 'step' in image intensities at the edge of the mask, it now instead zero-fills the output bias-corrected DWI outside of the mask. This hopefully makes the influence of this limitation slightly less 'insidious'.

👍 Very happy with that; it's indeed quite misleading otherwise.

I had the idea to instead use mrmodelfield to parameterise the bias field inside the mask and hence enable extrapolating it beyond the mask, but it seems that command was removed at some point.

I wouldn't do this in any case; it makes it potentially even more unclear on "what method" is exactly run. fast does what it does, nothing more, nothing less; let's leave it at that. I say it's up to them, if they want to offer this kind of functionality.

Lestropie and others added some commits Dec 18, 2018

Merge branch 'dev' into script_changes
Conflicts:
	bin/dwipreproc
	lib/mrtrix3/_5ttgen/gif.py
	lib/mrtrix3/fsl.py
	lib/mrtrix3/image.py
run.command(): Don't suppress MRtrix3 command outputs
Previously the -quiet flag was added to all MRtrix3 command calls, as shell=True was used in subprocess. Now that this is no longer the case, it is preferable to instead have all commands generate their standard amount of terminal output, so that script authors can parse this text if desired.
The additional -info flag passed when a script is run in debug mode will additionally check to make sure that no such flag is included in the command string itself, which could otherwise lead to a command-line parsing error in the underlying command.
5ttgen fsl: Two tweaks
- When FSL command 'standard_space_roi' fails, check for the presence of Unix command 'dc' in PATH. If it's not present, alert the user to the fact that it's possible this is the reason why standard_space_roi may have failed.
- When combining SGM partial volume fractions across all structures, use list input to run.command() to shorten terminal output.
Update dwipreproc
Use list input to run.command() so that the individual volume images being concatenated are collapsed when run.command() produces its terminal string.
@thijsdhollander

This comment has been minimized.

Copy link
Member

thijsdhollander commented Jan 8, 2019

Probably aware and/or because this branch is still unfinished, but just for the record / in case: currently generating the user docs generates some errors related to dwibiascorrect:

[DOC] generating user documentation (./docs)

/home/thijs/mrtrix3/docs/generate_user_docs.sh:

Traceback (most recent call last):
  File "/home/thijs/mrtrix3/bin/dwibiascorrect", line 82, in <module>
    app.execute()
  File "/home/thijs/mrtrix3/lib/mrtrix3/app.py", line 90, in execute
    module.usage(CMDLINE)
  File "/home/thijs/mrtrix3/bin/dwibiascorrect", line 20, in usage
    algorithm.usage(cmdline)
  File "/home/thijs/mrtrix3/lib/mrtrix3/algorithm.py", line 42, in usage
    module = importlib.import_module(path.script_subdir_name() + '.' + package_name)
  File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
ImportError: No module named dwibiascorrect.ants
Traceback (most recent call last):
  File "/home/thijs/mrtrix3/bin/dwibiascorrect", line 82, in <module>
    app.execute()
  File "/home/thijs/mrtrix3/lib/mrtrix3/app.py", line 90, in execute
    module.usage(CMDLINE)
  File "/home/thijs/mrtrix3/bin/dwibiascorrect", line 20, in usage
    algorithm.usage(cmdline)
  File "/home/thijs/mrtrix3/lib/mrtrix3/algorithm.py", line 42, in usage
    module = importlib.import_module(path.script_subdir_name() + '.' + package_name)
  File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
ImportError: No module named dwibiascorrect.ants
@thijsdhollander

This comment has been minimized.

Copy link
Member

thijsdhollander commented Jan 8, 2019

When I find some (more) time, I still have to look into responsemean though. I'm ok with the name change, but I've got at least one minor concern with the mechanism. [...]

Ok, did that in the mean time; I'm ok with responsemean for now, doesn't cause the concerns I was worried about sufficiently to be an issue in most cases. So I'm not worried about its inclusion currently. That said, the need for it compared to the "simple" averaging is now also mostly gone when dwibiascorrect is used in (group wise) pipelines. So well, the overall impact of the change is minor; thus happy to have it either way.

That said, I've actually got a few other ways of computing a combined ("average", but not really) response or set of responses in store; but it's a bit too early to introduce them at this stage. [...]

Works (but not "neatly" implemented yet; more internally/messy; tested successfully in a few collaborations though) and is on the way at some point, but probably not immediately in the public space. So don't wait for it currently; it'll happen when the time is right.

Provide script developers with the capability to show or hide individ…
…ual run.command and run.function lines.

Defaults to showing the lines just as is now the case. -quiet, -info and -debug also override this in any case, providing the end-user with specific output capabilities as they wish; so nothing is forced on them if they choose to -quiet, and nothing is taken away from them if they choose to -info or have a need to -debug.
...but for the default user experience of a script, the developer can tailor the output to suit the specific flow and desirable level of feedback and progress indication that works best for their app or algorithm. This caters optimally for individual needs of script developers, while still providing the same default output for less experienced developers (who might not even notice this optional functionality exists) or during the process of prototyping a new script.
@Lestropie

This comment has been minimized.

Copy link
Member

Lestropie commented on 09a4dee Jan 8, 2019

  • I was planning on changing run.command() to take keyword arguments over and above the command string only; this would make it consistent with run.function(), and also somewhat immunise against library / executable version mismatches. That would actually help homogenise the usage here, and you wouldn't need to specify False for shell whenever you want to hide any particular command string.

  • If this is going to be used extensively, the length of the parameter name kind of compounds. Would simply "show" / "hide" / "suppress" / "silence" be sufficient? Could then be identical across the two functions also. Though does however need to reflect that it's the printing of the command / function string that's being hidden, not the command / function itself I suppose (still valid to capture and process stdout / stderr from a command even if not shown on the terminal).

This comment has been minimized.

Copy link
Member

thijsdhollander replied Jan 9, 2019

  • Yep, no worries, makes sense; and I'm always happy with more consistency in usage. I'm not understanding (or misunderstanding) the last statement there though: it seems to imply that currently (i.e. after this commit) one does need to specify False for shell to be able to hide the command string...? How so? It's possible to just state (the currently called) show_command_line=False without stating shell=False, or even in the presence of shell=True, without issues, right? But anyways, perfectly fine with allowing run.command() to take keywords just like run.function() does.

  • I was actually coming from that (so happy with it for sure), but I thought there might be a concern (currently for run.function()) with capturing/"stealing" such a short common term, if you want to be able to generally "pass on" keyword arguments to the function (or command). That's why I aimed for a more specific name, and even (inherently) on purpose for a longer one. But certainly happy with a short one, for the same reasons you mention. In general, I'd go with a "positive" one (i.e. rather show than hide) to avoid confusion around double negations like -hide=False to achieve showing stuff. That coincidentally also helps with addressing your last point: suppress or silence could be interpreted as affecting more than just the printing aspect. show, on the other hand, refers more clearly to a visual thing: i.e. whether something is printed (thus shown, for most environments). Also nothing against making it identical for commands and functions; the fact that they're currently different was more of an inherent consequence of making them more specific, certainly not a goal (or benefit) of itself.

Implementation wise: I don't think there's a nicer way than popping the kwargs. Python 3.[something] supports keyword-only arguments, but Python 2.7 sadly doesn't... so no way to put it nicely in the declaration in the presence of *args.

This comment has been minimized.

Copy link
Member

thijsdhollander replied Jan 9, 2019

In the mean time, I've done the name change to show; but as mentioned, I'm certainly entirely ok with extending run.command() to take args/kwargs as well.

This comment has been minimized.

Copy link
Member

Lestropie replied Jan 9, 2019

I'm not understanding (or misunderstanding) the last statement there though:

That's fine, that's me forgetting that you can provide positional arguments to a function using keywords in Python. I was thinking that in order to set show to False, you'd have to do: "run.command(cmd, False, False)", when in reality you can just do: "run.command(cmd, show=False)". In light of that it's better to restrict run.command() to only pre-defined keyword arguments.

I was actually coming from that (so happy with it for sure), but I thought there might be a concern (currently for run.function()) with capturing/"stealing" such a short common term...

Had the same concern; open to other suggestions, or anything for which there is a precedent.

This comment has been minimized.

Copy link
Member

Lestropie replied Jan 9, 2019

In light of that it's better to restrict run.command() to only pre-defined keyword arguments.

Changed my mind again: Even though positional arguments can be specified using keywords, the fact that there's two bools there and no logical ordering of them means it's better to have the function read keyword arguments: then these options must be specified as keywords.

This comment has been minimized.

Copy link
Member

thijsdhollander replied Jan 10, 2019

👍 All good. With Python 2.7 in mind, it's sadly always a trade-off: either the popping of **kwargs solution to enforce a keyword-only behaviour, or a more informative declaration. They're both imperfect in their own right, so I'm not fussed which one it ends up being. Happy with how you did it now.
About the choice of keyword (name), thinking about that again, I think -show will end up being pretty safe actually. I can't immediately come up with a possible (future) function where that would be reasonably required or cause issues. And even then, the function down the track may have an alternative for -show if it's a really important functionality. So given all that, I'd argue the brevity and simplicity/elegance of "show" wins out.

thijsdhollander and others added some commits Jan 9, 2019

5ttgen fsl: Fix import error
Error introduced in 449893b.

thijsdhollander and others added some commits Jan 10, 2019

dwipreproc: Bug fix
dwipreproc performs explicit removal of unwanted header fields from the output DWI, as introduced in 47e7288 / #1389, by exporting the DWI header key-value pairs to a JSON file and explicitly editing the JSON contents. The writing of the input JSON file was however lost during the process of resolving divergent code in 7617787. This commit fixes the issue.
Correct operation of code for selecting & executing different versions of eddy - including if both CUDA and OpenMP versions are installed but the former does not run correctly - has been verified for the changes in #1449.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment