Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Repo fhs restructure #896

Merged
merged 28 commits into from
Feb 7, 2017
Merged

Repo fhs restructure #896

merged 28 commits into from
Feb 7, 2017

Conversation

jdtournier
Copy link
Member

@jdtournier jdtournier commented Jan 27, 2017

This is a start at making the changes outlined in #864, to get some early feedback on the idea.

So far:

  • lib/core/
  • executables are now stored in bin/
  • invokable scripts are also stored in bin/
  • all intermediates (object, moc, etc) go into tmp/
  • the multi-config support by compiling into separate config folders is gone entirely
  • ./build has a new select special target to swap between configs
  • when configs get swapped out, all of the compiler-generated stuff (non-tracked content in bin/, all of tmp/, lib/libmrtrix*, and config) gets moved as-is over to a folder named build.name
  • when configs get swapped in, all of these files are restored from the corresponding folder
  • the folder for the currently active config is renamed by appending .active to it, so the system know which folder to store the current config into when swapping out.
  • move script modules into lib/mrtrix3 and sort out the module loading issues that will ensue
  • move data, tables, etc, into share/mrtrix3 and update scripts to match

Still to do:

  • sort out potential problems when building separate projects (i.e. what happens when you try to re-build a separate project after the config has been swapped out on the MRtrix3 core folder)

How it works in practice:

Here's a little sample of what you can do. I recommend you start with a fresh clone on this one:

$ git clone git@github.com:MRtrix3/mrtrix3.git mrtrix3_rearranged
$ cd mrtrix3_rearranged/
$ git checkout repo_fhs_restructure 

Check what config you are currently using:

$ ./build select
current config is "build.default"

Even though there is no such folder yet. It just assumes if no config is active, you're on the default config...

Do what you need to do as usual:

$ ./configure
$ ./build

You run stuff, and your new command falls over (note I'm assuming you've added the bin/ folder to your PATH here):

$ run_my_failing_command
segmentation fault

OK, let's create an assert build and see if that catches anything:

$ ./build select assert
storing "build.default"...
creating empty "build.assert"...
$ ./configure -assert
$ ./build bin/run_my_failing_command
...
$ run_my_failing_command
Assertion failed: pointer != nullptr, file src/some_code.cpp, line 123

Note that the command remains identical - the new executable is already in the PATH. You fix the bug, now you can go back to the default (release) build:

$ ./build select default
storing "build.assert"...
restoring "build.default"...

and so on.

A good thing here is that the command that's failing might be a script, but it might be failing because the executable it invokes is buggy. So with this you can trivially compile an assert build and have the script run with those executables with no further modifications.

Anyway, that's just to give you a flavour of how the idea works. Give it a shot, see what you think...

Lestropie and others added 12 commits January 20, 2017 14:23
In adding features to the Python scripts in order to make them compatible with packaging via PyInstaller, much of the library has been shuffled around. Most noticeable difference is that individual functions are no longer defined in their own library file; instead, some semblence of function grouping has been applied. This means that library function calls will typically have 'lib.' and a module name before the function name.
For PyInstaller, differences are in scripts/lib/algorithm.py and scripts/algorithm/package.py; getting PyInstaller to pull in an appropriately read from the algorithm files in scripts/src/ is the trickiest part of the process.
Also had to rename lib.message.print to lib.message.console, since PyInstaller didn't like the former.
./build now creates executables in bin/ and stores temporaries in tmp/
and move some scripts over to bin/ to check operation
@jdtournier
Copy link
Member Author

this post seems relevant for locating the python modules.

On that note, I note the current strategy for importing seems pretty cumbersome:

from lib.errorMessage  import errorMessage
from lib.getHeaderInfo import getHeaderInfo
from lib.getUserPath   import getUserPath
from lib.printMessage  import printMessage
from lib.warnMessage   import warnMessage
from lib.runCommand    import runCommand
from lib.runFunction   import runFunction

Ideally, this should be enough:

from mrtrix3 import errorMessage, getHeaderInfo, getUserPath, printMessage, warnMessage, runCommand, runFunction, app, cmdlineParser

Any reason why we can't support that kind of setup...?

@Lestropie
Copy link
Member

On that note, I note the current strategy for importing seems pretty cumbersome ... Ideally, this should be enough ... Any reason why we can't support that kind of setup...?

Bear in mind that my creating these scripts / library was basically my first Python experience... Had not yet encountered the word "Pythonic". Was probably just a habit I got since (as you know) I like the readability afforded by consistent indentation.

P.S. Some of the changes I made in the package_scripts branch will probably still get used (just not the packaging bits) since the structure is a little more logical than having one function per file. There, I actually found myself retaining the full module path (maybe except for runCommand()) since it clarifies which function calls are coming from functions provided by MRtrix3 and what might e.g. be defined locally. That would affect the import call(s).

Copy link
Member

@dchristiaens dchristiaens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave it a test run with different configure settings and it all worked like a charm. Nice work!

@dchristiaens
Copy link
Member

W.r.t. the python scripts:

Some of the changes I made in the package_scripts branch will probably still get used (just not the packaging bits) since the structure is a little more logical than having one function per file.

Good, that certainly cleans things up a bit. Personally, I would organise things in fewer modules and use the __init__.py file to avoid having to call into whichever file each function lives, but that's just a matter of preference. One thing I do strongly advocate, though, is to change the module name lib to something more specific (mrtrix seems an obvious choice).

@jdtournier
Copy link
Member Author

Just another quick note about how python modules should be conceptualised: there's a good discussion on this thread about this. I reckon this particular quote is probably the best way to think about this:

Another metaphor that may help is you might look python modules as namespaces.

Point being that you're kind of expected to pack a lot more into each python file (which is a module in this context) than you would in C++.

@Lestropie
Copy link
Member

The changes in package_scripts treat each Python file more like a namespace. But there's still not a huge amount in each file simply because the library is very light.

Just to be clear:

move script modules into lib/mrtrix3 and sort out the module loading issues that will ensue
move data, tables, etc, into share/mrtrix3 and update scripts to match

If the user were to install MRtrix3 to /usr/local/, these are to be stored in e.g. /usr/local/mrtrix3/lib/mrtrix3/ and /usr/local/mrtrix3/share/mrtrix3/, as opposed to /lib/mrtrix3/ and /share/mrtrix3/? Just you've used the FHS referencing the latter in the justification for these changes, but that would require specific handling of such paths; whereas for the former, there's kind of a redundant 'mrtrix3/' sub-directory in there. Although at least it means the Python library has the correct name I suppose.

- Python library files have been re-arranged: There is no longer one function per file, instead there is some  degree of grouping of functionalities into modules.
- Various conformations to the FHS restructure:
  - Scripts now reside in the bin/ directory.
  - Script libraries now reside in lib/mrtrix3/.
  - Additional data used by scripts now reside in share/mrtrix3/, in sub-directories according to the script using the data.
  - The example connectome lookup table files have been moved to share/mrtrix3/labelconvert.
@jdtournier
Copy link
Member Author

jdtournier commented Jan 30, 2017

If the user were to install MRtrix3 to /usr/local/, these are to be stored in e.g. /usr/local/mrtrix3/lib/mrtrix3/ and /usr/local/mrtrix3/share/mrtrix3/, as opposed to /lib/mrtrix3/ and /share/mrtrix3/? Just you've used the FHS referencing the latter in the justification for these changes, but that would require specific handling of such paths; whereas for the former, there's kind of a redundant 'mrtrix3/' sub-directory in there. Although at least it means the Python library has the correct name I suppose.

I think there's a slight misunderstanding here... I'll clarify my understanding of how the FHS is supposed to work, sorry if you already know all of this - maybe I'm missing your point here.

So the way I understand it, the folder structure on a linux system is organised as a kind of nested hierarchy, with known folders at each level with clear functions. These known folders include

  • bin/ for executables
  • lib/ for shared / dynamically loadable functionality
  • share/ for application data, documentation, resources, man pages, icons, etc)
  • optionally include/ for C headers
  • optionally var/ for system logs, print spooling, system application cache, mail, etc
  • optionally src/ for code

You'll find most of these in the root / folder, and these will typically relate to very fundamental system-level functionality (daemons, kernel modules, etc). You'll also find most of these in the /usr/ folder, and again in the /usr/local/ folder. At each of these levels, the folders listed above are used in the same way, what changes is the 'priority' level (for want of a better word). So in contrast to the very low-level stuff at the root level, the stuff in /usr/ is designed for user applications, and is where the bulk of the software is actually installed in practice. Same for /usr/local/, but this is conventionally where you as a user/sysadmin would install software not managed through your regular distribution channels (i.e. not through your distro's package management backend). That last one is also not typically in the PATH by default. There is also the lesser-used /opt/ folder, which I guess is best suited to 3rd party software that doesn't fit in with these conventions (on my system, I have CUDA, Dropbox, Acrobat Reader, for example).

The idea is that packages get installed directly into one of these trees, not within their own folders. That way, if you install something, there is no change required to your settings, it will already be in your PATH, the libraries will be where the dynamic linker can find them, the man pages (if you have them) will be indexed, etc.

So the idea here is that the structure of the repo would match that expectation, so that if users/sysdamins want to install/repackage MRtrix3 that way, it will fit in as expected. What that means is that if the users were to install into /usr/local/, our bin/ folder would end up merged with /usr/local/bin, our lib/mrtrix3/ folder would be /usr/local/lib/mrtrix3/, and our share/mrtrix3/ folder would be /usr/local/share/mrtrix3/. So for example if MRtrix3 eventually gets picked up by e.g. NeuroDebian (as MRtrix 0.2 was), it would be trivial for them to distribute. Note that there's still no requirement that it has to be installed in such a way, but it would at least match most sysadmins' expectations around how a package is typically organised.

@jdtournier
Copy link
Member Author

jdtournier commented Jan 30, 2017

With the scripts now in bin/ along with compiled binaries, generate_user_docs.sh will need to discriminate between Python scripts using the libraries, other scripts (Python or Bash), and compiled binaries.

Good point. As far as I can tell though, it already does, given this line for listing the binaries, and this line for identifying the scripts. Would definitely need amending to match the new locations, but should otherwise work as-is, as far as I can tell...?

@jdtournier
Copy link
Member Author

Looks all fine, I'd say. I'm not incredibly convinced of the benefit of this whole operation versus the "chaos" we unleash upon users again... (even though we're still in beta, but that's a well kept secret to many users in practice 😛 ). I do see the benefits (so certainly not against it), but I feel they're relatively slim compared to the big change.

I agree, there's no massive requirement to do this. But there are a few arguments for. It does mean that MRtrix3 will be structured in a way to matches conventions a bit better, which will reduce some of the frustrations from sysadmins and packagers like the one that prompted this discussion. It makes it simpler to package MRtrix3 in a way that matches the way most distros manage packages, and so makes it more likely that users' install match our own. It makes dealing with different configs if anything a little bit less confusing, since the concept is now managed separately from the build itself - at the moment there's a lot of gymnastics to figure out where temporaries and binaries are to be stored, whereas with these changes it's a fixed location. Also, the user-visible changes are actually not that big: the main thing that changes is the location of the binaries, and hence how the PATH is set. And if anything, that PATH will now only need to consist of a single entry rather than the current two (bin/ & scripts/, so should actually be easier for new users too.

But as you say:

I'd definitely encourage to already do it as part of the upcoming 0.3.16 tag. More and more users I interact with are a little bit fed up with things changing all the time. It's certainly not about how big the change is, it's just the frequency of changes.

I also feel that if we do proceed, it's best to have a clear break, and tag_0.3.16 is definitely a good candidate here. It introduces a lot of changes, many of which affect the syntax of existing commands, or the way things are done, so users will need to make a conscious decision when upgrading. This would fit in nicely here, if anything it reduces the chances of people using the new version without being aware of the magnitude of the changes (not that I'm advocating this as a good reason for these changes!)...

@draffelt
Copy link
Member

Just spent my first child-free evening in months catching up with this thread and #864. If only I had cold beer and popcorn!

I'm a fan of the new config selection. I used the old multi-config set up a lot. I like that I now won't have to type the full path to the debug binary when running gdb! I also think the restructure makes sense.

I'm in favour of including it in tag_0.3.16. It's already huge enough, has several new commands (and lost a few), new documentation, fixel format and visualisation etc. I agree users are likely to prefer fewer large updates. We will just need to make sure we clearly separate the critical information from the fine details in the announcement.

@dchristiaens
Copy link
Member

Just to amend @jdtournier's comment:

Another metaphor that may help is you might look python modules as namespaces.
Point being that you're kind of expected to pack a lot more into each python file (which is a module in this context) than you would in C++.

The namespace analogy is spot on, but a module does not have to be a single file. I also think it makes sense to pack a lot more in each module, but you can still distribute the functions and classes over a range of files if you prefer. The way to do that is to arrange the relevant files in a folder and provide a __init__.py file that imports the relevant functions from separate files (submodules if you will). This facilitates easy encapsulation, and shortens the import statements and name qualifiers in the scripts.

In my mind, there are ultimately only 2 logical modules: one for user-script interaction (argument parsing, error messages, progress bar, etc.) and one for script-mrtrix interaction (running commands, managing temporary files, etc.). I would call these mrtrix.app and mrtrix.cmd. You could then simply do

from mrtrix import app, cmd

cmd.run('mrinfo example.mif')
app.warn('Some example warning')

In addition, such structure would also allow room for possible future additions such as functions to read and write .mif images and .tck files to python. I have code for this which I could easily add if there ever was a need for it...

But as I said before, it is a matter of preference and I also appreciate that it may require quite a lot of effort to refactor the code. The above were just my two cents on the matter. The one thing I do feel strongly about is the naming of lib. Even though we don't intend to distribute this code as a standalone python package, some heavy python users (me for example) might like to add the library to their python path to be callable from other scripts, especially if more functionality would be added. At that point import lib is terribly confusing whereas import mrtrix is clean and unambiguous.

On a different note:

But going ahead with this: I'd definitely encourage to already do it as part of the upcoming 0.3.16 tag.

I also agree with @thijsdhollander here: if we're going to break our user's install once more, we might as well do it now.

@Lestropie
Copy link
Member

I think my misunderstanding was related to this:

The idea is that packages get installed directly into one of these trees, not within their own folders.

As long as the relative path between different components of MRtrix3 is preserved during 'installation', then the techniques currently employed for locating different components relative to one another will work. I had an image in my head where MRtrix3's bin/ directory went somewhere, the lib/ directory went somewhere, share/ went somewhere, etc..

Given how quickly this is coming together, and how long 0.3.16 has already been pushed back, and the consensus surrounding it, I reckon we should get it in there.

@thijsdhollander thijsdhollander mentioned this pull request Jan 31, 2017
@thijsdhollander
Copy link
Contributor

Also, the user-visible changes are actually not that big: the main thing that changes is the location of the binaries, and hence how the PATH is set. And if anything, that PATH will now only need to consist of a single entry rather than the current two (bin/ & scripts/, so should actually be easier for new users too.

Yes, definitely agree it's better for any new future users; I'm more worried about the existing ones who'll need to do some conscious changes to their installation. But it's indeed for the better, so I'm all for the short "pain" now, rather than the stretched-out and frequent one.

This would fit in nicely here, if anything it reduces the chances of people using the new version without being aware of the magnitude of the changes (not that I'm advocating this as a good reason for these changes!)...

It's just a happy (opportunity for) coincidence. 🙂

We will just need to make sure we clearly separate the critical information from the fine details in the announcement.

This will indeed be key to avoiding unneeded (and dramatic) fallout on the forum. I suppose we recommend a fresh install after this? If so, we need to make sure (to communicate clearly that) people also remove their old install and/or PATH entries to the old install. In a weird turn of events, these big changes happening together, may actually have people ending up with 2 installations; both which may come with old and new PATH entries. That would be a bit of a nightmare.

In my mind, there are ultimately only 2 logical modules: one for user-script interaction (argument parsing, error messages, progress bar, etc.) and one for script-mrtrix interaction (running commands, managing temporary files, etc.). I would call these mrtrix.app and mrtrix.cmd.

I like this idea, especially with a potential future addition of...

In addition, such structure would also allow room for possible future additions such as functions to read and write .mif images and .tck files to python. I have code for this which I could easily add if there ever was a need for it...

...there could be good scenarios for scripts that just need mrtrix.app; so the split in (just) those 2 logical modules makes good sense.

@Lestropie
Copy link
Member

Script refactor and creation of share/ directory now in here.

Lestropie and others added 6 commits February 7, 2017 11:12
…etion.py

Also tweaked a few little things, such as removing the regex search for '__debug' and ensuring that scripts now present in bin/ don't get added.
and update the docs to match.
@jdtournier
Copy link
Member Author

Just merged tag_0.3.16 in here. Everyone happy for me to merge back to tag_0.3.16 once the tests are done...? I still need to fix up the module handling, but there's no need to delay other changes like #809, I can work on that later.

Copy link
Member

@Lestropie Lestropie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go for it; we'll sort out any residual issues once everything's into tag_0.3.16 and we're all using it.

@jdtournier jdtournier merged commit 9bc955b into tag_0.3.16 Feb 7, 2017
@jdtournier jdtournier deleted the repo_fhs_restructure branch February 7, 2017 15:18
@jdtournier
Copy link
Member Author

done.

@thijsdhollander
Copy link
Contributor

👍 perfect, that should help sort out further issues and avoid too much chaos!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants