Fixes globbing issues #2404
Codecov Report
@@ Coverage Diff @@
## master #2404 +/- ##
==========================================
+ Coverage 52.00% 52.11% +0.11%
==========================================
Files 519 520 +1
Lines 60243 60386 +143
Branches 8297 8298 +1
==========================================
+ Hits 31327 31469 +142
+ Misses 27239 27238 -1
- Partials 1677 1679 +2
Continue to review full report at Codecov.
I know it's a draft, so just a quick question: the
So here are our options in regards to globbing in the command line, as discussed in the previous meeting: 1. Always deglob unless
My vote is for the 3rd option: try filename and deglob.
Option 3 ftw ;)
I vote for 2, but disable glob expansion by setting a special environment variable instead of escaping and/or passing Nevertheless, it's probably a good idea to check if a file with the name identical to a glob pattern exists, and, if this happens, issue a warning with the message "if you actually want to pass this file, please set this environment variable".
so far it looks like option 3, but @Tomaz-Vieira has sneakily not voiced a strong opinion yet.
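For reference, a minimal sketch of what "try filename and deglob" could look like. This is only an illustration of the idea being voted on, not code from this PR: `expand_user_path` is a hypothetical name, and it only handles plain filesystem paths (no internal `.h5`/`.n5`/`.npz` paths, no colon-separated lists).

```python
import glob
from pathlib import Path
from typing import List


def expand_user_path(raw: str) -> List[Path]:
    """Hypothetical helper: prefer a file with the literal name, else glob."""
    literal = Path(raw)
    # Step 1: if a file with exactly this name exists, use it untouched,
    # even if the name contains glob characters such as [ ] * ?
    if literal.exists():
        return [literal]
    # Step 2: otherwise interpret the string as a glob pattern.
    matches = sorted(Path(p) for p in glob.glob(raw))
    if not matches:
        raise FileNotFoundError(f"No file or glob match for {raw!r}")
    return matches
```

With this ordering, a file literally named `img[1].png` is picked up as itself, whereas a plain `glob.glob` call would read `[1]` as a character class and match `img1.png` instead.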
2a741ff to f09f72a
f09f72a to df14245
from lazyflow.utility.pathHelpers import lsH5N5, globH5N5, globNpz

# pyright: strict
looks like you introduce new tooling here. Is it comparable to mypy? Is it better?
It's... complementary to mypy, I guess. It does some things better, some things worse. Having that comment there shouldn't hurt anyone, but I can mark that file to be strictly checked elsewhere (outside of the source code) if it's bothering you =)
Do you check other parts of the code without the strict option? Or did you only check this one file at all? When would you have non-strict checks (strict sounds like the right thing to do? :D)...
In any case, maybe that is better suited for `pyproject.toml`?
raise ValueError(f"Could not convert {value} to a valid Scheme")

@classmethod
def contains(cls, value: str) -> bool:
This method seems to be used only once, and there it could also be done by constructing in a `try... except` with a `raise ... from` there...
I think it's a bit weird that this enables:
>>> x = Scheme("http")
>>> x.contains("https")
True
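Roughly what I have in mind, as a sketch only. I'm assuming `Scheme` is an `Enum` constructible from its string value (as the snippet above suggests); the members and the `from_str` name are made up for illustration.

```python
from enum import Enum


class Scheme(Enum):
    # Hypothetical members; the real enum in the PR may differ.
    HTTP = "http"
    HTTPS = "https"
    FILE = "file"

    @classmethod
    def from_str(cls, value: str) -> "Scheme":
        """Convert a string to a Scheme, chaining the original error."""
        try:
            return cls(value)
        except ValueError as e:
            # Same message as in the PR, but with the cause preserved.
            raise ValueError(f"Could not convert {value} to a valid Scheme") from e
```

The single call site can then construct directly and catch `ValueError`, and there is no `contains` that can be called on an instance.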
Currently ilastik will not allow using files whose names can be interpreted as globs (or colon-separated lists of files). This problem permeates surprisingly deep into the codebase, affecting all stack reader operators, (de)serialization and file selection everywhere (data selection applet and GUI, command line parsing, batch applet, importing labels, etc).
This PR attempts to put some order into things, and move all globbing to a single place. Also, globbing will only happen when reading user input, and nowhere else.
DataUrl, DataPath and Dataset

This PR creates the `DataUrl` class and, most importantly, its subclass, `DataPath`. `DataPath` is meant to feel like Python's built-in `Path`, but able to `.glob()` and check existence (`.exists()`) even when using internal paths to archive files like `.h5`, `.n5` and `.npz`.

A group of `DataPath`s is encapsulated into a `Dataset` class. A `Dataset` is what the user selects when filling a role of an ilastik Lane. It can be a stack, which is a `Dataset` with multiple `DataPath`s, or it can be a single file, which is a `Dataset` with a single `DataPath`. In the future, `Dataset` will be further generalized to a group of `DataUrl`s to encompass remote files and precomputed chunks.

All path strings provided by the user are to be converted into `Dataset`s as soon as possible. All `DataPath`s in a `Dataset` are guaranteed to exist. Also, NO `DataPath` inside a `Dataset` will be a globstring, even if its path has colons, brackets or other funny symbols.

`FilesystemDatasetInfo` no longer deals with globbed, colon-separated strings, but rather takes a fully expanded `Dataset`. Because of that, a lot of path handling has been removed from `FilesystemDatasetInfo`.
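To make the invariants above concrete, here is a heavily simplified sketch. It is not the implementation in this PR: it uses plain `pathlib.Path` in place of `DataPath` (so no internal paths into archives), and the field and property names are assumptions for illustration only.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Tuple


@dataclass(frozen=True)
class Dataset:
    """Simplified stand-in: a group of fully expanded, existing paths."""

    data_paths: Tuple[Path, ...]

    def __post_init__(self) -> None:
        if not self.data_paths:
            raise ValueError("A Dataset needs at least one path")
        # Paths are taken literally and never globbed here, so names with
        # brackets or other glob characters are fine as long as they exist.
        missing = [p for p in self.data_paths if not p.exists()]
        if missing:
            raise FileNotFoundError(f"Missing files: {missing}")

    @property
    def is_stack(self) -> bool:
        # A stack is simply a Dataset with more than one path.
        return len(self.data_paths) > 1
```

Anything downstream that receives such an object can rely on the paths existing and never needs to expand patterns itself.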
Data selection

There is a new implementation of the stack selection dialog, which now produces `Dataset`s instead of strings. The machinery for selecting multiple lanes via patterns or a full directory is also in place, but inactive. We can activate it if we decide to close #2283.

(De)serialization of stacks

Serialization of stacks no longer relies on joining paths with `os.pathsep`. The joining is still done for backwards compatibility, but now each path of the stack is saved as an item of a list.

The logic for finding missing files has been integrated with the `Dataset` logic and the new stack selection dialog and can handle missing stacks, so that we are one step closer to NOT internalizing stacks all the time.

fixes #2391
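As an illustration of the list-based serialization described above, here is a small sketch. It uses JSON purely as a stand-in for the project file format, and the key names `paths` and `filePath` are hypothetical, not the ones used in this PR.

```python
import json
import os
from typing import List


def serialize_stack(paths: List[str]) -> str:
    """Save each path as its own list item, plus the legacy joined string."""
    return json.dumps(
        {
            "paths": list(paths),                # new: one item per file
            "filePath": os.pathsep.join(paths),  # legacy: kept for older readers
        }
    )


def deserialize_stack(payload: str) -> List[str]:
    """Prefer the list; fall back to splitting the legacy joined string."""
    data = json.loads(payload)
    if "paths" in data:
        return list(data["paths"])
    return data["filePath"].split(os.pathsep)
```

Reading from the list means a file whose name contains `os.pathsep` survives a round trip, while payloads that only carry the joined string can still be loaded.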