Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactoring load() to allow caller to override file-reading logic #205

Merged
merged 8 commits into from
Jul 16, 2019

Conversation

mlin
Copy link
Collaborator

@mlin mlin commented Jul 15, 2019

as planned in #177 discussion

@mlin
Copy link
Collaborator Author

mlin commented Jul 15, 2019

@dinvlad WDYT of this?

I hope making the read_source callback an async one will not be too inconvenient; this maximizes the flexibility since it'll often be used to read from network URIs.

@coveralls
Copy link

coveralls commented Jul 15, 2019

Pull Request Test Coverage Report for Build 910

  • 85 of 90 (94.44%) changed or added relevant lines in 4 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.01%) to 96.873%

Changes Missing Coverage Covered Lines Changed/Added Lines %
WDL/init.py 6 7 85.71%
WDL/Tree.py 27 29 93.1%
WDL/_parser.py 51 53 96.23%
Totals Coverage Status
Change from base Build 901: -0.01%
Covered Lines: 4058
Relevant Lines: 4189

💛 - Coveralls

WDL/Tree.py Outdated
if not os.path.isabs(importer_dir):
morepaths = [os.path.join(p, importer_dir) for p in path]
path.extend(morepaths)
path.append(importer_dir)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dinvlad I want to highlight the above logic related to handling nested relative imports, which might become tricky for the LSP get_document() use case. For example suppose we have a directory tree,

A
├── a.wdl
└── B
    ├── b.wdl
    └── C
        └── c.wdl

and A/a.wdl does import B/b.wdl, b.wdl does import C/c.wdl. When asked to read_source("C/C.wdl") one has to figure out that's relative to B/ not A/.

Copy link

@dinvlad dinvlad Jul 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so the uri parameter to read_source is verbatim from import, not a fully resolved absolute path? I thought we could just the uri to LSP if it was a full file://A/B/C/c.wdl.

But if it's relative, we'll have to essentially use everything from read_source_default() until with open(fn, "r") as infile:, to fully resolve the uri but not to read it (as reading will be done from memory in LSP, or from file system if the uri is not in memory).

So given that we have to essentially reuse all but the last couple lines in read_source_default(), would it perhaps make sense to move the "full URI resolution" functionality into a separate function, and then to provide a read_source_uri(uri) method to actually implement loading from a fully-resolved URI?

Copy link

@dinvlad dinvlad Jul 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And for a network URI, we don't have to resolve it so it will be just short-cut by read_source_default(). So making the code in read_source_default() run always first, before calling read_source_uri(), could be a general solution :)

@mlin mlin added this to In progress in miniwdl Jul 15, 2019
@dinvlad
Copy link

dinvlad commented Jul 15, 2019

I hope making the read_source callback an async one will not be too inconvenient; this maximizes the flexibility since it'll often be used to read from network URIs.

Agreed on this point!
Please see above for the uri stuff.

@mlin
Copy link
Collaborator Author

mlin commented Jul 16, 2019

@dinvlad thanks for the feedback! I did more work on it to additionally store the absolute path after resolution of relative loads/imports. The diff is getting pretty unwieldy now but the key docstring:

https://github.com/chanzuckerberg/miniwdl/pull/205/files#diff-c7ebedebf9b171179223458e04e073dcR91

and upshot:

  1. read_source is now responsible for returning the absolute filename/URI of the document read, in addition to its contents.
  2. the pos: SourcePosition attribute on all AST nodes including Document now has separate uri and abspath attributes, the first being whatever was passed initially and the second being the resolved absolute path.
  3. read_source now gets importer: Document instead of importer_uri: str, so in particular it has importer.pos.{uri,abspath}. This permitted simplifying the logic for handing relative paths in read_source_default. (Will it still help further to break this out into something you can call on its own?)

@dinvlad
Copy link

dinvlad commented Jul 16, 2019

Hi @mlin, I'm not sure if this addresses our use case directly, if I understood the signature of read_source correctly. Could you clarify, does MiniWDL invoke read_source for each file that it wants to get the source text for? In that case, we have to make it resolve text corresponding to the fully qualified uri, not importer.pos.uri, because we aren't asking to resolve text for the importer uri, only for the imported uri. Does that make sense? Thanks

@dinvlad
Copy link

dinvlad commented Jul 16, 2019

In addition, which function can we call to fully resolve the URI first (that is, to make it absolute file:// or keep it as http(s):// URI)?

To re-iterate, what I was asking for is:

  1. Provide a general resolve_uri function that resolves relative or HTTP URI into absolute or HTTP URI, where in case of HTTP URI it just returns the URI as is. This function should NOT attempt to also resolve the text.

    In the case of LSP, we don't want to re-implement this function, and would like to just re-use its default implementation.

  2. Add a read_source function (as an option to load or load_async) that takes in an absolute URI (file:// or http(s)://) and resolves it into text.

    This function we will override with our implementation for LSP, which can be as simple as

    return ls.workspace.get_document(absolute_uri).source

Would it make sense to split functionality in this way, or did I misunderstand the intent of the current read_source?

"""


async def resolve_file_import(uri: str, path: List[str], importer: Optional[Document]) -> str:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dinvlad ok, sold :) I made read_source_default call this, rather than taking its result as an argument, in order to avoid having to make them both overridable load() arguments

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, thanks!!

@mlin mlin merged commit 61f58eb into master Jul 16, 2019
miniwdl automation moved this from In progress to Done Jul 16, 2019
@mlin mlin deleted the mlin-async-load branch July 16, 2019 23:22
@dinvlad
Copy link

dinvlad commented Aug 12, 2019

Hi @mlin,

I just realized this is still not working as expected. E.g. when it parses Optimus.wdl for import "FastqToUBam.wdl" as FastqToUBam, read_source() receives FastqToUBam.wdl as the uri, instead of the fully-resolved URI as we discussed. Is it possible to add functionality such that all relative imports would be fully-resolved first, before passing them to read_source()?

Thanks

@dinvlad
Copy link

dinvlad commented Aug 12, 2019

Ok, I was able to achieve this functionality using

uri = await WDL.Tree.resolve_file_import(uri, path, importer)

I forgot that this is how we decided to do it :-) I guess that'd still be nice to do automatically instead, but not critical.

@dinvlad
Copy link

dinvlad commented Aug 12, 2019

Btw, I also noticed that WDL.resolve_file_import() is not currently implemented yet. I had to use WDL.Tree.resolve_file_import() instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
miniwdl
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants