[BEAM-240] Display data links for input and output files #300
R: @dhalperi
Kenn, can you please take an initial pass here? Use your good sense.
ping @kennknowles
private static String getBrowseUrl(String filePattern) {
  IOChannelFactory factory;
  try {
    factory = IOChannelUtils.getFactory(filePattern);
My impression of the current code is that the IOChannelFactory is never interrogated unless validate is true. This should probably stick to that discipline.
I've updated AvroIO and TextIO to be resilient to bad file schemes when their validation is disabled. I didn't update sources and sinks, but now I think they probably need the same treatment. I'll work on that now.
Done.
IO transforms internally use IOChannelUtils to retrieve an IOChannelFactory for file operations. It's not safe to assume that this lookup will succeed at construction time, because a custom IOChannelFactory may not be registered, or the transform implementation may be replaced altogether.
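As a rough illustration of that resilience, here is a minimal, self-contained sketch. It does not use the real IOChannelUtils or IOChannelFactory types: `BrowseUrlSketch`, its `REGISTRY`, `register`, and `tryGetBrowseUrl` are hypothetical stand-ins, and the console URL is a placeholder. The idea is only that a missing registration yields "no link" instead of an exception at construction time.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

/** Hypothetical stand-in for a scheme-to-factory registry; not the Beam API. */
class BrowseUrlSketch {
  // Maps a URI scheme (for example "gs") to a function that builds a browse URL.
  private static final Map<String, UnaryOperator<String>> REGISTRY =
      new ConcurrentHashMap<>();

  static void register(String scheme, UnaryOperator<String> toBrowseUrl) {
    REGISTRY.put(scheme, toBrowseUrl);
  }

  /**
   * Returns a browse URL when a handler is registered for the pattern's scheme,
   * and an empty Optional otherwise. Unknown schemes never cause an exception.
   */
  static Optional<String> tryGetBrowseUrl(String filePattern) {
    int idx = filePattern.indexOf("://");
    // Assumption: patterns without a scheme are treated as local files.
    String scheme = idx < 0 ? "file" : filePattern.substring(0, idx);
    UnaryOperator<String> handler = REGISTRY.get(scheme);
    if (handler == null) {
      return Optional.empty();
    }
    return Optional.ofNullable(handler.apply(filePattern));
  }

  public static void main(String[] args) {
    // Placeholder URL shape, chosen only for the example.
    register("gs", p -> "https://console.example.invalid/browse/"
        + p.substring("gs://".length()));
    System.out.println(tryGetBrowseUrl("gs://bucket/dir/*"));
    System.out.println(tryGetBrowseUrl("custom://not-registered/x"));
  }
}
```

With this shape, display data code can simply skip the link when the Optional is empty, which keeps transform construction independent of which factories happen to be registered.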
I've addressed all feedback so far. Please take another look. @kennknowles
ping @kennknowles :)
@@ -140,9 +140,27 @@ public void validate(PipelineOptions options) {}
public void populateDisplayData(DisplayData.Builder builder) {
  super.populateDisplayData(builder);

  // Append wildcard to browseUrl input since this is a filename prefix
  String browseUrl = null;
  String browseFilePattern = baseOutputFilename + "*";
This logic has some issues:
- If the prefix is just a bucket name, GcsPath will throw
- If it is a local file pattern, adding a glob pattern doesn't make sense.
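One possible way to handle both cases is sketched below. This is only an assumption about what a fix could look like, not what the PR does: `BrowsePatternSketch` and `browsePatternForPrefix` are hypothetical names, the `gs://` parsing is done by hand instead of through GcsPath, and returning local prefixes unchanged is just one of several reasonable choices.

```java
/** Hypothetical helper; it does not use GcsPath or any Beam class. */
class BrowsePatternSketch {
  private static final String GCS_SCHEME = "gs://";

  /**
   * Returns the pattern to browse for an output filename prefix,
   * or null when the prefix is missing.
   */
  static String browsePatternForPrefix(String prefix) {
    if (prefix == null || prefix.isEmpty()) {
      return null;
    }
    if (!prefix.startsWith(GCS_SCHEME)) {
      // Local prefix: appending a glob does not help, so keep it as-is.
      return prefix;
    }
    String rest = prefix.substring(GCS_SCHEME.length());
    if (rest.indexOf('/') < 0) {
      // Bucket name only: "gs://bucket*" would not be a valid path,
      // so point at the bucket itself.
      return prefix;
    }
    // Bucket plus object prefix: the wildcard matches all output shards.
    return prefix + "*";
  }

  public static void main(String[] args) {
    System.out.println(browsePatternForPrefix("gs://bucket/out/part"));
    System.out.println(browsePatternForPrefix("gs://bucket"));
    System.out.println(browsePatternForPrefix("/tmp/out/part"));
  }
}
```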
@tgroh, want to take a pass at this one?
        "Unexpected error while retrieving browse url for file pattern: %s", filePattern), e);
  }

  return factory.getBrowseUrl(filePattern);
Move into the try block
Any particular reason? I prefer to keep try blocks tight so it's clear which operation can throw and to not catch more than expected.
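To make the trade-off concrete, here is a small self-contained sketch of the "tight try block" style. The names `TightTrySketch`, `lookupFactory`, and `describe` are hypothetical and stand in for the factory lookup and the follow-up call; they are not Beam APIs.

```java
import java.io.IOException;

/** Hypothetical illustration of keeping a try block tight. */
class TightTrySketch {
  /** Stand-in for a factory lookup that may fail with a checked exception. */
  static String lookupFactory(String filePattern) throws IOException {
    if (filePattern.startsWith("gs://")) {
      return "gcs-factory";
    }
    throw new IOException("No factory registered for: " + filePattern);
  }

  static String describe(String filePattern) {
    String factory;
    try {
      // Only the call that is expected to throw lives inside the try block.
      factory = lookupFactory(filePattern);
    } catch (IOException e) {
      throw new IllegalStateException(
          String.format("Unexpected error for file pattern: %s", filePattern), e);
    }
    // Anything thrown from here on propagates unchanged and is not wrapped
    // by the catch clause above.
    return factory + ":" + filePattern;
  }

  public static void main(String[] args) {
    System.out.println(describe("gs://bucket/path/*"));
  }
}
```

Moving the final call into the try block would change behavior only if that call can throw the caught exception type, which is the question the two comments above are debating.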
Have not had a chance to re-scan recently. However, a PR like this should not be changing APIs inside of IO classes. Those changes should be factored out into their own PR for discussion. It's not obvious to me that this is a mandatory API for all IOChannelFactories.
As a status update here: I spoke with @dhalperi and @lukecwik about this on Friday, and we're not quite happy with the current design. The
@swegner status of this? should it just be closed and abandoned?
Yes, I will close for now and revisit later.
Be sure to do all of the following to help us incorporate your contribution quickly and easily:
- Make sure the PR title is formatted like: [BEAM-<Jira issue #>] Description of pull request
- Make sure tests pass via mvn clean verify. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes).
- Replace <Jira issue #> in the title with the actual Jira issue number, if there is one.
- If this contribution is large, please file an Apache Individual Contributor License Agreement.
With display data, SDK authors have the ability to annotate display items with link URLs. This adds browse URLs for GCS and local files, and attaches them to well-known source/sink display data.