Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-240] Display data links for input and output files #300

Closed
wants to merge 12 commits into from

Conversation

swegner
Copy link
Contributor

@swegner swegner commented May 6, 2016

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

  • Make sure the PR title is formatted like:
    [BEAM-<Jira issue #>] Description of pull request
  • Make sure tests pass via mvn clean verify. (Even better, enable
    Travis-CI on your fork and ensure the whole test matrix passes).
  • Replace <Jira issue #> in the title with the actual Jira issue
    number, if there is one.
  • If this contribution is large, please file an Apache
    Individual Contributor License Agreement.

With display data, SDK authors have the ability to annotate display items with a link URLs. This adds browse URLs for GCS and local files, and attach them to well-known source/sink display data.

@swegner
Copy link
Contributor Author

swegner commented May 6, 2016

R: @dhalperi

@dhalperi
Copy link
Contributor

dhalperi commented May 6, 2016

R: -@dhalperi @kennknowles

Kenn, can you please take an initial pass here? Use your good sense.

@swegner
Copy link
Contributor Author

swegner commented May 9, 2016

ping @kennknowles

private static String getBrowseUrl(String filePattern) {
IOChannelFactory factory;
try {
factory = IOChannelUtils.getFactory(filePattern);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My impression of the current code is that the IOChannelFactory is never interrogated unless validate is true. This should probably stick to that discipline.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated AvroIO and TextIO to be resilient to bad file schemes when their validation is disabled. I didn't update sources and sinks, but now I think they probably need the same treatment. I'll work on that now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

IO transforms internally use IOChannelUtils to retrieve
an IOChannelFactory for file operations. It's not safe to
assume that this is safe at construction time because a
custom IOChannelFactory may not be registered, or the transform
implementation may be replaced altogether.
@swegner
Copy link
Contributor Author

swegner commented May 10, 2016

I've addressed all feedback so far. Please take another look. @kennknowles

@swegner
Copy link
Contributor Author

swegner commented May 12, 2016

ping @kennknowles :)

@@ -140,9 +140,27 @@ public void validate(PipelineOptions options) {}
public void populateDisplayData(DisplayData.Builder builder) {
super.populateDisplayData(builder);

// Append wildcard to browseUrl input since this is a filename prefix
String browseUrl = null;
String browseFilePattern = baseOutputFilename + "*";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic has some issues:

  • If the prefix is just a bucket name, GcsPath will throw
  • If it is a local file pattern, adding a glob pattern doesn't make sense.

@swegner
Copy link
Contributor Author

swegner commented May 12, 2016

@tgroh, want to take a pass at this one?

"Unexpected error while retrieving browse url for file pattern: %s", filePattern), e);
}

return factory.getBrowseUrl(filePattern);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move into the try block

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason? I prefer to keep try blocks tight so it's clear which operation can throw and to not catch more than expected.

@dhalperi
Copy link
Contributor

Have not had a chance to re-scan recently. However, a PR like this should not be changing APIs inside of IO classes. Those changes should be factored out into their own PR for discussion. It's not obvious to me that this is a mandatory API for all IOChannelFactories.

@swegner
Copy link
Contributor Author

swegner commented May 16, 2016

As a status update here: I spoke with @dhalperi and @lukecwik about this on Friday, and we're not quite happy with the current design. The browseUrl probably shouldn't be a member of IOChannelFactory. We're thinking about moving to a design where the browse URL logic is inside a display data specific component.

@dhalperi
Copy link
Contributor

dhalperi commented Jun 8, 2016

@swegner status of this? should it just be closed and abandoned?

@swegner
Copy link
Contributor Author

swegner commented Jun 8, 2016

Yes, I will close for now and revisit later.

@swegner swegner closed this Jun 8, 2016
@swegner swegner deleted the displaydata-urls branch July 7, 2016 16:44
dhalperi pushed a commit to dhalperi/beam that referenced this pull request Aug 23, 2016
Backport: Disable exec-maven-plugin cleanupDaemonThreads
iemejia pushed a commit to iemejia/beam that referenced this pull request Jan 12, 2018
pl04351820 pushed a commit to pl04351820/beam that referenced this pull request Dec 20, 2023
PEP 484 specifies that they be hinted as the type of a single element,
as seen from the caller's perspective.

Closes apache#289.

Co-authored-by: Christopher Wilcox <crwilcox@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants