Remove PubsubFileInjector and IntraBundleParallelization #957

Merged
merged 1 commit into from
Sep 14, 2016

peihe
Contributor

@peihe peihe commented Sep 14, 2016

No description provided.

@peihe
Contributor Author

peihe commented Sep 14, 2016

R: @lukecwik @dhalperi

@lukecwik
Member

lukecwik commented Sep 14, 2016

LGTM

@asfgit asfgit merged commit fc96474 into apache:master Sep 14, 2016
asfgit pushed a commit that referenced this pull request Sep 14, 2016
@swegner
Contributor

swegner commented Sep 14, 2016

IntraBundleParallelization is gone!! 🌟 ❗ 👍 🎉

@RXminuS

RXminuS commented Jan 18, 2017

@swegner I'm sorry, why is this a good thing? This was the easiest way we had to make parallel HTTP requests without having to tinker with threads ourselves. For us it's a 10x speed difference depending on whether this option is there or not. For whatever reason the default DoFn just doesn't understand how to "parallelize network requests properly", and IntraBundleParallelization was the only fix.

@swegner
Contributor

swegner commented Jan 18, 2017

My comment was related to the fact that IntraBundleParallelization carried a dependency on OldDoFn, and cleaning it up gets Beam one step closer to completing the DoFn migration.

I believe also IntraBundleParallelization was written using Dataflow implementation details outside of the Beam model. See BEAM-414.

Alternatives for parallelizing network requests would be a good discussion; would you mind posting your use-case to the user list (user@beam.apache.org)?

@RXminuS

RXminuS commented Jan 18, 2017

Fair enough :-) Where did the dependency on OldDoFn come from? As far as I can tell it should be fairly easy to migrate; it just looks like a bit of boilerplate threading code.

I'll send in my use-case. It's actually a solution that we came up with together with our assigned Google Engineer based on some other projects that they'd done, so I'll have him send in details of those other projects as well.

@swegner
Contributor

swegner commented Jan 18, 2017

OldDoFn is the previous implementation of DoFn from Dataflow 1.x. I don't know all the details on why it wasn't migrated, @lukecwik or @kennknowles might know more.

But I can see in the previous implementation various assumptions that are outside of the Beam model, in particular scheduling work using the GCS executor service and relying on an in-memory Semaphore to track work state. Removing these assumptions gives runners more freedom to apply their own optimizations.
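
For readers unfamiliar with the pattern being described, here is a minimal sketch of the general idea: submit each element's blocking call to an executor, use a semaphore to cap how many calls are in flight, and wait for everything to finish before the batch is considered done. This is not the removed IntraBundleParallelization code and it uses no Beam APIs; the class name `BoundedParallelBatch`, the method `processAll`, and the `maxParallelism` parameter are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

/** Hypothetical helper (not part of Beam): applies a blocking call to each element with bounded concurrency. */
final class BoundedParallelBatch {
  private BoundedParallelBatch() {}

  /**
   * Runs blockingCall on every input, allowing at most maxParallelism calls in flight,
   * and returns the results in input order once all calls have completed.
   */
  public static <I, O> List<O> processAll(
      List<I> inputs, Function<I, O> blockingCall, int maxParallelism)
      throws InterruptedException, ExecutionException {
    ExecutorService executor = Executors.newCachedThreadPool();
    // In-memory permit counter: this is the kind of worker-local state a runner cannot see.
    Semaphore permits = new Semaphore(maxParallelism);
    List<Future<O>> pending = new ArrayList<>();
    try {
      for (I input : inputs) {
        permits.acquire(); // blocks while maxParallelism calls are already running
        pending.add(
            executor.submit(
                () -> {
                  try {
                    return blockingCall.apply(input);
                  } finally {
                    permits.release(); // free the slot whether the call succeeded or threw
                  }
                }));
      }
      // Equivalent of "finish the batch": wait for every outstanding call.
      List<O> results = new ArrayList<>(pending.size());
      for (Future<O> future : pending) {
        results.add(future.get());
      }
      return results;
    } finally {
      executor.shutdown();
    }
  }
}
```

A call such as `BoundedParallelBatch.processAll(urls, fetch, 8)` would keep at most eight requests outstanding, where `urls` and `fetch` stand in for whatever inputs and blocking client call a pipeline uses. Because the thread pool and the permit count live entirely inside the worker process, a runner has no way to account for or rebalance that work, which is the concern raised above.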

@bjchambers
Contributor

bjchambers commented Jan 18, 2017 via email

@lukecwik
Member

lukecwik commented Jan 18, 2017 via email

@RXminuS

RXminuS commented Jan 18, 2017

Ah yeah I see now. Obviously I appreciate an automatic approach as much as the next lazy coder so there must be some way we can make that happen :-)

@RXminuS

RXminuS commented Mar 6, 2017

Hey, can we open a new discussion about how to "thread" things properly now that IntraBundleParallelization is long gone? ATM, we're manually keying bundles and then executing a manually threaded DoFn over all the items in the bundle... but it's not ideal @lukecwik

@kennknowles
Member

Yes, this is a good topic to bring up on user@beam.apache.org or dev@beam.apache.org.

@RXminuS

RXminuS commented Mar 17, 2017

Honestly, I don't understand why discussions aren't more accessible... Do I really need to email a general address? Isn't the point of open source to collaborate and share the best ideas?

@peihe peihe deleted the archetypes branch August 15, 2017 09:24
@JoshFerge

Is there any update on this discussion? This is quite an important use case for our team.

@swegner
Contributor

swegner commented Feb 21, 2018

If you haven't yet, I recommend posting your question and use-case to the dev@beam.apache.org or user@beam.apache.org mailing lists. I'm not aware of alternatives provided in the current version of Beam, but other users on the mailing lists may have ideas. And contributions are of course welcome.

9 participants