[FLINK-22519][flink-python] support tar python archives #15813

Closed

YikSanChan
Contributor

@YikSanChan YikSanChan commented Apr 30, 2021

What is the purpose of the change

Support tar python archives.

Brief change log

  • Add TarUtils.
  • Choose between ZipUtils vs. TarUtils based on the archivePath.
  • TODO: tests and docs.
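
For illustration, the selection step in the change log could look roughly like the sketch below. This is not the code from this PR: the class name ArchiveKindSketch, the enum, and the recognized extensions (.tar, .tar.gz, .tgz, with ZIP as the fallback) are all assumptions made for the example.

```java
import java.util.Locale;

/** Hypothetical sketch: pick an extraction strategy from the archive path. */
public class ArchiveKindSketch {

    /** Kinds of archive this sketch distinguishes (names are illustrative). */
    public enum ArchiveKind { TAR, TAR_GZ, ZIP }

    /** Decides by file extension only; the extension list is an assumption. */
    public static ArchiveKind detect(String archivePath) {
        String lower = archivePath.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".tar.gz") || lower.endsWith(".tgz")) {
            return ArchiveKind.TAR_GZ;
        }
        if (lower.endsWith(".tar")) {
            return ArchiveKind.TAR;
        }
        // Anything else is treated as a zip archive in this sketch.
        return ArchiveKind.ZIP;
    }

    public static void main(String[] args) {
        System.out.println(detect("venv.tar.gz"));
        System.out.println(detect("venv.tar"));
        System.out.println(detect("venv.zip"));
    }
}
```

A caller would then dispatch to TarUtils for the two tar kinds and to ZipUtils otherwise.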

Verifying this change

This change added tests and can be verified as follows: TODO

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? docs and JavaDocs

@flinkbot
Collaborator

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit ec561cd (Fri Apr 30 02:12:21 UTC 2021)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The bot tracks review progress through labels, which are applied in the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description: approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all: approve all aspects
  • @flinkbot approve-until architecture: approve everything up to and including architecture
  • @flinkbot attention @username1 [@username2 ..]: request somebody's attention
  • @flinkbot disapprove architecture: remove an approval you gave earlier

@flinkbot
Collaborator

flinkbot commented Apr 30, 2021

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run travis: re-run the last Travis build
  • @flinkbot run azure: re-run the last Azure build

@YikSanChan YikSanChan force-pushed the 22519-support-targz-python-archives branch from ec561cd to 0d44263 on May 10, 2021 07:46
@YikSanChan YikSanChan changed the title from "[FLINK-22519][flink-python] support tar.gz python archives" to "[FLINK-22519][flink-python] support tar python archives" on May 10, 2021
String untarCommand =
        gzipped
                ? String.format(
                        "gzip -dc '%s' | (cd '%s' && tar -xf -)", inFilePath, targetDirPath)
Contributor Author

I wonder whether I can use tar -xzf directly. It is simpler, but I am not sure why hadoop-common takes the more verbose route, i.e., gzip ... | tar -xf ...
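
To make the comparison concrete, here is a minimal sketch of the two command forms side by side. The class and method names are hypothetical, and the single-command variant (including its use of -C for the target directory) is an assumption about what the simplification could look like, not code from this PR or from hadoop-common. Note that -z is an extension offered by common tar implementations such as GNU tar and bsdtar; whether portability is the reason hadoop-common pipes through gzip is not established in this thread.

```java
/** Hypothetical sketch comparing two ways to build an untar shell command. */
public class UntarCommandSketch {

    /** Pipeline form quoted in the diff: gzip decompresses, tar reads from stdin. */
    public static String pipelineForm(String inFilePath, String targetDirPath) {
        return String.format(
                "gzip -dc '%s' | (cd '%s' && tar -xf -)", inFilePath, targetDirPath);
    }

    /** Single-command form the comment asks about; relies on tar supporting -z. */
    public static String singleCommandForm(String inFilePath, String targetDirPath) {
        return String.format("tar -xzf '%s' -C '%s'", inFilePath, targetDirPath);
    }

    public static void main(String[] args) {
        System.out.println(pipelineForm("/tmp/venv.tar.gz", "/tmp/out"));
        System.out.println(singleCommandForm("/tmp/venv.tar.gz", "/tmp/out"));
    }
}
```

Both forms wrap the paths in single quotes, so a path that itself contains a single quote would break either command.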

    if (!f.isDirectory() && !f.mkdirs()) {
        throw new IOException("failed to create directory " + f);
    }
} else {

@dianfu
Contributor

dianfu commented May 10, 2021

@YikSanChan Thanks a lot for the update. I have left a few comments above. It would also be great if we could add some tests.

@YikSanChan YikSanChan force-pushed the 22519-support-targz-python-archives branch from 180a9c6 to 1533b82 on May 11, 2021 11:58
@zjffdu
Contributor

zjffdu commented Jun 25, 2021

Any update on this PR? Supporting tar is definitely necessary.

@YikSanChan
Contributor Author

Any update on this PR? Supporting tar is definitely necessary.

Hi Jeff, I haven't worked on this for a while. If the need is urgent, though, I am happy to revisit this and try to finish it next week.

@zjffdu
Contributor

zjffdu commented Jun 25, 2021

Thanks @YikSanChan. I am trying a workaround via yarn.ship-archives, but it would still be helpful if this PR were merged.

@dianfu
Contributor

dianfu commented Jun 30, 2021

@YikSanChan Regarding tests, you could take a look at ZipUtilsTests, which was added recently, for reference.

@dianfu
Contributor

dianfu commented Jul 16, 2021

@YikSanChan What's the status of this PR? I'm asking because we are approaching the 1.14 feature freeze, which is planned for the end of this month. It would be great to include this feature in 1.14.

@zjffdu
Contributor

zjffdu commented Jul 16, 2021

@YikSanChan @dianfu Here's a Zeppelin notebook that creates both a tar and a zip file of a conda env in order to use a customized Python env in a YARN cluster. It would be great if this PR could be merged so that I only need to create one tar file. http://23.254.161.240/#/notebook/2G8N1WTTS

@YikSanChan
Contributor Author

@dianfu @zjffdu My bad, I haven't worked on this for a while. Luckily I will have some free time next week; I will try and let you folks know by EOW next week.

@zjffdu
Contributor

zjffdu commented Jul 26, 2021

@YikSanChan Any update?

@YikSanChan
Contributor Author

YikSanChan commented Jul 28, 2021

@YikSanChan Any update?

Hi Jeff, unfortunately I have not been able to work on this recently.

@dianfu
Contributor

dianfu commented Aug 16, 2021

@YikSanChan Since this feature is very important and you don't have much bandwidth for this work, I'd like to take over this PR, if you don't mind, and continue the work based on it.

@YikSanChan
Contributor Author

@YikSanChan Since this feature is very important and you don't have much bandwidth for this work, I'd like to take over this PR, if you don't mind, and continue the work based on it.

Feel free, no worries at all.

@dianfu
Contributor

dianfu commented Aug 16, 2021

@YikSanChan Thanks a lot~

@dianfu dianfu closed this in ab7f541 Aug 16, 2021
hhkkxxx133 pushed a commit to hhkkxxx133/flink that referenced this pull request Aug 25, 2021