Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix HDFS copy logic #5218

Merged
merged 3 commits into from Apr 17, 2020
Merged

Fix HDFS copy logic #5218

merged 3 commits into from Apr 17, 2020

Conversation

xiangfu0
Copy link
Contributor

@xiangfu0 xiangfu0 commented Apr 7, 2020

Current hdfs cp will copy all the files from src dir to dest dir as a flatten layout.
This fix will ensure the dest directory could retain the same structure as src directory.

For HadoopPinotFS.copy(src,dest) behavior:

Before:

src/2014/01/01/myTable-20140101.avro
src/2014/01/02/myTable-20140102.avro
src/2014/01/03/myTable-20140103.avro
src/2014/01/04/myTable-20140104.avro

=>

dest/myTable-20140101.avro
dest/myTable-20140102.avro
dest/myTable-20140103.avro
dest/myTable-20140104.avro

New logic:

src/2014/01/01/myTable-20140101.avro
src/2014/01/02/myTable-20140102.avro
src/2014/01/03/myTable-20140103.avro
src/2014/01/04/myTable-20140104.avro

=>

dest/2014/01/01/myTable-20140101.avro
dest/2014/01/02/myTable-20140102.avro
dest/2014/01/03/myTable-20140103.avro
dest/2014/01/04/myTable-20140104.avro

Copy link
Contributor

@jackjlli jackjlli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw copy method is used in HadoopSegmentGenerationJobRunner, where it copies the newly generated segments from staging dir to the final output dir. But why do the new segments store in different sub-dirs that makes you make this change?

@xiangfu0
Copy link
Contributor Author

I saw copy method is used in HadoopSegmentGenerationJobRunner, where it copies the newly generated segments from staging dir to the final output dir. But why do the new segments store in different sub-dirs that makes you make this change?

For all other pinot fs, the behavior keeps same as directory copying retains sub-dir structure. Otherwise, what if there are files with same name?

If user wants to just copy all the files recursively from a directory and flatten to a new directory, then user should list all the files then do file copy to dest dir.

Copy link
Contributor

@jackjlli jackjlli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation! Minor comments but LGTM.

@codecov-io
Copy link

codecov-io commented Apr 17, 2020

Codecov Report

Merging #5218 into master will increase coverage by 0.25%.
The diff coverage is 73.86%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #5218      +/-   ##
============================================
+ Coverage     65.90%   66.16%   +0.25%     
============================================
  Files          1052     1067      +15     
  Lines         54170    54311     +141     
  Branches       8078     8074       -4     
============================================
+ Hits          35702    35935     +233     
+ Misses        15819    15741      -78     
+ Partials       2649     2635      -14     
Impacted Files Coverage Δ Complexity Δ
...e/pinot/broker/api/resources/PinotBrokerDebug.java 76.66% <ø> (ø) 0.00 <0.00> (ø)
.../BrokerResourceOnlineOfflineStateModelFactory.java 55.81% <ø> (ø) 0.00 <0.00> (ø)
.../pinot/broker/broker/helix/HelixBrokerStarter.java 71.97% <ø> (ø) 0.00 <0.00> (ø)
...thandler/SingleConnectionBrokerRequestHandler.java 92.68% <ø> (ø) 0.00 <0.00> (ø)
...ting/instanceselector/InstanceSelectorFactory.java 71.42% <ø> (ø) 0.00 <0.00> (ø)
...er/routing/segmentpruner/SegmentPrunerFactory.java 83.33% <ø> (ø) 0.00 <0.00> (ø)
...outing/segmentselector/SegmentSelectorFactory.java 60.00% <ø> (ø) 0.00 <0.00> (ø)
...oker/routing/timeboundary/TimeBoundaryManager.java 87.50% <ø> (ø) 0.00 <0.00> (ø)
...mmon/assignment/InstanceAssignmentConfigUtils.java 67.50% <ø> (ø) 0.00 <0.00> (?)
...not/common/assignment/InstancePartitionsUtils.java 73.17% <ø> (ø) 0.00 <0.00> (ø)
... and 283 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 79c0838...a208fb0. Read the comment docs.

@xiangfu0 xiangfu0 merged commit d68ef7b into apache:master Apr 17, 2020
@xiangfu0 xiangfu0 deleted the fixing-hdfs-copy-logic branch April 17, 2020 03:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants