Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable #2143

Merged
merged 2 commits into from
Oct 9, 2020

Conversation

xushiyan
Copy link
Member

@xushiyan xushiyan commented Oct 4, 2020

Remove APIs in HoodieTestUtils

  • listAllDataFilesAndLogFilesInPath
  • listAllLogFilesInPath
  • listAllDataFilesInPath
  • writeRecordsToLogFiles
  • createCleanFiles
  • createPendingCleanFiles

Migrate the callers to use HoodieTestTable and HoodieWriteableTestTable with new APIs added

  • listAllBaseAndLogFiles
  • listAllLogFiles
  • listAllBaseFiles
  • withLogAppends
  • addClean
  • addInflightClean

Also added related APIs in FileCreateUtils

  • createCleanFile
  • createRequestedCleanFile
  • createInflightCleanFile

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@xushiyan xushiyan added the pr:wip Work in Progress/PRs label Oct 4, 2020
@xushiyan xushiyan force-pushed the migrate-to-HoodieTestTable branch 5 times, most recently from 6fb5d84 to 3be9ec7 Compare October 5, 2020 00:02
Remove APIs in `HoodieTestUtils`
- listAllDataFilesAndLogFilesInPath
- listAllLogFilesInPath
- listAllDataFilesInPath
- writeRecordsToLogFiles
- createCleanFiles
- createPendingCleanFiles

Migrate the callers to use `HoodieTestTable` and `HoodieWriteableTestTable` with new APIs added
- listAllBaseAndLogFiles
- listAllLogFiles
- listAllBaseFiles
- withLogAppends
- addClean
- addInflightClean

Also added related APIs in `FileCreateUtils`
- createCleanFile
- createRequestedCleanFile
- createInflightCleanFile
@xushiyan xushiyan removed the pr:wip Work in Progress/PRs label Oct 5, 2020
@xushiyan xushiyan marked this pull request as ready for review October 5, 2020 02:16
@xushiyan xushiyan requested a review from yanghua October 5, 2020 02:19
@xushiyan
Copy link
Member Author

xushiyan commented Oct 5, 2020

@yanghua this should be the 2nd last PR of this migration. Thanks.

Comment on lines 164 to 181
private void appendRecordsToLogFile(List<HoodieRecord> groupedRecords) throws Exception {
String partitionPath = groupedRecords.get(0).getPartitionPath();
HoodieRecordLocation location = groupedRecords.get(0).getCurrentLocation();
try (HoodieLogFormat.Writer logWriter = HoodieLogFormat.newWriterBuilder().onParentPath(new Path(basePath, partitionPath))
.withFileExtension(HoodieLogFile.DELTA_EXTENSION).withFileId(location.getFileId())
.overBaseCommit(location.getInstantTime()).withFs(fs).build()) {
Map<HoodieLogBlock.HeaderMetadataType, String> header = new HashMap<>();
header.put(HoodieLogBlock.HeaderMetadataType.INSTANT_TIME, location.getInstantTime());
header.put(HoodieLogBlock.HeaderMetadataType.SCHEMA, schema.toString());
logWriter.appendBlock(new HoodieAvroDataBlock(groupedRecords.stream().map(r -> {
try {
GenericRecord val = (GenericRecord) r.getData().getInsertValue(schema).get();
HoodieAvroUtils.addHoodieKeyToRecord(val, r.getRecordKey(), r.getPartitionPath(), "");
return (IndexedRecord) val;
} catch (IOException e) {
return null;
}
}).collect(Collectors.toList()), header));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refer to hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestUtils.java#writeRecordsToLogFiles

Comment on lines -355 to +404
public List<FileStatus> listAllFilesInTempFolder() throws IOException {
return FileSystemTestUtils.listRecursive(fs, new Path(Paths.get(basePath, HoodieTableMetaClient.TEMPFOLDER_NAME).toString()));
public FileStatus[] listAllFilesInTempFolder() throws IOException {
return FileSystemTestUtils.listRecursive(fs, new Path(Paths.get(basePath, HoodieTableMetaClient.TEMPFOLDER_NAME).toString())).toArray(new FileStatus[0]);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to return array to work with HoodieTableFileSystemView APIs which take in array

@yanghua
Copy link
Contributor

yanghua commented Oct 8, 2020

@xushiyan Sorry for the late reply, the past week was during the National Day holiday, you know. Will review tomorrow.

@yanghua yanghua self-assigned this Oct 8, 2020
@xushiyan
Copy link
Member Author

xushiyan commented Oct 8, 2020

@xushiyan Sorry for the late reply, the past week was during the National Day holiday, you know. Will review tomorrow.

@yanghua Understood. No worries.

Copy link
Contributor

@yanghua yanghua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xushiyan left two comments.

@codecov-io
Copy link

codecov-io commented Oct 9, 2020

Codecov Report

Merging #2143 into master will increase coverage by 6.23%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #2143      +/-   ##
============================================
+ Coverage     47.38%   53.61%   +6.23%     
- Complexity     2567     2845     +278     
============================================
  Files           358      359       +1     
  Lines         16484    16535      +51     
  Branches       1767     1777      +10     
============================================
+ Hits           7811     8866    +1055     
+ Misses         8032     6912    -1120     
- Partials        641      757     +116     
Flag Coverage Δ Complexity Δ
#hudicli 38.37% <ø> (ø) 193.00 <ø> (ø)
#hudiclient 100.00% <ø> (ø) 0.00 <ø> (ø)
#hudicommon 54.74% <ø> (ø) 1793.00 <ø> (ø)
#hudihadoopmr 33.05% <ø> (ø) 181.00 <ø> (ø)
#hudispark 65.51% <ø> (ø) 303.00 <ø> (ø)
#huditimelineservice 62.29% <ø> (-2.15%) 50.00 <ø> (+1.00) ⬇️
#hudiutilities 69.98% <ø> (+59.30%) 325.00 <ø> (+277.00)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ Complexity Δ
.../apache/hudi/timeline/service/TimelineService.java 25.00% <0.00%> (-2.54%) 6.00% <0.00%> (+1.00%) ⬇️
.../hudi/utilities/schema/SchemaRegistryProvider.java 0.00% <0.00%> (ø) 0.00% <0.00%> (ø%)
...va/org/apache/hudi/utilities/schema/SchemaSet.java 100.00% <0.00%> (ø) 3.00% <0.00%> (?%)
.../apache/hudi/utilities/HoodieSnapshotExporter.java 88.59% <0.00%> (+5.26%) 28.00% <0.00%> (ø%)
...e/hudi/utilities/transform/ChainedTransformer.java 100.00% <0.00%> (+11.11%) 4.00% <0.00%> (+1.00%)
...g/apache/hudi/utilities/schema/SchemaProvider.java 100.00% <0.00%> (+20.00%) 2.00% <0.00%> (+1.00%)
...in/java/org/apache/hudi/utilities/UtilHelpers.java 64.59% <0.00%> (+27.95%) 30.00% <0.00%> (+19.00%)
...ties/deltastreamer/HoodieDeltaStreamerMetrics.java 36.11% <0.00%> (+36.11%) 6.00% <0.00%> (+6.00%)
...ities/checkpointing/InitialCheckPointProvider.java 45.45% <0.00%> (+45.45%) 1.00% <0.00%> (+1.00%)
...di/utilities/sources/helpers/IncrSourceHelper.java 54.54% <0.00%> (+54.54%) 4.00% <0.00%> (+4.00%)
... and 29 more

Copy link
Contributor

@yanghua yanghua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @xushiyan Thanks for your contribution.

@yanghua yanghua merged commit 1d1d91d into apache:master Oct 9, 2020
prashantwason pushed a commit to prashantwason/incubator-hudi that referenced this pull request Feb 22, 2021
* [HUDI-995] Migrate HoodieTestUtils APIs to HoodieTestTable

Remove APIs in `HoodieTestUtils`
- listAllDataFilesAndLogFilesInPath
- listAllLogFilesInPath
- listAllDataFilesInPath
- writeRecordsToLogFiles
- createCleanFiles
- createPendingCleanFiles

Migrate the callers to use `HoodieTestTable` and `HoodieWriteableTestTable` with new APIs added
- listAllBaseAndLogFiles
- listAllLogFiles
- listAllBaseFiles
- withLogAppends
- addClean
- addInflightClean

Also added related APIs in `FileCreateUtils`
- createCleanFile
- createRequestedCleanFile
- createInflightCleanFile
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants