[HUDI-535] Ensure Compaction Plan is always written in .aux folder to avoid 0.5.0/0.5.1 reader-writer compatibility issues #1229

Merged
merged 1 commit into apache:master from hudi-535 on Jan 17, 2020

Conversation

bvaradar
Contributor

What is the purpose of the pull request

PR #1009 introduces a table layout version, which is used to determine whether renames are allowed. Renames are disabled by default in 0.5.1, but users can enable them during ingestion through the write config. Along with this feature, a related change was added to stop writing compaction plans to the .aux folder. That change can cause race conditions when the reader and writer run different versions. This PR effectively reverts the compaction-plan storage change so that the plan is also stored in the .aux folder.
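For context, the writer-side effect can be sketched with plain Hadoop FileSystem calls; the class and method names below are illustrative, not Hudi's actual API. The point is simply that the requested compaction plan lands in the auxiliary folder (where 0.5.0 readers look) as well as in the timeline folder:

import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper, not Hudi code: persists a requested-compaction plan
// to both locations so readers on either version can find it.
public class CompactionPlanPersister {

  private final FileSystem fs;
  private final Path metaPath;      // e.g. <basePath>/.hoodie (timeline folder)
  private final Path auxiliaryPath; // e.g. <basePath>/.hoodie/.aux

  public CompactionPlanPersister(FileSystem fs, Path metaPath, Path auxiliaryPath) {
    this.fs = fs;
    this.metaPath = metaPath;
    this.auxiliaryPath = auxiliaryPath;
  }

  public void persist(String requestedFileName, byte[] planBytes) throws IOException {
    // 0.5.0 readers only look in .aux, so the plan must always land here.
    write(new Path(auxiliaryPath, requestedFileName), planBytes);
    // Newer (0.5.1) readers resolve the plan from the timeline folder.
    write(new Path(metaPath, requestedFileName), planBytes);
  }

  private void write(Path file, byte[] bytes) throws IOException {
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.write(bytes);
    }
  }
}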

Brief change log

  • Ensure Compaction Plan is always written in .aux folder to avoid 0.5.0/0.5.1 reader-writer compatibility issues

Verify this pull request

Existing unit tests cover the code change.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@bvaradar
Contributor Author

@vinothchandar : when you get a chance, please take a look.

@vinothchandar (Member) left a comment

LGTM overall, based on this understanding: the compaction .requested file is used by queries to determine whether a file slice is pending compaction, so with new writers, old readers won't find the file in .aux.
Clean need not be written to .aux since there are no real query-facing implications?
A few follow-ups though, to be sure: shouldn't we double-write to both .aux and the timeline? There are more compaction-like actions coming up (index and timeline compactions), so will they also go to .aux? If we just double-wrote (without any other complications), then new writers could at least function without hacks/special-casing. Given enough time, we could also make the readers upgrade to read from the timeline and eventually drop this whole writing-into-.aux business?
Not sure if that makes sense, but hopefully it gives you something to think about.

I still think this change fixes the problem at hand, so merge if you don't see any red flags from the questions above.

@bvaradar force-pushed the hudi-535 branch 2 times, most recently from 6e62d44 to a3b6ac7 on January 16, 2020 04:44
@bvaradar
Contributor Author

@vinothchandar : I have made changes to handle the compatibility issue when reading the compaction plan, instead of double-writing. Can you take one more look and review?

@vinothchandar (Member) left a comment

1 comment on the ordering

public Option<byte[]> readCompactionPlanAsBytes(HoodieInstant instant) {
  try {
    // This is going to be the common case in future when 0.5.1 is deployed.
    detailPath = new Path(metaClient.getMetaPath(), instant.getFileName());
@vinothchandar (Member)

But then, in the meantime, every reader will be doing two RPCs for this method, right? I am thinking about flipping the order: read from .aux first, then fall back to the meta path.

This way, when we switch to writing in the meta path, only the older readers will incur this additional RPC. Does that make sense?
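For illustration, a sketch of that flipped ordering, assuming the surrounding class's helpers (readDataFromPath, metaClient.getMetaAuxiliaryPath()) exist as their names suggest and that readDataFromPath surfaces a missing file as a HoodieIOException; this is a sketch, not necessarily the exact merged code:

public Option<byte[]> readCompactionPlanAsBytes(HoodieInstant instant) {
  try {
    // Try the auxiliary folder first: the common case today, so the usual
    // read costs a single RPC.
    return readDataFromPath(new Path(metaClient.getMetaAuxiliaryPath(), instant.getFileName()));
  } catch (HoodieIOException e) {
    // Fall back to the timeline (meta path). Once writers publish plans
    // there (see HUDI-546), only older layouts pay this second RPC.
    return readDataFromPath(new Path(metaClient.getMetaPath(), instant.getFileName()));
  }
}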

@vinothchandar
Member

@bvaradar any updates?

… avoid 0.5.0/0.5.1 reader-writer compatibility issues
@bvaradar
Contributor Author

@vinothchandar : updated the PR to handle compatibility. JIRA https://issues.apache.org/jira/browse/HUDI-546 tracks the follow-up of stopping the write to the .aux folder.

@vinothchandar
Member

LGTM .. merging

@vinothchandar vinothchandar merged commit 923e2b4 into apache:master Jan 17, 2020
sumit-dp pushed a commit to Schedule1/incubator-hudi that referenced this pull request Feb 25, 2020
… avoid 0.5.0/0.5.1 reader-writer compatibility issues (apache#1229)
sumit-dp pushed a commit to Schedule1/incubator-hudi that referenced this pull request Mar 6, 2020
…st for hoodie-client module (apache#930)

lyogev pushed a commit to YotpoLtd/incubator-hudi that referenced this pull request Mar 30, 2020
… avoid 0.5.0/0.5.1 reader-writer compatibility issues (apache#1229)