Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HUDI-4558] lost 'hoodie.table.keygenerator.class' in hoodie.properties #6320

Merged
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -28,10 +28,17 @@
import org.apache.hudi.config.HoodieIndexConfig;
import org.apache.hudi.config.HoodieWriteConfig;
import org.apache.hudi.exception.HoodieException;
import org.apache.hudi.exception.HoodieKeyGeneratorException;
import org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor;
import org.apache.hudi.index.HoodieIndex;
import org.apache.hudi.keygen.ComplexAvroKeyGenerator;
import org.apache.hudi.keygen.CustomAvroKeyGenerator;
import org.apache.hudi.keygen.GlobalAvroDeleteKeyGenerator;
import org.apache.hudi.keygen.NonpartitionedAvroKeyGenerator;
import org.apache.hudi.keygen.TimestampBasedAvroKeyGenerator;
import org.apache.hudi.keygen.constant.KeyGeneratorOptions;
import org.apache.hudi.keygen.constant.KeyGeneratorType;
import org.apache.hudi.keygen.SimpleAvroKeyGenerator;

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;
Expand All @@ -42,6 +49,7 @@
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

Expand Down Expand Up @@ -570,7 +578,7 @@ private FlinkOptions() {
.stringType()
.defaultValue(HoodieCleaningPolicy.KEEP_LATEST_COMMITS.name())
.withDescription("Clean policy to manage the Hudi table. Available option: KEEP_LATEST_COMMITS, KEEP_LATEST_FILE_VERSIONS, KEEP_LATEST_BY_HOURS."
+ "Default is KEEP_LATEST_COMMITS.");
+ "Default is KEEP_LATEST_COMMITS.");

public static final ConfigOption<Integer> CLEAN_RETAIN_COMMITS = ConfigOptions
.key("clean.retain_commits")
Expand All @@ -580,12 +588,12 @@ private FlinkOptions() {
+ "This also directly translates into how much you can incrementally pull on this table, default 30");

public static final ConfigOption<Integer> CLEAN_RETAIN_HOURS = ConfigOptions
.key("clean.retain_hours")
.intType()
.defaultValue(24)// default 24 hours
.withDescription("Number of hours for which commits need to be retained. This config provides a more flexible option as"
+ "compared to number of commits retained for cleaning service. Setting this property ensures all the files, but the latest in a file group,"
+ " corresponding to commits with commit times older than the configured number of hours to be retained are cleaned.");
.key("clean.retain_hours")
.intType()
.defaultValue(24)// default 24 hours
.withDescription("Number of hours for which commits need to be retained. This config provides a more flexible option as"
+ "compared to number of commits retained for cleaning service. Setting this property ensures all the files, but the latest in a file group,"
+ " corresponding to commits with commit times older than the configured number of hours to be retained are cleaned.");

public static final ConfigOption<Integer> CLEAN_RETAIN_FILE_VERSIONS = ConfigOptions
.key("clean.retain_file_versions")
Expand Down Expand Up @@ -875,6 +883,33 @@ public static <T> boolean isDefaultValueDefined(Configuration conf, ConfigOption
|| conf.get(option).equals(option.defaultValue());
}

public static String getKeyGenClassNameByType(Configuration conf) {
String genType = conf.get(FlinkOptions.KEYGEN_TYPE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can giving the option KEYGEN_CLASS_NAME a default value: SimpleAvroKeyGenerator
solves your problem ?

Copy link
Contributor Author

@wuwenchi wuwenchi Aug 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you set the KEYGEN_CLASS_NAME default value, it does solve the problem of not showing.

However, in the original logic, if KEYGEN_CLASS_NAME is not set, KEYGEN_TYPE will be read.
Therefore, if the default value is set for KEYGEN_CLASS_NAME, the KEYGEN_TYPE configuration is equivalent to completely useless.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then why we still always set up the param KEYGEN_CLASS_NAME in this patch for SQL then ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If KEYGEN_CLASS_NAME is not set, this field will not be displayed in the table properties.
KEYGEN_TYPE is equivalent to an alias of KEYGEN_CLASS_NAME, so KEYGEN_TYPE can be read here to initialize KEYGEN_CLASS_NAME. And the default value of KEYGEN_TYPE is SIMPLE, so KEYGEN_CLASS_NAME will be set to SimpleAvroKeyGenerator by default.

According to the above logic, if the user does not set KEYGEN_CLASS_NAME, KEYGEN_CLASS_NAME can be replaced by setting KEYGEN_TYPE, and after setting, KEYGEN_CLASS_NAME will also be stored in the table propeties. In this way, the function of KEYGEN_TYPE not only conforms to the original logic, but also conforms to the description in the document.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what are we fixing here if KEYGEN_TYPE default value is set up and a keygen clazz can be inferred correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no current process, then KEYGEN_CLASS_NAME will not be assigned, and there will be no KEYGEN_CLASS_NAME property in table properties.
If you set the default value of KEYGEN_CLASS_NAME as you said above, then the KEYGEN_TYPE parameter will be invalid.
Therefore, I initialized KEYGEN_CLASS_NAME according to KEYGEN_TYPE, so that not only can the value of KEYGEN_CLASS_NAME be saved in table properties, but KEYGEN_TYPE can also take effect normally according to the original logic.

KeyGeneratorType typeEnum;

try {
typeEnum = KeyGeneratorType.valueOf(genType.toUpperCase(Locale.ROOT));
} catch (IllegalArgumentException e) {
throw new HoodieKeyGeneratorException("Unsupported keyGenerator Type " + genType);
}
switch (typeEnum) {
case SIMPLE:
return SimpleAvroKeyGenerator.class.getName();
case COMPLEX:
return ComplexAvroKeyGenerator.class.getName();
case TIMESTAMP:
return TimestampBasedAvroKeyGenerator.class.getName();
case CUSTOM:
return CustomAvroKeyGenerator.class.getName();
case NON_PARTITION:
return NonpartitionedAvroKeyGenerator.class.getName();
case GLOBAL_DELETE:
return GlobalAvroDeleteKeyGenerator.class.getName();
default:
throw new HoodieKeyGeneratorException("Unsupported keyGenerator Type " + genType);
}
}

/**
* Returns all the optional config options.
*/
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -235,6 +235,11 @@ private static void setupHoodieKeyOptions(Configuration conf, CatalogTable table
conf.setString(FlinkOptions.KEYGEN_CLASS_NAME, ComplexAvroKeyGenerator.class.getName());
LOG.info("Table option [{}] is reset to {} because record key or partition path has two or more fields",
FlinkOptions.KEYGEN_CLASS_NAME.key(), ComplexAvroKeyGenerator.class.getName());
} else if (!conf.getOptional(FlinkOptions.KEYGEN_CLASS_NAME).isPresent()) {
String keyGenName = FlinkOptions.getKeyGenClassNameByType(conf);
conf.setString(FlinkOptions.KEYGEN_CLASS_NAME, keyGenName);
LOG.info("Table option [{}] is reset to {} because of {}",
FlinkOptions.KEYGEN_CLASS_NAME.key(), keyGenName, FlinkOptions.KEYGEN_TYPE);
wuwenchi marked this conversation as resolved.
Show resolved Hide resolved
}
}

Expand Down