[HUDI-6220] Add HUDI code version to commit files and hoodie.properties. #8724
prashantwason wants to merge 6 commits into apache:master
Conversation
will review this tomorrow.
bvaradar left a comment:

@prashantwason: Overall, looks good. A few questions.
      } else {
        LOG.warn("Unable to find driver bind address from spark config");
-       this.hostAddr = NetworkUtils.getHostname();
+       this.hostAddr = NetworkUtils.getHostAddr();
What is the rationale for this change?
The NetworkUtils.getHostname function was returning the IP address, so I renamed it to getHostAddr, which is more appropriate.
Also, I added a getHostName function, which returns the hostname.
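A minimal sketch of the split described above, assuming the two helpers wrap java.net.InetAddress (the actual NetworkUtils implementation in the PR may differ): getHostAddr returns the IP address (the old getHostname behavior), while the new getHostName returns the actual hostname.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical sketch of the renamed NetworkUtils helpers discussed above.
public class NetworkUtilsSketch {

  // Returns the IP address of the local host, e.g. "192.168.1.5".
  public static String getHostAddr() {
    try {
      return InetAddress.getLocalHost().getHostAddress();
    } catch (UnknownHostException e) {
      throw new RuntimeException("Unable to resolve local host address", e);
    }
  }

  // Returns the hostname of the local host, e.g. "worker-node-1".
  public static String getHostName() {
    try {
      return InetAddress.getLocalHost().getHostName();
    } catch (UnknownHostException e) {
      throw new RuntimeException("Unable to resolve local host name", e);
    }
  }

  public static void main(String[] args) {
    System.out.println("addr=" + getHostAddr() + ", name=" + getHostName());
  }
}
```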
      @Override
      public Map<String, String> getInfo() {
        final Map<String, String> info = new HashMap<>();
        System.getProperties().stringPropertyNames().forEach(property -> info.put(property, System.getProperty(property)));
Instead of blindly copying all properties, we should do this selectively, as there may be sensitive information here.
Yes, we should be judicious here.
These files are within the dataset itself, and the dataset would contain sensitive information too, so I don't see the issue. Anyway, I will filter them.
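A sketch of the filtering agreed on above: copy the JVM system properties into the info map, but skip keys that may carry sensitive data. The SENSITIVE_KEYS set here is purely illustrative; the actual denylist in the PR may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustrative sketch: getInfo() with a denylist instead of a blind copy.
public class SystemInfoSketch {

  // Hypothetical set of property keys to exclude as potentially sensitive.
  private static final Set<String> SENSITIVE_KEYS =
      Set.of("user.name", "user.home", "user.dir", "java.class.path");

  public static Map<String, String> getInfo() {
    final Map<String, String> info = new HashMap<>();
    System.getProperties().stringPropertyNames().stream()
        .filter(property -> !SENSITIVE_KEYS.contains(property))
        .forEach(property -> info.put(property, System.getProperty(property)));
    return info;
  }

  public static void main(String[] args) {
    System.out.println("collected " + getInfo().size() + " properties");
  }
}
```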
        return parts;
      }

      public static void main(String[] args) {
        info.put("spark.defaultParallelism", String.valueOf(javaSparkContext.defaultParallelism()));
        info.put("spark.defaultMinPartitions", String.valueOf(javaSparkContext.defaultMinPartitions()));
        info.put("spark.executor.instances", String.valueOf(javaSparkContext.getConf().get("spark.executor.instances")));
        return info;
Should we also let users configure which hoodie write configs get added to the extra metadata? Since we don't serialize them anywhere, they might come in handy during investigations. By default we don't need to add any hoodie write configs, but if the user configures them, we can add them as well.
It's simply easier to add the entire hoodie config (without the schema).
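The reply above could look roughly like this: copy every write-config entry into the metadata map, dropping only the (potentially very large) schema entry. This is a sketch, not the PR's actual code; the schema key name "hoodie.avro.schema" is assumed here for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Illustrative sketch: serialize the whole write config into commit metadata,
// excluding only the schema entry.
public class ConfigMetadataSketch {

  // Assumed key name for the schema entry to exclude.
  private static final String SCHEMA_KEY = "hoodie.avro.schema";

  public static Map<String, String> writeConfigAsMetadata(Properties writeConfig) {
    final Map<String, String> metadata = new HashMap<>();
    writeConfig.stringPropertyNames().stream()
        .filter(key -> !SCHEMA_KEY.equals(key))
        .forEach(key -> metadata.put(key, writeConfig.getProperty(key)));
    return metadata;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("hoodie.table.name", "trips");
    props.setProperty(SCHEMA_KEY, "{ /* large avro schema */ }");
    System.out.println(writeConfigAsMetadata(props));
  }
}
```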
-     if (isValidChecksum(props)) {
+     final boolean isValidChecksum = isValidChecksum(props);
+     final String comment = String.format("Date=%s, host=%s, #properties=%d, hudi_version=%s",
+         Instant.now(), NetworkUtils.getHostname(), isValidChecksum ? props.size() : props.size() + 1, HoodieVersion.get());
If you modified NetworkUtils.getHostname() just for this purpose, let's keep the existing method and add a new one.
NetworkUtils.getHostname() does not return the hostname; it returns the IP address. So this is a fix.
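Pulling the comment construction out of the diff above, the logic can be sketched as a standalone helper. The host address and Hudi version are passed in as plain parameters here because NetworkUtils and HoodieVersion are not reproduced; the count logic (size + 1 when the checksum property still has to be added) mirrors the ternary in the diff.

```java
import java.time.Instant;

// Sketch of building the comment line written to hoodie.properties.
public class PropsCommentSketch {

  public static String buildComment(Instant now, String hostAddr, int numProps,
                                    boolean checksumValid, String hudiVersion) {
    // If the checksum property is not yet present, saving will add it,
    // so the final property count is one more than the current size.
    final int finalCount = checksumValid ? numProps : numProps + 1;
    return String.format("Date=%s, host=%s, #properties=%d, hudi_version=%s",
        now, hostAddr, finalCount, hudiVersion);
  }

  public static void main(String[] args) {
    System.out.println(buildComment(Instant.now(), "10.0.0.1", 12, false, "0.14.0"));
  }
}
```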
Hi @prashantwason, could you check the build failure?
@hudi-bot run azure
Force-pushed 92d5c03 to 315d808.
@hudi-bot run azure
Force-pushed 7b99dac to 8525623.
@hudi-bot run azure
@prashantwason: Can you fix the conflicts and tests so that we can land this?
[HUDI-6220] Add HUDI code version to commit files and hoodie.properties.
Change Logs
Impact
More debugging information available
Risk level (write none, low, medium or high below)
None
Documentation Update
None
Contributor's checklist