Conversation
config.getString(Config.TIMESTAMP_INPUT_DATE_FORMAT_PROP));
this.inputDateFormat.setTimeZone(TimeZone.getTimeZone("GMT"));
}
@afilipchik: Can you document some examples of how this is configured and used?
.map(
    recordKeyField ->
        DataSourceUtils.getNestedFieldValAsString(record, recordKeyField))
.collect(Collectors.joining(".")),
We are joining by "." here. Is there any assumption behind that? I wanted to see whether this is generic enough to be put in this class.
I think it can be anything.
HoodieKey(String recordKey, String partitionPath)
so I'm just creating a long key: bla.bla1.bla2. It could just as well be bla:bla1:bla2.
Hi, I think this kind of key creates ambiguity; refer to PR #728 for details.
@bvaradar any thoughts? Let's make a call on this PR.
@jaimin-shah: I'm wondering whether there is any need to decode the generated record key. If this is a one-way concatenation of fields into a record key, it should be fine, right? Since we store the individual fields that constitute the record key separately, there won't be any need to decode the record key, so ambiguity during decoding should be acceptable, shouldn't it?
@bvaradar By ambiguity I meant two different records having the same key. For example, the field values (US, .A) and (US., A) both produce the key US..A. Keeping recordKeyField as part of the key resolves this, although I agree these kinds of cases are quite rare.
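The collision described above can be reproduced in isolation. This is a minimal sketch, not code from the PR: `JoinedKeyCollision` and `joinWithDot` are hypothetical names, and the helper takes already-extracted field values instead of calling `DataSourceUtils.getNestedFieldValAsString` on a record. It only mirrors the `Collectors.joining(".")` step from the diff.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class JoinedKeyCollision {

    // Hypothetical helper: joins field values with "." the same way the
    // Collectors.joining(".") call in the diff does.
    public static String joinWithDot(List<String> fieldValues) {
        return fieldValues.stream().collect(Collectors.joining("."));
    }

    public static void main(String[] args) {
        // Two different pairs of field values taken from the example in the thread.
        String first = joinWithDot(Arrays.asList("US", ".A"));
        String second = joinWithDot(Arrays.asList("US.", "A"));

        System.out.println(first);                 // US..A
        System.out.println(second);                // US..A
        System.out.println(first.equals(second));  // true: distinct records, same key
    }
}
```

The collision only needs the delimiter to appear inside a field value, so switching from "." to ":" moves the problem without removing it.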
Makes sense, @jaimin-shah. I didn't think about this possibility. @afilipchik: Based on this information, this may not be safe in the general case.
Hey, I'm not sure I understood the edge-case example. Here is how we are using it:
We have an object with an id and a version. The version is supposed to increase monotonically, and we want to be able to dedup based on the key and the version (there should be at most one object with the same id and version).
Our config looks like:
hoodie.datasource.write.recordkey.field=object.id,object.version for insert table and
What can go wrong with this one?
Hi @afilipchik, for your use case I don't think there will be any problem, but there can be a problem when there are no restrictions on recordKeyField (e.g. a version that is supposed to increase monotonically). You can take a look at https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/ComplexKeyGenerator.java, which has a generic implementation of a complex key for a quite similar requirement.
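The idea of keeping the field name as part of the key, suggested earlier in the thread, might look roughly like the sketch below. It is an illustration under assumptions, not a copy of the linked ComplexKeyGenerator: the class name `PrefixedKeySketch`, the helper `prefixedKey`, the field names `country` and `code`, and the `name:value` pairs joined by "," are all chosen for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class PrefixedKeySketch {

    // Hypothetical helper: emits "name:value" for each field and joins the
    // pairs with ",". The format is an assumption made for this example.
    public static String prefixedKey(Map<String, String> fieldsToValues) {
        return fieldsToValues.entrySet().stream()
                .map(entry -> entry.getKey() + ":" + entry.getValue())
                .collect(Collectors.joining(","));
    }

    // Builds an insertion-ordered map so the fields keep their configured order.
    public static Map<String, String> fields(String country, String code) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("country", country); // hypothetical field name
        values.put("code", code);       // hypothetical field name
        return values;
    }

    public static void main(String[] args) {
        String first = prefixedKey(fields("US", ".A"));
        String second = prefixedKey(fields("US.", "A"));

        System.out.println(first);                 // country:US,code:.A
        System.out.println(second);                // country:US.,code:A
        System.out.println(first.equals(second));  // false
    }
}
```

With the field names in the key, the two records from the earlier example no longer collide. A value that itself contains text such as ",code:" could still produce a clash, so this narrows the ambiguity without strictly eliminating it unless values are escaped.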
vinothchandar
left a comment
Also, can you please file a JIRA for this and include it in the PR title, as documented here: https://hudi.apache.org/contributing.html#lifecycle
@bvaradar @afilipchik can we make a call on this?
Closing this for now due to inactivity. @afilipchik, please open it if you think otherwise.