-
Notifications
You must be signed in to change notification settings - Fork 13.8k
[FLINK-28971][table-planner][doc] Adds user documentation for the new LOOKUP hint #20577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@flinkbot run azure |
godfreyhe
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution @lincoln-lil , I left some comments, and please add some doc to explain the relation of retry behavior with/without full cache/partial cache
| <tr> | ||
| <th rowspan="1">table</th> | ||
| <th>table</th> | ||
| <th>N</th> | ||
| <th>string</th> | ||
| <th>N/A</th> | ||
| <th>the table name of the lookup source</th> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<th> is used for table header, while <td> is used for table body.
| <th>value can be 'true' or 'false' to suggest the planner choose the corresponding lookup function. | ||
| If the backend lookup source does not support the suggested lookup mode, it will take no effect.</th> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the text should align left
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
| <th>N</th> | ||
| <th>string</th> | ||
| <th>N/A</th> | ||
| <th>the table name of the lookup source</th> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is full path of table name required ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, this is consistent with from clause
| <tr> | ||
| <th>option type</th> | ||
| <th>option name</th> | ||
| <th>optional</th> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use required ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
| ``` | ||
| Note: the optimizer prefers async lookup if no 'async' option is specified, it will always use sync lookup when: | ||
| 1. the connector only implements the sync lookup | ||
| 2. user enables 'TRY_RESOLVE' mode of 'table.optimizer.non-deterministic-update.strategy' and the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any doc to explain them more? if existing, please add a link to refer it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, there should be a configuration link
| Note: the async options are consistent with the async options in [job level Execution Options]({{< ref "docs/dev/table/config" >}}#execution-options), | ||
| will use job level configuration if not set. Another difference is that the scope of the LOOKUP hint | ||
| is smaller, limited to the table name corresponding to the hint option set in the current lookup | ||
| operation (other lookup operations will not be affected by the LOOKUP hint). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Another difference is that the scope of the LOOKUP hint
is smaller, limited to the table name corresponding to the hint option set in the current lookup
operation (other lookup operations will not be affected by the LOOKUP hint)."
It's better we should introduce select scope concept to explain it, which is more specific and precise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with you that select scope is necessary for query hints, and I suggested @swuferhong introducing this concept in the begining of query hints in #20513
so this part will not add it again, but after #20351 merged, we can add a link here
| ##### 3. Enable Delayed Retry Strategy For Lookup | ||
| Delayed retry for lookup join is intended to solve the problem of delayed updates in external system | ||
| which cause unexpected enrichment with stream data. The hint option 'retry-predicate'='lookup_miss' | ||
| can enable retry on both sync and async lookup, only support fixed delay retry strategy currently. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only fixed delay retry strategy is supported currently.
…Caching On Retries'
lincoln-lil
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@godfreyhe thanks for your comments here! I've addressed these comments and add a section about 'Effect Of Enabling Caching On Retries'
| <tr> | ||
| <th>option type</th> | ||
| <th>option name</th> | ||
| <th>optional</th> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
| <th>N</th> | ||
| <th>string</th> | ||
| <th>N/A</th> | ||
| <th>the table name of the lookup source</th> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, this is consistent with from clause
| <th>value can be 'true' or 'false' to suggest the planner choose the corresponding lookup function. | ||
| If the backend lookup source does not support the suggested lookup mode, it will take no effect.</th> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
| Note: the async options are consistent with the async options in [job level Execution Options]({{< ref "docs/dev/table/config" >}}#execution-options), | ||
| will use job level configuration if not set. Another difference is that the scope of the LOOKUP hint | ||
| is smaller, limited to the table name corresponding to the hint option set in the current lookup | ||
| operation (other lookup operations will not be affected by the LOOKUP hint). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with you that select scope is necessary for query hints, and I suggested @swuferhong introducing this concept in the begining of query hints in #20513
so this part will not add it again, but after #20351 merged, we can add a link here
| ##### Further Notes | ||
|
|
||
| ###### Effect Of Enabling Caching On Retries | ||
| [FLIP-221](https://cwiki.apache.org/confluence/display/FLINK/FLIP-229%3A+Introduces+Join+Hint+for+Flink+SQL+Batch+Job) adds caching support for lookup source, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that this url is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
| When PARTIAL caching is enabled, it will lookup from local cache first for a coming record and will | ||
| do an external lookup via backend connector if cache miss(if cache hit, then return the record immediately), | ||
| and this will trigger a retry when lookup result is empty(same with caching disabled), the final lookup | ||
| result is determined when retry done(in PARTIAL caching mode, it will also update local cache). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when the retry is complete ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to "the final lookup result is determined when retry completed"
| ```sql | ||
| SELECT /*+ LOOKUP(key=value[, key=value]*) */ | ||
|
|
||
| key: | ||
| stringLiteral | ||
|
|
||
| value: | ||
| stringLiteral | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a section ### Syntax
| - async options are all optional, will use default value if not configured. | ||
| - there is no default value for retry options, all retry options should be set to valid values when need to enable retry. | ||
|
|
||
| ##### 1. Use Sync And Async Lookup Function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the font is too small, it's better we can change the title levels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, 5th level is too small, seems we should not follow the sub levels strictly, I'll update this.
| stringLiteral | ||
| ``` | ||
|
|
||
| The available hint options: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a section ### Lookuup Hint Options
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
lincoln-lil
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@godfreyhe I've addressed all your comments and updated the pr, thanks again!
| When PARTIAL caching is enabled, it will lookup from local cache first for a coming record and will | ||
| do an external lookup via backend connector if cache miss(if cache hit, then return the record immediately), | ||
| and this will trigger a retry when lookup result is empty(same with caching disabled), the final lookup | ||
| result is determined when retry done(in PARTIAL caching mode, it will also update local cache). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to "the final lookup result is determined when retry completed"
| stringLiteral | ||
| ``` | ||
|
|
||
| The available hint options: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
| - async options are all optional, will use default value if not configured. | ||
| - there is no default value for retry options, all retry options should be set to valid values when need to enable retry. | ||
|
|
||
| ##### 1. Use Sync And Async Lookup Function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, 5th level is too small, seems we should not follow the sub levels strictly, I'll update this.
|
@flinkbot run azure |
1 similar comment
|
@flinkbot run azure |
godfreyhe
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update
| 2. configure the async parameters | ||
| 3. enable delayed retry strategy for lookup | ||
|
|
||
| ##### Syntax |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
### Syntax
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ### Join Hints | ||
|
|
||
| #### LOOKUP Hint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about merge them into ### LOOKUP Join Hints
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the "### Join Hints" here is reserved for merge new join hint doc, to keep consistent with other join hints, change "#### LOOKUP Hint" -> "#### LOOKUP"
| ``` | ||
| Note: the optimizer prefers async lookup if no 'async' option is specified, it will always use sync lookup when: | ||
| 1. the connector only implements the sync lookup | ||
| 2. user enables 'TRY_RESOLVE' mode of ['table.optimizer.non-deterministic-update.strategy']({{ < ref "docs/dev/table/config" > }}#table-optimizer-non-deterministic-update-strategy) and the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| LOOKUP('table'='Customers', 'async'='true', 'output-mode'='allow_unordered', 'capacity'='100', 'timeout'='180s') | ||
| ``` | ||
| 注意:联接提示上的异步查找参数和[作业级别配置参数]({{< ref "docs/dev/table/config" >}}#execution-options)的 | ||
| 含义是一致的,没有设置的参数值由默认值生效,另一个区别是联接提示作用的范围更小,仅限于当前联接操作中对应联接提示选项设置的表名(未被联接提示作用的其他联接查询不受影响)。 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need new line here
| 缓存策略有部分缓存、全部缓存两种,当开启全部缓存时('lookup.cache'='FULL'),重试无法起作用(因为查找表被完整缓存,重试查找没有任何实际意义);当开启部分缓存时,当一条数据开始查找处理时, | ||
| 先在本地缓存中查找,如果没找到则通过连接器进行外部查找(如果存在,则立即返回),此时查不到的记录和不开启缓存时一样,会触发重试查找,重试结束时的结果即为最终的查找结果(在部分缓存模式下,更新本地缓存)。 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
当开启全部缓存时 -> 开启全部缓存时
当开启部分缓存时 -> 开启部分缓存时 otherwise, there are two 当s
|
|
||
| #### 关于查找键及 'retry-predicate'='lookup_miss' 重试条件的说明 | ||
| 对不同的连接器,提供的索引查找能力可能是不同的,例如内置的 HBase 连接器,默认仅提供了基于 `rowkey` 的索引查找能力(未 | ||
| 启用二级索引),而对于内置的 JDBC 连接器,默认情况下任何字段都可以被用作索引查找,这是物理存储的特性不同所决定的。 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
lincoln-lil
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@godfreyhe thanks for your detailed comments! I've updated the pr.
| ### Join Hints | ||
|
|
||
| #### LOOKUP Hint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the "### Join Hints" here is reserved for merge new join hint doc, to keep consistent with other join hints, change "#### LOOKUP Hint" -> "#### LOOKUP"
| 2. configure the async parameters | ||
| 3. enable delayed retry strategy for lookup | ||
|
|
||
| ##### Syntax |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
@flinkbot run azure |
godfreyhe
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM


What is the purpose of the change
This is the user documentation work for the new LOOKUP hint, includes Chinese version
Brief change log
add user documentation for the new LOOKUP hint in hints.md page
Does this pull request potentially affect one of the following parts:
Documentation