-
Notifications
You must be signed in to change notification settings - Fork 792
SOLR-16286 : Topic stream not honoring initialCheckpoint in getPersis… #935
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
./gradlew check passed except: ` Verify... .../solr/solr/documentation/build/site/modules/jwt-auth/org/apache/solr/handler/admin/api/ModifyJWTAuthPluginConfigAPI.html Broken javadocs links were found! Common root causes:
|
cpoerschke
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @danrosher for opening this PR!
solr/solrj/src/test/org/apache/solr/client/solrj/io/stream/StreamDecoratorTest.java
Outdated
Show resolved
Hide resolved
solr/solrj/src/test/org/apache/solr/client/solrj/io/stream/StreamDecoratorTest.java
Show resolved
Hide resolved
solr/solrj/src/test/org/apache/solr/client/solrj/io/stream/StreamExpressionTest.java
Outdated
Show resolved
Hide resolved
solr/solrj/src/test/org/apache/solr/client/solrj/io/stream/StreamExpressionTest.java
Outdated
Show resolved
Hide resolved
#944 sounds like it might be related. |
…eamExpressionTest.java Co-authored-by: Christine Poerschke <cpoerschke@apache.org>
…eamExpressionTest.java Co-authored-by: Christine Poerschke <cpoerschke@apache.org>
| expr = | ||
| "classify(" | ||
| + | ||
| // use cacheMillis=0 to prevent cached results. it doesn't matter on the first run, | ||
| // but we want to ensure that when we re-use this expression later after | ||
| // training another model, we'll still get accurate results. | ||
| "model(modelCollection, id=\"model\", cacheMillis=0)," | ||
| + "topic(checkpointCollection, uknownCollection, q=\"*:*\", fl=\"text_s, id\", id=\"1000000\")," | ||
| + "field=\"text_s\"," | ||
| + "analyzerField=\"tv_text\")"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is the same expression as above but without the initialCheckpoint=0 ... though reading the (current) docs that means "the highest version in the index" though if the highest version in the index was used then the first batch in the stream below would not include the documents just added with ids 2 and 3?
Wondering if the documentation needs tweaking to account for persisted checkpoints?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checkpoints are persisted when the stream is closed, or if checkpointEvery > -1 (and then every count % checkpointEvery), otherwise the checkpoints are stored in the checkpoints hashmap, so for 'just' added docs, I think as long as is it matches the underlying query, and those docs have been soft committed (see caveat for topicstream SOLR-8709), I think they should be picked up, unless I'm completely misunderstanding ?
…eamDecoratorTest.java Co-authored-by: Christine Poerschke <cpoerschke@apache.org>
|
Thoughts @cpoerschke and @danrosher on this being ready for merging? Seems like a valuable bug fix! Or is there still a lack of clarity on the right behavior? |
|
This PR passes the tests in StreamDecoratorTest and StreamExpressionTest so
I believe it's ready to merge
…On Wed, 17 Aug 2022 at 12:26, Eric Pugh ***@***.***> wrote:
Thoughts @cpoerschke <https://github.com/cpoerschke> and @danrosher
<https://github.com/danrosher> on this being ready for merging? Seems
like a valuable bug fix! Or is there still a lack of clarity on the right
behavior?
—
Reply to this email directly, view it on GitHub
<#935 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACE6HP37J3K7TBZQAZ5EZ3LVZTD5FANCNFSM53IAXYQA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
I think the new behaviour makes sense. Am less sure w.r.t. the documentation and/or how to describe the changes in behaviour. @epugh - feel free to jump in if you wish. https://github.com/apache/solr/blob/5a5989e5b6164091243dd29cfe327b5eaac2cfbd/solr/solr-ref-guide/modules/query-guide/pages/stream-source-reference.adoc#topic-parameters currently says "If not set, it defaults to the highest version in the index." -- is that the highest version of documents in the index or is it alluding to checkpoints somehow? Would something like "If not set, it defaults to previously established checkpoints (if any) or otherwise the highest version in the index." be accurate and user-friendly? |
risdenk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a good bug fix. I'm not sure why it stalled. @epugh @danrosher @cpoerschke maybe just docs?
|
i think some question about the docs. I like everything here ;-). I am going to add it to my "List of tickets to merge on Monday Oct 31st" so folks can weigh in ;-) |
|
@joel-bernstein I was going to merge this today, but I realized that might conflict with your work on seperating out the streaming code? Should I wait? Do you want to add this to your list of tickets to merge once that work is done? |
|
This PR had no visible activity in the past 60 days, labeling it as stale. Any new activity will remove the stale label. To attract more reviewers, please tag someone or notify the dev@solr.apache.org mailing list. Thank you for your contribution! |
|
This PR is now closed due to 60 days of inactivity after being marked as stale. Re-opening this PR is still possible, in which case it will be marked as active again. |
https://issues.apache.org/jira/browse/SOLR-16286
Description
getCheckpoints() does honor initialCheckpoint, but when stored, getPersistedCheckpoints which is processed first, does not. The effect is that initialCheckpoint=0 doesn't work as expected after it's stored.
Solution
Modify getPersistedCheckpoints to honor initialCheckpoint as it does in getCheckpoints
Tests
Added StreamExpressionTest.testTopicStreamInitialCheckpoint to do run a topic SE, then persist it, then re-run to ensure initialCheckpoint=0 is still honored, then again without initialCheckpoint=0 to ensure zero returned as up to date.,
Checklist
Please review the following and check all that apply:
mainbranch../gradlew check.