New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Cassandra support to blocksconvert #3795
Add Cassandra support to blocksconvert #3795
Conversation
Thanks for your PR. I'm not very familiar with Cassandra, but I think your approach is correct. To handle possible timeouts, I'd suggest to first buffer all index entries for given hash, and only once next hash is found, pass the buffered entries to the processor (one by one). This assumes that all entries for one hash are returned before entries for another hash – I'm not sure from the code if that's the case or not, but I believe that is requirement for Comment for I hope this helps. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know Cassandra either, but it seems plausible.
Re timeouts, I guess you should re-try the same query a few times?
I can't see how any buffering would be necessary; as long as all values for the same hash are delivered to one processor before moving to another hash that's enough.
I did a DynamoDB port at #3362
for ix := range processors { | ||
p := processors[ix] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be for _, p := range processors
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied it from the bigtable reader, and it crashes with a nil pointer deference if I change to for _, p := range processors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, you probably tripped over this problem: https://golang.org/doc/faq#closures_and_goroutines
to which your solution is ok, alternatively p := p
after my suggested change.
Either way a comment pointing to the FAQ would help readers recall why such gyrations are necessary.
"github.com/cortexproject/cortex/pkg/chunk/cassandra" | ||
) | ||
|
||
const nbTokenRanges = 512 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a fixed number, rather than being related to the number of processors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, added some comments for the mystery value. Cassandra doesn't like having the entire table scanned, so I split it up in many chunks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cassandra doesn't like having the entire table scanned, so I split it up in many chunks.
Interesting. I've recently written code scanning entire Cassandra table with hundreds of millions of entries, and haven't run into any issues. Select was very simple: SELECT <columns> FROM <table>
, with no other conditions. It took a while to process all the results (as expected), but it worked.
|
||
var query string | ||
|
||
query = fmt.Sprintf("SELECT hash, range, value FROM %s WHERE token(hash) >= %v", tableName, rng.start) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the value is used for series index entries. Small optimisation.
784a5fa
to
67adb1b
Compare
Signed-off-by: Benjamin Charron <benjamin.charron@ubisoft.com>
67adb1b
to
756f25d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allowing for the fact that I don't know Cassandra, I think we could merge this.
Thanks for the PR!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
This is early work on a Cassandra scanner for blocksconvert. I'm looking for comments to help me shape this commit. Among other things, I didn't want to duplicate the code that sets up the Cassandra connections, but wasn't sure how to go about re-using it, hence the
GetReadSession()
:(There are no tests as both functions depend on Cassandra, and I wasn't sure if I should create a Cas mock interface.
I copied the BigTable scanning logic, but it doesn't react well when Cassandra times out during scanning, and I'm unsure how to handle that case.
Please have a look and let me know what sucks about it this code.
Signed-off-by: Benjamin Charron benjamin.charron@ubisoft.com
What this PR does:
Adds support for scanning Cassandra indexes in tools/blocksconvert
Which issue(s) this PR fixes:
N/A
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]