Add Cassandra support to blocksconvert #3795

ubcharron · 2021-02-05T18:41:35Z

This is early work on a Cassandra scanner for blocksconvert. I'm looking for comments to help me shape this commit. Among other things, I didn't want to duplicate the code that sets up the Cassandra connections, but wasn't sure how to go about re-using it, hence the GetReadSession() :(

There are no tests, as both functions depend on Cassandra and I wasn't sure whether I should create a Cassandra mock interface.

I copied the BigTable scanning logic, but it doesn't react well when Cassandra times out during scanning, and I'm unsure how to handle that case.

Please have a look and let me know what sucks about this code.

Signed-off-by: Benjamin Charron <benjamin.charron@ubisoft.com>

What this PR does:
Adds support for scanning Cassandra indexes in tools/blocksconvert

Which issue(s) this PR fixes:
N/A

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

pstibrany · 2021-02-09T09:15:11Z

Thanks for your PR. I'm not very familiar with Cassandra, but I think your approach is correct.

To handle possible timeouts, I'd suggest first buffering all index entries for a given hash and, only once the next hash is found, passing the buffered entries to the processor (one by one). This assumes that all entries for one hash are returned before entries for another hash. I can't tell from the code whether that's the case, but I believe it is a requirement for IndexEntryProcessor to work correctly.

The comment on IndexEntryProcessor says that ALL entries for a Hash + Range pair must arrive first, but I'm not sure that's correct: there can only be a single entry for a given Hash and Range. I think the processor instead requires all entries for a given Hash to arrive in sequence. The reason is that IndexEntryProcessor is interested in finding all chunks for a given series, and series info is encoded in Hash, while chunk info is encoded in Range.
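A minimal sketch of the buffering idea, assuming entries for one hash arrive contiguously. The IndexEntry type and the groupByHash helper are hypothetical stand-ins for illustration, not the blocksconvert types:

```go
package main

import "fmt"

// IndexEntry is a hypothetical stand-in for one index row read from Cassandra.
type IndexEntry struct {
	Hash  string
	Range string
}

// groupByHash buffers consecutive entries that share a hash and calls flush
// once per group: when the next hash is seen, or when the input ends.
// It assumes all entries for one hash arrive contiguously.
func groupByHash(entries []IndexEntry, flush func(hash string, group []IndexEntry)) {
	var buf []IndexEntry
	for _, e := range entries {
		if len(buf) > 0 && buf[0].Hash != e.Hash {
			flush(buf[0].Hash, buf)
			buf = nil // start a fresh buffer for the new hash
		}
		buf = append(buf, e)
	}
	if len(buf) > 0 {
		flush(buf[0].Hash, buf)
	}
}

func main() {
	entries := []IndexEntry{
		{Hash: "h1", Range: "r1"},
		{Hash: "h1", Range: "r2"},
		{Hash: "h2", Range: "r1"},
	}
	groupByHash(entries, func(hash string, group []IndexEntry) {
		fmt.Println(hash, len(group))
	})
}
```

Because nothing reaches the processor until a hash is complete, a scan that fails partway through a hash can discard the buffer and re-read that hash without delivering duplicates.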

I hope this helps.

bboreham

I don't know Cassandra either, but it seems plausible.

Re timeouts, I guess you should re-try the same query a few times?

I can't see how any buffering would be necessary; as long as all values for the same hash are delivered to one processor before moving to another hash that's enough.

I did a DynamoDB port at #3362

bboreham · 2021-02-25T17:57:08Z

tools/blocksconvert/scanner/cassandra_index_reader.go

+	for ix := range processors {
+		p := processors[ix]


This can be for _, p := range processors.

I copied it from the Bigtable reader, and it crashes with a nil pointer dereference if I change it to for _, p := range processors.

Aha, you probably tripped over this problem: https://golang.org/doc/faq#closures_and_goroutines
Your solution is OK; alternatively, add p := p after my suggested change.
Either way, a comment pointing to the FAQ would help readers recall why such gyrations are necessary.
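For readers unfamiliar with the FAQ entry, here is a small standalone illustration of the p := p form. The runAll function and its string slice are made up for the example and are not the blocksconvert code. Before Go 1.22 the range variable was shared across iterations, so goroutines that captured it could all observe a later value; Go 1.22 made it per-iteration, and the explicit copy is harmless there:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// runAll starts one goroutine per name and collects what each goroutine saw.
func runAll(names []string) []string {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		seen []string
	)
	for _, p := range names {
		p := p // per-iteration copy; see https://golang.org/doc/faq#closures_and_goroutines
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			defer mu.Unlock()
			seen = append(seen, p)
		}()
	}
	wg.Wait()
	sort.Strings(seen) // goroutine order is not deterministic
	return seen
}

func main() {
	fmt.Println(runAll([]string{"a", "b", "c"})) // prints [a b c]
}
```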

bboreham · 2021-02-25T18:03:24Z

tools/blocksconvert/scanner/cassandra_index_reader.go

+	"github.com/cortexproject/cortex/pkg/chunk/cassandra"
+)
+
+const nbTokenRanges = 512


Why is this a fixed number, rather than being related to the number of processors?

Thank you, added some comments for the mystery value. Cassandra doesn't like having the entire table scanned, so I split it up in many chunks.

Cassandra doesn't like having the entire table scanned, so I split it up in many chunks.

Interesting. I've recently written code that scans an entire Cassandra table with hundreds of millions of entries, and haven't run into any issues. The select was very simple: SELECT <columns> FROM <table>, with no other conditions. It took a while to process all the results (as expected), but it worked.

bboreham · 2021-02-25T18:05:00Z

tools/blocksconvert/scanner/cassandra_index_reader.go

+
+				var query string
+
+				query = fmt.Sprintf("SELECT hash, range, value FROM %s WHERE token(hash) >= %v", tableName, rng.start)


I don't think the value is used for series index entries. Small optimisation.
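To illustrate the suggestion, here is a sketch of a query builder that leaves out the value column unless asked for it. The tokenRangeQuery function, its withValue flag, and the upper token bound are assumptions for the example; the diff above only shows the lower bound:

```go
package main

import (
	"fmt"
	"strings"
)

// tokenRangeQuery builds a CQL SELECT over an inclusive token range.
// The table name is interpolated directly, so it must come from trusted
// configuration, never from user input.
func tokenRangeQuery(table string, start, end int64, withValue bool) string {
	cols := []string{"hash", "range"}
	if withValue {
		cols = append(cols, "value")
	}
	return fmt.Sprintf("SELECT %s FROM %s WHERE token(hash) >= %d AND token(hash) <= %d",
		strings.Join(cols, ", "), table, start, end)
}

func main() {
	fmt.Println(tokenRangeQuery("index_table", -100, 100, false))
}
```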

Signed-off-by: Benjamin Charron <benjamin.charron@ubisoft.com>

bboreham

Allowing for the fact that I don't know Cassandra, I think we could merge this.

Thanks for the PR!

pstibrany

Thank you!

pull-request-size bot added the size/L label Feb 5, 2021

bboreham reviewed Feb 25, 2021

ubcharron force-pushed the cassandra-index-reader branch from 784a5fa to 67adb1b May 14, 2021 18:16

Add Cassandra support to blocksconvert

756f25d

Signed-off-by: Benjamin Charron <benjamin.charron@ubisoft.com>

ubcharron force-pushed the cassandra-index-reader branch from 67adb1b to 756f25d May 14, 2021 19:19

bboreham approved these changes May 17, 2021

pstibrany approved these changes May 18, 2021

pstibrany merged commit bdb563f into cortexproject:master May 18, 2021
