2.0 Forwards TSDB Table Compatability#150
Conversation
… in preparation for 2.0. Annotations will be stored with a 3 byte qualifier in the same row with their associated timeseries. For backwards compatibility, if a user tries 2.0 and decides to downgrade, or run an instance of 1.1 alongside 2.0, they can do so without affecting their data. This patch allows for proper compaction of cell data and leaves future types alone.
|
Hey Chris, Did I get this right? If yes, why not do the filtering based on the family then? Can we maybe talk, preferably on the list, about how to implement support for annotations? I don't feel great about adding a second family, because in case you didn't know, families are stored in different HFiles by HBase, and thus every read that misses the BlockCache has to go through all the HFiles. So adding a second column family essentially doubles the number of IOPS that need to be done during reads. OpenTSDB's read performance is already not as good as I'd like it to be, so it's kind of a big deal if we make reads twice more expensive in IOPS. We're probably better off storing the annotations in-line with the data. What do you think? Am I misunderstanding something? |
|
Hi Benoit, That was probably my first attempt and since I dev on Windows, I had all of those encoding issues J I saw your comments about not storing the annotations in a different family so I tweaked my code to use the same ‘t’ cf as the data points. Making it “compaction” compatible was a little bit of a pain but not too bad. Since individual data points have a qualifier of 2 bytes and compacted cells have a qualifier of 4 bytes or greater, I figured we could make the annotation qualifier 3 bytes where byte [0] would be a type id, e.g. \x01 = an annotation (and maybe we’d have other types in the future). Bytes 1 and 2 would be the timestamp offset similar to the data points. Does that sound alright? For the actual annotation data, formatting etc, I’ll hit up the list for another RFC. Thanks! |
|
Ah, so that explains what I saw in the What you propose sounds a lot more reasonable. Let's discuss on the list. I'm wondering whether it would make sense to compact the annotations along with the data or not. Do you have any opinion? PS: It would be great if you could fix the encoding issues in your fork otherwise it's going to be really hard to review the diffs between upstream and your fork to integrate the changes. |
|
Thanks Tsuna. Do you want to start a thread or should I in the list? Since the annotation data will be very variable and “likely” will only have a few entries per row, I don’t know that we would need to compact the annotations. Unless we compacted in a separate cell, i.e. have one for data points and another for all annotations, we would have to come up with a binary protocol to interleave the data with the annotations. You have it setup nicely now where you can split the time out of the qualifiers and separate the values just by length. I fixed the line endings in my latter commits and future pull requests should be clean too :) |
|
I started a thread about annotations on the mailing list. |
|
Cool, thanks! Do you think this patch is alright though, even if we never officially integrate annotations or other types of data in the ‘tsdb’ table? We could still figure out the format for notes later on. |
|
Yeah. What do you think about this one instead: tsuna/opentsdb@491429f – it's more general, it just ignores all KeyValues that don't look good, instead of hardcoding an exception for 3-byte long qualifiers. It also correctly handles 0-byte long qualifiers (which is possible!). And it doesn't require keeping track of which KVs to delete from Note: I didn't test this change yet. |
|
Cool, I’ll take a look and give it a test today. I think that I had to add the delete array because the whole row was passed onto the trivial/complex compaction call. Thanks! |
|
Tested and queries are fine. But compactions blow away the data. Could we use a combo of both patches, the delete list for preserving data in case people run 1.1.0 and 2.0 interfaces? Thanks! hbase(main):002:0> get 'tsdb', "\x00\x02\x3B\x51\x0B\x05\x80\x00\x00\x01\x00\x01\x1B\x00\x00\x03\x00\x04\xBD" COLUMN CELL t:\x00\x17 timestamp=1359676801000, value=\x00\x00\x00\x00\x00\x00/o t:\x018@ timestamp=1359677700000, value={"tsuid":"00023B00000100011B0000030004BD","start_time":1359677700,"end_time":0,"de t:\x03\xD7 timestamp=1359676861000, value=\x00\x00\x00\x00\x00\x00/q t:\x07\x97 timestamp=1359676921000, value=\x00\x00\x00\x00\x00\x00/s t:\x0BW timestamp=1359676981000, value=\x00\x00\x00\x00\x00\x00/u t:\x0F\x17 timestamp=1359677041000, value=\x00\x00\x00\x00\x00\x00/w t:\x12\xD7 timestamp=1359677101000, value=\x00\x00\x00\x00\x00\x00/y t:\x16\x97 timestamp=1359677161000, value=\x00\x00\x00\x00\x00\x00/{ t:\x1AW timestamp=1359677221000, value=\x00\x00\x00\x00\x00\x00/} t:\x1E\x17 timestamp=1359677281000, value=\x00\x00\x00\x00\x00\x00/\x7F t:!\xD7 timestamp=1359677341000, value=\x00\x00\x00\x00\x00\x00/\x81 t:%\x97 timestamp=1359677401000, value=\x00\x00\x00\x00\x00\x00/\x83 t:)W timestamp=1359677461000, value=\x00\x00\x00\x00\x00\x00/\x85 t:-\x17 timestamp=1359677521000, value=\x00\x00\x00\x00\x00\x00/\x87 t:0\xD7 timestamp=1359677581000, value=\x00\x00\x00\x00\x00\x00/\x89 t:4\x97 timestamp=1359677641000, value=\x00\x00\x00\x00\x00\x00/\x8B t:8W timestamp=1359677701000, value=\x00\x00\x00\x00\x00\x00/\x8D t:<\x17 timestamp=1359677761000, value=\x00\x00\x00\x00\x00\x00/\x8F t:?\xD7 timestamp=1359677821000, value=\x00\x00\x00\x00\x00\x00/\x91 t:C\x97 timestamp=1359677881000, value=\x00\x00\x00\x00\x00\x00/\x93 t:GW timestamp=1359677941000, value=\x00\x00\x00\x00\x00\x00/\x95 t:K\x17 timestamp=1359678001000, value=\x00\x00\x00\x00\x00\x00/\x97 t:N\xE7 timestamp=1359678062000, value=\x00\x00\x00\x00\x00\x00/\x99 t:R\xA7 timestamp=1359678122000, value=\x00\x00\x00\x00\x00\x00/\x9B t:VW timestamp=1359678181000, value=\x00\x00\x00\x00\x00\x00/\x9D t:Z\x17 timestamp=1359678241000, value=\x00\x00\x00\x00\x00\x00/\x9F t:]\xD7 timestamp=1359678301000, value=\x00\x00\x00\x00\x00\x00/\xA1 t:a\x97 timestamp=1359678361000, value=\x00\x00\x00\x00\x00\x00/\xA3 t:eW timestamp=1359678421000, value=\x00\x00\x00\x00\x00\x00/\xA5 t:i\x17 timestamp=1359678481000, value=\x00\x00\x00\x00\x00\x00/\xA7 t:l\xD7 timestamp=1359678541000, value=\x00\x00\x00\x00\x00\x00/\xA9 t:p\x97 timestamp=1359678601000, value=\x00\x00\x00\x00\x00\x00/\xAB t:tW timestamp=1359678661000, value=\x00\x00\x00\x00\x00\x00/\xAD t:x\x17 timestamp=1359678721000, value=\x00\x00\x00\x00\x00\x00/\xAF t:{\xD7 timestamp=1359678781000, value=\x00\x00\x00\x00\x00\x00/\xB1 t:\x7F\x97 timestamp=1359678841000, value=\x00\x00\x00\x00\x00\x00/\xB3 t:\x83W timestamp=1359678901000, value=\x00\x00\x00\x00\x00\x00/\xB5 t:\x87\x17 timestamp=1359678961000, value=\x00\x00\x00\x00\x00\x00/\xB7 t:\x8A\xD7 timestamp=1359679021000, value=\x00\x00\x00\x00\x00\x00/\xB9 t:\x8E\x97 timestamp=1359679081000, value=\x00\x00\x00\x00\x00\x00/\xBB t:\x92W timestamp=1359679141000, value=\x00\x00\x00\x00\x00\x00/\xBD t:\x96\x17 timestamp=1359679201000, value=\x00\x00\x00\x00\x00\x00/\xBF t:\x99\xD7 timestamp=1359679261000, value=\x00\x00\x00\x00\x00\x00/\xC1 t:\x9D\x97 timestamp=1359679321000, value=\x00\x00\x00\x00\x00\x00/\xC3 t:\xA1W timestamp=1359679381000, value=\x00\x00\x00\x00\x00\x00/\xC5 t:\xA5\x17 timestamp=1359679441000, value=\x00\x00\x00\x00\x00\x00/\xC7 t:\xA8\xD7 timestamp=1359679501000, value=\x00\x00\x00\x00\x00\x00/\xC9 t:\xAC\x97 timestamp=1359679561000, value=\x00\x00\x00\x00\x00\x00/\xCB t:\xB0W timestamp=1359679621000, value=\x00\x00\x00\x00\x00\x00/\xCD t:\xB4\x17 timestamp=1359679681000, value=\x00\x00\x00\x00\x00\x00/\xCF t:\xB7\xD7 timestamp=1359679741000, value=\x00\x00\x00\x00\x00\x00/\xD1 t:\xBB\x97 timestamp=1359679801000, value=\x00\x00\x00\x00\x00\x00/\xD3 t:\xBFW timestamp=1359679861000, value=\x00\x00\x00\x00\x00\x00/\xD5 t:\xC3\x17 timestamp=1359679921000, value=\x00\x00\x00\x00\x00\x00/\xD7 t:\xC6\xD7 timestamp=1359679981000, value=\x00\x00\x00\x00\x00\x00/\xD9 t:\xCA\x97 timestamp=1359680041000, value=\x00\x00\x00\x00\x00\x00/\xDB t:\xCEW timestamp=1359680101000, value=\x00\x00\x00\x00\x00\x00/\xDD t:\xD2\x17 timestamp=1359680161000, value=\x00\x00\x00\x00\x00\x00/\xDF t:\xD5\xD7 timestamp=1359680221000, value=\x00\x00\x00\x00\x00\x00/\xE1 t:\xD9\x97 timestamp=1359680281000, value=\x00\x00\x00\x00\x00\x00/\xE3 t:\xDDW timestamp=1359680341000, value=\x00\x00\x00\x00\x00\x00/\xE5 61 row(s) in 0.0550 seconds hbase(main):003:0> get 'tsdb', "\x00\x02\x3B\x51\x0B\x05\x80\x00\x00\x01\x00\x01\x1B\x00\x00\x03\x00\x04\xBD" COLUMN CELL t:\x00\x17\x03\xD7\x07\x97\x0BW\x0F\x1 timestamp=1359681204904, value=\x00\x00\x00\x00\x00\x00/o\x00\x00\x00\x00\x00\x00/q\x00\x00\x00\x00\x00\x00/s\x00 7\x12\xD7\x16\x97\x1AW\x1E\x17!\xD7%\x \x00\x00\x00\x00\x00/u\x00\x00\x00\x00\x00\x00/w\x00\x00\x00\x00\x00\x00/y\x00\x00\x00\x00\x00\x00/{\x00\x00\x00\ 97)W-\x170\xD74\x978W<\x17?\xD7C\x97GW x00\x00\x00/}\x00\x00\x00\x00\x00\x00/\x7F\x00\x00\x00\x00\x00\x00/\x81\x00\x00\x00\x00\x00\x00/\x83\x00\x00\x00\ K\x17N\xE7R\xA7VWZ\x17]\xD7a\x97eWi\x1 x00\x00\x00/\x85\x00\x00\x00\x00\x00\x00/\x87\x00\x00\x00\x00\x00\x00/\x89\x00\x00\x00\x00\x00\x00/\x8B\x00\x00\x 7l\xD7p\x97tWx\x17{\xD7\x7F\x97\x83W\x 00\x00\x00\x00/\x8D\x00\x00\x00\x00\x00\x00/\x8F\x00\x00\x00\x00\x00\x00/\x91\x00\x00\x00\x00\x00\x00/\x93\x00\x0 87\x17\x8A\xD7\x8E\x97\x92W\x96\x17\x9 0\x00\x00\x00\x00/\x95\x00\x00\x00\x00\x00\x00/\x97\x00\x00\x00\x00\x00\x00/\x99\x00\x00\x00\x00\x00\x00/\x9B\x00 9\xD7\x9D\x97\xA1W\xA5\x17\xA8\xD7\xAC \x00\x00\x00\x00\x00/\x9D\x00\x00\x00\x00\x00\x00/\x9F\x00\x00\x00\x00\x00\x00/\xA1\x00\x00\x00\x00\x00\x00/\xA3\ \x97\xB0W\xB4\x17\xB7\xD7\xBB\x97\xBFW x00\x00\x00\x00\x00\x00/\xA5\x00\x00\x00\x00\x00\x00/\xA7\x00\x00\x00\x00\x00\x00/\xA9\x00\x00\x00\x00\x00\x00/\x \xC3\x17\xC6\xD7\xCA\x97\xCEW\xD2\x17\ AB\x00\x00\x00\x00\x00\x00/\xAD\x00\x00\x00\x00\x00\x00/\xAF\x00\x00\x00\x00\x00\x00/\xB1\x00\x00\x00\x00\x00\x00 xD5\xD7\xD9\x97\xDDW /\xB3\x00\x00\x00\x00\x00\x00/\xB5\x00\x00\x00\x00\x00\x00/\xB7\x00\x00\x00\x00\x00\x00/\xB9\x00\x00\x00\x00\x00\ 1 row(s) in 0.0290 seconds |
|
Ah, right, forgot about that. My bad. Does it work better with this small extra change? diff --git a/src/core/CompactionQueue.java b/src/core/CompactionQueue.java
index 669dd7b..697d6cb 100644
--- a/src/core/CompactionQueue.java
+++ b/src/core/CompactionQueue.java
@@ -276,7 +276,7 @@ final class CompactionQueue extends ConcurrentSkipListMap<byte[], Boolean> {
short last_delta = -1; // Time delta, extracted from the qualifier.
KeyValue longest = row.get(0); // KV with the longest qualifier.
int longest_idx = 0; // Index of `longest'.
- final int nkvs = row.size();
+ int nkvs = row.size();
for (int i = 0; i < nkvs; i++) {
final KeyValue kv = row.get(i);
final byte[] qual = kv.qualifier();
@@ -292,6 +292,8 @@ final class CompactionQueue extends ConcurrentSkipListMap<byte[], Boolean> {
// or it could be something from a future version of OpenTSDB that
// we're not forward compatible with.
if (len % 2 != 0 || len == 0) {
+ row.remove(i);
+ nkvs--;
continue;
}
trivial = false; |
|
I had some time to test my code. I did the testing manually, but we need some unit tests here, because this stuff is actually more complicated than it seems. So my code above and your code actually both contain a few different bugs. There are several corner cases:
My last attempt is at tsuna/opentsdb@fd09e81 – please let me know what you think. |
|
Oh wow, yeah I completely missed the nrows++ and hadn’t tested the two annotations and one dp, though I had tried it with a compacted cell. Everything looks good to me in your patch but I’ll test today just to verify. Thanks for handling this! |
|
Have you had a chance to test my change? Would like to cut the release soon... |
|
Sorry, swamped with a deadline. I just tested and it worked properly on one and two annotations so run with it :) Thanks! |
|
Alright, so this will go in v1.1.0 then. |
Patch to properly avoid future types of data stored in the tsdb table preparation for 2.0. Annotations will be stored with a 3 byte qualifier in the same row with their associated timeseries. For backwards compatibility, if a user tries 2.0 and decides to downgrade, or run an instance of 1.1 alongside 2.0, they can do so without affecting their data. This patch allows for proper compaction of cell data and leaves future types alone.