Skip to content

2.0 Forwards TSDB Table Compatability#150

Closed
manolama wants to merge 1 commit into
OpenTSDB:masterfrom
manolama:master
Closed

2.0 Forwards TSDB Table Compatability#150
manolama wants to merge 1 commit into
OpenTSDB:masterfrom
manolama:master

Conversation

@manolama

Copy link
Copy Markdown
Member

Patch to properly avoid future types of data stored in the tsdb table preparation for 2.0. Annotations will be stored with a 3 byte qualifier in the same row with their associated timeseries. For backwards compatibility, if a user tries 2.0 and decides to downgrade, or run an instance of 1.1 alongside 2.0, they can do so without affecting their data. This patch allows for proper compaction of cell data and leaves future types alone.

… in preparation for 2.0. Annotations will be stored with a 3 byte qualifier in the same row with their associated timeseries. For backwards compatibility, if a user tries 2.0 and decides to downgrade, or run an instance of 1.1 alongside 2.0, they can do so without affecting their data. This patch allows for proper compaction of cell data and leaves future types alone.
@tsuna

tsuna commented Jan 30, 2013

Copy link
Copy Markdown
Member

Hey Chris,
can you clarify how annotations are stored exactly? I glanced over manolama/opentsdb@4059dfb where you integrated Peter's code (although it's weird because some files look like they got removed and re-added with the same contents, I'm guessing there is an encoding issue, possibly different line endings, causing those massive spurious diffs) and from what I can tell the annotations are stored in a separate column family, and the column qualifier is still an offset in seconds, and the value is a JSON encoded annotation.

Did I get this right? If yes, why not do the filtering based on the family then?

Can we maybe talk, preferably on the list, about how to implement support for annotations? I don't feel great about adding a second family, because in case you didn't know, families are stored in different HFiles by HBase, and thus every read that misses the BlockCache has to go through all the HFiles. So adding a second column family essentially doubles the number of IOPS that need to be done during reads. OpenTSDB's read performance is already not as good as I'd like it to be, so it's kind of a big deal if we make reads twice more expensive in IOPS. We're probably better off storing the annotations in-line with the data.

What do you think? Am I misunderstanding something?

@manolama

Copy link
Copy Markdown
Member Author

Hi Benoit,

That was probably my first attempt and since I dev on Windows, I had all of those encoding issues J I saw your comments about not storing the annotations in a different family so I tweaked my code to use the same ‘t’ cf as the data points. Making it “compaction” compatible was a little bit of a pain but not too bad.

Since individual data points have a qualifier of 2 bytes and compacted cells have a qualifier of 4 bytes or greater, I figured we could make the annotation qualifier 3 bytes where byte [0] would be a type id, e.g. \x01 = an annotation (and maybe we’d have other types in the future). Bytes 1 and 2 would be the timestamp offset similar to the data points.

Does that sound alright? For the actual annotation data, formatting etc, I’ll hit up the list for another RFC. Thanks!

@tsuna

tsuna commented Jan 30, 2013

Copy link
Copy Markdown
Member

Ah, so that explains what I saw in the CompactionQueue. I was under the impression that you were still using a separate column family, and hence was confused as to why you needed to ignore annotations by looking at the qualifier in the compaction code.

What you propose sounds a lot more reasonable. Let's discuss on the list. I'm wondering whether it would make sense to compact the annotations along with the data or not. Do you have any opinion?

PS: It would be great if you could fix the encoding issues in your fork otherwise it's going to be really hard to review the diffs between upstream and your fork to integrate the changes.

@manolama

Copy link
Copy Markdown
Member Author

Thanks Tsuna. Do you want to start a thread or should I in the list?

Since the annotation data will be very variable and “likely” will only have a few entries per row, I don’t know that we would need to compact the annotations. Unless we compacted in a separate cell, i.e. have one for data points and another for all annotations, we would have to come up with a binary protocol to interleave the data with the annotations. You have it setup nicely now where you can split the time out of the qualifiers and separate the values just by length.

I fixed the line endings in my latter commits and future pull requests should be clean too :)

@tsuna

tsuna commented Jan 30, 2013

Copy link
Copy Markdown
Member

I started a thread about annotations on the mailing list.

@manolama

Copy link
Copy Markdown
Member Author

Cool, thanks!

Do you think this patch is alright though, even if we never officially integrate annotations or other types of data in the ‘tsdb’ table? We could still figure out the format for notes later on.

@tsuna

tsuna commented Jan 31, 2013

Copy link
Copy Markdown
Member

Yeah. What do you think about this one instead: tsuna/opentsdb@491429f – it's more general, it just ignores all KeyValues that don't look good, instead of hardcoding an exception for 3-byte long qualifiers. It also correctly handles 0-byte long qualifiers (which is possible!). And it doesn't require keeping track of which KVs to delete from row.

Note: I didn't test this change yet.

@manolama

Copy link
Copy Markdown
Member Author

Cool, I’ll take a look and give it a test today. I think that I had to add the delete array because the whole row was passed onto the trivial/complex compaction call. Thanks!

@manolama

manolama commented Feb 1, 2013

Copy link
Copy Markdown
Member Author

Tested and queries are fine. But compactions blow away the data. Could we use a combo of both patches, the delete list for preserving data in case people run 1.1.0 and 2.0 interfaces? Thanks!

hbase(main):002:0> get 'tsdb', "\x00\x02\x3B\x51\x0B\x05\x80\x00\x00\x01\x00\x01\x1B\x00\x00\x03\x00\x04\xBD"

COLUMN CELL

t:\x00\x17 timestamp=1359676801000, value=\x00\x00\x00\x00\x00\x00/o

t:\x018@ timestamp=1359677700000, value={"tsuid":"00023B00000100011B0000030004BD","start_time":1359677700,"end_time":0,"de

                                    scription":"Put Event 3","notes":"These would be details about the event, the description is just a summary","cus

                                    tom":{"owner":"clarsen","dept":"ops"}}

t:\x03\xD7 timestamp=1359676861000, value=\x00\x00\x00\x00\x00\x00/q

t:\x07\x97 timestamp=1359676921000, value=\x00\x00\x00\x00\x00\x00/s

t:\x0BW timestamp=1359676981000, value=\x00\x00\x00\x00\x00\x00/u

t:\x0F\x17 timestamp=1359677041000, value=\x00\x00\x00\x00\x00\x00/w

t:\x12\xD7 timestamp=1359677101000, value=\x00\x00\x00\x00\x00\x00/y

t:\x16\x97 timestamp=1359677161000, value=\x00\x00\x00\x00\x00\x00/{

t:\x1AW timestamp=1359677221000, value=\x00\x00\x00\x00\x00\x00/}

t:\x1E\x17 timestamp=1359677281000, value=\x00\x00\x00\x00\x00\x00/\x7F

t:!\xD7 timestamp=1359677341000, value=\x00\x00\x00\x00\x00\x00/\x81

t:%\x97 timestamp=1359677401000, value=\x00\x00\x00\x00\x00\x00/\x83

t:)W timestamp=1359677461000, value=\x00\x00\x00\x00\x00\x00/\x85

t:-\x17 timestamp=1359677521000, value=\x00\x00\x00\x00\x00\x00/\x87

t:0\xD7 timestamp=1359677581000, value=\x00\x00\x00\x00\x00\x00/\x89

t:4\x97 timestamp=1359677641000, value=\x00\x00\x00\x00\x00\x00/\x8B

t:8W timestamp=1359677701000, value=\x00\x00\x00\x00\x00\x00/\x8D

t:<\x17 timestamp=1359677761000, value=\x00\x00\x00\x00\x00\x00/\x8F

t:?\xD7 timestamp=1359677821000, value=\x00\x00\x00\x00\x00\x00/\x91

t:C\x97 timestamp=1359677881000, value=\x00\x00\x00\x00\x00\x00/\x93

t:GW timestamp=1359677941000, value=\x00\x00\x00\x00\x00\x00/\x95

t:K\x17 timestamp=1359678001000, value=\x00\x00\x00\x00\x00\x00/\x97

t:N\xE7 timestamp=1359678062000, value=\x00\x00\x00\x00\x00\x00/\x99

t:R\xA7 timestamp=1359678122000, value=\x00\x00\x00\x00\x00\x00/\x9B

t:VW timestamp=1359678181000, value=\x00\x00\x00\x00\x00\x00/\x9D

t:Z\x17 timestamp=1359678241000, value=\x00\x00\x00\x00\x00\x00/\x9F

t:]\xD7 timestamp=1359678301000, value=\x00\x00\x00\x00\x00\x00/\xA1

t:a\x97 timestamp=1359678361000, value=\x00\x00\x00\x00\x00\x00/\xA3

t:eW timestamp=1359678421000, value=\x00\x00\x00\x00\x00\x00/\xA5

t:i\x17 timestamp=1359678481000, value=\x00\x00\x00\x00\x00\x00/\xA7

t:l\xD7 timestamp=1359678541000, value=\x00\x00\x00\x00\x00\x00/\xA9

t:p\x97 timestamp=1359678601000, value=\x00\x00\x00\x00\x00\x00/\xAB

t:tW timestamp=1359678661000, value=\x00\x00\x00\x00\x00\x00/\xAD

t:x\x17 timestamp=1359678721000, value=\x00\x00\x00\x00\x00\x00/\xAF

t:{\xD7 timestamp=1359678781000, value=\x00\x00\x00\x00\x00\x00/\xB1

t:\x7F\x97 timestamp=1359678841000, value=\x00\x00\x00\x00\x00\x00/\xB3

t:\x83W timestamp=1359678901000, value=\x00\x00\x00\x00\x00\x00/\xB5

t:\x87\x17 timestamp=1359678961000, value=\x00\x00\x00\x00\x00\x00/\xB7

t:\x8A\xD7 timestamp=1359679021000, value=\x00\x00\x00\x00\x00\x00/\xB9

t:\x8E\x97 timestamp=1359679081000, value=\x00\x00\x00\x00\x00\x00/\xBB

t:\x92W timestamp=1359679141000, value=\x00\x00\x00\x00\x00\x00/\xBD

t:\x96\x17 timestamp=1359679201000, value=\x00\x00\x00\x00\x00\x00/\xBF

t:\x99\xD7 timestamp=1359679261000, value=\x00\x00\x00\x00\x00\x00/\xC1

t:\x9D\x97 timestamp=1359679321000, value=\x00\x00\x00\x00\x00\x00/\xC3

t:\xA1W timestamp=1359679381000, value=\x00\x00\x00\x00\x00\x00/\xC5

t:\xA5\x17 timestamp=1359679441000, value=\x00\x00\x00\x00\x00\x00/\xC7

t:\xA8\xD7 timestamp=1359679501000, value=\x00\x00\x00\x00\x00\x00/\xC9

t:\xAC\x97 timestamp=1359679561000, value=\x00\x00\x00\x00\x00\x00/\xCB

t:\xB0W timestamp=1359679621000, value=\x00\x00\x00\x00\x00\x00/\xCD

t:\xB4\x17 timestamp=1359679681000, value=\x00\x00\x00\x00\x00\x00/\xCF

t:\xB7\xD7 timestamp=1359679741000, value=\x00\x00\x00\x00\x00\x00/\xD1

t:\xBB\x97 timestamp=1359679801000, value=\x00\x00\x00\x00\x00\x00/\xD3

t:\xBFW timestamp=1359679861000, value=\x00\x00\x00\x00\x00\x00/\xD5

t:\xC3\x17 timestamp=1359679921000, value=\x00\x00\x00\x00\x00\x00/\xD7

t:\xC6\xD7 timestamp=1359679981000, value=\x00\x00\x00\x00\x00\x00/\xD9

t:\xCA\x97 timestamp=1359680041000, value=\x00\x00\x00\x00\x00\x00/\xDB

t:\xCEW timestamp=1359680101000, value=\x00\x00\x00\x00\x00\x00/\xDD

t:\xD2\x17 timestamp=1359680161000, value=\x00\x00\x00\x00\x00\x00/\xDF

t:\xD5\xD7 timestamp=1359680221000, value=\x00\x00\x00\x00\x00\x00/\xE1

t:\xD9\x97 timestamp=1359680281000, value=\x00\x00\x00\x00\x00\x00/\xE3

t:\xDDW timestamp=1359680341000, value=\x00\x00\x00\x00\x00\x00/\xE5

61 row(s) in 0.0550 seconds

hbase(main):003:0> get 'tsdb', "\x00\x02\x3B\x51\x0B\x05\x80\x00\x00\x01\x00\x01\x1B\x00\x00\x03\x00\x04\xBD"

COLUMN CELL

t:\x00\x17\x03\xD7\x07\x97\x0BW\x0F\x1 timestamp=1359681204904, value=\x00\x00\x00\x00\x00\x00/o\x00\x00\x00\x00\x00\x00/q\x00\x00\x00\x00\x00\x00/s\x00

7\x12\xD7\x16\x97\x1AW\x1E\x17!\xD7%\x \x00\x00\x00\x00\x00/u\x00\x00\x00\x00\x00\x00/w\x00\x00\x00\x00\x00\x00/y\x00\x00\x00\x00\x00\x00/{\x00\x00\x00\

97)W-\x170\xD74\x978W<\x17?\xD7C\x97GW x00\x00\x00/}\x00\x00\x00\x00\x00\x00/\x7F\x00\x00\x00\x00\x00\x00/\x81\x00\x00\x00\x00\x00\x00/\x83\x00\x00\x00\

K\x17N\xE7R\xA7VWZ\x17]\xD7a\x97eWi\x1 x00\x00\x00/\x85\x00\x00\x00\x00\x00\x00/\x87\x00\x00\x00\x00\x00\x00/\x89\x00\x00\x00\x00\x00\x00/\x8B\x00\x00\x

7l\xD7p\x97tWx\x17{\xD7\x7F\x97\x83W\x 00\x00\x00\x00/\x8D\x00\x00\x00\x00\x00\x00/\x8F\x00\x00\x00\x00\x00\x00/\x91\x00\x00\x00\x00\x00\x00/\x93\x00\x0

87\x17\x8A\xD7\x8E\x97\x92W\x96\x17\x9 0\x00\x00\x00\x00/\x95\x00\x00\x00\x00\x00\x00/\x97\x00\x00\x00\x00\x00\x00/\x99\x00\x00\x00\x00\x00\x00/\x9B\x00

9\xD7\x9D\x97\xA1W\xA5\x17\xA8\xD7\xAC \x00\x00\x00\x00\x00/\x9D\x00\x00\x00\x00\x00\x00/\x9F\x00\x00\x00\x00\x00\x00/\xA1\x00\x00\x00\x00\x00\x00/\xA3\

\x97\xB0W\xB4\x17\xB7\xD7\xBB\x97\xBFW x00\x00\x00\x00\x00\x00/\xA5\x00\x00\x00\x00\x00\x00/\xA7\x00\x00\x00\x00\x00\x00/\xA9\x00\x00\x00\x00\x00\x00/\x

\xC3\x17\xC6\xD7\xCA\x97\xCEW\xD2\x17\ AB\x00\x00\x00\x00\x00\x00/\xAD\x00\x00\x00\x00\x00\x00/\xAF\x00\x00\x00\x00\x00\x00/\xB1\x00\x00\x00\x00\x00\x00

xD5\xD7\xD9\x97\xDDW /\xB3\x00\x00\x00\x00\x00\x00/\xB5\x00\x00\x00\x00\x00\x00/\xB7\x00\x00\x00\x00\x00\x00/\xB9\x00\x00\x00\x00\x00\

                                    x00/\xBB\x00\x00\x00\x00\x00\x00/\xBD\x00\x00\x00\x00\x00\x00/\xBF\x00\x00\x00\x00\x00\x00/\xC1\x00\x00\x00\x00\x

                                    00\x00/\xC3\x00\x00\x00\x00\x00\x00/\xC5\x00\x00\x00\x00\x00\x00/\xC7\x00\x00\x00\x00\x00\x00/\xC9\x00\x00\x00\x0

                                    0\x00\x00/\xCB\x00\x00\x00\x00\x00\x00/\xCD\x00\x00\x00\x00\x00\x00/\xCF\x00\x00\x00\x00\x00\x00/\xD1\x00\x00\x00

                                    \x00\x00\x00/\xD3\x00\x00\x00\x00\x00\x00/\xD5\x00\x00\x00\x00\x00\x00/\xD7\x00\x00\x00\x00\x00\x00/\xD9\x00\x00\

                                    x00\x00\x00\x00/\xDB\x00\x00\x00\x00\x00\x00/\xDD\x00\x00\x00\x00\x00\x00/\xDF\x00\x00\x00\x00\x00\x00/\xE1\x00\x

                                    00\x00\x00\x00\x00/\xE3\x00\x00\x00\x00\x00\x00/\xE5\x00

1 row(s) in 0.0290 seconds

@tsuna

tsuna commented Feb 2, 2013

Copy link
Copy Markdown
Member

Ah, right, forgot about that. My bad. Does it work better with this small extra change?

diff --git a/src/core/CompactionQueue.java b/src/core/CompactionQueue.java
index 669dd7b..697d6cb 100644
--- a/src/core/CompactionQueue.java
+++ b/src/core/CompactionQueue.java
@@ -276,7 +276,7 @@ final class CompactionQueue extends ConcurrentSkipListMap<byte[], Boolean> {
       short last_delta = -1;  // Time delta, extracted from the qualifier.
       KeyValue longest = row.get(0);  // KV with the longest qualifier.
       int longest_idx = 0;            // Index of `longest'.
-      final int nkvs = row.size();
+      int nkvs = row.size();
       for (int i = 0; i < nkvs; i++) {
         final KeyValue kv = row.get(i);
         final byte[] qual = kv.qualifier();
@@ -292,6 +292,8 @@ final class CompactionQueue extends ConcurrentSkipListMap<byte[], Boolean> {
           // or it could be something from a future version of OpenTSDB that
           // we're not forward compatible with.
           if (len % 2 != 0 || len == 0) {
+            row.remove(i);
+            nkvs--;
             continue;
           }
           trivial = false;

@tsuna

tsuna commented Feb 4, 2013

Copy link
Copy Markdown
Member

I had some time to test my code. I did the testing manually, but we need some unit tests here, because this stuff is actually more complicated than it seems.

So my code above and your code actually both contain a few different bugs. There are several corner cases:

  • Row containing just an annotation. Your code was handling this almost correctly, with the small exception that it was incorrectly incrementing nrows++ in TsdbQuery even though the row was ignored because it didn't contain any actual datapoint.
  • Row containing just two or more annotations. In this case the row ends up empty after we ignore all annotations. Your code was only handling the case where we're left with row.size() == 1.
  • Row containing only one data point and one or more annotations. In this case we end up with just the data point. You attempted to correctly handle this case by returning that data point (because a row containing a single data point is compacted by definition), however there were two bugs: unconditionally setting compacted[0] even though compacted may be null, and not handling the case where the data point that's left is an old, incorrectly encoded floating point value that needs to be fixed. Since the compact() method already has special code at the beginning to handle all of this correctly, I'm just calling compact() recursively at this point.
  • The code needs to properly ignore KeyValues that have an empty column qualifier.

My last attempt is at tsuna/opentsdb@fd09e81 – please let me know what you think.

@manolama

manolama commented Feb 4, 2013

Copy link
Copy Markdown
Member Author

Oh wow, yeah I completely missed the nrows++ and hadn’t tested the two annotations and one dp, though I had tried it with a compacted cell. Everything looks good to me in your patch but I’ll test today just to verify. Thanks for handling this!

@tsuna

tsuna commented Feb 6, 2013

Copy link
Copy Markdown
Member

Have you had a chance to test my change? Would like to cut the release soon...

@manolama

manolama commented Feb 7, 2013

Copy link
Copy Markdown
Member Author

Sorry, swamped with a deadline. I just tested and it worked properly on one and two annotations so run with it :) Thanks!

@ghost ghost assigned tsuna Feb 7, 2013
@tsuna

tsuna commented Feb 7, 2013

Copy link
Copy Markdown
Member

Alright, so this will go in v1.1.0 then.

@manolama manolama closed this Feb 7, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants