OpenTSDB should reliably operate on a single node #796

IzakMarais · 2016-05-19T05:21:05Z

We run single-node OpenTSDB with HBase writing to the local filesystem (RAID-backed) instead of HDFS when deploying to smaller clusters. OpenTSDB easily handles the ingestion rate (about 7000 dps).
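For reference, a standalone HBase that keeps its data on the local filesystem is usually set up by pointing `hbase.rootdir` at a `file://` URI. The sketch below renders such an `hbase-site.xml`. The paths are hypothetical, and your HBase version may need additional settings. As far as we understand, the HBase reference guide cautions that the local filesystem does not give the same durability guarantees (hflush/hsync) as HDFS, so that section is worth checking for your version.

```python
import xml.etree.ElementTree as ET


def site_xml(props: dict[str, str]) -> str:
    """Render a Hadoop-style *-site.xml <configuration> from name/value pairs."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical paths on the RAID-backed volume; adjust to your layout.
STANDALONE_LOCAL_FS = {
    # A file:// URI makes HBase keep its data and WALs on the local
    # filesystem instead of in HDFS.
    "hbase.rootdir": "file:///data/hbase",
    # Where the ZooKeeper instance started by standalone HBase keeps its data.
    "hbase.zookeeper.property.dataDir": "/data/zookeeper",
}

if __name__ == "__main__":
    print(site_xml(STANDALONE_LOCAL_FS))
```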

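However, we have had repeated file-level corruption problems. Over the last few months, an HBase 'tsdb' region has become stuck in a FAILED_OPEN state five times across our two test systems. The only way I could recover was to delete the region file from disk.
[image: regions_in_transition]
https://cloud.githubusercontent.com/assets/8266640/15383276/2e923a2c-1d92-11e6-82c5-92d521ceae62.PNG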

Is there something we can improve in our setup to avoid these errors? I am thinking about moving to HDFS. Is it possible, and worthwhile, to run a single-node HDFS (with multiple JBOD disks for reliability)?
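A single-node ("pseudo-distributed") HDFS is possible. Below is a minimal sketch of the relevant `hdfs-site.xml` settings, assuming three hypothetical JBOD mount points; HBase's `hbase.rootdir` would then point at an `hdfs://` URI instead of `file://`. The sketch also makes one trade-off visible: HDFS stores at most one replica of a block per DataNode, so with a single node `dfs.replication` can only usefully be 1, and the extra disks add capacity rather than redundancy. `dfs.datanode.failed.volumes.tolerated` only keeps the DataNode running after a disk fails; the blocks on that disk are still lost.

```python
import xml.etree.ElementTree as ET


def site_xml(props: dict[str, str]) -> str:
    """Render a Hadoop-style *-site.xml <configuration> from name/value pairs."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def single_node_hdfs(data_dirs: list[str]) -> dict[str, str]:
    """hdfs-site.xml settings for one NameNode plus one DataNode on JBOD disks."""
    return {
        # One directory per physical disk; the DataNode spreads blocks across them.
        "dfs.datanode.data.dir": ",".join(f"file://{d}" for d in data_dirs),
        # With a single DataNode only one replica per block can be placed.
        "dfs.replication": "1",
        # Keep the DataNode up if one disk fails (that disk's blocks are still lost).
        "dfs.datanode.failed.volumes.tolerated": "1",
    }


# Hypothetical JBOD mount points.
JBOD_DIRS = ["/mnt/disk1/hdfs/data", "/mnt/disk2/hdfs/data", "/mnt/disk3/hdfs/data"]

if __name__ == "__main__":
    print(site_xml(single_node_hdfs(JBOD_DIRS)))
```

If reliability against a single disk failure is the goal, RAID underneath a single HDFS data directory is the more direct fit on one machine.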

kev009 · 2016-05-19T06:48:07Z

OpenTSDB has nothing to do with that error.

You're going to have to take a look in the HBase log and see what caused the region error. Is it splitting when this happens? How much heap memory do you have for the HBase master and regionserver?
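As a starting point for that log search, here is a small sketch that keeps the lines mentioning a failed open, an exception, or a split, and tallies the exception class names it finds. The default log path and the match patterns are assumptions; HBase log file names depend on the installation and the user running it.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Hypothetical default; the real file name depends on the install and user.
DEFAULT_LOG = "/var/log/hbase/hbase-regionserver.log"

# Case-insensitive: failed opens (FAILED_OPEN or "failed open"), exceptions, splits.
INTERESTING = re.compile(r"failed[_ ]open|exception|split", re.IGNORECASE)
# Java-style class names ending in Exception or Error, e.g. java.io.IOException.
THROWABLE = re.compile(r"\b([\w.]+(?:Exception|Error))\b")


def suspicious_lines(lines: Iterable[str]) -> list[str]:
    """Keep only the lines that match INTERESTING, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if INTERESTING.search(line)]


def count_throwables(lines: Iterable[str]) -> Counter:
    """Count the first exception/error class name found on each line."""
    counts: Counter = Counter()
    for line in lines:
        match = THROWABLE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def main(path: str) -> None:
    try:
        with open(path, encoding="utf-8", errors="replace") as log:
            hits = suspicious_lines(log)
    except OSError as err:
        print(f"cannot read {path}: {err}", file=sys.stderr)
        return
    for hit in hits:
        print(hit)
    for name, n in count_throwables(hits).most_common():
        print(f"{n:6d}  {name}")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else DEFAULT_LOG)
```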

johann8384 · 2016-05-19T15:14:53Z

Thanks for opening the issue, we'll use this as a placeholder for a single-node mode for OpenTSDB.

nickman · 2016-05-19T21:08:38Z

We had similar issues. Initially, I grudgingly accepted a virtual server (VMware) to run HBase. We had a ton of disk and 8 GB of RAM. After we saw similar problems (usually after a failed compaction), we switched to physical hardware and it has been humming ever since.

You're not on a virtual machine, are you?

vitorboschi · 2016-06-02T18:41:54Z

We had a similar setup here for about a year and there were no issues. Running on a physical machine too.

IzakMarais · 2016-06-03T07:54:36Z

We are running on physical hardware. (I see my previous email reply hasn't made it onto GitHub.)

We have a 1 TB drive and the default 12 GB of RegionServer RAM, which should be enough according to this article.

If I recall correctly, it appeared during HBase startup, i.e. after restarting either the host or just the HBase RegionServer service: the region in question wouldn't come back online.

Our first machine has been running for more than a year and has triggered it twice. The second one triggered it three times in a couple of months; on the second machine there were also other processes contending for CPU and disk access.
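The figures in this thread also allow a rough capacity check on the disk, separate from the heap question. The sketch below takes the ~7000 dps ingest rate and the 1 TB drive and computes points per day and per year, plus the on-disk bytes per point at which the drive would fill in a year. Reading 1 TB as 10^12 bytes and ignoring compaction and RAID overhead are assumptions.

```python
# Inputs taken from this thread: ~7000 data points per second, a 1 TB drive.
DPS = 7_000
SECONDS_PER_DAY = 86_400
DISK_BYTES = 10**12  # assumption: 1 TB read as 10**12 bytes (decimal)


def points_per_day(dps: int) -> int:
    """Data points written in one day at a constant ingest rate."""
    return dps * SECONDS_PER_DAY


def bytes_per_point_budget(disk_bytes: int, dps: int, days: int) -> float:
    """On-disk bytes per point at which the disk would be full after `days` days."""
    return disk_bytes / (points_per_day(dps) * days)


if __name__ == "__main__":
    print(f"points/day:  {points_per_day(DPS):,}")
    print(f"points/year: {points_per_day(DPS) * 365:,}")
    budget = bytes_per_point_budget(DISK_BYTES, DPS, 365)
    print(f"budget:      {budget:.2f} bytes/point for one year")
```

With these inputs it reports 604,800,000 points per day and a budget of roughly 4.5 bytes per point for a year; whether that is comfortable depends on the actual stored size per point, which this sketch does not assume.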

johann8384 added enhancement feature request and removed enhancement labels May 19, 2016

johann8384 added this to the v2.4.0 milestone Jul 6, 2016

johann8384 modified the milestones: v2.5.0, 2.6.0 Oct 26, 2020
