Add support for HerdDB database #183

eolivelli · 2019-09-04T15:46:32Z

Master Issue: #177

This patch introduces support for HerdDB as storage:

- new "init database" script
- fix references to the 'timestamp' column in SQL: use the MySQL escape syntax (backticks), because 'timestamp' is a reserved word in HerdDB, which uses Apache Calcite as its SQL parser
- fix the SQL in "delete" statements to make Apache Calcite happy, as it forbids adding to or subtracting from a value of unknown datatype (at SQL parsing time, Calcite types the '?' JDBC parameter as UNKNOWN)
- fix a query with a "GROUP BY" clause that was not grouping all of the expected columns (Apache Calcite treats this as an error)
- add a small example that boots HerdDB in memory-only mode (tested manually on a standalone setup)
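To make the two SQL fixes above concrete, here is a minimal Java sketch. The table name `topics_stats`, the helper names and the "before" form of the delete statement are hypothetical illustrations, not the actual Pulsar Manager mapper code. The reserved column is quoted with MySQL-style backticks, and the retention cutoff is computed in Java so the statement binds a single `?` instead of doing arithmetic on it.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper illustrating the two Calcite-related SQL fixes.
 * Table and column names are illustrative, not Pulsar Manager's real schema.
 */
public final class CalciteFriendlySql {

    // 'timestamp' is reserved for Apache Calcite, so quote it MySQL-style.
    // A backtick inside an identifier is escaped by doubling it.
    static String quote(String identifier) {
        return "`" + identifier.replace("`", "``") + "`";
    }

    // Before (hypothetical): "DELETE FROM t WHERE timestamp < ? - 3600"
    // Calcite cannot type "? - 3600" because '?' is UNKNOWN at parse time.
    // After: compare the column against one precomputed bound parameter.
    static String deleteOlderThanSql(String table) {
        return "DELETE FROM " + table + " WHERE " + quote("timestamp") + " < ?";
    }

    // Computes the value bound to the single '?' parameter.
    static long cutoffMillis(long nowMillis, long retentionSeconds) {
        return nowMillis - TimeUnit.SECONDS.toMillis(retentionSeconds);
    }

    public static void main(String[] args) {
        System.out.println(deleteOlderThanSql("topics_stats"));
        System.out.println(cutoffMillis(10_000L, 3L));
    }
}
```

Binding one precomputed value means Calcite never has to infer the type of an expression built around a parameter.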

The HerdDB server can also be booted inside the application, either for persistent storage or even with replication.
Replication relies on ZooKeeper and BookKeeper.

Side note:

I had to add some Maven repositories to make the build pass locally on my machine.

To test this branch:

- edit the src/main/resources/application.properties file and follow the comments about HerdDB
- run Pulsar Manager as usual

tuteng · 2019-09-05T01:51:39Z

@eolivelli Thank you very much. Is it necessary to create a new herddb-schema.sql file for HerdDB? Is it fully compatible with MySQL?

But I got the following exception when running locally. What do I need to do?

FAILURE: Build failed with an exception.

* What went wrong:
Could not resolve all files for configuration ':compileClasspath'.
> Could not find org.herddb:herddb-jdbc:0.12.0-SNAPSHOT.
  Searched in the following locations:
    - file:/Users/tuteng/.m2/repository/org/herddb/herddb-jdbc/0.12.0-SNAPSHOT/maven-metadata.xml
    - file:/Users/tuteng/.m2/repository/org/herddb/herddb-jdbc/0.12.0-SNAPSHOT/herddb-jdbc-0.12.0-SNAPSHOT.pom
    - file:/Users/tuteng/.m2/repository/org/herddb/herddb-jdbc/0.12.0-SNAPSHOT/herddb-jdbc-0.12.0-SNAPSHOT.jar
    - https://jcenter.bintray.com/org/herddb/herddb-jdbc/0.12.0-SNAPSHOT/maven-metadata.xml
    - https://jcenter.bintray.com/org/herddb/herddb-jdbc/0.12.0-SNAPSHOT/herddb-jdbc-0.12.0-SNAPSHOT.pom
    - https://jcenter.bintray.com/org/herddb/herddb-jdbc/0.12.0-SNAPSHOT/herddb-jdbc-0.12.0-SNAPSHOT.jar
  Required by:
      project :

Do I need to change a version?

eolivelli · 2019-09-05T08:28:55Z

@tuteng thanks for taking a look.
I have reworked the patch and now it seems to work fully.

I am not a user of Pulsar Manager (yet), so I can't tell whether all of the features work as expected.
It would be great if a Pulsar Manager expert could take a look.

cc @sijie

sijie · 2019-09-05T19:49:43Z

@eolivelli - @tuteng is the expert of Pulsar Manager. He can help review your code change.

eolivelli · 2019-09-05T20:54:06Z

Cool.
I will be happy to provide working examples for production-ready setups.

It is still not clear to me how you are deploying this application.
Is all of the application state in the SQL database? Where do you store sessions?
Is there any way to start more than one instance of the backend? I don't know Spring Boot at all.

If we start the DB inside the same process you can still run many instances of the backend with HerdDB.

Another point: as far as I can see one Pulsar Manager is able to connect to more than one Pulsar cluster, but there is a 'bookie' option, what is that for?

If you are managing a single Pulsar cluster, with its ZooKeeper and BookKeeper clusters, you can reuse them to support the HerdDB instance.

For a standalone (single-machine) backend, I think the best HerdDB setup is to run the server inside the backend and provide directories for data, metadata and the journal, without BookKeeper/ZooKeeper.
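The deployment styles discussed above differ mainly in how the backend reaches the database. Below is a minimal Java sketch under stated assumptions: the enum, the helper and the `target` argument are hypothetical, and the `herddb.jdbc.Driver` class name and the `jdbc:herddb:...` URL shapes are assumptions to check against the HerdDB documentation. Only the `spring.datasource.*` keys are standard Spring Boot properties.

```java
import java.util.Properties;

/**
 * Hypothetical sketch: choose Spring Boot datasource settings for the HerdDB
 * deployment styles discussed in this thread. The driver class name and the
 * JDBC URL formats are ASSUMPTIONS to verify against the HerdDB docs.
 */
public final class HerdDbDataSourceSettings {

    enum Mode {
        EMBEDDED,          // server runs inside the backend JVM (single machine)
        REMOTE_SERVER,     // one standalone HerdDB server shared by backends
        ZOOKEEPER_CLUSTER  // replicated setup discovered through ZooKeeper
    }

    // 'target' is ignored for EMBEDDED; otherwise it is host:port or zkhost:port/path.
    static Properties forMode(Mode mode, String target) {
        String url;
        switch (mode) {
            case EMBEDDED:
                url = "jdbc:herddb:local";               // assumed URL form
                break;
            case REMOTE_SERVER:
                url = "jdbc:herddb:server:" + target;    // assumed URL form
                break;
            case ZOOKEEPER_CLUSTER:
                url = "jdbc:herddb:zookeeper:" + target; // assumed URL form
                break;
            default:
                throw new IllegalArgumentException("Unsupported mode: " + mode);
        }
        Properties props = new Properties();
        // Standard Spring Boot datasource keys; the driver class name is assumed.
        props.setProperty("spring.datasource.driver-class-name", "herddb.jdbc.Driver");
        props.setProperty("spring.datasource.url", url);
        return props;
    }

    public static void main(String[] args) {
        Properties p = forMode(Mode.ZOOKEEPER_CLUSTER, "zk1:2181/herd");
        System.out.println(p.getProperty("spring.datasource.url"));
    }
}
```

In the embedded case, the data, metadata and journal directories mentioned above would still have to be configured on the server side; this sketch only covers how the backend connects.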

sijie · 2019-09-05T21:17:17Z

It is still not clear to me how you are deploying this application.

Pulsar Manager consists of a frontend and a backend. The frontend is a Vue.js application that renders the management and monitoring pages. The backend is a Spring Boot application. You can run the frontend and backend together on one machine or in one container.

Is all the state of the application on the sql database?

Currently we store very little information in the SQL database, plus the collected metrics. Most of the REST requests are forwarded to the actual Pulsar brokers.

Is there any way to start more than one instance of the backend?

Yes.

If we start the DB inside the same process you can still run many instances of the backend with HerdDB.

I am not sure how that works. I would imagine HerdDB is more like a MySQL instance, so many backend instances would connect to the same MySQL/HerdDB server?

as far as I can see one Pulsar Manager is able to connect to more than one Pulsar cluster, but there is a 'bookie' option, what is that for?

A Pulsar cluster consists of brokers, bookies, function workers and many other components. The cluster management and monitoring features let you manage and monitor all of these components.

eolivelli · 2019-09-06T16:14:19Z

Thanks for your clarification @sijie.

I will send links to the docs on the available deployment options for HerdDB.

eolivelli · 2019-09-10T12:35:37Z

@tuteng
we are going to release HerdDB 0.12.0-SNAPSHOT soon.
I will then update this patch so that it downloads the HerdDB dependencies from Maven Central.

Do you have time for a review?

tuteng · 2019-09-10T13:24:13Z


Ok, no problem.


eolivelli · 2019-09-12T13:32:38Z

@tuteng this patch is ready for review now.
HerdDB is disabled by default; I kept SQLite as the default.

eolivelli · 2019-09-13T15:33:30Z

@tuteng I did another pass, trying to click on every page of the web application.
I found a bunch of issues and committed the fixes.

eolivelli · 2019-09-13T15:36:15Z

src/main/java/io/streamnative/pulsar/manager/mapper/TopicsStatsMapper.java

                    "topic IN <foreach collection='topicList' item='topic' open='(' separator=',' close=')'> #{topic} </foreach>" +
-            "GROUP BY cluster, persistent, topic" +
+            "GROUP BY environment, cluster, tenant, namespace, persistent, topic, `timestamp` " +


This GROUP BY clause was incomplete, and Apache Calcite rejected it.

I fixed it here streamnative#190
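The rule behind this fix is that every non-aggregated column in the SELECT list must also appear in GROUP BY; Calcite rejects the query otherwise. The hypothetical Java helper below (not part of Pulsar Manager) checks that rule. It assumes, for illustration only, that the non-aggregated SELECT columns are exactly the ones the fixed GROUP BY now lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Illustrative check of the rule Apache Calcite enforces: every non-aggregated
 * column in the SELECT list must also appear in the GROUP BY clause.
 * This helper is hypothetical and not part of Pulsar Manager.
 */
public final class GroupByCheck {

    // Returns the non-aggregated SELECT columns missing from GROUP BY, in SELECT order.
    static List<String> missingFromGroupBy(List<String> nonAggregatedSelect, List<String> groupBy) {
        Set<String> grouped = new HashSet<>(groupBy);
        List<String> missing = new ArrayList<>();
        for (String column : nonAggregatedSelect) {
            if (!grouped.contains(column)) {
                missing.add(column);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed SELECT list: the columns the fixed GROUP BY now lists.
        List<String> select = Arrays.asList(
                "environment", "cluster", "tenant", "namespace", "persistent", "topic", "timestamp");
        List<String> oldGroupBy = Arrays.asList("cluster", "persistent", "topic");
        System.out.println(missingFromGroupBy(select, oldGroupBy)); // columns Calcite would complain about
        System.out.println(missingFromGroupBy(select, select));     // fixed clause: nothing missing
    }
}
```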

eolivelli · 2019-09-14T09:06:11Z

@tuteng HerdDB in standalone, embedded mode (running in the same process as the backend) can handle large databases, as this is its primary production use case in other projects.

You could consider switching to HerdDB instead of SQLite for tests and single-instance deployments.

Supporting PostgreSQL is good, as it is widely used by many companies and its license is more permissive than MySQL's.

HerdDB can also be a good alternative for a replicated (highly available) database: setting up replication is really easy if you already have a ZooKeeper cluster, and you can even reuse the bookies from your Pulsar cluster.

eolivelli · 2019-09-15T16:25:58Z

@sijie I will rebase this patch.

tuteng · 2019-09-16T01:40:27Z

@eolivelli I have done some tests locally and all the features are running well. Thank you very much for your contribution.

However, when I started the service, the log showed the following errors from HerdDB. I don't know whether they have any impact. Can you take a look at them for me?

I will consider using it as a test database. Before that, I will first learn how to use this database.

eolivelli · 2019-09-16T05:31:59Z

The 'ERROR' lines are logged as level SEVERE.
This is expected, although it does not look nice with your logging configuration.
I will file an issue.
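Why a SEVERE record appears as an ERROR line: assuming HerdDB logs through `java.util.logging` and the backend bridges those records into SLF4J/Logback (Spring Boot's default logging setup typically installs such a bridge), the level translation follows thresholds like the ones in this sketch, modeled on the jul-to-slf4j bridge. The class and method names are illustrative.

```java
import java.util.logging.Level;

/**
 * Sketch of how java.util.logging levels map to SLF4J level names, following
 * the thresholds used by the jul-to-slf4j bridge. Assumes (not confirmed in
 * this thread) that HerdDB logs via java.util.logging and that the backend
 * routes those records into SLF4J.
 */
public final class JulLevelMapping {

    static String toSlf4jLevelName(Level level) {
        int value = level.intValue();
        if (value <= Level.FINEST.intValue()) {
            return "TRACE";
        }
        if (value <= Level.FINE.intValue()) {
            return "DEBUG"; // FINER and FINE
        }
        if (value <= Level.INFO.intValue()) {
            return "INFO";  // CONFIG and INFO
        }
        if (value <= Level.WARNING.intValue()) {
            return "WARN";
        }
        return "ERROR";     // SEVERE and above
    }

    public static void main(String[] args) {
        for (Level level : new Level[] {Level.SEVERE, Level.WARNING, Level.INFO, Level.FINE}) {
            System.out.println(level + " -> " + toSlf4jLevelName(level));
        }
    }
}
```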

sijie · 2019-09-16T23:54:27Z

I will consider using it as a test database.

+1, it's a good idea to start using it as a test database. We can promote HerdDB in the documentation as one of the supported databases.

sijie · 2019-09-16T23:58:00Z

@tuteng @eolivelli if this pull request is almost ready, we can probably move forward with it so that we can start using HerdDB in the tests.

eolivelli · 2019-09-17T04:54:19Z

@sijie awesome !

I will rebase today

eolivelli · 2019-09-18T05:13:17Z

Superseded by #194

eolivelli force-pushed the fix/herddb-demo branch 2 times, most recently from 932a339 to 3a563a6 on September 5, 2019 08:06

eolivelli changed the title ~~This is only a playground for trying HerdDB~~ Add support for HerdDB database Sep 5, 2019

eolivelli marked this pull request as ready for review September 10, 2019 15:43

eolivelli requested review from sijie and tuteng as code owners September 10, 2019 15:43

Support for HerdDB as database:

b763af0

- add HerdDB dependency
- change queries in order to be compliant with MySQL dialect of Apache Calcite
- add example configuration for HerdDB

eolivelli force-pushed the fix/herddb-demo branch from d176ee6 to b763af0 on September 12, 2019 13:30

Merge branch 'master' into fix/herddb-demo

2a51b51

Enrico Olivelli added 4 commits September 13, 2019 14:11

Fix schema for HerdDB DDL

c7b39ec

Fix query

8407199

Fix timestamp ref

c7a8927

fix GROUP by

250a417

eolivelli commented Sep 13, 2019

sijie assigned eolivelli Sep 16, 2019

sijie added area/backend type/enhancement labels Sep 16, 2019

sijie added this to the 0.0.2 milestone Sep 16, 2019

sijie mentioned this pull request Sep 18, 2019

Support HerdDB database #194

Merged

eolivelli closed this Sep 18, 2019
