Add support for HerdDB database #183

Closed
wants to merge 6 commits into from

Contributor

@eolivelli eolivelli commented Sep 4, 2019

Master Issue: #177

This patch introduces support for HerdDB as storage:

  • new "init database" script
  • fix references to the 'timestamp' column in SQL: use MySQL escape syntax (backticks), since 'timestamp' is a reserved word in HerdDB (which in turn uses Apache Calcite as its SQL parser)
  • fix SQL in "delete" statements to make Apache Calcite happy, as it forbids adding or subtracting values of unknown data type (the '?' JDBC parameter is of type UNKNOWN for Calcite at SQL parsing time)
  • fixed a query with a "GROUP BY" clause that was not grouping all of the expected columns (Apache Calcite treats that fact as an error)
  • added a little example to boot HerdDB in memory-only mode (tested manually on a standalone setup)
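
A minimal Java sketch of the 'timestamp' and "delete" bullets above, for illustration only. The table name `stats_sample`, the class and the method names are hypothetical and do not come from this patch, and the patch itself may solve the problem differently. The sketch quotes `timestamp` with backticks and computes the cutoff in Java, so the SQL text contains no arithmetic on a '?' parameter.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class DeleteCutoffSketch {
    // Hypothetical table name. `timestamp` is back-quoted because it is a
    // reserved word for the SQL parser.
    static final String DELETE_OLD_ROWS =
        "DELETE FROM stats_sample WHERE `timestamp` < ?";

    // Do the subtraction in Java instead of writing "? - ?" in SQL, where the
    // parser cannot know the type of the parameters.
    static long cutoff(long nowSeconds, long retentionSeconds) {
        return Math.subtractExact(nowSeconds, retentionSeconds);
    }

    // Binds the precomputed cutoff as a single, explicitly typed parameter.
    static int deleteOlderThan(Connection conn, long nowSeconds, long retentionSeconds)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(DELETE_OLD_ROWS)) {
            ps.setLong(1, cutoff(nowSeconds, retentionSeconds));
            return ps.executeUpdate();
        }
    }
}
```

The design choice is simply to move the arithmetic out of the statement; an explicit SQL cast on the parameter might be another option, but that is not verified here.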

It is possible to boot the server inside the application for persistent storage or even for replication.
In case of replication it uses ZooKeeper and BookKeeper.

Side note:

  • I had to add some "Maven repositories" to make the build pass locally on my machine

In order to test this branch:

  1. change the src/main/resources/application.properties file and check for comments about HerdDB
  2. run pulsar manager as usual

@tuteng
Member

tuteng commented Sep 5, 2019

@eolivelli Thank you very much. Is it necessary to create a new file herddb-schema.sql for HerdDB? Is it fully compatible with MySQL?

However, I got the following exception when running it locally. What do I need to do?

FAILURE: Build failed with an exception.

* What went wrong:
Could not resolve all files for configuration ':compileClasspath'.
> Could not find org.herddb:herddb-jdbc:0.12.0-SNAPSHOT.
  Searched in the following locations:
    - file:/Users/tuteng/.m2/repository/org/herddb/herddb-jdbc/0.12.0-SNAPSHOT/maven-metadata.xml
    - file:/Users/tuteng/.m2/repository/org/herddb/herddb-jdbc/0.12.0-SNAPSHOT/herddb-jdbc-0.12.0-SNAPSHOT.pom
    - file:/Users/tuteng/.m2/repository/org/herddb/herddb-jdbc/0.12.0-SNAPSHOT/herddb-jdbc-0.12.0-SNAPSHOT.jar
    - https://jcenter.bintray.com/org/herddb/herddb-jdbc/0.12.0-SNAPSHOT/maven-metadata.xml
    - https://jcenter.bintray.com/org/herddb/herddb-jdbc/0.12.0-SNAPSHOT/herddb-jdbc-0.12.0-SNAPSHOT.pom
    - https://jcenter.bintray.com/org/herddb/herddb-jdbc/0.12.0-SNAPSHOT/herddb-jdbc-0.12.0-SNAPSHOT.jar
  Required by:
      project :

Do I need to change a version?

@eolivelli eolivelli force-pushed the fix/herddb-demo branch 2 times, most recently from 932a339 to 3a563a6 September 5, 2019 08:06
@eolivelli
Contributor Author

@tuteng thanks for taking a look.
I have reworked the patch and now it seems to work 100%.

I am not a user (yet) of Pulsar Manager, so I can't tell whether all of the features work as expected.
It would be great if a Pulsar Manager expert could take a look.

cc @sijie

@eolivelli eolivelli changed the title from "This is only a playground for trying HerdDB" to "Add support for HerdDB database" Sep 5, 2019
@sijie
Member

sijie commented Sep 5, 2019

@eolivelli - @tuteng is the expert on Pulsar Manager. He can help review your code change.

@eolivelli
Contributor Author

Cool.
I will be happy to provide working examples for production-ready setups.

It is still not clear to me how you are deploying this application.
Is all the state of the application in the SQL database? Where do you store sessions?
Is there any way to start more than one instance of the backend? I don't know Spring Boot at all.

If we start the DB inside the same process you can still run many instances of the backend with HerdDB.

Another point: as far as I can see one Pulsar Manager is able to connect to more than one Pulsar cluster, but there is a 'bookie' option, what is that for?

If you are managing a single Pulsar cluster, with a ZK cluster and a BK cluster, you can use those to support the HerdDB instance.

I think the best setup for a standalone (single-machine) backend with HerdDB is to run the server inside the backend and provide a directory for data, metadata and journal, without BookKeeper/ZK.

@sijie
Member

sijie commented Sep 5, 2019

> It is still not clear to me how you are deploying this application.

Pulsar Manager consists of a frontend and a backend. The frontend is a Vue.js application that renders the management and monitoring pages. The backend is a Spring Boot application. You can run the frontend and backend together on one machine or in one container.

> Is all the state of the application in the SQL database?

Currently we store very little information in the SQL database, plus the collected metrics. Most of the RESTful requests are forwarded to the real Pulsar brokers.

> Is there any way to start more than one instance of the backend?

Yes.

> If we start the DB inside the same process you can still run many instances of the backend with HerdDB.

I am not sure how that works. I would imagine HerdDB is more like a MySQL instance. Would many backend instances then connect to the MySQL / HerdDB instance?

> as far as I can see one Pulsar Manager is able to connect to more than one Pulsar cluster, but there is a 'bookie' option, what is that for?

A Pulsar cluster comprises brokers, bookies, function workers and many other components. The cluster management and monitoring provides the capability to manage and monitor all of these components.

@eolivelli
Contributor Author

Thanks for your clarification @sijie.

I will send links to the docs for the available HerdDB deployment options.

@eolivelli
Contributor Author

@tuteng
we are going to release HerdDB 0.12.0-SNAPSHOT soon.
I will update this patch and it will download HerdDB dependencies from Maven Central.

Do you have time for a review?

@tuteng
Member

tuteng commented Sep 10, 2019

> @tuteng
> we are going to release HerdDB 0.12.0-SNAPSHOT soon.
> I will update this patch and it will download HerdDB dependencies from Maven Central.
>
> Do you have time for a review?

Ok, no problem.

@eolivelli eolivelli marked this pull request as ready for review September 10, 2019 15:43
- add HerdDB dependency
- change queries in order to be compliant with MySQL dialect of Apache Calcite
- add example configuration for HerdDB
@eolivelli
Contributor Author

@tuteng this patch is ready for review now.
HerdDB is disabled by default; I kept SQLite.

@eolivelli
Contributor Author

@tuteng I did another pass, trying to click on every page of the web application.
I found a bunch of issues and committed the fixes.

"topic IN <foreach collection='topicList' item='topic' open='(' separator=',' close=')'> #{topic} </foreach>" +
"GROUP BY cluster, persistent, topic" +
"GROUP BY environment, cluster, tenant, namespace, persistent, topic, `timestamp` " +
Contributor Author

this "group by" clause was incomplete and Apache Calcite wasn't happy

Member

@tuteng tuteng Sep 14, 2019

I fixed it here: streamnative#190
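
To illustrate the rule that Calcite enforces on the "GROUP BY" clause discussed in this review thread, here is a small Java sketch that lists the selected, non-aggregated columns missing from a GROUP BY clause. It is a hypothetical helper, not part of the patch, and it assumes the query selects the seven columns named in the new clause.

```java
import java.util.ArrayList;
import java.util.List;

final class GroupByCheckSketch {
    // Returns the selected (non-aggregated) columns that do not appear in the
    // GROUP BY list. A strict parser rejects the query unless this is empty.
    static List<String> missingFromGroupBy(List<String> selected, List<String> groupBy) {
        List<String> missing = new ArrayList<>(selected);
        missing.removeAll(groupBy);
        return missing;
    }
}
```

Under that assumption, the old clause (cluster, persistent, topic) leaves environment, tenant, namespace and `timestamp` ungrouped, while the new clause leaves nothing ungrouped.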

@eolivelli
Contributor Author

@tuteng Standalone HerdDB, embedded in the same process as the backend, can handle big databases; this is its primary use case in production in other projects.

You could consider switching to HerdDB for tests and single-instance deployments instead of SQLite.

Supporting PostgreSQL is good, as it is widely used by many companies and its license is better than MySQL's.

HerdDB can again be a good alternative for a replicated (high-availability) database: it is really easy to set up replication if you already have a ZK cluster, and you can even use your bookies from Pulsar.

@eolivelli
Contributor Author

@sijie I will rebase this patch.

@tuteng
Member

tuteng commented Sep 16, 2019

@eolivelli I have done some tests locally and all the features are running well. Thank you very much for your contribution.

However, when I started the service, the log showed the following errors about HerdDB. I don't know whether they have any impact. Can you take a look at them for me?

image

I will consider using it as a test database. Before that, I will first learn about the use of this database.

@eolivelli
Contributor Author

The 'ERROR' lines are logged at level SEVERE.
This is expected, although it does not look nice with your logging configuration.
I will file an issue.

@sijie
Member

sijie commented Sep 16, 2019

> I will consider using it as a test database.

+1 that's a good idea to start using it as a test database. We can promote HerdDB as one of the databases to use in the documentation.

@sijie
Member

sijie commented Sep 16, 2019

@tuteng @eolivelli if this pull request is almost ready, we can probably move forward with it so that we can start using HerdDB in the tests.

@eolivelli
Contributor Author

@sijie Awesome!

I will rebase today.

@sijie sijie mentioned this pull request Sep 18, 2019
@eolivelli
Contributor Author

Superseded by #194

@eolivelli eolivelli closed this Sep 18, 2019
3 participants