Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Postgresql support #1249

Merged
merged 77 commits into from
Jul 1, 2020
Merged

Postgresql support #1249

merged 77 commits into from
Jul 1, 2020

Conversation

rorp
Copy link
Contributor

@rorp rorp commented Dec 10, 2019

The main goal of this PR is to improve fault tolerance of eclair nodes in enterprise settings, as well as to "standardize" backup procedures. As a bonus point it could simplify scaling out eclair nodes in the future.

@rorp rorp changed the title Postgresql support [POC] Postgresql support Dec 10, 2019
@pm47
Copy link
Member

pm47 commented Dec 11, 2019

One of the main complications with using a remote db backend with eclair is how you prevent two instances of eclair running at the same time and using the same db, which would be catastrophic. Doable, but not trivial. There has to be a locking mechanism somewhere.

@pm47 pm47 mentioned this pull request Dec 11, 2019
@rorp
Copy link
Contributor Author

rorp commented Dec 11, 2019

There has to be a locking mechanism somewhere.

Exactly! That's why is just a POC. We are in fact discussing the locking mechanism atm. Unfortunately, it's not that easy to implement it for PostgresSQL, comparing to the sqlite implementation. But we have some ideas, which I'm going to implement soon.

private def acquireExclusiveTableLock(lockTimeout: FiniteDuration)(implicit connection: Connection): Unit = {
using(connection.createStatement()) { statement =>
statement.executeUpdate(s"SET lock_timeout TO '${lockTimeout.toSeconds}s'")
statement.executeUpdate(s"LOCK TABLE $LockTable IN ACCESS EXCLUSIVE MODE")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@n1bor
Copy link
Contributor

n1bor commented Dec 20, 2019

Few thoughts on this:

  1. Not sure I agree with @pm47 concern about 2 instances connecting. Only going to use postgres in a server type environment with firewall rules, numerous controls, start/stop scripts etc... which would check. But added safety at the DB level would be good.
  2. Brings up what to do if database goes down. My thinking is it should drop all peer connections and go into a watch only mode. In effect a watchtower for itself. So just listen to bitcoind for any tx's for past commitments - and publish commitments if needed. Raised this before as had a corruption of sqlite that caused a total mess as eclair carried on running but could not update the channels db. Should eclair exit on SQL Disk errors? #1171
  3. Thing should have option to run with non-critical tables remaining in sqlite - i.e. those you can recreate from gossip messages. As think most would run like this to reduce load on backend db.
  4. Need a way to run all the tests against postgress. Even if only run manually on occasion - and to validate server install/version is OK.

But in general would be great. Am running a node on AWS and so with this could have a synchronous postgres replica in another data centre. So even if there was total loss of an AWS data centre I could (theoretically assume everyone else is not doing the same ;)) restart the node in perfect condition in a few mins - and not lose any channels or money.

And would mean we could move away from the "dump the whole database every 15mins" approach which is not very scale-able over time.

case CURRENT_VERSION =>
statement.executeUpdate("CREATE TABLE IF NOT EXISTS local_channels (channel_id TEXT NOT NULL PRIMARY KEY, data BYTEA NOT NULL, is_closed BOOLEAN NOT NULL DEFAULT FALSE)")
statement.executeUpdate("CREATE TABLE IF NOT EXISTS htlc_infos (channel_id TEXT NOT NULL, commitment_number TEXT NOT NULL, payment_hash TEXT NOT NULL, cltv_expiry BIGINT NOT NULL, FOREIGN KEY(channel_id) REFERENCES local_channels(channel_id))")
statement.executeUpdate("CREATE INDEX IF NOT EXISTS htlc_infos_idx ON htlc_infos(channel_id, commitment_number)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index on is_closed? As used for listLocalChannels.

@rorp
Copy link
Contributor Author

rorp commented Jan 2, 2020

  1. Not sure I agree with @pm47 concern about 2 instances connecting. Only going to use postgres in a server type environment with firewall rules, numerous controls, start/stop scripts etc... which would check. But added safety at the DB level would be good.

Even though there are a bunch of security settings outside Eclair, there is always a way to shoot in the foot.

  1. Brings up what to do if database goes down. My thinking is it should drop all peer connections and go into a watch only mode. In effect a watchtower for itself. So just listen to bitcoind for any tx's for past commitments - and publish commitments if needed. Raised this before as had a corruption of sqlite that caused a total mess as eclair carried on running but could not update the channels db. Should eclair exit on SQL Disk errors? #1171

I think that's a generic problem unrelated to the database type. It should be addressed in a separate PR.

  1. Thing should have option to run with non-critical tables remaining in sqlite - i.e. those you can recreate from gossip messages. As think most would run like this to reduce load on backend db.

We are going to do some performance testing to see how slow Postgres is comparing to Sqlite.

  1. Need a way to run all the tests against postgress. Even if only run manually on occasion - and to validate server install/version is OK.

It's already there. Probably the naming of the tests were misleading, so I renamed them.

Copy link
Member

@pm47 pm47 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work.

I'd rather rename the psql package to postgres (or even pg if we really want to make it short). psql isn't a common abbreviation of postgres AFAIK. I'm aware the postgres shell is named psql (but it's even more confusing) and it's close to pl/sql which is also a different thing. Same for classes e.g. PsqlNetworkDb could be PgNetworkDb.

One thing that bothers me a little bit is using blocking db calls from within actors. That makes sense when using an embedded db such as sqlite, but is a bit more problematic if we are dealing with a remote db server with network latency etc. I'm not saying it is necessarily a big blocker (pun intended), but it's an important thing to keep in mind.

@@ -263,5 +273,11 @@
<version>1.5.9</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.opentable.components</groupId>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now use three different way of spawning binaries in the tests (exec call for bitcoind, docker for electrum, and this specialized lib). FWIW in an other project we do use docker to spawn a postgres instance in test. Well, probably doesn't matter much.

eclair-core/src/main/resources/reference.conf Outdated Show resolved Hide resolved
eclair-core/src/main/resources/reference.conf Show resolved Hide resolved
Copy link
Member

@t-bast t-bast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 💯
I'll let @pm47 add his review and if we're good to go, we'll include this in our next release.

docs/PostgreSQL.md Outdated Show resolved Hide resolved
Copy link
Member

@pm47 pm47 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there!

eclair-core/src/main/scala/fr/acinq/eclair/Setup.scala Outdated Show resolved Hide resolved
Comment on lines +65 to +75
sealed trait TestDatabases {
val connection: Connection
def network(): NetworkDb
def audit(): AuditDb
def channels(): ChannelsDb
def peers(): PeersDb
def payments(): PaymentsDb
def pendingRelay(): PendingRelayDb
def getVersion(statement: Statement, db_name: String, currentVersion: Int): Int
def close(): Unit
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you use the Databases trait? You can mix-in the close method if needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I needed a sealed trait for exhaustive pattern matching
  2. Some of the tests need a fresh connection every time network, audit, etc gets called, but they are defined as val in Databases
  3. Databases require obtainExclusiveLock() to be implemented, but it's database specific and can't be tested in a generic tests.

@t-bast
Copy link
Member

t-bast commented Jun 30, 2020

I've made some fixes to the stats DB call in #1470, I think you can easily replicate the changes in the postgres implementation.

@rorp
Copy link
Contributor Author

rorp commented Jun 30, 2020

I've made some fixes to the stats DB call in #1470, I think you can easily replicate the changes in the postgres implementation.

Done

Copy link
Member

@t-bast t-bast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rorp!

@t-bast t-bast merged commit b63c4aa into ACINQ:master Jul 1, 2020
@pm47
Copy link
Member

pm47 commented Jul 1, 2020

🎉

@rorp rorp deleted the postgresql branch July 1, 2020 14:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants