
Create scaffolding for adding Postgres support #171

Merged
12 commits merged into CardanoSolutions:master on Jun 12, 2024

Conversation

domMayhew (Contributor)

Add 'postgres' flag and dummy Postgres module

Add 'postgres' flag to:

  1. make targets:
    1. make configure
    2. make /dist/bin/kupo
  2. kupo.cabal:
    1. Conditionally include the Postgres module vs the SQLite module
    2. Add cpp-options: -Dpostgres
  3. Kupo.App.Database:
    1. Import Postgres or SQLite module depending on flag (see the sketch below)
    2. Postgres module is currently a duplicate of the SQLite module

Also add additional files to make clean so that the project is actually recompiled after a clean
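For item 3, a minimal sketch of the CPP-based selection (assuming the backend modules are named `Kupo.App.Database.Postgres` and `Kupo.App.Database.SQLite`; the actual module names in the PR may differ):

```haskell
{-# LANGUAGE CPP #-}

-- Sketch only: re-export whichever backend was selected at configure time
-- via `cpp-options: -Dpostgres`.
module Kupo.App.Database
    ( module DB
    ) where

#ifdef postgres
import Kupo.App.Database.Postgres as DB
#else
import Kupo.App.Database.SQLite as DB
#endif
```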

@domMayhew marked this pull request as draft on April 25, 2024 20:05
@KtorZ (Member) left a comment

Sounds about right so far. I take it that the current Postgres module is a copy of the SQLite module at the moment? (It's written in the PR description 👍)

@domMayhew (Contributor, Author) commented May 29, 2024

New work completed

  1. Create a DBPool type that abstracts the setup/instantiation of a DB, providing a few different types of connections (through a resource pool), and shutdown (a rough sketch follows after this list). This allows SQLite to create its two separate pools and use its locking mechanism to provide a priority ("long-lived") connection, without forcing Postgres to follow the same semantics. Postgres can create a single pool of connections for readers and writers, for example, and does not need to create a DB file on initial start up.
    1. The types of connections supported are simply the types that were already in use: a blocking connection, a connection that will fail if the max number of resources is already in use, and an "exclusive writer" connection (this is the "long-lived" connection).
  2. Rename WorkDir to DatabaseLocation and add a --postgres-url=URL option to its sum types. The alternative to this approach is to conditionally compile the WorkDir option, but this increases the number of places that conditional compilation is taking place. It also forces us to conditionally compile the parser for the option. By simply adding an extra sum type, we reduce the static type safety since we have to check at runtime that a --postgres-url has not been passed to an SQLite binary, but we significantly reduce the complexity of conditional compilation.
  3. Make minor modifications to test files to keep up with the changes.
    1. One significant change is that an in-memory DB created by DatabaseSpec is not created with a test-specific connection string. It is created with the same connection string as running instances of Kupo. Does this mean that if a test is run on a machine that has a running instance of Kupo there will be a conflict between the test in-memory DB and the actual in-memory DB?
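A rough sketch of the shape such a `DBPool` abstraction could take (the field names, `ConnectionType` constructors, and `Database` placeholder are illustrative assumptions, not the exact API introduced in this PR):

```haskell
{-# LANGUAGE RankNTypes #-}
module DBPoolSketch where

-- Illustrative stand-ins for kupo's real types.
data ConnectionType = ReadOnly | ReadWrite | WriteOnly
data Database = Database -- stands for the record of database operations

-- One possible shape for a backend-agnostic pool of connections.
data DBPool m = DBPool
    { -- Block until a connection of the requested type becomes available.
      withDatabaseBlocking :: forall a. ConnectionType -> (Database -> m a) -> m a
      -- Fail fast when every connection of that type is already in use.
    , tryWithDatabase :: forall a. ConnectionType -> (Database -> m a) -> m (Maybe a)
      -- The priority, long-lived "exclusive writer" connection.
    , withLongLivedConnection :: forall a. (Database -> m a) -> m a
      -- Release all pooled connections on shutdown.
    , destroyResources :: m ()
    }
```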

Current Issues

  1. The Makefile currently fails to build the Postgres version because it depends on the libpq system library. I am not sure what the best way to access this is. A couple of options:
    1. As part of the configure stage, clone the PostgreSQL repo and, in the build phase, provide the libpq directory to cabal.
    2. Modify the flake to include libpq.
    3. Create a small wrapper around the flake to add libpq to the dev shell.
  2. The unit tests are failing for me, both in this branch and in master.

Work still to be done

  1. Write the Postgres module.
  2. Consider if the concurrency parameters need to change for Postgres (i.e., how many concurrent connections do we want to support?)
  3. Determine if the Postgres module should support a priority/long-lived connection like the SQLite module does, or if that should just be a normal connection.
  4. Determine how to handle WriteOnly connections and database copying.
  5. Write tests to support both SQLite and Postgres compilations.
  6. Make any changes necessary to allow migrations to run on a PostgreSQL instance.

@domMayhew marked this pull request as ready for review on June 3, 2024 18:01
@KtorZ (Member) commented Jun 11, 2024

> Rename WorkDir to DatabaseLocation and add a --postgres-url=URL option to its sum types. The alternative to this approach is to conditionally compile the WorkDir option, but this increases the number of places that conditional compilation is taking place.

I agree that the full alternative is somewhat of a stretch, and that sounds good. To not confuse people, though, I'd still suggest hiding --postgres-url from the CLI (using `hidden`) unless the flag is present. That would keep the code rather simple and give the same illusion from an interface standpoint 😅.
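A minimal sketch with optparse-applicative, assuming the parser always includes the option and only its visibility is CPP-guarded (the option name and help text are illustrative; the final commits ended up using `internal`, see below):

```haskell
{-# LANGUAGE CPP #-}
module PostgresUrlOption where

import Options.Applicative

-- Sketch only: the option always parses, so no extra conditional parsing is
-- needed, but it disappears from --help unless compiled with -Dpostgres.
postgresUrlOption :: Parser String
postgresUrlOption =
    strOption $ mconcat
        [ long "postgres-url"
        , metavar "URL"
        , help "Connection URL of the PostgreSQL database."
#ifndef postgres
        -- 'internal' hides the option from both the short and the long help;
        -- 'hidden' would only hide it from the short help.
        , internal
#endif
        ]
```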

> Does this mean that if a test is run on a machine that has a running instance of Kupo there will be a conflict between the test in-memory DB and the actual in-memory DB?

No; because in-memory databases using the :memory: connection string are separate and live in-process. There are ways to create shared in-memory databases with SQLite using connection flags provided by the underlying open_v2 method, and this is what happens when one passes --in-memory as a flag:

```haskell
InMemory Nothing ->
    ("file::kupo:?mode=memory&cache=shared", openFlags)
```

What's not obvious here is that the file name is :kupo:, and it identifies a named memory location. Using different names would yield different, segmented memory locations. This is why the tests use either :memory:, which is actually a special name in SQLite that always creates a separate in-memory location (so tests don't conflict with one another), or they specify their own in-memory location like:

            [ ( "in-memory"
              , \test ->
                    test (InMemory (Just "file::concurrent-read-write:?cache=shared&mode=memory"))
              )

This cannot conflict with a live running instance because users cannot actually provide an InMemory with Just a file location, only InMemory Nothing. The Just side is only available from test code.
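To summarize the behaviour in code, here is a simplified, illustrative sketch of the connection-string selection (the constructor names mirror the snippets above; the function itself is not kupo's actual implementation):

```haskell
{-# LANGUAGE LambdaCase #-}
module InMemorySketch where

-- 'Just' on InMemory is only reachable from test code; the CLI only ever
-- produces 'InMemory Nothing'.
data DatabaseFile
    = OnDisk FilePath
    | InMemory (Maybe String)

connectionString :: DatabaseFile -> String
connectionString = \case
    OnDisk fp ->
        fp
    -- What --in-memory produces: one shared, named in-memory database
    -- identified by ":kupo:".
    InMemory Nothing ->
        "file::kupo:?mode=memory&cache=shared"
    -- Tests either use plain ":memory:" (always private to its connection)
    -- or name their own shared location, so they never clash with a live
    -- kupo instance.
    InMemory (Just str) ->
        str
```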

> The Makefile currently fails to build the Postgres version because it depends on the libpq system library. I am not sure what the best way to access this is. A couple of options:

It should be possible by using the -iog-full version of the devx nix shell instead of -iog, which includes PostgreSQL (primarily for db-sync, but therefore of good use to us!).

@KtorZ (Member) commented Jun 11, 2024

> The unit tests are failing for me, both in this branch and in master.

How so?

Add 'postgres' flag to:
1. `make` targets:
   1. `make configure`
   2. `make /dist/bin/kupo`
2. kupo.cabal:
   1. Conditionally include the Postgres module vs the SQLite module
   2. Add `cpp-options: -Dpostgres`
3. Kupo.App.Database: import Postgres or SQLite module depending on flag

Also add additional files to `make clean` so that the project is actually recompiled after a `clean`
Previously, creating a SQLite DB file, creating read-only and read-write connection pools, and gatekeeping connection types (e.g., `ReadWrite`)
against the Kupo instance type (e.g., a read-only replica) were exposed by the `SQLite` module and performed in `Kupo`.

Now, those functions are encapsulated within the DB-specific modules to allow unique implementations. For instance, PostgreSQL does not have a DB file
and does not need separate pools for read-only and read-write connections.
Use a single data structure, `databaseLocation`, to describe either a work directory or in-memory when compiled for SQLite,
or a remote URL when compiled for PostgreSQL. This provides strong type safety but comes at the cost of having to include an
argument parser in the DB modules, which is not really where it belongs. Alternatively, conditional compilation could be
added to the argument parsing module, but that also seems out of place. Ultimately, including all 3 alternatives in
`databaseLocation` and validating at runtime may be a cleaner solution, despite the loss of type safety.
Add a command line option to specify a remote URL as the DB location instead of a work directory
or in-memory. This was added as another constructor of the existing option type to prevent additional
conditional compilation and avoid the need to conditionally compile the option parser. This
comes at the expense of type safety, as each DB module throws a runtime error if an
incompatible DB location is provided.

Trace logging of the max concurrency configuration was kept in `Kupo` rather than the DB modules
due to a cyclical dependency between `App/Configuration`, `SQLite`/`Postgres`, and `App/Database`.
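A sketch of the `databaseLocation` shape described in the last two commit messages, with assumed constructor names; the runtime check stands in for whatever validation each DB module actually performs:

```haskell
{-# LANGUAGE LambdaCase #-}
module DatabaseLocationSketch where

-- Illustrative constructors; the real type covers the same three cases.
data DatabaseLocation
    = Dir FilePath      -- --workdir DIR
    | InMemory          -- --in-memory
    | Remote String     -- --postgres-url=URL

-- In the SQLite module, an incompatible location becomes a runtime error
-- rather than a compile-time one; this is the trade-off discussed above.
requireLocalLocation :: DatabaseLocation -> IO ()
requireLocalLocation = \case
    Remote _ ->
        fail "a --postgres-url was provided, but this binary was compiled for SQLite"
    _ ->
        pure ()
```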
Clean the whole build directory so cabal actually rebuilds the binary, and include the postgres flag in the configure stage
A Member left an inline review comment on a diff hunk around `longestRollback`, `dbFile`, `(action InstallIndexesIfNotExist)`, and `(withDatabaseBlocking dbPool) ReadOnly (action InstallIndexesIfNotExist)`:

These changes slightly alter the behavior. Before, the main background connection was not actually part of the pool, mostly because there was no need: that connection never goes back to the pool; it's a long-running connection that persists for the entire life of the application.

It's not a big deal though, as it will only consume one reader connection when set up as a read-only replica. It makes things easier down the line.
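In other words (a sketch under assumed names, using resource-pool's `withResource`): the background connection is now just a pool resource that is checked out once and held for the application's lifetime, which is what consumes one reader slot on a read-only replica.

```haskell
module LongLivedConnectionSketch where

import Data.Pool (Pool, withResource)

-- Stand-in for a real database connection.
data Connection = Connection

-- Before: the background connection was opened outside the pool and never
-- returned. Now: it is taken from the pool and simply held until the
-- application exits, occupying one slot for the whole run.
runWithBackgroundConnection :: Pool Connection -> (Connection -> IO a) -> IO a
runWithBackgroundConnection pool mainLoop =
    withResource pool mainLoop
```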

KtorZ and others added 5 commits June 11, 2024 12:26
  We cannot just use 'InMemory' for tests as this creates a shared in-memory database (each test therefore conflicting with one another). This is why 'DatabaseFile' allows for passing an extra path to the InMemory variant. Note that having both 'DatabaseFile' and 'DatabaseLocation' feels somewhat redundant and just adds confusion overall. We shall perhaps consider getting rid of one.
  Ideally, we should just not create any resource pool when in read-only mode. But the impact on the API is sufficiently annoying to not be worth it. So we still create a read/write pool which simply remains idle for the entire application lifetime. We must, however, ensure that the pool size is >= 1, otherwise it crashes at runtime. At the same time, we don't want to report any writer capabilities in the logs.
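A small sketch of that guard, assuming the older `createPool` API from resource-pool (the connection type and open/close actions are placeholders):

```haskell
module ReadOnlyPoolSketch where

import Data.Pool (Pool, createPool)

-- Placeholder connection and lifecycle actions.
data Connection = Connection

openConnection :: IO Connection
openConnection = pure Connection

closeConnection :: Connection -> IO ()
closeConnection _ = pure ()

-- Even on a read-only replica the read/write pool is still created; it just
-- stays idle for the whole application lifetime.
newReadWritePool :: Int -> IO (Pool Connection)
newReadWritePool requestedSize =
    createPool openConnection closeConnection
        1                      -- number of stripes
        30                     -- seconds to keep idle connections alive
        (max 1 requestedSize)  -- never allow a zero-sized pool
```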
Including `newDBPoolFromFile` in the external API of `App/Database`
creates issues for the Postgres module, which does not have any
notion of a DB file. Instead, we add a `FilePath` to
`Configuration.InMemory` but do not expose the ability to provide a
filepath on the command line. This allows us to provide a filepath
for testing purposes but does not allow the user to supply one, all
while maintaining good abstraction boundaries.
Due to an issue with CPP, multiline string literals had to be replaced with
concatenated single line literals.

`internal` was used instead of `hidden` to hide the options from the
long help text as well as the short help.
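For context on the CPP issue: GHC's preprocessor treats a trailing backslash as a line continuation, which mangles Haskell's multiline string gaps in any module compiled with CPP. Splitting the literal into concatenated single-line pieces avoids it, roughly like this illustrative example:

```haskell
module CppStringsSketch where

-- A string gap such as
--
--   usageText = "Location of the database: \
--               \either a working directory or a PostgreSQL URL."
--
-- breaks under {-# LANGUAGE CPP #-}; concatenation does not.
usageText :: String
usageText =
    "Location of the database: "
        <> "either a working directory or a PostgreSQL URL."
```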
@KtorZ merged commit 5d8f52c into CardanoSolutions:master on Jun 12, 2024
@KtorZ (Member) commented Jun 12, 2024

Merging this already as we've got a few things to do on top which might conflict; so to save everyone time, let's bring new changes as new PRs :)
