Add reader database cluster support #1417

Merged
shangyian merged 13 commits into DataJunction:main from shangyian:reader-writer
Jun 21, 2025

Conversation

Collaborator

@shangyian shangyian commented Jun 20, 2025

Summary

This PR adds support for using a reader database cluster (for the metadata database) for read-only operations, along with the default writer database cluster for all other operations. It improves scalability by offloading read traffic from the writer database, and it sets up support for read/write-aware session management across both REST and GraphQL interfaces. The reader cluster configs are optional, so users who don't have a read replica setup can just set the writer cluster.

  • For REST, we automatically detect when there is a GET request and route to the reader session.
  • For GraphQL, we automatically detect when a request is not a mutation and route to the reader session (at the moment this is all GraphQL requests since we don't support mutations).
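The routing rule described in the two bullets above can be sketched as a small pure function. This is only an illustration, not the code from this PR: the name `choose_cluster` and its parameters are hypothetical, and falling back to the writer when no reader is configured is an assumption based on the reader configs being optional.

```python
from typing import Optional


def choose_cluster(
    method: str,
    is_graphql_mutation: Optional[bool] = None,
    reader_configured: bool = True,
) -> str:
    """Return "reader" or "writer" for a request (hypothetical helper).

    method: HTTP method of the incoming request.
    is_graphql_mutation: None for plain REST requests; for GraphQL requests,
        True if the document contains a mutation, False otherwise.
    reader_configured: whether a reader cluster has been set up (assumed
        fallback: use the writer for everything when it has not).
    """
    if not reader_configured:
        return "writer"
    if is_graphql_mutation is not None:
        # GraphQL usually arrives as POST, so the HTTP method alone is not
        # enough; the operation type decides.
        return "writer" if is_graphql_mutation else "reader"
    # REST: only GET requests are treated as read-only.
    return "reader" if method.upper() == "GET" else "writer"
```

Keeping the decision in one function means the REST and GraphQL paths share a single place where the read/write policy is defined.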

Settings

The metadata database is now configured through the nested writer_db and reader_db fields. Each is a DatabaseConfig model that supports detailed connection settings such as pool_size, max_overflow, and connect_timeout.

.env files should be updated to include READER_DB__URI and any other nested settings if a reader DB is used:

# Writer DB (required)
WRITER_DB__URI=postgresql+psycopg://dj:dj@postgres_metadata:5432/dj
WRITER_DB__POOL_SIZE=20
WRITER_DB__MAX_OVERFLOW=20
WRITER_DB__POOL_TIMEOUT=10
WRITER_DB__CONNECT_TIMEOUT=5
WRITER_DB__POOL_PRE_PING=true
WRITER_DB__ECHO=false
WRITER_DB__KEEPALIVES=1
WRITER_DB__KEEPALIVES_IDLE=30
WRITER_DB__KEEPALIVES_INTERVAL=10
WRITER_DB__KEEPALIVES_COUNT=5

# Reader DB (optional)
READER_DB__URI=postgresql+psycopg://dj:dj@postgres_metadata:5432/dj
READER_DB__POOL_SIZE=10
READER_DB__MAX_OVERFLOW=10
READER_DB__POOL_TIMEOUT=5
READER_DB__CONNECT_TIMEOUT=5
READER_DB__POOL_PRE_PING=true
READER_DB__ECHO=false
READER_DB__KEEPALIVES=1
READER_DB__KEEPALIVES_IDLE=30
READER_DB__KEEPALIVES_INTERVAL=10
READER_DB__KEEPALIVES_COUNT=5
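To illustrate how the double-underscore variables above map onto a nested config object, here is a standard-library-only sketch. It is not the project's settings code: `load_db_config` and `_convert` are hypothetical helpers. Apart from pool_size and max_overflow, which match the values shown in the review snippet, the defaults are placeholders copied from the writer example.

```python
from dataclasses import dataclass, fields
from typing import Mapping, Optional


@dataclass
class DatabaseConfig:
    """Stand-in for the DatabaseConfig model; defaults are placeholders."""

    uri: str
    pool_size: int = 20
    max_overflow: int = 20
    pool_timeout: int = 10
    connect_timeout: int = 5
    pool_pre_ping: bool = True
    echo: bool = False
    keepalives: int = 1
    keepalives_idle: int = 30
    keepalives_interval: int = 10
    keepalives_count: int = 5


def _convert(raw: str, target: type):
    """Convert an environment string to the field's declared type."""
    if target is bool:
        return raw.strip().lower() in {"1", "true", "yes", "on"}
    return target(raw)


def load_db_config(env: Mapping[str, str], prefix: str) -> Optional[DatabaseConfig]:
    """Build a config from PREFIX__FIELD variables; None if no URI is set."""
    values = {}
    for field in fields(DatabaseConfig):
        key = f"{prefix}__{field.name.upper()}"
        if key in env:
            values[field.name] = _convert(env[key], field.type)
    if "uri" not in values:
        # Mirrors the optional reader cluster: no URI means no config.
        return None
    return DatabaseConfig(**values)
```

With this sketch, `load_db_config(os.environ, "READER_DB")` would return None when no READER_DB__URI is present, which matches the description of the reader cluster as optional.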

Test Plan

Locally. Modified unit tests.

  • PR has an associated issue: #
  • make check passes
  • make test shows 100% unit test coverage

Deployment Plan

N/A

netlify bot commented Jun 20, 2025

Deploy Preview for thriving-cassata-78ae72 canceled.

Name Link
🔨 Latest commit 29a4326
🔍 Latest deploy log https://app.netlify.com/projects/thriving-cassata-78ae72/deploys/6856622cf61b6c0008b3c236

Member

@agorajek agorajek left a comment

This is great. I think it will make our life better.

getattr(defn, "operation", None) == OperationType.MUTATION
for defn in document.definitions
)
except (GraphQLError, json.JSONDecodeError):
Member

Which of the above lines could trigger a GraphQLError ?

Collaborator Author

If the user inputs an invalid GraphQL query, then calling parse(query) could result in a GraphQLError.


request._receive = receive # type: ignore

# Set up the database session based on whether it's a mutation or not
Member

very cool


uri: str
pool_size: int = 20
max_overflow: int = 20
Member

I think we may still want to do -1 here for better sleep, no?

Collaborator Author

I changed this default to 100

@shangyian shangyian marked this pull request as ready for review June 21, 2025 14:38
@shangyian shangyian merged commit 87ed92a into DataJunction:main Jun 21, 2025
17 checks passed
@shangyian shangyian deleted the reader-writer branch June 21, 2025 14:38
2 participants