Skip to content

fix(proxy): prevent file descriptor leakage#384

Merged
freshtonic merged 1 commit intomainfrom
fix/file-descriptor-leaks
Mar 25, 2026
Merged

fix(proxy): prevent file descriptor leakage#384
freshtonic merged 1 commit intomainfrom
fix/file-descriptor-leaks

Conversation

@freshtonic
Copy link
Contributor

@freshtonic freshtonic commented Mar 25, 2026

The Proxy was not closing connections correctly which lead to a file descriptor leak which manifested itself as an inability to connect the Postgres database during a test run.

This issue would have been present in production but went unnoticed because:

  1. In customer infrastructure connections to the Proxy would likely be pooled.
  2. Compared to customers using the JS SDK there are not many using Proxy in production.

Root Cause

The ChannelWriter struct held a sender field for its own channel, creating a circular dependency:

The receiver.recv() loop waits for ALL senders to be dropped before returning None.

But one of those senders was owned by the ChannelWriter itself: This meant the channel could never close because it was waiting for itself to drop!

The Solution

Two changes were needed:

channel_writer.rs:49 - Drop the ChannelWriter's own sender at the start of the receive() method:

  drop(self.sender);  // Break the circular dependency

handler.rs:157-158 - Explicitly drop frontend/backend after try_join (good practice, though they'll drop when handler returns anyway):

  drop(frontend);
  drop(backend);

Additionally, the integration tests were creating a new database connection for every operation (clear, insert, query), causing file descriptor exhaustion when running test suites. Each test created 4+ connections that would accumulate faster than the proxy's 60-second timeout could clean them up.

Changes:

  1. Added *_with_client() variants of all helper functions in common.rs:

    • clear_with_client()
    • insert_with_client(), insert_jsonb_with_client()
    • query_by_with_client(), query_by_params_with_client()
    • Existing simple_query_with_client() now used consistently
  2. Updated test files to reuse single connection per test:

    • select/jsonb_array_elements.rs - reduced from 12 to 3 connections
    • select/jsonb_path_query.rs - reduced from 20 to 5 connections
  3. Documented connection reuse pattern in common.rs with examples

Acknowledgment

By submitting this pull request, I confirm that CipherStash can use, modify, copy, and redistribute this contribution, under the terms of CipherStash's choice.

Summary by CodeRabbit

  • Bug Fixes

    • Improved connection lifecycle and shutdown to avoid resource exhaustion and ensure orderly writer shutdown.
    • Added clearer tracing/logging around connection and task lifecycle.
  • Tests

    • Updated integration tests to use explicit, reusable client connections for stability.
  • Chores

    • Refactored helpers to support client-scoped operations and centralized connection handling.

@freshtonic freshtonic requested a review from tobyhede March 25, 2026 02:58
@coderabbitai
Copy link

coderabbitai bot commented Mar 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d53bda0d-8a20-4777-b248-45fe40a11642

📥 Commits

Reviewing files that changed from the base of the PR and between bc2ac19 and fdc3bd6.

📒 Files selected for processing (5)
  • packages/cipherstash-proxy-integration/src/common.rs
  • packages/cipherstash-proxy-integration/src/select/jsonb_array_elements.rs
  • packages/cipherstash-proxy-integration/src/select/jsonb_path_query.rs
  • packages/cipherstash-proxy/src/connect/channel_writer.rs
  • packages/cipherstash-proxy/src/postgresql/handler.rs
✅ Files skipped from review due to trivial changes (1)
  • packages/cipherstash-proxy-integration/src/select/jsonb_array_elements.rs
🚧 Files skipped from review as they are similar to previous changes (4)
  • packages/cipherstash-proxy/src/connect/channel_writer.rs
  • packages/cipherstash-proxy/src/postgresql/handler.rs
  • packages/cipherstash-proxy-integration/src/common.rs
  • packages/cipherstash-proxy-integration/src/select/jsonb_path_query.rs

📝 Walkthrough

Walkthrough

Added client-scoped DB helpers for connection reuse in integration tests and updated tests to use TLS-backed Client references. Adjusted channel writer to drop its sender, flush and shutdown the writer on exit. Modified the handler to spawn-and-await the channel-writer task and explicitly drop channels to drive orderly shutdown.

Changes

Cohort / File(s) Summary
Database connection helpers
packages/cipherstash-proxy-integration/src/common.rs
Added *_with_client variants: clear_with_client, query_by_with_client, query_by_params_with_client, insert_with_client, insert_jsonb_with_client. Refactored existing no-client helpers to create a single Client and delegate to the new _with_client functions.
Integration tests — JSONB select
packages/cipherstash-proxy-integration/src/select/jsonb_array_elements.rs, packages/cipherstash-proxy-integration/src/select/jsonb_path_query.rs
Tests now create a TLS Client via connect_with_tls(PROXY), use clear_with_client/insert_jsonb_with_client, and pass &Client into helper functions which call the new *_with_client query helpers. Imports updated accordingly.
Channel writer lifecycle
packages/cipherstash-proxy/src/connect/channel_writer.rs
ChannelWriter::receive() logs lifecycle events, drops its own sender handle before awaiting, breaks on write_all errors, and performs explicit flush() and shutdown() with error logging at the end of the loop.
Handler shutdown sequencing
packages/cipherstash-proxy/src/postgresql/handler.rs
Spawned channel_writer.receive() is stored in channel_writer_task and awaited after try_join!. Handler explicitly drops frontend and backend senders to trigger closure and logs if the channel-writer task panics; returns result? from try_join!.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Handler
    participant Frontend as Frontend Channel
    participant Backend as Backend Channel
    participant ChannelWriter
    participant Writer as IO Writer

    rect rgba(100, 150, 200, 0.5)
    Note over Handler,ChannelWriter: Previous (fire-and-forget)
    Handler->>+ChannelWriter: spawn receive()
    Handler->>Frontend: run client_to_server
    Handler->>Backend: run server_to_client
    Handler->>Handler: try_join! completes
    Handler-->>Client: return
    Note over ChannelWriter: May still be running\nNo guaranteed orderly shutdown
    end

    rect rgba(150, 200, 100, 0.5)
    Note over Handler,ChannelWriter: New (explicit await & cleanup)
    Handler->>+ChannelWriter: channel_writer_task = spawn receive()
    ChannelWriter->>ChannelWriter: drop sender handle
    Handler->>Frontend: run client_to_server
    Handler->>Backend: run server_to_client
    Handler->>Handler: try_join! completes
    Handler->>Frontend: drop (signal closure)
    Handler->>Backend: drop (signal closure)
    Frontend-->>ChannelWriter: channel closes
    ChannelWriter->>Writer: flush()
    ChannelWriter->>Writer: shutdown()
    ChannelWriter-->>Handler: task completes
    Handler->>Client: return with result
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • tobyhede
  • coderdan
  • auxesis

Poem

🐇 I nibble code in evening light,
Clients shared and sockets right,
No loose handles left to roam,
Writers sleep and find their home,
Hoppy shutdowns — tests feel light!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(proxy): prevent file descriptor leakage' directly and clearly identifies the primary issue being fixed, matching the substantive changes across all modified files.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/file-descriptor-leaks

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@freshtonic
Copy link
Contributor Author

@tobyhede what should we do with this semi-related PR? #379

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
packages/cipherstash-proxy/src/postgresql/handler.rs (1)

272-278: Consider semantics: task panic is logged but doesn't affect return value.

If channel_writer_task panics, the error is logged but handler() will still return Ok(()) if the protocol handling succeeded. This is likely intentional (protocol result takes precedence over cleanup failures), but worth confirming this is the desired behavior.

If a panic should propagate as an error, you could change this to:

📝 Optional: Propagate panic as error
-    if let Err(err) = channel_writer_task.await {
-        error!(
-            client_id,
-            msg = "Channel writer task panicked",
-            error = ?err
-        );
-    }
+    channel_writer_task.await.map_err(|err| {
+        error!(
+            client_id,
+            msg = "Channel writer task panicked",
+            error = ?err
+        );
+        Error::from(ProtocolError::InternalError)
+    })?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cipherstash-proxy/src/postgresql/handler.rs` around lines 272 - 278,
The current code logs a panic from channel_writer_task but still lets handler()
return Ok(()), which may hide cleanup failures; decide whether panics in
channel_writer_task should fail the handler: if you want to propagate the panic,
change the channel_writer_task.await handling in handler() to convert the
JoinError into a handler::Error and return Err(...) (e.g. map the
Err(join_error) to an appropriate Error variant and return early), otherwise
make the intent explicit by adding a comment near channel_writer_task and the
error! call indicating that cleanup panics are intentionally non-fatal.
packages/cipherstash-proxy-integration/src/common.rs (1)

3-37: Document the Docker prerequisite alongside the connection-reuse guidance.

Nice addition. Consider adding one short note that these integration helpers assume Docker-backed test services, so local host services aren’t accidentally used in ad-hoc runs.

As per coding guidelines: Integration tests should use Docker containers for reproducible environments.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cipherstash-proxy-integration/src/common.rs` around lines 3 - 37,
Add a short note to the module doc comment alongside the connection-reuse
guidance stating that the integration helpers assume Docker-backed test services
(so local host services shouldn’t be used for ad-hoc runs); place this sentence
near the top where the pattern examples live (near the connect_with_tls,
clear_with_client and *_with_client discussion) and reference the project
guideline “Integration tests should use Docker containers for reproducible
environments.” to make the prerequisite explicit.
packages/cipherstash-proxy-integration/src/select/jsonb_path_query.rs (1)

34-125: Optional: extract repeated test setup into one helper.

Each test repeats connect/clear/insert; pulling that into a tiny setup helper would reduce duplication.

Refactor sketch
+    async fn setup_jsonb(client: &Client) {
+        clear_with_client(client).await;
+        insert_jsonb_with_client(client).await;
+    }
+
     #[tokio::test]
     async fn select_jsonb_path_query_number() {
         trace();
         let client = connect_with_tls(PROXY).await;
-
-        clear_with_client(&client).await;
-        insert_jsonb_with_client(&client).await;
+        setup_jsonb(&client).await;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cipherstash-proxy-integration/src/select/jsonb_path_query.rs` around
lines 34 - 125, Tests duplicate the same three setup steps
(connect_with_tls(PROXY), clear_with_client, insert_jsonb_with_client) across
select_jsonb_* tests; extract them into a single async helper (e.g.,
setup_jsonb_test or init_client_with_jsonb) that performs
connect_with_tls(PROXY).await, clear_with_client(&client).await,
insert_jsonb_with_client(&client).await and returns the client, then update each
test (select_jsonb_path_query_number, select_jsonb_path_query_string,
select_jsonb_path_query_value, select_jsonb_path_query_with_unknown,
select_jsonb_path_query_with_alias) to call that helper at the start and use the
returned client variable instead of repeating the three calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/cipherstash-proxy-integration/src/common.rs`:
- Around line 3-37: Add a short note to the module doc comment alongside the
connection-reuse guidance stating that the integration helpers assume
Docker-backed test services (so local host services shouldn’t be used for ad-hoc
runs); place this sentence near the top where the pattern examples live (near
the connect_with_tls, clear_with_client and *_with_client discussion) and
reference the project guideline “Integration tests should use Docker containers
for reproducible environments.” to make the prerequisite explicit.

In `@packages/cipherstash-proxy-integration/src/select/jsonb_path_query.rs`:
- Around line 34-125: Tests duplicate the same three setup steps
(connect_with_tls(PROXY), clear_with_client, insert_jsonb_with_client) across
select_jsonb_* tests; extract them into a single async helper (e.g.,
setup_jsonb_test or init_client_with_jsonb) that performs
connect_with_tls(PROXY).await, clear_with_client(&client).await,
insert_jsonb_with_client(&client).await and returns the client, then update each
test (select_jsonb_path_query_number, select_jsonb_path_query_string,
select_jsonb_path_query_value, select_jsonb_path_query_with_unknown,
select_jsonb_path_query_with_alias) to call that helper at the start and use the
returned client variable instead of repeating the three calls.

In `@packages/cipherstash-proxy/src/postgresql/handler.rs`:
- Around line 272-278: The current code logs a panic from channel_writer_task
but still lets handler() return Ok(()), which may hide cleanup failures; decide
whether panics in channel_writer_task should fail the handler: if you want to
propagate the panic, change the channel_writer_task.await handling in handler()
to convert the JoinError into a handler::Error and return Err(...) (e.g. map the
Err(join_error) to an appropriate Error variant and return early), otherwise
make the intent explicit by adding a comment near channel_writer_task and the
error! call indicating that cleanup panics are intentionally non-fatal.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: becdf982-ad36-436a-9796-08d4f3cdf867

📥 Commits

Reviewing files that changed from the base of the PR and between 6906b18 and bc2ac19.

📒 Files selected for processing (5)
  • packages/cipherstash-proxy-integration/src/common.rs
  • packages/cipherstash-proxy-integration/src/select/jsonb_array_elements.rs
  • packages/cipherstash-proxy-integration/src/select/jsonb_path_query.rs
  • packages/cipherstash-proxy/src/connect/channel_writer.rs
  • packages/cipherstash-proxy/src/postgresql/handler.rs

The Proxy was not closing connections correctly which lead to a file
descriptor leak which manifested itself as an inability to connect the
Postgres database during a test run.

This issue would have been present in production but went unnoticed
because:

1. In customer infrastructure connections to the Proxy would likely be
   pooled.
2. Compared to customers using the JS SDK there are not many using Proxy
   in production.

Root Cause
==========

The ChannelWriter struct held a sender field for its own channel,
creating a circular dependency:

The receiver.recv() loop waits for ALL senders to be dropped before
returning None.

But one of those senders was owned by the ChannelWriter itself: This
meant the channel could never close because it was waiting for itself to
drop!

The Solution
============

Two changes were needed:

channel_writer.rs:49 - Drop the ChannelWriter's own sender at the start
of the receive() method:

```rust
  drop(self.sender);  // Break the circular dependency
```

handler.rs:157-158 - Explicitly drop frontend/backend after try_join
(good practice, though they'll drop when handler returns anyway):

```rust
  drop(frontend);
  drop(backend);
```

Additionally, the integration tests were creating a new database
connection for every operation (clear, insert, query), causing file
descriptor exhaustion when running test suites. Each test created 4+
connections that would accumulate faster than the proxy's 60-second
timeout could clean them up.

Changes:

1. Added `*_with_client()` variants of all helper functions in common.rs:
   - clear_with_client()
   - insert_with_client(), insert_jsonb_with_client()
   - query_by_with_client(), query_by_params_with_client()
   - Existing simple_query_with_client() now used consistently

2. Updated test files to reuse single connection per test:
   - select/jsonb_array_elements.rs - reduced from 12 to 3 connections
   - select/jsonb_path_query.rs - reduced from 20 to 5 connections

3. Documented connection reuse pattern in common.rs with examples
@freshtonic freshtonic force-pushed the fix/file-descriptor-leaks branch from bc2ac19 to fdc3bd6 Compare March 25, 2026 03:43
@freshtonic freshtonic merged commit 0a412c9 into main Mar 25, 2026
6 checks passed
@freshtonic freshtonic deleted the fix/file-descriptor-leaks branch March 25, 2026 04:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants