[Proxy] Proxy stops working for proxying Broker connections while Admin API proxying keeps working #14075

Closed
lhotari opened this issue Jan 31, 2022 · 1 comment · Fixed by #14078
Assignees
Labels
area/proxy type/bug The PR fixed a bug or issue reported a bug

Comments

@lhotari
Member

lhotari commented Jan 31, 2022

Describe the bug

The Proxy stops proxying Broker connections while Admin API proxying keeps working.
The proxy logs fill up with warnings like this:

[pulsar-proxy-io-2-1] WARN  org.apache.pulsar.client.impl.ConnectionPool - Failed to open connection to pulsar-dev-broker/172.20.4.120:6650 : io.netty.channel.AbstractChannel$AnnotatedConnectException: connect(..) failed: Cannot assign requested address: pulsar-dev-broker.pulsar.svc.cluster.local/172.20.4.120:6650

The "Cannot assign requested address" error message is a sign of port exhaustion: the proxy has so many outbound connections open, possibly hanging, that no local port is left for a new one.
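
To put a rough number on that limit: every outbound connection from one proxy IP to the same broker IP and port needs its own local source port, so the ephemeral port range caps how many such connections can exist at once. The sketch below only does that arithmetic. The range 32768-60999 is an assumption (it is the usual Linux default for `net.ipv4.ip_local_port_range`); the issue does not state the actual setting on the affected host, and `EphemeralPortBudget` is a hypothetical name.

```java
public class EphemeralPortBudget {
    // Number of local source ports in the inclusive range [low, high].
    static int portsInRange(int low, int high) {
        if (low < 0 || high > 65535 || low > high) {
            throw new IllegalArgumentException("invalid port range: " + low + "-" + high);
        }
        return high - low + 1;
    }

    public static void main(String[] args) {
        // Assumed range: the common Linux default, not a value taken from the issue.
        int budget = portsInRange(32768, 60999);
        System.out.println(budget); // prints 28232
    }
}
```

With that assumed range, roughly 28 thousand connections to a single broker address, including ones that hang and are never closed, would be enough to trigger the error.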

To Reproduce

The steps to reproduce are not known.

Expected behavior

Pulsar Proxy should include timeout handling so that idle or hanging connections get cleaned up.

@lhotari lhotari added type/bug The PR fixed a bug or issue reported a bug area/proxy labels Jan 31, 2022
@lhotari lhotari self-assigned this Jan 31, 2022
@lhotari lhotari changed the title [Proxy] Proxy stops working for proxying Broker connections while Admin Admin proxying keeps working [Proxy] Proxy stops working for proxying Broker connections while Admin API proxying keeps working Jan 31, 2022
@lhotari
Member Author

lhotari commented Jan 31, 2022

I'm working on a fix

lhotari added a commit to lhotari/pulsar that referenced this issue Jan 31, 2022
Fixes apache#14075
Fixes apache#13923

- Optimize the proxy connection to fail fast if the target broker isn't active
  - This reduces the number of hanging connections because the proxy no longer attempts to reach unavailable brokers.
  - The Pulsar client will retry connecting after a backoff delay

- Fix the race condition in the Pulsar Proxy when opening a connection, since it
  could lead to invalid states and hanging connections

- Add connect timeout handling to the proxy connection
  - defaults to 10000 ms, which is also the default of the client's connect timeout

- Add read timeout handling to the incoming connection and the proxied connection
  - the ping/pong keepalive messages should prevent the timeout from happening;
    however, it's possible that the connection is in a state where keepalives aren't happening.
    - therefore it's better to have a connection-level read timeout to prevent broken connections from being left
      hanging in the proxy
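
The three ideas in the commit message above (fail fast, connect timeout, read timeout) can be illustrated with a minimal sketch. This is not the code from the fix: the proxy is Netty-based, while the sketch uses plain `java.net` sockets so that it runs without dependencies. The class `ProxyTimeoutSketch`, the methods `openBackendConnection` and `readTimesOut`, and the set of active brokers are all hypothetical names introduced for illustration; only the 10000 ms connect timeout default comes from the commit message.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.Set;

public class ProxyTimeoutSketch {
    // Default connect timeout named in the commit message.
    static final int CONNECT_TIMEOUT_MS = 10_000;

    // Hypothetical helper: refuse immediately if the target is not known to be
    // active, otherwise connect with a connect timeout and arm a read timeout.
    static Socket openBackendConnection(Set<InetSocketAddress> activeBrokers,
                                        InetSocketAddress target,
                                        int connectTimeoutMs,
                                        int readTimeoutMs) throws IOException {
        if (!activeBrokers.contains(target)) {
            // Fail fast: no socket is created, so no local port is consumed.
            throw new IOException("Target broker is not active: " + target);
        }
        Socket socket = new Socket();
        try {
            socket.connect(target, connectTimeoutMs); // bounded connect attempt
            socket.setSoTimeout(readTimeoutMs);       // bounded wait on each read
            return socket;
        } catch (IOException e) {
            socket.close(); // never leave a half-open socket behind
            throw e;
        }
    }

    // Returns true if the peer stayed silent past the read timeout; the socket
    // is closed in that case so that the connection does not linger.
    static boolean readTimesOut(Socket socket) throws IOException {
        try {
            socket.getInputStream().read();
            return false;
        } catch (SocketTimeoutException e) {
            socket.close();
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // A local listener that accepts connections into its backlog but never
        // writes stands in for a broker connection without keepalives.
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket silentBroker = new ServerSocket(0, 50, loopback)) {
            InetSocketAddress target =
                    new InetSocketAddress(loopback, silentBroker.getLocalPort());
            Socket socket = openBackendConnection(
                    Set.of(target), target, CONNECT_TIMEOUT_MS, 200);
            System.out.println("read timed out: " + readTimesOut(socket));
        }
    }
}
```

The point of the design is that every path that can stall (an unreachable broker, a slow connect, a silent peer) ends in a closed socket within a bounded time, which is what keeps local ports from leaking.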
lhotari added a commit to lhotari/pulsar that referenced this issue Feb 4, 2022
lhotari added a commit to lhotari/pulsar that referenced this issue Feb 7, 2022