
Feature request: keepalive or auto-reconnect #663

Open
spidercensus opened this issue Feb 9, 2017 · 4 comments

Comments

@spidercensus
Contributor

I'm working to implement Salt proxy minions to Juniper devices in a customer network where idle-timeouts are configured for 5 minutes. This means that if we don't send something into the session before the 5 minutes are up, the session dies and a new proxy minion must be built.

I have tried adding the keepalive directive to my ssh_config file, but this had no effect.

```
Host *
    ServerAliveInterval
```

Based on the above, it seems like transport layer keepalive is not going to solve this problem. The Netconf spec currently does not contain a keepalive operation, and the IETF mailing list seems to have agreed not to create one: https://www.ietf.org/mail-archive/web/netconf/current/msg08888.html

I see three options for solving this problem:

  1. No changes to the Netconf repo; keepalives implemented entirely in the application.
  2. Implement a keepalive Device parameter and a keepalive thread which executes some RPC on a set interval. I don't like this option because it will fill up device logs and induce CPU churn.
  3. Add an auto_reconnect parameter to Device. This would allow a check to be performed before every RPC is executed to make sure that the underlying SSH transport is still connected and functioning. If it is not, call open() to bring the transport back up before running the RPC.

I think the third option is the easiest to implement. Would you consider a merge if I created this?
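To make the proposal concrete, here is a minimal sketch of the option-3 behavior. The `Device` class below is a stand-in for illustration only, not the real `jnpr.junos.Device`; the `auto_reconnect` parameter and the reconnect check inside `execute()` are the proposed additions.

```python
# Illustrative sketch of option 3 (auto_reconnect). This Device is a
# minimal stand-in, not the actual PyEZ implementation.
class Device:
    def __init__(self, auto_reconnect=False):
        self.auto_reconnect = auto_reconnect
        self.connected = False
        self.opens = 0  # count of open() calls, for demonstration

    def open(self):
        # Stand-in for bringing up the NETCONF-over-SSH transport.
        self.connected = True
        self.opens += 1

    def execute(self, rpc):
        # Proposed behavior: before each RPC, verify the transport and
        # transparently reopen it if the session has been torn down.
        if self.auto_reconnect and not self.connected:
            self.open()
        if not self.connected:
            raise ConnectionError("not connected")
        return "<rpc-reply/>"
```

With `auto_reconnect=True`, an idle-timeout that kills the session between RPCs is repaired transparently on the next `execute()` call instead of raising an error back to the caller.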

@vnitinv
Contributor

vnitinv commented Feb 9, 2017

@spidercensus I think such a check (and the consequent action) should be taken care of by the user's code.
Right now the dev.connected value is static; I am planning to make it a property, so that it returns the current state of the connection. Using this value, users can take action as per their needs.
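As a rough sketch of that suggestion, `connected` could be turned from a flag set once at `open()` into a property that probes the live transport. Names here are illustrative, not the actual PyEZ internals:

```python
# Sketch of `connected` as a dynamic property (illustrative only).
class Device:
    def __init__(self):
        self._conn = None  # stand-in for the NETCONF/SSH session object

    def open(self):
        self._conn = object()

    @property
    def connected(self):
        # Report the *current* state rather than a stale flag.
        return self._conn is not None and self._is_transport_alive()

    def _is_transport_alive(self):
        # Real code would probe the SSH transport (e.g. check the
        # underlying channel); this stub reports True while a session
        # object exists.
        return True
```

A caller could then do `if not dev.connected: dev.open()` before each RPC, which is the user-side pattern being suggested here.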

@spidercensus
Contributor Author

I agree that Device.connected needs to become a property. After my discussion with Stacy, I was going to raise another request for that.

There is value in building this feature into the execute() and cli() methods on demand. Otherwise, frameworks such as Salt will be forced to check the connection state before every RPC request, which incurs greater overhead than an internal check performed only when the flag is turned on.

@spidercensus
Contributor Author

spidercensus commented Feb 9, 2017

Pull #664

stacywsmith added a commit to stacywsmith/py-junos-eznc that referenced this issue Feb 14, 2017
…ONF over SSH sessions.

Without SSH keepalives, a NAT or stateful firewall along the network
path between the PyEZ host and the target Junos device may time out
an inactive TCP flow and cause the NETCONF over SSH session to hang.
Sending SSH keepalives avoids this situation. The default value is
30 seconds. Setting this parameter to a value of 0 disables SSH
keepalives.

Note: This is a different situation than Issue Juniper#663 in which the
target Junos device is timing out the NETCONF over SSH session
due to a configured idle-timeout on the system login class.
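For illustration of the mechanism the commit describes, a periodic keepalive sender can be sketched as a background thread that fires a no-op at a fixed interval. This is purely illustrative and not the py-junos-eznc code (which operates at the SSH protocol layer); `send` stands in for whatever no-op the transport emits:

```python
import threading

# Illustrative keepalive loop: a daemon thread invokes `send` every
# `interval` seconds until `stop_event` is set.
def start_keepalive(send, interval, stop_event):
    def _loop():
        # Event.wait() returns False on timeout (keep going) and True
        # once the event is set (stop).
        while not stop_event.wait(interval):
            send()
    t = threading.Thread(target=_loop, daemon=True)
    t.start()
    return t
```

An interval of 0 would simply never be started, matching the "0 disables keepalives" convention described above.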
@spidercensus
Contributor Author

This is implemented in pull #669
