Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SIP-91] - Enable SSH Tunneling on Database Connections #21789

Closed
hughhhh opened this issue Oct 12, 2022 · 7 comments
Closed

[SIP-91] - Enable SSH Tunneling on Database Connections #21789

hughhhh opened this issue Oct 12, 2022 · 7 comments
Labels
2022.7 sip Superset Improvement Proposal
Projects

Comments

@hughhhh
Copy link
Member

hughhhh commented Oct 12, 2022

Motivation

Users are currently blocked on setting up ssh tunneling entirely through superset. This is causing us to lose potential users to leverage this product as their analytics tool.

Proposed Change

Describe how the feature will be implemented, or the problem will be solved. If possible, include mocks, screenshots, or screencasts (even if from different tools).

  1. wrap @contextmanager around get_sqla_engine()
    1. @contextmanager logic for blocking localhost
      1. Does the DB have ssh tunnel metadata associated with it (stored in a separate table or column)? If it has, create the tunnel binding on 127.0.0.1 and a random port, replace the host and port in the SQLAlchemy URI with 127.0.0.1 and the port that the ssh tunnel created.
      2. If the DB has no ssh tunnel metadata, check the host of the SQL Alchemy URI. If it's localhost or some variation (127.0.0.1, ::1, 0:0:0:0:0:0:0:1, etc.) then block it (unless the config allows).
        1. checking for host name for that resolve to [localhost](http://localhost) as well (library) [there will be a feature flag that will users to override this)
    2. We'll be able to grab the local_binding_port from server object that sshtunnel package returns in the contextmanager
    3. We are blocking localhost due to security concerns for users who main try to get access to other ports within a given deployment/instance. Specifically for Preset, we want to make sure get user's cannot get access to other's db without proper credentials
@contextmanager
def get_sqla_engine_with_context()
        # enable ssh
              # check if ssh tunnel enabled
                   # true -> create tunnel with creds
                   # false -> verify that user isn't connecting to localhost (under config flag)
        
         yield engine
    
         # tear down ssh
  1. Refactor all the places trying to create_engine to have new with format
with get_sqla_engine as engine:
     # use engine with ssh tunnel created
  1. Define schema that is needed for the client to properly SSH tunnel to a remote host
class SSHTunnelCredentials(Schema):
    database_id: int

    server_host: str
    server_port: int
    username: str
    password: Optional[str]
    private_key: Optional[str]
    private_key_password: Optional[str]

    bind_host: str
    bind_port: int
  1. Create table for TunnelConfig that mapped to Database (fk: database_id)
    • migration required and schema should match TunnelConfig
  2. Create tunnel using information provided in the TunnelConfig table for a specific database
    
with sshtunnel.open_tunnel(
    # ...
) as server:
  1. Inside get_sqla_engine with ssh if current db has encrypted_credentials.ssh_tunnel enable tunnel and deconstruct before returning
    1. if doing 2, we’d leave the connection open for this client
  2. Managing SSL with ssh tunnel
    1. if ssh tunnel is enabled + ssl we need allow the certificates to be ignored
    2. For Postgres if you pass sslmode=verify-ca it will ignore the names in the certificates. sslmode=verify-full would fail in this case.

New or Changed Public Interfaces

Describe any new additions to the model, views or REST endpoints. Describe any changes to existing visualizations, dashboards and React components. Describe changes that affect the Superset CLI and how Superset is deployed.

I will be creating a new table name ssh_tunnel_config. This table will hold all the necessary information for the client to establish the connection to any Database living between the proxy.

class SSHTunnelCredentials(Schema):
    database_id: int

    server_host: str
    server_port: int
    username: str
    password: Optional[str]
    private_key: Optional[str]
    private_key_password: Optional[str]

    bind_host: str
    bind_port: int
  • the bind_host + bind_port will be built based upon the information provided in the sqlalchemy_uri.

New dependencies

We'll be leveraging sshtunnel pip package to help establish connections.
https://pypi.org/project/sshtunnel/

@betodealmeida
Copy link
Member

betodealmeida commented Oct 20, 2022

A few comments:

In (1.i.b) it would be nice to explain that you're planning to add a configuration flag to prevent users from connecting to databases in localhost. There are plenty of legit cases for that, so it should be off by default. It probably only makes sense to turn this on for multi-tenant Superset deploymend (Preset, eg). Though I think this is outside the scope of this SIP, and you could leave it out.

For the table schema:

class TunnelConfig(Schema):
	database_id: int # fk
	ssh_server: str, # IP address 
        ssh_username: str, # username for ssh
        ssh_password: str, # password
	remote_server_address: Tuple[str, int] # (REMOTE_SERVER_IP, 443)
        ssh_pkey: Optional[str],
        ssh_private_key_password: Optional[str] 
  • Why are you using ssh_pkey but then ssh_private_key_password? It's better to be consistent, and I'd say, explicit. Let's have ssh_private_key instead of ssh_pkey. Even better, let's drop the superfluous ssh_ prefix and use private_key.
  • If you're allowing authenticating with a private key the password field should be optional.
  • Not all SSH servers run on port 22. You should add a column for the SSH port.
  • remove_server_address is confusing, a better name would be remote_bind_address, since this is not the name of the remote server, but of the server/port the tunnel will bind to. Also, I would store this in separate columns, since not all databases support complex types.

My suggestion for the schema would be:

class SSHTunnelConfig(Schema):

    database_id: int

    server_host: str
    server_port: int
    username: str
    password: Optional[str]
    private_key: Optional[str]
    private_key_password: Optional[str]

    bind_host: str
    bind_port: int

This would map to the following SSH command:

ssh -p ${server_port} :${bind_host}:${bind_port} ${username}@${server_host}

@betodealmeida betodealmeida added sip Superset Improvement Proposal 2022.7 labels Oct 20, 2022
@hughhhh hughhhh changed the title [SIP-88] - Enable SSH Tunneling [SIP-89] - Enable SSH Tunneling Oct 20, 2022
@hughhhh hughhhh changed the title [SIP-89] - Enable SSH Tunneling [SIP-91] - Enable SSH Tunneling Oct 20, 2022
@michael-s-molina
Copy link
Member

michael-s-molina commented Oct 21, 2022

@hughhhh Can we change the title to Enable SSH Tunneling on Database Connections? SSH Tunneling is a generic concept that may be applied in many parts of the application and this SIP is just one part.

+1 to @betodealmeida's comments

@hughhhh hughhhh changed the title [SIP-91] - Enable SSH Tunneling [SIP-91] - Enable SSH Tunneling on Database Connections Oct 21, 2022
@geido
Copy link
Member

geido commented Oct 24, 2022

This looks good with above comments. Thanks @hughhhh!

@craig-rueda
Copy link
Member

Agree with @betodealmeida on this. We should provide a "hook" that would allow folks to override the connection building on their own. If localhost access needs to be blocked, for instance, that logic would land in each custom manager's implementation.

Ideally, we would follow a similar pattern as FAB's SecurityManager where the FQCN is called out in config and is then instantiated at bootstrap time and is wired up accordingly.

@hughhhh
Copy link
Member Author

hughhhh commented Nov 10, 2022

Agree with @betodealmeida on this. We should provide a "hook" that would allow folks to override the connection building on their own. If localhost access needs to be blocked, for instance, that logic would land in each custom manager's implementation.

Ideally, we would follow a similar pattern as FAB's SecurityManager where the FQCN is called out in config and is then instantiated at bootstrap time and is wired up accordingly.

@craig-rueda I'll make sure to build in hook mechanism to allow devs to override the credentials in their ssh tunnel before generating their tunnel

@ktmud
Copy link
Member

ktmud commented Nov 13, 2022

Have we considered other alternatives without using ContextManager? For example, using the do_connect event to apply SSH-tunneling before connection---I'd imagine you can create a new Connection superclass that is SSH aware.

Current ContextManager implementation would still create a new SSH connection whenever a new db connection is created. The downstream code for DbEngineSpec will look much cleaner if you can hide that complexity within the DbEngineSpec itself.

@rusackas rusackas added this to VOTING (in the mailing list) in SIPs Nov 30, 2022
@rusackas
Copy link
Member

rusackas commented Dec 9, 2022

Closing as approved, and updating the project board! Please continue to reference this issue in related PRs whenever relevant!

@rusackas rusackas closed this as completed Dec 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2022.7 sip Superset Improvement Proposal
Projects
Status: Implemented / Done
SIPs
VOTING (in the mailing list)
Development

No branches or pull requests

7 participants