SSH for Postgres Source #5742

Merged
merged 25 commits into from
Sep 2, 2021

cgardens
Contributor

@cgardens cgardens commented Aug 30, 2021

What

How

  • Handles injecting the SSH tunnel in the check, discover, and read methods of PostgresSource directly. This makes it relatively straightforward to inject without worrying about touching other JDBC databases.

Recommended reading order

  1. PostgresSource.java
  2. SshTunnel.java
  3. AbstractSshPostgresSourceAcceptanceTest.java
  4. SshPasswordPostgresSourceAcceptanceTest.java
  5. etc...

Pre-merge Checklist

  • add secrets to GH
  • publish version

@github-actions github-actions bot added the area/connectors Connector related issues label Aug 30, 2021
@cgardens
Contributor Author

cgardens commented Aug 30, 2021

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1183623144

@jrhizor jrhizor temporarily deployed to more-secrets August 30, 2021 20:25 Inactive
@cgardens
Contributor Author

cgardens commented Aug 30, 2021

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1183674111
❌ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1183674111

@jrhizor jrhizor temporarily deployed to more-secrets August 30, 2021 20:43 Inactive
@cgardens
Contributor Author

cgardens commented Aug 30, 2021

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1183716158
❌ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1183716158

@jrhizor jrhizor temporarily deployed to more-secrets August 30, 2021 21:00 Inactive
@cgardens
Contributor Author

cgardens commented Aug 30, 2021

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1183820042
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1183820042

@jrhizor jrhizor temporarily deployed to more-secrets August 30, 2021 21:34 Inactive
@@ -105,7 +106,10 @@ protected String getImageName() {

   @Override
   protected ConnectorSpecification getSpec() throws Exception {
-    return Jsons.deserialize(MoreResources.readResource("spec.json"), ConnectorSpecification.class);
+    final ConnectorSpecification originalSpec = Jsons.deserialize(MoreResources.readResource("spec.json"), ConnectorSpecification.class);
Contributor

Should this use a function to match the actual getspec?

Contributor Author

agreed.

Contributor Author

added it in ssh helpers

@@ -5,6 +5,12 @@ plugins {

dependencies {
implementation 'commons-cli:commons-cli:1.4'
implementation group: 'org.apache.sshd', name: 'sshd-mina', version: '2.7.0'
Contributor

nit: formatting doesn't match the 'commons-cli:commons-cli:1.4' style we're using elsewhere.

@cgardens cgardens mentioned this pull request Aug 30, 2021
4 tasks
Preconditions.checkNotNull(tunnelSshPort);
Preconditions.checkNotNull(user);
Preconditions.checkArgument(sshKey != null || password != null,
"SSH Tunnel was requested to be opened while it was already open. This is a coding error.");
Contributor

is this error message correct? i'm not sure how it relates to the condition?

Contributor Author

nope. fixed.
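
The corrected check is not shown in this thread. As a rough illustration only, a precondition whose message matches its condition might look like the sketch below. `TunnelCredentialCheck` and `requireCredential` are hypothetical names, and plain Java is used in place of Guava's `Preconditions.checkArgument`, which also throws `IllegalArgumentException`.

```java
// Hypothetical helper, not the code from this PR: the actual fix is not shown in the thread.
final class TunnelCredentialCheck {

  private TunnelCredentialCheck() {}

  // The message describes exactly the condition that failed:
  // neither an SSH key nor a password was supplied.
  static void requireCredential(final String sshKey, final String password) {
    if (sshKey == null && password == null) {
      throw new IllegalArgumentException(
          "Either an SSH key or a password must be provided to open the tunnel.");
    }
  }
}
```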

this.tunnelDatabasePort = tunnelDatabasePort;

this.sshclient = createClient();
this.tunnelSession = openTunnel(sshclient);
Contributor

why are we opening a tunnel in the constructor of this object? shouldn't this either be opened explicitly or be re-entrant/a singleton?

Contributor Author

why? the original version was a mix between a factory and this. so i just picked one of the patterns. what does singleton or reentrant get us here?

Contributor

sorry, I mixed two concerns in my initial comment.

  1. The main unexpected behavior I'm seeing here is that we instantiate a tunnel upon creating an object. It just seemed unintuitive. I guess this is fine since it's meant to be used in a try-with-resources?
  2. I remember there was a case where we got into trouble for trying to open the tunnel twice (because read calls check or something) hence the re-entrant suggestion, so we don't create two tunnels.
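
To make the first point concrete, here is a minimal sketch of a resource that is opened in its constructor and released by try-with-resources. `DemoTunnel` is a hypothetical stand-in that only tracks open/closed state; it is not the `SshTunnel` class from this PR and makes no SSH calls.

```java
// Hypothetical stand-in for an SSH tunnel: it only tracks open/closed state.
final class DemoTunnel implements AutoCloseable {

  private boolean open;

  // Opening in the constructor means an instance never exists in an unopened state.
  DemoTunnel() {
    this.open = true;
  }

  boolean isOpen() {
    return open;
  }

  // close() is idempotent, so closing twice is harmless.
  @Override
  public void close() {
    open = false;
  }

  public static void main(final String[] args) {
    DemoTunnel observed = null;
    // try-with-resources calls close() when the block exits, even on an exception.
    try (DemoTunnel tunnel = new DemoTunnel()) {
      observed = tunnel;
      System.out.println("inside block, open = " + tunnel.isOpen());
    }
    System.out.println("after block, open = " + observed.isOpen());
  }
}
```

This sketch does not address the second point: each `new DemoTunnel()` is a separate instance, so avoiding a second tunnel when read calls check would still need to be handled by the caller.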

"multiline": true,
"order": 4
},
"tunnel_db_remote_host": {
Contributor

IMO this should be the current hostname field, no need to add an extra host field.

Transparently we should replace the host field with localhost and use this in the SSH tunnel config instead.

Contributor Author

ah. good point. let me see what i need to do to make that work.

Contributor Author

@sherifnada - i followed the path here. it complicates the code a fair amount, but it does provide a better user experience. i think it is the right thing to do, but would love to get your eyes on it before I go forward. You can either look at this PR as it is now or if you want to see exactly what had to change see this commit: d55049d.

The thing that is a bit complicated is that you need to add logic that:

  1. knows what fields in the config are the host and port
  2. replaces those with localhost and the port that we use for the tunnel
  3. makes sure the source actually uses the new config

Because this is tricky, I went ahead and did the same SshSource decorator that I did for the destination. I think that at least contains all of the complexity in one spot.
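
The three steps above can be sketched roughly as follows. This is an illustration under assumptions, not the decorator from this PR: `TunnelConfigRewriter`, `pointAtTunnel`, the `host`/`port` key names, and the local port `50000` are all hypothetical, and a plain `Map` stands in for the JSON config.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: not the actual SshSource / SshTunnel code.
final class TunnelConfigRewriter {

  private TunnelConfigRewriter() {}

  // Returns a copy of the config in which the given host and port keys point at the
  // local end of the tunnel. The original map is left untouched.
  static Map<String, Object> pointAtTunnel(final Map<String, Object> config,
                                           final String hostKey,
                                           final String portKey,
                                           final int localTunnelPort) {
    final Map<String, Object> rewritten = new HashMap<>(config);
    rewritten.put(hostKey, "localhost");
    rewritten.put(portKey, localTunnelPort);
    return rewritten;
  }

  public static void main(final String[] args) {
    final Map<String, Object> config = new HashMap<>();
    config.put("host", "db.internal.example");
    config.put("port", 5432);
    config.put("database", "postgres");

    // Step 3: the source would be handed the rewritten config instead of the original.
    final Map<String, Object> tunneled = pointAtTunnel(config, "host", "port", 50000);
    System.out.println(tunneled.get("host") + ":" + tunneled.get("port"));
  }
}
```

Returning a copy rather than mutating the input keeps the user's original host and port available for the SSH tunnel itself, which still needs to know the real remote endpoint.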

Contributor

Nice job! I think you placed it in a good place

Contributor

@airbyte-jenny airbyte-jenny left a comment

Marked as approve so I'm non-blocking. The only important change is to modify that toString() method so creds don't end up in a log file. The rest is just style.

", tunnelSshPort='" + tunnelSshPort + '\'' +
", user='" + user + '\'' +
", sshkey='" + sshkey + '\'' +
", password='" + password + '\'' +
Contributor

The secrets (key, pass) should not be included in the toString method, as it can leak secrets to logs.
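
One way to follow this suggestion is to mask the secret fields in `toString()` while still reporting whether they were set. The class below is a hypothetical sketch: the field names mirror the snippet under review, but `TunnelSettings` and `mask` are made up for illustration and are not the PR's code.

```java
// Hypothetical value class; not the SshTunnel class from this PR.
final class TunnelSettings {

  private final String user;
  private final int tunnelSshPort;
  private final String sshkey;
  private final String password;

  TunnelSettings(final String user, final int tunnelSshPort, final String sshkey, final String password) {
    this.user = user;
    this.tunnelSshPort = tunnelSshPort;
    this.sshkey = sshkey;
    this.password = password;
  }

  // Reports only whether a secret is present, never its value.
  private static String mask(final String secret) {
    return secret == null ? "<unset>" : "<redacted>";
  }

  @Override
  public String toString() {
    return "TunnelSettings{" +
        "tunnelSshPort='" + tunnelSshPort + '\'' +
        ", user='" + user + '\'' +
        ", sshkey=" + mask(sshkey) +
        ", password=" + mask(password) +
        '}';
  }
}
```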


if (check.getStatus().equals(AirbyteConnectionStatus.Status.FAILED)) {
throw new RuntimeException("Unable establish a connection: " + check.getMessage());
final SshTunnel tunnel = SshTunnel.getInstance(config);
Contributor

You're getting a tunnel instance here but I don't see a call to open the tunnel. Is that a coding mistake, or is the call happening farther in?

Contributor Author

@cgardens cgardens Aug 30, 2021

i changed the structure of the class so that when the tunnel class is instantiated it immediately creates the actual ssh tunnel.

@cgardens
Contributor Author

cgardens commented Aug 31, 2021

@sherifnada do you have any opinions on the spec. are these the clearest names? I did a rename within the constructor of SshTunnel to names that were easier for me to keep track of. LMK if you like those better and I'll swap them into the spec.

One additional thing I'm considering is removing the name "database" from the fields. After all, the SSH machinery can be used for any resource that we want to access via an SSH tunnel. The only remnant of it being specific to a db at this point is that some fields have the word database in them. So something like:

tunnel_db_remote_host => tunnel_resource_remote_host
tunnel_db_remote_port => tunnel_resource_remote_port

wdyt?

@cgardens
Contributor Author

cgardens commented Aug 31, 2021

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1185049046
❌ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1185049046

@jrhizor jrhizor temporarily deployed to more-secrets August 31, 2021 06:38 Inactive
@cgardens
Contributor Author

cgardens commented Aug 31, 2021

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1187063893
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1187063893

@jrhizor jrhizor temporarily deployed to more-secrets August 31, 2021 17:12 Inactive
@sherifnada
Contributor

tunnel_db_remote_host => tunnel_resource_remote_host
tunnel_db_remote_port => tunnel_resource_remote_port

My only gripe here is that for the foreseeable future, we're only gonna be doing this for DBs. I think the actual variable name is better with your change, but the title should say database not resource. At least until we modify the injectSshConfig method to allow setting the title.

@cgardens
Contributor Author

cgardens commented Aug 31, 2021

My only gripe here is that for the foreseeable future, we're only gonna be doing this for DBs.

@sherifnada agreed. it is one of those things that is pretty painful to change that's why i was thinking about being forward thinking. i'll do as you suggest, i will change the names in the spec to say resource, but the titles will say db. thanks!

once that's done, i'll publish and merge.

@cgardens
Contributor Author

cgardens commented Sep 1, 2021

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1191076063
❌ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1191076063

@jrhizor jrhizor temporarily deployed to more-secrets September 1, 2021 17:01 Inactive
@cgardens
Contributor Author

cgardens commented Sep 2, 2021

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1194204683
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1194204683

@jrhizor jrhizor temporarily deployed to more-secrets September 2, 2021 12:43 Inactive
@github-actions github-actions bot added the area/documentation Improvements or additions to documentation label Sep 2, 2021
@cgardens cgardens changed the base branch from cgardens/pg_ssh_final to master September 2, 2021 13:50
@cgardens
Contributor Author

cgardens commented Sep 2, 2021

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1194443376
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1194443376

@jrhizor jrhizor temporarily deployed to more-secrets September 2, 2021 13:53 Inactive
@cgardens
Contributor Author

cgardens commented Sep 2, 2021

/publish connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1194813214
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/1194813214

Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation
4 participants