Dataproc SSL Issue #177

Closed
yoong93 opened this issue Feb 7, 2018 · 5 comments

@yoong93

yoong93 commented Feb 7, 2018

I opened this issue before (#166), but I am still having the problem, and I think it is related to Dataproc rather than simply to SQL Server.

I am trying to connect to SQL Server 2017 from Dataproc, but the SSL handshake fails. I am testing with three instances: a Windows VM with MySQL and SQL Server 2017 installed, a Debian VM with MySQL and PostgreSQL installed, and a Dataproc cluster from which I am trying to connect to those servers. All three instances are in the same network, and for the MSSQL connection I am using jdbc 6.2.2 and SQL Server Driver 13.
The connection works fine with sqlalchemy and pyodbc for all three servers (MySQL, PostgreSQL, MSSQL), but when using Spark and the JDBC driver, the socket closes during the SSL handshake when connecting to MSSQL. I also tested all cases from my local Spark, and they all worked, including connecting to SQL Server with Spark. Here is the error from Dataproc when I tried connecting to MSSQL:

py4j.protocol.Py4JJavaError: An error occurred while calling o94.load. : com.microsoft.sqlserver.jdbc.SQLServerException: The driver could not establish a secure connection to SQL Server by using Secure Sockets Layer (SSL) encryption. 
Error: "Socket is closed". ClientConnectionId:43d4f0fe-5c31-4c69-b434-1ffd1b16a4a4 
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:2435) 
    at com.microsoft.sqlserver.jdbc.TDSChannel.enableSSL(IOBuffer.java:1816) 
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:2022) 
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:1687) 
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:1528) 
    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:866) 
    at com.microsoft.sqlserver.jdbc.SQLServerDriver.connect(SQLServerDriver.java:569) 
    at org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper.connect(DriverWrapper.scala:45) 
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:61) 
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:52) 
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:58) 
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:113)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:47) 
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306) 
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178) 
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498) 
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) 
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) 
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) 
    at py4j.commands.CallCommand.execute(CallCommand.java:79) 
    at py4j.GatewayConnection.run(GatewayConnection.java:214) 
    at java.lang.Thread.run(Thread.java:748) 
Caused by: java.net.SocketException: Socket is closed
    at org.conscrypt.SslWrapper.doHandshake(SslWrapper.java:368)
    at org.conscrypt.ConscryptFileDescriptorSocket.startHandshake(ConscryptFileDescriptorSocket.java:211) 
    at org.conscrypt.Java8SocketWrapper.startHandshake(Java8SocketWrapper.java:324) 
    at com.microsoft.sqlserver.jdbc.TDSChannel.enableSSL(IOBuffer.java:1753)
    ... 25 more
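
For readers trying to reproduce this, the `o94.load` call in the trace corresponds to a Spark JDBC read. The original code was not posted, so the following is only a sketch of what such a call might look like. The helper name `sqlserver_jdbc_options` and the host, database, table, and credentials in the usage comment are hypothetical placeholders.

```python
def sqlserver_jdbc_options(host, database, user, password, table, port=1433):
    """Build the options for spark.read.format("jdbc") against SQL Server.

    Hypothetical helper, not taken from the original report.
    """
    url = f"jdbc:sqlserver://{host}:{port};databaseName={database}"
    return {
        "url": url,
        # Driver class shipped in Microsoft's mssql-jdbc jar.
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "dbtable": table,
        "user": user,
        "password": password,
    }


# Usage on the cluster (assumes an existing SparkSession named `spark` and
# the mssql-jdbc jar on the classpath; all values are placeholders):
# opts = sqlserver_jdbc_options("10.0.0.5", "testdb", "sa", "secret", "dbo.mytable")
# df = spark.read.format("jdbc").options(**opts).load()
```

The SSL handshake happens inside `load()`, when the driver opens its first connection to resolve the table schema, which matches the `resolveTable` frame in the trace.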

Here is what I have checked so far:

  1. Dataproc Configuration - I checked all the firewall rules, and they allow all ingress and egress connections. If a firewall were blocking traffic between Dataproc and SQL Server, I think it would also have blocked the connections made with sqlalchemy and pyodbc.

  2. SQL Server Configuration - From local Spark or AWS (for testing), I could easily connect to the SQL Server without any certificate. Connections from Dataproc always failed. I read about a similar case here (https://blogs.msdn.microsoft.com/dataaccesstechnologies/2016/11/30/intermittent-jdbc-connectivity-issue-the-driver-could-not-establish-a-secure-connection-to-sql-server-by-using-secure-sockets-layer-ssl-encryption-error-sql-server-returned-an-incomplete-respons/), but in my case the failure happens every time, and my version is up to date.
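
The firewall reasoning in item 1 can be checked directly from a Dataproc node with a plain TCP probe. This is a minimal sketch, assuming SQL Server listens on its default port 1433. The function name `can_reach` and the address in the usage comment are placeholders.

```python
import socket


def can_reach(host, port=1433, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Usage from a Dataproc node (the address is a placeholder):
# print(can_reach("10.0.0.5"))
```

A successful probe only shows that the port is reachable. It does not exercise the TLS handshake, which is the step that fails in the trace above.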

That is where things stand right now. Can you help me with this issue?

Also, in my last trial, when I forced SSL for MySQL (on the instance where SQL Server 2017 is installed), Dataproc could connect to the server without any certificate, whereas my local Spark could not. So I assume Dataproc automatically finds a certificate for MySQL SSL but apparently cannot find the right one for MSSQL SSL. Can you tell me where the certificate for MSSQL is located so that I can try it out? I tried the cacerts in /etc/ssl/java/, but that didn't work.

Thanks!

@yoong93
Author

yoong93 commented Feb 7, 2018

I posted this on the Google issue tracker (https://issuetracker.google.com/issues/72886952), and they suggested that I raise the issue here. Can I get some help with this?

@karth295
Contributor

karth295 commented Feb 7, 2018

CC @bsidhom as the expert on Conscrypt/SSL.

@yoong93 Dataproc by default uses Conscrypt (https://github.com/google/conscrypt) for SSL performance, and I see that library in the stacktrace. Just for my sanity, can you try the Java-native SSL implementation by creating a cluster with --properties dataproc:dataproc.conscrypt.provider.enable=false? E.g.

gcloud dataproc clusters create <cluster-name> --properties dataproc:dataproc.conscrypt.provider.enable=false

Also, try spinning up a raw GCE VM on the same subnet (so the firewall rules are the same), and check whether you can connect to your SQL server.

@yoong93
Author

yoong93 commented Feb 7, 2018

  1. Turning off Conscrypt -- it worked. You said this is just a sanity check, but does that simply turn off all SSL for the connection? I am not very familiar with security issues, but I would rather not turn off security altogether.
  2. This also worked for me. I set up an instance, installed Spark and the drivers, and it worked fine.

@karth295
Contributor

karth295 commented Feb 7, 2018

If you disable Conscrypt in Dataproc, it falls back to the Java-builtin SSL implementation. So it's still secure -- you are not turning off SSL. Conscrypt is just a more optimized (C-based) library, so we use it for performance, particularly when talking to GCS.
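
To confirm which case a given cluster is in, one option is to list the JVM's registered security providers from PySpark and look for Conscrypt. The sketch below is not an official Dataproc check. The helper `conscrypt_registered` is hypothetical, and it assumes the provider's name contains "Conscrypt".

```python
def conscrypt_registered(provider_names):
    """Return True if any JVM security provider name mentions Conscrypt.

    Assumption: the Conscrypt provider registers under a name that
    contains "Conscrypt" (case-insensitive match used here).
    """
    return any("conscrypt" in name.lower() for name in provider_names)


# Usage in a PySpark session (assumes an existing SparkSession named `spark`;
# `_jvm` is PySpark's py4j gateway to the driver JVM):
# providers = spark.sparkContext._jvm.java.security.Security.getProviders()
# print(conscrypt_registered([p.getName() for p in providers]))
```

Registration alone does not prove that Conscrypt handles a particular connection, since provider order also matters, but a cluster created with the property set to false would be expected to report no Conscrypt provider.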

This issue on Conscrypt seems potentially related: google/conscrypt#104. If you're really interested in digging further, consider installing Conscrypt on a raw GCE VM and setting up a repro for the Conscrypt folks.

I'm going to close this issue for now since it isn't directly a Dataproc bug, nor is it related to the initialization actions in this repository.

@karth295 karth295 closed this as completed Feb 7, 2018
@yoong93
Author

yoong93 commented Feb 8, 2018

ok thanks for help :)
