
[ISSUE] databricks_mount fails to mount AWS S3 bucket with Unable to execute HTTP request: Remote host terminated the handshake #1500

Closed
mvrangeme opened this issue Jul 26, 2022 · 6 comments
Labels
invalid This issue is not relevant to this provider or works as designed

Comments

@mvrangeme

Configuration

resource "aws_s3_bucket" "this" {
  bucket = "test_databricks_data"
}

resource "aws_s3_bucket_acl" "this" {
  bucket = aws_s3_bucket.this.id
  acl    = "private"
}

resource "aws_s3_bucket_server_side_encryption_configuration" "this" {
  bucket = aws_s3_bucket.this.bucket
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "this" {
  bucket                  = aws_s3_bucket.this.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_versioning" "this" {
  bucket = aws_s3_bucket.this.id
  versioning_configuration {
    status = "Enabled"
  }
}

# This is the resource which is failing / timing out
resource "databricks_mount" "data" {
  name       = "data"
  cluster_id = var.general_purpose_cluster_id
  s3 {
    instance_profile = var.databricks_instance_profile_id
    bucket_name      = aws_s3_bucket.this.id
  }
}

Note that

  • I've tested this with and without a bucket policy. It doesn't seem to make a difference, so I've not included the bucket policy in the configuration I'm sharing
  • The S3 bucket is in the same region as the general purpose cluster instances
  • The instance profile role attached to the general purpose cluster instances is as follows:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowList",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::test_databricks_data"
        },
        {
            "Sid": "AllowObjectActions",
            "Effect": "Allow",
            "Action": [
                "s3:PutObjectAcl",
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": "arn:aws:s3:::test_databricks_data/*"
        }
    ]
}

Expected Behavior

What should have happened?

  • The databricks_mount resource should successfully reach the "created" status
  • The /mnt/data mount should be accessible on Databricks

Actual Behavior

What actually happened?

  • The databricks_mount resource is in "creating" status for 10 minutes and eventually times out
  • The /mnt/data mount actually is created, but the terraform apply command still fails because the databricks_mount resource times out in the "creating" status
  • The Log4j logs on the general purpose cluster shows quite a few instances of the following error:
22/07/19 04:11:06 ERROR DatabricksS3LoggingUtils$:V3: S3 request failed with com.amazonaws.SdkClientException: Unable to execute HTTP request: Remote host terminated the handshake; Request ID: null, Extended Request ID: null, Cloud Provider: AWS, Instance ID: i-0c463b179dd21c571
com.amazonaws.SdkClientException: Unable to execute HTTP request: Remote host terminated the handshake
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1216)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1162)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:811)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:713)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:695)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:539)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5453)
	at com.amazonaws.services.s3.AmazonS3Client.getBucketRegionViaHeadRequest(AmazonS3Client.java:6428)
	at com.amazonaws.services.s3.AmazonS3Client.fetchRegionFromCache(AmazonS3Client.java:6401)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5438)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5400)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5394)
	at com.amazonaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:971)
	at shaded.databricks.org.apache.hadoop.fs.s3a.EnforcingDatabricksS3Client.listObjectsV2(EnforcingDatabricksS3Client.scala:214)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listObjects$5(S3AFileSystem.java:1852)
	at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:333)
	at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:294)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:1843)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3388)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3348)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3287)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.innerListStatus(S3AFileSystem.java:2809)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listStatus$12(S3AFileSystem.java:2788)
	at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:118)
	at shaded.databricks.org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:112)
	at shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem.listStatus(S3AFileSystem.java:2788)
	at com.databricks.backend.daemon.data.client.DBFSV2.$anonfun$listStatus$2(DatabricksFileSystemV2.scala:97)
	at com.databricks.s3a.S3AExceptionUtils$.convertAWSExceptionToJavaIOException(DatabricksStreamUtils.scala:66)
	at com.databricks.backend.daemon.data.client.DBFSV2.$anonfun$listStatus$1(DatabricksFileSystemV2.scala:94)
	at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:413)
	at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:507)
	at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:528)
	at com.databricks.logging.Log4jUsageLoggingShim$.$anonfun$withAttributionContext$1(Log4jUsageLoggingShim.scala:29)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:94)
	at com.databricks.logging.Log4jUsageLoggingShim$.withAttributionContext(Log4jUsageLoggingShim.scala:27)
	at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:283)
	at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:282)
	at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.withAttributionContext(DatabricksFileSystemV2.scala:512)
	at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:318)
	at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:303)
	at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.withAttributionTags(DatabricksFileSystemV2.scala:512)
	at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:502)
	at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:422)
	at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.recordOperationWithResultTags(DatabricksFileSystemV2.scala:512)
	at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:413)
	at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:385)
	at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.recordOperation(DatabricksFileSystemV2.scala:512)
	at com.databricks.backend.daemon.data.client.DBFSV2.listStatus(DatabricksFileSystemV2.scala:94)
	at com.databricks.backend.daemon.data.client.DatabricksFileSystem.listStatus(DatabricksFileSystem.scala:164)
	at com.databricks.backend.daemon.dbutils.FSUtils$.$anonfun$ls$1(DBUtilsCore.scala:157)
	at com.databricks.backend.daemon.dbutils.FSUtils$.withFsSafetyCheck(DBUtilsCore.scala:91)
	at com.databricks.backend.daemon.dbutils.FSUtils$.ls(DBUtilsCore.scala:155)
	at com.databricks.backend.daemon.dbutils.FSUtils.ls(DBUtilsCore.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
	at java.lang.Thread.run(Thread.java:748)
Caused by: javax.net.ssl.SSLHandshakeException: Remote host terminated the handshake
	at sun.security.ssl.SSLSocketImpl.handleEOF(SSLSocketImpl.java:1596)
	at sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1426)
	at sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1324)
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:439)
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:410)
	at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:436)
	at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:384)
	at com.amazonaws.http.conn.ssl.SdkTLSSocketFactory.connectSocket(SdkTLSSocketFactory.java:142)
	at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
	at com.amazonaws.http.conn.$Proxy55.connect(Unknown Source)
	at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
	at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1343)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1154)
	... 67 more
Caused by: java.io.EOFException: SSL peer shut down incorrectly
	at sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:481)
	at sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:470)
	at sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:160)
	at sun.security.ssl.SSLTransport.decode(SSLTransport.java:110)
	at sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1418)
	... 90 more

Steps to Reproduce

Please list the steps required to reproduce the issue, for example:

  1. terraform apply

Terraform and provider versions

Please paste the output of terraform version. If your version of the databricks provider is not the latest (https://github.com/databricks/terraform-provider-databricks/releases), please make sure to upgrade to the latest one.

Terraform v1.2.4
on darwin_amd64
+ provider registry.terraform.io/databricks/databricks v1.0.0
+ provider registry.terraform.io/hashicorp/aws v4.21.0

Debug Output

Please turn on logging, e.g. TF_LOG=DEBUG terraform apply, run the command again, paste the output to a gist, and provide the link to the gist. If you'd rather paste log output inline, make sure you provide only the relevant log lines with requests.

It would be more readable if you piped the log through | grep databricks | sed -E 's/^.* plugin[^:]+: (.*)$/\1/', e.g.:

TF_LOG=DEBUG terraform plan 2>&1 | grep databricks | sed -E 's/^.* plugin[^:]+: (.*)$/\1/'

Relevant output as follows:

2022-07-26T16:18:45.300+1000 [DEBUG] ProviderTransformer: "module.storage.databricks_mount.data" (*terraform.NodeValidatableResource) needs provider["registry.terraform.io/databricks/databricks"].mgmt

2022-07-26T16:18:45.305+1000 [DEBUG] ReferenceTransformer: "module.storage.databricks_mount.data" references: [module.storage.var.gp_cluster_id (expand) module.storage.var.databricks_instance_profile_id (expand) module.storage.local.data_science_pipeline_bucket_id (expand)]

If Terraform produced a panic, please provide a link to a GitHub Gist containing the output of the crash.log.

Important Factoids

Is there anything atypical about your accounts that we should know?

@nfx
Contributor

nfx commented Jul 26, 2022

relevant note to DBFS team:

22/07/19 04:11:06 ERROR DatabricksS3LoggingUtils$:V3: S3 request failed with com.amazonaws.SdkClientException: Unable to execute HTTP request: Remote host terminated the handshake; Request ID: null, Extended Request ID: null, Cloud Provider: AWS, Instance ID: i-0c463b179dd21c571
...
	at com.databricks.backend.daemon.data.client.DBFSV2.$anonfun$listStatus$2(DatabricksFileSystemV2.scala:97)
...
Caused by: javax.net.ssl.SSLHandshakeException: Remote host terminated the handshake
Caused by: java.io.EOFException: SSL peer shut down incorrectly

@stormwindy

Are there any firewall/VPC settings possibly not whitelisting S3 endpoints, including regional ones? Off the top of my head, regional endpoint whitelisting should be done for the region where the bucket exists.
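One quick way to test that hypothesis from a cluster node is a direct probe of the regional endpoint. This is a sketch, not something from the thread: the region is a placeholder, and the curl call is left commented out since it needs network access from inside the VPC.

```shell
# Hypothetical reachability check: derive the regional S3 endpoint for the
# bucket's region and (optionally) attempt a TLS handshake against it.
REGION="us-west-1"                      # placeholder; use your bucket's region
ENDPOINT="s3.${REGION}.amazonaws.com"
echo "regional S3 endpoint: ${ENDPOINT}"
# From a cluster node, a firewall that drops the Client Hello reproduces the
# "Remote host terminated the handshake" error; uncomment to probe:
# curl -sS --connect-timeout 5 "https://${ENDPOINT}/" -o /dev/null \
#   || echo "TLS handshake to ${ENDPOINT} failed (firewall?)"
```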

@thaiphv
Contributor

thaiphv commented Jul 27, 2022

This is the same issue that I ran into. In my case I was using an S3 gateway endpoint. For some reason, newly created buckets can't be queried via the global endpoint s3.amazonaws.com. However, they can be queried via the regional endpoint of the region where they were created, for example s3.ap-southeast-2.amazonaws.com in my case.

I lodged issue #1492 hoping that I wouldn't have to create a dedicated cluster just for mounting S3 buckets in my workspace. Instead, the Terraform provider could spin up a new cluster with a given instance profile, and I could pass the fs.s3a.endpoint setting to the dbutils.fs.mount function via the extra_configs attribute.

However, since that issue was rejected, I had to work around it by doing something like the following:

resource "aws_s3_bucket" "workspace" {
  bucket = local.s3_bucket_name
}

resource "databricks_mount" "workspace" {
  name       = local.workspace_name
  cluster_id = var.generic_cluster_id
  uri        = "s3a://${aws_s3_bucket.workspace.id}"
  extra_configs = {
    "fs.s3a.endpoint" = "s3.${var.aws_region}.amazonaws.com"
  }
}

@mvrangeme
Author

Thanks for the feedback so far. I'm attempting @thaiphv's workaround now.

There was some truth to @stormwindy's comment too. We had a firewall that was not whitelisting a regional endpoint.

I'll update this issue after testing today.

@mvrangeme
Author

Ok so it's definitely related to our AWS Network Firewall. Here are some relevant packets that I captured:

[screenshot: packet capture showing the TLS Client Hello going unanswered]

So you can see that the Client Hello is not acknowledged.

I'm still working out exactly what's wrong with our Network Firewall rules. I'll update this issue when I have a resolution.

@nfx nfx changed the title [ISSUE] Provider issue [ISSUE] databricks_mount fails to mount AWS S3 bucket with Unable to execute HTTP request: Remote host terminated the handshake Jul 28, 2022
@nfx nfx added the invalid This issue is not relevant to this provider or works as designed label Jul 28, 2022
@mvrangeme
Author

mvrangeme commented Jul 29, 2022

Ok fixed 😓 .

Issue was that our Databricks cluster is deployed to the us-west-2 region and we were trying to mount an S3 bucket in the us-west-1 region.

Traffic to us-west-2 S3 buckets goes via an S3 VPC endpoint and therefore bypasses our Network Firewall, whereas traffic to the us-west-1 bucket hits the firewall.

The fix was to punch a hole through our Network Firewall allowing access to the s3.us-west-1.amazonaws.com endpoint.
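For anyone hitting the same wall, that kind of hole can be sketched in Terraform as an AWS Network Firewall domain-list rule group. This is a hypothetical sketch, not the configuration actually used in this thread; the resource name, capacity, and target domains are illustrative, and attribute names should be verified against the AWS provider documentation for your version.

```hcl
# Hypothetical stateful rule group allowing traffic to the regional S3
# endpoint (and bucket subdomains) through AWS Network Firewall.
resource "aws_networkfirewall_rule_group" "allow_s3_regional" {
  name     = "allow-s3-us-west-1"
  type     = "STATEFUL"
  capacity = 10

  rule_group {
    rules_source {
      rules_source_list {
        generated_rules_type = "ALLOWLIST"
        target_types         = ["TLS_SNI", "HTTP_HOST"]
        targets = [
          "s3.us-west-1.amazonaws.com",
          ".s3.us-west-1.amazonaws.com", # leading dot matches bucket subdomains
        ]
      }
    }
  }
}
```

The rule group would then be referenced from the firewall policy via a stateful_rule_group_reference block.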

Thanks to everyone for their comments!

4 participants