
S3 HTTP connection pool timeout when many files are processed multiple times #803

@yruslan

Description

Describe the bug

This happens because some streams are created but never closed, so their HTTP connections are never returned to the pool. The error is observed on EMR.

com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1219)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5403)
	at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1372)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.call.GetObjectMetadataCall.perform(GetObjectMetadataCall.java:26)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.call.GetObjectMetadataCall.perform(GetObjectMetadataCall.java:12)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor$CallPerformer.call(GlobalS3Executor.java:111)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor.execute(GlobalS3Executor.java:138)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:186)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.getObjectMetadata(AmazonS3LiteClient.java:96)
	at com.amazon.ws.emr.hadoop.fs.s3.lite.AbstractAmazonS3Lite.getObjectMetadata(AbstractAmazonS3Lite.java:43)
	at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.getFileMetadataFromCacheOrS3(Jets3tNativeFileSystemStore.java:636)
	at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:320)
	at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:512)
	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1767)
	at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.exists(EmrFileSystem.java:402)
	at za.co.absa.cobrix.spark.cobol.source.streaming.FileStreamer.getHadoopPath(FileStreamer.scala:125)
	at za.co.absa.cobrix.spark.cobol.source.streaming.FileStreamer.<init>(FileStreamer.scala:44)
	at za.co.absa.cobrix.spark.cobol.source.scanners.CobolScanners$.$anonfun$buildScanForVarLenIndex$2(CobolScanners.scala:52)

Code snippet that caused the issue

  spark.read
    .format("cobol")
    .option("copybook", "..some.copybook...")
    .load("s3://bucket/path")

Expected behavior

All files opened on S3 need to be closed so that their HTTP connections are returned to the pool.
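
A minimal sketch of the close-on-exit pattern the fix needs, assuming FileStreamer (seen in the trace) opens one Hadoop FSDataInputStream per file. withS3Stream and readHeader are hypothetical names used for illustration, not Cobrix APIs:

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FSDataInputStream, FileSystem, Path}

  // Hypothetical helper: opens a file via the Hadoop FileSystem API and
  // guarantees the stream is closed even if the body throws.
  def withS3Stream[T](pathStr: String)(readHeader: FSDataInputStream => T): T = {
    val path = new Path(pathStr)
    // FileSystem instances are cached by Hadoop, so we do not close `fs` itself.
    val fs: FileSystem = path.getFileSystem(new Configuration())
    val in: FSDataInputStream = fs.open(path)
    try {
      readHeader(in)
    } finally {
      // Closing the stream releases the underlying S3 HTTP connection back
      // to the pool; leaking it eventually exhausts the pool and produces
      // the "Timeout waiting for connection from pool" error above.
      in.close()
    }
  }

As a stop-gap, raising the EMRFS pool size (the fs.s3.maxConnections setting) only postpones the timeout while streams keep leaking; the real fix is to close every stream.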

Context

  • Cobrix version: 2.9.3

Copybook (if possible)

--
