Add S3 put plus get bucketWithVersioning API #84

mdedetrich · 2023-04-23T08:19:56Z

So the context behind this PR is that I am trying to get integration tests working against the real AWS S3 (see #67) and as part of that integration I am implementing the ability of the test suite to dynamically create/delete buckets (to prevent flaky test issues as a result of concurrency/left over state). In that work I ended up realizing that in order to get some of the tests to work I have to create buckets with versioning hence why I am making this PR.

Slightly annoyingly, the API of https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketVersioning.html / https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketVersioning.html has some unique peculiarities which made it harder to model. For starters the API differentiates between the versioning status of a bucket never being set versus it being set to false, alongside that there is an MFA field which is also part of the request hence why I modelled it. There are integration tests that verify this behaviour (tested on my companies AWS account) however since its highly impractical to test MFA I just ended up writing a unit test to make sure the headers are being correctly set (see "add x-amz-mfa headers for a putBucketVersioning request").

There are some important design decisions which should be considered that I am definitely open to changing, i.e. both the status and mfaDelete fields are being modelled with an ADT rather than a Boolean. This is because the status field doesn't appear to be a true boolean (its Enabled/Suspended rather than just Enabled/Disabled) which means there is a chance that more fields can be added and also because of what was mentioned before wrt to the API explicitly modelling the case where a field is never set vs it being set to false you would end up with Option[Boolean] which isn't the nicest. There is however an exception to this which is BucketVersioningResult.mfaField which does contain Option[Boolean]. This is because when setting the status field via put you have to provide actual MFA details where as when you retrieve the status via a get result its just telling whether its never being set/enabled/disabled. If I wanted to model this I would have to create an entirely separate ADT just for that Boolean like value since MFAStatus is already being used for the put request and I didn't want to create so many types for what is largely going to be a case that most people won't care about.

mdedetrich · 2023-04-23T08:23:27Z

s3/src/main/scala/org/apache/pekko/stream/connectors/s3/impl/Marshalling.scala

+      case x =>
+        val status = (x \ "Status").headOption.map(_.text match {
+          case "Enabled"   => BucketVersioningStatus.Enabled
+          case "Suspended" => BucketVersioningStatus.Suspended


So I am slightly torn on what to do here. As explained previously I have an impression that there could be a chance that a new value gets added to this field (since its Suspended rather than just being Disabled). If this happens then you would get a match runtime error here, there are alternatives however any of these alternatives would be a completely new case for Marshalling.scala, i.e.

If we get a value other than "Enabled"/"Suspended" we can just return "Suspended" but then log an error saying that we discovered a new field. This means we won't break end users but it might also cause unintended behaviour

Throw a proper exception rather than just a match error, no such exception actually exists currently since this would be a first

mdedetrich · 2023-04-28T08:57:13Z

@He-Pin @pjfanning @nvollmar Would it be possible to look into this? This ticket is blocking adding in integration tests for AWS S3 which I would prefer to get done before release because I want to verify that there aren't any more problems with the S3 client.

pjfanning

lgtm - I don't think we need to worry as much about mainintaining legacy alpakka behaviour in the specific connectors, if the legacy code is causing trouble

pjfanning · 2023-04-28T10:01:00Z

s3/src/main/scala/org/apache/pekko/stream/connectors/s3/impl/HttpRequests.scala

 import pekko.stream.scaladsl.Source
 import pekko.util.ByteString
 import software.amazon.awssdk.regions.Region

 import scala.collection.immutable.Seq
 import scala.concurrent.{ ExecutionContext, Future }
+import scala.xml.NodeSeq


one minor quibble - scala.xml is a bit unloved, the use here is small but it might be better just to use Java's in-built XML support (JAXP API)

For now I will keep using Scala xml because that is what's used throughout the codebase, can always file an issue about it

mdedetrich · 2023-04-28T10:13:49Z

lgtm - I don't think we need to worry as much about mainintaining legacy alpakka behaviour in the specific connectors, if the legacy code is causing trouble

We are actually using this so there is selfish motivation here but aside from that I am working with INFRA to get an S3 account for integration tests and they are waiting for a reason.

He-Pin · 2023-04-28T10:58:15Z

Sorry I never used s3 😓😓😓

mdedetrich requested review from jrudolph, gmethvin, He-Pin, nvollmar, seglo, pjfanning and raboof April 23, 2023 08:19

mdedetrich commented Apr 23, 2023

View reviewed changes

Add S3 put plus get bucketWithVersioning API

38b5d02

mdedetrich force-pushed the add-put-get-bucket-versioning branch from 3a8057c to 38b5d02 Compare April 25, 2023 06:21

pjfanning approved these changes Apr 28, 2023

View reviewed changes

pjfanning reviewed Apr 28, 2023

View reviewed changes

mdedetrich merged commit ae4567e into apache:main Apr 28, 2023
48 of 50 checks passed

mdedetrich mentioned this pull request Apr 29, 2023

Add bucketVersioningEnabled helper methods #86

Merged

pjfanning added the release note should be mentioned in release notes label Jun 8, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add S3 put plus get bucketWithVersioning API #84

Add S3 put plus get bucketWithVersioning API #84

mdedetrich commented Apr 23, 2023 •

edited

mdedetrich Apr 23, 2023 •

edited

mdedetrich commented Apr 28, 2023

pjfanning left a comment

pjfanning Apr 28, 2023

mdedetrich Apr 28, 2023

mdedetrich commented Apr 28, 2023

He-Pin commented Apr 28, 2023

Add S3 put plus get bucketWithVersioning API #84

Add S3 put plus get bucketWithVersioning API #84

Conversation

mdedetrich commented Apr 23, 2023 • edited

mdedetrich Apr 23, 2023 • edited

Choose a reason for hiding this comment

mdedetrich commented Apr 28, 2023

pjfanning left a comment

Choose a reason for hiding this comment

pjfanning Apr 28, 2023

Choose a reason for hiding this comment

mdedetrich Apr 28, 2023

Choose a reason for hiding this comment

mdedetrich commented Apr 28, 2023

He-Pin commented Apr 28, 2023

mdedetrich commented Apr 23, 2023 •

edited

mdedetrich Apr 23, 2023 •

edited