Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add S3 put plus get bucketWithVersioning API #84

Merged
merged 1 commit into from Apr 28, 2023

Conversation

mdedetrich
Copy link
Contributor

@mdedetrich mdedetrich commented Apr 23, 2023

So the context behind this PR is that I am trying to get integration tests working against the real AWS S3 (see #67) and as part of that integration I am implementing the ability of the test suite to dynamically create/delete buckets (to prevent flaky test issues as a result of concurrency/left over state). In that work I ended up realizing that in order to get some of the tests to work I have to create buckets with versioning hence why I am making this PR.

Slightly annoyingly, the API of https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketVersioning.html / https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetBucketVersioning.html has some unique peculiarities which made it harder to model. For starters the API differentiates between the versioning status of a bucket never being set versus it being set to false, alongside that there is an MFA field which is also part of the request hence why I modelled it. There are integration tests that verify this behaviour (tested on my companies AWS account) however since its highly impractical to test MFA I just ended up writing a unit test to make sure the headers are being correctly set (see "add x-amz-mfa headers for a putBucketVersioning request").

There are some important design decisions which should be considered that I am definitely open to changing, i.e. both the status and mfaDelete fields are being modelled with an ADT rather than a Boolean. This is because the status field doesn't appear to be a true boolean (its Enabled/Suspended rather than just Enabled/Disabled) which means there is a chance that more fields can be added and also because of what was mentioned before wrt to the API explicitly modelling the case where a field is never set vs it being set to false you would end up with Option[Boolean] which isn't the nicest. There is however an exception to this which is BucketVersioningResult.mfaField which does contain Option[Boolean]. This is because when setting the status field via put you have to provide actual MFA details where as when you retrieve the status via a get result its just telling whether its never being set/enabled/disabled. If I wanted to model this I would have to create an entirely separate ADT just for that Boolean like value since MFAStatus is already being used for the put request and I didn't want to create so many types for what is largely going to be a case that most people won't care about.

case x =>
val status = (x \ "Status").headOption.map(_.text match {
case "Enabled" => BucketVersioningStatus.Enabled
case "Suspended" => BucketVersioningStatus.Suspended
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I am slightly torn on what to do here. As explained previously I have an impression that there could be a chance that a new value gets added to this field (since its Suspended rather than just being Disabled). If this happens then you would get a match runtime error here, there are alternatives however any of these alternatives would be a completely new case for Marshalling.scala, i.e.

  • If we get a value other than "Enabled"/"Suspended" we can just return "Suspended" but then log an error saying that we discovered a new field. This means we won't break end users but it might also cause unintended behaviour
  • Throw a proper exception rather than just a match error, no such exception actually exists currently since this would be a first

@mdedetrich mdedetrich force-pushed the add-put-get-bucket-versioning branch from 3a8057c to 38b5d02 Compare April 25, 2023 06:21
@mdedetrich
Copy link
Contributor Author

@He-Pin @pjfanning @nvollmar Would it be possible to look into this? This ticket is blocking adding in integration tests for AWS S3 which I would prefer to get done before release because I want to verify that there aren't any more problems with the S3 client.

Copy link
Contributor

@pjfanning pjfanning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm - I don't think we need to worry as much about mainintaining legacy alpakka behaviour in the specific connectors, if the legacy code is causing trouble

import pekko.stream.scaladsl.Source
import pekko.util.ByteString
import software.amazon.awssdk.regions.Region

import scala.collection.immutable.Seq
import scala.concurrent.{ ExecutionContext, Future }
import scala.xml.NodeSeq
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one minor quibble - scala.xml is a bit unloved, the use here is small but it might be better just to use Java's in-built XML support (JAXP API)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I will keep using Scala xml because that is what's used throughout the codebase, can always file an issue about it

@mdedetrich
Copy link
Contributor Author

lgtm - I don't think we need to worry as much about mainintaining legacy alpakka behaviour in the specific connectors, if the legacy code is causing trouble

We are actually using this so there is selfish motivation here but aside from that I am working with INFRA to get an S3 account for integration tests and they are waiting for a reason.

@mdedetrich mdedetrich merged commit ae4567e into apache:main Apr 28, 2023
48 of 50 checks passed
@He-Pin
Copy link
Member

He-Pin commented Apr 28, 2023

Sorry I never used s3 😓😓😓

@pjfanning pjfanning added the release note should be mentioned in release notes label Jun 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release note should be mentioned in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants