Replace s3 mock with minio #43

Merged: 2 commits merged into main on Aug 31, 2021
Conversation

mdedetrich (Contributor, Author) commented on Aug 30, 2021:

About this change - What it does

This change replaces the Adobe S3 mock with MinIO running in a Docker container (via testcontainers-scala), and throttles the source input in the Kafka mock so that the entire source isn't processed at once.

References: #38

Why this way

The tests for the S3 backup client have recently started failing at random (see the linked issue). After some investigation I believe the main cause is that the S3 mock couldn't handle receiving the data that fast. As a solution I did the following:

  1. I replaced the S3 mock with MinIO. I am not 100% sure this alone is what made the tests reliable, but MinIO has much higher usage and it's what Alpakka itself uses to test S3.
  2. I added a sourceTransform function to the Kafka mock which lets you apply a function to the Source. We use this to throttle the Source (which is just a hardcoded list of events) so that events occur naturally over time rather than all at once (see the sketch after this list).
  3. I replaced the delay value with a value proportional to the throttle, which also makes the tests run faster (i.e. the test finishes earlier if the generated data set is smaller) and more reliable when the data set is very large.
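
A minimal sketch of the throttling idea, assuming Akka Streams; the constant names and values here are illustrative stand-ins, not the repository's exact `THROTTLE_ELEMENTS`/`THROTTLE_AMOUNT` definitions:

```scala
import akka.NotUsed
import akka.stream.scaladsl.Source

import scala.concurrent.duration._

// Illustrative values; the real constants live in the test code.
val ThrottleElements: Int          = 100
val ThrottleAmount: FiniteDuration = 1.second

// The sourceTransform hook would apply something like this to the Kafka mock's
// hardcoded Source so that events are emitted gradually instead of all at once.
def throttleEvents[T](source: Source[T, NotUsed]): Source[T, NotUsed] =
  source.throttle(ThrottleElements, ThrottleAmount)
```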

Note that since this PR gets testcontainers up and running, it's exactly the same type of work that will be required for GCS.
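
For reference, a minimal sketch of what spinning up MinIO with testcontainers-scala can look like; the image tag, credentials, and value names are illustrative assumptions, not the repository's exact setup:

```scala
import com.dimafeng.testcontainers.GenericContainer

// Illustrative MinIO container definition; real credentials/ports come from the test config.
val minioContainer: GenericContainer = GenericContainer(
  dockerImage = "minio/minio",
  exposedPorts = Seq(9000),
  env = Map(
    "MINIO_ROOT_USER"     -> "minio",
    "MINIO_ROOT_PASSWORD" -> "minio-secret"
  ),
  command = Seq("server", "/data")
)

minioContainer.start()

// Endpoint that an S3 client (e.g. Alpakka's S3Settings.withEndpointUrl) would point at.
val endpointUrl: String =
  s"http://${minioContainer.containerIpAddress}:${minioContainer.mappedPort(9000)}"
```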

mdedetrich requested a review from jlprat — August 30, 2021 16:45
@@ -14,7 +14,7 @@ object Generators {
   )

   lazy val bucketAllCharGen: Gen[Char] = Gen.frequency(
-    (6, Gen.alphaLowerChar),
+    (10, Gen.alphaLowerChar),
mdedetrich (Contributor, Author) commented:

Due to ScalaCheck's shrinking mechanism and the filter applied by checkInvalidDuplicateChars, the test was sometimes hitting too many cases where checkInvalidDuplicateChars returned false. Increasing the proportion of lowercase alpha chars (which are never filtered out) made this issue go away.
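
A minimal sketch of the failure mode, with illustrative generator entries and a stand-in for checkInvalidDuplicateChars (neither is the repository's exact code):

```scala
import org.scalacheck.Gen

// Weighting the never-filtered alphaLowerChar higher keeps the discard rate low,
// so ScalaCheck doesn't give up after rejecting too many generated/shrunk candidates.
val bucketChar: Gen[Char] = Gen.frequency(
  (10, Gen.alphaLowerChar), // never rejected by the duplicate-char check
  (1, Gen.numChar),
  (1, Gen.const('-'))
)

// Stand-in for checkInvalidDuplicateChars: reject names with consecutive duplicate chars.
val bucketNameGen: Gen[String] =
  Gen.listOfN(20, bucketChar)
    .map(_.mkString)
    .suchThat(name => !name.sliding(2).exists(p => p.length == 2 && p(0) == p(1)))
```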

mdedetrich force-pushed the replace-s3-mock-with-minio branch 2 times, most recently from 70caeee to 5d7ff57 — August 30, 2021 16:50
implicit val s3Attrs: Attributes = S3Attributes.settings(s3Settings)

val delay =
(THROTTLE_AMOUNT * (kafkaDataWithTimePeriod.data.size / THROTTLE_ELEMENTS) * 1.2) + (10 millis) match {
mdedetrich (Contributor, Author) commented:

In case you are wondering: yes, this is the only way to convert a Duration to a FiniteDuration....
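
A minimal, self-contained sketch of that pattern (the numbers are illustrative, not the test's actual constants):

```scala
import scala.concurrent.duration._

// Multiplying a FiniteDuration by a Double widens the type to Duration,
// and the standard library's way back to FiniteDuration is a pattern match.
val total: Duration = (100.millis * (1000 / 100) * 1.2) + (10.millis)

val delay: FiniteDuration = total match {
  case finite: FiniteDuration => finite
  case _                      => 10.millis // can't happen for finite inputs; defensive fallback
}
```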

).asJava
)
lazy val s3Settings = S3Settings()
.withEndpointUrl(s"http://${container.getHostAddress}")
Contributor commented:

Not sure how important it is, but real S3 will very likely be over HTTPS. I think you could also configure TLS with MinIO and stay as close to production as possible. WDYT?

Contributor commented:

I think it's worth exploring

mdedetrich (Contributor, Author) replied:

My view is that the point of this test specifically is to verify that the backup flow for Guardian works against S3 as closely as possible without needing an S3 account. Testing HTTPS in this context would really only test that the Alpakka library (which we use for S3) can properly load the S3 client configuration for HTTPS, which is a test that is likely already done in the underlying library (see https://github.com/akka/alpakka/blob/master/s3/src/test/scala/akka/stream/alpakka/s3), so in other words we would just be duplicating a test.

An integration test for this could also work, but it would be closer to a unit-style test where we just make sure the S3 client can do a trivial operation over HTTPS, i.e. creating a single bucket (in contrast to this test, which actually makes sure everything in the backup flow is written to S3 correctly).

If you feel strongly about it and it's not too difficult (i.e. doesn't require setting up fake certificates), I am not against changing this test to use HTTPS in a future PR. However, note that the intention is to eventually run a full end-to-end test in Guardian against an actual S3 account (which would include HTTPS).
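
For illustration, a minimal sketch of the kind of trivial HTTPS smoke test mentioned above, assuming an S3Settings value already configured with an https endpoint (the names here are illustrative):

```scala
import akka.Done
import akka.actor.ActorSystem
import akka.stream.alpakka.s3.{S3Attributes, S3Settings}
import akka.stream.alpakka.s3.scaladsl.S3
import akka.stream.scaladsl.Sink

import scala.concurrent.Future

// A single trivial operation over https: create one bucket using the given settings.
def createBucketOverHttps(bucket: String, settings: S3Settings)(implicit system: ActorSystem): Future[Done] =
  S3.makeBucketSource(bucket)
    .withAttributes(S3Attributes.settings(settings)) // settings assumed to use an https endpoint
    .runWith(Sink.head)
```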

jlprat (Contributor) left a review:

I have some minor comments and a question

mdedetrich (Contributor, Author) commented:

@jlprat I have just rebased the PR; the only change is making the constants follow the Scala style guide. I didn't make this an additional commit because this PR has multiple commits and I would have had to squash things out of order (which is annoying/difficult).

jlprat (Contributor) left a review:

LGTM

mdedetrich merged commit 5b5dee9 into main on Aug 31, 2021.
mdedetrich deleted the replace-s3-mock-with-minio branch on August 31, 2021 09:16.