Skip to content

Conversation

@kggilmer
Copy link
Contributor

@kggilmer kggilmer commented May 7, 2021

Issue #, if available: smithy-lang/smithy-kotlin#306

Description of changes:

This PR adds a sample app to demonstrate S3 functionality. It continues on the theme of movie media files. In this case it is a simple media ingestion job that demonstrates the following capabilities of the SDK:

  • Create S3 client, operations: listBuckets, listObjects, putObject
  • Showcases async nature of operations by using Kotlin flow for processing

Testing done

  • Verified that "media" files are uploaded to S3 as expected
  • Verified that S3 bucket check works
  • Verified that key collision check works

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@kggilmer kggilmer requested review from aajtodd and ianbotsf May 7, 2021 02:43
Copy link
Contributor

@aajtodd aajtodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the example overall. I have two suggestions/concerns:

  1. Having actual media files (.avi) may hinder this example. Of course a user could just create fake .avi files but I'm wondering how to make it as easy as possible to grok and try out.

  2. I like the example but I definitely think we need to demonstrate getObject() (either in this example or in a different one) because it has the scoped response. Perhaps a separate, non-media project example would suffice? Something like a "round-tripped" example that just simply creates, uploads, lists, and re-downloads the result? In particular demonstrating or adding some comments around the ways ByteStream can be created or consumed (e.g. ByteStream.fromString(), fromBytes(), fromFile() and toByteArray(), decodeToString(), writeToFile().


dependencies {
implementation(kotlin("stdlib"))
implementation("org.jetbrains.kotlinx:kotlinx-coroutines-core:1.4.3")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion

probably move the coroutines version to the root examples project so that we only have to change in one place for all examples?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure but how? I tried a few obvious variations, looked in our source tree and did some searching but didn't find anything that worked. Can you point me to an example?

import java.nio.file.Files

const val bucketName = "s3-media-ingestion-example"
const val ingestionDirPath = "/tmp/media-in"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question

are these meant to be changed by the user? These paths are *unix specific.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some notes in the file

}

// Check for valid S3 configuration based on account
suspend fun validateS3(s3Client: S3Client) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion

validateBucketExists(...) or even change it to ensureBucketExists(..) and showcase a simple create bucket?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to create bucket if not exists, good idea.

.listObjects(ListObjectsRequest { bucket = bucketName })
.contents?.any { it.key == mediaMetadata.s3KeyName } ?: false

if (existsInS3) return Failure("${mediaMetadata.s3KeyName} already uploaded.", mediaMetadata)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question

should this be a failure? I could see it being useful to just log a statement but otherwise succeed. Or just overwrite?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Failure" may be the wrong name but I like the idea of this method returning multiple statuses, including one for "this already exists, no-op".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed to FileExistsError

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But yeah my notion for adding this was simply to have more idomatic kotlin (sealed class types to represent result states), but it's not super tied to S3 itself.

Comment on lines 55 to 56
// Check for valid S3 configuration based on account
suspend fun validateS3(s3Client: S3Client) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Prefer KDoc over plain comments:

/** Check for valid S3 configuration based on account */
suspend fun validateS3(s3Client: S3Client) {

Applies to multiple functions in this file.

Comment on lines 77 to 81
// Upload to S3 if file not already uploaded
suspend fun uploadToS3(s3Client: S3Client, mediaMetadata: MediaMetadata): UploadResult {
val existsInS3 = s3Client
.listObjects(ListObjectsRequest { bucket = bucketName })
.contents?.any { it.key == mediaMetadata.s3KeyName } ?: false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correctness: Listing files has a few problems:

  • S3 recommends no longer using ListObjects in favor of ListObjectsV2
  • There's no pagination here. When the bucket has a lot of files, this will yield an incorrect outcome if the expected file isn't found in the first page of results.
  • Listing resources when we want exactly one by primary key is generally wasteful. It's generally more costly (in this case ~10x) and less efficient.

Instead, how about a HEAD request. Unvalidated code sample:

val existsInS3 = try {
    s3Client.headObject(
        HeadObjectRequest {
            bucket = bucketName
            key = mediaMetadata.s3KeyName
        }
    )
    true
} catch (e: NotFoundException) {
    false
}    

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, great idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW this uncovers one area where our SDK is not yet ready...error customizations. The s3 client should throw a NotFoundException but it does not at this time because we don't have a customized error handler for S3.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it do instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it throws UnknownServiceErrorException because here we fail to deserialize anything from the response.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well then I'll leave it up to you. Maybe calling listObjects is a better example since it's semantically clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I prefer the head calls. More efficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a note in the code that the exception catching will change in a subsequent SDK release

.listObjects(ListObjectsRequest { bucket = bucketName })
.contents?.any { it.key == mediaMetadata.s3KeyName } ?: false

if (existsInS3) return Failure("${mediaMetadata.s3KeyName} already uploaded.", mediaMetadata)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Failure" may be the wrong name but I like the idea of this method returning multiple statuses, including one for "this already exists, no-op".

Comment on lines 35 to 41
val uploadResults = ingestionDir
.walk().asFlow()
.mapNotNull(::mediaMetadataExtractor)
.map { mediaMetadata ->
uploadToS3(client, mediaMetadata)
}
.toList()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: Generally I find it cleaner when using multiline fluent calls to keep each method call in the chain on a separate line. That is, either:

val x = obj.doA().doB().doC()

Or:

val x = obj
    .doA()
    .doB()
    .doC()

But not:

val x = obj
    .doA().doB()
    .doC()

In this case, .asFlow() might be better on a dedicated line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks

@kggilmer kggilmer merged commit 56857c4 into main May 7, 2021
@kggilmer kggilmer deleted the feat-s3-example branch May 7, 2021 23:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants