Add s3 file ingestion sample to examples source tree for M1 release. #123
I like the example overall. I have two suggestions/concerns:

- Having actual media files (`.avi`) may hinder this example. Of course, a user could just create fake `.avi` files, but I'm wondering how to make it as easy as possible to grok and try out.
- I like the example, but I definitely think we need to demonstrate `getObject()` (either in this example or in a different one) because it has the scoped response. Perhaps a separate, non-media example project would suffice? Something like a "round-tripped" example that simply creates, uploads, lists, and re-downloads the result? In particular, demonstrating or adding some comments around the ways `ByteStream` can be created or consumed (e.g. `ByteStream.fromString()`, `fromBytes()`, `fromFile()` and `toByteArray()`, `decodeToString()`, `writeToFile()`).
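To make the "scoped response" point concrete, here is a stdlib-only sketch of the pattern. `StreamingResponse` and `withObjectResponse` are hypothetical stand-ins, not SDK APIs: the response is handed to a block, and its body is closed as soon as the block returns, so the caller has to consume (or copy) the stream inside the block.

```kotlin
import java.io.ByteArrayInputStream
import java.io.InputStream

/** Hypothetical stand-in for a streaming object response (not an SDK type). */
class StreamingResponse(val body: InputStream)

/**
 * Hypothetical scoped download: the response is only visible inside [block],
 * and `use` closes the body stream as soon as [block] returns or throws.
 */
fun <T> withObjectResponse(content: ByteArray, block: (StreamingResponse) -> T): T =
    ByteArrayInputStream(content).use { stream -> block(StreamingResponse(stream)) }

fun main() {
    // Consume the body inside the block and return only the decoded result.
    val text = withObjectResponse("hello, s3".toByteArray()) { resp ->
        String(resp.body.readBytes(), Charsets.UTF_8)
    }
    println(text)
}
```

Returning `resp.body` itself out of the block would hand the caller a stream that has already been closed; with a real network-backed body, reading it afterwards would fail.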

    dependencies {
        implementation(kotlin("stdlib"))
        implementation("org.jetbrains.kotlinx:kotlinx-coroutines-core:1.4.3")

suggestion
Probably move the coroutines version to the root examples project so that we only have to change it in one place for all examples?
Sure but how? I tried a few obvious variations, looked in our source tree and did some searching but didn't find anything that worked. Can you point me to an example?
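One common way to do this, assuming the examples are (or can be made) a multi-project Gradle build whose root project is the parent of each example: declare the version once as an extra property in the root build script and read it from each example's script. The file layout in the comments is an assumption about this repository, not something the PR shows.

```kotlin
// --- root build.gradle.kts of the examples build (assumed layout) ---
// Declares an extra property on the root project.
val coroutinesVersion by extra("1.4.3")

// --- build.gradle.kts of each example subproject ---
// Reads the same extra property from the root project.
val coroutinesVersion: String by rootProject.extra

dependencies {
    implementation(kotlin("stdlib"))
    implementation("org.jetbrains.kotlinx:kotlinx-coroutines-core:$coroutinesVersion")
}
```

An alternative is a `coroutinesVersion=1.4.3` entry in the root `gradle.properties`, read in each example with `val coroutinesVersion: String by project`. Both approaches only work if the examples share one Gradle build (a single `settings.gradle.kts` that includes them); separate standalone builds do not see each other's properties.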

    import java.nio.file.Files

    const val bucketName = "s3-media-ingestion-example"
    const val ingestionDirPath = "/tmp/media-in"

question
Are these meant to be changed by the user? These paths are Unix-specific.
I added some notes in the file
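A portable alternative to the hard-coded `/tmp/media-in`, sketched under assumptions: the `MEDIA_INGESTION_DIR` variable and `resolveIngestionDir` function are hypothetical. The fallback uses the JVM's `java.io.tmpdir` system property, which points at `/tmp` on most Unix systems and at the user's temp folder on Windows.

```kotlin
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical override variable; not part of the original sample.
const val INGESTION_DIR_ENV = "MEDIA_INGESTION_DIR"

/**
 * Resolves the ingestion directory: an explicit override from [env] wins,
 * otherwise a "media-in" folder under the JVM's platform-specific temp directory is used.
 */
fun resolveIngestionDir(env: Map<String, String> = System.getenv()): Path =
    env[INGESTION_DIR_ENV]?.let { Paths.get(it) }
        ?: Paths.get(System.getProperty("java.io.tmpdir"), "media-in")
```

Taking the environment as a parameter (defaulting to `System.getenv()`) keeps the function easy to exercise without touching the real process environment.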

    }

    // Check for valid S3 configuration based on account
    suspend fun validateS3(s3Client: S3Client) {

suggestion
`validateBucketExists(...)`, or even change it to `ensureBucketExists(...)` and showcase a simple create bucket?
Updated to create the bucket if it doesn't exist. Good idea.
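A minimal sketch of the create-if-missing control flow. `BucketOperations` and `InMemoryBuckets` are hypothetical; in the sample, the two interface methods would wrap the SDK's `listBuckets` and `createBucket` calls (as suspend functions), which are left out here so the logic can be tried without an AWS account.

```kotlin
/**
 * Hypothetical abstraction over the two S3 operations the example needs.
 * In the real sample these would wrap S3Client.listBuckets and S3Client.createBucket.
 */
interface BucketOperations {
    fun listBucketNames(): List<String>
    fun createBucket(name: String)
}

/**
 * Creates [bucketName] if it is not already listed.
 * Returns true if a bucket was created, false if it already existed.
 */
fun ensureBucketExists(ops: BucketOperations, bucketName: String): Boolean {
    if (bucketName in ops.listBucketNames()) return false
    ops.createBucket(bucketName)
    return true
}

/** In-memory fake used to exercise the logic locally. */
class InMemoryBuckets(initial: Collection<String> = emptyList()) : BucketOperations {
    private val names = initial.toMutableSet()
    override fun listBucketNames(): List<String> = names.toList()
    override fun createBucket(name: String) {
        names += name
    }
}
```

Note that `ListBuckets` only returns buckets owned by the caller's account, while bucket names are globally unique, so the create call can still fail if another account already owns the name.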

        .listObjects(ListObjectsRequest { bucket = bucketName })
        .contents?.any { it.key == mediaMetadata.s3KeyName } ?: false

    if (existsInS3) return Failure("${mediaMetadata.s3KeyName} already uploaded.", mediaMetadata)

question
Should this be a failure? I could see it being useful to just log a statement but otherwise succeed. Or just overwrite?
"Failure" may be the wrong name but I like the idea of this method returning multiple statuses, including one for "this already exists, no-op".
Renamed to `FileExistsError`.
But yeah, my notion for adding this was simply to have more idiomatic Kotlin (sealed class types to represent result states), but it's not super tied to S3 itself.
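A sketch of what the sealed result type could look like. `UploadResult`, `Failure`, and `FileExistsError` are names taken from this thread; `Success`, `summarize`, and the single-field `MediaMetadata` are hypothetical, and the sample's actual types likely carry more data.

```kotlin
/** Hypothetical minimal metadata; only the field used in this thread. */
data class MediaMetadata(val s3KeyName: String)

/** Possible outcomes of one upload attempt. */
sealed class UploadResult {
    abstract val mediaMetadata: MediaMetadata
}

data class Success(override val mediaMetadata: MediaMetadata) : UploadResult()
data class FileExistsError(val reason: String, override val mediaMetadata: MediaMetadata) : UploadResult()
data class Failure(val reason: String, override val mediaMetadata: MediaMetadata) : UploadResult()

/** A `when` expression over a sealed class must be exhaustive, so a new outcome forces an update here. */
fun summarize(result: UploadResult): String = when (result) {
    is Success -> "uploaded ${result.mediaMetadata.s3KeyName}"
    is FileExistsError -> "skipped: ${result.reason}"
    is Failure -> "failed: ${result.reason}"
}
```

Keeping "already exists" as its own subtype lets callers treat it as a no-op while still reporting genuine failures separately.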

    // Check for valid S3 configuration based on account
    suspend fun validateS3(s3Client: S3Client) {

Nit: Prefer KDoc over plain comments:

    /** Check for valid S3 configuration based on account */
    suspend fun validateS3(s3Client: S3Client) {

This applies to multiple functions in this file.

    // Upload to S3 if file not already uploaded
    suspend fun uploadToS3(s3Client: S3Client, mediaMetadata: MediaMetadata): UploadResult {
        val existsInS3 = s3Client
            .listObjects(ListObjectsRequest { bucket = bucketName })
            .contents?.any { it.key == mediaMetadata.s3KeyName } ?: false

Correctness: Listing files has a few problems:
- S3 recommends no longer using `ListObjects` in favor of `ListObjectsV2`.
- There's no pagination here. When the bucket has a lot of files, this will yield an incorrect outcome if the expected file isn't found in the first page of results.
- Listing resources when we want exactly one by primary key is generally wasteful. It's generally more costly (in this case ~10x) and less efficient.
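To illustrate the pagination point: a correct listing-based check has to follow continuation tokens across every page, which means more requests as the bucket grows. A minimal sketch, assuming a hypothetical `KeyPage` type that mirrors the object keys and `nextContinuationToken` of a `ListObjectsV2` response; `keyExistsInAnyPage` is likewise illustrative, not an SDK function.

```kotlin
/** Hypothetical page shape mirroring ListObjectsV2's object keys plus nextContinuationToken. */
data class KeyPage(val keys: List<String>, val nextContinuationToken: String?)

/**
 * Returns true if [targetKey] appears on any page. [fetchPage] receives the continuation
 * token from the previous page (null for the first request).
 */
fun keyExistsInAnyPage(targetKey: String, fetchPage: (String?) -> KeyPage): Boolean {
    var token: String? = null
    do {
        val page = fetchPage(token)
        if (targetKey in page.keys) return true
        token = page.nextContinuationToken
    } while (token != null)
    return false
}
```

A check that only inspects the first page would report a key on page two as missing, which is exactly the incorrect outcome described above.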
Instead, how about a HEAD request? Unvalidated code sample:

    val existsInS3 = try {
        s3Client.headObject(
            HeadObjectRequest {
                bucket = bucketName
                key = mediaMetadata.s3KeyName
            }
        )
        true
    } catch (e: NotFoundException) {
        false
    }
Nice, great idea.
FWIW, this uncovers one area where our SDK is not yet ready: error customizations. The S3 client should throw a `NotFoundException`, but it does not at this time because we don't have a customized error handler for S3.
What does it do instead?
It throws `UnknownServiceErrorException`, because we fail to deserialize anything from the response here (a HEAD response has no body to carry an error code).
Well then I'll leave it up to you. Maybe calling listObjects is a better example since it's semantically clear.
No, I prefer the HEAD calls; they're more efficient.
I made a note in the code that the exception catching will change in a subsequent SDK release
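Since the exception type thrown for a missing object may change between SDK releases, one option is to isolate the "is this a not-found error?" decision in a single predicate. A sketch under assumptions: `ObjectNotFound` and `objectExists` are hypothetical, and in the sample the `head` block would wrap the `headObject` call while `isNotFound` would match whatever exception the current release throws.

```kotlin
/** Hypothetical stand-in for the SDK's not-found error; the real type depends on the SDK release. */
class ObjectNotFound(message: String) : RuntimeException(message)

/**
 * Returns true if [head] completes, false if it throws an error that [isNotFound] recognizes.
 * Any other error (permissions, networking, cancellation) is rethrown rather than being
 * mistaken for "object missing". The function is `inline` so [head] may call suspend
 * functions when used from a coroutine.
 */
inline fun objectExists(head: () -> Unit, isNotFound: (Throwable) -> Boolean): Boolean =
    try {
        head()
        true
    } catch (e: Exception) {
        if (isNotFound(e)) false else throw e
    }
```

When the SDK gains S3 error customizations, only the predicate passed as `isNotFound` would need to change.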

    val uploadResults = ingestionDir
        .walk().asFlow()
        .mapNotNull(::mediaMetadataExtractor)
        .map { mediaMetadata ->
            uploadToS3(client, mediaMetadata)
        }
        .toList()

Style: Generally I find it cleaner when using multiline fluent calls to keep each method call in the chain on a separate line. That is, either:

    val x = obj.doA().doB().doC()

Or:

    val x = obj
        .doA()
        .doB()
        .doC()

But not:

    val x = obj
        .doA().doB()
        .doC()

In this case, `.asFlow()` might be better on a dedicated line.
Fixed, thanks
Issue #, if available: smithy-lang/smithy-kotlin#306
Description of changes:
This PR adds a sample app to demonstrate S3 functionality. It continues on the theme of movie media files. In this case it is a simple media ingestion job that demonstrates the following capabilities of the SDK:
- `listBuckets`
- `listObjects`
- `putObject`

Testing done
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.