
Add multiple file upload directive #1033

Merged
merged 5 commits into from Sep 19, 2017

Conversation

adlawson
Contributor

@adlawson adlawson commented Apr 14, 2017

Adds support for multiple file uploads with the same field name.
I've followed the same implementation and testing approach as fileUpload. I did think about changing the fileUpload implementation to take the head of fileUploadAll but it would remove any benefit Sink.headOption gives you over Sink.seq.

// scaladsl
def fileUploadAll(fieldName: String): Directive1[immutable.Seq[(FileInfo, Source[ByteString, Any])]]

// javadsl
def fileUploadAll(fieldName: String, inner: JFunction[JList[JMap.Entry[FileInfo, Source[ByteString, Any]]], Route]): Route
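A minimal usage sketch of the proposed scaladsl directive might look like the following. The route, field name, and destination paths are illustrative assumptions, not taken from the PR; in the final design the inner sources are backed by disk buffers, so draining them here is safe.

```scala
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route
import akka.stream.scaladsl.FileIO
import java.nio.file.Paths

// Illustrative route: every multipart field named "files" is streamed
// to its own file on disk.
val route: Route =
  path("upload") {
    extractMaterializer { implicit mat =>
      fileUploadAll("files") { files =>
        files.foreach {
          case (info, bytes) =>
            bytes.runWith(FileIO.toPath(Paths.get("/tmp", info.fileName)))
        }
        complete(s"received ${files.size} files")
      }
    }
  }
```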

Tasks

Following review at #1033 (review) there are a few outstanding pieces of work that need to be done before merge

  • Rename directives
  • Separate destination file creation
  • Refactor Source(files).map to just files.map
  • Java Documentation
  • Scala Documentation
  • Deprecate uploadedFile directive(?) Todo in a following PR

@lightbend-cla-validator

Hi @adlawson,

Thank you for your contribution! We really value the time you've taken to put this together.

Before we proceed with reviewing this pull request, please sign the Lightbend Contributors License Agreement:

http://www.lightbend.com/contribute/cla

@akka-ci

akka-ci commented Apr 14, 2017

Can one of the repo owners verify this patch?

* for streaming the file contents somewhere. If there is no such field the request will be rejected.
*/
def fileMultiUpload(fieldName: String, inner: JFunction[JList[JMap.Entry[FileInfo, Source[ByteString, Any]]], Route]): Route = RouteAdapter {
def toJava[F <: FileInfo](info: F): FileInfo = info
Contributor Author


I'm not super happy about this bit, but without it I'd need to asInstanceOf to make it work with Map.Entry.

@jlprat
Member

jlprat commented Apr 15, 2017

OK TO TEST

@akka-ci akka-ci added validating PR that is currently being validated by Jenkins needs-attention Indicates a PR validation failure (set by CI infrastructure) and removed validating PR that is currently being validated by Jenkins labels Apr 15, 2017
@akka-ci

akka-ci commented Apr 15, 2017

Test FAILed.

@akka-ci akka-ci added validating PR that is currently being validated by Jenkins tested PR that was successfully built and tested by Jenkins and removed needs-attention Indicates a PR validation failure (set by CI infrastructure) validating PR that is currently being validated by Jenkins labels Apr 18, 2017
@akka-ci

akka-ci commented Apr 18, 2017

Test PASSed.

@adlawson adlawson changed the title [WIP] Add fileMultiUpload directive Add fileMultiUpload directive Apr 18, 2017
.filter(part ⇒ part.filename.isDefined && part.name == fieldName)
.map(part ⇒ (FileInfo(part.name, part.filename.get, part.entity.contentType), part.entity.dataBytes))

val partF = partSource.runWith(Sink.seq[(FileInfo, Source[ByteString, Any])])
Member


Unfortunately, this approach will not work if the file uploads are big enough. You try to collect ongoing sources for the parts into a seq. However, without reading the first source you will not get the second source, so the seq will never complete and the stream will deadlock. This is an unfortunate consequence of the sequential multipart format. It will work with smaller payloads that fit into all the buffers, but if the whole request doesn't fit into the buffers it will deadlock.

I think we need to do the same that other web servers / frameworks are doing and buffer the incoming files on the disk so that we can drain the whole request body before giving out any data.
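The failure mode described above is visible in the shape of the code under review; a simplified, commented restatement (names taken from the excerpt):

```scala
// Each element is (FileInfo, Source[ByteString, Any]); every inner Source
// reads from the same underlying HTTP entity. Multipart parts arrive
// sequentially, so part n+1 is only produced after part n's bytes are
// consumed.
val partF = partSource.runWith(Sink.seq[(FileInfo, Source[ByteString, Any])])

// Sink.seq completes only when the stream of parts completes, but each
// part completes only when its inner source is drained -- which the
// caller cannot do before partF completes. Once the internal buffers are
// full, neither side can make progress: small uploads fit in the buffers
// and appear to work, large ones deadlock.
```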

Contributor Author


Yeah that's a real shame. I've been trying to get this to work with larger files today and hitting timeouts, but couldn't trace the root of the problem. Super obvious when you think about it.

If you'd like I can continue working on it and submit updated changes, since it's something I need anyway?

Member


Yes, of course. We'll be glad if you keep working on it. I think buffering to disk isn't completely trivial to do but would definitely be a worthwhile addition.

Member

@jrudolph jrudolph left a comment


Needs some solution to the deadlocking problem, commented above. (Just adding the "Changes Requested" marker for administrative reasons)

@akka-ci akka-ci added validating PR that is currently being validated by Jenkins tested PR that was successfully built and tested by Jenkins and removed tested PR that was successfully built and tested by Jenkins validating PR that is currently being validated by Jenkins labels Apr 28, 2017
@akka-ci

akka-ci commented Apr 28, 2017

Test PASSed.

@adlawson
Contributor Author

adlawson commented Apr 28, 2017

I've updated the implementation to buffer to disk and squashed the commits into one. The changes now include 2 directives, where the first depends on the second.

def fileMultiUpload(fieldName: String): Directive1[immutable.Seq[(FileInfo, Source[ByteString, Any])]]

def uploadedFiles(fieldName: String): Directive1[immutable.Seq[(FileInfo, File)]]

I had started to go down the route of a custom GraphStage to buffer to disk, just like Alpakka's akka.stream.alpakka.s3.impl.DiskBuffer but it proved harder to implement and more complex than necessary. I also couldn't figure out a way to guarantee reading from a written-to file. Maybe chunking writes may have helped but I still don't see any guarantees.

Another solution I tried was Flow.fromSinkAndSource but it appears the Source doesn't wait for the Sink to fully complete, so this wouldn't work.

The final solution looks too simple to me, but I've tested it with multiple large files and they stream over perfectly fine. I suppose the only downside is that we effectively have to wait for all ByteString sinks to complete before we can begin sourcing again from disk.
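The disk-buffering approach described above can be sketched roughly as follows (simplified; `formData` and `fieldName` are assumed to be in the surrounding scope, and an implicit materializer and execution context are required):

```scala
import java.io.File
import scala.collection.immutable
import scala.concurrent.Future
import akka.stream.scaladsl.{ FileIO, Sink }

// Drain each part to a temp file before the next part is requested.
// mapAsync(1) enforces strictly sequential consumption, which matches
// the sequential multipart wire format and avoids the earlier deadlock.
val uploaded: Future[immutable.Seq[(FileInfo, File)]] =
  formData.parts
    .filter(part => part.filename.isDefined && part.name == fieldName)
    .mapAsync(1) { part =>
      val info = FileInfo(part.name, part.filename.get, part.entity.contentType)
      val dest = File.createTempFile("akka-http-upload", ".tmp")
      part.entity.dataBytes
        .runWith(FileIO.toPath(dest.toPath))
        .map(_ => (info, dest))
    }
    .runWith(Sink.seq)
```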

@patrox

patrox commented Jul 10, 2017

hi @adlawson - I'm quite interested in the functionality from this PR, so I would like to ask if you are going to finish it?

@jrudolph Can you please give me an overview of what has to be done in order to get this PR to a mergeable state?

Member

@jrudolph jrudolph left a comment


This is looking quite good (even if it seems too simple, streams should make things simple ;) ).

Apart from the few comments, it also needs documentation before it can be merged.

*
* @group fileupload
*/
def uploadedFiles(fieldName: String): Directive1[immutable.Seq[(FileInfo, File)]] =
Member


This needs a name which better describes what it does (maybe storeUploadedFilesToTempFiles? better ideas?). It should have a parameter to specify the directory to put the temporary files into.

.filter(part ⇒ part.filename.isDefined && part.name == fieldName)
.mapAsync(1) { part ⇒
val fileInfo = FileInfo(part.name, part.filename.get, part.entity.contentType)
val dest = File.createTempFile("akka-http-upload", ".tmp")
Member


Maybe this part should be completely abstracted away, so that the user can choose the destination freely based on the file info.
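One way to abstract the destination choice is a `destFn` parameter, which is the direction later commits in this PR take. A sketch of a possible signature and use (directory path and route body are illustrative):

```scala
import java.io.File
import scala.collection.immutable

// Sketch: the caller decides where each upload lands, based on its FileInfo.
def storeUploadedFiles(
  fieldName: String,
  destFn: FileInfo => File): Directive1[immutable.Seq[(FileInfo, File)]] = ???

// Usage: put uploads in a dedicated directory instead of the default
// temp location.
storeUploadedFiles("files", info => new File("/var/uploads", info.fileName)) { files =>
  complete(s"stored ${files.size} files")
}
```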

implicit val ec = ctx.executionContext

uploadedFiles(fieldName).flatMap { files ⇒
val partsSource: Source[(FileInfo, Source[ByteString, Any]), Any] = Source(files).map {
Member


This can be even simpler: no need for Source(files) when source and target datastructures are a Seq. You can just map the files datastructure.
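The suggested simplification, sketched (assuming `files: immutable.Seq[(FileInfo, File)]` as in the surrounding code):

```scala
// Before: lifting an already-materialized Seq into a Source just to map it
val partsSource: Source[(FileInfo, Source[ByteString, Any]), Any] =
  Source(files).map { case (info, file) => (info, FileIO.fromPath(file.toPath)) }

// After: map the Seq directly; no stream machinery required
val parts: immutable.Seq[(FileInfo, Source[ByteString, Any])] =
  files.map { case (info, file) => (info, FileIO.fromPath(file.toPath)) }
```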

*
* @group fileupload
*/
def fileMultiUpload(fieldName: String): Directive1[immutable.Seq[(FileInfo, Source[ByteString, Any])]] =
Member


Should be marked as @ApiMayChange.

*
* @group fileupload
*/
def uploadedFiles(fieldName: String): Directive1[immutable.Seq[(FileInfo, File)]] =
Member


Should be marked as @ApiMayChange.

@jrudolph
Member

(And sorry for keeping it around for so long. It's not because the PR wasn't good, we were just busy.)

@adlawson
Contributor Author

Thanks for the feedback. I stuck with similar names to the single file uploads, just for consistency, but I'm happy to change them to be more descriptive. I'll also get onto your other points, like separating temp file construction.

Don't worry about the delay - I've been buried under a mountain of work too. I'll try to get this updated within the next week.

@akka-ci akka-ci added validating PR that is currently being validated by Jenkins and removed tested PR that was successfully built and tested by Jenkins labels Jul 30, 2017
@adlawson
Contributor Author

Hey @jrudolph, I've made some updates since your last review

  • Renamed uploadedFiles to storeUploadedFiles to better describe what it does
  • Separated out the temporary file creation
  • Added @ApiMayChange where necessary, and removed the mima-filter exclusions

I've also taken a couple of further steps

  • Renamed fileMultiUpload to fileUploadAll so "multi" isn't confused with "multipart"
  • Added a storeUploadedFile (singular) to have a consistent API with the new directive. This should replace uploadedFile (which still exists and behaves the same as before, but could be deprecated in favour of this new function)

Let me know what you think, and whether you agree with storeUploadedFile or not.
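A usage sketch of the renamed pair side by side (handler bodies and paths are illustrative; both directives are assumed to take a `destFn`, as in the final diff):

```scala
import java.io.File

val routes =
  path("one") {
    // Singular: extracts a single (FileInfo, File)
    storeUploadedFile("file", info => File.createTempFile(info.fileName, ".tmp")) {
      (info, file) => complete(s"${info.fileName} -> ${file.getAbsolutePath}")
    }
  } ~
  path("many") {
    // Plural: extracts all files uploaded under the field name
    storeUploadedFiles("files", info => File.createTempFile(info.fileName, ".tmp")) { files =>
      complete(s"stored ${files.size} files")
    }
  }
```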

@akka-ci

akka-ci commented Sep 9, 2017

Test PASSed.

@akka-ci akka-ci added validating PR that is currently being validated by Jenkins and removed tested PR that was successfully built and tested by Jenkins labels Sep 10, 2017
- Add fileUploadAll directive, buffering multiple files on disk
- Add storeUploadedFiles directive to stream multiple files to disk
- Add mima-filter exclusions for new directives
- Add destFn argument to directives to support custom destinations
- Replace uploadedFile directive with storeUploadedFile
- Document fileUploadAll directive
- Document storeUploadedFile directive
- Document storeUploadedFiles directive
@akka-ci akka-ci added validating PR that is currently being validated by Jenkins needs-attention Indicates a PR validation failure (set by CI infrastructure) and removed validating PR that is currently being validated by Jenkins labels Sep 10, 2017
@akka-ci

akka-ci commented Sep 10, 2017

Test FAILed.

@akka-ci akka-ci added tested PR that was successfully built and tested by Jenkins and removed needs-attention Indicates a PR validation failure (set by CI infrastructure) labels Sep 10, 2017
@akka-ci

akka-ci commented Sep 10, 2017

Test PASSed.

@adlawson
Contributor Author

adlawson commented Sep 10, 2017

I've added the discardBytes call you asked for, but I haven't been able to come up with a working test for it. The closest I had was observing completing promises when the streams are consumed, just like the tests in akka/http/scaladsl/model/EntityDiscardingSpec.scala#L44-L53, but the futures never complete. I also tried simply consuming streams again to trigger and catch an IllegalStateException but I was able to consume them all again fine, even the ones consumed to files. If you have any pointers I'll keep trying, but I'm at a loss right now.

In addition to the change, I've squashed the commits into more reasonable, safe chunks of work.

storeUploadedFile has a more descriptive name and also adds the possibility
to select the target file.
Directive1[T] = Directive[Tuple1[T]], so don't nest another Tuple2 inside.
This also allows getting rid of the `case` syntax when using the directive.

Slightly improved the implementation as well.
Before only Strict BodyParts/HttpEntity were tested which are not
susceptible to deadlocks. The new version should test the more common
streaming uploads more comprehensively.
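The tuple-flattening point in the commit message above, illustrated (`destFn` and the route bodies are placeholders):

```scala
// Before: Directive1[(FileInfo, File)] nests a Tuple2 inside Tuple1,
// forcing `case` syntax at the use site:
storeUploadedFile("field", destFn) { case (info, file) => complete(info.fileName) }

// After: Directive[(FileInfo, File)] extracts the two values directly:
storeUploadedFile("field", destFn) { (info, file) => complete(info.fileName) }
```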
@jrudolph
Member

@adlawson looking very good now. I added a few commits with smallish cleanups and the deprecation of uploadedFile. Could you review and merge https://github.com/adlawson/akka-http/pull/1 into this PR (couldn't push to your branch)?

@akka-ci akka-ci added validating PR that is currently being validated by Jenkins tested PR that was successfully built and tested by Jenkins and removed tested PR that was successfully built and tested by Jenkins validating PR that is currently being validated by Jenkins labels Sep 19, 2017
@akka-ci

akka-ci commented Sep 19, 2017

Test PASSed.

@adlawson
Contributor Author

Changes look good, @jrudolph. Thanks for the updates.
All merged and ready to go 👍

@ktoso
Member

ktoso commented Sep 19, 2017

Great, will give it a final read then :)

Member

@ktoso ktoso left a comment


LGTM, great effort, thanks a lot 👍

* @group fileupload
*/
@ApiMayChange
def storeUploadedFile(fieldName: String, destFn: FileInfo ⇒ File): Directive[(FileInfo, File)] =
Member


nice name btw 👍

@ktoso ktoso dismissed jrudolph’s stale review September 19, 2017 09:33

Docs added and Johannes's PR also merged, everything looks good

@ktoso ktoso merged commit a1a9c27 into akka:master Sep 19, 2017
@ktoso
Member

ktoso commented Sep 19, 2017

Resolved #1273 Thanks!

@adlawson adlawson deleted the adlawson/directive-file-multiupload branch September 19, 2017 09:43
/**
* Streams the bytes of the file submitted using multipart with the given field name into designated files on disk.
* If there is an error writing to disk the request will be failed with the thrown exception, if there is no such
* field the request will be rejected. Stored files are cleaned up on exit but not on failure.


Does the storeUploadedFiles directive reject the request if the field is empty?
There doesn't seem to be an implementation of that.
