Document consuming multipart entities #807

Open
jrudolph opened this issue Jan 23, 2017 · 8 comments
Labels
3 - in progress Someone is working on this ticket hackathon Issues that could be tackled during a hackathon help wanted Identifies issues that the core team will likely not have time to work on t:docs Issues related to the documentation

Comments

@jrudolph
Member

Not only for the server side (i.e. for consuming file uploads) but also for the client side, where servers may return multipart/byteranges or other multipart data structures.

Information it should contain:

  • how to use Unmarshal(response / entity).to[Multipart.General]
  • other subclasses of Multipart
  • explain stream-of-streams nature of Multipart.parts and Multipart.BodyPart.entity.dataBytes
  • note about head-of-line blocking: the data of a part needs to be read fully before a new part will be dispatched. Just filtering parts can deadlock a stream.
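The points above can be sketched on the client side as follows. This is a minimal sketch, assuming akka-http 10.x with an implicit Materializer and ExecutionContext in scope; the object name, `collectParts`, and the `keep` predicate are hypothetical and only serve the illustration. It unmarshals a response entity to Multipart.General and processes the parts one at a time, either reading or explicitly discarding each part's dataBytes so that the next part can be dispatched.

```scala
import akka.http.scaladsl.model.{ HttpResponse, Multipart }
import akka.http.scaladsl.unmarshalling.Unmarshal
import akka.stream.Materializer
import akka.stream.scaladsl.Sink
import akka.util.ByteString

import scala.concurrent.{ ExecutionContext, Future }

object MultipartClientSketch {
  // Hypothetical helper: returns the bytes of every part accepted by `keep`.
  def collectParts(response: HttpResponse)(keep: Multipart.General.BodyPart => Boolean)(
      implicit mat: Materializer, ec: ExecutionContext): Future[Seq[ByteString]] =
    Unmarshal(response.entity).to[Multipart.General].flatMap { multipart =>
      multipart.parts
        // parallelism = 1: parts arrive strictly in order, one after another
        .mapAsync(parallelism = 1) { part =>
          if (keep(part))
            // inner stream: read this part's data completely
            part.entity.dataBytes.runFold(ByteString.empty)(_ ++ _).map(Some(_))
          else
            // unwanted parts must still be drained, otherwise the outer stream stalls
            part.entity.discardBytes().future.map(_ => None)
        }
        .collect { case Some(bytes) => bytes }
        .runWith(Sink.seq)
    }
}
```

Simply calling `filter` on `multipart.parts` would drop a part without draining its data, which is the deadlock scenario described in the last bullet. Note that this sketch buffers each kept part in memory, so it only suits small parts.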
@jrudolph jrudolph added 1 - triaged Tickets that are safe to pick up for contributing in terms of likeliness of being accepted help wanted Identifies issues that the core team will likely not have time to work on t:docs Issues related to the documentation labels Jan 23, 2017
@domitian
Contributor

@jrudolph I would like to take this task if others haven't picked it up.

@jrudolph
Member Author

👍 @domitian yes, please go ahead. Thanks for taking this on.

@jrudolph jrudolph assigned jrudolph and unassigned jrudolph Jul 26, 2017
@jrudolph jrudolph added 3 - in progress Someone is working on this ticket and removed 1 - triaged Tickets that are safe to pick up for contributing in terms of likeliness of being accepted labels Jul 26, 2017
@daddykotex
Contributor

daddykotex commented Apr 28, 2018

Hi, sorry for raising an old issue.

Today at work, we had to deal with a multipart form containing form-data parts (that are not files) and a file. Our initial approach was to use storeUploadedFile(file, destFile) and formFieldsMap together. Unfortunately, both directives read the body, and when the entity was larger than a few hundred KB, we would end up with a 404 (an exception being swallowed in the formFieldsMap directive).

So I wrote this, and I'm wondering if it would fit in the akka-http code base:

import java.io.File

import scala.collection.immutable
import scala.concurrent.Future

import akka.http.scaladsl.model.{ ContentType, HttpEntity, Multipart }
import akka.http.scaladsl.server.Directive1
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.directives.FileInfo
import akka.stream.scaladsl.{ FileIO, Sink }

type FileNameFn = FileInfo => File

final case class PartsAndFiles(form: immutable.Map[String, List[String]], files: immutable.Seq[(FileInfo, File)]) {
  def addForm(fieldName: String, content: String): PartsAndFiles = this.copy(
    form = {
      val existingContent: List[String] = this.form.getOrElse(fieldName, List.empty)
      val newContents: List[String] = content :: existingContent

      this.form + (fieldName -> newContents)
    }
  )
  def addFile(info: FileInfo, file: File): PartsAndFiles = this.copy(
    files = this.files :+ ((info, file))
  )
}
object PartsAndFiles {
  val Empty = PartsAndFiles(immutable.Map.empty, immutable.Seq.empty)
}

def fileUploadAndForm(
  fileFields: immutable.Seq[(String, FileNameFn)]
): Directive1[PartsAndFiles] =
  entity(as[Multipart.FormData]).flatMap { formData =>
    extractRequestContext.flatMap { ctx =>
      implicit val mat = ctx.materializer
      implicit val ec = ctx.executionContext

      val uploadingSink =
        Sink.foldAsync[PartsAndFiles, Multipart.FormData.BodyPart](PartsAndFiles.Empty) {
          (acc, part) =>
            // every part has to be consumed or discarded before the next one is dispatched
            def discard(): Future[PartsAndFiles] = {
              part.entity.discardBytes()
              Future.successful(acc)
            }

            part.filename.map { _ =>
              // file part: store it only if its field name was requested
              fileFields.find(_._1 == part.name)
                .map {
                  case (_, destFn) =>
                    val fileInfo = FileInfo(part.name, part.filename.get, part.entity.contentType)
                    val dest = destFn(fileInfo)

                    part.entity.dataBytes.runWith(FileIO.toPath(dest.toPath)).map { _ =>
                      acc.addFile(fileInfo, dest)
                    }
                }.getOrElse(discard())
            } getOrElse {
              // non-file part: only strict, non-binary entities are collected as form fields
              part.entity match {
                case HttpEntity.Strict(ct, data) if ct.isInstanceOf[ContentType.NonBinary] =>
                  val charsetName = ct.asInstanceOf[ContentType.NonBinary].charset.nioCharset.name
                  val partContent = data.decodeString(charsetName)

                  Future.successful(acc.addForm(part.name, partContent))
                case _ =>
                  discard()
              }
            }
        }

      val uploadedF = formData.parts.runWith(uploadingSink)

      onSuccess(uploadedF)
    }
  }

Basically, any part that arrives as a Strict entity is treated as a form field and added to the Map, while parts that define a filename and whose name is in the list given by the user are stored on disk.

If it's valuable, I'll add more documentation and a few tests and submit a PR.

For example, this form:

POST /submit HTTP/1.1
Host: localhost:8080
User-Agent: curl/7.54.0
Accept: */*
Content-Length: 1101
Content-Type: multipart/form-data; boundary=------------------------5e894a22b602a377

--------------------------5e894a22b602a377
Content-Disposition: form-data; name="name"
Content-Type: text/plain; charset=UTF-8

basic_image
--------------------------5e894a22b602a377
Content-Disposition: form-data; name="version"
Content-Type: text/plain; charset=UTF-8

python27
--------------------------5e894a22b602a377
Content-Disposition: form-data; name="file"; filename="requirements.txt"
Content-Type: text/plain; charset=UTF-8

tornado==4.2.1
simplejson==3.11.1
app-utils>=1.0.258
requests
blob-store==1.0.11
pandas==0.20.3
futures
--------------------------5e894a22b602a377--

Would be handled by:

val destFn: FileNameFn = ???
fileUploadAndForm(immutable.Seq("file" -> destFn)) {
  case PartsAndFiles(form, files) =>
    // files has one item containing the info about the file requirements.txt
    // form is a map that has 2 entries (name and version)
    complete("ok") // placeholder response
}

@domitian
Contributor

@daddykotex Please do add the documentation, I was not able to do it.

@jrudolph jrudolph added the hackathon Issues that could be tackled during a hackathon label May 29, 2019
@mal19992

It may fail here:
case HttpEntity.Strict(ct, data)
The form (even when very small) may not be read completely into memory, so the entity is not necessarily Strict.
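One way around the Strict match, sketched under the assumption that form fields are small enough to buffer: call toStrict on the part's entity instead of pattern matching on it. `FieldValueSketch`, `fieldValue`, and the 3-second default timeout are hypothetical choices for illustration, and the UTF-8 fallback is an assumption for parts that declare no charset.

```scala
import java.nio.charset.StandardCharsets

import akka.http.scaladsl.model.Multipart
import akka.stream.Materializer

import scala.concurrent.{ ExecutionContext, Future }
import scala.concurrent.duration._

object FieldValueSketch {
  // Reads a form field as text whether or not its entity arrived as Strict.
  def fieldValue(part: Multipart.FormData.BodyPart, timeout: FiniteDuration = 3.seconds)(
      implicit mat: Materializer, ec: ExecutionContext): Future[String] =
    part.entity.toStrict(timeout).map { strict =>
      // Assumption: fall back to UTF-8 when the part declares no charset
      val charset = strict.contentType.charsetOption
        .map(_.nioCharset)
        .getOrElse(StandardCharsets.UTF_8)
      strict.data.decodeString(charset)
    }
}
```

toStrict buffers the whole part in memory, so this is only appropriate for form fields, not for file parts.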

@mal19992

mal19992 commented Oct 30, 2021

There is still no easy way to handle form processing in Scala. It is much easier with https://commons.apache.org/proper/commons-fileupload/ (it supports streaming, etc.).

All I need is:

A POST form of type multipart/form-data is submitted. I want

  1. all submitted files (which may be >200Mb in size) to be saved as temp files to disk, and
  2. submitted form fields received as, say, List[(String, String)].

Basically, a Seq of posted files and a Seq of form parameters available simultaneously.

@jrudolph
Member Author

It seems the storeUploadedFiles directive would almost work (it doesn't collect fields). If you look at its source code, here's where other fields are discarded:

.mapConcat { part =>
  if (part.filename.isDefined && part.name == fieldName) part :: Nil
  else {
    part.entity.discardBytes()
    Nil
  }
}

You could try to copy the whole directive and make changes to collect those fields as well.
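A minimal sketch of that suggestion, assuming akka-http 10.x. It is not the storeUploadedFiles source; `Upload`, `filesAndFields`, the temp-file prefix, and the field timeout are all hypothetical. Every part with a filename is streamed to a temp file, and every other part is collected as a text field instead of being discarded.

```scala
import java.io.File

import akka.http.scaladsl.model.Multipart
import akka.http.scaladsl.server.Directive1
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.directives.FileInfo
import akka.stream.Materializer
import akka.stream.scaladsl.FileIO

import scala.concurrent.{ ExecutionContext, Future }
import scala.concurrent.duration._

object FormWithFilesSketch {
  final case class Upload(fields: Vector[(String, String)], files: Vector[(FileInfo, File)])

  def filesAndFields(fieldTimeout: FiniteDuration = 5.seconds): Directive1[Upload] =
    entity(as[Multipart.FormData]).flatMap { formData =>
      extractRequestContext.flatMap { ctx =>
        implicit val mat: Materializer = ctx.materializer
        implicit val ec: ExecutionContext = ctx.executionContext

        // runFoldAsync handles one part at a time, so each part is fully
        // consumed before the next one is requested
        val result: Future[Upload] =
          formData.parts.runFoldAsync(Upload(Vector.empty, Vector.empty)) { (acc, part) =>
            part.filename match {
              case Some(fileName) =>
                val info = FileInfo(part.name, fileName, part.entity.contentType)
                val dest = File.createTempFile("akka-http-upload", ".tmp")
                part.entity.dataBytes
                  .runWith(FileIO.toPath(dest.toPath))
                  .map(_ => acc.copy(files = acc.files :+ (info -> dest)))
              case None =>
                // fields are assumed small and UTF-8 encoded; toStrict buffers them in memory
                part.entity.toStrict(fieldTimeout)
                  .map(s => acc.copy(fields = acc.fields :+ (part.name -> s.data.utf8String)))
            }
          }

        onSuccess(result)
      }
    }
}
```

Unlike storeUploadedFiles, this sketch does not restrict uploads to a single field name and leaves temp-file cleanup to the caller.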

Alternatively, if you can control the format, I'd rather avoid using form fields and file uploads at the same time and use URL parameters instead for the remaining fields to avoid the obligatory issues with streaming.
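For the form shown earlier in this thread, that alternative could look roughly like the sketch below, assuming akka-http 10.x. The route, `tempDestination`, and the response text are hypothetical. The `name` and `version` values travel as query parameters, so only the file part has to be streamed.

```scala
import java.io.File

import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route
import akka.http.scaladsl.server.directives.FileInfo

object QueryParamsUploadSketch {
  // Hypothetical destination function: one temp file per upload
  def tempDestination(info: FileInfo): File =
    File.createTempFile("upload-", ".tmp")

  val route: Route =
    path("submit") {
      post {
        // the former form fields are read from the query string
        parameters("name", "version") { (name, version) =>
          // only the multipart field "file" is consumed from the body
          storeUploadedFile("file", tempDestination) { (info, file) =>
            complete(s"Stored ${info.fileName} for $name ($version) at ${file.getAbsolutePath}")
          }
        }
      }
    }
}
```

A matching request would be something like `curl -F file=@requirements.txt 'http://localhost:8080/submit?name=basic_image&version=python27'`.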

@daddykotex
Contributor

The documentation is also proposing another alternative to get the files into temp files and have the form fields into a map: https://doc.akka.io/docs/akka-http/current/routing-dsl/index.html#file-uploads
