Mime API #1839

Closed
wants to merge 34 commits into
from

Conversation

Projects
None yet
6 participants
Collaborator

monnerat commented Aug 29, 2017

This PR is rather an RFC, since I have push permission.
I already tried such a RFC in the mailing list, without any response, and hope this PR will trigger some.

The commits here implements a new MIME API that can be used to replace the old form API, as well as compose a multipart mail message to be sent by SMTP or appended via IMAP. See CURLOPT_MIMEPOST.

The cli tool has been changed to use it for the -F option instead of CURLOPT_HTTPPOST. It properly generates --libcurl code accordingly. The -F argument syntax has been extended to support multipart mails.

The old form API now uses the mime API for its intermediate data (getformdata).

Some related tests have been added. The only one that fails is 1135 (check entry points order): I did nothing about it yet since it'll probably require an SONAME change too.

As a new behavior: "Expect 100-continue" is not generated for form posts that are measured smaller that the configured limit.

Documentation has been updated accordingly.

The only things that left to implement are:

  • real implementation of content-transfer encoders (base64, quoted-printable).
  • maybe a text/binary feature for file and memory data: it will ease EOL handling and better support non-ascii platforms.

Should I commit in master branch ?

monnerat added some commits Aug 18, 2017

http: introduce mime posting as a form posting replacement.
New CURLOPT_MIMEPOST option with a curl_mime * parameters.
Old CURLOPT_HTTPPOST option unaltered.
Curl_base64_encode: always call with a real data handle.
Some calls in different modules were setting the data handle to NULL, causing
segmentation faults when using builds that enable character code conversions.
http: improve resume/range.
- Use read when no seek function is provided.
- Disable for mime/form: no sense with multipart.

Also properly rewind form when needed.
mime: save easy handle in structures.
This is required to support character code conversions.
non-ascii: allow conversion functions to be called with a NULL data h…
…andle.

This fixes segmentation faults, such as they can occur when calling
deprecated curl_escape() and curl_unescape() functions with a build that
enables character code conversions.
Using a NULL handle should be however avoided: a conversion descriptor is
created and deleted each time a conversion function is called with a NULL
handle, and thus is not very efficient.
mime: add parent pointers in mime and part structures.
This allows checking if subparts are not attached twice and will help
curl tool finish a multipart.
runtests.pl: Apply strippart to upload too.
This allows substitution of boundaries in mail messages.
checksrc: disable SPACEBEFOREPAREN for case statement.
The case keyword may be followed by a constant expression and thus should
allow it to start with an open parenthesis.
test647: IMAP APPEND multipart message.
Also document --form extended syntax.

Coverage Status

Coverage decreased (-0.09%) to 73.003% when pulling 5ae54b6 on monnerat:mime into ff40c89 on curl:master.

Coverage Status

Coverage decreased (-0.08%) to 73.016% when pulling d7f508a on monnerat:mime into ff40c89 on curl:master.

Coverage Status

Coverage decreased (-0.1%) to 72.986% when pulling 5140fcc on monnerat:mime into ff40c89 on curl:master.

monnerat added some commits Aug 30, 2017

Coverage Status

Coverage decreased (-0.009%) to 73.083% when pulling 4b97d70 on monnerat:mime into ff40c89 on curl:master.

@bagder bagder self-requested a review Aug 30, 2017

monnerat added some commits Aug 31, 2017

runtests.pl: allow <file[1-4]> tags in client section.
This enables tests to create more than one file on the client side.
test564: Add a warning comment about shell profile output.
Shell profile output makes the SSH server failing and this problem reason
is not easy to find when no hint is given.

Coverage Status

Coverage decreased (-0.08%) to 73.016% when pulling e39c7ab on monnerat:mime into ff40c89 on curl:master.

- return TRUE;
- return FALSE;
+
+ return result;
@bagder

bagder Aug 31, 2017

Owner

This looks like a fix that is totally independent from the mime API work, right? If so I think we should cherry-pick it out from here and land it first, separately so that we don't make this work bigger than it has to be...

@monnerat

monnerat Aug 31, 2017

Collaborator

Yes, I encountered a few of them during this work (like NULL data pointer in calling code conversion functions), and fixed them on the fly. Fixes are tiny however.

Owner

bagder commented Aug 31, 2017

@monnerat, I think this is lovely work and I think you've done great work at pulling it this far already. This is awesome!

Here are a few comments and ideas that popped up during my first reviews of this large PR:

  1. curl_mime_encoder() - is documented to not be implemented. That's fine, but I think we should not add the man page until the function actually exists...
  2. curl_mime_file() the use of a FILE * argument is problematic and needs to be reconsidered. On some platforms (like Windows), we can't pass such pointers from an application to libcurl running as a DLL! The ownership of that memory is not passed over so the DLL cannot access it. Also, the related function curl_mime_filedata() takes a file name as argument so why the different choices? If *_file() would take a file name, it would also be easier to figure out the file size without resorting to seeking it. Which takes me to the next point...
  3. Maybe we shouldn't try to seek or fstat the file to get the size but instead somehow get that set by the user as well, so that we better can support growing (or shrinking?) files? We don't today, so maybe it isn't worth it...
  4. I miss the AVAILABILITY section in the new man pages that explain in which version they can be assumed to exist.
  5. The CURLOPT_MIMEPOST man page should probably mention that it deprecates CURLOPT_HTTPPOST
  6. It'd be very cool to get a bunch of unit tests written for these functions. It shouldn't be that hard and will help us to make sure they work as intended now and avoid regressions in the future.
  7. What do you think of the the "new form post API" wiki? Should we just remove that or update to reflect your work? It might be worth doing the latter, so that people can get a more easy overview on the new API.
  8. It'd be cool with a curl_mime_* version of the postit2.c example!
  9. Perhaps even some sort of basic documentation on how to convert a curl_formadd app to a curl_mime one.
  10. This is a lot of commits. Is this the minimum number of commits you want it presented/merged as, or do you have iterate development in here that could be squashed into fewer commits?
Collaborator

monnerat commented Aug 31, 2017

Hi Daniel,
Many thanks for the review and the suggestions. I'm happy this work pleases you. From your reply, I may understand you accept this code and that I can commit after having changed the points you mention.

  1. curl_mime_encoder: OK to remove for the moment. I have created it as a placeholder, because the entry point exists (I want to add entry points in one chunk to avoid more than one SONAME change). In any case, I'll implement it in the following days.
  2. curl_mime_file: well, the MIMEKIND_FILE is there primarily to support the "-" pseudo file name for stdin read, and I estimated this feature can be extended to other opened files via a curl_mime_file function (maybe unnamed pipes or files we don't know the name ?). I understand your W$ caveat, but does it apply to stdin too ?
  3. seek/fstat size computing: on one side, you are true and maybe we should give the size in argument. On the other side, files that change size during their use is very bad for this code (and probably for other data sources in libcurl) because we do our best to know the exact size before transfer (to avoid chunked encoding). Thus I think this point has to be discussed and clarified. Maybe we should limit the read size to the one used in Curl_mime_size(): this may improve the behavior with growing files, but not with shrinking ones...
  4. AVAILABILITY: I'll set it in the next commits.
  5. CURLOPT_MIMEPOST: will mention CURLOPT_HTTPPOST deprecation.
  6. Unit tests: I did not really look at them yet, but the API is pretty much tested now that the curl tool uses it for forms. In addition, there are specific lib* tests. The drawback is: the old form API is not tested a lot anymore. Maybe you can tell me what unit test syou think of?
  7. IMHO , form post API wiki can be kept as a kind of tutorial, but must be updated according to the effective mime API, and kept up to date.
  8. postit2.c: yes, you're right. In addition, we could keep the same name (obsoleting the old form API example), and adapt the program to the new API.
  9. API use conversion doc: yes, but very basic, because there are some unclear behaviors in the old API. I looked at it deeply to recode the Curl_getformdata() function and the new Curl_mime_prepare_headers() functions. IMHO, this task has however a lower priority.
  10. I tried to commit at "checkpoints", i.e.: once a new coding step that can be tested was achieved. In addition, there are some commits due to fixes after travis-CI and AppVeyor tell me new warnings I do not have with local compilations. But if you want I merge commits before applying them to master, that's OK. But in any case, this is a big work and having several commits was the best way to publish the work for continuous evaluation by the community during coding.

I'll fix the points above in the current PR before we have a new checkpoint.

Cheers,
Patrick

Coverage Status

Coverage decreased (-0.1%) to 72.964% when pulling 624f380 on monnerat:mime into ff40c89 on curl:master.

Collaborator

monnerat commented Aug 31, 2017

One more question: do we have to provide a curl_mime_get() function as there currently is a curl_formget() ?

Owner

bagder commented Sep 1, 2017

I can commit after having changed the points you mention

Sure! We have until Wednesday to merge and then 4 more weeks to polish details. Since this is a new API, the risk of breaking things for old users should be minimal.

FILE * on Windows

I think the problem is quite simply that an application cannot pass a FILE * to a DLL and have that DLL use the data, as the ownership of that struct isn't passed over. It doesn't matter what sort of file the handle has opened, stdin or a regular file.

Unit tests

I was thinking of more specific tests that add stuff and then verifies how the POST ends up, with legal and illegal input etc. But as you say, the basic functionality is verified already pretty good at least for when used the way we think it should be used. We can add more tests after landing too.

do we have to provide a curl_mime_get() function

I think we can leave that one out until we actually hear from someone asking for that functionality. We added curl_formget() a long time ago but based on basically zero bug reports and no questions about it through the years, I estimate there really aren't any users of it.

Also, with our support for reading data from files on demand and streaming via callbacks etc, it has a very limited use. (Back when the formget function was added, the entire formpost was alwas generated and kept in memory and thus "easy" to provide.)

fchandur commented Sep 1, 2017

We added curl_formadd() a long time ago but based on basically zero bug reports and no questions about it through the years, I estimate there really aren't any users of it.

I can confirm there is at least one direct user of curl_formadd() :).

The lack of questions might be because it is well documented and that I took time to carefully read it.

Owner

bagder commented Sep 1, 2017

I can confirm you there is at least one direct user of curl_formadd() :).

Sorry, I meant to write formget() there... =) (and I'll edit my post now to say so)

Collaborator

monnerat commented Sep 1, 2017

@bagder

Since this is a new API, the risk of breaking things for old users should be minimal.

This is not completely true: Cul_getformdata has been completely rewritten and the intermediate form is kept as a mime structure. CURLOPT_HTTPOST may be impacted. In addition, cli tool -F option now uses CURLOPT_MIMEPOST.

FILE * on Windows

an application cannot pass a FILE * to a DLL and have that DLL use the data...

I can hardly believe that: it seems to be a very restricting caveat of the OS. In addition, other kind of pointers can be passed to a DLL and pointed data can be accessed by the DLL without problem. AFAICR, libxml2 does it without problem. On the other side, it is true I am not a W$ speciallist (same for you, AFAIK): I don't even have one here. Maybe we should have a file descriptor argument rather than a FILE * and use 0 for stdin ?

... It doesn't matter what sort of file the handle has opened, stdin or a regular file.

That means the old form API "-" file name should already fail on W$. Does it ?

Collaborator

monnerat commented Sep 2, 2017

This is now pushed in the master tree. Thanks for review.
Closing.

@monnerat monnerat closed this Sep 2, 2017

Owner

jay commented Sep 3, 2017

I get a few warnings for conversion from 'curl_off_t' to 'size_t', possible loss of data in Visual Studio x86 (curl_off_t 8, size_t 4) It looks like it is because datasize and origin are used for both memory and file so in the latter case they need to be curl_off_t and that may be larger than size_t. the first two truncations below are the in memory functions, and the third one I don't know but I'm guessing same

curl/lib/mime.c

Line 285 in 3baf36e

size_t sz = part->datasize - part->state.offset;

curl/lib/mime.c

Line 315 in 3baf36e

part->state.offset = offset;

escaped = c_escape(data, part->datasize);

Owner

jay commented Sep 3, 2017

an application cannot pass a FILE * to a DLL and have that DLL use the data...

I can hardly believe that: it seems to be a very restricting caveat of the OS. In addition, other kind of pointers can be passed to a DLL and pointed data can be accessed by the DLL without problem. AFAICR, libxml2 does it without problem. On the other side, it is true I am not a W$ speciallist (same for you, AFAIK): I don't even have one here. Maybe we should have a file descriptor argument rather than a FILE * and use 0 for stdin ?

In Windows FILE / fd are only supposed to be operated on by the module that opened them. The reason for that is because in Windows each module can have its own independent CRT (C runtime). The same is true of malloc, for example if you malloc in one DLL you can't free that memory in another DLL (it's a little more complicated than that but for brevity just assume that for now). You can get away with it if your modules share the same DLL CRT, but that's not something we can depend on.

Regarding 0/1/2 it's tricky. The CRTs can share the same underlying native HANDLEs to start but let's say for example one CRT calls freopen on stdout, that causes the underlying handle to be closed and then in that one CRT stdout has a new HANDLE so it writes to a file. But then the other modules don't know that so they will continue to write to the HANDLE that is closed, or worse some new HANDLE that has the old HANDLE value. In any case printf and other calls will continue to return bytes written as if they wrote something.

Collaborator

monnerat commented Sep 3, 2017

@jay: thanks a lot for reporting the conversion warnings. I've pushed a fix that should avoid them.

Thanks also for the FILE in DLLs explanation. This makes me think libcurl may already have some potential problems for years:

  • Reopening stdin: this falls in the 0/1/2 case you mention. The problem can occur if CURLOPT_READFUNCTION has not been set (it still is libcurl's CRT fread). It can probably also occur where pseudo-fillename "-" is used.
  • Setting CURLOPT_READDATA to another file pointer without setting CURLOPT_READFUNCTION to caller's CRT fread.

We probably may have similar problems on output.

Maybe something must be said about it in the documentation ?

For the current mime case, I suppressed curl_mime_file() from exported entry points, but I had to retain it internally to support pseudo-file "-" for compatibility with the old form API, thus keeping potential problems. For the feature itself, I guess it should be emulated by the caller using curl_mime_data_cb().

Many thanks again for your comments.

Owner

jay commented Sep 4, 2017

It actually is noted in CURLOPT_{READ,WRITE}DATA that if one sets the parameter that way then they must set the respective function as well.

On another topic, I see in 1ab9e9b just now that you bumped the soname and the order of the API is different. That seems unnecessary, couldn't you have just preserved the existing order of the API?

Owner

bagder commented Sep 4, 2017

@monnerat:

I have created it as a placeholder, because the entry point exists (I want to add entry points in one chunk to avoid more than one SONAME change). In any case, I'll implement it in the following days.

We only ever do one SONAME bump per release anyway so it would not have been a problem. I really don't like having function prototypes in curl.h for functions we don't actually provide, but if you will provide this function soon I figure it'll be fine.

And as @jay remarks, the DLL problems we have are already noted in the docs and I my remarks on the mime API was in order to avoid us doing more of those same mistakes.

@jay:

couldn't you have just preserved the existing order of the API?

It's really quirky to maintain the order as then all new prototypes have to be added at the end of multi.h the way @dscho had to do it for the multissl API. So once we've agreed on this minor SONAME bump, we can just as well clean things up and put the prototypes where they belong in an order that makes better sense for us and future users.

Collaborator

monnerat commented Sep 4, 2017

@jay:

In addition to bagder's soname explanation, please note it has been done the way it should: it will be effective only on platforms where the entry point order matters.

@bagder:

... but if you will provide this function soon I figure it'll be fine.

I'm currently working on it !

Collaborator

dfandrich commented Sep 8, 2017

Collaborator

dfandrich commented Sep 8, 2017

This is probably also the root of this test 650 failure, with CURL_MAX_WRITE_SIZE set to its smallest allowed value of 1024: https://curl.haxx.se/dev/log.cgi?id=20170908020827-25842#prob7

Collaborator

dfandrich commented Sep 8, 2017

Here's a different build failure on another non-HTTP build: https://curl.haxx.se/dev/log.cgi?id=20170908020807-25537#prob1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment