Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SMTP binary file transfer using a callback results in corrupted file #4860

Closed
RuurdBeerstra opened this issue Jan 28, 2020 · 7 comments
Closed

SMTP binary file transfer using a callback results in corrupted file #4860

RuurdBeerstra opened this issue Jan 28, 2020 · 7 comments
Labels

Comments

@RuurdBeerstra
Copy link
Contributor

@RuurdBeerstra RuurdBeerstra commented Jan 28, 2020

I did this

Transfer a binary file using SMTP. Use the curl_mime_data_cb function to obtain
the binary data. When the file exceeds some 64KB in size, the resulting MIME attachment
at the receiving end is exactly the right size, but the contents is corrupted.

When other means are used to pass the data to curl (name of a file, text string)
the problem does NOT occur, only when using curl_mime_data_cb.

The internal buffering in cURL seems to send a few buffered bytes twice, omitting others.
I have written a small reproduction program, using an example program as a start.
It fails predictably, every time, using the latests cURL version.
I have used a test file produced with this:

    $ awk 'BEGIN {for(i = 0; i < 50000; i++) printf("Small line %06d\n",i);}' > many_lines.txt

That produces a file like this :
-rw-rw-r-- 1 rbeerstra bea-tools 900000 Jan 27 16:14 many_lines.txt

The attached mime.c example program mails that to me using SMTP. The received mail has an
attachment, which is exactly 900000 bytes (which is correct). But when opened with a text editor, looking at line 3615:

Small line 003613
Small line 003614
Small line 00361ne 003612
Small line 003613
Small line 003614
Small line 003615
Small line 003616
Small line 003617
Small line 003618
Small line 003619
Small line 003620
Small line 003621
Small line 003622
Smal03626
Small line 003627
Small line 003628

There is corruption at that point, always. Part is repeated, another part is missing.
When I delete all proper lines like so (in VI): :g/^Small line ....../d
it should have deleted all lines, but I end up with:
Smal03626
Sm 007267
ne 010908
Small lineall line 014535
Small liSmall line 018176
Small 6
Smal457
Small lineSmall line 025472
Sm29098
Small li2
032739
Small 753
Smal36394
Sm 040035
ne 043676
Small lineall line 047303

So every 3600-ish lines the problem repeats, showing a buffering problem.
Obviously, when the binary data is less predictable, errors are harder to spot or, worse,
end up being undetected. It was spotted in our case by a failing 'gunzip' of the attachment.

When the "binary" in the test program is changed into "text" (see lines 101/102), the problem does NOT occur.

I expected the following

Binary files should be transferred correctly :-)

curl/libcurl version

I tested with 2 versions: The first is the one we currently incorporate in our producte: curl-7.65.1.

curl -V output:
curl 7.65.1 (x86_64-pc-linux-gnu) libcurl/7.65.1 OpenSSL/1.1.1c zlib/1.2.7
Release-Date: 2019-06-05
Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS HTTPS-proxy IPv6 Largefile libz NTLM NTLM_WB SSL TLS-SRP UnixSockets

The 2nd is the latest one from https://curl.haxx.se/download.html: curl-7.68.0
curl -V output:
curl 7.68.0 (x86_64-pc-linux-gnu) libcurl/7.68.0 OpenSSL/1.1.1c zlib/1.2.7
Release-Date: 2020-01-08
Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS HTTPS-proxy IPv6 Largefile libz NTLM NTLM_WB SSL TLS-SRP UnixSockets

Both fail in exactly the same way.

operating system

Bug was spotted on Windows, reproduced on Linux. I think it is OS independent.
mime.c.txt

@bagder bagder added the SMTP label Jan 28, 2020
@bagder
Copy link
Member

@bagder bagder commented Jan 28, 2020

@monnerat is redoing that callback logic in #4833, it would be most interesting if you would be able to try that patch and see if you can still reproduce the problem!

@RuurdBeerstra
Copy link
Contributor Author

@RuurdBeerstra RuurdBeerstra commented Jan 28, 2020

@bagder
Copy link
Member

@bagder bagder commented Jan 28, 2020

You can follow the instructions in #4833 if you want to use the git approach, or you can get the "raw" patch by appending .patch to the PR URL: https://github.com/curl/curl/pull/4833.patch

@monnerat
Copy link
Contributor

@monnerat monnerat commented Jan 28, 2020

Maybe this is not the problem, but you should not use "binary" encoding with SMTP.
SMTP data flow requires short CRLF-ending lines, so you should use "base64" if it is really binary, or "quoted-printable" if the data is mostly characters.
"text" encoding does not exist, thus setting encoder as such fails (transfer behaves as if you don't set an encoder).

That said, I'll investigate if there is a problem with encoders.

@RuurdBeerstra
Copy link
Contributor Author

@RuurdBeerstra RuurdBeerstra commented Jan 28, 2020

1st: I'm much impressed with the speedy responses here and the level of support that actually helps :-)

I did a clone of the latest GIT repository, built and tested: Problem persists.
I downloaded the patch 4833 and saw that the cloned repository code did not have those modifications.
Applied the patch, built new lib, linked my test program against it.

Good news: Bug is gone, patch #4833 fixes the issue.
When will this fix be part of the next curl release?

@monnerat: A user-program of ours used curl_mime_encoder(part,"binary");
So does my little test program. And for the test, the "binary" is simple short lines, and that fails.
Also, these days the SMTP servers do not seem to complain about this, and accept and deliver binary content properly. IMHO: If an impossible/invalid thing is asked of curl_perform(), should it not refuse and throw an error?

Thanks very much for your support in this!
Ruurd

@monnerat
Copy link
Contributor

@monnerat monnerat commented Jan 28, 2020

A user-program of ours used curl_mime_encoder(part,"binary");

This is normally for HTTP. I would not rely on it for SMTP.

BTW: I found something weird in the binary encoder: will work on a fix and a test. Thanks for the report :-)

@bagder bagder closed this in b2caaa0 Mar 1, 2020
@bagder
Copy link
Member

@bagder bagder commented Mar 2, 2020

We had to revert the merge, this issue is still present

@bagder bagder reopened this Mar 2, 2020
@bagder bagder closed this in e5b366c Mar 7, 2020
@lock lock bot locked as resolved and limited conversation to collaborators Jun 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.