HTTP Raw option on win64 doubles amount of data when used with HTTP Chunks #2303

Closed
driekus77 opened this Issue Feb 10, 2018 · 19 comments

driekus77 commented Feb 10, 2018

I did this

I'm investigating HTTP chunked vs. non-chunked transfers using NancyFX 2.0 / Kestrel Web Server. During this investigation I noticed big file size differences when using the cURL --raw option. The output, for both binary and text, doubles in size when using cURL on Win64 in combination with HTTP chunking.
When I use cURL on my MacBook against the same server URL, I get the expected size and result.

I expected the following

The same result on Win64 as on my MacBook!

curl/libcurl version

curl/7.58.0 WRONG (Win64)
curl/7.43.0 GOOD (Mac High Sierra)

[curl -V output]
On my MacBook, where it works as expected:

curl -ivs --no-keepalive --raw -o raw_linecounts_txt_chunked.mac.bin   http://192.168.1.88:8088/tests/stream/LineCounts.txt
*   Trying 192.168.1.88...
* Connected to 192.168.1.88 (192.168.1.88) port 8088 (#0)
> GET /tests/stream/LineCounts.txt HTTP/1.1
> Host: 192.168.1.88:8088
> User-Agent: curl/7.43.0
> Accept: */*

On my Windows 10 64-bit HP ProBook, where it does not work as expected:

curl -ivs --no-keepalive --raw -o raw_linecounts_text_chunks.win64.bin  http://192.168.1.88:8088/tests/stream/LineCounts.txt
*   Trying 192.168.1.88...
* TCP_NODELAY set
* Connected to 192.168.1.88 (192.168.1.88) port 8088 (#0)
> GET /tests/stream/LineCounts.txt HTTP/1.1
> Host: 192.168.1.88:8088
> User-Agent: curl/7.58.0
> Accept: */*

operating system

The problem showed up on Windows 10 (64-bit) on my HP ProBook.

Attached is a zip of four files that show the problem when you diff them:
cURL_Raw_Chunked_vs_NotChunked.zip

I'm not that into network-related stuff, so it could be that I'm doing something wrong. But the difference in file size is strange.

Kind regards,

Henry Roeland

Member

jay commented Feb 10, 2018

Can you please upload LineCounts.txt

jay added the HTTP label Feb 10, 2018

driekus77 commented Feb 10, 2018

Ah, sorry about that! According to Notepad++, this file is UTF-8 without a byte order mark (BOM).
LineCounts.txt

Member

bagder commented Feb 10, 2018

If you use --trace-ascii dumpfile you'll see exactly what curl receives (and sends). Can you attach the 'dumpfile' of the problematic case here for us?

To me this looks like your server sends curl something weird. What does curl say about this response when you don't use --raw?

driekus77 commented Feb 10, 2018

cURL version 7.53.1 does not have this issue:

> curl -ivs --no-keepalive --raw -o raw2_linecounts_text_chunks.win64.bin  http://192.168.1.88:8088/tests/stream/LineCounts.txt
> *   Trying 192.168.1.88...
> * TCP_NODELAY set
> * Connected to 192.168.1.88 (192.168.1.88) port 8088 (#0)
> > GET /tests/stream/LineCounts.txt HTTP/1.1
> > Host: 192.168.1.88:8088
> > User-Agent: curl/7.53.1
> > Accept: */*
> >
> < HTTP/1.1 200 OK
> < Date: Sat, 10 Feb 2018 22:50:53 GMT
> < Content-Type: text/plain
> < Server: Kestrel
> < Transfer-Encoding: chunked
> <
> { [8479 bytes data]
> * Connection #0 to host 192.168.1.88 left intact

curls_7_53_1_Test.zip

driekus77 commented Feb 10, 2018

Here are the dump/trace files for both the chunked and the non-chunked case:
trace.zip

Using cURL 7.58.0 on Windows, as before!

Member

jay commented Feb 10, 2018

Hm, something is up. I'll bisect it.

driekus77 commented Feb 11, 2018

For clarification:
I don't see the wrong data in the browser (Chrome) or in Wireshark when using HTTP chunked transfers.
It only appears when using the --raw option with cURL version 7.58.0 on 64-bit Windows.

driekus77 commented Feb 11, 2018

For anybody interested:
I found a public URL that serves an HTTP chunked image:

curl -ivs --no-keepalive --raw  -o raw_chunkedimage.jpg http://www.httpwatch.com/httpgallery/chunked/chunkedimage.aspx

Difference in file sizes between cURL version 7.53.1 and 7.58.0 on Win64:

> User-Agent: curl/7.53.1
02/11/2018  12:25 PM            34,196 raw_chunkedimage.jpg

> User-Agent: curl/7.58.0
02/11/2018  12:25 PM            67,849 raw_chunkedimage.jpg
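
A quick bit of arithmetic on the two sizes above, as a rough sketch. It assumes (the listings alone do not prove this) that the 7.58.0 file is the 7.53.1 file plus one extra copy of some part of it. Under that assumption, the sizes of the duplicated part and of the part that appears only once work out as follows:

```python
# Sizes taken from the two directory listings above.
size_7_53_1 = 34_196  # raw output written by curl 7.53.1
size_7_58_0 = 67_849  # raw output written by curl 7.58.0

# Assumption: the larger file is the smaller file plus one extra copy of
# part of it. The surplus is then the size of the duplicated part.
duplicated = size_7_58_0 - size_7_53_1

# Whatever is left of the smaller file appears only once in the larger one.
written_once = size_7_53_1 - duplicated

print(f"ratio: {size_7_58_0 / size_7_53_1:.3f}")
print(f"bytes that appear twice: {duplicated}")
print(f"bytes that appear once:  {written_once}")
```

So the output is slightly less than doubled: 33,653 bytes would be present twice and 543 bytes once. The 543 bytes might correspond to the response headers written because of -i, but that is a guess and would need to be checked against the trace files.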

driekus77 commented Feb 11, 2018

It looks to be a version thing: cURL 7.58.0 built on my MacBook has the same issue.

-rw-r--r--  1 henry  staff    33K Feb 11 22:12 raw_chunkedimage_v7_43_0.jpg
-rw-r--r--  1 henry  staff    66K Feb 11 22:13 raw_chunkedimage_v7_58_0.jpg

Diff on trace files:

diff raw_chunkedimage_v7_43_0.jpg.trdmp raw_chunkedimage_v7_58_0.jpg.trdmp 
1a2
> == Info: TCP_NODELAY set
6c7
< 004e: User-Agent: curl/7.43.0
---
> 004e: User-Agent: curl/7.58.0
30c31
< 0000: Date: Sun, 11 Feb 2018 21:17:44 GMT
---
> 0000: Date: Sun, 11 Feb 2018 21:19:15 GMT

Notice the TCP_NODELAY difference! But when I explicitly set

--tcp-nodelay

for both raw runs, I still get doubled output with v7.58.0.

Trace files:
Archive.zip

Member

bagder commented Feb 12, 2018

TCP_NODELAY is set by default since 7.50.2, so totally expected.

driekus77 commented Feb 12, 2018

I tested some older versions and found the release in which the behavior changes:

-rw-r--r--  1 henry  staff    33K Feb 12 01:58 raw_chunkedimage_v7_56_0.bin
-rw-r--r--  1 henry  staff    33K Feb 12 02:03 raw_chunkedimage_v7_56_1.bin
-rw-r--r--  1 henry  staff    66K Feb 12 01:53 raw_chunkedimage_v7_57_0.bin
-rw-r--r--  1 henry  staff    66K Feb 12 01:52 raw_chunkedimage_v7_58_0.bin

Source comparison and GitHub blame point to:
dbcced8#diff-3bd07f668a09e230441f7991bc8a68ca

Good luck with the fix!
Keep up the good work with cURL!

Collaborator

monnerat commented Feb 12, 2018

My bad: bug introduced in commit dbcced8.
I will issue a fix ASAP.

monnerat added a commit that referenced this issue Feb 12, 2018

Collaborator

monnerat commented Feb 12, 2018

Commit 155ea88 in master should fix the issue.

Member

jay commented Feb 12, 2018

> Commit 155ea88 in master should fix the issue.

Works here.

Collaborator

monnerat commented Feb 12, 2018

> Works here.

Thanks for testing!

Member

bagder commented Feb 12, 2018

Can any of you think of a test we could create that would've caught this?
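
One possible shape for such a check, sketched in Python rather than in curl's own test format. The idea is that --raw should pass the body through undecoded, so (without -i) the output should match the bytes the server put on the wire, byte for byte and exactly once. `encode_chunked` and `check_raw_output` are hypothetical helper names, and the "corrupted" sample at the end is invented purely for illustration; it does not claim to reproduce the exact bytes the bug produced.

```python
def encode_chunked(chunks):
    """Return the on-the-wire bytes of a chunked body for the given payloads."""
    wire = b""
    for payload in chunks:
        if not payload:
            raise ValueError("an empty payload would terminate the body early")
        # Each chunk: size in hex, CRLF, payload, CRLF.
        wire += f"{len(payload):x}\r\n".encode("ascii") + payload + b"\r\n"
    # A zero-size chunk followed by an empty trailer section ends the body.
    return wire + b"0\r\n\r\n"


def check_raw_output(raw_output, chunks):
    """Compare captured raw output with the expected wire bytes."""
    expected = encode_chunked(chunks)
    if raw_output == expected:
        return "ok"
    if len(raw_output) > len(expected):
        return f"too long by {len(raw_output) - len(expected)} bytes"
    return "mismatch"


chunks = [b"hello ", b"chunked ", b"world"]
wire = encode_chunked(chunks)

# A faithful pass-through is accepted.
print(check_raw_output(wire, chunks))

# Invented corruption: the payload bytes show up a second time.
print(check_raw_output(wire + b"".join(chunks), chunks))
```

In a real test the first argument would be the file written by `curl --raw -o ...` against a server that sends exactly these chunks; comparing against fixed expected bytes catches any duplication regardless of where it happens.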

Collaborator

monnerat commented Feb 12, 2018

I will try to create one.

driekus77 commented Feb 12, 2018

Test:
The sum of all the Chunk lengths should be equal to the resulting raw file size(?).

Question:
Are you guys into unit testing, integration testing or system testing?

Building v7.58.0 from a clone of master on my Mac gave me:

-rw-r--r--  1 henry  staff  67849 Feb 12 18:17 raw_chunkedimage_v7_58_0_beforePatch.bin
-rw-r--r--  1 henry  staff  34196 Feb 12 18:32 raw_chunkedimage_v7_58_0_afterPatch.bin

So on Mac it's fine. Unfortunately I don't have time to rebuild it on Win64, but I don't think it is really necessary to check it there.

Thanks guys for the quick actions and feedback!
I really like curl and it's enjoyable to work with.

Kind regards,
Henry Roeland

monnerat added a commit that referenced this issue Feb 13, 2018

tests: new tests for http raw mode
Test 319 checks proper raw mode data with non-chunked gzip
transfer-encoded server data.
Test 326 checks raw mode with chunked server data.

Bug: #2303
Closes #2308

Member

jay commented Feb 13, 2018

> The sum of all the Chunk lengths should be equal to the resulting raw file size(?).

No. --raw disables all decoding. In this case you have included the HTTP headers in the output (-i), so those come first. The chunked encoding is then left undecoded, so the content consists of each chunk's size in hex followed by the chunk itself, and finally a chunk of size 0 (assuming the transfer completed).
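
The layout described above can be sketched with a small parser. This is a minimal illustration, not curl's implementation: `split_raw_response` is a hypothetical helper, the sample response is made up, and trailers after the final chunk are ignored.

```python
def split_raw_response(raw):
    """Split undecoded `-i --raw` style output into header size and chunk payloads."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no end of headers found")
    header_len = len(head) + len(sep)

    payloads = []
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The size line is hexadecimal; anything after ';' is a chunk extension.
        size_field = body[pos:line_end].split(b";")[0]
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break  # the zero-size chunk marks the end of the body
        payloads.append(body[pos:pos + size])
        pos += size + 2  # skip the payload and the CRLF that follows it
    return header_len, payloads


# Made-up sample: headers, two chunks, then the terminating zero-size chunk.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"5\r\nhello\r\n"
    b"6\r\n world\r\n"
    b"0\r\n\r\n"
)

header_len, payloads = split_raw_response(raw)
payload_total = sum(len(p) for p in payloads)
framing = len(raw) - header_len - payload_total

print("total bytes:  ", len(raw))
print("header bytes: ", header_len)
print("payload bytes:", payload_total)
print("framing bytes:", framing)
```

The raw file size is therefore the headers plus the chunk framing (size lines and CRLFs) plus the payloads, which is why it is larger than the sum of the chunk lengths.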

> So on Mac it's fine. Unfortunately I don't have time to rebuild it on Win64 but I think this is not really necessary to check it there.

I checked Win64 and it works there. Thanks for your report and all your follow-ups.

> Are you guys into unit testing, integration testing or system testing?

curl has unit tests and also full tests using the curl tool, which also exercise libcurl and which I guess you could call system testing. Integration testing depends on how you define it. The tests are not combined; they are run sequentially. If some scenario needs to be varied, in most cases there is a libcurl test with an ifdef guard separating the two variants, or just a separate test.

@monnerat added 2 tests for this issue in e551910.

jay closed this Feb 13, 2018

lock bot locked as resolved and limited conversation to collaborators May 14, 2018
