Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some improvements to multipart parser #2007

Merged
merged 9 commits into from
Jan 5, 2020

Conversation

aemseemann
Copy link
Contributor

This PR addresses some issues with the current multipart parser implementation.

  • Allow part headers to span multiple TCP packets (similar to how HTTP headers are handled). It is very unlikely for this issue to occur for requests with only one part, since all the headers are probably part of the first TCP packet. However, requests with multiple parts may fail randomly if a part header just happens to span across a TCP packet boundary.
  • Allow construction of the MultipartParser instance to fail, e.g. when the 'boundary' part of the Content-Type header is missing.
  • Make parser state (multipart_parser_t) a member of MultipartParser to avoid additional dynamic memory allocation.
  • Allow (and ignore) preamble data. Although preamble data is rarely used in practice, it has been observed that some clients add an additional newline before the message body, which, technically, qualifies as preamble data.

@slaff slaff requested a review from mikee47 December 18, 2019 20:07
parser = nullptr;
if(request.headers.contains(HTTP_HEADER_CONTENT_TYPE)) {
// Content-Type: multipart/form-data; boundary=------------------------a48863c0572edce6
int startPost = request.headers[HTTP_HEADER_CONTENT_TYPE].indexOf(_F("boundary="));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimisation: indexOf(_F("boundary="), 9)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any downside in using indexOf(FS("boundary=")) instead?
At least for small strings and thanks to your SSO, WString should not do a heap allocation, so both _F and FS with its implicit operator WString will essentially do a memcpy_aligned.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. With SSO indexOf(F("boundary=")) is also an option.

Copy link
Contributor

@mikee47 mikee47 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! just a couple of suggestions.

One thing that's been bugging me about the MultipartParser is that the entire framework needs rebuilding when ENABLE_HTTP_SERVER_MULTIPART changes. I've got an idea about that, will push something to your repo for consideration.

Sming/Components/MultipartParser/MultipartParser.cpp Outdated Show resolved Hide resolved
Sming/Components/MultipartParser/MultipartParser.cpp Outdated Show resolved Hide resolved
Sming/Components/MultipartParser/MultipartParser.cpp Outdated Show resolved Hide resolved
Sming/Components/MultipartParser/MultipartParser.cpp Outdated Show resolved Hide resolved
Sming/Components/MultipartParser/MultipartParser.cpp Outdated Show resolved Hide resolved
Sming/Components/MultipartParser/MultipartParser.cpp Outdated Show resolved Hide resolved
@mikee47
Copy link
Contributor

mikee47 commented Dec 20, 2019

...the entire framework needs rebuilding when ENABLE_HTTP_SERVER_MULTIPART changes.

I've pushed a commit to mikee47@ca58413.

@slaff slaff added this to the 4.0.1 milestone Dec 21, 2019
Sming/Components/MultipartParser/MultipartParser.h Outdated Show resolved Hide resolved
+ }
+ break;
+
+ case s_preamble:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That code should be tested. My very quick test resulted in getting stuck here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange, it works for me (tested with the otatool.py upload from PR#2006, modified to include some preamble data).
Can you describe your test setup?
(The preamble data must end with '\r\n' before the first boundary)

@slaff slaff removed the 0 - Backlog label Jan 3, 2020
- p->boundary_length = strlen(boundary);
-
- p->lookbehind = (p->multipart_boundary + p->boundary_length + 1);
+ p->multipart_boundary = boundary;
Copy link
Contributor

@slaff slaff Jan 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aemseemann This is C code. And in C you should use strcpy to copy strings. Here you are making a copy of a pointer that is invalid when you actually have to access multipart_boundary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention is to copy the pointer and not the string content (to avoid an additional allocation).

It is the caller's responsibility to ensure that the content stays stays valid (and unchanged) throughout the lifetime of the multipart_parser. This is ensured, because the caller is the MultipartParser constructor and both multipart_parser and the boundary string are both members of MultipartParser.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aemseemann can you try the following: Compile HttpServer_FirmwareUpload in debug mode for the Host

cd $SMING_HOME/../sample/HttpServer_FirmwareUpload
make dist-clean
make clean
make SMING_ARCH=Host ENABLE_GDB=1 ENABLE_LWIPDEBUG=1 ENABLE_MALLOC_COUNT=0
make flash

Once that finishes if you have valgrind run the application by calling

# setup your virtual network interface. see: https://sming.readthedocs.io/en/latest/_inc/Sming/Arch/Host/Components/lwip/index.html
make valgrind

At least on my system valgrind starts to report issues with the multipart_boundary. Example:

==23134== Invalid read of size 1
==23134==    at 0x809CC14: multipart_parser_execute (multipart_parser.c:112)

You can also run the application in a visual debugger and start monitoring the bytes starting at the memory address that multipart_boundary points to. As far as I can see the first bytes are overwritten.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can confirm the bug, thanks for the explanation!
I accidentally took the address from the MultipartParser constructor argument, which was shadowing the boundary member. This is fixed now.

However, valgrind discovered an issue with TcpConnection::checkSelfFree:

==8800== Invalid read of size 1
==8800==    at 0x12C374: checkSelfFree (TcpConnection.h:191)
==8800==    by 0x12C374: TcpConnection::checkSelfFree() (TcpConnection.h:189)
==8800==    by 0x12CD26: TcpConnection::internalOnPoll() (TcpConnection.cpp:507)

The reason is in TcpConnection.cpp:500pp

err_t TcpConnection::internalOnPoll()
{
	//if (tcp->state != ESTABLISHED)
	//	return ERR_OK;

	sleep++;
	err_t res = onPoll();
	checkSelfFree(); /// <<< instance was deleted inside `onPoll`
	debug_tcp_ext("<poll");
	return res;
}

Could be fixed with:

	err_t res = onPoll(); 
	if (res != ERR_TIMEOUT) {
		// on ERR_TIMEOUT, checkSelfFree is already called inside checkSelfFree
		checkSelfFree();
	}

However, I lack the in-depth knowledge of the low level classes and relying on the error code seems quite fragile.

@slaff slaff removed this from the 4.1.0 milestone Jan 3, 2020
@aemseemann
Copy link
Contributor Author

Moving sources to 'src' and code formatting is done.

I am still thinking about @mikee47's proposal. Maybe we can go a step further: Instead of just avoiding the SmingCore recompilation, MultipartParser could be converted into a Library. Then remove ENABLE_HTTP_SERVER_MULTIPART and make it the user's responsibility to do the following call during web server configuration:

server.setBodyParser(MIME_FORM_MULTIPART, formMultipartParser);

@slaff
Copy link
Contributor

slaff commented Jan 4, 2020

Can confirm the bug, thanks for the explanation!

@aemseemann Awesome, thanks a lot for fixing it :)

and make it the user's responsibility to do the following call during web server configuration:

Yes, let's do it. Add the related changes as part of this PR.

The reason is in TcpConnection.cpp:500pp

Create a separate PR for this issue.

@slaff slaff added this to the 4.1.0 milestone Jan 4, 2020
@aemseemann
Copy link
Contributor Author

MultipartParser conversion into a Library is done.

The TcpConnection fix will have to wait untill later this week.

@slaff
Copy link
Contributor

slaff commented Jan 5, 2020

@aemseemann This PR is scheduled for merging. Once the small changes are committed I will take care to merge it in develop.

@slaff slaff merged commit ca120a3 into SmingHub:develop Jan 5, 2020
@slaff slaff removed the 3 - Review label Jan 5, 2020
mikee47 added a commit to mikee47/Sming that referenced this pull request Jan 6, 2020
slaff pushed a commit that referenced this pull request Jan 6, 2020
* Fix issues with debug statements whilst running exit handlers

* Fix issue with `TcpConnection::checkSelfFree()` being called twice

Reported in #2007 (comment) and #2016
@slaff slaff mentioned this pull request Jan 6, 2020
4 tasks
mikee47 added a commit to mikee47/Sming that referenced this pull request Jan 6, 2020
@aemseemann aemseemann deleted the fix/improve-multipart-parser branch January 10, 2020 19:50
slaff pushed a commit to slaff/Sming that referenced this pull request Feb 6, 2020
New features
------------------
- No-WiFi build option SmingHub#2004 - get more resources if your application is not using WIFI.
- Multiple SSL adapters based on axTLS and BearSSL.  SmingHub#1999
- Added basic Crypto support library SmingHub#2014
- Updates framework to build using GCC 9.2.0 toolchain for C++17. SmingHub#1825
- Modbus master SmingHub#1992
- Implemented Small String Optimisation (SSO). SmingHub#1951
- Webcam stream and sample webcam web server. SmingHub#1981
- Allow HTTP connections to ignore rejected body content SmingHub#1928

Improvements
-------------------
- Some improvements to multipart parser SmingHub#2007
- Update ArduinoJson to 6.13.0 SmingHub#1979
- Added precaching from Arduino for ESP8266. SmingHub#1965
- Add support for 'Expect: 100-continue' in HTTP server. SmingHub#1931
- Upgrade to FlashString Library  SmingHub#1974, SmingHub#2013

Bug fixes
-------------
- Updated mqtt-codec to allow publish messages without payload. SmingHub#1976
- HttpConnection freed twice. SmingHub#1938
- Hangs at startup when custom heap enabled. SmingHub#1996
- Fix issues reported by valgrind SmingHub#2017

Breaking changes and Migration
-------------------------------------------
- See our [dedicated page](https://sming.readthedocs.io/en/latest/upgrading/4.0-4.1.html) for migration from 4.0.0 to 4.1.0.

All PRs scheduled for this release can be seen from [here](https://github.com/SmingHub/Sming/milestone/23)
slaff pushed a commit to slaff/Sming that referenced this pull request Feb 6, 2020
New features
------------------
- No-WiFi build option SmingHub#2004 - get more resources if your application is not using WIFI.
- Multiple SSL adapters based on axTLS and BearSSL.  SmingHub#1999
- Added basic Crypto support library SmingHub#2014
- Updates framework to build using GCC 9.2.0 toolchain for C++17. SmingHub#1825
- Modbus master SmingHub#1992
- Implemented Small String Optimisation (SSO). SmingHub#1951
- Webcam stream and sample webcam web server. SmingHub#1981
- Allow HTTP connections to ignore rejected body content SmingHub#1928

Improvements
-------------------
- Some improvements to multipart parser SmingHub#2007
- Update ArduinoJson to 6.13.0 SmingHub#1979
- Added precaching from Arduino for ESP8266. SmingHub#1965
- Add support for 'Expect: 100-continue' in HTTP server. SmingHub#1931
- Upgrade to FlashString Library  SmingHub#1974, SmingHub#2013

Bug fixes
-------------
- Updated mqtt-codec to allow publish messages without payload. SmingHub#1976
- HttpConnection freed twice. SmingHub#1938
- Hangs at startup when custom heap enabled. SmingHub#1996
- Fix issues reported by valgrind SmingHub#2017

Breaking changes and Migration
-------------------------------------------
- See our [dedicated page](https://sming.readthedocs.io/en/latest/upgrading/4.0-4.1.html) for migration from 4.0.0 to 4.1.0.

All PRs scheduled for this release can be seen from [here](https://github.com/SmingHub/Sming/milestone/23)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants