
HTTP client fixes #2163

Merged
merged 11 commits into develop from fix/http-close-handling on Dec 4, 2020

Conversation

mikee47
Contributor

@mikee47 mikee47 commented Dec 1, 2020

  • Improvements to pipelining.
  • Improvements to retrying the current request (as is the case for URLs protected with digest authentication).
  • Fixed POST requests sending files and/or form data.
  • Small improvements to the handling of failed writes in TcpConnection.

@slaff
Contributor

slaff commented Dec 2, 2020

@mikee47 I added one more commit to this PR which addresses possible problems caused by pipelining: 3a248f9. Can you take a look at it and test it on your Windows or Linux VM?

@slaff slaff added this to the 4.2.0 milestone Dec 2, 2020
@mikee47
Contributor Author

mikee47 commented Dec 2, 2020

@slaff Looks fine, works well! Thank you.

@slaff slaff changed the title Fix handling of closed HTTP connection HTTP client fixes Dec 2, 2020
@@ -60,6 +60,7 @@ class MultipartStream : public MultiStream
Producer producer;
BodyPart bodyPart;
char boundary[16] = {0};
bool finished = false;
Contributor Author

This shadows MultiStream::finished...
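For context, this is the classic C++ member-shadowing pitfall the review comment is pointing at. A minimal sketch with hypothetical `StreamBase`/`StreamDerived` names (not the actual MultiStream classes): a derived-class data member hides the base-class member of the same name, so code working through the base class sees a different value than the one the derived class sets.

```cpp
#include <cassert>

// Hypothetical reduction of the shadowing issue; names are illustrative,
// not the real Sming classes.
struct StreamBase {
    bool finished = false;
};

struct StreamDerived : StreamBase {
    bool finished = false;              // shadows StreamBase::finished
    void markDone() { finished = true; } // updates only the derived copy
};

// Base-class code reads StreamBase::finished, which markDone() never touches.
inline bool baseSeesFinished(const StreamBase& s)
{
    return s.finished;
}
```

Renaming the derived member (as the PR later does with `finished` -> `footerSent`) removes the ambiguity entirely.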

@slaff
Contributor

slaff commented Dec 3, 2020

@mikee47 While trying to make the HttpClient work again I have noticed that the LimitedMemoryStream was broken in this PR #2119.

The change to isFinished(), shown below,

	bool isFinished() override
	{
		return (readPos >= capacity);
	}

causes an endless loop when trying to copy the data:

size_t ReadWriteStream::copyFrom(IDataSourceStream* source, size_t size)
{
	if(source == nullptr || size == 0) {
		return 0;
	}

	auto bufSize = std::min(size, maxBufferSize);
	char buffer[bufSize];
	size_t total = 0;
	size_t count;
	while(!source->isFinished()) {
		count = source->readMemoryBlock(buffer, sizeof(buffer));
		if(count == 0) {
			continue;
		}
		auto written = write(reinterpret_cast<uint8_t*>(buffer), count);
		source->seek(written);
		total += count;
		if(written != count) {
			break;
		}
	}

	return total;
}

called from here:

int onDownload(HttpConnection& connection, bool success)
{
	debugf("\n=========[ URL: %s ]============", connection.getRequest()->uri.toString().c_str());
	debugf("RemoteIP: %s", connection.getRemoteIp().toString().c_str());
	debugf("Got response code: %d", connection.getResponse()->code);
	debugf("Success: %d", success);
	if(connection.getRequest()->method != HTTP_HEAD) {
		Serial.print(_F("Got content: "));
		auto stream = connection.getResponse()->stream;
		if(stream == nullptr || stream->available() == 0) {
			Serial.println("EMPTY!");
		} else {
			Serial.copyFrom(stream);

IMHO the isFinished method here should be just:

	bool isFinished() override
	{
		return (available() == 0);
	}

@mikee47 Can you create a separate PR with a fix?
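To make the failure mode above concrete, here is a hedged reduction (hypothetical `MiniStream`, not the real LimitedMemoryStream) showing both predicates side by side. With `readPos >= capacity`, a partially filled stream never reports finished even after all written data has been consumed, so `copyFrom()` keeps looping on `readMemoryBlock()` returning 0; the `available() == 0` form terminates correctly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Illustrative model of a fixed-capacity memory stream that was only
// partially filled with data.
struct MiniStream {
    static constexpr size_t capacity = 64; // allocated buffer size
    size_t length = 16;                    // bytes actually written
    size_t readPos = 0;

    size_t available() const { return length - readPos; }

    // Predicate as broken in PR #2119: compares against the buffer
    // capacity, so it stays false once the written data runs out.
    bool isFinishedBroken() const { return readPos >= capacity; }

    // Proposed fix: finished when no readable data remains.
    bool isFinishedFixed() const { return available() == 0; }

    // Consume up to n bytes, returning how many were actually read.
    size_t read(size_t n)
    {
        size_t count = std::min(n, available());
        readPos += count;
        return count;
    }
};
```

After draining the 16 written bytes, the broken predicate still reports "not finished" while reads return 0 — exactly the spin described in the `copyFrom()` loop above.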

@@ -315,6 +315,7 @@ void HttpClientConnection::onClosed()
System.queueCallback([this]() {
cleanup();
init(HTTP_RESPONSE);
state = eHCS_Ready;
Contributor Author

Should this be done in init() ?

Contributor

Should this be done in init() ?

Yes, indeed. I will add this correction in a second...

@slaff
Contributor

slaff commented Dec 3, 2020

@mikee47 There is a problem with SSL and sending data in TcpConnection. I will also address this, but it will be part of a separate PR.

mikee47 and others added 6 commits December 3, 2020 13:37
There appear to be a couple of things at play here:

1. Need to move items off execution back onto waiting queue - cleanup() does this
2. Don't restart immediately, but defer using the task queue - lets the TCP stack sort itself out first
…s that don't support keep-alive connections.

Initial conditions
------------------
3 consecutive GET requests + 1 POST request go to the same HTTP server.

Before this change
------------------
The HTTP client starts using HTTP pipelining (https://en.wikipedia.org/wiki/HTTP_pipelining) without making sure that the HTTP server supports it.
It will try to send the first 3 HTTP requests in one go. If the HTTP server does not support pipelining, it will either return a bad request response or a normal response
with a Connection: close header. The first request will be successful, but then the connection will be closed and the remaining two requests will be moved back to the waiting queue.
They will end up AFTER the POST request, which is a **problem**.

Once the POST request has completed, the two GET requests will be moved back and forth between the waiting and execution queues, which is another **problem**.

After this change
-----------------
The first GET request will be sent but the others will wait in the waiting queue.
Once the request has completed the Connection response header will be analyzed. 

a) If the header is Connection: close, then the connection will be closed and there will be no need to move anything from the execution queue back to the waiting queue,
as there will be 0 requests in execution and the rest will be in the waiting queue.

b) If there is no Connection: close header in the response, the HTTP client will assume that the HTTP server supports pipelining and will allow pipelining consecutive requests
in the same HTTP client connection. This way the remaining 2 GET requests will be pipelined and the POST request will be executed at the right time.
…in wrong multipart boundary headers and endless loop.

This commit removes the introduced problems.
Add BodyPart::bool() operator as it provides an opportunity to clarify how it's used
Rename `finished` -> `footerSent`
Retrying the current request will keep its order and not end up at the end of the execution queue.
Fixes a problem with requests that go to URLs protected with digest authentication (this requires the same request to be sent twice - first without authentication headers to get the tokens, and a second time with authentication and calculated checksums).
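The before/after pipelining behaviour described in the commit messages above boils down to a single check on the first response: only pipeline further requests if the server did not ask to close the connection. A minimal sketch under that assumption (hypothetical `allowPipelining` helper, not Sming's actual API):

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hedged sketch of the decision rule: after the first response on a
// connection, consecutive requests may be pipelined only when the
// server did not send "Connection: close". Header values are
// case-insensitive, so normalise before comparing.
inline bool allowPipelining(std::string connectionHeader)
{
    std::transform(connectionHeader.begin(), connectionHeader.end(),
                   connectionHeader.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return connectionHeader != "close";
}
```

With this rule, the first GET goes out alone; if the response carries Connection: close, nothing is ever queued behind it on that connection, so there is nothing to shuffle back to the waiting queue.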
@slaff
Contributor

slaff commented Dec 3, 2020

@mikee47 Can you compile the latest code from this PR and test if you still have the problems described in #2158?

@mikee47
Contributor Author

mikee47 commented Dec 3, 2020

@slaff I got the failed requests again, so re-tested with the previous working commit and still failing! So I increased the scanner fetchInterval from 250 to 500 (ms) and it started working again! However, behaviour with/without the task queue is the same; I re-checked in the debugger and the stack usage is minimal so looks like that isn't necessary and commit 8bb1312 is fine.

I then revisited the issue with using the task queue in ControlPoint - https://github.com/mikee47/Sming-UPnP/blob/3e66a17cbd9c9287485a4a2b177a79a0744d2f64/src/ControlPoint.cpp#L211 onwards. The issue persists, and is definitely related to this one. I need to improve the debug output so requests can be correlated with responses more easily; then perhaps the logs will shed more light on exactly what's happening...

@mikee47
Contributor Author

mikee47 commented Dec 3, 2020

So at present we can retry sending headers but the body stream also needs to be recovered. I started attempting to fix that by restoring and resetting the stream (see commit for StreamTransformer::seekFrom()) but of course by that time it's been destroyed by the call to freeStreams() in TcpClient::pushAsyncPart().

Is there some way we can pre-empt all this and avoid starting a new request if the connection is due to be closed?
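The retry problem described above can be illustrated with a hedged sketch (hypothetical `BodyStream`/`canRetry` names, not Sming's TcpClient internals): resending a request is only possible if the body stream is still alive and can be rewound to the start; once `freeStreams()` has destroyed it, there is nothing left to seek.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative model of a request body stream that may have been
// partially consumed by a failed send, or destroyed entirely.
struct BodyStream {
    size_t pos = 0;        // current read position
    size_t size = 100;     // total body length
    bool destroyed = false; // set once the connection frees its streams

    // Attempt to reposition; fails if the stream is gone or the
    // target position is out of range.
    bool seekTo(size_t newPos)
    {
        if (destroyed || newPos > size) {
            return false;
        }
        pos = newPos;
        return true;
    }
};

// A retry is only viable when the body can be rewound to the beginning.
inline bool canRetry(BodyStream& body)
{
    return body.seekTo(0);
}
```

This is why the comment asks about pre-empting the situation: deciding *before* sending that the connection will close avoids ever needing to rewind a stream that may already be gone.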

@slaff slaff force-pushed the fix/http-close-handling branch 2 times, most recently from 36bb982 to 198b488 Compare December 4, 2020 08:48
@slaff
Contributor

slaff commented Dec 4, 2020

@mikee47 Once Travis is ready I will merge this PR into develop. It does not solve your problem, but at least it has some useful fixes. Any idea how I can reproduce the problem that you're experiencing on my machine? That would help me step through with the debugger and correlate network traffic with a live Wireshark capture.

@mikee47
Contributor Author

mikee47 commented Dec 4, 2020

@slaff I got the same issue running the Basic_ControlPoint sample on an ESP8266. I'll re-check to confirm and get back to you.

@slaff slaff merged commit da225ba into SmingHub:develop Dec 4, 2020
@slaff
Contributor

slaff commented Dec 4, 2020

I'll re-check to confirm and get back to you.

Get the latest develop and rebase on it. Compile your sample using the following commands (adjust accordingly):

cd <sample-dir>
make config-clean
make dist-clean
make SMING_ARCH=Host ENABLE_GDB=1 DEBUG_VERBOSE_LEVEL=3 

@slaff
Contributor

slaff commented Dec 4, 2020

There is a problem with SSL and sending data in the TcpConnection. I will address also this but it will be part of a separate PR.

See #2173 .

@mikee47 mikee47 deleted the fix/http-close-handling branch December 4, 2020 15:00
@mikee47
Contributor Author

mikee47 commented Dec 4, 2020

@slaff I've rebuilt and re-tested as follows:

cd %SMING_HOME%
make dist-clean
make fetch UPnP
cd Libraries\UPnP\samples\Basic_ControlPoint
make -j20 SMING_ARCH=Host ENABLE_GDB=1 DEBUG_VERBOSE_LEVEL=3
make run

The attached log initial-run shows the results; all looks fine.

I then checked out the fix/http-debugging branch of UPnP d7ccee6e6f07b7742ae661bcd7a16978eda32e38:

make -j20 Sming-build all
make run

The results are in controlpoint-task-queue, and it should be visible that the response to the first action request gets lost in the HTTP connection closing business.

When a device has been discovered and created by ControlPoint, the client callback creates action requests and submits them.
The client should be able to successfully send action requests at any time, so deferring the callback is a good test for this.
I've considered delaying the callback for a fixed time (e.g. 2 seconds) to allow any connection closing to complete - that's more of a workaround, though, and would just mask the problem.

debug_logs.zip

The action requests are for ContentDirectory services on my Archer VR900 router - it has a USB stick with audio files on it. See https://www.tp-link.com/uk/support/download/archer-vr900/v1/. Here are the relevant DD fields:

        <deviceType>urn:schemas-upnp-org:device:MediaServer:1</deviceType>
        <manufacturer>Justin Maggard</manufacturer>
        <manufacturerURL>http://www.tp-link.com/</manufacturerURL>
        <modelDescription>MiniDLNA on Linux</modelDescription>
        <modelName>Windows Media Connect compatible (MiniDLNA)</modelName>
        <X_DLNADOC xmlns:dlna="urn:schemas-dlna-org:device-1-0">DMS-1.50</X_DLNADOC>

You could probably run MiniDLNA on another device to check this - it seems very unlikely that the router itself is the problem, but I'll see if I can find an alternative device to test against.
