Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Poco HTTP Server zero-copy again #19516

Merged
merged 18 commits into from
Feb 19, 2021
Merged

Conversation

abyss7
Copy link
Contributor

@abyss7 abyss7 commented Jan 23, 2021

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Partially reimplement HTTP server to make it making less copies of incoming and outgoing data. It gives up to 1.5 performance improvement on inserting long records over HTTP.

@robot-clickhouse robot-clickhouse added the pr-performance Pull request with some performance improvements label Jan 23, 2021
@abyss7 abyss7 changed the title [WIP] Make Poco HTTP Server zero-copy again Make Poco HTTP Server zero-copy again Feb 8, 2021
@abyss7 abyss7 marked this pull request as ready for review February 8, 2021 15:34
@abyss7
Copy link
Contributor Author

abyss7 commented Feb 11, 2021

Fuzzer (debug): #20046

@alexey-milovidov
Copy link
Member

alexey-milovidov commented Feb 19, 2021

Ok.

  1. Finally we can increase the 16KB limit on URL size. We can make it in next PR after merge.
  2. The code in HTTP directory is based on Poco. We should put README.md file for attribution.
  3. There can be dead code (unused methods) there (like sendFile). We can review and remove these methods.
  4. while (in.read(ch) && !Poco::Ascii::isSpace(ch) && method.size() <= MAX_METHOD_LENGTH) method += ch; - we can introduce "limited read until" in ReadHelpers.
  5. catch (Poco::Net::NoMessageException &) - const.
  6. void stopAll(bool abortCurrent = false); - variable name is out of style.
  7. wrapReadBufferReference maybe just return std::unique_ptr in request.getStream().
  8. Sending responses still involving ostream. Decided to address it in subsequent PR.
  9. in.peek(ch) - after introduction of this method we can search and replace all occurences of !in.eof() && *in.position() == ...

@abyss7 abyss7 merged commit 414f470 into ClickHouse:master Feb 19, 2021
traceon added a commit to traceon/ClickHouse that referenced this pull request Feb 19, 2021
* master: (160 commits)
  Make Poco HTTP Server zero-copy again (ClickHouse#19516)
  Fixed documentation
  ccache 4.2+ does not requires any quirks for SOURCE_DATE_EPOCH
  Add a function `htmlOrXmlCoarseParse` to extract content from html or xml format string. (ClickHouse#19600)
  Reinterpret function added Decimal, DateTim64 support
  Add test
  Update InterpreterSelectQuery.cpp
  Improved serialization for data types combined of Arrays and Tuples. Improved matching enum data types to protobuf enum type. Fixed serialization of the Map data type. Omitted values are now set by default.
  Log stdout and stderr when failed to start docker in integration tests.
  Added comment
  Don't backport base commit of branch in the same branch (ClickHouse#20628)
  Fix fasttest retry for failed tests
  Dictionary create source with functions crash fix
  Added error reinterpretation tests
  Update run.sh
  Updated documentation
  fix subquery with limit
  Rename untyped function reinterpretAs into reinterpret
  ignore data store files
  Support vhost
  ...
traceon added a commit to traceon/ClickHouse that referenced this pull request Feb 19, 2021
* master:
  Add gdb to fasttest image
  Make Poco HTTP Server zero-copy again (ClickHouse#19516)
  Use fixed version for aerospike
  Fixed documentation
  ccache 4.2+ does not requires any quirks for SOURCE_DATE_EPOCH
  Reinterpret function added Decimal, DateTim64 support
  test/stress: fix permissions for clickhouse directories
  test/stress: improve backtrace catching on server failures
  test/stress: use clickhouse builtin start/stop to run server from the same user
  Added error reinterpretation tests
  Updated documentation
  Rename untyped function reinterpretAs into reinterpret

# Conflicts:
#	src/Server/HTTPHandler.cpp
#	src/Server/HTTPHandler.h
traceon added a commit to traceon/ClickHouse that referenced this pull request Feb 19, 2021
* master: (153 commits)
  Add gdb to fasttest image
  Make Poco HTTP Server zero-copy again (ClickHouse#19516)
  Use fixed version for aerospike
  Fixed documentation
  ccache 4.2+ does not requires any quirks for SOURCE_DATE_EPOCH
  Add a function `htmlOrXmlCoarseParse` to extract content from html or xml format string. (ClickHouse#19600)
  Reinterpret function added Decimal, DateTim64 support
  test/stress: fix permissions for clickhouse directories
  test/stress: improve backtrace catching on server failures
  test/stress: use clickhouse builtin start/stop to run server from the same user
  Add test
  Update InterpreterSelectQuery.cpp
  Improved serialization for data types combined of Arrays and Tuples. Improved matching enum data types to protobuf enum type. Fixed serialization of the Map data type. Omitted values are now set by default.
  Log stdout and stderr when failed to start docker in integration tests.
  Added comment
  Don't backport base commit of branch in the same branch (ClickHouse#20628)
  Fix fasttest retry for failed tests
  Dictionary create source with functions crash fix
  Added error reinterpretation tests
  Update run.sh
  ...
@filimonov
Copy link
Contributor

Why creating copy not changing the poco itself? (we already have a list of own patches to poco).

Not easy to see the diff with the current poco version.

Is there some chances to push that into the poco upstream?

@alexey-milovidov
Copy link
Member

alexey-milovidov commented Feb 20, 2021

No chance.

The whole purpose of this patch is to replace iostreams with Read/WriteBuffers.

@alexey-milovidov
Copy link
Member

Also we are looking into perspective to also rewrite TCPServer.

@alexey-milovidov
Copy link
Member

@filimonov See #8441 (comment)

@abyss7
Copy link
Contributor Author

abyss7 commented Feb 20, 2021

Why creating copy not changing the poco itself? (we already have a list of own patches to poco).

Not easy to see the diff with the current poco version.

Is there some chances to push that into the poco upstream?

At lease now we can remove our unused patches from Poco - for simpler upstream update in the future.

@filimonov
Copy link
Contributor

filimonov commented Feb 25, 2021

Cool. Thanks, now it's clear. Makes 100% sence.

@abyss7 abyss7 mentioned this pull request Mar 29, 2021
alexey-milovidov added a commit that referenced this pull request Apr 2, 2021
azat added a commit to azat/ClickHouse that referenced this pull request Apr 25, 2024
Previously HTTP server does not tries to send exception if something had
been written to the client already, but after rewriting to "zero-copy"
HTTP implementation (back in ClickHouse#19516) this got broken, and you can get
pretty "dangerous" at least at first glance errors, like:
- Malformed chunked encoding
- or most likely to CHECKSUM_DOESNT_MATCH error

This is not crucial because it will never pass checksum checks, but
still icky. So let's fix it.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
azat added a commit to azat/ClickHouse that referenced this pull request Apr 28, 2024
Previously HTTP server does not tries to send exception if something had
been written to the client already, but after rewriting to "zero-copy"
HTTP implementation (back in ClickHouse#19516) this got broken, and you can get
pretty "dangerous" at least at first glance errors, like:
- Malformed chunked encoding
- or most likely to CHECKSUM_DOESNT_MATCH error

This is not crucial because it will never pass checksum checks, but
still icky. So let's fix it.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-performance Pull request with some performance improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants