multi transfer book keeping using mid #16761

Closed
wants to merge 7 commits into from


Contributor

@icing icing commented Mar 18, 2025

Change multi's book keeping of transfers to no longer use lists, but a special table and bitsets for unsigned int values.

multi->xfers is the uint_tbl where multi_add_handle() inserts a new transfer which assigns it a unique identifier mid. Use bitsets to keep track of transfers that are in state "process" or "pending" or "msgsent".

Use sparse bitsets to replace conn->easyq and event handling's tracking of transfers per socket.

Provide base data structures and document them in docs/internal:

  • uint_tbl: a table of transfers with mid as lookup key, handing out a mid for adds between 0 - capacity.
  • uint_bset: a bitset keeping unsigned ints from 0 - capacity.
  • uint_spbset: a sparse bitset for keeping a small number of unsigned int values
  • uint_hash: for associating mids with a pointer.
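
To make the idea of a sparse bitset concrete, here is a minimal sketch that stores members in a sorted linked list of 64-bit chunks, so memory is only spent near values that are actually present. All names (`sp_bset`, `sp_chunk`, `sp_add`, ...) and the chunk layout are assumptions for illustration; this is not the `uint_spbset` API or layout from this PR.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sparse bitset: a sorted linked list of 64-bit chunks.
 * Not curl's uint_spbset layout. */
struct sp_chunk {
  struct sp_chunk *next;
  unsigned int base;     /* first value covered, a multiple of 64 */
  uint64_t bits;         /* bit i set means (base + i) is a member */
};

struct sp_bset {
  struct sp_chunk *head; /* chunks sorted by base; NULL when empty */
};

/* Add v, allocating a chunk if none covers it. Returns 0 on out-of-memory. */
static int sp_add(struct sp_bset *s, unsigned int v)
{
  unsigned int base = v & ~63u;
  struct sp_chunk **pp = &s->head;
  while(*pp && (*pp)->base < base)
    pp = &(*pp)->next;
  if(!*pp || (*pp)->base != base) {
    struct sp_chunk *c = calloc(1, sizeof(*c));
    if(!c)
      return 0;
    c->base = base;
    c->next = *pp;
    *pp = c;
  }
  (*pp)->bits |= (uint64_t)1 << (v % 64);
  return 1;
}

static int sp_contains(const struct sp_bset *s, unsigned int v)
{
  const struct sp_chunk *c;
  for(c = s->head; c && c->base <= v; c = c->next) {
    if(v - c->base < 64)
      return (int)((c->bits >> (v - c->base)) & 1);
  }
  return 0;
}

/* Remove v and release the chunk once it holds no members. */
static void sp_remove(struct sp_bset *s, unsigned int v)
{
  unsigned int base = v & ~63u;
  struct sp_chunk **pp = &s->head;
  while(*pp && (*pp)->base < base)
    pp = &(*pp)->next;
  if(*pp && (*pp)->base == base) {
    struct sp_chunk *c = *pp;
    c->bits &= ~((uint64_t)1 << (v % 64));
    if(!c->bits) {
      *pp = c->next;
      free(c);
    }
  }
}

static void sp_clear(struct sp_bset *s)
{
  while(s->head) {
    struct sp_chunk *c = s->head;
    s->head = c->next;
    free(c);
  }
}
```

In this sketch a set holding a handful of mids costs one small chunk per populated 64-value range, regardless of how large the mids are.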

This makes the mid the recommended way to refer to transfers inside the same multi without risk of running into a UAF.
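
The reason a mid is safer than a pointer can be sketched with a small table: a lookup by mid goes through the table, so a removed transfer resolves to NULL instead of dangling memory. The names below (`mid_tbl`, `mid_tbl_add`, ...) are hypothetical and the slot search is an assumed strategy; only the use of `UINT_MAX` as the invalid mid comes from this PR's commits.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not curl's uint_tbl API. */
#define MID_INVALID UINT_MAX

struct mid_tbl {
  void **rows;         /* rows[mid] is the entry, or NULL when free */
  unsigned int nrows;  /* capacity: valid mids are 0 .. nrows-1 */
  unsigned int last;   /* row used by the previous add */
};

static int mid_tbl_init(struct mid_tbl *t, unsigned int nrows)
{
  assert(nrows > 0 && nrows <= UINT_MAX / 2);
  t->rows = calloc(nrows, sizeof(void *));
  t->nrows = t->rows ? nrows : 0;
  t->last = nrows - 1;  /* so that the first add lands on row 0 */
  return t->rows != NULL;
}

/* Store entry in a free row and return its mid, or MID_INVALID if full. */
static unsigned int mid_tbl_add(struct mid_tbl *t, void *entry)
{
  unsigned int i;
  assert(entry != NULL);
  for(i = 1; i <= t->nrows; ++i) {
    unsigned int mid = (t->last + i) % t->nrows;
    if(!t->rows[mid]) {
      t->rows[mid] = entry;
      t->last = mid;
      return mid;
    }
  }
  return MID_INVALID;
}

/* A stale or out-of-range mid yields NULL, never a dangling pointer. */
static void *mid_tbl_get(const struct mid_tbl *t, unsigned int mid)
{
  return (mid < t->nrows) ? t->rows[mid] : NULL;
}

static void mid_tbl_remove(struct mid_tbl *t, unsigned int mid)
{
  if(mid < t->nrows)
    t->rows[mid] = NULL;
}

static void mid_tbl_free(struct mid_tbl *t)
{
  free(t->rows);
  t->rows = NULL;
  t->nrows = 0;
}
```

Note that in this sketch a freed row can be handed out again later, so a stale mid is memory-safe to look up but may eventually resolve to a different entry.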

Modifying table and bitsets is safe while iterating over them. Overall memory requirements are lower than with the doubly linked list approach.
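
One way such iteration can stay safe under modification is to keep only the last value seen as iteration state, rather than a pointer into the container. The sketch below assumes a fixed-capacity bitset with hypothetical names (`bset`, `bset_first`, `bset_next`, ...); it is not the `uint_bset` API from this PR, and it scans bit by bit for clarity rather than speed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size bitset for values 0..BSET_CAPACITY-1;
 * not curl's uint_bset API. */
#define BSET_CAPACITY 256u
#define BSET_SLOTS (BSET_CAPACITY / 64u)

struct bset {
  uint64_t slots[BSET_SLOTS];
};

static void bset_clear(struct bset *s)
{
  memset(s, 0, sizeof(*s));
}

static void bset_add(struct bset *s, unsigned int v)
{
  assert(v < BSET_CAPACITY);
  s->slots[v / 64] |= (uint64_t)1 << (v % 64);
}

static void bset_remove(struct bset *s, unsigned int v)
{
  if(v < BSET_CAPACITY)
    s->slots[v / 64] &= ~((uint64_t)1 << (v % 64));
}

static int bset_contains(const struct bset *s, unsigned int v)
{
  return v < BSET_CAPACITY && ((s->slots[v / 64] >> (v % 64)) & 1);
}

/* Find the smallest member >= from. Returns 1 and sets *out, or 0. */
static int bset_find_from(const struct bset *s, unsigned int from,
                          unsigned int *out)
{
  unsigned int v;
  for(v = from; v < BSET_CAPACITY; ++v) {
    if(bset_contains(s, v)) {
      *out = v;
      return 1;
    }
  }
  return 0;
}

static int bset_first(const struct bset *s, unsigned int *out)
{
  return bset_find_from(s, 0, out);
}

/* The only iteration state is the last value seen, so the set may be
 * modified freely between calls. */
static int bset_next(const struct bset *s, unsigned int last,
                     unsigned int *out)
{
  return (last < BSET_CAPACITY - 1) && bset_find_from(s, last + 1, out);
}

/* Remove every even member while iterating; returns members visited. */
static unsigned int bset_drop_even(struct bset *s)
{
  unsigned int v, visited = 0;
  if(bset_first(s, &v)) {
    do {
      ++visited;
      if(v % 2 == 0)
        bset_remove(s, v);  /* safe: the loop holds a value, not a pointer */
    } while(bset_next(s, v, &v));
  }
  return visited;
}
```

With a linked list, removing the current node invalidates the cursor unless the next pointer was saved first; here there is no cursor to invalidate.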

memory

To keep track of 1000 transfers, the dominating factor before was the 2 struct Curl_llist_node in each Curl_easy. Each node occupies 32 bytes, so 64 KB for 1000 easy handles.

With this PR, we have one table and 3 bitsets, which are kept 25% free. The table uses 8 bytes per possible entry, so ~10 KB for 1300 rows of capacity. A bitset for numbers 0-1299 occupies 168 bytes, which makes ~540 bytes for all three. Plus a "sparse" bitset per connection. For the case that each transfer has its own connection, that adds 32 bytes for each one.

| # easy | llist | table + bitsets |
|--------|--------|-----------------|
| 100 | 6.4 KB | ~1.4 KB |
| 1000 | 64 KB | ~14 KB |
| 10000 | 640 KB | ~140 KB |

*) the usual rounding errors with 1000 ~= 1024 apply.😌
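
The arithmetic behind the shared structures can be reproduced with a few helper functions. This is a rough sketch using the per-item sizes quoted above; the function names are hypothetical, the bitset size assumes 64-bit slots, and the per-connection sparse bitsets are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes quoted in the description above; names are hypothetical. */
#define LLIST_NODE_BYTES     32u  /* one struct Curl_llist_node */
#define LLIST_NODES_PER_EASY 2u
#define TBL_ROW_BYTES        8u   /* one table row */

/* Old scheme: two list nodes embedded in each easy handle. */
static size_t llist_bytes(size_t transfers)
{
  return transfers * LLIST_NODES_PER_EASY * LLIST_NODE_BYTES;
}

/* One bitset for values 0..capacity-1, assuming 64-bit slots. */
static size_t bitset_bytes(size_t capacity)
{
  return ((capacity + 63) / 64) * 8;
}

/* New scheme, shared part only: one table plus three bitsets. */
static size_t table_and_bitsets_bytes(size_t capacity)
{
  return capacity * TBL_ROW_BYTES + 3 * bitset_bytes(capacity);
}
```

For a capacity of 1300 this gives 10400 bytes for the table and 168 bytes per bitset, against 64000 bytes of list nodes for 1000 transfers.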

@icing icing added the feature-window A merge of this requires an open feature window label Mar 18, 2025
@github-actions github-actions bot added tests libcurl API CI Continuous Integration labels Mar 18, 2025
@curl curl deleted a comment from testclutch Mar 19, 2025
@icing icing requested a review from bagder March 19, 2025 14:46
@icing icing force-pushed the mid-uint-sets branch 4 times, most recently from 70480f8 to a5035ae Compare March 25, 2025 08:47
icing added 3 commits April 17, 2025 11:37
        Change multi's book keeping of transfers to no longer use lists,
        but a special table and bitsets for unsigned int values.

        `multi->xfers` is the `uint_tbl` where `multi_add_handle()` inserts
        a new transfer which assigns it a unique identifier `mid`. Use
        bitsets to keep track of transfers that are in state "process"
        or "pending" or "msgsent".

        Use sparse bitsets to replace `conn->easyq` and event handling's
        tracking of transfers per socket. Instead of pointers, keep the
        mids involved.

        Provide base data structures and document them in docs/internal:
        * `uint_tbl`: a table of transfers with `mid` as lookup key,
           handing out a mid for adds between 0 - capacity.
        * `uint_bset`: a bitset keeping unsigned ints from 0 - capacity.
        * `uint_spbset`: a sparse bitset for keeping a small number of
          unsigned int values
        * `uint_hash`: for associating `mid`s with a pointer.

        This makes the `mid` the recommended way to refer to transfers
        inside the same multi without risk of running into a UAF.

        Modifying table and bitsets is safe while iterating over them.
        Overall memory requirements are lower than with the doubly linked
        list approach.
Use UINT_MAX everywhere as invalid mid number, replacing extra define
Add more tracing of multi_add/remove/cleanup with mid and counters
Member

@bagder bagder left a comment


Looks great, I only spotted very minor nits...

@icing icing requested a review from bagder April 17, 2025 13:48
@bagder bagder closed this in 909af1a Apr 17, 2025
nbaws pushed a commit to nbaws/curl that referenced this pull request Apr 26, 2025
Change multi's book keeping of transfers to no longer use lists, but a
special table and bitsets for unsigned int values.

`multi->xfers` is the `uint_tbl` where `multi_add_handle()` inserts a new
transfer which assigns it a unique identifier `mid`. Use bitsets to keep
track of transfers that are in state "process" or "pending" or
"msgsent".

Use sparse bitsets to replace `conn->easyq` and event handling's tracking
of transfers per socket. Instead of pointers, keep the mids involved.

Provide base data structures and document them in docs/internal:
* `uint_tbl`: a table of transfers with `mid` as lookup key,
   handing out a mid for adds between 0 - capacity.
* `uint_bset`: a bitset keeping unsigned ints from 0 - capacity.
* `uint_spbset`: a sparse bitset for keeping a small number of
  unsigned int values
* `uint_hash`: for associating `mid`s with a pointer.

This makes the `mid` the recommended way to refer to transfers inside
the same multi without risk of running into a UAF.

Modifying table and bitsets is safe while iterating over them. Overall
memory requirements are lower than with the doubly linked list approach.

Closes curl#16761
Labels
CI Continuous Integration feature-window A merge of this requires an open feature window libcurl API tests

2 participants