Skip to content

replace Set<Integer> with BitSet for HTTP statuses#1496

Merged
richardstartin merged 5 commits into
masterfrom
richardstartin/bitset-http-statuses
May 27, 2020
Merged

replace Set<Integer> with BitSet for HTTP statuses#1496
richardstartin merged 5 commits into
masterfrom
richardstartin/bitset-http-statuses

Conversation

@richardstartin
Copy link
Copy Markdown
Contributor

Replaces the HTTP statuses with a more compact data structure, because they're static final and surprisingly large:

java.util.HashSet@65ab7765d footprint:
     COUNT       AVG       SUM   DESCRIPTION
         1      1040      1040   [Ljava.util.HashMap$Node;
       100        16      1600   java.lang.Integer
         1        16        16   java.lang.Object
         1        48        48   java.util.HashMap
       100        32      3200   java.util.HashMap$Node
         1        16        16   java.util.HashSet
       204                5920   (total)

We have between 2 and 4 of the above. This means we could have overhead between 12 and 24KB (if there is an override with a lot of status codes) depending on what configuration overrides are in place.

Using a BitSet we have:

server errors:

java.util.BitSet@1b28cdfad footprint:
     COUNT       AVG       SUM   DESCRIPTION
         1        96        96   [J
         1        24        24   java.util.BitSet
         2                 120   (total)

and client errors

java.util.BitSet@1b28cdfad footprint:
     COUNT       AVG       SUM   DESCRIPTION
         1        80        80   [J
         1        24        24   java.util.BitSet
         2                 104   (total)

The worst case size overhead with BitSet is ~0.5KB.

This also means we can avoid wrapping status codes if we represent them with primitive types in the future.

@richardstartin richardstartin requested a review from a team as a code owner May 26, 2020 17:39
@richardstartin
Copy link
Copy Markdown
Contributor Author

richardstartin commented May 27, 2020

Here's a quick demonstration of why we might want to avoid boxing on things like this (none of the times are large enough to care about):

@State(Scope.Benchmark)
public class HashSetVsBitSet {

    BitSet bitSet;
    HashSet<Integer> hashSet;

    @Param({"0", "500"})
    int offset;

    int[] codes;

    @Setup(Level.Trial)
    public void init() {
        bitSet = new BitSet(offset + 100);
        bitSet.set(offset, offset + 100);
        hashSet = new HashSet<>();
        for (int i = offset; i < offset + 100; ++i) {
            hashSet.add(i);
        }
        codes = new int[] { offset, offset + 1, offset + 50, offset + 101};
    }

    @Benchmark
    @OperationsPerInvocation(4)
    public void hashSet(Blackhole bh) {
        for (int code : codes) {
            bh.consume(hashSet.contains(code));
        }
    }

    @Benchmark
    @OperationsPerInvocation(4)
    public void bitSet(Blackhole bh) {
        for (int code : codes) {
            bh.consume(bitSet.get(code));
        }
    }
}
Benchmark                                             (offset)  Mode  Cnt     Score     Error   Units
HashSetVsBitSet.bitSet                                       0  avgt    5     4.052 ±   0.193   ns/op
HashSetVsBitSet.bitSet:·gc.alloc.rate                        0  avgt    5    ≈ 10⁻⁴            MB/sec
HashSetVsBitSet.bitSet:·gc.alloc.rate.norm                   0  avgt    5    ≈ 10⁻⁶              B/op
HashSetVsBitSet.bitSet:·gc.count                             0  avgt    5       ≈ 0            counts
HashSetVsBitSet.bitSet                                     500  avgt    5     3.999 ±   0.211   ns/op
HashSetVsBitSet.bitSet:·gc.alloc.rate                      500  avgt    5    ≈ 10⁻⁴            MB/sec
HashSetVsBitSet.bitSet:·gc.alloc.rate.norm                 500  avgt    5    ≈ 10⁻⁶              B/op
HashSetVsBitSet.bitSet:·gc.count                           500  avgt    5       ≈ 0            counts
HashSetVsBitSet.hashSet                                      0  avgt    5     4.538 ±   0.425   ns/op
HashSetVsBitSet.hashSet:·gc.alloc.rate                       0  avgt    5    ≈ 10⁻⁴            MB/sec
HashSetVsBitSet.hashSet:·gc.alloc.rate.norm                  0  avgt    5    ≈ 10⁻⁶              B/op
HashSetVsBitSet.hashSet:·gc.count                            0  avgt    5       ≈ 0            counts
HashSetVsBitSet.hashSet                                    500  avgt    5     9.255 ±   1.451   ns/op
HashSetVsBitSet.hashSet:·gc.alloc.rate                     500  avgt    5  1099.837 ± 171.409  MB/sec
HashSetVsBitSet.hashSet:·gc.alloc.rate.norm                500  avgt    5    16.000 ±   0.001    B/op
HashSetVsBitSet.hashSet:·gc.churn.G1_Eden_Space            500  avgt    5  1092.047 ± 304.471  MB/sec
HashSetVsBitSet.hashSet:·gc.churn.G1_Eden_Space.norm       500  avgt    5    15.882 ±   3.069    B/op
HashSetVsBitSet.hashSet:·gc.churn.G1_Old_Gen               500  avgt    5     0.002 ±   0.004  MB/sec
HashSetVsBitSet.hashSet:·gc.churn.G1_Old_Gen.norm          500  avgt    5    ≈ 10⁻⁵              B/op
HashSetVsBitSet.hashSet:·gc.count                          500  avgt    5    31.000            counts
HashSetVsBitSet.hashSet:·gc.time                           500  avgt    5    16.000                ms

Since we're dealing with HTTP statuses, we're outside the default range of the Integer cache, so get the scenario at the bottom where the boxes are allocated. With a bitset and primitive HTTP status codes, there's no allocation at all.

Copy link
Copy Markdown
Contributor

@tylerbenson tylerbenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. One optional suggestion.

assert config.httpServerErrorStatuses == toBitSet(expected.toSet())
assert config.httpClientErrorStatuses == toBitSet(expected.toSet())
assert propConfig.httpServerErrorStatuses == toBitSet(expected.toSet())
assert propConfig.httpClientErrorStatuses == toBitSet(expected.toSet())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these don't need to be sets anymore, consider removing the toSet() and updating toBitSet

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@richardstartin richardstartin merged commit 0716397 into master May 27, 2020
@richardstartin richardstartin deleted the richardstartin/bitset-http-statuses branch May 27, 2020 17:01
@github-actions github-actions Bot added this to the 0.53.0 milestone May 27, 2020
iNikem pushed a commit to open-telemetry/opentelemetry-java-instrumentation that referenced this pull request Jun 2, 2020
* Add time in queue (DataDog/dd-trace-java#1481)

* Minor upgrades (DataDog/dd-trace-java#1495)

* Allow user to disable kafka time in queue tag (DataDog/dd-trace-java#1487)

* Replace Set<Integer> with BitSet for HTTP statuses (DataDog/dd-trace-java#1496)

* Register WeakMapProvider earlier in AgentInstaller (DataDog/dd-trace-java#1480)

* Update codenarc (DataDog/dd-trace-java#1500)

Co-authored-by: Tyler Benson <tyler.benson@datadoghq.com>
Co-authored-by: Nikolay Martynov <mar.kolya@gmail.com>
Co-authored-by: Richard Startin <richard.startin@datadoghq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants