Removing a counterproductive optimization. #375

lemire · 2020-04-06T15:58:31Z

This is related to PR

lemire · 2020-04-06T16:02:01Z

@richardstartin : can you have a look? This PR will undo one of your optimizations.

I think it is counterproductive to check the cardinality first as it forces two traversal through the bitmap.

richardstartin · 2020-04-06T16:07:28Z

Makes sense, actually I just ported this logic from CRoaring in the first place. Might be worth running the fuzz tests before removing it.

coveralls · 2020-04-06T16:13:30Z

Coverage increased (+0.03%) to 91.538% when pulling e9f79ec on dlemire/issue374 into a03f716 on master.

lemire · 2020-04-06T16:15:23Z

@richardstartin I think you need to click on "approve" (as I formally asked you to approve this PR).

@incaseoftrouble can you check whether this fixes your issue? I suspect that it will.

lemire · 2020-04-06T16:17:16Z

@richardstartin The code in C does not have this optimization...

bool roaring_bitmap_is_subset(const roaring_bitmap_t *ra1,
                              const roaring_bitmap_t *ra2) {
    const int length1 = ra1->high_low_container.size,
              length2 = ra2->high_low_container.size;

    int pos1 = 0, pos2 = 0;

    while (pos1 < length1 && pos2 < length2) {
        const uint16_t s1 = ra_get_key_at_index(&ra1->high_low_container, pos1);
        const uint16_t s2 = ra_get_key_at_index(&ra2->high_low_container, pos2);

        if (s1 == s2) {
            uint8_t container_type_1, container_type_2;
            void *c1 = ra_get_container_at_index(&ra1->high_low_container, pos1,
                                                 &container_type_1);
            void *c2 = ra_get_container_at_index(&ra2->high_low_container, pos2,
                                                 &container_type_2);
            bool subset =
                container_is_subset(c1, container_type_1, c2, container_type_2);
            if (!subset) return false;
            ++pos1;
            ++pos2;
        } else if (s1 < s2) {  // s1 < s2
            return false;
        } else {  // s1 > s2
            pos2 = ra_advance_until(&ra2->high_low_container, s1, pos2);
        }
    }
    if (pos1 == length1)
        return true;
    else
        return false;
}

lemire · 2020-04-06T16:19:06Z

Got around the bureaucratic check.

lemire · 2020-04-06T16:19:19Z

Merging. Will release soon.

richardstartin · 2020-04-06T16:28:16Z

@lemire you're right, but I definitely don't remember adding this as an optimisation, possibly for correctness or just absent-mindedness. I'll run the some of the fuzz tests on this to check.

incaseoftrouble · 2020-04-06T16:32:59Z

@lemire Thanks for pushing this so fast! 👍

richardstartin · 2020-04-06T16:42:42Z

@lemire this was added in 470500087b6ac3462718a88854beb823bc6c078a and was papering over cracks later fixed in cfafb50644d195ca7e22db4f0ff141caa9852c24, not an optimisation, so was absolutely safe to remove. The relevant fuzz tests didn't find any problems.

lemire · 2020-04-06T17:31:51Z

@richardstartin Thank you.

Note the benchmark that I added... this optimization definitively made things much, much slower (like hundreds of times) in some cases. So I have to thank @incaseoftrouble.

richardstartin · 2020-04-06T17:36:44Z

Once again, it wasn’t an optimisation 👍

lemire · 2020-04-06T17:53:07Z

it wasn’t an optimisation

Ok. Fair enough. But it looked like one when I stared at the code.

BTW it is not obvious why it would be bad. You have to sit down and think hard.

richardstartin · 2020-04-06T18:02:32Z

In general whenever the bitmaps are passed over twice it costs (could also improve xorCardinality by calculating the cardinality containerwise rather than bitmap wise to avoid multiple bitmap passes) when the bitmaps are large... on this occasion it wasn’t an unprofiled optimisation obtained by a process of divination, but a hack.

richardstartin · 2020-04-06T18:04:49Z

Also, this is a problem whenever you don’t have access to modify the source code... any boolean circuit which could be calculated in a single pass over the containers can end up passing over bitmaps several times, and it has a cost.

lemire · 2020-04-07T13:51:56Z

Also, this is a problem whenever you don’t have access to modify the source code... any boolean circuit which could be calculated in a single pass over the containers can end up passing over bitmaps several times, and it has a cost.

Right. There might be clever techniques to get around this without forcing people to hack the internals.

Removing a counterproductive optimization.

e9f79ec

lemire requested a review from richardstartin April 6, 2020 16:01

lemire mentioned this pull request Apr 6, 2020

Add cached cardinality #374

Closed

lemire removed the request for review from richardstartin April 6, 2020 16:18

lemire merged commit 8e76afd into master Apr 6, 2020

Smallhi pushed a commit to Smallhi/RoaringBitmap that referenced this pull request Jun 14, 2021

Removing a counterproductive optimization. (RoaringBitmap#375)

205fb94

lemire deleted the dlemire/issue374 branch June 14, 2021 13:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Removing a counterproductive optimization. #375

Removing a counterproductive optimization. #375

lemire commented Apr 6, 2020

lemire commented Apr 6, 2020

richardstartin commented Apr 6, 2020

coveralls commented Apr 6, 2020

lemire commented Apr 6, 2020

lemire commented Apr 6, 2020

lemire commented Apr 6, 2020

lemire commented Apr 6, 2020

richardstartin commented Apr 6, 2020

incaseoftrouble commented Apr 6, 2020

richardstartin commented Apr 6, 2020

lemire commented Apr 6, 2020

richardstartin commented Apr 6, 2020

lemire commented Apr 6, 2020

richardstartin commented Apr 6, 2020

richardstartin commented Apr 6, 2020

lemire commented Apr 7, 2020

Removing a counterproductive optimization. #375

Removing a counterproductive optimization. #375

Conversation

lemire commented Apr 6, 2020

lemire commented Apr 6, 2020

richardstartin commented Apr 6, 2020

coveralls commented Apr 6, 2020

lemire commented Apr 6, 2020

lemire commented Apr 6, 2020

lemire commented Apr 6, 2020

lemire commented Apr 6, 2020

richardstartin commented Apr 6, 2020

incaseoftrouble commented Apr 6, 2020

richardstartin commented Apr 6, 2020

lemire commented Apr 6, 2020

richardstartin commented Apr 6, 2020

lemire commented Apr 6, 2020

richardstartin commented Apr 6, 2020

richardstartin commented Apr 6, 2020

lemire commented Apr 7, 2020