
Similarity Groups

Liam DeVoe edited this page Jun 28, 2021 · 13 revisions

When you enable the Similarity investigation, a Sim Group dropdown is added to each loadable. These are similarity groups, or sim groups for short. Sim groups offer finer-grained control over exactly which replays are compared to which other replays, and can significantly reduce the time circleguard takes to compare replays in larger batches (more than 100 replays). We explain what a sim group is and how to use it below.

Basic Investigation

Here's how an investigation might go without using (or rather, without modifying) sim groups. Say you suspect mathi of stealing his replay on https://osu.ppy.sh/b/221777. To check this, we would add two loadables (one for the map, and one for mathi's replay) and investigate for similarity.

This gives the desired result of comparing mathi's replay against each of the top 50 replays on the map, but it also does something more. What it's really doing is chucking all 50 replays on the map, plus mathi's replay, into the same "group" of replays, and then comparing every pair of replays in that group. So mathi's replay is compared against each of the 50 replays on the map, but every replay on the map is also compared with every other replay on the map.

So, assuming you don't think anybody in the top 50 of the map has stolen a replay from anybody else in the top 50, this is doing more work than it needs to. If you know a bit of math / notation, a group of n replays needs n choose 2 = n(n-1)/2 comparisons, which grows as O(n^2). That means for n=51 replays, like we have above (50 for the map + mathi's 1 replay), we do 51 choose 2 = 1275 comparisons. And if there were 100 replays, we would do 4950 comparisons, nearly four times as many.
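The comparison count above can be checked with a short Python sketch. This is not circleguard's code: the replay labels are hypothetical strings, and the sketch only enumerates the pairs that a single group produces, using `itertools.combinations` for the pairs and `math.comb` for the closed-form count.

```python
from itertools import combinations
from math import comb


def single_group_pairs(replays):
    """Every unordered pair of replays in one group (the default: everything in Sim Group 1)."""
    return list(combinations(replays, 2))


# Hypothetical replay labels: 50 replays from the map plus mathi's replay.
map_replays = [f"map_replay_{i}" for i in range(50)]
group_1 = map_replays + ["mathi_replay"]

pairs = single_group_pairs(group_1)
print(len(pairs))             # 1275
print(comb(len(group_1), 2))  # 1275, i.e. 51 * 50 / 2
print(comb(100, 2))           # 4950 for 100 replays
```

Enumerating the pairs and using the closed form agree, which is the point: with one group, the work grows with the square of the number of replays.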

Circleguard's similarity calculation is fast, but it's not that fast. At ten thousand comparisons (about 142 replays in a single group), you're going to be waiting about 30 seconds for the comparison to finish. And because the number of comparisons grows quadratically, this gets worse very quickly as you add more replays.

The solution to this is sim groups.

Sim Group Investigation

Let's take the same example as above: two loadables, one a Map of https://osu.ppy.sh/b/221777 and the other a Map Replay of mathi on that same map. If you want to compare only mathi's replay to each of the replays on the map, and nothing more, all you need to do is put them into two separate groups. So switch either the Map or the Map Replay to Sim Group 2, while keeping the other at Sim Group 1.

Now circleguard will only pair up replays from different groups. Since all the replays on the map are in the same group, none of them will be compared to each other. And since they're all in a different group than mathi's replay, each of them will be compared to mathi's replay, just like we wanted.
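A minimal Python sketch of this pairing rule, assuming a hypothetical `sim_group_pairs` helper that takes a dict from sim group number to a list of replay labels. It is not circleguard's implementation; it only illustrates "pair replays from different groups, never within a group". For more than two groups, it assumes every pair of distinct groups is crossed, which follows from the rule as stated but is an assumption on our part.

```python
from itertools import combinations, product


def sim_group_pairs(groups):
    """Pair replays only across different sim groups, never within a group.

    `groups` maps a sim group number to the replays in that group
    (hypothetical structure, for illustration only).
    """
    pairs = []
    # Visit each pair of distinct groups once, then pair every replay in
    # the first group with every replay in the second.
    for g1, g2 in combinations(sorted(groups), 2):
        pairs.extend(product(groups[g1], groups[g2]))
    return pairs


# Hypothetical labels: the map's 50 replays in group 1, mathi's replay in group 2.
map_replays = [f"map_replay_{i}" for i in range(50)]
groups = {1: map_replays, 2: ["mathi_replay"]}

pairs = sim_group_pairs(groups)
print(len(pairs))  # 50
print(pairs[0])    # ('map_replay_0', 'mathi_replay')
```

Every pair contains mathi's replay, and no two map replays are ever paired with each other.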

If the first group has n replays and the second group has m replays, we make n * m comparisons. In the example above, n=50 and m=1, so we make only 50 comparisons instead of 1275. The improvement is especially noticeable once you compare more than 100 or so replays in total.
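The savings can be written out exactly. Putting n + m replays in one group gives the n choose 2 within-group pairs of the first set, the m choose 2 within-group pairs of the second set, and the n * m cross-group pairs. Sim groups keep only the last term:

```latex
\begin{aligned}
\binom{n+m}{2} &= \binom{n}{2} + \binom{m}{2} + n \cdot m \\
\binom{51}{2} &= \binom{50}{2} + \binom{1}{2} + 50 \cdot 1 = 1225 + 0 + 50 = 1275
\end{aligned}
```

So for the example above, splitting into two sim groups skips the 1225 comparisons between replays on the map and keeps the 50 that involve mathi's replay.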

When To Use

If you're comparing fewer than 50 replays at a time, I wouldn't bother with sim groups; just keep the default of everything being in Sim Group 1. Circleguard's similarity calculations are fast enough that changing the sim groups saves you barely any time with so few replays.

However, if you're comparing more than 100 replays at a time, I would recommend learning and using sim groups, which can significantly decrease the time circleguard takes to calculate similarity.