url: find scheme with a "perfect hash" #12347
I tried using A considerably simpler alternative would be to simply sort the table and use |
I think the gperf approach is worse than this PR. Slower and more complicated. Tweaking the hash when we get more entries is not likely to be a problem.
The current version (without this PR) is sorted based on (assumed) protocol popularity, on the basis that the scheme use is not actually random but URLs are more likely to use
It can be that simple, but it will get even better if you try tweaking the hash function. I actually just did and managed to shrink the table a little more... 😁
I've now improved schemetable.c so that it can search for the optimal config for its hash algorithm. It runs over a range of different initial and shift values to see which combo produces the smallest output array. That helped me shave a few more entries off the table, down to 67. It also means that when we add a scheme or two to the table, we can just rerun the program and it finds a new optimal combo by itself.
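To illustrate the kind of search described above, here is a minimal sketch. It is not the code from schemetable.c: the hash function, the names (`hash_with`, `is_perfect`, `find_config`, `struct hash_config`) and the searched ranges are all assumptions made for the example. It tries table sizes in ascending order and, for each size, every initial/shift pair in a small range, so the first collision-free hit is the smallest table within those ranges.

```c
#include <stddef.h>

#define MAX_TABLE 128 /* hypothetical upper bound on the table size */

/* Hypothetical result type: the winning parameters and table size. */
struct hash_config {
  unsigned int init;
  unsigned int shift;
  size_t size;
};

/* Assumed hash: seed with 'init', then shift-and-add every byte. */
static size_t hash_with(const char *s, unsigned int init,
                        unsigned int shift, size_t size)
{
  size_t h = init;
  while(*s)
    h = (h << shift) + (unsigned char)*s++;
  return h % size;
}

/* Return 1 if no two names land in the same slot, otherwise 0. */
static int is_perfect(const char *const *names, size_t n,
                      unsigned int init, unsigned int shift, size_t size)
{
  unsigned char used[MAX_TABLE] = {0};
  size_t i;
  for(i = 0; i < n; i++) {
    size_t slot = hash_with(names[i], init, shift, size);
    if(used[slot])
      return 0; /* collision */
    used[slot] = 1;
  }
  return 1;
}

/* Smallest size first, so the first hit is the smallest table found
   within the searched ranges. Returns 1 on success, 0 if none fits. */
static int find_config(const char *const *names, size_t n,
                       struct hash_config *out)
{
  size_t size;
  unsigned int init, shift;
  if(!n)
    return 0;
  for(size = n; size <= MAX_TABLE; size++)
    for(init = 0; init < 256; init++)
      for(shift = 1; shift < 8; shift++)
        if(is_perfect(names, n, init, shift, size)) {
          out->init = init;
          out->shift = shift;
          out->size = size;
          return 1;
        }
  return 0;
}
```

Calling `find_config()` on an array of scheme names fills in the parameters a generator could then use to emit the table. Rerunning it after adding a name is all it takes to get a new combo, which is the property the comment above points out.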
This tool generates a scheme-matching table.
Instead of a loop to scan over the potentially 30+ scheme names, use a "perfect hash" table. This works fine because the set of schemes is known and cannot change in a build. The hash algorithm and table size are chosen so that each scheme hashes to its own table entry, with no two schemes sharing one. The perfect hash is generated by a separate tool (scheme2num.c) that needs to be provided as well if we decide to go with this route.
Instead of a loop to scan over the potentially 30+ scheme names, use a "perfect hash" table. This works fine because the set of schemes is known and cannot change in a build. The hash algorithm and table size are chosen so that each scheme hashes to its own table entry, with no two schemes sharing one.
The perfect hash is generated by a separate tool (schemetable.c) that is provided as well.
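As a rough sketch of what a lookup against such a table can look like (this is not the code from the PR): the five-entry scheme list, the hash function, the table size of 7 and the names `scheme_table`, `scheme_hash` and `find_scheme` are all assumptions chosen for the example, and the slots were worked out by hand for this particular hash. The point is that one hash computation picks a single candidate, and one comparison confirms or rejects it.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 7 /* hypothetical; fits the five names below */

/* Hypothetical table: each name sits in the slot scheme_hash() gives
   it. Unused slots are NULL. */
static const char *const scheme_table[TABLE_SIZE] = {
  "https", /* slot 0 */
  "dict",  /* slot 1 */
  "http",  /* slot 2 */
  "ftp",   /* slot 3 */
  NULL,
  NULL,
  "file"   /* slot 6 */
};

/* Assumed hash: shift-and-add over the lowercased bytes. */
static size_t scheme_hash(const char *s, size_t len)
{
  size_t h = 0;
  size_t i;
  for(i = 0; i < len; i++)
    h = (h << 1) + (size_t)tolower((unsigned char)s[i]);
  return h % TABLE_SIZE;
}

/* Return the matching table entry, or NULL for an unknown scheme.
   The hash only selects a candidate, so the name is still compared
   (case-insensitively) before it is accepted. */
static const char *find_scheme(const char *s, size_t len)
{
  const char *cand = scheme_table[scheme_hash(s, len)];
  size_t i;
  if(!cand || strlen(cand) != len)
    return NULL;
  for(i = 0; i < len; i++)
    if(tolower((unsigned char)s[i]) != cand[i])
      return NULL;
  return cand;
}
```

With this table, `find_scheme("HTTP", 4)` returns the "http" entry and `find_scheme("gopher", 6)` returns NULL. The final comparison cannot be skipped, because an unknown name can hash to an occupied slot.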