-
Notifications
You must be signed in to change notification settings - Fork 988
[draft] Replace void** with opaque hashtablePosition #2148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[draft] Replace void** with opaque hashtablePosition #2148
Conversation
Signed-off-by: Rain Valentine <rsg000@gmail.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## unstable #2148 +/- ##
============================================
+ Coverage 71.19% 71.46% +0.26%
============================================
Files 122 122
Lines 66046 66214 +168
============================================
+ Hits 47020 47317 +297
+ Misses 19026 18897 -129
🚀 New features to boost your workflow:
|
p->table_index = table_index; | ||
} | ||
|
||
return b != NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this works. But it drives me nuts. Regardless of implementation details, booleans should not be treated as integers.
Here you have a boolean expression... and are assigning it to be returned in an integer value. Why not make the return value a bool? (We allow that now.)
If you really want an integer, make the conversion explicit, like this:
return (b != NULL) ? 1 : 0;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do allow bools too, you could just make it a bool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a bunch of hashtable functions that return 1 or 0. This was done before we started to accept bool. We can change all of them to bool.
I'm fine with just doing old-fashion C style too though. Apparently, Brian Kernighan and Dennis Ritchie didn't think bool was necessary. All the logic operators and conditional constructs are defined on integer values. For example, !=
returns 0 or 1, if
checks for non-zero or zero, etc. C is not really type safe anyway. You can crash in many ways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bool makes it more immediately obvious how to interpret the result and makes for cleaner code, so I switched hashtable over to bool. (I was on call and needed a low-context easy-to-pause task to do between other work 😅)
* incremental rehashing or modifies the table. This method does not pause | ||
* rehashing. */ | ||
int hashtableFindPosition(hashtable *ht, const void *key, hashtablePosition *pos) { | ||
position *p = positionFromOpaque(pos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you really want to ensure safety, you should:
- Add an "epoch" counter on the table. Just an unsigned integer that increments on every modification. It doesn't matter if it wraps around.
- Record the current epoch in the hashtablePosition when it is created.
- Validate that the epoch hasn't changed (assert) when the position is used.
Making types more opaque can improve things, but we have many problems that are worse than this IMO. All the functions in networking.c and server.c and their dependency on global state is possibly more error-prone than this stuff. Refactoring some of that and making it possible to unit-test these parts of the system could possible add more value. The gigantic server.h is a dependency monster. Whenever a unit includes it, we have a cross-dependency on the entire server. We could move prototypes and type definitions from server.h to new header files that would correspond to a C file, for example move prototypes related to db.c to a new header db.h, t_string.h for t_string.c, and so on. |
Signed-off-by: Rain Valentine <rsg000@gmail.com>
Well, I did some benchmarks and it's not what I was hoping for. Is it from adding epoch math and checks? 🤔 Moreover, can it be fixed or does the change need to be dropped?
|
@@ -319,8 +320,10 @@ static_assert(sizeof(hashtableIterator) >= sizeof(iter), | |||
/* Position, used by some hashtable functions such as two-phase insert and delete. */ | |||
typedef struct { | |||
bucket *bucket; | |||
uint16_t pos_in_bucket; | |||
uint16_t table_index; | |||
uint32_t *current_epoch; /* Pointer to the epoch of the hashtable. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's interesting that the position
doesn't include a reference to the hashtable
. So there's no mechanism to validate that a given position is actually associated with a given hashtable.
So, I see what you're doing here, keeping a reference to the epoch so that you can compare. But wouldn't it be more useful to keep a reference to the hashtable instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps, but do I need access to the hashtable? I don't see how it would be possible to use a position with a different hashtable, given that it intrinsically points to a specific hashtable's data and is effectively read-only by being opaque 🤔
This is still a draft - I'm currently running benchmarks to ensure there are no regressions.
This neatly avoids void** references to internal hashtable data, instead using hashtablePosition to replace elements when needed. The result is safer and more readable. (At present I've not changed the hashtableScanDefrag callback signature though.)