Skip to content

Conversation

SoftlyRaining
Copy link
Contributor

This is still a draft - I'm currently running benchmarks to ensure there are no regressions.

This neatly avoids void** references to internal hashtable data, instead using hashtablePosition to replace elements when needed. The result is safer and more readable. (At present I've not changed the hashtableScanDefrag callback signature though.)

Signed-off-by: Rain Valentine <rsg000@gmail.com>
Copy link

codecov bot commented May 28, 2025

Codecov Report

❌ Patch coverage is 93.33333% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.46%. Comparing base (36b51c9) to head (d27ffcb).
⚠️ Report is 233 commits behind head on unstable.

Files with missing lines Patch % Lines
src/hashtable.c 91.01% 8 Missing ⚠️
src/kvstore.c 92.59% 2 Missing ⚠️
src/defrag.c 50.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2148      +/-   ##
============================================
+ Coverage     71.19%   71.46%   +0.26%     
============================================
  Files           122      122              
  Lines         66046    66214     +168     
============================================
+ Hits          47020    47317     +297     
+ Misses        19026    18897     -129     
Files with missing lines Coverage Δ
src/db.c 90.08% <100.00%> (+0.09%) ⬆️
src/evict.c 98.47% <100.00%> (ø)
src/pubsub.c 96.79% <100.00%> (ø)
src/server.c 87.99% <100.00%> (-0.02%) ⬇️
src/server.h 100.00% <ø> (ø)
src/t_hash.c 96.19% <100.00%> (ø)
src/t_set.c 97.84% <100.00%> (+0.22%) ⬆️
src/t_zset.c 96.89% <100.00%> (+0.04%) ⬆️
src/defrag.c 81.53% <50.00%> (-1.10%) ⬇️
src/kvstore.c 95.03% <92.59%> (ø)
... and 1 more

... and 24 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

p->table_index = table_index;
}

return b != NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this works. But it drives me nuts. Regardless of implementation details, booleans should not be treated as integers.

Here you have a boolean expression... and are assigning it to be returned in an integer value. Why not make the return value a bool? (We allow that now.)

If you really want an integer, make the conversion explicit, like this:

return (b != NULL) ? 1 : 0;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do allow bools too, you could just make it a bool.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a bunch of hashtable functions that return 1 or 0. This was done before we started to accept bool. We can change all of them to bool.

I'm fine with just doing old-fashion C style too though. Apparently, Brian Kernighan and Dennis Ritchie didn't think bool was necessary. All the logic operators and conditional constructs are defined on integer values. For example, != returns 0 or 1, if checks for non-zero or zero, etc. C is not really type safe anyway. You can crash in many ways.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bool makes it more immediately obvious how to interpret the result and makes for cleaner code, so I switched hashtable over to bool. (I was on call and needed a low-context easy-to-pause task to do between other work 😅)

* incremental rehashing or modifies the table. This method does not pause
* rehashing. */
int hashtableFindPosition(hashtable *ht, const void *key, hashtablePosition *pos) {
position *p = positionFromOpaque(pos);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you really want to ensure safety, you should:

  • Add an "epoch" counter on the table. Just an unsigned integer that increments on every modification. It doesn't matter if it wraps around.
  • Record the current epoch in the hashtablePosition when it is created.
  • Validate that the epoch hasn't changed (assert) when the position is used.

@zuiderkwast
Copy link
Contributor

zuiderkwast commented Jun 2, 2025

Making types more opaque can improve things, but we have many problems that are worse than this IMO. All the functions in networking.c and server.c and their dependency on global state is possibly more error-prone than this stuff. Refactoring some of that and making it possible to unit-test these parts of the system could possible add more value.

The gigantic server.h is a dependency monster. Whenever a unit includes it, we have a cross-dependency on the entire server. We could move prototypes and type definitions from server.h to new header files that would correspond to a C file, for example move prototypes related to db.c to a new header db.h, t_string.h for t_string.c, and so on.

Signed-off-by: Rain Valentine <rsg000@gmail.com>
@SoftlyRaining
Copy link
Contributor Author

Well, I did some benchmarks and it's not what I was hoping for. Is it from adding epoch math and checks? 🤔 Moreover, can it be fixed or does the change need to be dropped?

  unstable this change % change
set 383230.5 367985.5719 -4%
get 702641.4 689760.9386 -2%

@@ -319,8 +320,10 @@ static_assert(sizeof(hashtableIterator) >= sizeof(iter),
/* Position, used by some hashtable functions such as two-phase insert and delete. */
typedef struct {
bucket *bucket;
uint16_t pos_in_bucket;
uint16_t table_index;
uint32_t *current_epoch; /* Pointer to the epoch of the hashtable. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's interesting that the position doesn't include a reference to the hashtable. So there's no mechanism to validate that a given position is actually associated with a given hashtable.

So, I see what you're doing here, keeping a reference to the epoch so that you can compare. But wouldn't it be more useful to keep a reference to the hashtable instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, but do I need access to the hashtable? I don't see how it would be possible to use a position with a different hashtable, given that it intrinsically points to a specific hashtable's data and is effectively read-only by being opaque 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants