Investigate liblzf segfaults #312

Closed
danielealbano opened this issue Apr 9, 2023 · 2 comments
@danielealbano
Owner

When generating the RDB snapshots, if strings are compressed with liblzf using the LZF algorithm, cachegrand crashes with a segfault.

The crash needs to be investigated.

@danielealbano danielealbano self-assigned this Apr 9, 2023
danielealbano added a commit that referenced this issue Apr 9, 2023
This PR implements the necessary support to generate the RDB snapshots
in the background.

A new fiber has been introduced which takes care of generating a
snapshot, depending on the settings. A new set of parameters has been
introduced; an example with inline explanations follows.

```
  snapshots:
    # The path where the snapshot file will be stored, if the rotation is enabled, the path will be used as prefix and
    # the timestamp of the start of the snapshot will be appended to the file name.
    path: /var/lib/cachegrand/dump.rdb
    # The interval between the snapshots, the allowed units are s, m, h, if not specified the default is seconds.
    interval: 5m
    # The number of keys that must be changed before a snapshot is taken, 0 means no limit
    min_keys_changed: 1000
    # The amount of data that must be changed before a snapshot is taken, the allowed units are b, k, m, g, if not
    min_data_changed: 100mb
    # Rotation settings, optional, if missing the snapshots rotation will be disabled
    rotation:
      # The max number of snapshots files to keep, minimum 2
      max_files: 10
```

The example is from cachegrand.yaml.skel
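
To illustrate the `interval` setting described above, here is a minimal sketch of how a value such as `5m` could be converted to seconds. It is not cachegrand's actual configuration parser: the function name `parse_interval_seconds` and its return convention are hypothetical, and the only behaviour taken from the skeleton file is that the allowed units are s, m, h and that a missing unit means seconds.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not part of cachegrand): converts an interval string
 * such as "30", "45s", "5m" or "2h" into seconds.
 * Returns 0 on success, -1 if the value or the unit is not valid. */
int parse_interval_seconds(const char *text, uint64_t *seconds_out) {
    /* Require a leading digit so that signs and whitespace are rejected */
    if (text == NULL || !isdigit((unsigned char)text[0])) {
        return -1;
    }

    errno = 0;
    char *end = NULL;
    unsigned long long value = strtoull(text, &end, 10);
    if (errno != 0) {
        return -1;
    }

    uint64_t multiplier;
    switch (*end) {
        case '\0': multiplier = 1; break;          /* no unit: seconds */
        case 's':  multiplier = 1; end++; break;
        case 'm':  multiplier = 60; end++; break;
        case 'h':  multiplier = 3600; end++; break;
        default:   return -1;                      /* unknown unit */
    }

    /* Reject trailing characters after the unit and guard against overflow */
    if (*end != '\0' || value > UINT64_MAX / multiplier) {
        return -1;
    }

    *seconds_out = (uint64_t)value * multiplier;
    return 0;
}
```

With this sketch, `5m` from the example above maps to 300 seconds, while a value with an unsupported unit such as `5d` is rejected.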

The new mechanism reports a status update every 3 seconds.

Closes #293 

Notes:
- Currently the redis commands SAVE and BGSAVE are not implemented, so
it's not possible to trigger a backup on demand (#314)
- The SHUTDOWN command needs to be updated to support SAVE and NOSAVE
(#315)
- The shutdown logic needs to be updated to trigger a dump at
shutdown unless the SHUTDOWN NOSAVE command has been issued (#316)
- The current implementation also doesn't compress strings with the LZF
algorithm, as liblzf causes a segfault and the issue has to be
investigated (#312)
- Tests for the high level snapshotting process (implemented in
storage_db_snapshot.c mostly) are not included in this PR (#317)
@danielealbano
Owner Author

liblzf uses an obscene amount of stack to compress the data, about 1 MB, whereas cachegrand uses about 40 KB of stack for its own fibers, so the compression overflows the fiber stack.

There are two options:

  • replace liblzf with something that doesn't abuse the stack in that way
  • run the snapshotting in a special fiber with a different stack size, but this would prevent the normal fibers from snapshotting a value that is being replaced but is supposed to be snapshotted.

Further investigation is required
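
The idea behind the second option can be sketched with the `ucontext` API available in glibc: the stack-hungry work runs in its own context on a separately allocated, larger stack, and control returns to the caller when it finishes. This is only an illustration under stated assumptions, not cachegrand's fiber implementation: `stack_hungry_work` is a stand-in for the compressor, the 2 MB stack size is an arbitrary choice, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <ucontext.h>

/* Assumption: 2 MB is enough headroom for a callee needing about 1 MB */
#define LARGE_STACK_SIZE (2 * 1024 * 1024)
#define SCRATCH_SIZE (1024 * 1024)

static ucontext_t caller_context;
static ucontext_t worker_context;
static size_t worker_result;

/* Stand-in for the compressor: keeps about 1 MB of scratch data on the
 * stack, which would not fit in a 40 KB fiber stack */
static void stack_hungry_work(void) {
    volatile unsigned char scratch[SCRATCH_SIZE];
    size_t sum = 0;

    for (size_t i = 0; i < SCRATCH_SIZE; i++) {
        scratch[i] = (unsigned char)(i & 0xff);
    }
    for (size_t i = 0; i < SCRATCH_SIZE; i++) {
        sum += scratch[i];
    }

    worker_result = sum;
    /* Returning resumes caller_context through uc_link */
}

/* Runs stack_hungry_work on a dedicated large stack.
 * Returns 0 on success, -1 on failure. */
int run_on_large_stack(size_t *result_out) {
    void *stack = malloc(LARGE_STACK_SIZE);
    if (stack == NULL) {
        return -1;
    }

    if (getcontext(&worker_context) != 0) {
        free(stack);
        return -1;
    }

    worker_context.uc_stack.ss_sp = stack;
    worker_context.uc_stack.ss_size = LARGE_STACK_SIZE;
    worker_context.uc_link = &caller_context;
    makecontext(&worker_context, stack_hungry_work, 0);

    /* Switch to the worker; execution continues here once it returns */
    if (swapcontext(&caller_context, &worker_context) != 0) {
        free(stack);
        return -1;
    }

    *result_out = worker_result;
    free(stack);
    return 0;
}
```

The sketch only shows the stack switch itself. It does not address the drawback noted above, namely that a normal fiber replacing a value that still has to be snapshotted would need a way to hand that value over to the large-stack fiber.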

@danielealbano
Owner Author

Will implement the second option, as it's the one that makes the most sense.
