Immudb instance doesn't load data from S3 storage after reboot (randomly, discarding snapshots and other data) #1856

Closed
MaksymVynohradovDA opened this issue Nov 3, 2023 · 5 comments
Labels
bug Something isn't working

@MaksymVynohradovDA

What happened
We host Immudb as an AWS Fargate task (in effect, a Docker container) and use an S3 bucket as back-end storage.
When the container/task is restarted, Immudb doesn't load the existing data from storage; the database simply comes up empty. At the same time, it appears to keep updating the storage.

What you expected to happen
After a reload, the Immudb instance should load its data from S3 storage.

How to reproduce it (as minimally and precisely as possible)

  1. Start an Immudb instance that is connected to AWS S3 storage.
  2. Create a table and write some data to it, then wait until the data has been persisted to S3.
  3. Reload the Immudb instance.
  4. Check whether the data was loaded from S3.

Environment

immudb: v1.5.0 (git RC1)
webconsole: v1.0.18 (git ebf53ef)

Additional info (any other context about the problem)

MaksymVynohradovDA added the "bug" label on Nov 3, 2023
MaksymVynohradovDA changed the title from "Immudb instance doesn't load data from S3 storage after reboot" to "Immudb instance doesn't load data from S3 storage after reboot (randomly)" on Nov 3, 2023
@MaksymVynohradovDA
Author

Update: in the logs we found the following:


immudb 2023/11/03 12:15:47 INFO: Index '/var/lib/immudb/defaultdb/index' {ts=0, discarded_snapshots=1} successfully loaded
immudb 2023/11/03 12:15:47 INFO: Discarding snapshots due to invalid checksum at '/var/lib/immudb/defaultdb/index'
...
immudb 2023/11/03 12:15:48 INFO: tx data is corrupted: ALH mismatch at tx 14356323323871488: discarding pre-committed transaction: 1

There are many other mentions of "discarded" data as well.
Then, after some time, immudb starts with an empty db:

immudb 2023/11/03 12:15:47 INFO: Started with an empty default database

So why might this happen, and how can it be fixed?
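
To spot this situation quickly after a restart, the startup log can be scanned for the messages quoted above. The following is only a minimal sketch: the function names (`scan_log`, `looks_like_data_loss`) are hypothetical, the patterns are copied from the log lines in this comment, and other immudb versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns copied from the log lines quoted in this issue (assumption:
# other immudb versions may phrase these messages differently).
PATTERNS = {
    "discarded_snapshots": re.compile(r"Discarding snapshots due to invalid checksum"),
    "alh_mismatch": re.compile(r"ALH mismatch at tx (\d+)"),
    "empty_start": re.compile(r"Started with an empty default database"),
}


def scan_log(lines):
    """Count how many log lines match each known warning pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def looks_like_data_loss(counts):
    """Heuristic: discards or mismatches, followed by an empty start."""
    saw_discard = counts["discarded_snapshots"] > 0 or counts["alh_mismatch"] > 0
    return saw_discard and counts["empty_start"] > 0


# Sample input: the exact lines from the log excerpt in this comment.
SAMPLE_LOG = [
    "immudb 2023/11/03 12:15:47 INFO: Index '/var/lib/immudb/defaultdb/index' {ts=0, discarded_snapshots=1} successfully loaded",
    "immudb 2023/11/03 12:15:47 INFO: Discarding snapshots due to invalid checksum at '/var/lib/immudb/defaultdb/index'",
    "immudb 2023/11/03 12:15:48 INFO: tx data is corrupted: ALH mismatch at tx 14356323323871488: discarding pre-committed transaction: 1",
    "immudb 2023/11/03 12:15:47 INFO: Started with an empty default database",
]

counts = scan_log(SAMPLE_LOG)
print(dict(counts), looks_like_data_loss(counts))
```

A check like this could run against the task's log stream right after startup, so that an empty start is flagged before clients begin writing to the fresh database.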

MaksymVynohradovDA changed the title from "Immudb instance doesn't load data from S3 storage after reboot (randomly)" to "Immudb instance doesn't load data from S3 storage after reboot (randomly, discarding snapshots and other data)" on Nov 3, 2023
@jeroiraz
Contributor

jeroiraz commented Nov 3, 2023

We'll review it asap.

Some data seems to have been loaded, hence the mismatch described in the logs. Transactions that were not fully committed may be discarded, since the client shouldn't have received any confirmation for them.

@MaksymVynohradovDA
Author

@jeroiraz

Hi! Thanks a lot!
It happens when the immudb container crashes or is reloaded for some reason (for example, automatically by AWS Fargate). My assumption is that the files on S3 are not updated all at once but one by one, or in batches. In that case immudb can't finish the upload before the reload, so the checksum becomes invalid... but that's just my assumption =)
Anyway, reloading a service is quite a common operation.

@MaksymVynohradovDA
Author

Hi! We investigated the issue. Steps to reproduce:

  • Run ImmuDB on AWS Fargate, with S3 as storage.
  • Reach the memory limit through a backup restore and/or many simultaneous queries to the DB.
  • The Fargate task is moved to the DRAINED status and then restarted.
  • The new Fargate task starts with the errors described in this issue and then logs "Started with an empty default database".

The root cause has two parts. On the one hand, ImmuDB still keeps some recent data files (such as tx files or others) on the Docker container's file system (in our case, on the Fargate side). On the other hand, unlike docker or docker-compose, AWS Fargate creates a completely new instance of the task. The old data (volume) simply vanishes, with no way to restore it.

So it looks like it is impossible to run ImmuDB on AWS Fargate and be sure that data will not be lost after a Fargate task crashes.

@jeroiraz
Contributor

Currently, immudb uploads data to S3 asynchronously, so, as you describe, data may be lost in such cases. Replication could be used to mitigate these scenarios, but master election is not implemented, so it may require manual intervention or external tooling to determine the best instance. @SimoneLazzaris you may be able to expand on this.

Another possibility is to implement a synchronous operation mode when using S3, but a noticeable performance degradation would be expected. Nevertheless, it seems a nice capability for use cases where degraded performance is still acceptable.
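
To make the trade-off concrete, here is a toy model of the two modes. It is only a sketch and not how immudb is implemented: the class `ToyStore` and its methods are hypothetical, a plain list stands in for the S3 bucket, and "crash" means the task is replaced and its local disk is gone, as described for Fargate above.

```python
class ToyStore:
    """Toy model: ephemeral local disk plus a durable remote store.

    Hypothetical illustration only; this is not immudb's storage layer.
    """

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.local = []    # lost when the task is replaced
        self.remote = []   # survives restarts (stands in for S3)
        self.pending = []  # written locally but not yet uploaded

    def commit(self, record):
        """Write a record and acknowledge it to the caller."""
        self.local.append(record)
        if self.synchronous:
            # Synchronous mode: upload before acknowledging (slower).
            self.remote.append(record)
        else:
            # Asynchronous mode: acknowledge now, upload later.
            self.pending.append(record)
        return True

    def flush(self, max_items=None):
        """Background upload step: push up to max_items pending records."""
        n = len(self.pending) if max_items is None else max_items
        batch, self.pending = self.pending[:n], self.pending[n:]
        self.remote.extend(batch)

    def crash_and_restart(self):
        """Replace the task: only remote data remains. Returns lost records."""
        lost = list(self.pending)
        self.local = list(self.remote)
        self.pending = []
        return lost


async_store = ToyStore(synchronous=False)
sync_store = ToyStore(synchronous=True)
for i in range(5):
    async_store.commit(i)
    sync_store.commit(i)

async_store.flush(max_items=2)  # the uploader only got through two records
print("async lost:", async_store.crash_and_restart())
print("sync lost:", sync_store.crash_and_restart())
```

In the asynchronous store, every record that was acknowledged but not yet flushed disappears with the task, while the synchronous store pays an upload on every commit and loses nothing in this model.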

@ostafen ostafen closed this as completed Apr 15, 2024