Query CAR database according to epoch #3555

Open
2 of 3 tasks
lemmih opened this issue Oct 9, 2023 · 0 comments
Assignees
Labels
Performance Priority: 4 - Low Ready Issue is ready for work and anyone can freely assign it to themselves


lemmih commented Oct 9, 2023

Issue summary

There are two kinds of data stores in Forest: read-only CAR files and writable ParityDB databases. Since every value is uniquely determined by its key, it does not matter for correctness which data store we query first. It does matter for performance, though, and we want to query as few data stores as possible.

Currently, we query the CAR data stores first, in the order they were added. If a key isn't present in any of them, we then query the ParityDB database. However, in regular operation, we're significantly more likely to query new data than old data, so querying the data stores with the latest data first is a better option. The ParityDB database contains current data and is therefore the newest, so it should be queried first. The CAR data stores should then be queried from highest to lowest epoch.

  • Query ParityDB first.
  • Sort CAR data stores by the epoch of the heaviest tipset.
  • (Optional) Trim unusable CAR data stores if we know we'll never access their data. This happens if we're evaluating tipset 1000, and we have CAR stores for epochs 0-1500 and 1500-3000. The second CAR store will not be used for evaluating this tipset and can be removed from the list.
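
A minimal sketch of the proposed lookup order, to make the three points above concrete. This is not Forest's actual code: `CarStore`, `LayeredStore`, `lowest_epoch`, `heaviest_epoch`, and `cars_for_epoch` are hypothetical names, and in-memory `HashMap`s stand in for both ParityDB and the CAR files. The trimming rule assumes a CAR store whose lowest epoch lies above the tipset being evaluated holds nothing that evaluation needs.

```rust
use std::cmp::Reverse;
use std::collections::HashMap;

/// Hypothetical stand-in for a read-only CAR store.
pub struct CarStore {
    /// Lowest epoch covered by this store (assumed to be known).
    pub lowest_epoch: i64,
    /// Epoch of the heaviest tipset in this store.
    pub heaviest_epoch: i64,
    pub blocks: HashMap<String, Vec<u8>>,
}

/// Hypothetical layered store: one writable database plus several CAR stores.
pub struct LayeredStore {
    /// Stands in for the ParityDB database.
    pub writable: HashMap<String, Vec<u8>>,
    /// CAR stores, kept sorted from highest to lowest heaviest-tipset epoch.
    pub cars: Vec<CarStore>,
}

impl LayeredStore {
    pub fn new(writable: HashMap<String, Vec<u8>>, mut cars: Vec<CarStore>) -> Self {
        // Newest data first: sort by heaviest-tipset epoch, descending.
        cars.sort_by_key(|car| Reverse(car.heaviest_epoch));
        Self { writable, cars }
    }

    /// Looks up `key` and also reports how many stores were probed.
    pub fn get(&self, key: &str) -> (Option<&Vec<u8>>, usize) {
        // Query the writable database first; it holds the most recent data.
        let mut probes = 1;
        if let Some(value) = self.writable.get(key) {
            return (Some(value), probes);
        }
        // Fall back to the CAR stores, newest first.
        for car in &self.cars {
            probes += 1;
            if let Some(value) = car.blocks.get(key) {
                return (Some(value), probes);
            }
        }
        (None, probes)
    }

    /// Optional trimming: keep only the CAR stores that start at or before `epoch`.
    pub fn cars_for_epoch(&self, epoch: i64) -> Vec<&CarStore> {
        self.cars
            .iter()
            .filter(|car| car.lowest_epoch <= epoch)
            .collect()
    }
}
```

With CAR stores covering epochs 0-1500 and 1500-3000, `cars_for_epoch(1000)` keeps only the first store, matching the example in the last bullet. A key held in the writable database is found after a single probe, while a key that only exists in the oldest CAR store still costs one probe per store.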

Other information and links

@lemmih lemmih added Priority: 3 - Medium Performance Ready Issue is ready for work and anyone can freely assign it to themselves labels Oct 9, 2023
@elmattic elmattic self-assigned this Feb 28, 2024