Zero downtime upgrades, merge-based index construction #42

vlofgren · 2023-08-24T16:58:36Z

This makes modifications to the loader so that the live production environment doesn't need to be taken offline to prepare a new index.

A big change is keeping the URL database in a separate SQLite database instead of MariaDB. This removes the need to take the system offline during loading.

It also moves the index construction bits out of the index-server and into a separate process, which makes it possible to construct the index with a different version of the logic than the one the index service is running. A very neat side-effect of this is that you get a sort of dehydrated index you can back up and restore to roll back a problematic release with minimal downtime.

The pull request also deprecates the lexicon service altogether, and almost completely rewrites index-construction to use a merge-based approach that does not require as much RAM.

It also reduces the RAM requirements for the index service by a lot, since it no longer needs a lexicon. This makes the index faster because it can run with a sub-32 GB heap, which lets the JVM use compressed object pointers (CompressedOops). The index service also no longer needs to load the lexicon on start-up, enabling it to restart instantaneously.

Deprecate the LoadUrl instruction entirely. We no longer need to be told upfront about which URLs to expect, as IDs are generated from the domain id and document ordinal. For now, we no longer store new URLs in different domains. We need to re-implement this somehow, probably in a different job or as a different output.
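As a rough illustration of generating an ID from a domain id and a document ordinal, here is a minimal sketch. The class name `UrlIdCodecSketch`, the method names, and the 26-bit ordinal width are all assumptions made for the example; this is not the actual `UrlIdCodec` or its real bit layout.

```java
/**
 * Hypothetical sketch of packing (domain id, document ordinal) into one long.
 * The bit layout here is an assumption chosen for illustration only.
 */
public final class UrlIdCodecSketch {
    // Assumed width of the ordinal field; the real codec may differ.
    private static final int ORDINAL_BITS = 26;
    private static final long ORDINAL_MASK = (1L << ORDINAL_BITS) - 1;

    private UrlIdCodecSketch() {}

    /** Pack the domain id into the high bits and the ordinal into the low bits. */
    public static long encodeId(int domainId, int documentOrdinal) {
        if (domainId < 0) {
            throw new IllegalArgumentException("domainId must be non-negative");
        }
        if (documentOrdinal < 0 || documentOrdinal > ORDINAL_MASK) {
            throw new IllegalArgumentException("documentOrdinal out of range");
        }
        return ((long) domainId << ORDINAL_BITS) | documentOrdinal;
    }

    /** Recover the domain id from an encoded id. */
    public static int getDomainId(long id) {
        return (int) (id >>> ORDINAL_BITS);
    }

    /** Recover the document ordinal from an encoded id. */
    public static int getDocumentOrdinal(long id) {
        return (int) (id & ORDINAL_MASK);
    }
}
```

With a layout like this, the ID is a pure function of the two inputs, so nothing has to be registered in advance, and all documents of one domain fall in a contiguous ID range.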

This provides a much cleaner separation of concerns, and makes it possible to get rid of a lot of the gunkier parts of the index service. It will also permit lowering the Xmx on the index service a fair bit, so we can get CompressedOops again :D

This is a system-wide change. The index used to have a lexicon, mapping words to wordIds using a large in-memory hash table. This made index-construction easier, but it also added a fairly significant RAM penalty to both the index service and the loader. The new design moves to 64-bit word identifiers calculated using the murmur hash of the keyword, and an index construction based on merging smaller indices. It also became necessary half-way through to upgrade Guice as its error reporting wasn't *quite* compatible with JDK 20.
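To make the merging idea concrete, here is a minimal in-memory sketch. It assumes each small index is a sorted map from a 64-bit word id to a sorted, duplicate-free array of document ids. The class `IndexMergeSketch` and its methods are hypothetical, and the real construction works on index files rather than Java maps.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of merge-based index construction; not the project's code. */
public final class IndexMergeSketch {
    private IndexMergeSketch() {}

    /** Merge two sorted, duplicate-free document id arrays into one sorted, duplicate-free array. */
    public static long[] mergeDocIds(long[] a, long[] b) {
        long[] out = new long[a.length + b.length];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                out[n++] = a[i++];
            } else if (a[i] > b[j]) {
                out[n++] = b[j++];
            } else {
                // Same document in both inputs: emit it once.
                out[n++] = a[i++];
                j++;
            }
        }
        while (i < a.length) out[n++] = a[i++];
        while (j < b.length) out[n++] = b[j++];
        return Arrays.copyOf(out, n);
    }

    /** Merge two small indices (word id -> sorted document ids) into a larger one. */
    public static TreeMap<Long, long[]> merge(Map<Long, long[]> left, Map<Long, long[]> right) {
        TreeMap<Long, long[]> merged = new TreeMap<>(left);
        for (Map.Entry<Long, long[]> e : right.entrySet()) {
            // Word ids present in both inputs get their document lists merged.
            merged.merge(e.getKey(), e.getValue(), IndexMergeSketch::mergeDocIds);
        }
        return merged;
    }
}
```

Because word ids come from a hash of the keyword rather than from a shared lexicon, two independently built indices agree on the id of a given word, which is what makes them mergeable without a lookup table.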

* Reduce memory churn in LoaderIndexJournalWriter, fix bug with keyword mappings as well
* Remove remains of OldDomains
* Ensure LOADER_PROCESS_OPTS gets fed to the processes
* LinkdbStatusWriter won't execute batch after each added item post 100 items

vlofgren added 18 commits August 24, 2023 09:06

(linkdb) New Module for sqlite-backed document db

b22f4fb

(file-storage) New File Storage type for linkdb

b958acb

(common) Deprecate EdgeId and similar

7bb3e44

(common) New UrlIdCodec class

c70670b

Have a single class responsible for encoding and decoding URL ids, as it's a bit finicky and used all over.

(index) Implement new URL ID coding scheme.

9894f37

Also refactor along the way. Really needs an additional pass, these tests are very hairy.

(search) Basic working integration of linkdb in search service

c909120

(system) Remove EdgeId<T> and similar objects

1e68005

They seemed like a good idea at the time, but in practice they're wasting resources and not really providing the clarity I had hoped.

(index) Clean up result domain deduplicator

56eb833

(index) Clean up and optimize valuator

b911665

(converter) Update confusing state description

5ed5298

SWAP_LEXICON doesn't instruct the index service to do anything. It just moves the file.

(search) Remove endpoint flush-search-caches

e741301

It's not necessary anymore with the new linkdb.

(control) Display progress of process tasks

70a5df9

(control) Simplify ConvertAndLoadActor

28188a6

(db) Remove EC_URL and EC_PAGE_DATA from mariadb database

e710e05

(index,control) Recoverable index backups

194a605

(minor) Comment build.gradle

4e694fd

vlofgren changed the title ~~WIP: No downtime upgrades~~ WIP: Zero downtime upgrades Aug 25, 2023

vlofgren changed the title ~~WIP: Zero downtime upgrades~~ WIP: Zero downtime upgrades, merge-based index construction Aug 28, 2023

vlofgren added 9 commits August 28, 2023 14:36

(reverse-index) Fix over-allocation of the count array in merging

00c4686

(minor) Fix typo in ActorStateMachine's logging

ffa0366

(index) Hook in missing DocIdRewriter

b6a9250

This enables documents to be ranked properly.

(minor) Improved logging and error messages

6525b16

(loader) Revert accidental experimental changes that slipped by in an earlier commit

ba4513e

(index-reverse) Add documentation and clean up code.

a2e6616

(control-service) Remove old index journal files when restoring a backup.

c57a2d0

(heartbeat, reverse-index) Better heartbeat mocking, improved heartbeats for reverse index construction.

39c1857

(process) Automatic flightrecorder runs for processes when run in docker.

fa87c7e

vlofgren added 2 commits August 29, 2023 15:37

(minor) Clean up dead endpoints

3f288e2

vlofgren marked this pull request as ready for review August 29, 2023 15:05

vlofgren merged commit bdcbfb1 into master Aug 29, 2023

vlofgren changed the title ~~WIP: Zero downtime upgrades, merge-based index construction~~ Zero downtime upgrades, merge-based index construction Sep 14, 2023

vlofgren deleted the no-downtime-upgrades branch March 21, 2024 13:43
