Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix config caching #1129

Merged
merged 3 commits into from Mar 23, 2024
Merged

fix config caching #1129

merged 3 commits into from Mar 23, 2024

Conversation

mbostock
Copy link
Member

Previously normalizeConfig would return a new Config instance given the same spec. We want it to return the same instance given the same spec, so that downstream code (such as search) can use the Config as a key in a WeakMap. This applies two additional optimizations: readPages is now cached, keyed on the file contents within the directory; and parseMarkdownMetadata is used instead of parseMarkdown within readPages, avoiding unnecessary work to compute the default page list.

@mbostock mbostock requested a review from Fil March 23, 2024 00:05
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed these two test pages were inadvertently committed.

@@ -42,7 +42,7 @@ describe("build", () => {

await rm(actualDir, {recursive: true, force: true});
if (generate) console.warn(`! generating ${expectedDir}`);
const config = Object.assign(await readConfig(undefined, path), {output: outputDir});
const config = {...await readConfig(undefined, path), output: outputDir};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is to avoid mutating the Config instance: the same config is used by other tests.

@@ -4,7 +4,7 @@ import {normalizeConfig as config, mergeToc, readConfig, setCurrentDate} from ".
import {LoaderResolver} from "../src/dataloader.js";

describe("readConfig(undefined, root)", () => {
before(() => setCurrentDate(new Date("2024-01-11T01:02:03")));
before(() => setCurrentDate(new Date("2024-01-10T16:00:00")));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is to match the build tests — since the config is now cached, we want the footer to have the same data, so the behavior of the tests is the same independent of order.

@@ -84,7 +85,7 @@ export async function searchIndex(config: Config, effects = defaultEffects): Pro
)
);

indexCache.set(config, {json, freshUntil: +new Date() + reindexingDelay});
indexCache.set(pages, {json, freshUntil: Date.now() + reindexDelay});
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache is keyed by the pages instead of the config so that if the set of pages change, the search index is recomputed.

try {
visited.add(statSync(join(root, ".observablehq")).ino);
} catch {
// ignore the .observablehq directory, if it exists
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that if you have a .md file in your docs/.observablehq/cache, it was being picked up by the default pages. We don’t want that, so let’s just ignore everything in .observablehq under the root.

for (const file of visitMarkdownFiles(root)) {
if (file === "index.md" || file === "404.md") continue;
const source = readFileSync(join(root, file), "utf8");
const parsed = parseMarkdown(source, {path: file, md});
files.push({file, source});
hash.update(file).update(source);
Copy link
Member Author

@mbostock mbostock Mar 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also tried hashing the mtime of the file instead of the source, which is slightly faster in the cached case. But I don’t think it’s fast enough that it matters, and this felt simpler.

@@ -88,18 +89,29 @@ export async function readDefaultConfig(root?: string): Promise<Config> {
return normalizeConfig(await importConfig(tsPath), root);
}

let cachedPages: {key: string; pages: Page[]} | null = null;
Copy link
Member Author

@mbostock mbostock Mar 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m only storing a single value (the most recent value) in the cache for readPages, just to keep things simple. I don’t expect that there would be more than one set of pages needed simultaneously.

@mbostock mbostock mentioned this pull request Mar 23, 2024
@mbostock mbostock enabled auto-merge (squash) March 23, 2024 00:15
Comment on lines +131 to +132
const cachedConfig = configCache.get(spec);
if (cachedConfig) return cachedConfig;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you prefer this pattern over:

  if (configCache.has(spec)) return configCache.get(spec);

For performance maybe?

Copy link
Member Author

@mbostock mbostock Mar 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For TypeScript. (Though, it could be faster, too.)

Previously you had declared the WeakMap without types, so keys and values were type any by default. In this case TypeScript considers the return value of map.get as any, too, and hence doesn’t require that it returns a defined value and allows map.get(key).property even though it is unsafe.

When we declare the cache as WeakMap<Config["pages"], {json: string; freshUntil: number}>, then TypeScript will enforce type checking because map.get(key) now returns {json: string; freshUntil: number} | undefined. So if we don’t use the pattern above, we get an Object is possibly 'undefined' type error:

Screenshot 2024-03-23 at 9 07 57 AM

TypeScript isn’t smart enough to know that if map.has(key) returns true, then a subsequent map.get(key) with the same key returns a defined value. (There’s no way to declare such a relationship in an interface, at least not yet.) So as a type-safe alternative, we call map.get(key) first, and then test the returned value which can’t change, and hence TypeScript knows that the value is defined.

src/search.ts Outdated
processTerm(term) {
return term.match(/\p{N}/gu)?.length > 6 ? null : term.slice(0, 15).toLowerCase(); // fields to return with search results
processTerm(term: string) {
return (term.match(/\p{N}/gu)?.length ?? 0) > 6 ? null : term.slice(0, 15).toLowerCase(); // fields to return with search results
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm moving this comment to line 22 (the issue predates this PR, I just happened to see it here).

@mbostock mbostock merged commit be7fe26 into main Mar 23, 2024
4 checks passed
@mbostock mbostock deleted the mbostock/cache-config branch March 23, 2024 09:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants