Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Knex migration slowed down content bakes #3462

Closed
Marigold opened this issue Apr 9, 2024 · 7 comments
Closed

Knex migration slowed down content bakes #3462

Marigold opened this issue Apr 9, 2024 · 7 comments
Assignees

Comments

@Marigold
Copy link
Contributor

Marigold commented Apr 9, 2024

Description

Migration to knex slowed down content bakes from ~7min to ~15min. The difference can be seen on these two builds - before and after (be aware that time bars on the right in buildkite correspond to the step on the next line!). Here's a breakdown:

  1. bakeAllChangedGrapherPagesVariablesPngSvgAndDeleteRemovedGraphers - 5 min
  2. baked google doc posts - 1 min
  3. baked special pages - 1 min

Expected behaviour

Such a large slowdown doesn't look like it is just knex overhead. Perhaps the schema has changed and we're missing indexes somewhere?

@marcelgerber
Copy link
Member

Just a (somewhat informed) hunch:
I don't think it's any changes in indexes used, because there shouldn't be any.

I think, however, that the slowdown is (partly) in the gdocs handling in datapage baking code.
DatapageHelpers#getDatapageGdoc now calls getGdocBaseObjectById rather than GdocPost.findOneBy:

let datapageGdoc =
(await getGdocBaseObjectById(knex, googleDocId, true)) ?? null

That method then calls parsePostsGdocsRow to enrich the entire Gdoc - which I guess we didn't do eagerly before (not sure)?
Or maybe there was a caching layer in place somewhere before, either implicit or explicit, that we now no longer have?

@ikesau
Copy link
Member

ikesau commented Apr 9, 2024

I did some benchmarking by baking 1000 charts with various things commented out.

The datapage code, while slow, is actually faster than the Grapher path - getPostEnrichedBySlug, getPostRelatedCharts, and getRelatedArticles are all adding significant overhead and should be optimized.

I didn't find what was causing the datapage slowdown because I wasted a bunch of time with a flamegraph library, but DB access is what I'd check out first.

branch here: https://github.com/owid/owid-grapher/tree/knex-bake-degradation

@marcelgerber
Copy link
Member

So, @danyx23 and I looked into this for a bit.

This is actually happening because of:

// TODO: not sure if the shared transaction will be an issue - I think it should be fine but just to put a flag here
// that this could be causing issues

In my testing, it turned out that with multiple transactions, the chart baking yarn runLocalBake --steps charts runs twice as fast.

Also, I sanity-checked that, when run with a single transaction, this is not using a connection pool but rather just a single connection.
You can try that yourself by running

console.log(
    await db.knexRawFirst(knex, "SELECT CONNECTION_ID() as connectionId")
)

-- for example, at the top of bakeSingleGrapherChart.

I think that using multiple transactions for baking is the way to go here.

@marcelgerber
Copy link
Member

I think there's no way around this.
According to this, mysql cannot share transactions across multiple connections, which makes sense I guess.

I'm gonna assign this issue to @danyx23, just so we don't forget about it.

@danyx23
Copy link
Contributor

danyx23 commented May 7, 2024

@Marigold with the recent work from @ikesau and @marcelgerber we made some progress in bake times. Is this ok for now? If not please open a new issue and tell us which area we should focus on. Thanks!

@danyx23 danyx23 closed this as completed May 7, 2024
@danyx23
Copy link
Contributor

danyx23 commented May 7, 2024

#3514 might also be related

@Marigold
Copy link
Contributor Author

Marigold commented May 8, 2024

Marcel's fix helped. It's as smooth as before!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants