Optimize and simplify filterable list preview rendering #8450

chosak · 2024-05-31T20:04:06Z

This PR rewrites the way that our filterable list results are rendered, in an attempt to simplify and optimize. The short summary of what this does is:

Optimizes the way we paginate results from Elasticsearch to avoid pulling back a huge list of page IDs
Implements a serializer for filterable list results so that the Jinja is dealing with basic data types
Removes the need for our custom "post preview cache"

Details

Before this PR, a filterable list query involves the following steps:

We do an Elasticsearch query to retrieve pages matching the user's filtering criteria.
We read back the full list of matching page IDs from ES (this could be thousands of IDs for e.g. the activity log or blog).
We then paginate this giant list and retrieve only a subset (one results page's worth) of Page objects from the Wagtail database.
The results template iterates over each page and renders a post preview.
For each post preview, our Jinja logic makes various additional DB queries to pull back page authors, categories, and tags.
To mitigate the above, we have an old "post preview cache" built on custom code that caches the HTML of the post preview itself.

Because our Django caches are implemented in the database, this means that at best, even after retrieving the pages from the DB, each post preview render requires an additional DB query to pull out the cached HTML. On a page like the blog with 25 results, that means 25 DB queries just to pull those back.

If you start a fresh local Django development server with no caching (our default developer setup), using Django Debug Toolbar (ENABLE_DEBUG_TOOLBAR=1 ./runserver.sh), and visit http://localhost:8000/about-us/blog/ on the main branch (in an incognito window), we currently make 422 DB queries. If you enable local caching (ENABLE_DEBUG_TOOLBAR=1 ENABLE_DEFAULT_CACHE=1 ENABLE_POST_PREVIEW_CACHE=1 ./runserver.sh) and load that page twice (to fill the caches), we then make 105 DB queries for that page.

With these changes, we do the following instead:

We do an Elasticsearch query to retrieve pages matching the user's filtering criteria.
We paginate the results in Elasticsearch and pull back only the page IDs we need for a single page of results.
We retrieve only these Page objects from the Wagtail database, and use prefetch_related to pull back all page authors, categories, and tags at once.
We use a new DRF serializer class to serialize these Page objects into basic Python types.
The results template then iterates over these basic objects and renders them without having to do any additional Python logic or DB queries. We no longer need a post preview cache.
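To illustrate why the prefetch step matters, here is a toy query counter comparing per-preview lookups with a single batched lookup. It is plain Python, not the Django ORM: FakeDB and its methods are hypothetical stand-ins, and only one relation (tags) is modeled. In Django, prefetch_related issues one extra query per prefetched relation rather than one per page.

```python
class FakeDB:
    """Hypothetical stand-in for the database that counts queries."""

    def __init__(self, tags_by_page):
        self.tags_by_page = tags_by_page
        self.queries = 0

    def tags_for_page(self, page_id):
        # One query per page: what a lazy per-preview lookup amounts to.
        self.queries += 1
        return list(self.tags_by_page.get(page_id, []))

    def tags_for_pages(self, page_ids):
        # One query for the whole results page: the prefetch approach.
        self.queries += 1
        return {pid: list(self.tags_by_page.get(pid, [])) for pid in page_ids}


def previews_lazy(db, page_ids):
    """Build previews, looking up tags separately for every page."""
    return [{"id": pid, "tags": db.tags_for_page(pid)} for pid in page_ids]


def previews_prefetched(db, page_ids):
    """Build the same previews after fetching all tags in one batch."""
    tags = db.tags_for_pages(page_ids)
    return [{"id": pid, "tags": tags[pid]} for pid in page_ids]


page_ids = list(range(1, 26))  # one results page of 25 posts
data = {pid: [f"tag-{pid}"] for pid in page_ids}

lazy_db = FakeDB(data)
prefetch_db = FakeDB(data)
lazy = previews_lazy(lazy_db, page_ids)
prefetched = previews_prefetched(prefetch_db, page_ids)
print(lazy_db.queries, prefetch_db.queries)
```

Both functions produce identical previews; only the number of round trips differs.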

On this branch, using ENABLE_DEBUG_TOOLBAR=1 ./runserver.sh, visiting http://localhost:8000/about-us/blog/ makes 36 queries. If you enable the default cache using ENABLE_DEBUG_TOOLBAR=1 ENABLE_DEFAULT_CACHE=1 ./runserver.sh and load the page twice, we make only 33 queries.

(I will note that our basic page rendering of something simple like http://localhost:8000/ takes ~17 queries or so. A bunch of these (~7) are unavoidable Wagtail queries for page lookup. We add lookups for our sharing site, page banners, feature flags, breadcrumbs, and translations, depending on page type. There might be more to do to squeeze these in future should we want to look more at performance, but we definitely have a few places like FLs where we routinely make 100+ queries per page.)

How to test this PR

The best way to test this is to compare various filterable list pages like the blog with production, as there should be no user-facing changes. A good test URL is the events archive page, where we list events with either a map or an image.

Notes and todos

I opened this as draft because it is a pretty big change. There are still a few rough edges and code that needs tests, along with various tests that are failing due to my changes. I wanted to open this now because another filterable list bug just came up today (internal https://github.local/Design-Development/Design-and-Content-Team/issues/471). @willbarton if you think this is a promising approach I will finish it out but I wanted to run it by you before completing.
Removing the post preview cache also unblocks Add support for previewing blog posts in list view #6245, the ability to preview post previews without worrying about corrupting the cache.
This PR includes the minor event location formatting changes from Fix: EventPage previews without a location #8434.

Checklist

PR has an informative and human-readable title
Changes are limited to a single goal (no scope creep)
Code follows the standards laid out in the CFPB development guidelines
Future todos are captured in comments and/or tickets
Project documentation has been updated, potentially one or more of:

chosak · 2024-05-31T20:05:02Z

cfgov/v1/documents.py

+        if self._count is None:
+            self._count = self.search_obj.count()
+        return self._count


All the changes in this file are to cache the ES results count where possible. Currently we make multiple calls to retrieve the count, but we don't need to do that unless the filtering criteria change.
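A minimal sketch of the count-caching pattern in the diff above, using a hypothetical FakeSearch in place of the real search object. The class names and the refilter method are assumptions for illustration, not the code in documents.py.

```python
class FakeSearch:
    """Hypothetical stand-in for an ES search object; counts count() calls."""

    def __init__(self, total):
        self.total = total
        self.count_calls = 0

    def count(self):
        self.count_calls += 1  # each call would be a round trip to ES
        return self.total


class CountCachingResults:
    """Caches the result count until the filtering criteria change."""

    def __init__(self, search_obj):
        self.search_obj = search_obj
        self._count = None

    def refilter(self, new_search_obj):
        # New criteria mean the old count is stale, so drop it.
        self.search_obj = new_search_obj
        self._count = None

    def count(self):
        if self._count is None:
            self._count = self.search_obj.count()
        return self._count
```

Calling count() repeatedly on the same CountCachingResults hits the search object once; calling refilter() forces one fresh lookup on the next count().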

chosak · 2024-05-31T20:07:31Z

cfgov/v1/models/filterable_page.py

@@ -13,6 +13,19 @@
 from v1.util.ref import get_category_children


+class SearchResultsPaginator(Paginator):


This logic lets us move the pagination upstream into the Elasticsearch query instead of pulling back the full list of page IDs from ES and paginating that. I dug into the django-opensearch-dsl code and there doesn't seem to be anything built-in to support this.

Adding prefetch_related here also lets us prefetch the page related objects to avoid the N+1 queries problem.
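The idea of paginating upstream can be sketched in plain Python as follows. This is not the SearchResultsPaginator from the PR and does not subclass Django's Paginator: FakeSearch and UpstreamPaginator are hypothetical names, and the slice stands in for the from/size window a real search request would carry.

```python
import math


class FakeSearch:
    """Hypothetical stand-in for an Elasticsearch search over page IDs."""

    def __init__(self, ids):
        self._ids = list(ids)
        self.fetched = []  # every ID actually pulled back from "ES"

    def count(self):
        return len(self._ids)

    def __getitem__(self, key):
        # Only the requested slice is transferred, as with from/size paging.
        chunk = self._ids[key]
        self.fetched.extend(chunk)
        return chunk


class UpstreamPaginator:
    """Toy paginator that asks the search for one page's slice at a time."""

    def __init__(self, search, per_page):
        self.search = search
        self.per_page = per_page

    @property
    def num_pages(self):
        return max(1, math.ceil(self.search.count() / self.per_page))

    def page(self, number):
        if not 1 <= number <= self.num_pages:
            raise ValueError(f"page {number} is out of range")
        bottom = (number - 1) * self.per_page
        return self.search[bottom : bottom + self.per_page]
```

With 1,000 matching IDs and 25 per page, requesting page 2 transfers only IDs 25 through 49 instead of the full list.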

cfgov/v1/serializers.py

chosak · 2024-05-31T20:14:57Z

cfgov/v1/models/filterable_page.py

+            object_list = (
+                object_list.to_queryset()
+                .specific()
+                .live()


@willbarton calling this out specifically as a last-minute way to filter out any not-live pages that may still be in the ES index for some reason.

Although the current code does try to filter out not-live pages as part of the ES query, it doesn't actually check the status at the place those results get converted to Page objects.

csebianlander · 2024-06-03T19:33:11Z

Everything looks OK to me here but I'm definitely not the best one to do a more in-depth review of the code!

wpears

These changes make sense to me, though I'm not steeped enough in the intricacies to have a specific opinion on some of the details. I did check that the prefetch_related bit is being used correctly everywhere (no surprise that it is).

willbarton · 2024-06-05T14:55:34Z

We use a new DRF serializer class to serialize these Page objects into basic Python types.

I'm curious about this choice — after prefetching related, what does this step do for us?

My impression from reading the PR is that this moves the logic around page type out of the preview template, is that the intention?

chosak · 2024-06-05T15:44:26Z

We use a new DRF serializer class to serialize these Page objects into basic Python types.

I'm curious about this choice — after prefetching related, what does this step do for us?

My impression from reading the PR is that this moves the logic around page type out of the preview template, is that the intention?

Yes, that's correct. This work was motivated by my experience using DRF on TCCP.

There are benefits and drawbacks, but the main idea is to better separate the 3 pieces of code: fetching data from the DB, preparing that data for rendering, actually rendering.

Django templating (Jinja2 in our case) is powerful and lets you do things like {% for category in page.categories.all() %}{{ category.name }}{% endfor %} in the template. This reads more simply, but lately I've been feeling that it actually hides complexity and introduces challenges with testing and optimizing. In the given example, the category objects aren't actually fetched from the DB until the template is rendered, and the actual thing we want to render (the category name) similarly isn't queried until that time. In order to test this code we have to put data in the DB and render the entire template, verifying that the names appear properly. There's a tight coupling between the backend database and the template code.

By pre-serializing the objects, we can pre-prepare the list of category names (if that's all we need) and the template can be {% for category_name in category_names %}{{ category_name }}{% endfor %}. Someone editing the template only needs to know that the template is receiving a list of category name strings, and doesn't have to worry about DB queries or backend code. This makes the template easier to test. We can test the serializer independently, verifying that it converts the DB objects to simple types (I still need to write those tests for this PR).
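A plain-Python sketch of this separation, for illustration only. The PR uses a DRF serializer; here dataclasses and a function stand in for the models and serializer, and all names and fields (Page, Category, serialize_page, render_preview) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str


@dataclass
class Page:
    title: str
    url: str
    categories: list = field(default_factory=list)


def serialize_page(page):
    """Convert a page object into basic Python types for the template."""
    return {
        "title": page.title,
        "url": page.url,
        "category_names": sorted(c.name for c in page.categories),
    }


def render_preview(data):
    """Toy 'template': it only sees strings and lists, never model objects."""
    return f"{data['title']} [{', '.join(data['category_names'])}]"


page = Page("Hello", "/blog/hello/", [Category("Press release"), Category("Blog")])
data = serialize_page(page)
print(render_preview(data))
```

The serializer can be tested with plain objects and equality checks, and the rendering step can be tested with a hand-written dict, without either test touching the other layer.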

Conceptually this also makes it easier to migrate to or even just think about a more loosely coupled backend API + frontend rendering model, where the backend produces some data and the frontend renders it, and neither need to worry about each other.

willbarton · 2024-06-05T17:38:25Z

Conceptually this also makes it easier to migrate to or even just think about a more loosely coupled backend API + frontend rendering model, where the backend produces some data and the frontend renders it, and neither need to worry about each other.

That's what I was wondering about, and I love this.

willbarton · 2024-06-05T18:44:26Z

@willbarton if you think this is a promising approach I will finish it out but I wanted to run it by you before completing.

@chosak I definitely think so!

chosak · 2024-06-12T19:38:26Z

This PR is now ready for review. I've added full test coverage and done more regression testing against various pages to ensure that all functionality continues to work. The coverage failure is because the PR changed settings/local.py, but this isn't a file that we had coverage for before.

I want to restate known user-facing changes here:

Currently filterable list pages link to their own filtered URL e.g. href="/about-us/blog/?topics=foo". With this change we only link to the relative filtered URL e.g. href="?topics=foo". This was already mentioned to @csebianlander above.
Currently post preview tags render in a random order -- compare the order of tags as entered in the Wagtail editor and the order when rendered (example). This PR changes them to be always alphabetical (and does the same for categories as well). This also impacts the order of categories/tags when rendered in RSS feeds.
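Both changes can be illustrated with a short standard-library sketch. The helper names are hypothetical and this is not the code in the PR; it only shows what a relative filtered link and a deterministic alphabetical ordering look like.

```python
from urllib.parse import urlencode


def filtered_href(**filters):
    """Build a relative link that carries only the query string."""
    return "?" + urlencode(filters, doseq=True)


def ordered_names(names):
    """Return names in a stable alphabetical order for rendering."""
    return sorted(names)


print(filtered_href(topics="foo"))
print(ordered_names(["mortgages", "auto loans", "credit cards"]))
```

A relative href such as ?topics=foo resolves against the current page's path in the browser, so the rendered link no longer needs to know its own page URL.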

willbarton

I love the pattern of using DRF serializers to pull logic out of our templates. We should do more of this ❤️.

This works as advertised in all my testing!

cfgov/v1/feeds.py

willbarton · 2024-06-14T13:41:25Z

cfgov/mega_menu/jinja2tags.py

+        try:
+            return Menu.objects.get(language=default_language)
+        except Menu.DoesNotExist:
+            pass


The changes you have here are fine, and no action is necessary on this comment, but I love nothing about this function. I wonder if there's a way to avoid repeating the try/except.

chosak

There is, thanks to the code snippet you shared with me, which I've pushed as a commit to this PR!
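The shared snippet is not shown in this thread, so the following is only a generic illustration of one way to avoid repeating a try/except per language, not the committed code. The function name, the dict standing in for the menu table, and the "en" default are all assumptions.

```python
def pick_menu(menus_by_language, language, default_language="en"):
    """Return the first available menu, preferring the requested language.

    menus_by_language is a hypothetical mapping of language code to menu,
    standing in for a lookup that returns None when nothing matches.
    """
    for candidate in (language, default_language):
        menu = menus_by_language.get(candidate)
        if menu is not None:
            return menu
    return None


menus = {"en": "English menu", "es": "Spanish menu"}
print(pick_menu(menus, "es"))
print(pick_menu(menus, "fr"))
```

Iterating over the candidate languages keeps the fallback order in one place, so adding another fallback does not require another exception handler.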

chosak requested review from anselmbradford, wpears, willbarton and csebianlander May 31, 2024 20:04

chosak commented May 31, 2024

wpears reviewed Jun 5, 2024

chosak force-pushed the optimize/filterable-results branch 3 times, most recently from 88dbd7f to 2ecccd3 Compare June 12, 2024 19:20

chosak marked this pull request as ready for review June 12, 2024 19:38

willbarton approved these changes Jun 14, 2024

chosak added 12 commits June 14, 2024 11:20

Filterable list preview optimizations

0a72c3f

Delete now-deprecated post preview cache

55cf291

Remove accidental test logging change

169afb6

stash

b93fe3b

Fixup filterable list serializer tests

a68d7f3

Fixup page author name ordering tests

12f8aae

Fixup v1.tests.models tests

7dbcc91

Fixup tests

dc43587

Fix merge conflict issue around alt text

1080cd5

Feed tests fixup

ad192a1

Cleanup settings.local to make SonarCloud happy

620874e

Fixups: ActivityLog, events, and feeds

8e5eafc

chosak added 2 commits June 14, 2024 11:20

Fixup feed tests

3d3b136

Optimize mega menu lookup (thx @willbarton)

b68b70c

chosak force-pushed the optimize/filterable-results branch from c01dd46 to b68b70c Compare June 14, 2024 15:20

chosak added this pull request to the merge queue Jun 14, 2024

Merged via the queue into main with commit 942be70 Jun 14, 2024
16 of 17 checks passed

chosak deleted the optimize/filterable-results branch June 14, 2024 15:58

wpears mentioned this pull request Jun 15, 2024

Fix index bug in author sorting method and display bug in item introduction #8478

Merged

chosak mentioned this pull request Jun 18, 2024

Add support for previewing blog posts in list view #6245

Merged

4 tasks
