Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize tables for search #3478

Closed
6 of 7 tasks
Tracked by #3388
jadudm opened this issue Mar 5, 2024 · 6 comments
Closed
6 of 7 tasks
Tracked by #3388

Optimize tables for search #3478

jadudm opened this issue Mar 5, 2024 · 6 comments
Assignees
Labels

Comments

@jadudm
Copy link
Contributor

jadudm commented Mar 5, 2024

Our search is very non-performant.

This is because multiple reasons:

  1. all searches of interest require joins across multiple (large) tables
  2. the dissemination tables were never designed (for search).

Tasks

Proposed: MATERIALIZED VIEW

We had discussed various strategies to cache/re-format tables that would be more performant.

Postgres has a notion of materialized views. A materialized view is... a view, but it is populated with data, and not dynamic at query-time. Think of it as a table, expressed by a query, that gets populated. As a result, it can be used for searches without incurring additional overhead.

Materialized views have a cost to create (in time and space), but once created, behave like other tables. They do not stay up to date with their "linked" tables, however. As a result, they must be periodically refreshed.

Solution sketch

https://github.com/GSA-TTS/FAC/tree/jadudm/materialized-views

This adds

  • run.sh changes to drop and create a materialized view for our searchable data of interest
  • sql files in dissemination for the drop, creation, and refresh of materialized views
  • A django command to manage the dropping, creation, and refreshing

This will drop (if exists) and create (if not exists) a materialized view called dissemination_combined that combines general, federalaward, and finding rows. It only combines data where there are findings. So, this table is only of use if someone is searching for audits that have findings.

use

fac materialized_views --drop

fac materialized_views --create

fac materialized_views --refresh

The third command could be triggered via GH Action periodically (daily? 2x daily? 4x daily?) to update the view.

@danswick
Copy link
Contributor

danswick commented Mar 6, 2024

For now, @sambodeme's goal will be a branch where we can run a particularly taxing query against an unmanaged materialized view. @sambodeme, could you drop the query in this thread?

@jadudm
Copy link
Contributor Author

jadudm commented Mar 6, 2024

Also, please feel free to rewrite/modify/etc. this ticket as needed. It... I don't know how I should have documented the spike/thinking differently.

@sambodeme
Copy link
Contributor

Here is the query params:
Date range: 10/1/2022-10/31/2022
ALN: 20
Findings: Any

@jadudm
Copy link
Contributor Author

jadudm commented Mar 13, 2024

From a question to the dev channel, the idea that we can use LEFT JOIN to construct the MATERIALIZED VIEW is very productive.

@danswick
Copy link
Contributor

Collecting some related issues and PRs here for future navigators:

@danswick
Copy link
Contributor

There are some outstanding follow-up tasks here, but this ticket is otherwise complete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Done
Development

No branches or pull requests

3 participants