Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance optimizations #1114

Merged
merged 19 commits into from Apr 15, 2019

Conversation

Projects
None yet
3 participants
@luontola
Copy link
Collaborator

commented Apr 11, 2019

Closes #863

When there are 1000 applications in the database, the first request after application restart to list applications takes 3-4 seconds and the following requests take about 30ms. The cache is prewarmed on application startup so it'll be always fast to the users.

@luontola luontola force-pushed the perf-optimizations branch from bafaaf5 to f1af21a Apr 11, 2019

Show resolved Hide resolved src/clj/rems/api/applications_v2.clj Outdated
Show resolved Hide resolved src/clj/rems/poller/common.clj Outdated

@luontola luontola force-pushed the perf-optimizations branch from 3ff9502 to 86a1e1d Apr 12, 2019

Show resolved Hide resolved src/clj/rems/db/test_data.clj Outdated

@luontola luontola requested a review from Macroz Apr 12, 2019

@luontola luontola force-pushed the perf-optimizations branch from 127d338 to acfc2df Apr 12, 2019

avoid linear search in roles lookup: get-roles time 4ms -> 1ms
If there was no DB access, it would be even faster: 0.001ms

@luontola luontola force-pushed the perf-optimizations branch from acfc2df to c75e0a1 Apr 12, 2019

luontola added some commits Apr 12, 2019

avoid linear search in all applications lookup
Benchmark with 1000 applications:
- get-all-applications time: 4ms -> 1ms
- get-all-applications time, excluding I/O: 2.5ms -> 0.004ms
- cache size 19.0 MB -> 25.8 MB

Benchmark with 10000 applications:
- get-all-applications time: 34ms -> 1ms
- cache size: 183.8 MB -> 216.5 MB
@Macroz

Macroz approved these changes Apr 15, 2019

Copy link
Collaborator

left a comment

This caching approach needs to be discussed and documented well. There's now quite a bit of moving pieces so an ADR is at least warranted. There are also assumptions on cache timeouts, which should definitely be discussed.

I think the setup is not as simple as I'd like to be but at the moment I don't understand the new model well enough to make good suggestions.

@luontola luontola merged commit 5d664a3 into master Apr 15, 2019

6 checks passed

ci/circleci: build Your tests passed on CircleCI!
Details
ci/circleci: doo Your tests passed on CircleCI!
Details
ci/circleci: ok Your tests passed on CircleCI!
Details
ci/circleci: test Your tests passed on CircleCI!
Details
ci/circleci: war Your tests passed on CircleCI!
Details
ci/circleci: without-db Your tests passed on CircleCI!
Details
@opqdonut
Copy link
Collaborator

left a comment

looks great, I have some questions though

(defn get-unrestricted-application
"Returns the full application state without any user permission
checks and filtering of sensitive information. Don't expose via APIs."
[application-id]
(let [events (applications/get-dynamic-application-events application-id)]
(let [events (applications/get-dynamic-application-events application-id)

This comment has been minimized.

Copy link
@opqdonut

opqdonut Apr 15, 2019

Collaborator

so getting all the events from the db is not the bottleneck? reducing them together is?

This comment has been minimized.

Copy link
@luontola

luontola Apr 15, 2019

Author Collaborator

Getting a single application from the database is fast and doesn’t need caching (except in the pollers which naively do the get once for each event - refactoring the pollers could avoid that).

In the case of reading all applications, even then reading the events is very fast - it’s a sequential read from disk (though the indexes might not yet be optimal, so it’s maybe not sequential at the moment). What takes longest is the injections, because it does lots of small DB queries.

Show resolved Hide resolved src/clj/rems/api/applications_v2.clj
all-applications-cache
(fn [state events]
;; Because enrich-with-injections is not idempotent,
;; it's necessary to hold on to the "raw" applications.

This comment has been minimized.

Copy link
@opqdonut

opqdonut Apr 15, 2019

Collaborator

why is it not idempotent? by necessity or by accident?

This comment has been minimized.

Copy link
@luontola

luontola Apr 15, 2019

Author Collaborator

It does ‘(dissoc ::draft-answers ::submitted-answers ::previous-submitted-answers)’ but other than that it’s mostly idempotent. It could be made idempotent with small changes.

This comment has been minimized.

Copy link
@opqdonut

opqdonut Apr 15, 2019

Collaborator

moving the dissocs to get-all-unrestricted-applications would make sense to me

This comment has been minimized.

Copy link
@luontola

luontola Apr 15, 2019

Author Collaborator

I'm thinking that apply-user-permissions could do it in hide-non-public-information.

This comment has been minimized.

Copy link
@opqdonut

opqdonut Apr 15, 2019

Collaborator

that sounds good too


(defn reload-cache! []
(events-cache/empty! all-applications-cache)
(refresh-all-applications-cache!))

This comment has been minimized.

Copy link
@opqdonut

opqdonut Apr 15, 2019

Collaborator

this looks like a race condition but I guess it isn't since it doesn't matter who calls refresh-all-applications-cache!

This comment has been minimized.

Copy link
@luontola

luontola Apr 15, 2019

Author Collaborator

Yes. An unlucky user might get in between there and experience a few seconds of cache miss. A better way would be to rebuild the cache asynchronously and overwrite the cache after it’s finished. That way there would be no cache misses for users.

(.scheduleWithFixedDelay reload-cache! 1 1 TimeUnit/HOURS))
:stop (doto all-applications-cache-reloader
(.shutdown)
(.awaitTermination 60 TimeUnit/SECONDS)))

This comment has been minimized.

Copy link
@opqdonut

opqdonut Apr 15, 2019

Collaborator

could consider factoring out this mount-cron-thing we use in a couple of places ... but not in this PR please :)

Show resolved Hide resolved src/clj/rems/poller/common.clj
Show resolved Hide resolved test/clj/rems/db/testing.clj
Show resolved Hide resolved test/clj/rems/test_performance.clj
Show resolved Hide resolved test/clj/rems/test_performance.clj
Show resolved Hide resolved test/clj/rems/test_performance.clj

@luontola luontola deleted the perf-optimizations branch Apr 15, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.