Cache scores object #169

Merged 3 commits on Sep 15, 2021

Conversation

@nmdefries (Collaborator) commented Sep 9, 2021

Description

Store scores as a global-ish variable (the environment where they are stored is not actually the global environment, but it is a unique environment accessible to all user sessions within the app instance/R session) and only update it when the contents of the S3 bucket change in some way. Reduces the number of times we read the score data, from once per user session to twice a week.
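The shared-environment idea can be sketched as follows. This is a minimal illustration, not the PR's actual code; `cache`, `get_scores`, and the stand-in data frame are hypothetical names.

```r
# Sketch: objects created at the top level of app.R, outside the server
# function, live in an environment shared by every user session running
# in the same R process.
cache <- new.env(parent = emptyenv())

get_scores <- function() {
  if (!exists("scores", envir = cache)) {
    # Hypothetical stand-in for the slow read from the S3 bucket; in the
    # app this would only run for the first session after a change.
    cache$scores <- data.frame(forecaster = c("a", "b"), wis = c(0.1, 0.2))
  }
  cache$scores
}
```

Every session that calls `get_scores()` after the first gets the in-memory copy instead of triggering another read.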

Changes

  • Move data-fetching code out of server function.
    • Functions getData and getFallbackData
    • Code to get S3 bucket object. Create replacement getS3Bucket function.
    • Code to load data from S3 bucket, filter columns, and rename columns. Create replacement getAllData function.
  • Create getRecentDataHelper closure. This function generates a function, getRecentData, that returns the stored score data, or newly read score data if the data has been updated since it was last read. Whether the data has been updated is determined by comparing a freshly read S3 bucket object against the stored S3 bucket object, which is saved at the same time as the stored score data. This check happens at the beginning of each user session. getRecentData updates the score data and the S3 bucket object stored in its parent (getRecentDataHelper) environment.
  • Create a global getRecentData function from getRecentDataHelper so that a single environment is shared by and persists between all calls to getRecentData.
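The closure described in the bullets above might look roughly like this. It is a sketch: `getS3Bucket` and `getAllData` are the helper names from the bullets, and the internal variable names are illustrative, not copied from the PR.

```r
getRecentDataHelper <- function() {
  # State shared across all calls, held in this function's environment.
  stored_bucket <- NULL  # S3 bucket object saved alongside the cached scores
  stored_scores <- NULL  # cached score data

  getRecentData <- function() {
    new_bucket <- getS3Bucket()  # helper described in the bullets above
    if (is.null(stored_bucket) || !identical(stored_bucket, new_bucket)) {
      # First read, or the bucket contents changed: reload and cache.
      stored_scores <<- getAllData(new_bucket)  # helper from the bullets above
      stored_bucket <<- new_bucket
    }
    stored_scores
  }

  getRecentData
}

# Called once at app start-up, so a single environment is shared by,
# and persists between, all calls to getRecentData.
getRecentData <- getRecentDataHelper()
```

The `<<-` assignments walk up to the enclosing `getRecentDataHelper` environment, which is what makes the cached values survive between calls.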

Fixes

Right now, the dashboard is slow to load, especially when multiple people are using it simultaneously. This is because R Shiny is single-threaded by default, so user requests are queued and handled one at a time. With many users, the queue grows multiplicatively longer, exacerbated by requests that take a long time, notably reading score data from the S3 bucket.

Implications

This change means that score data should only be pulled twice a week, by the first dashboard visitor following each pipeline run, and served from the cache for all other users. Locally, sessions using cached data load in about 1 second, down from about 10 seconds.

Since a given instance of the "cache" (scores stored as a variable in memory) is only available within a single R session, it is desirable to have many users share a given app/R session. It's not desirable to have a huge server/session pool, since the data is loaded (the slow step) again for each R session. For example, the number of R sessions should be << the number of users. (It is possible to store data in an on-disk cache that would be accessible to all app/R sessions, depending on server pool setup, but session load time is slower when reading from disk. It might also be possible to allocate a shared memory region using mmap.)

This approach to caching won't extend elegantly to other function calls (e.g. plots), so we'd want to use memoise to support those if desired.
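For that future case, the memoise package wraps a function so that results are cached per argument set. A hypothetical sketch (`slow_plot_data` is illustrative, not code from this PR):

```r
library(memoise)

# Hypothetical expensive call we might want to cache later (e.g. data
# preparation for a plot); not part of this PR.
slow_plot_data <- function(forecaster) {
  Sys.sleep(0.5)  # stand-in for expensive work
  paste("plot data for", forecaster)
}

# memoise() returns a wrapped function that remembers results keyed on
# its arguments, so repeated calls with the same argument are instant.
slow_plot_data <- memoise(slow_plot_data)
```

Unlike the bucket-comparison closure in this PR, a memoised cache is invalidated with memoise::forget() or a cache timeout rather than by checking the S3 bucket contents.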

@nmdefries nmdefries requested review from krivard and sgratzl and removed request for krivard September 10, 2021 16:05
@sgratzl (Member) left a comment

This approach to caching won't extend elegantly to other function calls (e.g. plots), so we'd want to use memoise to support those.

I don't think that plots or filtering are the bottleneck here, but downloading the data.

number machines in the pool should be << the number of users. (It is possible to store data in an on-disk cache that would be accessible to all app/R sessions, depending on server pool setup, but session load time is slower when reading from disk.)

don't forget the middle step that is there: docker containers. Each physical machine could host multiple shiny containers at the same time.

@nmdefries (Collaborator, Author) commented Sep 10, 2021

This approach to caching won't extend elegantly to other function calls (e.g. plots), so we'd want to use memoise to support those.

I don't think that plots or filtering are the bottleneck here but downloading the data.

I definitely agree that downloading is the slow part. I just mean anything else we might want to cache in the future wouldn't fit super well into the structure added here; whether or not we want to cache anything else is a separate decision.

don't forget the middle step that is there: docker containers. Each physical machine could host multiple shiny containers at the same time.

Thanks for the correction; I updated the description above.

@nmdefries (Collaborator, Author) commented:

Kate doesn't recall any particular reason why caching wasn't being done and doesn't think adding caching will cause any problems.

@nmdefries nmdefries requested a review from sgratzl September 14, 2021 20:32
@sgratzl (Member) left a comment

Looks good from what I can see, though I'm not an R expert.
