Adding caching #96

daviddavo · 2022-07-28T08:35:38Z

Make CacheRequester threadsafe (CacheRequester.request is called from multiple threads)
Adding flask-caching to the application with either redis or filesystem caching
Solve problems with caching when using gunicorn (it spawns multiple processes).
- It works "the first time", when the cache is clean. But if you visit the same page again, last update date is Jan 1, 1 at 00:00 UTC...
Cache dropdown generation
Cache layout generation
Check what else to cache (search everything that makes complex calculations)
Create some kind of test to check how the changes improve the access time to the application

The text was updated successfully, but these errors were encountered:

daviddavo · 2022-08-04T08:09:52Z

The staging server seems to reload the page indefinitely... This does not happen in the normal deployment, or in development in localhost...

daviddavo · 2022-08-04T09:41:23Z

The callbacks that take the most time to load for the first time are:

callback	time (c+n)
`change_subpage`	1500ms + 500ms
`load_ecosystem`	1900ms + 100ms
`org_filters`	4000ms

daviddavo · 2022-08-04T10:49:47Z

Both get_layout (inside change_subpage) and load_ecosystem call the get_organization_list function.

get_organization_list will always return the same result. So it can be cached even among workers... The first one to enter (each day) will have a hard time. But the following ones will experience a better result. We could even try to populate this cache after running the cache_scripts, somehow.

Nevertheless, as we use multiple processes, caching using a global variable (like we do on the CacheRequester) simply won't do. This method works because we are caching a disk read (and not a complex calculation). With long computations, the result should be cached between workers, and even between runs if possible.

All of this is possible with long callbacks, but we could also use Flask-Caching. Using Flask-Caching doesn't have background tasks, but it would be better because we would cache the result of our functions, and not the entire callback. For example, we should just cache get_organization_list instead of the callbacks that use it. I think this is the method wikichron uses. Furthermore, these processing tasks take a couple of seconds, not half a minute, we don't need background processing nor showing a progress bar.

The cache would be invalidated somehow when the datawarehouse is updated.

On deployment, we could even use the redis server used for wikichron (but I'd prefer we use a different one, to avoid disrupting it).

daviddavo · 2022-09-05T08:50:33Z

The main problem with flask-caching is that it is an inter-process cache, while the CacheRequester is inter-thread (but intra-process). Every process has its own CacheRequester, which are not synchronized between them.

If we do the CacheRequester something inter-process, we will need to use the filesystem, which will make the CacheRequester useless. So we can either:

Make CacheRequester an inter-process cache (using a backend like Redis)
Remove the caching from CacheRequester and make it so the functions that use CacheRequester have to make sure of not requesting a lot.

Perhaps we could even implement both options (use Redis only if a redis backend is specified)

Furthermore, what will happen if we have two users trying to do the same computation? I think that, with our current approach (using just flask-caching), the following will happen:

Computation in P1 starts
Computation in P2 starts
Computation in PX (either P1 or P2) finishes, and the result is saved to the cache
Computation in PY (the other process) finishes, and the result is saved to the cache, overwritting the previous result.

I.e. the computation will be done more than necessary, consuming resources. This could be improved using a task queue, like Celery.

Celery is so well integrated with Flask, that you won't even need to install another package.

daviddavo · 2022-09-05T10:15:15Z

It seems that Dash Caching (with DiskCache as a backend) allows for fine-tuning the things we couldn't fine-tune with Flask-Caching, like adding a function to check whether to invalidate the cache (which checks the last time the dw changed).

Caching the entire callbacks won't be as space efficient as just caching the "computation functions" (the functions which make long computation and take a lot of time), but I think it will be easier to implement and will integrate better. It also adds a task queue, which will fix a lot of problems: if you go now to dao-analyzer.science, you'll have to wait a bit to see a graph because all callbacks start at the same time.

Check if a priority system could be implemented so tasks that render the app go before tasks that render the first few graphs, that go before the rest.

Furthermore, we could create our custom wrappers for more fine-tuning.

At the time, the only thing that has been memoized is PlatformDao, there was not a lot of hours put into the flask-caching issue.

The only drawback of using background callbacks instead of flask-caching is that if two or more callbacks use an intermediate function, the result of that intermediate function won't be cached, which would be great. Nevertheless, we still could have the option of creating a custom decorator with dash's Cache Managers, or even using flask-caching alongside.

tldr: I think that we should pivot to use background callbacks and its caching and ditch the work done with flask-caching. (Or maybe keep both).

javiag · 2022-09-05T15:52:28Z

Probably we should have a conversation about it. Anyway some thoughts.

Some users have reported very long (unacceptable indeed) loading times in the dropdown menu component while searching for a DAO. This is the kind of problem that we have to fix.

Another important issue is that, since we update data every day, for most situations the data cached will be deprecated each day, right? Moreover, on many days we won't have two users interested in the same DAO. According to this, we should prioritize the DAO platform page, especially the info in the two panels, (even pre-loading the cache) and worry less about the problem for each DAO and the tab data.

As a guiding principle, even in life, "better strive for the good than for the best". So, probably, it makes no sense to keep both lines of work alive, but try to use a good solution even if not perfect. In this case, I'd try to fix the unacceptable loading times that we are aware of and the loading time of the DAOplatform page.

I can be overlooking some other issues and technical problems... but as I said first, we should make a decision by having a conversation.

daviddavo · 2022-09-07T09:22:23Z

I can't make background callbacks work...

I tried making chart_controller use a background callback, but then all graphs will be the same.

This is because the callback is added to a list of callbacks using the following hashing method:

    @staticmethod
    def hash_function(fn):
        fn_source = inspect.getsource(fn)
        fn_str = fn_source
        return hashlib.sha1(fn_str.encode("utf-8")).hexdigest()

Because all functions have the same code (only the id of the output changes), the code will be the same, and the hashed will be equal among all functions. I think the only possible solution would be to define the callback functions using exec.

Nevertheless, I'll ask on the forum to see if anyone else has come with a solution.

daviddavo · 2022-09-07T10:01:27Z

Instead of creating multiple functions with the same code (one per each component), we could create just one of them, and use pattern matching callbacks for the output.

The main problem is that each controller (and thus, each callback) has a different adapter (and perhaps a different layout), so this is not possible without creating a custom "callback registry"...

daviddavo · 2022-09-07T10:08:00Z

I think I'll put this on hold for a while, and focus on the dropdown

daviddavo · 2022-09-08T08:07:37Z

I submitted a bug report to Dash

[BUG] Background callbacks with different outputs not working plotly/dash#2221

daviddavo · 2022-09-08T09:20:12Z

I managed to use background=True with the functions in main_view_controller. The improvement is not very noticeable in development, but perhaps it's noticeable after deployment.

daviddavo · 2022-09-24T10:22:31Z

I ended up with a branch with both Flask-Caching and Dash Background Callbacks. I can't test if it's working and it has a lot of problems... Moving this to low priority and help wanted.

daviddavo added the enhancement New feature or request label Jul 28, 2022

daviddavo added this to the 1.2.0 milestone Jul 28, 2022

daviddavo added this to To do (Low Priority) in Kanban via automation Jul 28, 2022

daviddavo moved this from To do (Low Priority) to In progress in Kanban Aug 1, 2022

daviddavo changed the title ~~Page takes time to load~~ Adding flask-caching Aug 11, 2022

daviddavo mentioned this issue Sep 6, 2022

Fix DropDown on page load #98

Open

daviddavo moved this from In progress to To do (High Priority) in Kanban Sep 7, 2022

daviddavo changed the title ~~Adding flask-caching~~ Adding caching Sep 8, 2022

daviddavo moved this from To do (High Priority) to In progress in Kanban Sep 8, 2022

daviddavo moved this from In progress to To do (High Priority) in Kanban Sep 12, 2022

daviddavo moved this from To do (High Priority) to In progress in Kanban Sep 20, 2022

daviddavo moved this from In progress to To do (Low Priority) in Kanban Sep 24, 2022

daviddavo added the help wanted Extra attention is needed label Sep 24, 2022

daviddavo removed this from the 1.2.0 milestone Sep 27, 2022

daviddavo added backend Tasks that modify the backend and removed help wanted Extra attention is needed labels Sep 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding caching #96

Adding caching #96

daviddavo commented Jul 28, 2022 •

edited

Loading

daviddavo commented Aug 4, 2022

daviddavo commented Aug 4, 2022

daviddavo commented Aug 4, 2022 •

edited

Loading

daviddavo commented Sep 5, 2022

daviddavo commented Sep 5, 2022

javiag commented Sep 5, 2022

daviddavo commented Sep 7, 2022 •

edited

Loading

daviddavo commented Sep 7, 2022

daviddavo commented Sep 7, 2022

daviddavo commented Sep 8, 2022

daviddavo commented Sep 8, 2022

daviddavo commented Sep 24, 2022

Adding caching #96

Adding caching #96

Comments

daviddavo commented Jul 28, 2022 • edited Loading

daviddavo commented Aug 4, 2022

daviddavo commented Aug 4, 2022

daviddavo commented Aug 4, 2022 • edited Loading

daviddavo commented Sep 5, 2022

daviddavo commented Sep 5, 2022

javiag commented Sep 5, 2022

daviddavo commented Sep 7, 2022 • edited Loading

daviddavo commented Sep 7, 2022

daviddavo commented Sep 7, 2022

daviddavo commented Sep 8, 2022

daviddavo commented Sep 8, 2022

daviddavo commented Sep 24, 2022

daviddavo commented Jul 28, 2022 •

edited

Loading

daviddavo commented Aug 4, 2022 •

edited

Loading

daviddavo commented Sep 7, 2022 •

edited

Loading