
Improve support for long lists #40

Open · malexmave opened this issue Feb 22, 2018 · 3 comments

@malexmave (Contributor)

Right now, long (>500 sites) lists cause a number of problems.

  • If they are very long, they cannot be submitted properly due to timeouts.
  • Even 500-site lists take ages to load in the frontend (10+ seconds, spent waiting for the server).
  • Newly submitted lists sometimes apparently aren't queued correctly for scanning.

We'll probably need some major refactoring and optimization to support lists of this size, but the end goal should be that loading a list of 1000 websites takes no more than a second. Profiling to figure out where the bottlenecks are would probably be the best starting point (perhaps in a staging / dev environment).
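A minimal sketch of such profiling using only the standard library (`render_list_page` is a hypothetical stand-in for whatever function actually builds the list page):

```python
import cProfile
import io
import pstats

def profile_call(func, *args, **kwargs):
    """Run func under cProfile and print the 20 most expensive calls."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args, **kwargs)
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(20)
    print(stream.getvalue())
    return result

# e.g. profile_call(render_list_page, list_id)  # render_list_page is hypothetical
```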

@muelli (Contributor) commented Feb 22, 2018

I think the first thing we want to do is to avoid rendering the very same data twice, e.g. by caching a rendered page and invalidating it when the underlying data changes.

Long term, we should probably think about not rendering on the server side at all and letting the clients do that instead. While this is a bigger change that we should discuss another time, we can keep it in mind when building stop-gap measures.
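For instance, a rough sketch of that stop-gap using Django's low-level cache API (assuming the existing Django stack; the template name and `site_list` model here are made up, not the project's real ones):

```python
from django.core.cache import cache
from django.template.loader import render_to_string

CACHE_TTL = 60 * 60  # expiry as a safety net in case an invalidation is missed

def render_list_page(site_list):
    """Serve the rendered list page from the cache when possible."""
    key = 'list-page:{}'.format(site_list.pk)
    html = cache.get(key)
    if html is None:
        html = render_to_string('list_detail.html', {'list': site_list})
        cache.set(key, html, CACHE_TTL)
    return html

def invalidate_list_page(site_list):
    """Call this wherever new scan results for the list are saved."""
    cache.delete('list-page:{}'.format(site_list.pk))
```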

@wichmannpas (Member)

Profiling some months ago revealed that most of the loading time of the list pages is spent parsing the JSON results of the relevant scans (about 80-90% of the page loading time, IIRC).
As the database uses a JSONB field, performance could be improved dramatically by loading only the required attributes of the JSON result, thus reducing the amount of data that needs to be serialized by Postgres and, more importantly, parsed by Python. That would require some way for checks to announce the keys of the result dict they require. The full result contains a lot of data that the checks do not need (e.g., long lists of headers for all requests).
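A sketch of what that could look like through the Django ORM (model, field, and key names are assumptions, not the project's real schema):

```python
from django.contrib.postgres.fields.jsonb import KeyTextTransform

def load_partial_results(queryset, required_keys):
    """Fetch only the announced keys from a JSONB 'result' column.

    Each annotation compiles to "result" ->> '<key>' in SQL, so Postgres
    serializes, and Python parses, only the requested fragments instead of
    the whole result document. Note that ->> returns text, so nested values
    would still need json.loads on the Python side.
    """
    annotations = {
        key: KeyTextTransform(key, 'result') for key in required_keys
    }
    return queryset.annotate(**annotations).values('pk', *required_keys)

# e.g. load_partial_results(ScanResult.objects.all(), ['https', 'mixed_content'])
# where ScanResult and the key names are hypothetical
```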

@malexmave (Contributor, Author)

> That would require some way for checks to announce the keys of the result dict they require.

That would make it harder to work with old results whenever we change the structure of the dict (e.g. with #39 or future PRs). But, on the other hand, we'll have that problem either way.

In a sense, we already have that information: it's encapsulated in the nasty dictionary that contains the logic for the ratings, where each check announces which fields it wants to see. We probably shouldn't rely too heavily on that part of the code staying as it is, though, since it will have to be redone at some point (at the latest once we add custom ranking schemes, see #32 and #33).
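As a minimal sketch, the announcement could also live on the checks themselves, decoupled from the ratings dictionary (the class names are hypothetical, and the union of keys is what a partial fetch like the one sketched above would receive):

```python
class Check:
    required_keys = ()  # keys of the result dict this check reads

    def evaluate(self, partial_result):
        raise NotImplementedError

class MixedContentCheck(Check):
    required_keys = ('mixed_content',)

    def evaluate(self, partial_result):
        return not partial_result.get('mixed_content')

def keys_for(checks):
    """Union of all result keys the given checks need, for a partial fetch."""
    return sorted({key for check in checks for key in check.required_keys})
```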
