
Improve support for long lists #40

Open · malexmave opened this issue Feb 22, 2018 · 3 comments

@malexmave (Contributor)

Right now, long (>500 sites) lists cause a number of problems.

  • If they are very long, they cannot be submitted properly due to timeouts.
  • Even 500-site lists take ages to load in the frontend (10+ seconds, spent waiting for the server).
  • Newly submitted lists sometimes apparently aren't queued correctly for scanning.

We'll probably need some major refactoring and optimization to support lists of this size, but the end goal should be that loading a list of 1000 websites takes no more than a second. Profiling to figure out where the bottlenecks are would probably be the best starting point (perhaps in a staging / dev environment).
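A minimal sketch of such profiling using only the standard library (`render_list_page` is a hypothetical stand-in for whatever function actually builds the list page):

```python
import cProfile
import io
import pstats

def profile_call(func, *args, **kwargs):
    """Run func under cProfile and print the 20 most expensive calls."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args, **kwargs)
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(20)
    print(stream.getvalue())
    return result

# e.g. profile_call(render_list_page, list_id)  # render_list_page is hypothetical
```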

@muelli (Contributor) commented Feb 22, 2018

I think the first thing we want to do is to avoid rendering the very same data twice, e.g. by caching a rendered page and invalidating it when the underlying data changes.

Long term, we should probably think about not rendering on the server side at all and letting the clients do that instead. While this is a bigger change that we should discuss another time, we can keep it in mind when building stop-gap measures.
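For instance, a rough sketch of that stop-gap using Django's low-level cache API (assuming the existing Django stack; the template name and `site_list` model here are made up, not the project's real ones):

```python
from django.core.cache import cache
from django.template.loader import render_to_string

CACHE_TTL = 60 * 60  # expiry as a safety net in case an invalidation is missed

def render_list_page(site_list):
    """Serve the rendered list page from the cache when possible."""
    key = 'list-page:{}'.format(site_list.pk)
    html = cache.get(key)
    if html is None:
        html = render_to_string('list_detail.html', {'list': site_list})
        cache.set(key, html, CACHE_TTL)
    return html

def invalidate_list_page(site_list):
    """Call this wherever new scan results for the list are saved."""
    cache.delete('list-page:{}'.format(site_list.pk))
```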

@wichmannpas (Member)

Profiling some months ago revealed that most of the loading time of the list pages is spent parsing the JSON results of the relevant scans (about 80-90% of the page loading time, IIRC).
As the database uses a JSONB field, performance could be improved dramatically by loading only the required attributes of the JSON result, thus reducing the amount of data that needs to be serialized by Postgres and, more importantly, parsed by Python. That would require some way for checks to announce the keys of the result dict they require. The full result contains a lot of data that the checks do not need (e.g., long lists of headers for all requests).
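A sketch of what that could look like through the Django ORM (model, field, and key names are assumptions, not the project's real schema):

```python
from django.contrib.postgres.fields.jsonb import KeyTextTransform

def load_partial_results(queryset, required_keys):
    """Fetch only the announced keys from a JSONB 'result' column.

    Each annotation compiles to "result" ->> '<key>' in SQL, so Postgres
    serializes, and Python parses, only the requested fragments instead of
    the whole result document. Note that ->> returns text, so nested values
    would still need json.loads on the Python side.
    """
    annotations = {
        key: KeyTextTransform(key, 'result') for key in required_keys
    }
    return queryset.annotate(**annotations).values('pk', *required_keys)

# e.g. load_partial_results(ScanResult.objects.all(), ['https', 'mixed_content'])
# where ScanResult and the key names are hypothetical
```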

@malexmave (Contributor, Author)

> That would require some way for checks to announce the keys of the result dict they require.

That would make it harder to work with old results whenever we change the structure of the dict (e.g. with #39 or future PRs). But, on the other hand, we'll have that problem either way.

In a sense, we already have that information: it's encapsulated in the nasty dictionary that contains the logic for the ratings, where each check announces which fields it wants to see. We probably shouldn't rely too heavily on that part of the code staying as it is, though, since it will have to be redone at some point (at the latest once we add custom ranking schemes, see #32 and #33).
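As a minimal sketch, the announcement could also live on the checks themselves, decoupled from the ratings dictionary (the class names are hypothetical, and the union of keys is what a partial fetch like the one sketched above would receive):

```python
class Check:
    required_keys = ()  # keys of the result dict this check reads

    def evaluate(self, partial_result):
        raise NotImplementedError

class MixedContentCheck(Check):
    required_keys = ('mixed_content',)

    def evaluate(self, partial_result):
        return not partial_result.get('mixed_content')

def keys_for(checks):
    """Union of all result keys the given checks need, for a partial fetch."""
    return sorted({key for check in checks for key in check.required_keys})
```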
