-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Speed up assets and accounts pages #988
base: main
Are you sure you want to change the base?
Conversation
…is relies on the fact that the index of the internal API already take care of auth. Running agains the simulation database, it takes 52s to load the asset page on average. Signed-off-by: Victor Garcia Reolid <victor@seita.nl>
Signed-off-by: Victor Garcia Reolid <victor@seita.nl>
…oint' into feature/crud/get-from-index-endpoint
The loading time for the asset page is still unacceptable. How we process our internal API responses needs some work, I believe. For example, for assets, we are going through considerable trouble to recreate Asset objects, from what our API gives us, without querying our database and keeping them out of our db session. But then we also assign to it an owner (via a db query), child assets (via recursively processing our internal API response), a parent (via a db query) and sensors (via a db query). This is all costing considerable time when loading the assets page. I see a couple of options:
Before you start work, know that I did a tech spike on these options, so I can quickly push some code once we decided. @nhoening your input is desired. |
All I want to maintain is that if possible we call the internal API, as then we know we use our one layer where authorization is defined. One or two internal API calls per UI page load should suffice, of course. We can definitely change anything after that. |
Signed-off-by: F.N. Claessen <felix@seita.nl>
Signed-off-by: F.N. Claessen <felix@seita.nl>
…l using internal API Signed-off-by: F.N. Claessen <felix@seita.nl>
Signed-off-by: F.N. Claessen <felix@seita.nl>
I implemented option 1, such that we still call the internal API for the auth check, but then reload the Assets from the db instead of separately querying its parents, owner, children and sensors. This further reduces the loading times on the assets page (I saw loading times of 17 seconds now). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that there are fewer assets in the /assets
listing compared to before this PR, so there must still be something wrong. Also, the asset icons (which are based on the asset type) are gone.
Should we run a profiler to check where the 17 seconds are spent? |
Signed-off-by: Victor Garcia Reolid <victor@seita.nl>
Signed-off-by: Victor Garcia Reolid <victor@seita.nl>
Signed-off-by: Victor Garcia Reolid <victor@seita.nl>
I tried it out locally and found the same. It turns out that we were only considering the assets of the |
flexmeasures/api/v3_0/assets.py
Outdated
@as_json | ||
def index(self, account: Account): | ||
def index(self, account: Account | None): | ||
"""List all assets owned by a certain account. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Owned or accessible? I think by now this index endpoint morphed into the latter. In which case, should we include the public assets? Unless an account is passed, of course.
The docstring would need a rewrite, too.
Technically, it's also a breaking change in the API. We could maintain backwards compatibility if we add a new optional field that signals the caller wants to get all accessible accounts instead of the assets their own account owns. Or a new endpoint, but I don't favour that option, because it's still an asset index endpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, good point. I've included the parameter all_accessible
(default=False).
This parameter signals if we want to get all the assets that the requesting user has read permissions or, by default, the ones that it owns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What might nicely round this up would be to revise check_access
to accept an as_account
argument that defaults to the current user, in combination with putting back the permissions decorator.
So then we'd have the following four cases:
- No
account
is passed, andall_accessible=false
: caller wants to list their own assets - No
account
is passed, andall_accessible=true
: caller wants to list all assets that they can access - An
account
is passed, andall_accessible=false
: caller wants to list some other account's assets - an
account
is passed, andall_accessible=true
: caller wants to list all assets that some other account can access. Caller requires read access on the passed account (checked using the decorator) and the passed account requires read access on the assets (to be checked using the revisedcheck_access
I suggested above).
One open point would be whether we should implicitly assume that, if the caller has access to account
, it also has access to all accounts that account
has access to. That is, we assume consultant permissions propagate.
Alternatively, we could call check_access
on the assets for both the caller and the account
(in case they are not the same). That is, we do not assume consultant permissions propagate.
Propagation has my preference. Whichever the case, we also might want to mention the choice of policy explicitly in the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An unlikely case but possible is the user to have consultancy
role but no read
role. Thus, I think we can't really bring permission_required_for_context
back. Perhaps, if len(accounts) == 0
, we should raise Forbidden
or Unauthorized
.
Regarding the changes to the check_access
, to my (superficial) understanding, they are already covered by the authorization policy:
- No account is passed, and all_accessible=false: caller wants to list their own assets
Given that account is not passed, the marshmallow field will default to AccountIdField.load_current
and check_access
will make sure that the current user has read permissions for the account.
- No account is passed, and all_accessible=true: caller wants to list all assets that they can access
In this scenario, the user could or couldn't have read
access to his/her own account. Yet, it could have a consultant
role and have access over the accounts that have the consultant_account_id
set to the user's account.
- An account is passed, and all_accessible=false: caller wants to list some other account's assets
check_access
make sure that the requesting user has read
permissions to its own account.
- An account is passed, and all_accessible=true: caller wants to list all assets that some other account can access. Caller requires read access on the passed account (checked using the decorator) and the passed account requires read access on the assets (to be checked using the revised check_access I suggested above).
Here probably we should raise if the current user has no read
permissions for its own account even if it has consultant role.
One open point would be whether we should implicitly assume that, if the caller has access to account, it also has access to all accounts that account has access to. That is, we assume consultant permissions propagate.
This is an interesting point, indeed. I think we should open a new issue for this and handle that on the Account __acl__
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this touches auth policies I want to make sure we get @nhoening 's opinion, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting solution!
- With the new structure, I guess we could deprecate the
/public
endpoint (as we can return public endpoints here), but if one * only* wants public assets, that would require loading a potentially long list and parsing the result. Or we add another parameteronly_public
. - Consultancy is not propagating across multiple levels of accounts, and this is not the time or place to go there. We don't need it. Let's leave authorization (and
check_access
where it is right now) - "An unlikely case but possible is the user to have consultancy role but no read role." I don't understand this. If you have that role, and your account is consultant of a client account, you can read client account data.
- I would also not do Case 4 ("An account is passed, and all_accessible=true: caller wants to list all assets that some other account can access."). Or is there a real use case? Why should we go to this trouble? We could say that both are not possible together and raise 422.
- Docstring can still be better: "List all assets that the user can access, on a given account (default: own) or all that the user has permission to read (excluding public)".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"An unlikely case but possible is the user to have consultancy role but no read role." I don't understand this. If you have that role, and your account is consultant of a client account, you can read client account data.
This is the case that a user has consultant roles but no doesn't have the read
. This means that it can read their client's account but not his/her.
I would also not do Case 4 ("An account is passed, and all_accessible=true: caller wants to list all assets that some other account can access."). Or is there a real use case? Why should we go to this trouble? We could say that both are not possible together and raise 422.
I agree, currently we don't need to check what a user belonging to a different account can list. Also, If we were to implement this, it gets more complex because the user roles + account roles + consultancy relationship determine if a user can read an asset.
…count Signed-off-by: Victor Garcia Reolid <victor@seita.nl>
…oint' into feature/crud/get-from-index-endpoint
Signed-off-by: Victor Garcia Reolid <victor@seita.nl>
Reminder: this PR requires an API changelog entry. |
Signed-off-by: Victor Garcia Reolid <victor@seita.nl>
…oint' into feature/crud/get-from-index-endpoint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two small refactoring ideas.
I tested it and I observed some speedup, so this is not bad.
With many many assets (like when we do simulations) the real solution is to not ask the API for all assets at once, but use server-side pagination). We do it on the client side now.
|
||
asset_ids_filter = [GenericAsset.id.in_(ad["id"] for ad in assets)] | ||
|
||
return db.session.scalars(select(GenericAsset).where(*asset_ids_filter)).all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What fields are we displaying which are not in the initial API response?
I'd wager it's the account details. Maybe a good idea to mention why we go to the database again (to gather all data not just direct asset fields)
@@ -89,7 +91,25 @@ def index(self, account: Account, include_inactive: bool = False): | |||
:status 403: INVALID_SENDER | |||
:status 422: UNPROCESSABLE_ENTITY | |||
""" | |||
users = get_users(account_name=account.name, only_active=not include_inactive) | |||
|
|||
if account is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't the code in here which deals with getting the accounts be similar to the code currently in the API endpoint for accounts?
It seems that one has two features we could also use here (take the current user's account if None passed, and maybe we want to raise if the user doesn't have access to the account they sent in.
Maybe worth a small utility function.
Description
Running against the simulation DB with 196 Assets and 59 users, the existing code takes about 134s to load the asset page. With the changes introduced in this PR, the asset pages loading time is brought down to 53s, which is still pretty high.
Further Improvements
Related Items
Closes #964