Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Edge performance issues #2881

Open
ishank011 opened this issue May 17, 2022 · 5 comments
Open

Edge performance issues #2881

ishank011 opened this issue May 17, 2022 · 5 comments
Assignees
Labels
bug Something isn't working

Comments

@ishank011
Copy link
Contributor

ishank011 commented May 17, 2022

When calling the GET /ocs/v1.php/apps/files_sharing/api/v1/shares?include_tags=false&state=all&shared_with_me=true endpoint, each received share is statted 23 times (twice via the Stat call, the rest of the times via the ListStorageSpaces call), since ListStorageSpaces is called every time any type of operation needs to be performed.

Enabling the storageprovider cache in the gateway reduces this to 12 stats for subsequent calls. For an actual heterogenous, distributed system, this makes the platform unusable.

The performance is further worsened by two calls to list all the received shares for EVERY received share in the shares storage provider.

Note that these numbers have been obtained after the change to route requests by mount IDs was merged.

cc @labkode @butonic @micbar @dragotin

@ishank011
Copy link
Contributor Author

One suggestion would be to not call ListStorageSpaces for every lookup. Just verify that the provider ID or mountpoint matches, return that and assume that it’ll handle the actual storage request.

I tried to implement something related to this but since generating the space path requires attributes of the space a reference belongs to, I couldn’t. The gateway storage provider constructs a relative reference using this path and further uses that in the request. IMO this should be delegated to the storage provider itself.

@butonic
Copy link
Contributor

butonic commented May 20, 2022

The gateway storage provider constructs a relative reference using this path and further uses that in the request. IMO this should be delegated to the storage provider itself.

I always wanted to keep all path handling in the registry: it should know the full path to every space root. That would have been a very simple design, but it only works for a spaces and id first approach.

For path based requests the space registry might only be responsible for the prefix that leads to the storage provider. But this is misleading as a storage provider can provide both: personal and project spaces. A more precise description would be that the storage registry is responsible to know the path prefix for a set of spaces that lives on the same storage provider (or even multiple ones). That would allow defining which storage providers to query for a given type or path.

The storage drivers however would have to ba able to deal with path based requests. Something we deliberately got rid of.

Maybe the storageprovider can deal with the space lookup for a path. It could make a ListStorageSpaces call with a path filter to get the space and then call the actual method.

@butonic
Copy link
Contributor

butonic commented May 20, 2022

cc @kobergj @aduffeck ^

@butonic butonic self-assigned this May 20, 2022
@butonic butonic added the bug Something isn't working label May 20, 2022
@butonic
Copy link
Contributor

butonic commented May 23, 2022

We have to differentiate to types of requests as they follow different lookup semantics:

  1. relative requests that have a ResourceID and a Path set
  2. Path based requests

For 1. we added #2792 which should use the providerids that are provided in ocis since: owncloud/ocis#3619. ocdav does a LookUpStorageSpaceByID call which contains an ID filter. The gateway storageprovider splits the spaceid into a providerid and a nodeid. The space registries ListProvider then makes a findProvidersForResource call which currently then makes a ListStorageSpaces call on the storageprovider with the matching providerid or a provider without an id set.

I think we mixed responsibilities. An the reva storage registry may not need to do an actual ListStorageSpaces call on the provider as the reva storage registry is only responsible for routing. To route a PROPFIND or MKCOL request the ocdav service does not need to know the etag or mtime of the storage space. The gateway certainly does not need it for routing.

I may have failed to clarify that the graph service, which lists the spaces available to a user really is the only service that needs to be aware of etags and mtimes...

🤔

For 2. the routing happens based on path prefixes. The reva storage registry can match requests based on path prefix and The gateway should IMO truncate the path segments it can consume before forwarding the request to the storage provider (/users/e/einstein/relative/path is matched by /users/[a-k], but only the provider path /users is prefix trimmed, leading to e/einstein/relative/path reatching the storage provider).

The storage provider should then use the storage driver interface to look up the space for the given path. In the running example from above e/einstein could be the space alias that can be used to build a relative reference with a storage id and a path relative/path.

Now, while I would prefer making the storage driver interface always requires relative references and the storage provider always using a yet to specify ResolveSpaceAlias or ListStorageSpaces() call with a filter on the alias, I see the tradeoff when using path only navigation: all the path to space navigation may be unnecessary to reach a resource.

But paths are fragile and the web ui as well as all clients will always use relative requests in the future.

I think both types of requests can be made efficient without an additional call to a storage space. Only the ListStorageSpaces call needs to make actual calls on the providers ... which it currently does anyway...

🤔

I'll check if I can kill the ListStorageSpaces call in the spaces registry. It should be possible because we know the providerid.

@micbar
Copy link
Member

micbar commented Jun 15, 2022

@butonic can you link the PR which reduces the stat requests here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants