
move unwrapping and wrapping of paths to the gateway #2016

Closed
wants to merge 1 commit

Conversation

C0rby
Contributor

@C0rby C0rby commented Aug 24, 2021

We've moved the wrapping and unwrapping of reference paths to the storage gateway so that the storage provider doesn't have to know its mount path.

To test this with ocis you need these changes: owncloud/ocis#2460

Fixes #578 (comment)
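For illustration, a minimal Go sketch of the idea (the type and helper names here are invented for this example and are not the actual reva gateway code): the gateway strips the provider's mount path from an incoming reference before forwarding it, and prepends it again to paths in the response, so the storage provider only ever sees provider-relative paths.

package main

import (
    "fmt"
    "path"
    "strings"
)

// providerInfo is a simplified stand-in for what the storage registry returns.
type providerInfo struct {
    Address   string
    MountPath string // known to the registry/gateway only, not to the provider
}

// unwrap turns a global CS3 path into a provider-relative path before the
// gateway forwards the request to the storage provider.
func unwrap(p providerInfo, globalPath string) string {
    rel := strings.TrimPrefix(globalPath, p.MountPath)
    if rel == "" {
        rel = "/"
    }
    return rel
}

// wrap turns a provider-relative path from a response back into a global path
// before the gateway hands it to the client.
func wrap(p providerInfo, relPath string) string {
    return path.Join(p.MountPath, relPath)
}

func main() {
    sp := providerInfo{Address: "localhost:11000", MountPath: "/users"}
    fmt.Println(unwrap(sp, "/users/a/alice/documents")) // /a/alice/documents
    fmt.Println(wrap(sp, "/a/alice/documents"))         // /users/a/alice/documents
}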

@update-docs

update-docs bot commented Aug 24, 2021

Thanks for opening this pull request! The maintainers of this repository would appreciate it if you would create a changelog item based on your changes.

@C0rby C0rby force-pushed the wrap-unwrap-gateway2 branch 7 times, most recently from 020861c to d30ee97 on August 31, 2021 13:46
@C0rby C0rby marked this pull request as ready for review August 31, 2021 14:02
@C0rby C0rby requested a review from labkode as a code owner August 31, 2021 14:02
@C0rby C0rby requested review from butonic and refs August 31, 2021 14:02
Member

@refs refs left a comment


Hm, logging in doesn't seem to work for me on this PR. I flushed /var/tmp/ocis but still nothing; when rebuilding the entire tree, it only creates directories.

@C0rby
Contributor Author

C0rby commented Sep 1, 2021

Hm, logging in doesn't seem to work for me on this PR. I flushed /var/tmp/ocis but still nothing; when rebuilding the entire tree, it only creates directories.

Yes, you're right. I forgot to link the ocis changes that are necessary to be able to test this here...

owncloud/ocis#2460

@C0rby C0rby self-assigned this Sep 3, 2021
@C0rby C0rby force-pushed the wrap-unwrap-gateway2 branch 3 times, most recently from e6672c3 to c5b1fdc on September 8, 2021 15:43
refs previously approved these changes Sep 23, 2021
@butonic
Contributor

butonic commented Oct 12, 2021

fixes #578

refs previously approved these changes Oct 13, 2021
@ishank011
Contributor

ishank011 commented Oct 27, 2021

@C0rby this won't work in all scenarios, especially when the registry rules don't necessarily correspond to the actual mount paths.

Consider this scenario where we have user spaces sharded across various storage providers, so the registry rules would look like
/users/[a-h] -> sp1
/users/[i-p] -> sp2
/users/[q-z] -> sp3

Since each of these storage providers has to serve requests corresponding to various users, in the existing scenario they would be mounted at a path which is the common prefix of all the paths they serve, in this case /users. And we would assume that the root of sp1 contains the directories a, b, c, ..., h.

So for a request to list /users/a/alice/documents, sp1 would trim the prefix /users (which would have been its mount path) and then forward the request /a/alice/documents to storage.FS. With this change, you'd be forwarding /alice/documents (as you're trimming the registry rule) and it won't work.
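A rough Go illustration of that concern (assuming the gateway trims whatever prefix the registry rule matched; the example is made up, it is not reva code):

package main

import (
    "fmt"
    "regexp"
    "strings"
)

func main() {
    request := "/users/a/alice/documents"

    // Existing behaviour: the provider trims its configured mount path.
    fmt.Println(strings.TrimPrefix(request, "/users")) // /a/alice/documents

    // Concern with this change: the gateway trims the prefix matched by the
    // registry rule /users/[a-h], which swallows the shard directory.
    rule := regexp.MustCompile("^/users/[a-h]")
    fmt.Println(rule.ReplaceAllString(request, "")) // /alice/documents
}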

@ishank011
Contributor

ishank011 commented Oct 27, 2021

You can get around this by storing the mount path along with the rule, but that would lead to duplicated config and wouldn't be the best design, so I don't really agree with the decision. What's your use case?

@butonic
Contributor

butonic commented Oct 27, 2021

@ishank011 All you are trying to do with this is shard the storage provider for /users.

In your example the three storage providers
/users/[a-h] -> sp1
/users/[i-p] -> sp2
/users/[q-z] -> sp3
are actually all 'mounted' under /users. IMO the storage registry should just return /users for them. Then the storage providers would receive requests like /a/alice/path/to/file.ext.

For more details please check #578 and the issues linked in it. Storage providers must not be aware of their mount point, or they are tightly coupled with the gateway. This is the first prerequisite of getting rid of the enable home logic that needlessly complicates the development of storage drivers as well as deployments. (The second step is moving the share folder logic from the gateway and storage providers into a dedicated share storage provider.)

@micbar
Member

micbar commented Oct 27, 2021

@ishank011 @labkode If you need a call, please contact me.

This change is vital for our shared understanding of the reva architecture and we need to unblock it very quickly.

@C0rby
Contributor Author

C0rby commented Oct 28, 2021

Basically what @butonic said. We don't want storage providers to know about their mount points.

Consider this scenario where we have user spaces sharded across various storage providers, so the registry rules would look like
/users/[a-h] -> sp1
/users/[i-p] -> sp2
/users/[q-z] -> sp3

An alternative would be:
/users -> sp1
/users -> sp2
/users -> sp3

The registry would return all 3 and the gateway forwards the request to them all but only the one configured for /a/... would return data. The other ones would return nothing.
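A hedged sketch of that fan-out in Go (the interface and function are invented for illustration, not the gateway's real API): the gateway queries every provider registered for the same mount point and keeps whatever the responsible shard returns.

package gateway

// storageProvider is a stand-in for the gRPC storage provider client.
type storageProvider interface {
    ListContainer(relPath string) ([]string, error)
}

// listOnAll forwards a listing to every provider registered for the same
// mount point; shards that don't hold the requested subtree return nothing
// (or an error) and are skipped, so only the responsible one contributes.
func listOnAll(providers []storageProvider, relPath string) []string {
    var entries []string
    for _, sp := range providers {
        res, err := sp.ListContainer(relPath)
        if err != nil || len(res) == 0 {
            continue
        }
        entries = append(entries, res...)
    }
    return entries
}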

@ishank011
Contributor

@butonic to store the mount path, we'd need to maintain it for each such rule, e.g. in tests/oc-integration-tests/local/gateway.toml:

"/users" = {"address" = "localhost:11000","provider_id" = "123e4567-e89b-12d3-a456-426655440000", "provider_path" = "/users"}
"123e4567-e89b-12d3-a456-426655440000" = {"address" = "localhost:11000", "provider_id" = "123e4567-e89b-12d3-a456-426655440000", "provider_path" = "/users"}

This sounds like a pretty bad design, as there are many more chances of misconfiguration and conflicts here. To get around this, you can store only the address along with the rules, and keep the mount path and IDs in a separate struct, which is essentially what we're doing now.
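Roughly, the two designs being compared look like this (illustrative Go structs, not reva's actual config types): either every rule duplicates the mount path, or the rules carry only an address and the mount path and ID live once in a separate provider struct.

package registry

// Option A, criticised above: every rule repeats the provider path, so the
// same value is duplicated across the path rule and the ID rule.
type ruleWithMount struct {
    Address      string
    ProviderID   string
    ProviderPath string
}

// Option B, roughly the current design: rules map only to an address, while
// mount path and ID are kept once per provider in a separate struct.
type rule struct {
    Address string
}

type provider struct {
    Address   string
    ID        string
    MountPath string
}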

This is the first prerequisite of getting rid of the enable home logic that needlessly complicates the development of storage drivers as well as deployments.

So the enable home logic is not a prerequisite at all for a storage driver. If it only wants to support global URLs, that is supported as is. We just have that logic in the same driver to avoid code duplication, since after the path translation, it's essentially the same.

From this change, we can support both types of requests from the same storage provider, which is an advantage. But you're just moving that logic from the driver to the registry. So your rule would look like

"/home" = {"address" = "localhost:11000","provider_id" = "123e4567-e89b-12d3-a456-426655440000", "provider_path" = "prepend /users/{{substr 0 1 .Username}}/{{.Username}} to the request path??"}

This is making things more complicated for me. At the moment, the gateway just calls the registry to get a list of SPs, the registry does the dirty, complicated work of expanding aliases and matching, and we pass the request as it is to the SPs. There's a clear separation of concerns, which will be disturbed by the change.

The second step is moving the share folder logic from the gateway and storage providers into a dedicated share storage provider.

I completely agree with this. But there again, I don't see the harm in the storage providers trimming the mount paths.

@ishank011
Contributor

The registry would return all 3 and the gateway forwards the request to them all but only the one configured for /a/... would return data. The other ones would return nothing.

Sorry, this just sounds really bad latency-wise. We have 9 EOS instances and sending a request to each of them for every operation is not a good idea.

@butonic
Contributor

butonic commented Oct 28, 2021

@ishank011 you are absolutely right: the storage registry is responsible for finding the correct storage provider for a request. But why does the storage provider have to cut off anything from the path segments if the registry already did the 'hard work'? That IMO is a clear violation of the separation of concerns.

In the global CS3 namespace /users is just a human readable alias for the storage provider that is responsible for user homes.

If we want to shard the user provider we can register them for /users/[a-k] and /users/[l-z]. In a way we are reinventing the wheel by implementing a poor router: the shiftPath approach used by the rhttp package would examine head (users) and tail (a/Alice/...). It would do a strings.HasPrefix or a regex match on the tail and then hand over further processing to the next handler, determined by the outcome of the regex.
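For reference, a minimal version of the shiftPath idiom mentioned above (a sketch; the actual rhttp implementation may differ in detail):

package router

import (
    "path"
    "strings"
)

// shiftPath splits the first segment off a path: "/users/a/Alice/file.ext"
// becomes head "users" and tail "/a/Alice/file.ext". A router examines the
// head, does a prefix or regex match on the tail and hands further
// processing to the next handler.
func shiftPath(p string) (head, tail string) {
    p = path.Clean("/" + p)
    if i := strings.Index(p[1:], "/") + 1; i > 0 {
        return p[1:i], p[i:]
    }
    return p[1:], "/"
}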

While the storage registry implementation becomes a little more complex, the storage provider becomes less complex: we can completely drop the mount_path config option for the latter. Any wrap / unwrap logic in a driver will only deal with translating the received path to an internal reference, without having to also cut off a prefix. The gateway is responsible for doing that: it can ask the storage registry about the mount path, cut it off and prepend paths properly in responses.

A storage provider should not know its mount path: see #578, this is already a cause of misconfiguration.

A static storage registry that uses a toml file to configure routing policies as key value pairs may not be sufficient. But changing the static implementation may be a feasible short term solution. In the long run it IMO does not make sense to have a static configuration: the registry should query the storage providers for the storage spaces and their aliases. That would build up a proper routing table... But out of scope for now...

@butonic
Contributor

butonic commented Oct 29, 2021

@ishank011 can you provide an example config for the virtual views? We'd like to cover this with test cases, but it is not clear how to configure the mount path for storage providers. I assume the storage providers are all configured with mount point /users and the registry uses multiple regex rules. Anything else we need to take into account? I did not find an example toml...

Ummm also ... would the two entries in the storage registry have the same id?

# mount two storage providers for users
"/users/[a-k]" = {"address" = "localhost:11000"}
"123e4567-e89b-12d3-a456-426655440000" = {"address" = "localhost:11000"}
"/users/[l-z]" = {"address" = "localhost:11001"}
"123e4567-e89b-12d3-a456-426655440000" = {"address" = "localhost:11001"}

And how do you shard /home ?

@ishank011
Contributor

@butonic sharding for home storage providers happens via the aliases feature in the static storage registry.

Consider the following registry rules:

{
 "/home": {
  "aliases": {
   "/home-[a-l]": "localhost:18000",
   "/home-[m-z]": "localhost:18001",
  },
  "mapping": "/home-{{substr 0 1 .Id.OpaqueId}}"
 },
 "/reva/user/[a-l]": {
  "address": "localhost:16000" -> mounted at /reva/user
 },
"reva-[a-l]": {  // This is the ID, not path
  "address": "localhost:16000"
 },
 "/reva/user/[m-z]": {
  "address": "localhost:16001" -> mounted at /reva/user
 },
"reva-[m-z]": {  // This is the ID, not path
  "address": "localhost:16001"
 },
 "/reva/project": {
  "address": "localhost:14000" -> mounted at /reva/project
 },
 "revaproject": {  // This is the ID, not path
  "address": "localhost:14000"
 },
 "/public": {
  "address": "localhost:13000"
 }
}

Now for a request to /reva, the gateway would send a request to 16000, 16001 and 14000.
At localhost:16000, the storage provider would list /, which would return /a, /b, /c, .... Mounting these, it would get /reva/user/a, /reva/user/b. Now since it needs to list /reva, it would condense these to give just the immediate child, so it would return /reva/user. localhost:16001 would also return /reva/user. The gateway would merge both of these. Similarly, the gateway would get /reva/project, so it would return these two paths.

Now consider a request to list /; it should go to all rules which work via path. In this case, both localhost:16000 and localhost:14000 would return /reva after condensing the paths. These would be further aggregated in the gateway. The /public sp would return an error, so we'll ignore that. It's similar to what @C0rby suggested, but this is carried out only in extreme cases. We can't send such requests for every storage operation.
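A hedged sketch of the 'condense to the immediate child' step described above (helper name and signature invented for illustration): the gateway mounts a provider-relative entry at the provider's mount path and then collapses it to the first segment below the requested listing path.

package gateway

import (
    "path"
    "strings"
)

// condense mounts a provider-relative entry at mountPath and, if the result
// lies below the requested listing path, collapses it to the immediate child
// of that path: listing "/reva" with mountPath "/reva/user" and entry "/a"
// yields "/reva/user"; listing "/" yields "/reva".
func condense(listPath, mountPath, entry string) string {
    full := path.Join(mountPath, entry)       // e.g. /reva/user/a
    rel := strings.TrimPrefix(full, listPath) // e.g. /user/a when listing /reva
    rel = strings.TrimPrefix(rel, "/")
    if rel == "" {
        return listPath
    }
    child := strings.SplitN(rel, "/", 2)[0]
    return path.Join(listPath, child)         // e.g. /reva/user
}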

@butonic
Contributor

butonic commented Oct 29, 2021

as mentioned in #2215 (comment) the sharestorageprovider brings a new findEmbeddedProvider() that we can extend to handle 'virtual views'.

@labkode
Member

labkode commented Oct 29, 2021

as mentioned in #2215 (comment) the sharestorageprovider brings a new findEmbeddedProvider() that we can extend to handle 'virtual views'.

let's not mix the sharestorageprovider with the virtual views. I like the concept, and we could refactor it, but the sharestorageprovider is for sharing only.

@butonic
Contributor

butonic commented Oct 29, 2021

The sharestorageprovider PR not only introduces a new storage provider. It also has to rewrite the etag calculation logic in the gateway for the root folder, the home folder, the share jail and the shares. To implement that properly it has to determine which storage provider is embedded in which. That logic is not complicated, but it does change how the gateway works in a fundamental way. It not only calculates the etag and mtime, but also the size aggregation and the tree. AFAICT the sharestorageprovider PR solves the same problem that was addressed with the virtual views in #2215: aggregating metadata for 'virtual' nodes in the cs3 global namespace that exist along the path segments of configured mount points.

@butonic
Contributor

butonic commented Oct 29, 2021

First things first. We'll write a test to cover this: #2219

@butonic
Contributor

butonic commented Nov 2, 2021

started an acceptance test in #2219

@butonic butonic mentioned this pull request Nov 4, 2021
@dragotin
Contributor

dragotin commented Nov 8, 2021

Tests from @butonic in #2219 are merged. Can we continue on this one?

@butonic
Contributor

butonic commented Dec 7, 2021

merged in edge as part of #2234

@butonic butonic closed this Dec 7, 2021
Development

Successfully merging this pull request may close these issues:

Separation of concerns: storage providers should not know about their mount path