
SrcFsType can't be used to reliably detect backend in metadata mapper #7848

Closed
chscott opened this issue May 15, 2024 · 8 comments
Labels: bug, Support Contract (Issues made for customers with support contracts)
Milestone: v1.67

Comments

chscott commented May 15, 2024

The associated forum post URL from https://forum.rclone.org

N/A

What is the problem you are having with rclone?

The metadata mapper receives a JSON object from rclone that contains the SrcFsType. We've built our mapper to detect the values drive and onedrive so the appropriate mapping logic can be invoked, returning to rclone a JSON object compatible with the target.

In some cases (e.g., a Google doc), SrcFsType is set to an unexpected value like object.memoryFs, and we don't handle that type. I could further inspect the object to look for something like content-type and infer that a value of application/vnd.google-apps.document means a drive backend, but I'd have to identify and maintain a list of content types to look for. I'm also not sure if object.memoryFs is the only oddball value I need to recognize.
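
For reference, this is roughly the shape of our mapper (a stripped-down Go sketch; the per-backend helpers stand in for our real logic, and the struct only decodes the fields used here). rclone passes the JSON blob to the program on stdin and reads the replacement blob from stdout:

```go
// mapper.go: invoked by rclone via --metadata-mapper ./mapper
package main

import (
	"encoding/json"
	"log"
	"os"
)

// blob holds the subset of the JSON rclone sends that this sketch uses.
type blob struct {
	SrcFsType string            `json:"SrcFsType"`
	Metadata  map[string]string `json:"Metadata"`
}

func main() {
	var in blob
	if err := json.NewDecoder(os.Stdin).Decode(&in); err != nil {
		log.Fatal(err)
	}

	switch in.SrcFsType {
	case "drive":
		in.Metadata = mapDriveMetadata(in.Metadata)
	case "onedrive":
		in.Metadata = mapOneDriveMetadata(in.Metadata)
	default:
		// The failing case from this issue: "object.memoryFs" lands here,
		// and there is no reliable way to tell which backend the file
		// actually came from.
	}

	// Return the (possibly modified) metadata to rclone on stdout.
	out := map[string]map[string]string{"Metadata": in.Metadata}
	if err := json.NewEncoder(os.Stdout).Encode(out); err != nil {
		log.Fatal(err)
	}
}

// Stand-ins for our real per-backend mapping logic.
func mapDriveMetadata(m map[string]string) map[string]string    { return m }
func mapOneDriveMetadata(m map[string]string) map[string]string { return m }
```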

What is your rclone version (output from rclone version)

rclone v1.67.0-beta.7957.625622bcb.fix-onedrive-metadata
- os/version: Microsoft Windows 11 Pro 23H2 (64 bit)
- os/kernel: 10.0.22631.3527 (x86_64)
- os/type: windows
- os/arch: amd64
- go/version: go1.22.3
- go/linking: static
- go/tags: cmount

Which OS you are using and how many bits (e.g. Windows 7, 64 bit)

Windows 11, 64-bit

Which cloud storage system are you using? (e.g. Google Drive)

  • Google Drive
  • OneDrive/SharePoint

The command you were trying to run (e.g. rclone copy /tmp remote:tmp)

copy Source: Target:

A log from the command with the -vv flag (e.g. output from rclone -vv copy /tmp remote:tmp)

{"level":"debug","msg":"Metadata mapper sent: \n{\n\t\"SrcFs\": \"memory:\",\n\t\"SrcFsType\": \"object.memoryFs\",\n\t\"DstFs\": \"Target{VZpyf}:Test\",\n\t\"DstFsType\": \"onedrive\",\n\t\"Remote\": \"Sample doc.docx\",\n\t\"Size\": 317900,\n\t\"MimeType\": \"application/vnd.openxmlformats-officedocument.wordprocessingml.document\",\n\t\"ModTime\": \"2024-05-15T15:05:22.457Z\",\n\t\"IsDir\": false,\n\t\"Metadata\": {\n\t\t\"btime\": \"2024-05-15T14:56:11.061Z\",\n\t\t\"content-type\": \"application/vnd.google-apps.document\",\n\t\t\"copy-requires-writer-permission\": \"false\",\n\t\t\"mtime\": \"2024-05-15T15:05:22.457Z\",\n\t\t\"owner\": \"chad@cdsconsulting.co\",\n\t\t\"permissions\": \"[{\\\"emailAddress\\\":\\\"angie@cdsconsulting.co\\\",\\\"id\\\":\\\"14885772533033484759\\\",\\\"role\\\":\\\"writer\\\",\\\"type\\\":\\\"user\\\"},{\\\"emailAddress\\\":\\\"chad@cdsconsulting.co\\\",\\\"id\\\":\\\"09287294999424909072\\\",\\\"role\\\":\\\"owner\\\",\\\"type\\\":\\\"user\\\"}]\",\n\t\t\"starred\": \"false\",\n\t\t\"viewed-by-me\": \"true\",\n\t\t\"writers-can-share\": \"true\"\n\t}\n}\n","source":"fs/metadata.go:123","time":"2024-05-15T10:18:00.282537-05:00"}

chscott added the Support Contract label May 15, 2024

ncw commented May 15, 2024

Heads up @rclone/support - the "Support Contract" label was applied to this issue.

ncw added a commit that referenced this issue May 15, 2024
…apper

Before this change, on files which have unknown length (like Google
Documents) the SrcFsType would be set to "memoryFs".

This change fixes the problem by getting the Copy function to pass the
src Fs into a variant of Rcat.

Fixes #7848

ncw commented May 15, 2024

Google docs are handled in a special way in rclone.

We don't know their size until after they are downloaded so we can't use the standard copy routines. Instead we use the internal equivalent of rclone rcat which streams the result to the destination.

In this process rclone loses the knowledge of the source backend.
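
Roughly speaking, the streamed path looks like this (a hand-wavy Go sketch with invented names, not rclone's actual code), which is why the mapper saw "object.memoryFs":

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// memoryObject stands in for rclone's internal in-memory object; all
// names here are illustrative, not rclone's real API.
type memoryObject struct{ data []byte }

// rcatStyleUpload mimics the streaming path: it only ever sees a plain
// reader, so once the bytes are buffered the source backend type is
// gone, and the resulting object reports an in-memory Fs type.
func rcatStyleUpload(in io.Reader) (*memoryObject, error) {
	buf, err := io.ReadAll(in) // size is unknown up front, so buffer it all
	if err != nil {
		return nil, err
	}
	return &memoryObject{data: buf}, nil
}

func main() {
	obj, err := rcatStyleUpload(strings.NewReader("exported Google Doc bytes"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("uploaded %d bytes; the drive origin is lost\n", len(obj.data))
}
```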

I've had a go at fixing this here - can you give it a try? Note that this includes the fix for #7845 which is very relevant!

v1.67.0-beta.7962.0681eb1c7.fix-7848-metadata-mapper on branch fix-7848-metadata-mapper (uploaded in 15-30 mins)

ncw added the bug label May 15, 2024
ncw added this to the v1.67 milestone May 15, 2024

chscott commented May 15, 2024

The SrcFsType now looks good, but I've lost the DstFsType.

Now

"SrcFs": "Source{xrgXa}:Test",
"SrcFsType": "drive",
"DstFs": "//?/C:/Users/Chad/Logs/rclone-spool1942096378",
"DstFsType": "local",

Before

"SrcFs": "memory:",
"SrcFsType": "object.memoryFs",
"DstFs": "Target{VZpyf}:Test",
"DstFsType": "onedrive",

chscott commented May 15, 2024

Using v1.67.0-beta.7962.0681eb1c7.fix-7848-metadata-mapper, the behavior changes based on whether or not --streaming-upload-cutoff governs the transfer.

Without --streaming-upload-cutoff set

Note that the mapper is invoked twice for a single file.

{
    "level": "debug",
    "msg": "Metadata mapper sent: \n{\n\t\"SrcFs\": \"Source{xrgXa}:Test\",\n\t\"SrcFsType\": \"drive\",\n\t\"DstFs\": \"//?/C:/Users/Chad/Domains/cdsconsulting.co/logs/Copy-GOUserDrives_20240515_160249/rclone-spool259196758\",\n\t\"DstFsType\": \"local\",\n\t\"Remote\": \"Sample doc.docx\",\n\t\"Size\": -1,\n\t\"MimeType\": \"application/vnd.openxmlformats-officedocument.wordprocessingml.document\",\n\t\"ModTime\": \"2024-05-15T15:05:22.457Z\",\n\t\"IsDir\": false,\n\t\"Metadata\": {\n\t\t\"btime\": \"2024-05-15T14:56:11.061Z\",\n\t\t\"content-type\": \"application/vnd.google-apps.document\",\n\t\t\"copy-requires-writer-permission\": \"false\",\n\t\t\"mtime\": \"2024-05-15T15:05:22.457Z\",\n\t\t\"owner\": \"user1@cdsconsulting.co\",\n\t\t\"permissions\": \"[{\\\"emailAddress\\\":\\\"user2@cdsconsulting.co\\\",\\\"id\\\":\\\"14885772533033484759\\\",\\\"role\\\":\\\"writer\\\",\\\"type\\\":\\\"user\\\"},{\\\"emailAddress\\\":\\\"user1@cdsconsulting.co\\\",\\\"id\\\":\\\"09287294999424909072\\\",\\\"role\\\":\\\"owner\\\",\\\"type\\\":\\\"user\\\"}]\",\n\t\t\"starred\": \"false\",\n\t\t\"viewed-by-me\": \"true\",\n\t\t\"writers-can-share\": \"true\"\n\t}\n}\n",
    "source": "fs/metadata.go:123",
    "time": "2024-05-15T16:03:00.789903-05:00"
}
{
    "level": "debug",
    "msg": "Metadata mapper sent: \n{\n\t\"SrcFs\": \"//?/C:/Users/Chad/Domains/cdsconsulting.co/logs/Copy-GOUserDrives_20240515_160249/rclone-spool259196758\",\n\t\"SrcFsType\": \"local\",\n\t\"DstFs\": \"Target{qpiyZ}:Test\",\n\t\"DstFsType\": \"onedrive\",\n\t\"Remote\": \"Sample doc.docx\",\n\t\"Size\": 317900,\n\t\"MimeType\": \"application/vnd.openxmlformats-officedocument.wordprocessingml.document\",\n\t\"ModTime\": \"2024-05-15T10:05:22.457-05:00\",\n\t\"IsDir\": false,\n\t\"Metadata\": {\n\t\t\"atime\": \"2024-05-15T10:05:22.457-05:00\",\n\t\t\"btime\": \"2024-05-15T09:56:11.061-05:00\",\n\t\t\"mode\": \"666\",\n\t\t\"mtime\": \"2024-05-15T10:05:22.457-05:00\"\n\t}\n}\n",
    "source": "fs/metadata.go:123",
    "time": "2024-05-15T16:03:01.167733-05:00"
}

With --streaming-upload-cutoff 100Mi set

Note that the mapper is invoked only once.

{
    "level": "debug",
    "msg": "Metadata mapper sent: \n{\n\t\"SrcFs\": \"Source{xrgXa}:Test\",\n\t\"SrcFsType\": \"drive\",\n\t\"DstFs\": \"Target{qpiyZ}:Test\",\n\t\"DstFsType\": \"onedrive\",\n\t\"Remote\": \"Sample doc.docx\",\n\t\"Size\": 317900,\n\t\"MimeType\": \"application/vnd.openxmlformats-officedocument.wordprocessingml.document\",\n\t\"ModTime\": \"2024-05-15T15:05:22.457Z\",\n\t\"IsDir\": false,\n\t\"Metadata\": {\n\t\t\"btime\": \"2024-05-15T14:56:11.061Z\",\n\t\t\"content-type\": \"application/vnd.google-apps.document\",\n\t\t\"copy-requires-writer-permission\": \"false\",\n\t\t\"mtime\": \"2024-05-15T15:05:22.457Z\",\n\t\t\"owner\": \"user1@cdsconsulting.co\",\n\t\t\"permissions\": \"[{\\\"emailAddress\\\":\\\"user2@cdsconsulting.co\\\",\\\"id\\\":\\\"14885772533033484759\\\",\\\"role\\\":\\\"writer\\\",\\\"type\\\":\\\"user\\\"},{\\\"emailAddress\\\":\\\"user1@cdsconsulting.co\\\",\\\"id\\\":\\\"09287294999424909072\\\",\\\"role\\\":\\\"owner\\\",\\\"type\\\":\\\"user\\\"}]\",\n\t\t\"starred\": \"false\",\n\t\t\"viewed-by-me\": \"true\",\n\t\t\"writers-can-share\": \"true\"\n\t}\n}\n",
    "source": "fs/metadata.go:123",
    "time": "2024-05-15T16:05:05.602204-05:00"
}

ncw commented May 16, 2024

It looks like the changes I made are working for --streaming-upload-cutoff 100M.

Some backends (such as onedrive) can't upload a file of unknown length. You can see these backends with StreamUpload = N in the overview table.

Google docs don't have a defined length - you have to download them to find out how long they are. So either we cache them in memory (if size < --streaming-upload-cutoff) or we download them to a local disk first (if size >= --streaming-upload-cutoff).

If we cache them in memory then you will just get the single transfer and the single invocation of the metadata mapper. If we are caching them on disk then you will get the transfer to the local disk (with metadata invocation) followed by the transfer from local disk to destination (with metadata invocation). Rclone takes some care that the metadata is correct when transferring via the local disk.
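
Something like this, schematically (names invented for illustration; only the cutoff logic is the point):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// streamingUploadCutoff mirrors --streaming-upload-cutoff; the value is
// only for illustration.
const streamingUploadCutoff = 100 * 1024

// spool buffers an unknown-length stream in memory if it is smaller than
// the cutoff, otherwise spools it to a temp file first (the second
// transfer seen in the logs above). It returns a replayable reader and a
// cleanup func.
func spool(in io.Reader) (io.Reader, func(), error) {
	head := make([]byte, streamingUploadCutoff)
	n, err := io.ReadFull(in, head)
	if err == io.EOF || err == io.ErrUnexpectedEOF {
		// size < cutoff: one transfer, one mapper invocation.
		return bytes.NewReader(head[:n]), func() {}, nil
	}
	if err != nil {
		return nil, nil, err
	}
	// size >= cutoff: stream everything to disk first.
	f, err := os.CreateTemp("", "rclone-spool")
	if err != nil {
		return nil, nil, err
	}
	cleanup := func() { f.Close(); os.Remove(f.Name()) }
	if _, err := io.Copy(f, io.MultiReader(bytes.NewReader(head[:n]), in)); err != nil {
		cleanup()
		return nil, nil, err
	}
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		cleanup()
		return nil, nil, err
	}
	return f, cleanup, nil
}

func main() {
	r, cleanup, err := spool(bytes.NewReader(make([]byte, 200*1024)))
	if err != nil {
		panic(err)
	}
	defer cleanup()
	n, _ := io.Copy(io.Discard, r)
	fmt.Println("spooled bytes:", n)
}
```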

In an ideal world we'd change the onedrive backend to accept streaming uploads. I had a quick review of the uploading methods and still think that you can't upload files to onedrive without knowing how big they are. There is a Stack Overflow question which confirms this, but you may have a better idea. This would be the best solution if possible.

In the meantime I'm going to attempt to rejig the Rcat code so that it doesn't use the rclone machinery to copy an object to local disk. That will mean writing a bit more code, but you'd only get one transfer. I don't think it would make the transfer less reliable. It would affect the stats slightly, but I suspect they are currently quite confusing!

ncw added a commit that referenced this issue May 16, 2024
… twice

The --metadata-mapper was being called twice for files that rclone
needed to stream to disk.

This happened only for:
- files bigger than --streaming-upload-cutoff
- on backends which didn't support PutStream

This also meant that these were being logged as two transfers which
was a little strange.

This fixes the problem by not using operations.Copy to upload the file
once it has been streamed to disk, instead using the Put method on the
backend.

This should have no effect on reliability of the transfers.

This also tidies up the Rcat function to make it clear there are 3
ways of uploading the data and make it easy to see that it gets
verified on all those paths.

See #7848

ncw commented May 16, 2024

I've had a go at fixing this by re-working the internals of rcat.

I've run the local and onedrive integration tests with this and I think it is looking OK, but it could do with more testing.

v1.67.0-beta.7967.12d4964f2.fix-7848-metadata-mapper on branch fix-7848-metadata-mapper (uploaded in 15-30 mins)
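
Schematically, the new shape is this (hypothetical interfaces standing in for the real fs types and signatures):

```go
package main

import (
	"context"
	"io"
	"os"
)

// objectInfo and backend are illustrative stand-ins for rclone's
// fs.ObjectInfo and fs.Fs; the real signatures differ.
type objectInfo struct {
	remote string
	size   int64
}

type backend interface {
	Put(ctx context.Context, in io.Reader, src objectInfo) error
}

// uploadSpooled is the post-fix shape: the spooled file is handed
// straight to the destination backend's Put, so rclone records one
// transfer and the metadata mapper runs once, with the original SrcFs.
func uploadSpooled(ctx context.Context, dst backend, spool *os.File, src objectInfo) error {
	fi, err := spool.Stat()
	if err != nil {
		return err
	}
	src.size = fi.Size() // the size is known now that the stream is on disk
	return dst.Put(ctx, spool, src)
}

// discardBackend is a trivial backend for demonstration.
type discardBackend struct{}

func (discardBackend) Put(_ context.Context, in io.Reader, _ objectInfo) error {
	_, err := io.Copy(io.Discard, in)
	return err
}

func main() {
	f, err := os.CreateTemp("", "spool")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	defer f.Close()
	f.WriteString("spooled data")
	f.Seek(0, io.SeekStart)
	if err := uploadSpooled(context.Background(), discardBackend{}, f, objectInfo{remote: "Sample doc.docx"}); err != nil {
		panic(err)
	}
}
```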


chscott commented May 16, 2024

In an ideal world we'd change the onedrive backend to accept streaming uploads. I had a quick review of the uploading methods and still think that you can't upload files to onedrive without knowing how big they are. There is a Stack Overflow question which confirms this, but you may have a better idea. This would be the best solution if possible.

My research turned up the same. Uploading files of unknown length does not seem possible with OneDrive/SharePoint.

The fix in v1.67.0-beta.7967.12d4964f2.fix-7848-metadata-mapper looks good to me. In particular, when I have --streaming-upload-cutoff unset:

  • The mapper is invoked only once.
  • SrcFsType and DstFsType have the expected values.
{
    "level": "debug",
    "msg": "Metadata mapper sent: \n{\n\t\"SrcFs\": \"Source{xrgXa}:Test\",\n\t\"SrcFsType\": \"drive\",\n\t\"DstFs\": \"Target{qpiyZ}:Test\",\n\t\"DstFsType\": \"onedrive\",\n\t\"Remote\": \"Sample doc.docx\",\n\t\"Size\": 317900,\n\t\"MimeType\": \"application/vnd.openxmlformats-officedocument.wordprocessingml.document\",\n\t\"ModTime\": \"2024-05-15T15:05:22.457Z\",\n\t\"IsDir\": false,\n\t\"Metadata\": {\n\t\t\"btime\": \"2024-05-15T14:56:11.061Z\",\n\t\t\"content-type\": \"application/vnd.google-apps.document\",\n\t\t\"copy-requires-writer-permission\": \"false\",\n\t\t\"mtime\": \"2024-05-15T15:05:22.457Z\",\n\t\t\"owner\": \"user1@cdsconsulting.co\",\n\t\t\"permissions\": \"[{\\\"emailAddress\\\":\\\"user2@cdsconsulting.co\\\",\\\"id\\\":\\\"14885772533033484759\\\",\\\"role\\\":\\\"writer\\\",\\\"type\\\":\\\"user\\\"},{\\\"emailAddress\\\":\\\"user1@cdsconsulting.co\\\",\\\"id\\\":\\\"09287294999424909072\\\",\\\"role\\\":\\\"owner\\\",\\\"type\\\":\\\"user\\\"}]\",\n\t\t\"starred\": \"false\",\n\t\t\"viewed-by-me\": \"true\",\n\t\t\"writers-can-share\": \"true\"\n\t}\n}\n",
    "source": "fs/metadata.go:123",
    "time": "2024-05-16T12:58:07.830524-05:00"
}

ncw added a commit that referenced this issue May 20, 2024
… twice

The --metadata-mapper was being called twice for files that rclone
needed to stream to disk.

This happened only for:
- files bigger than --streaming-upload-cutoff
- on backends which didn't support PutStream

This also meant that these were being logged as two transfers which
was a little strange.

This fixes the problem by not using operations.Copy to upload the file
once it has been streamed to disk, instead using the Put method on the
backend.

This should have no effect on reliability of the transfers as we retry
Put if possible.

This also tidies up the Rcat function to make the different ways of
uploading the data clearer and make it easy to see that it gets
verified on all those paths.

See #7848
ncw closed this as completed in faa5831 May 20, 2024

ncw commented May 20, 2024

Thanks for testing. I've merged this to master now, which means it will be in the latest beta in 15-30 minutes and released in v1.67.
