Drive File Stream Quota Management #1093
Comments
Interesting observation, thanks for sharing this.

Case 1: Accessing multiple files (i.e. during a search operation): I don't believe there is really anything we can do about it. Cryptomator is just the middleman between the process accessing files and the underlying file system. If a process decides to not just look at metadata but actually read from files, Cryptomator has to obey and access the corresponding ciphertext file, thus triggering a download.

Case 2: Accessing multiple blocks within a single large file: Of course there is no way to tell the underlying file system to "resume a download", since there is no API for this. However, we can investigate what our access pattern looks like. It should be:
It should not behave like:
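A hypothetical sketch of the two access patterns contrasted above (the chunk size and offsets are made up for illustration; they are not Cryptomator's actual values):

```python
# Hypothetical illustration of the two access patterns described above.
# Sequential, consecutive reads let a streaming backend resume a single
# download; scattered or repeated reads may each force a new request.

CHUNK = 32 * 1024  # assumed chunk size, for illustration only

def sequential_pattern(read):
    # "It should be": one pass, strictly increasing, consecutive offsets.
    for offset in range(0, 10 * CHUNK, CHUNK):
        read(offset, CHUNK)

def scattered_pattern(read):
    # "It should not behave like": repeated and non-consecutive offsets.
    for offset in (0, 5 * CHUNK, 0, 2 * CHUNK, 0):
        read(offset, CHUNK)
```

Logging the offsets that fuse-nio-adapter, dokany-nio-adapter, and cryptofs actually produce would show which of the two patterns we are in.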
Components we have to look at: fuse-nio-adapter (linux/mac), dokany-nio-adapter (win) and cryptofs. @cryptomator/libraries |
Case 1: Would it be possible to have an encrypted local DB (SQLite?) that could store basic file information, to try and manage the impact of this? Of course this would need to be optional and treated more like a cache, to manage all the syncing-related issues/conflicts. |
How would you define "basic file information"? For metadata like file name, size, and modification date, it is already not necessary to download the file. |
I suggest transforming this issue into a feature request and optionally opening a bug report to investigate the behaviour mentioned by @overheadhunter. Cryptomator is first of all designed to access locally stored files. In this case this wouldn't be a problem if a requested file is downloaded as a whole when it is needed, because then you can make as many filesystem calls as you want.
This means that for each call, a request is sent to the server, and each request counts toward the quota.
So, the crucial factor here is the number of filesystem calls. I know from the dokany-nio-adapter that for big files a lot of read requests are made. Another example:
Edit: Updated due to direct comment below. |
This is not entirely true. CryptoFS creates a file channel when it is asked to create one. It closes it when it is asked to close it. Between those two events the requester can read from the file. This is normal I/O behaviour for any process. The only thing CryptoFS does is read a bit more than requested, as it needs whole chunks in order to do the MAC checks. Due to chunk buffering, it won't read things twice, unless cache eviction happens. |
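The behaviour described here (reading whole chunks, caching them so overlapping requests don't hit the backend twice) could be sketched roughly like this; the chunk size, LRU policy, and capacity are illustrative assumptions, not CryptoFS's actual implementation:

```python
from collections import OrderedDict

CHUNK_SIZE = 32 * 1024   # illustrative only; not the real CryptoFS chunk size
CACHE_CAPACITY = 8       # illustrative cache size

class ChunkCache:
    """Reads whole chunks from an underlying source and caches them,
    so overlapping read requests do not fetch the same chunk twice."""

    def __init__(self, fetch_chunk):
        self.fetch_chunk = fetch_chunk   # callable: chunk_index -> bytes
        self.cache = OrderedDict()       # chunk_index -> bytes, LRU order

    def read(self, offset, length):
        # A read always covers whole chunks, even if fewer bytes were asked for.
        first = offset // CHUNK_SIZE
        last = (offset + length - 1) // CHUNK_SIZE
        data = b"".join(self._chunk(i) for i in range(first, last + 1))
        start = offset - first * CHUNK_SIZE
        return data[start:start + length]

    def _chunk(self, index):
        if index in self.cache:
            self.cache.move_to_end(index)       # refresh LRU position
        else:
            self.cache[index] = self.fetch_chunk(index)
            if len(self.cache) > CACHE_CAPACITY:
                self.cache.popitem(last=False)  # cache eviction
        return self.cache[index]
```

With this shape, a second read of the same region is served locally; only eviction (or reads past the cached range) reaches the backend again.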
@infeo I am not sure how much you can vary the chunk size, but depending on the use case it can take a long time to get banned by Google; up to a few hours. If you could reduce the request count by 10x, this might be enough to not hit Google's limits.

@overheadhunter I am rapidly approaching the limit of my technical expertise. Whatever data is needed for Windows Explorer to list the files in a directory, perform a search, or for another application to do a library scan. This could vary greatly depending on the use case. Perhaps it could include the last accessed blocks of a file up to a certain size limit; hopefully this would be enough to keep certain requests local to the PC. I am making a generalization here. |
What I can imagine is that Drive File Stream uses certain system features. On Windows, the filesystem can in some cases determine whether a file is being used by another program. Maybe Drive File Stream also has this ability and can continue streaming a file. If it used just some basic caching mechanism, it could detect that the same file is read twice. |
@whitephoenix117 Can you run similar tests with the Dokany mirror example? It would be interesting to see whether this application, which also uses the Windows API, quickly hits the limit. I added the log of my test run with it, and it shows that the reads are mostly consecutive. |
@infeo I have tried copying files directly from the vault to a local location using Windows Explorer. This triggers only a single download request to Google, and the file transfer rate is limited by your internet bandwidth, or whatever your system bottleneck is for places with fast internet. |
I think I got this correct, but I couldn't figure out how to get the debug version of Dokan to log. From the Google end it doesn't appear that it worked. Here is the chain of virtualization levels: Drive FS --> Cryptomator --> Dokan Mirror. The file was a video; it was accessed through the M:/ directory and I played the first 2 minutes. Here is the Google access log |
According to their Open Source Attribution, Drive FS uses Dokan/FUSE too. |
Ohh, I'm sorry, I was not totally clear. 🙈 I meant trying the mirror example without Cryptomator. Cryptomator is using Dokan to get an unencrypted view on your vault (the mounted drive). Mirror any directory on your File Stream drive and access it by e.g. streaming a movie file. Here is a small instruction on how to use it:
Regarding the Open Source Attribution: Interesting! But I think stacking these drivers into each other should not cause a problem. |
@infeo |
Anything else I can do to help with troubleshooting for this? |
Not that I know of. This feature is not very high on the priority list, so don't expect results soon. |
Thanks. I understand you set your priorities based on impact and the number of affected users, and this is not very high. Let me know if there is anything I can do to contribute. |
I'm not sure this is especially useful for troubleshooting, since the integration is completely different, but it appears Mountain Duck (https://mountainduck.io/) is a workaround for this issue. I am currently doing some more testing to confirm. |
@whitephoenix117, did you reach a conclusion on mountainduck? Does it let you stream files? Does it have quota issues? |
From what I can tell, Mountain Duck uses a different *block size* (not sure if this is the correct terminology) while streaming the encrypted data, such that it only sends one Google API request every few seconds instead of multiple. For my use case this seems sufficient to not trigger any quota issues; however, I am not sure if this is a "fix", perhaps more of a band-aid.
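If every block becomes its own request, the arithmetic behind this is simple; a sketch with made-up block sizes (the real values for Mountain Duck and Cryptomator are not known here):

```python
import math

def request_count(file_size, block_size):
    # Requests needed to stream a whole file if each block of
    # `block_size` bytes is fetched with its own download request.
    return math.ceil(file_size / block_size)

# Hypothetical numbers: streaming a 1 GiB file.
GiB = 1024 ** 3
small_blocks = request_count(GiB, 32 * 1024)       # small blocks -> many requests
large_blocks = request_count(GiB, 8 * 1024 * 1024) # larger blocks -> far fewer
```

A larger block size reduces the request count proportionally, which is consistent with the observation that fewer, larger requests stay under the quota.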
P.S. Mountain Duck has its own quirks. For managing online vs. offline files/sync, it's not as good as the first-party Google software. I don't have confidence to rely on it for uploading files, only for streaming (reading) them.
|
I believe this will fit the requirements for a bug. I apologize ahead of time for the length.
Note: as I am not a Google engineer, I am taking some liberties with how exactly Drive File Stream works, based on my observations.
Edit: Added Case 3
Background
Google Drive File Stream allows you to "stream" files from Drive without having them synchronized locally. This is very helpful for managing disk space.
Normally, when a file is accessed via Drive File Stream, Google's "magic" allows you to download only the specific portions of the file that are required for the task at hand, similar to how a physical disk only accesses the sectors of data needed. For example, if you are viewing a 90-minute video file, Drive FS will only download the blocks related to the 2 minutes that your player is locally buffering; this includes seeking to an arbitrary point within the video. Drive FS will then continue to download blocks as requested by the OS, just like a traditional disk.
Typically, when a file is accessed by "streaming", Drive FS will only create a single download request for a series of blocks from the disk. As more blocks are needed, it will "resume" this download until the needed blocks have been downloaded and provided to the OS. This resume process repeats as needed.
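The "resume" behaviour described above could be modeled roughly like this (a sketch under the assumption that a new backend request is needed exactly when a read is not contiguous with the previous one; names and sizes are made up):

```python
class ResumableDownloader:
    """Models coalescing consecutive block reads into one logical download:
    a new backend request is issued only when the next read is not
    contiguous with where the current download stream left off."""

    def __init__(self):
        self.requests = 0        # backend download requests issued so far
        self.next_offset = None  # offset where the current stream continues

    def read_block(self, offset, size):
        if offset != self.next_offset:
            self.requests += 1   # non-contiguous read: start a new request
        self.next_offset = offset + size
```

Under this model, sequential playback costs one request no matter how many blocks are read, while each seek (or any non-consecutive access pattern, like the one Cryptomator appears to produce) adds another request, which is what eats into the quota.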
Ok, so what's the problem?
Quotas.
For "security reasons" Google limits the number of "download requests", and if you exceed the limit they ban you for 24 hours. For the same security reasons, Google does not publish exactly what these limits are.
Cryptomator breaks Drive FS's ability to "resume" downloads, creating a massive number of requests to Google's servers, which will result in you getting banned.
Things I have noticed that trigger excessive download requests to google
- Browsing/scanning a large vault (1 download per file/folder/etc. to get basic metadata)
- Searching a vault (same reason as above)
- Opening/consuming large files
Google will ban you
Windows 10 64bit
Cryptomator 1.4.15
Steps to Reproduce
Case 1:
Case 2:
Edit: Added case 3
Case 3:
Edit:
Here is an example of a Google access log; you can see there are multiple download requests per second for the same file, totaling ~1,400 in 20 minutes.
Expected Behavior
- 1 download "request" for each file accessed
- Metadata management/local caching to prevent file explorer activities from triggering a download for each file/directory in the vault
Actual Behavior
Many, many download requests for each file accessed
Reproducibility
Always
Additional Information
Can provide on request