Access-controlled service for large static files #2099
Comments
Are you sure these should be distributed as static files rather than media files? It sounds like this data would be uploaded by your application users to a
The data is generated by a crawling process and aggregated into large zip files that the user then downloads. There's no uploading involved. (It's also not tracked in version control.) But if there are ways to restrict access to the files using some other mechanism that I haven't mentioned above, I'm all ears! It just has to be able to efficiently handle multi-gigabyte files.
OK, so when I have to do that type of thing, there is, for me, an "upload" involved at some point: not from a user, but from the crawling process. Here is how I usually handle this (assuming a Docker-based config):
Each time a user wants to download a file, your application exposes the

That being said, I don't know how your server is deployed at the University of Pennsylvania; it might be on a dedicated, non-cloud server. I don't know what your storage options are, but if AWS is not suitable, DigitalOcean might be: it has an S3-compatible API, which is supported by
A word of warning: this could generate significant costs from Amazon.
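The comment's description of what the application hands the user is cut off above. One common way to implement this flow (an assumption about what was meant, not a quote) is a short-lived signed URL: Django checks permissions, then gives the user a link that the storage service honors only until it expires, so the multi-gigabyte transfer never passes through Django. With S3, boto3's `generate_presigned_url` produces such links. The sketch below is NOT S3's signing algorithm; it is a self-contained stdlib illustration of the same idea, and `SECRET_KEY`, `DOWNLOAD_HOST`, `sign_download_url` and `verify_download_url` are hypothetical names.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical values for illustration only.
SECRET_KEY = b"replace-with-a-real-secret"
DOWNLOAD_HOST = "https://files.example.org"


def _signature(path: str, expires: int) -> str:
    """HMAC-SHA256 over the file path and the expiry timestamp."""
    message = f"{path}\n{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_download_url(path: str, ttl_seconds: int = 3600,
                      now: Optional[float] = None) -> str:
    """Return a URL for `path` that stops being valid after ttl_seconds.

    Call this only after the application has checked that the current
    user is allowed to download the file.
    """
    issued = int(time.time() if now is None else now)
    expires = issued + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{DOWNLOAD_HOST}{path}?{query}"


def verify_download_url(url: str, now: Optional[float] = None) -> bool:
    """What the file-serving side checks before sending any bytes."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False  # missing or malformed parameters
    current = time.time() if now is None else now
    if current > expires:
        return False  # link has expired
    # Constant-time comparison avoids leaking the signature through timing.
    return hmac.compare_digest(sig, _signature(parts.path, expires))
```

Changing the path or the expiry invalidates the signature, so a leaked link is only useful for one file and only until it expires. With S3 or an S3-compatible service, the verification half is done by the storage provider, which is what keeps the download traffic off the Django workers (and onto the provider's bill, as warned above).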
Description
I'm proposing that a feature be added to serve large static files to authenticated users.
It might not be obvious why this is a problem. Here are some of the possible solution paths, and why they are blocked:
Can't we use a static file service like whitenoise?
Can't Django just serve the files through a FileResponse object?
FileResponse objects do a decent job of serving small and medium-sized files, but for very large files, problems arise. (In my case, when files get big enough, I hit a memory error.) It appears that if a given environment has a wsgi.file_wrapper defined, FileResponse objects may use that to efficiently serve access-controlled files. But that seems to require that Django be running on the same machine as the web server.
Isn't there some kind of funky thing you can do with headers?
Previously, cookiecutter-django used Caddy. Caddy supported the X-Accel-Redirect header, and could be configured similarly to nginx (as described here). After the switch to Traefik, this approach no longer works, because Traefik is not a web server at all.
Could you use AWS somehow?
How should it be implemented? I don't know. This is where I am stuck, and I would welcome discussion. I posted a question on Stack Overflow and got crickets; if you see a way around this that doesn't require a pull request, please feel free to answer there.
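For discussion, the FileResponse and X-Accel-Redirect paths described above can be compared in one minimal sketch. It is a plain WSGI application (stdlib only, so it runs without Django) that, after an access check, either (a) returns an empty response carrying an X-Accel-Redirect header, which only helps when a server such as nginx sits in front and maps that header to an internal location, or (b) streams the file itself, using the server's wsgi.file_wrapper when one is offered and otherwise reading fixed-size chunks so memory use stays bounded. This roughly mirrors how Django hands a FileResponse to wsgi.file_wrapper, but `make_download_app`, the `root` directory, the `/protected/` prefix and the REMOTE_USER check are hypothetical placeholders, not Django or cookiecutter-django settings.

```python
import os


def make_download_app(root, use_x_accel=False, internal_prefix="/protected/",
                      chunk_size=1024 * 1024):
    """Build a WSGI app that serves files from `root` to authorized users only.

    root, internal_prefix and the REMOTE_USER check are hypothetical
    placeholders chosen for illustration.
    """

    def is_authorized(environ):
        # Placeholder access check; a real app would inspect the session/user.
        return bool(environ.get("REMOTE_USER"))

    def iter_chunks(f):
        # Read fixed-size pieces so memory use does not grow with file size.
        try:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                yield chunk
        finally:
            f.close()

    def app(environ, start_response):
        # basename() drops any directory parts, which blocks "../" traversal.
        name = os.path.basename(environ.get("PATH_INFO", ""))
        path = os.path.join(root, name)

        if not is_authorized(environ):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        if not name or not os.path.isfile(path):
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"Not found"]

        headers = [
            ("Content-Type", "application/zip"),
            ("Content-Disposition", f'attachment; filename="{name}"'),
        ]

        if use_x_accel:
            # (a) Let the front-end server send the bytes from an internal
            # location; Python only returns headers and an empty body.
            headers.append(("X-Accel-Redirect", internal_prefix + name))
            start_response("200 OK", headers)
            return [b""]

        # (b) Stream the file from Python itself.
        headers.append(("Content-Length", str(os.path.getsize(path))))
        start_response("200 OK", headers)
        f = open(path, "rb")
        file_wrapper = environ.get("wsgi.file_wrapper")
        if file_wrapper is not None:
            # PEP 3333 lets the server use a faster path (e.g. sendfile) here.
            return file_wrapper(f, chunk_size)
        return iter_chunks(f)

    return app
```

Option (a) needs nginx (or another server that honors the header) with the prefix declared as an `internal` location, so clients cannot request it directly; according to the issue, Traefik alone cannot fill that role. Option (b) works behind any proxy, but a Python worker stays busy for the entire multi-gigabyte transfer.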
Rationale
In a sense, this is not a "feature" but a fix. The change from Caddy to Traefik arguably broke functionality that was working pretty well before.
What it really means for me, concretely, is this: now that I want to do something similar with a new app, I can't use cookiecutter-django without a fairly elaborate and awkward reconfiguration -- something like standing up an nginx container between the django service and the traefik service. If that's the only option, my instinct is to not use cookiecutter-django at all. I probably don't need all the things, and the configuration work will wind up being about the same either way. And maybe that's fine; this could just be a "It might not be what you want" situation.
But I'm proposing the alternative narrative that this would actually fix something that worked before and now is broken. I don't honestly imagine that there are that many people doing what I'm doing, and so I can't argue that you will lose a bunch of users over this. It's just kind of annoying that it used to be easy, and now is hard.
Use case(s) / visualization(s)
Here's my use case: I am developing new apps for researchers at the University of Pennsylvania doing large-scale statistical text analysis in multiple different departments. I need to be able to automatically distribute copyright-protected data to authorized users in bulk, without risking leaking the data.