explicit storage directories, or no? #5

Closed
ctb opened this issue Mar 12, 2017 · 11 comments

@ctb
Member

ctb commented Mar 12, 2017

Currently,

python -m osfclient fetch foo 7g6vu puts files under subdirectories named after the storage.

% ls -R
dropbox         github          osfstorage

foo/dropbox:
Questionnaire.docx

foo/github:
README.md

foo/osfstorage:
Recycling Event.png

It looks to me like the Right Approach is to use the attr['materialized_path'] attribute for filenames, which is what ls currently does --

% python -m osfclient ls 7g6vu 
looking at id=7g6vu, title=testfoo
/Questionnaire.docx
/README.md
/Recycling Event.png
/osf_subdirectory/

but I don't know what happens if a file from one storage provider steps on a file from another. Any tips @felliott?

@ctb
Member Author

ctb commented Mar 12, 2017

Note that this is the current behavior in #3.

@felliott
Collaborator

I think fetch is doing it correctly. materialized_path is the path relative to the storage provider root, so you'll want to prefix it with the provider shortname. We don't try to enforce any name uniqueness constraints across providers.
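A minimal sketch of the mapping described above, assuming each remote file is available as a `(provider, materialized_path)` pair with a leading `/` on the path, as in the `ls` output earlier in the thread. The function name `local_path` is hypothetical, not osfclient's API; `PurePosixPath` is used only so the printed paths look the same on every OS.

```python
from pathlib import PurePosixPath


def local_path(provider: str, materialized_path: str) -> str:
    """Map a remote file to a local path under its provider's subdirectory.

    materialized_path is relative to the provider root and starts with "/",
    so the leading slash is stripped before joining onto the provider name.
    """
    return str(PurePosixPath(provider) / materialized_path.lstrip("/"))


# (provider, materialized_path) pairs taken from the listings in this thread.
files = [
    ("dropbox", "/Questionnaire.docx"),
    ("github", "/README.md"),
    ("osfstorage", "/Recycling Event.png"),
    ("osfstorage", "/osf_subdirectory/dotguide.pdf"),
]

for provider, path in files:
    print(local_path(provider, path))
```

Prefixing with the provider shortname means two providers can both hold `/README.md` without clashing locally.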

@felliott
Collaborator

And one thing to watch out for: Google Drive, S3, and osfstorage are not always mappable to filesystems without special handling. All three are happy to have a file named foo and a directory named foo in the same path. Google Drive takes it a step further and is happy to have two files named foo in the same directory.
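One way a client could detect the cases described above before writing anything is sketched below. It assumes directory entries end with `/` and files do not (matching the `/osf_subdirectory/` entry in the earlier `ls` output); `find_conflicts` is a hypothetical helper, and what to do with the flagged paths (rename, skip, abort) is left open.

```python
from collections import Counter
from pathlib import PurePosixPath


def find_conflicts(paths: list[str]) -> list[str]:
    """Return remote paths that cannot all be written to a plain filesystem.

    Flags (a) the same file path appearing more than once, as Google Drive
    allows, and (b) a file whose path is also used as a directory.
    """
    files = [p for p in paths if not p.endswith("/")]
    dirs = {p.rstrip("/") for p in paths if p.endswith("/")}
    # Every parent of a file is implicitly a directory as well.
    for f in files:
        for parent in PurePosixPath(f).parents:
            dirs.add(str(parent))
    counts = Counter(files)
    duplicates = [p for p, n in counts.items() if n > 1]
    file_dir_clashes = [p for p in counts if p in dirs]
    return sorted(set(duplicates + file_dir_clashes))
```

For example, a listing containing both a file `/foo` and a directory `/foo/`, plus two files named `/bar.txt`, yields `["/bar.txt", "/foo"]`.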

@ctb
Copy link
Member Author

ctb commented Mar 12, 2017

thanks @felliott

...but let me argue for the other behavior here: it is simply not useful to require people to descend into subdirectories to get access to specific files :).

Just to make life more complicated, there are all sorts of things we could do here. Some (not necessarily orthogonal) options:

  • have an explicit policy about what to do in case of collision
  • permit a specific osfstorage to be designated as 'root' (and even have that be a default for, say, osfstorage, but make it overridable)
  • provide a JSON mapping file that maps between filenames and OSF locations/paths.

I think designating osfstorage as root is a good default; fetch can then fail if users have dared to name something under osfstorage after another storage type, e.g. if they have a file or folder called dropbox in their osfstorage location. All fun things to test :)
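The "osfstorage as an overridable root" proposal could look roughly like the sketch below. It is a hypothetical helper, not existing osfclient behavior, and it assumes a collision means a top-level osfstorage entry whose name matches another provider *present in this project's listing* (a stricter version could check against every known provider name instead).

```python
from pathlib import PurePosixPath


def plan_layout(files, root_provider="osfstorage"):
    """Map (provider, materialized_path) pairs to local POSIX paths.

    Files from root_provider land directly in the project directory; every
    other provider keeps its own subdirectory. Raises ValueError when a
    top-level name in the root provider matches another provider present in
    the listing (e.g. an osfstorage folder called "dropbox").
    """
    other_providers = {p for p, _ in files} - {root_provider}
    layout = {}
    for provider, path in files:
        rel = PurePosixPath(path.lstrip("/"))
        if provider == root_provider:
            # Refuse to silently merge root files into a provider subdirectory.
            if rel.parts and rel.parts[0] in other_providers:
                raise ValueError(
                    f"{root_provider}:{path} collides with the "
                    f"{rel.parts[0]!r} provider directory"
                )
            layout[(provider, path)] = rel.as_posix()
        else:
            layout[(provider, path)] = (PurePosixPath(provider) / rel).as_posix()
    return layout
```

With the files from this thread, `osfstorage/Recycling Event.png` would land at `Recycling Event.png` while `github/README.md` stays in `github/`; passing `root_provider="github"` would make GitHub the root instead.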

Is there any other remote integration we should look to for guidance here, or has this not come up before? I note that there is no way to download an entire project's files as a single zip; downloads are storage-specific.

@ctb
Member Author

ctb commented Mar 12, 2017

OK, well, for now:

% python -m osfclient ls 7g6vu 
looking at: testfoo
dropbox/Questionnaire.docx
github/README.md
osfstorage/Recycling Event.png
osfstorage/osf_subdirectory/
osfstorage/osf_subdirectory/dotguide.pdf

@felliott
Collaborator

👍 to the "well, for now". As far as having the files in separate directories based on provider, I'd argue that finding files nowadays comes in two styles:

  • the rigidly hierarchical (myself)
  • the Alfred-loving fuzzy-finders (reasonable people)

I agree that it may not be useful to go directory diving, but most of those folks have moved on to fuzzy-finding, so directories are somewhat irrelevant anyway. For myself, I like distinguishing between what's on my GDrive, what's on my osfstorage, and what's on my Dropbox. I use each of those for different purposes. I especially don't want those mixed into whatever GitHub project I have linked up. If I did that, my flake8 would become self-aware and murder me, and no jury would convict it. (Of course, that could just be an argument for making github/bitbucket/gitlab a special case).

@betatim
Member

betatim commented Mar 13, 2017

I like (what a surprise) the behaviour of having subdirectories for each storage. It makes it explicit where a file came from and avoids heuristics for dealing with naming collisions. Another argument for subdirectories is that you are probably keeping different things in the different storage providers (code on GitHub, data on S3, ...), so you want them kept separate.

One thing I would like is an option --no-subdir (needs a better name) in the case where there is only one storage provider active.
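A rough sketch of how such a flag might behave, using `argparse`. The flag name is the placeholder from the comment above, and `build_parser` and `target_path` are hypothetical helpers rather than osfclient code; the key design choice is refusing the flag when more than one provider is active, so no collision policy is needed.

```python
import argparse
from pathlib import PurePosixPath


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="fetch-sketch")
    parser.add_argument("project")
    parser.add_argument(
        "--no-subdir",
        action="store_true",
        help="write files directly into the target directory "
             "(only allowed when exactly one storage provider is active)",
    )
    return parser


def target_path(provider, materialized_path, active_providers, no_subdir):
    """Local path for one remote file, honouring the hypothetical --no-subdir flag."""
    rel = PurePosixPath(materialized_path.lstrip("/"))
    if no_subdir:
        # Without per-provider subdirectories, two providers could collide.
        if len(active_providers) != 1:
            raise ValueError("--no-subdir requires exactly one active storage provider")
        return rel.as_posix()
    return (PurePosixPath(provider) / rel).as_posix()
```

A CLI entry point would catch the `ValueError` and report it via `parser.error(...)`.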

@ctb
Member Author

ctb commented Mar 13, 2017

One other thought: osf-cli should be idempotent, i.e.:

osf push foo --create "project title"
osf fetch foo <newproj>

should result in fetch having to do nothing; and likewise

osf fetch foo <newproj>
osf push foo

should do nothing.
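One way to get the "second run does nothing" property is to compare checksums and transfer only what differs. The sketch below assumes the remote side can report an MD5 per file path (an assumption; how those hashes are obtained from OSF is not shown), and all function names are hypothetical.

```python
import hashlib
from pathlib import Path


def local_md5s(root: Path) -> dict[str, str]:
    """MD5 of every file under root, keyed by its POSIX path relative to root."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # read_bytes() keeps the sketch short; large files would be hashed in chunks.
            digests[path.relative_to(root).as_posix()] = hashlib.md5(path.read_bytes()).hexdigest()
    return digests


def files_to_fetch(remote: dict[str, str], local: dict[str, str]) -> list[str]:
    """Remote paths that are missing locally or whose checksum differs."""
    return sorted(p for p, digest in remote.items() if local.get(p) != digest)


def files_to_push(remote: dict[str, str], local: dict[str, str]) -> list[str]:
    """Local paths that are missing remotely or whose checksum differs."""
    return sorted(p for p, digest in local.items() if remote.get(p) != digest)
```

Right after a successful push or fetch the two checksum maps agree, so both functions return an empty list and the follow-up command has nothing to do.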

@ctb
Member Author

ctb commented Mar 13, 2017 via email

@betatim
Member

betatim commented Mar 14, 2017

I'd expect it to show me the subdirectories ... especially as you might have notebooks in multiple storages. I really don't like having files dropped in the root directory because I don't know what a good strategy is for handling conflicts when two or more storages give you the same path.

So I still prefer the --no-subdir option for the case of only one storage being active, and in the future maybe a .osfconfig that specifies a mapping from storage provider to local fs path.

@betatim
Member

betatim commented May 2, 2017

We now have basic config file support. I think adding a way to specify mappings from "name of the storage on osf.io" to "local path" in the config file is the way to go. By default we continue to do what osf does now (create subdirectories for each storage backend), but we let the (power) user reconfigure it.
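A sketch of what such a mapping could look like using Python's `configparser`. The `[storage]` section name and its keys are purely illustrative assumptions, not osfclient's actual config format, and `mapped_path` is a hypothetical helper. Unmapped providers fall back to the default of one subdirectory per storage backend.

```python
import configparser
from pathlib import PurePosixPath

# Hypothetical config: the [storage] section is an illustration only.
EXAMPLE_CONFIG = """
[storage]
osfstorage = .
github = code
"""


def load_storage_map(text: str) -> dict[str, str]:
    """Read provider -> local directory mappings; empty if none are configured."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section("storage"):
        return {}
    return dict(parser["storage"])


def mapped_path(provider: str, materialized_path: str, storage_map: dict[str, str]) -> str:
    # Default behaviour: a subdirectory named after the provider.
    base = storage_map.get(provider, provider)
    return (PurePosixPath(base) / materialized_path.lstrip("/")).as_posix()
```

With the example config, osfstorage files land in the project root, GitHub files go under `code/`, and Dropbox files keep the default `dropbox/` subdirectory.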

@betatim betatim added this to the v0.1 milestone May 2, 2017
@betatim betatim closed this as completed Jun 8, 2017
yacchin1205 referenced this issue in yacchin1205/rdmclient May 10, 2019
3 participants