Select title #9
Added a candidate we defended recently, plus icons for voting (multiple votes allowed).
Decentralized versus decentral versus distributed? "Decentralized" sounds off to my ears.
Good point. Git also uses "distributed". From git(1):
I was following https://www.degruyter.com/document/doi/10.1515/nf-2020-0037/html, where no author seemed to raise a red flag about choosing "decentralized". @mih, what was your guide for choosing "decentralized" over "distributed" there? Staying consistent with the title/dRDM in that paper would IMHO be a bonus, although if it was severely flawed, I am OK to "generalize" to "distributed". Looking at https://medium.com/distributed-economy/what-is-the-difference-between-decentralized-and-distributed-systems-f4190a5c6462, I think "decentralized" fits somewhat better in reflecting the most common use cases, although "distributed" reflects the underlying technology: git/git-annex/datalad indeed allow for a more distributed mode of operation.
There was no particular rationale or drive behind "decentralized". Given the labeling used by Git, I would have preferred to have made a different choice. As usual, I am also no believer in sticking to mistakes of the past ;-) Re the comparison of the terms in the linked article: I think "decentralized" better fits actual usage patterns, but "distributed" is more appropriate for describing the technological capabilities. I suspect that the decentralized usage is largely driven by a deeply embedded concept of mine vs. theirs... we shall overcome ;-)
As with any kind of "release", it might later come to be considered as buggy as the prior one ;) and “You become responsible, forever, for what you have tamed.” ― Antoine de Saint-Exupéry, The Little Prince. Overall, I am fine with either, although I lean toward "decentralized" for consistency and because it better reflects the typical usage patterns. Hopefully the vote(s) will help us make the decision.
You guys have probably thought long about the title, but in my view the options above stress the concept of "distributed" or "decentralized" management too much, while the main feature I think DataLad provides for all users has to do with data "tracking" or "versioning". Something like "collaborative distributed tracking" might reflect my perception of DataLad. Whether data management is distributed or not depends on the use case: some users may appreciate the distributed nature of datasets, while others may use a centralized repository, or not collaborate with anyone at all, and still benefit from the core data versioning (and time-travel flexibility) of DataLad. Just my honest, shameless thought :)
Thank you @dorianps for the feedback. Indeed, I think we somewhat missed the "versioning" aspect entirely, as if it were a given. "Tracking" is somewhat implied by "decentralized" or "distributed", but not obviously so, and it is also unclear on its own, so I am not sure it is appropriate for a title.
Just throwing an idea (without contributing a single line on the code):
May still work if
This. Which is why, for a software journal, my vote goes to "distributed" as far as it refers to DataLad.
True, too.
I guess the cross-platform aspect can be left out of the title. If no platform is mentioned in it, we don't need to fight a possible impression that it's platform-specific. Moreover, hardly any VCS is. So:
@bpoldrack Your version looks good to me, too. Two thoughts:
Re "large": not necessarily, since it could also be used for managing "sensitive" data (licenses, personal data, etc.). Re "management for research data": it is captured better IMHO already by a
I thought I'd echo @dorianps's opinion. I appreciate that history shows the power of distributed over centralized (no one wants to go back to SVN), but that's not what excited me most about DataLad. With the next big thing being MLOps and other such buzzwords, I wonder whether emphasizing DataLad's ability to become a core tool beyond research science would be valuable. Something like: "DataLad: A foundation for managing code, data, and environments". Assuming another choice is the last thing everyone needs, I've voted on the suggestions anyway : )
I really like that take.
I think that the "foundation" aspect should indeed be verbalized in the paper. But
#34 is a possible "lean" injection of the foundation aspect. There could be other places to inject it, but I do not think the JOSS paper is the best venue to center on the "foundational" aspect of DataLad.
@yarikoptic “Foundation” felt more dramatic and inspiring, but I agree it falls a little short in that it hints DataLad is not your all-encompassing solution to handling these problems. I feel "platform" has been overused because of all the cloud offerings. I can’t think of a better choice, though; it fits well. I like your alternative title. Throwing out some other ideas, in increasing order of absurdity, in case one sticks or triggers an alternative in someone else’s head:
What about "bedrock"? I also like "infrastructure (tool)", but it shifts the focus away from the individual user. A little esoteric: "digital companion for joint management of code and data"
;-) It would have been nice to finally lift DataLad out of its soulless form and reflect its name's origin as a curious youth, as an alternative to "a person who is deemed to be despicable or contemptible".
Following up on previous points about what got us excited about DataLad: for me it was the provenance tracking. That feature of DataLad is, as far as I can tell, really unique, and it is missing from the title. Specifically, what is missing is that we can track which code changed or created which data. At the moment, "joint management" could be read as "managing in one place", but in parallel and independently. I think the provenance aspect could be emphasized a bit more explicitly. Riffing on the last title, something like "DataLad: distributed system for joint management of code, data, and their relationship" (that sounds a bit clunky to my ears, but it gives an example of the direction I mean).
Thank you @pvavra: provenance of data transformations (actionable for humans and computers) is indeed one of the killer features. Your suggestion does not sound too clunky to me, and it is to the point.
I have added the 👎 choice as a refinement of 🎉 and added a clarification. Everyone who voted (especially for 🎉): please consider adjusting your vote, or state an explicit preference for 🎉 over 👎 in a comment.
The choice was made, and it is
in the paper.
Not mandated, but the ": " pattern is common for titles in the journal.
Candidates from various sources
Clarification: 👎 is the refinement of 🎉, and votes for 🎉 will be added to 👎 (unless double-voted).
I encourage those who voted for 🎉 to revote for 👎 if they agree; if you don't, please comment to support your choice of 🎉 over 👎.