Select title #9

Closed
1 of 7 tasks
mih opened this issue Mar 19, 2021 · 24 comments

Comments

@mih
Member

mih commented Mar 19, 2021

Not mandated, but ": " is a common title pattern in the journal.

Candidates from various sources

  • 👍 DataLad: data management system for discovery, management, and publication of digital objects of science
  • 🚀 DataLad: perpetual decentralized management of digital objects for collaborative open science
  • 😄 DataLad: decentralized management of digital objects for open science
  • ❤️ DataLad: decentralized Research Data Management
  • 👀 DataLad: distributed system for joint management of code, data, and computational environments
  • 🎉 DataLad: distributed system for joint management of code and data
  • 👎 DataLad: distributed system for joint management of code, data, and their relationship

Clarification: 👎 is a refinement of 🎉, and votes for 🎉 will be added to 👎 (unless double-voted).
I encourage those who voted for 🎉 to revote for 👎 if they agree; if you don't, please comment to support your choice of 🎉 over 👎.

@yarikoptic
Member

Added a candidate we defended recently and icons for voting (could be multiple)

@adswa
Member

adswa commented Mar 26, 2021

decentralized versus decentral versus distributed?

decentralized sounds off in my ears

@mih
Member Author

mih commented Mar 26, 2021

decentralized versus decentral versus distributed?

decentralized sounds off in my ears

Good point. Git also used "distributed". From git(1):

Git is a fast, scalable, distributed revision control system

@yarikoptic
Member

yarikoptic commented Mar 26, 2021

I was following https://www.degruyter.com/document/doi/10.1515/nf-2020-0037/html where no author seemed to raise a red flag about choosing "decentralized". @mih - what was your guide for choosing "decentralized" over "distributed" there? Staying consistent with the title/dRDM in that paper would IMHO be a bonus, although if it was severely flawed, I am OK to "generalize" to "distributed".

Looking at https://medium.com/distributed-economy/what-is-the-difference-between-decentralized-and-distributed-systems-f4190a5c6462 I think "decentralized" fits somewhat better than "distributed" in reflecting the most common use cases, although "distributed" reflects the technology underneath -- git/git-annex/datalad indeed allow for a more distributed mode of operation.

@mih
Member Author

mih commented Apr 5, 2021

There was no particular rationale or drive behind "decentralized". Given the labeling used by Git, I would have preferred to have made a different choice. As usual, I am also no believer in sticking to mistakes of the past ;-)

Re the comparison of the terms in the linked article: I think "decentralized" better fits actual usage patterns, but distributed is more appropriate for describing the technological capabilities. I suspect that the decentralized usage is largely driven by a deeply embedded concept of mine vs theirs.... we shall overcome ;-)

@yarikoptic
Member

As usual, I am also no believer in sticking to mistakes of the past ;-)

As with any kind of "release", it might later come to be considered as buggy as the prior one ;) and

“You become responsible, forever, for what you have tamed.”

― Antoine de Saint-Exupéry, The Little Prince

Overall -- I am fine with either, although leaning to "decentralized" for consistency and better reflection of the typical usage patterns. I guess the vote(s) hopefully would help us make the decision.

@dorianps
Contributor

dorianps commented Apr 9, 2021

You guys probably have thought long about the title, but in my view the options above stress too much the concept of "distributed" or "decentralized" management, while the main feature I think datalad provides for all users has to do with data "tracking" or "versioning". Something like "collaborative distributed tracking" might reflect my perception of Datalad. Whether data management is distributed or not depends on the use case, i.e., some users may appreciate the distributed nature of datasets, others may use a centralized repository or even not collaborate with anyone and still benefit from the core data versioning (and time travel flexibility) of Datalad. Just my honest, shameless thought :)

@yarikoptic
Member

Thank you @dorianps for the feedback. Indeed, I think we somewhat missed the "versioning" aspect entirely, as if it was a given. "Tracking" is somewhat implied by "decentralized" or "distributed", though not obviously, but it is also unclear on its own, so I am not sure it is appropriate for a title.
Indeed it is hard to embed all possible features/use-cases into a single title. Makes me appreciate the official (in manpage) description of git ("the stupid content tracker") once more.

@dorianps
Contributor

dorianps commented Apr 9, 2021

Just throwing out an idea (without having contributed a single line to the code):

Datalad: collaborative data tracking, transferring, and management, across multiple platforms

Platforms = non-specific catch all (linux, windows, git, uk biobank)

May still work if collaborative is replaced with distributed.

@bpoldrack
Member

bpoldrack commented Apr 9, 2021

I think "decentralized" better fits actual usage patterns, but distributed is more appropriate for describing the technological capabilities.

This. Which is why, in a software journal, my vote is on "distributed" as far as it refers to Datalad.

Whether data management is distributed or not depends on the use case, i.e., some users may appreciate the distributed nature of datasets, others may use a centralized repository or even not collaborate with anyone and still benefit from the core data versioning

True, too.

Datalad: collaborative data tracking, transferring, and management, across multiple platforms
May still work if collaborative is replaced with distributed.

I guess the cross-platform aspect can be left out of the title. If no platform is mentioned in it, we don't need to fight a possible impression that it's platform specific. Moreover, hardly any VCS is.

So: "Datalad: distributed versioning and management for research data"? Maybe even "large data" instead of "research data". It's agnostic after all, and while we might want to draw particular attention from the scientific community, JOSS may be more useful for us if we get developers (potential contributors) with completely different use cases interested.

@dorianps
Contributor

dorianps commented Apr 9, 2021

@bpoldrack Your version looks good to me, too. Two thoughts:

  1. Research data sounds like a complicated tool for researchers only. I once read a post at git-annex with someone keeping inventory of DVDs using annex. Datalad can be used for any data, research or not.
  2. I thought one of the greatest strengths of Datalad is seamless transfer between platforms, i.e., going from linux to windows, from hard drive to usb, from local to cloud, etc. Those multiple transfer options are what make it a universal tool for collaborations, which is why I included it in the title, but even without it, "management" can still cover that aspect in a less specific way. So your title is good.

@yarikoptic
Member

re "large" - not necessarily, since it could be used for management of "sensitive" (licenses, personal data, etc) data

re "management for research data" - IMHO it is already captured better by Research Data Management (RDM), which is a known concept. So the discussion seems to keep circling back to which critical features to include in the title to characterize such RDM better. But it seems the currently leading choice of title doesn't even mention the "research" aspect ;)

@leej3
Contributor

leej3 commented Apr 10, 2021

I thought I'd echo @dorianps's opinion. I appreciate that history shows the power of distributed over centralized... no one wants to go back to SVN, but that's not what excited me most about Datalad. With the next big thing being things like MLops and other such buzz words, I wonder whether emphasizing Datalad's ability to become a core tool beyond research science would be valuable. Something like:

Datalad: A foundation for managing code, data, and environments

Assuming another choice is the last thing everyone needs, I've voted on the existing suggestions though : )

@bpoldrack
Member

Datalad: A foundation for managing code, data, and environments

I really like that take.

@mih
Member Author

mih commented Apr 11, 2021

Me too! Thx to @dorianps and @leej3 for your perspective. I think we should consider this aspect for title and manuscript focus.

@yarikoptic
Member

yarikoptic commented Apr 11, 2021

I think that the "foundation" aspect should indeed be verbalized in the paper. But

  • "foundation" by itself is actually an insufficient descriptor. A foundation establishes the grounds for further development (take the foundation of a house, or NSF itself), but does not provide a full solution. And DataLad (core) is a complete (edit: i.e. already providing means for RDM) solution.
  • I think "platform" might be a descriptor that would also encompass the aspect of "foundation". So maybe we could take the currently leading title and make it into "DataLad: distributed platform for joint management of code and data". WDYT?

@yarikoptic
Member

I think that the "foundation" aspect should indeed be verbalized in the paper.

#34 is a possible "lean" injection of the foundation aspect. I guess there could be other places where it could be injected, but I do not think that the JOSS paper would be the best venue to center on the "foundational aspect" of DataLad.

@leej3
Contributor

leej3 commented Apr 11, 2021

@yarikoptic “Foundation” felt more dramatic and inspiring, but I agree it falls a little short in that it hints Datalad is not your all-encompassing solution to handling these problems. I feel “platform” has been over-used because of the stuff in the cloud. I can’t think of a better choice though. It fits well. I like your alternative title.

Throwing out some other ideas in increasing order of absurdity in case one sticks or triggers an alternative in someone else’s head:
“Approach”, “system”, “Core-tool”, “comprehensive toolkit”, “ecosystem”, “vision”, “armamentarium”, “panacea”

@mih
Member Author

mih commented Apr 11, 2021

What about "bedrock"?

I also like "infrastructure (tool)", but it shifts the focus away from the individual user.

A little esoteric: "digital companion for joint management of code and data"

@yarikoptic
Member

A little esoteric: "digital companion for joint management of code and data"

;-) it would be nice to finally bring DataLad out of its soulless form to reflect its name's origin, a curious youth, as an alternative to "a person who is deemed to be despicable or contemptible".

@pvavra
Contributor

pvavra commented Apr 13, 2021

Following up on previous points about what got us excited about datalad: for me it was the provenance tracking. That feature of datalad is, as far as I can tell, really unique, and missing from the title.

Specifically, what is missing is that we can track which code changed/created which data. At the moment, "joint management" could be read as "managing in one place" but, in a sense, in parallel/independently. I think the provenance aspect could be emphasized a bit more explicitly.

Riffing on the last title, something like

DataLad: distributed system for joint management of code, data, and their relationship

(that sounds a bit clunky to my ears.. but just to give an example of the direction I mean).

@yarikoptic
Member

Thank you @pvavra - (actionable for humans and computers) provenance of data transformations is indeed one of the killer features. Your suggestion sounds not too clunky and to the point to me.

@yarikoptic
Member

I have added the 👎 choice as a refinement of 🎉 and added a clarification. Everyone who voted (especially for 🎉), please consider adjusting your vote or explicitly expressing (in a comment) a preference for 🎉 over 👎.

@yarikoptic
Member

the choice was made and it is

title: 'DataLad: distributed system for joint management of code, data, and their relationship'

in the paper


7 participants