
Feature request: first-class support for "namespaces" in registries #1836

Open
DilumAluthge opened this issue May 24, 2020 · 66 comments

@DilumAluthge
Member

DilumAluthge commented May 24, 2020

This issue is a feature request and proposal for first-class support for "namespaces" in registries.


The core idea is this: currently, you can register a package, and that package has a URL. In this proposal, you will also be able to register a namespace, and that namespace has a URL. Once a namespace has been registered, a package can be registered under that namespace if and only if the URL for the package begins with the URL for the namespace.


To avoid confusion, let's be careful about our terminology.

  • Let us use the term "organization" to refer to a GitHub/GitLab/Bitbucket/etc organization.
  • Let us use the term "namespace" to refer to the namespaces we are creating in the Julia General registry (or any Julia registry, for that matter).

Here is a rough sketch of the workflow I am envisioning:

Step 1: I create an organization in GitHub, GitLab, Bitbucket, etc. For example, suppose I create an organization at github.com/MyFooGitHub.

Step 2: I use some process to register a namespace with the Julia General registry. For example, I make a PR to the General registry that registers a new namespace called MyFooNamespace, and as part of that PR, I provide the URL https://github.com/MyFooGitHub/. Once the PR has been merged, we have created the namespace MyFooNamespace in the Julia General registry. A package can be registered in the MyFooNamespace namespace if and only if its URL begins with the string https://github.com/MyFooGitHub/.
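A minimal sketch of the Step 2 rule in Julia. The function name `can_register_in_namespace` is hypothetical (it is not part of Registrator or Pkg); the point is only that the check is a plain string-prefix test. Keeping the trailing slash in the prefix matters: without it, the prefix would also accept URLs under a different organization whose name merely starts with `MyFooGitHub`.

```julia
# Hypothetical helper (not part of Registrator or Pkg): under this proposal a
# package may be registered in a namespace iff its repo URL starts with the
# namespace's URL prefix.
function can_register_in_namespace(package_url::AbstractString, repo_url_prefix::AbstractString)
    return startswith(package_url, repo_url_prefix)
end

prefix = "https://github.com/MyFooGitHub/"
can_register_in_namespace("https://github.com/MyFooGitHub/CoolDilumStuff.jl.git", prefix)  # true
can_register_in_namespace("https://github.com/MyFooGitHubOther/Thing.jl.git", prefix)      # false
can_register_in_namespace("https://gitlab.com/MyFooGitLab/Thing.jl.git", prefix)           # false
```

A real check would likely need more care than this, for example normalizing case, since GitHub treats owner names case-insensitively while `startswith` does not.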

To go into more detail, a namespace is a "thing" that is registered in the General registry and has the following properties, all of which are constants:

  • Name for that namespace.
  • UUID for that namespace.
  • URL prefix for that namespace.

For example, we might have a file in the General registry with filename /namespaces/MyFooNamespace/Namespace.toml with the following contents:

name = "MyFooNamespace"
uuid = "00000000-0000-0000-0000-000000000000"
repo_url_prefix = "https://github.com/MyFooGitHub/"

Step 3: I register packages. For example, I create a package called CoolDilumStuff.jl. The URL for this package is https://github.com/MyFooGitHub/CoolDilumStuff.jl. I choose to register this package in the MyFooNamespace namespace. So the "full name" of my package is MyFooNamespace/CoolDilumStuff.

When Registrator makes the PR to register this package, it would, for example, create a file with filename /namespaces/MyFooNamespace/C/CoolDilumStuff/Package.toml with the following contents:

name = "CoolDilumStuff"
uuid = "11111111-1111-1111-1111-111111111111"
repo = "https://github.com/MyFooGitHub/CoolDilumStuff.jl.git"
namespace_name = "MyFooNamespace"
namespace_uuid = "00000000-0000-0000-0000-000000000000"
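Registry tooling could cross-check the two example files before merging. The sketch below uses Julia's `TOML` standard library (available since Julia 1.6); the function `check_namespaced_package` and the exact set of checks are assumptions about how such a rule might be enforced, using the field names proposed in this issue.

```julia
import TOML  # standard library since Julia 1.6

# Hypothetical consistency check (not an existing Registrator feature) between a
# Namespace.toml and a Package.toml using the field names proposed above.
# Returns a list of problems; an empty list means the package is consistent.
function check_namespaced_package(namespace_toml::AbstractString, package_toml::AbstractString)
    ns = TOML.parse(namespace_toml)
    pkg = TOML.parse(package_toml)
    problems = String[]
    pkg["namespace_uuid"] == ns["uuid"] ||
        push!(problems, "namespace_uuid does not match the namespace uuid")
    pkg["namespace_name"] == ns["name"] ||
        push!(problems, "namespace_name does not match the namespace name")
    startswith(pkg["repo"], ns["repo_url_prefix"]) ||
        push!(problems, "repo does not start with repo_url_prefix")
    return problems
end

namespace_toml = """
name = "MyFooNamespace"
uuid = "00000000-0000-0000-0000-000000000000"
repo_url_prefix = "https://github.com/MyFooGitHub/"
"""

package_toml = """
name = "CoolDilumStuff"
uuid = "11111111-1111-1111-1111-111111111111"
repo = "https://github.com/MyFooGitHub/CoolDilumStuff.jl.git"
namespace_name = "MyFooNamespace"
namespace_uuid = "00000000-0000-0000-0000-000000000000"
"""

isempty(check_namespaced_package(namespace_toml, package_toml))  # true
```

Because the comparisons are exact, a case mismatch such as `MyFooNameSpace` versus `MyFooNamespace` would be reported as a problem.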

Step 4: I can install my package in Pkg. In the Pkg REPL mode, I would install with this:

] add MyFooNamespace/CoolDilumStuff

If I want to use the Pkg API, I would install with either this:

import Pkg
Pkg.add(Pkg.PackageSpec(namespace = "MyFooNamespace", name = "CoolDilumStuff"))

Or, equivalently, this:

using Pkg
Pkg.add(PackageSpec(namespace = "MyFooNamespace", name = "CoolDilumStuff"))
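Both the REPL form and the `PackageSpec` form carry the same two pieces of information. As a hedged illustration, a hypothetical `parse_package_spec` function (not part of Pkg) could split the REPL string into a namespace and a name, leaving `namespace` as `nothing` when none is given:

```julia
# Hypothetical parser for the proposed `Namespace/Name` syntax (not Pkg's API).
# Returns a NamedTuple; `namespace` is `nothing` for a bare name like "Foo".
function parse_package_spec(spec::AbstractString)
    parts = split(spec, '/')
    if length(parts) > 2 || any(isempty, parts)
        throw(ArgumentError("expected `Name` or `Namespace/Name`, got \"$spec\""))
    end
    if length(parts) == 1
        return (namespace = nothing, name = String(parts[1]))
    else
        return (namespace = String(parts[1]), name = String(parts[2]))
    end
end

parse_package_spec("MyFooNamespace/CoolDilumStuff")  # (namespace = "MyFooNamespace", name = "CoolDilumStuff")
parse_package_spec("CoolDilumStuff")                 # (namespace = nothing, name = "CoolDilumStuff")
```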

Step 5: After I install my package, I can import it by simply typing either this:

import CoolDilumStuff

Or this:

using CoolDilumStuff

This issue is kind of related to #1071. However, the proposal in this issue is a little different from what is proposed in #1071. If I understand correctly, #1071 does some automatic pattern-matching on the URL. In contrast, this issue explicitly adds first-class support for official namespaces in registries.


This issue is not the same as #1064.

@DilumAluthge
Member Author

Now here is the important part.

Suppose that somebody else creates a GitLab organization (technically GitLab calls them "groups" instead of "organizations") at the location gitlab.com/MyFooGitLab.

Now this person wants to register a new namespace in the Julia General registry that corresponds to their gitlab.com/MyFooGitLab GitLab organization. They would like to have the MyFooNamespace namespace. Unfortunately, they cannot have this namespace, because it has already been registered.

This is not a bad thing. This is just the same as when someone registers a package, and then later, someone else wants to register the same package name. For example, Fredrik has an excellent package with the name Literate. I cannot have that name. If I want to register a package, I have to pick a different name.

Same thing here. If the namespace name already exists, you must pick a different name for your namespace.

So let us return to the previous example. We have someone that has created the gitlab.com/MyFooGitLab GitLab organization. Now they want to create a namespace in the Julia General registry that corresponds to their gitlab.com/MyFooGitLab GitLab organization. They cannot have the MyFooNamespace namespace, because the MyFooNamespace namespace has already been registered.

So they have to pick a different name for their namespace. For example, they may choose to pick the name CoolFooStuff for their namespace. So they make a PR to register the CoolFooStuff namespace, with the restriction that all packages in the CoolFooStuff namespace must have URLs that begin with the string https://gitlab.com/MyFooGitLab/.

@DilumAluthge
Member Author

DilumAluthge commented May 24, 2020

In the above examples, I used different names for the GitHub/GitLab organizations and the namespaces. I did this to emphasize the difference between a GitHub/GitLab organization and a namespace in the Julia General registry. They are not the same thing, although nothing requires them to have different names.

Now, that being said, I think in the vast majority of cases, people will choose a namespace name that is the same as (or very similar to) the name of their GitHub/GitLab/Bitbucket/etc organization.

For example, take @EricForgy's GitHub organization github.com/JuliaFinance. My guess is that for the namespace, he will choose to register the JuliaFinance namespace.

If he chooses JuliaFinance, then he can install his packages with ] add JuliaFinance/Positions and load them with import Positions and/or using Positions.

@EricForgy

This would be beautiful ❤️ Thanks for spelling it out like that and getting the conversation started @DilumAluthge 🙌

@EricForgy

For reference, in addition to the mentioned:

There is also:

@EricForgy

I had a quick look at General and given the directory structure, it might make sense to allow one level of nesting for namespaces, e.g.

[image not preserved]

This seems minimally disruptive as a start.

@DilumAluthge
Member Author

DilumAluthge commented May 24, 2020

I was thinking this kind of layout:

Normal packages:

  • /F/Foo/Package.toml
  • /B/Bar/Package.toml

Namespaced packages:

  • /namespaces/JuliaFinance/F/Foo/Package.toml
  • /namespaces/JuliaFinance/B/Bar/Package.toml

And e.g. the information for a namespace would be here:

  • /namespaces/JuliaFinance/Namespace.toml
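Assuming General keeps its existing convention of a first-letter directory, the paths in this layout could be computed as below. `registry_path` and `namespace_file` are hypothetical helpers; they return paths relative to the registry root (the way `path` is written in `Registry.toml`) and join with `"/"` rather than `joinpath`, because registry paths use forward slashes on every OS.

```julia
# Hypothetical helpers for the proposed layout. Registry paths always use "/",
# so strings are joined directly instead of with `joinpath`, which would use
# a backslash on Windows.
function registry_path(name::AbstractString; namespace::Union{Nothing,AbstractString} = nothing)
    pkg_dir = string(uppercase(first(name)), "/", name)  # e.g. "F/Foo"
    return namespace === nothing ? pkg_dir : string("namespaces/", namespace, "/", pkg_dir)
end

namespace_file(namespace::AbstractString) = string("namespaces/", namespace, "/Namespace.toml")

registry_path("Foo")                             # "F/Foo"
registry_path("Foo"; namespace = "JuliaFinance") # "namespaces/JuliaFinance/F/Foo"
namespace_file("JuliaFinance")                   # "namespaces/JuliaFinance/Namespace.toml"
```

Since namespaced packages live under `namespaces/`, a package and a namespace with the same name would not collide on disk.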

@DilumAluthge
Member Author

For reference, in addition to the mentioned:

* #1064: [POC] Add scoping, e.g. @MyRegistry/MyPackage

* #1071: allow package to be disambiguated by `@user/package`

* #1791: Allow to register a registry in a registry

There is also:

* #1067: [RFC] Accommodate dependencies from other registries

* #1072: allow registries to depend on other registries

To elaborate, one of the major advantages of this proposal over those other issues/PRs is that it does not require any other registries. Everything is registered in the Julia General registry. In contrast, most or all of those other issues/PRs require people to maintain their own Julia package registries.

@EricForgy

I was thinking this kind of layout:

Normal packages:

/F/Foo/Package.toml
/B/Bar/Package.toml
Namespaced packages:

namespaces/JuliaFinance/F/Foo/Package.toml
namespaces/JuliaFinance/B/Bar/Package.toml
And e.g. the information for a namespace would be here:

namespaces/JuliaFinance/Namespace.toml

That is probably better. Take JuliaFEM, just as an example: it is a package, but it could also be a namespace. With your approach you can accommodate both 👍

@EricForgy

EricForgy commented May 24, 2020

What about not having a separate namespaces directory, but split the difference:

/J/JuliaFinance/C/Currencies

with

/J/JuliaFinance/Namespace.toml

? 🤔

Edit: I'm not sure the second "letter" folder is necessary. It is unlikely an org is going to have enough packages to warrant it, but it is more future-proof 🤔

@EricForgy

EricForgy commented May 24, 2020

@fredrikekre
Member

Why do we need namespaces when we have UUIDs?

@DilumAluthge
Member Author

Why do we need namespaces when we have UUIDs?

I think that ] add Foo/Bar is much more user friendly than adding a package by UUID, right?

By that same argument, why do we have package names when we have package UUIDs? For user convenience.

@EricForgy

Update:

I spent most of today looking into this. I have some upcoming high-profile projects that can potentially highlight JuliaFinance so I'd like to register some packages, but hope to get this namespace stuff worked out first.

A picture is worth a thousand words so...

[image not preserved]

I can now add packages with a namespace and it will split the namespace from the package name. I did end up adding a new field namespace to PackageSpec, but that shouldn't break anything.

At the moment, it just falls back to using the name to determine the UUID, but the idea is that if there are multiple packages with the same name, it can use the namespace, if any, to determine which one to use.
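The lookup order described here (name first, namespace only as a tie-breaker) can be sketched as below. The `Entry` type, the `resolve` function, and the two `FX` entries with placeholder UUIDs are hypothetical and only for illustration; they are not Pkg's internals.

```julia
# Hypothetical registry entry type for illustration only; `namespace` is
# `nothing` for packages registered without a namespace.
struct Entry
    name::String
    uuid::String
    namespace::Union{Nothing,String}
end

# Resolve a package by name first; consult the namespace only if the name
# alone matches more than one entry.
function resolve(entries::Vector{Entry}, name::AbstractString;
                 namespace::Union{Nothing,AbstractString} = nothing)
    candidates = filter(e -> e.name == name, entries)
    if length(candidates) > 1 && namespace !== nothing
        candidates = filter(e -> e.namespace == namespace, candidates)
    end
    isempty(candidates) && error("no package named `$name` found")
    length(candidates) > 1 && error("the name `$name` is ambiguous")
    return only(candidates).uuid
end

entries = [
    Entry("Currencies", "0fd90b74-7c1f-579e-9252-02cd883047b9", "JuliaFinance"),
    Entry("FX", "11111111-1111-1111-1111-111111111111", "Animators"),     # placeholder UUID
    Entry("FX", "22222222-2222-2222-2222-222222222222", "JuliaFinance"),  # placeholder UUID
]

resolve(entries, "Currencies")                      # unique name: namespace not needed
resolve(entries, "FX"; namespace = "JuliaFinance")  # namespace breaks the tie
```

With this order, `namespace` is ignored when the name alone is unique; a stricter variant would also reject a single match whose namespace differs from the one requested.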

Next, on the registry side, I made some minor changes.

First, I moved all JuliaFinance packages currently registered with General, i.e.

  • Currencies.jl
  • CurrenciesBase.jl (which is actually archived)
  • BusinessDays.jl
  • DayCounts.jl

to J/JuliaFinance and changed Registry.toml accordingly, i.e.

0fd90b74-7c1f-579e-9252-02cd883047b9 = { name = "Currencies", path = "J/JuliaFinance/Currencies" }
44e31299-2c53-5a9b-9141-82aa45d7972f = { name = "DayCounts", path = "J/JuliaFinance/DayCounts" }
4f18b42c-503e-5345-9536-bb0f25fc7038 = { name = "BusinessDays", path = "J/JuliaFinance/BusinessDays" }
a33ca353-0707-5c2b-b398-646075a850cd = { name = "CurrenciesBase", path = "J/JuliaFinance/CurrenciesBase" }

It seems to be working as I hoped.
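One reason moving the directories does not break lookups is that finding a package by name only needs the `name` field of each entry in the `[packages]` table of `Registry.toml`; `path` just tells tooling where the package's files live. A minimal sketch with the `TOML` standard library (Julia 1.6+), using two of the entries above; `name_index` and `namespace_from_path` are hypothetical helpers, the latter assuming the `J/JuliaFinance/<Name>` convention.

```julia
import TOML  # standard library since Julia 1.6

registry_toml = """
[packages]
0fd90b74-7c1f-579e-9252-02cd883047b9 = { name = "Currencies", path = "J/JuliaFinance/Currencies" }
44e31299-2c53-5a9b-9141-82aa45d7972f = { name = "DayCounts", path = "J/JuliaFinance/DayCounts" }
"""

# Build a name => UUIDs index from the [packages] table. Only `name` is needed
# for lookup; `path` just locates the package's files inside the registry.
function name_index(registry::AbstractDict)
    index = Dict{String,Vector{String}}()
    for (uuid, info) in registry["packages"]
        push!(get!(index, info["name"], String[]), uuid)
    end
    return index
end

# Hypothetical: under a <letter>/<Namespace>/<Name> layout, the middle path
# component names the namespace; two-component paths like "C/CSV" have none.
function namespace_from_path(path::AbstractString)
    parts = split(path, '/')
    return length(parts) == 3 ? String(parts[2]) : nothing
end

idx = name_index(TOML.parse(registry_toml))
idx["Currencies"]                                 # one UUID, found by name alone
namespace_from_path("J/JuliaFinance/Currencies")  # "JuliaFinance"
```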

Until the tooling (Registrator, TagBot, etc.) gets updated to handle namespaces, I don't mind doing this manually with PRs directly to General, but don't worry: there will not be many updates.

There is certainly work to do with tooling, but that can come later I think. For example, Registrator could check a Namespace.toml before registering new packages, etc, but that is "nice to have" I think.

This will solve one of my major problems so I appreciate your consideration.

What do you think?

@EricForgy

I started with the REPL, but now the API is working as well 👍

[image not preserved]

@fredrikekre
Member

By that same argument, why do we have package names when we have package UUIDs? For user convenience.

I mean, adding by name is just a key to finding the UUID. We can add user/package as another key to that same UUID without having to mess with namespaces etc.

@EricForgy

EricForgy commented May 25, 2020

By that same argument, why do we have package names when we have package UUIDs? For user convenience.

I mean, adding by name is just a key to finding the UUID. We can add user/package as another key to that same UUID without having to mess with namespaces etc.

For what it's worth, that is pretty much all I'm doing. It still uses name to find UUID, but if there is more than one UUID it will use the namespace before giving up. I'll try to submit a PR tomorrow. It is just a few lines added. Not major surgery by any means 🙏

Edit: I think I have an idea that is in the spirit of your comment and of #1071. I'll probably still need that namespace field, but then we can just compare that against the user / org info with no other changes. I think that is what you mean.

@DilumAluthge
Member Author

We can add user/package as another key to that same UUID without having to mess with namespaces etc.

What do you mean when you say "user"?

@DilumAluthge
Member Author

Also, if we add "user" as another key, then we have the issue where the package name by itself is not a good name.

One purpose of a namespace is that packagename by itself is not a good name for the General registry, but is a good name within the context of the namespace.

An example is the JuliaFinance package Positions.jl. We would not want to register a package named Positions at the top level of the General registry. It is too vague/ambiguous of a name.

But JuliaFinance/Positions is specific and unambiguous.

@fredrikekre
Member

I just meant that we can try harder to "pattern match" against info that is in the registry. add fredrikekre/Literate could be made to work, for example; it would be found from the repo URL.

An example is the JuliaFinance package Positions.jl. We would not want to register a package named Positions at the top level of the General registry. It is too vague/ambiguous of a name.

🤷 it is their loss really if they want to use such regular names. And from above, if it will be used in the code only by name, then what is the point of differentiating namespaces when installing?

@DilumAluthge
Member Author

One of the big issues we have had in the General registry is that people want to register packages with names that in my opinion are not suitable for the registry.

For example, someone might make a package for doing expectation-maximization algorithms. They might want to name their package EM.jl. I would certainly object to this name. It is too ambiguous to add a package named EM to the General registry.

So what does that person do? The only way for them to get to keep the name EM.jl for their package is for them to maintain their own Julia package registry.

And this is how we ended up with multiple open source organizations maintaining their own registries. Examples include (but are not limited to) the BioJulia registry and the JuliaFinance registry.

But there are many ecosystem problems with organizations having their own registries. Just to give one example: NewPkgEval only tests packages in General. So packages in other registries are not tested.

So one of my main goals here is to give open source organizations the ability to have whatever package names they want, without polluting the "top-level namespace" of the General registry, and without having multiple registries.

So in my EM.jl example, maybe the EM.jl package is maintained by an organization called JuliaStatistics. So then they will register the package in the JuliaStatistics namespace. And then you can add the package by doing ] add JuliaStatistics/EM. But you cannot add the package by doing ] add EM.

And then later, someone else writes a package for solving problems in electricity and magnetism. They call this package EM.jl. The package is maintained by an organization called JuliaPhysics. Now they want to register their package EM.jl. No problem! They can register it in the JuliaPhysics namespace. And then you can add the package by doing ] add JuliaPhysics/EM. But you cannot add the package by doing ] add EM.

@DilumAluthge
Member Author

I just meant that we can try harder to "pattern match" against info that is in the registry. add fredrikekre/Literate could be made to work, for example; it would be found from the repo URL.

Personally I think this has the potential to get confusing. What if there is a GitHub user named johnsmith and a GitLab user named johnsmith, but those are two different humans. Now I do ] add johnsmith/Foo and I don't really know which johnsmith I will get.

But anyway, this sort of URL pattern matching doesn't get to one of the core issues, which is that we want to register names that do not pollute the top-level namespace of the General registry.

@DilumAluthge
Member Author

🤷 it is their loss really if they want to use such regular names.

Actually, I think it is our loss 😂, because as @StefanKarpinski said on Slack, if it isn't typo-squatting and it isn't offensive, then we let the package author choose the name. If the package author insists on a "bad" name, we still merge it.

So it's our loss in that we end up with these "bad" package names polluting the top-level namespace of the registry.

@fredrikekre
Member

I don't understand these arguments. There will still be a package named EM in the registry. There will be a package you can load with using EM. Then, why not let users add it by add EM if that is unambiguous? Just seems unfriendly to users to not let them.

Personally I think this has the potential to get confusing. What if there is a GitHub user named johnsmith and a GitLab user named johnsmith, but those are two different humans. Now I do ] add johnsmith/Foo and I don't really know which johnsmith I will get.

Yes, because Pkg won't just blindly add a package if the specification is ambiguous.

So it's our loss in that we end up with these "bad" package names polluting the top-level namespace of the registry.

But how is that a loss for us as registry maintainers? It will just be a loss for the package maintainer in the sense that people will not find the package as easily.

@DilumAluthge
Member Author

I don't understand these arguments. There will still be a package named EM in the registry. There will be a package you can load with using EM. Then, why not let users add it by add EM if that is unambiguous? Just seems unfriendly to users to not let them.

So that's fine for the first organization to register the name EM. But what if a second organization comes along and wants to register the name EM. They can't have it because it's taken. Wouldn't it be nice to let them register MyOrganization/EM?

@DilumAluthge
Member Author

Last I checked, we don't actually allow two packages in the General registry to have the same name. Am I mistaken?

@EricForgy

Thanks @DilumAluthge and thanks @fredrikekre for thoughts and feedback 🙌

From the above, I can tell the issue is understood and hope it can be taken seriously (I haven't registered a package on General for ages because of it). Naming is important. A name that isn't appropriate in a general context could be perfectly appropriate in the context of a namespace. So I hope we are past the question of whether disambiguating is a good thing and focus on how to do it 🙏

In my first attempt at namespacing (#1064), I went too far. The namespace actually pointed to a different registry. I can still imagine a time when we could allow registries to depend on other registries, e.g. #1072, #1791, but that is a bigger question than we are addressing with this issue. In this issue, we are trying to allow namespacing within one registry, namely General.

If I understand correctly, Stefan and Fredrik are both suggesting disambiguating along the lines of #1071. For example, we already know the repo URL, which is typically of the form:

https://githostingservice/UserOrOrgName/PackageName.jl

so we can just parse that if we need a tie breaker on the name. For example,

pkg> add OrgName/Package.jl

and it can match that against the repo URL. For sure, that is possible, but what Dilum (and I) are proposing here is a little more robust, while at the same time, not requiring much more effort.
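For comparison, a minimal sketch of the URL-matching approach in the spirit of #1071, assuming repo URLs shaped like the one above. `repo_owner_and_name` and `match_owner` are hypothetical (not Pkg's API), and the `johnsmith` repositories are made up; the regular expression strips an optional `.jl` and `.git` suffix.

```julia
# Hypothetical sketch (not Pkg's API) of matching `Owner/Name` against repo
# URLs, in the spirit of #1071. Assumes URLs shaped like
# https://<host>/<UserOrOrgName>/<PackageName>[.jl][.git]
function repo_owner_and_name(url::AbstractString)
    m = match(r"^https?://[^/]+/([^/]+)/([^/]+?)(?:\.jl)?(?:\.git)?/?$", url)
    m === nothing && return nothing
    return (owner = String(m.captures[1]), name = String(m.captures[2]))
end

# All registered repo URLs whose owner and package name both match.
function match_owner(repos::Vector{String}, owner::AbstractString, name::AbstractString)
    return filter(repos) do url
        parsed = repo_owner_and_name(url)
        parsed !== nothing && parsed.owner == owner && parsed.name == name
    end
end

repos = [
    "https://github.com/fredrikekre/Literate.jl.git",
    "https://github.com/johnsmith/Foo.jl.git",   # hypothetical
    "https://gitlab.com/johnsmith/Foo.jl.git",   # hypothetical, a different person
]

match_owner(repos, "fredrikekre", "Literate")  # one match
match_owner(repos, "johnsmith", "Foo")         # two matches: still ambiguous
```

Because the host is not part of the `Owner/Name` key, `johnsmith/Foo` matches repositories on two different hosts, which is the GitHub/GitLab ambiguity raised earlier in the thread; Pkg would then have to prompt or refuse rather than guess.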

As a concrete example, the next two packages I would want to register on General are:

  • Instruments.jl
  • Positions.jl

I imagine getting pushback on both names because they are too generic for General, yet are perfectly reasonable for JuliaFinance (and I really do not want to prefix the package names with "Financial***").

Fredrik, you make a good point. Just because a package can be disambiguated with a namespace shouldn't mean that you can't add it without the namespace if it isn't ambiguous. My draft PR accommodates both. For example, another package I might register is called XBRL.jl. There is no other name I would consider for such a package, and it is clear within the context of JuliaFinance what it is. Nevertheless, I think it is unlikely anyone outside JuliaFinance would register a package called XBRL, so we should be able to simply do:

pkg> add XBRL

even though

pkg> add JuliaFinance/XBRL

is also fine. I'm onboard with this idea 👍

In this way, if there is a clash, instead of having an interactive selection, we can simply suggest something along the lines of:

Error: The name `FX` is ambiguous. Please add with either:

pkg> add Animators/FX

or

pkg> add JuliaFinance/FX
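A hypothetical helper that builds the suggested message from the namespaces containing the requested name; `ambiguity_message` is illustrative only, not an existing Pkg function.

```julia
# Hypothetical: build the suggested error text, one `pkg> add` line per
# namespace that contains a package with the requested name.
function ambiguity_message(name::AbstractString, namespaces::Vector{String})
    lines = ["The name `$name` is ambiguous. Please add with either:"]
    for (i, ns) in enumerate(sort(namespaces))
        i > 1 && push!(lines, "or")
        push!(lines, "pkg> add $ns/$name")
    end
    return join(lines, "\n")
end

println(ambiguity_message("FX", ["JuliaFinance", "Animators"]))
```

Sorting the namespaces keeps the suggestion stable from one run to the next; with three or more namespaces the "either ... or" wording would need adjusting.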

As for whether we parse the repo URL or add a more robust namespace: I lean toward namespaces, but I see that either can be done with a similar (small) amount of effort. It is almost finished already 😊

I'll try to get a PR up later today and that will make it easier to pick at 🙏

@DilumAluthge
Member Author

So, I tried to register a package named Literate.jl and I got the error "changing UUIDs is not allowed."

https://github.com/DilumAluthge/Literate.jl/commit/1f2d06300df93e113dcac5795c9a1cba90018f27

So we currently do not allow you to register a package with the same name as an existing registered package.

@DilumAluthge
Member Author

So then how does pattern matching the URLs help anything?

If someone has registered a package named EM, then no one else can register another package named EM.

@DilumAluthge
Member Author

Personally, I think it would be SO WEIRD if on Monday, this works:
] add Flux
And then on Tuesday, the exact same command gives an ambiguity message.
It just seems to me like a weird thing to do.

True. So in the README, they can give instructions to do

pkg> add FluxML/Flux

if someone else registers a package named "Flux", or the new package could specify in its README to use:

pkg> add Electromagnetics/Flux

and FluxML would be unaffected. These are good points, but ultimately solvable with some thought 👍

But that is not correct. If someone registers a package with the name Flux, then FluxML will be affected no matter what. As soon as there are two packages with the name Flux, then ] add Flux will never ever work.

@DilumAluthge
Copy link
Member Author

I just don't understand why having two packages with the same name is somehow more desirable than just having proper namespaces.

@EricForgy

Maybe if someone tries to register a package with the same name as an already registered package, they are required to use namespace and then it is their problem if their users pull the wrong package. They need to specify clear instructions to use namespace when adding.

I just don't understand why having two packages with the same name is somehow more desirable than just having proper namespaces.

I don't see these as mutually exclusive. You can have both. Best if I can get a PR up. You will see what I mean.

@DilumAluthge
Member Author

Think about how many people have something like Pkg.add("Documenter") in their .travis.yml.

If I register my own package with the name Documenter, then I will break the CI and docs build for every single one of those packages.

@EricForgy

Think about how many people have something like Pkg.add("Documenter") in their .travis.yml.

If I register my own package with the name Documenter, then I will break the CI and docs build for every single one of those packages.

There is no chance we would allow that to happen 👍

@DilumAluthge
Member Author

Think about how many people have something like Pkg.add("Documenter") in their .travis.yml.
If I register my own package with the name Documenter, then I will break the CI and docs build for every single one of those packages.

There is no chance we would allow that to happen 👍

But if we allow multiple packages with the same name, by definition we would allow that to happen.

@DilumAluthge
Member Author

Maybe if someone tries to register a package with the same name as an already registered package, they are required to use namespace

This is exactly what I want.

But that's not the solution that e.g. @fredrikekre is suggesting.

@EricForgy

But if we allow multiple packages with the same name, by definition we would allow that to happen.

Not necessarily. I will show you. Be patient 🙏

@DilumAluthge
Copy link
Member Author

But if we allow multiple packages with the same name, by definition we would allow that to happen.

Not necessarily. I will show you. Be patient 🙏

Alright, I'll wait for your PR!

@EricForgy

Maybe if someone tries to register a package with the same name as an already registered package, they are required to use namespace

This is exactly what I want.

But that's not the solution that e.g. @fredrikekre is suggesting.

We can have both. I agree with both of you 👍

@GunnarFarneback
Contributor

I'm not particularly interested in this subject but since I got dragged into it...

If we don't get support for namespaces as discussed in this issue, then someone needs to add support for "multiple packages with the same name" to things like:
...
LocalRegistry (cc @GunnarFarneback)

If people can't organize their own local registries with unique package names, I suspect they have bigger problems than the tooling.

@DilumAluthge
Member Author

DilumAluthge commented May 25, 2020

I'm not particularly interested in this subject but since I got dragged into it...

If we don't get support for namespaces as discussed in this issue, then someone needs to add support for "multiple packages with the same name" to things like:
...
LocalRegistry (cc @GunnarFarneback)

If people can't organize their own local registries with unique package names, I suspect they have bigger problems than the tooling.

That's a great point. There isn't any reason we need to have multiple packages with the same name in local (private) registries.

So I don't think we need to add support for multiple packages with the same name to LocalRegistry.jl.

@fredrikekre
Member

So that's fine for the first organization to register the name EM. But what if a second organization comes along and wants to register the name EM. They can't have it because it's taken.

But that's why we have UUIDs; I don't think we need to add another layer of disambiguation.

Last I checked, we don't actually allow two packages in the General registry to have the same name. Am I mistaken?

I don't think any decision has been made on this, since it has not come up before.

As a concrete example, the next two packages I would want to register on General are:

Instruments.jl
Positions.jl

I imagine getting pushback on both names because they are too generic for General, yet are perfectly reasonable for JuliaFinance (and I really do not want to prefix the package names with "Financial***").

People can use whatever names they want -- pushback is usually just a suggestion. However, I would argue that naming something Instruments is shooting yourself in the foot, and FinancialInstruments is 100 times better.

In this way, if there is a clash, instead of having an interactive selection, we can simply suggest something along the lines of:

You already get prompted in case of multiple packages.

So, I tried to register a package named Literate.jl and I got the error "changing UUIDs is not allowed."

If someone has registered a package named EM, then no one else can register another package named EM.

That is a limitation/bug of Registrator.

otherwise we'd need things like C/Currencies1, C/Currencies2, which obviously sucks

Why does that suck? This is never seen by users.

Personally I don't want the General registry to have multiple packages with the same name. I think that is confusing. I would rather have namespaces.

It will be the same level of confusion IMO, there will still be multiple packages with the same name.

Personally, I think it would be SO WEIRD if on Monday, this works:

Why? This is an interactive REPL and you will just have to select the package you want.

So in the README, they can give instructions to do

If you are worried about this, your instructions should be

using Pkg
Pkg.add(name="Literate", uuid = "...")

Think about how many people have something like Pkg.add("Documenter") in their .travis.yml.

Well, that is not the recommended approach so hopefully people don't do this. It is recommended to use a project file where both name + uuid are already specified so it will not cause any problems.

@felipenoris
Contributor

felipenoris commented May 29, 2020

Just to add my opinion, this feature is quite useful, but does not replace the proposition at #1791. Both propositions could be combined.

This proposition #1836 would allow for a better organization of packages. Many Julia organizations could file for a NameSpace. This would solve the "package naming" issue, where, for instance, it would be silly to register "Countries.jl" from "JuliaFinance" directly into the general registry. But, all packages inside "JuliaFinance" NameSpace would still need to pass the "General" registry governance: approvals and name conventions (maybe).

If this gets combined with #1791, a registry repo could be registered into the General registry as a NameSpace. Why is this useful? In one word: governance. If, for instance, the registry repo at "JuliaFinance" gets registered into General, then we don't need to register every single package under "JuliaFinance" into the "General" registry. Also, "JuliaFinance" owners would be responsible for approvals when updating the "JuliaFinance" registry repo after the NameSpace gets approved by General repo owners. This is the value I see in the proposal at #1791. But this could be done as a second step after this issue #1836.

@EricForgy

Hi @felipenoris 👋

Thanks for your thoughts on the subject 🙌

Just to add my opinion, this feature is quite useful, but does not replace the proposition at #1791. Both propositions could be combined.

I got pulled into some other things that are higher priority at the moment, but yes, the idea I have in mind combines elements of both #1791 and this issue. I am cautiously optimistic and think having a PR to reference will aid the conversation, but it might take me a few days 🙏

@StefanKarpinski
Sponsor Member

So the idea is to use #1791 to sidestep governance issues? I don’t think that makes sense from any perspective. Technically, it feels a bit silly: do we really need yet another layer of indirection? What’s next? A registry of registries of registries? But from a governance perspective it also doesn’t make sense: if something is visible by default with a standard Julia install, then the Julia project must have governance over it. Having the technical ability to make other registries visible from General is not the issue. If we had such a feature, then the maintainers of Julia itself would have the same responsibility to users to make sure that packages in those available-by-default registries meet the same (lightweight) standards as General. So either Julia’s maintainers have enough control over those external registries to ensure it is also reasonably safe and correct — in which case why bother with a separate registry? — or we lack such control and it’s totally irresponsible to delegate a huge portion of the default namespace to external groups without being able to ensure that.

In short, hard no on #1791.

Namespaces, on the other hand, seem plausible since the argument that you want to have Instruments mean something specific in a financial context without having to prefix with “Financial” all the time seems reasonable.

I do, however, think the focus here on how to organize the registry and file names is misplaced. Worry about that last—it doesn’t even matter from Pkg’s perspective. It’s only a problem for the tooling that manages registries. Focus instead on how these namespaces are supposed to work from a UI perspective and figure out a clear semantics for them.

Before getting general buy in for semantics and UI, doing all the work to implement a PR seems premature. It is, however, a fine way to explore the UI if you’re ok with the possibility that you might need to throw away that work if there isn’t general agreement on the approach.

@EricForgy

EricForgy commented May 30, 2020

Thanks for chiming in @StefanKarpinski . Your opinion obviously matters so I'm glad you're here.

First comment, I think you know Felipe and I both work in finance. Finance is a highly regulated industry. If Felipe and I are talking about registries depending on other registries, rest assured, the reason is not to loosen governance. If anything, it is to have better governance, but that is not the point of this issue. This issue has a much cleaner focus, i.e. have first-class support for namespaces in registries with a focus on General.

Like Felipe, I once thought having a separate registry would be good for JuliaFinance and that having it play nicely with General would be a win-win, so as an initial step, I implemented a PR with namespaces in #1064. That was pretty trivial: since we keep an array of all registries in .julia/registries, the namespace simply told you which one to use for finding the correct UUID. That experience was, unfortunately, a bit unpleasant for me, and I haven't registered anything on General since then (and had no plans to ever do so again until some others in the community needed some functionality that, for their project, was better placed in General), so I'm back trying to be productive.

With feedback from others (thanks @alecloudenback and @ScottPJones), I've come around, and now I want JuliaFinance packages in General, but only once we have namespaces, for the reasons mentioned, which I think you understand.

One thing that is different now is that we have things like LocalRegistry.jl which seems to make it easier to manage local registries than last time I tried it.

Before getting general buy in for semantics and UI, doing all the work to implement a PR seems premature.

Sure. So my idea starts with @DilumAluthge's idea of introducing a new namespaces folder to General, but the trick that I am cautiously optimistic can make this work fairly painlessly is that this folder is actually a folder of complete local registries (contents and all, not just UUIDs).

When Pkg updates, it loads General from .julia/registries but also looks for any additional registries in .julia/registries. The idea is to extend this search by having it also look in .julia/registries/General/namespaces and push these to that same vector one by one. At that point, we can use my trick in #1064 to simply use the namespace to select which registry in that vector to use for finding the UUID. The difference is that some of the registries are actually registered / embedded directly into General, so that other packages in General can depend on them and they are subject to the same governance. I see this as a win-win.
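A minimal sketch of the lookup described above, assuming a toy in-memory model rather than Pkg's real registry code: `RegistrySketch`, `collect_registries` and `find_uuid` are hypothetical names, and the UUIDs are placeholders. Top-level registries and the subregistries embedded under General/namespaces end up in one vector, and the namespace (if any) selects which registry is searched.

```julia
# Toy stand-in for a registry (hypothetical, NOT Pkg's real registry type):
# a registry name plus a package-name => UUID table.
struct RegistrySketch
    name::String
    packages::Dict{String,Base.UUID}
end

# Mirror the proposal: start from the registries found in .julia/registries,
# then append the subregistries embedded under .julia/registries/General/namespaces.
function collect_registries(toplevel::Vector{RegistrySketch},
                            embedded::Vector{RegistrySketch})
    regs = copy(toplevel)
    append!(regs, embedded)
    return regs
end

# Use the namespace to choose which registry to search (the #1064 idea);
# `nothing` stands for the default "no namespace" namespace, i.e. General itself.
function find_uuid(regs::Vector{RegistrySketch}, name::AbstractString;
                   namespace::Union{Nothing,AbstractString} = nothing)
    target = namespace === nothing ? "General" : namespace
    for reg in regs
        reg.name == target && return get(reg.packages, name, nothing)
    end
    return nothing  # unknown namespace
end

# Placeholder UUIDs, for illustration only.
general = RegistrySketch("General",
    Dict("Flux" => Base.UUID("00000000-0000-0000-0000-000000000001")))
maxwell = RegistrySketch("MaxwellLaboratory",
    Dict("Flux" => Base.UUID("00000000-0000-0000-0000-000000000002")))
regs = collect_registries([general], [maxwell])
```

With no namespace, `find_uuid(regs, "Flux")` searches General; `find_uuid(regs, "Flux"; namespace = "MaxwellLaboratory")` searches the embedded subregistry instead.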

This brings up a related issue. There are already a ton of packages registered on General, and these should have priority. So for this, I suggest that registering on General stakes a claim to that unqualified name. So if I want to register Flux.jl for my electromagnetics lab, the unqualified name Flux is already claimed by the FluxML organization, so I must qualify it via

pkg> add MaxwellLaboratory/Flux

In other words, I suggest that "no namespace" is a namespace and no two packages can have the same name in the same namespace.

If

pkg> add Flux

works today, it should work always. If I want to register a new Flux.jl on General, I can do that, but anyone who wants to use my Flux will need to qualify the name with my registered namespace since Flux is already taken in the "no namespace" namespace.
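One way to read the rule that "no namespace" is itself a namespace is to key every registration by a (namespace, name) pair. A minimal sketch under that assumption; `register!`, `resolve_name`, the `""` convention for the default namespace, and the second Flux URL are all hypothetical:

```julia
# Toy model (hypothetical, not Pkg's or General's real layout): registrations are
# keyed by (namespace, name); "" stands for the default "no namespace" namespace.
const NamespacedName = Tuple{String,String}

# Register a package; fails only if the name is already taken in *that* namespace.
function register!(db::Dict{NamespacedName,String}, namespace::String,
                   name::String, repo::String)
    key = (namespace, name)
    haskey(db, key) && error("$name is already registered in namespace \"$namespace\"")
    db[key] = repo
    return db
end

# An unqualified `add Name` only looks in the default namespace, so it keeps
# resolving to whoever registered Name there first.
resolve_name(db::Dict{NamespacedName,String}, name::String) = get(db, ("", name), nothing)
resolve_name(db::Dict{NamespacedName,String}, namespace::String, name::String) =
    get(db, (namespace, name), nothing)

db = Dict{NamespacedName,String}()
register!(db, "", "Flux", "https://github.com/FluxML/Flux.jl.git")
# Hypothetical second Flux registered under a namespace.
register!(db, "MaxwellLaboratory", "Flux", "https://example.com/MaxwellLaboratory/Flux.jl.git")
```

Since an unqualified lookup only consults the default namespace, registering a second Flux under MaxwellLaboratory cannot change what a plain `add Flux` resolves to.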

"Registering" a namespace on General would then mostly just mean having a full local subregistry under its namespaces folder. PRs to any subregistry would be reviewed with the same level of governance as we have today since you control General.

An added benefit of this is that it will play nicely with local private registries. Currently, we support private registries, but your local registry cannot have a package with the same name as a package registered on General (without interactive disambiguation, which won't play nicely with containers, etc.). So if I have a private registry and name my package Foo, and someone comes along later and registers Foo on General, I will have a problem. This solves that issue for private registries too, because now I can qualify with my private registry name:

pkg> add SecretLab/Flux

and it will know to use my private registry (and we can add warnings, etc., if needed for namespaces taking you out of General). I don't want to get too distracted, but this could be completely recursive, allowing for things like:

pkg> add Microsoft/Azure/AzureFunctions/Containers

which could be attractive for large corporations.
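A sketch of the fully recursive variant, assuming each embedded subregistry may itself embed further subregistries. `NestedRegistry` and `resolve_path` are hypothetical, as are the Containers entry and its UUID; nothing here reflects how any real organization publishes packages.

```julia
# Toy nested registry (hypothetical): each level has its own packages and may
# embed further subregistries, giving the fully recursive variant of the idea.
struct NestedRegistry
    packages::Dict{String,String}               # package name => UUID string
    subregistries::Dict{String,NestedRegistry}  # namespace => embedded registry
end
NestedRegistry() = NestedRegistry(Dict{String,String}(), Dict{String,NestedRegistry}())

# Resolve "A/B/.../Pkg": walk the namespace path, then look up the final name.
function resolve_path(root::NestedRegistry, spec::AbstractString)
    parts = split(spec, '/')
    reg = root
    for ns in parts[1:end-1]
        haskey(reg.subregistries, ns) || return nothing
        reg = reg.subregistries[ns]
    end
    return get(reg.packages, parts[end], nothing)
end

# Build Microsoft/Azure/AzureFunctions with a hypothetical Containers entry.
funcs = NestedRegistry()
funcs.packages["Containers"] = "00000000-0000-0000-0000-0000000000aa"
azure = NestedRegistry(); azure.subregistries["AzureFunctions"] = funcs
microsoft = NestedRegistry(); microsoft.subregistries["Azure"] = azure
root = NestedRegistry(); root.subregistries["Microsoft"] = microsoft
```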

I would like to register JuliaFinance stuff on General, but I need namespaces for that. I've already worked out a draft PR to grab the namespace from both the Pkg REPL and the Pkg API, and it is non-breaking. Next, I would combine that with #1064, but tell Pkg to look not only in .julia/registries but also in .julia/registries/General/namespaces.

If we can get behind this idea of namespaces AND the rule that no two packages can have the same name in the same namespace where the default "no namespace" is considered a namespace, then that can simplify some things in Pkg and I'm happy to do the heavy lifting to get that to work since this is so important to me.

Thanks again for your consideration 🙏

@felipenoris
Contributor

felipenoris commented May 30, 2020

@StefanKarpinski, thanks for your reply. I respect your view, and I was happy to close #1791 given that the namespace feature got better acceptance. I would just like to point out that associating my proposal with "sidestep governance" is a strong statement. For me it is more of a delegation, which often occurs when you organize people into groups of interest or domains. We could debate and disagree on whether it is silly or technically inferior to namespacing, and I get that. But all the ideas we discuss are for the good of this community we care so much about, not to damage it.

@StefanKarpinski
Sponsor Member

Perhaps I misunderstood when you wrote:

Why is this useful? In one word: Governance. If, for instance, the registry repo at "JuliaFinance" gets registered into General, then we don't need to register every single package under "JuliaFinance" into the "General" registry. Also, "JuliaFinance" owners would be responsible for approvals when updating the "JuliaFinance" registry repo after the NameSpace gets approved by General repo owners.

To me that sounds as though, when JuliaFinance is registered with General, General loses governance over its own namespace, ceding control over a portion of it to JuliaFinance. If registries that are registered in General are also managed by the maintainers of the Julia project as a whole, then delegating like this would be ok, since it would have no effect, but having Julia maintainers manage more registries seems like more work, not to mention complexity, to what end?

If the goal is for checks on JuliaFinance packages to be more stringent than on General, then that's a viable position, but I think that would be better served by having a system where various entities can perform different kinds of checks and vouch for arbitrary subsets of a registry, via some form of trust metadata. In other words, it's an orthogonal concern to splitting up registries.

@EricForgy

Please 🙏 This proposal has nothing to do with governance. It is about namespaces. I would like to register packages on General, but can only do so with names that make sense once we have namespaces so I hope to be proactive to help that become a reality. I felt we had some momentum.

I did spend some time both writing code to make sure it makes sense and explaining it the best I could. I think what I describe is a clean solution to namespaces that leverages recent developments for managing local registries. If you have feedback on the proposal, that would be great; I'm happy to incorporate better ideas and to put a PR together. What do you think? Should I drop it? Does this issue have a chance?

@DilumAluthge
Member Author

Think about how many people have something like Pkg.add("Documenter") in their .travis.yml.

Well, that is not the recommended approach so hopefully people don't do this. It is recommended to use a project file where both name + uuid are already specified so it will not cause any problems.

Okay, how about this:

Pkg.add("Coverage")

Everyone has this in their .travis.yml files. Are we really going to tell everyone that they need to change all of their Travis files?

@EricForgy

In my proposal, no one will need to change any .travis.yml files.

@StefanKarpinski
Sponsor Member

Yeah, sorry. Didn’t mean to sidetrack this, I was just addressing the part where #1791 got brought into it and then replying to responses.

@EricForgy

If I could sum up my proposal (TL;DR), I'd say it boils down to:

  • Adding namespace to PackageSpec, which is non-breaking because PackageSpec is built with @kwdef (done ✔️)
  • Adding some code to splice the command to pull out namespace from add Namespace/Package (done ✔️)
  • Adding namespaces directory to General that will house subregistries (not yet done ❌)
  • Adding any registries found in .julia/registries/General/namespaces to the existing array of registries obtained from .julia/registries (not yet done ❌)
  • Using namespace to grab just the corresponding embedded subregistry and using that to find the UUID, similar in nature to "[POC] Add scoping, e.g. @MyRegistry/MyPackage" #1064, but staying within General with no external registries (not yet done ❌)
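The first two items above can be sketched as follows. `PackageSpecSketch` is a hypothetical stand-in, not Pkg's real `PackageSpec`, and `parse_spec` is an illustrative parser rather than the draft PR's code; the point is that a new field with a default value in a `Base.@kwdef` struct does not break existing keyword construction, and that `Namespace/Package` splits cleanly at the last `/`.

```julia
# Hypothetical stand-in for Pkg's PackageSpec (NOT the real type). Because the
# struct is built with Base.@kwdef, adding a `namespace` field with a default
# leaves every existing keyword-style construction working unchanged.
Base.@kwdef mutable struct PackageSpecSketch
    name::Union{Nothing,String} = nothing
    uuid::Union{Nothing,Base.UUID} = nothing
    namespace::Union{Nothing,String} = nothing  # nothing = "no namespace"
end

# Split the argument of `add Namespace/Package` (or a plain `add Package`).
# Everything before the last '/' is the namespace, so nested paths also work.
function parse_spec(arg::AbstractString)
    i = findlast(isequal('/'), arg)
    i === nothing && return PackageSpecSketch(name = String(arg))
    return PackageSpecSketch(name = String(arg[nextind(arg, i):end]),
                             namespace = String(arg[1:prevind(arg, i)]))
end

spec = parse_spec("MaxwellLaboratory/Flux")
```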

I am thinking we can point LocalRegistry.jl to .julia/registries/General/namespaces instead of .julia/registries, so we can leverage that tool to manage the subregistries until the rest of the tooling catches up, and submit PRs manually to General as a stopgap.

@fredrikekre
Member

Currently, we support private registries, but your local registry cannot have a package with the same name as a package registered on General (without interaction which won't play nicely with containers, etc.). So if I have a private registry and name my package Foo and someone comes along later and registers Foo on General, I will have some problem.

What? Then you are not doing things correctly.

Okay, how about this:

Pkg.add("Coverage")

Everyone has this in their .travis.yml files. Are we really going to tell everyone that they need to change all of their Travis files?

Yes.


We already have a way to disambiguate packages, and that is by UUID. I don't understand why we need a second way to "kinda specify" the package when we have an absolute way of doing it.

In particular, what happens when I want to use the namespace JuliaFinance? Should we add yet another indirection to disambiguate namespaces?

@EricForgy

EricForgy commented May 31, 2020

I don't understand why we need a second way to "kinda specify" the package when we have an absolute way of doing it.

Here, you also asked:

Why do we need namespaces when we have UUIDs?

Dilum responded here:

I think that ] add Foo/Bar is much more user friendly than adding a package by UUID, right?

By that same argument, why do we have package names when we have package UUIDs? For user convenience.

You replied:

I mean, adding by name is just a key to finding the UUID. We can add user/package as another key to that same UUID without having to mess with namespaces etc.

Then we danced around a bit until Stefan said:

Namespaces, on the other hand, seem plausible since the argument that you want to have Instruments mean something specific in a financial context without having to prefix with “Financial” all the time seems reasonable.

This ☝️ is the answer to your question. Please accept this answer 🙏

I hope we can get past the question of whether namespaces are useful (they are) and start focusing on whether and, if so, how to implement them.

I see two options:

Option 1

Implement #1071. This allows multiple packages with the same name to be registered in General and we disambiguate them based on "user / org" which is already available in the registry.
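For comparison, a minimal sketch of Option 1, assuming the owner is read off each entry's repository URL. `Entry`, `repo_owner` and `candidates` are hypothetical, and the second Flux repository and both UUIDs are placeholders:

```julia
# Toy registry entries (hypothetical): under Option 1 several packages may share a
# name, and each entry's repo URL already carries the GitHub user/organization.
struct Entry
    name::String
    uuid::String
    repo::String
end

# Extract "Owner" from URLs shaped like https://host/Owner/Repo.jl.git
function repo_owner(url::AbstractString)
    m = match(r"^https?://[^/]+/([^/]+)/", url)
    return m === nothing ? nothing : String(m.captures[1])
end

# Unqualified lookups return every same-named entry; `owner` narrows them down.
function candidates(entries::Vector{Entry}, name::AbstractString;
                    owner::Union{Nothing,AbstractString} = nothing)
    hits = filter(e -> e.name == name, entries)
    owner === nothing && return hits
    return filter(e -> repo_owner(e.repo) == owner, hits)
end

entries = [
    Entry("Flux", "00000000-0000-0000-0000-000000000001",
          "https://github.com/FluxML/Flux.jl.git"),
    # Hypothetical second Flux registered later by another organization.
    Entry("Flux", "00000000-0000-0000-0000-000000000002",
          "https://github.com/MaxwellLaboratory/Flux.jl.git"),
]
```

The unqualified lookup returning two hits is exactly the concern raised next: once a second Flux exists, a plain `add Flux` becomes ambiguous.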

This would be ok, but it isn't obvious to me how to protect packages that were already registered that should somehow, I think, be given priority. In my example above, if I register Flux under the MaxwellLaboratory organization, now everyone who uses Flux for machine learning will suddenly be required to disambiguate. The timing of these breaks would be unpredictable and could conceivably happen during critical times, e.g. right in the middle of someone giving a demo. It also isn't obvious to me that this would be less work to implement than adding first-class support for namespaces.

If Option 1 is what we want, we can make that work and I'm happy to help implement that.

Option 2

Implement this issue, i.e. first-class support for namespaces.

As I envision it, this can be implemented almost entirely using existing tools. In this approach, a namespace would be a full subregistry embedded into General under a namespaces folder. These subregistries could be managed by "owners" using tools such as LocalRegistry.jl today until tools like Registrator can be updated to handle these subregistries. Revisions to subregistries would be submitted as PRs to General and reviewed as usual with the same (and possibly more) governance.

Implementing first-class support for namespaces this way comes with multiple advantages. For one, once we have proper first-class namespaces, we can introduce the reasonable rule:

Package names should be unique within a given namespace.

This would be a non-breaking rule (since there are currently no duplicate package names in General). This would also protect package maintainers who have already registered package names in General as it ensures that

pkg> add Flux

will always and forever grab the FluxML version without the need to disambiguate because they are already registered in the default "no namespace" namespace of General.

The rule would also lead to a significant simplification in the implementation of Pkg. I can almost guarantee a reduction in lines of code with minimal changes to remaining code.

Decision needed

The first thing we need to decide is:

Do we want namespaces?

If it is a flat out "No", then we can stop the discussion now and I'll try to find another way to proceed with JuliaFinance outside of General, which is not the end of the world, but unfortunate (for all of us I think).

If the answer is a "Maybe, it depends on the implementation", then I am happy to implement a PR for review myself.

Then the question becomes, "Which option do we want to implement?"

My preference would be to implement Option 2: first-class support for namespaces and I am happy to submit a working PR, but I'm also happy to help with Option 1 if we decide to go that way.

From a user (and package maintainer) perspective, Option 2 should introduce absolutely no change for existing packages in General. This would only affect package maintainers who want to register packages in General with the same name as existing packages and/or who may prefer namespaces for other reasons such as branding, etc. In these cases, the package maintainer would need to register a namespace (i.e. submit a PR adding a subregistry to the namespaces folder of General), and users of the new package would then be required to disambiguate using the namespace, which protects the existing package with the same name.

@StefanKarpinski, at the end of the day, I think this is your call, so please consult with anyone you think should have a say on this and let me know what you think. I have some projects going on and need to make some decisions how to proceed 🙏

Edit: Btw, on Slack, Stefan mentioned the possibility of introducing this as an experimental feature, which would be totally fine with me. More than fine. It would be awesome 👍

@StefanKarpinski
Sponsor Member

Some notes from discussion of this issue on the pkg-dev call today. Namespacing mechanisms are fundamentally about preventing/disambiguating name collisions. However, to a large extent Pkg already handles name collisions gracefully because of UUIDs:

  • It's perfectly fine to have packages with the same name in different registries. This already works: if there's an Instruments package in General that's about music and an Instruments package in JuliaFinance too and you have both registries added, and you do pkg> add Instruments it will prompt you for which one you want to add.

  • If you don't want an interactive prompt, you can use add Instruments=55efc701-121b-4840-9663-6fa785ef03be to add a package by its UUID, completely eliminating any ambiguity.
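A toy model of today's behavior as described above, reusing the two example UUIDs from this thread. This is a hypothetical illustration, not Pkg's actual resolution code: `registries` and `matching_uuids` are made-up names.

```julia
# Toy model (hypothetical, not Pkg's implementation): each added registry maps
# package names to UUIDs. Both registries define a package called Instruments.
registries = Dict(
    "General"      => Dict("Instruments" => "be17f318-eccd-4180-afc1-ec5a11373fef"),
    "JuliaFinance" => Dict("Instruments" => "55efc701-121b-4840-9663-6fa785ef03be"),
)

# Return every UUID that `arg` could mean. A bare "Name" may match several
# registries (where Pkg would prompt); "Name=UUID" pins exactly one.
function matching_uuids(registries, arg::AbstractString)
    if occursin("=", arg)
        name, want = split(arg, '='; limit = 2)
    else
        name, want = arg, nothing
    end
    hits = String[]
    for pkgs in values(registries)
        u = get(pkgs, name, nothing)
        u === nothing && continue
        (want === nothing || u == want) && push!(hits, u)
    end
    return hits
end
```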

There are only two places where any kind of further disambiguation might be helpful:

  • When doing add Instruments, it might be nicer to write add Finance/Instruments instead of having to spell out a long, unmemorable UUID.

  • Someone could write a project that needs to use both Finance/Instruments and Music/Instruments, which we currently cannot handle.

One solution to the second problem might be to implement JuliaLang/julia#33047 and decouple the name by which a project refers to a dependency from its canonical name. Then, when someone tries to add the second Instruments package to the project file, they would be prompted for an alternate name for it, or maybe we could support pkg> add Finance/Instruments as FinanceInstruments. But that's a bit unsatisfying because Finance/Instruments is already a good unambiguous name. using Finance/Instruments is not a syntax error, but it's unlikely enough to have been used that I think we can use it if we want to.
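A sketch of the aliasing idea, assuming the floated `as` syntax and a project-level table from local name to UUID. `parse_add` and `add_dep!` are hypothetical helpers; neither the syntax nor the helpers exist in Pkg today:

```julia
# Hypothetical parser for the floated `add Finance/Instruments as FinanceInstruments`
# form. Returns (requested spec, local name the project would use for it).
function parse_add(arg::AbstractString)
    parts = split(arg)  # split on whitespace
    if length(parts) == 3 && parts[2] == "as"
        return (String(parts[1]), String(parts[3]))
    elseif length(parts) == 1
        # Without `as`, the local name defaults to the last path component.
        return (String(parts[1]), String(last(split(parts[1], '/'))))
    end
    error("expected `Spec` or `Spec as LocalName`, got: $arg")
end

# Record a dependency under its local name, refusing a clash within the project
# (the point at which the user would be asked for an alternate name).
function add_dep!(deps::Dict{String,String}, localname::AbstractString, uuid::AbstractString)
    haskey(deps, localname) && error("$localname is already used by another dependency")
    deps[String(localname)] = String(uuid)
    return deps
end

deps = Dict{String,String}()
add_dep!(deps, "Instruments", "be17f318-eccd-4180-afc1-ec5a11373fef")
spec, localname = parse_add("Finance/Instruments as FinanceInstruments")
add_dep!(deps, localname, "55efc701-121b-4840-9663-6fa785ef03be")
```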

Another issue that was discussed was whether disambiguators should be hierarchical (namespaces) or more like tags/labels/keywords/categories. Recently, there was some discussion of whether some packages belonged in the JuliaIO or JuliaData organizations, so there's often some ambiguity in these things. Why not allow both? Some packages may be relevant to both finance and economics. Why not allow installing them by add Finance/Options or add Economics/Options? Then this would just be a matter of adding support for a categories keyword in the registry for packages and allowing packages to be disambiguated by category. At that point, there's really no reason not to also allow add @JuliaFinance/Instruments, using the GitHub user name as a kind of ad hoc category.
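A minimal sketch of category-based disambiguation, assuming each registry entry carries an ordered `categories` list plus its repository owner. `CatEntry` and `resolve_category` are hypothetical, and the Options UUID and the owners are placeholders:

```julia
# Toy registry rows (hypothetical): a package lists any number of categories, and
# its repository owner can serve as an ad hoc category via `@Owner/Name`.
struct CatEntry
    name::String
    uuid::String
    owner::String
    categories::Vector{String}
end

# Resolve "Category/Name" or "@Owner/Name"; any listed category is a valid qualifier.
function resolve_category(entries::Vector{CatEntry}, spec::AbstractString)
    qual, name = split(spec, '/'; limit = 2)
    if startswith(qual, "@")
        return [e for e in entries if e.name == name && e.owner == qual[2:end]]
    end
    return [e for e in entries if e.name == name && qual in e.categories]
end

entries = [
    # Placeholder UUID and owners, for illustration only.
    CatEntry("Options", "00000000-0000-0000-0000-000000000010", "JuliaFinance", ["Finance", "Economics"]),
    CatEntry("Instruments", "55efc701-121b-4840-9663-6fa785ef03be", "JuliaFinance", ["Finance"]),
    CatEntry("Instruments", "be17f318-eccd-4180-afc1-ec5a11373fef", "MusicOrg", ["Music"]),
]
```

Because a package can list several categories, `Finance/Options` and `Economics/Options` both reach the same entry, which is the non-hierarchical behavior described above.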

Another use case would be tagging packages that are useful for CI with the CI category and then having a preference system for categories and setting CI as the preferred category when running CI jobs. That would allow us to keep old Travis jobs working but running them with the CI category as preferred as long as we categorize all those packages as being in the CI category.
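A sketch of the preference idea, assuming the user (or CI environment) supplies an ordered list of preferred categories. `Candidate`, `pick` and both UUIDs are hypothetical:

```julia
# Hypothetical preference mechanism: when an unqualified name matches several
# packages, try the preferred categories in order (e.g. ["CI"] inside CI jobs).
struct Candidate
    uuid::String
    categories::Vector{String}
end

function pick(cands::Vector{Candidate}, preferred::Vector{String})
    length(cands) == 1 && return cands[1]
    for cat in preferred
        hits = filter(c -> cat in c.categories, cands)
        length(hits) == 1 && return hits[1]
    end
    return nothing  # still ambiguous: prompt, or require Category/Name or Name=UUID
end

# Two hypothetical packages both named Coverage; only the first is tagged CI.
cands = [Candidate("00000000-0000-0000-0000-000000000020", ["CI"]),
         Candidate("00000000-0000-0000-0000-000000000021", ["Finance"])]
```

With `["CI"]` preferred, an old `Pkg.add("Coverage")`-style request would keep resolving to the CI-tagged package even if another Coverage were registered.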

If we went with a non-hierarchical category system instead of a hierarchical namespace system, then the categories would be ordered with the earlier ones taking precedence. If you install both Music/Instruments and Finance/Instruments in a single project, the deps section could look like this:

[deps]
"Finance/Instruments" = "55efc701-121b-4840-9663-6fa785ef03be"
"Music/Instruments" = "be17f318-eccd-4180-afc1-ec5a11373fef"

The usage would look like using Finance/Instruments and using Music/Instruments. One issue that comes up then is that while the desired category could be derived from the add Finance/Instruments syntax, what if another Instruments package had already been installed? Do we prompt the user for what to call it? Maybe we can just leave it as-is since Finance/Instruments is distinct from Instruments. Or we could automatically prefix it with the first category it belongs to which unambiguously identifies it.
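A sketch of the automatic-prefix option for the `[deps]` key, assuming categories are ordered as described and reading "the first category ... which unambiguously identifies it" as the first category whose prefixed name is still free. `deps_key` is a hypothetical helper:

```julia
# Hypothetical helper: keep the plain name if it is free in [deps]; otherwise use
# the first "Category/Name" key (in the package's category order) not yet taken.
function deps_key(deps::Dict{String,String}, name::String, categories::Vector{String})
    haskey(deps, name) || return name
    for cat in categories
        key = string(cat, "/", name)
        haskey(deps, key) || return key
    end
    error("cannot disambiguate $name: no free category prefix")
end

deps = Dict{String,String}()
k1 = deps_key(deps, "Instruments", ["Music"])
deps[k1] = "be17f318-eccd-4180-afc1-ec5a11373fef"
k2 = deps_key(deps, "Instruments", ["Finance", "Economics"])
deps[k2] = "55efc701-121b-4840-9663-6fa785ef03be"
```

Under this reading, the first Instruments keeps its plain name and the second is recorded as "Finance/Instruments", matching the `[deps]` example above.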

@EricForgy

Thanks for the update Stefan 👍

It sounds like the discussion brought out some good ideas to think about. I appreciate everyone taking the subject seriously and working together to find a solution.
