Integration with a registry site #283

Closed
watzon opened this issue Jun 19, 2019 · 52 comments

Comments

@watzon

watzon commented Jun 19, 2019

I've started working on a project which I think could benefit the Crystal community, a real registry for shards. Notice I didn't say repository, because that is both expensive and very hard to pull off, especially given my limited time, but I have been working on a shards registry which would allow people to:

  • register a shard
  • tie that shard to a github (and eventually gitlab) repository (which is where the shard will still be pulled from)
  • keep track of versions, downloads, dependencies, etc
  • search for shards
  • possibly more features in the future

I have taken inspiration from rubygems.org and npmjs.com for the design and the functionality. To start with, my idea was that we could allow dependencies to be listed in shard.yml like this

dependencies:
  cadmium: ~> 0.1.0
  apatite:
    github: watzon/apatite

Basically if the value of the dependency is a String you'd fetch it from the registry (and by fetch I really mean you'd send an API request to the registry which would return the github info for that repo), otherwise you'd continue to handle requests for specific git repositories as normal.
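
As a rough illustration of the lookup rule described above, here is a minimal Python sketch. The `REGISTRY` dict stands in for the registry's API response, and both it and the function name `resolve_dependency` are hypothetical, as is the `example/cadmium` repository. Nothing here reflects how shards itself resolves dependencies.

```python
# Hypothetical stand-in for the registry API: shard name -> GitHub repo.
REGISTRY = {"cadmium": "example/cadmium"}


def resolve_dependency(name, spec, registry=REGISTRY):
    """Return (github_repo, version_constraint) for one shard.yml dependency."""
    if isinstance(spec, str):
        # Bare string value: treat it as a version constraint and ask the registry.
        if name not in registry:
            raise KeyError(f"shard {name!r} is not registered")
        return registry[name], spec
    # Mapping value: an explicit source, handled the way it is today.
    return spec["github"], spec.get("version")


# The two dependencies from the shard.yml snippet above, already parsed.
deps = {"cadmium": "~> 0.1.0", "apatite": {"github": "watzon/apatite"}}
resolved = {name: resolve_dependency(name, spec) for name, spec in deps.items()}
```

The point of the design is that the registry only maps a name to a repository; the shard is still pulled from GitHub either way.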

I am definitely open to suggestions and can add up to 3 people to the repo if anyone wants to run it locally or help out. Here are some screenshots of the current design.

[Four screenshots of the registry site design]

Hopefully I'm not pushing any boundaries or being too presumptuous here. I just love Crystal and want to give back to the community. Thanks!

watzon changed the title from "Integration with a directory site" to "Integration with a registry site" on Jun 19, 2019
@bcardiff
Member

Hi @watzon, there is also https://crystalshards.xyz/ (which is a fork of https://crystalshards.herokuapp.com/) listed in https://crystal-lang.org/community/ .

I don't think you are pushing any boundaries here.
If you are looking for feedback on the site directory, maybe the forum is a better place to initiate that conversation.
Shards intends to be decentralized, so I fail to see what you think an integration could look like.

I hope the site is written in Crystal 😎 .

@watzon
Author

watzon commented Jun 19, 2019

I know about crystalshards, but my goal with this is twofold. I want a centralized registry where we can keep track of shards with individual names and of metrics such as downloads, and I want to do it in such a way that it can be easily integrated directly into shards without requiring any real infrastructure changes.

The benefit of this is

  • there would be one place to search for shards
  • each registered shard is tied directly to a specific release, which means you have to have a release to publish a shard

Also yes, it's definitely written in Crystal. Using Lucky.

@oprypin
Member

oprypin commented Jun 19, 2019

keep track of metrics such as downloads

from people who happen to not have boycotted the project.

What do I see:

  • Benefits:
    • Popularity contest helps to choose shards.
  • Downsides:
    • Centralization.
    • Need manual steps when publishing.

@oprypin
Member

oprypin commented Jun 19, 2019

I just had a thought: perhaps such a system could work without any change to shard.yml files, just by adding telemetry to the shards project while still using the URL as the unique identifier (which is what it literally is).

@watzon
Author

watzon commented Jun 19, 2019

@oprypin I feel like there's a couple more upsides:

  • all shards are actually shards, not just any project with a shard.yml or however crystalshards is doing it right now.
  • reduces confusion
  • allows for additional metadata (such as platform-specificity)
  • could provide other info about a shard by parsing the shard.yml, such as dependencies

What do you mean by adding telemetry to the shards project?

@oprypin
Member

oprypin commented Jun 19, 2019

actually shards

What does that even mean?

allows for additional metadata

You can put any needed metadata into text / yaml files.

reduces confusion

How exactly?


adding telemetry to the shards project

= Whenever shards runs, upload the actions that it took to a statistics server.

This is specifically to keep track of download count, because I think it's the only problem of the ones listed here that can't be solved with the approach taken by https://crystalshards.xyz

@oprypin
Member

oprypin commented Jun 19, 2019

And if you wanted to instead just make a "better crystalshards", that would be very nice.
I do think that almost all of the benefits listed here can be achieved without the downsides of a centralized server that everyone must agree to use.

@watzon
Author

watzon commented Jun 19, 2019

What does that even mean?

With crystalshards' fuzzing approach not all the "shards" are actual shards. Other repos get listed just because they give the appearance of being a shard.

I'm not saying that a lot of this stuff can't be done with shards and a completely decentralized approach, but this isn't really an attempt to completely centralize shards either. I'm not advocating that we get rid of github/gitlab integration, but just that another provider be added. One that could provide valuable statistics about shards, and one that could allow for organizations to create private shards (which would use the github api to link to private repos).

Getting private organizations interested could lead to sponsorships, which would in turn mean more funding for the development of Crystal. But those are long-term goals. Obviously Crystal has to hit v1 before most organizations will even consider touching it.

@watzon
Author

watzon commented Jun 19, 2019

Also tbh it doesn't really necessitate any changes to how shard.yml is structured. It would be pretty easy to have shards check to see if a listed github repo is a registered shard, if it is then fetch it through the registry, otherwise just go to github.

@oprypin
Member

oprypin commented Jun 19, 2019

With crystalshards' fuzzing approach not all the "shards" are actual shards. Other repos get listed just because they give the appearance of being a shard.

Well yeah, that may indeed be a problem with it. Just one small problem, though. Could instead just try to mitigate it.

Also tbh it doesn't really necessitate any changes to how shard.yml is structured

I suppose it doesn't, yeah. Just that this was presented as the main part of this project.

another provider be added. One that could provide valuable statistics about shards, and one that could allow for organizations to create private shards (which would use the github api to link to private repos).

This can be done without a new provider though
e.g. GitHub provides download statistics already

@asterite
Member

If we want a central repository without having to host something we could do what Julia does: https://github.com/JuliaRegistries/General

Basically, have a YAML or JSON file in one github repository with all the registry. The repo seems to also record dependencies of each package. Then shards could inspect this registry... somehow...

Just an idea.

@z64

z64 commented Jun 19, 2019

Rust does something similar: https://github.com/rust-lang/crates.io-index

In comparison to Julia, instead of a couple of files for a package, it has a single file with JSON serialized forms of their Cargo.toml (== shard.yml) for each release.

ex: https://github.com/rust-lang/crates.io-index/blob/master/se/re/serenity
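
The path in that example follows the crates.io index layout: names of four or more characters go under a directory named after their first two characters, then one named after the next two. A minimal Python sketch of that rule is below (the function name `index_path` is made up for illustration); a shards index, if one existed, would not have to use the same scheme.

```python
def index_path(name):
    """Path of a package's file inside a crates.io-style index repository."""
    name = name.lower()  # index file names are lowercase
    if len(name) == 1:
        return f"1/{name}"
    if len(name) == 2:
        return f"2/{name}"
    if len(name) == 3:
        return f"3/{name[0]}/{name}"
    # Four or more characters: first two / next two / full name.
    return f"{name[:2]}/{name[2:4]}/{name}"
```

For example, `index_path("serenity")` gives `se/re/serenity`, matching the link above.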

@jkthorne

I really like this project. Where is it deployed?

I love the idea of decentralized repositories and feel like npm is a great example of the downsides of centralization.
Here is a good video from a former lead of the project: https://youtu.be/MO8hZlgK5zc
Here is the alternative to npm that was discussed in the video: https://github.com/entropic-dev/entropic

However, I think this project would be a great place for searching, for evaluating information about shards, and for the thing I want most: tagging and discoverability.

I would love to see features "more like this" and "plugins here".
I hope this develops further and if you could post a link I would love to check it out.

@watzon
Author

watzon commented Jun 19, 2019

@wontruefree currently it's in a private repo and not hosted yet as I've still got a lot of work to do. I'll take a look at those videos though and maybe that will influence my thinking a bit.

@didactic-drunk

didactic-drunk commented Jun 19, 2019

The common ground from this discussion seems to be a centralized search like crystalshards with a package registry that's not reliant on github and shows additional statistics with build status.

Goals:

  • Comprehensive single search website with all shards regardless of provider.
  • Telemetry - Download statistics.
  • Crystal compatibility status.
  • Dependency mapping.
  • Other features?

This can be accomplished by:

  • Keeping shards as is with additional telemetry to track downloads and versions used.
  • Providing a website with a registry for additional packages and showing additional data such as build status and crystal compatibility.
  • Possibly periodically populating the database with existing crystal shards until ownership is taken or providing a combined local database + github search.

Did I miss anything?

@watzon
Author

watzon commented Jun 19, 2019

@didactic-drunk seems about right

@ysbaddaden
Contributor

Private shards are already supported using git: git@github.com:some/private.git for example. No need to set up a private registry or anything: it just works.

I'm strongly against telemetry. It's always totally irrelevant to the actual app, and a privacy issue. I won't add any calls to report downloads to some external website, be it the shards index. Not gonna happen.

I don't think we need a registry.

If decentralised, you must download the whole database locally, which means you must download it all (it can grow big) with the whole history (updating a shallow clone doesn't work well on github) then have to keep it updated...

For private shards, you suddenly need to set up a private registry (not nice) or Shards must now deal with different ways to build its dependency graph (increased complexity).

The benefit, though, is that Shards could ask the registry for the dependency graph, but then the registry must build it, for everyone, all the time, which can consume lots of server resources, and... what if the registry fails?

That being said, your registry looks nice. Having a central place to search for shards, with shards published by their authors (not scraped), that would display the README, the available versions with their release dates, the list of required dependencies, the dependent shards, all tightly coupled with GitHub (so it's updated when a tag is created) and others, would be very nice.

Add a tiny API to search for shards, and one to get infos on a shard, and I'll gladly integrate them as search and info commands.

@ysbaddaden
Contributor

Note that a registry must only ever contain libraries released by their authors. Having a repository with a shard.yml for an experiment (just for fun) is totally different from publishing a library to be used by others.

@watzon
Author

watzon commented Jun 19, 2019

Note that a registry must only ever contain libraries released by their authors. Having a repository with a shard.yml for an experiment (just for fun) is totally different from publishing a library to be used by others.

I agree 100%. That's one of my issues with the crystalshards approach. Don't get me wrong, I love it and use it all the time, but it is full of "shards" that aren't really shards.

@oprypin
Member

oprypin commented Jun 19, 2019

But then you might never get the critical mass of repos in your registry

@watzon
Author

watzon commented Jun 19, 2019

@oprypin you mean like NPM? Lol

@oprypin
Member

oprypin commented Jun 19, 2019

No, "critical mass" doesn't mean "too many", it means "enough" - enough not to languish from the get-go, because who needs a registry that doesn't have any packages that one would be looking for. That becomes a negative feedback loop.

@oprypin
Member

oprypin commented Jun 19, 2019

Talking about this

a registry must only ever contain libraries released by their authors.

The Nim language got a central registry early on, and even they couldn't afford to be that strict. Instead, they allowed anyone to register a package but have a reviewer check whether it makes sense.

@watzon
Author

watzon commented Jun 19, 2019

Well here's my idea to prevent that. Every shard that is registered must have a release associated with it, that is the only real requirement. That and names must be unique. I'll have a job that runs automatically every so often and scours Github for crystal repositories and then creates a shard record for each unique repo that:

  • is not a fork of another repo
  • has a shard.yml file
  • has at least one release associated with it

Then people will have the ability to connect their github account and claim the shards as theirs if they want to. This way I can avoid indexing small projects, but still build up a large collection very quickly.
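
The three criteria above amount to a simple filter. The following Python sketch assumes each repository has already been fetched into a dict; the field names (`fork`, `has_shard_yml`, `releases`) and the sample repositories are invented for illustration and are not taken from any real API response.

```python
def is_indexable(repo):
    """Apply the three criteria: not a fork, has shard.yml, has a release."""
    return (
        not repo["fork"]
        and repo["has_shard_yml"]
        and len(repo["releases"]) > 0
    )


# Hypothetical crawl results.
repos = [
    {"name": "alice/lib", "fork": False, "has_shard_yml": True, "releases": ["v0.1.0"]},
    {"name": "bob/lib-fork", "fork": True, "has_shard_yml": True, "releases": ["v0.1.0"]},
    {"name": "carol/experiment", "fork": False, "has_shard_yml": True, "releases": []},
]
indexable = [r["name"] for r in repos if is_indexable(r)]
```

Only `alice/lib` passes: the second repository is a fork and the third has no release.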

@oprypin
Member

oprypin commented Jun 19, 2019

I personally hope for a registry in which claiming shards would not be typically necessary.

But if there is such a feature, you'd need to support not just github.
An idea for that would be to require the user to add a special string to their repository. Maybe signed with a key by the registry itself or maybe a hash or maybe even just plaintext, because I don't know what's important to hide there
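
One way to picture the "special string signed by the registry" variant is an HMAC over the repository URL and the claiming user. This is only a sketch under that assumption: the secret, the message format, and the function names are all hypothetical, and the comment above leaves open whether signing is needed at all.

```python
import hashlib
import hmac

# Hypothetical secret known only to the registry.
REGISTRY_SECRET = b"registry-secret-key"


def claim_token(repo_url, username, secret=REGISTRY_SECRET):
    """Token the registry asks the user to commit to the repository."""
    message = f"{repo_url}\n{username}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_claim(repo_url, username, token_in_repo, secret=REGISTRY_SECRET):
    """Check that the string found in the repository matches the expected token."""
    expected = claim_token(repo_url, username, secret)
    # Constant-time comparison; strip the trailing newline a file usually has.
    return hmac.compare_digest(expected, token_in_repo.strip())
```

Because verification only needs read access to the repository, the same check would work for any git host, not just GitHub.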

@watzon
Author

watzon commented Jun 19, 2019

@oprypin that's not a bad idea. I do want to have it search Gitlab and BitBucket in the future as well, but only Github will be supported initially as that is the one I have an API wrapper for. The nice thing about this approach is that claiming would be optional.

@didactic-drunk

I'd like to add 2 optional pieces that are not requirements but my own personal pet peeves of missing functions.

  • shard search foo: queries the registry via an API and displays matching shards (like gem search and similar commands).
  • Reverse dependencies list. I have my own code that gets every github crystal project and creates a dependency list that I use to check on the health of crystal projects before using them. Maybe that's useful for other people too.
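
A reverse dependency list is just the forward dependency map inverted. The Python sketch below shows the idea on made-up shard names; it is not the Ruby code mentioned above, which is not shown in this thread.

```python
from collections import defaultdict


def reverse_dependencies(forward):
    """Invert {shard: [dependencies]} into {dependency: [dependents]}."""
    reverse = defaultdict(list)
    for shard, deps in forward.items():
        for dep in deps:
            reverse[dep].append(shard)
    # Sort dependents so the output is stable.
    return {dep: sorted(users) for dep, users in reverse.items()}


# Hypothetical forward dependencies, as read from each project's shard.yml.
forward = {
    "web_app": ["router", "db"],
    "router": ["http_utils"],
    "db": ["pool"],
    "cache": ["pool"],
}
rev = reverse_dependencies(forward)
```

Here `rev["pool"]` lists both `cache` and `db`, which is the kind of signal that helps judge how widely a shard is used.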

@watzon
Author

watzon commented Jun 19, 2019

@didactic-drunk if you want to share your code for the reverse dependency list I'd love to see it. I do have something like that planned. I also want to implement badges for various things and maybe do some kind of ranking system like they do with https://pub.dev.

@didactic-drunk

With the current set of features, "owning" a repository on github may not be necessary. Can the entire site function via data scraping and telemetry? Perhaps user signups are only needed for 3rd party providers. Even then, you could probably have a "submit URL for inclusion" page that starts scraping the repository regardless of who owns it. Delisting or special commands could take place in a special branch with a single file.

@oprypin Authentication for github could use OAuth. Special branches can contain tokens provided by the website for initial authentication. Authentication may not be necessary if the data is scraped automatically.

@watzon Are there features that require manual configuration not available from scraping?

@ysbaddaden A branch or other file could list the library versions with their compatibility information.

The website and shard search foo should probably list both library versions and general shards, clearly marked for what they are. I've recently run into problems with crystalshards not showing shards older than 1 year, leading to duplicated effort. The main branch had no contributions for a year and was not shown on crystalshards, but the forks were active and still not shown. That's the type of information I would like to see on a website, along with a shards healthcheck? command so that someone can take over or easily upgrade their shards to a supported fork.

@watzon
Author

watzon commented Jun 20, 2019

@didactic-drunk technically no. There's nothing currently that would require manual configuration.

@didactic-drunk

@watzon The dependency list code is in ruby and was made as a quick and dirty way to get information not available elsewhere. Do you care? It can output JSON/YAML/etc.

Sample output.
Reverse dependencies are in the 2nd file.

@watzon
Author

watzon commented Jun 20, 2019

@watzon The dependency list code is in ruby and was made as a quick and dirty way to get information not available elsewhere. Do you care? It can output JSON/YAML/etc.

I just wanted to take a look and get inspiration. I haven't started the code for dependency matching yet so anything helps. Thanks :)

@didactic-drunk

Is http://shards.info a better starting point? It already has forward and reverse dependency tracking.

I've seen crystalshards.xyz but not shards.info before. Am I the only one that didn't know about it until now?

@watzon
Author

watzon commented Jun 20, 2019

@didactic-drunk I'd never heard of it either. But I do have 2 trending repos on the front page, so yay me!

@ysbaddaden
Contributor

@watzon @oprypin Please, don't publish shards for authors that never asked to!

Having a shard.yml and releases doesn't mean it's a library: it could be an app (e.g. shards, prax)!

Of all the repositories I have, I'd only publish a few (e.g. minitest, earl, scrypt, pool): the ones I want to maintain and be used. Others are mere weekend side projects; I don't want to have those listed in an official registry!

Let's have a central place to publish and search for shards. Let it be populated patiently with real libraries that people do care for in the long run, and let's avoid the junk that nobody uses, not even their own authors (including me).

@oprypin
Member

oprypin commented Jun 20, 2019

If you think it's useful to have a registry of an arbitrary subset of libraries (I expect <10% for a very long time), I can't stop you.

@oprypin
Member

oprypin commented Jun 20, 2019

http://shards.info is awesome! It needs to be advertised more (or at all!)

It's exactly what I would want from such a tool and a living proof that centralization is not required to achieve these goals.

@jkthorne

@watzon have you thought about posting this on forum.crystal-lang.org ? I feel like most of the crystal community is watching the crystal-lang/shards repo but there might be some more people watching on the forum.

@watzon
Author

watzon commented Jun 20, 2019

@wontruefree really I only posted it here because I wanted to get the maintainers' opinions and see if having this site and shards be integrated in some way would be something they'd be willing to do. I'm not aiming for much attention right now.

@ysbaddaden
Contributor

@watzon feel free to contact me directly. I'd love a directory, and the few screenshots you posted above are very nice, and feel a lot like what I wanted.

I'm closing for the time being, since I believe your questions are answered :)

@watzon
Author

watzon commented Jun 21, 2019

Thanks @ysbaddaden. Once I have it a bit more feature complete and make the repo public I'll consult with you about how to go about integration with shards.

@RX14
Contributor

RX14 commented Jun 22, 2019

There shouldn't be any integration with shards and a registry. That's centralization. Any registry should be a community project, just a list of "these are maintained shards and here's their repositories" with a nice UI.

Any other features can be obtained through the website without integration with shards.

@didactic-drunk

@watzon @oprypin Please, don't publish shards for authors that never asked to!

Having a shard.yml and releases doesn't mean it's a library: it could be an app (e.g. shards, prax)!

Of all the repositories I have, I'd only publish a few (e.g. minitest, earl, scrypt, pool): the ones I want to maintain and be used. Others are mere weekend side projects; I don't want to have those listed in an official registry!

Let's have a central place to publish and search for shards. Let it be populated patiently with real libraries that people do care for in the long run, and let's avoid the junk that nobody uses, not even their own authors (including me).

Perhaps a publish flag and optional official url in shard.yml would solve this issue and reduce friction? That way no one has to contact any registry. Multiple (decentralized) registries can use the same flag.

@Sija
Contributor

Sija commented May 25, 2021

I'd prefer private flag, if any.

@didactic-drunk

@ysbaddaden wanted opt-in rather than opt-out. For new projects I'm likely to agree with him. Low quality shards probably shouldn't be published. If the user can't add a single flag and/or official url to publish, then it's probably not worth having the shard searchable. Filling out anything in shard.yml is still less work than registering with any potential index or awesomelist.

What about old shards that won't/don't update their shard.yaml to indicate publishing is wanted? Let the search index grandfather them in. There's only ~1000 or so last I checked.

You could have shards be smart and ask or warn about publishing if releases are detected or if the flag is ambiguous.

WARN: Set `publish: true` to indicate you intend to support the software and make it searchable on ...
WARN: Set `publish: false` to indicate the software should be kept private, as a personal project OR not ready for general/production use.

Perhaps the best default is no default. Let the user decide or in new shards set publish: false by default. Missing publish can warn, prompt and possibly tailor the recommendation based on detected release tags. A missing publish makes detecting grandfathered shards very simple.
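
The warn-when-missing behaviour could look roughly like the Python sketch below. `publish` is the flag proposed in this thread, not an existing shard.yml key, and the function name and warning texts are illustrative only. The shard is passed in as an already parsed dict.

```python
def publish_advice(shard, release_tags):
    """Return warnings for a parsed shard.yml (dict) and its release tags.

    `publish` is the flag proposed in this thread, not an existing key.
    """
    if "publish" in shard:
        if not isinstance(shard["publish"], bool):
            return ["WARN: `publish` must be true or false."]
        return []  # The author has decided; nothing to warn about.
    # Flag missing: tailor the recommendation to the detected release tags.
    if release_tags:
        return ["WARN: releases found; set `publish: true` to make the shard searchable."]
    return ["WARN: no releases found; set `publish: false` to keep the shard unlisted."]
```

A missing key is kept distinct from `publish: false`, which is what makes grandfathered shards easy to detect.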

@didactic-drunk

On 2nd thought: Would detecting releases solve the issue? No releases == don't publish?

@straight-shoota
Member

IMO https://shardbox.org solves this in an elegant way: The catalog is a list of explicitly opted-in projects. It doesn't depend on modifying individual projects' shard.yml. Instead, it's globally edited. An additional benefit of that is a curated taxonomy.
Of course, this requires more coordinated effort to set up. As an alternative, automatic discovery method, it also brings in any shard that is referred to as a dependency. So even without manual editing, it can discover newly published shards. When a repo is referenced as a dependency, this is a relatively good indicator of its usefulness as a library.
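
Discovery through dependencies can be sketched as a graph walk that starts from the catalog. The Python below is an illustration of the idea, not shardbox's implementation: it assumes discovery is transitive, which the comment above does not state, and the repository names are made up.

```python
def discover(catalog, dependencies_of):
    """Return catalog repos plus every repo reachable through dependencies."""
    known = set(catalog)
    queue = list(catalog)
    while queue:
        repo = queue.pop()
        for dep in dependencies_of.get(repo, []):
            if dep not in known:
                known.add(dep)
                queue.append(dep)
    return known


# Hypothetical data: one catalog entry and the dependencies found in each repo.
catalog = ["github.com/a/app"]
dependencies_of = {
    "github.com/a/app": ["github.com/b/lib"],
    "github.com/b/lib": ["github.com/c/util"],
    "github.com/z/unused": [],
}
found = discover(catalog, dependencies_of)
```

`github.com/z/unused` is never picked up because nothing in the catalog depends on it, directly or indirectly.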

@rmarronnier

Is this issue closed because the subject is settled (no registry) or can the discussion be opened again ?

I encountered an issue with some conflicting redis shards (https://shardbox.org/search?q=redis : there are 4 shards with the exact name redis).

shardbox is nice, but nothing stops / warns me from :

  • Opening a github/gitlab repo with a shard named 'redis'
  • Opening a PR asking to add this shard to the catalog.

There should be an authoritative source (an official repository like #283 (comment)) used by shards and crystal init to deter public homonyms of shards.

It won't stop toy / weekend project shards from being published in an obscure github repository, but if you want to secure your more serious shard name and make it easily discoverable, just spend 5 minutes to open a PR to the central repo.

@ysbaddaden
Contributor

A problem is that to enforce it, you must have the registry, hence download it on each computer. In Rust, Cargo needs to download a file of several hundred megabytes, for example. Same with Nix, where it downloads a 64MB archive once a day, sometimes multiple times a day 😨

Also, non-clashing shard names won't magically prevent clashing type names, especially when your shard is about monkey-patching (cf. openssl_ext for example) or when a shard is abandoned and forked, then the original is revived and diverges 😭

@bcardiff
Member

There should be an authoritative source (an official repository like #283 (comment)) used by shards and crystal init to deter public homonyms of shards.

I would rather scope it to sensible templates for shards init rather than a whole registry.

I agree that not having a central authority for packages makes some stories harder (like discovery), but it also simplifies a lot of other stories and maintenance.

@ysbaddaden
Contributor

@bcardiff @straight-shoota maybe crystal-lang.org could start considering shardbox.org as the "official registry" of shards?

@rmarronnier

A problem is that to enforce it, you must have the registry, hence download it on each computer. In Rust, Cargo needs to download a file of several hundred megabytes, for example. Same with Nix, where it downloads a 64MB archive once a day, sometimes multiple times a day

Right now, the uncompressed content of https://github.com/shardbox/catalog/tree/master/catalog is 100 kb. It could of course grow as Crystal gets more popular but we'd all wish for the Crystal ecosystem to have this kind of problem :-)

shards could also query directly https://shardbox.org/ (instead of downloading the whole catalog).
It could be useful for a shards add command but also in case of shard name conflict :

Error shard name (lasagna) has ambiguous sources: 'git: https://github.com/peter/lasagna.git' and 'git: https://github.com/alice/crystal-lasagna.git'.

could become

Error shard name (lasagna) has conflicting sources: 'git: https://github.com/peter/lasagna.git' and 'git: https://github.com/alice/crystal-lasagna.git'.
'git: https://github.com/alice/crystal-lasagna.git' is the registered source on shardbox.org for 'lasagna'
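
A minimal Python sketch of how the second message could be produced is shown below. The `registered` mapping stands in for a lookup against shardbox.org, and the function name is hypothetical; shards does not expose such a function.

```python
def conflict_message(name, sources, registered=None):
    """Build the error text for a shard name that resolves to several sources."""
    quoted = " and ".join(f"'git: {s}'" for s in sources)
    lines = [f"Error shard name ({name}) has conflicting sources: {quoted}."]
    # Hypothetical registry lookup: shard name -> registered git URL.
    hint = (registered or {}).get(name)
    if hint in sources:
        lines.append(f"'git: {hint}' is the registered source on shardbox.org for '{name}'")
    return "\n".join(lines)


sources = [
    "https://github.com/peter/lasagna.git",
    "https://github.com/alice/crystal-lasagna.git",
]
registered = {"lasagna": "https://github.com/alice/crystal-lasagna.git"}
message = conflict_message("lasagna", sources, registered)
```

Without a registered entry the function falls back to the first line only, so the registry adds a hint but is not required for resolution to work.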

A registry wouldn't magically fix the existing conflicts / issues, but it'd be a starting point for a (very!) long journey where the idiomatic content of a shard.yml file would be (without explicit sources):

dependencies:
  hammer:
    version: ~> 1.0.0
  glass:
    version: ~> 1.0.0
  window:
    version: ~> 0.3.0
