Next-gen registry & Bower architecture #73

Closed
2 of 20 tasks
rayshan opened this issue Jul 13, 2014 · 22 comments

@rayshan
Member

rayshan commented Jul 13, 2014

I'd like to help and get the next-gen registry out. I reviewed previous discussions, and seems like the team and interested parties still need to norm on a general architecture.

State of the union

  • CLI works very well and heavily relied on by tens of thousands of devs and CI environments all over the world (very important to ensure uptime & backward-compatibility)
  • Experience for package publishers has room for improvement
  • Registry needs more data to be single source of truth (stats, keywords, etc.)
  • No user management
  • Does not serve binaries
  • 3rd party integration relies on embedding CLI
  • Multiple server-side services with overlapping functionality
  • Light on tests & no CI environment
  • No https (bower/bower.github.io#45: SSL certificate for bower.io & subdomains)

Previous rewrite design doc for reference

Proposed architecture

Note that even though this looks very complicated, it's more of a reorganization of existing parts. Actual work will be focused on building next-gen registry and API. As you review this please keep simplicity in mind.


Drawing link

Decisions to make

  • Focus on RESTful API-oriented architecture (yes)

    3rd party tools currently embed the CLI. Does it make sense to build an API that does most of what the CLI can do (namely /package and /search endpoints), and have the CLI be a client that consumes the API? Currently CLI does this already to a certain degree. We can also push some logic to server-side (e.g. Faster dep resolution #48)

  • User management (yes, need to experiment re: 3rd-party service)

    • Necessary? Or continue to use existing method of relying on github tokens (not everyone uses github)
    • If yes, build our own or use 3rd party API like Stormpath or Auth0?
  • Why not keep Postgres? (keep)

    An RDBMS is decently performant and well-known by potential contributors. Instead of saving the whole bower.json, we can just parse it and insert a row in a db table. For simplicity we can use an ORM like bookshelf. I'm not too familiar with Postgres / MongoDB / CouchDB / ... admin so it's up to the team to pick one and I'll figure it out.

  • Why store binaries? (yes, but do it last)

    Major benefits vs. what we do today? Aren't packages always on a repo hosting service like Github? If not maybe publisher can submit a link to a binary (security risk?)? For private packages publisher can provide a key to install.

  • Is it premature to worry about scaling now? (yes for now, no action)

  • Do we need to worry about ease of replication? (yes, better docs on this subject for now)

  • Express vs. Hapi? (@svnlto can you make an argument for this?) (leaning towards hapi, need to experiment)

  • CoffeeScript? (no, maybe ES6 + traceur)

    I would like to personally ask the team for blessing on this. I'm most productive writing CoffeeScript, but I understand there may be concern with maintenance and attracting future contributors. There are many high profile OSS projects (like Atom) that use CoffeeScript exclusively. If the team feel strongly against this I'll stick w/ vanilla js.

  • Do we still need caching to serve stats? (need to experiment)

    If the db of choice is performant enough, I'll just dump/fetch stats straight from the db
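
For the Postgres item above ("parse it and insert a row in a db table"), here is a minimal sketch of what flattening a bower.json into a row could look like. The `PackageRow` shape, its column names, and `toPackageRow` are assumptions made up for illustration; they are not the registry's actual schema or code.

```typescript
// Hypothetical row shape; column names are illustrative only.
interface PackageRow {
  name: string;
  url: string;
  version: string | null;
  description: string | null;
  keywords: string[];
  manifest: string; // raw bower.json text, kept so no information is lost
}

// Parse the raw bower.json text and pick out the fields we want as columns.
function toPackageRow(rawManifest: string, repoUrl: string): PackageRow {
  const parsed = JSON.parse(rawManifest);
  if (typeof parsed.name !== "string" || parsed.name.length === 0) {
    throw new Error("bower.json must have a non-empty name");
  }
  // Keep only string keywords; ignore anything malformed.
  const keywords: string[] = [];
  if (Array.isArray(parsed.keywords)) {
    for (const k of parsed.keywords) {
      if (typeof k === "string") keywords.push(k);
    }
  }
  return {
    name: parsed.name,
    url: repoUrl,
    version: typeof parsed.version === "string" ? parsed.version : null,
    description: typeof parsed.description === "string" ? parsed.description : null,
    keywords: keywords,
    manifest: rawManifest,
  };
}
```

Keeping the raw manifest next to the parsed columns is one possible design choice: the columns serve search and stats, while the original text stays available if more fields are needed later.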

Next steps

  • Get access, understand current infrastructure
  • evaluate pros / cons of hapi & 3rd-party auth service
  • db - new tables with stats & 3rd-party data
  • db - migrate packages table to include all info from bower.json
  • db - scrape github for bower.json data and put in packages table
  • CLI - include bower.json in payload, check for consistency (e.g. name, version, etc.), recommend tags if none
  • Break out stats front-end
  • adapt stats back-end for ETL service
  • Write new endpoints for server, proxy old endpoints
  • hook up user service
  • simple admin panel for admins / publishers
  • test all the things!
  • Set up CI (try using github's release hooks for deploy)
  • Load testing
  • Get a SSL cert
  • document API endpoints (use http://docs.bower.apiary.io/ or RAML)
  • document replication & running private registry
  • Update CLI (need team's help on this)
  • Migrate / deploy
  • Store / serve binaries (sponsor needed)
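
The "check for consistency ... recommend tags if none" step in the list above could be sketched roughly as follows. The payload shape, field names, and messages are hypothetical; the real CLI payload is not defined in this thread.

```typescript
// Hypothetical registration payload sent by the CLI.
interface RegistrationPayload {
  requestedName: string; // name given on the command line
  manifest: { name?: string; version?: string }; // parsed bower.json
  gitTags: string[]; // tags found in the repository
}

interface CheckResult {
  errors: string[]; // inconsistencies that should block registration
  suggestions: string[]; // non-blocking recommendations
}

function checkRegistration(p: RegistrationPayload): CheckResult {
  const errors: string[] = [];
  const suggestions: string[] = [];

  // The name being registered should match the one in bower.json.
  if (p.manifest.name !== p.requestedName) {
    errors.push("bower.json name does not match the name being registered");
  }

  if (p.gitTags.length === 0) {
    // No tags at all: recommend tagging a release.
    suggestions.push("no git tags found; consider tagging a release, e.g. v1.0.0");
  } else if (p.manifest.version !== undefined) {
    // If bower.json declares a version, a matching tag should exist.
    const declared = p.manifest.version;
    const hasMatchingTag = p.gitTags.some(function (t) {
      return t === declared || t === "v" + declared;
    });
    if (!hasMatchingTag) {
      errors.push("bower.json version " + declared + " has no matching git tag");
    }
  }
  return { errors: errors, suggestions: suggestions };
}
```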

Info

New consistent repo / service modules naming convention:
bower-server-api (combines below)
bower-server-registry (this repo)
bower-server-etl (from stats service)
bower-server-stats (from stats service)
bower-server-user
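
To make the /package and /search idea from "Decisions to make" a bit more concrete, here is a framework-free sketch of a GET handler over in-memory data. The paths, the reply shape, and the matching rules are assumptions for illustration only; nothing here is an agreed API, and it deliberately avoids picking Express or Hapi.

```typescript
interface RegistryEntry {
  name: string;
  url: string;
  keywords: string[];
}

interface ApiReply {
  statusCode: number;
  body: RegistryEntry | RegistryEntry[] | { error: string };
}

// Route a GET path to a lookup or a search over the given entries.
function handleGet(path: string, entries: RegistryEntry[]): ApiReply {
  const pkgMatch = /^\/package\/([^\/]+)$/.exec(path);
  if (pkgMatch) {
    const wanted = decodeURIComponent(pkgMatch[1]);
    for (const entry of entries) {
      if (entry.name === wanted) return { statusCode: 200, body: entry };
    }
    return { statusCode: 404, body: { error: "package not found" } };
  }

  const searchMatch = /^\/search\/([^\/]+)$/.exec(path);
  if (searchMatch) {
    const term = decodeURIComponent(searchMatch[1]).toLowerCase();
    // Match on a substring of the name or on an exact keyword.
    const hits = entries.filter(function (entry) {
      const nameHit = entry.name.toLowerCase().indexOf(term) !== -1;
      const keywordHit = entry.keywords.some(function (k) {
        return k.toLowerCase() === term;
      });
      return nameHit || keywordHit;
    });
    return { statusCode: 200, body: hits };
  }

  return { statusCode: 404, body: { error: "unknown endpoint" } };
}
```

A CLI or 3rd-party tool would then only need an HTTP client pointed at these endpoints instead of embedding the CLI itself.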

@rayshan rayshan mentioned this issue Jul 13, 2014
@sheerun
Contributor

sheerun commented Jul 13, 2014

I'm pretty sure we should use npm registry for hosting binaries (just like component.io). Rolling our own hosting is not a feasible option, because of budget and resources bower has. We could host URLs to source repository (github) and distribution repository (npm, or other).

I think we should use Postgresql too. Redis has no good indexing or clients.

User authorization would be helpful. As well as mini admin panel with repositories that user manages. I wouldn't use 3rd party solutions unless they are open source.

With user repositories saved, the registry could crawl those repositories periodically.

I'm fine with CoffeeScript or even Ruby for registry server.

@rayshan
Member Author

rayshan commented Jul 14, 2014

I'm also leaning towards not hosting binaries, they're already online somewhere. 1 less thing to break.

With user repositories saved, the registry could crawl those repositories periodically.

@sheerun what would the use case be? Auto bump bower package version based on git tags? Crawl for bower.json so CLI doesn't have to upload them?

I'll try to pull breakdown of repo hosting services used by publishers, should be a useful stat to have.
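
If a crawler did bump package versions from git tags, picking the newest tag might look like the sketch below. It is a simplification that only understands plain `x.y.z` or `vx.y.z` tags and skips everything else (including pre-release tags); the function names are made up for this example.

```typescript
// Parse "1.2.3" or "v1.2.3" into numeric parts; return null for anything else.
function parseVersionTag(tag: string): number[] | null {
  const m = /^v?(\d+)\.(\d+)\.(\d+)$/.exec(tag);
  if (!m) return null;
  return [parseInt(m[1], 10), parseInt(m[2], 10), parseInt(m[3], 10)];
}

// Compare major, then minor, then patch numerically.
function compareVersionParts(a: number[], b: number[]): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Return the tag with the highest version, or null if no tag parses.
function latestVersionTag(tags: string[]): string | null {
  let bestTag: string | null = null;
  let bestParts: number[] | null = null;
  for (const tag of tags) {
    const parts = parseVersionTag(tag);
    if (parts === null) continue; // not a release tag, skip it
    if (bestParts === null || compareVersionParts(parts, bestParts) > 0) {
      bestParts = parts;
      bestTag = tag;
    }
  }
  return bestTag;
}
```

The numeric comparison matters: comparing tags as strings would rank `v1.9.3` above `v1.10.0`.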

@Hacklone

Hey, thanks for contacting me. Here is my opinion on some of the decisions:

  • I think mongodb with mongoose is really easy to use I had really great experience with those.
  • Passport.js got everything you need but I'm not sure if the public bower registry needs it, sounds more like internal usage.
  • Keep up the restful :)
  • I think you should not store binaries, it's like committing package dependencies with your source code :)
  • Scaling -> I'm not sure how big load you've got but I always say that you should deal with a problem if there is a problem
  • ease of replication?
  • CoffeeScript -> harder to read, additional dependency etc. Use harmony instead if you're not happy with es5 (Ruby -> get ready for a 'no conventions' hell :))
  • caching stats -> the db could cache it in memory (i.e. mongodb)

@sheerun
Contributor

sheerun commented Jul 18, 2014

My 2 more cents:

  • Node has very poor Harmony support: http://kangax.github.io/compat-table/es6/#node So harmony is the same kind of dependency as Coffee: it requires precompiling with Traceur.
  • I had terrible experience with mongo. Postgresql was always reliable and awesome.
  • Ruby has better conventions than JS, Coffee is easier to read ;>

@Hacklone

In Ruby you can write everything a 10000 way, in JS you can only in 100, so when working on the same project this makes a big difference. (tons of more best practises needed and the style will never be the same with Ruby :) )
I didn't say that you should use harmony, just said that es6 will be a standard and node harmony implements some nice features (like yield, arrow functions etc...)
A lot more people read JS than CoffeeScript (why develop an open source project with a "language" that will disappear eventually, because es6 solves loads of the problems CoffeeScript was created for)
I'm not against Postgres I just didn't have that bad experience with node :)

I didn't want to offend anybody, this is just my opinion :)

@rayshan
Member Author

rayshan commented Jul 18, 2014

Thanks @Hacklone for stopping by and your input. What I meant by ease of replication was if someone would like to run something like private-bower. Do you dump all the registry data in json into a local database? CouchDB seems to have super easy ways of replication.

Unfortunately I don't know Ruby, unless if another contributor wants to lead this. I would like to play with Traceur, just more familiar with CoffeeScript.

Mongoose looks great. There are also nice ORMs for Postgres like bookshelf / sequelize.

I noted you guys' votes, let's wait for additional contributors to chime in.

@benschwarz
Member

In Ruby you can write everything a 10000 way, in JS you can only in 100, so when working on the same project this makes a big difference. (tons of more best practises needed and the style will never be the same with Ruby :) )

No.

@benschwarz
Member

Thanks for getting all these thoughts / ideas down @rayshan!

Focus on RESTful API-oriented architecture

👍

User management

Definitely required. Single 3rd party vendor authentication sucks (because single vendor), multi 3rd party authentication sucks (because management / usability is a PITA)

Why not keep Postgres?

I'm not going to argue the cases for different database engines on their 'merits' (frankly, commentary here from non-contributors should not be welcomed).
Using a conservative RDBMS is the way to go. Contributing is easier, management is easier.

Why store binaries?

Right now, there are many issues surrounding http proxies, ssh access, shallow clones on github enterprise. Generally, transport of packages using git isn't working very well. I'd personally prefer to see packages stored as tar/zip/compressed on S3 (or, whatever) with a CDN in-front.

Is it premature to worry about scaling now?

Yes, I think so.

Do we need to worry about ease of replication?

Yes, running a mirror is definitely desirable. Within (larger) organisations, I'm sure that people would like to be able to replicate their own private registries… or the public registry.

At the risk of sounding naive, I think we could solve this via API and a sync-client.
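
A rough sketch of the "API and a sync-client" idea: assuming the API can list packages with a last-updated timestamp, a mirror could compute what to fetch and what to drop. The `MirrorEntry` shape and `planSync` are hypothetical, and the network calls are left out entirely.

```typescript
interface MirrorEntry {
  name: string;
  url: string;
  updatedAt: string; // ISO 8601 UTC timestamp, so string comparison orders correctly
}

interface SyncPlan {
  upsert: MirrorEntry[]; // new or changed upstream entries to copy into the mirror
  remove: string[]; // names present in the mirror but gone upstream
}

function planSync(upstream: MirrorEntry[], mirror: MirrorEntry[]): SyncPlan {
  // Index the mirror by name; Object.create(null) avoids inherited keys.
  const mirrorByName: { [name: string]: MirrorEntry } = Object.create(null);
  for (const m of mirror) mirrorByName[m.name] = m;

  const upstreamNames: { [name: string]: boolean } = Object.create(null);
  const upsert: MirrorEntry[] = [];
  for (const u of upstream) {
    upstreamNames[u.name] = true;
    const have = mirrorByName[u.name];
    if (have === undefined || have.url !== u.url || have.updatedAt < u.updatedAt) {
      upsert.push(u);
    }
  }

  const remove = mirror
    .filter(function (m) { return upstreamNames[m.name] !== true; })
    .map(function (m) { return m.name; });

  return { upsert: upsert, remove: remove };
}
```

A private registry could run this periodically against the public one and apply the plan to its own database.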

Express vs. Hapi? (@svnlto can you make an argument for this?)

Whatever results in the most simple deployment / API. There should also be a focus on the size of the contributing communities.

CoffeeScript?

This has been asked in the past. Other parts of Bower are written in Vanilla, I think that the registry should follow suit.

Personally, my preference is vanilla.

Do we still need caching to serve stats?

Depends on the database impact. We'd have to pull together some numbers to know for sure.

CI

Yes, this needs to happen.

No https (bower/bower.github.io#45)

We shouldn't be using HTTP for Bower at all. Definitely need to get this on the priority list.

Thanks Ray! 🐦

@Hacklone
Copy link

For the replication it was a definite request for private-bower to be able to cache any registry private or public, so I'm not sure that this public registry should do that too, because there are totally other needs for the organization compared to the rest of the developers.

@rayshan
Member Author

rayshan commented Jul 20, 2014

Thanks @benschwarz.

Sounds like user management is a definite yes.

For binaries, what we could do is get all the other pieces up and running, then add binaries at the very last. The web admin UI can even accommodate drag / drop with simple registration. Because S3 / CloudFront is only free for a year, we may need to look into a CDN to sponsor.

I asked @Hacklone for opinion due to his work on private-bower. Perhaps I jumped the protocol here - my fault.

@benschwarz
Member

I asked @Hacklone for opinion due to his work on private-bower. Perhaps I jumped the protocol here - my fault.

No, not at all.

I think it's important to recognise that technologies should be chosen very conservatively, particularly for bower as a project during its current lifecycle. That's where I was coming from. No harm, no foul.

@rayshan
Member Author

rayshan commented Jul 25, 2014

Thanks @benschwarz. I completely agree with you.

I went to a talk last night where npm's devops person talked about their stack. You were right that they spent quite a bit of time wrestling w/ couchdb, and their stack is a lot less dependent on couchdb now. I feel like if they had a choice they would choose to move off of it.

Stormpath's dev advocate was also there. I spoke to him about the pros / cons of using a 3rd-party auth solution. He also offered to sponsor an enterprise-level package for us.

update: npm infrastructure talk: https://www.youtube.com/watch?v=3ivx2RsZ1yA

@sindresorhus
Contributor

👍 Everything @benschwarz said.

@rdegges

rdegges commented Jul 28, 2014

FYI, I'm a developer @ Stormpath (we're a vendor which does user management). If any of you guys are interested in trying us out / considering us for the bower user management, we're more than happy to 100% sponsor the project for free. Our service is free for most small projects, but as you guys would probably need more than the free stuff, we'd be happy to cover all costs.

We love / use Bower ourselves all the time. It's a big part of our stack.

Furthermore, we have some pretty awesome node integration / express integration that I've been working on, and I'd be happy to pitch in for coding efforts!

Just a thought!

<3333

-Randall

UPDATE: Here's a link to the new express stuff: http://docs.stormpath.com/nodejs/express/ (we're super easy to use / easy to export data out of / and easy to migrate OFF of).

@rayshan
Member Author

rayshan commented Aug 1, 2014

Thanks guys. Sounds like we're generally aligned on the next steps. I updated the first comment to reflect feedback.

@sindresorhus can you add me to the bower heroku app? Just for understanding env vars, scaling & replicating the db. I'll set up a new dev environment. Won't change anything in production app w/o notifying other owners.

Follow up for @sheerun, I pulled some git hosting stats out of the registry:

Total # of bower registry packages as of 7-30-14 - 17055
github public - 16947 (99.4%)
github enterprise (e.g. github.paypal.com) - 50
bitbucket - 14
gist.github.com - 12
gitorious - 2
beanstalk - 1
code.google.com - 1
gitlab - 1

live query: https://dataclips.heroku.com/byvmrorsycxmclubeuzxtlckegvm
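
For anyone wanting to reproduce a breakdown like the one above outside the dataclip, here is a sketch that buckets repository URLs by host. The category rules are assumptions (for example, treating any `github.*` host other than github.com as GitHub Enterprise is only a heuristic), and the actual dataclip query may work differently.

```typescript
// Extract the host from git://, https://, ssh:// or git@host:path style URLs
// and map it to a coarse hosting category.
function hostCategory(repoUrl: string): string {
  const m = /^(?:[a-z+]+:\/\/)?(?:[^@\/]+@)?([^\/:]+)/i.exec(repoUrl);
  const host = m ? m[1].toLowerCase() : "";
  if (host === "github.com") return "github public";
  if (host === "gist.github.com") return "gist.github.com";
  if (/^github\./.test(host)) return "github enterprise"; // heuristic, e.g. github.paypal.com
  if (host === "bitbucket.org") return "bitbucket";
  if (host === "gitorious.org") return "gitorious";
  if (host === "gitlab.com") return "gitlab";
  if (host === "code.google.com") return "code.google.com";
  return "other";
}

// Count how many URLs fall into each category.
function tallyHosts(urls: string[]): { [category: string]: number } {
  const counts: { [category: string]: number } = Object.create(null);
  for (const u of urls) {
    const category = hostCategory(u);
    counts[category] = (counts[category] || 0) + 1;
  }
  return counts;
}

// Share of a category as a percentage string with one decimal place.
function sharePercent(part: number, total: number): string {
  return ((part / total) * 100).toFixed(1);
}
```

With the numbers above, `sharePercent(16947, 17055)` gives the 99.4 quoted for public GitHub.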

@rdegges thank you sooooo very much for the offer. I love you guys for just willing to support OSS. As discussed offline, I need to understand the pros/cons of using a 3rd party service better so I can make a case for the entire bower team.

Some pros I see so far:

  • easier to get up and running
  • lots of non-core features built-in, e.g. password reset
  • better security (npm had to spend $$$ on security audit and said it was "well worth it")
  • uptime
  • additional support
  • ...

Some concerns:

  • github login integration (some vendors offer this; crucial given the audience)
  • data portability (depending on vendor, even if available, still a lot of work to migrate)
  • hapi / angular integration (everything's restful so maybe not so crucial)
  • ...

What I plan to do is to build a little prototype w/ & w/o just to try it out for myself. Let's chat more about this.

@sindresorhus
Contributor

@rayshan done.

@struys

struys commented Aug 16, 2014

Let me start off by saying, I love the general idea and I'll try my best to have my team help out with future work on the registry. <3

Light on tests & no CI environment

If we're talking about a rewrite, I'd really like to see a coverage tool like istanbul (https://github.com/gotwarlost/istanbul) used. We can save a lot of time merging pull requests if coverage is part of CI.

Note that even though this looks very complicated, it's more of a reorganization of existing parts. Actual work will be focused on building next-gen registry and API. As you review this please keep simplicity in mind.

My primary concern is making sure the registry is easy to setup internally. We're using bower at Yelp because it's a great tool that didn't take a huge amount of effort to get working within our internal network. As a result, we've also been able to contribute back upstream. I think we want to make sure all of these systems are pluggable. Please make sure it's possible to skip the vendor dependencies.

Why not keep Postgres? (keep)

RDBMS is decently performant and well-known by potential contributors. Instead of saving the whole bower.json, we can just parse it and insert a row in a db table. For simplicity we can use an ORM like bookshelf. I'm not too familiar with Postgres / MongoDB / CouchDB / ... admin so it's up to the team to pick one and I'll figure it out.

At Yelp we primarily use mysql and our standard backup is mysql. Since the registry uses postgres, we've setup a somewhat weird git repo for backups (we also have a standard to backup git). It would be awesome if the registry was DB agnostic. Could we use something like sequelize? (http://sequelizejs.com/)
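
One way to read "pluggable" and "DB agnostic" here is to have the registry code talk to a small storage interface, with Postgres, MySQL, or anything else behind it. The sketch below is an assumption about how that might look, not a design anyone in this thread has agreed on; it is synchronous and in-memory for brevity, whereas a real adapter (e.g. one built on sequelize) would be asynchronous.

```typescript
interface StoredPackage {
  name: string;
  url: string;
}

// Hypothetical storage contract the registry would depend on.
interface PackageStore {
  insert(pkg: StoredPackage): void;
  findByName(name: string): StoredPackage | null;
  count(): number;
}

// In-memory implementation, useful for tests or a throwaway local registry.
class InMemoryStore implements PackageStore {
  private items: StoredPackage[] = [];

  insert(pkg: StoredPackage): void {
    this.items.push(pkg);
  }

  findByName(name: string): StoredPackage | null {
    for (const item of this.items) {
      if (item.name === name) return item;
    }
    return null;
  }

  count(): number {
    return this.items.length;
  }
}

// Registry logic only sees the interface, never a concrete database.
function registerPackage(store: PackageStore, pkg: StoredPackage): boolean {
  if (store.findByName(pkg.name) !== null) return false; // name already taken
  store.insert(pkg);
  return true;
}
```

Swapping databases would then mean writing another `PackageStore` implementation rather than touching the registration logic.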

CoffeeScript? (no, maybe ES6 + traceur)

I would like to personally ask the team for blessing on this. I'm most productive writing CoffeeScript, but I understand there may be concern with maintenance and attracting future contributors. There are many high profile OSS projects (like Atom) that use CoffeeScript exclusively. If the team feel strongly against this I'll stick w/ vanilla js.

My team would prefer to avoid CoffeeScript. Yet another language with questionable benefit considering the overhead.

@rayshan
Member Author

rayshan commented Aug 19, 2014

@struys thanks for the input & your team's previous contribution to the registry.

coverage tool like istanbul

Yes will look into it, better coverage is definitely desired, we made a little progress recently but still a long way to go

... all of these systems are pluggable. Please make sure it's possible to skip the vendor dependencies.

I'm glad you mentioned this, it's in line with the core team's input so far, wasn't something I thought a lot about, but it will be

Could we use something like sequelize?

Definitely. I was debating b/t bookshelf & sequelize, did your team have any particular reason to choose sequelize? BTW I do plan to take advantage of Postgres' JSON data type

CoffeeScript

Definitely no, everyone convinced me (and I'm unsure of CoffeeScript's future as well...)

@krotscheck

Does anyone have objections to some work being done on these items? I have a direct need for downstream mirroring and caching, and a few free weeks I can throw at it.

@zenorocha

Any news on this @rayshan? I'd love to consume an api.bower.io to fetch packages by a certain keyword, instead of relying on search-server which is usually flaky.

@ghost

ghost commented May 25, 2016

Just a small crazy hint in JS vs Coffee vs Ruby - why not Pharo (ping @pharo-project)? ;-)

@sheerun
Contributor

sheerun commented May 31, 2016

Unfortunately we need to abandon api rewrite as we lack sufficient resources.

Moreover, nowadays it's clear it's not a good idea to introduce yet another binaries registry when npm's is more than sufficient. I suggest focusing on developing bower as a client of npm's registry, instead of introducing a brand new one. This would also fight the bower-npm registry dichotomy we experience.

@sheerun sheerun closed this as completed May 31, 2016
9 participants