Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Packages extensibility #849

Open
JakeGinnivan opened this issue Jun 25, 2022 · 22 comments
Open

RFC: Packages extensibility #849

JakeGinnivan opened this issue Jun 25, 2022 · 22 comments

Comments

@JakeGinnivan
Copy link
Contributor

Problem

The tagling for changesets is A way to manage your versioning and changelogs with a focus on monorepos, currently it is A way to manage your versioning and changelogs for NPM packages with a focus on monorepos

The goal of this RFC is to suggest how we could realise the dream while not only supporting NPM.

There are also a bunch of issues talking about changing behavior around NPM publish.

Related discussions / PRs

#171
#218
#399
#425
#580
#654
#778
#800
#801

Proposed solution

Packages have a few basic components:

  1. A name
  2. A version
  3. Dependencies to other packages in the repo
  4. Publish status with version
  5. Ability to be published

If changesets allows the concept of 'packages' to be extensible then a repository could have multiple types of packages registered. For example in an NX repo you could expose NX projects via the plugin.
It would also allow non-node projects to be tracked by creating a plugin to support Ruby Gems, or .net NuGet packages etc.

Package interface

type PublishedState = "never" | "published" | "only-pre";

interface ChangesetPackage {
  name: string
  version: string
  kind: string
  directory: string
  dependencies: string[]

  getPublishInfo(): Promise<{
   publishedState: PublishedState
   publishedVersions: string[]
  }>
  /** assumes updateVersion has run */
  publish(): Promise<{}>
  updateVersion(version: string): Promise<{}>
}

Config

  "packages": [
    "@changesets/packages-nx",
    [
      "@changesets/packages-npm",
      { "repository": "npm", ignore: ['some-package'] }
    ],
    [
      "@changesets/packages-nuget",
      { }
    ]
  ]

Core workflows

Some of these diagrams could be improved, but is a first pass at visualising the workflows. Also have left out things like pre-release to keep things simple.

Get Packages

sequenceDiagram
    cli ->>+ getPackages: 
    getPackages ->> config: getPackagesPlugins
    config ->> getPackages: PackagePlugin[]
    loop map each plugin
        getPackages ->> plugin: getPackages
        plugin ->> getPackages: ChangesetPackage[]
    end

    note over getPackages: Input ChangesetPackage[][]
    loop flatMap packages
        alt !exists in map
            getPackages ->> getPackages: add to map
        end
    end

    getPackages ->>- cli: ChangesetPackage[]

Version

sequenceDiagram
    note over getPackages: *Changed
    version ->> getPackages: 
    getPackages ->> version: ChangesetPackage[]
    version ->> readChangeset: 
    readChangeset ->> version: NewChangeset[]

    version ->> assembleReleasePlan: changesets, packages
    note over assembleReleasePlan: Filters changesets from ignored packages
    note over assembleReleasePlan: Flatten changesets into package releases
    note over assembleReleasePlan: Add changes due to dependencies into releases
    assembleReleasePlan ->> version: { changesets: NewChangeset[], releases: ComprehensiveRelease[] }

    version ->> applyReleasePlan: 
    
    loop for each release
        note over package: *Changed from directly updating package.json
        applyReleasePlan ->> package: updateVersion()
        applyReleasePlan ->> config: Get changelog plugin
        note over applyReleasePlan: Gets commit for each change
        note over applyReleasePlan: Uses plugin to format lines

        applyReleasePlan ->> changeset file: Updates CHANGELOG.md for project
    end

Publish

This function gets rejigged a fair bit. Packages marked as private never call the publish() function on the package, they are just tagged (assuming the config allows that).

sequenceDiagram
    publish ->> getPackages: 
    getPackages ->> version: ChangesetPackage[]
    
    loop parallel map for each package
        alt is private package
            publish ->> git: get tag for current version
            alt tag exists
                publish -x publish: do nothing
                note over publish: return { published: false }
            else
                publish ->> git: tag release
                note over publish: return { published: true }
            end
        else
            publish ->> package: publish()
            alt success
                publish ->> git: tag release
                note over publish: return { published: true }
            end
        end
    end

Challenges

Source of truth?

While building #662 one of the challenges of tracking private packages is that Changeset uses the published NPM info as the source of truth but this isn't possible for the private packages. Those instead use the git tags in the repo.

I think we should use tags in the git repo as our state, then we can put checks in place to 'refresh' if things get out of sync, ie publish succeeds but then tagging fails. We can detect and fix this state.

This approach will not work if you can publish without tagging though, or not tag per repo. Either that or only private packages rely on git tags being the source of truth?

Duplicate discovery

Package discovery will be done by multiple plugins, if multiple plugins discover a project I think we simply take the one from the plugin registered first (so ordering of the plugins matters).

@Andarist
Copy link
Member

Andarist commented Jul 5, 2022

  1. How Nx packages are different from npm packages? Isn't this usually synonymous? If not - would @changesets/packages-nx have to be in control of handling polyglot packages? I see a huge overlap for this potential package with the potential @changesets/packages-npm and I wonder how those would relate to each other/interact with each other.
  2. getPublishInfo - this has been proven to be a somewhat problematic strategy for npm when it comes to private registries. IIRC Yarn Berry has --tolerate-republish and pnpm tolerates republishing automatically in its pnpm publish --recursive. I was thinking about switching to their strategy - the drawback would be that when there are no pending changesets we'd always be building & packing all of the publishable packages whereas right now we skip that step based on the publish info. In other words, I wonder if we can get away with not implementing this method at first 🤔
  3. Do we allow dependencies between package "types"?

I think this is a delicate issue, increasing the scope of the project - but given that we could, rather easily, abstract those things in those "plugins" the risk seems to be rather low. In the core, we could focus on npm workflows - while allowing the community to experiment with other package "types". I wonder though - who is the primary audience of this feature? Is it mainly useful for polyglot repos that already include npm tooling setup? I would assume that Nuget-only, Crates-only, etc repos wouldn't like to include the npm's can of worms just to use Changesets.

@JakeGinnivan
Copy link
Contributor Author

  1. How Nx packages are different from npm packages? Isn't this usually synonymous? If not - would @changesets/packages-nx have to be in control of handling polyglot packages? I see a huge overlap for this potential package with the potential @changesets/packages-npm and I wonder how those would relate to each other/interact with each other.

If you have a look at the NX PR, it uses an NX specific package to do project discovery and NPM uses manypkgs. They would both discover the same package, which is why order of plugins would be important. But manypkg can't discover all NX packages, because they might not have a package.json, and I don't think we want core to gain extra deps.

2. `getPublishInfo` - this has been proven to be a somewhat problematic strategy for npm when it comes to private registries. IIRC Yarn Berry has `--tolerate-republish` and pnpm tolerates republishing automatically in its `pnpm publish --recursive`. I was thinking about switching to their strategy - the drawback would be that when there are no pending changesets we'd always be building & packing all of the publishable packages whereas right now we skip that step based on the publish info. In other words, I wonder if we can get away with not implementing this method at first 🤔

Sounds good, we could also just handle the conflict result when we try to publish. We could initially not have it, then use the Git tags etc as the source of truth (similar to how terraform/pulumi have their state store) and not implement the refresh side of things initially?
If we did this we could still skip the rebuild to publish based on the tags? If we get a conflict when publish just output a message it looks like a package was published but tagging failed, run 'git tag ....' to fix?

3. Do we allow dependencies between package "types"?

I think this is a delicate issue, increasing the scope of the project - but given that we could, rather easily, abstract those things in those "plugins" the risk seems to be rather low. In the core, we could focus on npm workflows - while allowing the community to experiment with other package "types". I wonder though - who is the primary audience of this feature? Is it mainly useful for polyglot repos that already include npm tooling setup? I would assume that Nuget-only, Crates-only, etc repos wouldn't like to include the npm's can of worms just to use Changesets.

I kinda see NPM packages as an implementation detail. I think the real issue is that changesets is a node project, so if you are a .net project would you want to have a tool from another ecosystem. But many projects have a SPA or similar frontend so node is increasingly part of most dev shops.

The concepts behind changesets isn't really tied to NPM, a 'package' has a name, version and dependencies. There is then the publish side, which is a list of remote versions. A version has a changelog, and if there are unreleased changes you need to 'publish' the package to the remote. This model applies to docker, nuget, crates etc.

The advantage is we can focus on the changesets workflow, and not have a package managers implementation details bleed into that workflow?

@Andarist
Copy link
Member

Andarist commented Jul 5, 2022

I think that I'm mostly on board with the idea itself - we'd still have to figure out some implementation details, I will try to gather some discussable points when/if we greenlight this.

@mitchellhamilton what are your thoughts about this? I know that you were hesitant about doing this in the past.

@rohit-gohri
Copy link
Contributor

rohit-gohri commented Jul 22, 2022

I wonder though - who is the primary audience of this feature? Is it mainly useful for polyglot repos that already include npm tooling setup?

I found this issue while trying to figure out a way to track our infra changes (terraform and helm in the monorepo with the nodejs application code) and generate a changelog for them.

For terraform, since there is no versioning of it's own, I've just created a dummy package.json and added it as a dependency to the main application's package.json that way whenever there is an infra change, application is patch bumped and we can do a redeploy.

For helm I would like to use the chart's version in the Chart.yaml instead of keeping it in sync with the package.json.

How would I make the application dependent on the helm chart or vice versa if this RFC was implemented since one would be tracking version and dependencies via package.json and the other via Chart.yaml? Would this require a separate config file (could get messy)?

@Andarist
Copy link
Member

For terraform, since there is no versioning of it's own, I've just created a dummy package.json and added it as a dependency to the main application's package.json that way whenever there is an infra change, application is patch bumped and we can do a redeploy.

I think this is a viable solution here.

For helm I would like to use the chart's version in the Chart.yaml instead of keeping it in sync with the package.json.

Yeah, this makes sense and this is what @JakeGinnivan wants to solve here.

How would I make the application dependent on the helm chart or vice versa if this RFC was implemented since one would be tracking version via package.json and the other via Chart.yaml? Would this require a separate config file (could get messy)?

IIRC the idea here is that a plugin could read the version from your chart.yaml. It would require almost no integration at the side of your monorepo (other than specifying and configuring the said plugin) but the plugin itself would have to be written to support this particular "package type".

@rohit-gohri
Copy link
Contributor

IIRC the idea here is that a plugin could read the version from your chart.yaml.

Yes, I got that part. But how would we declare dependencies across these packages or is that out of scope? I can't add @monorepo/helm to the package.json dependencies array without it referring to a valid npm package otherwise yarn or npm would complain of invalid package.

@Andarist
Copy link
Member

If it's not a valid dependency for a node project then why do u want to add it as a dependency? I imagine different ways to solve this but i'd like to first understand the use case better

@fubhy
Copy link

fubhy commented Jul 22, 2022

@rohit-gohri FYI this is exactly what I'm also waiting for with this issue ;-). One monorepo package versioning tool to rule them all (publish packages and tag & deploy images with terraform).

The way I'm planning to do this involves a custom github action that leverage pnpm and changesets via a custom script to spit out JSON for a github workflow build matrix including metadata about image versions, etc. ...

  1. Generate matrix from pnpm package versions and changesets including some metadata to determine image tags / versions / etc. and what action to take (e.g. pull or build image, which tags to apply, etc._
  2. Use docker buildx bake to build images according to the matrix (build an image for any infra component that has a pending changeset, pull the others from the registry)
  3. Test the whole stack based on that set of images
  4. Roll every commit to main out to staging (push built images to registry, then deploy with terraform [only those that have been re-built])
  5. Roll out every changeset release to production (after testing it in CI and manually assessing it on the staging environment)

We've got everything else in place already except the changesets bit (currently deploying every component to staging on every commit and then to production whenever any changeset is released).

@rohit-gohri
Copy link
Contributor

rohit-gohri commented Jul 22, 2022

If it's not a valid dependency for a node project then why do u want to add it as a dependency?

I don't want to add it as a npm dependency but I want changesets to consider it a dependency when generating releases. Because the application code depends on the cloud resources being provisioned by terraform. So I want changesets to track that dependency but yarn should not care about that dependency.

Let's take an example of adding a new Google PubSub topic for use in the application.

  1. Create topic in terraform and assign permission to service account
  2. Once terraform is applied we can use the topic in nodejs app to publish payloads
  3. Once app is updated a new docker image is pushed
  4. New docker image tag is used by helm chart to update K8s deployment

The code is not directly using anything from terraform. But it is dependent on the cloud resources being created by it. So the latest terraform will not be compatible with an old application version. And similarly helm needs the application docker image but that is an output of the application.

Currently everything is run when we want to update even one thing (similar to @fubhy's setup). Ideally, I would want to declare these dependencies somewhere, and track it all for the release and in the changelogs.

It's possible I might be over-engineering this. So, keeping this cross language dependency tracking out of scope for now is also fine with me. I would be happy with just getting a common tool for changelog generation for all the things (terraform, helm, node app, packages, etc).

Just wanted to see if we end up implementing support for other packages then would declaring dependencies across them be possible.


On another note, tracking all this via git tags might not be the best solution for my setup. Since I have disabled git tags generation and only create the git tags for the main applications and not for all the common packages through a custom script.

@fubhy
Copy link

fubhy commented Jul 22, 2022

@rohit-gohri In preparation, we've already put a package.json into all components directories (whether they are javascript / typescript or not) simply to track versions, dependencies (using the pnpm workspace linking feature [workspace:*]) and to easily run commands in package dirs (using pnpm filter).

@JakeGinnivan
Copy link
Contributor Author

The proposed interface for a package allows dependencies to be discovered, you could use custom metadata fields or switch to a mono repo tool like NX, which introduces project.json to define dependencies between projects.

FWIW @rohit-gohri I am building https://github.com/arkahna/oss-nx-packages/tree/main/libs/nx-terraform at work and getting changesets to understand all projects in my nx.dev project is one of my first goals after implementing this RFC.

@JakeGinnivan
Copy link
Contributor Author

On another note, tracking all this via git tags might not be the best solution for my setup. Since I have disabled git tags generation and only create the git tags for the main applications and not for all the common packages through a custom script.

This is something which has played on my mind, we need to track it somewhere. We could put some custom metadata into the .git folder, but that makes it hard to recover from. Suggestions welcome

@rohit-gohri
Copy link
Contributor

rohit-gohri commented Jul 22, 2022

The proposed interface for a package allows dependencies to be discovered, you could use custom metadata fields or switch to a mono repo tool like NX, which introduces project.json to define dependencies between projects.

Great!

This is something which has played on my mind, we need to track it somewhere. We could put some custom metadata into the .git folder, but that makes it hard to recover from. Suggestions welcome

If the approach is to handle everything via plugins and if this is also handled by the plugin then I think it's fine to use git tags for the nx plugin. A plugin for helm charts could then check the chart registry instead of tags, a docker plugin could check pushed images, and so on.

@JakeGinnivan
Copy link
Contributor Author

If the approach is to handle everything via plugins and if this is also handled by the plugin then I think it's fine to use git tags for the nx plugin. A plugin for helm charts could then check the chart registry instead of tags, a docker plugin could check pushed images, and so on.

Where it could become a bit messy is let's say I have a node project but am publishing to Dockerhub because it's a private NPM package which would be managed by the npm plugin. I think the RFC needs to be expanded with some examples of what it would look like for various scenarios to 'test' the proposal. I'll see if I have time this week, but if someone else wants to take a crack that would be awesome

@emmatown
Copy link
Member

what are your thoughts about this? I know that you were hesitant about doing this in the past.

Yeah, I'm still hesitant about it. I think have a package.json and something to copy the version from the package.json to whatever other manifest seems like a good work around for now. Besides all the usual "what would this actually look like?" and "how much complexity would this add?" a big concern I have is that this would remove the ability to know what packages exist in a repo without doing arbitrary code execution (e.g. as the Changesets bot does with the link to create an auto-filled changeset and showing the resulting bumps after including dependents) and removing that ability would make Changesets worse for what I would suspect is the majority of Changesets users who only care about having packages in the npm ecosystem.

@brianespinosa
Copy link
Contributor

This is something which has played on my mind, we need to track it somewhere. We could put some custom metadata into the .git folder, but that makes it hard to recover from. Suggestions welcome

@JakeGinnivan have we considered using a markdown or JSON file in the Changesets directory? That seems like a good way to not overload git with info the same way Changesets does not overload commit messages.

I am less concerned with polyglot monorepos on my end, and it almost might make sense to separate that use case out into a different RFC, but if that case could be handled by a plugin infrastructure, great.

Also, thanks for jumping in with these proposals. I had started trying to implement some of this support for NX a few months ago and I got pulled on to other things at work. Nice to see some other people finding their way here and championing RFCs.

@JakeGinnivan
Copy link
Contributor Author

have we considered using a markdown or JSON file in the Changesets directory? That seems like a good way to not overload git with info the same way Changesets does not overload commit messages.

Not a bad idea, we are already writing the CHANGELOG.md file so it would just be an additional file. I haven't had a lot of capacity to push this forward at the moment. Posting some info on what that file could look like and how it would work then I can work into the RFC would be a huge help

@JakeGinnivan
Copy link
Contributor Author

what are your thoughts about this? I know that you were hesitant about doing this in the past.

Yeah, I'm still hesitant about it. I think have a package.json and something to copy the version from the package.json to whatever other manifest seems like a good work around for now. Besides all the usual "what would this actually look like?" and "how much complexity would this add?" a big concern I have is that this would remove the ability to know what packages exist in a repo without doing arbitrary code execution (e.g. as the Changesets bot does with the link to create an auto-filled changeset and showing the resulting bumps after including dependents) and removing that ability would make Changesets worse for what I would suspect is the majority of Changesets users who only care about having packages in the npm ecosystem.

@mitchellhamilton there are already a number of users of NX who are in the npm ecosystem and the experience isn't ideal because manypkg doesn't understand NX.

In terms of complexity I think well defined boundaries of responsibilities will actually simplify changesets architecturally. It has evolved and some responsibilities are in confusing spots.

@brianespinosa
Copy link
Contributor

brianespinosa commented Sep 15, 2022

I am not entirely sure where we might have landed on this RFC as I am coming back to read through the conversation to this point, but I will try to summarize:

It seems like perhaps @mitchellhamilton has the greatest concerns that we might need to address, but past that, it seems like this proposal would be sound and we need to figure out where to do the work. Is this accurate?

I am coming back here to look at this RFC and #848 because my teams will be in need some some of this support, and we might be able to underwrite some of the efforts to get the work done if there is consensus.

cc @Andarist @JakeGinnivan

@tom-sherman
Copy link

Could this help to support publishing custom GitHub actions? The only difference between a github action and an npm library is that for actions the publish step is pushing a tag to github.

I assume this RFC would support this use case?

@Andarist
Copy link
Member

@tom-sherman note that changesets/action is already using Changeset for its publishing process and needs. There are some additional scripts there to plumb things together but they are pretty minimal. I'm not sure how much sense it makes to develop github actions in a monorepo setup - but ofc Changesets also work in normal repositories too (and changesets/action is an example of that). But either way... I imagine that this RFC could make it a little bit easier to manage github actions too.


As to the RFC itself - I think I'm on board with some form of this. There is a clear demand for a feature like this. I think that we should keep package.json as the base "manifest" for the package. It's npm convention, but a harmless one for other ecosystems and it makes sense for polyrepos that include javascript packages to have a single type of a manifest for all packages. This would make the package discovery work out of the box at the expense of adding private: true to non-npm packages. Thoughts on that?

I'm open to discussing the exact implementation more. We could also start with a PoC PR and bikeshed things there. Note that nothing is set in stone when it comes to this feature - we need to try to get it right so I wouldn't be surprised if this will require a couple of iterations before landing.

@jonathonherbert
Copy link

@JakeGinnivan this is very interesting to https://github.com/guardian, who have standardised on Changesets for their npm releases but own many non-npm releasables, too.

Is the appetite for this feature still here? As someone who doesn't yet know the internals of this project well, this feels like a moderately complex task – unsure if it's something someone new to the project might be able to pick up, but it might be a nice side quest for us if it still felt like a good fit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants