Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Packagist] Update Packagist service to use v2 api #6508

Closed
wants to merge 5 commits into from

Conversation

GeoffSelby
Copy link
Contributor

Packagist deprecated the original packagist.org/p/username/package endpoint in favor of v2 packagist.org/p2/username/package endpoint. Because of this, new packages aren't being found using v1.

This PR updates the Packagist service to use the new endpoint.

Packagist deprecated the original `packagist.org/p/username/package` endpoint in favor of v2 `packagist.org/p2/username/package` endpoint. Because of this, new packages aren't being found using v1.

This PR updates the Packagist service to use the new endpoint.
@shields-ci
Copy link

shields-ci commented May 15, 2021

Messages
📖 ✨ Thanks for your contribution to Shields, @GeoffSelby!

Generated by 🚫 dangerJS against 4d3c20b

@calebcartwright calebcartwright added the service-badge Accepted and actionable changes, features, and bugs label May 15, 2021
@calebcartwright
Copy link
Member

Thank you for the PR @GeoffSelby! The changes seem to have broken some of the badges, more details can be found in the CI output (e.g. https://app.circleci.com/pipelines/github/badges/shields/6648/workflows/ba687673-7fbf-4cb5-8579-5a699f2b9972/jobs/143021).

Could you please take a look when you get a chance? Let us know if you have any questions

Some packages don't return the same data structure as others with the new api endpoints. This changes the validation schema to account for the potential differences.
@GeoffSelby
Copy link
Contributor Author

Thanks, @calebcartwright! Looks like it was the validation schema causing problems. Should be good when these tests wrap up.

Copy link
Member

@calebcartwright calebcartwright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates! I started going through this but as I'd noted in the inline comments I feel like I could use some more info about the change. Is there some background context, documentation, etc. you can share with us about the nature of the upstream changes and why there's all the cascading impacts relative to the branches?

Shields integrates with so many upstream services that even within the maintainer team we've got quite a few whose APIs we're not terribly familiar with, so any additional info you could share would be most appreciated!

@chris48s for some reason I feel like I recall you have some more hands on PHP experience than others (certainly more so than myself) so do you have any background on what we were doing before vs. the changes with the new API?

Comment on lines 18 to 15
license: Joi.array().required(),
license: Joi.array(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need to drop .required() here? Is this because the new schema may not always define the key? If that's the case then I think in the transform function we need to account for that fact and throw a not found/invalid response error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new api endpoint now only returns the license array with the first item in the response (most recent release). At least that's what I've found in my testing. because of this, the schema validation would fail when the response contains more than 1 version. I do think that throwing a not found error would be good just in case there isn't a license at all for some reason. I'll add that in shortly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ideally you would run the MetadataMinifier::expand logic on the response before validating the schema, and then you can restore the required() I'd say. Note that packagist.org does not enforce the presence of a license key, so it may well be missing if the package author did not include it. I'd not expect they'd display a shields.io license badge if they haven't defined a license though but who knows :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Seldaek i thought about that too but I don’t think it would be possible with the way Shields makes a request since it’s shared with other services. Maybe with some sort of event hook system but that seems like overkill for this one situation.

Comment on lines 62 to 66
const branch = this.getDefaultBranch(json, user, repo)
if (!branch) {
throw new NotFound({ prettyMessage: 'default branch not found' })
}
const { license } = branch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh I'm not sure I know what we were doing here before. Do you know what, if any, the chances are of this API change resulting in us finding and rendering a different license because of the branch change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't ever find or render the wrong license because we only care about the most recent version which is always the first item in the json.packages[packageName] array.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, though my question isn't about "right" or "wrong". We take backwards compatibility pretty seriously, and one of the things we have to be mindful of is whether a change we make to the Shields code has the potential to modify any of our existing badge users without them changing anything in their corresponding project/repo/package/etc.

It's still not clear to me whether that's a possibility here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it is technically possible here but there is only one way and it's likely never going to be an issue. The only way it could be a problem is if the default branch has a different license than the most recent tagged release. For example, if a project had an MIT license before (so in the most recent tagged release) but you changed it to Apache License 2.0 in your default branch, main, master, dev, whatever you set your default branch to, and did not tag a new release. The current Shields, with the old api, would return Apache License 2.0 but this PR would return MIT because it's the license of the most recent tagged release.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's helpful clarity, thank you!

Based on a cursory glance of the docs under https://packagist.org/apidoc#getting-package-data it seems like we could maintain access to the full set of dev/branch data if we fetched packageName-dev.json instead of packageName.json

Do you see any issues with that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm taking a look at Composer's MetadataMinifier package and I realized that the expand method doesn't actually do what we need. All it's doing is "expanding" the array of releases with data we already have. In other words, it doesn't do anything useful if we want to get the PHP version requirements or license (or anything really) for the latest stable release if it's not the first item in the array.

Take a simplified version of the response from /p2/symfony/symfony.json:

"packages": {
  "symfony/symfony": [
    {
      "version": "v5.3.0-RC1",
      "license": [
        "MIT"
      ],
      "require": {
        "php": ">=7.2.5"
      },
    },
    {
      "version": "v5.3.0-BETA4",
    },
    {
      "version": "v5.3.0-BETA3",
    },
    {
      "version": "v5.3.0-BETA2",
    },
    {
      "version": "v5.3.0-BETA1",
    },
    {
      "version": "v5.2.9",
      "require": {
        "php": ">=7.2.5",
      },
    }
  ]
}

After "expanding", license for v5.2.9 (latest stable release) will be whatever licenseis for the first item,v5.3.0-RC1 in this case. In other words, if Symfony were to change its license in a pre-release, that would be the license used for the badge regardless. The same could go for PHP version requirements though not in the case of Symfony right now.

The only way to ensure we actually have the right data is to use the standard JSON API like packagist.org/packages/symfony/symfony.json. This is a more expensive request because the response is massive, but it does respond with all of the data we would need for every release. This poses some other issues as well. As far as I can tell, it still wouldn't return any branches/dev versions like dev-master or dev-main. If we needed dev we could fall back to /p2/user/repo~dev.json. The other issue is that it seems the response is in no particular order. It's not a huge deal but determining the most recent stable release could be a little time-consuming since it would have to go through the entire response.

I'll set some time aside tomorrow to start tackling this.

Copy link
Contributor

@Seldaek Seldaek May 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After "expanding", license for v5.2.9 (latest stable release) will be whatever licenseis for the first item,v5.3.0-RC1 in this case. In other words, if Symfony were to change its license in a pre-release, that would be the license used for the badge regardless.

This is not the case. Please trust the code instead of trying to infer things :) If the license changed in a prerelease, then the minified json would look like:

"packages": {
  "symfony/symfony": [
    {
      "version": "v5.3.0-RC1",
      "license": [
        "NEW-5.3-LICENSE" // new license here
      ],
      "require": {
        "php": ">=7.2.5"
      },
    },
    {
      "version": "v5.3.0-BETA4",
    },
    {
      "version": "v5.3.0-BETA3",
    },
    {
      "version": "v5.3.0-BETA2",
    },
    {
      "version": "v5.3.0-BETA1", // new license was added in beta1 let's assume, so it is valid until here
    },
    {
      "version": "v5.2.9",
      "require": {
        "php": ">=7.2.5",
      },
      "license": [
        "MIT" // old license now is set as it changed from the versions above
      ],
    }
  ]
}

In both cases, after running MetadataMinifier::expand, you would have license:["MIT"] present in v5.2.9.

/p2/user/repo~dev.json includes only dev releases, so if you're looking for latest stable this is entirely useless. Please just use /p2/user/repo.json with a MetadataMinifier::expand port.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please everyone take a couple minutes to read through https://packagist.org/apidoc#get-package-data which explains our new v2 api in details. Note the point about caching and If-Modified-Since.. I don't know if you can easily integrate this here to save everyone some bandwidth but it would be nice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the case. Please trust the code instead of trying to infer things :) If the license changed in a prerelease, then the minified json would look like:

@Seldaek so you’re saying that if there is a change between lisences, the minified response would still include the license with the last version to have that license before the change? I guess I just didn’t fully understand what was minified in the response.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you iterate through versions, the license will always be the last license you saw above. If it isn't anymore, then the license is present in the version object. If an older version lacks a license, the license is set to "__unset". But really IMO the best is to run the expand logic and then use the objects as you would normally, so you don't need to consider any of this.

Comment on lines -11 to -23
'1.0.0': { require: { php: '^5.6 || ^7' } },
'2.0.0': { require: { php: '^7.2' } },
'3.0.0': { require: { php: '^7.4 || 8' } },
'dev-main': { require: { php: '^8' }, 'default-branch': true },
},
'samwise/gardening': {
'1.0.x-dev': {},
'2.0.x-dev': {},
},
'pippin/mischief': {
'1.0.0': {},
'dev-main': { require: {}, 'default-branch': true },
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it typically concerns me when we have to modify tests as part of some other change. I realize that some updates are inevitable here since the new api has a different response structure, but I feel like there's more going on or at least that I'm missing some context (though I presume this is all branch related)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is solely because of the change in response structure as seen in the diff.

@shields-cd shields-cd temporarily deployed to shields-staging-pr-6508 May 18, 2021 07:18 Inactive
@shields-cd shields-cd temporarily deployed to shields-staging-pr-6508 May 18, 2021 23:13 Inactive
@chris48s
Copy link
Member

@chris48s for some reason I feel like I recall you have some more hands on PHP experience than others (certainly more so than myself) so do you have any background on what we were doing before vs. the changes with the new API?

I'll try and have a proper look over this in the next couple of days and see if I have anything useful to add
Also thanks for picking this up @GeoffSelby 👍

@chris48s
Copy link
Member

I did start looking at this one tonight and I've made some notes but I still need to spend a bit more time on it before I can post something useful.

@chris48s
Copy link
Member

I'm kind of hesitant to bring this one up: I don't want to get too bogged down in the 1% case because I think fixing this for the 99% case (packagist.org) is the priority and that is currently broken or producing unexpected behaviour for some users. That said, this transition is slightly complicated by the fact that the packagist integration allows the user to specify a custom (self-hosted) registry using the ?server= param.

If we completely drop compatibility with the v1 API and move to only allowing the v2 API to be used, if anyone is currently using this with a self-hosted registry that is running an older version of self-hosted packagist, their badges are going to break.
I suspect this use-case is probably super-rare for shields.io. Maybe some self-hosting users use it too (but we don't support setting credentials for private packagist, so maybe not..) Presumably someone somewhere is using this though: #3986

I think with one of the other services (sonar? nexus?) we support multiple API versions and we try the most recent first then fall back to the second (but allow you to explicitly specify the version). This situation is slightly complicated by the fact that it seems like the old API is retained but serves stale data so falling back by default might not be a great idea either :/ I think my view on this is we should just optimise to work for packagist.org and other custom registries that maintain feature parity with packagist.org. I think a fallback is going to further complicate an integration which already has a few pain points to serve a use-case which may turn out to be hypothetical in any case.

@shields-cd shields-cd temporarily deployed to shields-staging-pr-6508 May 28, 2021 01:05 Inactive
@GeoffSelby
Copy link
Contributor Author

@chris48s I have added the changes you mentioned regarding the PHP version support badge and license badge.

@chris48s
Copy link
Member

chris48s commented Jun 3, 2021

@GeoffSelby sorry I've not had a chance to get to this one yet. I've been a bit over-committed this week and I've been slightly putting this review off as I know I need to find a decent block of time to read over it properly but I'm not ignoring it. I will find some time for this over the weekend.

@chris48s
Copy link
Member

chris48s commented Jun 5, 2021

I think ideally you would run the MetadataMinifier::expand logic on the response before validating the schema, and then you can restore the required()

i thought about that too but I don’t think it would be possible with the way Shields makes a request since it’s shared with other services.

I suggest we override _requestJson(), like this:

class BasePackagistService extends BaseJsonService {
  async _requestJson({ schema, url, options = {}, errorMessages = {} }) {
    const mergedOptions = {
      ...{ headers: { Accept: 'application/json' } },
      ...options,
    }
    const { buffer } = await this._request({
      url,
      options: mergedOptions,
      errorMessages,
    })
    const compressedJson = this._parseJson(buffer)

    // decompress the response here
    const expandedJson = this.decompressResponse(compressedJson)

    return this.constructor._validateJson(expandedJson, schema)
  }
}

then because we apply the schema after decompressing we can make the things we need .required(). That will allow us to simplify the code for individual services by removing all the manual error handling/validation code and calls to decompressResponse().

On the subject of validation:

  1. Is version guaranteed to exist? It looks like we're not handling the case where it is missing at least. If so, lets make it .required() in the schema(s)
  2. Is it possible for a package on packagist to have zero releases, only dev branches (i.e: ...json returns no objects but ..~dev.json returns some). We could get away with assuming that a call to /p will give us at least one usable version when they were all mixed together, but maybe now we need to handle this case for /p2?

Copy link
Member

@chris48s chris48s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a few more comments but this is coming along nicely. Thanks

@@ -59,17 +58,26 @@ module.exports = class PackagistLicense extends BasePackagistService {
}

transform({ json, user, repo }) {
const branch = this.getDefaultBranch(json, user, repo)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can delete getDefaultBranch() from the base class now. It is not being called anywhere.

version,
server
)
} catch {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stylistically we always use } catch (e) {
we should probably have an eslint rule for that

}
}

async transform({ json, user, repo, version = '', server }) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic path for this badge is a bit muddled
Our sequence of execution is fetch --> transform --> render (or sometimes just fetch --> render)

  • Fetch gets data from APIs
  • Transform re-shapes the data (if necessary)
  • Render turns it into a badge

Here we are making an API call in transform, so that's a good indicator we've split the logic up incorrectly or named things poorly.

In this situation, calling fetch(), trying to find the latest version in there, falling back to calling fetchDev() if necessary is all part of the "fetch" stage.

The "transform" stage (to the extent there is one for this badge) is just extracting .require.php

Comment on lines +137 to +140
findRelease(json, versions = []) {
json.forEach(version => {
versions.push(version.version)
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is not that clearly named. How about findLatestRelease()?

Also, we should have one method of determining the latest (stable) version that we use consistently - we shouldn't have 2 different methods for different badges. More commentry on this on the packagist-version.js file..

I don't think versions needs to be a function param here. We never pass anything in when we call it. How about we make this just

const versions = json.map(version => version.version)

@@ -90,25 +85,27 @@ class PackagistVersion extends BasePackagistService {

transform({ includePrereleases, json, user, repo }) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted on findRelease(), we should have one method for working out the latest (stable) version. I think most of the logic in this function was there to handle dev branches being mixed in with the releases. Now that they are on a different endpoint ..~dev.json, I think we can ditch all this can't we (do correct me if I'm wrong)? If so, this simplifies to something like:

const packageSchema = Joi.array().items(
  Joi.object({version: Joi.string().required()})
)

transform({ includePrereleases, json, user, repo }) {
  const versionsData = json.packages[this.getPackageName(user, repo)]
  const versions = versionsData.map( version => version.version)

  if (includePrereleases) {
    return { version: latest(versions) }
  } else {
    const stableVersion = latest(versions.filter(isStable))
    return { version: stableVersion || latest(versions) }
  }
}

Then at that point, the logic for identifying the latest release is the same logic as in your findRelease() (which we should rename) but with a includePrereleases flag.. so we can move this to the base class and call with includePrereleases=false when we use it for the license/PHP version badges.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Packagist do not have separate tagged version on different branch. Even the main branch is considered a development branch. The ~dev.json endpoint is useful for accessing information of branch HEAD, regardless if it is tagged or not. The current version of documentation stated:

The p2/$vendor/$package.json file contains only tagged releases. If you want to fetch information about branches (i.e. dev versions) you need to download p2/$vendor/$package~dev.json.

Sometimes the tagged pre-release versions (e.g. v0.1.2) of libraries are published and used in production by others. In which case, the latest version would be "not stable" but is not shown on the ~dev.json endpoint but is still relevant.

@Seldaek
Copy link
Contributor

Seldaek commented Jun 6, 2021

Is it possible for a package on packagist to have zero releases, only dev branches (i.e: ...json returns no objects but ..~dev.json returns some).

It is possible for either .json to be empty as long as no tag exists. For ~dev.json to be empty is very unlikely and would indicate a broken package most likely, I wouldn't really care about handling this gracefully but it is an eventuality.

If there is a package version object present then yes the version field will always be present.

@PyvesB
Copy link
Member

PyvesB commented Aug 22, 2021

@GeoffSelby have you had a chance to read through @chris48s's latest review comments? It would be nice to get this landed. 😉

@PyvesB
Copy link
Member

PyvesB commented Nov 27, 2021

Given that there hasn't been any activity for a while, I'm going to close this pull request. Feel free to reopen it if you get a chance to look into it again.

If someone else wants to pick this up, you can simply fetch the commits by running git fetch origin pull/6508/head:pr-6508. Address the pending review comments and open a new pull request with a co-authored trailer included in the commit message to give attribution to the original author.

@PyvesB PyvesB closed this Nov 27, 2021
yookoala added a commit to yookoala/shields that referenced this pull request Feb 6, 2022
Address issues raised by @chris48s in badges#6508, which this PR is base on.
Includes:

* Remove getDefaultBranch() from base class for it is no longer used.
* Change try-catch statement syntax to align code style.
* Rename findRelease() to findLatestRelease() for clarity.
repo-ranger bot added a commit that referenced this pull request Mar 5, 2022
* Update Packagist service to use v2 api

Packagist deprecated the original `packagist.org/p/username/package` endpoint in favor of v2 `packagist.org/p2/username/package` endpoint. Because of this, new packages aren't being found using v1.

This PR updates the Packagist service to use the new endpoint.

* Adjust validation schema

Some packages don't return the same data structure as others with the new api endpoints. This changes the validation schema to account for the potential differences.

* Fix typo

* Throw NotFound when license not found

* Expand response and find the correct release

* Address issues raised by reviewer

Address issues raised by @chris48s in #6508, which this PR is base on.
Includes:

* Remove getDefaultBranch() from base class for it is no longer used.
* Change try-catch statement syntax to align code style.
* Rename findRelease() to findLatestRelease() for clarity.

* remove unusued param

* throw if no version found

* require version key

* use a single consistent method for identifying the latest (tagged) release

* don't throw in render()

* rename method (this is not really 'transform' in our usual parlance)

* Improve BasePackagistService testing

* Change BasePackagistService.decompressResponse to static
  method BasePackagistService.expandPackageVersions.
* Fix expandPackageVersions implementation.
* Add unit test for the function.

* Extend BasePackagistService.findLatestRelease

* extend BasePackagistService.findLatestRelease to also handle
  PackagistVersion.
* remove PackagistVersion.transform.

* Improve BasePackagistService.findLatestRelease

* Update findLatestRelease to throw NotFound itself.
* Update PackagistLicense and PackagistVersion and remove the
  NotFound throwing logics.

* Revert unneeded change

* Corrected packagist php version spec

* The test was on a false assumption that '__unset' might appear
  inside an array leaf while the composer's MetadataMinifier::minify
  never does thing recursively. The '__unset' should be value of the
  top level key.

Co-authored-by: Geoff Selby <geoff@geoffcodesthings.com>
Co-authored-by: Koala Yeung <koalay@gmail.com>
Co-authored-by: repo-ranger[bot] <39074581+repo-ranger[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
service-badge Accepted and actionable changes, features, and bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants