Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add authentication for Libraries.io-based badges, run [Libraries Bower] #7080

Merged
merged 8 commits into from
Oct 29, 2021

Conversation

calebcartwright
Copy link
Member

Resolves #6179 - would advise reviewing by commit

Our Lirbaries.io-based badges have been problematic for a long time. In the past it was advantageous for us to not authenticate requests as we were somehow actually getting a higher rate limit with the anonymous requests. However, that changed some time back which is why the service tests have been failing for months and the badges have been flaky to say the least.

According to the response headers on anonymous requests, the rate limit is 30/minute (compared to 60/minute with authentication). However, each anonymous request seems to use at least 5 points, so in practice it ends up being more like only a handful of requests per minute.

This PR adds an API provider type pattern similar to, but much simpler than, what we do with our GitHub requests, wherein a single token can be specified or multiple tokens can be used to fill a rotating pool. We have had past discussions about just doing the simple/standard auth route, however, we did have concerns that a single token would be insufficient during heavier traffic periods with our existing load patterns for these badges. Additionally, I could envision some growth/traffic increase for these badges once we get them stabilized (e.g. some folks have left Shields over the issue with these badges, and could return/future users could stick once we're able to offer these consistently).

As discussed on today's ops meeting, this implementation allows multiple tokens to be provided via the standard config instead of having them persisted in/pulled from Redis.

I've already wired up the CI environment with a token (using a bot account I own), and myself and a couple other maintainers will donate our Libraries.io tokens in our prod environment.

@calebcartwright calebcartwright added service-badge Accepted and actionable changes, features, and bugs core Server, BaseService, GitHub auth keep-service-tests-green Related to fixing failing tests of the services labels Sep 29, 2021
@shields-cd shields-cd temporarily deployed to shields-staging-pr-7080 September 29, 2021 01:27 Inactive
Comment on lines +83 to +86
get decrementedUsesRemaining() {
return this._usesRemaining - 1
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rationale/necessity for this is explained in the inline comments in the getRateLimitFromHeaders function in the new librariesio-api-provider class

Comment on lines +8 to +14
const withPooling = tokens.length > 1
Object.assign(this, {
baseUrl,
withPooling,
globalToken: tokens[0],
defaultRateLimit,
})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opted to stick with simplicity in the event of a single token, to avoid dealing with all the pooling when there isn't really a pool

Comment on lines +16 to +20
if (this.withPooling) {
this.standardTokens = new TokenPool({ batchSize: 45 })
tokens.forEach(t => this.standardTokens.add(t, {}, defaultRateLimit))
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arbitrary number selected for the batch size. I don't think it matters all that much what the actual value is, so don't feel too strongly about it.

Comment on lines +22 to +55
getRateLimitFromHeaders({ headers, token }) {
// The Libraries.io API does not consistently provide the rate limiting headers.
// In some cases (e.g. package/version not founds) it won't include any of these headers,
// and the `retry-after` header is only provided _after_ the rate limit has been exceeded
// and requests are throttled.
//
// https://github.com/librariesio/libraries.io/issues/2860

// The standard rate limit is 60/requests/minute, so fallback to that default
// if the header isn't present.
// https://libraries.io/api#rate-limit
const rateLimit = headers['x-ratelimit-limit'] || this.defaultRateLimit

// If the remaining header is missing, then we're in the 404 response phase, and simply
// subtract one from the `usesRemaining` count on the token, since the 404 responses do count
// against the rate limits.
const totalUsesRemaining =
headers['x-ratelimit-remaining'] || token.decrementedUsesRemaining

// The `retry-after` header is only present post-rate limit excess, and contains the value in
// seconds the client needs to wait before the limits are reset.
// Our token pools internally use UTC-based milliseconds, so we perform the conversion
// if the header is present to ensure the token pool has the correct value.
// If the header is absent, we just use the current timestamp to
// advance the value to _something_
const retryAfter = headers['retry-after']
const nextReset = Date.now() + (retryAfter ? retryAfter * 1000 : 0)

return {
rateLimit,
totalUsesRemaining,
nextReset,
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope the inline comments provide sufficient explanation for what's happening here. I have opened an upstream issue in the Libraries.io repo (link included in the comments) to see if we can get some consistent response header behavior, but otherwise felt like we had to deal with the reality of the present.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is unlikely we'll see much movement on that so agree we just have to deal with the messyness 🙁

token.update(totalUsesRemaining, nextReset)
}

async requestAsPromise(request, url, options = {}) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept the same name as the corresponding function in the GitHub API Provider, but this is using the same fetcher (got based) as the standard request routes, and is async/await based. Open to bikeshedding on the name, I left it as-is for lazy copy/paste reasons, don't have any real opinions on the name

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming things is hard, but I think we can do better here.

The github requestAsPromise function is taking an instance of request (as in literally the library request)

sendAndCacheRequestWithCallbacks: request,

which is callback-based and wraps it as a promise

requestAsPromise(request, url, options) {
return new Promise((resolve, reject) => {
this.request(request, url, options, (err, res, buffer) => {
if (err) {
reject(err)
} else {
resolve({ res, buffer })
}
})
})
}
}

That is a name and concept we need to remove in an upcoming PR to advance #4655 but given this function is not taking an instance of request and not wrapping it as a promise, lets give it a more descriptive name. Now that we've made up our minds on which HTTP client we're going to use, maybe that makes things easier.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's any need for function names to reflect the names of libs consumed (directly or indirectly), so thoughts on any of these and/or any suggestions?

  • sendRequest
  • makeRequest
  • request
  • send
  • call
  • get
  • getLibrariesIOData
  • fetch

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think fetch(requestFetcher, url, options) probably makes the most sense to me if we can use fetch without clashing with base service.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed in e331c2e

@shields-ci
Copy link

shields-ci commented Sep 29, 2021

Warnings
⚠️ This PR modified service code for librariesio but not its test code.
That's okay so long as it's refactoring existing code.
⚠️ This PR modified the server but none of its tests.
That's okay so long as it's refactoring existing code.
Messages
📖

Thanks for contributing to our documentation. We ❤️ our documentarians!

📖 ✨ Thanks for your contribution to Shields, @calebcartwright!

Generated by 🚫 dangerJS against 74b40ca

Comment on lines 96 to 104
const response = await request(`${baseUrl}${url}`, mergedOptions)
if (this.withPooling) {
if (response.res.statusCode === 401) {
this.invalidateToken(token)
} else if (response.res.statusCode < 500) {
this.updateToken({ token, url, res: response.res })
}
}
return response
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reading of how we've set up the actual http request issuing code with got is that this will only reject/throw in the event of major issue (no connectivity) and not on standard http response errors. In such a major issue type of scenario, I don't think it would be safe/valid to attempt to update the token pooling info, so I've not wrapped this in any kind of try catch. I think the call making infrastructure already handles anything that would be caught here, but let me know if a catch is needed with something like an explicit error conversion

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the fetch function we're calling here (unhelpfully the param is called request 🙃 - see other comment about naming) is fundamentally calling this internally:

async function sendRequest(gotWrapper, url, options) {
const gotOptions = requestOptions2GotOptions(options)
gotOptions.throwHttpErrors = false
gotOptions.retry = 0
gotOptions.headers = gotOptions.headers || {}
gotOptions.headers['User-Agent'] = userAgent
try {
const resp = await gotWrapper(url, gotOptions)
return { res: resp, buffer: resp.body }
} catch (err) {
if (err instanceof got.CancelError) {
throw new InvalidResponse({
underlyingError: new Error('Maximum response size exceeded'),
})
}
throw new Inaccessible({ underlyingError: err })
}
}
which sets throwHttpErrors = false (so non-2XX status codes can be handled at the service layer) and wraps any error got throws as a InvalidResponse or Inaccessible (which can then be caught cleanly) so I think we can just call it here - same as in any other service class really.

@@ -86,6 +86,7 @@ private:
jenkins_pass: 'JENKINS_PASS'
jira_user: 'JIRA_USER'
jira_pass: 'JIRA_PASS'
librariesio_tokens: 'LIBRARIESIO_TOKENS'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opted to just stick an s on the end to, along with the docs, convey this supports accepting multiple tokens. I'm not sure we have any prior art with this type of value beyond the various *origins vars though, so let me know if folks have any preferred alternatives

Comment on lines 247 to 250
this.librariesioConstellation = new LibrariesIoConstellation({
service: publicConfig.services.librariesio,
private: privateConfig,
})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This constellation class is a borderline no-op. It originally looked a bit more like the corresponding GitHub class, but after the decision earlier today to skip persistence it was dramatically simplified. I've opted to keep the class as somewhat of an abstraction/keeping the libraries-specific instantiations inside there, but let me know if folks think it'd be better to just deal with the api provider instantiation inline here in the server.

Comment on lines 1 to 39
import Joi from 'joi'
import { anyInteger, nonNegativeInteger } from '../validators.js'
import { BaseJsonService } from '../index.js'

// API doc: https://libraries.io/api#project
const projectSchema = Joi.object({
platform: Joi.string().required(),
dependents_count: nonNegativeInteger,
dependent_repos_count: nonNegativeInteger,
rank: anyInteger,
}).required()

function createRequestFetcher(context, config) {
const { sendAndCacheRequest, librariesIoApiProvider } = context

return async (url, options) =>
await librariesIoApiProvider.requestAsPromise(
sendAndCacheRequest,
url,
options
)
}

export default class LibrariesIoBase extends BaseJsonService {
constructor(context, config) {
super(context, config)
this._requestFetcher = createRequestFetcher(context, config)
}

async fetchProject({ platform, scope, packageName }) {
return this._requestJson({
schema: projectSchema,
url: `/${encodeURIComponent(platform)}/${
scope ? encodeURIComponent(`${scope}/`) : ''
}${encodeURIComponent(packageName)}`,
errorMessages: { 404: 'package not found' },
})
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally tend to prefer the reusable helper/fetcher type functions over extending the inheritance hierarchy, but it felt cleaner/more straightforward to me to use a base class for setting up the request fetcher on construction in a base class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 2 main entities in libraries.io - projects and repos:

return quite differently structured responses (although that isn't totally obvious from the routes - its a wierd API).
Most of our badges query projects but we do have one or two query the repo endpoints.
This class is called LibrariesIoBase but it is really best thought of as LibrariesIoProjectBase. If we made LibrariesIoBase just

class LibrariesIoBase extends BaseJsonService {
  constructor(context, config) {
    super(context, config)
    this._requestFetcher = createRequestFetcher(context, config)
  }
}

and kept projectSchema/fetchProject seperate would you see any downside to that? It seems more conceptually right (although at this point the chance of adding new libraries.io badges seems somewhat remote).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seperate would you see any downside to that?

My thinking was more about encapsulation and usage patterns. For service classes that need to make requests to the Libraries.io API, they must use the custom request fetcher, and therefore must extend this base class. If we have the common, borderline ubiquitous, function(s) and arguments (schema(s)) separated in a separate module, then that requires each individual service class to be sure to import and use from both modules, but it's still possible for a developer/class to speciously just import the function/schema without extending the base class.

I feel like the standalone modules with importable functions/objects/etc. are particularly useful/preferable when we want them to be usable regardless of the inheritance hierarchy, which isn't applicable in the Librarioes.io based cases.

Yes, it's true that adding the schema and function to the base class means the base class will contain things that aren't strictly used by every extending child class. However, that's pretty common throughout the rest of our own service class hierarchy and in my experience OOP in general.

although at this point the chance of adding new libraries.io badges seems somewhat remote

I actually think this PR increases the odds of us at least having the ability to use the service for more, now that we have a more well defined and expansive request footprint with authenticated requests using a pool. I don't know whether that hypothetical would necessarily manifest as new badges with net-new data points, or as an alternative source for existing types of badges

Comment on lines -9 to -17
t.create('license not declared')
.get('/bootstrap.json')
.intercept(nock =>
nock('https://libraries.io')
.get('/api/bower/bootstrap')
.reply(200, { normalized_licenses: [] })
)
.expectBadge({ label: 'license', message: 'missing' })

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our Bower badges had a few such mocked requests. With the hard requirement that we have the API Keys enabled in CI (we always exceed the rate limit with our ~20+ tests without auth), trying to mock these becomes increasingly difficult.

We've used some nock tricks elsewhere (e.g. skipWhen) that I considered, but I actually found these mocked tests were either duplicative/unnecessary tests of a render helper (as was the case here) and/or the test scenario would be better covered with unit tests.

const data = await this.fetch({ packageName })
const includePrereleases = queryParams.include_prereleases !== undefined
const version = this.constructor.transform(data, includePrereleases)

return renderVersionBadge({ version })
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transformation logic separated out here to make it easier to use standard unit tests, because we were using mocked service tests to validate this transformation logic before.

Comment on lines -37 to -54
t.create('Version label should be `no releases` if no stable version')
.get('/v/bootstrap.json')
.intercept(nock =>
nock('https://libraries.io')
.get('/api/bower/bootstrap')
.reply(200, { normalized_licenses: [], latest_stable_release: null })
)
.expectBadge({ label: 'bower', message: 'no releases' })

t.create('Version label should be `no releases` if no pre-release')
.get('/v/bootstrap.json?include_prereleases')
.intercept(nock =>
nock('https://libraries.io')
.get('/api/bower/bootstrap')
.reply(200, { normalized_licenses: [], latest_release_number: null })
)
.expectBadge({ label: 'bower', message: 'no releases' })

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Covered with new unit tests, mocked tests removed for the aforementioned reasons

const scope = nock('https://libraries.io/api')
// This ensures that the expected credentials are actually being sent with the HTTP request.
// Without this the request wouldn't match and the test would fail.
.get(`/bower/bootstrap?api_key=${fakeApiKey}`)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For whatever reason, I was unable to get nock to intercept this request when I using the options arg (e.g. get('....', { qs: ... }). As is often the case for me with nock it was probably a pebkac issue, but I didn't feel like trying to battle it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works in mysterious ways 🧙

@calebcartwright calebcartwright temporarily deployed to shields-staging-pr-7080 September 29, 2021 01:51 Inactive
@shields-cd shields-cd temporarily deployed to shields-staging-pr-7080 October 16, 2021 18:23 Inactive
token.update(totalUsesRemaining, nextReset)
}

async requestAsPromise(request, url, options = {}) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming things is hard, but I think we can do better here.

The github requestAsPromise function is taking an instance of request (as in literally the library request)

sendAndCacheRequestWithCallbacks: request,

which is callback-based and wraps it as a promise

requestAsPromise(request, url, options) {
return new Promise((resolve, reject) => {
this.request(request, url, options, (err, res, buffer) => {
if (err) {
reject(err)
} else {
resolve({ res, buffer })
}
})
})
}
}

That is a name and concept we need to remove in an upcoming PR to advance #4655 but given this function is not taking an instance of request and not wrapping it as a promise, lets give it a more descriptive name. Now that we've made up our minds on which HTTP client we're going to use, maybe that makes things easier.

@@ -242,6 +244,11 @@ class Server {
private: privateConfig,
})

this.librariesioConstellation = new LibrariesIoConstellation({
service: publicConfig.services.librariesio,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see, there is no services.librariesio key in publicConfigSchema

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. this is a leftover from the original route i'd taken of using a backing persistence for the tokens, with some of the associated (non-secret) config items going under this key

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in e42648f

const scope = nock('https://libraries.io/api')
// This ensures that the expected credentials are actually being sent with the HTTP request.
// Without this the request wouldn't match and the test would fail.
.get(`/bower/bootstrap?api_key=${fakeApiKey}`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works in mysterious ways 🧙

} catch (e) {
console.error('Unable to select next Libraries.io token from pool')
log.error(e)
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we get here, this function is going to return undefined instead of a response object and throw a UnhandledPromiseRejectionWarning. I think we need to wrap this in an exception we can catch. log.error() should handle logging the error to sentry for us.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you thinking we need to define a new error type of ShieldsRuntimeError and throw it here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't necessarily have to be a brand new exception, but ultimately we should throw it as a ShieldsRuntimeError of some description. ImproperlyConfigured might be a good fit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in 8c912ba

Comment on lines 1 to 39
import Joi from 'joi'
import { anyInteger, nonNegativeInteger } from '../validators.js'
import { BaseJsonService } from '../index.js'

// API doc: https://libraries.io/api#project
const projectSchema = Joi.object({
platform: Joi.string().required(),
dependents_count: nonNegativeInteger,
dependent_repos_count: nonNegativeInteger,
rank: anyInteger,
}).required()

function createRequestFetcher(context, config) {
const { sendAndCacheRequest, librariesIoApiProvider } = context

return async (url, options) =>
await librariesIoApiProvider.requestAsPromise(
sendAndCacheRequest,
url,
options
)
}

export default class LibrariesIoBase extends BaseJsonService {
constructor(context, config) {
super(context, config)
this._requestFetcher = createRequestFetcher(context, config)
}

async fetchProject({ platform, scope, packageName }) {
return this._requestJson({
schema: projectSchema,
url: `/${encodeURIComponent(platform)}/${
scope ? encodeURIComponent(`${scope}/`) : ''
}${encodeURIComponent(packageName)}`,
errorMessages: { 404: 'package not found' },
})
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 2 main entities in libraries.io - projects and repos:

return quite differently structured responses (although that isn't totally obvious from the routes - its a wierd API).
Most of our badges query projects but we do have one or two query the repo endpoints.
This class is called LibrariesIoBase but it is really best thought of as LibrariesIoProjectBase. If we made LibrariesIoBase just

class LibrariesIoBase extends BaseJsonService {
  constructor(context, config) {
    super(context, config)
    this._requestFetcher = createRequestFetcher(context, config)
  }
}

and kept projectSchema/fetchProject seperate would you see any downside to that? It seems more conceptually right (although at this point the chance of adding new libraries.io badges seems somewhat remote).

Comment on lines 96 to 104
const response = await request(`${baseUrl}${url}`, mergedOptions)
if (this.withPooling) {
if (response.res.statusCode === 401) {
this.invalidateToken(token)
} else if (response.res.statusCode < 500) {
this.updateToken({ token, url, res: response.res })
}
}
return response
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the fetch function we're calling here (unhelpfully the param is called request 🙃 - see other comment about naming) is fundamentally calling this internally:

async function sendRequest(gotWrapper, url, options) {
const gotOptions = requestOptions2GotOptions(options)
gotOptions.throwHttpErrors = false
gotOptions.retry = 0
gotOptions.headers = gotOptions.headers || {}
gotOptions.headers['User-Agent'] = userAgent
try {
const resp = await gotWrapper(url, gotOptions)
return { res: resp, buffer: resp.body }
} catch (err) {
if (err instanceof got.CancelError) {
throw new InvalidResponse({
underlyingError: new Error('Maximum response size exceeded'),
})
}
throw new Inaccessible({ underlyingError: err })
}
}
which sets throwHttpErrors = false (so non-2XX status codes can be handled at the service layer) and wraps any error got throws as a InvalidResponse or Inaccessible (which can then be caught cleanly) so I think we can just call it here - same as in any other service class really.

@shields-cd shields-cd temporarily deployed to shields-staging-pr-7080 October 23, 2021 15:19 Inactive
@calebcartwright calebcartwright temporarily deployed to shields-staging-pr-7080 October 28, 2021 01:20 Inactive
Copy link
Member

@chris48s chris48s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reckon this is good to go 👍 Lets co-ordinate tokens on discord

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Server, BaseService, GitHub auth keep-service-tests-green Related to fixing failing tests of the services service-badge Accepted and actionable changes, features, and bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Libraries.io dependencies badge is flaky
4 participants