Service testing strategy #927
By the way, I wanted to share my experimenting: https://github.com/paulmelnikow/shields-tests. This includes:
I worked up a third POC:
The vendor spec looks like this:

'use strict';

const Joi = require('joi');
const ServiceTester = require('../service-tester');

const t = new ServiceTester('CRAN', '/cran');
module.exports = t;

t.create('version')
  .get('/v/devtools.json')
  .expectJSONTypes(Joi.object().keys({
    name: Joi.equal('cran'),
    value: Joi.string().regex(/^v\d+\.\d+\.\d+$/)
  }));

t.create('specified license')
  .get('/l/devtools.json')
  .expectJSON({ name: 'license', value: 'GPL (>= 2)' });

t.create('unknown package')
  .get('/l/some-bogus-package.json')
  .expectJSON({ name: 'cran', value: 'not found' });

t.create('unknown info')
  .get('/z/devtools.json')
  .expectStatus(404)
  .expectJSON({ name: 'badge', value: 'not found' });

t.create('malformed response')
  .get('/v/foobar.json')
  .intercept(nock => nock('http://crandb.r-pkg.org')
    .get('/foobar')
    .reply(200))
  .expectJSON({ name: 'cran', value: 'invalid' });

t.create('connection error')
  .get('/v/foobar.json')
  .intercept(nock => nock('http://crandb.r-pkg.org')
    .get('/foobar')
    .replyWithError({ code: 'ECONNRESET' }))
  .expectJSON({ name: 'cran', value: 'inaccessible' });

t.create('unspecified license')
  .get('/l/foobar.json')
  // JSON without License.
  .intercept(nock => nock('http://crandb.r-pkg.org')
    .get('/foobar')
    .reply(200, {}))
  .expectJSON({ name: 'license', value: 'unknown' });

This provides close to 100% code coverage for the CRAN/METACRAN badges. Overall, I'm happy with it. I find the Joi syntax too verbose. This is the syntax from my sketch, which I prefer:

t.create('version')
  .get('/v/devtools.json')
  .expectJSON({ name: 'cran', value: v => v.should.match(/^v\d+\.\d+\.\d+$/) });

The implementation of that was more complicated than I expected, so I'm going to defer it. This is good enough to start writing tests. We can always improve it later.
Somewhat tangentially, I always thought it would be welcome to have a (non-test) script that would simply record a known-good output of all .json badges into a JSON file, along with all network outputs from using try.html. Then we could also have a (test) script that checks that all the .json badges remain the same, ensuring that PRs don't cause regressions.
That's a nice idea. It would help prevent regressions in the code, and could run fast, on every commit. It's a nice complement to tests which hit the vendors' servers, which are designed to catch issues like #939 that lie with the services themselves. A problem we'd need to solve is how to maintain the battery of recorded outputs when a dev wants to make additions or changes. When a dev adds new functionality, they need to record a few new requests, but keep the rest intact. When changing code to accommodate API changes upstream, they need to replace a few requests, but keep the rest intact.
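A rough sketch of what that record script could look like, under stated assumptions: node-fetch as a dev dependency, a hypothetical badge-paths.json listing the .json badge paths to exercise, and a locally running server on port 8080. Keeping existing entries unless --update is passed is one way to let a dev add or refresh a few badges while keeping the rest intact.

'use strict';

const fs = require('fs');
const fetch = require('node-fetch'); // assumed dev dependency

const FIXTURE_FILE = 'recorded-badges.json';
const BASE_URL = 'http://localhost:8080'; // assumed locally running server
const badgePaths = require('./badge-paths.json'); // hypothetical list of badge .json paths

async function record () {
  const update = process.argv.includes('--update');
  const recorded = fs.existsSync(FIXTURE_FILE)
    ? JSON.parse(fs.readFileSync(FIXTURE_FILE, 'utf8'))
    : {};

  for (const badgePath of badgePaths) {
    // Keep existing recordings intact unless the dev asks to refresh them.
    if (recorded[badgePath] !== undefined && !update) continue;
    const res = await fetch(`${BASE_URL}${badgePath}`);
    recorded[badgePath] = await res.json();
  }

  fs.writeFileSync(FIXTURE_FILE, JSON.stringify(recorded, null, 2));
}

record().catch(err => {
  console.error(err);
  process.exit(1);
});

A companion test script could then load the same fixture file, re-request each path, and fail on any mismatch.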
Re: #1286 (comment) ^^ @chris48s I do think it's important that we hit the real services during PRs, and also nightly, so we can detect breaking changes in the upstream APIs. The current setup is not perfect. I don't like that the build breaks on master when there are problems upstream. And we still don't have a good solution for our GitHub service tests. (#979)
Unsurprisingly, it seems like you've already thought about this quite thoroughly :) Having read this thread, I see that whereas you would usually want to mock out interactions with an external service, the nature of this project means you do want some tests in your suite that will start failing if an external service goes away or makes a backwards-incompatible API change. I guess the situation I am thinking of is not so much API breaks (which are important to detect), but if you think about this test for example: https://github.com/badges/shields/blob/master/service-tests/npm.js#L34-L36 This test could start failing because Express changes its licence, rather than because there is a regression in the code under test or NPM has changed its API. Obviously in writing tests we can try to make good choices about the packages/repos we test against, but it is always an issue. I think given where the project/test suite is now, prioritising the integration tests is most important, but adding what you've described as 'recorded tests' has value too. Do you have any services where you also have 'recorded tests' in place?
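(On the Express example above: one way to reduce that kind of flakiness, sketched here under the assumption that the route and tester setup mirror service-tests/npm.js, is to assert on the shape of the license value rather than pin the exact string, so the test keeps passing if Express relicenses while still failing on regressions or upstream API changes.)

'use strict';

const Joi = require('joi');
const ServiceTester = require('../service-tester');

const t = new ServiceTester('npm', '/npm'); // assumed tester setup
module.exports = t;

t.create('license')
  .get('/l/express.json')
  .expectJSONTypes(Joi.object().keys({
    name: Joi.equal('license'),
    // Accept any non-empty license string instead of pinning 'MIT'.
    value: Joi.string().min(1)
  }));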
We don't have any recorded tests right now. I think this issue is a good record, though I'm not sure there's anything actionable right now. Let's open a new issue when there is!
I'd like to suggest a goal for the project: automated tests of the vendor badges.
As Thaddée mentioned in #411:
I agree that tests like these would be valuable for end users. They would detect API problems when they occur, rather than hoping and waiting for someone to eventually show up and report them.
They'd be relatively simple to write. They would help badge developers, and push more of the testing responsibility from the reviewer to the contributor.
In PR reviews, I spend a significant amount of time manually testing various combinations of inputs. After requesting changes, I really should start all over again. Tests would make this faster and much more reliable.
I'm skilled enough to spot many common problems in this type of code; however, unlike Thaddée, I'm not an expert in this codebase. I don't have the eye to recognize all the problematic patterns that he would. I can't tell whether code is obviously right or obviously wrong just by looking at it.
Given how easy it is to write vendor badges, and what's sure to be endless growth of available repositories and tools, PR review will remain a bottleneck. With automated tests, testing responsibility can be scaled out to a wide net of contributors. Writing tests for existing vendors is another easy way for contributors to put love into the project, even contributors who are content with the existing feature set.
Some downsides:
There are a few important attributes of tests like these:
We could call these integration tests.
There's another, more ambitious kind of test I'd also like to see: tests which record vendor responses and play them back later. They would inject mock vendor responses to handle cases like malformed responses, which can't be tested any other way. Fundamentally, they could test that the server code is doing exactly what it is supposed to, regardless of whether a package still exists or a vendor server is temporarily down.
Important attributes:
We could run the whole suite of tests on every PR. That ensures a change doesn't break other features, speeds up reviews, and lets us merge with confidence.
Running the whole suite is especially useful for refactoring: making potentially significant changes to the implementation which should not affect most of the server's behavior.
The tool I've been experimenting with is Nock Back. Working tests are reliable and it's pretty good overall. However, it doesn't have the most active maintenance, I've run into some bugs (nock/nock#870), and I've found failing tests to be tricky to debug (nock/nock#781, nock/nock#869).
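For reference, here's roughly how Nock Back reads in isolation, recording and replaying a single upstream call. This is a sketch: node-fetch, the fixture directory, and the mode are my choices, and wiring this through the badge server itself (which has to run in-process for nock to intercept its outbound requests) is the part that gets tricky.

'use strict';

const path = require('path');
const fetch = require('node-fetch'); // assumed dev dependency
const nockBack = require('nock').back;

nockBack.fixtures = path.join(__dirname, 'fixtures');
// 'record' mode replays a fixture when it exists and records one when it
// doesn't, which roughly matches the add-new/keep-the-rest workflow above.
nockBack.setMode('record');

describe('crandb upstream response (recorded)', function () {
  it('replays the recorded devtools metadata', function (done) {
    nockBack('crandb-devtools.json', function (nockDone) {
      fetch('http://crandb.r-pkg.org/devtools')
        .then(res => res.json())
        .then(pkg => {
          // ...assert on pkg here, e.g. that a Version field is present...
          nockDone();
          done();
        })
        .catch(done);
    });
  });
});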
We could call these recorded tests.
Checking code coverage is important not just for improving the quality of the code, but also the quality of the service. In my experience writing just a few of these, I found behaviors which should be improved, code paths that never invoke their callback, and other errors in error-handling code.
However, recording responses is a really big bite to take, especially with some uncertainty about the current state of the tooling. So I'd suggest we leave recording for another day.
For now, the goals should be the following:
This would include:
a. Automatically inferring which vendors to run (see the sketch after this list)
b. Reporting code coverage
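On (a), a rough sketch of how the vendors to run could be inferred from the files changed in a branch. The git range, the service-tests/ layout, and the glob dev dependency are assumptions about the repo.

'use strict';

const { execSync } = require('child_process');
const glob = require('glob'); // assumed dev dependency

// Files touched on this branch relative to master (assumed base branch).
const changed = execSync('git diff --name-only master...HEAD', { encoding: 'utf8' })
  .split('\n')
  .filter(Boolean);

// A change to any file named <service>.js selects service-tests/<service>.js
// when such a test file exists, e.g. touching cran.js selects the CRAN tests.
const services = new Set();
changed.forEach(file => {
  const match = file.match(/(?:^|\/)([a-z0-9-]+)\.js$/);
  if (match && glob.sync(`service-tests/${match[1]}.js`).length > 0) {
    services.add(match[1]);
  }
});

// Fall back to the whole suite when nothing specific matched.
const testFiles = services.size > 0
  ? [...services].map(service => `service-tests/${service}.js`)
  : glob.sync('service-tests/*.js');

console.log(testFiles.join('\n')); // feed this list to the mocha invocation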
Thoughts? Concerns?
And importantly, would anyone else like to join in the fun? 😀