Authoring guidelines: submission tests #19

Closed
Blind4Basics opened this issue Jul 8, 2020 · 21 comments · Fixed by #151
Labels
documentation Improvements or additions to documentation

Comments

@Blind4Basics
Contributor

Blind4Basics commented Jul 8, 2020

Here we can discuss in more detail what should be added there. I'll collect some ideas here as we go.

First, a lot of what needs to be said about how to write "good tests" will be pretty generic. So I believe we should actually write that generic version first, listing the usual traps to avoid.

Then a more detailed, Python-specific guide.

Generic guide:

Here are some ideas to use as a starting point:

structure of the kata / test suite

  • always have both fixed and random tests (explain why: it's not usual TDD practice, but it's needed to enforce the actual implementation and avoid cheats).
  • split the test suite into blocks with meaningful names
  • write the assertions inside the dedicated structures (groups/blocks/methods) rather than at the base level (when the language's framework provides them)
  • one day you may stop coming to CW, so other people will have to maintain your kata. Think about them and write your best code in your tests: DRY, no functions dozens of lines long, ...
  • explain how the test framework works (generally speaking): it prints the results/data needed for the feedback straight to stdout, which means the user can/will print in the middle of all of this. Hence the need to:
    • never do several calls to the user's function and then run all the assertions afterwards. That makes debugging a pain because the user cannot track which assertion relates to which call to their function
    • when possible and suited to the test framework you use, have one single call to the user's function per test method (see the sketch right after this list).
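
For illustration, a minimal sketch of such a structure with the Codewars Python test framework (the count_vowels kata and its values are made up; the solver's function is assumed to be in scope, so adapt to your language and runner):

import codewars_test as test
# `count_vowels` is assumed to be the solver's function, already in scope

@test.describe("Fixed tests")
def fixed_tests():

    @test.it("Simple words")
    def simple_words():
        # one single call to the user's function per assertion
        test.assert_equals(count_vowels("codewars"), 3, "input: 'codewars'")

    @test.it("Edge cases")
    def edge_cases():
        test.assert_equals(count_vowels(""), 0, "input: ''")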

test feedback

  • be very careful about the assertion messages: they have to be precise and useful. "False should be True" almost always trips up newbies. General guideline: it's good to show the input, either in the assertion message or in the name of the test (if the test framework allows naming the tests). Like: "28 is not prime. Answer should be False, but solution returned True." would be much better (hobo ; error256).
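
For example, in Python (a hypothetical is_prime kata; the message argument is only displayed when the assertion fails), the input can be embedded in the assertion message:

n = 28
expected = False
# on failure the solver sees the input and the reason, not just "True should equal False"
test.assert_equals(is_prime(n), expected,
                   f"{n} is not prime, so the answer should be {expected}")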

fixed/sample tests specifics:

  • having only 2 fixed tests before the random tests is generally a very bad idea
  • do not hide edge cases in the test cases part; provide them in the sample tests too (hobo).
  • You can reuse all the fixed tests of the test cases part as sample tests, though that isn't always necessary if the suite is very long. When you do leave some of the fixed tests out, try to keep at least one meaningful test for every kind of situation (hobo)

random tests specifics:

  • for the random tests: prefer, when possible, to avoid having a reference solution => design the tests so that you already know the answer at the moment the input is generated, if possible (hobo) (see the sketch right after this list)
  • performance-oriented katas: when the random tests use inputs/outputs that are very hard to debug (strings of hundreds of characters or more, huge arrays, multidimensional arrays, ... anything that makes the assertion message unreadable!), split the random tests into 2 sections or more, with a first batch of small random inputs. (error256)
  • the "mutation of the input" problem takes all its meaning in the random tests (give examples):
    • why it's bad in general
    • why it's bad on the user side (crazy feedback)
    • why it's bad for the tests (cheatable)
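
A sketch of the "known answer up front" idea in Python, again using a hypothetical count_vowels kata: the expected value is decided while the input is being built, so no reference solution is needed.

import random
import codewars_test as test
# hypothetical kata: count_vowels(s) must return the number of vowels in s

@test.describe("Random tests")
def random_tests():

    @test.it("Small random strings")
    def small_random_strings():
        for _ in range(20):
            n_vowels = random.randint(0, 10)
            consonants = [random.choice("bcdfghjklmnp") for _ in range(random.randint(0, 10))]
            chars = [random.choice("aeiou") for _ in range(n_vowels)] + consonants
            random.shuffle(chars)
            s = "".join(chars)
            # the expected answer is known from how the input was built
            test.assert_equals(count_vowels(s), n_vowels, f"input: {s!r}")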

miscellaneous

  • don't rely on the random tests to enforce the requirements: all possible edge cases should be exhaustively tested in the fixed tests, as far as possible. When that's not possible, they should be added later if a user encounters one of them and raises an issue about it. (error256)
  • This does not apply to every framework, but keeping it as a habit/default behavior might spare anyone a lot of trouble:
    • never compute the expected result directly with a formula in the same statement as the one calling the user's function (a raised exception reveals the solution) -> is that Python specific? (I guess not...) (see the sketch right after this list)
  • all requirements that are announced have to be tested (hobo). For example, if the user isn't supposed to use BigInt, the author has to enforce it, etc. Adding a note: "Keep in mind that when you end up with this kind of idea, you're generally about to open Pandora's box, especially in very flexible languages like JS, Python or Ruby. So you should actually try to avoid restrictions that may be very hard to enforce. Number of lines or length of the code, on the other hand, is not problematic (code golfing)".
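
To illustrate the formula point in Python (function name and formula are placeholders): if the assertion statement itself raises, the offending line, formula included, can end up in the solver's output, while a separately computed expected value cannot be exposed by anything the solver's code does.

# risky: if this statement throws (e.g. inside user_sum), some runners show
# the whole failing line in the traceback, formula included
test.assert_equals(user_sum(n), n * (n + 1) // 2)

# safer: compute the expected value in its own statement, then assert;
# the user's code can no longer make the formula line appear in a traceback
expected = n * (n + 1) // 2
test.assert_equals(user_sum(n), expected)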

general info about the guide itself:

  • "this guide is still generic, so it might not cover all cases, or it might even be inappropriate for some languages whose frameworks work in a different way (QuickCheck for Haskell, for example). Please refer to the documentation of your language and/or ask other experienced users for help." (error256)

Additional sources



Python-specific guide:

  • always put the assertion methods inside an it block rather than at the top level or in a describe block, especially for the random tests. This way, when the tests pass, the user doesn't end up with a wall of Test Passed! in the output panel (in addition, this allows seeing the timings for the block(s) without having to scroll all the way down).
  • insist on the mutation-of-the-input problem, because of the order of actual and expected in the assertion methods => always computing the expected value first is the best habit to acquire (see the sketch right after this list)
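
A small sketch of that ordering/mutation issue (hypothetical double_all kata): arguments are evaluated left to right, so if the solver's function mutates the input, an expected value computed in the same call is derived from the already-mutated data.

arr = [3, 1, 2]

# fragile: if double_all mutates `arr`, the expected value on the right is
# computed from the mutated list -> confusing feedback, and cheatable
test.assert_equals(double_all(arr), [x * 2 for x in arr])

# safer: fix the expected value first, and hand the solver a copy of the input
expected = [x * 2 for x in arr]
test.assert_equals(double_all(arr[:]), expected)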

per language:

(or not!? B4B)
add a snippet explaining how to access the user's solution (length/chars constraint); an illustrative sketch follows
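
Purely as an illustration of what such a snippet could look like in Python (the exact way to reach the solver's source is runner-dependent, so treat this as an assumption to verify per language; inspect.getsource only works when the source file is available to the runner):

import inspect
import codewars_test as test

@test.describe("Code length restriction")
def code_length():

    @test.it("Solution uses at most 100 non-whitespace characters")
    def length_check():
        # `user_function` stands for the solver's function
        source = inspect.getsource(user_function)
        length = len("".join(source.split()))
        test.expect(length <= 100, f"Solution is too long: {length} characters (max 100)")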


Feel free to dump some ideas.

@Blind4Basics Blind4Basics added documentation Improvements or additions to documentation help wanted Extra attention is needed labels Jul 8, 2020
@Blind4Basics Blind4Basics changed the title Writting a generic guide about how to writte tests? (before the one for python) Writting a generic guide about how to write tests? (before the one for python) Jul 8, 2020
@Blind4Basics Blind4Basics added feedback needed and removed help wanted Extra attention is needed labels Jul 8, 2020
@Blind4Basics
Contributor Author

first corollary question: should the generic guide actually be standalone, or a part of the guide about creating a kata?

@hobovsky
Contributor

hobovsky commented Jul 9, 2020

first corollary question: should the generic guide actually be standalone, or a part of the guide about creating a kata?

I think it should be a part of "making a kata". I am not sure if one is going to create tests (in the way described there) outside of this scope.

What I think would be good to have there:

  • some brief explanation of why random tests are important in the context of CW. Random tests are not a common TDD (anti-)practice, and I've seen people surprised by the fact that such a thing is used.
  • some guidance to prefer known answers over reference solutions. The practice of running reference solutions in tests is widespread and in some cases totally unnecessary. Quite often it's possible to generate an input with the answer known up front.
  • a guideline to make sample tests representative - I think that currently sample tests meet with real disregard and are used just as a kind of teaser at best. In my opinion, more attention should be brought to having sample tests which are smaller in number than the full test suite, but cover just as many cases (if feasible, that is). [... 10 minutes later...] Okay, I know about the TDD theme around kata, but 1) I am not sure if it still holds, and 2) people who are not familiar with unit tests do not know how it works
  • advice on assertion messages - these are often not clear to people not experienced with unit tests. "False should be True" trips up (almost) every newbie. I think something like "28 is not prime. Answer should be False, but solution returned True." would be much better.

never compute the expected result directly with a formula in the same statement as the one calling the user's function (a raised exception reveals the solution) -> is that Python specific? (I guess not...)

It's not exactly Python-specific, but for some languages this problem manifests differently or under slightly different circumstances. C, C++ and NASM use macros as assertions, and the assertion message shows the exact expressions passed as arguments:

Value of expression __secret_referenceSolution_32587654(arg1, arg2) is not equal to value of expression Kata::solveProblem(arg1, arg2) - or something similar.

@error256
Contributor

error256 commented Jul 9, 2020

  • It should be clear that the guide covers most common cases and there can be exceptions whenever it's reasonable.

It's not only for non-standard kata ideas, but also for simpler things. For example, with some libraries like Haskell QuickCheck it should be OK not to have fixed tests if the only edge cases are things like the minimum and maximum numbers of the input range. As long as an author can justify what they're doing, it should be OK.

when the random tests use inputs/outputs that are very hard to debug (strings of hundreds of characters or more, huge arrays, multidimensional arrays, ... anything that makes the assertion message unreadable!), split the random tests into 2 sections or more, with a first batch of small random inputs.

Splitting the tests may be a good thing when there are performance requirements, but otherwise with proper fixed tests one shouldn't need to look at the randomly generated input because at least one fixed test should fail before that.

never compute the expected result directly with a formula in the same statement as the one calling the user's function (a raised exception reveals the solution) -> is that Python specific? (I guess not...)

Not Python-specific, but there are quite a few languages where the line isn't shown either.

  • Custom assertion failure messages? Sometimes the input can be the header of the test, but that's not always possible. Ideally, it should be the tests that print the input, not the user. Only the testing code is able to print the input for failed test cases only. For example:
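
One way to get that behaviour in Python is to carry the input in the assertion message, since the message is only displayed for failing assertions (a sketch; `solution` and the inputs are placeholders):

for a, b in [(1, 2), (10, -3), (0, 0)]:
    expected = a + b
    # the message, input included, only shows up when the assertion fails
    test.assert_equals(solution(a, b), expected, f"input: a={a}, b={b}")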

@Blind4Basics
Contributor Author

Blind4Basics commented Jul 9, 2020

It's not exactly Python-specific, but for some languages this problem manifests differently or under slightly different circumstances. C, C++ and NASM use macros as assertions, and the assertion message shows the exact expressions passed as arguments:

-> but is that a problem, to see those names? (I bet you can then call the ref solution?)

edit: added info and reformatted the original message

@hobovsky
Contributor

hobovsky commented Jul 9, 2020

I would add a part about test cases checking some additional constraints, if present. For example, if a kata says "do not use X", then this restriction should be either enforced by the kata setup, or by an assertion that it was not broken.

Edit: maybe a note about how difficult or pointless such restrictions are would also be helpful :)

@Blind4Basics
Contributor Author

added.

I added the info about getting the solution of the user (it must be done "per language", there). About that: it might be better to write a single generic article explaining the traps (especially when/where to put the check), and then just provide all the snippets at the bottom of the page. This sounds far more reasonable to me.

opinions?

@Blind4Basics
Contributor Author

I added the info about getting the solution of the user (it must be done "per language", there). About that: it might be better to write a single generic article explaining the traps (especially when/where to put the check), and then just provide all the snippets at the bottom of the page. This sounds far more reasonable to me.

opinions?

up!

@Blind4Basics Blind4Basics added kind/tutorial New Tutorial and removed kind/tutorial New Tutorial labels Jul 15, 2020
@hobovsky hobovsky changed the title Writting a generic guide about how to write tests? (before the one for python) Authoring guidelines: full test suite Oct 31, 2020
@hobovsky hobovsky linked a pull request Oct 31, 2020 that will close this issue
@hobovsky
Contributor

hobovsky commented Nov 1, 2020

Do you guys know any way to make bullet points linkable? Can I insert an anchor with Markdown?

It would make it possible to copy links directly to a specific guideline and paste them in discourses.

@hobovsky
Contributor

hobovsky commented Nov 1, 2020

@kazk , guys,

What I am especially interested in is whether the guidelines, as I managed to describe them, are good enough to be treated as "official quality requirements" which reviewers could refer to when reporting issues. Are they OK for that? Is something missing? Or are they totally unsuitable for such a purpose?

Thanks!

Here's the link to doc preview: https://deploy-preview-151--reverent-edison-2864ea.netlify.app/recipes/authoring/kata-snippets/full-tests/

@kazk
Member

kazk commented Nov 3, 2020

Do you guys know any way to make bullet points linkable? Can I insert an anchor with Markdown?

I'd like to use something like remark-directive, which is trying to support the CommonMark extension proposal, but I'm not sure if it's supported by the Remark version currently used.

You should be able to use HTML within Markdown, but it needs to be separated from the rest with blank lines if I remember correctly, so I'm not sure if it's possible to mix an HTML element within a list item.

@kazk
Member

kazk commented Nov 3, 2020

You can now do the following to add ids.

- **foo bar**{id="foo-bar"}

See remark-attr for more information.

For open PRs, you'll need to rebase them or just add them later after merging.

@hobovsky
Contributor

hobovsky commented Nov 4, 2020

@Blind4Basics could you tell me what you mean by this part:

When it comes to maintaining a kata with performance requirements, it can be especially useful to have at hand a solution whose time complexity is supposed to be rejected. Storing it, properly commented, in the test suite is then a very useful feature.

I removed it for now, because I think I have a similar idea, but I will add it back if it's not what I think.

@Blind4Basics
Contributor Author

@hobovsky: it's about having the slow solution available in the test suite so that the maintainer of the kata can easily switch from one solution to the other to check that the slow one actually fails "from far enough". Like you did in the "closest points in linearithmic time" kata.

@hobovsky
Contributor

hobovsky commented Nov 4, 2020

Aaah OK, so that's something else than I thought. Thanks for the explanation!

I will put the note back then.

@ggorlen
Collaborator

ggorlen commented Nov 6, 2020

Looks great! I have a few suggestions (mostly migrated from #146).

having only 2 fixed tests before the random tests is generally a very bad idea

It's explained later that random tests shouldn't be used as substitutes for fixed cases for all requirements, so emphasizing that random tests should be as simple as needed to detect cheaters and should not test the specification might help motivate why having only two fixed tests is generally bad.

On a related note, I'd suggest that we urge avoiding redundant tests. More tests add noise and make it that much more frustrating to fulfill the specification, particularly hundreds or thousands of unnamed random tests in a loop.

I recommend that each test case be pure and not rely on state from other tests. This is implied by the existing wording, but it's subtle and it might not be obvious that creating one instance of the candidate's solution class and reusing it across multiple test cases could be a problem. I've fallen into this in the past; on some challenge types, it takes extra thinking and work to destroy and rebuild state.

Concrete examples of violations of these guidelines might help illustrate and I'd be happy to provide some if this would be useful.

@hobovsky
Contributor

hobovsky commented Nov 6, 2020

On a related note, I'd suggest that we urge avoiding redundant tests. More tests add noise and make it that much more frustrating to fulfill the specification, particularly hundreds or thousands of unnamed random tests in a loop.

I am not sure what (how many) you consider redundant, but I see two potential issues here:

  • for some particularly difficult kata (or maybe rather for sufficiently twisted solutions ;) ), a solution can stick to the specs but fail on some very specific input. For example, when there is a number which ends in nine, or something equally specific to the solution. It might be difficult to come up with the types of inputs which a given solution can get wrong. A large amount of random tests is considered helpful here.
    I think that usually such a situation happens when some specific kind of input was not identified, and therefore is not explicitly tested. But not all kinds of inputs can always be identified easily right from the beginning.
  • That's why the CW community came up with an unwritten guideline that the recommended amount of random tests for a non-performance kata should be around 100. It can be fewer, it can be more, but generally at least a few dozen. Do you think this approach should be discouraged and the community should move away from it?

@kazk
Member

kazk commented Nov 7, 2020

I think the Codewars community can make use of random tests for more than just preventing hard-coded solutions. Random tests are helpful when authoring and reviewing to make sure the reference solution is working as expected and to come up with submission tests with good coverage. Every time you find an input that causes an unexpected failure, you move that to a named test so it's tested every time with good feedback on failure. That's how random tests are usually used outside Codewars.

@ggorlen
Collaborator

ggorlen commented Nov 7, 2020

My perspective is more Qualified-oriented than CW-oriented and I'm not really in a position to recommend CW community direction shifts one way or the other, but I'll try to elaborate. This point isn't critical because it's better to err on the side of redundancy than gaps in coverage (the CW community is good about enforcing coverage), but it is a UX issue for me and goes hand in hand with the other guidelines like clearly labeling test cases, testing the written spec precisely and keeping one function call to candidate code in each test case block. The goal is to avoid challengers slogging through walls of text from dozens or hundreds of assertions that may not need to be there to figure out their failing case.

I am not sure what you consider redundant ... solution can fail on some very specific input....

If a specific input can reasonably cause a solution to pass one test but not another, then the tests are no longer redundant and that second input should be included as its own test (as Kaz mentions above, applicable to fixed or random tests). What I see as problematic is the practice of "spray & pray" test case writing like:

it("works on basic examples", () => {
  expect(getStrLen("asdf")).toBe(4);
  expect(getStrLen("asdfg")).toBe(5);
  expect(getStrLen("asdfgh")).toBe(6);
  expect(getStrLen("123456")).toBe(6);
  expect(getStrLen("654321")).toBe(6);
  // 10 more tests that kata author thought of on the spot
  // and aren't really increasing coverage of the code
});
it("works on random tests", () => {
  for (let i = 0; i < 100; i++) {
    const s = generateEnormousString();
    expect(getStrLen(s)).toBe(reference(s));
  }
});

This is a bit contrived and I'd probably do better by finding an exemplary kata, but for a string challenge like the above example, testing the empty string and lengths 1, 2 and 3 is critical to establish correctness, along with a long string, odd and even lengths, and a few medium-sized random-length strings to prevent hardcoding. This is all pretty kata-specific, so it's just a rule of thumb in my mind.

We have partial scoring in Qualified so this sort of guideline makes sense there, keeping the score transparent and more clearly proportional to a handful of carefully-chosen assertions. In theory, anyway!

@hobovsky
Contributor

hobovsky commented Nov 7, 2020

I recommend that each test case be pure and not rely on state from other tests. This is implied by the existing wording, but it's subtle and it might not be obvious that creating one instance of the candidate's solution class and reusing it across multiple test cases could be a problem. I've fallen into this in the past; on some challenge types, it takes extra thinking and work to destroy and rebuild state.

I am not sure I get you correctly. If you mean that test cases should not rely on one another or pass around any state somehow related to the tests themselves, then yes, I agree. Like, for example, one test case should not generate inputs which would be used by another test case.

But if you mean that test cases should be able to handle solutions which can return an invalid answer because they carry some stale internal state between invocations, then I am not sure I agree. Solutions can use global variables, private member fields, static variables, and I think that the author should make sure that it can be called multiple times one after another. I am not sure how to test explicitly for such a scenario, but I think that "non-reusability" of a solution should be considered its flaw and it deserves to fail.

Am I wrong here?

@ggorlen
Collaborator

ggorlen commented Nov 7, 2020

I'm referring to this pattern (pretend the array is an instance of a solution class):

describe("solution", () => {
  let arr;
  beforeAll(() => (arr = [])); // note beforeAll
  it("should push 42", () => {
    expect(arr.push(42)).toBe(1);
  });
  it("should have 42 in index 0 after pushing it in the last test case", () => {
    expect(arr[0]).toBe(42);
  });
  // ... many more assertions that mutate `arr` and assert on state carried forward from previous tests ...
});

The issue here is entirely unrelated to the solution; the test suite is brittle: changing a test can break later tests. If a solution fails, the cause might be a problem 5 test cases back. The safer way to run this series of test cases is something like:

describe("solution", () => {
  let arr;
  beforeEach(() => (arr = []));  // note beforeEach
  it("should push 42", () => {
    expect(arr.push(42)).toBe(1);
  });
  it("should have 42 in index 0 after pushing", () => {
    expect(arr.push(42)).toBe(1);
    expect(arr[0]).toBe(42);
  });
  // .. etc ...
});

The intent is similar to most other guidelines on the page: authors should anticipate and take reasonable steps to prevent the user from winding up getting into a frustrating/confusing situation where they have no idea why their code failed.

If idempotency is a requirement for the candidate solution (in most cases it isn't), there should be a clearly labeled test for it:

describe("solution", () => {
  let arr;
  beforeEach(() => (arr = [1, 2, 3]));
  it("should return the same result for multiple calls to map which doesn't mutate the array", () => {
    expect(arr.map(e => e * 2)).toEqual([2, 4, 6]);
    expect(arr.map(e => e * 2)).toEqual([2, 4, 6]);
  });
});

Again, this is just a rule of thumb. Breaking the guideline should be done knowing what the consequences are and being able to communicate that to the solver.

"non-reusability" of a solution should be considered its flaw and it deserves to fail.

It seems like it depends on the specification. The specification for Array#push for example is that it makes a permanent state change on the object, but if Array#map mutated the underlying array, that'd be a problem because the specification says it won't.

Now, if what you're referring to is a solution object or function that uses globals for its internal state instead of encapsulating it, and the safe/idempotent test suite shown above fails, then yeah, that's the solver's fault and there's nothing to be done about it from the author's standpoint as far as I can tell. They might even want to test that explicitly and add it to the specification, but that's probably too much trouble.

Hope this is all relevant to your train of thought here.

@hobovsky
Contributor

hobovsky commented Nov 7, 2020

Now reading my question once again, I can see how difficult it was to get :) sorry, it's a language thing :) but yes, we meant the same thing. Thanks!

@hobovsky hobovsky changed the title Authoring guidelines: full test suite Authoring guidelines: submission tests Nov 8, 2020