Authoring guidelines: submission tests #19
Comments
First corollary question: should the generic guide actually be standalone, or a part of the guide about creating a kata? |
I think it should be a part of "making a kata". I am not sure if one is going to create tests (in the way described there) outside of this scope. What I think would be good to have there:
It's not exactly Python-specific, but in some languages this problem manifests differently or under slightly different circumstances. C, C++, and NASM use macros as assertions, and the assertion message shows the raw expressions passed as arguments:
|
It's not only for non-standard kata ideas, but also for simpler things. For example, with libraries like Haskell's QuickCheck it should be OK not to have fixed tests if the only edge cases are the minimum and maximum of the input range. As long as the author can justify what they're doing, it should be OK.
Splitting the tests may be a good thing when there are performance requirements, but otherwise, with proper fixed tests, one shouldn't need to look at the randomly generated input, because at least one fixed test should fail first.
Not Python-specific, but there are quite a few languages where the line isn't shown either.
|
-> But is that a problem, to see those names? (I bet you could then call the reference solution?) Edit: added the info and reformatted the original message |
I would add a part about test cases checking some additional constraints, if present. For example, if the kata says "do not use X", then this restriction should be either enforced by the kata setup or by an assertion that it was not broken. Edit: maybe a note about how difficult or pointless such restrictions are would also be helpful :) |
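A minimal sketch of what "enforced by an assertion" could look like (Jest-style; the banned method, the regex, and the `solution` function name are assumptions for illustration, not something prescribed in the thread):

```js
// Hypothetical kata rule: "do not use Array.prototype.sort".
// `solution` is assumed to be the user's submitted function; in plain JavaScript
// its source can be inspected with Function.prototype.toString.
describe("constraints", () => {
  it("does not call the built-in sort", () => {
    const source = solution.toString();
    // Naive check: reject any `.sort(` call. Easy to defeat (e.g. via obj["sort"]),
    // which is exactly why such restrictions are often difficult or pointless to police.
    expect(source).not.toMatch(/\.sort\s*\(/);
  });
});
```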
Added. I added the info about getting the user's solution (which must be done per language there). About that: it might be better to write a single generic article explaining the traps (especially when/where to put the check), and then just provide all the snippets at the bottom of the page. This sounds far more reasonable to me. Opinions? |
up! |
Do you guys know any way to make bullet points linkable? Can I insert an anchor with Markdown? It would make it possible to copy links directly to a specific guideline and paste them in discourses. |
@kazk, guys, what I am especially interested in is whether the guidelines, as I managed to describe them, are good enough to be treated as "official quality requirements" which reviewers could call on when reporting issues. Are they OK for that? Is something missing? Or are they totally not suitable for such a purpose? Thanks! Here's the link to the doc preview: https://deploy-preview-151--reverent-edison-2864ea.netlify.app/recipes/authoring/kata-snippets/full-tests/ |
I'd like to use something like remark-directive, which is trying to support the CommonMark extension proposal, but I'm not sure if it's supported by the Remark version currently used. You should be able to use HTML within Markdown, but it needs to be separated from the rest with blank lines if I remember correctly, so I'm not sure if it's possible to mix an HTML element within a list item. |
You can now do the following to add ids: `**foo bar**{id="foo-bar"}`. See remark-attr for more information. For open PRs, you'll need to rebase them or just add the ids later after merging. |
@Blind4Basics could you tell me what you mean by this part:
I removed it for now, because I think I have a similar idea, but I will add it back if it's not what I think. |
@hobovsky: it's about having the slow solution available in the test suite so that the maintainer of the kata can easily switch from one solution to the other to check that the slow one actually fails "from far enough". Like you did in the "closest points in linearithmic time" kata. |
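A hedged sketch of that setup (Jest-style; the task, names, and sizes are invented, and the exact mechanics depend on the kata):

```js
// Hypothetical task: count distinct duplicated values in an array.
// Both the fast reference (O(n)) and an intentionally slow version (O(n^2))
// live in the test suite, so the maintainer can flip one flag to check that
// the slow approach really fails the performance tests "from far enough".
const CALIBRATE_WITH_SLOW = false; // set to true only while maintaining the kata

const referenceSolution = (xs) => {
  const seen = new Set(), dups = new Set();
  for (const x of xs) (seen.has(x) ? dups : seen).add(x);
  return dups.size;
};

const slowSolution = (xs) =>
  new Set(xs.filter((x, i) => xs.some((y, j) => j !== i && y === x))).size;

// `solution` is the user's submission; during calibration the slow version
// takes its place as the code under test.
const candidate = CALIBRATE_WITH_SLOW ? slowSolution : solution;

describe("performance tests", () => {
  it("handles a large random input within the time limit", () => {
    const input = Array.from({ length: 200000 }, () => Math.floor(Math.random() * 1000));
    expect(candidate(input)).toBe(referenceSolution(input));
  });
});
```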
Aaah OK, so that's different from what I thought. Thanks for the explanation! I will put the note back, then. |
Looks great! I have a few suggestions (mostly migrated from #146).
It's explained later that random tests shouldn't be used as substitutes for fixed cases for all requirements, so emphasizing that random tests should be as simple as needed to detect cheaters, and should not test the specification, might help motivate why two fixed tests is generally bad.

On a related note, I'd suggest that we urge avoiding redundant tests. More tests add noise and make it that much more frustrating to fulfill the specification, particularly hundreds or thousands of unnamed random tests in a loop.

I recommend that each test case be pure and not rely on state from other tests. This is implied by the existing wording, but it's subtle, and it might not be obvious that creating one instance of the candidate's solution class and reusing it across multiple test cases could be a problem. I've fallen into this in the past; on some challenge types, it takes extra thinking and work to destroy and rebuild state.

Concrete examples of violations of these guidelines might help illustrate this, and I'd be happy to provide some if this would be useful. |
I am not sure what (or how many) tests you consider redundant, but I see two potential issues here:
|
I think the Codewars community can make use of random tests for more than just preventing hard-coded solutions. Random tests are helpful when authoring and reviewing to make sure the reference solution is working as expected, and to come up with submission tests with good coverage. Every time you find an input that causes an unexpected failure, you move it to a named test so it's tested every time, with good feedback on failure. That's how random tests are usually used outside Codewars. |
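A small sketch of that workflow (Jest-style; the kata and all names are invented, and `solution` stands for the user's submission): an input that once caused an unexpected failure gets promoted to a clearly named fixed test, while the random loop keeps comparing against the reference solution.

```js
// Hypothetical kata: sum of decimal digits.
const referenceSolution = (n) => Math.abs(n).toString().split("").reduce((a, d) => a + +d, 0);

describe("fixed tests", () => {
  // Promoted case: a random test once surfaced a failure on negative input,
  // so it is now a permanent, named fixed test with useful feedback.
  it("handles a negative number", () => {
    expect(solution(-12)).toBe(3);
  });
});

describe("random tests", () => {
  for (let i = 0; i < 20; i++) {
    const n = Math.floor(Math.random() * 2001) - 1000; // includes negatives
    it(`sum of digits of ${n}`, () => {
      expect(solution(n)).toBe(referenceSolution(n));
    });
  }
});
```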
My perspective is more Qualified-oriented than CW-oriented and I'm not really in a position to recommend CW community direction shifts one way or the other, but I'll try to elaborate. This point isn't critical, because it's better to err on the side of redundancy than gaps in coverage (the CW community is good about enforcing coverage), but it is a UX issue for me and goes hand in hand with the other guidelines like clearly labeling test cases, testing the written spec precisely, and keeping one function call to candidate code in each test case block. The goal is to avoid challengers slogging through walls of text from dozens or hundreds of assertions that may not need to be there, just to figure out their failing case.
If a specific input can reasonably cause a solution to pass one test but not another, then the tests are no longer redundant and that second input should be included as its own test (as Kaz mentions above, applicable to fixed or random tests). What I see as problematic is the practice of "spray & pray" test case writing like:
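Something along these lines (a hedged illustration rather than the original snippet; the kata, `randomString`, and `referenceSolution` are invented, and `solution` stands for the user's submission):

```js
// "Spray & pray": hundreds of anonymous assertions in one block. Coverage of
// the interesting edge cases is left to chance, and a failure tells the solver
// little more than "one of 500 random strings broke".
const referenceSolution = (s) => [...s].reverse().join(""); // hypothetical kata: reverse the string
const randomString = (n) =>
  Array.from({ length: n }, () => String.fromCharCode(97 + Math.floor(Math.random() * 26))).join("");

it("random tests", () => {
  for (let i = 0; i < 500; i++) {
    const s = randomString(Math.floor(Math.random() * 100));
    expect(solution(s)).toEqual(referenceSolution(s));
  }
});
```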
This is a bit contrived and I'd probably do better by finding an exemplary kata, but for a string challenge like the above example, testing the empty string and lengths 1, 2, and 3 is critical to establish correctness, along with a long string, odd and even lengths, and a few medium-sized random-length strings to prevent hardcoding. This is all pretty kata-specific, so it's just a rule of thumb in my mind. We have partial scoring in Qualified, so this sort of guideline makes sense there, keeping the score transparent and more clearly proportional to a handful of carefully chosen assertions. In theory, anyway! |
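By contrast, a curated set for that same hypothetical reverse-the-string kata might look roughly like this (again Jest-style, with invented names and expected values):

```js
// A handful of clearly labeled fixed cases covering the small/critical lengths...
describe("fixed tests", () => {
  it("handles the empty string",      () => expect(solution("")).toEqual(""));
  it("handles a single character",    () => expect(solution("a")).toEqual("a"));
  it("handles an even-length string", () => expect(solution("ab")).toEqual("ba"));
  it("handles an odd-length string",  () => expect(solution("abc")).toEqual("cba"));
  it("handles a long string",         () => expect(solution("ab".repeat(5000))).toEqual("ba".repeat(5000)));
});

// ...plus a few medium-sized random cases to prevent hardcoding.
describe("random tests", () => {
  const referenceSolution = (s) => [...s].reverse().join("");
  const randomString = (n) =>
    Array.from({ length: n }, () => String.fromCharCode(97 + Math.floor(Math.random() * 26))).join("");
  for (let i = 0; i < 10; i++) {
    const s = randomString(4 + Math.floor(Math.random() * 50));
    it(`random string of length ${s.length}`, () => {
      expect(solution(s)).toEqual(referenceSolution(s));
    });
  }
});
```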
I am not sure I get you correctly. If you mean that test cases should not rely on one another and should not pass around state related to the tests themselves, then yes, I agree. For example, one test case should not generate inputs which would be used by another test case. But if you mean that test cases should be able to handle solutions which can return an invalid answer because they carry some stale internal state between invocations, then I am not sure I agree. Solutions can use global variables, private member fields, static variables, and I think that the author should make sure that it can be called multiple times, one call after another. I am not sure how to test explicitly for such a scenario, but I think that the "non-reusability" of a solution should be considered a flaw, and it deserves to fail. Am I wrong here? |
I'm referring to this pattern (pretend the array is an instance of a solution class):

```js
describe("solution", () => {
  let arr;
  beforeAll(() => (arr = [])); // note beforeAll
  it("should push 42", () => {
    expect(arr.push(42)).toBe(1);
  });
  it("should have 42 in index 0 after pushing it in the last test case", () => {
    expect(arr[0]).toBe(42);
  });
  // ... many more assertions that mutate `arr` and assert on state carried forward from previous tests ...
});
```

The issue here is entirely unrelated to the solution; the test suite is brittle: changing a test can break later tests. If a solution fails, the cause might be a problem 5 test cases back. The safer way to run this series of test cases is something like:

```js
describe("solution", () => {
  let arr;
  beforeEach(() => (arr = [])); // note beforeEach
  it("should push 42", () => {
    expect(arr.push(42)).toBe(1);
  });
  it("should have 42 in index 0 after pushing", () => {
    expect(arr.push(42)).toBe(1);
    expect(arr[0]).toBe(42);
  });
  // .. etc ...
});
```

The intent is similar to most other guidelines on the page: authors should anticipate and take reasonable steps to prevent the user from winding up in a frustrating/confusing situation where they have no idea why their code failed. If idempotency is a requirement for the candidate solution (in most cases it isn't), there should be a clearly labeled test for it:

```js
describe("solution", () => {
  let arr;
  beforeEach(() => (arr = [1, 2, 3]));
  it("should return the same result for multiple calls to map, which doesn't mutate the array", () => {
    expect(arr.map(e => e * 2)).toEqual([2, 4, 6]);
    expect(arr.map(e => e * 2)).toEqual([2, 4, 6]);
  });
});
```

Again, this is just a rule of thumb. Breaking the guideline should be done knowing what the consequences are and being able to communicate that to the solver.
It seems like it depends on the specification. Now, if what you're referring to is a solution object or function that uses globals for its internal state instead of encapsulating it, and the safe/idempotent test suite shown above fails, then yeah, that's the solver's fault and there's nothing to be done about it from the author's standpoint as far as I can tell. They might even want to test that explicitly and add it to the specification, but that's probably too much trouble. Hope this is all relevant to your train of thought here. |
Now, reading my question once again, I can see how difficult it was to get :) Sorry, it's a language thing :) But yes, we meant the same thing. Thanks! |
Here we can discuss in more detail what should be added there. I'll collect some ideas here at the same time.
First, a lot of things, when it comes to explaining how to write "good tests", will be pretty generic. So I believe we might actually write that generic version first, listing some of the usual traps to avoid there.
Then a more precise, Python-specific guide.
Generic guide:

Here are some ideas to use as a stub:

- structure of the kata/test suites
- test feedback
- fixed/sample tests specifics
- random tests specifics
- miscellaneous
- general info about the guide itself
- additional sources
Python-specific guide:
- group the assertions inside an `it` block rather than at the top level or in a `describe` block, especially for the random tests. This way, when the tests pass, the user doesn't end up with a wall of `Test Passed!` in the output panel (in addition, this allows seeing the timings for the block(s) without having to scroll all the way down). See the sketch after this list.

Per language: (or not!? B4B)

- add a snippet explaining how to access the solution of the user (length/chars constraint)
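A minimal sketch of the `it`-block grouping mentioned above (Jest-style; the kata and helper names are invented, and `solution` stands for the user's submission). Wrapping each batch of random assertions in a single `it` gives one line of output and one timing per block instead of one line per assertion:

```js
describe("random tests", () => {
  const referenceSolution = (n) => n * 2; // hypothetical kata: double the number
  const randomInt = () => Math.floor(Math.random() * 1000);

  // One `it` per batch: the output panel shows a single "Test Passed!" (and a
  // single timing) for the whole batch rather than one line per assertion.
  it("100 small random inputs", () => {
    for (let i = 0; i < 100; i++) {
      const n = randomInt();
      expect(solution(n)).toBe(referenceSolution(n));
    }
  });

  it("100 large random inputs", () => {
    for (let i = 0; i < 100; i++) {
      const n = randomInt() * 1000000;
      expect(solution(n)).toBe(referenceSolution(n));
    }
  });
});
```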
Feel free to dump some ideas.