15 tests fail on Windows with GIX_TEST_IGNORE_ARCHIVES=1 #1358
While a number of changes have been made to the test suite, this issue holds up without modification: the same tests fail on Windows as before (and as shown above) when The |
In view of the insight in #1442 that Git Bash MSYS2 behavior of
when It causes one of the tests whose failure is noted here to pass:
The other failures here are unaffected. It also causes 9 tests that had otherwise passed to fail, both with and without |
As expected per #1358 (comment), the fixes for #1443 in #1444 have also made this test, originally reported as failing here, pass:
I have edited the description here accordingly, updating the title and summary list of failing tests, linking to a new gist with the current output of a full run, and also revising for clarity and adding some missing details.
Here are updated runs. The failing tests are the same. This is just to clarify that, since I'm working on some changes in The reason for doing two runs within a short time of each other and to report both in the gist is to check that the number of tests reported as leaky varies across runs rather than having changed due to recent code changes (over time or across branches).
2024-10-22 update: The situation is unchanged at db5c9cf (gitoxide v0.38.0).
This reverts commit dd94f57. One of the changes to `.gitattributes`, possibly that change, has caused the currently committed generated archives not to be able to be used, producing Windows `test-fast` failures similar to GitoxideLabs#1358. See comments in GitoxideLabs#1607 for details.
This reverts commit dd94f57. That change to `.gitattributes`, though intended as a refactoring, caused the currently committed generated archives not to be able to be used, producing Windows `test-fast` failures similar to GitoxideLabs#1358. See comments in GitoxideLabs#1607 for details.
This reverts commit a75e970. That change to `.gitattributes`, though intended as a refactoring, caused the currently committed generated archives not to be able to be used, producing Windows `test-fast` failures similar to GitoxideLabs#1358. See comments in GitoxideLabs#1607 for details.
Thank you! Do you think it makes sense (and is in the time budget) to run the tests on Windows as well with However, what if the new What do you think?
It seems to me that, depending on how narrowly or broadly it detects crates as involved, this might either miss breakages that it is intended to catch, or implicate too many crates to confer an adequate speed improvement. My thinking is that changes to one crate can affect any crates that depend on it directly or indirectly, and the effects may only be seen in tests of functionality in the dependent crate.
You are right, my idea was probably too naive to catch everything in practice. It's an improvement with well-known drawbacks. If you think the same, I think a PR would be quickly done for this one and I wouldn't steal that opportunity :).
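To make the concern about indirect dependents concrete, here is a minimal sketch of how the set of crates whose tests might be affected by a change could be computed. The dependency graph and crate names are hypothetical and are not taken from the gitoxide workspace; a real tool would have to obtain the graph from the workspace metadata.

```python
from collections import deque

def affected_crates(deps: dict[str, set[str]], changed: str) -> set[str]:
    """Return `changed` plus every crate that depends on it directly or indirectly.

    `deps` maps each crate to the crates it depends on directly.
    """
    # Invert the graph: for each crate, which crates depend on it directly?
    dependents: dict[str, set[str]] = {}
    for crate, direct_deps in deps.items():
        for dep in direct_deps:
            dependents.setdefault(dep, set()).add(crate)

    # Breadth-first walk over the inverted graph.
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen

# Hypothetical workspace, for illustration only.
deps = {
    "app": {"mid-lib"},
    "mid-lib": {"core-lib"},
    "tool": {"core-lib"},
    "core-lib": set(),
    "standalone": set(),
}
print(sorted(affected_crates(deps, "core-lib")))
```

In this toy graph, a change to `core-lib` implicates every crate except `standalone`, which illustrates how a change low in the dependency graph can leave little to skip.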
The initial idea in #1358 (comment) may still be worth doing, even though it would not catch everything. Maybe it could be added later, atop the subsequent simpler idea of #1358 (comment), as a required check (or, required or not on CI, as a tool that could be run locally, such as when targeting platforms not covered by CI). But I do not plan to implement it immediately.
I agree with doing this. Please note that this will probably still slow down required checks sometimes, by causing them to be queued longer. As far as I know, runners for required and non-required checks always share the same quota and there is no prioritization of required checks. Although this check could be made dependent on other checks, that would probably be undesirable because it would make things take longer when load is low (by running it and the required checks in series instead of parallel), and because it would not address how checks from earlier events (e.g. earlier pushes to a pull request branch) would still use up quota (by which I just mean rate limiting). It's possible to cause runs on earlier commits to be canceled automatically, but I recommend against that because it would make statuses hard to understand and because comparing output for different commits can be valuable. If new non-required checks turn out to slow down required checks too much, then they could be scaled back or removed. My guess is that it will be okay. To speed things up, I think that:
I don't anticipate that this would be quick to implement, though it may just be that I haven't thought of the right approach. I think these checks should report success when no unexpected failures happen; otherwise, they would be constantly failing. (It may also be that they should fail when a test that is expected to fail passes, so that it can be reviewed and removed from being marked or listed as being expected to fail.) Maybe this is easy to do, but I don't know how to do it with The only general approach I know of to do this is to parse the output of Assuming this is done, the configuration to output XML should probably be placed in
```toml
[profile.with-xml.junit]
path = "junit.xml"
```
Then I ran:
```sh
TZ=UTC GIX_TEST_IGNORE_ARCHIVES=1 cargo nextest --profile=with-xml run --workspace --no-fail-fast
```
Even though not currently intended for any of the GNU/Linux runs, I did this experiment on Arch Linux rather than Windows for speed. The results are in this gist, showing both the displayed human-readable text output and the saved XML output. The most important thing to figure out is if I am missing something such that some altogether different approach is preferable. (Thus, I am replying here with what I have found so far, rather than waiting until a PR is ready.) The next most important thing to figure out is how to elegantly parse the output. I'm not very familiar with the JUnit XML output format, but my understanding is that it has a lot of tooling support, so there may even be purpose-built tools for this that would be preferable to using a general-purpose XML parsing tool.
Thanks a lot for the detailed response!
That's a great idea!
It's incredible how I could miss this, given the title of the issue is But on the bright side, thanks to that you may have looked into it earlier and to my mind found a very viable approach. It seems that even the most simple of all ways of parsing, a grep for Sure, just doing a grep and counting lines isn't bulletproof, but I think it's solid enough to make use of it, probably catching all the cases that it could catch in its lifetime - after all this is quite a rare occurrence to begin with and it's unlikely that somehow one of the 15 tests is fixed while a new one appears, hence masking the anomaly. Even though I started my answer somewhat disappointed I definitely finish it with excitement :). Thank you!
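As a rough illustration of the grep-and-count idea, the following sketch collects failing test names from human-readable output. It assumes the line shape `FAIL [ <seconds>s] <binary-id> <test-name>` seen in the FAIL lines quoted elsewhere in this thread, and it assumes colorization is off (ANSI escape sequences would defeat the pattern). The `PASS` line and its test name are made up for the sample.

```python
import re

# Assumed line shape, based on the FAIL lines quoted in this thread:
#   FAIL [   0.010s] <binary-id> <test-name>
FAIL_LINE = re.compile(r"^\s*FAIL\s+\[\s*[\d.]+s\]\s+(\S+)\s+(\S+)\s*$")

def failing_tests(output: str) -> set[str]:
    """Collect '<binary-id> <test-name>' for every FAIL line.

    A set is used so that a test reported more than once (for example,
    again in a final summary) is only counted once.
    """
    names = set()
    for line in output.splitlines():
        match = FAIL_LINE.match(line)
        if match:
            names.add(f"{match.group(1)} {match.group(2)}")
    return names

# Sample output; the PASS line is hypothetical.
sample = """\
        PASS [   0.004s] gix-ref-tests::refs packed::find::hypothetical_case
        FAIL [   0.014s] gix-discover::discover upwards::from_dir_with_dot_dot
        FAIL [ 181.270s] gix-ref-tests::refs packed::iter::performance
        FAIL [   0.014s] gix-discover::discover upwards::from_dir_with_dot_dot
"""
print(len(failing_tests(sample)))
```

Counting distinct names rather than raw lines avoids over-counting repeated reports, but a count alone still cannot tell whether one failure was replaced by another.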
Some of these fail. We report the step as failing if any fail, which currently at least 14 and usually 15 are expected to (one of them is a performance test). If more than 15 fail, we'll fail this job, which will fail the workflow overall, but the job is still deliberately not treated as a required check for PR auto-merge. Right now, the expected number of failures is deliberately set too low, to 13, which is unlikely to be satisfied. This is just to test the new job. After verifying that it can trigger job failure when an excessive number of test case failures occurs, and fixing any readily apparent bugs in the job definition, this value will be raised, probably to 15. See GitoxideLabs#1358 for information on the known-failing tests on Windows with `GIX_TEST_IGNORE_ARCHIVES=1`.
Since XML output offers counts of each status across the entire run, including failure and error status, I think it may actually be easier to parse that precisely out of the XML. If we do end up parsing that out and counting, then I think we can parse the count of failed tests in the summary rather than counting the number of lines that fail, and this can be matched in such a way that no failure reports can be misinterpreted as it, if we are willing to match across multiple lines. But I think that this approach, even if corrected and simplified, and even if scaled back by allowing some ambiguity, will still be more complicated than parsing XML, which (based on local testing) seems like it can be done in a couple of lines of PowerShell. For example, if we only had to count failures (and not errors) then this, which I have verified works locally, would be sufficient, and adding errors should be no hurdle:
```powershell
[xml]$junit = Get-Content -Path 'target/nextest/with-xml/junit.xml'
[int]$junit.testsuites.failures
```
The OS of most interest for this job--and possibly the only one it will use--is Windows, which even defaults Although I had originally intended the changes in #1654 to be the first part of a PR to do all this, only those changes, conceptually unrelated to this, are ready. So I've opened that sooner.
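For comparison, the same count-based check can be sketched outside PowerShell. This is only an illustration, not the job's actual implementation: it assumes the root `<testsuites>` element carries `failures` and `errors` attributes (the `failures` attribute is what the PowerShell snippet reads), and the threshold of 15 is the value discussed in this thread. The sample XML is invented for the example; a real run would read `target/nextest/with-xml/junit.xml`.

```python
import xml.etree.ElementTree as ET

MAX_EXPECTED_FAILURES = 15  # value discussed in this thread; adjust as needed

def check_counts(junit_xml: str, max_failures: int = MAX_EXPECTED_FAILURES):
    """Return (failures, errors, ok) for a JUnit XML document.

    `ok` is True when there are no errors and no more failures than allowed.
    """
    root = ET.fromstring(junit_xml)  # root element is assumed to be <testsuites>
    failures = int(root.get("failures", "0"))
    errors = int(root.get("errors", "0"))
    ok = errors == 0 and failures <= max_failures
    return failures, errors, ok

# Invented sample document, not real nextest output.
sample = '<testsuites name="nextest-run" tests="20" failures="14" errors="0"></testsuites>'
print(check_counts(sample))
```

Checking `errors` separately from `failures` keeps the two conditions distinguishable in the job output, which matters because only failures are expected here.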
I don't actually know how to depend on a specific job generated by a matrix, rather than the whole group, and I'm not sure there is a way. When I was working on the publishing workflow, I tried to do this for the jobs that created macOS universal binaries out of the x86-64 and AArch64 (ARM64) builds, and I did not find a way. (I did find claims that GitHub Actions doesn't support it, but I don't have those on hand, and I don't know if they were accurate or still accurate.) The "obvious" way to do it would be to form a job Of course, there is already significant conditional special-casing in I am not sure if we should, though. It looks, from the testing linked above, to be the longest running job. It might be a good idea to let it have a head start, even at the expense of slowing down required jobs' progression through the queue a bit more often.
I think that counting failures is valuable and a reasonable place to start even if ultimately we do check for specific failures. However, I don't know that it's unlikely that one would start working while another stops. One of the tests is a performance test, and it only sometimes fails. It usually, but does not always, fail locally, and it seems usually, but not always, to pass on CI. This could easily coincide with a new failure. It did actually coincide with a local failure--vaguely referenced at the end of the d74e919 (#1652) commit message and I hope to open an issue for it sometime soon--and I had initially missed it by looking at counts, but fortunately found it because I was also comparing to the list of failures here using this diff tool. Still, having something start failing when more tests than we are accustomed to ever failing start failing seems like a significant gain, whether or not it ends up being subsequently made more exacting.
Wow, I really like this PowerShell approach. It could hardly be simpler, and portable or not, it only has to work on Windows as far as I can tell.
I see, so these jobs would have to run outside of the matrix, and then it's possible to depend on the one running on Windows. That seems (at least conceptually) simple.
Admittedly I am not thinking about queueing and quotas at all when thinking about this job. To me it's mostly about not having it fail if a test fails, as that would unnecessarily increase the number of emails I get. In my experience, I have only rarely run into quotas, and when it happened it was easy enough to cancel older jobs. But that's just me and there are other workflows. Latency isn't a huge concern, so the job could also run as an occasional cron job, so that it's never more than, say, a day until one gets alerted about breakage. But I don't know how cron jobs can be triggered in PRs and only in PRs.
Oh, I wasn't aware that there is flakiness in one of these tests :/ - thus far I was quite proud that CI is incredibly reliable right now. Maybe having that new job and the possible flakiness coming with it will be a motivation, also for me, to do something about it. In any case, exciting research you have been doing, thank you!
Do you get separate emails for different failing jobs from the same workflow run? When I watch a repository, I get notifications and associated emails for failing CI workflows, but only one notification/email for each workflow that fails, each time the workflow is run and fails. I don't get separate emails for different jobs in the same workflow run. If your experience is the same, then just keeping all the jobs in the If you get separate emails for separate failing jobs in the same workflow run (but not for jobs that are canceled) then their dependencies could affect this, but I think I would need to better understand how that works in order to figure out what arrangement would produce the smallest number of low-value notifications.
This eliminates the piping, tricky parsing, and special-casing to preserve colorization, and runs the tests and the XML parsing in the default `pwsh` shell (since this is a Windows job), using PowerShell facilities to parse the XML. This also checks that there are no *errors*, in addition to (still) checking that there are no more *failures* than expected.
In the preceding commit, five additional tests, not currently noted in GitoxideLabs#1358, failed:
    FAIL [ 0.010s] gix-credentials::credentials program::from_custom_definition::empty
    FAIL [ 0.008s] gix-credentials::credentials program::from_custom_definition::name
    FAIL [ 0.010s] gix-credentials::credentials program::from_custom_definition::name_with_args
    FAIL [ 0.009s] gix-credentials::credentials program::from_custom_definition::name_with_special_args
    FAIL [ 0.014s] gix-discover::discover upwards::from_dir_with_dot_dot
In addition, one test noted in GitoxideLabs#1358 does not always fail on CI, because it is a performance test and the CI runner is fast enough so that it usually passes:
    FAIL [ 181.270s] gix-ref-tests::refs packed::iter::performance
Let's see if running the tests more similarly to the way they are run on Windows without `GIX_TEST_IGNORE_ARCHIVES`, i.e. without piping and with the Windows default of `pwsh` as the shell, affects any of those new/CI-specific failures.
This should let the `test-fixtures-windows` job succeed. As suspected (but not, as of now, explained), the five additional failures when running the tests in `bash` and piping the output went away with `pwsh` and no pipe. This brings the number of failures down to 14, with the possibility of 15 failing tests if the performance test fails (see discussion in GitoxideLabs#1358). However, while it fails when I run it locally, that performance test seems rarely if ever to fail on CI (anymore?). So let's try setting the maximum number of failing tests, above which the job will report failure, to 14 rather than 15.
So that they can be investigated later, the tests that failed with `bash` and `|&` piping of stdout and stderr, when run on Windows on CI with `GIX_TEST_IGNORE_ARCHIVES=1` -- which are listed in the previous commit where the failures were corrected -- have the following as their most significant reported details:
    --- STDERR: gix-credentials::credentials program::from_custom_definition::empty ---
    thread 'program::from_custom_definition::empty' panicked at gix-credentials\tests\program\from_custom_definition.rs:16:5:
    assertion `left == right` failed: not useful, but allowed, would have to be caught elsewhere
    left: "\"sh\" \"-c\" \"C:\\\\Program Files\\\\Git\\\\mingw64\\\\bin\\\\git.exe credential- \\\"$@\\\"\" \"--\" \"store\""
    right: "\"C:\\Program Files\\Git\\mingw64\\bin\\git.exe\" \"credential-\" \"store\""
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    --- STDERR: gix-credentials::credentials program::from_custom_definition::name ---
    thread 'program::from_custom_definition::name' panicked at gix-credentials\tests\program\from_custom_definition.rs:64:5:
    assertion `left == right` failed: we detect that this can run without shell, which is also more portable on windows
    left: "\"sh\" \"-c\" \"C:\\\\Program Files\\\\Git\\\\mingw64\\\\bin\\\\git.exe credential-name \\\"$@\\\"\" \"--\" \"store\""
    right: "\"C:\\Program Files\\Git\\mingw64\\bin\\git.exe\" \"credential-name\" \"store\""
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    --- STDERR: gix-credentials::credentials program::from_custom_definition::name_with_args ---
    thread 'program::from_custom_definition::name_with_args' panicked at gix-credentials\tests\program\from_custom_definition.rs:40:5:
    assertion `left == right` failed
    left: "\"sh\" \"-c\" \"C:\\\\Program Files\\\\Git\\\\mingw64\\\\bin\\\\git.exe credential-name --arg --bar=\\\"a b\\\" \\\"$@\\\"\" \"--\" \"store\""
    right: "\"C:\\Program Files\\Git\\mingw64\\bin\\git.exe\" \"credential-name\" \"--arg\" \"--bar=a b\" \"store\""
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    --- STDERR: gix-credentials::credentials program::from_custom_definition::name_with_special_args ---
    thread 'program::from_custom_definition::name_with_special_args' panicked at gix-credentials\tests\program\from_custom_definition.rs:52:5:
    assertion `left == right` failed
    left: "\"sh\" \"-c\" \"C:\\\\Program Files\\\\Git\\\\mingw64\\\\bin\\\\git.exe credential-name --arg --bar=~/folder/in/home \\\"$@\\\"\" \"--\" \"store\""
    right: "\"sh\" \"-c\" \"C:\\Program Files\\Git\\mingw64\\bin\\git.exe credential-name --arg --bar=~/folder/in/home \\\"$@\\\"\" \"--\" \"store\""
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    --- STDERR: gix-discover::discover upwards::from_dir_with_dot_dot ---
    thread 'upwards::from_dir_with_dot_dot' panicked at gix-discover\tests\discover\upwards\mod.rs:155:5:
    assertion `left == right` failed
    left: Reduced
    right: Full
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
To be clear, those failures do *not* occur in the previous commit and are not expected to occur in this one. Those prior failures consist of four `gix-credentials` tests and one `gix-discover` test.
(The `gix-discover` failure resembles a problem observed locally on a Windows Server 2022 system as an administrator with UAC disabled, reported in GitoxideLabs#1429, and suggests that the conditions of that failure may differ from, or be more complex than, what I had described there.)
I've opened #1657 to run Windows tests on CI with |
Yes, you are right, it's actually only one per workflow (file), so that's a non-issue.
This modifies the `test-fixtures-windows` job that tests on Windows with `GIX_TEST_IGNORE_ARCHIVES=1` so that, instead of checking that no more than 14 failures occur, it checks that the failing tests are exactly those that are documented in GitoxideLabs#1358 as expected to fail.
The initial check that no tests have *error* status is preserved, with only stylistic changes, and kept separate from the subsequent logic so that the output is clearer.
The new steps are no longer conditional on `nextest` having exited with a failure status, since (a) that was probably unnecessary before and definitely unnecessary now, (b) at least for now, the comparison is precise, so it would be strange to pass if the diff were to have changes on *all* lines, and (c) this makes it slightly less likely that GitoxideLabs#1358 will accidentally stay open even once fixed.
The current approach is to actually retrieve the list of tests expected to fail on Windows with `GIX_TEST_IGNORE_ARCHIVES=1` from the GitoxideLabs#1358 issue body. This has the advantage that it automatically keeps up to date with changes made to that issue description, but this is of course not the only possible approach for populating the expected value.
Two changes should be made before this is ready:
- As noted in the "FIXME" comment, the job should currently fail because the performance test reported to fail in GitoxideLabs#1358 is not being filtered out from the expected failures list. It's left in as of this commit, to verify that the job is capable of failing. (After that, the performance test should either be filtered out or removed from the list in GitoxideLabs#1358, but the former approach is currently preferable because I have not done diverse enough testing to check if the failure on my main Windows system is due to that system being too slow rather than a performance bug.)
- The scratchwork file should be removed once no longer needed.
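The exact-match comparison described above can be sketched as follows. This is an illustration under stated assumptions rather than the job's actual implementation: it assumes that each `<testsuite>` is named after the test binary, that each `<testcase>` is named after the test, and that a failed case contains a `<failure>` child element. The `example-crate` suite and its test names are hypothetical, and the expected set here is a two-entry stand-in for the list in the issue.

```python
import xml.etree.ElementTree as ET

def failed_from_junit(junit_xml: str) -> set[str]:
    """Collect '<suite name> <case name>' for each test case with a <failure> child."""
    root = ET.fromstring(junit_xml)
    failed = set()
    for suite in root.iter("testsuite"):
        for case in suite.iter("testcase"):
            if case.find("failure") is not None:
                failed.add(f"{suite.get('name')} {case.get('name')}")
    return failed

def compare(expected: set[str], actual: set[str]):
    """Return (unexpected failures, expected failures that did not occur), sorted."""
    return sorted(actual - expected), sorted(expected - actual)

# Invented sample document; the example-crate entries are hypothetical.
sample = """\
<testsuites name="nextest-run" tests="3" failures="2" errors="0">
  <testsuite name="gix-discover::discover" tests="1" failures="1" errors="0">
    <testcase name="upwards::from_dir_with_dot_dot">
      <failure type="test failure"/>
    </testcase>
  </testsuite>
  <testsuite name="example-crate::tests" tests="2" failures="1" errors="0">
    <testcase name="hypothetical::passing_test"/>
    <testcase name="hypothetical::new_failure">
      <failure type="test failure"/>
    </testcase>
  </testsuite>
</testsuites>
"""
expected = {
    "gix-discover::discover upwards::from_dir_with_dot_dot",
    "gix-ref-tests::refs packed::iter::performance",
}
unexpected, missing = compare(expected, failed_from_junit(sample))
print("unexpected failures:", unexpected)
print("expected but not failing:", missing)
```

Reporting both directions of the difference covers the case a bare count misses, where one known failure disappears in the same run that a new one appears.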
This omits tests containing `performance` (and not as part of a larger "word", not even with `_`) from being expected to fail on CI with `GIX_TEST_IGNORE_ARCHIVES=1` on Windows. Currently there is one such test listed in GitoxideLabs#1358, `gix-ref-tests::refs packed::iter::performance`.
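The whole-word rule can be expressed with a regular-expression word boundary, since `\b` treats `_` as a word character. The sketch below is one possible way to write such a filter and is not necessarily how the job does it; the names other than `gix-ref-tests::refs packed::iter::performance` are hypothetical.

```python
import re

# `\b` does not match between a letter and `_`, so `performance_x` and
# `nonperformance` are not treated as the standalone word.
PERFORMANCE_WORD = re.compile(r"\bperformance\b")

def without_performance_tests(names: list[str]) -> list[str]:
    """Drop test names that contain `performance` as a standalone word."""
    return [name for name in names if not PERFORMANCE_WORD.search(name)]

names = [
    "gix-ref-tests::refs packed::iter::performance",
    "example-crate::tests hypothetical::performance_counters",  # hypothetical
    "example-crate::tests hypothetical::nonperformance",        # hypothetical
]
print(without_performance_tests(names))
```

Only the first name is removed, because in the other two `performance` is part of a larger identifier.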
Instead of retrieving them from GitoxideLabs#1358. (See discussion in GitoxideLabs#1663.)
Current behavior 😯
Running tests on Windows in a Git Bash environment (similar to the environment in which they run in Windows on CI, where `bash` is Git Bash), all tests are able to pass normally. However, this apparently relies on the use of generated archives.
When the tests are run with the environment variable `GIX_TEST_IGNORE_ARCHIVES` set to `1` rather than unset, 15 tests fail. The failing tests, as quoted from the end of the test run, are:
The full output is available in this gist, showing a test run at c2753b8, which has the fixes in #1444.
Before those fixes, one other test had been reported as failing here. See #1358 (comment) and the old gist if interested. The discussion in #1345 is still relevant, though it links to this even older gist.
As noted in comments in #1345, the failure in `compare_baseline_with_ours` seems particularly interesting, since it involves an unexpected effect of `.gitignore` pattern matching that is different on Windows.
Expected behavior 🤔
All tests should pass, even when suppressing the use of generated archives by setting `GIX_TEST_IGNORE_ARCHIVES=1`. Differences between Windows and other platforms should be accounted for when intentional and desirable, or fixed otherwise.
Git behavior
Not fully applicable, since this is about a failure of multiple gitoxide tests when run in a certain way.
However, some discrepancies--intended or unintended--between gitoxide and Git may turn out to be related to some of the failures. So this section may be expanded in the future, or perhaps new issues will be split out from this one.
Steps to reproduce 🕹
I ran the tests on Windows 10.0.19045 (x64) with developer mode enabled so that symlink creation is permitted even without UAC elevation, with `git version 2.45.2.windows.1`.
I used the current tip of the main branch, which at this time is c2753b8. When I opened this issue originally, that did not exist, but all experiments described here have been performed again, and the reported results have been updated. I used the latest stable Rust toolchain and `cargo-nextest`, though this does not seem to affect the results.
Local development environments and CI sometimes differ in relevant ways, so I also verified that the tests all pass when `GIX_TEST_IGNORE_ARCHIVES` is not set. This may be considered an optional step, but is beneficial because it checks that all needed dependencies are installed and working, and that failures really can be attributed to the effect of `GIX_TEST_IGNORE_ARCHIVES`. To run the tests that way, I ran this in Git Bash:
Then I did a full clean:
Then, also in Git Bash, I ran this command, which produced the test output and failures described above:
The reason the tests must be run in Git Bash is that there is a separate issue where many test failures occur when they are run from a typical PowerShell environment (#1359).
I also ran the tests, with and without `GIX_TEST_IGNORE_ARCHIVES=1`, on Ubuntu 22.04 LTS (x64) and macOS 14.5 (M1). As expected, all tests passed on those systems, confirming that the failures are Windows-specific.