Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

15 tests fail on Windows with GIX_TEST_IGNORE_ARCHIVES=1 #1358

Open
EliahKagan opened this issue May 7, 2024 · 15 comments
Open

15 tests fail on Windows with GIX_TEST_IGNORE_ARCHIVES=1 #1358

EliahKagan opened this issue May 7, 2024 · 15 comments
Labels
acknowledged an issue is accepted as shortcoming to be fixed C-Windows Windows-specific issues help wanted Extra attention is needed

Comments

@EliahKagan
Copy link
Member

EliahKagan commented May 7, 2024

Current behavior 😯

Running tests on Windows in a Git Bash environment (similar to the environment in which they run in Windows on CI, where bash is Git Bash), all tests are able to pass normally. However, this apparently relies on the use of generated archives.

When the tests are run with the environment variable GIX_TEST_IGNORE_ARCHIVES set to 1 rather than unset, 15 tests fail. The failing tests, as quoted from the end of the test run, are:

     Summary [ 903.419s] 2363 tests run: 2348 passed (17 slow, 1 leaky), 15 failed, 9 skipped
        FAIL [   7.416s] gix-glob::glob pattern::matching::compare_baseline_with_ours
        FAIL [   3.324s] gix-pathspec::pathspec parse::baseline
        FAIL [   0.015s] gix-pathspec::pathspec parse::valid::glob_negations_are_always_literal
        FAIL [   0.018s] gix-pathspec::pathspec parse::valid::whitespace_in_pathspec
        FAIL [   1.749s] gix-pathspec::pathspec search::files
        FAIL [   3.855s] gix-pathspec::pathspec search::prefixes_are_always_case_sensitive
        FAIL [ 181.270s] gix-ref-tests::refs packed::iter::performance
        FAIL [   4.803s] gix-submodule::submodule file::baseline::common_values_and_names_by_path
        FAIL [  30.106s] gix-submodule::submodule file::is_active_platform::pathspecs_matter_even_if_they_do_not_match
        FAIL [  35.376s] gix-submodule::submodule file::is_active_platform::submodules_with_active_config_are_considered_active_or_inactive
        FAIL [  22.003s] gix-submodule::submodule file::is_active_platform::submodules_with_active_config_override_pathspecs
        FAIL [  15.882s] gix-submodule::submodule file::is_active_platform::without_any_additional_settings_all_are_inactive_if_they_have_a_url
        FAIL [   9.165s] gix-submodule::submodule file::is_active_platform::without_submodule_in_index
        FAIL [   0.193s] gix::gix revision::spec::from_bytes::regex::find_youngest_matching_commit::regex_matches
        FAIL [   0.086s] gix::gix revision::spec::from_bytes::regex::with_known_revision::contained_string_matches_in_unanchored_regex_and_disambiguates_automatically
error: test run failed

The full output is available in this gist, showing a test run at c2753b8, which has the fixes in #1444.

Before those fixes, one other test had been reported as failing here. See #1358 (comment) and the old gist if interested. The discussion in #1345 is still relevant, though it links to this even older gist.

As noted in comments in #1345, the failure in compare_baseline_with_ours seems particularly interesting, since it involves an unexpected effect of .gitignore pattern matching that is different on Windows.

Expected behavior 🤔

All tests should pass, even when suppressing the use of generated archives by setting GIX_TEST_IGNORE_ARCHIVES=1. Differences between Windows and other platforms should be accounted for when intentional and desirable, or fixed otherwise.

Git behavior

Not fully applicable, since this is about a failure of multiple gitoxide tests when run in a certain way.

However, some discrepancies--intended or unintended--between gitoxide and Git may turn out to be related to some of the failures. So this section may be expanded in the future, or perhaps new issues will be split out from this one.

Steps to reproduce 🕹

I ran the tests on Windows 10.0.19045 (x64) with developer mode enabled so that symlink creation is permitted even without UAC elevation, with git version 2.45.2.windows.1.

I used the current tip of the main branch, which at this time is c2753b8. When I opened this issue originally, that did not exist, but all experiments described here have been performed again, and the reported results have been updated. I used the latest stable Rust toolchain and cargo-nextest, though this does not seem to affect the results.

Local development environments and CI sometimes differ in relevant ways, so I also verified that the tests all pass when GIX_TEST_IGNORE_ARCHIVES is not set. This may be considered an optional step, but is beneficial because it checks that all needed dependencies are installed and working, and that failures really can be attributed to the effect of GIX_TEST_IGNORE_ARCHIVES. To run the tests that way, I run this in Git Bash:

cargo nextest run --all --no-fail-fast

Then I did a full clean:

gix clean -xde

Then, also in Git Bash, I ran this command, which produced the test output and failures described above:

GIX_TEST_IGNORE_ARCHIVES=1 cargo nextest run --all --no-fail-fast

The reason the tests must be run in Git Bash is that there is a separate issue where many test failures occur when they are run from a typical PowerShell environment (#1359).

I also ran the tests, with and without GIX_TEST_IGNORE_ARCHIVES=1, on Ubuntu 22.04 LTS (x64) and macOS 14.5 (M1). As expected, all tests passed on those systems, confirming that the failures are Windows-specific.

@EliahKagan EliahKagan changed the title 14 tests fail on Windows with GIX_TEST_IGNORE_ARCHIVES=1 16 tests fail on Windows with GIX_TEST_IGNORE_ARCHIVES=1 May 7, 2024
@Byron Byron added help wanted Extra attention is needed acknowledged an issue is accepted as shortcoming to be fixed C-Windows Windows-specific issues labels May 7, 2024
@EliahKagan
Copy link
Member Author

EliahKagan commented Jun 23, 2024

While a number of changes have been made to the test suite, this issue holds up without modification: the same tests fail on Windows as before (and as shown above) when GIT_TEST_IGNORE_ARCHIVES=1, at c1d7c02.

The gix-ref-tests::refs packed::iter::performance test may depend on the speed of the test machine, but that was the case before. On my machine, it continues to take about the same amount of time (and fail for taking too long).

@EliahKagan
Copy link
Member Author

EliahKagan commented Jul 5, 2024

In view of the insight in #1442 that Git Bash MSYS2 behavior of ln -s creating copies rather than symlinks is behind the new failure (not listed above) of

gix-diff-tests::diff tree::changes::to_obtain_tree::many_different_states

when GIX_TEST_IGNORE_ARCHIVES=1, I was curious if causing it to actually create symlinks by setting MSYS=winsymlinks:nativestrict would change of the failing tests noted here to pass.

It causes one of the tests whose failure is noted here to pass:

gix::gix object::tree::diff::track_rewrites::copies_in_entire_tree_by_similarity_with_limit

The other failures here are unaffected.

It also causes 9 tests that had otherwise passed to fail, both with and without GIX_TEST_IGNORE_ARCHIVES=1. I've opened #1443 for that.

@EliahKagan
Copy link
Member Author

EliahKagan commented Jul 6, 2024

As expected per #1358 (comment), the fixes for #1443 in #1444 have also made this test, originally reported as failing here, pass:

gix::gix object::tree::diff::track_rewrites::copies_in_entire_tree_by_similarity_with_limit

I have edited the description here accordingly, updating the title and summary list of failing tests, linking to a new gist with the current output of a full run, and also revising for clarity and adding some missing details.

@EliahKagan EliahKagan changed the title 16 tests fail on Windows with GIX_TEST_IGNORE_ARCHIVES=1 15 tests fail on Windows with GIX_TEST_IGNORE_ARCHIVES=1 Jul 6, 2024
@EliahKagan
Copy link
Member Author

EliahKagan commented Sep 11, 2024

Here are updated runs. The failing tests are the same. This is just to clarify that, since I'm working on some changes in gix-testtools (in #1591 and elsewhere) and ran this on the current tip of main to compare against.

The reason for doing two runs within a short time of each other and to report both in the gist is to check that the number of tests reported as leaky varies across runs rather than having changed due to recent code changes (over time or across branches).


2024-10-22 update: The situation is unchanged at db5c9cf (gitoxide v0.38.0).

EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue Sep 24, 2024
This reverts commit dd94f57.

One of the changes to `.gitattributes`, possibly that change, has
caused the currently committed generated archives not to be able to
be used, producing Windows `test-fast` failures similar to GitoxideLabs#1358.
See comments in GitoxideLabs#1607 for details.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue Sep 24, 2024
This reverts commit dd94f57.

That change to `.gitattributes`, though intended as a refactoring,
caused the currently committed generated archives not to be able to
be used, producing Windows `test-fast` failures similar to GitoxideLabs#1358.

See comments in GitoxideLabs#1607 for details.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue Sep 24, 2024
This reverts commit a75e970.

That change to `.gitattributes`, though intended as a refactoring,
caused the currently committed generated archives not to be able to
be used, producing Windows `test-fast` failures similar to GitoxideLabs#1358.

See comments in GitoxideLabs#1607 for details.
@EliahKagan
Copy link
Member Author

A new test from #1618, gix-merge::merge tree::run_baseline, also fails on Windows with GIX_TEST_IGNORE_ARCHIVES=1. But I have not added it to the list here, because #1652 includes a fix for it in the changes it proposes.

@Byron
Copy link
Member

Byron commented Nov 5, 2024

Thank you! Do you think it makes sense (and is in the time budget) to run the tests on Windows as well with GIX_TEST_IGNORE_ARCHIVES=1? If so, I think it makes sense to do it.
Answering my own question, it's clearly not in the cards as the fast test takes 3 to 4 minutes on MacOS and Linux but will take 14-15 minutes on Windows, while the full test takes 13 minutes on Linux.

However, what if the new internal tool would be used to detect changes compared to the base of a PR, extract the involved crates, and then would run the tests only for the crates in question with the ignore-archives flag enabled? Probably this can even be done in scripting entirely without having to compile internal at all.

What do you think?

@EliahKagan
Copy link
Member Author

However, what if the new internal tool would be used to detect changes compared to the base of a PR, extract the involved crates, and then would run the tests only for the crates in question with the ignore-archives flag enabled? Probably this can even be done in scripting entirely without having to compile internal at all.

It seems to me that, depending on how narrowly or broadly it detects crates as involved, this might either miss breakages that it is intended to catch, or implicate too many crates to confer an adequate speed improvement. My thinking is that changes to one crate can affect any crates that depend on it directly or indirectly, and the effects may only be seen in tests of functionality in the dependent crate.

@Byron
Copy link
Member

Byron commented Nov 5, 2024

You are right, my idea was probably too naive to catch everything in practice.
However, what about running the full test suite just like on linux as additional job, which just like the fuzzing isn't linked to the auto-merge feature. This means that failures will trigger an email notification so despite the changes merged, it will become clear sooner that something broke. Depending on the longevity of a PR, it will be more common that these failures will prevent a merge for the right reasons, than not preventing it.

It's an improvement with well-known drawbacks. If you think the same, I think a PR would be quickly done for this one and I wouldn't steal that opportunity :).

@EliahKagan
Copy link
Member Author

EliahKagan commented Nov 5, 2024

You are right, my idea was probably too naive to catch everything in practice.

The initial idea in #1358 (comment) may still be worth doing, even though it would not catch everything. Maybe it could be added later, atop the subsequent simpler idea of #1358 (comment), as a required check (or, required or not on CI, as a tool that could be run locally, such as when targeting platforms not covered by CI). But I do not plan to implement it immediately.

However, what about running the full test suite just like on linux as additional job, which just like the fuzzing isn't linked to the auto-merge feature.

I agree with doing this.

Please note that this will probably still slow down required checks sometimes, by causing them to be queued longer. As far as I know, runners for required and non-required checks always share the same quota and there is no prioritization of required checks.

Although this check could be made dependent on other checks, that would probably be undesirable because it would make things take longer when load is low (by running it and the required checks in series instead of parallel), and because it would not address how checks from earlier events (e.g. earlier pushes to a pull request branch) would still use up quota (by which I just mean rate limiting). It's possible to cause runs on earlier commits to be canceled automatically, but I recommend against that because it would make statuses hard to understand and because comparing output for different commits can be valuable.

If new non-required checks turn out to slow down required checks too much, then they could be scaled back or removed. My guess is that it will be okay. To speed things up, I think that:

  • This should be like test-fast but with GIX_TEST_IGNORE_ARCHIVES=1, and not include the other elements of the full test job.
  • When GitHub Actions hosted ARM (AArch64/ARM64) runners are available (in the near future, I think) for Windows in the free tier, the Windows job with GIX_TEST_IGNORE_ARCHIVES=1 could run on whichever of ARM64 and x86-64 is faster, unless a reason arises to prefer one of them for another reason. (The x86 runner might be faster if Git for Windows does not yet have native ARM64 builds.)

If you think the same, I think a PR would be quickly done for this one

I don't anticipate that this would be quick to implement, though it may just be that I haven't thought of the right approach. I think these checks should report success when no unexpected failures happen; otherwise, they would be constantly failing.

(It may also be that they should fail when a test that is expected to fail passes, so that it can be reviewed and removed from being marked or listed as being expected to fail.)

Maybe this is easy to do, but I don't know how to do it with cargo nextest. If possible, I think the new CI check(s) should really run all the tests, even those that are expected to fail, and report their failures, showing details. As far as I know (and can find in the documentation), cargo nextest doesn't support xfail.

The only general approach I know of to do this is to parse the output of cargo nextest. This could be done on the human-readable output, but cargo nextest supports saving XML, and enabling this does not suppress what is usually displayed. It's fairly unobtrusive, being written into a subdirectory of target/, so it may be that it could just be enabled by default, but if not then a separate nextest profile can be configured.

Assuming this is done, the configuration to output XML should probably be placed in Cargo.toml and not with a nextest.toml (unless the configuration is being added on the fly on CI, rather than being committed). However, for testing I made a .config/nextest.toml file:

[profile.with-xml.junit]
path = "junit.xml"

Then I ran:

TZ=UTC GIX_TEST_IGNORE_ARCHIVES=1 cargo nextest --profile=with-xml run --workspace --no-fail-fast

Even though not currently intended for any of the GNU/Linux runs, I did this experiment on Arch Linux rather than Windows for speed. The results are in this gist, showing both the displayed human-readable text output and the saved XML output.

The most important thing to figure out is if I am missing something such that some altogether different approach is preferable. (Thus, I am replying here with what I have found so far, rather than waiting until a PR is ready.)

The next most important thing to figure out is how to elegantly parse the output. I'm not very familiar with the JUnit XML output format, but my understanding is that has a lot of tooling support, so there may even be purpose-built tools for this that would be preferable to using a general-purpose XML parsing tool.

@Byron
Copy link
Member

Byron commented Nov 5, 2024

Thanks a lot for the detailed response!

  • This should be like test-fast but with GIX_TEST_IGNORE_ARCHIVES=1, and not include the other elements of the full test job.

That's a great idea!

I don't anticipate that this would be quick to implement, though it may just be that I haven't thought of the right approach. I think these checks should report success when no unexpected failures happen; otherwise, they would be constantly failing.

It's incredible how I could miss this, given the title of the issue is 15 tests fail on Windows with GIX_TEST_IGNORE_ARCHIVES=1 and I am very sorry to have wasted your time :/. I truly thought doing this was immediately possible.

But on the bright side, thanks to that you may have looked into it earlier and to my mind found a very viable approach. It seems that even the most simple of all ways of parsing, a grep for <failure would already do the trick in seeing if there are failures. I really like the idea of making this job dependent on the normal test-fast as then it would only run if test-fast succeeds, hence no new 'normal' issues were found. And if one would make that parent-job the MacOS version of the test, it could be done in 3-4 minutes, adding minimal latency only.

Sure, just doing a grep and counting lines isn't bulletproof, but I think it's solid enough to make use of it, probably catching all the cases that it could catch in its lifetime - after all this is quite a rare occurrence to begin with and it's unlikely that somehow one of the 15 tests is fixed while a new one appears, hence masking the anomaly.

Even though I started my answer somewhat disappointed I definitely finish it with excitement :). Thank you!

EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue Nov 6, 2024
Some of these fail. We report the step as failing if any fail,
which currently at least 14 and usually 15 are expected to (one of
them is a performance test). If more than 15 fail, we'll fail this
job, which will fail the workflow overall, but the job is still
deliberately not treated as a required check for PR auto-merge.

Right now, the expected number of failures is delibreately set too
low, to 13, which is unlikely to be satisfied. This is just to test
the new job. After verifying that it can trigger job failure when
an excessive number of test case failures occurs, and fixing any
readily apparent bugs in the job definition, this value will be
raised, probably to 15.

See GitoxideLabs#1358 for information on the known-failing tests on Windows
with `GIX_TEST_IGNORE_ARCHIVES=1`.
@EliahKagan
Copy link
Member Author

Since XML output offers counts of each status across the entire run, including failure and error status, I think it may actually be easier to parse that precisely out of the XML. cargo nextest doesn't seem to support writing a copy of its human-readable output to a log file. This can be achieved with tools that may not be present on Windows runners, like script, or by piping to tee. In either case, for the output to be fully readable, it should be displayed colorized when run, which complicates parsing if it is to be done precisely (ANSI color sequences ending in m have unintuitive effects on where word boundaries are, for example), and complicates debugging if parsing is to be done imprecisely.

If we do end up parsing that out and counting, then I think we can parse the count of failed tests in the summary rather than counting the number of lines that fail, and this can be matched in such a way that no failure reports can be misinterpreted as it, if we are willing to match across multiple lines. But I think that this approach, even if corrected and simplified, and even if scaled back by allowing some ambiguity, will still be more complicated than parsing XML, which (based on local testing) seems like it can be done in a couple of lines of PowerShell. For example, if we only had to count failures (and not errors) then this, which I have verified works locally, would be sufficient, and adding errors should be no hurdle:

[xml]$junit = Get-Content -Path 'target/nextest/with-xml/junit.xml'
[int]$junit.testsuites.failures

The OS of most interest for this job--and possibly the only one it will use--is Windows, which even defaults shell to pwsh on GHA runners, but I believe that shell is also available on the non-Windows GitHub-hosted runners and can be specified as the value of the shell key.

Although I had originally intended the changes in #1654 to be the first part of a PR to do all this, only those changes, conceptually unrelated to this, are ready. So I've opened that sooner.

I really like the idea of making this job dependent on the normal test-fast as then it would only run if test-fast succeeds, hence no new 'normal' issues were found. And if one would make that parent-job the MacOS version of the test, it could be done in 3-4 minutes, adding minimal latency only.

I don't actually know how to depend on a specific job generated by a matrix, rather than the whole group, and I'm not sure there is a way. When I was working on the publishing workflow, I tried to do this for the jobs that created macOS universal binaries out of the x86-64 and AArch64 (ARM64) builds, and I did not find a way. (I did find claims that GitHub Actions doesn't support it, but I don't have those on hand, and I don't know if they were accurate or still accurate.) The "obvious" way to do it would be to form a job id from a matrix value, then depend on that id. I recall testing this and finding that an id could not be formed that way.

Of course, there is already significant conditional special-casing in test-fast. If that were split up into three separately defined jobs, then they could easily be depended on individually. However, then they would not automatically be canceled based on each other's failures. Cancellation could presumably be implemented but it might be cumbersome. Still, just splitting them up might be worthwhile, assuming we really do want to keep the new job that runs tests on Windows with GIX_TEST_IGNORE_ARCHIVES=1 from starting early.

I am not sure if we should, though. It looks, from the testing linked above, to be the longest running job. It might be a good idea to let it have a head start, even at the expense of slowing down required jobs' progression through the queue a bit more often.

after all this is quite a rare occurrence to begin with and it's unlikely that somehow one of the 15 tests is fixed while a new one appears, hence masking the anomaly.

I think that counting failures is valuable and a reasonable place to start even if ultimately we do check for specific failures.

However, I don't know that it's unlikely that one would start working while another stops. One of the tests is a performance test, and it only sometimes fails. It usually, but does not always, fail locally, and it seems usually, but not always, to pass on CI. This could easily coincide with a new failure. It did actually coincide with a local failure--vaguely referenced at the end of the d74e919 (#1652) commit message and I hope to open an issue for it sometime soon--and I had initially missed it by looking at counts, but fortunately found it because i was also comparing to the list of failures here using this diff tool.

Still, having something start failing when more tests than we are accustomed to ever failing start failing seems like a significant gain, whether or not it ends up being subsequently made more exacting.

EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue Nov 6, 2024
Some of these fail. We report the step as failing if any fail,
which currently at least 14 and usually 15 are expected to (one of
them is a performance test). If more than 15 fail, we'll fail this
job, which will fail the workflow overall, but the job is still
deliberately not treated as a required check for PR auto-merge.

Right now, the expected number of failures is delibreately set too
low, to 13, which is unlikely to be satisfied. This is just to test
the new job. After verifying that it can trigger job failure when
an excessive number of test case failures occurs, and fixing any
readily apparent bugs in the job definition, this value will be
raised, probably to 15.

See GitoxideLabs#1358 for information on the known-failing tests on Windows
with `GIX_TEST_IGNORE_ARCHIVES=1`.
@Byron
Copy link
Member

Byron commented Nov 6, 2024

Wow, I really like this powershell approach. It could hardly be simpler, and portable or not, it only has to work on Windows as far as I can tell.

Of course, there is already significant conditional special-casing in test-fast. If that were split up into three separately defined jobs, then they could easily be depended on individually. However, then they would not automatically be canceled based on each other's failures. Cancellation could presumably be implemented but it might be cumbersome. Still, just splitting them up might be worthwhile, assuming we really do want to keep the new job that runs tests on Windows with GIX_TEST_IGNORE_ARCHIVES=1 from starting early.

I see, so these jobs would have to run outside of the matrix, and then it's possible to depend on the one running on Windows. That seems at (least conceptually) simple.

I am not sure if we should, though. It looks, from the testing linked above, to be the longest running job. It might be a good idea to let it have a head start, even at the expense of slowing down required jobs' progression through the queue a bit more often.

Admittedly I am not thinking about queueing and quotas at all when thinking about this job. To me it's mostly about not having it fail if a test fails, as it will increase the amount of emails I get unnecessarily. From my experience, I ran into quotas the fewest of times, and if it happened it was easy enough to cancel older jobs. But that's just me and there are other workflows. Latency isn't a huge concern, so the job could also run on a cronjob occasionally, so it's never more than say, a day, until one gets alerted about breakage. But I don't know how cron-jobs can be triggered in PRs and only in PRs.

Still, having something start failing when more tests than we are accustomed to ever failing start failing seems like a significant gain, whether or not it ends up being subsequently made more exacting.

Oh, I wasn't aware that there is flakiness in one of these tests :/ - thus far I was quite proud that CI is incredibly reliable right now. Maybe having that new job and the possible flakiness coming with it will be a motivation, also for me, to do something about it.

In any case, exciting research you have been doing, thank you!

@EliahKagan
Copy link
Member Author

To me it's mostly about not having it fail if a test fails, as it will increase the amount of emails I get unnecessarily.

Do you get separate emails for different failing jobs from the same workflow run? When I watch a repository, I get notifications and associated emails for failing CI workflows, but only one notification/email for each workflow that fails, each time the workflow is run and fails. I don't get separate emails for different jobs in the same workflow run.

If your experience is the same, then just keeping all the jobs in the ci.yml workflow, which should be done anyway due to their close relatedness and having the same event trigger, should be sufficient to avoid getting extra emails. In that case, any way of organizing the jobs and deciding/expressing their dependences should be fine.

If you get separate emails for separate failing jobs in the same workflow run (but not for jobs that are canceled) then their dependencies could affect this, but I think I would need to better understand how that works in order to figure out what arrangement would produce the smallest number of low-value notifications.

EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue Nov 7, 2024
This eliminates the piping, tricky parsing, and special-casing to
preserve colorization, and runs the tests and the XML parsing in
the default `pwsh` shell (since this is a Windows job), using
PowerShell facilities to parse the XML. This also checks that there
are no *errors*, in addition to (still) checking that there are no
more *failures* than expected.

In the preceding commit, five additional tests, not currently noted
in GitoxideLabs#1358, failed:

    FAIL [   0.010s] gix-credentials::credentials program::from_custom_definition::empty
    FAIL [   0.008s] gix-credentials::credentials program::from_custom_definition::name
    FAIL [   0.010s] gix-credentials::credentials program::from_custom_definition::name_with_args
    FAIL [   0.009s] gix-credentials::credentials program::from_custom_definition::name_with_special_args
    FAIL [   0.014s] gix-discover::discover upwards::from_dir_with_dot_dot

In addition, one test noted in GitoxideLabs#1358 does not always fail on CI,
because it is a performance test and the CI runner is fast enough
so that it usually passes:

    FAIL [ 181.270s] gix-ref-tests::refs packed::iter::performance

Let's see if running the tests more similarly to the way they are
run on Windows without `GIX_TEST_IGNORE_ARCHIVES`, i.e. without
piping and with the Windows default of `pwsh` as the shell, affects
any of those new/CI-specific failures.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue Nov 7, 2024
This should let the `test-fixtures-windows` job succeed.

As suspected (but not, as of now, explained), the five additional
failures when running the tests in `bash` and piping the output
went away with `pwsh` and no pipe.

This brings the number of failures down to 14, with the possibility
of 15 failing tests if the performance test fails (see discussion
in GitoxideLabs#1358). However, while it fails when I run it locally, that
performance test seems rarely if ever to fail on CI (anymore?).
So let's try setting the maximum number of failing tests, above
which the job will report failure, to 14 rather than 15.

So that they can be investigated later, the tests that failed with
`bash` and `|&` piping of stdout and stderr, when run on Windows on
CI with `GIX_TEST_IGNORE_ARCHIVES=1` -- which are listed in the
previous commit where the failures were corrected -- have as the
most significant reported details as follows:

    --- STDERR:              gix-credentials::credentials program::from_custom_definition::empty ---
    thread 'program::from_custom_definition::empty' panicked at gix-credentials\tests\program\from_custom_definition.rs:16:5:
    assertion `left == right` failed: not useful, but allowed, would have to be caught elsewhere
      left: "\"sh\" \"-c\" \"C:\\\\Program Files\\\\Git\\\\mingw64\\\\bin\\\\git.exe credential- \\\"$@\\\"\" \"--\" \"store\""
     right: "\"C:\\Program Files\\Git\\mingw64\\bin\\git.exe\" \"credential-\" \"store\""
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

    --- STDERR:              gix-credentials::credentials program::from_custom_definition::name ---
    thread 'program::from_custom_definition::name' panicked at gix-credentials\tests\program\from_custom_definition.rs:64:5:
    assertion `left == right` failed: we detect that this can run without shell, which is also more portable on windows
      left: "\"sh\" \"-c\" \"C:\\\\Program Files\\\\Git\\\\mingw64\\\\bin\\\\git.exe credential-name \\\"$@\\\"\" \"--\" \"store\""
     right: "\"C:\\Program Files\\Git\\mingw64\\bin\\git.exe\" \"credential-name\" \"store\""
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

    --- STDERR:              gix-credentials::credentials program::from_custom_definition::name_with_args ---
    thread 'program::from_custom_definition::name_with_args' panicked at gix-credentials\tests\program\from_custom_definition.rs:40:5:
    assertion `left == right` failed
      left: "\"sh\" \"-c\" \"C:\\\\Program Files\\\\Git\\\\mingw64\\\\bin\\\\git.exe credential-name --arg --bar=\\\"a b\\\" \\\"$@\\\"\" \"--\" \"store\""
     right: "\"C:\\Program Files\\Git\\mingw64\\bin\\git.exe\" \"credential-name\" \"--arg\" \"--bar=a b\" \"store\""
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

    --- STDERR:              gix-credentials::credentials program::from_custom_definition::name_with_special_args ---
    thread 'program::from_custom_definition::name_with_special_args' panicked at gix-credentials\tests\program\from_custom_definition.rs:52:5:
    assertion `left == right` failed
      left: "\"sh\" \"-c\" \"C:\\\\Program Files\\\\Git\\\\mingw64\\\\bin\\\\git.exe credential-name --arg --bar=~/folder/in/home \\\"$@\\\"\" \"--\" \"store\""
     right: "\"sh\" \"-c\" \"C:\\Program Files\\Git\\mingw64\\bin\\git.exe credential-name --arg --bar=~/folder/in/home \\\"$@\\\"\" \"--\" \"store\""
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

    --- STDERR:              gix-discover::discover upwards::from_dir_with_dot_dot ---
    thread 'upwards::from_dir_with_dot_dot' panicked at gix-discover\tests\discover\upwards\mod.rs:155:5:
    assertion `left == right` failed
      left: Reduced
     right: Full
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

To be clear, those failures do *not* occur in the preivous commit
and are not expected to occur in this one. Those prior failures
consist of four `gix-credentials` tests and one `gix-discover`
test. (The `gix-discover` failure resembles a problem observed
locally on a Windows Server 2022 system as an administrator with
UAC disabled, reported in GitoxideLabs#1429, and suggests that the conditions
of that failure may differ from, or be more complex than, what I
had described there.)
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue Nov 7, 2024
Some of these fail. We report the step as failing if any fail,
which currently at least 14 and usually 15 are expected to (one of
them is a performance test). If more than 15 fail, we'll fail this
job, which will fail the workflow overall, but the job is still
deliberately not treated as a required check for PR auto-merge.

Right now, the expected number of failures is delibreately set too
low, to 13, which is unlikely to be satisfied. This is just to test
the new job. After verifying that it can trigger job failure when
an excessive number of test case failures occurs, and fixing any
readily apparent bugs in the job definition, this value will be
raised, probably to 15.

See GitoxideLabs#1358 for information on the known-failing tests on Windows
with `GIX_TEST_IGNORE_ARCHIVES=1`.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue Nov 7, 2024
This eliminates the piping, tricky parsing, and special-casing to
preserve colorization, and runs the tests and the XML parsing in
the default `pwsh` shell (since this is a Windows job), using
PowerShell facilities to parse the XML. This also checks that there
are no *errors*, in addition to (still) checking that there are no
more *failures* than expected.

In the preceding commit, five additional tests, not currently noted
in GitoxideLabs#1358, failed:

    FAIL [   0.010s] gix-credentials::credentials program::from_custom_definition::empty
    FAIL [   0.008s] gix-credentials::credentials program::from_custom_definition::name
    FAIL [   0.010s] gix-credentials::credentials program::from_custom_definition::name_with_args
    FAIL [   0.009s] gix-credentials::credentials program::from_custom_definition::name_with_special_args
    FAIL [   0.014s] gix-discover::discover upwards::from_dir_with_dot_dot

In addition, one test noted in GitoxideLabs#1358 does not always fail on CI,
because it is a performance test and the CI runner is fast enough
so that it usually passes:

    FAIL [ 181.270s] gix-ref-tests::refs packed::iter::performance

Let's see if running the tests more similarly to the way they are
run on Windows without `GIX_TEST_IGNORE_ARCHIVES`, i.e. without
piping and with the Windows default of `pwsh` as the shell, affects
any of those new/CI-specific failures.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue Nov 7, 2024
This should let the `test-fixtures-windows` job succeed.

As suspected (but not, as of now, explained), the five additional
failures when running the tests in `bash` and piping the output
went away with `pwsh` and no pipe.

This brings the number of failures down to 14, with the possibility
of 15 failing tests if the performance test fails (see discussion
in GitoxideLabs#1358). However, while it fails when I run it locally, that
performance test seems rarely if ever to fail on CI (anymore?).
So let's try setting the maximum number of failing tests, above
which the job will report failure, to 14 rather than 15.

So that they can be investigated later, the tests that failed with
`bash` and `|&` piping of stdout and stderr, when run on Windows on
CI with `GIX_TEST_IGNORE_ARCHIVES=1` -- which are listed in the
previous commit where the failures were corrected -- have as the
most significant reported details as follows:

    --- STDERR:              gix-credentials::credentials program::from_custom_definition::empty ---
    thread 'program::from_custom_definition::empty' panicked at gix-credentials\tests\program\from_custom_definition.rs:16:5:
    assertion `left == right` failed: not useful, but allowed, would have to be caught elsewhere
      left: "\"sh\" \"-c\" \"C:\\\\Program Files\\\\Git\\\\mingw64\\\\bin\\\\git.exe credential- \\\"$@\\\"\" \"--\" \"store\""
     right: "\"C:\\Program Files\\Git\\mingw64\\bin\\git.exe\" \"credential-\" \"store\""
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

    --- STDERR:              gix-credentials::credentials program::from_custom_definition::name ---
    thread 'program::from_custom_definition::name' panicked at gix-credentials\tests\program\from_custom_definition.rs:64:5:
    assertion `left == right` failed: we detect that this can run without shell, which is also more portable on windows
      left: "\"sh\" \"-c\" \"C:\\\\Program Files\\\\Git\\\\mingw64\\\\bin\\\\git.exe credential-name \\\"$@\\\"\" \"--\" \"store\""
     right: "\"C:\\Program Files\\Git\\mingw64\\bin\\git.exe\" \"credential-name\" \"store\""
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

    --- STDERR:              gix-credentials::credentials program::from_custom_definition::name_with_args ---
    thread 'program::from_custom_definition::name_with_args' panicked at gix-credentials\tests\program\from_custom_definition.rs:40:5:
    assertion `left == right` failed
      left: "\"sh\" \"-c\" \"C:\\\\Program Files\\\\Git\\\\mingw64\\\\bin\\\\git.exe credential-name --arg --bar=\\\"a b\\\" \\\"$@\\\"\" \"--\" \"store\""
     right: "\"C:\\Program Files\\Git\\mingw64\\bin\\git.exe\" \"credential-name\" \"--arg\" \"--bar=a b\" \"store\""
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

    --- STDERR:              gix-credentials::credentials program::from_custom_definition::name_with_special_args ---
    thread 'program::from_custom_definition::name_with_special_args' panicked at gix-credentials\tests\program\from_custom_definition.rs:52:5:
    assertion `left == right` failed
      left: "\"sh\" \"-c\" \"C:\\\\Program Files\\\\Git\\\\mingw64\\\\bin\\\\git.exe credential-name --arg --bar=~/folder/in/home \\\"$@\\\"\" \"--\" \"store\""
     right: "\"sh\" \"-c\" \"C:\\Program Files\\Git\\mingw64\\bin\\git.exe credential-name --arg --bar=~/folder/in/home \\\"$@\\\"\" \"--\" \"store\""
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

    --- STDERR:              gix-discover::discover upwards::from_dir_with_dot_dot ---
    thread 'upwards::from_dir_with_dot_dot' panicked at gix-discover\tests\discover\upwards\mod.rs:155:5:
    assertion `left == right` failed
      left: Reduced
     right: Full
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

To be clear, those failures do *not* occur in the preivous commit
and are not expected to occur in this one. Those prior failures
consist of four `gix-credentials` tests and one `gix-discover`
test. (The `gix-discover` failure resembles a problem observed
locally on a Windows Server 2022 system as an administrator with
UAC disabled, reported in GitoxideLabs#1429, and suggests that the conditions
of that failure may differ from, or be more complex than, what I
had described there.)
@EliahKagan
Copy link
Member Author

I've opened #1657 to run Windows tests on CI with GIX_TEST_IGNORE_ARCHIVES=1 and assert that no more than the expected number of failures occur. Relating to #1358 (comment), I have not (yet?) made the new job declare a dependency on any other job(s), but that could be done.

@Byron
Copy link
Member

Byron commented Nov 7, 2024

If your experience is the same, then just keeping all the jobs in the ci.yml workflow, which should be done anyway due to their close relatedness and having the same event trigger, should be sufficient to avoid getting extra emails. In that case, any way of organizing the jobs and deciding/expressing their dependences should be fine.

Yes, you are right, it's actually only one per workflow (file), so that's a non-issue.

EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue Nov 8, 2024
This modifies the `test-fixtures-windows` job that tests on Windows
with `GIX_TEST_IGNORE_ARCHIVES=1` so that, instead of checking that
no more than 14 failures occur, it checks that the failing tests
are exactly those that are documented in GitoxideLabs#1358 as expected to fail.

The initial check that no tests have *error* status is preserved,
with only stylistic changes, and kept separate from the subsequent
logic so that the output is clearer.

The new steps are no longer conditional on `nextest` having exited
with a failure status, since (a) that was probably unnecessary
before and definitely unnecessary now, (b) at last for now, the
comparison is precise, so it would be strange to pass if the diff
were to have changes on *all* lines, and (c) this makes it slightly
less likely that GitoxideLabs#1358 will accidentally stay open even once fixed.

The current approach is to actually retrieve the list of tests
expected to fail on Windows with `GIX_TEST_IGNORE_ARCHIVES=1` from
the GitoxideLabs#1358 issue body. This has the advantage that it automatically
keeps up to date with changes made to that issue description, but
this is of course not the only possible approach for populating the
expected value.

Two changes should be made before this is ready:

- As noted in the "FIXME" comment, the job should currently fail
  becuase the performance test reported to fail in GitoxideLabs#1358 is not
  being filtered out from the expected failures list. It's left in
  as of this commit, to verify that the job is capable of failing.

  (After that, the performance test should either be filtered out
  or removed from the list in GitoxideLabs#1358, but the former approach is
  currently preferable because I have not done diverse enough
  testing to check if the failure on my main Windows system is due
  to that system being too slow rather than a performance bug.)

- The scratchwork file should be removed once no longer needed.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue Nov 8, 2024
This modifies the `test-fixtures-windows` job that tests on Windows
with `GIX_TEST_IGNORE_ARCHIVES=1` so that, instead of checking that
no more than 14 failures occur, it checks that the failing tests
are exactly those that are documented in GitoxideLabs#1358 as expected to fail.

The initial check that no tests have *error* status is preserved,
with only stylistic changes, and kept separate from the subsequent
logic so that the output is clearer.

The new steps are no longer conditional on `nextest` having exited
with a failure status, since (a) that was probably unnecessary
before and definitely unnecessary now, (b) at last for now, the
comparison is precise, so it would be strange to pass if the diff
were to have changes on *all* lines, and (c) this makes it slightly
less likely that GitoxideLabs#1358 will accidentally stay open even once fixed.

The current approach is to actually retrieve the list of tests
expected to fail on Windows with `GIX_TEST_IGNORE_ARCHIVES=1` from
the GitoxideLabs#1358 issue body. This has the advantage that it automatically
keeps up to date with changes made to that issue description, but
this is of course not the only possible approach for populating the
expected value.

Two changes should be made before this is ready:

- As noted in the "FIXME" comment, the job should currently fail
  becuase the performance test reported to fail in GitoxideLabs#1358 is not
  being filtered out from the expected failures list. It's left in
  as of this commit, to verify that the job is capable of failing.

  (After that, the performance test should either be filtered out
  or removed from the list in GitoxideLabs#1358, but the former approach is
  currently preferable because I have not done diverse enough
  testing to check if the failure on my main Windows system is due
  to that system being too slow rather than a performance bug.)

- The scratchwork file should be removed once no longer needed.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue Nov 8, 2024
This omits tests containing `performance` (and not as part of a
larger "word", not even with `_`) from being expected to fail on CI
with `GIX_TEST_IGNORE_ARCHIVES=1` on Windows.

Currently there is one such test listed in GitoxideLabs#1358,
`gix-ref-tests::refs packed::iter::performance`.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this issue Nov 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acknowledged an issue is accepted as shortcoming to be fixed C-Windows Windows-specific issues help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants