Race condition while parsing command line arguments if telemetry is enabled #503

Closed
c-garcia opened this issue Mar 8, 2024 · 1 comment · Fixed by #537
Labels
bug Something isn't working


c-garcia commented Mar 8, 2024

Describe the bug

On Fedora Linux 38 systems (both aarch and x86_64), when running multiple parallel instances of the command:

oasdiff breaking openapi.orig.json openapi.json

Some of them (2 or 3 out of 50, in our experiments) return error 102 to the operating system and show the text:

Error: failed to load base spec from "file": open file: no such file or directory

Please note that the error reports a file named file as missing. It does not mention the real file name, openapi.orig.json, as normally happens when the file is actually missing.

If telemetry is disabled, this behavior is not observed. That is, the command below works as expected.

OASDIFF_NO_TELEMETRY=1 oasdiff breaking openapi.orig.json openapi.json

To Reproduce

  • Run multiple instances of oasdiff in parallel with telemetry enabled. We have observed this both when launching them from a single shell and in CI systems, where the instances are started as part of continuous integration jobs.
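
A minimal sketch of a reproduction harness, assuming oasdiff is on the PATH and the two spec files from the command above are in the working directory. The helper name runParallel is hypothetical. It counts every run that fails to start or exits non-zero, so it may also count failures unrelated to this issue.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"sync"
)

// runParallel starts n copies of the command concurrently and returns how
// many of them failed to start or exited with a non-zero status.
func runParallel(n int, name string, args ...string) int {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		failures int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			out, err := exec.Command(name, args...).CombinedOutput()
			if err == nil {
				return
			}
			mu.Lock()
			defer mu.Unlock()
			failures++
			var exitErr *exec.ExitError
			if errors.As(err, &exitErr) {
				// The process ran but returned a non-zero exit code.
				fmt.Printf("exit code %d: %s\n", exitErr.ExitCode(), out)
			} else {
				// The process could not be started at all.
				fmt.Printf("could not run %s: %v\n", name, err)
			}
		}()
	}
	wg.Wait()
	return failures
}

func main() {
	// 50 matches the number of instances mentioned in the report.
	failed := runParallel(50, "oasdiff", "breaking", "openapi.orig.json", "openapi.json")
	fmt.Printf("%d of 50 runs failed\n", failed)
}
```

Because the failure is intermittent, several invocations of the harness may be needed before any run fails.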

Expected behavior

We expected the command not to report a missing file when the file was in fact present.

Desktop (please complete the following information):

  • Linux Fedora 38 x86_64 and aarch
  • oasdiff version 1.10.8 and 1.10.11

Additional context

We have observed that this behavior seems to be caused by calling SendCommand in the preRun function. SendCommand appears to visit cmd.Flags from within a goroutine. Unfortunately, pflags.Visit does not seem to be thread-safe and actually updates the internals of the data structure. While this happens, cobra may be accessing the same flag set to perform validations such as ValidateRequiredFlags (see the code here, please). This is a plausible explanation for the behavior and for why the error is difficult to reproduce.

c-garcia added the bug label Mar 8, 2024

reuvenharrison (Collaborator) commented

Thanks for the detailed replication.
I'll take a look at this.
