jsii release 1.86.0 is breaking our tox tests #4207

Closed
dreambeyondorange opened this issue Aug 2, 2023 · 5 comments · Fixed by #4215
Labels: bug, needs-triage

Comments
@dreambeyondorange

Describe the bug

When we run tox, we get the following error

ERROR tests/pcluster/templates/test_cdk_artifacts_manager.py - RuntimeError: EEXIST: file already exists, open '/home/runner/.cache/aws/jsii/package-cache/@aws-cdk/cloud-assembly-schema/1.204.0/04582db0a7f6f674167deae6ce1b36f828cf13590bb60ee3ec8783d5cb0a7f06.lock'
ERROR tests/pcluster/templates/test_cdk_builder_utils.py - RuntimeError: EEXIST: file already exists, open '/home/runner/.cache/aws/jsii/package-cache/@aws-cdk/aws-sqs/1.204.0/06b40dd6daf47b06f64d8503d2ce34e0dfad12144da58f480b7fa56c56ebeb99.lock'

See our GitHub Actions log: https://github.com/aws/aws-parallelcluster/actions/runs/5734849633/job/15541683710?pr=5553

Pinning the jsii version allows it to succeed: aws/aws-parallelcluster#5555
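
As an illustration (the exact change in aws/aws-parallelcluster#5555 may differ), the workaround amounts to constraining jsii to the previous release in the project's Python requirements, for example:

# requirements.txt -- pin jsii below the release that enabled the package cache
jsii==1.85.0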

Expected Behavior

jsii to not fail

Current Behavior

jsii fails due to a lock file already existing

Reproduction Steps

Steps are performed in this GitHub Actions run: https://github.com/aws/aws-parallelcluster/actions/runs/5734849633/job/15541683710?pr=5553

Possible Solution

No response

Additional Information/Context

No response

SDK version used

1.204.0

Environment details (OS name and version, etc.)

Ubuntu 22.04.2

@aamielsan

We are also encountering the same issue for tests that run in parallel, where each test runs synth to synthesize the stack and then asserts on the resources that are part of the stack.

The fix of pinning to 1.85.0 also worked for our use case.
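
For context, the tests in question look roughly like this (a minimal CDK v2-style sketch rather than our actual code): each test builds a stack, synthesizes it, and asserts on the resulting resources, and because the test workers run in separate processes, every one of them imports aws_cdk/jsii and touches the package cache at the same time.

# test_queue_stack.py -- each parallel worker imports aws_cdk (and therefore jsii)
# independently, which is where the package-cache lock contention shows up.
import aws_cdk as cdk
from aws_cdk import aws_sqs as sqs
from aws_cdk.assertions import Template


def test_queue_is_created():
    app = cdk.App()
    stack = cdk.Stack(app, "TestStack")
    sqs.Queue(stack, "Queue")

    # Synthesize the stack and assert on the resources it contains.
    template = Template.from_stack(stack)
    template.resource_count_is("AWS::SQS::Queue", 1)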

@juweeks

juweeks commented Aug 3, 2023

v1.86.0 broke pytest-xdist. @aamielsan's fix of pinning to v1.85.0 worked here too, thanks!

One of the errors:

============================= test session starts ==============================
platform linux -- Python 3.9.17, pytest-7.4.0, pluggy-1.2.0 -- /root/.local/share/virtualenvs/infrastructure-T7K4lPgJ/bin/python
cachedir: .pytest_cache
rootdir: /builds/aws/infrastructure
plugins: cov-4.1.0, mock-3.11.1, xdist-3.3.1, typeguard-2.13.3
created: 8/8 workers
8 workers [22 items]
scheduling tests via LoadScheduling
==================================== ERRORS ====================================
________________________ ERROR collecting test session _________________________
jsii.errors.JavaScriptError: 
  Error: EEXIST: file already exists, open '/root/.cache/aws/jsii/package-cache/@aws-cdk/asset-awscli-v1/2.2.200/76266c6c0354cd41d9860597faeaa631e6962e0fe67b1ec94555b06402098446.lock'
      at Object.openSync (node:fs:602:3)
      at exports.lockSync (/tmp/tmprleibzov/lib/program.js:2880:29)
      at retryThrow (/tmp/tmprleibzov/lib/program.js:2911:32)
      at exports.lockSync (/tmp/tmprleibzov/lib/program.js:2903:24)
      at retryThrow (/tmp/tmprleibzov/lib/program.js:2911:32)
      at exports.lockSync (/tmp/tmprleibzov/lib/program.js:2903:24)
      at retryThrow (/tmp/tmprleibzov/lib/program.js:2911:32)
      at exports.lockSync (/tmp/tmprleibzov/lib/program.js:2903:24)
      at retryThrow (/tmp/tmprleibzov/lib/program.js:2911:32)
      at exports.lockSync (/tmp/tmprleibzov/lib/program.js:2903:24)
The above exception was the direct cause of the following exception:
/root/.local/share/virtualenvs/infrastructure-T7K4lPgJ/lib/python3.9/site-packages/_pytest/config/__init__.py:642: in _importconftest
    mod = import_path(conftestpath, mode=importmode, root=rootpath)
/root/.local/share/virtualenvs/infrastructure-T7K4lPgJ/lib/python3.9/site-packages/_pytest/pathlib.py:565: in import_path
    importlib.import_module(module_name)
/usr/local/lib/python3.9/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1030: in _gcd_import
    ???
<frozen importlib._bootstrap>:1007: in _find_and_load
    ???
<frozen importlib._bootstrap>:972: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:228: in _call_with_frames_removed
    ???
<frozen importlib._bootstrap>:1030: in _gcd_import
    ???
<frozen importlib._bootstrap>:1007: in _find_and_load
    ???
<frozen importlib._bootstrap>:972: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:228: in _call_with_frames_removed
    ???
<frozen importlib._bootstrap>:1030: in _gcd_import
    ???
<frozen importlib._bootstrap>:1007: in _find_and_load
    ???
<frozen importlib._bootstrap>:986: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:680: in _load_unlocked
    ???
<frozen importlib._bootstrap_external>:850: in exec_module
    ???
<frozen importlib._bootstrap>:228: in _call_with_frames_removed
    ???
utils/__init__.py:5: in <module>
    from aws_cdk import CfnResource, IResource
/root/.local/share/virtualenvs/infrastructure-T7K4lPgJ/lib/python3.9/site-packages/aws_cdk/__init__.py:1422: in <module>
    from ._jsii import *
/root/.local/share/virtualenvs/infrastructure-T7K4lPgJ/lib/python3.9/site-packages/aws_cdk/_jsii/__init__.py:13: in <module>
    import aws_cdk.asset_awscli_v1._jsii
/root/.local/share/virtualenvs/infrastructure-T7K4lPgJ/lib/python3.9/site-packages/aws_cdk/asset_awscli_v1/_jsii/__init__.py:13: in <module>
    __jsii_assembly__ = jsii.JSIIAssembly.load(
/root/.local/share/virtualenvs/infrastructure-T7K4lPgJ/lib/python3.9/site-packages/jsii/_runtime.py:55: in load
    _kernel.load(assembly.name, assembly.version, os.fspath(assembly_path))
/root/.local/share/virtualenvs/infrastructure-T7K4lPgJ/lib/python3.9/site-packages/jsii/_kernel/__init__.py:299: in load
    self.provider.load(LoadRequest(name=name, version=version, tarball=tarball))
/root/.local/share/virtualenvs/infrastructure-T7K4lPgJ/lib/python3.9/site-packages/jsii/_kernel/providers/process.py:354: in load
    return self._process.send(request, LoadResponse)
/root/.local/share/virtualenvs/infrastructure-T7K4lPgJ/lib/python3.9/site-packages/jsii/_kernel/providers/process.py:342: in send
    raise RuntimeError(resp.error) from JavaScriptError(resp.stack)
E   RuntimeError: EEXIST: file already exists, open '/root/.cache/aws/jsii/package-cache/@aws-cdk/asset-awscli-v1/2.2.200/76266c6c0354cd41d9860597faeaa631e6962e0fe67b1ec94555b06402098446.lock'

@iamlittle

Seeing the same issue with our CD pipelines. @aamielsan's fix also worked for us.

@sirrus233

Per the PR notes on #4181, this release changes the runtime package caching from opt-in to opt-out, i.e. the cache is now enabled by default. This caching is what seems to be causing parallelized workflows (e.g. parallel test runs) to fail.

As an alternative workaround to being stuck on 1.85.0, you can opt out of the caching by setting the environment variable JSII_RUNTIME_PACKAGE_CACHE=disabled.
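
A minimal sketch of one way to apply that in a pytest/tox setup (assuming nothing imports aws_cdk or jsii before conftest.py runs; a setenv entry in tox.ini or an env entry in the CI workflow achieves the same thing):

# conftest.py
import os

# Disable the jsii runtime package cache before anything imports aws_cdk / jsii,
# so parallel test workers don't race on the cache's lock files.
os.environ["JSII_RUNTIME_PACKAGE_CACHE"] = "disabled"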

rix0rrr self-assigned this Aug 9, 2023
rix0rrr added a commit that referenced this issue Aug 9, 2023
The package cache mechanism that was turned on by default in #4181
works in theory under parallelism, but not in practice.

Typically the CDK CLI will prevent CDK apps from running in parallel,
but Python testing frameworks like `tox` use subprocess parallelism
to speed up test runs, leading to the jsii imports being executed
at the same time.

Since jsii is sync, the locking needs to be sync. The sync locking
feature of the `lockfile` library doesn't have wait support (for good
reason), and so when a lock is already acquired by another process
it quickly burns through its 12 retries in a hot loop, and then exits
with an error.

Two changes to address this:

- (Ab)use `Atomics.wait()` to get a synchronous sleeping primitive;
  since `lockSync` doesn't support synchronous sleep, we build our
  own retry loop with synchronous sleep around `lockSync`.
- Since the extracted directory is immutable: if the marker file in the
  extracted directory exists, we can treat it as evidence that the
  directory has been completely written and we can skip trying to vie
  for exclusive access to write it. This avoids all lock contention
  after the very first CDK app execution.

Fixes #4207.
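
Purely as an illustration of those two ideas (the actual change is in the jsii runtime's JavaScript, where Atomics.wait() provides the synchronous sleep; all names below are hypothetical, not jsii's implementation), a Python sketch might look like:

import os
import time
from pathlib import Path
from typing import Callable


def lock_sync_with_retries(lock_path: Path, attempts: int = 12, delay: float = 0.25) -> None:
    # Acquire the lock by creating the file exclusively, sleeping between
    # attempts instead of burning through the retries in a hot loop.
    for i in range(attempts):
        try:
            os.close(os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY))
            return
        except FileExistsError:
            if i == attempts - 1:
                raise
            time.sleep(delay)


def ensure_extracted(package_dir: Path, extract: Callable[[Path], None]) -> Path:
    # If the completion marker exists, the (immutable) extracted directory is
    # already fully written, so we can skip locking entirely -- this removes
    # all lock contention after the very first run.
    marker = package_dir / ".extraction-complete"  # hypothetical marker name
    if marker.exists():
        return package_dir

    package_dir.mkdir(parents=True, exist_ok=True)
    lock_path = package_dir.parent / (package_dir.name + ".lock")
    lock_sync_with_retries(lock_path)
    try:
        if not marker.exists():  # another process may have finished meanwhile
            extract(package_dir)
            marker.touch()
    finally:
        lock_path.unlink(missing_ok=True)
    return package_dir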
mergify bot closed this as completed in #4215 Aug 10, 2023
mergify bot pushed a commit that referenced this issue Aug 10, 2023
@github-actions

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.
