FrameworkProcessor._package_code fails on Windows with PermissionError (WinError 32) when deleting temp tar.gz #5873

@lposti

Description

PySDK Version

  • PySDK V2 (2.x)
  • PySDK V3 (3.x)

Describe the bug

On Windows, calling FrameworkProcessor.run() with a local source_dir fails during code packaging with:

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\...\\AppData\\Local\\Temp\\tmpXXXX.tar.gz'

The failure occurs in FrameworkProcessor._package_code() at os.unlink(tmp.name) (sagemaker-core/src/sagemaker/core/processing.py ~L1159–L1182).

The method creates a temp file with tempfile.NamedTemporaryFile(..., delete=False) and calls os.unlink(tmp.name) inside the with block, while the NamedTemporaryFile handle is still open. On Windows, a file cannot be deleted while a handle that was opened without delete sharing remains open, so the call raises WinError 32. On Linux/macOS the same pattern succeeds because the directory entry can be removed while the handle stays open.
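The pattern can be isolated from the SDK with the standard library alone. The following is a minimal sketch, not the SDK's code: the function name unlink_while_open and the placeholder payload are made up for illustration. It attempts the unlink while the handle is open and reports what happened. It is expected to report a failure on Windows and a successful delete on Linux/macOS.

```python
import os
import sys
import tempfile


def unlink_while_open() -> str:
    """Try to delete a NamedTemporaryFile while its handle is still open.

    Hypothetical helper for illustration; it mirrors the pattern described
    in this issue but is not taken from sagemaker-core.
    """
    with tempfile.NamedTemporaryFile(suffix=".tar.gz", delete=False) as tmp:
        tmp.write(b"placeholder payload")
        tmp.flush()
        try:
            os.unlink(tmp.name)  # the handle is still open at this point
            outcome = "deleted while open"
        except PermissionError as exc:
            outcome = f"failed: {exc}"
    # The with block has exited, so the handle is closed and cleanup is safe.
    if os.path.exists(tmp.name):
        os.unlink(tmp.name)
    return outcome


if __name__ == "__main__":
    print(sys.platform, "->", unlink_while_open())
```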

To reproduce

  1. Use Windows (tested on Windows 11).
  2. Install PySDK V3, e.g. pip install sagemaker-core (tested with sagemaker-core==2.11.0).
  3. Save and run the script below (no AWS credentials required; S3 upload is mocked):
"""Minimal repro for FrameworkProcessor._package_code WinError 32 on Windows."""

import os
import sys
from unittest.mock import patch

if sys.platform != "win32":
    print("Skip: this bug only reproduces on Windows.")
    sys.exit(0)

from sagemaker.core.helper.session_helper import Session
from sagemaker.core.processing import FrameworkProcessor

SOURCE_DIR = "minimal_src"
os.makedirs(SOURCE_DIR, exist_ok=True)
with open(os.path.join(SOURCE_DIR, "script.py"), "w", encoding="utf-8") as f:
    f.write("print('hello')\n")

sess = Session()
processor = FrameworkProcessor(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.0.0-cpu-py3",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    instance_type="ml.m5.large",
    instance_count=1,
    sagemaker_session=sess,
)

with patch("sagemaker.core.s3.S3Uploader.upload_string_as_file_body"):
    processor._package_code(
        entry_point="script.py",
        source_dir=SOURCE_DIR,
        requirements=None,
        job_name="test-job",
        kms_key=None,
    )
  4. Observe PermissionError: [WinError 32].

Real-world usage (also fails before the processing job is created):

from sagemaker.core.helper.session_helper import Session
from sagemaker.core.image_uris import get_training_image_uri
from sagemaker.core.processing import FrameworkProcessor

sess = Session()
processor = FrameworkProcessor(
    image_uri=get_training_image_uri(
        region=sess.boto_region_name,
        framework="pytorch",
        instance_type="ml.m5.large",
    ),
    role="<SageMaker execution role ARN>",
    instance_type="ml.m5.large",
    instance_count=1,
)

processor.run(
    code="script.py",  # script to execute
    source_dir="src",  # local directory
    job_name="test-job",
    wait=False,
)

Expected behavior

Code packaging completes successfully on Windows: the temporary sourcedir.tar.gz is removed after upload (or left for the OS to clean up), and FrameworkProcessor.run() proceeds to create the processing job.

Screenshots or logs

PermissionError: [WinError 32] The process cannot access the file because it is being used by another process:
'C:\\Users\\LORENZ~1\\AppData\\Local\\Temp\\tmp6qp3s0lg.tar.gz'

Traceback (abbreviated):

File ".../sagemaker/core/processing.py", line 1231, in run
    s3_runproc_sh, inputs, job_name = self._pack_and_upload_code(...)
File ".../sagemaker/core/processing.py", line 1270, in _pack_and_upload_code
    s3_payload = self._package_code(...)
File ".../sagemaker/core/processing.py", line 1182, in _package_code
    os.unlink(tmp.name)
PermissionError: [WinError 32] ...

System information

  • SageMaker Python SDK version: sagemaker-core==2.11.0 (V3 imports: sagemaker.core.*)
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): PyTorch (FrameworkProcessor with PyTorch training image URI)
  • Framework version: default from get_training_image_uri (py312 image selected automatically)
  • Python version: 3.14.4
  • CPU or GPU: CPU (local client); instance type ml.m5.large for the job definition
  • Custom Docker image (Y/N): N

Additional context

Suggested fix: Close the temp file before unlink, or use tempfile.mkstemp and avoid holding an open handle during deletion. For example:

fd, tmp_path = tempfile.mkstemp(suffix=".tar.gz")
os.close(fd)
try:
    with tarfile.open(tmp_path, "w:gz") as tar:
        ...
    with open(tmp_path, "rb") as f:
        body = f.read()
    s3.S3Uploader.upload_string_as_file_body(body=body, ...)
    return s3_uri
finally:
    if os.path.exists(tmp_path):
        os.unlink(tmp_path)

Alternatively, move os.unlink(tmp.name) outside the NamedTemporaryFile context manager (after the with block exits and the handle is closed).
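The alternative could look roughly like the sketch below. It is self-contained and makes several assumptions: package_dir and the upload callable are hypothetical stand-ins (upload takes the place of the S3 upload call), the archive layout is illustrative, and the function returns the temp path only so that a caller or test can confirm the cleanup.

```python
import os
import tarfile
import tempfile
from typing import Callable


def package_dir(source_dir: str, upload: Callable[[bytes], None]) -> str:
    """Tar source_dir into a temp file, upload its bytes, then delete the file.

    Hypothetical helper: `upload` stands in for the real S3 upload call.
    """
    tmp = tempfile.NamedTemporaryFile(suffix=".tar.gz", delete=False)
    try:
        with tmp:
            # Write the archive through the already-open handle.
            with tarfile.open(fileobj=tmp, mode="w:gz") as tar:
                tar.add(source_dir, arcname=".")
        # The with block has exited, so the temp file handle is closed here.
        with open(tmp.name, "rb") as f:
            upload(f.read())
    finally:
        # No handle is open any more, so the unlink is also safe on Windows.
        if os.path.exists(tmp.name):
            os.unlink(tmp.name)
    return tmp.name  # returned only so callers can verify the file is gone
```

Placing the unlink in a finally block also removes the temp file when tarring or uploading raises, which the original in-block unlink would not guarantee.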
