FeatureGroup().ingest() throws "OSError" - "Function not implemented" inside Lambda Function #2844

@makennedy626

Description

Describe the bug
This error is thrown when FeatureGroup().ingest() runs inside a Docker-based Lambda Function invoked as the first step of my State Machine. Please see below for additional information.

To reproduce

  1. Create a Lambda Function that uses a Docker image configured as described under System information below.
  2. In your function code, import FeatureGroup (from sagemaker.feature_store.feature_group import FeatureGroup) and call FeatureGroup().ingest().
  3. Test the function - I was able to reproduce the error in the Lambda Console via the Test tab.
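The traceback bottoms out in _multiprocessing.SemLock: the Lambda execution environment does not implement POSIX semaphores, so creating any multiprocessing lock fails with Errno 38. A minimal probe (semlock_supported is a hypothetical helper name, not part of the SDK) that distinguishes the two environments:

```python
import multiprocessing


def semlock_supported() -> bool:
    """Return True if the runtime can create a POSIX semaphore.

    multiprocessing.Lock() is backed by _multiprocessing.SemLock; on AWS
    Lambda it raises OSError: [Errno 38] Function not implemented.
    """
    try:
        multiprocessing.Lock()
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # True on a typical Linux/macOS host; False inside a Lambda container.
    print(semlock_supported())
```

Running this inside the same Docker Lambda image reproduces the failure condition without involving the SageMaker SDK at all.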

Expected behavior
Feature Group should successfully ingest the data.

Screenshots or logs

{
  "error": "OSError",
  "cause": {
    "errorMessage": "[Errno 38] Function not implemented",
    "errorType": "OSError",
    "requestId": <REDACTED>,
    "stackTrace": [
      "  File \"/var/task/app.py\", line 298, in lambda_handler\n    master()\n",
      "  File \"/var/task/app.py\", line 268, in master\n    <REDACTED_FEATURE_GROUP_NAME>.ingest(data_frame=df2, max_workers=1, wait=True)\n",
      "  File \"/var/task/sagemaker/feature_store/feature_group.py\", line 627, in ingest\n    manager.run(data_frame=data_frame, wait=wait, timeout=timeout)\n",
      "  File \"/var/task/sagemaker/feature_store/feature_group.py\", line 371, in run\n    self._run_multi_process(data_frame=data_frame, wait=wait, timeout=timeout)\n",
      "  File \"/var/task/sagemaker/feature_store/feature_group.py\", line 297, in _run_multi_process\n    self._processing_pool = ProcessingPool(self.max_processes, init_worker)\n",
      "  File \"/var/task/pathos/multiprocessing.py\", line 111, in __init__\n    self._serve()\n",
      "  File \"/var/task/pathos/multiprocessing.py\", line 123, in _serve\n    _pool = Pool(nodes)\n",
      "  File \"/var/task/multiprocess/pool.py\", line 191, in __init__\n    self._setup_queues()\n",
      "  File \"/var/task/multiprocess/pool.py\", line 343, in _setup_queues\n    self._inqueue = self._ctx.SimpleQueue()\n",
      "  File \"/var/task/multiprocess/context.py\", line 113, in SimpleQueue\n    return SimpleQueue(ctx=self.get_context())\n",
      "  File \"/var/task/multiprocess/queues.py\", line 345, in __init__\n    self._rlock = ctx.Lock()\n",
      "  File \"/var/task/multiprocess/context.py\", line 68, in Lock\n    return Lock(ctx=self.get_context())\n",
      "  File \"/var/task/multiprocess/synchronize.py\", line 168, in __init__\n    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)\n",
      "  File \"/var/task/multiprocess/synchronize.py\", line 63, in __init__\n    sl = self._semlock = _multiprocessing.SemLock(\n"
    ]
  }
}

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.70.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): N/A
  • Framework version: N/A
  • Python version: 3.9
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): Y - public.ecr.aws/lambda/python:3.9

Additional context

  1. No issues when running this locally in the debugger (outside a Docker container).
  2. I have seen similar errors related to _multiprocessing.SemLock in the kedro repository, where Lambda Functions connected via a State Machine hit the same failure; as far as I know they were unable to resolve or circumvent it, so a resolution here could be applicable / helpful to users of many different packages.
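A possible workaround, sketched here under the assumption that single-process throughput is acceptable: bypass FeatureGroup.ingest() (and its multiprocessing pool) and write rows directly with the Feature Store runtime's PutRecord API via boto3. The helpers below (row_to_record, ingest_rows) are hypothetical names for illustration; sagemaker-featurestore-runtime and put_record are real boto3 APIs.

```python
# Pool-free ingest sketch: no ProcessingPool, so no SemLock is ever created.
# row_to_record() and ingest_rows() are hypothetical helper names.

def row_to_record(row: dict) -> list:
    """Convert one row (column -> value) to the PutRecord payload shape."""
    return [
        {"FeatureName": name, "ValueAsString": str(value)}
        for name, value in row.items()
        if value is not None  # skip empty values rather than send them
    ]


def ingest_rows(client, feature_group_name: str, rows: list) -> None:
    """Write rows one at a time via PutRecord; no multiprocessing involved."""
    for row in rows:
        client.put_record(
            FeatureGroupName=feature_group_name,
            Record=row_to_record(row),
        )


# Usage inside the Lambda handler (requires AWS credentials and a real
# feature group; feature group name below is a placeholder):
#   import boto3
#   client = boto3.client("sagemaker-featurestore-runtime")
#   ingest_rows(client, "my-feature-group", df2.to_dict(orient="records"))
```

This trades ingest()'s parallelism for compatibility with Lambda's restricted runtime, which may be an acceptable trade for small batches.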
