NotImplementedError when ingesting feature group data from pandas DataFrame #2297

Closed
gmjohns opened this issue Apr 22, 2021 · 9 comments

@gmjohns

gmjohns commented Apr 22, 2021

Describe the bug
I am getting an error when trying to ingest data using the FeatureGroup.ingest() method, possibly due to the latest changes (#2288).

To reproduce
Follow this example exactly. The NotImplementedError should happen when executing this line:
identity_feature_group.ingest(data_frame=identity_data, max_workers=3, wait=True)

Expected behavior
The DataFrame should be ingested into the feature group successfully.

Screenshots or logs
abbreviated traceback:
...

~/*/lib/python3.8/site-packages/sagemaker/feature_store/feature_group.py in wait(self, timeout)
    235         
    236         try:
--> 237             results = self._async_result.get(timeout=timeout)
    238         except KeyboardInterrupt as i:
    239             # terminate workers abruptly on keyboard interrupt.

...

~/*/lib/python3.8/site-packages/multiprocess/pool.py in __reduce__(self)
    638 
    639     def __reduce__(self):
--> 640         raise NotImplementedError(
    641               'pool objects cannot be passed between processes or pickled'
    642               )

NotImplementedError: pool objects cannot be passed between processes or pickled

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.38.0
  • Python version: 3.8.5
  • Custom Docker image (Y/N): N

Additional context
N/A
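
The final frame of the traceback can be reproduced without SageMaker. The sketch below uses only the standard library and a hypothetical `Ingestor` class (not part of the SDK) that keeps a pool as an attribute; pickling such an object reaches the pool's `__reduce__` and fails with the same message. It illustrates the general failure mode, and is not a confirmed diagnosis of what the SDK does internally.

```python
import pickle
from multiprocessing.pool import ThreadPool


class Ingestor:
    """Hypothetical stand-in for an object that stores its pool as an attribute."""

    def __init__(self):
        # ThreadPool shares Pool's __reduce__, so it refuses pickling the same way.
        self.pool = ThreadPool(1)

    def close(self):
        self.pool.close()
        self.pool.join()


def try_pickle(obj):
    """Return the error message raised while pickling obj, or None on success."""
    try:
        pickle.dumps(obj)
    except NotImplementedError as exc:
        return str(exc)
    return None


ingestor = Ingestor()
message = try_pickle(ingestor)
ingestor.close()
print(message)
```

Anything that sends such an object (or one of its bound methods) to a worker process has to pickle it first, which is one way this error can surface from inside a pool call.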

@mpetri

mpetri commented Apr 30, 2021

I have the same issue following the example code provided. This works fine with version 2.37

@christophmark

Same issue here. Any update on this?

@paulovasconcellos-hotmart

Facing the same problem with Python 3.8

@lee-saint

Same issue here with Python 3.8.10 and SageMaker Python SDK version 2.44.0

@hl6

hl6 commented Jul 8, 2021

@alex-tang It looks like your PR (#2288) introduced this issue. This breaks examples in the SageMaker documentation in multiple places (here, here, here, and here). Any idea? @icywang86rui

@jonathanglima

any news on this?

@psnilesh
Contributor

psnilesh commented Jan 31, 2023

A change was merged upstream so that no additional process is forked when max_workers = 1 and max_processes = 1. It should be available in the next minor version release, and should prevent some of the compatibility issues we've been having with processes.
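
The behaviour described in this comment can be sketched roughly as follows. This is not the SDK's code: `ingest` and `_ingest_chunk` are hypothetical names, the worker only counts rows, and the per-process thread fan-out implied by `max_workers` is left out. The point is only the short-circuit: when both settings are 1, the work runs in the calling process, so no pool is created and nothing needs to be pickled.

```python
from concurrent.futures import ProcessPoolExecutor


def _ingest_chunk(chunk):
    """Hypothetical worker: pretend to write each row and return how many were written."""
    return len(chunk)


def ingest(rows, max_workers=1, max_processes=1):
    """Ingest rows, staying in the current process when no parallelism is requested."""
    if max_workers == 1 and max_processes == 1:
        # No pool, no child process, no pickling.
        return _ingest_chunk(rows)
    # Otherwise split the rows into roughly one chunk per process and fan out.
    # (Threads per process, i.e. max_workers, are omitted from this sketch.)
    size = max(1, -(-len(rows) // max_processes))  # ceiling division, at least 1
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ProcessPoolExecutor(max_workers=max_processes) as executor:
        return sum(executor.map(_ingest_chunk, chunks))


print(ingest(["row-1", "row-2", "row-3"]))
```

For the call in the original report, this would mean passing max_workers=1 and max_processes=1, assuming the installed release accepts max_processes.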

@mchowdry

mchowdry commented Apr 27, 2023

Until this is fixed, this code seemed to allow me to get around this:

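import importlib
import subprocess
import sys
import sagemaker

# Install 2.125.0 if the installed SDK is older, comparing (major, minor) together.
major, minor = (int(part) for part in sagemaker.__version__.split('.')[:2])
if (major, minor) < (2, 125):
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'sagemaker==2.125.0'])
    importlib.reload(sagemaker)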
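
A version check like the one above is easiest to get right when the major and minor components are compared together as a tuple, so that a later major release is never treated as older. A minimal sketch, using hypothetical helper names and assuming the leading components of the version string are purely numeric:

```python
def version_tuple(version, parts=2):
    """Parse the leading numeric components of a dotted version string."""
    return tuple(int(piece) for piece in version.split(".")[:parts])


def is_older(version, minimum):
    """Return True if version sorts strictly before the minimum tuple."""
    return version_tuple(version, len(minimum)) < tuple(minimum)


print(is_older("2.38.0", (2, 125)))
print(is_older("2.125.0", (2, 125)))
print(is_older("3.0.1", (2, 125)))
```

Only the first call reports an older version; 2.125.0 and 3.0.1 both satisfy the minimum.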

@martinRenou
Collaborator

Closing, as setting max_workers = 1 and max_processes = 1 may fix issues introduced by multiprocessing support, like this one. Feel free to reopen if you think otherwise.
