Fix: Ensure destination caps is picklable #397

Closed
wants to merge 2 commits into from


z3z1ma
Collaborator

@z3z1ma z3z1ma commented Jun 10, 2023

I noticed that DestinationCapabilitiesContext needs to be picklable for the filesystem destination when the normalizer uses more than one process. Otherwise, normalization fails with this traceback:

Traceback (most recent call last):
  File "/.../.venv/lib/python3.10/site-packages/dlt/pipeline/pipeline.py", line 310, in normalize
    runner.run_pool(normalize.config, normalize)
  File "/.../.venv/lib/python3.10/site-packages/dlt/common/runners/pool_runner.py", line 48, in run_pool
    while _run_func():
  File "/.../.venv/lib/python3.10/site-packages/dlt/common/runners/pool_runner.py", line 41, in _run_func
    run_metrics = run_f.run(cast(TPool, pool))
  File "/.../.venv/lib/python3.10/site-packages/dlt/normalize/normalize.py", line 302, in run
    self.spool_schema_files(load_id, schema_name, schema_files)
  File "/.../.venv/lib/python3.10/site-packages/dlt/normalize/normalize.py", line 275, in spool_schema_files
    self.spool_files(schema_name, load_id, map_parallel_f, files)
  File "/.../.venv/lib/python3.10/site-packages/dlt/normalize/normalize.py", line 244, in spool_files
    schema_updates = map_f(schema, load_id, files)
  File "/.../.venv/lib/python3.10/site-packages/dlt/normalize/normalize.py", line 221, in map_parallel
    pending.get()
  File "/nix/store/m0fw0myf45rbsqkgvrnhm6dim036nqm3-python3-3.10.10/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
  File "/nix/store/m0fw0myf45rbsqkgvrnhm6dim036nqm3-python3-3.10.10/lib/python3.10/multiprocessing/pool.py", line 540, in _handle_tasks
    put(task)
  File "/nix/store/m0fw0myf45rbsqkgvrnhm6dim036nqm3-python3-3.10.10/lib/python3.10/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/nix/store/m0fw0myf45rbsqkgvrnhm6dim036nqm3-python3-3.10.10/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'DestinationCapabilitiesContext.generic_capabilities.<locals>.<lambda>'

Even if there is something deeper here to look at, making this picklable probably isn't a bad idea.
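
For readers unfamiliar with this failure mode, here is a minimal sketch of why the traceback above ends in "Can't pickle local object". It does not use dlt at all: the dict and the "escape" key are hypothetical stand-ins for a capabilities object, and json.dumps is used only as an example of a callable that pickle can store by reference.

```python
import json
import pickle


def caps_with_local_lambda():
    # Hypothetical stand-in for a capabilities object: a dict holding a callable.
    # The lambda is created inside a function, so pickle cannot look it up by name.
    return {"escape": lambda v: v}


def caps_with_importable_function():
    # json.dumps is importable by its qualified name, so pickle stores a reference.
    return {"escape": json.dumps}


def is_picklable(obj):
    """Return True if pickle.dumps succeeds for obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    print(is_picklable(caps_with_local_lambda()))
    print(is_picklable(caps_with_importable_function()))
```

The first call should report that the object is not picklable and the second that it is. multiprocessing pickles every task it sends to a worker, which is why the problem only appears with more than one process.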

@z3z1ma
Collaborator Author

z3z1ma commented Jun 10, 2023

Pickle is preferred, but in cases like the defaults, which are lambdas, we fall back to marshal.
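
The diff itself is not shown in this thread, so the following is only a rough sketch of what a "pickle first, marshal as fallback" scheme could look like, not the code from this PR. The helper names dump_callable and load_callable are hypothetical. The marshal path only handles plain functions and lambdas without closures, and it drops default arguments and the original globals.

```python
import builtins
import marshal
import pickle
import types


def dump_callable(fn):
    """Serialize a callable, preferring pickle and falling back to marshal."""
    try:
        return ("pickle", pickle.dumps(fn))
    except (pickle.PicklingError, AttributeError, TypeError):
        # marshal can serialize the code object of a lambda, but not its
        # closure cells, default arguments, or the globals it refers to.
        return ("marshal", marshal.dumps(fn.__code__))


def load_callable(payload):
    """Rebuild a callable produced by dump_callable."""
    kind, data = payload
    if kind == "pickle":
        return pickle.loads(data)
    code = marshal.loads(data)
    # Rebuild the function with a fresh globals dict that only exposes builtins.
    return types.FunctionType(code, {"__builtins__": builtins})


def make_default_escape():
    # A local lambda, similar in spirit to the defaults mentioned above.
    return lambda v: v * 2


if __name__ == "__main__":
    payload = dump_callable(make_default_escape())
    print(payload[0])
    print(load_callable(payload)(21))
```

Note that marshal data is specific to the Python version that wrote it, which is one reason to prefer objects that are picklable without custom serialization.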

@rudolfix
Collaborator

@z3z1ma thanks for this! This is indeed a bug that I introduced recently. The capabilities must be picklable by default, without custom serialization. There is even a test for that, but the filesystem destination was not included in it.
I fixed the tests and forced all caps to be picklable in #404.
We'll do a 0.3.1 pre-release with that fix today.
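
As an illustration of the kind of check described above (not the actual test from #404), a test can round-trip every destination's capabilities through pickle and report the ones that fail. The function name find_unpicklable and the destination names in the usage example are hypothetical; real capabilities objects would be passed in place of the placeholder dicts.

```python
import pickle


def find_unpicklable(named_objects):
    """Return the names of objects that do not survive a pickle round trip."""
    failures = []
    for name, obj in named_objects.items():
        try:
            pickle.loads(pickle.dumps(obj))
        except Exception:
            # Any failure counts: PicklingError, AttributeError for local
            # objects, TypeError for unsupported types, and so on.
            failures.append(name)
    return failures


if __name__ == "__main__":
    # Placeholder capabilities, keyed by hypothetical destination name.
    caps_by_destination = {
        "destination_a": {"max_identifier_length": 63},
        "destination_b": {"escape": lambda v: v},
    }
    print(find_unpicklable(caps_by_destination))
```

Running the check over every destination, rather than a hand-picked subset, is what would have caught the filesystem case.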

@z3z1ma
Collaborator Author

z3z1ma commented Jun 13, 2023

Thanks, this makes sense. Glad I was able to catch it ;)

@rudolfix
Collaborator

The fix is merged in pre-release 0.3.1a0.

@rudolfix rudolfix closed this Jun 13, 2023