When two pipelines with the same name run at the same time on the same machine, they race on load packages, because each pipeline is designed to pick up and complete any unfinished load packages left in its local folder. The user-side workaround is to give each pipeline a different name. That is clear if you know the inner workings of dlt, but it is not obvious or intuitive otherwise, so we should try to prevent this scenario or at least print a warning to stdout.
The solution should also work when the two pipelines are started from different Python scripts running in separate processes, so we need some form of interprocess communication. Possibilities are:
Place a marker file in the pipeline's local directory and remove it when the pipeline exits (successfully or not). We can lock this file to make the check safe across processes. The user would need a way to clear the lock if it is not removed after the pipeline exits (container crash etc.). If a pipeline starts and finds the lock, it prints a warning to the console and exits without doing anything. The warning should explain parallel pipeline runs and give instructions on how to clear the marker.
Use multiprocessing.Lock() with the pipeline name as key. This will not work if different containers access the same "local storage" mounted into them.
...
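The marker-file option could look roughly like the sketch below. It is not dlt code: the names `pipeline_run_lock`, `clear_pipeline_lock`, `PipelineAlreadyRunning` and the file name `.pipeline.lock` are all made up for illustration, and the pipeline directory is passed in explicitly rather than looked up from dlt. It relies on exclusive file creation (`os.O_CREAT | os.O_EXCL`), which fails if the file already exists; whether that stays atomic on a given network or container mount would need to be verified.

```python
import os
from contextlib import contextmanager

# Hypothetical marker file name, not something dlt defines.
LOCK_FILE_NAME = ".pipeline.lock"


class PipelineAlreadyRunning(RuntimeError):
    """Raised when the marker file already exists in the pipeline directory."""


@contextmanager
def pipeline_run_lock(pipeline_dir: str):
    """Hold a marker file in pipeline_dir for the duration of a run."""
    os.makedirs(pipeline_dir, exist_ok=True)
    lock_path = os.path.join(pipeline_dir, LOCK_FILE_NAME)
    try:
        # O_EXCL makes creation fail if another run already placed the marker.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise PipelineAlreadyRunning(
            f"Another run appears to be using {pipeline_dir}. "
            f"If no other run is active, delete {lock_path} and retry."
        ) from None
    try:
        # Store the owner's PID so a stale marker can be inspected by hand.
        with os.fdopen(fd, "w") as marker:
            marker.write(str(os.getpid()))
        yield lock_path
    finally:
        # Runs on success and on exceptions, but not after a hard crash.
        os.remove(lock_path)


def clear_pipeline_lock(pipeline_dir: str) -> bool:
    """Remove a stale marker; return True if one was removed."""
    try:
        os.remove(os.path.join(pipeline_dir, LOCK_FILE_NAME))
        return True
    except FileNotFoundError:
        return False
```

A caller would wrap the run in `with pipeline_run_lock(some_dir): ...`, catch `PipelineAlreadyRunning`, print the message as the warning, and exit. After a crash the marker stays behind, which is the case `clear_pipeline_lock` is meant for.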
Expected behavior
No response
Steps to reproduce
Run the same pipeline twice at the same time.
Operating system
macOS
Runtime environment
Local
Python version
3.10
dlt data source
No response
dlt destination
No response
Other deployment details
No response
Additional information
No response
dlt version
0.4.6