Prevent running 2 pipelines with the same name at the same time on the same machine #1102

Open
sh-rp opened this issue Mar 18, 2024 · 0 comments

@sh-rp
Collaborator

sh-rp commented Mar 18, 2024

dlt version

0.4.6

Describe the problem

When two pipelines with the same name run at the same time on the same machine, they race over the load packages, because a pipeline is designed to pick up and complete any incomplete load packages it finds in its local folder. The user-side workaround is to give each pipeline a different name. That is clear to anyone familiar with the inner workings of dlt, but it is neither obvious nor intuitive, so we should try to prevent this scenario or at least print a warning to stdout.
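To make the race concrete, here is a deterministic sketch of one possible interleaving. It assumes a simplified, hypothetical layout in which pending packages live in `<pipeline_dir>/normalized/<load_id>` and completing a package moves it to `<pipeline_dir>/loaded/<load_id>`; the folder names, the load id and the helper functions are illustrative only and are not dlt's actual storage code. Both runs list the folder before either finishes, so both claim the same package, and the slower run then acts on a package that is already gone (in a real pipeline the symptom could just as well be a package being loaded twice).

```python
import os
import tempfile
from typing import List

# Hypothetical folder names, chosen for illustration only.
PENDING = "normalized"
DONE = "loaded"


def list_pending(pipeline_dir: str) -> List[str]:
    """Return the ids of packages that a starting run would pick up."""
    return sorted(os.listdir(os.path.join(pipeline_dir, PENDING)))


def complete(pipeline_dir: str, load_id: str) -> None:
    """Mark a package as done by moving it out of the pending folder."""
    os.rename(
        os.path.join(pipeline_dir, PENDING, load_id),
        os.path.join(pipeline_dir, DONE, load_id),
    )


def simulate_two_runs() -> List[str]:
    """Replay one unlucky interleaving of runs A and B sharing a pipeline dir."""
    events: List[str] = []
    with tempfile.TemporaryDirectory() as pipeline_dir:
        os.makedirs(os.path.join(pipeline_dir, PENDING, "1710750000.1"))
        os.makedirs(os.path.join(pipeline_dir, DONE))

        seen_by_a = list_pending(pipeline_dir)  # run A starts, finds the package
        seen_by_b = list_pending(pipeline_dir)  # run B starts, finds the same one
        events.append(f"A sees {seen_by_a}, B sees {seen_by_b}")

        complete(pipeline_dir, seen_by_a[0])  # A finishes the package first
        try:
            complete(pipeline_dir, seen_by_b[0])  # B acts on a package A already moved
            events.append("B completed the package again")
        except FileNotFoundError:
            events.append("B failed: package was already moved by A")
    return events


for event in simulate_two_runs():
    print(event)
```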

Any such safeguard must also work when the two pipelines are started from two different Python scripts running in separate processes, so some form of inter-process coordination is needed. Possibilities are:

  • Place a marker file in the pipeline's local directory and remove it when the pipeline exits, whether successfully or not. The file can be locked so that it is safe to use across processes, not just threads. Users need a way to clear the lock if it is not removed after the pipeline exits (container crash, etc.). If a pipeline starts and finds the lock, it prints a warning to the console and exits without doing anything. The warning should explain that parallel runs of the same pipeline are not supported and how to clear the marker.
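  • Use a multiprocessing.Lock() with the pipeline name as key. Such a lock is only shared between processes that inherit it from a common parent, so it would not cover two independently started scripts, and it will not work if different containers access the same "local storage" mounted into them.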
  • ...
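The first option could look roughly like the sketch below. It is a minimal, hypothetical illustration, not dlt code: `pipeline_run_lock`, `clear_pipeline_lock`, `PipelineAlreadyRunning` and the marker name `pipeline.lock` are all made up for this example, and `pipeline_dir` stands for the pipeline's local working directory. Instead of an OS-level file lock it uses atomic exclusive creation (`O_CREAT | O_EXCL`), which succeeds for exactly one process even across separate scripts; the marker stores the owner's PID and is removed in a `finally` block whether the run succeeds or fails.

```python
import os
from contextlib import contextmanager
from typing import Iterator

LOCK_FILE_NAME = "pipeline.lock"  # hypothetical marker file name


class PipelineAlreadyRunning(RuntimeError):
    """Raised when another run of the same pipeline holds the marker."""


def _lock_path(pipeline_dir: str) -> str:
    return os.path.join(pipeline_dir, LOCK_FILE_NAME)


@contextmanager
def pipeline_run_lock(pipeline_dir: str) -> Iterator[None]:
    """Hold an exclusive per-pipeline marker file for the duration of a run."""
    path = _lock_path(pipeline_dir)
    try:
        # O_CREAT | O_EXCL fails atomically if the file already exists,
        # so only one process can create the marker.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise PipelineAlreadyRunning(
            f"Another run of this pipeline seems to be active (marker: {path}). "
            "Pipelines with the same name must not run in parallel on the same "
            "machine. If no other run is active (e.g. after a container crash), "
            "remove the stale marker with clear_pipeline_lock()."
        ) from None
    try:
        with os.fdopen(fd, "w") as marker:
            marker.write(str(os.getpid()))  # record the owner for debugging
        yield
    finally:
        # Runs on success and on exceptions; a hard kill still leaves the marker.
        os.remove(path)


def clear_pipeline_lock(pipeline_dir: str) -> bool:
    """Remove a stale marker; return True if one was removed."""
    try:
        os.remove(_lock_path(pipeline_dir))
        return True
    except FileNotFoundError:
        return False
```

A caller would catch `PipelineAlreadyRunning`, print its message as the warning and exit without doing any work. The `finally` block does not run if the process is killed outright (SIGKILL, container crash), which is why a manual `clear_pipeline_lock()` is still needed. An alternative is an advisory lock via `fcntl.flock`, which the OS releases automatically when the holding process dies, but it is POSIX-only, and neither approach is guaranteed to behave reliably on every network filesystem shared between containers.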

Expected behavior

No response

Steps to reproduce

Run two instances of the same pipeline (same pipeline name) at the same time, e.g. by starting the same script twice in parallel.
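For reproducing this, a small harness that starts the same command twice without waiting in between can help. This is a sketch that assumes the pipeline lives in a standalone script; `run_concurrently` and the script name `my_pipeline.py` in the usage note are hypothetical.

```python
import subprocess
from typing import List, Sequence


def run_concurrently(cmd: Sequence[str], copies: int = 2) -> List[int]:
    """Start `copies` processes running `cmd` back to back, then wait for all.

    Every process is launched before any is awaited, so their runs overlap
    unless a command finishes before the next one has started.
    """
    procs = [subprocess.Popen(list(cmd)) for _ in range(copies)]
    return [proc.wait() for proc in procs]
```

For example, `run_concurrently([sys.executable, "my_pipeline.py"])` returns the exit codes of both runs once they have finished.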

Operating system

macOS

Runtime environment

Local

Python version

3.10

dlt data source

No response

dlt destination

No response

Other deployment details

No response

Additional information

No response

Labels: None yet
Projects: Status: Todo
Development: No branches or pull requests
Participants: 1