FlowETL DataPresentSensor fails if file does not exist #5763

Closed
jc-harrison opened this issue Jan 11, 2023 · 3 comments · Fixed by #6448
Labels: bug (Something isn't working), FlowETL

Comments

@jc-harrison
Member

When FlowETL is used to ingest CDR data from a file, the first task creates a foreign table for reading the file contents via file_fdw. This can be done before the file arrives, because file existence is not checked when the foreign table is created.

The next task is a DataPresentSensor, which should check whether the file has arrived and contains data, and keep checking until it does. However, if the file has not yet arrived, the task encounters a psycopg2 exception and fails, rather than rescheduling as we'd want it to. With suitable values for the retries and retry_delay arguments the task can still behave much like a sensor, but that is not the intended purpose of the retry mechanism.

DataPresentSensor should register an unsuccessful "poke" if the file is missing (as it already does when the file is present but empty), instead of failing the entire task.
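A minimal sketch of the requested poke behaviour, written in plain Python so it runs without Airflow or a database. It assumes the sensor's query is wrapped in a callable (the hypothetical `count_rows`) and that a missing file surfaces as a psycopg2-style exception with a `pgcode` attribute. `58P01` is PostgreSQL's `undefined_file` SQLSTATE; treating it as "file not yet arrived" is an assumption about what file_fdw reports when reading a file directly. `FakePgError` is a hypothetical stand-in for a real psycopg2 error.

```python
from typing import Callable

# PostgreSQL SQLSTATE "undefined_file". Assumption: file_fdw raises this
# when the foreign table's source file does not exist.
FILE_NOT_YET_PRESENT_CODES = frozenset({"58P01"})


class FakePgError(Exception):
    """Hypothetical stand-in for a psycopg2 error, which exposes `pgcode`."""

    def __init__(self, message: str, pgcode: str) -> None:
        super().__init__(message)
        self.pgcode = pgcode


def poke(count_rows: Callable[[], int]) -> bool:
    """Return True only when the file exists and contains data.

    `count_rows` is a hypothetical callable that runs the sensor's
    query against the foreign table and returns the row count.
    """
    try:
        n_rows = count_rows()
    except Exception as exc:
        # Treat a missing file like an empty one: an unsuccessful poke.
        if getattr(exc, "pgcode", None) in FILE_NOT_YET_PRESENT_CODES:
            return False
        # Any other error (permissions, bad SQL, ...) still fails the task.
        raise
    return n_rows > 0
```

Deciding which codes belong in `FILE_NOT_YET_PRESENT_CODES` is the hard part: when file_fdw reads through a program such as zcat, a failure is likely reported with a generic error that does not say why the program failed.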

jc-harrison added the bug and FlowETL labels on Jan 11, 2023
@jc-harrison
Member Author

Should be addressed along with #5090, I think.

@greenape
Member

Might be more of a fiddle than it first seems, because we're typically using a program to load the file, and we want to register an unsuccessful poke only if the file isn't there. At the very least, I think we need to know which error codes we want to handle here.

@greenape
Member

In fact, maybe we actually want to move the check for the file's existence into the create operator? It'd be tricky to distinguish, say, zcat failing because a file doesn't exist from zcat failing because the permissions are wrong.
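A rough sketch of what an up-front check in the create operator could look like, separating "not arrived yet" from "present but unreadable" before zcat (or file_fdw) ever runs. `FileStatus` and `check_source_file` are hypothetical names. The sketch assumes the operator sees the same filesystem as the PostgreSQL server; note that `os.access` tests the permissions of the calling process, which may differ from those of the database server user that file_fdw runs as.

```python
import enum
import os


class FileStatus(enum.Enum):
    MISSING = "missing"        # not arrived yet: keep waiting
    UNREADABLE = "unreadable"  # exists, but permissions are wrong: fail loudly
    READY = "ready"            # exists and is readable by this process


def check_source_file(path: str) -> FileStatus:
    """Classify a source file before creating the foreign table (hypothetical helper)."""
    if not os.path.exists(path):
        return FileStatus.MISSING
    # Checked with this process's permissions, not the database server's.
    if not os.access(path, os.R_OK):
        return FileStatus.UNREADABLE
    return FileStatus.READY
```

Only `FileStatus.MISSING` would map to an unsuccessful poke; `UNREADABLE` is a genuine error that should fail the task rather than wait indefinitely.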

greenape mentioned this issue on Feb 8, 2024
mergify bot closed this as completed in #6448 on Feb 21, 2024