implement LoadInfo and ExtractInfo missing tracing #853

Open
rudolfix opened this issue Dec 23, 2023 · 0 comments
Labels
tech-debt Leftovers from previous sprint that should be fixed over time

Comments

@rudolfix
Collaborator

Background
We released refactored tracing in 0.4.1 that unified the shapes of data across pipeline steps. A few important features are still missing.

Tasks

LoadInfo (after #757)

    • the dataset name may be different for each load_id, so extend load info to keep this information
    • remove elapsed from LoadJobInfo. Instead, implement proper metrics for jobs generated in load.py. Initially we should record the start and stop times of each job. Mind that the metrics must be persisted, because this step is not atomic and may be restarted
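
A minimal sketch of what persisted job metrics could look like, assuming a JSON file kept next to the load package. `JobMetrics`, `PersistedJobMetrics` and the file layout are hypothetical names for illustration, not existing dlt code:

```python
import json
import os
import tempfile
import time
from typing import Dict, Optional, TypedDict


class JobMetrics(TypedDict):
    # Hypothetical shape, not dlt's actual metrics type.
    job_id: str
    started_at: float
    finished_at: Optional[float]


class PersistedJobMetrics:
    """Keeps per-job start/stop times in a JSON file so they survive a restart."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.metrics: Dict[str, JobMetrics] = {}
        # Reload whatever a previous (possibly interrupted) run has recorded.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.metrics = json.load(f)

    def _save(self) -> None:
        # Write to a temp file and rename it, so a crash never leaves a half-written file.
        dir_name = os.path.dirname(self.path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=dir_name)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.metrics, f)
        os.replace(tmp_path, self.path)

    def job_started(self, job_id: str) -> None:
        # Keep the original start time if the job was already seen before a restart.
        if job_id not in self.metrics:
            self.metrics[job_id] = {
                "job_id": job_id,
                "started_at": time.time(),
                "finished_at": None,
            }
            self._save()

    def job_finished(self, job_id: str) -> None:
        self.metrics[job_id]["finished_at"] = time.time()
        self._save()
```

Saving on every state change costs one small write per job, but a restarted load step can then pick up the original start times instead of resetting them.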

ExtractInfo (after #754)

    • resource arguments (if created with decorator) - we have that partially implemented!
    • source arguments (if created with decorator) - to be implemented
      We should be careful about what we send. Some of the arguments are secrets, so initially we should only send arguments that are typed and not typed as secrets. For any other argument we should only send the information that the argument was set.
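
The filtering rule above could be sketched roughly as follows. `SecretStr`, `SECRET_TYPES`, `traceable_arguments` and the `"<set>"` placeholder are hypothetical and only stand in for the library's own secret types and trace format:

```python
import inspect
from typing import Any, Callable, Dict, NewType

# Hypothetical secret marker; a real implementation would check the library's own secret types.
SecretStr = NewType("SecretStr", str)
SECRET_TYPES = (SecretStr,)


def traceable_arguments(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Dict[str, Any]:
    """Returns the arguments that are safe to put into a trace."""
    sig = inspect.signature(func)
    bound = sig.bind_partial(*args, **kwargs)
    result: Dict[str, Any] = {}
    for name, value in bound.arguments.items():
        annotation = sig.parameters[name].annotation
        is_typed = annotation is not inspect.Parameter.empty
        # Simplification: only a direct secret annotation is detected,
        # wrappers such as Optional[SecretStr] are not.
        if is_typed and annotation not in SECRET_TYPES:
            result[name] = value
        else:
            # Untyped or secret: record only that the argument was set.
            result[name] = "<set>"
    return result


def my_source(api_url: str, api_key: SecretStr, page_size=100):
    ...


traced = traceable_arguments(
    my_source, "https://example.com", SecretStr("s3cr3t"), page_size=50
)
```

Here `traced` keeps the value of `api_url` only. `api_key` (typed as a secret) and `page_size` (untyped) are both reduced to the placeholder, and arguments left at their defaults do not appear at all.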

Tests

    • test LoadInfo trace generation for the use_single_dataset pipeline setting, which will generate a separate dataset for each schema (we should move this config to the load config or destination config)
    • test persisted metrics in load: do they survive restarts?
    • better tests of ExtractInfo steps: we do not check the hints trace thoroughly enough
    • a proper test of the shape of the data, possibly using data contracts and pydantic (write a separate ticket!)
rudolfix added the tech-debt label Dec 23, 2023
Projects
Status: Todo