Implement ExtractInfo #754

Closed
rudolfix opened this issue Nov 10, 2023 · 0 comments

Background
The extract step does not generate any execution or tracing info; let's fix that. This should not be started before #757 is implemented and the common interface of all step infos is clear.

Tasks

    • Implement ExtractInfo with asdict and asstr methods, like the other infos. asdict must generate a dlt-friendly object that can be loaded into a relational structure; asstr must be friendly to our pipeline info command that displays traces.
    • The extract step may extract many sources. Each extracted source is a separate operation, and ExtractInfo must collect all of them.
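
The two tasks above could be sketched roughly as follows. This is a minimal illustration, not the dlt implementation: the `SourceExtractInfo` class, its fields, and the `verbosity` parameter are assumptions made for the example, and the common step-info interface from #757 may look different.

```python
import dataclasses
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SourceExtractInfo:
    """Hypothetical record describing one extracted source."""
    source_name: str
    resource_names: List[str] = field(default_factory=list)


@dataclass
class ExtractInfo:
    """Hypothetical container that collects one record per extracted source."""
    sources: List[SourceExtractInfo] = field(default_factory=list)

    def asdict(self) -> Dict[str, Any]:
        # Only dicts, lists and scalars are returned, so the result can be
        # normalized into parent and child tables when loaded.
        return {"sources": [dataclasses.asdict(s) for s in self.sources]}

    def asstr(self, verbosity: int = 0) -> str:
        # A short summary by default, one line per source when verbose.
        lines = [f"Extracted {len(self.sources)} source(s)"]
        if verbosity > 0:
            for s in self.sources:
                lines.append(f"- {s.source_name}: {', '.join(s.resource_names)}")
        return "\n".join(lines)
```

Keeping one record per source in a list means a single extract call over several sources still yields one ExtractInfo, and the list becomes a child table when the dict is loaded.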

Extract info should contain the following information for each source:

    • a DAG of selected resources (by name, since names are unique). Note that the source has a method that generates DAGs
    • for each source and resource, a list of applied hints
    • for each resource, collected metrics: the item count, the number of written bytes, the counts of extracted items per table plus bytes, and user-defined metrics (ticket incoming)
    • for each table: the list of extracted files and their sizes
    • elapsed time for each resource
    • resource state and source state (this should be optional, maybe controlled by extractor config)
    • resource arguments (if created with a decorator). We have that partially implemented!
    • source arguments (if created with a decorator). To be implemented
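
One possible shape for the per-source data listed above is sketched here. All class and field names are hypothetical, the DAG is assumed to be a plain list of (parent, child) resource-name pairs, and hints, state and arguments are left out to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TableMetrics:
    items_count: int = 0
    bytes_written: int = 0
    # extracted file path -> file size in bytes
    files: Dict[str, int] = field(default_factory=dict)


@dataclass
class ResourceMetrics:
    elapsed_seconds: float = 0.0
    # table name -> metrics for the items this resource wrote to that table
    tables: Dict[str, TableMetrics] = field(default_factory=dict)

    def record_file(self, table: str, path: str, items: int, size: int) -> None:
        """Account for one written file belonging to `table`."""
        t = self.tables.setdefault(table, TableMetrics())
        t.items_count += items
        t.bytes_written += size
        t.files[path] = t.files.get(path, 0) + size

    @property
    def items_count(self) -> int:
        # Resource totals are derived from the per-table counts.
        return sum(t.items_count for t in self.tables.values())

    @property
    def bytes_written(self) -> int:
        return sum(t.bytes_written for t in self.tables.values())


@dataclass
class ExtractedSourceInfo:
    source_name: str
    # DAG of selected resources as (parent_name, child_name) edges
    dag: List[Tuple[str, str]] = field(default_factory=list)
    # resource name -> collected metrics
    resources: Dict[str, ResourceMetrics] = field(default_factory=dict)
```

Deriving resource totals from the per-table numbers avoids storing the same count twice and keeps the two views consistent.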

Implementation
You must be able to get all the data above after execution is finished. You must also be able to collect partial info when an exception is raised during extract (the info is part of the exception).
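
A rough sketch of carrying partial info on the exception is shown below. The `ExtractFailed` exception, the `extract_all` function and the plain item-count dict are all hypothetical stand-ins; the real extract step and its exception types are not described in this issue.

```python
from typing import Any, Callable, Dict, Iterable


class ExtractFailed(Exception):
    """Hypothetical exception that carries whatever info was collected so far."""

    def __init__(self, partial_info: Dict[str, int], cause: BaseException) -> None:
        super().__init__(f"extract failed: {cause}")
        self.partial_info = partial_info


def extract_all(sources: Dict[str, Callable[[], Iterable[Any]]]) -> Dict[str, int]:
    """Count extracted items per source; attach partial counts on failure."""
    info: Dict[str, int] = {}
    for name, produce in sources.items():
        info[name] = 0
        try:
            for _ in produce():
                # Update the info as items arrive, so it is current at failure time.
                info[name] += 1
        except Exception as exc:
            raise ExtractFailed(info, exc) from exc
    return info
```

The info object is updated incrementally rather than assembled at the end, so whatever was extracted before the failure is still available to the caller via `partial_info`.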

Users can add custom metrics to resources, and these will be part of the extract info. Please read #755 for some implementation hints. We can collect metrics in an organized way even before custom metrics are added.
