Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backup tree walking using yield #941

Merged
merged 2 commits into from
Jun 24, 2024
Merged

Backup tree walking using yield #941

merged 2 commits into from
Jun 24, 2024

Conversation

gcalacoci
Copy link
Contributor

This is much a draft.

I've added a method walk_backups_tree that given any backup info allows to walk the tree of children and act on them starting from leaves.

let's assume this backups structure where names of the nodes are backup_id values:

            root_backup
                 /   \
            child1     child2
              /           \
         child1.1       child2.1
           /     \
   child1.1.1  child1.1.2

the method starting from the root_backup, should walk down until the leaves and then return up.
To put it simple, assuming to execute something like this:

for backup in backup_info.walk_backups_tree():
    print(backup.backup_id)

we should get as result:

child1.1.1
child1.1.2
child1.1
child1
child2.1
child2
root_backup

This should simplify the execution of actions like backup removal, allowing to walk bottom up the tree.

@andremagui
Copy link
Contributor

I have tried something like

def walk_backups_tree(self, visited: List[str], target: str):
    """
    Walk through all the children backups of the current incremental backup.
    :return: a generator of LocalBackupInfo objects for each child backup.
    """
    if self.children_backup_ids:
        for child_backup_id in self.children_backup_ids:
            if child_backup_id not in visited:
                backup_info = LocalBackupInfo(
                    self.server,
                    backup_id=child_backup_id,
                )
                yield from backup_info.walk_backups_tree(visited, target)
    visited.append(self.backup_id)
    if not self.parent_backup_id or self.backup_id == target:
        return visited
    parent_backup_info = self.get_parent_backup_info()
    if parent_backup_info:
        yield parent_backup_info.walk_backups_tree(visited, target)

We need to keep track of visited nodes, so i added a visited parameter for it that should be passed along the recursion ( the visited is the result of the walk through based on Depth-first search Post Order algo).
I added a target parameter just to stop the recursion at a node in the middle if this was used as the root to initiate the walk through although it is not necessary.
This seems to work but would like some opinions, suggestions, critics if you have so!
Also the way this would be used with other code might impact how this should be implemented.

@gcalacoci
Copy link
Contributor Author

this is an interesting take on the tree-walking method.
I have a question:

We need to keep track of visited nodes, so i added a visited parameter for it that should be passed along the recursion ( the visited is the result of the walk through based on Depth-first search Post Order algo). I added a target parameter just to stop the recursion at a node in the middle if this was used as the root to initiate the walk through although it is not necessary. This seems to work but would like some opinions, suggestions, critics if you have so! Also the way this would be used with other code might impact how this should be implemented.

Why do you want to keep track of the nodes?
The only scenario that I can think of, where the usage of a list of visited nodes makes sense is if we have a graph like this one:

           A
         /   \
        B     C
          \  /
           D
         /   \
        E     F

But that is not possible as an incremental backup can have a single parent only.
When talking of incremental backups we have a typical non-binary tree scenario, where each node has a single parent, and 0-N children like this one:

           A
         /   \
        B     C
          \ 
           D
         /   \
        E     F

(please forgive my ascii-art skills... but I think you got the idea)

Using the walk_backups_tree method on A is going to have the following outcome:
walk B > walk D > wak E > YIELD E > walk F > YELD F > YELD D > YELD B > walk C > YELD C > YIELD A

Where walk means the usage of yeld from that is not visible outside of the walk_backups_tree method chain of calls.. while YIELD is what actually returns something and triggers the next step of the generator

@andremagui
Copy link
Contributor

andremagui commented Jun 21, 2024

Actually the visited name for the variable was unfortunate! It should be backups = [] like yours there because it is the actual result of the walk through! But i got your point from the comments! I forgot about iteration of the method to get the result. This looks good

@gcalacoci gcalacoci marked this pull request as ready for review June 21, 2024 13:58
@gcalacoci gcalacoci requested a review from a team as a code owner June 21, 2024 13:58
@gcalacoci gcalacoci force-pushed the dev/bar-182 branch 2 times, most recently from c1e0406 to 025d499 Compare June 24, 2024 09:14
Copy link
Contributor

@barthisrael barthisrael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a commit with docstring suggestions.
The PR looks good, I only have a small comment about the walk_to_root method.

Comment on lines +904 to +912
backup_info = self.get_parent_backup_info()
while backup_info:
yield backup_info
backup_info = backup_info.get_parent_backup_info()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we yield self as well?
I mean, just so we make both methods "compatible" in terms of what is returned.
Also, it may make it easier to build the pg_combinebackup command later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

depends on the behaviour we want... my rationale was:
"I already know the object that is self, so I start returning objects from the first parent."
I see nothing wrong in yielding self too... is just that in my mind were were more often going to ignore it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way works for me, actually.
I think we can keep it as it is, and later change if we think this is necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

Copy link
Contributor

@barthisrael barthisrael left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

This patch adds methods to walk a tree of incremental backups.

The `walk_backups_tree` method, given a node as root, traverses the tree
in post-order identifying leaves first and nodes parent of the leaves
after. This is useful for actions like: removal of a full chain of
backups. The method is implemented using yield statements, that allow to
iterate through the chain of nodes

The `walk_to_root` method allows to reach the parent full backup of a
given backup node of an incremental backups tree. The method is
implemented using yield statements allowing to cycle from the node until
the root.

References: bar-182

Signed-off-by: Giulio Calacoci <giulio.calacoci@enterprisedb.com>
Signed-off-by: Israel Barth Rubio <israel.barth@enterprisedb.com>
Test `walk_to_root` and `walk_backups_tree` methods of the BackupInfo
class

References: bar-182

Signed-off-by: Giulio Calacoci <giulio.calacoci@enterprisedb.com>
Signed-off-by: Israel Barth Rubio <israel.barth@enterprisedb.com>
@gcalacoci gcalacoci merged commit 19ca909 into master Jun 24, 2024
7 of 8 checks passed
@gcalacoci gcalacoci deleted the dev/bar-182 branch June 24, 2024 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants