# Method B -- Diffs

This demo restores a past state by tracking the diffs. Preserving metadata is out of scope, though it could easily be implemented.

**Efficiency is not a priority**. This can be optimized later.

Algorithm:

1. Load the desired state with hashes.
2. Load all of the diffs. Walk forward on each file until it "terminates" (is deleted or modified), tracking renames along the way.

After that, you can transfer the files. Note that the transfer can be optimized, but for now we just do it naively (in another file).
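The walk-forward step can be sketched on in-memory data before touching any real logs. This is a minimal sketch, assuming each diff is a dict with `'modified'` and `'deleted'` lists of paths and a `'renamed'` list of `[src, dst]` pairs (the keys the tracking code below uses), and that backups live under `back/<state name>/`. The function name `track` and the toy states `s2`/`s3` are hypothetical.

```python
def track(files, diffs):
    """Map each file in the desired state to where its content lives now.

    files: paths present in the desired state.
    diffs: list of (state_name, diff) in chronological order, starting
           with the state right after the desired one.
    Returns a list of (source, restore_path) tuples.
    """
    out = []
    for original in files:
        name = original  # current name; may change through renames
        for state_name, diff in diffs:
            if name in diff['modified'] or name in diff['deleted']:
                # The old version was backed up when this state was taken
                out.append((f'back/{state_name}/{name}', original))
                break
            for src, dst in diff['renamed']:
                if src == name:
                    name = dst
                    break
        else:
            # Never modified or deleted afterwards: still in the current tree
            out.append((f'curr/{name}', original))
    return out


# Hypothetical history: a.txt is modified in s2, b.txt is renamed to c.txt
# in s2 and then deleted in s3, d.txt is never touched.
diffs = [
    ('s2', {'modified': ['a.txt'], 'deleted': [], 'renamed': [['b.txt', 'c.txt']]}),
    ('s3', {'modified': [], 'deleted': ['c.txt'], 'renamed': []}),
]
print(track(['a.txt', 'b.txt', 'd.txt'], diffs))
# [('back/s2/a.txt', 'a.txt'), ('back/s3/c.txt', 'b.txt'), ('curr/d.txt', 'd.txt')]
```

Note how `b.txt` is found under its *later* name, `c.txt`: the backup path always uses whatever name the file had when it terminated.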

In [None]:
import subprocess
import shutil
from pathlib import Path
import os,sys
import json
import gzip as gz
from collections import OrderedDict

In [None]:
if (abspath := os.path.abspath('../')) not in sys.path:
    sys.path.insert(0,abspath)
import rirb

## Local Copy of Logs

We have direct access to the logs here, but let's assume we don't and copy them to a local directory with rclone.

In [None]:
dst = '../tests/testdirs/restore_poc/dst/' # Your rclone remote, including the ':' if needed
loclogs = Path('DEST/logs') # Should be LOCAL

shutil.rmtree(loclogs, ignore_errors=True)  # Start from a clean local copy
loclogs.mkdir(parents=True, exist_ok=False)
(loclogs / '.ignore').touch()

In [None]:
cmd = ['rclone', 'copy',
       rirb.utils.pathjoin(dst, 'logs'), str(loclogs),
       '--exclude', 'log.log']
subprocess.check_call(cmd)  # Raise if rclone fails instead of continuing silently

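Load the states, optionally dropping the earlier ones we don't need.

For this demo, we want to restore state 1 (0-based), so the earlier states don't matter.

Make sure to handle timezones: the state directories are named by timestamps that carry a UTC offset, so sort them by their Unix time, not by name.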
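To see why sorting by name is not enough, compare a lexical sort with a chronological one. The two RFC 3339 timestamps below are hypothetical; the point is only that a string comparison ignores the offset.

```python
from datetime import datetime

# Two hypothetical backup times written with different UTC offsets
names = ['2022-12-17T18:30:00-07:00',   # 01:30 UTC on Dec 18
         '2022-12-18T00:00:00+00:00']   # 00:00 UTC on Dec 18

lexical = sorted(names)
chronological = sorted(names, key=lambda s: datetime.fromisoformat(s).timestamp())

print(lexical[0])        # the -07:00 time sorts first as a string...
print(chronological[0])  # ...but the +00:00 time actually happened first
```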

In [None]:
def rirb_timestamp_to_unix(timestr):
    """
    Convert from the timestamps used in rirb to Unix time.
    
    Leverages rirb.utils.RFC3339_to_unix by first converting the
    string to the RFC 3339 format used by rclone:
        Input:        2022-12-17T183002.859829-0700
        rclone style: 2022-12-17T18:30:02.859829-07:00
        Unix:         1671327002.859829
    """
    date,time = timestr.split('T')
    time = f'{time[:2]}:{time[2:4]}:{time[4:6]}{time[6:-2]}:{time[-2:]}'
    return rirb.utils.RFC3339_to_unix(f'{date}T{time}')
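If you want to cross-check the conversion without going through `rirb.utils`, the standard library can parse the rirb format directly, since `strptime`'s `%z` accepts offsets like `-0700`. This is a sketch under one assumption: the fractional-seconds part is always present (the format string rejects timestamps without it). The name `rirb_timestamp_to_unix_stdlib` is hypothetical.

```python
from datetime import datetime


def rirb_timestamp_to_unix_stdlib(timestr):
    """Parse a rirb timestamp like '2022-12-17T183002.859829-0700'.

    %z handles the '-0700' offset, so the result is timezone-aware and
    .timestamp() returns true Unix time. Assumes fractional seconds exist.
    """
    return datetime.strptime(timestr, '%Y-%m-%dT%H%M%S.%f%z').timestamp()


# Should agree with the docstring example above (up to float rounding)
print(rirb_timestamp_to_unix_stdlib('2022-12-17T183002.859829-0700'))
```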

In [None]:
states = sorted((d for d in loclogs.iterdir() if d.is_dir()),
                key=lambda p:rirb_timestamp_to_unix(p.name))
DESIRED = 1
states

In [None]:
# Optional: skips loading states we don't need; the result is the same
# states = states[DESIRED:]
# DESIRED = 0 # Reset since we truncated

## Load the desired state

In [None]:
with gz.open(states[DESIRED] / 'curr.json.gz') as cfile:
    files = list(json.load(cfile).keys())  # Keys are the file paths in that state

## Load the diffs

Order matters, so load them chronologically, starting from the state right after the desired one (`DESIRED+1`).

In [None]:
diffs = OrderedDict()
for state in states[DESIRED+1:]:
    with gz.open(state / 'diffs.json.gz') as fobj:
        diffs[state] = json.load(fobj)
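The loop above relies on `gzip.open` defaulting to binary mode and on `json.load` accepting bytes. A self-contained round trip shows this works. Only the keys used by the tracking loop below (`'modified'`, `'deleted'`, `'renamed'`) are assumed; the real rirb diff files may contain more, and the paths here are made up.

```python
import gzip
import json
import tempfile
from pathlib import Path

# Hypothetical diff in the shape the tracking loop expects
diff = {'modified': ['a.txt'],
        'deleted': ['old/b.txt'],
        'renamed': [['c.txt', 'new/c.txt']]}  # [src, dst] pairs

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'diffs.json.gz'
    # Write as text so json.dump can emit str
    with gzip.open(path, 'wt', encoding='utf8') as fobj:
        json.dump(diff, fobj)
    # Read back in the default binary mode; json.load decodes the bytes
    with gzip.open(path) as fobj:
        loaded = json.load(fobj)

print(loaded == diff)  # True
```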

In [None]:
diffs

## Track each file

In [None]:
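transfers = []
for file0 in files:
    file = file0 # Current name; may change with renames

    for state, diff in diffs.items():
        backpath = Path('back') / state.name
        if file in diff['modified'] + diff['deleted']:
            # The version we want was backed up when this state was made
            source = backpath / file
            break
        # Don't reuse `dst` here; it would clobber the remote defined above
        for rename_src, rename_dst in diff['renamed']:
            if rename_src == file:
                file = rename_dst
                break # A file is renamed at most once per state
    else:
        # Never modified or deleted afterwards: still in the current tree
        source = Path('curr') / file

    transfers.append((source, file0)) # (where it is now, where it is restored to)

transfers = [(str(a), str(b)) for a, b in transfers]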

In [None]:
with open('transfer_B_tracking.json', 'wt', encoding='utf8') as f:  # utf8 since ensure_ascii=False
    json.dump(transfers, f, indent=1, ensure_ascii=False)
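Since the transfer itself happens in another file, here is one hypothetical way the saved list could be organized to make that step cheaper: group the pairs by source directory (`curr` or `back/<state>`) so each group can be copied in one batch. This is a sketch, not the method used in the other file; `group_by_source_dir` is a made-up name, and it assumes POSIX-style `/` separators in the saved paths (which `str(Path(...))` produces on Linux and macOS but not on Windows).

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_source_dir(transfers):
    """Group (source, restore_path) pairs by their source directory.

    'curr/x'         -> key 'curr',         relative path 'x'
    'back/<state>/x' -> key 'back/<state>', relative path 'x'
    """
    groups = defaultdict(list)
    for src, restore in transfers:
        parts = PurePosixPath(src).parts
        n = 1 if parts[0] == 'curr' else 2  # how many leading parts form the key
        key = '/'.join(parts[:n])
        rel = '/'.join(parts[n:])
        groups[key].append((rel, restore))
    return dict(groups)


# Hypothetical transfer list in the same (source, restore_path) format
example = [('back/s2/a.txt', 'a.txt'),
           ('back/s3/c.txt', 'b.txt'),
           ('curr/d.txt', 'd.txt')]
print(group_by_source_dir(example))
```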