# Method A -- Hashes

This demo restores files based on their hashes. Preserving metadata is out of scope, though it could easily be added.

Algorithm:

1. Load desired state with hashes
2. Using every `backed_up_files.json.gz` and the final `curr.json.gz`, build a mapping from hash to file path.
    - The order doesn't *really* matter, but work backwards (newest state first) so that when a hash appears more than once, the oldest copy is the one kept.
3. Use the map to build the list of transfers.
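The three steps can be sketched on made-up, in-memory data (no gzip or rclone). This is a minimal illustration, not the notebook's code. It assumes that each log maps a relative filename to `{"Hashes": {"sha1": ...}}`, which is how the cells below read them. It also assumes that a state's `backed_up_files.json.gz` lists the versions replaced at that state, which is an assumption about rirb's layout. The helper names `build_hash_map`, `build_transfers`, and `entry` are hypothetical.

```python
# Toy version of Method A on in-memory dicts.
# ASSUMPTION: log entries look like {"Hashes": {"sha1": ...}} and a state's
# backed-up files are the versions that were replaced at that state.

def build_hash_map(latest_curr, backs_oldest_first):
    """Map sha1 -> source path; on duplicate hashes the oldest copy wins."""
    hashes = {}
    for filename, data in latest_curr.items():
        hashes[data["Hashes"]["sha1"]] = f"curr/{filename}"
    # Walk newest -> oldest; later assignments overwrite earlier ones.
    for state_name, files in reversed(backs_oldest_first):
        for filename, data in files.items():
            hashes[data["Hashes"]["sha1"]] = f"back/{state_name}/{filename}"
    return hashes


def build_transfers(desired_curr, hashes):
    """Return (SRC, DST) pairs that recreate the desired state."""
    return [(hashes[data["Hashes"]["sha1"]], filename)
            for filename, data in desired_curr.items()]


def entry(sha1):
    """Shorthand for a single log entry (hypothetical helper)."""
    return {"Hashes": {"sha1": sha1}}


# Toy history: state s1 replaced a.txt (old copy h1), state s2 replaced b.txt (old copy h3)
latest_curr = {"a.txt": entry("h2"), "b.txt": entry("h4")}
backs = [("s1", {"a.txt": entry("h1")}),
         ("s2", {"b.txt": entry("h3")})]
desired_curr = {"a.txt": entry("h2"), "b.txt": entry("h3")}  # contents at s1

toy_hashes = build_hash_map(latest_curr, backs)
print(build_transfers(desired_curr, toy_hashes))
# [('curr/a.txt', 'a.txt'), ('back/s2/b.txt', 'b.txt')]
```

In this example `b.txt` must come from `back/s2/`, because the version that existed at `s1` was replaced at `s2`.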

After that, you can transfer the files. The transfer could be optimized, but for now we do it naively (in another file).
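The actual transfer step lives in another file. As a hedged illustration of the naive approach, the sketch below does one copy per `(SRC, DST)` pair. It assumes the backup is reachable as a local directory laid out with `curr/` and `back/<state>/`. `run_transfers`, `backup_root`, and `restore_root` are hypothetical names. It uses `shutil.copyfile`, which copies contents only, consistent with metadata being out of scope.

```python
import shutil
from pathlib import Path


def run_transfers(transfers, backup_root, restore_root):
    """Naively copy each (SRC, DST) pair, one file at a time.

    ASSUMPTION: backup_root is a local directory with curr/ and back/<state>/
    subdirectories; restore_root is where the desired state is recreated.
    Both parameters are hypothetical.
    """
    backup_root, restore_root = Path(backup_root), Path(restore_root)
    for src, dst in transfers:
        target = restore_root / dst
        target.parent.mkdir(parents=True, exist_ok=True)  # create subdirs as needed
        shutil.copyfile(backup_root / src, target)  # contents only, no metadata
```

One obvious optimization is to copy each distinct source once and duplicate it locally, since several destinations can share a hash.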

```python
import gzip as gz
import json
import os
import shutil
import subprocess
import sys
from pathlib import Path
```

```python
# Make the parent directory importable so `rirb` resolves to the local checkout
if (abspath := os.path.abspath('../')) not in sys.path:
    sys.path.insert(0, abspath)
import rirb
```

## Local Copy of Logs

In this demo we have direct access to the logs, but let's assume we don't and copy them to a local directory first.

```python
dst = '../tests/testdirs/restore_poc/dst/'  # Your rclone remote, including the ':' if needed
loclogs = Path('DEST/logs')  # Must be a LOCAL path

# Start from an empty local copy of the logs
shutil.rmtree(loclogs, ignore_errors=True)
loclogs.mkdir(parents=True, exist_ok=False)
(loclogs / '.ignore').touch()
```

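```python
# Copy the remote logs locally, skipping log.log
cmd = ['rclone', 'copy',
       rirb.utils.pathjoin(dst, 'logs'), str(loclogs),
       '--exclude', 'log.log']
subprocess.check_call(cmd)  # raise if rclone fails instead of silently continuing
```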

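Load the states, sorted chronologically, and optionally drop the early ones we don't care about.

For this demo we want to restore state 1 (0-based), so the earlier states are irrelevant.

The state directory names are timestamps with UTC offsets. Sort on the converted Unix time rather than on the raw names so that timezones are handled correctly.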
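The example below shows why sorting the raw names is not enough. It compares two made-up directory names with different offsets. `parse_rirb_timestamp` is a hypothetical stdlib-only stand-in; the notebook itself converts via `rirb.utils.RFC3339_to_unix`.

```python
from datetime import datetime


def parse_rirb_timestamp(timestr):
    """Parse e.g. '2022-12-17T183002.859829-0700' into an aware datetime.

    Hypothetical stdlib-only helper; fractional seconds are optional.
    """
    fmt = "%Y-%m-%dT%H%M%S.%f%z" if "." in timestr else "%Y-%m-%dT%H%M%S%z"
    return datetime.strptime(timestr, fmt)


names = ["2022-12-17T183002-0700",   # 2022-12-18 01:30:02 UTC
         "2022-12-18T003000+0000"]   # 2022-12-18 00:30:00 UTC

print(sorted(names))                            # string order (wrong)
# ['2022-12-17T183002-0700', '2022-12-18T003000+0000']
print(sorted(names, key=parse_rirb_timestamp))  # chronological order
# ['2022-12-18T003000+0000', '2022-12-17T183002-0700']
```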

```python
def rirb_timestamp_to_unix(timestr):
    """
    Convert from the timestamps used in rirb to Unix time.

    Leverages rirb.utils.RFC3339_to_unix by first converting the
    string to the RFC 3339 form used by rclone:
        Input:        2022-12-17T183002.859829-0700
        rclone style: 2022-12-17T18:30:02.859829-07:00
        Unix:         1671327002.859829
    """
    date, time = timestr.split('T')
    # HHMMSS[.ffffff]+HHMM -> HH:MM:SS[.ffffff]+HH:MM
    time = f'{time[:2]}:{time[2:4]}:{time[4:6]}{time[6:-2]}:{time[-2:]}'
    return rirb.utils.RFC3339_to_unix(f'{date}T{time}')
```

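```python
# State directories, oldest first (sorted by actual time, not by name)
states = sorted((d for d in loclogs.iterdir() if d.is_dir()),
                key=lambda p: rirb_timestamp_to_unix(p.name))
DESIRED = 1  # index of the state to restore
states
```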

```python
# Optional, just to be faster; it doesn't change the result
# states = states[DESIRED:]
# DESIRED = 0  # Reset since we truncated
```

```python
states
```

## Build a Mapping of Hash to File Path

```python
hashes = {}  # sha1 -> source path
```

Start with the latest state's `curr.json.gz`.

```python
# Files currently in curr/, according to the latest state
state = states[-1]
with gz.open(state / 'curr.json.gz') as cfile:
    files = json.load(cfile)
for filename, data in files.items():
    hashes[data['Hashes']['sha1']] = os.path.join('curr', filename)
```
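Reading a `*.json.gz` log into `{sha1: path}` is the same pattern for `curr.json.gz` and for every `backed_up_files.json.gz`, so it could be factored into a helper. This is a sketch: `load_sha1s` and its `prefix` parameter are hypothetical. It assumes the log is a JSON object mapping relative filenames to entries with `["Hashes"]["sha1"]`, as the cells here read it.

```python
import gzip
import json
from pathlib import Path


def load_sha1s(logfile, prefix):
    """Return {sha1: 'prefix/filename'} for every entry in a *.json.gz log.

    ASSUMPTION: the log maps relative filenames to dicts that contain
    ["Hashes"]["sha1"], matching how this notebook reads rirb's logs.
    """
    with gzip.open(logfile) as fobj:  # binary mode; json.load accepts bytes
        files = json.load(fobj)
    return {data["Hashes"]["sha1"]: f"{prefix}/{name}"
            for name, data in files.items()}
```

With it, the cell above would reduce to something like `hashes.update(load_sha1s(states[-1] / 'curr.json.gz', 'curr'))`.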

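Now add the backed-up files, walking from the newest state to the oldest so that older copies overwrite newer ones for duplicate hashes.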

```python
for state in states[::-1]:  # newest -> oldest
    backfile = state / 'backed_up_files.json.gz'
    if not backfile.exists():  # nothing was backed up in this state
        continue
    backpath = Path('back') / state.name

    with gz.open(backfile) as bfile:
        files = json.load(bfile)

    for filename, data in files.items():
        hashes[data['Hashes']['sha1']] = str(backpath / filename)
```

```python
hashes
```

## Load the Desired State

```python
# Desired state: filename -> sha1
with gz.open(states[DESIRED] / 'curr.json.gz') as cfile:
    files = json.load(cfile)
statefiles = {file: data['Hashes']['sha1'] for file, data in files.items()}
```

```python
# One (SRC, DST) pair per file in the desired state.
# Raises KeyError if a hash has no known source.
transfers = []
for statefile, statehash in statefiles.items():
    transfers.append((hashes[statehash], statefile))
```
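If a hash in the desired state has no known source (for example, because of a truncated log), the loop above stops at the first missing file. A hypothetical variant, `build_transfers_checked`, collects every missing file instead, so all gaps can be reported at once. The toy inputs are made up.

```python
def build_transfers_checked(statefiles, hashes):
    """Return (SRC, DST) pairs plus the files whose hash has no known source.

    Hypothetical variant of the loop above; it never raises KeyError.
    """
    transfers, missing = [], []
    for statefile, statehash in statefiles.items():
        src = hashes.get(statehash)
        if src is None:
            missing.append(statefile)
        else:
            transfers.append((src, statefile))
    return transfers, missing


demo_transfers, demo_missing = build_transfers_checked(
    {"a.txt": "h1", "b.txt": "h9"}, {"h1": "curr/a.txt"})
print(demo_transfers)  # [('curr/a.txt', 'a.txt')]
print(demo_missing)    # ['b.txt']
```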


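```python
# ensure_ascii=False writes non-ASCII filenames as-is, so pin the encoding
with open('transfer_A_hashes.json', 'wt', encoding='utf-8') as f:
    json.dump(transfers, f, indent=1, ensure_ascii=False)
```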
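JSON has no tuple type, so whatever reads `transfer_A_hashes.json` back gets each `(SRC, DST)` pair as a two-element list. The in-memory round trip below uses made-up pairs to show this. It also shows that `ensure_ascii=False` keeps non-ASCII names readable in the output.

```python
import json

pairs = [("curr/a.txt", "a.txt"), ("back/s2/café.txt", "café.txt")]
text = json.dumps(pairs, indent=1, ensure_ascii=False)

print("café" in text)     # True: written as-is, not as \u00e9
loaded = json.loads(text)
print(loaded[0])          # ['curr/a.txt', 'a.txt']
restored = [tuple(p) for p in loaded]
print(restored == pairs)  # True
```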

```python
transfers
```