<h1>Data mesh to data mesh copy</h1>

This notebook copies data from one data mesh container to another.

It relies on the [AzCopy](https://docs.azure.cn/en-us/storage/common/storage-ref-azcopy-copy) tool, which is already installed on the VMs. The notebook calls it through the wrapper script ```azcopy-script.sh```.

The best performance (2 Gbit/s) is achieved on a 16-core compute instance. The copy is not memory intensive, so the number of cores matters more than the amount of RAM.

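To get a feel for what 2 Gbit/s means in practice, the sketch below estimates how long a copy takes for a given data size. The helper name `estimate_transfer_seconds` and the 1 TB example are illustrative assumptions, and the estimate assumes the peak throughput is sustained for the whole transfer, which may not hold for many small files.

```python
def estimate_transfer_seconds(size_bytes: int, throughput_gbits_per_s: float = 2.0) -> float:
    """Estimate the copy duration in seconds (hypothetical helper).

    Assumes the given throughput is sustained for the whole transfer.
    """
    if throughput_gbits_per_s <= 0:
        raise ValueError("throughput must be positive")
    size_bits = size_bytes * 8
    return size_bits / (throughput_gbits_per_s * 1e9)


# Example: 1 TB (10**12 bytes) at the 2 Gbit/s quoted above.
seconds = estimate_transfer_seconds(10**12)
print(f"{seconds:.0f} s, about {seconds / 3600:.2f} h")
```

Under these assumptions, 1 TB takes roughly 4000 seconds, a little over an hour.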

Define a helper class
---

In [None]:
class ContainerCopyParam:
    """Parameters describing one container-to-container copy."""

    src_container: str
    dst_container: str
    src_path: str
    dst_path: str | None
    src_storage_account: str | None
    dst_storage_account: str | None

    def __init__(
        self,
        src_container: str,
        dst_container: str,
        src_path: str,
        dst_path: str | None = None,
        src_storage_account: str | None = None,
        dst_storage_account: str | None = None,
    ):
        self.src_container = src_container
        self.dst_container = dst_container
        self.src_path = src_path
        self.dst_path = dst_path
        self.src_storage_account = src_storage_account
        self.dst_storage_account = dst_storage_account

    def get_dst_path(self) -> str:
        """Return the destination path, falling back to the source path."""
        if self.dst_path is None:
            return self.src_path
        return self.dst_path

    def __repr__(self):
        return f'ContainerCopyParam({self.src_container}, {self.dst_container}, {self.src_path}, {self.dst_path}, {self.src_storage_account}, {self.dst_storage_account})'

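As an aside, the same parameter holder could be written more compactly as a dataclass. The following is a minimal sketch, not the notebook's own class: the name `CopyParam` and the container and path values are made up for illustration. It also shows the destination path falling back to the source path when none is given.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CopyParam:
    """Hypothetical dataclass equivalent of ContainerCopyParam."""

    src_container: str
    dst_container: str
    src_path: str
    dst_path: Optional[str] = None
    src_storage_account: Optional[str] = None
    dst_storage_account: Optional[str] = None

    def get_dst_path(self) -> str:
        # Fall back to the source path when no destination path is given.
        return self.src_path if self.dst_path is None else self.dst_path


# Illustrative values only.
same_path = CopyParam("raw", "curated", "sales/2024")
other_path = CopyParam("raw", "curated", "sales/2024", dst_path="archive/2024")

print(same_path.get_dst_path())
print(other_path.get_dst_path())
```

The dataclass generates `__init__` and `__repr__` automatically, so only the fallback logic needs to be written by hand.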

Define parameters
---

In [None]:
log_level = "ERROR"

src_storage_account = "<default-source-account>"
dst_storage_account = "<default-destination-account>"
azcopy_script_path = "<absolute-path-to-the-directory-containing-azcopy-script>"


containers_to_copy = [
    # Uses the default storage accounts defined above
    ContainerCopyParam(
        src_container="<source-container>",
        dst_container="<destination-container>",
        src_path="<source-path>",
        dst_path="<destination-path>",
    ),
    # Overrides the default storage accounts for this copy only
    ContainerCopyParam(
        src_storage_account="<source-account>",
        dst_storage_account="<destination-account>",
        src_container="<another-source-container>",
        dst_container="<another-destination-container>",
        src_path="<source-path>",
        dst_path="<destination-path>",
    ),
]


print(containers_to_copy)

Launch container copy
---

In [None]:
import subprocess

concurrency_value = 4000


for container in containers_to_copy:
    print("")
    print("-------------------------------------------------------------------------------------------")
    print(f"transfer: {container.src_container}")

    # Fall back to the default storage accounts when none is set on the container
    src_storage_account_to_use = container.src_storage_account or src_storage_account
    dst_storage_account_to_use = container.dst_storage_account or dst_storage_account

    parameters = [
        "--src-container", container.src_container,
        "--dst-container", container.dst_container,
        "--src-path", container.src_path,
        "--dst-path", container.get_dst_path(),
        "--src-storage-account", src_storage_account_to_use,
        "--dst-storage-account", dst_storage_account_to_use,
        "--log-level", log_level,
        "--concurrency-value", str(concurrency_value),
    ]

    # Start the subprocess
    process = subprocess.Popen([f"{azcopy_script_path}/azcopy-script.sh"] + parameters)

    # Get the PID
    pid = process.pid
    print(f"Process ID: {pid}")

    # Wait for the process to complete
    process.wait()

    # Get the status code
    status_code = process.returncode
    print(f"Status code: {status_code}")

    if status_code != 0:
        raise ValueError(f"Transfer of {container.src_container} failed, see logs for details")

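The command assembly in the loop above can also be pulled into a small function, which makes it possible to inspect the exact command line before launching a long transfer. This is only a sketch: `build_copy_command` and `run_copy` are hypothetical helpers, the account, container and path values are placeholders, and the flags are the ones passed to `azcopy-script.sh` in the cell above.

```python
import subprocess
from typing import List, Optional


def build_copy_command(
    script_dir: str,
    src_container: str,
    dst_container: str,
    src_path: str,
    dst_path: Optional[str],
    src_account: str,
    dst_account: str,
    log_level: str = "ERROR",
    concurrency_value: int = 4000,
) -> List[str]:
    """Assemble the argument list for azcopy-script.sh (hypothetical helper)."""
    return [
        f"{script_dir}/azcopy-script.sh",
        "--src-container", src_container,
        "--dst-container", dst_container,
        "--src-path", src_path,
        # Same fallback as get_dst_path(): reuse the source path if unset.
        "--dst-path", dst_path if dst_path is not None else src_path,
        "--src-storage-account", src_account,
        "--dst-storage-account", dst_account,
        "--log-level", log_level,
        "--concurrency-value", str(concurrency_value),
    ]


def run_copy(command: List[str]) -> None:
    """Run one copy and raise if the script exits with a non-zero status."""
    completed = subprocess.run(command)
    if completed.returncode != 0:
        raise RuntimeError(f"Transfer failed with status {completed.returncode}")


# Dry run with placeholder values: print the command instead of executing it.
command = build_copy_command(
    "/opt/scripts", "raw", "curated", "sales/2024", None, "srcaccount", "dstaccount"
)
print(" ".join(command))
```

Calling `run_copy(command)` would then start the transfer and block until the script returns, much like `Popen` followed by `wait()` in the loop above.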