# FTP SERVER SYNC (Under Construction)

**Author: Xuanhe Chen**

This notebook contains an FTP server synchronization script that recursively downloads and uploads updated files between a local cluster and the FTP server. The script exists because the CU cluster firewall does not allow connections through an HTTP proxy, which makes real-time data transfer difficult, so we need an indirect way to stay synchronized, such as manually repeating the upload and download daily.

The recursive download and upload functions below are finished (the overall synchronization is still in design); they can be used to upload or download all data in a folder.

### Download
The code below recreates the full remote path of the directory you choose (starting from the remote root) under the local directory you choose, creating any parent folders that do not exist, and downloads every file under the remote directory into it. You can choose to overwrite everything or to download only files that are not yet present locally.
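
As a rough illustration of the layout described above, the sketch below computes where a remote file would land locally and whether it would be fetched. The helper names `local_path_for` and `should_download` are hypothetical and not part of this notebook's code; the sketch only assumes that the local path is the local dir followed by the full remote path with its leading slash removed.

```python
import os
import posixpath

def local_path_for(remote_dir, file_name, local_dir):
    """Hypothetical helper: map a remote file to its local destination."""
    # Strip the surrounding "/" so the remote path nests under local_dir
    relative = posixpath.join(remote_dir.strip("/"), file_name)
    return posixpath.join(local_dir, relative)

def should_download(local_path, overwrite):
    """Hypothetical helper: fetch when overwriting or when the file is missing."""
    return overwrite or not os.path.exists(local_path)

# Made-up paths for illustration only
print(local_path_for("/projects/data/", "a.txt", "/tmp/mirror/"))
```

With `overwrite=False`, a file that already exists at the computed local path would be skipped.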

In [2]:
# Code modified from : https://gist.github.com/Jwely/ad8eb800bacef9e34dd775f9b3aad987

import ftplib
import os
import re
from ftplib import FTP_TLS

def _is_ftp_dir(ftp_handle, name, guess_by_extension=True):
    """ simply determines if an item listed on the ftp server is a valid directory or not """

    # guess if the name is a file by checking if it contains '.' in it
    if guess_by_extension is True:
        if '.' in name:
            return False

    original_cwd = ftp_handle.pwd()  # remember the current working directory
    try:
        ftp_handle.cwd(name)  # try to set directory to new name
        ftp_handle.cwd(original_cwd)  # set it back to what it was
        return True

    except ftplib.error_perm as e:
        print(e)
        return False

    except Exception as e:
        print(e)
        return False
    
def _make_parent_dir(fpath, local_dir):
    """ ensures the parent directory of a filepath exists """
    dirname = os.path.dirname(fpath)
    target = local_dir + dirname
    if not os.path.exists(target):
        # makedirs creates every missing intermediate folder in one call,
        # so no retry loop or recursion is needed
        os.makedirs(target, exist_ok=True)
        print("created {0}".format(dirname))

def _download_ftp_file(ftp_handle, file_name, local_dir, overwrite):
    """ downloads a single file from an ftp server """
    dest = ftp_handle.pwd() + "/"
    localfilename = local_dir + dest.lstrip("/") + file_name  # What we will name the file locally
    _make_parent_dir(dest.lstrip("/"), local_dir)
    if not os.path.exists(localfilename) or overwrite is True:
        try:
            # use the handle passed in (not a global) and close the file even on errors
            with open(localfilename, 'wb') as localfile:
                ftp_handle.retrbinary(f"RETR {file_name}", localfile.write, 1024)
            print("downloaded: {0}".format(dest) + file_name)
            print("AS: " + localfilename)
        except FileNotFoundError:
            print("FAILED: {0}".format(dest))
    else:
        print("already exists: {0}".format(dest) + file_name)
        
def _file_name_match_patern(pattern, name):
    """ returns True if filename matches the pattern"""
    if pattern is None:
        return True
    else:
        return bool(re.match(pattern, name))

def _mirror_ftp_dir(ftp_handle, remote_dir, local_dir, overwrite, guess_by_extension, pattern):
    """ replicates a directory on an ftp server recursively """
    original_cwd = ftp_handle.pwd()
    ftp_handle.cwd(remote_dir)
    for item in ftp_handle.nlst(remote_dir):
        if _is_ftp_dir(ftp_handle, item, guess_by_extension):
            _mirror_ftp_dir(ftp_handle, item, local_dir, overwrite, guess_by_extension, pattern)
        else:
            # keep only the last path component ([-1] also works when the name has no '/')
            file_name = item.rsplit('/', 1)[-1]
            # match the pattern against the file name, not the directory
            if _file_name_match_patern(pattern, file_name):
                _download_ftp_file(ftp_handle, file_name, local_dir, overwrite)
            # files that do not match the pattern are quietly skipped
    ftp_handle.cwd(original_cwd)
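
The `guess_by_extension` shortcut saves a round trip to the server but is only a heuristic. The sketch below isolates the same dot check in a hypothetical helper, `looks_like_file` (not part of this notebook), and uses made-up names to show where it would misjudge: a directory with a dot in its name would be treated as a file. Passing `guess_by_extension=False` avoids this at the cost of a `cwd` probe for every item.

```python
def looks_like_file(name):
    """Hypothetical helper mirroring the dot check in _is_ftp_dir."""
    return "." in name

# Made-up names for illustration only
for name in ["test1.txt", "test_subdir", "results.v2", "README"]:
    verdict = "assumed file" if looks_like_file(name) else "needs a cwd probe"
    print(f"{name}: {verdict}")
```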

In [3]:
# Recursive download usage example
ftp = FTP_TLS('ftp.lisanwanglab.org')
ftp.login(user='REPLACE', passwd='REPLACE') # Replace with necessary details
ftp.prot_p()

_mirror_ftp_dir(ftp, "/ftp_fgc_xqtl/projects/histone-methylation/CU/test_dir2/", "C:/Users/sakur/Desktop/FTP_test/", overwrite=True, guess_by_extension=True, pattern = None)

created ftp_fgc_xqtl/projects/histone-methylation/CU/test_dir2
downloaded: /ftp_fgc_xqtl/projects/histone-methylation/CU/test_dir2/test1.txt
AS: C:/Users/sakur/Desktop/FTP_test/ftp_fgc_xqtl/projects/histone-methylation/CU/test_dir2/test1.txt
downloaded: /ftp_fgc_xqtl/projects/histone-methylation/CU/test_dir2/test2.txt
AS: C:/Users/sakur/Desktop/FTP_test/ftp_fgc_xqtl/projects/histone-methylation/CU/test_dir2/test2.txt
downloaded: /ftp_fgc_xqtl/projects/histone-methylation/CU/test_dir2/test3.txt
AS: C:/Users/sakur/Desktop/FTP_test/ftp_fgc_xqtl/projects/histone-methylation/CU/test_dir2/test3.txt
created ftp_fgc_xqtl/projects/histone-methylation/CU/test_dir2/test_subdir
downloaded: /ftp_fgc_xqtl/projects/histone-methylation/CU/test_dir2/test_subdir/test4.txt
AS: C:/Users/sakur/Desktop/FTP_test/ftp_fgc_xqtl/projects/histone-methylation/CU/test_dir2/test_subdir/test4.txt
downloaded: /ftp_fgc_xqtl/projects/histone-methylation/CU/test_dir2/test_subdir/test5.txt
AS: C:/Users/sakur/Desktop/FTP_t

In [4]:
# if the files already exist locally and you choose not to overwrite
_mirror_ftp_dir(ftp, "/ftp_fgc_xqtl/projects/histone-methylation/CU/test_dir2/", "C:/Users/sakur/Desktop/FTP_test/", overwrite=False, guess_by_extension=True, pattern = None)

already exists: /ftp_fgc_xqtl/projects/histone-methylation/CU/test_dir2/test1.txt
already exists: /ftp_fgc_xqtl/projects/histone-methylation/CU/test_dir2/test2.txt
already exists: /ftp_fgc_xqtl/projects/histone-methylation/CU/test_dir2/test3.txt
already exists: /ftp_fgc_xqtl/projects/histone-methylation/CU/test_dir2/test_subdir/test4.txt
already exists: /ftp_fgc_xqtl/projects/histone-methylation/CU/test_dir2/test_subdir/test5.txt
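
Both runs above leave `pattern` as `None`, which accepts every file. `_file_name_match_patern` passes the pattern to `re.match`, which anchors at the start of the string only. The sketch below uses an illustrative list of names (not taken from the server) to show what that implies when writing a pattern.

```python
import re

# Illustrative names only
names = ["test1.txt", "test2.csv", "notes_test.txt"]

# re.match anchors at the start of the string only, so add "$" to pin the end
txt_only = [n for n in names if re.match(r".*\.txt$", n)]

# A bare prefix works without any anchors
starts_with_test = [n for n in names if re.match(r"test", n)]

print(txt_only)
print(starts_with_test)
```

A pattern such as `r".*\.txt$"` would therefore be one way to restrict a download to text files.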


### Upload
The code below uploads a local directory and all files in it to the remote directory you choose on the FTP server.
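
Before uploading, it can help to preview what would be sent. The sketch below walks a local directory and pairs each file with the relative path it would get on the server. `plan_upload` is a hypothetical helper, not part of this notebook; it uses only the standard library and never touches the FTP connection.

```python
import os

def plan_upload(local_dir):
    """Hypothetical helper: list (local_path, remote_relative_path) pairs."""
    plan = []
    for root, _dirs, files in os.walk(local_dir):
        for name in sorted(files):
            local_path = os.path.join(root, name)
            # FTP paths use "/" regardless of the local OS separator
            rel = os.path.relpath(local_path, local_dir).replace(os.sep, "/")
            plan.append((local_path, rel))
    return plan
```

Printing the second element of each pair gives a dry-run listing of the files an upload of that directory would create.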

In [10]:
#Recursive upload

import ftplib
import os
import re
from ftplib import FTP_TLS

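def __mirror_dir_ftp(ftp_handle, local_dir):
    """ recursively uploads the contents of local_dir into the current remote directory """
    for f in os.listdir(local_dir):
        # os.path.join keeps this portable instead of hard-coding '\' separators
        path = os.path.join(local_dir, f)
        if os.path.isfile(path):
            with open(path, 'rb') as fh:
                ftp_handle.storbinary('STOR %s' % f, fh)
        elif os.path.isdir(path):
            try:
                ftp_handle.mkd(f)
            except ftplib.error_perm as e:
                print(e)  # most likely the remote directory already exists
            ftp_handle.cwd(f)
            __mirror_dir_ftp(ftp_handle, path)
            # step back up only after finishing a subdirectory, so the
            # top-level call leaves the remote working directory unchanged
            ftp_handle.cwd('..')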

In [11]:
# Recursive upload usage example

ftp = FTP_TLS('ftp.lisanwanglab.org')
ftp.login(user='REPLACE', passwd='REPLACE') # Replace with necessary details
ftp.prot_p()

# create corresponding parent directory in the remote
ftp.cwd('/ftp_fgc_xqtl/projects/histone-methylation/CU/')
ftp.mkd('test_dir2')
ftp.cwd('/ftp_fgc_xqtl/projects/histone-methylation/CU/test_dir2/')

'250 Directory successfully changed.'

In [12]:
# log messages are not developed yet, but this will upload test_dir2 to the FTP server
local = r'C:/Users/sakur/Desktop/FTP_test/ftp_fgc_xqtl/projects/histone-methylation/CU/test_dir2'
__mirror_dir_ftp(ftp, local)