<a href="https://colab.research.google.com/github/KarlRadtke/ELOC_database/blob/main/Backup_in_google_drive.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Copying S3 Buckets to Google Drive in Colab

This notebook provides a workflow to transfer data from AWS S3 buckets
to Google Drive while maintaining the directory structure. Ensure you
have a YAML file with AWS credentials prepared.


Import Necessary Libraries

In [3]:
!pip install boto3 tqdm



In [2]:
from google.colab import drive
import boto3
import os
from tqdm.notebook import tqdm
import yaml

### Mount Google Drive

In [4]:
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Load AWS Credentials

We'll load our AWS credentials from a YAML file.

In [7]:
from google.colab import files
uploaded = files.upload()

# Load AWS credentials from YAML
with open("connection_config.yaml", 'r') as file:
    creds = yaml.safe_load(file)

AWS_ACCESS_KEY_ID = creds['access_key']
AWS_SECRET_ACCESS_KEY = creds['secret_access_key']

s3 = boto3.resource(
    's3',
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY
)


Saving connection_config.yaml to connection_config (1).yaml


### Define the Copy Function

This function will copy content from a specified S3 bucket to a
designated folder on Google Drive, maintaining the directory structure.

In [8]:
def copy_bucket_to_drive(bucket, destination_folder):
    """
    Copies the contents of an S3 bucket to Google Drive.

    Parameters:
    - bucket: boto3 Bucket instance.
    - destination_folder: str. The root directory on Google Drive where the
      bucket contents will be saved.
    """

    # Define the base directory in Google Drive
    base_dir = os.path.join(destination_folder, bucket.name)

    # Get total number of objects for progress bar
    total_objects = sum(1 for _ in bucket.objects.all())

    print(f"Transfer bucket: {bucket.name}.")
    # Iterate over each object in the S3 bucket with a progress bar
    for obj in tqdm(bucket.objects.all(), total=total_objects, desc="Copying files"):
        try:
            # Define where to save the object on Google Drive
            path_on_drive = os.path.join(base_dir, obj.key)

            # If the folder structure leading up to the object doesn't exist, create it
            if not os.path.exists(os.path.dirname(path_on_drive)):
                os.makedirs(os.path.dirname(path_on_drive))

            # Download the object from S3 and save it to Google Drive
            bucket.download_file(obj.key, path_on_drive)

        except Exception as e:
            print(f"Error copying key {obj.key}: {e}")

### Execute the Copy

Now, we'll specify the S3 buckets we want to copy and then initiate the
copy process. Adjust the `bucket_names` list to contain the names of
the S3 buckets you want to transfer.


In [None]:
#BUCKETS_TO_COPY = ['tangkahan', 'bukit-tiga-puluh', 'sabah', 'way-kambas']
BUCKETS_TO_COPY = ['btp-abt-202307', 'way-kambas']
DESTINATION_FOLDER = "/content/drive/Shareddrives/ELOC_database"

# Iterate over each bucket and copy to Google Drive
for bucket_name in BUCKETS_TO_COPY:
    try:
        bucket = s3.Bucket(bucket_name)
        copy_bucket_to_drive(bucket, DESTINATION_FOLDER)
    except Exception as e:
        print(f"Error copying bucket {bucket_name}: {e}")


Transfer bucket: btp-abt-202307.


Copying files:   0%|          | 0/5059 [00:00<?, ?it/s]

Error copying key metadata/: [Errno 20] Not a directory: '/content/drive/Shareddrives/ELOC_database/btp-abt-202307/metadata/.8DbDB8dF' -> '/content/drive/Shareddrives/ELOC_database/btp-abt-202307/metadata/'
Error copying key metadata/photos/: [Errno 20] Not a directory: '/content/drive/Shareddrives/ELOC_database/btp-abt-202307/metadata/photos/.DDDf6D48' -> '/content/drive/Shareddrives/ELOC_database/btp-abt-202307/metadata/photos/'
Error copying key soundfiles/: [Errno 20] Not a directory: '/content/drive/Shareddrives/ELOC_database/btp-abt-202307/soundfiles/.18d6ad96' -> '/content/drive/Shareddrives/ELOC_database/btp-abt-202307/soundfiles/'
Error copying key soundfiles/1_1673352107743/: [Errno 20] Not a directory: '/content/drive/Shareddrives/ELOC_database/btp-abt-202307/soundfiles/1_1673352107743/.E5Ecd70A' -> '/content/drive/Shareddrives/ELOC_database/btp-abt-202307/soundfiles/1_1673352107743/'
Error copying key soundfiles/1_1673370357472/: [Errno 20] Not a directory: '/content/drive/