<a href="https://colab.research.google.com/github/FFI-Vietnam/camtrap-tools/blob/main/Wildlife%20Insights/GoogleDriveToGCP_FFIVietnam.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# GoogleDriveToGCP Script
## Migrate Files from a Google Drive folder to a Google Cloud Platform bucket

Adapted from https://github.com/ConservationInternational/Wildlife-Insights----Data-Migration/tree/master/Image_Transfer


### 1. Sign into Google Drive

a.   Run the following cell

b.   Click on the URL in the output cell

c.   Sign into the account linked to the Google Drive folder you'd like to migrate

d.   Click "Allow"

e.   Copy and paste the Authorization code into the box in the output cell. It should say "Mounted at /content/drive" if successful

In [None]:
from google.colab import drive
drive.mount('/content/drive')
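Depending on the Colab version, the Drive root under the mount point may be named `MyDrive` rather than `My Drive`. If the folder listing in step 2 fails, a quick check like the one below can show which name exists. This is a sketch; `find_drive_root` is a hypothetical helper, not part of the Colab API, and checking both spellings is an assumption about how the mount may appear.

```python
import os

def find_drive_root(mount_point="/content/drive"):
    """Return the Drive root folder under the mount point, or None if not found.

    Assumption: the root may be called "MyDrive" or "My Drive" depending on
    the Colab version, so both spellings are checked.
    """
    for name in ("MyDrive", "My Drive"):
        candidate = os.path.join(mount_point, name)
        if os.path.isdir(candidate):
            return candidate
    return None

# prints the Drive root path, or None if Drive is not mounted
print(find_drive_root())
```

If it prints a path ending in `MyDrive`, start `GDFolder` in step 2 with `MyDrive/` instead of `My Drive/`.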

### 2. Direct this script to the Google Drive folder

a. Replace ``` [Image_Folder] ``` with the path, relative to ```My Drive```, of the Google Drive parent folder that contains the images. Example: ```GDFolder = "My Drive/Camera Trap Data/Kon Plong/2019-2020 Survey/Processed images/Final"```

b. Run the following cell

c. Check that the listed sub-folders match those in the Google Drive folder



In [None]:
GDFolder = "My Drive/[Image_Folder]"

# list all sub-folders
print("Sub-folders: ")
!ls "/content/drive/{GDFolder}/"
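To make the later comparison with the bucket easier, it can help to record how many images each sub-folder holds. A minimal sketch, assuming images are `.jpg`/`.jpeg` files (the extensions the sanity check in step 5 looks for); `count_images_per_folder` is a hypothetical helper.

```python
import os

def count_images_per_folder(root, extensions=(".jpg", ".jpeg")):
    """Count image files (searched recursively) in each immediate sub-folder of root."""
    counts = {}
    for entry in sorted(os.listdir(root)):
        folder = os.path.join(root, entry)
        if not os.path.isdir(folder):
            continue  # skip loose files at the top level
        total = 0
        for _, _, files in os.walk(folder):
            # case-insensitive match, so IMG_0001.JPG counts too
            total += sum(1 for name in files if name.lower().endswith(extensions))
        counts[entry] = total
    return counts
```

In the notebook it would be called as `count_images_per_folder(f"/content/drive/{GDFolder}")`, which returns a dictionary mapping each sub-folder name to its image count.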

### 3. Sign into GCP

a.   Run the following cell

b.   Click on the URL in the output cell

c.   Sign into the Google account that has access to the GCP project and bucket (this may be the same account used in step 1)

d.   Click "Allow"

e.   Copy and paste the Authorization code into the box in the output cell

In [None]:
from google.colab import auth
auth.authenticate_user()

### 4. Direct this script to the GCP bucket

a. Replace ``` [Project_ID] ``` with the GCP project ID. Consult Wildlife Insights staff for more information.

b. Replace ``` [Bucket_Name] ``` with the organization's GCP bucket name. The bucket name is unique to each organization; contact Wildlife Insights staff for more information.

c. Replace ```[Project_Folder]``` with the folder in the GCP bucket where you want to save your images. Example: ```project_folder = "KONPLONG19-20"```

d. Check that the listed folders match those in the GCP bucket. (If the bucket is empty, create a temporary test file in it first.)

In [None]:
project_id = '[Project_ID]'
bucket = '[Bucket_Name]'
project_folder = "[Project_Folder]"
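Running the next cells with a placeholder still in place produces confusing `gcloud`/`gsutil` errors, so a quick check after filling in the values can help. A sketch under one assumption: real values never contain both `[` and `]`, which holds for GCP project IDs and bucket names but is only a heuristic for folder names. `find_unreplaced_placeholders` is a hypothetical helper.

```python
def find_unreplaced_placeholders(**settings):
    """Return the names of settings whose values still look like "[...]" placeholders."""
    return [name for name, value in settings.items() if "[" in value and "]" in value]
```

In the notebook: `find_unreplaced_placeholders(GDFolder=GDFolder, project_id=project_id, bucket=bucket, project_folder=project_folder)` should return an empty list before you continue.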

In [None]:
!gcloud config set project {project_id}

# list all top-level folders in the GCP bucket
!gsutil ls "gs://{bucket}/" > GCP_projects.txt
with open("GCP_projects.txt") as f:
  list_projects = f.read().splitlines()

# create a new project folder in GCP if it does not exist yet; skip otherwise
if f'gs://{bucket}/{project_folder}/' not in list_projects:
  # create a local project folder with a temporary placeholder file
  # (Cloud Storage folders exist only as prefixes of object names, so the folder needs at least one object)
  !mkdir -p "{project_folder}"
  !touch "{project_folder}/placeholder_file"
  # upload the folder and placeholder file to GCP
  !gsutil cp -r "{project_folder}" "gs://{bucket}/"
  # print success message
  print('Successfully created project folder', project_folder)
  !gsutil ls "gs://{bucket}/"
else:
  print(f"Project folder gs://{bucket}/{project_folder} already exists; skipping creation")
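`gsutil ls gs://BUCKET/` prints one line per top-level object or folder, and folder lines end in a trailing slash; that is why the existence test compares whole lines against `gs://{bucket}/{project_folder}/`. The same check as a pure function, which can be tried without GCP access. `project_folder_exists` and the bucket name `wi-bucket` are hypothetical.

```python
def project_folder_exists(listing, bucket, project_folder):
    """Return True if the raw `gsutil ls gs://<bucket>/` output lists the project folder.

    Compares whole lines, so a folder named "KONPLONG" does not match "KONPLONG19-20".
    """
    expected = f"gs://{bucket}/{project_folder}/"
    return expected in listing.splitlines()

# hypothetical listing for a bucket named "wi-bucket"
sample = "gs://wi-bucket/KONPLONG19-20/\ngs://wi-bucket/OTHER/\n"
print(project_folder_exists(sample, "wi-bucket", "KONPLONG19-20"))  # True
print(project_folder_exists(sample, "wi-bucket", "KONPLONG"))  # False
```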

In [None]:
# list the contents of the project folder in GCP
!gsutil ls "gs://{bucket}/{project_folder}/"

### 5. Copy the Google Drive files to the GCP Bucket

a. Run the following cell

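NOTE: If you are running this script in Google Colab, you may close the tab/window while it's running. If the runtime disconnects before the upload finishes, re-run the notebook from step 1; this cell resumes from where the previous run left off.

b. Manually verify the upload by checking the GCP bucket, then run the sanity check in the following cell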

In [None]:
# Resume uploading from where the last run left off
# list all station folders in Google Drive
!ls "/content/drive/{GDFolder}" > drive_folders.txt

# list all station folders in GCP
!gsutil ls "gs://{bucket}/{project_folder}/" > GCP_folders.txt

# all folders in the Google Drive parent folder, in upload order
with open("drive_folders.txt", 'r') as f:
  list_folder_total = f.read().splitlines()

# folders already in GCP (folder URLs end with "/"; files such as placeholder_file are skipped)
with open("GCP_folders.txt", 'r') as f:
  list_folder_uploaded = [line for line in f.read().splitlines() if line.endswith('/')]

# the last folder uploaded (in Drive order) may be incomplete, so upload it again
uploaded = [folder for folder in list_folder_total
            if f'gs://{bucket}/{project_folder}/{folder}/' in list_folder_uploaded]
if uploaded:
  list_folder_uploaded.remove(f'gs://{bucket}/{project_folder}/{uploaded[-1]}/')

# upload every folder that is not yet (completely) in GCP
for folder in list_folder_total:
  if f'gs://{bucket}/{project_folder}/{folder}/' not in list_folder_uploaded:
    print(f'Uploading folder {folder}')
    !gsutil -m cp -r "/content/drive/{GDFolder}/{folder}" "gs://{bucket}/{project_folder}/"
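The intent of the resume step is to skip sub-folders that are already in the bucket but re-upload the most recently uploaded one, because the previous run may have stopped partway through it. A pure-Python sketch of that rule, useful for checking which folders a re-run would upload without touching GCP. It assumes folders are uploaded in the order Drive lists them; `folders_to_upload` and the names in the example are hypothetical.

```python
def folders_to_upload(drive_folders, gcp_listing, bucket, project_folder):
    """Return the Drive sub-folders that still need uploading, in Drive order.

    drive_folders: sub-folder names in the order they are uploaded.
    gcp_listing: lines of `gsutil ls gs://<bucket>/<project_folder>/` output.
    The last already-uploaded folder (in Drive order) is treated as possibly
    incomplete and is returned again.
    """
    prefix = f"gs://{bucket}/{project_folder}/"
    # folder URLs end with "/"; plain objects such as placeholder_file do not
    uploaded = {line[len(prefix):-1] for line in gcp_listing
                if line.startswith(prefix) and line.endswith("/")}
    done = [folder for folder in drive_folders if folder in uploaded]
    if done:
        uploaded.discard(done[-1])  # re-upload the possibly incomplete folder
    return [folder for folder in drive_folders if folder not in uploaded]

listing = ["gs://wi-bucket/KL/KL01/", "gs://wi-bucket/KL/KL02/", "gs://wi-bucket/KL/placeholder_file"]
print(folders_to_upload(["KL01", "KL02", "KL03"], listing, "wi-bucket", "KL"))  # ['KL02', 'KL03']
```

Choosing the folder to redo by Drive order, rather than by the last line of the `gsutil` listing, avoids depending on how `gsutil` sorts folder names relative to other objects.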

In [None]:
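# sanity check: verify that every image in Drive was uploaded to the bucket properly
import os

drive_root = f"/content/drive/{GDFolder}"
gcp_prefix = f"gs://{bucket}/{project_folder}/"

# list all objects under the project folder in GCP
!gsutil ls -R "{gcp_prefix}" > GCP_files.txt

# collect image paths in Drive, relative to the Drive parent folder
# (station/image paths are compared, not bare file names, because camera trap
# file names such as IMG_0001.JPG often repeat across stations)
drive_files = []
for root, _, files in os.walk(drive_root):
  for name in files:
    if name.lower().endswith(('.jpg', '.jpeg')):
      drive_files.append(os.path.relpath(os.path.join(root, name), drive_root))

# collect object paths in GCP, relative to the project folder;
# skip the folder headers ("...:") and blank lines printed by `gsutil ls -R`
GCP_files = set()
with open("GCP_files.txt") as f:
  for line in f.read().splitlines():
    if line.startswith(gcp_prefix) and not line.endswith(':'):
      GCP_files.add(line[len(gcp_prefix):])

# list all files that are in Drive but not in GCP
failed_files = [file for file in drive_files if file not in GCP_files]
for file in failed_files:
  print(file)

with open('failed_files.txt', 'w') as f:
  for file in failed_files:
    f.write(file + '\n')

print("There are", len(failed_files), "files that failed to upload")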