<a href="https://colab.research.google.com/github/FFI-Vietnam/camtrap-tools/blob/main/Wildlife%20Insights/GoogleDriveToGCP_FFIVietnam.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# GoogleDriveToGCP Script
## Migrate Files from a Google Drive folder to a Google Cloud Platform bucket
### Anthony Ngo. November 2020

<hr>



### 1. Sign into Google Drive

a.   Run the following cell

b.   Click on the URL in the output cell

c.   Sign into the account linked to the Google Drive folder you'd like to migrate

d.   Click "Allow"

e.   Copy and paste the Authorization code into the box in the output cell. It should say "Mounted at /content/drive" if successful

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### 2. Direct this script to the Google Drive folder

a. Set the ``` GDFolder ``` variable to the Google Drive folder path

b. Run the following cell

c. Ensure that the listed folders match those contained in the Google Drive folder



In [6]:
GDFolder =  "My Drive/FFI/Kon Plong/2019-2020 Survey/Processed images/Final"

!ls "/content/drive/{GDFolder}/"

100  156  179  214  236  254  272  291	299  330  350  372  393  416  61  99
116  157  194  216  237  255  273  292	310  332  351  373  394  417  78
117  158  195  217  238  256  274  293	311  333  353  374  395  432  79
118  159  196  218  239  257  275  294	314  334  357  375  396  433  80
135  175  197  222  240  258  276  295	315  335  358  376  397  434  81
136  176  198  233  242  259  277  296	316  336  367  386  412  435  96
137  177  199  234  243  260  278  297	317  348  368  391  413  59   97
155  178  200  235  253  261  280  298	329  349  369  392  415  60   98
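
The same check can be done in Python rather than through shell output. The following is a minimal sketch, assuming the Drive is mounted at `/content/drive` as in step 1; `list_station_folders` is a hypothetical helper name and is not part of the original script.

```python
from pathlib import Path


def list_station_folders(root):
    """Return the sorted names of the sub-folders directly under root."""
    root = Path(root)
    if not root.is_dir():
        # Fail early if the path is wrong or the Drive is not mounted
        raise FileNotFoundError(f"Folder not found: {root}")
    return sorted(p.name for p in root.iterdir() if p.is_dir())


# Usage in Colab, with GDFolder defined as in the cell above:
# print(list_station_folders(f"/content/drive/{GDFolder}"))
```

Raising an error for a missing folder makes a mistyped path visible immediately, before any upload is attempted.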


### 3. Sign into GCP

a.   Run the following cell

b.   Click on the URL in the output cell

c.   Sign into the account linked to the GCP project you'd like to migrate to

d.   Click "Allow"

e.   Copy and paste the Authorization code into the box in the output cell

In [3]:
from google.colab import auth
auth.authenticate_user()

### 4. Direct this script to the GCP bucket

a. Assign the ``` project_id ``` variable to the GCP project ID

b. Set the ``` bucket ``` variable to the name of the GCP bucket you'd like to migrate to

c. Ensure that the listed contents match those in the GCP bucket (if the bucket is empty, first create a temporary test file in it)

In [6]:
project_id = '2003086'

bucket = 'ffi_vietnam'

!gcloud config set project {project_id}

!gsutil ls "gs://{bucket}/"

Updated property [core/project].
gs://ffi_vietnam/KONPLONG19-20/


In [12]:
# Only run this cell when the project folder does not exist yet
project_folder = "KONPLONG19-20"
bucket = 'ffi_vietnam'

# List all project folders in the GCP bucket
!gsutil ls "gs://{bucket}/" > GCP_projects.txt
with open("GCP_projects.txt") as f:
  list_projects = f.read().splitlines()

if f'gs://{bucket}/{project_folder}/' not in list_projects:
  # Create a local folder containing a placeholder file
  !mkdir "{project_folder}"
  !touch "{project_folder}/placeholder_file"
  # Upload the folder and its placeholder file
  !gsutil cp -r "{project_folder}" "gs://{bucket}/"
  # Print a success message and list the bucket contents
  print('Successfully created project', project_folder)
  !gsutil ls "gs://{bucket}/"
else:
  print(f"Project gs://{bucket}/{project_folder} exists! Creating new project aborted")

Project gs://ffi_vietnam/KONPLONG19-20 exists! Creating new project aborted
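
The existence check above compares one URL against the lines printed by `gsutil ls`. As a sketch of that logic in isolation, assuming the listing is available as a string, the hypothetical helper `project_exists` below can be tried without touching the bucket. The folder name `KONPLONG20-21` is made up for the example.

```python
def project_exists(listing, bucket, project_folder):
    """Return True if the project folder URL appears in `gsutil ls` output."""
    target = f"gs://{bucket}/{project_folder}/"
    # Ignore blank lines and surrounding whitespace in the listing
    urls = {line.strip() for line in listing.splitlines() if line.strip()}
    return target in urls


# Listing taken from the output of step 4
sample_listing = "gs://ffi_vietnam/KONPLONG19-20/\n"
print(project_exists(sample_listing, "ffi_vietnam", "KONPLONG19-20"))  # True
print(project_exists(sample_listing, "ffi_vietnam", "KONPLONG20-21"))  # False
```

Building the target URL from `bucket` and `project_folder` keeps the check valid if either variable is changed.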


In [16]:
# Check the contents of the project folder
!gsutil ls "gs://{bucket}/{project_folder}/"

gs://ffi_vietnam/KONPLONG19-20/100/
gs://ffi_vietnam/KONPLONG19-20/116/
gs://ffi_vietnam/KONPLONG19-20/117/
gs://ffi_vietnam/KONPLONG19-20/118/
gs://ffi_vietnam/KONPLONG19-20/135/
gs://ffi_vietnam/KONPLONG19-20/136/
gs://ffi_vietnam/KONPLONG19-20/137/
gs://ffi_vietnam/KONPLONG19-20/156/
gs://ffi_vietnam/KONPLONG19-20/157/
gs://ffi_vietnam/KONPLONG19-20/158/
gs://ffi_vietnam/KONPLONG19-20/159/
gs://ffi_vietnam/KONPLONG19-20/175/
gs://ffi_vietnam/KONPLONG19-20/176/
gs://ffi_vietnam/KONPLONG19-20/177/
gs://ffi_vietnam/KONPLONG19-20/178/
gs://ffi_vietnam/KONPLONG19-20/179/
gs://ffi_vietnam/KONPLONG19-20/194/
gs://ffi_vietnam/KONPLONG19-20/195/
gs://ffi_vietnam/KONPLONG19-20/196/
gs://ffi_vietnam/KONPLONG19-20/197/
gs://ffi_vietnam/KONPLONG19-20/198/
gs://ffi_vietnam/KONPLONG19-20/199/
gs://ffi_vietnam/KONPLONG19-20/200/
gs://ffi_vietnam/KONPLONG19-20/214/
gs://ffi_vietnam/KONPLONG19-20/216/
gs://ffi_vietnam/KONPLONG19-20/217/
gs://ffi_vietnam/KONPLONG19-20/218/
gs://ffi_vietnam/KONPLONG19-

### 5. Copy the Google Drive files to the GCP Bucket

a. Run the following cell

NOTE: If you are running this script in Google Colab, you may close the tab/window while it's running

b. Manually ensure the script worked by checking the GCP Bucket

In [10]:
# Resume uploading from where the previous run left off
# List all station folders in Google Drive
!ls "/content/drive/{GDFolder}" > drive_folders.txt

# List all station folders already in the GCP project folder
!gsutil ls "gs://{bucket}/{project_folder}/" > GCP_folders.txt

# Read the names of all folders on Google Drive
with open("drive_folders.txt", 'r') as f:
  list_folder_total = f.read().split('\n')[:-1]

# Read the URLs of the folders that were already uploaded
with open("GCP_folders.txt", 'r') as f:
  # [:-2] drops the trailing empty string and the last listed folder,
  # because that folder may not have been uploaded completely
  list_folder_uploaded = f.read().split('\n')[:-2]

# Upload every folder that is not yet completely in the bucket
for folder in list_folder_total:
  if f'gs://{bucket}/{project_folder}/{folder}/' not in list_folder_uploaded:
    print(f'Currently uploading folder {folder}')
    !gsutil -m cp -r "/content/drive/{GDFolder}/{folder}" "gs://{bucket}/{project_folder}/"

Streaming output truncated to the last 5000 lines.
Copying file:///content/drive/My Drive/FFI/Kon Plong/2019-2020 Survey/Processed images/Final/78/68916/78__68916__2019-05-11__01-32-54(2).JPG [Content-Type=image/jpeg]...
Copying file:///content/drive/My Drive/FFI/Kon Plong/2019-2020 Survey/Processed images/Final/78/68916/78__68916__2019-05-21__10-13-26(1).JPG [Content-Type=image/jpeg]...
Copying file:///content/drive/My Drive/FFI/Kon Plong/2019-2020 Survey/Processed images/Final/78/68916/78__68916__2019-05-01__11-12-32(5).JPG [Content-Type=image/jpeg]...
Copying file:///content/drive/My Drive/FFI/Kon Plong/2019-2020 Survey/Processed images/Final/78/68916/78__68916__2019-06-01__01-10-11(1).JPG [Content-Type=image/jpeg]...
Copying file:///content/drive/My Drive/FFI/Kon Plong/2019-2020 Survey/Processed images/Final/78/68916/78__68916__2019-05-28__11-22-14(1).JPG [Content-Type=image/jpeg]...
Copying file:///content/drive/My Drive/FFI/Kon Plong/2019-2020 Survey/Processed
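
The resume logic in the cell above can be sketched as a pure function, which makes it easy to try on small inputs before running a long upload. This is an illustration only: `folders_to_upload` is a hypothetical helper, and it assumes the uploaded URLs are passed as a list with no empty entries.

```python
def folders_to_upload(drive_folders, uploaded_urls, bucket, project_folder):
    """Return the Drive folders that still need to be uploaded.

    The last listed URL is treated as possibly incomplete, so its
    folder is included again.
    """
    complete = set(uploaded_urls[:-1])
    prefix = f"gs://{bucket}/{project_folder}/"
    return [name for name in drive_folders
            if f"{prefix}{name}/" not in complete]


drive = ["100", "116", "117", "118"]
uploaded = [
    "gs://ffi_vietnam/KONPLONG19-20/100/",
    "gs://ffi_vietnam/KONPLONG19-20/116/",
]
print(folders_to_upload(drive, uploaded, "ffi_vietnam", "KONPLONG19-20"))
# ['116', '117', '118']
```

Folder `116` is listed again because it is the last uploaded entry and may have been interrupted midway.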

In [9]:
# Sanity check: verify that all images were uploaded to the bucket properly

!ls -R "/content/drive/{GDFolder}" > drive_files.txt
!gsutil ls -R "gs://{bucket}" > GCP_files.txt

# Read all file names found in Drive
with open("drive_files.txt") as f:
  drive_files = f.read().split('\n')

# Read all file names found in GCP, keeping only the base name of each URL
with open("GCP_files.txt") as f:
  GCP_files = {line.split('/')[-1] for line in f.read().split('\n')}

# Collect every image file in Drive that is missing from GCP
failed_files = []
for file in drive_files:
  # Only consider image files
  if file.lower().endswith(('jpg', 'jpeg')) and file not in GCP_files:
    failed_files.append(file)
    print(file)

# Save the names of the failed files
with open('failed_files.txt', 'w') as f:
  for i in failed_files:
    f.write(i+'\n')

print("There are", len(set(failed_files)), "files that failed to upload")

There are 0 files that failed to upload
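
The comparison in the sanity check can be sketched on its own as follows. The helper name `find_missing_images`, the bucket `example-bucket`, and the file names are all hypothetical and serve only to illustrate the matching rule.

```python
def find_missing_images(drive_names, gcp_urls, extensions=(".jpg", ".jpeg")):
    """Return sorted image names present in Drive but absent from GCP."""
    # Keep only the base name of each GCP URL
    uploaded = {url.rsplit("/", 1)[-1] for url in gcp_urls}
    return sorted({
        name for name in drive_names
        if name.lower().endswith(extensions) and name not in uploaded
    })


drive_names = ["78:", "a.JPG", "b.jpg", "notes.txt"]
gcp_urls = ["gs://example-bucket/project/78/a.JPG"]
print(find_missing_images(drive_names, gcp_urls))
# ['b.jpg']
```

Files are matched by base name only, so two images with the same name in different folders cannot be told apart by this check.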
