# Environment Setup

You'll probably need to install Tesseract (by running the cell below) at the beginning of each Colab session. If you're processing multiple folders in one session, you don't need to rerun this part for every folder.

Restart the runtime after installation. You can do this by going to Runtime > Restart runtime or by pressing the button in the output dialogue.

In [1]:
! apt install tesseract-ocr
! apt install libtesseract-dev
! pip install pytesseract

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-460
Use 'apt autoremove' to remove it.
The following additional packages will be installed:
  tesseract-ocr-eng tesseract-ocr-osd
The following NEW packages will be installed:
  tesseract-ocr tesseract-ocr-eng tesseract-ocr-osd
0 upgraded, 3 newly installed, 0 to remove and 42 not upgraded.
Need to get 4,795 kB of archives.
After this operation, 15.8 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 tesseract-ocr-eng all 4.00~git24-0e00fe6-1.2 [1,588 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic/universe amd64 tesseract-ocr-osd all 4.00~git24-0e00fe6-1.2 [2,989 kB]
Get:3 http://archive.ubuntu.com/ubuntu bionic/universe amd64 tesseract-ocr amd64 4.00~git2288-10f4998a-2 [218 kB]
Fetched 4,795 kB in 0s (14.1 MB/s)
Selecting previously unselecte
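
Once the install cell has finished, a quick check like the one below can confirm that the `tesseract` binary is actually on the PATH before any images are processed. This is only a sketch: `tesseract_available` is a hypothetical helper name, not part of pytesseract.

```python
import shutil

def tesseract_available(binary="tesseract"):
    """Return True if an executable called `binary` can be found on the PATH."""
    return shutil.which(binary) is not None

if tesseract_available():
    print("tesseract found at", shutil.which("tesseract"))
else:
    print("tesseract not found; rerun the install cell above")
```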

# Import Packages and Define Functions

In [1]:
#import packages
import pytesseract
import cv2
from google.colab.patches import cv2_imshow
import numpy as np
import re
import os
import shutil
from google.colab import drive

In [2]:
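#Param: full source address of current picture file
#Returns: unaltered timeStamp from bottom of the photo (as a string)
def extract_timeStamp(pic_address):
    img = cv2.imread(pic_address) #read as an image (cv2.imread returns None if the file can't be read)
    if img is None:
        raise FileNotFoundError("Could not read image: " + pic_address)
    ts = img[2352:, 2000:, :] #crop to the timestamp region (change if sizing conventions change!)
    text = pytesseract.image_to_string(ts) #run OCR on the cropped region
    return text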
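
The slice `img[2352:, 2000:, :]` keeps everything from row 2352 down and from column 2000 rightward, so it only yields a usable strip when the photo is larger than those offsets. The sketch below works out the size of that strip for a given image size. `timestamp_crop_shape` is a hypothetical helper, and the 2448 by 3264 photo size is an assumed example, not a value taken from the cameras used here.

```python
def timestamp_crop_shape(height, width, row_start=2352, col_start=2000):
    """Return (rows, cols) of the strip selected by img[row_start:, col_start:, :]."""
    rows = max(height - row_start, 0)
    cols = max(width - col_start, 0)
    return rows, cols

print(timestamp_crop_shape(2448, 3264))  # (96, 1264)
print(timestamp_crop_shape(1080, 1920))  # (0, 0): the crop would be empty
```

If the second case applies to your photos, the offsets in `extract_timeStamp` would need to be adjusted before OCR can find anything.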

Note: The `generate_picName` function requires the folder name to start with W# (for example, W2) so that it can read the watershed number! If this is not the case, update the `ws_num` variable manually.
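
Because the function reads a single character (`fdr_name[1]`), a folder such as W10 would be read as watershed 1. If that ever matters, a regular expression is one way to pull out all the digits after the leading W. This is a sketch only; `watershed_number` is a hypothetical helper, not part of the notebook.

```python
import re

def watershed_number(folder_name):
    """Return the digits that follow a leading 'W', or None if the name doesn't match."""
    match = re.match(r'W(\d+)', folder_name)
    return match.group(1) if match else None

print(watershed_number("W2 GC Channel 3-16-20 thru 11-6-20"))  # 2
print(watershed_number("W10 example"))                         # 10
print(watershed_number("Channel photos"))                      # None
```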

In [3]:
#Param: name of the source folder (must start with W#), unaltered timeStamp from bottom of the photo (as a string)
#Returns: new name of picture

def generate_picName(fdr_name, tStamp):
  ws_num = fdr_name[1] #!!!this should be changed if fdr_name[1] will not be the watershed number!!

  #split the stamp on newlines, colons, spaces, and hyphens
  stamp_elements = re.split(r'[\n: -]', tStamp)
  date = stamp_elements[2] + stamp_elements[0] + stamp_elements[1]
  time = stamp_elements[3] + stamp_elements[4] + stamp_elements[5]

  new_name = "Hbwtr_w" + ws_num + '_' + date + '_' + time + '.JPG'
  return new_name
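
To make the indexing above concrete, here is a small worked example of the split. It assumes the stamp reads `MM-DD-YYYY HH:MM:SS`, which is a guess at what the camera prints rather than something confirmed here, and `parse_stamp` is a hypothetical helper. Unlike `generate_picName`, this sketch drops empty pieces and returns None when the OCR text does not contain six numeric fields, so garbled stamps can be spotted before a file is named.

```python
import re

def parse_stamp(t_stamp):
    """Split an OCR'd stamp into (date, time) strings, or return None if it looks garbled."""
    parts = [p for p in re.split(r'[\n: -]', t_stamp) if p]
    if len(parts) < 6 or not all(p.isdigit() for p in parts[:6]):
        return None
    month, day, year, hour, minute, second = parts[:6]  # assumed field order
    return year + month + day, hour + minute + second

print(parse_stamp("03-16-2020 14:05:09\n"))  # ('20200316', '140509')
print(parse_stamp("O3-l6-2020"))             # None
```

With watershed 2, the first stamp would give the file name `Hbwtr_w2_20200316_140509.JPG`.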

In [4]:
#Param: address of folder you're going to copy, address of destination parent folder
#Returns: name of folder (to use later), full new address of copy folder 
#determines new folder address

def new_folder(src, dst = None):
  if dst is None:
    dst = src

  src_elements = src.split('/')
  folder_name = src_elements[-1]
  newAddress = dst + "/" + folder_name
  return folder_name, newAddress
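
The example below shows what `new_folder` returns, and why a trailing slash on the source address causes trouble: the last element after splitting on `/` is then an empty string. The function body is repeated so the block runs on its own, and the paths are made-up examples.

```python
def new_folder(src, dst=None):
    # same logic as the function above, repeated so this block is self-contained
    if dst is None:
        dst = src
    folder_name = src.split('/')[-1]
    return folder_name, dst + "/" + folder_name

out = "/content/drive/My Drive/out"  # hypothetical destination

print(new_folder("/content/drive/My Drive/W2 demo", out))
# ('W2 demo', '/content/drive/My Drive/out/W2 demo')

print(new_folder("/content/drive/My Drive/W2 demo/", out))
# ('', '/content/drive/My Drive/out/')

# one possible guard: strip trailing slashes before calling
print(new_folder("/content/drive/My Drive/W2 demo/".rstrip('/'), out)[0])
# W2 demo
```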

In [5]:
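#Param: address of source folder, name of folder, address of destination folder
#No return variable
#Copies all .JPG/.jpg files in the source folder to the destination folder under their new names

def rename_images(picFolder, fdr_name, fdr_dst):
  picFiles = os.listdir(picFolder)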
  
  for filename in picFiles:
    print(filename) #just to track where you are
    src = picFolder + '/' + filename #old img address
    filetype = filename[-4:]

    if (filetype == '.JPG') or (filetype == '.jpg'):
      tStamp = extract_timeStamp(src)
      new_name = generate_picName(fdr_name, tStamp)

      dst = fdr_dst + '/' + filename #new img address
      dst_renamed = fdr_dst + '/' + new_name #img new address + name

      shutil.copy(src, dst_renamed) #copies old image to new destination
      #os.rename(src, dst) #renames file in Google Drive
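
The extension check in `rename_images` compares the last four characters against `.JPG` and `.jpg`. A slightly more general variant, sketched below, uses `os.path.splitext` and lowercases the extension, so mixed-case names are accepted too. `is_jpg` is a hypothetical helper and the file names are invented for illustration.

```python
import os

def is_jpg(filename):
    """Return True for names ending in .jpg in any letter case."""
    return os.path.splitext(filename)[1].lower() == '.jpg'

names = ["IMG_0001.JPG", "IMG_0002.jpg", "IMG_0003.Jpg", "notes.txt", "Thumbs.db"]
print([n for n in names if is_jpg(n)])
# ['IMG_0001.JPG', 'IMG_0002.jpg', 'IMG_0003.Jpg']
```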

In [6]:
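#Param: address of source folder or .zip file
#Returns: address of folder holding the pictures, address of unzipped folder (None if source was not a zip)

def unzip_src(folder):
  new_src = folder
  unzipped = None

  if folder[-4:] == '.zip':
    unzipped = folder[:-4]
    shutil.unpack_archive(folder, unzipped)

    #assumes the archive holds a single top-level folder containing the pictures
    parentZip = os.listdir(unzipped)
    new_src = unzipped + '/' + parentZip[0]
  
  return new_src, unzipped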
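
To see the unzip step in isolation, the sketch below builds a small throwaway archive in a temporary directory and unpacks it the same way. It assumes, as `unzip_src` does, that the archive contains one top-level folder. `unpack_and_find_root` and all file names are hypothetical.

```python
import os
import shutil
import tempfile
import zipfile

def unpack_and_find_root(zip_path):
    """Unpack zip_path next to itself and return (inner_folder, unpack_dir)."""
    unpack_dir = zip_path[:-4]                # strip the ".zip" suffix
    shutil.unpack_archive(zip_path, unpack_dir)
    entries = sorted(os.listdir(unpack_dir))  # sorted so the choice is deterministic
    return os.path.join(unpack_dir, entries[0]), unpack_dir

# Build a throwaway archive to try it on.
work = tempfile.mkdtemp()
zip_path = os.path.join(work, "W2_demo.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("W2_demo/IMG_0001.JPG", b"not a real image")

inner, unpack_dir = unpack_and_find_root(zip_path)
inner_name = os.path.basename(inner)
inner_files = os.listdir(inner)
print(inner_name)   # W2_demo
print(inner_files)  # ['IMG_0001.JPG']

shutil.rmtree(work)  # clean up the temporary files
```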

# Main Code

In [7]:
#this will connect to your Google Drive. It will ask you to allow access
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


Change the `folder` variable to match the address of the folder in your personal Google Drive. All addresses will start with /content/drive and should NOT end with a slash (/).

You should not have to change the `dst` address now that we are copying the whole folder.


Additionally, set the boolean `save_as_zip` to True if you want the new folder to be saved as a zip file, and False otherwise. Right now, if set to True, it will create both a zipped and an unzipped folder of the renamed files in your Drive, but I can change it to keep only the zipped folder if that's preferred (the code would only change by 1 line).

If the source folder is a zip file, the code will deal with it automatically, so don't worry about that.

In [15]:
folder = "/content/drive/My Drive/2_Camera Trap Photos /COPY of data for script /On_Deck /W2 GC Channel 3-16-20 thru 11-6-20"
dst = "/content/drive/My Drive/2_Camera Trap Photos /COPY of data for script /Testing destination /W2"
save_as_zip = False

#will unzip if necessary
folder, unzipped = unzip_src(folder)

#create new destination folder
fdr_name, fdr_dst = new_folder(folder, dst)
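if os.path.exists(fdr_dst):
  print("path already exists")
else:
  print("new path")
  #os.makedirs also creates missing parent folders (os.mkdir raises FileNotFoundError if the parent doesn't exist)
  os.makedirs(fdr_dst)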

new path


FileNotFoundError: ignored
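
`os.mkdir` creates only the last folder in a path and raises `FileNotFoundError` when the parent folder is missing, which is one plausible cause of the error shown above. `os.makedirs` creates any missing parents as well. The sketch below shows the difference in a temporary directory; the folder names are made up.

```python
import os
import shutil
import tempfile

work = tempfile.mkdtemp()
nested = os.path.join(work, "Testing destination", "W2")  # parent does not exist yet

try:
    os.mkdir(nested)
    mkdir_failed = False
except FileNotFoundError:
    mkdir_failed = True

os.makedirs(nested, exist_ok=True)  # creates the parent and the folder
made = os.path.isdir(nested)

print(mkdir_failed, made)  # True True
shutil.rmtree(work)        # clean up the temporary files
```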

In [None]:
#copy all images in folder and rename
rename_images(folder, fdr_name, fdr_dst)

In [None]:
if save_as_zip:
  shutil.make_archive(fdr_dst, 'zip', fdr_dst)
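
If only the zipped copy should be kept, one possible version of the change is to delete the unzipped folder once the archive has been written. The sketch below tries this in a temporary directory; `archive_folder` and its `keep_unzipped` flag are hypothetical names, and this is an assumption about what the change could look like rather than the planned edit.

```python
import os
import shutil
import tempfile

def archive_folder(folder_path, keep_unzipped=True):
    """Zip folder_path into folder_path + '.zip'; optionally delete the original folder."""
    archive = shutil.make_archive(folder_path, 'zip', folder_path)
    if not keep_unzipped:
        shutil.rmtree(folder_path)
    return archive

# Try it on a throwaway folder.
work = tempfile.mkdtemp()
demo = os.path.join(work, "W2_demo")
os.mkdir(demo)
with open(os.path.join(demo, "IMG_0001.JPG"), "wb") as f:
    f.write(b"not a real image")

archive = archive_folder(demo, keep_unzipped=False)
archive_exists = os.path.exists(archive)
folder_exists = os.path.exists(demo)
print(os.path.basename(archive), archive_exists, folder_exists)  # W2_demo.zip True False

shutil.rmtree(work)  # clean up the temporary files
```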

In [None]:
#clean up contents from unzipping
if unzipped:
  shutil.rmtree(unzipped)

#saves all changes to your Google Drive
drive.flush_and_unmount()
print('All changes made in this colab session should now be visible in Drive.')