<a href="https://colab.research.google.com/github/aubreymoore/crb-damage-detector-colab/blob/main/detect_and_annotate_dev.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# detect_and_annotate.ipynb

NOTE: The following documentation is already slightly out of date.
Please visit https://github.com/aubreymoore/crb-damage-detector-colab before running this notebook for the first time.

This Colab Jupyter notebook runs a custom YOLOv8 object detector which scans images to find three object classes: live coconut palms, dead coconut palms and v-shaped cuts symptomatic of damage caused by coconut rhinoceros beetle, *Oryctes rhinoceros*.

IMPORTANT: Shortly after the MAIN PROGRAM section begins executing, a BROWSE button will appear below the active cell so that you can upload a single file of input data from your local machine to Colab.

**Note that Colab will wait and do nothing until you have selected a text file of URLs or a ZIP file of images on your local machine.** [Click here to scroll down to the "Browse" button.](#scrollTo=5zSjfTXvIv2q&line=1&uniqifier=1)

You may choose between 2 options:
* A TEXT file (\*.txt) containing URLs for images to be scanned. One URL per line. (This is the most efficient option.)
* A ZIP file (\*.zip) containing images to be scanned.
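
As a rough illustration of the TEXT option, the sketch below parses a URL list with one URL per line and derives a local file name for each image. The helper names `read_url_list` and `filename_from_url` are hypothetical and are not part of this notebook.

```python
import os
from urllib.parse import unquote, urlparse


def read_url_list(text):
    """Return the non-blank lines of a URL list, one URL per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def filename_from_url(url):
    """Return the last path component of a URL, ignoring any query string."""
    return os.path.basename(unquote(urlparse(url).path))


example = """
https://github.com/aubreymoore/crb-damage-detector-colab/blob/main/data/images/IMG_0532.JPG?raw=true

"""
for url in read_url_list(example):
    print(filename_from_url(url))
```

Parsing the URL first means that a query string such as `?raw=true` never ends up in the saved file name.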


Test data are available in a companion GitHub repository at https://github.com/aubreymoore/crb-damage-detector-colab. To use the test data, download the repository to your local computer as a [ZIP file](https://github.com/aubreymoore/crb-damage-detector-colab/archive/refs/heads/main.zip) and unzip it. If you have **git** installed, you can clone the repo instead. The TEXT file or ZIP file to be uploaded to Colab will be found in the repository's **data** folder.

To scan images, select **Runtime | Run all** on the main menu.
Results will be in a temporary OUTPUT folder which you can access using the **File browser** in the left Colab panel.

When image scanning is complete, the OUTPUT folder will be compressed into a single ZIP file and automatically downloaded to your computer.
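
The compression step can also be done from Python rather than with a shell command. The following is a minimal sketch using `shutil.make_archive` from the standard library; the helper name `zip_folder` and the throwaway directory are assumptions made for the demonstration only.

```python
import os
import shutil
import tempfile
import zipfile


def zip_folder(folder, archive_stem):
    """Compress folder into archive_stem + '.zip' and return the archive path."""
    folder = os.path.abspath(folder)
    return shutil.make_archive(
        archive_stem,
        'zip',
        root_dir=os.path.dirname(folder),   # archive paths are relative to this
        base_dir=os.path.basename(folder),  # keeps the folder name inside the ZIP
    )


# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, 'OUTPUT')
    os.makedirs(out_dir)
    with open(os.path.join(out_dir, 'detections.csv'), 'w') as f:
        f.write('placeholder\n')  # placeholder content, not real results
    archive = zip_folder(out_dir, os.path.join(tmp, 'OUTPUT'))
    with zipfile.ZipFile(archive) as z:
        names = z.namelist()
    print(names)
```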

### TODO

- [x] Reduce size of images in the companion GH repo to max dimension of 960px (finished 2024-10-19)
- [ ] Copy current trained model to companion GH repo
- [ ] Copy this Jupyter notebook to companion GH repo
- [ ] Add confidence values to bounding box labels.
- [ ] Add database to OUTPUT folder
- [ ] Extract GPS coordinates from image files
- [ ] Figure out how to use URLs to access images stored on OneDrive (Sharepoint)

# Load Python packages which are not preinstalled by Colab

In [1]:
%pip install ultralytics -q
%pip install supervision -q
# %pip install imutils -q
%pip install icecream -q
%pip install ipython-autotime -q

# Import modules

In [2]:
import cv2
import supervision as sv
from ultralytics import YOLO
# import imutils

import glob
import os
import shutil
from skimage import io
from icecream import ic
from google.colab import files
import zipfile
# import io
from urllib.request import urlretrieve

# ultralytics.checks()

# Load cell timer

In [3]:
%load_ext autotime

time: 335 µs (started: 2024-10-24 11:46:21 +00:00)


# Define functions

In [4]:
# url = 'https://github.com/aubreymoore/crb-damage-detector-colab/blob/main/data/images/IMG_0532.JPG?raw=true'
# filename = 'IMG_0532.JPG'
# urlretrieve(url, filename)

time: 329 µs (started: 2024-10-24 11:46:30 +00:00)


In [5]:
def get_gps_from_exif(image_path):
  """
  Gets timestamp and GPS coordinates from an image.

  Assumes the `exif` package is installed (%pip install exif); its Image class
  provides has_exif, datetime_original and the gps_* attributes used below.

  Args:
    image_path: path to an image file

  Returns:
    dict with keys "timestamp", "latitude" and "longitude"
    (coordinates in decimal degrees; all values are None if unavailable)
  """
  # Imported here so that the name does not clash with PIL.Image
  from exif import Image

  empty = {"timestamp": None, "latitude": None, "longitude": None}
  with open(image_path, 'rb') as src:
    img = Image(src)
    if not img.has_exif:
      print('The image has no EXIF data')
      return empty
    try:
      timestamp = img.datetime_original
      # Convert (degrees, minutes, seconds) to decimal degrees
      dms = img.gps_latitude
      latitude = dms[0] + dms[1]/60 + dms[2]/3600
      if img.gps_latitude_ref == 'S':
        latitude = -latitude
      dms = img.gps_longitude
      longitude = dms[0] + dms[1]/60 + dms[2]/3600
      if img.gps_longitude_ref == 'W':
        longitude = -longitude
      return {"timestamp": timestamp, "latitude": latitude, "longitude": longitude}
    except Exception as e:
      print(e)
      return empty

# get_gps_from_exif('IMG_0532.JPG')

time: 1.21 ms (started: 2024-10-24 11:46:30 +00:00)
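
`get_gps_from_exif` returns the timestamp as it is stored in the image. As a minimal sketch, assuming that value is a string in the usual EXIF form `YYYY:MM:DD HH:MM:SS` with an optional separate UTC offset such as `+11:00`, it could be turned into a Python `datetime` as follows. The helper name `parse_exif_timestamp` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_exif_timestamp(timestamp, offset=None):
    """Parse 'YYYY:MM:DD HH:MM:SS', optionally with a '+HH:MM' UTC offset."""
    parsed = datetime.strptime(timestamp, '%Y:%m:%d %H:%M:%S')
    if offset:
        sign = -1 if offset.startswith('-') else 1
        hours, minutes = offset.lstrip('+-').split(':')
        delta = sign * timedelta(hours=int(hours), minutes=int(minutes))
        parsed = parsed.replace(tzinfo=timezone(delta))
    return parsed


print(parse_exif_timestamp('2022:07:28 10:51:26', '+11:00').isoformat())
```

Without an offset the result is a naive `datetime`, so it is worth recording the offset whenever the image provides one.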


In [6]:
def upload_model_weights():
  '''
  Download model weights from the GitHub repo to **weights.pt** only if this file does not already exist.
  '''
  !wget -nc https://github.com/aubreymoore/code-for-CRB-damage-ai/raw/refs/heads/main/models/3class/train5/weights/best.pt -O weights.pt

# upload_model_weights()

time: 410 µs (started: 2024-10-24 11:46:30 +00:00)


In [7]:
def load_model_weights():
  '''
  Load the trained YOLO model from weights.pt and return it.
  '''
  return YOLO('weights.pt')

time: 357 µs (started: 2024-10-24 11:46:31 +00:00)


In [8]:
def create_input_folder():
  if not os.path.exists('INPUT'):
    os.makedirs('INPUT')

# create_input_folder()

time: 670 µs (started: 2024-10-24 11:46:31 +00:00)


In [9]:
def create_output_folder():
  if not os.path.exists('OUTPUT'):
    os.makedirs('OUTPUT')

# create_output_folder()

time: 572 µs (started: 2024-10-24 11:46:31 +00:00)


In [10]:
def run_garbage_disposal():
  '''
  Delete any data files left over from the last run.
  '''
  shutil.rmtree('INPUT', ignore_errors=True)
  shutil.rmtree('OUTPUT', ignore_errors=True)
  shutil.rmtree('sample_data', ignore_errors=True)

  try:
    os.remove('weights.pt')
  except OSError:
    pass

# run_garbage_disposal()

time: 458 µs (started: 2024-10-24 11:46:31 +00:00)


In [11]:
def extract_JPG_files(zip_file_path, output_dir):
    # Ensure the output directory exists
    os.makedirs(output_dir, exist_ok=True)

    with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
        # Loop through each file in the zip archive
        for file_name in zip_ref.namelist():
            # Match .jpg and .jpeg regardless of case (.JPG, .jpg, ...)
            if file_name.lower().endswith(('.jpg', '.jpeg')):
                print(f'Extracting {file_name}...')
                zip_ref.extract(file_name, output_dir)  # Extract the file

# Usage
# extract_JPG_files('path/to/your/archive.zip', 'path/to/extract/directory')


time: 553 µs (started: 2024-10-24 11:46:31 +00:00)
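
Before extracting anything, it can help to check which archive members would be treated as images. The sketch below lists them with a case-insensitive extension test and skips directory entries. The helper name `list_image_members`, the extension list and the member names are assumptions for illustration.

```python
import io
import zipfile

IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')


def list_image_members(zip_file):
    """Return names of archive members that look like image files."""
    with zipfile.ZipFile(zip_file) as z:
        return [
            info.filename
            for info in z.infolist()
            if not info.is_dir()
            and info.filename.lower().endswith(IMAGE_EXTENSIONS)
        ]


# Build a small archive in memory for demonstration; the member names are made up
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as z:
    z.writestr('images/a.JPG', b'not a real image')
    z.writestr('images/b.jpg', b'not a real image')
    z.writestr('images/notes.txt', b'text')
buffer.seek(0)
print(list_image_members(buffer))
```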


In [12]:
from io import BytesIO  # the name io is already bound to skimage.io in this notebook

def extract_zip_to_memory(zip_content):
    """
    Extracts files from a ZIP archive stored in memory.

    :param zip_content: Bytes of the ZIP file.
    :return: A dictionary mapping each filename to the file's content as bytes.
    """
    extracted_files = {}
    with zipfile.ZipFile(BytesIO(zip_content)) as z:
        for file_info in z.infolist():
            with z.open(file_info) as file:
                extracted_files[file_info.filename] = file.read()  # Read file content
    return extracted_files

# Usage example
# Assuming 'zip_data' contains the bytes of your ZIP file
# zip_data = ... (load your ZIP data here)
# files = extract_zip_to_memory(zip_data)
# for name, content in files.items():
#     print(f"Extracted {name} with size {len(content)} bytes")


time: 537 µs (started: 2024-10-24 11:46:31 +00:00)


In [13]:
def upload_and_unpack_zip_or_txt():
  '''
  Upload images in a ZIP (*.zip) or a list of URLs (*.txt) and unpack them into INPUT.

  Returns input_mode ('text' or 'zip'), urls (list or None) and image_file_dir.
  '''
  input_mode = None
  urls = None
  image_file_dir = None

  uploaded = files.upload(target_dir='INPUT')
  # The upload is saved inside INPUT, so build the local path from the base name
  upload_path = os.path.join('INPUT', os.path.basename(list(uploaded.keys())[0]))

  if upload_path.endswith('.txt'):
    input_mode = 'text'
    with open(upload_path, 'r') as f:
      # One URL per line; ignore blank lines
      urls = [line.strip() for line in f if line.strip()]

    for url in urls:
      # Extract filename from URL, dropping any query string such as ?raw=true
      image_name = url.split('/')[-1].split('?')[0]
      try:
        urlretrieve(url, f'INPUT/{image_name}')
      except Exception as e:
        print(f'Error downloading {url}: {e}')

  elif upload_path.endswith('.zip'):
    input_mode = 'zip'
    # Unpack the JPG images next to the uploaded archive
    extract_JPG_files(upload_path, 'INPUT')
    image_file_dir = 'INPUT'

  else:
    raise ValueError('INPUT file must be *.txt or *.zip.')
  return input_mode, urls, image_file_dir

# input_mode, urls, image_file_dir = upload_and_unpack_zip_or_txt()
# ic(input_mode)
# ic(urls)
# ic(image_file_dir)

time: 665 µs (started: 2024-10-24 11:46:31 +00:00)


In [14]:
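def get_input_file_list():
  '''
  Return paths of all image files found anywhere under INPUT.
  Directories and non-image files, such as the uploaded ZIP, are skipped.
  '''
  image_extensions = ('.jpg', '.jpeg', '.png')
  paths = glob.glob('INPUT/**/*', recursive=True)
  return [p for p in paths if os.path.isfile(p) and p.lower().endswith(image_extensions)]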

# get_input_file_list()

time: 1.02 ms (started: 2024-10-24 11:46:31 +00:00)


In [15]:
def detect_objects(image, model, box_annotator, label_annotator, csv_sink):
  '''
  Detect objects in an image.
  Returns the detections and an annotated copy of the image.
  csv_sink is not used here; the caller appends the detections to the sink.
  '''
  results = model(image)[0]
  detections = sv.Detections.from_ultralytics(results)
  # ic(detections)
  # Annotate a copy so that the input image is left unchanged
  annotated_image = box_annotator.annotate(scene=image.copy(), detections=detections)
  # Label each box with its class name and confidence
  labels = [f"{model.model.names[class_id]} {confidence:.2f}" for class_id, confidence in zip(detections.class_id, detections.confidence)]
  annotated_image = label_annotator.annotate(scene=annotated_image, detections=detections, labels=labels)
  return detections, annotated_image

# csv_sink = sv.CSVSink('detections.csv')
# csv_sink.open()

# upload_model_weights()
# model = YOLO('weights.pt')
# box_annotator = sv.BoxAnnotator()
# label_annotator = sv.LabelAnnotator()

# url = 'https://github.com/aubreymoore/crb-damage-detector-colab/blob/main/data/Vanuatu_July_2022_Sulav/resized-images/IMG_0532.JPG?raw=true'
# image = imutils.url_to_image(url)
# detections, annotated_image = detect_objects(image, model, box_annotator, label_annotator, csv_sink)
# ic(detections)
# sv.plot_image(annotated_image)

# custom_data = {'url': url}
# csv_sink.append(detections, custom_data)

# csv_sink.close()

time: 592 µs (started: 2024-10-24 11:46:31 +00:00)
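
Each bounding box label combines a class name with a confidence value rounded to two decimals. The sketch below reproduces that formatting with plain Python lists so that it can be tried without a model. The class names in `class_names` are hypothetical placeholders, not the names stored in the trained weights.

```python
def make_labels(class_names, class_ids, confidences):
    """Build one 'name confidence' label per detection."""
    return [
        f"{class_names[int(class_id)]} {confidence:.2f}"
        for class_id, confidence in zip(class_ids, confidences)
    ]


# Hypothetical class names, for illustration only
class_names = {0: 'live', 1: 'dead', 2: 'vcut'}
print(make_labels(class_names, [0, 2], [0.91234, 0.5]))
```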


# MAIN PROGRAM

In [16]:
# Clear data files from previous run
run_garbage_disposal()

create_input_folder()
create_output_folder()

# Upload images or list of URLs
input_mode, urls, image_file_dir = upload_and_unpack_zip_or_txt()

# Download weights from trained model and load them
upload_model_weights()
model = YOLO('weights.pt')

box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()
csv_sink = sv.CSVSink('OUTPUT/detections.csv')
csv_sink.open()

Saving images.zip to INPUT/images.zip
--2024-10-24 11:47:18--  https://github.com/aubreymoore/code-for-CRB-damage-ai/raw/refs/heads/main/models/3class/train5/weights/best.pt
Resolving github.com (github.com)... 140.82.114.4
Connecting to github.com (github.com)|140.82.114.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/aubreymoore/code-for-CRB-damage-ai/refs/heads/main/models/3class/train5/weights/best.pt [following]
--2024-10-24 11:47:18--  https://raw.githubusercontent.com/aubreymoore/code-for-CRB-damage-ai/refs/heads/main/models/3class/train5/weights/best.pt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6269721 (6.0M) [application/octet-stream]
Saving to: ‘weights.pt’


2024-10-24 11:47:

In [None]:

# Scan images
if input_mode == 'text':
  for url in urls:
    try:
      # Extract filename from URL, dropping any query string such as ?raw=true
      filename = url.split('/')[-1].split('?')[0]

      # Use the copy in INPUT; download it if it is not there yet
      image_path = os.path.join('INPUT', filename)
      if not os.path.exists(image_path):
        urlretrieve(url, image_path)
      image = cv2.imread(image_path)
      if image is None:
        raise ValueError('not a readable image')

      detections, annotated_image = detect_objects(image, model, box_annotator, label_annotator, csv_sink)
      csv_sink.append(
          detections,
          custom_data={'image_h': image.shape[0], 'image_w': image.shape[1], 'source': url}
      )

      # Insert _annotated before the file extension only
      stem, ext = os.path.splitext(filename)
      output_path = f'OUTPUT/{stem}_annotated{ext}'
      ic(output_path)
      cv2.imwrite(output_path, annotated_image)
    except Exception as e:
      print(f'Error processing {url}: {e}')

if input_mode == 'zip':
  input_file_list = get_input_file_list()
  ic(input_file_list)
  for image_path in input_file_list:
    ic(image_path)
    try:
      image = cv2.imread(image_path)
      if image is None:
        raise ValueError('not a readable image')

      detections, annotated_image = detect_objects(image, model, box_annotator, label_annotator, csv_sink)
      csv_sink.append(
          detections,
          custom_data={'image_h': image.shape[0], 'image_w': image.shape[1], 'source': image_path}
      )

      # Insert _annotated before the file extension only
      stem, ext = os.path.splitext(os.path.basename(image_path))
      output_path = f'OUTPUT/{stem}_annotated{ext}'
      cv2.imwrite(output_path, annotated_image)
    except Exception as e:
      print(f'Error processing {image_path}: {e}')

csv_sink.close()
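
Annotated images are saved in OUTPUT with `_annotated` added before the file extension. As a small sketch of that naming rule, `os.path.splitext` splits off only the final extension, so file names that contain additional dots keep their stem intact. The helper name `annotated_path` and the second example path are hypothetical.

```python
import os


def annotated_path(image_path, output_dir='OUTPUT', suffix='_annotated'):
    """Return the output path for the annotated copy of image_path."""
    stem, ext = os.path.splitext(os.path.basename(image_path))
    return os.path.join(output_dir, f'{stem}{suffix}{ext}')


print(annotated_path('INPUT/images/IMG_0532.JPG'))
print(annotated_path('INPUT/images/IMG_0532.v2.JPG'))
```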

## Please click on the Browse button when it appears above this cell.

### Download OUTPUT folder as a ZIP file

In [None]:
!zip -r OUTPUT.zip OUTPUT

In [None]:
from google.colab import files
files.download("OUTPUT.zip")

# FINISHED
If everything worked as intended, you should find a file named **OUTPUT.zip** in your Downloads folder. Unzip this file to see results.
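
Once the results are unzipped, the detections CSV can be summarized with the standard library. The sketch below counts detections per image using the `source` column that this notebook adds as custom data. The other column names and all rows in the sample are made up for the demonstration, so check the header of your own `detections.csv` before relying on them.

```python
import csv
import io
from collections import Counter


def count_detections_by_source(csv_file, source_column='source'):
    """Count rows (one per detection) for each value of source_column."""
    reader = csv.DictReader(csv_file)
    return Counter(row[source_column] for row in reader)


# Made-up sample data; a real file would be opened with open('OUTPUT/detections.csv')
sample = io.StringIO(
    'x_min,y_min,x_max,y_max,class_id,confidence,source\n'
    '10,20,110,220,0,0.91,INPUT/images/IMG_0532.JPG\n'
    '30,40,130,240,2,0.55,INPUT/images/IMG_0532.JPG\n'
    '50,60,150,260,1,0.78,INPUT/images/IMG_0671.JPG\n'
)
counts = count_detections_by_source(sample)
for source, n in counts.items():
    print(source, n)
```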

In [None]:

print('FINISHED')

In [27]:
import zipfile
from PIL import Image

# Open each image directly from the uploaded archive without extracting it to disk
z = zipfile.ZipFile('INPUT/images.zip')
for file_name in z.namelist():
  if file_name.lower().endswith(('.jpg', '.jpeg', '.png', '.gif')):  # Check for common image extensions
    print(f'Extracting {file_name}...')
    with z.open(file_name, 'r') as file: # Use z.open to directly open the file within the zip archive
      try:
        img = Image.open(file) # Pass the file object to Image.open
        print(img)
      except Exception as e:
        # Files that are not valid images are reported and skipped
        print(f"Failed to open {file_name}: {e}")

Extracting images/IMG_0671.JPG...
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=960x720 at 0x78D4C6AE0B20>
Extracting images/IMG_0695.JPG...
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=960x720 at 0x78D4C6AE3190>
Extracting images/IMG_0704.JPG...
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=960x720 at 0x78D4C6AE30A0>
Extracting images/IMG_06XX.JPG...
Failed to open images/IMG_06XX.JPG: cannot identify image file <zipfile.ZipExtFile name='images/IMG_06XX.JPG' mode='r' compress_type=deflate>
Extracting images/IMG_0532.JPG...
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=960x720 at 0x78D4C6AE0AC0>
Extracting images/IMG_0713.JPG...
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=960x720 at 0x78D4C6AE2560>
time: 12.9 ms (started: 2024-10-24 11:56:26 +00:00)


In [33]:
img._getexif()



{34853: {1: 'S',
  2: (17.0, 44.0, 53.15),
  3: 'E',
  4: (168.0, 17.0, 39.25),
  5: b'\x00',
  6: 15.7755699134463,
  12: 'K',
  13: 0.18000000715266395,
  16: 'T',
  17: 284.4738461538462,
  23: 'T',
  24: 284.4738461538462,
  31: 5.0},
 296: 2,
 34665: 234,
 271: 'Apple',
 272: 'iPhone 8 Plus',
 305: '15.5',
 274: 6,
 306: '2022:07:28 10:51:26',
 531: 1,
 282: 72.0,
 283: 72.0,
 316: 'iPhone 8 Plus',
 36864: b'0232',
 37121: b'\x01\x02\x03\x00',
 37377: 5.9077276079652545,
 36867: '2022:07:28 10:51:26',
 36868: '2022:07:28 10:51:26',
 37378: 1.6959938128383605,
 37379: 3.8456704875017262,
 37380: 0.0,
 37383: 5,
 37385: 16,
 37386: 3.99,
 40961: 65535,
 40962: 4032,
 41989: 28,
 41990: 0,
 36880: '+11:00',
 36881: '+11:00',
 36882: '+11:00',
 37521: '377',
 37396: (2015, 1511, 2217, 1330),
 37522: '377',
 40963: 3024,
 41495: 2,
 33434: 0.016666666666666666,
 33437: 1.8,
 41729: b'\x01',
 34850: 2,
 34855: 32,
 41986: 0,
 40960: b'0100',
 41987: 0,
 42034: (3.9900000095374253, 6.6, 

time: 9.52 ms (started: 2024-10-24 12:02:50 +00:00)
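
The dictionary above stores GPS data under tag 34853 (GPSInfo), with keys 1 to 4 holding the latitude reference, latitude, longitude reference and longitude. A minimal sketch of converting those entries to signed decimal degrees follows; the helper names `dms_to_decimal` and `gps_from_exif_dict` are hypothetical, and the sample values are copied from the output above.

```python
GPS_INFO_TAG = 34853  # EXIF tag ID for the GPSInfo block


def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) to signed decimal degrees."""
    degrees, minutes, seconds = (float(value) for value in dms)
    decimal = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative
    return -decimal if ref in ('S', 'W') else decimal


def gps_from_exif_dict(exif):
    """Return (latitude, longitude), or None if GPS data are missing."""
    gps = exif.get(GPS_INFO_TAG)
    if not gps or not all(key in gps for key in (1, 2, 3, 4)):
        return None
    return dms_to_decimal(gps[2], gps[1]), dms_to_decimal(gps[4], gps[3])


exif = {34853: {1: 'S', 2: (17.0, 44.0, 53.15), 3: 'E', 4: (168.0, 17.0, 39.25)}}
latitude, longitude = gps_from_exif_dict(exif)
print(round(latitude, 6), round(longitude, 6))
```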
