# BiteMe | Data Definition

In this notebook we source and download all available data on insect bites and stings. We rename files to their hash (if necessary), resize the images, and rewrite them to a separate directory. Metadata is also created for both raw and cleaned images, based on the folder structure.

N.B. This will be explored further in v2, if we decide to progress to that.

In [1]:
import os
import sys

sys.path.append("..")
from utils.utils import hash_files, create_metadata, read_images

from utils.constants import ROWS, COLS, CHANNELS

In [2]:
# Define directories
base_dir_path = "../"

data_dir_path = os.path.join(base_dir_path, "data")
data_raw_dir_path = os.path.join(data_dir_path, "raw")
data_clean_dir_path = os.path.join(data_dir_path, "cleaned")

## Rename images to their hashes

In [3]:
# UNCOMMENT ONLY IF READY TO RENAME AND OVERWRITE FILES!
# CREATE BACKUPS IF NECESSARY!

#hash_files(data_raw_dir_path)

Renamed ../data/raw/none/7059b14d2aa03ed6c4de11afa32591995181d31c.jpg to ../data/raw/none/7059b14d2aa03ed6c4de11afa32591995181d31c.jpg
Renamed ../data/raw/none/ea1b100b581fcdb7ddfae52cc62347a99e304ba4.jpg to ../data/raw/none/ea1b100b581fcdb7ddfae52cc62347a99e304ba4.jpg
Renamed ../data/raw/none/6eac051b9c45ff6821ec8675216f371711b7cea9.jpg to ../data/raw/none/6eac051b9c45ff6821ec8675216f371711b7cea9.jpg
Renamed ../data/raw/none/fc72767f8520df9b2b83941077dc0ee013eb9399.jpg to ../data/raw/none/fc72767f8520df9b2b83941077dc0ee013eb9399.jpg
Renamed ../data/raw/none/oie_8jRyg0cqv3PC.jpg to ../data/raw/none/d77559fc4fbf78b6f131a605614d33bbbda978c2.jpg
Renamed ../data/raw/none/oie_gGOblBV8KmO4.jpg to ../data/raw/none/9bac4720af91cc18252051d7f25ad1b0aa518f7e.jpg
Renamed ../data/raw/none/oie_R9gLZnOUbNPf.jpg to ../data/raw/none/2aa45419844828a1655f628f0499730ad504c233.jpg
Renamed ../data/raw/none/oie_qur6j6z4apcn.jpg to ../data/raw/none/ea1d5ff81afb98acabd43aefc7699336601b117c.jpg
Renamed ../data/
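
The implementation of `hash_files` lives in `utils/utils.py` and is not shown in this notebook. As a rough sketch of the idea only: the 40-character hexadecimal names in the output above are consistent with a SHA-1 digest of each file's contents, so the version below assumes that. The names `sha1_of_file`, `rename_to_hash`, `IMAGE_EXTENSIONS` and the `dry_run` flag are hypothetical and are not part of the project.

```python
import hashlib
import os

# Assumption: only these extensions count as images; other files are left alone.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def sha1_of_file(path, chunk_size=65536):
    """Return the SHA-1 hex digest of a file's contents, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rename_to_hash(root_dir, dry_run=True):
    """Hypothetical stand-in for hash_files: rename images to <sha1><ext>.

    Returns the list of (source, destination) pairs. With dry_run=True
    nothing on disk is touched, so the plan can be inspected first.
    """
    renames = []
    for dir_path, _, file_names in os.walk(root_dir):
        for name in sorted(file_names):
            ext = os.path.splitext(name)[1].lower()
            if ext not in IMAGE_EXTENSIONS:
                continue
            src = os.path.join(dir_path, name)
            dst = os.path.join(dir_path, sha1_of_file(src) + ext)
            renames.append((src, dst))
            if not dry_run and src != dst:
                # os.replace overwrites an existing destination, so two
                # byte-identical files in one folder collapse into one.
                os.replace(src, dst)
    return renames
```

Naming a file after its content hash makes the rename idempotent: a file that already carries its hash maps to itself, which matches the first few lines of the output above.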

## Create raw metadata.csv

In [4]:
# Create metadata csv
create_metadata(data_raw_dir_path).to_csv(f"{data_raw_dir_path}/metadata.csv", index=False)
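
`create_metadata` is also defined in `utils/utils.py`, so its exact columns are not visible here. The sketch below only illustrates the general idea of deriving labels from the folder structure, under the assumption that each subfolder of the data directory is one class. It uses the standard library `csv` module rather than whatever `create_metadata` returns, and the function names and the column names `img_name` and `label` are hypothetical.

```python
import csv
import os

# Assumption: files with these extensions are the images to be listed.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def collect_metadata(root_dir):
    """Return one row per image, labelled with the name of its parent folder."""
    rows = []
    for label in sorted(os.listdir(root_dir)):
        label_dir = os.path.join(root_dir, label)
        if not os.path.isdir(label_dir):
            continue  # skips loose files such as an existing metadata.csv
        for name in sorted(os.listdir(label_dir)):
            if name.lower().endswith(IMAGE_EXTENSIONS):
                rows.append({"img_name": name, "label": label})
    return rows


def write_metadata(rows, csv_path):
    """Write the rows to a CSV file with a header line."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["img_name", "label"])
        writer.writeheader()
        writer.writerows(rows)
```

For example, `write_metadata(collect_metadata("../data/raw"), "../data/raw/metadata.csv")` would list every image under the raw directory with its folder name as the label.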

## Resize and re-write images

In [5]:
# Read images, resize and write to cleaned directory
img_array = read_images(
    data_dir_path=data_raw_dir_path, 
    rows=ROWS, 
    cols=COLS, 
    channels=CHANNELS, 
    write_images=False,  # presumably set to True to (re)write the resized images to output_data_dir_path
    output_data_dir_path=data_clean_dir_path
)

# Create metadata for cleaned images
create_metadata(data_clean_dir_path).to_csv(f"{data_clean_dir_path}/metadata.csv", index=False)
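
The body of `read_images` is not shown in this notebook. The following is a minimal sketch of the read, resize and write step, assuming Pillow and NumPy are installed and that each class sits in its own subfolder. It is not the project's implementation; `load_resized` and its parameters are hypothetical names, and the plain resize used here does not preserve aspect ratio.

```python
import os

import numpy as np
from PIL import Image

# Assumption: files with these extensions are the images to be read.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def load_resized(root_dir, rows, cols, output_dir=None):
    """Read images from class subfolders and resize them to (rows, cols).

    If output_dir is given, each resized image is also saved under
    output_dir/<label>/<file name>. Returns an array of shape
    (n_images, rows, cols, 3) with dtype uint8.
    """
    images = []
    for label in sorted(os.listdir(root_dir)):
        label_dir = os.path.join(root_dir, label)
        if not os.path.isdir(label_dir):
            continue
        for name in sorted(os.listdir(label_dir)):
            if not name.lower().endswith(IMAGE_EXTENSIONS):
                continue
            with Image.open(os.path.join(label_dir, name)) as img:
                # Pillow's resize takes (width, height), i.e. (cols, rows).
                resized = img.convert("RGB").resize((cols, rows))
            if output_dir is not None:
                out_dir = os.path.join(output_dir, label)
                os.makedirs(out_dir, exist_ok=True)
                resized.save(os.path.join(out_dir, name))
            images.append(np.asarray(resized, dtype=np.uint8))
    if not images:
        return np.empty((0, rows, cols, 3), dtype=np.uint8)
    return np.stack(images)
```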

Reading images from: ../data/raw
Rows set to 1024
Columns set to 1024
Channels set to 3

Writing images to disk!
Writing images to: ../data/cleaned
Reading images...


100%|███████████████████████████████████████████| 30/30 [00:01<00:00, 17.38it/s]
100%|███████████████████████████████████████████| 61/61 [00:05<00:00, 11.43it/s]
100%|███████████████████████████████████████████| 23/23 [00:02<00:00,  8.07it/s]
100%|███████████████████████████████████████████| 51/51 [00:07<00:00,  6.99it/s]
100%|███████████████████████████████████████████| 28/28 [00:04<00:00,  5.92it/s]
100%|███████████████████████████████████████████| 23/23 [00:04<00:00,  5.34it/s]
100%|███████████████████████████████████████████| 30/30 [00:06<00:00,  4.72it/s]
100%|███████████████████████████████████████████| 24/24 [00:05<00:00,  4.20it/s]
0it [00:00, ?it/s]


Image reading complete.
Image array shape: (270, 1024, 1024, 3)
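
The reported shape has practical consequences for memory. The dtype of `img_array` is not shown in the output, so the sketch below simply works out the footprint of a `(270, 1024, 1024, 3)` array for two common choices; `array_nbytes` is a hypothetical helper introduced for illustration.

```python
import math


def array_nbytes(shape, bytes_per_element):
    """Number of bytes a dense array of the given shape occupies."""
    return math.prod(shape) * bytes_per_element


shape = (270, 1024, 1024, 3)
for dtype_name, size in (("uint8", 1), ("float32", 4)):
    gib = array_nbytes(shape, size) / 2**30
    print(f"{dtype_name}: {gib:.2f} GiB")
# uint8: 0.79 GiB
# float32: 3.16 GiB
```

The eight non-empty progress bars above account for 30 + 61 + 23 + 51 + 28 + 23 + 30 + 24 = 270 images, which matches the first dimension of the array.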
