# Collect Photo Metadata

This notebook shows how to extract date and time information from photos taken by Reconyx camera traps. It primarily uses the `Pillow` image processing library to read the [EXIF](https://en.wikipedia.org/wiki/Exif) tags from the JPEG files. Specifically, it finds tag 306, the `DateTime` EXIF tag, and parses it into a numpy `datetime64[s]` object, which can then be used to tie photo metrics to specific dates and times and correlate them with other variables. The notebook also uses `natsort` (as in [this notebook illustrating working on time series image data](2_Image_Time_Series_Workflow.ipynb)), `numpy`, and `os`.

### 1. Import Libraries

In [1]:
from PIL import Image
from PIL.ExifTags import TAGS
from natsort import natsorted, ns
import numpy as np
import os

# File name for an example file
filename = 'images/RCNX0093.JPG'


### 2. Extract Example Image Metadata

In the example below, we use the `Pillow` call `Image.open(filename).getexif()` to read the EXIF header from the example JPEG file. We then print the tags and values for all of the associated metadata, in case there is useful metadata beyond the `DateTime` tag.

In [2]:
exifdata = Image.open(filename).getexif()

In [3]:
# iterating over all EXIF data fields to see what is in the metadata
for tag, value in exifdata.items():
    # Fall back to the numeric tag ID if Pillow has no name for this tag
    print(f"{TAGS.get(tag, tag)}: {value}")

ResolutionUnit: 2
ExifOffset: 300
Make: RECONYX
Model: HYPERFIRE 2 COVERT 
DateTime: 2023:11:02 17:00:00
YCbCrPositioning: 2
XResolution: 72.0
YResolution: 72.0


#### 2a. How to Get `DateTime` Reliably

The EXIF standard defines a controlled vocabulary for how metadata is stored in photographs. As such, it provides a reliable and predictable way to record the date and time that a photo is taken. Here we want the __*value*__ associated with the __*key*__ `DateTime`. That key is identified by the decimal tag number 306 (see [EXIF standards](https://exiv2.org/tags.html)). So when we call `.get()` on the EXIF data returned by `Pillow` and pass 306, we get back the associated value for this photograph.

In [4]:
exifdata.get(306)

'2023:11:02 17:00:00'
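
The value returned is a plain string. As a minimal sketch, it can be turned into a Python `datetime` using only the standard library, assuming the string always follows the `YYYY:MM:DD HH:MM:SS` layout shown above. The helper name `parse_exif_datetime` and the constant `EXIF_DATETIME_FORMAT` are hypothetical, introduced here for illustration, and are not part of `Pillow`.

```python
from datetime import datetime

# Layout of the EXIF DateTime string, e.g. '2023:11:02 17:00:00'
EXIF_DATETIME_FORMAT = '%Y:%m:%d %H:%M:%S'

def parse_exif_datetime(value):
    """Hypothetical helper: return a datetime, or None if the tag is missing or malformed."""
    if not value:
        # exifdata.get(306) returns None when the tag is absent
        return None
    try:
        return datetime.strptime(value.strip(), EXIF_DATETIME_FORMAT)
    except ValueError:
        # The string did not match the expected layout
        return None

print(parse_exif_datetime('2023:11:02 17:00:00'))
```

Returning `None` for a missing or malformed tag lets the calling code decide whether to skip the photo or stop, rather than failing partway through a long list of images.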

### 3. Repeat for a List of Images

Now we want to repeat the same steps for a whole sequence of images.

#### 3a. Get the List of Images

In [5]:
# Get the file names as a Python list using os
file_list = os.listdir('images/')

# Sort this list 'naturally': that is, sequential by file name
file_list = natsorted(file_list)

# Store the length of the list of file names (i.e., how many files there are)
n_images = len(file_list)
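
Note that `os.listdir` returns every entry in the directory, not only images. The following is a minimal sketch of filtering the list down to JPEG files before sorting, assuming the photos end in `.jpg` or `.jpeg` (in any case). The helper name `keep_jpegs` and the example file names are hypothetical, introduced here for illustration.

```python
import os

def keep_jpegs(names):
    """Hypothetical helper: keep only names with a JPEG extension (case-insensitive)."""
    jpeg_extensions = ('.jpg', '.jpeg')
    return [name for name in names
            if os.path.splitext(name)[1].lower() in jpeg_extensions]

# Example with made-up directory contents
example_names = ['RCNX0093.JPG', '.DS_Store', 'notes.txt', 'RCNX0094.jpg']
print(keep_jpegs(example_names))
```

In this notebook it could be applied as `file_list = natsorted(keep_jpegs(os.listdir('images/')))`, so that stray files do not reach `Image.open` in the loop that follows.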

In [6]:
# Create an empty array to store datetime tags for all images
image_datetime = np.empty(n_images, dtype='datetime64[s]')

# We need this to increment our counter 
counter = 0

for file in file_list:
    
    # Use Pillow Image operator to get the EXIF header data from the image
    exifdata = Image.open('images/'+file).getexif()
    
    # Get the DateTime tag from the EXIF data
    datetime_tag = exifdata.get(306) 
    
    # EXIF DateTime tags are unusual in that the date is written YYYY:MM:DD.
    # Neither numpy nor pandas recognizes ':' as a valid date delimiter, so use
    # the Python string replace method to replace the first two ':'s with '-'s
    datetime_tag = datetime_tag.replace(':','-',2)
    
    # Convert the datetime_tag from a string to a datetime64 object
    this_image_datetime = np.datetime64(datetime_tag)
    
    # Store the datetime object for this image into the array for all images
    image_datetime[counter] = this_image_datetime
    
    counter += 1

In [7]:
# Print out the DateTime tags from the photos
# NOTE: The set of files I'm using is incomplete, so while the timestamps
# of the photographs below are correct and sequential, there are some
# gaps in the record. The missing photos exist, but aren't being used
# for these prototyping notebooks

image_datetime

array(['2023-11-02T17:00:00', '2023-11-02T17:15:00',
       '2023-11-02T17:30:00', '2023-11-02T17:45:00',
       '2023-11-03T08:45:00', '2023-11-03T09:15:00',
       '2023-11-03T09:30:00', '2023-11-03T09:45:00',
       '2023-11-06T15:15:00', '2023-11-06T15:30:00',
       '2023-11-06T15:45:00', '2023-11-06T16:00:00',
       '2023-11-06T16:15:00', '2023-11-06T16:30:00',
       '2023-11-06T16:45:00', '2023-11-06T17:00:00',
       '2023-11-06T17:15:00', '2023-11-07T08:00:00',
       '2023-11-07T08:15:00', '2023-11-07T08:30:00',
       '2023-11-07T08:45:00', '2023-11-07T09:00:00',
       '2023-11-07T09:15:00', '2023-11-07T09:30:00',
       '2023-11-07T09:45:00', '2023-11-07T10:00:00',
       '2023-11-07T10:15:00', '2023-11-18T16:30:00',
       '2023-11-18T16:45:00', '2023-11-18T17:00:00',
       '2023-11-19T08:45:00', '2023-11-19T09:00:00',
       '2023-11-19T09:15:00', '2023-11-19T09:30:00',
       '2023-11-19T09:45:00', '2023-11-19T10:00:00',
       '2023-11-19T10:15:00', '2023-11-19T10:3