# select_1_image_at_each_location

Ken Puliafico kindly provided high quality images of *Cycas micronesica* growing on Tinian, some of which clearly show infestation by a scale insect, probably *Aulacaspis yasumatsui* (CAS).

I want to use QGIS to display these images on a web map. But there is a problem: multiple images were taken at most locations, yet when mapped, only one point and one image are displayed per location.
The code in this **Jupyter notebook** puts all images from each location into a single directory.

The next step is to manually select the **best image** from each directory. In this case, the **best image** is one which indicates presence or absence of **CAS**.

In [1]:
import pandas as pd
from glob import glob
from GPSPhoto import gpsphoto
import os
import shutil

### Extract georeferences from images stored in the *images* directory and store these data in a pandas dataframe

In [2]:
imagelist = []
for imagefile in glob('../images/*.jpg'):
    mydict = gpsphoto.getGPSData(imagefile)
    mydict['imagefile'] = os.path.basename(imagefile)
    imagelist.append(mydict)
df = pd.DataFrame(imagelist)
df = df[['Latitude','Longitude','imagefile']]

# delete any rows where latitude and longitude are 0
df.drop(df[(df.Latitude==0) & (df.Longitude==0)].index, inplace=True)

# round latitude and longitude to 6 decimal places
df = df.round(6)
df

Unnamed: 0,Latitude,Longitude,imagefile
0,15.050625,145.637681,PXL_20210729_015415398.jpg
1,15.055214,145.639239,PXL_20210724_071422991.jpg
2,15.055247,145.639358,PXL_20210724_070959525.jpg
3,15.041333,145.629769,PXL_20210725_040342814.jpg
4,15.055286,145.639233,PXL_20210724_071121560.jpg
...,...,...,...
76,15.055214,145.639239,PXL_20210724_071348906.jpg
77,15.055286,145.639233,PXL_20210724_071206988.jpg
78,15.055247,145.639358,PXL_20210724_070954271.jpg
79,15.041333,145.629769,PXL_20210725_040424143.jpg
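
For a sense of scale, rounding to 6 decimal places keeps the coordinates to roughly a tenth of a metre. The arithmetic below is a rough sketch that assumes a spherical Earth with about 111,320 m per degree of latitude; the helper name `metres_per_step` is hypothetical and not part of the notebook.

```python
import math

METRES_PER_DEGREE = 111_320  # approximate value, spherical-Earth assumption


def metres_per_step(latitude_deg, decimals=6):
    """Approximate ground distance covered by one unit in the last retained decimal place."""
    step_deg = 10 ** -decimals
    north_south = step_deg * METRES_PER_DEGREE
    # Lines of longitude converge towards the poles, so scale by cos(latitude).
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns, ew = metres_per_step(15.05)  # approximate latitude of the points in the table
print(f'{ns:.3f} m north-south, {ew:.3f} m east-west')
```

Images therefore end up at the same location only when their recorded coordinates agree to this precision.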


### Group images by location

In [3]:
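# Build a 'Latitude_Longitude' string key for each image.
for i, r in df.iterrows():
    df.loc[i, 'lat_lon_str'] = f'{r.Latitude}_{r.Longitude}'
# Dense-rank the keys so that each distinct coordinate pair gets an integer location ID.
# The keys are strings, so the IDs follow lexicographic rather than numeric order.
df['location'] = df['lat_lon_str'].rank(method='dense').astype(int)
df.drop('lat_lon_str', axis=1, inplace=True)
df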

Unnamed: 0,Latitude,Longitude,imagefile,location
0,15.050625,145.637681,PXL_20210729_015415398.jpg,4
1,15.055214,145.639239,PXL_20210724_071422991.jpg,5
2,15.055247,145.639358,PXL_20210724_070959525.jpg,6
3,15.041333,145.629769,PXL_20210725_040342814.jpg,1
4,15.055286,145.639233,PXL_20210724_071121560.jpg,7
...,...,...,...,...
76,15.055214,145.639239,PXL_20210724_071348906.jpg,5
77,15.055286,145.639233,PXL_20210724_071206988.jpg,7
78,15.055247,145.639358,PXL_20210724_070954271.jpg,6
79,15.041333,145.629769,PXL_20210725_040424143.jpg,1
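
Because `lat_lon_str` is a string, the dense rank follows lexicographic rather than numeric order, which is why latitude 15.0552 ends up as location 9, after 15.055289 (location 8). The plain-Python sketch below reproduces that ordering for four of the coordinate pairs that appear in this notebook. The zero-padded key is only an illustrative alternative, and `dense_rank` is a hypothetical helper.

```python
coords = [
    (15.055214, 145.639239),
    (15.055289, 145.639522),
    (15.0552, 145.639231),
    (15.0554, 145.639264),
]


def dense_rank(keys):
    """Map each distinct key to its 1-based position in sorted order."""
    return {key: rank for rank, key in enumerate(sorted(set(keys)), start=1)}


# Keys built the same way as lat_lon_str: trailing zeros are not printed.
plain_ranks = dense_rank(f'{lat}_{lon}' for lat, lon in coords)

# Hypothetical alternative: fixed-width keys. For positive values with the same
# number of integer digits, these sort in numeric order.
padded_ranks = dense_rank(f'{lat:.6f}_{lon:.6f}' for lat, lon in coords)

for key, rank in plain_ranks.items():
    print(rank, key)
for key, rank in padded_ranks.items():
    print(rank, key)
```

Switching to the padded key would renumber the locations, so any directories or CSV files already keyed by the current numbers would need to be rebuilt.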


### Create a directory for each location and populate with images

In [4]:
if not os.path.exists('../images/image_groups'):
    for i, r in df.iterrows():
        target_dir = f'../images/image_groups/{r.location}'
        os.makedirs(target_dir, exist_ok=True)
        shutil.copy(f'../images/{r.imagefile}', target_dir)
else:
    print('Not run because ../images/image_groups already exists.')

Not run because ../images/image_groups already exists.
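
The guard above skips the whole cell as soon as `image_groups` exists, so images added later are never copied. One possible alternative, sketched here with the standard library only, checks each destination file instead. The function name `copy_into_groups` and its arguments are hypothetical and not part of the notebook.

```python
import shutil
from pathlib import Path


def copy_into_groups(image_dir, assignments):
    """Copy each image into image_dir/image_groups/<location>/ unless it is already there.

    assignments: iterable of (imagefile, location) pairs.
    Returns the list of destination paths that were newly copied.
    """
    image_dir = Path(image_dir)
    copied = []
    for imagefile, location in assignments:
        target_dir = image_dir / 'image_groups' / str(location)
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / imagefile
        if not target.exists():  # skip files copied on an earlier run
            shutil.copy(image_dir / imagefile, target)
            copied.append(target)
    return copied
```

With the dataframe from this notebook, a call might look like `copy_into_groups('../images', zip(df.imagefile, df.location))`.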


### The next step is to manually select an example image from each directory.
All images for each location are examined and the presence or absence of CAS is noted.
Results are saved in **examplars.csv**.

In [5]:
df_examplars = pd.read_csv('../examplars.csv')
df_examplars

Unnamed: 0,location,imagefile,cas_present
0,1,PXL_20210725_040405506.jpg,True
1,2,PXL_20210725_040330535.jpg,True
2,3,PXL_20210729_015526414.jpg,False
3,4,PXL_20210729_015405811.jpg,False
4,5,PXL_20210724_071346293.jpg,False
5,6,PXL_20210724_071022212.jpg,False
6,7,PXL_20210724_071144783.jpg,False
7,8,PXL_20210724_070331088.jpg,False
8,9,PXL_20210724_071318940.jpg,False
9,10,PXL_20210724_071606450.jpg,False
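
Since the selection is made by hand, a quick consistency check can catch typos before the join. The sketch below is a standard-library illustration with hypothetical names (`check_exemplars`, `groups`, `exemplars`) and made-up file names: it verifies that every location has exactly one selected image and that the selected file belongs to that location.

```python
def check_exemplars(groups, exemplars):
    """Return a list of human-readable problems; an empty list means no problems were found.

    groups: dict mapping location -> set of image file names at that location.
    exemplars: list of (location, imagefile) pairs chosen by hand.
    """
    problems = []
    seen = set()
    for location, imagefile in exemplars:
        if location in seen:
            problems.append(f'location {location} has more than one exemplar')
        seen.add(location)
        if imagefile not in groups.get(location, set()):
            problems.append(f'{imagefile} is not an image from location {location}')
    for location in sorted(set(groups) - seen):
        problems.append(f'location {location} has no exemplar')
    return problems


# Toy data for illustration only.
groups = {1: {'a.jpg', 'b.jpg'}, 2: {'c.jpg'}, 3: {'d.jpg'}}
exemplars = [(1, 'a.jpg'), (2, 'x.jpg')]
problems = check_exemplars(groups, exemplars)
print(problems)
```

With the notebook's dataframe, `groups` could be built along the lines of `df.groupby('location')['imagefile'].apply(set).to_dict()`.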


### Get latitude and longitude from the original dataframe

In [6]:
# Left join with df on the shared columns (location and imagefile) to pick up Latitude and Longitude
df_examplars = pd.merge(df_examplars, df, how='left', on=['location', 'imagefile'])
df_examplars

Unnamed: 0,location,imagefile,cas_present,Latitude,Longitude
0,1,PXL_20210725_040405506.jpg,True,15.041333,145.629769
1,2,PXL_20210725_040330535.jpg,True,15.042053,145.628667
2,3,PXL_20210729_015526414.jpg,False,15.050522,145.637697
3,4,PXL_20210729_015405811.jpg,False,15.050625,145.637681
4,5,PXL_20210724_071346293.jpg,False,15.055214,145.639239
5,6,PXL_20210724_071022212.jpg,False,15.055247,145.639358
6,7,PXL_20210724_071144783.jpg,False,15.055286,145.639233
7,8,PXL_20210724_070331088.jpg,False,15.055289,145.639522
8,9,PXL_20210724_071318940.jpg,False,15.0552,145.639231
9,10,PXL_20210724_071606450.jpg,False,15.0554,145.639264


### Add URL column for web map

In [8]:
# <img src="https://github.com/aubreymoore/Tinian-cycad-images/raw/main/images/[% imagefile %]" width="350">
# Wrap each image file name in an HTML <img> tag pointing at the GitHub copy of the image.
base_url = 'https://github.com/aubreymoore/Tinian-cycad-images/raw/main/images/'
df_examplars['URL'] = '<img src="' + base_url + df_examplars['imagefile'] + '" width="350">'
df_examplars

Unnamed: 0,location,imagefile,cas_present,Latitude,Longitude,URL
0,1,PXL_20210725_040405506.jpg,True,15.041333,145.629769,"<img src=""https://github.com/aubreymoore/Tinia..."
1,2,PXL_20210725_040330535.jpg,True,15.042053,145.628667,"<img src=""https://github.com/aubreymoore/Tinia..."
2,3,PXL_20210729_015526414.jpg,False,15.050522,145.637697,"<img src=""https://github.com/aubreymoore/Tinia..."
3,4,PXL_20210729_015405811.jpg,False,15.050625,145.637681,"<img src=""https://github.com/aubreymoore/Tinia..."
4,5,PXL_20210724_071346293.jpg,False,15.055214,145.639239,"<img src=""https://github.com/aubreymoore/Tinia..."
5,6,PXL_20210724_071022212.jpg,False,15.055247,145.639358,"<img src=""https://github.com/aubreymoore/Tinia..."
6,7,PXL_20210724_071144783.jpg,False,15.055286,145.639233,"<img src=""https://github.com/aubreymoore/Tinia..."
7,8,PXL_20210724_070331088.jpg,False,15.055289,145.639522,"<img src=""https://github.com/aubreymoore/Tinia..."
8,9,PXL_20210724_071318940.jpg,False,15.0552,145.639231,"<img src=""https://github.com/aubreymoore/Tinia..."
9,10,PXL_20210724_071606450.jpg,False,15.0554,145.639264,"<img src=""https://github.com/aubreymoore/Tinia..."
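
Note that the `URL` column holds a complete HTML `<img>` tag rather than a bare URL. If file names might ever contain spaces or other special characters, the tag could be built with percent-encoding and attribute escaping, as in this sketch. The helper `img_tag` is hypothetical; the base URL is the one used in the cell above.

```python
from html import escape
from urllib.parse import quote

BASE_URL = 'https://github.com/aubreymoore/Tinian-cycad-images/raw/main/images/'


def img_tag(imagefile, width=350):
    """Build an <img> tag for one image, percent-encoding the file name."""
    src = BASE_URL + quote(imagefile)
    return f'<img src="{escape(src, quote=True)}" width="{width}">'


print(img_tag('PXL_20210725_040405506.jpg'))
```

For the file names in this data set the result is identical to the tag built above, because they contain only letters, digits, underscores and dots.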


### Save the data as a CSV file which can be displayed using QGIS

In [9]:
df_examplars.to_csv('../images.csv', index=False)
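
The tags contain double quotes, so in the CSV file each such field is wrapped in quotes and the inner quotes are doubled. A CSV reader that follows the same convention undoes this on import. Here is a small standard-library check of that round trip, using values from the first row of the table above:

```python
import csv
import io

row = {
    'location': '1',
    'imagefile': 'PXL_20210725_040405506.jpg',
    'URL': '<img src="https://github.com/aubreymoore/Tinian-cycad-images/raw/main/images/PXL_20210725_040405506.jpg" width="350">',
}

# Write one row to an in-memory CSV file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
print(csv_text)  # the URL field is quoted, with inner quotes doubled

# Reading the text back recovers the original tag unchanged.
read_back = next(csv.DictReader(io.StringIO(csv_text)))
assert read_back['URL'] == row['URL']
```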

In [10]:
print('FINISHED')

FINISHED
