# 03. Label survey photos
This step defines a function that shows you each of the core photos, one at a time, and prompts you for details about the depth of the color transitions and the color banding in the core. These details are entered as numbers:
1. Green-olive banding shallow associated with transition
2. Red-brown banding shallow associated with transition
3. Purple-black banding shallow associated with transition
4. Green-olive banding deep unassociated with transition
5. Red-brown banding deep unassociated with transition
6. Purple-black banding deep unassociated with transition
7. Deep oxidative color transition (> 40 cm depth)
8. Shallow oxidative color transition (< 40 cm depth)
9. Unusable image
10. No color changes or banding (entered as `0` at the prompt)

Each core can receive multiple numbers to describe its appearance. For example, a core could have a shallow oxidative color transition (8), red-brown banding at that color transition (2), and green banding in deeper core sections, unassociated with the color transition (4).
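The labeling cell further down reads one integer per photo, so one plausible way to record several numbers is to type the digits together (for example `824`). The sketch below decodes such an entry. The concatenated-digit convention, the `LABELS` dictionary, and the helpers `decode_label` and `describe_label` are assumptions made for illustration and are not part of the original notebook.

```python
# Hypothetical helpers: the digit-concatenation convention is an assumption.
LABELS = {
    1: 'Green-olive banding shallow associated with transition',
    2: 'Red-brown banding shallow associated with transition',
    3: 'Purple-black banding shallow associated with transition',
    4: 'Green-olive banding deep unassociated with transition',
    5: 'Red-brown banding deep unassociated with transition',
    6: 'Purple-black banding deep unassociated with transition',
    7: 'Deep oxidative color transition (> 40 cm depth)',
    8: 'Shallow oxidative color transition (< 40 cm depth)',
    9: 'Unusable image',
    0: 'No color changes or banding',
}

def decode_label(label):
    """Split an entry such as 824 (or 824.0 read back from a CSV) into digits."""
    digits = str(int(float(label)))
    return [int(d) for d in digits]

def describe_label(label):
    """Return the description of every code contained in the entry."""
    return [LABELS[code] for code in decode_label(label)]
```

Under this convention `0` is only meaningful on its own, because a leading zero is lost when the entry is stored as an integer.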

Before labeling, a preparatory step converts all of the core images from PDF to JPG, because JPG is the easiest image type to work with in matplotlib. That conversion takes a while (more than 1 hour).

Because this step requires input from the user, **it is ultimately not reproducible**. Classifying all of the core images will take approximately 4 hours.

## Setup
### Import Modules

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from matplotlib.ticker import AutoMinorLocator
# %matplotlib inline
from scipy import signal,interpolate,stats,linalg
# from IPython.display import clear_output
import time
import numpy as np
import seaborn as sns
from PIL import Image
import os
import cv2 as cv
import pickle
import random
from IPython.display import clear_output

### Set Paths
Please change '/Users/danielbabin/GitHub/' to the path leading to the 'Green_Bands' directory on your local machine.

In [16]:
data_path='/Users/danielbabin/GitHub/Green_Bands/Data/'
table_path='/Users/danielbabin/GitHub/Green_Bands/Tables/'
survey_cores_path='/Users/danielbabin/GitHub/Green_Bands/Data/Survey/Photos/'
checkpoints_path='/Users/danielbabin/GitHub/Green_Bands/Data/Checkpoints/'
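
The four paths above share one root, so they could also be derived from a single variable. This is only a sketch: `build_paths` and its `repo_root` argument are hypothetical names, and the folder layout is copied from the cell above.

```python
import os

def build_paths(repo_root):
    """Return the four directories used in this notebook, given the Green_Bands folder.

    Hypothetical helper; the trailing '' makes every path end with a separator,
    so the string concatenation used elsewhere in the notebook keeps working.
    """
    return {
        'data_path': os.path.join(repo_root, 'Data', ''),
        'table_path': os.path.join(repo_root, 'Tables', ''),
        'survey_cores_path': os.path.join(repo_root, 'Data', 'Survey', 'Photos', ''),
        'checkpoints_path': os.path.join(repo_root, 'Data', 'Checkpoints', ''),
    }
```

With this, only the one argument passed to `build_paths` would need to change on a new machine.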

## Prep for labeling

### Import Survey Data

In [20]:
survey_cores=pd.read_csv(checkpoints_path+'core_survey.csv')
# not found: 1252A1H.PDF
survey_cores=survey_cores.drop(survey_cores[survey_cores['Filename']=='1252A1H.PDF'].index)

### Convert PDFs to JPGs
This step is necessary to work with the images in matplotlib.

In [9]:
from pdf2image import convert_from_path

You will need to install poppler. On a Mac, this is easiest to do with Homebrew. Change the variable below to the path to poppler on your machine.

In [13]:
poppler_path='/opt/homebrew/Cellar/poppler/23.10.0/bin' ## change this to the path to poppler on your machine
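
If the exact poppler folder is not known, it may be possible to look it up instead of hard-coding it. The sketch below searches the `PATH` for `pdftoppm`, one of the poppler executables that pdf2image calls. `find_poppler_bin` is a hypothetical helper, and it assumes poppler's `bin` folder is already on your `PATH`.

```python
import os
import shutil

def find_poppler_bin():
    """Return the folder that contains pdftoppm, or None if it is not on PATH."""
    exe = shutil.which('pdftoppm')
    return os.path.dirname(exe) if exe else None
```

A result of `None` means poppler was not found on the `PATH`, in which case the explicit path above is still needed.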

In [14]:
def path_finder(era):
    # Map each drilling-program era to the name of its photo subfolder
    folders={'DSDP':'DSDP','ODP':'ODP',
             'Early IODP':'EarlyIODP','Modern IODP':'ModernIODP'}
    # Fail with a clear message instead of returning undefined paths
    if era not in folders:
        raise ValueError('Unknown era: '+str(era))
    in_path=survey_cores_path+'PDFs/'+folders[era]+'/'
    out_path=survey_cores_path+'JPGs/'+folders[era]+'/'
    return in_path,out_path

In [28]:
# Convert the first page of every non-Modern-IODP PDF to a 500 dpi JPG
for j,i in enumerate(survey_cores.loc[survey_cores['Era']!='Modern IODP','Filename'].dropna().index):
    clear_output(wait=True)
    era=survey_cores.loc[i,'Era']
    filename=survey_cores.loc[i,'Filename']
    in_path,out_path=path_finder(era)
    jpg=convert_from_path(in_path+filename,dpi=500,
                          poppler_path=poppler_path)[0]
    jpg.save(out_path+filename[:-4]+'.jpg','JPEG')
    # Progress counter; the total also counts the Modern IODP files that this loop skips
    print(j,'/',len(survey_cores['Filename'].dropna()))

107 / 2525
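
Because the conversion takes more than an hour, it can help to skip files that were already converted when the cell is re-run. The following is a minimal sketch of that idea; `pending_conversions` is a hypothetical helper that is not part of the original notebook, and it only checks whether the output file exists, not whether it is complete.

```python
import os

def pending_conversions(pairs):
    """Keep only the (pdf_path, jpg_path) pairs whose JPG does not exist yet."""
    return [(pdf, jpg) for pdf, jpg in pairs if not os.path.exists(jpg)]
```

The loop above could build its `(in_path+filename, out_path+filename[:-4]+'.jpg')` pairs first and then convert only the pairs this function returns.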


## Label Photos

In [29]:
# One-time setup: assign each core a random position ('N (new)') so that
# the photos are labeled in scrambled order. Kept commented out after the first run.
# idxs=survey_cores.index.to_list()
# random.shuffle(idxs)
# survey_cores['N']=survey_cores.index
# survey_cores['N (new)']=idxs
# survey_cores_results_scrambled=survey_cores.set_index('N (new)',drop=False).sort_index()
# survey_cores_results_scrambled.to_csv(checkpoints_path+'survey_cores_results_scrambled.csv',index=False)

In [30]:
# Load the scrambled survey table; idxs is used by the labeling loop at the end
survey_cores_results_scrambled=pd.read_csv(checkpoints_path+'survey_cores_results_scrambled.csv',index_col='N (new)')
idxs=survey_cores_results_scrambled['Filename'].dropna()

In [52]:
def check(i):
    survey_cores=pd.read_csv(checkpoints_path+'survey_cores_results_scrambled.csv',index_col='N (new)')
    era=survey_cores.loc[i,'Era']
    filename=survey_cores.loc[i,'Filename']
    leg=survey_cores.loc[i,'Leg/Exp']
    site=survey_cores.loc[i,'Site']
    hole=survey_cores.loc[i,'Hole']
    in_path,out_path=path_finder(era)
    
    if era != 'Modern IODP':
        img=Image.open(out_path+filename[:-4]+'.jpg').rotate(90,expand=True)
    else:
        img=Image.open(out_path+filename).rotate(90,expand=True)
    
    aspect=img.size
    aspect_ratio=aspect[0]/aspect[1]

    fig,ax=plt.subplots(figsize=(25,12/aspect_ratio))

    ax.imshow(img)
    ax.axis('off')
    ax.set_title('Leg/Exp: '+str(leg)+'  Site: '+str(site)+'  Hole: '+str(hole))
    plt.tight_layout()

    ## Question
    print('Image Index: ',i,'\n',
          '(1) Green-olive banding shallow associated with transition\t',
          '(6) Purple-black banding deep unassociated with transition\n',
          '(2) Red-brown banding shallow associated with transition\t',
          '(7) Deep oxidative color transition (> 40 cm depth)\n',
          '(3) Purple-black banding shallow associated with transition\t',
          '(8) Shallow oxidative color transition (< 40 cm depth)\n',
          '(4) Green-olive banding deep unassociated with transition\t',
          '(9) Unusable image\n',
          '(5) Red-brown banding deep unassociated with transition\t',
          '(0) No color changes or banding')
    plt.show(block=False)
    label=int(input())
    plt.close()
    survey_cores.loc[i,'Label']=label
    survey_cores.to_csv(checkpoints_path+'survey_cores_results_scrambled.csv')
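
The `int(input())` call in `check` raises a `ValueError` on a stray keystroke or an empty reply, which stops the labeling loop. One possible guard is sketched below. `read_label` and its `ask` parameter are hypothetical and not part of the original notebook; `ask` exists only so the function can be exercised without a keyboard.

```python
def read_label(ask=input):
    """Keep asking until the reply contains only ASCII digits, then return it as an int."""
    while True:
        reply = ask().strip()
        if reply.isascii() and reply.isdigit():
            return int(reply)
        print('Please enter digits only, for example 8.')
```

Inside `check`, `label=int(input())` could then become `label=read_label()`.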

In [51]:
print('Enter start n: ')
start_n=int(input())
for idx in idxs.loc[start_n:].index:
    check(idx)
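
Labeling takes several hours and is usually spread over more than one session, so the start index has to be remembered between runs. The sketch below shows one way to recover it from the checkpoint file instead. `first_unlabeled` is a hypothetical helper, and it assumes that a missing `Label` value means the photo has not been classified yet.

```python
import pandas as pd

def first_unlabeled(results):
    """Return the first index that has a Filename but no Label, or None if all are done."""
    todo = results[results['Filename'].notna()]
    if 'Label' in todo.columns:
        # Rows that already carry a label are finished
        todo = todo[todo['Label'].isna()]
    return todo.index[0] if len(todo) else None
```

For example, `first_unlabeled(pd.read_csv(checkpoints_path+'survey_cores_results_scrambled.csv',index_col='N (new)'))` would give a candidate value to enter as the start n.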