# IFT-pipeline evaluation table
Now that there is a functional pipeline, we can use it to identify which stages fail and use that to guide development. This notebook checks the output subfolder of each stage, given a site specification table and a results folder.

## Steps that we are checking
1. soit
2. landmask
3. preprocess
4. extractfeatures
5. tracking
6. exportH5

The first four are applied linearly, so the maximum number of successes for step `i` is the number of successes at step `i-1`. Here, a step counts as a success if the task completed without error, i.e., *something* is in the appropriate folder.
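The success criterion above can be sketched as a small helper. This is a minimal sketch under stated assumptions, not the notebook's own code: the names `has_output` and `stage_status` are hypothetical, a missing folder is treated as empty, and ignoring `.DS_Store` mirrors the filtering used in the cells below.

```python
import os
import tempfile

def has_output(folder, ignore=('.DS_Store',)):
    """Return True if `folder` exists and holds at least one entry not in `ignore`."""
    if not os.path.isdir(folder):
        return False  # assumption: a missing folder counts as no output
    return any(name not in ignore for name in os.listdir(folder))

def stage_status(folder, previous_status='pass'):
    """Label a stage 'pass', 'fail', or 'NA' (previous stage did not pass, so this one never ran)."""
    if has_output(folder):
        return 'pass'
    return 'fail' if previous_status == 'pass' else 'NA'

# Toy results tree: soit produced a file, landmasks holds only macOS metadata,
# and preprocess was never created.
with tempfile.TemporaryDirectory() as root:
    soit_dir = os.path.join(root, 'soit')
    landmask_dir = os.path.join(root, 'landmasks')
    os.makedirs(soit_dir)
    os.makedirs(landmask_dir)
    open(os.path.join(soit_dir, 'passes.csv'), 'w').close()
    open(os.path.join(landmask_dir, '.DS_Store'), 'w').close()

    soit = stage_status(soit_dir)
    landmask = stage_status(landmask_dir, previous_status=soit)
    preprocess = stage_status(os.path.join(root, 'preprocess'), previous_status=landmask)

print(soit, landmask, preprocess)
```

Passing the previous stage's status down the chain is what keeps a downstream stage from being counted as a failure when it never had a chance to run.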

In [3]:
import numpy as np
import os
import pandas as pd

In [14]:
region = 'baffin_bay'
site_locations = pd.read_csv('../data/location_specifications/baffin_bay_100km_cases.csv', index_col='location')
results_folder = 'baffin_bay_20240124T1406Z/'
results_loc = '../data/ift_results/' + results_folder

In [16]:
region = 'beaufort_sea'
site_locations = pd.read_csv('../data/location_specifications/beaufort_sea_100km_cases.csv', index_col='location')
results_folder = 'beaufort_sea_20240124T1618Z/'
results_loc = '../data/ift_results/' + results_folder

In [17]:
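# soit successes
site_locations['soit'] = 'NA'
for case in site_locations.index:
    # Ignore macOS metadata so an otherwise empty folder is not counted as a pass
    files = [x for x in os.listdir(results_loc + case + '/soit') if x != '.DS_Store']
    if len(files) > 0:
        site_locations.loc[case, 'soit'] = 'pass'
    else:
        site_locations.loc[case, 'soit'] = 'fail'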

# landmask successes
site_locations['landmask'] = 'NA'
for case in site_locations.index:
    files = [x for x in os.listdir(results_loc + '/' + case + '/landmasks/') if x != '.DS_Store']
    if len(files) != 0:
        site_locations.loc[case, 'landmask'] = 'pass'
    elif site_locations.loc[case, 'soit'] == 'pass':
        site_locations.loc[case, 'landmask'] = 'fail'

# preprocessing successes
# Slightly different check here: the hdf5-files subfolder is always present,
# so it is excluded when looking for preprocess output.
site_locations['preprocess'] = 'NA'
site_locations['extractH5'] = 'NA'
site_locations['tracker'] = 'NA'
for case in site_locations.index:
    files = [x for x in os.listdir(results_loc + '/' + case + '/preprocess/') if x not in ['.DS_Store', 'hdf5-files']]
    if len(files) != 0:
        site_locations.loc[case, 'preprocess'] = 'pass'
        h5files = [x for x in os.listdir(results_loc + '/' + case + '/preprocess/hdf5-files') if x != '.DS_Store']

        # Check h5 and tracker if it passes the preprocess step
        if len(h5files) != 0:
            site_locations.loc[case, 'extractH5'] = 'pass'
        else:
            site_locations.loc[case, 'extractH5'] = 'fail'
        trfiles = [x for x in os.listdir(results_loc + '/' + case + '/tracker') if x != '.DS_Store']
        if len(trfiles) != 0:
            site_locations.loc[case, 'tracker'] = 'pass'
        else:
            site_locations.loc[case, 'tracker'] = 'fail'

    elif site_locations.loc[case, 'soit'] == 'pass':
        if site_locations.loc[case, 'landmask'] == 'pass':
            site_locations.loc[case, 'preprocess'] = 'fail'

site_locations.loc[:,['soit', 'landmask', 'preprocess', 'extractH5', 'tracker']].to_csv(results_loc + region + '_evaluation_table.csv')

In [18]:
results = pd.concat([
    (site_locations.loc[:,['soit', 'landmask', 'preprocess', 'extractH5', 'tracker']] == 'pass').sum(axis=0),
    (site_locations.loc[:,['soit', 'landmask', 'preprocess', 'extractH5', 'tracker']] == 'fail').sum(axis=0)], axis=1)
results.columns = ['pass', 'fail']
results

| step | pass | fail |
|---|---|---|
| soit | 21 | 0 |
| landmask | 21 | 0 |
| preprocess | 9 | 12 |
| extractH5 | 9 | 0 |
| tracker | 4 | 5 |
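
To read the counts above as per-stage pass rates, one option is to divide the passes by the number of cases that actually reached each stage (pass plus fail, since `NA` cases never ran it). The sketch below is illustrative only: the counts are copied from the table, and the names `counts` and `pass_rates` are hypothetical.

```python
# (pass, fail) counts copied from the results table above
counts = {
    'soit': (21, 0),
    'landmask': (21, 0),
    'preprocess': (9, 12),
    'extractH5': (9, 0),
    'tracker': (4, 5),
}

pass_rates = {}
for stage, (n_pass, n_fail) in counts.items():
    attempted = n_pass + n_fail  # cases that reached this stage
    pass_rates[stage] = n_pass / attempted if attempted else float('nan')
    print(f'{stage:<11} {n_pass:>2}/{attempted:<2} passed ({pass_rates[stage]:.0%})')
```

On the `results` DataFrame itself, the same ratio is `results['pass'] / results.sum(axis=1)`. Read this way, preprocess loses 12 of 21 cases, and the tracker fails on 5 of the 9 cases that reach it.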
