# Organization
The cells in this notebook move, copy, and rename data files so that they follow a standardized naming convention.

## Copy MODIS imagery from the IFT Resources folder
The IFT-Pipeline downloads the truecolor and falsecolor MODIS imagery as well as the landmask. The cell below moves the files to the folder `data/modis/` and names them using the case number, region, size, date, and satellite. The landmask filename omits the date and satellite, since one landmask is shared per case.
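As an illustration of the naming scheme, here is a minimal sketch of how a case folder name maps to the new image filename. The helper name `modis_target_name` and the example dates are hypothetical; the field order follows the cell below.

```python
def modis_target_name(case_folder, satellite, imtype, resolution='250m'):
    """Build the new image filename from an IFT-Pipeline case folder name.

    Hypothetical helper: assumes the folder name has exactly five
    hyphen-separated fields (case number, region, size, start, end).
    """
    cn, region, dx, start, end = case_folder.split('-')
    prefix = '-'.join([cn, region, dx, start])
    return '.'.join([prefix, satellite, imtype, resolution, 'tiff'])


# Example folder name with made-up dates.
example = modis_target_name('001-baffin_bay-100km-20220911-20220912',
                            'aqua', 'truecolor')
```

Because the region uses underscores rather than hyphens, splitting on `-` recovers the fields unambiguously.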

In [None]:
import os
regions_to_transfer = []
for region in regions_to_transfer:
    dataloc = '../data/ift_resources/' + region + '/'
    saveloc = '../data/modis/'
    case_folders = [f for f in os.listdir(dataloc) if '.DS_Store' not in f]
    
    for case in case_folders:
        cn, region, dx, start, end = case.split('-')
        for satellite in ['aqua', 'terra']:
            for imtype in ['falsecolor', 'truecolor']:
                old_path = '.'.join([dataloc + case + '/' + imtype + '/' + start, satellite, imtype, '250m', 'tiff'])
                new_path = saveloc + imtype + '/' + '.'.join(['-'.join([cn, region, dx, start]),
                                                              satellite, imtype, '250m', 'tiff'])
        
                ! mv $old_path $new_path
        # same thing for landmask
        old_path = dataloc + case + '/landmask.tiff'
        new_path = saveloc + 'landmask/' + '-'.join([cn, region, dx, 'landmask.tiff'])
        ! mv $old_path $new_path

## Copy MODIS cloud fraction data from the ebseg output
The ebseg algorithm downloads cloud fraction snapshots for each satellite. We will use these to validate the cloud masks and for the comparison between the two algorithm types.

In [68]:
import os
import shutil
import pandas as pd
regions = pd.read_csv('../data/metadata/region_definitions.csv', index_col='region')
cases = pd.read_csv('../data/metadata/validation_dataset_case_list.csv')
cases['start_date'] = pd.to_datetime(cases['start_date'].values)
dataloc = '../data/ift_data/ebseg_v0/'
saveloc = '../data/modis/'

for row, case in cases.iterrows():
    cn = str(case.case_number).zfill(3)
    region = case.region
    start = case.start_date # check start date format
    end = case.start_date + pd.to_timedelta('1d')
    dx = '100km'
    imtype = 'cloudfraction'
    for satellite in ['aqua', 'terra']:
        case_folder = '-'.join([cn, region, dx, start.strftime('%Y%m%d'), end.strftime('%Y%m%d')])
        case_folder += '-256m/' + '-'.join([region, start.strftime('%Y-%m-%d'), satellite])
        old_path = dataloc + region + '/' + case_folder + '/' + '_img-cloud.tiff'
        new_path = saveloc + imtype + '/' + '.'.join(['-'.join([cn, region, dx, start.strftime('%Y%m%d')]),
                                                      satellite, imtype, '250m', 'tiff'])
        shutil.copy2(old_path, new_path)
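
The cell above calls `shutil.copy2` unconditionally, so a single missing snapshot raises `FileNotFoundError` and stops the loop. A minimal sketch of a more forgiving copy, assuming it is acceptable to skip missing sources; the helper name `copy_if_present` is hypothetical:

```python
import os
import shutil


def copy_if_present(old_path, new_path):
    """Copy old_path to new_path, returning False if the source is missing."""
    if not os.path.isfile(old_path):
        return False
    # Create the destination folder if needed (an assumption; the original
    # cell expects the folder to exist already).
    os.makedirs(os.path.dirname(new_path) or '.', exist_ok=True)
    shutil.copy2(old_path, new_path)
    return True
```

Inside the loop, `if not copy_if_present(old_path, new_path): print(...)` would report the skipped case instead of halting.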

## Copy and rename labeled images from the ebseg output


In [70]:
import os
import shutil
import pandas as pd
regions = pd.read_csv('../data/metadata/region_definitions.csv', index_col='region')
cases = pd.read_csv('../data/metadata/validation_dataset_case_list.csv')
cases['start_date'] = pd.to_datetime(cases['start_date'].values)
dataloc = '../data/ift_data/ebseg_v0/'
saveloc = '../data/ift_images/ebseg_v0/'

for row, case in cases.iterrows():
    cn = str(case.case_number).zfill(3)
    region = case.region
    start = case.start_date # check start date format
    end = case.start_date + pd.to_timedelta('1d')
    dx = '100km'
    imtype = 'ebseg_v0'
    for satellite in ['aqua', 'terra']:
        case_folder = '-'.join([cn, region, dx, start.strftime('%Y%m%d'), end.strftime('%Y%m%d')])
        case_folder += '-256m/' + '-'.join([region, start.strftime('%Y-%m-%d'), satellite])
        old_path = dataloc + region + '/' + case_folder + '/' + 'final.tif'
        new_path = saveloc + '-'.join([cn, region, dx, start.strftime('%Y%m%d'), satellite, imtype]) + '.tiff'
        if os.path.isfile(old_path):
            shutil.copyfile(old_path, new_path)
        else:
            print('File not found for case', cn, '(', region, ')')


File not found for case 000 ( baffin_bay )
File not found for case 082 ( barents_kara_seas )
File not found for case 082 ( barents_kara_seas )
File not found for case 089 ( barents_kara_seas )
File not found for case 144 ( beaufort_sea )
File not found for case 443 ( bering_strait )
File not found for case 443 ( bering_strait )
File not found for case 163 ( bering_strait )
File not found for case 163 ( bering_strait )
File not found for case 450 ( bering_strait )
File not found for case 450 ( bering_strait )
File not found for case 168 ( bering_strait )
File not found for case 168 ( bering_strait )
File not found for case 170 ( bering_strait )
File not found for case 171 ( bering_strait )
File not found for case 200 ( chukchi_east_siberian_seas )
File not found for case 446 ( chukchi_east_siberian_seas )
File not found for case 226 ( chukchi_east_siberian_seas )
File not found for case 226 ( chukchi_east_siberian_seas )
File not found for case 234 ( chukchi_east_siberian_seas )
...

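When the full image is cloud-covered, ebseg is expected to produce no segmented image, so a missing file for such a case is not necessarily an error. The message is printed once per satellite, so a case listed twice is missing output for both aqua and terra.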
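
Since the printed message does not name the satellite, it can help to record the missing files instead of only printing them. A minimal sketch, assuming the copy loop appends a `(case_number, region, satellite)` tuple to a list in its `else` branch; the helper name and the example records are hypothetical:

```python
from collections import defaultdict


def group_missing(missing):
    """Map (case_number, region) to the sorted satellites with no output file."""
    grouped = defaultdict(list)
    for cn, region, satellite in missing:
        grouped[(cn, region)].append(satellite)
    return {key: sorted(sats) for key, sats in grouped.items()}


# Hypothetical records; which satellite is missing for a given case
# is not recoverable from the printed output above.
missing = [
    ('082', 'barents_kara_seas', 'aqua'),
    ('082', 'barents_kara_seas', 'terra'),
    ('089', 'barents_kara_seas', 'terra'),
]
summary = group_missing(missing)
```

The resulting dictionary can then be compared against the cloud fraction images to check whether each missing case is in fact fully cloud-covered.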

# Rename labeled PNG files to use standardized convention
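Some manually labeled images use an older filename convention that joins the fields with underscores. The updated convention joins them with hyphens, so splitting the filename on `-` recovers the case number, region, date, satellite, and image type.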

1. Case 379: missing aqua labeled landfast
2. Cases 000, 005, 007, 008, 016, 019: missing labeled landfast and landmask
3. Case 008: missing terra labeled floes
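
With the hyphenated convention, a filename can be split back into its fields. A minimal sketch; the helper name `parse_label_filename` and the example date are hypothetical, and the function assumes exactly five hyphen-separated fields:

```python
import os


def parse_label_filename(filename):
    """Split a standardized label filename into its five fields."""
    stem, _ext = os.path.splitext(os.path.basename(filename))
    cn, region, date, satellite, imtype = stem.split('-')
    return {'case_number': cn, 'region': region, 'date': date,
            'satellite': satellite, 'imtype': imtype}


info = parse_label_filename(
    '082-barents_kara_seas-20100605-terra-labeled_floes.png')
```

The same split is ambiguous under the older underscore convention, because region names such as `barents_kara_seas` and image types such as `labeled_floes` contain underscores themselves.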

In [1]:
import os
import shutil
import pandas as pd
regions = pd.read_csv('../data/metadata/region_definitions.csv', index_col='region')
cases = pd.read_csv('../data/metadata/validation_dataset_case_list.csv')
cases['start_date'] = pd.to_datetime(cases['start_date'].values)
dataloc = '../data/validation_images/'

for imtype in ['labeled_floes', 'labeled_landfast', 'landmask']:
    for row, case in cases.iterrows():
        cn = str(case.case_number).zfill(3)
        region = case.region
        start = case.start_date
        for satellite in ['aqua', 'terra']:
            if 'barents' in region:
                old_path = dataloc + imtype + '_png/' + '_'.join([cn, 'barents-kara_seas', start.strftime('%Y%m%d'), satellite, imtype]) + '.png'
            elif 'chukchi' in region:
                old_path = dataloc + imtype + '_png/' + '_'.join([cn, 'chukchi-east_siberian_sea', start.strftime('%Y%m%d'), satellite, imtype]) + '.png'
            else:
                old_path = dataloc + imtype + '_png/' + '_'.join([cn, region, start.strftime('%Y%m%d'), satellite, imtype]) + '.png'
            new_path = dataloc + imtype + '_png/' + '-'.join([cn, region, start.strftime('%Y%m%d'), satellite, imtype]) + '.png'
            if os.path.isfile(old_path):
                shutil.copyfile(old_path, new_path)
                

# Merging validation tables and algorithmic metadata

In [71]:
import os  # needed for os.listdir in the next cell
import pandas as pd

In [96]:
cases = pd.read_csv('../data/metadata/validation_dataset_case_list.csv')
cases['mean_sea_ice_concentration'] = cases['mean_sea_ice_concentration'].round(3)
cases['case_number'] = [str(x).zfill(3) for x in cases['case_number']]

vtables = []
for file in os.listdir('../data/validation_tables/'):
    if '.csv' in file:
        vtables.append(pd.read_csv('../data/validation_tables/' + file))
vtables = pd.concat(vtables)
vtables['case_number'] = [str(x).zfill(3) for x in vtables['case_number']]

In [97]:
merge_keys = ['case_number', 'region', 'start_date']
validation_dataset_metadata = cases.merge(vtables, on=merge_keys, how='outer').drop(columns='notes')
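
An outer merge keeps rows that appear in only one table, which can hide mismatched keys. A minimal sketch of one way to check this, using the `indicator=True` option of `DataFrame.merge`; the two small tables are made-up stand-ins for `cases` and `vtables`:

```python
import pandas as pd

# Made-up stand-ins for the case list and the validation tables.
cases = pd.DataFrame({
    'case_number': ['001', '002', '003'],
    'region': ['baffin_bay', 'baffin_bay', 'beaufort_sea'],
    'start_date': ['2010-05-01', '2011-06-02', '2012-07-03'],
})
vtables = pd.DataFrame({
    'case_number': ['001', '002', '004'],
    'region': ['baffin_bay', 'baffin_bay', 'beaufort_sea'],
    'start_date': ['2010-05-01', '2011-06-02', '2013-08-04'],
    'visible_floes': ['yes', 'no', 'yes'],
})

keys = ['case_number', 'region', 'start_date']
# indicator=True adds a `_merge` column: 'both', 'left_only', or 'right_only'.
merged = cases.merge(vtables, on=keys, how='outer', indicator=True)
unmatched = merged[merged['_merge'] != 'both']
merge_counts = merged['_merge'].value_counts().to_dict()
```

Any rows in `unmatched` point to a case whose number, region, or start date differs between the two sources.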

In [100]:
(validation_dataset_metadata.visible_floes == 'yes').sum()

231