# Data Condensing

As an experimentalist, it is important to keep data portable. If all the data needed for a manuscript can travel with us (for example via a cloud storage service such as Google Drive), we can work anywhere, at any time, as long as an internet connection is available. More importantly, once a paper is published, portable data are much easier to share with other researchers, which makes the data more valuable.

My current research relies heavily on large videos, which makes it hard to keep all the data portable. However, the raw videos themselves are never necessary. Analyzing the videos (for example with PIV and PTV) yields the essential data, which take up far less storage space and fit easily on a cloud drive.

Currently, I save my data on a local hard drive. All the data are organized in folders with the following structure:

- Date
    - raw images (one folder per video, numbered from 0 to the total number of videos in the day)
        - 01
        - 02
        - ...
    - data from analysis
        - piv_imseq (PIV)
        - df2_kinetics (density fluctuations analysis - kinetics)
        - ...

This cleanly separates raw data from analysis data. Ideally, the analysis data would be the portable part. In practice this is not very feasible, because the analysis data are still large (for example, the folder 08032020 holds 14.3 GB of analysis data at the time of writing). However, not all of these data are used when visualizing results. For example, I don't need the flow field at every frame, only a few frames for illustration. Other PIV flow fields may be used only to compute the evolution of energy and flow order, which condenses the detailed flow field of each frame into a single number.
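
To illustrate the last point, here is a minimal sketch of condensing per-frame flow fields into one number per frame. It assumes each frame is given as two flat sequences `u` and `v` of velocity components, and it uses the mean of (u² + v²)/2 as the energy; both the data layout and this definition are assumptions for illustration, not necessarily what my analysis code does. The names `mean_kinetic_energy` and `condense_frames` are hypothetical.

```python
import csv


def mean_kinetic_energy(u, v):
    """Mean of (u^2 + v^2) / 2 over all grid points of one frame."""
    return sum(ui * ui + vi * vi for ui, vi in zip(u, v)) / (2 * len(u))


def condense_frames(frames, out_csv):
    """Write one row per frame: the frame index and its mean kinetic energy.

    frames -- iterable of (u, v) pairs, each a flat sequence of floats
    out_csv -- path of the small CSV file to write
    """
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["frame", "energy"])
        for i, (u, v) in enumerate(frames):
            writer.writerow([i, mean_kinetic_energy(u, v)])
```

A table like this has one row per frame regardless of the PIV grid size, which is why it stays small enough to be portable.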

For important data, I always write code that summarizes each day of experiments in a "summary.csv" file. Here, I write a function that copies all the "summary.csv" files to the research project folder on my Google Drive.

In [1]:
import shutil
import os
from myImageLib import dirrec

In [2]:
def copy_summary(src_folder, dest_folder, sub_folders, file_list=('summary.csv',)):
    """
    Copy summary files to another folder (mainly for cloud drive storage),
    preserving the folder structure below src_folder.

    Args:
    src_folder -- source folder
    dest_folder -- destination folder
    sub_folders -- subfolders of the source folder to search for files
    file_list -- names of the files to copy (default: only 'summary.csv')

    Returns:
    None
    """
    for sf in sub_folders:
        src = os.path.join(src_folder, sf)
        for file in file_list:
            # dirrec returns the paths of all files named `file` found under src
            for src_file in dirrec(src, file):
                # mirror the path relative to src_folder under dest_folder
                rel_path = os.path.relpath(src_file, src_folder)
                dest_file = os.path.join(dest_folder, rel_path)
                # create the destination folder if it does not exist yet
                os.makedirs(os.path.dirname(dest_file), exist_ok=True)
                shutil.copyfile(src_file, dest_file)
                print('Copy file ' + os.sep + rel_path)
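
`dirrec` comes from my own `myImageLib`, which may not be available elsewhere. As a rough stand-in, the sketch below uses only the standard library. It assumes that `dirrec(folder, name)` returns the full paths of all files matching `name` anywhere under `folder`; the function name `find_files` is hypothetical, and the real `dirrec` may differ in details such as ordering.

```python
import fnmatch
import os


def find_files(folder, pattern):
    """Return the sorted full paths of files under `folder` whose names match `pattern`.

    `pattern` may be a plain file name such as 'summary.csv' or a
    shell-style wildcard such as '*.csv'.
    """
    matches = []
    # os.walk visits `folder` and every subfolder below it
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                matches.append(os.path.join(root, name))
    return sorted(matches)
```

Under that assumption, replacing `dirrec(src, file)` with `find_files(src, file)` in `copy_summary` should copy the same files.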

In [5]:
# test copy_summary
src_folder = r'E:\moreData'
dest_folder = r'E:\Google Drive\Research projects\DF\data\level-2-data'
sub_folders = ['08032020', '08042020', '08052020', '08062020']
file_list = ['summary.csv', 'kinetics_data.csv', 'intensity.csv', 'energy_order.csv']
copy_summary(src_folder, dest_folder, sub_folders, file_list=file_list)

Copy file \08032020\cav_imseq\summary.csv
Copy file \08032020\corr_imseq\summary.csv
Copy file \08032020\df2_kinetics\summary.csv
Copy file \08032020\div_x_dcadv\summary.csv
Copy file \08032020\energy_spectrum\summary.csv
Copy file \08032020\energy_spectrum_10\summary.csv
Copy file \08032020\energy_spectrum_varstep\summary.csv
Copy file \08032020\df2_kinetics\00\kinetics_data.csv
Copy file \08032020\df2_kinetics\01\kinetics_data.csv
Copy file \08032020\df2_kinetics\02\kinetics_data.csv
Copy file \08032020\df2_kinetics\03\kinetics_data.csv
Copy file \08032020\df2_kinetics\04\kinetics_data.csv
Copy file \08032020\df2_kinetics\05\kinetics_data.csv
Copy file \08032020\df2_kinetics\06\kinetics_data.csv
Copy file \08032020\df2_kinetics\07\kinetics_data.csv
Copy file \08032020\df2_kinetics\08\kinetics_data.csv
Copy file \08032020\df2_kinetics\09\kinetics_data.csv
Copy file \08032020\df2_kinetics\10\kinetics_data.csv
Copy file \08032020\df2_kinetics\11\kinetics_data.csv
Copy file \08032020\df2

Copy file \08062020\energy_order\00\energy_order.csv
Copy file \08062020\energy_order\01\energy_order.csv
Copy file \08062020\energy_order\02\energy_order.csv
Copy file \08062020\energy_order\03\energy_order.csv
Copy file \08062020\energy_order\04\energy_order.csv
Copy file \08062020\energy_order\05\energy_order.csv
Copy file \08062020\energy_order\06\energy_order.csv
Copy file \08062020\energy_order\07\energy_order.csv
Copy file \08062020\energy_order\08\energy_order.csv
Copy file \08062020\energy_order\09\energy_order.csv
Copy file \08062020\energy_order\10\energy_order.csv
Copy file \08062020\energy_order\11\energy_order.csv
Copy file \08062020\energy_order\12\energy_order.csv
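
After copying, it can be useful to check how much space the condensed data actually take, to confirm that they are small enough for a cloud drive. The following is a minimal sketch using only the standard library; the helper name `folder_size` is hypothetical.

```python
import os


def folder_size(folder):
    """Return the total size in bytes of all files under `folder`, searched recursively."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total
```

For example, `folder_size(dest_folder) / 1024**3` gives the size of the copied data in GiB, which can be compared with the size of the full analysis folders.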
