# Profile processing Step 2: Building the master particle size dataset

This step associates a date, time, depth, salinity, and temperature with every particle identified through image processing. The notebook uses the files generated in step 1 along with a single master data file of all particles identified in the cast, which resides in the ```0_analysis_output``` folder of the cast of interest. The only user input required is the folder path for the cast/profile to analyze and the name of the master particle size file in the ```0_analysis_output``` folder. The name should be ```001.csv```, but this may depend on how the image processing was carried out.
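
Because the master file name may differ from ```001.csv```, it can help to list the CSV files in ```0_analysis_output``` before setting the user input. The following is a minimal sketch, not part of the original notebook; the helper name `list_particle_files` and its `cast_dir` argument are hypothetical.

```python
import glob
import os


def list_particle_files(cast_dir):
    """Return the sorted CSV file names found in <cast_dir>/0_analysis_output."""
    pattern = os.path.join(cast_dir, "0_analysis_output", "*.csv")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))
```

If this notebook has already been run for the cast, the list will also contain ```particle_profile_data.csv```, which is the output of this step rather than the master particle file.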

The images must already have been processed to generate the ```0_analysis_output``` folder and the particle data it contains before this notebook is run. The main output of this step is the file ```particle_profile_data.csv```, which is saved to the ```0_analysis_output``` folder. It is identical to ```001.csv``` except that depth, salinity, and temperature data are added for each identified particle. Processing step 3 uses ```particle_profile_data.csv``` to examine and output profile data.
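
The idea of attaching CTD values to each particle can be illustrated with a key-based merge on the image number. This is a sketch on made-up data and an alternative to the positional loop the notebook uses, not the notebook's own method; the toy values and the `total_depth` figure are assumptions for illustration. The `z [m]` column mirrors the notebook's calculation of total depth minus depth.

```python
import pandas as pd

# Toy particle table: two particles in image 1, one in image 2, one in image 3.
particles = pd.DataFrame({"ImgNo": [1, 1, 2, 3], "Area": [363, 519, 142, 264]})

# Toy per-image CTD values; image 3 has no CTD record.
ctd_by_image = pd.DataFrame({
    "ImgNo": [1, 2],
    "Depth [m]": [1.0, 2.5],
    "T [Celsius]": [10.8, 10.6],
    "PSU": [30.2, 30.1],
})

total_depth = 14.0  # hypothetical total water depth in metres

# A left merge keeps every particle; particles without CTD data get NaN.
merged = particles.merge(ctd_by_image, on="ImgNo", how="left")

# Same calculation as the notebook's z [m] column: total depth minus depth.
merged["z [m]"] = total_depth - merged["Depth [m]"]
```

Merging on a key does not depend on the particle rows being sorted by image number, and particles from images without CTD data are left as NaN instead of zero.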

In [1]:
%config InlineBackend.figure_format='retina' # high-res plots for a Retina display
import numpy as np
import pandas as pd
import os
import glob

In [2]:
""" --- User input ----------------------------- """
particledata = '/0_analysis_output/001.csv' #dataframe with all of the particle data
"""  ------------------------------------------- """

# files that should be present in the castpath if step 1 of the processing was completed.
# pathfile = '/Users/strom-adm/My Drive/Floc-Processing/Code/1_Profile_Processing/0_CastPath.csv'
pathfile = '0_CastPath.csv'
castpath = pd.read_csv(pathfile).profile_path[0]+'/'
imagetimes = 'ImageTime.csv'
ctdtimeseries = 'CTD-timeseries.csv'  # raw CTD time series file
ctdprofile = 'CTD-profile.csv'        # CastAway processed CTD profile file
depth_file = 'Depth.csv'

# read in the data 

if os.path.exists(castpath + particledata):
    pdata_master = pd.read_csv(castpath+particledata)       
    ImageTime_df = pd.read_csv(castpath+imagetimes)  
    CTD_df = pd.read_csv(castpath+ctdtimeseries)  
    ProcessedCTD = pd.read_csv(castpath+ctdprofile)
    totaldepth = pd.read_csv(castpath+depth_file)['Depth [m]'][0]
else:
    print('Particle data needs to be processed to create folder "0_analysis_output"')
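
The cell above checks only for the particle file, so a missing CTD or depth file surfaces later as a read error. A small pre-check along the following lines could report every missing input at once. This is a sketch under the assumption that the five file names used above are the full set of inputs; `EXPECTED_INPUTS` and `missing_inputs` are hypothetical names, not part of the original notebook.

```python
import os

# Input files named in the cell above, relative to the cast folder.
EXPECTED_INPUTS = [
    "0_analysis_output/001.csv",
    "ImageTime.csv",
    "CTD-timeseries.csv",
    "CTD-profile.csv",
    "Depth.csv",
]


def missing_inputs(cast_dir, expected=EXPECTED_INPUTS):
    """Return the expected input files that do not exist under cast_dir."""
    return [name for name in expected
            if not os.path.exists(os.path.join(cast_dir, name))]
```

An empty list means all inputs are present; otherwise the returned names indicate which earlier processing step still needs to be run.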


In [3]:
# average ctd data over every second 

# time  = CTD_df['CTD Time'].unique()
avgDepth = np.zeros(len(CTD_df['Time'].unique()))
avgTemp = np.zeros(len(CTD_df['Time'].unique()))
avgSpC = np.zeros(len(CTD_df['Time'].unique()))
avgPSU = np.zeros(len(CTD_df['Time'].unique()))

count = 0
for time in CTD_df['Time'].unique(): 
    avgDepth[count] = np.average(CTD_df['Depth [m]'].where(CTD_df['Time'] == time).dropna())
    avgTemp[count] = np.average(CTD_df['T [Celsius]'].where(CTD_df['Time'] == time).dropna())
    avgSpC[count] = np.average(CTD_df['SpC [MicroSiemens/cm]'].where(CTD_df['Time'] == time).dropna())
    count = count + 1

# Get the PSU value from the processed CTD data. This is currently done by matching each
# averaged raw conductance to the nearest conductivity in the processed CTD profile.

count = 0
for spc in avgSpC:
    indexmatch = (ProcessedCTD['Conductivity (MicroSiemens per Centimeter)']-spc).abs().argsort()[0]
    avgPSU[count] = ProcessedCTD.iloc[indexmatch]['Salinity (Practical Salinity Scale)']
    count = count+1

# create dataframe with average data and map to image_time then map avg data to superfolder 

time  = CTD_df['Time'].unique()
columns = ["Image Time","Depth [m]", "T [Celsius]", "SpC [MicroSiemens/cm]","PSU"]
data = np.array([time, avgDepth, avgTemp,avgSpC,avgPSU]).T
df_ctd_x = pd.DataFrame(data=data, columns=columns)
df_ctd_x = pd.merge(ImageTime_df, df_ctd_x, how='inner', left_on='Image Time', right_on='Image Time')
# rename() returns a new DataFrame, so the result must be assigned back to take effect
df_ctd_x = df_ctd_x.rename(columns={'0': "Image File"})

#create a matrix "pre_master" with average data for each particle to append to pdata_master 

pre_master = np.zeros((len(pdata_master),5)) # number of particles in master datafile x 5 for the new columns
partcount = 0

# Use the shorter of the CTD and image series. If there are more images than CTD records,
# the last identified particles will not have CTD info.
end = min(len(ImageTime_df), len(df_ctd_x))

for i in range(end): # loop over images; the number of images can exceed the number of unique CTD seconds
    df_ctd_y = np.array(df_ctd_x.iloc[i,1:]) # the data from df_ctd_x associated with the image name/time
    NoParticles = len(pdata_master[(pdata_master['ImgNo'] == i+1)]) #number of particles in image
    
    pre_master[partcount:partcount+NoParticles,:] = df_ctd_y
    partcount = partcount + NoParticles
    
pre_master_df = pd.DataFrame(data=pre_master, columns=columns)

pdata = pd.concat([pdata_master, pre_master_df], axis=1)

pdata.insert(loc=22, column='z [m]', value=totaldepth - pdata['Depth [m]'])

pdata.to_csv(os.path.join(castpath,'0_analysis_output/particle_profile_data.csv' ),index=False)

pdata

| | Number | ImgNo | NoInTot | Area | MeanGreyValue | StdDev | MinGreyValue | MaxGreyValue | Perimeter | BX | ... | Circularity | AR | Round | Solidity | Image Time | Depth [m] | z [m] | T [Celsius] | SpC [MicroSiemens/cm] | PSU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13 | 1 | 13 | 363 | 43.003 | 19.153 | 2 | 102 | 86.912 | 1582 | ... | 0.604 | 1.815 | 0.551 | 0.832 | 4.172022e+12 | 1.085517 | 12.988472 | 10.809676 | 34103.443594 | 30.202538 |
| 1 | 15 | 1 | 15 | 519 | 37.225 | 18.180 | 2 | 110 | 130.711 | 3494 | ... | 0.382 | 2.774 | 0.360 | 0.706 | 4.172022e+12 | 1.085517 | 12.988472 | 10.809676 | 34103.443594 | 30.202538 |
| 2 | 38 | 1 | 38 | 142 | 41.423 | 19.069 | 4 | 103 | 56.527 | 2124 | ... | 0.558 | 2.016 | 0.496 | 0.789 | 4.172022e+12 | 1.085517 | 12.988472 | 10.809676 | 34103.443594 | 30.202538 |
| 3 | 46 | 1 | 46 | 264 | 51.905 | 21.654 | 3 | 108 | 76.912 | 455 | ... | 0.561 | 1.866 | 0.536 | 0.795 | 4.172022e+12 | 1.085517 | 12.988472 | 10.809676 | 34103.443594 | 30.202538 |
| 4 | 49 | 1 | 49 | 115 | 61.878 | 22.898 | 5 | 113 | 44.870 | 1695 | ... | 0.718 | 1.950 | 0.513 | 0.855 | 4.172022e+12 | 1.085517 | 12.988472 | 10.809676 | 34103.443594 | 30.202538 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 29434 | 265 | 656 | 199560 | 486 | 54.123 | 19.843 | 2 | 110 | 101.154 | 2253 | ... | 0.597 | 1.245 | 0.803 | 0.831 | 4.172022e+12 | -0.009356 | 14.083345 | 10.440586 | 503.050465 | 30.130700 |
| 29435 | 271 | 656 | 199566 | 65 | 61.015 | 23.116 | 4 | 104 | 35.799 | 2349 | ... | 0.637 | 2.106 | 0.475 | 0.818 | 4.172022e+12 | -0.009356 | 14.083345 | 10.440586 | 503.050465 | 30.130700 |
| 29436 | 276 | 656 | 199571 | 129 | 60.783 | 23.437 | 15 | 113 | 53.213 | 2362 | ... | 0.572 | 2.858 | 0.350 | 0.881 | 4.172022e+12 | -0.009356 | 14.083345 | 10.440586 | 503.050465 | 30.130700 |
| 29437 | 277 | 656 | 199572 | 324 | 59.265 | 22.736 | 8 | 118 | 84.811 | 2037 | ... | 0.566 | 2.108 | 0.474 | 0.812 | 4.172022e+12 | -0.009356 | 14.083345 | 10.440586 | 503.050465 | 30.130700 |
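
The per-second averaging and the nearest-conductivity salinity lookup in cell 3 can also be written without explicit loops. The sketch below shows one vectorized way to do this on made-up data; it reuses the column names from the notebook, but the values are invented and this is an illustration, not a drop-in replacement.

```python
import numpy as np
import pandas as pd

# Toy raw CTD series: several samples per time stamp (values are made up).
ctd = pd.DataFrame({
    "Time": [100, 100, 101, 101, 101],
    "Depth [m]": [1.0, 1.2, 2.0, 2.2, 2.4],
    "SpC [MicroSiemens/cm]": [500.0, 520.0, 34000.0, 34100.0, 34200.0],
})

# Toy processed profile relating conductivity to salinity.
profile = pd.DataFrame({
    "Conductivity (MicroSiemens per Centimeter)": [400.0, 20000.0, 34000.0],
    "Salinity (Practical Salinity Scale)": [0.2, 18.0, 30.2],
})

# One row per unique time stamp, averaging all samples that share it.
# sort=False keeps the time stamps in order of first appearance.
per_second = ctd.groupby("Time", sort=False).mean().reset_index()

# For each averaged conductance, find the profile row with the closest conductivity.
cond = profile["Conductivity (MicroSiemens per Centimeter)"].to_numpy()
spc = per_second["SpC [MicroSiemens/cm]"].to_numpy()
nearest = np.abs(cond[None, :] - spc[:, None]).argmin(axis=1)

# Look up the salinity of the matched profile rows.
per_second["PSU"] = profile["Salinity (Practical Salinity Scale)"].to_numpy()[nearest]
```

The pairwise difference array has one row per time stamp and one column per profile row, which is fine for a single cast but would grow quickly for very long series.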
