**Author:** C Mitchell

# Background / Context

This notebook subtracts the medium blanks from the PIC values of all samples.

We tested four combinations: two filter pore sizes and two filter rinses. These are listed in the table below, along with the two-letter label for each combination (first letter for filter size, second letter for rinse). This label forms the 2nd and 3rd letters of the "Code" name in the dataset. The first letter of the "Code" indicates the culture, and the final number indicates the replicate. For the blanks, the "culture" symbol is "M" (for medium). Note that 4 samples were run twice by UMaine; we assigned these the replicate number 0 to fit the "Code" format.

| Label | Filter size | Rinse |
|---|---|---|
| AX | 0.8 um | NH4OH |
| AY | 0.8 um | K2B4O7 |
| BX | 0.4 um | NH4OH |
| BY | 0.4 um | K2B4O7 |
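
The naming scheme above can be sketched as a small parser. This is only an illustration: `parse_code`, `CODE_PATTERN`, `FILTER_SIZES` and `RINSES` are hypothetical names that are not used elsewhere in this notebook, and the pattern assumes a single uppercase culture letter and a single-digit replicate, which matches the codes shown here but has not been checked against the full dataset.

```python
import re

# Assumed layout: culture letter, filter letter (A/B), rinse letter (X/Y), replicate digit.
CODE_PATTERN = re.compile(r"^([A-Z])([AB])([XY])(\d)$")

# Lookups taken from the label table above.
FILTER_SIZES = {"A": "0.8 um", "B": "0.4 um"}
RINSES = {"X": "NH4OH", "Y": "K2B4O7"}


def parse_code(code):
    """Split a sample code such as 'MAX0' into its components."""
    match = CODE_PATTERN.match(code)
    if match is None:
        raise ValueError(f"Unrecognised code: {code!r}")
    culture, filter_letter, rinse_letter, replicate = match.groups()
    return {
        "culture": culture,
        "filter": FILTER_SIZES[filter_letter],
        "rinse": RINSES[rinse_letter],
        "replicate": int(replicate),
        "is_blank": culture == "M",  # "M" marks medium-only blanks
    }


print(parse_code("MAX0"))
print(parse_code("SBY2"))
```

A parser like this is not needed for the blank subtraction itself, but it can be handy for checking that every code in the dataset follows the expected format.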

# Approach

We want to see the variability within each set of triplicates. So rather than taking the mean of each triplicate set and then subtracting the corresponding mean blank, we calculate the mean blank for each filter/rinse combination and subtract it from each individual replicate.

For example, for SAX, rather than doing $\overline{SAX} - \overline{MAX}$, we'll do:
$SAX1 - \overline{MAX}$, $SAX2 - \overline{MAX}$ and $SAX3 - \overline{MAX}$.
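
As a minimal sketch of why this choice is harmless for the mean but useful for the spread, the toy example below applies both approaches. The numbers are illustrative only and are not taken from the dataset.

```python
import pandas as pd

# Illustrative numbers only, not values from the dataset.
samples = pd.Series([1.20, 1.35, 1.05], index=["SAX1", "SAX2", "SAX3"])
blanks = pd.Series([0.22, 0.26, 0.24], index=["MAX1", "MAX2", "MAX3"])

mean_blank = blanks.mean()

# Approach used here: subtract the mean blank from each replicate.
corrected = samples - mean_blank

# Alternative: average the replicates first, then subtract.
mean_then_subtract = samples.mean() - mean_blank

print(corrected)
print("mean of corrected replicates:", corrected.mean())
print("mean-then-subtract:          ", mean_then_subtract)
print("std of corrected replicates: ", corrected.std())
```

Subtracting a constant shifts every replicate by the same amount, so the mean of the corrected replicates equals the mean-then-subtract value, while the replicate-to-replicate standard deviation is kept for later use.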

# Initialization

In [1]:
import pandas as pd
import re

In [3]:
df = pd.read_csv('data/02-ICPMS-and-PIC.csv')

In [4]:
df

Unnamed: 0,Code,Tube Number,Culture,Name,Ploidity,CaCo3,Filter,Rinse,Comment on filtration,Ca (ug/L),Mg (ug/L),Sr (ug/L),Na (ug/L),Ca sw corr,Ca sw + V corr,PIC mmol/m3,PIC ug/l
0,MAX1,16.0,K/2,Medium only,-,No,0.8,NH4OH,,49.67,3034.62,0.499,580.79,27.474890,9.158297,0.228957,2.747489
1,MAX2,17.0,K/2,Medium only,-,No,0.8,NH4OH,,41.61,640.22,0.472,427.09,25.288594,8.429531,0.210738,2.528859
2,MAX3,18.0,K/2,Medium only,-,No,0.8,NH4OH,,51.11,5276.07,0.495,491.48,32.327906,10.775969,0.269399,3.232791
3,MAX0,18.2,K/2,Medium only,-,No,0.8,NH4OH,,50.73,5300.59,0.472,505.69,31.404865,10.468288,0.261707,3.140487
4,MBX1,19.0,K/2,Medium only,-,No,0.4,NH4OH,,33.51,193.68,0.297,347.40,20.233977,6.744659,0.168616,2.023398
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
73,SAY4,37.0,MIR 02,Syracosphaera pulchra,n,Yes,0.8,K2B4O7,Additional 0.8 K2B4O7.,210.86,223.56,1.938,1499.07,153.620489,51.206830,1.280171,15.362049
74,SBY1,38.0,MIR 02,Syracosphaera pulchra,n,Yes,0.4,K2B4O7,,184.81,253.62,1.728,1649.04,121.844133,40.614711,1.015368,12.184413
75,SBY2,39.0,MIR 02,Syracosphaera pulchra,n,Yes,0.4,K2B4O7,,220.67,220.66,1.613,1317.27,170.372222,56.790741,1.419769,17.037222
76,SAY2,35.0,MIR 02,Syracosphaera pulchra,n,Yes,0.8,K2B4O7,,127.46,92.41,0.781,251.38,117.861470,39.287157,0.982179,11.786147


# Calculating blanks

Pull out the blanks

In [5]:
blank_df = df[df.Code.str.startswith('M')]

Strip the replicate number from each code to get the blank label (e.g. MAX1 becomes MAX)

In [6]:
blank_df = blank_df.assign(label=blank_df.Code.str[:-1])

Calculate mean and standard deviation

In [7]:
mean_blanks = blank_df.groupby('label').mean(numeric_only=True).drop('Tube Number',axis=1)
stdev_blanks = blank_df.groupby('label').std(numeric_only=True).drop('Tube Number',axis=1)
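
One detail worth knowing here is that pandas `.std()` defaults to `ddof=1`, i.e. the sample standard deviation. As a quick check, the sketch below recomputes the MAX statistics by hand from the four PIC (mmol/m3) values shown in the data preview above; `max_blanks` is a hypothetical name used only for this check.

```python
import pandas as pd

# PIC (mmol/m3) for the four MAX blanks, copied from the data preview above.
max_blanks = pd.Series(
    [0.228957, 0.210738, 0.269399, 0.261707],
    index=["MAX1", "MAX2", "MAX3", "MAX0"],
)

mean_blank = max_blanks.mean()
sample_std = max_blanks.std()            # pandas default: ddof=1 (sample standard deviation)
population_std = max_blanks.std(ddof=0)  # shown only for comparison

print(f"mean           = {mean_blank:.6f}")
print(f"std (ddof=1)   = {sample_std:.6f}")
print(f"std (ddof=0)   = {population_std:.6f}")
```

The mean and the `ddof=1` standard deviation should agree with the MAX row of the blank tables to within the rounding of the displayed inputs. With only three or four blanks per group, the choice of `ddof` makes a noticeable difference.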

In [8]:
mean_blanks

label,Filter,Ca (ug/L),Mg (ug/L),Sr (ug/L),Na (ug/L),Ca sw corr,Ca sw + V corr,PIC mmol/m3,PIC ug/l
MAX,0.8,48.28,3562.875,0.4845,501.2625,29.124064,9.708021,0.242701,2.912406
MAY,0.8,40.83,77.63,0.518,512.236667,21.254682,7.084894,0.177122,2.125468
MBX,0.4,32.843333,506.2,0.358667,424.106667,16.635936,5.545312,0.138633,1.663594
MBY,0.4,26.49,54.586667,0.313333,321.016667,14.222226,4.740742,0.118519,1.422223


In [9]:
stdev_blanks

label,Filter,Ca (ug/L),Mg (ug/L),Sr (ug/L),Na (ug/L),Ca sw corr,Ca sw + V corr,PIC mmol/m3,PIC ug/l
MAX,0.0,4.488222,2219.283377,0.014526,63.091159,3.311458,1.103819,0.027595,0.331146
MAY,0.0,4.842406,30.28824,0.161948,235.213776,6.799275,2.266425,0.056661,0.679928
MBX,0.0,2.809953,299.012566,0.111168,164.829175,4.33571,1.445237,0.036131,0.433571
MBY,0.0,13.923947,37.931715,0.240479,320.002373,2.051331,0.683777,0.017094,0.205133


In [10]:
mean_blanks.to_csv('data/03a-PIC-blank-means.csv')
stdev_blanks.to_csv('data/03b-PIC-blank-stdev.csv')

# Blank subtractions

We also carry an uncertainty through with each correction, taken simply as the standard deviation of the corresponding blanks.

In [11]:
labels = ['AX','AY','BX','BY']

In [12]:
corr_df_list = []
for ll in labels:
    # All rows for this filter/rinse label (blanks included; they are dropped afterwards)
    subdf = df[df.Code.str.contains(ll, regex=False)]
    corr_subdf = subdf.copy()
    blank_label = 'M' + ll  # index of the matching row in mean_blanks / stdev_blanks

    # PIC values - blanks
    corr_subdf['PIC mmol/m3'] = subdf['PIC mmol/m3'] - mean_blanks.loc[blank_label, 'PIC mmol/m3']
    corr_subdf['PIC ug/l'] = subdf['PIC ug/l'] - mean_blanks.loc[blank_label, 'PIC ug/l']

    # blanks
    corr_subdf['PIC blank mean mmol/m3'] = mean_blanks.loc[blank_label, 'PIC mmol/m3']
    corr_subdf['PIC blank mean ug/l'] = mean_blanks.loc[blank_label, 'PIC ug/l']

    # blank stdev
    corr_subdf['PIC blank stdev mmol/m3'] = stdev_blanks.loc[blank_label, 'PIC mmol/m3']
    corr_subdf['PIC blank stdev ug/l'] = stdev_blanks.loc[blank_label, 'PIC ug/l']

    corr_df_list.append(corr_subdf)

In [13]:
corr_df = pd.concat(corr_df_list)
corr_df = corr_df[~corr_df.Code.str.startswith('M')]
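
For reference, the same subtraction can probably be written without a loop by looking up each row's blank from its code. The sketch below is an alternative, not the method used in this notebook. It uses toy stand-ins for `df` and `mean_blanks` with illustrative numbers, and it assumes the filter/rinse label always sits in the 2nd and 3rd characters of `Code`.

```python
import pandas as pd

# Toy stand-ins for `df` and `mean_blanks`; the numbers are illustrative only.
df = pd.DataFrame({
    "Code": ["MAX1", "MBY1", "SAX1", "SBY1"],
    "PIC mmol/m3": [0.25, 0.12, 1.25, 1.02],
})
mean_blanks = pd.DataFrame(
    {"PIC mmol/m3": [0.25, 0.12]},
    index=pd.Index(["MAX", "MBY"], name="label"),
)

# Blank label for each row: "M" plus the two filter/rinse letters of the code.
blank_label = "M" + df["Code"].str[1:3]

# Look up the matching mean blank for every row.
blank_mean = blank_label.map(mean_blanks["PIC mmol/m3"])

corr = df.assign(**{
    "PIC blank mean mmol/m3": blank_mean,
    "PIC mmol/m3": df["PIC mmol/m3"] - blank_mean,
})

# Drop the blanks themselves, as in the loop version.
corr = corr[~corr["Code"].str.startswith("M")]
print(corr)
```

Unlike the loop followed by `pd.concat`, this keeps the rows in their original order rather than grouping them by label.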

Adding the PIC blank as a percentage of the blank-corrected PIC

In [14]:
corr_df['PIC blank %'] = (corr_df['PIC blank mean mmol/m3'] / corr_df['PIC mmol/m3']) * 100
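
Note that the denominator here is the blank-corrected PIC, not the raw measurement. As a worked check, the sketch below repeats the calculation for SBY1 using the rounded values displayed in this notebook (raw PIC 1.015368 mmol/m3, MBY mean blank 0.118519 mmol/m3). The variable names are hypothetical, and the percentage relative to the raw value is included only for comparison.

```python
# Values for SBY1 as displayed in this notebook (rounded to 6 decimals).
raw_pic = 1.015368     # PIC mmol/m3 before blank subtraction
blank_mean = 0.118519  # mean MBY blank, PIC mmol/m3

corrected_pic = raw_pic - blank_mean

# Definition used in the notebook: blank relative to the corrected value.
blank_pct = blank_mean / corrected_pic * 100

# For comparison only: blank relative to the raw (uncorrected) value.
blank_pct_of_raw = blank_mean / raw_pic * 100

print(f"corrected PIC     = {corrected_pic:.6f} mmol/m3")
print(f"blank % (notebook definition) = {blank_pct:.3f}")
print(f"blank % of raw value          = {blank_pct_of_raw:.3f}")
```

The first percentage should match the `PIC blank %` reported for SBY1 in the final table to within input rounding. Because the corrected value is smaller than the raw one, this definition always gives the larger of the two percentages.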

And saving the final dataframe:

In [15]:
corr_df.to_csv('data/03-PIC-blank-corrected.csv',index=False)

In [16]:
corr_df

Unnamed: 0,Code,Tube Number,Culture,Name,Ploidity,CaCo3,Filter,Rinse,Comment on filtration,Ca (ug/L),...,Na (ug/L),Ca sw corr,Ca sw + V corr,PIC mmol/m3,PIC ug/l,PIC blank mean mmol/m3,PIC blank mean ug/l,PIC blank stdev mmol/m3,PIC blank stdev ug/l,PIC blank %
13,LAX1,41.0,RCC1151,Calcidiscus leptoporus,n,Yes,0.8,NH4OH,,162.37,...,703.25,135.495035,45.165012,0.886425,10.637097,0.242701,2.912406,0.027595,0.331146,27.379710
14,LAX2,42.0,RCC1151,Calcidiscus leptoporus,n,Yes,0.8,NH4OH,,211.71,...,3136.07,91.863897,30.621299,0.522832,6.273983,0.242701,2.912406,0.027595,0.331146,46.420372
15,LAX3,43.0,RCC1151,Calcidiscus leptoporus,n,Yes,0.8,NH4OH,,153.66,...,2280.64,66.504478,22.168159,0.311503,3.738041,0.242701,2.912406,0.027595,0.331146,77.912629
25,CAX1,53.0,RCC1164,Calcidiscus leptoporus,2n,Yes,0.8,NH4OH,,625.42,...,2372.71,534.745988,178.248663,4.213516,50.562192,0.242701,2.912406,0.027595,0.331146,5.760048
26,CAX2,54.0,RCC1164,Calcidiscus leptoporus,2n,Yes,0.8,NH4OH,,1144.30,...,2203.04,1060.109992,353.369997,8.591549,103.098593,0.242701,2.912406,0.027595,0.331146,2.824875
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
61,UBY2,87.0,RCC1472,Umbilicosphaera foliosa,2n,Yes,0.4,K2B4O7,,164.39,...,508.67,144.950983,48.316994,1.089406,13.072876,0.118519,1.422223,0.017094,0.205133,10.879187
62,UBY3,88.0,RCC1472,Umbilicosphaera foliosa,2n,Yes,0.4,K2B4O7,,137.36,...,188.78,130.145701,43.381900,0.966029,11.592347,0.118519,1.422223,0.017094,0.205133,12.268634
74,SBY1,38.0,MIR 02,Syracosphaera pulchra,n,Yes,0.4,K2B4O7,,184.81,...,1649.04,121.844133,40.614711,0.896849,10.762191,0.118519,1.422223,0.017094,0.205133,13.214992
75,SBY2,39.0,MIR 02,Syracosphaera pulchra,n,Yes,0.4,K2B4O7,,220.67,...,1317.27,170.372222,56.790741,1.301250,15.615000,0.118519,1.422223,0.017094,0.205133,9.108054
