## Joining RI24 series ICP-OES data

### All data are from the SharePoint Winter Watershed EMMA directory: https://uvmoffice.sharepoint.com/:f:/s/Winterwatershed/Em7xsXZomnFJvPg3qm5bpqcB0INgFeGMeJ8m_HdQfwu6Yg?e=9k32Up

### Sample chemistry from LCBP-EMMA/ICPOES
- UVM Ag Lab ICPOES data:
        - A csv with merged UVM RI24 ICP-OES data from multiple runs on a PerkinElmer Optima 7000DV.
        - Al, Ca, Cu, Fe, K, Mg, Mn, Na, P, Si, Zn
        
- LCBP_RI24-ICPOES-Dartmouth-March2025-June2025.csv
        - A csv with merged Dartmouth RI24 ICP-OES data from multiple runs on a Spectro ARCOS operated in axial view.
        - Al, As, Ba, Be, Br, Ca, Cd, Cl, Co, Cr, Cu, Fe, I, K, Li, Mg, Mn, Na, Ni, P, Pb, S, Si, Sr, Ti, V, Zn (not all in summary)
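
Because the two labs report different element lists, concatenating their tables leaves missing values wherever one lab did not measure an analyte. The sketch below illustrates this with tiny stand-in dataframes; the sample IDs and concentrations are hypothetical and only the column-naming pattern (`Element_mg_L`) follows the real files.

```python
import pandas as pd

# Hypothetical stand-ins for the two lab exports (all values are made up).
df_dart = pd.DataFrame({
    "Sample ID": ["RI24-0001", "RI24-0002"],
    "Ca_mg_L": [3.0, 3.1],
    "Sr_mg_L": [0.01, 0.02],   # present in only one table in this toy example
})
df_uvm = pd.DataFrame({
    "Sample ID": ["RI24-0003"],
    "Ca_mg_L": [2.9],
    "Cu_mg_L": [0.001],        # present in only the other table in this toy example
})

# Compare the column sets to see which analytes are shared and which are lab-specific.
dart_cols, uvm_cols = set(df_dart.columns), set(df_uvm.columns)
shared = sorted(dart_cols & uvm_cols)
dart_only = sorted(dart_cols - uvm_cols)
uvm_only = sorted(uvm_cols - dart_cols)
print("shared:", shared)
print("Dartmouth only:", dart_only)
print("UVM only:", uvm_only)

# pd.concat aligns on column names (outer join by default), so lab-specific
# analytes become NaN for rows that came from the other lab.
merged = pd.concat([df_dart, df_uvm], ignore_index=True)
print(merged.isna().sum())
```

Counting missing values per column after the join is a quick way to confirm that gaps come from differing analyte lists rather than from mismatched column names.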

In [35]:
import os
os.chdir("/home/millieginty/OneDrive/git-repos/EMMA/data/newrnet-chemistry/RI24/ICPOES/")

!ls

LCBP_RI24-ICP-for-join.csv
LCBP_RI24-ICPOES-Dartmouth-March2025-June2025.csv
LCBP_RI24-ICPOES-merged-UVMAgLab.csv
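
With one file per lab, any sample analyzed by both labs would appear twice after concatenation. The MED note in the join cell describes resolving this by hand-editing the UVM Ag Lab csv. As an alternative, here is a minimal sketch of doing the same thing in code, assuming Dartmouth values should take precedence. The dataframes and concentrations are hypothetical stand-ins, not values from the real files.

```python
import pandas as pd

# Hypothetical stand-ins; RI24-1267 appears in both tables, values are made up.
df_dart = pd.DataFrame({"Sample ID": ["RI24-1266", "RI24-1267"], "Ca_mg_L": [3.0, 3.1]})
df_uvm = pd.DataFrame({"Sample ID": ["RI24-1267", "RI24-1269"], "Ca_mg_L": [9.9, 2.8]})

# Dartmouth rows are listed first, so keep="first" retains the Dartmouth row
# whenever a Sample ID occurs in both tables.
combined = pd.concat([df_dart, df_uvm], ignore_index=True)
deduped = (
    combined
    .drop_duplicates(subset="Sample ID", keep="first")
    .reset_index(drop=True)
)

# Guard against duplicates slipping through.
assert deduped["Sample ID"].is_unique
print(deduped)
```

This keeps the source csv files untouched, so the precedence rule is recorded in the notebook rather than in a manual edit.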


In [36]:
import pandas as pd

# Read the CSV files; 'Sample ID' is kept as a regular column (not the index)
df_dart = pd.read_csv('LCBP_RI24-ICPOES-Dartmouth-March2025-June2025.csv')
df_uvm = pd.read_csv('LCBP_RI24-ICPOES-merged-UVMAgLab.csv')

# -------------------------
# 1. Check for overlap in Sample ID
# -------------------------
overlap = set(df_dart["Sample ID"]).intersection(set(df_uvm["Sample ID"]))
if overlap:
    print("Overlapping Sample IDs found:", overlap)
else:
    print("No overlapping Sample IDs found.")

# MED note: originally two Sample IDs overlapped (RI24-1267 and RI24-1268).
# I removed the UVM Ag Lab rows for those samples from that lab's csv and kept the Dartmouth values.

# -------------------------
# 2. Harmonize metadata column names
# -------------------------
# Each lab stores its notes under a different column name; rename both to a
# shared name so the notes land in one column after concatenation.
df_dart = df_dart.rename(columns={"Landis Sample ID": "ICPOES-notes-MED"})
df_uvm = df_uvm.rename(columns={"UVM-AgLab-ICP-notes-MED": "ICPOES-notes-MED"})

# -------------------------
# 3. Concatenate into a single dataframe
# -------------------------
df_merged = pd.concat([df_dart, df_uvm], ignore_index=True)

# Save the joined dataframe to a new CSV file
df_merged.to_csv('LCBP_RI24-ICP-for-join.csv', index=False)

# Inspect result
print(df_merged.head())

No overlapping Sample IDs found.
   Sample ID              ICPOES-notes-MED   Al_mg_L   Ba_mg_L   Ca_mg_L  \
0  RI24-1001  DartmouthID_duffy-030625-001  0.037672  0.003183  3.045061   
1  RI24-1002  DartmouthID_duffy-030625-002  0.026739  0.003092  3.063053   
2  RI24-1003  DartmouthID_duffy-030625-003  0.031394  0.002785  3.077050   
3  RI24-1004  DartmouthID_duffy-030625-004  0.034951  0.002742  3.127813   
4  RI24-1005  DartmouthID_duffy-030625-005  0.035640  0.002747  3.189687   

    Fe_mg_L    K_mg_L   Mg_mg_L   Mn_mg_L   Na_mg_L    P_mg_L    S_mg_L  \
0  0.024847  0.162434  0.436963  0.005938  0.495754  0.002390  0.827794   
1  0.020288  0.172870  0.438012  0.005298  0.479873  0.002872  0.847052   
2  0.021433  0.133111  0.441800  0.003858  0.523873  0.004211  0.854399   
3  0.020748  0.151967  0.450640  0.003541  0.547715  0.001425  0.865910   
4  0.022054  0.125264  0.464699  0.004125  0.603753  0.001874  0.909064   

    Si_mg_L   Sr_mg_L   Zn_mg_L  Cu_mg_L  
0  2.582697  0.0