# **Compute Average Spectrum for Each Sample (by Sample ID)**

<div style="margin-top:10px; text-align:justify;">This function averages the replicate spectra for each sample in your dataset.

<br>💡 How it works:<br>

- Groups all replicate columns by sample ID (the part of each column name before the first underscore).

- Calculates the mean spectrum for each sample (i.e. the average intensity across all replicates at each wavelength).

- Creates a new DataFrame with:

    - One "wavelength" column

    - One column per sample, with its average spectrum

✅ Why it’s useful:
- Reduces random noise by averaging replicates

- Prepares data for visualisation or machine learning

- Makes the dataset more compact and easier to interpret

</div>
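
The steps above can be tried on a small toy table before running them on real spectra. This is a minimal sketch, not the notebook's own implementation: the column names (`S2_1`, `S10_1`, ...) and the values are made up for illustration, and it assumes every replicate column is named `<sampleID>_<replicate>`.

```python
import re

import pandas as pd

# Hypothetical toy data: three wavelengths, two samples with 3 and 2 replicates.
toy_df = pd.DataFrame({
    "wavelength": [400, 410, 420],
    "S10_1": [1.0, 2.0, 3.0],
    "S10_2": [3.0, 4.0, 5.0],
    "S2_1": [10.0, 20.0, 30.0],
    "S2_2": [12.0, 22.0, 32.0],
    "S2_3": [14.0, 24.0, 34.0],
})


def natural_key(text):
    # Split into digit and non-digit chunks so that "S2" sorts before "S10".
    return [int(s) if s.isdigit() else s.lower() for s in re.split(r"(\d+)", text)]


replicates = toy_df.drop(columns="wavelength")

# Sample ID = text before the first underscore in each column name.
sample_ids = replicates.columns.str.split("_").str[0]

# For each sample, select its replicate columns and take the row-wise mean,
# i.e. the average intensity at each wavelength.
averaged = {
    sid: replicates.loc[:, sample_ids == sid].mean(axis=1)
    for sid in sorted(set(sample_ids), key=natural_key)
}

toy_averaged = pd.concat([toy_df[["wavelength"]], pd.DataFrame(averaged)], axis=1)
print(toy_averaged)
```

The result has one `wavelength` column followed by one column per sample, with `S2` placed before `S10` because the columns are ordered with a natural sort rather than a plain alphabetical one.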

In [None]:
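import re
import pandas as pd

def natural_key(text):
    # Natural sort key: digit runs compare as numbers, so "S2" sorts before "S10".
    return [int(s) if s.isdigit() else s.lower() for s in re.split(r'(\d+)', text)]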

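def average_intensity_by_sample(df):
    """Average replicate columns named '<sampleID>_<replicate>' into one column per sample."""
    # Every column except 'wavelength' is treated as a replicate measurement.
    measurements = [col for col in df.columns if col != "wavelength"]
    # The sample ID is the part of the column name before the first underscore.
    sample_names = sorted({col.split('_')[0] for col in measurements}, key=natural_key)

    averaged_dfs = []
    for sample in sample_names:
        # The trailing underscore keeps "S1" from also matching "S10_1".
        matching_cols = [col for col in measurements if col.startswith(f"{sample}_")]
        # Row-wise mean: average intensity across replicates at each wavelength.
        averaged = df[matching_cols].mean(axis=1)
        averaged_dfs.append(averaged.rename(sample))

    return pd.concat([df[["wavelength"]]] + averaged_dfs, axis=1)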

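# Store each preprocessing variant under its own name; assigning all three
# results to one variable would keep only the last one (TSN).
SAMPLE_Averaged_SNV_df = average_intensity_by_sample(SAMPLE_SNV_all_df)
SAMPLE_Averaged_MSC_df = average_intensity_by_sample(SAMPLE_MSC_all_df)
SAMPLE_Averaged_TSN_df = average_intensity_by_sample(SAMPLE_TSN_all_df)
# Keep the original name pointing at the TSN result, as it did before.
SAMPLE_Averaged_full_df = SAMPLE_Averaged_TSN_df
# SAMPLE_Averaged_SNV_df
# SAMPLE_Averaged_SNV_df.to_csv("Friederaveraged_SNV.csv", index=False)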