## End member mixing analysis (EMMA) to determine streamflow source contributions

### Finally running an EMMA using linear regression

#### Here we start with solute data from Hungerford Brook late-winter/early-spring flow events sampled with ISCO autosamplers. Data include:
- ICP-OES (Al, Ca, Cu, Fe, K, Mg, Mn, Na, P, Zn, Si)
- IC and total elemental analyser data (Cl, SO4, NO3, PO4, TOC, DIN)
- Stable isotopes (dD, d18O)

Data are from the BREE OneDrive directory (Watershed Data>1_Projects>EMMA>Working file for MATLAB 2023)

- For HB 2022 timeseries, 17 parameters total
- 5 were found to be relatively conservative: Cl, Ca, Na, Si, and Mg
- See "bivariates" notebook for those plots

The code below uses linear regression to estimate the contribution of each endmember to the observed streamflow concentrations during specific storm events.

In [1]:
import os
os.chdir("/home/millieginty/Documents/git-repos/EMMA/")

In [13]:
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Load streamflow data from the CSV file
streamflow_data = pd.read_csv("data/Data_for_EMMA_2022_HB.csv")

# Load potential endmembers from the separate CSV file
endmember_data = pd.read_csv("data/end_members_2022_HB_mean_for_emma.csv")

# Select the specific parameters of interest
selected_parameters = ['Ca_mg_L', 'Cl_mg_L', 'Si_mg_L', 'Na_mg_L', 'Mg_mg_L']

# Extract the subset of data for selected parameters in endmembers
subset_endmembers = endmember_data[selected_parameters]

# Standardize the endmember data (mean=0 and variance=1)
scaler = StandardScaler()
scaled_endmembers = scaler.fit_transform(subset_endmembers)

# Apply PCA to endmember data
pca_endmembers = PCA(n_components=2)
pca_result_endmembers = pca_endmembers.fit_transform(scaled_endmembers)

# Apply PCA to streamflow data
pca_streamflow = PCA(n_components=2)
pca_result_streamflow = pca_streamflow.fit_transform(scaler.transform(streamflow_data[selected_parameters]))

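# Use linear regression to calculate contributions.
# Note: this call raises the ValueError shown after the cell, because X has
# 3 rows (endmembers) while y has 69 rows (stream samples).
model = LinearRegression()
model.fit(pca_result_endmembers, pca_result_streamflow)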

# Get coefficients as contributions
contributions = model.coef_

print("Contributions of endmembers to streamflow:")
print(contributions)

ValueError: Found input variables with inconsistent numbers of samples: [3, 69]
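
The error arises because `LinearRegression.fit(X, y)` requires `X` and `y` to have the same number of rows, whereas here `X` holds 3 endmembers and `y` holds 69 stream samples. One possible way to line the shapes up, sketched below, is to treat each tracer as a row, so that the endmember compositions are the predictors and each stream sample is a target. All values here are hypothetical synthetic numbers, not the Hungerford Brook data. Standardization and the PCA projection are left out for brevity, and this is not necessarily the approach the notebook ends up using.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical endmember compositions: 3 endmembers (rows) x 5 tracers (columns).
endmembers = np.array([
    [10.0, 2.0, 1.0, 3.0, 1.5],
    [40.0, 15.0, 6.0, 9.0, 8.0],
    [25.0, 30.0, 3.0, 20.0, 4.0],
])

# Hypothetical "true" mixing fractions for 4 stream samples (each row sums to 1).
true_fractions = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.1, 0.8],
    [0.4, 0.4, 0.2],
])

# Synthetic, noise-free stream chemistry: 4 samples x 5 tracers.
stream = true_fractions @ endmembers

# Transpose so both inputs have one row per tracer:
#   X is (5 tracers, 3 endmembers), y is (5 tracers, 4 samples).
# fit_intercept=False because a mixing model has no constant offset term.
model = LinearRegression(fit_intercept=False)
model.fit(endmembers.T, stream.T)

# coef_ has shape (n_samples, n_endmembers): one row of fractions per sample.
fractions = model.coef_
print(np.round(fractions, 3))
```

Ordinary least squares does not force the fractions to be non-negative or to sum to one. With noisy field data those conditions generally need to be imposed explicitly, which is the motivation for a constrained optimization.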

In [14]:
pca_result_endmembers

array([[-1.10430701,  1.70119174],
       [-1.42437853, -1.56344972],
       [ 2.52868554, -0.13774202]])

In [15]:
pca_result_streamflow

array([[ 1.15700647e+00, -6.02870826e-02],
       [-1.03186197e+00,  2.30868785e-02],
       [-1.08420317e+00,  8.50689520e-02],
       [-9.89689734e-01,  1.06117241e-02],
       [-8.77731330e-01,  5.63450339e-02],
       [-7.61073082e-01,  9.56165868e-02],
       [ 1.27029031e+00, -1.53849139e-02],
       [ 1.26974597e+00, -2.58840492e-02],
       [ 1.17377780e+00, -6.30172220e-02],
       [-9.77996580e-01,  7.77964294e-02],
       [-9.37789998e-01,  1.47747245e-01],
       [-8.33584584e-01,  9.59147790e-02],
       [-4.64894334e-01, -5.38692489e-03],
       [ 4.64905181e-01, -4.29023690e-02],
       [-8.55736013e-01, -7.80976341e-04],
       [-4.50297884e-01,  1.07932155e-01],
       [ 7.14174415e-01, -8.28279651e-02],
       [-1.00913092e+00,  1.24666774e-01],
       [-9.99854585e-01,  1.53289551e-01],
       [ 2.90692871e-01, -2.13168492e-02],
       [ 9.45148531e-01, -4.66037743e-02],
       [ 7.44449228e-01, -1.37998465e-01],
       [ 1.08831797e+00,  4.29446549e-02],
       [-6.

Next, we use an optimization approach to solve for the contribution fraction matrix, F, by minimizing the sum of squared residuals between the observed mixture, O, and the modeled mixture, E⋅F. In a least-squares sense, this looks like:

$$\min_{F} \; \lVert O - E \cdot F \rVert^{2}$$
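
Written out with the constraints that the optimization cell imposes (non-negative fractions that sum to one), the problem for a single stream sample can be sketched as follows. The dimensions are an assumption, chosen so that the product of E and F is defined: m tracers and k endmembers.

$$
\begin{aligned}
\min_{F \in \mathbb{R}^{k}} \quad & \lVert O - E F \rVert_2^{2}, \qquad O \in \mathbb{R}^{m},\; E \in \mathbb{R}^{m \times k} \\
\text{subject to} \quad & \sum_{j=1}^{k} F_j = 1, \qquad F_j \ge 0 \quad (j = 1, \dots, k)
\end{aligned}
$$

For a whole event with n stream samples, F would become a k × n matrix with one column per sample, and each column can be solved independently.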

In [3]:
import numpy as np
from scipy.optimize import minimize

# Placeholders: replace each [...] with real data before running.
# Left as-is, the cell raises the TypeError shown after it.
O = np.array([...])  # Observed mixture vector, shape (n_tracers,)
E = np.array([...])  # Endmember matrix, shape (n_tracers, n_endmembers)
F_initial_guess = np.array([...])  # Initial guess for fractions, shape (n_endmembers,)

# Define the objective function (sum of squared residuals)
def objective_function(F):
    return np.sum((O - np.dot(E, F))**2)

# Define constraints (fractions should be >= 0 and sum to 1)
constraints = ({'type': 'ineq', 'fun': lambda F: F},
               {'type': 'eq', 'fun': lambda F: np.sum(F) - 1})

# Solve the optimization problem
result = minimize(objective_function, F_initial_guess, constraints=constraints)

# The result.x contains the optimal fractions
optimal_fractions = result.x

TypeError: float() argument must be a string or a number, not 'ellipsis'
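
The TypeError comes from the `[...]` placeholders, which were never replaced with numeric data. A minimal runnable sketch of the same constrained least-squares setup follows, using hypothetical synthetic values for `E` and `F_true` rather than the Hungerford Brook data. It assumes `E` is oriented as tracers by endmembers, expresses non-negativity through `bounds` instead of an inequality constraint, and supplies an analytic gradient. These are illustrative choices, not the notebook's settled method.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical endmember matrix E: 5 tracers (rows) x 3 endmembers (columns).
E = np.array([
    [10.0, 40.0, 25.0],
    [2.0, 15.0, 30.0],
    [1.0, 6.0, 3.0],
    [3.0, 9.0, 20.0],
    [1.5, 8.0, 4.0],
])

# Hypothetical "true" fractions, used only to build a synthetic observation.
F_true = np.array([0.5, 0.3, 0.2])
O = E @ F_true  # observed mixture vector, one value per tracer

# Start from equal contributions of every endmember.
n_endmembers = E.shape[1]
F_initial_guess = np.full(n_endmembers, 1.0 / n_endmembers)

def objective_function(F):
    """Sum of squared residuals between observed and modeled mixture."""
    return np.sum((O - E @ F) ** 2)

def objective_gradient(F):
    """Analytic gradient of the sum of squared residuals with respect to F."""
    return -2.0 * E.T @ (O - E @ F)

# Fractions must sum to 1 (equality constraint) and lie in [0, 1] (bounds).
constraints = ({'type': 'eq', 'fun': lambda F: np.sum(F) - 1.0},)
bounds = [(0.0, 1.0)] * n_endmembers

result = minimize(
    objective_function,
    F_initial_guess,
    jac=objective_gradient,
    method='SLSQP',
    bounds=bounds,
    constraints=constraints,
)

# result.x holds the estimated fractions, one per endmember.
optimal_fractions = result.x
print(np.round(optimal_fractions, 3))
```

To apply this to a full event, the same solve can be repeated for each stream sample, collecting one fraction vector per sample.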