## End member mixing analysis (EMMA) to determine streamflow source contributions

### Finally running an EMMA using linear regression

#### Here we started with solute data from Hungerford Brook late winter/early spring flow events captured with ISCOs. Data include:
- ICP-OES (Al, Ca, Cu, Fe, K, Mg, Mn, Na, P, Zn, Si)
- IC and total elemental analyser data (Cl, SO4, NO3, PO4, TOC, DIN)
- Stable isotopes (dD, d18O)

Data are from the BREE OneDrive directory (Watershed Data>1_Projects>EMMA>Working file for MATLAB 2023)

- For HB 2022 timeseries, 17 parameters total
- 5 were found to be relatively conservative: Cl, Ca, Na, Si, and Mg
- See "bivariates" notebook for those plots
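
One way to screen tracers numerically, alongside the bivariate plots, is to check how linearly each pair of tracers co-varies. The sketch below runs on synthetic data, not the HB 2022 measurements. The `NO3_mg_L` column name, the mixing fraction `f`, and the r² > 0.5 cutoff are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 50

# Hypothetical mixing fraction that drives the two conservative tracers
f = rng.uniform(0, 1, n)

# Synthetic stand-in data (NOT the Hungerford Brook measurements)
demo = pd.DataFrame({
    "Cl_mg_L": 5 + 20 * f + rng.normal(0, 0.5, n),
    "Na_mg_L": 3 + 10 * f + rng.normal(0, 0.3, n),
    "NO3_mg_L": rng.uniform(0, 2, n),  # unrelated to f: non-conservative stand-in
})

# Squared Pearson correlation for every tracer pair
r2 = demo.corr() ** 2

R2_CUTOFF = 0.5  # assumed screening threshold, not taken from the notebook

# Flag pairs (off-diagonal only) whose linear relationship exceeds the cutoff
linear_pairs = [
    (a, b)
    for i, a in enumerate(r2.columns)
    for b in r2.columns[i + 1:]
    if r2.loc[a, b] > R2_CUTOFF
]
print(r2.round(2))
print(linear_pairs)
```

Tracers that appear in many strongly linear pairs are candidates for conservative behavior. The plots remain the better check for curvature or outliers.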

The code below uses linear regression to estimate each endmember's contribution to the observed streamflow concentrations during each storm event.

In [30]:
import os
os.chdir("/home/millieginty/Documents/git-repos/EMMA/")

In [31]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from matplotlib.path import Path
import numpy as np

# Load streamflow data with PCA results

streamwater_data = pd.read_csv("data/Data_for_EMMA_2022_HB.csv")
pca_results = pd.read_csv("analysis/pca_result_streamwater.csv")
endmember_data = pd.read_csv("data/end_members_2022_HB_mean_for_emma.csv")

# Convert the 'Date' column to datetime
streamwater_data['Date'] = pd.to_datetime(streamwater_data['Date'], format='%m/%d/%y')

# Subset data for each storm event
event_A_data = streamwater_data[(streamwater_data['Date'] >= '2022-02-16') & (streamwater_data['Date'] <= '2022-02-23')]
event_B_data = streamwater_data[(streamwater_data['Date'] >= '2022-03-05') & (streamwater_data['Date'] <= '2022-03-10')]
event_C_data = streamwater_data[(streamwater_data['Date'] >= '2022-03-16') & (streamwater_data['Date'] <= '2022-03-22')]
event_D_data = streamwater_data[(streamwater_data['Date'] >= '2022-04-07') & (streamwater_data['Date'] <= '2022-04-11')]

# Select the specific parameters of interest
selected_parameters = ['Ca_mg_L', 'Cl_mg_L', 'Si_mg_L', 'Na_mg_L', 'Mg_mg_L']

# Linear regression

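def calculate_contributions(event_data, pca_results, endmember_data):
    # Extract relevant PCA results for the event (not used in the regression below)
    event_pca = pca_results.loc[event_data.index]

    # Standardize tracers with the event's streamwater mean and standard deviation,
    # then apply the same scaling to the endmembers so both share one tracer space
    scaler = StandardScaler()
    scaled_stream = scaler.fit_transform(event_data[selected_parameters])
    scaled_endmembers = scaler.transform(endmember_data[selected_parameters])

    # Fit linear regression model with tracers as rows:
    # X is (n_tracers, n_endmembers), y is (n_tracers, n_samples)
    model = LinearRegression()
    model.fit(scaled_endmembers.T, scaled_stream.T)

    # Get coefficients as contributions, shape (n_samples, n_endmembers).
    # These are unconstrained: they need not lie in [0, 1] or sum to 1.
    contributions = model.coef_

    return contributions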

# Calculate contributions for each event
contributions_A = calculate_contributions(event_A_data, pca_results, endmember_data)
contributions_B = calculate_contributions(event_B_data, pca_results, endmember_data)
contributions_C = calculate_contributions(event_C_data, pca_results, endmember_data)
contributions_D = calculate_contributions(event_D_data, pca_results, endmember_data)
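
To make the array shapes in this kind of regression explicit, here is a minimal sketch on made-up numbers. The endmember concentrations and the "true" fractions are hypothetical, and `fit_intercept=False` is a choice made for this sketch so that the coefficients can be read directly as mixing fractions. Each tracer is treated as one observation (row), and each endmember as one predictor (column).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical endmember concentrations: rows = endmembers, columns = 5 tracers
E = np.array([
    [40.0, 30.0, 6.0, 15.0, 10.0],  # endmember 1
    [5.0, 2.0, 0.5, 1.0, 1.0],      # endmember 2
    [20.0, 60.0, 3.0, 35.0, 4.0],   # endmember 3
])

# Hypothetical mixing fractions for two stream samples (each row sums to 1)
true_fractions = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
])

# Noise-free synthetic stream samples: shape (n_samples, n_tracers)
stream = true_fractions @ E

# Regress with tracers as rows: X is (n_tracers, n_endmembers),
# y is (n_tracers, n_samples)
model = LinearRegression(fit_intercept=False)
model.fit(E.T, stream.T)

# coef_ has shape (n_samples, n_endmembers): one row of fractions per sample
fractions = model.coef_
print(fractions.round(3))
```

With noisy field data the recovered coefficients can fall outside [0, 1] or fail to sum to 1, since ordinary least squares does not enforce either condition.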

Now we want to use an optimization approach to solve for the contribution fractions, F, by minimizing the difference between the observed mixture, O, and the modeled mixture, E⋅F, where E holds the endmember concentrations. The fractions are constrained to be non-negative and to sum to 1. We will minimize the sum of squared residuals. In a least squares sense, this looks like:

$$\min_{F} \; \lVert O - E \cdot F \rVert^{2}$$

In [None]:
import numpy as np
from scipy.optimize import minimize

# Assuming you have your observed mixture (O), endmember matrix (E), and initial guess for fractions (F)
O = np.array([...])  # Your observed mixture vector
E = np.array([...])  # Your endmember matrix
F_initial_guess = np.array([...])  # Your initial guess for fractions

# Define the objective function (sum of squared residuals)
def objective_function(F):
    return np.sum((O - np.dot(E, F))**2)

# Define constraints (fractions should be >= 0 and sum to 1)
constraints = ({'type': 'ineq', 'fun': lambda F: F},
               {'type': 'eq', 'fun': lambda F: np.sum(F) - 1})

# Solve the optimization problem
result = minimize(objective_function, F_initial_guess, constraints=constraints)

# The result.x contains the optimal fractions
optimal_fractions = result.x
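
As a concrete check of the template above, the following sketch fills in `O` and `E` with made-up values and confirms that the solver recovers known fractions. All numbers are hypothetical. Using `bounds` in place of the inequality constraint, setting `method="SLSQP"` explicitly, and tightening `ftol` are choices made for this sketch.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical endmember matrix: rows = 5 tracers, columns = 3 endmembers
E = np.array([
    [40.0, 5.0, 20.0],
    [30.0, 2.0, 60.0],
    [6.0, 0.5, 3.0],
    [15.0, 1.0, 35.0],
    [10.0, 1.0, 4.0],
])

# Known fractions used to build a noise-free synthetic observation
F_true = np.array([0.5, 0.3, 0.2])
O = E @ F_true

def objective_function(F):
    # Sum of squared residuals between observed and modeled mixture
    return np.sum((O - E @ F) ** 2)

n_endmembers = E.shape[1]

result = minimize(
    objective_function,
    x0=np.full(n_endmembers, 1.0 / n_endmembers),  # start from equal fractions
    method="SLSQP",
    bounds=[(0.0, 1.0)] * n_endmembers,            # each fraction in [0, 1]
    constraints=[{"type": "eq", "fun": lambda F: np.sum(F) - 1.0}],
    options={"ftol": 1e-12, "maxiter": 200},
)

optimal_fractions = result.x
print(result.success, optimal_fractions.round(3))
```

For real storm samples the same call would be repeated per sample, and checking `result.success` before using `result.x` guards against silent non-convergence.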