# Compositional Nutrient Diagnosis

Compositional Nutrient Diagnosis (CND) is the multivariate extension of the Critical Value Approach (CVA) and the
Diagnosis and Recommendation Integrated System (DRIS), and is fully compatible with principal component analysis (PCA).
CND nutrient indices are composed of two separate functions: one considers differences between the nutrient levels of
individual and target specimens, the other considers differences between their nutrient balances (as defined by the
nutrient geometric means). Together, these functions indicate that a nutrient insufficiency can be corrected either by
adding a single nutrient or by taking advantage of multiple nutrient interactions to improve the nutrient balance as a whole.

In [27]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

## Input concentrations of the nutrients

In [28]:
data_diagnosed_dict = {
    'P': [12,11,10, 9],
    'Mg': [2,2,3,3],
    'N': [1,2,1,2],
    'Ca': [3,4,3,3],
    'Mn': [4,5,4,2]
}

# Same nutrient names and order as in data_diagnosed_dict
data_optimum_dict = {
    'P': [12,11,10],
    'Mg': [2,1,3],
    'N': [2,1,1],
    'Ca': [4,4,3],
    'Mn': [5,4,4],
}

df_diagnosed = pd.DataFrame(data_diagnosed_dict)
df_optimum = pd.DataFrame(data_optimum_dict)

## The z-values $z_i$

The z-values are defined as $z_i = \log(x_i/g(x))$, where

- $x_i$ is the concentration of nutrient $i$
- $g(x)$ is the geometric mean of all nutrient concentrations of the specimen

In [29]:
def calculate_z(df):
    ''' Calculates z for CND analysis on a DataFrame input.
      Args:
        df (pd.DataFrame): DataFrame with columns representing nutrient concentrations for each plant.
      Returns:
        pd.DataFrame: DataFrame containing z values with column names prefixed by "z_".
    '''
    # Normalize each row so that the sum of nutrients is 1
    row_sums = df.sum(axis=1)
    x = df.div(row_sums, axis=0)

    # Calculate the geometric mean for each row
    g = x.prod(axis=1)**(1/x.shape[1])

    # Compute z values
    z = np.log(x.div(g, axis=0))

    # Rename columns to reflect that they are z values
    z.columns = [f'z_{col}' for col in df.columns]

    return z

z_population = calculate_z(df_diagnosed)
z_population.head()

| | z_P | z_Mg | z_N | z_Ca | z_Mn |
|---|---|---|---|---|---|
| 0 | 1.352315 | -0.439445 | -1.132592 | -0.03398 | 0.253702 |
| 1 | 1.041911 | -0.662837 | -0.662837 | 0.03031 | 0.253454 |
| 2 | 1.125364 | -0.078609 | -1.177221 | -0.078609 | 0.209074 |
| 3 | 1.041076 | -0.057536 | -0.463002 | -0.057536 | -0.463002 |
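
Two properties of the z-values are easy to check numerically: they sum to zero within each specimen, and they do not change when all concentrations of a specimen are multiplied by a common factor (so the row normalization in `calculate_z` does not affect the result). The sketch below is only an illustration; the helper name `clr` (for "centred log-ratio") is an assumption made here, not part of the notebook, and it computes the same quantity via the mean of the logs.

```python
import numpy as np
import pandas as pd

# Same toy concentrations as the diagnosed population above
df = pd.DataFrame({
    'P': [12, 11, 10, 9],
    'Mg': [2, 2, 3, 3],
    'N': [1, 2, 1, 2],
    'Ca': [3, 4, 3, 3],
    'Mn': [4, 5, 4, 2],
})

def clr(frame):
    """Hypothetical helper: log(x_i / g(x)), computed as log(x_i) minus the row mean of log(x)."""
    log_x = np.log(frame)
    return log_x.sub(log_x.mean(axis=1), axis=0)

z_raw = clr(df)            # concentrations as given
z_scaled = clr(df * 1000)  # e.g. after a change of units

# Row sums of the z-values (zero up to floating-point error)
row_sums = z_raw.sum(axis=1)
```

Because the geometric mean scales with the data, the common factor cancels in $x_i/g(x)$, which is why `z_raw` and `z_scaled` agree.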


## The CND index $I_{z_i}$

The CND index is given by $I_{z_i}=(Z_i - z_i) / \sigma_{z_i}$, where

- $Z_i$ is the z-value of the **test** population for nutrient $i$
- $z_i$ is the z-value of the **target** population for nutrient $i$
- $\sigma_{z_i}$ is the standard deviation of the z-value of the **target** population for nutrient $i$

For populations with several specimens, $Z_i$ and $z_i$ are taken as the mean z-values of the respective population.


The index $I_{z_i}$ is the difference of the z-values, normalized by the standard deviation of the target population.
For each nutrient, $I_{z_i}$ therefore measures the distance between the test and target populations in units of the target population's standard deviation.

The normalization by $\sigma_{z_i}$ puts the index on a sensible scale. If a nutrient has a large standard deviation in the target population, the range of 'acceptable' amounts of that nutrient is wide. As a result, the same difference in z-values yields a smaller $I_{z_i}$ for this nutrient.
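
As a small illustration of this scaling, the sketch below applies the same difference $Z_i - z_i$ to two nutrients whose target populations have a narrow and a wide spread. All numbers are made up for the example.

```python
import numpy as np

# Hypothetical difference Z_i - z_i, identical for both nutrients
diff = 0.2

# Hypothetical standard deviations of the target z-values: narrow vs. wide
sigma = np.array([0.1, 0.4])

# The wider the acceptable range, the smaller the resulting index
index = diff / sigma
```

The first nutrient ends up with an index four times as large as the second, although both deviate from the target by the same amount.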

The index $I_{z_i}$ is interpreted as follows:
- $I_{z_i} < 0$: relative nutrient insufficiency
- $I_{z_i} = 0$: relative nutrient balance
- $I_{z_i} > 0$: relative nutrient excess


We can deepen the discussion of $I_{z_i}$ by writing it as a sum and analysing each term. Since $Z_i - z_i = \log(X_i/g(X)) - \log(x_i/g(x))$, the terms can be regrouped as

$I_{z_i} = \frac{1}{\sigma_{z_i}} \biggl[\underbrace{\log\left( \frac{X_i}{x_i} \right)}_{ f(X_i)} + \underbrace{ \log\left( \frac{g(x)}{g(X)} \right)}_{ f(g(X))} \biggr]$

- The first term $f(X_i) = \log\left( \frac{X_i}{x_i} \right)$ depends only on the **individual** nutrient $i$
- The second term $f(g(X)) = \log\left( \frac{g(x)}{g(X)} \right)$ depends only on the geometric means $g$ and therefore takes **every** nutrient into account
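
The split into an individual term and a balance term can be checked numerically. The sketch below compares the direct difference $Z_i - z_i$ with the sum $\log(X_i/x_i) - \log(g(X)/g(x))$ for one hypothetical test specimen `X` and one hypothetical target specimen `x`; the division by $\sigma_{z_i}$ is left out because it scales both sides equally.

```python
import numpy as np

# Hypothetical test (X) and target (x) specimens, same nutrient order
X = np.array([12.0, 2.0, 1.0, 3.0, 4.0])
x = np.array([12.0, 2.0, 2.0, 4.0, 5.0])

def geometric_mean(v):
    """Geometric mean computed via the mean of the logs."""
    return np.exp(np.log(v).mean())

# Direct difference of the z-values
Z = np.log(X / geometric_mean(X))
z = np.log(x / geometric_mean(x))
direct = Z - z

# Same difference split into the two terms
individual_term = np.log(X / x)                                # depends on nutrient i only
balance_term = -np.log(geometric_mean(X) / geometric_mean(x))  # one value shared by all nutrients
split = individual_term + balance_term
```

Note that the first nutrient has the same concentration in both specimens, so its individual term is zero, yet its index is not: the balance term shifts it because the other nutrients differ.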

In [30]:
def calculate_I(df_diagnosed, df_optimum):
    ''' 
    Calculates the CND index I for each nutrient from the mean z values of two populations.
    Nutrients are matched by column position, so both DataFrames must list them in the same order.
      Args:
        df_diagnosed (pd.DataFrame): DataFrame with nutrient concentrations of the diagnosed population.
        df_optimum (pd.DataFrame): DataFrame with nutrient concentrations of the optimum (target) population.
      Returns:
        pd.DataFrame: Single-row DataFrame of I values, one column per nutrient (named as in df_diagnosed).
    '''
    # Calculate z values for diagnosed and optimum using the calculate_z function
    z_diagnosed = calculate_z(df_diagnosed)
    z_optimum = calculate_z(df_optimum)

    # Calculate mean z values for each nutrient
    mean_z_diagnosed = z_diagnosed.mean(axis=0)           # mean of the log values is taken
    mean_z_optimum = z_optimum.mean(axis=0)

    # Calculate standard deviations for each nutrient across optimum rows
    stds = z_optimum.std(axis=0)
    
    # Calculate I values for each nutrient using the mean z values
    I_values = (mean_z_diagnosed.values - mean_z_optimum.values) / stds.values

    # Convert the I values to a single-row DataFrame with one column per nutrient
    I_values_df = pd.DataFrame([I_values], columns=df_diagnosed.columns)

    return I_values_df

Is = calculate_I(df_diagnosed, df_optimum)
Is.index = ['CND']

# Standard deviation of the raw optimum concentrations (per nutrient, not of the z values)
optimum_stds = df_optimum.std()

# Convert Series to DataFrame and label the rows with the nutrient names used in Is
optimum_stds_df = pd.DataFrame(optimum_stds, columns=["Standard Deviation optimum"])
optimum_stds_df.index = Is.columns


df_nutrients = pd.concat([Is, optimum_stds_df.T])

df_nutrients.head()

| | P | Mg | N | Ca | Mn |
|---|---|---|---|---|---|
| CND | -0.423486 | 0.596274 | 0.410194 | -0.574762 | -2.662266 |
| Standard Deviation optimum | 1.0 | 1.0 | 0.57735 | 0.57735 | 0.57735 |
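
`calculate_I` compares population means. If a diagnosis per specimen is wanted instead, the same formula can be applied row by row. The following is a minimal sketch under that assumption; the helper names `clr` and `per_specimen_indices` are hypothetical and not part of the notebook.

```python
import numpy as np
import pandas as pd

def clr(frame):
    """z values: log of each concentration minus the row mean of the logs."""
    log_x = np.log(frame)
    return log_x.sub(log_x.mean(axis=1), axis=0)

def per_specimen_indices(df_test, df_target):
    """Hypothetical helper: one row of CND indices per test specimen."""
    z_test = clr(df_test)
    z_target = clr(df_target)
    # .to_numpy() makes the arithmetic positional instead of label-based
    center = z_target.mean(axis=0).to_numpy()
    spread = z_target.std(axis=0).to_numpy()  # sample standard deviation (pandas default)
    return (z_test - center) / spread

df_test = pd.DataFrame({
    'P': [12, 11, 10, 9], 'Mg': [2, 2, 3, 3], 'N': [1, 2, 1, 2],
    'Ca': [3, 4, 3, 3], 'Mn': [4, 5, 4, 2],
})
df_target = pd.DataFrame({
    'P': [12, 11, 10], 'Mg': [2, 1, 3], 'N': [2, 1, 1],
    'Ca': [4, 4, 3], 'Mn': [5, 4, 4],
})

I_rows = per_specimen_indices(df_test, df_target)
```

Because the index is linear in the z values, averaging `I_rows` over the specimens gives the population-level CND row shown above.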


## Ranges for DRIS and CND

In [31]:
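def operation(column):
    ''' Classifies one nutrient column of df_nutrients.
      Args:
        column (pd.Series): CND index in the first row, standard deviation of the optimum population in the second.
      Returns:
        str: Interpretation of the index, from "deficiency" to "excess".
    '''
    I = column.iloc[0]
    sd = column.iloc[1]
    # Class boundaries lie at 2/3 and 4/3 of the standard deviation on either side of zero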
    if I < -4/3*sd:
        return "deficiency"
    elif I < -2/3*sd:
        return "tendency to deficiency"
    elif I < 2/3*sd:
        return "sufficient"
    elif I < 4/3*sd:
        return "tendency to excess"
    else:
        return "excess"
    
df_nutrients.loc['Interpretation CND'] = df_nutrients.apply(operation, axis=0)
df_nutrients

| | P | Mg | N | Ca | Mn |
|---|---|---|---|---|---|
| CND | -0.423486 | 0.596274 | 0.410194 | -0.574762 | -2.662266 |
| Standard Deviation optimum | 1.0 | 1.0 | 0.57735 | 0.57735 | 0.57735 |
| Interpretation CND | sufficient | sufficient | tendency to excess | tendency to deficiency | deficiency |
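
The same classification can also be written without a Python-level `if`/`elif` chain. The sketch below uses `np.select` with the same boundaries as `operation`; the function name `classify` is an assumption made for this example, and the input numbers are copied from the table above.

```python
import numpy as np
import pandas as pd

def classify(I, sd):
    """Hypothetical vectorized variant of operation(): one label per nutrient."""
    I = np.asarray(I, dtype=float)
    sd = np.asarray(sd, dtype=float)
    # Conditions are evaluated in order; the first one that holds wins
    conditions = [
        I < -4/3 * sd,
        I < -2/3 * sd,
        I < 2/3 * sd,
        I < 4/3 * sd,
    ]
    labels = [
        "deficiency",
        "tendency to deficiency",
        "sufficient",
        "tendency to excess",
    ]
    return np.select(conditions, labels, default="excess")

nutrients = ['P', 'Mg', 'N', 'Ca', 'Mn']
I_values = [-0.423486, 0.596274, 0.410194, -0.574762, -2.662266]
sd_values = [1.0, 1.0, 0.57735, 0.57735, 0.57735]

interpretation = pd.Series(classify(I_values, sd_values), index=nutrients)
```

`np.select` picks the label of the first condition that is true for each element, which mirrors the order of the `if`/`elif` branches.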
