# Logarithmic-DRIS Analysis

Diagnosis and Recommendation Integrated System (DRIS), implemented according to "Establishment of DRIS and CND Standards for Fertigated ‘Prata’
Banana in the Northeast, Brazil" by Antonio João de Lima Neto et al.

In [38]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from itertools import combinations

In [39]:
data_diagnosed_dict = {
    'P': [12,11,10,9],
    'Mg': [2,2,3,3],
    'N': [1,2,1,2],
    'Ca': [3,4,3,3],
    'Mn': [4,5,4,2]
}

data_optimum_dict = {
    'p': [12,11,10],
    'mg': [2,1,3],
    'n': [2,1,1],
    'ca': [4,4,3],
    'mn': [5,4,4],
}

df_diagnosed = pd.DataFrame(data_diagnosed_dict)
df_optimum = pd.DataFrame(data_optimum_dict)

df_optimum.describe()

Unnamed: 0,p,mg,n,ca,mn
count,3.0,3.0,3.0,3.0,3.0
mean,11.0,2.0,1.333333,3.666667,4.333333
std,1.0,1.0,0.57735,0.57735,0.57735
min,10.0,1.0,1.0,3.0,4.0
25%,10.5,1.5,1.0,3.5,4.0
50%,11.0,2.0,1.0,4.0,4.0
75%,11.5,2.5,1.5,4.0,4.5
max,12.0,3.0,2.0,4.0,5.0


## Nutrient functions $f(\frac{A}{B})$

The nutrient function is defined as

\begin{align}
f\biggl(\frac{A}{B}\biggr) = \biggl[\biggl(\frac{A}{B} - \frac{a}{b}\biggr)\biggr] \cdot \frac{c}{s}
\end{align}

where
- $\frac{A}{B}$ is the average of the logarithm of the ratio between the nutrients $A$ and $B$ in the sample
- $\frac{a}{b}$ is the average of the logarithm of the ratio between the nutrients $a$ and $b$ in the high-yield population
- $s$ is the standard deviation of the logarithm of the ratio between the nutrients $a$ and $b$ in the high-yield population
- $c$ is the sensitivity coefficient (here $c = 1$)


**Comments:**
1. $A/B = \mathrm{avg}(\ln(c_A/c_B))$, where $c_A$ and $c_B$ are the contents of the nutrients $A$ and $B$. (Similarly, $a/b = \mathrm{avg}(\ln(c_a/c_b))$.)
2. $s$ is shorthand for $\sigma(\ln(a/b))$.
3. In the publication, only $a/b$ is explicitly described as an average. This is probably because only a single sample plant is considered there. For a sample population we should most likely take the average as well.
4. Because of the log transformation, we do not have to distinguish different cases for the nutrient function $f$, as is necessary in the original DRIS approach.
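
One way to see comment 4: since $\ln(A/B) = -\ln(B/A)$, swapping numerator and denominator only flips the sign of $f$. The following is a minimal sketch, not the publication's code. It assumes a per-ratio sample standard deviation (`ddof=1`), and `nutrient_function` is a hypothetical helper; the numbers are the P and Mg columns of the toy data above.

```python
import numpy as np

def nutrient_function(sample_a, sample_b, ref_a, ref_b, c=1.0):
    """f(A/B) from mean and standard deviation of ln-ratios (hypothetical helper)."""
    sample_log = np.log(np.asarray(sample_a, dtype=float) / np.asarray(sample_b, dtype=float))
    ref_log = np.log(np.asarray(ref_a, dtype=float) / np.asarray(ref_b, dtype=float))
    s = ref_log.std(ddof=1)  # assumption: sample standard deviation of the reference ln-ratios
    return (sample_log.mean() - ref_log.mean()) * c / s

# Toy numbers: P and Mg of the diagnosed and the optimum data
p_sample, mg_sample = [12, 11, 10, 9], [2, 2, 3, 3]
p_ref, mg_ref = [12, 11, 10], [2, 1, 3]

f_p_mg = nutrient_function(p_sample, mg_sample, p_ref, mg_ref)
f_mg_p = nutrient_function(mg_sample, p_sample, mg_ref, p_ref)
print(f_p_mg, f_mg_p)  # same magnitude, opposite sign
```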


First we calculate the average of the log of the ratios of the nutrient contents.
It is important to keep the order of operations: ratio, then log, then mean.
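
A small sketch of why the order matters, using the `p` and `mg` columns of the optimum data. The variable names are illustrative only.

```python
import numpy as np

p = np.array([12.0, 11.0, 10.0])
mg = np.array([2.0, 1.0, 3.0])

mean_of_log_ratio = np.log(p / mg).mean()             # ratio -> log -> mean (used here)
log_of_ratio_of_means = np.log(p.mean() / mg.mean())  # mean -> ratio -> log
log_of_mean_ratio = np.log((p / mg).mean())           # ratio -> mean -> log

print(mean_of_log_ratio, log_of_ratio_of_means, log_of_mean_ratio)
```

The three numbers differ, so changing the order changes the reference values that enter the nutrient function.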

In [40]:
def calculate_log_ratios(df):
    """calculates the natural logarithm of the ratios between all combinations"""
    ratios = {}
    for col1, col2 in combinations(df.columns, 2):
        ratio_name1 = f"{col1}/{col2}"
        ratios[ratio_name1] = np.log(df[col1] / df[col2]) # first ratio then log
    return pd.DataFrame(ratios)

In [41]:
df_optimum_log_ratios = calculate_log_ratios(df_optimum)
df_optimum_log_ratios_mean = pd.DataFrame(df_optimum_log_ratios.mean()).T # take the mean after log-transformation 

print("=====================================================================================")
print("Average of the logarithm of the ratios between nutrient contents of target population")
df_optimum_log_ratios_mean.head()

Average of the logarithm of the ratios between nutrient contents of target population


Unnamed: 0,p/mg,p/n,p/ca,p/mn,mg/n,mg/ca,mg/mn,n/ca,n/mn,ca/mn
0,1.797876,2.16408,1.104729,0.934453,0.366204,-0.693147,-0.863422,-1.059351,-1.229626,-0.170275


In [42]:
df_diagnosed_log_ratios = calculate_log_ratios(df_diagnosed)
df_diagnosed_log_ratios_mean = pd.DataFrame(df_diagnosed_log_ratios.mean()).T

print("========================================================================================")
print("Average of the logarithm of the ratios between nutrient contents of diagnosed population")
df_diagnosed_log_ratios_mean.head()

Average of the logarithm of the ratios between nutrient contents of diagnosed population


Unnamed: 0,P/Mg,P/N,P/Ca,P/Mn,Mg/N,Mg/Ca,Mg/Mn,N/Ca,N/Mn,Ca/Mn
0,1.449773,1.999079,1.17512,1.076859,0.549306,-0.274653,-0.372914,-0.823959,-0.92222,-0.098261


Next we calculate the nutrient functions $f(A/B)$ for all ratios.

In [43]:
def f(df_diagnosed_log_ratios, df_optimum_log_ratios, sensitivity=1):
    """Calculate the nutrient function from the mean-log ratios for all combinations."""

    names = df_diagnosed_log_ratios.columns
    diagnosed = df_diagnosed_log_ratios.to_numpy()
    optimum = df_optimum_log_ratios.to_numpy()

    # Note: ndarray.std() without an axis returns a single population standard deviation
    # (ddof=0) over all entries of the array, i.e. pooled over all ratios, not one per ratio.
    normalization_factor = sensitivity / optimum.std()

    # Column-wise means of the log ratios
    diagnosed_mean = diagnosed.mean(axis=0)
    optimum_mean = optimum.mean(axis=0)

    f_dict = {}
    for i, name in enumerate(names):
        f_dict[f"f({name})"] = (diagnosed_mean[i] - optimum_mean[i]) * normalization_factor

    return pd.DataFrame([f_dict])  # Return a DataFrame with one row for consistency

# Example usage
# Assumes df_diagnosed_log_ratios and df_optimum_log_ratios have their ratio columns in the same order
df_f = f(df_diagnosed_log_ratios, df_optimum_log_ratios, sensitivity=1)
df_f.head()

Unnamed: 0,f(P/Mg),f(P/N),f(P/Ca),f(P/Mn),f(Mg/N),f(Mg/Ca),f(Mg/Mn),f(N/Ca),f(N/Mn),f(Ca/Mn)
0,-0.286783,-0.135935,0.057992,0.117321,0.150848,0.344775,0.404104,0.193927,0.253256,0.059329
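
The function `f` above divides every ratio by one standard deviation taken over all entries of the optimum log-ratio array. The definition of $s$ given earlier is per ratio. A minimal sketch of that variant follows, assuming the sample standard deviation (`ddof=1`) and positional matching of the ratio columns. `f_per_ratio` is a hypothetical name, and the small input tables are made-up log-ratio values, not results from this notebook.

```python
import pandas as pd

def f_per_ratio(diagnosed_log_ratios, optimum_log_ratios, sensitivity=1.0):
    """Nutrient functions with one standard deviation per ratio column (hypothetical variant)."""
    # Columns are matched by position, so 'P/Mg' and 'p/mg' are treated as the same ratio
    diagnosed_mean = diagnosed_log_ratios.mean().to_numpy()
    optimum_mean = optimum_log_ratios.mean().to_numpy()
    optimum_std = optimum_log_ratios.std(ddof=1).to_numpy()  # assumption: sample std
    values = (diagnosed_mean - optimum_mean) * sensitivity / optimum_std
    columns = [f"f({name})" for name in diagnosed_log_ratios.columns]
    return pd.DataFrame([values], columns=columns)

# Made-up log-ratio values for illustration
diagnosed = pd.DataFrame({"P/Mg": [1.0, 2.0, 3.0], "P/N": [0.0, 1.0, 2.0]})
optimum = pd.DataFrame({"p/mg": [0.0, 1.0, 2.0], "p/n": [0.0, 2.0, 4.0]})

result = f_per_ratio(diagnosed, optimum)
print(result)
```

In the notebook it would be called as `f_per_ratio(df_diagnosed_log_ratios, df_optimum_log_ratios)`; the resulting values differ from those of `f`.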


## DRIS Index

The DRIS index is defined for each nutrient as

$I_A = \frac{\sum f(\frac{A}{B}) -  \sum f(\frac{B}{A})}{n+m}$

- n is the number of DRIS functions in their direct form (A/B)
- m is the number of DRIS functions in their inverse form (B/A)
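
As a quick check of the definition, the sketch below recomputes the indices from the rounded $f$ values printed in the table above. The helper `dris_index` and the dictionary layout are illustrative assumptions. When every pair of nutrients appears exactly once, each $f$ enters one index with a plus sign and another with a minus sign, so the indices should sum to zero.

```python
# f values copied (rounded) from the table above, keyed by (numerator, denominator)
f_values = {
    ("P", "Mg"): -0.286783, ("P", "N"): -0.135935, ("P", "Ca"): 0.057992, ("P", "Mn"): 0.117321,
    ("Mg", "N"): 0.150848, ("Mg", "Ca"): 0.344775, ("Mg", "Mn"): 0.404104,
    ("N", "Ca"): 0.193927, ("N", "Mn"): 0.253256, ("Ca", "Mn"): 0.059329,
}

def dris_index(nutrient, f_values):
    """(sum of direct f - sum of inverse f) / (n + m) for one nutrient (hypothetical helper)."""
    direct = [v for (a, b), v in f_values.items() if a == nutrient]   # f(nutrient/B)
    inverse = [v for (a, b), v in f_values.items() if b == nutrient]  # f(B/nutrient)
    return (sum(direct) - sum(inverse)) / (len(direct) + len(inverse))

nutrients = ["P", "Mg", "N", "Ca", "Mn"]
indices = {name: dris_index(name, f_values) for name in nutrients}
print(indices)
print(sum(indices.values()))  # zero up to floating-point error
```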

In [44]:
def calculate_I(df):
    """
    Calculate DRIS index for each nutrient in the DataFrame.
    
    Parameters:
        df (pd.DataFrame): DataFrame where each column represents a ratio in the format 'f(A/B)' or 'f(B/A)'
        
    Returns:
        pd.DataFrame: DataFrame with I DRIS.
    """
    # Initialize a dictionary to store IA values for each element
    I_DRIS_values = {}
    
    # Identify unique chemical elements involved
    elements = set()
    for col in df.columns:
        if '/' in col:
            ratio = col[2:-1]  # Extract the part inside the parentheses 'A/B'
            A, B = ratio.split('/')
            elements.add(A)
            elements.add(B)
    
    # Calculate I_DRIS for each unique element
    for element in elements:
        sum_f_A_over_B = 0
        sum_f_B_over_A = 0
        count_A_over_B = 0
        count_B_over_A = 0
        
        # Iterate through columns to accumulate sums for the current element
        for col in df.columns:
            if '/' in col:
                # Extract elements from the ratio
                ratio = col[2:-1]
                A, B = ratio.split('/')
                
                # Check if the current element is in the numerator or denominator
                if A == element:  # If the column is in the form f(element/...)
                    sum_f_A_over_B += df[col].sum()
                    count_A_over_B += 1
                elif B == element:  # If the column is in the form f(.../element)
                    sum_f_B_over_A += df[col].sum()
                    count_B_over_A += 1
        
        # Calculate n + m as the total number of ratios for the element
        n_m = count_A_over_B + count_B_over_A
        
        # Calculate I_DRIS for the element, avoid division by zero if n_m is zero
        I_DRIS_values[element] = (sum_f_A_over_B - sum_f_B_over_A) / n_m if n_m > 0 else None
    
    # Convert IA values to a DataFrame
    I_DRIS_df = pd.DataFrame(list(I_DRIS_values.items()), columns=["Element", "Log-DRIS"])
    I_DRIS_df.set_index("Element", inplace=True)
    I_DRIS_df.index.name = None
    
    return I_DRIS_df


# Calculate IA for each element
I_DRIS_df = calculate_I(df_f)

optimum_stds = df_optimum.std()  # Compute standard deviations

# Convert Series to DataFrame and label the rows with the diagnosed nutrient names.
# df_optimum and df_diagnosed list the nutrients in the same order. The index of I_DRIS_df
# is built from a set and has no guaranteed order, so it must not be used for relabelling.
optimum_stds_df = pd.DataFrame(optimum_stds, columns=["Standard Deviation optimum"])
optimum_stds_df.index = df_diagnosed.columns


df_nutrients = pd.concat([I_DRIS_df.T, optimum_stds_df.T])

df_nutrients.head()

Unnamed: 0,P,Ca,Mg,Mn,N
Log-DRIS,-0.061851,-0.134341,0.296627,-0.208502,0.108067
Standard Deviation optimum,1.0,0.57735,1.0,0.57735,0.57735


## DRIS index alternative

In [45]:
def calculate_index_value(index_element, df_diagnosed, df_f):
    ''' Calculates the DRIS index '''
    f_dict = df_f.to_dict('index')[0]
    elements = df_diagnosed.columns
    result = 0
    for i, element in enumerate(elements):
        if index_element < i:
            result += f_dict[f'f({elements[index_element]}/{element})']
        elif index_element > i:
            result -= f_dict[f'f({element}/{elements[index_element]})']
    return result/(len(elements)-1)


def calculate_all_index_values(df_diagnosed, df_f):
    """
    Calculate the index values for each element in the input DataFrame.
    
    Parameters:
        df_diagnosed (pd.DataFrame): DataFrame where each column represents a diagnosed nutrient.
        df_f (pd.DataFrame): DataFrame with calculated f(A/B) or f(B/A) values for each element ratio.
        
    Returns:
        pd.DataFrame: DataFrame with index values for each element, where each row corresponds to an element.
    """
    elements = df_diagnosed.columns
    results_dict = {}
    
    # Calculate index value for each element and store in results_dict
    for i, element in enumerate(elements):
        # Assuming calculate_index_value is a function that calculates the index for a single element
        index_value = calculate_index_value(i, df_diagnosed, df_f)
        results_dict[element] = index_value
    
    # Convert results_dict to a DataFrame with the specified format
    results_df = pd.DataFrame(list(results_dict.items()), columns=["Element", "I_DRIS"])
    results_df.set_index("Element", inplace=True)
    results_df.index.name = None

    return results_df

# Example usage
# df_diagnosed and df_f should be DataFrames with appropriate data for the calculation
DRIS_indices = calculate_all_index_values(df_diagnosed, df_f)


optimum_stds = df_optimum.std()  # Compute standard deviations

# Convert Series to DataFrame and label the rows with the diagnosed nutrient names
# (df_optimum and df_diagnosed list the nutrients in the same order)
optimum_stds_df = pd.DataFrame(optimum_stds, columns=["Standard Deviation optimum"])
optimum_stds_df.index = df_diagnosed.columns


df_nutrients = pd.concat([DRIS_indices.T, optimum_stds_df.T])

df_nutrients.head()

Unnamed: 0,P,Mg,N,Ca,Mn
I_DRIS,-0.061851,0.296627,0.108067,-0.134341,-0.208502
Standard Deviation optimum,1.0,1.0,0.57735,0.57735,0.57735


## Nutrient Balance Index (NBI)

$\mathrm{NBI} = |I_N| + |I_P| + \cdots + |I_{Na}|$

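$\mathrm{NBI_m} = \frac{\mathrm{NBI}}{m}$, where $m$ is here the number of nutrients considered (not the number of inverse functions used in the DRIS index).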

In [46]:
NBI = I_DRIS_df.abs().sum().to_numpy()
NBI_m = NBI/(len(I_DRIS_df))


print(f"NBI = {NBI[0]}")
print(f"NBI_m = {NBI_m[0]}")


NBI = 0.8093892639888514
NBI_m = 0.16187785279777028


## Ranges for DRIS and CND

In [49]:
def operation(column):
    """Classify one nutrient from its index (row 0) and the optimum standard deviation (row 1)."""
    I = column.iloc[0]
    sd = column.iloc[1]
    if I < -4/3*sd:
        return "deficiency"
    elif I < -2/3*sd:
        return "tendency to deficiency"
    elif I < 2/3*sd:
        return "sufficient"
    elif I < 4/3*sd:
        return "tendency to excess"
    else:
        return "excess"
    
df_nutrients.loc['Interpretation Log-DRIS'] = df_nutrients.apply(operation, axis=0)
df_nutrients

Unnamed: 0,P,Mg,N,Ca,Mn
I_DRIS,-0.061851,0.296627,0.108067,-0.134341,-0.208502
Standard Deviation optimum,1.0,1.0,0.57735,0.57735,0.57735
Interpretation Log-DRIS,sufficient,sufficient,sufficient,sufficient,sufficient
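
All nutrients of the toy data fall into the central class. To see the other classes, the thresholds at $\pm\frac{2}{3}s$ and $\pm\frac{4}{3}s$ can be applied to a few made-up index values. The following is a sketch using `pandas.cut`; `classify_indices` is a hypothetical helper that, unlike the cell above, applies a single standard deviation to all values.

```python
import numpy as np
import pandas as pd

LABELS = ["deficiency", "tendency to deficiency", "sufficient", "tendency to excess", "excess"]

def classify_indices(indices, sd):
    """Assign each index to one of the five classes for a given standard deviation."""
    edges = [-np.inf, -4/3 * sd, -2/3 * sd, 2/3 * sd, 4/3 * sd, np.inf]
    # right=False gives left-closed bins [low, high), matching the `I < threshold` tests
    return list(pd.cut(indices, bins=edges, labels=LABELS, right=False))

# Made-up index values, one per class, for sd = 0.57735
classes = classify_indices([-1.0, -0.5, 0.0, 0.5, 1.0], sd=0.57735)
print(classes)
```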
