<h1 style='font-size: 45px; color: crimson; font-family: Dubai; font-weight: 600'>Regression Analysis</h1>

This analysis fits a **linear regression model** with interaction terms to quantify the effect of **Fertilizer** and **Irrigation** on **Yield**. Linear regression suits this problem because the dependent variable (yield) is continuous, while the independent variables (fertilizer type and irrigation method) are categorical and enter the model as dummy-coded indicators. Including the interaction between fertilizer type and irrigation method lets us assess not only the main effect of each factor but also whether the effect of one depends on the level of the other. This helps identify differences or synergies between specific fertilizer and irrigation combinations, which can inform practices for improving crop yield. The model is evaluated using the estimated coefficients, which give the size and direction of each effect, and their p-values, which indicate statistical significance.

<h2 style='font-size: 25px; color: crimson; font-family: Dubai; font-weight: 600'>Import Required Libraries</h2>

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import squarify 
import seaborn as sns

import statsmodels.api as sm
from statsmodels.formula.api import ols
from scipy import stats
print('Libraries loaded successfully')

Libraries loaded successfully


<h2 style='font-size: 25px; color: crimson; font-family: Dubai; font-weight: 600'>Import Dataset</h2>

In [14]:
df = pd.read_csv('Datasets/Hypothesis 101.csv')
df.head()

| | Fertilizer | Yield (tones/ha) | Days to Maturity | Biomass | Dry matter | Irrigation |
|---|---|---|---|---|---|---|
| 0 | C | 38.87497 | 6.26185 | 2.525738 | 75.436679 | Furrow Irrigation |
| 1 | A | 29.921425 | 7.194156 | 2.594134 | 51.828069 | Sprinkler Irrigation |
| 2 | C | 41.152436 | 7.329974 | 3.76043 | 49.058974 | Drip Irrigation |
| 3 | C | 42.161544 | 7.137822 | 3.340263 | 46.778604 | Furrow Irrigation |
| 4 | A | 36.715841 | 6.61532 | 3.701663 | 57.817993 | Furrow Irrigation |


<h2 style='font-size: 25px; color: crimson; font-family: Dubai; font-weight: 600'>Renaming df columns</h2>

In [15]:
df = df.rename(columns={'Yield (tones/ha)': 'Yield'})

In [16]:
df.columns

Index(['Fertilizer', 'Yield', 'Days to Maturity', 'Biomass', 'Dry matter',
       'Irrigation'],
      dtype='object')
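
The rename matters because patsy, which parses statsmodels formulas, evaluates each term as Python code, so a name such as `Yield (tones/ha)` cannot be written into a formula directly. As an alternative to renaming, patsy's `Q()` helper can quote such a name inside the formula. The sketch below uses a small made-up frame (the values are illustrative, not taken from the dataset, and `toy` and `quoted_model` are hypothetical names) to show the idea for the `Dry matter` column:

```python
import pandas as pd
from statsmodels.formula.api import ols

# Toy data for illustration only; values are made up, not from the dataset
toy = pd.DataFrame({
    "Fertilizer": ["A", "A", "B", "B", "C", "C"],
    "Dry matter": [51.8, 57.8, 60.2, 63.1, 75.4, 49.1],
})

# Q("...") quotes a column name that is not a valid Python identifier
quoted_model = ols('Q("Dry matter") ~ C(Fertilizer)', data=toy).fit()
print(quoted_model.params)
```

Renaming, as done above, keeps later formulas shorter; `Q()` is mainly useful when the original column names need to be preserved.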


<h2 style='font-size: 25px; color: crimson; font-family: Dubai; font-weight: 600'>Interaction Effect of Irrigation and Fertilizer on Yield</h2>


The formula for the model is:

$\text{Yield} = \beta_0 + \beta_1 \cdot \text{Fertilizer} + \beta_2 \cdot \text{Irrigation} + \beta_3 \cdot (\text{Fertilizer} \times \text{Irrigation}) + \epsilon$

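Where:
- **Fertilizer** and **Irrigation** are categorical independent variables, so each is dummy-coded against a reference level.
- $\text{Fertilizer} \times \text{Irrigation}$ is the interaction term.
- $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_3$ are coefficients estimated by the model. Because the factors are categorical, $\beta_1$, $\beta_2$, and $\beta_3$ each stand for a set of coefficients, one per non-reference level or combination of levels.
- $\epsilon$ is the error term.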
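
To see how the categorical terms expand into individual coefficients, one can inspect the design matrix that the formula generates. The sketch below builds a hypothetical frame (`grid`) containing every fertilizer and irrigation combination, using the level names that appear in this notebook's outputs, and lists the resulting columns with patsy's `dmatrix`. It assumes treatment (dummy) coding, which is the default for `C()`.

```python
import itertools
import pandas as pd
from patsy import dmatrix

fertilizers = ["A", "B", "C"]
irrigations = ["Drip Irrigation", "Furrow Irrigation",
               "Sprinkler Irrigation", "Subsurface Irrigation"]

# Hypothetical frame with one row per factor combination (no yield values needed)
grid = pd.DataFrame(list(itertools.product(fertilizers, irrigations)),
                    columns=["Fertilizer", "Irrigation"])

# Same right-hand side as the model formula
design = dmatrix("C(Fertilizer) + C(Irrigation) + C(Fertilizer):C(Irrigation)",
                 data=grid, return_type="dataframe")

# 1 intercept + 2 fertilizer + 3 irrigation + 6 interaction columns
for name in design.columns:
    print(name)
```

The first level of each factor in sorted order (Fertilizer A, Drip Irrigation) becomes the reference cell and is absorbed into the intercept, so every other coefficient is read as a difference from that cell.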


In [19]:
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Define the regression formula
formula = "Yield ~ C(Fertilizer) + C(Irrigation) + C(Fertilizer):C(Irrigation)"

# Fit the model
model = ols(formula, data=df).fit()

# Print the summary of the regression
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:                  Yield   R-squared:                       0.466
Model:                            OLS   Adj. R-squared:                  0.446
Method:                 Least Squares   F-statistic:                     22.88
Date:                Fri, 03 Jan 2025   Prob (F-statistic):           1.90e-33
Time:                        21:25:37   Log-Likelihood:                -905.32
No. Observations:                 300   AIC:                             1835.
Df Residuals:                     288   BIC:                             1879.
Df Model:                          11                                         
Covariance Type:            nonrobust                                         
                                                                coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------


<h2 style='font-size: 25px; color: crimson; font-family: Dubai; font-weight: 600'>Automate for Multiple Models</h2>

To fit the same regression for several dependent variables (e.g., Yield and Biomass), wrap the model in a function and loop over them:

In [21]:
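def regression_analysis(data, dependent_columns, factor1, factor2):
    """Fit a two-factor OLS model with interaction for each dependent variable.

    Column names are inserted into the formula as-is, so they must be valid
    Python identifiers (no spaces). Returns one row per coefficient.
    """
    results = []
    for dependent_var in dependent_columns:
        # Main effects of both factors plus their interaction
        formula = f"{dependent_var} ~ C({factor1}) + C({factor2}) + C({factor1}):C({factor2})"
        model = ols(formula, data=data).fit()

        # tables[1] of summary2() holds the coefficient table
        for index, row in model.summary2().tables[1].iterrows():
            results.append({
                "Dependent Variable": dependent_var,
                "Factor": index,
                "Coef.": row["Coef."],
                "Std.Err.": row["Std.Err."],
                "t": row["t"],
                "p-value": row["P>|t|"],
                "95% CI Lower": row["[0.025"],
                "95% CI Upper": row["0.975]"]
            })

    return pd.DataFrame(results)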

# List of dependent variables
dependent_vars = ["Yield", "Biomass"]

# Run regression analysis
regression_results_df = regression_analysis(df, dependent_vars, "Fertilizer", "Irrigation")
#regression_results_df.to_csv("regression_results.csv", index=False)
regression_results_df

Unnamed: 0,Dependent Variable,Factor,Coef.,Std.Err.,t,p-value,95% CI Lower,95% CI Upper
0,Yield,Intercept,31.19449,1.076461,28.978747,2.334911e-87,29.075762,33.313219
1,Yield,C(Fertilizer)[T.B],3.394082,1.427527,2.377595,0.01807877,0.584373,6.203791
2,Yield,C(Fertilizer)[T.C],10.365909,1.427527,7.261444,3.588447e-12,7.556199,13.175618
3,Yield,C(Irrigation)[T.Furrow Irrigation],-1.080913,1.540362,-0.701726,0.4834165,-4.112708,1.950882
4,Yield,C(Irrigation)[T.Sprinkler Irrigation],-1.937047,1.490293,-1.299776,0.1947171,-4.870293,0.9962
5,Yield,C(Irrigation)[T.Subsurface Irrigation],-2.325955,1.398364,-1.66334,0.09733215,-5.078263,0.426354
6,Yield,C(Fertilizer)[T.B]:C(Irrigation)[T.Furrow Irri...,2.228763,2.077014,1.073061,0.2841419,-1.859288,6.316814
7,Yield,C(Fertilizer)[T.C]:C(Irrigation)[T.Furrow Irri...,1.325821,2.006064,0.660907,0.5092006,-2.622585,5.274227
8,Yield,C(Fertilizer)[T.B]:C(Irrigation)[T.Sprinkler I...,3.646485,2.040157,1.787355,0.07493166,-0.369025,7.661994
9,Yield,C(Fertilizer)[T.C]:C(Irrigation)[T.Sprinkler I...,0.015697,2.063688,0.007606,0.9939364,-4.046125,4.077519
