<h3>1- ONE-FACTOR EXPERIMENTAL DESIGNS ON THE AGRICULTURE DATASETS (CRD, RBD, LSD)</h3>

<h5>1-Agriculture Completely Randomized Design (CRD) dataset</h5>

<h5>Description of the dataset</h5>

The dataset contains growth measurements for three crops (Tomato, Carrot, Cucumber), each replicated 20 times, for 60 observations in total. The design is balanced, which allows direct comparison of the average growth and the variability within each crop. It is suitable for analyses that compare the growth characteristics of the three crops across replications.

In [None]:
# Import necessary libraries
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt

In [35]:
# Load Agriculture Completely Randomized Design (CRD) dataset
Data_Ag_CRD = pd.read_csv("CRD_1.csv")
# Display the first few rows of the CRD dataset
Data_Ag_CRD.head()

Unnamed: 0,Crop,Replication,Growth
0,Tomato,1,28
1,Carrot,1,35
2,Cucumber,1,30
3,Tomato,2,31
4,Carrot,2,38
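
The balance mentioned in the description can be checked directly by counting observations per crop. The sketch below is only an illustration: `demo_crd` is a hypothetical frame with the same column names as `CRD_1.csv`, made-up growth values, and 4 replications instead of 20.

```python
import pandas as pd

# Hypothetical stand-in for CRD_1.csv: same columns, made-up values
demo_crd = pd.DataFrame({
    "Crop": ["Tomato", "Carrot", "Cucumber"] * 4,
    "Replication": [r for r in range(1, 5) for _ in range(3)],
    "Growth": [28, 35, 30, 31, 38, 33, 29, 36, 31, 30, 37, 32],
})

# Number of observations, mean and standard deviation of Growth per crop
summary = demo_crd.groupby("Crop")["Growth"].agg(["count", "mean", "std"])

# A design is balanced when every group has the same number of observations
is_balanced = summary["count"].nunique() == 1
print(summary)
print("Balanced:", is_balanced)
```

With the real data, the same `groupby` call on `Data_Ag_CRD` should report a count of 20 for each crop if the design is balanced as described.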


Null Hypothesis (H0): The mean growth is the same for all crop types.

Alternative Hypothesis (H1): The mean growth differs for at least one crop type.

For our dataset:

Dependent Variable (Response Variable): Growth, Independent Variable (Factor): Crop

###### TEST ANOVA ON THE MODEL

In [37]:
# Fit the ANOVA model for Agriculture CRD
model_Data_Ag_CRD = ols('Growth ~ Crop', data=Data_Ag_CRD).fit()

# Print ANOVA table for Agriculture CRD
anova_table_model_Data_Ag_CRD = sm.stats.anova_lm(model_Data_Ag_CRD, typ=2)
anova_table_model_Data_Ag_CRD

Unnamed: 0,sum_sq,df,F,PR(>F)
Crop,520.0,2.0,123.5,1.9078149999999998e-21
Residual,120.0,57.0,,


The p-value associated with the F-statistic for the $\textit{Crop}$ factor is extremely small, leading to the rejection of the null hypothesis.
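
The F-statistic and p-value can be reproduced by hand from the sums of squares and degrees of freedom in the table. This is a small check using `scipy.stats.f`, which the notebook itself does not import; the numbers are copied from the output above.

```python
from scipy import stats

# Values copied from the ANOVA table above
ss_crop, df_crop = 520.0, 2
ss_resid, df_resid = 120.0, 57

ms_crop = ss_crop / df_crop        # mean square for Crop
ms_resid = ss_resid / df_resid     # mean square of the residuals
f_stat = ms_crop / ms_resid        # F = MS_Crop / MS_Residual

# Upper-tail probability of the F distribution with (2, 57) degrees of freedom
p_value = stats.f.sf(f_stat, df_crop, df_resid)
print(f"F = {f_stat:.1f}, p = {p_value:.3e}")
```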

This is strong evidence that the mean $\textit{Growth}$ values of the crop types are not all equal.
The test does not show which of the crops (Tomato, Carrot, Cucumber) differ from one another, only that at least one mean differs from the others.


In summary, the ANOVA analysis indicates that the choice of crop significantly influences the growth of plants, with differences between crop types being highly statistically significant.
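
A significant F-test does not say which crops differ. One common follow-up, which is not part of the original analysis, is Tukey's HSD test for all pairwise comparisons. The sketch below uses `pairwise_tukeyhsd` from statsmodels on hypothetical data (`demo`, with made-up means and spread); the real `Data_Ag_CRD` frame would be passed in the same way.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(42)
# Hypothetical data: made-up means and spread, 20 observations per crop
crop_means = {"Tomato": 30.0, "Carrot": 36.0, "Cucumber": 31.0}
demo = pd.DataFrame({
    "Crop": np.repeat(list(crop_means), 20),
    "Growth": np.concatenate([rng.normal(m, 1.5, 20) for m in crop_means.values()]),
})

# Compare every pair of crops while controlling the family-wise error rate at 5%
tukey = pairwise_tukeyhsd(endog=demo["Growth"], groups=demo["Crop"], alpha=0.05)
print(tukey.summary())
```

Each row of the summary reports one pair of crops, the difference in means, an adjusted p-value, and whether the null hypothesis of equal means is rejected for that pair.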


<h5>2-Agriculture Randomized Block Design (RBD) dataset</h5>

<h5>Description of the dataset</h5>

The dataset represents an agricultural experiment in which different fertilizer treatments (Fertilizer_A, Fertilizer_B, Fertilizer_C) are applied to soil blocks over several replications. The recorded variable is the yield obtained from each combination of treatment, soil block, and replication. The dataset has three design variables (Treatment, SoilBlock, and Replication), which makes it suitable for analyzing the effect of the fertilizers on crop yield under varying conditions.

In [19]:
# Load Agriculture Randomized Block Design (RBD) dataset
Data_Ag_RBD = pd.read_csv("RBD_1.csv")

# Display the first few rows of the RBD dataset
Data_Ag_RBD.head()

Unnamed: 0,Treatment,SoilBlock,Replication,Yield
0,Fertilizer_A,1,1,15
1,Fertilizer_B,1,1,18
2,Fertilizer_C,1,1,14
3,Fertilizer_A,2,2,17
4,Fertilizer_B,2,2,16


Null Hypothesis (H0): The mean crop yield is the same across treatments (fertilizers) and across soil blocks.

Alternative Hypothesis (H1): The mean crop yield differs between at least two treatments or between at least two soil blocks.

For our dataset:

Dependent Variable (Response Variable): Yield, Independent Variables (Factors): Treatment (Fertilizer), SoilBlock

###### TEST ANOVA ON THE MODEL

In [38]:
# Perform the RBD analysis using ANOVA for Agriculture
model_Data_Ag_RBD = ols('Yield ~ C(Treatment) + C(SoilBlock)', data=Data_Ag_RBD).fit()

# Print the ANOVA table for Agriculture RBD
anova_table_model_Data_Ag_RBD = sm.stats.anova_lm(model_Data_Ag_RBD, typ=2)
anova_table_model_Data_Ag_RBD

Unnamed: 0,sum_sq,df,F,PR(>F)
C(Treatment),1.47619,2.0,0.190899,0.827024
C(SoilBlock),21.3,2.0,2.754494,0.076708
Residual,143.057143,37.0,,


The p-value associated with the F-statistic for the $\textit{Treatment}$ factor is 0.827024, which is greater than the 0.05 significance level. 
This suggests that there is no significant difference in mean yield among the fertilizer treatments.
The p-value for the $\textit{SoilBlock}$ factor is 0.076708, which is above 0.05 but below 0.10. 
There is some evidence that mean yield varies between soil blocks, but it is not strong enough to reach statistical significance at the 0.05 level.
The Residual row represents unexplained variability; no F-statistic or p-value is computed for it.

In summary, the ANOVA gives no evidence of differences in mean yield among the fertilizer treatments. 
Soil block may have some effect on yield, but this effect is not statistically significant at the 0.05 level.

<h5>3-Agriculture Latin Square Design (LSD) dataset</h5>

<h5>Description of the dataset</h5>

The dataset represents an agricultural experiment involving different crop varieties (Wheat, Barley, Corn) planted in rows and columns across multiple replications. The recorded variable is the growth observed for each combination of crop variety, row, column, and replication. The dataset structure suggests a factorial experimental design with four factors: CropVariety, Row, Column, and Replication. The data can be analyzed to understand the effects of different crop varieties, row positions, and column positions on crop growth, considering replication as a source of variation.

In [21]:
# Load Agriculture Latin Square Design (LSD) dataset
Data_Ag_LSD = pd.read_csv("LSD_1.csv")

# Display the first few rows of the LSD dataset
Data_Ag_LSD.head()

Unnamed: 0,CropVariety,Row,Column,Replication,Growth
0,Wheat,1,1,1,15
1,Barley,2,2,1,18
2,Corn,3,3,1,14
3,Wheat,2,3,2,17
4,Barley,3,1,2,16
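
In a Latin square, every treatment appears exactly once in each row and once in each column. The sketch below shows one way to check that property with `pd.crosstab`. The `layout` frame is a hypothetical 3x3 square built by cyclic shifts; it is not the layout stored in `LSD_1.csv`.

```python
import pandas as pd

# Hypothetical 3x3 Latin square built by cyclic shifts (not the real LSD_1.csv layout)
varieties = ["Wheat", "Barley", "Corn"]
layout = pd.DataFrame(
    [
        {"Row": r + 1, "Column": c + 1, "CropVariety": varieties[(r + c) % 3]}
        for r in range(3) for c in range(3)
    ]
)

# Cross-tabulate variety against row and against column
by_row = pd.crosstab(layout["CropVariety"], layout["Row"])
by_col = pd.crosstab(layout["CropVariety"], layout["Column"])

# Latin square property: every variety appears exactly once in every row and column
is_latin_square = bool((by_row == 1).all().all() and (by_col == 1).all().all())
print(layout.pivot(index="Row", columns="Column", values="CropVariety"))
print("Latin square:", is_latin_square)
```

For a replicated design, the same check would be applied within each value of `Replication`.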


Null Hypothesis (H0): The mean plant growth is the same across crop varieties, rows, and columns.

Alternative Hypothesis (H1): The mean plant growth differs between at least two levels of crop variety, row, or column.

For our dataset:

Dependent Variable (Response Variable): Growth, Independent Variables (Factors): CropVariety, Row, Column. Replication is recorded in the dataset but is not included in the model fitted below.

######  TEST ANOVA ON THE MODEL

In [39]:
# Fit the ANOVA model for Agriculture LSD
model_Data_Ag_LSD = ols('Growth ~ C(CropVariety) + C(Row) + C(Column)', data=Data_Ag_LSD).fit()

# Compute the ANOVA table for Agriculture LSD
# (no typ argument, so anova_lm uses its default: Type I, sequential sums of squares)
anova_table_model_Data_Ag_LSD = sm.stats.anova_lm(model_Data_Ag_LSD)

# Display the ANOVA table for Agriculture LSD
anova_table_model_Data_Ag_LSD

Unnamed: 0,df,sum_sq,mean_sq,F,PR(>F)
C(CropVariety),2.0,20.066667,10.033333,3.964801,0.033156
C(Row),2.0,13.145455,6.572727,2.597298,0.096153
C(Column),2.0,9.550699,4.77535,1.887041,0.174242
Residual,23.0,58.203846,2.530602,,


For CropVariety, the p-value (0.033) is less than 0.05, indicating a significant difference in mean growth among the crop varieties.
For Row, the p-value (0.096) is above 0.05, so the row effect is not significant at the 5% level, although it would be at the 10% level.
For Column, the p-value (0.174) is greater than 0.05, suggesting no significant difference in mean growth among the columns.
The Residual row represents unexplained variability; no F-statistic or p-value is computed for it.

In conclusion, crop variety has a significant effect on mean growth, while neither the row nor the column factor is significant at the 0.05 threshold.

<h3>2- ONE-FACTOR EXPERIMENTAL DESIGNS ON THE HEALTH DATASETS (CRD, RBD, LSD)</h3>

<h5>1-Health Completely Randomized Design (CRD) dataset</h5>

<h5>Description of the dataset</h5>

The dataset represents an experiment in which different treatment types (TherapyA, TherapyB, Control) are administered over several replications, with recovery time as the recorded variable. Each row in the dataset corresponds to a unique combination of treatment type and replication, together with the recovery time recorded for it.

In [23]:
# Load Health Completely Randomized Design (CRD) dataset
Data_He_CRD = pd.read_csv("CRD_2.csv")

# Display the first few rows of the Health CRD dataset
Data_He_CRD.head()

Unnamed: 0,TreatmentType,Replication,RecoveryTime
0,TherapyA,1,10
1,TherapyB,1,12
2,Control,1,11
3,TherapyA,2,9
4,TherapyB,2,11


Null Hypothesis (H0): The mean recovery time is the same for all treatment types.

Alternative Hypothesis (H1): The mean recovery time differs for at least one treatment type.

For our dataset:

Dependent Variable (Response Variable): RecoveryTime, Independent Variable (Factor): TreatmentType

###### TEST ANOVA ON THE MODEL

In [40]:
# Fit the ANOVA model for Health CRD
model_Data_He_CRD = ols('RecoveryTime ~ TreatmentType', data=Data_He_CRD).fit()

# Print ANOVA table for Health CRD
anova_table_model_Data_He_CRD = sm.stats.anova_lm(model_Data_He_CRD, typ=2)
anova_table_model_Data_He_CRD

Unnamed: 0,sum_sq,df,F,PR(>F)
TreatmentType,0.428571,2.0,0.021039,0.979192
Residual,397.214286,39.0,,


For TreatmentType, the p-value is 0.979192, which is much greater than 0.05. This indicates that there is no significant difference in mean response among different treatment types.
The Residual row represents unexplained variability, and the F-statistic and p-value for the Residual row are not applicable.

In conclusion, based on this analysis, there is no evidence to reject the null hypothesis, suggesting that there is no significant difference in the mean response among different treatment types in the health dataset.

<h5>2-Health Randomized Block Design (RBD) dataset</h5>

<h5>Description of the dataset</h5>

The dataset involves an experiment where different treatments (DrugA, DrugB, Placebo) are administered by various medical professionals (Doctor1, Doctor2, Doctor3) across multiple replications. The recorded variable is the recovery time for each patient.

Summary of the dataset:

 Treatments: DrugA, DrugB, Placebo;
 Medical Professionals: Doctor1, Doctor2, Doctor3;
 Replications: 1 to 12;
 Recovery Time: The time taken for recovery after each treatment

In [25]:
# Load Health Randomized Block Design (RBD) dataset
Data_He_RBD = pd.read_csv("RBD_2.csv")

# Display the first few rows of the Health RBD dataset
Data_He_RBD.head()

Unnamed: 0,Treatment,MedicalProfessional,Replication,RecoveryTime
0,DrugA,Doctor1,1,10
1,DrugB,Doctor1,1,12
2,Placebo,Doctor1,1,11
3,DrugA,Doctor2,2,9
4,DrugB,Doctor2,2,11


Null Hypothesis (H0): The mean recovery time is the same across treatments and across medical professionals.

Alternative Hypothesis (H1): The mean recovery time differs between at least two treatments or between at least two medical professionals.

For our dataset:

Dependent Variable (Response Variable): RecoveryTime, Independent Variables (Factors): Treatment, MedicalProfessional

###### TEST ANOVA ON THE MODEL

In [41]:
# Perform RBD analysis using ANOVA for Health
model_Data_He_RBD = ols('RecoveryTime ~ C(Treatment) + C(MedicalProfessional)', data=Data_He_RBD).fit()

# Print ANOVA table for Health RBD
anova_table_model_Data_He_RBD = sm.stats.anova_lm(model_Data_He_RBD, typ=2)
anova_table_model_Data_He_RBD

Unnamed: 0,sum_sq,df,F,PR(>F)
C(Treatment),0.055556,2.0,0.002863,0.997141
C(MedicalProfessional),28.388889,2.0,1.462966,0.247095
Residual,300.777778,31.0,,


For Treatment, the p-value is 0.997141, which is much greater than 0.05. This indicates that there is no significant difference in mean response among different treatments.
For MedicalProfessional, the p-value is 0.247095, which is greater than 0.05. This suggests that there is no strong evidence to reject the null hypothesis for the MedicalProfessional factor.
The Residual row represents unexplained variability, and the F-statistic and p-value for the Residual row are not applicable.

In conclusion, based on this analysis, there is no significant difference in mean recovery time among the treatments or among the medical professionals in the health dataset.
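
A p-value says nothing about how large an effect is. One common effect-size measure, which the original analysis does not report, is partial eta squared, computed as SS_effect / (SS_effect + SS_residual). The sketch below applies that formula to the sums of squares copied from the Health RBD table above.

```python
# Values copied from the Health RBD ANOVA table above
sum_sq = {
    "C(Treatment)": 0.055556,
    "C(MedicalProfessional)": 28.388889,
}
ss_residual = 300.777778

# Partial eta squared: SS_effect / (SS_effect + SS_residual)
partial_eta_sq = {term: ss / (ss + ss_residual) for term, ss in sum_sq.items()}

for term, value in partial_eta_sq.items():
    print(f"{term}: partial eta^2 = {value:.4f}")
```

Both values are small, which is in line with the non-significant F-tests.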

<h5>3-Health Latin Square Design (LSD) dataset</h5>

<h5>Description of the dataset</h5>

The dataset involves an experiment with different treatments (WeightGainDrug_A, WeightLossDrug_B, Control) administered in various rows and columns across multiple replications. The recorded variable is the recovery time for each patient.

Summary of the dataset:

 Treatments: WeightGainDrug_A, WeightLossDrug_B, Control;
 Rows: 1 to 3;
 Columns: 1 to 3;
 Replications: 1 to 13;
 Recovery Time: The time taken for recovery after each treatment

In [27]:
# Load Health Latin Square Design (LSD) dataset
Data_He_LSD = pd.read_csv("LSD_2.csv")

# Display the first few rows of the Health LSD dataset
Data_He_LSD.head()

Unnamed: 0,Treatment,Rows,Columns,Replication,RecoveryTime
0,WeightGainDrug_A,1,1,1,10
1,WeightLossDrug_B,2,2,1,12
2,Control,3,3,1,11
3,WeightGainDrug_A,2,3,2,9
4,WeightLossDrug_B,3,1,2,11


Null Hypothesis (H0): The mean recovery time is the same across treatments, rows, columns, and replications.

Alternative Hypothesis (H1): The mean recovery time differs between at least two levels of treatment, row, column, or replication.

For our dataset:

Dependent Variable (Response Variable): RecoveryTime, Independent Variables (Factors): Treatment, Rows, Columns, Replication

######  TEST ANOVA ON THE MODEL

In [42]:
# Create ANOVA model for Health LSD
model_Data_He_LSD = ols('RecoveryTime ~ C(Treatment) + C(Rows) + C(Columns) + C(Replication)', data=Data_He_LSD).fit()

# Perform ANOVA analysis for Health LSD
anova_table_model_Data_He_LSD = sm.stats.anova_lm(model_Data_He_LSD, typ=2)

# Print ANOVA table for Health LSD
anova_table_model_Data_He_LSD

Unnamed: 0,sum_sq,df,F,PR(>F)
C(Treatment),1.732729,2.0,0.286359,0.754021
C(Rows),2.59766,2.0,0.429302,0.656822
C(Columns),1.382479,2.0,0.228475,0.797795
C(Replication),276.933861,12.0,7.6279,4.3e-05
Residual,60.508976,20.0,,


For Treatment, Rows, and Columns, the p-values are greater than 0.05, suggesting that there is no significant difference in mean response among different levels of these factors.
For Replication, the p-value is 0.000043, which is less than 0.05. This indicates a significant difference in mean response among different replications.
The Residual row represents unexplained variability, and the F-statistic and p-value for the Residual row are not applicable.

In conclusion, the LSD analysis suggests a significant difference in mean response among different replications in the health dataset, while no significant differences are observed for Treatment, Rows, and Columns.
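
As a consistency check across the six analyses, the number of observations behind each ANOVA table can be recovered from its degrees of freedom: for a model with an intercept, N equals the sum of the factor df plus the residual df plus 1. The df values below are copied from the tables in this notebook.

```python
# Degrees of freedom copied from the six ANOVA tables in this notebook:
# {table name: (list of factor df, residual df)}
anova_df = {
    "Agriculture CRD": ([2], 57),
    "Agriculture RBD": ([2, 2], 37),
    "Agriculture LSD": ([2, 2, 2], 23),
    "Health CRD": ([2], 39),
    "Health RBD": ([2, 2], 31),
    "Health LSD": ([2, 2, 2, 12], 20),
}

# For a model with an intercept: N = sum(factor df) + residual df + 1
n_obs = {
    name: sum(factor_df) + resid_df + 1
    for name, (factor_df, resid_df) in anova_df.items()
}
for name, n in n_obs.items():
    print(f"{name}: {n} observations")
```

The results agree with the dataset descriptions where those state a replication count: 60 rows for the Agriculture CRD (3 crops x 20 replications), 36 for the Health RBD (3 treatments x 12 replications), and 39 for the Health LSD (3 treatments x 13 replications).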

## REFERENCES

https://www.easybiologyclass.com/types-of-experimental-designs-in-statistics-rbd-crd-lsd-factorial-designs/

https://iastate.pressbooks.pub/quantitativeplantbreeding/chapter/randomized-complete-block-design/

https://epirhandbook.com/en/descriptive-tables.html