# **`Statistics Advance-6`**

`Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact
the validity of the results.`

**Assumptions for ANOVA**
1. Normality of the sampling distribution of the mean: the distribution of sample means must follow a normal distribution (which the central limit theorem supports for sufficiently large samples)
2. Absence of outliers: outlying scores should be identified and dealt with (corrected or, where justified, removed) before performing ANOVA
3. Homogeneity of variance: the population variance is the same at every level of each independent variable or factor. [σ1²=σ2²=σ3²]
4. Samples are independent and random

**Examples of violations that could impact the validity of the results**

1. Implicit factors:

A lack of independence within a sample is often caused by the existence of an implicit factor in the data. For example, values collected over time may be serially correlated (here time is the implicit factor). If the data are in a particular order, consider the possibility of dependence. (If the row order of the data reflects the order in which the data were collected, an index plot of the data [data value plotted against row number] can reveal patterns that suggest possible time effects.)

2. Lack of independence:

Whether the samples are independent of each other is generally determined by the structure of the experiment from which they arise. Obviously correlated samples, such as a set of observations over time on the same subjects, are not independent, and such data would be more appropriately tested by a one-way blocked ANOVA or a repeated measures ANOVA. 

3. Outliers:

Values may not be identically distributed because of the presence of outliers. Outliers are anomalous values in the data. Outliers tend to increase the estimate of sample variance, thus decreasing the calculated F statistic for the ANOVA and lowering the chance of rejecting the null hypothesis. They may be due to recording errors, which may be correctable, or they may be due to the sample not being entirely from the same population. Apparent outliers may also be due to the values being from the same, but nonnormal, population. The boxplot and normal probability plot (normal Q-Q plot) may suggest the presence of outliers in the data. The F statistic is based on the sample means and the sample variances, each of which is sensitive to outliers. (In other words, neither the sample mean nor the sample variance is resistant to outliers, and thus, neither is the F statistic.) In particular, a large outlier can inflate the overall variance, decreasing the F statistic and thus perhaps eliminating a significant difference. A nonparametric test may be a more powerful test in such a situation.

4. Nonnormality:

The values in a sample may indeed be from the same population, but not from a normal one. Signs of nonnormality are skewness (lack of symmetry) or light-tailedness or heavy-tailedness. The boxplot, histogram, and normal probability plot (normal Q-Q plot), along with the normality test, can provide information on the normality of the population distribution. However, if there are only a small number of data points, nonnormality can be hard to detect. If there are a great many data points, the normality test may detect statistically significant but trivial departures from normality that will have no real effect on the F statistic. For data sampled from a normal distribution, normal probability plots should approximate straight lines, and boxplots should be symmetric (median and mean together, in the middle of the box) with no outliers. The one-way ANOVA's F test will not be much affected even if the population distributions are skewed, but the F test can be sensitive to population skewness if the sample sizes are seriously unbalanced. If the sample sizes are not unbalanced, the F test will not be seriously affected by light-tailedness or heavy-tailedness, unless the sample sizes are small (less than 5), or the departure from normality is extreme (kurtosis less than -1 or greater than 2).

Robust statistical tests operate well across a wide variety of distributions. A test can be robust for validity, meaning that it provides P values close to the true ones in the presence of (slight) departures from its assumptions. It may also be robust for efficiency, meaning that it maintains its statistical power (the probability that a true violation of the null hypothesis will be detected by the test) in the presence of those departures. The one-way ANOVA's F test is robust for validity against nonnormality, but it may not be the most powerful test available for a given nonnormal distribution, although it is the most powerful test available when its test assumptions are met.
In the case of nonnormality, a nonparametric test or employing a transformation may result in a more powerful test.

5. Unequal population variances:

The inequality of the population variances can be assessed by examination of the relative size of the sample variances, either informally (including graphically), or by a robust variance test such as Levene's test. (Bartlett's test is even more sensitive to nonnormality than the one-way ANOVA's F test, and thus should not be used for such testing.) The effect of inequality of variances is mitigated when the sample sizes are equal: the F test is fairly robust against inequality of variances if the sample sizes are equal, although the chance of incorrectly reporting a significant difference in the means when none exists does increase. This chance of incorrectly rejecting the null hypothesis is greater when the population variances are very different from each other, particularly if there is one sample variance very much larger than the others. The effect of inequality of the variances is most severe when the sample sizes are unequal. If the larger samples are associated with the populations with the larger variances, then the F statistic will tend to be smaller than it should be, reducing the chance that the test will correctly identify a significant difference between the means (i.e., making the test conservative). On the other hand, if the smaller samples are associated with the populations with the larger variances, then the F statistic will tend to be greater than it should be, increasing the risk of incorrectly reporting a significant difference in the means when none exists. This chance of incorrectly rejecting the null hypothesis in the case of unbalanced sample sizes can be substantial even when the population variances are not very different from each other. Although the effect of unbalanced sample sizes and unequal population variances increases for smaller sample sizes, it does not decrease substantially if the sample sizes are increased without changing the lack of balance in the sample sizes.
For this reason, and because equal sample sizes mitigate the effect of unequal population variances, the best course is to keep the sample sizes as equal as possible. If both nonnormality and unequal variances are present, employing a transformation may be preferable. A nonparametric test like the Kruskal-Wallis test still assumes that the population variances are comparable.
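The graphical checks described above can be complemented with formal tests. The following is a minimal sketch, assuming SciPy is installed; it reuses the three groups of test scores from Q4 and runs a Shapiro-Wilk normality test per group and a median-centered Levene test for equal variances. The variable names are illustrative only.

```python
from scipy import stats

# Test scores for the three groups used in the Q4 example
groups = {
    "Group 1": [85, 86, 88, 75, 78, 94, 98, 79, 71, 80],
    "Group 2": [91, 92, 93, 85, 87, 84, 82, 88, 95, 96],
    "Group 3": [79, 78, 88, 94, 92, 85, 83, 85, 82, 81],
}

# Shapiro-Wilk: H0 = the sample comes from a normal distribution
shapiro_p = {}
for name, values in groups.items():
    _, p = stats.shapiro(values)
    shapiro_p[name] = p
    print(f"{name}: Shapiro-Wilk p = {p:.3f}")

# Levene (median-centered): H0 = all groups have equal population variances
levene_stat, levene_p = stats.levene(*groups.values(), center="median")
print(f"Levene statistic = {levene_stat:.3f}, p = {levene_p:.3f}")
```

A small p-value in either test signals a possible violation, but with only 10 observations per group these tests have little power, so the plots remain important.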

`Q2. What are the three types of ANOVA, and in what situations would each be used?`

The 3 main types of ANOVA are:

1. One-Way ANOVA : Used when there is one factor with at least 2 levels and the levels are independent of each other (different subjects in each group)
2. Repeated Measures ANOVA : Used when there is one factor with at least 2 levels and the levels are dependent on each other (the same subjects are measured at every level)
3. Factorial ANOVA : Used when there are 2 or more factors, each factor has 2 or more levels, and the levels may be dependent or independent

`Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?`

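Partitioning of variance in ANOVA means splitting the total variability in the data into a part explained by differences between the group means and a part left over within the groups:

SST = SSB + SSW

where SST is the total sum of squares, SSB is the between-group (explained) sum of squares and SSW is the within-group (residual) sum of squares. This concept is important because the F test works by comparing these two parts.

The hypotheses being tested are:

Null hypothesis (H0) : µ1 = µ2 = µ3 = ....... = µk (k = number of levels)<br>
Alternate hypothesis (Ha): At least one of the population means is not equal to the others

The test statistic in ANOVA is the F statistic:

F = (Variance between samples) / (Variance within samples)

Each variance here is a mean square: a sum of squares divided by its degrees of freedom (k - 1 between groups, N - k within groups). A large F means the group means differ by more than the within-group scatter alone would explain.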
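To make the partition concrete, here is a small sketch in plain Python. The three groups are made-up numbers chosen for easy arithmetic, not data from this notebook. It computes the between-group, within-group and total sums of squares separately and shows that the first two add up to the third.

```python
# Hypothetical scores for three groups (illustrative values only)
groups = [[4, 5, 6], [7, 8, 9], [1, 2, 3]]

all_values = [x for g in groups for x in g]
grand_mean = sum(all_values) / len(all_values)
group_means = [sum(g) / len(g) for g in groups]

# Between-group part: group size times squared distance of each group mean from the grand mean
ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))

# Within-group part: squared distance of each observation from its own group mean
ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

# Total: squared distance of each observation from the grand mean
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

k = len(groups)          # number of groups
n_total = len(all_values)  # total number of observations
f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))

print(ss_between, ss_within, ss_total)  # 54.0 6.0 60.0
print(f_stat)                           # 27.0
```

Here almost all of the total variation (54 of 60) sits between the groups, which is why the F statistic is large.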

`Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual
sum of squares (SSR) in a one-way ANOVA using Python?`

Explained with example using test scores

![image.png](attachment:f3904666-aa50-4138-aafa-829b52fba5eb.png)

In [7]:
import pandas as pd
df = pd.DataFrame({"Group 1":[85,86,88,75,78,94,98,79,71,80],"Group 2":[91,92,93,85,87,84,82,88,95,96],"Group 3":[79,78,88,94,92,85,83,85,82,81]})
df.shape

(10, 3)

In [16]:
#Step 1: Calculate the group means and the overall mean.
#group mean is:
group_mean = df.mean()
overall_mean = df.unstack().mean()
print("Group mean is:")
print(group_mean,end="\n\n")
#overall mean:
print("Overall mean is",overall_mean)


Group mean is:
Group 1    83.4
Group 2    89.3
Group 3    84.7
dtype: float64

Overall mean is 85.8


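*Step 2: Calculate SSR*

Next, we will calculate the regression (between-group) sum of squares, labelled SSR in this walkthrough. It is the explained part of the variation, i.e. what the question calls the explained sum of squares. The formula is:

$$SSR = \sum_{j} n_j \left(\bar{X}_j - \bar{X}_{..}\right)^2$$

where:

$n_j$: the sample size of group j<br>
Σ: a Greek symbol that means “sum”<br>
$\bar{X}_j$: the mean of group j<br>
$\bar{X}_{..}$: the overall mean<br>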

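In [38]:
n = 10  # number of observations in each group
SSR = 0
for mean_j in group_mean:
    # squared distance of the group mean from the overall mean, weighted by group size
    a = n*(mean_j - overall_mean)**2
    SSR += a.round(2)
print("The regression (between-group) sum of squares (SSR) is:",SSR)

The regression (between-group) sum of squares (SSR) is: 192.2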


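*Step 3: Calculate SSE.*

Next, we will calculate the error (within-group) sum of squares, labelled SSE in this walkthrough. It is the residual part of the variation, i.e. what the question calls the residual sum of squares. The formula is:

$$SSE = \sum_{j} \sum_{i} \left(X_{ij} - \bar{X}_j\right)^2$$

where:

Σ: a Greek symbol that means “sum”<br>
$X_{ij}$: the ith observation in group j<br>
$\bar{X}_j$: the mean of group j<br>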

In [37]:
# Sum the squared deviations of every observation from its own group mean
SSE = 0
for col in df.columns:
    for x in df[col]:
        SSE += ((x - group_mean[col])**2).round(2)
print("The error sum of squares (SSE) is:",round(SSE, 2))

The error sum of squares (SSE) is: 1100.6


*Step 4: Calculate SST.*

Next, we will calculate the total sum of squares (SST) using the following formula:

SST = SSR + SSE

In [39]:
SST = SSR + SSE
print("The total sum of squares (SST) is",SST)

The total sum of squares (SST) is 1292.8
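As a cross-check, the same test scores can be passed to `scipy.stats.f_oneway`, which returns the F statistic and p-value directly. This is a sketch assuming SciPy is available. The hand-computed F uses the sums of squares obtained above (192.2 between groups with 2 degrees of freedom, 1100.6 within groups with 27 degrees of freedom).

```python
from scipy import stats

group_1 = [85, 86, 88, 75, 78, 94, 98, 79, 71, 80]
group_2 = [91, 92, 93, 85, 87, 84, 82, 88, 95, 96]
group_3 = [79, 78, 88, 94, 92, 85, 83, 85, 82, 81]

# One-way ANOVA computed by SciPy
f_stat, p_value = stats.f_oneway(group_1, group_2, group_3)

# F from the sums of squares: (between SS / (k - 1)) / (within SS / (N - k))
f_manual = (192.2 / 2) / (1100.6 / 27)

print(f"F (scipy)  = {f_stat:.4f}")
print(f"F (manual) = {f_manual:.4f}")
print(f"p-value    = {p_value:.4f}")
```

The two F values should agree up to rounding, which confirms that the sums of squares were partitioned correctly.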


`Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?`

Performing a Two-Way ANOVA in Python:
Let us consider an example in which scientists need to know whether plant growth is affected by fertilizer and watering frequency. They planted exactly 30 plants and allowed them to grow for six months under different conditions of fertilizer and watering frequency. After exactly six months, they recorded the height of each plant in centimeters. Performing a two-way ANOVA in Python is a step-by-step process, discussed below:

Step 1: Import libraries.

The first step is to import the required libraries (NumPy and pandas here; statsmodels is imported in Step 3).

In [1]:
# Importing libraries
import numpy as np
import pandas as pd

Step 2: Enter the data.

Let us create a pandas DataFrame that consists of the following three variables:

- Fertilizer: how frequently each plant was fertilized, that is daily or weekly.
- Watering: how frequently each plant was watered, that is daily or weekly.
- height: the height of each plant (in centimeters) after six months.

Example:

In [2]:
# Create a dataframe
df = pd.DataFrame({'Fertilizer': np.repeat(['daily', 'weekly'], 15),
                          'Watering': np.repeat(['daily', 'weekly'], 15),
                          'height': [14, 16, 15, 15, 16, 13, 12, 11, 14, 
                                     15, 16, 16, 17, 18, 14, 13, 14, 14, 
                                     14, 15, 16, 16, 17, 18, 14, 13, 14, 
                                     14, 14, 15]})
df

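| | Fertilizer | Watering | height |
|---|---|---|---|
| 0 | daily | daily | 14 |
| 1 | daily | daily | 16 |
| 2 | daily | daily | 15 |
| 3 | daily | daily | 15 |
| 4 | daily | daily | 16 |
| 5 | daily | daily | 13 |
| 6 | daily | daily | 12 |
| 7 | daily | daily | 11 |
| 8 | daily | daily | 14 |
| 9 | daily | daily | 15 |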


Step 3: Conduct the two-way ANOVA:

To perform the two-way ANOVA, the Statsmodels library provides the anova_lm() function. The syntax of the function is given below.

Syntax:

sm.stats.anova_lm(model, typ=2)

Parameters:

model: the fitted linear model, i.e. the result returned by ols(...).fit()<br>
typ: the type of sums of squares to use for the ANOVA test, that is { I or II or III or 1 or 2 or 3 }

In [4]:
# Importing libraries
import statsmodels.api as sm
from statsmodels.formula.api import ols
  
# Performing two-way ANOVA
model = ols(
    'height ~ C(Fertilizer) + C(Watering) +\
    C(Fertilizer):C(Watering)', data=df).fit()
results = sm.stats.anova_lm(model, typ=2)
#results
results

| | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| C(Fertilizer) | 9.878966e-13 | 1.0 | 3.576867e-13 | 1.0 |
| C(Watering) | 0.08630952 | 1.0 | 0.03125 | 0.860956 |
| C(Fertilizer):C(Watering) | 0.03333333 | 1.0 | 0.01206897 | 0.913305 |
| Residual | 77.33333 | 28.0 | | |


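Interpreting the result:

Following are the p-values for each of the factors in the output:

- The Fertilizer p-value is equal to 1.0
- The Watering p-value is equal to 0.860956
- The Fertilizer * Watering p-value is equal to 0.913305

All three p-values are greater than 0.05, so the table shows no significant main effect of fertilizer frequency or watering frequency, and no significant interaction between them.

Note, however, that in the DataFrame above the Fertilizer and Watering columns are identical for every plant (both are np.repeat(['daily', 'weekly'], 15)), so the two factors are completely confounded. Their main effects and interaction cannot be separated with this data, which is why the Fertilizer sum of squares is numerically zero and the residual has 28 degrees of freedom. A meaningful two-way ANOVA needs a crossed design in which every combination of fertilizer and watering level is observed.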

`Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02.
What can you conclude about the differences between the groups, and how would you interpret these
results?`

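Since the p-value (0.02) is less than 0.05, we can reject the null hypothesis at the 5% significance level. This means that at least one group mean differs significantly from the others. The F-statistic of 5.23 says that the variation between the group means is about 5.23 times the variation within the groups. The ANOVA alone does not show which groups differ; a post-hoc test is needed for that.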

`Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential
consequences of using different methods to handle missing data?`

A classical repeated measures ANOVA treats each measurement as a separate variable. Because it uses listwise deletion, if one measurement is missing, the entire case gets dropped.

`Multiple imputation`
One of the most effective ways of dealing with missing data is multiple imputation (MI). Using MI, we can create multiple plausible replacements of the missing data, given what we have observed and a statistical model (the imputation model).

In the ANOVA, using MI has the additional benefit that it allows taking covariates into account that are relevant for the missing data but not for the analysis. For example, if a covariate x is a direct cause of missing data in y, we must take x into account when making inferences about y in the ANOVA.

Imputation is not the only method that can deal with missing data, and other methods like `maximum-likelihood estimation (ML)` have also been recommended (Schafer & Graham, 2002). Using ML, cases contribute to the estimation of the model only to the extent to which they have data, and its results are often equally trustworthy as those under MI.

`Marginal and mixed models` treat each occasion as a different observation of the same variable. So you may lose the measurement with missing data, but not all other responses from the same subject.
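The difference between the two data layouts can be seen in a small pandas sketch. The scores below are hypothetical. In wide format (one row per subject) listwise deletion removes every subject with any missing occasion, while in long format (one row per observation), as used by marginal and mixed models, only the missing measurements themselves are lost.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data: one row per subject, one column per occasion
wide = pd.DataFrame({
    "subject": [1, 2, 3, 4],
    "t1": [5.0, 6.0, 7.0, 5.5],
    "t2": [5.5, np.nan, 7.5, 6.0],
    "t3": [6.0, 6.5, np.nan, 6.5],
})

# Listwise deletion: a subject with any missing occasion is dropped entirely
complete_cases = wide.dropna()

# Long format: one row per observation, so only the missing cells are dropped
long_obs = wide.melt(id_vars="subject", var_name="occasion",
                     value_name="score").dropna()

print(len(complete_cases), "of", len(wide), "subjects kept under listwise deletion")
print(len(long_obs), "of", 3 * len(wide), "observations kept in long format")
```

Here listwise deletion discards half of the subjects even though only 2 of the 12 measurements are missing.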

**The potential consequences of using different methods to handle missing data are as follows:**

For missing data in the outcome variable y, using ML simply means that the model is estimated using only the cases with observed y (i.e., listwise deletion), which can lead to distorted parameter estimates if other variables are related to the chance of observing y. In order to account for this, ML requires including these extra variables in the analysis model, which changes the meaning of the parameters (i.e., the ANOVA becomes ANCOVA, though the estimates for it would be unbiased!).

`Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide
an example of a situation where a post-hoc test might be necessary.`

An ANOVA is a statistical test that is used to determine whether or not there is a statistically significant difference between the means of three or more independent groups. 

The hypotheses used in an ANOVA are as follows:

The null hypothesis (H0): µ1 = µ2 = µ3 = … = µk  (the means are equal for each group)

The alternative hypothesis: (Ha): at least one of the means is different from the others

If the p-value from the ANOVA is less than the significance level, we can reject the null hypothesis and conclude that we have sufficient evidence to say that at least one of the means of the groups is different from the others.

However, this doesn’t tell us which groups are different from each other. It simply tells us that not all of the group means are equal.

In order to find out exactly which groups are different from each other, we must conduct a post hoc test (also known as a multiple comparison test), which will allow us to explore the difference between multiple group means while also controlling for the family-wise error rate.

`Technical Note:` It’s important to note that we only need to conduct a post hoc test when the p-value for the ANOVA is statistically significant. If the p-value is not statistically significant, there is insufficient evidence that any of the group means differ, so there is no need to conduct a post hoc test to find out which groups are different from each other.

Post hoc tests can be divided into 2 types:

1. Equal Variances Assumed

Tukey's honestly significant difference test, Hochberg's GT2, Gabriel, and Scheffé are multiple comparison tests and range tests. Other available range tests are Tukey's b, S-N-K (Student-Newman-Keuls), Duncan, R-E-G-W F (Ryan-Einot-Gabriel-Welsch F test), R-E-G-W Q (Ryan-Einot-Gabriel-Welsch range test), and Waller-Duncan. Available multiple comparison tests are Bonferroni, Tukey's honestly significant difference test, Sidak, Gabriel, Hochberg, Dunnett, Scheffé, and LSD (least significant difference).

- LSD. Uses t tests to perform all pairwise comparisons between group means. No adjustment is made to the error rate for multiple comparisons.
- Bonferroni. Uses t tests to perform pairwise comparisons between group means, but controls overall error rate by setting the error rate for each test to the experimentwise error rate divided by the total number of tests. Hence, the observed significance level is adjusted for the fact that multiple comparisons are being made.
- Sidak. Pairwise multiple comparison test based on a t statistic. Sidak adjusts the significance level for multiple comparisons and provides tighter bounds than Bonferroni.
- Scheffe. Performs simultaneous joint pairwise comparisons for all possible pairwise combinations of means. Uses the F sampling distribution. Can be used to examine all possible linear combinations of group means, not just pairwise comparisons.
- R-E-G-W F. Ryan-Einot-Gabriel-Welsch multiple stepdown procedure based on an F test.
- R-E-G-W Q. Ryan-Einot-Gabriel-Welsch multiple stepdown procedure based on the Studentized range.
- S-N-K. Makes all pairwise comparisons between means using the Studentized range distribution. With equal sample sizes, it also compares pairs of means within homogeneous subsets, using a stepwise procedure. Means are ordered from highest to lowest, and extreme differences are tested first.
- Tukey. Uses the Studentized range statistic to make all of the pairwise comparisons between groups. Sets the experimentwise error rate at the error rate for the collection for all pairwise comparisons.
- Tukey's b. Uses the Studentized range distribution to make pairwise comparisons between groups. The critical value is the average of the corresponding value for the Tukey's honestly significant difference test and the Student-Newman-Keuls.
- Duncan. Makes pairwise comparisons using a stepwise order of comparisons identical to the order used by the Student-Newman-Keuls test, but sets a protection level for the error rate for the collection of tests, rather than an error rate for individual tests. Uses the Studentized range statistic.
- Hochberg's GT2. Multiple comparison and range test that uses the Studentized maximum modulus. Similar to Tukey's honestly significant difference test.
- Gabriel. Pairwise comparison test that uses the Studentized maximum modulus and is generally more powerful than Hochberg's GT2 when the cell sizes are unequal. Gabriel's test may become liberal when the cell sizes vary greatly.
- Waller-Duncan. Multiple comparison test based on a t statistic; uses a Bayesian approach.
- Dunnett. Pairwise multiple comparison t test that compares a set of treatments against a single control mean. The last category is the default control category. Alternatively, you can choose the first category. 2-sided tests that the mean at any level (except the control category) of the factor is not equal to that of the control category. < Control tests if the mean at any level of the factor is smaller than that of the control category. > Control tests if the mean at any level of the factor is greater than that of the control category.

2. Equal Variances Not Assumed

Multiple comparison tests that do not assume equal variances are Tamhane's T2, Dunnett's T3, Games-Howell, and Dunnett's C.

- Tamhane's T2. Conservative pairwise comparisons test based on a t test. This test is appropriate when the variances are unequal.
- Dunnett's T3. Pairwise comparison test based on the Studentized maximum modulus. This test is appropriate when the variances are unequal.
- Games-Howell. Pairwise comparison test that is sometimes liberal. This test is appropriate when the variances are unequal.
- Dunnett's C. Pairwise comparison test based on the Studentized range. This test is appropriate when the variances are unequal.
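As an illustration of one of the equal-variance tests listed above, the following sketch runs Tukey's honestly significant difference test with `pairwise_tukeyhsd` from statsmodels. The three groups are hypothetical values constructed so that group B clearly differs from A and C; they are not data from this notebook.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical measurements for three groups (illustrative values only)
a = [20, 21, 19, 22, 20]
b = [30, 31, 29, 32, 30]
c = [21, 20, 22, 19, 21]

values = np.array(a + b + c)
labels = np.repeat(["A", "B", "C"], 5)

# All pairwise comparisons with the family-wise error rate held at alpha
tukey = pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05)
print(tukey.summary())

# One boolean per pair (A-B, A-C, B-C): True means the pair differs significantly
print(tukey.reject)
```

The summary lists the mean difference, adjusted p-value and confidence interval for each pair, which is the information an overall ANOVA F test does not provide.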

Examples of situations where post hoc tests are used after ANOVA:


A post-hoc analysis can be conducted for proportions and frequencies, but it is mostly used for testing mean differences. The following types of research involve post-hoc analyses.

A. In any discipline, studies investigating differences between groups will use post-hoc tests when the null hypothesis of an ANOVA model is rejected. 

Here is an example.

A researcher wants to investigate differences in the effectiveness of TikTok, Instagram and Facebook influencers in promoting a nutraceutical brand. Let’s say that, by ANOVA, the null hypothesis (that all three influencer types have similar effectiveness) is rejected. A post-hoc pairwise comparison may then reveal that Instagram influencers have a significantly higher effectiveness in promoting the brand than TikTok and Facebook influencers, while the latter two are similar.

B. In medicine, post-hoc analyses may be used in clinical trials if the original hypothesis does not hold (e.g. the primary outcome being the antidiabetic effect of a drug). Triallists then re-examine the dataset for other outcomes (not originally planned, e.g. improvement in renal outcomes in diabetes patients) and perform statistical analysis to determine other valuable results from the trial. 

Note: For most clinical trials, the research questions and statistical tests must be defined before observing the research outcomes, even before the first patient is enrolled. Primary, secondary and exploratory outcome measures should be established beforehand, while post-hoc outcome measures can be specified after the trial has started. This ‘pre-registration’ avoids the practice of outcome switching (reporting something different from what was originally planned). Pre-specified and post-hoc outcome measures must be clearly indicated in the analysis section, in a way that makes it possible to readily distinguish between them.

C. Analyses of pooled data from completed trials comprise a type of post-hoc study as well.

`Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from
50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python
to determine if there are any significant differences between the mean weight loss of the three diets.
Report the F-statistic and p-value, and interpret the results.`

In [9]:
#H0: there is no difference among group means
#Ha: at least one group mean differs from the others
import numpy as np
import pandas as pd
# 51 observations (17 per diet) are simulated for convenience, since 50 does not split evenly into 3 groups
# the values are random integers with no fixed seed, so they (and the results below) change on every run
df = pd.DataFrame({"diet":np.repeat(["A","B","C"],17),"weight":np.random.randint(40,70,51)})

In [19]:
# Importing libraries
import statsmodels.api as sm
from statsmodels.formula.api import ols
  
# Performing one-way ANOVA
model = ols(
    'weight ~ C(diet)', data=df).fit()
results = sm.stats.anova_lm(model, typ=1)
#results
results

| | df | sum_sq | mean_sq | F | PR(>F) |
|---|---|---|---|---|---|
| C(diet) | 2.0 | 479.568627 | 239.784314 | 3.270333 | 0.046612 |
| Residual | 48.0 | 3519.411765 | 73.321078 | | |


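From the one-way ANOVA we got F = 3.270333 and p-value = 0.046612.


N = 51 , a = 3 , n = 17<br>
dof (between) = a - 1 = 2<br>
dof (within) = N - a = 48<br>
dof (total) = N - 1 = 50<br>
Critical value of F at alpha = 0.05 with (2, 48) degrees of freedom is about 3.19 (the F-table entry for 40 denominator degrees of freedom, 3.2317, is a slightly conservative approximation)<br>
Decision rule: If F > 3.19 then the null hypothesis will be rejected

Inference: Since the F value (3.27) is higher than the critical value, and equivalently the p-value is below 0.05, the null hypothesis is rejected, so at least one diet mean differs from the others. Because the weights were generated randomly, this result only illustrates the procedure.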
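Instead of reading the critical value from a printed F table, it can be computed for the exact degrees of freedom. The sketch below assumes SciPy is available and uses the F statistic and degrees of freedom from the ANOVA table above. A printed table that only has a row for 40 denominator degrees of freedom gives 3.2317, while the exact value for 48 is slightly lower.

```python
from scipy import stats

alpha = 0.05
df_between, df_within = 2, 48
f_observed = 3.270333  # F statistic from the ANOVA table above

# Critical value: the point with probability alpha in the upper tail of F(2, 48)
f_critical = stats.f.ppf(1 - alpha, df_between, df_within)

# p-value: upper-tail probability of the observed F
p_value = stats.f.sf(f_observed, df_between, df_within)

print(f"critical F = {f_critical:.4f}")
print(f"p-value    = {p_value:.6f}")
print("reject H0" if f_observed > f_critical else "fail to reject H0")
```

Comparing F with the critical value and comparing the p-value with alpha always lead to the same decision.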

`Q10. A company wants to know if there are any significant differences in the average time it takes to
complete a task using three different software programs: Program A, Program B, and Program C. They
randomly assign 30 employees to one of the programs and record the time it takes each employee to
complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or
interaction effects between the software programs and employee experience level (novice vs.
experienced). Report the F-statistics and p-values, and interpret the results.`

In [26]:
#H0: there is no main effect of program or experience level, and no interaction between them
#Ha: at least one main effect or the interaction is present
#import libraries:
import numpy as np
import pandas as pd
# Create a dataframe
prog = np.repeat(["Program A","Program B","Program C"], 10)
np.random.shuffle(prog)
exp = np.repeat(['novice', 'experienced'], 15)
np.random.shuffle(exp)
df = pd.DataFrame({'Program': prog,
                          'Exp_lvl':exp,
                          'time': np.random.randint(20,60,30)})
df

| | Program | Exp_lvl | time |
|---|---|---|---|
| 0 | Program C | experienced | 47 |
| 1 | Program A | experienced | 46 |
| 2 | Program B | experienced | 58 |
| 3 | Program B | experienced | 25 |
| 4 | Program C | novice | 25 |
| 5 | Program B | novice | 28 |
| 6 | Program A | novice | 53 |
| 7 | Program A | experienced | 49 |
| 8 | Program C | experienced | 53 |
| 9 | Program B | novice | 32 |


In [27]:
# Importing libraries
import statsmodels.api as sm
from statsmodels.formula.api import ols
  
# Performing two-way ANOVA
model = ols(
    'time ~ C(Exp_lvl) + C(Program) +\
    C(Exp_lvl):C(Program)', data=df).fit()
results = sm.stats.anova_lm(model, typ=2)
#results
results

| | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| C(Exp_lvl) | 93.316438 | 1.0 | 0.59679 | 0.447347 |
| C(Program) | 205.749772 | 2.0 | 0.65792 | 0.52702 |
| C(Exp_lvl):C(Program) | 593.250228 | 2.0 | 1.897018 | 0.171837 |
| Residual | 3752.733333 | 24.0 | | |


For Exp_lvl the critical value of F is 4.2597 (1 and 24 degrees of freedom), and for Program as well as the interaction of experience level with program the critical value of F is 3.4028 (2 and 24 degrees of freedom).<br>
Decision rule: If the F statistic is greater than the corresponding critical value then the null hypothesis is rejected<br>
F values from the test are:
- Exp_lvl = 0.597
- Program = 0.658
- Exp_lvl and Program = 1.897

Inference: Since none of the F statistics is higher than its critical value (all p-values are above 0.05), the test has failed to reject the null hypotheses.
Therefore there is no evidence of a main effect of experience level, a main effect of program, or an interaction between them. This is expected here because the times were generated randomly.

`Q11. An educational researcher is interested in whether a new teaching method improves student test
scores. They randomly assign 100 students to either the control group (traditional teaching method) or the
experimental group (new teaching method) and administer a test at the end of the semester. Conduct a
two-sample t-test using Python to determine if there are any significant differences in test scores
between the two groups. If the results are significant, follow up with a post-hoc test to determine which
group(s) differ significantly from each other.`

In [33]:
#H0: there is no difference between group means
#Ha: there is a significant difference between the group means (two-tailed)
#import libraries:
import numpy as np
import pandas as pd
# Create a dataframe
groups = np.repeat(["Experimental","Control"], 50)
np.random.shuffle(groups)

df = pd.DataFrame({'groups': groups,'score': np.random.randint(50,100,100)})

In [37]:
x = df[df.groups == "Experimental"]["score"]
y = df[df.groups == "Control"]["score"]

In [38]:
#Using pingouin for 2 sample t-test
import pingouin as pg
pg.ttest(x,y)

| | T | dof | alternative | p-val | CI95% | cohen-d | BF10 | power |
|---|---|---|---|---|---|---|---|---|
| T-test | 1.158929 | 98 | two-sided | 0.249301 | [-2.27, 8.63] | 0.231786 | 0.383 | 0.209232 |


Decision rule: if |T| > 1.98 (critical value for 98 degrees of freedom at alpha = 0.05, two-tailed) then the null hypothesis is rejected<br>
Observed T value = 1.159 (p = 0.249)<br>
Inference: Since the observed |T| is less than the critical T, the test has failed to reject the null hypothesis<br>
Therefore there is no evidence of a difference in means between the two groups<br>
Please note: Post hoc tests are relevant only when there are more than 2 groups, i.e. after ANOVA, not after a t-test<br>
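The same decision rule can be reproduced with SciPy. This is a sketch under stated assumptions: the scores are simulated with a fixed seed (so the run is repeatable, unlike the unseeded data above), and `ttest_ind` is used with its default pooled-variance setting. The critical value is computed for the exact degrees of freedom rather than read from a table.

```python
import numpy as np
from scipy import stats

# Simulated scores with a fixed seed (assumption: 50 students per group)
rng = np.random.default_rng(0)
experimental = rng.integers(50, 100, size=50)
control = rng.integers(50, 100, size=50)

# Two-sample t-test with pooled variance (equal_var=True is the default)
t_stat, p_value = stats.ttest_ind(experimental, control)

# Two-tailed critical value at alpha = 0.05 with n1 + n2 - 2 degrees of freedom
dof = len(experimental) + len(control) - 2
t_critical = stats.t.ppf(0.975, df=dof)

reject = abs(t_stat) > t_critical
print(f"T = {t_stat:.3f}, p = {p_value:.3f}, critical T = {t_critical:.4f}")
print("reject H0" if reject else "fail to reject H0")
```

If the group variances look clearly different, passing `equal_var=False` switches to Welch's t-test.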

`Q12. A researcher wants to know if there are any significant differences in the average daily sales of three
retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store
on those days. Conduct a repeated measures ANOVA using Python to determine if there are any significant differences in sales between the three stores. If the results are significant, follow up with a post-hoc test to determine which store(s) differ significantly from each other.`

Step 1: Hypotheses and dataset creation

In [72]:
#H0: there is no difference among group means
#Ha: at least one group mean differs from the others
#import libraries:
import numpy as np
import pandas as pd
# Create a dataframe
days =[i for i in np.arange(1,31)] * 3
sales = [i for i in np.random.randint(200,300,30)] + [i for i in np.random.randint(300,400,30)]+[i for i in np.random.randint(150,250,30)]
df = pd.DataFrame({'store': np.repeat(["A","B","C"],30),
                          'days':days,
                          'sales': sales})
df

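| | store | days | sales |
|---|---|---|---|
| 0 | A | 1 | 210 |
| 1 | A | 2 | 230 |
| 2 | A | 3 | 238 |
| 3 | A | 4 | 295 |
| 4 | A | 5 | 221 |
| ... | ... | ... | ... |
| 85 | C | 26 | 209 |
| 86 | C | 27 | 219 |
| 87 | C | 28 | 195 |
| 88 | C | 29 | 164 |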


Step 2: Perform the repeated measures ANOVA.

Next, we will perform the repeated measures ANOVA using the rm_anova() function from the pingouin library:

In [75]:
import pingouin as pg

aov = pg.rm_anova(dv='sales', within='days',
                  subject='store', data=df)

In [76]:
#results
aov

| | Source | ddof1 | ddof2 | F | p-unc | ng2 | eps |
|---|---|---|---|---|---|---|---|
| 0 | days | 29 | 58 | 1.152104 | 0.31678 | 0.069923 | 0.068007 |


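Decision rule: reject the null hypothesis if the p-value is below 0.05 <br>
Observed values: F = 1.152 , p = 0.317<br>
Inference: since p > 0.05 the test has failed to reject the null hypothesis<br>
Note that with within='days' and subject='store', this call treats the 30 days as the repeated-measures factor and the 3 stores as the subjects (hence the 29 and 58 degrees of freedom), so it tests for differences among days. To compare the three stores, the roles must be swapped: within='store' and subject='days'<br>
Therefore this result shows no evidence of a difference in mean sales among days; it does not yet answer the question about the stores<br>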

In [78]:
import warnings
warnings.filterwarnings('ignore')

In [79]:
# If the test had rejected the null hypothesis, the post hoc test for repeated measures would be
# pairwise paired t-tests, performed as shown below (here with the same within/subject roles as above)
pg.pairwise_tests(dv='sales', within='days',
                  subject='store', data=df)

| | Contrast | A | B | Paired | Parametric | T | dof | alternative | p-unc | BF10 | hedges |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | days | 1 | 2 | True | True | 0.449798 | 2.0 | two-sided | 0.696906 | 0.509 | 0.100401 |
| 1 | days | 1 | 3 | True | True | -3.052632 | 2.0 | two-sided | 0.092642 | 1.751 | -0.251554 |
| 2 | days | 1 | 4 | True | True | -2.732056 | 2.0 | two-sided | 0.111926 | 1.554 | -0.668705 |
| 3 | days | 1 | 5 | True | True | -1.606034 | 2.0 | two-sided | 0.249496 | 0.923 | -0.131264 |
| 4 | days | 1 | 6 | True | True | -0.333282 | 2.0 | two-sided | 0.770618 | 0.49 | -0.103498 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 430 | days | 27 | 29 | True | True | 4.708556 | 2.0 | two-sided | 0.042266 | 2.826 | 0.383029 |
| 431 | days | 27 | 30 | True | True | 0.399893 | 2.0 | two-sided | 0.727902 | 0.5 | 0.047750 |
| 432 | days | 28 | 29 | True | True | 2.497999 | 2.0 | two-sided | 0.129781 | 1.413 | 0.252976 |
| 433 | days | 28 | 30 | True | True | -0.496816 | 2.0 | two-sided | 0.668555 | 0.517 | -0.122171 |


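Since the repeated measures ANOVA failed to reject the null hypothesis, the post hoc comparisons are shown only for illustration. As specified, they compare pairs of days (each with only 2 degrees of freedom, because the 3 stores act as subjects) and the p-values are uncorrected for multiple comparisons. To find which stores differ, the call should use within='store' and subject='days', together with a correction such as padjust='bonf'.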
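Because the question asks about differences between stores, the store should be the within factor and each day should play the role of the subject. The sketch below shows that specification with `AnovaRM` from statsmodels. The sales are simulated with a fixed seed using the same ranges as in Step 1, so the numbers are illustrative and will not match the tables above.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Simulated sales (assumption: same value ranges as Step 1, fixed seed for repeatability)
rng = np.random.default_rng(0)
sales = np.concatenate([
    rng.integers(200, 300, 30),  # store A
    rng.integers(300, 400, 30),  # store B
    rng.integers(150, 250, 30),  # store C
])
df_sales = pd.DataFrame({
    "store": np.repeat(["A", "B", "C"], 30),
    "days": np.tile(np.arange(1, 31), 3),
    "sales": sales,
})

# Each day is measured under all three stores, so 'days' is the subject
# and 'store' is the repeated (within) factor
result = AnovaRM(data=df_sales, depvar="sales",
                 subject="days", within=["store"]).fit()
rm_table = result.anova_table
print(rm_table)
```

With 3 stores and 30 days the store effect has 2 and 58 degrees of freedom. A significant result would then be followed by pairwise paired comparisons between stores.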