### Sums of Squares (Definition Formulas)

$\bullet$  SUMS OF SQUARES (TWO FACTOR)

$SS_{total}=SS_{column}+SS_{row}+SS_{interaction}+SS_{within}$

$SS_{total}=SS_{between}+SS_{within}$

$SS_{total}=\Sigma(X-\overline{X}_{grand})^2$

$SS_{between}=SS_{column}+SS_{row}+SS_{interaction}$

$SS_{interaction}=SS_{between}-(SS_{column}+SS_{row})$

### $\bullet$ WORD, DEFINITION, AND COMPUTATION FORMULAS FOR SS TERMS (TWO-FACTOR ANOVA)
#### $\bullet$ $\bullet$ For the total sums of squares,

$SS_{total}=$ the sum of squared deviations for raw scores about the grand mean

$=\Sigma(X-\overline{X}_{grand})^2$

$SS_{total}=\Sigma{X}^2-\dfrac{G^2}{N}$, where $G$ is the grand total and $N$ is the total number of scores

#### $\bullet$ $\bullet$ For the between-cells sum of squares,

$SS_{between}=$ the sum of squared deviations for cell means about the grand mean

$=n\Sigma(\overline{X}_{cell}-\overline{X}_{grand})^2$

$SS_{between}=\Sigma{\dfrac{T^2_{cell}}{n}}-\dfrac{G^2}{N}$, where $T_{cell}$ is the cell total and $n$ is the sample size of each cell

#### $\bullet$ $\bullet$ For the within-cells sum of squares,

$SS_{within}=$ the sum of squared deviations of raw scores about their respective cell means

$=\Sigma(X-\overline{X}_{cell})^2$

$SS_{within}=\Sigma{X}^2-\Sigma{\dfrac{T^2_{cell}}{n}}$, where $T_{cell}$ is the cell total and $n$ is the sample size of each cell

#### $\bullet$ $\bullet$ For the between-columns sum of squares,

$SS_{column}=$ the sum of squared deviations of column means about the grand mean

$=rn\Sigma(\overline{X}_{column}-\overline{X}_{grand})^2$

$SS_{column}=\Sigma{\dfrac{T^2_{column}}{rn}}-\dfrac{G^2}{N}$, where $T_{column}$ is the column total, $r$ is the number of rows, and $rn$ is the sample size of each column

#### $\bullet$ $\bullet$ For the between-rows sum of squares,

$SS_{row}=$ the sum of squared deviations of row means about the grand mean

$=cn\Sigma(\overline{X}_{row}-\overline{X}_{grand})^2$

$SS_{row}=\Sigma{\dfrac{T^2_{row}}{cn}}-\dfrac{G^2}{N}$, where $T_{row}$ is the row total, $c$ is the number of columns, and $cn$ is the sample size of each row

#### $\bullet$ $\bullet$ For the interaction sum of squares,

$SS_{interaction}=SS_{between}-(SS_{column}+SS_{row})$
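As a rough check that the definition and computation formulas agree, the sketch below evaluates both on the crowd-size scores tabulated later in this section. The dictionary layout, the helper functions `mean` and `pooled`, and all variable names are illustrative assumptions rather than part of the original notebook, and equal cell sizes are assumed.

```python
# Scores keyed by (row level, column level); this layout is an assumption for illustration.
cells = {
    ("dangerous", 0): [8, 8],
    ("dangerous", 2): [8, 6],
    ("dangerous", 4): [10, 8],
    ("nondangerous", 0): [9, 11],
    ("nondangerous", 2): [15, 19],
    ("nondangerous", 4): [24, 18],
}

def mean(values):
    return sum(values) / len(values)

def pooled(position, level):
    """All scores whose cell key has `level` at `position` (0 = row, 1 = column)."""
    return [x for key, cell in cells.items() if key[position] == level for x in cell]

scores = [x for cell in cells.values() for x in cell]
row_levels = sorted({key[0] for key in cells})
col_levels = sorted({key[1] for key in cells})
r, c = len(row_levels), len(col_levels)
N, G = len(scores), sum(scores)
n = N // (r * c)                      # scores per cell (equal cell sizes assumed)
grand_mean = G / N

# Definition formulas: squared deviations about the relevant mean.
ss_total = sum((x - grand_mean) ** 2 for x in scores)
ss_between = n * sum((mean(cell) - grand_mean) ** 2 for cell in cells.values())
ss_within = sum((x - mean(cell)) ** 2 for cell in cells.values() for x in cell)
ss_column = r * n * sum((mean(pooled(1, col)) - grand_mean) ** 2 for col in col_levels)
ss_row = c * n * sum((mean(pooled(0, row)) - grand_mean) ** 2 for row in row_levels)
ss_interaction = ss_between - (ss_column + ss_row)

# Computation formulas for two of the terms, for comparison.
ss_total_comp = sum(x ** 2 for x in scores) - G ** 2 / N
ss_between_comp = sum(sum(cell) ** 2 / n for cell in cells.values()) - G ** 2 / N

print(ss_total, ss_between, ss_within, ss_column, ss_row, ss_interaction)
print(ss_total_comp, ss_between_comp)
```

Both routes should give the same sums of squares, and the four components should add up to $SS_{total}$.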

#### $\bullet$ FORMULAS FOR df TERMS: TWO-FACTOR ANOVA

$df_{total}=N-1$, that is, the number of all scores$-$1

$df_{column}=c-1$, that is, the number of columns$-$1

$df_{row}=r-1$, that is, the number of rows$-$1

$df_{interaction}=(c-1)(r-1)$, that is, the product of $df_{row}$ and $df_{column}$

$df_{within}=N-(c)(r)$, that is, the number of all scores$-$the number of cells
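The df formulas above can be sketched as a small helper. The function name `two_factor_df` and the example design (3 columns, 2 rows, 12 scores, matching the crowd-size example later in this section) are assumptions chosen for illustration.

```python
def two_factor_df(c, r, N):
    """Degrees of freedom for a two-factor ANOVA with c columns, r rows and N scores."""
    return {
        "total": N - 1,
        "column": c - 1,
        "row": r - 1,
        "interaction": (c - 1) * (r - 1),
        "within": N - c * r,
    }

df = two_factor_df(c=3, r=2, N=12)
# The component df terms should add up to df_total.
parts = df["column"] + df["row"] + df["interaction"] + df["within"]
print(df)
print(parts == df["total"])
```

The additivity check mirrors the partition of $SS_{total}$ into column, row, interaction, and within components.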

#### PROPORTION OF EXPLAINED VARIANCE (TWO-FACTOR ANOVA)

$\eta^2_p(column)=\dfrac{SS_{column}}{SS_{total}-(SS_{row}+SS_{interaction})}=\dfrac{SS_{column}}{SS_{column}+SS_{within}}$

$\eta^2_p(row)=\dfrac{SS_{row}}{SS_{row}+SS_{within}}$

$\eta^2_p(interaction)=\dfrac{SS_{interaction}}{SS_{interaction}+SS_{within}}$
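A minimal sketch of these ratios, using the sums of squares that the crowd-size example later in this section reports ($SS_{column}=72$, $SS_{row}=192$, $SS_{interaction}=56$, $SS_{within}=32$). The helper name `partial_eta_squared` is an assumption for illustration.

```python
def partial_eta_squared(ss_effect, ss_within):
    """Proportion of explained variance for one effect, ignoring the other effects."""
    return ss_effect / (ss_effect + ss_within)

ss = {"column": 72.0, "row": 192.0, "interaction": 56.0, "within": 32.0}
ss_total = sum(ss.values())

eta = {
    effect: round(partial_eta_squared(ss[effect], ss["within"]), 3)
    for effect in ("column", "row", "interaction")
}
print(eta)

# The two forms of the column formula should agree:
# SS_total - (SS_row + SS_interaction) equals SS_column + SS_within.
alt_column = ss["column"] / (ss_total - (ss["row"] + ss["interaction"]))
print(round(alt_column, 3))
```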

#### TUKEY’S HSD TEST (TWO-FACTOR ANOVA)

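$HSD=q\sqrt{\dfrac{MS_{within}}{n}}$, where $q$ is the studentized range statistic and $n$ is the number of scores behind each mean being compared ($rn$ for column means, $cn$ for row means)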

#### $F_{se}$ RATIO (SIMPLE EFFECT)

$F_{se}=\dfrac{MS_{se}}{MS_{within}}$

#### SUM OF SQUARES (SIMPLE EFFECT)

$SS_{se}=\Sigma\dfrac{T^2_{se}}{n}-\dfrac{G^2_{se}}{N_{se}}$

### Crowd Size and Degree of Danger

\begin{array}{lccccccc}
\text{DEGREE OF DANGER} & \text{ZERO} & \text{(cell total)} & \text{TWO} & \text{(cell total)} & \text{FOUR} & \text{(cell total)} & \text{Row Totals} \\
\hline
\text{Dangerous} & 8 &  & 8 &  & 10 &  &  \\
 & 8 & 16 & 6 & 14 & 8 & 18 & 48 \\
\text{Non-dangerous} & 9 &  & 15 &  & 24 &  &  \\
 & 11 & 20 & 19 & 34 & 18 & 42 & 96 \\
\end{array}

In [1]:
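def two_factor_ANOVA(the_data,alpha,r):
    # the_data: one list of scores per column, ordered row 1 first, then row 2.
    # r: number of rows. alpha is accepted but not used: no critical F is looked up here.
    import statistics,numpy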

    joined = numpy.concatenate(the_data)
    print(f'joined = {joined}')

    rn = len(the_data[0])
    print(f'rn = {rn}')

    N = len(joined)
    print(f'N = {N}')

    G = sum(joined)
    print(f'G = {G}')

    c = len(the_data)
    print(f'c = {c}')
    
    # Scores per row = number of columns x scores per cell; the 2 hard-codes n = 2 scores per cell.
    cn = len(the_data)*2
    print(f'cn = {cn}')
    
    X_sq = [round(num**2,3) for num in joined]
    print(f'X_sq = {X_sq}')

    sum_X_sq = sum(X_sq)
    print(f'sum_X_sq = {sum_X_sq}')

    G_sq_over_N = G**2/N
    print(f'G_sq_over_N = {G_sq_over_N}')
    
    SS_total = round(sum_X_sq - G_sq_over_N,3)
    print(f'SS_total = {SS_total}')
    
    column_totals = [sum(column) for column in the_data]  # not used below; kept for inspection

    len_cell = len(the_data[0])
    midpoint = int(len_cell/2)

    # Split each column into its two cells: first half = row 1, second half = row 2 (assumes r = 2).
    cell = [[column[:midpoint],column[midpoint:len_cell]] for column in the_data]

    # Reorder so that all row-1 cells come first, then all row-2 cells.
    # NOTE: range(midpoint) doubles as the row index, which is only correct for n = 2 scores per cell.
    cell_items = [pair[i] for i in range(midpoint) for pair in cell]
    print(f'cell_items = {cell_items}')

    n = len(cell_items[0])
    print(f'n = {n}')

    sum_cell_items = [sum(item) for item in cell_items]
    print(f'sum_cell_items = {sum_cell_items}')

    T_sq_cell_over_n = round(sum([num**2/n for num in sum_cell_items]),3)
    print(f'T_sq_cell_over_n = {T_sq_cell_over_n}')

    SS_between = round(T_sq_cell_over_n - G_sq_over_N,3)
    print(f'SS_between = {SS_between}')

    SS_within = round(sum_X_sq - T_sq_cell_over_n,3)
    print(f'SS_within = {SS_within}')

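    # First half of sum_cell_items holds the row-1 cell totals, second half the row-2 cell totals,
    # e.g. [16, 14, 18, 20, 34, 42] -> [16, 14, 18] and [20, 34, 42].
    midway = int(len(sum_cell_items)/2)
    list1 = sum_cell_items[:midway]
    list2 = sum_cell_items[midway:]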
    sum_T_sq_column = round(sum([((list1[i]+list2[i])**2)/rn for i in range(len(list1))]),3)
    print(f'sum_T_sq_column = {sum_T_sq_column}')

    T_sq_row_cn = round((sum(list1)**2)/cn + (sum(list2)**2)/cn,3)
    print(f'T_sq_row_cn = {T_sq_row_cn}')

    SS_column = round(sum_T_sq_column - G_sq_over_N,3)
    print(f'SS_column = {SS_column}')

    SS_row = round(T_sq_row_cn - G_sq_over_N,3)
    print(f'SS_row = {SS_row}')

    SS_interaction = round(SS_between - (SS_column + SS_row),3)
    print(f'SS_interaction = {SS_interaction}')

    df_total = N - 1
    df_column = c - 1
    df_row = r - 1
    df_interaction = (c-1)*(df_row)
    df_within = N - (c*r)
    print(f'df_total = {df_total}, df_column = {df_column}, df_row = {df_row}, df_interaction = {df_interaction}, df_within = {df_within}')

    MS_column = round(SS_column/df_column,3)
    MS_row = round(SS_row/df_row,3)
    MS_interaction = round(SS_interaction/df_interaction,3)
    MS_within = round(SS_within/df_within,3)
    print(f'MS_column = {MS_column}, MS_row = {MS_row}, MS_interaction = {MS_interaction}, MS_within = {MS_within}')
    
    F_column = round(MS_column/MS_within,3)
    F_row = round(MS_row/MS_within,3)
    F_interaction = round(MS_interaction/MS_within,3)
    print(f'F_column = {F_column}, F_row = {F_row}, F_interaction = {F_interaction}')

    eta_sq_column = round(SS_column/(SS_column+SS_within),3)
    eta_sq_row = round(SS_row/(SS_row+SS_within),3)
    eta_sq_interaction = round(SS_interaction/(SS_interaction+SS_within),3)
    print(f'eta_sq_column = {eta_sq_column}, eta_sq_row = {eta_sq_row}, eta_sq_interaction = {eta_sq_interaction}')

    means = [round(statistics.mean(column),3) for column in the_data]
    print(f'means = {means}')

    # Differences between the largest column mean and each column mean (not every pairwise difference).
    means.sort(reverse=True)
    differences_bet_means = [round(means[0]-m,3) for m in means]
    print(f'differences_bet_means = {differences_bet_means}')

In [2]:
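# One list per crowd size: the two "dangerous" scores, then the two "non-dangerous" scores.
X_0 = [8,8,9,11]
X_2 = [8,6,15,19]
X_4 = [10,8,24,18]
crowd = [X_0,X_2,X_4]  # avoid naming this `list`, which would shadow the built-in
row = 2
sig = 0.05
two_factor_ANOVA(crowd,sig,row)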

joined = [ 8  8  9 11  8  6 15 19 10  8 24 18]
rn = 4
N = 12
G = 144
c = 3
cn = 6
X_sq = [64, 64, 81, 121, 64, 36, 225, 361, 100, 64, 576, 324]
sum_X_sq = 2080
G_sq_over_N = 1728.0
SS_total = 352.0
cell_items = [[8, 8], [8, 6], [10, 8], [9, 11], [15, 19], [24, 18]]
n = 2
sum_cell_items = [16, 14, 18, 20, 34, 42]
T_sq_cell_over_n = 2048.0
SS_between = 320.0
SS_within = 32.0
sum_T_sq_column = 1800.0
T_sq_row_cn = 1920.0
SS_column = 72.0
SS_row = 192.0
SS_interaction = 56.0
df_total = 11, df_column = 2, df_row = 1, df_interaction = 2, df_within = 6
MS_column = 36.0, MS_row = 192.0, MS_interaction = 28.0, MS_within = 5.333
F_column = 6.75, F_row = 36.002, F_interaction = 5.25
eta_sq_column = 0.692, eta_sq_row = 0.857, eta_sq_interaction = 0.636
means = [9, 12, 15]
differences_bet_means = [0, 3, 6]
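The simple-effect formulas ($SS_{se}$ and $F_{se}$) can be applied to the cell totals printed above to look at crowd size separately within each degree of danger. This is a minimal sketch: the helper name `simple_effect` is hypothetical, and taking $df_{se}$ as the number of groups minus one is an assumption that the formulas in this section do not state explicitly.

```python
def simple_effect(cell_totals, n, ms_within):
    """SS and F for one simple effect, given the cell totals along one row (or column)."""
    G_se = sum(cell_totals)                 # grand total for this simple effect
    N_se = n * len(cell_totals)             # number of scores in this simple effect
    ss_se = sum(t ** 2 / n for t in cell_totals) - G_se ** 2 / N_se
    df_se = len(cell_totals) - 1            # assumption: groups minus one
    ms_se = ss_se / df_se
    return ss_se, ms_se / ms_within

ms_within = 32.0 / 6                        # SS_within / df_within from the output above

ss_dangerous, f_dangerous = simple_effect([16, 14, 18], n=2, ms_within=ms_within)
ss_nondangerous, f_nondangerous = simple_effect([20, 34, 42], n=2, ms_within=ms_within)

print(f"dangerous:     SS_se = {ss_dangerous:.3f}, F_se = {f_dangerous:.3f}")
print(f"non-dangerous: SS_se = {ss_nondangerous:.3f}, F_se = {f_nondangerous:.3f}")
```

Under these assumptions, crowd size has a much larger simple effect in the non-dangerous condition than in the dangerous one, which is consistent with the interaction reported above.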


#### Progress Check *18.1

A college dietitian wishes to determine whether students prefer a particular pizza topping (either plain, vegetarian, salami, or everything) and one type of crust (either thick or thin). A total of 160 volunteers are randomly assigned to one of the eight cells in this two-factor experiment. After eating their assigned pizza, the 20 subjects in each cell rate their preference on a scale ranging from 0 (inedible) to 10 (the best). The results, in the form of means for cells, rows, and columns, are as follows:

MEAN PREFERENCE SCORES FOR PIZZA AS A FUNCTION OF TOPPING AND CRUST

\begin{array}{lccccc}
\text{CRUST} & \text{PLAIN} & \text{VEGETARIAN} & \text{SALAMI} & \text{EVERYTHING} & \text{ROW} \\
\hline
\text{Thick} & 7.2 & 5.7 & 4.8 & 6.1 & 6.0 \\
\text{Thin} & 8.9 & 4.8 & 8.4 & 1.3 & 5.9 \\
\text{Column} & 8.1 & 5.3 & 6.6 & 3.7 &  \\
\end{array}

Construct graphs for each of the three possible effects, and use this information to make preliminary interpretations about pizza preferences. Ordinarily, of course, you would verify these speculations by performing an ANOVA—a task that cannot be performed for these data, since only means are supplied.
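Before drawing the three graphs, it can help to tabulate the values each one would display. The sketch below recomputes the row and column means from the cell means (a simple average is appropriate because every cell has 20 subjects) and the thick-minus-thin gap for each topping. All variable names are assumptions for illustration.

```python
toppings = ["plain", "vegetarian", "salami", "everything"]
cell_means = {
    "thick": [7.2, 5.7, 4.8, 6.1],
    "thin": [8.9, 4.8, 8.4, 1.3],
}

# Main effect of crust: one point per row mean.
crust_means = {crust: sum(values) / len(values) for crust, values in cell_means.items()}

# Main effect of topping: one point per column mean.
topping_means = {
    topping: (cell_means["thick"][i] + cell_means["thin"][i]) / 2
    for i, topping in enumerate(toppings)
}

# Interaction: if the thick and thin lines were parallel, this gap would be constant.
crust_gap = {
    topping: cell_means["thick"][i] - cell_means["thin"][i]
    for i, topping in enumerate(toppings)
}

print({k: round(v, 2) for k, v in crust_means.items()})
print({k: round(v, 2) for k, v in topping_means.items()})
print({k: round(v, 2) for k, v in crust_gap.items()})
```

In this sketch the two crust means are nearly equal, the topping means differ noticeably, and the crust gap changes sign from one topping to the next. A graph of these values would therefore suggest little or no main effect for crust, a possible main effect for topping, and a possible interaction, all of which remain speculation without an ANOVA.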

In [3]:
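# Cell means from the table above, one list per topping: [thick, thin].
# Only means are supplied, so each cell holds a single value and df_within = 0;
# the output below is therefore not a valid ANOVA.
plain = [7.2,8.9]
vegetarian = [5.7,4.8]
salami = [4.8,8.4]
everything = [6.1,1.3]
pizza = [plain,vegetarian,salami,everything]  # avoid naming this `list`, which would shadow the built-in
sig = 0.05
row = 2
two_factor_ANOVA(pizza,sig,row)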

joined = [7.2 8.9 5.7 4.8 4.8 8.4 6.1 1.3]
rn = 2
N = 8
G = 47.2
c = 4
cn = 8
X_sq = [51.84, 79.21, 32.49, 23.04, 23.04, 70.56, 37.21, 1.69]
sum_X_sq = 319.08
G_sq_over_N = 278.48
SS_total = 40.6
cell_items = [[7.2], [5.7], [4.8], [6.1]]
n = 1
sum_cell_items = [7.2, 5.7, 4.8, 6.1]
T_sq_cell_over_n = 144.58
SS_between = -133.9
SS_within = 174.5
sum_T_sq_column = 141.62
T_sq_row_cn = 35.652
SS_column = -136.86
SS_row = -242.828
SS_interaction = 245.788
df_total = 7, df_column = 3, df_row = 1, df_interaction = 3, df_within = 0
MS_column = -45.62, MS_row = -242.828, MS_interaction = 81.929, MS_within = inf
F_column = -0.0, F_row = -0.0, F_interaction = 0.0
eta_sq_column = -3.636, eta_sq_row = 3.554, eta_sq_interaction = 0.585
means = [8.05, 5.25, 6.6, 3.7]
differences_bet_means = [0.0, 1.45, 2.8, 4.35]




#### Progress Check *18.3

A school psychologist wishes to determine the effect of TV violence on disruptive behavior of first graders in the classroom. Two first graders are randomly assigned to each of the various combinations of the two factors: the type of violent TV program (either cartoon or real life) and the amount of viewing time (either 0, 1, 2, or 3 hours). The subjects are then observed in a controlled classroom setting and assigned a score, reflecting the total number of disruptive class behaviors displayed during the test period.

AGGRESSION SCORES OF FIRST GRADERS BY TYPE OF PROGRAM AND VIEWING TIME (HOURS)

\begin{array}{lcccc}
\text{TYPE OF PROGRAM} & \text{0} & \text{1} & \text{2} & \text{3} \\
\hline
\text{Cartoon} & 0, 1 & 1, 0 & 3, 5 & 6, 9 \\
\text{Real life} & 0, 0 & 1, 1 & 6, 2 & 6, 10 \\
\end{array}

(a) Test the various null hypotheses at the .05 level of significance.

(b) Summarize the results with an ANOVA table. Save the ANOVA summary table for use in subsequent questions.

#### Progress Check *18.4

Referring to the ANOVA summary table in your answer to Question 18.3, estimate the effect size for any significant $F$ with $\eta^2_{p}$.

In [4]:
X0 = [0,1,0,0]
X1 = [1,0,1,1]
X2 = [3,5,6,2]
X3 = [6,9,6,10]
tv = [X0,X1,X2,X3]
sig = 0.05
row = 2
two_factor_ANOVA(tv,sig,row)

joined = [ 0  1  0  0  1  0  1  1  3  5  6  2  6  9  6 10]
rn = 4
N = 16
G = 51
c = 4
cn = 8
X_sq = [0, 1, 0, 0, 1, 0, 1, 1, 9, 25, 36, 4, 36, 81, 36, 100]
sum_X_sq = 331
G_sq_over_N = 162.5625
SS_total = 168.438
cell_items = [[0, 1], [1, 0], [3, 5], [6, 9], [0, 0], [1, 1], [6, 2], [6, 10]]
n = 2
sum_cell_items = [1, 1, 8, 15, 0, 2, 8, 16]
T_sq_cell_over_n = 307.5
SS_between = 144.938
SS_within = 23.5
sum_T_sq_column = 306.75
T_sq_row_cn = 162.625
SS_column = 144.188
SS_row = 0.062
SS_interaction = 0.688
df_total = 15, df_column = 3, df_row = 1, df_interaction = 3, df_within = 8
MS_column = 48.063, MS_row = 0.062, MS_interaction = 0.229, MS_within = 2.938
F_column = 16.359, F_row = 0.021, F_interaction = 0.078
eta_sq_column = 0.86, eta_sq_row = 0.003, eta_sq_interaction = 0.028
means = [0.25, 0.75, 4, 7.75]
differences_bet_means = [0.0, 3.75, 7.0, 7.5]
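The function above prints the $F$ ratios but does not compare them with critical values. As a sketch of parts (a) and (b), the block below assembles an ANOVA summary table from the sums of squares and df printed above and adds a decision for each test. The critical values (4.07 for df = 3, 8 and 5.32 for df = 1, 8 at the .05 level) are read from a standard $F$ table rather than computed, and the table layout and names are assumptions for illustration.

```python
# (source, SS, df) taken from the printed output above.
effects = [
    ("Column (viewing time)", 144.188, 3),
    ("Row (type of program)", 0.062, 1),
    ("Interaction", 0.688, 3),
]
ss_within, df_within = 23.5, 8
ss_total, df_total = 168.438, 15
ms_within = ss_within / df_within

# Critical F at the .05 level, from a standard F table (assumed, not computed here).
critical_f = {(3, 8): 4.07, (1, 8): 5.32}

decisions = {}
print(f"{'Source':<24}{'SS':>9}{'df':>4}{'MS':>9}{'F':>8}  Decision")
for source, ss, df in effects:
    ms = ss / df
    f_ratio = ms / ms_within
    decision = "reject H0" if f_ratio >= critical_f[(df, df_within)] else "retain H0"
    decisions[source] = decision
    print(f"{source:<24}{ss:>9.3f}{df:>4}{ms:>9.3f}{f_ratio:>8.3f}  {decision}")
print(f"{'Within':<24}{ss_within:>9.3f}{df_within:>4}{ms_within:>9.3f}")
print(f"{'Total':<24}{ss_total:>9.3f}{df_total:>4}")
```

With these values only the viewing-time effect exceeds its critical $F$, so it is the only effect for which $\eta^2_p$ (0.86 in the output above) needs to be reported in Progress Check 18.4.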


#### Progress Check *18.5

In Question 18.3, the $F$ for the interaction isn’t significant, but the $F$ for one of the main effects, Viewing Time, is significant. Using the .05 level, calculate the critical value for Tukey’s HSD; evaluate the significance of each possible mean difference for Viewing Time; and interpret the results.

In [5]:
def HSD_two_factor_anova(MS_within,df_within,n,k,alpha):
    import statsmodels.stats.libqsturng as qsturng
    q = round(qsturng.qsturng(1 - alpha, k, df_within),3)
    print(f'q = {q}')
    Tukeys_HSD_two_factor_anova = round(q*(MS_within/n)**0.5,3)
    print(f'Tukeys_HSD_two_factor_anova = {Tukeys_HSD_two_factor_anova}')

MS_within = 2.938
df_within = 8
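n = 4 # scores behind each mean being compared: rn for column means, cn for row means (here rn = 4)
k = 4 # number of means being compared (the four viewing times)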
alpha = 0.05
HSD_two_factor_anova(MS_within,df_within,n,k,alpha)

q = 4.529
Tukeys_HSD_two_factor_anova = 3.881
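To finish the exercise, each pairwise difference between viewing-time means can be compared with the HSD value just computed. The sketch below uses the column means printed for Progress Check 18.3 (0.25, 0.75, 4, 7.75) and treats a difference as significant when it equals or exceeds HSD. The variable names are assumptions for illustration.

```python
from itertools import combinations

hsd = 3.881                                   # critical value printed above
viewing_time_means = {0: 0.25, 1: 0.75, 2: 4.0, 3: 7.75}   # hours -> mean score

significant = []
for (hours_a, mean_a), (hours_b, mean_b) in combinations(viewing_time_means.items(), 2):
    difference = abs(mean_a - mean_b)
    is_significant = difference >= hsd
    if is_significant:
        significant.append((hours_a, hours_b))
    label = "significant" if is_significant else "not significant"
    print(f"{hours_a} vs {hours_b} hours: difference = {difference:.2f} ({label})")

print(significant)
```

Under this criterion only the comparisons involving 3 hours of viewing against 0 or 1 hour exceed HSD. The 3.75-point differences (0 vs 2 hours and 2 vs 3 hours) fall just short of the critical value.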
