### Select a suitable dataset for the following tests of significance (hypothesis testing), and interpret the output parameters and their significance with respect to the selected dataset and hypothesis.

##### ● One Way ANOVA

##### ● Two Way ANOVA

##### Task to be performed using two different methods.
##### 1. Use the built-in function aov().
##### 2. Design your own functions to perform the above-stated task.

In [1]:
#Importing the datasets package
library(datasets)

In [2]:
#Getting the top few rows of the mtcars dataset
head(mtcars)

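| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.46 | 20.22 | 1 | 0 | 3 | 1 |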


In [3]:
#Printing the structure of data
str(mtcars)

'data.frame':	32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...


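Consider the columns 'mpg', 'vs' and 'am', where 'mpg' (miles per gallon) is the dependent variable and 'vs' (engine shape: 0 = V-shaped, 1 = straight) and 'am' (transmission: 0 = automatic, 1 = manual) are the independent variables.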
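Before running any test, it can help to state what is being compared. For each factor the null hypothesis is that the mean 'mpg' is the same at every level of that factor, and the alternative is that at least one level mean differs. The following is a minimal descriptive sketch, not part of the original notebook; the names `vs_f`, `am_f`, `means_vs` and `means_am` are introduced here purely for illustration.

```r
library(datasets)

# Treat the 0/1 codes as categorical factors with readable labels
# (label meanings taken from the mtcars help page)
vs_f <- factor(mtcars$vs, levels = c(0, 1), labels = c("V-shaped", "straight"))
am_f <- factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Mean mpg at each level of each factor: these are the quantities
# whose equality the ANOVA null hypotheses assert
means_vs <- tapply(mtcars$mpg, vs_f, mean)
means_am <- tapply(mtcars$mpg, am_f, mean)

print(means_vs)
print(means_am)
```

If the level means printed here are far apart relative to the spread within each level, the F-tests in the following sections would be expected to reject the null hypotheses.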

### One-way ANOVA testing


In [4]:
#Using aov function on columns: mpg and vs
summary(aov(formula = mpg~vs,data = mtcars))

            Df Sum Sq Mean Sq F value   Pr(>F)    
vs           1  496.5   496.5   23.66 3.42e-05 ***
Residuals   30  629.5    21.0                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
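To interpret this table, compare the p-value in the `Pr(>F)` column with a chosen significance level. The sketch below pulls the F value and p-value out of the summary object and applies a 5% level; the variable names and the choice of `alpha <- 0.05` are assumptions made for illustration.

```r
library(datasets)

# Fit the same one-way model as in the cell above
fit_one_way <- aov(mpg ~ vs, data = mtcars)

# summary() of an aov fit is a list whose first element is the ANOVA table
anova_table <- summary(fit_one_way)[[1]]
f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]

# Decision rule at an assumed 5% significance level
alpha <- 0.05
reject_null <- p_value < alpha

cat("F =", f_value, " p =", p_value, "\n")
if (reject_null) {
  cat("Reject H0: mean mpg differs between the levels of vs.\n")
} else {
  cat("Fail to reject H0: no evidence that mean mpg differs by vs.\n")
}
```

With the p-value of 3.42e-05 reported above, the null hypothesis of equal mean 'mpg' for the two engine shapes is rejected at the 5% level.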

In [5]:
aov(formula = mpg~vs,data = mtcars)

Call:
   aov(formula = mpg ~ vs, data = mtcars)

Terms:
                      vs Residuals
Sum of Squares  496.5279  629.5193
Deg. of Freedom        1        30

Residual standard error: 4.580827
Estimated effects may be unbalanced

In [6]:
#Writing a custom function for one-way ANOVA
one_way_anova <- function(x, y) {
  # x is the first attribute (categorical), y is the second attribute (numeric)
  n <- length(y)

  #Getting the unique classes in the categorical attribute and their count
  classes <- unique(x)
  k <- length(classes)

  #Getting the mean and size of each class, and accumulating
  #the sum of squares within classes (WSS) alongside
  yi <- numeric(k)
  ni <- numeric(k)
  wss <- 0
  for (i in seq_len(k)) {
    yc <- y[x == classes[i]]
    yi[i] <- mean(yc)
    ni[i] <- length(yc)
    wss <- wss + sum((yc - yi[i])^2)
  }

  #Overall mean of data
  y_overall <- mean(y)

  #Calculating ANOVA parameters
  bss <- sum(ni * (yi - y_overall)^2)
  msb <- bss / (k - 1)
  msw <- wss / (n - k)

  #Putting together in the data frame; mean squares are not additive,
  #so the "Total" row has no mean sum of squares
  anova <- data.frame(
    "Source of variation" = c("Between Classes", "Within Classes", "Total"),
    "Degree of Freedom" = c(k - 1, n - k, n - 1),
    "Sum of Squares" = c(bss, wss, bss + wss),
    "Mean sum of Squares" = c(msb, msw, NA),
    stringsAsFactors = FALSE
  )

  #Calculating and printing the F-ratio and the critical value at the 5% level
  f_ratio <- msb / msw
  print(paste("Calculated F-value = ", f_ratio))
  print(paste("Tabulated F-value with 5% significance level: ", qf(0.95, k - 1, n - k)))
  return(anova)
}
one_way_anova(mtcars$vs, mtcars$mpg)

[1] "Calculated F-value =  23.6622410013536"
[1] "Tabulated F-value with 5% significance level:  4.17087678576669"


| Source.of.variation | Degree.of.Freedom | Sum.of.Squares | Mean.sum.of.Squares |
|---|---|---|---|
| Between Classes | 1 | 496.5279 | 496.5279 |
| Within Classes | 30 | 629.5193 | 20.98398 |
| Total | 31 | 1126.0472 | NA |
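The calculated F-value (23.66) exceeds the tabulated critical value (4.17), so the null hypothesis of equal mean 'mpg' across the levels of 'vs' is rejected at the 5% level. The built-in summary also reports a p-value, which the manual calculation can reproduce from the F distribution. The sketch below is one way to do that, assuming the same between/within decomposition as above; it uses `ave()` as a shortcut for the per-class means, and all variable names are illustrative.

```r
library(datasets)

y <- mtcars$mpg
x <- mtcars$vs
n <- length(y)
k <- length(unique(x))

# Class mean repeated for every observation in that class
class_mean <- ave(y, x)

# Between-class and within-class sums of squares
bss <- sum((class_mean - mean(y))^2)
wss <- sum((y - class_mean)^2)

# F-ratio, its upper-tail p-value, and the 5% critical value
f_ratio <- (bss / (k - 1)) / (wss / (n - k))
p_value <- pf(f_ratio, df1 = k - 1, df2 = n - k, lower.tail = FALSE)
f_crit  <- qf(0.95, df1 = k - 1, df2 = n - k)

cat("F =", f_ratio, " critical F =", f_crit, " p =", p_value, "\n")
```

Comparing `f_ratio` with `f_crit` and comparing `p_value` with 0.05 are equivalent decision rules, so both lead to the same conclusion.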


### Two-way ANOVA testing


In [7]:

# Using built-in aov function
aov(mpg~vs+am,data = mtcars)


Call:
   aov(formula = mpg ~ vs + am, data = mtcars)

Terms:
                      vs       am Residuals
Sum of Squares  496.5279 276.0333  353.4860
Deg. of Freedom        1        1        29

Residual standard error: 3.491299
Estimated effects may be unbalanced
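The note "Estimated effects may be unbalanced" matters for interpretation: `aov()` reports sequential sums of squares, so when the cell counts are unequal the sum of squares attributed to a factor can depend on the order in which the factors enter the formula. The following sketch illustrates this by fitting the model in both orders; the variable names are illustrative.

```r
library(datasets)

# Cell counts for the vs x am layout (unequal counts mean an unbalanced design)
cell_counts <- table(vs = mtcars$vs, am = mtcars$am)
print(cell_counts)

# Sequential sums of squares with vs entered first, then with am entered first
ss_vs_first <- summary(aov(mpg ~ vs + am, data = mtcars))[[1]][["Sum Sq"]]
ss_am_first <- summary(aov(mpg ~ am + vs, data = mtcars))[[1]][["Sum Sq"]]

# Rows are: first term, second term, residuals
print(rbind(vs_then_am = ss_vs_first, am_then_vs = ss_am_first))
```

The residual sum of squares is the same in both fits, but the split between 'vs' and 'am' changes, so each row of the table should be read as "the effect of this factor after the factors listed before it".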

In [8]:
summary(aov(mtcars$mpg~as.factor(mtcars$vs)+as.factor(mtcars$am)))

                     Df Sum Sq Mean Sq F value   Pr(>F)    
as.factor(mtcars$vs)  1  496.5   496.5   40.73 5.61e-07 ***
as.factor(mtcars$am)  1  276.0   276.0   22.65 4.96e-05 ***
Residuals            29  353.5    12.2                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
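Both p-values are far below 0.05, so at the 5% level both 'vs' and 'am' (the latter after accounting for 'vs') have a significant effect on mean 'mpg'. For the second method of the task, a hand-written two-way ANOVA without interaction could look like the sketch below. It is an illustrative implementation, not the notebook author's: the function name `two_way_anova` and its arguments are hypothetical, and it assumes sequential sums of squares (factor `a` first, then `b` given `a`) and full-rank design matrices so that it matches the `aov()` convention on unbalanced data.

```r
library(datasets)

# Hypothetical helper: additive two-way ANOVA with sequential sums of squares
two_way_anova <- function(y, a, b) {
  a <- as.factor(a)
  b <- as.factor(b)
  n <- length(y)

  # Residual sum of squares after a least-squares fit on design matrix X
  rss <- function(X) sum(qr.resid(qr(X), y)^2)

  X0  <- matrix(1, nrow = n, ncol = 1)   # intercept only
  Xa  <- model.matrix(~ a)               # intercept + factor a
  Xab <- model.matrix(~ a + b)           # intercept + a + b

  rss0  <- rss(X0)
  rssa  <- rss(Xa)
  rssab <- rss(Xab)

  # Degrees of freedom (assumes the design matrices have full column rank)
  df <- c(ncol(Xa) - 1, ncol(Xab) - ncol(Xa), n - ncol(Xab))

  # Each factor's sum of squares is the drop in RSS when it is added
  ss <- c(rss0 - rssa, rssa - rssab, rssab)
  ms <- ss / df
  f  <- c(ms[1:2] / ms[3], NA)
  p  <- c(pf(f[1:2], df[1:2], df[3], lower.tail = FALSE), NA)

  data.frame(
    Source = c("Factor A", "Factor B", "Residuals"),
    Df = df, SumSq = ss, MeanSq = ms, F = f, p = p,
    stringsAsFactors = FALSE
  )
}

result <- two_way_anova(mtcars$mpg, mtcars$vs, mtcars$am)
print(result)
```

Called as above, the rows for Factor A ('vs') and Factor B ('am') can be checked directly against the `aov()` summary table.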