# Non-parametric Tests

- A non-parametric test does not require the variable to be normally distributed in the population or in the sample. When their assumptions hold, parametric tests such as the t-test and ANOVA are usually preferred because they have more statistical power.
- Use non-parametric tests when the data are not normally distributed and the sample is too small to rely on the central limit theorem.
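
One way to judge whether the normality assumption is plausible is the Shapiro-Wilk test, available in base R as shapiro.test(). The sketch below is only an illustration: the values in skewed are made up for this example, and the 0.05 cut-off is the conventional threshold used elsewhere in these notes.

```r
# Hypothetical, strongly right-skewed sample (not data from this notebook)
skewed <- c(1, 1, 1, 2, 2, 3, 4, 8, 15, 40, 120)

# Shapiro-Wilk test: the null hypothesis is that the sample comes from a normal distribution
normality <- shapiro.test(skewed)
print(normality)

# A small p-value suggests the data are not normal, so a non-parametric test may be the safer choice
looks_non_normal <- normality$p.value < 0.05
print(looks_non_normal)
```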

## Wilcoxon Signed Rank Test

- The Wilcoxon signed rank test is the non-parametric alternative to the one-sample t-test.
- For each x<sub>i</sub>, i = 1, 2, ..., n, the signed difference is d<sub>i</sub> = x<sub>i</sub> - μ<sub>0</sub>, where μ<sub>0</sub> is the hypothesized median.
- The null hypothesis is that the population median equals the specified value μ<sub>0</sub>.
    - Null Hypothesis: H<sub>0</sub> : μ = μ<sub>0</sub>
    - Alternate Hypothesis: H<sub>1</sub> : μ ≠ μ<sub>0</sub>
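
The signed differences above feed into the statistic V that wilcox.test() reports: rank the absolute differences and add up the ranks that belong to positive differences. The following is a minimal sketch of that calculation, assuming no ties and no zero differences; the vector x and the value of mu0 are made up for illustration.

```r
# Hypothetical sample and hypothesized median (illustration only)
x   <- c(22.1, 27.6, 30.6, 24.3, 32.2, 25.9)
mu0 <- 25

d <- x - mu0          # signed differences d_i = x_i - mu0
d <- d[d != 0]        # zero differences carry no sign and are dropped
r <- rank(abs(d))     # rank the absolute differences
V <- sum(r[d > 0])    # sum of the ranks of the positive differences

# Compare with the statistic computed by wilcox.test()
V_builtin <- unname(wilcox.test(x, mu = mu0)$statistic)
print(c(by_hand = V, wilcox.test = V_builtin))
```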
    
.....

To try the Wilcoxon signed rank test in R, you can first generate data with the random package, which fetches true random numbers from random.org, so that the variables are not normally distributed.

```
install.packages('random')
```

In [2]:
library(random)

In [3]:
var1 <- randomNumbers(n=100,min=1,max=1000,col=1)
var2 <- randomNumbers(n=100,min=1,max=1000,col=1)
var3 <- randomNumbers(n=100,min=1,max=1000,col=1)

Here n is the number of random numbers, min and max are the smallest and largest allowed values, and col is the number of columns in which the numbers are returned.
This is one way to generate true random numbers in R. Your data will differ because the numbers are generated randomly. You can then combine the variables into a data frame:

In [5]:
data <- data.frame(var1[,1],var2[,1],var3[,1])

In [8]:
print(head(data))

  var1...1. var2...1. var3...1.
1       460       339       334
2       436       699       357
3       188        44         6
4       564       118       754
5       799       431       299
6       374       153       121
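
If random.org is unreachable, or if you want results that can be reproduced, a similar data frame can be built with base R's pseudo-random generator. This is a sketch under assumptions: the name sim_data and the seed 42 are arbitrary choices for this example, and sample() draws pseudo-random rather than true random integers.

```r
# Fix the seed so the same numbers are drawn on every run (42 is arbitrary)
set.seed(42)

# Three columns of 100 integers drawn uniformly from 1..1000, with replacement
sim_data <- data.frame(
  var1 = sample(1:1000, 100, replace = TRUE),
  var2 = sample(1:1000, 100, replace = TRUE),
  var3 = sample(1:1000, 100, replace = TRUE)
)

print(head(sim_data))
```

Naming the columns inside data.frame() also avoids the generated names such as var1...1. shown in the output above.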


To use Wilcoxon signed rank test, you can use the wilcox.test() function

In [11]:
wilcox.test(data[,1],mu=0,alternative='two.sided')


	Wilcoxon signed rank test with continuity correction

data:  data[, 1]
V = 5050, p-value < 2.2e-16
alternative hypothesis: true location is not equal to 0


The p-value is < 2.2e<sup>-16</sup>, which is less than 0.05, so you reject the null hypothesis at the 5% significance level: the median of the first variable differs significantly from 0. This is expected, because every generated value lies between 1 and 1000.

## R-program to illustrate one-sample Wilcoxon signed rank test

In [12]:
set.seed(1234)

In [13]:
my_data = data.frame(name=paste0(rep('R_',10),1:10),weight=round(rnorm(10,30,2),1))

In [16]:
print(head(my_data))

  name weight
1  R_1   27.6
2  R_2   30.6
3  R_3   32.2
4  R_4   25.3
5  R_5   30.9
6  R_6   31.0


In [17]:
res = wilcox.test(my_data$weight,mu=25)

In [18]:
res


	Wilcoxon signed rank test with continuity correction

data:  my_data$weight
V = 55, p-value = 0.005793
alternative hypothesis: true location is not equal to 25


The p-value is 0.005793, which is less than 0.05, so we reject the null hypothesis at the 5% significance level. The median weight differs significantly from 25.

In [19]:
res1 = wilcox.test(my_data$weight,mu=25,alternative = 'less')

In [20]:
res1


	Wilcoxon signed rank test with continuity correction

data:  my_data$weight
V = 55, p-value = 0.9979
alternative hypothesis: true location is less than 25


The p-value is 0.9979, which is more than 0.05, so we cannot reject the null hypothesis. There is no evidence that the median weight is less than 25.

In [22]:
res2 = wilcox.test(my_data$weight,mu=25,alternative = 'greater')

In [24]:
res2


	Wilcoxon signed rank test with continuity correction

data:  my_data$weight
V = 55, p-value = 0.002897
alternative hypothesis: true location is greater than 25


The p-value is 0.002897, which is less than 0.05, so we reject the null hypothesis. The median weight is significantly greater than 25.
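
The three p-values above are related: the two-sided p-value appears to be twice the smaller of the two one-sided p-values. The sketch below rebuilds the same weights and checks this. It assumes the same seed as above and passes exact = FALSE to request the normal approximation explicitly, which matches the "with continuity correction" label in the printed outputs.

```r
# Rebuild the weights used above (same seed, same rounding)
set.seed(1234)
weight <- round(rnorm(10, 30, 2), 1)

# p-values for the three alternatives, using the normal approximation
p_two     <- wilcox.test(weight, mu = 25, exact = FALSE)$p.value
p_less    <- wilcox.test(weight, mu = 25, alternative = "less", exact = FALSE)$p.value
p_greater <- wilcox.test(weight, mu = 25, alternative = "greater", exact = FALSE)$p.value

print(c(two.sided = p_two, less = p_less, greater = p_greater))

# The two-sided p-value is twice the smaller one-sided p-value
print(all.equal(p_two, 2 * min(p_less, p_greater)))
```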

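## Wilcoxon-Mann-Whitney Test

The Wilcoxon-Mann-Whitney (rank sum) test compares two independent samples and is the non-parametric alternative to the two-sample t-test.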

In [27]:
var1 <- randomNumbers(100,1,1000,1)
var2 <- randomNumbers(100,1,1000,1)
var3 <- randomNumbers(100,1,1000,1)

In [28]:
data <- data.frame(var1[,1],var2[,1],var3[,1])

In [31]:
wilcox.test(data[,1],data[,2],correct=FALSE)


	Wilcoxon rank sum test

data:  data[, 1] and data[, 2]
W = 5295.5, p-value = 0.4703
alternative hypothesis: true location shift is not equal to 0


The p-value is 0.4703, which is more than 0.05, so we cannot reject the null hypothesis. There is no significant difference between the medians of the first and second variables at the 5% significance level. This does not prove that the null hypothesis is true; the data simply give no evidence against it.
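
The W statistic in the output can be understood as a count: for every pair made of one value from the first sample and one from the second, count the pairs in which the first value is larger. Equivalently, it is the rank sum of the first sample in the pooled data minus its smallest possible value. A minimal sketch, assuming no ties, with made-up vectors x and y:

```r
# Hypothetical samples (illustration only, no ties)
x <- c(12, 15, 9, 20)
y <- c(14, 8, 11, 7, 10)

# Rank all observations together, then sum the ranks that belong to x
pooled_ranks <- rank(c(x, y))
rank_sum_x   <- sum(pooled_ranks[seq_along(x)])

# Subtract the minimum possible rank sum n1 * (n1 + 1) / 2
n1 <- length(x)
W  <- rank_sum_x - n1 * (n1 + 1) / 2

# Same number obtained by counting the pairs with x > y
W_pairs <- sum(outer(x, y, ">"))

# Compare with wilcox.test()
W_builtin <- unname(wilcox.test(x, y)$statistic)
print(c(by_rank = W, by_pairs = W_pairs, wilcox.test = W_builtin))
```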

## Kruskal-Wallis Test

The Kruskal-Wallis test is a non-parametric test that extends the Mann-Whitney U test to three or more independent samples, and it is an alternative to one-way ANOVA. Its null hypothesis is that all samples come from the same distribution. The test compares the scores of k independent samples, possibly of unequal sizes, with the i<sup>th</sup> sample containing l<sub>i</sub> observations.
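
To make the statistic concrete, the sketch below computes the Kruskal-Wallis H by hand: rank all N observations together, sum the ranks within each group, and combine the rank sums as H = 12 / (N (N + 1)) * sum(R<sub>i</sub><sup>2</sup> / n<sub>i</sub>) - 3 (N + 1). It assumes there are no ties (otherwise a tie correction is needed), and the vectors score and group are made up for illustration.

```r
# Hypothetical scores for three groups of three (illustration only, no ties)
score <- c(1.2, 3.4, 2.2,  5.1, 4.8, 6.0,  2.9, 3.1, 7.5)
group <- rep(c("A", "B", "C"), each = 3)

N <- length(score)
r <- rank(score)                      # ranks over the pooled sample

rank_sums <- tapply(r, group, sum)    # R_i: rank sum of each group
sizes     <- tapply(r, group, length) # n_i: size of each group

# Kruskal-Wallis statistic without tie correction
H <- 12 / (N * (N + 1)) * sum(rank_sums^2 / sizes) - 3 * (N + 1)

# Under the null hypothesis H is approximately chi-squared with k - 1 degrees of freedom
p_value <- pchisq(H, df = length(sizes) - 1, lower.tail = FALSE)

# Compare with kruskal.test()
builtin <- kruskal.test(score, factor(group))
print(c(H_by_hand = H, H_builtin = unname(builtin$statistic)))
print(c(p_by_hand = p_value, p_builtin = builtin$p.value))
```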

In [32]:
data('airquality')

In [33]:
kruskal.test(airquality$Ozone ~ airquality$Month)


	Kruskal-Wallis rank sum test

data:  airquality$Ozone by airquality$Month
Kruskal-Wallis chi-squared = 29.267, df = 4, p-value = 6.901e-06


## Q1

In [20]:
x = c(12.3,15.4,10.3,8,14.6,15.7,10.8,45,12.3,8.2,20.1,26.3,32.4,41.2,35.1,25,8.2,18.4,32.5)
y = c(rep('A',5),rep('B',7),rep('C',7))

In [21]:
kruskal.test(x~y)


	Kruskal-Wallis rank sum test

data:  x by y
Kruskal-Wallis chi-squared = 5.217, df = 2, p-value = 0.07365


## Q2

In [24]:
x = c(166.7,172.2,165,176.9,166.2,157.3,166.7,161.1,158.6,176.4,153.1,156,162.8,142.4,162.7,162.4)
y = c(rep('0',4),rep('1',4),rep('3',4),rep('9',4))

In [25]:
kruskal.test(x~y)


	Kruskal-Wallis rank sum test

data:  x by y
Kruskal-Wallis chi-squared = 5.5725, df = 3, p-value = 0.1344


## Friedman Test

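The Friedman test is a non-parametric test for three or more related samples, such as repeated measurements on the same subjects or treatments applied within blocks. It is an alternative to two-way ANOVA without replication (a randomized block design).
<br>H<sub>0</sub>: μ<sub>1</sub> = μ<sub>2</sub> = ... = μ<sub>k</sub>
<br>H<sub>a</sub>: at least one μ<sub>j</sub> differs from the others
<br>
where μ<sub>j</sub> is the median of the j<sup>th</sup> treatment.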

In [44]:
obs = c(45,49,38,48,45,39,43,42,35,41,39,36)
# obs = c(45,48,43,41,49,45,42,39,38,39,35,36)
soyabean_var = c(rep('A',3),rep('B',3),rep('C',3),rep('D',3))
block = c(rep(c('1','2','3'),4))

In [36]:
# obs is stored variety by variety, so fill the matrix column by column (the default)
df = data.frame(matrix(obs,nrow=3,ncol=4),row.names = c(1,2,3))

In [37]:
colnames(df) = c('A','B','C','D')

In [38]:
df

   A  B  C  D
1 45 48 43 41
2 49 45 42 39
3 38 39 35 36


In [46]:
friedman.test(obs~soyabean_var|block)


	Friedman rank sum test

data:  obs and soyabean_var and block
Friedman chi-squared = 7.4, df = 3, p-value = 0.06018
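
The statistic reported above can be reproduced by hand: rank the varieties within each block, sum the ranks per variety, and combine the rank sums as Q = 12 / (n k (k + 1)) * sum(R<sub>j</sub><sup>2</sup>) - 3 n (k + 1), where n is the number of blocks and k the number of varieties. The sketch below uses the same observations, arranged as a matrix with blocks in rows; the name yield is chosen only for this example, and the formula assumes no ties within a block.

```r
# Same observations as above: rows are blocks 1-3, columns are varieties A-D
yield <- matrix(c(45, 49, 38,   # variety A
                  48, 45, 39,   # variety B
                  43, 42, 35,   # variety C
                  41, 39, 36),  # variety D
                nrow = 3,
                dimnames = list(c("1", "2", "3"), c("A", "B", "C", "D")))

n <- nrow(yield)   # number of blocks
k <- ncol(yield)   # number of treatments (varieties)

# Rank the varieties within each block; apply() returns one column per block, so transpose
ranks     <- t(apply(yield, 1, rank))
rank_sums <- colSums(ranks)   # R_j for each variety

# Friedman statistic without tie correction, and its chi-squared p-value
Q       <- 12 / (n * k * (k + 1)) * sum(rank_sums^2) - 3 * n * (k + 1)
p_value <- pchisq(Q, df = k - 1, lower.tail = FALSE)

print(rank_sums)
print(c(Q = Q, p_value = p_value))

# friedman.test() accepts the matrix directly (blocks in rows, treatments in columns)
builtin <- friedman.test(yield)
```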


In [50]:
obs = c(5,7,3,4,3,4,5,5,8,6,7,7,9,2,6,8,2,3,4,3,7,9,10,9,6,8,8,6,1,5,2,1,4,1,1,2,10,10,9,10)
students = c(rep('1',4),rep('2',4),rep('3',4),rep('4',4),rep('5',4),rep('6',4),rep('7',4),
            rep('8',4),rep('9',4),rep('10',4))
prof = c(rep(c(1,2,3,4),10))

In [51]:
friedman.test(obs~students|prof)


	Friedman rank sum test

data:  obs and students and prof
Friedman chi-squared = 28.309, df = 9, p-value = 0.0008468


## Q3

In [1]:
reaction_time = c(1.21,1.63,1.42,2.43,1.16,1.94,1.48,1.85,2.06,1.98,1.27,2.44,1.56,2.01,1.7,2.64,1.48,2.81)
lbl = c(rep('A',6),rep('B',6),rep('C',6))

In [3]:
sub = c(rep(c('1','2','3','4','5','6'),3))

In [2]:
kruskal.test(reaction_time~lbl)


	Kruskal-Wallis rank sum test

data:  reaction_time by lbl
Kruskal-Wallis chi-squared = 2.485, df = 2, p-value = 0.2887


In [5]:
friedman.test(reaction_time~lbl|sub)


	Friedman rank sum test

data:  reaction_time and lbl and sub
Friedman chi-squared = 8.3333, df = 2, p-value = 0.0155
