---
title: "Forecasting with ANOVA in R"
author: "Anita Owens"
output:
  html_document:
    df_print: paged
    toc: true
    toc_float:
      collapsed: false
      smooth_scroll: false
    toc_depth: 2
---
## Set up environment
```{r Load packages}
# Install pacman if needed
if (!require("pacman")) install.packages("pacman")
# load packages
pacman::p_load(pacman,
tidyverse, openxlsx, ggthemes)
```
# Scenario 1: If differences between groups are not significant
Weekly sales (in hundreds)
```{r Import bookstore dataset}
bookstore <- read.xlsx("datasets/OnewayANOVA.xlsx", skipEmptyRows = TRUE, sheet = "insignificant")
head(bookstore)
```
```{r Sales means by location}
colMeans(bookstore, na.rm = TRUE)
```
```{r From wide to long}
books <- bookstore %>%
  mutate(week_num = row_number()) %>%
  pivot_longer(!week_num, names_to = "location",
               values_to = "sales")
books
```
```{r visualize book sales, fig.align='center'}
#set Wall Street Journal theme for all plots
theme_set(theme_wsj())
ggplot(data = books, aes(x = location, y = sales)) +
  geom_boxplot(na.rm = TRUE) +
  ggtitle("Book sales by shelf location")
```
```{r Test Assumptions}
# Test normality across groups (Shapiro)
tapply(books$sales, books$location, FUN = shapiro.test)
# Check the homogeneity of variance (Bartlett)
bartlett.test(sales ~ location, data = books)
```
Shapiro-Wilk normality test: all p-values are well above 0.05, so we find no evidence against normality in any group.

Bartlett test of homogeneity of variance: the p-value is well above 0.05, so we can assume equal variances across groups.
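Reading the p-values off the printed test output works, but they can also be collected programmatically. The chunk below is a minimal sketch on hypothetical toy numbers (not the bookstore data); the names `toy_assume`, `shapiro_p`, and `bartlett_p` are illustrative only.

```{r Sketch assumption p-values}
# Hypothetical toy data, not the bookstore dataset: three shelf locations
toy_assume <- data.frame(
  location = rep(c("front", "middle", "back"), each = 6),
  sales = c(9, 11, 10, 12, 8, 10,
            11, 12, 10, 13, 11, 12,
            13, 15, 14, 12, 16, 14)
)
alpha <- 0.05  # significance level used throughout this document

# Shapiro-Wilk p-value for each group, collected into a named vector
shapiro_p <- sapply(split(toy_assume$sales, toy_assume$location),
                    function(x) shapiro.test(x)$p.value)

# Bartlett p-value for equal variances across groups
bartlett_p <- bartlett.test(sales ~ location, data = toy_assume)$p.value

shapiro_p
bartlett_p

# TRUE means no evidence against the assumption at level alpha
all(shapiro_p > alpha)
bartlett_p > alpha
```

The same pattern should apply to `books` by swapping in `books$sales` and `books$location`.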
```{r Oneway Test}
# Perform one-way ANOVA
(anova_results <- oneway.test(sales ~ location, data = books, var.equal = TRUE))
# Extract the p-value and compare it with alpha = 0.05
# TRUE: at least one group mean differs. FALSE: no evidence that mean sales differ across shelf positions.
if (anova_results$p.value < 0.05) {
  print("Means are different")
} else {
  print("Means are not different")
}
```
Null hypothesis: group means are equal.

Alternative hypothesis: at least one group mean differs.

One-way analysis of means: the p-value is large, at 0.5089.

INTERPRETATION OF ONE-WAY ANOVA RESULT:

The p-value of the test is greater than the significance level alpha = 0.05, so we fail to reject the null hypothesis that the group means are equal. We cannot conclude that sales differ significantly by shelf position. Failing to reject the null hypothesis does not prove that the means are identical; the data simply provide no evidence of a difference.
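The decision rests on the F statistic that `oneway.test()` computes when `var.equal = TRUE`. As a minimal sketch, again on hypothetical toy numbers rather than the bookstore data, the statistic can be reproduced by hand from the between-group and within-group sums of squares. All object names here are illustrative.

```{r Sketch F statistic by hand}
# Hypothetical toy data, not the bookstore dataset
toy_f <- data.frame(
  location = rep(c("front", "middle", "back"), each = 4),
  sales = c(9, 11, 10, 10,
            11, 12, 10, 11,
            12, 10, 11, 13)
)

k <- length(unique(toy_f$location))  # number of groups
n <- nrow(toy_f)                     # total number of observations
grand_mean <- mean(toy_f$sales)
group_mean <- tapply(toy_f$sales, toy_f$location, mean)
group_n <- tapply(toy_f$sales, toy_f$location, length)

# Between-group and within-group sums of squares
ss_between <- sum(group_n * (group_mean - grand_mean)^2)
ss_within <- sum((toy_f$sales - ave(toy_f$sales, toy_f$location))^2)

# F statistic and its p-value on (k - 1, n - k) degrees of freedom
f_stat <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_value <- pf(f_stat, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# Compare with the built-in test
fit_f <- oneway.test(sales ~ location, data = toy_f, var.equal = TRUE)
all.equal(unname(fit_f$statistic), f_stat)
all.equal(fit_f$p.value, p_value)
```

A large p-value corresponds to an F statistic close to 1, meaning the spread between group means is comparable to the spread within groups.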
## Forecasting for scenario 1: The predicted mean for each group is the overall mean.
- The forecast is the overall mean of weekly sales, irrespective of shelf location.
```{r Forecast: mean of sales with missing data removed}
mean(books$sales, na.rm = TRUE)
```
We can expect 1,120 books to be sold per week, regardless of shelf location.
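Forecasting with the overall mean is equivalent to fitting an intercept-only linear model. The sketch below illustrates this on hypothetical toy values (not the bookstore data) and also shows how `predict()` can attach a prediction interval to the point forecast; `toy_weekly` and `fit_overall` are illustrative names.

```{r Sketch intercept-only forecast}
# Hypothetical weekly sales with one missing week, not the bookstore dataset
toy_weekly <- data.frame(
  week_num = 1:8,
  sales = c(10, 12, NA, 11, 9, 13, 12, 11)
)

# Intercept-only model; lm() drops rows with missing sales by default
fit_overall <- lm(sales ~ 1, data = toy_weekly)

# The fitted intercept equals the overall mean with missing data removed
coef(fit_overall)[["(Intercept)"]]
mean(toy_weekly$sales, na.rm = TRUE)

# Point forecast for a future week, with a 95% prediction interval
pred_overall <- predict(fit_overall,
                        newdata = data.frame(week_num = 9),
                        interval = "prediction")
pred_overall
```

The interval is a reminder that the mean is a point forecast and individual weeks will vary around it.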
# Scenario 2: If differences between groups are significant
```{r import bookstore sales for scenario 2}
# Use the same project-relative path as in scenario 1 so the notebook runs on any machine
bookstore2 <- read.xlsx("datasets/OnewayANOVA.xlsx", skipEmptyRows = TRUE, sheet = "significant")
head(bookstore2)
```
```{r from wide to long scenario 2}
(books2 <- bookstore2 %>%
  mutate(week_num = row_number()) %>%
  pivot_longer(!week_num, names_to = "location",
               values_to = "sales"))
```
```{r Plot scenario 2, fig.align='center'}
ggplot(data = books2, aes(x = location, y = sales)) +
  geom_boxplot(na.rm = TRUE) +
  ggtitle("Book sales by shelf location", subtitle = "scenario 2")
```
```{r Test assumptions - test 2}
# Test normality across groups
tapply(books2$sales, books2$location, FUN = shapiro.test)
# Check the homogeneity of variance
bartlett.test(sales ~ location, data = books2)
```
```{r One-way Test 2}
# Perform one-way ANOVA
(anova_results2 <- oneway.test(sales ~ location, data = books2, var.equal = TRUE))
# TRUE: reject the null hypothesis; at least one group mean differs.
# FALSE: no significant difference in mean sales across shelf positions.
anova_results2$p.value < 0.05
```
INTERPRETATION OF ONE-WAY ANOVA RESULT:
The p-value is very small (0.003426), so we reject the null hypothesis and conclude that mean sales differ significantly across shelf locations.
## Forecasting for scenario 2: The predicted mean for each group equals the group mean
```{r Forecast mean of sales scenario 2 with not applicables removed}
(booksales_forecast_signif <- books2 %>%
group_by(location) %>%
summarize(mean_sales = mean(sales, na.rm = TRUE)))
```
We can expect weekly sales of 900 books when they are shelved at the front, 1,100 when shelved in the middle, and 1,400 when shelved at the back of the book section.
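Using each group's mean as its forecast is what a linear model with `location` as the only predictor returns. The following sketch, on hypothetical toy numbers rather than the bookstore data, shows that `predict()` reproduces the group means; `toy_groups`, `fit_groups`, and `new_weeks` are illustrative names.

```{r Sketch group-mean forecast}
# Hypothetical toy data with one missing value, not the bookstore dataset
toy_groups <- data.frame(
  location = rep(c("front", "middle", "back"), each = 4),
  sales = c(9, 8, 10, 9,
            11, 12, NA, 10,
            14, 13, 15, 14)
)

# One-way model: one mean per shelf location (rows with NA are dropped)
fit_groups <- lm(sales ~ location, data = toy_groups)

# Forecast one future week for each location
new_weeks <- data.frame(location = c("back", "front", "middle"))
forecast_groups <- predict(fit_groups, newdata = new_weeks)
names(forecast_groups) <- new_weeks$location
forecast_groups

# Same values as the group means with missing data removed
tapply(toy_groups$sales, toy_groups$location, mean, na.rm = TRUE)
```

Adding `interval = "prediction"` to the `predict()` call would give a range around each group forecast.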
```{r Oneway ANOVA Significance Function, include=FALSE}
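# Helper: run a one-way ANOVA on two columns of `dataframe` (names given as strings)
# and report whether group means differ at significance level `alpha`.
oneway_signif_function <- function(dataframe, y, x, alpha = 0.05) {
  # Build the formula y ~ x from the column names
  result <- oneway.test(reformulate(x, response = y), data = dataframe, var.equal = TRUE)
  if (result$p.value < alpha) {
    print("Means are different")
  } else {
    print("Means are not different")
  }
}
oneway_signif_function(books, "sales", "location")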
```