## Exercise 4. Hemoglobin in trout

Hemoglobin (g/100 ml) is measured in the blood of brown trout after 35 days of treatment with four rates of sulfamerazine: the daily rates of 0, 5, 10 and 15 g of sulfamerazine per 100 pounds of fish, denoted as rates 1, 2, 3 and 4, respectively. (Beware that the levels of the factor rate are coded by numbers.) Two methods of administering the sulfamerazine, denoted A and B, were used. The data are in the file hemoglobin.txt.

In [1]:
fish = read.table("hemoglobin.txt",header=TRUE)

### 4a)  Present an R-code for the randomization process to distribute 80 fishes over all combinations of levels of factors rate and method.

In [2]:
head(fish)

| hemoglobin | rate | method |
|---:|---:|:---|
| 6.7 | 1 | A |
| 7.0 | 1 | B |
| 7.8 | 1 | A |
| 7.8 | 1 | B |
| 5.5 | 1 | A |
| 6.8 | 1 | B |


In [4]:
# 80 fish: I = 4 rates, J = 2 methods (1 = A, 2 = B), N = 10 fish per combination
I = 4; J = 2; N = 10
# row 1: rate, row 2: method, row 3: random permutation of the fish labels 1..80;
# the fish listed in column k receives the rate and method in rows 1 and 2 of that column
rbind(rep(1:I, each = N*J), rep(1:J, N*I), sample(1:(N*I*J)))
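The same randomization can also be stored as a data frame, which makes it easy to check that each of the 8 rate/method combinations receives 10 fish. This is a minimal sketch: the fish labels 1 to 80 and the object name `design` are hypothetical, and `set.seed()` is only there to make the illustration reproducible.

```r
# Sketch: randomly assign 80 fish (labels 1..80) to the 4 x 2 rate/method combinations
set.seed(1)  # fixed seed only so the illustration is reproducible
I <- 4; J <- 2; N <- 10  # 4 rates, 2 methods, 10 fish per combination
design <- data.frame(
  fish   = sample(1:(N * I * J)),                   # random permutation of fish labels
  rate   = factor(rep(1:I, each = N * J)),          # rates 1..4, 20 rows each
  method = factor(rep(c("A", "B"), times = N * I))  # methods alternate A, B
)
head(design)
table(design$rate, design$method)  # each of the 8 cells should hold N = 10 fish
```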


### 4b) Perform the two-way ANOVA to test for effects of factors rate, method and their interaction on the response variable hemoglobin. Comment on your findings.

In [None]:
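par(mfrow=c(1,2))
# use the data argument instead of attach(), so later changes to fish are picked up
boxplot(hemoglobin ~ rate, data = fish); boxplot(hemoglobin ~ method, data = fish)

In [None]:
par(mfrow=c(2,1))
# interaction.plot() has no data argument, so evaluate it inside with()
with(fish, interaction.plot(method, rate, hemoglobin)); with(fish, interaction.plot(rate, method, hemoglobin))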

Since rate is stored as a numerical variable, it has to be converted to a factor variable.

In [None]:
fish$rate = factor(fish$rate, 
                    levels = c(1,2,3,4),
                    labels = c('Rate1', 'Rate2', 'Rate3', 'Rate4'))
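To see why the conversion matters, compare the design matrices R builds for a numeric and a factor version of rate. A numeric `rate` gives a single slope (a linear trend, 1 degree of freedom), while a factor gives a separate mean for each level (3 degrees of freedom). The vector `rate_num` below is a hypothetical toy example, not the trout data.

```r
# Hypothetical illustration: how R encodes a numeric versus a factor 'rate'
rate_num <- rep(1:4, each = 2)             # rate stored as numbers 1..4
X_num <- model.matrix(~ rate_num)           # intercept + a single slope
X_fac <- model.matrix(~ factor(rate_num))   # intercept + 3 dummy variables
ncol(X_num)  # 2
ncol(X_fac)  # 4
```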

The first model includes the rate × method interaction, so it does not assume that the two factors act additively; it tests the interaction together with the main effects.

In [None]:
res.aov_dep = aov(hemoglobin ~ method * rate, data = fish)
summary(res.aov_dep)

At significance level α = 0.05, neither the rate × method interaction nor the main effect of method is significant, whereas rate has a significant effect on the amount of hemoglobin in the trout. Because the null hypothesis of no interaction cannot be rejected, the interaction term can be dropped and the main effects can be tested in the additive model (see 4c).
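Dropping the interaction can also be checked with one F-test that compares the additive model with the interaction model. The sketch below uses simulated stand-in data (the data frame `sim` and its response are hypothetical, not the trout measurements); on the real data the analogous call would be `anova(res.aov_ind, res.aov_dep)`, assuming the additive model from 4c has been fitted.

```r
# SIMULATED stand-in data (hypothetical), NOT the trout measurements:
# 10 fish for each of the 4 x 2 rate/method combinations
set.seed(42)
sim <- expand.grid(fish   = 1:10,
                   rate   = factor(paste0("Rate", 1:4)),
                   method = factor(c("A", "B")))
# hypothetical response with a rate effect plus normal noise
sim$hemoglobin <- 7 + 0.5 * as.numeric(sim$rate) + rnorm(nrow(sim))

full     <- aov(hemoglobin ~ method * rate, data = sim)  # with interaction
additive <- aov(hemoglobin ~ method + rate, data = sim)  # without interaction
cmp <- anova(additive, full)  # F-test for the method:rate term
cmp
```

The p-value of this comparison equals the p-value in the `method:rate` row of `summary(full)`, since both test the same term against the same residual mean square.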

In [None]:
par(mfrow=c(1,2)); qqnorm(residuals(res.aov_dep)); qqline(residuals(res.aov_dep), col = 'red')
plot(fitted(res.aov_dep), residuals(res.aov_dep)); 

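The points in the QQ-plot (left) lie close to the reference line, so normality of the residuals is plausible. In the residuals-versus-fitted plot (right), there are more points, with a larger spread, for fitted values above about 9, which may point to non-constant variance.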
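As one possible formal companion to the two plots, a Shapiro-Wilk test checks normality of the residuals and Bartlett's test checks equal variances across the 8 rate/method cells. This is a sketch on simulated stand-in data (hypothetical `sim`), not a re-analysis of the trout data; such tests complement the plots rather than replace them.

```r
# SIMULATED stand-in data (hypothetical), NOT the trout measurements
set.seed(7)
sim <- expand.grid(fish   = 1:10,
                   rate   = factor(paste0("Rate", 1:4)),
                   method = factor(c("A", "B")))
sim$hemoglobin <- 7 + 0.5 * as.numeric(sim$rate) + rnorm(nrow(sim))
model <- aov(hemoglobin ~ method * rate, data = sim)

# H0: the residuals come from a normal distribution
sw <- shapiro.test(residuals(model))
# H0: the response has the same variance in all 8 rate/method cells
cell <- interaction(sim$rate, sim$method)
bt <- bartlett.test(sim$hemoglobin, cell)
sw; bt
```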

### 4c) Which of the two factors has the greatest influence? Is this a good question? Consider the additive model. Which combination of rate and method yields the highest hemoglobin? Estimate the mean hemoglobin value for rate 3 by using method A. What rate leads to the highest mean hemoglobin?

In [None]:
res.aov_ind = aov(hemoglobin ~ method + rate, data = fish)
summary(res.aov_ind)

Rate is the only term with a significant effect on hemoglobin, so rate appears to have the greatest influence.

This is also the case in the additive model above, where method does not have a significant effect on hemoglobin.

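#### Is this a good question?

Only to some extent. The "influence" of a factor depends on the levels that were chosen (here the range of sulfamerazine rates), and when there is an interaction, the effect of one factor depends on the level of the other, so a single ranking of the factors is not meaningful. Since the interaction is not significant here, comparing the main effects is reasonable, but the answer only applies to the rates and methods used in this experiment.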

In [None]:
fish[which.max(fish$hemoglobin),]
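Note that `which.max()` returns the single fish with the largest value, whereas the combination with the highest *mean* hemoglobin is read from the table of cell means (on the real data: `tapply(fish$hemoglobin, list(fish$rate, fish$method), mean)`). A sketch on simulated stand-in data (hypothetical `sim`, not the trout measurements):

```r
# SIMULATED stand-in data (hypothetical), NOT the trout measurements
set.seed(3)
sim <- expand.grid(fish   = 1:10,
                   rate   = factor(paste0("Rate", 1:4)),
                   method = factor(c("A", "B")))
sim$hemoglobin <- 7 + 0.5 * as.numeric(sim$rate) + rnorm(nrow(sim))

# 4 x 2 table of cell means (rows: rate, columns: method)
cell_means <- tapply(sim$hemoglobin, list(sim$rate, sim$method), mean)
cell_means
# position of the largest cell mean
best <- which(cell_means == max(cell_means), arr.ind = TRUE)
c(rate   = rownames(cell_means)[best[1, "row"]],
  method = colnames(cell_means)[best[1, "col"]])
# by contrast: the single fish with the largest hemoglobin value
sim[which.max(sim$hemoglobin), ]
```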

In [None]:
rate3_method_a = fish[fish$rate == 'Rate3' & fish$method == 'A', ]; mean(rate3_method_a$hemoglobin)

In [None]:
mean(fish[fish$rate == 'Rate1', ]$hemoglobin); 
mean(fish[fish$rate == 'Rate2', ]$hemoglobin); 
mean(fish[fish$rate == 'Rate3', ]$hemoglobin); 
mean(fish[fish$rate == 'Rate4', ]$hemoglobin)

The single largest hemoglobin value, 11.9, was observed for a fish treated with rate 2 and method A. The sample mean of the rate 3 / method A cell is 9.03, and rate 2 gives the highest mean hemoglobin, 9.735.
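Under the additive model, the estimate for rate 3 with method A is the fitted value intercept + (rate 3 effect) + (method A effect), which in general differs from the raw cell mean. A sketch on simulated stand-in data (hypothetical `sim`, not the trout measurements); on the real data the analogous call would be `predict(res.aov_ind, newdata = data.frame(method = "A", rate = "Rate3"))`.

```r
# SIMULATED stand-in data (hypothetical), NOT the trout measurements
set.seed(3)
sim <- expand.grid(fish   = 1:10,
                   rate   = factor(paste0("Rate", 1:4)),
                   method = factor(c("A", "B")))
sim$hemoglobin <- 7 + 0.5 * as.numeric(sim$rate) + rnorm(nrow(sim))

additive <- aov(hemoglobin ~ method + rate, data = sim)
# additive-model estimate for method A with rate 3
est <- predict(additive, newdata = data.frame(method = "A", rate = "Rate3"))
# raw (sample) mean of the same cell, for comparison
raw <- mean(sim$hemoglobin[sim$rate == "Rate3" & sim$method == "A"])
c(additive_estimate = unname(est), cell_mean = raw)
```

With R's default treatment contrasts, method A and Rate1 are the baseline levels, so the estimate equals the intercept plus the `rateRate3` coefficient.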

### 4d) Test the null hypothesis that the hemoglobin is the same for all rates by a one-way ANOVA test, ignoring the variable method. Is it right/wrong or useful/not useful to perform this test on this dataset?

In [None]:
fish_no_method = subset(fish, select=c(hemoglobin, rate))

In [None]:
fish_no_methodaov=lm(hemoglobin~rate, data=fish_no_method);
anova(fish_no_methodaov)

Since the p-value is below 0.05, we reject the null hypothesis that the mean hemoglobin is the same for all rates.

In [None]:
par(mfrow=c(1,2)); qqnorm(residuals(fish_no_methodaov)); qqline(residuals(fish_no_methodaov), col = 'red')
plot(fitted(fish_no_methodaov), residuals(fish_no_methodaov)); 

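The QQ-plot suggests that the residuals are approximately normal. In the residuals-versus-fitted plot (right), most points lie towards the larger fitted values. Because a one-way model has only four distinct fitted values (the rate means), this plot mainly checks whether the spread of the residuals is similar for the four rates; it says little about normality.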
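In a one-way model the fitted values are just the group means, so the residuals-versus-fitted plot consists of four vertical strips, one per rate. A sketch on simulated stand-in data (hypothetical `sim`, not the trout measurements):

```r
# SIMULATED stand-in data (hypothetical), NOT the trout measurements
set.seed(5)
sim <- expand.grid(fish   = 1:10,
                   rate   = factor(paste0("Rate", 1:4)),
                   method = factor(c("A", "B")))
sim$hemoglobin <- 7 + 0.5 * as.numeric(sim$rate) + rnorm(nrow(sim))

one_way <- lm(hemoglobin ~ rate, data = sim)
# only 4 distinct fitted values: one per rate
sort(unique(round(fitted(one_way), 10)))
# they coincide with the sample means of the four rates
tapply(sim$hemoglobin, sim$rate, mean)
```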

Although the effect of method on hemoglobin was not significant, ignoring method is not entirely correct: whatever variation method (and its interaction with rate) does explain now ends up in the residual error, which inflates the error variance used in the F-test for rate. On the other hand, since method was not significant, the one-way test still gives a simple and useful, more isolated view of the influence of rate.
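The effect of ignoring method can be made concrete with sums of squares: in a balanced design, the rate sum of squares is the same with or without method in the model, and the method sum of squares moves into the residuals. A sketch on simulated balanced data with a small, hypothetical method effect (not the trout measurements):

```r
# SIMULATED balanced stand-in data (hypothetical), NOT the trout measurements
set.seed(11)
sim <- expand.grid(fish   = 1:10,
                   rate   = factor(paste0("Rate", 1:4)),
                   method = factor(c("A", "B")))
# hypothetical response: rate effect plus a small method effect (+0.3 for B)
sim$hemoglobin <- 7 + 0.5 * as.numeric(sim$rate) +
  0.3 * (sim$method == "B") + rnorm(nrow(sim))

one_way  <- anova(lm(hemoglobin ~ rate, data = sim))           # rows: rate, Residuals
additive <- anova(lm(hemoglobin ~ method + rate, data = sim))  # rows: method, rate, Residuals
one_way
additive
```

In an unbalanced design the rate sum of squares would also change when method is dropped, so the first identity relies on the balanced layout of 10 fish per cell.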