## ANOVA (Analysis of Variance)

In the "Bias Against Associates of the Obese" case study, the researchers were interested 
in whether the weight of a companion of a job applicant would affect judgments of a male 
applicant's qualifications for a job. Two independent variables were investigated: 
(1) whether the companion was obese or of typical weight, (2) whether the companion 
was a girlfriend or just an acquaintance. One approach could have been to conduct two 
separate studies, one with each independent variable. However, it is more efficient to 
conduct one study that includes both independent variables. Moreover, there is a much 
bigger advantage than efficiency for including two variables in the same study: it 
allows a test of the **interaction** between the variables. There is an interaction when 
the effect of one variable differs depending on the level of a second variable. 
For example, it is possible that the effect of having an obese companion would 
differ depending on the relationship to the companion. Perhaps there is more 
prejudice against a person with an obese companion if the companion is a girlfriend 
than if she is just an acquaintance. If so, there would be an interaction between the 
obesity factor and the relationship factor.
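
To make the idea of an interaction concrete, here is a minimal sketch using made-up cell means for a 2 x 2 design. The numbers are hypothetical and are not taken from the case study. The interaction is measured as a difference of differences: the effect of weight for girlfriends minus the effect of weight for acquaintances.

```r
# Hypothetical cell means for a 2 x 2 design (made-up numbers, not the study's data)
cell_means <- matrix(
  c(5.0, 6.0,   # girlfriend:   obese, typical
    5.8, 6.2),  # acquaintance: obese, typical
  nrow = 2, byrow = TRUE,
  dimnames = list(relate = c("girlfriend", "acquaintance"),
                  weight = c("obese", "typical"))
)

# Simple effect of weight at each level of relationship
weight_effect <- cell_means[, "typical"] - cell_means[, "obese"]
print(weight_effect)

# Interaction contrast: the difference between the two simple effects.
# A value of zero would mean the weight effect is the same for both relationships.
interaction_contrast <- weight_effect[["girlfriend"]] - weight_effect[["acquaintance"]]
print(interaction_contrast)
```

In these invented numbers the weight effect is larger for girlfriends than for acquaintances, so the contrast is nonzero. ANOVA tests whether such a contrast is larger than would be expected from sampling variability alone.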

There are three effects of interest in this experiment:

**Weight:** Are applicants judged differently depending on the weight of their companion?

**Relationship:** Are applicants judged differently depending on their relationship with their companion?

**Weight x Relationship Interaction:** Does the effect of weight differ depending on the relationship with the companion?

We will apply ANOVA to study these effects. 

In [None]:
# Read the data
weight <- read.csv("../../../datasets/weight/weight.csv")

str(weight)
# WEIGHT and RELATE are stored as numeric codes; convert them to factors
weight$WEIGHT <- as.factor(weight$WEIGHT)
weight$RELATE <- as.factor(weight$RELATE)

Descriptions of the variables:

**Weight:** The weight of the woman sitting next to the job applicant; 1 = obese, 2 = average weight.

**Relate:** Type of relationship between the job applicant and the woman seated next to him: 1 = girlfriend, 2 = acquaintance.

**Qualifid:** Larger numbers represent higher professional qualification ratings.

Let's see if the qualification score differs with respect to the companion's weight. We'll compare the groups with boxplots.

In [None]:
library(ggplot2)
ggplot(weight,aes(WEIGHT,QUALIFID))+geom_boxplot()

In [None]:
# Also, draw boxplots of the groups for RELATE
ggplot(weight,aes(RELATE,QUALIFID))+geom_boxplot()

From the plots above, it seems that WEIGHT has some effect on perceived qualification, while RELATE does not seem to 
have much of an effect. Let's run ANOVA and see if these impressions hold.

In [None]:
# run ANOVA
fit1 <- aov(QUALIFID ~ WEIGHT + RELATE, data=weight)
summary(fit1)

The ANOVA results show that WEIGHT has an effect. The p value is 0.009, so the null hypothesis of no main effect of WEIGHT is rejected. The conclusion is that being accompanied by an obese companion lowers judgments of qualifications. The effect of RELATE is less clear-cut: its p value is larger than that of WEIGHT, so the evidence against the null hypothesis of no main effect of RELATE is weaker. The data suggest, more tentatively, that being accompanied by a girlfriend leads to somewhat lower ratings than being accompanied by an acquaintance.
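
The p values discussed above can also be pulled out of the fitted object instead of being read off the printed table. The sketch below shows one way to do this. It uses R's built-in ToothGrowth data as a stand-in (an assumption made for illustration, since it ships with every R installation); the same steps should apply to fit1.

```r
# ToothGrowth ships with R; dose is numeric, so convert it to a factor first
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# Two-way ANOVA with main effects only
fit <- aov(len ~ supp + dose, data = tg)

# summary() returns a list; its first element is the ANOVA table as a data frame
tab <- summary(fit)[[1]]

# Row names are padded with spaces in the printed table, so trim them
p_values <- setNames(tab[["Pr(>F)"]], trimws(rownames(tab)))
print(p_values)

# Compare each main effect against a chosen significance level
# (the Residuals row has no p value, so drop the NA first)
alpha <- 0.05
print(p_values[!is.na(p_values)] < alpha)
```

Note that aov() computes sequential sums of squares, so when the groups have unequal sizes the order of the terms in the formula can change the reported p values.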

Let's see if there is an interaction between these two independent variables. **Your turn:**

In [None]:
# run ANOVA with interaction
fit2 <- aov(QUALIFID ~ <what goes in here>, data=weight)
summary(fit2)

The p value for the interaction is 0.8, which is the probability of getting an interaction as big or bigger than the one obtained in the experiment if there were no interaction in the population. Therefore, these data provide no evidence for an interaction.
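
A quick visual check can complement the interaction test: plot the cell means with one line per level of the second factor. Roughly parallel lines are consistent with no interaction. The sketch below again uses the built-in ToothGrowth data as a stand-in (an assumption for illustration, not the weight data).

```r
tg <- ToothGrowth
tg$dose <- as.factor(tg$dose)

# Mean response in every combination of the two factors
cell_means <- with(tg, tapply(len, list(supp = supp, dose = dose), mean))
print(round(cell_means, 2))

# One line per level of supp; roughly parallel lines suggest little or no interaction
with(tg, interaction.plot(x.factor = dose, trace.factor = supp, response = len,
                          xlab = "dose", ylab = "mean len", trace.label = "supp"))
```

For the weight data, the analogous call would use WEIGHT, RELATE, and QUALIFID in place of dose, supp, and len.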

Let's apply the same analysis to the kc_house_data that we have seen before. 

In [None]:
hs <- read.csv("../../../datasets/house_sales_in_king_county/kc_house_data.csv",header=TRUE)
head(hs)
str(hs)

In [None]:
# treat the number of bedrooms and the number of floors as factors
hs$bedrooms <- as.factor(hs$bedrooms)
hs$floors <- as.factor(hs$floors)

Let's see if price is affected by the number of bedrooms and the number of floors, and also see if there is an interaction between the two.
**Your turn:**

In [None]:
fit3 <- aov(price ~ <what goes in here>, data=hs)
summary(fit3)

As we can see, both variables have a significant effect and there is an interaction between them (as we would expect). 

Let's see how we can apply MANOVA to this data set. We'd like to see if there is a relation between price and location. We will use price as the independent variable and see whether it has an effect on location (usually the other way around makes more sense). Here we use "lat" and "long" as the coordinates of the location, so we need to bind them into a single response in order to apply MANOVA. 

In [None]:
fit4 <- aov(cbind(<what goes in here>) ~ price, data = hs)
summary(fit4)

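Because aov() was given a two-column response, the summary reports a separate test for each coordinate. Price is more strongly related to latitude than to longitude (we know why from a previous practice): the p value for latitude is almost zero, whereas the p value for longitude is 0.0015, which is still statistically significant. 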
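
A multivariate test combines the responses into a single statistic rather than testing each one separately. As a sketch on R's built-in iris data (a stand-in chosen for illustration, not the housing data), manova() with the same kind of cbind() formula gives one joint test, and summary.aov() recovers the per-response tables from the same fit.

```r
# iris ships with R: two numeric responses and one grouping factor
fit_aov <- aov(cbind(Sepal.Length, Petal.Length) ~ Species, data = iris)
fit_man <- manova(cbind(Sepal.Length, Petal.Length) ~ Species, data = iris)

# aov() with a matrix response: one univariate ANOVA table per response
print(summary(fit_aov))

# manova(): a single multivariate test (Pillai's trace by default)
print(summary(fit_man))

# The per-response tables can also be obtained from the manova fit
print(summary.aov(fit_man))
```

For the housing data, the corresponding call would bind "lat" and "long" on the left-hand side and keep price on the right.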