# Factor Analysis on Food Data

In this notebook, we will perform a factor analysis (FA) on a food texture data set. The variables are texture measurements of pastry-type foods.

The data set consists of 50 rows (observations) and 5 columns (features/variables). The features are:

 - Oil: percentage of oil in the pastry
 - Density: the product’s density (the higher the number, the more dense the product)
 - Crispy: a crispiness measurement, on a scale from 7 to 15, with 15 being the most crispy
 - Fracture: the angle, in degrees, through which the pastry can be slowly bent before it fractures (the higher the number, the more it can bend)
 - Hardness: a sharp point is used to measure the amount of force required before breakage occurs.

In [None]:
food <- read.csv("https://userpage.fu-berlin.de/soga/300/30100_data_sets/food-texture.csv", row.names = "X")
str(food)

In [None]:
food.fa <- factanal(food, factors = 2)

In [None]:
food.fa

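The printed table leaves out loadings with an absolute value below 0.1. The p-value at the bottom of the output belongs to a test of the hypothesis that two factors are sufficient. A p-value above the usual significance levels, as here, means that this hypothesis cannot be rejected, which suggests that two factors are enough to describe the correlation structure of the data.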
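
To see how the test reacts to the number of factors, one could collect the p-values for several fits. The following is only a sketch: `factor_pvalues` is a hypothetical helper, not part of the original notebook, and it relies on the `PVAL` component that `factanal()` returns when the test can be computed. The usage line assumes the `food` data frame from the cell above.

```r
# Hypothetical helper: fit factanal() for 1..max_factors factors and collect
# the p-value of the test that k factors are sufficient.
factor_pvalues <- function(data, max_factors) {
  pvals <- sapply(seq_len(max_factors), function(k) {
    fit <- factanal(data, factors = k)
    unname(fit$PVAL)
  })
  names(pvals) <- paste0("k=", seq_len(max_factors))
  pvals
}

# With 5 variables, factanal() accepts at most 2 factors.
# The guard only runs the example if `food` was loaded in an earlier cell.
if (exists("food")) {
  print(factor_pvalues(food, max_factors = 2))
}
```

A small p-value for one factor next to a large p-value for two factors would support the choice of two factors.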

Before we interpret the results of the factor analysis, recall the basic idea behind it. Factor analysis models the observed variables as linear combinations of a small number of unobserved factors that capture what the variables have in common (their communality). The more the variables have in common, the fewer factors are needed to capture most of the variance in the data set. This allows us to summarize a large number of observable variables by a few underlying concepts, making the data easier to understand.
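
This idea can be written down compactly. The following is a sketch of the standard orthogonal factor model for standardized variables. The symbols are notation assumed here for illustration and do not come from the notebook: $\lambda_{ij}$ is the loading of variable $i$ on factor $j$, $\psi_i$ is the uniqueness of variable $i$, and $h_i^2$ is its communality.

$$
x_i = \sum_{j=1}^{k} \lambda_{ij} f_j + \varepsilon_i, \qquad i = 1, \dots, p
$$

$$
\operatorname{Var}(x_i) = \underbrace{\sum_{j=1}^{k} \lambda_{ij}^2}_{h_i^2} + \psi_i = 1
$$

Here the factors $f_j$ are assumed to be uncorrelated with unit variance, and the errors $\varepsilon_i$ are assumed to be uncorrelated with the factors and with each other. Under these assumptions the communality of a variable is one minus its uniqueness.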

In [None]:
food.fa$uniquenesses


In [None]:
# Communality
1 - food.fa$uniquenesses
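
The communalities can also be obtained directly from the loadings, as the row sums of the squared loadings. The sketch below uses a hypothetical helper, `communality_from_loadings`, and assumes the `food.fa` fit from the cell above. The two columns should agree only up to the numerical tolerance of the optimizer inside `factanal()`.

```r
# Hypothetical helper: communalities as row sums of squared loadings.
communality_from_loadings <- function(fit) {
  # unclass() drops the "loadings" class so we work with a plain matrix
  rowSums(unclass(fit$loadings)^2)
}

# Compare both ways of computing the communalities
# (runs only if `food.fa` exists from an earlier cell).
if (exists("food.fa")) {
  print(cbind(
    from_loadings   = communality_from_loadings(food.fa),
    from_uniqueness = 1 - food.fa$uniquenesses
  ))
}
```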

The communalities above suggest that two factors are adequate to represent most of the variance in the data. Let's try to interpret the factors. First, we plot the loadings.

In [None]:
options(repr.plot.width=12, repr.plot.height=12)

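# factanal() applies a varimax rotation by default, so these loadings are rotated
plot(food.fa$loadings[, 1],
     food.fa$loadings[, 2],
     xlab = "Factor 1",
     ylab = "Factor 2",
     ylim = c(-1, 1),
     xlim = c(-1, 1),
     main = "Varimax rotation (factanal default)")
# Label each point with its variable name, offset slightly from the marker
text(food.fa$loadings[, 1] - 0.08,
     food.fa$loadings[, 2] + 0.08,
     colnames(food),
     col = "blue")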

abline(h = 0, v = 0)


Looking at the figure above, Factor 1 seems to account for pastry that is dense and can be bent a lot before it breaks. It also stands for **less** crispy and less oily products (note the negative signs of the Crispy and Oil loadings on Factor 1).

Factor 2 seems to account for pastry that is crispy and hard to break apart. The loading for Fracture is negative, meaning that Factor 2 also stands for **less** flexible products. It does not seem to represent Oil or Density.

Based on these observations, we could probably call the factors soft pastry (Factor 1) and hard pastry (Factor 2). 
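
To make the pattern behind these names easier to read, one could hide the weaker loadings. The sketch below uses a hypothetical helper, `mask_small_loadings`, and an arbitrary cutoff of 0.3 chosen only for illustration. It assumes the `food.fa` fit from above.

```r
# Hypothetical helper: blank out loadings whose absolute value is below `cutoff`.
mask_small_loadings <- function(fit, cutoff = 0.3) {
  L <- unclass(fit$loadings)   # plain numeric matrix of loadings
  L[abs(L) < cutoff] <- NA     # hide weak loadings
  round(L, 3)
}

# Runs only if `food.fa` exists from an earlier cell.
if (exists("food.fa")) {
  print(mask_small_loadings(food.fa, cutoff = 0.3))
}
```

The built-in print method offers a similar view through `print(food.fa$loadings, cutoff = 0.3)`.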

Note that the communalities of Oil and Hardness are not particularly high, so these two variables are represented less well by the two factors than the others.

In [None]:
loadings(food.fa)
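
`factanal()` applies a varimax rotation by default, so the loadings printed above are rotated. As a sketch, one could refit the model without rotation and check that the communalities do not change, since an orthogonal rotation only redistributes the variance between the factors. `loadings_for` is a hypothetical helper, and the usage assumes the `food` data frame loaded at the top of the notebook.

```r
# Hypothetical helper: fit factanal() with a given rotation and
# return the loadings as a plain matrix.
loadings_for <- function(data, rotation, factors = 2) {
  fit <- factanal(data, factors = factors, rotation = rotation)
  unclass(fit$loadings)
}

# Runs only if `food` exists from an earlier cell.
if (exists("food")) {
  unrotated <- loadings_for(food, "none")
  rotated   <- loadings_for(food, "varimax")
  # Communalities (row sums of squared loadings) under both solutions
  print(round(cbind(none    = rowSums(unrotated^2),
                    varimax = rowSums(rotated^2)), 3))
}
```

The individual loadings differ between the two solutions, which is why the interpretation of the factors depends on the rotation used.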