
# FMZ Exercise 3: **Principal Component Analysis and Multivariate Data Visualization**

---
### Exercise goals
- Explore multivariate datasets graphically
- Recognize relationships and structures across several dimensions
- Perform and interpret a principal component analysis (PCA)
- Apply PCA to real datasets (Swiss banknotes, Iris, Mtcars, USArrests, Penguins, Human Activity Recognition)

---
### Required packages
```R
install.packages(c("MASS", "fmsb", "dplyr", "psych","FactoMineR",
    "factoextra", "corrplot", "mclust", "pracma","aplpack"))
library(MASS)
library(fmsb)
library(dplyr)
library(psych)
library(FactoMineR)
library(factoextra)
library(pracma)
library(corrplot)
library(aplpack)
```
---

## 1. Multivariate Visualization

We start with the **Swiss banknotes** dataset (`banknote` from the `mclust` package).

```R
data(banknote, package="mclust")
banknote_Status <- as.factor(banknote$Status)
banknote <- banknote[, sapply(banknote, is.numeric)]
head(banknote)
summary(banknote)
```


### Tasks

---
1. Scatterplot matrix
   ```R
   pairs(banknote[,1:6], col=banknote_Status, pch=19,
      main="Scatterplot matrix of the Swiss banknotes")
   ```

---
2. Correlation analysis
   ```R
   cor_matrix <- cor(banknote[,1:6])
   round(cor_matrix, 2)
   # mar leaves room at the top so the title is not clipped
   corrplot(cor_matrix, method="color", addCoef.col="black", tl.col="black",
         number.cex=0.7, mar=c(0,0,2,0),
         title="Correlation matrix – Swiss banknotes")
   ```

---
3. Parallel Coordinates Plot
   ```R
   parcoord(banknote[,1:6], col=as.numeric(banknote_Status),
         main="Parallel coordinates plot of the banknotes")
   ```
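
To make explicit what a parallel coordinates plot draws, the following minimal sketch rebuilds one in base R. It assumes min-max scaling of every variable to [0, 1], which is the scaling `MASS::parcoord` applies, and it uses the built-in `iris` data as a stand-in so that it runs without `mclust`. The helper `rescale01` is a name introduced only for this illustration.

```R
# Stand-in data: the four numeric iris measurements (assumption: any numeric
# data frame can be treated the same way).
X <- iris[, 1:4]

# Hypothetical helper: rescale one variable to the range [0, 1].
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))
X01 <- as.data.frame(lapply(X, rescale01))

# One polyline per observation: matplot() draws the columns of its y argument,
# so the scaled data are transposed (variables in rows, observations in columns).
matplot(seq_len(ncol(X01)), t(X01), type = "l", lty = 1,
        col = as.numeric(iris$Species), xaxt = "n",
        xlab = "", ylab = "scaled value",
        main = "Parallel coordinates by hand (iris)")
axis(1, at = seq_len(ncol(X01)), labels = names(X01))
```

Because every axis is rescaled separately, the plot shows relative positions within each variable, not comparable magnitudes across variables.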

---
4. Radar chart
   ```R
   # maxmin=FALSE: axis ranges are derived from the (standardized) data
   radarchart(as.data.frame(scale(banknote)), plwd=1, plty=1, maxmin=FALSE,
           pcol=as.numeric(banknote_Status),
           title="Radar chart – Swiss banknotes")
   ```

---
5. Andrews Plot
   ```R
   andrewsplot(scale(banknote), f=factor(banknote_Status))
   ```
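
An Andrews plot maps each observation to a curve built from a finite Fourier series. The sketch below implements the classic definition, f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + ..., on the built-in `iris` data as a stand-in. The helper `andrews_curve` is hypothetical, and `pracma::andrewsplot` may differ in details such as variable ordering or scaling.

```R
# Hypothetical helper: evaluate the Andrews curve of one observation x
# at the time points tt.
andrews_curve <- function(x, tt) {
  out <- rep(x[1] / sqrt(2), length(tt))
  for (j in seq_along(x)[-1]) {
    k <- j %/% 2                                   # frequency: 1, 1, 2, 2, ...
    basis <- if (j %% 2 == 0) sin(k * tt) else cos(k * tt)
    out <- out + x[j] * basis
  }
  out
}

X <- scale(iris[, 1:4])                            # standardized stand-in data
t_grid <- seq(-pi, pi, length.out = 101)

# apply() returns one column per observation: a 101 x 150 matrix.
curves <- apply(X, 1, andrews_curve, tt = t_grid)

matplot(t_grid, curves, type = "l", lty = 1,
        col = as.numeric(iris$Species),
        xlab = "t", ylab = "f(t)",
        main = "Andrews curves by hand (iris)")
```

Observations with similar values produce curves that stay close together, which is why groups appear as bundles.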

---
6. Chernoff Faces
   ```R
   faces(banknote[c(1:12,101:113),1:6], main="Chernoff faces – 25 banknotes")
   ```






## 2. Principal Component Analysis (PCA)

```R
pca <- prcomp(banknote)
summary(pca)
```
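
As a cross-check of what `prcomp` computes, the following sketch compares it with an eigendecomposition of the sample covariance matrix. It uses the built-in `USArrests` data as a stand-in and, like the call above, centers but does not scale the variables. Eigenvectors are only defined up to sign, so absolute values are compared.

```R
X <- as.matrix(USArrests)          # stand-in data (assumption, not the banknotes)
pca_ref <- prcomp(X)               # centered, not scaled
eig <- eigen(cov(X))               # eigendecomposition of the covariance matrix

# Eigenvalues correspond to the squared standard deviations of the components.
print(all.equal(eig$values, pca_ref$sdev^2))

# Eigenvectors correspond to the loadings (rotation matrix), up to sign.
print(all.equal(abs(eig$vectors), abs(unname(pca_ref$rotation))))
```

Both comparisons should hold up to numerical tolerance; `prcomp` itself works through a singular value decomposition of the centered data rather than through `eigen`.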

### Tasks

---
1. Scree plot & eigenvalues
   ```R
   fviz_eig(pca, addlabels=TRUE)
   ```
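
The quantities behind a scree plot can be derived directly from a `prcomp` object. The sketch below is a base-R illustration on the built-in `USArrests` data (a stand-in); the 90% threshold used to pick the number of components is an arbitrary choice for this example, not a rule from the exercise.

```R
pca_demo <- prcomp(USArrests, scale. = TRUE)

eigenvalues <- pca_demo$sdev^2                 # variance of each component
prop_var <- eigenvalues / sum(eigenvalues)     # share of total variance

var_table <- data.frame(
  PC = paste0("PC", seq_along(eigenvalues)),
  eigenvalue = eigenvalues,
  proportion = prop_var,
  cumulative = cumsum(prop_var)
)
print(var_table)

# Smallest number of components whose cumulative share reaches 90%
# (assumed threshold, chosen only for illustration).
n_keep <- which(cumsum(prop_var) >= 0.9)[1]
print(n_keep)

# Base-R counterpart of the scree plot.
screeplot(pca_demo, type = "lines", main = "Scree plot (USArrests)")
```

With standardized variables the eigenvalues sum to the number of variables, so an eigenvalue above 1 means the component carries more variance than a single original variable.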

---
2. Biplot
   ```R
   fviz_pca_biplot(pca, habillage=banknote_Status, col.var="steelblue", repel=TRUE,
                title="PCA biplot – Swiss banknotes")
   ```

---
3. PCA scores and class separation
   ```R
   fviz_pca_ind(pca, geom="point", habillage=banknote_Status, addEllipses=TRUE, ellipse.level=0.95,
             title="PCA individuals plot with class ellipses")
   ```
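
The scores shown in an individuals plot are the centered data projected onto the loadings. The following sketch reproduces them by hand on the built-in `iris` data (a stand-in for the banknotes) and then summarizes the first two components per group as a rough numerical view of class separation.

```R
X <- as.matrix(iris[, 1:4])        # stand-in data
pca_demo <- prcomp(X)              # centered, not scaled

# Subtract the column means stored by prcomp, then project onto the loadings.
X_centered <- sweep(X, 2, pca_demo$center)
scores <- X_centered %*% pca_demo$rotation

# The manual projection should match the scores stored in the prcomp object.
print(all.equal(scores, pca_demo$x, check.attributes = FALSE))

# Group means on the first two components.
print(aggregate(scores[, 1:2], by = list(Species = iris$Species), FUN = mean))
```

Group means that lie far apart on PC1 or PC2, relative to the spread within each group, suggest that the classes separate well in the plot.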

---
4. Contribution of the variables
   ```R
   fviz_pca_var(pca, col.var="contrib",
             gradient.cols=c("lightblue", "blue", "darkblue"),
             title="Contribution of the variables to the principal components")
   ```
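
Variable contributions can also be approximated by hand. The sketch below assumes the common definition in which the contribution of a variable to one component is its squared loading, expressed as a percentage; how `factoextra` combines contributions across the two plotted axes is not reproduced here. The built-in `USArrests` data serve as a stand-in.

```R
pca_demo <- prcomp(USArrests, scale. = TRUE)

# Each column of the rotation matrix has unit length, so the squared loadings
# of one component sum to 1 and can be read as percentage shares.
contrib <- 100 * pca_demo$rotation^2
print(round(contrib, 1))
print(colSums(contrib))

# Reference value: if all p variables contributed equally, each would
# contribute 100 / p percent.
expected <- 100 / nrow(contrib)
print(contrib[, "PC1"] > expected)
```

Variables above the reference value shape a component more than average and are the natural starting point when naming or interpreting it.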






## 3. Extension: Further Datasets for PCA

---
1. Iris dataset
   ```R
   data(iris)
   head(iris)
   summary(iris)
   iris_pca <- prcomp(iris[,1:4], scale.=TRUE)
   summary(iris_pca)
   fviz_pca_biplot(iris_pca, habillage=iris$Species, repel=TRUE,
             title="PCA – Iris dataset")
   fviz_pca_ind(iris_pca, geom="point",
             habillage=iris$Species,
             addEllipses=TRUE, ellipse.level=0.95,
             title="PCA individuals plot with class ellipses")
   ```
   
---
2. Mtcars dataset
   ```R
   data(mtcars)
   head(mtcars)
   summary(mtcars)
   mtcars_pca <- prcomp(mtcars, scale.=TRUE)
   summary(mtcars_pca)
   fviz_pca_biplot(mtcars_pca, repel=TRUE, habillage=factor(mtcars$cyl),
             title="PCA – Mtcars dataset")
   fviz_pca_ind(mtcars_pca, geom="point",
             habillage=factor(mtcars$cyl),
             addEllipses=TRUE, ellipse.level=0.95,
             title="PCA individuals plot with class ellipses by cylinder count")
   ```
   
---
3. USArrests dataset
   ```R
   data(USArrests)
   head(USArrests)
   summary(USArrests)
   USArrests_pca <- prcomp(USArrests, scale.=TRUE)
   summary(USArrests_pca)
   # Split the murder rate into three equal-width classes for coloring
   group <- cut(USArrests$Murder, breaks = 3, labels = c("Low", "Medium", "High"))
   fviz_pca_biplot(USArrests_pca, repel=TRUE, habillage=group,
            title="PCA – USArrests dataset")
   fviz_pca_ind(USArrests_pca, geom="point",
             habillage=group,
             addEllipses=TRUE, ellipse.level=0.95,
             title="PCA individuals plot with class ellipses by murder rate")
   ```
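
The `scale.=TRUE` argument matters for this dataset because the variables are measured on very different scales. The following sketch compares PCA on the raw and on the standardized `USArrests` data; the object names `pca_raw` and `pca_std` are introduced only for this illustration.

```R
pca_raw <- prcomp(USArrests)                   # centered only
pca_std <- prcomp(USArrests, scale. = TRUE)    # centered and standardized

# The variances of the original variables differ by orders of magnitude.
print(round(sapply(USArrests, var), 1))

# Absolute loadings on the first component, raw versus standardized.
print(round(abs(pca_raw$rotation[, 1]), 3))
print(round(abs(pca_std$rotation[, 1]), 3))

# Share of variance explained by each component in both versions.
prop_raw <- pca_raw$sdev^2 / sum(pca_raw$sdev^2)
prop_std <- pca_std$sdev^2 / sum(pca_std$sdev^2)
print(round(rbind(raw = prop_raw, standardized = prop_std), 3))
```

Without scaling, the variable with the largest variance dominates the first component, so the result mostly reflects measurement units. Standardizing gives every variable the same weight.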
   
---
4. Penguins dataset
   ```R
   # Assumption: `penguins` ships with R >= 4.5.0; on older versions, load it
   # from the palmerpenguins package (not part of the install list above).
   data(penguins)
   head(penguins)
   summary(penguins)
   penguins <- na.omit(penguins)
   penguins_pca <- prcomp(penguins[,3:6], scale.=TRUE) # numeric values only
   summary(penguins_pca)
   fviz_pca_biplot(penguins_pca, habillage=penguins$species,
             title="PCA – Penguins dataset")
   fviz_pca_ind(penguins_pca, geom="point",
             habillage=penguins$species,
             addEllipses=TRUE, ellipse.level=0.95,
             title="PCA individuals plot with class ellipses")
   ```

---
5. Human Activity Recognition Dataset (see https://archive.ics.uci.edu/dataset/240/human+activity+recognition+using+smartphones)

   ```R
   temp <- tempfile()
   # mode="wb" keeps the zip archive intact when downloading on Windows
   download.file("https://archive.ics.uci.edu/ml/machine-learning-databases/00240/UCI%20HAR%20Dataset.zip", temp, mode="wb")
   unzip(temp, exdir="HAR")
   # Feature matrix (one row per time window) and activity codes
   train <- read.table("HAR/UCI HAR Dataset/train/X_train.txt")
   y_train <- read.table("HAR/UCI HAR Dataset/train/y_train.txt")
   activities <- read.table("HAR/UCI HAR Dataset/activity_labels.txt", col.names = c("ID","Activity"))
   # Map the numeric activity codes to their labels
   activity_labels <- factor(y_train$V1, labels = activities$Activity)
   train_pca <- prcomp(train, scale.=TRUE)
   #summary(train_pca)
   fviz_pca_ind(train_pca, geom="point",
          habillage=activity_labels,
          addEllipses=TRUE, ellipse.level=0.95,
          title="PCA – Human Activity Recognition Dataset")
   ```



