This repository contains coursework and project assignments for a Multivariate Analysis course, demonstrating various statistical techniques applied to real-world datasets using R.
The repository includes implementations of multiple multivariate statistical methods including Principal Component Analysis (PCA), Multiple Correspondence Analysis (MCA), Discriminant Analysis, Multidimensional Scaling (MDS), and Clustering techniques. Each analysis is documented in R Markdown files with comprehensive explanations and visualizations.
├── hw1/ # Homework Assignment 1
│ ├── HW1_complete.Rmd # Complete analysis: PCA and MDS on Euroleague data
│ ├── HW1_complete.pdf # Rendered PDF report
│ └── data/ # Euroleague 2023-24 basketball statistics
│
├── hw2/ # Homework Assignment 2
│ ├── HW2_complete.Rmd # Complete analysis: MCA and HCPC on wholesale data
│ ├── HW2_complete.pdf # Rendered PDF report
│ └── data/ # Wholesale customers dataset
│
└── project/ # Final Project
    ├── eda.Rmd # Exploratory Data Analysis
    ├── pca.Rmd # Principal Component Analysis
    ├── mca.Rmd # Multiple Correspondence Analysis
    ├── da.Rmd # Discriminant Analysis
    ├── mds.Rmd # Multidimensional Scaling
    ├── cluster-pca.Rmd # Clustering based on PCA
    └── data/ # Student placement dataset
- Dataset: Euroleague 2023-24 player statistics
- Methods:
  - Principal Component Analysis (PCA) to identify player performance patterns
  - Multidimensional Scaling (MDS) to visualize player similarities
- Key Findings: Identified dimensions related to overall involvement, interior vs. perimeter play, and defensive vs. offensive focus
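
The PCA and MDS steps listed above can be sketched with base R alone. This is a minimal illustration, not the code from HW1_complete.Rmd: it uses the built-in USArrests data as a stand-in for the Euroleague statistics, and the object names (`X`, `pca`, `mds`) are assumptions made for the example.

```r
# Stand-in data: USArrests ships with base R (NOT the Euroleague data).
X <- scale(USArrests)  # center and standardize, since variables use different units

# PCA on the standardized matrix (already centered and scaled above)
pca <- prcomp(X, center = FALSE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))        # share of variance per component
print(round(pca$rotation[, 1:2], 2))  # loadings used to interpret PC1 and PC2

# Classical (metric) MDS on Euclidean distances between observations
d   <- dist(X)
mds <- cmdscale(d, k = 2, eig = TRUE)
head(mds$points)                      # 2D coordinates for plotting similarities
```

With Euclidean distances on centered data, classical MDS reproduces the PCA scores up to a sign flip per axis, so the two views only diverge when a different dissimilarity measure is used.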
- Dataset: Wholesale customer spending data across product categories
- Methods:
  - Multiple Correspondence Analysis (MCA) for categorical variable analysis
  - Hierarchical Clustering on Principal Components (HCPC)
- Key Findings: Identified distinct customer segments based on spending patterns
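
As a rough sketch of the MCA-then-HCPC workflow named above, the example below uses FactoMineR on simulated spending amounts. The data frame `spend`, the helper `to_tercile()`, the tercile binning, and the choice of three clusters are all assumptions made for illustration; the actual preprocessing in HW2_complete.Rmd may differ.

```r
library(FactoMineR)

set.seed(1)
n <- 90
# Hypothetical spending table; the real wholesale data lives in hw2/data/
spend <- data.frame(
  fresh   = rlnorm(n, 9, 1),
  milk    = rlnorm(n, 8, 1),
  grocery = rlnorm(n, 8.5, 1),
  frozen  = rlnorm(n, 7.5, 1)
)

# MCA needs categorical input, so bin each amount into terciles.
# Level names are prefixed with the variable name to keep them unique.
to_tercile <- function(x, prefix) {
  cut(x,
      breaks = quantile(x, probs = c(0, 1/3, 2/3, 1)),
      labels = paste(prefix, c("low", "mid", "high"), sep = "_"),
      include.lowest = TRUE)
}
spend_cat <- spend
for (v in names(spend)) spend_cat[[v]] <- to_tercile(spend[[v]], prefix = v)

mca  <- MCA(spend_cat, ncp = 5, graph = FALSE)   # keep 5 dimensions for clustering
hcpc <- HCPC(mca, nb.clust = 3, graph = FALSE)   # cut the tree into 3 clusters

table(hcpc$data.clust$clust)                     # cluster sizes
```

Setting `nb.clust = 3` fixes the number of clusters for reproducibility; leaving it at the default lets HCPC ask interactively where to cut the tree.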
- Dataset: Campus placement data including academic performance and employability metrics
- Methods:
  - Exploratory Data Analysis (EDA)
  - Principal Component Analysis (PCA)
  - Multiple Correspondence Analysis (MCA)
  - Discriminant Analysis (Linear and Quadratic)
  - K-means Clustering on PCA results
  - Multidimensional Scaling (MDS)
- Objective: Predict student placement outcomes based on academic and demographic factors
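
To make the linear versus quadratic discriminant analysis comparison concrete, here is a minimal sketch using `MASS::lda()` and `MASS::qda()`. It uses two classes of the built-in iris data as a stand-in for the binary placement outcome; the split size and variable names are assumptions, not taken from da.Rmd.

```r
library(MASS)  # ships with standard R installations

set.seed(42)
# Stand-in data: iris restricted to two classes to mimic a binary outcome
dat <- droplevels(subset(iris, Species != "setosa"))

# Simple hold-out split: 70 rows for training, 30 for testing
train_idx <- sample(nrow(dat), size = 70)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# LDA assumes a shared covariance matrix; QDA fits one per class
lda_fit <- lda(Species ~ ., data = train)
qda_fit <- qda(Species ~ ., data = train)

# Hold-out accuracy for each classifier
lda_acc <- mean(predict(lda_fit, test)$class == test$Species)
qda_acc <- mean(predict(qda_fit, test)$class == test$Species)
c(LDA = lda_acc, QDA = qda_acc)
```

QDA is the more flexible of the two but estimates more parameters, so LDA can be preferable when the class covariance matrices look similar or the sample is small.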
- R >= 4.0.0 recommended
# Data manipulation and visualization
install.packages("dplyr")
install.packages("tidyr")
install.packages("ggplot2")
install.packages("here")
# Multivariate analysis
install.packages("FactoMineR")
install.packages("factoextra")
# Statistical methods
install.packages("MASS")
install.packages("cluster")
install.packages("biotools")
install.packages("Hotelling")
install.packages("mvnormtest")
# Additional utilities
install.packages("klaR")
install.packages("DescTools")
install.packages("cowplot")
install.packages("gridExtra")
install.packages("plotly")
install.packages("VIM")
# Document generation
install.packages("rmarkdown")
install.packages("knitr")

To generate PDF reports from R Markdown files:
# Set working directory to repository root
setwd("/path/to/mva")
# Render homework assignments
rmarkdown::render("hw1/HW1_complete.Rmd")
rmarkdown::render("hw2/HW2_complete.Rmd")
# Render project analyses
rmarkdown::render("project/pca.Rmd")
rmarkdown::render("project/da.Rmd")
# ... etc.

Open any .Rmd file in RStudio and:
- Install required packages (see code chunks with install.packages() comments)
- Click "Knit" to generate the PDF report
- Or run code chunks interactively to explore the analyses
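
The individual render calls can also be replaced by a loop over every .Rmd file. The helper below is a hypothetical sketch (`render_all()` is not part of the repository); it assumes the rmarkdown package is installed and that a LaTeX distribution is available for PDF output.

```r
# Hypothetical helper: render every .Rmd file found under `root`.
# `render_fun` defaults to rmarkdown::render and can be swapped for a dry run.
render_all <- function(root = ".", render_fun = rmarkdown::render) {
  rmd_files <- list.files(root, pattern = "\\.Rmd$",
                          recursive = TRUE, full.names = TRUE)
  for (f in rmd_files) {
    message("Rendering ", f)
    render_fun(f)
  }
  invisible(rmd_files)  # return the paths that were processed
}

# Usage, from the repository root:
# render_all(".")
# Dry run that only lists the files:
# render_all(".", render_fun = function(f) NULL)
```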
All datasets are included in the respective data/ directories:
- hw1/data/: Euroleague basketball statistics
- hw2/data/: Wholesale customer data
- project/data/: Student placement data (raw and cleaned versions)
- Comprehensive Documentation: Each analysis includes detailed explanations, statistical tests, and interpretations
- Reproducible Research: All code is self-contained with relative paths using the here package
- Professional Visualizations: High-quality plots using ggplot2, factoextra, and plotly
- Statistical Rigor: Includes assumption checking (normality, homogeneity of variance) and appropriate transformations
- Multiple Perspectives: Each dataset is analyzed using various complementary techniques
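
The assumption checks mentioned above (normality, homogeneity of variance, transformations) can be illustrated with base R tests. This is only a sketch on the built-in iris data and a simulated skewed variable; the repository's own checks may rely on other packages from the install list, such as mvnormtest or biotools.

```r
# Stand-in data: iris (NOT one of the repository datasets)
num_vars <- names(iris)[sapply(iris, is.numeric)]

# Univariate normality: Shapiro-Wilk p-value per variable
shapiro_p <- sapply(iris[num_vars], function(x) shapiro.test(x)$p.value)

# Homogeneity of variance across groups: Bartlett's test per variable
bartlett_p <- sapply(num_vars, function(v) {
  bartlett.test(iris[[v]], iris$Species)$p.value
})

round(rbind(shapiro = shapiro_p, bartlett = bartlett_p), 4)

# Transformation example: a simulated right-skewed variable before and after log
set.seed(7)
skewed <- rlnorm(200)
raw_p  <- shapiro.test(skewed)$p.value       # raw, right-skewed values
log_p  <- shapiro.test(log(skewed))$p.value  # log-transformed values
c(raw = raw_p, log = log_p)
```

Small p-values indicate a departure from the assumption being tested, which is the usual cue to consider a transformation or a method with weaker assumptions.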
- Rebecca Weiss
- Ákos Schneider
- Jonas Grüner
This repository is for educational purposes as part of a university course assignment.