akossch0/mva

Multivariate Analysis (MVA)

This repository contains coursework and project assignments for a Multivariate Analysis course, demonstrating various statistical techniques applied to real-world datasets using R.

Overview

The repository implements several multivariate statistical methods: Principal Component Analysis (PCA), Multiple Correspondence Analysis (MCA), Discriminant Analysis, Multidimensional Scaling (MDS), and clustering techniques. Each analysis is documented in an R Markdown file with explanations and visualizations.

Repository Structure

├── hw1/                    # Homework Assignment 1
│   ├── HW1_complete.Rmd   # Complete analysis: PCA and MDS on Euroleague data
│   ├── HW1_complete.pdf   # Rendered PDF report
│   └── data/              # Euroleague 2023-24 basketball statistics
│
├── hw2/                    # Homework Assignment 2
│   ├── HW2_complete.Rmd   # Complete analysis: MCA and HCPC on wholesale data
│   ├── HW2_complete.pdf   # Rendered PDF report
│   └── data/              # Wholesale customers dataset
│
└── project/                # Final Project
    ├── eda.Rmd            # Exploratory Data Analysis
    ├── pca.Rmd            # Principal Component Analysis
    ├── mca.Rmd            # Multiple Correspondence Analysis
    ├── da.Rmd             # Discriminant Analysis
    ├── mds.Rmd            # Multidimensional Scaling
    ├── cluster-pca.Rmd    # Clustering based on PCA
    └── data/              # Student placement dataset

Analyses Performed

Homework 1: Euroleague Basketball Statistics

  • Dataset: Euroleague 2023-24 player statistics
  • Methods:
    • Principal Component Analysis (PCA) to identify player performance patterns
    • Multidimensional Scaling (MDS) to visualize player similarities
  • Key Findings: Identified dimensions related to overall involvement, interior vs. perimeter play, and defensive vs. offensive focus
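As a rough illustration of how PCA and MDS relate, the sketch below uses only base R (`prcomp` and `cmdscale`) on the built-in `USArrests` data. Both the dataset and the base R functions are assumptions made for this example; the report itself works on the Euroleague data and may use other functions.

```r
# Illustration only: USArrests stands in for the Euroleague statistics.
X <- scale(USArrests)  # standardize, since the variables use different units

# PCA: share of total variance carried by each component
pca <- prcomp(X)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Classical (metric) MDS on Euclidean distances between observations
mds <- cmdscale(dist(X), k = 2)

# With Euclidean distances, classical MDS recovers the PCA scores
# up to the sign of each axis, so absolute values are compared.
same_config <- all.equal(abs(unname(mds)),
                         abs(unname(pca$x[, 1:2])),
                         tolerance = 1e-6)
print(same_config)
```

The two methods only diverge when MDS is run on a non-Euclidean dissimilarity, which is where it adds information beyond PCA.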

Homework 2: Wholesale Customer Segmentation

  • Dataset: Wholesale customer spending data across product categories
  • Methods:
    • Multiple Correspondence Analysis (MCA) for categorical variable analysis
    • Hierarchical Clustering on Principal Components (HCPC)
  • Key Findings: Identified distinct customer segments based on spending patterns
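A minimal sketch of an MCA followed by HCPC with `FactoMineR` might look like the following. It assumes the numeric variables are first discretized into categories. The tercile split, the helper `to_terciles`, and the use of `USArrests` as a stand-in for the wholesale data are all hypothetical choices for this example, not the report's actual preprocessing.

```r
library(FactoMineR)  # MCA() and HCPC()

# Hypothetical helper: cut a numeric variable into low/mid/high terciles.
# Labels carry the variable name so category names stay unique.
to_terciles <- function(x, name) {
  cut(x,
      breaks = quantile(x, probs = c(0, 1/3, 2/3, 1)),
      labels = paste(name, c("low", "mid", "high"), sep = "_"),
      include.lowest = TRUE)
}

# Stand-in data: every column becomes a three-level factor
cat_df <- as.data.frame(Map(to_terciles, USArrests, names(USArrests)))
rownames(cat_df) <- rownames(USArrests)

# MCA on the categorical table, keeping five dimensions
res_mca <- MCA(cat_df, ncp = 5, graph = FALSE)

# Hierarchical clustering on the MCA coordinates;
# nb.clust = -1 cuts the tree at the level FactoMineR suggests
res_hcpc <- HCPC(res_mca, nb.clust = -1, graph = FALSE)
print(table(res_hcpc$data.clust$clust))
```

Inspecting `res_hcpc$desc.var` afterwards shows which categories characterize each cluster.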

Project: Student Placement Prediction

  • Dataset: Campus placement data including academic performance and employability metrics
  • Methods:
    • Exploratory Data Analysis (EDA)
    • Principal Component Analysis (PCA)
    • Multiple Correspondence Analysis (MCA)
    • Discriminant Analysis (Linear and Quadratic)
    • K-means Clustering on PCA results
    • Multidimensional Scaling (MDS)
  • Objective: Predict student placement outcomes based on academic and demographic factors
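The discriminant analysis and the clustering step can be sketched as follows. This is an illustration under stated assumptions: the built-in `iris` data stands in for the placement dataset, and the train/test split, seed, and number of clusters are arbitrary choices for the example.

```r
library(MASS)  # lda() and qda()

set.seed(1)
# iris stands in for the placement data; Species plays the role of the outcome
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test <- iris[-train_idx, ]

# Linear and quadratic discriminant analysis on the same predictors
fit_lda <- lda(Species ~ ., data = train)
fit_qda <- qda(Species ~ ., data = train)

# Share of held-out observations classified correctly
accuracy <- function(fit) mean(predict(fit, test)$class == test$Species)
print(c(lda = accuracy(fit_lda), qda = accuracy(fit_qda)))

# K-means on the first two principal component scores
scores <- prcomp(iris[, 1:4], scale. = TRUE)$x[, 1:2]
km <- kmeans(scores, centers = 3, nstart = 25)
print(table(cluster = km$cluster, species = iris$Species))
```

LDA assumes a common covariance matrix across classes while QDA estimates one per class, so comparing the two on held-out data is a simple check of which assumption fits better.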

Requirements

R Version

  • R >= 4.0.0 recommended

Required R Packages

# Data manipulation and visualization
install.packages("dplyr")
install.packages("tidyr")
install.packages("ggplot2")
install.packages("here")

# Multivariate analysis
install.packages("FactoMineR")
install.packages("factoextra")

# Statistical methods
install.packages("MASS")
install.packages("cluster")
install.packages("biotools")
install.packages("Hotelling")
install.packages("mvnormtest")

# Additional utilities
install.packages("klaR")
install.packages("DescTools")
install.packages("cowplot")
install.packages("gridExtra")
install.packages("plotly")
install.packages("VIM")

# Document generation
install.packages("rmarkdown")
install.packages("knitr")
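To avoid reinstalling packages that are already present, a small helper along these lines can report what is missing first. The function name `missing_packages` is hypothetical, and the install call is left commented out so the snippet has no side effects.

```r
# Packages listed in this README
required <- c("dplyr", "tidyr", "ggplot2", "here",
              "FactoMineR", "factoextra",
              "MASS", "cluster", "biotools", "Hotelling", "mvnormtest",
              "klaR", "DescTools", "cowplot", "gridExtra", "plotly", "VIM",
              "rmarkdown", "knitr")

# Hypothetical helper: names from `pkgs` that are not installed yet
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them
} else {
  message("All required packages are installed.")
}
```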

Usage

Rendering Reports

To generate PDF reports from R Markdown files:

# Set working directory to repository root
setwd("/path/to/mva")

# Render homework assignments
rmarkdown::render("hw1/HW1_complete.Rmd")
rmarkdown::render("hw2/HW2_complete.Rmd")

# Render project analyses
rmarkdown::render("project/pca.Rmd")
rmarkdown::render("project/da.Rmd")
# ... etc.
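Rather than listing every file by hand, all reports in a folder could be rendered in one pass. The helper `render_dir` below is a hypothetical convenience, not part of the repository. It assumes the working directory is the repository root and that `rmarkdown` is installed.

```r
# Hypothetical helper: render every .Rmd file found directly inside `dir`
# and return the paths of the generated output files.
render_dir <- function(dir) {
  rmd_files <- list.files(dir, pattern = "\\.Rmd$", full.names = TRUE)
  vapply(rmd_files,
         function(f) rmarkdown::render(f, quiet = TRUE),
         character(1))
}

# Example call, to be run from the repository root:
# render_dir("project")
```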

Running Individual Analyses

Open any .Rmd file in RStudio and:

  1. Install the required packages (the code chunks list the relevant install.packages() calls as comments)
  2. Click "Knit" to generate the PDF report, or run the code chunks interactively to explore the analyses

Working with the Data

All datasets are included in the respective data/ directories:

  • hw1/data/: Euroleague basketball statistics
  • hw2/data/: Wholesale customer data
  • project/data/: Student placement data (raw and cleaned versions)
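Loading a dataset with a path relative to the repository root might look like the sketch below. The file name `placement.csv` is a placeholder assumed for this example, so it needs to be replaced with the actual file in `project/data/`.

```r
library(here)

# Build a path relative to the project root, independent of the
# current working directory. "placement.csv" is a hypothetical file name.
data_path <- here("project", "data", "placement.csv")

if (file.exists(data_path)) {
  placement <- read.csv(data_path)
  str(placement)
} else {
  message("File not found: ", data_path)
}
```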

Key Features

  • Comprehensive Documentation: Each analysis includes detailed explanations, statistical tests, and interpretations
  • Reproducible Research: All code is self-contained with relative paths using the here package
  • Professional Visualizations: High-quality plots using ggplot2, factoextra, and plotly
  • Statistical Rigor: Includes assumption checking (normality, homogeneity of variance) and appropriate transformations
  • Multiple Perspectives: Each dataset is analyzed using various complementary techniques
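The kind of assumption checking mentioned above can be sketched with base R tests. This example is an assumption-laden illustration: it uses the built-in `iris` data, the univariate Shapiro-Wilk and Bartlett tests, and a log transformation. The reports may instead rely on multivariate versions from the packages listed under Requirements.

```r
# iris stands in for the course datasets
x <- iris$Sepal.Length
g <- iris$Species

# Normality within each group (Shapiro-Wilk p-values)
normality_p <- tapply(x, g, function(v) shapiro.test(v)$p.value)
print(round(normality_p, 3))

# Homogeneity of variance across groups (Bartlett test)
homog <- bartlett.test(x ~ g)
print(homog$p.value)

# Same check after a log transformation, a common variance-stabilizing step
homog_log <- bartlett.test(log(x) ~ g)
print(homog_log$p.value)
```

Small p-values point to a violated assumption, in which case a transformation or a method with weaker assumptions (for example QDA instead of LDA) is worth considering.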

Authors

  • Rebecca Weiss
  • Ákos Schneider
  • Jonas Grüner

License

This repository is for educational purposes as part of a university course assignment.
