# Modeling Methods Overview

This notebook summarizes the analytical methods applied after exploratory analysis,
including regression, clustering, and time-series analysis. The goal is not to reproduce
all modeling steps in detail, but to explain why each method was chosen, what insights it
provided, and what its limitations are in the context of the happiness analysis.

Each method is discussed in relation to the type of data used and the specific analytical question it was intended to address.

## 1. Linear Regression

Linear regression was applied to quantify the relationship between happiness scores and selected socio-economic factors, particularly GDP per capita and social support. This method was chosen because exploratory analysis suggested approximately linear relationships between these variables and overall happiness.

The regression analysis allowed for:
- Estimating the strength and direction of relationships between predictors and happiness scores.
- Assessing how much of the variance in happiness scores each factor could explain (R²).
- Providing an interpretable baseline model for comparison with other analytical approaches.
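The quantities listed above can be illustrated with a minimal sketch of a one-predictor least-squares fit. The data values and the names `gdp`, `happiness`, and `fit_simple_ols` are hypothetical and chosen only for illustration; they are not taken from the happiness dataset or from the notebook's actual modeling code.

```python
# Hypothetical toy data (made up for illustration, not from the dataset):
# a GDP-per-capita index and a happiness score for six imaginary countries.
gdp = [0.6, 0.9, 1.1, 1.3, 1.5, 1.8]
happiness = [4.1, 4.9, 5.4, 5.8, 6.5, 7.0]


def fit_simple_ols(x, y):
    """Return (slope, intercept, r_squared) for the model y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squares of x and cross-products of x and y around their means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


slope, intercept, r_squared = fit_simple_ols(gdp, happiness)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.3f}")
```

The sign and size of the slope describe the direction and strength of the relationship, while R² is the share of variance explained by the single predictor.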

However, linear regression assumes a linear relationship, independent errors, and homoscedasticity (constant error variance), and it does not capture complex or non-linear relationships between variables.
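One informal way to probe the constant-variance assumption is to compare the size of the residuals at low and high values of the predictor. The sketch below is a rough diagnostic under stated assumptions, not a formal test and not the notebook's own procedure; the data and the names `predictor`, `outcome`, and `residual_spread_ratio` are hypothetical.

```python
def ols_residuals(x, y):
    """Residuals from a one-predictor least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def residual_spread_ratio(x, y):
    """Mean squared residual in the upper half of x divided by that in the lower half.

    A ratio far from 1 hints that the error variance changes with x.
    """
    res = ols_residuals(x, y)
    # Order the residuals by the predictor value they belong to.
    ordered = [r for _, r in sorted(zip(x, res))]
    half = len(ordered) // 2
    low, high = ordered[:half], ordered[-half:]
    ms_low = sum(r * r for r in low) / len(low)
    ms_high = sum(r * r for r in high) / len(high)
    return ms_high / ms_low


# Hypothetical data in which the scatter around the line grows with the predictor.
predictor = [1, 2, 3, 4, 5, 6, 7, 8]
outcome = [1.1, 1.9, 3.1, 3.9, 6.0, 5.0, 8.0, 7.0]

ratio = residual_spread_ratio(predictor, outcome)
print(f"upper/lower residual spread ratio: {ratio:.1f}")
```

A residual-versus-fitted plot serves the same purpose visually; the ratio is only a compact numeric stand-in for that plot.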

## 2. K-Means Clustering

K-means clustering was applied to explore whether countries could be grouped into distinct well-being profiles based on multiple socio-economic indicators. This method was chosen to identify patterns that may not be apparent through pairwise or linear analysis alone.

Clustering helped to:
- Group countries with similar happiness and socio-economic characteristics.
- Explore potential development or well-being profiles across countries.
- Complement regression analysis by capturing multivariate structure in the data.

However, K-means requires the number of clusters to be specified in advance and assumes that clusters are roughly spherical and similar in size. As a result, it may oversimplify complex or overlapping country profiles.
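The need to fix the number of clusters up front can be illustrated with a small plain-Python sketch of K-means that reports the within-cluster sum of squares (inertia) for several values of `k`. This is an illustrative implementation under stated assumptions, not the notebook's code: the indicator values are invented, the features are standardized first because K-means relies on Euclidean distance, and the helper names (`standardize`, `kmeans`) are hypothetical.

```python
import random


def standardize(rows):
    """Scale each column to mean 0 and (population) standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]


def kmeans(points, k, n_iter=50, seed=0):
    """Lloyd's algorithm; returns (labels, centers, inertia)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize from k distinct data points
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, inertia


# Hypothetical rows: [GDP index, social support, happiness score] per country.
rows = [
    [0.5, 0.60, 4.0],
    [0.6, 0.65, 4.3],
    [0.7, 0.62, 4.1],
    [1.5, 0.90, 7.0],
    [1.6, 0.92, 7.2],
    [1.4, 0.88, 6.9],
]
points = standardize(rows)

inertia_by_k = {}
for k in (1, 2, 3):
    labels, centers, inertia = kmeans(points, k)
    inertia_by_k[k] = inertia
    print(f"k={k}: inertia={inertia:.3f}, labels={labels}")
```

Plotting inertia against `k` and looking for the point where the decrease levels off (the "elbow") is one common heuristic for choosing `k`; it remains a judgment call rather than a definitive answer.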

## 3. Time-Series Analysis (U.S. Unemployment)

Time-series analysis was conducted on U.S. unemployment data to explore trends and seasonal patterns and to evaluate forecasts. This analysis was included to complement the cross-sectional happiness analysis and to demonstrate how analytical approaches differ when working with time-dependent data.

The time-series analysis focused on:
- Examining long-term trends and seasonal patterns in unemployment rates.
- Testing for stationarity and applying differencing where necessary.
- Evaluating forecasting techniques to project future unemployment values.
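The differencing and forecasting steps above can be sketched in plain Python. The example below is an illustrative sketch, not the notebook's actual procedure: the unemployment values are invented, the lag-1 autocorrelation is used only as an informal indicator of trend (a formal stationarity test, for example an augmented Dickey-Fuller test, is not implemented here), and the random-walk-with-drift forecast is a simple baseline chosen for illustration. The names `rates`, `difference`, `lag1_autocorr`, and `drift_forecast` are hypothetical.

```python
def difference(series, lag=1):
    """Differenced series: value at time t minus value at time t - lag."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def lag1_autocorr(series):
    """Sample autocorrelation at lag 1."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[i] - mean) * (series[i - 1] - mean) for i in range(1, n))
    den = sum((v - mean) ** 2 for v in series)
    return num / den


def drift_forecast(series, steps):
    """Random-walk-with-drift baseline: last value plus the average change per step."""
    drift = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + drift * h for h in range(1, steps + 1)]


# Hypothetical monthly unemployment rates in percent (made up for illustration).
rates = [5.0, 5.1, 5.3, 5.2, 5.4, 5.6, 5.5, 5.7, 5.9, 5.8, 6.0, 6.2]

diffs = difference(rates)
print(f"lag-1 autocorrelation, original:    {lag1_autocorr(rates):.2f}")
print(f"lag-1 autocorrelation, differenced: {lag1_autocorr(diffs):.2f}")
print("3-step drift forecast:", [round(v, 2) for v in drift_forecast(rates, 3)])
```

A strongly positive lag-1 autocorrelation that drops after differencing is consistent with a trend being removed. For seasonal data, calling `difference` with `lag=12` on monthly values would apply seasonal differencing in the same way.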

Unlike the global happiness dataset, which represents a single snapshot in time, the unemployment data required methods that explicitly account for temporal dependence. This highlights the importance of selecting analytical techniques that align with the structure of the data and the research question being addressed.