# World Happiness Dataset - ETL Pipeline

This ETL (Extract, Transform, Load) pipeline processes World Happiness data from five CSV files, one per year from 2015 to 2019. The goal is a clean, consistent dataset ready for analysis, modeling, and visualization.

---

## Pipeline Steps

### Step 1. Extract
- Load all five CSV files (`2015.csv`, `2016.csv`, `2017.csv`, `2018.csv`, `2019.csv`) into separate dataframes.
- Each file contains similar but not identical columns, with some variations in column names and structure.
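
The Extract step can be sketched as follows. This is a minimal example assuming pandas is installed and that the five files sit in one directory; the function name `load_yearly_frames` and the `data_dir` parameter are illustrative, not part of the original pipeline.

```python
from pathlib import Path

import pandas as pd

YEARS = range(2015, 2020)  # 2015 through 2019 inclusive


def load_yearly_frames(data_dir="."):
    """Read each <year>.csv in data_dir into its own dataframe, keyed by year."""
    data_dir = Path(data_dir)
    return {year: pd.read_csv(data_dir / f"{year}.csv") for year in YEARS}
```

Keeping the dataframes in a dictionary keyed by year makes it easy to inspect the differing column sets (for example `frames[2017].columns`) before any renaming.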

### Step 2. Transform
- Standardize column names across all datasets. For example, rename `Country or region` to `Country`, and map `Happiness.Score` and `Score` to `Happiness Score`.
- Handle missing values:
  - Use imputation methods (mean, median, or model-based) or drop columns/rows based on analysis needs.
- Convert columns to appropriate data types (e.g., scores and factor columns to numeric).
- Merge the datasets into a single combined dataframe with a `Year` column indicating the data year.
- Create additional features such as:
  - Year-over-year changes
  - Regional groupings (e.g., continents)
  - Happiness categories (e.g., Low, Medium, High)
- Run quality checks after the transformations to confirm data integrity, for example no duplicate country-year rows and no unexpected missing values.
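
Several of the Transform steps can be sketched together. The example below assumes the column names have already been standardized to `Country` and `Happiness Score`; the function names, the `Score Change` and `Happiness Category` columns, the choice of median imputation, and the tercile cut points are all illustrative assumptions. Regional groupings are left out because they would need an external country-to-region mapping.

```python
import pandas as pd


def combine_years(frames):
    """Stack per-year dataframes and tag each row with its source year."""
    tagged = [df.assign(Year=year) for year, df in sorted(frames.items())]
    return pd.concat(tagged, ignore_index=True)


def transform(frames, score_col="Happiness Score"):
    combined = combine_years(frames)

    # Coerce the score to numeric; values that cannot be parsed become NaN.
    combined[score_col] = pd.to_numeric(combined[score_col], errors="coerce")

    # Median imputation is only one of the options listed above.
    combined[score_col] = combined[score_col].fillna(combined[score_col].median())

    # Change versus the country's previous available year
    # (NaN for the first year in which a country appears).
    combined = combined.sort_values(["Country", "Year"]).reset_index(drop=True)
    combined["Score Change"] = combined.groupby("Country")[score_col].diff()

    # Three equal-sized bins over all rows; the cut points are an assumption.
    combined["Happiness Category"] = pd.qcut(
        combined[score_col], q=3, labels=["Low", "Medium", "High"]
    )

    # Basic integrity check: at most one row per country and year.
    if combined.duplicated(["Country", "Year"]).any():
        raise ValueError("duplicate country-year rows found")
    return combined
```

Calling `transform(frames)` on a `{year: dataframe}` dictionary returns one combined dataframe with `Year`, `Score Change`, and `Happiness Category` columns added.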

### Step 3. Load
- Save the cleaned and transformed dataset into a new CSV file.
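
A minimal sketch of the Load step, assuming pandas. The function name and the output filename `world_happiness_clean.csv` are hypothetical. Passing `index=False` keeps the pandas row index out of the file so that it round-trips cleanly through `pd.read_csv`.

```python
from pathlib import Path

import pandas as pd


def save_clean_dataset(df, out_path="world_happiness_clean.csv"):
    """Write the combined dataframe to CSV without the pandas index column."""
    out_path = Path(out_path)
    # Create the target directory if it does not exist yet.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(out_path, index=False)
    return out_path
```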

---

# World Happiness Dataset - Column Names by Year

### 2015.csv
- Country or region
- Happiness Rank
- Happiness Score
- Standard Error
- Economy (GDP per Capita)
- Family
- Health (Life Expectancy)
- Freedom
- Trust (Government Corruption)
- Generosity
- Dystopia Residual

---

### 2016.csv
- Country
- Happiness Rank
- Happiness Score
- Lower Confidence Interval
- Upper Confidence Interval
- Economy (GDP per Capita)
- Family
- Health (Life Expectancy)
- Freedom
- Trust (Government Corruption)
- Generosity
- Dystopia Residual

---

### 2017.csv
- Country
- Happiness.Rank
- Happiness.Score
- Whisker.high
- Whisker.low
- Economy..GDP.per.Capita.
- Family
- Health..Life.Expectancy.
- Freedom
- Trust..Government.Corruption.
- Generosity
- Dystopia.Residual

---

### 2018.csv
- Country or region
- Overall rank
- Score
- GDP per capita
- Social support
- Healthy life expectancy
- Freedom to make life choices
- Generosity
- Perceptions of corruption

---

### 2019.csv
- Country or region
- Overall rank
- Score
- GDP per capita
- Social support
- Healthy life expectancy
- Freedom to make life choices
- Generosity
- Perceptions of corruption
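
The naming differences listed above can be reconciled with a single rename map. The sketch below assumes pandas; the canonical names on the left are an illustrative choice, while the variants on the right are taken from the per-year lists. Pairing `Family` with `Social support` and `Trust (Government Corruption)` with `Perceptions of corruption` follows the column meanings table in this document. The year-specific uncertainty columns are left unchanged.

```python
import pandas as pd

# Canonical name -> variants seen in the yearly files.
COLUMN_ALIASES = {
    "Country": ["Country or region"],
    "Happiness Rank": ["Happiness.Rank", "Overall rank"],
    "Happiness Score": ["Happiness.Score", "Score"],
    "GDP per Capita": [
        "Economy (GDP per Capita)",
        "Economy..GDP.per.Capita.",
        "GDP per capita",
    ],
    "Social Support": ["Family", "Social support"],
    "Life Expectancy": [
        "Health (Life Expectancy)",
        "Health..Life.Expectancy.",
        "Healthy life expectancy",
    ],
    "Freedom": ["Freedom to make life choices"],
    "Corruption": [
        "Trust (Government Corruption)",
        "Trust..Government.Corruption.",
        "Perceptions of corruption",
    ],
    "Dystopia Residual": ["Dystopia.Residual"],
}

# Invert to a flat {variant: canonical} lookup for DataFrame.rename.
RENAME_MAP = {
    variant: canonical
    for canonical, variants in COLUMN_ALIASES.items()
    for variant in variants
}


def standardize_columns(df):
    """Return a copy of df with known column-name variants renamed."""
    return df.rename(columns=RENAME_MAP)
```

Columns that are not in the map, such as `Generosity`, pass through unchanged, so the same function can be applied to every year's dataframe.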


# World Happiness Dataset - Column Meanings

| Column Name                                   | Meaning / Description                                                                                   |
|----------------------------------------------|-------------------------------------------------------------------------------------------------------|
| **Country or region / Country**               | The name of the country or region being evaluated.                                                    |
| **Happiness Rank / Overall rank**             | The position of the country in the list ranked by happiness score (1 = happiest).                      |
| **Happiness Score / Score**                    | A numerical score representing the overall happiness of the country; higher means happier.            |
| **Standard Error**                             | The standard error of the happiness score estimate, a measure of its uncertainty (2015 only).         |
| **Lower Confidence Interval**                  | The lower bound of the confidence interval for the happiness score, indicating uncertainty (2016 only).|
| **Upper Confidence Interval**                  | The upper bound of the confidence interval for the happiness score (2016 only).                       |
| **Whisker.high / Whisker.low**                 | Similar to confidence intervals, indicating the range of uncertainty in the happiness score (2017 only).|
| **Economy (GDP per Capita) / GDP per capita** | A measure of economic output per person, indicating the wealth level of the country.                   |
| **Family / Social support**                    | The perceived level of social support from family, friends, and community.                            |
| **Health (Life Expectancy) / Healthy life expectancy** | Average expected lifespan or a proxy for health quality in the country.                            |
| **Freedom / Freedom to make life choices**    | The perceived freedom individuals have to make their own life decisions.                             |
| **Trust (Government Corruption) / Perceptions of corruption** | Measure of how much corruption is perceived in government and public institutions.          |
| **Generosity**                                 | A measure of the generosity or charitable behavior in the country’s population.                      |
| **Dystopia Residual**                          | The happiness score of a hypothetical least-happy country (Dystopia) plus the residual, i.e., the part of a country's score not explained by the six factors (2015 to 2017 only). |
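
Since rank 1 means happiest and a higher score means happier, the two columns can be cross-checked within a single year. The sketch below assumes pandas and assumes the rank and score columns have been renamed to `Happiness Rank` and `Happiness Score`; the function name is hypothetical.

```python
import pandas as pd


def rank_mismatches(df, rank_col="Happiness Rank", score_col="Happiness Score"):
    """Return rows whose stored rank differs from the rank implied by the score."""
    # Highest score gets rank 1; tied scores share the lowest rank in the tie.
    implied = df[score_col].rank(method="min", ascending=False)
    return df[df[rank_col] != implied]
```

An empty result means the stored ranks agree with the scores. The check should be run on one year's dataframe at a time, because ranks restart at 1 each year.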






