# Project Template

A guide for applying machine learning to a dataset.

## Step 1: Prepare Project

1. Load libraries
2. Load dataset

In [2]:
# Load libraries
import os

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
import kagglehub

# Load dataset
print("Downloading dataset via KaggleHub...")
path = kagglehub.dataset_download("sevgisarac/temperature-change")
print(f"Dataset path: {path}")

# Find the main CSV file (the 'NOFLAG' variant) and fail clearly if it is missing
csv_files = [f for f in os.listdir(path) if f.endswith('.csv') and 'NOFLAG' in f]
if not csv_files:
    raise FileNotFoundError(f"No NOFLAG CSV file found in {path}")
csv_file = csv_files[0]
file_path = os.path.join(path, csv_file)

# Create a DataFrame for analysis
df = pd.read_csv(file_path, encoding='ISO-8859-1')

# Dimensions of the dataset
print(f"Dataset shape (rows, columns): {df.shape}")
print(f"Number of observations: {df.shape[0]}")
print("Each country has 34 rows. (12 months + 4 quarters + 1 whole meteorological year)")
print("Duplicated for both 'Temperature change' and 'Standard Deviation'")
print(f"Number of variables: {df.shape[1]}")
print("Time period covered: 1961-2019 (59 years)")
print(f"\nVariable names {df.shape[1]} in total:")
for i, column in enumerate(df.columns, start=1):
    print(f"  {i:2d}. {column}")

Downloading dataset via KaggleHub...
Downloading from https://www.kaggle.com/api/v1/datasets/download/sevgisarac/temperature-change?dataset_version_number=4...
Extracting files...
Dataset path: /root/.cache/kagglehub/datasets/sevgisarac/temperature-change/versions/4
Dataset shape (rows, columns): (9656, 66)
Number of observations: 9656
Each country has 34 rows. (12 months + 4 quarters + 1 whole meteorological year)
Duplicated for both 'Temperature change' and 'Standard Deviation'
Number of variables: 66
Time period covered: 1961-2019 (59 years)

Variable names 66 in total:
   1. Area Code
   2. Area
   3. Months Code
   4. Months
   5. Element Code
   6. Element
   7. Unit
   8. Y1961
   9. Y1962
  10. Y1963
  11. Y1964
  12. Y1965
  13. Y1966
  14. Y1967
  15. Y1968
  16. Y1969
  17. Y1970
  18. Y1971
  19. Y1972
  20. Y1973
  21. Y1974
  22. Y1975
  23. Y1976
  24. Y1977
  25. Y1978
  26. Y1979
  27. Y1980
  28. Y1981
  29. Y1982
  30. Y1983
  31. Y1984
  32. Y1985
  33. Y1986
  34. Y1987
  35. Y1988
  36. Y1989
  37. Y1990
  38. Y1991
  39. Y1992
  40. Y1993
  41. Y1994
  42. Y1995
  43. Y1996
  44. Y1997
  45. Y1998
  46. Y1999
  47. Y2000
  48. Y2001
  49. Y2

## Step 2: Define Problem
What is your task? What are your goals? What do you want to achieve?

TASK: Time series analysis and regression modeling of temperature changes comparing Greece against global averages (1961-2019)

GOALS:
1. Extract and analyze Greece's temperature change data (1961-2019)
2. Compare Greece's warming trend with the World average
3. Identify months/seasons with strongest warming in Greece
4. Build regression models to predict future temperature changes
5. Quantify acceleration of warming in recent decades

### PRIMARY TASK
**Time series forecasting of temperature anomalies**  
Predict future temperature changes for **Greece** and the **World** *(2020–2030)*

---

### SECONDARY TASK
**Comparative analysis of warming trends**  
Greece vs Global temperature anomalies *(1961–2019)*

---

### DATA INTERPRETATION
- Temperature values are **ANOMALIES** from the **1951–1980 baseline**
- **Positive values** → warmer than 1951–1980 average
- **Negative values** → cooler than 1951–1980 average
- **0°C** → same temperature as 1951–1980 average

---

### PRIMARY GOALS (FORECASTING)
1. Build regression models to forecast temperature anomalies to **2030**
2. Predict **Greece's** temperature anomalies for the next decade
3. Predict **global** temperature anomalies for the next decade
4. Estimate when Greece might reach specific **warming thresholds**
5. Provide **uncertainty estimates** (confidence intervals) for predictions
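
One way to attach uncertainty to a straight-line forecast is the textbook prediction interval for simple linear regression. The sketch below is an illustration under stated assumptions, not the notebook's actual method: the function name `linear_prediction_interval` is hypothetical, the series is synthetic (not the real data), and a normal quantile stands in for the exact t quantile, so the band is slightly too narrow. Note that it produces a prediction interval for new observations, which is wider than a confidence interval for the trend line itself.

```python
import numpy as np
from statistics import NormalDist


def linear_prediction_interval(x, y, x_new, level=0.95):
    """Return (lower, centre, upper) for new observations under a straight-line fit.

    Uses the simple-linear-regression prediction-interval formula with a
    normal quantile (an approximation to the exact t quantile).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    n = x.size
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (intercept + slope * x)
    s = np.sqrt(np.sum(residuals ** 2) / (n - 2))  # residual standard error
    sxx = np.sum((x - x.mean()) ** 2)
    # Standard error grows with the distance of x_new from the mean of x.
    se = s * np.sqrt(1.0 + 1.0 / n + (x_new - x.mean()) ** 2 / sxx)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    centre = intercept + slope * x_new
    return centre - z * se, centre, centre + z * se


# Synthetic anomaly series (NOT the real data): linear trend plus seeded noise.
rng = np.random.default_rng(1)
years = np.arange(1961, 2020)
anomalies = 0.02 * (years - 1961) - 0.3 + rng.normal(0.0, 0.2, size=years.size)

future = np.arange(2020, 2031)
lower, centre, upper = linear_prediction_interval(years - 1961, anomalies, future - 1961)
for year, lo, mid, hi in zip(future, lower, centre, upper):
    print(f"{year}: {mid:+.2f} °C  [{lo:+.2f}, {hi:+.2f}]")
```

The band widens as the forecast year moves away from the centre of the training period, which is the expected behaviour when extrapolating.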

---

### SECONDARY GOALS (COMPARISON)
1. Compare warming rates: **Greece °C/decade vs World °C/decade**
2. Identify whether the **Mediterranean region (Greece)** is warming faster
3. Analyze **seasonal patterns**: which months warm fastest in Greece
4. Quantify **acceleration of warming**: 1991–2019 vs the 1961–1990 period
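
The warming-rate and acceleration goals above can be sketched as a least-squares slope per period, scaled to °C per decade. This is a minimal illustration: the helper `trend_per_decade` is a hypothetical name, the split year (1990) is an assumption, and the series is synthetic rather than taken from the dataset.

```python
import numpy as np


def trend_per_decade(years, anomalies):
    """Least-squares slope of anomaly vs. year, expressed in °C per decade."""
    slope, _intercept = np.polyfit(years, anomalies, deg=1)
    return slope * 10.0


# Synthetic series (NOT the real data): flat until 1990, then +0.03 °C per year.
years = np.arange(1961, 2020)
anomalies = np.where(years <= 1990, 0.0, 0.03 * (years - 1990))

early = years <= 1990
rate_early = trend_per_decade(years[early], anomalies[early])
rate_late = trend_per_decade(years[~early], anomalies[~early])
acceleration = rate_late - rate_early

print(f"1961-1990 rate: {rate_early:.2f} °C/decade")
print(f"1991-2019 rate: {rate_late:.2f} °C/decade")
print(f"Difference:     {acceleration:.2f} °C/decade")
```

The same function applied to a Greece series and a World series would give the two rates to compare.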

---

### MACHINE LEARNING APPROACH
- **Time Series Regression:** Year → Temperature anomaly
- **Models to test:** Linear, Polynomial, Exponential regression
- **Forecast horizon:** 2020–2030 *(short-term climate projection)*
- **Validation:** Last 10 years *(2010–2019)* as test set
- **Evaluation metrics:** RMSE, MAE, R²
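
The approach listed above can be sketched as a chronological hold-out with linear and polynomial fits scored by RMSE, MAE and R². This is a rough sketch under assumptions: the series is synthetic, `TEST_START` is an assumed cutoff for the hold-out window, and exponential regression is left out because anomalies can be negative and would need a different fitting routine.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


def r2(y_true, y_pred):
    """Coefficient of determination; can be negative on a held-out set."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Synthetic anomaly series (NOT the real data): linear trend plus seeded noise.
rng = np.random.default_rng(0)
years = np.arange(1961, 2020)
anomalies = 0.02 * (years - 1961) - 0.3 + rng.normal(0.0, 0.2, size=years.size)

# Chronological split: the most recent years form the test set (no shuffling).
TEST_START = 2010  # assumed cutoff
test_mask = years >= TEST_START
x_train, y_train = years[~test_mask], anomalies[~test_mask]
x_test, y_test = years[test_mask], anomalies[test_mask]

# Centre the years so higher-degree polynomial fits stay well conditioned.
x0 = x_train.mean()
results = {}
for name, degree in [("linear", 1), ("quadratic", 2)]:
    coeffs = np.polyfit(x_train - x0, y_train, deg=degree)
    y_pred = np.polyval(coeffs, x_test - x0)
    results[name] = {
        "RMSE": rmse(y_test, y_pred),
        "MAE": mae(y_test, y_pred),
        "R2": r2(y_test, y_pred),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

On a short, noisy test window R² can be low or negative even when the trend is captured well, so RMSE and MAE are usually the more informative numbers here.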

---

### EXPECTED OUTCOMES
- Forecasted temperature anomalies for **Greece (2020–2030)**
- Forecasted temperature anomalies for **World (2020–2030)**
- Comparison: **Is Greece warming faster than the global average?**
- Prediction: When might Greece reach **+2°C above the 1951–1980 baseline?**
- Seasonal insights: **Summer vs winter warming patterns in Greece**
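
For the threshold question, a fitted straight line can be solved for the year at which it reaches a given anomaly. The sketch below assumes a linear trend that simply continues, which is a strong assumption for climate data; `year_reaching` is a hypothetical helper and the series is synthetic.

```python
import numpy as np


def year_reaching(threshold, slope, intercept, origin):
    """Year at which intercept + slope * (year - origin) equals threshold.

    Returns None when the fitted trend is flat or cooling.
    """
    if slope <= 0:
        return None
    return origin + (threshold - intercept) / slope


# Synthetic noise-free series (NOT the real data): -0.3 °C in 1961, +0.03 °C per year.
years = np.arange(1961, 2020)
anomalies = -0.3 + 0.03 * (years - 1961)

origin = 1961
slope, intercept = np.polyfit(years - origin, anomalies, deg=1)
crossing = year_reaching(2.0, slope, intercept, origin)
print(f"Fitted trend reaches +2.0 °C around {crossing:.1f}")
```

With real data the answer is sensitive to the fitting period and model form, so it is best reported as a range across models rather than a single year.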


## Step 3: Exploratory Analysis
Understand your data: take a “peek” at your data and answer basic questions about the dataset.
Summarise your data. Explore descriptive statistics and visualisations.

A peek at the data: the first and last three rows.

In [3]:
print("First 3 rows showing structure:")
print(df[['Area', 'Months', 'Element', 'Unit', 'Y1961', 'Y1962', 'Y2019']].head(3))
print("\nLast 3 rows showing structure:")
print(df[['Area', 'Months', 'Element', 'Unit', 'Y1961', 'Y1962', 'Y2019']].tail(3))
print()

First 3 rows showing structure:
          Area    Months             Element Unit  Y1961  Y1962  Y2019
0  Afghanistan   January  Temperature change   °C  0.777  0.062  2.951
1  Afghanistan   January  Standard Deviation   °C  1.950  1.950  1.950
2  Afghanistan  February  Temperature change   °C -1.743  2.465  0.086

Last 3 rows showing structure:
      Area               Months             Element Unit  Y1961  Y1962  Y2019
9653  OECD          SepOctNov  Standard Deviation   °C  0.378  0.378  0.378
9654  OECD  Meteorological year  Temperature change   °C  0.165 -0.009  1.297
9655  OECD  Meteorological year  Standard Deviation   °C  0.260  0.260  0.260



## Step 4: Prepare Data
Data Cleaning/Data Wrangling/Collect more data (if necessary).
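
Because the dataset stores one column per year (`Y1961` … `Y2019`), a common wrangling step is to filter the rows of interest and reshape them into a long Year/Anomaly table. The sketch below uses a tiny stand-in frame with made-up values and the same column layout; the helper name `to_long` is hypothetical, and the exact `Area` labels `"Greece"` and `"World"` are assumptions to be checked against `df['Area'].unique()`.

```python
import pandas as pd

# Tiny stand-in frame with the dataset's layout; the values are made up.
wide = pd.DataFrame({
    "Area": ["Greece", "Greece", "World", "World"],
    "Months": ["Meteorological year"] * 4,
    "Element": ["Temperature change", "Standard Deviation"] * 2,
    "Unit": ["°C"] * 4,
    "Y1961": [0.1, 0.5, 0.2, 0.3],
    "Y1962": [-0.2, 0.5, 0.0, 0.3],
})


def to_long(frame, areas, months="Meteorological year"):
    """Keep 'Temperature change' rows for the given areas and turn year columns into rows."""
    keep = (
        frame["Area"].isin(areas)
        & (frame["Months"] == months)
        & (frame["Element"] == "Temperature change")
    )
    year_cols = [c for c in frame.columns if c.startswith("Y") and c[1:].isdigit()]
    long = frame.loc[keep].melt(
        id_vars=["Area"], value_vars=year_cols, var_name="Year", value_name="Anomaly"
    )
    long["Year"] = long["Year"].str[1:].astype(int)  # 'Y1961' -> 1961
    return long.sort_values(["Area", "Year"]).reset_index(drop=True)


long_df = to_long(wide, ["Greece", "World"])
print(long_df)
```

Calling `to_long(df, ["Greece", "World"])` on the loaded frame would follow the same pattern, provided those labels exist in the data.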

## Step 5: Feature Engineering
Feature selection/feature engineering (as in new features)/data transformations.
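
As an illustration of what new features might look like for an annual anomaly series, the sketch below derives a year offset, a trailing rolling mean and a one-year lag. The column names (`YearsSinceStart`, `Rolling5`, `Lag1`), the window length and the values are all assumptions made for the example.

```python
import pandas as pd

# Made-up annual anomalies, used only to show the transformations.
series = pd.DataFrame({
    "Year": range(1961, 1971),
    "Anomaly": [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, 0.4, 0.1, 0.5, 0.3],
})

features = series.assign(
    # Offset from the first year keeps regression coefficients readable.
    YearsSinceStart=lambda d: d["Year"] - d["Year"].min(),
    # Trailing 5-year mean smooths year-to-year noise; the first 4 rows are NaN.
    Rolling5=lambda d: d["Anomaly"].rolling(window=5).mean(),
    # Previous year's anomaly as a lag feature; the first row is NaN.
    Lag1=lambda d: d["Anomaly"].shift(1),
)
print(features)
```

Rolling and lag features are not known for future years, so a forecast that relies on them has to be produced one step at a time; a plain Year → anomaly regression avoids that complication.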

## Step 6: Algorithm Selection
Select a set of algorithms to apply, select evaluation metrics, and evaluate/compare algorithms.

## Step 7: Model Training
Apply ensembles and improve performance by hyperparameter optimisation.
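
For a polynomial trend model the main hyperparameter is the degree. One hedged way to choose it is expanding-window validation, where each fold trains on the past and tests on the years that follow. The function name `expanding_window_rmse`, the fold count and the horizon below are assumptions, and the data are synthetic.

```python
import numpy as np


def expanding_window_rmse(x, y, degree, n_folds=3, horizon=5):
    """Mean RMSE over folds that train on the past and test on the next `horizon` points."""
    errors = []
    for fold in range(n_folds, 0, -1):
        split = len(x) - fold * horizon
        x_tr, y_tr = x[:split], y[:split]
        x_te, y_te = x[split:split + horizon], y[split:split + horizon]
        coeffs = np.polyfit(x_tr, y_tr, deg=degree)
        pred = np.polyval(coeffs, x_te)
        errors.append(np.sqrt(np.mean((y_te - pred) ** 2)))
    return float(np.mean(errors))


# Synthetic series (NOT the real data): linear trend plus seeded noise.
rng = np.random.default_rng(42)
years = np.arange(1961, 2020)
x = (years - years.mean()) / 10.0  # centred, in decades, for numerical stability
y = 0.2 * x + 0.5 + rng.normal(0.0, 0.15, size=x.size)

scores = {degree: expanding_window_rmse(x, y, degree) for degree in (1, 2, 3)}
best_degree = min(scores, key=scores.get)
print(scores)
print("Selected degree:", best_degree)
```

Higher degrees often fit the training years better but extrapolate poorly, which is what the forward-looking folds are meant to expose.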

## Step 8: Finalise Model
Predictions on validation set, create model from the entire (training) dataset.
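
Once a model form has been chosen, the final step can be sketched as refitting on the full 1961–2019 series and predicting 2020–2030. The example below assumes a linear model and uses a synthetic, noise-free series so the arithmetic is easy to follow; it is not the notebook's actual result.

```python
import numpy as np

# Synthetic noise-free series (NOT the real data): -0.3 °C in 1961, +0.025 °C per year.
years = np.arange(1961, 2020)
anomalies = -0.3 + 0.025 * (years - 1961)

# Refit on every available year, measuring time from the first year.
origin = years[0]
coeffs = np.polyfit(years - origin, anomalies, deg=1)

# Forecast the next decade with the refitted model.
future_years = np.arange(2020, 2031)
forecast = np.polyval(coeffs, future_years - origin)

for year, value in zip(future_years, forecast):
    print(f"{year}: {value:+.3f} °C")
```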