<img src="../Images/DSC_Logo.png" style="width: 400px;">

# Temperature Anomalies

![sky](../Images/temperature.jpg)

*Image modified from Gerd Altmann, Pixabay*

This notebook analyzes monthly global temperature anomalies from 1850 to 2024, expressed relative to the 1901-2000 average. The dataset comes from the NOAA National Centers for Environmental Information and is a key resource for assessing long-term climate trends and variability. Here it is used to demonstrate outliers and data gaps in time series, to analyze stationarity, autocorrelation, and patterns in time series, and to calculate linear trends.


**Original dataset:** NOAA National Centers for Environmental Information: Climate at a Glance: Global Time Series [Data set]. https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/global/time-series, retrieved on August 23, 2024.

In [None]:
%pip install statsmodels

In [None]:
import warnings
warnings.filterwarnings("ignore", category=UserWarning) # Suppress specific warnings

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## **Exercise 1: Time Series Basics**

## 1. Load, Prepare and Plot Time Series Data

**Exercise:** Import the dataset of NOAA global temperature anomalies in **monthly resolution**. The path of the dataset is '../Datasets/NOAA_time_series_monthly.csv'.

**Exercise:** Print the first rows of the dataset to see its structure.

**Exercise:** Convert *Date* into a `datetime` object, set the date column as the index for easy analysis, and check the dataset's structure after the conversion.
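The loading and conversion steps could be sketched as follows. An in-memory CSV stands in for the real file, and both the column name `Anomaly` and the `YYYYMM` date format are assumptions; check the actual file header before reusing this.

```python
import io

import pandas as pd

# Hypothetical stand-in for '../Datasets/NOAA_time_series_monthly.csv'.
# Column names and the YYYYMM date format are assumptions, values are made up.
sample_csv = io.StringIO("Date,Anomaly\n185001,-0.46\n185002,-0.21\n185003,-0.22\n")


def load_anomalies(source, date_col="Date", date_format="%Y%m"):
    """Read the CSV, parse the date column and use it as a sorted index."""
    df = pd.read_csv(source)
    df[date_col] = pd.to_datetime(df[date_col].astype(str), format=date_format)
    return df.set_index(date_col).sort_index()


df = load_anomalies(sample_csv)
print(df.head())
df.info()
```

For the real dataset, pass the file path instead of `sample_csv` and adjust `date_format` if the dates are stored differently.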

**Exercise:** Print summary statistics of the time series.

**Exercise:** Plot the time series.

**Exercise:** In addition, create a bar plot showing median monthly anomalies.
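A minimal sketch of the summary statistics and both plots, using a small synthetic series in place of the NOAA data. The series name `Anomaly` is an assumption.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the anomaly series (made-up values, not NOAA data).
idx = pd.date_range("2000-01-01", periods=36, freq="MS")
anomaly = pd.Series(np.tile(np.arange(12) / 10, 3), index=idx, name="Anomaly")

print(anomaly.describe())

fig, (ax_line, ax_bar) = plt.subplots(2, 1, figsize=(8, 6))
anomaly.plot(ax=ax_line, title="Anomaly over time")

# Group by calendar month (1..12) and take the median of each group.
monthly_median = anomaly.groupby(anomaly.index.month).median()
monthly_median.plot(kind="bar", ax=ax_bar, title="Median anomaly by month")
fig.tight_layout()
```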

## **Exercise 2: Exploration of Time Series Features**

## 2. Time Series Decomposition and Evaluation

Investigating the time series features of global temperature anomalies is essential for understanding the Earth's climate dynamics. Analyzing trends allows us to identify long-term warming patterns. Examining seasonal patterns helps differentiate natural climatic variations from anthropogenic influences. Additionally, studying the residuals, or the remaining values after removing trends and seasonality, helps uncover and study any anomalies or outliers.

**Exercise:** Decompose the time series into the components trend, seasonal, and residual using an additive model from the `statsmodels` library. Are the patterns as you expected?

**Exercise:** Did the decomposition effectively capture the underlying structure of the time series, accurately separating the trend and seasonal components from the random fluctuations? Plot the ACF (using the `statsmodels` library) with 48 lags to check for any remaining autocorrelation in the residuals. Optionally, test different numbers of lags to see how the information you gain from the ACF plot changes. In addition, create a lag plot using the `pandas` library.

**Exercise:** Conducting a white noise test like the Ljung-Box test would provide additional confidence that the residuals are uncorrelated and the decomposition effectively captured the time series structure. Calculate the Ljung-Box statistics using the `statsmodels` library.

**Exercise:** Further assess the stationarity of the residuals, as a check on the decomposition, by examining their rolling mean and standard deviation. Use a rolling window of 12 months: it spans one full annual cycle, so any leftover seasonal variation averages out within each window. Plot these statistics alongside the residuals to evaluate their stationarity visually.
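The rolling statistics can be sketched with `pandas` alone. The residuals below are synthetic noise, used only to make the example runnable.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the decomposition residuals.
rng = np.random.default_rng(3)
idx = pd.date_range("1990-01-01", periods=120, freq="MS")
residuals = pd.Series(rng.normal(0, 0.1, 120), index=idx)

# The first 11 values are NaN because a full 12-month window is not yet available.
rolling_mean = residuals.rolling(window=12).mean()
rolling_std = residuals.rolling(window=12).std()

fig, ax = plt.subplots(figsize=(10, 4))
residuals.plot(ax=ax, label="Residuals", alpha=0.5)
rolling_mean.plot(ax=ax, label="Rolling mean (12 months)")
rolling_std.plot(ax=ax, label="Rolling std (12 months)")
ax.legend()
```

For stationary residuals, both rolling curves should stay roughly flat over time.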

**Exercise:** Explore seasonality using the `month_plot` from the `statsmodels` library. This plot can be used to check for any recurring patterns or trends across different months over several years.

## **Exercise 3: Time Series Model ARIMA**

## 3. Investigate the Time Series for Stationarity and Differencing

**Exercise:** Do you expect the temperature anomaly time series to be stationary? Calculate the KPSS statistic with `kpss` from the `statsmodels` library to test it.

**Exercise:** Difference the time series with a lag of 1 and plot the resulting time series.
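First-order differencing is a single `pandas` call. The tiny series below is made up so that the result is easy to check by hand.

```python
import pandas as pd

# Made-up values; each step grows by 0.1 more than the previous one.
idx = pd.date_range("1990-01-01", periods=6, freq="MS")
series = pd.Series([0.0, 0.1, 0.3, 0.6, 1.0, 1.5], index=idx)

# diff(1) gives x[t] - x[t-1]; the first value has no predecessor and is NaN.
differenced = series.diff(periods=1).dropna()
ax = differenced.plot(title="First difference")
print(differenced)
```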

**Exercise:** Plot the ACF and PACF plots of the differenced time series and calculate the KPSS statistics to check again for stationarity.

## 4. ARIMA

In [None]:
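from statsmodels.tsa.arima.model import ARIMA


def choose_model(x, max_p, max_q, ctrl=1.03):
    """Fit ARIMA(p, 0, q) for all p <= max_p and q <= max_q; return the fit with the lowest AIC.

    `ctrl` is not used by the search and is kept only so existing calls keep working.
    """
    best_aic = np.inf
    best_order = None
    best_mdl = None

    for p in range(max_p + 1):
        for q in range(max_q + 1):
            # Skip the trivial model without AR or MA terms
            if p == 0 and q == 0:
                continue
            try:
                # d = 0: the series is expected to be differenced beforehand
                model = ARIMA(x, order=(p, 0, q))
                results = model.fit()
                aic = results.aic
                # Keep the model with the lowest AIC seen so far
                if aic < best_aic:
                    best_aic = aic
                    best_order = (p, 0, q)
                    best_mdl = results
            except Exception as e:
                print(f"Model fitting failed for order ({p},0,{q}) with error: {e}")
                continue

    print(f"Best ARIMA model order: {best_order} with AIC: {best_aic}")
    return best_mdl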

**Exercise:** Use the `choose_model` function to identify the best ARIMA model for the time series. Explore a range of autoregressive (p) and moving average (q) parameters. Analyze the output to assess the most suitable ARIMA model parameters for the temperature anomalies.

Note: Running the loop to find the best ARIMA model for the temperature anomalies can take a considerable amount of time because of the length of the time series. Therefore, choose a rather small range of parameters (e.g. 0 to 3).

**Exercise:** Based on your model selection, specify the optimal ARIMA parameters and fit an `ARIMA` model with these parameters to the differenced temperature anomalies.

**Exercise:** Analyze whether the ARIMA model fits the data well. Ideally, the residuals resemble white noise, implying that the model has captured all systematic patterns in the data. Analyze the residuals from the ARIMA model using the following: ACF, lag plot, Ljung-Box statistics, Q-Q plot, and a normality test.

**Exercise:** Regardless of how well the model fits, use the fitted ARIMA model to predict temperature anomalies for the period from January 2000 to December 2025 to get a feeling for the workflow.