<img src="../Images/DSC_Logo.png" style="width: 400px;">

# Temperature Anomalies

![sky](../Images/temperature.jpg)

*Image modified from Gerd Altmann, Pixabay*

The objective of this notebook is to analyze and forecast a dataset of monthly global temperature anomalies from 1850 to 2024, measured relative to the 1901-2000 average. The data is sourced from the NOAA National Centers for Environmental Information.

**Original dataset:** NOAA National Centers for Environmental information: Climate at a Glance: Global Time Series [Data set]. https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/global/time-series, retrieved on August 23, 2024.

## **Exercise 1: Time Series Basics**

## 1. Load, Prepare and Plot Time Series Data

**Exercise:** Import the dataset of NOAA global temperature anomalies in monthly resolution. The path of the dataset is '../Datasets/NOAA_time_series_monthly.csv'. Print the first rows of the dataset to see its structure.
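A minimal sketch of the import step. So that it runs on its own, it reads a small in-memory CSV instead of the real file; the `Anomaly` column name, the date format and all values are assumptions for illustration, not the NOAA record. In the notebook, pass the file path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for '../Datasets/NOAA_time_series_monthly.csv'.
# Column name 'Anomaly', the date format and the values are assumptions.
csv_text = """Date,Anomaly
1850-01-01,-0.5
1850-02-01,-0.3
1850-03-01,-0.4
1850-04-01,-0.4
1850-05-01,-0.3
1850-06-01,-0.2
"""

# In the notebook:
# global_temp = pd.read_csv('../Datasets/NOAA_time_series_monthly.csv')
global_temp = pd.read_csv(io.StringIO(csv_text))

# First rows and the column types pandas inferred ('Date' is still plain text)
print(global_temp.head())
print(global_temp.dtypes)
```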

**Exercise:** Convert the 'Date' column into a `datetime` object, set it as the index for easier analysis, and check the structure of the dataset after the conversion.
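One way to do the conversion with pandas, sketched on a placeholder frame with the column layout assumed above (text dates such as `1850-01-01`). If the file instead stores NOAA-style `YYYYMM` integers such as `185001`, an explicit `format="%Y%m"` is needed, as noted in the comment.

```python
import pandas as pd

# Placeholder frame with the assumed layout; values are illustrative only
global_temp = pd.DataFrame({
    "Date": ["1850-01-01", "1850-02-01", "1850-03-01"],
    "Anomaly": [-0.5, -0.3, -0.4],
})

# Parse text dates; for YYYYMM integers such as 185001 use
# pd.to_datetime(global_temp["Date"].astype(str), format="%Y%m") instead
global_temp["Date"] = pd.to_datetime(global_temp["Date"])

# Dates as index enable time-based selection such as global_temp.loc["1850"]
global_temp = global_temp.set_index("Date")

# Structure after conversion: a DatetimeIndex plus one float column
global_temp.info()
```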

**Exercise:** Print summary statistics of the time series.
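Summary statistics are a single `describe()` call on the anomaly column. The sketch below uses a synthetic 24-month series in place of the NOAA data, so the printed numbers carry no meaning beyond illustrating the output.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series standing in for the anomalies (not real data)
idx = pd.date_range("1850-01-01", periods=24, freq="MS")
global_temp = pd.DataFrame({"Anomaly": np.linspace(-0.5, 0.5, 24)}, index=idx)

# count, mean, std, min, quartiles and max of the anomaly column
summary = global_temp["Anomaly"].describe()
print(summary)
```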

**Exercise:** Plot the full time series and a subset of the time series covering only a few years to examine short-term patterns and variations in greater detail. Additionally, create bar plots to display the median temperature anomalies for each month over both the full time series and the time series subset, highlighting typical seasonal effects and recurring patterns.
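A sketch of the four plots, assuming the anomalies are a monthly pandas Series with a `DatetimeIndex`. The data here is synthetic (trend plus annual cycle plus noise), and the 2015-2019 window is an arbitrary example of a "few years" subset. Grouping by `index.month` gives one median per calendar month.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in: trend + annual cycle + noise (not the NOAA record)
rng = np.random.default_rng(0)
idx = pd.date_range("1850-01-01", "2024-12-01", freq="MS")
t = np.arange(len(idx))
anomaly = pd.Series(
    -0.4 + 0.0005 * t + 0.05 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.1, len(t)),
    index=idx,
    name="Anomaly",
)

# Example subset covering a few years
subset = anomaly.loc["2015":"2019"]

# Median anomaly per calendar month (1 = January ... 12 = December)
median_full = anomaly.groupby(anomaly.index.month).median()
median_subset = subset.groupby(subset.index.month).median()

fig, axes = plt.subplots(2, 2, figsize=(12, 7))
anomaly.plot(ax=axes[0, 0], title="Full series")
subset.plot(ax=axes[0, 1], title="Subset 2015-2019")
median_full.plot.bar(ax=axes[1, 0], title="Monthly median, full series")
median_subset.plot.bar(ax=axes[1, 1], title="Monthly median, subset")
for ax in axes[1]:
    ax.set_xlabel("Month")
fig.tight_layout()
```

On the full record, the long-term trend dominates each monthly median, so the seasonal differences are usually easier to see in the subset.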

## **Exercise 2: Exploration of Time Series Features**

## 2. Time Series Components

**Exercise:** Decompose the time series into trend, seasonal, and residual components using an additive model from the `statsmodels` library. Also decompose the subset of data that you plotted in Exercise 1. Are the patterns what you would expect? Would you expect the series to be stationary?

**Exercise:** Does the decomposition effectively capture the underlying structure of the time series by accurately separating the trend and seasonal components from random fluctuations? Use the `statsmodels` library to check the residuals for stationarity, test for normality, and assess their overall white noise behavior.

## **Exercise 3: Time Series Model ARIMA**

## 3. Stationarity Test and Differencing

**Exercise:** Calculate the KPSS statistic with the `kpss` function from `statsmodels` to test for stationarity.

**Exercise:** Because the null hypothesis of the KPSS test is stationarity, the small p-value means that the global temperature anomalies time series is likely non-stationary. Stationary time series models therefore require differencing the series first. Difference the time series with a lag of 1, i.e. subtract the previous observation from each observation, to remove the trend and stabilize the mean. Plot the resulting time series.
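First differencing is `Series.diff(periods=1)`, which computes $y_t - y_{t-1}$. The sketch applies it to the same kind of synthetic trending series used above. A linear trend with slope $a$ turns into a constant mean of roughly $a$ after differencing, which is why the differenced series no longer drifts.

```python
import numpy as np
import pandas as pd

# Synthetic trending series (not NOAA data): slope 0.003 per month + noise
rng = np.random.default_rng(2)
idx = pd.date_range("1975-01-01", periods=600, freq="MS")
t = np.arange(len(idx))
series = pd.Series(0.003 * t + rng.normal(0, 0.1, len(t)), index=idx)

# y_t - y_{t-1}; the first value has no predecessor and becomes NaN
series_diff = series.diff(periods=1).dropna()

series_diff.plot(title="Differenced series (lag 1)")
```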

**Exercise:** Plot the ACF and PACF of the differenced time series and calculate the KPSS statistic again to check for stationarity.

## 4. ARIMA

```python
# Set the frequency of the index to avoid a warning when fitting the model
global_temp_diff.index = global_temp_diff.index.to_period('M').to_timestamp()
```

**Exercise:** Use the `tsa.arma_order_select_ic` function from the `statsmodels` library to identify the best ARMA orders for the differenced time series, which correspond to an ARIMA(p, 1, q) model of the original series. Explore a range of autoregressive (p) and moving average (q) orders, and compare information criteria such as AIC and BIC to determine the best parameters for the data.

Note: Searching for the best model in the temperature anomalies time series can take considerable time because of its length. Therefore, choose a small range of parameters (e.g., 0 to 3).

**Exercise:** Based on your model selection with the differenced time series, pass the optimal orders to the `ARIMA` class and fit the model on the original data, using its built-in differencing (d = 1). Print the model summary.

**Exercise:** Analyze whether the ARIMA model fits the data well by examining its residuals for remaining autocorrelation and normality.

While the lack of significant autocorrelation at most lags and a scattered lag plot are positive signs, the normality of the residuals is rejected. Let's see what the forecast looks like...

**Exercise:** Plot the fitted values and the out-of-sample forecast for 5 years.

## **Exercise 4: Smoothed Time Series and Trend**

## 5. Overall Trend in Temperature Anomalies

Global temperature anomalies can be displayed together with the smoothed time series (based on different filtering techniques) and the linear trend line on the [NOAA website](https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/global/time-series).

**Exercise:** Smooth the original time series with a 5-year (60-month) rolling mean, calculate the trend using linear regression (use 'numerical_index' as the time component), and retrieve the average warming in °C per decade.

```python
# Convert the datetime index to a numerical format to avoid type conflicts in trend calculation
numerical_index = np.arange(len(global_temp))
```
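A sketch of the smoothing and trend steps on a synthetic series with a known slope of 0.0005 °C per month (not NOAA data). Because `numerical_index` counts months, the regression slope is in °C per month, and multiplying by 120 months converts it to °C per decade. Using `np.polyfit` for the linear regression is one choice among several; `scipy.stats.linregress` would give the same slope.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in: 0.0005 °C/month trend plus noise (not NOAA data)
rng = np.random.default_rng(6)
idx = pd.date_range("1850-01-01", "2024-12-01", freq="MS")
numerical_index = np.arange(len(idx))  # 0, 1, 2, ... one step per month
global_temp = pd.Series(-0.4 + 0.0005 * numerical_index + rng.normal(0, 0.15, len(idx)), index=idx)

# 5-year smoothing: centred rolling mean over 60 monthly values
smoothed = global_temp.rolling(window=60, center=True).mean()

# Least-squares line; for deg=1 np.polyfit returns [slope, intercept]
slope, intercept = np.polyfit(numerical_index, global_temp.to_numpy(), deg=1)
trend = pd.Series(intercept + slope * numerical_index, index=idx)

# Slope is °C per month; 120 months = 1 decade
warming_per_decade = slope * 120
print(f"Average warming: {warming_per_decade:.3f} °C per decade")

fig, ax = plt.subplots(figsize=(12, 5))
global_temp.plot(ax=ax, alpha=0.4, label="monthly anomaly")
smoothed.plot(ax=ax, label="5-year mean")
trend.plot(ax=ax, label="linear trend")
ax.legend()
```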