<img src="../Images/DSC_Logo.png" style="width: 400px;">

# Time Series Concepts in Python 

This notebook explores foundational time series programming concepts in Python.

In [None]:
import numpy as np
import pandas as pd
import math
import statistics as stat
import random
from datetime import datetime
import xarray as xr
import py7zr
import matplotlib.pyplot as plt
import cartopy.crs as ccrs

## 1. Create time series in Python

## 1.1 Python `datetime`

Python's built-in `datetime` module represents dates and times as `datetime` objects. Its `strftime` method formats a `datetime` object into a readable string based on format codes. Common format codes include:

- `%Y`: year with century (e.g., 2023)
- `%m`: month as a zero-padded decimal (01 to 12)
- `%d`: day of the month as a zero-padded decimal (01 to 31)
- `%H`: hour (24-hour clock) as a zero-padded decimal (00 to 23)
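The format codes can be tried out with a short sketch. The date below is an arbitrary value chosen for illustration; `strptime` is the counterpart of `strftime` and parses a string back into a `datetime` object using the same codes.

```python
from datetime import datetime

# Arbitrary example date (chosen for illustration only)
example = datetime(2023, 7, 4, 15, 30)

# Format the datetime object as a string using format codes
iso_like = example.strftime('%Y-%m-%d %H:%M')
german_style = example.strftime('%d.%m.%Y')
print(iso_like)       # 2023-07-04 15:30
print(german_style)   # 04.07.2023

# Parse a string back into a datetime object with the same codes
parsed = datetime.strptime('04.07.2023', '%d.%m.%Y')
print(parsed)         # 2023-07-04 00:00:00
```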

In [None]:
my_year = 2020
my_month = 1
my_day = 23
# my_hour = ...

my_date = datetime(my_year,my_month,my_day)
print(my_date)

In [None]:
print(my_date.day)

Arithmetic with `datetime` objects: subtracting two dates yields a `timedelta`.

In [None]:
nowday = datetime.today()
diff = nowday - my_date
print(diff.days)
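Because `datetime.today()` changes from run to run, a reproducible variant may be easier to check by hand. The following sketch uses two fixed dates (the end date is an arbitrary choice) and also shows the reverse operation, adding a `timedelta` to a `datetime`.

```python
from datetime import datetime, timedelta

start = datetime(2020, 1, 23)
end = datetime(2020, 3, 9)      # arbitrary second date for illustration

# Subtracting two datetime objects yields a timedelta
gap = end - start
print(gap.days)                 # 46 (2020 is a leap year)

# Adding a timedelta to a datetime yields a new datetime
one_week_later = start + timedelta(weeks=1)
print(one_week_later)           # 2020-01-30 00:00:00
```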

## 1.2. `datetime64` and `timedelta64` in NumPy

In [None]:
np.array(['2020-03-09','2020-03-10','2020-03-11','2020-03-12','2020-03-13','2020-03-14'], dtype='datetime64')
# or:
np.arange('2020-03-09','2020-03-15', dtype='datetime64[D]')

In [None]:
# Create a timedelta of 5 days
np.timedelta64(5, 'D')
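As a minimal sketch of how the two NumPy types combine: adding a `timedelta64` to a `datetime64` array shifts every element, and subtracting two `datetime64` values returns a `timedelta64`. The variable names are illustrative.

```python
import numpy as np

# Six consecutive days, 2020-03-09 to 2020-03-14 (the stop date is exclusive)
days = np.arange('2020-03-09', '2020-03-15', dtype='datetime64[D]')

# Adding a timedelta64 shifts every date in the array
shifted = days + np.timedelta64(5, 'D')
print(shifted[0])               # 2020-03-14

# Subtracting two datetime64 values gives a timedelta64
span = days[-1] - days[0]
print(span)                     # 5 days
```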

## 1.3. `datetime64` and `timedelta64` in Pandas

Pandas has a `DatetimeIndex` for time series data, which allows for easy indexing and slicing of dates:

In [None]:
pd.date_range('2020-03-09', periods=6, freq='D')
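The `freq` argument controls the spacing of the generated index. As a small sketch, assuming the long-standing aliases `'MS'` (month start) and `'W-MON'` (weekly, anchored on Mondays):

```python
import pandas as pd

# First day of three consecutive months
monthly = pd.date_range('2020-01-01', periods=3, freq='MS')
print(monthly)

# Three consecutive Mondays (2020-03-09 is a Monday)
mondays = pd.date_range('2020-03-09', periods=3, freq='W-MON')
print(mondays)
```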

Pandas utilizes `Timestamp` objects to represent specific points in time (built on top of NumPy's `datetime64` data type).  

Convert a string representation of a date into a Pandas `Timestamp` object:

In [None]:
pd.to_datetime('2020-09-03', format='%Y-%m-%d')

**Exercise:** How can a date string in the format `15/10/2020` be converted into a `Timestamp` object?

In [None]:
pd.to_datetime('15/10/2020', format='%d/%m/%Y')
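Stating the format explicitly matters most when the string is ambiguous. In the sketch below, the string `03/04/2020` (a made-up example) is parsed once as day/month and once as month/day, giving two different dates.

```python
import pandas as pd

ambiguous = '03/04/2020'   # made-up example string

# Interpreted as day/month/year: 3 April 2020
day_first = pd.to_datetime(ambiguous, format='%d/%m/%Y')

# Interpreted as month/day/year: 4 March 2020
month_first = pd.to_datetime(ambiguous, format='%m/%d/%Y')

print(day_first.month)     # 4
print(month_first.month)   # 3
```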

## 1.4. Create a Pandas time series dataframe

In [None]:
data = np.random.randn(6,2) # create a 2-dimensional array of random numbers
cols = ['A','B']
idx = pd.date_range('2020-03-09', periods=6, freq='D')
df = pd.DataFrame(data, index=idx, columns=cols)
print(df)

In [None]:
df.plot()

**Exercise:** Create a Pandas time series dataframe that contains decadal data and plot the time series. Refer to [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects) for a full list of frequency strings.

In [None]:
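data = np.random.randn(10,2) # 10 decadal values for 2 columns
cols = ['A','B']
# '10YE' = every 10th year end; keep the range within the pandas Timestamp limit (year 2262)
idx = pd.date_range('2020-03-09', periods=10, freq='10YE')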
df2 = pd.DataFrame(data, index=idx, columns=cols)
print(df2)
df2.plot()

## 1.5 Interchange date column and index in dataframes

Copying the `DatetimeIndex` into a regular column (or moving a date column into the index) gives access to both index-based slicing and column-based operations such as the `.dt` accessor.

In [None]:
df['Date'] = df.index
print(df)
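Instead of copying the index, it can also be moved back and forth. A minimal sketch on a small stand-in DataFrame (`demo` and its values are illustrative), using `reset_index` and `set_index`:

```python
import pandas as pd

idx = pd.date_range('2020-03-09', periods=3, freq='D')
demo = pd.DataFrame({'A': [1.0, 2.0, 3.0]}, index=idx)

# Index -> column: name the index, then turn it into a regular column
as_column = demo.rename_axis('Date').reset_index()
print(as_column.columns.tolist())        # ['Date', 'A']

# Column -> index: make the 'Date' column the index again
as_index = as_column.set_index('Date')
print(type(as_index.index).__name__)     # DatetimeIndex
```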

## 1.6 Derive DateTime elements

In [None]:
df['Month'] = df['Date'].dt.month
print(df)

In [None]:
df['Month'] = df.index.month
print(df)
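Other date components are available in the same two ways: through the `.dt` accessor on a datetime column, or as attributes of the `DatetimeIndex`. A short sketch with an illustrative DataFrame:

```python
import pandas as pd

idx = pd.date_range('2020-03-09', periods=3, freq='D')
demo = pd.DataFrame({'Date': idx}, index=idx)

# Via the .dt accessor of a datetime column
demo['Weekday'] = demo['Date'].dt.day_name()

# Via an attribute of the DatetimeIndex
demo['DayOfYear'] = demo.index.dayofyear

print(demo[['Weekday', 'DayOfYear']])
```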

**Exercise:** How could we define a new column with the years?

In [None]:
df['Year'] = df['Date'].dt.year
print(df)

## 2. Basic time series operations

## 2.1. Resampling - changing the frequency of time series data

In [None]:
# Weekly means (weeks ending on Monday) of the numeric columns only
df[['A', 'B']].resample('W-MON').mean()
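How the weekly bins are formed is easier to see with deterministic data. In this sketch (values 0 to 13 are illustrative), `'W-MON'` creates bins that end on a Monday and include that Monday, so the first bin contains only the starting day.

```python
import numpy as np
import pandas as pd

# 14 daily values 0..13, starting on Monday 2020-03-09
idx = pd.date_range('2020-03-09', periods=14, freq='D')
s = pd.Series(np.arange(14, dtype=float), index=idx)

# Each bin is labeled by the Monday that closes it
weekly = s.resample('W-MON').mean()
print(weekly)
# 2020-03-09     0.0   (only 03-09)
# 2020-03-16     4.0   (03-10 to 03-16, values 1..7)
# 2020-03-23    10.5   (03-17 to 03-22, values 8..13)
```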

## 2.2 Shifting the data points in a time series forward (or backward) in time

In [None]:
df.shift(1)
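A typical use of shifting is to compare each value with the previous one. The sketch below uses made-up values; a positive shift moves values forward in time and leaves a `NaN` at the start, a negative shift does the opposite.

```python
import pandas as pd

idx = pd.date_range('2020-03-09', periods=4, freq='D')
s = pd.Series([10.0, 12.0, 15.0, 11.0], index=idx)  # made-up values

lagged = s.shift(1)      # previous day's value; first entry becomes NaN
lead = s.shift(-1)       # next day's value; last entry becomes NaN
change = s - lagged      # day-to-day change, the same result as s.diff()

print(pd.DataFrame({'value': s, 'lagged': lagged, 'change': change}))
```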

## 2.3 Time-based slicing

In [None]:
df['2020-03-10':'2020-03-12']
# or
df[(df['Date'] >= '2020-03-10') & (df['Date'] <= '2020-03-12')]
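With a `DatetimeIndex`, `.loc` also accepts partial date strings, and label-based slices include both endpoints. A brief sketch with an illustrative series spanning the end of February 2020:

```python
import pandas as pd

# Six days from 2020-02-27 to 2020-03-03 (2020 is a leap year)
idx = pd.date_range('2020-02-27', periods=6, freq='D')
s = pd.Series(range(6), index=idx)

# Partial string: everything in March 2020
march = s.loc['2020-03']
print(march.tolist())     # [3, 4, 5]

# Label-based slice: both endpoints are included
window = s.loc['2020-02-28':'2020-03-01']
print(window.tolist())    # [1, 2, 3]
```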

## 3. Import & analyze - temperature time series
![sky](../Images/temperature.jpg)

*Image modified from Gerd Altmann, Pixabay*

**Original datasets:**

NOAA National Centers for Environmental information: Climate at a Glance: Global Time Series [Data set]. https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/global/time-series, retrieved on August 23, 2024.

Hersbach, H., Bell, B., Berrisford, P., Biavati, G., Horányi, A., Muñoz Sabater, J., Nicolas, J., Peubey, C., Radu, R., Rozum, I., Schepers, D., Simmons, A., Soci, C., Dee, D., Thépaut, J-N. (2023): ERA5 hourly data on single levels from 1940 to present. Copernicus Climate Change Service (C3S) Climate Data Store (CDS), DOI: 10.24381/cds.adbb2d47,  retrieved on August 27, 2024.

## 3.1. Pandas for csv files

Import data from csv:

In [None]:
path = '../Datasets/NOAA_time_series.csv' 
df = pd.read_csv(path, skiprows=4, delimiter=',')
print(df.head())
print(df.describe())

Setting the time variable as index:

In [None]:
df.set_index('Year', inplace=True)
print(df.head())

Plot:

In [None]:
df.plot()

More plots ...

Boxplot:

In [None]:
# Create a new column for decade
df['Decade'] = (df.index // 10) * 10

# Prepare data for boxplot
decade_data = [df[df['Decade'] == decade]['Anomaly'].values for decade in df['Decade'].unique()]

# Create boxplot using matplotlib
plt.figure()
plt.boxplot(decade_data, labels=df['Decade'].unique())
plt.title('Decade Boxplot of Anomalies')
plt.xlabel('Decade')
plt.ylabel('Anomaly')
plt.grid(axis='y')
plt.xticks(rotation=45)
plt.show()

Bar plot:

In [None]:
# Calculate the average (mean) anomaly per decade
average_anomalies = df.groupby('Decade')['Anomaly'].mean()

# Create a bar plot using matplotlib
plt.figure()
average_anomalies.plot(kind='bar', color='skyblue')
plt.title('Average Anomalies by Decade')
plt.xlabel('Decade')
plt.ylabel('Average Anomaly')
plt.xticks(rotation=45)
plt.grid(axis='y')
plt.show()
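The decade label comes from integer division of the year by 10. The following sketch uses hypothetical values (not taken from the NOAA file) to show the grouping step and how mean and median per decade can differ.

```python
import pandas as pd

# Hypothetical values for illustration, not from the NOAA dataset
demo = pd.DataFrame(
    {'Anomaly': [1.0, 3.0, 2.0, 4.0, 6.0, 12.0]},
    index=pd.Index([1998, 1999, 2000, 2001, 2002, 2003], name='Year'),
)

# 1998 // 10 * 10 -> 1990, 2003 // 10 * 10 -> 2000
demo['Decade'] = (demo.index // 10) * 10

# Aggregate each decade with two statistics at once
per_decade = demo.groupby('Decade')['Anomaly'].agg(['mean', 'median'])
print(per_decade)
```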

## 3.2. xarray for netCDF

xarray plays a role for netCDF files similar to that of pandas for csv files: it is a library for handling and analyzing labeled multi-dimensional arrays and is commonly used for gridded time series data.

Load a dataset from ERA5 reanalysis: 

If the .7z archive has not been unpacked yet, extract it first with the `py7zr` library.

In [None]:
with py7zr.SevenZipFile('../Datasets/ERA5_snippet.7z', mode='r', password='secret') as archive:
    archive.extractall(path='../Datasets/')

In [None]:
ERA5 = xr.open_dataset('../Datasets/ERA5_snippet.nc')

Investigate:

In [None]:
print(ERA5)

In [None]:
print(ERA5['tp'])

In [None]:
print(ERA5['t2m'])

Mathematical operations:

In [None]:
ERA5['t2m'] = ERA5['t2m'] - 273.15 # convert K to °C
print(ERA5['t2m'])

Resample:

In [None]:
ERA5 = ERA5.resample(time='1D').mean()  # Daily mean
print(ERA5)

Select a specific date:

In [None]:
print(ERA5['t2m'][0,:,:]) # select by index
# or 
print(ERA5['t2m'].sel(time='2023-01-01')) # select by time component

Plot with `matplotlib`:

In [None]:
ERA5['t2m'][0,:,:].plot(cmap='viridis')

Plot with `cartopy` & `matplotlib`:

In [None]:
plt.figure()
ax = plt.axes(projection=ccrs.PlateCarree())

# Use the plot method for quick visualization
ERA5['t2m'][0,:,:].plot(ax=ax, cmap='viridis', add_colorbar=True)

# Add coastlines and grid lines
ax.coastlines()
ax.gridlines(draw_labels=True)

plt.title('Daily mean 2-meter temperature [°C] on 2023-01-01 over Germany')
plt.show()

### Extracting 1D time series from a single grid cell:

Select data for Bremen:

In [None]:
bremen = ERA5.sel(longitude=8.808, latitude=53.075, method='nearest')
print(bremen)
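The same steps (unit conversion, daily resampling, nearest-neighbour selection) can be tried without the ERA5 file. The sketch below builds a small synthetic dataset; the grid, the variable values, and the name `ds` are assumptions for illustration, and only the coordinate and variable names mirror the ERA5 snippet.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in: 48 hourly steps on a 3 x 3 grid (all values hypothetical)
time = pd.date_range('2023-01-01', periods=48, freq=pd.Timedelta(hours=1))
lat = np.array([54.0, 53.0, 52.0])   # descending, as in ERA5
lon = np.array([8.0, 9.0, 10.0])

values = np.empty((48, 3, 3))
values[:24] = 280.0                  # day 1 in K
values[24:] = 282.0                  # day 2 in K

ds = xr.Dataset(
    {'t2m': (('time', 'latitude', 'longitude'), values)},
    coords={'time': time, 'latitude': lat, 'longitude': lon},
)

ds['t2m'] = ds['t2m'] - 273.15               # convert K to °C
daily = ds.resample(time='1D').mean()        # daily mean

# method='nearest' picks the grid cell closest to the requested point
cell = daily.sel(longitude=8.808, latitude=53.075, method='nearest')
print(float(cell['longitude']), float(cell['latitude']))   # 9.0 53.0
print(cell['t2m'].values)
```

The selected cell is the grid point at 9.0°E, 53.0°N, which is the closest one to the requested coordinates on this coarse grid.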

Plot:

In [None]:
bremen['t2m'].plot()
plt.title('Daily mean temperature for Bremen in the year 2023')
plt.ylabel('Temperature (°C)')
plt.xlabel('Date')
plt.grid()
plt.show()

**Exercise:** Select and plot the temperature time series for another region of your choice in Germany.

In [None]:
berlin = ERA5.sel(longitude=13.405, latitude=52.520, method='nearest')
print(berlin)

In [None]:
berlin['t2m'].plot()
plt.title('Daily mean temperature for Berlin in the year 2023')
plt.ylabel('Temperature (°C)')
plt.xlabel('Date')
plt.grid()
plt.show()