<img src="../Images/DSC_Logo.png" style="width: 400px;">

# Time Series Concepts in Python 

Basic Python packages for data handling:

In [None]:
import numpy as np
import pandas as pd

Standard-library modules for mathematics, statistics, and random numbers:

In [None]:
import math
import statistics as stat
import random

Matplotlib - the basic library for plotting:

In [None]:
import matplotlib.pyplot as plt

Standard-library module for working with dates and times:

In [None]:
from datetime import datetime

## 1. Create time series in Python

## 1.1. Create DateTime object using datetime

In [None]:
my_year = 2020
my_month = 1
my_day = 23
#my_hour = 13
#my_min = 30
#my_sec = 15
my_date = datetime(my_year,my_month,my_day)

In [None]:
my_date

In [None]:
my_date.day

Arithmetic with datetime objects (here: the number of days between today and `my_date`):

In [None]:
nowday = datetime.today()
diff = nowday - my_date
diff.days
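Subtracting two `datetime` objects gives a `timedelta`. The sketch below uses two fixed dates, chosen only for illustration, so that the result does not depend on the day the notebook is run:

```python
from datetime import datetime, timedelta

# Two fixed, illustrative dates (chosen here so the result is reproducible)
start = datetime(2020, 1, 23)
end = datetime(2020, 3, 9)

# Subtracting two datetime objects gives a timedelta
delta = end - start
print(delta.days)  # 46

# Adding a timedelta to a datetime gives a new datetime
one_week_later = start + timedelta(days=7)
print(one_week_later.strftime('%Y-%m-%d'))  # 2020-01-30
```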

## 1.2. Create DateTime object using NumPy

In [None]:
np.array(['2020-03-09','2020-03-10','2020-03-11','2020-03-12','2020-03-13','2020-03-14'], dtype='datetime64')
# or:
np.arange('2020-03-09','2020-03-15', dtype='datetime64[D]')
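NumPy `datetime64` arrays also support arithmetic. A minimal sketch, reusing the date range from above:

```python
import numpy as np

dates = np.arange('2020-03-09', '2020-03-15', dtype='datetime64[D]')

# The end date is exclusive, so the array holds six days
print(len(dates))  # 6

# Differences between datetime64 values are timedelta64 values
span = dates[-1] - dates[0]
print(span)  # 5 days

# Adding a timedelta64 shifts every date in the array
shifted = dates + np.timedelta64(7, 'D')
print(shifted[0])  # 2020-03-16
```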

## 1.3. Create DateTime object using Pandas

In [None]:
pd.date_range('2020-03-09', periods=6, freq='D')

Specify the format explicitly (here the string is read as year-day-month, i.e. 9 March 2020):

In [None]:
pd.to_datetime('2020-09-03', format='%Y-%d-%m')
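The `format` argument matters because the same string can be read in different ways. As a small sketch, compare the default parsing with the explicit format used above:

```python
import pandas as pd

# Without a format, an ISO-like string is read as year-month-day
default = pd.to_datetime('2020-09-03')

# With format='%Y-%d-%m', the same string is read as year-day-month
explicit = pd.to_datetime('2020-09-03', format='%Y-%d-%m')

print(default.month, default.day)    # 9 3
print(explicit.month, explicit.day)  # 3 9
```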

**Exercise:** Test how the following format could be converted into a DateTime object: 15/10/2020

## 1.4. Create a Pandas time series dataframe

In [None]:
data = np.random.randn(6,2) # create some random data
data

In [None]:
cols = ['A','B']
idx = pd.date_range('2020-03-09', periods=6, freq='D')
df = pd.DataFrame(data, index=idx, columns=cols)
df

In [None]:
df.plot()

**Exercise:** Create a Pandas time series dataframe that contains decadal data.

## 1.5. Interchange date column and index in dataframes

In [None]:
data = np.random.randn(100,2) # create some random data
cols = ['A','B']
idx = pd.date_range('2020-03-09', periods=100, freq='D')
df = pd.DataFrame(data, index=idx, columns=cols)

df['Date'] = df.index
df
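Copying the index into a column is one option. Another possible approach, sketched here on a small illustrative dataframe named `demo` (not part of the notebook data), is to move the dates between index and column with `reset_index` and `set_index`:

```python
import pandas as pd

idx = pd.date_range('2020-03-09', periods=3, freq='D')
demo = pd.DataFrame({'A': [1.0, 2.0, 3.0]}, index=idx)
demo.index.name = 'Date'

# Move the DatetimeIndex into a regular column
as_column = demo.reset_index()
print(list(as_column.columns))  # ['Date', 'A']

# Move the column back into the index
as_index = as_column.set_index('Date')
print(as_index.index.equals(demo.index))  # True
```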

## 1.6. Derive DateTime elements

In [None]:
df['Month'] = df['Date'].dt.month
df

In [None]:
df['Month'] = df.index.month
df

**Exercise:** How could we define a new column with the years?

## 2. Basic time series operations

## 2.1. Resampling - changing the frequency of time series data

In [None]:
df.resample('W-MON').mean()
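With random data it is hard to see how the weekly bins are formed. The following sketch uses a small series of consecutive integers (an illustrative assumption, not the notebook data) to show that `'W-MON'` creates weekly bins that end on, and are labelled by, a Monday:

```python
import pandas as pd

# 14 daily values, starting on Monday 2020-03-09
idx = pd.date_range('2020-03-09', periods=14, freq='D')
daily = pd.Series(range(14), index=idx)

# 'W-MON' bins run from Tuesday through the following Monday
weekly = daily.resample('W-MON').sum()

print(weekly.index[0].date())  # 2020-03-09
print(weekly.tolist())         # [0, 28, 63]
```

The first bin contains only the starting Monday, which is why its sum is 0. Replacing `.sum()` with `.mean()` gives the weekly averages, as in the cell above.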

## 2.2. Shifting the data points in a time series forward (or backward) in time

In [None]:
df.shift(1)
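A typical use of `shift` is to compare each value with the previous one. A minimal sketch on a short illustrative series:

```python
import pandas as pd

idx = pd.date_range('2020-03-09', periods=4, freq='D')
s = pd.Series([10.0, 12.0, 9.0, 15.0], index=idx)

# shift(1) moves each value one step forward; the first entry becomes NaN
previous = s.shift(1)

# Day-over-day change: today's value minus yesterday's
change = s - previous
print(change.tolist())  # [nan, 2.0, -3.0, 6.0]

# shift(-1) moves the values backward, leaving NaN at the end
print(s.shift(-1).tolist())  # [12.0, 9.0, 15.0, nan]
```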

## 2.3. Time-based slicing

In [None]:
df.loc['2020-03-10':'2020-03-12']  # label-based slice; both end dates are included
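Date strings can also be partial. As a sketch on an illustrative series covering the same 100 days as the dataframe above:

```python
import pandas as pd

idx = pd.date_range('2020-03-09', periods=100, freq='D')
s = pd.Series(range(100), index=idx)

# Both endpoints are included when slicing by date labels
window = s.loc['2020-03-10':'2020-03-12']
print(len(window))  # 3

# A partial date string selects everything within that period
april = s.loc['2020-04']
print(len(april))  # 30
```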

## 3. Import & analyze - temperature time series
![sky](../Images/temperature.jpg)

*Image modified from Gerd Altmann, Pixabay*

## 3.1. Pandas for csv files

Import data from csv:

In [None]:
path = '../Datasets/NOAA_time_series.csv' 
df = pd.read_csv(path, skiprows=4, delimiter=',')

In [None]:
df.head()
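The `skiprows` argument skips metadata lines that precede the actual table. The sketch below demonstrates this on an in-memory string; its content, including the four header lines and the values, is invented for illustration and is not the NOAA file:

```python
import io
import pandas as pd

# Invented file content: four metadata lines, then the actual table
raw = """# Title: example anomalies
# Units: degrees Celsius
# Base period: illustrative only
# Missing: -999
Year,Anomaly
1990,0.10
1991,0.20
1992,0.15
"""

# Skip the four metadata lines; the fifth line becomes the header
demo = pd.read_csv(io.StringIO(raw), skiprows=4)
print(list(demo.columns))  # ['Year', 'Anomaly']
print(demo.shape)          # (3, 2)
```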

Summary statistics:

In [None]:
df.describe()

Setting the time variable as index:

In [None]:
df.set_index('Year', inplace=True) # Set the 'Year' column as the index

In [None]:
df.head()

Plot:

In [None]:
df.plot()

### More plots ...

### Boxplot:

In [None]:
# Create a new column for decade
df['Decade'] = (df.index // 10) * 10

# Prepare data for boxplot
decade_data = [df[df['Decade'] == decade]['Anomaly'].values for decade in df['Decade'].unique()]

# Create boxplot using matplotlib
plt.figure()
plt.boxplot(decade_data, labels=df['Decade'].unique())
plt.title('Decade Boxplot of Anomalies')
plt.xlabel('Decade')
plt.ylabel('Anomaly')
plt.grid(axis='y')
plt.xticks(rotation=45)
plt.show()

### Bar plot:

In [None]:
# Calculate the average anomaly per decade
average_anomalies = df.groupby('Decade')['Anomaly'].mean()

# Create a bar plot using matplotlib
plt.figure()
average_anomalies.plot(kind='bar', color='skyblue')
plt.title('Average Anomalies by Decade')
plt.xlabel('Decade')
plt.ylabel('Average Anomaly')
plt.xticks(rotation=45)
plt.grid(axis='y')
plt.show()
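To see how the decade column and the grouping work, here is a small sketch with invented anomaly values (not the NOAA data). It also shows that the mean and the median of a decade can differ:

```python
import pandas as pd

# Invented anomaly values indexed by year
demo = pd.DataFrame(
    {'Anomaly': [0.1, 0.2, 0.9, 0.3, 0.5]},
    index=pd.Index([1988, 1989, 1990, 1991, 1992], name='Year'),
)

# Integer division by 10 drops the last digit; multiplying by 10 gives the decade
demo['Decade'] = (demo.index // 10) * 10
print(demo['Decade'].tolist())  # [1980, 1980, 1990, 1990, 1990]

# Aggregate each decade with two different statistics
per_decade = demo.groupby('Decade')['Anomaly'].agg(['mean', 'median'])
print(per_decade)
```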

## 3.2. xarray for netCDF

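xarray plays a role for netCDF files similar to the one pandas plays for CSV files: it provides labelled, multi-dimensional arrays. netCDF is a common file format for gridded time series data, such as climate reanalyses.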

In [None]:
%pip install xarray

In [None]:
import xarray as xr

Import a dataset from [ERA5](https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels) reanalysis: 

In [None]:
ERA5 = xr.open_dataset('../Datasets/ERA5_snippet.nc')

Investigate:

In [None]:
ERA5

In [None]:
ERA5['tp']

In [None]:
ERA5['t2m']

Mathematical operations:

In [None]:
# Convert 2 m temperature from Kelvin to degrees Celsius
ERA5['t2m'] = ERA5['t2m'] - 273.15

In [None]:
ERA5

Resample:

In [None]:
ERA5 = ERA5.resample(time='1D').mean()  # Daily mean

In [None]:
ERA5

Select a specific date:

In [None]:
ERA5['t2m'][0,:,:] # select by index: first entry

In [None]:
ERA5['t2m'].sel(time='2023-07-01') # select by date
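The same steps (unit conversion, resampling, and label-based selection) can be tried without the ERA5 file. The sketch below builds a small synthetic dataset; the grid, the coordinates, and the temperature values are assumptions made for illustration only:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a reanalysis file: 48 hourly steps on a 2 x 2 grid
time = pd.date_range('2023-07-01', periods=48, freq='h')
temps_k = np.full((48, 2, 2), 293.15)
temps_k[24:] = 295.15  # the second day is 2 K warmer

ds = xr.Dataset(
    {'t2m': (('time', 'latitude', 'longitude'), temps_k)},
    coords={'time': time, 'latitude': [53.0, 53.25], 'longitude': [8.75, 9.0]},
)

ds['t2m'] = ds['t2m'] - 273.15         # Kelvin to degrees Celsius
daily = ds.resample(time='1D').mean()  # hourly values to daily means

# Pick the grid cell closest to the requested coordinates
cell = daily['t2m'].sel(latitude=53.075, longitude=8.808, method='nearest')
print(cell.values)  # approximately [20. 22.]
```

With `method='nearest'`, the requested coordinates do not have to match a grid point exactly; xarray returns the closest one.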

Plot with matplotlib:

In [None]:
ERA5['t2m'][0,:,:].plot()

Plotting on a map using Cartopy & Matplotlib:

In [None]:
%pip install cartopy

In [None]:
import matplotlib.pyplot as plt
import cartopy.crs as ccrs

In [None]:
plt.figure(figsize=(10, 8))
ax = plt.axes(projection=ccrs.PlateCarree())

# Use the plot method for quick visualization
ERA5['t2m'][0,:,:].plot(ax=ax, cmap='coolwarm', add_colorbar=True)

# Add coastlines and grid lines
ax.coastlines()
ax.gridlines(draw_labels=True)

plt.title('Daily mean 2-meter temperature [°C] on 2023-01-01 over Germany')
plt.show()

### Temperature time series for Bremen (single grid cell):

Select data:

In [None]:
# Select the grid cell nearest to Bremen (longitude = 8.808, latitude = 53.075)
bremen = ERA5.sel(longitude=8.808, latitude=53.075, method='nearest')

In [None]:
bremen

Plot:

In [None]:
plt.figure(figsize=(10, 5))
bremen['t2m'].plot()
plt.title('Daily mean temperature for Bremen in the year 2023')
plt.ylabel('Temperature (°C)')
plt.xlabel('Date')
plt.grid()
plt.show()

**Exercise:** Select and plot the temperature time series for another region of your choice in Germany.