# Automating Frequency and Pattern Analysis: FFT, Wavelets, and OOP

At first glance, frequency analysis and object-oriented programming might not seem closely related, and that is partly true. In this notebook we combine them: automated frequency analyses, namely the Fourier transform and wavelet analysis, are wrapped in a class structure so that the same calculations can be applied to any of our datasets.

Let’s start by loading the SST data, as we did before.

Run the next cells.


In [None]:
# import necessary libraries

import requests

import xarray as xr
import numpy as np
from scipy.signal import detrend
import pycwt as wavelet
from scipy.stats import chi2

# Visualization
from matplotlib import pyplot as plt

import holoviews as hv

import hvplot.xarray
hv.extension('bokeh')






import warnings
warnings.filterwarnings("ignore")

#### Load SST (again) 

The next cell loads the data via OPeNDAP if they are not stored locally.
Note: If many users access the data at the same time, the OPeNDAP server may respond slowly. To avoid this, download the dataset and work with the local copy if you have not already done so.

In [None]:
#run the cell if data are not stored locally
#load SST data

url = 'http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/noaa.ersst.v5/sst.mnmean.nc'
ds_sst = xr.open_dataset(url)
ds_sst

I stored the data in [../Data/SST](../Data/SST/). Make sure to adjust the path and filename to your setup.

In [None]:
## load SST data from local storage
filename = '../Data/SST/sst.mnmean.nc' 
ds_sst = xr.open_dataset(filename)
ds_sst

#### Load Wind

We also load wind data; these are the CMEMS wind data that we've previously worked with. However, this time they start from the year 2000 and are gridded on a 1° x 1° grid.

**Question**: Do you remember how data can be resampled?



The wind data can be downloaded directly from the cloud: [Wind Data](https://cloud.hcu-hamburg.de/nextcloud/s/GFzRHQaZPKxmZjY), or alternatively, by running the next cell.


In [None]:
## Define the cloud link
url = 'https://cloud.hcu-hamburg.de/nextcloud/s/GFzRHQaZPKxmZjY/download' 

## Download the file
filename = '../Data/Wind/cmems_wind_coarse.nc'
response = requests.get(url)
response.raise_for_status()  # stop here if the download failed

## Save the file locally
with open(filename, 'wb') as file:
    file.write(response.content)

## Now open the file using xarray
ds_wind = xr.open_dataset(filename, use_cftime=True)
ds_wind

# if it's already stored locally you only need this:
#ds_wind = xr.open_dataset('../Data/Wind/cmems_wind_coarse.nc',use_cftime=True) # change path if necessary
#ds_wind


**1. Exercise:** We want to slice the SST data to match the time period of the wind data. Please perform this operation in the next cell. You can also load the data into memory using `load()`.

In [None]:
# your code here
ds_sst24a = ...  # the sliced dataset is used under the name ds_sst24a later in this notebook


We need to calculate the detrended anomalies again. Do you remember how?

**2. Exercise**:  
Calculate the detrended anomalies of the sliced SST data. We already covered this in Session 2 and you have done this in your homework. If you're unsure how to proceed, revisit the notebooks, copy the relevant code snippets, and make the necessary modifications.
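As a reminder of the idea (not the Session 2 solution itself), the two steps can be illustrated on a synthetic monthly series with plain NumPy. The series, its amplitudes, and all variable names below are made up for illustration; for the real data the same logic is applied along the time dimension of the SST field.

```python
import numpy as np

# Synthetic monthly series: mean + seasonal cycle + linear trend + noise (all values invented)
n_years = 10
t = np.arange(12 * n_years)
rng = np.random.default_rng(1)
series = 20.0 + 2.0 * np.sin(2 * np.pi * t / 12) + 0.01 * t + 0.1 * rng.standard_normal(t.size)

# Step 1: monthly climatology (mean of all Januaries, all Februaries, ...)
clim = series.reshape(n_years, 12).mean(axis=0)

# Step 2: anomalies = series minus the climatology of the matching month
anom = series - np.tile(clim, n_years)

# Step 3: remove the linear trend that is still present in the anomalies
slope, intercept = np.polyfit(t, anom, 1)
anom_detrended = anom - (slope * t + intercept)

print("remaining trend:", np.polyfit(t, anom_detrended, 1)[0])
print("std of detrended anomalies:", anom_detrended.std())
```

The seasonal cycle disappears in step 2 and the long-term drift in step 3, so what remains is the variability we want to analyze.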

In [None]:
# your code here 
# (copy/paste from session 2 :-)

sst_clim = ...
sst_anom = ...
sst_anom_detrended = ...

Optional: To verify that your calculations are correct, you can for example plot the data.

In [None]:
#Optional: your plot here. You can maybe plot the std of the anomalies


In the following, we select a region in the eastern central Pacific for further analysis. This region is close to the NINO3.4 region, but shifted to the southeast. This shift allows us to compare the results with wind and Ekman parameters later, which isn't feasible directly on the equator.



We will slice the data again for latitude and longitude, and then calculate the spatial mean to get a time series for the region.

In [None]:
#run the cell
lat_start_sst, lat_end_sst = -5, -20
lon_start_sst, lon_end_sst = 240, 280  # for 0-360 120°W to 80°W
# calculate box-average 
latitudes = sst_anom_detrended['lat']
weights = np.cos(np.deg2rad(latitudes)).where(~sst_anom_detrended[0].isnull()).fillna(0)
sst_box = (sst_anom_detrended).sel(lon=slice(lon_start_sst, lon_end_sst), 
                                   lat=slice(lat_start_sst, lat_end_sst))
sst_box_mean = (sst_box.weighted(weights)).mean(dim=['lon', 'lat'])
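Why weight by the cosine of latitude? Grid cells on a regular latitude-longitude grid get smaller towards the poles, so each cell should count in proportion to its area. The following NumPy sketch shows the effect on a tiny made-up field; the grid, the values, and the variable names are assumptions for illustration only and are not part of the SST data.

```python
import numpy as np

# Tiny invented grid: 4 latitudes x 3 longitudes, with one missing value
lat = np.array([-5.0, -10.0, -15.0, -20.0])
lon = np.array([240.0, 250.0, 260.0])
field = np.array([[1.0, 1.0, np.nan],
                  [2.0, 2.0, 2.0],
                  [3.0, 3.0, 3.0],
                  [4.0, 4.0, 4.0]])

# cos(latitude) weights, broadcast over longitude; missing cells get weight 0
w = np.cos(np.deg2rad(lat))[:, None] * np.ones(lon.size)
w = np.where(np.isnan(field), 0.0, w)

weighted_mean = np.nansum(field * w) / w.sum()
plain_mean = np.nanmean(field)

print("area-weighted mean:", weighted_mean)
print("unweighted mean:   ", plain_mean)
```

In this example the rows closest to the equator carry the largest weights, so the weighted mean is pulled towards their (smaller) values.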


## Fast Fourier Transform (FFT)


The Fast Fourier Transform (FFT) is a tool for identifying the dominant frequencies (or periods) within a signal, here the SST anomalies. By transforming the time series into the frequency domain, we can uncover the main cycles that characterize the variability of the anomalies.

The power spectrum, obtained by squaring the amplitude of the FFT, shows the strength of each frequency present in the time series: the higher the power, the stronger that frequency is in the original signal. To separate the most prominent frequencies from background noise, we set a simple threshold at the mean plus two standard deviations of the power spectrum. This is a heuristic cutoff rather than a formal statistical test.
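To see that this recipe recovers known cycles, here is a minimal sketch on a synthetic series. The two periods (48 and 18 months), the amplitudes, and the noise level are invented for illustration; `np.fft.rfft` is used instead of `np.fft.fft` simply because it returns only the non-negative frequencies.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 288                      # number of monthly samples
t = np.arange(n)

# Synthetic signal: two cycles plus white noise (all values invented)
signal = (np.sin(2 * np.pi * t / 48)
          + 0.7 * np.sin(2 * np.pi * t / 18)
          + 0.3 * rng.standard_normal(n))

spectrum = np.fft.rfft(signal)
freq = np.fft.rfftfreq(n, d=1)       # cycles per month
power = np.abs(spectrum) ** 2

# Drop the zero-frequency bin (infinite period) before converting to periods
freq, power = freq[1:], power[1:]
periods = 1 / freq

# Heuristic threshold: mean + 2 standard deviations of the power spectrum
threshold = power.mean() + 2 * power.std()
significant_periods = periods[power > threshold]
print("Periods above the threshold (months):", significant_periods)
```

Only the two periods that were put into the signal exceed the threshold; the noise stays well below it.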

Execute the following two code cells to begin the analysis.

In [None]:
# run the cell
# Compute the Fast Fourier Transform (FFT) of the sst_box_mean

fft_sst = np.fft.fft(sst_box_mean)

n = len(sst_box_mean) # number of data points
# compute corresponding frequency values for the FFT output
freq = np.fft.fftfreq(n, d=1)  # d is the sampling interval; d=1 means one sample per month
periods = 1 / freq  # periods in months; the zero-frequency bin gives an infinite period

# Compute the Power Spectrum (amplitude squared)
power_sst = np.abs(fft_sst)**2

# Calculate a simple significance threshold, here set at mean + 2 * standard deviation
threshold_sst = np.mean(power_sst) + 2*np.std(power_sst)



# Plot the Power Spectrum against period, with power on a logarithmic scale
valid_periods = np.isfinite(periods[:n // 2]) & (periods[:n // 2] > 0)  # Exclude the infinite period at zero frequency; [:n // 2] already drops the negative frequencies

plt.figure(figsize=(14, 6))

# SST Plot
plt.subplot(1, 1, 1)

plt.semilogy(periods[:n // 2][valid_periods], power_sst[:n // 2][valid_periods])  # Plot only positive frequencies and their power
plt.axhline(y=threshold_sst, color='r', linestyle='--', label='Significance Threshold')
plt.title('FFT Power Spectrum of SST Anomalies')
plt.xlabel('Period (months)')
plt.ylabel('Power (log scale)')
plt.grid(True)  # Adding grid lines
plt.legend()


plt.tight_layout()
plt.show()


This code transforms the time series into the frequency domain using the FFT.  
It identifies the dominant periods, i.e. those whose power exceeds the significance threshold. The plot shows these dominant frequencies in the SST anomaly data, with the x-axis expressed as period rather than frequency.  

It can be challenging to visually identify the dominant cycles directly from the plot. To make this easier, we will print the dominant periods in months. Run the next cell to see the results.

In [None]:
# run the cell
# Find the indices of significant power values
valid_periods = np.isfinite(periods[:n // 2]) & (periods[:n // 2] > 0)  # Exclude the infinite period at zero frequency
significant_sst_indices = np.where(power_sst[:n // 2][valid_periods] > threshold_sst)[0]


# Extract the corresponding significant periods
significant_sst_periods = periods[:n // 2][valid_periods][significant_sst_indices]


# Filter out infinite values from significant_sst_periods
significant_sst_periods = significant_sst_periods[np.isfinite(significant_sst_periods)]

# Print the significant periods of the SST time series
print("Significant SST Periods (months):", significant_sst_periods)


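We identified dominant periods in our data: approximately 144 months (12 years), 96 months, 48 months, 41 months, 28 months, and 18 months. These periods fall in the range of known modes of climate variability (e.g. ENSO on interannual and the PDO on decadal time scales). Keep in mind that the record is only 288 months long, so the longest periods are covered by just two or three full cycles.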
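The values in this list are not arbitrary: an FFT of `n` samples can only return the periods `n / k` for integer `k`. A short sketch, assuming the series has `n = 288` monthly values as in this notebook, lists the first few of them:

```python
import numpy as np

n = 288                     # length of the monthly series (assumed, as used in this notebook)
k = np.arange(1, 17)        # index of the Fourier frequency (number of cycles in the record)
fourier_periods = n / k     # periods in months that the FFT can resolve

for ki, p in zip(k, fourier_periods):
    print(f"k = {ki:2d}  ->  period = {p:6.1f} months")
```

The periods reported above correspond to `k = 2, 3, 6, 7, 10` and `16`. The spacing between neighbouring periods is coarse at the long-period end, which is one reason to treat the exact values of the longest cycles with caution.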


Now that we have identified significant periods in our data, we want to understand how these periods vary over time.

## Wavelet Analysis

This is where Wavelet Analysis comes into play, offering a method to extract both frequency and time information from a signal simultaneously, unlike FFT, which only offers a global frequency spectrum. This makes it especially useful for analyzing non-stationary signals where frequency content changes over time.



### What is Wavelet Analysis?

Wavelet analysis breaks a signal into time and frequency components using localized wave-like functions called wavelets. The Morlet wavelet, often used for climate data, effectively captures oscillatory patterns while balancing time and frequency localization.


This method is particularly valuable for identifying and understanding climatic cycles and their temporal variability.
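To make the idea of time localization concrete, the sketch below applies a hand-written Morlet wavelet to a synthetic signal whose period switches from 12 to 36 months halfway through. This is a simplified NumPy illustration, not the `pycwt` implementation used in this notebook; the function `morlet_power`, the signal, and the chosen periods are assumptions made for this example.

```python
import numpy as np

def morlet_power(x, period, dt=1.0, w0=6.0):
    """Wavelet power of x at a single Fourier period, using a Morlet wavelet."""
    # Convert the Fourier period to the wavelet scale (relation for the Morlet wavelet)
    scale = period * (w0 + np.sqrt(2.0 + w0**2)) / (4.0 * np.pi)
    # Sample the wavelet on +/- 4 scales, where its Gaussian envelope has decayed
    half_width = int(np.ceil(4 * scale / dt))
    tau = np.arange(-half_width, half_width + 1) * dt / scale
    kernel = np.pi**-0.25 * np.exp(1j * w0 * tau) * np.exp(-0.5 * tau**2)
    # Slide the wavelet along the series (zero padding at the edges)
    coeffs = np.convolve(x, kernel, mode="same") * np.sqrt(dt / scale)
    return np.abs(coeffs) ** 2

n = 480
t = np.arange(n)
# First half: 12-month cycle, second half: 36-month cycle
signal = np.where(t < n // 2, np.sin(2 * np.pi * t / 12), np.sin(2 * np.pi * t / 36))

power_12 = morlet_power(signal, period=12)
power_36 = morlet_power(signal, period=36)

print("12-month power, first vs second half:", power_12[:n // 2].mean(), power_12[n // 2:].mean())
print("36-month power, first vs second half:", power_36[:n // 2].mean(), power_36[n // 2:].mean())
```

An FFT of the whole series would show both periods but not when they occur. The wavelet power tells us that the 12-month cycle lives in the first half and the 36-month cycle in the second half.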

In [None]:
# run the cell

# time index: one entry per month (288 months here)
time = np.arange(len(sst_box_mean))


# Compute wavelet transform
dt = 1  # Monthly time step
mother_wavelet = wavelet.Morlet(6) # standard

sst_wavelet, scales, freqs, coi, fft, fftfreqs = wavelet.cwt(sst_box_mean.values, dt, wavelet=mother_wavelet)

# Ensure the dimensions of the wavelet match the time series
sst_wavelet = sst_wavelet[:, :len(time)]  # Match wavelet data to the time series

# Calculate the wavelet power
power = np.abs(sst_wavelet) ** 2

# Manually calculate significance using chi-square distribution
alpha = 0.05
dof = 2  # Degrees of freedom for Morlet wavelet
signif_level = power.mean() * chi2.ppf(1 - alpha, dof) / dof
signif = power / signif_level

# Average the power and significance over time for each period
mean_power = np.mean(power, axis=1)  # Average power over time for each period
mean_significance = np.mean(signif, axis=1)  # Average significance over time for each period

# Convert frequencies to periods (in months)
periods = 1 / freqs



**How Wavelet Analysis Works in the Code**:

In the provided wavelet code, we transform the SST anomaly time series into the wavelet domain using the Morlet wavelet, producing a power spectrum that shows the strength of different periods over time. This spectrum is visualized as a contour plot, where color intensity represents signal power at specific periods and times.

The Cone of Influence (COI) marks the regions where edge effects from the finite time series may distort the results. Values outside the COI, away from the start and end of the record, are more reliable than those inside it.


Significance levels are calculated using the chi-square distribution to identify statistically significant periods and times, shown as contours over the wavelet spectrum. These highlight areas of meaningful variability.
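For two degrees of freedom the chi-square distribution has a simple closed form, so the factor used in the significance level can be checked by hand. The sketch below uses only the standard library and reproduces the value that `chi2.ppf(1 - alpha, dof) / dof` yields for `alpha = 0.05` and `dof = 2`; the closed form holds for `dof = 2` only.

```python
import math

alpha = 0.05   # significance level, as in the wavelet cell
dof = 2        # degrees of freedom used for the Morlet wavelet

# For dof = 2 the chi-square CDF is 1 - exp(-x / 2), so the (1 - alpha) quantile is -2 * ln(alpha)
chi2_quantile = -2.0 * math.log(alpha)
factor = chi2_quantile / dof

print(f"chi-square quantile: {chi2_quantile:.3f}")
print(f"power must exceed the background by a factor of {factor:.3f}")
```

In other words, with this simplified test a wavelet power value counts as significant when it is roughly three times larger than the mean power used as background.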

**<p style="color:royalblue;">For a step-by-step visual explanation, refer to the notebook  [`wavelet_demo.ipynb`](wavelet_demo.ipynb), providing a clearer understanding of the transformation and interpretation of the results. Please take your time to review it at home.</p>**

Next, we plot the wavelet power spectrum.

In [None]:
years = 2000 + np.arange(288) / 12  # decimal years for 288 monthly steps, assuming the series starts in January 2000

# Create the figure with subplots
fig, (ax1, ax2) = plt.subplots(1, 2, gridspec_kw={'width_ratios': [3, 1]}, figsize=(12, 6))

# Plot the wavelet power spectrum on the left subplot
contour = ax1.contourf(years, periods, power, levels=39, extend='both', cmap='jet')  # power = |W|², matching the colorbar label

# Add significance levels manually
ax1.contour(years, periods, signif, levels=[1.0], colors='white', linewidths=2)

# Plot the cone of influence (COI) -> indicates region where edge effects make the results less reliable
ax1.fill_between(years, coi, periods[-1], color='white', alpha=0.3)

# Set the x-axis to show years and y-axis for periods
ax1.set_xlabel('Time (years)')
ax1.set_ylabel('Period (months)')
ax1.set_yscale('log')
ax1.set_ylim(2, 144)  # Start the y-axis at 2 months to avoid displaying below 2
ax1.set_yticks([2, 6, 12, 24, 48, 96, 144, 300])
ax1.set_yticklabels(['2', '6', '12', '24', '48', '96', '144','300'])
ax1.set_title('Wavelet Spectrum of SST Box Mean Anomalies with COI')

# Add color bar to the left subplot
cbar = plt.colorbar(contour, ax=ax1, label='Wavelet Power')

# Plot the mean power and significance as a function of period on the right subplot
ax2.plot(mean_power, periods, 'b', label='Mean Power')
#ax2.plot(mean_significance, periods, 'r--', label='Mean Significance')
ax2.set_yscale('log')
ax2.set_ylim(ax1.get_ylim())  # Match the y-axis limits with the left plot
ax2.set_ylabel('Period (months)')  # Y-axis now matches the wavelet plot
ax2.set_yticks([2, 6, 12, 24, 48, 96, 144, 300])
ax2.set_yticklabels(['2', '6', '12', '24', '48', '96', '144', '300'])
ax2.set_xlabel('Power (°C²)')
ax2.set_title('Time-Averaged Wavelet Power')

# Add grid, legend, and format the plot
ax2.grid(True)
ax2.legend()
ax2.yaxis.set_label_position("right")
ax2.yaxis.tick_right()

# Show the full plot
plt.tight_layout()
plt.show()

This plot demonstrates the presence and evolution of dominant signals in the SST anomalies over time.


## Object-Oriented Programming

Ok, this was a lot of code – once again we calculated anomalies, detrended data, and applied several analyses. Now, we want to apply the same steps to other datasets, but copying, pasting, and modifying all the code each time isn’t efficient. We’ve already made use of functions and modules, but there’s another tool available to us: **Object-Oriented Programming (OOP)** using classes.

OOP organizes code into reusable objects with attributes and methods, which can enhance scalability, modularity, and reusability compared to standalone functions and modules.


A class is like a blueprint for a house. It defines the general structure and attributes, such as the number of rooms or the size of the garden, but it’s not a specific house yet. An object is the actual house built from that blueprint. If you remove the kitchen from one specific house (object), other houses remain unaffected. However, if you remove the kitchen from the blueprint (class), no house built from that blueprint will have a kitchen.

In the same way, the [`TimeSeriesAnalyzer`](../Modules/timeseries_analyzer.py) class is a blueprint for processing climate data. It defines attributes like the dataset to be analyzed and methods like `compute_fft` or `wavelet_analysis` to perform frequency analyses. When we create an instance (object) of this class, such as `processor = TimeSeriesAnalyzer(ds_sst24a)`, it applies these predefined methods to the specific dataset (`ds_sst24a`). This allows us to analyze various datasets consistently without rewriting the analysis code each time, much like building different houses from the same blueprint.


<p style="color:royalblue;">Note that this is a simplified introduction to Object-Oriented Programming (OOP). In practice, we often work with more sophisticated classes, such as those provided by <code>xarray</code>, which we've been using throughout the course to efficiently handle multidimensional climate data.</p>


A class consists of attributes, which store data, and methods, which are functions that define behaviors or actions the class can perform.
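A very small example shows these two ingredients. The class `MiniAnalyzer` below is a hypothetical toy written for this illustration; it is not the course's `TimeSeriesAnalyzer` and shares none of its code.

```python
import numpy as np

class MiniAnalyzer:
    """Toy class for illustration only (hypothetical, not TimeSeriesAnalyzer)."""

    def __init__(self, values, dt=1.0):
        # Attributes: the data stored inside each object
        self.values = np.asarray(values, dtype=float)
        self.dt = dt

    def anomalies(self):
        # Method: deviation from the mean of this object's data
        return self.values - self.values.mean()

    def dominant_period(self):
        # Method: period of the strongest non-zero frequency in the FFT power spectrum
        power = np.abs(np.fft.rfft(self.anomalies())) ** 2
        freq = np.fft.rfftfreq(self.values.size, d=self.dt)
        k = np.argmax(power[1:]) + 1   # skip the zero-frequency bin
        return 1.0 / freq[k]

# Two objects built from the same blueprint, each holding its own data
t = np.arange(120)
annual = MiniAnalyzer(10 + np.sin(2 * np.pi * t / 12))
biennial = MiniAnalyzer(5 + np.sin(2 * np.pi * t / 24))

print(annual.dominant_period(), biennial.dominant_period())
```

Both objects use the same methods, but each works on its own data. This is the pattern we rely on when the same analysis is applied first to SST and then to the wind data.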

#### Why Classes are Helpful:

**Efficiency**: Instead of running individual scripts for different datasets, the class provides a unified framework that can process any dataset with the same structure (daily or monthly resolution; three dimensions (time, lat, lon) or four dimensions (time, depth, lat, lon)).  
    
**Modularity**: Methods are self-contained, so debugging and extending functionality is much simpler.
    
**Reusability**: The class can be reused across different datasets, whether it’s SST, wind data, or other climate variables.

**Scalability**: As the analysis grows more complex, we can continue adding methods and functionality to the class without needing to rewrite existing code.

#### Applying the Class to Your Data

Let’s now quickly reproduce the calculations we have done before, but this time using the TimeSeriesAnalyzer class to automate the process. Afterward, you’ll get a chance to do the same for the wind data as a major task.



In [11]:
# run the cell
# append the Modules folder to path -> we have done it with the ekman_properties.py module before
import sys
sys.path.append('../Modules')  

# import
from timeseries_analyzer import TimeSeriesAnalyzer 

Print the docstring of the `TimeSeriesAnalyzer` Class to get a quick overview:

In [None]:
# run the cell

help(TimeSeriesAnalyzer)


In [13]:
# run the cell
# Create an instance of TimeSeriesAnalyzer with the SST dataset, naming it processor, but feel free to choose any name you prefer.
processor = TimeSeriesAnalyzer(ds_sst24a)

# Compute anomalies
sst_anomalies = processor.compute_anomalies_and_detrend()
print("Anomalies:\n", sst_anomalies)


With `processor.plot_std_and_annual_var('sst', vmax=5)`, we calculate and plot the standard deviation (variability) of the SST data, along with the annual variability. This gives us a quick visual representation of how the data varies over time and across seasons. The `vmax=5` parameter sets the maximum value of the colorbar, controlling the upper bound for the variability being displayed in the plot. This helps in visually interpreting the intensity of variations across the dataset.

Run the next cell


In [None]:
# run the cell
# calculate and plot variability and annual variability
processor.plot_std_and_annual_var('sst', vmax=5)

With `processor.compute_fft(variable='sst', lon_min=240, lon_max=280, lat_min=-5, lat_max=-20)`, we compute the Fast Fourier Transform (FFT) of the SST anomalies for the specified spatial region. This process helps us identify the dominant periods in the time series, revealing the most significant cycles present in the SST data over time.

In [None]:
#run the cell
#Perform FFT analysis
processor.compute_fft(variable='sst', lon_min=240, lon_max=280, lat_min=-5, lat_max=-20)


With `processor.wavelet_analysis(variable='sst', lon_min=240, lon_max=280, lat_min=-5, lat_max=-20)`, we perform a wavelet analysis to examine how the significant frequencies and periods change over time. This method allows us to explore time-localized variations in the SST anomalies, giving us a deeper insight into the temporal evolution of different cycles.

Run the next cell.

In [None]:
# run the cell
# wavelet analysis
processor.wavelet_analysis(variable='sst', lon_min=240, lon_max=280, lat_min=-5, lat_max=-20)

With just a few lines of code, we reproduced and automated all the calculations we performed earlier: computing detrended anomalies, running the FFT and wavelet analyses, and getting a quick overview of the variability in the dataset. The class simplifies these steps and makes them easy to apply to other datasets.



For a general overview of how object-oriented modeling and programming work, please visit this [Storymap](https://storymaps.arcgis.com/stories/dd9d06f89a63400c96927de117a5b28a). The text is in German, but most browsers offer automatic translation. The article provides a closer look at the programming paradigm and dives into the benefits of encapsulation, inheritance, polymorphism, and abstraction.


**3. Exercise**

Now it’s your turn to adapt this analysis for the wind data. Before you begin, you need to calculate the Ekman properties (e.g., Ekman transport, Ekman pumping). Do you remember how? Once you’ve calculated these, save them as new variables within your wind dataset.

Your task:

1. Calculate the Ekman properties and store them as new variables within your `ds_wind`. Apply a mask using the variable `number_of_observations`.  
2. Create an instance of the `TimeSeriesAnalyzer` class for the wind data.  
3. Follow the same analysis steps that we performed for the SST data:
    - Compute the anomalies.  
    - Compute and plot the variability and annual variability of **vertical Ekman velocity**.  
    - Perform an FFT analysis of **vertical Ekman velocity** to determine the dominant periods in the wind data. Choose the same region as for SST. Check the lat and lon dimensions of your wind dataset before starting, as they may differ from the SST data, and adjust the slicing so that it covers the same region.  
    - Conduct a wavelet analysis of **vertical Ekman velocity** to investigate the temporal evolution of these dominant periods.

Finally, compare the dominant frequencies of the SST and the vertical Ekman velocity to see how they align or differ.
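A hint for the region selection: datasets differ in their longitude convention (0 to 360 or -180 to 180) and in whether latitude is stored in ascending or descending order. The two helper functions below are hypothetical and written only for this hint. Which convention your wind dataset actually uses is something you need to check by inspecting its coordinates.

```python
import numpy as np

def to_minus180_180(lon):
    """Convert a longitude from the 0..360 convention to -180..180."""
    return ((lon + 180.0) % 360.0) - 180.0

def ordered_bounds(coord_values, low, high):
    """Return slice bounds in the same order in which the coordinate is stored."""
    ascending = coord_values[0] < coord_values[-1]
    return (low, high) if ascending else (high, low)

# The SST box in 0..360 longitudes, expressed in the -180..180 convention
print(to_minus180_180(240.0), to_minus180_180(280.0))   # 120°W and 80°W

# Latitude bounds for a coordinate stored south-to-north vs north-to-south
lat_ascending = np.arange(-89.5, 90.0, 1.0)
lat_descending = lat_ascending[::-1]
print(ordered_bounds(lat_ascending, -20, -5))
print(ordered_bounds(lat_descending, -20, -5))
```

If a slice returns an empty selection, the bounds are most likely given in the wrong order for that coordinate.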

Good luck!


In [18]:
# run the cell

from ekman_dynamics import compute_ekman_properties

In [19]:
# your code here to calculate Ekman properties and store them in ds_wind


In [20]:
# your code here to apply a mask using number_of_observations

#'number_of_observations' is the variable used as a mask
mask = ...

# Apply the mask to the wind dataset
ds_wind = ...

In [None]:
# Optional: plot the temporal mean of a variable to check if everything is correct


In [22]:
# your code here
# Create an instance of TimeSeriesAnalyzer with the Wind dataset
processor_wind = ...


# Compute anomalies
wind_anomalies = ...
#print("Anomalies:\n", wind_anomalies)

In [None]:
# your code here to calculate and plot std and annual std of vertical Ekman velocity



In [None]:
# your code here: Tip: check lat,lon carefully
# Perform FFT analysis of vertical Ekman velocity


In [None]:
# your code here: Tip: check lat,lon carefully
# wavelet analysis / vertical Ekman velocity


Compare the power spectra of SST and vertical Ekman velocity from both the FFT and wavelet analyses. Discuss the dominant periods you observe in each dataset and whether they correspond to one another.

Summarize your findings, highlighting any similarities or differences in the frequency and temporal patterns between the two datasets.



Your Observations: 

....


In [30]:
#save files

processor.save_results('../Data/SST/SST_Anomalies.nc')
processor_wind.save_results('../Data/Wind/Wind_Anomalies.nc')


In [31]:
#close files

#ds_sst.close()
#ds_wind.close()

**Extra Exercise** 

Perform the analysis using the longer SST dataset, which spans from 1854 to 2024. You are encouraged to explore various time periods and lengths of the time series. Investigate how different temporal windows affect the identified dominant frequencies, variabilities, and patterns. This will help you understand the impact of dataset length and coverage on your frequency and variability analyses.




## EOF Analysis

For a more comprehensive understanding of such variability, longer datasets and a careful examination of different time windows are essential, as they allow us to uncover patterns and signals that might otherwise be missed in shorter or less representative datasets.

In addition to the analyses we performed, further investigations can be carried out using advanced techniques like **Empirical Orthogonal Function (EOF) analysis**. EOF analysis is a statistical method for identifying the dominant modes of variability in spatiotemporal data. It simplifies the complex relationships within the data by decomposing the dataset into a set of spatial patterns (the EOFs) and the time series that describe their amplitude (the principal components).

For a detailed example of how EOF analysis is applied, refer to the accompanying notebook [`EOFanalysis_demo.ipynb`](EOFanalysis_demo.ipynb). This analysis can provide deeper insights into the spatial patterns of variability, complementing the frequency-based methods we've already explored.
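As a first impression of the method, the decomposition can be sketched with a singular value decomposition (SVD) in NumPy. The synthetic field below (one spatial pattern oscillating with a 24-month period plus noise) and all variable names are assumptions for illustration; the demo notebook may use a different implementation, and real gridded data additionally need area weighting and the handling of missing values.

```python
import numpy as np

rng = np.random.default_rng(0)
n_time, n_space = 120, 50

# Synthetic data: one spatial pattern whose amplitude oscillates in time, plus noise
pattern = np.sin(np.linspace(0, np.pi, n_space))
amplitude = np.sin(2 * np.pi * np.arange(n_time) / 24)
data = np.outer(amplitude, pattern) + 0.1 * rng.standard_normal((n_time, n_space))

# Remove the time mean at every grid point, then decompose the (time x space) matrix
anom = data - data.mean(axis=0)
u, s, vt = np.linalg.svd(anom, full_matrices=False)

explained = s**2 / np.sum(s**2)   # fraction of variance explained by each mode
eof1 = vt[0]                      # leading spatial pattern (EOF 1)
pc1 = u[:, 0] * s[0]              # its time series (principal component 1)

print("variance explained by the first three modes:", explained[:3])
print("correlation of EOF 1 with the true pattern:", abs(np.corrcoef(eof1, pattern)[0, 1]))
```

The first mode captures almost all of the variance and recovers the spatial pattern that was put into the data. The sign of an EOF is arbitrary, which is why the absolute value of the correlation is printed.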