# Day 4: Mathematical elements of Data Science

Data Science Lab, University of Bern, 2025

Prepared by Dr. Mykhailo Vladymyrov.

This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/)

# Summary

The main content is currently delivered solely as an interactive live discussion.
We look at some of the more commonly used mathematical concepts in Data Science, such as vector spaces, probability and distributions, the frequency domain, and differential calculus. Specifically, we look at them from the perspective of Data Science: how they can be used in practice and how we can think of them more intuitively.

This notebook is primarily for running the visualizations and the exercises.

To keep track of the topics suggested for the discussion, this notebook also lists them:

- frequency domain
    - sine, cosine
    - Fourier transform
    - trends

# Interactive Demo

# Frequency Domain

Suggested reading: [Use Frequency More Frequently](https://medium.com/data-science/use-frequency-more-frequently-14715714de38)

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import signal
from statsmodels.tsa.seasonal import seasonal_decompose


In [None]:
t = np.linspace(0, 16, 500, endpoint=False)  # time vector

In [None]:
dt = t[1] - t[0]  # time step

In [None]:
# Periods (in seconds) of the two sine components defined below
100 * dt, 6 * dt

In [None]:
# Example: a slow sine wave (period 100*dt, amplitude 1) plus a fast sine wave
# (period 6*dt, amplitude 0.05) plus normal noise (std 0.1)

y1 = 1 * np.sin(2 * np.pi * t/(dt * 100))  # slow periodic component
y2 = 0.05 * np.sin(2 * np.pi * t/(dt * 6))  # fast, low-amplitude periodic component
y3 = np.random.normal(0, 0.1, t.shape)  # Gaussian noise
y = y1 + y2 + y3  # combined signal

plt.plot(t, y)


In [None]:
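# Estimate the power spectral density with Welch's method.
# nperseg must not exceed the signal length (500 samples); otherwise SciPy warns and clips it.
f, Pxx = signal.welch(y, fs=1/dt, nperseg=len(y))
plt.semilogy(f, Pxx)
plt.ylabel('Power spectral density')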
plt.xlabel('Frequency [Hz]')
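
The peak in the spectrum can also be read off numerically. The following is a minimal sketch, assuming a noise-free copy of the two sine components above and a plain real FFT (`np.fft.rfft`) instead of Welch's method; the `*_demo` names are hypothetical and chosen so that the notebook's own variables are not overwritten.

```python
import numpy as np

# Noise-free copy of the two sine components (hypothetical *_demo names)
t_demo = np.linspace(0, 16, 500, endpoint=False)
dt_demo = t_demo[1] - t_demo[0]
y_demo = (1 * np.sin(2 * np.pi * t_demo / (dt_demo * 100))
          + 0.05 * np.sin(2 * np.pi * t_demo / (dt_demo * 6)))

# One-sided amplitude spectrum: scale by 2/N so a unit sine gives a peak near 1
amp_demo = np.abs(np.fft.rfft(y_demo)) * 2 / len(y_demo)
freqs_demo = np.fft.rfftfreq(len(y_demo), d=dt_demo)  # frequencies in Hz

# Frequency of the strongest component
dominant_freq = freqs_demo[np.argmax(amp_demo)]
print(f"dominant frequency: {dominant_freq:.4f} Hz")
print(f"expected 1/(100*dt): {1 / (100 * dt_demo):.4f} Hz")
```

Because the slow sine completes exactly five cycles in the 16-second window, its energy falls into a single frequency bin. The fast component does not fit a whole number of cycles, so its much smaller peak is spread over neighbouring bins.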

# Detrending the Signal

In [None]:
n = 120
t = np.arange(n)

sawtooth_wave = signal.sawtooth(2 * np.pi * t / 30)  # example of a sawtooth wave

trend = 10 + 0.05 * (t + sawtooth_wave*20)
seasonal = 2 * np.sin(2 * np.pi * t / 12)
noise = np.random.normal(0, 0.5, n)
y = trend + seasonal + noise


# Use a pandas Series with a datetime index (best practice).
# 'MS' (month start) is used because the 'M' alias is deprecated in recent pandas versions.
date_index = pd.date_range(start='2020-01-01', periods=n, freq='MS')
series = pd.Series(y, index=date_index)

# Seasonal decomposition
result = seasonal_decompose(series, model='additive', period=12)  # period=12 for yearly seasonality

# Detrended signal (remove trend, keep seasonality and noise).
# The estimated trend is NaN at both ends, where the moving-average window is incomplete.
detrended = series - result.trend

# Plot
plt.figure(figsize=(10,6))
plt.plot(series, label='Original')
plt.plot(result.trend, label='Estimated trend')
plt.plot(date_index, trend, '--', label='True trend')
plt.plot(detrended, label='Detrended')
plt.plot(date_index, seasonal + noise, '--', label='True seasonal + noise')
plt.legend()
plt.title('Detrending with statsmodels.tsa.seasonal_decompose')
plt.show()
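
Detrending also connects back to the frequency domain: once the slow drift is removed, the seasonal period stands out in the spectrum. The sketch below uses a different and simpler technique than `seasonal_decompose`, namely a least-squares straight-line fit via `scipy.signal.detrend`, applied to a noise-free copy of the synthetic series. The `*_demo` names are hypothetical, and the example is an illustration under these assumptions rather than a replacement for the decomposition above.

```python
import numpy as np
from scipy import signal

# Noise-free copy of the synthetic monthly series (hypothetical *_demo names)
n_demo = 120
t_demo = np.arange(n_demo)
trend_demo = 10 + 0.05 * (t_demo + signal.sawtooth(2 * np.pi * t_demo / 30) * 20)
seasonal_demo = 2 * np.sin(2 * np.pi * t_demo / 12)
y_demo = trend_demo + seasonal_demo

# Subtract a least-squares straight line; the sawtooth wiggle stays in the residual
residual_demo = signal.detrend(y_demo, type='linear')

# One-sided amplitude spectrum of the residual
resid_amp = np.abs(np.fft.rfft(residual_demo)) * 2 / n_demo
resid_freqs = np.fft.rfftfreq(n_demo, d=1.0)  # cycles per sample (month)

# Strongest non-zero frequency, converted to a period in samples
k_demo = np.argmax(resid_amp[1:]) + 1  # skip the zero-frequency bin
dominant_period = 1 / resid_freqs[k_demo]
print(f"dominant period: {dominant_period:.1f} samples")
```

A straight-line fit removes only the linear part of the trend, so the sawtooth component (period 30) remains visible as a smaller peak. The moving-average trend from `seasonal_decompose` follows such slow wiggles more closely.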
