# Computational Analysis of Sounds and Music (CH-CASM-M)

## 01 - Fundamentals of Audio Processing (1/2) Part 2

**WS 2025/2026**

Prof. Dr. Jakob Abeßer, jakob.abesser@uni-bamberg.de

Last update: 21.10.2025

**Outline**

In this notebook, you will learn 
 - how to load and process audio files in Python
 - how to sonify and visualize waveforms
 - how to compute and visualize the STFT, Mel Spectrogram, and CQT

In [None]:
!pip install wget
!pip install soundfile

In [None]:
import numpy as np
import wget
import os
import matplotlib
import librosa
%matplotlib inline
import matplotlib.pyplot as pl
import platform
import IPython.display as ipd

### *(Platform-independent Code)*

**HINT**: If you want to write Python scripts that work on multiple platforms (Windows, Linux, etc.), you can use ```platform.platform()``` to determine automatically which platform your Python code is running on.
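
As a minimal sketch of how such a check could be used: the standard-library function `platform.system()` returns a short name (`'Windows'`, `'Linux'`, or `'Darwin'` for macOS) that is easier to branch on than the full string from `platform.platform()`. The directory names below are hypothetical placeholders, not paths used in this course.

```python
import platform

# platform.system() returns a short OS name such as 'Windows', 'Linux', or 'Darwin' (macOS)
system_name = platform.system()

# hypothetical per-platform default directories (placeholders for illustration only)
default_dirs = {
    'Windows': 'C:/audio_files',
    'Linux': '/tmp/audio_files',
    'Darwin': '/tmp/audio_files',
}

# fall back to the current directory for any other platform
dir_audio_example = default_dirs.get(system_name, '')
print(system_name, '->', repr(dir_audio_example))
```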

In [None]:
# check current platform
print(platform.platform())

# save audio in current directory
dir_audio = ''

### Get audio example files

This cell downloads the two audio files used in this notebook.

In [None]:
# download each example file only if it is not already present
for fn in ('piano.wav', 'bird.wav'):
    if os.path.isfile(fn):
        print(f'{fn} already exists!')
    else:
        wget.download('https://github.com/CHBamberg/CH-CASM-M-2025/raw/refs/heads/main/data/{}'.format(fn),
                      out=fn, bar=None)

### File paths

When working with multiple audio files, it is good practice to treat directories and filenames separately and use ```os.path.join``` to combine them into full file paths. This function inserts the correct path separator for each operating system (Windows, macOS, Linux).
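
The following sketch illustrates the difference between joining a path and making it absolute. The directory name `audio_files` is a made-up example. `os.path.join` only concatenates its arguments with the platform's separator, while `os.path.abspath` additionally resolves the result against the current working directory.

```python
import os

dir_example = 'audio_files'  # hypothetical directory name, for illustration only
fn_example = os.path.join(dir_example, 'bird.wav')
print(fn_example)                     # relative path using the platform's separator
print(os.path.isabs(fn_example))      # False: joining alone does not make a path absolute

fn_example_abs = os.path.abspath(fn_example)
print(os.path.isabs(fn_example_abs))  # True: resolved against the current working directory

# an empty directory string simply yields the bare filename
print(os.path.join('', 'bird.wav'))
```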

In [None]:
# 1) define path to the directory that contains the audio files (WAV format)
# TIP: under Windows, it is also recommended to use '/', e.g. 'C:/my_audio_files'
dir_wav = ''  # here, we use the same directory as the notebook is in

# this could also look like
# dir_wav = 'c:/audio_files'

# 2) build the full path of the audio file (directory + filename)
# os.path.join takes care of the correct path separator
# - Linux / macOS: "/"
# - Windows: "\\"

fn_wav = os.path.join(dir_wav, 'bird.wav')  # original filename: 416529__inspectorj__bird-whistling-single-robin-a_2s
assert os.path.isfile(fn_wav)

### Loading audio files

- first check librosa documentation: https://librosa.org/doc/main/generated/librosa.load.html

In [None]:
# (1) default call: librosa resamples to 22050 Hz and downmixes to mono
# (pass sr=None to keep the file's native sample rate, mono=False to keep all channels)
x, fs = librosa.load(fn_wav)

print("Sample vector shape:", x.shape)  # 1D numpy array, mono
print("Sample rate [Hz]", fs)
print(f"Audio duration (seconds): {len(x)/fs}")

In [None]:
# (2) you could also enforce another sample rate
fs_fix = 44100
x, fs = librosa.load(fn_wav, sr=fs_fix)  # the signal is resampled on load if the file's native rate differs from fs_fix

print(x.shape)  # twice the default rate of 22050 Hz used in (1) -> twice as many samples
print(fs)  # the enforced sample rate is returned
print(f"Duration = {len(x)/fs} s")

In [None]:
# (3) if you have a stereo file, you can enforce single-channel audio (mono);
# this is already librosa's default, use mono=False to keep all channels
# x, fs = librosa.load(fn_wav, mono=True)

### Sonification

Let's listen to our example audio file (bird.wav).

In [None]:
ipd.display(ipd.Audio(data=x, rate=fs))

### Segmentation

Our audio signal is roughly 2.06 s long. Let's extract the first 1.5 seconds of it.

In [None]:
# TASK: find the upper sample index for slicing that corresponds to a physical time of 1.5s
sample_end = None # TASK: replace "None"
x_first_1_5_s = x[:sample_end]
print(f"Our segment has a duration of {len(x_first_1_5_s)/fs} seconds.")

### Waveform Visualization

Let's plot our waveform

In [None]:
pl.figure(figsize=(10,2))
pl.plot(x)
pl.show()  

**Observation**: So far, the x-axis only shows the sample index, which is not informative without the sample rate.

#### Create time axis

The sample rate in Hz defines how many audio samples exist per second. Its inverse ($1/f_\mathrm{s}$) is the duration of each sample in seconds.
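
As a worked example with numbers that are easy to verify by hand, the sketch below assumes a sample rate of 8000 Hz and a signal length of 4000 samples. Both values are chosen for illustration only and do not correspond to the audio files in this notebook.

```python
import numpy as np

fs_example = 8000                      # assumed sample rate in Hz (illustration only)
n_samples = 4000                       # half a second of audio at 8000 Hz

seconds_per_sample_example = 1 / fs_example
print(seconds_per_sample_example)      # 0.000125 s, i.e. 125 microseconds

# physical time of each sample: sample index times the sampling period
time_axis = np.arange(n_samples) * seconds_per_sample_example
print(time_axis[:3])
print(time_axis[-1])                   # the last sample starts just before 0.5 s
```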

In [None]:
number_of_samples = len(x)
print("Number of samples:", number_of_samples)

seconds_per_sample = None # TASK: replace "None" with correct command!
print("Duration [seconds] of one sample:", seconds_per_sample)  # at 44100 Hz, one audio sample corresponds to ~22.7 µs

# let's create a numpy array with the physical time of each audio sample
frames_in_seconds = np.arange(number_of_samples)*seconds_per_sample
print(frames_in_seconds[:3])

Let's plot the signal again, this time with an interpretable x-axis.

In [None]:

pl.figure(figsize=(10,2))
pl.plot(frames_in_seconds, x)
pl.xlabel('Time [s]')
pl.ylabel('Amplitude')
pl.show()

Done :)

In [None]:
# TASK
# (1) open "piano.wav" instead
# (2) what is the duration of the audio file (in seconds)
# (3) create a waveform plot
# (4) normalize the signal such that the maximum absolute value of all samples is 1
# (5) plot the normalized waveform in a different color 
# (6) select a very short segment from an arbitrary position in the audio file (50 ms long) -> compare the waveform 
#     to the complex periodic signals discussed in the lecture