Based on several research papers we have read we found out that there are 3 groups of sound features that get affected by the Parkinson Disease: Phonation, articulation and prosody.


We will start extracting the phonation features that get affected by the parkinson disease based on (1):

Feature                      |            Effect

Jitter                        |    Higher in PD than healthy

Shimmer                        |   Higher in PD than healthy

Harmonics-to-noise ratio        |  Lower in PD than in healthy	

Correlation Dimension (D2)       | Higher in PD than in healthy

Pitch Period Entropy              |Higher in PD than in healthy	

First, we will start now by extracting the jitter feature.

In [31]:
from pathlib import Path
import numpy as np
import pandas as pd
from tqdm import tqdm
import parselmouth
from parselmouth.praat import call


In calculating jitter, Praat must find the pitch(how many times the vocal folds vibrate per second)
Women and men voices vibrate only within a certain range(80-150Hz for men and 150-250Hz for women)

We take 75 for floor and 300 for ceiling to give some threshold for outliers.

In [32]:
PITCH_FLOOR = 75
PITCH_CEILING = 300


Set the path of the folder containing all voice recordings (.wav files).

Set the name of the .csv file we will generate.

In [33]:
AUDIO_DIR = Path(r"C:\Users\user\Downloads\23849127\HC_AH\HC_AH")
OUTPUT_CSV = "jitter_features.csv"

.wav files may have one sound channel(mono) or two(stereo). If a file has 2 channels then we should average them. 

Example: Have R(t) and L(t) new signal =(R(t)+L(t))/2

In [34]:
def to_mono(sound: parselmouth.Sound) -> parselmouth.Sound:
    if sound.get_number_of_channels() == 1:   #If we have only one channel then we are done.
        return sound
    mono_vals = sound.values.mean(axis=0, keepdims=True) # calculating average.
    return parselmouth.Sound(mono_vals, sampling_frequency=sound.sampling_frequency) # return the new average sound.


Here we are extracting jitter, but what is it?

Jitter is commonly known as the cycle-to-cycle variation in the periodic signal generated at the larynx, more commonly known as the voice box.(2)

In simpler terms, it measures how much the period time differs from one period to another.

Example: period1: 0.8555    period2:0.8567   jitter is the difference between the 2 periods.

It can be measured in several ways:

1)Local jitter: Average absolute cycle-to-cycle variation normalized by the average period time.

2)RAP: Average difference between the current cycle and the average of the current cycle and its 2 neighbors (1 previous and 1 next) normalized by the average period time.

3)PPQ5: Same as RAP but compares the current cycle with the average of the current cycle with the 2 previous cycles and 2 next cycles normalized by the average period time.

Note :Cycle = Period 

In [35]:
def extract_jitter_for_file(wav_path: Path,
                            pitch_floor=PITCH_FLOOR,
                            pitch_ceiling=PITCH_CEILING):
    # load wav file as Parselmouth Sound Object.
    snd = parselmouth.Sound(str(wav_path))
    # convert to mono
    snd = to_mono(snd)
    # get total duration of recording
    duration = snd.get_total_duration()

    # Build PointProcess using pitch limits in Hz
    pp = call(snd, "To PointProcess (periodic, cc)", pitch_floor, pitch_ceiling)

    # Convert pitch limits (Hz) -> period limits (seconds)
    min_period = 1.0 / pitch_ceiling
    max_period = 1.0 / pitch_floor
    max_period_factor = 1.3  # Praat default

    # Jitter funcs expect: (tmin, tmax, minPeriod, maxPeriod, maxPeriodFactor)
    jitter_local = call(pp, "Get jitter (local)", 0, 0, min_period, max_period, max_period_factor)
    jitter_rap   = call(pp, "Get jitter (rap)",   0, 0, min_period, max_period, max_period_factor)
    jitter_ppq5  = call(pp, "Get jitter (ppq5)",  0, 0, min_period, max_period, max_period_factor)

    return {
        "filename": wav_path.name,
        "relpath": str(wav_path.as_posix()),
        "duration_sec": duration,
        "jitter_local": jitter_local,
        "jitter_rap": jitter_rap,
        "jitter_ppq5": jitter_ppq5,
    }

Iterate over all the .wav files and extracting the jitter feature(local, RAP and PPQ5) from them and append it to the csv file for each iteration. Then compute each value in percentage by multiplying by 100 mainly because differences are small.

In [36]:
wav_files = sorted(list(AUDIO_DIR.rglob("*.wav")) + list(AUDIO_DIR.rglob("*.WAV")))
if not wav_files:
    print(f"No .wav files found under: {AUDIO_DIR.resolve()}")

rows = []
for f in tqdm(wav_files, desc="Extracting jitter"):
    try:
        rows.append(extract_jitter_for_file(f))
    except Exception as e:
        rows.append({
            "filename": f.name,
            "relpath": str(f.as_posix()),
            "duration_sec": np.nan,
            "jitter_local": np.nan,
            "jitter_rap": np.nan,
            "jitter_ppq5": np.nan,
            "error": str(e)
        })

df = pd.DataFrame(rows)
for col in ["jitter_local", "jitter_rap", "jitter_ppq5"]:
    if col in df.columns:
        df[col + "_pct"] = df[col] * 100.0

df.to_csv(OUTPUT_CSV, index=False)
print(f"✅ Done. Saved {len(df)} rows to {OUTPUT_CSV}")
df.head()


Extracting jitter: 100%|██████████| 82/82 [00:00<00:00, 107.12it/s]

✅ Done. Saved 82 rows to jitter_features.csv





Unnamed: 0,filename,relpath,duration_sec,jitter_local,jitter_rap,jitter_ppq5,jitter_local_pct,jitter_rap_pct,jitter_ppq5_pct
0,AH_064F_7AB034C9-72E4-438B-A9B3-AD7FDA1596C5.wav,C:/Users/user/Downloads/23849127/HC_AH/HC_AH/A...,3.738875,0.004138,0.002028,0.002163,0.413828,0.202779,0.216324
1,AH_064F_7AB034C9-72E4-438B-A9B3-AD7FDA1596C5.wav,C:/Users/user/Downloads/23849127/HC_AH/HC_AH/A...,3.738875,0.004138,0.002028,0.002163,0.413828,0.202779,0.216324
2,AH_114S_A89F3548-0B61-4770-B800-2E26AB3908B6.wav,C:/Users/user/Downloads/23849127/HC_AH/HC_AH/A...,2.267375,0.005911,0.003164,0.003722,0.591076,0.316431,0.372203
3,AH_114S_A89F3548-0B61-4770-B800-2E26AB3908B6.wav,C:/Users/user/Downloads/23849127/HC_AH/HC_AH/A...,2.267375,0.005911,0.003164,0.003722,0.591076,0.316431,0.372203
4,AH_121A_BD5BA248-E807-4CB9-8B53-47E7FFE5F8E2.wav,C:/Users/user/Downloads/23849127/HC_AH/HC_AH/A...,2.867625,0.004685,0.002691,0.00268,0.468469,0.269138,0.26803


Here we are extracting shimmer, but what is it?

Shimmer is defined as the variation in the amplitude (peak) of successive glottal cycles (cyclical series of events that produces voicing) in a sustained phonation(aaaa pronounciation in our case). In simpler terms, if the loudness of the voice signal is not consistent across cycles, shimmer values will be high.(3)

Example: period 1: 0.52

    period 2:0.55   
    
    shimmer is the difference between the 2 periods' peaks(0.03).

It can be measured in several ways:

1)Local shimmer: Average absolute cycle-to-cycle variation in peaks normalized by the average amplitude.

2)APQ3: Amplitude Perturbation Quotient over 3 cycles. Compares each cycle to the average of itself and its two neighbors (i−1, i, i+1).

3)APQ5: Amplitude Perturbation Quotient over 5 cycles. Compares each cycle to the average of itself and its four neighbors (i-2,i−1, i, i+1,  i+2).

4)APQ11: same as APQ3 and APQ5 but now with 10 neighbors.

Note :Amplitude = peak amplitude; in the context of shimmer.

In [37]:
OUTPUT_CSV = "shimmer_features.csv"

In [38]:
def extract_shimmer_for_file(wav_path: Path,
                             pitch_floor=PITCH_FLOOR,
                             pitch_ceiling=PITCH_CEILING):
    """Extract shimmer features (local, apq3, apq5, apq11) from a voice recording."""
    snd = parselmouth.Sound(str(wav_path))
    snd = to_mono(snd)
    duration = snd.get_total_duration()

    # Build PointProcess to find glottal pulses
    pp = call(snd, "To PointProcess (periodic, cc)", pitch_floor, pitch_ceiling)

    # Convert pitch limits (Hz) to period limits (seconds)
    min_period = 1.0 / pitch_ceiling
    max_period = 1.0 / pitch_floor
    max_period_factor = 1.3  # Praat default

    # Shimmer functions need both the sound and its PointProcess
    # Format: (Sound, PointProcess, tmin, tmax, minPeriod, maxPeriod, maxPeriodFactor, silenceThreshold, timeFactor)
    # We'll use typical default Praat values for shimmer extraction
    tmin, tmax = 0, 0
    min_period = 1.0 / pitch_ceiling
    max_period = 1.0 / pitch_floor
    max_period_factor   = 1.3   # typical Praat default
    max_amplitude_factor = 1.6   # typical Praat default

    shimmer_local = call([snd, pp], "Get shimmer (local)",
                        tmin, tmax, min_period, max_period,
                        max_period_factor, max_amplitude_factor)

    shimmer_apq3  = call([snd, pp], "Get shimmer (apq3)",
                        tmin, tmax, min_period, max_period,
                        max_period_factor, max_amplitude_factor)

    shimmer_apq5  = call([snd, pp], "Get shimmer (apq5)",
                        tmin, tmax, min_period, max_period,
                        max_period_factor, max_amplitude_factor)

    shimmer_apq11 = call([snd, pp], "Get shimmer (apq11)",
                        tmin, tmax, min_period, max_period,
                        max_period_factor, max_amplitude_factor)

    return {
        "filename": wav_path.name,
        "relpath": str(wav_path.as_posix()),
        "duration_sec": duration,
        "shimmer_local": shimmer_local,
        "shimmer_apq3": shimmer_apq3,
        "shimmer_apq5": shimmer_apq5,
        "shimmer_apq11": shimmer_apq11,
    }

Same as when we have extracted jitter feature. We iterate over all the .wav files and extract the shimmer feature. Then  at the end we add the percentage value of each feature by just multiplying by 100 mainly because differences are small.

In [39]:
wav_files = sorted(list(AUDIO_DIR.rglob("*.wav")) + list(AUDIO_DIR.rglob("*.WAV")))
if not wav_files:
    print(f"No .wav files found under: {AUDIO_DIR.resolve()}")

rows = []
for f in tqdm(wav_files, desc="Extracting shimmer"):
    try:
        rows.append(extract_shimmer_for_file(f))
    except Exception as e:
        rows.append({
            "filename": f.name,
            "relpath": str(f.as_posix()),
            "duration_sec": np.nan,
            "shimmer_local": np.nan,
            "shimmer_apq3": np.nan,
            "shimmer_apq5": np.nan,
            "shimmer_apq11": np.nan,
            "error": str(e)
        })

# Save results to CSV
df = pd.DataFrame(rows)
for col in ["shimmer_local", "shimmer_apq3", "shimmer_apq5", "shimmer_apq11"]:
    if col in df.columns:
        df[col + "_pct"] = df[col] * 100.0  # convert to percentage

df.to_csv(OUTPUT_CSV, index=False)
print(f"✅ Done. Saved {len(df)} rows to {OUTPUT_CSV}")
df.head()

Extracting shimmer: 100%|██████████| 82/82 [00:00<00:00, 94.87it/s]

✅ Done. Saved 82 rows to shimmer_features.csv





Unnamed: 0,filename,relpath,duration_sec,shimmer_local,shimmer_apq3,shimmer_apq5,shimmer_apq11,shimmer_local_pct,shimmer_apq3_pct,shimmer_apq5_pct,shimmer_apq11_pct
0,AH_064F_7AB034C9-72E4-438B-A9B3-AD7FDA1596C5.wav,C:/Users/user/Downloads/23849127/HC_AH/HC_AH/A...,3.738875,0.03881,0.021972,0.026096,0.033093,3.881016,2.197198,2.609649,3.309269
1,AH_064F_7AB034C9-72E4-438B-A9B3-AD7FDA1596C5.wav,C:/Users/user/Downloads/23849127/HC_AH/HC_AH/A...,3.738875,0.03881,0.021972,0.026096,0.033093,3.881016,2.197198,2.609649,3.309269
2,AH_114S_A89F3548-0B61-4770-B800-2E26AB3908B6.wav,C:/Users/user/Downloads/23849127/HC_AH/HC_AH/A...,2.267375,0.090319,0.046664,0.058237,0.081274,9.031922,4.666442,5.823694,8.127356
3,AH_114S_A89F3548-0B61-4770-B800-2E26AB3908B6.wav,C:/Users/user/Downloads/23849127/HC_AH/HC_AH/A...,2.267375,0.090319,0.046664,0.058237,0.081274,9.031922,4.666442,5.823694,8.127356
4,AH_121A_BD5BA248-E807-4CB9-8B53-47E7FFE5F8E2.wav,C:/Users/user/Downloads/23849127/HC_AH/HC_AH/A...,2.867625,0.047971,0.026389,0.033175,0.043938,4.797145,2.638924,3.317495,4.393782


Here we are extracting harmonics-to-noise ratio, but what is it?

Harmonic to Noise Ratio (HNR) measures the ratio between periodic and non-periodic components of a speech sound.(4)

The hnr related features we are extracting are:

1)hnr_mean_db: Average HNR of the frames across the entire recording.

2)hnr_median_db: Median HNR of the frames ignoring unvoiced frames.

3)hnr_stdev_db: Standard deviation(how much values of HNR frames vary from mean).


In [40]:
OUTPUT_CSV = "hnr_features.csv"

In [41]:
TIME_STEP = 0.01          # seconds (analysis frame hop)
VOICING_THRESHOLD = 0.1   # used internally in the method (typical default)
SILENCE_THRESHOLD = 1.0    # typical default for (cc) in common examples

In [42]:
def nanmedian_hnr(harm: parselmouth.Harmonicity) -> float:
    """
    Return the median HNR (dB) ignoring undefined frames (≈ -200 dB).
    """
    vals = harm.values.squeeze()  # shape (T,)
    # Treat <= -200 dB as undefined
    vals = np.where(vals <= -200.0, np.nan, vals)
    return float(np.nanmedian(vals))

In [43]:
def extract_hnr_for_file(wav_path: Path,
                         pitch_floor=PITCH_FLOOR,
                         time_step=TIME_STEP,
                         voicing_threshold=VOICING_THRESHOLD,
                         silence_threshold=SILENCE_THRESHOLD):
    snd = parselmouth.Sound(str(wav_path))
    snd = to_mono(snd)
    duration = snd.get_total_duration()

    harm = call(snd, "To Harmonicity (cc)",
                time_step, pitch_floor, voicing_threshold, silence_threshold)

    # Mean & stdev from Praat over full duration
    hnr_mean_db  = call(harm, "Get mean", 0, 0)
    hnr_stdev_db = call(harm, "Get standard deviation", 0, 0)

    # Median computed in NumPy (since "Get quantile" isn't available here)
    hnr_median_db = nanmedian_hnr(harm)

    return {
        "filename": wav_path.name,
        "relpath": str(wav_path.as_posix()),
        "duration_sec": duration,
        "hnr_mean_db": hnr_mean_db,
        "hnr_median_db": hnr_median_db,
        "hnr_stdev_db": hnr_stdev_db,
    }

Calculate the mean, median and standard deviation of each .wav file in the folder.

In [44]:
wav_files = sorted({*AUDIO_DIR.rglob("*.wav"), *AUDIO_DIR.rglob("*.WAV")})
if not wav_files:
    print(f"No .wav files found under: {AUDIO_DIR.resolve()}")

rows = []
for f in tqdm(wav_files, desc="Extracting HNR"):
    try:
        rows.append(extract_hnr_for_file(f))
    except Exception as e:
        rows.append({
            "filename": f.name,
            "relpath": str(f.as_posix()),
            "duration_sec": np.nan,
            "hnr_mean_db": np.nan,
            "hnr_median_db": np.nan,
            "hnr_stdev_db": np.nan,
            "error": str(e)
        })

df = pd.DataFrame(rows)
df.to_csv(OUTPUT_CSV, index=False)
print(f"✅ Done. Saved {len(df)} rows to {OUTPUT_CSV}")

Extracting HNR:   0%|          | 0/41 [00:00<?, ?it/s]

Extracting HNR: 100%|██████████| 41/41 [00:00<00:00, 103.00it/s]

✅ Done. Saved 41 rows to hnr_features.csv





In [60]:
import nolds

In [61]:
OUTPUT_CSV = "D2_features.csv"
EMBED_DIM = 10        # embedding dimension (try 8–12 typically)
SAMPLE_TRIM = 50000   # number of samples to use (optional: shorter = faster)


In [62]:
def extract_d2_for_file(wav_path: Path,
                        emb_dim=EMBED_DIM,
                        sample_trim=SAMPLE_TRIM):
    """
    Compute the correlation dimension D2 of the speech waveform.
    """
    snd = parselmouth.Sound(str(wav_path))
    samples = to_mono(snd)
    fs = snd.sampling_frequency

    # Optionally trim to make computation faster (for long files)
    if len(samples) > sample_trim:
        samples = samples[:sample_trim]

    # Normalize samples (zero mean, unit variance)
    samples = (samples - np.mean(samples)) / np.std(samples)

    # Compute correlation dimension (D2)
    try:
        d2 = nolds.corr_dim(samples, emb_dim)
        return {
            "filename": wav_path.name,
            "relpath": str(wav_path.as_posix()),
            "duration_sec": snd.get_total_duration(),
            "correlation_dimension_d2": d2,
        }
    except Exception as e:
        return {
            "filename": wav_path.name,
            "relpath": str(wav_path.as_posix()),
            "duration_sec": snd.get_total_duration(),
            "correlation_dimension_d2": np.nan,
            "error": str(e)
        }

In [63]:
wav_files = sorted({*AUDIO_DIR.rglob("*.wav"), *AUDIO_DIR.rglob("*.WAV")})
rows = []
for f in tqdm(wav_files, desc="Extracting D2"):
    rows.append(extract_d2_for_file(f))

df = pd.DataFrame(rows)
df.to_csv(OUTPUT_CSV, index=False)
print(f"✅ Done. Saved {len(df)} rows to {OUTPUT_CSV}")
df.head()

Extracting D2:  32%|███▏      | 13/41 [39:01<1:24:03, 180.14s/it]


KeyboardInterrupt: 

In [None]:
from pathlib import Path
import numpy as np
import pandas as pd
import parselmouth
from parselmouth.praat import call
from scipy.signal import lfilter

# Optional (recommended) for LPC whitening; otherwise fallback used
try:
    import librosa
    HAVE_LIBROSA = True
except Exception:
    HAVE_LIBROSA = False

# ---------- Config ----------
AUDIO_DIR = Path(r"C:\Users\user\Downloads\23849127\HC_AH\HC_AH")  # <- put your folder here
OUTPUT_CSV = "ppe_features.csv"

# Pitch settings (Hz). Use your previous floor/ceiling that worked for jitter/shimmer.
PITCH_FLOOR = 60.0
PITCH_CEILING = 400.0

# PPE settings
SEMITONE_REF_HZ = 130.81  # C3 (as in Little et al. examples)
LPC_ORDER = 3             # small AR order to remove smooth vibrato
N_BINS = 50               # histogram bins for entropy
HIST_RANGE = (-1.0, 1.0)  # semitone range for residuals (robust & close to paper figures)

def hz_to_semitone(f0_hz, ref_hz=SEMITONE_REF_HZ):
    """Convert Hz to semitones relative to ref note: 12 * log2(f/ref)."""
    f0 = np.asarray(f0_hz, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        st = 12.0 * np.log2(f0 / ref_hz)
    return st

def lpc_whiten(x, order=LPC_ORDER):
    """
    Apply a simple LPC whitening:
      - If librosa is available: use librosa.lpc to get AR coefficients a (length order+1, a[0]=1)
        and compute residual e = lfilter(a, [1], x).
      - Else: fallback to 1st-order high-pass on a moving mean (keeps PPE usable).
    Returns residual same length as x.
    """
    x = np.asarray(x, dtype=float)
    # Remove NaNs before LPC; interpolate small gaps if any
    good = np.isfinite(x)
    if good.sum() < max(order + 5, 20):
        # too short or many NaNs → fallback detrend
        return detrend_fallback(x)

    # simple linear interpolation over NaNs to make LPC stable
    if not good.all():
        idx = np.arange(len(x))
        x[~good] = np.interp(idx[~good], idx[good], x[good])

    if HAVE_LIBROSA:
        a = librosa.lpc(x, order=order)  # a[0]=1
        e = lfilter(a, [1.0], x)         # inverse filtering → residual
        return e
    else:
        return detrend_fallback(x)

def detrend_fallback(x, win=201):
    """
    Smooth-moving-average removal + first-difference as a robust fallback.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n < 5:
        return np.zeros_like(x)

    w = min(win, n - (1 - n % 2))  # ensure odd, <= n
    if w % 2 == 0:
        w -= 1
    # simple moving average via convolution
    kern = np.ones(w) / w
    smooth = np.convolve(np.nan_to_num(x), kern, mode="same")
    resid = x - smooth
    # light high-pass: first difference (pad to original length)
    diff = np.empty_like(resid)
    diff[0] = 0.0
    diff[1:] = resid[1:] - resid[:-1]
    return diff

def shannon_entropy_from_hist(samples, bins=N_BINS, rng=HIST_RANGE):
    """
    Discrete Shannon entropy H = -sum p log p (natural log).
    Also return normalized H / log(#bins) in [0,1].
    """
    samples = np.asarray(samples, dtype=float)
    samples = samples[np.isfinite(samples)]
    if samples.size == 0:
        return np.nan, np.nan

    counts, edges = np.histogram(samples, bins=bins, range=rng)
    p = counts.astype(float)
    total = p.sum()
    if total <= 0:
        return np.nan, np.nan
    p /= total
    # avoid log(0)
    p = p[p > 0]
    H = -np.sum(p * np.log(p))
    H_norm = H / np.log(bins)
    return float(H), float(H_norm)

def extract_ppe_for_file(wav_path: Path,
                         pitch_floor=PITCH_FLOOR,
                         pitch_ceiling=PITCH_CEILING):
    """
    Compute PPE for a single WAV:
      1) F0 with Praat (autocorrelation)
      2) Convert to semitone scale
      3) LPC-whiten to remove smooth trends
      4) Entropy of residual semitone distribution (raw and normalized)
    """
    snd = parselmouth.Sound(str(wav_path))
    duration = snd.get_total_duration()

    # Praat pitch
    # time_step=0 lets Praat choose; units Hz; voicing threshold default OK
    pitch = call(snd, "To Pitch (ac)", 0, pitch_floor, pitch_ceiling)
    # Get vector of times & F0 values
    n = call(pitch, "Get number of frames")
    f0_vals = []
    for i in range(1, n + 1):
        f0 = call(pitch, "Get value in frame", i)  # Hz, undefined=NaN
        f0_vals.append(f0 if np.isfinite(f0) and f0 > 0 else np.nan)
    f0_vals = np.array(f0_vals, dtype=float)

    # Require enough voiced frames
    voiced = np.isfinite(f0_vals)
    if voiced.sum() < 20:
        raise RuntimeError("Not enough voiced frames for reliable PPE.")

    # 1) Hz → semitone
    st = hz_to_semitone(f0_vals, ref_hz=SEMITONE_REF_HZ)

    # 2) remove smooth variations (vibrato/microtremor) via light LPC whitening
    #    (interpolates small gaps internally)
    resid = lpc_whiten(st, order=LPC_ORDER)

    # 3) Entropy of residual semitone variations
    H_raw, H_norm = shannon_entropy_from_hist(resid[np.isfinite(resid)],
                                              bins=N_BINS, rng=HIST_RANGE)

    return {
        "filename": wav_path.name,
        "relpath": str(wav_path.as_posix()),
        "duration_sec": duration,
        "ppe_raw": H_raw,      # natural-log entropy of the histogram
        "ppe_norm": H_norm,    # normalized to 0–1 by log(#bins)
        "voiced_frames": int(voiced.sum()),
    }

def main():
    rows = []
    wavs = sorted([p for p in AUDIO_DIR.rglob("*.wav")])
    if not wavs:
        print(f"No .wav files under {AUDIO_DIR}")
        return

    for wav in wavs:
        try:
            feat = extract_ppe_for_file(wav)
            feat["error"] = ""
        except Exception as e:
            feat = {
                "filename": wav.name,
                "relpath": str(wav.as_posix()),
                "duration_sec": np.nan,
                "ppe_raw": np.nan,
                "ppe_norm": np.nan,
                "voiced_frames": 0,
                "error": str(e),
            }
        rows.append(feat)

    df = pd.DataFrame(rows, columns=[
        "filename", "relpath", "duration_sec", "ppe_raw", "ppe_norm", "voiced_frames", "error"
    ])
    df.to_csv(OUTPUT_CSV, index=False)
    print(f"Saved {len(df)} rows → {OUTPUT_CSV}")

if __name__ == "__main__":
    main()


References:

1)https://pmc.ncbi.nlm.nih.gov/articles/PMC11939921/#sec4-brainsci-15-00320

2)https://scholarworks.umt.edu/umcur/2012/poster_2/4/

3)https://blog.phonalyze.com/understanding-shimmer-in-voice-assessment-with-phonalyze/

4)https://www.sciencedirect.com/science/article/pii/S1877050918316739