# Synthetic experiments

This notebook contains code to reproduce the synthetic experiments from our paper.
We include examples for the RBF kernel and for the spectral mixture kernel with 4 components. Running these examples back to back in the same session will likely lead to out-of-memory (OOM) errors caused by TensorFlow memory issues. As mentioned in the README, the results may differ somewhat from those reported in the paper.
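For reference, the two base kernels compared here are commonly written in one dimension as an RBF (squared-exponential) kernel and a sum of Q Gaussian-times-cosine spectral components. The sketch below is a plain NumPy illustration of those standard forms, not SegCPGP's implementation; the function names and parameter values are hypothetical, and SegCPGP may parameterize and learn these kernels differently.

```python
import numpy as np


def rbf_kernel(x1, x2, variance=1.0, lengthscale=1.0):
    """RBF kernel: k(x, x') = variance * exp(-(x - x')^2 / (2 * lengthscale^2))."""
    tau = x1[:, None] - x2[None, :]  # pairwise differences x - x'
    return variance * np.exp(-0.5 * (tau / lengthscale) ** 2)


def spectral_mixture_kernel(x1, x2, weights, means, variances):
    """1-D spectral mixture kernel with Q = len(weights) components:
    k(tau) = sum_q w_q * exp(-2 pi^2 tau^2 v_q) * cos(2 pi tau mu_q)."""
    tau = x1[:, None] - x2[None, :]
    K = np.zeros(tau.shape)
    for w, mu, v in zip(weights, means, variances):
        # Each component is a Gaussian envelope times a cosine at frequency mu.
        K += w * np.exp(-2.0 * np.pi**2 * tau**2 * v) * np.cos(2.0 * np.pi * tau * mu)
    return K


# Hypothetical example: both kernels on 50 evenly spaced inputs,
# with a 4-component spectral mixture.
x = np.linspace(0.0, 10.0, 50)
K_rbf = rbf_kernel(x, x, variance=1.0, lengthscale=2.0)
K_sm = spectral_mixture_kernel(
    x,
    x,
    weights=[1.0, 0.5, 0.25, 0.1],
    means=[0.1, 0.5, 1.0, 2.0],
    variances=[0.01, 0.01, 0.01, 0.01],
)
```

At zero lag every component equals its weight, so the diagonal of `K_sm` is the sum of the weights; with Q = 1 and zero mean frequency the spectral mixture reduces to an RBF-shaped kernel.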

In [1]:
```python
import json
import os
import sys

import numpy as np
import pandas as pd
import tensorflow as tf
from tqdm import tqdm

# Make the cpgp package in the parent directory importable.
parent_directory = os.path.abspath("..")
sys.path.append(parent_directory)

from cpgp.metrics import f_measure
from cpgp.segcpgp import SegCPGP

# Load the annotation file.
with open(os.path.join(parent_directory, "data", "annotations.json")) as f:
    annotations = json.load(f)
```


In [2]:
```python
synthetic_dir = os.path.join(parent_directory, "data", "synthetic")
results = {}
for fname in tqdm(sorted(os.listdir(synthetic_dir))):  # For each synthetic dataset...
    with np.load(os.path.join(synthetic_dir, fname)) as data:
        X, y = data["X"], data["y"]

    # Fit SegCPGP with an RBF base kernel.
    segcpgp = SegCPGP()
    segcpgp.fit(X, y, base_kernel_name="rbf", verbose=False)

    # Score the detected locations against the hard-coded ground truth (100, 200, 300).
    locs = [loc[0] for loc in segcpgp.LOCS]
    fscore = f_measure({0: [100, 200, 300]}, locs)
    results[fname] = fscore

    tf.keras.backend.clear_session()  # Necessary to prevent a memory leak.
```

100%|██████████| 40/40 [12:33<00:00, 18.85s/it]
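The scoring step compares the detected locations against the hard-coded ground truth `[100, 200, 300]`. The internals of `cpgp.metrics.f_measure` are not shown in this notebook; as a rough illustration only, a margin-based F-measure for a single annotator can be sketched as below. The name `margin_f_measure` and the default `margin=5` are hypothetical choices for this sketch, and the real implementation may differ (for example in how multiple annotators, the margin, or the start of the series are handled).

```python
def margin_f_measure(true_cps, predicted_cps, margin=5, alpha=0.5):
    """Hypothetical sketch of a margin-based F-measure for change point detection.

    A true change point counts as detected if a not-yet-matched prediction lies
    within `margin` steps of it; each prediction can be matched at most once.
    """
    true_set = set(true_cps)
    pred_set = set(predicted_cps)
    unmatched = set(pred_set)
    true_positives = 0
    for tau in sorted(true_set):
        # Predictions close enough to this true change point.
        close = [x for x in unmatched if abs(x - tau) <= margin]
        if close:
            # Greedily match the nearest prediction and remove it from the pool.
            nearest = min(close, key=lambda x: (abs(x - tau), x))
            unmatched.remove(nearest)
            true_positives += 1

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(true_set) if true_set else 0.0
    if precision == 0.0 or recall == 0.0:
        return 0.0
    # Weighted harmonic mean of precision and recall; alpha = 0.5 gives F1.
    return precision * recall / (alpha * recall + (1 - alpha) * precision)


# Two of three true change points are found and one prediction (250) is spurious.
print(round(margin_f_measure([100, 200, 300], [101, 199, 250]), 3))  # 0.667
```

Matching each prediction at most once keeps a single detection from being credited to two nearby true change points.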


Compute the results: map each dataset to its category using the `synthetic-keys.csv` file, then take the mean F-measure within each category.

In [3]:
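```python
# One row per dataset: column 0 holds the file name, column 1 the F-measure.
rdf = pd.DataFrame(list(results.items()))
keys = pd.read_csv(os.path.join(parent_directory, "data", "synthetic-keys.csv"))
# Strip the file extension to get the identifier used in synthetic-keys.csv.
rdf["Identifier"] = rdf[0].apply(lambda s: s.split(".")[0])
# Attach each dataset's category and average the F-measure per category.
df = keys.merge(rdf, on="Identifier", how="outer")
df.groupby("Category").mean(numeric_only=True)
```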


| Category | Mean F-measure |
|---|---|
| mean | 0.880664 |
| periodicity-stable-mean | 0.740317 |
| trend | 0.664524 |
| variance | 0.548624 |
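Because the merge uses `how="outer"`, a dataset listed in `synthetic-keys.csv` without a score is kept with a missing value that `mean` skips, while a score whose identifier has no key gets a missing `Category` and is dropped by `groupby`. The toy frames below use invented identifiers and scores (prefixed `toy_` so they do not overwrite the notebook's variables) purely to illustrate this behavior; they are not experimental results.

```python
import pandas as pd

# Hypothetical toy inputs: "c" has a key but no score, "d" has a score but no key.
toy_results = {"a.npz": 1.0, "b.npz": 0.5, "d.npz": 0.25}
toy_keys = pd.DataFrame(
    {"Identifier": ["a", "b", "c"], "Category": ["mean", "mean", "trend"]}
)

toy_rdf = pd.DataFrame(list(toy_results.items()))
toy_rdf["Identifier"] = toy_rdf[0].apply(lambda s: s.split(".")[0])
toy_df = toy_keys.merge(toy_rdf, on="Identifier", how="outer")

# "mean" averages a and b; "trend" has only a missing score; "d" is dropped.
toy_summary = toy_df.groupby("Category").mean(numeric_only=True)
print(toy_summary)
```

A category whose datasets all failed to produce a score therefore shows up as `NaN` rather than silently disappearing, whereas unkeyed result files vanish from the summary.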
