# Exercises

---

## Exercise 1: structured and unstructured images
Write a function that returns the number of principal components (PCs) needed to retain a given fraction of the explained variance.
Use this function on the handwritten digit data, and plot the number of PCs needed to retain a given explained variance ratio (EVR), for a few values of EVR.
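
One possible starting point is sketched below. It assumes the explained variance ratios are already available as a 1-D array sorted in decreasing order (for example the `explained_variance_ratio_` attribute of a fitted scikit-learn `PCA`); the function name `n_components_for_variance` and the example values are illustrative only.

```python
import numpy as np

def n_components_for_variance(evr, target):
    """Return the smallest number of leading PCs whose cumulative
    explained variance ratio reaches `target` (a fraction in (0, 1])."""
    evr = np.asarray(evr, dtype=float)
    if not 0 < target <= 1:
        raise ValueError("target must be a fraction in (0, 1]")
    cumulative = np.cumsum(evr)
    # Index of the first cumulative value >= target, converted to a count.
    n = int(np.searchsorted(cumulative, target, side="left")) + 1
    # Guard against rounding when the ratios sum to slightly less than 1.
    return min(n, len(evr))

# Made-up ratios, only to show the call pattern.
example_evr = [0.5, 0.3, 0.15, 0.05]
counts = [n_components_for_variance(example_evr, t) for t in (0.4, 0.6, 0.9)]
print(counts)
```

Calling the function over a grid of target fractions gives the curve the exercise asks you to plot.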

Compare this curve to the one you obtain for unstructured images: either generate random images with the same shape as our data, or randomly permute the pixels in each image independently. How does the number of PCs needed to retain a given fraction of the variance in this case compare to the number needed for the digit images?
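
The pixel-permutation control could be sketched as follows. The data here are a synthetic low-rank stand-in for the digit images (an assumption, so that the block runs on its own), and the helper names `explained_variance_ratio`, `shuffle_pixels_per_image` and `n_for` are hypothetical. NumPy's `Generator.permuted` with `axis=1` shuffles each row independently, which destroys the correlations between pixels while keeping each image's pixel values.

```python
import numpy as np

def explained_variance_ratio(X):
    """EVR of each principal component of X (samples x features), via SVD."""
    Xc = X - X.mean(axis=0)
    singular_values = np.linalg.svd(Xc, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()

def shuffle_pixels_per_image(X, rng):
    """Permute the pixels of every image (row) independently."""
    return rng.permuted(X, axis=1)

def n_for(evr, target):
    """Number of leading PCs needed to reach the target cumulative EVR."""
    return min(int(np.searchsorted(np.cumsum(evr), target)) + 1, len(evr))

rng = np.random.default_rng(0)
# Synthetic "structured" data: 200 images of 64 pixels driven by 3 latent factors.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 64))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 64))
X_shuffled = shuffle_pixels_per_image(X, rng)

n_structured = n_for(explained_variance_ratio(X), 0.9)
n_shuffled = n_for(explained_variance_ratio(X_shuffled), 0.9)
print(n_structured, n_shuffled)
```

Replacing `X` with the flattened digit images gives the comparison the exercise asks for.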

## Exercise 2: cortical dynamics

Work with the cortical data we used in the lesson. Write some code that lets you easily choose a specific behavioural event from the `trial_data` dictionary (the recorded events are `CSp` for positive conditioned stimulus, `CSn` for negative conditioned stimulus and `Lick` for lick events), one of the principal components, an onset time and an offset time, and that plots the timecourse of this component in the selected period around the chosen event type.

Use this code to explore the data: how do different components behave? Do you notice any patterns? How would you test for the statistical significance of the effects that you see (if any)?

In [9]:
import numpy as np
import matplotlib.pyplot as plt
import pickle
from sklearn.decomposition import PCA

# === 1. Load the Data ===
voltage_data = np.load("voltage_signal.npy")  # Must be manually uploaded in Colab
mask = np.load("cortex_mask.npy")

with open("behaviour.pickle", "rb") as f:
    trial_data = pickle.load(f)

# === 2. Check Frame Information ===
frames = trial_data.get("frames", None)
if frames is None:
    raise ValueError("No 'frames' key found in trial_data.")
frame_diffs = np.diff(frames)
frame_rate = 1 / np.mean(frame_diffs)  # frames per timestamp unit (Hz only if timestamps are in seconds)
print(f"Estimated frame rate: {frame_rate:.4f} Hz")

# === 3. Reshape Voltage Data for PCA ===
if voltage_data.shape == (96, 61, 29999):
    # Expected shape: (n_pixels_x, n_pixels_y, n_frames)
    voltage_data = voltage_data.reshape(-1, voltage_data.shape[2])  # shape: (pixels, time)
    print(f"Reshaped voltage data to: {voltage_data.shape}")
else:
    raise ValueError(f"Unexpected shape: {voltage_data.shape}")

# === 4. Apply PCA ===
pca = PCA(n_components=10)
pc_scores = pca.fit_transform(voltage_data.T)  # Transpose: (time, pixels)

print(f"PCA result shape: {pc_scores.shape} (frames x PCs)")

# === 5. Define Plot Function ===
def plot_pc_around_event(event_name, pc_index=0, window=(-20, 50)):
    """
    Plot the average timecourse of a principal component around a behavioral event.
    :param event_name: 'CSp', 'CSn', or 'Lick'
    :param pc_index: which PC to analyze (0 = PC1)
    :param window: tuple (pre, post) in frames
    """
    assert event_name in trial_data, f"Event '{event_name}' not in trial_data"
    event_times = trial_data[event_name]
    valid_trials = []

    for t in event_times:
        start = t + window[0]
        end = t + window[1]
        if start >= 0 and end <= pc_scores.shape[0]:  # slice end is exclusive
            valid_trials.append(pc_scores[start:end, pc_index])

    if not valid_trials:
        print(f"⚠️ No valid trials found for event '{event_name}' in window {window}.")
        return

    valid_trials = np.array(valid_trials)
    mean_trace = valid_trials.mean(axis=0)
    std_trace = valid_trials.std(axis=0)

    time_axis = np.arange(window[0], window[1])

    plt.figure(figsize=(8, 4))
    plt.plot(time_axis, mean_trace, label=f"PC{pc_index+1}")
    plt.fill_between(time_axis, mean_trace - std_trace, mean_trace + std_trace, alpha=0.3)
    plt.axvline(0, color='red', linestyle='--', label='Event')
    plt.title(f"PC{pc_index+1} around '{event_name}' events")
    plt.xlabel("Time (frames)")
    plt.ylabel("PC value")
    plt.legend()
    plt.grid(True)
    plt.tight_layout()
    plt.show()

# === 6. Example Usage ===
plot_pc_around_event("CSp", pc_index=0, window=(-1000, 1000))
plot_pc_around_event("CSn", pc_index=1, window=(-1000, 1000))
plot_pc_around_event("Lick", pc_index=2, window=(-1000, 1000))


Estimated frame rate: 0.0025 Hz
Reshaped voltage data to: (5856, 29999)
PCA result shape: (29999, 10) (frames x PCs)
⚠️ No valid trials found for event 'CSp' in window (-1000, 1000).
⚠️ No valid trials found for event 'CSn' in window (-1000, 1000).
⚠️ No valid trials found for event 'Lick' in window (-1000, 1000).


In [10]:
# Examine the first few event timestamps and the frame range
print("CSp event times:", trial_data['CSp'][:5])
print("CSn event times:", trial_data['CSn'][:5])
print("Lick event times:", trial_data['Lick'][:5])
print("Frame range:", trial_data['frames'][0], "to", trial_data['frames'][-1])
print("Total frames recorded:", len(trial_data['frames']))


CSp event times: [ 280321 1660291 3107098 8821400 9449479]
CSn event times: [ 873276 2140765 3488066 4434228 5357890]
Lick event times: [285769 286518 343309 343310 344079]
Frame range: 1 to 12022719
Total frames recorded: 30000
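
The printed values suggest why the plotting function found no valid trials: the event times (for example 280321) appear to be timestamps on the same clock as `trial_data['frames']` (1 to 12022719), whereas `pc_scores` is indexed by frame number (0 to 29998). A minimal sketch of one way to bridge the two, assuming that events and frame timestamps really do share a clock and that the frame timestamps are sorted, is to look up each event in the frame timestamps with `np.searchsorted`. The helper names and the synthetic values below are illustrative, not part of the lesson code.

```python
import numpy as np

def events_to_frame_indices(event_times, frame_times):
    """Map each event timestamp to the index of the first frame at or after it."""
    event_times = np.asarray(event_times)
    frame_times = np.asarray(frame_times)
    idx = np.searchsorted(frame_times, event_times, side="left")
    # Drop events that occur after the last recorded frame.
    return idx[idx < len(frame_times)]

def event_aligned_traces(scores, event_idx, pre, post):
    """Stack scores[i - pre : i + post] for every event index i whose window fits."""
    n_frames = len(scores)
    keep = [i for i in event_idx if i - pre >= 0 and i + post <= n_frames]
    return np.array([scores[i - pre:i + post] for i in keep])

# Synthetic stand-in: 10 frames whose timestamps are 400 clock ticks apart.
frame_times = np.arange(1, 4001, 400)
event_times = np.array([401, 1000, 3700])   # the last event falls after the final frame
event_idx = events_to_frame_indices(event_times, frame_times)

scores = np.arange(10, dtype=float)          # stand-in for one column of pc_scores
traces = event_aligned_traces(scores, event_idx, pre=1, post=2)
print(event_idx, traces.shape)
```

With the real data, `frame_times` would be `trial_data['frames']` and `scores` one column of `pc_scores`. The window check also covers the one-frame mismatch between the 30000 frame timestamps and the 29999 voltage frames.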


## Exercise 3: nonlinear dimensionality reduction

Using the MNIST digit dataset, explore the effect of hyperparameters on the result of nonlinear dimensionality reduction methods.

- Change the `perplexity` and `exaggeration` parameters in `TSNE` (one at a time). Graphically compare the results of different values.
- Change the neighbourhood size in `Isomap` (set by the parameter `n_neighbors`), and visualize how the results change.
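
A hyperparameter sweep along these lines might be organised as below. This is only a sketch under several assumptions: it uses scikit-learn's `TSNE` and `Isomap`, the small 8x8 `load_digits` dataset as a stand-in for MNIST, and a 300-sample subset to keep the run time short. Note that scikit-learn's `TSNE` exposes `early_exaggeration`; if the lesson used a different t-SNE implementation, the parameter names may differ.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE, Isomap

# Small subset of the 8x8 digits, used here only as a quick stand-in.
X, y = load_digits(return_X_y=True)
X, y = X[:300], y[:300]

# One 2-D t-SNE embedding per perplexity value, all other settings fixed.
tsne_embeddings = {
    perplexity: TSNE(
        n_components=2, perplexity=perplexity, init="pca", random_state=0
    ).fit_transform(X)
    for perplexity in (5, 30)
}

# One 2-D Isomap embedding per neighbourhood size.
isomap_embeddings = {
    n_neighbors: Isomap(n_neighbors=n_neighbors, n_components=2).fit_transform(X)
    for n_neighbors in (10, 30)
}

for name, embeddings in (("t-SNE", tsne_embeddings), ("Isomap", isomap_embeddings)):
    for value, embedding in embeddings.items():
        print(name, value, embedding.shape)
```

Each embedding can then be drawn as a scatter plot coloured by `y`, one panel per hyperparameter value, to compare the results side by side.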

Finally, try to apply one or more of these methods to the cortical data we used in the lessons. How do the data look in two dimensions? Do any clusters seem to appear? Are they robust with respect to the choice of hyperparameters?