# Tutorial 3: Merging Timeseries With ``merge_dicts``

``merge_dicts()`` combines timeseries data from different tasks and sessions, enabling analyses
that identify similar CAPs across these tasks, sessions, or both. This is only useful when the tasks and sessions
include the same subjects. The function produces a merged dictionary containing only the subject IDs present in all
input dictionaries. Additionally, while the run IDs across tasks do not need to match, the timeseries of the same
run IDs across dictionaries will be appended. Note that successful merging requires all dictionaries to contain the
same number of columns/ROIs.
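
The merging rule described above can be illustrated with a small NumPy-only sketch. This is not the ``neurocaps`` implementation; ``merge_sketch`` is a hypothetical helper, and it assumes that a run ID found in only one dictionary is carried over unchanged.

```python
import numpy as np


def merge_sketch(dicts):
    """Conceptual stand-in for the merging rule; not the neurocaps implementation."""
    # Keep only the subject IDs present in every input dictionary
    shared = set(dicts[0]).intersection(*dicts[1:])
    merged = {}
    for subj_id in sorted(shared):
        merged[subj_id] = {}
        for d in dicts:
            for run_id, ts in d[subj_id].items():
                if run_id in merged[subj_id]:
                    # Same run ID seen in an earlier dictionary: append along the time axis.
                    # np.vstack fails if the number of columns (ROIs) differs.
                    merged[subj_id][run_id] = np.vstack([merged[subj_id][run_id], ts])
                else:
                    # Assumption: a run ID unique to one dictionary is kept as is
                    merged[subj_id][run_id] = ts
    return merged


# Toy data: 4 ROIs, subject "1" is missing from the second dictionary
pre = {
    "0": {"run-0": np.zeros((10, 4)), "run-1": np.zeros((10, 4))},
    "1": {"run-0": np.zeros((10, 4))},
}
post = {"0": {"run-0": np.ones((20, 4))}}

merged = merge_sketch([pre, post])
print(list(merged))  # ['0']
print({run_id: ts.shape for run_id, ts in merged["0"].items()})
# {'run-0': (30, 4), 'run-1': (10, 4)}
```

Subject "1" is dropped because it is absent from ``post``, and only ``run-0`` grows to 30 timepoints because it is the only run ID shared by both dictionaries.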

In [None]:
# Download packages
!pip install neurocaps[windows,demo]

In [None]:
import numpy as np
from neurocaps.analysis import merge_dicts

# Simulate two subject_timeseries dictionaries
# First dictionary contains 3 subjects, each with three runs that have 10 timepoints and 100 rois
subject_timeseries_session_pre = {str(x): {f"run-{y}": np.random.rand(10, 100) for y in range(3)} for x in range(3)}

# Deleting run-2 for subject 2; situation where subject 2 only completed two runs of a task
del subject_timeseries_session_pre["2"]["run-2"]

# Second dictionary contains 2 subjects, each with a single run that has 20 timepoints and 100 rois
subject_timeseries_session_post = {str(x): {f"run-{y}": np.random.rand(20, 100) for y in range(1)} for x in range(2)}

# `subject_timeseries_list` also accepts pickle files, and the modified dictionaries can be saved as pickles.
subject_timeseries_merged = merge_dicts(
    subject_timeseries_list=[subject_timeseries_session_pre, subject_timeseries_session_post],
    return_merged_dict=True,
    return_reduced_dicts=False,
)

for subj_id in subject_timeseries_merged["merged"]:
    for run_id in subject_timeseries_merged["merged"][subj_id]:
        timeseries = subject_timeseries_merged["merged"][subj_id][run_id]
        print(f"sub-{subj_id}; {run_id} shape is {timeseries.shape}")


In [None]:
# The original dictionaries can also be returned. The only modification is that the originals will
# contain only the subjects present across all dictionaries in the list. Note that the "dict_#" IDs correspond
# to the index of each subject timeseries in `subject_timeseries_list`. `subject_timeseries_list` also
# accepts pickle files.
merged_dicts = merge_dicts(
    subject_timeseries_list=[subject_timeseries_session_pre, subject_timeseries_session_post],
    return_merged_dict=True,
    return_reduced_dicts=True,
)

for dict_id in merged_dicts:
    for subj_id in merged_dicts[dict_id]:
        for run_id in merged_dicts[dict_id][subj_id]:
            timeseries = merged_dicts[dict_id][subj_id][run_id]
            print(f"For {dict_id} sub-{subj_id}; {run_id} shape is {timeseries.shape}")

CAPs can be derived using the merged subject timeseries data. This analysis will identify CAPs present across sessions
or tasks.

In [None]:
from neurocaps.analysis import CAP

cap_analysis = CAP()

# Deriving CAPs from the merged timeseries data
cap_analysis.get_caps(
    merged_dicts["merged"], n_clusters=range(2, 8), cluster_selection_method="davies_bouldin", show_figs=True
)

Each reduced subject timeseries (representing a session or task) can then be used to compute the temporal dynamics
of the CAPs previously identified from the merged timeseries. The resulting files can be used to perform analyses
assessing how the same CAPs changed across time, tasks, or both time and tasks. Note that if ``standardize`` was set
to True in ``CAP.get_caps()``, then the column (ROI) means and standard deviations computed from the concatenated data
used to obtain the CAPs are also used to standardize each subject in the timeseries data passed to
``CAP.calculate_metrics()``. This ensures proper CAP assignments for each subject's frames.
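
The standardization step described above can be sketched with plain NumPy. This is only an illustration of the idea, not the library's code: the variable names are hypothetical, and the use of the sample standard deviation (``ddof=1``) is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a merged subject timeseries dictionary (5 ROIs)
subject_timeseries = {
    "0": {"run-0": rng.random((10, 5))},
    "1": {"run-0": rng.random((20, 5))},
}

# Stack every frame from every subject and run into one (frames x ROIs) array
concatenated = np.vstack([ts for runs in subject_timeseries.values() for ts in runs.values()])

# Column (ROI) statistics from the concatenated data; ddof=1 is an assumption
roi_means = concatenated.mean(axis=0)
roi_stds = concatenated.std(axis=0, ddof=1)


def standardize_with_group_stats(ts):
    """Scale one timeseries with the group-level ROI means and standard deviations."""
    return (ts - roi_means) / roi_stds


# Each subject is scaled with the same group-level statistics, not its own
scaled = {
    subj_id: {run_id: standardize_with_group_stats(ts) for run_id, ts in runs.items()}
    for subj_id, runs in subject_timeseries.items()
}
```

Reusing the group-level statistics keeps new frames in the same coordinate space as the cluster centroids, which is why frames are compared against the CAPs on a consistent scale.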

In [None]:
import os

cap_analysis.calculate_metrics(
    merged_dicts["dict_0"],
    continuous_runs=False,
    metrics=["persistence"],
    output_dir=os.getcwd(),
    prefix_filename="session-pre",
)

**Note that because each subject in ``dict_1`` has only a single run, the run names do not change to "run-continuous".**

In [None]:
cap_analysis.calculate_metrics(
    merged_dicts["dict_1"],
    continuous_runs=True,
    metrics=["persistence"],
    output_dir=os.getcwd(),
    prefix_filename="session-post",
)
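
The run-naming behavior noted above can be pictured with a short sketch. ``to_continuous`` is a hypothetical helper written for illustration only, and joining the per-run CAP label sequences in their stored order is an assumption about what treating runs as continuous means, not a description of the library's internals.

```python
import numpy as np


def to_continuous(run_labels):
    """Hypothetical helper: join per-run CAP label sequences into one sequence."""
    if len(run_labels) == 1:
        # A single run is left untouched, so its original run name is kept
        return dict(run_labels)
    # Assumption: runs are joined in the order they are stored in the dictionary
    joined = np.concatenate(list(run_labels.values()))
    return {"run-continuous": joined}


single = to_continuous({"run-0": np.array([1, 1, 2])})
multiple = to_continuous({"run-0": np.array([1, 1, 2]), "run-1": np.array([2, 3])})

print(list(single))    # ['run-0']
print(list(multiple))  # ['run-continuous']
```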