# Introduction

This notebook contains all of the scripts used to tune and apply t-distributed Stochastic Neighbor Embedding (tSNE) to the datasets cleaned and merged in the data_preprocessing_DR.ipynb notebook. The following 4 steps are taken in this notebook:

Step 1: Import the necessary libraries and datasets for the dimensionality reduction. The datasets cover every combination of subject group (MS/ALL), imputation method (three methods), and session grouping (Y00, Y05, or both sessions combined), giving 18 training datasets in total.

Step 2: Tune the hyperparameters (hps) of the tSNE algorithm (perplexity and learning rate) with 3-fold cross-validation, using the make_K_folds and tSNE_gridsearch functions. For the unique sessions datasets, the subjects are split into train and test subjects, which define the train and test datasets. For the combined sessions datasets, the train subjects at Y00 and Y05 are used for training and the test subjects at Y05 are used for testing. For each fold, tSNE and K-means are fitted on the train fold; the fitted tSNE then embeds the test fold, and K-means predicts cluster labels for the test embeddings. The adjusted Rand index (ARI) scores these predictions against the true test labels. This sequence is repeated for every fold, after which the ARI is averaged (AARI) for the given hp combination. The procedure is then repeated for every other combination of hp values. Finally, the make_gridsearch_tbl function creates a dataframe of the optimal hp values per dataset.

Step 3: The best hp values per dataset found during step 2 are used to make the training tSNE embeddings with the apply_tSNE function, which returns the embedded arrays for the unique sessions and combined sessions datasets. A two-dimensional plot of the embedded data is produced by the group_plot_tSNE function.

Step 4: The embedded tSNE arrays are saved and reserved for later use.

These datasets were used to assess the performance of tSNE in comparison with PCA, UMAP, and TPHATE. The statistical outcomes of part 1 of the project can be found in the 'Dimensionality Reduction' subsection of the results.

# Step 1: Import Libraries & Datasets

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.cluster import KMeans
from sklearn.model_selection import KFold
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE

In [None]:
# Import the combined sessions datasets
MS_imp_all = pd.read_excel('prepro_data/all_imp/MStrain_ia.xlsx')
ALL_imp_all = pd.read_excel('prepro_data/all_imp/ALLtrain_ia.xlsx')
MS_imp_type = pd.read_excel('prepro_data/type_imp/MStrain_it.xlsx')
ALL_imp_type = pd.read_excel('prepro_data/type_imp/ALLtrain_it.xlsx')
MS_imp_nb = pd.read_excel('prepro_data/nb_imp/MStrain_in.xlsx')
ALL_imp_nb = pd.read_excel('prepro_data/nb_imp/ALLtrain_in.xlsx')

# Import the unique sessions datasets (for the Time imputation method)
MS_t1_imp_all = pd.read_excel('prepro_data/all_imp/MStrain_ia00.xlsx')
MS_t2_imp_all = pd.read_excel('prepro_data/all_imp/MStrain_ia05.xlsx')
ALL_t1_imp_all = pd.read_excel('prepro_data/all_imp/ALLtrain_ia00.xlsx')
ALL_t2_imp_all = pd.read_excel('prepro_data/all_imp/ALLtrain_ia05.xlsx')

# Import the unique sessions datasets (for the Time + Type imputation method)
MS_t1_imp_type = pd.read_excel('prepro_data/type_imp/MStrain_it00.xlsx')
MS_t2_imp_type = pd.read_excel('prepro_data/type_imp/MStrain_it05.xlsx')
ALL_t1_imp_type = pd.read_excel('prepro_data/type_imp/ALLtrain_it00.xlsx')
ALL_t2_imp_type = pd.read_excel('prepro_data/type_imp/ALLtrain_it05.xlsx')

# Import the unique sessions datasets (for the Time + Neighbor imputation method)
MS_t1_imp_nb = pd.read_excel('prepro_data/nb_imp/MStrain_in00.xlsx')
MS_t2_imp_nb = pd.read_excel('prepro_data/nb_imp/MStrain_in05.xlsx')
ALL_t1_imp_nb = pd.read_excel('prepro_data/nb_imp/ALLtrain_in00.xlsx')
ALL_t2_imp_nb = pd.read_excel('prepro_data/nb_imp/ALLtrain_in05.xlsx')

# Group the datasets by imputation methods and then by unique session 
time_set_all = [MS_t1_imp_all, MS_t2_imp_all, ALL_t1_imp_all, ALL_t2_imp_all]
time_set_type = [MS_t1_imp_type, MS_t2_imp_type, ALL_t1_imp_type, ALL_t2_imp_type]
time_set_nb = [MS_t1_imp_nb,  MS_t2_imp_nb, ALL_t1_imp_nb, ALL_t2_imp_nb]
time_set_ls = [time_set_all, time_set_type, time_set_nb]

# Group the datasets by imputation methods and then by combined sessions
complete_set_ls = [[MS_imp_all, ALL_imp_all], [MS_imp_type, ALL_imp_type], [MS_imp_nb, ALL_imp_nb]]

# Step 2: Hyperparameter tuning for tSNE (perplexity & learning rate)

In [None]:
def make_K_folds(df):
    """
    INPUT: Dataframe
    OUTPUT: Nested list [x_train_list, x_test_list, y_test_list], each containing one entry per fold
    DESCRIPTION: Use k folds to split the input data into 3 splits, and return the x_train, x_test, and y_test for each of the
    folds as unique lists.
    """
    # Get the subject IDs & number of unique time points
    subjects = df['index'].unique()
    num_ses = len(df['Time'].unique())
    
    # Get the true labels
    label_col = [col for col in df.columns if col.startswith('MS')]
    labels = df[label_col[0]]
    
    # Normalize the dataframe
    norm_df = df.drop(columns=['EDSS', 'BL_Avg_cognition', 'Time', 'index'] + label_col, axis=1)
    norm_df = StandardScaler().fit_transform(norm_df)
    
    # Initialize KFold for subjects
    kfold = KFold(n_splits = 3, shuffle = True, random_state = 42)

    # Initialize output list
    x_train_list, x_test_list, y_test_list = [], [], []
    
    if num_ses == 1: # For unique session datasets

        for train_indices, test_indices in kfold.split(norm_df):
            # Get the training and test data for fold
            x_train, x_test = norm_df[train_indices], norm_df[test_indices]
            # KFold yields positional indices, so select the labels by position
            y_test = labels.iloc[test_indices]

            # Add the fold's training and test data to the corresponding list
            x_train_list.append(x_train)
            x_test_list.append(x_test)
            y_test_list.append(y_test)
        
    else: # For combined sessions datasets
        # Split on subjects so that both sessions of a subject stay on the same side of the split
        for train_idx, test_idx in kfold.split(subjects):
            # Get the subject IDs for the training and test sets
            train_subjects, test_subjects = subjects[train_idx], subjects[test_idx]

            # Train rows: train subjects at both time points | test rows: test subjects at time point 2
            train_mask = df['index'].isin(train_subjects).to_numpy()
            test_mask = (df['index'].isin(test_subjects) & (df['Time'] == 2)).to_numpy()

            # Get the training and test data
            x_train, x_test = norm_df[train_mask], norm_df[test_mask]
            y_test = labels[test_mask]

            # Add the fold's training and test data to the corresponding list
            x_train_list.append(x_train)
            x_test_list.append(x_test)
            y_test_list.append(y_test)
    
    return [x_train_list, x_test_list, y_test_list]
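
The subject-level split used for the combined sessions datasets can be illustrated with a small standard-library sketch. The subject IDs, the `subject_folds` helper, and the toy rows below are hypothetical and only show the idea: subjects, not rows, are dealt into folds, training uses both sessions of the train subjects, and testing uses only the second session of the test subjects.

```python
import random

def subject_folds(subject_ids, n_splits=3, seed=42):
    """Shuffle the unique subject IDs and deal them into n_splits test folds."""
    unique_ids = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    return [unique_ids[i::n_splits] for i in range(n_splits)]

# Hypothetical toy data: (subject ID, session) rows, two sessions per subject
rows = [(s, t) for s in ['s1', 's2', 's3', 's4', 's5', 's6'] for t in (1, 2)]

for test_subjects in subject_folds([s for s, _ in rows]):
    test_set = set(test_subjects)
    # Train on both sessions of the train subjects
    train_rows = [r for r in rows if r[0] not in test_set]
    # Test on the second session of the test subjects only
    test_rows = [r for r in rows if r[0] in test_set and r[1] == 2]
    # No subject may appear on both sides of the split
    assert not {s for s, _ in train_rows} & {s for s, _ in test_rows}
    print(len(train_rows), len(test_rows))
```

With six subjects and three folds, every fold holds out two subjects, so each iteration trains on eight rows and tests on two.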

In [None]:
def tSNE_gridsearch(ls_ls_df, hp_dic):
    """
    INPUT: 
    ls_ls_df : (Nested lists of dataframes)
    hp_dic : (dictionary) lists of hp values stored under the keys 'perplexities' and 'learning_rates'
    OUTPUT: Nested lists of arrays (average adjusted Rand index scores, perplexities x learning rates)
    DESCRIPTION: Find the optimal perplexity value & learning rate to include for each of the df based on their 
    obtained average adjusted Rand index
    """
    # Define the perplexity, learning rate ranges and the number of runs
    perplexities = hp_dic['perplexities']
    learning_rates = hp_dic['learning_rates']
    n_runs = 2
    
    # Initialize the output list to maintain same structure as input list
    output_rand_indices = []

    # Iterate over the imputation methods lists
    for m, sublist in enumerate(ls_ls_df):
        sublist_rand_indices = []
        for n, df in enumerate(sublist):
            # Get the number of clusters needed for the df
            n_clusters = 2 if 'MS' in df.columns else 4
            
            # Get k-fold splits of the data
            k_fold_ls = make_K_folds(df)
            
            # Initializes the rand indices array (per df)           
            rand_indices = np.zeros((len(perplexities), len(learning_rates)))  

            # Perform grid search
            for i, perplexity in enumerate(perplexities):
                for j, learning_rate in enumerate(learning_rates):
                    temp_ARIs = []
                    for k in range(0, len(k_fold_ls[0])):
                        x_train = k_fold_ls[0][k]
                        x_test = k_fold_ls[1][k]
                        y_test = k_fold_ls[2][k]
                        
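                        # NOTE: sklearn.manifold.TSNE has no transform() method, so it cannot embed the test fold.
                        # Assumption: the openTSNE package is used here instead, as its fitted embedding supports it.
                        from openTSNE import TSNE as OpenTSNE

                        for run in range(n_runs):
                            # Initialise and fit tSNE model
                            tsne_model = OpenTSNE(n_components = 2, perplexity = perplexity,
                                                  learning_rate = learning_rate, random_state = k)
                            train_embedding = tsne_model.fit(x_train)
                            x_train_tsne = np.asarray(train_embedding)
                            x_test_tsne = np.asarray(train_embedding.transform(x_test))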
                            
                            # Fit & apply K-means clustering
                            kmeans = KMeans(n_clusters = n_clusters, random_state = 42, n_init = 'auto')
                            kmeans.fit(x_train_tsne)
                            y_pred = kmeans.predict(x_test_tsne)
                            temp_ARIs.append(adjusted_rand_score(y_test, y_pred))

                    # Get average ARI score
                    rand_indices[i, j] = np.mean(temp_ARIs)
            
            # Add AARI score array per df to sublist (for imputation type)
            sublist_rand_indices.append(rand_indices)
            print(f'Gridsearch for dataset {n} of type list {m} is completed')
        
        # Add (imputation type) sublist to output list
        output_rand_indices.append(sublist_rand_indices)
        
    return output_rand_indices
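
The score that the grid search averages can be worked through by hand. The sketch below is a plain-Python version of the adjusted Rand index built from pair counts; the `adjusted_rand_index` helper is illustrative only (the notebook itself relies on `sklearn.metrics.adjusted_rand_score`), and the labels are made-up toy values.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index computed from the contingency table of two labelings."""
    total_pairs = comb(len(labels_true), 2)
    # Pairs of samples that share a cluster in both labelings
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs that share a cluster within each labeling separately
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    # (index - expected) / (max - expected), with both terms scaled by total_pairs
    numerator = sum_cells * total_pairs - sum_true * sum_pred
    denominator = 0.5 * (sum_true + sum_pred) * total_pairs - sum_true * sum_pred
    if denominator == 0:  # degenerate case, e.g. one cluster in both labelings
        return 1.0
    return numerator / denominator

# Cluster IDs are arbitrary: a relabelled but identical partition scores 1.0
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
# A partition that cuts across the true groups scores below zero
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # -0.5
```

Because the index is invariant to how the clusters are numbered, the K-means cluster IDs do not have to be matched to the true label values before scoring.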

In [None]:
def make_row_names():
    """
    INPUT: 
    OUTPUT: List of strings
    DESCRIPTION: Creates the row names corresponding to the different datasets included in the make_gridsearch_tbl 
    function. The order matches time_set_ls followed by complete_set_ls (18 names in total).
    """
    ls_row_names = []
    ls_types = ['imp_all', 'imp_type', 'imp_nb']
    ls_subjects = ['MS_', 'ALL_']
    
    # Unique sessions datasets: imputation method -> subject group -> time point
    for str3 in ls_types:
        for str1 in ls_subjects:
            for str2 in ['t1_', 't2_']:
                name = str1 + str2 + str3
                ls_row_names.append(name)
    # Combined sessions datasets: imputation method -> subject group
    for str2 in ls_types:
        for str1 in ls_subjects:
            name = str1 + str2
            ls_row_names.append(name)   
    
    return ls_row_names

In [None]:
def make_gridsearch_tbl(time_RI, comp_RI, hp_dic):
    """
    INPUT: 
    time_RI : (nested lists of arrays) arrays AARI float for time split datasets
    comp_RI : (nested lists of arrays) arrays AARI float for time combined datasets
    OUTPUT: Dataframe
    DESCRIPTION: Create a table (df) with dataset names in column 1, best perplexity values in column 2, best 
    learning rate value in column 3, and gridsearch AARI scores in column 4. 
    """
    # Make row names
    dataset_names = make_row_names()
    
    perplexities = hp_dic['perplexities']
    learning_rates = hp_dic['learning_rates']

    # Initialize lists to store maximum values, row indices, and column indices
    max_values = []
    max_perp_indices = [] # rows
    max_LR_indices = [] #cols

    # Iterate over time list (time_RI)
    for sublist in time_RI:
        for array in sublist:
            # Append the maximum value and its indices to the respective lists
            max_values.append(array.max())
            max_perp_indices.append(np.argwhere(array == array.max())[0][0]) # first occurrence | row index
            max_LR_indices.append(np.argwhere(array == array.max())[0][1]) # first occurrence | col index
    
    # Iterate over combined time list (comp_RI)
    for sublist in comp_RI:
        for array in sublist:
            # Append the maximum value and its indices to the respective lists
            max_values.append(array.max())
            max_perp_indices.append(np.argwhere(array == array.max())[0][0]) # first occurrence | row index
            max_LR_indices.append(np.argwhere(array == array.max())[0][1]) # first occurrence | col index
    
    # Create the DataFrame
    output_df = pd.DataFrame({
        'Dataset': dataset_names,
        'Best_Perplexity': [perplexities[i] for i in max_perp_indices],
        'Best_Learning_Rate': [learning_rates[j] for j in max_LR_indices],
        'Best_ARI_Score': max_values})
     
    output_df.to_excel('output/tSNE/best_gridsearch_per_dataset_tbl.xlsx', index = False)
    return output_df
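
How the best hp pair is read from one AARI grid can be shown on a toy array. The values below are hypothetical and only illustrate the lookup: rows index the perplexities, columns index the learning rates, and ties are resolved in favour of the first maximum in row-major order, which matches the "first occurrence" rule used in make_gridsearch_tbl.

```python
import numpy as np

# Hypothetical AARI grid: rows = perplexities, columns = learning rates
perplexities = [5, 30, 50]
learning_rates = [10, 100]
aari = np.array([[0.10, 0.25],
                 [0.40, 0.40],
                 [0.30, 0.15]])

# argmax scans the flattened (row-major) array and returns the first maximum;
# unravel_index converts that flat position back into (row, column)
row, col = np.unravel_index(np.argmax(aari), aari.shape)
best = (perplexities[row], learning_rates[col], float(aari[row, col]))
print(best)  # (30, 10, 0.4)
```

In this toy grid the maximum of 0.40 occurs twice in the second row, and the lower learning rate is reported because it comes first.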

In [None]:
def plot_tSNE_gridsearch(time_RI, comp_RI, hp_dic):
    """
    INPUT: 
    time_RI : (Nested lists of floats) nested lists of gridsearch AARI scores for time split datasets
    comp_RI : (Nested lists of floats) nested lists of gridsearch AARI scores for time combined datasets
    OUTPUT: 3 figures
    DESCRIPTION: Plot the AARI scores for each tSNE gridsearch run. Figures 1-3 correspond to different 
    imputation methods and plots 1 and 2 represent the pwMS vs HC + pwMS datasets.
    """
    # Define the hyperparameters
    perplexities = hp_dic['perplexities']
    learning_rates = hp_dic['learning_rates']

    # Define the lists to iterate over for plotting & naming. 
    df_ind_range = [0, 2, 1, 3]
    imp_types = ['All Imputation', 'Type Imputation', 'Neighbor Imputation']
    time_point = ['1', '1', '2', '2', 'Combined']
    file_name = ['all_imp', 'type_imp', 'neighbor_imp']
    MS_col_ls = ['#D7BDE2', '#A569BD', '#7D3C98', '#4A235A']
    ALL_col_ls = ['#A9DFBF', '#27AE60', '#1E8449', '#145A32']

    # Make 3 main figures (imputation types)
    for fig_num in range(1, 4):
        plt.figure(figsize=(18, 24))

        # Make 6 main plots (MS & ALL)
        for plot_num in range(1, 7):
            plt.subplot(3, 2, plot_num)

            if plot_num % 2 != 0 and plot_num != 5:
                df_ind = df_ind_range[plot_num - 1]
                plot_title = ('MS Patients Only', time_point[plot_num - 1])
                for i, lr in enumerate(learning_rates):
                    plt.plot(perplexities, time_RI[fig_num - 1][df_ind][:,i], label = f'learning rate={learning_rates[i]}', color = MS_col_ls[i])

            elif plot_num % 2 == 0 and plot_num != 6:
                df_ind = df_ind_range[plot_num - 1]
                plot_title = ('All Patients', time_point[plot_num - 1])
                for i, lr in enumerate(learning_rates):
                    plt.plot(perplexities, time_RI[fig_num - 1][df_ind][:,i], label = f'learning rate={learning_rates[i]}', color = ALL_col_ls[i])
                
            elif plot_num == 5:
                plot_title = ('MS Patients Only', time_point[4])
                for i, lr in enumerate(learning_rates):
                    plt.plot(perplexities, comp_RI[fig_num - 1][plot_num - 5][:,i], label = f'learning rate={learning_rates[i]}', color = MS_col_ls[i])

            elif plot_num == 6:
                plot_title = ('All Patients', time_point[4])
                for i, lr in enumerate(learning_rates):
                    plt.plot(perplexities, comp_RI[fig_num - 1][plot_num - 5][:,i], label = f'learning rate={learning_rates[i]}', color = ALL_col_ls[i])

            plt.xlabel('Perplexity')
            plt.ylabel('Average Adjusted Rand Index')
            plt.title(f'{imp_types[fig_num - 1]} for {plot_title[0]} Dataset at Time Point {plot_title[1]}')
            plt.legend(title = 'Learning Rates')
            plt.grid(True)
        
        plt.tight_layout()
        plt.savefig(f'output/tSNE/{file_name[fig_num - 1]}/Gridsearch_plots_figure_{file_name[fig_num - 1]}_{plot_title[1]}.png')
        plt.show()

In [None]:
# Define the hps to use in gridsearch
param_dict = {
    'perplexities': [2, 5, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120],
    'learning_rates': [0.1, 0.01, 0.001, 0.0001]
}

# Run the tSNE gridsearch
time_gridsearch = tSNE_gridsearch(time_set_ls, param_dict)
comp_gridsearch = tSNE_gridsearch(complete_set_ls, param_dict)

# Make table with the best perplexity, learning rate and corresponding AARI score
best_gridsearch_df = make_gridsearch_tbl(time_gridsearch, comp_gridsearch, param_dict)

# Save the gridsearch table as an excel file
best_gridsearch_df.to_excel('output/tSNE/best_gridsearch_per_impdata.xlsx')

# Plot the gridsearch runs
plot_tSNE_gridsearch(time_gridsearch, comp_gridsearch, param_dict)

# Step 3: Apply tSNE with Gridsearch Results (+ plot)

In [None]:
def group_plot_tSNE(time_arrays, time_ls, comp_arrays, comp_ls, best_GS_df):
    """
    INPUT: 
    time_arrays : (nested lists of arrays) nested lists with arrays of the tSNE embeddings for the time separated datasets
    time_ls : (nested lists of dataframes) nested lists with dataframe for the time separated datasets
    comp_arrays : (nested lists of arrays) nested lists with arrays of the tSNE embeddings for the time combined datasets
    comp_ls : (nested lists of dataframes) nested lists with dataframe for the time combined datasets
    best_GS_df : (dataframe) dataframe of the gridsearch outcomes
    OUTPUT: 3 figures of 3 by 2 subplots
    DESCRIPTION: Creates two-dimensional plots of the tSNE embedded dataframes
    """
    # Make list of file names for saving, and list to order the plots within the figure
    file_name = ['all_imp', 'type_imp', 'neighbor_imp']
    ordered_ls = [0,2,1,3,0,1]
    GS_ints = [0,2,1,3,12,13]
    
    # Make n main figures (imputation types)
    for fig_num in range(1, len(time_ls) + 1):
        plt.figure(figsize=(18, 24))

        # Make m main plots
        num_plots = len(time_arrays[0]) + len(comp_arrays[0])
        for plot_num, df_ind in enumerate(ordered_ls):
            plt.subplot(int(num_plots/2), 2, plot_num + 1)
            
            # Assign a df and array to plot
            label_df = time_ls[fig_num - 1][df_ind] if plot_num < len(time_arrays[0]) else comp_ls[fig_num - 1][df_ind]
            plotting_array = time_arrays[fig_num - 1][df_ind] if plot_num < len(time_arrays[0]) else comp_arrays[fig_num - 1][df_ind]    
            
            # Define the plot colours & label column
            label_col = [col for col in label_df.columns if col.startswith('MS')]
            color_map = {0: 'pink', 1: 'orange', 2: 'purple'} if label_col[0] == 'MStype' else {0: 'green', 1: 'purple'}
            legend_labels = {0: 'PPMS', 1: 'SPMS', 2: 'RRMS'} if label_col[0] == 'MStype' else {0: 'HC', 1: 'MS'}
            mapped_colors = label_df['MStype'].map(color_map) if label_col[0] == 'MStype' else label_df['MS'].map(color_map)  

            # Make plot
            for category, color in color_map.items():
                indices = label_df[label_col[0]] == category
                plt.scatter(plotting_array[indices, 0], plotting_array[indices, 1], 
                            c = color, label = legend_labels[category], alpha=0.7)
            
            # Make plot labels
            df_name = best_GS_df.iloc[GS_ints[plot_num], 0]
            perplexity = best_GS_df.iloc[GS_ints[plot_num], 1]
            learning_rate = best_GS_df.iloc[GS_ints[plot_num], 2]
            plt.xlabel('t-SNE Component 1')
            plt.ylabel('t-SNE Component 2')
            plt.title(f'2D t-SNE for {df_name} Dataset (Perplexity={perplexity}, Learning Rate={learning_rate})')
            plt.legend()
            plt.grid(True)  
        
        GS_ints = [x + 4 if i < 4 else x + 2 for i, x in enumerate(GS_ints)]

        # Make figures and save
        plt.tight_layout()
        plt.savefig(f'output/tSNE/{file_name[fig_num - 1]}/best_param_tsne_plots_{file_name[fig_num - 1]}_multicolor.png')
        plt.show()

In [None]:
def apply_tSNE(time_ls, comp_ls, best_GS_df, plot_param):
    """
    INPUT:
    time_ls : (nested lists of dataframes) nested lists with dataframe for the time separated datasets
    comp_ls : (nested lists of dataframes) nested lists with dataframe for the time combined datasets
    best_GS_df : (dataframe) dataframe of the gridsearch outcomes
    plot_param : (Boolean) True/False make a 2-dimensional plot of the tSNE embeddings
    OUTPUT: 2 nested lists of arrays (tSNE embeddings for the time separated and the time combined datasets)
    DESCRIPTION: Applies tSNE fitting to each of the dataframes in the given list of nested dataframes, based on
    its optimal perplexity and learning rate values. Plots the output arrays of the tSNE fittings if plot_param is
    True.
    """
    # Initialise output lists
    output_time_ls, output_comp_ls = [], []
    
    # Needed to iterate through best_GS_df 
    counter = 0
    
    # Iterate through the unique sessions dataset
    for sublist in time_ls:
        type_list = []
        
        for df in sublist:
            # Get name, perplexity, learning rate and label column name for the df
            df_name = best_GS_df.iloc[counter, 0]
            perplexity = best_GS_df.iloc[counter, 1]
            learning_rate = best_GS_df.iloc[counter, 2]
            label_col = [col for col in df.columns if col.startswith('MS')]

            # Remove target variables and session column (as in make_K_folds), then normalise the df
            norm_df = df.drop(columns = ['EDSS', 'BL_Avg_cognition', 'Time', 'index'] + label_col)
            norm_df = StandardScaler().fit_transform(norm_df) 

            # Run the tSNE model
            tsne_model = TSNE(n_components = 2, perplexity = perplexity, learning_rate = learning_rate, random_state = 42)
            tsne_array = tsne_model.fit_transform(norm_df)

            type_list.append(tsne_array)

            # Update the counter
            counter += 1

        output_time_ls.append(type_list)
        

    for sublist in comp_ls:
        type_list = []
        
        for df in sublist:
            # Get name, perplexity, learning rate and label column name for the df
            df_name = best_GS_df.iloc[counter, 0]
            perplexity = best_GS_df.iloc[counter, 1]
            learning_rate = best_GS_df.iloc[counter, 2]
            label_col = [col for col in df.columns if col.startswith('MS')]

            # Remove target variables and session column (as in make_K_folds), then normalise the df
            norm_df = df.drop(columns = ['EDSS', 'BL_Avg_cognition', 'Time', 'index'] + label_col)
            norm_df = StandardScaler().fit_transform(norm_df) 

            # Run the tSNE model
            tsne_model = TSNE(n_components = 2, perplexity = perplexity, learning_rate = learning_rate, random_state = 42)
            tsne_array = tsne_model.fit_transform(norm_df)

            type_list.append(tsne_array)

            # Update the counter
            counter += 1

        output_comp_ls.append(type_list)
                       
    # Plotting condition (if true, plots are generated)
    if plot_param:
        group_plot_tSNE(output_time_ls, time_ls, output_comp_ls, comp_ls, best_GS_df)
     
    return output_time_ls, output_comp_ls

In [None]:
# Run step 3 (apply_tSNE)
time_tsne_arrays, comp_tsne_arrays = apply_tSNE(time_set_ls, complete_set_ls, best_gridsearch_df, True)                          

# Step 4: Save the tSNE embedded dataset

In [None]:
def export_tSNE_embedings(ls_ls_tsne_array, ls_ls_matching_df):
    """
    INPUT:
    ls_ls_tsne_array : (nested list of np.array) nested list of arrays of the tSNE embeddings
    ls_ls_matching_df : (nested list of pd.dataframe) nested list of the original pre-embedding dataframes
    OUTPUT: None (one excel file is written per embedding)
    DESCRIPTION: Exports the tSNE embeddings as dataframes with the same index/subject IDs as their original dataset. 
    """
    imp_file = ['all', 'type', 'neighbor']
    imp_df = ['ia', 'it', 'in']

    for imp_idx, imp_ls in enumerate(ls_ls_tsne_array):
        for emb_idx, tsne_emb in enumerate(imp_ls):
            # Make a dataframe from the array
            output_df = pd.DataFrame(tsne_emb, columns = [f'tSNE{i+1}' for i in range(tsne_emb.shape[1])])

            # Reintroduce the patients ID (index)
            output_df['index'] = ls_ls_matching_df[imp_idx][emb_idx]['index'].reset_index(drop = True)
            columns = ['index'] + [col for col in output_df.columns if col != 'index']
            
            # Reorder the columns such that index is first
            output_df = output_df[columns]

            sub_type = ''
            year = ''
            # Check for time split list or not
            if len(ls_ls_matching_df[0]) > 3:
                sub_type = 'MStrain_' if emb_idx < 2 else 'ALLtrain_'
                year = '00' if emb_idx % 2 == 0 else '05'

                # Save the new df as an excel file
                output_df.to_excel(f'output/tSNE/{imp_file[imp_idx]}/tSNE_{sub_type}{imp_df[imp_idx]}{year}.xlsx', index=False)
            
            else:
                sub_type = 'MStrain_' if emb_idx == 0 else 'ALLtrain_'

                # Save the new df as an excel file
                output_df.to_excel(f'output/tSNE/{imp_file[imp_idx]}/tSNE_{sub_type}{imp_df[imp_idx]}.xlsx', index=False)          

In [None]:
# Run the export_tSNE_embedings function for the unique sessions and the combined sessions.
export_tSNE_embedings(time_tsne_arrays, time_set_ls)
export_tSNE_embedings(comp_tsne_arrays, complete_set_ls)