# Data Process

This notebook processes the simulation data in `raw_sim_data`, following the same procedure used to build `Asymm_v_T0_121323.xlsx`.

## Notebook Configuration

In [6]:
# import packages
import os
from pathlib import Path

import pandas as pd
import numpy as np
import scipy.io as sio

from rich import print
from tqdm import tqdm

In [4]:
# define the path to the raw simulation data
raw_data = Path("raw_sim_data")

# retrieve the .mat files and display them
mat_files = list(raw_data.glob("*.mat"))
mat_files

[PosixPath('raw_sim_data/T0_5_2.mat'),
 PosixPath('raw_sim_data/T0_5_1.mat'),
 PosixPath('raw_sim_data/T0_5.mat')]
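
`Path.glob` yields matches in no guaranteed order, so `T0_5_2.mat` can appear before `T0_5.mat`, as it does above. If the rows should come out in a predictable order, one option is a natural sort on the file stem. This is a minimal sketch that assumes the names follow the pattern seen here (`T0_<value>` with an optional `_<run>` suffix). `natural_key` is a hypothetical helper and is not part of the original notebook.

```python
import re
from pathlib import Path


def natural_key(path: Path) -> list:
    """Split a file stem into text and integer chunks so numeric parts sort numerically.

    Hypothetical helper: assumes stems like T0_5, T0_5_1, T0_5_2.
    """
    # re.split with a capturing group keeps the digit runs, so text and
    # integer chunks alternate and always compare like with like
    parts = re.split(r"(\d+)", path.stem)
    return [int(p) if p.isdigit() else p for p in parts]


files = [
    Path("raw_sim_data/T0_5_2.mat"),
    Path("raw_sim_data/T0_5_1.mat"),
    Path("raw_sim_data/T0_5.mat"),
]
mat_files_sorted = sorted(files, key=natural_key)
print([f.name for f in mat_files_sorted])
```

In the notebook, the retrieval line would become `mat_files = sorted(raw_data.glob("*.mat"), key=natural_key)`.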

In [25]:
# define a function to extract data from a .mat file
def extract_mat_data(mat_file: Path, var_list: list = None, array_list: list = None) -> tuple[dict, dict]:
    """
    Extracts variables and arrays from a .mat file.

    Parameters:
    mat_file (Path): The path to the .mat file.
    var_list (list): A list of variable names to extract.
    array_list (list): A list of array names to extract.

    Returns:
    tuple[dict, dict]: The extracted variables and the extracted arrays, each keyed by name. Names not found in the file map to None.
    """

    # check that the mat_file exists
    if not mat_file.exists():
        raise FileNotFoundError(f"File {mat_file} does not exist.")

    # check that var_list or array_list are non-empty
    if not var_list and not array_list:
        raise ValueError("At least one of var_list or array_list must be non-empty.")

    # read the .mat file
    mat_data = sio.loadmat(mat_file)

    # create dictionaries to store extracted data
    var_dict = {}
    array_dict = {}

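    # extract variables, warning when a name is absent from the file
    if var_list:
        for var_tag in var_list:
            if var_tag not in mat_data:
                print(f"[yellow]Warning:[/yellow] '{var_tag}' not found in {mat_file.name}")
            var_dict[var_tag] = mat_data.get(var_tag)

    # extract arrays, warning when a name is absent from the file
    if array_list:
        for array_tag in array_list:
            if array_tag not in mat_data:
                print(f"[yellow]Warning:[/yellow] '{array_tag}' not found in {mat_file.name}")
            array_dict[array_tag] = mat_data.get(array_tag)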

    # return the dictionaries
    return var_dict, array_dict
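
By default, `scipy.io.loadmat` returns MATLAB scalars as 2-D arrays of shape `(1, 1)`, and `dict.get` returns `None` for any name the file does not contain. The round trip below, through a temporary file, illustrates both behaviors. The values written are illustrative and are not read from `raw_sim_data`. The `squeeze_me=True` option is shown as one possible alternative and is not what the notebook uses.

```python
import tempfile
from pathlib import Path

import numpy as np
import scipy.io as sio

with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo.mat"
    # write two illustrative scalars; savemat stores them as 1x1 MATLAB arrays
    sio.savemat(str(demo_file), {"T0": 5, "M0": 0.1})

    # default loading keeps MATLAB's 2-D shape
    raw = sio.loadmat(str(demo_file))
    t0_raw = raw["T0"]

    # squeeze_me=True collapses the singleton dimensions
    squeezed = sio.loadmat(str(demo_file), squeeze_me=True)
    t0_squeezed = squeezed["T0"]

    # a name that is absent from the file comes back as None via dict.get
    missing = raw.get("tauplusendasymm")

print(t0_raw.shape, t0_raw.item(), np.ndim(t0_squeezed), missing)
```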

## Extract the Distribution Data

This is the data shown in the `.xlsx` file, and each `.mat` file contributes one row of it. From each `.mat` file, extract the following variables:
- T0
- M0
- taufractip
- taufraclength
- tauasymm (tauplusendasymm)
- mapfractip
- mapfraclength
- mapasymm (mapplusendasymm)
- L

Using the extracted values, calculate the following:
- T0/M0

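Once each column holds plain scalars, T0/M0 is a column-wise division. This is a minimal sketch that assumes the extracted values have already been reduced to scalars. The column name `T0/M0` is an assumption chosen to mirror the list above. The rows reuse the T0 and M0 values that appear in the extracted table.

```python
import pandas as pd

# illustrative rows with scalar values (T0 = 5, M0 = 0.1, as in the extracted table)
df_demo = pd.DataFrame({"T0": [5, 5, 5], "M0": [0.1, 0.1, 0.1]})

# derived column: ratio of T0 to M0 for each simulation
df_demo["T0/M0"] = df_demo["T0"] / df_demo["M0"]
print(df_demo)
```

Using vectorized division keeps the calculation to a single line, regardless of how many `.mat` files are processed.
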
In [26]:
# make a list of all variables to extract
var_list = ["T0", "M0", "taufractip", "taufraclength", "tauplusendasymm", "mapfractip", "mapfraclength", "mapplusendasymm", "L"]


In [27]:
# iterate over each .mat file and extract the var list
# each extraction should correspond to a row in a dataframe
data = []
for mat_file in tqdm(mat_files, desc="Extracting data", unit="file"):
    # extract the data (no arrays are requested, so the second dictionary is discarded)
    var_dict, _ = extract_mat_data(mat_file, var_list=var_list)

    # append the data to the list
    data.append(var_dict)

# create a dataframe from the data
df = pd.DataFrame(data)


In [21]:
df

| | T0 | M0 | taufractip | taufraclength | tauplusendasymm | mapfractip | mapfraclength | mapplusendasymm | L |
|---|---|---|---|---|---|---|---|---|---|
| 0 | [[5]] | [[0.1]] | [[0.6363636363636364]] | [[0.37423312883435583]] | | [[0]] | [[0.37423312883435583]] | | [[73.4036518366907]] |
| 1 | [[5]] | [[0.1]] | [[0.6363636363636364]] | [[0.4153846153846154]] | | [[0]] | [[0.33076923076923076]] | | [[83.7966049636929]] |
| 2 | [[5]] | [[0.1]] | [[0.7272727272727273]] | [[0.4411764705882353]] | | [[0]] | [[0.3176470588235294]] | | [[68.41730832422807]] |
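
Each cell in the table above still holds the `(1, 1)` array returned by `loadmat`. This is a minimal sketch for flattening those cells into plain scalars before any calculation. `to_scalar` is a hypothetical helper. It leaves `None` (a variable missing from the file) untouched and passes arrays with more than one element through unchanged. The demo frame is built from a list of dicts, the same way the notebook builds `df`, using values from the first two rows.

```python
import numpy as np
import pandas as pd


def to_scalar(value):
    """Return the single element of a size-1 array; pass anything else through unchanged."""
    if value is None:
        return None
    arr = np.asarray(value)
    # loadmat returns MATLAB scalars as shape (1, 1) arrays
    return arr.item() if arr.size == 1 else value


# rows shaped like the loadmat output above (a subset of columns)
rows = [
    {"T0": np.array([[5]]), "M0": np.array([[0.1]]), "tauplusendasymm": None},
    {"T0": np.array([[5]]), "M0": np.array([[0.1]]), "tauplusendasymm": None},
]
df_raw = pd.DataFrame(rows)

# apply the conversion column by column
df_clean = df_raw.apply(lambda col: col.map(to_scalar))
print(df_clean)
```

In the notebook, the equivalent step would be `df = df.apply(lambda col: col.map(to_scalar))`, applied before computing T0/M0.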
