In [58]:
import sys
sys.path.append("..")
import pickle

from torch.utils.data import DataLoader

from wimusim.dataset import CPM

## Introduction

In this notebook, we will demonstrate how to apply **parameter transformations** in WIMUSim to introduce realistic variations in the generated virtual IMU data. This technique allows us to expand the diversity of IMU datasets without requiring additional real-world data collection. By systematically altering the identified WIMUSim parameters, we can simulate a broad range of conditions, including varying body morphologies, sensor placements, and hardware imperfections.

### What We Will Do in This Notebook
1. **Prepare the Identified Parameters**: We start by loading the optimized parameters for the REALDISP dataset (subjects 1 to 10) under the ideal scenario. 
   
2. **Define the Comprehensive Parameter Mixing (CPM) Object**: Using the loaded parameters, we will define a `CPM` object, which allows us to generate new combinations of the Body (B), Dynamics (D), Placement (P), and Hardware (H) parameters to reflect different subject configurations.

3. **Generate New Data**: We will use the defined `CPM` object to generate a diverse set of virtual IMU data by systematically varying the B, D, P, and H parameters. This section will include a demonstration of how to use the transformed data with PyTorch's `DataLoader` for training deep learning models.

4. **Personalized Dataset Generation (PDG)**: We will apply a simple modification when defining a CPM object to create subject-specific datasets. This is useful for generating personalized data for specific subjects by fixing some parameters while allowing others to vary, simulating a more realistic variation for a particular individual.

By the end of this notebook, you will understand how to use WIMUSim’s `CPM` class for both Comprehensive Parameter Mixing (**CPM**) and Personalized Dataset Generation (**PDG**) to generate a wide range of synthetic IMU data.

## **1. Prepare the Identified Parameters**

To begin, we need to load the optimized parameters identified for the **REALDISP** dataset. These parameters correspond to **subjects 1 to 10** under the **"ideal scenario"** and serve as the baseline configurations for generating new virtual IMU data in WIMUSim.

### **Loading the Identified Parameters**
Before proceeding, please download the optimized parameters from [this link](https://sussex.box.com/s/z5puco39hrv9k42ggdvxnss1rtef4j3l) and specify the correct path in the variable `cpm_param_path` in the code cell below. These parameters are stored in a pre-saved file named `cpm_params.pkl`, which contains the configuration for each WIMUSim component.

### **Understanding the Loaded Parameters**
The loaded `cpm_params` dictionary contains the following entries:

- **`B_list`**: A list of Body parameters for each subject, specifying limb lengths and joint constraints.
- **`D_list`**: A list of Dynamics parameters, which define joint orientations and translations over time.
- **`P_list`**: A list of Placement configurations for the IMUs relative to each body segment.
- **`H_list`**: A list of Hardware characteristics for each IMU, including sensor-specific biases and noise.
- **`target_list`**: Reference IMU signals for each subject, serving as the ground truth for evaluation.

Each of these lists corresponds to the REALDISP subjects 1 to 10 and will be used to initialize the `CPM` object for generating new virtual IMU data.

In [59]:
# Step 1: Specify the path to the pre-saved CPM parameters
# Before running this cell, download the parameters file from the given link and set the correct path.
cpm_param_path = "<path-to-the-pkl-file>/cpm_params.pkl"

# Step 2: Load the CPM parameters from the specified path
# The loaded file contains parameters for subjects 1 to 10 in the REALDISP ideal scenario.
# P and H are configured for two sensor placements: Right Lower Arm (RLA) and Left Lower Arm (LLA).
with open(cpm_param_path, "rb") as f:
    cpm_params = pickle.load(f)
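
Before building the `CPM` object, it can help to confirm that the file has the expected structure. The sketch below is a hypothetical helper (not part of WIMUSim) that assumes only what is described above: a dictionary with the five list entries, one element per subject. The stand-in dictionary is placeholder data so the snippet runs without the download; pass the real `cpm_params` instead.

```python
EXPECTED_KEYS = ("B_list", "D_list", "P_list", "H_list", "target_list")


def summarize_cpm_params(params):
    """Return {key: number of entries}, raising if an expected key is missing."""
    missing = [key for key in EXPECTED_KEYS if key not in params]
    if missing:
        raise KeyError(f"Missing entries in CPM parameters: {missing}")
    return {key: len(params[key]) for key in EXPECTED_KEYS}


# Placeholder standing in for the unpickled file (10 subjects per list).
demo_params = {key: [None] * 10 for key in EXPECTED_KEYS}
summary = summarize_cpm_params(demo_params)
print(summary)
```

If the lists have different lengths, the file is probably not the one described here and is worth checking before moving on.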

## **2. Define the Comprehensive Parameter Mixing (CPM) Object**

The next step is to create the **Comprehensive Parameter Mixing (CPM)** object using the parameters we loaded in the previous step. The **CPM** object is a core component in WIMUSim that allows us to generate new combinations of **Body (B)**, **Dynamics (D)**, **Placement (P)**, and **Hardware (H)** parameters. By mixing these parameter sets, we can simulate diverse subject configurations and sensor setups, creating a rich and varied synthetic IMU dataset.

### **What is Comprehensive Parameter Mixing (CPM)?**
CPM is a systematic way to combine different parameter sets, generating a large number of virtual IMU data samples by altering the following aspects:
1. **Body (B)**: Variations in body morphology (e.g., different limb lengths, body segments).
2. **Dynamics (D)**: Variations in joint movements and temporal sequences.
3. **Placement (P)**: Changes in the relative positions and orientations of the IMUs.
4. **Hardware (H)**: Variations in sensor-specific noise, biases, and sampling rates.

### **Creating the CPM Object**
To create the `CPM` object, we pass the parameter lists (`B_list`, `D_list`, `P_list`, and `H_list`) for the REALDISP subjects 1 to 10. Additionally, we specify the `target_list`, which contains the reference IMU data for each subject. The `window` and `stride` parameters are used to define how the data is segmented for model training:

- **`window`**: Specifies the length of the sliding window used to segment the time-series IMU data.
- **`stride`**: Defines the step size between consecutive windows.
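
As a rough illustration of how these two parameters interact, the sketch below segments a sequence with a plain sliding window. It assumes the common convention that only full windows are kept; WIMUSim's exact segmentation may differ, and `sliding_windows` is a hypothetical helper, not a WIMUSim function.

```python
def sliding_windows(sequence, window, stride):
    """Split a sequence into full-length windows taken every `stride` steps."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    last_start = len(sequence) - window
    return [sequence[start:start + window] for start in range(0, last_start + 1, stride)]


# 1,000 time steps, window=100, stride=25 (the values used in this notebook).
signal = list(range(1000))
windows = sliding_windows(signal, window=100, stride=25)

# With full windows only, the count is (T - window) // stride + 1.
print(len(windows))   # 37
print(windows[1][0])  # 25: consecutive windows start 25 steps apart
```

With `window=100` and `stride=25`, consecutive windows share 75 of their 100 time steps, so a smaller stride yields more (but more strongly overlapping) training windows.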

In [60]:
realdisp_cpm_dataset = CPM(
        B_list=cpm_params["B_list"],
        D_list=cpm_params["D_list"],
        P_list=cpm_params["P_list"],
        H_list=cpm_params["H_list"],
        target_list=cpm_params["target_list"],
        window=100, 
        stride=25,
        acc_only=False,
    )

CUDA is available! Use GPU.


## **3. Generate Data with CPM**

Now that we have defined the **Comprehensive Parameter Mixing (CPM)** object, we can proceed to generate a diverse set of virtual IMU data using various combinations of the **Body (B)**, **Dynamics (D)**, **Placement (P)**, and **Hardware (H)** parameters.

### **Generating Virtual IMU Data**
The `generate_data()` method of the `CPM` object randomly chooses `n_combinations` parameter combinations from the given lists to create new virtual IMU data. Each combination represents a unique configuration, reflecting realistic variations in the human body model, body movement dynamics, sensor placements, and sensor characteristics.

- **`n_combinations`**: This parameter specifies the number of parameter combinations to generate. For example, setting `n_combinations=100` simulates 100 different combinations of the B, D, P, and H parameters across subjects. Each simulated sequence is then segmented into windows, so the dataset contains many more windows than combinations (in the run below, 10 combinations yield 55,861 windows).

> **Note**: The generated data is stored in the `realdisp_cpm_dataset.data` attribute, and the corresponding labels are stored in `realdisp_cpm_dataset.target`.
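
Conceptually, each combination picks one entry from each of the four lists. The sketch below illustrates that idea with independent uniform draws of indices; it is an assumption made for illustration, not WIMUSim's actual sampling code, and `sample_combinations` is a hypothetical name.

```python
import random


def sample_combinations(n_b, n_d, n_p, n_h, n_combinations, seed=0):
    """Draw (b, d, p, h) index tuples uniformly at random, with replacement."""
    rng = random.Random(seed)
    return [
        (rng.randrange(n_b), rng.randrange(n_d), rng.randrange(n_p), rng.randrange(n_h))
        for _ in range(n_combinations)
    ]


# Ten subjects per list gives 10 * 10 * 10 * 10 = 10,000 possible combinations.
total = 10 ** 4
combos = sample_combinations(10, 10, 10, 10, n_combinations=10)
print(total, len(combos))
print(combos[0])  # one (b, d, p, h) index tuple
```

Even a small `n_combinations` therefore covers only a tiny fraction of the available space, which is why regenerating the data with fresh combinations can keep adding variety.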

In [61]:
# This will set the realdisp_cpm_dataset.data and realdisp_cpm_dataset.target attributes
realdisp_cpm_dataset.generate_data(
        n_combinations=10  # Just for initialization. Can be any number.
    )

print(f"Generated {len(realdisp_cpm_dataset)} windows of virtual IMU data.")

Generating virtual IMU data...


100%|██████████| 10/10 [00:00<00:00, 13.81it/s]

Generated 55861 windows of virtual IMU data.





### **Using the Generated Data with PyTorch’s DataLoader**
The generated dataset can be used with PyTorch’s `DataLoader` to efficiently batch and shuffle the data for training machine learning models. This allows us to seamlessly integrate WIMUSim’s synthetic IMU data into model training pipelines.


### **Understanding the Output Data**
- **`data`**: Contains the generated virtual IMU signals. Each batch is a 3D tensor of shape `[batch_size, window_size, num_features]`.
  - `batch_size`: The number of samples in each batch (e.g., 1024).
  - `window_size`: The length of each time-series window (e.g., 100).
  - `num_features`: The number of signal channels per time step (e.g., 12 here: two IMUs, RLA and LLA, each with a 3-axis accelerometer and a 3-axis gyroscope).
  
- **`target`**: Corresponding label for each sample (e.g., activity type or subject ID).
- **`idx`**: Unique identifier for each data sample, useful for tracking and debugging.


In [62]:
# The dataset can be passed directly to PyTorch's DataLoader
data_loader = DataLoader(
    realdisp_cpm_dataset,
    batch_size=1024,
    shuffle=True,
    num_workers=0,
)

# Print the shape of the first batch
for data, target, idx in data_loader:
    print(f"Data shape: {data.shape}")
    print(f"Target shape: {target.shape}")
    print(f"Index shape: {idx.shape}")
    break

Data shape: torch.Size([1024, 100, 12])
Target shape: torch.Size([1024])
Index shape: torch.Size([1024])
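
To show where such batches would go next, here is a minimal training-loop sketch. It uses random tensors with the same per-sample shapes as above in place of the WIMUSim dataset, so it runs on its own; the linear classifier, the number of classes (`n_classes = 5`), and the optimizer settings are placeholder assumptions, not recommendations from WIMUSim.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-in data: 256 windows of shape [100, 12], integer labels, and sample indices.
n_samples, window_size, num_features, n_classes = 256, 100, 12, 5
data = torch.randn(n_samples, window_size, num_features)
target = torch.randint(0, n_classes, (n_samples,))
idx = torch.arange(n_samples)
loader = DataLoader(TensorDataset(data, target, idx), batch_size=64, shuffle=True)

# Placeholder model: flatten each window and apply one linear layer.
model = nn.Sequential(nn.Flatten(), nn.Linear(window_size * num_features, n_classes))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

losses = []
for batch_data, batch_target, _ in loader:  # the index is not needed for training
    optimizer.zero_grad()
    logits = model(batch_data)               # shape: [batch_size, n_classes]
    loss = criterion(logits, batch_target)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"Ran {len(losses)} batches; last loss = {losses[-1]:.3f}")
```

With the real dataset, the same loop would iterate over `data_loader` instead. It is worth checking that the batches and the model sit on the same device and that the targets are integer class indices, which is what `nn.CrossEntropyLoss` expects here.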


## **4. Personalized Dataset Generation (PDG)**

In the previous section, we generated a diverse set of virtual IMU data using multiple parameter combinations for subjects 1 to 10. While this approach captures a wide range of subject-specific variations, it treats all subjects as separate entities and combines their parameters freely. However, in some cases, we may want to focus on generating personalized datasets for a specific subject, where we retain most of the subject's unique characteristics and only introduce controlled variations around a limited set of parameters.

### **What is Personalized Dataset Generation (PDG)?**
**Personalized Dataset Generation (PDG)** involves fixing certain parameters (e.g., body morphology or sensor placement) for a target subject while varying others to introduce realistic intra-subject variability. This approach is useful for creating subject-specific data that captures plausible variations for a particular individual.


### **Creating a PDG Object**
To demonstrate PDG, we will limit the **Body (B)**, **Placement (P)**, and **Hardware (H)** parameters to a specific subject (e.g., Subject 3) and vary the **Dynamics (D)** parameter to simulate different movement patterns for this subject. This configuration will produce a dataset that is tailored to the unique characteristics of Subject 3, making it ideal for personalized training and evaluation.

### **Use Case: Personalized HAR Model Training**
Personalized datasets like these are particularly valuable when training **personalized HAR models**. Such models can achieve better performance for specific users by leveraging subject-specific data that captures realistic intra-subject variability. This approach can also be used to fine-tune general HAR models for specific subjects, improving their robustness to individual variations.

Restricting some of the parameter lists to a single subject is all that is needed: the same `CPM` class then performs Personalized Dataset Generation.
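
One way to make the subject selection explicit is a small helper that converts a 1-based subject number into the 0-based slice that Python lists need. This is a hypothetical convenience function, not part of WIMUSim, and it assumes the lists are ordered by subject number starting at subject 1.

```python
def restrict_to_subject(params, subject, keys=("B_list", "P_list", "H_list")):
    """Return a shallow copy of `params` with the given lists cut down to one subject.

    `subject` is 1-based; entries not named in `keys` are left unchanged.
    """
    n_subjects = len(params[keys[0]])
    if not 1 <= subject <= n_subjects:
        raise ValueError(f"subject must be between 1 and {n_subjects}")
    restricted = dict(params)
    for key in keys:
        # A slice keeps a one-element list, which is what a list argument expects.
        restricted[key] = params[key][subject - 1:subject]
    return restricted


# Placeholder lists of labels standing in for the real parameter objects.
demo = {key: [f"{key[0]}{i}" for i in range(1, 11)]
        for key in ("B_list", "D_list", "P_list", "H_list")}
pdg_params = restrict_to_subject(demo, subject=3)
print(pdg_params["B_list"], len(pdg_params["D_list"]))  # ['B3'] 10
```

The returned dictionary can be unpacked into the same constructor arguments used for CPM, with only the restricted lists shortened.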


In [63]:
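# B, P, and H are restricted to a single subject; D and the targets keep all subjects.
# Note: [3:4] keeps a one-element list holding the entry at 0-based index 3.
# If the lists are ordered subject 1..10, that entry belongs to the fourth subject,
# and [2:3] would select Subject 3 instead.
realdisp_pdg_dataset = CPM(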
        B_list=cpm_params["B_list"][3:4],
        D_list=cpm_params["D_list"],
        P_list=cpm_params["P_list"][3:4],
        H_list=cpm_params["H_list"][3:4],
        target_list=cpm_params["target_list"],
        window=100, 
        stride=25,
        acc_only=False,
    )

CUDA is available! Use GPU.


### **Generating Personalized Data**
We can now use the `generate_data()` method again to generate virtual IMU data for this subject. Since the B, P, and H parameters are fixed, only the D parameters will vary, simulating different movement patterns for Subject 3:

In [64]:
# Generate personalized data for Subject 3
realdisp_pdg_dataset.generate_data(n_combinations=5)  # Sample 5 combinations; only D varies

print(f"Generated {len(realdisp_pdg_dataset)} virtual IMU samples for Subject 3.")

Generating virtual IMU data...


100%|██████████| 5/5 [00:00<00:00, 19.24it/s]

Generated 29243 virtual IMU samples for Subject 3.





## **Conclusion**

In this notebook, we explored how to use WIMUSim's **parameter transformation capabilities** to generate diverse and realistic virtual IMU datasets for Human Activity Recognition (HAR) models. 