<a href="https://colab.research.google.com/github/Moktacim/Time-Series-Library/blob/main/PyPOTS_Tutorials.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 😎 Quick-start Tutorials for PyPOTS are Here!

## Dependency Installation

In [None]:
# install pypots >= 0.4 (the quotes stop the shell from treating ">" as an output redirect)
!pip install "pypots>=0.4"


## 📀 Preparing the **PhysioNet-2012** dataset for this tutorial

In [None]:
from pypots.data.generating import gene_physionet2012
from pypots.utils.random import set_random_seed

set_random_seed()

# Load the PhysioNet-2012 dataset
physionet2012_dataset = gene_physionet2012(artificially_missing_rate=0.1)

# Take a look at the generated PhysioNet-2012 dataset. Everything has been prepared for you:
# data splitting, normalization, additional artificially-missing values for evaluation, etc.
print(physionet2012_dataset.keys())

2024-03-19 09:40:46 [INFO]: Have set the random seed as 2204 for numpy and pytorch.
2024-03-19 09:40:46 [INFO]: Loading the dataset physionet_2012 with TSDB (https://github.com/WenjieDu/Time_Series_Data_Beans)...
2024-03-19 09:40:46 [INFO]: Starting preprocessing physionet_2012...
2024-03-19 09:40:46 [INFO]: You're using dataset physionet_2012, please cite it properly in your work. You can find its reference information at the below link: 
https://github.com/WenjieDu/TSDB/tree/main/dataset_profiles/physionet_2012
2024-03-19 09:40:46 [INFO]: Dataset physionet_2012 has already been downloaded. Processing directly...
2024-03-19 09:40:46 [INFO]: Dataset physionet_2012 has already been cached. Loading from cache directly...
2024-03-19 09:40:46 [INFO]: Loaded successfully!


dict_keys(['n_classes', 'n_steps', 'n_features', 'train_X', 'train_y', 'train_ICUType', 'val_X', 'val_y', 'val_ICUType', 'test_X', 'test_y', 'test_ICUType', 'scaler', 'val_X_ori', 'test_X_ori', 'test_X_indicating_mask'])
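The keys `test_X`, `test_X_ori`, and `test_X_indicating_mask` belong together: some observed values are hidden on purpose so that imputations can be scored against known ground truth. The following is a minimal NumPy sketch of that idea on toy data. It is an illustration under assumptions, not the code PyPOTS runs internally; the names `hold_out` and `indicating_mask` and the toy shapes are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 4 samples, 6 time steps, 3 features, with some values missing from the start.
X_ori = rng.normal(size=(4, 6, 3))
X_ori[rng.random(X_ori.shape) < 0.3] = np.nan  # originally-missing values

# Hide roughly 10% of the observed values (mirrors artificially_missing_rate=0.1).
observed = ~np.isnan(X_ori)
hold_out = observed & (rng.random(X_ori.shape) < 0.1)
X = X_ori.copy()
X[hold_out] = np.nan

# The indicating mask is 1 exactly where a value was observed originally but is hidden in X.
indicating_mask = (np.isnan(X) & ~np.isnan(X_ori)).astype(int)

print(X.shape, int(indicating_mask.sum()), int(hold_out.sum()))
```

Only the positions flagged by the mask have a known true value that the model never saw, which is why the evaluation below is restricted to them.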


## 🌟 Imputation Models

In [None]:
# Assemble the datasets for training, validating, and testing.

dataset_for_training = {
    "X": physionet2012_dataset['train_X'],
}

dataset_for_validating = {
    "X": physionet2012_dataset['val_X'],
    "X_ori": physionet2012_dataset['val_X_ori'],
}

dataset_for_testing = {
    "X": physionet2012_dataset['test_X'],
}


### 🚀 An example of **SAITS** for imputation

In [None]:
from pypots.utils.metrics import calc_mae
from pypots.optim import Adam
from pypots.imputation import SAITS

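# initialize the model. The architecture hyperparameters (n_layers, d_model, d_ffn, n_heads, d_k, d_v, dropout)
# are illustrative assumptions, not necessarily the ones behind the log below; the other arguments mirror the examples that follow.
saits = SAITS(
    n_steps=physionet2012_dataset['n_steps'],
    n_features=physionet2012_dataset['n_features'],
    n_layers=2,
    d_model=256,
    d_ffn=128,
    n_heads=4,
    d_k=64,
    d_v=64,
    dropout=0.1,
    batch_size=32,
    epochs=10,
    patience=3,
    optimizer=Adam(lr=1e-3),
    saving_path="tutorial_results/imputation/saits",
)
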
# train the model on the training set, and validate it on the validating set to select the best model for testing in the next step
saits.fit(train_set=dataset_for_training, val_set=dataset_for_validating)

# the testing stage, impute the originally-missing values and artificially-missing values in the test set
saits_results = saits.predict(dataset_for_testing)
saits_imputation = saits_results["imputation"]

# calculate mean absolute error on the ground truth (artificially-missing values)
testing_mae = calc_mae(
    saits_imputation,
    physionet2012_dataset['test_X_ori'],
    physionet2012_dataset['test_X_indicating_mask'],
)
print(f"Testing mean absolute error: {testing_mae:.4f}")


2024-03-18 06:21:05 [INFO]: No given device, using default device: cuda
2024-03-18 06:21:05 [INFO]: Model files will be saved to tutorial_results/imputation/saits/20240318_T062105
2024-03-18 06:21:05 [INFO]: Tensorboard file will be saved to tutorial_results/imputation/saits/20240318_T062105/tensorboard
2024-03-18 06:21:05 [INFO]: SAITS initialized with the given hyperparameters, the number of trainable parameters: 1,378,358
2024-03-18 06:21:20 [INFO]: Epoch 001 - training loss: 0.7327, validating loss: 0.4593
2024-03-18 06:21:26 [INFO]: Epoch 002 - training loss: 0.5173, validating loss: 0.4261
2024-03-18 06:21:32 [INFO]: Epoch 003 - training loss: 0.4625, validating loss: 0.4070
2024-03-18 06:21:39 [INFO]: Epoch 004 - training loss: 0.4211, validating loss: 0.3837
2024-03-18 06:21:47 [INFO]: Epoch 005 - training loss: 0.3924, validating loss: 0.3833
2024-03-18 06:21:53 [INFO]: Epoch 006 - training loss: 0.3744, validating loss: 0.3663
2024-03-18 06:21:59 [INFO]: Epoch 007 - training 

Testing mean absolute error: 0.2289
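The error reported above is computed only on the artificially-missing positions selected by `test_X_indicating_mask`. As a rough sketch of what a masked mean absolute error does, here is a small NumPy version. The function name `masked_mae` is hypothetical, and the exact implementation inside `calc_mae` may differ (for instance in how it guards against division by zero).

```python
import numpy as np


def masked_mae(predictions, targets, masks):
    """Mean absolute error over the positions where masks == 1."""
    # Replace NaN targets with 0 so that NaN * 0 does not poison the sum;
    # those positions are excluded by the mask anyway.
    targets = np.nan_to_num(targets)
    abs_err = np.abs(predictions - targets) * masks
    return abs_err.sum() / (masks.sum() + 1e-12)


predictions = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
targets = np.array([[1.5, np.nan, 3.0], [4.0, 7.0, np.nan]])
masks = np.array([[1, 0, 0], [0, 1, 0]])

# Two evaluated positions with errors 0.5 and 2.0
print(f"{masked_mae(predictions, targets, masks):.4f}")  # 1.2500
```

Positions with a mask value of 0 contribute nothing to the result, whether or not the model imputed them well.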


### 🚀 An example of **Transformer** for imputation

In [None]:
from pypots.optim import Adam
from pypots.imputation import Transformer
from pypots.utils.metrics import calc_mae

# initialize the model
transformer = Transformer(
    n_steps=physionet2012_dataset['n_steps'],
    n_features=physionet2012_dataset['n_features'],
    n_layers=6,
    d_model=512,
    d_ffn=256,
    n_heads=4,
    d_k=128,
    d_v=128,
    dropout=0.1,
    attn_dropout=0,
    ORT_weight=1,  # you can adjust the values of ORT_weight and MIT_weight to make the model focus more on
    # one task. Usually you can just leave them at the default value, i.e. 1.
    MIT_weight=1,
    batch_size=32,
    # here we set epochs=10 for a quick demo; you can set it to 100 or more for better performance
    epochs=10,
    # here we set patience=3 to stop training early if the validation loss doesn't decrease for 3 epochs.
    # You can leave it at the default, None, to disable early stopping.
    patience=3,
    # set the optimizer. Unlike torch.optim.Optimizer, you don't have to specify the model's parameters when
    # initializing pypots.optim.Optimizer. You can also leave it at the default, which initializes an Adam optimizer with lr=0.001.
    optimizer=Adam(lr=1e-3),
    # num_workers is passed to torch.utils.data.DataLoader. It's the number of subprocesses to use for data loading.
    # Leaving it at the default of 0 means data loading happens in the main process, i.e. there won't be subprocesses.
    # You can increase it if you think data loading is a bottleneck for your training speed.
    num_workers=0,
    # just leave it at the default, None, and PyPOTS will automatically pick a device for you.
    # Set it to 'cpu' if you don't have CUDA devices. You can also set it to 'cuda:0' or 'cuda:1' if you have multiple CUDA devices, or to a list such as ['cuda:0', 'cuda:1'] to train in parallel.
    device=None,
    # set the path for saving tensorboard and trained model files
    saving_path="tutorial_results/imputation/transformer",
    # only save the best model after training finishes.
    # You can also set it to "better" to save every model that improves on the previous best during training.
    model_saving_strategy="best",
)


# train the model on the training set, and validate it on the validating set to select the best model for testing in the next step
transformer.fit(train_set=dataset_for_training, val_set=dataset_for_validating)

# the testing stage, impute the originally-missing values and artificially-missing values in the test set
transformer_results = transformer.predict(dataset_for_testing)
transformer_imputation = transformer_results["imputation"]

# calculate mean absolute error on the ground truth (artificially-missing values)
testing_mae = calc_mae(
    transformer_imputation,
    physionet2012_dataset['test_X_ori'],
    physionet2012_dataset['test_X_indicating_mask'],
)
print(f"Testing mean absolute error: {testing_mae:.4f}")


2024-03-18 06:22:17 [INFO]: No given device, using default device: cuda
2024-03-18 06:22:17 [INFO]: Model files will be saved to tutorial_results/imputation/transformer/20240318_T062217
2024-03-18 06:22:17 [INFO]: Tensorboard file will be saved to tutorial_results/imputation/transformer/20240318_T062217/tensorboard
2024-03-18 06:22:17 [INFO]: Transformer initialized with the given hyperparameters, the number of trainable parameters: 7,938,597
2024-03-18 06:22:29 [INFO]: Epoch 001 - training loss: 1.4288, validating loss: 1.1047
2024-03-18 06:22:39 [INFO]: Epoch 002 - training loss: 1.3915, validating loss: 1.0857
2024-03-18 06:22:48 [INFO]: Epoch 003 - training loss: 1.3786, validating loss: 1.0966
2024-03-18 06:22:57 [INFO]: Epoch 004 - training loss: 1.3699, validating loss: 1.0834
2024-03-18 06:23:08 [INFO]: Epoch 005 - training loss: 1.3677, validating loss: 1.0941
2024-03-18 06:23:18 [INFO]: Epoch 006 - training loss: 1.3653, validating loss: 1.0760
2024-03-18 06:23:28 [INFO]: Epo

Testing mean absolute error: 0.6741
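The `patience=3` argument stops training once the validation loss has failed to improve for three consecutive epochs. The pure-Python sketch below shows one common way to implement such a rule. The helper `epochs_run` is hypothetical, and the exact bookkeeping inside PyPOTS may differ; the first list reuses the six validation losses from the Transformer log above, and the second is an invented plateau for contrast.

```python
def epochs_run(val_losses, patience):
    """Return how many epochs run before patience-based early stopping triggers."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch  # training stops after this epoch
    return len(val_losses)


# Validation losses of the first six Transformer epochs logged above:
# the loss keeps reaching new minima, so training is not stopped.
logged = [1.1047, 1.0857, 1.0966, 1.0834, 1.0941, 1.0760]
print(epochs_run(logged, patience=3))  # 6

# A hypothetical run that plateaus after epoch 2 is stopped at epoch 5.
print(epochs_run([0.50, 0.45, 0.46, 0.47, 0.48, 0.40], patience=3))  # 5
```

In the second run the improvement at epoch 6 is never seen, which is the trade-off a small patience value makes for shorter training.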


### 🚀 An example of **TimesNet** for imputation

In [None]:
from pypots.optim import Adam
from pypots.imputation import TimesNet
from pypots.utils.metrics import calc_mae

# initialize the model
timesnet = TimesNet(
    n_steps=physionet2012_dataset['n_steps'],
    n_features=physionet2012_dataset['n_features'],
    n_layers=1,
    top_k=1,
    d_model=128,
    d_ffn=512,
    n_kernels=5,
    dropout=0.5,
    apply_nonstationary_norm=False,
    batch_size=32,
    # here we set epochs=10 for a quick demo; you can set it to 100 or more for better performance
    epochs=10,
    # here we set patience=3 to stop training early if the validation loss doesn't decrease for 3 epochs.
    # You can leave it at the default, None, to disable early stopping.
    patience=3,
    # set the optimizer. Unlike torch.optim.Optimizer, you don't have to specify the model's parameters when
    # initializing pypots.optim.Optimizer. You can also leave it at the default, which initializes an Adam optimizer with lr=0.001.
    optimizer=Adam(lr=1e-3),
    # num_workers is passed to torch.utils.data.DataLoader. It's the number of subprocesses to use for data loading.
    # Leaving it at the default of 0 means data loading happens in the main process, i.e. there won't be subprocesses.
    # You can increase it if you think data loading is a bottleneck for your training speed.
    num_workers=0,
    # just leave it at the default, None, and PyPOTS will automatically pick a device for you.
    # Set it to 'cpu' if you don't have CUDA devices. You can also set it to 'cuda:0' or 'cuda:1' if you have multiple CUDA devices, or to a list such as ['cuda:0', 'cuda:1'] to train in parallel.
    device=None,
    # set the path for saving tensorboard and trained model files
    saving_path="tutorial_results/imputation/timesnet",
    # only save the best model after training finishes.
    # You can also set it to "better" to save every model that improves on the previous best during training.
    model_saving_strategy="best",
)


# train the model on the training set, and validate it on the validating set to select the best model for testing in the next step
timesnet.fit(train_set=dataset_for_training, val_set=dataset_for_validating)

# the testing stage, impute the originally-missing values and artificially-missing values in the test set
timesnet_results = timesnet.predict(dataset_for_testing)
timesnet_imputation = timesnet_results["imputation"]

# calculate mean absolute error on the ground truth (artificially-missing values)
testing_mae = calc_mae(
    timesnet_imputation,
    physionet2012_dataset['test_X_ori'],
    physionet2012_dataset['test_X_indicating_mask'],
)
print(f"Testing mean absolute error: {testing_mae:.4f}")


2024-03-18 06:24:01 [INFO]: No given device, using default device: cuda
2024-03-18 06:24:01 [INFO]: Model files will be saved to tutorial_results/imputation/timesnet/20240318_T062401
2024-03-18 06:24:01 [INFO]: Tensorboard file will be saved to tutorial_results/imputation/timesnet/20240318_T062401/tensorboard
2024-03-18 06:24:02 [INFO]: TimesNet initialized with the given hyperparameters, the number of trainable parameters: 21,649,317
2024-03-18 06:24:34 [INFO]: Epoch 001 - training loss: 0.4904, validating loss: 0.4605
2024-03-18 06:25:06 [INFO]: Epoch 002 - training loss: 0.4393, validating loss: 0.4285
2024-03-18 06:25:38 [INFO]: Epoch 003 - training loss: 0.3935, validating loss: 0.4192
2024-03-18 06:26:09 [INFO]: Epoch 004 - training loss: 0.3980, validating loss: 0.4135
2024-03-18 06:26:41 [INFO]: Epoch 005 - training loss: 0.3817, validating loss: 0.4081
2024-03-18 06:27:13 [INFO]: Epoch 006 - training loss: 0.3760, validating loss: 0.4034
2024-03-18 06:27:44 [INFO]: Epoch 007 -

Testing mean absolute error: 0.3276


### 🚀 An example of **CSDI** for imputation

In [None]:
from pypots.optim import Adam
from pypots.imputation import CSDI
from pypots.utils.metrics import calc_mae

# initialize the model
csdi = CSDI(
    n_features=physionet2012_dataset['n_features'],
    n_layers=6,
    n_heads=2,
    n_channels=128,
    d_time_embedding=64,
    d_feature_embedding=32,
    d_diffusion_embedding=128,
    target_strategy="random",
    n_diffusion_steps=50,
    batch_size=32,
    # here we set epochs=10 for a quick demo; you can set it to 100 or more for better performance
    epochs=10,
    # here we set patience=3 to stop training early if the validation loss doesn't decrease for 3 epochs.
    # You can leave it at the default, None, to disable early stopping.
    patience=3,
    # set the optimizer. Unlike torch.optim.Optimizer, you don't have to specify the model's parameters when
    # initializing pypots.optim.Optimizer. You can also leave it at the default, which initializes an Adam optimizer with lr=0.001.
    optimizer=Adam(lr=1e-3),
    # num_workers is passed to torch.utils.data.DataLoader. It's the number of subprocesses to use for data loading.
    # Leaving it at the default of 0 means data loading happens in the main process, i.e. there won't be subprocesses.
    # You can increase it if you think data loading is a bottleneck for your training speed.
    num_workers=0,
    # just leave it at the default, None, and PyPOTS will automatically pick a device for you.
    # Set it to 'cpu' if you don't have CUDA devices. You can also set it to 'cuda:0' or 'cuda:1' if you have multiple CUDA devices, or to a list such as ['cuda:0', 'cuda:1'] to train in parallel.
    device=None,
    # set the path for saving tensorboard and trained model files
    saving_path="tutorial_results/imputation/csdi",
    # only save the best model after training finishes.
    # You can also set it to "better" to save every model that improves on the previous best during training.
    model_saving_strategy="best",
)


# train the model on the training set, and validate it on the validating set to select the best model for testing in the next step
csdi.fit(train_set=dataset_for_training, val_set=dataset_for_validating)

# the testing stage, impute the originally-missing values and artificially-missing values in the test set

# CSDI has an argument to control the number of sampling times during inference
csdi_results = csdi.predict(dataset_for_testing, n_sampling_times=2)
csdi_imputation = csdi_results["imputation"]

print(f"The shape of csdi_imputation is {csdi_imputation.shape}")

# for error calculation, we need to take the mean value of the multiple samplings for each data sample
mean_csdi_imputation = csdi_imputation.mean(axis=1)

# calculate mean absolute error on the ground truth (artificially-missing values)
testing_mae = calc_mae(
    mean_csdi_imputation,
    physionet2012_dataset['test_X_ori'],
    physionet2012_dataset['test_X_indicating_mask'],
)
print(f"Testing mean absolute error: {testing_mae:.4f}")


2024-03-18 06:30:49 [INFO]: No given device, using default device: cuda
2024-03-18 06:30:49 [INFO]: Model files will be saved to tutorial_results/imputation/csdi/20240318_T063049
2024-03-18 06:30:49 [INFO]: Tensorboard file will be saved to tutorial_results/imputation/csdi/20240318_T063049/tensorboard
2024-03-18 06:30:49 [INFO]: CSDI initialized with the given hyperparameters, the number of trainable parameters: 1,694,753
2024-03-18 06:43:37 [INFO]: Epoch 001 - training loss: 0.3269, validating loss: 0.2420
2024-03-18 06:56:24 [INFO]: Epoch 002 - training loss: 0.2617, validating loss: 0.2194
2024-03-18 07:09:12 [INFO]: Epoch 003 - training loss: 0.2467, validating loss: 0.2036
2024-03-18 07:21:59 [INFO]: Epoch 004 - training loss: 0.2390, validating loss: 0.1972
2024-03-18 07:34:47 [INFO]: Epoch 005 - training loss: 0.2342, validating loss: 0.1972
2024-03-18 07:47:34 [INFO]: Epoch 006 - training loss: 0.2308, validating loss: 0.1955
2024-03-18 08:00:22 [INFO]: Epoch 007 - training los

The shape of csdi_imputation is (2398, 2, 48, 37)
Testing mean absolute error: 0.2788
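The printed shape `(2398, 2, 48, 37)` has one more axis than the test data: axis 1 indexes the `n_sampling_times=2` samples drawn per test sample, which is why the mean is taken over `axis=1` before scoring. The sketch below reproduces this reduction on a random stand-in array; the array `sampled_imputation` and its smaller first dimension are placeholders for this example, not real model output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for csdi_results["imputation"]:
# [n_samples, n_sampling_times, n_steps, n_features]
sampled_imputation = rng.normal(size=(5, 2, 48, 37))

# Collapse the sampling axis to get a single imputation per sample.
mean_imputation = sampled_imputation.mean(axis=1)

print(sampled_imputation.shape)  # (5, 2, 48, 37)
print(mean_imputation.shape)     # (5, 48, 37)
```

After the reduction the array has the same layout as `test_X_ori`, so it can be passed to `calc_mae` together with the indicating mask.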


### 🚀 An example of **US-GAN** for imputation

In [None]:
from pypots.optim import Adam
from pypots.imputation import USGAN
from pypots.utils.metrics import calc_mae

# initialize the model
us_gan = USGAN(
    n_steps=physionet2012_dataset['n_steps'],
    n_features=physionet2012_dataset['n_features'],
    rnn_hidden_size=256,
    lambda_mse=1,
    dropout=0.1,
    G_steps=1,
    D_steps=1,
    batch_size=32,
    # here we set epochs=10 for a quick demo; you can set it to 100 or more for better performance
    epochs=10,
    # here we set patience=3 to stop training early if the validation loss doesn't decrease for 3 epochs.
    # You can leave it at the default, None, to disable early stopping.
    patience=3,
    # set the optimizers for the generator and the discriminator. Unlike torch.optim.Optimizer, you don't have to specify the model's parameters when
    # initializing pypots.optim.Optimizer. You can also leave them at the default, which initializes an Adam optimizer with lr=0.001.
    G_optimizer=Adam(lr=1e-3),
    D_optimizer=Adam(lr=1e-3),
    # num_workers is passed to torch.utils.data.DataLoader. It's the number of subprocesses to use for data loading.
    # Leaving it at the default of 0 means data loading happens in the main process, i.e. there won't be subprocesses.
    # You can increase it if you think data loading is a bottleneck for your training speed.
    num_workers=0,
    # just leave it at the default, None, and PyPOTS will automatically pick a device for you.
    # Set it to 'cpu' if you don't have CUDA devices. You can also set it to 'cuda:0' or 'cuda:1' if you have multiple CUDA devices, or to a list such as ['cuda:0', 'cuda:1'] to train in parallel.
    device=None,
    # set the path for saving tensorboard and trained model files
    saving_path="tutorial_results/imputation/us_gan",
    # only save the best model after training finishes.
    # You can also set it to "better" to save every model that improves on the previous best during training.
    model_saving_strategy="best",
)


# train the model on the training set, and validate it on the validating set to select the best model for testing in the next step
us_gan.fit(train_set=dataset_for_training, val_set=dataset_for_validating)

# the testing stage, impute the originally-missing values and artificially-missing values in the test set
us_gan_results = us_gan.predict(dataset_for_testing)
us_gan_imputation = us_gan_results["imputation"]

# calculate mean absolute error on the ground truth (artificially-missing values)
testing_mae = calc_mae(
    us_gan_imputation,
    physionet2012_dataset['test_X_ori'],
    physionet2012_dataset['test_X_indicating_mask'],
)
print(f"Testing mean absolute error: {testing_mae:.4f}")


2024-03-18 09:04:45 [INFO]: No given device, using default device: cuda
2024-03-18 09:04:45 [INFO]: Model files will be saved to tutorial_results/imputation/us_gan/20240318_T090445
2024-03-18 09:04:45 [INFO]: Tensorboard file will be saved to tutorial_results/imputation/us_gan/20240318_T090445/tensorboard
2024-03-18 09:04:45 [INFO]: USGAN initialized with the given hyperparameters, the number of trainable parameters: 1,258,517
2024-03-18 09:07:27 [INFO]: Epoch 001 - generator training loss: 4.0775, discriminator training loss: 0.1833, validating loss: 0.4617
2024-03-18 09:09:46 [INFO]: Epoch 002 - generator training loss: 4.8206, discriminator training loss: 0.1190, validating loss: 0.4199
2024-03-18 09:12:04 [INFO]: Epoch 003 - generator training loss: 5.2968, discriminator training loss: 0.0918, validating loss: 0.4084
2024-03-18 09:14:21 [INFO]: Epoch 004 - generator training loss: 5.6666, discriminator training loss: 0.0766, validating loss: 0.4003
2024-03-18 09:16:40 [INFO]: Epoch

### 🚀 An example of **GP-VAE** for imputation

In [None]:
from pypots.optim import Adam
from pypots.imputation import GPVAE
from pypots.utils.metrics import calc_mae


# initialize the model
gp_vae = GPVAE(
    n_steps=physionet2012_dataset['n_steps'],
    n_features=physionet2012_dataset['n_features'],
    latent_size=37,
    encoder_sizes=(128,128),
    decoder_sizes=(256,256),
    kernel="cauchy",
    beta=0.2,
    M=1,
    K=1,
    sigma=1.005,
    length_scale=7.0,
    kernel_scales=1,
    window_size=24,
    batch_size=32,
    # here we set epochs=10 for a quick demo; you can set it to 100 or more for better performance
    epochs=10,
    # here we set patience=3 to stop training early if the validation loss doesn't decrease for 3 epochs.
    # You can leave it at the default, None, to disable early stopping.
    patience=3,
    # set the optimizer. Unlike torch.optim.Optimizer, you don't have to specify the model's parameters when
    # initializing pypots.optim.Optimizer. You can also leave it at the default, which initializes an Adam optimizer with lr=0.001.
    optimizer=Adam(lr=1e-3),
    # num_workers is passed to torch.utils.data.DataLoader. It's the number of subprocesses to use for data loading.
    # Leaving it at the default of 0 means data loading happens in the main process, i.e. there won't be subprocesses.
    # You can increase it if you think data loading is a bottleneck for your training speed.
    num_workers=0,
    # just leave it at the default, None, and PyPOTS will automatically pick a device for you.
    # Set it to 'cpu' if you don't have CUDA devices. You can also set it to 'cuda:0' or 'cuda:1' if you have multiple CUDA devices, or to a list such as ['cuda:0', 'cuda:1'] to train in parallel.
    device=None,
    # set the path for saving tensorboard and trained model files
    saving_path="tutorial_results/imputation/gp_vae",
    # only save the best model after training finishes.
    # You can also set it to "better" to save every model that improves on the previous best during training.
    model_saving_strategy="best",
)

# train the model on the training set, and validate it on the validating set to select the best model for testing in the next step
gp_vae.fit(train_set=dataset_for_training, val_set=dataset_for_validating)

# the testing stage, impute the originally-missing values and artificially-missing values in the test set

# GP-VAE has an argument to control the number of sampling times during inference
gp_vae_results = gp_vae.predict(dataset_for_testing, n_sampling_times=2)
gp_vae_imputation = gp_vae_results["imputation"]

print(f"The shape of gp_vae_imputation is {gp_vae_imputation.shape}")

# for error calculation, we need to take the mean value of the multiple samplings for each data sample
mean_gp_vae_imputation = gp_vae_imputation.mean(axis=1)

# calculate mean absolute error on the ground truth (artificially-missing values)
testing_mae = calc_mae(
    mean_gp_vae_imputation,
    physionet2012_dataset['test_X_ori'],
    physionet2012_dataset['test_X_indicating_mask'],
)
print(f"Testing mean absolute error: {testing_mae:.4f}")


2024-03-18 09:33:42 [INFO]: No given device, using default device: cpu
2024-03-18 09:33:42 [INFO]: Model files will be saved to tutorial_results/imputation/gp_vae/20240318_T093342
2024-03-18 09:33:42 [INFO]: Tensorboard file will be saved to tutorial_results/imputation/gp_vae/20240318_T093342/tensorboard
2024-03-18 09:33:42 [INFO]: GPVAE initialized with the given hyperparameters, the number of trainable parameters: 229,652
2024-03-18 09:35:31 [INFO]: Epoch 001 - training loss: 26158.0052, validating loss: 0.7120
2024-03-18 09:37:05 [INFO]: Epoch 002 - training loss: 22874.6371, validating loss: 0.6946
2024-03-18 09:38:35 [INFO]: Epoch 003 - training loss: 22840.9400, validating loss: 0.6870
2024-03-18 09:40:06 [INFO]: Epoch 004 - training loss: 22831.4853, validating loss: 0.6934
2024-03-18 09:41:36 [INFO]: Epoch 005 - training loss: 22828.6990, validating loss: 0.6697
2024-03-18 09:43:06 [INFO]: Epoch 006 - training loss: 22824.9689, validating loss: 0.6487
2024-03-18 09:44:40 [INFO]

The shape of gp_vae_imputation is (2398, 2, 48, 37)
Testing mean absolute error: 0.4900


### 🚀 An example of **BRITS** for imputation

In [None]:
from pypots.optim import Adam
from pypots.imputation import BRITS
from pypots.utils.metrics import calc_mae

# initialize the model
brits = BRITS(
    n_steps=physionet2012_dataset['n_steps'],
    n_features=physionet2012_dataset['n_features'],
    rnn_hidden_size=128,
    batch_size=32,
    # here we set epochs=10 for a quick demo; you can set it to 100 or more for better performance
    epochs=10,
    # here we set patience=3 to stop training early if the validation loss doesn't decrease for 3 epochs.
    # You can leave it at the default, None, to disable early stopping.
    patience=3,
    # set the optimizer. Unlike torch.optim.Optimizer, you don't have to specify the model's parameters when
    # initializing pypots.optim.Optimizer. You can also leave it at the default, which initializes an Adam optimizer with lr=0.001.
    optimizer=Adam(lr=1e-3),
    # num_workers is passed to torch.utils.data.DataLoader. It's the number of subprocesses to use for data loading.
    # Leaving it at the default of 0 means data loading happens in the main process, i.e. there won't be subprocesses.
    # You can increase it if you think data loading is a bottleneck for your training speed.
    num_workers=0,
    # just leave it at the default, None, and PyPOTS will automatically pick a device for you.
    # Set it to 'cpu' if you don't have CUDA devices. You can also set it to 'cuda:0' or 'cuda:1' if you have multiple CUDA devices, or to a list such as ['cuda:0', 'cuda:1'] to train in parallel.
    device=None,
    # set the path for saving tensorboard and trained model files
    saving_path="tutorial_results/imputation/brits",
    # only save the best model after training finishes.
    # You can also set it to "better" to save every model that improves on the previous best during training.
    model_saving_strategy="best",
)

# train the model on the training set, and validate it on the validating set to select the best model for testing in the next step
brits.fit(train_set=dataset_for_training, val_set=dataset_for_validating)

# the testing stage, impute the originally-missing values and artificially-missing values in the test set
brits_results = brits.predict(dataset_for_testing)
brits_imputation = brits_results["imputation"]

# calculate mean absolute error on the ground truth (artificially-missing values)
testing_mae = calc_mae(
    brits_imputation,
    physionet2012_dataset['test_X_ori'],
    physionet2012_dataset['test_X_indicating_mask'],
)
print(f"Testing mean absolute error: {testing_mae:.4f}")


2024-03-18 09:49:33 [INFO]: No given device, using default device: cpu
2024-03-18 09:49:33 [INFO]: Model files will be saved to tutorial_results/imputation/brits/20240318_T094933
2024-03-18 09:49:33 [INFO]: Tensorboard file will be saved to tutorial_results/imputation/brits/20240318_T094933/tensorboard
2024-03-18 09:49:33 [INFO]: BRITS initialized with the given hyperparameters, the number of trainable parameters: 239,344
2024-03-18 09:51:21 [INFO]: Epoch 001 - training loss: 0.9293, validating loss: 0.4721
2024-03-18 09:52:39 [INFO]: Epoch 002 - training loss: 0.7275, validating loss: 0.4292
2024-03-18 09:54:10 [INFO]: Epoch 003 - training loss: 0.6797, validating loss: 0.4139
2024-03-18 09:55:35 [INFO]: Epoch 004 - training loss: 0.6552, validating loss: 0.4082
2024-03-18 09:56:57 [INFO]: Epoch 005 - training loss: 0.6406, validating loss: 0.4050
2024-03-18 09:58:22 [INFO]: Epoch 006 - training loss: 0.6292, validating loss: 0.4024
2024-03-18 09:59:45 [INFO]: Epoch 007 - training los

Testing mean absolute error: 0.2546


### 🚀 An example of **M-RNN** for imputation

In [None]:
from pypots.optim import Adam
from pypots.imputation import MRNN
from pypots.utils.metrics import calc_mae

# initialize the model
mrnn = MRNN(
    n_steps=physionet2012_dataset['n_steps'],
    n_features=physionet2012_dataset['n_features'],
    rnn_hidden_size=128,
    # here we set epochs=10 for a quick demo; you can set it to 100 or more for better performance
    epochs=10,
    # here we set patience=3 to stop training early if the validation loss doesn't decrease for 3 epochs.
    # You can leave it at the default, None, to disable early stopping.
    patience=3,
    # set the optimizer. Unlike torch.optim.Optimizer, you don't have to specify the model's parameters when
    # initializing pypots.optim.Optimizer. You can also leave it at the default, which initializes an Adam optimizer with lr=0.001.
    optimizer=Adam(lr=1e-3),
    # num_workers is passed to torch.utils.data.DataLoader. It's the number of subprocesses to use for data loading.
    # Leaving it at the default of 0 means data loading happens in the main process, i.e. there won't be subprocesses.
    # You can increase it if you think data loading is a bottleneck for your training speed.
    num_workers=0,
    # just leave it at the default, None, and PyPOTS will automatically pick a device for you.
    # Set it to 'cpu' if you don't have CUDA devices. You can also set it to 'cuda:0' or 'cuda:1' if you have multiple CUDA devices, or to a list such as ['cuda:0', 'cuda:1'] to train in parallel.
    device=None,
    # set the path for saving tensorboard and trained model files
    saving_path="tutorial_results/imputation/mrnn",
    # only save the best model after training finishes.
    # You can also set it to "better" to save every model that improves on the previous best during training.
    model_saving_strategy="best",
)

# train the model on the training set, and validate it on the validating set to select the best model for testing in the next step
mrnn.fit(train_set=dataset_for_training, val_set=dataset_for_validating)

# the testing stage, impute the originally-missing values and artificially-missing values in the test set
mrnn_results = mrnn.predict(dataset_for_testing)
mrnn_imputation = mrnn_results["imputation"]

# calculate mean absolute error on the ground truth (artificially-missing values)
testing_mae = calc_mae(
    mrnn_imputation,
    physionet2012_dataset['test_X_ori'],
    physionet2012_dataset['test_X_indicating_mask'],
)
print(f"Testing mean absolute error: {testing_mae:.4f}")


2024-03-18 10:02:51 [INFO]: No given device, using default device: cpu
2024-03-18 10:02:51 [INFO]: Model files will be saved to tutorial_results/imputation/mrnn/20240318_T100251
2024-03-18 10:02:51 [INFO]: Tensorboard file will be saved to tutorial_results/imputation/mrnn/20240318_T100251/tensorboard
2024-03-18 10:02:51 [INFO]: MRNN initialized with the given hyperparameters, the number of trainable parameters: 107,951
2024-03-18 10:13:26 [INFO]: Epoch 001 - training loss: 0.7622, validating loss: 1.0173
2024-03-18 10:23:22 [INFO]: Epoch 002 - training loss: 0.5198, validating loss: 0.9760
2024-03-18 10:33:24 [INFO]: Epoch 003 - training loss: 0.4850, validating loss: 0.9627
2024-03-18 10:43:35 [INFO]: Epoch 004 - training loss: 0.4666, validating loss: 0.9571
2024-03-18 10:53:35 [INFO]: Epoch 005 - training loss: 0.4482, validating loss: 0.9585
2024-03-18 11:03:38 [INFO]: Epoch 006 - training loss: 0.4405, validating loss: 0.9522
2024-03-18 11:13:40 [INFO]: Epoch 007 - training loss: 

Testing mean absolute error: 0.6699


### 🚀 An example of **LOCF** for imputation

In [None]:
from pypots.imputation import LOCF
from pypots.utils.metrics import calc_mae

# initialize the model
locf = LOCF()

# LOCF has no parameters to train, so there is no fit() step; just call predict() directly

# the testing stage, impute the originally-missing values and artificially-missing values in the test set
locf_results = locf.predict(dataset_for_testing)
locf_imputation = locf_results["imputation"]

# calculate mean absolute error on the ground truth (artificially-missing values)
testing_mae = calc_mae(
    locf_imputation,
    physionet2012_dataset['test_X_ori'],
    physionet2012_dataset['test_X_indicating_mask'],
)
print(f"Testing mean absolute error: {testing_mae:.4f}")


2024-03-18 11:34:37 [INFO]: No given device, using default device: cpu


Testing mean absolute error: 0.4112
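
LOCF (last observation carried forward) fills each missing value with the most recent observed value of the same feature in the same sample. A minimal NumPy sketch of that rule is shown below. It is not the PyPOTS implementation; the helper name `locf_fill` and the choice to fill gaps at the start of a series with `0.0` are assumptions made for this illustration.

```python
import numpy as np


def locf_fill(X, first_step_fill=0.0):
    """Carry the last observed value forward along the time axis.

    X has shape (n_samples, n_steps, n_features) and uses NaN for missing values.
    Values that are missing before any observation are set to first_step_fill.
    """
    X = np.asarray(X, dtype=float)
    n_steps = X.shape[1]
    observed = ~np.isnan(X)
    # for every position, record its own step index if observed, else 0
    step_idx = np.arange(n_steps).reshape(1, n_steps, 1)
    last_seen = np.where(observed, step_idx, 0)
    # a running maximum turns this into "index of the latest observed step so far"
    last_seen = np.maximum.accumulate(last_seen, axis=1)
    filled = np.take_along_axis(X, last_seen, axis=1)
    # anything still NaN had no earlier observation to copy from
    filled[np.isnan(filled)] = first_step_fill
    return filled


toy = np.array([[[1.0, np.nan], [np.nan, 2.0], [np.nan, np.nan], [4.0, np.nan]]])
print(locf_fill(toy)[0])
```

Because the rule has nothing to learn, a baseline like this needs no training data, which is why only the test set is passed to the model in the cell above.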


## 🌟 Clustering Models

In [None]:
# Assemble the datasets for training, validating, and testing.
import numpy as np

# the clustering models here are trained without a separate validation set, so the training and validation splits are merged
dataset_for_training = {
    "X": np.concatenate([physionet2012_dataset['train_X'], physionet2012_dataset['val_X']], axis=0),
    "y": np.concatenate([physionet2012_dataset['train_y'], physionet2012_dataset['val_y']], axis=0),
}

dataset_for_testing = {
    "X": physionet2012_dataset['test_X'],
    "y": physionet2012_dataset['test_y'],
}


### 🚀 An example of **CRLI** for clustering

In [None]:
from pypots.optim import Adam
from pypots.clustering import CRLI
from pypots.utils.metrics import calc_rand_index, calc_cluster_purity

# initialize the model
crli = CRLI(
    n_steps=physionet2012_dataset["n_steps"],
    n_features=physionet2012_dataset["n_features"],
    n_clusters=physionet2012_dataset["n_classes"],
    n_generator_layers=2,
    rnn_hidden_size=256,
    rnn_cell_type="GRU",
    decoder_fcn_output_dims=[256, 128],  # the output dimensions of the layers in the decoder FCN.
    # Here it means there are 3 layers. Leaving it at the default None results in
    # the FCN having only one layer.
    batch_size=32,
    # here we set epochs=10 for a quick demo; you can set it to 100 or more for better performance
    epochs=10,
    # here we set patience=3 to stop training early if the loss doesn't decrease for 3 consecutive epochs.
    # You can leave it at the default None to disable early stopping.
    patience=3,
    # give the optimizers. Unlike torch.optim.Optimizer, you don't have to specify the model's parameters when
    # initializing pypots.optim.Optimizer. You can also leave them at the default, which initializes an Adam optimizer with lr=0.001.
    G_optimizer=Adam(lr=1e-3),
    D_optimizer=Adam(lr=1e-3),
    # this num_workers argument is for torch.utils.data.DataLoader. It's the number of subprocesses to use for data loading.
    # Leaving it at the default 0 means data loading happens in the main process, i.e. there won't be subprocesses.
    # You can increase it to >1 if you think data loading is a bottleneck for your training speed
    num_workers=0,
    # just leave it at the default None, and PyPOTS will automatically assign the best device for you.
    # Set it to 'cpu' if you don't have CUDA devices. You can also set it to 'cuda:0' or 'cuda:1' if you have multiple CUDA devices, or even run in parallel on ['cuda:0', 'cuda:1']
    device=None,
    # set the path for saving tensorboard and trained model files
    saving_path="tutorial_results/clustering/crli",
    # only save the best model after training finishes.
    # You can also set it to "better" to save every model that improves on the previous best during training.
    model_saving_strategy="best",
)

# train the model on the training set (no validation set is passed for clustering)
crli.fit(train_set=dataset_for_training)

# the testing stage
crli_results = crli.predict(dataset_for_testing)
crli_prediction = crli_results["clustering"]

# calculate the values of clustering metrics on the model's prediction
RI = calc_rand_index(crli_prediction, dataset_for_testing["y"])
CP = calc_cluster_purity(crli_prediction, dataset_for_testing["y"])

print("Testing clustering metrics: \n"
      f'RI: {RI}, \n'
      f'CP: {CP}\n'
)


2024-03-18 11:34:38 [INFO]: No given device, using default device: cpu
2024-03-18 11:34:38 [INFO]: Model files will be saved to tutorial_results/clustering/crli/20240318_T113438
2024-03-18 11:34:38 [INFO]: Tensorboard file will be saved to tutorial_results/clustering/crli/20240318_T113438/tensorboard
2024-03-18 11:34:38 [INFO]: CRLI initialized with the given hyperparameters, the number of trainable parameters: 1,546,820
2024-03-18 11:40:35 [INFO]: Epoch 001 - generator training loss: 3.1785, discriminator training loss: 0.3930
2024-03-18 11:46:28 [INFO]: Epoch 002 - generator training loss: 3.2566, discriminator training loss: 0.3703
2024-03-18 11:52:20 [INFO]: Epoch 003 - generator training loss: 3.2419, discriminator training loss: 0.3625
2024-03-18 11:58:09 [INFO]: Epoch 004 - generator training loss: 3.3570, discriminator training loss: 0.3584
2024-03-18 11:58:09 [INFO]: Exceeded the training patience. Terminating the training procedure...
2024-03-18 11:58:09 [INFO]: Finished trai

Testing clustering metrics: 
RI: 0.6756857943432906, 
CP: 0.8486238532110092
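
The two metrics printed above compare predicted cluster assignments with the ground-truth labels. The Rand index is the fraction of sample pairs on which the two partitions agree (both put the pair together, or both keep it apart), and cluster purity is the fraction of samples that belong to the majority class of their cluster. The NumPy sketch below illustrates these textbook definitions; it is not the PyPOTS code behind `calc_rand_index` and `calc_cluster_purity`, and the function names are hypothetical.

```python
import numpy as np


def rand_index(pred, true):
    """Fraction of sample pairs on which the two partitions agree."""
    pred = np.asarray(pred)
    true = np.asarray(true)
    same_pred = pred[:, None] == pred[None, :]
    same_true = true[:, None] == true[None, :]
    # look at each unordered pair once (upper triangle, no diagonal)
    upper = np.triu_indices(len(pred), k=1)
    return float(np.mean(same_pred[upper] == same_true[upper]))


def cluster_purity(pred, true):
    """Fraction of samples that carry the majority label of their cluster."""
    pred = np.asarray(pred)
    true = np.asarray(true)
    majority_total = 0
    for cluster in np.unique(pred):
        _, counts = np.unique(true[pred == cluster], return_counts=True)
        majority_total += counts.max()
    return majority_total / len(pred)


toy_pred = [0, 0, 1, 1, 1]
toy_true = [0, 0, 0, 1, 1]
print(f"RI: {rand_index(toy_pred, toy_true):.2f}, CP: {cluster_purity(toy_pred, toy_true):.2f}")
```

The pairwise comparison builds an n-by-n matrix, so this sketch is meant for small examples rather than large test sets.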



### 🚀 An example of **VaDER** for clustering

In [None]:
from pypots.optim import Adam
from pypots.clustering import VaDER
from pypots.utils.metrics import calc_rand_index, calc_cluster_purity

# initialize the model
vader = VaDER(
    n_steps=physionet2012_dataset["n_steps"],
    n_features=physionet2012_dataset["n_features"],
    n_clusters=physionet2012_dataset["n_classes"],
    rnn_hidden_size=128,
    d_mu_stddev=2,
    pretrain_epochs=20,
    batch_size=32,
    # here we set epochs=10 for a quick demo; you can set it to 100 or more for better performance
    epochs=10,
    # here we set patience=3 to stop training early if the loss doesn't decrease for 3 consecutive epochs.
    # You can leave it at the default None to disable early stopping.
    patience=3,
    # give the optimizer. Unlike torch.optim.Optimizer, you don't have to specify the model's parameters when
    # initializing pypots.optim.Optimizer. You can also leave it at the default, which initializes an Adam optimizer with lr=0.001.
    optimizer=Adam(lr=1e-3),
    # this num_workers argument is for torch.utils.data.DataLoader. It's the number of subprocesses to use for data loading.
    # Leaving it at the default 0 means data loading happens in the main process, i.e. there won't be subprocesses.
    # You can increase it to >1 if you think data loading is a bottleneck for your training speed
    num_workers=0,
    # just leave it at the default None, and PyPOTS will automatically assign the best device for you.
    # Set it to 'cpu' if you don't have CUDA devices. You can also set it to 'cuda:0' or 'cuda:1' if you have multiple CUDA devices, or even run in parallel on ['cuda:0', 'cuda:1']
    device=None,
    # set the path for saving tensorboard and trained model files
    saving_path="tutorial_results/clustering/vader",
    # only save the best model after training finishes.
    # You can also set it to "better" to save every model that improves on the previous best during training.
    model_saving_strategy="best",
)

# train the model on the training set (no validation set is passed for clustering)
vader.fit(train_set=dataset_for_training)

# the testing stage
vader_results = vader.predict(dataset_for_testing)
vader_prediction = vader_results["clustering"]

# calculate the values of clustering metrics on the model's prediction
RI = calc_rand_index(vader_prediction, dataset_for_testing["y"])
CP = calc_cluster_purity(vader_prediction, dataset_for_testing["y"])

print("Testing clustering metrics: \n"
      f'RI: {RI}, \n'
      f'CP: {CP},\n'
)

2024-03-18 11:58:23 [INFO]: No given device, using default device: cpu
2024-03-18 11:58:23 [INFO]: Model files will be saved to tutorial_results/clustering/vader/20240318_T115823
2024-03-18 11:58:23 [INFO]: Tensorboard file will be saved to tutorial_results/clustering/vader/20240318_T115823/tensorboard
2024-03-18 11:58:23 [INFO]: VaDER initialized with the given hyperparameters, the number of trainable parameters: 293,644
2024-03-18 12:25:14 [INFO]: Epoch 001 - training loss: 0.5046
2024-03-18 12:27:21 [INFO]: Epoch 002 - training loss: 0.2418
2024-03-18 12:29:42 [INFO]: Epoch 003 - training loss: 0.2420
2024-03-18 12:32:06 [INFO]: Epoch 004 - training loss: 0.2274
2024-03-18 12:34:31 [INFO]: Epoch 005 - training loss: 0.2373
2024-03-18 12:37:11 [INFO]: Epoch 006 - training loss: 0.2363
2024-03-18 12:39:54 [INFO]: Epoch 007 - training loss: 0.2302
2024-03-18 12:39:54 [INFO]: Exceeded the training patience. Terminating the training procedure...
2024-03-18 12:39:54 [INFO]: Finished train

Testing clustering metrics: 
RI: 0.7429699968997945, 
CP: 0.8486238532110092,
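
The log above ends with "Exceeded the training patience" after epoch 7, even though `epochs=10`. The patience rule can be sketched as a simple counter that resets whenever the monitored loss reaches a new minimum. This is a generic illustration rather than the PyPOTS training loop; the function name is hypothetical, and it assumes the monitored quantity is the logged training loss when no validation set is given.

```python
def epoch_of_early_stop(losses, patience):
    """Return the 1-based epoch at which training stops under a patience rule.

    Training stops once the loss has failed to improve on its best value
    for `patience` consecutive epochs; otherwise it runs through all epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            return epoch
    return len(losses)


# training losses copied from the VaDER log above
vader_losses = [0.5046, 0.2418, 0.2420, 0.2274, 0.2373, 0.2363, 0.2302]
print(epoch_of_early_stop(vader_losses, patience=3))
```

With these losses the best value is reached at epoch 4, and epochs 5 to 7 fail to improve on it, so the sketch stops at epoch 7, which matches the log.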



## 🌟 Forecasting Models

In [None]:
# Assemble the datasets for training, validating, and testing.

dataset_for_training = {
    "X": physionet2012_dataset['train_X'],
}

dataset_for_validating = {
    "X": physionet2012_dataset['val_X'],
    "X_intact": physionet2012_dataset['val_X_ori'],
}

dataset_for_testing = {
    "X": physionet2012_dataset['test_X'][:, :36],  # we only take the first 36 steps for model input,
    # and let the model forecast the remaining 12 steps
}
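
The split used for forecasting can be illustrated on synthetic data: the first 36 steps form the model input, and the remaining 12 steps form the ground truth, with a mask marking which target values were actually observed. The helper name `split_history_and_horizon` and the toy array below are assumptions made for this sketch; only the 36/12 split and the NaN-based mask follow the tutorial.

```python
import numpy as np


def split_history_and_horizon(X, n_history):
    """Split (n_samples, n_steps, n_features) data into input steps and forecast targets."""
    history = X[:, :n_history]
    horizon = X[:, n_history:]
    # 1 where the target is observed, 0 where it is missing
    observed_mask = (~np.isnan(horizon)).astype(int)
    # replace NaNs with 0 so the targets can be used in arithmetic; the mask hides them
    targets = np.nan_to_num(horizon)
    return history, targets, observed_mask


# synthetic stand-in for test_X: 2 samples, 48 steps, 3 features, one missing target value
toy_X = np.arange(2 * 48 * 3, dtype=float).reshape(2, 48, 3)
toy_X[0, 40, 1] = np.nan
history, targets, observed_mask = split_history_and_horizon(toy_X, n_history=36)
print(history.shape, targets.shape, int(observed_mask.sum()))
```

Keeping the mask alongside the targets lets the error be computed only on observed values, since the dataset itself contains missing entries in the forecast window.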


### 🚀 An example of **BTTF** for forecasting

In [None]:
from pypots.forecasting import BTTF
from pypots.utils.metrics import calc_mae

# initialize the model
bttf = BTTF(
    36,
    physionet2012_dataset["n_features"],
    pred_step=12,
    rank=10,
    time_lags=[1, 2, 3, 10, 10 + 1, 10 + 2, 20, 20 + 1, 20 + 2],
    burn_iter=5,
    gibbs_iter=5,
    multi_step=1,
)

# BTTF does not need to be trained with fit(); call predict() directly in the testing stage below

# the testing stage
bttf_results = bttf.predict(dataset_for_testing)
bttf_prediction = bttf_results["forecasting"]

# calculate the mean absolute error on the ground truth in the forecasting task
testing_mae = calc_mae(
    bttf_prediction,
    np.nan_to_num(physionet2012_dataset['test_X'][:, 36:]),
    (~np.isnan(physionet2012_dataset['test_X'][:, 36:])).astype(int),
)
print(f"Testing mean absolute error: {testing_mae:.4f}")


2024-03-18 12:40:01 [INFO]: No given device, using default device: cpu


Testing mean absolute error: 0.8055


## 🌟 Classification Models

In [None]:
# Assemble the datasets for training, validating, and testing.

dataset_for_training = {
    "X": physionet2012_dataset['train_X'],
    "y": physionet2012_dataset['train_y'],
}

dataset_for_validating = {
    "X": physionet2012_dataset['val_X'],
    "y": physionet2012_dataset['val_y'],
}

dataset_for_testing = {
    "X": physionet2012_dataset['test_X'],
    "y": physionet2012_dataset['test_y'],
}

### 🚀 An example of **BRITS** for classification

In [None]:
from pypots.optim import Adam
from pypots.classification import BRITS
from pypots.utils.metrics import calc_binary_classification_metrics

# initialize the model
brits = BRITS(
    n_steps=physionet2012_dataset['n_steps'],
    n_features=physionet2012_dataset['n_features'],
    n_classes=physionet2012_dataset["n_classes"],
    rnn_hidden_size=256,
    batch_size=32,
    # here we set epochs=10 for a quick demo; you can set it to 100 or more for better performance
    epochs=10,
    # here we set patience=3 to stop training early if the validation loss doesn't decrease for 3 consecutive epochs.
    # You can leave it at the default None to disable early stopping.
    patience=3,
    # give the optimizer. Unlike torch.optim.Optimizer, you don't have to specify the model's parameters when
    # initializing pypots.optim.Optimizer. You can also leave it at the default, which initializes an Adam optimizer with lr=0.001.
    optimizer=Adam(lr=1e-3),
    # this num_workers argument is for torch.utils.data.DataLoader. It's the number of subprocesses to use for data loading.
    # Leaving it at the default 0 means data loading happens in the main process, i.e. there won't be subprocesses.
    # You can increase it to >1 if you think data loading is a bottleneck for your training speed
    num_workers=0,
    # just leave it at the default None, and PyPOTS will automatically assign the best device for you.
    # Set it to 'cpu' if you don't have CUDA devices. You can also set it to 'cuda:0' or 'cuda:1' if you have multiple CUDA devices, or even run in parallel on ['cuda:0', 'cuda:1']
    device=None,
    # set the path for saving tensorboard and trained model files
    saving_path="tutorial_results/classification/brits",
    # only save the best model after training finishes.
    # You can also set it to "better" to save every model that improves on the previous best during training.
    model_saving_strategy="best",
)

# train the model on the training set, and validate it on the validating set to select the best model for testing in the next step
brits.fit(train_set=dataset_for_training, val_set=dataset_for_validating)

# the testing stage
brits_results = brits.predict(dataset_for_testing)
brits_prediction = brits_results["classification"]

# calculate the values of binary classification metrics on the model's prediction
metrics = calc_binary_classification_metrics(brits_prediction, dataset_for_testing["y"])
print("Testing classification metrics: \n"
    f'ROC_AUC: {metrics["roc_auc"]}, \n'
    f'PR_AUC: {metrics["pr_auc"]},\n'
    f'F1: {metrics["f1"]},\n'
    f'Precision: {metrics["precision"]},\n'
    f'Recall: {metrics["recall"]},\n'
)

2024-03-18 12:40:15 [INFO]: No given device, using default device: cpu
2024-03-18 12:40:15 [INFO]: Model files will be saved to tutorial_results/classification/brits/20240318_T124015
2024-03-18 12:40:15 [INFO]: Tensorboard file will be saved to tutorial_results/classification/brits/20240318_T124015/tensorboard
2024-03-18 12:40:16 [INFO]: BRITS initialized with the given hyperparameters, the number of trainable parameters: 730,612
2024-03-18 12:42:38 [INFO]: Epoch 001 - training loss: 0.9063, validating loss: 0.7915
2024-03-18 12:44:33 [INFO]: Epoch 002 - training loss: 0.7632, validating loss: 0.7572
2024-03-18 12:46:28 [INFO]: Epoch 003 - training loss: 0.7197, validating loss: 0.7349
2024-03-18 12:48:23 [INFO]: Epoch 004 - training loss: 0.7089, validating loss: 0.7252
2024-03-18 12:50:18 [INFO]: Epoch 005 - training loss: 0.6794, validating loss: 0.7184
2024-03-18 12:52:14 [INFO]: Epoch 006 - training loss: 0.6765, validating loss: 0.7295
2024-03-18 12:54:09 [INFO]: Epoch 007 - trai

Testing classification metrics: 
ROC_AUC: 0.8413832314658761, 
PR_AUC: 0.5156639586718953,
F1: 0.4335664335664336,
Precision: 0.5933014354066986,
Recall: 0.3415977961432507,
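
The F1 score printed above is the harmonic mean of precision and recall, so the three values can be cross-checked against each other. The short sketch below does that with the numbers from this output; the helper name `f1_from_precision_recall` is introduced here for illustration and is not part of PyPOTS.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# precision and recall copied from the BRITS output above
brits_precision = 0.5933014354066986
brits_recall = 0.3415977961432507
print(f"F1 recomputed: {f1_from_precision_recall(brits_precision, brits_recall):.4f}")
```

The recomputed value agrees with the reported F1 of about 0.4336. Because the harmonic mean is pulled toward the smaller of the two inputs, the low recall dominates the score here.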



### 🚀 An example of **GRUD** for classification

In [None]:
from pypots.optim import Adam
from pypots.classification import GRUD
from pypots.utils.metrics import calc_binary_classification_metrics

# initialize the model
grud = GRUD(
    n_steps=physionet2012_dataset['n_steps'],
    n_features=physionet2012_dataset['n_features'],
    n_classes=physionet2012_dataset["n_classes"],
    rnn_hidden_size=32,
    batch_size=32,
    # here we set epochs=10 for a quick demo; you can set it to 100 or more for better performance
    epochs=10,
    # here we set patience=3 to stop training early if the validation loss doesn't decrease for 3 consecutive epochs.
    # You can leave it at the default None to disable early stopping.
    patience=3,
    # give the optimizer. Unlike torch.optim.Optimizer, you don't have to specify the model's parameters when
    # initializing pypots.optim.Optimizer. You can also leave it at the default, which initializes an Adam optimizer with lr=0.001.
    optimizer=Adam(lr=1e-3),
    # this num_workers argument is for torch.utils.data.DataLoader. It's the number of subprocesses to use for data loading.
    # Leaving it at the default 0 means data loading happens in the main process, i.e. there won't be subprocesses.
    # You can increase it to >1 if you think data loading is a bottleneck for your training speed
    num_workers=0,
    # just leave it at the default None, and PyPOTS will automatically assign the best device for you.
    # Set it to 'cpu' if you don't have CUDA devices. You can also set it to 'cuda:0' or 'cuda:1' if you have multiple CUDA devices, or even run in parallel on ['cuda:0', 'cuda:1']
    device=None,
    # set the path for saving tensorboard and trained model files
    saving_path="tutorial_results/classification/grud",
    # only save the best model after training finishes.
    # You can also set it to "better" to save every model that improves on the previous best during training.
    model_saving_strategy="best",
)

# train the model on the training set, and validate it on the validating set to select the best model for testing in the next step
grud.fit(train_set=dataset_for_training, val_set=dataset_for_validating)

# the testing stage
grud_results = grud.predict(dataset_for_testing)
grud_prediction = grud_results["classification"]

# calculate the values of binary classification metrics on the model's prediction
metrics = calc_binary_classification_metrics(grud_prediction, dataset_for_testing["y"])
print("Testing classification metrics: \n"
    f'ROC_AUC: {metrics["roc_auc"]}, \n'
    f'PR_AUC: {metrics["pr_auc"]},\n'
    f'F1: {metrics["f1"]},\n'
    f'Precision: {metrics["precision"]},\n'
    f'Recall: {metrics["recall"]},\n'
)

2024-03-18 13:00:14 [INFO]: No given device, using default device: cpu
2024-03-18 13:00:14 [INFO]: Model files will be saved to tutorial_results/classification/grud/20240318_T130014
2024-03-18 13:00:14 [INFO]: Tensorboard file will be saved to tutorial_results/classification/grud/20240318_T130014/tensorboard
2024-03-18 13:00:14 [INFO]: GRUD initialized with the given hyperparameters, the number of trainable parameters: 16,128
2024-03-18 13:00:14 [INFO]: No given device, using default device: cpu
2024-03-18 13:00:27 [INFO]: No given device, using default device: cpu
2024-03-18 13:00:46 [INFO]: Epoch 001 - training loss: 0.3497, validating loss: 0.3167
2024-03-18 13:01:01 [INFO]: Epoch 002 - training loss: 0.3012, validating loss: 0.3019
2024-03-18 13:01:18 [INFO]: Epoch 003 - training loss: 0.2901, validating loss: 0.3011
2024-03-18 13:01:34 [INFO]: Epoch 004 - training loss: 0.2799, validating loss: 0.3026
2024-03-18 13:01:49 [INFO]: Epoch 005 - training loss: 0.2694, validating loss: 

Testing classification metrics: 
ROC_AUC: 0.8558301351689782, 
PR_AUC: 0.5432786552563934,
F1: 0.5080906148867314,
Precision: 0.615686274509804,
Recall: 0.4325068870523416,



### 🚀 An example of **Raindrop** for classification

In [None]:
import torch

print(f"Installed torch version: {torch.__version__}")
print("Now install necessary dependencies (pyg etc.) for the Raindrop model...\n")

pyg_whl_link = f"https://data.pyg.org/whl/torch-{torch.__version__}.html"

! pip install torch-geometric torch-scatter torch-sparse -f $pyg_whl_link

Installed torch version: 2.2.1+cu121
Now install necessary dependencies (pyg etc.) for the Raindrop model...

Looking in links: https://data.pyg.org/whl/torch-2.2.1+cpu.html


In [None]:
from pypots.optim import Adam
from pypots.classification import Raindrop
from pypots.utils.metrics import calc_binary_classification_metrics

# initialize the model
raindrop = Raindrop(
    n_steps=physionet2012_dataset['n_steps'],
    n_features=physionet2012_dataset['n_features'],
    n_classes=physionet2012_dataset["n_classes"],
    n_layers=2,
    d_model=physionet2012_dataset["n_features"] * 4,
    d_ffn=256,
    n_heads=2,
    dropout=0.3,
    batch_size=32,
    # here we set epochs=10 for a quick demo; you can set it to 100 or more for better performance
    epochs=10,
    # here we set patience=3 to stop training early if the validation loss doesn't decrease for 3 consecutive epochs.
    # You can leave it at the default None to disable early stopping.
    patience=3,
    # give the optimizer. Unlike torch.optim.Optimizer, you don't have to specify the model's parameters when
    # initializing pypots.optim.Optimizer. You can also leave it at the default, which initializes an Adam optimizer with lr=0.001.
    optimizer=Adam(lr=1e-3),
    # this num_workers argument is for torch.utils.data.DataLoader. It's the number of subprocesses to use for data loading.
    # Leaving it at the default 0 means data loading happens in the main process, i.e. there won't be subprocesses.
    # You can increase it to >1 if you think data loading is a bottleneck for your training speed
    num_workers=0,
    # just leave it at the default None, and PyPOTS will automatically assign the best device for you.
    # Set it to 'cpu' if you don't have CUDA devices. You can also set it to 'cuda:0' or 'cuda:1' if you have multiple CUDA devices, or even run in parallel on ['cuda:0', 'cuda:1']
    device=None,
    # set the path for saving tensorboard and trained model files
    saving_path="tutorial_results/classification/raindrop",
    # only save the best model after training finishes.
    # You can also set it to "better" to save every model that improves on the previous best during training.
    model_saving_strategy="best",
)

# train the model on the training set, and validate it on the validating set to select the best model for testing in the next step
raindrop.fit(train_set=dataset_for_training, val_set=dataset_for_validating)

# the testing stage
raindrop_results = raindrop.predict(dataset_for_testing)
raindrop_prediction = raindrop_results["classification"]

# calculate the values of binary classification metrics on the model's prediction
metrics = calc_binary_classification_metrics(raindrop_prediction, dataset_for_testing["y"])
print("Testing classification metrics: \n"
    f'ROC_AUC: {metrics["roc_auc"]}, \n'
    f'PR_AUC: {metrics["pr_auc"]},\n'
    f'F1: {metrics["f1"]},\n'
    f'Precision: {metrics["precision"]},\n'
    f'Recall: {metrics["recall"]},\n'
)

2024-03-19 09:41:48 [INFO]: No given device, using default device: cuda
2024-03-19 09:41:48 [INFO]: Model files will be saved to tutorial_results/classification/raindrop/20240319_T094148
2024-03-19 09:41:48 [INFO]: Tensorboard file will be saved to tutorial_results/classification/raindrop/20240319_T094148/tensorboard
2024-03-19 09:41:48 [INFO]: Raindrop initialized with the given hyperparameters, the number of trainable parameters: 1,415,006
2024-03-19 09:42:27 [INFO]: Epoch 001 - training loss: 0.3972, validating loss: 0.3448
2024-03-19 09:42:49 [INFO]: Epoch 002 - training loss: 0.3358, validating loss: 0.3341
2024-03-19 09:43:11 [INFO]: Epoch 003 - training loss: 0.3195, validating loss: 0.3501
2024-03-19 09:43:32 [INFO]: Epoch 004 - training loss: 0.3095, validating loss: 0.3208
2024-03-19 09:43:53 [INFO]: Epoch 005 - training loss: 0.3022, validating loss: 0.3282
2024-03-19 09:44:14 [INFO]: Epoch 006 - training loss: 0.2933, validating loss: 0.3335
2024-03-19 09:44:35 [INFO]: Epoc

Testing classification metrics: 
ROC_AUC: 0.8470634421047645, 
PR_AUC: 0.5042968844353513,
F1: 0.3080168776371308,
Precision: 0.6576576576576577,
Recall: 0.20110192837465565,

