# U-Cast Tutorial
This notebook gives a tutorial on the high-dimensional time series forecasting task supported by U-Cast.

1. Install Python 3.10, then install the dependencies with either of the following commands.

In [None]:
pip install -r requirements.txt

or

In [None]:
conda env create -f environment.yaml

2. Package Import

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

3. U-Cast Construction

U-Cast is an efficient model that captures channel correlations via learning latent hierarchical structures. U-Cast also introduces a full-rank regularization
term to encourage disentanglement and improve the learning of structured representations.

In the following section, we take a detailed look at U-Cast. The figure below gives an overview of the architecture.

![U-Cast](../pic/U-Cast.png)

U-Cast consists of three main components: (1) downsampling, (2) full-rank regularization, and (3) upsampling.

Downsampling (HierarchicalLatentQueryNetwork)

In [None]:
class HierarchicalLatentQueryNetwork(nn.Module):
    def __init__(self, orig_channels, time_dim, num_layers, head_dim, reduction_ratio=16, num_heads=1, dropout=0.1):
        super(HierarchicalLatentQueryNetwork, self).__init__()
        self.num_layers = num_layers
        self.layers = nn.ModuleList()
        
        # Compute latent dimensions for each layer, progressively reducing channels
        latent_dims = []
        current_channels = orig_channels
        for _ in range(num_layers):
            new_channels = max(1, current_channels // reduction_ratio)
            latent_dims.append(new_channels)
            current_channels = new_channels

        self.latent_dims = latent_dims
        current_in_dim = time_dim
        
        # Build hierarchical attention layers
        for latent_dim in latent_dims:
            self.layers.append(LatentQueryAttention(current_in_dim, latent_dim, head_dim, num_heads, dropout))
            current_in_dim = head_dim * num_heads
        self.norm_layers = nn.ModuleList([nn.LayerNorm(head_dim * num_heads) for _ in latent_dims])

    def forward(self, x, return_attn=False):
        B, T, C = x.shape
        x_base = x.transpose(1, 2)  # [B, C, T] - convert to channel-first format
        skip_list = [x_base]  # Save skip connections for upsampling
        x_down = x_base
        attn_maps = [] if return_attn else None
        
        # Downsample layer by layer, learning hierarchical representations
        for layer, norm in zip(self.layers, self.norm_layers):
            if return_attn:
                x_down, attn = layer(x_down, return_attn=True)
                attn_maps.append(attn.detach().cpu())
            else:
                x_down = layer(x_down)
            x_down = norm(x_down)
            skip_list.append(x_down)  # Save output of each layer for upsampling
            
        if return_attn:
            return skip_list[-1], skip_list, attn_maps
        else:
            return skip_list[-1], skip_list
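
To make the channel schedule concrete, here is a small standalone sketch of the same loop that `__init__` uses to compute `latent_dims`. The helper name `latent_channel_schedule` is ours and not part of the library. The value 1161 is the Measles `enc_in` from the run log later in this tutorial, and 16 is only the constructor default; the actual config may set a different `channel_reduction_ratio`.

```python
def latent_channel_schedule(orig_channels, num_layers, reduction_ratio=16):
    """Mirror the latent_dims loop in HierarchicalLatentQueryNetwork.__init__ (illustrative helper)."""
    dims = []
    current = orig_channels
    for _ in range(num_layers):
        current = max(1, current // reduction_ratio)  # never drop below one latent channel
        dims.append(current)
    return dims


print(latent_channel_schedule(1161, num_layers=2))  # [72, 4]
print(latent_channel_schedule(100, num_layers=3))   # [6, 1, 1]
```

The second call shows the `max(1, ...)` guard: once the channel count falls below the reduction ratio, every deeper layer keeps a single latent channel.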


In [None]:
# Latent Query Attention Mechanism - Channel reduction via learnable query vectors
class LatentQueryAttention(nn.Module):
    def __init__(self, in_dim, latent_dim, head_dim, num_heads=1, dropout=0.1):
        super(LatentQueryAttention, self).__init__()
        self.num_heads = num_heads
        self.head_dim = head_dim
        self.latent_dim = latent_dim

        # Learnable latent query vectors, used to extract important information from high-dimensional input
        self.latent_queries = nn.Parameter(torch.randn(latent_dim, head_dim * num_heads))
        self.q_proj = nn.Linear(head_dim * num_heads, head_dim * num_heads)
        self.k_proj = nn.Linear(in_dim, head_dim * num_heads)
        self.v_proj = nn.Linear(in_dim, head_dim * num_heads)
        self.out_proj = nn.Linear(head_dim * num_heads, head_dim * num_heads)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, return_attn=False):
        B, L, _ = x.shape
        # Expand latent queries to batch dimension
        queries = self.latent_queries.unsqueeze(0).expand(B, -1, -1)
        queries = self.q_proj(queries)
        keys = self.k_proj(x)
        values = self.v_proj(x)
        
        # Reshape into multi-head attention format
        queries = queries.view(B, self.latent_dim, self.num_heads, self.head_dim).transpose(1, 2)
        keys = keys.view(B, L, self.num_heads, self.head_dim).transpose(1, 2)
        values = values.view(B, L, self.num_heads, self.head_dim).transpose(1, 2)
        
        # Compute attention scores and weights
        scores = torch.matmul(queries, keys.transpose(-2, -1)) / (self.head_dim ** 0.5)
        attn = F.softmax(scores, dim=-1)
        attn = self.dropout(attn)
        
        # Apply attention weights and generate output
        out = torch.matmul(attn, values)
        out = out.transpose(1, 2).contiguous().view(B, self.latent_dim, self.num_heads * self.head_dim)
        out = self.out_proj(out)
        
        if return_attn:
            return out, attn
        else:
            return out
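
The core idea of `LatentQueryAttention` is that the number of output tokens is set by the number of latent queries, not by the number of input channels. The sketch below strips the mechanism down to a single head without the linear projections or dropout, so it is a simplified illustration rather than the module above. The sizes `B`, `C`, `D`, and `K` are arbitrary values chosen for the demo.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, C, D, K = 2, 64, 8, 4            # batch, input channels, feature dim, latent channels

x = torch.randn(B, C, D)             # channel tokens: [B, C, D]
latent_queries = torch.randn(K, D)   # stands in for the learnable nn.Parameter

q = latent_queries.unsqueeze(0).expand(B, -1, -1)   # [B, K, D]
scores = q @ x.transpose(-2, -1) / D ** 0.5         # [B, K, C]
attn = F.softmax(scores, dim=-1)                    # each latent channel attends over all C inputs
out = attn @ x                                      # [B, K, D]

print(tuple(out.shape))  # (2, 4, 8)
```

Each of the `K` output rows is a convex combination of the `C` input channels, which is how the module reduces 64 channels to 4 here.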


Full-rank regularization

In [None]:
# Full-rank regularization loss - Encourages learning disentangled representations
def covariance_loss(skip_list, lambda_cov=0.1, eps=1e-5):
    """
    Compute the normalized sum of negative log-determinant of covariance matrices 
    for full-rank regularization.
    
    Args:
        skip_list: list of tensors, each tensor has shape [B, C, D]
        lambda_cov: scaling factor for covariance loss
        eps: small constant for numerical stability
    """
    total_loss = 0.0
    num_layers = len(skip_list) - 1  # Exclude input layer
    
    # Compute covariance loss for each skip connection layer
    for x in skip_list[1:]:
        B, C, D = x.shape
        x_reshaped = x.reshape(B * C, D)
        
        # Normalize features
        x_centered = (x_reshaped - x_reshaped.mean(dim=0, keepdim=True)) / (
            x_reshaped.std(dim=0, keepdim=True) + eps
        )
        
        # Compute covariance matrix
        cov = (x_centered.T @ x_centered) / (B * C - 1)
        cov = cov + eps * torch.eye(D, device=x.device, dtype=x.dtype)

        # Dimension normalization to reduce scale variance
        loss = -torch.logdet(cov) / D  # Negative log-determinant encourages full rank
        total_loss += loss

    return lambda_cov * (total_loss / num_layers if num_layers > 0 else 0.0)
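
To see why the negative log-determinant rewards full-rank representations, the sketch below applies the same per-tensor computation to two synthetic feature matrices: one with independent features and one where every feature is a copy of the same signal. The helper `neg_logdet_per_dim` is an illustrative single-tensor rewrite, not library code, and `float64` is used here only to keep the demo numerically safe.

```python
import torch


def neg_logdet_per_dim(x, eps=1e-5):
    """Single-tensor version of the loss above; x has shape [N, D]."""
    n, d = x.shape
    z = (x - x.mean(dim=0, keepdim=True)) / (x.std(dim=0, keepdim=True) + eps)
    cov = z.T @ z / (n - 1) + eps * torch.eye(d, dtype=x.dtype)
    return -torch.logdet(cov) / d


torch.manual_seed(0)
full_rank = torch.randn(512, 8, dtype=torch.float64)                # 8 independent features
collapsed = torch.randn(512, 1, dtype=torch.float64).repeat(1, 8)   # 8 copies of one feature

loss_full = neg_logdet_per_dim(full_rank).item()
loss_collapsed = neg_logdet_per_dim(collapsed).item()
print(loss_full, loss_collapsed)  # the collapsed case is far larger
```

For independent features the standardized covariance is close to the identity, so the log-determinant is near zero. For collapsed features all but one eigenvalue shrink to roughly `eps`, which drives the loss up sharply.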


Upsampling

In [None]:
# Hierarchical Upsampling Network - U-Cast upsampling component, restores original dimensions using skip connections
class HierarchicalUpsamplingNetwork(nn.Module):
    def __init__(self, num_layers, q_in_dim, head_dim, num_heads=1, dropout=0.1):
        super(HierarchicalUpsamplingNetwork, self).__init__()
        # Build upsampling attention layers
        self.layers = nn.ModuleList([
            UpLatentQueryAttention(q_in_dim, head_dim, num_heads, dropout) for _ in range(num_layers)
        ])
        # Layer normalization
        self.norms = nn.ModuleList([
            nn.LayerNorm(q_in_dim) for _ in range(num_layers)
        ])

    def forward(self, x_bottom, skip_list):
        # Reverse skip connection list to upsample from bottom to top
        rev = list(reversed(skip_list))
        queries = rev[1:]  # Exclude the bottom-most layer
        x = x_bottom
        
        # Upsample layer by layer, using skip connections as queries
        for layer, norm, query in zip(self.layers, self.norms, queries):
            x = norm(layer(query, x) + query)  # Residual connection
            # x = layer(query, x)
        return x
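
`UpLatentQueryAttention` is used above but not shown in this notebook. The following is a hypothetical sketch of what such a layer could look like, inferred only from how it is called (`layer(query, x)` with constructor arguments `q_in_dim, head_dim, num_heads, dropout`): a cross-attention in which the finer skip tensor provides the queries and the coarser tensor provides keys and values. It assumes `q_in_dim == head_dim * num_heads`, which holds in `Model.__init__` where both are `d_model` and `num_heads=1`. The actual implementation in the repository may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class UpLatentQueryAttention(nn.Module):
    """Hypothetical sketch: skip tensor supplies queries, coarser tensor supplies keys/values."""

    def __init__(self, q_in_dim, head_dim, num_heads=1, dropout=0.1):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = head_dim
        inner = head_dim * num_heads
        self.q_proj = nn.Linear(q_in_dim, inner)
        self.k_proj = nn.Linear(inner, inner)  # assumes the coarse input has width `inner`
        self.v_proj = nn.Linear(inner, inner)
        self.out_proj = nn.Linear(inner, q_in_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, query, x):
        B, Lq, _ = query.shape  # finer level: [B, C_fine, q_in_dim]
        Lk = x.shape[1]         # coarser level: [B, C_coarse, inner]
        q = self.q_proj(query).view(B, Lq, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(B, Lk, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(B, Lk, self.num_heads, self.head_dim).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        attn = self.dropout(F.softmax(scores, dim=-1))
        out = (attn @ v).transpose(1, 2).contiguous().view(B, Lq, -1)
        return self.out_proj(out)  # [B, C_fine, q_in_dim]


layer = UpLatentQueryAttention(q_in_dim=16, head_dim=16)
skip = torch.randn(2, 72, 16)    # finer skip connection
coarse = torch.randn(2, 4, 16)   # output of the level below
up = layer(skip, coarse)
print(tuple(up.shape))  # (2, 72, 16)
```

Because the output has the same shape as the query, it can be added to the skip tensor in the residual connection used by `HierarchicalUpsamplingNetwork.forward`.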

For more details, please read our paper: https://arxiv.org/pdf/2507.15119

4. U-Cast

In [None]:
@register_model("UCast", paper="U-Cast: Learning Latent Hierarchical Channel Structure for High-Dimensional Time Series Forecasting", year=2024)
class Model(nn.Module):
    def __init__(self, configs):
        ...  # shown in detail below

    def forecast(self, x_enc):
        ...  # shown in detail below

    def forward(self, x_enc):
        ...  # shown in detail below

First, let us look at `__init__(self, configs)`.

In [None]:
def __init__(self, configs):
    super(Model, self).__init__()
    self.task_name = configs.task_name
    self.seq_len = configs.seq_len
    self.pred_len = configs.pred_len
    self.enc_in = configs.enc_in
    self.d_model = configs.d_model
    self.alpha = configs.alpha

    # Project input sequence length into model dimension
    self.input_proj = nn.Linear(self.seq_len, self.d_model)
    # Project model outputs back to prediction horizon
    self.output_proj = nn.Linear(self.d_model, self.pred_len)

    # Encoder: hierarchical latent query network for channel reduction
    self.channel_reduction_net = HierarchicalLatentQueryNetwork(
        orig_channels=self.enc_in,
        time_dim=self.d_model,
        num_layers=configs.e_layers,
        head_dim=self.d_model,
        reduction_ratio=configs.channel_reduction_ratio,
        num_heads=1,
        dropout=configs.dropout
    )

    # Decoder: hierarchical upsampling network to reconstruct channels
    self.upsample_net = HierarchicalUpsamplingNetwork(
        num_layers=configs.e_layers,
        q_in_dim=self.d_model,
        head_dim=self.d_model,
        num_heads=1,
        dropout=configs.dropout
    )

    # Final prediction head
    self.predict_layer = nn.Linear(self.d_model, self.d_model)


Next, let us look at `forecast(self, x_enc)`.

In [None]:
def forecast(self, x_enc):
    # Normalize input: zero-mean and unit-variance per channel
    means = x_enc.mean(1, keepdim=True).detach()
    x_enc = x_enc - means
    stdev = torch.sqrt(torch.var(x_enc, dim=1, keepdim=True, unbiased=False) + 1e-5)
    x_enc = x_enc / stdev

    # Project input into model space
    x_enc = x_enc.transpose(1, 2)        # [B, C, T]
    x_enc = self.input_proj(x_enc)       # [B, C, d_model]
    x_enc = x_enc.transpose(1, 2)        # [B, d_model, C]

    # Encoder: channel reduction with hierarchical latent queries
    x_bottom, skip_list = self.channel_reduction_net(x_enc)
    cov_loss = covariance_loss(skip_list, self.alpha)

    # Bottleneck prediction transformation
    x_bottom = self.predict_layer(x_bottom)

    # Decoder: upsample using skip connections
    x_up = self.upsample_net(x_bottom, skip_list)

    # Final projection to prediction horizon
    dec_out = self.output_proj(x_up + x_enc.transpose(1, 2))  # [B, enc_in, pred_len]
    dec_out = dec_out.transpose(1, 2)  # [B, pred_len, enc_in]

    # Denormalize output
    dec_out = dec_out * stdev[:, 0, :].unsqueeze(1)
    dec_out = dec_out + means[:, 0, :].unsqueeze(1)

    return dec_out, cov_loss


In [None]:
def forward(self, x_enc):
    return self.forecast(x_enc)
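
The per-instance normalization at the start of `forecast` and the denormalization at its end are inverses of each other. The sketch below checks this on a random tensor by applying both steps with no model in between; the tensor shape and scale are arbitrary demo values.

```python
import torch

torch.manual_seed(0)
x = torch.randn(2, 21, 5) * 3.0 + 10.0   # [B, T, C] with arbitrary scale and offset

# Normalize over the time axis, as in forecast()
means = x.mean(1, keepdim=True)
x_norm = x - means
stdev = torch.sqrt(torch.var(x_norm, dim=1, keepdim=True, unbiased=False) + 1e-5)
x_norm = x_norm / stdev                   # each channel: zero mean, roughly unit variance

# Undo the normalization the same way forecast() does for its output
x_back = x_norm * stdev[:, 0, :].unsqueeze(1) + means[:, 0, :].unsqueeze(1)

print(torch.allclose(x_back, x, atol=1e-5))  # True
```

In the model, the statistics come from the input window but are applied to the predicted window, so the forecast is returned in the original scale of each channel.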

5. Training and Settings

5.1 Training for High-dimensional Forecasting

In [None]:
# class LongTermForecastingExperiment(BaseExperiment)
def train(self, setting: str) -> Tuple[list, dict, str]:
    """
    Execute the complete training procedure with validation monitoring and early stopping.

    Args:
        setting: Unique experiment setting string for checkpoint management

    Returns:
        Tuple containing:
            - all_epoch_metrics: List of metrics for each training epoch
            - best_metrics: Dictionary of best validation metrics achieved
            - best_model_path: Path to the saved best model checkpoint
    """
    # Load data splits (train/val/test loaders may include padding and masking)
    train_data, train_loader = self._get_data(flag='train')
    vali_data, vali_loader = self._get_data(flag='val')
    test_data, test_loader = self._get_data(flag='test')
    
    # Checkpoint directory for early-stopped best model
    path = os.path.join(self.config.checkpoints, setting)
    
    # Metric trackers (updated per epoch; best tracked by validation loss)
    all_epoch_metrics = []
    best_metrics = {
        "epoch": 0,
        "train_loss": float('inf'),
        "vali_loss": float('inf'),
        "vali_mae_loss": float('inf'),
        "test_loss": float('inf'),
        "test_mae_loss": float('inf')
    }
    best_model_path = ""
    
    time_now = time.time()
    train_steps = len(train_loader)

    # Early stopping with accelerator-aware checkpointing
    early_stopping = EarlyStopping(patience=self.config.patience, verbose=True, accelerator=self.accelerator)
    
    # Prepare model, optimizer, and loaders for device/distributed execution
    self.model, self.optimizer, train_loader, vali_loader, test_loader = self.accelerator.prepare(
        self.model, self.optimizer, train_loader, vali_loader, test_loader
    )
    
    # ===== Main training loop =====
    for epoch in range(self.config.train_epochs):
        iter_count = 0
        train_loss = []
        
        self.model.train()
        epoch_time = time.time()
        batch_times = []  # Per-batch wall-clock timing
        
        # ----- Batch loop -----
        for i, (batch_x, batch_y, batch_x_mark, batch_y_mark) in enumerate(train_loader):
            batch_start_time = time.time()
            iter_count += 1
            self.optimizer.zero_grad()
            
            # Move inputs to device (keep dtypes explicit)
            batch_x = batch_x.float().to(self.device)
            batch_y = batch_y.float().to(self.device)
            batch_x_mark = batch_x_mark.float().to(self.device)
            batch_y_mark = batch_y_mark.float().to(self.device)
            
            # Decoder input: teacher-forcing prefix + zeros for prediction horizon
            dec_inp = torch.zeros_like(batch_y[:, -self.config.pred_len:, :]).float()
            dec_inp = torch.cat([batch_y[:, :self.config.label_len, :], dec_inp], dim=1).float().to(self.device)
            
            # Forward with mixed precision (managed by Accelerator)
            with self.accelerator.autocast():
                outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
            
            # Some models return (pred, aux_loss); handle both cases
            if isinstance(outputs, tuple):
                outputs, additional_loss = outputs
            else:
                additional_loss = 0
            
            # Compute loss on prediction horizon only
            batch_y = batch_y[:, -self.config.pred_len:, :].to(self.device)
            loss = self.criterion(outputs, batch_y) + additional_loss
            train_loss.append(loss.item())
            
            # Periodic logging with speed estimate
            if (i + 1) % 100 == 0:
                self.accelerator.print(f"\titers: {i+1}, epoch: {epoch+1} | loss: {loss.item():.7f}")
                speed = (time.time() - time_now) / iter_count
                left_time = speed * ((self.config.train_epochs - epoch) * train_steps - i)
                self.accelerator.print(f'\tspeed: {speed:.4f}s/iter; left time: {left_time:.4f}s')
                iter_count = 0
                time_now = time.time()
            
            # Backward + optimizer step (scaled if AMP is on)
            self.accelerator.backward(loss)
            self.optimizer.step()
            
            batch_times.append(time.time() - batch_start_time)
        # ----- End batch loop -----
        
        # Timing diagnostics
        epoch_cost_time = time.time() - epoch_time
        avg_batch_time = np.mean(batch_times)
        self.accelerator.print(f"Epoch: {epoch+1} cost time: {epoch_cost_time:.2f}s")
        self.accelerator.print(f"Average batch training time: {avg_batch_time:.4f}s")
        
        # Evaluation on validation and test (no gradient)
        train_loss = np.average(train_loss)
        val_time = time.time()
        vali_loss, vali_mae_loss = self.validate(vali_loader)
        self.accelerator.print(f"Val cost time: {time.time() - val_time:.2f}s")
        test_time = time.time()
        test_loss, test_mae_loss = self.validate(test_loader)
        self.accelerator.print(f"Test cost time: {time.time() - test_time:.2f}s")
        
        # Log epoch metrics and progress
        epoch_metrics = {
            "epoch": epoch + 1,
            "train_loss": float(train_loss),
            "vali_loss": float(vali_loss),
            "vali_mae_loss": float(vali_mae_loss),
            "test_loss": float(test_loss),
            "test_mae_loss": float(test_mae_loss)
        }
        all_epoch_metrics.append(epoch_metrics)
        self.accelerator.print(f'Epoch: {epoch+1}, Steps: {train_steps} | Train Loss: {train_loss:.7f} Vali Loss: {vali_loss:.7f} Test Loss: {test_loss:.7f}')
        
        # Early stopping (saves checkpoint on improvement)
        early_stopping(vali_loss, self.model, path, metrics=epoch_metrics)

        # Track best-by-validation-loss and checkpoint path
        if vali_loss < best_metrics["vali_loss"]:
            best_metrics.update(epoch_metrics)
            best_model_path = early_stopping.get_checkpoint_path()

        # Stop training if patience is exceeded
        if early_stopping.early_stop:
            self.accelerator.print("Early stopping")
            break

        # Scheduler step (epoch-based policy)
        adjust_learning_rate(self.optimizer, epoch + 1, self.config, self.accelerator)
    
    return all_epoch_metrics, best_metrics, best_model_path

For the full implementation, see core/experiments/long_term_forecasting.py.

5.2 Distributed Training

In [None]:
# Wrap model, optimizer, and dataloaders with Accelerator for device placement and distributed training
self.model, self.optimizer, train_loader, vali_loader, test_loader = self.accelerator.prepare(
    self.model, self.optimizer, train_loader, vali_loader, test_loader
)

5.3 Early Stop

In [None]:
class EarlyStopping:
    def __init__(self, patience=7, verbose=False, delta=0, accelerator=None):
        # Number of epochs to wait before stopping if no improvement
        self.patience = patience
        self.verbose = verbose
        self.counter = 0
        self.best_score = None
        self.early_stop = False
        self.val_loss_min = np.inf
        # Minimum improvement in validation loss to reset counter
        self.delta = delta
        # Optional: use accelerator for printing/logging
        self.accelerator = accelerator
        # Store best validation metrics (for checkpointing/analysis)
        self.best_metrics = None  

    def __call__(self, val_loss, model, path, metrics=None):
        # Convert validation loss into score (lower loss → higher score)
        score = -val_loss
        if self.best_score is None:
            # First evaluation: save as baseline best
            self.best_score = score
            self.save_checkpoint(val_loss, model, path, metrics)
            if metrics is not None:
                self.best_metrics = metrics
        elif score < self.best_score + self.delta:
            # No sufficient improvement → increase counter
            self.counter += 1
            if self.accelerator:
                self.accelerator.print(f'EarlyStopping counter: {self.counter} out of {self.patience}')
            else:
                print(f'EarlyStopping counter: {self.counter} out of {self.patience}')
            # Stop if patience exceeded
            if self.counter >= self.patience:
                self.early_stop = True
        else:
            # Improvement found → save checkpoint and reset counter
            self.best_score = score
            self.save_checkpoint(val_loss, model, path, metrics)
            if metrics is not None:
                self.best_metrics = metrics
            self.counter = 0
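
The counter logic in `__call__` can be replayed on a plain list of validation losses to see when training would stop. The function below is an illustrative helper, not library code: it drops checkpointing and printing and keeps only the comparison against the best score. The loss values are made up for the demo.

```python
def epochs_until_stop(val_losses, patience=3, delta=0.0):
    """Replay the EarlyStopping counter on a list of validation losses.

    Returns (stop_epoch, best_epoch), both 1-indexed. stop_epoch is None
    if the patience is never exhausted.
    """
    best_score, best_epoch, counter = None, None, 0
    for epoch, loss in enumerate(val_losses, start=1):
        score = -loss
        if best_score is None or score >= best_score + delta:
            # Improvement (or first epoch): remember it and reset the counter
            best_score, best_epoch, counter = score, epoch, 0
        else:
            counter += 1
            if counter >= patience:
                return epoch, best_epoch
    return None, best_epoch


losses = [0.071, 0.060, 0.052, 0.047, 0.048, 0.049, 0.050]
print(epochs_until_stop(losses, patience=3))  # (7, 4)
```

With `patience=3`, training stops three epochs after the best one, which matches the pattern in the sample log later in this tutorial (best epoch 37, early stop at epoch 40).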

5.4 Learning Rate Scheduler

In [None]:
def adjust_learning_rate(optimizer, epoch, args, accelerator=None):
    """
    Adjust the learning rate according to different scheduling strategies.
    
    Supported strategies are selected by args.lradj.
    Sets param_group['lr'] for every group in optimizer.param_groups if the schedule defines the current epoch.
    """
    if args.lradj == 'type1':
        # Halve LR every epoch
        lr_adjust = {epoch: args.learning_rate * (0.5 ** ((epoch - 1) // 1))}
    elif args.lradj == 'type2':
        # Manually specified step decay
        lr_adjust = {
            2: 5e-5, 4: 1e-5, 6: 5e-6, 8: 1e-6,
            10: 5e-7, 15: 1e-7, 20: 5e-8
        }
    elif args.lradj == 'type3':
        # Keep constant LR for first 3 epochs, then exponential decay
        lr_adjust = {epoch: args.learning_rate if epoch < 3 
                     else args.learning_rate * (0.9 ** ((epoch - 3) // 1))}
    elif args.lradj == 'constant':
        # Fixed learning rate
        lr_adjust = {epoch: args.learning_rate}
    elif args.lradj == 'TSLR':
        # Slowly decaying LR over long horizon
        lr_adjust = {epoch: args.learning_rate * ((0.5 ** 0.1) ** (epoch // 20))}
    elif args.lradj == '3':
        # Step decay after 10 epochs
        lr_adjust = {epoch: args.learning_rate if epoch < 10 else args.learning_rate * 0.1}
    elif args.lradj == '4':
        # Step decay after 15 epochs
        lr_adjust = {epoch: args.learning_rate if epoch < 15 else args.learning_rate * 0.1}
    elif args.lradj == '5':
        # Step decay after 25 epochs
        lr_adjust = {epoch: args.learning_rate if epoch < 25 else args.learning_rate * 0.1}
    elif args.lradj == '6':
        # Step decay after 5 epochs
        lr_adjust = {epoch: args.learning_rate if epoch < 5 else args.learning_rate * 0.1}
    elif args.lradj == 'TST':
        # Linear warm-up style schedule (increasing slightly each epoch)
        lr_adjust = {epoch: args.learning_rate * (1.0 + 0.1 * epoch / args.train_epochs)}

    # Apply LR update if schedule defines current epoch
    if epoch in lr_adjust.keys():
        lr = lr_adjust[epoch]
        for param_group in optimizer.param_groups:
            param_group['lr'] = lr
        if accelerator is not None:
            accelerator.print(f'Updating learning rate to {lr}')
        else:
            print(f'Updating learning rate to {lr}')
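
As a worked example, the sketch below evaluates the `'type3'` branch on its own for the first few epochs. The helper `type3_lr` is ours; the base learning rate 0.0005 and the `'type3'` setting are taken from the Measles defaults printed in the run log later in this tutorial.

```python
def type3_lr(base_lr, epoch):
    """Learning rate the 'type3' branch assigns when called with this epoch value."""
    return base_lr if epoch < 3 else base_lr * (0.9 ** (epoch - 3))


for epoch in range(1, 6):
    print(epoch, f"{type3_lr(0.0005, epoch):.6f}")
# 1 0.000500
# 2 0.000500
# 3 0.000500
# 4 0.000450
# 5 0.000405
```

Note that `train()` calls `adjust_learning_rate` with `epoch + 1` at the end of each epoch, so the value computed for a given argument takes effect in the following epoch.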

6. Validation and Testing

In [None]:
def validate(self, vali_loader=None) -> Tuple[float, float]:
    """
    Validate the model with distributed-safe metric aggregation.

    Uses explicit accumulation of squared and absolute errors to avoid
    GPU out-of-memory during evaluation.

    Args:
        vali_loader: Optional validation DataLoader. If None, a loader is created.

    Returns:
        (MSE, MAE) as floats
    """
    if vali_loader is None:
        _, vali_loader = self._get_data(flag='val')

    # Initialize accumulators on device
    sum_sq_error = torch.tensor(0.0, device=self.device)
    sum_abs_error = torch.tensor(0.0, device=self.device)
    total_count = torch.tensor(0.0, device=self.device)

    self.model.eval()
    with torch.no_grad():
        for batch_x, batch_y, batch_x_mark, batch_y_mark in vali_loader:
            # Move batch to device
            batch_x = batch_x.float().to(self.device)
            batch_y = batch_y.float().to(self.device)
            batch_x_mark = batch_x_mark.float().to(self.device)
            batch_y_mark = batch_y_mark.float().to(self.device)

            # Decoder input: label_len history + zero padding for pred_len
            dec_inp = torch.zeros_like(batch_y[:, -self.config.pred_len:, :])
            dec_inp = torch.cat(
                [batch_y[:, :self.config.label_len, :], dec_inp], dim=1
            ).to(self.device)

            # Forward pass (supports AMP and tuple outputs)
            with self.accelerator.autocast():
                outputs = self.model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
            if isinstance(outputs, tuple):
                outputs = outputs[0]

            # Target slice: prediction horizon only
            true_slice = batch_y[:, -self.config.pred_len:, :]

            # Accumulate squared and absolute errors
            error = outputs - true_slice
            sum_sq_error += error.pow(2).sum()
            sum_abs_error += error.abs().sum()
            total_count += torch.tensor(error.numel(), device=self.device)

    # Distributed reduction: aggregate across all processes
    sum_sq_error = self.accelerator.reduce(sum_sq_error, reduction="sum")
    sum_abs_error = self.accelerator.reduce(sum_abs_error, reduction="sum")
    total_count = self.accelerator.reduce(total_count, reduction="sum")

    # Final metrics
    mse = sum_sq_error / total_count
    mae = sum_abs_error / total_count

    self.model.train()
    return mse.item(), mae.item()
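
`validate()` accumulates the sum of squared errors and the element count, and divides only once at the end. Besides keeping memory use low, this gives the exact dataset-level MSE even when batches have different sizes. The toy sketch below, with made-up error values and plain Python lists in place of tensors, contrasts it with averaging per-batch MSEs.

```python
# Two batches of errors with different sizes (the last batch is smaller)
batches = [[1.0, 1.0, 1.0, 1.0], [5.0]]

# Accumulate sums and counts, as validate() does
sum_sq = sum(e ** 2 for batch in batches for e in batch)
count = sum(len(batch) for batch in batches)
mse_global = sum_sq / count

# Naive alternative: average the per-batch MSEs
per_batch = [sum(e ** 2 for e in batch) / len(batch) for batch in batches]
mse_naive = sum(per_batch) / len(per_batch)

print(mse_global)  # 5.8
print(mse_naive)   # 13.0
```

The naive average over-weights the small final batch. The same reasoning applies across processes, which is why the sums and counts are reduced with `reduction="sum"` before the final division.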

In [None]:
def test(self, setting: str, best_model_path: Optional[str] = None) -> Tuple[float, float]:
    """
    Evaluate the trained model on test data using the same aggregation logic as in validate().

    Args:
        setting: Experiment identifier string, used for result saving and fallback checkpoint loading.
        best_model_path: Path to the model checkpoint. If None, defaults to ./checkpoints/{setting}.pth

    Returns:
        Tuple of (MSE, MAE) on the test set.
    """
    # Select checkpoint (default path if none provided)
    if best_model_path is None:
        best_model_path = os.path.join(self.config.checkpoints, f"{setting}.pth")

    self.accelerator.print(f'Loading trained model {best_model_path} for testing')
    
    # Load best model weights (unwrap to avoid DDP/AMP wrappers)
    self.model = self.accelerator.unwrap_model(self.model)
    self.model.load_state_dict(torch.load(best_model_path, map_location='cpu'))

    # Build test loader and prepare for distributed inference
    _, test_loader = self._get_data(flag='test')
    self.model, test_loader = self.accelerator.prepare(self.model, test_loader)

    # Reuse validate() to compute final metrics
    mse, mae = self.validate(test_loader)

    self.accelerator.print(f'Test MSE: {mse:.6f}, Test MAE: {mae:.6f}')
    return mse, mae

7. Prepare Datasets from Hugging Face

To access the Time-HD benchmark dataset, follow these steps:

a. Create a Hugging Face account, if you do not already have one.

b. Visit the dataset page:  
   [https://huggingface.co/datasets/Time-HD-Anonymous/High_Dimensional_Time_Series](https://huggingface.co/datasets/Time-HD-Anonymous/High_Dimensional_Time_Series)

c. Click **"Agree and access repository"**. You must be logged in to complete this step.

d. Create a new access token. The token type should be "write".

e. Authenticate on your local machine by running:

   ```bash
   huggingface-cli login
   ```

   and enter the token you generated above.

f. Then download all the datasets by running:

   ```bash
   python download_dataset.py
   ```

The supported high-dimensional time series datasets are summarized in Table 2 above. Besides these, we also support datasets such as ECL, ETTh1, ETTh2, ETTm1, ETTm2, and Weather.

8. Running the Experiment

```bash
# 🖥️ Single GPU training
accelerate launch --num_processes=1 run.py --model UCast --data "Measles" --gpu 0

# 🚀 Multi-GPU training (auto-detect all GPUs)
accelerate launch run.py --model UCast --data "Measles"

# 🎯 Specific GPU selection (e.g. 4 GPUs, id: 0,2,3,7)
accelerate launch --num_processes=4 run.py --model UCast --data "Measles" --gpu 0,2,3,7

# 📋 List available models
accelerate launch run.py --list-models

# ℹ️ Show framework information
python run.py --info
```

If the experiment starts successfully, information about it is printed out, such as:

In [None]:
================================================================================
🚀 Time-HD-Lib: A Lirbrary for High-Dimensional Time Series Forecasting
================================================================================
Loaded default parameters for Measles from configs/UCast.yaml: {'enc_in': 1161, 'train_epochs': 100, 'alpha': 0.001, 'seq_len_factor': 3, 'learning_rate': 0.0005, 'd_model': 512, 'e_layers': 2, 'lradj': 'type3', 'batch_size': 32}
args.batchsize: 32

📋 Configuration Summary:
   Task: long_term_forecast
   Model: UCast
   Dataset: Measles
   Input Sequence Length: 21
   Prediction Length: 7
   Training Mode: Yes
   GPU: Yes

🔧 Initializing experiment runner...
🎯 Starting experiment execution...

=== Accelerator Device Information ===
Number of processes: 8
Distributed type: MULTI_GPU

Device details for all processes:
  GPU #0: NVIDIA L40S - Total memory: 44.40 GB
  GPU #1: NVIDIA L40S - Total memory: 44.40 GB
  GPU #2: NVIDIA L40S - Total memory: 44.40 GB
  GPU #3: NVIDIA L40S - Total memory: 44.40 GB
  GPU #4: NVIDIA L40S - Total memory: 44.40 GB
  GPU #5: NVIDIA L40S - Total memory: 44.40 GB
  GPU #6: NVIDIA L40S - Total memory: 44.40 GB
  GPU #7: NVIDIA L40S - Total memory: 44.40 GB
=======================================

>>> Starting training for long_term_forecast_UCast_Measles_sl21_pl7 <<<
train 903
val 128
test 260

Then the model starts training. After each epoch finishes, information like the following is printed out:

In [None]:
Epoch: 1 cost time: 0.94s
Average batch training time: 0.1826s
Val cost time: 0.21s
Test cost time: 0.17s
Epoch: 1, Steps: 29 | Train Loss: 0.9740449 Vali Loss: 0.0712604 Test Loss: 0.0255510
Validation loss decreased (inf --> 0.071260).  Saving model ...
Updating learning rate to 0.0005
...
Epoch: 40 cost time: 0.34s
Average batch training time: 0.0256s
Val cost time: 0.21s
Test cost time: 0.18s
Epoch: 40, Steps: 29 | Train Loss: 0.3438028 Vali Loss: 0.0463830 Test Loss: 0.0147475
EarlyStopping counter: 3 out of 3
Early stopping
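
Two numbers in the logs above can be cross-checked with simple arithmetic. This is our reading of the log and not something the source states: `Steps: 29` is consistent with 903 training samples at batch size 32 when the last partial batch is kept, and the input length 21 is consistent with `seq_len_factor` times the prediction length.

```python
import math

train_samples, batch_size = 903, 32   # "train 903" and 'batch_size': 32 from the log
seq_len_factor, pred_len = 3, 7       # 'seq_len_factor': 3 and "Prediction Length: 7"

steps_per_epoch = math.ceil(train_samples / batch_size)
seq_len = seq_len_factor * pred_len

print(steps_per_epoch)  # 29
print(seq_len)          # 21
```

In `train()`, `train_steps = len(train_loader)` is computed before `accelerator.prepare(...)`, which would explain why the step count reflects the full dataset rather than a per-process shard.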

When training ends, either because all epochs are done or because early stopping is triggered, the model moves on to testing. The following information is printed out, giving the MSE and MAE on the test set.

In [None]:
>>> Starting testing for long_term_forecast_UCast_Measles_sl21_pl7 <<<
Loading trained model ./checkpoints/long_term_forecast_UCast_Measles_sl21_pl7.pth for testing
test 260
Test MSE: 0.014731, Test MAE: 0.052162

In [None]:
================================================================================
✅ Experiment Completed Successfully!
================================================================================

📊 Best Training Results (Epoch 37):
   Train Loss: 0.349271
   Validation Loss: 0.046337
   Validation MAE: 0.108613
   Test Loss: 0.014731
   Test MAE: 0.052162

🎯 Final Test Results:
   MSE: 0.014731
   MAE: 0.052162

🎉 All results have been saved to the experiments directory.