# EasyTPP - Getting Started Guide

This notebook presents the main features of the **EasyTPP** (Easy Temporal Point Processes) library with practical examples.

## 🎯 Notebook Objectives

- Understand the basic concepts of temporal point processes
- Learn to configure and train models
- Explore the different types of data and available models
- Visualize and analyze results

## 📚 Table of Contents

1. [Environment Setup](#1-configuration)
2. [Basic Concepts](#2-concepts)
3. [Data Loading and Preparation](#3-donnees)
4. [Model Configuration and Training](#4-entrainement)
5. [Evaluation and Metrics](#5-evaluation)
6. [Advanced Examples](#6-avances)

## 1. Environment Setup {#1-configuration}

Let's start by importing the necessary modules and setting up the environment.

In [1]:
import sys
from pathlib import Path

# Add the project root directory to PYTHONPATH
project_root = Path().absolute().parent
sys.path.insert(0, str(project_root))

# EasyTPP imports
from easy_tpp.config_factory import RunnerConfig
from easy_tpp.utils.yaml_config_utils import parse_runner_yaml_config
from easy_tpp.runner import Runner

print("✅ EasyTPP imported successfully!")
print(f"📁 Project directory: {project_root}")


## 2. Basic Concepts {#2-concepts}

### What is a Temporal Point Process?

A **Temporal Point Process** (TPP) is a probabilistic model for sequences of events that occur at irregular times. Each event is characterized by:

- **Occurrence time**: When the event happens
- **Event type**: What category of event (optional)
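
As a minimal sketch of this definition, one sequence can be stored as a list of (time, type) pairs, from which the inter-event times follow directly. The values and the field names (`time`, `delta`, `type`) are illustrative assumptions, not necessarily the names EasyTPP expects.

```python
# Illustrative event sequence: (occurrence time, event type).
# The values and field names are made up for this example.
events = [(0.5, 0), (1.2, 1), (1.9, 0), (3.4, 1)]

def to_records(events):
    """Convert (time, type) pairs into records with inter-event times."""
    records = []
    previous_time = 0.0  # assume the observation window starts at t = 0
    for time, event_type in events:
        records.append({
            "time": time,                    # when the event happens
            "delta": time - previous_time,   # time since the previous event
            "type": event_type,              # category of the event
        })
        previous_time = time
    return records

records = to_records(events)
for record in records:
    print(record)
```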

### Application examples:

- 🏥 **Medical**: Patient arrivals at a hospital
- 💰 **Finance**: Stock market transactions
- 🌍 **Geophysics**: Earthquakes
- 📱 **Social Networks**: User posts

### Models available in EasyTPP:

- **NHP** (Neural Hawkes Process): Hawkes processes with neural networks
- **THP** (Transformer Hawkes Process): Based on Transformer architecture
- **RMTPP** (Recurrent Marked Temporal Point Process): Based on RNNs
- **AttNHP** (Attentive Neural Hawkes Process): With attention mechanism

## 3. Data Loading and Preparation {#3-donnees}

EasyTPP supports multiple data formats. Let's see how to load and prepare data.

In [None]:
from easy_tpp.config_factory import DataConfig
from easy_tpp.config_factory.data_config import DataLoadingSpecsConfig, TokenizerConfig
from easy_tpp.data.preprocess import TPPDataModule

In [None]:
# Data configuration with proper nested structure
data_config = DataConfig(
    test_dir="NzoCs/test_dataset",              # Location of the test data
    valid_dir="NzoCs/test_dataset",             # Location of the validation data
    train_dir="NzoCs/test_dataset",             # Location of the training data
    dataset_id="test",                          # Dataset identifier
    data_format="json",                         # Data format (pickle, json, csv)
    data_loading_specs=DataLoadingSpecsConfig(
        batch_size=32,                          # Batch size
        num_workers=1,                          # Number of workers for data loading
        shuffle=True                            # Shuffle data
    ),
    data_specs=TokenizerConfig(
        num_event_types=2,                      # Number of event types
        padding_side='left',                    # Padding side
        truncation_side='left'                  # Truncation side
    )
)

print("📊 Data configuration created:")
print(f"   Dataset: {data_config.dataset_id}")
print(f"   Format: {data_config.data_format}")
print(f"   Event types: {data_config.data_specs.num_event_types}")
print(f"   Batch size: {data_config.data_loading_specs.batch_size}")
print(f"   Number of workers: {data_config.data_loading_specs.num_workers}")
print(f"   Padding side: {data_config.data_specs.padding_side}")

📊 Data configuration created:
   Dataset: test
   Format: pickle
   Event types: 2
   Batch size: 32
   Number of workers: 1
   Padding side: left
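
To make `padding_side` and `truncation_side` concrete, here is a small sketch in plain Python. It assumes the usual tokenizer convention: left truncation drops the oldest events, and left padding places filler values before the real events. The helper `pad_or_truncate` is hypothetical and is not part of EasyTPP.

```python
def pad_or_truncate(seq, max_len, pad_value=0,
                    padding_side="left", truncation_side="left"):
    """Return a copy of seq with exactly max_len items."""
    if len(seq) > max_len:
        # 'left' truncation drops the oldest events and keeps the most recent ones.
        seq = seq[-max_len:] if truncation_side == "left" else seq[:max_len]
    padding = [pad_value] * (max_len - len(seq))
    # 'left' padding puts the filler before the real events.
    return padding + seq if padding_side == "left" else seq + padding

# Three sequences of different lengths, brought to a common length of 4.
batch = [[1, 2, 3], [4, 5, 6, 7, 8, 9], [10]]
padded = [pad_or_truncate(seq, max_len=4) for seq in batch]
print(padded)
```

With left padding, the most recent event of every sequence ends up in the last position, which is convenient when a model predicts the next event from the end of the sequence.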


In [None]:
# Alternative: Create DataConfig using from_dict (simpler approach)
data_config_dict = {
    "source_dir": "NzoCs/test_dataset",
    "dataset_id": "test",
    "data_format": "json",
    "data_loading_specs": {
        "batch_size": 32,
        "num_workers": 1,
        "shuffle": True
    },
    "data_specs": {
        "num_event_types": 2,
        "padding_side": "left",
        "truncation_side": "left"
    }
}

# Create DataConfig from dictionary
data_config_alt = DataConfig.from_dict(data_config_dict)

print("📊 Alternative DataConfig created from dictionary:")
print(f"   Dataset: {data_config_alt.dataset_id}")
print(f"   Format: {data_config_alt.data_format}")
print(f"   Event types: {data_config_alt.data_specs.num_event_types}")
print(f"   Batch size: {data_config_alt.data_loading_specs.batch_size}")

📊 Alternative DataConfig created from dictionary:
   Dataset: test
   Format: pickle
   Event types: 2
   Batch size: 32


In [None]:
# Create data module
datamodule = TPPDataModule(data_config_alt)
datamodule.setup(stage='fit')  # Setup for training and validation

# Get data loaders
train_loader = datamodule.train_dataloader()
val_loader = datamodule.val_dataloader()

print("✅ Data loaders created successfully!")
print(f"   📈 Train loader: {len(train_loader)} batches")
print(f"   📊 Validation loader: {len(val_loader)} batches")

2025-07-12 17:55:12,055 - data_loader.py[pid:17904;line:140:setup] - INFO: Setting up data for stage: fit
2025-07-12 17:55:14,772 - data_loader.py[pid:17904;line:149:setup] - INFO: Train dataset created with 6 sequences
2025-07-12 17:55:17,282 - data_loader.py[pid:17904;line:158:setup] - INFO: Validation dataset created with 2 sequences
✅ Data loaders created successfully!
   📈 Train loader: 1 batches
   📊 Validation loader: 1 batches
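
The batch counts above follow from the dataset sizes in the log (6 training and 2 validation sequences) and the batch size of 32. A short sketch of the arithmetic, assuming the loaders keep the last partial batch, which is the PyTorch `DataLoader` default (`drop_last=False`):

```python
import math

def num_batches(num_sequences, batch_size, drop_last=False):
    """Number of batches a loader yields per epoch."""
    if drop_last:
        # Incomplete final batch is discarded.
        return num_sequences // batch_size
    # Incomplete final batch is kept.
    return math.ceil(num_sequences / batch_size)

train_batches = num_batches(6, 32)   # training split: 6 sequences
val_batches = num_batches(2, 32)     # validation split: 2 sequences
print(train_batches, val_batches)
```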


### Data Inspection

Let's use the Visualizer to analyze the data distribution.

In [None]:
from easy_tpp.data.preprocess.visualizer import Visualizer

# Create the visualizer
visualizer = Visualizer(
    data_module=datamodule,
    split="train",
    save_dir="./analysis_plots"
)

# Generate visualizations
# show_all_distributions() rejected the save_graph keyword the last time this
# notebook was run (TypeError), so it is called here without keyword arguments.
visualizer.show_all_distributions()
visualizer.delta_times_distribution(save_graph=True)
visualizer.event_type_distribution(save_graph=True)

print("📈 Analysis plots generated!")
print("   Check the './analysis_plots' folder for saved graphs")
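
Independently of the Visualizer, the two quantities it plots (inter-event times and event-type frequencies) can be computed by hand. The sketch below uses made-up sequences of (time, type) pairs rather than the test dataset, and the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical sequences of (time, type) pairs; not taken from the test dataset.
sequences = [
    [(0.4, 0), (1.0, 1), (2.5, 0)],
    [(0.2, 1), (0.9, 1), (1.1, 0), (3.0, 1)],
]

def delta_times(sequence):
    """Time elapsed between consecutive events of one sequence."""
    times = [t for t, _ in sequence]
    return [later - earlier for earlier, later in zip(times, times[1:])]

def histogram(values, bin_width):
    """Count values per bin; keys are the left edges of the bins."""
    counts = Counter(int(v // bin_width) for v in values)
    return {b * bin_width: counts[b] for b in sorted(counts)}

all_deltas = [d for seq in sequences for d in delta_times(seq)]
type_counts = Counter(event_type for seq in sequences for _, event_type in seq)

print("Inter-event time histogram:", histogram(all_deltas, bin_width=0.5))
print("Event type counts:", dict(type_counts))
```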

## 4. Model Configuration and Training {#4-entrainement}

Now, let's configure and train a Neural Hawkes Process (NHP) model.

In [None]:
# Load configuration from YAML file
config_file = project_root / "examples" / "runner_config.yaml"
runner_dict = parse_runner_yaml_config(str(config_file), "NHP", "test")

# Create configuration
config = RunnerConfig.from_dict(runner_dict)

print("⚙️ Model configuration:")
print(f"   🧠 Model: NHP")
print(f"   📊 Dataset: test")

⚙️ Model configuration:
   🧠 Model: NHP
   📊 Dataset: test


In [None]:
# Create runner and start training
runner = Runner(config=config, output_dir="./training_results")

print("🚀 Starting training...")
print("   This may take a few minutes depending on your configuration.")

# Train the model
runner.run(phase="train")

print("✅ Training completed!")

2025-07-12 17:57:03,616 - runner.py[pid:17904;line:39:__init__] - CRITICAL: Runner initialized for model: NHP on dataset: test
🚀 Starting training...
   This may take a few minutes depending on your configuration.
2025-07-12 17:57:03,621 - runner.py[pid:17904;line:129:run] - INFO: Runner executing phases: ['train']
2025-07-12 17:57:03,623 - runner.py[pid:17904;line:72:train] - INFO: === TRAINING PHASE ===
2025-07-12 17:57:03,640 - lightning_runner.py[pid:17904;line:117:__init__] - INFO: No valid checkpoint found. Starting from scratch.
2025-07-12 17:57:03,643 - lightning_runner.py[pid:17904;line:222:train] - INFO: --- Starting Training for Model : NHP on dataset : test ---


GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


2025-07-12 17:57:03,873 - data_loader.py[pid:17904;line:140:setup] - INFO: Setting up data for stage: fit
2025-07-12 17:57:07,141 - data_loader.py[pid:17904;line:149:setup] - INFO: Train dataset created with 6 sequences
2025-07-12 17:57:10,435 - data_loader.py[pid:17904;line:158:setup] - INFO: Validation dataset created with 2 sequences



  | Name            | Type             | Params | Mode 
-------------------------------------------------------------
0 | layer_type_emb  | Embedding        | 192    | train
1 | rnn_cell        | ContTimeLSTMCell | 57.8 K | train
2 | layer_intensity | Sequential       | 132    | train
-------------------------------------------------------------
58.1 K    Trainable params
0         Non-trainable params
58.1 K    Total params
0.232     Total estimated model params size (MB)
7         Modules in train mode
0         Modules in eval mode


Sanity Checking: |          | 0/? [00:00<?, ?it/s]

c:\Users\enzo.cAo\Documents\Projects\finance\projet_recherche\New_LTPP\.venv\Lib\site-packages\pytorch_lightning\trainer\connectors\data_connector.py:425: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=15` in the `DataLoader` to improve performance.


                                                                           

c:\Users\enzo.cAo\Documents\Projects\finance\projet_recherche\New_LTPP\.venv\Lib\site-packages\pytorch_lightning\trainer\connectors\data_connector.py:425: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=15` in the `DataLoader` to improve performance.
c:\Users\enzo.cAo\Documents\Projects\finance\projet_recherche\New_LTPP\.venv\Lib\site-packages\pytorch_lightning\loops\fit_loop.py:310: The number of training batches (1) is smaller than the logging interval Trainer(log_every_n_steps=5). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.


Epoch 9: 100%|██████████| 1/1 [00:00<00:00,  2.99it/s, v_num=0, train_loss=2.300]

Metric val_loss improved. New best score: 2.681


Epoch 19: 100%|██████████| 1/1 [00:00<00:00,  2.43it/s, v_num=0, train_loss=1.870, val_loss=2.680]

Metric val_loss improved by 0.354 >= min_delta = 0.0. New best score: 2.327


Epoch 29: 100%|██████████| 1/1 [00:00<00:00,  2.44it/s, v_num=0, train_loss=1.640, val_loss=2.330]

Metric val_loss improved by 0.203 >= min_delta = 0.0. New best score: 2.124


Epoch 39: 100%|██████████| 1/1 [00:00<00:00,  1.11it/s, v_num=0, train_loss=1.510, val_loss=2.120]

Metric val_loss improved by 0.139 >= min_delta = 0.0. New best score: 1.985


Epoch 49: 100%|██████████| 1/1 [00:00<00:00,  2.82it/s, v_num=0, train_loss=1.430, val_loss=1.980]

Metric val_loss improved by 0.094 >= min_delta = 0.0. New best score: 1.891


Epoch 59: 100%|██████████| 1/1 [00:00<00:00,  2.89it/s, v_num=0, train_loss=1.380, val_loss=1.890]

Metric val_loss improved by 0.062 >= min_delta = 0.0. New best score: 1.829


Epoch 69: 100%|██████████| 1/1 [00:00<00:00,  2.41it/s, v_num=0, train_loss=1.340, val_loss=1.830]

Metric val_loss improved by 0.038 >= min_delta = 0.0. New best score: 1.791


Epoch 79: 100%|██████████| 1/1 [00:00<00:00,  2.80it/s, v_num=0, train_loss=1.310, val_loss=1.790]

Metric val_loss improved by 0.025 >= min_delta = 0.0. New best score: 1.766


Epoch 89: 100%|██████████| 1/1 [00:00<00:00,  2.62it/s, v_num=0, train_loss=1.280, val_loss=1.770]

Metric val_loss improved by 0.018 >= min_delta = 0.0. New best score: 1.748


Epoch 99: 100%|██████████| 1/1 [00:00<00:00,  2.61it/s, v_num=0, train_loss=1.260, val_loss=1.750]

Metric val_loss improved by 0.013 >= min_delta = 0.0. New best score: 1.736


Epoch 149: 100%|██████████| 1/1 [00:00<00:00,  2.46it/s, v_num=0, train_loss=1.140, val_loss=1.840]

Monitored metric val_loss did not improve in the last 5 records. Best score: 1.736. Signaling Trainer to stop.


Epoch 149: 100%|██████████| 1/1 [00:00<00:00,  1.13it/s, v_num=0, train_loss=1.140, val_loss=1.890]



Detected KeyboardInterrupt, attempting graceful shutdown ...


RuntimeError: Please call `iter(combined_loader)` first.
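
The log shows training being stopped by an early-stopping rule: `val_loss` did not improve over the last 5 records, and the best score stayed at 1.736. The sketch below reproduces that rule in plain Python, assuming a patience of 5 and `min_delta = 0.0` as the log messages suggest. The first ten values are the best scores reported in the log; the next four are made up for illustration, since the log does not show them, and the last one is the final logged value.

```python
def early_stopping_index(val_losses, patience=5, min_delta=0.0):
    """Return (index of the record that triggers the stop, best score).

    The index is None if the stop is never triggered.
    """
    best = float("inf")
    records_without_improvement = 0
    for index, loss in enumerate(val_losses):
        if best - loss > min_delta:
            # Strict improvement: remember it and reset the counter.
            best = loss
            records_without_improvement = 0
        else:
            records_without_improvement += 1
            if records_without_improvement >= patience:
                return index, best
    return None, best

# Ten improving records from the log, then five non-improving ones
# (the values 1.75 to 1.82 are hypothetical).
val_losses = [2.681, 2.327, 2.124, 1.985, 1.891, 1.829, 1.791, 1.766,
              1.748, 1.736, 1.75, 1.77, 1.80, 1.82, 1.84]
stop_index, best_score = early_stopping_index(val_losses)
print(stop_index, best_score)
```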

## 5. Evaluation and Metrics {#5-evaluation}

Let's now evaluate the performance of the trained model.

In [None]:
# Evaluation on test dataset
print("🧪 Evaluating model on test dataset...")

test_results = runner.run(phase="test")

print("📊 Evaluation results:")
if hasattr(runner, 'test_metrics'):
    for metric_name, value in runner.test_metrics.items():
        print(f"   {metric_name}: {value:.4f}")
else:
    print("✅ Evaluation completed - check logs for detailed metrics")

🧪 Evaluating model on test dataset...
2025-07-12 18:00:02,705 - runner.py[pid:17904;line:129:run] - INFO: Runner executing phases: ['test']
2025-07-12 18:00:02,707 - runner.py[pid:17904;line:85:test] - CRITICAL: === TESTING PHASE ===
2025-07-12 18:00:02,712 - lightning_runner.py[pid:17904;line:104:__init__] - INFO: Checkpoint found: loading from ./training_results\best.ckpt
2025-07-12 18:00:02,714 - lightning_runner.py[pid:17904;line:115:__init__] - INFO: Loading model from checkpoint: ./training_results\best.ckpt.
2025-07-12 18:00:02,716 - lightning_runner.py[pid:17904;line:246:test] - INFO: --- Starting Testing for Model : NHP on dataset : test ---


GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs


2025-07-12 18:00:02,785 - data_loader.py[pid:17904;line:140:setup] - INFO: Setting up data for stage: test


Restoring states from the checkpoint path at ./training_results\best.ckpt
Loaded model weights from the checkpoint at ./training_results\best.ckpt
c:\Users\enzo.cAo\Documents\Projects\finance\projet_recherche\New_LTPP\.venv\Lib\site-packages\pytorch_lightning\trainer\connectors\data_connector.py:425: The 'test_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=15` in the `DataLoader` to improve performance.


Testing DataLoader 0: 100%|██████████| 1/1 [00:00<00:00,  5.70it/s]


2025-07-12 18:00:48,859 - lightning_runner.py[pid:17904;line:270:test] - INFO: Test results saved to ./training_results\test_results.json
📊 Evaluation results:
   ✅ Evaluation completed - check logs for detailed metrics
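
Because the runner exposed no `test_metrics` attribute here, the cell fell back to the generic message. The log does state that the results were written to `test_results.json`, so one option is to read that file directly. This is a minimal sketch that assumes the file holds a flat JSON object of metric names and values; its actual layout is not shown in this notebook, and `load_metrics` is a hypothetical helper.

```python
import json
from pathlib import Path

def load_metrics(path):
    """Return the metrics stored in a JSON file, or {} if the file is missing."""
    path = Path(path)
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)

# Path taken from the log line above; adjust it to your own output_dir.
metrics = load_metrics("./training_results/test_results.json")
for name, value in metrics.items():
    print(f"   {name}: {value}")
```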


### Comparison with Baselines

Let's compare our model with simple baselines.

In [None]:
from easy_tpp.evaluation.benchmarks.mean_bench import MeanInterTimeBenchmark
from easy_tpp.evaluation.benchmarks.sample_distrib_mark_bench import MarkDistributionBenchmark

# Baseline benchmark: mean prediction
mean_benchmark = MeanInterTimeBenchmark(
    data_config=data_config_alt,
    experiment_id="mean_baseline",
    save_dir="./benchmark_results"
)

print("📊 Baseline benchmark (mean):")
mean_results = mean_benchmark.evaluate()
print(f"   Results: {mean_results}")

# Type distribution benchmark
mark_benchmark = MarkDistributionBenchmark(
    data_config=data_config_alt,
    experiment_id="mark_baseline",
    save_dir="./benchmark_results"
)

print("\n📊 Type distribution benchmark:")
mark_results = mark_benchmark.evaluate()
print(f"   Results: {mark_results}")

2025-07-12 18:02:41,162 - data_loader.py[pid:17904;line:140:setup] - INFO: Setting up data for stage: test
📊 Baseline benchmark (mean):
2025-07-12 18:02:44,880 - base_bench.py[pid:17904;line:147:evaluate] - INFO: Starting mean_inter_time benchmark evaluation...
2025-07-12 18:02:44,886 - mean_bench.py[pid:17904;line:51:_prepare_benchmark] - INFO: Computing mean inter-time from training data...
2025-07-12 18:03:16,165 - mean_bench.py[pid:17904;line:74:_prepare_benchmark] - INFO: Computed mean inter-time: 1.506245
2025-07-12 18:03:47,672 - base_bench.py[pid:17904;line:366:_save_results] - INFO: Results saved to: ./benchmark_results\mean_baseline\mean_inter_time_results.json
2025-07-12 18:03:47,678 - base_bench.py[pid:17904;line:375:_log_summary] - INFO: mean_inter_time benchmark completed successfully!
2025-07-12 18:03:47,683 - base_bench.py[pid:17904;line:381:_log_summary] - INFO: Time RMSE: 2.772637

## 6. Advanced Examples {#6-avances}

### Synthetic Data Generation

EasyTPP allows generating synthetic data to test models.

In [1]:
from easy_tpp.data.generation import HawkesSimulator

# Hawkes process configuration
params = {
    "mu": [0.1, 0.2],                    # Base intensities
    "alpha": [[0.3, 0.1], [0.2, 0.4]],  # Excitation matrix
    "beta": [[2, 1], [1.5, 3]]          # Decay matrix
}

# Create simulator
simulator = HawkesSimulator(
    mu=params["mu"],
    alpha=params["alpha"],
    beta=params["beta"],
    dim_process=2,
    start_time=0,
    end_time=100
)

print("🎲 Generating synthetic data...")

# Generate and save
simulator.generate_and_save(
    output_dir='./synthetic_data',
    num_simulations=10,
    splits={'train': 0.6, 'test': 0.2, 'dev': 0.2}
)

print("✅ Synthetic data generated in './synthetic_data'")

🎲 Generating synthetic data...
Generating 10 2D simulations...


Simulating 10 processes: 100%|██████████| 10/10 [00:00<00:00, 33.72it/s]

Splitting the data into train/test/dev sets...
Saving the data...
All data has been saved to ./synthetic_data
✅ Synthetic data generated in './synthetic_data'
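
To show what `mu`, `alpha` and `beta` control, here is a self-contained sketch of a two-dimensional Hawkes process simulated with Ogata's thinning algorithm. It assumes the exponential kernel `alpha[i][j] * exp(-beta[i][j] * dt)`; the convention actually used by `HawkesSimulator` is not shown in this notebook and may differ (for example by an extra factor `beta`). The functions `intensity` and `simulate` are hypothetical helpers, not EasyTPP APIs.

```python
import math
import random

# Same parameter values as in the simulator cell above.
mu = [0.1, 0.2]
alpha = [[0.3, 0.1], [0.2, 0.4]]
beta = [[2.0, 1.0], [1.5, 3.0]]

def intensity(t, history, dim):
    """Right-continuous intensity of dimension `dim` at time t.

    Assumes the kernel alpha[dim][j] * exp(-beta[dim][j] * (t - t_k)) for
    each past event (t_k, j); an event exactly at t is included.
    """
    value = mu[dim]
    for event_time, event_type in history:
        if event_time <= t:
            decay = math.exp(-beta[dim][event_type] * (t - event_time))
            value += alpha[dim][event_type] * decay
    return value

def simulate(end_time, seed=0):
    """Simulate events on [0, end_time) with Ogata's thinning algorithm."""
    rng = random.Random(seed)
    dims = range(len(mu))
    history, t = [], 0.0
    while True:
        # Between events every intensity only decays, so the total intensity
        # at the current time bounds it until the next accepted event.
        upper = sum(intensity(t, history, d) for d in dims)
        t += rng.expovariate(upper)
        if t >= end_time:
            return history
        rates = [intensity(t, history, d) for d in dims]
        if rng.random() * upper <= sum(rates):
            # Accept the candidate and draw its type in proportion to the rates.
            event_type = rng.choices(list(dims), weights=rates)[0]
            history.append((t, event_type))

events = simulate(end_time=100.0)
print(f"{len(events)} events simulated")
```

Under this kernel, each event of type `j` raises the intensity of type `i` by `alpha[i][j]`, and the effect fades at rate `beta[i][j]`.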





### Multiple Model Comparison

Let's compare the performance of different models on the same dataset.

In [2]:
# Repeat the setup from Section 1 so this cell also runs in a fresh kernel
import sys
from pathlib import Path

project_root = Path().absolute().parent
sys.path.insert(0, str(project_root))

from easy_tpp.config_factory import RunnerConfig
from easy_tpp.utils.yaml_config_utils import parse_runner_yaml_config
from easy_tpp.runner import Runner

# List of models to compare
models_to_compare = ['NHP', 'THP', 'RMTPP']
results_comparison = {}

for model_name in models_to_compare:
    print(f"\n🧠 Training model {model_name}...")

    try:
        # Configuration for this model
        config_file = project_root / "examples" / "runner_config.yaml"
        model_config_dict = parse_runner_yaml_config(str(config_file), model_name, "test")

        config = RunnerConfig.from_dict(model_config_dict)
        runner = Runner(config=config, output_dir=f"./comparison_results/{model_name}")

        # Train, then evaluate on the test split (settings come from the YAML file)
        runner.run(phase="train")
        test_results = runner.run(phase="test")

        results_comparison[model_name] = "✅ Success"
        print(f"   ✅ {model_name} trained successfully")

    except Exception as e:
        # Keep going with the next model and record a short error message
        results_comparison[model_name] = f"❌ Error: {str(e)[:50]}..."
        print(f"   ❌ Error with {model_name}: {str(e)[:50]}...")

print("\n📊 Comparison summary:")
for model, result in results_comparison.items():
    print(f"   {model}: {result}")


## 🎉 Conclusion

This notebook has covered the main features of EasyTPP:

✅ **Environment setup** and imports

✅ **Understanding basic concepts** of TPPs

✅ **Data loading and preparation**

✅ **Model configuration and training**

✅ **Evaluation and comparison** with baselines

✅ **Synthetic data generation**

✅ **Multiple model comparison**

### 🚀 Next Steps

- Explore other available models (AttNHP, Transformer-based)
- Test with your own data
- Adjust hyperparameters to optimize performance
- Use advanced analysis tools to understand model behavior

### 📚 Useful Resources

- [EasyTPP Documentation](https://github.com/your-repo/EasyTPP)
- [Additional Examples](../examples/)
- [Advanced Configuration](../configs/)

## 6. Prediction Phase and Distribution Analysis

**Why the prediction phase is crucial:**

Temporal Point Process (TPP) models are not only useful for computing performance metrics: their main value lies in their ability to **predict and simulate** new events. These predictions enable:

1. **Distribution comparisons** - Analyze whether the model captures temporal patterns well
2. **Realistic benchmarks** - Compare model simulations to real data  
3. **Qualitative validation** - Visualize differences between predictions and reality
4. **Practical applications** - Generate future scenarios for decision-making
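
One simple way to quantify a distribution comparison is the two-sample Kolmogorov-Smirnov statistic, the largest gap between the empirical CDFs of real and simulated inter-event times. The sketch below is a generic illustration with made-up samples; it is not necessarily the comparison that EasyTPP's `predict` phase performs.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    max_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x (this also handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        max_gap = max(max_gap, abs(i / len(a) - j / len(b)))
    return max_gap

# Made-up inter-event times: observed data vs. model simulations.
real_deltas = [0.5, 1.0, 1.5, 2.0]
simulated_deltas = [0.6, 1.1, 1.4, 2.2]
print(f"KS statistic: {ks_statistic(real_deltas, simulated_deltas):.3f}")
```

A value near 0 means the two samples have similar distributions, while a value near 1 means they barely overlap.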

### 6.1 Complete Pipeline with Predictions

In [None]:
# Complete example: train → test → predict
# Setup repeated so this cell also runs in a fresh kernel
import sys
from pathlib import Path

sys.path.insert(0, str(Path().absolute().parent))

from easy_tpp.config_factory import RunnerConfig
from easy_tpp.utils.yaml_config_utils import parse_runner_yaml_config
from easy_tpp.runner import Runner

print("🔄 Complete pipeline with predictions...")

# Configuration (same positional arguments as in Section 4: path, model, dataset)
config_dict = parse_runner_yaml_config("../configs/runner_config.yaml", "NHP", "test")
config = RunnerConfig.from_dict(config_dict)

# Runner
runner = Runner(config=config, output_dir="./prediction_analysis")

# Phase 1: Training
print("📚 1. Training the model...")
runner.run(phase="train")

# Phase 2: Test/Evaluation
print("🧪 2. Performance evaluation...")
runner.run(phase="test")

# Phase 3: Predictions and distribution comparisons
print("🔮 3. Generating predictions and distribution comparisons...")
runner.run(phase="predict")

print("✅ Complete pipeline finished!")
print("📊 Results available in:")
print("   - Performance metrics")
print("   - Model simulations")
print("   - Distribution comparisons")
print("   - Analysis graphs")

### 6.2 Simplified Alternative: Single Command

If you want the complete pipeline all at once:

In [None]:
# Ultra-simple version: everything in one command
runner = Runner(config=config, output_dir="./complete_pipeline")

# Automatically executes: train → test → predict
runner.run(phase="all")

print("🎉 Complete pipeline executed with phase='all'!")
print("💡 This command is equivalent to the 3 separate phases above")

### 6.3 Why Predictions Are Essential

**🎯 Main objective:** Verify that the model has learned the correct temporal distributions.

**📊 What the `predict` phase generates:**
- **Event simulations** based on the trained model
- **Visual comparisons** between real and simulated data
- **Statistical analyses** of temporal distributions
- **Prediction quality metrics**

**🔍 Practical applications:**
- **Finance:** Predict trading volume peaks
- **Healthcare:** Anticipate epidemics or relapses
- **Networks:** Forecast traffic overloads
- **Social:** Model information propagation

**⚠️ Crucial point:** Without the prediction phase, you only have numerical metrics. With predictions, you can **see** if your model truly understands the temporal dynamics of your data.