# EasyTPP - Getting Started Guide

This notebook presents the main features of the **EasyTPP** (Easy Temporal Point Processes) library with practical examples.

## 🎯 Notebook Objectives

- Understand the basic concepts of temporal point processes
- Learn to configure and train models
- Explore the different types of data and available models
- Visualize and analyze results

## 📚 Table of Contents

1. [Environment Setup](#1-configuration)
2. [Basic Concepts](#2-concepts)
3. [Data Loading and Preparation](#3-donnees)
4. [Model Configuration and Training](#4-entrainement)
5. [Evaluation and Metrics](#5-evaluation)
6. [Advanced Examples](#6-avances)

## 1. Environment Setup {#1-configuration}

Let's start by importing the necessary modules and setting up the environment.

In [1]:
import sys
from pathlib import Path

# Add the project root directory to PYTHONPATH
project_root = Path().absolute().parent
sys.path.insert(0, str(project_root))

# EasyTPP imports
from easy_tpp.config_factory import RunnerConfig
from easy_tpp.utils.yaml_config_utils import parse_runner_yaml_config
from easy_tpp.runner import Runner

print("✅ EasyTPP imported successfully!")
print(f"📁 Project directory: {project_root}")

✅ EasyTPP imported successfully!
📁 Project directory: c:\Users\enzo.cAo\Documents\Projects\finance\projet_recherche\New_LTPP


## 2. Basic Concepts {#2-concepts}

### What is a Temporal Point Process?

A **Temporal Point Process** (TPP) is a probabilistic model of a sequence of events that occur at irregular times. Each event is characterized by:

- **Occurrence time**: When the event happens
- **Event type**: What category of event (optional)
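
As a minimal sketch of the two attributes above, an event sequence can be represented as a list of (time, type) records. The `Event` class and `inter_event_times` helper are hypothetical names used for illustration only; they are not part of the EasyTPP API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: float    # occurrence time, in arbitrary units
    type_id: int   # event type (mark); use 0 when events are unmarked


def inter_event_times(events):
    """Return the gaps between consecutive events, in time order."""
    ordered = sorted(events, key=lambda e: e.time)
    return [b.time - a.time for a, b in zip(ordered, ordered[1:])]


# A toy sequence with two event types (hypothetical data)
sequence = [Event(0.5, 0), Event(1.25, 1), Event(3.0, 0), Event(3.5, 1)]
gaps = inter_event_times(sequence)
print(gaps)
```

TPP models are usually trained on these inter-event gaps together with the event types, rather than on absolute timestamps.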

### Application examples:

- 🏥 **Medical**: Patient arrivals at a hospital
- 💰 **Finance**: Stock market transactions
- 🌍 **Geophysics**: Earthquakes
- 📱 **Social Networks**: User posts

### Models available in EasyTPP:

- **NHP** (Neural Hawkes Process): A neural generalization of the Hawkes process whose intensity is driven by a continuous-time LSTM
- **THP** (Transformer Hawkes Process): Based on Transformer architecture
- **RMTPP** (Recurrent Marked Temporal Point Process): Based on RNNs
- **AttNHP** (Attentive Neural Hawkes Process): With attention mechanism

## 3. Data Loading and Preparation {#3-donnees}

EasyTPP supports multiple data formats. Let's see how to load and prepare data.

In [4]:
from easy_tpp.config_factory import DataConfig
from easy_tpp.config_factory.data_config import DataLoadingSpecsConfig, TokenizerConfig
from easy_tpp.data.preprocess import TPPDataModule

# Data configuration with proper nested structure
data_config = DataConfig(
    source_dir="NzoCs/test_dataset",            # Source directory for data
    dataset_id="test",                          # Dataset to use
    data_format="pickle",                       # Data format (pickle, json, csv)
    data_loading_specs=DataLoadingSpecsConfig(
        batch_size=32,                          # Batch size
        num_workers=1,                          # Number of workers for data loading
        shuffle=True                            # Shuffle data
    ),
    data_specs=TokenizerConfig(
        num_event_types=2,                      # Number of event types
        padding_side='left',                    # Padding side
        truncation_side='left'                  # Truncation side
    )
)

print("📊 Data configuration created:")
print(f"   Dataset: {data_config.dataset_id}")
print(f"   Format: {data_config.data_format}")
print(f"   Event types: {data_config.data_specs.num_event_types}")
print(f"   Batch size: {data_config.data_loading_specs.batch_size}")
print(f"   Number of workers: {data_config.data_loading_specs.num_workers}")
print(f"   Padding side: {data_config.data_specs.padding_side}")

📊 Data configuration created:
   Dataset: test
   Format: pickle
   Event types: 2
   Batch size: 32
   Number of workers: 1
   Padding side: left
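
To make the `padding_side` and `truncation_side` options more concrete, here is a small sketch of what left padding and left truncation typically mean when sequences are batched to a common length. It assumes the usual tokenizer convention (left truncation drops the oldest items); the `pad_or_truncate` helper and the pad value `0` are hypothetical, and EasyTPP's actual behavior may differ.

```python
def pad_or_truncate(seq, max_len, pad_value=0,
                    padding_side="left", truncation_side="left"):
    """Bring a sequence to exactly max_len items (illustrative only)."""
    seq = list(seq)
    if len(seq) > max_len:
        if truncation_side == "left":
            # Drop the oldest items, keep the most recent ones
            seq = seq[len(seq) - max_len:]
        else:
            # Drop the most recent items, keep the oldest ones
            seq = seq[:max_len]
    padding = [pad_value] * (max_len - len(seq))
    return padding + seq if padding_side == "left" else seq + padding


short = pad_or_truncate([1, 2, 3], max_len=5)
long = pad_or_truncate([1, 2, 3, 4, 5, 6], max_len=4)
print(short)  # padded on the left
print(long)   # truncated on the left
```

With left padding, the most recent event of every sequence ends up in the last position of the batch, which can be convenient when predicting the next event.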


In [None]:
# Alternative: Create DataConfig using from_dict (simpler approach)
data_config_dict = {
    "source_dir": "NzoCs/test_dataset",
    "dataset_id": "test",
    "data_format": "pickle",
    "data_loading_specs": {
        "batch_size": 32,
        "num_workers": 1,
        "shuffle": True
    },
    "data_specs": {
        "num_event_types": 2,
        "padding_side": "left",
        "truncation_side": "left"
    }
}

# Create DataConfig from dictionary
data_config_alt = DataConfig.from_dict(data_config_dict)

print("📊 Alternative DataConfig created from dictionary:")
print(f"   Dataset: {data_config_alt.dataset_id}")
print(f"   Format: {data_config_alt.data_format}")
print(f"   Event types: {data_config_alt.data_specs.num_event_types}")
print(f"   Batch size: {data_config_alt.data_loading_specs.batch_size}")

# Spot check: both approaches should yield the same dataset_id (this compares a single field, not the full config)
print(f"\n✅ Both configs are equivalent: {data_config.dataset_id == data_config_alt.dataset_id}")

📊 Alternative DataConfig created from dictionary:
   Dataset: test
   Format: pickle
   Event types: 2
   Batch size: 32

✅ Both configs are equivalent: True


In [6]:
# Create data module
datamodule = TPPDataModule(data_config)
datamodule.setup()

# Get data loaders
train_loader = datamodule.train_dataloader()
val_loader = datamodule.val_dataloader()
test_loader = datamodule.test_dataloader()

print("✅ Data loaders created successfully!")
print(f"   📈 Train loader: {len(train_loader)} batches")
print(f"   📊 Validation loader: {len(val_loader)} batches")
print(f"   🧪 Test loader: {len(test_loader)} batches")


AttributeError: 'NoneType' object has no attribute 'endswith'

### Data Inspection

Let's use the Visualizer to analyze the data distribution.

In [None]:
from easy_tpp.data.preprocess.visualizer import Visualizer

# Create the visualizer
visualizer = Visualizer(
    data_setup=datamodule,
    split="train",
    save_dir="./analysis_plots"
)

# Generate visualizations
visualizer.show_all_distributions(save_graph=True, show_graph=False)
visualizer.delta_times_distribution(save_graph=True)
visualizer.event_type_distribution(save_graph=True)

print("📈 Analysis plots generated!")
print("   Check the './analysis_plots' folder for saved graphs")

## 4. Model Configuration and Training {#4-entrainement}

Now, let's configure and train a Neural Hawkes Process (NHP) model.

In [None]:
# Load configuration from YAML file
config_file = project_root / "examples" / "runner_config.yaml"
runner_dict = parse_runner_yaml_config(str(config_file), "NHP", "test")

# Create configuration
config = RunnerConfig.from_dict(runner_dict)

print("⚙️ Model configuration:")
print("   🧠 Model: NHP")
print("   📊 Dataset: test")

⚙️ Model configuration:
   🧠 Model: NHP
   📊 Dataset: test


AttributeError: 'RunnerConfig' object has no attribute 'device'

In [None]:
# Create runner and start training
runner = Runner(config=config, output_dir="./training_results")

print("🚀 Starting training...")
print("   This may take a few minutes depending on your configuration.")

# Train the model
runner.run(phase="train")

print("✅ Training completed!")

## 5. Evaluation and Metrics {#5-evaluation}

Let's now evaluate the performance of the trained model.

In [None]:
# Evaluation on test dataset
print("🧪 Evaluating model on test dataset...")

test_results = runner.run(phase="test")

print("📊 Evaluation results:")
if hasattr(runner, 'test_metrics'):
    for metric_name, value in runner.test_metrics.items():
        print(f"   {metric_name}: {value:.4f}")
else:
    print("   ✅ Evaluation completed - check logs for detailed metrics")

### Comparison with Baselines

Let's compare our model with simple baselines.

In [None]:
from easy_tpp.evaluation.benchmarks.mean_bench import MeanInterTimeBenchmark
from easy_tpp.evaluation.benchmarks.sample_distrib_mark_bench import MarkDistributionBenchmark

# Baseline benchmark: mean prediction
mean_benchmark = MeanInterTimeBenchmark(
    data_config=data_config,
    experiment_id="mean_baseline",
    save_dir="./benchmark_results"
)

print("📊 Baseline benchmark (mean):")
mean_results = mean_benchmark.evaluate()
print(f"   Results: {mean_results}")

# Type distribution benchmark
mark_benchmark = MarkDistributionBenchmark(
    data_config=data_config,
    experiment_id="mark_baseline",
    save_dir="./benchmark_results"
)

print("\n📊 Type distribution benchmark:")
mark_results = mark_benchmark.evaluate()
print(f"   Results: {mark_results}")
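
To give an intuition for what such baselines measure, the sketch below computes two naive scores in plain Python: the RMSE obtained by always predicting the mean training inter-event time, and the expected accuracy obtained by sampling event types from their training frequencies. This is an illustration under those assumptions, not the implementation of `MeanInterTimeBenchmark` or `MarkDistributionBenchmark`; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def mean_inter_time_rmse(train_gaps, test_gaps):
    """Predict the training mean for every test gap and return (prediction, RMSE)."""
    prediction = sum(train_gaps) / len(train_gaps)
    mse = sum((g - prediction) ** 2 for g in test_gaps) / len(test_gaps)
    return prediction, math.sqrt(mse)


def sampled_mark_expected_accuracy(train_marks, test_marks):
    """Expected accuracy when each mark is sampled from the training frequencies."""
    train_freq = Counter(train_marks)
    test_freq = Counter(test_marks)
    return sum(
        (train_freq[k] / len(train_marks)) * (test_freq[k] / len(test_marks))
        for k in train_freq
    )


# Toy data (hypothetical)
prediction, rmse = mean_inter_time_rmse([1.0, 2.0, 3.0], [2.0, 4.0])
accuracy = sampled_mark_expected_accuracy([0, 0, 0, 1], [0, 1, 1, 1])
print(f"Mean prediction: {prediction:.2f}, RMSE: {rmse:.4f}")
print(f"Expected type accuracy: {accuracy:.3f}")
```

A trained model is only interesting if it clearly beats scores of this kind on the same test split.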

## 6. Advanced Examples {#6-avances}

### Synthetic Data Generation

EasyTPP can generate synthetic data for testing models.

In [None]:
from easy_tpp.data.generation import HawkesSimulator

# Hawkes process configuration
params = {
    "mu": [0.1, 0.2],                    # Base intensities
    "alpha": [[0.3, 0.1], [0.2, 0.4]],  # Excitation matrix
    "beta": [[2, 1], [1.5, 3]]          # Decay matrix
}

# Create simulator
simulator = HawkesSimulator(
    mu=params["mu"],
    alpha=params["alpha"],
    beta=params["beta"],
    dim_process=2,
    start_time=0,
    end_time=100
)

print("🎲 Generating synthetic data...")

# Generate and save
simulator.generate_and_save(
    output_dir='./synthetic_data',
    num_simulations=10,
    splits={'train': 0.6, 'test': 0.2, 'dev': 0.2}
)

print("✅ Synthetic data generated in './synthetic_data'")
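
To see how `mu`, `alpha`, and `beta` interact, the following sketch evaluates the conditional intensity of a multivariate Hawkes process with an exponential kernel. It assumes the convention λ_i(t) = μ_i + Σ α[i][j] · exp(−β[i][j] · (t − t_k)), summed over past events of type j at times t_k < t; the kernel form and index order used by `HawkesSimulator` may differ, and `hawkes_intensity` is a hypothetical helper, not a library function.

```python
import math


def hawkes_intensity(t, history, mu, alpha, beta):
    """Return the intensity of every dimension at time t.

    history: list of (event_time, event_dim) pairs; only events strictly
    before t contribute. alpha[i][j] is assumed to be the jump that an
    event of type j adds to the intensity of type i.
    """
    lam = list(mu)
    for t_k, j in history:
        if t_k < t:
            for i in range(len(mu)):
                lam[i] += alpha[i][j] * math.exp(-beta[i][j] * (t - t_k))
    return lam


mu = [0.1, 0.2]
alpha = [[0.3, 0.1], [0.2, 0.4]]
beta = [[2, 1], [1.5, 3]]

# One past event of type 0 at time 1.0; evaluate the intensity at time 2.0
lam = hawkes_intensity(2.0, [(1.0, 0)], mu, alpha, beta)
print(lam)
```

Each past event raises the intensities temporarily, and the excitation then decays at the rate given by `beta`, which is what produces the clustered event patterns typical of Hawkes data.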

### Multiple Model Comparison

Let's compare the performance of different models on the same dataset.

In [None]:
# List of models to compare
models_to_compare = ['NHP', 'THP', 'RMTPP']
results_comparison = {}

for model_name in models_to_compare:
    print(f"\n🧠 Training model {model_name}...")
    
    try:
        # Configuration for this model
        config_file = project_root / "examples" / "runner_config.yaml"
        model_config_dict = parse_runner_yaml_config(str(config_file), model_name, "test")
        
        config = RunnerConfig.from_dict(model_config_dict)
        runner = Runner(config=config, output_dir=f"./comparison_results/{model_name}")
        
        # Train and evaluate (the number of epochs is taken from the YAML config; lower it there for a quick demo)
        runner.run(phase="train")
        test_results = runner.run(phase="test")
        
        results_comparison[model_name] = "✅ Success"
        print(f"   ✅ {model_name} trained successfully")
        
    except Exception as e:
        results_comparison[model_name] = f"❌ Error: {str(e)[:50]}..."
        print(f"   ❌ Error with {model_name}: {str(e)[:50]}...")

print("\n📊 Comparison summary:")
for model, result in results_comparison.items():
    print(f"   {model}: {result}")

## 🎉 Conclusion

This notebook has covered the main features of EasyTPP:

✅ **Environment setup** and imports

✅ **Understanding basic concepts** of TPPs

✅ **Data loading and preparation**

✅ **Model configuration and training**

✅ **Evaluation and comparison** with baselines

✅ **Synthetic data generation**

✅ **Multiple model comparison**

### 🚀 Next Steps

- Explore other available models, such as AttNHP
- Test with your own data
- Adjust hyperparameters to optimize performance
- Use advanced analysis tools to understand model behavior

### 📚 Useful Resources

- [EasyTPP Documentation](https://github.com/your-repo/EasyTPP)
- [Additional Examples](../examples/)
- [Advanced Configuration](../configs/)

## 6. Prediction Phase and Distribution Analysis

**Why the prediction phase is crucial:**

Temporal Point Process (TPP) models are useful for more than computing performance metrics: their real value lies in their ability to **predict and simulate** new events. These predictions enable:

1. **Distribution comparisons** - Analyze whether the model captures temporal patterns well
2. **Realistic benchmarks** - Compare model simulations to real data  
3. **Qualitative validation** - Visualize differences between predictions and reality
4. **Practical applications** - Generate future scenarios for decision-making
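
As one possible way to quantify a distribution comparison, the sketch below computes the two-sample Kolmogorov-Smirnov statistic between real and simulated inter-event times, that is, the largest gap between their empirical CDFs. It is a standalone illustration in plain Python; it is not what the `predict` phase runs internally, and the `ks_statistic` helper and the sample values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    distance = 0.0
    # The maximum is always reached at one of the observed points
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance


# Hypothetical inter-event times from real data and from model simulations
real_gaps = [1, 2, 3, 4]
simulated_gaps = [3, 4, 5, 6]
distance = ks_statistic(real_gaps, simulated_gaps)
print(f"KS distance: {distance:.2f}")
```

A value near 0 means the simulated gaps follow nearly the same distribution as the real ones, while a value near 1 means the two distributions barely overlap.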

### 6.1 Complete Pipeline with Predictions

In [None]:
# Complete example: train → test → predict
print("🔄 Complete pipeline with predictions...")

# Configuration
config_dict = parse_runner_yaml_config(
    yaml_path="../examples/runner_config.yaml",
    experiment_id="NHP", 
    dataset_id="test"
)
config = RunnerConfig.from_dict(config_dict)

# Runner
runner = Runner(config=config, output_dir="./prediction_analysis")

# Phase 1: Training
print("📚 1. Training the model...")
runner.run(phase="train")

# Phase 2: Test/Evaluation  
print("🧪 2. Performance evaluation...")
runner.run(phase="test")

# Phase 3: Predictions and comparisons (CRUCIAL!)
print("🔮 3. Generating predictions and distribution comparisons...")
runner.run(phase="predict")

print("✅ Complete pipeline finished!")
print("📊 Results available in:")
print("   - Performance metrics")
print("   - Model simulations") 
print("   - Distribution comparisons")
print("   - Analysis graphs")

### 6.2 Simplified Alternative: Single Command

If you want the complete pipeline all at once:

In [None]:
# Ultra-simple version: everything in one command
runner = Runner(config=config, output_dir="./complete_pipeline")

# Automatically executes: train → test → predict
runner.run(phase="all")

print("🎉 Complete pipeline executed with phase='all'!")
print("💡 This command is equivalent to the 3 separate phases above")

### 6.3 Why Predictions Are Essential

**🎯 Main objective:** Verify that the model has learned the correct temporal distributions.

**📊 What the `predict` phase generates:**
- **Event simulations** based on the trained model
- **Visual comparisons** between real and simulated data
- **Statistical analyses** of temporal distributions
- **Prediction quality metrics**

**🔍 Practical applications:**
- **Finance:** Predict trading volume peaks
- **Healthcare:** Anticipate epidemics or relapses
- **Networks:** Forecast traffic overloads
- **Social:** Model information propagation

**⚠️ Crucial point:** Without the prediction phase, you only have numerical metrics. With predictions, you can **see** if your model truly understands the temporal dynamics of your data.