# UMTS Lineage Tracking Demo

This notebook demonstrates the UMTS (Universal Modality Translator System) lineage tracking feature in MDVTools.

Lineage tracking provides full provenance for your data conversions, recording:
- Source files with cryptographic hashes
- Conversion parameters
- Software versions
- Timestamps

This record supports reproducibility and provides an audit trail that can help with regulatory compliance.
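To make "cryptographic hashes" concrete, here is a minimal sketch of fingerprinting a source file using only the standard library. The helper name `sha256_of_file` and the choice of SHA-256 are assumptions for illustration. This is not necessarily how `compute_file_hash` in `mdvtools.umts` is implemented.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks.

    Hypothetical helper for illustration; not part of mdvtools.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large files never need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with identical bytes give the same digest, so a stored digest can later confirm that a source file has not changed since the conversion.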


## Setup


In [None]:
import scanpy as sc
import numpy as np
import pandas as pd
from mdvtools import convert_scanpy_to_mdv, MDVProject
from mdvtools.umts import LineageTracker, compute_file_hash
import json
from pathlib import Path


## 1. Convert to MDV with Lineage Tracking

Convert the pbmc3k data to MDV format. Lineage tracking is **enabled by default**.


In [None]:
# Load example data (pbmc3k from scanpy)
adata = sc.datasets.pbmc3k()

# Basic preprocessing
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
adata.var['mt'] = adata.var_names.str.startswith('MT-')
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)

# Normalize, select variable genes, run PCA, and compute a UMAP embedding
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)
sc.pp.pca(adata, n_comps=20)
sc.pp.neighbors(adata, n_neighbors=10, n_pcs=20)
sc.tl.umap(adata)

print(f"Processed: {adata.n_obs} cells, {adata.n_vars} genes")


In [None]:
# Convert with lineage tracking
project_dir = "./umts_demo_project"

mdv = convert_scanpy_to_mdv(
    folder=project_dir,
    scanpy_object=adata,
    max_dims=2,
    delete_existing=True,
    track_lineage=True  # This is the default
)

print(f"✓ Created MDV project at: {project_dir}")
print("✓ Lineage information saved")


## 2. Query Lineage Information

The lineage information is stored in `lineage.json` and can be queried through the `MDVProject` object returned by the conversion.
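Because the record is a plain JSON file, it can presumably also be inspected without loading the project. The sketch below assumes that `lineage.json` sits at the top level of the project folder. That location is a guess, not something confirmed here, and `read_lineage_file` is a hypothetical helper rather than an mdvtools function.

```python
import json
from pathlib import Path


def read_lineage_file(project_dir, filename="lineage.json"):
    """Load the lineage record from a project folder, or return None if absent.

    Hypothetical helper for illustration; not part of mdvtools.
    """
    # Assumption: the file lives directly inside the project folder.
    lineage_path = Path(project_dir) / filename
    if not lineage_path.is_file():
        return None
    with lineage_path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

If the location assumption holds, `read_lineage_file("./umts_demo_project")` should return the same dictionary that the project API exposes. A return value of `None` means no file was found at that path.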


In [None]:
# Query lineage
lineage = mdv.get_lineage()

print("=== Lineage Information ===")
print(f"\nUMTS Version: {lineage['umts_version']}")
print(f"Created: {lineage['created_timestamp']}")
print(f"\nConversion function: {lineage['conversion']['function']}")

print("\n=== Conversion Parameters ===")
params = lineage['conversion']['parameters']
for key, value in params.items():
    print(f"  {key}: {value}")

print("\n=== Environment ===")
env = lineage['environment']
print(f"Python: {env['python_version']}")
print(f"Platform: {env['platform']}")
print("\nKey packages:")
for pkg in ['mdvtools', 'scanpy', 'numpy', 'pandas']:
    version = env['packages'].get(pkg, 'not installed')
    print(f"  {pkg}: {version}")
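One practical use of the recorded environment is checking whether the current session still matches it. The sketch below assumes that `env['packages']` maps distribution names to plain version strings, as the lookup above suggests. `compare_packages` is a hypothetical helper, not an mdvtools function.

```python
from importlib.metadata import PackageNotFoundError, version


def compare_packages(recorded):
    """Compare recorded package versions with those installed right now.

    Returns {name: (recorded_version, installed_version_or_None)} for every
    package whose installed version differs from the recorded one.
    Hypothetical helper for illustration; not part of mdvtools.
    """
    differences = {}
    for name, recorded_version in recorded.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is missing from the current environment
        if installed != recorded_version:
            differences[name] = (recorded_version, installed)
    return differences
```

Calling `compare_packages(lineage['environment']['packages'])` returns an empty dictionary when every recorded package is installed at the recorded version. Otherwise each entry pairs the recorded version with the one found now.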


## Summary

This notebook demonstrated:

1. ✓ Converting data with automatic lineage tracking
2. ✓ Querying parameters and environment
3. ✓ Persisting lineage with the project

See `docs/UMTS_LINEAGE.md` for more details and advanced usage.
