Trajectory Clustering Analysis (TCA)

🚀 Description

TrajectoryClusteringAnalysis (TCA) is a Python package for analyzing and visualizing individual trajectories over time using sequence clustering techniques. It was initially developed for modeling healthcare trajectories (e.g., treatment sequences for cancer patients), but it also applies to other life course data such as employment histories, education paths, or any longitudinal sequence of individual states.

🔍 Main Features

Unidimensional Analysis:
- Modeling Care Trajectories: Representation of patients through chronological sequences of treatments.
Multidimensional Analysis:
- Tensor Decomposition using the SWoTTeD model to identify and analyze complex, multi-event trajectories.
Flexible Distance Metrics: Includes Hamming, Levenshtein, DTW, Optimal Matching (OM), and GAK.
Clustering Algorithms:
- Hierarchical clustering (CAH).
- K-Medoids clustering (for robustness against noise): clustering based on a precomputed distance matrix.
- K-Means Clustering: Two methods available:
  - Clustering based on the frequency of states.
  - Clustering directly on the wide-format encoded sequences.
Visualization Tools: Heatmaps, dendrograms, cluster plots, etc.
Notebook Examples: Provided for quick experimentation.
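
To make the distance metrics listed above concrete, here is a minimal pure-Python sketch of two of them: Hamming distance and an optimal-matching style edit distance. It is illustrative only and is not the package's own implementation; the function names, the tuple-keyed `sub_costs` dictionary, and the default costs are assumptions made for this example. With unit costs, the second function reduces to the Levenshtein distance.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def optimal_matching_distance(a, b, sub_costs=None, indel_cost=1.0,
                              default_sub_cost=1.0):
    """Edit distance with per-pair substitution costs (dynamic programming).

    sub_costs maps (state_x, state_y) tuples to a cost and is treated as
    symmetric. This key format is an assumption of this sketch.
    """
    sub_costs = sub_costs or {}

    def substitution(x, y):
        if x == y:
            return 0.0
        return sub_costs.get((x, y), sub_costs.get((y, x), default_sub_cost))

    # prev[j] holds the cost of turning the processed prefix of a into b[:j].
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,               # delete x
                curr[j - 1] + indel_cost,           # insert y
                prev[j - 1] + substitution(x, y),   # substitute x with y
            ))
        prev = curr
    return prev[-1]


s1 = ["Surgery", "Chemotherapy", "Radiotherapy"]
s2 = ["Surgery", "Radiotherapy", "Radiotherapy"]
s3 = ["Chemotherapy", "Radiotherapy"]

print(hamming_distance(s1, s2))           # 1
print(optimal_matching_distance(s1, s3))  # 1.0 (Levenshtein: one deletion)

costs = {("Chemotherapy", "Radiotherapy"): 3.0}
print(optimal_matching_distance(s1, s2, costs, indel_cost=1.0))  # 2.0
```

In the last call, one deletion plus one insertion (cost 2.0) is cheaper than the single substitution (cost 3.0), which shows how the ratio between substitution and indel costs shapes the resulting distances.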

📦 Installation

✅ Install from PyPI (recommended)

pip install trajectoryclusteringanalysis

🛠️ Install from source (for development)

Clone the repository:

git clone https://github.com/QuanTIMLab/TrajectoryClusteringAnalysis.git
cd TrajectoryClusteringAnalysis

Create a virtual environment (optional but recommended):

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:
```
pip install -r requirements.txt
```
Install the package:
```
pip install .
```

⚙️ Basic Usage

from trajectoryclusteringanalysis.tca import TCA

# Example data
trajectories = [
    ["Surgery", "Chemotherapy", "Radiotherapy"],
    ["Chemotherapy", "Radiotherapy"],
    ["Surgery", "Radiotherapy"]
]

# Preprocessing: convert the trajectories into a wide-format DataFrame
# named df_wide_format (e.g., from pivoted data). This step is not shown here.

# Initialization and clustering
model = TCA(data=df_wide_format,
            index_col='id',
            time_col=None,  # Not used in unidimensional analysis
            event_col=None,  # Not used in unidimensional analysis
            alphabet=["Surgery", "Chemotherapy", "Radiotherapy"],
            states=["Surgery State", "Chemotherapy State", "Radiotherapy State"],
            mode='unidimensional')

# Compute distance matrix (e.g., Hamming or Optimal Matching)
distance_matrix = model.compute_distance_matrix(metric='hamming')
# OR with optimal matching and custom costs:
# custom_costs = {'Surgery:Chemotherapy': 1, 'Surgery:Radiotherapy': 2, 'Chemotherapy:Radiotherapy': 3}
# sub_matrix = model.compute_substitution_cost_matrix(method='custom', custom_costs=custom_costs)
# distance_matrix = model.compute_distance_matrix(metric='optimal_matching', substitution_cost_matrix=sub_matrix, indel_cost=1.5)

# Hierarchical Clustering (CAH)
linkage_matrix = model.hierarchical_clustering(distance_matrix)
model.plot_dendrogram(linkage_matrix)
# Visualization
model.plot_clustermap(model.data, linkage_matrix, title="Clustermap of individuals")
# Assign clusters
clusters = model.assign_clusters(linkage_matrix, num_clusters=4)
model.plot_cluster_heatmaps(model.data, clusters, title='Heatmaps of Treatment Sequences by Cluster')
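
The usage example relies on a `df_wide_format` DataFrame that is never built. One possible way to obtain it from the `trajectories` list is sketched below, assuming a layout with one row per individual, an `id` column, and one column per time step. The column names `t1`, `t2`, ... and the right-padding of shorter sequences with missing values are assumptions of this sketch; check how the package expects missing entries to be encoded before relying on it.

```python
import pandas as pd

trajectories = [
    ["Surgery", "Chemotherapy", "Radiotherapy"],
    ["Chemotherapy", "Radiotherapy"],
    ["Surgery", "Radiotherapy"],
]

# One row per individual, one column per time step. pandas pads the
# shorter sequences with missing values on the right.
df_wide_format = pd.DataFrame(trajectories)

# Hypothetical column names for the time steps: t1, t2, t3, ...
df_wide_format.columns = [f"t{i + 1}" for i in range(df_wide_format.shape[1])]

# Add the identifier column referenced by index_col='id' in the example.
df_wide_format.insert(0, "id", range(1, len(df_wide_format) + 1))

print(df_wide_format)
```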

🔬 Applications

TCA is suitable for analyzing sequential data in various domains, such as:

Healthcare: Patient treatment pathways, diagnosis sequences
Social Sciences: Employment trajectories, education paths
Marketing: Customer journey modeling
Sociology/Demography: Life course studies

📁 Repository Structure

TrajectoryClusteringAnalysis/
├── data/                   # Example and demo datasets
├── Notebooks/              # Jupyter notebooks (examples)
├── src/
│   └── trajectoryclusteringanalysis/
│       ├── tca.py
│       ├── plotting.py
│       ├── utils.py
│       ├── logger.py
│       ├── images/                  # Visuals for documentation
│       ├── optimal_matching.pyx
│       ├── unidimensional/
│       └── multidimensional/
├── tests/                  # Unit tests
├── requirements.txt
├── setup.py
├── pyproject.toml
├── MANIFEST.in
├── LICENSE
└── README.md

🧪 Examples

Example notebooks are available in the Notebooks folder to illustrate different trajectory analyses.

🧪 Running Tests

To run the tests, use the following command:

python -m unittest discover -s tests
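
With its default settings, `unittest` discovery picks up files in `tests/` whose names match `test*.py`. As a rough sketch of what a new test module could look like, the example below tests a small helper defined in the same file. The file name `tests/test_hamming_sketch.py` and the `hamming_distance` helper are hypothetical and are not part of the package's API.

```python
# Hypothetical file: tests/test_hamming_sketch.py
import unittest


def hamming_distance(a, b):
    """Illustrative helper: count differing positions in two sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


class TestHammingDistance(unittest.TestCase):
    def test_identical_sequences(self):
        seq = ["Surgery", "Chemotherapy"]
        self.assertEqual(hamming_distance(seq, seq), 0)

    def test_counts_differing_positions(self):
        a = ["Surgery", "Chemotherapy", "Radiotherapy"]
        b = ["Surgery", "Radiotherapy", "Radiotherapy"]
        self.assertEqual(hamming_distance(a, b), 1)

    def test_unequal_lengths_raise(self):
        with self.assertRaises(ValueError):
            hamming_distance(["Surgery"], ["Surgery", "Radiotherapy"])
```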

🤝 Contributing

Fork the project
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

📧 Contact

Authors: DIENG Ndiaga & GREVET Nicolas
Emails: ndiaga.dieng@univ-amu.fr, nicolas.GREVET@univ-amu.fr
