PDHP

The script PDHP.py is an implementation of the Powered Dirichlet-Hawkes process prior coupled to a simple Dirichlet-Multinomial language model, that can be ran from console. If you use it, please cite:

@article{Poux2021PDHP,
    title={Powered Hawkes-Dirichlet Process: Challenging Textual Clustering using a Flexible Temporal Prior},
    author={Ga\"el Poux-M\'edard and Julien Velcin and Sabine Loudcher},
    year={2021},
    booktitle={2021 IEEE International Conference on Data Mining (ICDM)},
    journal={ICDM},
    doi={10.1109/ICDM51629.2021.00062}
}

Usage

Run the script PDHP.py using the following syntax: [keyword]=[value][space] The keywords are:

data_file (str) (*) => The file that contains events
kernel_file (str) (**) => The file that contains Gaussian kernel's parameters
output_folder (str) (***) => Where to save the files (words index and particles every 1000 iterations)
r (float or comma separated floats) => Exponent used for the Powered Dirichlet-Hawkes prior. If a list is provided, experiments will be ran for each value
runs (int) => Number of runs on the given dataset
theta0 (float) => Value of the symmetric Dirichlet-Multinomial prior to model textual content
alpha0 (float) => Value of the symmetric Dirichlet prior to model temporal kernels weights
number_samples (int) => Number of samples using in Gibbs sampling inference of alpha
number_particles (int) => Number of particles used by SMC algorithm; each aprticle keeps track clusters allcoations hypotheses
print_progress (bool) => Whether to print script's progress every 100 documents

(*) The data file must follow the following structure for each event entry:

[timestamp][tabulation][comma-separated words][end line]

(**) The kernel file must follow the following structure:

[lambda0][end line]
[end line]
[mean_1][end line]
[mean_2][end line]
...
[mean_K][end line]
[end line]
[sigma_1][end line]
[sigma_2][end line]
...
[sigma_K][end line]

(***) The particles output file has the following structure:

Particle[tabulation]particle index (int)[tabulation]particle weight (float)[tabulation]events clusters (array of int)[end line]
Cluster[tabulation]cluster index (int)[tabulation]alpha0 (float)[tabulation]inferred alpha (array of floats)[tabulation]textual likelihood (float) [tabulation]number of words in the cluster (int)[tabulation]words distribution (array of ints)[end line]
Cluster[tabulation]cluster index (int)[tabulation]alpha0 (float)[tabulation]inferred alpha (array of floats)[tabulation]textual likelihood (float) [tabulation]number of words in the cluster (int)[tabulation]words distribution (array of ints)[end line]
...
Cluster[tabulation]cluster index (int)[tabulation]alpha0 (float)[tabulation]inferred alpha (array of floats)[tabulation]textual likelihood (float) [tabulation]number of words in the cluster (int)[tabulation]words distribution (array of ints)[end line]
Particle[tabulation]particle index (int)[tabulation]particle weight (float)[tabulation]events clusters (array of int)[end line]
Cluster[tabulation]cluster index (int)[tabulation]alpha0 (float)[tabulation]inferred alpha (array of floats)[tabulation]textual likelihood (float) [tabulation]number of words in the cluster (int)[tabulation]words distribution (array of ints)[end line]
Cluster[tabulation]cluster index (int)[tabulation]alpha0 (float)[tabulation]inferred alpha (array of floats)[tabulation]textual likelihood (float) [tabulation]number of words in the cluster (int)[tabulation]words distribution (array of ints)[end line]

Dependencies

Numpy, Scipy, re

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
__pycache__		__pycache__
data/Reddit		data/Reddit
output/Reddit		output/Reddit
results		results
.gitattributes		.gitattributes
Evaluation.py		Evaluation.py
Generate_data.py		Generate_data.py
PDHP.py		PDHP.py
README.md		README.md
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PDHP

Usage

Dependencies

About

Releases

Packages

Languages

GaelPouxMedard/PDHP

Folders and files

Latest commit

History

Repository files navigation

PDHP

Usage

Dependencies

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages