Hierarchical Nonparametric Point Processes (HNPP) is a unified statistical framework for modeling temporal marked events and predicting future events from the history of past events. By sharing patterns among users through a hierarchical structure, HNPP can model and infer a growing number of patterns underlying the events. This library implements two variants of HNPP: the Hierarchical Dirichlet Point Process (HDPP) and Factorized Gamma Point Processes (FGPP). HDPP models the times of events with a multi-dimensional point process and the marks of events with a Dirichlet process, and it shares patterns among users through a common base measure. FGPP is another variant of HNPP that infers the latent patterns underlying the events and shares them among users using a size-biased ordering. For HDPP, the library provides an online inference algorithm based on sequential Monte Carlo, implemented in Python. For FGPP, it provides a scalable variational inference algorithm, implemented in Matlab.
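To make "modeling the times of events with a multi-dimensional point process" concrete, here is a minimal sketch of the intensity of a standard multivariate Hawkes process with an exponential kernel: lambda_u(t) = mu_u + sum over past events i of alpha[u_i][u] * exp(-beta * (t - t_i)). This is not code from the library. Treating each dimension as one user, using an exponential kernel, and the parameter names mu, alpha, and beta are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def hawkes_intensity(t: float,
                     history: Sequence[Tuple[float, int]],
                     mu: Sequence[float],
                     alpha: Sequence[Sequence[float]],
                     beta: float) -> List[float]:
    """Intensity of every dimension (assumed: one per user) at time t.

    history: (event_time, dimension) pairs; only events strictly before t count.
    mu[u]: base rate of dimension u.
    alpha[v][u]: excitation that an event in dimension v adds to dimension u.
    beta: decay rate of the exponential kernel, assumed shared by all pairs.
    """
    lam = list(mu)  # start from the base rates
    for t_i, v in history:
        if t_i < t:
            decay = math.exp(-beta * (t - t_i))
            for u in range(len(lam)):
                # Past events in dimension v raise the rate of every dimension u.
                lam[u] += alpha[v][u] * decay
    return lam
```

For example, `hawkes_intensity(1.0, [(0.0, 0)], [0.1, 0.2], [[0.5, 0.0], [0.3, 0.4]], 1.0)` raises only the first dimension above its base rate, because `alpha[0][1]` is zero.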
To evaluate HDPP in the context of content diffusion over social networks, we gathered two datasets from EventRegistry and Twitter. These datasets can be downloaded from here and here, respectively. To run the code on these datasets, copy the files into the data folder inside the HDPP folder.
To evaluate FGPP, we used the Tianchi and last.fm datasets. To run the code on these datasets, download them and copy the files into the Datasets folder inside the FGPP folder.
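Copying a downloaded dataset into place can be scripted. The following is a minimal sketch, assuming each dataset lives in its own subfolder (for example HDPP/data/EventRegistry or FGPP/Datasets/<name>). The per-dataset subfolder layout and the helper name install_dataset are assumptions, not something the library defines.

```python
import os
import shutil
from typing import List


def install_dataset(src_dir: str, dest_root: str, dataset_name: str) -> List[str]:
    """Copy every regular file in src_dir into dest_root/dataset_name.

    Hypothetical helper: the dest_root/dataset_name layout is assumed.
    Returns the names of the copied files in sorted order.
    """
    dest = os.path.join(dest_root, dataset_name)
    os.makedirs(dest, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if os.path.isfile(src):  # skip subdirectories
            shutil.copy2(src, dest)  # copy2 also preserves timestamps
            copied.append(name)
    return copied
```

A typical call would be `install_dataset("downloads/EventRegistry", "HDPP/data", "EventRegistry")`, where the source path stands for wherever the download was unpacked.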
Requirements:
- Python 3.x
- Matlab R2014 or later
Features:
- A coherent generative model for content on social media
- A novel generative model for learning user preferences and making recommendations
- Models that manage their complexity by adapting the size of the latent space over time
- An online inference algorithm for HDPP and a scalable variational inference algorithm for FGPP
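Two ideas above can be illustrated with the Chinese restaurant franchise, the standard sequential representation of a hierarchical Dirichlet process. The first is how HDPP shares patterns among users through a common base measure. The second is that the latent space grows with the data. This sketch only samples pattern assignments. It ignores event times and word distributions, and it is not the library's sequential Monte Carlo inference code. The parameter names alpha (per-user concentration) and gamma (concentration of the shared base measure) are illustrative.

```python
import random
from typing import List, Sequence, Tuple


def _pick(rng: random.Random, weights: Sequence[int], new_weight: float) -> int:
    """Return i with prob. proportional to weights[i], or len(weights)
    (meaning "create a new one") with prob. proportional to new_weight."""
    r = rng.uniform(0.0, sum(weights) + new_weight)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(weights)


def sample_crf(num_users: int, events_per_user: int, alpha: float,
               gamma: float, seed: int = 0) -> Tuple[List[List[int]], int]:
    """Sample a global pattern id for every event of every user.

    Returns (per-user lists of pattern ids, total number of patterns).
    """
    rng = random.Random(seed)
    tables_per_pattern: List[int] = []  # tables, across all users, serving each pattern
    assignments: List[List[int]] = []
    for _ in range(num_users):
        table_sizes: List[int] = []    # events seated at each table of this user
        table_pattern: List[int] = []  # global pattern served at each table
        patterns: List[int] = []
        for _ in range(events_per_user):
            # Existing table with prob. proportional to its size, new table proportional to alpha.
            table = _pick(rng, table_sizes, alpha)
            if table == len(table_sizes):
                # A new table draws its pattern from the shared base measure:
                # an existing pattern proportional to its table count, a new one proportional to gamma.
                pattern = _pick(rng, tables_per_pattern, gamma)
                if pattern == len(tables_per_pattern):
                    tables_per_pattern.append(0)
                tables_per_pattern[pattern] += 1
                table_sizes.append(0)
                table_pattern.append(pattern)
            table_sizes[table] += 1
            patterns.append(table_pattern[table])
        assignments.append(patterns)
    return assignments, len(tables_per_pattern)
```

The number of patterns is not fixed in advance. It tends to grow slowly as more events and users arrive, and patterns created for one user can be reused by the others.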
A sample of the EventRegistry and Twitter datasets used to evaluate HDPP is included in the data folder. For the HDPP model, a dataset folder should contain the following files:
- events.csv: the events, sorted in increasing order of time, one event per line in the following format:
  <time> <doc_id> <user_id> <is_dup> <dup_doc_id> <words_len> [word_id:word_count] <ne_len> [ne_id:ne_count]
- wordmap.pickle: the mapping from each word to its index. Indices start from 0.
- wordcount.pickle: the number of times each word has occurred. Words are represented by their indices.
- nemap.pickle: the mapping from each named entity to its index. Indices start from 0.
- necount.pickle: the number of times each named entity has occurred. Named entities are represented by their indices.
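The following is a minimal sketch of reading one line of events.csv and accumulating per-word counts, as in wordcount.pickle. It is based on these assumptions about the format, which the library's own reader may not share:
- Fields are whitespace-separated, despite the .csv extension.
- words_len and ne_len give the number of index:count pairs that follow them.
- is_dup is "0" for an original document.
- Document and user identifiers are kept as strings.

The names Event, parse_event_line, and count_words are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Event:
    time: float
    doc_id: str
    user_id: str
    is_dup: bool
    dup_doc_id: str
    words: Dict[int, int]           # word index -> count in this document
    named_entities: Dict[int, int]  # named-entity index -> count


def _read_pairs(tokens: List[str], start: int) -> Tuple[Dict[int, int], int]:
    """Read '<len> idx:count ...' at tokens[start]; return (pairs, next position)."""
    n = int(tokens[start])
    chunk = tokens[start + 1:start + 1 + n]
    if len(chunk) != n:
        raise ValueError(f"expected {n} idx:count pairs, found {len(chunk)}")
    pairs = {}
    for tok in chunk:
        idx, cnt = tok.split(":")
        pairs[int(idx)] = int(cnt)
    return pairs, start + 1 + n


def parse_event_line(line: str) -> Event:
    tokens = line.split()
    time, doc_id, user_id, is_dup, dup_doc_id = tokens[:5]
    words, pos = _read_pairs(tokens, 5)
    nes, pos = _read_pairs(tokens, pos)
    if pos != len(tokens):
        raise ValueError("unexpected trailing fields")
    return Event(float(time), doc_id, user_id, is_dup != "0", dup_doc_id, words, nes)


def count_words(events: Iterable[Event]) -> Dict[int, int]:
    """Total occurrences of each word index over all events."""
    total: Counter = Counter()
    for ev in events:
        total.update(ev.words)  # Counter.update adds the counts of a mapping
    return dict(total)
```

For example, the made-up line `12.5 d1 u3 0 -1 2 4:2 7:1 1 0:3` parses to an event at time 12.5 with two distinct words and one named entity.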
If the dataset contains social relations among the users, the dataset folder should also contain a file named "adjacency_matrix.pickle", which holds the adjacency matrix of the users as a numpy matrix.
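The following is a sketch of producing adjacency_matrix.pickle from an edge list. These are assumptions:
- Users are indexed 0..num_users-1.
- Entry [i, j] = 1 means an edge from user i to user j.
- A pickled dense numpy array is acceptable to the loader.

If the code strictly expects a numpy.matrix, wrapping the array with numpy.asmatrix before pickling is one option. The helper names are hypothetical.

```python
import pickle
from typing import Iterable, Tuple

import numpy as np


def build_adjacency(edges: Iterable[Tuple[int, int]], num_users: int,
                    directed: bool = True) -> np.ndarray:
    """Dense 0/1 adjacency matrix; entry [src, dst] = 1 for each edge (assumed orientation)."""
    adj = np.zeros((num_users, num_users), dtype=np.int64)
    for src, dst in edges:
        adj[src, dst] = 1
        if not directed:
            adj[dst, src] = 1  # mirror the edge for undirected relations
    return adj


def save_adjacency(adj: np.ndarray, path: str = "adjacency_matrix.pickle") -> None:
    # If a numpy.matrix is required, pass np.asmatrix(adj) instead.
    with open(path, "wb") as f:
        pickle.dump(adj, f)
```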
To run the HDPP model:
- Run the install script to build the extension modules in place:
  python setup.py build_ext --inplace
- Set the dataset in the main script (main.py).
- Run the main script:
  python main.py MethodName DatasetName
  where MethodName is one of "HDPP", "DirichletHawkes", or "Hawkes", and DatasetName is "EventRegistry" or the name of any other dataset located in the "data" folder. The results are saved in the results folder.
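To compare all three methods on one dataset, the command above can be run in a loop. The following is a sketch under these assumptions:
- It is run with the HDPP folder as the working directory.
- main.py accepts exactly the two positional arguments described above.

The helper names are hypothetical. With dry_run=True it only builds the commands, without executing them.

```python
import subprocess
import sys
from typing import List, Sequence

METHODS = ("HDPP", "DirichletHawkes", "Hawkes")


def build_command(method: str, dataset: str) -> List[str]:
    """argv for 'python main.py MethodName DatasetName' using the current interpreter."""
    if method not in METHODS:
        raise ValueError(f"unknown method {method!r}; expected one of {METHODS}")
    return [sys.executable, "main.py", method, dataset]


def run_all(dataset: str, methods: Sequence[str] = METHODS,
            hdpp_dir: str = ".", dry_run: bool = True) -> List[List[str]]:
    commands = [build_command(m, dataset) for m in methods]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first method that exits with an error.
            subprocess.run(cmd, cwd=hdpp_dir, check=True)
    return commands
```

For example, `run_all("EventRegistry", hdpp_dir="HDPP", dry_run=False)` would run the three methods one after another.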
To run the FGPP method:
- Go to the methods folder.
- Set the dataset in the "RunFGPP" script.
- Run the RunFGPP script.
The results are saved in the "Results" folder.
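The FGPP steps can also be launched without the Matlab desktop. The following is a sketch, assuming the `matlab` executable is on the PATH and that the script is RunFGPP in the methods folder. It uses the long-standing `-nodisplay`, `-nosplash`, and `-r` startup options. The try/catch makes Matlab exit with a non-zero status on error instead of waiting at the prompt. The helper names are hypothetical.

```python
import subprocess
from typing import List


def fgpp_command(script: str = "RunFGPP") -> List[str]:
    """argv that starts Matlab without the GUI, runs `script`, then exits."""
    statement = (f"try, {script}, catch err, disp(getReport(err)), exit(1), end, exit(0)")
    return ["matlab", "-nodisplay", "-nosplash", "-r", statement]


def run_fgpp(methods_dir: str, dry_run: bool = True) -> List[str]:
    cmd = fgpp_command()
    if not dry_run:
        # Run from the methods folder, as the steps above require.
        subprocess.run(cmd, cwd=methods_dir, check=True)
    return cmd
```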