This repository is a concise and simplified PyTorch implementation of the model in the paper "The Neural Hawkes Process: A Neurally Self-Modulating Multivariate Point Process" by Hongyuan Mei and Jason Eisner.
Sequences of events of different types arise throughout our lives. For instance, a patient may be diagnosed with different diseases over the course of a medical record history; a stock may be bought or sold many times in a given day. We can represent the ith event in such a sequence as a tuple (ki, ti), where ki denotes the type of the event and ti denotes when it happens. A sequence of events can therefore be represented as a sequence of such tuples. Such sequences are usually called marked point processes or multivariate point processes. The problem we care about is predicting when the next event will happen and what its type will be, given a stream of events. That is, given a stream of events of the form:
(k1, t1), (k2, t2), (k3, t3) ... (kn, tn)
we want to predict the next event time and type (kn+1, tn+1)
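As a minimal sketch (my own illustration, not code from this repository), the event stream above can be held as a list of (type, time) tuples, from which the inter-event durations most point-process models consume are easily derived:

```python
# Illustrative event stream: (k_i, t_i) tuples with increasing timestamps.
events = [(2, 0.5), (0, 1.0), (2, 2.0), (1, 3.5)]

types = [k for k, _ in events]
times = [t for _, t in events]
# Inter-event duration: d_i = t_i - t_{i-1}
durations = [t1 - t0 for t0, t1 in zip(times, times[1:])]
print(durations)  # [0.5, 1.0, 1.5]
```

Predicting the next event then means predicting both the next duration (hence t_{n+1}) and the next type k_{n+1}.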
Please refer to J. G. Rasmussen, "Temporal Point Processes: the Conditional Intensity Function" (2009), for proofs and detailed derivations of the formulas at the places marked [1] above.
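For reference, the central objects in Rasmussen's notes are the conditional intensity function and the sequence log-likelihood it induces, which take the standard form:

```latex
\lambda^*(t) = \lim_{\Delta t \to 0}
  \frac{P\big(\text{event in } [t, t+\Delta t) \mid \mathcal{H}_t\big)}{\Delta t},
\qquad
\log L = \sum_{i=1}^{n} \log \lambda^*(t_i) - \int_{0}^{T} \lambda^*(s)\, ds
```

where \(\mathcal{H}_t\) is the history of events before time \(t\) and \(T\) is the end of the observation window.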
To learn more about how neural networks, RNNs, and LSTMs work, Dive into Deep Learning is a good source.
- To run the program on your computer, please make sure that you have the following files and packages installed.
- Python3: you can download through the link here: https://www.python.org/
- Numpy: you can download it through the command
pip install numpy
- Scikit-Learn: you can download it through the command
pip install scikit-learn
- matplotlib: you can download it through the command
pip install matplotlib
- PyTorch installation is more involved than for the packages above. See https://pytorch.org/get-started/locally/ for more information. If you still cannot install it on a Windows computer through pip, you can install Anaconda first and then install PyTorch through the method described here: https://dziganto.github.io/data%20science/python/anaconda/Creating-Conda-Environments/
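Before running the training script, a quick sanity check like the following (my own snippet, not part of the repository) verifies that all four packages can be found:

```python
import importlib.util

def check(packages):
    """Return {package_name: bool} without actually importing the packages."""
    return {p: importlib.util.find_spec(p) is not None for p in packages}

# 'sklearn' is the import name of Scikit-Learn; 'torch' is PyTorch's.
status = check(["numpy", "sklearn", "matplotlib", "torch"])
for name, ok in status.items():
    print(f"{name}: {'installed' if ok else 'MISSING'}")
```

If any package prints MISSING, install it with the commands above before continuing.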
2. To train the model, run the command below for more information:
!python train.py --help
Examples include:
!python train.py --dataset conttime
!python train.py --dataset hawkes --seq_len 75 --batch_size 64
3. To test the model, run the command below for more information:
!python test.py --help
Examples include:
!python test.py --dataset conttime --test_type 2
!python test.py --dataset self-correcting --test_type 1
- Google Colab:
Because of the model's complexity and long training time, it is better to train and test it on a cloud service such as Google Colab rather than on your laptop or desktop. Using Google Colab accelerates training and protects your personal laptop from overheating caused by long stretches of intense computation. If you are using a desktop built for neural-network training or scientific computing, you can simply ignore this section.
Google Colab is a platform that lets you write and run Python Jupyter notebooks on CPUs, GPUs, or TPUs designed for neural-network training. Google Colab also lets you run Linux command lines to execute Python files such as the ones in this repository.
To use Google Colab, open it in the Chrome browser, log in to your Google account, and follow the picture below. It is recommended to use a GPU to train the model. To switch to GPU mode, select Runtime > Change runtime type, and under Hardware accelerator select GPU. Type the commands below cell by cell:
!git clone https://github.com/Hongrui24/NeuralHawkesPytorch
%cd NeuralHawkesPytorch
Then you can use the commands in sections 2. and 3. above to train and test the model.
We use the data provided by Hongyuan Mei and Nan Du for our tests.
Name | Type of Dataset | Number of types | Number of training sequence | Number of testing sequence | Sequence Length Mean | Sequence Length Min | Sequence Length Max |
---|---|---|---|---|---|---|---|
data_hawkes, data_hawkeshib, data_conttime | Simulated | 5 | 8000 | 1000 | 60 | 20 | 100 |
MIMIC-II(1)(2)(3)(4)(5) | Real World Dataset | 75 | 527 | 65 | 3 | 1 | 31 |
SO(Stack Overflow) (1)(2)(3)(4)(5) | Real World Dataset | 22 | 4777 | 1326 | 72 | 41 | 736 |
hawkes, self-correcting | Simulated | 1 | 64 | 64 | train: 1406, testing: 156 | train: 1406, testing: 156 | train: 1406, testing: 156 |
The Electronic Medical Record dataset (MIMIC-II) is a collection of de-identified clinical visits of Intensive Care Unit patients over 7 years. Each event in the dataset records a time stamp and a disease diagnosis.
Description of SO (Stack Overflow) Datasets
The Stack Overflow dataset represents two years of user awards on a question-answering website: each user received a sequence of badges.
Notice:
The datasets 'data_hawkes', 'data_hawkeshib', 'conttime', 'hawkes', 'data_so', and 'self-correcting' in this repository are truncated from the original data because of upload limits and long training times. You may train on the data in this repository with more epochs to get results similar to those below. The original datasets can be found in this page and here
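The datasets ship as pickle files. Assuming they follow the layout of Hongyuan Mei's released data, where each file holds a dict of sequences and each event carries a type index and a timestamp, loading looks roughly like this. The key names ('train', 'type_event', 'time_since_start') are my assumption about that format and should be checked against the actual files:

```python
import pickle

# Build a toy file in the assumed layout, then read it back the way a
# training script would. This is an illustration, not the repo's loader.
toy = {"train": [
    [{"type_event": 0, "time_since_start": 0.5},
     {"type_event": 2, "time_since_start": 1.25}],
]}
with open("toy.pkl", "wb") as f:
    pickle.dump(toy, f)

with open("toy.pkl", "rb") as f:
    data = pickle.load(f)

first_seq = data["train"][0]
types = [e["type_event"] for e in first_seq]
times = [e["time_since_start"] for e in first_seq]
print(types, times)  # [0, 2] [0.5, 1.25]
```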
The first test calculates the average log-likelihood of events in the test file of "data_conttime" and compares it with the results in Hongyuan Mei's paper. The model is trained with lr = 0.01, epochs = 30, mini-batch size = 10. Test results:
Metric | Model Result | Result in Paper |
---|---|---|
log-likelihood over seqs | -0.99 | -1.00 to -0.98 |
log-likelihood over time | 0.447 | 0.440 to 0.455 |
log-likelihood over type | -1.44 | -1.44 to -1.43 |
We use this test to verify that our PyTorch implementation matches the Neural Hawkes model described in Hongyuan Mei's paper.
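The three rows in the table are related by an exact decomposition: the per-sequence log-likelihood splits into a time part and a type part. A numpy sketch with made-up stand-in values (not model output) shows the bookkeeping; `lam[i, k]` plays the role of the intensity of type k at the i-th event time, and the integral term would in practice be a Monte Carlo estimate:

```python
import numpy as np

lam = np.array([[0.2, 0.5, 0.3],     # intensity of each of K=3 types
                [0.1, 0.1, 0.8]])    # at each of the 2 event times
event_types = np.array([1, 2])       # observed types k_i
integral = 1.7                       # stand-in for the integral of total intensity

lam_total = lam.sum(axis=1)          # total intensity at each event time
lam_observed = lam[np.arange(len(event_types)), event_types]

# log-likelihood of the full sequence
ll_seq = np.log(lam_observed).sum() - integral
# "time" part: when events happen, ignoring their types
ll_time = np.log(lam_total).sum() - integral
# "type" part: log-probability of the observed type at each event
ll_type = np.log(lam_observed / lam_total).sum()

# Exact identity: ll_seq == ll_time + ll_type
print(ll_seq, ll_time, ll_type)
```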
- We also test our model on the self-correcting and Hawkes data provided in Du, Nan, et al., "Recurrent Marked Temporal Point Processes." We predict inter-event durations and intensities, and calculate the RMSE between the real inter-event durations and our predictions for events in a test sequence. We also compare the results with the predictions of Nan Du's RMTPP and with the optimal prediction. We train the model for 10 epochs with learning rate = 0.01 and truncated sequence length = 75.
- Result of "hawkes" (the first picture shows results by Neural Hawkes; the second shows results by RMTPP from Du et al.'s paper):
- Result of "self-correcting" (the first picture shows results by Neural Hawkes; the second shows results by RMTPP from Du et al.'s paper):
This test shows that the Neural Hawkes model can match the optimal prediction (the prediction made by the actual equation behind the dataset) for the hawkes and self-correcting data.
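The RMSE figure reported above is the root-mean-square error between predicted and true inter-event durations over a test sequence. A sketch with dummy values (not actual model predictions):

```python
import numpy as np

true_durations = np.array([0.8, 1.2, 0.5, 2.0])   # real d_i = t_i - t_{i-1}
pred_durations = np.array([1.0, 1.0, 0.7, 1.6])   # model's predicted d_i

# Root-mean-square error over the sequence
rmse = np.sqrt(np.mean((pred_durations - true_durations) ** 2))
print(rmse)
```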
The third test measures type-prediction accuracy. We choose two datasets: 'MIMIC-II' and 'SO' (Stack Overflow). For testing, we feed each sequence in the test file, except its last event, to the model trained on 'train.pkl' and compare the model's prediction with the actual type of the last event. We also look at how the loss and type-prediction accuracy change with the number of epochs, and we compare the type-prediction accuracy with that of a PyTorch implementation of RMTPP.
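The error rates in the tables that follow come from this comparison: one held-out last event per test sequence, predicted type versus actual type. A sketch with dummy values:

```python
# One entry per test sequence: the model's predicted type for the held-out
# last event, and the true type of that event (dummy values here).
predicted = [3, 0, 7, 1, 4]
actual    = [3, 2, 7, 1, 0]

# Error rate = fraction of sequences whose last-event type was mispredicted
errors = sum(p != a for p, a in zip(predicted, actual))
error_rate = errors / len(actual)
print(f"error rate: {error_rate:.1%}")  # 2 of 5 wrong -> 40.0%
```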
Model During Training:
Testing Results on MIMIC-II:
Dataset | (# epochs, lr) Error by Neural Hawkes | (# epochs, lr) Error by RMTPP |
---|---|---|
data_mimic1 | (200, 0.001) 10.8% | (700, 0.0005) 20% |
data_mimic2 | (300, 0.001) 16.9% | (900, 0.0005) 38.5% |
data_mimic3 | (200, 0.001) 16.9% | (900, 0.0005) 32.3% |
data_mimic4 | (200, 0.001) 20% | (2000, 0.0002) 36.9% |
data_mimic5 | (200, 0.001) 9.2% | (2000, 0.0002) 35.4% |
MIMIC-II Average | (---, ----)14.76% | (---, ----)32.62% |
Testing Results on SO dataset:
Dataset | (# epochs, lr) Error by Neural Hawkes |
---|---|
data_so1 | (20, 0.01) 62% |
data_so2 | (20, 0.01) 61.5% |
data_so3 | (20, 0.01) 59.5% |
data_so4 | (20, 0.01) 63% |
data_so5 | (20, 0.01) 62.3% |
average | (--, ---) 61.66% |
Type prediction achieves a lower error rate on the MIMIC-II dataset than on the Stack Overflow dataset. This may be caused by the simpler type sequences in MIMIC-II: event types within a single MIMIC-II sequence seldom change. The following are sample printouts of a sequence from MIMIC-II and from SO:
Sample MIMIC-II Sequence:
Sample SO Sequence:
Thus, the better type prediction on the MIMIC-II dataset may be caused by the recurrence of the same event type in each sequence.
This model was built by Hongrui Lyu, supervised by Hyunouk Ko and Dr. Huo. The file cont-time-cell is a copy from Hongyuan Mei's code, but all other files were written by us. As noted on the original GitHub page of the PyTorch implementation, this license needs to be included.