Rainicy/survival


Survival Analysis

A Java Survival Analysis Library

This package depends heavily on the Pyramid package (version 0.8.4), a Java machine learning library.

This library includes the following algorithms (the list is still being updated):

The package implements the Cox proportional hazards (Cox PH) model, in both linear and non-linear forms, as described in the paper below, particularly in its Supplementary Information.

The impact of long-term PM2.5 exposure on specific causes of death: exposure-response curves and effect modification among 53 million U.S. Medicare beneficiaries

Bingyu Wang, Ki-Do Eum, Fatemeh Kazemiparkouhi, Cheng Li, Justin Manjourides, Virgil Pavlu & Helen Suh

Environ Health (2020)

To cite this article

Wang, B., Eum, K., Kazemiparkouhi, F. et al. The impact of long-term PM2.5 exposure on specific causes of death: exposure-response curves and effect modification among 53 million U.S. Medicare beneficiaries. Environ Health 19, 20 (2020). https://doi.org/10.1186/s12940-020-00575-0

Requirements

[Application Only] If you just want to use survival as a tool on your dataset, you need Java 8.

[Development] If you need to modify the code and extend it for your own application, you need Java 8 and Maven.

Setup

To run survival as an application, download the latest pre-compiled package (with a name like survival-x.x.x.zip) and unzip it. Go to the unzipped folder, and you are ready to go.

Cox PH Linear Model Application

Data Sample

A sample data file is located at sample_data/test.csv and has the following format:

location,date,pm,age,sex,race,total,dead,ratiodead
1,1,12.7045,71,0,1,2,0,0
1,1,12.7045,73,0,1,1,0,0
1,1,12.7045,78,0,1,1,0,0
1,1,12.7045,85,1,1,1,0,0
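
As a quick sanity check on a file in this format, the sketch below sums the total and dead columns. It is an illustration only and not part of the survival library: the class SampleDataSummary and its method are hypothetical names, and it assumes that total and dead hold integer counts, as they do in the sample rows.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative helper, not part of the survival library. */
final class SampleDataSummary {

    /**
     * Sums the "total" and "dead" columns of comma-separated lines.
     * The first line must be the header. Returns {sumTotal, sumDead}.
     */
    static long[] sumTotalAndDead(List<String> lines) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("no header line");
        }
        List<String> header = Arrays.asList(lines.get(0).trim().split(","));
        int totalIdx = header.indexOf("total");
        int deadIdx = header.indexOf("dead");
        if (totalIdx < 0 || deadIdx < 0) {
            throw new IllegalArgumentException("missing total or dead column");
        }
        long sumTotal = 0;
        long sumDead = 0;
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.trim().split(",");
            // Assumption: both columns are integer counts.
            sumTotal += Long.parseLong(fields[totalIdx].trim());
            sumDead += Long.parseLong(fields[deadIdx].trim());
        }
        return new long[] {sumTotal, sumDead};
    }

    public static void main(String[] args) {
        // The four sample rows shown above.
        List<String> sample = Arrays.asList(
                "location,date,pm,age,sex,race,total,dead,ratiodead",
                "1,1,12.7045,71,0,1,2,0,0",
                "1,1,12.7045,73,0,1,1,0,0",
                "1,1,12.7045,78,0,1,1,0,0",
                "1,1,12.7045,85,1,1,1,0,0");
        long[] sums = sumTotalAndDead(sample);
        System.out.println("death/total: " + sums[1] + "/" + sums[0]);
    }
}
```

For the four sample rows this prints death/total: 0/5.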

Config file

A Cox PH config file for training the model is included in config/coxph. You need to set data_path, and the predictors, stratas, death and life column names, according to your data format.

# data path
data_path=sample_data/test.csv

# loader settings
# separated symbol for the given data
sep=,
# the predictors are included in the model
predictors=pm
# stratas
stratas=age,sex,race
# death
death=dead
# life
life=total

# optimizer settings
# whether considering ties for death, default false
isTies=false
# if given multiple cores, set isParallelism as true
isParallelism=true
# whether penalizing the weight
l2=false
# if penalizing the weight, set the strength, otherwise it is useless.
l2Strength=1

# the application class name
survival.class=CoxPH
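
The config file uses plain key=value lines with # comments, which is the format that java.util.Properties reads. The sketch below shows one way to inspect such a file before training. It is an assumption that this mirrors how the library itself parses the config; the class CoxPhConfigSketch and its methods are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

/** Illustrative helper, not part of the survival library. */
final class CoxPhConfigSketch {

    /** Parses key=value text; lines starting with '#' are comments. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Splits a comma-separated value such as "age,sex,race" into a list. */
    static List<String> list(Properties props, String key) {
        String value = props.getProperty(key, "").trim();
        if (value.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(value.split("\\s*,\\s*"));
    }

    public static void main(String[] args) {
        String config =
                "# data path\n"
                + "data_path=sample_data/test.csv\n"
                + "sep=,\n"
                + "predictors=pm\n"
                + "stratas=age,sex,race\n"
                + "death=dead\n"
                + "life=total\n"
                + "isTies=false\n"
                + "l2=false\n"
                + "l2Strength=1\n";
        Properties props = parse(config);
        System.out.println("predictors: " + list(props, "predictors"));
        System.out.println("stratas:    " + list(props, "stratas"));
        System.out.println("isTies:     " + Boolean.parseBoolean(props.getProperty("isTies")));
        System.out.println("l2Strength: " + Double.parseDouble(props.getProperty("l2Strength")));
    }
}
```

A check like this can catch a misspelled column name before a long training run starts.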

Train Cox PH linear model

Once the data and config file are ready, run the following command to train a Cox PH linear model, e.g.

./survival config/coxph

Here, survival is a launcher script for the Cox PH linear model application, and config/coxph is the configuration file specifying the data path, data format and algorithm setup.

Note: if you run out of memory, you probably need a larger machine and a larger memory allocation in the survival file, e.g. JAVA_OPTS="-Xms20g -Xmx200g".

Estimated Result

Once the model has been trained in the previous step, the console prints the estimated coefficients and standard errors. Example output:

done with loading data...
updating death
done with updating death
death/total: 2965/1011945
final average negative log likelihood: 0.027289124009878832
number of features: 1
Negative Log-likelihood History:
[27664.565091730812, 27615.097594555747, 27615.092596347815, 27615.092596176833, 27615.092596176833, 27615.092596176833, 27615.092596176833, 27615.092596176833]
Means
[11.094297994752171]
Var
0       [7.677454251439562E-5]

Results
predictor       coefficient     standard_error
pm              0.011026        0.008762
Training time: 19 sec.

Here the estimated predictor is pm, the estimated coefficient is 0.011026 and the standard error (SE) is 0.008762.
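
A coefficient and its SE are often easier to read as a hazard ratio with a confidence interval. The sketch below does that conversion for the sample output, assuming the coefficient is on the log-hazard scale (the usual Cox PH convention) and using a normal approximation for the 95% interval. It also checks two relationships that hold in the sample log: the reported SE equals the square root of the printed Var value, and the final average negative log-likelihood equals the last history value divided by the total count. The class CoxPhResultSketch is a hypothetical name, not part of the library.

```java
/** Illustrative helper, not part of the survival library. */
final class CoxPhResultSketch {

    /** 97.5th percentile of the standard normal distribution. */
    static final double Z_95 = 1.959963984540054;

    /** Hazard ratio for an increase of delta units in the predictor. */
    static double hazardRatio(double coefficient, double delta) {
        return Math.exp(coefficient * delta);
    }

    /** Approximate 95% interval {lower, upper} for the hazard ratio; delta must be positive. */
    static double[] confidenceInterval(double coefficient, double se, double delta) {
        double lower = Math.exp((coefficient - Z_95 * se) * delta);
        double upper = Math.exp((coefficient + Z_95 * se) * delta);
        return new double[] {lower, upper};
    }

    public static void main(String[] args) {
        // Values copied from the sample console output above.
        double coefficient = 0.011026;
        double se = 0.008762;
        double variance = 7.677454251439562E-5;
        double finalNll = 27615.092596176833;
        long total = 1011945L;

        double[] ci = confidenceInterval(coefficient, se, 1.0);
        System.out.println(String.format("HR per 1 unit of pm: %.6f", hazardRatio(coefficient, 1.0)));
        System.out.println(String.format("95%% CI: [%.6f, %.6f]", ci[0], ci[1]));
        System.out.println(String.format("sqrt(Var): %.6f", Math.sqrt(variance)));
        System.out.println("final NLL / total: " + (finalNll / total));
    }
}
```

For the sample output the hazard ratio is about 1.011 per unit of pm, and the interval includes 1. Pass a different delta (a hypothetical parameter of this sketch) to report the ratio for a larger increment.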

Restricted Cubic Spline Cox PH non-Linear Model Application

Data Sample

Same data format as above.

Config file

A Restricted Cubic Spline (RCS) Cox PH config file for training the model is included in config/rcscoxph. You need to set data_path, and the predictors, stratas, death and life column names, according to your data format.

Pay attention to the following two new variables:

  • knots: the knot values for the predictor. E.g. if the number of knots is 3, the knots are located at the 0.1, 0.5 and 0.9 quantiles of the predictor.

  • norm: normalization option (default=2) for the cubic spline.
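
Knot values at the 0.1, 0.5 and 0.9 quantiles can be computed from the predictor column before writing the config. The sketch below is one way to do it, using linear interpolation between order statistics. The README does not say which quantile convention was used to produce the knots in the sample config, so this convention is an assumption, and the class KnotQuantileSketch, its methods and the predictor values in main are hypothetical.

```java
import java.util.Arrays;

/** Illustrative helper, not part of the survival library. */
final class KnotQuantileSketch {

    /** Quantile of an ascending-sorted array, interpolating linearly between neighbours. */
    static double quantile(double[] sorted, double p) {
        if (sorted.length == 0 || p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("need data and 0 <= p <= 1");
        }
        double h = (sorted.length - 1) * p;
        int lo = (int) Math.floor(h);
        int hi = Math.min(lo + 1, sorted.length - 1);
        return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
    }

    /** Knot locations at the given quantiles of the (unsorted) predictor values. */
    static double[] knots(double[] values, double[] probs) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double[] result = new double[probs.length];
        for (int i = 0; i < probs.length; i++) {
            result[i] = quantile(sorted, probs[i]);
        }
        return result;
    }

    /** Formats the knots as a "knots=..." config line. */
    static String toConfigLine(double[] knots) {
        StringBuilder sb = new StringBuilder("knots=");
        for (int i = 0; i < knots.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(knots[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up predictor values, for illustration only.
        double[] pm = {12.7, 9.1, 10.4, 8.2, 14.9, 11.0, 13.3, 7.8, 10.9, 12.1, 9.6};
        double[] probs = {0.1, 0.5, 0.9};
        System.out.println(toConfigLine(knots(pm, probs)));
    }
}
```

On a large dataset the choice of quantile convention changes the knots only slightly, but it is worth keeping the same convention across runs that are meant to be compared.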

# data path
data_path=sample_data/test.csv

# loader settings
# separated symbol for the given data
sep=,
# the predictors are included in the model
predictors=pm
# stratas
stratas=age,sex,race
# death
death=dead
# life
life=total

# optimizer settings
# whether considering ties for death, default false
isTies=false
# if given multiple cores, set isParallelism as true
isParallelism=true
# whether penalizing the weight
l2=false
# if penalizing the weight, set the strength, otherwise it is useless.
l2Strength=1

# knots for restricted cubic splines
knots=8.490129,11.025768,13.755482
# normalization options (1 or 2) for the knots, default value 2.
norm=2

# the application class name
survival.class=RCSCoxPH
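
To give a sense of what the knots setting controls, the sketch below evaluates a restricted cubic spline basis in its common textbook form (as popularized by Harrell): a linear term plus k-2 truncated-cubic terms that are constrained to be linear beyond the outer knots. It assumes that norm=2 means dividing the non-linear terms by the square of the distance between the first and last knots; the README does not define norm, so both the formula and this reading are assumptions rather than a description of the library's code. The class RcsBasisSketch is a hypothetical name.

```java
/** Illustrative helper, not part of the survival library. */
final class RcsBasisSketch {

    /** Truncated cube: u^3 if u is positive, otherwise 0. */
    static double cubePlus(double u) {
        return u > 0.0 ? u * u * u : 0.0;
    }

    /**
     * Evaluates the basis at x for ascending knots t (at least 3).
     * Returns k-1 values: x itself, followed by k-2 non-linear terms.
     */
    static double[] basis(double x, double[] t) {
        int k = t.length;
        if (k < 3) {
            throw new IllegalArgumentException("need at least 3 knots");
        }
        double last = t[k - 1];
        double secondLast = t[k - 2];
        double gap = last - secondLast;
        // Assumed meaning of norm=2: scale by (last knot - first knot)^2.
        double scale = (last - t[0]) * (last - t[0]);
        double[] out = new double[k - 1];
        out[0] = x;
        for (int j = 0; j < k - 2; j++) {
            double term = cubePlus(x - t[j])
                    - cubePlus(x - secondLast) * (last - t[j]) / gap
                    + cubePlus(x - last) * (secondLast - t[j]) / gap;
            out[j + 1] = term / scale;
        }
        return out;
    }

    public static void main(String[] args) {
        // Knots from the sample config above.
        double[] knots = {8.490129, 11.025768, 13.755482};
        for (double x : new double[] {8.0, 10.0, 12.0, 14.0}) {
            double[] b = basis(x, knots);
            System.out.println("x=" + x + " linear=" + b[0] + " nonlinear=" + b[1]);
        }
    }
}
```

With 3 knots the basis has two columns, so under this formulation the model would estimate two coefficients for pm instead of one.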

Train RCS Cox PH non-linear model

Once the data and config file are ready, run the following command to train an RCS Cox PH non-linear model, e.g.

./survival config/rcscoxph

For Developer

If you are a Java developer who prefers working with the source code, you can clone this repository to your local machine (Maven is required).

mvn install the prerequisite package

  • Download pyramid-x.x.x.jar (v0.8.4) to your local machine; suppose the jar file is located at path_to_pyramid/pyramid-0.8.4.jar.

  • Install the jar into your local Maven repository by running

mvn install:install-file -Dfile=path_to_pyramid/pyramid-0.8.4.jar -DgroupId=edu.neu.ccs.pyramid -DartifactId=pyramid -Dversion=0.8.4 -Dpackaging=jar

mvn compile the survival package

  • In order to compile the package from the survival source code (in the cloned survival folder), simply run:

mvn clean package

The compressed package will be created under the target/releases directory.