Feature-Selection

This repository contains R code for various feature selection methods. The code is organized as outlined below.

Model Fits & GOF: Includes functions for fitting the Cox, PO & YP models, as well as testing for goodness-of-fit. For the Cox and PO cases, there are options for adjusting for age and stage, if desired.
- Required: R packages (survival, timereg, YPmodel)
- Inputs:
  - predictor, x (gene expression)
  - data frame with n rows (number of subjects) and columns with survival information (time = survival time, censor = censoring information) and p genes
- Output: model coefficient (beta), coefficient standard error, significance p-value, GOF p-value
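
As a rough illustration of the inputs and outputs listed above, the sketch below fits a univariate Cox model for a single gene with the survival package. The helper name `fit_cox_gene` is hypothetical and is not one of the repository's functions. The goodness-of-fit step uses `cox.zph` (a Schoenfeld-residual test), which is an assumption and may differ from the test implemented in the repository. The toy data also assume that censor = 1 marks an observed event.

```r
library(survival)

# Hypothetical helper (not the repository's function): univariate Cox fit for
# one gene, returning the quantities listed under "Output".
fit_cox_gene <- function(time, censor, x) {
  d   <- data.frame(time = time, censor = censor, x = x)
  fit <- coxph(Surv(time, censor) ~ x, data = d)
  est <- summary(fit)$coefficients   # one row: coef, exp(coef), se(coef), z, Pr(>|z|)
  zph <- cox.zph(fit)                # assumed GOF check: test of proportional hazards
  c(beta    = unname(est[1, "coef"]),
    se      = unname(est[1, "se(coef)"]),
    p_value = unname(est[1, "Pr(>|z|)"]),
    gof_p   = unname(zph$table[1, "p"]))
}

# Toy data; censor = 1 is assumed to mean "event observed".
set.seed(1)
n          <- 200
x          <- rnorm(n)
event_time <- rexp(n, rate = exp(0.5 * x))
cens_time  <- rexp(n, rate = 0.5)
res <- fit_cox_gene(pmin(event_time, cens_time),
                    as.integer(event_time <= cens_time), x)
print(round(res, 4))
```

The PO and YP fits rely on timereg and YPmodel and are not shown here.
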

Pseudo-R² Measures: Includes functions for calculating each pseudo-R² measure (PO, CO, CH, ModCH and PH). Note that each measure currently has its own function. In the future, we plan to combine the code into one R² function with option type = c("PO", "CO", "CH", "ModCH", "PH").
- Required: R package (survival)
- Inputs:
  - predictor, x (gene expression)
  - survival time
  - censoring indicator
- Output: R² measure
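
The planned single-function interface could look something like the sketch below. This is not the repository's code: `pseudo_r2` and the `measures` argument are hypothetical names, and the per-measure functions are stand-ins that you would replace with the separate functions provided here. It only shows how `match.arg` can dispatch on `type`.

```r
# Sketch only. `measures` is a hypothetical named list holding one function per
# measure, each taking (x, time, censor) and returning a single number.
pseudo_r2 <- function(x, time, censor,
                      type = c("PO", "CO", "CH", "ModCH", "PH"),
                      measures) {
  type <- match.arg(type)            # rejects anything outside the five types
  f <- measures[[type]]
  if (is.null(f)) stop("no function supplied for type = ", type)
  f(x, time, censor)
}

# Usage with a placeholder function (returns a fixed number, for illustration).
stub <- list(CO = function(x, time, censor) 0.25)
r2_co <- pseudo_r2(x = rnorm(5), time = rexp(5), censor = rep(1, 5),
                   type = "CO", measures = stub)
print(r2_co)
```
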

R²_LR & R²_I Measures: This code contains functions for R²_LR, R²_{I_PO} & R²_{I_PH}. There are options for adjusting for age and stage, if desired.
- Required: R packages (survival, timereg)
- Inputs:
  - predictor, x (gene expression)
  - survival time
  - censoring indicator
- Output: R² measures
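
For orientation, one widely used likelihood-ratio R² has the form 1 - exp(-LR/n), where LR is the likelihood ratio statistic against the null model. This README does not state which form R²_LR takes in this repository, so the sketch below is an assumption made for illustration only. It uses a Cox model from the survival package, and `r2_lr_cox` is a hypothetical name.

```r
library(survival)

# Assumed form (not confirmed by this README): R2 = 1 - exp(-LR / n),
# with LR taken from a univariate Cox model.
r2_lr_cox <- function(time, censor, x) {
  d   <- data.frame(time = time, censor = censor, x = x)
  fit <- coxph(Surv(time, censor) ~ x, data = d)
  lr  <- 2 * (fit$loglik[2] - fit$loglik[1])  # fitted vs. null partial log-likelihood
  1 - exp(-lr / fit$n)
}

# Toy data; censor = 1 is assumed to mean "event observed".
set.seed(5)
n          <- 200
x          <- rnorm(n)
event_time <- rexp(n, rate = exp(0.8 * x))
cens_time  <- rexp(n, rate = 0.5)
r2 <- r2_lr_cox(pmin(event_time, cens_time),
                as.integer(event_time <= cens_time), x)
print(r2)
```
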

I Measures: This code contains functions for I_PO and I_YP. There are options for adjusting for age and stage for I_PO, if desired.
- Required: R packages (survival, timereg, YPmodel)
- This code assumes that your data is in the following form:
  - Column 1 = survival time (time)
  - Column 2 = censoring indicator (censor)
  - Columns 3+ = genes
- Output:
  - I_PO, outPO (I, I test statistic, I p-value)
  - I_YP, outYP (I, I test statistic, I p-value)
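
The column layout described above can be built and checked as in the sketch below. The gene names, the toy values and the helper `per_gene` are hypothetical. The per-gene function used here is a plain Cox coefficient, standing in for I_PO or I_YP, which are not reproduced.

```r
library(survival)

# Toy data in the expected layout: time, censor, then one column per gene.
set.seed(2)
n <- 100
p <- 5
genes <- matrix(rnorm(n * p), n, p,
                dimnames = list(NULL, paste0("gene", seq_len(p))))
dat <- data.frame(time = rexp(n), censor = rbinom(n, 1, 0.7), genes)

# Check the layout before running anything gene by gene.
stopifnot(identical(names(dat)[1:2], c("time", "censor")), ncol(dat) >= 3)

# Hypothetical helper: apply a function f(x, time, censor) to columns 3+.
per_gene <- function(dat, f) {
  vapply(dat[, -(1:2), drop = FALSE],
         function(x) f(x, dat$time, dat$censor),
         numeric(1))
}

# Stand-in measure: the univariate Cox coefficient for each gene.
cox_beta <- function(x, time, censor) unname(coef(coxph(Surv(time, censor) ~ x)))
betas <- per_gene(dat, cox_beta)
print(round(betas, 3))
```
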

Youden & AUC: Computes Youden & AUC values based on gene ranking by a specified feature selection method.
- Required: R package (MESS)
- Inputs:
  - data frame with n rows (number of genes) and 1 column (feature selection measure)
  - effectGenes: number of significant genes
- Output option: Specificity, Sensitivity, Youden & AUC
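
The sketch below shows one way such quantities can be computed from a ranking, in base R. It is not the repository's implementation. It assumes that larger values of the measure indicate stronger genes and that the truly significant genes are known (as in a simulation) and passed as a logical vector. `rank_performance` and `is_effect` are hypothetical names. Ties in the scores are broken by their original order.

```r
# Hypothetical helper: ROC-style summary of a gene ranking.
# score     : one feature selection value per gene (larger = stronger, assumed)
# is_effect : TRUE for genes with a real effect (assumed known, e.g. simulated)
rank_performance <- function(score, is_effect) {
  truth <- is_effect[order(score, decreasing = TRUE)]
  # Counts when the top k genes are selected, for k = 0, 1, ..., n
  tp   <- c(0, cumsum(truth))
  fp   <- c(0, cumsum(!truth))
  sens <- tp / sum(truth)
  spec <- 1 - fp / sum(!truth)
  youden <- sens + spec - 1
  # Trapezoidal area under the (1 - specificity, sensitivity) curve
  fpr <- 1 - spec
  auc <- sum(diff(fpr) * (head(sens, -1) + tail(sens, -1)) / 2)
  list(sensitivity = sens, specificity = spec,
       youden = youden, max_youden = max(youden), auc = auc)
}

# A ranking that places both effect genes first.
perf <- rank_performance(score = c(5, 4, 3, 2, 1),
                         is_effect = c(TRUE, TRUE, FALSE, FALSE, FALSE))
print(perf$auc)
print(perf$max_youden)
```

The same trapezoidal area can be obtained with `MESS::auc(fpr, sens)` if you prefer to use the MESS package listed above.
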

Venn Diagrams: This code creates Venn diagrams showing various intersections between different feature selection measures.
- Required: R packages (gplots, VennDiagram, latex2exp)
- Inputs:
  - Obtain a data frame named "out" with n rows (number of genes) and 1 column for each of the measures listed above (except I_YP). All measures here are based on continuous gene expression.
  - Obtain a data frame named "out2" with n rows (number of genes) and 1 column for each of the measures based on dichotomized expression (I_YP, I_PO, concreg).
- Venn Diagrams created:
  - I_PO, I_YP, & concreg (dichotomized case is coded separately)
  - R²_PO, R²_{I_PO} & R²_LR
  - R²_PO, R²_ModCH & R²_CO
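
A minimal sketch of the idea, assuming that each measure "selects" its k highest-ranked genes (the selection rule used in the repository is not stated here). The data frame below is a random stand-in for "out", and its column names are hypothetical. The plotting step runs only if VennDiagram is installed.

```r
# Random stand-in for the "out" data frame: one row per gene, one column per measure.
set.seed(3)
out <- data.frame(R2_PO  = runif(50),
                  R2_IPO = runif(50),
                  R2_LR  = runif(50),
                  row.names = paste0("gene", 1:50))

# Assumption: each measure selects its k highest-ranked genes.
k <- 10
top_sets <- lapply(out, function(m) {
  rownames(out)[order(m, decreasing = TRUE)[seq_len(k)]]
})

# Size of the three-way overlap, without any plotting package.
n_common <- length(Reduce(intersect, top_sets))
print(n_common)

# Draw the diagram on the current device when VennDiagram is available.
if (requireNamespace("VennDiagram", quietly = TRUE)) {
  v <- VennDiagram::venn.diagram(x = top_sets, filename = NULL)
  grid::grid.newpage()
  grid::grid.draw(v)
}
```
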

Other Existing Measures: This code contains functions for computing some existing measures in the literature.
- Required: R packages (concreg, survAUC, survival, pec, timereg)
- Measures computed:
  - Concreg (Dunkler et al. 2010)
  - Uno's C (Uno et al. 2011)
  - R²_G - PH & PO cases (Graf et al. 1999; Gerds & Schumacher 2006)
  - R²_SH (Schemper & Henderson 2000)
- Inputs:
  - predictor, x (gene expression)
  - survival time
  - censoring indicator
- Output: Concreg (absolute effect size), Uno's C, R²_G (PH), R²_G (PO) and R²_SH.
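
As an illustration of one of these measures, the sketch below computes Uno's C with `survAUC::UnoC` from a univariate Cox linear predictor. It assumes the same data are used for fitting and evaluation and that censor = 1 marks an observed event, which may not match the repository's code. The call runs only if survAUC is installed. Harrell's C from the survival package is printed alongside for comparison and is not one of the measures listed above.

```r
library(survival)

# Toy data; censor = 1 is assumed to mean "event observed".
set.seed(4)
n          <- 150
x          <- rnorm(n)
event_time <- rexp(n, rate = exp(0.7 * x))
cens_time  <- rexp(n, rate = 0.4)
d <- data.frame(time   = pmin(event_time, cens_time),
                censor = as.integer(event_time <= cens_time),
                x      = x)

fit <- coxph(Surv(time, censor) ~ x, data = d)
lp  <- predict(fit, type = "lp")       # linear predictor used as the risk score

# Uno's C, using the same data as "training" and "new" response (an assumption).
uno_c <- NA_real_
if (requireNamespace("survAUC", quietly = TRUE)) {
  resp  <- Surv(d$time, d$censor)
  uno_c <- survAUC::UnoC(resp, resp, lp)
}

# Harrell's C, for comparison only.
harrell_c <- unname(summary(fit)$concordance[1])
print(c(uno = uno_c, harrell = harrell_c))
```
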

Simulations: Contains R code for simulating data.
- Scheme 1: Univariate approach; genes linked to survival one at a time
- Scheme 2: Multivariate approach; incorporates correlation between features
- For both schemes, there are options to simulate from the following models: LN, LL1, LL2, W1, W2
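
To give a sense of what a univariate scheme involves, the sketch below simulates one gene and a Weibull survival time by inverting the survival function, then applies independent exponential censoring. The function name, parameter values and censoring mechanism are illustrative assumptions; they are not the W1 or W2 settings used in this repository. censor = 1 is assumed to mark an observed event.

```r
# Hypothetical sketch of a univariate Weibull simulation (not the repository's code).
# Survival function: S(t | x) = exp(-scale * exp(beta * x) * t^shape)
simulate_weibull_gene <- function(n, beta, shape = 1.5, scale = 0.1,
                                  cens_rate = 0.05) {
  x <- rnorm(n)                                   # gene expression
  u <- runif(n)
  # Inverse-CDF draw from the Weibull model above
  t_event <- (-log(u) / (scale * exp(beta * x)))^(1 / shape)
  t_cens  <- rexp(n, rate = cens_rate)            # independent censoring times
  data.frame(time   = pmin(t_event, t_cens),
             censor = as.integer(t_event <= t_cens),  # 1 = event observed (assumed)
             x      = x)
}

set.seed(6)
sim <- simulate_weibull_gene(n = 500, beta = 0.5)
print(mean(sim$censor == 0))   # observed censoring proportion
```

Changing `cens_rate` moves the censoring proportion up or down, so it can be tuned toward a target level.
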

Complete Example: This example does the following:
1. Creates a simulated data set (Scheme 1 - W, 33% censoring)
2. Obtains PH, PO & YP model fits and GOF
3. Computes all proposed measures for feature selection:
  - I measures - I_PO, I_YP
  - R² measures - R²_LR, R²_{I_PO}, R²_{I_PH}, R²_PO, R²_CO, R²_ModCH
  - R² measures (by Rouam et al. 2010, 2011) - R²_CH, R²_PH
4. Computes other existing measures (concreg, Uno's C, R²_G & R²_SH)
5. Computes Sensitivity, Specificity, Youden & AUC for each measure.
6. Creates Venn diagrams showing overlaps between measures.
- Note: Before running this example, some functions need to be run from the other R code in this repository. All required functions are noted throughout the example.
- Required: R packages (survival, timereg, YPmodel, concreg, survAUC, pec, MESS, gplots, VennDiagram, latex2exp)
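
Before running the example, it can help to check which of the required packages are already installed. The snippet below is a convenience sketch and not part of the repository. The install line is left commented out because some packages may not be available from your configured repositories.

```r
# Packages listed as required for the complete example.
required <- c("survival", "timereg", "YPmodel", "concreg", "survAUC",
              "pec", "MESS", "gplots", "VennDiagram", "latex2exp")

# TRUE for each package that can be loaded on this machine.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- required[!installed]
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to try installing them
}
```
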

Copyright & Citations

Spirko, L.N., Devarajan, K. Unified methods for variable selection in large-scale genomic studies with censored survival outcomes. Under review. COBRA pre-print series, Article 120 (June 2019). http://biostats.bepress.com/cobra/art120.

Spirko, L. (2017). Variable Selection and Supervised Dimension Reduction for Large-Scale Genomic Data with Censored Survival Outcomes. Ph.D. Dissertation. Department of Statistical Science, Temple University, Philadelphia.

License

This work by Lauren Spirko-Burns and Karthik Devarajan is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
