[Question] Cox model for ultra-high dimensional data #457

Open
EQUIWDH opened this issue Nov 15, 2022 · 2 comments
Labels
help wanted Extra attention is needed

Comments

@EQUIWDH

EQUIWDH commented Nov 15, 2022

Hello, I am running a real-data analysis with a high-dimensional Cox model. My dataset has shape 240 × 7000. When I use abess.CoxPHSurvivalAnalysis() with cv, it does not select any features, so I have to apply screening before running abess for the Cox model. I also ran a simulation that tests only the screening method in the abess package, and found that the screened set does not contain all of the true features generated by make_glm_data. I therefore doubt the screening algorithm in this package and hope you can improve it. Thank you!

@Mamba413
Collaborator

Can you provide minimal code to reproduce your report? Also, are your results consistent with this paper: Principled sure independence screening for Cox models with ultra-high-dimensional covariates?
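For context, the general idea behind marginal (independence) screening is to rank each covariate by a statistic computed from that covariate alone and keep the top-ranked ones. The following is a minimal NumPy sketch of one such ranking, using the standardised score statistic of each univariate Cox partial likelihood at beta = 0. It is an illustrative baseline only: it is not the screening implemented in abess, nor the principled procedure from the paper cited above. The function names `marginal_cox_scores` and `screen_top` are hypothetical, and the sketch assumes there are no tied survival times.

```python
import numpy as np


def marginal_cox_scores(x, time, event):
    """Standardised univariate Cox score statistic at beta = 0, one per column of x.

    Assumes no tied survival times (illustrative baseline, not the abess method).
    """
    x = np.asarray(x, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)

    # Sort by decreasing time so the risk set of row r is rows 0..r.
    order = np.argsort(-time)
    xs, ev = x[order], event[order]

    # Running mean and variance of each covariate over the risk set.
    n_at_risk = np.arange(1, len(time) + 1)[:, None]
    risk_mean = np.cumsum(xs, axis=0) / n_at_risk
    risk_var = np.cumsum(xs ** 2, axis=0) / n_at_risk - risk_mean ** 2

    # Score and information are summed over observed events only.
    score = (xs[ev] - risk_mean[ev]).sum(axis=0)
    info = risk_var[ev].sum(axis=0)
    return score / np.sqrt(np.maximum(info, 1e-12))


def screen_top(x, time, event, size):
    """Return the sorted indices of the `size` covariates with the largest |score|."""
    z = np.abs(marginal_cox_scores(x, time, event))
    return np.sort(np.argsort(-z)[:size])
```

Running the same simulated data through a baseline like this and through the package's screening gives a reference point for judging whether missed true features reflect a problem in the package or the difficulty of the setting (for example, correlated covariates with rho = 0.5).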

@EQUIWDH
Author

EQUIWDH commented Nov 16, 2022

Sorry about that. Here is the simulation code, run in a Jupyter notebook. The screening in the abess package does not perform as well as the screening in the cox-psis paper: the method in the paper retains almost all of the true features regardless of how many features you choose to keep. I have attached the result picture.

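from abess import make_glm_data
from abess import CoxPHSurvivalAnalysis
import numpy as np

# Simulate Cox data: 240 samples, 7000 features, 20 of them truly active.
sim = make_glm_data(n=240, p=7000, k=20, family='cox', rho=0.5, c=60)
# Indices of the true nonzero coefficients.
indice_real = np.flatnonzero(sim.coef_)
print(indice_real)

# max_iter=0 is intended to isolate the screening step: keep 1000 features.
cox = CoxPHSurvivalAnalysis(max_iter=0, screening_size=1000, support_size=1000)
cox.fit(sim.x, sim.y)
# Indices of the features with nonzero fitted coefficients.
indice_sc = np.flatnonzero(cox.coef_)

# True features that are also in the retained set.
inter = np.intersect1d(indice_sc, indice_real)
print(inter)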
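
The script above prints the recovered indices for a single screening size. To make the comparison with the paper concrete, the overlap can be summarised as a recall value and tabulated over several sizes. A minimal sketch follows; `screening_recall` and `recall_by_size` are hypothetical helper names and are not part of abess, and `ranking` is assumed to be an array of feature indices ordered from most to least important by whichever screening method is being evaluated.

```python
import numpy as np


def screening_recall(true_support, kept):
    """Fraction of the true features that appear in the retained set."""
    true_support = np.asarray(true_support)
    if true_support.size == 0:
        raise ValueError("true_support must contain at least one index")
    hits = np.intersect1d(true_support, kept)
    return hits.size / true_support.size


def recall_by_size(ranking, true_support, sizes):
    """Recall obtained when keeping the first `s` entries of `ranking`, for each s."""
    ranking = np.asarray(ranking)
    return {int(s): screening_recall(true_support, ranking[:s]) for s in sizes}
```

With the variables from the script, `screening_recall(indice_real, indice_sc)` gives the share of the 20 true features that survive screening at size 1000.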

[Image: choose]

EQUIWDH changed the title from "some problems about algorithm for cox model when the dimension is ultra-high" to "Some problems about algorithm for cox model when the dimension is ultra-high" on Nov 16, 2022
Mamba413 added the "help wanted" label on Nov 17, 2022
Mamba413 changed the title from "Some problems about algorithm for cox model when the dimension is ultra-high" to "[Question] Cox model for ultra-high dimensional data" on Nov 23, 2023