[BUG] Handling genes with only 1 region in search space #2

Closed
cbravo93 opened this issue Sep 10, 2021 · 0 comments
Labels
bug Something isn't working

cbravo93 commented Sep 10, 2021

  • Information

During region-to-gene link inference, the program stops on certain genes with this error:

Traceback (most recent call last):
  File "/data/leuven/software/biomed/skylake_centos7/2018a/software/Python/3.7.4-GCCcore-6.4.0/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/data/leuven/software/biomed/skylake_centos7/2018a/software/Python/3.7.4-GCCcore-6.4.0/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/lustre1/project/stg_00002/lcb/sdewin/PhD/python_modules/scenicplus/src/scenicplus/cli/benchmarking.py", line 170, in <module>
    main()
  File "/lustre1/project/stg_00002/lcb/sdewin/PhD/python_modules/scenicplus/src/scenicplus/cli/benchmarking.py", line 166, in main
    args.func(args)
  File "/lustre1/project/stg_00002/lcb/sdewin/PhD/python_modules/scenicplus/src/scenicplus/cli/benchmarking.py", line 80, in region_to_gene_command
    _temp_dir = temp_dir
  File "/lustre1/project/stg_00002/lcb/sdewin/PhD/python_modules/scenicplus/src/scenicplus/enhancer_to_gene.py", line 479, in calculate_regions_to_genes_relationships
    **kwargs)
  File "/lustre1/project/stg_00002/lcb/sdewin/PhD/python_modules/scenicplus/src/scenicplus/enhancer_to_gene.py", line 419, in score_regions_to_genes
    regions_to_genes = {gene: regions_to_gene for gene, regions_to_gene in zip(genes_to_use, regions_to_genes)}
UnboundLocalError: local variable 'regions_to_genes' referenced before assignment
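
An UnboundLocalError of this kind usually means the assignment never ran, for example because it sits inside a try whose exception was caught and discarded. A minimal sketch of that failure mode follows. The function names are hypothetical and this is not the actual scenicplus code, only an illustration of how the underlying error can end up hidden.

```python
def score_one(x):
    # Hypothetical worker: fails when it receives a single feature.
    if len(x) == 1:
        raise AssertionError("shape mismatch")
    return sum(x)


def score_all_masked(inputs):
    try:
        results = [score_one(x) for x in inputs]
    except Exception:
        pass  # the underlying error is swallowed here
    # If the try body failed, `results` was never assigned,
    # so this line raises UnboundLocalError instead.
    return dict(enumerate(results))


def score_all_visible(inputs):
    # Without the blanket except, the original error propagates.
    results = [score_one(x) for x in inputs]
    return dict(enumerate(results))
```

Calling `score_all_masked([[1, 2], [3]])` raises UnboundLocalError, while `score_all_visible` with the same input raises the original AssertionError, which is far easier to diagnose.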

After some testing, I realized this happens for genes with only 1 region in the search space (1 feature). The real error is not printed because it is raised inside a try block [https://github.com/aertslab/scenicplus/blob/main/src/scenicplus/enhancer_to_gene.py, lines 399-419], but here is a reproducible test:

import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, ExtraTreesRegressor
RF_KWARGS = {
    'n_jobs': 1,
    'n_estimators': 1000,
    'max_features': 'sqrt'
}
SKLEARN_REGRESSOR_FACTORY = {
    'RF': RandomForestRegressor,
    'ET': ExtraTreesRegressor,
    'GBM': GradientBoostingRegressor
}
SCIPY_CORRELATION_FACTORY = {
    'PR': pearsonr,
    'SR': spearmanr
}
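# imp_acc, dgem and score_regions_to_single_gene are assumed to be already defined in the session
# RUNS (5 regions)
test=score_regions_to_single_gene(imp_acc.mtx[0:5,0:5].todense(), np.array(dgem.iloc[0,0:5]), 'RF', RF_KWARGS)
# ERROR (1 region)
test=score_regions_to_single_gene(imp_acc.mtx[0,0:5].todense(), np.array(dgem.iloc[0,0:5]), 'RF', RF_KWARGS)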

Which outputs:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-40-64b45ad4144d> in <module>
     16     'SR': spearmanr
     17 }
---> 18 test=score_regions_to_single_gene(imp_acc.mtx[0,0:5].todense(), np.array(dgem.iloc[0,0:5]), 'RF', RF_KWARGS)
     19 test

<ipython-input-8-6fcb3afe2133> in score_regions_to_single_gene(X, y, regressor_type, regressor_kwargs)
     16                                                     regressor_kwargs = regressor_kwargs,
     17                                                     tf_matrix = X,
---> 18                                                     target_gene_expression = y)
     19             #get importance scores for each feature
     20             feature_importance = arboreto_core.to_feature_importances(  regressor_type = regressor_type, 

~/.local/lib/python3.7/site-packages/arboreto/core.py in fit_model(regressor_type, regressor_kwargs, tf_matrix, target_gene_expression, early_stop_window_length, seed)
    125         target_gene_expression = target_gene_expression.A.flatten()
    126 
--> 127     assert tf_matrix.shape[0] == target_gene_expression.shape[0]
    128 
    129 

AssertionError: 
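
The assertion compares the number of rows in the feature matrix with the length of the expression vector. A minimal sketch with synthetic data shows the mismatch, assuming imp_acc.mtx behaves like a SciPy `csr_matrix` with regions as rows and cells as columns (both are assumptions; the real object is not shown in this issue):

```python
import numpy as np
from scipy import sparse

# Synthetic stand-in for imp_acc.mtx: 10 regions (rows) x 5 cells (columns).
acc = sparse.csr_matrix(np.arange(50, dtype=float).reshape(10, 5))
# Synthetic stand-in for one gene's expression across the same 5 cells.
y = np.arange(5, dtype=float)

# An integer row index on a csr_matrix keeps the result 2-D.
X_one = acc[0, 0:5].todense()      # one row, five columns
X_one_T = acc[0, 0:5].T.todense()  # five rows, one column

print("single region:", X_one.shape, "target:", y.shape)
print("transposed:   ", X_one_T.shape, "target:", y.shape)
```

With a single region the matrix has one row but the target has five entries, so `tf_matrix.shape[0] == target_gene_expression.shape[0]` is false. After transposing, the row count matches the number of cells.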
  • Solution
    Adding a transposition when there is only one region solves it (returns an importance of 1):
#ERROR
test=score_regions_to_single_gene(imp_acc.mtx[0,0:5].todense(), np.array(dgem.iloc[0,0:5]), 'RF', RF_KWARGS)
#WORKS
test=score_regions_to_single_gene(imp_acc.mtx[0,0:5].T.todense(), np.array(dgem.iloc[0,0:5]), 'RF', RF_KWARGS)
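
One way to make the orientation explicit before fitting is a small guard like the one below. This is only a sketch under stated assumptions: `as_cells_by_regions` is a hypothetical helper, not part of scenicplus or arboreto, and it assumes the regressor expects one row per cell.

```python
import numpy as np


def as_cells_by_regions(X, n_cells):
    """Return X as a 2-D array with one row per cell (hypothetical helper)."""
    X = np.asarray(X)
    if X.ndim == 1:
        # A single region passed as a flat vector: one column, one row per cell.
        X = X.reshape(-1, 1)
    if X.shape[0] != n_cells and X.shape[1] == n_cells:
        # Rows look like regions; transpose so rows are cells.
        X = X.T
    if X.shape[0] != n_cells:
        raise ValueError(
            f"expected {n_cells} rows (cells), got shape {X.shape}"
        )
    return X


# Single region stored as a 1 x 5 row, as in the failing call above.
single_region = np.arange(5, dtype=float).reshape(1, 5)
fixed = as_cells_by_regions(single_region, n_cells=5)
```

When the matrix is square, the orientation cannot be inferred from the shape alone, so the helper leaves it untouched and the caller still has to pass the data cells-first.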

I will test this further and push if all goes well, but this situation may need to be handled in other steps too (e.g. binarization), so I will keep the issue open until all is tested.

Cheers!

C

cbravo93 added the bug label on Sep 10, 2021