# netremCV
## By: Saniya Khullar
Cross-validation approach for estimating the optimal $\beta_{net}$ and $\alpha_{lasso}$ for NetREm models.

The choice of $\beta_{net}$ can affect the optimal value of $\alpha_{lasso}$, so candidate values are evaluated as ($\beta_{net}$, $\alpha_{lasso}$) pairs.

netremCV(`edge_list`,<br>`X`, # *gene expression data for the predictors (e.g. Transcription Factors (TFs))*<br>
`y`, # *gene expression data for the response variable (e.g. target gene (TG))* <br>
             `num_beta`: int = 10,<br>
             `extra_beta_list` = [0.25, 0.5, 0.75, 1], # *additional beta to try out*<br>
            `num_alpha`: int = 10,<br>
             `max_beta`: float = 200,  # *max_beta used to help prevent explosion of beta_net values*<br>
            `reduced_cv_search`: bool = False, # *should we do a reduced search (Randomized Search) or a GridSearch?*<br>
             `default_edge_weight`: float = 0.1,<br>
            `degree_threshold`: float = 0.5,<br>
            `gene_expression_nodes` = [],<br>
            `overlapped_nodes_only`: bool = False,<br>
             `standardize_X`: bool = True,<br>
             `center_y`: bool = True,<br>
            `y_intercept`: bool = False,<br>
            `model_type` = "Lasso",<br>
            `lasso_selection` = "cyclic", # *default in sklearn*<br>
            `all_pos_coefs`: bool = False,<br>
            `tolerance`: float = 1e-4,<br>
            `maxit`: int = 10000,<br>
            `num_jobs`: int = -1,<br>
            `num_cv_folds`: int = 5,<br>
            `lassocv_eps`: float = 1e-3, # *default in sklearn*<br>
            `lassocv_n_alphas`: int = 100, # *default in sklearn*<br>
            `lassocv_alphas` = None, # *default in sklearn*<br>
            `verbose` = False,<br>
            `searchVerbosity`: int = 2,<br>
            `show_warnings`: bool = False<br>):

In [1]:
from DemoDataBuilderXandY import generate_dummy_data
from Netrem_model_builder import *
import PriorGraphNetwork as graph
import error_metrics as em 
import essential_functions as ef
import netrem_evaluation_functions as nm_eval

dummy_data = generate_dummy_data(corrVals = [0.9, 0.5, 0.3, -0.2, -0.8],
                                 num_samples_M = 100000,
                                 train_data_percent = 70)

:) same_train_test_data = False


Generating predictors:   0%|          | 0/5 [00:00<?, ?it/s]

Please note that since we hold out 30.0% of our 100000 samples for testing, we have:
X_train = 70000 rows (samples) and 5 columns (N = 5 predictors) for training.
X_test = 30000 rows (samples) and 5 columns (N = 5 predictors) for testing.
y_train = 70000 corresponding rows (samples) for training.
y_test = 30000 corresponding rows (samples) for testing.



Gene expression data for predictors (X: Transcription Factors (TFs)) and response variable (y: target gene (TG)):

In [2]:
# 70,000 samples for training data (used to train and fit the NetREm model)
X_train = dummy_data.view_X_train_df()
y_train = dummy_data.view_y_train_df()

# 30,000 samples for testing data
X_test = dummy_data.view_X_test_df()
y_test = dummy_data.view_y_test_df()

In [3]:
X_train

| | TF1 | TF2 | TF3 | TF4 | TF5 |
|---|---|---|---|---|---|
| 0 | -0.133190 | 0.590034 | -0.537113 | 0.085502 | 0.360832 |
| 1 | 0.441382 | 0.845980 | -1.344182 | 0.328925 | 0.311403 |
| 2 | 0.671258 | 1.171499 | 1.013758 | -0.952091 | 0.265659 |
| 3 | -0.565290 | 0.719051 | 0.344798 | 1.276574 | -0.349368 |
| 4 | -1.410821 | 0.522239 | -0.679817 | 1.101129 | 1.755733 |
| ... | ... | ... | ... | ... | ... |
| 69995 | 0.639805 | -0.727337 | 0.895220 | 0.266462 | -0.287123 |
| 69996 | 0.491223 | 1.649460 | -1.260947 | -0.452498 | -0.300503 |
| 69997 | -0.688052 | -0.428763 | -1.080820 | -0.933508 | 0.795700 |
| 69998 | -2.117304 | -1.195660 | 1.409443 | 2.082779 | 1.809950 |


Input Protein-Protein Interaction (PPI) network relating TF predictors to each other:

In [4]:
# prior network edge_list:
edge_list = [["TF1", "TF2", 0.9], ["TF4", "TF5", 0.75], ["TF1", "TF3"], ["TF1", "TF4"], ["TF1", "TF5"], 
             ["TF2", "TF3"], ["TF2", "TF4"], ["TF2", "TF5"], ["TF3", "TF4"], ["TF3", "TF5"]]
edge_list

[['TF1', 'TF2', 0.9],
 ['TF4', 'TF5', 0.75],
 ['TF1', 'TF3'],
 ['TF1', 'TF4'],
 ['TF1', 'TF5'],
 ['TF2', 'TF3'],
 ['TF2', 'TF4'],
 ['TF2', 'TF5'],
 ['TF3', 'TF4'],
 ['TF3', 'TF5']]
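
In the edge list above, two edges carry an explicit weight and the other eight list only their two nodes. The signature includes `default_edge_weight` (0.1 by default), so a reasonable reading is that unweighted edges receive that value. The sketch below illustrates this under that assumption; `fill_default_weights` is a hypothetical helper written for this example and is not part of NetREm.

```python
from typing import List, Sequence, Tuple


def fill_default_weights(edges: Sequence[Sequence],
                         default_edge_weight: float = 0.1) -> List[Tuple[str, str, float]]:
    """Return (node_a, node_b, weight) triples, filling in missing weights.

    Hypothetical helper: assumes edges without a third entry should take
    default_edge_weight. NetREm's own handling may differ.
    """
    weighted = []
    for edge in edges:
        if len(edge) == 3:
            node_a, node_b, weight = edge
        elif len(edge) == 2:
            node_a, node_b = edge
            weight = default_edge_weight
        else:
            raise ValueError(f"Expected 2 or 3 entries per edge, got {len(edge)}: {edge}")
        weighted.append((str(node_a), str(node_b), float(weight)))
    return weighted


edge_list = [["TF1", "TF2", 0.9], ["TF4", "TF5", 0.75], ["TF1", "TF3"], ["TF1", "TF4"],
             ["TF1", "TF5"], ["TF2", "TF3"], ["TF2", "TF4"], ["TF2", "TF5"],
             ["TF3", "TF4"], ["TF3", "TF5"]]

weighted_edges = fill_default_weights(edge_list)
for node_a, node_b, weight in weighted_edges:
    print(node_a, node_b, weight)
```

Normalizing every edge to a triple up front keeps later steps from having to special-case two-entry edges.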

In [5]:
%%time 

netrem_demoCV = netremCV(edge_list = edge_list, X = X_train, y = y_train) 
netrem_demoCV

:) Generating beta_net and alpha_lasso pairs:   0%|          | 0/14 [00:00<?, ?it/s]

  0%|          | 0/28 [00:00<?, ?it/s]

Fitting 5 folds for each of 28 candidates, totalling 140 fits
[CV] END ....alpha_lasso=0.0008985897297578544, beta_net=1.0; total time=   0.1s
[CV] END ....alpha_lasso=0.0008985897297578544, beta_net=1.0; total time=   0.1s
[CV] END ....alpha_lasso=0.0008985897297578544, beta_net=1.0; total time=   0.1s
[CV] END ....alpha_lasso=0.0008985897297578544, beta_net=1.0; total time=   0.0s
[CV] END ....alpha_lasso=0.0008985897297578544, beta_net=1.0; total time=   0.1s
[CV] END ......alpha_lasso=0.02246474324394635, beta_net=1.0; total time=   0.0s
[CV] END ......alpha_lasso=0.02246474324394635, beta_net=1.0; total time=   0.1s
[CV] END ......alpha_lasso=0.02246474324394635, beta_net=1.0; total time=   0.0s
[CV] END ......alpha_lasso=0.02246474324394635, beta_net=1.0; total time=   0.1s
[CV] END ......alpha_lasso=0.02246474324394635, beta_net=1.0; total time=   0.1s
[CV] END ...alpha_lasso=0.0008985897297578534, beta_net=0.75; total time=   0.1s
...

[CV] END alpha_lasso=0.02246474324394635, beta_net=3.8576664392664197; total time=   0.1s
[CV] END alpha_lasso=0.02246474324394635, beta_net=3.8576664392664197; total time=   0.1s
[CV] END alpha_lasso=0.02246474324394635, beta_net=3.8576664392664197; total time=   0.1s
[CV] END alpha_lasso=0.02246474324394635, beta_net=3.8576664392664197; total time=   0.0s
[CV] END alpha_lasso=0.0008985897297578544, beta_net=2.312610273324169; total time=   0.1s
[CV] END alpha_lasso=0.0008985897297578544, beta_net=2.312610273324169; total time=   0.1s
[CV] END alpha_lasso=0.0008985897297578544, beta_net=2.312610273324169; total time=   0.1s
[CV] END alpha_lasso=0.0008985897297578544, beta_net=2.312610273324169; total time=   0.0s
[CV] END alpha_lasso=0.0008985897297578544, beta_net=2.312610273324169; total time=   0.1s
[CV] END alpha_lasso=0.02246474324394635, beta_net=2.312610273324169; total time=   0.1s
[CV] END alpha_lasso=0.02246474324394635, beta_net=2.312610273324169; total time=   0.0s
...
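
The counts in the log above can be checked with simple arithmetic. With the defaults `num_beta = 10` and four values in `extra_beta_list`, there are 14 $\beta_{net}$ values, matching the `0/14` progress bar. The log excerpt shows two `alpha_lasso` values for each `beta_net`, which gives 28 candidates, and 5 folds gives 140 fits. The two-alphas-per-beta figure is read off the log excerpt, not from the NetREm source, so treat it as an assumption.

```python
# Defaults taken from the netremCV signature.
num_beta = 10
extra_beta_list = [0.25, 0.5, 0.75, 1]
num_cv_folds = 5

# Assumption inferred from the log: each beta_net is paired with 2 alpha_lasso values.
alphas_per_beta = 2

n_beta_values = num_beta + len(extra_beta_list)
n_candidates = n_beta_values * alphas_per_beta
n_fits = n_candidates * num_cv_folds

print(f"beta_net values: {n_beta_values}")
print(f"candidates:      {n_candidates}")
print(f"total fits:      {n_fits}")
```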

In [6]:
netrem_demoCV

In [7]:
netrem_demoCV.final_corr_vs_coef_df

| | info | input_data | TF1 | TF2 | TF3 | TF4 | TF5 |
|---|---|---|---|---|---|---|---|
| 0 | network regression coeff. with y: y | X_train | 0.614536 | 0.106068 | 0.007784 | -0.030845 | -0.298912 |
| 0 | corr (r) with y: y | X_train | 0.900178 | 0.496234 | 0.302551 | -0.203738 | -0.800527 |
| 0 | Absolute Value NetREm Coefficient Ranking | X_train | 1.0 | 3.0 | 5.0 | 4.0 | 2.0 |
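
The ranking row above orders the TFs by the absolute value of their coefficients, with rank 1 for the largest magnitude. A minimal sketch that reproduces it from the displayed coefficients (this is plain Python for illustration, not the code NetREm uses internally):

```python
# Coefficients copied from the "network regression coeff." row above.
coefs = {
    "TF1": 0.614536,
    "TF2": 0.106068,
    "TF3": 0.007784,
    "TF4": -0.030845,
    "TF5": -0.298912,
}

# Sort TF names by descending |coefficient|, then number them from 1.
ordered = sorted(coefs, key=lambda tf: abs(coefs[tf]), reverse=True)
ranks = {tf: position for position, tf in enumerate(ordered, start=1)}

for tf in coefs:
    print(tf, ranks[tf])
```

Ranking on magnitude means a strongly negative predictor such as TF5 ranks above a weakly positive one such as TF2.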


In [8]:
netrem_demoCV.get_params()

{'info': 'NetREm Model',
 'alpha_lasso': 0.0008985897297578544,
 'beta_net': 0.25,
 'y_intercept': False,
 'model_type': 'Lasso',
 'standardize_X': True,
 'center_y': True,
 'max_lasso_iterations': 10000,
 'network': <PriorGraphNetwork.PriorGraphNetwork at 0x234f33479d0>,
 'verbose': False,
 'all_pos_coefs': False,
 'model_info': 'fitted_model :)',
 'target_gene_y': 'y',
 'tolerance': 0.0001,
 'lasso_selection': 'cyclic'}

In [9]:
netrem_demoCV.test_mse(X_train, y_train)

0.13363812169073666

In [10]:
netrem_demoCV.test_mse(X_test, y_test)

0.13487280608906413
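
The training and held-out errors above are close, which suggests the selected model is not overfitting much. The sketch below shows the metric and the relative gap between the two reported values. It assumes `test_mse` computes the standard mean squared error between predictions and observed `y`; the source does not show its implementation, and `mean_squared_error` here is a stand-in written for this example.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy check of the metric itself.
toy_mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])

# Values reported by test_mse above.
train_mse = 0.13363812169073666
test_mse = 0.13487280608906413

# Relative increase in error when moving from training to held-out data.
relative_gap = (test_mse - train_mse) / train_mse
print(f"toy MSE: {toy_mse:.4f}")
print(f"relative gap: {relative_gap:.2%}")
```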

In [11]:
netrem_demoCV.model_nonzero_coef_df

| | y_intercept | TF1 | TF2 | TF3 | TF4 | TF5 |
|---|---|---|---|---|---|---|
| 0 | | 0.614536 | 0.106068 | 0.007784 | -0.030845 | -0.298912 |


In [12]:
netrem_demoCV.combined_df

| | coef | TF | TG | info | train_mse | beta_net | alpha_lasso | AbsoluteVal_coefficient | Rank | final_model_TFs | TFs_input_to_model | original_TFs_in_X | standardized_X | centered_y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | y_intercept | y | netrem_no_intercept | 0.133638 | 0.25 | 0.000899 | | 6 | 5 | 5 | 5 | True | True |
| 1 | 0.614536 | TF1 | y | netrem_no_intercept | 0.133638 | 0.25 | 0.000899 | 0.614536 | 1 | 5 | 5 | 5 | True | True |
| 2 | 0.106068 | TF2 | y | netrem_no_intercept | 0.133638 | 0.25 | 0.000899 | 0.106068 | 3 | 5 | 5 | 5 | True | True |
| 3 | 0.007784 | TF3 | y | netrem_no_intercept | 0.133638 | 0.25 | 0.000899 | 0.007784 | 5 | 5 | 5 | 5 | True | True |
| 4 | -0.030845 | TF4 | y | netrem_no_intercept | 0.133638 | 0.25 | 0.000899 | 0.030845 | 4 | 5 | 5 | 5 | True | True |
| 5 | -0.298912 | TF5 | y | netrem_no_intercept | 0.133638 | 0.25 | 0.000899 | 0.298912 | 2 | 5 | 5 | 5 | True | True |


In [13]:
netrem_demoCV.B_interaction_df

| | TF1 | TF2 | TF3 | TF4 | TF5 |
|---|---|---|---|---|---|
| TF1 | 1.023976 | 0.426869 | 0.207654 | -0.186497 | -0.721913 |
| TF2 | 0.426869 | 1.023976 | 0.082365 | -0.104892 | -0.400304 |
| TF3 | 0.207654 | 0.082365 | 9.0 | -0.128178 | -0.303677 |
| TF4 | -0.186497 | -0.104892 | -0.128178 | 1.020979 | 0.148067 |
| TF5 | -0.721913 | -0.400304 | -0.303677 | 0.148067 | 1.020979 |


In [14]:
organize_B_interaction_network(netrem_demoCV)

| | TF1 | TF2 | B_train_weight | sign | potential_interaction | absVal_B | info | candidate_TFs_N | target_gene_y | num_final_predictors | model_type | beta_net | X_standardized | gene_data | rank | percentile |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | TF1 | TF5 | -0.721913 | :( | :( competitive (-) | 0.721913 | B matrix of TF-TF interactions | 5 | y | 5 | Lasso | 0.25 | True | training gene expression data | 1.0 | 95.0 |
| 4 | TF5 | TF1 | -0.721913 | :( | :( competitive (-) | 0.721913 | B matrix of TF-TF interactions | 5 | y | 5 | Lasso | 0.25 | True | training gene expression data | 1.0 | 95.0 |
| 5 | TF1 | TF2 | 0.426869 | :) | :( | 0.426869 | B matrix of TF-TF interactions | 5 | y | 5 | Lasso | 0.25 | True | training gene expression data | 3.0 | 85.0 |
| 1 | TF2 | TF1 | 0.426869 | :) | :( | 0.426869 | B matrix of TF-TF interactions | 5 | y | 5 | Lasso | 0.25 | True | training gene expression data | 3.0 | 85.0 |
| 9 | TF5 | TF2 | -0.400304 | :( | :( competitive (-) | 0.400304 | B matrix of TF-TF interactions | 5 | y | 5 | Lasso | 0.25 | True | training gene expression data | 5.0 | 75.0 |
| 21 | TF2 | TF5 | -0.400304 | :( | :( competitive (-) | 0.400304 | B matrix of TF-TF interactions | 5 | y | 5 | Lasso | 0.25 | True | training gene expression data | 5.0 | 75.0 |
| 14 | TF5 | TF3 | -0.303677 | :( | :( competitive (-) | 0.303677 | B matrix of TF-TF interactions | 5 | y | 5 | Lasso | 0.25 | True | training gene expression data | 7.0 | 65.0 |
| 22 | TF3 | TF5 | -0.303677 | :( | :( competitive (-) | 0.303677 | B matrix of TF-TF interactions | 5 | y | 5 | Lasso | 0.25 | True | training gene expression data | 7.0 | 65.0 |
| 2 | TF3 | TF1 | 0.207654 | :) | :( | 0.207654 | B matrix of TF-TF interactions | 5 | y | 5 | Lasso | 0.25 | True | training gene expression data | 9.0 | 55.0 |
| 10 | TF1 | TF3 | 0.207654 | :) | :( | 0.207654 | B matrix of TF-TF interactions | 5 | y | 5 | Lasso | 0.25 | True | training gene expression data | 9.0 | 55.0 |
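
The `rank` and `percentile` columns above are consistent with ranking the 20 off-diagonal entries of the B matrix by absolute value, giving tied entries the lowest shared rank, and computing the percentile as `100 * (n - rank) / n` with `n = 20`. This rule is inferred from the displayed rows, not taken from the NetREm source, so the sketch below is an illustration under that assumption.

```python
# Upper-triangle, off-diagonal values copied from B_interaction_df.
b_upper = {
    ("TF1", "TF2"): 0.426869, ("TF1", "TF3"): 0.207654,
    ("TF1", "TF4"): -0.186497, ("TF1", "TF5"): -0.721913,
    ("TF2", "TF3"): 0.082365, ("TF2", "TF4"): -0.104892,
    ("TF2", "TF5"): -0.400304, ("TF3", "TF4"): -0.128178,
    ("TF3", "TF5"): -0.303677, ("TF4", "TF5"): 0.148067,
}

# B is symmetric, so each pair appears in both directions (20 entries total).
pairs = {}
for (tf_a, tf_b), weight in b_upper.items():
    pairs[(tf_a, tf_b)] = weight
    pairs[(tf_b, tf_a)] = weight

n = len(pairs)
abs_vals = [abs(w) for w in pairs.values()]

rows = []
for (tf_a, tf_b), weight in pairs.items():
    # "min" ranking: 1 + number of entries with strictly larger |B|,
    # so the two directions of a pair share the same rank.
    rank = 1 + sum(v > abs(weight) for v in abs_vals)
    percentile = 100 * (n - rank) / n
    rows.append((tf_a, tf_b, weight, rank, percentile))

rows.sort(key=lambda r: r[3])
for row in rows[:10]:
    print(row)
```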
