# Final ELN models over ten random seeds

## Outline

The **MLAging - SVZ all-cell** workflow consists of the following notebooks:

`30 SVZpreprocessing.R` Data preprocessing and preparation in Seurat.

`311 SVZ All-cell ELN Tuning - Before Binarization` ML model tuning on *non-binarized* HVGs, with hyperparameter selection using `GridSearchCV`.

`312 SVZ All-cell ELN Tuning - After Binarization` ML model tuning on *binarized* HVGs, with hyperparameter selection using `GridSearchCV`.

`321 SVZ All-cell ELN 10x` Run the best ELN model for both binarized and non-binarized HVGs over 10 random seeds (**this notebook**).

`322 SVZ All-cell MLP 10x - Before Binarization` Run the best MLP model for *non-binarized* HVGs over 10 random seeds.

`323 SVZ All-cell MLP 10x - After Binarization` Run the best MLP model for *binarized* HVGs over 10 random seeds.
 
`33 SVZ All-cell Model Result Viz` Result visualization.

`34 SVZ All-cell Stat` Statistical test of whether exercise rejuvenates cells.

In [1]:
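```python
import warnings
warnings.filterwarnings('ignore')

import os
import pickle
from statistics import mean, stdev

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer
from sklearn.utils import shuffle
from tqdm import tqdm
# Project helpers such as data_prep and pr_auc_score come from the local src package.
from src.data_processing import *
from src.preprocessing_eln import *

data_type = 'float32'
```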

In [2]:
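```python
# AUPRC scorer for GridSearchCV-style tuning; the final fits below call pr_auc_score directly.
# Note: scikit-learn 1.4 deprecated needs_proba in favor of response_method="predict_proba".
pr_auc_scorer = make_scorer(pr_auc_score, greater_is_better=True,
                            needs_proba=True)
```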
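`pr_auc_score` is imported from the project's `src` modules, and its definition is not shown in this notebook. As a hedged sketch, one common way to summarize the area under the precision-recall curve is average precision, the step-wise sum $\sum_n (R_n - R_{n-1}) P_n$ over score thresholds, which is what `sklearn.metrics.average_precision_score` computes. The function below (hypothetical name `average_precision`) is a dependency-free version of that definition with the same argument order as `pr_auc_score(y_test, y_pred)`; the project's own function may differ, for example by integrating with the trapezoidal rule.

```python
from itertools import groupby


def average_precision(y_true, y_score):
    """Step-wise area under the precision-recall curve (average precision).

    Assumption: mirrors the definition used by sklearn's
    average_precision_score; the project's pr_auc_score may differ.
    """
    # Rank cells from highest to lowest predicted probability.
    pairs = sorted(zip(y_score, y_true), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for _, label in pairs if label == 1)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive labels")

    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    # Tied scores form a single threshold, so the result does not depend on row order.
    for _, group in groupby(pairs, key=lambda p: p[0]):
        for _, label in group:
            if label == 1:
                tp += 1
            else:
                fp += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision  # rectangle under the PR step
        prev_recall = recall
    return ap
```

For instance, `average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 5/6, the same value scikit-learn's documentation example reports for `average_precision_score`.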

In [3]:
```python
# Batch 1 cells are used for training and batch 2 cells for testing.
input_train = '../data/svz_processed/svz_ctl_train_cell_sep3integ_batch1.csv'
input_test = '../data/svz_processed/svz_ctl_test_cell_sep3integ_batch2.csv'
```

### After Binarization

In [4]:
```python
# Binarized HVG features; custom_cv is returned but not needed for the final fits.
train_X, train_y, test_X, test_y, custom_cv = data_prep(input_test, input_train, "All", binarization=True)
```

Finished data prepration for All
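The notebook only passes `binarization=True` to `data_prep`; the actual rule lives in the project's `src` package and is not shown here. Purely as an illustration, one simple binarization maps each HVG expression value to 1 when it exceeds a threshold and to 0 otherwise. The helper name `binarize_expression` and the default threshold of 0 are assumptions, not the project's method; the output dtype follows the notebook's `data_type = 'float32'`.

```python
import numpy as np


def binarize_expression(X, threshold=0.0, dtype="float32"):
    """Map each expression value to 1 if it exceeds `threshold`, else 0.

    Hypothetical helper: the rule used inside data_prep(binarization=True)
    is not shown in this notebook and may differ.
    """
    X = np.asarray(X)
    # Elementwise comparison gives a boolean matrix; cast to the feature dtype.
    return (X > threshold).astype(dtype)


cells_by_genes = [[0.0, 0.5, 3.2], [2.0, 0.0, 0.1]]
print(binarize_expression(cells_by_genes))
```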


In [5]:
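```python
scores = []
final_models = []
final_test = []
for i in tqdm(range(10)):
    random_state = 42 * i  # seeds 0, 42, ..., 378
    # Shuffle rows with this seed; sklearn's shuffle keeps X and y aligned.
    X_test, y_test = shuffle(test_X, test_y, random_state=random_state)
    X_train, y_train = shuffle(train_X, train_y, random_state=random_state)

    # Best hyperparameters from tuning (notebook 312, binarized HVGs);
    # l1_ratio=0 makes the elastic-net penalty pure L2. The saga solver's own
    # random_state is not set, so its internal sampling is not seeded here.
    eln = LogisticRegression(penalty='elasticnet', C=0.016681005372000592, l1_ratio=0,
                             solver='saga', max_iter=10000000)
    eln.fit(X_train, y_train)

    # Score positive-class probabilities by area under the precision-recall curve.
    y_pred = eln.predict_proba(X_test)[:, 1]
    auprc = pr_auc_score(y_test, y_pred)

    final_test.append((X_test, y_test))
    final_models.append(eln)
    scores.append(auprc)
print(f'auprc: {mean(scores)} ± {stdev(scores)}')
```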

100%|██████████| 10/10 [05:00<00:00, 30.07s/it]

auprc: 0.8770983217994573 ± 8.738276997744861e-07
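Each seed in the loop above only changes row order: `sklearn.utils.shuffle` applies one consistent permutation to all arrays passed to it, so every cell keeps its label. A minimal standard-library sketch of that idea follows; the name `paired_shuffle` is hypothetical, and because it uses Python's `random` module rather than NumPy's generator it will not reproduce sklearn's exact permutations.

```python
import random


def paired_shuffle(X, y, random_state):
    """Return X and y reordered by one shared, seeded permutation.

    Hypothetical helper illustrating what sklearn.utils.shuffle does for
    two aligned arrays; the permutation itself differs from sklearn's.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    order = list(range(len(y)))
    random.Random(random_state).shuffle(order)  # seeded, independent RNG
    return [X[i] for i in order], [y[i] for i in order]


# Same seed schedule as the notebook: 0, 42, ..., 378.
seeds = [42 * i for i in range(10)]
X_demo, y_demo = paired_shuffle([[1.0], [2.0], [3.0]], [0, 1, 1], seeds[1])
```

Because AUPRC depends only on the (score, label) pairs, shuffling the test set should not change the metric for a tie-aware implementation; the seed can act only through the training order seen by the stochastic `saga` solver. With a convex objective and a very large `max_iter`, the fits are expected to converge to nearly the same solution, which is consistent with the standard deviation on the order of 1e-6 reported above.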

In [6]:
```python
# Persist scores, shuffled test sets, and fitted models for later analysis.
out_dir = '../results/svz_int2'
os.makedirs(out_dir, exist_ok=True)

for name, obj in [('scores', scores), ('sets', final_test), ('models', final_models)]:
    # Writes eln_model_test_scores.save, eln_model_test_sets.save, eln_model_test_models.save.
    with open(os.path.join(out_dir, f'eln_model_test_{name}.save'), 'wb') as file:
        pickle.dump(obj, file)
```

### Before Binarization

In [7]:
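```python
# Non-binarized HVG features; custom_cv is returned but not needed for the final fits.
train_X, train_y, test_X, test_y, custom_cv = data_prep(input_test, input_train, "All", binarization=False)
```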

Finished data prepration for All


In [8]:
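```python
scores = []
final_models = []
final_test = []
for i in tqdm(range(10)):
    random_state = 42 * i  # seeds 0, 42, ..., 378
    # Shuffle rows with this seed; sklearn's shuffle keeps X and y aligned.
    X_test, y_test = shuffle(test_X, test_y, random_state=random_state)
    X_train, y_train = shuffle(train_X, train_y, random_state=random_state)

    # Best hyperparameters from tuning (notebook 311, non-binarized HVGs);
    # C is the inverse regularization strength, so C=0.0001 regularizes strongly.
    eln = LogisticRegression(penalty='elasticnet', C=0.0001, l1_ratio=0.1778279410038923,
                             solver='saga', max_iter=10000000)
    eln.fit(X_train, y_train)

    # Score positive-class probabilities by area under the precision-recall curve.
    y_pred = eln.predict_proba(X_test)[:, 1]
    auprc = pr_auc_score(y_test, y_pred)

    final_test.append((X_test, y_test))
    final_models.append(eln)
    scores.append(auprc)
print(f'auprc: {mean(scores)} ± {stdev(scores)}')
```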

100%|██████████| 10/10 [01:22<00:00,  8.22s/it]

auprc: 0.39842376680553476 ± 8.518086605579367e-07
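To read the two hyperparameter sets side by side, it may help to recall the objective that scikit-learn documents for `LogisticRegression` with `penalty='elasticnet'` in the binary case (labels $y_i \in \{0, 1\}$), where $\rho$ is `l1_ratio` and $C$ is the inverse regularization strength:

$$
\min_{w,\, w_0}\; C \sum_{i=1}^{n} \Big[ -y_i \log \hat{p}(X_i) - (1 - y_i) \log\big(1 - \hat{p}(X_i)\big) \Big] + r(w),
\qquad
\hat{p}(X_i) = \frac{1}{1 + e^{-(X_i w + w_0)}},
\qquad
r(w) = \frac{1 - \rho}{2}\, \lVert w \rVert_2^2 + \rho\, \lVert w \rVert_1 .
$$

With $\rho = 0$ (the binarized run), $r(w) = \tfrac{1}{2}\lVert w \rVert_2^2$, a pure L2 (ridge) penalty. With $C = 10^{-4}$ and $\rho \approx 0.178$ (the non-binarized run), the log-loss term is down-weighted relative to a mostly-L2 penalty with a small L1 component, i.e. much stronger regularization than the binarized model's $C \approx 0.0167$.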

In [10]:
```python
# Persist scores, shuffled test sets, and fitted models for later analysis.
out_dir = '../results/svz_int2_before'
os.makedirs(out_dir, exist_ok=True)

for name, obj in [('scores', scores), ('sets', final_test), ('models', final_models)]:
    # Writes eln_model_test_scores.save, eln_model_test_sets.save, eln_model_test_models.save.
    with open(os.path.join(out_dir, f'eln_model_test_{name}.save'), 'wb') as file:
        pickle.dump(obj, file)
```