# scRFE Tutorial


Here we present an example of how to use scRFE. We analyze the Limb Muscle Facs data from the Tabula-Muris-Senis dataset that is available on Figshare. We split the data by age.

More features were selected than ideal in this model, because we used a very small number of estimators and a low CV score, for time's sake. This results are not accurate though, and we recommend running the code with 1000 estimators and CV>=5 with an EC2 instance.

### Imports 

In [9]:
# Imports 
import numpy as np
import pandas as pd
import scanpy as sc
from anndata import read_h5ad
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score
from sklearn.feature_selection import RFE
from sklearn.feature_selection import RFECV

### Read in anndata file 

In [10]:
adata = read_h5ad('/Users/madelinepark/Downloads/Limb_Muscle_facs.h5ad')
tiss = adata

In [29]:
age_of_interest = list(set(tiss.obs['age']))[0]

tiss.obs.loc[tiss.obs[tiss.obs['age'] == age_of_interest].index,'age_type_of_interest'] = age_of_interest
# tiss.obs['age_type_of_interest']
# tiss.obs

In [23]:
tiss.var_names

Index(['0610005C13Rik', '0610007C21Rik', '0610007L01Rik', '0610007N19Rik',
       '0610007P08Rik', '0610007P14Rik', '0610007P22Rik', '0610008F07Rik',
       '0610009B14Rik', '0610009B22Rik',
       ...
       'Zxdb', 'Zxdc', 'Zyg11a', 'Zyg11b', 'Zyx', 'Zzef1', 'Zzz3', 'a',
       'l7Rn6', 'zsGreen_transgene'],
      dtype='object', name='index', length=22899)

### Run scRFE

we decreased n_estimators and cv so that the code will run faster, but you should increase both before using

In [28]:
tiss.obs['age_type_of_interest'] = 'rest'
results_age_cv = pd.DataFrame() #create results data frame 

for c in list(set(tiss.obs['age'])): 
    print(c)
    clf = RandomForestClassifier(n_estimators=10, random_state=0, n_jobs=-1, oob_score=True)
    selector = RFECV(clf, step=0.2, cv=3, n_jobs=4) # step = % rounded down at each iteration  
    age_of_interest = c
    
    tiss.obs.loc[tiss.obs[tiss.obs['age'] == age_of_interest].index,'age_type_of_interest'] = age_of_interest
    
    feat_labels = tiss.var_names 
    X = tiss.X
    y = tiss.obs['age_type_of_interest']
    
    print('training...')
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05, random_state=0) 
    clf.fit(X_train, y_train)
    selector.fit(X_train, y_train)
    feature_selected = feat_labels[selector.support_] 
    
    print('result writing')
    column_headings = []
    column_headings.append(c)
    column_headings.append(c + '_gini')
    
    resaux = pd.DataFrame(columns=column_headings)
    resaux[c] = feature_selected
    resaux[c + '_gini'] = (selector.estimator_.feature_importances_)
    
    print(feature_selected)
    print (selector.estimator_.feature_importances_)
    
    results_age_cv = pd.concat([results_age_cv,resaux],axis=1)
    
    tiss.obs['age_type_of_interest'] = 'rest'
    
results_age_cv

24m
index
A10_B001732_S70_L004.mus-2-0      24m
A11_B001732_S71_L004.mus-2-0      24m
A12_B001732_S72_L004.mus-2-0      24m
A13_B001732_S73_L004.mus-2-0      24m
A14_B001732_S74_L004.mus-2-0      24m
A15_B001732_S75_L004.mus-2-0      24m
A16_B001732_S76_L004.mus-2-0      24m
A17_B001732_S77_L004.mus-2-0      24m
A18_B001732_S78_L004.mus-2-0      24m
A19_B001732_S79_L004.mus-2-0      24m
A1_B001732_S61_L004.mus-2-0       24m
A20_B001732_S80_L004.mus-2-0      24m
A22_B001732_S82_L004.mus-2-0      24m
A3_B001732_S63_L004.mus-2-0       24m
A4_B001732_S64_L004.mus-2-0       24m
A5_B001732_S65_L004.mus-2-0       24m
A6_B001732_S66_L004.mus-2-0       24m
A8_B001732_S68_L004.mus-2-0       24m
A9_B001732_S69_L004.mus-2-0       24m
B11_B001732_S95_L004.mus-2-0      24m
B13_B001732_S97_L004.mus-2-0      24m
B16_B001732_S100_L004.mus-2-0     24m
B17_B001732_S101_L004.mus-2-0     24m
B18_B001732_S102_L004.mus-2-0     24m
B19_B001732_S103_L004.mus-2-0     24m
B20_B001732_S104_L004.mus-2-0     24m
B2

  warn("Some inputs do not have OOB scores. "
  predictions[k].sum(axis=1)[:, np.newaxis])
  warn("Some inputs do not have OOB scores. "
  predictions[k].sum(axis=1)[:, np.newaxis])
  warn("Some inputs do not have OOB scores. "
  predictions[k].sum(axis=1)[:, np.newaxis])
  warn("Some inputs do not have OOB scores. "
  predictions[k].sum(axis=1)[:, np.newaxis])
  warn("Some inputs do not have OOB scores. "
  predictions[k].sum(axis=1)[:, np.newaxis])


KeyboardInterrupt: 