# Feature Selection with Random Forest, Neural Network, and Indicator Species Analysis

To begin, import the pandas library for loading data and the FeatureSelection class from the RFINN module:

In [1]:
import pandas as pd
from RFINN import FeatureSelection

Import training, testing, and target data as Pandas DataFrames: 

In [2]:
train_data = pd.read_csv('train_data.csv')
test_data = pd.read_csv('test_data.csv')
targets = pd.read_csv('DOC_targets.csv')

If available, load the representative sequences file. This file should have OTU names in the first column and representative sequences in the second column:

In [3]:
rep_seqs = pd.read_csv("rep_sequences.csv")
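Before passing a representative sequences table on, it can help to confirm that it follows the two-column layout described above. The sketch below is only an illustration: the helper `check_rep_seqs`, the column names `OTU` and `Sequence`, and the example sequences are hypothetical and not part of RFINN.

```python
import io
import pandas as pd

# Hypothetical two-column file: OTU names first, sequences second.
example_csv = io.StringIO(
    "OTU,Sequence\n"
    "OTU_55,ATCGGCTA\n"
    "OTU_12,GGCTAATC\n"
)
example_rep_seqs = pd.read_csv(example_csv)

def check_rep_seqs(df):
    """Return True if column 0 holds unique names and column 1 holds nucleotide strings."""
    if df.shape[1] < 2:
        return False
    names = df.iloc[:, 0].astype(str)
    seqs = df.iloc[:, 1].astype(str).str.upper()
    # Accept only the letters A, C, G, T and N (an assumed alphabet for this sketch).
    valid_seq = seqs.str.match(r"^[ACGTN]+$")
    return bool(names.is_unique and valid_seq.all())

print(check_rep_seqs(example_rep_seqs))
```

The same check could be run on rep_seqs right after it is loaded.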

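Create a feature selection object by calling the FeatureSelection class. To attach the representative sequences loaded above, pass rep_seqs=rep_seqs instead of rep_seqs=None: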

In [4]:
FS = FeatureSelection(train_data, test_data, targets, rep_seqs=None)

Generate a table of feature selection results by calling the FeatureSelectionTable() method:

In [5]:
FS_results = FS.FeatureSelectionTable(iterations=100)

1626


Display the feature selection results. By default, the table is sorted by the Indicator Species statistic (IS stat), highest first.

In [6]:
FS_results.head(10)

| Unnamed: 0 | Taxa | RF Importance | NN Importance | IS stat | IS Site Label | IS P value | Rep. Sequences |
|---|---|---|---|---|---|---|---|
| 68 | OTU_55 | 0.039361 | -1.0 | 0.825024 | Low | 0.0 | ATCG |
| 10 | OTU_12 | 1.0 | -0.815272 | 0.806841 | Low | 0.0 | ATCG |
| 44 | OTU_24 | 0.102633 | 0.434487 | 0.781167 | High | 0.0 | ATCG |
| 40 | OTU_23 | 0.028924 | -0.453467 | 0.776977 | Low | 0.0 | ATCG |
| 21 | OTU_16 | 0.026946 | 0.559174 | 0.775166 | High | 0.0 | ATCG |
| 73 | OTU_6 | 0.028074 | -0.709085 | 0.771662 | Low | 0.0 | ATCG |
| 52 | OTU_30 | 0.007772 | 0.56582 | 0.758244 | High | 0.0 | ATCG |
| 59 | OTU_38 | 0.013471 | -0.441952 | 0.756098 | Low | 0.0 | ATCG |
| 67 | OTU_534 | 0.030258 | 0.709439 | 0.755964 | High | 0.0 | ATCG |
| 49 | OTU_29 | 0.016137 | 0.662435 | 0.7542 | High | 0.0 | ATCG |


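Random forest importances are scaled from 0 to 1. Neural network importances are scaled from -1 to 1, as the negative NN Importance values in the table show; their sign indicates the direction of the association.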
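In the table above, RF Importance peaks at 1.0 and NN Importance reaches -1.0, which is what dividing each column by its largest absolute value would produce. The sketch below illustrates that kind of scaling. It is an assumption about the general technique, not a confirmed description of what RFINN does internally, and the helper `scale_by_max_abs` and the raw input values are hypothetical.

```python
import numpy as np

def scale_by_max_abs(values):
    """Divide by the largest absolute value so results lie in [-1, 1]."""
    values = np.asarray(values, dtype=float)
    max_abs = np.max(np.abs(values))
    if max_abs == 0:
        # All-zero input: nothing to scale.
        return values.copy()
    return values / max_abs

# Hypothetical raw importances (not taken from any real model run).
raw_rf = [0.02, 0.50, 0.05]   # non-negative, so the scaled range is [0, 1]
raw_nn = [-3.0, 1.5, 0.6]     # signed, so the scaled range is [-1, 1]

print(scale_by_max_abs(raw_rf))
print(scale_by_max_abs(raw_nn))
```

Scaling by the maximum absolute value keeps the sign of each entry, so the direction of an association is still readable after scaling.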

Feature selection results generally agree across methods. In the ten rows shown above, the sign of the neural network importance matches the Indicator Species site label in every case: negative values correspond to Low and positive values to High.
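The agreement between the neural network sign and the Indicator Species site label can be checked directly. The sketch below rebuilds the ten displayed rows by hand so that it runs on its own; with a real results table, the same comparison could be applied to FS_results. Treating positive values as High and everything else as Low is an assumption based on the rows shown.

```python
import numpy as np
import pandas as pd

# The ten rows displayed above (Taxa, NN Importance, IS Site Label).
top_rows = pd.DataFrame({
    "Taxa": ["OTU_55", "OTU_12", "OTU_24", "OTU_23", "OTU_16",
             "OTU_6", "OTU_30", "OTU_38", "OTU_534", "OTU_29"],
    "NN Importance": [-1.0, -0.815272, 0.434487, -0.453467, 0.559174,
                      -0.709085, 0.56582, -0.441952, 0.709439, 0.662435],
    "IS Site Label": ["Low", "Low", "High", "Low", "High",
                      "Low", "High", "Low", "High", "High"],
})

# Assumed mapping: positive importance implies "High", otherwise "Low".
implied_label = np.where(top_rows["NN Importance"] > 0, "High", "Low")

# Fraction of rows where the implied label equals the Indicator Species label.
agreement = (top_rows["IS Site Label"] == implied_label).mean()
print(f"Sign/label agreement: {agreement:.0%}")
```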

The feature selection table can be saved to a .csv file by calling the .to_csv() method:

FS_results.to_csv('myFeatureSelectionResults.csv') 
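One thing to keep in mind when reloading a saved table: to_csv() writes the DataFrame index as an unnamed first column by default, and reading the file back without index_col typically yields a column named Unnamed: 0. The sketch below shows the round trip on a small hypothetical two-row table, using an in-memory buffer instead of a file on disk.

```python
import io
import pandas as pd

# Small stand-in table with a non-default index (values are illustrative).
results = pd.DataFrame(
    {"Taxa": ["OTU_55", "OTU_12"], "IS stat": [0.825024, 0.806841]},
    index=[68, 10],
)

buffer = io.StringIO()
results.to_csv(buffer)            # the index is written as an unnamed first column

buffer.seek(0)
naive = pd.read_csv(buffer)       # the index comes back as a regular column

buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)  # first column restored as the index

print(list(naive.columns))
print(list(restored.columns))
```

Passing index=False to to_csv() is an alternative when the index carries no information worth keeping.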