Confusion with documentation and MDR feature construction output #25

jay-reynolds · 2018-01-12T21:48:19Z

Hi, in the first example in the README, it states:

"For example, MDR can be used to construct a new feature composed from two existing features:"

but "GAMETES_Epistasis_2-Way_20atts_0.4H_EDM-1_1" used in the example has 21 columns, not 2.

The resulting output is a single column, which is a single feature -- is it that there's a single feature produced because that's what those 21 columns boiled down to, or is it because only 2 features from the dataframe were selected and used to construct the new feature? Or is there another reason?

Thanks in advance! In the meantime, I will continue reading the MDR paper I found on PubMed.

jay-reynolds · 2018-01-12T22:54:22Z

From the paper https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3500181/

"MDR pools genotypes into 'high-risk' and 'low-risk' or 'response' and 'non-response' groups in order to reduce multidimensional data into only one dimension."

And from the abstract (paper behind paywall): https://www.ncbi.nlm.nih.gov/pubmed/16457852

"To address this problem, we have previously developed a multifactor dimensionality reduction (MDR) method for collapsing high-dimensional genetic data into a single dimension (i.e. constructive induction) thus permitting interactions to be detected in relatively small sample sizes."

I suppose that answers my question.

Closing ticket.

rhiever · 2018-01-15T18:11:14Z

Hi @jay-reynolds! I wanted to clarify this for you. In the example from the README:

from mdr import MDR
import pandas as pd

genetic_data = pd.read_csv('https://github.com/EpistasisLab/scikit-mdr/raw/development/data/GAMETES_Epistasis_2-Way_20atts_0.4H_EDM-1_1.tsv.gz', sep='\t', compression='gzip')

features = genetic_data.drop('class', axis=1).values
labels = genetic_data['class'].values

my_mdr = MDR()
my_mdr.fit(features, labels)
my_mdr.transform(features)
# Output:
# array([[1],
#        [1],
#        [1],
#        ...,
#        [0],
#        [0],
#        [0]])

We are taking all of the features from the dataset (20 features in total) and constructing a single new feature from them. This is not a typical use of MDR, but it still works in this case because the example dataset is a fairly "easy" dataset for MDR.
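To make the "many features in, one feature out" behavior concrete, here is a minimal pure-Python sketch of the pooling idea from the paper quote above. It is not scikit-mdr's actual implementation: the function name `mdr_construct_feature`, the toy data, and the choice to treat ties as low-risk are all assumptions made for illustration.

```python
from collections import Counter

def mdr_construct_feature(rows, labels):
    """Collapse each row's genotype combination into 1 (high-risk) or 0 (low-risk).

    Hypothetical helper for illustration only, not part of scikit-mdr.
    """
    cases = Counter()
    controls = Counter()
    for row, label in zip(rows, labels):
        if label == 1:
            cases[tuple(row)] += 1
        else:
            controls[tuple(row)] += 1
    n_cases = sum(cases.values())
    n_controls = sum(controls.values())
    # A cell is high-risk when its case:control ratio exceeds the overall
    # ratio. Cross-multiplying avoids dividing by zero; ties count as low-risk
    # (an assumption of this sketch).
    high_risk = {
        cell
        for cell in set(cases) | set(controls)
        if cases[cell] * n_controls > controls[cell] * n_cases
    }
    return [1 if tuple(row) in high_risk else 0 for row in rows]

# Toy XOR-like data: the class is 1 when exactly one of the two features is 1.
rows = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0, 0, 1, 1, 0]
new_feature = mdr_construct_feature(rows, labels)
print(new_feature)
```

However many columns each row has, every distinct combination of values is treated as one cell, which is why the output is always a single column.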

Typically we use MDR in one of two ways:

1. We know exactly what features we want to perform feature construction on, so we subset the DataFrame down to those features and provide only those features to MDR. The regression example in the README shows an example of this case.
2. We don't know what features we want to perform feature construction on, so we perform an exhaustive combinatorial search of all possible feature combinations (typically up to tuples of 2 and 3 features) and provide each of those tuples to MDR separately, and choose the best tuple(s) according to some MDR quality metric (typically, 10-fold CV accuracy).
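The exhaustive search in the second case can be sketched roughly as follows. This is a pure-Python illustration under stated assumptions, not scikit-mdr's API: `mdr_cv_accuracy` and `best_feature_pair` are hypothetical helpers, the MDR rule is a simplified high-risk/low-risk pooling with ties treated as low-risk, the folds are simple interleaved splits, and the toy example uses 5 folds rather than 10 to keep it small.

```python
from collections import Counter
from itertools import combinations

def mdr_cv_accuracy(rows, labels, cols, n_folds=5):
    """Mean held-out accuracy of a simplified MDR rule restricted to `cols`.

    Hypothetical helper; assumes len(rows) >= n_folds so no fold is empty.
    """
    cells = [tuple(row[c] for c in cols) for row in rows]
    accuracies = []
    for fold in range(n_folds):
        test_idx = set(range(fold, len(rows), n_folds))  # interleaved folds
        train = [(cells[i], labels[i]) for i in range(len(rows)) if i not in test_idx]
        cases = Counter(cell for cell, y in train if y == 1)
        controls = Counter(cell for cell, y in train if y == 0)
        n_cases, n_controls = sum(cases.values()), sum(controls.values())
        correct = 0
        for i in test_idx:
            # High-risk if the cell's case:control ratio exceeds the training ratio.
            predicted = int(cases[cells[i]] * n_controls > controls[cells[i]] * n_cases)
            correct += predicted == labels[i]
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / n_folds

def best_feature_pair(rows, labels, n_folds=5):
    """Score every pair of feature columns and return the best pair plus all scores."""
    n_features = len(rows[0])
    scores = {
        pair: mdr_cv_accuracy(rows, labels, pair, n_folds)
        for pair in combinations(range(n_features), 2)
    }
    return max(scores, key=scores.get), scores

# Toy data: features 0 and 2 interact (XOR); feature 1 is irrelevant.
rows = [(a, b, c) for _ in range(5) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
labels = [a ^ c for a, b, c in rows]
best, scores = best_feature_pair(rows, labels)
print(best, scores)
```

Extending the search to 3-way tuples only requires changing the `2` passed to `combinations`, at the cost of many more candidate tuples to score.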

jay-reynolds · 2018-01-16T19:00:15Z

Thank you for the explanation, very much appreciated!

I've got TPOT going, so I think I'll give TPOT-MDR a go and see what it comes up with.

Have you tried using, say, hyperopt for combinatorial search instead of brute force or evolutionary methods?

rhiever · 2018-01-16T19:53:59Z

"Have you tried using, say, hyperopt for combinatorial search instead of brute force or evolutionary methods?"

We haven't tried that, but would be very curious to see a demo of it!

jay-reynolds closed this as completed Jan 13, 2018

rhiever added the question label Jan 15, 2018
