added mean, min, and max feature extraction methods #20
I think it is fine to start with these. It's pretty easy to do this using the already built-in functionality:
fex.groupby(column_name).mean()
I think the only thing that yours adds is the renaming of the column names. At some point we need to figure out how to chain or pass a list of features to extract, or possibly use a pipeline. I know I sound like a broken record, but I really like how pliers solves this problem. It can deal with multiple features in one line and merges them all together in the output automatically.
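As a rough illustration of the point above, a grouped summary in plain pandas, and one way to request several summaries in a single call, might look like the sketch below. The DataFrame, its column names, and the `stat_column` naming scheme are made up for the example and stand in for a Fex object; this is not feat's or pliers' API.

```python
import pandas as pd

# Hypothetical stand-in for a Fex object: a plain DataFrame with made-up columns.
fex = pd.DataFrame({
    "subject": ["s1", "s1", "s2", "s2"],
    "AU01": [0.1, 0.3, 0.5, 0.7],
    "AU02": [1.0, 2.0, 3.0, 4.0],
})

# A single summary per group.
means = fex.groupby("subject").mean()

# Several summaries in one call; pandas returns a two-level column index.
summary = fex.groupby("subject").agg(["mean", "min", "max"])

# Flatten ("AU01", "mean") into "mean_AU01" so the result is one flat table.
summary.columns = [f"{stat}_{col}" for col, stat in summary.columns]
```

The flattening step is the part that the built-in call does not do by itself, which matches the renaming that the PR adds.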
It renames the columns and also transposes the output when not grouping, but yes, the added functionality is not much. That said, I think it would be nice for all feature extraction methods to follow the standard format 'Fex.extract_*' so that users do not need to know which are pandas defaults and which are specific to feat.
I agree that a pipeline would be best. I can begin looking into how pliers does this tomorrow and see what I can do to implement something similar.
On Tue, Feb 6, 2018 at 11:08 PM Luke Chang wrote:
> I think it is fine to start with these. It's pretty easy to do this using the already built-in functionality:
> fex.groupby(column_name).mean()
> I think the only thing that yours adds is the renaming of the column names. At some point we need to figure out how to chain or pass a list of features to extract, or possibly use a pipeline. I know I sound like a broken record, but I really like how pliers solves this problem. It can deal with multiple features in one line and merges them all together in the output automatically.
So pliers has stimulus, extractor, and transformer classes. It's a pretty cool architecture that makes it extensible to almost anything, but it might be overkill for our purposes. Their transformer class also has a really cool feature that treats a set of features and pipelines as a graph and parallelizes them. Lots to learn from their code. It would be great to have feature extractors as methods on our Fex data class. However, because they output data in a different format, in the long term it might make more sense to have an extractor class in the style of the sklearn API: each algorithm or extractor is its own class with a consistent API (e.g., fit, transform). We could do something like pliers, where the output can be a flat dataframe that can then be used for analyses.
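To make the sklearn-style idea concrete, here is a minimal sketch of what one extractor class with a fit/transform interface could look like, plus a helper that merges several extractors into one flat dataframe. `SummaryExtractor`, `extract_all`, and the column-prefix naming are hypothetical names invented for this example; they are not part of feat, pliers, or scikit-learn.

```python
import pandas as pd


class SummaryExtractor:
    """Hypothetical extractor with a fit/transform interface (not part of feat)."""

    def __init__(self, stat="mean", by=None):
        self.stat = stat  # name of a pandas aggregation, e.g. "mean", "min", "max"
        self.by = by      # optional column name to group on

    def fit(self, X, y=None):
        # Nothing to learn for a plain summary; kept so every extractor shares one API.
        return self

    def transform(self, X):
        # Summarize numeric feature columns only, leaving out the grouping column.
        features = X if self.by is None else X.drop(columns=[self.by])
        features = features.select_dtypes(include="number")
        if self.by is None:
            # One row, one column per feature.
            out = features.agg(self.stat).to_frame().T
        else:
            # One row per group.
            out = features.groupby(X[self.by]).agg(self.stat)
        return out.add_prefix(f"{self.stat}_")

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


def extract_all(X, extractors):
    """Run each extractor and merge the results side by side into one flat table."""
    return pd.concat([e.fit_transform(X) for e in extractors], axis=1)
```

With this shape, passing a list of features becomes a call such as `extract_all(data, [SummaryExtractor("mean", by="subject"), SummaryExtractor("max", by="subject")])`, and new extractors only need to implement the same two methods.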
I would say we are still in the exploration phase, so we should try things a few different ways and see if any designs really feel natural for a variety of use cases. I would recommend merging, but we might remove all of these methods later if we come up with a better, cleaner way to do this.
Sounds good! I agree that there will be a lot to try. I will merge this for now and then start looking into some of the ideas that you mentioned, @ljchang.
Let me know what you all think of this method of feature extraction. The output format is the same as the boft extractor (1 row, and a column for each feature), and specifying the 'by' argument lets users group observations by other features in the data before summarizing (e.g., by subject, trial, or anything else). By default, the functions summarize the data across all rows.
This is all default pandas functionality too, so it is quick and easy.
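For readers who want to see the described behavior in code, the following is a sketch of a mean extractor written as a standalone function on a plain DataFrame. The function name `extract_mean`, its signature, and the `mean_` column prefix are assumptions made for illustration; the methods actually added in this PR live on the Fex class and may differ.

```python
import pandas as pd


def extract_mean(data, by=None):
    """Hypothetical stand-in for the PR's mean extractor; the real signature may differ."""
    if by is None:
        # DataFrame.mean() returns a Series; transpose it into a single row.
        out = data.mean(numeric_only=True).to_frame().T
    else:
        # Group first, then summarize: one row per group.
        out = data.groupby(by).mean(numeric_only=True)
    # Rename columns so the summary statistic is visible in the output.
    out.columns = ["mean_" + str(c) for c in out.columns]
    return out


# Made-up data for the usage example.
data = pd.DataFrame({
    "trial": ["a", "a", "b", "b"],
    "AU01": [1.0, 3.0, 5.0, 7.0],
    "AU02": [0.0, 2.0, 4.0, 6.0],
})

overall = extract_mean(data)               # 1 row, one column per feature
per_trial = extract_mean(data, by="trial") # 1 row per trial
```

The min and max versions would follow the same pattern with a different pandas summary call.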