replace/filter interfaces for DataFrames #43
Comments
I like the SubDataFrame idea. We could use nafilter and nareplace for that. Or, maybe those should generate a new df, and nafilter_sub and nareplace_sub could return SubDataFrames. complete_cases(df) could return the row index of complete cases. Maybe that's all we need. Then, the user could do sub(df, complete_cases(df)) or df[complete_cases(df),:]. |
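To make the copy-versus-view distinction above concrete, here is a minimal, language-neutral sketch in plain Python. It is not the package's API: `NA`, `take_rows`, and `SubTable` are hypothetical stand-ins, and the table is just a dict of equal-length columns. `complete_cases` returns row indices, which can then drive either a copy (like `df[complete_cases(df),:]`) or a view (like `sub(df, complete_cases(df))`).

```python
NA = None  # stand-in for a missing value (assumption of this sketch)


def complete_cases(table):
    """Return the indices of rows that have no NA in any column."""
    n_rows = len(next(iter(table.values()))) if table else 0
    return [
        i for i in range(n_rows)
        if all(col[i] is not NA for col in table.values())
    ]


def take_rows(table, rows):
    """Copy the selected rows into a new table (the row-indexing case)."""
    return {name: [col[i] for i in rows] for name, col in table.items()}


class SubTable:
    """A view: keeps a reference to the parent plus row indices, copies nothing."""

    def __init__(self, parent, rows):
        self.parent = parent
        self.rows = list(rows)

    def column(self, name):
        # Values are looked up in the parent each time, so parent edits show through.
        return [self.parent[name][i] for i in self.rows]


df = {"x": [1, NA, 3, 4], "y": [10.0, 20.0, NA, 40.0]}
rows = complete_cases(df)     # indices of rows without any NA
copied = take_rows(df, rows)  # independent copy of those rows
view = SubTable(df, rows)     # lightweight view onto the same rows
```

The view stays cheap and reflects later changes to the parent, while the copy is safe to mutate independently; that trade-off is the reason to offer both forms.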
I often find myself wanting |
This is basically what subset does. We could rename it to filter. -- John On Jun 28, 2013, at 9:41 AM, "Viral B. Shah" notifications@github.com wrote:
|
I wonder how I missed It would be nice to rename it to |
We need to have a big conversation about documentation formats next week. |
I like the name subset better than filter. Also, just plain row indexing gives you a copy of a subset of a DataFrame. On Fri, Jun 28, 2013 at 2:46 PM, John Myles White
|
Now that I know about |
We could fix the docs. I do kind of like only having |
THIS. We should work that into our manual / philosophy somewhere. |
Hopefully we can fix the typo before we do. |
I can't even spot the typo after reading it multiple times... |
"one of the things I like Julia" -> "one of the things I like ABOUT Julia" |
I think this is related, but I can't figure out how to use
Changing the
My recommendation would be to add a constructor to |
The existing nafilter/Filter and similar methods, and their flags in the DataVecs, seem limited when it comes to working with DataFrames. I like the way that different columns have different behaviors (replace/filter modes), but it's not clear how to combine them. For example, if building a model matrix for an OLS model, do you do a complete_cases operation? The naFilter iterator generator doesn't really work usefully in that context.
One option would be to have a filter_nas() method that generates a SubDataFrame without any rows that contained an NA in a column with filtering mode set. The result could then be iterated over row-wise, with NAs being replaced in any columns in replace mode. Other options and variations are certainly possible.
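The proposal above could be sketched roughly as follows, in plain Python rather than against the real package. Everything here is an assumption made for illustration: the `modes` mapping (column name to a `(mode, replacement)` pair), the `iter_rows` helper, and the use of `None` as NA. `filter_nas` here returns the kept row indices, which is the information a SubDataFrame would need to carry.

```python
NA = None  # stand-in for a missing value (assumption of this sketch)


def filter_nas(table, modes):
    """Return indices of rows with no NA in any column whose mode is 'filter'."""
    filter_cols = [name for name, (mode, _) in modes.items() if mode == "filter"]
    n_rows = len(next(iter(table.values()))) if table else 0
    return [
        i for i in range(n_rows)
        if all(table[name][i] is not NA for name in filter_cols)
    ]


def iter_rows(table, rows, modes):
    """Yield each kept row as a dict, replacing NAs in 'replace'-mode columns."""
    for i in rows:
        row = {}
        for name, col in table.items():
            value = col[i]
            mode, fill = modes.get(name, ("none", None))
            if value is NA and mode == "replace":
                value = fill
            row[name] = value
        yield row


df = {
    "y": [1.0, NA, 3.0, 4.0],
    "x1": [0.5, 0.1, NA, 0.2],
    "x2": ["a", "b", "c", NA],
}
# Hypothetical per-column settings: drop rows where y is NA, fill x1 with 0.0.
modes = {"y": ("filter", None), "x1": ("replace", 0.0)}

kept = filter_nas(df, modes)
result = list(iter_rows(df, kept, modes))
```

Columns with no mode set (here `x2`) pass their NAs through unchanged, so a caller building a model matrix would still need to decide how to treat them.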
See also #4.