| " \"second\": (df2, 'id', None, {'words': NaturalLanguage}),\n", | ||
| " }\n", | ||
| "\n", | ||
| "es = ft.EntitySet(\"data\", entities, )\n", |
i wonder if we should just use one of the demo datasets? perhaps we need to add a demo dataset with nulls.
i think it's better for examples to use standardized datasets, mostly so the user has consistency as they read the docs, but also to avoid potential mistakes like the example here missing a relationship between the entities
Yeah, that's a good point.
The reason I hadn't used the demo dataset here was because it seemed like the best way to show the behavior was to have small, contrived examples. It's also why there's no relationship between the two entities in the example; it just added more columns to the results that weren't proving the point of the guide.
Would it be helpful to add a section with a full demo dataset to show feature selection on more plausible data? Kind of fits along with the comment below about showing how it can be used with EvalML
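For reference, the "small, contrived example" idea can be sketched without featuretools at all. The snippet below is only an illustration under stated assumptions: `feature_matrix` is a plain dict standing in for a real feature matrix, `drop_highly_null` and `null_fraction` are hypothetical helpers, and the `pct_null_threshold` name and the 0.7 value are chosen for the demo. It is not the library's implementation.

```python
# Hypothetical stand-in for a feature matrix: column name -> list of values.
# None marks a missing value. This is NOT featuretools' implementation.
feature_matrix = {
    "id": [1, 2, 3, 4],
    "mostly_null": [None, None, None, 7],
    "all_null": [None, None, None, None],
    "complete": [0.5, 1.5, 2.5, 3.5],
}


def null_fraction(values):
    """Return the fraction of entries in `values` that are None."""
    return sum(v is None for v in values) / len(values)


def drop_highly_null(matrix, pct_null_threshold=0.95):
    """Keep only columns whose null fraction is below the threshold."""
    return {
        name: values
        for name, values in matrix.items()
        if null_fraction(values) < pct_null_threshold
    }


# With a 0.7 threshold, "mostly_null" (75% null) and "all_null" are dropped.
kept = drop_highly_null(feature_matrix, pct_null_threshold=0.7)
print(sorted(kept))  # kept columns: complete, id
```

A tiny table like this makes it obvious which columns should disappear, which is the trade-off against a standardized demo dataset discussed above.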
| @@ -0,0 +1,304 @@ | |||
| { | |||
in this guide, it would make sense to add a section about using evalml for feature selection. we can do it in the PR, or do that as a follow up.
or even maybe we add a guide to the evalml docs about using evalml for feature selection and link to it from here. that is probably a better solution
I'll create an issue in EvalML to add a guide, and we can link the guide in here later. @kmax12
Switched to using the flight demo dataset. Used the Also added sections that list the features removed to hopefully make the results of running the functions more clear
@kmax12 thoughts on the dataset change / showing all dropped features?
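One way to produce a "features removed" listing like the one mentioned above is an order-preserving difference between the feature names before and after selection. This is a rough sketch: the feature names are made up for illustration, and `removed_features` is a hypothetical helper rather than anything in the PR.

```python
def removed_features(original_names, remaining_names):
    """Return names present before selection but absent afterwards, in order."""
    remaining = set(remaining_names)
    return [name for name in original_names if name not in remaining]


# Made-up feature names, purely for illustration.
before = ["flight_id", "COUNT(trip_logs)", "MEAN(trip_logs.delay)"]
after = ["flight_id", "MEAN(trip_logs.delay)"]

dropped = removed_features(before, after)
print(dropped)
```

Keeping the original order makes the listing easier to compare against the full feature list shown earlier in a guide.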
8fec4b7 to 4fdf36d
Codecov Report
@@           Coverage Diff           @@
##             main    #1184   +/-   ##
=======================================
  Coverage   98.60%   98.60%
=======================================
  Files         130      130
  Lines       13932    13932
=======================================
  Hits        13738    13738
  Misses        194      194
Continue to review full report at Codecov.
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "ft.selection.remove_highly_null_features(fm)" |
in the other examples, it looks like we pass in features=features, would it make sense to do the same here?
We definitely can! The feature list is optional, and it just impacts whether or not we get an updated feature list back, which I'd added in for the others so that we could highlight which features were removed.
If it's better to be consistent here, happy to pass the feature list in here
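To make the optional feature list concrete, here is a minimal pure-Python sketch of the return convention described above: the matrix alone when no feature list is passed, and a (matrix, features) pair when one is. Everything in it is an assumption for illustration. `remove_all_null_columns` is a hypothetical stand-in, the matrix is a plain dict, and features are plain strings rather than real feature definitions.

```python
def remove_all_null_columns(matrix, features=None):
    """Drop columns that contain only None (hypothetical stand-in).

    Returns the filtered matrix alone when `features` is omitted, and a
    (matrix, features) pair when a feature list is supplied.
    """
    keep = [
        name
        for name, values in matrix.items()
        if any(v is not None for v in values)
    ]
    filtered = {name: matrix[name] for name in keep}
    if features is None:
        return filtered
    # Only features whose column survived are returned.
    return filtered, [f for f in features if f in filtered]


fm = {"a": [1, None], "b": [None, None]}
features = ["a", "b"]

# Without the feature list: just the filtered matrix comes back.
only_matrix = remove_all_null_columns(fm)

# With the feature list: the updated list comes back as well.
new_fm, new_features = remove_all_null_columns(fm, features=features)
```

The second form is what lets a guide show which features were removed, since the returned list can be compared against the original one.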
actually, i don't know if consistency is what is most important. rather, i would make sure it reads as a clear narrative. so it's fine to start without using the parameter and then add it in later, just make sure it is explained
so, i'd make sure to clearly call out and explain the usage of the features parameter in the guide. perhaps the best way to do that would be a note section like we use elsewhere
@kmax12 Got it--added a note when we first use features as a parameter and an extra line highlighting how we use the results
4728c29 to 962b384


closes #1167
Adds a guide for the feature selection functions: