Simplify first module wrap-up quiz to not need a SimpleImputer #361

lesteve · 2021-06-03T07:48:50Z

https://mooc-forums.inria.fr/moocsl/t/m1-wrap-up-quiz-q5-simpleimputer-question/2535
There was some feedback from the beta that the question was hard to answer because it was not clear that a pipeline could be nested. We tried to give more guidance in the question:

- this is the first module wrap-up quiz, so it should not be too hard
- using something that we have not seen or mentioned (missing data imputation) is not a great idea in general
- having to use a complex pipeline, whereas we have only seen simple pipelines
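For context, the kind of nested pipeline the beta quiz required looks roughly like the sketch below: an imputation + scaling pipeline used as one step inside a column transformer, itself used as one step of the outer pipeline. This is a minimal sketch on a hypothetical toy `DataFrame` (the `LotArea` and `Street` columns, the labels and the choice of `LogisticRegression` are assumptions), not the actual quiz correction.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data with missing numerical values (not the quiz dataset)
X = pd.DataFrame({
    "LotArea": [8450.0, np.nan, 11250.0, 9550.0, 14260.0, np.nan],
    "Street": ["Pave", "Pave", "Grvl", "Pave", "Grvl", "Pave"],
})
y = np.array([0, 0, 1, 0, 1, 1])

# Inner pipeline: impute missing numbers with the mean, then scale them
numerical_preprocessor = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())

# The inner pipeline is used as a single transformer inside the column transformer
preprocessor = make_column_transformer(
    (numerical_preprocessor, ["LotArea"]),
    (OneHotEncoder(handle_unknown="ignore"), ["Street"]),
)

# Outer pipeline: preprocessing followed by the classifier
model = make_pipeline(preprocessor, LogisticRegression())
model.fit(X, y)
predictions = model.predict(X)
```

Once the dataset contains no missing values, the `SimpleImputer` step (and therefore the inner pipeline) is no longer needed.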

Proposed solutions:

- do the missing data imputation with pandas in the code we give to load the data (my favourite option personally)
- alternatives? I think we talked about this and there were other proposals but I can't remember (probably partly because I favour the previous option 😉, feel free to edit my post to add them)
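A minimal sketch of the first option, assuming the missing numerical values are filled with the column median and missing categories with a placeholder string; both strategies, the helper name `impute_with_pandas` and the toy columns are hypothetical choices, not what the course ended up doing.

```python
import numpy as np
import pandas as pd


def impute_with_pandas(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fill missing values so no SimpleImputer is needed."""
    df = df.copy()  # leave the caller's DataFrame untouched
    numeric_columns = df.select_dtypes(include="number").columns
    other_columns = df.columns.difference(numeric_columns)
    # Missing numbers are replaced by the median of their column
    df[numeric_columns] = df[numeric_columns].fillna(df[numeric_columns].median())
    # Missing categories are replaced by an explicit placeholder
    df[other_columns] = df[other_columns].fillna("missing")
    return df


# Hypothetical toy data standing in for the CSV file
toy = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 80.0],
    "GarageType": ["Attchd", None, "Detchd"],
})
out = impute_with_pandas(toy)
```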

GaelVaroquaux · 2021-07-16T11:37:22Z

We probably need to create a notebook with a title similar to "Illustration of a rich pipeline: handling missing values"

lesteve · 2021-07-20T09:33:20Z

I think the consensus at the time we discussed it (probably @GaelVaroquaux was not involved though I don't remember for sure) was not to add more content to module 1 and to do the simplest thing, which was removing missing values with a few lines of pandas.

Whether we should talk about imputing missing data somewhere and where to put it, I have to say I don't know.

GaelVaroquaux · 2021-07-20T09:45:32Z

> I think the consensus at the time we discussed it (probably @GaelVaroquaux was not involved though I don't remember for sure) was not to add more content to module 1 and to do the simplest thing, which was removing missing values with a few lines of pandas.

I would even store a simplified dataset that does not have these missing values, to avoid having to discuss this.

lesteve · 2021-07-20T12:53:14Z

> I would even store a simplified dataset that does not have these missing values, to avoid having to discuss this.

Good point, we are using a local CSV file so this is probably the simplest thing to do. It would be nice to add a note about this in `datasets/README.md`.

ArturoAmorQ · 2021-07-21T14:15:56Z

> I would even store a simplified dataset that does not have these missing values, to avoid having to discuss this.

Some features, such as 'Alley', 'PoolQC', 'Fence' and 'MiscFeature', have more than 500 NA values.
One solution could be to remove these columns from the CSV file and then drop the rows with missing values, either in the CSV file or with a simple `dropna()` directly in the notebook. It's a matter of taste.
In either case, that leaves us with 1094 of the original 1460 entries.
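The two-step cleaning described above (drop the sparsest columns, then drop the remaining incomplete rows) can be sketched with pandas as follows. The helper name `drop_sparse_columns_then_rows`, its `max_missing` parameter and the toy data are hypothetical; on the real file the threshold discussed here would be around 500 missing values.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns_then_rows(df: pd.DataFrame, max_missing: int) -> pd.DataFrame:
    """Hypothetical helper: remove very sparse columns, then incomplete rows."""
    # Count missing values per column and keep only the columns under the threshold
    missing_per_column = df.isna().sum()
    kept_columns = missing_per_column[missing_per_column <= max_missing].index
    reduced = df.loc[:, kept_columns]
    # Any row that still contains a missing value is dropped
    return reduced.dropna()


# Hypothetical toy data standing in for the CSV file
toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550],
    "LotFrontage": [65.0, 80.0, np.nan, 60.0],
    "Alley": [np.nan, np.nan, np.nan, "Grvl"],
})
clean = drop_sparse_columns_then_rows(toy, max_missing=2)
```

Here `Alley` (3 missing values) is removed as a column, while `LotFrontage` (1 missing value) is kept and only its incomplete row is dropped. Writing the result back with `to_csv` would give the simplified CSV file, so the quiz itself needs no `dropna()`.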

Erasing columns means that we will have to adjust the rest of the questions and hints accordingly. What do you think?

lesteve · 2021-07-21T14:29:20Z

We can remove them directly in the CSV file; this way the quiz instructions are a bit simpler (and we don't have to explain that we are dropping NaNs or why we are doing it).

> Erasing columns means that we will have to adjust the rest of the questions and hints accordingly. What do you think?

Good points. I guess that means we may need to change quite a lot of the quiz (for example, the correction will change since we no longer need a SimpleImputer). We may want to wait before tackling this issue: IMO we need to decide on a rough strategy for quiz changes, and the main question is who is going to do the manual updates in FUN. The next meeting is a good occasion to talk about this last point.

lesteve · 2021-07-23T12:53:37Z

So we agreed to:

- remove SimpleImputer from the wrap-up quiz. We need to recheck the entire quiz and adapt it. This is the point of this issue.
- put missing values, imputation and "advanced pipelines" into a separate module as an ambitious goal, and reevaluate depending on how fast we progress on less complicated things: Add "advanced pipeline", missing value, imputing module, maybe more #414

lesteve added this to the MOOC 2.0 milestone Jun 3, 2021

lesteve changed the title ~~Simplify first module wrap-up quiz to not need a SimpleImuter~~ Simplify first module wrap-up quiz to not need a SimpleImputer Jun 3, 2021

GaelVaroquaux added the enhancement New feature or request label Jul 16, 2021

ArturoAmorQ mentioned this issue Jul 19, 2021

Proposal for reordering the contents in M1 and M2 #398

Closed

glemaitre self-assigned this Aug 3, 2021

glemaitre mentioned this issue Aug 3, 2021

MNT make a dataset containing no missing values #425

Merged


lesteve closed this as completed Jan 6, 2022

lesteve mentioned this issue Jan 6, 2022

Idea about being more directive in wrap-up quiz to avoid coding variation giving the wrong answer #419

Closed
