Add Adult Census dataset description #659

ArturoAmorQ · 2022-09-08T14:40:13Z

Fixes #657.

This PR is potentially controversial on several counts: it adds a new dependency, the wording may need correcting, and the interpretation given may not be accurate.

All feedback is welcome!

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>


review-notebook-app · 2022-09-08T14:40:19Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

ogrisel

I think it's problematic to discuss and draw conclusions from the distributions of (gender- or age-based) sub-groups without properly taking fnlwgt into account. Furthermore, I am not certain how we would do this: I tried to see whether fnlwgt can be used to recover the expected approximately 50%/50% relative representation of Male and Female adults in the US population, but I failed.

We would have to dig into the origin of this dataset to find out how it was built and how to properly use fnlwgt to draw conclusions about such sub-group distributions, but I think this goes beyond what we want to achieve with this MOOC.

ogrisel · 2022-10-07T14:25:47Z

For the record, here is the quick check I made:

>>> from sklearn.datasets import fetch_openml
>>> X, y = fetch_openml("adult", return_X_y=True)
>>> (X["sex"] == "Male").mean()
0.6684820441423365
>>> (X["sex"] != "Male").mean()
0.33151795585766347
>>> ((X["sex"] == "Male") * X['fnlwgt']).sum() / X['fnlwgt'].sum()
0.6757528069510574
>>> ((X["sex"] != "Male") * X['fnlwgt']).sum() / X['fnlwgt'].sum()
0.32424719304894256
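
The same comparison can be sketched as a small reusable helper. This is only an illustration: `group_shares` is a hypothetical name, not part of scikit-learn or pandas, and the toy values below are made up rather than taken from the dataset. It assumes that summing the weights per group and dividing by the total weight is a sensible way to use fnlwgt, which is exactly the point in doubt here.

```python
from collections import defaultdict


def group_shares(labels, weights=None):
    """Return each label's share of the total, optionally weighted.

    Hypothetical helper for illustration only.
    """
    labels = list(labels)
    if weights is None:
        # Unweighted case: every row counts equally.
        weights = [1.0] * len(labels)
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += float(weight)
    grand_total = sum(totals.values())
    return {label: total / grand_total for label, total in totals.items()}


# Toy values, made up for illustration (NOT taken from the Adult Census data).
sex = ["Male", "Male", "Male", "Female"]
fnlwgt = [100, 100, 100, 300]

unweighted = group_shares(sex)
weighted = group_shares(sex, fnlwgt)
print(unweighted)
print(weighted)
```

With the data loaded above, the equivalent call would be `group_shares(X["sex"], X["fnlwgt"])`, which should match the weighted ratios computed by hand in the session.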

ogrisel · 2022-10-07T15:38:08Z

I have the feeling that we can close this PR once #663 is accepted and merged if others agree.

ArturoAmorQ and others added 26 commits March 14, 2022 10:37

Fix learning curve to show overlapped error bars

824705c

Formatting

e28e879

Update python_scripts/cross_validation_sol_01.py

18e880b

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

Make interpretation more detailed

b0dbbfa

Merge branch 'main' of https://github.com/INRIA/scikit-learn-mooc into main

cdc9fbb

Merge branch 'main' of https://github.com/INRIA/scikit-learn-mooc into main

b083565

Merge branch 'main' of https://github.com/INRIA/scikit-learn-mooc into main

6e73ad8

Merge branch 'main' of https://github.com/INRIA/scikit-learn-mooc into main

d650f26

Merge branch 'main' of https://github.com/INRIA/scikit-learn-mooc into main

c240fc1

Merge branch 'main' of https://github.com/INRIA/scikit-learn-mooc into main

9c3b647

Merge branch 'main' of https://github.com/INRIA/scikit-learn-mooc into main

2d8777e

Merge branch 'main' of https://github.com/INRIA/scikit-learn-mooc into main

5adba0c

Merge branch 'main' of https://github.com/INRIA/scikit-learn-mooc into main

eab86dc

Merge branch 'main' of https://github.com/INRIA/scikit-learn-mooc into main

58ec874

Merge branch 'main' of https://github.com/INRIA/scikit-learn-mooc into main

d9e02c6

Merge branch 'main' of https://github.com/INRIA/scikit-learn-mooc into main

42e41dc

Merge branch 'main' of https://github.com/INRIA/scikit-learn-mooc into main

3c68620

Merge branch 'main' of https://github.com/INRIA/scikit-learn-mooc into main

b294bd6

Merge branch 'main' of github.com:INRIA/scikit-learn-mooc into main

206fbcd

Merge branch 'main' of github.com:INRIA/scikit-learn-mooc into main

421fb66

Add ptitprince to dependencies

0f1c387

Add .py description of adult census dataset

5961bf0

Add .ipynb description of adult census dataset

ca24caf

Add dataset description to TOC file

29e220e

Remove minimalist dataset description md file

e8358c6

Add reference to dataset description from intro notebook

f27368b

ArturoAmorQ added 2 commits September 12, 2022 12:02

Wording

9813ce0

Fix wrong definition for prevalence

704f607

ogrisel reviewed Oct 7, 2022


ArturoAmorQ closed this Oct 10, 2022

lesteve mentioned this pull request Oct 12, 2022

Add Adult Census dataset description #657

Closed

ArturoAmorQ deleted the adult_census branch November 2, 2023 10:51
