Add link to open notebooks in mybinder #190

Open: wants to merge 1 commit into master

BioGeek commented Mar 5, 2018

I think opening the notebooks via mybinder is the easiest way to get started: there's no need to install anything, and the code is executable, so you can play around with it.

BioGeek (Author) commented Mar 5, 2018

This could perhaps be combined with #105, which proposes something similar using another service.

ageron (Owner) commented Mar 15, 2018

Hi @BioGeek,

Thanks for your contribution! I really loved Binder, and I actually started the repo with it activated, but sadly I had to remove it a year ago because it was too unstable at the time (see 002fd66).
Is it more reliable now?

Moreover, it was pretty hard to make chapter 16 work properly because it requires a headless X server and other tweaks, and I really struggled to get deterministic output. I would need to spend some time testing this again, and unfortunately I don't have much time right now.

However, if you would be so kind as to make sure that all the notebooks work well within Binder, especially chapter 16, then I would more than happily merge this PR.

Thanks again,
Aurélien

Jiltedboy commented Aug 3, 2018

Hi, sorry for commenting here, but somehow I wasn't able to post in the right place.

It seems the updated code for the end-to-end machine learning project (chapter 2) does not work for the code below. I have included the error as well. Sorry, I am very new to machine learning.

cat_attribs = ["ocean_proximity"]
old_cat_pipeline = Pipeline([
        ('selector', OldDataFrameSelector(cat_attribs)),
        ('cat_encoder', OneHotEncoder(sparse=False)),
    ])
    
# Join both of the above pipelines into a single pipeline
from sklearn.pipeline import FeatureUnion

old_full_pipeline = FeatureUnion(transformer_list=[
        ("num_pipeline", old_num_pipeline),
        ("cat_pipeline", old_cat_pipeline),
    ])

old_housing_prepared = old_full_pipeline.fit_transform(housing)
old_housing_prepared

Error:

  File "/anaconda3/lib/python3.6/site-packages/sklearn/preprocessing/data.py", line 1809, in _transform_selected
    X = check_array(X, accept_sparse='csc', copy=copy, dtype=FLOAT_DTYPES)
  File "/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py", line 433, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: could not convert string to float: 'NEAR BAY'
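The last frame shows the root cause: the pre-0.20 OneHotEncoder passes its input through check_array with a float dtype, and NumPy cannot parse a category label such as 'NEAR BAY' as a number. A minimal sketch reproducing only that conversion step, assuming a small object array as an illustrative stand-in for the ocean_proximity column (not the real dataset):

```python
import numpy as np

# Illustrative stand-in for the "ocean_proximity" column (not the real dataset)
categories = np.array([["NEAR BAY"], ["INLAND"]], dtype=object)

try:
    # Same kind of cast that check_array(..., dtype=FLOAT_DTYPES) performs
    np.array(categories, dtype=np.float64)
    message = None
except ValueError as err:
    message = str(err)

print(message)  # could not convert string to float: 'NEAR BAY'
```

Any encoder that starts by casting its input to floats will fail this way on text categories, which is why a string-aware encoder is needed.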

daniel-s-ingram (Contributor) commented Aug 3, 2018

Hi @Jiltedboy, it looks like you're importing OneHotEncoder from sklearn.preprocessing. That version does not yet support string categorical inputs, so import OneHotEncoder from the future_encoders module provided here: https://github.com/ageron/handson-ml/blob/master/future_encoders.py
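To illustrate, here is a minimal sketch using the OneHotEncoder that ships with Scikit-Learn 0.20 and later (the version future_encoders.py provides ahead of that release), which accepts string categories directly. The housing_cat array is an illustrative stand-in for the selector's output, and the .toarray() call assumes the encoder's default sparse output:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder  # Scikit-Learn >= 0.20 accepts strings

# Illustrative stand-in for the selected "ocean_proximity" column
housing_cat = np.array([["NEAR BAY"], ["INLAND"], ["NEAR BAY"], ["<1H OCEAN"]])

cat_pipeline = Pipeline([
    # Default output is a SciPy sparse matrix, converted to dense below
    ("cat_encoder", OneHotEncoder()),
])

housing_cat_1hot = cat_pipeline.fit_transform(housing_cat).toarray()
print(cat_pipeline.named_steps["cat_encoder"].categories_)
print(housing_cat_1hot)
```

Each category learned during fit becomes one output column, in sorted order.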

Jiltedboy commented Aug 4, 2018

@daniel-s-ingram OK, got it. But LabelBinarizer() should work, and that isn't working either:

old_cat_pipeline = Pipeline([
        ('selector', OldDataFrameSelector(cat_attribs)),
        ('cat_encoder', LabelBinarizer()),
    ])

Executing the line old_housing_prepared = old_full_pipeline.fit_transform(housing) throws the following error:

TypeError: fit_transform() takes 2 positional arguments but 3 were given

Is there no way to use LabelEncoder and OneHotEncoder inside the Pipeline? I thought LabelBinarizer was one option, but it doesn't seem to work.

ageron (Owner) commented Aug 4, 2018

Hi @Jiltedboy,
The LabelBinarizer is designed to process labels, not input features. Its fit_transform() method has the following signature: def fit_transform(self, y). Notice that there is no X. Unfortunately when you put the LabelBinarizer in a pipeline, it will be passed both X and y, hence the issue.
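The mismatch is plain Python behavior and can be reproduced without Scikit-Learn. In this sketch, LabelOnlyTransformer is a hypothetical stand-in whose fit_transform() accepts only y, like LabelBinarizer, and call_like_a_pipeline() is a hypothetical helper that mimics a pipeline passing both X and y to a step:

```python
class LabelOnlyTransformer:
    """Hypothetical stand-in: like LabelBinarizer, fit_transform() accepts only y."""

    def fit_transform(self, y):
        classes = sorted(set(y))
        # One-hot encode each label against the sorted classes
        return [[int(label == c) for c in classes] for label in y]


def call_like_a_pipeline(step, X, y=None):
    # Hypothetical helper: a pipeline passes both X and y to each step
    return step.fit_transform(X, y)


labels = ["NEAR BAY", "INLAND", "NEAR BAY"]

# Called directly with only the labels, it works
direct = LabelOnlyTransformer().fit_transform(labels)

# Called the way a pipeline calls it, Python rejects the extra positional argument
try:
    call_like_a_pipeline(LabelOnlyTransformer(), labels)
    error = None
except TypeError as err:
    error = str(err)

print(direct)
print(error)
```

The extra y argument is what turns "2 positional arguments" into "3 were given".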

One workaround is to wrap the LabelBinarizer inside your own transformer class:

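from sklearn.preprocessing import LabelBinarizer

class PipelineFriendlyLabelBinarizer(LabelBinarizer):
    # Accept (X, y) like a feature transformer, but binarize the single column of X
    def fit(self, X, y=None):
        return super().fit(X.ravel())
    def fit_transform(self, X, y=None):
        return super().fit_transform(X.ravel())
    def transform(self, X):
        return super().transform(X.ravel())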
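A short usage sketch of this wrapper inside a Pipeline, assuming the selector step has already produced a single-column NumPy array (cat_values below is an illustrative stand-in for that output). The class is repeated, with a fit() override so that calling fit() on the pipeline also works, so the snippet runs on its own:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelBinarizer


class PipelineFriendlyLabelBinarizer(LabelBinarizer):
    # Accept (X, y) like a feature transformer, but binarize the single column of X
    def fit(self, X, y=None):
        return super().fit(X.ravel())

    def fit_transform(self, X, y=None):
        return super().fit_transform(X.ravel())

    def transform(self, X):
        return super().transform(X.ravel())


# Illustrative stand-in for the selector's output: one categorical column
cat_values = np.array([["NEAR BAY"], ["INLAND"], ["<1H OCEAN"], ["NEAR BAY"]])

cat_pipeline = Pipeline([("cat_encoder", PipelineFriendlyLabelBinarizer())])
cat_1hot = cat_pipeline.fit_transform(cat_values)

print(cat_pipeline.named_steps["cat_encoder"].classes_)
print(cat_1hot)
```

Because ravel() flattens the input, passing two or more columns would mix their values into one label set, which is the one-column limitation described next.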

Note that this transformer will only be able to handle one column at a time, because that's one limitation of LabelBinarizers. So my recommendation would be to check out the latest version of the notebook for chapter 2, where I use a ColumnTransformer and a OneHotEncoder (imported from the future_encoders.py module until Scikit-Learn 0.20 is released).
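A minimal sketch of that approach, assuming Scikit-Learn 0.20 or later, where ColumnTransformer and the string-aware OneHotEncoder ship with the library. The small housing DataFrame and the scaler-only numerical pipeline are illustrative assumptions, not the notebook's exact code:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative stand-in for the housing DataFrame: two numerical columns, one categorical
housing = pd.DataFrame({
    "median_income": [8.3, 3.1, 5.6, 2.4],
    "total_rooms": [880.0, 1467.0, 1274.0, 919.0],
    "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR BAY", "<1H OCEAN"],
})

num_attribs = ["median_income", "total_rooms"]
cat_attribs = ["ocean_proximity"]

# Simplified numerical pipeline for illustration
num_pipeline = Pipeline([("std_scaler", StandardScaler())])

# ColumnTransformer selects columns by name, so no custom selector class is needed
full_pipeline = ColumnTransformer([
    ("num", num_pipeline, num_attribs),
    ("cat", OneHotEncoder(), cat_attribs),
])

housing_prepared = full_pipeline.fit_transform(housing)
print(housing_prepared.shape)
```

The output has one column per numerical attribute followed by one column per category, replacing both the selector classes and the FeatureUnion.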
Hope this helps,
Aurélien
PS: thanks for your kind help, @daniel-s-ingram

daniel-s-ingram (Contributor)

No problem, @ageron!

Jiltedboy commented Aug 4, 2018

@daniel-s-ingram @ageron Thank you both for your time and replies. It really means a lot to me. And yes, I'm going to use future_encoders.py for now :) Sorry for bothering you :)

rovin235 commented Oct 3, 2018

@Jiltedboy, is the problem solved?

Jiltedboy

@rovin235 Yes, problem solved.

jirkalhotka mentioned this pull request Dec 18, 2018