ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 13 is different from 11)
Thanks for your question. Make sure you fit estimators only on the training data: call fit(), fit_transform(), or fit_predict() only on the training set, never on other data (such as the validation set, the test set, or new data). In your code, you should therefore replace full_pipeline.fit_transform(some_data) with full_pipeline.transform(some_data). Before you do that, though, the pipeline must already have been fit on the training set.
So the code should look like:
In the full training set, there are 5 distinct values in the ocean_proximity column. That's why, once the full_pipeline is fit on the training set, it outputs a one-hot vector of size 5 for each ocean_proximity value. If some_data is small enough, though, it is likely to contain fewer categories, so calling fit_transform(some_data) refits the encoder on just those categories and produces fewer columns, which is what you observed. If you call only transform(some_data), the pipeline reuses the categories learned from the training set and outputs one-hot vectors of size 5.
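To make the difference concrete, here is a minimal sketch in plain Python. TinyOneHotEncoder is a hypothetical stand-in written for this illustration, not scikit-learn's encoder, and the toy column is an assumption rather than the real housing data. It only shows why refitting on a small subset changes the number of output columns, while reusing a fitted encoder does not.

```python
class TinyOneHotEncoder:
    """Hypothetical stand-in for a one-hot encoder (not scikit-learn's implementation)."""

    def fit(self, values):
        # Learn the list of categories from whatever data is passed to fit().
        self.categories_ = sorted(set(values))
        return self

    def transform(self, values):
        # Encode with the categories learned in fit(), so the output width is fixed.
        index = {category: i for i, category in enumerate(self.categories_)}
        rows = []
        for value in values:
            row = [0] * len(self.categories_)
            row[index[value]] = 1
            rows.append(row)
        return rows

    def fit_transform(self, values):
        # Refits on the given data, then encodes it.
        return self.fit(values).transform(values)


# Toy ocean_proximity column (assumed values) with 5 distinct categories.
training_column = [
    "<1H OCEAN", "INLAND", "ISLAND", "NEAR BAY", "NEAR OCEAN", "INLAND", "<1H OCEAN",
]
some_data = training_column[:3]  # small subset containing only 3 categories

# Fit once on the training data, then only transform the subset.
encoder = TinyOneHotEncoder().fit(training_column)
reused_width = len(encoder.transform(some_data)[0])

# Refitting on the subset learns only the categories present in it.
refit_width = len(TinyOneHotEncoder().fit_transform(some_data)[0])

print("columns with transform():", reused_width)
print("columns with fit_transform() on the subset:", refit_width)
```

With the real pipeline, the equivalent would be to call full_pipeline.fit(housing) once and then full_pipeline.transform(some_data). The numbers in the error message are consistent with this explanation: if the pipeline produces 8 non-categorical columns, then 8 + 5 = 13 columns come from the full training set, and 8 + 3 = 11 would result from a subset that contains only 3 of the categories.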
Hi @ageron ,
I have defined the full pipeline as:
where the num_pipeline is defined as:
Now, when I execute this code:
I get this error:
I noticed that when I pass the whole housing dataset to the full_pipeline, the ocean_proximity column is transformed into 5 different columns, resulting in a total of 13 fields. But when I pass only a subset of the dataset (i.e. housing.iloc[:5]), the transformation is not applied to the ocean_proximity column.
Any suggestions on what could be wrong?
Thanks a lot