Some bugs in the generated code with feature selection and scaler #69

kadarakos · 2016-01-02T14:27:33Z

I ran a couple of experiments on MNIST and found that the code generation is a bit buggy at the moment. In the first example, the only operator that appears in the generated code is SelectPercentile:

import numpy as np
import pandas as pd

from sklearn.cross_validation import StratifiedShuffleSplit
from sklearn.feature_selection import SelectPercentile
from sklearn.feature_selection import f_classif
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# NOTE: Make sure that the class is labeled 'class' in the data file
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR')
training_indices, testing_indices = next(iter(StratifiedShuffleSplit(tpot_data['class'].values, n_iter=1, train_size=0.75, test_size=0.25)))


# Use Scikit-learn's SelectPercentile for feature selection
training_features = result2.loc[training_indices].drop('class', axis=1)
training_class_vals = result2.loc[training_indices, 'class'].values

if len(training_features.columns.values) == 0:
result3 = result2.copy()
else:
selector = SelectPercentile(f_classif, percentile=100)
selector.fit(training_features.values, training_class_vals)
mask = selector.get_support(True)
mask_cols = list(training_features.iloc[:, mask].columns) + ['class']
result3 = result2[mask_cols]

No indentation inside the if/else blocks
result2 is not defined
optimized_pipeline_ contains _select_percentile, svc, _standard_scaler, but svc and _standard_scaler don't appear in the generated code
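The first two symptoms (missing indentation and an undefined variable) can be caught without running the exported script. The following is a minimal sketch using only the standard library; `check_generated_code` is a hypothetical helper, not part of TPOT, and the undefined-name check is deliberately crude: it only flags names that are read but never imported or assigned anywhere in a flat script.

```python
import ast
import builtins


def check_generated_code(source):
    """Return a list of problems found in an exported pipeline script.

    Hypothetical helper for illustration; not part of TPOT.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        # IndentationError is a subclass of SyntaxError, so missing
        # indentation under if/else is reported here.
        return ["{}: {}".format(type(err).__name__, err.msg)]

    # Names bound anywhere in the script: builtins, imports, assignment targets.
    defined = set(dir(builtins))
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                used.add(node.id)

    # Crude check: a name that is read but never bound anywhere is undefined.
    return ["undefined name: " + name for name in sorted(used - defined)]


# An unindented if-body fails to parse at all.
print(check_generated_code("if True:\nx = 1\n"))
# A script that reads result2 without ever assigning it.
print(check_generated_code("import pandas as pd\nresult3 = result2.copy()\n"))
```

Because the check stops at the first syntax error, the undefined-name pass only runs once the indentation problem is fixed.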

Another example with RobustScaler:

import numpy as np
import pandas as pd

from sklearn.cross_validation import StratifiedShuffleSplit
from sklearn.feature_selection import SelectPercentile
from sklearn.feature_selection import f_classif
from sklearn.preprocessing import RobustScaler
from sklearn.svm import SVC

# NOTE: Make sure that the class is labeled 'class' in the data file
tpot_data = pd.read_csv('PATH/TO/DATA/FILE', sep='COLUMN_SEPARATOR')
training_indices, testing_indices = next(iter(StratifiedShuffleSplit(tpot_data['class'].values, n_iter=1, train_size=0.75, test_size=0.25)))


# Use Scikit-learn's RobustScaler to scale the features
training_features = result3.loc[training_indices].drop('class', axis=1)
result4 = result3.copy()

if len(training_features.columns.values) > 0:
scaler = RobustScaler()
scaler.fit(training_features.values.astype(np.float64))
scaled_features = scaler.transform(result4.drop('class', axis=1).values.astype(np.float64))

for col_num, column in enumerate(result4.drop('class', axis=1).columns.values):
    result4.loc[:, column] = scaled_features[:, col_num]

No indentation inside the if block
result3 is not defined
optimized_pipeline_ contains _robust_scaler, svc, svc, _select_percentile, but svc, svc and _select_percentile don't appear in the generated code
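The last symptom in both reports (operators present in optimized_pipeline_ but absent from the exported code) can be checked at the text level, which also works while the script still fails to parse. This is only a sketch: `missing_operators` is a hypothetical helper, and the `OPERATOR_TO_CLASS` mapping is an assumption built from the operator names and imports quoted in this issue, not taken from TPOT itself.

```python
import re

# Assumed mapping from the operator names quoted in this issue to the
# scikit-learn class each one would be expected to instantiate.
OPERATOR_TO_CLASS = {
    "_select_percentile": "SelectPercentile",
    "_standard_scaler": "StandardScaler",
    "_robust_scaler": "RobustScaler",
    "svc": "SVC",
}


def missing_operators(operator_names, source):
    """Return operators whose class is never instantiated in the source.

    Hypothetical helper for illustration. A class counts as instantiated
    when its name is followed by an opening parenthesis, so a bare import
    such as `from sklearn.svm import SVC` does not count.
    """
    missing = []
    # dict.fromkeys removes duplicates (e.g. svc, svc) while keeping order.
    for op in dict.fromkeys(operator_names):
        pattern = r"\b" + re.escape(OPERATOR_TO_CLASS[op]) + r"\s*\("
        if not re.search(pattern, source):
            missing.append(op)
    return missing


# Excerpt of the first exported script: SVC is imported but never used.
exported = (
    "from sklearn.svm import SVC\n"
    "selector = SelectPercentile(f_classif, percentile=100)\n"
)
print(missing_operators(["_select_percentile", "svc", "_standard_scaler"], exported))
```

For the excerpt above, the helper reports svc and _standard_scaler as missing, which matches the first report.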


rhiever · 2016-01-02T15:57:00Z

👍 I noticed some of these bugs when reviewing the code the other day as well. Will look to address these soon.

kadarakos · 2016-01-06T19:29:08Z

I fixed the problem with a very minor bugfix in PR #68

rhiever added the bug label Feb 3, 2016

rhiever closed this as completed Feb 3, 2016

AIAdventures mentioned this issue Jun 6, 2017

Titanic example -problem with 2nd last cell. #492

Closed

saddy001 mentioned this issue Mar 20, 2018

Segfault on optimization process #676

Closed
