
Data lab #8

Open
wants to merge 31 commits into master

Conversation

anuragkapale
Collaborator

No description provided.


# TODO: Further clean the design of this file
class AutoKaggle(BaseEstimator):
pipeline = None


Move the class variables to instance variables.
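A minimal sketch of the suggested change, using a plain class for brevity (the real class extends sklearn's `BaseEstimator`): mutable state such as `pipeline` moves from the class body into `__init__`, so instances no longer share it.

```python
class AutoKaggle:
    def __init__(self, config=None):
        self.config = config
        self.pipeline = None        # was a class variable
        self.p_hparams_base = None  # was a class variable


a, b = AutoKaggle(), AutoKaggle()
a.pipeline = "fitted"
# b is unaffected: each instance now owns its own attribute.
```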

p_hparams_base = None

def __init__(self, config=None, **kwargs):
"""


Follow the autokeras docstring style.
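For reference, autokeras-style docstrings use a `# Arguments` section. A hedged sketch — the constructor description below is illustrative, not the project's actual wording:

```python
class AutoKaggle:
    def __init__(self, config=None, **kwargs):
        """An automated machine-learning estimator for tabular data.

        # Arguments
            config: A Config instance specifying the search settings.
                Defaults to None, in which case default settings are used.
        """
        self.config = config
```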

import hyperopt
from hyperopt import tpe, hp, fmin, Trials, STATUS_OK, STATUS_FAIL
from sklearn.model_selection import cross_val_score
from autokaggle.ensemblers import RankedEnsembler, StackingEnsembler


Import modules instead of classes.
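Illustrated with the standard library, since autokaggle itself is not importable here. Instead of `from collections import OrderedDict`, import the module and qualify names at the call site, which keeps each name's origin visible and reduces circular-import problems:

```python
import collections

# The call site names the module explicitly:
d = collections.OrderedDict(a=1)
```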

m_hparams_base = None
p_hparams_base = None

def __init__(self, config=None, **kwargs):

Explicitly declare all the arguments instead of using **kwargs.
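A hypothetical explicit signature replacing `**kwargs`; the parameter names below are illustrative, not the project's actual configuration options.

```python
class AutoKaggle:
    def __init__(self, config=None, path=None, verbose=True,
                 time_limit=None, use_ensembling=True):
        self.config = config
        self.path = path
        self.verbose = verbose
        self.time_limit = time_limit
        self.use_ensembling = use_ensembling


model = AutoKaggle(time_limit=600)
# A mistyped option now fails loudly instead of being silently swallowed
# into kwargs:
try:
    AutoKaggle(time_limt=600)
except TypeError:
    pass
```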

x: A numpy.ndarray instance containing the training data.
y: training label vector.
time_limit: remaining time budget.
data_info: meta-features of the dataset, which is an numpy.ndarray describing the

Specify the type: a list of strings.

self.pipeline.fit(x_train, y_train)

def resample(self, x, y):
if self.config.balance_class_dist:


Add docstrings.

return x, y

def subsample(self, x, y, sample_percent):
# TODO: Add way to balance the subsample


Add a docstring to subsample.
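A stdlib-only sketch of the requested docstring shape (the real method operates on numpy arrays; plain lists are used here so the example is self-contained, and the body is a simplification of the TODO'd balancing logic):

```python
import random


def subsample(x, y, sample_percent):
    """Randomly subsample the training data.

    # Arguments
        x: Training features, one row per instance.
        y: Training labels aligned with x.
        sample_percent: Float in (0, 1]; fraction of rows to keep.

    # Returns
        A tuple (x_sub, y_sub) containing the sampled rows.
    """
    n = max(1, int(len(x) * sample_percent))
    idx = random.sample(range(len(x)), n)
    return [x[i] for i in idx], [y[i] for i in idx]


xs, ys = subsample(list(range(10)), list(range(10)), 0.5)
```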

return grid_train_x, grid_train_y

def search(self, x, y, prep_space, model_space):
grid_train_x, grid_train_y = self.subsample(x, y, sample_percent=self.config.subsample_ratio)


Set the maximum line length to 85, and check it in CI using flake8.
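Assuming flake8 is configured through a `setup.cfg` at the repo root (other config files such as `tox.ini` work too), the check could look like:

```ini
[flake8]
max-line-length = 85
```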

np.random.shuffle(best_trials)

if self.config.diverse_ensemble:
estimator_list = self.pick_diverse_estimators(best_trials, self.config.num_estimators_ensemble)


Remove the second argument.

return np.array(data_info)


class AutoKaggleClassifier(AutoKaggle):


Rename to "Classifier".

return score_metric, skf


class AutoKaggleRegressor(AutoKaggle):


Rename to "Regressor".

self.ensembling_algo = hyperopt.rand.suggest if ensembling_algo == 'random' else hyperopt.tpe.suggest
self.num_p_hparams = num_p_hparams

def update(self, options):


Add a docstring.

setattr(self, k, v)


knn_classifier_params = {


Use all capital letters for constants.

}
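A sketch of the renaming; the dictionary contents below are hypothetical placeholders, not the project's actual hyperparameter search space.

```python
# Module-level constant, upper-cased per PEP 8 naming conventions.
KNN_CLASSIFIER_PARAMS = {
    'n_neighbors': [3, 5, 7, 11],  # hypothetical contents
}
```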


class RankedEnsembler:

Extract a base class; the base-class methods should raise NotImplementedError.

  1. Extend the object class.
  2. Rename to RankEnsembleModel.
  3. Add docstrings.
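A sketch of the suggested refactor; `EnsembleModel` is a hypothetical name for the extracted base class, and the subclass body is a stub.

```python
class EnsembleModel(object):
    """Base class for ensemblers.

    Subclasses must implement fit() and predict().
    """

    def fit(self, predictions, y):
        raise NotImplementedError

    def predict(self, predictions):
        raise NotImplementedError


class RankEnsembleModel(EnsembleModel):
    """Renamed from RankedEnsembler, per the review."""

    def fit(self, predictions, y):
        self.fitted_ = True  # stub: real fitting logic goes here
        return self
```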

self.stacking_estimator = self.search(predictions, y_val)
self.stacking_estimator.fit(predictions, y_val)

def search(self, x, y):


Add a docstring.

LEVEL_HIGH = 32


class TabularPreprocessor(TransformerMixin):


Rename this class.
