Merged

50 commits
ecadd82
Extending Autosklearn. First commit.
ahn1340 Jul 17, 2018
b912d67
Add regression example
ahn1340 Jul 18, 2018
8e6927e
CI: upper bound numpy version due to travis failures
mfeurer Aug 1, 2018
e8130f7
CI: upper bound numpy version due to travis failures
mfeurer Aug 1, 2018
6832382
use tempfile.gettempdir() (#521)
mstreuhofer Aug 1, 2018
df273da
Remove a colon from README.md (#527)
tmielika Aug 13, 2018
8c5e3c7
fixing warnings on non-tuple sequence for indexing (#526)
tmielika Aug 14, 2018
c02dc8f
fix string formatting (#540)
KEggensperger Sep 10, 2018
9e91a33
FIX removing models wrt wrong metric in ensemble (#522)
KEggensperger Sep 10, 2018
8eaa36c
Add examples for extending auto-sklearn.
ahn1340 Oct 2, 2018
e1e8c25
.
ahn1340 Oct 2, 2018
30ec4a2
Merge branch 'development' of https://github.com/automl/auto-sklearn …
ahn1340 Oct 2, 2018
c55cbac
Change datasets used in examples from digits to breast_cancer.
ahn1340 Oct 2, 2018
ef12841
First commit
ahn1340 Oct 9, 2018
242eebf
Fixing codacy errors
ahn1340 Oct 9, 2018
6475623
Fixing bug
ahn1340 Oct 9, 2018
5cab178
[Debug] try different numpy version
ahn1340 Oct 18, 2018
a062ba0
[Debug] Try with latest numpy version
ahn1340 Oct 18, 2018
94f9d2c
Set numpy version to 1.14.5
ahn1340 Oct 18, 2018
b331251
First commit
ahn1340 Oct 9, 2018
3456920
Fixing bug
ahn1340 Oct 9, 2018
6b947c5
Modify flake8_diff.sh
ahn1340 Oct 18, 2018
9927c8f
Merge branch 'pep8_enforce' of https://github.com/ahn1340/auto-sklear…
ahn1340 Oct 18, 2018
c6229e5
Extending Autosklearn. First commit.
ahn1340 Jul 17, 2018
2a98d0c
Add regression example
ahn1340 Jul 18, 2018
a6c53b7
Add examples for extending auto-sklearn.
ahn1340 Oct 2, 2018
9db3e2e
.
ahn1340 Oct 2, 2018
15196ce
Fixing codacy errors
ahn1340 Oct 9, 2018
ba98902
Merge branch 'extend' of https://github.com/ahn1340/auto-sklearn into…
ahn1340 Oct 18, 2018
bfb1e08
Change example (#553)
ahn1340 Oct 19, 2018
9c2c245
[WIP]Add argument for custom logger configuration. (#505)
ahn1340 Oct 19, 2018
3f0ee66
FIX #566: sort ensemble correctly (#567)
mfeurer Oct 19, 2018
80517ca
Fix Line length in example_parallel.py
ahn1340 Oct 19, 2018
2afad9a
Fix line length in example_parallel.py
ahn1340 Oct 19, 2018
c16d7f6
Fix minor error
ahn1340 Oct 19, 2018
c8368f5
Fix codacy error "parameters differ from overridden 'fit' method"
ahn1340 Oct 19, 2018
763aac0
Check target type at the beginning of the fitting process. (#506)
ahn1340 Oct 19, 2018
3cf42b5
Update test_automl.py
mfeurer Oct 19, 2018
88d1554
Add python 3.7 to Travis, change python_requirement in setup.py.
ahn1340 Oct 25, 2018
9b652d5
Add solver hyperparameter in MLP classifier example, increase runtime…
ahn1340 Oct 25, 2018
278f88a
Merge pull request #510 from ahn1340/extend
ahn1340 Oct 25, 2018
aacf24b
Change all occurrences of master to development in flake8_diff.sh
ahn1340 Oct 25, 2018
f9a7b1d
numpy requirement is now >=1.9.0<=1.14.5
ahn1340 Oct 25, 2018
2c07970
Fix requirement inequality mistake
ahn1340 Oct 25, 2018
de3192f
change initial numpy version to 1.14.5.
ahn1340 Oct 26, 2018
c963f75
Merge pull request #562 from ahn1340/pep8_enforce
ahn1340 Oct 29, 2018
56af60d
Circle Drop (#575)
ahn1340 Nov 9, 2018
1b7a172
Update gmeans.py (#572)
theFool32 Nov 9, 2018
6d53d1f
Release 0.4.1 (#576)
ahn1340 Nov 9, 2018
8aae9d6
Update version information for 0.4.1
mfeurer Nov 9, 2018
27 changes: 22 additions & 5 deletions .travis.yml
@@ -16,9 +16,14 @@ matrix:
- os: linux
env: DISTRIB="conda" PYTHON_VERSION="3.5" COVERAGE="true" MINICONDA_URL="https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh"
- os: linux
env: DISTRIB="conda" PYTHON_VERSION="3.6" MINICONDA_URL="https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh"
env: DISTRIB="conda" PYTHON_VERSION="3.6" DOCPUSH="true" MINICONDA_URL="https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh"
- os: linux
env: DISTRIB="conda" PYTHON_VERSION="3.6" EXAMPLES="true" MINICONDA_URL="https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh"
- os: linux
env: DISTRIB="conda" PYTHON_VERSION="3.7" MINICONDA_URL="https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh"
- os: linux
env: DISTRIB="conda" PYTHON_VERSION="3.6" RUN_FLAKE8="true" SKIP_TESTS="true" MINICONDA_URL="https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh"


# Temporarily disabling OSX builds because they take too long
# Set language to generic to not break travis-ci
@@ -58,17 +63,29 @@ before_install:
install:
# Install general requirements the way setup.py suggests
- pip install pep8 codecov
# Temporarily pin the numpy version for travis-ci
- pip install "numpy<=1.14.5"
- cat requirements.txt | xargs -n 1 -L 1 pip install
# Install openml dependency for metadata generation unittest
- pip install xmltodict requests
- pip install git+https://github.com/renatopp/liac-arff
- pip install xmltodict requests liac-arff
- pip install git+https://github.com/openml/openml-python@0b9009b0436fda77d9f7c701bd116aff4158d5e1 --no-deps
- mkdir ~/.openml
- echo "apikey = 610344db6388d9ba34f6db45a3cf71de" > ~/.openml/config
- pip install flake8
# Debug output to know all exact package versions!
- pip freeze
- python setup.py install

script: bash ci_scripts/test.sh
after_success: source ci_scripts/success.sh
after_success: source ci_scripts/success.sh && source ci_scripts/create_doc.sh $TRAVIS_BRANCH "doc_result"

deploy:
provider: pages
skip-cleanup: true
github-token: $GITHUB_TOKEN # set in the settings page of my repository
keep-history: true
committer-from-gh: true
on:
all_branches: true
condition: $doc_result = "success"
local_dir: doc/$TRAVIS_BRANCH
2 changes: 1 addition & 1 deletion README.md
@@ -4,7 +4,7 @@ auto-sklearn is an automated machine learning toolkit and a drop-in replacement

Find the documentation [here](http://automl.github.io/auto-sklearn/)

Status for master branch:
Status for master branch

[![Build Status](https://travis-ci.org/automl/auto-sklearn.svg?branch=master)](https://travis-ci.org/automl/auto-sklearn)
[![Code Health](https://landscape.io/github/automl/auto-sklearn/master/landscape.png)](https://landscape.io/github/automl/auto-sklearn/master)
2 changes: 1 addition & 1 deletion autosklearn/__version__.py
@@ -1,4 +1,4 @@
"""Version information."""

# The following line *must* be the last in the module, exactly as formatted:
__version__ = "0.4.0"
__version__ = "0.4.1"
7 changes: 6 additions & 1 deletion autosklearn/automl.py
@@ -82,6 +82,7 @@ def __init__(self,
disable_evaluator_output=False,
get_smac_object_callback=None,
smac_scenario_args=None,
logging_config=None,
):
super(AutoML, self).__init__()
self._backend = backend
@@ -110,6 +111,7 @@ def __init__(self,
self._disable_evaluator_output = disable_evaluator_output
self._get_smac_object_callback = get_smac_object_callback
self._smac_scenario_args = smac_scenario_args
self.logging_config = logging_config

self._datamanager = None
self._dataset_name = None
@@ -235,7 +237,10 @@ def fit_on_datamanager(self, datamanager, metric):

def _get_logger(self, name):
logger_name = 'AutoML(%d):%s' % (self._seed, name)
setup_logger(os.path.join(self._backend.temporary_directory, '%s.log' % str(logger_name)))
setup_logger(os.path.join(self._backend.temporary_directory,
'%s.log' % str(logger_name)),
self.logging_config,
)
return get_logger(logger_name)

@staticmethod
23 changes: 12 additions & 11 deletions autosklearn/ensemble_builder.py
@@ -171,11 +171,8 @@ def main(self):
while True:

#maximal number of iterations
if (
self.max_iterations is not None
and self.max_iterations > 0
and iteration >= self.max_iterations
):
if (self.max_iterations is not None
and 0 < self.max_iterations <= iteration):
self.logger.info("Terminate ensemble building because of max iterations: %d of %d",
self.max_iterations,
iteration)
@@ -300,7 +297,7 @@ def read_ensemble_preds(self):
Y_TEST: None,
# Lazy keys so far:
# 0 - not loaded
# 1 - loaded and ind memory
# 1 - loaded and in memory
# 2 - loaded but dropped again
"loaded": 0
}
@@ -372,14 +369,18 @@ def get_n_best_preds(self):
],
key=lambda x: x[1],
)))
# remove all that are at most as good as random, cannot assume a
# minimum number here because all kinds of metric can be used
sorted_keys = filter(lambda x: x[1] > 0.001, sorted_keys)
# remove all that are at most as good as random
# note: dummy model must have run_id=1 (there is no run_id=0)
dummy_score = list(filter(lambda x: x[2] == 1, sorted_keys))[0]
self.logger.debug("Use %f as dummy score" %
dummy_score[1])
sorted_keys = filter(lambda x: x[1] > dummy_score[1], sorted_keys)
# remove Dummy Classifier
sorted_keys = list(filter(lambda x: x[2] > 1, sorted_keys))
if not sorted_keys:
# no model left; try to use dummy classifier (num_run==0)
self.logger.warning("No models better than random - using Dummy Classifier!")
# no model left; try to use dummy score (num_run==0)
self.logger.warning("No models better than random - "
"using Dummy Score!")
sorted_keys = [
(k, v["ens_score"], v["num_run"]) for k, v in self.read_preds.items()
if v["seed"] == self.seed and v["num_run"] == 1
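The change above replaces the old hard-coded cutoff (`x[1] > 0.001`) with the dummy model's actual score, so the filter adapts to whatever metric is in use. A minimal sketch of the new selection logic, using made-up `(key, ens_score, num_run)` triples matching the tuple layout built in `get_n_best_preds`:

```python
# Hypothetical (key, ens_score, num_run) triples; the dummy predictor
# is identified by num_run == 1, exactly as in the diff above.
sorted_keys = [
    ("pred_3", 0.91, 3),
    ("dummy", 0.50, 1),
    ("pred_2", 0.48, 2),  # at most as good as the dummy -> filtered out
]

# score of the dummy model (num_run == 1)
dummy_score = list(filter(lambda x: x[2] == 1, sorted_keys))[0]

# keep only models strictly better than the dummy, then drop the dummy itself
sorted_keys = list(filter(lambda x: x[1] > dummy_score[1] and x[2] > 1,
                          sorted_keys))
print(sorted_keys)  # [('pred_3', 0.91, 3)]
```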
38 changes: 36 additions & 2 deletions autosklearn/estimators.py
@@ -3,6 +3,7 @@

from autosklearn.automl import AutoMLClassifier, AutoMLRegressor
from autosklearn.util.backend import create
from sklearn.utils.multiclass import type_of_target


class AutoSklearnEstimator(BaseEstimator):
@@ -28,7 +29,9 @@ def __init__(self,
shared_mode=False,
disable_evaluator_output=False,
get_smac_object_callback=None,
smac_scenario_args=None):
smac_scenario_args=None,
logging_config=None,
):
"""
Parameters
----------
@@ -168,6 +171,11 @@ def __init__(self,
This is an advanced feature. Use only if you are familiar with
`SMAC <https://automl.github.io/SMAC3/stable/index.html>`_.

logging_config : dict, optional (None)
dictionary object specifying the logger configuration. If None,
the default logging.yaml file is used, which can be found at
``util/logging.yaml`` relative to the installation directory.

Attributes
----------

@@ -199,6 +207,7 @@ def __init__(self,
self.disable_evaluator_output = disable_evaluator_output
self.get_smac_object_callback = get_smac_object_callback
self.smac_scenario_args = smac_scenario_args
self.logging_config = logging_config

self._automl = None
super().__init__()
@@ -238,7 +247,8 @@ def build_automl(self):
shared_mode=self.shared_mode,
get_smac_object_callback=self.get_smac_object_callback,
disable_evaluator_output=self.disable_evaluator_output,
smac_scenario_args=self.smac_scenario_args
smac_scenario_args=self.smac_scenario_args,
logging_config=self.logging_config,
)

return automl
@@ -456,6 +466,18 @@ def fit(self, X, y,
self

"""
# Before running anything else, first check that the
# type of data is compatible with auto-sklearn. Legal target
# types are: binary, multiclass, multilabel-indicator.
target_type = type_of_target(y)
if target_type in ['multiclass-multioutput',
'continuous',
'continuous-multioutput',
'unknown',
]:
raise ValueError("classification with data of type %s is"
" not supported" % target_type)

super().fit(
X=X,
y=y,
@@ -559,6 +581,18 @@ def fit(self, X, y,
self

"""
# Before running anything else, first check that the
# type of data is compatible with auto-sklearn. Legal target
# types are: continuous, binary, multiclass.
target_type = type_of_target(y)
if target_type in ['multiclass-multioutput',
'multilabel-indicator',
'continuous-multioutput',
'unknown',
]:
raise ValueError("regression with data of type %s is not"
" supported" % target_type)

# Fit is supposed to be idempotent!
# But not if we use share_mode.
super().fit(
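Both `fit` methods now reject incompatible targets up front via scikit-learn's `type_of_target`. A quick illustration of the labels that function produces and how they map onto the checks above (the sample arrays are made up):

```python
from sklearn.utils.multiclass import type_of_target

print(type_of_target([0, 1, 1, 0]))              # 'binary' -> classifier OK
print(type_of_target([1, 2, 3]))                 # 'multiclass' -> classifier OK
print(type_of_target([[1, 0], [0, 1]]))          # 'multilabel-indicator' -> classifier OK
print(type_of_target([0.1, 2.3, 4.5]))           # 'continuous' -> regressor OK
print(type_of_target([[0.1, 2.0], [3.1, 0.5]]))  # 'continuous-multioutput' -> rejected by both
```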
2 changes: 1 addition & 1 deletion autosklearn/metalearning/metalearning/clustering/gmeans.py
@@ -69,7 +69,7 @@ def fit(self, X):
break

# Refinement
KMeans = sklearn.cluster.KMeans(n_clusters=1, n_init=1,
KMeans = sklearn.cluster.KMeans(n_clusters=len(cluster_centers), n_init=1,
init=np.array(cluster_centers),
random_state=self.random_state)
KMeans.fit(X)
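The one-line fix matters because scikit-learn requires the `init` array's row count to match `n_clusters`; passing `n_clusters=1` with several explicit centers is inconsistent. A small self-contained sketch of the corrected call (random data and center values are made up):

```python
import numpy as np
import sklearn.cluster

X = np.random.RandomState(1).rand(100, 2)
cluster_centers = np.array([[0.2, 0.2], [0.8, 0.8]])  # illustrative centers

# n_clusters must equal the number of rows in the explicit init array
kmeans = sklearn.cluster.KMeans(n_clusters=len(cluster_centers), n_init=1,
                                init=cluster_centers)
kmeans.fit(X)
print(kmeans.cluster_centers_.shape)  # (2, 2)
```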
20 changes: 9 additions & 11 deletions autosklearn/pipeline/create_searchspace_util.py
@@ -117,8 +117,8 @@ def find_active_choices(matches, node, node_idx, dataset_properties, \

choices = []
for c_idx, component in enumerate(available_components):
slices = [slice(None) if idx != node_idx else slice(c_idx, c_idx+1)
for idx in range(len(matches.shape))]
slices = tuple(slice(None) if idx != node_idx else slice(c_idx, c_idx+1)
for idx in range(len(matches.shape)))

if np.sum(matches[slices]) > 0:
choices.append(component)
@@ -200,10 +200,10 @@ def add_forbidden(conf_space, pipeline, matches, dataset_properties,
for product in itertools.product(*num_node_choices):
for node_idx, choice_idx in enumerate(product):
node_idx += start_idx
slices_ = [
slices_ = tuple(
slice(None) if idx != node_idx else
slice(choice_idx, choice_idx + 1) for idx in
range(len(matches.shape))]
range(len(matches.shape)))

if np.sum(matches[slices_]) == 0:
skip_array[product] = 1
@@ -212,13 +212,11 @@
if skip_array[product]:
continue

slices = []
for idx in range(len(matches.shape)):
if idx not in indices:
slices.append(slice(None))
else:
slices.append(slice(product[idx - start_idx],
product[idx - start_idx] + 1))
slices = tuple(
slice(None) if idx not in indices else
slice(product[idx - start_idx],
product[idx - start_idx] + 1) for idx in
range(len(matches.shape)))

# This prints the affected nodes
# print [node_choice_names[i][product[i]]
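These changes all address the same NumPy deprecation (see commit #526): indexing a multidimensional array with a list of slices raises a `FutureWarning` as of NumPy 1.15, while a tuple of slices is the supported form. A toy demonstration with a made-up array:

```python
import numpy as np

matches = np.arange(8).reshape(2, 2, 2)
node_idx, c_idx = 1, 0

# tuple of slices: the supported multidimensional index
slices = tuple(slice(None) if idx != node_idx else slice(c_idx, c_idx + 1)
               for idx in range(len(matches.shape)))
print(matches[slices].shape)  # (2, 1, 2)

# a list of slices, e.g. matches[list(slices)], would emit the
# "non-tuple sequence for multidimensional indexing" FutureWarning
```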
19 changes: 13 additions & 6 deletions autosklearn/util/backend.py
@@ -71,11 +71,17 @@ def _prepare_directories(self, temporary_directory, output_directory):

self.__temporary_directory = temporary_directory \
if temporary_directory \
else '/tmp/autosklearn_tmp_%d_%d' % (pid, random_number)
else os.path.join(
tempfile.gettempdir(),
'autosklearn_tmp_%d_%d' % (pid, random_number)
)

self.__output_directory = output_directory \
if output_directory \
else '/tmp/autosklearn_output_%d_%d' % (pid, random_number)
else os.path.join(
tempfile.gettempdir(),
'autosklearn_output_%d_%d' % (pid, random_number)
)

def create_directories(self):
if self.shared_mode:
@@ -401,9 +407,10 @@ def save_ensemble(self, ensemble, idx, seed):
except Exception:
pass

filepath = os.path.join(self.get_ensemble_dir(),
'%s.%s.ensemble' % (str(seed),
str(idx)))
filepath = os.path.join(
self.get_ensemble_dir(),
'%s.%s.ensemble' % (str(seed), str(idx).zfill(10))
)
with tempfile.NamedTemporaryFile('wb', dir=os.path.dirname(
filepath), delete=False) as fh:
pickle.dump(ensemble, fh)
@@ -460,4 +467,4 @@ def write_txt_file(self, filepath, data, name):
self.logger.debug('Created %s file %s' % (name, filepath))
else:
self.logger.debug('%s file already present %s' %
(name, filepath))
(name, filepath))
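Zero-padding the ensemble index (`zfill(10)`) makes lexicographic filename order agree with numeric order, which the ensemble-sorting fix (#567 for issue #566) appears to rely on, assuming ensemble files are picked by sorted name. A quick sketch with made-up indices:

```python
# Without padding, string sort puts index 10 before index 2
unpadded = ['0.%d.ensemble' % i for i in (2, 10, 9)]
print(sorted(unpadded))
# ['0.10.ensemble', '0.2.ensemble', '0.9.ensemble']  -> wrong order

# With zfill(10), string sort matches numeric order
padded = ['0.%s.ensemble' % str(i).zfill(10) for i in (2, 10, 9)]
print(sorted(padded))
# ['0.0000000002.ensemble', '0.0000000009.ensemble', '0.0000000010.ensemble']
```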
23 changes: 14 additions & 9 deletions autosklearn/util/logging_.py
@@ -7,18 +7,23 @@
import yaml


def setup_logger(output_file=None):
with open(os.path.join(os.path.dirname(__file__), 'logging.yaml'),
'r') as fh:
config = yaml.load(fh)
if output_file is not None:
config['handlers']['file_handler']['filename'] = output_file
logging.config.dictConfig(config)
def setup_logger(output_file=None, logging_config=None):
# logging_config must be a dictionary object specifying the configuration
# for the loggers to be used in auto-sklearn.
if logging_config is not None:
if output_file is not None:
logging_config['handlers']['file_handler']['filename'] = output_file
logging.config.dictConfig(logging_config)
else:
with open(os.path.join(os.path.dirname(__file__), 'logging.yaml'),
'r') as fh:
logging_config = yaml.safe_load(fh)
if output_file is not None:
logging_config['handlers']['file_handler']['filename'] = output_file
logging.config.dictConfig(logging_config)


def _create_logger(name):
logging.basicConfig(format='[%(levelname)s] [%(asctime)s:%(name)s] %('
'message)s', datefmt='%H:%M:%S')
return logging.getLogger(name)


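For reference, a minimal sketch of a custom `logging_config` dict that the new parameter accepts, assuming the standard `logging.config.dictConfig` schema; the handler must be named `file_handler` for the `output_file` override above to apply. The formatter string mirrors `_create_logger`; everything else here is illustrative:

```python
import autosklearn.classification

logging_config = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'simple': {
            'format': '[%(levelname)s] [%(asctime)s:%(name)s] %(message)s',
        },
    },
    'handlers': {
        # setup_logger rewrites this handler's 'filename' when output_file is set
        'file_handler': {
            'class': 'logging.FileHandler',
            'formatter': 'simple',
            'filename': 'autosklearn.log',
        },
    },
    'root': {'level': 'DEBUG', 'handlers': ['file_handler']},
}

automl = autosklearn.classification.AutoSklearnClassifier(
    logging_config=logging_config,
)
```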
20 changes: 0 additions & 20 deletions ci_scripts/circle_install.sh

This file was deleted.
