## <span style = 'color:green'> HyperOpt and HyperOpt-SKlearn </span>

* HyperOpt is an open-source Python library for Bayesian optimization developed by James Bergstra. It is designed for large-scale optimization of models with hundreds of parameters, and it allows the optimization procedure to be scaled across multiple cores and multiple machines.

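* The library has been used explicitly to optimize whole machine learning pipelines, including data preparation, model selection, and model hyperparameters. HyperOpt-Sklearn wraps HyperOpt to run this kind of search automatically over scikit-learn preprocessing steps, models, and their hyperparameters.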

### 1 <span style= 'color:green'>|</span> Installation of Libraries

In [1]:
# pip install hyperopt



In [2]:
%pip show hyperopt

Name: hyperopt
Version: 0.2.7
Summary: Distributed Asynchronous Hyperparameter Optimization
Home-page: https://hyperopt.github.io/hyperopt
Author: James Bergstra
Author-email: james.bergstra@gmail.com
License: BSD
Location: c:\users\prerana\anaconda3\lib\site-packages
Requires: cloudpickle, future, networkx, numpy, py4j, scipy, six, tqdm
Required-by: hpsklearn
Note: you may need to restart the kernel to use updated packages.


In [3]:
# pip install hpsklearn



In [19]:
from pandas import read_csv
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from hpsklearn import HyperoptEstimator, any_classifier, any_preprocessing
from hyperopt import tpe

### 2 <span style= 'color:green'>|</span> HyperOpt-SKlearn for Classification

In [4]:
# Load the sonar dataset (the CSV file has no header row)
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv'
df = read_csv(url, header=None)

In [6]:
df.shape

(208, 61)

In [8]:
df

|     | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0200 | 0.0371 | 0.0428 | 0.0207 | 0.0954 | 0.0986 | 0.1539 | 0.1601 | 0.3109 | 0.2111 | ... | 0.0027 | 0.0065 | 0.0159 | 0.0072 | 0.0167 | 0.0180 | 0.0084 | 0.0090 | 0.0032 | R |
| 1 | 0.0453 | 0.0523 | 0.0843 | 0.0689 | 0.1183 | 0.2583 | 0.2156 | 0.3481 | 0.3337 | 0.2872 | ... | 0.0084 | 0.0089 | 0.0048 | 0.0094 | 0.0191 | 0.0140 | 0.0049 | 0.0052 | 0.0044 | R |
| 2 | 0.0262 | 0.0582 | 0.1099 | 0.1083 | 0.0974 | 0.2280 | 0.2431 | 0.3771 | 0.5598 | 0.6194 | ... | 0.0232 | 0.0166 | 0.0095 | 0.0180 | 0.0244 | 0.0316 | 0.0164 | 0.0095 | 0.0078 | R |
| 3 | 0.0100 | 0.0171 | 0.0623 | 0.0205 | 0.0205 | 0.0368 | 0.1098 | 0.1276 | 0.0598 | 0.1264 | ... | 0.0121 | 0.0036 | 0.0150 | 0.0085 | 0.0073 | 0.0050 | 0.0044 | 0.0040 | 0.0117 | R |
| 4 | 0.0762 | 0.0666 | 0.0481 | 0.0394 | 0.0590 | 0.0649 | 0.1209 | 0.2467 | 0.3564 | 0.4459 | ... | 0.0031 | 0.0054 | 0.0105 | 0.0110 | 0.0015 | 0.0072 | 0.0048 | 0.0107 | 0.0094 | R |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 203 | 0.0187 | 0.0346 | 0.0168 | 0.0177 | 0.0393 | 0.1630 | 0.2028 | 0.1694 | 0.2328 | 0.2684 | ... | 0.0116 | 0.0098 | 0.0199 | 0.0033 | 0.0101 | 0.0065 | 0.0115 | 0.0193 | 0.0157 | M |
| 204 | 0.0323 | 0.0101 | 0.0298 | 0.0564 | 0.0760 | 0.0958 | 0.0990 | 0.1018 | 0.1030 | 0.2154 | ... | 0.0061 | 0.0093 | 0.0135 | 0.0063 | 0.0063 | 0.0034 | 0.0032 | 0.0062 | 0.0067 | M |
| 205 | 0.0522 | 0.0437 | 0.0180 | 0.0292 | 0.0351 | 0.1171 | 0.1257 | 0.1178 | 0.1258 | 0.2529 | ... | 0.0160 | 0.0029 | 0.0051 | 0.0062 | 0.0089 | 0.0140 | 0.0138 | 0.0077 | 0.0031 | M |
| 206 | 0.0303 | 0.0353 | 0.0490 | 0.0608 | 0.0167 | 0.1354 | 0.1465 | 0.1123 | 0.1945 | 0.2354 | ... | 0.0086 | 0.0046 | 0.0126 | 0.0036 | 0.0035 | 0.0034 | 0.0079 | 0.0036 | 0.0048 | M |


In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 208 entries, 0 to 207
Data columns (total 61 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       208 non-null    float64
 1   1       208 non-null    float64
 2   2       208 non-null    float64
 3   3       208 non-null    float64
 4   4       208 non-null    float64
 5   5       208 non-null    float64
 6   6       208 non-null    float64
 7   7       208 non-null    float64
 8   8       208 non-null    float64
 9   9       208 non-null    float64
 10  10      208 non-null    float64
 11  11      208 non-null    float64
 12  12      208 non-null    float64
 13  13      208 non-null    float64
 14  14      208 non-null    float64
 15  15      208 non-null    float64
 16  16      208 non-null    float64
 17  17      208 non-null    float64
 18  18      208 non-null    float64
 19  19      208 non-null    float64
 20  20      208 non-null    float64
 21  21      208 non-null    float64
 22  22

In [9]:
# Split the data into input (X) and output (y) elements
data = df.values
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)

(208, 60) (208,)
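Because the last column holds the string labels `R`/`M`, `df.values` returns a single NumPy array of dtype `object`. This is why the next cell casts `X` to `float32`. A tiny toy frame with made-up values (not the sonar data) shows the effect:

```python
import pandas as pd

# Tiny stand-in for the sonar DataFrame: numeric feature columns plus a
# string label column (values are made up for illustration)
toy = pd.DataFrame([[0.02, 0.03, "R"],
                    [0.04, 0.05, "M"],
                    [0.01, 0.06, "R"]])

toy_data = toy.values                              # mixed float/str columns -> dtype object
toy_X, toy_y = toy_data[:, :-1], toy_data[:, -1]   # all columns but the last, then the last

print(toy_data.dtype)            # object
print(toy_X.shape, toy_y.shape)  # (3, 2) (3,)
```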


In [11]:
# Preprocessing: cast features to float32 and encode the string labels as integers
X = X.astype('float32')
y = LabelEncoder().fit_transform(y.astype('str'))
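`LabelEncoder` sorts the distinct labels and maps them to consecutive integers. For this dataset, `M` (mine) therefore becomes 0 and `R` (rock) becomes 1. The sketch below uses a few illustrative labels and keeps a handle on the encoder (the cell above discards it) so that predictions could later be mapped back to the original strings:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

labels = np.array(["R", "M", "M", "R"], dtype=object)  # illustrative labels
le = LabelEncoder()
encoded = le.fit_transform(labels.astype("str"))

print(le.classes_)                    # ['M' 'R']: sorted, so M -> 0 and R -> 1
print(encoded)                        # [1 0 0 1]
print(le.inverse_transform(encoded))  # ['R' 'M' 'M' 'R']
```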

In [12]:
# Hold out 33% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=45, test_size=0.33)
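With a float `test_size`, scikit-learn rounds the test-set size up: ceil(0.33 × 208) = 69 rows for testing and the remaining 139 for training. The check below uses random placeholder data of the same shape as the sonar features. The `_demo` names are hypothetical and are chosen only to avoid overwriting the notebook's variables.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Random placeholder data with the same shape as the sonar features
rng = np.random.default_rng(0)
X_demo = rng.random((208, 60)).astype("float32")
y_demo = rng.integers(0, 2, size=208)

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, random_state=45, test_size=0.33)

# test rows = ceil(0.33 * 208) = 69; the remaining 139 rows go to training
print(Xd_train.shape, Xd_test.shape)  # (139, 60) (69, 60)
```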

In [20]:
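# Model building: search over any classifier and any preprocessing with TPE,
# 50 evaluations in total and at most 30 seconds per trial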
model = HyperoptEstimator(classifier=any_classifier('cla'),
                          preprocessing=any_preprocessing('pre'),
                          algo=tpe.suggest,
                          max_evals=50,
                          trial_timeout=30)

In [21]:
model.fit(X_train, y_train)

  0%|          | 0/1 [00:00<?, ?trial/s, best loss=?]


AttributeError: 'numpy.random.mtrand.RandomState' object has no attribute 'integers'
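The last line of the traceback points to a random-number-generator mismatch. NumPy's legacy `RandomState` has no `integers` method, while the newer `Generator` returned by `np.random.default_rng` does. A plausible cause (an assumption, not confirmed by this notebook) is that hyperopt 0.2.7 expects a `Generator` while the installed `hpsklearn` release still passes a `RandomState`. Workarounds reported for this combination include installing hyperopt-sklearn from its GitHub repository (`pip install git+https://github.com/hyperopt/hyperopt-sklearn`) or using an older hyperopt release; check these against your own environment. The snippet below only demonstrates the NumPy difference:

```python
import numpy as np

legacy = np.random.RandomState(1)   # legacy generator API
modern = np.random.default_rng(1)   # Generator API (NumPy >= 1.17)

print(hasattr(legacy, "integers"), hasattr(legacy, "randint"))  # False True
print(hasattr(modern, "integers"))                              # True

# Equivalent ways to draw an integer in [0, 10) with each API
print(legacy.randint(0, 10), modern.integers(0, 10))
```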