{{ message }}

# Hyperparamopt #58

Merged
merged 13 commits into from Jul 16, 2016
Merged

# Hyperparamopt#58

merged 13 commits into from Jul 16, 2016

## Conversation

### narayanan2004 commented Jul 12, 2016

 No description provided.
added 3 commits Jul 9, 2016
 Hyperparamopt package added with tests 
 99571de 
 Added example, better tests, more comments 
 5822d39 
 Merge branch 'master' of github.com:IntelPNI/brainiak into hyperparamopt 
 e732219 

### buildbot-princeton commented Jul 12, 2016

 Can one of the admins verify this patch?
1 similar comment

### buildbot-princeton commented Jul 12, 2016

 Can one of the admins verify this patch?
 Added license paragraph to files, coverage > 90%, more tests 
 0d15261 

### mihaic commented Jul 12, 2016

 OK to test.
 # limitations under the License. """Metropolis-Hasting Random number generator This implementation provides random samples from a user-given

#### xoltar Jul 12, 2016 Contributor

How is this MCMC different or better than other MCMC implementations for Python? May want to explain here.

#### narayanan2004 Jul 13, 2016 Author Contributor

I looked for other python packages that do what I want i.e generate samples from a user-specified pdf. I couldn't find any, so had to write my own. If you know of any other implementation, please let me know as I am not aware of any at this point.

#### mihaic Jul 13, 2016 Contributor

How about pymc?
https://pypi.python.org/pypi/pymc

#### xoltar Jul 13, 2016 Contributor

Or maybe http://dan.iel.fm/emcee/current/ . It's OK if neither of these does what you need, you can just say that (and why) in the comments so people know there's a reason for having an MCMC module in brainiak rather than using an existing library.

#### TuKo Jul 13, 2016 Contributor

Still and MCMC module should be outside this package, maybe in utils or some other package.

#### narayanan2004 Jul 14, 2016 Author Contributor

After some searching, I see that I only need samples from a 1D Gaussian mixture model. I have changed the code to do this using other simpler methods in numpy+scipy. I will remove this file (mcmc.py) and the corresponding tests.

 import numpy cimport numpy as np cdef extern from "math.h":

#### xoltar Jul 12, 2016 Contributor

Is importing standard C math routines like this common practice in cython? I'm surprised it doesn't just automatically replace calls to Python's math.exp and math.sqrt with the appropriate C calls.

#### narayanan2004 Jul 14, 2016 Author Contributor

It seems pretty ok to do. I was having problems earlier with pyximport import otherwise. This file is now gone as Cython was only really needed to get MCMC to run fast. This code has been replaced with scipy+numpy samplers.

 logger = logging.getLogger(__name__) def getsigma(x, minlimit=-np.inf, maxlimit=np.inf):

#### xoltar Jul 12, 2016 Contributor

Should be get_sigma. Generally, please follow https://www.python.org/dev/peps/pep-0008/#naming-conventions, there are several other naming issues in this PR.

 return pts def getNextSample(x, y, minlimit=-np.inf, maxlimit=np.inf):

#### xoltar Jul 12, 2016 Contributor

 return xnext def getSample(x, y, dist, minlimit=-np.inf, maxlimit=np.inf):

#### xoltar Jul 12, 2016 Contributor

 return np.exp(np.random.random() * (np.log(maxlimit) - np.log(minlimit)) + np.log(minlimit))

#### xoltar Jul 12, 2016 • edited Contributor

Looks like there are only three supported values for dist. Should validate dist is in that list of supported values and throw an exception otherwise.

#### narayanan2004 Jul 14, 2016 Author Contributor

This function has been removed as we now support all scipy.stats distributions.

 if (space[s]['dist'] is not 'uniform' and space[s]['dist'] is not 'loguniform'): logger.error('Unsupported distribution for variable') raise TypeError('Unknown distribution type for variable')

#### xoltar Jul 12, 2016 Contributor

This should probably be ValueError, since this is an inappropriate value of type string, not an inappropriate type.

#### narayanan2004 Jul 14, 2016 Author Contributor

This has been changed. space[s]['dict'] should now hold scipy.stats objects, so anything that does not support the "rvs()" method for generating random numbers throws a TypeError.

 yarray = np.array([tr['loss'] for tr in trials]) for s in space: sarray = np.array([tr[s] for tr in trials]) dist = 'GMM' if (search_algo == 'Exploit') else space[s]['dist']

#### xoltar Jul 12, 2016 Contributor

It seems like there's a lot of use of string arguments to drive behavior. In general this is brittle and can be confusing. It's often more effective to use functions - for example, instead of dist being a string that getSample has to interpret, you could expose the three supported distributions that getSample supports as three separately defined functions. Then you can pass the desired distribution function itself, rather than a string that names the function. That way the tool becomes more general, and people can add other distributions without having to modify your code.

#### narayanan2004 Jul 14, 2016 Author Contributor

Thanks for the suggestion. The code now takes in scipy.stats distribution objects that can be used to generate random numbers using the "rvs()" method. No more disambiguating via strings.

 minlimit=space[s]['lo'], maxlimit=space[s]['hi']) if (verbose):

#### xoltar Jul 12, 2016 Contributor

It would be more normal to skip checking the verbose flag and just log at debug level rather than info.

#### narayanan2004 Jul 14, 2016 Author Contributor

Thanks. Fixed according to suggestion.

 while(p(x0) <= np.finfo(np.double).eps * 10): x0 = np.random.standard_normal() * 100 if (p(x0) <= 0): logging.error('Markov chain failed to initialize properly \

#### xoltar Jul 12, 2016 Contributor

Should throw an exception here, no?

#### narayanan2004 Jul 14, 2016 Author Contributor

File obsolete - removed.

 #cython: embedsignature=True import numpy cimport numpy as np

#### xoltar Jul 12, 2016 Contributor

Why do we need botth numpy and cimport numpy? Might be fine, just asking.

#### narayanan2004 Jul 14, 2016 Author Contributor

cimport numpy is needed for argument type definitions like "np.ndarray[np.float64_t, ndim=1] x" etc. in the C-interface.

 import numpy as np import matplotlib.pyplot as plt import brainiak.hyperparamopt.hpo as hpo

#### xoltar Jul 12, 2016 Contributor

An example needs quite a bit more explanatory text. This example is harder to follow than the library source itself.

#### narayanan2004 Jul 14, 2016 Author Contributor

 # assert(st.kurtosistest(samples).pvalue >= 0.05) """

#### xoltar Jul 12, 2016 Contributor

Delete if not needed anymore

#### narayanan2004 Jul 14, 2016 Author Contributor

File obsolete - removed.

 def test_simple_gmm(): from brainiak.hyperparamopt.hpo import gmm_1d_distribution

#### xoltar Jul 12, 2016 Contributor

Do we have a convention for doing imports only in the method body for tests? If not, I'd say move all the imports to the top of the file like a normal module.

#### narayanan2004 Jul 14, 2016 Author Contributor

Fixed.

 pi = numpy.pi cpdef double norm_pdf(double x, double mu, double sigma):

#### TuKo Jul 13, 2016 Contributor

Why do you define this function instead of calling scipy.stats.norm?
Is it a performance issue? If so how big is the performance difference?

#### narayanan2004 Jul 14, 2016 Author Contributor

scipy.stats is super slow. This code is an order of magnitude faster.

 restructuredtext-lint sphinx sphinx_rtd_theme tqdm

#### TuKo Jul 13, 2016 Contributor

What is tqdm? Do we really need it?

#### narayanan2004 Jul 14, 2016 Author Contributor

tqdm is a progress bar indicator. It is useful when calling HPO from command line. I don't want to dump too much debug info, but still give some useful measure of progress to the user. If there are python-inbuilt progress estimators, let me know. I will be happy to use those.

#### mihaic Jul 14, 2016 Contributor

Code in brainiak must not use the console. That is why we use logging instead of print. You should create a way of passing progress information to the calling code if logging is not enough for you.

#### narayanan2004 Jul 14, 2016 Author Contributor

So, if someone calls hyperparamopt and there is no progress, one has to check the log file as there will be nothing on the console? I hope there are easy ways to toggle this switch (i.e. set log file to stdout)

#### narayanan2004 Jul 14, 2016 Author Contributor

Removed tqdm dependency and progress bar.

added 4 commits Jul 14, 2016
 Removed mcmc, using scipy+numpy samplers for GMM; removed norm.pyx an… 
 2440beb 
…d cython; all scipy.stats continuous distributions supported; more comments added
 formatting fixes 
 b93e3a7 
 Merge branch 'master' of github.com:IntelPNI/brainiak into hyperparamopt 
 abff39c 
 removed tqdm; added comments to the example 
 6525f36 
 logger = logging.getLogger(__name__) def get_sigma(x, minlimit=-np.inf, maxlimit=np.inf):

#### xoltar Jul 14, 2016 Contributor

min_limit, max_limit

#### narayanan2004 Jul 14, 2016 Author Contributor

Done

 Used to weight the points non-uniformly if required """ def __init__(self, x, minlimit=-np.inf, maxlimit=np.inf, weights=1.0):

#### xoltar Jul 14, 2016 Contributor

min_limit, max_limit

#### narayanan2004 Jul 14, 2016 Author Contributor

Done

 self.W_sum = np.sum(self.weights) def get_gmm_pdf(self, xt): """Calculates the 1D GMM likelihood for a single point

#### xoltar Jul 14, 2016 Contributor

What does "xt" mean?

#### narayanan2004 Jul 14, 2016 Author Contributor

Renamed to x

 Arguments --------- x : scalar (or) 1D array of reals

xt

#### narayanan2004 Jul 14, 2016 Author Contributor

Renamed to x

 return xnext def fmin(lossfn,

#### xoltar Jul 14, 2016 Contributor

loss_fn, max_evals

#### narayanan2004 Jul 14, 2016 Author Contributor

Done

 """ for s in space: if (hasattr(space[s]['dist'], 'rvs') is False):

#### xoltar Jul 14, 2016 Contributor

if not hasattr(space[s]['dist], 'rvs'): is more normal style. If statements don't need parens in Python.

#### narayanan2004 Jul 14, 2016 Author Contributor

Done

 # Branin is the function we want to minimize. # It is a function of 2 variables def branin(x1, x2):

#### xoltar Jul 14, 2016 • edited Contributor

Does branin mean something? Is it someone's name? Why is this a good function for us?

#### narayanan2004 Jul 14, 2016 Author Contributor

It is one of the standard functions that people use for non-convex optimization problems. It is a part of lots of benchmarks. Within the range, it has 2 local minima and 1 global minimum, so it is a good function to test.

#### narayanan2004 Jul 14, 2016 Author Contributor

@mihaic Is brainiak one of the requirements to be listed?

No.

@mihaic Done.

#### xoltar Jul 14, 2016 Contributor

@narayanan2004 Thanks, and sorry for not being clear: When I ask questions like "What does this mean? Is it someone's name? Why is this a good function for us?" I'm really pretending to be a user, these are the questions in my head. My experience of your code will be better if the answers are in the comments or docstrings. Actually answering my questions in Github only helps me, but putting the answers in the code would help many others.

#### narayanan2004 Jul 14, 2016 Author Contributor

@xoltar Thanks. Got your point. Added some notes on the function to the example file.

### mihaic commented Jul 14, 2016

 Please add examples/hyperparamopt/requirements.txt containing everything required for running the examples, but not in install_requires in setup.py.
added 3 commits Jul 14, 2016
 Updated according to comments 
 1511e31 
 Added more notes on the branin function 
 54f179e 
 Merge branch 'master' of github.com:IntelPNI/brainiak into hyperparamopt 
 5ce304d 
 def get_sigma(x, min_limit=-np.inf, max_limit=np.inf): """Computes the standard deviations around the points for a 1D Gaussian mixture model computation.

#### mihaic Jul 15, 2016 Contributor

Please put a one-line summary in each docstring. Also, something we have been nitpicking in previous PRs, function summaries should be phrased as commands, not descriptions. In this case, "Compute" instead of "Computes".
https://www.python.org/dev/peps/pep-0257/#one-line-docstrings

#### narayanan2004 Jul 15, 2016 Author Contributor

Fixed.

 self.weights = 2. / (erf((max_limit - x) / (np.sqrt(2.) * self.sigma)) - erf((min_limit - x) / (np.sqrt(2.) * self.sigma))) * weights

#### mihaic Jul 15, 2016 Contributor

        self.weights = (2
/ (erf((max_limit - x) / (np.sqrt(2.) * self.sigma))
- erf((min_limit - x) / (np.sqrt(2.) * self.sigma)))
* weights)


Also, there is no need to write 2..

#### narayanan2004 Jul 15, 2016 Author Contributor

Done

 """Calculates the 1D GMM likelihood for a single point y = \sum_{i=1}^{N} norm_pdf(x, x_i, sigma_i)/(\sum weight_i) """

#### mihaic Jul 15, 2016 Contributor

Parameters and returns sections missing.

#### narayanan2004 Jul 15, 2016 Author Contributor

Fixed.

 """Returns the point that gives the largest Expected improvement (EI) in the optimization function. We use [Bergstra2013] to compute this. This model fits 2 different GMMs -

#### mihaic Jul 15, 2016 Contributor

Append an underscore to references to generate links: [Bergstra2013]_.

#### narayanan2004 Jul 15, 2016 Author Contributor

Done.

 ------- best : trial entry (dictionary of hyperparameters) Best hyperparameter setting found

#### mihaic Jul 15, 2016 Contributor

Use a 4-space indent for the second and following lines of descriptions of parameters, returns, etc.

#### narayanan2004 Jul 15, 2016 Author Contributor

Fixed.

 # Labels plt.xlabel('x1') plt.ylabel('x2') plt.title('Hyperparamter optimization using HPO (Branin function)')

#### xoltar Jul 15, 2016 Contributor

Typo: hyperparamEter

#### narayanan2004 Jul 15, 2016 Author Contributor

Fixed.

 plt.ylabel('x2') plt.title('Hyperparamter optimization using HPO (Branin function)') plt.legend() plt.show()

#### xoltar Jul 15, 2016 • edited Contributor

The example runs for me, but I don't see any blue dots for "best HPO"?

Maybe it's occluded by the "best grid search" dot?

#### narayanan2004 Jul 15, 2016 Author Contributor

Probably. The code should print the points. Are they very close to each other?
Or try running again. It should randomly generate another set of points.

#### xoltar Jul 15, 2016 Contributor

A second run produced the expected output with both points, thanks.

 for s in space: if not hasattr(space[s]['dist'], 'rvs'): logger.error('Unsupported distribution for variable') raise TypeError('Unknown distribution type for variable')

#### mihaic Jul 15, 2016 Contributor

The exception should be documented in the docstring:
http://www.sphinx-doc.org/en/stable/ext/example_numpy.html
Furthermore, the logging guideline suggest no logging (it is the responsibility of the calling code to log or do something else when the exception is raised; also reduces code duplication which generates inconsitency: "Unsuperted" vs. "Unknown"):
https://docs.python.org/3/howto/logging.html#when-to-use-logging

#### xoltar Jul 15, 2016 Contributor

Also, this is a ValueError, not a TypeError. TypeErrors are when you pass a string where an int is expected, things like that. Unsupported distributions would be a value error. There are languages where this could actually be a type error, but Python is a long, long way from being one of them.

#### narayanan2004 Jul 15, 2016 Author Contributor

Ok. I am not really aware of these python conventions. I will change it to a ValueError.
Generally, when an object that you pass does not support the method you need, should that be a TypeError at all? Or is this all very context-dependent? I am trying to learn.. Thanks,

#### xoltar Jul 15, 2016 Contributor

It's a grey area in a language like Python. I googled around a bit, it appears people are actually using TypeError for reflection failures like this. So let it stand, I guess. Sorry for the distraction.

#### mihaic Jul 15, 2016 Contributor

Now that you brought my attention to this, it is not clear what the parameters should look like by looking only at the docstring. Could you please add more documentation for the parameters, perhaps an example?

#### narayanan2004 Jul 15, 2016 Author Contributor

@mihaic Added a simple example in the docstring. Refer to latest commit.

#### narayanan2004 Jul 15, 2016 Author Contributor

I changed the TypeError to ValueError already. I looks it can be either. I will just leave it as ValueError if that is ok with @xoltar. Refer to latest commit.

added 2 commits Jul 15, 2016
 updated according to comments 
 78d1b0a 
 Merge branch 'master' of github.com:IntelPNI/brainiak into hyperparamopt 
 843e435 

### xoltar commented Jul 16, 2016

 👍
merged commit c455828 into brainiak:master Jul 16, 2016
2 checks passed
2 checks passed
linux Build finished.
Details
macos Build finished.
Details
deleted the narayanan2004:hyperparamopt branch Jul 16, 2016
pushed a commit that referenced this pull request Nov 16, 2017
 Merge pull request #58 from amplab/numbuf 
 d45ea8a 
implement serialization using numbuf