GridSearch feature #7

Merged

NicolasHug merged 49 commits into NicolasHug:master from mahermalaeb:grid-search-cv

Jan 2, 2017

Contributor

mahermalaeb commented Jan 1, 2017

This pull request implements #6.

mahermalaeb and others added 30 commits

December 29, 2016 00:28


          ignored idea files generated by pycharm

44c7185


          Create GridSearch class:

ad01369

 - Import itertools
 - Add an __init__ function that initializes class instance
 - Add evaluate method that evaluates the parameters on a dataset
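A minimal sketch of how itertools can expand a parameter grid into every combination to evaluate. The helper name `expand_grid` and the grid values are illustrative assumptions, not the code from this commit:

```python
import itertools


def expand_grid(param_grid):
    """Return one dict per combination of the values in param_grid."""
    names = sorted(param_grid)  # fixed key order so results are reproducible
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, values))
            for values in itertools.product(*value_lists)]


# Hypothetical grid: 2 * 2 * 2 = 8 combinations.
param_grid = {'n_epochs': [5, 10], 'lr_all': [0.002, 0.005],
              'reg_all': [0.4, 0.6]}
combinations = expand_grid(param_grid)
print(len(combinations))
```

Each resulting dict could then be unpacked as keyword arguments when building an algorithm instance.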


          Fixed evaluate() in GridSearch:

884053e

 - tests on correct algorithm instance


          Created GridSearch testing file:

cdf00a5

-Added a function to make sure the combinations returned are correct by testing its length


          Updated GridSearch evaluate():

 - Added best score, best parameters and best index attributes
 - Removed 1 attribute and added cv_results_  attribute analogous to sklearn implementation
 - Added tests for best attributes for RMSE and FCP measures


          Merge remote-tracking branch 'surprise-remote/master' into grid-searc…

e2a2c56

…h-cv


          first draft on Non negative matrix factorization

54b8371


          More doc for NMF, plus some tests

27c8f80


          tests for CoClustering algorithm

2367ac2


          update README.md

5410c70


          Update README.md

9596d88


          Added a biased version for NMF

ffc4efb


          Update CONTRIBUTING.md

b69365d


          update TODO.md

5b21ca0


          Added a clip option to the predict method

cac0add


          Added verbose parameter to evaluate method:

319851e

 - Default to True
 - Added some local variables needed for verbose messages
 - Change the loop to enumerate to follow similar code structure


          Change GridSearch evaluate method to accept multiple measurements:

d6114ae

 - Best attributes are now dicts with measures as keys
 - Change the test to adapt to the new parameters of evaluate
 - Add absolute value to tests
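As a rough illustration of "best attributes as dicts with measures as keys", the sketch below picks the best combination per measure. The scores, the parameter dicts, and the `HIGHER_IS_BETTER` set are made-up assumptions for the example, not values or names from this pull request:

```python
# Hypothetical mean scores, one entry per parameter combination.
mean_scores = {
    'RMSE': [0.98, 0.96, 0.97],
    'FCP': [0.71, 0.70, 0.68],
}
params = [{'n_epochs': 5}, {'n_epochs': 10}, {'n_epochs': 20}]

# Assumption: error measures such as RMSE are minimised,
# while FCP (fraction of concordant pairs) is maximised.
HIGHER_IS_BETTER = {'FCP'}

best_index, best_score, best_params = {}, {}, {}
for measure, scores in mean_scores.items():
    pick = max if measure in HIGHER_IS_BETTER else min
    idx = pick(range(len(scores)), key=scores.__getitem__)
    best_index[measure] = idx
    best_score[measure] = scores[idx]
    best_params[measure] = params[idx]

print(best_params['RMSE'], best_score['RMSE'])
```

Keeping one dict per attribute means each measure can select a different winning combination.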


          Added parameters documentation to GridSearch class and refactored Gri…

…dSearch parameters

 - Added parameters documentation
 - Renamed algo parameter to algo_class
 - Changed default measures from ['RMSE'] to become similar to evaluate ['rmse','mae']


          Made GridSearch best attributes not case sensitive:

671e05c

 - Removed duplicate definition of attributes
 - Changed definition from dict to Case insensitive dict
 - Added a test to make sure input parameters and output attributes are not case sensitive


          Corrected if condition that might lead to un-desired situation

fdcfb69


          Added params and measures as keys for cv_results_

49f1175


          Created 3 verbosity levels:

0cc7923

 - 0: Do not print anything
 - 1: Print params when combination starts and Mean scores when it finishes
 - 2: Print same info as 1 plus the score on each fold


          Added best estimator attribute:

bc156f0

 - Best algorithm instance with certain measure
 - Gives the ability for the user to use like any other algorithm class instance
 - Add test for this attribute


          Added documentation for the GridSearch class

c88b3ca


          ignored idea files generated by pycharm

d9efc62


          Create GridSearch class:

5e9b3eb

 - Import itertools
 - Add an __init__ function that initializes class instance
 - Add evaluate method that evaluates the parameters on a dataset


          Fixed evaluate() in GridSearch:

d70bcd6

 - tests on correct algorithm instance


          Created GridSearch testing file:

6f3453b

-Added a function to make sure the combinations returned are correct by testing its length


          Updated GridSearch evaluate():

1421a9e

 - Added best score, best parameters and best index attributes
 - Removed 1 attribute and added cv_results_  attribute analogous to sklearn implementation
 - Added tests for best attributes for RMSE and FCP measures


          Added verbose parameter to evaluate method:

0de431b

 - Default to True
 - Added some local variables needed for verbose messages
 - Change the loop to enumerate to follow similar code structure

mahermalaeb added 16 commits

December 31, 2016 17:05


          Change GridSearch evaluate method to accept multiple measurements:

c2c8650

 - Best attributes are now dicts with measures as keys
 - Change the test to adapt to the new parameters of evaluate
 - Add absolute value to tests


          Added parameters documentation to GridSearch class and refactored Gri…

e0d31e9

…dSearch parameters

 - Added parameters documentation
 - Renamed algo parameter to algo_class
 - Changed default measures from ['RMSE'] to become similar to evaluate ['rmse','mae']


          Made GridSearch best attributes not case sensitive:

e7b7cc0

 - Removed duplicate definition of attributes
 - Changed definition from dict to Case insensitive dict
 - Added a test to make sure input parameters and output attributes are not case sensitive


          Corrected if condition that might lead to un-desired situation

877560b


          Added params and measures as keys for cv_results_

8eb88ac


          Created 3 verbosity levels:

c4a34a4

 - 0: Do not print anything
 - 1: Print params when combination starts and Mean scores when it finishes
 - 2: Print same info as 1 plus the score on each fold
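The three verbosity levels might be sketched as follows, reading level 2 as level 1 plus the per-fold scores. The function name `report` and the message wording are assumptions for illustration, not the strings this commit prints:

```python
def report(params, fold_scores, verbose=1):
    """Return the lines a grid search might print for one combination."""
    lines = []
    if verbose >= 1:
        # Level 1 and above: announce the combination being evaluated.
        lines.append('Parameters: {}'.format(params))
    if verbose >= 2:
        # Level 2 only: one line per cross-validation fold.
        for fold, score in enumerate(fold_scores, start=1):
            lines.append('Fold {}: {:.4f}'.format(fold, score))
    if verbose >= 1:
        mean = sum(fold_scores) / len(fold_scores)
        lines.append('Mean score: {:.4f}'.format(mean))
    return lines


for line in report({'n_epochs': 10}, [0.95, 0.97], verbose=2):
    print(line)
```

Returning the lines rather than printing them directly keeps the sketch easy to test; level 0 simply yields an empty list.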


          Added best estimator attribute:

61e3836

 - Best algorithm instance with certain measure
 - Gives the ability for the user to use like any other algorithm class instance
 - Add test for this attribute


          Added documentation for the GridSearch class

382e368


          Merge branch 'grid-search-cv' of https://github.com/mahermalaeb/Surprise

f8e5e9a

 into grid-search-cv


          Remove @classmethod attribute.

01c0fd5

Correct test cases. old evaluate method and grid search evaluate gives the best results


          Added CaseInsensitiveDefaultDictForBestResults class:

52c4a9f

 - It is a clone of CaseInsensitiveDefaultDict but without overriding __str__ method
 - Users can now print the dict output normally for the best
 - Replaced the usage of CaseInsensitiveDefaultDict with CaseInsensitiveDefaultDictForBestResults in the GridSearch class
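For readers unfamiliar with the idea, here is a minimal sketch of a case-insensitive mapping that still prints like an ordinary dict because `__str__` is left alone. The class name `UpperKeyDict` is hypothetical and it subclasses plain `dict` for brevity; it is not the class added in this commit:

```python
class UpperKeyDict(dict):
    """dict whose string keys are normalised to upper case on access."""

    @staticmethod
    def _norm(key):
        # Only strings are normalised; other key types pass through.
        return key.upper() if isinstance(key, str) else key

    def __setitem__(self, key, value):
        super().__setitem__(self._norm(key), value)

    def __getitem__(self, key):
        return super().__getitem__(self._norm(key))

    def __contains__(self, key):
        return super().__contains__(self._norm(key))


best_score = UpperKeyDict()
best_score['rmse'] = 0.96
print(best_score['RMSE'])
print(best_score)
```

In this sketch `dict.get` and `dict.update` bypass the overrides, so a fuller version would need to wrap them as well.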


          Added User-Guide for GridSearch feature:

81f7771

 - Added an example file that contains the code of the user-guide
 - Edited the getting started .rst file to add the guide


          Refactored some parts of the code:

dce38b2

 - Used enumerate instead of index to count in loop
 - changes cv_results_ to defaultdict(list)
 - Reduced the populating of scores and parameters for 1 block


          Refactored code to use evaluate() method:

f6f43c1

 - No need to manually iterate over folds
 - Some verbose print statements avoided


          Addressed a set of simple enhancements:

b8e7abb

 - Reduced the number of iterations in some test functions to reduce testing time
 - Added reference to GridSearchCV from sklearn
 - fixed test_measure_is_not_case_sensitive  to actually fail if we have a bad key
 - Added few comments
 - Change verbose method of GridSearch evaluate
 - Reduced line sizes to less than 80 chars


          Changed measure to upper case from the start

1ae1781

mahermalaeb mentioned this pull request

[Feature] GridSearchCV for Surprise algorithms #6

Closed

Owner

NicolasHug commented Jan 1, 2017

Hi Maher,

This is looking good :) !

A few remarks still:

The docs don't build correctly (run make clean so that errors are not ignored). This is partly due to the fact that the Args section in the docstring of GridSearchCV should be less indented (check out other classes documentation to see how it should render)
Still on the doc, maybe the results of the print function could be added as comments?
You forgot to flake8 the tests and the examples ^^
By u'strings' I mean strings starting with the character u (used to specify that this is unicode). I think we do not need them (look at lines 38 and 49 of test_grid_search.py).
I confirm that I'd rather not use underscored attributes
You can keep the order of the keys of a dict using an OrderedDict, but you're right in that it would force the parameter type to be ordered so forget what I said.
It's OK for the CaseInsensitiveDefaultDict for now. I'll change it so that it does not override the __str__ method so we can have one class only.
The code looks great now!

We're getting closer :)

Nicolas

mahermalaeb added 3 commits

January 1, 2017 23:08


          Make grid search test and example PEP-8 compliant.

270fd1e

 - One import in example file is left at the end of the file on purpose


          Fixed errors and warning when building docs:

 - Renamed GridSearch attributes by removing the underscore from the end. Solved errors
 - Gave different names for code blocks. Solved warnings


          Removed specifying unicode character 'u' from gridsearch test

b3ead3c

Contributor Author

mahermalaeb commented Jan 1, 2017

Hi again,

The docs are now building with no warning or errors. Unicode character 'u' is removed. GridSearch attributes are renamed to have the underscore at the end removed.

What is left is the following:

There is still 1 import warning in the example file when using flake8. I have left the import just before pandas is used on purpose. I thought it would be easier for the user to understand the example that way. It can be easily corrected if you prefer.
Concerning showing the print results. Do you mean I just copy paste the value into a code cell or do you want the docs to actually run the code and print the results? The second case might need some research so that I know how to do it. Note that other docs results are generally not being reported.

Maher

Owner

NicolasHug commented Jan 1, 2017

Hey!

What version of flake8 are you using? Mine is 3.2.1 and I still get many warnings on the example and the tests:

examples/grid_search_usage.py:16:28: E231 missing whitespace after ','
examples/grid_search_usage.py:16:39: E231 missing whitespace after ','
examples/grid_search_usage.py:16:56: E231 missing whitespace after ','
examples/grid_search_usage.py:25:6: E211 whitespace before '('
examples/grid_search_usage.py:27:6: E211 whitespace before '('
examples/grid_search_usage.py:30:6: E211 whitespace before '('
examples/grid_search_usage.py:32:6: E211 whitespace before '('
examples/grid_search_usage.py:34:1: E402 module level import not at top of file
examples/grid_search_usage.py:36:6: E211 whitespace before '('
tests/test_grid_search.py:38:12: E127 continuation line over-indented for visual indent
tests/test_grid_search.py:49:12: E127 continuation line over-indented for visual indent
tests/test_grid_search.py:70:12: E127 continuation line over-indented for visual indent

Also, the sphinx (1.4.9) still gives me warnings:

/home/nico/Surprise/doc/source/getting_started.rst:4: WARNING: Duplicate explicit target name: "grid_search_usage.py".
/home/nico/Surprise/doc/source/getting_started.rst:4: WARNING: Duplicate explicit target name: "grid_search_usage.py".
/home/nico/Surprise/doc/source/getting_started.rst:4: WARNING: Duplicate explicit target name: "grid_search_usage.py".
/home/nico/Surprise/doc/source/getting_started.rst:4: WARNING: Duplicate explicit target name: "grid_search_usage.py".

I've updated the CONTRIBUTING.md file, if you want to take a look.

You did well with the pandas import. You can disable the flake8 warning by commenting with # noqa on the corresponding line.
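A small sketch of what that looks like in practice. The standard-library `statistics` module stands in for pandas here so the snippet runs without extra dependencies; that substitution and the sample values are assumptions for illustration:

```python
values = [3, 1, 2]
print(sorted(values))

# Imported here, next to its first use, to keep the walkthrough readable.
# The trailing comment asks flake8 to skip this line, which it would
# otherwise report as E402 (module level import not at top of file).
import statistics  # noqa

print(statistics.mean(values))
```

A bare `# noqa` silences every flake8 check on that line, so it is best kept to lines where the deviation is deliberate.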

For the print results I was just thinking of putting the output of the program as comments below the code (inside the .py file), e.g.

# best RMSE score
print(gridSearch.best_score_['RMSE'])
# >>> 0.960961836446

# combination of parameters that gave the best RMSE score
print(gridSearch.best_params_['RMSE'])
# >>> {'n_epochs': 10, 'reg_all': 0.4, 'lr_all': 0.005}

I'm sorry I know this is a bit tedious...

Next one will be the right one ;) !

Nicolas

NicolasHug requested review from NicolasHug and removed request for NicolasHug

January 2, 2017 10:13

Owner

NicolasHug commented Jan 2, 2017

Duuuuh I just realized that I hadn't pulled your changes, hence I still got old warnings... ><

It's all fine, there are no more warnings, really sorry about that!

I'll squash and merge and avoid the import pandas warning.

Thanks a lot again for this nice feature! :)

Nicolas

NicolasHug merged commit 714be0b into NicolasHug:master

Contributor Author

mahermalaeb commented Jan 2, 2017

Awesome! I reviewed the changes you made on top of my code and documentation, and they do make this feature look better. I might be coming back with new pull requests :) once I finish Udacity's machine learning nanodegree. I used Surprise for the capstone project and got wonderful feedback. Thanks!

Owner

NicolasHug commented Jan 2, 2017

I'm very glad it's helped you!

I made a reference to your github profile on the project page (acknowledgements section), if you want it to link elsewhere (or nowhere) tell me.

Looking forward to your next pull requests :)
