Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GridSearch feature #7

Merged
merged 49 commits into from
Jan 2, 2017
Merged

Conversation

mahermalaeb
Copy link
Contributor

This pull request implements #6.

mahermalaeb and others added 30 commits December 29, 2016 00:28
 - Import itertools
 - Add an __init__ function that initializes class instance
 - Add evaluate method that evaluates the parameters on a dataset
 - tests on correct algorithm instance
-Added a function to make sure the combinations returned are correct by testing its length
 - Added best score, best parameters and best index attributes
 - Removed 1 attribute and added cv_results_  attribute analogous to sklearn implementation
 - Added tests for best attributes for RMSE and FCP measures
 - Default to True
 - Added some local variables needed for verbose messages
 - Change the loop to enumerate to follow similar code structure
 - Best attributes are now dicts with measures as keys
 - Change the test to adapt to the new parameters of evaluate
 - Add absolute value to tests
…dSearch parameters

 - Added parameters documenation
 - Renamed algo parameter to algo_class
 - Changed default measures from ['RMSE'] to become similar to evaluate ['rmse','mae']
 - Removed duplicate definition of attributes
 - Changed definition from dict to Case insensitive dict
 - Added a test to make sure input parameters and output attributes are not case sensitive
 - 0: Do not print anything
 - 1: Print params when combination starts and Mean scores when it finishes
 - 2: Print same info as 2 plus the score on each fold
 - Best algorithm instance with certain measure
 - Gives the ability for the user to use like any other algorithm class instance
 - Add test for this attribute
 - Import itertools
 - Add an __init__ function that initializes class instance
 - Add evaluate method that evaluates the parameters on a dataset
 - tests on correct algorithm instance
-Added a function to make sure the combinations returned are correct by testing its length
 - Added best score, best parameters and best index attributes
 - Removed 1 attribute and added cv_results_  attribute analogous to sklearn implementation
 - Added tests for best attributes for RMSE and FCP measures
 - Default to True
 - Added some local variables needed for verbose messages
 - Change the loop to enumerate to follow similar code structure
 - Best attributes are now dicts with measures as keys
 - Change the test to adapt to the new parameters of evaluate
 - Add absolute value to tests
…dSearch parameters

 - Added parameters documenation
 - Renamed algo parameter to algo_class
 - Changed default measures from ['RMSE'] to become similar to evaluate ['rmse','mae']
 - Removed duplicate definition of attributes
 - Changed definition from dict to Case insensitive dict
 - Added a test to make sure input parameters and output attributes are not case sensitive
 - 0: Do not print anything
 - 1: Print params when combination starts and Mean scores when it finishes
 - 2: Print same info as 2 plus the score on each fold
 - Best algorithm instance with certain measure
 - Gives the ability for the user to use like any other algorithm class instance
 - Add test for this attribute
Correct test cases. old evaluate method and grid search evaluate gives the best results
 - It is a clone of CaseInsensitiveDefaultDict but without overriding __str__ method
 - Users can now print the dict output normally for the best
 - Replaced the usage of the CaseInsensitiveDefaultDict to CaseInsensitiveDefaultDictForBestResults inGridSearch class
 - Added an example file that contains the code of the user-guide
 - Edited the getting started .rst file to add the guide
 - Used enumerate instead of index to count in loop
 - changes cv_results_ to defaultdict(list)
 - Reduced the populating of scores and parameters for 1 block
 - No need to manually iterate over folds
 - Some verbose print statements avoided
 - Reduced the number of iterations in some test functions to reduce testing time
 - Added reference to GridSearchCV from sklearn
 - fixed test_measure_is_not_case_sensitive  to actually fail if we have a bad key
 - Added few comments
 - Change verbose method of GridSearch evaluate
 - Reduced line sizes to less than 80 chars
@NicolasHug
Copy link
Owner

Hi Maher,

This is looking good :) !

A few remarks still:

  • The docs don't build correctly (run make clean so that errors are not ignored). This is partly due to the fact that the Args section in the docstring of GridSearchCV should be less indented (check out other classes documentation to see how it should render)
  • Still on the doc, maybe the results of the print function could be added as comments?
  • You forgot to flake8 the tests and the examples ^^
  • By u'strings' I mean strings starting with character u (used to specify that this is unicode). I think we do not need them (look at line 38 and 49 of test_grid_search.py).
  • I confirm that I'd rather not use underscored attributes
  • You can keep the order of the keys of a dict using an OrderedDict, but you're right in that it would force the parameter type to be ordered so forget what I said.
  • It's OK for the CaseInsensitiveDefaultDict for now. I'll change it so that it does not override the __str__ method so we can have one class only.
  • The code looks great now!

We're getting closer :)

Nicolas

 - One import in example file is left at the end of the file on purpose
 - Renamed GridSearch attribute by removig the underscore from the end. Solved Errors
 - Gave different names for code blocks. Solved warnings
@mahermalaeb
Copy link
Contributor Author

Hi again,

The docs are now building with no warning or errors. Unicode character 'u' is removed. GridSearch attributes are renamed to have the underscore at the end removed.

What is left is the following:

  • There is still 1 import warning in the example file when using flex8. I have left the import before using pandas on purpose. I thought it is easier for the user to understand the example that way. It can be easily corrected if you prefer.
  • Concerning showing the print results. Do you mean I just copy paste the value into a code cell or do you want the docs to actually run the code and print the results? The second case might need some research so that I know how to do it. Note that other docs results are generally not being reported.

Maher

@NicolasHug
Copy link
Owner

Hey!

What version of flake8 are you using? Mine is 3.2.1 and I still get many warnings on the example and the tests:

examples/grid_search_usage.py:16:28: E231 missing whitespace after ','
examples/grid_search_usage.py:16:39: E231 missing whitespace after ','
examples/grid_search_usage.py:16:56: E231 missing whitespace after ','
examples/grid_search_usage.py:25:6: E211 whitespace before '('
examples/grid_search_usage.py:27:6: E211 whitespace before '('
examples/grid_search_usage.py:30:6: E211 whitespace before '('
examples/grid_search_usage.py:32:6: E211 whitespace before '('
examples/grid_search_usage.py:34:1: E402 module level import not at top of file
examples/grid_search_usage.py:36:6: E211 whitespace before '('
tests/test_grid_search.py:38:12: E127 continuation line over-indented for visual indent
tests/test_grid_search.py:49:12: E127 continuation line over-indented for visual indent
tests/test_grid_search.py:70:12: E127 continuation line over-indented for visual indent

Also, the sphinx (1.4.9) still gives me warnings:

/home/nico/Surprise/doc/source/getting_started.rst:4: WARNING: Duplicate explicit target name: "grid_search_usage.py".
/home/nico/Surprise/doc/source/getting_started.rst:4: WARNING: Duplicate explicit target name: "grid_search_usage.py".
/home/nico/Surprise/doc/source/getting_started.rst:4: WARNING: Duplicate explicit target name: "grid_search_usage.py".
/home/nico/Surprise/doc/source/getting_started.rst:4: WARNING: Duplicate explicit target name: "grid_search_usage.py".

I've updated the CONTRIBUTING.md file, if you want to take a look.

You did well with the pandas import. You can disable the flake8 warning by commenting with # noqa one the corresponding line line.

For the print results I was just thinking to put the output of the program as comments below the code (inside the the .py file), eg

# best RMSE score
print (gridSearch.best_score_['RMSE'])
# >>> 0.960961836446

# combination of parameters that gave the best RMSE score
print (gridSearch.best_params_['RMSE'])
# >>> {'n_epochs': 10, 'reg_all': 0.4, 'lr_all': 0.005}

I'm sorry I know this is a bit tedious...

Next one will be the right one ;) !

Nicolas

@NicolasHug NicolasHug requested review from NicolasHug and removed request for NicolasHug January 2, 2017 10:13
@NicolasHug
Copy link
Owner

Duuuuh I just realized that I hadn't pulled your changes, hence I still got old warnings... ><

It's all fine there are no more warning, really sorry about that!

I'll squash and merge and avoid the import pandas warning.

Thanks a lot again for this nice feature! :)

Nicolas

@NicolasHug NicolasHug merged commit 714be0b into NicolasHug:master Jan 2, 2017
@mahermalaeb
Copy link
Contributor Author

Awesome! I reviewed the changes you did on top of my code and documentation and indeed they make this feature look better. I might be coming back with new pull requests :) once I finish a Udacity's machine learning nano-degree. I used Surprise for the capstone project and got wonderful feedback. Thanks!

@NicolasHug
Copy link
Owner

I'm very glad it's helped you!

I made a reference to your github profile on the project page (acknowledgements section), if you want it to link elsewhere (or nowhere) tell me.

Looking forward your next pull requests :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants