Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] CN2 Rule Induction: handling memory error #2397

Merged
merged 1 commit into from
Jul 21, 2017
Merged

[FIX] CN2 Rule Induction: handling memory error #2397

merged 1 commit into from
Jul 21, 2017

Conversation

jerneju
Copy link
Contributor

@jerneju jerneju commented Jun 12, 2017

Issue
Description of changes
Includes
  • Code changes
  • Tests
  • Documentation

@jerneju
Copy link
Contributor Author

jerneju commented Jun 12, 2017

@codecov-io
Copy link

codecov-io commented Jun 12, 2017

Codecov Report

Merging #2397 into master will increase coverage by 0.03%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2397      +/-   ##
==========================================
+ Coverage   73.88%   73.92%   +0.03%     
==========================================
  Files         322      322              
  Lines       55843    55855      +12     
==========================================
+ Hits        41262    41289      +27     
+ Misses      14581    14566      -15

Copy link
Contributor

@lanzagar lanzagar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markotoplak brought up a good point.
CN2 rules shouldn't really use a huge amount of ram (or at least they could be implemented in a way that they don't).
Maybe we should investigate why memory errors happen here, before adding a try/except which will show error messages to the user but hide the problems from us.

self.model = None
try:
self.model = self.learner(self.data)
print(type(self.learner))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably a leftover debug print.

@janezd
Copy link
Contributor

janezd commented Jun 23, 2017

@jerneju, can you perhaps finish this? That is, discover how CN2 runs out of memory?

If you don't succeed, let's merge this PR.

@jerneju
Copy link
Contributor Author

jerneju commented Jun 26, 2017

I did some tests using memory_profiler. I used data from adult.tab. Here are the results:
Number of used nominal features Memory consumption
1......................................................0.0 MiB
2......................................................4.9 MiB
3......................................................14.7 MiB
4......................................................85.2 MiB

@janezd
Copy link
Contributor

janezd commented Jun 30, 2017

It will always be possible to cause MemoryError in CN2 even with reasonably sized data by using unreasonable settings. Even if CN2 can be optimized, the checks in this PR won't hurt.

We're thus merging this, and leave any optimization of CN2 as further work.

@janezd janezd merged commit 6f27ecb into biolab:master Jul 21, 2017
@jerneju jerneju deleted the memory-rules branch July 21, 2017 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants