Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Properly re-raise exceptions to propagate stack trace for exceptions. #23

Merged
merged 3 commits into from Nov 20, 2018

Conversation

richardwu
Copy link
Collaborator

@richardwu richardwu commented Nov 19, 2018

Closes #20

Re-ran ./start_test.sh with no errors.

Copy link
Contributor

@minafarid minafarid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor changes. Please merge after fixing.

dataset/dataset.py Outdated Show resolved Hide resolved
f1 = 2*(prec*rec)/(prec+rec)
except ZeroDivisionError as e:
f1 = -1.0
f1 = 2*(prec*rec)/(prec+rec)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this change to avoid zero division?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If prec + rec is 0, then I think the repair process has done something exceptionally wrong and we should explicitly raise this instead of silently setting f1 = -1.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @richardwu . The -1 was to print something that will indicate that we are doing something bad. But at the same time I wanted to see the report on detected errors, corrected cells et. Can we actually add two print statements in the report: 1) a statement that returns the detailed counts in the current report and 2) a statement that returns f1, precision, recall etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok for now

@@ -25,7 +25,7 @@ def gen_feat_tensor(self, input, classes):
for idx, val in enumerate(domain):
try:
prob = float(self.single_stats[attribute][val])/float(self.total)
except:
except Exception:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this catches ZeroDivisionError. If so, add it explicitly

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually catches the case where a KeyError is thrown: when the val in the domain does not appear in the original dataset when calculating stats (I'm actually not sure how this could be the case currently since we always generate domain from the values in the corresponding attribute column, yet I'm running into this issue when running the tests. Do you have context on this @thodrek ?)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you investigate if there is a problem with the groupby statement over the data frame that generates the counts that go into single_stats attributes? I have also observed the KeyError exception but never investigated properly. Grab a key for which you get an error and see what is happening in the group_by statement over the raw pandas that generates the single_stats data frame.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a TODO and opened #26.

Copy link

@thodrek thodrek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The KeyVal exception needs to be investigated.

dataset/dataset.py Outdated Show resolved Hide resolved
f1 = 2*(prec*rec)/(prec+rec)
except ZeroDivisionError as e:
f1 = -1.0
f1 = 2*(prec*rec)/(prec+rec)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @richardwu . The -1 was to print something that will indicate that we are doing something bad. But at the same time I wanted to see the report on detected errors, corrected cells et. Can we actually add two print statements in the report: 1) a statement that returns the detailed counts in the current report and 2) a statement that returns f1, precision, recall etc.

@@ -25,7 +25,7 @@ def gen_feat_tensor(self, input, classes):
for idx, val in enumerate(domain):
try:
prob = float(self.single_stats[attribute][val])/float(self.total)
except:
except Exception:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you investigate if there is a problem with the groupby statement over the data frame that generates the counts that go into single_stats attributes? I have also observed the KeyError exception but never investigated properly. Grab a key for which you get an error and see what is happening in the group_by statement over the raw pandas that generates the single_stats data frame.

@thodrek thodrek merged commit 9c01573 into HoloClean:master Nov 20, 2018
@richardwu richardwu mentioned this pull request Nov 20, 2018
@richardwu richardwu deleted the fix_exception_handling branch April 7, 2019 20:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants