New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Properly re-raise exceptions to propagate stack trace for exceptions. #23
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor changes. Please merge after fixing.
f1 = 2*(prec*rec)/(prec+rec) | ||
except ZeroDivisionError as e: | ||
f1 = -1.0 | ||
f1 = 2*(prec*rec)/(prec+rec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this change to avoid zero division?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If prec + rec
is 0
, then I think the repair process has done something exceptionally wrong and we should explicitly raise this instead of silently setting f1 = -1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @richardwu . The -1 was to print something that will indicate that we are doing something bad. But at the same time I wanted to see the report on detected errors, corrected cells et. Can we actually add two print statements in the report: 1) a statement that returns the detailed counts in the current report and 2) a statement that returns f1, precision, recall etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok for now
repair/featurize/freqfeat.py
Outdated
@@ -25,7 +25,7 @@ def gen_feat_tensor(self, input, classes): | |||
for idx, val in enumerate(domain): | |||
try: | |||
prob = float(self.single_stats[attribute][val])/float(self.total) | |||
except: | |||
except Exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this catches ZeroDivisionError
. If so, add it explicitly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually catches the case where a KeyError
is thrown: when the val
in the domain
does not appear in the original dataset when calculating stats (I'm actually not sure how this could be the case currently since we always generate domain from the values in the corresponding attribute column, yet I'm running into this issue when running the tests. Do you have context on this @thodrek ?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you investigate if there is a problem with the groupby statement over the data frame that generates the counts that go into single_stats attributes? I have also observed the KeyError exception but never investigated properly. Grab a key for which you get an error and see what is happening in the group_by statement over the raw pandas that generates the single_stats data frame.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a TODO and opened #26.
68dc5ad
to
9366148
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The KeyVal exception needs to be investigated.
f1 = 2*(prec*rec)/(prec+rec) | ||
except ZeroDivisionError as e: | ||
f1 = -1.0 | ||
f1 = 2*(prec*rec)/(prec+rec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @richardwu . The -1 was to print something that will indicate that we are doing something bad. But at the same time I wanted to see the report on detected errors, corrected cells et. Can we actually add two print statements in the report: 1) a statement that returns the detailed counts in the current report and 2) a statement that returns f1, precision, recall etc.
repair/featurize/freqfeat.py
Outdated
@@ -25,7 +25,7 @@ def gen_feat_tensor(self, input, classes): | |||
for idx, val in enumerate(domain): | |||
try: | |||
prob = float(self.single_stats[attribute][val])/float(self.total) | |||
except: | |||
except Exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you investigate if there is a problem with the groupby statement over the data frame that generates the counts that go into single_stats attributes? I have also observed the KeyError exception but never investigated properly. Grab a key for which you get an error and see what is happening in the group_by statement over the raw pandas that generates the single_stats data frame.
Closes #20
Re-ran
./start_test.sh
with no errors.