Properly re-raise exceptions to propagate stack trace for exceptions. #23

richardwu · 2018-11-19T23:06:53Z

Closes #20

Re-ran ./start_test.sh with no errors.
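
For readers unfamiliar with the pattern named in the title, here is a minimal sketch. It is not code from this PR, and `load_rows`, `run_step` and `frames_in_traceback` are hypothetical names. A bare `raise` inside an `except` block re-raises the active exception with its original traceback, so the frame where the failure happened stays visible to the caller.

```python
import traceback


def load_rows(path):
    # Hypothetical failing step; stands in for any call that may raise.
    raise ValueError("could not parse %s" % path)


def run_step(path):
    try:
        return load_rows(path)
    except Exception:
        # Cleanup or logging could go here. The bare `raise` then
        # re-raises the active exception with its traceback intact.
        raise


def frames_in_traceback(func, *args):
    """Return the function names recorded in the traceback of func's failure."""
    try:
        func(*args)
    except Exception as exc:
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []


names = frames_in_traceback(run_step, "data.csv")
print(names)  # the innermost entry is the function that actually failed
```

The innermost traceback entry still points at `load_rows`, which is the information lost when an exception is swallowed or replaced by a generic message.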

minafarid

Minor changes. Please merge after fixing.

dataset/dataset.py

minafarid · 2018-11-20T18:48:58Z

evaluate/eval.py

-            f1 = 2*(prec*rec)/(prec+rec)
-        except ZeroDivisionError as e:
-            f1 = -1.0
+        f1 = 2*(prec*rec)/(prec+rec)


Why did this change to avoid zero division?

If prec + rec is 0, then I think the repair process has done something exceptionally wrong and we should explicitly raise this instead of silently setting f1 = -1.

I agree with @richardwu. The -1 was meant to print something indicating that we are doing something bad. At the same time, I still want to see the report on detected errors, corrected cells, etc. Can we add two print statements to the report: 1) one that prints the detailed counts in the current report, and 2) one that prints f1, precision, recall, etc.?
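
One way to read this suggestion, as a sketch only: build the counts line first, then compute the metrics without a guard, so the counts are still visible when F1 is undefined. The helper names (`counts_report`, `metrics_report`, `compute_f1`) and argument names are hypothetical and not taken from evaluate/eval.py. The sketch assumes precision = correct repairs / total repairs and recall = correct repairs / detected errors.

```python
def compute_f1(prec, rec):
    """Harmonic mean of precision and recall.

    Deliberately unguarded: if prec + rec == 0 this raises
    ZeroDivisionError instead of returning a sentinel such as -1.0.
    """
    return 2 * (prec * rec) / (prec + rec)


def counts_report(detected_errors, total_repairs, correct_repairs):
    """First statement: raw counts only, so it can never divide by zero."""
    return "detected errors = %d, total repairs = %d, correct repairs = %d" % (
        detected_errors, total_repairs, correct_repairs)


def metrics_report(detected_errors, total_repairs, correct_repairs):
    """Second statement: derived metrics; may raise ZeroDivisionError."""
    prec = float(correct_repairs) / total_repairs
    rec = float(correct_repairs) / detected_errors
    f1 = compute_f1(prec, rec)
    return "precision = %.2f, recall = %.2f, f1 = %.2f" % (prec, rec, f1)


# Print the counts before the metrics so they survive a metrics failure.
print(counts_report(10, 8, 6))
print(metrics_report(10, 8, 6))
```

With this ordering, a run where nothing was repaired correctly still prints its counts and then fails loudly with a stack trace, rather than reporting f1 = -1.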

minafarid · 2018-11-20T18:49:54Z

repair/featurize/freqfeat.py

@@ -25,7 +25,7 @@ def gen_feat_tensor(self, input, classes):
        for idx, val in enumerate(domain):
            try:
                prob = float(self.single_stats[attribute][val])/float(self.total)
-            except:
+            except Exception:


I assume this catches ZeroDivisionError. If so, add it explicitly.

This actually catches a KeyError, which is thrown when a val in the domain does not appear in the original dataset when calculating stats. I'm not sure how this can happen currently, since we always generate the domain from the values in the corresponding attribute column, yet I'm running into it when running the tests. Do you have context on this, @thodrek?

Can you investigate whether there is a problem with the groupby statement over the data frame that generates the counts that go into the single_stats attributes? I have also observed the KeyError exception but never investigated it properly. Grab a key for which you get an error and see what happens in the groupby statement over the raw pandas data frame that generates the single_stats data frame.

Added a TODO and opened #26.
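
As a rough illustration of the two points in this thread (naming the exception explicitly, and finding which domain values have no count), here is a sketch. It uses plain nested dicts as a stand-in for `single_stats`, and the helper names are hypothetical. The 0.0 fallback is an assumption made for the example; the diff above does not show what the real `except` branch does.

```python
def frequency_probs(single_stats, attribute, domain, total):
    """Relative frequency of each domain value for one attribute."""
    probs = []
    for val in domain:
        try:
            prob = float(single_stats[attribute][val]) / float(total)
        except KeyError:
            # val is in the domain but has no recorded count.
            # Assumed fallback for this sketch only.
            prob = 0.0
        probs.append(prob)
    return probs


def missing_from_stats(single_stats, attribute, domain):
    """List domain values with no entry in the stats, for debugging."""
    counts = single_stats.get(attribute, {})
    return [val for val in domain if val not in counts]


# Toy stats: "Cicago" is in the domain but was never counted.
stats = {"city": {"Chicago": 3, "Boston": 1}}
print(frequency_probs(stats, "city", ["Chicago", "Boston", "Cicago"], 4))
print(missing_from_stats(stats, "city", ["Chicago", "Boston", "Cicago"]))
```

Catching only `KeyError` means a zero `total` still surfaces as a `ZeroDivisionError`, and the list returned by `missing_from_stats` gives concrete keys to trace back through the code that builds the counts.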

thodrek

The KeyError exception needs to be investigated.

dataset/dataset.py

thodrek · 2018-11-20T19:11:00Z

evaluate/eval.py

-            f1 = 2*(prec*rec)/(prec+rec)
-        except ZeroDivisionError as e:
-            f1 = -1.0
+        f1 = 2*(prec*rec)/(prec+rec)


I agree with @richardwu. The -1 was meant to print something indicating that we are doing something bad. At the same time, I still want to see the report on detected errors, corrected cells, etc. Can we add two print statements to the report: 1) one that prints the detailed counts in the current report, and 2) one that prints f1, precision, recall, etc.?

thodrek · 2018-11-20T19:14:13Z

repair/featurize/freqfeat.py

@@ -25,7 +25,7 @@ def gen_feat_tensor(self, input, classes):
        for idx, val in enumerate(domain):
            try:
                prob = float(self.single_stats[attribute][val])/float(self.total)
-            except:
+            except Exception:


Can you investigate whether there is a problem with the groupby statement over the data frame that generates the counts that go into the single_stats attributes? I have also observed the KeyError exception but never investigated it properly. Grab a key for which you get an error and see what happens in the groupby statement over the raw pandas data frame that generates the single_stats data frame.

Properly re-raise exceptions to propagate stack trace for exceptions.

4ed8726

richardwu mentioned this pull request Nov 20, 2018

Replace print statements with logging #25

Merged

thodrek requested review from minafarid and thodrek November 20, 2018 18:24

minafarid approved these changes Nov 20, 2018

Addressed PR comments.

9366148

richardwu force-pushed the fix_exception_handling branch from 68dc5ad to 9366148 Compare November 20, 2018 19:04

thodrek requested changes Nov 20, 2018

Added TODO for investigating missing value from single stats.

078a17a

richardwu mentioned this pull request Nov 20, 2018

Investigate missing values from self.single_stats #26

Closed

thodrek approved these changes Nov 20, 2018

thodrek merged commit 9c01573 into HoloClean:master Nov 20, 2018

richardwu mentioned this pull request Nov 20, 2018

Zhihan/debugging mode #22

Merged

richardwu deleted the fix_exception_handling branch April 7, 2019 20:14
