Standard error margin #224
Left some comments ...
9de1ee1 to 51f2a31
Code LGTM :)
A few changes, mostly just updates on the docstring and test conventions we've evolved to while you were away :)
src/cr/cube/cubepart.py
Outdated
"""Returns the margin of error (MoE) for col percentages
`moe = Z_975 * 100 * std_error` (the * 100 part accounts for percentages)
"""
Rename to `column_percentages_moe` so it is explicit and also appears immediately following `column_percentages`, both here in the code and in the documentation.
Prefer "float margin of error (MoE) for column percentages." for the summary line. A property, like an attribute, doesn't "return" anything, it "is" some value. Although there are many examples (like the next property below) left to clean up, we reserve the "Returns ..." prefix for methods and functions.
Blank line before continuation when the docstring extends beyond the first line: https://www.python.org/dev/peps/pep-0257/#multi-line-docstrings
`moe = Z_975 * 100 * std_error` just restates the implementation. This docstring appears in the documentation and so should communicate to the user what they are getting from (the contract for) this property, in user-space terms. It should also probably state that 3.5% is represented as 3.5 (and not .035). Also it should probably mention that the return value can be NaN and under what circumstances. And we should state clearly whether a returned NaN is a `float("NaN")` or `np.nan` because those two have different behaviors in certain comparisons.
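To illustrate the kind of difference meant here, a minimal sketch (not code from this PR): `np.nan` is one shared float object, while each `float("NaN")` call creates a new object, so checks that consult identity before equality, such as `in` on a list, give different answers.

```python
import math

import numpy as np

fresh_nan = float("NaN")  # a new float object on each call

# --- Equality is always False for NaN, whichever spelling produced it.
print(np.nan == np.nan)  # False
print(fresh_nan == np.nan)  # False

# --- List containment checks identity before equality, so the shared
# --- `np.nan` object is "found" while a freshly created NaN is not.
print(np.nan in [np.nan])  # True
print(fresh_nan in [np.nan])  # False

# --- Value-based checks treat both spellings the same.
print(math.isnan(fresh_nan), math.isnan(np.nan))  # True True
```

Testing with `math.isnan()` or `np.isnan()` avoids depending on which NaN object happens to be returned.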
The `(the * 100 ...)` bit can appear as a comment and seems useful. Comments speak in developer-space and topics related to the implementation are appropriate there.
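Taken together, the remarks above might produce something like the following. This is only a sketch: `SliceSketch`, its constructor, the array return type, and the stated NaN behavior are assumptions made for illustration and are not the actual `_Slice` implementation.

```python
import numpy as np


class SliceSketch:
    """Hypothetical stand-in for `_Slice`, only to illustrate docstring form."""

    # --- Quantile of the standard normal distribution at .975 (about 1.96).
    Z_975 = 1.959964

    def __init__(self, column_std_err):
        self._column_std_err = np.asarray(column_std_err, dtype=float)

    @property
    def column_percentages_moe(self):
        """np.float64 ndarray of margin of error (MoE) for column percentages.

        Values are in percentage points, so 3.5% is represented as 3.5 (not
        .035). A value is `np.nan` wherever the standard error is `np.nan`.
        """
        # --- the `* 100` converts proportions to percentages
        return self.Z_975 * 100 * self._column_std_err


slice_ = SliceSketch([[0.01, 0.02], [np.nan, 0.035]])
moe = slice_.column_percentages_moe
print(moe)
```

The summary line says what the value is, the blank line separates the continuation, the closing quotes sit on their own line, and the implementation detail lives in a comment.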
src/cr/cube/cubepart.py
Outdated
@@ -239,6 +239,10 @@ class _Slice(CubePartition):
dimensions which can be crosstabbed in a slice.
"""

# quantile of the normal cdf at .975 because the confidence
Don't make me look up what `cdf` is. Spell it out and then show the abbreviation in parentheses, like Common Denominator Frombus (CDF). Also, if you would, prefix comments with `---` to make them stand out from commented-out code. Like:
# --- Quantile of the normal Cubic Dimension Fromulus (CDF) at .975 because the
# --- confidence interval is ±.025 on each side.
What are the units of the ±.025 value? Is that standard-deviations? Best to be explicit. Use these opportunities to educate the uninitiated reader.
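For reference, the constant under discussion can be reproduced with the standard library. This is a sketch rather than the PR's code, and `std_err` is a made-up value. A 95% confidence interval leaves a probability of .025 in each tail, so the critical value is the standard normal quantile at .975; on that reading the .025 is a tail probability, not a number of standard deviations.

```python
from statistics import NormalDist

# --- Quantile of the standard normal cumulative distribution function (CDF)
# --- at .975; a 95% confidence interval leaves probability .025 in each tail.
Z_975 = NormalDist().inv_cdf(0.975)
print(round(Z_975, 4))  # 1.96

# --- Hypothetical standard error of a proportion, for illustration only.
std_err = 0.02
moe = Z_975 * 100 * std_err  # `* 100` expresses the result in percentage points
print(round(moe, 2))  # 3.92
```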
src/cr/cube/cubepart.py
Outdated
"""Returns the standard deviation for cell percentages
"""Returns the standard deviation for col percentages
Good catch, remainder of fixup would be something like:
"""np.float64 standard-deviation of column percentages."""
src/cr/cube/cubepart.py
Outdated
       0      1      2      3      4      5      6
0    inf    inf    inf    inf    inf   -2.9    inf
1    inf    inf    inf    inf    inf   -4.3    inf
2    2.5    1.3    3.3  -0.70  -7.25  -6.52   2.25
3    inf    inf    inf    inf    inf  -2.51    inf
4  -1.16   2.20   5.84   1.78  -8.48  -5.92   0.93
5    inf    inf    inf    inf    inf   9.70    inf

Only the insertions residuals are showed in a inf masked array"""
👍 this scattered alignment with tabs in it has been bugging me for months :)
Final `"""` should appear on its own line. From the PEP257 link above:
Unless the entire docstring fits on a line, place the closing quotes on a line by themselves.
src/cr/cube/cubepart.py
Outdated
""" -> np.int64 ndarray of the columns scale median
"""-> np.int64 ndarray of the columns scale median
We tried out this `->` shorthand for "Returns ..." because it was compact and consistent with Python3 type-hints. But the new version of Black won't allow the leading space and then the eye struggles to differentiate it from the `"""`. So we're back to using "Returns" when applicable (methods and functions). This is a property though, so the arrow should just go away, leaving `"""np.int64 ndarray of ..."""`.
src/cr/cube/cubepart.py
Outdated
"""Returns the margin of error (MoE) for table percentages
`moe = Z_975 * 100 * std_error` (the * 100 part accounts for percentages)
"""
return self.Z_975 * 100 * self.table_std_err
same remarks as on `column_moe`.
@@ -1127,6 +1127,65 @@ def it_calculate_col_residuals_for_subtotals(self):
],
],
)
np.testing.assert_almost_equal(
this expectation should be extracted. It's long enough to disrupt reading the test.
@@ -375,11 +375,58 @@ def test_various_measures_from_r_rows_margin():
],
]

expected_col_moe = [
all of the expectations for this test should be extracted. Together they are like three screenfuls of noise in the test. We're doing this incrementally as we encounter legacy tests like this one.
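One way to extract a long expectation is to keep it in its own file and load it by name. The sketch below is an assumption about how that might look, not this project's actual test helper; `load_expectation`, the directory, and the file name are all hypothetical.

```python
import ast
import tempfile
from pathlib import Path


def load_expectation(directory, name):
    """Return the Python literal stored in `{directory}/{name}.py`."""
    return ast.literal_eval(Path(directory, name + ".py").read_text())


# --- Stand-in for an expectations directory checked in next to the tests.
expectations_dir = tempfile.mkdtemp()
Path(expectations_dir, "col-moe.py").write_text("[[1.5, 2.0], [3.25, 4.0]]")


def test_column_percentages_moe():
    computed = [[1.5, 2.0], [3.25, 4.0]]  # stand-in for the value under test
    assert computed == load_expectation(expectations_dir, "col-moe")


test_column_percentages_moe()
```

The test body then shows only what is being compared, and the numbers live where they do not interrupt reading.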
51f2a31 to 4d10e24
A couple small remarks, mostly for future reference, but approved in advance.
src/cr/cube/cubepart.py
Outdated
@@ -764,6 +780,17 @@ def table_margin(self):
def table_margin_unpruned(self):
return self._matrix.table_margin_unpruned

@lazyproperty
def table_percentages_moe(self):
out of alphabetical order.
src/cr/cube/cubepart.py
Outdated
"""Returns the variance for cell percentages
"""Returns the variance for column percentages
Not asking you to change all these because they snuck in on prior commits, but the "Return" at the beginning should be in the imperative mood rather than the third-person descriptive form. So like "Return the variance ..." like you were telling this property what to do rather than "Returns the variance ..." like it was a description. If you check the Python documentation you'll see they do it this way. Generally I make all these little fixups if I have occasion to make any change to a docstring.
Mostly add expectations, but also implement the feature.