Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standard error margin #224

Merged
merged 6 commits into from
Nov 2, 2020
Merged

Standard error margin #224

merged 6 commits into from
Nov 2, 2020

Conversation

slobodan-ilic
Copy link
Contributor

Mostly add expectations, but also implement the feature.

Copy link
Contributor

@ernestoarbitrio ernestoarbitrio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments ...

@coveralls
Copy link

coveralls commented Oct 28, 2020

Coverage Status

Coverage remained the same at 100.0% when pulling 21d6d76 on standard-error-margin into 85cd116 on master.

Copy link
Contributor

@ernestoarbitrio ernestoarbitrio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM to me :)

Copy link
Contributor

@scanny scanny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few changes, mostly just updates on the docstring and test conventions we've evolved to while you were away :)

Comment on lines 300 to 307
"""Returns the margin of error (MoE) for col percentages
`moe = Z_975 * 100 * std_error` (the * 100 part accounts for percentages)
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to column_percentages_moe so it is explicit and also appears immediately following column_percentages, both here in the code and in the documentation.

Prefer "float margin of error (MoE) for column percentages." for the summary line. A property, like an attribute, doesn't "return" anything, it "is" some value. Although there are many examples (like the next property below) left to clean up, we reserve the "Returns ... prefix for methods and functions.

Blank line before continuation when the docstring extends beyond the first line: https://www.python.org/dev/peps/pep-0257/#multi-line-docstrings

moe = Z_975 * 100 * std_error just restates the implementation. This docstring appears in the documentation and so should communicate to the user what they are getting from (the contract for) this property, in user-space terms. It should also probably state that 3.5% is represented as 3.5 (and not .035). Also it should probably mention that the return value can be NaN and under what circumstances. And we should state clearly whether a returned NaN is a float("NaN") or np.nan because those two have different behaviors in certain comparisons.

The (the * 100 ...) bit can appear as a comment and seems useful. Comments speak in developer-space and topics related to the implementation are appropriate there.

@@ -239,6 +239,10 @@ class _Slice(CubePartition):
dimensions which can be crosstabbed in a slice.
"""

# quantile of the normal cdf at .975 because the confidence
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't make me look up what cdf is. Spell it out and then show the abbreviation in parentheses, like Common Denominator Frombus (CDF). Also, if you would, prefix comments with --- to make them stand out from commented-out code. Like:

# --- Quantile of the normal Cubic Dimension Fromulus (CDF) at .975 because the
# --- confidence interval is ±.025 on each side.

What are the units of the ±.025 value? Is that standard-deviations? Best to be explicit. Use these opportunities to educate the uninitiated reader.

Comment on lines 296 to 307
"""Returns the standard deviation for cell percentages
"""Returns the standard deviation for col percentages
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, remainder of fixup would be something like:

"""np.float64 standard-deviation of column percentages."""

Comment on lines 351 to 359
0 1 2 3 4 5 6
0 inf inf inf inf inf -2.9 inf
1 inf inf inf inf inf -4.3 inf
2 2.5 1.3 3.3 -0.70 -7.25 -6.52 2.25
3 inf inf inf inf inf -2.51 inf
4 -1.16 2.20 5.84 1.78 -8.48 -5.92 0.93
5 inf inf inf inf inf 9.70 inf

Only the insertions residuals are showed in a inf masked array"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 this scattered alignment with tabs in it has been bugging me for months :)

Final """ should appear on its own line. From the PEP257 link above:

Unless the entire docstring fits on a line, place the closing quotes on a line by themselves.

Comment on lines 604 to 613
""" -> np.int64 ndarray of the columns scale median
"""-> np.int64 ndarray of the columns scale median
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We tried out this -> shorthand for "Returns ..." because it was compact and consistent with Python3 type-hints. But the new version of Black won't allow the leading space and then the eye struggles to differentiate it from the """. So we're back to using Returns when applicable (methods and functions). This is a property though, so the arrow should just go away, leaving """np.int64 ndarray of ...""".

Comment on lines 778 to 792
"""Returns the margin of error (MoE) for table percentages
`moe = Z_975 * 100 * std_error` (the * 100 part accounts for percentages)
"""
return self.Z_975 * 100 * self.table_std_err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same remarks as on column_moe.

@@ -1127,6 +1127,65 @@ def it_calculate_col_residuals_for_subtotals(self):
],
],
)
np.testing.assert_almost_equal(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this expectation should be extracted. It's long enough to disrupt reading the test.

@@ -375,11 +375,58 @@ def test_various_measures_from_r_rows_margin():
],
]

expected_col_moe = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of the expectations for this test should be extracted. Together they are like three screenfuls of noise in the test. We're doing this incrementally as we encounter legacy tests like this one.

Copy link
Contributor

@scanny scanny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple small remarks, mostly for future reference, but approved in advance.

@@ -764,6 +780,17 @@ def table_margin(self):
def table_margin_unpruned(self):
return self._matrix.table_margin_unpruned

@lazyproperty
def table_percentages_moe(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of alphabetical order.

Comment on lines 865 to 892
"""Returns the variance for cell percentages
"""Returns the variance for column percentages
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not asking you to change all these because they snuck in on prior commits, but the "Return" at the beginning should be imperative tense rather than passive tense. So like "Return the variance ..." like you were telling this property what to do rather than "Returns the variance ..." like it was a passive description. If you check the Python documentation you'll see they do it this way. Generally I make all these little fixups if I have occasion to make any change to a docstring.

@ernestoarbitrio ernestoarbitrio merged commit 21e7042 into master Nov 2, 2020
@ernestoarbitrio ernestoarbitrio deleted the standard-error-margin branch November 2, 2020 12:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants