-
Notifications
You must be signed in to change notification settings - Fork 332
Sample pivot #252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sample pivot #252
Conversation
|
@SamLau95 Want to look this over before I do the manual merge? |
|
Can you do the merge from master but not merge in the pull request? Eg. run |
SamLau95
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating these methods! See my comments in the file.
datascience/tables.py
Outdated
| create new columns, based on its unique values in self. | ||
| ``rows`` -- row labels, as (``str``) or list of strings, used to | ||
| create new rows based on it's unique values. | ||
| ``values`` -- column label in self for use in aggregation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For these arguments, can we avoid the use of self? The students don't know what that means. We should instead say the table.
datascience/tables.py
Outdated
| Args: | ||
| ``columns`` -- a single column label, (``str``), in self, used to | ||
| create new columns, based on its unique values in self. | ||
| ``rows`` -- row labels, as (``str``) or list of strings, used to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Use array instead of list.
datascience/tables.py
Outdated
| with_replacement (bool): If True (default), samples the rows with | ||
| replacement. If False, samples the rows without replacement. | ||
| ``with_replacement`` -- (``boolean``), if true samples ``k`` rows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nits: Technically there's only bools in Python, not booleans. Also, could you capitalize true and specify which option is the default? Finally, the same comment about using self applies here.
datascience/tables.py
Outdated
| ``weights``: Array specifying valid probability distribution. | ||
| Rows in self are sampled according the the | ||
| probability distribution given by ``weights``. Default is | ||
| uniform distribution on [1, ... , n], n = number of rows. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think the previous description is clearer, since it specifies the format of weights. If you feel like yours is better, care to explain what you didn't like about the previous description?
| >>> jobs.sample(k = 2, weights = make_array(1, 0, 0)) | ||
| Traceback (most recent call last): | ||
| ... | ||
| ValueError: a and p must have same size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a Raises section to the docstring to document these ValueErrors?
SamLau95
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Just one comment.
datascience/tables.py
Outdated
| ``with_replacement`` -- (``boolean``), if true samples ``k`` rows | ||
| with replacement from self, else samples ``k`` rows without | ||
| replacement. | ||
| ``with_replacement`` -- (``bool``) By default TRUE; Samples ``k`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I was unclear in my previous comment. I meant True, since that's how boolean values in Python are spelled.
datascience/tables.py
Outdated
| replacement. If False, samples the rows without replacement. | ||
| ``weights`` -- Array specifying probability the ith row of the | ||
| table is sampled. If None, by default, ``weights`` is the | ||
| uniform distribution on [1, ... , n], n = number of rows. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is more clear than before, which states:
If None (default), samples the rows using a uniform random distribution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it was clear how the sampling was done... specifically we are sampling indices and then selecting. That's why I wanted to make explicit the ith entry in the array of weights corresponds to the probability that the ith row is selected, in sample size 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that stating the rows are sampled using a uniform random distribution is fairly clear since that's a distribution students are familiar with.
The first time I saw [1, ... , n] I thought that was a valid value of weights which is misleading.
What about stating: weights defaults to None, which samples each row with equal probability.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah fair enough... can see how U ~ [1, ... , n] could be confused with python list. Ok, I'll meet half way.
|
@SamLau95 all set? |
|
Yup, I already approved it so you can merge whenever you'd like. |
[ ] Wrote test for feature
[ ] Added changes in the Changelog section in README.md
[ ] Bumped version number (delete if unneeded)
Changes proposed: