@maxwelljweinstein
Contributor

[ ] Wrote test for feature
[ ] Added changes in the Changelog section in README.md
[ ] Bumped version number (delete if unneeded)

Changes proposed:

@maxwelljweinstein
Contributor Author

@SamLau95 Want to look this over before I do the manual merge?

@SamLau95
Contributor

Can you merge master into this branch without merging the pull request itself? E.g., run git pull origin master on this branch and then git push origin sample_pivot, but don't click Merge pull request.

@coveralls

Coverage Status

Coverage remained the same at 74.894% when pulling e6d78f4 on sample_pivot into 03aa023 on master.

Contributor

@SamLau95 SamLau95 left a comment

Thanks for updating these methods! See my comments in the file.

create new columns, based on its unique values in self.
``rows`` -- row labels, as (``str``) or list of strings, used to
create new rows based on its unique values.
``values`` -- column label in self for use in aggregation.
Contributor

For these arguments, can we avoid the use of self? The students don't know what that means. We should instead say the table.

Args:
``columns`` -- a single column label, (``str``), in self, used to
create new columns, based on its unique values in self.
``rows`` -- row labels, as (``str``) or list of strings, used to
Contributor

nit: Use array instead of list.

with_replacement (bool): If True (default), samples the rows with
replacement. If False, samples the rows without replacement.
``with_replacement`` -- (``boolean``), if true samples ``k`` rows
Contributor

nits: Technically there are only bools in Python, not booleans. Also, could you capitalize true and specify which option is the default? Finally, the same comment about using self applies here.
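
For reference, a quick check of the naming point above, using only Python built-ins (the variable names are illustrative):

```python
# The built-in truth type is named `bool`; there is no built-in `boolean`.
flag = True
type_name = type(flag).__name__

# The constants are spelled with a capital letter: True and False.
# Lowercase `true` is just an undefined name, so evaluating it fails.
try:
    eval("true")
    lowercase_is_defined = True
except NameError:
    lowercase_is_defined = False

print(type_name, lowercase_is_defined)
```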

``weights``: Array specifying valid probability distribution.
Rows in self are sampled according to the
probability distribution given by ``weights``. Default is
uniform distribution on [1, ... , n], n = number of rows.
Contributor

I actually think the previous description is clearer, since it specifies the format of weights. If you feel like yours is better, care to explain what you didn't like about the previous description?

>>> jobs.sample(k = 2, weights = make_array(1, 0, 0))
Traceback (most recent call last):
...
ValueError: a and p must have same size
Contributor

Could you add a Raises section to the docstring to document these ValueErrors?
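
As a rough sketch of what such a Raises section could look like, here is a hypothetical helper written in the docstring style used in this PR. The name `sample_row_indices`, its signature, and the error messages are assumptions for illustration, and it uses the standard library's `random.choices` (with replacement only) as a stand-in rather than the library's actual sampling code:

```python
import math
import random


def sample_row_indices(num_rows, k, weights=None):
    """Return ``k`` row indices sampled with replacement.

    Hypothetical helper for illustration only.

    Args:
        ``num_rows`` -- (``int``) number of rows in the table.
        ``k`` -- (``int``) number of indices to sample.
        ``weights`` -- Array of probabilities, one per row. If None
            (default), each row is equally likely.

    Returns:
        A list of ``k`` row indices.

    Raises:
        ValueError -- if ``weights`` does not have exactly one entry
            per row, or if its entries do not sum to 1.
    """
    if weights is not None:
        # Check the two documented failure modes before sampling.
        if len(weights) != num_rows:
            raise ValueError('weights must have one entry per row')
        if not math.isclose(sum(weights), 1):
            raise ValueError('weights must sum to 1')
    return random.choices(range(num_rows), weights=weights, k=k)
```

Documenting the errors next to the checks that raise them keeps the docstring and the behavior easy to compare during review.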

@coveralls

Coverage Status

Coverage increased (+0.1%) to 75.035% when pulling 08bc2ee on sample_pivot into 03aa023 on master.

Contributor

@SamLau95 SamLau95 left a comment

Looks good! Just one comment.

``with_replacement`` -- (``boolean``), if true samples ``k`` rows
with replacement from self, else samples ``k`` rows without
replacement.
``with_replacement`` -- (``bool``) By default TRUE; Samples ``k``
Contributor

Oops, I was unclear in my previous comment. I meant True, since that's how boolean values in Python are spelled.

replacement. If False, samples the rows without replacement.
``weights`` -- Array specifying probability the ith row of the
table is sampled. If None, by default, ``weights`` is the
uniform distribution on [1, ... , n], n = number of rows.
Contributor

I don't think this is clearer than the previous description, which states:

If None (default), samples the rows using a uniform random distribution.

Contributor Author

I don't think it was clear how the sampling was done. Specifically, we sample row indices and then select the rows at those indices. That's why I wanted to make explicit that the ith entry in the array of weights is the probability that the ith row is selected in a sample of size 1.
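
A minimal sketch of that two-step idea, assuming a plain list of tuples in place of a table. The data and the function name `sample_rows` are made up for illustration, and the standard library's `random.choices` stands in for whatever the library calls internally:

```python
import random

# Stand-in for a three-row table: each tuple is one row (made-up data).
rows = [('a', 1), ('b', 2), ('c', 3)]


def sample_rows(rows, k, weights=None):
    # Step 1: sample k row indices with replacement. weights[i] is the
    # chance that index i comes up on any single draw; None means every
    # index is equally likely.
    indices = random.choices(range(len(rows)), weights=weights, k=k)
    # Step 2: select the rows at the sampled indices.
    return [rows[i] for i in indices]


# With all of the weight on the first row, every draw selects row 0.
certain = sample_rows(rows, k=2, weights=[1, 0, 0])

# With the default weights, each of the three rows is equally likely.
uniform = sample_rows(rows, k=2)
```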

Contributor

I think that stating the rows are sampled using a uniform random distribution is fairly clear since that's a distribution students are familiar with.

The first time I saw [1, ... , n], I thought it was a valid value of weights, which is misleading.

What about stating "weights defaults to None, which samples each row with equal probability"?

Contributor Author

Ah, fair enough... I can see how U ~ [1, ... , n] could be confused with a Python list. OK, I'll meet you halfway.

@coveralls

Coverage Status

Coverage increased (+0.1%) to 75.035% when pulling 6bd1776 on sample_pivot into 03aa023 on master.

@maxwelljweinstein
Contributor Author

@SamLau95 all set?

@SamLau95
Contributor

Yup, I already approved it so you can merge whenever you'd like.

@maxwelljweinstein maxwelljweinstein merged commit a01e52f into master Oct 25, 2016
@maxwelljweinstein maxwelljweinstein deleted the sample_pivot branch October 25, 2016 19:15
4 participants