Could statistical functions be added for use after group_by? #49

Open
linearhinos opened this issue Dec 22, 2017 · 4 comments

@linearhinos

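After a group_by on a key, I would like an easy way to compute things like the mean and the variance, or to sort each group and then iterate over it.
It would be great if built-in functions like these were provided.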

@yshysh
Collaborator

yshysh commented Dec 25, 2017

I don't understand what you mean.

@linearhinos
Author

I mean, in addition to sum() and count(), could Bigflow support mean(), variance(), and other popular statistical functions for PCollection?

@acmol
Collaborator

acmol commented Dec 25, 2017

Actually, you can use:

def mean(p):
    return p.sum() / p.count()
    # this is syntactic sugar for p.sum().map(lambda s, c: s / c, p.count())

to implement mean in one line.

Then you can use it in apply_values, e.g.:

p.group_by_key()\
  .apply_values(mean)
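
For readers without a Bigflow environment, the effect of grouping by key and then applying mean to each group can be modelled in plain Python. This is only a sketch of the semantics on in-memory lists: the `group_by_key` and `apply_values` functions below are hypothetical stand-ins written for illustration, not Bigflow's implementation, and `float()` is used so the division does not truncate under Python 2.

```python
from collections import defaultdict


def mean(values):
    # Same idea as the one-liner above: sum divided by count.
    # float() avoids integer truncation when running under Python 2.
    return float(sum(values)) / len(values)


def group_by_key(pairs):
    # Hypothetical stand-in: collect the values of (key, value) pairs per key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def apply_values(groups, fn):
    # Hypothetical stand-in: apply fn to each group's values independently.
    return {key: fn(values) for key, values in groups.items()}


pairs = [("a", 1), ("a", 3), ("b", 10)]
per_key_mean = apply_values(group_by_key(pairs), mean)
```

Here `per_key_mean` maps each key to the mean of its own values, which is the shape of result the grouped call is meant to produce.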

Likewise, if you want to apply it to a global PCollection, you can just use apply:

p.apply(mean) 

or just call it directly:

mean(p)

Because these functions are easy to implement, we don't provide them as built-in methods.
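
Variance, which the original request also mentions, can be built from the same kind of aggregates. The sketch below is plain Python rather than Bigflow code, and the function name `variance_from_aggregates` is made up for illustration. It computes the population variance as E[x^2] - (E[x])^2 from a count, a sum, and a sum of squares. How those three aggregates would be wired together in Bigflow is not shown in this thread, so treat that part as an open assumption.

```python
def variance_from_aggregates(values):
    # Three aggregates, each computable in a single pass over the data.
    count = len(values)
    total = float(sum(values))
    total_sq = float(sum(v * v for v in values))
    mean = total / count
    # Population variance: E[x^2] - (E[x])^2.
    # This form can lose precision when the mean is large relative to the spread.
    return total_sq / count - mean * mean


result = variance_from_aggregates([2, 4, 4, 4, 5, 5, 7, 9])
```

The appeal of this formulation is that it needs only sums and a count, which are the same building blocks used for mean.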

If you find these functions difficult to write, you can always use transforms.make_tuple(pobject1, pobject2).
For example, you can use transforms.make_tuple to implement mean like this:

from bigflow import transforms

def mean(p):
    return transforms.make_tuple(p.sum(), p.count()).map(lambda sc: sc[0] / sc[1])

You can also implement a function that returns both the sum and the mean, and use it in apply_values like this:

def sum_and_mean(p):
    return transforms.make_tuple(p.sum(), p.apply(mean))

p.group_by_key().apply_values(sum_and_mean)
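
As a rough picture of what the grouped sum_and_mean call yields, here is a plain-Python model that returns a (sum, mean) tuple per key. The helper `sum_and_mean_by_key` is hypothetical and works on an in-memory list of (key, value) pairs. It only mirrors the shape of the result, not how Bigflow executes it.

```python
from collections import defaultdict


def sum_and_mean_by_key(pairs):
    # Accumulate a running [sum, count] per key in one pass.
    acc = defaultdict(lambda: [0.0, 0])
    for key, value in pairs:
        acc[key][0] += value
        acc[key][1] += 1
    # Derive (sum, mean) for each key from the two aggregates.
    return {key: (s, s / c) for key, (s, c) in acc.items()}


stats = sum_and_mean_by_key([("a", 1), ("a", 3), ("b", 10)])
```

Keeping the sum as a float from the start means the mean is a true division under both Python 2 and Python 3.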

@chunyang-wen
Collaborator

I think there should be a module that provides commonly useful functions like these.

@yshysh yshysh self-assigned this Jan 3, 2018