Skip to content
This repository has been archived by the owner on May 7, 2023. It is now read-only.

[FEATURE] Using defaultdict in *group_by* implementation #201

Closed
geekypandey opened this issue Jun 10, 2020 · 2 comments · Fixed by #235
Closed

[FEATURE] Using defaultdict in *group_by* implementation #201

geekypandey opened this issue Jun 10, 2020 · 2 comments · Fixed by #235
Labels
enhancement New feature or request

Comments

@geekypandey
Copy link
Contributor

The current implementation of group_by loops twice over the lst, which significantly reduces the speed when working with large data. Instead using defaultdict will make it possible to loop over the lst only once.

Current implementation:

def group_by(lst, fn):
    return {key: [el for el in lst if fn(el] == key] for key in map(fn, lst)}

Suggested implementation:

from collections import defaultdict

def group_by(lst, fn):
    d = defaultdict(list)
    for el in lst:
        d[fn(el)].append(el)
    return d
@zhangfelix
Copy link
Contributor

Yeah, it looks more efficient that way. The python official website also gives a similar example.

And defaultdict can also be used in the implementation of count_by.

Current implementation:

def count_by(arr, fn=lambda x: x):
  key = {}
  for el in map(fn, arr):
    key[el] = 1 if el not in key else key[el] + 1
  return key

Suggested implementation:

from collections import defaultdict

def count_by(lst, fn):
  d = defaultdict(int)
  for el in lst:
    d[fn(el)] += 1
  return d

@Chalarangelo Chalarangelo added this to To do in 2020 via automation Aug 19, 2020
@Chalarangelo Chalarangelo added the enhancement New feature or request label Aug 19, 2020
@Lattay
Copy link

Lattay commented Aug 19, 2020

Just want to point out that this change is more important that said.
As I explain in #213 the initial implementation does not loop twice over the elements, it loops n times over all elements, which would be dramatic for as few as 1000 elements.

Trinityyi added a commit that referenced this issue Oct 3, 2020
2020 automation moved this from To do to Done Oct 3, 2020
@geekypandey geekypandey removed this from Done in 2020 Oct 25, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants