Series.dropna scalable draft #604
1e-to commented on Feb 14, 2020

sdc/functions/numpy_like.py
Outdated
    return nanprod_impl


def get_pool_size():
Let's move this file to something like prange_utils.py
sdc/functions/numpy_like.py
Outdated
if pool_size == 0:
    pool_size = get_pool_size()

chunk_size = size//pool_size + 1
Suggested change:
- chunk_size = size//pool_size + 1
+ chunk_size = (size - 1)//pool_size + 1
This is the correct formula.
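A small sketch may help show why the second form is preferable. It compares the two formulas on a few sizes; the helper names `chunk_size_old`, `chunk_size_new` and `chunks_needed` are invented here for illustration and are not part of the PR.

```python
def chunk_size_old(size, pool_size):
    # Formula from the original diff: overshoots when size divides evenly.
    return size // pool_size + 1


def chunk_size_new(size, pool_size):
    # Ceiling division for size >= 1: the smallest chunk length that
    # covers `size` elements with at most `pool_size` chunks.
    return (size - 1) // pool_size + 1


def chunks_needed(size, chunk_size):
    # Number of non-empty chunks of length `chunk_size` covering `size` elements.
    return (size + chunk_size - 1) // chunk_size


for size, pool_size in [(8, 4), (9, 4), (1, 4)]:
    old = chunk_size_old(size, pool_size)
    new = chunk_size_new(size, pool_size)
    print(size, pool_size, old, chunks_needed(size, old), new, chunks_needed(size, new))
```

With 8 elements and 4 workers, the old formula yields chunks of 3 elements, so only 3 chunks exist and one worker stays idle. The corrected formula yields chunks of 2 elements and uses all 4 workers.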
sdc/utilities/prange_utils.py
Outdated
if pool_size == 0:
    pool_size = get_pool_size()

chunk_size = size//pool_size + 1
(size - 1)//pool_size + 1
AlexanderKalistratov left a comment
👍
sdc/functions/numpy_like.py
Outdated
result_index = numpy.empty(shape=length, dtype=dtype_idx)
for i in prange(len(chunks)):
    chunk = chunks[i]
    if i == 0:
new_start = sum(arr_len[0:i])
new_stop = new_start + arr_len[i]
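The two lines above derive the output slice for chunk `i` from the per-chunk counts, which removes the need for a special case at `i == 0`. A minimal sketch, assuming `arr_len[i]` holds the number of non-NaN values found in chunk `i`; the sample counts and the helper `output_range` are made up for illustration.

```python
from itertools import accumulate

# Hypothetical per-chunk counts of non-NaN values.
arr_len = [3, 0, 2, 4]


def output_range(arr_len, i):
    # Chunk i writes after everything produced by chunks 0..i-1.
    new_start = sum(arr_len[0:i])
    new_stop = new_start + arr_len[i]
    return new_start, new_stop


ranges = [output_range(arr_len, i) for i in range(len(arr_len))]
print(ranges)

# The same start offsets computed once, as an exclusive prefix sum.
starts = [0] + list(accumulate(arr_len))[:-1]
print(starts)
```

For `i == 0` the slice `arr_len[0:0]` is empty and sums to 0, so the first chunk starts at position 0 without any branching. Re-summing per chunk costs time proportional to the number of chunks, which is likely acceptable when there is roughly one chunk per worker.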
sdc/functions/numpy_like.py
Outdated
for j in range(chunk.start, chunk.stop):
    if new_start < new_stop:
        if not isnan(arr[j]):
            result_data[new_start] = arr[j]
It is better to introduce a new variable, something like current_pos. Like this:
current_pos = new_start
for j in range(chunk.start, chunk.stop):
    if current_pos < new_stop:
        if not isnan(arr[j]):
            result_data[current_pos] = arr[j]
            result_index[current_pos] = idx[j]
            current_pos += 1
It is confusing that you are always writing to new_start.
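To put the suggestion in context, here is a pure-Python sketch of the two-pass chunked approach this review discusses: count the non-NaN values per chunk, then let each chunk write into its own slice of the output. It is not the PR's implementation. `Chunk`, `get_chunks` and `dropna_chunked` are hypothetical names, plain `range` loops stand in for `prange`, and Python lists stand in for NumPy arrays. The sketch also leaves out the `current_pos < new_stop` check and asserts the invariant instead, which is an assumption of this example rather than something the PR settles.

```python
from collections import namedtuple
from math import isnan

# Hypothetical stand-in for the chunk type used in the PR.
Chunk = namedtuple("Chunk", ["start", "stop"])


def get_chunks(size, pool_size):
    # Split range(size) into at most `pool_size` contiguous chunks.
    if size == 0:
        return []
    chunk_size = (size - 1) // pool_size + 1
    return [Chunk(start, min(start + chunk_size, size))
            for start in range(0, size, chunk_size)]


def dropna_chunked(arr, idx, pool_size=4):
    chunks = get_chunks(len(arr), pool_size)

    # Pass 1: count non-NaN values per chunk (a prange loop in the PR).
    arr_len = [0] * len(chunks)
    for i in range(len(chunks)):
        for j in range(chunks[i].start, chunks[i].stop):
            if not isnan(arr[j]):
                arr_len[i] += 1

    length = sum(arr_len)
    result_data = [0.0] * length
    result_index = [None] * length

    # Pass 2: each chunk fills its own disjoint slice of the output.
    for i in range(len(chunks)):
        new_start = sum(arr_len[0:i])
        new_stop = new_start + arr_len[i]
        current_pos = new_start
        for j in range(chunks[i].start, chunks[i].stop):
            if not isnan(arr[j]):
                result_data[current_pos] = arr[j]
                result_index[current_pos] = idx[j]
                current_pos += 1
        # Pass 1 counted exactly these values, so the slice is filled exactly.
        assert current_pos == new_stop

    return result_data, result_index


nan = float("nan")
data, index = dropna_chunked([1.0, nan, 3.0, nan, nan, 6.0, 7.0],
                             list("abcdefg"), pool_size=3)
print(data, index)
```

Keeping `new_start` fixed and advancing only `current_pos` makes it clear which value marks the slice boundary and which one is the write cursor.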
sdc/functions/numpy_like.py
Outdated
new_stop = new_start + arr_len[i]

for j in range(chunk.start, chunk.stop):
    if new_start < new_stop:
Not sure if you actually need this condition
conflicts
Could you please remeasure performance?
