Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os_sorted: only convert input to str if necessary #158

Merged
merged 1 commit into from Jan 11, 2023

Conversation

Dobatymo
Copy link
Contributor

See #157

I ran some benchmarks and I get a consistent 20% improvement when the inputs are Path already. Both for small lists and large lists (I tested lists with 10 and 10 000 items).

from random import shuffle
from pathlib import Path
from timeit import repeat

from natsort import os_sorted

INPATH = "<path>"
paths = list(Path(INPATH).rglob("*"))
print(len(paths))
shuffle(paths)

# use smaller `number` for larger lists
print(min(repeat("os_sorted(paths)", repeat=10, number=1000, globals={"os_sorted": os_sorted, "paths": paths})))

I cannot run the test suite sadly since I get lots of locale.Error: unsupported locale setting errors.

@SethMMorton
Copy link
Owner

I cannot run the test suite sadly since I get lots of locale.Error: unsupported locale setting errors.

Huh - that's something that needs to be fixed (not in this PR, though).

@SethMMorton
Copy link
Owner

The tests failed because python 3.6 is no longer available on the runners - I have removed 3.6 on the master branch. You can rebase and try again.

@Dobatymo
Copy link
Contributor Author

rebased

@Dobatymo
Copy link
Contributor Author

Hi, any feedback on this? The failing tests seem unrelated to my code...

@SethMMorton
Copy link
Owner

Hi, any feedback on this? The failing tests seem unrelated to my code...

I disagree. Both of these tests are checking that when ns.PATH is given as an algorithm the natsort key can handle non-string input. It looks like the change that was made breaks that contract.

I cannot run the test suite sadly since I get lots of locale.Error: unsupported locale setting errors.

natsort is supposed to skip tests using locale. Can you let me know specifically which tests fail for you, and I can make an update that you can rebase on that will make the test suite not fail for you.

@Dobatymo
Copy link
Contributor Author

  1. I fixed running the tests locally. Previously I ran pytest directly which caused issues, because I probably did something wrong. Running it using tox worked.
  2. Before I missed that the keygen code depended on path_splitter raising exceptions on the wrong input types. That caused the failing tests. That behavior is slightly confusing imo, since the type annotations only document it receiving either str or Path.
  3. I fixed my PR. Now all tests succeed (at least locally, still waiting here). I also ran the benchmarks again. For str inputs the performance did not change, for Path I get a consistent 10-20% performance improvement. Since only _split_apply is modified now, which is only used by os_sort_keygen, all other sorting methods should be completely unaffected.

@SethMMorton
Copy link
Owner

Before I missed that the keygen code depended on path_splitter raising exceptions on the wrong input types. That caused the failing tests. That behavior is slightly confusing imo, since the type annotations only document it receiving either str or Path.

That's true.

@codecov
Copy link

codecov bot commented Jan 11, 2023

Codecov Report

Base: 98.44% // Head: 91.73% // Decreases project coverage by -6.70% ⚠️

Coverage data is based on head (8298971) compared to base (837a387).
Patch coverage: 25.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #158      +/-   ##
==========================================
- Coverage   98.44%   91.73%   -6.71%     
==========================================
  Files          10       10              
  Lines         578      581       +3     
==========================================
- Hits          569      533      -36     
- Misses          9       48      +39     
Impacted Files Coverage Δ
natsort/natsort.py 82.17% <25.00%> (-17.83%) ⬇️
natsort/compat/fastnumbers.py 57.14% <0.00%> (-35.72%) ⬇️
natsort/compat/locale.py 63.82% <0.00%> (-34.05%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@SethMMorton SethMMorton merged commit 34078b7 into SethMMorton:master Jan 11, 2023
@Dobatymo Dobatymo deleted the no-convert-path branch January 12, 2023 03:52
@SethMMorton
Copy link
Owner

SethMMorton commented Feb 27, 2023

@Dobatymo FYI I have made an update in 90fd249 that hopefully makes the types clearer by using isinstance instead of duck typing. Hopefully that makes future contributors less confused :)

Thanks for your feedback on that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants