
weiji14 commented Jul 14, 2024

Before submitting
  • Was this discussed/agreed via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section? Broken link...
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Make litdata work with NumPy 2.0 by replacing `np.sctypes.values()` with a list of dtypes obtained from `np.core.sctypes.values()`. Also added the NPY201 ruff lint rule and removed the upper `numpy<2.0` pin in requirements.txt.
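A minimal sketch of the idea, assuming NumPy is installed: instead of flattening `np.sctypes.values()` at import time, spell the scalar types out explicitly so the same list exists on NumPy 1.x and 2.x. The names `NUMPY_SCALAR_TYPES`, `DTYPE_BY_INDEX` and `INDEX_BY_DTYPE` are hypothetical, and the exact types and order that litdata ships may differ.

```python
import numpy as np

# Hypothetical explicit replacement for the removed `np.sctypes` lookup.
# Listing the scalar types by hand fixes both the contents and the order,
# so each type's index no longer depends on the NumPy version or platform.
NUMPY_SCALAR_TYPES = [
    np.int8, np.int16, np.int32, np.int64,
    np.uint8, np.uint16, np.uint32, np.uint64,
    np.float16, np.float32, np.float64,
    np.complex64, np.complex128,
    np.bool_, np.object_, np.bytes_, np.str_, np.void,
]

# Index <-> dtype tables, e.g. for encoding a dtype as a small integer.
DTYPE_BY_INDEX = {i: np.dtype(t) for i, t in enumerate(NUMPY_SCALAR_TYPES)}
INDEX_BY_DTYPE = {dt: i for i, dt in DTYPE_BY_INDEX.items()}
```

Because the order is fixed in source, the index of each dtype is the same wherever the module is imported, rather than depending on what `np.sctypes` happened to contain.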

Fixes #175, specifically this error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/litdata/__init__.py", line 16, in <module>
    from litdata.processing.functions import map, merge_datasets, optimize, walk
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/litdata/processing/functions.py", line 30, in <module>
    from litdata.constants import _INDEX_FILENAME, _IS_IN_STUDIO, _TQDM_AVAILABLE
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/litdata/constants.py", line 62, in <module>
    _NUMPY_SCTYPES = [v for values in np.sctypes.values() for v in values]
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/numpy/__init__.py", line 397, in __getattr__
    raise AttributeError(
AttributeError: `np.sctypes` was removed in the NumPy 2.0 release. Access dtypes explicitly instead.. Did you mean: 'dtypes'?

Also closes #201

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

weiji14 added 4 commits July 15, 2024 11:07
weiji14 marked this pull request as ready for review July 14, 2024 23:25

codecov bot commented Jul 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (main@225814e). Learn more about missing BASE report.

Additional details and impacted files
@@          Coverage Diff          @@
##             main   #230   +/-   ##
=====================================
  Coverage        ?    77%           
=====================================
  Files           ?     33           
  Lines           ?   4696           
  Branches        ?      0           
=====================================
  Hits            ?   3622           
  Misses          ?   1074           
  Partials        ?      0           
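The 77% figure follows from the table: Codecov reports coverage as hits divided by all coverable lines (hits + misses + partials). A quick check with the numbers copied from the report:

```python
# Numbers copied from the Codecov table above.
hits, misses, partials = 3622, 1074, 0

# Every coverable line is counted as a hit, a miss, or a partial.
lines = hits + misses + partials
coverage = hits / lines

print(lines)              # 4696
print(f"{coverage:.1%}")  # 77.1%
```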


weiji14 commented Jul 15, 2024

Getting this error on Windows:

___________________ test_assert_no_header_numpy_serializer ____________________

    def test_assert_no_header_numpy_serializer():
        serializer = NoHeaderNumpySerializer()
        t = np.ones((10,))
        assert serializer.can_serialize(t)
        data, name = serializer.serialize(t)
>       assert name == "no_header_numpy:10"
E       AssertionError: assert 'no_header_numpy:11' == 'no_header_numpy:10'
E         
E         - no_header_numpy:10
E         ?                  ^
E         + no_header_numpy:11
E         ?                  ^

tests\streaming\test_serializer.py:210: AssertionError
=========================== short test summary info ===========================
FAILED tests/streaming/test_serializer.py::test_assert_no_header_numpy_serializer - AssertionError: assert 'no_header_numpy:11' == 'no_header_numpy:10'
  
  - no_header_numpy:10
  ?                  ^
  + no_header_numpy:11
  ?                  ^
===== 1 failed, 137 passed, 53 skipped, 11 warnings in 258.70s (0:04:18) ======

Seems like np.sctypes.values() is actually platform dependent, xref numpy/numpy#11923?
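Why a platform-dependent list breaks this test: the `:10` suffix in the serializer name appears to be the dtype's position in `_NUMPY_SCTYPES`, so a single extra entry earlier in the list shifts float64 from 10 to 11. The sketch below models that with plain strings. `encode_name`, `decode_dtype` and both lists are hypothetical, and the extra `intc` entry only stands in for whatever additional alias a platform may report; it is not the actual Windows contents.

```python
def encode_name(prefix: str, type_list: list, dtype_name: str) -> str:
    """Return a name like 'no_header_numpy:10' that stores the dtype's list index."""
    return f"{prefix}:{type_list.index(dtype_name)}"


def decode_dtype(name: str, type_list: list) -> str:
    """Recover a dtype name from the index stored after the last ':'."""
    return type_list[int(name.rsplit(":", 1)[1])]


# Assumed lists: LIST_B has one extra integer alias near the front, as can
# happen when the list is derived from a platform-dependent registry.
LIST_A = ["int8", "int16", "int32", "int64",
          "uint8", "uint16", "uint32", "uint64",
          "float16", "float32", "float64",
          "complex64", "complex128"]
LIST_B = ["int8", "int16", "intc", "int32", "int64",
          "uint8", "uint16", "uint32", "uint64",
          "float16", "float32", "float64",
          "complex64", "complex128"]

name_a = encode_name("no_header_numpy", LIST_A, "float64")  # 'no_header_numpy:10'
name_b = encode_name("no_header_numpy", LIST_B, "float64")  # 'no_header_numpy:11'
```

Decoding `name_b` against `LIST_A` silently yields `complex64` instead of `float64`, which is why the index-to-dtype table has to be identical on every platform.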


tchaton commented Jul 15, 2024

Hey @weiji14. Thanks for your contribution ;)

Yes, it seems numpy has a different behaviour on Windows. I will look into it.

BTW, I can also see you are a contributor to https://zen3geo.readthedocs.io/en/latest/. This is such a beautiful library ;)

Now I am curious, what's your interest in LitData?

Best,
T.C


weiji14 commented Jul 15, 2024

Cool, I don't have access to a Windows computer, so this will be tricky to debug on my end. The best I can do is push lots of commits and figure things out on GitHub Actions 🙂

And thanks for the compliment on zen3geo 😄 I've been meaning to update it, and have ideas of expanding some features, but am getting sucked into more low-level data pipeline stuff in Rust recently.

My interest in litdata started when my teammates were looking into it at Clay-foundation/model#169, because we're fans of Lightning. So I packaged it on conda-forge, and now I'm stuck maintaining the package at https://github.com/conda-forge/litdata-feedstock (same with a few other Lightning packages) 😆 The NumPy 2.0 compat is just something I'd like to get in to clear the backlog of updates at https://github.com/conda-forge/litdata-feedstock/pulls (not a big fan of upper pins).

tchaton merged commit 5d6b8f9 into Lightning-AI:main Jul 16, 2024

tchaton commented Jul 16, 2024

Thanks for your contribution @weiji14

weiji14 deleted the numpy-2.0-compat branch July 16, 2024 11:51
weiji14 added a commit to regro-cf-autotick-bot/litdata-feedstock that referenced this pull request Jul 20, 2024
Adding a regression test from Lightning-AI/litData#230. Should fail with `AttributeError: `np.sctypes` was removed in the NumPy 2.0 release` until patched.
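The `File "<string>", line 1` frame in the traceback at the top of this PR suggests the failure surfaces with a one-line import such as `python -c "import litdata"`. A minimal sketch of that kind of regression check, run in a fresh interpreter; `import_smoke_test` and `assert_imports_cleanly` are hypothetical helpers, and the test actually added to the feedstock may look different.

```python
import subprocess
import sys


def import_smoke_test(module: str) -> subprocess.CompletedProcess:
    """Import `module` in a fresh interpreter, like `python -c "import <module>"`."""
    return subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=False,  # inspect the return code instead of raising
    )


def assert_imports_cleanly(module: str) -> None:
    """Raise with the child's stderr (e.g. an AttributeError traceback) if the import fails."""
    result = import_smoke_test(module)
    if result.returncode != 0:
        raise AssertionError(f"import {module} failed:\n{result.stderr}")
```

With NumPy 2.x installed, `assert_imports_cleanly("litdata")` would be expected to raise on an unpatched release and pass once the fix is in.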
weiji14 added a commit to regro-cf-autotick-bot/litdata-feedstock that referenced this pull request Jul 20, 2024
Patch from Lightning-AI/litData#230 on the requirements.txt and src/litdata/constants.py files.
weiji14 added a commit to conda-forge/litdata-feedstock that referenced this pull request Jul 20, 2024
* updated v0.2.16

* MNT: Re-rendered with conda-build 24.5.1, conda-smithy 3.36.2, and conda-forge-pinning 2024.07.11.08.43.30

* Remove numpy <2.0.0 upper pin and add regression test

Adding a regression test from Lightning-AI/litData#230. Should fail with `AttributeError: `np.sctypes` was removed in the NumPy 2.0 release` until patched.

* MNT: Re-rendered with conda-build 24.5.1, conda-smithy 3.37.1, and conda-forge-pinning 2024.07.15.01.22.34

* Add patch for NumPy 2.0 compatibility

Patch from Lightning-AI/litData#230 on the requirements.txt and src/litdata/constants.py files.

---------

Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>
weiji14 added a commit to regro-cf-autotick-bot/litdata-feedstock that referenced this pull request Jul 22, 2024
weiji14 added a commit to conda-forge/litdata-feedstock that referenced this pull request Jul 22, 2024
* updated v0.2.17

* Remove NumPy 2.0 compatibility patch

Included in litdata=0.2.17, xref Lightning-AI/litData#230

---------

Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>


Development

Successfully merging this pull request may close these issues.

AttributeError: np.sctypes was removed in the NumPy 2.0 release.

2 participants