Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

python3Packages.tokenizers: init at 0.8.0 #91342

Merged
merged 1 commit into from Jul 2, 2020
Merged

Conversation

@danieldk
Copy link
Member

danieldk commented Jun 23, 2020

Motivation for this change

I wanted to update the transformers Python package. However, newer versions require the tokenizers Python package. This package is quite interesting by itself:

  • It main package is implemented in Rust.
  • It also contains some Python files.
  • It uses the Rust pyo3 crate for binding Python.
  • Rust pyo3 needs Rust nightly.
  • We can't just use RUSTC_BOOTSTRAP, since pyo3s build script checks the actual version.

So, to actually make this work I chose to:

  • Use buildRustPackage as the main driver, so that Rust crates are properly vendored.
  • Use maturin to build a Python wheel from the combined Rust + Python sources.
  • Use pip to install the wheel in the right location.

I had to patch the build script of the vendored pyo3 to remove the explicit Rust version check or it is not happy with the RUSTC_BOOTSTRAP cheat code.

As icing on the 馃嵃, the tests attempt to download some vocabularies. So, we have to put them in place ourselves for purity.

I haven't been able to test this on macOS yet, it will probably need Security. Builds without issues on my MacBook.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.
@danieldk
Copy link
Member Author

danieldk commented Jun 23, 2020

@GrahamcOfBorg build python3Packages.tokenizers

@danieldk
Copy link
Member Author

danieldk commented Jun 24, 2020

Also cc @pashashocky , since they are the maintainer of the tranformers derivation.

@danieldk danieldk force-pushed the danieldk:tokenizers branch from 21a0623 to b8af6af Jun 25, 2020
@danieldk danieldk requested a review from jonringer Jun 25, 2020
Copy link
Contributor

jonringer left a comment

I'm not sure how to improve upon expression, so LGTM
able to be imported fine from python

https://github.com/NixOS/nixpkgs/pull/91342
2 packages built:
python37Packages.tokenizers python38Packages.tokenizers

still would like to see @FRidh 's thoughts on this

@danieldk danieldk force-pushed the danieldk:tokenizers branch from b8af6af to daa5b02 Jun 26, 2020
@danieldk
Copy link
Member Author

danieldk commented Jun 26, 2020

Added one more substituteInPlace to ensure that the wheel gets the correct name (tokenizers and not tokenizers-python).

@danieldk
Copy link
Member Author

danieldk commented Jun 28, 2020

@FRidh does this look good to you? Would be nice to be able to update the transformers package, since it's trailing behind quite a bit.

@danieldk danieldk force-pushed the danieldk:tokenizers branch from daa5b02 to 110c05a Jul 1, 2020
@danieldk danieldk changed the title python3Packages.tokenizers: init at 0.7.0 python3Packages.tokenizers: init at 0.8.0 Jul 1, 2020
@danieldk
Copy link
Member Author

danieldk commented Jul 1, 2020

Huggingface transformers 3.0.0 is out, together with tokenizers 0.8.0. So, I have updated this PR to 0.8.0.

@danieldk danieldk mentioned this pull request Jul 1, 2020
4 of 10 tasks complete
@drewrisinger
Copy link
Contributor

drewrisinger commented Jul 1, 2020

I had a similar issue when packaging python3Packages.retworkx which uses pyo3 #84945, I'm honestly amazed that you got this to work at all. My solution was to use a pre-compiled wheel, which looks like it might be an option here? https://pypi.org/project/tokenizers/#files

Copy link
Contributor

drewrisinger left a comment

I suggest moving the package disable to the tokenizers file, in keeping with the style of the python-packages.nix file.

pkgs/top-level/python-packages.nix Outdated Show resolved Hide resolved
@danieldk
Copy link
Member Author

danieldk commented Jul 2, 2020

I had a similar issue when packaging python3Packages.retworkx which uses pyo3 #84945, I'm honestly amazed that you got this to work at all. My solution was to use a pre-compiled wheel, which looks like it might be an option here? https://pypi.org/project/tokenizers/#files

I find it highly preferable to build from source if possible. Also, the derivation can be simplified once tokenizers switches to pyo3 0.11.0, since it does not require Rust nightly.

@danieldk danieldk requested a review from drewrisinger Jul 2, 2020
@danieldk danieldk force-pushed the danieldk:tokenizers branch from 110c05a to cfe1d49 Jul 2, 2020
@drewrisinger
Copy link
Contributor

drewrisinger commented Jul 2, 2020

I find it highly preferable to build from source if possible. Also, the derivation can be simplified once tokenizers switches to pyo3 0.11.0, since it does not require Rust nightly.

Agree about build from source. I didn't realize that the path of building via buildRustPlatform was an option (or worth my time), so thanks for demonstrating that :) 馃帀 . Also, good to know about pyo3 0.11.0 & non-nightly. When I looked into it ~1-2 months ago, it wasn't an option, so glad to hear things are progressing. I'll probably update python3Packages.retworkx then in the future to use something like this or pyo3.

Copy link
Contributor

drewrisinger left a comment

  • Diff LGTM (except minor comments below)
  • Commits LGTM
  • Builds clean
  • Tests clean with ~ python; import tokenizers
@danieldk danieldk force-pushed the danieldk:tokenizers branch from cfe1d49 to cb1d221 Jul 2, 2020
Copy link
Member

FRidh left a comment

Looks good to me. You may want to use buildRustPackage to build the wheel, and then pass that wheel to buildPythonPackage, but there is no need for that.

pkgs/top-level/python-packages.nix Outdated Show resolved Hide resolved
@danieldk danieldk force-pushed the danieldk:tokenizers branch 2 times, most recently from 02eafdd to e4d16bb Jul 2, 2020
@danieldk danieldk requested a review from drewrisinger Jul 2, 2020
@danieldk danieldk force-pushed the danieldk:tokenizers branch from e4d16bb to bdc2b9c Jul 2, 2020
@danieldk danieldk force-pushed the danieldk:tokenizers branch from bdc2b9c to 0a49a09 Jul 2, 2020
@danieldk danieldk requested a review from FRidh Jul 2, 2020
Copy link
Contributor

drewrisinger left a comment

(repeat of above checklist)

  • Diff LGTM
  • Commits LGTM
  • Builds clean
  • Tests clean with ~ python; import tokenizers
@FRidh FRidh merged commit 66d0b2a into NixOS:master Jul 2, 2020
14 checks passed
14 checks passed
Evaluation Performance Report Evaluator Performance Report
Details
grahamcofborg-eval ^.^!
Details
grahamcofborg-eval-check-maintainers matching changed paths to changed attrs...
Details
grahamcofborg-eval-check-meta config.nix: checkMeta = true
Details
grahamcofborg-eval-darwin nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="0a49a09"; rev="0a49a0904a06553c532bf25a4ad01f7df7b6e25c"; } ./pkgs/t
Details
grahamcofborg-eval-lib-tests nix-build --arg pkgs import ./. {} ./lib/tests/release.nix
Details
grahamcofborg-eval-nixos nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="0a49a09"; rev="0a49a0904a06553c532bf25a4ad01f7df7b6e25c"; } ./nixos/
Details
grahamcofborg-eval-nixos-manual nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="0a49a09"; rev="0a49a0904a06553c532bf25a4ad01f7df7b6e25c"; } ./nixos/
Details
grahamcofborg-eval-nixos-options nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="0a49a09"; rev="0a49a0904a06553c532bf25a4ad01f7df7b6e25c"; } ./nixos/
Details
grahamcofborg-eval-nixpkgs-manual nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="0a49a09"; rev="0a49a0904a06553c532bf25a4ad01f7df7b6e25c"; } ./pkgs/t
Details
grahamcofborg-eval-nixpkgs-tarball nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="0a49a09"; rev="0a49a0904a06553c532bf25a4ad01f7df7b6e25c"; } ./pkgs/t
Details
grahamcofborg-eval-nixpkgs-unstable-jobset nix-instantiate --arg nixpkgs { outPath=./.; revCount=999999; shortRev="0a49a09"; rev="0a49a0904a06553c532bf25a4ad01f7df7b6e25c"; } ./pkgs/t
Details
grahamcofborg-eval-package-list nix-env -qa --json --file .
Details
grahamcofborg-eval-package-list-no-aliases nix-env -qa --json --file . --arg config { allowAliases = false; }
Details
@FRidh
Copy link
Member

FRidh commented Jul 2, 2020

Thank you for doing these explicit approvals, it makes it so much easier to filter on what may be merged and what not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

4 participants
You can鈥檛 perform that action at this time.