[bump] pyfg 1.0.4 -> 1.0.5; doc updates and TokenizeFeature fix by tiankongdeguiji · Pull Request #489 · alibaba/TorchEasyRec

tiankongdeguiji · 2026-04-28T06:31:42Z

Summary

- Bump the pyfg pin from 1.0.4 to 1.0.5 (cp310/cp311/cp312 wheels) in requirements/runtime.txt.
- Update docs/source/feature/feature.md to reflect what is actually usable in TorchEasyRec with the new pyfg:
  - ExprFeature function table: add isnan (new in pyfg 1.0.5), mod, and corr; remove a duplicate sigmoid row.
  - CombineFeature / LookupFeature combiner enums: extend to count / avg / gap_min / gap_max (in addition to sum / mean / min / max).
  - MatchFeature nested_map: note that MAP<K, string> columns are accepted directly (parallel to the existing LookupFeature wording).
- Fix tzrec/features/tokenize_feature.py: omit output_delim from the inner tokenize-feature config in the grouped-sequence path. pyfg 1.0.5 expects the inner feature to emit per-token outputs and leaves delimiting to the surrounding sequence wrapper; with output_delim set, parsing fails with "sparse sequence feature internal error" (the pyfg native logs report "no bucketize config").
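For readers unfamiliar with the config shape, the TokenizeFeature change can be pictured with a minimal sketch. The helper names and most dictionary keys below are illustrative assumptions rather than the exact schema produced by TokenizeFeature._fg_json; only the output_delim key and the idea of leaving it out come from this PR.

```python
from typing import Any, Dict


def build_tokenize_fg_config(
    feature_name: str, expression: str, vocab_file: str
) -> Dict[str, Any]:
    """Build a simplified inner tokenize-feature config.

    Hypothetical stand-in for the dict built in tokenize_feature.py;
    the key names are assumptions made for illustration.
    """
    return {
        "feature_type": "tokenize_feature",
        "feature_name": feature_name,
        "expression": expression,
        "vocab_file": vocab_file,
        # "output_delim" is deliberately absent: per the PR, pyfg 1.0.5
        # expects per-token outputs here and the sequence wrapper
        # handles delimiting.
    }


def strip_output_delim(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an older-style config without output_delim."""
    return {k: v for k, v in config.items() if k != "output_delim"}


# An older-style config that still carries a delimiter (value is illustrative).
legacy = dict(
    build_tokenize_fg_config("title_tokens", "item:title", "vocab.json"),
    output_delim="\x03",
)
migrated = strip_output_delim(legacy)
```

Here migrated equals the config built without the key, which is the shape the updated expected dicts in the tests would need to match.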

Doc additions verified live against pyfg 1.0.5 via pyfg.FgArrowHandler smoke tests (isnan / mod / corr / all 8 combiner options including gap_min, gap_max).

Skipped (not exposed in TorchEasyRec proto/code today, so documenting them would mislead users): SliceFeature, BM25Feature, StringReplace, TokenizeFeature bucketization.

Test plan

- pytest tzrec/features/: 242 passed (previously 3 SequenceTokenizeFeature regressions on 1.0.5; now fixed).
- pytest tzrec/datasets/ tzrec/utils/ --ignore=tzrec/utils/faiss_util_test.py: 120 passed, 23 skipped (faiss optional).
- Targeted: expr_feature_test, combine_feature_test, lookup_feature_test, match_feature_test: 65/65 pass.
- Smoke: pyfg.FgArrowHandler accepts isnan(x), mod(x,y), corr(a,b), and every documented combiner.
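To rerun the two pytest invocations above in one go, a small driver along these lines could be used. It is only a sketch: the run_all helper and the injectable runner parameter are hypothetical conveniences, and the commands assume pytest is installed and the working directory is the repository root.

```python
import subprocess
import sys
from typing import Callable, List

# The two pytest invocations listed in the test plan.
TEST_COMMANDS: List[List[str]] = [
    ["pytest", "tzrec/features/"],
    [
        "pytest",
        "tzrec/datasets/",
        "tzrec/utils/",
        "--ignore=tzrec/utils/faiss_util_test.py",
    ],
]


def run_all(commands: List[List[str]], runner: Callable = subprocess.run) -> int:
    """Run each command in order; stop at the first failure.

    Returns the first non-zero exit code, or 0 if every command passed.
    The runner argument exists so the loop can be exercised without
    actually launching pytest.
    """
    for cmd in commands:
        # "python -m pytest ..." keeps the run tied to the current interpreter.
        result = runner([sys.executable, "-m", *cmd])
        if result.returncode != 0:
            return result.returncode
    return 0
```

Calling sys.exit(run_all(TEST_COMMANDS)) from a script would propagate the first failing exit code to the shell.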

- requirements/runtime.txt: bump pyfg pin (cp310/cp311/cp312 wheels)
- docs/feature.md: add ExprFeature isnan (new in 1.0.5), mod, corr; drop duplicate sigmoid
- docs/feature.md: extend CombineFeature/LookupFeature combiner enum with count/avg/gap_min/gap_max
- docs/feature.md: note MatchFeature MAP<K, string> input support
- tzrec/features/tokenize_feature.py: omit output_delim in grouped-sequence path; pyfg 1.0.5 expects the inner tokenize feature to emit per-token outputs and rejects output_delim there
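Since the pin refers to cp310/cp311/cp312 wheels, an environment check before installing might look like the sketch below. Treating those three tags as the complete set of available wheels is an assumption based solely on this PR text, and the helper names are hypothetical.

```python
import sys
from typing import Optional, Tuple

# CPython tags named in the PR; assumed here to be the full set.
SUPPORTED_TAGS = ("cp310", "cp311", "cp312")


def cpython_tag(version: Tuple[int, int]) -> str:
    """Map a (major, minor) version to a CPython wheel tag such as cp311."""
    major, minor = version
    return f"cp{major}{minor}"


def has_pyfg_wheel(version: Optional[Tuple[int, int]] = None) -> bool:
    """Check whether the given (or running) interpreter matches a listed tag."""
    if version is None:
        version = (sys.version_info.major, sys.version_info.minor)
    return cpython_tag(version) in SUPPORTED_TAGS
```

Calling has_pyfg_wheel() with no argument checks the interpreter that is currently running.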

tiankongdeguiji added 3 commits April 28, 2026 14:30

[refactor] drop tokenize_feature output_delim unconditionally

f2e62f1

Standalone TokenizeFeature parses fine without output_delim too, so the grouped-sequence branch is unnecessary. Simplifies the previous commit.

[test] drop output_delim from tokenize_feature expected fg_json

0b6572f

Follow-up to dropping output_delim from TokenizeFeature._fg_json — update the expected dicts in feature_test.test_create_fg_json{,_remove_bucketizer} so they match the new output. Caught by CI on PR alibaba#489.

chengaofei approved these changes Apr 29, 2026

tiankongdeguiji merged commit 82481cd into alibaba:master Apr 29, 2026
6 of 8 checks passed

tiankongdeguiji merged 3 commits into alibaba:master from tiankongdeguiji:ty/bump-pyfg-1.0.5
