
[bump] pyfg 1.0.4 -> 1.0.5; doc updates and TokenizeFeature fix#489

Merged: tiankongdeguiji merged 3 commits into alibaba:master from tiankongdeguiji:ty/bump-pyfg-1.0.5 (Apr 29, 2026)

@tiankongdeguiji
Collaborator

Summary

  • Bump pyfg pin from 1.0.4 to 1.0.5 (cp310/cp311/cp312 wheels) in requirements/runtime.txt.
  • Doc updates in docs/source/feature/feature.md to reflect what's actually usable in TorchEasyRec with the new pyfg:
    • ExprFeature function table: add isnan (new in pyfg 1.0.5), mod, corr; remove a duplicate sigmoid row.
    • CombineFeature / LookupFeature combiner enums: extended to count / avg / gap_min / gap_max (in addition to sum/mean/min/max).
    • MatchFeature nested_map: note that MAP<K, string> columns are accepted directly (parallel to existing LookupFeature wording).
  • Fix tzrec/features/tokenize_feature.py: omit output_delim from the inner tokenize-feature config in the grouped-sequence path. pyfg 1.0.5 expects the inner feature to emit per-token outputs, with the surrounding sequence wrapper handling delimiting; with output_delim set, parsing fails with a sparse sequence feature internal error ("no bucketize config" appears in the pyfg native logs).
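
The shape of the TokenizeFeature change can be sketched with plain config dicts. This is a minimal illustration, not TorchEasyRec's actual _fg_json: the helper names and every key other than output_delim (feature_type, expression, vocab_file, sequence_delim, features, and so on) are assumptions chosen for readability, not the verified pyfg schema. The point it shows is that the inner tokenize config carries no output_delim, leaving delimiting to the sequence wrapper.

```python
from typing import Any, Dict, List


def tokenize_fg_config(name: str, expression: str, vocab_file: str) -> Dict[str, Any]:
    """Hypothetical inner tokenize-feature config (keys are illustrative)."""
    return {
        "feature_type": "tokenize_feature",
        "feature_name": name,
        "expression": expression,
        "vocab_file": vocab_file,
        # "output_delim" is deliberately omitted: the inner feature emits
        # per-token outputs, and with the delimiter set pyfg 1.0.5 fails to parse.
    }


def grouped_sequence_fg_config(
    sequence_name: str, sequence_delim: str, inner: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Hypothetical sequence wrapper that owns delimiting for its sub-features."""
    return {
        "feature_type": "sequence_feature",
        "sequence_name": sequence_name,
        "sequence_delim": sequence_delim,
        "features": inner,
    }


# Usage: wrap one tokenize feature in a grouped sequence (all names hypothetical).
inner = tokenize_fg_config("title_tokens", "item:title", "tokenizer.json")
seq = grouped_sequence_fg_config("click_seq", ";", [inner])
```

Keeping the delimiter only on the wrapper means there is a single place that decides how sequence elements are split.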

Doc additions verified live against pyfg 1.0.5 via pyfg.FgArrowHandler smoke tests (isnan / mod / corr / all 8 combiner options including gap_min, gap_max).
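
With the combiner enum now documented as eight options, a config generator could reject typos before they reach pyfg. The sketch below is hypothetical (validate_combiner and the dict layout in the usage line are illustrative, not TorchEasyRec code); it encodes only the option list from the doc update and makes no claim about how each combiner computes its result.

```python
# The eight combiner options documented for CombineFeature / LookupFeature.
SUPPORTED_COMBINERS = frozenset(
    {"sum", "mean", "min", "max", "count", "avg", "gap_min", "gap_max"}
)


def validate_combiner(combiner: str) -> str:
    """Return the combiner unchanged if documented, otherwise raise ValueError."""
    if combiner not in SUPPORTED_COMBINERS:
        raise ValueError(
            f"unsupported combiner {combiner!r}; "
            f"expected one of {sorted(SUPPORTED_COMBINERS)}"
        )
    return combiner


# Usage with a hypothetical config dict layout.
lookup_cfg = {"feature_type": "lookup_feature", "combiner": validate_combiner("gap_max")}
```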

Skipped (not exposed in TorchEasyRec proto/code today, so documenting them would mislead users): SliceFeature, BM25Feature, StringReplace, TokenizeFeature bucketization.

Test plan

  • pytest tzrec/features/ — 242 passed (previously 3 SequenceTokenizeFeature regressions on 1.0.5; now fixed).
  • pytest tzrec/datasets/ tzrec/utils/ --ignore=tzrec/utils/faiss_util_test.py — 120 passed, 23 skipped (faiss optional).
  • Targeted: expr_feature_test, combine_feature_test, lookup_feature_test, match_feature_test — 65/65 pass.
  • Smoke: pyfg.FgArrowHandler accepts isnan(x), mod(x,y), corr(a,b), and every documented combiner.
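
Because isnan is new in pyfg 1.0.5, code that emits it may want to check the installed version first. A minimal stdlib-only sketch follows; it assumes the installed distribution is named pyfg and that its version string is plain X.Y.Z (a non-numeric part such as "rc1" raises ValueError rather than being guessed at). supports_isnan and parse_version are hypothetical helpers, not part of pyfg or TorchEasyRec.

```python
from importlib import metadata
from typing import Optional, Tuple

# First pyfg release whose ExprFeature supports isnan (per this PR).
MIN_ISNAN_VERSION: Tuple[int, int, int] = (1, 0, 5)


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain dotted version like "1.0.5" into (1, 0, 5)."""
    return tuple(int(part) for part in version.split("."))


def supports_isnan(version: Optional[str] = None) -> bool:
    """True if the given (or installed) pyfg version is at least 1.0.5."""
    if version is None:
        try:
            # Assumes the distribution is published under the name "pyfg".
            version = metadata.version("pyfg")
        except metadata.PackageNotFoundError:
            return False
    return parse_version(version) >= MIN_ISNAN_VERSION
```

Tuple comparison keeps the check simple; a project that already depends on the packaging library could use packaging.version.Version instead to handle pre-release suffixes.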

- requirements/runtime.txt: bump pyfg pin (cp310/cp311/cp312 wheels)
- docs/feature.md: add ExprFeature isnan (new in 1.0.5), mod, corr; drop duplicate sigmoid
- docs/feature.md: extend CombineFeature/LookupFeature combiner enum with count/avg/gap_min/gap_max
- docs/feature.md: note MatchFeature MAP<K, string> input support
- tzrec/features/tokenize_feature.py: omit output_delim in grouped-sequence path; pyfg 1.0.5 expects
  the inner tokenize feature to emit per-token outputs and rejects output_delim there

Standalone TokenizeFeature parses fine without output_delim too, so the
grouped-sequence branch is unnecessary. Simplifies the previous commit.

Follow-up to dropping output_delim from TokenizeFeature._fg_json — update
the expected dicts in feature_test.test_create_fg_json{,_remove_bucketizer}
so they match the new output. Caught by CI on PR alibaba#489.
tiankongdeguiji merged commit 82481cd into alibaba:master on Apr 29, 2026
6 of 8 checks passed