Calculate pandas categorical hash once. #12

glogiotatidis · 2025-06-05T10:17:58Z

Build the pandas_categorical index hash when loading the model, instead of doing it on every predict().

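Doing it on every predict() call slows prediction down significantly, especially for large categorical feature lists, and wastes CPU cycles, since the hash only needs to be calculated once. In the benchmark below, 10,000 predictions drop from about 15.5 seconds to about 0.39 seconds.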
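
To illustrate the idea, here is a minimal sketch of building the lookup hashes once and reusing them on every call. It is not the gem's actual code: `CategoricalEncoder`, `encode`, and `@index_maps` are hypothetical names, and mapping unknown values to nil is an assumption made only for this example.

```ruby
# Hypothetical stand-in for the booster; names are illustrative only.
class CategoricalEncoder
  def initialize(pandas_categorical)
    # Built once, at load time: one { category value => index } hash per feature.
    @index_maps = pandas_categorical.map do |categories|
      categories.each_with_index.to_h
    end
  end

  # Encodes the categorical values of one row using the cached hashes.
  # Unknown values come back as nil here (an assumption of this sketch).
  def encode(values)
    values.each_with_index.map { |value, i| @index_maps[i][value] }
  end
end

encoder = CategoricalEncoder.new(
  [["cat0", "cat1", "cat9"], ["100.0.4262", "100.8.6448"]]
)
encoded = encoder.encode(["cat9", "100.8.6448"])
```

Each lookup is then a constant-time hash access, instead of rebuilding a hash over every category list on each prediction.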

After change:

Number of categorical features: 3
Biggest category length: 11945
Predict for 10.000 times: 0.3876569999847561 seconds

Before change:

Number of categorical features: 3
Biggest category length: 11945
Predict for 10.000 times: 15.50964699999895 seconds

Benchmark code:

    require "benchmark"
    require "lightgbm"

    time = Benchmark.measure do
      x_test = [[3.7, 1.2, "cat9", "100.8.6448"], [7.5, 0.5, "cat0", "100.0.4262"]]
      booster = LightGBM::Booster.new(model_file: "test/support/categorical.txt")
      # Read the category lists loaded from the model file, for reporting only
      pandas_categorical = booster.instance_variable_get(:@pandas_categorical)
      puts "Number of categorical features: #{pandas_categorical.length}"
      puts "Biggest category length: #{pandas_categorical.max_by { |category| category.length }.length}"
      # Repeat the prediction to make the per-call overhead visible
      10000.times do
        booster.predict(x_test)
      end
    end
    puts "Benchmark cached: #{time.real} seconds"

Also added a large categorical feature, with almost 12k entries, to the test model.

nunosilva800

LGTM!

glogiotatidis · 2025-06-05T10:48:25Z

@ankane hey! could you please review this and cut a release if you think it's good to go?

ankane · 2025-06-06T02:22:36Z

Thanks @glogiotatidis, nice find!

glogiotatidis · 2025-06-06T08:00:15Z

Thanks for merging @ankane. Do you plan to release a patch version soon with this?

ankane · 2025-06-06T20:02:03Z

It'll probably be a few weeks.

Calculate pandas categorical hash once.

1e5b98b

nunosilva800 approved these changes Jun 5, 2025

ankane merged commit 471888e into ankane:master Jun 6, 2025

glogiotatidis deleted the precalculate-categorical-features-map branch June 6, 2025 08:00
