@glogiotatidis
Contributor

Build the pandas_categorical index hash once, when the model is loaded, instead of rebuilding it on every predict() call.

Rebuilding it on each predict() call slows prediction significantly, especially for large categorical feature lists, and wastes CPU cycles because the hash only needs to be computed once. The benchmark below runs about 40x faster after the change.
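
To illustrate the idea, here is a minimal sketch of the two approaches. It is not the gem's actual code: `CachedCategoryEncoder`, `encode`, and `encode_uncached` are hypothetical names, and returning `nil` for an unknown value is an assumption made only for this example.

```ruby
# Hypothetical stand-in for a model wrapper; not the gem's actual code.
class CachedCategoryEncoder
  def initialize(pandas_categorical)
    # Built once at load time: one { value => index } hash per categorical feature.
    @category_index = pandas_categorical.map do |categories|
      categories.each_with_index.to_h
    end
  end

  # Looks up the integer code for a value of the given categorical feature.
  # Returns nil for an unknown value (an assumption of this sketch).
  def encode(feature, value)
    @category_index[feature][value]
  end
end

# The slow variant: rebuilds the whole hash on every call,
# so each lookup costs time proportional to the category list length.
def encode_uncached(pandas_categorical, feature, value)
  pandas_categorical[feature].each_with_index.to_h[value]
end

categories = [["cat0", "cat1", "cat9"], ["100.0.4262", "100.8.6448"]]
encoder = CachedCategoryEncoder.new(categories)
code = encoder.encode(0, "cat9")
```

Both variants return the same codes; the cached one pays the cost of building the hashes once instead of once per lookup, which matters most when a category list has thousands of entries.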

After change:

Number of categorical features: 3
Biggest category length: 11945
Predict for 10.000 times: 0.3876569999847561 seconds

Before change:

Number of categorical features: 3
Biggest category length: 11945
Predict for 10.000 times: 15.50964699999895 seconds

Benchmark code:

    require "benchmark"
    require "lightgbm"

    # Times model loading plus 10,000 predict calls on two rows with categorical values.
    time = Benchmark.measure do
      x_test = [[3.7, 1.2, "cat9", "100.8.6448"], [7.5, 0.5, "cat0", "100.0.4262"]]
      booster = LightGBM::Booster.new(model_file: "test/support/categorical.txt")
      # Read the parsed category lists directly, only to report their sizes.
      pandas_categorical = booster.instance_variable_get(:@pandas_categorical)
      puts "Number of categorical features: #{pandas_categorical.length}"
      puts "Biggest category length: #{pandas_categorical.map(&:length).max}"
      10_000.times do
        booster.predict(x_test)
      end
    end
    puts "Benchmark cached: #{time.real} seconds"

Added a large categorical feature, with almost 12k entries, to the test model.

Contributor

@nunosilva800 left a comment

LGTM!

@glogiotatidis
Contributor Author

@ankane hey! could you please review this and cut a release if you think it's good to go?

@ankane merged commit 471888e into ankane:master on Jun 6, 2025
@ankane
Owner

ankane commented Jun 6, 2025

Thanks @glogiotatidis, nice find!

@glogiotatidis
Contributor Author

Thanks for merging @ankane. Do you plan to release a patch version soon with this?

@glogiotatidis deleted the precalculate-categorical-features-map branch on June 6, 2025 08:00
@ankane
Owner

ankane commented Jun 6, 2025

It'll probably be a few weeks.
