
[RMP] Support Tree Ranking Models (like XGBoost) in Merlin Models and Systems #105

Closed · 3 of 4 tasks
viswa-nvidia opened this issue Feb 17, 2022 · 5 comments

viswa-nvidia commented Feb 17, 2022

Problem

Gradient-boosted decision trees (GBDTs) are commonly used in industry as part of the scoring phase of recommender systems. Supporting serving of these models and integrating them with the Merlin ecosystem will make them easier to use in such systems.

The Triton Inference Server provides the FIL (Forest Inference Library) backend for GPU-accelerated serving of these models.

Random forests (RF) and gradient-boosted decision trees (GBDTs) have become workhorse models of applied machine learning. XGBoost and LightGBM, popular packages implementing GBDT models, consistently rank among the most commonly used tools by data scientists on the Kaggle platform. We see similar interest in forest-based models in industry, where they are applied to problems ranging from inventory forecasting, to ad ranking, to medical diagnostics.

See: “RAPIDS Forest Inference Library: Prediction at 100 million rows per second” (RAPIDS blog post).

Goals

  • Enable the use of tree-based models (e.g. GBDTs, random forests) in a Merlin Systems ensemble.
  • Support training XGBoost models from a Merlin Dataset.

Constraints

Starting Point

Merlin-models (Data Scientist)

NVTabular (Data Scientist)

  • [NA] Operators for batch prediction with these models
  • Note: Batch prediction is not in scope for this development

Merlin-systems (Product Engineer)

Examples and Docs (Everyone)

Aha! Link: https://nvaiinfa.aha.io/features/MERLIN-828

karlhigley changed the title from “[RMP] Tree Models” to “[RMP] Tree Ranking Models” on Feb 18, 2022
viswa-nvidia (Author) commented:

@benfred, could you map your XGBoost tickets here?

benfred added this to the Merlin release 22.04 milestone on Feb 22, 2022
karlhigley changed the title from “[RMP] Tree Ranking Models” to “[RMP] Tree Ranking Models (like XGBoost)” on Feb 24, 2022
viswa-nvidia changed the title from “[RMP] Tree Ranking Models (like XGBoost)” to “22” on Feb 28, 2022
viswa-nvidia changed the title from “22” back to “[RMP] Tree Ranking Models (like XGBoost)” on Mar 1, 2022
karlhigley removed this from the Merlin 22.05 milestone on Apr 6, 2022
EvenOldridge added this to the Merlin 22.06 milestone on Apr 6, 2022
karlhigley assigned marcromeyn and unassigned benfred on May 2, 2022
karlhigley (Contributor) commented:

@marcromeyn Re-assigned this to you for the time being, since it sounds like you'll be working with our new hire on both the Models and Systems sides. Once he's in our GitHub org, we might make him the lead on this, though.

karlhigley changed the title from “[RMP] Tree Ranking Models (like XGBoost)” to “[RMP] Support Tree Ranking Models (like XGBoost) in Merlin Models and Systems” on May 20, 2022
oliverholworthy (Member) commented:

An update on this issue.

Merlin Models

We have a first version of the XGBoost API in Merlin Models as of release 22.06. Once we have an example, that part can be considered complete for the purposes of this issue, with ongoing feature improvements handled via other issues. A minimal sketch of the training flow follows.
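
As a rough illustration (not the definitive API), training from a Merlin Dataset looks something like the sketch below. It assumes the `merlin.models.xgb.XGBoost` wrapper from 22.06 and parquet data whose schema identifies the target column; exact parameter names may differ between releases.

```python
# Hedged sketch of training an XGBoost model from a Merlin Dataset.
from merlin.io import Dataset
from merlin.models.xgb import XGBoost

train = Dataset("train/*.parquet")
valid = Dataset("valid/*.parquet")

# The wrapper reads feature/target columns from the dataset schema;
# extra keyword arguments are passed through to XGBoost.
model = XGBoost(schema=train.schema, objective="binary:logistic")
model.fit(train)

metrics = model.evaluate(valid)     # XGBoost evaluation metrics
predictions = model.predict(valid)  # per-row scores
```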

Merlin Systems

Operators have been added to Merlin Systems that allow XGBoost, LightGBM, scikit-learn (random forest), and cuML (random forest) models to be added as part of a serving ensemble in Triton. There are a couple of small issues being addressed (operator outputs) to make this usable from the next release, 22.07. A sketch of the intended usage follows.
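
For illustration, wiring a trained model into a Triton ensemble looks roughly like this. The `PredictForest` operator and import paths reflect the 22.07 work described above and are an assumption about the final shape of the API; `model` and `train` carry over from the training sketch.

```python
# Hedged sketch: serving a trained forest model in a Triton ensemble
# via the FIL-backed Merlin Systems operator.
from merlin.schema import Tags
from merlin.systems.dag.ensemble import Ensemble
from merlin.systems.dag.ops.fil import PredictForest

# Serve on the feature columns only (drop the training target).
input_schema = train.schema.remove_by_tag(Tags.TARGET)
triton_chain = input_schema.column_names >> PredictForest(model, input_schema)

ensemble = Ensemble(triton_chain, input_schema)
ensemble.export("/tmp/models")  # writes a Triton model repository
```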

NVTabular

The item “NVTabular - Operators for batch prediction with these models” in the description may, I think, be out of scope for this issue. At least, it's unclear to me what the relationship with NVTabular is in this context. As for batch prediction in the XGBoost integration in Merlin Models: it is supported as part of the predict method, since we're using the Dask version of XGBoost, which can be used in a distributed setting (see the sketch below).
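
Roughly speaking, the distributed path reduces to XGBoost's Dask API, as in this illustrative sketch; `feature_cols`, `target_col`, and `params` are hypothetical placeholders, and the Merlin Models wrapper handles these details internally.

```python
# Illustrative only: the distributed batch prediction that the
# predict method relies on, via XGBoost's Dask API.
import xgboost as xgb
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

client = Client(LocalCUDACluster())
ddf = valid.to_ddf()  # a Merlin Dataset exposes its Dask DataFrame

dtrain = xgb.dask.DaskDMatrix(client, ddf[feature_cols], ddf[target_col])
booster = xgb.dask.train(client, params, dtrain)["booster"]
preds = xgb.dask.predict(client, booster, ddf[feature_cols])
```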

As for a common pattern for running evaluation, which will involve batch/distributed prediction: the ongoing work in this area may be covered by #407 or #405.

viswa-nvidia (Author) commented:

@radekosmulski, could you please add the ticket tracking the examples to the top of this issue?

viswa-nvidia (Author) commented:

Closing this ticket. The example is pending and is planned for 22.08.
