Don't populate labels if label column is not specified in csv parser #679

rongou · 2023-03-06T19:23:50Z

Right now the CSV parser sets labels to 0 if the label column is not specified (or is set to -1). This is surprising to users and leads to cryptic error messages. It's probably better to leave the labels empty when no label column is specified.

For vertical federated learning, we may have workers that don't have access to the label, so this would enable them to parse CSV shards without erroneously setting labels to 0.
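
To illustrate the intended behavior, here is a minimal standalone sketch. It is not the actual dmlc-core parser code: `ParsedRows` and `ParseCsvLine` are hypothetical names made up for this example, and it assumes every field is numeric. The only part taken from the description above is the convention that a label column of -1 means "not specified", in which case the label vector is left empty instead of being filled with zeros.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for parsed rows; not dmlc-core's actual row block type.
struct ParsedRows {
  std::vector<float> labels;                 // stays empty when no label column is given
  std::vector<std::vector<float>> features;  // one feature vector per parsed row
};

// Parse one comma-separated line of numeric fields and append it to `out`.
// label_column == -1 means "no label column": every field is treated as a
// feature and nothing is appended to out->labels.
void ParseCsvLine(const std::string& line, int label_column, ParsedRows* out) {
  assert(label_column >= -1);
  std::vector<float> row;
  std::stringstream ss(line);
  std::string field;
  int column = 0;
  while (std::getline(ss, field, ',')) {
    const float value = std::stof(field);
    if (column == label_column) {
      out->labels.push_back(value);  // only populated when a label column exists
    } else {
      row.push_back(value);
    }
    ++column;
  }
  out->features.push_back(row);
}
```

With this shape, a worker that has no labels can check `labels.empty()` after parsing rather than having to tell genuine zero labels apart from placeholder values.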

rongou · 2023-03-06T19:24:02Z

@hcho3 @trivialfis

hcho3 · 2023-03-10T03:16:42Z

Merging for now. I'll try to make time to fix the CI.

Don't populate labels if label column is not specified in csv parser

d9142ba

rongou mentioned this pull request Mar 6, 2023

Vertical Federated Learning RFC dmlc/xgboost#8424

Open

simplify label handling

60fb4ca

hcho3 merged commit ea21135 into dmlc:main Mar 10, 2023
