xgboost-comprehensive with bagging aggregation #2554

yan-gao-GY · 2023-11-02T17:55:49Z

Issue

There is no easy-to-use XGBoost example with Flower.

Description

eXtreme Gradient Boosting (XGBoost) is a robust and efficient implementation of gradient-boosted decision trees (GBDT). Given its robustness and efficiency, combining XGBoost with federated learning offers a promising way to train models while protecting data privacy.

Proposal

This example demonstrates how to train XGBoost within Flower using the xgboost package on the HIGGS dataset. A tree-based bagging method is used for aggregation on the server.
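As a rough illustration of tree-based bagging aggregation, the sketch below appends each client's newly trained trees to the global ensemble. It is only a sketch: it assumes a toy dict layout (`trees`, `num_trees`) rather than XGBoost's real serialized model, and `Model` and `bagging_aggregate` are hypothetical names, not the strategy code in this PR.

```python
import copy
from typing import Any, Dict, List

# Hypothetical toy model: {"num_trees": int, "trees": [{"id": int, ...}, ...]}
Model = Dict[str, Any]


def bagging_aggregate(global_model: Model, client_models: List[Model]) -> Model:
    """Append every client's trees to a copy of the global ensemble."""
    merged = copy.deepcopy(global_model)
    for client in client_models:
        for tree in client["trees"]:
            new_tree = dict(tree)
            # Re-number the tree so ids stay unique in the merged ensemble.
            new_tree["id"] = len(merged["trees"])
            merged["trees"].append(new_tree)
    merged["num_trees"] = len(merged["trees"])
    return merged


global_model = {"num_trees": 1, "trees": [{"id": 0, "leaf": 0.1}]}
client_a = {"num_trees": 1, "trees": [{"id": 0, "leaf": 0.3}]}
client_b = {"num_trees": 1, "trees": [{"id": 0, "leaf": -0.2}]}

merged = bagging_aggregate(global_model, [client_a, client_b])
print(merged["num_trees"])  # 3
```

Working on a deep copy keeps the previous global model untouched, which makes it easy to compare rounds.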

Warning

Note that this example uses SizePartitioner for FL data partitioning, so this PR should be merged after fds-size-partitioner.

adam-narozniak · 2023-11-06T09:14:51Z

Hi @yan-gao-GY, here is my general review regarding the client and dataset.
I think that the abstractions created in dataset.py do not necessarily improve the readability of the code. I'd say it would be more readable if the code from init_higgs and load_partition were used directly in the client. We can keep the partitioner initialization in dataset.py. I'd also keep split_train_test in dataset.py, but create a separate method that converts the Dataset produced by fds into an xgb.DMatrix, because it is unexpected for that conversion to happen inside split_train_test, as it does right now.
Also, currently only the uniform partitioning method is used. Will the code allow the use of others too? If not, we can remove the choice of partitioner entirely. If yes, will that be part of, e.g., the next PR?

Here is my recommendation:

# main
from flwr_datasets import FederatedDataset

from dataset import instantiate_partitioner, train_test_split, transform_dataset_to_dmatrix

partitioner = instantiate_partitioner(partitioner_type=partitioner_type, num_partitions=num_partitions)
# Alternatively, not `partitioner_type` but `node_id_to_samples_correlation` or just `correlation`
fds = FederatedDataset(dataset="jxie/higgs", partitioners={"train": partitioner})
partition = fds.load_partition(idx=partition_id, split="train")
partition.set_format("numpy")
# split_rate is not an informative keyword to me, I'd stick to e.g. test_size or test_fraction
# I'd also drop the size returns, but I think that's more of a personal choice
train_data, valid_data = train_test_split(partition, test_size=test_size, seed=SEED)
# I'd rename _reformat_data, but it would serve the same purpose
train_dmatrix = transform_dataset_to_dmatrix(train_data)
valid_dmatrix = transform_dataset_to_dmatrix(valid_data)
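For illustration, a seeded split with a `test_size` keyword could look like the sketch below. It assumes the partition is a plain list of rows rather than a real `Dataset`, so this `train_test_split` is a hypothetical stand-in for the helper in dataset.py, not its actual implementation.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def train_test_split(
    rows: Sequence[Row], test_size: float, seed: int
) -> Tuple[List[Row], List[Row]]:
    """Shuffle rows with a fixed seed and hold out a `test_size` fraction."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    indices = list(range(len(rows)))
    # A dedicated Random instance keeps the split reproducible and
    # leaves the global random state alone.
    random.Random(seed).shuffle(indices)
    num_test = int(len(rows) * test_size)
    test_idx, train_idx = indices[:num_test], indices[num_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


# Toy partition: ten rows with one feature and a binary label.
partition = [{"inputs": [float(i)], "label": i % 2} for i in range(10)]
train_data, valid_data = train_test_split(partition, test_size=0.2, seed=42)
print(len(train_data), len(valid_data))  # 8 2
```

Returning only the two splits, without their sizes, matches the suggestion above; callers can still use `len()` when they need the counts.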

Also, I'd rename SPLIT_DICT to CORRELATION_TO_PARTITIONER or something similar, matching the parameter name chosen for instantiate_partitioner.
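A minimal sketch of that naming, assuming `correlation` is the chosen parameter name. The two stub classes are hypothetical stand-ins so that the snippet runs without flwr_datasets; the real example would map to actual partitioner classes.

```python
from dataclasses import dataclass
from typing import Dict, Type


@dataclass
class UniformPartitionerStub:
    """Stand-in: every partition gets the same number of samples."""

    num_partitions: int


@dataclass
class LinearPartitionerStub:
    """Stand-in: partition size grows linearly with the partition id."""

    num_partitions: int


# The mapping name mirrors the `correlation` parameter below.
CORRELATION_TO_PARTITIONER: Dict[str, Type] = {
    "uniform": UniformPartitionerStub,
    "linear": LinearPartitionerStub,
}


def instantiate_partitioner(correlation: str, num_partitions: int):
    """Look up the partitioner class for `correlation` and build it."""
    try:
        partitioner_cls = CORRELATION_TO_PARTITIONER[correlation]
    except KeyError:
        supported = ", ".join(sorted(CORRELATION_TO_PARTITIONER))
        raise ValueError(
            f"Unknown correlation {correlation!r}; expected one of: {supported}"
        ) from None
    return partitioner_cls(num_partitions=num_partitions)


partitioner = instantiate_partitioner(correlation="uniform", num_partitions=5)
print(type(partitioner).__name__)  # UniformPartitionerStub
```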

adam-narozniak · 2023-11-06T09:17:26Z

Also, I'd add the train and valid data as parameters to FlowerClient and then reference via self.
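Roughly, that could look like the sketch below. It is a plain class so that it runs on its own; the real FlowerClient would subclass Flower's client base class, which is omitted here, and the two helper methods are hypothetical.

```python
from typing import Any, Sequence


class FlowerClient:
    """Client that receives its data up front instead of loading it itself."""

    def __init__(self, train_data: Sequence[Any], valid_data: Sequence[Any]) -> None:
        # Keep both splits on the instance so fit/evaluate can use self.*
        self.train_data = train_data
        self.valid_data = valid_data

    def num_train_examples(self) -> int:
        # Hypothetical helper: e.g. the count reported to the server after fit.
        return len(self.train_data)

    def num_valid_examples(self) -> int:
        # Hypothetical helper: e.g. the count reported after evaluate.
        return len(self.valid_data)


client = FlowerClient(train_data=[1, 2, 3, 4], valid_data=[5])
print(client.num_train_examples(), client.num_valid_examples())  # 4 1
```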

yan-gao-GY · 2023-11-06T10:11:04Z

@adam-narozniak thanks a lot for your suggestion! i think it makes sense. i'll make changes later.

examples/quickstart-xgboost/pyproject.toml

examples/quickstart-xgboost/requirements.txt

examples/quickstart-xgboost/run.sh

examples/quickstart-xgboost/server.py

examples/quickstart-xgboost/strategy.py

examples/quickstart-xgboost/client.py

adam-narozniak · 2023-11-09T10:31:05Z

Also, one more thing. Let's make all the comments start with a capitalized letter. (I know that we don't necessarily even do full type hints in the examples, but let's make it consistent in the project)

adam-narozniak

This is inconsistent with the pyproject.toml

examples/quickstart-xgboost/requirements.txt

adam-narozniak · 2023-11-14T13:35:28Z

examples/quickstart-xgboost/client.py

+partition = fds.load_partition(idx=partition_id, split="train")
+partition.set_format("numpy")
+
+if args.centralised_eval:


Just one more question. In the case of centralized eval, each of the (federated) nodes also uses the centralized dataset for the federated evaluation. Is that intended, or is it controlled on the server?

yan-gao-GY · Nov 14, 2023

Whether centralised evaluation is performed is controlled on the server with --centralised_eval. If the server does not do centralised evaluation, the user can still choose whether client evaluation uses the centralised test set or a client test set (split off from the client's training data). For example, running client.py --centralised_eval makes the client evaluate on the centralised test set.
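The client-side choice can be sketched as follows. Only the --centralised_eval flag comes from this thread; treating it as a boolean `store_true` switch is an assumption, and `client_args_parser` and `select_eval_data` are hypothetical names.

```python
import argparse
from typing import Any, Sequence


def client_args_parser() -> argparse.ArgumentParser:
    """Build a parser with the client-side evaluation switch."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--centralised_eval",
        action="store_true",  # Assumption: a boolean switch, off by default.
        help="Evaluate on the centralised test set instead of the client split.",
    )
    return parser


def select_eval_data(
    centralised_eval: bool,
    centralised_test: Sequence[Any],
    client_valid: Sequence[Any],
) -> Sequence[Any]:
    """Pick the evaluation set according to the flag."""
    return centralised_test if centralised_eval else client_valid


# Equivalent to running: client.py --centralised_eval
args = client_args_parser().parse_args(["--centralised_eval"])
eval_data = select_eval_data(
    args.centralised_eval,
    centralised_test=["central"],
    client_valid=["local"],
)
print(eval_data)  # ['central']
```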

examples/quickstart-xgboost/README.md

examples/quickstart-xgboost/client.py

examples/quickstart-xgboost/strategy.py

examples/quickstart-xgboost/utils.py

Co-authored-by: Daniel J. Beutel <daniel@flower.dev>

examples/xgboost-comprehensive/pyproject.toml

yan-gao-GY added 3 commits November 1, 2023 20:49

Initialise XGBoost

8529119

Upload readme

fe2d3e0

Pass the number of examples to server and do formatting

7efcdbf

yan-gao-GY requested review from danieljanes and tanertopal as code owners November 2, 2023 17:55

yan-gao-GY self-assigned this Nov 2, 2023

Add flwr_datasets as required package

6ddb448

Change dataset loading structure

34452a5

yan-gao-GY mentioned this pull request Nov 6, 2023

XGBoost tutorial #2567

Merged

adam-narozniak reviewed Nov 7, 2023

examples/quickstart-xgboost/client.py (outdated, resolved)

yan-gao-GY added 3 commits November 7, 2023 15:45

Clean up env; add weighted AUC aggregation

b272f11

Replace print with log

0fb9d36

Add file description in readme

d0b33be

adam-narozniak requested changes Nov 8, 2023

examples/quickstart-xgboost/client.py (outdated, resolved)

examples/quickstart-xgboost/client.py (outdated, resolved)

yan-gao-GY added 2 commits November 8, 2023 17:55

Add arguments parser on client side; Do formatting

91af548

Update required package flwr-datasets==0.02; pull back run.sh

b965843

adam-narozniak requested changes Nov 9, 2023

examples/quickstart-xgboost/client.py (outdated, resolved)

yan-gao-GY added 2 commits November 9, 2023 11:20

Move argument parser to utils; Modify comments

6bbee08

Add feature of centralised/client evaluation

4b92c81

adam-narozniak requested changes Nov 10, 2023

adam-narozniak reviewed Nov 10, 2023

examples/quickstart-xgboost/requirements.txt (outdated, resolved)

yan-gao-GY and others added 4 commits November 12, 2023 21:54

Correct aggregation and match Navida results

57bde18

formatting

727a028

Merge branch 'main' into xgboost

237ea18

Add type hints

c91f7ad

adam-narozniak reviewed Nov 14, 2023

danieljanes and others added 3 commits November 15, 2023 11:13

Merge branch 'main' into xgboost

6973ecb

Format readme

28b7305

Merge branch 'xgboost' of https://github.com/adap/flower into xgboost

39ced8b

yan-gao-GY dismissed adam-narozniak’s stale review via 39ced8b November 15, 2023 10:22

danieljanes requested changes Nov 15, 2023

examples/quickstart-xgboost/utils.py (outdated, resolved)

examples/quickstart-xgboost/utils.py (outdated, resolved)

yan-gao-GY and others added 11 commits November 15, 2023 10:51

Update examples/quickstart-xgboost/client.py

ebe35a5

Co-authored-by: Daniel J. Beutel <daniel@flower.dev>

Update examples/quickstart-xgboost/README.md

749a960

Co-authored-by: Daniel J. Beutel <daniel@flower.dev>

Update examples/quickstart-xgboost/README.md

40d0bed

Co-authored-by: Daniel J. Beutel <daniel@flower.dev>

Update examples/quickstart-xgboost/utils.py

5cdfc37

Co-authored-by: Daniel J. Beutel <daniel@flower.dev>

Update examples/quickstart-xgboost/utils.py

c2f85c0

Co-authored-by: Daniel J. Beutel <daniel@flower.dev>

Update examples/quickstart-xgboost/strategy.py

b63a0b0

Co-authored-by: Daniel J. Beutel <daniel@flower.dev>

Format arguments parser

5484a47

Update run.sh

0ad1c43

Change strategy name to FedXgbBagging

b948294

Rename to xgboost-comprehensive

68d065e

Recover readme

b0348c9

danieljanes reviewed Nov 15, 2023

examples/xgboost-comprehensive/pyproject.toml (outdated, resolved)

danieljanes reviewed Nov 15, 2023

examples/xgboost-comprehensive/pyproject.toml (outdated, resolved)

danieljanes added 2 commits November 15, 2023 19:36

Update examples/xgboost-comprehensive/pyproject.toml

f10d4b5

Update examples/xgboost-comprehensive/pyproject.toml

a0c65eb

danieljanes reviewed Nov 15, 2023

examples/xgboost-comprehensive/pyproject.toml (outdated, resolved)

Update examples/xgboost-comprehensive/pyproject.toml

5656bb4

danieljanes approved these changes Nov 15, 2023

Merge branch 'main' into xgboost

35207f5

danieljanes enabled auto-merge (squash) November 15, 2023 18:38

danieljanes changed the title from "Quickstart-xgboost with bagging aggregation" to "xgboost-comprehensive with bagging aggregation" Nov 15, 2023

danieljanes merged commit f056175 into main Nov 15, 2023
26 checks passed

danieljanes deleted the xgboost branch November 15, 2023 18:42
