XGBoost User Experience Improvment #2616

nvidianz · 2024-06-06T14:01:21Z

Description

Made following changes to XGBoost,

Consolidate all XGBoost parameters to controller so all clients share the same parameters.
Removed the need to configure handlers. They are added automatically.
Moved plugin configuration to resources.

Types of changes

Non-breaking change (fix or new feature that would not break existing functionality).
Breaking change (fix or new feature that would cause existing functionality to change).
New tests added to cover the changes.
Quick tests passed locally by running ./runtest.sh.
In-line docstrings updated.
Documentation updated.

yanchengnv

Looks good in general. See my comments for improvement.

nvflare/app_opt/xgboost/histogram_based_v2/fed_controller.py

nvflare/app_opt/xgboost/histogram_based_v2/fed_executor.py

nvflare/app_opt/xgboost/histogram_based_v2/runners/xgb_client_runner.py

nvflare/private/fed/client/client_app_runner.py

yanchengnv

LGTM

…servers

* Fixed the simulator worker sys path. * fixed the get_new_sys_path() logic, added in unit test. * fixed isort. * Changed the _get_new_sys_path() implementation. --------- Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com>

* Moved the get_component_init_parameters to class_utils. * added docstring and unit test. --------- Co-authored-by: Holger Roth <6304754+holgerroth@users.noreply.github.com>

* WIP: auto register fobs. * WIP * Made changes to server and client fobs_initialize(). * Added auto fobs regiater. * codestyle fix. * Adjust to use register_nvflare_decomposers() for IPCAgent. * add some cleanup. * codestyle fix. * Codestyle fix. * re-design the custom Fobs register to only scan from nvflare_decomposers folder, and user defined decomposer_module. * removed no use import. * Added unit test for custom_fobs_initialize(). * Addressed the PR reviews. * changed a log warning message. * Added None handling for register_ext_decomposers. * Fixed unit test. * fixed unit test. * fix unit test. * codestyle fix. * Removed the custom decomposer regoister for SP and CP. Added the register for simulator_worker. * removed no use import. --------- Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com>

* upgrade tf example * change default * update readme * fix typo

This change fixes: - Typos in the `build_doc.sh` references. - Incorrect rendering of python code blocks on Flare API page.

wheel file

The `project.yml` generated by `nvflare provision` command has comments `change overseer.example.com to the FQDN of the overseer` and `change example.com to the FQDN of the server` but the actual values nowhere has `example.com` mentioned and it is confusing. This change just fixes comments to make it more clear. Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com>

* test pr * fix formatting issues --------- Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com>

Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com>

* Initial commit for xgboost-secure * Initial commit for xgboost-secure * Change model output path * Change data mode * Most basic xgboost process for coding * Most basic xgboost process for coding * Most basic xgboost process for coding * Most basic xgboost process for coding * First prototype for secure vertical pipeline * Phase 1 concludes * add seal pipeline in C++ * experiment will more tree depth to ensure correct node behavior * experiment will more tree depth to ensure correct node behavior * update secureboost eval bench * set header to none for sample alignment * config processor interface from python * simplify data preparation, add horizontal testing codes * remove redundants * horizontal exps * update scripts * update test scripts * add feature tests * update to align all outputs' format * remove conflict * reorganize * format * add flare jobs * add readme and experiment results

@reboot

* the following changes for aws cloud deployment have been tested with 6 configurations, including 3 with T4 GPU (g4dn.xlarge) t2.small: 20.04: ami-04bad3c587fe60d89 22.04: ami-03c983f9003cb9cd1 24.04: ami-0406d1fdd021121cd g4dn.xlarge: 20.04: ami-04bad3c587fe60d89 22.04: ami-03c983f9003cb9cd1 24.04: ami-0406d1fdd021121cd - changed default image to ami-03c983f9003cb9cd1 (22.04 / Python 3.10) - added g4dn.xlarge as an option to prompt EC2_TYPE - fixed typo in prompt REGION - get default blockdevice name and set size to 16GB instead of 8GB - install apt package nvidia-driver-535-server if GPU found - run modprobe nvidia to avoid reboot if GPU found - adding ~/.local/bin to PATH - add --break-system-packages to pip install (required by Python 3.12) - add --no-cache-dir to pip install to avoid disk space issues - add @reboot cronjob to ensure nvflare is restarted after a server (re)start * instead of setting the disk to 16GB increase the existing disk size by 8GB --------- Co-authored-by: Isaac Yang <isaacy@nvidia.com> Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com> Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com>

…on (NVIDIA#2655)

* Added external script for fed_job API. * enforce must deploy a controller or executor before adding external script. --------- Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com>

The `compose.yml` generated by the DockerBuilder adds a build section only for the overseer. As a result, when we have non-HA mode `project.yml', generated `compose.yml` won't have the build section and so `docker compose build` won't buildthe required NVFlare docker image. This change adds a build section in the generated `compose.yml` even for clients and servers. It fixes the above issue and also with this change, same `compose.yml` can be used to build/run individual servers and clients using the `docker compose` method.

…IA#2653)

* Changed the persistor_id for ScatterAndGather to persistor. * changed the docstring of persistor_id in ScatterAndGather to default to empty string. --------- Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com>

…ion of GPU instance types and AMI images (NVIDIA#2650) * add function get_resources_file to get filename to parse GPU specs from * add function find_ec2_gpu_instance_type to propose best ec2 instance type * Modify user prompts to allow for better AWS defaults : - Prompt for AMI image name ("ubuntu-*-22.04-amd64-pro-server") - Retrieve appropriate default AMI Image ID for the region - Prompt for default EC2 instance based on find_ec2_gpu_instance_type - Add support for ARM architecture and change EC2_TYPE to t4g.small (ARM) * Improved package installations with package manager and in user space: - changed ssh exec commands to single quote for readability and avoid escaping chars - install additional os packages (python3-dev & gcc) to allow ARM packages without wheels - use driver-550-server os package for >= 22.04 and driver-535-server for <= 20.04 - separated os and user packages to allow for failed install of os packages - starting os and user package install in parallel but wait twice for install of os packages - support waiting for build of nvidia.ko as well as nvidia.ko.zst (for newer os) * check environment to get AWS region (default is us-west-2) * set --region for all AWS commands to allow multi-region support * call "aws sts get-caller-identity" to see if auth works, e.g. SSO token is still active * adding unique suffixes to KEY_FILE, nvflare.log etc to allow multiple deployments in parallel * change most "prompt" commands to read -e -i <default> -p for better prompt editing * Cosmetic changes to improve usabilty : - initialize default AMI_IMAGE and EC2_TYPE at top of script - add AWS_PROFILE hint to allow for different AWS profile - clarify requirements.txt instruction - improve final message to allow for copy/paste of ssh and tail command * adjust server security group command to allow REGION and unique log file --------- Co-authored-by: Isaac Yang <isaacy@nvidia.com> Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com> Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com>

* Fixed the wrong numpy release version. * Update the numpy version to 1.26.4.

… "read -i" requires Bash >= 4 (newer than 2009 !) as OSX only comes with Bash 3.2 (NVIDIA#2677) Refactor of Prompt function has: - editable prompts with read -i ONLY if Bash >= 4 - in cases where you don not want or need an editable prompt (e.g. AMI ID) you leave out the 3rd argument - fully compatible with previous prompt function (only cosmetic changes in azure template)

* expose min_responses, hide wait_time_after_min_received * change default to None * add comment

…IA#2683)

YuanTingHsieh · 2024-07-15T22:32:35Z

examples/hello-world/hello-tf2/README.md

+If you possess more GPUs than clients, a good strategy is to run one client on each GPU.
+This can be achieved using the `-gpu` argument during simulation, e.g., `nvflare simulator -n 2 -gpu 0,1 [job]`.
+
+


just a note some of the changes are already in the main branch so we will need to rebase this branch so we don't change 278 files in this PR

nvidianz and others added 10 commits March 18, 2024 15:46

Updated FOBS readme to add DatumManager, added agrpcs as secure scheme

a655d9d

Merge branch 'NVIDIA:main' into main

4e5ba5d

Merge branch 'NVIDIA:main' into main

9f90d48

Merge branch 'NVIDIA:main' into main

84fc5bd

Merge branch 'NVIDIA:main' into main

f972506

Merge branch 'NVIDIA:main' into main

5f6e76f

Merge branch 'NVIDIA:main' into main

15884c4

Merge branch 'NVIDIA:main' into main

6fbcbaa

Merge branch 'NVIDIA:main' into main

07d3787

Merge branch 'NVIDIA:main' into main

98289c4

nvidianz requested review from chesterxgchen, yhwen, IsaacYangSLA, YuanTingHsieh, SYangster, nvkevlu and yanchengnv June 6, 2024 14:01

yanchengnv requested changes Jun 6, 2024

View reviewed changes

yanchengnv previously approved these changes Jun 7, 2024

View reviewed changes

nvidianz dismissed yanchengnv’s stale review via c4d56d2 June 7, 2024 20:35

nvidianz force-pushed the xgboost-user-experience-improvement branch from be58f4a to cecb3d5 Compare June 12, 2024 15:37

Simplified user experience for xgboost by combining configuration on …

8302b60

…servers

nvidianz force-pushed the xgboost-user-experience-improvement branch from 5f20e5c to 8302b60 Compare June 14, 2024 17:19

chesterxgchen and others added 7 commits July 8, 2024 06:45

update client_api.png (NVIDIA#2577)

32dba32

Fix the simulator worker sys path (NVIDIA#2561)

bf973cc

* Fixed the simulator worker sys path. * fixed the get_new_sys_path() logic, added in unit test. * fixed isort. * Changed the _get_new_sys_path() implementation. --------- Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com>

Moved the get_component_init_parameters to class_utils (NVIDIA#2578)

425720c

* Moved the get_component_init_parameters to class_utils. * added docstring and unit test. --------- Co-authored-by: Holger Roth <6304754+holgerroth@users.noreply.github.com>

add cyclic wfcontroller (NVIDIA#2554)

c261c3b

Merged changes from main

f85c3b3

Update TF2 example (NVIDIA#2587)

bccf3b7

* upgrade tf example * change default * update readme * fix typo

apatole and others added 27 commits July 8, 2024 06:57

Fix docs typos (NVIDIA#2629)

ac27bbf

This change fixes: - Typos in the `build_doc.sh` references. - Incorrect rendering of python code blocks on Flare API page.

Update MANIFEST.in file to include nvflare/libs/*.so files into the

3407eb3

wheel file

[2.5] Consolidate text file name validation (NVIDIA#2632)

87224dd

* test pr * fix formatting issues --------- Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com>

Update setup.py for base version and nightly build

80453fb

add licenses (NVIDIA#2645)

ea2c174

add flare site to github pages (NVIDIA#2642)

41f9c40

Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com>

Wrap the xgboost plugin into a C library. (NVIDIA#2639)

c9f2c44

Add feedback when users launch dashboard to cloud with editable versi…

a6be2f5

…on (NVIDIA#2655)

Added external script for fed_job API (NVIDIA#2623)

14f7c77

* Added external script for fed_job API. * enforce must deploy a controller or executor before adding external script. --------- Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com>

update client_api docs. (NVIDIA#2661)

94f6986

use npm for pages deployment (NVIDIA#2665)

ec423d3

Allow the simulator to pass the End_Run aux message to clients. (NVID…

a8b0a25

…IA#2653)

change to google drive embeddings, minor enhancements (NVIDIA#2668)

07ecd70

Fixed the numpy release version (NVIDIA#2670)

6924bd9

* Fixed the wrong numpy release version. * Update the numpy version to 1.26.4.

add correct params exchange format for tf script executor (NVIDIA#2676)

08dd2a4

Add min_responses to send_model_and_wait() (NVIDIA#2675)

8795004

* expose min_responses, hide wait_time_after_min_received * change default to None * add comment

Resolve CVE with updated dependency (NVIDIA#2681)

41e8e06

add additional pages, change to nvidia flare, improvements (NVIDIA#2682)

8644283

clarify ModelController args, rename min_clients to num_clients (NVID…

c043635

…IA#2683)

Merged changes made by Jiaming

0dcbe82

YuanTingHsieh reviewed Jul 15, 2024

View reviewed changes

Migrating local plugin and mock to NVFlare

77237fc

chesterxgchen closed this Jul 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

XGBoost User Experience Improvment #2616

XGBoost User Experience Improvment #2616

nvidianz commented Jun 6, 2024 •

edited

Loading

yanchengnv left a comment

yanchengnv left a comment

YuanTingHsieh Jul 15, 2024

		If you possess more GPUs than clients, a good strategy is to run one client on each GPU.
		This can be achieved using the `-gpu` argument during simulation, e.g., `nvflare simulator -n 2 -gpu 0,1 [job]`.

XGBoost User Experience Improvment #2616

XGBoost User Experience Improvment #2616

Conversation

nvidianz commented Jun 6, 2024 • edited Loading

Description

Types of changes

yanchengnv left a comment

Choose a reason for hiding this comment

yanchengnv left a comment

Choose a reason for hiding this comment

YuanTingHsieh Jul 15, 2024

Choose a reason for hiding this comment

nvidianz commented Jun 6, 2024 •

edited

Loading