XGBoost User Experience Improvement #2616
Conversation
Looks good in general. See my comments for improvement.
nvflare/app_opt/xgboost/histogram_based_v2/runners/xgb_client_runner.py
LGTM
Force-pushed from be58f4a to cecb3d5
Force-pushed from 5f20e5c to 8302b60
* Fixed the simulator worker sys path.
* Fixed the `get_new_sys_path()` logic; added a unit test.
* Fixed isort.
* Changed the `_get_new_sys_path()` implementation.

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com>
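The commit above refers to a `_get_new_sys_path()` helper used by the simulator worker. As a rough illustration of the kind of sys.path cleanup such a helper performs (this is a hypothetical sketch, not NVFlare's actual implementation — the function name and behavior here are assumptions):

```python
def get_new_sys_path(current_path, workspace_dir):
    """Return a copy of a sys.path-style list with the workspace directory
    placed first and duplicate entries removed, preserving order.

    Hypothetical sketch only; the real helper lives in NVFlare's simulator code.
    """
    seen = set()
    new_path = []
    for entry in [workspace_dir] + list(current_path):
        if entry not in seen:
            seen.add(entry)
            new_path.append(entry)
    return new_path
```

A unit test for such a helper would assert that order is preserved and duplicates are dropped.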
* Moved `get_component_init_parameters` to class_utils.
* Added docstring and unit test.

Co-authored-by: Holger Roth <6304754+holgerroth@users.noreply.github.com>
* WIP: auto register fobs.
* WIP
* Made changes to server and client fobs_initialize().
* Added auto fobs register.
* Codestyle fix.
* Adjust to use register_nvflare_decomposers() for IPCAgent.
* Add some cleanup.
* Codestyle fix.
* Codestyle fix.
* Re-designed the custom FOBS register to only scan the nvflare_decomposers folder and the user-defined decomposer_module.
* Removed an unused import.
* Added a unit test for custom_fobs_initialize().
* Addressed the PR reviews.
* Changed a log warning message.
* Added None handling for register_ext_decomposers.
* Fixed unit test.
* Fixed unit test.
* Fixed unit test.
* Codestyle fix.
* Removed the custom decomposer register for SP and CP; added the register for simulator_worker.
* Removed an unused import.

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com>
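The commits above revolve around registering decomposers (objects that break a type into serializable parts and rebuild it) exactly once. A minimal sketch of that register-once pattern, with illustrative names only — this is not NVFlare's FOBS API, just the general shape of it:

```python
# Hypothetical register-once decomposer registry, in the spirit of what the
# commits describe. All class and function names here are illustrative.
_registry = {}

class Decomposer:
    def supported_type(self):
        raise NotImplementedError
    def decompose(self, obj):
        raise NotImplementedError
    def recompose(self, data):
        raise NotImplementedError

def register(decomposer):
    """Register a decomposer for its supported type; re-registering the
    same type is a no-op, so repeated auto-registration scans are safe."""
    t = decomposer.supported_type()
    if t in _registry:
        return False
    _registry[t] = decomposer
    return True

class TupleDecomposer(Decomposer):
    """Example decomposer: represents a tuple as a plain list."""
    def supported_type(self):
        return tuple
    def decompose(self, obj):
        return list(obj)
    def recompose(self, data):
        return tuple(data)
```

Making registration idempotent is what lets both a folder scan and an explicit user-defined decomposer_module register types without conflicts.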
* Upgrade tf example
* Change default
* Update readme
* Fix typo
This change fixes:
- Typos in the `build_doc.sh` references.
- Incorrect rendering of Python code blocks on the Flare API page.
The `project.yml` generated by the `nvflare provision` command has the comments `change overseer.example.com to the FQDN of the overseer` and `change example.com to the FQDN of the server`, but `example.com` does not actually appear anywhere in the generated values, which is confusing. This change just fixes the comments to make them clearer.

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com>
* Test PR
* Fix formatting issues

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com>
Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com>
* Initial commit for xgboost-secure
* Initial commit for xgboost-secure
* Change model output path
* Change data mode
* Most basic xgboost process for coding
* Most basic xgboost process for coding
* Most basic xgboost process for coding
* Most basic xgboost process for coding
* First prototype for secure vertical pipeline
* Phase 1 concludes
* Add seal pipeline in C++
* Experiment with more tree depth to ensure correct node behavior
* Experiment with more tree depth to ensure correct node behavior
* Update secureboost eval bench
* Set header to none for sample alignment
* Config processor interface from Python
* Simplify data preparation, add horizontal testing code
* Remove redundant code
* Horizontal experiments
* Update scripts
* Update test scripts
* Add feature tests
* Update to align all outputs' format
* Remove conflict
* Reorganize
* Format
* Add flare jobs
* Add readme and experiment results
* The following changes for AWS cloud deployment have been tested with 6 configurations, including 3 with a T4 GPU (g4dn.xlarge):
  - t2.small — 20.04: ami-04bad3c587fe60d89, 22.04: ami-03c983f9003cb9cd1, 24.04: ami-0406d1fdd021121cd
  - g4dn.xlarge — 20.04: ami-04bad3c587fe60d89, 22.04: ami-03c983f9003cb9cd1, 24.04: ami-0406d1fdd021121cd

  Changes:
  - Changed the default image to ami-03c983f9003cb9cd1 (22.04 / Python 3.10)
  - Added g4dn.xlarge as an option to the EC2_TYPE prompt
  - Fixed a typo in the REGION prompt
  - Get the default block device name and set its size to 16GB instead of 8GB
  - Install the apt package nvidia-driver-535-server if a GPU is found
  - Run modprobe nvidia to avoid a reboot if a GPU is found
  - Add ~/.local/bin to PATH
  - Add --break-system-packages to pip install (required by Python 3.12)
  - Add --no-cache-dir to pip install to avoid disk space issues
  - Add an @reboot cronjob to ensure nvflare is restarted after a server (re)start
* Instead of setting the disk to 16GB, increase the existing disk size by 8GB

Co-authored-by: Isaac Yang <isaacy@nvidia.com>
Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com>
Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com>
* Added external script support for the fed_job API.
* Enforce that a controller or executor must be deployed before adding an external script.

Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com>
The `compose.yml` generated by the DockerBuilder adds a build section only for the overseer. As a result, when we have a non-HA mode `project.yml`, the generated `compose.yml` won't have the build section, so `docker compose build` won't build the required NVFlare docker image. This change adds a build section to the generated `compose.yml` for clients and servers as well. It fixes the above issue, and with this change the same `compose.yml` can be used to build/run individual servers and clients using the `docker compose` method.
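The fix described above amounts to giving every service in the generated compose mapping a build section, not just the overseer. A hypothetical sketch of that logic (plain dicts standing in for the YAML; field names mirror a typical compose file, and this is not DockerBuilder's actual code):

```python
# Sketch: ensure every compose service has a build section so
# `docker compose build` works even in non-HA projects with no overseer.
# Illustrative only; not the DockerBuilder implementation.
def add_build_sections(compose, dockerfile="Dockerfile"):
    for name, service in compose.get("services", {}).items():
        # setdefault leaves any existing build section (e.g. overseer) intact
        service.setdefault("build", {"context": ".", "dockerfile": dockerfile})
    return compose

compose = {
    "services": {
        "server1": {"image": "nvflare"},
        "site-1": {"image": "nvflare"},
    }
}
add_build_sections(compose)
```

With a build section on every service, the same file supports both building and running individual servers and clients.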
* Changed the persistor_id for ScatterAndGather to persistor.
* Changed the docstring of persistor_id in ScatterAndGather to default to an empty string.

Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com>
…ion of GPU instance types and AMI images (NVIDIA#2650)

* Add function get_resources_file to get the filename to parse GPU specs from
* Add function find_ec2_gpu_instance_type to propose the best EC2 instance type
* Modify user prompts to allow for better AWS defaults:
  - Prompt for the AMI image name ("ubuntu-*-22.04-amd64-pro-server")
  - Retrieve the appropriate default AMI image ID for the region
  - Prompt for the default EC2 instance based on find_ec2_gpu_instance_type
  - Add support for ARM architecture and change EC2_TYPE to t4g.small (ARM)
* Improved package installations with the package manager and in user space:
  - Changed ssh exec commands to single quotes for readability and to avoid escaping characters
  - Install additional OS packages (python3-dev & gcc) to allow ARM packages without wheels
  - Use the driver-550-server OS package for >= 22.04 and driver-535-server for <= 20.04
  - Separated OS and user packages to allow for failed installs of OS packages
  - Start OS and user package installs in parallel but wait twice for the OS package install
  - Support waiting for the build of nvidia.ko as well as nvidia.ko.zst (for newer OSes)
* Check the environment to get the AWS region (default is us-west-2)
* Set --region for all AWS commands to allow multi-region support
* Call "aws sts get-caller-identity" to check whether auth works, e.g. the SSO token is still active
* Add unique suffixes to KEY_FILE, nvflare.log etc. to allow multiple deployments in parallel
* Change most "prompt" commands to read -e -i <default> -p for better prompt editing
* Cosmetic changes to improve usability:
  - Initialize the default AMI_IMAGE and EC2_TYPE at the top of the script
  - Add an AWS_PROFILE hint to allow for a different AWS profile
  - Clarify the requirements.txt instruction
  - Improve the final message to allow for copy/paste of the ssh and tail commands
* Adjust the server security group command to allow REGION and a unique log file

Co-authored-by: Isaac Yang <isaacy@nvidia.com>
Co-authored-by: Chester Chen <512707+chesterxgchen@users.noreply.github.com>
Co-authored-by: Yuan-Ting Hsieh (謝沅廷) <yuantingh@nvidia.com>
* Fixed the wrong numpy release version.
* Updated the numpy version to 1.26.4.
… "read -i" requires Bash >= 4 (newer than 2009!) as OSX only comes with Bash 3.2 (NVIDIA#2677)

The refactor of the prompt function has:
- Editable prompts with read -i ONLY if Bash >= 4
- In cases where you do not want or need an editable prompt (e.g. AMI ID), you leave out the 3rd argument
- Fully compatible with the previous prompt function (only cosmetic changes in the azure template)
* Expose min_responses, hide wait_time_after_min_received
* Change default to None
* Add comment
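As a rough illustration of the behavior change the commit describes — aggregation may proceed once `min_responses` replies have arrived, with a `None` default meaning "wait for all clients" — here is a hedged sketch (parameter names follow the commit message; the logic is an assumption, not NVFlare's implementation):

```python
# Hypothetical sketch of a min_responses check: None means wait for all
# clients; otherwise proceed once min_responses replies are in.
def have_enough_responses(received, num_clients, min_responses=None):
    if min_responses is None:
        return received >= num_clients
    return received >= min_responses
```

Defaulting to `None` keeps the conservative wait-for-all behavior unless the user explicitly opts into early aggregation.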
If you possess more GPUs than clients, a good strategy is to run one client on each GPU.
This can be achieved using the `-gpu` argument during simulation, e.g., `nvflare simulator -n 2 -gpu 0,1 [job]`.
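The pairing described above can be pictured as a round-robin mapping of simulated clients onto the GPU IDs passed via `-gpu`. A small sketch for illustration (the site naming and the exact assignment strategy are assumptions, not the simulator's actual code):

```python
# Hypothetical sketch: assign each simulated client a GPU from the -gpu list,
# wrapping around if there are more clients than GPUs. Illustrative only.
def assign_gpus(num_clients, gpu_arg):
    gpus = gpu_arg.split(",")
    return {f"site-{i + 1}": gpus[i % len(gpus)] for i in range(num_clients)}
```

For `nvflare simulator -n 2 -gpu 0,1 [job]`, this mapping would place one client on each of the two GPUs.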
Just a note: some of the changes are already in the main branch, so we will need to rebase this branch so we don't change 278 files in this PR.
Description
Made the following changes to XGBoost:
Types of changes
./runtest.sh