# Federated Learning for XGBoost 
This chapter demonstrates how to use NVFlare to train an XGBoost model in a federated learning setting. 
Several variations of federated XGBoost are illustrated, including:
- non-secure horizontal collaboration with histogram-based and tree-based mechanisms
- non-secure vertical collaboration with a histogram-based mechanism
- secure horizontal and vertical collaboration with a histogram-based mechanism and homomorphic encryption

Let's first review the basics of XGBoost and the collaboration modes.

## XGBoost 
XGBoost is a machine learning algorithm that uses decision/regression trees to perform classification and regression tasks, 
mapping a vector of feature values to a label prediction. It is especially powerful for tabular data, so even in the age of LLMs 
it is still widely used for tabular use cases. It is also preferred for its explainability and efficiency.
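
To make the "feature vector in, prediction out" mapping concrete, here is a minimal sketch of how an ensemble of regression trees produces a raw score: each tree routes the input to one leaf, and the leaf values are summed. The `Node` class, the helper names, and the two hand-written trees are hypothetical and only for illustration; this is not how XGBoost stores or evaluates trees internally.

```python
class Node:
    """A node of a toy regression tree: a split if `feature` is set, else a leaf."""

    def __init__(self, value=0.0, feature=None, threshold=0.0, left=None, right=None):
        self.value = value          # leaf value (used only for leaves)
        self.feature = feature      # index of the feature tested at a split
        self.threshold = threshold  # go left when x[feature] < threshold
        self.left = left
        self.right = right


def tree_predict(node, x):
    """Route the feature vector x down one tree and return the leaf value."""
    while node.feature is not None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.value


def ensemble_predict(trees, x, base_score=0.0):
    """Boosted ensembles add up the outputs of all trees."""
    return base_score + sum(tree_predict(t, x) for t in trees)


# Two hand-written trees (illustrative values, not trained).
trees = [
    Node(feature=0, threshold=2.0, left=Node(value=-1.0), right=Node(value=1.0)),
    Node(feature=1, threshold=0.5, left=Node(value=0.25), right=Node(value=-0.25)),
]

raw_score = ensemble_predict(trees, [3.0, 0.1])
```

For classification, a raw score like this would typically be passed through a link function such as the sigmoid to obtain a probability.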

In these examples, we use [DMLC XGBoost](https://github.com/dmlc/xgboost), an optimized distributed gradient boosting library. 
It offers advanced features such as GPU acceleration and distributed/federated learning support.

## Collaboration Modes and Data Split
There are essentially two collaboration modes, horizontal and vertical:
![hori_vert](./hori_vert.png)

- In the horizontal case, each participant has access to the same features (columns: "x_1 x_2") and label ("y") for different data samples (rows: 1/2/3 for Client A vs. 4/5/6 for Client B). 
In this case, every participant holds equal status as a "label owner".
- In the vertical case, each client has access to different features (columns: "x_1 x_2 x_3" for Client A vs. "x_4 x_5" for Client B) of the same data samples (rows: 1/2/3).
We assume that only one client is the "label owner" (also called the "active party"). In the figure, Client B owns the label "y".

To simulate these two collaboration modes, we split the dataset both horizontally and vertically. 
For the vertical split, we give site-1 the label column for simplicity.
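
The two splits can be sketched as follows. This is a minimal illustration on synthetic data, assuming NumPy is available; the helper names `split_horizontal` and `split_vertical`, the even split sizes, and the random data are assumptions and not the chapter's actual data preparation scripts. Only the choice of site-1 as label owner in the vertical split follows the text above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_features = 6, 5
X = rng.normal(size=(n_rows, n_features))   # synthetic feature matrix
y = rng.integers(0, 2, size=n_rows)         # synthetic binary labels


def split_horizontal(X, y, n_sites):
    """Each site gets all columns and the labels for a disjoint subset of rows."""
    row_chunks = np.array_split(np.arange(len(X)), n_sites)
    return {
        f"site-{i + 1}": {"X": X[rows], "y": y[rows]}
        for i, rows in enumerate(row_chunks)
    }


def split_vertical(X, y, n_sites, label_owner="site-1"):
    """Each site gets all rows for a disjoint subset of columns.

    Only the label owner (the "active party") receives y.
    """
    col_chunks = np.array_split(np.arange(X.shape[1]), n_sites)
    sites = {}
    for i, cols in enumerate(col_chunks):
        name = f"site-{i + 1}"
        sites[name] = {"X": X[:, cols], "y": y if name == label_owner else None}
    return sites


horizontal = split_horizontal(X, y, n_sites=2)
vertical = split_vertical(X, y, n_sites=2)

for name in ("site-1", "site-2"):
    print(name, "horizontal X:", horizontal[name]["X"].shape,
          "| vertical X:", vertical[name]["X"].shape,
          "| has label (vertical):", vertical[name]["y"] is not None)
```

In a real vertical setting the sites would also need to agree on which rows refer to the same samples; here the rows are aligned by construction.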

## Federated Training of XGBoost
The rest of this chapter covers two scenarios:
### [Federated XGBoost without Encryption](../10.1_fed_xgboost/fed_xgboost.ipynb)
This section provides instructions for running federated XGBoost without homomorphic encryption, covering both histogram-based and tree-based horizontal collaboration, as well as histogram-based vertical collaboration.
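
As a rough intuition for histogram-based horizontal collaboration, the sketch below shows why per-client gradient histograms can be aggregated: summing the local histograms gives the same result as building one histogram over the pooled data. It assumes NumPy and uses synthetic, pre-binned feature values; the function name `grad_histogram`, the bin count, and the constant Hessian are illustrative assumptions. The actual message flow between NVFlare and XGBoost is not reproduced here, and the tree-based mode, in which clients exchange locally trained trees instead, is not shown.

```python
import numpy as np


def grad_histogram(feature_bins, grad, hess, n_bins):
    """Sum gradients and Hessians of the samples falling into each feature bin."""
    g_hist = np.bincount(feature_bins, weights=grad, minlength=n_bins)
    h_hist = np.bincount(feature_bins, weights=hess, minlength=n_bins)
    return g_hist, h_hist


rng = np.random.default_rng(0)
n_bins, n_samples = 4, 10
bins = rng.integers(0, n_bins, size=n_samples)  # bin index of one feature per sample
grad = rng.normal(size=n_samples)               # synthetic first-order gradients
hess = np.ones(n_samples)                       # constant Hessian, as for squared error

# Horizontal split: each client holds a disjoint set of rows.
client_rows = [slice(0, 5), slice(5, 10)]
local = [grad_histogram(bins[r], grad[r], hess[r], n_bins) for r in client_rows]

# Aggregation: element-wise sum of the clients' histograms.
agg_g = sum(g for g, _ in local)
agg_h = sum(h for _, h in local)

# Reference: histogram built on the pooled data.
pooled_g, pooled_h = grad_histogram(bins, grad, hess, n_bins)
print(np.allclose(agg_g, pooled_g), np.allclose(agg_h, pooled_h))
```

Because only the summed histograms are needed to evaluate split candidates, the raw rows never have to leave a client; the histograms themselves are the intermediate values that the secure variants protect.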

### [Secure Federated XGBoost with Homomorphic Encryption](../10.2_secure_fed_xgboost/secure_fed_xgboost.ipynb)
This section provides instructions for running secure federated XGBoost with homomorphic encryption under 
histogram-based horizontal and vertical collaboration. Tree-based methods exchange locally trained models (trees) rather than intermediate gradients or histograms. Because the final model is made available to all parties at the end of federated learning anyway, tree-based methods do not raise the same security concerns as histogram-based methods. Therefore, under our current setting, we do not consider homomorphic encryption for tree-based methods.
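
To illustrate what homomorphic encryption contributes to histogram aggregation, the sketch below uses a toy Paillier cryptosystem, a well-known additively homomorphic scheme: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so an aggregator can add encrypted histogram entries without seeing them. Everything here is an assumption made for illustration: the scheme choice, the tiny primes (which offer no security), the integer-valued histograms, and the single key holder. It does not describe the encryption plugins or protocol used by the notebooks in this chapter.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Toy Paillier key generation with tiny primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n, m, rng):
    """Encrypt an integer 0 <= m < n under public key n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = private_key
    l_value = (pow(c, lam, n * n) - 1) // n
    return (l_value * mu) % n


def add_encrypted(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % (n * n)


rng = random.Random(0)
n, private_key = keygen()

# Integer-valued histogram entries from two hypothetical clients.
hist_a = [3, 5, 2]
hist_b = [4, 1, 7]

enc_a = [encrypt(n, v, rng) for v in hist_a]
enc_b = [encrypt(n, v, rng) for v in hist_b]

# The aggregator combines ciphertexts without access to the private key.
enc_sum = [add_encrypted(n, ca, cb) for ca, cb in zip(enc_a, enc_b)]

# Only the key holder can read the aggregated histogram.
dec_sum = [decrypt(n, private_key, c) for c in enc_sum]
print(dec_sum)
```

Real gradient histograms are floating-point, so a practical system would need a fixed-point or similar encoding before encryption, along with production-grade key sizes and libraries.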

We then finish this chapter with a [recap](../10.3_recap/recap.ipynb).