## Multiparty XGBoost with Federated Training
We will now run XGBoost in the federated setting. Unlike the previous exercise, all data stays on its respective machine. This eliminates the need to transfer the data over the network, which incurs high overhead and requires significant bandwidth. Instead, in each iteration, each party sends the central server a summary of the update made to its model. The central server aggregates these updates, applies the aggregated update to its model, and broadcasts the new model to all parties. The parties then train locally with the new model and send their updates back to the central server.

![title](img/exercise3.png)
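
The aggregation step of one round can be sketched in plain Python. This is only an illustration, not this project's protocol: it assumes each party's "summary" is a per-bin sum of gradients for a single feature, which is one common choice in tree boosting. The names `local_summary`, `aggregate`, and `bin_edges` are hypothetical and do not come from `Utils` or XGBoost.

```python
def local_summary(values, grads, bin_edges):
    """Sum gradients per feature bin using only one party's local data."""
    hist = [0.0] * (len(bin_edges) + 1)
    for v, g in zip(values, grads):
        # The bin index is the number of edges at or below the value.
        idx = sum(1 for edge in bin_edges if v >= edge)
        hist[idx] += g
    return hist


def aggregate(summaries):
    """Central server: element-wise sum of the per-party summaries."""
    return [sum(column) for column in zip(*summaries)]


# Hypothetical bin edges shared by all parties.
bin_edges = [10.0, 20.0]

# Each party summarizes its own rows; the raw values never leave the party.
party_a = local_summary([5.0, 12.0, 25.0], [0.5, -1.0, 0.25], bin_edges)
party_b = local_summary([8.0, 22.0], [1.5, 0.75], bin_edges)

# Only the small summaries are sent to the server and combined there.
global_hist = aggregate([party_a, party_b])
print(global_hist)  # [2.0, -1.0, 1.0]
```

The point of the sketch is the data flow: the server sees three numbers per party rather than the underlying rows, which is why bandwidth use stays low.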

In our project, all this is abstracted away. The central server simply starts the training, and everything else is performed automatically.

Import some helper functions.

In [None]:
import pandas as pd
import subprocess
from Utils import start_job, Federation

## Model Training
The central aggregator handles the entire training process and can start the training job from their own machine. You can take a break while the aggregator trains the model.

## Model Evaluation
We'll now use the model we trained in the previous step to make predictions on our test data. Load in the federated model, preprocess your test data, and evaluate the model with the test data.
* Test data for the insurance dataset is at `/data/insurance/insurance_test.csv`

In [None]:
import xgboost as xgb

model_path = "ex2_model.model"
multiparty_model = xgb.Booster()
multiparty_model.load_model(model_path)

In [None]:
test_data_path = "/data/insurance/insurance_test.csv"
test_data_subset = pd.read_csv(test_data_path, sep=",", header=None)
y_test_subset = test_data_subset.iloc[:, 0]
x_test_subset = test_data_subset.iloc[:, 1:]
test_data = xgb.DMatrix(x_test_subset, label=y_test_subset)

In [None]:
# Evaluate the model on the test set; eval() returns the metrics as a string
print(multiparty_model.eval(test_data))
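
`Booster.eval` returns its result as a single string rather than as numbers. If you want the metric values for comparison with the earlier exercises, a small parser can help. The sketch below assumes the usual `[iteration]<TAB>name-metric:value` layout. The helper `parse_eval_result` is hypothetical, and the example string is made up for illustration; it is not output from this model.

```python
def parse_eval_result(result):
    """Turn an evaluation string into a {metric_name: value} dict."""
    metrics = {}
    # The first whitespace-separated field is the iteration tag, e.g. "[0]".
    for field in result.split()[1:]:
        name, _, value = field.rpartition(":")
        metrics[name] = float(value)
    return metrics


# Made-up example string, used only to show the parsing.
example = "[0]\teval-rmse:0.250000"
parsed = parse_eval_result(example)
print(parsed)  # {'eval-rmse': 0.25}
```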