## Multiparty XGBoost with Centralized Training
In this exercise, we'll demonstrate a workflow in which each party sends a copy of its own data to a central server. All the training data therefore travels over the network to the central server, which collects it and trains a model locally on the combined data. The central server then broadcasts the trained model back to the parties, who load it and test it on their local test datasets.

![title](img/exercise2.png)


We will also measure the number of bytes sent over the network to show how much bandwidth this workflow needs.
Beyond the bandwidth cost, the exercise shows the benefit of training on as much data as possible to make the model more robust.

### Data Transfer
Import the necessary libraries

In [None]:
import xgboost as xgb
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from Utils import scp

Ensure that you've properly set up SSH credentials in Exercise 0. Send the training data you used in Exercise 1 to the aggregator. Note how many bytes are transferred over the network.

* Training data for the hospital dataset is at `/data/hospital/hospital_training_{party_id}.csv`
* Training data for the insurance dataset is at `/data/insurance/insurance_training_{party_id}.csv`

In [None]:
# Make sure you use the training data you used in exercise 1
training_data = "/path/to/training_data" # TODO: fill in the path to the training data
aggregator_ip = "aggregator_ip" # TODO: fill in the IP of the aggregator
dest_dir = "~/shared_data"
scp(training_data, aggregator_ip, dest_dir)
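
The `scp` helper from `Utils` is not shown in this notebook, so it is unclear whether it reports the transfer size itself. As a rough estimate, the size of the file on disk approximates the payload; SSH protocol overhead and any compression will shift the real number. A minimal sketch, where `file_size_bytes` is a hypothetical helper and the temporary file only stands in for your training CSV:

```python
import os
import tempfile


def file_size_bytes(path):
    """Return the on-disk size of a file in bytes (hypothetical helper)."""
    return os.path.getsize(os.path.expanduser(path))


# Stand-in file for illustration; in the exercise, pass `training_data` instead.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"a,b\n1,2\n")
    tmp_path = tmp.name

size = file_size_bytes(tmp_path)
print(f"Approximate payload: {size} bytes")

os.remove(tmp_path)  # clean up the stand-in file
```

Calling `file_size_bytes(training_data)` before the `scp` call gives a number you can compare with the other members of your federation.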

After you've completed this step, you can take a break. Meanwhile, the aggregator will combine the entire federation's training data and use the combined data to train a model. Once training has finished, the aggregator will send you the trained model.
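
The aggregator's code is not part of this notebook. Purely as an illustration of the combining step, the sketch below assumes each party's file arrives as a CSV with the same columns in one shared directory; `combine_training_data` and the demo file names are hypothetical, and the actual aggregator may work differently.

```python
import glob
import os
import tempfile

import pandas as pd


def combine_training_data(shared_dir, pattern="*.csv"):
    """Concatenate every CSV in shared_dir into one DataFrame (hypothetical helper)."""
    paths = sorted(glob.glob(os.path.join(os.path.expanduser(shared_dir), pattern)))
    frames = [pd.read_csv(p) for p in paths]
    # ignore_index gives the combined frame a fresh 0..n-1 row index
    return pd.concat(frames, ignore_index=True)


# Demo with two made-up party files written to a temporary directory.
with tempfile.TemporaryDirectory() as demo_dir:
    for party_id, rows in enumerate(["1,0\n2,1\n", "3,1\n"], start=1):
        with open(os.path.join(demo_dir, f"party_{party_id}.csv"), "w") as f:
            f.write("feature,label\n" + rows)
    combined = combine_training_data(demo_dir)

print(len(combined), "rows in the combined training set")
```

The combined frame would then be split into features and labels and used to train a single model on the aggregator.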

### Model Evaluation
Once the aggregator has sent you the trained model, your break is over. Load the model and evaluate it on your local test data.

In [None]:
# Load in the model
import pickle

model_path = "ex2_model.model"
# Use a context manager so the file handle is closed after loading
with open(model_path, "rb") as f:
    multiparty_model = pickle.load(f)

In [None]:
# TODO: load in your local test data and preprocess it to split it into features and labels
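
One possible shape for this step is sketched below. It assumes the test data is a CSV with a single label column; the column names and the `split_features_labels` helper are made up for illustration, so substitute the path and label column you used in Exercise 1.

```python
import io

import pandas as pd


def split_features_labels(df, label_col):
    """Split a DataFrame into (features, labels) by column name (hypothetical helper)."""
    features = df.drop(columns=[label_col])
    labels = df[label_col]
    return features, labels


# Tiny in-memory stand-in for a test CSV; the column names are invented.
csv_text = "age,bmi,label\n34,22.1,0\n51,30.4,1\n45,27.8,0\n"
test_df = pd.read_csv(io.StringIO(csv_text))

X_test, y_test = split_features_labels(test_df, "label")
print(X_test.shape, y_test.tolist())
```

With real data, `pd.read_csv` would take the path to your test file, and the resulting features and labels are what the next cell calls `arg1` and `arg2`.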

In [None]:
arg1, arg2 = # TODO: set arg1 to the test features, arg2 to the test labels
preds = multiparty_model.predict(arg1)  # use the model loaded above
print(accuracy_score(arg2, preds))
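
Whether the cell above works unchanged depends on how the aggregator saved the model, which this notebook does not state. A scikit-learn style `xgb.XGBClassifier` returns class labels from `predict`. A raw `xgb.Booster` instead expects an `xgb.DMatrix` as input and, with a binary logistic objective, returns probabilities that must be thresholded before calling `accuracy_score`. A minimal sketch of that thresholding step, assuming binary labels and a hypothetical `to_class_labels` helper:

```python
import numpy as np


def to_class_labels(raw_preds, threshold=0.5):
    """Map predicted probabilities to 0/1 labels (hypothetical helper)."""
    raw_preds = np.asarray(raw_preds, dtype=float)
    return (raw_preds > threshold).astype(int)


# Made-up probabilities standing in for the output of a Booster's predict().
example_probs = [0.1, 0.8, 0.4, 0.95]
example_preds = to_class_labels(example_probs)
print(example_preds.tolist())
```

If your predictions already look like class labels, this step is unnecessary.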

Discuss the results with other members of your federation. How did the centrally trained model perform on your local test data compared with the locally trained model? Did adding more data help?

Once you're ready, please move to [Exercise 3](./exercise3-member.ipynb).