# Project Ideas

## 1. How much shared learning is there?

The lottery networks can be small, e.g. 1% of the original size. How much shared learning will FederatedAverage provide if there is not much overlap in the weights of interest to the various clients? First we need a sense of the influence clients are having on each other.

### Parameter Influence Index (PII)

It would be good to have a measure of the influence client updates are having on each other when using sparse subnetworks at each client. We call this the parameter influence index, PII. Below are some reasonable ways to define it.

### Method 1 (Server-side)

* for each pair of clients, define
  * $intersection(i, j) = | m_i \odot m_j |$
  * $intersectionRate(i, j) = \frac{intersection(i, j)}{|\theta|}$
* for each client, define
  * $pii_k = \frac{1}{K-1} \sum_{j \, \neq \, k} intersectionRate(k, j)$
* define
  * $pii = \frac{1}{K} \sum_{k \, = \, 1}^{K} pii_k$
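
A minimal sketch of Method 1, assuming each $m_i$ is a binary (0/1) mask flattened to length $|\theta|$ and that the server holds all $K$ masks. The function name `server_side_pii` and the array layout are illustrative assumptions, not part of LotteryFL.

```python
import numpy as np

def server_side_pii(masks):
    """Compute per-client and overall PII from binary masks.

    masks: array-like of shape (K, P), one flattened 0/1 mask per client
           (assumed layout; P plays the role of |theta|).
    """
    masks = np.asarray(masks, dtype=np.int64)
    K, P = masks.shape
    if K < 2:
        raise ValueError("need at least two clients to compare")
    # overlap[i, j] = |m_i * m_j| = number of parameters kept by both clients
    overlap = masks @ masks.T
    rate = overlap / P
    # Drop the j == k terms before averaging over the other K - 1 clients
    np.fill_diagonal(rate, 0.0)
    pii_k = rate.sum(axis=1) / (K - 1)
    return pii_k, float(pii_k.mean())

pii_k, pii = server_side_pii([[1, 1, 0, 0],
                              [1, 0, 1, 0],
                              [1, 1, 1, 1]])
```

Note that the pairwise overlap matrix costs $O(K^2 |\theta|)$, which may matter with many clients.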

### Method 2 (Client-side)

Adjust ClientUpdate$(C_k, \theta_k^t, \theta_0)$

The idea is to compare $\theta_k^t$ to $\theta_k^{t-1}$, which is stored locally by clients.

$updateCount(k) = |\, \mathbb{1} [ \theta_k^t \neq \theta_k^{t-1} ] \, |$ (possibly make an allowance for small floating point errors)

$pii(k) = \frac{updateCount(k)}{|\theta_k^t|}$

Return $pii(k)$ to server from ClientUpdate$(C_k, \theta_k^t, \theta_0)$.

Server averages these to compute pii.
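
A minimal sketch of Method 2, assuming the client keeps flattened copies of $\theta_k^t$ and $\theta_k^{t-1}$ as arrays. The names `client_pii` and `server_pii` and the tolerance `atol` are hypothetical; the tolerance is one possible way to make the allowance for floating point error.

```python
import numpy as np

def client_pii(theta_t, theta_prev, atol=1e-8):
    """Fraction of parameters that changed since the previous round."""
    theta_t = np.asarray(theta_t, dtype=float).ravel()
    theta_prev = np.asarray(theta_prev, dtype=float).ravel()
    if theta_t.shape != theta_prev.shape:
        raise ValueError("parameter vectors must have the same size")
    # A parameter counts as updated only if it moved by more than atol
    changed = ~np.isclose(theta_t, theta_prev, rtol=0.0, atol=atol)
    return float(changed.sum() / theta_t.size)

def server_pii(client_values):
    """Server-side average of the pii(k) values returned by clients."""
    return float(np.mean(client_values))
```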

## 2. Modifications to LotteryFL

### Option 1

In LotteryFL, ClientUpdate$(C_k, \theta_k^t, \theta_0)$ prunes and then trains. This is not the LTH approach. It also implies that in early iterations the LTN is not being identified, and so the communication load remains high. At the beginning, the accuracy will be low, and so no pruning will occur.

The client update could be changed to

1. reverse the order to train then prune, or
2. prune, train, and prune

We do retain the existing conditions on pruning: accuracy check, and pruning target check.
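
A rough sketch of the train-then-prune ordering with both checks retained. Everything here is an assumption for illustration: the callables `train` and `evaluate`, the use of magnitude pruning, the parameter names, and the default values. Whether to reset surviving weights to $\theta_0$ after pruning is left open and is not shown.

```python
import numpy as np

def magnitude_prune(theta, mask, prune_rate):
    """Mask out the smallest-magnitude fraction of the still-active weights."""
    active = np.flatnonzero(mask)
    n_prune = int(len(active) * prune_rate)
    new_mask = mask.copy()
    if n_prune > 0:
        order = np.argsort(np.abs(theta[active]))
        new_mask[active[order[:n_prune]]] = 0
    return new_mask

def client_update_train_then_prune(theta, mask, train, evaluate,
                                   acc_threshold=0.5, target_density=0.1,
                                   prune_rate=0.2):
    """Train first, then prune only if both existing conditions hold."""
    theta = train(theta * mask) * mask      # train the current subnetwork
    acc = evaluate(theta)                   # local accuracy after training
    density = mask.mean()                   # fraction of weights still active
    if acc > acc_threshold and density > target_density:
        mask = magnitude_prune(theta, mask, prune_rate)
        theta = theta * mask
    return theta, mask
```

The prune, train, prune variant would call `magnitude_prune` once more before `train`, under the same two checks.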

### Option 2

Summary: Single LTN, server-side pruning, client-side fine-tuning.

Perhaps an LTN trained on one set of data will do OK if retrained for other data.

* we start with plain FL
* clientUpdate also returns local accuracy (a single number) to the server
* server prunes when clients' accuracies are good enough (i.e. > acc_threshold); this step has flexibility, since we don't need ALL clients to meet this threshold
* we end up with a pruned model, computed by the server
* repeat the above and we will get our LTN

* for prediction, each client fine-tunes (by further-training, or training from scratch with initial weights) before making local predictions.

Initial communication rounds will have high volume, but we have already commented that this would be the case with the LotteryFL algorithm.
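
One way the server's pruning decision in Option 2 could be expressed, as a sketch. The `min_fraction` parameter is a hypothetical knob for the flexibility mentioned in the list (not every client has to pass), and the default values are placeholders.

```python
import numpy as np

def should_prune(client_accuracies, acc_threshold=0.5, min_fraction=0.8):
    """Return True when enough clients report accuracy above the threshold.

    client_accuracies: one local accuracy per client, as returned by clientUpdate.
    min_fraction: share of clients that must exceed acc_threshold (assumed knob).
    """
    accs = np.asarray(client_accuracies, dtype=float)
    n_ok = int((accs > acc_threshold).sum())
    return n_ok >= min_fraction * len(accs)
```

Setting `min_fraction=1.0` recovers the strict rule where all clients must meet the threshold.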

## 3. Computing LTN with little data.

The authors claim that they are addressing the use case of clients having as little as 5 images per class. That is not much data to go on when applying pruning to find the LT. With little data, does the LTH approach lead to incredibly sparse subnetworks, or to barely pruned networks? How well will such a subnetwork perform (generalize) on each client?

We could perform LTH experiments on small data sets and report the trends we observe. We could then try to tie this back to LotteryFL.

## 4. Analyze Communication Volumes

LotteryFL is supposed to lower the volume of communication between clients and the server by leveraging an LT. But how well does this work? How quickly (after how many rounds) does this pruning become significant?

We can contrive and run a variety of experiments to determine if there is a relationship between volume of data on clients and rapidity of network pruning, or between the level of data skew and rapidity of network pruning.
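
To track volume across rounds, a back-of-the-envelope estimator like the following could be logged per round. The encoding is an assumption made only for this sketch: each client uploads its surviving weights as 4-byte floats plus a 1-bit-per-parameter mask. The actual LotteryFL wire format may differ.

```python
def round_upload_bytes(n_params, densities, bytes_per_weight=4):
    """Estimate total client-to-server upload for one round.

    n_params: number of parameters in the full model.
    densities: per-client fraction of weights still unpruned this round.
    Assumes surviving weights plus a bitmask are sent (hypothetical encoding).
    """
    mask_bytes = (n_params + 7) // 8          # 1 bit per parameter, rounded up
    total = 0
    for d in densities:
        kept = round(n_params * d)            # surviving weights for this client
        total += kept * bytes_per_weight + mask_bytes
    return total

dense = round_upload_bytes(1000, [1.0, 1.0])
sparse = round_upload_bytes(1000, [1.0, 0.5])
```

Plotting this estimate against the round number for different data volumes and skew levels would show how quickly pruning starts to pay off.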