GRU / LSTM for federated learning #3010

Closed
Jagoul opened this issue Feb 4, 2020 · 17 comments

Labels
Status: Stale 🍞 Been open for a while with no activity

Comments

@Jagoul

Jagoul commented Feb 4, 2020

Hello,
I am trying to implement RNN networks in a federated way, using either GRU or LSTM, and I have the following modules:
```python
class GRUNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, n_layers, drop_prob=0.2):
        super(GRUNet, self).__init__()
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers

        self.gru = nn.GRU(input_dim, hidden_dim, n_layers, batch_first=True, dropout=drop_prob)
        self.fc = nn.Linear(hidden_dim, output_dim)
        self.relu = nn.ReLU()

    def forward(self, x, h):
        #print('x before transformation {}'.format(x))
        #print('h before transformation {}'.format(h))
        #x = x.view(-1, x.shape[1])
        print('x inside forward {}'.format(x.shape))
        print('h inside forward {}'.format(h.shape))
        out, h = self.gru(x, h)
        print('out {}'.format(out[0].shape))
        out = self.fc(self.relu(out[:, -1]))
        return out, h

    def init_hidden(self, batch_size):
        weight = next(self.parameters()).data
        hidden = weight.new(self.n_layers, batch_size, self.hidden_dim).zero_()
        return hidden


class LSTMNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, n_layers, drop_prob=0.2):
        super(LSTMNet, self).__init__()
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers

        self.lstm = nn.LSTM(input_dim, hidden_dim, n_layers, batch_first=True, dropout=drop_prob)
        self.fc = nn.Linear(hidden_dim, output_dim)
        self.relu = nn.ReLU()

    def forward(self, x, h):
        out, h = self.lstm(x, h)
        out = self.fc(self.relu(out[:, -1]))
        return out, h

    def init_hidden(self, batch_size):
        weight = next(self.parameters()).data
        # note: `worker` is not defined inside the class; it is assigned in the training loop below
        hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().to(worker),
                  weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().to(worker))
        return hidden
```

After initializing and calling the model as follows:

```python
lr = 0.001
gru_model = train(federated_train_loader, lr, model_type="GRU")
```

My training method looks like this:

```python
def train(federated_train_loader, learn_rate, hidden_dim=128, EPOCHS=1, model_type="GRU"):

    # Setting common hyperparameters
    input_dim = next(iter(federated_train_loader))[0].shape[2]
    print('input dim {}'.format(input_dim))
    output_dim = 1
    n_layers = 2
    # Instantiating the models
    if model_type == "GRU":
        model = GRUNet(input_dim, hidden_dim, output_dim, n_layers)
    else:
        model = LSTMNet(input_dim, hidden_dim, output_dim, n_layers)
    #model.to(device)

    # Defining loss function and optimizer
    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=learn_rate)

    model.train()
    print("Starting Training of {} model".format(model_type))
    epoch_times = []
    # Start training loop
    for epoch in range(1, EPOCHS + 1):
        start_time = time.perf_counter()
        avg_loss = 0.
        counter = 0
        for inputs, labels in federated_train_loader:
            print('inputs :{}'.format(inputs.shape))
            print('labels :{}'.format(labels.shape))

            counter += 1
            worker = inputs.location
            h = model.init_hidden(batch_size).to(device)
            if model_type == "GRU":
                h = h.data
            else:
                h = tuple([e.data for e in h])
            model.send(worker)
            model.zero_grad()
            out, _ = model(inputs.to(device).float(), h)
            loss = criterion(out, labels.to(device).float())
            loss.backward()
            optimizer.step()
            model.get()
            avg_loss += loss.item()
            if counter % 200 == 0:
                print("Epoch {}......Step: {}/{}....... Average Loss for Epoch: {}".format(epoch, counter, len(federated_train_loader), avg_loss / counter))
        current_time = time.perf_counter()  # same clock as start_time (time.clock() is deprecated)
        print("Epoch {}/{} Done, Total Loss: {}".format(epoch, EPOCHS, avg_loss / len(federated_train_loader)))
        print("Time Elapsed for Epoch: {} seconds".format(str(current_time - start_time)))
        epoch_times.append(current_time - start_time)
    print("Total Training Time: {} seconds".format(str(sum(epoch_times))))
    return model
```

I got the following error:

```
input dim 7
Starting Training of GRU model
inputs :torch.Size([32, 90, 7])
labels :torch.Size([32, 1])
x inside forward torch.Size([32, 90, 7])
h inside forward torch.Size([2, 32, 128])

RuntimeError Traceback (most recent call last)
in
1 lr = 0.001
----> 2 gru_model = train(federated_train_loader, lr, model_type="GRU")

in train(federated_train_loader, learn_rate, hidden_dim, EPOCHS, model_type)
38 model.send(worker)
39 model.zero_grad()
---> 40 out, _ = model(inputs.to(device).float(), h)
41 loss = criterion(out, labels.to(device).float())
42 loss.backward()

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/torch/nn/modules/module.py in call(self, *input, **kwargs)
539 result = self._slow_forward(*input, **kwargs)
540 else:
--> 541 result = self.forward(*input, **kwargs)
542 for hook in self._forward_hooks.values():
543 hook_result = hook(self, input, result)

in forward(self, x, h)
15 print('x inside forward {}'.format(x.shape))
16 print('h inside forward {}'.format(h.shape))
---> 17 out, h = self.gru(x, h)
18 print('out'.format(out[0].shape))
19 out = self.fc(self.relu(out[:,-1]))

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/torch/nn/modules/module.py in call(self, *input, **kwargs)
539 result = self._slow_forward(*input, **kwargs)
540 else:
--> 541 result = self.forward(*input, **kwargs)
542 for hook in self._forward_hooks.values():
543 hook_result = hook(self, input, result)

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/torch/nn/modules/rnn.py in forward(self, input, hx)
727 return self.forward_packed(input, hx)
728 else:
--> 729 return self.forward_tensor(input, hx)
730
731

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/torch/nn/modules/rnn.py in forward_tensor(self, input, hx)
719 sorted_indices = None
720 unsorted_indices = None
--> 721 output, hidden = self.forward_impl(input, hx, batch_sizes, max_batch_size, sorted_indices)
722 return output, self.permute_hidden(hidden, unsorted_indices)
723

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/torch/nn/modules/rnn.py in forward_impl(self, input, hx, batch_sizes, max_batch_size, sorted_indices)
696 hx = self.permute_hidden(hx, sorted_indices)
697
--> 698 self.check_forward_args(input, hx, batch_sizes)
699 result = self.run_impl(input, hx, batch_sizes)
700 output = result[0]

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/torch/nn/modules/rnn.py in check_forward_args(self, input, hidden, batch_sizes)
168 def check_forward_args(self, input, hidden, batch_sizes):
169 # type: (Tensor, Tensor, Optional[Tensor]) -> None
--> 170 self.check_input(input, batch_sizes)
171 expected_hidden_size = self.get_expected_hidden_size(input, batch_sizes)
172

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/torch/nn/modules/rnn.py in check_input(self, input, batch_sizes)
147 raise RuntimeError(
148 'input.size(-1) must be equal to input_size. Expected {}, got {}'.format(
--> 149 self.input_size, input.size(-1)))
150
151 def get_expected_hidden_size(self, input, batch_sizes):

RuntimeError: input.size(-1) must be equal to input_size. Expected 7, got 0
```

I searched for this problem and apparently it is a limitation in PySyft when it comes to LSTMs. Could you please help me debug this problem?

Just a side note: I also tried the handcrafted GRU implemented here: https://github.com/andrelmfarias/Private-AI/blob/master/Federated_Learning/Federated%20learning%20with%20Pysyft%20and%20Pytorch.ipynb, but then my laptop almost exploded and I had to kill the process because I didn't get any output. I got the following error:

```
ValueError: Target and input must have the same number of elements. target nelement (20160) != input nelement (32)
```

How can we overcome this problem?
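(A minimal shape-alignment sketch for that ValueError, assuming per-sequence labels of shape (batch, 1) as printed above and a handcrafted GRU that may return per-timestep outputs; model, criterion, inputs and labels are the names from the training loop:)

```python
out, h = model(inputs.float(), h)
# If the network emits one prediction per timestep, e.g. (batch, seq_len, 1),
# keep only the last timestep so it matches per-sequence labels of shape (batch, 1).
if out.dim() == 3:
    out = out[:, -1, :]
assert out.numel() == labels.numel(), (out.shape, labels.shape)
loss = criterion(out, labels.float())
```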

@matthiaslau
Contributor

The RuntimeError is because the size method is not working with syft, see #2201.
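(A rough sketch of where this bites, assuming a PySyft 0.2-style setup with a hypothetical VirtualWorker named alice; not code from this issue:)

```python
import torch
import syft as sy

hook = sy.TorchHook(torch)
alice = sy.VirtualWorker(hook, id="alice")   # hypothetical worker

x = torch.randn(32, 90, 7).send(alice)       # pointer tensor living on the remote worker
print(x.shape)                               # .shape is tracked on the pointer
# x.size(-1), however, is not hooked by PySyft (#2201), which is why nn.GRU's
# check_input reports "Expected 7, got 0" in the traceback above.
```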

@Jagoul
Author

Jagoul commented Feb 4, 2020

I already checked that issue. That's why I am asking the PySyft team for help; maybe somebody has tried to solve it.

@ganesan5

ganesan5 commented Feb 5, 2020

In the GRU class, use the RNN modules from syft instead of torch.nn:

```python
from syft.frameworks.torch.nn import rnn

self.gru = rnn.GRU(input_dim, hidden_dim, n_layers, batch_first=True, dropout=drop_prob)
self.fc = nn.Linear(hidden_dim, output_dim)
self.relu = nn.ReLU()
```

This uses the GRU/RNN from the syft framework, and it works in a federated environment.
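(For context, a minimal sketch of that suggestion applied to the GRUNet class from the original post; this assumes PySyft 0.2.x, where syft's rnn module mirrors the torch.nn RNN interface, and the training loop stays unchanged:)

```python
import torch.nn as nn
from syft.frameworks.torch.nn import rnn  # PySyft's hook-friendly RNN layers

class GRUNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, n_layers, drop_prob=0.2):
        super(GRUNet, self).__init__()
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers
        # only change: rnn.GRU instead of nn.GRU
        self.gru = rnn.GRU(input_dim, hidden_dim, n_layers, batch_first=True, dropout=drop_prob)
        self.fc = nn.Linear(hidden_dim, output_dim)
        self.relu = nn.ReLU()

    def forward(self, x, h):
        out, h = self.gru(x, h)
        out = self.fc(self.relu(out[:, -1]))
        return out, h

    def init_hidden(self, batch_size):
        weight = next(self.parameters()).data
        return weight.new(self.n_layers, batch_size, self.hidden_dim).zero_()
```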

@Jagoul
Author

Jagoul commented Feb 6, 2020 via email

@Jagoul
Author

Jagoul commented Feb 6, 2020 via email

@DanyEle
Contributor

DanyEle commented Feb 6, 2020

I recall getting an error similar to yours too. This may be related to GPU compatibility for this piece of code. Can you try to run this on a CPU instead of a GPU, i.e., removing all references to devices like .to(device)?

@Jagoul
Author

Jagoul commented Feb 6, 2020

I am training on a CPU and my device assignment looks like this: device = torch.device("cpu").
I still get the same error when I remove the device references.

@DanyEle
Contributor

DanyEle commented Feb 6, 2020

Get rid of all movements to devices and use no device at all. The computation will default to CPU. As I mentioned, try removing the tensor.to(device) statements and retry.

@Jagoul
Author

Jagoul commented Feb 6, 2020

Same thing!!

@DanyEle
Contributor

DanyEle commented Feb 7, 2020

As @matthiaslau said, this issue comes up because the size() method is not implemented in PySyft (#2201). If you use the code I posted in #2343 or write your own size() method, it will break some functionality of PySyft, but you will get your specific code to work. I also resorted to such a solution to get the forward pass working.

@github-actions

This issue has been marked stale because it has been open 30 days with no activity. Leave a comment or remove the stale label to unmark it. Otherwise, this will be closed in 7 days.

@github-actions github-actions bot added the Status: Stale 🍞 Been open for a while with no activity label May 22, 2020
@erksch

erksch commented Jul 13, 2020

@DanyEle Hey! I want to use LSTMs too and I am running into errors due to the missing size() method. Your workaround code seems to be deprecated, as some of the files have been moved or renamed. I don't know enough about PySyft's internals; could you give a small guide on where to add the changes in the current master code?

I am getting the following error although I tried setting the size method in various files inspired by your MR.

AttributeError: 'AutogradTensor' object has no attribute 'size'

@DanyEle
Contributor

DanyEle commented Jul 13, 2020

Hi, I am not so up-to-date with the latest PySyft version either, but I believe the only change you would need to make in the code base would be adding a hooked .size() method in syft/frameworks/torch/hook/hook.py (https://github.com/OpenMined/PySyft/blob/master/syft/frameworks/torch/hook/hook.py), analogously to the hook.py code I posted here: https://github.com/OpenMined/PySyft/pull/2343/files.

The idea is that you override (i.e., hook) the basic PyTorch method with this custom version. However, doing this would probably break a few functionalities not required for LSTM training.

To sum up, you could try adding code like:

```python
def size(self, dim=None):
    if dim is None:
        return self.shape
    return self.shape[dim]

hook_self.torch.tensor.size = size
```

to this file: syft/frameworks/torch/hook/hook.py, possibly with further modifications so that the .size() method actually gets hooked in PySyft. One other thing needed would be uncommenting the 'size' method name in https://github.com/OpenMined/PySyft/blob/master/syft/frameworks/torch/torch_attributes.py underneath the following comment:
# Add special functions to exclude from the hook in alphabetical order
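(As a hypothetical sanity check once such a hook is in place; this continues the alice worker sketch further up and follows the snippet above, and is not tested against current master:)

```python
x = torch.randn(32, 90, 7).send(alice)   # alice: a hypothetical VirtualWorker
print(x.size())      # with the hook, this falls back to x.shape
print(x.size(-1))    # with the hook, this resolves via x.shape[-1] instead of reporting 0
```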

@kouohhashi
Contributor

Hi,
is size() included in master now?

@viraaji

viraaji commented Oct 23, 2020

I'm facing this issue in version 0.2.4... is it fixed now?

@kouohhashi
Contributor

I think it's going to be fixed in ver 0.3.0...

@viraaji

viraaji commented Nov 7, 2020

Hi @Jagoul, were you able to solve the issue with the recommended fix? I'm facing the same error: initially size(-1) is 0; after adding the size method I get "expected to be in range of [-1, 0], but got 1"; and if I use rnn instead of nn I get "got 1D, 2D tensors at".
