GRU / LSTM for federated learning #3010
The RuntimeError is because the `size()` method is not working with syft; see #2201.
I already checked that issue. That's why I'm asking the PySyft team for help; maybe somebody has already tried to solve it.
```python
from syft.frameworks.torch.nn import rnn

self.gru = rnn.GRU(input_dim, hidden_dim, n_layers, batch_first=True, dropout=drop_prob)
```
This uses the GRU/RNN from the syft frameworks, and it works in a federated environment.
I imported the rnn class implemented by PySyft instead of the native PyTorch one, and I got the following error:
```
input dim 7
Starting Training of GRU model
inputs :torch.Size([32, 90, 7])
labels :torch.Size([32, 1])
x inside forward torch.Size([32, 90, 7])
h inside forward torch.Size([2, 32, 128])
```
```
---------------------------------------------------------------------------
PureFrameworkTensorFoundError             Traceback (most recent call last)
~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/syft/frameworks/torch/tensors/interpreters/native.py in handle_func_command(cls, command)
    290             new_args, new_kwargs, new_type, args_type = hook_args.unwrap_args_from_function(
--> 291                 cmd, args, kwargs, return_args_type=True
    292             )

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/syft/generic/frameworks/hook/hook_args.py in unwrap_args_from_function(attr, args, kwargs, return_args_type)
    156     # Try running it
--> 157     new_args = hook_args(args)
    158

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/syft/generic/frameworks/hook/hook_args.py in <lambda>(x)
    349
--> 350     return lambda x: f(lambdas, x)
    351

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/syft/generic/frameworks/hook/hook_args.py in three_fold(lambdas, args, **kwargs)
    527         lambdas[0](args[0], **kwargs),
--> 528         lambdas[1](args[1], **kwargs),
    529         lambdas[2](args[2], **kwargs),

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/syft/generic/frameworks/hook/hook_args.py in <lambda>(i)
    327     # Last if not, rule is probably == 1 so use type to return the right transformation.
--> 328     else lambda i: forward_func[type(i)](i)
    329     for a, r in zip(args, rules)  # And do this for all the args / rules provided

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/syft/frameworks/torch/hook/hook_args.py in <lambda>(i)
     29     if hasattr(i, "child")
---> 30     else (_ for _ in ()).throw(PureFrameworkTensorFoundError),
     31     torch.nn.Parameter: lambda i: i.child

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/syft/frameworks/torch/hook/hook_args.py in <genexpr>(.0)
     29     if hasattr(i, "child")
---> 30     else (_ for _ in ()).throw(PureFrameworkTensorFoundError),
     31     torch.nn.Parameter: lambda i: i.child

PureFrameworkTensorFoundError:

During handling of the above exception, another exception occurred:

IndexError                                Traceback (most recent call last)
<ipython-input-12-156409d28793> in <module>
      1 lr = 0.001
----> 2 gru_model = train(federated_train_loader, lr, model_type="GRU")

<ipython-input-11-043e717bff98> in train(federated_train_loader, learn_rate, hidden_dim, EPOCHS, model_type)
     38         model.send(worker)
     39         model.zero_grad()
---> 40         out, _ = model(inputs.to(device).float(), h)
     41         loss = criterion(out, labels.to(device).float())
     42         loss.backward()

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    539             result = self._slow_forward(*input, **kwargs)
    540         else:
--> 541             result = self.forward(*input, **kwargs)
    542         for hook in self._forward_hooks.values():
    543             hook_result = hook(self, input, result)

<ipython-input-10-080fbca62aca> in forward(self, x, h)
     15         print('x inside forward {}'.format(x.shape))
     16         print('h inside forward {}'.format(h.shape))
---> 17         out, h = self.gru(x, h)
     18         print('out'.format(out[0].shape))
     19         out = self.fc(self.relu(out[:,-1]))

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    539             result = self._slow_forward(*input, **kwargs)
    540         else:
--> 541             result = self.forward(*input, **kwargs)
    542         for hook in self._forward_hooks.values():
    543             hook_result = hook(self, input, result)

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/syft/frameworks/torch/nn/rnn.py in forward(self, x, h)
    240         output = x.new(seq_len, batch_size, self.hidden_size).zero_()
    241         for t in range(seq_len):
--> 242             h_for, c_for = self._apply_time_step(x, h_for, c_for, t)
    243             output[t, :, :] = h_for[-1, :, :]
    244

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/syft/frameworks/torch/nn/rnn.py in _apply_time_step(self, x, h, c, t, reverse_direction)
    326                 )
    327             else:
--> 328                 h_next[layer, :, :] = rnn_layers[layer](x[t, :, :], h[layer, :, :])
    329         else:
    330             if self.is_lstm:

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    539             result = self._slow_forward(*input, **kwargs)
    540         else:
--> 541             result = self.forward(*input, **kwargs)
    542         for hook in self._forward_hooks.values():
    543             hook_result = hook(self, input, result)

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/syft/frameworks/torch/nn/rnn.py in forward(self, x, h)
     95
     96         gate_x = self.fc_xh(x)
---> 97         gate_h = self.fc_hh(h)
     98         x_r, x_z, x_n = gate_x.chunk(self.num_chunks, 1)
     99         h_r, h_z, h_n = gate_h.chunk(self.num_chunks, 1)

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    539             result = self._slow_forward(*input, **kwargs)
    540         else:
--> 541             result = self.forward(*input, **kwargs)
    542         for hook in self._forward_hooks.values():
    543             hook_result = hook(self, input, result)

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/torch/nn/modules/linear.py in forward(self, input)
     85
     86     def forward(self, input):
---> 87         return F.linear(input, self.weight, self.bias)
     88
     89     def extra_repr(self):

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/syft/generic/frameworks/hook/hook.py in overloaded_func(*args, **kwargs)
    599             handle_func_command = syft.framework.Tensor.handle_func_command
    600
--> 601             response = handle_func_command(command)
    602
    603         return response

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/syft/frameworks/torch/tensors/interpreters/native.py in handle_func_command(cls, command)
    330             # in the execute_command function
    331             try:
--> 332                 response = cls._get_response(cmd, args, kwargs)
    333             except AttributeError:
    334             # Change the library path to avoid errors on layers like AvgPooling

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/syft/frameworks/torch/tensors/interpreters/native.py in _get_response(cmd, args, kwargs)
    343         """
    344         if isinstance(args, tuple):
--> 345             response = eval(cmd)(*args, **kwargs)
    346         else:
    347             response = eval(cmd)(args, **kwargs)

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/torch/nn/functional.py in linear(input, weight, bias)
   1368     if input.dim() == 2 and bias is not None:
   1369         # fused op is marginally faster
-> 1370         ret = torch.addmm(bias, input, weight.t())
   1371     else:
   1372         output = input.matmul(weight.t())

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/syft/generic/frameworks/hook/hook.py in overloaded_func(*args, **kwargs)
    599             handle_func_command = syft.framework.Tensor.handle_func_command
    600
--> 601             response = handle_func_command(command)
    602
    603         return response

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/syft/frameworks/torch/tensors/interpreters/native.py in handle_func_command(cls, command)
    330             # in the execute_command function
    331             try:
--> 332                 response = cls._get_response(cmd, args, kwargs)
    333             except AttributeError:
    334             # Change the library path to avoid errors on layers like AvgPooling

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/syft/frameworks/torch/tensors/interpreters/native.py in _get_response(cmd, args, kwargs)
    343         """
    344         if isinstance(args, tuple):
--> 345             response = eval(cmd)(*args, **kwargs)
    346         else:
    347             response = eval(cmd)(args, **kwargs)

IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
```
In case you didn't see it, the error that I got is:
```
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
```
I recall getting an error similar to yours too. It may be related to GPU compatibility for this piece of code. Can you try running it on a CPU instead of a GPU, i.e. removing all references to devices such as `.to(device)`?
I am training on a CPU, and my device assignment looks like this:
Get rid of all movements to devices and use no device at all; the computation will default to the CPU. As I mentioned, try removing the `tensor.to(device)` statements and retry.
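For illustration, applied to the training line visible in the traceback above, the suggested change would be:
```python
# Before: explicit device movement (line 40 of train() in the traceback)
out, _ = model(inputs.to(device).float(), h)

# After: no device handling at all; the computation defaults to the CPU
out, _ = model(inputs.float(), h)
```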
Same thing!!
As @matthiaslau said, this issue comes up because the `size()` method is not implemented in PySyft (#2201). Using the code I posted in #2343, or writing your own `size()` method, would break some functionality of PySyft, but it would get your specific code to work. I also resorted to such a solution to get the forward pass working.
@DanyEle Hey! I want to use LSTMs too and am running into errors due to the missing `size()` method. Your workaround code seems to be deprecated, as some of the files have been moved or renamed. I don't know enough about the PySyft internals; could you give a short guide on where to add the changes in the current master code? I am still getting an error, although I tried setting the size method in various files, inspired by your MR.
Hi, I am not so up to date with the latest PySyft version either, but I believe the only change you would need to make in the code base would be adding a hooked `.size()` method in the `hook.py` file. The idea is that you override (i.e., hook) the basic PyTorch method with this custom version. However, doing so would probably break a few functionalities not required for LSTM training. To sum up, you could try adding code along those lines to that file.
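The snippet referenced above didn't survive in this thread. Purely as an illustration of the idea, a hooked `size()` might look roughly like the sketch below; the `hooked_size` name and the unwrapping loop are assumptions, not the actual code from the MR:
```python
import torch

# Keep a reference to the native method before overriding it.
_native_size = torch.Tensor.size

def hooked_size(tensor, dim=None):
    # Walk down the syft wrapper chain until we reach the tensor
    # that actually carries data, then delegate to the native size().
    while hasattr(tensor, "child"):
        tensor = tensor.child
    return _native_size(tensor) if dim is None else _native_size(tensor, dim)

# Overriding the method globally like this is exactly what may break
# other PySyft functionality, as noted above.
torch.Tensor.size = hooked_size
```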
Hi, I'm facing this issue in version 0.2.4... is it fixed now?
I think it's going to be fixed in version 0.3.0...
Hi @Jagoul, were you able to solve the issue with the recommended fix? I'm facing the same error: initially `size(-1)` is 0, and after adding the size method I get "Dimension out of range (expected to be in range of [-1, 0], but got 1)".
Hello,
I am trying to implement RNN networks in a federated way, using either GRU or LSTM, and I have the following network classes:
```python
import torch.nn as nn

class GRUNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, n_layers, drop_prob=0.2):
        super(GRUNet, self).__init__()
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers
        # Layers as given elsewhere in this thread:
        self.gru = nn.GRU(input_dim, hidden_dim, n_layers, batch_first=True, dropout=drop_prob)
        self.fc = nn.Linear(hidden_dim, output_dim)
        self.relu = nn.ReLU()

class LSTMNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, n_layers, drop_prob=0.2):
        super(LSTMNet, self).__init__()
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers
        # LSTM analogue of the layers above (restored; not shown in the original post):
        self.lstm = nn.LSTM(input_dim, hidden_dim, n_layers, batch_first=True, dropout=drop_prob)
        self.fc = nn.Linear(hidden_dim, output_dim)
        self.relu = nn.ReLU()
```
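The post doesn't show how the hidden state `h` is created, but the printed shape `torch.Size([2, 32, 128])` matches `(n_layers, batch_size, hidden_dim)`. A typical helper method for this (an assumption, not code from the original post) would be:
```python
def init_hidden(self, batch_size):
    # Shape (n_layers, batch_size, hidden_dim) -> here [2, 32, 128]
    weight = next(self.parameters()).data
    return weight.new_zeros(self.n_layers, batch_size, self.hidden_dim)
```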
After initializing and calling the model as follows:
```python
lr = 0.001
gru_model = train(federated_train_loader, lr, model_type="GRU")
```
My training method looks like this:
```python
def train(federated_train_loader, learn_rate, hidden_dim=128, EPOCHS=1, model_type="GRU"):
    ...
```
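Only lines 38-42 of `train` are visible in the traceback below. As a rough, self-contained sketch of what a PySyft 0.2.x federated loop of this shape typically looks like (the virtual workers, toy data, loss, and optimizer here are assumptions for illustration, not the original code):
```python
import torch
import torch.nn as nn
import syft as sy

hook = sy.TorchHook(torch)
alice = sy.VirtualWorker(hook, id="alice")
bob = sy.VirtualWorker(hook, id="bob")

# Toy data shaped like the prints above: 90-step sequences with 7 features.
X, y = torch.randn(64, 90, 7), torch.rand(64, 1)
federated_train_loader = sy.FederatedDataLoader(
    sy.BaseDataset(X, y).federate((alice, bob)), batch_size=32
)

# Assumes GRUNet also defines the forward() shown in the traceback below.
model = GRUNet(input_dim=7, hidden_dim=128, output_dim=1, n_layers=2)
criterion = nn.BCEWithLogitsLoss()                        # assumed loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.001) # assumed optimizer

for inputs, labels in federated_train_loader:
    worker = inputs.location                    # remote worker holding this batch
    h = torch.zeros(2, inputs.shape[0], 128).send(worker)
    model.send(worker)                          # ship the model to the data
    model.zero_grad()
    out, _ = model(inputs.float(), h)           # the errors in this thread occur here
    loss = criterion(out, labels.float())
    loss.backward()
    optimizer.step()
    model.get()                                 # bring the updated weights back
```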
I got the following error:
```
input dim 7
Starting Training of GRU model
inputs :torch.Size([32, 90, 7])
labels :torch.Size([32, 1])
x inside forward torch.Size([32, 90, 7])
h inside forward torch.Size([2, 32, 128])

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>
      1 lr = 0.001
----> 2 gru_model = train(federated_train_loader, lr, model_type="GRU")

<ipython-input> in train(federated_train_loader, learn_rate, hidden_dim, EPOCHS, model_type)
     38         model.send(worker)
     39         model.zero_grad()
---> 40         out, _ = model(inputs.to(device).float(), h)
     41         loss = criterion(out, labels.to(device).float())
     42         loss.backward()

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    539             result = self._slow_forward(*input, **kwargs)
    540         else:
--> 541             result = self.forward(*input, **kwargs)
    542         for hook in self._forward_hooks.values():
    543             hook_result = hook(self, input, result)

<ipython-input> in forward(self, x, h)
     15         print('x inside forward {}'.format(x.shape))
     16         print('h inside forward {}'.format(h.shape))
---> 17         out, h = self.gru(x, h)
     18         print('out'.format(out[0].shape))
     19         out = self.fc(self.relu(out[:,-1]))

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    539             result = self._slow_forward(*input, **kwargs)
    540         else:
--> 541             result = self.forward(*input, **kwargs)
    542         for hook in self._forward_hooks.values():
    543             hook_result = hook(self, input, result)

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/torch/nn/modules/rnn.py in forward(self, input, hx)
    727             return self.forward_packed(input, hx)
    728         else:
--> 729             return self.forward_tensor(input, hx)
    730
    731

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/torch/nn/modules/rnn.py in forward_tensor(self, input, hx)
    719         sorted_indices = None
    720         unsorted_indices = None
--> 721         output, hidden = self.forward_impl(input, hx, batch_sizes, max_batch_size, sorted_indices)
    722         return output, self.permute_hidden(hidden, unsorted_indices)
    723

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/torch/nn/modules/rnn.py in forward_impl(self, input, hx, batch_sizes, max_batch_size, sorted_indices)
    696             hx = self.permute_hidden(hx, sorted_indices)
    697
--> 698         self.check_forward_args(input, hx, batch_sizes)
    699         result = self.run_impl(input, hx, batch_sizes)
    700         output = result[0]

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/torch/nn/modules/rnn.py in check_forward_args(self, input, hidden, batch_sizes)
    168     def check_forward_args(self, input, hidden, batch_sizes):
    169         # type: (Tensor, Tensor, Optional[Tensor]) -> None
--> 170         self.check_input(input, batch_sizes)
    171         expected_hidden_size = self.get_expected_hidden_size(input, batch_sizes)
    172

~/anaconda3/envs/newpytorch/lib/python3.7/site-packages/torch/nn/modules/rnn.py in check_input(self, input, batch_sizes)
    147             raise RuntimeError(
    148                 'input.size(-1) must be equal to input_size. Expected {}, got {}'.format(
--> 149                     self.input_size, input.size(-1)))
    150
    151     def get_expected_hidden_size(self, input, batch_sizes):

RuntimeError: input.size(-1) must be equal to input_size. Expected 7, got 0
```
I searched for this problem, and apparently it is a limitation in PySyft when it comes to LSTMs. Could you please help me debug this problem?
Just a side note: I also tried the handcrafted GRU implemented here: https://github.com/andrelmfarias/Private-AI/blob/master/Federated_Learning/Federated%20learning%20with%20Pysyft%20and%20Pytorch.ipynb, but my laptop almost exploded and I had to kill the process because I never got any output. I got the following error:
```
ValueError: Target and input must have the same number of elements. target nelement (20160) != input nelement (32)
```
How can we overcome this problem?
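A note on that last error (my reading of the numbers, not something stated above): 20160 = 32 × 90 × 7, the whole batch flattened over timesteps and features, while 32 matches one prediction per sequence. A mismatch like this usually means the loss is comparing the per-sequence output against an unsliced target, e.g. (assuming a binary cross-entropy loss, which is itself an assumption here):
```python
import torch
import torch.nn.functional as F

pred = torch.sigmoid(torch.randn(32, 1))  # one prediction per sequence: 32 elements
target = torch.rand(32, 90, 7)            # full sequences: 32 * 90 * 7 = 20160 elements

# On the PyTorch version used in this thread, this raises:
# ValueError: Target and input must have the same number of elements.
F.binary_cross_entropy(pred, target)
```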