Query over the forward function of all models #35

Open
ankitpatnala opened this issue Oct 18, 2022 · 0 comments
ankitpatnala commented Oct 18, 2022

```python
def forward(self, x):
    x = self.inlinear(x)
    x = self.pe(x)
    x = x.transpose(0, 1)  # N x T x D -> T x N x D
    x = self.transformerencoder(x)
    x = x.transpose(0, 1)  # T x N x D -> N x T x D
    x = x.max(1)[0]
    x = self.relu(x)
    logits = self.outlinear(x)
    logprobabilities = F.log_softmax(logits, dim=-1)
    return logprobabilities
```

I wanted to apply your models to my data, and I found that at line 50 of the script `PETransformerModel.py` the `log_softmax` function is applied to the logits, as shown in the `forward` method above.

Secondly, when I looked at your `examples/train.py`, I found that the criterion used is `CrossEntropyLoss`:

```python
criterion = torch.nn.CrossEntropyLoss(reduction="mean")
log = list()
for epoch in range(args.epochs):
    train_loss = train_epoch(model, optimizer, criterion, traindataloader, device)
```

```python
def train_epoch(model, optimizer, criterion, dataloader, device):
    model.train()
    losses = list()
    with tqdm(enumerate(dataloader), total=len(dataloader), leave=True) as iterator:
        for idx, batch in iterator:
            optimizer.zero_grad()
            x, y_true, _ = batch
            loss = criterion(model.forward(x.to(device)), y_true.to(device))
            loss.backward()
            optimizer.step()
            iterator.set_description(f"train loss={loss:.2f}")
            losses.append(loss)
    return torch.stack(losses)
```

So the loss is computed on the `log_softmax` output, which is not recommended by PyTorch, since `nn.CrossEntropyLoss` already applies `log_softmax` internally (https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html). I found that the same is done in the other models too.
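To illustrate what I mean, here is a minimal, self-contained sketch (the tensor shapes and values are made up for the example): `CrossEntropyLoss` on raw logits is equivalent to `NLLLoss` on `log_softmax` outputs, so passing log-probabilities to `CrossEntropyLoss` effectively applies `log_softmax` twice and produces a different loss value.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # hypothetical batch: 4 samples, 10 classes
targets = torch.randint(0, 10, (4,))  # hypothetical class labels
logprobs = F.log_softmax(logits, dim=-1)

# CrossEntropyLoss on raw logits == NLLLoss on log-probabilities
print(F.cross_entropy(logits, targets))  # reference loss
print(F.nll_loss(logprobs, targets))     # same value

# Feeding log-probabilities into CrossEntropyLoss applies log_softmax a second time
print(F.cross_entropy(logprobs, targets))  # different value
```

As far as I understand, the usual pairing is either raw logits with `nn.CrossEntropyLoss`, or `log_softmax` outputs with `nn.NLLLoss`.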

I am not sure whether this was done intentionally, whether it is an implementation bug, or whether it is a misunderstanding on my part. Could you explain before I run my computations?
