
Upgrade PTL to 1.0.2 #1278

Merged: 93 commits into main from upgrade_ptl on Oct 19, 2020
Conversation

@ericharper (Collaborator) commented Oct 12, 2020

Lightning Changes:

NeMo Changes:

Under the hood changes:

  • Getting AttributeError: 'Trainer' object has no attribute 'configure_logger'; this is now Trainer.logger_connector.configure_logger
  • configure_checkpoint_callback -> callback_connector.init_default_checkpoint_callback
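For reference, a quick sanity check of the renamed access paths above (a minimal sketch against pytorch-lightning 1.0.x; it only asserts that the new connector attributes exist, since the exact argument lists of these internal methods are not shown here):

from pytorch_lightning import Trainer

trainer = Trainer()

# In 1.0.x, accessing trainer.configure_logger raises AttributeError.
assert not hasattr(trainer, "configure_logger")

# The equivalents now live on the connectors.
assert hasattr(trainer.logger_connector, "configure_logger")
assert hasattr(trainer.callback_connector, "init_default_checkpoint_callback")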

@okuchaiev (Member)

Let's hold off on this until 1.0.0 is available.

@ericharper changed the title from [WIP] Upgrade PTL to 1.0.0rc* to [WIP] Upgrade PTL to 1.0.0 on Oct 13, 2020



def compute_topk_accuracy(correct_counts, total_counts):
@blisc (Collaborator) commented Oct 13, 2020

This function is used in asr/models/classification_models.py and asr/models/label_models.py

@ericharper (Collaborator, PR author)

We might need to create a functional metric for this case.

@titu1994 (Collaborator) commented Oct 14, 2020

We should not tie such metrics to PTL itself, but keep them general and simply use them in PTL metric wrappers. Otherwise it will become harder to use them outside of PTL contexts.

@ericharper (Collaborator, PR author)

The PTL Metrics API can be used independently of the rest of PTL (no Trainer or LightningModule required). Take a look at this example from their docs:

from pytorch_lightning import metrics

train_accuracy = metrics.Accuracy()
valid_accuracy = metrics.Accuracy(compute_on_step=False)

for epoch in range(epochs):
    for x, y in train_data:
        y_hat = model(x)

        # training step accuracy
        batch_acc = train_accuracy(y_hat, y)

    for x, y in valid_data:
        y_hat = model(x)
        valid_accuracy(y_hat, y)

# total accuracy over all training batches
total_train_accuracy = train_accuracy.compute()

# total accuracy over all validation batches
total_valid_accuracy = valid_accuracy.compute()

Collaborator

Yes, I understand that, but why do we want to turn a basic utility function into something that wraps PTL in any form? Please revert this one and simply call it from a PTL metric class if needed.
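For illustration, a minimal sketch of the pattern being suggested (the TopKAccuracy class and the body of compute_topk_accuracy here are hypothetical, not NeMo's actual code): keep the utility as a plain function and call it from a thin wrapper built on PTL's Metric base class.

import torch
from pytorch_lightning.metrics import Metric


def compute_topk_accuracy(correct_counts, total_counts):
    # plain, framework-agnostic utility (illustrative body)
    return [float(correct) / float(total) for correct, total in zip(correct_counts, total_counts)]


class TopKAccuracy(Metric):
    # thin PTL wrapper: handles state accumulation and DDP sync,
    # and defers the math to the plain utility
    def __init__(self, dist_sync_on_step=False):
        super().__init__(dist_sync_on_step=dist_sync_on_step)
        self.add_state("correct", default=torch.tensor(0), dist_reduce_fx="sum")
        self.add_state("total", default=torch.tensor(0), dist_reduce_fx="sum")

    def update(self, correct_count: torch.Tensor, total_count: torch.Tensor):
        self.correct += correct_count
        self.total += total_count

    def compute(self):
        return compute_topk_accuracy([self.correct], [self.total])[0]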

nemo/utils/exp_manager.py (outdated, resolved)

checkpoint_callback = NeMoModelCheckpoint(
    filepath=Path(log_dir / 'checkpoints' / '{val_loss:.2f}-{epoch}'),
    save_top_k=3,
    monitor='val_loss',
Collaborator

We need to find a way to pass these parameters to the user through the YAML config.

@titu1994 (Collaborator) commented Oct 14, 2020

"val_loss" is not a given - RNNT can optionally disable it cause it is non-trivial cost to compute it over entire datasets (expensive Prediction step and Joint step cost). The default should be loss because that is understood to always be available (since backprop cannot occur without it).

There is also the case where we may choose not to supply a validation set at all (some datasets don't have a validation set, or a user-constructed dataset doesn't have a split for it).

Collaborator

Back in 0.9.0, Lightning used to assume that "val_loss" was the key people returned from their validation step.

While moving in a more general direction makes sense for Lightning, should we do so for NeMo? Or should we just enforce good defaults for model users? If they choose not to use the defaults, then they have to manually adjust their scripts to make use of things like the ModelCheckpoint callback.

Collaborator

So currently ModelPT has no access to ExpManager (it's called before ModelPT), nor can it ensure the list of callbacks is in the correct order to override it at a later point. Even if it could parse the callback list, it can't access any of the path values since exp_manager is a stateless function.

For now, it's fine to keep it val_loss and provide an exp_manager override to choose the monitor. I can override that in the RNNT configs.

Collaborator

Also, good defaults are very model dependent. An experiment manager should offer that flexibility without having to completely override one of its core tasks.

Collaborator

Yes, I think the correct way is to leave "val_loss" as the default and provide an override via its YAML config. I'll think on it more once we merge this PR.
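As a rough sketch of that direction (the helper function below is hypothetical; only filepath, save_top_k, and monitor are actual PTL ModelCheckpoint arguments, and how exp_manager would read the monitor value from YAML is an assumption):

from pathlib import Path

from pytorch_lightning.callbacks import ModelCheckpoint


def make_checkpoint_callback(log_dir, monitor: str = "val_loss", save_top_k: int = 3):
    # "val_loss" stays the default; configs that disable validation loss
    # (e.g. RNNT) would pass monitor="loss" instead.
    return ModelCheckpoint(
        filepath=str(Path(log_dir) / "checkpoints" / ("{" + monitor + ":.2f}-{epoch}")),
        save_top_k=save_top_k,
        monitor=monitor,
    )


default_ckpt = make_checkpoint_callback("/tmp/nemo_exp")
rnnt_ckpt = make_checkpoint_callback("/tmp/nemo_exp", monitor="loss")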


lgtm-com bot commented Oct 13, 2020

This pull request introduces 3 alerts when merging e40b472 into fd98a89 - view on LGTM.com

new alerts:

  • 3 for Unused local variable

lgtm-com bot commented Oct 14, 2020

This pull request introduces 1 alert when merging 36b9f87 into fd98a89 - view on LGTM.com

new alerts:

  • 1 for Unused local variable

lgtm-com bot commented Oct 14, 2020

This pull request introduces 1 alert when merging d6d564d into fd98a89 - view on LGTM.com

new alerts:

  • 1 for Unused local variable

lgtm-com bot commented Oct 14, 2020

This pull request introduces 1 alert when merging ac7102a into fd98a89 - view on LGTM.com

new alerts:

  • 1 for Unused local variable

lgtm-com bot commented Oct 14, 2020

This pull request introduces 1 alert when merging 2a0f459 into fd98a89 - view on LGTM.com

new alerts:

  • 1 for Unused local variable

lgtm-com bot commented Oct 14, 2020

This pull request introduces 1 alert when merging 89c14ca into 87206f7 - view on LGTM.com

new alerts:

  • 1 for Unused local variable

lgtm-com bot commented Oct 14, 2020

This pull request introduces 1 alert when merging 5999ee9 into 87206f7 - view on LGTM.com

new alerts:

  • 1 for Unused local variable

lgtm-com bot commented Oct 14, 2020

This pull request introduces 2 alerts when merging a28a12d into 87206f7 - view on LGTM.com

new alerts:

  • 1 for Unused local variable
  • 1 for Unused import

lgtm-com bot commented Oct 14, 2020

This pull request introduces 2 alerts when merging bf0f265 into 87206f7 - view on LGTM.com

new alerts:

  • 1 for Unused local variable
  • 1 for Unused import

lgtm-com bot commented Oct 14, 2020

This pull request introduces 2 alerts when merging 4752293 into 87206f7 - view on LGTM.com

new alerts:

  • 1 for Unused local variable
  • 1 for Unused import

lgtm-com bot commented Oct 14, 2020

This pull request introduces 2 alerts when merging 1bc44e9 into 87206f7 - view on LGTM.com

new alerts:

  • 1 for Unused local variable
  • 1 for Unused import

lgtm-com bot commented Oct 15, 2020

This pull request introduces 2 alerts when merging a489ee6 into 87206f7 - view on LGTM.com

new alerts:

  • 1 for Unused local variable
  • 1 for Unused import

@@ -148,7 +148,10 @@ def validation_step(self, batch, batch_idx):

        preds = torch.argmax(logits, axis=-1)[subtokens_mask]
        labels = labels[subtokens_mask]
-       tp, fp, fn = self.classification_report(preds, labels)
+       self.classification_report(preds, labels)
Collaborator

@ericharper, can you review the changes to this file? I mostly copied from text_classification_model.py.
Specifically, can you check that the instantiation of self.classification_report makes sense in this file?

Collaborator

What's the difference between ['macro', 'micro', 'weighted']? Should everything just be micro?

Collaborator

'micro' is accuracy; all of these are different ways of aggregating tp, fp, and fn for each label and calculating the final result. 'macro' should be the default.
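For illustration, a small sketch of how the three modes differ, computed from made-up per-class counts (not NeMo's ClassificationReport code):

import torch

# per-class counts for a 3-class problem (made-up numbers)
tp = torch.tensor([50.0, 5.0, 2.0])   # true positives
fp = torch.tensor([10.0, 1.0, 0.0])   # false positives
fn = torch.tensor([5.0, 10.0, 3.0])   # false negatives
support = tp + fn                     # number of true examples per class

precision_per_class = tp / (tp + fp)

# 'micro': pool the counts over all classes first; for single-label multi-class
# data this equals accuracy, since every error is one FP and one FN
micro = tp.sum() / (tp.sum() + fp.sum())

# 'macro': unweighted mean over classes, so rare classes count as much as frequent ones
macro = precision_per_class.mean()

# 'weighted': per-class scores weighted by class support
weighted = (precision_per_class * support).sum() / support.sum()

print(micro.item(), macro.item(), weighted.item())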

Collaborator

ClassificationReport is used in 4 models:

  • IntentSlotClassificationModel
  • TextClassificationModel
  • PunctuationCapitalizationModel
  • TokenClassificationModel

Can I ask for a review of each model so we can approve the changes?

@ericharper (Collaborator, PR author)

Updated all of the above models.

lgtm-com bot commented Oct 15, 2020

This pull request introduces 1 alert when merging 2e91818 into 87206f7 - view on LGTM.com

new alerts:

  • 1 for Unused local variable

Jenkinsfile (outdated)
Comment on lines 483 to 485
// TODO: Pytorch Lightning has some issues with restoring Metric classes, asked on the lightning slack if they can
// provide a simple solution.
// stage('L2: Parallel NLP Examples 2') {
Collaborator

'NER finetuning from pretrained Test' is currently blocked by Lightning-AI/pytorch-lightning#4195


I think you might be able to use Lightning's load_from_checkpoint(..., strict=False) to avoid the key mismatch (it uses load_state_dict(..., strict=False) under the hood).
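A hedged sketch of that workaround (the checkpoint path is a placeholder, and TokenClassificationModel is just one of the affected models listed above):

from nemo.collections.nlp.models import TokenClassificationModel

# strict=False is forwarded to torch's load_state_dict, so missing or
# unexpected keys (e.g. the new Metric state) are skipped instead of raising.
model = TokenClassificationModel.load_from_checkpoint(
    checkpoint_path="path/to/checkpoint.ckpt",  # placeholder path
    strict=False,
)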

@blisc marked this pull request as ready for review on October 16, 2020 17:13
Signed-off-by: ericharper <complex451@gmail.com>
lgtm-com bot commented Oct 17, 2020

This pull request introduces 1 alert when merging 71e8f6f into 910caf6 - view on LGTM.com

new alerts:

  • 1 for Unused import

Signed-off-by: ericharper <complex451@gmail.com>
lgtm-com bot commented Oct 17, 2020

This pull request introduces 1 alert when merging ae46780 into 910caf6 - view on LGTM.com

new alerts:

  • 1 for Unused import

Signed-off-by: ericharper <complex451@gmail.com>
lgtm-com bot commented Oct 18, 2020

This pull request introduces 1 alert when merging ae687b0 into 910caf6 - view on LGTM.com

new alerts:

  • 1 for Unused import

Signed-off-by: ericharper <complex451@gmail.com>
lgtm-com bot commented Oct 18, 2020

This pull request introduces 3 alerts when merging 6139420 into 910caf6 - view on LGTM.com

new alerts:

  • 2 for Unused local variable
  • 1 for Unused import

Signed-off-by: ericharper <complex451@gmail.com>
lgtm-com bot commented Oct 19, 2020

This pull request introduces 3 alerts when merging bbcac81 into 910caf6 - view on LGTM.com

new alerts:

  • 2 for Unused local variable
  • 1 for Unused import

Signed-off-by: ericharper <complex451@gmail.com>
lgtm-com bot commented Oct 19, 2020

This pull request introduces 3 alerts when merging 41e4be1 into 910caf6 - view on LGTM.com

new alerts:

  • 2 for Unused local variable
  • 1 for Unused import

Signed-off-by: ericharper <complex451@gmail.com>
lgtm-com bot commented Oct 19, 2020

This pull request introduces 3 alerts when merging 247975d into 910caf6 - view on LGTM.com

new alerts:

  • 2 for Unused local variable
  • 1 for Unused import

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: ericharper <complex451@gmail.com>
lgtm-com bot commented Oct 19, 2020

This pull request introduces 3 alerts when merging 8175ae7 into 910caf6 - view on LGTM.com

new alerts:

  • 2 for Unused local variable
  • 1 for Unused import

@okuchaiev requested a review from ekmb on October 19, 2020 16:17
@ekmb (Collaborator) left a comment

The classification report changes LGTM.

@@ -139,18 +153,18 @@ def get_precision_recall_f1(
+ '\n'
)

logging.info(report)
# logging.info(report)
Collaborator

Please remove if not needed

@ericharper (Collaborator, PR author)

done

# return {
# 'train_loss': loss,
# 'lr': self._optimizer.param_groups[0]['lr']
# }
Collaborator

Please remove if not needed

@ericharper (Collaborator, PR author)

done

tensorboard_logs = {'val_loss': avg_loss, 'exact_match': exact_match, 'f1': f1}
return {'val_loss': avg_loss, 'log': tensorboard_logs}
# tensorboard_logs = {'val_loss': avg_loss, 'exact_match': exact_match, 'f1': f1}
# return {'val_loss': avg_loss, 'log': tensorboard_logs}
Collaborator

same here

@ericharper (Collaborator, PR author)

done

@@ -80,7 +80,10 @@ def __init__(self, cfg: DictConfig, trainer: Trainer = None):

        self.loss = self.setup_loss(class_balancing=self._cfg.dataset.class_balancing)
        # setup to track metrics
        self.classification_report = ClassificationReport(len(self._cfg.label_ids), label_ids=self._cfg.label_ids)
        # TODO: What is the current mode?
Collaborator

'macro'

@ericharper (Collaborator, PR author)

done, 'macro' is the default

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: ericharper <complex451@gmail.com>
@ericharper requested a review from ekmb on October 19, 2020 17:11
lgtm-com bot commented Oct 19, 2020

This pull request introduces 2 alerts when merging 655fb48 into 910caf6 - view on LGTM.com

new alerts:

  • 2 for Unused local variable

@ericharper merged commit 4561fbf into main on Oct 19, 2020
@ericharper deleted the upgrade_ptl branch on October 19, 2020 17:43