Comet and Transformers #434

Closed
4 of 5 tasks
arkhan19 opened this issue Oct 13, 2021 · 6 comments
@arkhan19

arkhan19 commented Oct 13, 2021

Before Asking:

  • I have searched the Issue Tracker.
  • I have searched the Documentation.

What is your question related to?

  • Comet Python SDK
  • Comet UI
  • Third Party Integrations (Huggingface, TensorboardX, Pytorch Lightning etc.)

What is your question?

I am running Comet with Hugging Face transformers. My issue is the confusing way the configuration is set up. Let me break it down so that everyone is on the same page.

  1. I initially set up Comet with report_to = "comet_ml" in TrainingArguments (which invokes the CometCallback in their library) in their Trainer API. But the issue was that I could no longer access log_confusion_matrix on the experiment object, and two separate experiments were being created. So I removed it and created the experiment object outside so that I could manipulate it as needed.
  2. Now the issue is that I believe the steps recorded for the loss vs. step chart are incorrect. Where are this loss and step being logged from? I didn't set them anywhere. I want to be clear about which values I am logging so that I can draw accurate conclusions.
  3. So I went back to CometCallback, but now I can neither create confusion matrices nor do hyperparameter optimization. So I am stuck creating the experiment object explicitly rather than letting the transformers library auto-log the variables.

Now I would achieve my goals if:

  1. there is a way to know which values are being logged for loss and steps.
  2. there is a way to integrate Comet with transformers better, so that I can log confusion matrices and use the Optimizer properly.

Code

Nothing specific.

What have you tried?

Mentioned above.

@gidim
Contributor

gidim commented Oct 14, 2021

Hi @f3n1xx - thanks for reporting! We're looking into this. There should be a way to get the Experiment object from the CometCallback object.

@dsblank
Collaborator

dsblank commented Oct 14, 2021

@f3n1xx Let's take a look at each of your questions:

  1. When you use Hugging Face's transformers logger (or any other logger for that matter) you can always get access to the Comet Experiment with:
import comet_ml
# ... transformers code here ...
experiment = comet_ml.get_global_experiment()

At that point you can do anything that you would do normally. Of course, a confusion matrix that has already been logged is done. But you could continue to log additional items (and additional confusion matrices too).

  2. Why do you feel that the steps are incorrect? Of course, exactly how a "step" is defined is up to the framework (transformers, in this case). Looking at the transformers CometCallback source code, I see that it uses state.global_step for step and state.epoch for epoch when it logs metrics.
  3. When you say you can't create a confusion matrix, why is that? Is it that the Experiment has ended? If so, you can create an ExistingExperiment() to continue logging.
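
For point 1, a minimal sketch of logging an additional confusion matrix through the experiment that the callback created. The helper names and the label data below are hypothetical; the only Comet call assumed is Experiment.log_confusion_matrix with its matrix, labels, and title arguments. A stand-in object replaces the real experiment so the snippet runs without a Comet account:

```python
def build_confusion_matrix(y_true, y_pred, num_classes):
    """Count (true, predicted) pairs; rows are true labels, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def log_extra_confusion_matrix(experiment, y_true, y_pred, labels, title):
    """Hypothetical helper: build a matrix and hand it to the live experiment."""
    matrix = build_confusion_matrix(y_true, y_pred, len(labels))
    experiment.log_confusion_matrix(matrix=matrix, labels=labels, title=title)
    return matrix


class FakeExperiment:
    """Stand-in for comet_ml.Experiment so the sketch runs offline."""

    def __init__(self):
        self.calls = []

    def log_confusion_matrix(self, **kwargs):
        self.calls.append(kwargs)


# In real use: experiment = comet_ml.get_global_experiment()
experiment = FakeExperiment()
matrix = log_extra_confusion_matrix(
    experiment,
    y_true=[0, 1, 1, 2, 2, 2],
    y_pred=[0, 1, 2, 2, 2, 0],
    labels=["neg", "neu", "pos"],
    title="Validation confusion matrix",
)
```

Passing the experiment in as an argument keeps the helper independent of who created it, whether that was the callback or your own code.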

If you have a specific experiment you would like us to look at (and it is public) could you share the link? Otherwise, please feel free to send the link to support@comet.ml and we can examine it there.

@arkhan19
Author

arkhan19 commented Oct 14, 2021

  1. OK, I understand the first point: I can get the experiment object with get_global_experiment, and I can log a confusion matrix to whichever experiment is live.

  2. Now, how would the experiments in the Optimizer be used, given that the CometCallback code creates the experiment object again (line 614 in integrations.py)?

I understand it may seem like I am asking naive questions, but it's really confusing how everything is set up. Am I missing something? The docs mention that high precedence is given to comet_ml.init(api_key), but if you use CometCallback everything gets overridden, and now I have two different projects (one named "huggingface" and another defined by project_name in the Optimizer) with 4 different experiments running.

@arkhan19
Author

With regard to steps: my confusion arises because the training output shows "Total optimization steps = 3150", while in the Comet interface the loss is charted with respect to steps, but the steps there are in the range of 20K or greater, depending on the batch size I chose.

@dsblank
Collaborator

dsblank commented Oct 14, 2021

@f3n1xx No, these are fine questions! Just be aware that this is one scenario among thousands that we support, and that the logger is actually part of a different project. Now that you are able to get the current, global experiment, it sounds like you are able to work around some issues that you thought you had to solve through the logger.

Now there is a question about how to integrate the Comet Optimizer with transformers. As you have discovered, there is an issue because both the Comet Optimizer and transformers attempt to create experiments. But we have many tools to allow you to work through this. The first is that you can actually use the Comet Optimizer to generate a set of parameters, rather than the experiment itself. For example, you can:

import comet_ml

# define a Comet Optimizer config here:
CONFIG = {...}
optimizer = comet_ml.Optimizer(CONFIG)
for parameters in optimizer.get_parameters():  # dict of parameters
    # ... transformers training here; let it create the experiment ...
    # you can get the global experiment too:
    experiment = comet_ml.get_global_experiment()

Regarding your steps question: I'm not sure what transformers is reporting... perhaps those are epochs? Most machine learning frameworks use batch-updates as the unit for steps. So, 20k steps would be 20k updates of weights. Note that you can switch in Comet's UI between epochs and steps on most charts.
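
To make the unit concrete, here is a rough sketch of how the number of weight updates follows from dataset size, batch size, gradient accumulation, and epochs. The function name and the numbers are made up for illustration (they are not taken from the run discussed in this issue), and the formula is a simplification of what a trainer computes:

```python
import math


def total_optimization_steps(num_examples, batch_size, grad_accum_steps, num_epochs):
    """Approximate number of optimizer updates for a full training run."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    # One optimizer update happens every `grad_accum_steps` batches.
    updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return updates_per_epoch * num_epochs


# Hypothetical run: 10,000 examples, 3 epochs, no gradient accumulation.
for batch_size in (8, 16, 32):
    steps = total_optimization_steps(
        10_000, batch_size, grad_accum_steps=1, num_epochs=3
    )
    print(f"batch_size={batch_size}: {steps} optimization steps")
```

If the step axis of a chart is several times larger than a number computed this way, the values on that axis are probably not counting optimizer updates, so it is worth checking what is passed as step when the metric is logged.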

We also have other tools to help. comet_ml is very flexible, and should allow you to do what you want, especially when combined with a variety of frameworks.

@arkhan19
Author

arkhan19 commented Oct 15, 2021

Thanks for your time, I really appreciate the help. get_parameters can be helpful. I have to say, among all the ML tools, Comet is the most intuitive; please keep working on it.

Now, to address the steps being reported, I am thinking of creating a custom callback so that I can report explicitly what I want. If it works, I will report the results here for anyone who runs into the same issue in the future.
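
A minimal sketch of what such a callback could look like, assuming the on_log(args, state, control, logs=None, **kwargs) hook of transformers.TrainerCallback and the state.global_step / state.epoch fields mentioned earlier in the thread. The class name ExplicitStepCometLogger is hypothetical, and the stand-in objects exist only so the snippet runs without transformers or a Comet account. In real use the class would subclass TrainerCallback, be passed to Trainer(callbacks=[...]), and receive the experiment from comet_ml.get_global_experiment():

```python
from types import SimpleNamespace


class ExplicitStepCometLogger:
    """Hypothetical callback: logs numeric metrics with an explicit step."""

    def __init__(self, experiment):
        self.experiment = experiment

    def on_log(self, args, state, control, logs=None, **kwargs):
        if not logs:
            return
        for name, value in logs.items():
            # Skip anything that is not a plain number (bools included).
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                continue
            # The step and epoch are passed explicitly, so it is clear
            # which x-axis value each metric is charted against.
            self.experiment.log_metric(
                name, value, step=state.global_step, epoch=state.epoch
            )


class FakeExperiment:
    """Stand-in for comet_ml.Experiment so the sketch runs offline."""

    def __init__(self):
        self.logged = []

    def log_metric(self, name, value, step=None, epoch=None):
        self.logged.append((name, value, step, epoch))


experiment = FakeExperiment()
callback = ExplicitStepCometLogger(experiment)
state = SimpleNamespace(global_step=500, epoch=0.48)
callback.on_log(
    args=None,
    state=state,
    control=None,
    logs={"loss": 0.731, "learning_rate": 4.2e-05, "note": "skipped"},
)
```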

Thanks for your help. I will close the issue after I get the callback working.
