Comet and Transformers #434

arkhan19 · 2021-10-13T20:38:08Z

Before Asking:

I have searched the Issue Tracker.
I have searched the Documentation.

What is your question related to?

Comet Python SDK
Comet UI
Third Party Integrations (Huggingface, TensorboardX, Pytorch Lightning etc.)

What is your question?

I am running Comet with Hugging Face transformers. My issue is with the confusing way the configuration is set up. Let me break it down so that everyone is on the same page.

I initially set up Comet with report_to = "comet_ml" in TrainingArguments (which invokes the CometCallback in their library) when using their Trainer API. But the issue was that I could no longer access log_confusion_matrix on the experiment object, and two separate experiments ended up being created. So I removed it and created the experiment object outside so I can manipulate it as needed.
Now the issue is that I am under the impression that the steps recorded for the loss vs. step chart are incorrect. Where are this loss and step being logged from? I didn't set them anywhere. I want to be clear about what values I am logging so that I can make accurate assumptions.
So I went back to CometCallback, and now I can neither create confusion matrices nor do hyperparameter optimization. So I am stuck creating the experiment object explicitly rather than letting the transformers library auto-log the variables.

Now I would achieve my goals if:

there is a way to know which values are being logged with respect to loss and steps.
there is any way to integrate Comet with transformers better so that I can log confusion matrices and use the Optimizer properly.

Code

Nothing specific.

What have you tried?

Mentioned above.

gidim · 2021-10-14T14:23:21Z

Hi @f3n1xx - thanks for reporting! We're looking into this. There should be a way to get the Experiment object from the CometCallback object.

dsblank · 2021-10-14T15:15:32Z

@f3n1xx Let's take a look at each of your questions:

When you use Hugging Face's transformers logger (or any other logger for that matter) you can always get access to the Comet Experiment with:

import comet_ml
# ... transformers code here ...
experiment = comet_ml.get_global_experiment()

At that point you can do anything that you would normally do. Of course, a confusion matrix that has already been logged is final. But you can continue to log additional items (and additional confusion matrices too).
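As a minimal sketch of that idea, the helper below builds a confusion matrix in plain Python and hands it to whichever experiment is live. The helper names, the labels, and the sample predictions are hypothetical; the only Comet call used is experiment.log_confusion_matrix with its matrix and labels arguments, and rows are assumed to be actual classes with columns as predicted classes.

```python
def build_confusion_matrix(y_true, y_pred, num_classes):
    """Count (actual, predicted) pairs; rows are actual, columns are predicted."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


def log_confusion_matrix_if_live(experiment, y_true, y_pred, labels):
    """Log a confusion matrix to `experiment`, or do nothing if there is none.

    `experiment` is expected to be the object returned by
    comet_ml.get_global_experiment(), which can be None when no
    experiment has been created yet.
    """
    if experiment is None:
        return None
    matrix = build_confusion_matrix(y_true, y_pred, len(labels))
    experiment.log_confusion_matrix(matrix=matrix, labels=labels)
    return matrix


# Intended usage after training (requires comet_ml and a live experiment):
#   import comet_ml
#   experiment = comet_ml.get_global_experiment()
#   log_confusion_matrix_if_live(experiment, y_true, y_pred, ["neg", "neu", "pos"])
```

Passing a precomputed matrix keeps the example independent of how predictions are produced; y_true and y_pred here are plain lists of class indices.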

Why do you feel that the steps are incorrect? Of course, exactly how a "step" is defined is up to the framework (transformers, in this case). Looking at the transformer Comet Callback source code, I see that they are using state.global_step for step, and state.epoch for epoch when they log metrics.
When you say you can't create a confusion matrix, why is that? Is it that the Experiment has ended? If so, you can create an ExistingExperiment() to continue logging.

If you have a specific experiment you would like us to look at (and it is public) could you share the link? Otherwise, please feel free to send the link to support@comet.ml and we can examine it there.

arkhan19 · 2021-10-14T17:17:41Z

OK, I understood the first point: I can get the experiment object with get_global_experiment, and I can log a confusion matrix with whichever experiment is live.
Now, how will the experiments in the Optimizer be used? In the CometCallback code, the experiment object is created again (line 614 in integrations.py).

I understand it seems like I am asking naive questions, but it's really very confusing how everything is set up. Am I missing something? The docs mention that high precedence is given to comet_ml.init(api_key), but if you use CometCallback everything gets overridden, and now I have two different projects (one "huggingface" and another defined by project_name in the Optimizer) with 4 different experiments running.

arkhan19 · 2021-10-14T17:20:55Z

With regard to steps: my confusion arises because the training output shows "Total optimization steps = 3150", while in the Comet interface the loss is charted with respect to steps, but the steps are in the range of 20K or greater depending on the batch size I chose.

dsblank · 2021-10-14T18:53:54Z

@f3n1xx No, these are fine questions! Just be aware that this is one scenario among thousands that we support, and that the logger is actually part of a different project. Now that you are able to get the current, global experiment, it sounds like you are able to work around some issues that you thought you had to solve through the logger.

Now there is a question about how to integrate the Comet Optimizer with transformers. As you have discovered, there is an issue because both the Comet Optimizer and transformers attempt to create experiments. But we have many tools to allow you to work through this. The first is that you can use the Comet Optimizer to generate a set of parameters, rather than the experiment itself. For example, you can:

import comet_ml
# Define a Comet Optimizer config here:
CONFIG = {...}
optimizer = comet_ml.Optimizer(CONFIG)
for parameters in optimizer.get_parameters():  # dict of parameters
    # ... transformers training here; let it create the experiment ...
    # You can get the global experiment too:
    experiment = comet_ml.get_global_experiment()
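To make the loop above more concrete, here is a hedged sketch of what a search space and the hand-off to transformers might look like. The parameter names, ranges, and the to_training_kwargs helper are all hypothetical, and the config keys follow the Comet Optimizer format as I understand it, so check them against the Optimizer documentation for your version. The helper accepts either a flat dict of values or a dict that nests them under a "parameters" key, since the exact shape of each yielded item is an assumption here.

```python
# Hypothetical search space; names and ranges are illustrative only.
OPTIMIZER_CONFIG = {
    "algorithm": "bayes",
    "spec": {"metric": "loss", "objective": "minimize"},
    "parameters": {
        "learning_rate": {"type": "float", "min": 1e-5, "max": 5e-5},
        "per_device_train_batch_size": {"type": "discrete", "values": [8, 16, 32]},
    },
}


def to_training_kwargs(suggestion):
    """Turn one optimizer suggestion into keyword arguments for TrainingArguments.

    Works whether the values sit at the top level of `suggestion` or
    under a nested "parameters" key (an assumption, see the text above).
    """
    values = suggestion.get("parameters", suggestion)
    return {
        "learning_rate": float(values["learning_rate"]),
        "per_device_train_batch_size": int(values["per_device_train_batch_size"]),
    }


# Intended usage (requires comet_ml and transformers):
#   optimizer = comet_ml.Optimizer(OPTIMIZER_CONFIG)
#   for suggestion in optimizer.get_parameters():
#       args = TrainingArguments(output_dir="out", **to_training_kwargs(suggestion))
#       ...build a Trainer with `args` and let it create the experiment...
```

Keeping the conversion in one small function means the training code never touches the optimizer object directly, so the experiment is still created by transformers alone.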

Regarding your steps question: I'm not sure what transformers is reporting... perhaps those are epochs? Most machine learning frameworks use batch-updates as the unit for steps. So, 20k steps would be 20k updates of weights. Note that you can switch in Comet's UI between epochs and steps on most charts.
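Since a step is one weight update, the expected number of steps can be estimated from the dataset size and batch settings. The sketch below mirrors how I understand the transformers Trainer derives "Total optimization steps"; the exact rounding may differ between versions. The dataset size of 16,800 examples is a hypothetical figure chosen only because it reproduces the 3150 quoted above with a batch size of 16 and 3 epochs; it is not taken from the actual run.

```python
import math


def total_optimization_steps(num_examples, per_device_batch_size, num_epochs,
                             gradient_accumulation_steps=1, num_devices=1):
    """Estimate the number of weight updates in a training run."""
    # Batches seen per epoch across all devices (last partial batch is kept).
    batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))
    # One weight update happens every `gradient_accumulation_steps` batches.
    updates_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)
    return math.ceil(num_epochs * updates_per_epoch)


print(total_optimization_steps(16_800, 16, 3))  # 3150
print(total_optimization_steps(16_800, 2, 3))   # 25200
```

The second call shows how strongly the step count depends on batch size: the same hypothetical dataset yields eight times as many steps with a batch size of 2. Comparing this estimate against the step axis in the Comet chart is one way to check which run, and which settings, a chart actually belongs to.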

We also have other tools to help. comet_ml is very flexible, and should allow you to do what you want, especially when combined with a variety of frameworks.

arkhan19 · 2021-10-15T08:11:46Z

Thanks for your time, I really appreciate the help. get_parameters can be helpful. Gotta say, among all the ML tools, Comet is the one that's the most intuitive, please keep working on it.

Now, to address the steps being reported, I am thinking of creating a custom callback so that I can report explicitly what I want. If it works, I will report the results here for anyone who runs into the same issues in the future.

Thanks for your help. I will close the issue after I get the callback working.
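A custom callback along those lines might look like the following minimal sketch. The class name ExplicitCometLogger is hypothetical; it relies on the on_log hook of transformers' TrainerCallback and on the experiment's log_metrics method, and it reports state.global_step and state.epoch explicitly so there is no doubt about which values end up on the chart. The import fallback exists only so the sketch can be run without transformers installed.

```python
try:
    from transformers import TrainerCallback
except ImportError:  # fallback so the sketch runs without transformers installed
    TrainerCallback = object


class ExplicitCometLogger(TrainerCallback):
    """Hypothetical callback that forwards Trainer logs with an explicit step."""

    def __init__(self, experiment):
        # `experiment` is any object with a Comet-style log_metrics method,
        # e.g. the result of comet_ml.get_global_experiment().
        self.experiment = experiment

    def on_log(self, args, state, control, logs=None, **kwargs):
        # Only the main process should log, to avoid duplicate points.
        if not logs or not getattr(state, "is_world_process_zero", True):
            return
        # Keep numeric entries only; other values are skipped.
        metrics = {k: v for k, v in logs.items() if isinstance(v, (int, float))}
        self.experiment.log_metrics(
            metrics, step=state.global_step, epoch=state.epoch
        )


# Intended usage (requires comet_ml and transformers):
#   trainer = Trainer(..., callbacks=[ExplicitCometLogger(experiment)])
```

If this callback is used alongside report_to = "comet_ml", the same metrics would be logged twice, so one of the two should be disabled.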

DN6 assigned dsblank Oct 14, 2021

dsblank added the question label Oct 14, 2021

dsblank mentioned this issue Oct 14, 2021

Integration.py huggingface/transformers#14003

Closed

arkhan19 closed this as completed Nov 14, 2021

jeremyarancio mentioned this issue Feb 29, 2024

Experiment not modifiable after training with Transformers Trainer #536

Closed

3 tasks
