
Information Required regarding Patch_module. #175

Closed
Anurich opened this issue Dec 8, 2021 · 10 comments
Labels
enhancement New feature or request

Comments


Anurich commented Dec 8, 2021

Hello, I would like to get some information about patch_module. I am using the baal library with Hugging Face for multi-label classification, with BALD as the heuristic and the model wrapped in patch_module. Before asking the main question, I should mention that I am not using the setup described in the blog post about NLP classification with Hugging Face; I have written my own custom training function, and the only things I use from the library are patch_module and the heuristic for selecting samples.
The problem is that I run the active learning loop 26 times, and for iterations 7 and 12 the results are confusing, as shown below:

---------------------------- Iteration 7 ----------------------------
{
    "'fixed_rate'-F1": 0,
    "'floating_rate'-F1": 1.37,
    "'other'-F1": 62.27,
    "'rates'-F1": 63.56
}

---------------------------- Iteration 6 ----------------------------

{
    "'fixed_rate'-F1": 78.26,
    "'floating_rate'-F1": 78.55,
    "'other'-F1": 79.27,
    "'rates'-F1": 74.03
}

---------------------------- Iteration 5 ----------------------------
{
    "'fixed_rate'-F1": 63.41,
    "'floating_rate'-F1": 77.32,
    "'other'-F1": 78.65,
    "'rates'-F1": 73.76
}

As shown above, in iteration 6 we get fixed_rate F1 = 78.26, and in iteration 7 it suddenly drops to 0.

We also tried training the model without active learning on the exact data split from the iteration where the F1 is 0, and it works fine, which suggests there is no problem with the dataset. But when we add the active learning procedure, the F1 becomes 0. Note that this only happens at iterations 7 and 12. So I am really confused why it works fine for the other iterations and the F1 drops to 0 only for these two. Is it because of patch_module, or because of the way I am using patch_module?

   initial train -> 200
   every time we add 200 more samples to the training set.

   trainer = Trainer(
       model=patch_module(model),
       args=training_args,
       train_dataset=train_dataset if training_args.do_train else None,
       eval_dataset=eval_dataset if training_args.do_eval else None,
       compute_metrics=custom_compute_metrics,  # [ADD] pass the new metric-computation function to the Trainer
       tokenizer=tokenizer,
       data_collator=data_collator,
   )
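
For context, a minimal, self-contained sketch of the sample-selection step we run after each training round, with synthetic MC predictions (the pool size, class count, and number of stochastic passes here are just illustrative placeholders, not our real values; only BALD and its [n_samples, n_classes, n_mc_iterations] input convention come from baal):

import numpy as np
from baal.active.heuristics import BALD

# Synthetic MC-dropout predictions for a pool of 1000 samples,
# 4 classes, 20 stochastic forward passes each.
rng = np.random.default_rng(0)
pool_predictions = rng.random((1000, 4, 20))

heuristic = BALD()
ranked = heuristic(pool_predictions)  # indices ordered from most to least uncertain
to_label = ranked[:200]               # the 200 samples to move from the pool into the train set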

One more question: when we add the special dropout to the model, does it also use the special dropout for predictions at test time, i.e. once the model is fully trained with the active learning procedure? If it does, then we cannot trust a single prediction because it will change every time; if it doesn't, can you let me know whether it is automatically disabled by model.eval() or whether we need to do something else? Sorry for so many questions; please let me know if something is not clear.

Anurich added the enhancement (New feature or request) label Dec 8, 2021
Dref360 (Member) commented Dec 8, 2021

Hi Anupam,

This is strange. I have a few questions:

  1. Do you reset the weights to their original values at every step?
  2. Do you use LR scheduling? You might be hitting a learning rate of 0, so no learning happens. I know that HF is tricky about this.

As for testing, we do Bayesian averaging at test time: we make multiple stochastic predictions and take their average to get the validation performance. I have some code to "unpatch" a module, would that help you? patch_module is not affected by model.eval(), so that when we make predictions we stay fully Bayesian.

I hope that helps :)
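
To illustrate, this is roughly what that averaging looks like if you do it by hand with a patched model (the toy model and the 20 passes are placeholders picked for the example, not verbatim internals):

import torch
from torch import nn
from baal.bayesian.dropout import patch_module

# Toy model; in practice this would be the HF classifier.
model = patch_module(nn.Sequential(nn.Linear(16, 8), nn.Dropout(p=0.5), nn.Linear(8, 4)))
model.eval()  # the patched Dropout keeps sampling even in eval mode

x = torch.randn(32, 16)
with torch.no_grad():
    # Several stochastic forward passes, then average the class probabilities.
    probs = torch.stack([model(x).softmax(dim=-1) for _ in range(20)])
mean_probs = probs.mean(dim=0)  # this average is what gets evaluated at test time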

Anurich (Author) commented Dec 9, 2021

Hello Frédéric Branchaud-Charron,
Thanks for the response. Regarding your questions:

  1. Yes, after adding the newly labelled samples to the labelled set, we start training from scratch.
  2. Yes, we use SchedulerType.LINEAR.

Yes, please, it would be great if you could send me the code for unpatching.
Thanks

Dref360 (Member) commented Dec 9, 2021

Right, so if the schedule is not reset, you might end up with a learning rate close to 0. Can you verify that?
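
A toy illustration of that failure mode (the numbers are made up), showing the LR landing near 0 once a linear schedule that was created only once runs out of steps:

import torch
from transformers import get_linear_schedule_with_warmup

# If the scheduler is created once for N total steps and never reset between
# active learning steps, the LR is already ~0 by the time later steps train.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.AdamW([param], lr=5e-5)
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=0, num_training_steps=100)

for _ in range(100):
    optimizer.step()
    scheduler.step()

print(scheduler.get_last_lr())  # ~[0.0]: any further "training" barely moves the weights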

As for the code, it should be something like this for Dropout. Very similar to patch_module.

from typing import Optional

from baal.bayesian.dropout import Dropout, Dropout2d
from torch import nn


def unpatch_module(module: nn.Module) -> bool:
    """
    Recursively iterate over the children of a module and replace them if
    they are a BaaL dropout layer. This function operates in-place.

    Args:
        module: Module to unpatch dropout layers.

    Returns:
        Flag indicating if a layer was modified.
    """
    changed = False
    for name, child in module.named_children():
        new_module: Optional[nn.Module] = None
        if isinstance(child, Dropout):
            new_module = nn.Dropout(p=child.p, inplace=child.inplace)
        elif isinstance(child, Dropout2d):
            new_module = nn.Dropout2d(p=child.p, inplace=child.inplace)

        if new_module is not None:
            changed = True
            module.add_module(name, new_module)

        # recursively apply to child
        changed |= unpatch_module(child)
    return changed
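
And once active learning is done, hypothetical usage would be something like:

unpatch_module(model)  # swap BaaL dropout layers back to standard nn.Dropout
model.eval()           # eval mode now disables dropout, so predictions are deterministic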

Anurich (Author) commented Dec 9, 2021

Thanks for the answer. We are using W&B (https://wandb.ai), which also logs the training, and we checked the learning rate: it is not 0. So, unfortunately, it's not related to the learning rate.

Dref360 (Member) commented Dec 9, 2021

Right. Is it stuck at an F1 of 0 if you continue labelling? Active learning can be noisy at times.

I would also suggest trying SGD if you are using Adam/RMSProp, as it is more stable.

If you are able to share your code, I could make a more in-depth analysis.

Anurich (Author) commented Dec 13, 2021

Thank you for the answer. No, it is stuck at an F1 of 0 only for those iterations; the next iteration seems fine again. Yes, I will take a look at different optimization approaches. I would like to ask about one more thing, related to MCDropoutConnectModule: I read the blog post on the baal website where you mention that a drop rate of 0.5 gives better results than the 0.9 mentioned in the paper. Did you use 0.5 for both training and inference in the active learning steps, or do you change the rate at inference time?
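
For reference, this is roughly how I understand the wrapping would look; the layers list and the exact argument names are my assumption from the docs, so please correct me if the API differs:

from torch import nn
from baal.bayesian.weight_drop import MCDropoutConnectModule

# Toy model standing in for the real classifier head.
base_model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4))

# Assumed: apply DropConnect (weight dropout) to the listed layer types,
# keeping the same 0.5 rate for both training and inference.
mc_model = MCDropoutConnectModule(base_model, layers=["Linear"], weight_dropout=0.5)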

Dref360 (Member) commented Dec 13, 2021

We did not change the rate between training and inference.

Anurich (Author) commented Dec 14, 2021

Thank you, Frédéric Branchaud-Charron, for taking the time to answer my questions. I really appreciate it.
Thank you.

parmidaatg (Collaborator) commented

Hi @Anurich,
I was following up to see whether your issue is resolved. If it is, would you mind closing the issue? If not, let us know how else we can help :)

parmidaatg (Collaborator) commented

@Dref360 shall we close this?

Anurich closed this as completed Mar 10, 2022