
Information Required regarding Patch_module. #175

Closed
Anurich opened this issue Dec 8, 2021 · 10 comments
Labels
enhancement New feature or request

Comments


Anurich commented Dec 8, 2021

Hello, I would like to get some information about patch_module. I am using the baal library with Hugging Face for multi-label classification, with BALD as the heuristic and the model wrapped in patch_module. Before asking the main question, I should mention that I am not using the setup described in the blog post about NLP classification with Hugging Face; I have written my own custom training function, and the only things I use from the library are patch_module and the heuristic for selecting samples.
The problem is that I run the active learning loop 26 times, and for iterations 7 and 12 the results are confusing, as shown below:

---------------------------- Iteration 7 ----------------------------
{
    "'fixed_rate'-F1": 0,
    "'floating_rate'-F1": 1.37,
    "'other'-F1": 62.27,
    "'rates'-F1": 63.56
}

---------------------------- Iteration 6 ----------------------------

{
    "'fixed_rate'-F1": 78.26,
    "'floating_rate'-F1": 78.55,
    "'other'-F1": 79.27,
    "'rates'-F1": 74.03
}

---------------------------- Iteration 5 ----------------------------
{
    "'fixed_rate'-F1": 63.41,
    "'floating_rate'-F1": 77.32,
    "'other'-F1": 78.65,
    "'rates'-F1": 73.76
}

As shown above, in iteration 6 we get fixed_rate F1 = 78.26, and in iteration 7 it suddenly drops to 0.

We also tried training the model without active learning on the exact data split from the iteration where the F1 is 0, and it works fine, which suggests there is no problem with the dataset. But when we add the active learning procedure, the F1 becomes 0. Note that this only happens at iterations 7 and 12. So I am really confused why it works fine for the other iterations and the F1 drops to 0 only for these two. Is it because of patch_module, or because of the way I am using patch_module?

   initial train -> 200
   every time we add 200 more samples to the training set.

   trainer = Trainer(
       model=patch_module(model),
       args=training_args,
       train_dataset=train_dataset if training_args.do_train else None,
       eval_dataset=eval_dataset if training_args.do_eval else None,
       compute_metrics=custom_compute_metrics,  # [ADD] pass the new metric-computation function to the Trainer
       tokenizer=tokenizer,
       data_collator=data_collator,
   )
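
For context, a minimal, self-contained sketch of the sample-selection step we run after each training round, with synthetic MC predictions (the pool size, class count, and number of stochastic passes here are just illustrative placeholders, not our real values; only BALD and its [n_samples, n_classes, n_mc_iterations] input convention come from baal):

import numpy as np
from baal.active.heuristics import BALD

# Synthetic MC-dropout predictions for a pool of 1000 samples,
# 4 classes, 20 stochastic forward passes each.
rng = np.random.default_rng(0)
pool_predictions = rng.random((1000, 4, 20))

heuristic = BALD()
ranked = heuristic(pool_predictions)  # indices ordered from most to least uncertain
to_label = ranked[:200]               # the 200 samples to move from the pool into the train set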

One more question: when we add the special dropout to the model, does it also use the special dropout for predictions at test time, i.e. once the model is fully trained with the active learning procedure? If it does, then we cannot trust a single prediction because it will change every time; if it doesn't, can you let me know whether it is automatically disabled by model.eval() or whether we need to do something else? Sorry for so many questions; please let me know if something is not clear.

Anurich added the enhancement (New feature or request) label Dec 8, 2021
Dref360 (Member) commented Dec 8, 2021

Hi Anupam,

This is strange. I have a few questions:

  1. Do you reset the weights to their original values at every step?
  2. Do you use LR scheduling? You might be hitting a learning rate of 0, so no learning happens. I know that HF is tricky about this.

As for testing, we do Bayesian averaging at test time: we make multiple stochastic predictions and take their average to get the validation performance. I have some code to "unpatch" a module, would that help you? patch_module is not affected by model.eval(), so that when we make predictions we stay fully Bayesian.

I hope that helps :)
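
To illustrate, this is roughly what that averaging looks like if you do it by hand with a patched model (the toy model and the 20 passes are placeholders picked for the example, not verbatim internals):

import torch
from torch import nn
from baal.bayesian.dropout import patch_module

# Toy model; in practice this would be the HF classifier.
model = patch_module(nn.Sequential(nn.Linear(16, 8), nn.Dropout(p=0.5), nn.Linear(8, 4)))
model.eval()  # the patched Dropout keeps sampling even in eval mode

x = torch.randn(32, 16)
with torch.no_grad():
    # Several stochastic forward passes, then average the class probabilities.
    probs = torch.stack([model(x).softmax(dim=-1) for _ in range(20)])
mean_probs = probs.mean(dim=0)  # this average is what gets evaluated at test time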

Anurich (Author) commented Dec 9, 2021

Hello Frédéric Branchaud-Charron,
Thanks for the response. Regarding your questions:

  1. Yes, after adding the newly labelled samples to the labelled set, we start training from scratch.
  2. Yes, we use SchedulerType.LINEAR.

Yes, please, it would be great if you could send me the code for unpatching.
Thanks

Dref360 (Member) commented Dec 9, 2021

Right, so if the schedule is not reset, you might end up with a learning rate close to 0. Can you verify that?
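
A toy illustration of that failure mode (the numbers are made up), showing the LR landing near 0 once a linear schedule that was created only once runs out of steps:

import torch
from transformers import get_linear_schedule_with_warmup

# If the scheduler is created once for N total steps and never reset between
# active learning steps, the LR is already ~0 by the time later steps train.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.AdamW([param], lr=5e-5)
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=0, num_training_steps=100)

for _ in range(100):
    optimizer.step()
    scheduler.step()

print(scheduler.get_last_lr())  # ~[0.0]: any further "training" barely moves the weights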

As for the code, it should be something like this for Dropout. Very similar to patch_module.

from typing import Optional

from baal.bayesian.dropout import Dropout, Dropout2d
from torch import nn


def unpatch_module(module: nn.Module) -> bool:
    """
    Recursively iterate over the children of a module and replace them if
    they are a BaaL dropout layer. This function operates in-place.

    Args:
        module: Module to unpatch dropout layers.

    Returns:
        Flag indicating if a layer was modified.
    """
    changed = False
    for name, child in module.named_children():
        new_module: Optional[nn.Module] = None
        if isinstance(child, Dropout):
            new_module = nn.Dropout(p=child.p, inplace=child.inplace)
        elif isinstance(child, Dropout2d):
            new_module = nn.Dropout2d(p=child.p, inplace=child.inplace)

        if new_module is not None:
            changed = True
            module.add_module(name, new_module)

        # recursively apply to child
        changed |= unpatch_module(child)
    return changed
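
And once active learning is done, hypothetical usage would be something like:

unpatch_module(model)  # swap BaaL dropout layers back to standard nn.Dropout
model.eval()           # eval mode now disables dropout, so predictions are deterministic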

Anurich (Author) commented Dec 9, 2021

Thanks for the answer. We are using W&B (https://wandb.ai), which also logs the training, and we checked the learning rate: it is not 0. So, unfortunately, it's not related to the learning rate.

Dref360 (Member) commented Dec 9, 2021

Right. Is it stuck at an F1 of 0 if you continue labelling? Active learning can be noisy at times.

I would also suggest trying SGD if you are using Adam/RMSProp, as it is more stable.

If you are able to share your code, I could make a more in-depth analysis.

Anurich (Author) commented Dec 13, 2021

Thank you for the answer. No, it is stuck at an F1 of 0 only for those iterations; the next iteration seems fine again. Yes, I will take a look at different optimization approaches. I would like to ask about one more thing, related to MCDropoutConnectModule: I read the blog post on the baal website where you mention that a drop rate of 0.5 gives better results than the 0.9 mentioned in the paper. Did you use 0.5 for both training and inference in the active learning steps, or do you change the rate at inference time?
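
For reference, this is roughly how I understand the wrapping would look; the layers list and the exact argument names are my assumption from the docs, so please correct me if the API differs:

from torch import nn
from baal.bayesian.weight_drop import MCDropoutConnectModule

# Toy model standing in for the real classifier head.
base_model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4))

# Assumed: apply DropConnect (weight dropout) to the listed layer types,
# keeping the same 0.5 rate for both training and inference.
mc_model = MCDropoutConnectModule(base_model, layers=["Linear"], weight_dropout=0.5)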

Dref360 (Member) commented Dec 13, 2021

We did not change the rate between training and inference.

Anurich (Author) commented Dec 14, 2021

Thank you, Frédéric Branchaud-Charron, for taking the time to answer my questions. I really appreciate it.
Thank you.

parmidaatg (Collaborator) commented

Hi @Anurich,
I was following up to see whether your issue is resolved. If it is, would you mind closing the issue? If not, let us know how else we can help :)

parmidaatg (Collaborator) commented

@Dref360 shall we close this?

Anurich closed this as completed Mar 10, 2022