Regarding Exploit Only Baseline and other errors #2

Closed · aflah02 opened this issue Nov 26, 2023 · 15 comments

aflah02 commented Nov 26, 2023

Hi!
Thanks for the great code!
I was able to run the explore-only baseline fairly easily, but I'm facing some issues with the exploit-only baseline. I'll go over the issues I encountered and how I tried to fix them, step by step.

1. Running the code directly led to an error on this line. I think this is because the attribute is being extracted from training_args, where it does not exist, instead of from args.
2. I made the necessary change, but then I got an error on the same line for the path being None; the default path is None and the bash file does not provide one. To fix this, I added the arg in the bash file, so my bash file now looks like this:
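# Note: SEEDS, AUX_DATASETS, MODELS, TARGET_DATASET, and GPU are assumed to be
# defined earlier in the file (not shown in this excerpt).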
for SEED in ${SEEDS[@]}; do
    for AUX_DATASET in ${AUX_DATASETS[@]}; do
        for MODEL in ${MODELS[@]}; do
        
            OUTPUT_DIR="outputs/train_logs/exploit_only/$SEED/$MODEL/${AUX_DATASET}/${TARGET_DATASET}/1000"
            PRECOMPUTED_WEIGHT_GRAD_SAVE_DIR="outputs/precomputed_weight_grads/exploit_only/$SEED/$MODEL/${AUX_DATASET}/${TARGET_DATASET}/1000"
            mkdir -p $OUTPUT_DIR

            echo $(date)
            echo "Running $SEED $MODEL $AUX_DATASET $TARGET_DATASET exploit only"
            echo "Saving log to ${OUTPUT_DIR}"

            CUDA_VISIBLE_DEVICES=$GPU python src/multirun_train_mixed.py \
                --seed $SEED \
                --target_dataset $TARGET_DATASET \
                --aux_dataset $AUX_DATASET \
                --model $MODEL \
                --weight_initialization_samples 1000 \
                --precomputed_weight_grad_save_dir $PRECOMPUTED_WEIGHT_GRAD_SAVE_DIR \
                > $OUTPUT_DIR/log.log 2> $OUTPUT_DIR/err.log
        done
    done
done
3. After doing this I get this error: Weight save file outputs/precomputed_weight_grads/exploit_only/42/google/t5-xl-lm-adapt/T0Mixture/copa/1000/initial_similarities/1000_copa_T0Mixture_google-t5-xl-lm-adapt_42.json does not exist. I looked into the code and noticed that the error is raised here; however, the file is assumed to exist a priori. How does one create this file? (A reconstruction of the expected path is sketched below.)
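For reference, the expected filename appears to be assembled from the run parameters. Below is a hypothetical reconstruction in Python, inferred purely from the error message above; the actual format string lives in the repo, so treat this as illustrative only:

    import os

    # All values below are taken from the error message; the real format
    # string is defined in the repo, so this is only an inference.
    seed, samples = 42, 1000
    model = "google/t5-xl-lm-adapt"
    aux_dataset, target_dataset = "T0Mixture", "copa"

    save_dir = (f"outputs/precomputed_weight_grads/exploit_only/"
                f"{seed}/{model}/{aux_dataset}/{target_dataset}/{samples}")
    fname = f"{samples}_{target_dataset}_{aux_dataset}_{model.replace('/', '-')}_{seed}.json"
    path = os.path.join(save_dir, "initial_similarities", fname)
    print(path, os.path.exists(path))  # False until the gradients are precomputed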

This also extends to other runs, such as the UCB1 run. There is a command to explicitly create the gradients for all datasets separately; however, based on the README, I thought it would be run implicitly. Do let me know if I need to run it separately.


aflah02 commented Nov 26, 2023

I also have an unrelated question. Why is WEIGHT_INIT_SAMPLES=0 in the all_exp3.sh file? The other files have it set to 1000.

aflah02 changed the title from "Regarding Exploit Only Baseline" to "Regarding Exploit Only Baseline and other errors" Nov 26, 2023

aflah02 commented Nov 26, 2023

Another error I encountered is when I try to run multirun_create_weight_inits.py. I get an error on this line, which says that weight_init_only is not a valid attribute. To fix this, I simply commented it out, since it is indeed not an attribute of the class. But then I get an error here: AttributeError: 'DataParallel' object has no attribute 'name_or_path'. To fix this error, I changed the line to model_name = model.module.name_or_path.replace("/","-").

After all these changes I've been stuck here for quite a while -

[screenshot: stalled progress bar]

The progress bar doesn't seem to be moving, and the end-time estimate is not populated.

Do let me know if you have any suggestions for fixing these in a different way.

EDIT: I checked in after a few hours and it seems to be running (the ETA is roughly 50 hours for T0Mix with the 3B model).

alon-albalak (Owner) commented

> I also have an unrelated question. Why is WEIGHT_INIT_SAMPLES=0 in the all_exp3.sh file? The other files have it set to 1000.

WEIGHT_INIT_SAMPLES is only used for the UCB1 algorithm, which has an explicit reward-initialization phase. See Algorithms 1 and 2 in the paper (https://arxiv.org/pdf/2302.00674.pdf).

In reality, you could also initialize the rewards for EXP3, but we didn't for our experiments.
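For intuition, here is a minimal, self-contained sketch of UCB1 with an explicit reward-initialization phase, the role that WEIGHT_INIT_SAMPLES plays. This is not the repo's implementation; the arm count, reward function, and init_samples parameter are hypothetical stand-ins:

    import math
    import random

    # UCB1 with an explicit reward-initialization phase (cf. Algorithms 1-2
    # in the paper); init_samples stands in for WEIGHT_INIT_SAMPLES.
    def ucb1(n_arms, pull, total_steps, init_samples=1):
        counts = [0] * n_arms
        sums = [0.0] * n_arms

        # Initialization phase: sample every arm init_samples times up front.
        for arm in range(n_arms):
            for _ in range(init_samples):
                sums[arm] += pull(arm)
                counts[arm] += 1

        # Main loop: always pull the arm with the highest upper confidence bound.
        for t in range(sum(counts), total_steps):
            ucb = [sums[a] / counts[a] + math.sqrt(2 * math.log(t + 1) / counts[a])
                   for a in range(n_arms)]
            arm = max(range(n_arms), key=lambda a: ucb[a])
            sums[arm] += pull(arm)
            counts[arm] += 1
        return [s / c for s, c in zip(sums, counts)]

    # Hypothetical usage: three arms with Bernoulli rewards of different means.
    means = [0.2, 0.5, 0.8]
    print(ucb1(3, lambda a: float(random.random() < means[a]), 1000))

Note that with init_samples=0 this sketch would divide by zero, which matches the point above: UCB1 needs the initialization phase, while EXP3 does not.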

alon-albalak (Owner) commented

> Another error I encountered is when I try to run multirun_create_weight_inits.py. I get an error on this line, which says that weight_init_only is not a valid attribute. To fix this, I simply commented it out, since it is indeed not an attribute of the class. But then I get an error here: AttributeError: 'DataParallel' object has no attribute 'name_or_path'. To fix this error, I changed the line to model_name = model.module.name_or_path.replace("/","-").
>
> After all these changes I've been stuck here for quite a while:
>
> [screenshot: stalled progress bar] The progress bar doesn't seem to be moving, and the end-time estimate is not populated.
>
> Do let me know if you have any suggestions for fixing these in a different way.
>
> EDIT: I checked in after a few hours and it seems to be running (the ETA is roughly 50 hours for T0Mix with the 3B model).

Regarding the weight initialization: the script is currently set to run over 2 quantities of weight-initialization samples (this line) and 5 different seeds (this line).

You can reduce those to a single weight-initialization sample quantity and a single seed, which will significantly speed up the initialization.

Thanks for pointing this out, I think I know what the problem is. At one point, I moved the weight initialization into the trainer class (here), but didn't make it compatible with the multirun_train_mixed script.

For now, the solution is to compute the gradients prior to running the Exploit-only method, as you're currently doing, and I will add that information to the instructions. Thank you for finding this bug!

Let me know if it still doesn't work for some reason after pre-computing the gradients.


aflah02 commented Nov 27, 2023

Thanks for the reference, I'll check it out!
For the alignment computation, I did limit it to only one seed and one weight-initialization sample value, but it's still taking around 40 hours on an A100 (80 GB). Also, does the program write any intermediate outputs? It did create a directory, but it's still empty (around 30 hours have passed). Just wanted to confirm.


aflah02 commented Nov 28, 2023

Update: It did not work. I did not save all the logs to a text file (in hindsight I should have), but this is the only log still on the terminal.
Not quite sure what's wrong, though.

I ran this command: python3 src/multirun_create_weight_inits.py --target_dataset $TARGET_DATASET --auxiliary_dataset $AUXILIARY_DATASET

[screenshot: terminal log]

Also, there is this folder, which was created but is empty: FLAD/outputs/weight_inits/T5_LM_3B/T0Mixture/copa/42/1000


alon-albalak commented Nov 28, 2023

Okay, I've made a few fixes and have been able to run the all_exploit.sh script and the multirun_create_weight_inits.py script.

Try pulling the newest version of the code base and let me know if you can run all_exploit.sh and multirun_create_weight_inits.py.


aflah02 commented Nov 28, 2023

Thank you for the quick response! Just to confirm: should I still run the gradient alignment computations first, or can they be run in parallel now?

alon-albalak (Owner) commented

Also, I did catch this error once: AttributeError: 'DataParallel' object has no attribute 'name_or_path'
It has to do with how the model was initialized by Hugging Face. However, after my bug fixes it has disappeared for me. In case it's still there for you, let me know and I'll make changes for that as well.

The solution that I found for that is to change lines 1639-1640 from:

        # Initialize weights if needed
        if self.args.weight_initialization_samples > 0:
            self._initialize_weights(train_dataloader, target_dataloader, model)

to

        # Initialize weights if needed
        if self.args.weight_initialization_samples > 0:
            if hasattr(model, "name_or_path"):
                self._initialize_weights(train_dataloader, target_dataloader, model)
            else:
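                # model is likely wrapped (e.g., in DataParallel), so fall back to the unwrapped self.model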
                self._initialize_weights(train_dataloader, target_dataloader, self.model)

I'm hesitant to make that change unless it's required, though, because it will affect both the EXP3 and UCB1 trainers as well.
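For background on why the attribute disappears, here is a minimal sketch; TinyModel is a hypothetical stand-in, not a class from this repo. nn.DataParallel stores the wrapped model as .module and only resolves parameters, buffers, and submodules through attribute lookup, so plain Python attributes such as name_or_path are not visible on the wrapper:

    import torch.nn as nn

    class TinyModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.name_or_path = "google/t5-xl-lm-adapt"  # plain attribute, as on HF models
            self.linear = nn.Linear(4, 4)

    wrapped = nn.DataParallel(TinyModel())
    # wrapped.name_or_path -> AttributeError: not a parameter, buffer, or submodule
    print(wrapped.module.name_or_path)  # the underlying module still has it

This is why model.module.name_or_path works as a fix, and why hasattr(model, "name_or_path") distinguishes the wrapped case from the unwrapped one.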

alon-albalak (Owner) commented

> Thank you for the quick response! Just to confirm: should I still run the gradient alignment computations first, or can they be run in parallel now?

The all_exploit.sh script should take care of the alignment computations.


aflah02 commented Nov 28, 2023

Got it!
I'll rerun it and update you with the outcome 🙌

alon-albalak (Owner) commented

> Got it! I'll rerun it and update you with the outcome 🙌

Sorry for so much back and forth. When I was debugging the all_exploit.sh script, it was running, so I assumed it was also calculating the alignment, but it actually doesn't. So I went back to compute the alignments with multirun_create_weight_inits.py, and I realized why it was taking such a long time. The weight_init_only flag actually is important; I must have removed its use at some point, so I've added it back in.

Now I've successfully been able to precompute the alignments and then train the exploit-only model.

To replicate our experiments, you do need to run multirun_create_weight_inits.py first; then you can run the all_exploit.sh script. Pull the newest update and let me know if that fixes it for you.


aflah02 commented Nov 28, 2023

Thanks for all the fixes!
I've started the run. The ETA is just 20 minutes; however, I did have to change the lines you mentioned above to fix the DataParallel error.

alon-albalak (Owner) commented

Any update on this? Have you succeeded with the exploit-only baseline?

Have you been able to run either the EXP3 or UCB1 methods?

Just checking to see if you've found any other bugs.


aflah02 commented Dec 1, 2023

Hey,
So I did manage to run the exploit-only baseline after precomputing the gradients. I could run EXP3 without it too, so I haven't retested it. I did not get time to rerun the UCB1 baseline; I'll update you in case I hit any issues there.
In terms of issues, the only one was the DataParallel error, and applying the changes you suggested above fixed it.
So I'm closing this issue, as everything seems to be fine now.
Thanks for all the help!

aflah02 closed this as completed Dec 1, 2023