Regarding Exploit Only Baseline and other errors #2

Closed · aflah02 opened this issue Nov 26, 2023 · 15 comments

aflah02 commented Nov 26, 2023

Hi!
Thanks for the great code!
I was able to run the explore-only baseline fairly easily, but I'm facing some issues with the exploit-only baseline. I'll go over the issues I encountered and how I tried to fix them, step by step.

1. Running the code directly led to an error on this line. I think this is because the attribute is being extracted from training_args, where it does not exist, instead of from args.
2. I made the necessary change, but then I got an error on the same line for the path being None; the default path is None and the bash file does not provide one. To fix this, I added the arg in the bash file, so my bash file now looks like this:
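# Note: SEEDS, AUX_DATASETS, MODELS, TARGET_DATASET, and GPU are assumed to be
# defined earlier in the file (not shown in this excerpt).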
for SEED in ${SEEDS[@]}; do
    for AUX_DATASET in ${AUX_DATASETS[@]}; do
        for MODEL in ${MODELS[@]}; do
        
            OUTPUT_DIR="outputs/train_logs/exploit_only/$SEED/$MODEL/${AUX_DATASET}/${TARGET_DATASET}/1000"
            PRECOMPUTED_WEIGHT_GRAD_SAVE_DIR="outputs/precomputed_weight_grads/exploit_only/$SEED/$MODEL/${AUX_DATASET}/${TARGET_DATASET}/1000"
            mkdir -p $OUTPUT_DIR

            echo $(date)
            echo "Running $SEED $MODEL $AUX_DATASET $TARGET_DATASET exploit only"
            echo "Saving log to ${OUTPUT_DIR}"

            CUDA_VISIBLE_DEVICES=$GPU python src/multirun_train_mixed.py \
                --seed $SEED \
                --target_dataset $TARGET_DATASET \
                --aux_dataset $AUX_DATASET \
                --model $MODEL \
                --weight_initialization_samples 1000 \
                --precomputed_weight_grad_save_dir $PRECOMPUTED_WEIGHT_GRAD_SAVE_DIR \
                > $OUTPUT_DIR/log.log 2> $OUTPUT_DIR/err.log
        done
    done
done
3. After doing this I get this error: Weight save file outputs/precomputed_weight_grads/exploit_only/42/google/t5-xl-lm-adapt/T0Mixture/copa/1000/initial_similarities/1000_copa_T0Mixture_google-t5-xl-lm-adapt_42.json does not exist. I looked into the code and noticed that the error is raised here; however, the file is assumed to exist a priori. How does one create this file? (A reconstruction of the expected path is sketched below.)
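For reference, the expected filename appears to be assembled from the run parameters. Below is a hypothetical reconstruction in Python, inferred purely from the error message above; the actual format string lives in the repo, so treat this as illustrative only:

    import os

    # All values below are taken from the error message; the real format
    # string is defined in the repo, so this is only an inference.
    seed, samples = 42, 1000
    model = "google/t5-xl-lm-adapt"
    aux_dataset, target_dataset = "T0Mixture", "copa"

    save_dir = (f"outputs/precomputed_weight_grads/exploit_only/"
                f"{seed}/{model}/{aux_dataset}/{target_dataset}/{samples}")
    fname = f"{samples}_{target_dataset}_{aux_dataset}_{model.replace('/', '-')}_{seed}.json"
    path = os.path.join(save_dir, "initial_similarities", fname)
    print(path, os.path.exists(path))  # False until the gradients are precomputed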

This also extends to other runs, such as the UCB1 run. There is a command to explicitly create the gradients for all datasets separately; however, based on the README, I thought it would be run implicitly. Do let me know if I need to run it separately.


aflah02 commented Nov 26, 2023

I also have an unrelated question. Why is WEIGHT_INIT_SAMPLES=0 in the all_exp3.sh file? The other files have it set to 1000.

aflah02 changed the title from "Regarding Exploit Only Baseline" to "Regarding Exploit Only Baseline and other errors" Nov 26, 2023

aflah02 commented Nov 26, 2023

Another error I encountered is when I try to run multirun_create_weight_inits.py. I get an error on this line, which says that weight_init_only is not a valid attribute. To fix this, I simply commented it out, since it is indeed not an attribute of the class. But then I get an error here: AttributeError: 'DataParallel' object has no attribute 'name_or_path'. To fix this error, I changed the line to model_name = model.module.name_or_path.replace("/","-").

After all these changes I've been stuck here for quite a while -

[screenshot: stalled progress bar]

The progress bar doesn't seem to be moving, and the end-time estimate is not populated.

Do let me know if you have any suggestions for fixing these in a different way.

EDIT: I checked in after a few hours and it seems to be running (the ETA is roughly 50 hours for T0Mix with the 3B model).

alon-albalak (Owner) commented

> I also have an unrelated question. Why is WEIGHT_INIT_SAMPLES=0 in the all_exp3.sh file? The other files have it set to 1000.

WEIGHT_INIT_SAMPLES is only used for the UCB1 algorithm, which has an explicit reward-initialization phase. See Algorithms 1 and 2 in the paper (https://arxiv.org/pdf/2302.00674.pdf).

In reality, you could also initialize the rewards for EXP3, but we didn't for our experiments.
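For intuition, here is a minimal, self-contained sketch of UCB1 with an explicit reward-initialization phase, the role that WEIGHT_INIT_SAMPLES plays. This is not the repo's implementation; the arm count, reward function, and init_samples parameter are hypothetical stand-ins:

    import math
    import random

    # UCB1 with an explicit reward-initialization phase (cf. Algorithms 1-2
    # in the paper); init_samples stands in for WEIGHT_INIT_SAMPLES.
    def ucb1(n_arms, pull, total_steps, init_samples=1):
        counts = [0] * n_arms
        sums = [0.0] * n_arms

        # Initialization phase: sample every arm init_samples times up front.
        for arm in range(n_arms):
            for _ in range(init_samples):
                sums[arm] += pull(arm)
                counts[arm] += 1

        # Main loop: always pull the arm with the highest upper confidence bound.
        for t in range(sum(counts), total_steps):
            ucb = [sums[a] / counts[a] + math.sqrt(2 * math.log(t + 1) / counts[a])
                   for a in range(n_arms)]
            arm = max(range(n_arms), key=lambda a: ucb[a])
            sums[arm] += pull(arm)
            counts[arm] += 1
        return [s / c for s, c in zip(sums, counts)]

    # Hypothetical usage: three arms with Bernoulli rewards of different means.
    means = [0.2, 0.5, 0.8]
    print(ucb1(3, lambda a: float(random.random() < means[a]), 1000))

Note that with init_samples=0 this sketch would divide by zero, which matches the point above: UCB1 needs the initialization phase, while EXP3 does not.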

alon-albalak (Owner) commented

> Another error I encountered is when I try to run multirun_create_weight_inits.py. I get an error on this line, which says that weight_init_only is not a valid attribute. To fix this, I simply commented it out, since it is indeed not an attribute of the class. But then I get an error here: AttributeError: 'DataParallel' object has no attribute 'name_or_path'. To fix this error, I changed the line to model_name = model.module.name_or_path.replace("/","-").
>
> After all these changes I've been stuck here for quite a while:
>
> [screenshot: stalled progress bar] The progress bar doesn't seem to be moving, and the end-time estimate is not populated.
>
> Do let me know if you have any suggestions for fixing these in a different way.
>
> EDIT: I checked in after a few hours and it seems to be running (the ETA is roughly 50 hours for T0Mix with the 3B model).

Regarding the weight initialization: the script is currently set to run over 2 quantities of weight-initialization samples (this line) and 5 different seeds (this line).

You can reduce those to a single weight-initialization sample quantity and a single seed, which will significantly speed up the initialization.

Thanks for pointing this out, I think I know what the problem is. At one point, I moved the weight initialization into the trainer class (here), but didn't make it compatible with the multirun_train_mixed script.

For now, the solution is to compute the gradients prior to running the Exploit-only method, as you're currently doing, and I will add that information to the instructions. Thank you for finding this bug!

Let me know if it still doesn't work for some reason after pre-computing the gradients.


aflah02 commented Nov 27, 2023

Thanks for the reference, I'll check it out!
For the alignment computation, I did limit it to only one seed and one weight-initialization sample value, but it's still taking around 40 hours on an A100 (80 GB). Also, does the program write any intermediate outputs? It did create a directory, but it's still empty (around 30 hours have passed). Just wanted to confirm.


aflah02 commented Nov 28, 2023

Update: It did not work. I did not save all the logs to a text file (in hindsight I should have), but this is the only log still on the terminal.
Not quite sure what's wrong, though.

I ran this command: python3 src/multirun_create_weight_inits.py --target_dataset $TARGET_DATASET --auxiliary_dataset $AUXILIARY_DATASET

[screenshot: terminal log]

Also, there is this folder, which was created but is empty: FLAD/outputs/weight_inits/T5_LM_3B/T0Mixture/copa/42/1000


alon-albalak commented Nov 28, 2023

Okay, I've made a few fixes and have been able to run the all_exploit.sh script and the multirun_create_weight_inits.py script.

Try pulling the newest version of the code base and let me know if you can run all_exploit.sh and multirun_create_weight_inits.py.


aflah02 commented Nov 28, 2023

Thank you for the quick response! Just to confirm: should I still run the gradient alignment computations first, or can they be run in parallel now?

alon-albalak (Owner) commented

Also, I did catch this error once: AttributeError: 'DataParallel' object has no attribute 'name_or_path'
It has to do with how the model was initialized by Hugging Face. However, after my bug fixes it has disappeared for me. In case it's still there for you, let me know and I'll make changes for that as well.

The solution that I found for that is to change lines 1639-1640 from:

        # Initialize weights if needed
        if self.args.weight_initialization_samples > 0:
            self._initialize_weights(train_dataloader, target_dataloader, model)

to

        # Initialize weights if needed
        if self.args.weight_initialization_samples > 0:
            if hasattr(model, "name_or_path"):
                self._initialize_weights(train_dataloader, target_dataloader, model)
            else:
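                # model is likely wrapped (e.g., in DataParallel), so fall back to the unwrapped self.model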
                self._initialize_weights(train_dataloader, target_dataloader, self.model)

I'm hesitant to make that change unless it's required, though, because it will affect both the EXP3 and UCB1 trainers as well.
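For background on why the attribute disappears, here is a minimal sketch; TinyModel is a hypothetical stand-in, not a class from this repo. nn.DataParallel stores the wrapped model as .module and only resolves parameters, buffers, and submodules through attribute lookup, so plain Python attributes such as name_or_path are not visible on the wrapper:

    import torch.nn as nn

    class TinyModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.name_or_path = "google/t5-xl-lm-adapt"  # plain attribute, as on HF models
            self.linear = nn.Linear(4, 4)

    wrapped = nn.DataParallel(TinyModel())
    # wrapped.name_or_path -> AttributeError: not a parameter, buffer, or submodule
    print(wrapped.module.name_or_path)  # the underlying module still has it

This is why model.module.name_or_path works as a fix, and why hasattr(model, "name_or_path") distinguishes the wrapped case from the unwrapped one.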

alon-albalak (Owner) commented

> Thank you for the quick response! Just to confirm: should I still run the gradient alignment computations first, or can they be run in parallel now?

The all_exploit.sh script should take care of the alignment computations.


aflah02 commented Nov 28, 2023

Got it!
I'll rerun it and update you with the outcome 🙌

alon-albalak (Owner) commented

> Got it! I'll rerun it and update you with the outcome 🙌

Sorry for so much back and forth. When I was debugging the all_exploit.sh script, it was running, so I assumed it was also calculating the alignment, but it actually doesn't. So I went back to compute the alignments with multirun_create_weight_inits.py, and I realized why it was taking such a long time. The weight_init_only flag actually is important; I must have removed its use at some point, so I've added it back in.

Now I've successfully been able to precompute the alignments and then train the exploit-only model.

To replicate our experiments, you do need to run multirun_create_weight_inits.py first; then you can run the all_exploit.sh script. Pull the newest update and let me know if that fixes it for you.


aflah02 commented Nov 28, 2023

Thanks for all the fixes!
I've started the run. The ETA is just 20 minutes; however, I did have to change the lines you mentioned above to fix the DataParallel error.

alon-albalak (Owner) commented

Any update on this? Have you succeeded with the exploit-only baseline?

Have you been able to run either the EXP3 or UCB1 methods?

Just checking to see if you've found any other bugs.


aflah02 commented Dec 1, 2023

Hey,
So I did manage to run the exploit-only baseline after precomputing the gradients. I could run EXP3 without it too, so I haven't retested it. I did not get time to rerun the UCB1 baseline; I'll update you in case I hit any issues there.
In terms of issues, the only one was the DataParallel error, and applying the changes you suggested above fixed it.
So I'm closing this issue, as everything seems to be fine now.
Thanks for all the help!

aflah02 closed this as completed Dec 1, 2023