Very interesting. Glad to see you can get good results with the current setup.
I'm contributing the training curve from fine-tuning another llama2-based model, tulu2-7B, with UltraRM-13B on the ultrafeedback dataset.
The fine-tuned result (in terms of rewards) isn't as high as with other libraries (e.g. EasyLM) under similar hyperparameter settings, and I'm still trying to figure out why.
btw, this is resolved. With just a few minor changes I was able to train well-performing models comparable to our other setups. Great work!