ADaM-BJTU/W2SG

Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning

Jitao Sang, Yuhang Wang, Jing Zhang, Yanxu Zhu, Chao Kong, Junhong Ye, Shuyu Wei and Jinlin Xiao
Department of Computer Science
Beijing Jiaotong University

Our study simulates two phases of superalignment under the W2SG framework: the development of general superhuman models and the progression towards superintelligence. In the first phase, based on human supervision, the quality of weak supervision is enhanced through a combination of scalable oversight and ensemble learning, reducing the capability gap between weak teachers and strong students. In the second phase, an automatic alignment evaluator is employed as the weak supervisor. By recursively updating this auto aligner, the capabilities of the weak teacher models are synchronously enhanced, achieving weak-to-strong supervision over stronger student models.

This project contains experimental code for exploring the first phase in our paper.
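To make the ensemble idea in the first phase concrete, here is a minimal sketch of soft voting over several weak teachers. It is a generic illustration, not the method from the paper: the function names `soft_vote` and `hard_labels`, the binary-label setting, and the 0.5 threshold are all assumptions introduced for this example.

```python
from typing import List, Sequence


def soft_vote(teacher_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average P(label = 1) across weak teachers, one value per example.

    teacher_probs[t][i] is teacher t's predicted probability that
    example i has label 1 (hypothetical layout for this sketch).
    """
    if not teacher_probs:
        raise ValueError("need at least one teacher")
    n_examples = len(teacher_probs[0])
    if any(len(p) != n_examples for p in teacher_probs):
        raise ValueError("all teachers must score the same examples")
    n_teachers = len(teacher_probs)
    return [
        sum(p[i] for p in teacher_probs) / n_teachers
        for i in range(n_examples)
    ]


def hard_labels(soft: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn averaged probabilities into 0/1 labels (threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in soft]


if __name__ == "__main__":
    # Three weak teachers scoring two examples (placeholder numbers).
    probs = [[0.9, 0.2], [0.7, 0.4], [0.8, 0.6]]
    averaged = soft_vote(probs)
    print(averaged, hard_labels(averaged))
```

The averaged probabilities can serve as soft weak labels for the strong student, or be thresholded into hard labels, depending on the training loss used.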

Get Started

Please set up the environment and install the required packages by following the instructions in the weak-to-strong project.

Experimental dataset

The dataset used in our study was sampled from the SciQ dataset (see Raw Data LINK for more details). Our constructed dataset is available at the following LINK.
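The README does not describe the sampling procedure, so the following is only a sketch of one way a reproducible subset could be drawn from SciQ-style records. The field names (`question`, `correct_answer`, `distractor1` to `distractor3`) follow the public SciQ release; the function name `sample_binary_examples`, the seed, and the conversion to binary "is this answer correct?" examples are assumptions made for this illustration.

```python
import random
from typing import Dict, List


def sample_binary_examples(
    records: List[Dict[str, str]], n: int, seed: int = 0
) -> List[Dict[str, object]]:
    """Draw n records without replacement and pair each question with either
    its correct answer (label 1) or a random distractor (label 0)."""
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    chosen = rng.sample(records, n)
    examples = []
    for rec in chosen:
        use_correct = rng.random() < 0.5
        if use_correct:
            answer = rec["correct_answer"]
        else:
            answer = rng.choice(
                [rec["distractor1"], rec["distractor2"], rec["distractor3"]]
            )
        examples.append(
            {"question": rec["question"], "answer": answer, "label": int(use_correct)}
        )
    return examples


if __name__ == "__main__":
    # Placeholder records in the SciQ layout, not real dataset rows.
    toy = [
        {
            "question": f"placeholder question {i}",
            "correct_answer": f"right {i}",
            "distractor1": f"wrong {i}a",
            "distractor2": f"wrong {i}b",
            "distractor3": f"wrong {i}c",
        }
        for i in range(10)
    ]
    for example in sample_binary_examples(toy, n=3, seed=0):
        print(example)
```

Using a dedicated `random.Random(seed)` instance rather than the module-level generator means the same subset is produced regardless of other random calls in the program.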

Code

The code for Sections 3, 4, and 5 of the paper is located in the src directory. The code will be released once it has been organized.

Acknowledgement

  • The open-source project weak-to-strong.
  • Hugging Face for their open-source transformer models.

Citation

@misc{sang2024improving,
      title={Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning}, 
      author={Jitao Sang and Yuhang Wang and Jing Zhang and Yanxu Zhu and Chao Kong and Junhong Ye and Shuyu Wei and Jinlin Xiao},
      year={2024},
      eprint={2402.00667},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
