
Why is GPT-CC significantly lower than Codex? #73

Closed

BitcoinNLPer opened this issue Oct 29, 2021 · 2 comments

Comments

@BitcoinNLPer

These are the Codex results:
[screenshot: Codex results]

The following are the GPT-CC results:
[screenshot: GPT-CC results]

  • Is it caused by the quality of the pre-training corpus data?

Thanks

@Symbolk

Symbolk commented Jan 18, 2022

I'm also curious. As far as I know, the HumanEval dataset contains 164 problems, and according to the latest results in the README, the model does not pass even one of them!

[screenshot: evaluation results table from the README]
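For context, HumanEval pass rates are typically computed with OpenAI's human-eval harness. The sketch below shows the usual sampling loop; `generate_one_completion` is a hypothetical wrapper around whichever model is being evaluated (GPT-CC, Codex, etc.), not part of the harness itself.

```python
# Sketch of generating samples for HumanEval pass@k scoring,
# using OpenAI's human-eval harness (pip install human-eval).
from human_eval.data import read_problems, write_jsonl


def generate_one_completion(prompt: str) -> str:
    """Hypothetical wrapper around the model under test.

    Replace with real model calls; it should return only the generated
    function body, not the echoed prompt.
    """
    raise NotImplementedError


problems = read_problems()   # the 164 HumanEval problems
num_samples_per_task = 1     # 1 sample is enough for pass@1; use more for pass@10/100

samples = [
    dict(task_id=task_id,
         completion=generate_one_completion(problems[task_id]["prompt"]))
    for task_id in problems
    for _ in range(num_samples_per_task)
]
write_jsonl("samples.jsonl", samples)

# Then score the samples with the harness's CLI:
#   evaluate_functional_correctness samples.jsonl
```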

@taisazero
Member

Yes, that's correct. We discovered issues with the pre-training corpus data. We have fixed the issue and released a new pre-training corpus, and we are in the process of processing the dataset further and pre-training a new GPT-CC.

In the meantime, check out these awesome models: https://huggingface.co/spaces/codeparrot/code-generation-models
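As an illustration, the models listed in that space can be loaded directly with transformers. The checkpoint name below (`codeparrot/codeparrot-small`) is just one example from that collection, used here as a placeholder.

```python
# Minimal sketch: sampling a completion from one of the code-generation
# models on the Hugging Face Hub.
from transformers import pipeline

generator = pipeline("text-generation", model="codeparrot/codeparrot-small")

prompt = "def fibonacci(n):\n"
out = generator(prompt, max_new_tokens=64, do_sample=True, temperature=0.2)
print(out[0]["generated_text"])
```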
