Thanks to Xu Liang's team for their work. A few questions about the evaluation details #1
Comments
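Same question, +1. How many questions are there exactly?
Looking forward to the questions from each round being published, so that everyone can contribute together.
If they are released, vendors could cheat lol
Seeing such high human scores, I can tell this project is unreliable.
1) The currently reported scores were obtained in an open-book setting, which is why they are relatively high. We also plan to report scores for a closed-book setting.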
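1) I noticed that in the basic-ability evaluation, the human scores in every category are close to 100. Are there too few questions, or are they too easy?
2) The project says the human score comes from three people using a voting mechanism. What level of expertise do these people have? Also, isn't three people too few?
3) Regarding coding ability in particular: in my own experience, gpt-4 is very strong at writing code and is effectively full-stack, knowing at least a little of every language, which probably no single person can match. Yet in this evaluation the human, gpt-4, and gpt-3.5-turbo scores are identical. Does this mean the questions are not discriminative enough?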