This repository stores Dataset Resources, Evaluation Papers and Detection Tools for ChatGPT.
-
ChatGPT: A Meta-Analysis after 2.5 Months.
Christoph Leiter, Ran Zhang, Yanran Chen, Jonas Belouadi, Daniil Larionov, Vivian Fresen, Steffen Eger. [abs], 2023.2
-
How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection.
Biyang Guo, Xin Zhang, Ziyuan Wang, Minqi Jiang, Jinran Nie, Yuxuan Ding, Jianwei Yue, Yupeng Wu. [abs],[github], 2023.1
-
ChatGPT: Jack of all trades, master of none.
Jan Kocoń, Igor Cichecki, Oliwier Kaszyca, Mateusz Kochanek, Dominika Szydło, Joanna Baran, Julita Bielaniewicz, Marcin Gruza, Arkadiusz Janz, Kamil Kanclerz, Anna Kocoń, Bartłomiej Koptyra, Wiktoria Mieleszczenko-Kowszewicz, Piotr Miłkowski, Marcin Oleksy, Maciej Piasecki, Łukasz Radliński, Konrad Wojtasik, Stanisław Woźniak, Przemysław Kazienko. [abs],[github], 2023.2
-
Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT.
Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao. [abs],[github], 2023.2
-
Is ChatGPT A Good Translator? A Preliminary Study.
Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Xing Wang, Zhaopeng Tu. [abs],[github], 2023.1
-
On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective.
Jindong Wang, Xixu Hu, Wenxin Hou, Hao Chen, Runkai Zheng, Yidong Wang, Linyi Yang, Haojun Huang, Wei Ye, Xiubo Geng, Binxin Jiao, Yue Zhang, Xing Xie. [abs],[github], 2023.2
-
An Independent Evaluation of ChatGPT on Mathematical Word Problems (MWP).
Paulo Shakarian, Abhinav Koyyalamudi, Noel Ngu, Lakshmivihari Mareedu. [abs],[github], 2023.2
Data statistics of these resources:
Paper with Dataset | Task | #Examples |
---|---|---|
How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection | QA + Dialog | 40,000 |
ChatGPT: Jack of all trades, master of none | 25 classification / QA / reasoning tasks | 38,000 |
Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT | Sentiment analysis / Paraphrase / NLI | 475 |
Is ChatGPT A Good Translator? A Preliminary Study | Translation | 5,609 |
On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective | Robustness | 2,237 |
An Independent Evaluation of ChatGPT on Mathematical Word Problems (MWP) | Reasoning | 1,000 |
-
Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT.
Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao. [abs],[github], 2023.2
-
ChatGPT: Jack of all trades, master of none.
Jan Kocoń, Igor Cichecki, Oliwier Kaszyca, Mateusz Kochanek, Dominika Szydło, Joanna Baran, Julita Bielaniewicz, Marcin Gruza, Arkadiusz Janz, Kamil Kanclerz, Anna Kocoń, Bartłomiej Koptyra, Wiktoria Mieleszczenko-Kowszewicz, Piotr Miłkowski, Marcin Oleksy, Maciej Piasecki, Łukasz Radliński, Konrad Wojtasik, Stanisław Woźniak, Przemysław Kazienko. [abs],[github], 2023.2
-
How Robust is GPT-3.5 to Predecessors? A Comprehensive Study on Language Understanding Tasks.
Xuanting Chen, Junjie Ye, Can Zu, Nuo Xu, Rui Zheng, Minlong Peng, Jie Zhou, Tao Gui, Qi Zhang, Xuanjing Huang. [abs], 2023.3
-
Exploring AI Ethics of ChatGPT: A Diagnostic Analysis.
Terry Yue Zhuo, Yujin Huang, Chunyang Chen, Zhenchang Xing. [abs], 2023.2
-
Is ChatGPT better than Human Annotators? Potential and Limitations of ChatGPT in Explaining Implicit Hate Speech.
Fan Huang, Haewoon Kwak, Jisun An. [abs], 2023.2
-
Exploring the Limits of ChatGPT for Query or Aspect-based Text Summarization.
Xianjun Yang, Yan Li, Xinlu Zhang, Haifeng Chen, Wei Cheng. [abs], 2023.2
-
Can ChatGPT Write a Good Boolean Query for Systematic Review Literature Search?
Shuai Wang, Harrisen Scells, Bevan Koopman, Guido Zuccon. [abs], 2023.2
-
ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology Reports.
Katharina Jeblick, Balthasar Schachtner, Jakob Dexl, Andreas Mittermeier, Anna Theresa Stüber, Johanna Topalis, Tobias Weber, Philipp Wesp, Bastian Sabel, Jens Ricke, Michael Ingrisch. [abs], 2022.12
-
Cross-Lingual Summarization via ChatGPT.
Jiaan Wang, Yunlong Liang, Fandong Meng, Zhixu Li, Jianfeng Qu, Jie Zhou. [abs], 2023.2
-
Mathematical Capabilities of ChatGPT.
Simon Frieder, Luca Pinchetti, Ryan-Rhys Griffiths, Tommaso Salvatori, Thomas Lukasiewicz, Philipp Christian Petersen, Alexis Chevalier, Julius Berner. [abs], 2023.1
-
Is ChatGPT a General-Purpose Natural Language Processing Task Solver?
Chengwei Qin, Aston Zhang, Zhuosheng Zhang, Jiaao Chen, Michihiro Yasunaga, Diyi Yang. [abs], 2023.2
-
A Categorical Archive of ChatGPT Failures.
Ali Borji. [abs], 2023.2
-
An Independent Evaluation of ChatGPT on Mathematical Word Problems (MWP).
Paulo Shakarian, Abhinav Koyyalamudi, Noel Ngu, Lakshmivihari Mareedu. [abs],[github], 2023.2
-
A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity.
Yejin Bang, Samuel Cahyawijaya, Nayeon Lee, Wenliang Dai, Dan Su, Bryan Wilie, Holy Lovenia, Ziwei Ji, Tiezheng Yu, Willy Chung, Quyet V. Do, Yan Xu, Pascale Fung. [abs], 2023.2
-
A Pilot Evaluation of ChatGPT and DALL-E 2 on Decision Making and Spatial Reasoning.
Zhisheng Tang, Mayank Kejriwal. [abs], 2023.2
-
Zero-Shot Information Extraction via Chatting with ChatGPT.
Xiang Wei, Xingyu Cui, Ning Cheng, Xiaobin Wang, Xin Zhang, Shen Huang, Pengjun Xie, Jinan Xu, Yufeng Chen, Meishan Zhang, Yong Jiang, Wenjuan Han. [abs],[github],[demo], 2023.2
-
ChatGPT: The End of Online Exam Integrity?
Teo Susnjak. [abs], 2022.12
-
ChatGPT: Bullshit spewer or the end of traditional assessments in higher education?
Jürgen Rudolph, Samson Tan, Shannon Tan. [pdf], 2023.1
-
Will ChatGPT get you caught? Rethinking of Plagiarism Detection.
Mohammad Khalil, Erkan Er. [abs], 2023.2
-
How Does ChatGPT Perform on the Medical Licensing Exams? The Implications of Large Language Models for Medical Education and Knowledge Assessment.
Aidan Gilson, Conrad Safranek, Thomas Huang, Vimig Socrates, Ling Chi, R. Andrew Taylor, David Chartash. [pdf], 2022.12
-
Evaluating ChatGPT as an Adjunct for Radiologic Decision-Making.
Arya Rao, John Kim, Meghana Kamineni, Michael Pang, Winston Lie, Marc D. Succi. [pdf], 2023.2
-
Dr ChatGPT, tell me what I want to hear: How prompt knowledge impacts health answer correctness.
Guido Zuccon, Bevan Koopman. [abs], 2023.2
-
ChatGPT Goes to Law School.
Teo Susnjak. [abs], 2023
-
DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature.
Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, Chelsea Finn. [abs],[demo], 2023.1
-
GPTScore: Evaluate as You Desire.
Jinlan Fu, See-Kiong Ng, Zhengbao Jiang, Pengfei Liu. [abs],[github], 2023.2
-
MAUVE Scores for Generative Models: Theory and Practice.
Krishna Pillutla, Lang Liu, John Thickstun, Sean Welleck, Swabha Swayamdipta, Rowan Zellers, Sewoong Oh, Yejin Choi, Zaid Harchaoui. [abs], 2022.12
-
Large Language Models Are State-of-the-Art Evaluators of Translation Quality.
-
AI vs. Human -- Differentiation Analysis of Scientific Content Generation.
Yongqiang Ma, Jiawei Liu, Fan Yi, Qikai Cheng, Yong Huang, Wei Lu, Xiaozhong Liu. [abs], 2023.1
- Hello-SimpleAI ChatGPT Detector: An open-source detection project consisting of three models for detecting ChatGPT-generated text: a QA version, a Single-text version and a Linguistic version.
- GPTZero: A demo for detecting writing generated by ChatGPT. Its creator built it as a safeguard after seeing students use the technology to cheat on assignments.
- OpenAI Classifier: A classifier fine-tuned on a dataset of pairs of human-written text and AI-written text on the same topic.
- Contentatscale AI Content Detector: A tool that gives the submitted text a Human or AI Content score and provides a probability for each sentence.
- Writers AI Content Detector: A tool similar to Contentatscale. It takes either a page URL or text and calculates a “Human-Generated Content” score.
Statistics of these tools:
Tool | Detection Target | Language | Input Range (# characters) |
---|---|---|---|
Hello-SimpleAI ChatGPT Detector | ChatGPT | en/zh | (0, ~1,500] (512 tokens) |
GPTZero | LLM | en | (250, ∞) |
OpenAI Classifier | LLM | en | (0, ∞) |
Contentatscale AI Content Detector | AI Content (NLP+SERP) | en | (0, 25,000] |
Writers AI Content Detector | AI Content | en | (0, 1,500] |
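
As a rough illustration of how the input ranges in the table above might be used, the sketch below reports which detectors would accept a text of a given length. The bounds are copied from the table; `INPUT_RANGES` and `tools_accepting` are hypothetical names introduced here, the ~1,500 bound for Hello-SimpleAI is approximate (the real limit is 512 tokens), and the code does not call any of the tools.

```python
import math

# Character ranges copied from the table above, read as (low, high]:
# the lower bound is exclusive and the upper bound inclusive.
# math.inf marks tools with no stated upper limit.
# Hypothetical helper data; not part of any detector's API.
INPUT_RANGES = {
    "Hello-SimpleAI ChatGPT Detector": (0, 1500),  # approximate, tied to 512 tokens
    "GPTZero": (250, math.inf),
    "OpenAI Classifier": (0, math.inf),
    "Contentatscale AI Content Detector": (0, 25_000),
    "Writers AI Content Detector": (0, 1500),
}


def tools_accepting(text: str) -> list[str]:
    """Return the tools whose listed input range contains len(text)."""
    n = len(text)
    return [
        name
        for name, (low, high) in INPUT_RANGES.items()
        if low < n <= high
    ]


if __name__ == "__main__":
    sample = "x" * 2000  # a 2,000-character placeholder text
    for tool in tools_accepting(sample):
        print(tool)
```

Character counts only approximate token limits, so a text near the Hello-SimpleAI bound may still be truncated by that tool.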