Problems in the data #8

Open
qhpeklh5959 opened this issue Jan 17, 2020 · 2 comments
@qhpeklh5959

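I spot-checked a sample of the evaluation data. The opening book-title mark 《 at the start of a sentence gets stripped out. This is very common, for example: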
{"text": "第一财经日报》记者分析。", "label": {"book": {"第一财经日报》": [[0, 6]]}, "position": {"记者": [[7, 8]]}}}
{"text": "吴三桂演义》小说的想像,说是为牛金星所毒杀。……在小说中加插一些历史背景,", "label": {"book": {"吴三桂演义》": [[0, 5]]}, "name": {"牛金星": [[15, 17]]}}}

There is one labeling error in the dev set:
{"text": "客场1-3惨败于西西里岛。失去小曼奇尼和托蒂缺阵的情况下,整个阵形比较混乱,季初低走在所难免。", "label": {"address": {"西西里岛": [[8, 11]]}, "name": {"小曼奇尼": [[15, 18]], "托蒂": [[20, 21]]}}}
西西里岛 (Sicily) should be labeled as `organization`, not `address`.

@qhpeklh5959
Author

qhpeklh5959 commented Jan 17, 2020

{"text": "剧情:《现代启示录》", "label": {"game": {"《现代启示录》": [[3, 9]]}}}
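This record is from the train set. With data like this there is no way to tell whether the entity is a `movie` or a `game` (《现代启示录》, Apocalypse Now, is more widely known as a film).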

{"text": "据DOTA2官方消息透露,", "label": {"game": {"DOTA2": [[1, 5]]}}}
{"text": "“去年我们凭借《现代战争1》大获成功,其辉煌业绩让众多业界老手大跌眼镜。", "label": {"game": {"《现代战争1》": [[7, 13]]}}}
With records like these, the fine-grained category of a creative-work entity likewise cannot be determined.
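One way to surface candidates for this kind of ambiguity is to list entity strings that carry more than one label across a file. This is only a rough sketch under the same one-JSON-object-per-line assumption; `conflicting_entities` is a hypothetical helper, and the second sample record below is invented purely to show a conflict (it is not taken from the dataset).

```python
import json
from collections import defaultdict


def conflicting_entities(lines):
    """Map each entity string to its labels, keeping only entities
    that appear under more than one label."""
    seen = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for label, entities in record.get("label", {}).items():
            for entity in entities:
                seen[entity].add(label)
    return {e: sorted(labels) for e, labels in seen.items() if len(labels) > 1}


sample = [
    # Quoted from this issue (train set).
    '{"text": "剧情:《现代启示录》", "label": {"game": {"《现代启示录》": [[3, 9]]}}}',
    # HYPOTHETICAL record, invented for illustration only.
    '{"text": "电影《现代启示录》", "label": {"movie": {"《现代启示录》": [[2, 8]]}}}',
]

print(conflicting_entities(sample))
```

A check like this only finds entities that are labeled inconsistently within the file; a single record whose context is too short to decide, such as the ones quoted above, would still need manual review.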

Also, the readme shows the model performing far better than the human baseline. What is going on there?

@ConnieTong

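The opening book-title mark is missing at the start of those sentences because the original real-world data is like that. We try to preserve the authenticity of the data, so we do not add the mark by hand. The annotation guideline is that an entity must still be labeled even if only one of the two book-title marks is present.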
