[WIP] CoNLL-2003 dataset -Ner, Chunk and POS tagger #170

Closed
wants to merge 37 commits into from
Changes from 2 commits
Commits
37 commits
d97c6b2
conll basic format created
sbmaruf Jun 10, 2021
33c58b0
chunk and pos template updated.
sbmaruf Jun 10, 2021
6539b83
lable->label, code simplification
sbmaruf Jun 11, 2021
dc8d60e
prompt with random lable.
sbmaruf Jun 11, 2021
e053759
adding zip instead of nested loop
sbmaruf Jun 12, 2021
fe7e368
prompt nlu_evaluation_data (#132)
zaidalyafeai Jun 10, 2021
cf73ab9
added templates for google_wellformed_query (#156)
josrozen Jun 10, 2021
4af5b83
Added pubmed_qa prompts (#157)
jason-fries Jun 10, 2021
e2b2a05
Add neural_code_search_prompts (#159)
thomasw21 Jun 10, 2021
41ba7a2
Add choice filter (#149)
craffel Jun 10, 2021
2147b57
Prompts for esnli (#147)
elsanns Jun 10, 2021
06a40fe
Prompts for `quoref` (#88)
manandey Jun 10, 2021
93a8ae7
copying prompts to `winogrande` subsets (#89)
manandey Jun 10, 2021
c426182
Add workaround for error in template highlighting. (#175)
stephenbach Jun 10, 2021
d76181f
first commit (#177)
zaidalyafeai Jun 10, 2021
17dc003
Add a new task template field for fields corresponding to the origina…
srush Jun 10, 2021
ea09839
Add templates for the MC-TACO dataset (#151)
abheesht17 Jun 11, 2021
50c2ca9
Add template for snips_built_in_intents (#154)
trishalaneeraj Jun 11, 2021
d4571a3
Added templates for selqa dataset (#158)
rbawden Jun 11, 2021
083ab21
Squad v2 (#129)
Jun 11, 2021
823e4f0
Add MDD templates (#146)
gchhablani Jun 11, 2021
2848d76
Templates for `ncbi_disease` (#176)
drugilsberg Jun 11, 2021
580612d
Add social_i_qa prompts (#161)
thomasw21 Jun 11, 2021
0734967
Templates for stsb_multi_mt_en (#160)
NohTow Jun 11, 2021
65da59f
Common gen (#165)
nihalnayak Jun 11, 2021
058f1ff
Add choice filter for duorc templates (#183)
gchhablani Jun 11, 2021
015606b
wiki split (#130)
arunraja-hub Jun 11, 2021
84a4b47
Add app_reviews templates (#148)
gchhablani Jun 11, 2021
d965737
Prompts for quartz (#185)
elsanns Jun 11, 2021
115cfc3
Add scientific_papers templates (#179)
gchhablani Jun 11, 2021
3e200fc
longer timeout for extremely big datasets
VictorSanh Jun 11, 2021
1f71910
Templates for spider (#162)
NohTow Jun 11, 2021
3911d6c
Prompts for art more natural punctuation (#167)
elsanns Jun 11, 2021
6a967d7
Adding ASNQ (#202)
patrick-s-h-lewis Jun 12, 2021
23700ac
Add Open-Domain QA (without psg description) (#204)
sbmaruf Jun 12, 2021
5d6f8ed
bug-fixes, task template checkbox updated (#205)
sbmaruf Jun 12, 2021
c5979ca
newspop template (#186)
debajyotidatta Jun 12, 2021
124 changes: 124 additions & 0 deletions templates/conll2003/templates.yaml
@@ -0,0 +1,124 @@
dataset: conll2003
templates:
3b777e30-bdd1-4c4a-b5d9-3372c7a756f4: !Template
id: 3b777e30-bdd1-4c4a-b5d9-3372c7a756f4
jinja: "{% set _ner_lable_dict = ({\n0:\"O\",\n1:\"B-PER\",\n2:\"I-PER\",\n3:\"\
B-ORG\",\n4:\"I-ORG\",\n5:\"B-LOC\",\n6:\"I-LOC\",\n7:\"B-MISC\",\n8:\"I-MISC\"\
\n}) %}\nGenerate named entities from the following sentence. \n{{\"\"}}\n{%\
\ for i in tokens -%}\n {{- \" \" if not loop.last else \"\" -}}\n {{\
\ i }}\n{% endfor %} \n|||\n{% set flag = 0 %}\n{% set outer_cnt = namespace(value=0)\
\ -%}\n{% for tok in tokens -%}\n {% set inner_cnt = namespace(value=0) -%}\n\
Collaborator

This looks very complex. Is there a reason you can't use the loop index?

https://stackoverflow.com/questions/1567291/get-loop-index-of-outer-loop

Contributor Author

Done.

\ {% for lable in ner_tags -%}\n {% if outer_cnt.value == inner_cnt.value\
\ -%}\n {% if flag != 0 -%}\n  \n \
\ {% endif -%}\n {% set flag = 1 -%}\n {{tok}}:{{ _ner_lable_dict[lable]\
\ }}\n {% endif -%}\n {% set inner_cnt.value = inner_cnt.value\
\ + 1 -%}\n {% endfor -%} \n {% set outer_cnt.value = outer_cnt.value\
\ + 1 -%}\n{% endfor -%} \n\n"
name: ner_flat_question_without_label
reference: Natural question
6345fd1a-5272-4a8d-b5b2-02c1d837e521: !Template
id: 6345fd1a-5272-4a8d-b5b2-02c1d837e521
jinja: "{% set _pos_lable_dict = ({\n0:'\"',\n1:\"''\",\n2:\"#\",\n3:\"\\$\",\n\
4:\"(\",\n5:\")\",\n6:\",\",\n7:\".\",\n8:\":\",\n9:\"``\",\n10:\"CC\",\n11:\"\
CD\",\n12:\"DT\",\n13:\"EX\",\n14:\"FW\",\n15:\"IN\",\n16:\"JJ\",\n17:\"JJR\"\
,\n18:\"JJS\",\n19:\"LS\",\n20:\"MD\",\n21:\"NN\",\n22:\"NNP\",\n23:\"NNPS\"\
,\n24:\"NNS\",\n25:\"NN|SYM\",\n26:\"PDT\",\n27:\"POS\",\n28:\"PRP\",\n29:\"\
PRP\\$\",\n30:\"RB\",\n31:\"RBR\",\n32:\"RBS\",\n33:\"RP\",\n34:\"SYM\",\n35:\"\
TO\",\n36:\"UH\",\n37:\"VB\",\n38:\"VBD\",\n39:\"VBG\",\n40:\"VBN\",\n41:\"\
VBP\",\n42:\"VBZ\",\n43:\"WDT\",\n44:\"WP\",\n45:\"WP$\",\n46:\"WRB\"\n}) %}\n\
\n\nGenerate parts of speech from the following sentence. \n{% set flag = 0\
\ %}\n{% for i in tokens -%}\n {% if flag != 0 -%}\n  \n {%\
\ endif -%}\n {% set flag = 1 -%}\n {{ i }}\n{% endfor %} \n|||\n{% set\
\ flag = 0 %}\n{% set outer_cnt = namespace(value=0) -%}\n{% for tok in tokens\
\ -%}\n {% set inner_cnt = namespace(value=0) -%}\n {% for lable in pos_tags\
\ -%}\n {% if outer_cnt.value == inner_cnt.value -%}\n {%\
\ if flag != 0 -%}\n  \n {% endif -%}\n \
\ {% set flag = 1 -%}\n {{tok}}:{{ _pos_lable_dict[lable] }}\n\
\ {% endif -%}\n {% set inner_cnt.value = inner_cnt.value + 1\
\ -%}\n {% endfor -%} \n {% set outer_cnt.value = outer_cnt.value + 1\
\ -%}\n{% endfor -%} "
name: pos_flat_question_without_label
reference: Natural Question
71eadc14-9533-43a8-82dc-6d6a171119ef: !Template
id: 71eadc14-9533-43a8-82dc-6d6a171119ef
jinja: "{% set _chunk_lable_dict = ({\n0:\"O\",\n1:\"B-ADJP\",\n2:\"I-ADJP\",\n\
3:\"B-ADVP\",\n4:\"I-ADVP\",\n5:\"B-CONJP\",\n6:\"I-CONJP\",\n7:\"B-INTJ\",\n\
8:\"I-INTJ\",\n9:\"B-LST\",\n10:\"I-LST\",\n11:\"B-NP\",\n12:\"I-NP\",\n13:\"\
B-PP\",\n14:\"I-PP\",\n15:\"B-PRT\",\n16:\"I-PRT\",\n17:\"B-SBAR\",\n18:\"I-SBAR\"\
,\n19:\"B-UCP\",\n20:\"I-UCP\",\n21:\"B-VP\",\n22:\"I-VP\"\n}) %}\nGenerate\
\ chunk tag from the following sentence. The chunk tags are\n{% for k,v in _chunk_lable_dict.items()\
\ -%}\n {{ v }}\n {{- \", \" if not loop.last else \"\" -}}\n {{- \"\
\\n\" if loop.last -}}\n{% endfor %} \n{% for i in tokens -%}\n {{- \" \"\
\ if not loop.last else \"\" -}}\n {{ i }}\n{% endfor %} \n|||\n{% set flag\
\ = 0 %}\n{% set outer_cnt = namespace(value=0) -%}\n{% for tok in tokens -%}\n\
\ {% set inner_cnt = namespace(value=0) -%}\n {% for lable in chunk_tags
\ -%}\n {% if outer_cnt.value == inner_cnt.value -%}\n {%\
\ if flag != 0 -%}\n  \n {% endif -%}\n \
\ {% set flag = 1 -%}\n {{tok}}:{{ _chunk_lable_dict[lable]\
\ }}\n {% endif -%}\n {% set inner_cnt.value = inner_cnt.value\
\ + 1 -%}\n {% endfor -%} \n {% set outer_cnt.value = outer_cnt.value\
\ + 1 -%}\n{% endfor -%} \n\n"
name: chunk_flat_question_with_label
reference: Natural Question
8613a9bb-94c7-4b91-998c-3c298087d719: !Template
id: 8613a9bb-94c7-4b91-998c-3c298087d719
jinja: "{% set _ner_lable_dict = ({\n0:\"O\",\n1:\"B-PER\",\n2:\"I-PER\",\n3:\"\
B-ORG\",\n4:\"I-ORG\",\n5:\"B-LOC\",\n6:\"I-LOC\",\n7:\"B-MISC\",\n8:\"I-MISC\"\
\n}) %}\nGenerate named entities from the following sentence. The named entities\
\ are\n{% for k,v in _ner_lable_dict.items() -%}\n {{ v }}\n {{- \", \"\
\ if not loop.last else \"\" -}}\n {{- \"\\n\" if loop.last -}}\n{% endfor\
\ %} \n{% for i in tokens -%}\n {{- \" \" if not loop.last else \"\" -}}\n\
\ {{ i }}\n{% endfor %} \n|||\n{% set flag = 0 %}\n{% set outer_cnt = namespace(value=0)\
\ -%}\n{% for tok in tokens -%}\n {% set inner_cnt = namespace(value=0) -%}\n\
\ {% for lable in ner_tags -%}\n {% if outer_cnt.value == inner_cnt.value\
\ -%}\n {% if flag != 0 -%}\n  \n \
\ {% endif -%}\n {% set flag = 1 -%}\n {{tok}}:{{ _ner_lable_dict[lable]\
\ }}\n {% endif -%}\n {% set inner_cnt.value = inner_cnt.value\
\ + 1 -%}\n {% endfor -%} \n {% set outer_cnt.value = outer_cnt.value\
\ + 1 -%}\n{% endfor -%} \n\n"
name: ner_flat_question_with_label
reference: Natural question
87bb05ff-f6bf-4c0d-bbcc-56f75095f4a1: !Template
id: 87bb05ff-f6bf-4c0d-bbcc-56f75095f4a1
jinja: "{% set _chunk_lable_dict = ({\n0:\"O\",\n1:\"B-ADJP\",\n2:\"I-ADJP\",\n\
Collaborator

lable -> label

Contributor Author

Done

3:\"B-ADVP\",\n4:\"I-ADVP\",\n5:\"B-CONJP\",\n6:\"I-CONJP\",\n7:\"B-INTJ\",\n\
8:\"I-INTJ\",\n9:\"B-LST\",\n10:\"I-LST\",\n11:\"B-NP\",\n12:\"I-NP\",\n13:\"\
B-PP\",\n14:\"I-PP\",\n15:\"B-PRT\",\n16:\"I-PRT\",\n17:\"B-SBAR\",\n18:\"I-SBAR\"\
,\n19:\"B-UCP\",\n20:\"I-UCP\",\n21:\"B-VP\",\n22:\"I-VP\"\n}) %}\nGenerate\
\ chunk tag from the following sentence. \n{% for i in tokens -%}\n {{- \"\
\ \" if not loop.last else \"\" -}}\n {{ i }}\n{% endfor %} \n|||\n{% set\
\ flag = 0 %}\n{% set outer_cnt = namespace(value=0) -%}\n{% for tok in tokens\
\ -%}\n {% set inner_cnt = namespace(value=0) -%}\n {% for lable in chunk_tags
\ -%}\n {% if outer_cnt.value == inner_cnt.value -%}\n {%\
\ if flag != 0 -%}\n  \n {% endif -%}\n \
\ {% set flag = 1 -%}\n {{tok}}:{{ _chunk_lable_dict[lable]\
\ }}\n {% endif -%}\n {% set inner_cnt.value = inner_cnt.value\
\ + 1 -%}\n {% endfor -%} \n {% set outer_cnt.value = outer_cnt.value\
\ + 1 -%}\n{% endfor -%} \n\n"
name: chunk_flat_question_without_label
reference: Natural question
d10ab706-e2ea-4b7a-b6b4-371aa0ab483b: !Template
id: d10ab706-e2ea-4b7a-b6b4-371aa0ab483b
jinja: "{% set _pos_lable_dict = ({\n0:'\"',\n1:\"''\",\n2:\"#\",\n3:\"\\$\",\n\
4:\"(\",\n5:\")\",\n6:\",\",\n7:\".\",\n8:\":\",\n9:\"``\",\n10:\"CC\",\n11:\"\
CD\",\n12:\"DT\",\n13:\"EX\",\n14:\"FW\",\n15:\"IN\",\n16:\"JJ\",\n17:\"JJR\"\
,\n18:\"JJS\",\n19:\"LS\",\n20:\"MD\",\n21:\"NN\",\n22:\"NNP\",\n23:\"NNPS\"\
,\n24:\"NNS\",\n25:\"NN|SYM\",\n26:\"PDT\",\n27:\"POS\",\n28:\"PRP\",\n29:\"\
PRP\\$\",\n30:\"RB\",\n31:\"RBR\",\n32:\"RBS\",\n33:\"RP\",\n34:\"SYM\",\n35:\"\
TO\",\n36:\"UH\",\n37:\"VB\",\n38:\"VBD\",\n39:\"VBG\",\n40:\"VBN\",\n41:\"\
VBP\",\n42:\"VBZ\",\n43:\"WDT\",\n44:\"WP\",\n45:\"WP$\",\n46:\"WRB\"\n}) %}\n\
\n\nGenerate parts of speech from the following sentence. The parts of speech\
\ tags are\n{% for k,v in _pos_lable_dict.items() -%}\n {{ v }}\n {{-\
\ \", \" if not loop.last else \"\" -}}\n {{- \"\\n\" if loop.last -}}\n\
{% endfor %} \n{% set flag = 0 %}\n{% for i in tokens -%}\n {% if flag !=\
\ 0 -%}\n  \n {% endif -%}\n {% set flag = 1 -%}\n {{ i\
\ }}\n{% endfor %} \n|||\n{% set flag = 0 %}\n{% set outer_cnt = namespace(value=0)\
\ -%}\n{% for tok in tokens -%}\n {% set inner_cnt = namespace(value=0) -%}\n\
\ {% for lable in pos_tags -%}\n {% if outer_cnt.value == inner_cnt.value\
\ -%}\n {% if flag != 0 -%}\n  \n \
\ {% endif -%}\n {% set flag = 1 -%}\n {{tok}}:{{ _pos_lable_dict[lable]\
\ }}\n {% endif -%}\n {% set inner_cnt.value = inner_cnt.value\
\ + 1 -%}\n {% endfor -%} \n {% set outer_cnt.value = outer_cnt.value\
\ + 1 -%}\n{% endfor -%} "
name: pos_flat_question_with_label
reference: Natural question
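
For reference, here is a rough Python sketch of what the review comment about the loop index is getting at. It is not the code from this PR: the function names, the sample sentence, and the single-space separator between `token:label` pairs are all assumptions made for illustration. The first function mirrors the two-counter nested loop used in the templates; the other two produce the same pairs by position, which in Jinja would roughly correspond to indexing the tag list with `loop.index0` inside the `for tok in tokens` loop.

```python
# Label map copied from the NER dictionary in the diff above.
NER_LABELS = {0: "O", 1: "B-PER", 2: "I-PER", 3: "B-ORG", 4: "I-ORG",
              5: "B-LOC", 6: "I-LOC", 7: "B-MISC", 8: "I-MISC"}


def tag_nested(tokens, tag_ids, labels):
    """Mirror the template's approach: scan every tag for every token
    and emit a pair only when the two positions match (quadratic)."""
    pairs = []
    for outer, tok in enumerate(tokens):
        for inner, tag_id in enumerate(tag_ids):
            if outer == inner:
                pairs.append(f"{tok}:{labels[tag_id]}")
    return " ".join(pairs)  # separator is an assumption


def tag_indexed(tokens, tag_ids, labels):
    """Same result using the token's position directly (loop-index style)."""
    return " ".join(f"{tok}:{labels[tag_ids[i]]}" for i, tok in enumerate(tokens))


def tag_zipped(tokens, tag_ids, labels):
    """Same result by walking both lists in lockstep."""
    return " ".join(f"{tok}:{labels[t]}" for tok, t in zip(tokens, tag_ids))


# Illustrative input only; the tag ids index into NER_LABELS.
sample_tokens = ["EU", "rejects", "German", "call"]
sample_tags = [3, 0, 7, 0]
print(tag_indexed(sample_tokens, sample_tags, NER_LABELS))
```

All three functions return the same string for equal-length inputs, so the single-pass versions can stand in for the nested loop without changing the target text. The same helpers work for the POS and chunk templates if the matching tag list and label dictionary are passed in.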