[WIP] CoNLL-2003 dataset: NER, chunk and POS tagger #170
Closed
37 commits (the diff below shows changes from 2 of them):
- d97c6b2 sbmaruf: conll basic format created
- 33c58b0 sbmaruf: chunk and pos template updated.
- 6539b83 sbmaruf: lable->label, code simplification
- dc8d60e sbmaruf: prompt with random lable.
- e053759 sbmaruf: adding zip instead of nested loop
- fe7e368 zaidalyafeai: prompt nlu_evaluation_data (#132)
- cf73ab9 josrozen: added templates for google_wellformed_query (#156)
- 4af5b83 jason-fries: Added pubmed_qa prompts (#157)
- e2b2a05 thomasw21: Add neural_code_search_prompts (#159)
- 41ba7a2 craffel: Add choice filter (#149)
- 2147b57 elsanns: Prompts for esnli (#147)
- 06a40fe manandey: Prompts for `quoref` (#88)
- 93a8ae7 manandey: copying prompts to `winogrande` subsets (#89)
- c426182 stephenbach: Add workaround for error in template highlighting. (#175)
- d76181f zaidalyafeai: first commit (#177)
- 17dc003 srush: Add a new task template field for fields corresponding to the origina…
- ea09839 abheesht17: Add templates for the MC-TACO dataset (#151)
- 50c2ca9 trishalaneeraj: Add template for snips_built_in_intents (#154)
- d4571a3 rbawden: Added templates for selqa dataset (#158)
- 083ab21: Squad v2 (#129)
- 823e4f0 gchhablani: Add MDD templates (#146)
- 2848d76 drugilsberg: Templates for `ncbi_disease` (#176)
- 580612d thomasw21: Add social_i_qa prompts (#161)
- 0734967 NohTow: Templates for stsb_multi_mt_en (#160)
- 65da59f nihalnayak: Common gen (#165)
- 058f1ff gchhablani: Add choice filter for duorc templates (#183)
- 015606b arunraja-hub: wiki split (#130)
- 84a4b47 gchhablani: Add app_reviews templates (#148)
- d965737 elsanns: Prompts for quartz (#185)
- 115cfc3 gchhablani: Add scientific_papers templates (#179)
- 3e200fc VictorSanh: longer timeout for extremely big datasets
- 1f71910 NohTow: Templates for spider (#162)
- 3911d6c elsanns: Prompts for art more natural punctuation (#167)
- 6a967d7 patrick-s-h-lewis: Adding ASNQ (#202)
- 23700ac sbmaruf: Add Open-Domain QA (without psg description) (#204)
- 5d6f8ed sbmaruf: bug-fixes, task template checkbox updated (#205)
- c5979ca debajyotidatta: newspop template (#186)
@@ -0,0 +1,124 @@
dataset: conll2003
templates:
  3b777e30-bdd1-4c4a-b5d9-3372c7a756f4: !Template
    id: 3b777e30-bdd1-4c4a-b5d9-3372c7a756f4
    jinja: "{% set _ner_lable_dict = ({\n0:\"O\",\n1:\"B-PER\",\n2:\"I-PER\",\n3:\"\
      B-ORG\",\n4:\"I-ORG\",\n5:\"B-LOC\",\n6:\"I-LOC\",\n7:\"B-MISC\",\n8:\"I-MISC\"\
      \n}) %}\nGenerate named entities from the following sentence. \n{{\"\"}}\n{%\
      \ for i in tokens -%}\n {{- \" \" if not loop.last else \"\" -}}\n {{\
      \ i }}\n{% endfor %} \n|||\n{% set flag = 0 %}\n{% set outer_cnt = namespace(value=0)\
      \ -%}\n{% for tok in tokens -%}\n {% set inner_cnt = namespace(value=0) -%}\n\
      \ {% for lable in ner_tags -%}\n {% if outer_cnt.value == inner_cnt.value\
      \ -%}\n {% if flag != 0 -%}\n \n \
      \ {% endif -%}\n {% set flag = 1 -%}\n {{tok}}:{{ _ner_lable_dict[lable]\
      \ }}\n {% endif -%}\n {% set inner_cnt.value = inner_cnt.value\
      \ + 1 -%}\n {% endfor -%} \n {% set outer_cnt.value = outer_cnt.value\
      \ + 1 -%}\n{% endfor -%} \n\n"
    name: ner_flat_question_without_lable
    reference: Natural question
  6345fd1a-5272-4a8d-b5b2-02c1d837e521: !Template
    id: 6345fd1a-5272-4a8d-b5b2-02c1d837e521
    jinja: "{% set _pos_lable_dict = ({\n0:'\"',\n1:\"''\",\n2:\"#\",\n3:\"\\$\",\n\
      4:\"(\",\n5:\")\",\n6:\",\",\n7:\".\",\n8:\":\",\n9:\"``\",\n10:\"CC\",\n11:\"\
      CD\",\n12:\"DT\",\n13:\"EX\",\n14:\"FW\",\n15:\"IN\",\n16:\"JJ\",\n17:\"JJR\"\
      ,\n18:\"JJS\",\n19:\"LS\",\n20:\"MD\",\n21:\"NN\",\n22:\"NNP\",\n23:\"NNPS\"\
      ,\n24:\"NNS\",\n25:\"NN|SYM\",\n26:\"PDT\",\n27:\"POS\",\n28:\"PRP\",\n29:\"\
      PRP\\$\",\n30:\"RB\",\n31:\"RBR\",\n32:\"RBS\",\n33:\"RP\",\n34:\"SYM\",\n35:\"\
      TO\",\n36:\"UH\",\n37:\"VB\",\n38:\"VBD\",\n39:\"VBG\",\n40:\"VBN\",\n41:\"\
      VBP\",\n42:\"VBZ\",\n43:\"WDT\",\n44:\"WP\",\n45:\"WP$\",\n46:\"WRB\"\n}) %}\n\
      \n\nGenerate parts of speech from the following sentence. \n{% set flag = 0\
      \ %}\n{% for i in tokens -%}\n {% if flag != 0 -%}\n \n {%\
      \ endif -%}\n {% set flag = 1 -%}\n {{ i }}\n{% endfor %} \n|||\n{% set\
      \ flag = 0 %}\n{% set outer_cnt = namespace(value=0) -%}\n{% for tok in tokens\
      \ -%}\n {% set inner_cnt = namespace(value=0) -%}\n {% for lable in pos_tags\
      \ -%}\n {% if outer_cnt.value == inner_cnt.value -%}\n {%\
      \ if flag != 0 -%}\n \n {% endif -%}\n \
      \ {% set flag = 1 -%}\n {{tok}}:{{ _pos_lable_dict[lable] }}\n\
      \ {% endif -%}\n {% set inner_cnt.value = inner_cnt.value + 1\
      \ -%}\n {% endfor -%} \n {% set outer_cnt.value = outer_cnt.value + 1\
      \ -%}\n{% endfor -%} "
    name: pos_flat_question_without_label
    reference: Natural Question
  71eadc14-9533-43a8-82dc-6d6a171119ef: !Template
    id: 71eadc14-9533-43a8-82dc-6d6a171119ef
    jinja: "{% set _chunk_lable_dict = ({\n0:\"O\",\n1:\"B-ADJP\",\n2:\"I-ADJP\",\n\
      3:\"B-ADVP\",\n4:\"I-ADVP\",\n5:\"B-CONJP\",\n6:\"I-CONJP\",\n7:\"B-INTJ\",\n\
      8:\"I-INTJ\",\n9:\"B-LST\",\n10:\"I-LST\",\n11:\"B-NP\",\n12:\"I-NP\",\n13:\"\
      B-PP\",\n14:\"I-PP\",\n15:\"B-PRT\",\n16:\"I-PRT\",\n17:\"B-SBAR\",\n18:\"I-SBAR\"\
      ,\n19:\"B-UCP\",\n20:\"I-UCP\",\n21:\"B-VP\",\n22:\"I-VP\"\n}) %}\nGenerate\
      \ chunk tag from the following sentence. The chunk tags are\n{% for k,v in _chunk_lable_dict.items()\
      \ -%}\n {{ v }}\n {{- \", \" if not loop.last else \"\" -}}\n {{- \"\
      \\n\" if loop.last -}}\n{% endfor %} \n{% for i in tokens -%}\n {{- \" \"\
      \ if not loop.last else \"\" -}}\n {{ i }}\n{% endfor %} \n|||\n{% set flag\
      \ = 0 %}\n{% set outer_cnt = namespace(value=0) -%}\n{% for tok in tokens -%}\n\
      \ {% set inner_cnt = namespace(value=0) -%}\n {% for lable in chunk_tags\
      \ -%}\n {% if outer_cnt.value == inner_cnt.value -%}\n {%\
      \ if flag != 0 -%}\n \n {% endif -%}\n \
      \ {% set flag = 1 -%}\n {{tok}}:{{ _chunk_lable_dict[lable]\
      \ }}\n {% endif -%}\n {% set inner_cnt.value = inner_cnt.value\
      \ + 1 -%}\n {% endfor -%} \n {% set outer_cnt.value = outer_cnt.value\
      \ + 1 -%}\n{% endfor -%} \n\n"
    name: chunk_flat_question_with_label
    reference: Natural Question
  8613a9bb-94c7-4b91-998c-3c298087d719: !Template
    id: 8613a9bb-94c7-4b91-998c-3c298087d719
    jinja: "{% set _ner_lable_dict = ({\n0:\"O\",\n1:\"B-PER\",\n2:\"I-PER\",\n3:\"\
      B-ORG\",\n4:\"I-ORG\",\n5:\"B-LOC\",\n6:\"I-LOC\",\n7:\"B-MISC\",\n8:\"I-MISC\"\
      \n}) %}\nGenerate named entities from the following sentence. The named entities\
      \ are\n{% for k,v in _ner_lable_dict.items() -%}\n {{ v }}\n {{- \", \"\
      \ if not loop.last else \"\" -}}\n {{- \"\\n\" if loop.last -}}\n{% endfor\
      \ %} \n{% for i in tokens -%}\n {{- \" \" if not loop.last else \"\" -}}\n\
      \ {{ i }}\n{% endfor %} \n|||\n{% set flag = 0 %}\n{% set outer_cnt = namespace(value=0)\
      \ -%}\n{% for tok in tokens -%}\n {% set inner_cnt = namespace(value=0) -%}\n\
      \ {% for lable in ner_tags -%}\n {% if outer_cnt.value == inner_cnt.value\
      \ -%}\n {% if flag != 0 -%}\n \n \
      \ {% endif -%}\n {% set flag = 1 -%}\n {{tok}}:{{ _ner_lable_dict[lable]\
      \ }}\n {% endif -%}\n {% set inner_cnt.value = inner_cnt.value\
      \ + 1 -%}\n {% endfor -%} \n {% set outer_cnt.value = outer_cnt.value\
      \ + 1 -%}\n{% endfor -%} \n\n"
    name: ner_flat_question_with_lable
    reference: Natural question
  87bb05ff-f6bf-4c0d-bbcc-56f75095f4a1: !Template
    id: 87bb05ff-f6bf-4c0d-bbcc-56f75095f4a1
    jinja: "{% set _chunk_lable_dict = ({\n0:\"O\",\n1:\"B-ADJP\",\n2:\"I-ADJP\",\n\
      3:\"B-ADVP\",\n4:\"I-ADVP\",\n5:\"B-CONJP\",\n6:\"I-CONJP\",\n7:\"B-INTJ\",\n\
      8:\"I-INTJ\",\n9:\"B-LST\",\n10:\"I-LST\",\n11:\"B-NP\",\n12:\"I-NP\",\n13:\"\
      B-PP\",\n14:\"I-PP\",\n15:\"B-PRT\",\n16:\"I-PRT\",\n17:\"B-SBAR\",\n18:\"I-SBAR\"\
      ,\n19:\"B-UCP\",\n20:\"I-UCP\",\n21:\"B-VP\",\n22:\"I-VP\"\n}) %}\nGenerate\
      \ chunk tag from the following sentence. \n{% for i in tokens -%}\n {{- \"\
      \ \" if not loop.last else \"\" -}}\n {{ i }}\n{% endfor %} \n|||\n{% set\
      \ flag = 0 %}\n{% set outer_cnt = namespace(value=0) -%}\n{% for tok in tokens\
      \ -%}\n {% set inner_cnt = namespace(value=0) -%}\n {% for lable in chunk_tags\
      \ -%}\n {% if outer_cnt.value == inner_cnt.value -%}\n {%\
      \ if flag != 0 -%}\n \n {% endif -%}\n \
      \ {% set flag = 1 -%}\n {{tok}}:{{ _chunk_lable_dict[lable]\
      \ }}\n {% endif -%}\n {% set inner_cnt.value = inner_cnt.value\
      \ + 1 -%}\n {% endfor -%} \n {% set outer_cnt.value = outer_cnt.value\
      \ + 1 -%}\n{% endfor -%} \n\n"
    name: chunk_flat_question_without_label
    reference: Natural question
  d10ab706-e2ea-4b7a-b6b4-371aa0ab483b: !Template
    id: d10ab706-e2ea-4b7a-b6b4-371aa0ab483b
    jinja: "{% set _pos_lable_dict = ({\n0:'\"',\n1:\"''\",\n2:\"#\",\n3:\"\\$\",\n\
      4:\"(\",\n5:\")\",\n6:\",\",\n7:\".\",\n8:\":\",\n9:\"``\",\n10:\"CC\",\n11:\"\
      CD\",\n12:\"DT\",\n13:\"EX\",\n14:\"FW\",\n15:\"IN\",\n16:\"JJ\",\n17:\"JJR\"\
      ,\n18:\"JJS\",\n19:\"LS\",\n20:\"MD\",\n21:\"NN\",\n22:\"NNP\",\n23:\"NNPS\"\
      ,\n24:\"NNS\",\n25:\"NN|SYM\",\n26:\"PDT\",\n27:\"POS\",\n28:\"PRP\",\n29:\"\
      PRP\\$\",\n30:\"RB\",\n31:\"RBR\",\n32:\"RBS\",\n33:\"RP\",\n34:\"SYM\",\n35:\"\
      TO\",\n36:\"UH\",\n37:\"VB\",\n38:\"VBD\",\n39:\"VBG\",\n40:\"VBN\",\n41:\"\
      VBP\",\n42:\"VBZ\",\n43:\"WDT\",\n44:\"WP\",\n45:\"WP$\",\n46:\"WRB\"\n}) %}\n\
      \n\nGenerate parts of speech from the following sentence. The parts of speech\
      \ tags are\n{% for k,v in _pos_lable_dict.items() -%}\n {{ v }}\n {{-\
      \ \", \" if not loop.last else \"\" -}}\n {{- \"\\n\" if loop.last -}}\n\
      {% endfor %} \n{% set flag = 0 %}\n{% for i in tokens -%}\n {% if flag !=\
      \ 0 -%}\n \n {% endif -%}\n {% set flag = 1 -%}\n {{ i\
      \ }}\n{% endfor %} \n|||\n{% set flag = 0 %}\n{% set outer_cnt = namespace(value=0)\
      \ -%}\n{% for tok in tokens -%}\n {% set inner_cnt = namespace(value=0) -%}\n\
      \ {% for lable in pos_tags -%}\n {% if outer_cnt.value == inner_cnt.value\
      \ -%}\n {% if flag != 0 -%}\n \n \
      \ {% endif -%}\n {% set flag = 1 -%}\n {{tok}}:{{ _pos_lable_dict[lable]\
      \ }}\n {% endif -%}\n {% set inner_cnt.value = inner_cnt.value\
      \ + 1 -%}\n {% endfor -%} \n {% set outer_cnt.value = outer_cnt.value\
      \ + 1 -%}\n{% endfor -%} "
    name: pos_flat_question_with_lable
    reference: ' Natural question'

Review comment on the `_chunk_lable_dict` line of template 87bb05ff-f6bf-4c0d-bbcc-56f75095f4a1:
lable -> label
Done
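For orientation, the `ner_flat_question_with_lable` template above builds a prompt (instruction, the comma-separated label set, the sentence) and, after the `|||` separator, a target with one `token:LABEL` pair per token. The Python below is a minimal sketch of that intended prompt/target shape, not the promptsource renderer: the function name and example sentence are hypothetical, tokens are simply joined with spaces, and the exact whitespace the Jinja template emits may differ.

```python
# Label names in id order, copied from the template's _ner_lable_dict
# (0 -> "O", 1 -> "B-PER", ..., 8 -> "I-MISC").
NER_LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
              "B-LOC", "I-LOC", "B-MISC", "I-MISC"]


def render_ner_with_labels(tokens, ner_tags):
    """Hypothetical helper: return a (prompt, target) pair shaped like
    ner_flat_question_with_lable. Whitespace is simplified on purpose."""
    prompt = (
        "Generate named entities from the following sentence. "
        "The named entities are\n"
        + ", ".join(NER_LABELS)  # the label set, comma-separated
        + "\n"
        + " ".join(tokens)  # the sentence itself
    )
    # One "token:LABEL" line per token; tokens and tag ids pair up by position.
    target = "\n".join(
        f"{tok}:{NER_LABELS[tag]}" for tok, tag in zip(tokens, ner_tags)
    )
    return prompt, target


if __name__ == "__main__":
    # Made-up sentence and tag ids, for illustration only.
    prompt, target = render_ner_with_labels(
        ["John", "lives", "in", "Paris"], [1, 0, 0, 5]
    )
    print(prompt)
    print("|||")
    print(target)
```

The POS and chunk templates follow the same pattern with their own label lists and with `pos_tags` or `chunk_tags` in place of `ner_tags`.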
This looks very complex. Is there a reason you can't use the loop index?
https://stackoverflow.com/questions/1567291/get-loop-index-of-outer-loop
Done.
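The `outer_cnt`/`inner_cnt` counters in the templates only select, for the token at position i, the tag at position i, scanning the whole tag list once per token. The sketch below (plain Python with hypothetical function names, not Jinja) shows that this matches a direct positional lookup and a `zip`, the approach named in commit e053759. Inside a Jinja2 template the direct lookup could be written with the built-in loop variable, e.g. `{% for tok in tokens %}{{ tok }}:{{ _ner_lable_dict[ner_tags[loop.index0]] }}{% endfor %}`; that snippet is an illustration and has not been checked against promptsource.

```python
def pairs_nested(tokens, tags):
    """Mirror of the templates' outer_cnt/inner_cnt loops: an O(n^2) scan."""
    pairs = []
    for outer, tok in enumerate(tokens):
        for inner, tag in enumerate(tags):
            if outer == inner:  # keep only the tag at the token's position
                pairs.append((tok, tag))
    return pairs


def pairs_indexed(tokens, tags):
    """Direct lookup by position, the analogue of tags[loop.index0] in Jinja2."""
    return [(tok, tags[i]) for i, tok in enumerate(tokens)]


def pairs_zipped(tokens, tags):
    """Positional pairing with zip; stops at the shorter of the two lists."""
    return list(zip(tokens, tags))


if __name__ == "__main__":
    tokens = ["John", "lives", "in", "Paris"]  # hypothetical example
    tags = [1, 0, 0, 5]
    assert pairs_nested(tokens, tags) == pairs_indexed(tokens, tags) == pairs_zipped(tokens, tags)
    print(pairs_indexed(tokens, tags))
```

In CoNLL-2003 each tag list has exactly one entry per token, so all three agree. They differ only on mismatched lengths: the indexed version raises `IndexError` when tags run short, while the nested scan and `zip` silently drop the unmatched tail.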