WTQ dataset reader and models moved from iterative-search-semparse #2764
Conversation
@matt-gardner This PR is finally ready to be reviewed!
This is awesome @pdasigi! It looks like it was a whole lot of work, thanks for doing this. I had a few questions, but it looks very good overall.
@@ -32,28 +32,19 @@
class WikiTablesDatasetReader(DatasetReader):
    """
    This ``DatasetReader`` takes WikiTableQuestions ``*.examples`` files and converts them into
    ``Instances`` suitable for use with the ``WikiTablesSemanticParser``. This reader also accepts
    pre-processed JSONL files produced by ``scripts/preprocess_wikitables.py``. Processing the
We put this caching into the base DatasetReader recently, which is (I assume) why you just removed this code; maybe also because the corenlp preprocessing is already done, and you're reading those by default? Have you tried using the caching functionality with this at all, or is it already fast enough that it doesn't matter?
I was not aware of the caching functionality. That sounds great! I removed this because I thought reading in the preprocessed files was not too slow.
target_values_field = MetadataField(target_values)
world = WikiTablesLanguage(table_context)
world_field = MetadataField(world)
# Note: Not passing any feature extractors when instantiating the field below. This will make
You don't want this to be configurable anymore?
# pylint: enable=protected-access
instance = self._dataset_reader.text_to_instance(question_text,  # type: ignore
                                                 table_rows,
                                                 tokenized_question=tokenized_question)
                                                 table_rows)
Does this need additional processing? Does this have to be formatted as a corenlp-tagged table?
It is just the raw table content. I added a comment here and updated the docstring of `text_to_instance` to make that clear.
allennlp/models/semantic_parsing/wikitables/wikitables_erm_semantic_parser.py
logical_form = world.get_logical_form(action_strings)
lisp_string = state.extras[batch_index]
if self._executor.evaluate_logical_form(logical_form, lisp_string):
logical_form = world.action_sequence_to_logical_form(action_strings)
We can execute action sequences directly, which is definitely recommended here, so you don't have to go back and forth between strings. It'll make things at least a little faster.
Done. Thanks! Added a method `WikiTablesLanguage.evaluate_action_sequence`.
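The round-trip being avoided here can be illustrated with a toy domain language. `execute_actions` and `evaluate_action_sequence` below are illustrative stand-ins for the real `WikiTablesLanguage` methods, not their actual code; the point is that the action sequence is evaluated directly, with no intermediate logical-form string to render and re-parse.

```python
# Toy domain language: action sequences build arithmetic expressions in
# postfix order. Executing the sequence directly skips the string round-trip.

def execute_actions(actions):
    """Evaluate a postfix action sequence directly, without first
    rendering it as a logical-form string."""
    stack = []
    for action in actions:
        if action.startswith("num:"):
            stack.append(int(action.split(":")[1]))
        elif action == "op:add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif action == "op:mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"Unknown action: {action}")
    if len(stack) != 1:
        # Analogous to an ExecutionError on an incomplete sequence.
        raise ValueError("Incomplete action sequence")
    return stack[0]

def evaluate_action_sequence(actions, target):
    """Sketch of a denotation check in the spirit of the added method:
    execute directly and compare against the target value."""
    try:
        return execute_actions(actions) == target
    except ValueError:
        return False
```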
action_strings = [action_mapping[(i, action_index)] for action_index in action_indices]
has_logical_form = False
try:
    logical_form = world[i].action_sequence_to_logical_form(action_strings)
Here, too, just execute the action sequence directly, instead of converting to a logical form first. If we need to add a `world.evaluate_action_sequence(actions, targets)`, that's fine. It'll simplify the logic below, also, as you don't need to worry about a `ParsingError`.
But we're also measuring whether the parser is producing valid logical forms (`has_logical_form`), for which we need to make the logical forms. The value of that metric is usually 1.0, but it's nice to have that additional check.
That should already be guaranteed by construction - we only allow actions that produce valid logical forms. I don't think it's worth slowing down training to test something that should have been tested in a unit test.
We'll also get `ExecutionErrors` if something was up with the action sequence, which is another place you could catch this. This would catch two kinds of issues, which are the only ones that we should run into, anyway: (1) cases where we were loose in our type system, but have stricter execution requirements, like around lists vs. single items; and (2) cases where the action sequence didn't terminate, but we're trying to evaluate it anyway (keeping unfinished sequences at the end of beam search).
I was referring to action sequences that got terminated early, hence producing incomplete logical forms. Do you have an estimate of how much slowdown this could cause?
No, but anything you do on the CPU is time that's not utilizing the GPU for training. And those action sequences will be caught with execution errors anyway, so there's no reason to go to a logical form.
Or you could just instruct the beam search to not include unfinished sequences, which is probably the right thing to do, anyway...
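Dropping unfinished sequences at the end of beam search can be sketched like this. This is a toy version of the final step of a state-machine beam search, not allennlp's actual `BeamSearch`; the state representation and the flag name are assumptions for illustration.

```python
def finalize_beam(states, beam_size, keep_final_unfinished_states=False):
    """Toy final step of a state-machine beam search: keep the
    top-scoring states, optionally dropping states whose action
    sequence never terminated (and so cannot be executed)."""
    if not keep_final_unfinished_states:
        states = [s for s in states if s["finished"]]
    return sorted(states, key=lambda s: s["score"], reverse=True)[:beam_size]
```

With the flag off, every sequence that survives to evaluation is guaranteed to be complete, so the incomplete-logical-form case above never arises.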
Okay, removing the `has_logical_form` metric.
Hmm... one more thing. We do need to show the logical form in the demo, and to be able to convert the action sequence into a logical form, we need the corresponding world objects. I think this is the best place to make a logical form and store it in the output dict. Also, this method is only called during validation, so it should not significantly slow down training.
Hmm, yeah, fair enough, that's a good point. You could pass the world object in the output dictionary, and create the logical form in `decode`, but it's probably fine to just put it here during validation.
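The alternative suggested here (pass the world through the output dictionary and render the logical form in the post-processing hook) can be sketched as follows. This is a toy `decode`; the real model's field names and world API may differ.

```python
class ToyWorld:
    """Stand-in for a world object that can render action sequences."""
    def action_sequence_to_logical_form(self, actions):
        return "(" + " ".join(actions) + ")"

def decode(output_dict):
    """Toy post-processing hook: turn the stored action sequences into
    human-readable logical forms for the demo, once per forward pass,
    using the world objects carried in the output dict."""
    worlds = output_dict["world"]
    sequences = output_dict["best_action_sequence"]
    output_dict["logical_form"] = [
        world.action_sequence_to_logical_form(actions)
        for world, actions in zip(worlds, sequences)
    ]
    return output_dict
```

Either placement works; doing it in the validation path (as merged) keeps the conversion out of the training loop just as effectively.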
@@ -163,17 +163,14 @@ def get_table_knowledge_graph(self) -> KnowledgeGraph:
    self._table_knowledge_graph = KnowledgeGraph(entities, dict(neighbors), entity_text)
    return self._table_knowledge_graph

# TODO (pradeep): Make a ``read_from_json`` method similar to what we had in ``TableQuestionKnowledgeGraph``
Do we need that for a demo?
I don't think so. Removing that TODO.
lines: List[List[str]]) -> Tuple[List[Dict[str, Dict[str, str]]],
                                 Dict[str, Set[str]]]:
"""
This method will be called only when we do not have tagged information from CoreNLP. That is, when we are
Is it possible to just call corenlp's python API here instead of this? Or would that require a call to java, which we're trying to avoid? It probably would...
I don't think we're missing out on a lot by not using CoreNLP here.
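What a CoreNLP-free fallback amounts to can be sketched as simple string normalization plus number extraction from the raw cells. This is illustrative only; the real method's normalization and date handling are more involved, and these function names are assumptions.

```python
import re

def normalize_cell(text):
    """Lowercase a raw cell and collapse non-alphanumeric runs into
    underscores, roughly how table entities get canonical names."""
    normalized = re.sub(r"[^\w]+", "_", text.strip().lower()).strip("_")
    return normalized or "none"

def extract_number(text):
    """Return the first number in a raw cell, if any, so that numeric
    comparisons still work without CoreNLP's NER tags."""
    match = re.search(r"-?\d+(\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None
```

This covers most of what the tagged files provide for WTQ tables, which is presumably why not using CoreNLP here costs little.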
@@ -320,6 +320,21 @@ def get_agenda(self,
    agenda.append(f"Number -> {number}")
    return agenda

@staticmethod
def is_instance_specific_entity(entity_name: str) -> bool:
Hmm, seems like we should have a function for this on the base class, but maybe it doesn't fit... I'll think about this some more.
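For reference, the kind of check being discussed can be sketched like this. The entity-name prefixes below are assumptions based on how table-derived entities are typically named in this codebase, not necessarily the exact set the method uses.

```python
def is_instance_specific_entity(entity_name: str) -> bool:
    """Toy check: entities derived from a particular table (column names
    and cell strings) are instance-specific, while constants defined by
    the language itself (e.g. 'all_rows') are shared across instances."""
    instance_prefixes = ("string:", "string_column:", "number_column:", "date_column:")
    return entity_name.startswith(instance_prefixes)
```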
@pdasigi would like to train a model with this code before merging this PR.
Force-pushed from 146983f to 457d8da
I spent a lot of time trying to reproduce results from the best model we got using the code in iterative-search-semparse (reported in this paper). I made some progress in getting there, but it's not fully done. I think the biggest discrepancy still remaining is around generic types (like …). I may find other minor differences after I fix these two issues.

That said, I think reproducing the result is beyond the scope of this PR. Given that this PR is already huge, and it has important changes (like removing the JDK dependency), I've decided to merge this now and open another PR with changes that ensure we get the same result. Any objections @matt-gardner?
Force-pushed from cbfc8be to 49cac30
Is the result close?
@matt-gardner Update: I trained an MML model, followed by an ERM model initialized from it using this code. The dev accuracies are both only 1 point behind those from comparable models using the old code. I think that's close enough, and I'm going to merge this PR (after resolving conflicts). I'll spend some time running experiments to try out the things we discussed offline yesterday, and if they give better results, I'll open a new PR.
Force-pushed from 49cac30 to 8a6a12a
(…llenai#2764)

* added dataset reader
* moved dataset reader
* moved models
* removed unnecessary except
* fixed the name of the model in docs
* change trainer_test
* minor fixes
* removed unnecessary modules
* removed unnecessary tests
* removed table knowledge graph
* removed table knowledge graph
* updated knowledge graph field test with new context
* fixed docs after removing table knowledge graph
* got the predictor tests passing
* fixed a bug in checking if data is tagged
* minor fixes
* addressed PR comments and added a test for reading untagged tables
* fixed a silly doc issue
* removed jdk dependency
* more language changes
* undid changes made for backwards compatibility
* pylint and mypy fixes
* better error raising
Fixes #2691
Most of the tests are passing locally. I'll merge this after they all pass and after I evaluate the trained models with the new code and get the same numbers. Also, I suppose there are a bunch of things that can now be deleted (type declarations, SEMPRE-related stuff...). I'll add those deletions to this PR as well.