Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pipeline's YAML: syntax validation #2226

Merged
merged 177 commits into from
Mar 15, 2022
Merged

Pipeline's YAML: syntax validation #2226

merged 177 commits into from
Mar 15, 2022

Conversation

ZanSara
Copy link
Contributor

@ZanSara ZanSara commented Feb 21, 2022

Proposed changes:

  • Add BasePipeline.validate_config()
  • Add BasePipeline.validate_yaml()
  • Add some custom exception classes: HaystackError, PipelineError, PipelineValidationError, etc... (see haystack/errors.py)

Status:

  • First draft (up for discussions & feedback)
  • Final code
  • Added tests
  • Updated documentation

Closes #1981
Closes #2252
Closes #2246
Closes #600

@ZanSara ZanSara marked this pull request as ready for review February 24, 2022 14:36
@ZanSara ZanSara marked this pull request as draft February 24, 2022 14:36
Copy link
Member

@tstadel tstadel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! A huge leap forward!
@julian-risch please check if my last finding regarding distillation does not hurt too much. Otherwise we're good to go.
@ZanSara please remove the commented out setconfig calls and change _component_configuration to _component_config before merging or in another PR.

haystack/nodes/base.py Outdated Show resolved Hide resolved
haystack/document_stores/deepsetcloud.py Outdated Show resolved Hide resolved
haystack/document_stores/graphdb.py Outdated Show resolved Hide resolved
haystack/document_stores/memory.py Outdated Show resolved Hide resolved
haystack/document_stores/sql.py Outdated Show resolved Hide resolved
haystack/nodes/retriever/sparse.py Outdated Show resolved Hide resolved
haystack/nodes/retriever/text2sparql.py Outdated Show resolved Hide resolved
haystack/nodes/summarizer/transformers.py Outdated Show resolved Hide resolved
haystack/nodes/translator/transformers.py Outdated Show resolved Hide resolved
@@ -884,7 +884,7 @@ def _get_checksum(self):
"max_seq_len": self.processor.max_seq_len,
"dev_split": self.processor.dev_split,
"tasks": self.processor.tasks,
"teacher_name_or_path": self.teacher.pipeline_config["params"]["model_name_or_path"],
"teacher_name_or_path": self.teacher.name,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really the same? I can't find the name field be set anywhere outside a Pipeline. In test_distillation.py it always seems to be None. @julian-risch could you take a quick look on that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it's not the same. self.teacher is a FARMReader, which inherits name from BaseComponent. While pipeline_config is also inherited from BaseComponent, its model_name_or_path has a different meaning than name. name is typically Reader, whereas model_name_or_path is typically deepset/roberta-base-squad2 but it could also be a local file path. @ZanSara I believe this change needs to be reverted.

@ZanSara ZanSara merged commit 11cf94a into master Mar 15, 2022
@ZanSara ZanSara deleted the yaml_validation branch March 15, 2022 10:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants