-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add plugin operators #574
Add plugin operators #574
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #574 +/- ##
==========================================
+ Coverage 87.54% 87.66% +0.12%
==========================================
Files 83 84 +1
Lines 7104 7143 +39
==========================================
+ Hits 6219 6262 +43
+ Misses 885 881 -4 ☔ View full report in Codecov by Sentry. |
If I understand there are two changes here.
The Plugin operator is a bit tricky to understand (even the example is). |
|
||
default_operator (List[str]): A list of default operators to be used if no operators are found in the instance. | ||
|
||
Example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is example is really not clear to me. Why is "processors.to_string" added to field "c"?
@@ -18,7 +20,13 @@ | |||
AddFields({"context_type": "table"}), | |||
TruncateTableCells(max_length=15, table="table", text_output="answer"), | |||
TruncateTableRows(field="table", rows_to_keep=50), | |||
SerializeTableAsIndexedRowMajor(field_to_field=[["table", "context"]]), | |||
PlugInOperator( | |||
field="table_serializer", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would call it "CustomizableOperator" because it's more directly describes what you can do with it.
Then I would change "field" to "overide_operator_field" so it will be more clear what this field means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that people can replace the default operator by any other operator, so it can dramatically change the semantic of the card.
) | ||
stream = recipe() | ||
|
||
for instance in stream["train"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is tested in this example? We add a field but don't check or use it.
demo_format="User:{source}\nAgent:{target}\n\n", | ||
model_input_format="{instruction}\n\n{demos}User:{source}\nAgent:", | ||
), | ||
additional_field="test", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whta if people will start adding fields that will not be compatible with different changes.
Perhaps, we will require a prefix like 'custom_field_override_serializer".
A naïve question.. : Since the card is a json file, a simpler, and perhaps better understood way (addressing @yoavkatz concern) would be to first load that json file into a json object, then change that json object, and use the edited version as the input to the recipe? |
fec24aa
to
fce0293
Compare
…guments Signed-off-by: Elron Bandel <elron.bandel@ibm.com>
Signed-off-by: Elron Bandel <elron.bandel@ibm.com>
Signed-off-by: Elron Bandel <elron.bandel@ibm.com>
Signed-off-by: Elron Bandel <elron.bandel@ibm.com>
fce0293
to
05fbded
Compare
This is a proposal for a new behaviour that allows to change card operators from the final command such that:
load_dataset("card=cards.wikitq,table_serializer=serializers.table.markdown")
will be loading the wikitq with different table serializer, markdown serializer in this case.