evaluation refactoring #23
Conversation
evaluation/base_metric.py
Outdated
def evaluate(self,
    ## TODO zhangir: figure out how to import SummModel
    model,
What I want to write is some version of model: SummModel, but I'm not sure how to import SummModel.
I see. For this you can do from model.base_model import SummModel.
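As a side note, one common way to get the type annotation without a runtime dependency (a sketch, not necessarily what this repo settled on) is the typing.TYPE_CHECKING guard, so SummModel is only imported by static type checkers:

```python
from __future__ import annotations  # annotations become strings, resolved lazily

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by type checkers, never at runtime,
    # so this cannot cause a circular import.
    from model.base_model import SummModel


class SummMetric:
    def evaluate(self, model: SummModel, corpus: list) -> dict:
        raise NotImplementedError
```

At runtime the annotation stays an unevaluated string, so the module loads even though model.base_model is never actually imported.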
However, I don't think you should design the evaluate() interface in a way that depends on SummModel. When designing classes, we typically minimize the coupling between them unless it's absolutely necessary. Think about changes: if we change SummModel in the future, it would affect all the evaluation metrics as well, which is not ideal.
My rule of thumb is to think about what the class is designed to do, and nothing more. An evaluation metric doesn't need to know how the models work; it only does the evaluation. It takes the gold and the prediction and computes the score. That's its functionality, nothing more.
If we need a general method that takes in a model and an evaluation metric, it could be a standalone function that we provide outside of any class. Then no matter whether the eval or the model changes, we only need to change this one function, which is much easier to maintain.
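The standalone glue function described above could look something like this (a hypothetical sketch; the names evaluate_model, summarize, and metric_fn are illustrative, not from the repo):

```python
from typing import Callable, Dict, List


def evaluate_model(
    summarize: Callable[[List[str]], List[str]],
    metric_fn: Callable[[List[str], List[str]], Dict[str, float]],
    sources: List[str],
    references: List[str],
) -> Dict[str, float]:
    """Glue function: run the model, then score predictions against gold.

    Neither the model nor the metric knows about the other; only this
    one function couples them, so changes to either side stay local.
    """
    predictions = summarize(sources)
    return metric_fn(predictions, references)


# Toy usage with stand-in lambdas for the model and the metric.
scores = evaluate_model(
    summarize=lambda docs: [d[:10] for d in docs],
    metric_fn=lambda pred, gold: {"exact_match": float(pred == gold)},
    sources=["some long document"],
    references=["some long "],
)
```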
evaluation/base_metric.py
Outdated
class SummMetric():
    def __init__(self):
        self.score_dict = {}
I don't think score_dict should be an attribute of the object. An attribute should be a property of the objects of this class; for example, think about height/age for humans.
There are also static variables that belong to the class itself. For this, you can check out base_model.py in model: things like is_query_based and is_multi_document are all class (not object) attributes, since they are shared across the class. For example, think about the number of eyes for humans.
In the case of SummMetric, things you could consider as class (static) attributes are ones like is_higher_better. I will leave the object attributes (i.e., those with self.*) for you to think about :)
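As an illustration of the distinction (the metric names here are hypothetical):

```python
class SummMetric:
    # Class (static) attributes: shared by every metric of this kind,
    # like "number of eyes" for humans. No instance is needed to read them.
    metric_name: str = "base"
    is_higher_better: bool = True


class Rouge(SummMetric):
    # Subclasses override the class attributes that differ...
    metric_name = "rouge"
    # ...and inherit the rest (is_higher_better stays True).


print(Rouge.metric_name)       # read straight off the class: "rouge"
print(Rouge.is_higher_better)  # inherited from SummMetric: True
```

Instance attributes (those assigned via self.* in __init__) would instead hold per-object state that varies between instances.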
evaluation/base_metric.py
Outdated
    """
    raise NotImplementedError("the base class for metrics shouldn't be instantiated!")

def get_dict(self, keys: List[str]):
Following up on my comment on self.score_dict, this method is not necessary either. It could all just be part of evaluate().
Left some comments, we can discuss those tomorrow.
Hi Ansong,
@zhangir-azerbayev Did you forget to check the
Also, there are some relative imports, which we should try to avoid. Other than those, I think the class design makes sense to me and is much better than last time :)
@niansong1996
class BertScore(SummEvalMetric):
    metric_name = 'bert score'
    range = (0, 1)
This is a nice design. I would also add a comment on top of the range variable saying whether it's inclusive or exclusive on the boundaries.
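For instance (SummEvalMetric is stubbed out here just to make the snippet self-contained):

```python
class SummEvalMetric:
    """Stub of the base class, for illustration only."""


class BertScore(SummEvalMetric):
    metric_name = 'bert score'
    # Inclusive on both boundaries: valid scores lie in [0, 1].
    range = (0, 1)
```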
I left some comments, generally very good work!
@zhangir-azerbayev I think one other thing that is missing is testing. Can you follow what we have in
The commits look good so far. Let me know when you have resolved all my previous comments by requesting a review, and then I can see if this can be merged into main, thanks!
@zhangir-azerbayev Where are we on this thread?
Hi @niansong1996. I added a commit with unit testing. |
@zhangir-azerbayev Please resolve the comments I made above; also, there is currently a conflict on
Should be ready to merge. |
@zhangir-azerbayev Why is RougeWE still removed? I thought we fixed the loading issue? |
@niansong1996 good catch, I fixed it. |
@zhangir-azerbayev LGTM, are all of the evaluation metrics passing the test? If so, I think this is ready to merge |
@niansong1996 Found a minor bug in the testing script. Tests now pass. |
Modified evaluation library to better align with style conventions.
One thing I can't figure out how to do is import SummModel into base_metric.py for type annotation purposes. Any help with this is appreciated.