
evaluation refactoring #23

Merged · 20 commits · Jun 17, 2021

Conversation

zhangir-azerbayev (Collaborator, Author):

Modified the evaluation library to better align with style conventions.

One thing I can't figure out is how to import SummModel into base_metric.py for type-annotation purposes. Any help with this is appreciated.


def evaluate(self,
             ## TODO zhangir: figure out how to import SummModel
             model,
zhangir-azerbayev (Collaborator, Author) commented:

What I want to write is some version of model: SummModel, but I'm not sure how to import SummModel.

niansong1996 (Collaborator) replied:

I see. So for this you can do from model.base_model import SummModel.

However, I don't think you should design the evaluate() interface in a way that depends on SummModel. When designing classes, we typically minimize the coupling between them unless it's absolutely necessary. Think about what happens when things change: if we modify SummModel in the future, the change would ripple into all of the evaluation metrics as well, which is not ideal.

My rule of thumb is to think about what the class is designed to do, and nothing more. An evaluation metric doesn't need to know how the models work; it only does the evaluation. It takes the gold and the prediction and computes the score. That's its entire job, nothing more.

If we need a general method that takes in a model and an evaluation metric, it could be a standalone function that we provide outside of any class. Then, no matter whether the metric or the model changes, we only need to update this one function, which is much easier to maintain.
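(For illustration, a minimal sketch of that standalone function; model.summarize() and the evaluate(predictions, golds) signature are assumptions for the sake of the example, not the repo's actual API:)

from typing import Dict, List

def evaluate_model(model, metric, corpus: List[str],
                   golds: List[str]) -> Dict[str, float]:
    # the metric never touches the model; it only consumes strings, so
    # future changes to SummModel cannot ripple into the metrics
    predictions = [model.summarize(doc) for doc in corpus]
    return metric.evaluate(predictions, golds)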


class SummMetric():
    def __init__(self):
        self.score_dict = {}
niansong1996 (Collaborator) commented:

I don't think score_dict should be an attribute of the object. An attribute should be a property of each individual object of the class; for example, think about height or age for humans.

There are also static variables that belong to the class itself. For these, check out base_model.py in model: things like is_query_based and is_multi_document are class (not object) attributes, since they are shared across the whole class. For example, think about the number of eyes for humans.

In the case of SummMetric, something like is_higher_better is a good candidate for a class (static) attribute. I will leave the object attributes (i.e., those with self.*) for you to think about :)
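(A minimal sketch of the class-attribute pattern being described; is_higher_better comes from the comment above, and the other field values are illustrative assumptions:)

class SummMetric:
    # class (static) attributes, shared by every instance; analogous
    # to is_query_based / is_multi_document in model/base_model.py
    metric_name: str = 'base metric'
    is_higher_better: bool = True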

"""
raise NotImplementedError("the base class for metrics shouldn't be instantiated!")

def get_dict(self, keys: List[str]):
niansong1996 (Collaborator) commented:

Following up on my comment on self.score_dict: this method isn't necessary either. It could all just be part of evaluate().
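(A sketch of folding that behavior into evaluate() itself; the signature here is an assumption based on this discussion:)

from typing import Dict, List

class SummMetric:
    def evaluate(self, predictions: List[str],
                 golds: List[str]) -> Dict[str, float]:
        # compute and return the scores directly, instead of storing
        # them in self.score_dict and reading them back via get_dict()
        raise NotImplementedError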

niansong1996 (Collaborator) left a review:

Left some comments; we can discuss them tomorrow.

zhangir-azerbayev (Collaborator, Author):

Hi Ansong,
I fixed the issues raised in your comments with a new commit. Let me know what you think. I've also added a new class called SummEvalMetric, which inherits from SummMetric and is specifically for metrics that use SummEval as a backend (as of right now, that's all of them).
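(Roughly, the hierarchy being described; the bodies are elided here since the actual adapter code lives in summeval_metric.py:)

class SummEvalMetric(SummMetric):
    # common layer for metrics whose computation is delegated to the
    # SummEval package (currently every metric in the library)
    ...

class BertScore(SummEvalMetric):
    ...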

niansong1996 (Collaborator):

@zhangir-azerbayev Did you forget to check summeval_metric.py into git? It seems to be missing on my end.

Also, there are some relative imports, which we should try to avoid.
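(For example, the import-style change being asked for; the evaluation package name here is a guess at the repo layout:)

# avoid the relative form:
#   from .base_metric import SummMetric
# and use the absolute form instead:
from evaluation.base_metric import SummMetric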

Other than those, I think the class design makes sense to me and is much better than last time :)

zhangir-azerbayev (Collaborator, Author):

@niansong1996
Sorry, I forgot to check in summeval_metric.py; that's fixed now. I also got rid of the relative imports.


class BertScore(SummEvalMetric):
    metric_name = 'bert score'
    range = (0, 1)
niansong1996 (Collaborator) commented:

This is a nice design. I would also add a comment above the range variable saying whether it's inclusive or exclusive at the boundaries.
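(e.g. something like the following; whether both bounds are actually attainable for BERTScore is left to the implementer to verify:)

class BertScore(SummEvalMetric):
    metric_name = 'bert score'
    # inclusive on both boundaries: scores lie in [0, 1]
    range = (0, 1)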

niansong1996 (Collaborator) left a review:

I left some comments; generally very good work!

niansong1996 (Collaborator):

@zhangir-azerbayev I think one other thing that's missing is testing.

Can you follow what we have in tests and add an eval_test.py with some tests, to make sure everything works as expected?
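(A minimal sketch of what eval_test.py might contain; the import path and the evaluate(predictions, golds) shape follow the assumptions sketched earlier in this thread:)

import unittest

from evaluation.bert_score import BertScore  # hypothetical import path

class TestBertScore(unittest.TestCase):
    def test_scores_fall_within_declared_range(self):
        metric = BertScore()
        scores = metric.evaluate(["the cat sat on the mat"],
                                 ["a cat was sitting on the mat"])
        lo, hi = metric.range
        # every reported score should respect the metric's declared range
        for value in scores.values():
            self.assertGreaterEqual(value, lo)
            self.assertLessEqual(value, hi)

if __name__ == "__main__":
    unittest.main()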

niansong1996 (Collaborator):

The commits look good so far. Let me know when you've resolved all my previous comments by requesting a review; then I can see whether this can be merged into main. Thanks!

niansong1996 (Collaborator):

@zhangir-azerbayev Where are we on this thread?

zhangir-azerbayev (Collaborator, Author):

> @zhangir-azerbayev Where are we on this thread?

Hi @niansong1996. I added a commit with unit testing.

niansong1996 (Collaborator) commented on Jun 17, 2021:

@zhangir-azerbayev Please resolve the comments I made above. Also, there is currently a conflict on demo.ipynb; have you made any important changes to that file?

zhangir-azerbayev (Collaborator, Author):

Should be ready to merge.

niansong1996 (Collaborator):

@zhangir-azerbayev Why is RougeWE still removed? I thought we fixed the loading issue?

zhangir-azerbayev (Collaborator, Author):

@niansong1996 good catch, I fixed it.

niansong1996 (Collaborator):

@zhangir-azerbayev LGTM. Are all of the evaluation metrics passing the tests? If so, I think this is ready to merge.

zhangir-azerbayev (Collaborator, Author):

@niansong1996 Found a minor bug in the testing script. Tests now pass.
