
evaluation refactoring #23

Merged · 20 commits · Jun 17, 2021

Conversation

zhangir-azerbayev (Collaborator, Author):

Modified the evaluation library to better align with style conventions.

One thing I can't figure out is how to import SummModel into base_metric.py for type-annotation purposes. Any help with this is appreciated.


def evaluate(self,
             ## TODO zhangir: figure out how to import SummModel
             model,
zhangir-azerbayev (Collaborator, Author) commented:

What I want to write is some version of model: SummModel, but I'm not sure how to import SummModel.

niansong1996 (Collaborator) replied:

I see. So for this you can do from model.base_model import SummModel.

However, I don't think you should design the evaluate() interface in a way that depends on SummModel. When designing classes, we typically minimize the coupling between them unless it's absolutely necessary. Think about what happens when things change: if we modify SummModel in the future, the change would ripple into all of the evaluation metrics as well, which is not ideal.

My rule of thumb is to think about what the class is designed to do, and nothing more. An evaluation metric doesn't need to know how the models work; it only does the evaluation. It takes the gold and the prediction and computes the score. That's its entire job, nothing more.

If we need a general method that takes in a model and an evaluation metric, it could be a standalone function that we provide outside of any class. Then, no matter whether the metric or the model changes, we only need to update this one function, which is much easier to maintain.
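(For illustration, a minimal sketch of that standalone function; model.summarize() and the evaluate(predictions, golds) signature are assumptions for the sake of the example, not the repo's actual API:)

from typing import Dict, List

def evaluate_model(model, metric, corpus: List[str],
                   golds: List[str]) -> Dict[str, float]:
    # the metric never touches the model; it only consumes strings, so
    # future changes to SummModel cannot ripple into the metrics
    predictions = [model.summarize(doc) for doc in corpus]
    return metric.evaluate(predictions, golds)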


class SummMetric():
    def __init__(self):
        self.score_dict = {}
niansong1996 (Collaborator) commented:

I don't think score_dict should be an attribute of the object. An attribute should be a property of each individual object of the class; for example, think about height or age for humans.

There are also static variables that belong to the class itself. For these, check out base_model.py in model: things like is_query_based and is_multi_document are class (not object) attributes, since they are shared across the whole class. For example, think about the number of eyes for humans.

In the case of SummMetric, something like is_higher_better is a good candidate for a class (static) attribute. I will leave the object attributes (i.e., those with self.*) for you to think about :)
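(A minimal sketch of the class-attribute pattern being described; is_higher_better comes from the comment above, and the other field values are illustrative assumptions:)

class SummMetric:
    # class (static) attributes, shared by every instance; analogous
    # to is_query_based / is_multi_document in model/base_model.py
    metric_name: str = 'base metric'
    is_higher_better: bool = True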

"""
raise NotImplementedError("the base class for metrics shouldn't be instantiated!")

def get_dict(self, keys: List[str]):
niansong1996 (Collaborator) commented:

Following up on my comment on self.score_dict: this method isn't necessary either. It could all just be part of evaluate().
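(A sketch of folding that behavior into evaluate() itself; the signature here is an assumption based on this discussion:)

from typing import Dict, List

class SummMetric:
    def evaluate(self, predictions: List[str],
                 golds: List[str]) -> Dict[str, float]:
        # compute and return the scores directly, instead of storing
        # them in self.score_dict and reading them back via get_dict()
        raise NotImplementedError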

niansong1996 (Collaborator) left a review:

Left some comments; we can discuss them tomorrow.

zhangir-azerbayev (Collaborator, Author):

Hi Ansong,
I fixed the issues raised in your comments with a new commit. Let me know what you think. I've also added a new class called SummEvalMetric, which inherits from SummMetric and is specifically for metrics that use SummEval as a backend (as of right now, that's all of them).
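(Roughly, the hierarchy being described; the bodies are elided here since the actual adapter code lives in summeval_metric.py:)

class SummEvalMetric(SummMetric):
    # common layer for metrics whose computation is delegated to the
    # SummEval package (currently every metric in the library)
    ...

class BertScore(SummEvalMetric):
    ...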

niansong1996 (Collaborator):

@zhangir-azerbayev Did you forget to check summeval_metric.py into git? It seems to be missing on my end.

Also, there are some relative imports, which we should try to avoid.
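(For example, the import-style change being asked for; the evaluation package name here is a guess at the repo layout:)

# avoid the relative form:
#   from .base_metric import SummMetric
# and use the absolute form instead:
from evaluation.base_metric import SummMetric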

Other than those, I think the class design makes sense to me and is much better than last time :)

zhangir-azerbayev (Collaborator, Author):

@niansong1996
Sorry, I forgot to check in summeval_metric.py; that's fixed now. I also got rid of the relative imports.


class BertScore(SummEvalMetric):
    metric_name = 'bert score'
    range = (0, 1)
niansong1996 (Collaborator) commented:

This is a nice design. I would also add a comment above the range variable saying whether it's inclusive or exclusive at the boundaries.
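(e.g. something like the following; whether both bounds are actually attainable for BERTScore is left to the implementer to verify:)

class BertScore(SummEvalMetric):
    metric_name = 'bert score'
    # inclusive on both boundaries: scores lie in [0, 1]
    range = (0, 1)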

niansong1996 (Collaborator) left a review:

I left some comments; generally very good work!

niansong1996 (Collaborator):

@zhangir-azerbayev I think one other thing that's missing is testing.

Can you follow what we have in tests and add an eval_test.py with some tests, to make sure everything works as expected?
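(A minimal sketch of what eval_test.py might contain; the import path and the evaluate(predictions, golds) shape follow the assumptions sketched earlier in this thread:)

import unittest

from evaluation.bert_score import BertScore  # hypothetical import path

class TestBertScore(unittest.TestCase):
    def test_scores_fall_within_declared_range(self):
        metric = BertScore()
        scores = metric.evaluate(["the cat sat on the mat"],
                                 ["a cat was sitting on the mat"])
        lo, hi = metric.range
        # every reported score should respect the metric's declared range
        for value in scores.values():
            self.assertGreaterEqual(value, lo)
            self.assertLessEqual(value, hi)

if __name__ == "__main__":
    unittest.main()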

niansong1996 (Collaborator):

The commits look good so far. Let me know when you've resolved all my previous comments by requesting a review; then I can see whether this can be merged into main. Thanks!

niansong1996 (Collaborator):

@zhangir-azerbayev Where are we on this thread?

zhangir-azerbayev (Collaborator, Author):

> @zhangir-azerbayev Where are we on this thread?

Hi @niansong1996. I added a commit with unit testing.

niansong1996 (Collaborator) commented on Jun 17, 2021:

@zhangir-azerbayev Please resolve the comments I made above. Also, there is currently a conflict on demo.ipynb; have you made any important changes to that file?

zhangir-azerbayev (Collaborator, Author):

Should be ready to merge.

niansong1996 (Collaborator):

@zhangir-azerbayev Why is RougeWE still removed? I thought we fixed the loading issue?

zhangir-azerbayev (Collaborator, Author):

@niansong1996 good catch, I fixed it.

niansong1996 (Collaborator):

@zhangir-azerbayev LGTM. Are all of the evaluation metrics passing the tests? If so, I think this is ready to merge.

zhangir-azerbayev (Collaborator, Author):

@niansong1996 Found a minor bug in the testing script. Tests now pass.
