text2text model evaluation not working #41

Closed
Khalid-Usman opened this issue Jul 27, 2021 · 13 comments
Labels
bug Something isn't working

Comments

@Khalid-Usman

Description

Model evaluation does not work; it fails before outputting precision and recall.

How to Reproduce?

I ran the following command:

python3 -m pecos.apps.text2text.evaluate --pred-path ./test-prediction.txt --truth-path ./test.txt --text-item-path ./output-labels.txt

where:
--pred-path is the path of the file produced during model prediction,
--truth-path is the path of the test file, e.g. "Out1, Out2, Out3 \t cheap door", where Out1, Out2 and Out3 are line numbers in the label file (an illustration follows below),
--text-item-path is the label file, here ./output-labels.txt.
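For illustration only, a hypothetical layout of these files (the exact ID convention, e.g. 0-based vs. 1-based line numbers, is an assumption and should be checked against the PECOS text2text docs):

./output-labels.txt (one label string per line; the line number is the label ID):
    door hinge
    cheap wooden door
    door handle

./test.txt (comma-separated label IDs, a tab, then the input text):
    0,1,2 \t cheap door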

What have you tried to solve it?

Error message or code output

Traceback (most recent call last):
  File "/home/khalid/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/khalid/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/pecos/apps/text2text/evaluate.py", line 130, in <module>
    do_evaluation(args)
  File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/pecos/apps/text2text/evaluate.py", line 119, in do_evaluation
    Y_true = smat.csr_matrix((val_t, (row_id_t, col_id_t)), shape=(num_samples_t, len(item_dict)))
  File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/scipy/sparse/compressed.py", line 55, in __init__
    dtype=dtype))
  File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/scipy/sparse/coo.py", line 196, in __init__
    self._check()
  File "/home/khalid/PECOS/pecos_venv/lib/python3.7/site-packages/scipy/sparse/coo.py", line 285, in _check
    raise ValueError('column index exceeds matrix dimensions')
ValueError: column index exceeds matrix dimensions

Environment

  • Operating system:
  • Python version:
  • PECOS version:

(Add as much information about your environment as possible, e.g. dependencies versions.)

Khalid-Usman added the bug label on Jul 27, 2021
@OctoberChang
Contributor

@Khalid-Usman, from the error message it seems that a label_id in your ./test.txt is out of the range defined by your ./output-labels.txt.

Can you verify that the label_ids in ./test.txt are within the range of ./output-labels.txt?
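A minimal sketch of such a check (a hypothetical helper, not part of PECOS; it assumes the label IDs in ./test.txt are comma-separated 0-based integers before the tab that index lines of ./output-labels.txt, so adjust the parsing if your format differs):

# check_label_range.py -- hypothetical helper, not part of PECOS
with open("./output-labels.txt", encoding="utf-8") as f:
    num_labels = sum(1 for _ in f)

bad_lines = []
with open("./test.txt", encoding="utf-8") as f:
    for line_no, line in enumerate(f, start=1):
        label_part = line.split("\t", 1)[0]
        ids = [int(x) for x in label_part.split(",") if x.strip()]
        if any(i < 0 or i >= num_labels for i in ids):
            bad_lines.append(line_no)

print(num_labels, "labels;", len(bad_lines), "test lines with out-of-range IDs")
print("first offending lines:", bad_lines[:10])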

@Khalid-Usman
Author

I don't think so, but let me double check...

Also, instead of this I used the following command and got precision/recall. Is that correct?

python3 -m pecos.apps.text2text.evaluate --pred-path ./test-prediction.txt --truth-path ./test.txt

@OctoberChang
Contributor

If --text-item-path is not provided, the evaluation script assumes your ground truth file ./test.txt has format 1 (https://github.com/amzn/pecos/blob/mainline/pecos/apps/text2text/evaluate.py#L43). If the label IDs in ./test.txt and ./train.txt are aligned, then it should be fine.

@Khalid-Usman
Author

@OctoberChang I verified: there were a few label_ids in ./test.txt that are greater than the maximum index value of the ./output-labels.txt file. So I removed those label_ids from the ./test.txt file, and I am still getting the same error.

@Khalid-Usman
Author

Khalid-Usman commented Jul 28, 2021

@OctoberChang there is something wrong in the evaluation code.

I tried to debug, and for the ground-truth items the error "column index exceeds matrix dimensions" is raised at the following line:

Y_true = smat.csr_matrix((val_t, (row_id_t, col_id_t)), shape=(num_samples_t, len(item_dict)))

I printed each variable and got,

len(item_dict) = 2580153
num_samples_t = 132715
len(val_t) = 273316
len(row_id_t) = 273316
len(col_id_t) = 273316

Moreover,

num_samples_t = num_samples_p

@OctoberChang
Contributor

@Khalid-Usman, you should print max(col_id_t) and check whether max(col_id_t) < len(item_dict). If not, then some label_id is still out of scope of the pre-defined label set specified in your ./output-labels.txt.
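For reference, a minimal sketch showing when SciPy raises this exact error: the CSR constructor fails whenever some column index is not strictly smaller than the declared number of columns (here num_cols stands in for len(item_dict)):

import numpy as np
import scipy.sparse as smat

num_cols = 5                        # stand-in for len(item_dict)
row_id = np.array([0, 0, 1])
col_id = np.array([1, 4, 5])        # 5 >= num_cols, i.e. out of range
val = np.array([1.0, 1.0, 1.0])

try:
    Y = smat.csr_matrix((val, (row_id, col_id)), shape=(2, num_cols))
except ValueError as e:
    print(e)                        # column index exceeds matrix dimensions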

@Khalid-Usman
Author

@OctoberChang, yes, max(col_id_t) < len(item_dict), but I found the max index in the ./output-labels.txt file and removed all rows from ./test.txt containing indices greater than that max index.

So I don't think there exists any label_id that has no corresponding line number in ./output-labels.txt.

Thanks, please verify that.

@OctoberChang
Contributor

@Khalid-Usman, can you try evaluating just the first line of your prediction ./test-prediction.txt against the first line of the ground truth ./test.txt, given the pre-defined label set ./output-labels.txt?

If this still doesn't work, you can share the first line of those two files, as well as output-labels.txt, with me and I can take a look.

@Khalid-Usman
Author

Khalid-Usman commented Jul 29, 2021

@OctoberChang what is the score in ./test-prediction.txt? E.g., for my query I got 20 related documents sorted in descending order by score. Are these 20 retrieved documents actually the precision for top-k (top-20)? Or does this score have nothing to do with precision, so we have to calculate it ourselves by matching these documents with the ground truth?

@OctoberChang
Contributor

The prediction score matrix (i.e., Y_pred) from text2text models is not related to the ground truth labels.
In other words, to get precision@k (a function of Y_true and Y_pred), you also need to match Y_pred with Y_true.
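A minimal sketch of that matching, assuming Y_true and Y_pred are scipy.sparse CSR matrices of shape (num_samples, num_labels) with prediction scores stored in Y_pred; this is a generic precision@k computation, not the exact PECOS implementation:

import numpy as np

def precision_at_k(Y_true, Y_pred, k=10):
    # fraction of the top-k predicted labels per row that also appear
    # in the corresponding ground-truth row, averaged over rows
    hits = 0
    for i in range(Y_pred.shape[0]):
        row = Y_pred.getrow(i)
        order = np.argsort(-row.data)[:k]   # positions of the k highest scores
        topk = row.indices[order]           # their label indices
        true_set = set(Y_true.getrow(i).indices)
        hits += sum(1 for j in topk if j in true_set)
    return hits / (k * Y_pred.shape[0])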

@simonhughes22
Copy link

simonhughes22 commented Jul 30, 2021

I found the recall numbers output by that function to be wildly different from what I was expecting, so I did the calculation myself and got much higher numbers. If a query has 4 labels and the top result is in the recall set, do you compute recall@1 as 1.0 or 0.25? It could also be that I am having a similar label misalignment issue.

@OctoberChang
Contributor

OctoberChang commented Jul 30, 2021

@simonhughes22, in your example Recall@1 should be 0.25 while Prec@1 is 1.0. See the definitions of Prec@k and Recall@k at http://manikvarma.org/downloads/XC/XMLRepository.html#metrics
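A quick numeric check of that example (hypothetical label IDs), where the query has 4 true labels and the single top prediction is relevant:

true_labels = {3, 17, 42, 99}          # the query's 4 ground-truth labels
top1 = [42]                            # top-1 prediction, which is relevant

hits = sum(1 for label in top1 if label in true_labels)
prec_at_1 = hits / 1                   # 1.0
recall_at_1 = hits / len(true_labels)  # 0.25
print(prec_at_1, recall_at_1)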

@OctoberChang
Contributor

Closing this issue. Feel free to reopen if you still have any questions related to the text2text evaluation module.
