CDQA: F1-recall

Dimension: Generated Answer <-> GroundTruth Answer
Reference: Let LLMs Take on the Latest Challenges! A Chinese Dynamic Question Answering Benchmark
Type: Token-wise Accuracy

F1-recall measures the overlap between model-generated responses and ground truth, focusing on the model's ability to reproduce key elements from the reference.

Tokenization: Both the generated text and ground truth are segmented into token lists using word segmentation tools.
Calculation: Count the tokens shared by the model's output and the ground truth, then divide by the number of ground truth tokens, giving the fraction of reference tokens that the model reproduced.
Formula: F1-recall = (Number of common tokens) / (Total number of tokens in ground truth)
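
The tokenization, calculation, and formula steps above can be sketched in Python as follows. This is a minimal illustration, not the benchmark's reference implementation: the function names `f1_recall` and `whitespace_tokenize` are hypothetical, the whitespace tokenizer is a stand-in for a real word segmentation tool, and counting common tokens as a multiset overlap (each ground truth token matched at most once) is an assumption, since the source does not say how repeated tokens are handled.

```python
from collections import Counter
from typing import Callable, List


def whitespace_tokenize(text: str) -> List[str]:
    """Placeholder tokenizer (assumption): split on whitespace."""
    return text.split()


def f1_recall(
    prediction: str,
    ground_truth: str,
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
) -> float:
    """Return (number of common tokens) / (number of ground truth tokens)."""
    pred_tokens = tokenize(prediction)
    truth_tokens = tokenize(ground_truth)

    # An empty reference has no tokens to recall; return 0.0 by convention (assumption).
    if not truth_tokens:
        return 0.0

    # Multiset intersection: a token counts as common at most as many times
    # as it appears in both lists (assumption about duplicate handling).
    common = Counter(pred_tokens) & Counter(truth_tokens)
    return sum(common.values()) / len(truth_tokens)


if __name__ == "__main__":
    # Two of the three ground truth tokens ("the", "cat") appear in the prediction.
    print(f1_recall("the cat sat on the mat", "the cat slept"))
```

For Chinese text, a word segmenter could be passed as the `tokenize` argument, for example `jieba.lcut` if the `jieba` package is installed; which segmentation tool the benchmark itself uses is not stated here.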
