Add text comparison metrics to Ben's PR #632

ntlind · 2024-06-18T20:55:10Z

Improvements

Add ROUGE and BLEU metrics to evaluate_text_generation
Make a few minor suggested changes to the PR
Switch back from python-alpine to python-slim, since alpine doesn't seem to work with Hugging Face's dependencies (e.g., pyarrow). The rationale is listed below. Alternatively, we could stay on alpine and try installing pyarrow's dependencies manually.
- We've repeatedly run into dependency issues with python3.10-alpine: first it was scipy, now it's pyarrow. Switching back means shorter build times and fewer dependency limitations.
- We originally switched to alpine because John said that "ubuntu/debian has historically been pretty bad at patching security issues." This doesn't seem like an overriding concern, and it will matter even less once we've decoupled from Chariot.
- python3.10-slim is a "Docker Official" image; it seems secure enough for our use.
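
For reviewers less familiar with the two metrics named above, here is a rough, dependency-free sketch of what unigram ROUGE and BLEU measure. It is not the code in this PR: the function names (`tokenize`, `rouge1`, `bleu1`) and the `\w+` regex tokenizer are assumptions made for illustration, and it covers only unigrams, whereas full BLEU combines several n-gram orders.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical regex tokenizer: lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())


def _clipped_matches(prediction, reference):
    # Count each predicted token at most as often as it occurs in the reference.
    pred_counts = Counter(tokenize(prediction))
    ref_counts = Counter(tokenize(reference))
    matches = sum((pred_counts & ref_counts).values())
    return matches, sum(pred_counts.values()), sum(ref_counts.values())


def rouge1(prediction, reference):
    # ROUGE-1: unigram overlap reported as precision, recall, and F1.
    matches, n_pred, n_ref = _clipped_matches(prediction, reference)
    precision = matches / n_pred if n_pred else 0.0
    recall = matches / n_ref if n_ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def bleu1(prediction, reference):
    # BLEU restricted to unigrams: clipped precision times a brevity penalty
    # that punishes predictions shorter than the reference.
    matches, n_pred, n_ref = _clipped_matches(prediction, reference)
    if n_pred == 0:
        return 0.0
    brevity_penalty = 1.0 if n_pred > n_ref else math.exp(1 - n_ref / n_pred)
    return brevity_penalty * matches / n_pred


if __name__ == "__main__":
    print(rouge1("the cat sat on the mat", "the cat is on the mat"))
    print(bleu1("the cat", "the cat sat on"))
```

The main difference to keep in mind: ROUGE rewards covering the reference (recall), while BLEU rewards precise predictions and penalizes short ones through the brevity penalty.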

api/tests/functional-tests/backend/metrics/test_text_generation.py

api/valor_api/backend/metrics/text_generation.py

ntlind added 7 commits June 18, 2024 14:54

adding rough examples

7c66f63

push notebook changes

e48825f

pass notebook tests

d2fe366

Merge branch 'add_llm_guided_metrics' into add_text_comparison

fdbfea1

delete notebook and add most functional and unit tests

2e2d69c

fix

26aab2e

Merge branch 'add_llm_guided_metrics' into add_text_comparison

d881034

ntlind self-assigned this Jun 21, 2024

ntlind added the improvement label Jun 21, 2024

fix linter error

112b21b

ntlind marked this pull request as ready for review June 21, 2024 07:12

ntlind requested review from czaloom and ekorman as code owners June 21, 2024 07:12

ntlind added 11 commits June 21, 2024 09:14

try bumping python versions

119d8d8

update numpy

f64820d

add new version pins

9a6f8e4

change .env name

cfb2fdb

another pip attempt

41bfee6

try changing metadata again

4d3b666

more pip attempts

dd9de5b

test moving away from alpine

9489f5c

fix bug

4d29874

fix bug

8af7c3d

Merge branch 'add_llm_guided_metrics' into add_text_comparison

317f088

bnativi reviewed Jun 21, 2024

ntlind added 4 commits June 21, 2024 13:42

incorporate feedback

ad1c8da

download tokenizer

f97c48a

swap to regex tokenizer

d404ec0

fix tests

99fb5c4

ntlind merged commit de9be2e into add_llm_guided_metrics Jun 21, 2024
10 of 11 checks passed

ntlind deleted the add_text_comparison branch June 21, 2024 20:25
