
Add text generation task type #569

Status: Merged (162 commits, merged on Jul 26, 2024)
162 commits
1ad92be
copy of Nicks draft for llm guided metric integration tests
Apr 19, 2024
e5cc8ad
updates to llm guided integration test
Apr 22, 2024
73b09d4
Merge branch 'main' into add_llm_guided_metrics
Apr 22, 2024
f03fba5
typo in llm guided integration test
Apr 22, 2024
283eb6e
LLM_EVALUATION task type, evaluate_llm_output(), first draft of test_…
Apr 23, 2024
0cc5f11
Merge branch 'main' into add_llm_guided_metrics
Apr 23, 2024
0615a8b
Merge branch 'main' into add_llm_guided_metrics
Apr 24, 2024
660ee71
llm_url and llm_api_key added in multiple places in client, api and i…
Apr 24, 2024
36a9a17
finalize integration test data llm_evaluation
Apr 25, 2024
d7b53b4
Merge branch 'main' into add_llm_guided_metrics
Apr 25, 2024
d2f2ebe
llm_evaluation functional tests
Apr 26, 2024
c054b47
all llm evaluation metrics added
Apr 26, 2024
0ac0f4c
commit with all tests except new tests passing
Apr 29, 2024
af90c95
Merge branch 'main' into add_llm_guided_metrics
Apr 29, 2024
aa9c007
first draft of coherence working
May 3, 2024
e1ae78c
Merge branch 'main' into add_llm_guided_metrics
May 3, 2024
b0657d1
Merge branch 'main' into add_llm_guided_metrics
May 6, 2024
631f3f9
updated metric specifications to metrics_to_return
May 6, 2024
a761d6c
add OpenAIClient class
May 6, 2024
b041eb1
fix on client side to match with metrics_to_return in EvaluationParam…
May 6, 2024
7b3d62e
fix type error
May 6, 2024
99d880b
add mocking for coherence
May 6, 2024
52acfa4
add test for openai api connection
May 8, 2024
36eb7dd
remove Metric from metric names, improved error handling in coherence
May 8, 2024
15c4030
mock openai api connection
May 8, 2024
c9a29cb
Merge branch 'main' into add_llm_guided_metrics
May 8, 2024
72aaa7e
use grouper_id = label.key for llm evaluation
May 8, 2024
d31a26c
change CoherenceMetric from using label_key to using label
May 8, 2024
cbe79ec
Merge branch 'main' into add_llm_guided_metrics
May 8, 2024
ddcb7e3
Merge branch 'main' into add_llm_guided_metrics
May 10, 2024
ffef625
text generation examples
May 15, 2024
5a4fee8
Merge branch 'main' into add_llm_guided_metrics
May 15, 2024
95b25dc
rename misnamed qa folder to text-generation
May 15, 2024
544b401
remove .txt from example
May 15, 2024
8a3a159
add all three use case examples for text generation
May 16, 2024
061065c
update text generation examples
May 17, 2024
63d10a8
Merge branch 'main' into add_llm_guided_metrics
May 17, 2024
c7f923a
rename llm_evaluation to text_generation
May 17, 2024
92abc09
update datum and annotation to work with text gen, functional tests p…
May 22, 2024
c927e3f
add client side unit tests for Annotation and Datum
May 22, 2024
9224ce7
fix github checks error for Context
May 22, 2024
0e701f4
attempt number two to fix github checks issue with Context
May 22, 2024
168867f
remove unused create_datums_with_dataset_names() method
May 22, 2024
092bf7a
rename QAG to Summarization metric
May 22, 2024
59fca69
add LLMClient parent class
May 23, 2024
45118d3
add MistralValorClient
May 23, 2024
9ea2b82
add mistralai client dependency
May 23, 2024
de9911b
add support for mistral client in valor_api text_generation.py
May 24, 2024
8fd9ada
fix type error for EvaluationParameters.llm_api_params
May 24, 2024
db85df5
temporarily comment out client text gen test until ready
May 29, 2024
c57a464
Merge branch 'main' into add_llm_guided_metrics
ntlind Jun 13, 2024
ae39f0a
fix all tests except test_text_generation.py
ntlind Jun 13, 2024
b1f8405
Merge branch 'main' into add_llm_guided_metrics
ntlind Jun 14, 2024
191cee2
Merge branch 'main' into add_llm_guided_metrics
ntlind Jun 17, 2024
bd4f92a
fix text generation functional tests
Jun 17, 2024
70ecee7
add metric_params to EvaluationParameters in api and client
Jun 17, 2024
3616e5e
Merge branch 'main' into add_llm_guided_metrics
Jun 17, 2024
de34b88
Merge branch 'main' into add_llm_guided_metrics
ntlind Jun 18, 2024
334442b
text gen metric names added to client
Jun 18, 2024
b4f7fb5
Merge remote-tracking branch 'refs/remotes/origin/add_llm_guided_metr…
Jun 18, 2024
6c11315
fix Filter issues in text generation
Jun 18, 2024
118dafe
fix filter issue for text generation in coretypes
Jun 18, 2024
091df90
remove label_id from db_mapping of text gen metrics
Jun 18, 2024
53d51f4
isort pre-commit error in evaluation.py
Jun 18, 2024
dd476c1
text gen integration test passing
Jun 18, 2024
9ee7c50
mock client integration test for text generation
Jun 20, 2024
dd7188a
cleanup text generation code
Jun 21, 2024
34520b3
add minor PR improvements
Jun 21, 2024
de9be2e
Add text comparison metrics to Ben's PR (#632)
ntlind Jun 21, 2024
aa19435
remove a few todos in llm_call.py
Jun 21, 2024
cbcbcc8
Merge branch 'add_llm_guided_metrics' of github.com:Striveworks/valor…
Jun 21, 2024
9bef346
move llm_client and fix pre-commit config and import issue
ntlind Jun 25, 2024
543bea8
part 1 address Nicks comments
Jun 25, 2024
70b44fa
rename llm_call to llm_clients
Jun 25, 2024
faee691
Increase code coverage to 90% (#633)
ntlind Jun 25, 2024
91b2f6a
add answer relevance metric to api
Jun 26, 2024
155d343
add answer relevance to integration test
Jun 26, 2024
a17d78d
Merge branch 'main' into add_llm_guided_metrics
ntlind Jun 26, 2024
7c32c64
additional test coverage for answer relevance metric
Jun 26, 2024
2b87c9d
Merge branch 'add_llm_guided_metrics' of github.com:Striveworks/valor…
Jun 26, 2024
eb8d171
test mistral and invalid clients in text_generation.py
Jun 26, 2024
58167d3
remove unimplemented metrics
Jun 26, 2024
93aed41
Add docs (#634)
ntlind Jun 26, 2024
5f3adbe
cleanup of text_generation.py, validation for AnswerRelevance and Coh…
Jun 26, 2024
58c13cf
partial support for BLEU and ROUGE metrics in client
Jun 26, 2024
efd4be5
cut down example notebook for text generation
Jun 26, 2024
ca5c273
test BLEU and ROUGE metrics in client, test invalid api key
Jun 26, 2024
3a6e20b
Coherence and AnswerRelevance added to the docs
Jun 26, 2024
0d0ec90
fix client.connect() test for llm guided metrics
Jun 26, 2024
e9b0655
Merge branch 'main' into add_llm_guided_metrics
Jun 27, 2024
7b99188
VALOR_VERSION -> VERSION
ntlind Jun 27, 2024
da79792
cleanup for answer relevance llm response error handling, testing for…
Jun 28, 2024
4c251a7
Merge branch 'add_llm_guided_metrics' of github.com:Striveworks/valor…
Jun 28, 2024
c372803
unit tests for AnswerRelevance and Coherence
Jun 28, 2024
d7d6366
MockLLMClient added and used in integration tests, BLEU and ROUGE add…
Jul 1, 2024
52162e5
remove EvaluationParameters.metric_params
Jul 1, 2024
966c044
add ROUGE parameters
Jul 1, 2024
7e4fb48
remove asserts from text gen production code
Jul 1, 2024
30b5dde
remove Text and Context
Jul 1, 2024
ffa66c8
fix pre-commit issue in test_text_generation.py
Jul 1, 2024
a39fc91
add integration tests for openai and mistral external api calls
Jul 1, 2024
b95f1cf
testing external tests 1
Jul 2, 2024
51f8a16
testing external tests 2
Jul 2, 2024
6e8b001
testing external tests 3
Jul 2, 2024
47d4594
testing external tests 3
Jul 2, 2024
913b4e8
testing external tests 5
Jul 2, 2024
40a4690
testing external tests failing
Jul 2, 2024
c7eb936
secrets in tests-and-coverage.yml
Jul 2, 2024
5cbfd7a
add manual api key specification in llm_api_params
Jul 2, 2024
e9ccbad
improve code coverage on valor_api text_generation.py
Jul 2, 2024
1a384fb
test that external integrations tests dont run on other branch pushes
Jul 2, 2024
95864e9
external integration tests fail when no OPENAI_API_KEY is set
Jul 2, 2024
fc06423
the normal integration tests cant access the external API keys
Jul 2, 2024
a7e9742
rename files, integration tests should pass
Jul 2, 2024
ac5ae43
code coverage and cleanup
Jul 2, 2024
5206e15
Merge branch 'main' into add_llm_guided_metrics
Jul 3, 2024
ee24191
improved validation for llm messages
Jul 4, 2024
3451558
merge with main
Jul 4, 2024
d1f0c05
Merge branch 'main' into add_llm_guided_metrics
Jul 5, 2024
bfaa830
metric_params for evaluate_text_generation() on client side, other co…
Jul 5, 2024
696dcbb
Merge branch 'main' into add_llm_guided_metrics
Jul 5, 2024
903bf1e
fix in test_groundtruth.py
Jul 5, 2024
b8b9864
update metric docs
Jul 5, 2024
e9b088c
add Context class for Annotation.context
Jul 8, 2024
0498612
Merge branch 'main' into add_llm_guided_metrics
Jul 8, 2024
8232bbd
example notebook working for RAG and summarization
Jul 8, 2024
096b1fe
merge with main
Jul 8, 2024
aec4b63
add ROUGEType to enums, cleanup tests, address some comments
Jul 9, 2024
4461286
Merge branch 'main' into add_llm_guided_metrics
Jul 9, 2024
28e08b7
addressing comments
Jul 9, 2024
3f99c62
fix outdated code in coretypes.py where datasets was optional
Jul 9, 2024
420b244
fix tex gen metrics didnt work if no groundtruth labels
Jul 9, 2024
530cca1
text gen example notebook working
Jul 9, 2024
2579971
update external test mistral model to latest
Jul 10, 2024
637ac23
merge with main
Jul 10, 2024
f344d77
address comments with Nick
Jul 10, 2024
67f27d6
Merge branch 'main' into add_llm_guided_metrics
Jul 10, 2024
f3cabc6
improvements to text gen integration tests
Jul 10, 2024
ba5eb41
improvements to example notebook, test_text_generation.py
Jul 10, 2024
7cdaa37
fix filters for text generation
Jul 11, 2024
0375bae
merge with main, datum filtering is no longer functional for text gen…
Jul 12, 2024
e7d2179
add datum_filtering back into text generation metrics
Jul 12, 2024
d87c05e
try removing slim for alpine
Jul 15, 2024
e7823b0
switch back to slim
Jul 15, 2024
1acda02
revert valor_api/backend/core/evaluation.py
Jul 16, 2024
53b87fc
comment out all text gen tests
Jul 16, 2024
4014a62
revert valor_api/backend/models.py and migrations
Jul 16, 2024
02cbd52
Revert "revert valor_api/backend/models.py and migrations"
Jul 18, 2024
bbe0458
Revert "revert valor_api/backend/core/evaluation.py"
Jul 18, 2024
e13a1fd
set object detection benchmark timeout at 60
Jul 18, 2024
88e8e0c
uncomment text gen tests
Jul 18, 2024
e601217
Merge branch 'main' into add_llm_guided_metrics
Jul 18, 2024
be74b2a
Add more llm guided metrics (#677)
bnativi Jul 25, 2024
03ddb6b
merge with main
Jul 25, 2024
fcc277b
set object detection benchmark timeout back to the same time as main
Jul 25, 2024
5527262
reverted filter preparation
czaloom Jul 25, 2024
4d3e1c3
Merge branch 'add_llm_guided_metrics' of github.com:Striveworks/valor…
czaloom Jul 25, 2024
6ea9d01
code coverage
Jul 26, 2024
67442ca
Merge branch 'text_comparison_llm_guided_metrics' into add_llm_guided…
Jul 26, 2024
3755cac
revert removal of _format_context()
Jul 26, 2024
0baa229
Revert "Merge branch 'add_llm_guided_metrics' of github.com:Strivewor…
Jul 26, 2024
efbbe20
revert to last stable branch
Jul 26, 2024
11 changes: 10 additions & 1 deletion .github/workflows/tests-and-coverage.yml
@@ -65,7 +65,16 @@ jobs:
  - name: install client
  run: pip install -e ".[test]"
  working-directory: ./client
- - run: coverage run --source="api/valor_api,client/valor" -m pytest -v integration_tests/client/*
+ - name: run integration tests
+ run: coverage run --source="api/valor_api,client/valor" -m pytest -v integration_tests/client/*
+ - name: run external integration tests
+ run: |
+ if ${{ github.ref == 'refs/heads/main' }}; then
+ coverage run --source="api/valor_api,client/valor" -m pytest -v integration_tests/external/*
+ fi
+ env:
+ OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+ MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
  - run: coverage report
  - name: upload coverage report as artifact
  uses: actions/upload-artifact@v3
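The external step runs only when `github.ref` is `refs/heads/main`. GitHub Actions substitutes the `${{ ... }}` expression before the shell runs, so the script's first line becomes `if true; then` or `if false; then`. The step also forwards `OPENAI_API_KEY` and `MISTRAL_API_KEY` from repository secrets. As a hedged illustration only, the Python sketch below mirrors that gating so the same check could be applied locally (for example from a test `conftest.py`); all function names here are hypothetical and are not part of valor.

```python
import os
from typing import Mapping, Optional

# Secrets the workflow step forwards to the external integration tests.
REQUIRED_KEYS = ("OPENAI_API_KEY", "MISTRAL_API_KEY")


def should_run_external_tests(ref: str) -> bool:
    """Mirror the workflow condition: only run for refs/heads/main."""
    return ref == "refs/heads/main"


def missing_api_keys(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required key names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def check_external_test_env(
    ref: str, env: Optional[Mapping[str, str]] = None
) -> bool:
    """Return True if external tests should run; raise if keys are missing.

    Hypothetical helper: failing loudly (instead of skipping) is a design
    assumption of this sketch, not documented valor behavior.
    """
    if not should_run_external_tests(ref):
        return False
    missing = missing_api_keys(env)
    if missing:
        raise RuntimeError(f"missing API keys: {', '.join(missing)}")
    return True
```

Raising instead of skipping means a missing secret on `main` fails the job visibly rather than silently running zero external tests.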
6 changes: 6 additions & 0 deletions .pre-commit-config.yaml
@@ -51,4 +51,10 @@ repos:
  "GeoAlchemy2",
  "psycopg2-binary",
  "pgvector",
+ "openai",
+ "mistralai",
+ "absl-py",
+ "nltk",
+ "rouge_score",
+ "evaluate",
  ]
3 changes: 3 additions & 0 deletions Makefile
@@ -32,3 +32,6 @@ start-server:

  integration-tests:
  python -m pytest -v ./integration_tests/client
+
+ external-integration-tests:
+ python -m pytest -v ./integration_tests/external
7 changes: 3 additions & 4 deletions api/Dockerfile
@@ -1,8 +1,8 @@
- FROM python:3.10-alpine
+ FROM python:3.10-slim

  ARG VERSION="0.0.0-dev"

- RUN apk add --update --no-cache build-base libpq-dev gcc libffi-dev
+ RUN apt-get update && apt-get install build-essential libpq-dev -y

  COPY ./pyproject.toml /src/

@@ -13,6 +13,5 @@ RUN python -m pip install -U pip
  # git and put .git (which setuptools_scm needs to determine the version) in the container
  RUN SETUPTOOLS_SCM_PRETEND_VERSION=${VERSION} python -m pip install .
  COPY ./valor_api /src/valor_api
- RUN apk del build-base
  USER 65532:65532

  CMD ["uvicorn", "valor_api.main:app", "--host", "0.0.0.0", "--log-level", "warning"]
6 changes: 6 additions & 0 deletions api/pyproject.toml
@@ -17,6 +17,12 @@ dependencies = [
  "pydantic-settings",
  "structlog",
  "pgvector",
+ "openai",
+ "mistralai",
+ "absl-py",
+ "nltk",
+ "rouge_score",
+ "evaluate"
  ]

  [build-system]
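The new runtime dependencies `nltk`, `rouge_score`, and `evaluate` are commonly used for BLEU and ROUGE, which the commits above add as text-comparison metrics. For intuition about what ROUGE measures, here is a dependency-free sketch of ROUGE-1 (clipped unigram overlap) precision, recall, and F-measure. It is not valor's implementation: the lowercase whitespace tokenization is an assumption, and `rouge_score` applies its own tokenizer (and optional stemming), so its scores can differ.

```python
from collections import Counter
from typing import NamedTuple


class RougeScore(NamedTuple):
    precision: float
    recall: float
    fmeasure: float


def rouge1(reference: str, prediction: str) -> RougeScore:
    """ROUGE-1: clipped unigram overlap between a prediction and a reference."""
    # Assumed tokenization: lowercase, split on whitespace.
    ref_counts = Counter(reference.lower().split())
    pred_counts = Counter(prediction.lower().split())
    # Counter intersection keeps min(count_ref, count_pred) per token (clipping).
    overlap = sum((ref_counts & pred_counts).values())
    if overlap == 0:
        # Also covers empty inputs, avoiding division by zero.
        return RougeScore(0.0, 0.0, 0.0)
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    fmeasure = 2 * precision * recall / (precision + recall)
    return RougeScore(precision, recall, fmeasure)
```

For example, `rouge1("the cat sat on the mat", "the cat is on the mat")` shares five unigrams (`the` twice, `cat`, `on`, `mat`) out of six on each side, so precision, recall, and F-measure are all 5/6 (about 0.833).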