Merged

41 commits
b8fea7c: cli runner for kunji benchmark (EdmundKorley, May 29, 2025)
1b4c0da: Refactor bench command to support multiple datasets (EdmundKorley, May 29, 2025)
4dfd55b: Update bench CLI configuration and naming (EdmundKorley, May 29, 2025)
2829fe7: Add responses API route and benchmark command (EdmundKorley, Jun 2, 2025)
bbc4453: Add RAG Benchmark workflow (EdmundKorley, Jun 2, 2025)
d85c3e1: Enhance benchmark workflow and CLI command (EdmundKorley, Jun 2, 2025)
d9e11c1: workflow with server setup and data seeding (EdmundKorley, Jun 2, 2025)
9c7ee5e: rm extra (EdmundKorley, Jun 2, 2025)
0aea62d: prestart.sh takes care of this (EdmundKorley, Jun 2, 2025)
1f80184: verbose logs (EdmundKorley, Jun 2, 2025)
b987720: try to get prestart failure logs (EdmundKorley, Jun 2, 2025)
87674ad: prestart fixes (EdmundKorley, Jun 2, 2025)
ffd8a92: failure logs as separate step (EdmundKorley, Jun 2, 2025)
e76b2a0: env vars not making way to docker (EdmundKorley, Jun 2, 2025)
7501e61: cp (EdmundKorley, Jun 2, 2025)
d3c11a3: debug (EdmundKorley, Jun 2, 2025)
88b0270: set environment (EdmundKorley, Jun 2, 2025)
3fb44ab: debug (EdmundKorley, Jun 2, 2025)
e53e9a1: debug credentials creation (EdmundKorley, Jun 2, 2025)
f93343a: add sleep (EdmundKorley, Jun 2, 2025)
7fa48ee: inline project key (EdmundKorley, Jun 2, 2025)
f641892: debug (EdmundKorley, Jun 2, 2025)
651fa48: one shot (EdmundKorley, Jun 2, 2025)
5357fb4: backend logs on failure (EdmundKorley, Jun 2, 2025)
0bd2aaf: more debug (EdmundKorley, Jun 2, 2025)
fa3ff46: debug (EdmundKorley, Jun 2, 2025)
9bb6922: mo debug (EdmundKorley, Jun 2, 2025)
8f34120: timeout minutes (EdmundKorley, Jun 2, 2025)
ef0b578: debug (EdmundKorley, Jun 2, 2025)
2694151: debug (EdmundKorley, Jun 2, 2025)
cbfc8c4: debug (EdmundKorley, Jun 2, 2025)
38193e1: :facepalm: (EdmundKorley, Jun 2, 2025)
c58ff68: patch artifact upload (EdmundKorley, Jun 2, 2025)
bbfe1ef: copy artifact from docker to runner (EdmundKorley, Jun 2, 2025)
f2543db: fix docker file cp & linter (EdmundKorley, Jun 2, 2025)
1a831bf: add back sleep (EdmundKorley, Jun 2, 2025)
9c73202: up count and patching costing (EdmundKorley, Jun 2, 2025)
d54de56: add bench results to job summary (EdmundKorley, Jun 2, 2025)
f569263: add mean duration to step summary header (EdmundKorley, Jun 2, 2025)
15b44b6: specific order to make sense to compare (EdmundKorley, Jun 2, 2025)
7e98278: up iterations to 100 (EdmundKorley, Jun 2, 2025)
94 changes: 94 additions & 0 deletions .github/workflows/benchmark.yml
@@ -0,0 +1,94 @@
name: RAG Benchmark

run-name: RAG Benchmark by ${{ github.actor }}

on:
  workflow_dispatch:

jobs:
  benchmark:
    environment: main

    runs-on: ubuntu-latest

    strategy:
      matrix:
        dataset: [kunji, sneha]
        service: [assistants, responses]
        count: [100]

    env:
      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      LANGFUSE_PUBLIC_KEY: ${{ secrets.LANGFUSE_PUBLIC_KEY }}
      LANGFUSE_SECRET_KEY: ${{ secrets.LANGFUSE_SECRET_KEY }}
      LANGFUSE_HOST: ${{ secrets.LANGFUSE_HOST }}
      LOCAL_CREDENTIALS_ORG_OPENAI_API_KEY: ${{ secrets.LOCAL_CREDENTIALS_ORG_OPENAI_API_KEY }}
      LOCAL_CREDENTIALS_API_KEY: ${{ secrets.LOCAL_CREDENTIALS_API_KEY }}

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - run: |
          cp .env.example .env
          sed -i 's/changethis/secret123/g' .env

      - name: Run server
        run: |
          docker compose up -d
          sleep 10

      - name: prestart logs on failure
        if: failure()
        run: |
          docker compose logs -f prestart
          exit 1

      - name: Create local credentials
        run: |
          curl -X POST "http://localhost:8000/api/v1/credentials/" \
            -H "Content-Type: application/json" \
            -H "X-API-KEY: ${{ env.LOCAL_CREDENTIALS_API_KEY }}" \
            -d '{
Comment on lines +47 to +52
Collaborator: do we also need to run seeder before adding credentials for organization_id 1?
Collaborator (Author): seeder is run in prestart in docker compose up.
Collaborator: ok cool
              "organization_id": 1,
              "project_id": 1,
              "is_active": true,
              "credential": {
                "openai": {
                  "api_key": "${{ env.LOCAL_CREDENTIALS_ORG_OPENAI_API_KEY }}"
                }
              }
            }'

      - name: Run benchmark
        run: |
          docker compose exec backend uv run ai-cli bench ${{ matrix.service }} --dataset ${{ matrix.dataset }} --count ${{ matrix.count }} | tee benchmark_output.txt
          # Extract mean duration from benchmark output
          MEAN_DURATION=$(grep '^Mean duration:' benchmark_output.txt | awk '{print $3}')
          echo "## Benchmark Results for ${{ matrix.service }} - ${{ matrix.dataset }} (${{ matrix.count }} queries, ${MEAN_DURATION} avg)" >> $GITHUB_STEP_SUMMARY
          echo '```' >> $GITHUB_STEP_SUMMARY
          cat benchmark_output.txt >> $GITHUB_STEP_SUMMARY
          echo '```' >> $GITHUB_STEP_SUMMARY
          # Find latest benchmark file inside container first
          CONTAINER_LATEST=$(docker compose exec backend sh -c "ls -t bench_results_*.csv | head -n1")
          # Copy the specific file out
          docker compose cp backend:/app/$CONTAINER_LATEST ./
          cp $CONTAINER_LATEST bench-${{ matrix.service }}-${{ matrix.dataset }}-${{ matrix.count }}.csv
          ls -l bench-${{ matrix.service }}-${{ matrix.dataset }}-${{ matrix.count }}.csv

      - name: backend logs on failure
        if: failure()
        timeout-minutes: 1
        run: |
          docker compose logs -f backend
          exit 1

      - name: Upload benchmark results
        uses: actions/upload-artifact@v4
        with:
          name: bench-${{ matrix.service }}-${{ matrix.dataset }}-${{ matrix.count }}.csv
          path: bench-${{ matrix.service }}-${{ matrix.dataset }}-${{ matrix.count }}.csv
Comment on lines +86 to +90
Collaborator: where are we uploading the results of the benchmark from here?
Collaborator (Author): in the GitHub Actions UI, see link: #200 (comment)
Collaborator: thanks


      - name: Cleanup
        if: always()
        run: docker compose down
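The Mean-duration extraction in the Run benchmark step can be checked in isolation. The sample output below is fabricated; only the `Mean duration:` line format is assumed, inferred from the step's own grep pattern.

```shell
# Fabricated sample of benchmark output; only the "Mean duration:" line
# format is assumed from the workflow's grep pattern.
cat > benchmark_output.txt <<'EOF'
Ran 100 queries against the responses service
Mean duration: 1.42s
p95 duration: 2.10s
EOF

# Same extraction as the workflow step: keep matching lines, print the
# third whitespace-separated field.
MEAN_DURATION=$(grep '^Mean duration:' benchmark_output.txt | awk '{print $3}')
echo "$MEAN_DURATION"
```

Note that `awk '{print $3}'` keeps the unit suffix attached to the number, which is why the header interpolates `${MEAN_DURATION}` directly without adding a unit.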
4 changes: 2 additions & 2 deletions README.md
@@ -58,13 +58,13 @@ docker compose watch

 This should start all necessary services for the project and will also mount file system as volume for easy development.

-You verify backend running by doing health-check
+You verify backend running by doing a health check

 ```bash
 curl http://[your-domain]:8000/api/v1/utils/health/
 ```

-or by visiting: http://[your-domain]:8000/api/v1/utils/health-check/ in the browser
+or by visiting: http://[your-domain]:8000/api/v1/utils/health/ in the browser

 ## Backend Development
2 changes: 2 additions & 0 deletions backend/app/api/main.py
@@ -8,6 +8,7 @@
     organization,
     project,
     project_user,
+    responses,
     private,
     threads,
     users,
@@ -27,6 +28,7 @@
 api_router.include_router(organization.router)
 api_router.include_router(project.router)
 api_router.include_router(project_user.router)
+api_router.include_router(responses.router)
 api_router.include_router(threads.router)
 api_router.include_router(users.router)
 api_router.include_router(utils.router)
132 changes: 132 additions & 0 deletions backend/app/api/routes/responses.py
@@ -0,0 +1,132 @@
from typing import Optional

import openai
from fastapi import APIRouter, Depends
from openai import OpenAI
from pydantic import BaseModel
from sqlmodel import Session

from app.api.deps import get_current_user_org, get_db
from app.crud.credentials import get_provider_credential
from app.models import UserOrganization
from app.utils import APIResponse

router = APIRouter(tags=["responses"])


def handle_openai_error(e: openai.OpenAIError) -> str:
    """Extract error message from OpenAI error."""
    if isinstance(e.body, dict) and "message" in e.body:
        return e.body["message"]
    return str(e)


class ResponsesAPIRequest(BaseModel):
project_id: int

model: str
instructions: str
vector_store_ids: list[str]
max_num_results: Optional[int] = 20
temperature: Optional[float] = 0.1
response_id: Optional[str] = None

question: str


class Diagnostics(BaseModel):
input_tokens: int
output_tokens: int
total_tokens: int

model: str
Comment on lines +24 to +42
Collaborator: move models to backend/app/models folder
Collaborator (Author): these are for parsing requests, not for backing database models; can introduce a /schema/ folder for these
Collaborator: sure



class FileResultChunk(BaseModel):
    score: float
    text: str


class _APIResponse(BaseModel):
    status: str

    response_id: str
    message: str
    chunks: list[FileResultChunk]

    diagnostics: Optional[Diagnostics] = None
Comment on lines +50 to +57
Collaborator: we already have APIResponse model in backend/app/utils.py, see if we can use the same
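On the reviewer's point about reuse: the diff never shows `app.utils.APIResponse` itself, but its call sites (`success_response`, `failure_response`, generic subscripting) suggest a small generic wrapper. A minimal sketch of that assumed shape follows; the real class is presumably a pydantic model, and a stdlib dataclass is used here only to stay self-contained.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


# Assumed shape of app.utils.APIResponse, inferred from how this diff calls
# success_response/failure_response; the real class is not shown in the PR.
@dataclass
class APIResponse(Generic[T]):
    success: bool
    data: Optional[T] = None
    error: Optional[str] = None

    @classmethod
    def success_response(cls, data: T) -> "APIResponse[T]":
        return cls(success=True, data=data)

    @classmethod
    def failure_response(cls, error: str) -> "APIResponse[T]":
        return cls(success=False, error=error)


ok = APIResponse.success_response({"message": "hi"})
bad = APIResponse.failure_response("OpenAI API key not configured")
```

Under this shape, `ResponsesAPIResponse(APIResponse[_APIResponse])` is just the wrapper specialized to the route's payload type, which is why the reviewer's suggestion to reuse it works without new fields.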



class ResponsesAPIResponse(APIResponse[_APIResponse]):
    pass


def get_file_search_results(response):
    results: list[FileResultChunk] = []

    for tool_call in response.output:
        if tool_call.type == "file_search_call":
            # Collect hits from the tool call itself; iterating over the
            # still-empty `results` accumulator here would yield nothing.
            results.extend(
                [FileResultChunk(score=hit.score, text=hit.text) for hit in tool_call.results]
            )

    return results
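The extraction loop can be exercised without the OpenAI client by feeding it stand-in objects that mimic the assumed response shape; all values below are fabricated.

```python
from types import SimpleNamespace


# Minimal local re-statement of the helper, so the sketch is self-contained.
class FileResultChunk:
    def __init__(self, score, text):
        self.score, self.text = score, text


def get_file_search_results(response):
    results = []
    for tool_call in response.output:
        if tool_call.type == "file_search_call":
            # Hits come from the tool call, not the accumulator.
            results.extend(FileResultChunk(hit.score, hit.text) for hit in tool_call.results)
    return results


# Stand-in objects mimicking the Responses API output shape this code assumes:
# a mixed output list where only file_search_call entries carry hits.
hit = SimpleNamespace(score=0.92, text="refund policy excerpt")
call = SimpleNamespace(type="file_search_call", results=[hit])
response = SimpleNamespace(output=[SimpleNamespace(type="message"), call])

chunks = get_file_search_results(response)
```

Non-file-search entries (like the `message` item above) are skipped, so the helper degrades to an empty list when the model answered without retrieval.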



@router.post("/responses/sync", response_model=ResponsesAPIResponse)
async def responses_sync(

Collaborator: do you mean synchronous or asynchronous in the description? I'm assuming this runs asynchronously to speed up completion by running in parallel.

    request: ResponsesAPIRequest,
    _session: Session = Depends(get_db),
    _current_user: UserOrganization = Depends(get_current_user_org),
):
    """
    Temp synchronous endpoint for benchmarking OpenAI responses API
    """
    credentials = get_provider_credential(
        session=_session,
        org_id=_current_user.organization_id,
        provider="openai",
        project_id=request.project_id,
    )
    if not credentials or "api_key" not in credentials:
        return APIResponse.failure_response(
            error="OpenAI API key not configured for this organization."
        )

    client = OpenAI(api_key=credentials["api_key"])

    try:
        response = client.responses.create(
            model=request.model,
            previous_response_id=request.response_id,
            instructions=request.instructions,
            tools=[
                {
                    "type": "file_search",
                    "vector_store_ids": request.vector_store_ids,
                    "max_num_results": request.max_num_results,
                }
            ],
            temperature=request.temperature,
            input=[{"role": "user", "content": request.question}],
            include=["file_search_call.results"],
        )

        response_chunks = get_file_search_results(response)

        return ResponsesAPIResponse.success_response(
            data=_APIResponse(
                status="success",
                response_id=response.id,
                message=response.output_text,
                chunks=response_chunks,
                diagnostics=Diagnostics(
                    input_tokens=response.usage.input_tokens,
                    output_tokens=response.usage.output_tokens,
                    total_tokens=response.usage.total_tokens,
                    model=response.model,
                ),
            ),
        )
    except openai.OpenAIError as e:
        return ResponsesAPIResponse.failure_response(error=handle_openai_error(e))
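On the reviewer's sync-vs-async question: the endpoint answers one question per request, so any speedup in the benchmark would come from the caller issuing requests concurrently. A toy sketch of the difference follows, using simulated latency in place of real API calls.

```python
import asyncio
import time


# Simulated query with fixed I/O latency; stands in for one RAG request.
async def fake_query(delay: float = 0.05) -> float:
    await asyncio.sleep(delay)
    return delay


async def run_sequential(n: int) -> list:
    # One request at a time: total time is roughly n * delay.
    return [await fake_query() for _ in range(n)]


async def run_concurrent(n: int) -> list:
    # All requests in flight at once: total time is roughly one delay.
    return list(await asyncio.gather(*(fake_query() for _ in range(n))))


start = time.perf_counter()
asyncio.run(run_sequential(10))
sequential_s = time.perf_counter() - start

start = time.perf_counter()
asyncio.run(run_concurrent(10))
concurrent_s = time.perf_counter() - start

print(f"sequential: {sequential_s:.2f}s, concurrent: {concurrent_s:.2f}s")
```

This is only an illustration of the benchmarking choice, not of the endpoint's implementation; whether the bench CLI runs queries sequentially is not shown in this diff.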
11 changes: 10 additions & 1 deletion backend/app/api/routes/threads.py
@@ -288,7 +288,7 @@
         session=_session,
         org_id=_current_user.organization_id,
         provider="openai",
-        project_id=_current_user.project_id,
+        project_id=request.get("project_id"),
     )
     if not credentials or "api_key" not in credentials:
         return APIResponse.failure_response(
@@ -321,6 +321,15 @@
     message = process_message_content(
         message_content, request.get("remove_citation", False)
     )

+    diagnostics = {
+        "input_tokens": run.usage.prompt_tokens,
+        "output_tokens": run.usage.completion_tokens,
+        "total_tokens": run.usage.total_tokens,
+        "model": run.model,
+    }
+    request = {**request, **{"diagnostics": diagnostics}}

     return create_success_response(request, message)
 else:
     return APIResponse.failure_response(
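The added diagnostics merge in threads.py is plain dict spreading; a minimal standalone illustration with fabricated request and token values:

```python
# Fabricated request and usage values; only the merge pattern itself is
# taken from the diff.
request = {"question": "What is the refund policy?", "remove_citation": False}
diagnostics = {
    "input_tokens": 12,
    "output_tokens": 34,
    "total_tokens": 46,
    "model": "gpt-4o",
}

# {**a, **{"k": v}} is equivalent to {**a, "k": v}: it builds a new dict,
# keeps the original keys, and adds "diagnostics" without mutating `request`.
merged = {**request, **{"diagnostics": diagnostics}}
```

Because a new dict is produced, the original `request` mapping is untouched until the rebinding, which matters if other references to it exist.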
Empty file added backend/app/cli/__init__.py
Empty file.