forked from huggingface/yourbench
Dockerizing #1
Draft
stepdi wants to merge 92 commits into main from docker
92 commits
ef3536f Dockerization (stepdi)
0212ddb Added excels and JSONL to the output zip file (stepdi)
87f430c Fix pipeline integration tests with proper mocking (sumukshashidhar)
1cfec04 remove unsupported (sumukshashidhar)
5f1915c Fix CI workflow by adding virtual environment creation step (sumukshashidhar)
769fb9e Fix CI workflow: add permissions and activate virtual environment (sumukshashidhar)
24a1a3d CQ (sumukshashidhar)
d394a8d feat: add cost tracking to inference engine (sumukshashidhar)
8652c4e Update inference_engine.py (sumukshashidhar)
a820209 fix push to hub (sumukshashidhar)
6ec0b1e Update pyproject.toml (sumukshashidhar)
cb7a819 Merge pull request #91 from huggingface/bugfix/dataset-push (sumukshashidhar)
c35cae3 Update inference_engine.py (sumukshashidhar)
b06025c Merge pull request #90 from huggingface/fix/clean-cost-tracking (sumukshashidhar)
80f01df Fix Dockerfile (m-peko)
feb3948 Stability fixes [WIP] (alozowski)
d26927f add offline mode (sumukshashidhar)
40f06e5 add cq (sumukshashidhar)
ca7a3b0 Update dataset_engine.py (sumukshashidhar)
141a696 Update dataset_engine.py (sumukshashidhar)
2278ead Merge pull request #93 from huggingface/cherry-pick-offline-mode (sumukshashidhar)
dd62da9 Merge pull request #92 from huggingface/release-v0.3.1 (sumukshashidhar)
942de83 add new readme, remove legacy figure (sumukshashidhar)
e0f335e move video and highlights to bottom (sumukshashidhar)
64ca98c Merge pull request #99 from huggingface/update-readme (sumukshashidhar)
951ceb0 Merge branch 'main' into tests/integration-test (sumukshashidhar)
61f7387 Merge pull request #88 from huggingface/tests/integration-test (sumukshashidhar)
98d98e4 Update README.md (sumukshashidhar)
1572760 Update README.md (sumukshashidhar)
ae76f9d Merge pull request #100 from huggingface/fix-readme-merge-conflict (sumukshashidhar)
28d27f2 Delete yourbench/utils/load_task_config.py (sumukshashidhar)
b0e473f refactor loading engine (sumukshashidhar)
80f9f49 fix summarization and refactor (sumukshashidhar)
99924d3 Merge pull request #103 from huggingface/remove-empty-file (sumukshashidhar)
e67a11c add sample question viewer to analyze (sumukshashidhar)
1c6239f add docs (sumukshashidhar)
264bf64 Merge pull request #107 from huggingface/analyze-sample-questions (sumukshashidhar)
1acef89 Change output format of generated benchmark (m-peko)
47357ad Improve lighteval.py for MCQ and long task (alozowski)
03a3036 Merge remote-tracking branch 'origin/main' into long-task-stability (alozowski)
78c59a5 Apply Ruff (alozowski)
364d215 Update quickstart with correct run command (#108) (patrickfleith)
f162207 Merge pull request #106 from huggingface/improve-summarization (sumukshashidhar)
fb58390 Merge pull request #104 from huggingface/refactor-loading-engine (sumukshashidhar)
a930329 remove main, unnecessary (sumukshashidhar)
93e5715 remove plotting code (sumukshashidhar)
fe76858 remove info density metrics (sumukshashidhar)
004fba7 refactor chunking and heavily reduce LoC (sumukshashidhar)
1971b20 fix cq (sumukshashidhar)
f58c00d update testcase (sumukshashidhar)
2801943 add cq for tests (sumukshashidhar)
dc827a3 remove unnecessary dependencies based on semantic deduplications (sumukshashidhar)
99bbee7 Merge remote-tracking branch 'origin/main' into long-task-stability (alozowski)
3910a6f Pull summarization.py from main (alozowski)
3aaae2d Update citation_score_filtering.py (sumukshashidhar)
feb96ac remove semantic chunking reference and add warning (sumukshashidhar)
0fbbab5 Update ingestion.py (sumukshashidhar)
65a3cd5 Update pyproject.toml (sumukshashidhar)
c20a224 Merge pull request #112 from huggingface/refactor-chunking (sumukshashidhar)
d2b9ab5 Merge branch 'main' of github.com:huggingface/yourbench into long-tas… (alozowski)
42cf7fe Restore summarization.py from main (alozowski)
e008dde Merge pull request #111 from huggingface/long-task-stability (alozowski)
4057d64 Introduce BENCHMARK_SYSTEM_PROMPT environment variable (m-peko)
951d257 use latest gemini flash model
c841506 use latest gemini flash model
68cdb5a use correct model format for private evaluation
52efc59 use gemini flash for llm as a judge
2c4de98 Ensure summarization uses correct model by aligning step name (alozowski)
162ac00 Apply Ruff (alozowski)
33b830d Merge pull request #117 from huggingface/fix-summarization-stepname-m… (alozowski)
da48707 remove import error and refactor block (sumukshashidhar)
07497eb add helper (sumukshashidhar)
3801f63 fix cq (sumukshashidhar)
f698308 Merge pull request #116 from huggingface/improve-ingestion-markitdown (sumukshashidhar)
69be970 Merge pull request #115 from huggingface/hotfix-citation-score-filtering (sumukshashidhar)
5f56c05 Refactor CLI and pipeline init logic (alozowski)
1fed2fa Refactor ingestion and QA stages (alozowski)
0e8d389 Split inference logic into modular files (alozowski)
c1f1446 Update parsing and QA model logic (alozowski)
fb87276 Refine question generation prompts (alozowski)
074c0bb Add chunk sampling logic (alozowski)
c98cb05 Update config and integration test for QA pipeline (alozowski)
783a553 Refactor test pipeline to support unified question_generation (alozowski)
ed83d39 Merge pull request #121 from huggingface/qg-inference-clarity (alozowski)
b4883f8 Potential fix for code scanning alert no. 1: Workflow does not contai… (sumukshashidhar)
ee24b9f Merge branch 'docker' (stepdi)
f5035f0 Added `include_docment_text` option to `lighteval` step to skip addin… (stepdi)
50f8c23 Turned off inclusion of doc contents in `lighteval` step (stepdi)
b0af320 Added missing HF_HUB_ONLINE=1 to .env.template (stepdi)
0cf0717 limit LLM query count to 50 for single-shot and 50 for multi-hop ques… (stepdi)
2b86b62 Update llm judge model for yourbench to 2.5-flash (Robert-H-Leonard)
c2282f1 Merge pull request #2 from LayerLens/update-llm-judge-model (Robert-H-Leonard)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| .venv | ||
| .git | ||
| __pycache__/ | ||
| datasets/ |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,2 +1,13 @@ | ||
| HF_TOKEN= | ||
| HF_ORGANIZATION= | ||
| OPENROUTER_API_KEY= | ||
|
|
||
| BENCHMARK_NAME="test" | ||
| BENCHMARK_SYSTEM_PROMPT="test prompt" | ||
| INPUT_S3_BUCKET="layerlens-private-test-organization" | ||
| INPUT_S3_KEY="benchmarks/test-project/benchmark-name/data.zip" | ||
| OUTPUT_S3_BUCKET="layerlens-private-test-organization" | ||
| OUTPUT_S3_KEY="benchmarks/test-project/benchmark-name/" | ||
|
|
||
| AWS_ACCESS_KEY_ID= | ||
| AWS_SECRET_ACCESS_KEY= | ||
|
|
||
| HF_HUB_OFFLINE=1 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,47 @@ | ||
| name: YourBench CI | ||
|
|
||
| on: | ||
| push: | ||
| branches: [ main ] | ||
| pull_request: | ||
| branches: [ main ] | ||
|
|
||
| permissions: | ||
| contents: read | ||
|
|
||
| jobs: | ||
| test: | ||
| runs-on: ubuntu-latest | ||
| strategy: | ||
| matrix: | ||
| python-version: [3.12] | ||
|
|
||
| steps: | ||
| - uses: actions/checkout@v3 | ||
| - name: Set up Python ${{ matrix.python-version }} | ||
| uses: actions/setup-python@v4 | ||
| with: | ||
| python-version: ${{ matrix.python-version }} | ||
|
|
||
| - name: Install uv | ||
| run: pip install uv | ||
|
|
||
| - name: Create virtual environment | ||
| run: uv venv | ||
|
|
||
| - name: Install dependencies | ||
| run: | | ||
| . .venv/bin/activate | ||
| uv pip install -e . | ||
| uv pip install pytest pytest-cov | ||
|
|
||
| - name: Run tests | ||
| run: | | ||
| . .venv/bin/activate | ||
| python -m pytest tests/ --cov=yourbench --cov-report=xml | ||
|
|
||
| - name: Upload coverage to Codecov | ||
| uses: codecov/codecov-action@v3 | ||
| with: | ||
| file: ./coverage.xml | ||
| fail_ci_if_error: false |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,5 +1,8 @@ | ||
| name: Quality | ||
|
|
||
| permissions: | ||
| contents: read | ||
|
|
||
| on: | ||
| push: | ||
| branches: | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| FROM python:3.12-slim | ||
|
|
||
| WORKDIR /app | ||
|
|
||
| # Install system dependencies | ||
| RUN apt-get update && apt-get install -y --no-install-recommends \ | ||
| git \ | ||
| curl \ | ||
| && apt-get clean \ | ||
| && rm -rf /var/lib/apt/lists/* | ||
|
|
||
| # Copy all yourbench files | ||
| COPY . . | ||
|
|
||
| # Install dependencies and yourbench in editable mode | ||
| RUN pip install --upgrade pip && \ | ||
| pip install boto3 pyyaml awscli && \ | ||
| pip install -e . | ||
|
|
||
| # Verify installation | ||
| RUN yourbench --version || echo "Yourbench installation verification failed but continuing build" | ||
|
|
||
| # Environment variables (will be overridden at runtime) | ||
| ENV BENCHMARK_NAME="" | ||
| ENV BENCHMARK_SYSTEM_PROMPT="" | ||
| ENV INPUT_S3_BUCKET="" | ||
| ENV INPUT_S3_KEY="" | ||
| ENV OUTPUT_S3_BUCKET="" | ||
| ENV OUTPUT_S3_KEY="" | ||
| ENV OPENROUTER_API_KEY="" | ||
| ENV AWS_ACCESS_KEY_ID="" | ||
| ENV AWS_SECRET_ACCESS_KEY="" | ||
| ENV AWS_DEFAULT_REGION="us-east-1" | ||
| ENV WORKDIR="/app" | ||
|
|
||
| # Create a startup script to run the processing workflow | ||
| RUN printf '#!/bin/bash\n\ | ||
| echo "Running yourbench workflow..."\n\ | ||
| exec python run_yourbench.py\n' > /app/entrypoint.sh && \ | ||
| chmod +x /app/entrypoint.sh | ||
|
|
||
| # Use the startup script as entry point | ||
| ENTRYPOINT ["/app/entrypoint.sh"] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,67 @@ | ||
| # YourbenchProcessor Docker Container | ||
|
|
||
| This Docker container automates the process of: | ||
| 1. Downloading data from AWS S3 | ||
| 2. Processing with yourbench | ||
| 3. Uploading results back to AWS S3 | ||
|
|
||
| ## Required Environment Variables | ||
|
|
||
| The container requires the following environment variables: | ||
|
|
||
| - `INPUT_S3_BUCKET`: S3 bucket name for input data | ||
| - `INPUT_S3_KEY`: S3 object key for input data (ZIP file) | ||
| - `OUTPUT_S3_BUCKET`: S3 bucket name for output results | ||
| - `OUTPUT_S3_KEY`: S3 object key for output results | ||
| - `OPENROUTER_API_KEY`: API key for OpenRouter | ||
| - `AWS_ACCESS_KEY_ID`: AWS access key with S3 permissions | ||
| - `AWS_SECRET_ACCESS_KEY`: AWS secret key with S3 permissions | ||
| - `AWS_DEFAULT_REGION`: AWS region (default: us-east-1) | ||
|
|
||
| ## Building the Docker Image | ||
|
|
||
| ```bash | ||
| docker build -t yourbench-processor . | ||
| ``` | ||
|
|
||
| ## Running the Container | ||
|
|
||
| ```bash | ||
| docker run -e INPUT_S3_BUCKET=your-input-bucket \ | ||
| -e INPUT_S3_KEY=input/data.zip \ | ||
| -e OUTPUT_S3_BUCKET=your-output-bucket \ | ||
| -e OUTPUT_S3_KEY=output/results.zip \ | ||
| -e OPENROUTER_API_KEY=your-openrouter-key \ | ||
| -e AWS_ACCESS_KEY_ID=your-aws-key-id \ | ||
| -e AWS_SECRET_ACCESS_KEY=your-aws-secret \ | ||
| -e AWS_DEFAULT_REGION=us-east-1 \ | ||
| yourbench-processor | ||
| ``` | ||
|
|
||
| ## Process Flow | ||
|
|
||
| 1. Downloads the specified zip file from S3 | ||
| 2. Extracts contents to `task/data/raw` directory | ||
| 3. Creates a `config.yaml` file in `task/dataset` directory | ||
| 4. Runs yourbench with the created config | ||
| 5. Zips the `task/dataset` directory | ||
| 6. Uploads the zipped results back to S3 | ||
|
|
||
| ## Local Testing | ||
|
|
||
| For local testing without Docker: | ||
|
|
||
| ```bash | ||
| # Set environment variables | ||
| export INPUT_S3_BUCKET=your-input-bucket | ||
| export INPUT_S3_KEY=input/data.zip | ||
| export OUTPUT_S3_BUCKET=your-output-bucket | ||
| export OUTPUT_S3_KEY=output/results.zip | ||
| export OPENROUTER_API_KEY=your-openrouter-key | ||
| export AWS_ACCESS_KEY_ID=your-aws-key-id | ||
| export AWS_SECRET_ACCESS_KEY=your-aws-secret | ||
| export AWS_DEFAULT_REGION=us-east-1 | ||
|
|
||
| # Run the script | ||
| python run_yourbench.py | ||
| ``` | ||
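
The local steps of the process flow described in the README (extracting the input archive and zipping the results) could look roughly like the sketch below. `run_yourbench.py` is not shown in this part of the diff, so the function names `extract_input` and `zip_results` are hypothetical and this is not the PR's actual implementation. The S3 download and upload steps are left out; they would presumably use boto3's `download_file` and `upload_file` on an S3 client, since the Dockerfile installs boto3.

```python
import zipfile
from pathlib import Path


def extract_input(zip_path: Path, raw_dir: Path) -> list[str]:
    """Step 2: extract the downloaded archive into the raw-data directory."""
    raw_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(raw_dir)
        return archive.namelist()


def zip_results(dataset_dir: Path, out_zip: Path) -> list[str]:
    """Step 5: zip every file under the dataset directory.

    out_zip should live outside dataset_dir, otherwise the archive
    could end up containing itself.
    """
    members = []
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(dataset_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the dataset directory.
                arcname = path.relative_to(dataset_dir).as_posix()
                archive.write(path, arcname)
                members.append(arcname)
    return members
```

With the README's layout, these would be called as `extract_input(Path("data.zip"), Path("task/data/raw"))` before the yourbench run and `zip_results(Path("task/dataset"), Path("results.zip"))` after it; the archive file names here are placeholders.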
I think we are missing some of the variables here and below, in the example `docker run` command.
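
The comment does not say which variables are meant. One way to check is to compare the `ENV` names declared in the Dockerfile with the names listed in the README, both copied from the diffs in this PR. The sketch below does that and also shows a small startup check for unset variables. The helper names are hypothetical, and treating every Dockerfile variable except `WORKDIR` as user-facing is an assumption.

```python
import os

# Names copied from the Dockerfile ENV block in this PR.
DOCKERFILE_ENV = {
    "BENCHMARK_NAME", "BENCHMARK_SYSTEM_PROMPT",
    "INPUT_S3_BUCKET", "INPUT_S3_KEY",
    "OUTPUT_S3_BUCKET", "OUTPUT_S3_KEY",
    "OPENROUTER_API_KEY",
    "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION",
    "WORKDIR",
}

# Names listed under "Required Environment Variables" in the README.
README_DOCUMENTED = {
    "INPUT_S3_BUCKET", "INPUT_S3_KEY",
    "OUTPUT_S3_BUCKET", "OUTPUT_S3_KEY",
    "OPENROUTER_API_KEY",
    "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION",
}

# Assumption: WORKDIR is internal to the image and needs no documentation.
INTERNAL = {"WORKDIR"}


def undocumented_vars():
    """Dockerfile variables that the README does not mention."""
    return sorted(DOCKERFILE_ENV - README_DOCUMENTED - INTERNAL)


def missing_vars(env=None, required=None):
    """Required variable names that are unset or empty in the environment."""
    env = os.environ if env is None else env
    required = sorted(DOCKERFILE_ENV - INTERNAL) if required is None else required
    return [name for name in required if not env.get(name, "").strip()]


print(undocumented_vars())
```

Under these assumptions the comparison points at `BENCHMARK_NAME` and `BENCHMARK_SYSTEM_PROMPT`; `.env.template` additionally lists `HF_TOKEN`, `HF_ORGANIZATION`, and `HF_HUB_OFFLINE`, which neither the Dockerfile nor the README mentions.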