chore: Migrate from Poetry to UV, add health check endpoint, and improve Docker image #15
base: main
Conversation
spa5k commented on Mar 4, 2025 (edited)
- Add .dockerignore to optimize Docker build context
- Refactor docker-compose files for CPU and GPU modes to add health check.
- Update Dockerfile with multi-stage build and improved caching
- Add Makefile for simplified project management
- Implement health check endpoint in document converter route (see the sketch after this list)
- Optimize container runtime and model download process
- Improve documentation and setup instructions
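The health check endpoint itself is not shown in this conversation. As a rough illustration only, here is a minimal FastAPI sketch of what such an endpoint could look like; the `/health` path, response shape, and service names are assumptions, not taken from the PR:

```python
# Illustrative sketch only, not the PR's actual implementation.
from fastapi import FastAPI, status
from fastapi.responses import JSONResponse

app = FastAPI()


@app.get("/health")  # hypothetical path; the real route may live in the document converter router
async def health() -> JSONResponse:
    # Report an overall status plus per-service detail so a Docker healthcheck
    # (e.g. a curl probe in docker-compose) can tell which dependency is degraded.
    services = {"api": "healthy"}  # real checks (worker/queue status, etc.) are omitted here
    return JSONResponse(
        status_code=status.HTTP_200_OK,
        content={"status": "ok", "services": services},
    )
```

A docker-compose healthcheck such as the one mentioned in the list above could then probe this endpoint to decide whether the container is ready.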
@drmingler do you mind taking a look?
@spa5k Awesome work! Thank you. I will have a look and get back. Meanwhile, can we have a unified lint rule so that we can avoid modifying the code formatting?
Sure, I tried not to change much formatting and will reset it wherever it happened. UV also provides a formatter; maybe it can be used.
How about this: let's merge this, and then I will create a PR to implement Ruff as a GitHub Action and a pre-commit hook; otherwise, there will be too many changes in this PR.
Okay, that makes sense. I will merge this over the weekend. Thank you.
…type safety and API documentation
- Add comprehensive response examples and status codes
- Introduce JobStatus and ImageType enums for better type safety
- Enhance route error handling and status reporting
- Add detailed field descriptions and validation
- Improve health check endpoint with more granular service status reporting
Thanks, the PR is now in quite a good state. I tested the caching and schema changes, including generating a Go SDK from them, and everything seems to be working well.
```diff
- curl -sSL https://install.python-poetry.org | python3 -
+ curl -LsSf https://astral.sh/uv/install.sh | sh
```
Can we also add instructions on how to install uv on Windows, or add a link to the uv installation docs? @spa5k
README.md (outdated)
```bash
make docker-run-gpu

# Or build and run with multiple workers
make docker-run-gpu WORKER_COUNT=3
```
Should be "docker-start-cpu" not "docker-run-gpu"
…build and runtime performance
- Simplify Dockerfile with single-stage build
- Improve PyTorch and EasyOCR model installation based on system architecture
- Update docker-compose files to remove runtime target
- Add auto-detection for CPU/GPU docker start in Makefile
- Enhance .dockerignore with additional Python-related exclusions
- Add entrypoint script for consistent container startup
- Introduce detect_gpu.sh script for dynamic GPU and Docker configuration detection
- Update docker-compose.cpu.yml and docker-compose.gpu.yml to use uv for running commands
- Enhance Dockerfile with multi-stage build, improved GPU/CPU detection, and model downloading
- Add NVIDIA GPU capabilities and platform configuration in Docker Compose files
- Improve runtime and build performance with better caching and environment setup
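The auto-detection described in these commits happens at the Docker/Makefile level via detect_gpu.sh, which is not shown here. Purely as an illustration of the same idea on the application side, here is a small sketch of runtime device selection; torch.cuda.is_available() is a real PyTorch call, but the helper name and script structure are made up:

```python
# Illustrative only: runtime CPU/GPU selection in Python, mirroring the idea
# behind the Makefile/compose auto-detection. Not taken from this PR.
import torch


def pick_device() -> str:
    # torch.cuda.is_available() returns False when no CUDA-capable GPU or driver
    # is visible, so the service falls back to CPU automatically.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Using device: {pick_device()}")
```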
Can you check the GPU part now? If it is not working, please move the PyTorch and model download steps to the runtime stage.
```python
try:
    result = document_converter_service.get_single_document_task_result(job_id)

    # Return 202 Accepted if job is still in progress
    if result.status in ["IN_PROGRESS"]:
        return JSONResponse(
            status_code=status.HTTP_202_ACCEPTED,
            content=result.dict(exclude_none=True)
        )

    # Return 422 for failed jobs
    if result.status == "FAILURE":
        return JSONResponse(
            status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
            content=result.dict(exclude_none=True)
        )

    # Return 200 OK for successful jobs
    return JSONResponse(
        status_code=status.HTTP_200_OK,
        content=result.dict(exclude_none=True)
    )
except KeyError:
    raise HTTPException(
        status_code=status.HTTP_404_NOT_FOUND,
        detail=f"Job not found: {job_id}"
    )
```
Please take this out. During job processing we do not raise errors; we return a status of FAILURE if the job failed, with the reason for the failure in the error field. It was intentionally done that way.
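As a self-contained sketch of the convention described in this comment: the model and field names below are stand-ins, and only the status/error idea comes from the comment and the diff above.

```python
# Sketch of the reviewer's convention: failures are reported through the result
# body, not through raised exceptions or 4xx responses during processing.
from typing import Optional

from pydantic import BaseModel


class ConversionResult(BaseModel):  # stand-in for the project's actual result schema
    job_id: str
    status: str                  # e.g. "IN_PROGRESS", "SUCCESS", "FAILURE"
    error: Optional[str] = None  # reason for the failure, set only on FAILURE


def to_response_body(result: ConversionResult) -> dict:
    # Serialize the result as-is; clients inspect "status"/"error" rather than
    # relying on the HTTP status code to detect failed jobs.
    return result.dict(exclude_none=True)  # same .dict(exclude_none=True) call as in the diff


failed = ConversionResult(job_id="abc123", status="FAILURE", error="unsupported file type")
print(to_response_body(failed))
# {'job_id': 'abc123', 'status': 'FAILURE', 'error': 'unsupported file type'}
```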
```python
    result = document_converter_service.get_batch_conversion_task_result(job_id)

    # Return 202 Accepted if the batch job or any sub-job is still in progress
    if result.status in ["IN_PROGRESS"] or any(
        job.status in ["IN_PROGRESS"]
        for job in result.conversion_results
    ):
        return JSONResponse(
            status_code=status.HTTP_202_ACCEPTED,
            content=result.dict(exclude_none=True)
        )

    # Return 422 for failed batch jobs
    if result.status == "FAILURE" or any(
        job.status == "FAILURE"
        for job in result.conversion_results
    ):
        return JSONResponse(
            status_code=status.HTTP_422_UNPROCESSABLE_ENTITY,
            content=result.dict(exclude_none=True)
        )

    # Return 200 OK for successful batch jobs (all success)
    return JSONResponse(
        status_code=status.HTTP_200_OK,
        content=result.dict(exclude_none=True)
    )
except KeyError:
    raise HTTPException(
        status_code=status.HTTP_404_NOT_FOUND,
        detail=f"Batch job not found: {job_id}"
    )
```
Same here
I will test on GPU and get back to you. @spa5k