[Feature] Extract text from documents #108

Open
mawandm opened this issue Jun 3, 2024 · 0 comments
Labels: API (Backend API), enhancement (New feature or request), rag (Rag Engine)

mawandm commented Jun 3, 2024

Description
As a user, I'd like to extract text from the document.

Detail
Text extraction is useful as an intermediate step in document ingestion. It allows for other processes such as:

  1. Data cleansing
  2. Data exclusion based on an exclusion list.
  3. Approval workflows
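
As a rough illustration of how these intermediate steps could chain together once text has been extracted, here is a minimal sketch. All names (`ExtractedDocument`, `cleanse`, `apply_exclusions`, `prepare_for_ingestion`, the `pending_approval` status and the `[EXCLUDED]` marker) are hypothetical and not taken from the project; the actual cleansing and exclusion rules may well differ.

```python
import re
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ExtractedDocument:
    """Extracted text waiting for review (hypothetical shape)."""
    document_id: str
    text: str
    status: str = "pending_approval"  # assumed initial workflow state


def cleanse(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def apply_exclusions(text: str, exclusion_list: Sequence[str]) -> str:
    """Replace each excluded term (whole word, case-insensitive) with a marker."""
    for term in exclusion_list:
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        text = pattern.sub("[EXCLUDED]", text)
    return text


def prepare_for_ingestion(
    document_id: str, raw_text: str, exclusion_list: Sequence[str]
) -> ExtractedDocument:
    """Cleanse, then filter, then hand the result over for approval."""
    cleaned = cleanse(raw_text)
    filtered = apply_exclusions(cleaned, exclusion_list)
    return ExtractedDocument(document_id=document_id, text=filtered)
```

Ingestion would then proceed only for documents whose status has been moved out of the pending state by an approver.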

Acceptance Criteria

  1. An API /v1/extractions/text in the RAG microservice.
  2. Extraction path added to the API microservice during document processing.
  3. Persisting the extracted text to an external SQL datasource.
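
To make the first and third criteria more concrete, the sketch below shows one possible shape for the logic behind `/v1/extractions/text` and for the persistence step. It is an assumption-laden illustration, not the project's implementation: it uses only the Python standard library, handles only plain text and HTML, and uses SQLite as a stand-in for the external SQL datasource. The function names, the `extracted_text` table and its columns are all hypothetical.

```python
import sqlite3
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []

    def handle_data(self, data: str) -> None:
        stripped = data.strip()
        if stripped:
            self.parts.append(stripped)


def extract_text(payload: bytes, content_type: str) -> str:
    """Return the text of a document (hypothetical handler logic)."""
    if content_type == "text/plain":
        return payload.decode("utf-8")
    if content_type == "text/html":
        collector = _TextCollector()
        collector.feed(payload.decode("utf-8"))
        collector.close()
        return " ".join(collector.parts)
    raise ValueError(f"unsupported content type: {content_type}")


def persist_extraction(conn: sqlite3.Connection, document_id: str, text: str) -> None:
    """Store extracted text; SQLite stands in for the external datasource."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS extracted_text ("
        "document_id TEXT PRIMARY KEY, content TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO extracted_text (document_id, content) VALUES (?, ?)",
        (document_id, text),
    )
    conn.commit()
```

A real service would presumably wrap `extract_text` in an HTTP handler and support binary formats such as PDF through a dedicated parser; that part is left out here.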
@mawandm mawandm added enhancement New feature or request rag Rag Engine API Backend API labels Jun 3, 2024
@mawandm mawandm self-assigned this Jun 3, 2024
@mawandm mawandm changed the title Extract text from documents [Feature] Extract text from documents Jun 3, 2024
mawandm added a commit that referenced this issue Jul 1, 2024
This PR adds a text extraction API to the RAG service.

Part of #108
mawandm added a commit that referenced this issue Jul 5, 2024
This PR
- makes new document columns nullable
- upgrades rag transformers and tokenizers to fix the error huggingface/transformers#31789

Part of #108