Enhancement: PDF to Markdown Conversion Functionality to the Web Svelte Chat Interface #1319
Please resolve the merge conflict.
@eugeis Yes Eugen, will do tomorrow.
Hi @jmd1010, I watched the video and really liked it! The new folders with uppercase names don't look very clean. It would be better to move them into a subfolder and use shorter, lowercase names for a neater structure. Please also update the video link in this PR.
@eugeis Yes, agreed. We will make the folder names lowercase, and we can move them around, maybe into the web folder? Packaging with Go is a good idea; we should do it for PDF to Markdown as well, which I'm doing a final testing run on today. Can we collaborate on the Go packaging? I don't want to mess it up.
- Simplified install process V1 + retest process.
- Moved 3 out of 4 README files to /web. The conflict prevented me from moving the pr-1284-update.md file, updating the video, and deleting the folder. Hopefully you can finalize from your end. It's a pesky behind-the-scenes git sync issue that I can't resolve from my end.
- We'll address the Go implementation in the next step. Got to get back to work :)
- Finally brought back some .png files that were excluded by my .gitignore, I suppose. Cheers, Jean
6c9a3b7 to a74da4a
PDF TO MARKDOWN CONVERSION IMPLEMENTATION
The PDF conversion module has been integrated into the Svelte web interface. Once installed, it automatically detects PDF files in the chat interface and converts them to Markdown for LLM processing. No extra servers are required; it works with the existing backend and Svelte frontend servers.
🎥 Demo Video (see at 4 min.)
https://youtu.be/bhwtWXoMASA
This document explains the new PDF to Markdown conversion implementation, detailing its functionality, installation process, and the file changes involved. The conversion library is cloned from https://github.com/jzillmann/pdf-to-markdown/tree/modularize.
Integration with Svelte
The integration approach focused on using the library's high-level API while maintaining SSR compatibility.
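One common way to keep a browser-only library SSR-compatible is to load it lazily and only on the client. The following is a minimal sketch of that idea, not the code from this PR: `PdfConverter`, `createLazyConverter`, and the injected `load` function are hypothetical names introduced for illustration, and the real pdf-to-markdown API may look different.

```typescript
// Hypothetical converter shape (assumption): raw PDF bytes in, Markdown out.
type PdfConverter = (data: Uint8Array) => Promise<string>;

// Returns a getter that loads the converter at most once, and only in a browser.
// `load` would typically wrap a dynamic import of the PDF library (assumption).
function createLazyConverter(
  load: () => Promise<PdfConverter>,
  isBrowser: () => boolean = () => "window" in globalThis,
): () => Promise<PdfConverter> {
  let cached: Promise<PdfConverter> | undefined;
  return () => {
    if (!isBrowser()) {
      // During server-side rendering the library is never touched.
      return Promise.reject(
        new Error("PDF conversion is only available in the browser"),
      );
    }
    if (!cached) {
      cached = load(); // first call triggers the load; later calls reuse it
    }
    return cached;
  };
}
```

With this shape, a component can call the getter inside an upload handler, so the PDF library is never evaluated while the page is rendered on the server.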
How it Works
The PDF to Markdown conversion is implemented as a separate module located in the pdf-to-markdown directory. It leverages the pdf-parse library (likely via PdfParser.ts) to parse PDF documents and extract text content. The core logic resides in PdfPipeline.ts, which orchestrates the PDF parsing and conversion process. Pdf-to-Markdown is built on pdf.js, Mozilla's PDF parsing and rendering platform, which it uses as a raw parser.
Here's a simplified breakdown of the process:
- PdfParser.ts uses pdf-parse to read the PDF file and extract text content from each page.
- PdfPipeline.ts then converts the extracted and processed text content into Markdown format. This involves mapping PDF elements to Markdown syntax, attempting to preserve formatting like headings, lists, and basic text styles.
- PdfConversionService.ts in the web/src/lib/services directory acts as a frontend service that utilizes the pdf-to-markdown module. It provides a convertToMarkdown function that takes a File object (PDF file) as input, calls the pdf-to-markdown module to perform the conversion, and returns the Markdown output as a string.
- The ChatInput.svelte component uses the PdfConversionService to convert uploaded PDF files to Markdown before sending the content to the chat service for pattern processing.
Installation
HOW TO INSTALL
FROM FABRIC ROOT DIRECTORY
cd web
Install in this sequence:
Step 1
npm install -D patch-package
Step 2
npm install -D pdfjs-dist@2.5.207
Step 3
npm install -D github:jzillmann/pdf-to-markdown#modularize
File Changes
The following files were added or modified to implement the PDF to Markdown conversion:
- web/src/lib/services/PdfConversionService.ts (new file)
Modified files:
- web/src/lib/components/chat/ChatInput.svelte:
  - Uses PdfConversionService in the readFileContent function to handle PDF files.
  - readFileContent calls pdfService.convertToMarkdown for PDF files.
These file changes introduce the new PDF to Markdown conversion functionality and integrate it into the chat input component of the web interface.
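The flow described above, where readFileContent hands PDF files to pdfService.convertToMarkdown and reads other files as text, could look roughly like the sketch below. It is an illustration under stated assumptions, not the code from this PR: `FileLike`, `PdfToMarkdown`, `isPdf`, and the constructor-injected converter are hypothetical, and only the names PdfConversionService, convertToMarkdown, and readFileContent come from the description.

```typescript
// Structural stand-in for the browser File object (assumption for this sketch).
interface FileLike {
  name: string;
  type: string;
  text(): Promise<string>;
  arrayBuffer(): Promise<ArrayBuffer>;
}

// Hypothetical low-level converter; in the PR this role belongs to the
// pdf-to-markdown module, whose real API is not shown here.
type PdfToMarkdown = (data: Uint8Array) => Promise<string>;

class PdfConversionService {
  private readonly convert: PdfToMarkdown;

  constructor(convert: PdfToMarkdown) {
    this.convert = convert;
  }

  // Takes a PDF file and resolves to its Markdown representation.
  async convertToMarkdown(file: FileLike): Promise<string> {
    const buffer = await file.arrayBuffer();
    return this.convert(new Uint8Array(buffer));
  }
}

// Hypothetical helper: detect PDFs by MIME type or file extension.
function isPdf(file: FileLike): boolean {
  return (
    file.type === "application/pdf" ||
    file.name.toLowerCase().endsWith(".pdf")
  );
}

// PDFs go through the conversion service; everything else is read as text.
async function readFileContent(
  file: FileLike,
  pdfService: PdfConversionService,
): Promise<string> {
  if (isPdf(file)) {
    return pdfService.convertToMarkdown(file);
  }
  return file.text();
}
```

Injecting the converter keeps the service testable without a real PDF parser; in the actual component the service may construct its own pipeline instead.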