Interview-Challenge-Archive/document-markdown-reader

npm version License: MIT

Document Markdown Reader

A simple JavaScript library that converts various document formats into clean, readable Markdown text. It is intended for web applications that need to display content from uploaded files such as Word documents, PDFs, or rich text files.

Instead of struggling with different file formats, this library automatically detects the file type and converts it to Markdown for you. Whether your users upload Word documents, PDFs, HTML files, or plain text, you'll always get consistent Markdown output that's easy to display or process further.

Install

npm install @interview-challenge-archive/document-markdown-reader

Supported file formats

| Format | Extensions |
| --- | --- |
| HTML | .html, .htm |
| Markdown | .md, .markdown, .mdx |
| OpenDocument | .odt |
| Apple Pages | .pages |
| PDF | .pdf |
| Plain Text | .txt |
| Rich Text Format | .rtf |
| Word Document | .doc, .docx, .docm |

Examples

Check out the examples folder for complete working examples. Each example includes:

  • A README.md with instructions and explanations
  • A package.json with all required dependencies

Examples are organized by language, framework, and build tool in the format: examples/{language}/{framework}/{tool}.

For a complete overview of all examples, see the examples folder.

API Reference

documentMarkdownReader.readDocument(file: DocumentFileLike): Promise<string>

Reads a document file and returns its content as Markdown.

Parameters:

  • file: A File-like object: either a browser File, or any object with a name property and arrayBuffer() and text() methods

Returns: Promise resolving to a string containing the Markdown content

Throws:

  • UnsupportedFormatError - When the file format is not supported
  • InvalidDocumentError - When a DOCX/DOCM, ODT, Pages, or PDF file is invalid or corrupted
  • UnreadableDocumentError - When a DOC, Pages, or PDF file content cannot be read
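
As a rough illustration of the call and its failure modes, the sketch below wraps readDocument so that the three documented errors become a plain result object. It takes the reader as a parameter rather than importing the package, because the export shape is not stated here. The names MarkdownReader, ReadResult, fileLikeFromString, and readSafely are hypothetical, and matching errors by their name property is an assumption about how the library's error classes are defined.

```typescript
// Minimal shape described above: a name plus arrayBuffer() and text().
interface DocumentFileLike {
  name: string;
  arrayBuffer(): Promise<ArrayBuffer>;
  text(): Promise<string>;
}

// Assumed shape of the library object; only readDocument is taken from the docs.
interface MarkdownReader {
  readDocument(file: DocumentFileLike): Promise<string>;
}

type ReadResult =
  | { ok: true; markdown: string }
  | { ok: false; reason: string };

// Hypothetical helper: wraps a string as a File-like object (handy for tests).
function fileLikeFromString(name: string, content: string): DocumentFileLike {
  return {
    name,
    arrayBuffer: async () => new TextEncoder().encode(content).buffer as ArrayBuffer,
    text: async () => content,
  };
}

// Assumption: each documented error sets `error.name` to its class name.
async function readSafely(reader: MarkdownReader, file: DocumentFileLike): Promise<ReadResult> {
  try {
    return { ok: true, markdown: await reader.readDocument(file) };
  } catch (error) {
    const name = error instanceof Error ? error.name : "";
    switch (name) {
      case "UnsupportedFormatError":
        return { ok: false, reason: `${file.name}: format is not supported` };
      case "InvalidDocumentError":
        return { ok: false, reason: `${file.name}: file is invalid or corrupted` };
      case "UnreadableDocumentError":
        return { ok: false, reason: `${file.name}: content could not be read` };
      default:
        throw error; // Not one of the documented errors, so let it propagate.
    }
  }
}
```

In an application, the reader argument would be the library's documentMarkdownReader object and the file would typically come from a file input's files list.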

Properties

  • supportedExtensions: ReadonlyArray<string> - Array of all supported file extensions
  • acceptedExtensions: string - Comma-separated string suitable for the accept attribute of an HTML file input
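
One way these properties might be used is to reject a file by extension before attempting to read it. The sketch below is a minimal version of that check; extensionOf and isSupported are hypothetical helpers, and because it is not stated here whether entries in supportedExtensions include the leading dot, the comparison normalizes both forms.

```typescript
// Hypothetical helper: lowercased extension including the dot, or "" if there is none.
function extensionOf(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  // dot > 0 excludes names without a dot and dotfiles such as ".gitignore".
  return dot > 0 ? fileName.slice(dot).toLowerCase() : "";
}

// Hypothetical helper: true when the file name ends in one of the given extensions.
function isSupported(fileName: string, supportedExtensions: ReadonlyArray<string>): boolean {
  const ext = extensionOf(fileName);
  if (ext === "") return false;
  // Assumption: entries may or may not carry a leading dot, so accept either.
  return supportedExtensions.some(
    (entry) => (entry.startsWith(".") ? entry : "." + entry).toLowerCase() === ext
  );
}
```

The second argument would normally be documentMarkdownReader.supportedExtensions, while acceptedExtensions can be assigned directly to a file input's accept property so the browser's file picker filters to the same set.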

Browser Compatibility

This library is designed for browser environments where the following APIs are available:

| API | Description | First appeared in | Last browser added | Polyfill |
| --- | --- | --- | --- | --- |
| File API | For handling file objects | Firefox 3.6 (2009) | Edge 12 (2015) | Not needed |
| DOMParser | For parsing HTML/XML content | Firefox 1 (2004) | Edge 12 (2015) | Not needed |
| TextDecoder | For decoding text from ArrayBuffer | Firefox 19 (2013) | Edge 79 (2020) | text-encoding |
| ArrayBuffer | For handling binary data | Firefox 4 (2011) | Edge 12 (2015) | Not needed |

No Node.js-specific APIs are used, making it compatible with modern browsers and browser-like environments.
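
For browser-like environments where support is uncertain, a quick feature check before loading the library can give a clearer message than a runtime failure. The sketch below tests for the four globals listed in the table; missingApis and REQUIRED_APIS are hypothetical names, and the check only confirms that the globals exist, not that they behave to specification.

```typescript
// The four APIs listed in the compatibility table.
const REQUIRED_APIS = ["File", "DOMParser", "TextDecoder", "ArrayBuffer"] as const;

// Returns the names of required APIs that are undefined on the given scope.
// The scope defaults to the current global object.
function missingApis(
  scope: Record<string, unknown> = globalThis as unknown as Record<string, unknown>
): string[] {
  return REQUIRED_APIS.filter((name) => typeof scope[name] === "undefined");
}
```

An empty array means all four APIs are present. If TextDecoder is the only entry returned, the text-encoding polyfill named in the table is the documented fallback.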

Contributing

We'd love your help! Whether you're fixing a bug, adding support for a new format, or just spotting a typo in the docs, your contributions make this project better for everyone.

Getting started

  1. Fork & Clone: Grab your own copy of the repo and pull it down to your machine.
  2. Setup: Run npm install to get all the dependencies ready.
  3. Branch: Create a new branch for your work: git checkout -b my-awesome-improvement.

Development workflow

We use a few simple commands to keep everything running smoothly:

  • Build: npm run build - Compiles the project and generates type declarations.
  • Test: npm run test - Run this to make sure everything is working as expected. Use npm run test:watch while you're coding for instant feedback.
  • Quality check: Run npm run lint and npm run typecheck before submitting to catch any style issues or TypeScript errors.

Sharing your changes

  • Keep it simple: Try to keep your pull request focused on one specific change. It makes it much easier (and faster!) for us to review.
  • Tell us about it: When you open your pull request, give us a quick summary of what you've done and why it's helpful.

We're excited to see what you build! Thanks for being part of the community.

About

Browser-focused document reader that imports common file formats and returns Markdown
