
A small but effective tool for extracting plain text from .pdf files, making it easier for people who use screen readers to reach information that is sometimes inaccessible to them.


Phlypper/pdf_scraper

PDF Scraper

This is a React 18 application that lets the user upload a .pdf file. The app extracts the text from the file and creates a formatted copy as a plain-text (.txt) file that can be downloaded. Below are instructions for setting up and running the application locally.
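
As a rough illustration of the "extract, then format" step, the sketch below groups text fragments that share a baseline into lines. It is not taken from this repository: the helper name `itemsToPlainText`, the `TextItemLike` shape, and the `tolerance` value are assumptions made for the example. It relies only on the fact that PDF.js text items carry a `str` string and a `transform` matrix whose sixth entry is the vertical position.

```typescript
// Minimal shape of a PDF.js text item: the text plus its transform matrix,
// where transform[5] is the vertical (y) position of the text on the page.
interface TextItemLike {
  str: string;
  transform: number[];
}

// Hypothetical helper (not from this repo): group consecutive items that sit
// on the same baseline into one line. `tolerance` is an assumed threshold in
// PDF units. Items are processed in the order they are given.
function itemsToPlainText(items: TextItemLike[], tolerance = 2): string {
  const lines: { y: number; parts: string[] }[] = [];
  for (const item of items) {
    const y = item.transform[5] ?? 0;
    const last = lines[lines.length - 1];
    if (last && Math.abs(last.y - y) <= tolerance) {
      last.parts.push(item.str); // same baseline: extend the current line
    } else {
      lines.push({ y, parts: [item.str] }); // new baseline: start a new line
    }
  }
  // Join the fragments of each line and collapse repeated whitespace.
  return lines
    .map((line) => line.parts.join(" ").replace(/\s+/g, " ").trim())
    .join("\n");
}

// In an app like this one, the items would typically come from PDF.js
// (`(await page.getTextContent()).items`), and the result could be offered
// for download with file-saver, for example:
//   saveAs(new Blob([text], { type: "text/plain;charset=utf-8" }), "output.txt");
```

Grouping by baseline is only one possible formatting strategy; scanned PDFs without a text layer would need OCR instead, which is presumably why tesseract.js is among the dependencies.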

Table of Contents

  • Installation
  • Running the App
  • Building for Production
  • Contributing

Note: Create a new folder for the app and open it in your code editor of choice.

Installation

  1. Clone the repository:

    git clone https://github.com/Phlypper/pdf_scraper.git
    cd pdf_scraper
  2. Install dependencies:

    npm install tesseract.js
    npm install pdfjs-dist
    npm install file-saver
    

Make certain: Copy ‘pdf.worker.min.mjs’ from node_modules/pdfjs-dist/build and paste it into the public folder, so the app can load the PDF.js worker at runtime.
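
The reason for the copy step is that PDF.js loads its worker script by URL, and files in the public folder are served from the site root. The sketch below shows one way the app might point PDF.js at the copied file. The helper `workerSrcFor` is hypothetical, and the use of `process.env.PUBLIC_URL` assumes a Create React App style setup; `GlobalWorkerOptions.workerSrc` is the PDF.js setting that takes the worker URL.

```typescript
// Name of the worker file copied into the public folder.
const WORKER_FILE = "pdf.worker.min.mjs";

// Hypothetical helper (not from this repo): build the URL under which the
// copied worker is served, given the base URL of the deployed app.
function workerSrcFor(publicUrl: string): string {
  const base = publicUrl.replace(/\/+$/, ""); // drop any trailing slashes
  return `${base}/${WORKER_FILE}`;
}

// In the app, assuming pdfjs-dist is imported as `pdfjsLib`:
//   pdfjsLib.GlobalWorkerOptions.workerSrc =
//     workerSrcFor(process.env.PUBLIC_URL ?? "");
```

If the worker file is missing from the public folder, PDF.js will typically fail to load documents, so this is worth checking first when extraction does not start.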

Running the App

To start the development server, run:

npm start

The application will be available at http://localhost:3000/.

Building for Production

To create a production build, run:

npm run build

The production-ready files will be in the build directory.

Contributing

Contributions are welcome! Please submit a pull request or open an issue if you have any suggestions or find any bugs.

This app was created with the help of multiple AI GPT tools.
