NoteScraper

A simple PDF scraper, which I used to create study guides based on lecture materials.

Usage is simple: place the PDF files you want to scrape in an input folder with a name of your choosing, then run the following command:

python pdfscraper.py

You will be prompted to enter the name of the folder containing the PDF files as well as the name of the output file. Then, the program will extract the text from the PDFs and save it to a text file in the working directory.
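The extraction step described above could look roughly like the sketch below. It is not the repository's actual code: the function names (`list_pdfs`, `extract_pdf_text`, `scrape_folder`, `main`), the per-file header line, and the use of the third-party `pypdf` library are all assumptions made for illustration.

```python
from pathlib import Path


def list_pdfs(folder):
    """Return the PDF files in `folder`, sorted by name for a stable order."""
    return sorted(p for p in Path(folder).iterdir() if p.suffix.lower() == ".pdf")


def extract_pdf_text(pdf_path):
    """Extract the text of every page in one PDF.

    Assumes the third-party pypdf package is installed (pip install pypdf);
    the original project may use a different library.
    """
    from pypdf import PdfReader  # imported lazily so the helpers work without it

    reader = PdfReader(str(pdf_path))
    # extract_text() may yield nothing for image-only pages, hence the `or ""`
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def scrape_folder(folder, output_name, extract=extract_pdf_text):
    """Write the text of all PDFs in `folder` to a single text file."""
    out_path = Path(output_name)
    with out_path.open("w", encoding="utf-8") as out:
        for pdf in list_pdfs(folder):
            out.write(f"===== {pdf.name} =====\n")  # hypothetical per-file header
            out.write(extract(pdf))
            out.write("\n\n")
    return out_path


def main():
    """Prompt for the folder and output names, mirroring the usage above."""
    folder = input("Folder containing the PDF files: ")
    output_name = input("Name of the output file: ")
    print(f"Wrote {scrape_folder(folder, output_name)}")
```

Calling `main()` reproduces the prompt-then-extract flow. The `extract` parameter exists only so a different extraction function can be swapped in.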

Cleanup then prompts for the name of the file to clean along with a name for the output file, and uses regular expressions and string manipulation to clean up the text and format it into a more readable study guide. Modify this script to suit your needs. Running cleanup is optional but recommended. The command to run cleanup is as follows:

python cleanup.py
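As a rough illustration of the kind of regex and string cleanup described above, here is a minimal sketch. The specific rules (rejoining hyphenated words, dropping page-number lines, collapsing whitespace) and the names `clean_text` and `clean_file` are assumptions chosen for the example, not the rules cleanup.py actually applies.

```python
import re
from pathlib import Path


def clean_text(raw):
    """Apply a few generic cleanup rules to text extracted from PDFs."""
    text = raw.replace("\r\n", "\n")
    # Rejoin words hyphenated across a line break: "lec-\nture" -> "lecture"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Drop lines that contain nothing but a number (likely page numbers)
    text = re.sub(r"^[ \t]*\d+[ \t]*$\n?", "", text, flags=re.MULTILINE)
    # Collapse runs of spaces and tabs into a single space
    text = re.sub(r"[ \t]+", " ", text)
    # Trim leading and trailing whitespace on every line
    text = "\n".join(line.strip() for line in text.split("\n"))
    # Allow at most one blank line between paragraphs
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip() + "\n"


def clean_file(input_name, output_name):
    """Read `input_name`, clean it, and write the result to `output_name`."""
    raw = Path(input_name).read_text(encoding="utf-8")
    Path(output_name).write_text(clean_text(raw), encoding="utf-8")
```

Each rule is a single line so that it can be removed or replaced independently when adapting the script to a particular set of lecture notes.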
