PDF Text Cleaner

A Python tool for extracting and cleaning text from PDF files using PyMuPDF (fitz).
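
The extract-then-clean idea can be sketched as follows. This is a minimal illustration, not the project's actual code: `extract_text` and `clean_text` are hypothetical names, and the cleaning rules shown (rejoining hyphenated line breaks, collapsing whitespace) are assumptions about what "cleaning" might involve.

```python
import re


def extract_text(pdf_path: str) -> str:
    """Return the plain text of every page in the PDF, joined by newlines."""
    # PyMuPDF is imported lazily so clean_text() can be used without it installed.
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        return "\n".join(page.get_text() for page in doc)


def clean_text(raw: str) -> str:
    """Apply a few assumed cleanup rules to raw extracted text."""
    # Rejoin words that were hyphenated across a line break ("wor-\nld" -> "world").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim leading and trailing whitespace on every line.
    text = "\n".join(line.strip() for line in text.splitlines())
    # Reduce three or more consecutive newlines to one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

A caller would typically chain the two, for example `clean_text(extract_text("thesis.pdf"))`.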

Workflow

The workflow is described in Step1.pdf, in the repository root.

Installation

All dependencies are listed in requirements.txt. To install them, run:

    pip install -r requirements.txt

How to Run

python main.py <file_path> [-o <output_path/file_name.txt>]

Arguments

Argument	Description
`file_path`	Path to the PDF file (required)
`-o, --output`	Path to save cleaned text (optional)

If `-o` is not provided, a default filename is generated from the input file name and the output is saved in the script directory.
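
One way such an interface could be wired up with `argparse` is sketched below. It is an illustration under assumptions, not the contents of main.py: `build_parser` and `resolve_output` are hypothetical names, and the `_cleaned.txt` suffix is an assumed naming scheme, since the default filename pattern is not specified here.

```python
import argparse
from pathlib import Path
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one required positional and one optional output path."""
    parser = argparse.ArgumentParser(
        description="Extract and clean text from a PDF file."
    )
    parser.add_argument("file_path", type=Path, help="Path to the PDF file (required)")
    parser.add_argument(
        "-o", "--output", type=Path, default=None,
        help="Path to save cleaned text (optional)",
    )
    return parser


def resolve_output(file_path: Path, output: Optional[Path], script_dir: Path) -> Path:
    """Return the explicit output path, or a default inside script_dir."""
    if output is not None:
        return output
    # Assumed naming scheme: <input stem>_cleaned.txt in the script directory.
    return script_dir / f"{file_path.stem}_cleaned.txt"
```

In a script, `script_dir` would usually be `Path(__file__).resolve().parent`, and the arguments would come from `build_parser().parse_args()`.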

Examples:

    # Basic usage (saves to the default filename)
    $ python main.py thesis.pdf

    # Custom output location
    $ python main.py annual_report.pdf --output ./clean/report.txt

    # Using the short option
    $ python main.py article.pdf -o cleaned_article.txt

    # Processing a file with spaces in its name
    $ python main.py "my document.pdf" -o "my document clean.txt"

    # Using absolute paths
    $ python main.py /home/user/documents/paper.pdf -o /home/user/output/paper.txt
