GitHub - Rust-soham/humanizer-py-script

Humanize Research Paper Text (`texthumanize` CLI)

This mini-project lets you take a full research-paper-style .txt file (for example, exported from a paper editing site) and run it through the texthumanize library using its academic profile.

The script:

Preserves the overall formatting of your text file (headings, spacing, etc.) while normalizing the language.
Uses humanize_chunked() so that very long papers are processed in chunks rather than in one pass.
Is configured for formal academic style by default (profile="academic").
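The chunked processing mentioned above can be pictured with a small sketch. It is not texthumanize's actual implementation: `split_into_chunks` and `process_in_chunks` are hypothetical helpers, and `transform` stands in for whatever humanizing call is applied to each chunk. The sketch splits only at line boundaries, which is one simple way to keep headings and spacing intact.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int = 3000) -> List[str]:
    """Split text into pieces of roughly chunk_size characters.

    Splits happen only at line boundaries and line endings are kept,
    so joining the chunks reproduces the input exactly.
    """
    chunks: List[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # Start a new chunk when adding this line would exceed the budget.
        if current and len(current) + len(line) > chunk_size:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


def process_in_chunks(
    text: str, transform: Callable[[str], str], chunk_size: int = 3000
) -> str:
    """Apply transform (a stand-in for the humanizer) to each chunk in order."""
    return "".join(transform(chunk) for chunk in split_into_chunks(text, chunk_size))
```

A single line longer than `chunk_size` becomes its own oversized chunk in this sketch, which is why the size is best read as approximate.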

Important limitation: Neither this script nor texthumanize can guarantee that your text will evade all external AI detectors (GPTZero, Turnitin, Originality.ai, etc.). The library is designed to reduce common AI-style markers and normalize style, but external detectors use their own proprietary models and signals.

1. Install dependencies

From the py-humanize folder:

pip install -r requirements.txt

This installs:

texthumanize ≥ 0.27.1
lxml (used for safe DOCX XML edits)

2. Prepare your input file

Place your formatted research paper text in a plain UTF‑8 .txt file, for example:

whole.txt

You can keep the editing site's template structure in this file; the script will humanize the content as raw text while preserving line breaks and most structural formatting.
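If you are unsure whether an exported file really is UTF-8, a quick check before running the script can save a confusing failure later. The helper below is a hypothetical sketch and not part of this project; it reads the raw bytes so that existing line breaks are left untouched.

```python
from pathlib import Path


def read_utf8_text(path: Path) -> str:
    """Read a file as UTF-8 and fail with a clear message if it is not."""
    data = path.read_bytes()
    try:
        # "utf-8-sig" also accepts files that start with a byte-order mark.
        return data.decode("utf-8-sig")
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"{path} is not valid UTF-8 (problem at byte offset {exc.start})"
        ) from exc
```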

3. Run the humanizer

Basic usage (from the py-humanize directory):

python humanize_paper.py whole.txt

This will:

Read whole.txt
Humanize it with profile="academic" and English language (lang="en")
Write the result to whole.txt.humanized.txt in the same directory
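The read, humanize, write flow above can be sketched as follows. This is an illustration under assumptions, not the contents of humanize_paper.py: `default_output_path` and `humanize_file` are hypothetical names, and `transform` stands in for the texthumanize call.

```python
from pathlib import Path
from typing import Callable, Optional


def default_output_path(input_path: Path) -> Path:
    """Append ".humanized.txt" to the full input file name."""
    return input_path.with_name(input_path.name + ".humanized.txt")


def humanize_file(
    input_path: Path,
    transform: Callable[[str], str],
    output_path: Optional[Path] = None,
) -> Path:
    """Read input_path, apply transform, and write the result next to it."""
    out = output_path or default_output_path(input_path)
    # Work on bytes so that the file's own line endings are preserved.
    text = input_path.read_bytes().decode("utf-8")
    out.write_bytes(transform(text).encode("utf-8"))
    return out
```

With this naming rule, `whole.txt` maps to `whole.txt.humanized.txt` in the same directory, matching the behavior described above.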

4. Advanced options

You can customize behavior via CLI flags:

python humanize_paper.py whole.txt \
  -o whole_academic_humanized.txt \
  --lang en \
  --profile academic \
  --intensity 30 \
  --seed 42 \
  --chunk-size 3000

-o / --output: Custom output path.
--lang: Language code (default: en).
--profile: Processing profile (default: academic). Other options in texthumanize include chat, web, seo, docs, formal, marketing, social, email.
--intensity: 0–100. If omitted, texthumanize uses the profile-specific default (for academic, ~25).
--seed: Ensures deterministic output for the same input.
--chunk-size: Approximate characters per chunk when splitting long documents internally.
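For readers who want to see how flags like these are typically wired up, here is a minimal argparse sketch that mirrors the options listed above. It is an assumption about the general shape, not the script's actual source; `build_parser` and `intensity_value` are hypothetical names, and the default for --chunk-size is left unset because it is not documented here.

```python
import argparse


def intensity_value(raw: str) -> int:
    """Parse --intensity and enforce the documented 0-100 range."""
    value = int(raw)
    if not 0 <= value <= 100:
        raise argparse.ArgumentTypeError("intensity must be between 0 and 100")
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="humanize_paper.py")
    parser.add_argument("input", help="UTF-8 .txt file to humanize")
    parser.add_argument("-o", "--output", default=None, help="custom output path")
    parser.add_argument("--lang", default="en", help="language code")
    parser.add_argument("--profile", default="academic", help="processing profile")
    # None means "let the library pick the profile-specific default".
    parser.add_argument("--intensity", type=intensity_value, default=None)
    parser.add_argument("--seed", type=int, default=None, help="fixed seed")
    # The real default is not stated in this README, so none is assumed here.
    parser.add_argument("--chunk-size", type=int, default=None)
    return parser
```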

5. Notes on AI detection

texthumanize focuses on style normalization (sentence length, connectors, bureaucratic phrases, etc.).
It can reduce AI-style markers, but the project itself explicitly states it cannot guarantee bypassing all third‑party AI detectors.
For best results:
- Start with profile="academic" and low–moderate intensity (20–40).
- Manually review the humanized text to confirm that the scholarly tone and terminology are intact and that no template artefacts were altered unintentionally.
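One way to make the manual review easier is to list only the lines that changed. The sketch below uses Python's standard difflib module; `changed_lines` is a hypothetical helper and not part of this project.

```python
import difflib
from typing import List


def changed_lines(original: str, humanized: str) -> List[str]:
    """Return unified-diff lines showing only what the humanizer altered."""
    return list(
        difflib.unified_diff(
            original.splitlines(),
            humanized.splitlines(),
            fromfile="original",
            tofile="humanized",
            lineterm="",
            n=0,  # no context lines: show changed lines only
        )
    )
```

Lines starting with `-` come from the original text and lines starting with `+` from the humanized text, so altered headings or template markers stand out quickly.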

DOCX Support (Word `.docx`)

This project also includes a DOCX-focused script: humanize_docx.py.

What it edits

The script edits only the main body text inside word/document.xml, replacing the contents of <w:t> text nodes.
It skips:
- Word field instructions (w:instrText)
- Word math nodes (m:*)
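The node selection described above can be sketched with the standard library. This is an illustration, not the code in humanize_docx.py: `rewrite_text_nodes` is a hypothetical name and `transform` stands in for the humanizing call. Matching on the `<w:t>` tag alone already leaves `<w:instrText>` and math text (`<m:t>`) untouched, because they have a different tag name or namespace.

```python
import xml.etree.ElementTree as ET
from typing import Callable

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
# Keep the familiar prefixes when the tree is serialized again.
ET.register_namespace("w", W_NS)
ET.register_namespace("m", M_NS)


def rewrite_text_nodes(document_xml: bytes, transform: Callable[[str], str]) -> bytes:
    """Apply transform to the text of every non-empty <w:t> element."""
    root = ET.fromstring(document_xml)
    for node in root.iter(f"{{{W_NS}}}t"):
        # Skip empty or whitespace-only nodes; they carry spacing, not wording.
        if node.text and node.text.strip():
            node.text = transform(node.text)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

ElementTree is used here only to keep the sketch dependency-free. It can rename namespace prefixes it has not been told about, which is one reason a library such as lxml is a safer choice for real Word files.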

Pitfalls / complications

Word often splits visible text across multiple runs/text nodes, so humanizing per <w:t> can reduce global coherence.
Formatting is preserved structurally (we do not change w:rPr, paragraph/run structure, etc.), but the wording changes are still inserted at the text-node level.
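To see how fragmented a given document is, you can list the `<w:t>` pieces that make up each paragraph. The helper below is a hypothetical diagnostic sketch and not part of this project.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_fragments(document_xml: bytes) -> List[Tuple[str, List[str]]]:
    """For each <w:p>, return (joined text, list of its <w:t> fragments)."""
    root = ET.fromstring(document_xml)
    result: List[Tuple[str, List[str]]] = []
    for para in root.iter(f"{{{W_NS}}}p"):
        parts = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        if parts:
            result.append(("".join(parts), parts))
    return result
```

A paragraph whose fragment list contains several short pieces, some of them cut mid-word, is one where per-node humanizing sees only partial sentences.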

Usage

python humanize_docx.py input.docx -o output.docx

Optional flags:

python humanize_docx.py input.docx -o output.docx \
  --lang en \
  --profile academic \
  --intensity 30 \
  --seed 42 \
  --min-chars 2

Repairing DOCX Outputs (newline cleanup)

If you see odd reflow/layout issues after humanize_docx.py, it can help to remove newline characters that end up inside <w:t> text nodes.

Run:

python repair_docx_wt_newlines.py input.docx -o output.docx

This edits only word/document.xml and validates that no <w:t> nodes still contain \n or \r after repair.
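The repair step can be sketched as follows. This is an illustration under assumptions rather than the script's source: the function names are hypothetical, and replacing each run of line breaks with a single space is one possible cleanup policy (the real script may simply delete them).

```python
import re
import xml.etree.ElementTree as ET
import zipfile

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W_T = f"{{{W_NS}}}t"
ET.register_namespace("w", W_NS)


def has_wt_newlines(document_xml: bytes) -> bool:
    """Return True if any <w:t> node still contains a line-break character."""
    root = ET.fromstring(document_xml)
    return any(n.text and re.search(r"[\r\n]", n.text) for n in root.iter(W_T))


def repair_wt_newlines(document_xml: bytes) -> bytes:
    """Replace line breaks inside <w:t> nodes, then validate the result."""
    root = ET.fromstring(document_xml)
    for node in root.iter(W_T):
        if node.text:
            # Assumption: collapse each run of \r / \n into one space.
            node.text = re.sub(r"[\r\n]+", " ", node.text)
    repaired = ET.tostring(root, encoding="utf-8", xml_declaration=True)
    if has_wt_newlines(repaired):
        raise ValueError("line breaks remain inside <w:t> nodes after repair")
    return repaired


def repair_docx(in_path: str, out_path: str) -> None:
    """Copy a .docx archive, rewriting only word/document.xml."""
    with zipfile.ZipFile(in_path) as src, zipfile.ZipFile(out_path, "w") as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                data = repair_wt_newlines(data)
            dst.writestr(item, data)
```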

PDF Support (`.pdf`)

This project also includes a PDF-focused script: humanize_pdf.py.

What it does

Uses PyMuPDF to extract embedded text along with bounding boxes.
Humanizes the extracted text line-by-line using texthumanize.
Covers each original text region with a white rectangle and inserts the humanized text into approximately the same region.
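The three steps above can be sketched roughly as follows. This is not the code in humanize_pdf.py: `collect_lines` and `rewrite_pdf` are hypothetical names, `transform` stands in for the humanizing call, and the built-in "helv" font is an assumption made for simplicity. The PyMuPDF import is placed inside `rewrite_pdf` so that the pure-Python part can be read and tried without the library installed.

```python
from typing import Callable, Dict, List, Tuple

BBox = Tuple[float, float, float, float]


def collect_lines(page_dict: Dict) -> List[Tuple[BBox, str, float]]:
    """Flatten the structure returned by PyMuPDF's page.get_text("dict")
    into (bounding box, line text, font size) tuples, one per text line."""
    lines: List[Tuple[BBox, str, float]] = []
    for block in page_dict.get("blocks", []):
        if block.get("type") != 0:  # 0 = text block, 1 = image block
            continue
        for line in block.get("lines", []):
            spans = line.get("spans", [])
            text = "".join(span["text"] for span in spans)
            if text.strip():
                size = max(span["size"] for span in spans)
                lines.append((tuple(line["bbox"]), text, size))
    return lines


def rewrite_pdf(in_path: str, out_path: str, transform: Callable[[str], str]) -> None:
    import fitz  # PyMuPDF

    doc = fitz.open(in_path)
    for page in doc:
        lines = collect_lines(page.get_text("dict"))
        # First pass: cover every original line with an opaque white box.
        for bbox, _text, _size in lines:
            page.draw_rect(fitz.Rect(bbox), color=(1, 1, 1), fill=(1, 1, 1))
        # Second pass: write the new text into the same boxes.
        for bbox, text, size in lines:
            # insert_textbox returns a negative value if the text does not fit.
            page.insert_textbox(
                fitz.Rect(bbox), transform(text), fontsize=size, fontname="helv"
            )
    doc.save(out_path)
```

Drawing all rectangles before inserting any text avoids a later box hiding text that was just written. Note that a drawn rectangle only hides the original text visually; in this sketch the old text remains in the file underneath it.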

Important limitations

This works best for PDFs that contain real, extractable text (not scanned images).
Even for text-based PDFs, fonts/colors/spacing may shift slightly depending on how the PDF stores glyph positioning.

Usage

python humanize_pdf.py input.pdf -o output.pdf

Optional flags:

python humanize_pdf.py input.pdf -o output.pdf \
  --lang en \
  --profile academic \
  --intensity 30 \
  --seed 42 \
  --min-chars 2 \
  --min-text-chars 50
