This mini-project lets you take a full research-paper-style .txt file (for example, exported from a paper editing site) and run it through the texthumanize library using its academic profile.
The script:
- Preserves the overall formatting of your text file (headings, spacing, etc.) while normalizing the language.
- Uses humanize_chunked() so very long papers can be processed safely.
- Is configured for formal academic style by default (profile="academic").
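To make the idea of chunked processing concrete, here is a minimal sketch of splitting a long text at paragraph boundaries, transforming each piece, and joining the pieces back together. It is an illustration only: `split_into_chunks` and `humanize_in_chunks` are hypothetical names, and this is not how `humanize_chunked()` is implemented inside texthumanize.

```python
from typing import Callable


def split_into_chunks(text: str, chunk_size: int = 3000) -> list[str]:
    """Split text into chunks of roughly chunk_size characters.

    Breaks happen only at blank lines, so paragraphs stay intact. A single
    paragraph longer than chunk_size becomes its own oversized chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    length = 0
    for paragraph in text.split("\n\n"):
        # Account for the "\n\n" separator when the chunk already has content.
        added = len(paragraph) + (2 if current else 0)
        if current and length + added > chunk_size:
            chunks.append("\n\n".join(current))
            current, length = [paragraph], len(paragraph)
        else:
            current.append(paragraph)
            length += added
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def humanize_in_chunks(
    text: str, humanize_fn: Callable[[str], str], chunk_size: int = 3000
) -> str:
    """Apply humanize_fn to each chunk and rejoin with blank lines."""
    return "\n\n".join(
        humanize_fn(chunk) for chunk in split_into_chunks(text, chunk_size)
    )
```

Splitting at blank lines keeps each paragraph whole, which is one plausible way to keep headings and spacing stable when the pieces are joined again.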
Important limitation: Neither this script nor texthumanize can guarantee that your text will evade all external AI detectors (GPTZero, Turnitin, Originality.ai, etc.). The library is designed to reduce common AI-style markers and normalize style, but external detectors use their own proprietary models and signals.
From the py-humanize folder:

    pip install -r requirements.txt

This installs:
- texthumanize ≥ 0.27.1
- lxml (used for safe DOCX XML edits)
Place your formatted research paper text in a plain UTF‑8 .txt file, for example:
whole.txt
You can keep the editing site's template structure in this file; the script will humanize the content as raw text while preserving line breaks and most structural formatting.
Basic usage (from the py-humanize directory):

    python humanize_paper.py whole.txt

This will:
- Read whole.txt
- Humanize it with profile="academic" and English language (lang="en")
- Write the result to whole.txt.humanized.txt in the same directory
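The three steps above can be sketched as follows. This is a minimal illustration, not the actual contents of humanize_paper.py: `default_output_path` and `humanize_file` are hypothetical helpers, and the humanizing step is passed in as a plain callable so the sketch runs without texthumanize installed.

```python
from pathlib import Path
from typing import Callable, Optional, Union


def default_output_path(input_path: Path) -> Path:
    """Append ".humanized.txt" to the input file name, in the same directory."""
    return input_path.with_name(input_path.name + ".humanized.txt")


def humanize_file(
    input_path: Union[str, Path],
    humanize_fn: Callable[[str], str],
    output_path: Optional[Union[str, Path]] = None,
) -> Path:
    """Read a UTF-8 text file, transform it, and write the result."""
    input_path = Path(input_path)
    target = Path(output_path) if output_path else default_output_path(input_path)
    text = input_path.read_text(encoding="utf-8")
    target.write_text(humanize_fn(text), encoding="utf-8")
    return target
```

In the real script, `humanize_fn` would presumably wrap texthumanize's humanize_chunked() with lang="en" and profile="academic"; the exact call and its return type should be checked against the library's documentation.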
You can customize behavior via CLI flags:

    python humanize_paper.py whole.txt \
      -o whole_academic_humanized.txt \
      --lang en \
      --profile academic \
      --intensity 30 \
      --seed 42 \
      --chunk-size 3000

- -o / --output: Custom output path.
- --lang: Language code (default: en).
- --profile: Processing profile (default: academic). Other options in texthumanize include chat, web, seo, docs, formal, marketing, social, email.
- --intensity: 0–100. If omitted, texthumanize uses the profile-specific default (for academic, ~25).
- --seed: Ensures deterministic output for the same input.
- --chunk-size: Approximate characters per chunk when splitting long documents internally.
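For readers who want to see how such flags fit together, here is a minimal argparse sketch that mirrors the options listed above. It is an assumption about how a parser like this could look, not the script's actual source. In particular, the default for --chunk-size is not documented here, so the sketch uses None as a placeholder.

```python
import argparse


def intensity_value(raw: str) -> int:
    """Parse --intensity and enforce the documented 0-100 range."""
    value = int(raw)
    if not 0 <= value <= 100:
        raise argparse.ArgumentTypeError("intensity must be between 0 and 100")
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Humanize a plain-text paper.")
    parser.add_argument("input", help="path to the UTF-8 .txt file")
    parser.add_argument("-o", "--output", default=None, help="custom output path")
    parser.add_argument("--lang", default="en", help="language code")
    parser.add_argument("--profile", default="academic", help="processing profile")
    # None means "let the library pick the profile-specific default".
    parser.add_argument("--intensity", type=intensity_value, default=None)
    parser.add_argument("--seed", type=int, default=None)
    # Placeholder default: the real default is not stated in this README.
    parser.add_argument("--chunk-size", type=int, default=None)
    return parser
```

Leaving --intensity as None when omitted matches the behavior described above, where the library falls back to the profile's own default.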
- texthumanize focuses on style normalization (sentence length, connectors, bureaucratic phrases, etc.).
- It can reduce AI-style markers, but the project itself explicitly states it cannot guarantee bypassing all third‑party AI detectors.
- For best results:
  - Start with profile="academic" and low-to-moderate intensity (20–40).
  - Manually review the humanized text to ensure scholarly tone, correct terminology, and no template artefacts were altered in an unintended way.
This project also includes a DOCX-focused script: humanize_docx.py.
- It changes only the main body text inside word/document.xml, by replacing the contents of <w:t> text nodes.
- It skips:
  - Word field instructions (w:instrText)
  - Word math nodes (m:*)
- Word often splits visible text across multiple runs/text nodes, so humanizing per <w:t> can reduce global coherence.
- Formatting is preserved structurally (we do not change w:rPr, paragraph/run structure, etc.), but the wording changes are still inserted at the text-node level.
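The text-node approach described above can be sketched as follows. This is a simplified illustration that uses the standard library's ElementTree instead of lxml so it is self-contained; `humanize_document_xml` is a hypothetical name, and the `min_chars` parameter reflects an assumed meaning of the --min-chars flag (skip very short text nodes).

```python
import xml.etree.ElementTree as ET
from typing import Callable

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
# Keep the familiar w: and m: prefixes when the tree is serialized again.
ET.register_namespace("w", W_NS)
ET.register_namespace("m", M_NS)


def humanize_document_xml(
    xml_bytes: bytes, humanize_fn: Callable[[str], str], min_chars: int = 2
) -> bytes:
    """Rewrite the text of every <w:t> node and leave everything else alone.

    Selecting only w:t means w:instrText (field instructions) and m:t
    (math text, a different namespace) are never touched.
    """
    root = ET.fromstring(xml_bytes)
    for node in root.iter(f"{{{W_NS}}}t"):
        text = node.text or ""
        # Assumed behavior: skip nodes too short to rewrite meaningfully.
        if len(text.strip()) < min_chars:
            continue
        node.text = humanize_fn(text)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

Because only `node.text` changes, run properties such as w:rPr and the paragraph/run structure stay as they were. Real document.xml files declare many more namespaces than the two registered here, and ElementTree renames unregistered prefixes on output, so a production version needs more care than this sketch.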
    python humanize_docx.py input.docx -o output.docx

Optional flags:

    python humanize_docx.py input.docx -o output.docx \
      --lang en \
      --profile academic \
      --intensity 30 \
      --seed 42 \
      --min-chars 2

If you see odd reflow/layout issues after running humanize_docx.py, it can help to remove newline characters that end up inside <w:t> text nodes.
Run:

    python repair_docx_wt_newlines.py input.docx -o output.docx

This edits only word/document.xml and validates that no <w:t> nodes still contain \n or \r after repair.
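The repair-then-validate idea can be sketched like this. It is a minimal illustration using the standard library, not the source of repair_docx_wt_newlines.py; `repair_wt_newlines` is a hypothetical name, and replacing each newline run with a single space (rather than deleting it outright) is an assumption.

```python
import re
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W_T = f"{{{W_NS}}}t"
ET.register_namespace("w", W_NS)

# A run of line breaks together with any whitespace around it.
_NEWLINE_RUN = re.compile(r"\s*[\r\n]+\s*")


def repair_wt_newlines(xml_bytes: bytes) -> bytes:
    """Collapse newlines inside <w:t> text, then verify none remain."""
    root = ET.fromstring(xml_bytes)
    for node in root.iter(W_T):
        if node.text and ("\n" in node.text or "\r" in node.text):
            # Assumption: a single space keeps neighboring words apart.
            node.text = _NEWLINE_RUN.sub(" ", node.text)
    # Validation pass: fail loudly if any node still has a line break.
    leftovers = [n for n in root.iter(W_T) if n.text and re.search(r"[\r\n]", n.text)]
    if leftovers:
        raise ValueError(f"{len(leftovers)} <w:t> nodes still contain newlines")
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

Running the validation as a separate pass after the repair keeps the check independent of the replacement logic.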
This project also includes a PDF-focused script: humanize_pdf.py.
- Uses PyMuPDF to extract embedded text along with bounding boxes.
- Humanizes the extracted text line-by-line using texthumanize.
- Covers the original text regions with a white rectangle and re-inserts the humanized text back into the same approximate regions.
- This works best for PDFs that contain real, extractable text (not scanned images).
- Even for text-based PDFs, fonts/colors/spacing may shift slightly depending on how the PDF stores glyph positioning.
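The extract, cover, and re-insert steps described above might look roughly like the sketch below. It is an illustration under assumptions, not the source of humanize_pdf.py: `iter_text_lines` and `rewrite_pdf` are hypothetical names, the humanizing step is passed in as a callable, and PyMuPDF is imported lazily so the pure-Python helper can be used without it.

```python
from typing import Callable, Iterator


def iter_text_lines(page_dict: dict) -> Iterator[tuple[tuple, str, float]]:
    """Yield (bbox, text, font_size) for each text line in a page dict.

    page_dict is expected to have the layout of PyMuPDF's
    page.get_text("dict"): blocks -> lines -> spans.
    """
    for block in page_dict.get("blocks", []):
        if block.get("type") != 0:  # 0 = text block, 1 = image block
            continue
        for line in block.get("lines", []):
            spans = line.get("spans", [])
            text = "".join(span["text"] for span in spans)
            if text.strip():
                size = max(span["size"] for span in spans)
                yield tuple(line["bbox"]), text, size


def rewrite_pdf(src: str, dst: str, humanize_fn: Callable[[str], str]) -> None:
    """Cover each text line with white and draw the humanized text over it."""
    import fitz  # PyMuPDF; imported here so iter_text_lines works without it

    doc = fitz.open(src)
    for page in doc:
        # Collect lines first so drawing does not affect extraction.
        lines = list(iter_text_lines(page.get_text("dict")))
        for bbox, text, size in lines:
            rect = fitz.Rect(bbox)
            page.draw_rect(rect, color=(1, 1, 1), fill=(1, 1, 1))
            new_text = humanize_fn(text)
            # insert_textbox returns a negative number if the text did not fit.
            if page.insert_textbox(rect, new_text, fontsize=size) < 0:
                # Fallback: approximate the baseline with the box's bottom-left.
                page.insert_text((rect.x0, rect.y1), new_text, fontsize=size)
    doc.save(dst)
    doc.close()
```

The sketch redraws text in PyMuPDF's default font, which is one reason fonts and spacing can shift in the output. A scanned PDF yields no text blocks, so nothing would be rewritten.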
    python humanize_pdf.py input.pdf -o output.pdf

Optional flags:

    python humanize_pdf.py input.pdf -o output.pdf \
      --lang en \
      --profile academic \
      --intensity 30 \
      --seed 42 \
      --min-chars 2 \
      --min-text-chars 50