Lexiler

A tool for estimating the Lexile® level of English PDF books.

Overview

Lexiler extracts text from PDFs and calculates an approximate Lexile measure from sentence length and word frequency statistics. The calibration constant used in the calculation is chosen according to the text's mean sentence length, with minor adjustments for word frequency, so the estimate adapts to different writing styles.

Installation

# Create and activate a virtual environment, then install dependencies
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Usage

Basic usage:

# After installing dependencies and activating the venv
python main.py path/to/book.pdf

With a custom calibration configuration:

python main.py path/to/book.pdf path/to/calibration_config.json

How It Works

The program:

Extracts text from a PDF using pdfminer.six
Segments the text into sentences and words
Calculates mean sentence length (MSL) and mean log word frequency (MLWF)
Uses these metrics in the Lexile formula with adaptive calibration from a config file
Outputs the estimated Lexile score along with supporting metrics
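
As a rough illustration of the two metrics named in the steps above, the sketch below computes MSL and MLWF from a plain string. It is not the code in main.py: the helper names, the naive regex sentence splitter, the use of the natural logarithm, the frequency floor for unknown words, and the toy frequency table are all assumptions made for the example. In a real run the text would come from pdfminer.six (for instance `pdfminer.high_level.extract_text`) and the frequencies from wordfreq (for instance `wordfreq.word_frequency(word, "en")`).

```python
import math
import re

# Toy frequency table with made-up values, used only so the example runs
# without third-party packages. A real run would query a frequency source.
TOY_FREQ = {
    "the": 0.05, "on": 0.01, "it": 0.01,
    "cat": 0.0001, "sat": 0.00005, "mat": 0.00002, "purred": 0.000001,
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens (and apostrophes)."""
    return re.findall(r"[a-z']+", sentence.lower())


def text_metrics(text, freq=TOY_FREQ.get, floor=1e-9):
    """Return (MSL, MLWF) for the given text.

    freq  -- callable mapping a word to its relative frequency (or None/0)
    floor -- assumed minimum frequency, so unknown words do not hit log(0)
    """
    sentences = split_sentences(text)
    words = [w for s in sentences for w in tokenize(s)]
    if not sentences or not words:
        raise ValueError("text contains no sentences or words")
    msl = len(words) / len(sentences)
    mlwf = sum(math.log(max(freq(w) or 0.0, floor)) for w in words) / len(words)
    return msl, mlwf


msl, mlwf = text_metrics("The cat sat on the mat. It purred.")
print(f"MSL={msl:.2f}  MLWF={mlwf:.2f}")
```

Passing the frequency lookup as a parameter keeps the metric code independent of any particular word list, which makes it easy to swap the toy table for a real one.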

Calibration

Default Calibration

Lexiler selects its calibration constant dynamically from the characteristics of the text. The default calibration was derived from books with known Lexile measures:

Harry Potter series (Books 1-8): 500L to 1030L
The Kite Runner: 840L
The Martian: 680L

The formula selects an appropriate calibration constant based primarily on mean sentence length (MSL) with minor adjustments for word frequency (MLWF):

Sentence Length Category	MSL Range	Approx. Constant	Example Books
Very short sentences	< 10	~9.6	HP Book 2 (500L)
Short sentences	10-15	~13.2	The Martian (680L)
Medium-short sentences	15-20	~14.1	The Kite Runner (840L)
Medium sentences	20-24	~17.3	HP Book 8 (880L)
Medium-long sentences	24-30	~18.8	HP Books 1,3,4,7 (880L-1030L)
Long sentences	30-35	~20.8	HP Books 5,6 (870L-1030L)
Very long sentences	> 35	~21.7+	Advanced texts
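
The table lookup can be sketched as a simple bisection over the MSL boundaries. This is only an illustration: the names `MSL_BOUNDS`, `CONSTANTS`, and `base_constant` are hypothetical, the constants are the approximate values from the table, boundary values are assumed to belong to the higher category (an MSL of exactly 15 counts as "Medium-short"), and the minor MLWF adjustment is left out because the table does not specify it.

```python
from bisect import bisect_right

# Upper edges of the MSL categories from the table above.
MSL_BOUNDS = [10, 15, 20, 24, 30, 35]

# Approximate constant for each category, from "very short" to "very long".
CONSTANTS = [9.6, 13.2, 14.1, 17.3, 18.8, 20.8, 21.7]


def base_constant(msl):
    """Return the approximate calibration constant for a mean sentence length.

    Assumption: each range includes its lower bound and excludes its upper
    bound, since the table does not say which side a boundary value falls on.
    """
    if msl <= 0:
        raise ValueError("mean sentence length must be positive")
    return CONSTANTS[bisect_right(MSL_BOUNDS, msl)]


for msl in (8.5, 12, 15, 40):
    print(msl, base_constant(msl))
```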

Custom Calibration

You can generate a custom calibration configuration by using the included analyze.py script:

python analyze.py [directory_with_calibration_books]

This will:

Search for PDFs with Lexile levels in their filenames (format: "*-XXXL.pdf")
Extract text and compute metrics for each book
Calculate ideal calibration constants for each book
Group books by sentence length categories
Output a calibration configuration file (lexile_calibration.json)
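
The filename convention in the first step can be illustrated as follows. This is a sketch rather than the logic in analyze.py: the function names and the exact regular expression are assumptions based on the "*-XXXL.pdf" format described above.

```python
import re
from pathlib import Path

# Matches a trailing "-<digits>L.pdf", e.g. "the-martian-680L.pdf".
LEXILE_SUFFIX = re.compile(r"-(\d+)L\.pdf$")


def lexile_from_filename(name):
    """Return the Lexile level encoded in a filename, or None if absent."""
    match = LEXILE_SUFFIX.search(Path(name).name)
    return int(match.group(1)) if match else None


def find_calibration_books(directory):
    """Map each PDF in the directory that carries a Lexile suffix to its level."""
    books = {}
    for path in sorted(Path(directory).glob("*.pdf")):
        level = lexile_from_filename(path.name)
        if level is not None:
            books[path] = level
    return books


print(lexile_from_filename("the-martian-680L.pdf"))
print(lexile_from_filename("notes.pdf"))
```

PDFs without the suffix are skipped rather than treated as errors, so a calibration directory can contain unrelated files.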

You can then use this configuration file with main.py:

python main.py your_book.pdf lexile_calibration.json

Dependencies

pdfminer.six (PDF text extraction)
wordfreq (word frequency statistics)

Limitations

This is an approximation of the official Lexile measure
Results may vary based on PDF formatting and text extraction quality
For certified scores, use MetaMetrics' official Lexile Analyzer

Notes

The Lexile® Framework is a registered trademark of MetaMetrics, Inc.
