Pupil Handwriting & Homework Dataset

Overview

This dataset consists of handwritten Chinese, Mathematics, and English homework collected from pupils. The dataset is structured to allow alignment across different subjects for each student, enabling tasks like OCR processing, memory learning, and transformer-based models.

Dataset Structure

We provide three different versions of the dataset, each designed for different research and machine learning applications:

Hard Start (Binary Processed & Normalized)

Pre-processed for OCR applications
Binarized, resized, and normalized for model compatibility
Optimized for qualitative OCR tasks

Smooth Procedure (Grayscale, Minimal Processing)

Maintains original handwriting characteristics
Grayscale images without binarization
Suitable for models that require raw stroke details

Raw Intel (Unprocessed Data)

Provides original, untouched scans
Ideal for custom preprocessing and analysis

Key Features

Student-Aligned Data: Each student has a consistent index across all subjects, allowing for long-term and short-term memory modeling.
Memory Learning Applications: The dataset structure enables models to analyze a student’s progression over time.
Transformer Compatibility: Designed with transformer-based models in mind for handwriting recognition, sequence learning, and pattern analysis.

Usage & Access

The dataset is available for free download on GitHub, Kaggle, and Hugging Face. Researchers and developers can utilize it for OCR, handwriting recognition, and educational AI applications. The offical website is http://handwriting-ocr.org/

Download here:https://huggingface.co/datasets/XinyueZhou/Handwriting_Homework_dataset

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Pupil Handwriting & Homework Dataset

Overview

Dataset Structure

Hard Start (Binary Processed & Normalized)

Smooth Procedure (Grayscale, Minimal Processing)

Raw Intel (Unprocessed Data)

Key Features

Usage & Access

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Pupil Handwriting & Homework Dataset

Overview

Dataset Structure

Hard Start (Binary Processed & Normalized)

Smooth Procedure (Grayscale, Minimal Processing)

Raw Intel (Unprocessed Data)

Key Features

Usage & Access

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages