
Lebron-Harden/CLFBL-MY-A-Chinese-historical-document-dataset


Dataset

The Complete Library in the Four Branches of Literature of Ming History and Yuan History (CLFBL-MY) dataset is released to support research on traditional Chinese character recognition and detection. Its text images are taken from the Ming History and Yuan History sections of the Complete Library in the Four Branches of Literature.

Download links:

Baidu netdisk: https://pan.baidu.com/s/1QSNTLHkjLL7Ea5RczDBDHA (password: 2k4b)

Google Drive: https://drive.google.com/file/d/1IYHfmxzI2nmR98_HonO4A4rx33o7Rw2B/view?usp=sharing

Dataset description

The dataset files are organized as follows:

[Figure: dataset folder structure]

The page folder contains the page images, and the page_text file contains the corresponding page text.

The text_line folder contains text-line images cropped from the original page images. All of these images are rotated by 90° to meet the requirements of our experiments.

The line_text file contains the text-line labels for all images in the text_line folder.

The page_text file contains the location information of text lines in each page image.
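Using the text-line portion for recognition training requires pairing each label with its image. This README does not document the layout of the line_text file, so the following sketch assumes, hypothetically, one entry per line in the form "<image file name> <transcription>", with the two fields separated by the first run of whitespace. The helper names parse_line_labels and pair_images_with_labels are illustrative only and are not part of the dataset.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def parse_line_labels(text: str) -> Dict[str, str]:
    """Parse label lines of the ASSUMED form '<image name> <transcription>'.

    The first whitespace run is treated as the separator; everything after it
    is kept as the transcription. Blank lines and lines without a
    transcription are skipped.
    """
    labels: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            # No transcription on this line: skip it rather than guess.
            continue
        name, transcription = parts
        labels[name] = transcription
    return labels


def pair_images_with_labels(
    image_dir: Path, labels: Dict[str, str]
) -> Tuple[List[Tuple[Path, str]], List[str]]:
    """Match label entries to image files in image_dir (e.g. text_line/).

    Returns (pairs, missing): pairs holds (image path, transcription) for
    labels whose image file exists, and missing lists label names without one.
    """
    pairs: List[Tuple[Path, str]] = []
    missing: List[str] = []
    for name in sorted(labels):
        path = image_dir / name
        if path.is_file():
            pairs.append((path, labels[name]))
        else:
            missing.append(name)
    return pairs, missing
```

Because the text-line images are rotated by 90°, a loader for a horizontal-text model may need to rotate them back. With Pillow, this is a call such as img.rotate(90, expand=True) or img.rotate(-90, expand=True), where positive angles rotate counterclockwise. The correct sign depends on the rotation direction, which this README does not state.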

Note: The number of page images does not equal the number of page location entries, because we deleted the location information for some pages whose image quality was poor.
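Since some location entries were deleted, code that iterates over the page images should not assume every page has location data. The following minimal sketch assumes, hypothetically, that a page image and its location entry share a file-name stem (for example 0001.jpg and 0001). Under that assumption, it splits the page list into pages that have location entries and pages that do not. The function name split_by_location is illustrative and not part of the dataset.

```python
from pathlib import Path
from typing import Iterable, List, Set, Tuple


def split_by_location(
    page_files: Iterable[str], located_ids: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split page image names into (kept, dropped).

    ASSUMPTION: located_ids holds the file-name stems of pages that still have
    text-line location entries; pages whose entries were deleted for poor
    image quality are absent from it and end up in `dropped`.
    """
    kept: List[str] = []
    dropped: List[str] = []
    for name in sorted(page_files):
        # Compare on the stem so that '0001.jpg' matches location id '0001'.
        if Path(name).stem in located_ids:
            kept.append(name)
        else:
            dropped.append(name)
    return kept, dropped
```

Detection experiments would then train and evaluate only on the kept pages. The dropped pages can still be used wherever page_text alone is enough.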

Samples in the CLFBL-MY Dataset

[Figure: sample page images and text-line images from the CLFBL-MY dataset]

Contact

If you have any questions about the dataset, please contact: 1468525124@qq.com
