Spreadsheet Transcription Software

Clinical researchers, historians, educators, and field researchers still regularly capture data on paper spreadsheets. In health care and education, these data often contain sensitive personal information, which further complicates transcribing paper-based archives into digital form.

This software uses machine learning and crowd intelligence to automatically transcribe images of paper-based spreadsheets into electronic form while protecting sensitive personal information. The algorithm consists of four high-level stages:

(1) extraction of cell-level images from the spreadsheet grid, (2) machine recognition of digits within the cells, (3) human transcription of the cells whose contents the machine was uncertain about, and (4) feedback of the human transcription results to the machine to improve future classification performance.
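The interplay of stages (2) through (4) can be illustrated with a minimal sketch. This is written in Python for brevity and is not the repository's MATLAB implementation; the names `transcribe_cells`, `classify`, `ask_human`, and the threshold value of 0.9 are all assumptions made for illustration. It assumes stage (1) has already produced a list of cell images, and that the classifier returns one probability per digit class.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Hypothetical cutoff: cells whose top class probability is below this
# value are treated as "uncertain" and routed to a human transcriber.
CONFIDENCE_THRESHOLD = 0.9


@dataclass
class TranscriptionResult:
    values: Dict[int, int] = field(default_factory=dict)         # cell index -> digit
    sent_to_human: List[int] = field(default_factory=list)       # indices routed to people
    new_training_pairs: List[Tuple[Any, int]] = field(default_factory=list)


def transcribe_cells(
    cells: Sequence[Any],
    classify: Callable[[Any], Sequence[float]],
    ask_human: Callable[[Any], int],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> TranscriptionResult:
    """Route each cell to the machine or to a human, based on confidence."""
    result = TranscriptionResult()
    for idx, cell in enumerate(cells):
        # Stage 2: machine recognition, one probability per digit class.
        probs = classify(cell)
        digit = max(range(len(probs)), key=lambda d: probs[d])
        if probs[digit] >= threshold:
            result.values[idx] = digit
            continue
        # Stage 3: the machine is uncertain, so a human transcribes the cell.
        label = ask_human(cell)
        result.values[idx] = label
        result.sent_to_human.append(idx)
        # Stage 4: keep the human label as a new training example.
        result.new_training_pairs.append((cell, label))
    return result
```

In this sketch, the pairs collected in `new_training_pairs` would be used to retrain or update the classifier before the next batch of spreadsheets, and raising the threshold sends more cells to human transcribers. How the actual software measures uncertainty and performs the feedback step is documented in the code and supplemental materials, not here.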

See Images_to_spreadsheets_Public_Release.m for the implementation of the algorithm. The code is heavily commented.

See Supplemental_Materials.pdf for additional information on adjusting the algorithm's settings.

Please feel free to contact me directly with questions: ghassemi(at)mit(dot)edu