
Spreadsheet Transcription Software

Clinical researchers, historians, educators and field researchers alike still regularly capture data on paper spreadsheets. In health care and education, these data often contain sensitive personal information, which further complicates transcribing paper-based archives into digital form.

This software uses machine learning and crowd intelligence to automatically transcribe images of paper-based spreadsheets into electronic form while protecting sensitive personal information. Our algorithm consists of four high-level stages:

(1) extraction of cell-level images from the spreadsheet grid, (2) machine recognition of the digits within each cell, (3) human transcription of the cells whose contents the machine was uncertain about, and (4) feedback of the human transcriptions to the machine to improve future classification performance.
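Stages 2 to 4 form a confidence-gated loop. The following is a minimal Python sketch of that loop, not the repository's implementation. It assumes each cell image from stage 1 has already been reduced to a numeric feature vector, uses a toy nearest-centroid classifier as a stand-in for the real digit recognizer, and sends a cell to a human whenever the top class probability falls below a threshold. The names `CentroidClassifier`, `transcribe`, `ask_human` and the 0.9 threshold are all hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class CentroidClassifier:
    """Hypothetical toy nearest-centroid classifier standing in for the real digit model."""
    examples: dict = field(default_factory=dict)  # label -> list of feature vectors

    def add_example(self, label, features):
        self.examples.setdefault(label, []).append(list(features))

    def _centroid(self, label):
        vecs = self.examples[label]
        # Column-wise mean of all stored vectors for this label.
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    def predict_proba(self, features):
        # Softmax over negative squared distances to each class centroid.
        scores = {}
        for label in self.examples:
            c = self._centroid(label)
            scores[label] = -sum((a - b) ** 2 for a, b in zip(features, c))
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {k: math.exp(v - top) for k, v in scores.items()}
        total = sum(exps.values())
        return {k: v / total for k, v in exps.items()}


def transcribe(cells, model, ask_human, threshold=0.9):
    """Sketch of stages 2-4: machine recognition, human fallback, feedback.

    cells: dict of cell_id -> feature vector (assumed output of stage 1).
    ask_human: hypothetical callable returning a human-provided label for a cell.
    """
    results = {}
    for cell_id, features in cells.items():
        probs = model.predict_proba(features)
        label, conf = max(probs.items(), key=lambda kv: kv[1])
        if conf >= threshold:
            # Stage 2: the machine is confident enough to keep its own label.
            results[cell_id] = label
        else:
            # Stage 3: route the uncertain cell to a human transcriber.
            label = ask_human(cell_id)
            # Stage 4: feed the human label back so later predictions can use it.
            model.add_example(label, features)
            results[cell_id] = label
    return results
```

For example, with one stored example of "0" at (0, 0) and one of "1" at (10, 10), a cell at (0.1, 0) is accepted as "0" by the machine. A cell at (5, 5) is equidistant from both centroids, so each class gets probability 0.5; the cell is sent to `ask_human`, and the returned label is added to the training examples, shifting that class's centroid for the cells that follow.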

See Images_to_spreadsheets_Public_Release.m for the implementation of the algorithm; the code is extensively commented.

See Supplemental_Materials.pdf for additional details on adjusting the algorithm's settings.

Please feel free to contact me directly with questions: ghassemi(at)mit(dot)edu
