
divya1211/math-expression-detection


Math Expression Detection

Detect mathematical expressions in worksheets and draw bounding boxes.

Examples

How is it done?

  • Scraped images from Bing for the keyword "math worksheets" using google-images-download.
  • Annotated ~50 worksheets, labeling non-math expressions as 0 and math expressions as 1. This is referred to as the MathWorksheetsOCR dataset.
  • Used CRAFT (Character-Region Awareness For Text detection) as the general-purpose text detector.
  • Trained a BERT binary classifier on the annotated data to remove non-mathematical expressions.
  • Applied non-maximal suppression to combine intersecting bounding boxes.
  • Plotted the bounding boxes over the images.

Data

  • data/train: The MathWorksheetsOCR dataset. 50 worksheets hand-annotated for binary classification, with 2332 expressions in total: 1859 math expressions and 473 non-math expressions. The dataset is skewed (about 80% math expressions) because math worksheets consist mostly of math expressions.
  • image-dataset/bing-scrap-dataset: 100 worksheets scraped from Bing.
  • image-dataset/worksheets: 10 examples used as the development set.
  • image-dataset/handwritten: Handwritten sheets provided.
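
The skew in data/train can be checked from the counts, and one common way to compensate for it is to weight the loss inversely to class frequency. The snippet below is only a sketch: the README does not say whether class weights were used during training, and the helper name `inverse_frequency_weights` is hypothetical.

```python
def inverse_frequency_weights(counts):
    """Return per-class weights n_total / (n_classes * n_class)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}

# Label counts reported for the MathWorksheetsOCR dataset (0 = non-math, 1 = math).
counts = {0: 473, 1: 1859}
total = sum(counts.values())
math_share = counts[1] / total

# Hypothetical use: pass these as class weights to a cross-entropy loss.
weights = inverse_frequency_weights(counts)

print(f"total={total}, math share={math_share:.1%}")
print(weights)
```

With these counts the minority (non-math) class receives the larger weight, which is the intended effect of this scheme.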

Code

  • boundingbox.py: Takes an image folder, computes the bounding boxes, and plots them.
  • train_classifier.py: Takes the annotated examples and trains a binary classifier on top of BERT.
  • classifier.py: Loads the trained BERT classifier and runs inference.
  • data.py: Custom PyTorch Dataset class for math expressions.
  • non_maximal_supression.py: Performs non-maximal suppression. Credit

How was MathWorksheetsOCR created?

  • Scraped 50 worksheets from Bing.
  • Used EasyOCR to recognize the text in each worksheet.
  • Hand-annotated the recognized text as either 0 or 1.
  • The final dataset contains 2332 expressions: 1859 math and 473 non-math.
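
A sketch of how OCR output could be turned into a sheet for hand annotation. It assumes the `(bounding box, text, confidence)` tuples that EasyOCR's `Reader.readtext` returns. The CSV layout and the helper name `write_annotation_rows` are hypothetical and not taken from the repository.

```python
import csv
import io

def write_annotation_rows(image_name, ocr_results, out_file):
    """Write one CSV row per recognized text, with an empty label column."""
    writer = csv.writer(out_file)
    for _bbox, text, _confidence in ocr_results:
        text = text.strip()
        if text:  # skip detections that contain only whitespace
            # The last column is filled in by hand: 0 = non-math, 1 = math.
            writer.writerow([image_name, text, ""])

# Made-up stand-in for: easyocr.Reader(["en"]).readtext("worksheet.png")
sample_results = [
    ([[10, 10], [120, 10], [120, 40], [10, 40]], "Solve for x", 0.93),
    ([[10, 60], [150, 60], [150, 90], [10, 90]], "2x + 3 = 7", 0.88),
    ([[10, 100], [20, 100], [20, 110], [10, 110]], "  ", 0.12),
]

buffer = io.StringIO()
write_annotation_rows("worksheet.png", sample_results, buffer)
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(rows)
```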

How was the BERT classifier trained?

How does the final detection work?

  • Every image is passed through EasyOCR to get the bounding box and the text of each detection.
  • Text that the trained BERT classifier labels as non-math is removed.
  • Non-maximal suppression is applied to the bounding boxes to combine intersecting windows.
  • The final boxes are plotted and saved in the bb folder.
  • Voila!
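
The box-combining step can be sketched as follows. This is an illustrative merge of intersecting axis-aligned boxes into their union, not the code in non_maximal_supression.py. Note that classic non-maximal suppression discards lower-scoring overlapping boxes, whereas the step described here combines them. The `(x1, y1, x2, y2)` box format and the function names are assumptions.

```python
def boxes_intersect(a, b):
    """True if axis-aligned boxes (x1, y1, x2, y2) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def merge_intersecting_boxes(boxes):
    """Repeatedly replace any two intersecting boxes with their union."""
    merged = [tuple(b) for b in boxes]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if boxes_intersect(merged[i], merged[j]):
                    a, b = merged[i], merged[j]
                    merged[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                 max(a[2], b[2]), max(a[3], b[3]))
                    del merged[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan, since the new union may touch other boxes
    return merged

# Two overlapping fragments of one expression plus one separate box.
boxes = [(10, 10, 60, 30), (50, 20, 120, 45), (200, 200, 240, 220)]
result = merge_intersecting_boxes(boxes)
print(result)
```

Restarting the scan after every merge keeps the logic simple at the cost of speed, which is acceptable for the few dozen boxes on a typical worksheet.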

What did I observe?

  • The results for the three datasets can be viewed at image-dataset/bing-scrap-dataset/bb, image-dataset/handwritten/bb, image-dataset/worksheets/bb.
  • Thanks to non-maximal suppression, detection works well even on difficult examples where an expression is split across two lines.
  • Non-math text is removed: instructions like "Solving Quadratic Equations", question numbers like "2b." and "3)", and other irrelevant text at the end of the worksheet.
  • Precision without the BERT classifier was low because a lot of non-math noise was included in the predictions. After adding the BERT classifier, precision increased.
  • These observations come from qualitative analysis. Quantitative analysis, such as computing precision/recall using IOU, requires ground-truth bounding boxes for the data.
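
If ground-truth boxes were available, precision and recall at an IOU threshold could be computed roughly as below. This is a minimal sketch under stated assumptions: boxes are `(x1, y1, x2, y2)` tuples, matching is greedy and one-to-one, the 0.5 threshold is a common convention rather than a value from this project, and the example boxes are made up.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall(predicted, ground_truth, threshold=0.5):
    """Greedily match each prediction to at most one unmatched ground-truth box."""
    unmatched = list(ground_truth)
    true_positives = 0
    for pred in predicted:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= threshold:
            true_positives += 1
            unmatched.remove(best)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Made-up boxes: two good detections and one false positive.
ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
predicted = [(0, 0, 10, 8), (50, 50, 60, 60), (21, 21, 30, 30)]
precision, recall = precision_recall(predicted, ground_truth)
print(precision, recall)
```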

What didn't work?

  • I tried using ScanSSD pre-trained on datasetname. However, the results were not accurate. I believe this is because ScanSSD is trained on LaTeX math expressions, whereas we wanted it to work on math worksheets. Hence the decision to create annotated examples.
  • I used perplexity from GPT-2 to remove non-math expressions, assuming that math expressions would have higher perplexity than non-math expressions. However, I observed no significant difference between them.
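
For reference, perplexity is the exponential of the mean negative log-likelihood per token. The sketch below shows the computation and a threshold filter of the kind described above. It assumes per-token natural-log probabilities have already been obtained from a language model such as GPT-2; the function names and the numbers are hypothetical.

```python
import math

def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood (natural log) over tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def filter_by_perplexity(scored_texts, threshold):
    """Keep texts whose perplexity is at or above the threshold."""
    return [text for text, log_probs in scored_texts
            if perplexity(log_probs) >= threshold]

# Sanity check: four tokens, each with probability 0.25, give perplexity 4.
ppl = perplexity([math.log(0.25)] * 4)

# Made-up log-probabilities, for illustration only.
scored = [
    ("2x + 3 = 7", [math.log(0.01)] * 5),
    ("Name and date", [math.log(0.2)] * 3),
]
kept = filter_by_perplexity(scored, threshold=50.0)
print(ppl, kept)
```

A filter like this only helps if the two classes separate cleanly on perplexity, which is what the experiment above did not find.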

Final Thoughts

  • A better approach to this problem would be to construct an annotated dataset for these math worksheets from the ground up. The annotations should be bounding boxes.
  • Perhaps we could use Amazon Mechanical Turk to annotate data from different distributions, for example handwritten or camera-captured sheets.
  • Use IOU (intersection over union) to compute precision and recall of the bounding boxes. Since our dataset does not have bounding-box annotations at the moment, we used human evaluation for the results.
  • Unsupervised clustering of BERT embeddings of math and non-math text could be used to remove noise.
  • Deep Learning works (sorta)!
