Visual Mathematical Equations (VME) is an open-source dataset designed for Mathematical Optical Character Recognition. The dataset is still being curated. Curators may be rewarded in OCEAN tokens for contributing to this dataset. Join VME's Discord community for more information! https://discord.gg/Ha7eCWPp2E

SarahKay99/Visual-Mathematical-Equations-VME

Visual Mathematical Equations

WHAT DATA ARE WE LOOKING FOR?

VME is about VISUAL mathematical equations, meaning we don't want it all to be perfectly printed, linear text equations. Sometimes, messy is GOOD.

Whilst we're fine with having neat, linearly written equations in our dataset, VME should contain a variety of data, including disorderly and messy data. Images of the following things are highly sought after. DON'T CONTRIBUTE THESE EXACT IMAGES:

No promises, but there might be bounties for this data in the future. 😉

CONTRIBUTING

VME data can be found anywhere, but the most common places are:

  • 🌊 Old workbooks from school / university
  • 🌊 Workbooks from your children, nieces, nephews, grandchildren, etc.
  • 🌊 Your friends' old workbooks from school / university
  • 🌊 Photos of whiteboards from school / university lying around your phone

Beware of IP violations. VME data can be found in textbooks and on worksheets, but using this data requires IP permissions. Do not extract VME data from textbooks or worksheets unless you have IP permissions.

Right now, the dataset's UAV (upload, annotate, verify) platform, DataUnion, is in development, so there is no way to contribute data yet. You can, however, PREPARE to contribute: to get in early, gather any visual mathematical data you have access to and photograph it.

What sort of data should I contribute?

To ensure our dataset is multi-purpose, we don't want it all to be perfectly printed, linear text equations. Sometimes, messy is GOOD. We're fine with neat, linearly written equations, but the following types of data are highly sought after. DON'T CONTRIBUTE THESE EXACT IMAGES:

  • 🐟 Chaotically laid-out equations, messy whiteboards without a linear structure. Examples: x x x x

  • 🐟 Column equations and long division. Examples: x x x x x

  • 🐟 Sketched graphs -- especially if they have enough information to derive the equation. Examples: x x x

  • 🐟 Geometry. Examples: x x x

There'll probably be bounties for this data in future.


This repo is a sample of the Visual Mathematical Equations (VME) dataset. VME will shortly be listed on Ocean Protocol where it will be open for pooling & data contribution.

Visual Mathematical Equations (VME) is a dataset for anyone interested in Mathematical Optical Character Recognition (Math OCR).

Using this dataset, data scientists can build models which identify maths equations and their components, as shown:

Figure 1


ANNOTATING

As mentioned above, annotation software called DataUnion is being set up. It will allow contributors to upload, annotate and verify data in the dataset.

This software will record your contributions to the dataset so you can be rewarded accordingly in DataTokens. For the time being, if you want to contribute to VME, please focus on collecting VME data and photographing it.


HME-1 Annotation Format

HME-1 stands for Hierarchical Mathematical Equations 1. It is a style of annotation where the equation is broken into hierarchical components:

  • Equation
  • Expression
  • Term
  • Integer / Coefficient / Separator

The objective of this system is to teach the computer to understand the components of maths equations, visually.
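
The hierarchy above can be pictured as a tree. The following is a minimal sketch only: the `Component` class, the `LEVELS` depth table, and the way "2x + 3 = 7" is broken down are assumptions made for illustration, not a published HME-1 specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Depth of each HME-1 component type, outermost first.
# ASSUMPTION: these depths are chosen for this sketch only.
LEVELS = {
    "Equation": 0,
    "Expression": 1,
    "Term": 2,
    "Integer": 3,
    "Coefficient": 3,
    "Separator": 3,
}


@dataclass
class Component:
    """One annotated region of an equation (hypothetical structure)."""
    kind: str
    text: str
    children: List["Component"] = field(default_factory=list)

    def add(self, child: "Component") -> "Component":
        # A child must sit strictly deeper in the hierarchy than its parent.
        if LEVELS[child.kind] <= LEVELS[self.kind]:
            raise ValueError(f"{child.kind} cannot be nested inside {self.kind}")
        self.children.append(child)
        return child


def count_by_kind(node: Component, counts: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Walk the tree and count how many components of each kind it holds."""
    if counts is None:
        counts = {}
    counts[node.kind] = counts.get(node.kind, 0) + 1
    for child in node.children:
        count_by_kind(child, counts)
    return counts


def build_example() -> Component:
    """Break '2x + 3 = 7' into components (one possible reading)."""
    eq = Component("Equation", "2x + 3 = 7")
    lhs = eq.add(Component("Expression", "2x + 3"))
    eq.add(Component("Separator", "="))
    rhs = eq.add(Component("Expression", "7"))
    lhs.add(Component("Term", "2x")).add(Component("Coefficient", "2"))
    lhs.add(Component("Separator", "+"))
    lhs.add(Component("Term", "3")).add(Component("Integer", "3"))
    rhs.add(Component("Term", "7")).add(Component("Integer", "7"))
    return eq
```

Calling `count_by_kind(build_example())` gives a quick per-level tally, which can be handy for checking that an annotated image covers every level of the hierarchy.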

Currently, HME-1 annotations are available in the Darknet format.
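
Darknet label files store one bounding box per line as `class_id x_center y_center width height`, with the four coordinates normalised to the image size. Below is a rough sketch of reading such labels into pixel boxes. The `CLASS_NAMES` order is a guess for illustration (the real mapping would come from the dataset's own class list, typically a `.names` file), and the function names are hypothetical.

```python
from typing import List, NamedTuple

# HYPOTHETICAL class order, for illustration only. Check the dataset's
# own class list before relying on these indices.
CLASS_NAMES = [
    "Equation", "Expression", "Term",
    "Integer", "Coefficient", "Separator",
]


class Box(NamedTuple):
    label: str
    left: float
    top: float
    right: float
    bottom: float


def parse_darknet_line(line: str, img_w: int, img_h: int) -> Box:
    """Convert one Darknet label line into a pixel-space bounding box."""
    class_id, xc, yc, w, h = line.split()
    xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
    return Box(
        label=CLASS_NAMES[int(class_id)],
        left=(xc - w / 2) * img_w,
        top=(yc - h / 2) * img_h,
        right=(xc + w / 2) * img_w,
        bottom=(yc + h / 2) * img_h,
    )


def parse_darknet_labels(text: str, img_w: int, img_h: int) -> List[Box]:
    """Parse a whole label file's contents, skipping blank lines."""
    return [
        parse_darknet_line(line, img_w, img_h)
        for line in text.splitlines()
        if line.strip()
    ]
```

For instance, `parse_darknet_labels(open("page.txt").read(), 1280, 720)` would return one `Box` per annotated component, where `page.txt` and the image size are placeholders.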
