Build a tool to find closest match in DLMF for a given mathematical expression #1777

Open
Daniel-Mietchen opened this issue Jul 20, 2023 · 5 comments

Comments

@Daniel-Mietchen
Owner

Daniel-Mietchen commented Jul 20, 2023

Not sure whether this already exists, but if I have some expressions like the ones below (from here)
[Image: Screenshot from 2023-07-21 00-57-47]

in a machine-friendly format, then it would be nice to see how they or their components could be mapped to the Digital Library of Mathematical Functions (DLMF). Such a mapping could serve as a bridge to support finding other articles that contain similar mathematical constructs, as per

@Daniel-Mietchen
Owner Author

Pinging @physikerwelt, who I presume has thought about this before.

@Daniel-Mietchen
Owner Author

Daniel-Mietchen commented Jul 20, 2023

Apart from finding articles, such a normalized representation of mathematical concepts could perhaps also be a useful component for a tool for finding software that does something with these concepts, or even dedicated hardware (should it exist) for computing such things.

@physikerwelt

What is the context of this ticket?
What is the definition of "close" (see https://www.nist.gov/publications/evaluation-similarity-measure-factors-formulae-based-ntcir-11-math-task)? I think there is no general answer. It depends on which aspect (in the sense of @malteos's definition) matters to the user looking for similar formulas.

@Daniel-Mietchen
Owner Author

@physikerwelt The background is that I am interested in browsing the literature by mathematical formulas, as per

When I came across that paper, I was wondering which mathematical systems similar to that described by their equations might have been explored in other papers before, perhaps even in a completely different context. Yet I would not know an efficient mechanism by which I could find such papers based purely on the formulas / expressions or some abstract representation thereof. DLMF at least assists with the abstract representation bit, yet I am not aware of it having been used for literature search, hence the ticket.

In terms of defining similarity, I agree that there are multiple ways to go about that, and your paper illustrates this nicely. For now, I would be happy to use tooling based on any facet of similarity or even a combined measure as per Zhang and Youssef.

In short, if we have swMATH to indicate which software was used in a useful subset of papers, it is probably not far-fetched to think about a system that indicates which formulas were used in such a set of papers. Exact matches of formulas may not work well in cases like my example above, but something that maps onto a taxonomy like DLMF would seem like a good starting point.
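To make the "maps onto a taxonomy like DLMF" idea a bit more concrete, here is a minimal sketch, assuming the formulas are available as LaTeX strings. The lookup table `DLMF_CHAPTERS` and the function `dlmf_candidates` are hypothetical names made up for illustration, and the table is a tiny hand-written sample rather than anything DLMF provides. Matching on macro names is also crude: `\Gamma`, for instance, can denote things other than the gamma function, so the output is only a list of candidate chapters.

```python
import re

# Hypothetical, hand-written lookup from LaTeX macro names to DLMF chapters.
# A real tool would need a much larger, curated table.
DLMF_CHAPTERS = {
    "Gamma": "DLMF Ch. 5 (Gamma Function)",
    "wp": "DLMF Ch. 23 (Weierstrass Elliptic and Modular Functions)",
    "zeta": "DLMF Ch. 25 (Zeta and Related Functions)",
}


def dlmf_candidates(latex):
    """Return candidate DLMF chapters for the macros found in a LaTeX string.

    Chapters are listed in order of first appearance, without duplicates.
    """
    macros = re.findall(r"\\([A-Za-z]+)", latex)
    candidates = []
    for macro in macros:
        chapter = DLMF_CHAPTERS.get(macro)
        if chapter is not None and chapter not in candidates:
            candidates.append(chapter)
    return candidates
```

Two papers whose formulas yield overlapping candidate lists could then be treated as loosely related, which is much weaker than formula equality but might already be enough for browsing.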

@malteos

malteos commented Oct 24, 2023

My recommendation would be to take a math-optimized LLM, like Llemma, feed the formulas through the model, take the embeddings, and do a k-nearest-neighbor search in the embedding space. Given that recent LLMs have become quite good at handling math, I am confident that this would already produce reasonably good results.
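The retrieval step of that suggestion could look roughly like the sketch below. To keep it runnable without downloading a model, `embed` here is only a stand-in (hashed character trigrams), not an LLM embedding. In a real setup it would presumably be replaced by something like pooled hidden states from Llemma. The names `embed`, `cosine`, `nearest`, and `CORPUS`, as well as the three corpus labels, are assumptions made up for this example.

```python
import math
import zlib

DIM = 256  # size of the toy embedding vector


def embed(formula):
    """Stand-in embedding: hashed character trigrams, L2-normalised.

    Placeholder only; swap in real model embeddings for actual use.
    """
    text = "".join(formula.split())  # drop whitespace
    vec = [0.0] * DIM
    for i in range(max(len(text) - 2, 1)):
        gram = text[i:i + 3]
        # crc32 is deterministic across runs, unlike the built-in hash()
        vec[zlib.crc32(gram.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity of two vectors that are already L2-normalised."""
    return sum(x * y for x, y in zip(a, b))


def nearest(query, corpus, k=3):
    """Return the k (similarity, label) pairs most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(f)), label) for label, f in corpus.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical mini-corpus of reference formulas keyed by made-up labels.
CORPUS = {
    "gamma-integral": r"\Gamma(z) = \int_0^\infty t^{z-1} e^{-t} dt",
    "zeta-series": r"\zeta(s) = \sum_{n=1}^\infty n^{-s}",
    "bessel-ode": r"z^2 w'' + z w' + (z^2 - \nu^2) w = 0",
}
```

Calling `nearest(r"\zeta(s) = \sum_{n=1}^\infty 1/n^s", CORPUS, k=2)` returns the two best-scoring labels with their similarities. For a corpus the size of DLMF, the brute-force loop would likely be replaced by an approximate nearest-neighbor index, and the reference embeddings would be computed once and cached.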
