Showcase for 13-class scientific statement classification

Method

LaTeXML converts the source into an HTML5 document
llamapun tokenizes the first paragraph into a plain-text representation with sub-formula lexemes
TensorFlow executes a pre-trained BiLSTM model with 13 classification targets
The result is served as a Rocket web service

Details

For the scientific work behind this showcase, please read our paper

The currently deployed model is a Keras BiLSTM(128)→BiLSTM(64)→LSTM(64), with a Dense(13) softmax output. The model file 13_class_statement_classification_bilstm.pb can be downloaded from this repository via git-lfs. It is compatible with the Rust wrapper for TensorFlow and is compiled to use a CPU implementation of LSTM, as our demo server has no dedicated GPU.

The input words are embedded via the arxmliv 08.2018 GloVe embeddings, and each paragraph is padded or truncated to a fixed length of 480 words. A paragraph is hence a fixed (480,300) matrix, 480 words by 300 embedding dimensions, as passed into the first BiLSTM layer.
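The padding and truncation described above can be illustrated with a minimal sketch in plain Python. It is not the code used by this demo: the function name `paragraph_to_matrix` is hypothetical, and the choices to pad with zero vectors at the end, to keep the first 480 words when truncating, and to map out-of-vocabulary words to a zero vector are all assumptions made for illustration. The deployed model may handle each of these differently.

```python
from typing import Dict, List, Sequence

MAX_LEN = 480    # maximum paragraph length in words (from the text above)
EMBED_DIM = 300  # embedding width (from the (480,300) shape above)


def paragraph_to_matrix(
    words: Sequence[str],
    embeddings: Dict[str, List[float]],
    max_len: int = MAX_LEN,
    dim: int = EMBED_DIM,
) -> List[List[float]]:
    """Return a fixed max_len x dim matrix for one tokenized paragraph.

    Assumptions (not confirmed by the source):
    - truncation keeps the first max_len words,
    - padding rows are zero vectors appended at the end,
    - words missing from the embedding table become zero vectors.
    """
    zero = [0.0] * dim
    # Truncate, then look up each word's vector (copying to avoid aliasing).
    rows = [list(embeddings.get(word, zero)) for word in words[:max_len]]
    # Pad short paragraphs with zero rows up to max_len.
    rows.extend([0.0] * dim for _ in range(max_len - len(rows)))
    return rows
```

Whatever the paragraph length, the result always has 480 rows of 300 values, which is the fixed shape a recurrent layer with a static input size expects.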

The specific model in this demo was trained on 8.3 million paragraphs from the arxmliv 08.2018 dataset and tested on 2.1 million paragraphs, obtaining a 0.91 F1 score over the 13 target classes. The base rate baseline was 0.38, the frequency of the "proposition" class.

For more experimental details, please see the main experiment repository.

For practical evaluation, a likelihood threshold could be used: predictions whose likelihood falls below the threshold (e.g. <0.3) can be assigned an "other" label instead of one of the 13 classes.
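A minimal sketch of such a threshold, assuming it is applied to the highest of the 13 softmax likelihoods. The function name `label_with_threshold` and the way class names are passed in are hypothetical; the 0.3 default mirrors the example value above and is not a tuned setting.

```python
from typing import List, Sequence


def label_with_threshold(
    likelihoods: Sequence[float],
    class_names: List[str],
    threshold: float = 0.3,
) -> str:
    """Pick the most likely class, or "other" if it is not likely enough.

    Assumption: the threshold is compared against the top likelihood only.
    """
    if len(likelihoods) != len(class_names):
        raise ValueError("expected one likelihood per class name")
    # Index of the largest likelihood (first one wins on ties).
    best = max(range(len(likelihoods)), key=lambda i: likelihoods[i])
    if likelihoods[best] < threshold:
        return "other"
    return class_names[best]
```

With a near-uniform distribution over 13 classes, every likelihood is about 0.077, so the paragraph would be labeled "other"; a confident prediction passes through unchanged.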

