In this notebook, we import the `end_to_end` module to work with our end-to-end classifier. To run an example from the terminal instead, see `example_cmd.py`.

In [1]:
import end_to_end
from end_to_end import math_classifier

# Import the custom classes defined in this project.
from Concat import ConcatModels
from Reduction import clf_reduction

# Instantiate the classifier.
clf = math_classifier()




Users should interact only with the `predict` method of the `math_classifier` class. This method returns the predicted 3-character and 2-character MSC classes and the primary arXiv category, each with its probability, based on an input to `predict` in one of the forms below:

1) a string through the argument `text`; this is typically the title or the abstract of a math-related paper<a name="cite_ref-a"></a>[<sup>a</sup>](#cite_note-a) (or their concatenation);

2) an arXiv identifier<a name="cite_ref-b"></a>[<sup>b</sup>](#cite_note-b) through the argument `identifier`; the corresponding article is then fetched from arXiv and the prediction is made based on its title and abstract;

3) a positive integer through the argument `n_random`<a name="cite_ref-c"></a>[<sup>c</sup>](#cite_note-c); that many of the most recent math-related preprints are fetched, and their URLs are printed along with the predictions based on their titles and abstracts.

If none of these inputs is provided, five preprints are fetched and their predicted classes are printed along with their URLs.
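
The three input modes and the default can be pictured as a small dispatch step. The sketch below is only an illustration: `resolve_mode` is a hypothetical helper, not part of `end_to_end.py`, and the precedence it applies when several arguments are given at once is an assumption, since the text above does not say how `predict` handles that case.

```python
from typing import Optional, Tuple, Union

# The text states that five preprints are fetched when no input is given.
DEFAULT_N_RANDOM = 5


def resolve_mode(text: Optional[str] = None,
                 identifier: Optional[str] = None,
                 n_random: Optional[int] = None) -> Tuple[str, Union[str, int]]:
    """Hypothetical helper: report which input mode a call would use."""
    # Assumption: text takes precedence over identifier, which takes
    # precedence over n_random. The real method may behave differently.
    if text is not None:
        return ("text", text)
    if identifier is not None:
        return ("identifier", identifier)
    if n_random is not None:
        if not isinstance(n_random, int) or n_random <= 0:
            raise ValueError("n_random must be a positive integer")
        return ("n_random", n_random)
    # No input at all: fall back to the documented default of five preprints.
    return ("n_random", DEFAULT_N_RANDOM)
```

For instance, `resolve_mode()` falls back to `("n_random", 5)`, mirroring the default described above.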


Loading the appropriate classifiers, scraping arXiv preprints, preprocessing the text, vectorization, and so on all take place inside the class.

<a name="cite_note-a"></a>[a](#cite_ref-a) Defined as a preprint listed under one of the math archives or under the mathematical physics archive.

<a name="cite_note-b"></a>[b](#cite_ref-b) Both old and new [arXiv identifier schemes](https://info.arxiv.org/help/arxiv_identifier.html) are supported. Identifiers that begin with `arXiv:` or end with the preprint version (such as `v1`) are accepted too. If the input identifier is invalid or does not exist, a `ValueError` is raised.

<a name="cite_note-c"></a>[c](#cite_ref-c) We recommend keeping `n_random` at or below 20.
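
To make note b concrete, here is a minimal sketch of how the identifier forms it lists could be recognized and split into a base identifier and an optional version. The function `split_identifier` and its regular expression are illustrative assumptions, not the code used inside `math_classifier`; the sketch checks syntax only and cannot tell whether a well-formed identifier actually exists on arXiv.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern covering the forms mentioned in note b.
_ARXIV_ID = re.compile(
    r"^(?:arXiv:)?"                       # optional "arXiv:" prefix
    r"(?P<base>\d{4}\.\d{4,5}"            # new scheme, e.g. 2202.00768
    r"|[a-z-]+(?:\.[A-Z]{2})?/\d{7})"     # old scheme, e.g. math/0509440
    r"(?:v(?P<version>\d+))?$"            # optional version suffix, e.g. v1
)


def split_identifier(identifier: str) -> Tuple[str, Optional[int]]:
    """Return (base identifier, version or None); raise ValueError if malformed."""
    match = _ARXIV_ID.match(identifier.strip())
    if match is None:
        raise ValueError(f"not a well-formed arXiv identifier: {identifier!r}")
    version = match.group("version")
    return match.group("base"), (int(version) if version else None)
```

With this sketch, `split_identifier("arXiv:math/0509440v1")` yields `("math/0509440", 1)` and `split_identifier("2202.00768")` yields `("2202.00768", None)`.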


The cells below demonstrate the different input modes of `predict`.

In [2]:
clf.predict(text="Locally bounded, local weak solutions to a doubly nonlinear parabolic equation, which models the multi-phase transition of a material, is shown to be locally continuous. Moreover, an explicit modulus of continuity is given. The effect of the p-Laplacian type diffusion is also considered.")

{'3-character MSC': [('35K', 0.92), ('35B', 0.41)],
 '2-character MSC': [('35', 0.92)],
 'Primary Category': ('math.AP', 0.88)}

In [3]:
clf.predict(text="The Belyi Characterization of a Class of Modular Curves")

{'3-character MSC': [('11G', 0.7), ('14G', 0.18), ('14H', 0.6)],
 '2-character MSC': [('14', 0.6), ('11', 0.7)],
 'Primary Category': ('math.NT', 0.51)}
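
As the outputs show, the label lists are not ordered by probability. If a ranked view is more convenient, the returned dictionary can be post-processed. The helper `rank_labels` below is a hypothetical convenience function, not part of `end_to_end.py`; it is applied to the output of the title-only example above, copied verbatim.

```python
from typing import Dict, List, Tuple


def rank_labels(prediction: dict) -> Dict[str, List[Tuple[str, float]]]:
    """Hypothetical helper: sort each entry by descending probability."""
    ranked = {}
    for key, value in prediction.items():
        # 'Primary Category' is a single (label, probability) tuple;
        # wrap it in a list so every entry has the same shape.
        pairs = [value] if isinstance(value, tuple) else list(value)
        ranked[key] = sorted(pairs, key=lambda pair: pair[1], reverse=True)
    return ranked


# Output of the title-only example, copied from the cell above.
prediction = {
    '3-character MSC': [('11G', 0.7), ('14G', 0.18), ('14H', 0.6)],
    '2-character MSC': [('14', 0.6), ('11', 0.7)],
    'Primary Category': ('math.NT', 0.51),
}

ranked = rank_labels(prediction)
print(ranked['3-character MSC'])  # [('11G', 0.7), ('14H', 0.6), ('14G', 0.18)]
```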

In [4]:
clf.predict(text="Triangulated Manifolds with Few Vertices: Geometric 3-Manifolds We explicitly construct small triangulations for a number of well-known 3-dimensional manifolds and give a brief outline of some aspects of the underlying theory of 3-manifolds and its historical development.")

{'3-character MSC': [('57N', 0.27),
  ('52B', 0.15),
  ('57M', 0.46),
  ('57Q', 0.29)],
 '2-character MSC': [('57', 0.46), ('52', 0.15)],
 'Primary Category': ('math.GT', 0.61)}

In [5]:
clf.predict(identifier="2202.00768")

{'3-character MSC': [('32G', 0.68),
  ('57M', 0.6),
  ('37F', 0.84),
  ('30F', 0.73)],
 '2-character MSC': [('57', 0.6), ('30', 0.73), ('37', 0.84), ('32', 0.68)],
 'Primary Category': ('math.DS', 0.94)}

In [6]:
clf.predict(identifier="arXiv:math/0509440v1")

{'3-character MSC': [('35A', 0.47),
  ('55N', 0.48),
  ('32S', 0.15),
  ('53D', 0.13),
  ('14D', 0.12),
  ('14F', 0.87)],
 '2-character MSC': [('14', 0.87),
  ('32', 0.15),
  ('53', 0.13),
  ('35', 0.47),
  ('55', 0.48)],
 'Primary Category': ('math.AG', 0.95)}
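
In the outputs shown so far, each 2-character score equals the largest score among the 3-character classes sharing that prefix. Whether the classifier actually derives the 2-character scores this way is not stated here, so the sketch below only checks that pattern on the output above; `max_by_prefix` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def max_by_prefix(three_char: List[Tuple[str, float]]) -> Dict[str, float]:
    """Hypothetical helper: largest 3-character score per 2-character prefix."""
    best = defaultdict(float)
    for label, score in three_char:
        prefix = label[:2]
        best[prefix] = max(best[prefix], score)
    return dict(best)


# Output for arXiv:math/0509440v1, copied from the cell above.
three_char = [('35A', 0.47), ('55N', 0.48), ('32S', 0.15),
              ('53D', 0.13), ('14D', 0.12), ('14F', 0.87)]
two_char = [('14', 0.87), ('32', 0.15), ('53', 0.13), ('35', 0.47), ('55', 0.48)]

# The observed pattern holds for this example.
matches = max_by_prefix(three_char) == dict(two_char)
print(matches)  # True
```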

In [7]:
clf.predict(n_random=20)

Paper 1: http://arxiv.org/abs/2401.04699v1
{'3-character MSC': [('11B', 0.38), ('11P', 0.36)], '2-character MSC': [('11', 0.38)], 'Primary Category': ('math.NT', 0.71)} 

Paper 2: http://arxiv.org/abs/2401.04617v1
{'3-character MSC': [('52C', 0.16), ('05C', 0.79)], '2-character MSC': [('52', 0.16), ('05', 0.79)], 'Primary Category': ('math.CO', 1.0)} 

Paper 3: http://arxiv.org/abs/2401.04519v1
{'3-character MSC': [('35P', 0.61), ('65N', 0.6)], '2-character MSC': [('65', 0.6), ('35', 0.61)], 'Primary Category': ('math.SP', 0.75)} 

Paper 4: http://arxiv.org/abs/2401.03984v1
{'3-character MSC': [('47L', 0.39), ('65F', 0.54), ('47A', 0.95), ('15A', 0.44), ('47B', 0.62)], '2-character MSC': [('47', 0.95), ('65', 0.54), ('15', 0.44)], 'Primary Category': ('math.FA', 0.58)} 

Paper 5: http://arxiv.org/abs/2401.03801v1
{'3-character MSC': [('11R', 0.62)], '2-character MSC': [('11', 0.62)], 'Primary Category': ('math.NT', 0.89)} 

Paper 6: http://arxiv.org/abs/2401.03787v1
{'3-character MSC':