In [1]:
import pandas as pd
import arxiv
import data_utils

In [2]:
## Example: Pulling the 1000 most recently updated articles
# listed under the subject category 'math.AP' (Analysis of PDEs).
# Cross-listed papers whose primary category differs are included.

query = data_utils.format_query(cat='math.AP')
pdes = data_utils.query_to_df(query=query, max_results=1000)

pdes.head()

Unnamed: 0,entry_id,updated,published,title,summary,comment,journal_ref,doi,primary_category,pdf_url,authors,categories
0,http://arxiv.org/abs/2305.11166v1,2023-05-18 17:57:21+00:00,2023-05-18 17:57:21+00:00,On the stability of homogeneous equilibria in ...,"The goal of this article is twofold. First, we...","33 pages, submitted in March to a Focus Issue",,,math.AP,http://arxiv.org/pdf/2305.11166v1,"Alexandru D. Ionescu,Benoit Pausader,Xuecheng ...",math.AP
1,http://arxiv.org/abs/2305.11160v1,2023-05-18 17:48:32+00:00,2023-05-18 17:48:32+00:00,Infinitely many conservation laws for generali...,We give a complete description of nontrivial l...,"3 pages, no figures",,,math.AP,http://arxiv.org/pdf/2305.11160v1,A. Sergyeyev,math.AP
2,http://arxiv.org/abs/2305.11150v1,2023-05-18 17:42:56+00:00,2023-05-18 17:42:56+00:00,Islands in stable fluid equilibria,We prove that stable fluid equilibria with tri...,"6 pages, 2 figures",,,math.AP,http://arxiv.org/pdf/2305.11150v1,"Theodore D. Drivas,Daniel Ginsberg","math.AP,physics.flu-dyn"
3,http://arxiv.org/abs/2305.11148v1,2023-05-18 17:41:34+00:00,2023-05-18 17:41:34+00:00,Large Deviations Principle for the Inviscid Li...,"Using a weak convergence approach, we establis...",,,,math.PR,http://arxiv.org/pdf/2305.11148v1,"Federico Butori,Eliseo Luongo","math.PR,math.AP"
4,http://arxiv.org/abs/2103.01509v3,2023-05-18 16:32:31+00:00,2021-03-02 06:51:01+00:00,Hecke operators and analytic Langlands corresp...,We construct analogues of the Hecke operators ...,45 pages; v2: more details added; v3: to appea...,,,math.AG,http://arxiv.org/pdf/2103.01509v3,"Pavel Etingof,Edward Frenkel,David Kazhdan","math.AG,hep-th,math.AP,math.FA,math.RT"
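
The source of `data_utils` is not shown in this notebook, so the following is only a rough sketch of what a helper like `format_query` might produce. It assumes the helper targets the arXiv API's `search_query` syntax, in which `cat:` is the category field prefix, `ti:` is the title prefix, and terms are joined with `AND`. The name `build_query` and its parameters are hypothetical. The `cat:` field appears to match cross-listed papers as well, which would be consistent with rows 3 and 4 above having primary categories other than `math.AP`.

```python
def build_query(cat=None, title=None):
    """Hypothetical stand-in for data_utils.format_query.

    Builds an arXiv API search_query string from optional fields.
    """
    terms = []
    if cat:
        # 'cat:' is the arXiv API field prefix for subject category.
        terms.append(f"cat:{cat}")
    if title:
        # 'ti:' searches titles; double quotes request a phrase match.
        terms.append(f'ti:"{title}"')
    # The arXiv API combines terms with the boolean operator AND.
    return " AND ".join(terms)


print(build_query(cat="math.AP"))
print(build_query(cat="math.AP", title="conservation laws"))
```

The first call prints `cat:math.AP`; the second prints `cat:math.AP AND ti:"conservation laws"`.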


## Ethan's thoughts

1) If we want to use techniques from natural language processing (NLP), we'll probably need a way to pull the full text of the papers as well, not just the metadata and abstracts. That could be its own challenge.

2) We should think about what exactly we want to accomplish. 

    a) Do we simply want to build a better paper recommender? If so, what metric would we use to evaluate it?

    b) Do we want an application that sifts through each day's arXiv submissions and passes along the ones that seem relevant? (That might be more of a software engineering project.)

    c) In an entirely different direction, maybe we want to use NLP to identify subfields within a given discipline, and measure the model's performance against the papers' subject tags.

    d) Similar to (c), we could use NLP to construct new identifiers (perhaps based on technical terms) that would help a user search with more precision.

In [3]:
## A check to make sure clean_cats functions properly

clean_pdes = data_utils.clean_cats(pdes)

In [4]:
clean_pdes.head()

Unnamed: 0,entry_id,updated,published,title,summary,comment,journal_ref,doi,primary_category,pdf_url,authors,categories
0,http://arxiv.org/abs/2305.11166v1,2023-05-18 17:57:21+00:00,2023-05-18 17:57:21+00:00,On the stability of homogeneous equilibria in ...,"The goal of this article is twofold. First, we...","33 pages, submitted in March to a Focus Issue",,,math.AP,http://arxiv.org/pdf/2305.11166v1,"Alexandru D. Ionescu,Benoit Pausader,Xuecheng ...",math.AP
1,http://arxiv.org/abs/2305.11160v1,2023-05-18 17:48:32+00:00,2023-05-18 17:48:32+00:00,Infinitely many conservation laws for generali...,We give a complete description of nontrivial l...,"3 pages, no figures",,,math.AP,http://arxiv.org/pdf/2305.11160v1,A. Sergyeyev,math.AP
2,http://arxiv.org/abs/2305.11150v1,2023-05-18 17:42:56+00:00,2023-05-18 17:42:56+00:00,Islands in stable fluid equilibria,We prove that stable fluid equilibria with tri...,"6 pages, 2 figures",,,math.AP,http://arxiv.org/pdf/2305.11150v1,"Theodore D. Drivas,Daniel Ginsberg",math.AP
3,http://arxiv.org/abs/2305.11148v1,2023-05-18 17:41:34+00:00,2023-05-18 17:41:34+00:00,Large Deviations Principle for the Inviscid Li...,"Using a weak convergence approach, we establis...",,,,math.PR,http://arxiv.org/pdf/2305.11148v1,"Federico Butori,Eliseo Luongo","math.PR,math.AP"
4,http://arxiv.org/abs/2103.01509v3,2023-05-18 16:32:31+00:00,2021-03-02 06:51:01+00:00,Hecke operators and analytic Langlands corresp...,We construct analogues of the Hecke operators ...,45 pages; v2: more details added; v3: to appea...,,,math.AG,http://arxiv.org/pdf/2103.01509v3,"Pavel Etingof,Edward Frenkel,David Kazhdan","math.AG,math.AP,math.FA,math.RT"
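
Comparing the two tables by eye, `clean_cats` seems to drop the non-math tags (`physics.flu-dyn` in row 2, `hep-th` in row 4) from the `categories` column. A programmatic version of that check could look like the sketch below. It assumes the intended behavior is to keep only tags beginning with `math.`, and that each entry is a comma-separated string with no missing values; the function name `non_math_categories` is hypothetical and not part of `data_utils`.

```python
def non_math_categories(category_strings):
    """Return the set of tags that do not start with 'math.'.

    Each entry is assumed to be a comma-separated string of arXiv
    category tags, as in the `categories` column.
    """
    leftovers = set()
    for entry in category_strings:
        for tag in entry.split(","):
            tag = tag.strip()
            if tag and not tag.startswith("math."):
                leftovers.add(tag)
    return leftovers


# Values copied from the `categories` column before and after clean_cats.
before = [
    "math.AP",
    "math.AP",
    "math.AP,physics.flu-dyn",
    "math.PR,math.AP",
    "math.AG,hep-th,math.AP,math.FA,math.RT",
]
after = [
    "math.AP",
    "math.AP",
    "math.AP",
    "math.PR,math.AP",
    "math.AG,math.AP,math.FA,math.RT",
]

print(sorted(non_math_categories(before)))  # ['hep-th', 'physics.flu-dyn']
print(sorted(non_math_categories(after)))   # []
```

On the full DataFrame, the same check could be run as `non_math_categories(clean_pdes['categories'])`, with an empty set indicating that no non-math tags remain.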
