New: arXiv CSL-JSON adapter
- Export-only
- Example:
$ dorsal run dorsalhub/arxiv-pdf /mnt/c/testsmol4/2405.06604v1.pdf --export=csl
[
{
"id": "2405.06604",
"type": "article",
"publisher": "arXiv",
"title": "Explaining Text Similarity in Transformer Models",
"abstract": "As Transformers have become state-of-the-art models for natural language\nprocessing (NLP) tasks, the need to understand and explain their predictions is\nincreasingly apparent. Especially in unsupervised applications, such as\ninformation retrieval tasks, similarity models built on top of foundation model\nrepresentations have been widely applied. However, their inner prediction\nmechanisms have mostly remained opaque. Recent advances in explainable AI have\nmade it possible to mitigate these limitations by leveraging improved\nexplanations for Transformers through layer-wise relevance propagation (LRP).\nUsing BiLRP, an extension developed for computing second-order explanations in\nbilinear similarity models, we investigate which feature interactions drive\nsimilarity in NLP models. We validate the resulting explanations and\ndemonstrate their utility in three corpus-level use cases, analyzing\ngrammatical interactions, multilingual semantics, and biomedical text\nretrieval. Our findings contribute to a deeper understanding of different\nsemantic similarity tasks and models, highlighting how novel explainable AI\nmethods enable in-depth analyses and corpus-level insights.",
"author": [
{
"family": "Vasileiou",
"given": "Alexandros"
},
{
"family": "Eberle",
"given": "Oliver"
}
],
"issued": {
"date-parts": [
[
2024, 5
]
]
},
"URL": "https://arxiv.org/abs/2405.06604",
"number": "2405.06604"
}
]
Outputs saved successfully:
↳ /home/user/sandbox/2405.06604v1.dorsal.json
↳ /home/user/sandbox/2405.06604v1.csl.json
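The exported CSL-JSON is plain JSON, so it can be consumed with any JSON library. The following is a minimal sketch, not part of Dorsal: it parses a record shaped like the example output (abstract omitted for brevity) and builds a one-line citation string. The `summarize` helper is a hypothetical name introduced here for illustration only.

```python
import json

# Record copied from the example output above; the abstract is omitted for brevity.
CSL_EXPORT = """
[
  {
    "id": "2405.06604",
    "type": "article",
    "publisher": "arXiv",
    "title": "Explaining Text Similarity in Transformer Models",
    "author": [
      {"family": "Vasileiou", "given": "Alexandros"},
      {"family": "Eberle", "given": "Oliver"}
    ],
    "issued": {"date-parts": [[2024, 5]]},
    "URL": "https://arxiv.org/abs/2405.06604",
    "number": "2405.06604"
  }
]
"""


def summarize(item: dict) -> str:
    """Build a short citation string from one CSL-JSON item (hypothetical helper)."""
    authors = ", ".join(
        f"{a.get('given', '')} {a.get('family', '')}".strip()
        for a in item.get("author", [])
    )
    # CSL-JSON stores dates as nested lists: [[year, month, day]].
    parts = item.get("issued", {}).get("date-parts") or [[]]
    first = parts[0]
    year = first[0] if first else "n.d."
    return f"{authors} ({year}). {item.get('title', '')}. {item.get('URL', '')}"


items = json.loads(CSL_EXPORT)
summaries = [summarize(item) for item in items]
for line in summaries:
    print(line)
```

To read a saved export instead of an inline string, open the `.csl.json` file and pass the file object to `json.load`.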
Removed: some low-value parsers:
- arxiv.from_bibtex - Further testing revealed too many edge cases when the record wasn't natively exported. A future implementation should use https://github.com/sciunto-org/python-bibtexparser
- arxiv.from_ris - Fragile, with some edge cases that would take more effort to support properly.
- document.from_md - Also fragile. Basically only works with files exported by Dorsal.
Full Changelog: v0.3.0...v0.4.0