BibTeXML vs. bibteXMP #938

koppor · 2016-03-11T15:21:40Z

JabRef 3.2

It seems that JabRef offers a second kind of XML serialization of BibTeX:

xmlns:bibtex='http://jabref.sourceforge.net/bibteXMP/'

IMHO, it is not worth keeping two different XML schemas for an XML serialization of BibTeX. AFAIK, there isn't even a schema for JabRef's XML. Therefore, I propose that we use BibTeXML only and migrate old XMP metadata to the BibTeXML format.

XMP examples can be found in jabref/src/test/java/net/sf/jabref/logic/xmp/XMPUtilTest.java, line 139 in fc82796:

    return "<rdf:Description rdf:about='' xmlns:bibtex='http://jabref.sourceforge.net/bibteXMP/' "


koppor · 2016-03-11T15:47:00Z

Refs #898

oscargus · 2016-03-11T16:15:59Z

I'm not really following the argumentation. One may argue about different export formats, but why is it relevant that they are both XML? Isn't the real question whether each is a relevant format in itself?

koppor · 2016-03-11T17:11:16Z

Context: The format is used for storing BibTeX data as XML inside PDF files using the XMP functionality (see net.sf.jabref.logic.xmp.XMPSchemaBibtex). Other people use this PDF metadata to exchange PDFs with the correct bibliographic data, without having to send the bib entry along with the PDF as a second file.

I am arguing that JabRef uses a proprietary format which is not used elsewhere. Thus, our XMP data cannot be processed by other software. I see the point that the last commit in the current BibTeXML repository is from 2011. Nevertheless, I vote for joining forces: these formats are too similar to go in different directions.

I see the following alternatives:

- Replace JabRef's bibteXMP with the canonical BibTeX representation
- Completely use RDF. There seem to be multiple BibTeX-to-RDF converters available: https://www.w3.org/wiki/ConverterToRdf#BibTex
- Maybe OWL is also an option: http://zeitkunst.org/bibtex/0.1/
- Move to BibTeXML (as outlined in the original issue)
- Use MODS
- Keep everything as is

Apparently, the current code already uses Dublin Core, which looks good. Maybe that code can simply be kept and the other serialization using {http://jabref.sourceforge.net/bibteXMP/}bibtex removed completely. This needs further investigation.

If everything is replaced by Dublin Core, we can also update PDFBox (see #1096).

oscargus · 2016-03-11T17:14:27Z

Ah, OK, so bibteXMP is JabRef's own format? Then it clearly makes more sense not to support exporting to it.

Siedlerchr · 2016-03-11T21:35:41Z

The question would be: How many people actually use the XMP feature?
From my point of view, I would suggest supporting the BibTeXML format and maybe adding RDF/OWL support on top.
Interestingly, there is also a paper about BibTeXML:
https://www.researchgate.net/publication/2564256_BIBTEXML_An_XML_Representation_of_BIBTEX

From a quick look at the code you referenced, I saw that it uses rdf tags, which confuses me.

koppor · 2016-03-12T14:32:47Z

The XMP feature is the central tool to distribute PDFs with bibliographic information. I learned it from Adrian Daerr (possibly @adriandaerr?).

I am also confused by the code and also had a strange feeling about nesting JabRef's BibTeX XML inside rdf tags. Therefore, I proposed to focus on Dublin Core (see above).

dret · 2016-03-13T19:03:39Z

thanks for inviting me to the discussion! the BibTeXML we developed and implemented (http://dret.net/netdret/publications#wil01e) is a different one from the sourceforge repo. the paper is from 15 years ago, and while we used the language in a later project (http://dret.net/projects/sharef/), the software produced by that project is not really used anywhere, as far as i can tell. i did hand the sources to some people who liked it and wanted to have a bibtex-xml converter, but i don't think anybody ever made their versions public. i think our XML schema was pretty well-designed, but it's something i haven't looked at in quite a while.

Lenchik · 2016-04-04T18:31:21Z

Whichever format you choose to embed in PDFs, it would be great if it were compatible with PDF/A compliance checks.
The metadata embedded by JabRef 2.x caused errors like:

XMP metadata property used, which is not predefined in the XMP specification of January 2004. There is no XMP extension schema present in the PDF defining the use and contents of this property. Some PDF-based ISO standards require that all XMP metadata properties are either predefined or defined in an embedded extension schema.

If it ends up being a format like BibTeXML that can be exported as XML, it would also be great to have a minimal example of embedding it correctly through LaTeX with the xmpincl or hyperxmp packages. Use case: compiling a thesis with embedded metadata prepared in JabRef.

lenhard · 2016-04-08T11:14:27Z

After dealing with this in #1096, I think the most portable solution would be to drop the JabRef bibteXMP and to encode everything in Dublin Core (which we already do on top of our custom serialization).

That is, if we do not decide to drop the XMP functionality completely.
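To make the Dublin Core option more concrete, here is a minimal, JDK-only Java sketch of how a few BibTeX fields could be serialized as Dublin Core inside an XMP rdf:Description. The field mapping (author to dc:creator, title to dc:title, year to dc:date, publisher to dc:publisher, keywords to dc:subject, doi to dc:identifier, entry type to dc:type) and the class name DublinCoreSketch are illustrative assumptions, not JabRef's actual implementation. The RDF containers follow the XMP convention for Dublin Core: an ordered Seq for creators and dates, a language Alt for the title, and unordered Bags for type, publisher and subjects.

```java
import java.util.Map;
import java.util.TreeMap;

public class DublinCoreSketch {

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Escape XML special characters so field values cannot break the markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    // Wrap values in an RDF container (Seq or Bag), skipping blank values.
    static String container(String element, String kind, String[] values) {
        StringBuilder sb = new StringBuilder("  <dc:" + element + "><rdf:" + kind + ">");
        for (String v : values) {
            String t = v.trim();
            if (!t.isEmpty()) {
                sb.append("<rdf:li>").append(escape(t)).append("</rdf:li>");
            }
        }
        return sb.append("</rdf:").append(kind).append("></dc:").append(element).append(">\n").toString();
    }

    // Hypothetical mapping of a few BibTeX fields onto Dublin Core elements.
    static String toDublinCore(String entryType, Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append("<rdf:Description rdf:about='' xmlns:rdf='").append(RDF_NS)
          .append("' xmlns:dc='").append(DC_NS).append("'>\n");
        sb.append(container("type", "Bag", new String[] {entryType}));
        if (fields.containsKey("title")) {
            // dc:title is a language alternative; use the x-default language
            sb.append("  <dc:title><rdf:Alt><rdf:li xml:lang='x-default'>")
              .append(escape(fields.get("title"))).append("</rdf:li></rdf:Alt></dc:title>\n");
        }
        if (fields.containsKey("author")) {
            // BibTeX separates names with " and "; author order matters, hence rdf:Seq
            sb.append(container("creator", "Seq", fields.get("author").split("\\s+and\\s+")));
        }
        if (fields.containsKey("year")) {
            sb.append(container("date", "Seq", new String[] {fields.get("year")}));
        }
        if (fields.containsKey("publisher")) {
            sb.append(container("publisher", "Bag", new String[] {fields.get("publisher")}));
        }
        if (fields.containsKey("keywords")) {
            sb.append(container("subject", "Bag", fields.get("keywords").split(",")));
        }
        if (fields.containsKey("doi")) {
            sb.append("  <dc:identifier>").append(escape(fields.get("doi"))).append("</dc:identifier>\n");
        }
        return sb.append("</rdf:Description>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new TreeMap<>();
        fields.put("author", "Ada Lovelace and Charles Babbage");
        fields.put("title", "Notes & Sketches");
        fields.put("year", "1843");
        fields.put("keywords", "engines, computation");
        System.out.println(toDublinCore("article", fields));
    }
}
```

A plain string builder keeps the sketch dependency-free; inside JabRef, PDFBox's xmpbox (the target of the migration in #1096) would be the natural way to build and embed such a packet.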

LSinev · 2016-04-22T17:38:58Z

Some information about correctly storing XMP inside PDFs (e.g., to be compatible with PDF/A), including samples, can be found at http://www.pdflib.com/knowledge-base/xmp-metadata/xmp-in-pdfa/
A free XMP validator is available here: http://www.pdflib.com/knowledge-base/xmp-metadata/free-xmp-validator/
Some Java code samples can also be found on that website.

koppor · 2016-07-14T12:35:44Z

Idea (as discussed with @hummelriegel): Add the BibTeX entries of cited works to the PDF. This is especially useful for one's own papers.

koppor · 2016-12-13T22:11:13Z

Further options include BibTeXML and MODS. I think Dublin Core is still the way to go, as it is standards-based. We should go in this direction.

koppor · 2016-12-15T12:22:20Z

Refs koppor#6

DesBw · 2017-04-06T21:36:22Z

Hi guys. I am not a developer, just another user. I really hope that you keep the XMP feature. It is one of the most important unique features of JabRef, and it keeps me coming back time and time again (even after using great reference managers like Bookends). The embedded data is useful not just for sharing PDF files. Embedding the information into the PDF is very useful for powerful search tools like DEVONthink [Mac], Spotlight [Mac], and dtSearch [Windows]: with the embedded data, it is possible to search PDF files by author, title, and similar fields. In addition, the JabRef library can be regenerated from the PDF files (in case the library is corrupted or deleted) thanks to the embedded data. I had a couple of cases where my PDF files got dissociated from their references. I dragged them back, and voilà, I had the whole reference. This is just so great.

lenhard · 2017-04-07T07:11:35Z

Hi dellu. Thanks for the praise! And no worries, we have no intentions of removing support for this feature. Quite the contrary, we would like to update and improve it. Unfortunately, this has so far failed due to issues in the libraries that we use for this functionality. As a result, I assume that there will be no significant changes here in the near future.

DesBw · 2017-04-20T00:14:25Z

Thank you @lenhard. I am glad you are going to keep the feature.

What do you guys think of this?
They also write the metadata into the file using ExifTool, with the standard BibTeX tags. Standard BibTeX is nice.

lenhard · 2017-04-20T07:22:46Z

@Dellu

Interesting link, thanks! Unfortunately, it will not be easy to interact with that tool or with ExifTool. The former is written in C++ and the latter in Perl, whereas JabRef is written in Java. There is always a way around language differences, but from my point of view we should stick to the Java ecosystem and build a JabRef where everything is closely integrated, without language-related friction.

Other developers might have a different opinion, though.

koppor · 2017-04-27T06:52:05Z

Together with @snisnisniksonah I am investigating whether we can use Dublin Core.

Current steps:

- Read/write PDF annotations using Dublin Core with PDFBox 2.x (refs "Update pdfbox to 2.0.0 and migrate from jempbox to xmpbox" #1096)
- Extract a command-line tool to convert old PDF annotations to the new format (refs "Removed a number of warnings, added copyright etc" #266 (comment)) -> XMPUtil will be released separately.

Results:

- JabRef 4.x depending on PDFBox 2.x
- XMPUtils depending on PDFBox 1.x
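The conversion step listed above could look roughly like the following Java sketch, which reads legacy bibteXMP properties back into a BibTeX field map so they can be rewritten in another schema. It assumes, without having checked XMPSchemaBibtex, that properties appear either as bibtex:* attributes or as bibtex:* child elements (RDF/XML permits both), with name lists stored as rdf:li items. The class name BibteXmpMigrationSketch is hypothetical, and only the JDK's DOM parser is used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class BibteXmpMigrationSketch {

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String BIBTEX_NS = "http://jabref.sourceforge.net/bibteXMP/";

    // Read the bibtex:* properties of the first rdf:Description into a field map.
    static Map<String, String> readLegacyFields(String xmp) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed to match on namespace URIs
        Element description = (Element) factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmp.getBytes(StandardCharsets.UTF_8)))
                .getElementsByTagNameNS(RDF_NS, "Description").item(0);
        Map<String, String> fields = new LinkedHashMap<>();
        if (description == null) {
            return fields;
        }
        // Simple properties may be written as attributes, e.g. bibtex:entrytype='Article'
        NamedNodeMap attributes = description.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Attr attr = (Attr) attributes.item(i);
            if (BIBTEX_NS.equals(attr.getNamespaceURI())) {
                fields.put(attr.getLocalName(), attr.getValue());
            }
        }
        // ...or as child elements, possibly holding an RDF list
        for (Node child = description.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element && BIBTEX_NS.equals(child.getNamespaceURI())) {
                Element property = (Element) child;
                NodeList items = property.getElementsByTagNameNS(RDF_NS, "li");
                String value;
                if (items.getLength() == 0) {
                    value = property.getTextContent().trim();
                } else {
                    List<String> parts = new ArrayList<>();
                    for (int j = 0; j < items.getLength(); j++) {
                        parts.add(items.item(j).getTextContent().trim());
                    }
                    // " and " suits name lists; a real migration would pick a separator per field
                    value = String.join(" and ", parts);
                }
                fields.put(property.getLocalName(), value);
            }
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        String legacy = "<rdf:RDF xmlns:rdf='" + RDF_NS + "'>"
                + "<rdf:Description rdf:about='' xmlns:bibtex='" + BIBTEX_NS + "' bibtex:entrytype='Article'>"
                + "<bibtex:title>Some title</bibtex:title>"
                + "<bibtex:author><rdf:Seq><rdf:li>A. Author</rdf:li><rdf:li>B. Author</rdf:li></rdf:Seq></bibtex:author>"
                + "</rdf:Description></rdf:RDF>";
        System.out.println(readLegacyFields(legacy));
    }
}
```

The resulting field map is schema-neutral, so the same reader could feed whichever target serialization is finally chosen.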

tobiasdiez · 2017-04-27T09:57:42Z

Nice! I think the XMPUtil is not that important since in most cases you can just write the information again to the PDF using Dublin Core and thus overwriting / "converting" the old XMP data.

koppor · 2018-02-07T02:45:11Z

Note to self: Do not forget #938 (comment). pdflatex can easily do that: authorarchive. Check the example PDF.

Fixes JabRef#938

This fixes #938:

- Reading and writing multiple Dublin Core entries works: XMPUtilWriter supports multiple metadata entries in Dublin Core and a single entry in the PDDocumentInformation. To test reading multiple entries locally, the PDF file JabRef_multipleMetaEntries.pdf contains three metadata entries in Dublin Core.
- Too much code had been removed when refactoring XMPUtil: non-XMP metadata are also relevant when retrieving org.apache.pdfbox.pdmodel.PDDocumentInformation.
- Update pdfbox and fontbox from 1.8.13 to 2.0.8 and migrate from jempbox to xmpbox. See pull #1096.
- Refactor extraction from DublinCoreSchema.
- The tests cover the most important use cases, including reading and writing metadata from PDF files. Both formats, Dublin Core and PDMetadata (non-XMP metadata), are tested.
- Separate XMPUtils into a reader and a writer utility class.
- Add meaningful names in DublinCoreExtractor and use StringUtils.isNullOrEmpty.
- Log exception in XMPUtilShared.
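Reading several Dublin Core entries back, as described in this commit, can be sketched with the JDK's namespace-aware DOM parser. This is an illustrative assumption, not the code of DublinCoreExtractor: each rdf:Description that contains Dublin Core data is treated as one entry, creators are joined with BibTeX's " and " separator, and dc:date is assumed to hold a bare year. The class name DublinCoreReaderSketch is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DublinCoreReaderSketch {

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Collect the text of every rdf:li below the first dc:<name> element of one description.
    static List<String> listItems(Element description, String name) {
        List<String> values = new ArrayList<>();
        NodeList props = description.getElementsByTagNameNS(DC_NS, name);
        if (props.getLength() == 0) {
            return values;
        }
        NodeList items = ((Element) props.item(0)).getElementsByTagNameNS(RDF_NS, "li");
        for (int i = 0; i < items.getLength(); i++) {
            values.add(items.item(i).getTextContent().trim());
        }
        return values;
    }

    // Turn every rdf:Description carrying Dublin Core data into one field map.
    static List<Map<String, String>> readEntries(String xmp) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmp.getBytes(StandardCharsets.UTF_8)));
        List<Map<String, String>> entries = new ArrayList<>();
        NodeList descriptions = doc.getElementsByTagNameNS(RDF_NS, "Description");
        for (int i = 0; i < descriptions.getLength(); i++) {
            Element d = (Element) descriptions.item(i);
            Map<String, String> fields = new LinkedHashMap<>();
            List<String> titles = listItems(d, "title");
            if (!titles.isEmpty()) {
                fields.put("title", titles.get(0));
            }
            List<String> creators = listItems(d, "creator");
            if (!creators.isEmpty()) {
                fields.put("author", String.join(" and ", creators));
            }
            List<String> dates = listItems(d, "date");
            if (!dates.isEmpty()) {
                fields.put("year", dates.get(0)); // assumes the date is a bare year
            }
            if (!fields.isEmpty()) {
                entries.add(fields);
            }
        }
        return entries;
    }

    public static void main(String[] args) throws Exception {
        String xmp = "<rdf:RDF xmlns:rdf='" + RDF_NS + "' xmlns:dc='" + DC_NS + "'>"
                + "<rdf:Description rdf:about=''><dc:title><rdf:Alt><rdf:li xml:lang='x-default'>First</rdf:li></rdf:Alt></dc:title>"
                + "<dc:creator><rdf:Seq><rdf:li>A. Author</rdf:li><rdf:li>B. Author</rdf:li></rdf:Seq></dc:creator></rdf:Description>"
                + "<rdf:Description rdf:about=''><dc:title><rdf:Alt><rdf:li xml:lang='x-default'>Second</rdf:li></rdf:Alt></dc:title>"
                + "<dc:date><rdf:Seq><rdf:li>2018</rdf:li></rdf:Seq></dc:date></rdf:Description>"
                + "</rdf:RDF>";
        for (Map<String, String> entry : readEntries(xmp)) {
            System.out.println(entry);
        }
    }
}
```

Descriptions without any Dublin Core data are skipped, so other schemas sharing the same XMP packet do not produce empty entries.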

koppor added type: enhancement status: help wanted labels Mar 11, 2016

koppor assigned koppor and unassigned koppor Mar 11, 2016

koppor added the stupro label Mar 11, 2016

koppor mentioned this issue Mar 11, 2016

Completely move to GitHub Zearin/BibTeXML#1

Closed

This was referenced Apr 6, 2016

Update pdfbox to 2.0.0 and migrate from jempbox to xmpbox #1096

Closed

BibTeXMLImporterTest #511

Merged

koppor mentioned this issue May 2, 2016

corrupt BibTeXML export #1337

Closed

koppor mentioned this issue Jul 13, 2016

bib per pdf koppor/jabref#121

Closed

koppor removed the status: help wanted label Jul 14, 2016

koppor mentioned this issue Dec 12, 2016

Reenable more tests #2371

Merged


koppor mentioned this issue Dec 13, 2016

Investigate net.sf.jabref.importer.EntryFromPDFCreator koppor/jabref#78

Closed

koppor mentioned this issue Dec 16, 2016

Rename "XMP data" to "XMP-metadata" #2388

Merged

koppor removed the stupro label Mar 9, 2017

johannes-manner added a commit to johannes-manner/jabref that referenced this issue Feb 7, 2018

[WIP] Dublin core

282f066

Fixes JabRef#938

johannes-manner mentioned this issue Feb 7, 2018

[WIP] Dublin Core #3710

Merged


koppor closed this as completed in #3710 Feb 20, 2018

This was referenced Oct 19, 2019

Add ADR for dublin core koppor/jabref#357

Open

Add bibtex of cited papers to PDF koppor/jabref#358

Open

koppor mentioned this issue Jul 1, 2020

Add Collection of Comp Sci Bibliographies fetcher #6664

Merged


koppor mentioned this issue Mar 15, 2023

RDF export uses bad URI for namespace 'bibo' #8920

Open
