Revised Package Metadata proposal #642

Closed
mattgarrish opened this issue Jan 14, 2016 · 35 comments
Labels: EPUB32 (Issues from 3.0.1 resolved in the EPUB 3.2 specification), Topic-PackageDoc (The issue affects package documents)

Comments

@mattgarrish
Member

In moving to have the Package Document include only bibliographic metadata used by reading systems for display/sorting in bookshelves, with richer metadata delegated to formal records, the following changes have been proposed:

  1. Establish that linked records are the proper location for metadata
     • give priority to bibliographic metadata contained in linked records over any found in the package document
     • have all linked records use the rel value "record"
     • records will be distinguished by their media types or, if necessary, through a properties attribute identifier
  2. Remove the following from the Packages spec:
     • refines attribute + publication refinement properties
     • OPF2 meta element
  3. Reduce metadata allowed in the package document to only the following:
     • Exactly one dc:identifier[@uid] + dcterms:modified
     • Exactly one of dc:title with new optional @file-as
     • One or more dc:language
     • Zero or more of dc:creator with new optional @file-as
     • Zero or one of dc:publisher
     • Zero or more of dc:type
  • One or more baz
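
As a rough illustration of the cardinality rules in the last item, the limits could be checked mechanically. This is a minimal sketch in Python and not part of the proposal: the `LIMITS` table, the `check_metadata` name and the sample package are hypothetical, the namespaces are the usual OPF and Dublin Core ones, and the final bullet is left out because the thread does not define it.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OPF = "http://www.idpf.org/2007/opf"

# (min, max) occurrences per dc element, copied from the list above; None = unbounded.
LIMITS = {
    "title": (1, 1),
    "language": (1, None),
    "creator": (0, None),
    "publisher": (0, 1),
    "type": (0, None),
}

def check_metadata(opf_xml):
    """Return a list of human-readable problems; an empty list means the rules hold."""
    root = ET.fromstring(opf_xml)
    metadata = root.find(f"{{{OPF}}}metadata")
    if metadata is None:
        return ["missing metadata element"]
    problems = []
    for name, (low, high) in LIMITS.items():
        count = len(metadata.findall(f"{{{DC}}}{name}"))
        if count < low or (high is not None and count > high):
            problems.append(f"dc:{name}: found {count}")
    # Exactly one dc:identifier, and it must be the one named by unique-identifier.
    uid = root.get("unique-identifier")
    identifiers = metadata.findall(f"{{{DC}}}identifier")
    if uid is None or len(identifiers) != 1 or identifiers[0].get("id") != uid:
        problems.append("dc:identifier: need exactly one, referenced by unique-identifier")
    # Exactly one dcterms:modified, expressed as a meta property.
    modified = [m for m in metadata.findall(f"{{{OPF}}}meta")
                if m.get("property") == "dcterms:modified"]
    if len(modified) != 1:
        problems.append(f"dcterms:modified: found {len(modified)}")
    return problems

SAMPLE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">urn:uuid:2e37ec76-1242-4698-8cf7-b65747676c0f</dc:identifier>
    <dc:title>Example</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2016-01-14T00:00:00Z</meta>
  </metadata>
</package>"""

print(check_metadata(SAMPLE))
```

A checker like this only covers counts; it says nothing about how a reading system should treat elements outside the list.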

See also the proposal at https://docs.google.com/document/d/1okss2ictXwVqx7aQJ4ARi2ALl2GIovyz5nURE3QgmIA/edit#

@mattgarrish mattgarrish added the Topic-PackageDoc The issue affects package documents label Jan 14, 2016
@mattgarrish mattgarrish added this to the EPUB 3.1 milestone Jan 14, 2016
@mattgarrish
Member Author

Media overlays will require a new duration attribute to handle per-item durations that were formerly handled by meta/refines.

@severdia

severdia commented Feb 4, 2016

Hi Matt, is there an example of "linked records" in the repo or somewhere else that shows how metadata is moved from the OPF to a linked record?

@mattgarrish
Member Author

The metadata group has a draft guide to common formats at http://www.idpf.org/epub/metadata/

There isn't a mapping guide to move dc elements to these. The topic of discrepancies in naming/structure was very briefly touched on during the metadata group's call today, but I don't know if the group will attempt cross-mapping between metadata standards. I'll update if more happens on a future call.

@severdia

severdia commented Feb 4, 2016

Excellent. Thanks!

@iherman
Member

iherman commented Feb 5, 2016

I think that we do not have backward compatibility issues, so I would think that for RDFa we should refer to HTML5 and not XHTML. This will be better in the long term. I.e., it should be metadata.html, with the media type text/html.

Also, because we are talking about using Schema.org in this table and not any vocabulary in general, it is probably better to refer to RDFa 1.1 Lite, rather than RDFa 1.1 in general; RDFa 1.1 Lite has been developed in cooperation with the schema.org people after all, and it is way easier to use for end users.

@mattgarrish
Member Author

I believe we did that table before the html serialization was formally approved. I've added both options for review, but we should discuss with the group on a future call whether we move ahead with all or some of these entries (not just the schema.org, but mods and marcxml, too).

I didn't really get much response to that question when I announced, but it was also back just before the holidays. We risk giving the perception that these are all widely used records and scaring people off when I'm not sure any get widely attached now and we don't really know which will come into common use in the future.

@TzviyaSiegman
Contributor

We also risk looking like we are limiting the options to the handful listed here. I can give samples of EPUB with external metadata in JSON-LD using a variety of vocabularies.

@pettarin

pettarin commented Feb 8, 2016

The rationale that led to choosing this arbitrary subset of dc:* elements to survive in EPUB 3.1 is not clear to me. Use cases? Actual ones or theoretical ones? Which ones did the editors consider?

For example, certain users might be interested more in seeing dc:contributor strings than dc:publisher strings in their app library view, because "narrator" or "curator" convey more information than the "publisher" to them. For example, dc:subject values are useful when filtering books in the library. For example, users might want to read the dc:description because the author was kind enough to populate it with a synopsis.

With the proposed move of "richer metadata" to external records my feeling is that, in practice, creators will be discouraged from adding said "richer metadata" to their ebooks, because a. their workflow will be more complex and b. independent reading systems will not support fetching and parsing said external metadata (--- of course Readium will support whatever you decide to do...).

Also, if you want to take a radical approach, go for it in full: go "no optional, no multiple". Select a subset of metadata you want, and for each metadatum, require exactly one element with one value (possibly empty). One might ask why only one dc:identifier or one dc:title are allowed, while multiple dc:creator are allowed. I have not read a good, coherent reason for this choice. But of course I might be missing the nuances of "This is just what to show to the user in the bookshelf" when applied to different metadata...

And, for God's sake, at least deprecate the "not-blessed" dc:* elements, instead of making them suddenly illegal.

@mattgarrish
Member Author

The metadata was changed based on a survey of developers to determine what reading systems are actually using, not based on what might be nice to have or what could be useful one day use cases. That approach has existed for the last five years and has only led to confusion and complaints that most metadata is not used anywhere.

The restriction to one identifier is because reading systems only use the unique identifier for identification. Allowing multiple titles hasn't led to their use in display, so publishers already are concatenating them, hence that restriction. The group was initially toying with the idea of a single creator field for the same reason, but, unlike the title, reading systems will sort and arrange by the separate creator names. They just sometimes concatenate names in ways the publishers don't want. I'm actually a bit surprised publisher ended up in the list and not contributors, but that's where surveying what is in use turned up some surprises.

And there's no particular difference between using a DCMES element and the equivalent property in the meta tag, so it's not like the ability to express the other metadata is gone. I've already started a discussion on the working group list since the release that we need to be more explicit about the relationship between the two, and note how the restrictions apply not just to elements because it will affect how metadata translates to the new browser-friendly format (i.e., elements and properties will both translate to properties in json).

There's also a proposal for the next cycle of the revision to allow nesting of meta elements and proper alignment with RDFa (Lite) so that a real framework exists for extensions, and so that the package document itself isn't restricted, only what is defined for use by all reading systems.

Having that minimal metadata set retains compatibility with epub 3.0 reading systems that expect the elements, while shifting the rest to the meta element extension and external records. That was the goal of the group.

@kevinhendricks

I am a developer of Sigil, an open-source GPL epub editor that runs on Linux, Mac, and Windows. Sigil has long supported epub2 but has only recently started to add epub3 support. Of course, I just found this issue immediately after coding up a user-friendly tree-view-based GUI editor for epub3 metadata. Figures ...

Please consider the user perspective of how users use ebook library software like calibre to sort and find their ebooks. Removing things like dc:description goes quite contrary to the needs of typical users. Furthermore, you are generating yet another non-backwards-compatible change and ignoring small/independent epub publishers in the process. And you are partially reinventing the wheel you just broke. Epub2 small publishers already knew how to use the opf attribute namespace to add opf:scheme, opf:file-as, opf:role directly to dc:creator to achieve what they needed. Epub3 then broke that with the refines nonsense and then added insult to injury by allowing chaining of refines (a sure anti-KISS darwin award winning idea if I ever saw one!). Now you are proposing to drop support for role and contributor. Where is the sanity in this? Why not stick to simple dc:* metadata and allow role, scheme, and file-as attributes directly on them (no attribute namespace prefix needed)? Display-sequence can be indicated by the sequence presented in the metadata, again simplifying things. This is rich enough for small publishers and users to actually use, would be well understood by current epub2 developers, and eliminates the need for refines and chained refines.

Seems like a simple, logical choice to me.

@kevinhendricks

Given the growth in self-publishing, using epub or kindle built from epubs, don't you think that adding the voice of the ebook user, and small ebook publishers and even self-publishers, would be important to your working group? Asking developers from just the big publishing houses and other institutional interests is what made the epub3 spec the mess it turned out to be. Simplification is a laudable goal and removing non-html5 spec complicating pieces such as epub:type (hopefully replaced by "role"), epub:switch, epub:trigger, the need for namespaces everyplace, seems like a good idea that everyone will support. And simplifying metadata is good too as it turned into a dumping ground for anything not fitting in the opf, supply chain info, and special interest groups. I just feel your proposal to drop almost everything is not geared to users and small, independent publishers, and would force a meta property duplicate version of standard dc tags just to do what we used to do quite successfully in epub2 and simple dc metadata and some simple extra attributes like file-as, role, and scheme.

@mattgarrish
Member Author

We can't get rid of the meta tag, as it's needed for core epub functionality (fixed layout metadata, media overlay metadata, etc.).

The problem of the dc elements and properties both translating to properties in json hasn't been fully addressed yet, and it could be a case for a return to allowing any dc: elements in the package and restricting the dc: properties for simplicity. It's too early to say, and I was only addressing the thinking of the metadata group that went into this proposed change.

But this is why we've put out the editor's draft for review and comment. The feedback is appreciated.

@pettarin

pettarin commented Feb 8, 2016

Matt, thank you for taking time to write the rationale behind the draft.

However, I am still unconvinced. Kevin listed some use cases where some of the "forbidden" dc: elements are used. Let me just add two more examples --- with apologies for the self-citation.

  1. In my app Menestrello, which is designed for Audio-eBooks, the bookshelf shows the narrator metadatum for ebooks that have it, and the app allows the user to sort/search using it. Usually it is listed in the EPUB file as dc:contributor, not dc:creator, I think complying with the DC semantics.
  2. I usually specify two dc:identifiers in my EPUB 3 files. The unique-identifier is a UUID4; the other is the ISBN, and again my app Menestrello shows them both.

In both cases the current EPUB 3.1 draft would force me to coerce the dc semantics or to move some pieces of information into an external record or into a meta essentially replicating the role of the original dc: element. Of course I am not happy with this, especially because I still do not see the "harm" that allowing all the optional/multiple dc: elements produces.

On the other hand, I am the first person happy to see the current refine mechanism go, it never felt natural to me. A simpler, attribute-based mechanism for roles and machine-readable values would look more appealing to me as well.

@mattgarrish
Member Author

Don't forget, this draft is intended to be provocative, as noted at the top of the changes document. The working group is trying to gauge which features are actually in use, as there's a strong desire to move forward without so much baggage that complicates integration with the open web.

But I'm also not out to argue that what is in the specification is right and unchangeable, as the ambition is not to force changes that aren't good for the ecosystem. I just wanted to give some clarification about how we ended up at that set.

The refines attribute is an example of too much compromise, and that's the kind of change we're trying to avoid.

@kovidgoyal

I'm just dropping in to say that if you want to make backwards incompatible changes, please don't do it in a point release. From glancing over your changes document, it seems to me that you want to make several breaking changes. That's great, EPUB 3 could do with some serious breaking. But name it EPUB 4. I really don't want to have to tell my users that calibre supports EPUB 3.1 but not EPUB 3.

As for the proposed metadata changes. I'll say the following metadata fields are most often used by calibre users:

title
multiple authors
rating
series
series_index
tags
publisher
identifiers (isbn, asin, etc -- one identifier is pointless)
comments (dc:description)

Make the implementation of a small set of fields (preferably the ones I listed above) dead simple and as backwards compatible as possible. People in the wild write all sorts of broken software, and EPUB does not help with its insane and completely unnecessary level of complexity. That means that EPUB using applications have to deal not just with an overly complicated spec but also dozens of broken implementations of it.

@mihailim

In addition to the (very good) points raised by Kevin and Kovid, I'd like to add a sample use case of my own (albeit more narrow in scope): Anthologies / short story collections.

Typically, such publications have one or more editors and a bunch of contributing authors. The canonical way to specify the contributors is:

  • one or more dc:creator elements with the relator code "edt" for the editor(s)
  • one or more dc:contributor elements with the relator code "aut" for the author(s) contributing the collected works

In this case, the dc:contributor elements are almost as important as the dc:creator elements; relegating them to separate storage in a backwards incompatible fashion will in practice result in this metadata being rendered inaccessible to presentation.

Please do not underestimate the importance of the distributed ecosystem of scattered, small-scale software. Like Kevin mentioned, there are more stakeholders in EPUB than the large publishing shops. Pre-ossifying the spec that way will impede grassroots adoption.

@HadrienGardeur

cc @kovidgoyal @mihailim @kevinhendricks

In addition to the work on OPF itself, there's also an on-going effort to design an OPF alternative that'll be used for unzipped EPUB on the Web (and potentially for EPUB 4). This effort is based on JSON-LD and I've tried to accommodate some of the needs expressed in these comments:

  • in addition to authors, other contributors such as editors, translators and illustrators become first-class citizens
  • other contribution types can either be expressed with a generic "contributor" element or by relying on extensions (schema.org and the bib extension have a few of those)
  • support for series and collections

I'd like to get your opinion on the current proposal.

The complete proposal for an OPF alternative is available at: https://github.com/dauwhe/epub31-bff
There's also a separate Gist with two metadata examples (a simple one and a complex one), two different JSON-LD contexts (schema.org and DublinCore) and the results in RDF (Turtle):
https://gist.github.com/HadrienGardeur/03ab96f5770b0512233a

Quick question for @kovidgoyal, for the series_index in Calibre, do you use an integer? schema.org supports both string and integer for the position in a series and I can find arguments for/against both of them.

@kovidgoyal

calibre uses a floating point number for series_index with a max precision of two digits. So you can have 1.01 to 1.99. I have found that this level of precision meets the needs of ~ all users.

IMO it needs to be a numeric type. How does one order books in a series with a string type in the general case? Indeed, the very name of the field, series_index, indicates it needs to be numeric.
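
The ordering problem with string-typed positions can be seen in a few lines. This is a small sketch with made-up titles; the `series_index` key mirrors the calibre field name mentioned above, and rounding to two decimals reflects the precision described in this comment rather than any spec requirement.

```python
# Hypothetical series entries whose positions arrive as strings.
books = [
    {"title": "Book Ten", "series_index": "10"},
    {"title": "Book Two", "series_index": "2"},
    {"title": "Interlude", "series_index": "1.5"},
]

def by_string(items):
    """Sort on the raw string value (lexicographic comparison)."""
    return [b["title"] for b in sorted(items, key=lambda b: b["series_index"])]

def by_number(items):
    """Sort on the value parsed as a float, rounded to two decimal places."""
    return [b["title"] for b in sorted(items, key=lambda b: round(float(b["series_index"]), 2))]

print(by_string(books))  # ['Interlude', 'Book Ten', 'Book Two']
print(by_number(books))  # ['Interlude', 'Book Two', 'Book Ten']
```

With strings, "10" sorts before "2", so the tenth book lands ahead of the second.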

@kovidgoyal

From quickly looking through that gist, some comments;

  1. I could not see a way to specify multiple items of a type. For example, multiple authors, multiple identifiers.

  2. Is the description field free-form? Does it allow HTML?

  3. Extensibility: There needs to be some mechanism for applications to add their own fields in a namespace that is guaranteed to not be used by future revisions of the spec. So for example, application specific field names could be prefixed by a hyphen, or a period or something. The important point is to have it be specified in the spec. Perhaps it already is, I am only going off the gist here.

@HadrienGardeur

  1. It's JSON, you can simply use an array for that, for example:
"author": ["Jules Verne", "Alexandre Dumas"]

It works for both literals and objects:

"author": [
  {
    "name": "Jules Verne",
    "identifier": "http://isni.org/isni/0000000121400562",
    "sort_as": "Verne, Jules"
  }, {
    "name": "Alexandre Dumas",
    "identifier": "http://isni.org/isni/0000000121012885",
    "sort_as": "Dumas, Alexandre"
  }
]

That said, EPUB BFF will most likely align with OPF in 3.1 for uniqueness of some elements (one identifier, one title). It doesn't mean that you can't include more identifiers though, but you'll have to use extensions for that:

"identifier": "urn:uuid:2e37ec76-1242-4698-8cf7-b65747676c0f",
"http://schema.org/isbn": "9780000000001",
"https://calibre-ebook.com/internal_identifier": "18492"

  2. For now we're using http://schema.org/description which expects http://schema.org/Text but we'll have to think about this a little more.

  3. It's JSON-LD and the idea is that we'll forbid local contexts, which means that you can very easily define your own extensions. All you have to do is to use full IRIs for all the new keys that you want to add to the metadata. Check my example above for identifiers.
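
One way a consumer might apply the "full IRIs for extensions" rule is to sort keys into spec-defined names and extension IRIs. The sketch below is only illustrative: the `CORE_KEYS` set is an assumed stand-in for whatever the specification ends up defining, and treating only `http://` and `https://` keys as IRIs is a simplification.

```python
# Assumed core vocabulary; the real list would come from the specification.
CORE_KEYS = {"identifier", "title", "author", "language"}

def split_metadata(metadata):
    """Split a metadata object into core keys, extension IRIs and unrecognised keys."""
    core, extensions, unknown = {}, {}, {}
    for key, value in metadata.items():
        if key in CORE_KEYS:
            core[key] = value
        elif key.startswith(("http://", "https://")):
            # Simplification: only http(s) IRIs are recognised as extensions here.
            extensions[key] = value
        else:
            unknown[key] = value
    return core, extensions, unknown

example = {
    "identifier": "urn:uuid:2e37ec76-1242-4698-8cf7-b65747676c0f",
    "http://schema.org/isbn": "9780000000001",
    "subtitle": "not a core key and not an IRI",
}
core, extensions, unknown = split_metadata(example)
print(sorted(core), sorted(extensions), sorted(unknown))
```

Because prefixes and local contexts are ruled out, a bare key that is not in the core set can be flagged rather than guessed at.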

@laudrain

  2. I confirm that book descriptions may contain HTML tags. We actually push them in ONIX supporting text.

Luc


@kovidgoyal

  1. I can guarantee that if you make the values of keys optionally strings or arrays, there will be software that will expect them to be either only strings or only arrays and will barf when it sees something else. Better to be more explicit and make the values always arrays or always strings. The fewer if statements there are, the less broken stuff there will be.

  2. HTML is good, please do not make descriptions text only. If you do, calibre for one, will be forced to ignore them in favor of a custom field, since comments in calibre allow HTML.

  3. OK. I'm not familiar with JSON-LD, but as long as there exists a well defined strategy for applications to add their own keys, that's fine.
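
To make the first point concrete, here is roughly what a tolerant consumer has to do when a value may be a string, an object, or an array of either. It is a sketch only; `normalize_contributors` is a hypothetical helper, and the `name` key follows the object examples earlier in the thread.

```python
def normalize_contributors(value):
    """Coerce a string, an object, or an array of either into a list of objects."""
    if value is None:
        return []
    # First branch: a lone literal or object is wrapped in a list.
    if isinstance(value, (str, dict)):
        value = [value]
    result = []
    for item in value:
        # Second branch: literals and objects still have to be told apart.
        if isinstance(item, str):
            result.append({"name": item})
        elif isinstance(item, dict):
            result.append(dict(item))
        else:
            raise TypeError(f"unsupported contributor value: {item!r}")
    return result

print(normalize_contributors("Jules Verne"))
print(normalize_contributors(["Jules Verne", {"name": "Alexandre Dumas"}]))
```

Software that skips either branch works on the examples its author happened to see and fails on the rest, which is the failure mode described above.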

@HadrienGardeur

@kovidgoyal that's a fair point (1) but there are pretty big benefits for supporting both:

  1. Syntax when you're using literals is extremely simple and it only gets more complex if you need to.
  2. From a JSON-LD and RDF standpoint, they're both valid expressions (literals or objects).
  3. Having to always use an array, even when you have a single literal in there makes the syntax unnecessarily verbose.
  4. You'll need if statements anyway to separate literals from objects, and if we don't support that (always use an object) it'll make the syntax even more unnecessarily complex (won't feel much like JSON anymore).

@kovidgoyal

  1. Syntax when you're using literals is extremely simple and it only gets more complex if you need to.

Syntax for the case of a single value is simply two extra characters:
"value" -> ["value"]

  2. From a JSON-LD and RDF standpoint, they're both valid expressions (literals or objects).

But, you say you are going to restrict title and identifiers to not allow
multiple values. So now not only do people have to remember that fields
can be both literals/objects and arrays, they also have to remember that this
applies only to some fields and not others. If you are restricting
some fields to a subset of valid expressions, I don't see why you can't
restrict others.

  3. Having to always use an array, even when you have a single literal in there makes the syntax unnecessarily verbose.

See (1)

  4. You'll need if statements anyway to separate literals from objects, and if we don't support that (always use an object) it'll make the syntax even more unnecessarily complex (won't feel much like JSON anymore).

Yes, but there is an extra if statement required. And people that write
software in the real world don't read specs, they look at examples. If
the examples they see contain only one of the possibilities, they will
often assume that is the only allowed possibility.

I can't count the number of programs that have problems with XML
namespaces in OPF because they assume that the prefixes are always opf:
and dc: I had to write special code in calibre's EPUB output plugin to
remap namespace prefixes to work around this bug, it is so common. And I
won't even start on all the software that tries to parse XML using
regexes...
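
The prefix assumption is easy to demonstrate. In this sketch the sample package deliberately uses unusual prefixes (`pkg:` and `dublin:` are invented for the example); a namespace-aware lookup keyed on the Dublin Core namespace URI still finds the values, while a pattern that hard-codes `dc:` finds nothing.

```python
import re
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# A well-formed package document that binds the usual namespaces to unusual prefixes.
ODD_PREFIXES = """<pkg:package xmlns:pkg="http://www.idpf.org/2007/opf" version="2.0">
  <pkg:metadata xmlns:dublin="http://purl.org/dc/elements/1.1/">
    <dublin:title>Twenty Thousand Leagues Under the Sea</dublin:title>
    <dublin:creator>Jules Verne</dublin:creator>
  </pkg:metadata>
</pkg:package>"""

def dc_values(opf_xml, name):
    """Return the text of every Dublin Core element called `name`, whatever its prefix."""
    root = ET.fromstring(opf_xml)
    return [el.text for el in root.iter(f"{{{DC_NS}}}{name}")]

def naive_titles(opf_xml):
    """The fragile approach: assume the prefix is literally 'dc:'."""
    return re.findall(r"<dc:title>(.*?)</dc:title>", opf_xml)

print(dc_values(ODD_PREFIXES, "title"))
print(naive_titles(ODD_PREFIXES))
```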

@HadrienGardeur

For namespaces, that won't be a problem anymore since the current proposal is to disallow additional context definitions. You can only encounter two types of elements:

  • the ones defined in the specification (identifier, title and such), without any namespace or prefix
  • or full IRIs for extensions

Regarding always using arrays vs allowing both strings/arrays, it's basically deciding between:

  • ease of authoring especially manually (it's the most natural syntax in JSON)
  • making life slightly easier for RS and software that parse such metadata (less if statements)

That said I'm not entirely sure what you're advocating for, are you saying that:

  1. properties that can have more than one value should always use an array?
  2. that we should always use an array, no matter what the property is?

I guess it's probably 1 since 2 really doesn't make much sense (people would get false expectations if we start using an array for identifier or title).

@kovidgoyal

Regarding always using arrays vs allowing both strings/arrays, it's basically deciding between:

  • ease of authoring especially manually (it's the most natural syntax in JSON)

I really hope you are not encouraging people to write JSON by hand. That
way lies endless bugs.

  • making life slightly easier for RS and software that parse such metadata (less if statements)

Or to put it another way, reducing the probability and therefore number
of bugs in software that consumes and produces EPUB, thereby, reducing the number of
bugs seen by end users of EPUB.

That said I'm not entirely sure what you're advocating for, are you saying that:

  1. properties that can have more than one value should always use an array?

Yes.

Or do not allow any properties to have more than one value. Since
you are already artificially restricting some properties, restrict all
of them. If that is not feasible, then at least make properties that
accept multiple values always use an array.
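
The stricter rule argued for here (multi-valued properties are always arrays, single-valued ones never are) could be enforced with a check along these lines. The two key sets are assumptions made for the example, not lists taken from any draft.

```python
# Assumed classification of properties; a real list would come from the spec.
MULTI_VALUED = {"author", "translator"}
SINGLE_VALUED = {"identifier", "title"}

def shape_errors(metadata):
    """Report properties whose JSON shape does not match their declared cardinality."""
    errors = []
    for key, value in metadata.items():
        if key in MULTI_VALUED and not isinstance(value, list):
            errors.append(f"{key}: expected an array")
        elif key in SINGLE_VALUED and isinstance(value, list):
            errors.append(f"{key}: expected a single value")
    return errors

print(shape_errors({"title": "Example", "author": ["Jules Verne"]}))
print(shape_errors({"title": ["A", "B"], "author": "Jules Verne"}))
```

Under this rule a consumer never needs to test whether a multi-valued property is a list, at the cost of two extra characters for the single-value case.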

@HadrienGardeur

I really don't think that we want to restrict authors, translators and such to a single element. It's feasible of course but a bad idea plus definitely a step backwards in terms of what you can do with EPUB metadata.

I'll bring that point to the group to discuss it, as long as it's restricted to elements that are 1-* it's worth considering.

@kevinhendricks

@kovidgoyal

I can't count the number of programs that have problems with XML
namespaces in OPF because they assume that the prefixes are always opf:
and dc: I had to write special code in calibre's EPUB output plugin to
remap namespace prefixes to work around this bug, it is so common.

Luckily epub3 has a set of predefined prefixes that need no xmlns definitions and are not allowed to be redefined. It helps to clean up the overhead of having to track url prefixed elements all around the parsed tree. And I for one am very glad you/calibre remap inconsistent prefixes to more established versions given how inconsistent many xml packages are with namespaced attributes especially.

In general, I completely agree with you. Too much choice and not enough standardization simply makes software in the wild prone to bugs. And making backwards incompatible changes leads to many problems as well. It can take years for code to find all of the corner cases and handle them. I personally think a simple epub4 with polyglot xhtml/html5 as its base, removal of refines and return to simple dc metadata with extra attributes for file-as and role, standardized prefixes for namespaces, keeping the ncx, keeping the guide, stop farting with the recognized vocabulary, etc. would allow an epub 4 to flourish. Adding more "renditions", "collections", "distributable objects", and educational epubs just clutters things up and adds no real value. Standards makers/developers really do need to take a course in basic engineering 101 - KISS.

@kovidgoyal

@kevinhendricks Yes, namespaces are XML's most unfortunate feature. I have never come across any use of namespaces in any XML based application that could not have been solved in a simpler and more robust fashion in other ways.

The problem with standards processes is that there are too many stakeholders and standards committees inevitably try to satisfy everyone, which usually means keep things simple is a lost cause. Human nature I guess.

@HadrienGardeur All the best, I hope you can convince the group. While I'm here, I'll just point out that calibre uses multiple values for the identifiers field as well. For example, it is used to store ids corresponding to a book from multiple sites: amazon, google books, goodreads, worldcat, etc. So for me, allowing only a single identifier will mean that calibre will have to use a custom identifiers field.

@acabal

acabal commented Mar 16, 2016

After giving this some thought I'm not convinced it's worth the effort to restrict what metadata can be included in the package document.

Publishers will include whatever metadata they deem necessary, regardless of where it goes. This appears to be acknowledged by the working group because it's being suggested to move metadata out to other files besides the package document.

But if we're explicitly allowing metadata outside of the package document, we haven't actually reduced metadata at all--and then reading systems are going to have to adapt their current epub parsing engines to deal with that change, for no noticeable gain. The metadata is still there somewhere, and it still has to be read somehow. (And publishers will have to adapt their workflows too.)

In this scenario all we've done is made a headache for developers of reading systems and big-data processors of epubs, who have to adjust how they parse this big change in the spec; and we've made a headache for publishers, who have to adjust their publishing workflow to put their metadata elsewhere; and there's no benefit for human readers, who don't really care how or where metadata is stored, as long as the reading system displays it. It seems like ideological purity at the expense of almost everyone involved in the actual nuts and bolts production of ebooks and reading systems--shunting the problem of messy metadata from one file to another, but not actually solving it.

I for one think having as much metadata in ebooks as possible is important. Not all of it has to be displayed, but a metadata-rich self-contained ebook file is important for machine processing and archival purposes. Maybe the metadata isn't in use today, but a decade or two from now having a metadata-rich self-contained file will be appreciated by archivists and readers on future platforms.

Maybe instead of restricting metadata, the spec could pick an existing metadata schema to champion, like schema.org, instead of using a mess of DCMES/epub-specific definitions that each RS interprets in its own special way. A subset of that metadata could be considered "required" to be understood in a certain way by reading systems, and the rest could be ignored, but available to any person or software that has an interest.

Re. JSON, I'm a little confused as to where that comes into play. If it's being suggested to include JSON as a metadata format in an epub file, I very strongly feel that the last thing we need in an epub is another language and format for ebook producers and RS developers to have to deal with. XML/HTML work just fine and are very well suited to describing the kind of metadata that a typical ebook includes.

@JayPanoz

Sorry for being late to the party but I feel like this is one important question.

@mattgarrish wrote

The metadata was changed based on a survey of developers to determine what reading systems are actually using, not based on what might be nice to have or what could be useful one day use cases.

Are the (anonymized) results of this survey published somewhere? I've searched this morning but couldn't find anything about that, not even a link in the Google Docs.

Rest assured it’s not about trust, some of us are just a little bit curious since those surveys are being referred to but it seems—and I might be wrong so if this is indeed the case, sorry for the inconvenience—we don't have access to the results.

@mattgarrish
Member Author

A summary of the results was written up here:
https://docs.google.com/document/d/1dv1357SIB1kFPSkuj_dtMD0PrzewcP7lqZJvtTYJpts/edit

@mattgarrish
Member Author

There should be a formal update on this issue in the next week or so. The metadata group has been considering this feedback and a revised proposal would re-introduce the stripped elements with some additional attributes to replace functionality removed by taking away the refines attribute. It still needs to be vetted with the full working group, though.

@JayPanoz

@mattgarrish Thank you very much.

@mattgarrish
Member Author

It looks like we won't be publishing the next editor's draft for a little while still, but to update this issue: the full set of DC elements will be restored. The refines attribute will be superseded by a set of dedicated attributes that cover the key information that reading systems need (role, file-as, id-type and a few others).
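
To picture what replacing refines with dedicated attributes might look like, the sketch below folds EPUB 3.0 style `meta refines` entries into attributes on the element they refine. It is an illustration under assumptions, not the working group's syntax: the comment above gives only the attribute names, so whether they are namespaced, and how `id-type` would be populated, is not covered, and `fold_refines` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

OPF = "http://www.idpf.org/2007/opf"
DC = "http://purl.org/dc/elements/1.1/"

# Refinement properties that this sketch turns into plain attributes.
FOLDABLE = {"role", "file-as"}

def fold_refines(metadata):
    """Move foldable <meta refines="#id"> values onto the refined element as attributes."""
    by_id = {el.get("id"): el for el in metadata if el.get("id")}
    for meta in list(metadata.findall(f"{{{OPF}}}meta")):
        target = by_id.get((meta.get("refines") or "").lstrip("#"))
        prop = meta.get("property")
        if target is not None and prop in FOLDABLE:
            target.set(prop, meta.text or "")
            metadata.remove(meta)
    return metadata

SAMPLE = """<metadata xmlns="http://www.idpf.org/2007/opf"
          xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator id="creator01">Jules Verne</dc:creator>
  <meta refines="#creator01" property="file-as">Verne, Jules</meta>
  <meta refines="#creator01" property="role" scheme="marc:relators">aut</meta>
  <meta property="dcterms:modified">2016-01-14T00:00:00Z</meta>
</metadata>"""

folded = fold_refines(ET.fromstring(SAMPLE))
creator = folded.find(f"{{{DC}}}creator")
print(creator.attrib)
```

The `scheme` on the role refinement is dropped in this sketch; a real migration would need a rule for it.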

@mattgarrish mattgarrish added the EPUB32 Issues from 3.0.1 resolved in the EPUB 3.2 specification label Aug 14, 2018