Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Rename "sub verbo" term to "sub-verbo" #72

Open
rmzelle opened this Issue Aug 23, 2011 · 35 comments

Comments

Projects
None yet
7 participants
Owner

rmzelle commented Aug 23, 2011

As discussed in http://sourceforge.net/mailarchive/message.php?msg_id=27419940 , having a space in term names gives problem with conditional testing, so we should migrate "sub verbo" to "sub-verbo".

Owner

bdarcus commented Aug 23, 2011

Yes; the sooner the better.
On Aug 23, 2011 10:42 AM, "rmzelle" <
reply@reply.github.com>
wrote:

As discussed in
http://sourceforge.net/mailarchive/message.php?msg_id=27419940 , having a
space in term names gives problem with conditional testing, so we should
migrate "sub verbo" to "sub-verbo".

Reply to this email directly or view it on GitHub:
#72

Owner

rmzelle commented Aug 31, 2011

On second thought, this issue should probably be solved in CSL 1.0.1 by just also allowing "sub-verbo" as a value on the "locator" conditional, while keeping the "sub verbo" term. CSL processors will have to link the two. We'll have to make a note of this in the spec, but it's the simplest solution. We can do a full term migration in CSL 1.1.

Owner

bdarcus commented Aug 31, 2011

On Wed, Aug 31, 2011 at 11:07 AM, rmzelle
reply@reply.github.com
wrote:

On second thought, this issue should probably be solved in CSL 1.0.1 by just also allowing "sub-verbo" as a value on the "locator" conditional, while keeping the "sub verbo" term. CSL processors will have to link the two. We'll have to make a note of this in the spec, but it's the simplest solution. We can do a full term migration in CSL 1.1.

But I thought we established there are no (or a tiny handful of)
existing styles that use "sub verbo"? To me this is a bug that should
be fixed ASAP; before it's used too much and creates legacy headaches.

Owner

rmzelle commented Aug 31, 2011

Renaming the term itself is messy, and we already have legacy to worry about. Locale files, styles redefining the "sub verbo" term and (embedded) item metadata including this locator label would all be affected. Limiting the scope of the change to the conditional would be far less intrusive.

Owner

bdarcus commented Aug 31, 2011

On Wed, Aug 31, 2011 at 11:14 AM, rmzelle
reply@reply.github.com
wrote:

Renaming the term itself is messy, and we already have legacy to worry about. Locale files, styles redefining the "sub verbo" term and (embedded) item metadata including this locator label would all be affected.

Aren't those just search-and-replace functions though?

To remind us all: the schema is effectively inconsistent on this issue
now, so that if someone uses "sub verbo" in the context of a list, a)
it can't be valid, and b) a processor won't be able to deal with it
absent specialized code (because they should be splitting the list on
whitespace).

So I would think it's safer and less work to fix the bug now rather than later?

Maybe we ought to get some input from Mendeley and Zotero on how
costly the fix would be on their end?

Owner

rmzelle commented Aug 31, 2011

How do you search-and-replace all the styles and locale files in the wild?

Owner

bdarcus commented Aug 31, 2011

On Wed, Aug 31, 2011 at 12:08 PM, rmzelle
reply@reply.github.com
wrote:

How do you search-and-replace all the styles and locale files in the wild?

I wouldn't worry about them. They're already broken in any case.

Owner

rmzelle commented Aug 31, 2011

No, they're not broken. Breakage is limited to styles using "sub verbo" with the "locator" conditional, because here the attribute value is processed as a list (and "sub verbo" gets interpreted as two list tokens, "sub" and "verbo"). Locale files and styles that use the "sub verbo" term in other ways (e.g. through cs:term) are just fine, although I agree we should eventually migrate the "sub verbo" term to "sub-verbo".

My proposed fix is trivial to implement in the schema, a simple fix for CSL processors, and would be an effective stopgap solution for CSL 1.0.1. Style validation will bring the issue to attention of style authors, and we can add a remark to the spec to discuss this small workaround.

Owner

bdarcus commented Sep 2, 2011

As an implementer, I don't really want to write custom code to handle what is effectively a bug in the schema.

But part of my position on this is based on the assumption that so far "sub verbo" is so rarely used in practice, that the impact would be minimal.

Owner

rmzelle commented Sep 2, 2011

so far "sub verbo" is so rarely used in practice

You mean apart from the hundreds of thousands copies of the CSL locale files out there?

Member

fbennett commented Sep 2, 2011

The problem is in the unknown number of documents that use "sub verbo" as a user-selected locator label. Word processor plugins (in Zotero at least) use the string value as the key for the locator label. If you change the list of recognized values, documents will break.

Once the "sub verbo" term went live, the die was cast: implementers were going to have to accomodate "sub verbo" and "sub-verbo" simultaneously in order to provide a bridge for the conversion. That parallel implementation (which is in place for citeproc-js) is not about encouraging or promoting invalid CSL code; it's a necessary step in the recovery from the original error.

So whether the schema change takes place in one go or in stages, implementers have to do the same amount of work to lay a foundation for the change. And to avoid chaos, you need to have the adjustment in place in the implementations before making the change in the schema mandatory.

Someone might want to touch bases with the people at Mekentosj, as this may affect Papers2 as well as citeproc-js and citeproc-hs.

@fbennett fbennett closed this Sep 2, 2011

@rmzelle rmzelle reopened this Sep 2, 2011

Owner

bdarcus commented Sep 2, 2011

The problem is in the unknown number of documents that use "sub verbo" as a user-selected locator label. Word processor plugins (in Zotero at least) use the string value as the key for the locator label. If you change the list of recognized values, documents will break.

This (documents) is a separate issue though; no?

My question is really how feasible it is for Mendeley and Zotero to role out updated locales files quickly.

And my primary short-term goal is to remove the core bug in the schema: that it's internally-inconsistent. I'd prefer to make a quick change and entirely remove "sub verbo" from schema, styles, locales, docs.

But if we have to take the awkward step of supporting both "sub verbo" and "sub-verbo" for a (short) time, then we'll just have to do that.

But we have yet to hear from Zotero and Mendeley on this.

I don't think the document issue is separate, since the embedded citation JSON schema is tied to the CSL schema. But if citeproc-js will already take either "sub verbo" or "sub-verbo", I think the only issue is that the wrong locator will be selected if a user tries to edit a citation in a document created with a "sub-verbo" version of Zotero with a non "sub-verbo" version, which isn't really a big deal.

If citeproc-js already supports both "sub verbo" and "sub-verbo", I'm not sure what difference it makes when we roll out updated locale files?

Owner

rmzelle commented Sep 3, 2011

I think the only issue is that the wrong locator will be selected if a user tries to edit a citation in a document created with a "sub-verbo" version of Zotero with a non "sub-verbo" version, which isn't really a big deal.

How about documents with embedded metadata? You'd have to look for both "sub-verbo" and "sub verbo" locator labels if you want to parse that metadata.

If citeproc-js already supports both "sub verbo" and "sub-verbo", I'm not sure what difference it makes when we roll out updated locale files?

For anyone using an up to date citeproc-js, there would be no difference. But it's a non-compatible change. Non-patched CSL processors would break on encountering "sub-verbo", and any formerly valid CSL 1.0 locale or style file defining or calling the "sub verbo" term would become invalid.

My proposed fix is very limited in scope, and backward compatible. We would just rename the attribute value by which we test for the existence of a "sub verbo" locator label, whereas the locator label "sub verbo" itself wouldn't change! CSL 1.0 styles could never test for locator="sub verbo" anyway, because this wouldn't validate and parse correctly (because the attribute value is interpreted as a list, this would get read as "sub" and "verbo"). So this fix only adds a features, and doesn't eliminate anything that was previously possible in CSL 1.0.

Owner

rmzelle commented Nov 14, 2011

I've added "sub-verbo" to the list of values that can be tested on the "locator" conditional. We can migrate the "sub verbo" term to "sub-verbo" with CSL 1.1.

Spec: citation-style-language/documentation@3b65c3d
Schema: 034a18d

Owner

bdarcus commented Nov 14, 2011

Correct me if I'm wrong, but this is not the solution we agreed to.

Owner

rmzelle commented Nov 14, 2011

We never agreed on any solution. Is it at least acceptable for the short/mid-term? (or alternatively, can I be the benevolent dictator this time?)

I'll be more than happy to revisit this once we've figured out how to update to a new input JSON format (http://xbiblio-devel.2463403.n2.nabble.com/CSL-JSON-input-td6966094.html). I remain convinced though that renaming the "sub verbo" term to "sub-verbo" should happen in CSL 1.1.

Owner

bdarcus commented Nov 14, 2011

As I've thought about this, I believe this fragment demonstrates the problem with keeping "sub verbo":

<if locator="sub verbo volume">...</if>

Currently this will be invalid from an XML standpoint (even though it appears to be given the schema), and no CSL implementation will correctly interpret it unless they write specific code to work around this problem (because the correct way to code the conditional support is to treat the content of the attributes as space-separated lists, which would yield three tokens in this case). Maybe Frank did this, but I doubt anyone else did (I wouldn't).

Compare this to what happens if we remove "sub verbo" immediately. The same fragment becomes clearly invalid, and so a style that uses it needs to get updated. Once that happens, it will work correctly. And I'm not concerned with updating the styles; the earlier we do it, the easier it is to manage. Conversely, the more we wait, the more likely we'll see styles with this sort of pattern.

But we also have a bigger issue, which is the locales files. Here I'm stumped. I can see your impulse to want to wait for 1.1 here, but still feel really uneasy about leaving an obvious bug in the schema for another two or three years (I want to be really conservative about major changes). Can we really not find a middle ground that addresses my concerns and yours'?

Owner

bdarcus commented Nov 14, 2011

On this:

For anyone using an up to date citeproc-js, there would be no difference. But it's a non-compatible change. Non-patched CSL processors would break on encountering "sub-verbo", and any formerly valid CSL 1.0 locale or style file defining or calling the "sub verbo" term would become invalid.

Yes, but as I mentioned above, I think this is even more complicated. If I'm a CSL implementor, and I look at adding conditional support, I look at those attributes as space-separated lists. So I split a value like "sub verbo volume" into an array of ['sub','verbo','volume'] and then match that against a list of data values. If I do that today, it fails.

Owner

rmzelle commented Nov 14, 2011

Yes, your example is the right one. But note that "sub verbo" exists as three separate entities in CSL 1.0: as a value on the "locator" attribute (which is invalid XML), as a term, and as a field in the input JSON.

My proposed fix for CSL 1.0.1 only targets the first entity, by adding "sub-verbo" as an attribute value on "locator", which would then map to the "sub verbo" term and input field. I could reorganize the schema to also disallow "sub verbo" on the "locator" attribute, but that would get messy, and I'd argue that a) the specification overrules the schema anyway, and b) validation will object to the use of the "sub verbo" attribute value. I know that my fix isn't perfect, but it's a simple intermediate step towards full migration from "sub verbo" to "sub-verbo", which we can easily do when we go through another style/locale upgrade (i.e. CSL 1.1).

As for the downsides of outright renaming the "sub verbo" term and input field to "sub-verbo": embedded JSON metadata in documents would be affected, and valid CSL 1.0 locale files and styles (re)defining the "sub verbo" term would become invalid in CSL 1.0.1. I don't see how that's a desirable path to take.

Owner

bdarcus commented Nov 14, 2011

I'm not entirely following where you're going with your response, and am not sure you actually addressed my points.

To boil this down on at least one point, if we include a test in the test suite with the above fragment, what should implementers be expected to do now?

I would object to forcing them to workaround our bug (and there's really no other way to describe this; it's a really dumb oversight), so would effectively be saying such a fragment cannot, and will not, work (even though the spec suggests implicitly it should).

Or to put this differently, what specific example are you concerned will now break if we make the more radical change? This?

<text variable="sub verbo"/>
Owner

rmzelle commented Nov 14, 2011

In my proposed patch, I tell people to use "sub-verbo" instead of "sub verbo" on the "locator" attribute. If people ignore that, then <if locator="sub verbo">...</if> will still fail schema validation. And if after these precautions such an invalid style is still fed to a CSL processor, it should just ignore the "sub" and "verbo" values it reads from the attribute value. The only thing CSL processors have to do is map the "sub-verbo" "locator" attribute value to the "sub verbo" field in the input JSON data.

Or to put this differently, what specific example are you concerned will now break if we make the more radical change? This?
<text variable="sub verbo"/>

"sub verbo" is not a variable. But yeah, term definitions like
<term name="sub verbo"> <single>sub verbo</single> <multiple>sub verbis</multiple> </term>
will become invalid, which affects all CSL 1.0 locale files in existence, and possibly some styles as well that redefine the term.

Owner

bdarcus commented Nov 14, 2011

On first point, OK.

So can we boil down the remaining issue to the second point; how and when to:

  1. update and version the locales files from sub verbo to sub-verbo
  2. update and version the CSL data input schema from sub verbo to sub-verbo

E.g. the question is whether we do this as part of some future major 1.1 release, or whether we do it sooner.

I do still think to settle these questions we need more input; in particular from each of the implementers (beyond Frank). Because my understanding is that corner cases are going to break regardless of what we do (not to mention that sub verbo itself is a kind of corner case; few users will even use it, and my suspicion is Zotero is the only one to actually implement it).

So for implementers, please tell us whether you implement sub verbo in your implementations (in short, I want to know if any applications uses it other than Zotero), and what, if any, opinions you have on the two items above.

Owner

rmzelle commented Nov 14, 2011

I can only repeat my objection against doing it pre-CSL 1.1: the change would break currently valid styles and locales.

Cleaning up the input JSON format is a much larger effort, and I'm not sure we'll get to that before 1.0.1. See also http://xbiblio-devel.2463403.n2.nabble.com/CSL-JSON-input-td6966094.html

Member

inukshuk commented Nov 14, 2011

I think, as far as the specification is concerned, consistency is important and that sub verbo should be changed to sub-verbo in all cases.

Implementations that want to support 1.0 styles and locales will have to introduce a mapping anyway, but the specification should not have to worry about it.

citeproc-hs now recognizes "sub verbo" and will support "sub-verbo" when adopted. I didn't test corner cases because I have other priorities now, but I do not expect many troubles from this issue.

Owner

bdarcus commented Nov 15, 2011

citeproc-hs now recognizes "sub verbo" and will support "sub-verbo" when adopted.

Thanks.

Will citeproc-hs correctly handle a case where "sub verbo" is a
conditional list like "sub verbo volume"? I assume no, but just want
to document it.

Bruce

Owner

rmzelle commented Nov 15, 2011

Will citeproc-hs correctly handle a case where "sub verbo" is a
conditional list like "sub verbo volume"?

That's really a non-issue, since that wouldn't validate.

Owner

bdarcus commented Nov 15, 2011

That's really a non-issue, since that wouldn't validate.

"Won't validate" because of the bug, or because of your patch?

Owner

rmzelle commented Nov 15, 2011

Won't validate because "sub verbo", when used as part of an attribute value that is read as a list, will always be read as "sub" and "verbo". The schema doesn't allow for these values, before or after my patch.

Owner

bdarcus commented Nov 15, 2011

Won't validate because "sub verbo", when used as part of an attribute value that is read as a list, will always be read as "sub" and "verbo". The schema doesn't allow for these values, before or after my patch.

OK, fair enough. But that's "the bug"; it's inconsistent with the
intensions of both the schema design, and the spec. Not the same thing
as explicitly disallowing it, and explaining in the spec "yes, we made
a mistake, please ignore 'sub verbo'; we'll fix it later" (which is
what you're arguing).

To this point I see that Simon and Andrea don't really seem to have an
opinion on this, that Frank prefers what I'll call the "staged"
approach (this patch; though he seems to prefer this not so much for
the sake of citeproc-js, which is already ready for both, but because
of concerns for other implementers, who have yet to weigh in?), and
that Sylvester prefers the "quick fix" (my preference).

I say we keep this open for comments until the end of this week
(Novermber 18), and see where we're at.

Owner

rmzelle commented Nov 15, 2011

But that's "the bug"; it's inconsistent with the
intensions of both the schema design, and the spec. Not the same thing
as explicitly disallowing it, and explaining in the spec "yes, we made
a mistake, please ignore 'sub verbo'; we'll fix it later" (which is
what you're arguing).

I explicitly disallow it in my patch by modifying the spec, which now says "Use "sub-verbo" to test for the "sub verbo" locator type". I don't see how I can be much clearer than that.

Member

SteveRidout commented Jun 26, 2012

I'm fixing a bug at the moment in my CSL editor caused by the schema specifying sub verbo and sub-verbo as valid values for space delimited lists.

For now I'm going to implement a work around but it would be nice to see sub verbo removed from the schema, at least within if and else-if nodes.

Owner

bdarcus commented Jun 26, 2012

+1
On Jun 26, 2012 5:12 AM, "SteveRidout" <
reply@reply.github.com>
wrote:

I'm fixing a bug at the moment in my CSL editor caused by the schema
specifying sub verbo and sub-verbo as valid values for space delimited
lists.

For now I'm going to implement a work around but it would be nice to see
sub verbo removed from the schema, at least within if and else-if
nodes.


Reply to this email directly or view it on GitHub:

#72 (comment)

Owner

rmzelle commented Jun 26, 2012

@SteveRidout, this should be fixed now. I actually thought about doing this earlier but couldn't come up with an elegant way to code it in the CSL schema, but I think I found a solution that's relatively straightforward.

9d28050

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment