APA requires special delimiter between years of collapsed citation #40

Closed
fbennett opened this issue Apr 6, 2011 · 30 comments

Comments

@fbennett
Member

fbennett commented Apr 6, 2011

It looks like we missed one. APA uses semicolons between full cites:

  • (Brown 1992; Smith 2001)

But the years of a collapsed cite are separated by commas:

  • (Brown 1992, 1998; Smith 2001)
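
To make the two delimiter levels concrete, here is a minimal Python sketch. It is only an illustration: the function `format_citation` and its parameters `cite_delimiter` and `year_delimiter` are hypothetical names, not part of CSL or of any processor.

```python
from itertools import groupby


def format_citation(cites, cite_delimiter="; ", year_delimiter=", "):
    """Join (author, year) pairs, collapsing adjacent cites by the same author."""
    parts = []
    # groupby only merges *adjacent* items, so the input is assumed to be
    # ordered with each author's cites next to one another.
    for author, group in groupby(cites, key=lambda cite: cite[0]):
        years = year_delimiter.join(str(year) for _, year in group)
        parts.append(f"{author} {years}")
    return "(" + cite_delimiter.join(parts) + ")"


print(format_citation([("Brown", 1992), ("Brown", 1998), ("Smith", 2001)]))
# (Brown 1992, 1998; Smith 2001)
```

With only one delimiter available, both joins would have to use "; ", which is the behavior this issue is about.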

What is the way forward?

@rmzelle
Member

rmzelle commented Apr 6, 2011

See http://citationstyles.org/downloads/specification.html#citation-collapsing for the current options.

This has been discussed before: http://xbiblio-devel.2463403.n2.nabble.com/Collapsing-join-delimiters-td2602396.html#a2610962 and in an off-list email thread between Bruce, Frank and me ("Delimiter on cs:layout in cs:citation only?" on 8/30/09 and 8/31/09).

We could add a "year-delimiter" attribute for cs:citation, but as I mentioned in the xbiblio thread, the old Zotero CSL processor used a hard-coded ", " between collapsed years. Should we just specify the same behavior for CSL 1.0 processors?

@bdarcus
Member

bdarcus commented Apr 6, 2011

I don't really understand the question. Is the collapsing behavior really the issue here, or is it the need for an item-by-author delimiter (which I could have sworn we already had)?

E.g. even if there was no collapsing, I'd still expect:

(Brown 1992, Brown 1998; Smith 2001)

@rmzelle
Member

rmzelle commented Apr 6, 2011

I can't remember ever having seen a by-author delimiter. @adamsmith?

@bdarcus
Member

bdarcus commented Apr 6, 2011

I'm not saying that's what it's called. I'm just saying this seems like an obvious feature that I could have sworn was there at some point. It's possible it dropped out along the way somehow. If it did (and we need to confirm that), then that's a bug that demands an immediate minor point release I'd say (except implementors may get pissed :-)).

I'm looking at the schema now, but not seeing it. It'd be good if others could confirm.

Basically what we have here is a distinction between two things: contributor groups (in the example, Brown and Smith qualify as two distinct groups), and items within those groups. We need delimiters for both. I say I could have sworn this was there because this is how I designed the first proof-of-concept; with an explicit concept of contributor groups. Here's an example of what an earlier version of CSL looked like for an APA citation:

    <layout prefix="(" suffix=")" delimiter="; ">
      <group delimiter=", ">
        <text macro="author-short"/>
        <text macro="issued-year"/>
        <text macro="citation-locator"/>
      </group>
    </layout>

So let's boil this down to two questions:

  1. do we agree this has nothing to do with (author name) collapsing?
  2. can we confirm whether the group vs. item distinction I noted has dropped out, and so whether there's really a problem with CSL?

@bdarcus bdarcus closed this as completed Apr 6, 2011
@adam3smith
Member

I brought this up on the Zotero forum today and frank posted it here. I haven't found anything in the schema or specs to address this.
Specifically to Bruce's questions:

  1. yes I agree - that's a good point. I think I've actually seen (Brown 1992, Brown 1998; Smith 2001) as a requirement (and it would obviously still be possible to get (Brown 1992; Brown 1998; Smith 2001) when required).
  2. I looked through csl specs and .rnc and couldn't find anything that would work here. I also don't think this used to be possible in 0.81, but will double-check.
    Can't say anything about past discussions.

@rmzelle
Member

rmzelle commented Apr 6, 2011

Somewhat edited version of our previous email conversation: https://gist.github.com/906212

@bdarcus
Member

bdarcus commented Apr 6, 2011

Yes, that illustrates the problem; thanks. Glad I raised it clearly in 2009, but unfortunate if we never resolved it.

But per the above, "year-delimiter" is misleading as a descriptor, since it would also apply if the years were the same; the ultimate example output:

(Doe 1999a, 1999b; Smith 2000)

If, as I suspect, this slipped through, then we need to explicitly add a notion of "contributor group" and "within-group items", and to ensure we have delimiters for both. This might require changes in the spec, changes in the schema, or both.

In the end, we need delimiters for the following:

  • personal name-parts ("Doe, Jane")
  • names in multi-contributor lists ("Doe, Jane, Smith, John")
  • multiple items from same contributor; aka intra-contributor-group ("Doe 1999a, 1999b; Smith 1999")
  • multiple contributors in a citation ("Doe, 1999; Smith, 1999")

@rmzelle rmzelle reopened this Apr 6, 2011
@fbennett
Member Author

fbennett commented Apr 6, 2011

It looks like hard-coding a comma for joins between cites to the same author can be used as a short-term solution, to allow time for discussion.

Good point that this is not necessarily about collapsing. I think I understand the use case.

I'm a little confused by some of Bruce's points above, though. In the proof-of-concept CSL example, I don't see anything that would make a contributor group explicit. I'm also not sure what "within-group items" refers to, as distinct from "contributor group".

(Edit: The list is helpful. The third type of delimiter is currently lacking.)

@fbennett fbennett closed this as completed Apr 6, 2011
@fbennett fbennett reopened this Apr 6, 2011
@bdarcus
Member

bdarcus commented Apr 7, 2011

Forget about the earlier example; it does have its own problems :-)

So it sounds like we have agreement on the problem. Now we need to move on to the solution. I'm having a problem getting nxml mode to properly auto-complete, which makes analysis a bit difficult. I see a few distinct possibilities:

  1. only change the spec to recast understanding of attributes already in the schema; is this even possible?
  2. add an attribute to the schema (maybe called contributor-group-delimiter?), but otherwise keep the spec the same
  3. both 1 & 2

I have a feeling we're going to need 3, which also means changing and/or adding tests to the test suite.

Once we settle this, I suggest we do an immediate 1.0.1 release of the schema.

@fbennett
Member Author

fbennett commented Apr 7, 2011

Okay, I have a hard-coded comma running in citeproc-js. It causes 24 fixtures in the existing test suite to fail. That's fine in the short term, but it raises a question about how the new option will work:

  • Should the new attribute when introduced have a default value of ", "; or
  • Should this default to the value of cs:citation delimiter if not specified; or
  • Should this have no default value?
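
The three options above can be compared side by side with a small Python sketch. Everything here is hypothetical: `resolve_in_group_delimiter` and the policy names `"fixed"`, `"inherit"` and `"none"` are invented for illustration, and treating "no default value" as an empty string is an assumption.

```python
def resolve_in_group_delimiter(explicit=None, citation_delimiter=None, policy="fixed"):
    """Pick the delimiter used between cites that share an author.

    `explicit` stands for the value of the proposed (not yet existing)
    attribute; `citation_delimiter` stands for the ordinary delimiter
    already set for the citation.
    """
    if explicit is not None:
        # A value set in the style always wins, whatever the policy.
        return explicit
    if policy == "fixed":
        # Option 1: a hard default of ", ".
        return ", "
    if policy == "inherit":
        # Option 2: fall back to the citation delimiter, if there is one.
        return citation_delimiter or ""
    if policy == "none":
        # Option 3: no default (assumed here to mean an empty string).
        return ""
    raise ValueError(f"unknown policy: {policy!r}")


for policy in ("fixed", "inherit", "none"):
    print(policy, repr(resolve_in_group_delimiter(None, "; ", policy)))
```

Only the first policy yields the APA result for existing styles without any change to the style files.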

@rmzelle
Member

rmzelle commented Apr 7, 2011

I don't understand why this issue is so acute; I see no need to rush a 1.0.1 release. For the short term, we can just adopt the behavior of Zotero 2.0.9 by hardcoding a ", " as Frank indicates above.

@bdarcus
Member

bdarcus commented Apr 7, 2011

@fbennett - I'd say it should have the same behavior as other delimiters, which is to say no default value.

@rmzelle - because technically, it means every author-date style is broken?

@rmzelle
Member

rmzelle commented Apr 7, 2011

FWIW, delimiter on cs:name has a default value of ", ". See http://citationstyles.org/downloads/specification.html#name

@fbennett
Member Author

fbennett commented Apr 7, 2011

I overspoke about test breakage. If the implicit hard-coded behavior is applied to citations in in-text styles only, only three existing tests fail to clear. This looks good, so I'll push the revised code for use in Zotero. I agree with Rintze that there is no need to rush to production with a solution in the schema. If the attribute makes it into 1.0.1 when it emerges in due course, that should be fine.

Understood about default of no value, so no hard-coded default. Should this fall back to the value (if any) of cs:citation delimiter though? It seems like that might be what people would expect.

@bdarcus
Member

bdarcus commented Apr 7, 2011

OK, thanks Rintze. Then let's go with that as well.

But note: it's not included in the schema; we should add a [ a:defaultValue = ", " ] annotation to these for utmost clarity.

So does this mean we want the following?

citation-layout = element cs:layout { affixes, font-formatting, delimiter, group-delimiter?, cs-element }
group-delimiter = [ a:defaultValue = ", " ] attribute contributor-group-delimiter { text }

And you guys are right that the default means this is turned on so current styles are correct, and so there would be no rush on the schema release.

But we should update the test suite and spec so other implementors can deal with it.

@bdarcus
Member

bdarcus commented Apr 12, 2011

This issue is broader than the delimiters, and also encompasses sorting:
http://forums.zotero.org/discussion/17446/multiple-citation-separators/

So, for example, if one wants to sort a citation by date and then author, it should be possible to get this:

(Smith, 2000, 2010; Doe, 2005)

Right now, the spec is confusing on this count, as it seems quite reasonable an implementation would do:

(Smith, 2000; Doe, 2005; Smith, 2010)

... which doesn't (necessarily) make a lot of practical sense.

I see two options (which aren't mutually exclusive):

  1. add an explicit discussion of grouping to the spec, and include language that would describe the first outcome (and also couch some behavior in the bibliography formatting in the same terms)
  2. add some attribute somewhere (probably on cs:citation) to flatten this grouping behavior (with a default value of false), and so to allow the second outcome
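
The difference between the two outcomes can be sketched in Python. This is an illustration only: `sort_cites` and its `flatten` flag are hypothetical stand-ins for the attribute floated in option 2, and regrouping after a flat sort is just one of several ways to get the grouped order.

```python
def sort_cites(cites, flatten=False):
    """Sort (author, year) pairs by year, then author.

    Unless `flatten` is true, each author's later cites are pulled forward
    to sit next to that author's earliest cite.
    """
    ordered = sorted(cites, key=lambda cite: (cite[1], cite[0]))
    if flatten:
        return ordered
    groups = {}  # dicts keep insertion order, i.e. order of first appearance
    for cite in ordered:
        groups.setdefault(cite[0], []).append(cite)
    return [cite for group in groups.values() for cite in group]


cites = [("Smith", 2010), ("Doe", 2005), ("Smith", 2000)]
print(sort_cites(cites))
# [('Smith', 2000), ('Smith', 2010), ('Doe', 2005)]
print(sort_cites(cites, flatten=True))
# [('Smith', 2000), ('Doe', 2005), ('Smith', 2010)]
```

The first result corresponds to (Smith, 2000, 2010; Doe, 2005), the second to (Smith, 2000; Doe, 2005; Smith, 2010).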

@fbennett
Member Author

I just checked with Zotero 2.0.9, and it produces the same result as citeproc-js does currently. I've figured out how to produce a grouped sort without too much trouble, but it seems to me that it would be risky to implement this group-and-sort behavior implicitly. It's much easier to explain behavior that is controlled by an attribute or element in the CSL. If this is truly universal, styles can be modified in the next XSLT transform.

Rintze has put up a proposal like this:

<sort>
  <key macro="issued"/>
  <key macro="author" group-matches="true"/>
</sort>

I would favor using a single group-by-macro or group-by-variable attribute on cs:sort, like this:

<sort group-by-macro="author">
  <key macro="issued"/>
</sort>

This would make it clear that the "author" macro is not relevant as a sort key.

@bdarcus
Member

bdarcus commented Apr 12, 2011

So we have a few questions:

  1. should grouping be default or not (e.g. which of the two examples I posted today makes more sense?)
  2. how should it be implemented in CSL syntax:
    1. a boolean, or something more flexible?
    2. where on the tree (cs:citation or cs:sort?)

Answers probably need to consider another, related, issue that Andrea recently ran into: how to implement suppress author.

I think author grouping is fundamental to the model (including in the bibliography), so perhaps we shouldn't be averse to accepting that here. E.g. unless we can envision an explicit use case for non-author grouping in cs:citation or cs:bibliography then maybe a boolean is fine? If we can, then that's another matter of course, and the group-by-macro idea makes more sense.

On q2.2, putting it on cs:sort would be more flexible, and allow us to use it on cs:bibliography as well.

On q1, I'm pretty sure (though could be convinced otherwise) that grouping is the expected default behavior. Perhaps, per list comments, we should try to get some data on this.

@fbennett
Member Author

I don't understand how cites are to be grouped, if the characteristics of group members are not stated somewhere.

@bdarcus
Member

bdarcus commented Apr 12, 2011

I started this before seeing latest reply from fbennett, but the comment towards the end on defaults anticipates his point.

Actually, thinking about this some more, I think Frank's suggestion makes the most sense; something like:

group-by-macro = attribute group-by-macro { token }
in-group-delimiter = attribute in-group-delimiter { text } # to replace previous proposal up top
cs-sort |= group-by-macro

It's simple, flexible, and probably not hard to add to existing styles. Moreover, with the value as macro, it makes little sense to talk about defaults.

What implications might this have for cs:bibliography?

@fbennett
Member Author

Second thought: you could implicitly group on collapsible "authors" if the collapse attribute has a value. That would keep cites aligned for potential collapsing, which I think is what you're after.

@fbennett
Member Author

You wouldn't want any grouping in the bibliography. If it's tied to collapsing, you would get that result.

@bdarcus
Member

bdarcus commented Apr 12, 2011

I guess with the bib thing I'm trying to get at the idea that there's already implicit grouping of sorts (on author; reflected in the subsequent-author-substitute attribute), and so hoping we can keep some consistency.

And yes, you're right that the collapse attribute has similar implicit notions of grouping (also on author).

So what's your druthers now, given all this?

@fbennett
Member Author

At the implementation level, it would be done by first sorting on the group key (the bit that would go away with "suppress-author"), then assigning the first-listed explicit key value as primary key to the members of each group, with the group key as first secondary. The explicit cs:key elements would then be used as further secondary keys. So with collapse="year" in the style, and macro:author as the bit that is caught by "suppress-author", this:

<sort>
   <key macro="issued"/>
</sort>

Would produce keys like this:

[macro:issued of member 0] [macro:author] [macro:issued]

While an explicit sort of authors like this:

<sort>
  <key macro="author"/>
  <key macro="issued"/>
</sort>

Would produce keys like this:

[macro:author of member 0] [macro:author] [macro:author] [macro:issued]

In the second example, the first three keys come out identical, so you would get the cites ordered by author, with a year sort of each author's cites, which is what one would expect.
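
The key layout described above can be sketched in Python as follows. This is not the citeproc-js code: `grouped_sort` and the lambdas standing in for macros are hypothetical, and "member 0" is assumed to be the group member that sorts first under the explicit keys.

```python
from itertools import groupby


def grouped_sort(cites, group_key, explicit_keys):
    """Sort cites with keys laid out as
    [first explicit key of member 0] [group key] [explicit keys...]."""
    # Step 1: sort on the group key so that group members are adjacent.
    # Assumption: ties are broken by the explicit keys, which fixes member 0.
    by_group = sorted(
        cites, key=lambda c: (group_key(c), tuple(k(c) for k in explicit_keys))
    )
    keyed = []
    for _, members in groupby(by_group, key=group_key):
        members = list(members)
        # Step 2: every member shares the first explicit key of member 0.
        primary = explicit_keys[0](members[0])
        for cite in members:
            key = (primary, group_key(cite), *(k(cite) for k in explicit_keys))
            keyed.append((key, cite))
    # Step 3: an ordinary sort on the composite keys.
    return [cite for _, cite in sorted(keyed, key=lambda pair: pair[0])]


author = lambda cite: cite["author"]  # stands in for macro:author
issued = lambda cite: cite["issued"]  # stands in for macro:issued

cites = [
    {"author": "Smith", "issued": 2010},
    {"author": "Doe", "issued": 2005},
    {"author": "Smith", "issued": 2000},
]
for keys in ([issued], [author, issued]):
    print([(c["author"], c["issued"]) for c in grouped_sort(cites, author, keys)])
# [('Smith', 2000), ('Smith', 2010), ('Doe', 2005)]
# [('Doe', 2005), ('Smith', 2000), ('Smith', 2010)]
```

Sorting on the year alone keeps each author's cites together and orders the groups by their earliest year, while the explicit author sort gives the usual author-then-year order.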

Still not sure I like the idea of turning this on without an explicit attribute to invoke the grouping, but it does seem like it would produce sane results.

@fbennett
Member Author

If the sole key set in cs:bibliography is the year, we can assume that the style just wants the cites ordered by year, regardless of authorship. The same in cs:citation, for a style that doesn't do any collapsing.

@fbennett
Member Author

I've come around to wondering now whether this enhancement of sort behavior might be safe to implement in the processor on the current schema, with the collapse attribute controlling whether the grouping behavior is turned on. It would be a significant change, but it does look like that would be the expected behavior. If that sounds like a possibility, shall I float this as a proposal on zotero-dev, at Mendeley, and with Andrea and other implementers?

@bdarcus
Member

bdarcus commented Apr 14, 2011

Sure.

I do think, as I mentioned above, that we'd want to add some explicit discussion of author grouping to the spec where collapsing gets recast in those terms, and to perhaps even make a point of calling out a reserved author macro.

@fbennett
Member Author

I've implemented grouped sorting as discussed here in citeproc-js, limiting the behavior to non-numeric in-text styles that have the collapse attribute and do not use the position attribute for context-sensitive things like "Ibid". There is a test to go with the code.
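
The conditions listed above amount to a simple predicate. The Python sketch below is only a reading of that sentence, not the citeproc-js implementation; `StyleInfo`, its fields, and `uses_grouped_sort` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StyleInfo:
    """Hypothetical summary of the style properties mentioned above."""
    style_class: str         # "in-text" or "note"
    is_numeric: bool         # True for numeric citation styles
    collapse: Optional[str]  # value of the collapse attribute, if any
    uses_position: bool      # True if the style tests position (e.g. for "Ibid")


def uses_grouped_sort(style: StyleInfo) -> bool:
    """Grouped sorting applies only when all four conditions hold."""
    return (
        style.style_class == "in-text"
        and not style.is_numeric
        and style.collapse is not None
        and not style.uses_position
    )


print(uses_grouped_sort(StyleInfo("in-text", False, "year", False)))
# True
```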

I'm pretty sure that this will be non-controversial among users, so I've bundled the changes in a processor release, with a comment in the release note concerning the change, and an offer to add a toggle to turn off grouped sorting if necessary. We'll see how it goes. Thanks for persisting on this, Bruce; it took me a while to focus properly on the use case, but it's pretty clear that "grouped" sorting is the only correct method of automatic sorting for this category of citation styles.

@fbennett
Member Author

This thread has raised two issues:

  • The need for a special internal delimiter between collapsed citations (implicitly a comma in CSL 0.8.1 processors); and
  • The need for grouped sorting behavior in author-date styles.

Looking at what is in the tests, and running in citeproc-js, it looks like both issues have been addressed -- the first by adopting the same implicit behavior as in 0.8.1, and the second by introducing implicit grouped sorting behavior. In that state, both issues require only notes in the specification. I'll add a note to that effect to the documentation tracker, with a link back to this thread.

The discussion here is pretty cluttered (largely thanks to yours truly), so I'm closing this ticket, and will open a fresh one, with a milestone of 1.1, for the introduction of an explicit delimiter for use between collapsed cites.

@rmzelle
Member

rmzelle commented Oct 20, 2011

For posterity, the "explicit delimiter for use between collapsed cites"-ticket is #52.
